key: cord-0844196-u0cdv1nj authors: nan title: Advanced Meta-Heuristics, Convolutional Neural Networks, and Feature Selectors for Efficient COVID-19 X-Ray Chest Image Classification date: 2021-02-22 journal: IEEE Access DOI: 10.1109/access.2021.3061058 sha: 9accd71d68cbe6c5817228f60093538595d18d33 doc_id: 844196 cord_uid: u0cdv1nj

The chest X-ray is considered a significant clinical utility for basic examination and diagnosis. The human lung area can be affected by various infections, such as bacteria and viruses, leading to pneumonia. An efficient and reliable classification method facilitates the diagnosis of such infections. Deep transfer learning has been introduced for pneumonia detection from chest X-rays in different models. However, there is still a need for further improvements in the feature extraction and advanced classification stages. This paper proposes a classification method with two stages to classify different cases from chest X-ray images based on a proposed Advanced Squirrel Search Optimization Algorithm (ASSOA). The first stage is the feature learning and extraction process based on a Convolutional Neural Network (CNN) model named ResNet-50 with image augmentation and dropout processes. The ASSOA algorithm is then applied to the extracted features for the feature selection process. Finally, the connection weights of a Multi-layer Perceptron (MLP) Neural Network are optimized by the proposed ASSOA algorithm (using the selected features) to classify input cases. A Kaggle chest X-ray images (Pneumonia) dataset consisting of 5,863 X-rays is employed in the experiments. The proposed ASSOA algorithm is compared with the basic Squirrel Search (SS) optimization algorithm, Grey Wolf Optimizer (GWO), and Genetic Algorithm (GA) for feature selection to validate its efficiency. The proposed (ASSOA + MLP) is also compared with other classifiers, based on (SS + MLP), (GWO + MLP), and (GA + MLP), in performance metrics.
The proposed (ASSOA + MLP) algorithm achieved a classification mean accuracy of (99.26%). The ASSOA + MLP algorithm also achieved a classification mean accuracy of (99.7%) for a chest X-ray COVID-19 dataset tested from GitHub. The results and statistical tests demonstrate the high effectiveness of the proposed method in determining the infected cases.

limited performance, and to improve efficiency, they are usually consolidated. Traditional approaches that rely on geometric or handcrafted features generally aim to reduce dimensionality, elapsed time, and redundant features when extracting salient features. Moreover, these methods suffer from failures that affect the classification accuracy. Hence, improvements in the feature extraction step and the segmentation process are required [5] [6] [7] [8]. For image classification tasks, some traditional classification methods have also achieved excellent results in recent years [9] [10] [11] [12]; however, deep learning methods have some advantages over the traditional methods. Convolutional Neural Networks (CNNs) or pre-trained networks are commonly involved in different medical imaging tasks. They can offer rather good performance in analyzing high-resolution images, as in abdominal X-rays. However, the need for a sufficiently large training dataset is a critical problem [13] [14] [15]. More training phases with large datasets are usually required to apply pre-trained networks to medical image classification tasks. Hence, in some cases, adopting these types of networks as classifiers is not the preferred way to apply CNNs to the diagnosis tasks of CAD [16] [17] [18]. With pre-trained networks (e.g., DCNNs), results on more complicated datasets, such as those for the presence or absence of pneumonia, have not been good. Therefore, more data augmentation samples and training can improve the efficiency [19], [20].
While identification systems based on CNNs provide greater precision for various tasks, the key downside of these methods is the need for heavy training [21] [22] [23]. Machine learning has grown rapidly, has produced many technical breakthroughs, and is extensively employed in many fields. As a significant part of machine learning tasks, optimization has attracted much attention in several research areas. With the rapid growth in the amount of data employed and the increase in design complexity, optimization approaches in machine learning face further challenges. For a specific problem, optimization seeks the most reliable solution among all available solutions, especially in a multi-dimensional space [24]. Practically, this involves maximization or minimization of an objective function. The objective function defines the quality and efficiency of a candidate solution, which is represented by a particular vector in a search domain. There are two classes of optimization: nonlinear versus linear [25], [26].

Meta-heuristic algorithms are considered among the most powerful methods for solving real-world engineering problems [27]. Most of these algorithms are derived from the behavior of physical processes in nature, the behavior of biological systems, the collective intelligence of swarm particles, and the survival-of-the-fittest theory behind evolutionary algorithms [28], [29]. These optimization techniques provide acceptable solutions in a reasonable time with less computational effort. They are mostly used in engineering and science for finding solutions to complex and challenging problems because: a) they can be applied to different problems across other subjects, b) they require no gradient information, c) they can bypass local optima, and d) they are easy to implement and depend on comparatively simple concepts.

This paper proposes a classification method to classify infected cases from the chest X-ray images.
The method can decrease detection costs significantly. First, a feature learning stage is developed using the CNN model named ResNet-50 with image augmentation as pre-processing and dropout as post-processing. Second, the features are extracted to start the feature selection process. A proposed Advanced Squirrel Search Optimization Algorithm (ASSOA) is developed for feature selection. The advanced classification stage then classifies the infected cases using a Multilayer Perceptron Neural Network (MLP) optimized by the proposed ASSOA algorithm. ASSOA's basic role in the classification stage is to optimize the connection weights of the MLP to improve accuracy. The chest X-ray images (Pneumonia) dataset from Kaggle [30], consisting of 5,863 X-ray images, is used in the experiments. A chest X-ray COVID-19 dataset [31] is also tested in the experiments. The proposed ASSOA algorithm is compared with the basic Squirrel Search (SS) optimization algorithm [32], Grey Wolf Optimizer (GWO) [29], and Genetic Algorithm (GA) [33] for feature selection to test its efficiency. The ASSOA + MLP algorithm is also compared with other classifiers, based on (SS + MLP), (GWO + MLP), and (GA + MLP), in performance metrics. Moreover, Wilcoxon rank-sum and one-way analysis of variance (ANOVA) tests are performed to statistically verify the proposed algorithm's superiority.

This paper's main contributions are as follows:
• An Advanced Squirrel Search Optimization Algorithm (ASSOA) is developed for feature selection and classification.
• The proposed ASSOA algorithm adds horizontal, vertical, diagonal, and exponential movements to the basic moves in the search process of the basic SS algorithm.
• A new agents' relocation equation is modeled in the proposed ASSOA algorithm, affecting local and global optima under specific conditions.
• A classification method for chest X-ray images is proposed based on the ASSOA algorithm.
• The classification method is tested using a dataset from Kaggle with 5,863 chest X-ray images.
• The classification method is also tested for a chest X-ray COVID-19 dataset from GitHub.
• Wilcoxon rank-sum and ANOVA statistical tests are performed to verify the quality of the proposed ASSOA algorithm.

The remaining sections of this paper are organized as follows: Section II presents the related works. The materials and methods used in the study are defined in Section III. Section IV describes the proposed method and the ASSOA algorithm in depth. The experimental results are shown in Section V. Section VI discusses the findings of the proposed method. The research conclusions are given in Section VII.

The availability of large datasets and the recent advances in deep learning models have led to algorithms that outperform medical professionals in various clinical imaging tasks, such as cancer classification [34], detection of arrhythmia [35], [36], identification of haemorrhage [37], and diagnosis/detection of diabetic retinopathy [38]. The automated diagnosis of chest diseases using radiography has gained a lot of interest. No single CNN model performs well on all abnormalities; deep-learning approaches and ensemble models may improve classification accuracy considerably compared with other versions. Statistical dependence was studied between the precision levels of the predictions and the Multi-label Disease Classification (MDC). In the literature, the detection of health conditions from chest X-ray images was performed based on different methods [39] [40] [41]. Processes for X-ray cardiovascular angiogram images are proposed in the literature [42], [43]. Recent research has implemented several automatic pneumonia detection systems based on chest X-rays [44], [45].
Deep learning is applied to train AI algorithms to detect pneumonia by studying chest X-ray images [46]. In terms of accuracy, Chhikara et al. achieved an accuracy of (90.1%) in [47] using 5,866 chest X-ray images compared to the latest models of classification. The CNN model proposed by Okeke Stephen et al. in [34] was constructed by extracting characteristics from chest X-ray images to test for the existence of pneumonia. The authors deployed multiple data augmentation algorithms to enhance both the validation and classification accuracy of their model and achieved an accuracy of (93.73%). An AI approach to diagnosing COVID-19 and other types of pneumonia was developed in [48]. For COVID-19, the proposed method achieved an AUC (area under the curve) of (0.981) and an accuracy of (92.49%). Butt et al. [49] proposed a CNN model called ResNet-18 to classify CT images as COVID-19, regular, and pneumonia. With an AUC value of (0.996), they achieved an accuracy of (86.7%). The authors in [50] proposed nCOVnet, based on deep learning, to detect COVID-19 by analyzing patients' X-ray images. Their nCOVnet system obtained an AUC of (0.881) and a COVID-19 accuracy of (88.10%). Nour et al. [51] suggested a CNN model trained from scratch on X-ray images. The features extracted by the model were fed to K-NN, SVM, and decision tree classifiers. The SVM classifier achieved an accuracy of (98.97%). Hu et al. [52] proposed a weakly-supervised CNN model, which achieved an accuracy of (96.2%) with an AUC value of (0.970). An ML method was proposed in [53] to classify chest X-ray images into COVID-19 or non-COVID-19. A Manta-Ray Foraging Optimization technique, using differential evolution, was developed for feature selection. The authors evaluated their method by testing two COVID-19 X-ray datasets. The recent machine learning research for CT and X-ray images is summarized in Table 1.
Therefore, building a classification method for various infections remains one of the most critical issues, since existing approaches are prohibitively expensive for mass adoption. Deep transfer learning has been introduced for pneumonia detection from chest X-rays in different literature models. However, there is still a need for more improvements in the feature extraction and classification stages.

This section introduces the chest X-ray datasets used in this paper and also discusses the essentials of CNN deep transfer learning, the multilayer perceptron neural network, and the original Squirrel Search (SS) optimization algorithm.

The chest X-ray images (Pneumonia) dataset from Kaggle [30] has been used. The dataset has 5,863 X-rays in JPEG format. It is classified into two cases, either normal or pneumonia. The Kaggle dataset has been selected in this paper because it is used in many research studies globally, which enables comparisons that can enrich scientific research. Figure 1(a) shows image samples of normal (pneumonia-free) cases, while Fig. 1(b) and Fig. 1(c) present pneumonia image samples (bacterial and viral cases), respectively. Another chest X-ray COVID-19 dataset [31] is also tested in the experiments, and image samples are shown in Fig. 2. Besides indirect collection from hospitals and physicians, the COVID-19 dataset is obtained from public sources. All data and images are released publicly in the GitHub repo. The tested dataset's project was accepted by the Ethics Committee of the University of Montreal #CERSES-20-058-D.

In traditional learning, the model is isolated and based mainly on specific tasks and particular datasets [54]. In this kind of learning, knowledge cannot be transferred from one model to another. In transfer learning, knowledge such as features and weights can be transferred from the pre-trained model to new training models and to different problems that may have less data.
Transfer learning is usually applied in various models for a dataset with less data than the dataset used to train the model. Multitask learning allows several tasks to be learned simultaneously, which can help the model handle multiple tasks at once. The learner initially may have no idea about the target task [47].

The CNN [55], a deep neural network, is known to be well suited to image processing applications and can achieve greater precision in disease classification than conventional approaches. It can thus be used in applications such as clustering, detecting objects, and classifying images. Several CNN models have recently been introduced, such as AlexNet [56], VGGNet [57], GoogLeNet [58], Spotmole [59], and ResNet [60]. The convolution models used in CNNs have different numbers of layers; higher classification accuracy is generally achieved as the number of convolution layers increases [61] [62] [63].

Residual Network (ResNet) is known as an efficient CNN model [60]. The ResNet paper was declared the best paper at the 2016 Computer Vision and Pattern Recognition Conference (CVPR 2016). The ResNet concept is based on the assumption that a deeper network can be trained efficiently if each layer learns only a residual correction to the previous layer rather than a transformation of the whole feature space. The main idea of ResNets is not to learn the mapping $x \rightarrow F(x)$ but instead to learn the mapping $x \rightarrow F(x) + G(x)$. When the input $x$ and $F(x)$ have the same dimension, $G(x) = x$ is the identity function and the connection is an identity connection. ResNet therefore allows much deeper neural networks to be trained without exploding or vanishing gradient issues.

Feed-forward neural networks are considered supervised machine learning methods consisting of neurons distributed over fully connected layers. The first (input) layer maps the network input variables, and the last layer is the output one. Layers between the first and last layers are called hidden layers [64], [65].
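A forward pass through a feed-forward network with one hidden layer can be sketched in a few lines. This is only an illustrative sketch: the function names, the list-of-lists weight layout, and the choice of a sigmoid hidden activation with a linear output node are assumptions made for the example, not details taken from the authors' implementation.

```python
import math


def sigmoid(s):
    """Logistic activation applied to a hidden neuron's weighted sum."""
    return 1.0 / (1.0 + math.exp(-s))


def mlp_forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer forward pass (hypothetical helper).

    w_hidden[j][i] is the weight from input i to hidden neuron j,
    w_out[k][j] is the weight from hidden neuron j to output node k.
    """
    # Hidden layer: weighted sum plus bias, then sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of hidden activations plus bias (assumed linear).
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]


# Example call with two inputs, two hidden neurons, and one output node.
example_output = mlp_forward(
    [0.0, 0.0],
    [[0.3, -0.2], [0.5, 0.1]],
    [0.0, 0.0],
    [[1.0, 1.0]],
    [0.25],
)
```

Keeping all weights and biases in plain lists makes it easy to flatten them into a single vector, which is the form a meta-heuristic optimizer would search over.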
Multilayer perceptron (MLP) is a common type of feed-forward network. The neurons in an MLP are interconnected in a one-directional fashion, and the connection weights lie within [−1, 1]. Figure 3 shows the MLP network, which includes one hidden layer between the input and output layers. To calculate the output value of a node, the weighted sum is first calculated as follows:

$$S_j = \sum_{i=1}^{n} w_{ij} I_i + \beta_j \tag{1}$$

where $I_i$ represents input variable $i$ and $w_{ij}$ indicates the connection weight between $I_i$ and neuron $j$ in the hidden layer. $\beta_j$ is the bias value for this layer. By applying the sigmoid activation function, which is the most commonly applied, the output of node $j$ is defined as

$$f_j(S_j) = \frac{1}{1 + e^{-S_j}} \tag{2}$$

Based on the value of $f_j(S_j)$ for all hidden layer neurons, the following equation defines the network output:

$$y_k = \sum_{j=1}^{m} w_{jk} f_j(S_j) + \beta_k \tag{3}$$

where $w_{jk}$ indicates the weight between neuron $j$ in the hidden layer and output node $k$, and $\beta_k$ is the bias value for the output layer.

The basic Squirrel Search (SS) optimization algorithm simulates the search process of flying squirrels [32]. The SS algorithm considers that the squirrels are moving between three kinds of trees named normal, oak, and hickory trees. The oak and hickory trees are the nut food sources, while normal trees have no food source. Mathematically, the SS algorithm assumes the squirrels are flying to search for three oak trees and one hickory tree as the nutritious food resources $N_{fs}$ available for $n$ flying squirrels (FS). The locations of the flying agents are given in matrix form as follows:

$$FS = \begin{bmatrix} FS_{1,1} & FS_{1,2} & \cdots & FS_{1,d} \\ FS_{2,1} & FS_{2,2} & \cdots & FS_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ FS_{n,1} & FS_{n,2} & \cdots & FS_{n,d} \end{bmatrix} \tag{4}$$

where $FS_{i,j}$ indicates the $i$th flying squirrel in the $j$th dimension for $i \in \{1, 2, 3, \ldots, n\}$ and $j \in \{1, 2, 3, \ldots, d\}$. The initial locations $FS_{i,j}$ are drawn from a uniform distribution within the lower and upper bounds. The fitness values $f = f_1, f_2, f_3, \ldots, f_n$ are calculated for each flying squirrel as in the following array:

$$f = \begin{bmatrix} f_1(FS_{1,1}, FS_{1,2}, FS_{1,3}, \ldots, FS_{1,d}) \\ f_2(FS_{2,1}, FS_{2,2}, FS_{2,3}, \ldots, FS_{2,d}) \\ \vdots \\ f_n(FS_{n,1}, FS_{n,2}, FS_{n,3}, \ldots, FS_{n,d}) \end{bmatrix} \tag{5}$$

where the fitness value indicates the quality of the food source searched by each flying squirrel. The optimal value means a hickory tree.
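The population initialization and fitness evaluation described above can be sketched as follows. The helper names and the sphere objective are hypothetical stand-ins chosen for illustration; in the paper's setting the fitness would instead come from the feature-selection or MLP error.

```python
import random


def init_population(n, d, lower, upper, rng):
    """Return n squirrel locations, each a d-dimensional list drawn
    uniformly from [lower, upper)."""
    return [
        [lower + rng.random() * (upper - lower) for _ in range(d)]
        for _ in range(n)
    ]


def evaluate(population, fitness):
    """Apply the fitness function to every squirrel location."""
    return [fitness(location) for location in population]


def sphere(location):
    """Hypothetical objective used only for this example (lower is better)."""
    return sum(v * v for v in location)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = init_population(6, 3, -5.0, 5.0, rng)
fitness_values = evaluate(population, sphere)
```

Each row of `population` corresponds to one row of the location matrix, and `fitness_values[i]` is the food-source quality of squirrel `i`.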
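These values are then sorted in ascending order. The first best solution is declared to be $FS_{ht}$ on the hickory nut tree, followed by the next three best solutions, which are considered to be $FS_{at}$ on the acorn nut trees. The remaining solutions are supposed to be $FS_{nt}$ on normal trees. The new location of each flying squirrel is generated mathematically as one of the three following cases:

Case 1: Location of $FS_{at}$ and moving to the hickory nut tree:

$$FS_{at}^{t+1} = \begin{cases} FS_{at}^{t} + d_g \times G_c \times (FS_{ht}^{t} - FS_{at}^{t}) & \text{if } R_1 \geq P_{dp} \\ \text{Random location} & \text{otherwise} \end{cases} \tag{6}$$

Case 2: Location of $FS_{nt}$ and moving to the acorn nut trees:

$$FS_{nt}^{t+1} = \begin{cases} FS_{nt}^{t} + d_g \times G_c \times (FS_{at}^{t} - FS_{nt}^{t}) & \text{if } R_2 \geq P_{dp} \\ \text{Random location} & \text{otherwise} \end{cases} \tag{7}$$

Case 3: Location of $FS_{nt}$ and moving to the hickory nut tree:

$$FS_{nt}^{t+1} = \begin{cases} FS_{nt}^{t} + d_g \times G_c \times (FS_{ht}^{t} - FS_{nt}^{t}) & \text{if } R_3 \geq P_{dp} \\ \text{Random location} & \text{otherwise} \end{cases} \tag{8}$$

where $R_1$, $R_2$, and $R_3$ are random numbers $\in [0, 1]$. The $d_g$ parameter is a random gliding distance and $t$ indicates the current iteration. $G_c$ is a constant equal to 1.9 that balances exploration and exploitation, and the probability $P_{dp}$ is equal to 0.1 for the three cases. The seasonal constant ($S_c$) and its minimum value are calculated from the following equations to check the monitoring condition ($S_c^t < S_{min}$):

$$S_c^t = \sqrt{\sum_{k=1}^{d} \left(FS_{at,k}^{t} - FS_{ht,k}\right)^2} \tag{9}$$

$$S_{min} = \frac{10E^{-6}}{(365)^{t/(t_m/2.5)}} \tag{10}$$

where $t$ is the current iteration and $t_m$ represents the maximum iteration value. The value of $S_{min}$ can affect the exploration and exploitation capabilities of the algorithm during iterations. If this condition holds, the relocation of such flying squirrels is modeled by Eq. 11:

$$FS_{nt}^{new} = FS_L + \text{Levy}(n) \times (FS_U - FS_L) \tag{11}$$

where $FS_L$ and $FS_U$ are the lower and upper bounds and the Levy distribution encourages better exploration of the search space. The Levy flight is calculated as follows:

$$\text{Levy}(x) = 0.01 \times \frac{r_a \times \sigma}{|r_b|^{1/\beta}} \tag{12}$$

where the parameters $r_a$ and $r_b$ are random in [0, 1]. $\beta$ is equal to 1.5 in the SS algorithm and $\sigma$ is calculated as

$$\sigma = \left( \frac{\Gamma(1+\beta) \times \sin\left(\frac{\pi \beta}{2}\right)}{\Gamma\left(\frac{1+\beta}{2}\right) \times \beta \times 2^{\left(\frac{\beta-1}{2}\right)}} \right)^{1/\beta} \tag{13}$$

where $\Gamma(x) = (x - 1)!$. The basic Squirrel Search (SS) optimization algorithm is explained step by step in Algorithm 1.

The proposed classification method consists of two stages. The first stage is a feature engineering process, including image augmentation, CNN training using the ResNet-50 model, transfer learning, and dropout. The proposed ASSOA algorithm is then applied to select features from those extracted by the ResNet-50 model.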
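Returning to the basic SS algorithm, the seasonal monitoring condition and the Levy-flight relocation described above can be sketched as follows. The formulas follow the commonly cited formulation of the squirrel search algorithm [32] and should be treated as an assumption about what the omitted equations contain; the function names and the guard against a zero random draw are illustrative additions.

```python
import math
import random

BETA = 1.5  # value stated for the SS algorithm


def levy_sigma(beta=BETA):
    """Scale factor sigma of the Levy flight (assumed standard SSA form)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy(rng, beta=BETA):
    """One Levy step built from two uniform random numbers r_a and r_b."""
    r_a = rng.random()
    r_b = rng.random() or 1e-12  # avoid division by zero (illustrative guard)
    return 0.01 * r_a * levy_sigma(beta) / abs(r_b) ** (1 / beta)


def seasonal_constant(fs_at, fs_ht):
    """Euclidean distance between an acorn-tree and the hickory-tree location."""
    return math.sqrt(sum((a - h) ** 2 for a, h in zip(fs_at, fs_ht)))


def seasonal_min(t, t_max):
    """Minimum seasonal constant, shrinking as iteration t grows."""
    return 10e-6 / 365 ** (t / (t_max / 2.5))


def relocate(lower, upper, d, rng):
    """Relocate a normal-tree squirrel with a Levy step in each dimension."""
    return [lower + levy(rng) * (upper - lower) for _ in range(d)]


rng = random.Random(1)
new_location = relocate(-5.0, 5.0, 4, rng)
```

In a full optimizer, `relocate` would be called for the normal-tree squirrels only when `seasonal_constant(...) < seasonal_min(t, t_max)`.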
The second stage involves the classification process, in which the MLP optimized by the proposed ASSOA algorithm (ASSOA + MLP) classifies the cases.

Algorithm 1 Basic SS Optimization Algorithm [32]
Initialize SS population $FS_i$ ($i = 1, 2, \ldots, n$) with size $n$ using Eq. (4), maximum iterations $t_m$, and fitness function $F_n$
Initialize SS parameters $R_1$, $R_2$, $R_3$, $n_1$, $n_2$, $n_3$, $P_{dp}$, $G_c$, $t = 1$
Calculate fitness function $F_n$ for each $FS_i$ using Eq. (5)
Sort flying squirrels' locations in ascending order
Find the first best individual $FS_{ht}$
Find the next three best individuals $FS_{at}$
Find the normal individuals $FS_{nt}$
while $t \leq t_m$ (stopping condition) do
    for ($t = 1 : n_1$) do
        if ($R_1 \geq P_{dp}$) then
            Update $FS_{at}^{t+1}$ as in Case 1
        else
            $FS_{at}^{t+1}$ = Random location
        end if
    end for
    for ($t = 1 : n_2$) do
        if ($R_2 \geq P_{dp}$) then
            Update $FS_{nt}^{t+1}$ as in Case 2
        else
            $FS_{nt}^{t+1}$ = Random location
        end if
    end for
    for ($t = 1 : n_3$) do
        if ($R_3 \geq P_{dp}$) then
            Update $FS_{nt}^{t+1}$ as in Case 3
        else
            $FS_{nt}^{t+1}$ = Random location
        end if
    end for
    Calculate seasonal constant ($S_c^t$) using Eq. (9)
    Calculate minimum value of seasonal constant ($S_{min}$) using Eq. (10)
    if ($S_c^t < S_{min}$) then
        Relocate flying squirrels using Eq. (11)
    end if
    Set $t = t + 1$
end while
Return optimal solution $FS_{ht}$

The ResNet-50 model is applied in this stage as a part of the proposed method to extract features from the chest X-ray images in the fully connected layer by altering the nodes and fine-tuning based on the input dataset. Each input image is resized to 224 × 224 pixels to be suitable for the model. Then, the Min-Max scaler is used to normalize the $i$th input image $x_i$ to a scale from 0 to 1 by applying the following equation:

$$x_i' = \frac{x_i - \min(x)}{\max(x) - \min(x)} \tag{14}$$

After the resizing and normalization, the output image $x_i'$ is used as input to the CNN model. The adopted CNN structure, including the number of filters and layers and the related specifications, is identical to the ResNet-50 model. This model focuses on classifying input case categories.
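The Min-Max normalization step mentioned above can be illustrated with a small sketch. It assumes that the minimum and maximum are taken over the pixel values of a single image, given here as a flat list; whether the original pipeline scales per image or over the whole dataset is not stated, so this choice is an assumption.

```python
def min_max_scale(pixels):
    """Scale a flat list of pixel intensities to the range [0, 1].

    Assumption for this sketch: min and max are computed per image.
    A constant image is mapped to all zeros to avoid dividing by zero.
    """
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]


# Example with 8-bit intensities.
scaled = min_max_scale([0, 128, 255])
```

For 8-bit images that already span the full 0 to 255 range, this reduces to dividing every pixel by 255.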
To reduce the overfitting problem during network learning, two regularization techniques, dropout and image augmentation, have been applied in this research. Dropout is applied during the training procedure of the CNN, and image augmentation [66] is applied to the input X-ray images. Data augmentation is applied to improve the quality and size of the training datasets.

The proposed ASSOA algorithm adds horizontal, vertical, diagonal, and exponential movements to the basic moves in the search process of flying squirrels, as shown in Fig. 4. The ASSOA algorithm considers, as in the basic SS algorithm, that the squirrels are moving between three kinds of trees named normal, oak, and hickory trees. The nut food sources are the oak and hickory trees, while there are no food sources on the other trees. Mathematically, the ASSOA algorithm assumes the squirrels are flying in the directions shown in Fig. 4 to search for one hickory tree (the best solution) and three oak trees (the next best solutions) as the nutritious food resources $N_{fs}$ available for $n$ flying squirrels (FS). The following matrices represent the locations and velocities of the flying squirrels:

$$FS = \begin{bmatrix} FS_{1,1} & FS_{1,2} & \cdots & FS_{1,d} \\ FS_{2,1} & FS_{2,2} & \cdots & FS_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ FS_{n,1} & FS_{n,2} & \cdots & FS_{n,d} \end{bmatrix} \tag{15}$$

$$V = \begin{bmatrix} V_{1,1} & V_{1,2} & \cdots & V_{1,d} \\ V_{2,1} & V_{2,2} & \cdots & V_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ V_{n,1} & V_{n,2} & \cdots & V_{n,d} \end{bmatrix} \tag{16}$$

where $FS_{i,j}$ indicates the location of the $i$th flying squirrel in the $j$th dimension and $V_{i,j}$ indicates the velocity of the $i$th flying squirrel in the $j$th dimension, for $i \in \{1, 2, 3, \ldots, n\}$ and $j \in \{1, 2, 3, \ldots, d\}$. The initial locations $FS_{i,j}$ are drawn from a uniform distribution within the lower and upper bounds. The fitness values $f = f_1, f_2, f_3, \ldots, f_n$ are calculated for each flying squirrel as in Eq. 5. The optimal value means a hickory tree. These values are then sorted in ascending order. The first best solution is declared to be $FS_{ht}$ on the hickory nut tree, followed by the next three best solutions, which are considered to be $FS_{at}$ on the acorn nut trees. The remaining solutions are supposed to be $FS_{nt}$ on normal trees.
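The ranking step described above, in which the sorted population is split into one hickory-tree squirrel, three acorn-tree squirrels, and the remaining normal-tree squirrels, can be sketched as follows. The function name and the toy data are hypothetical, and the sketch assumes a minimization problem, in line with the ascending sort.

```python
def partition_squirrels(population, fitness_values):
    """Sort locations by ascending fitness and split them into
    (hickory, acorn, normal) groups: best one, next three, and the rest."""
    order = sorted(range(len(population)), key=lambda i: fitness_values[i])
    ranked = [population[i] for i in order]
    return ranked[0], ranked[1:4], ranked[4:]


# Toy example: six one-dimensional squirrels with hypothetical fitness values.
toy_population = [[0.9], [0.1], [0.5], [0.3], [0.7], [0.2]]
toy_fitness = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
fs_ht, fs_at, fs_nt = partition_squirrels(toy_population, toy_fitness)
```

Sorting indices rather than the locations themselves keeps each location paired with its fitness value without copying the fitness list.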
In the ASSOA algorithm, the new location of each flying squirrel is updated as in the following cases. For a random value $p$, the following cases will be applied if $p \geq 0.5$:

Case 1: Location of $FS_{at}$ and moving to the hickory nut tree:

Case 2: Location of $FS_{nt}$ and moving to the acorn nut trees:

Case 3: Location of $FS_{nt}$ and moving to the hickory nut tree:

where $R_1$, $R_2$, and $R_3$ are random numbers $\in [0, 1]$. The $d_g$ parameter is a random gliding distance and $t$ indicates the current iteration. $G_c$ is a constant equal to 1.9 that balances exploration and exploitation, and the probability $P_{dp}$ is equal to 0.1 for the three cases. For the random value $p$, the following cases will be applied if $p < 0.5$, where $c_1$, $c_2$, $r$, $P_a$, and $a$ are random numbers $\in [0, 1]$. When a random agent $FS_{rand}^{t}$ is chosen from the normal agents $FS_{nt}^{t}$, the fitness values $F_n(FS_{rand}^{t})$ for $FS_{rand}^{t}$ and $F_n(FS_{nt}^{t})$ for $FS_{nt}^{t}$ are calculated to decide between the horizontal and vertical movement. If $F_n(FS_{rand}^{t}) < F_n(FS_{nt}^{t})$, the movement is vertical; otherwise, it is horizontal, as follows:

Case 5: Location of $FS_{nt}$ and moving vertically or horizontally based on the fitness value $F_n(FS_{rand}^{t})$:

where $c_3$ is a random number $\in [0, 1]$. The last case will be applied if the condition of the horizontal and vertical movement is not achieved.

Case 6: Location of $FS_{nt}$ and moving exponentially:

The seasonal constant ($S_c$) and the minimal value of the seasonal constant $S_{min}$ are calculated from Eq. 9 and Eq. 10 to check the monitoring condition ($S_c^t < S_{min}$), where $t$ is the current iteration and $t_m$ indicates the maximum number of iterations. The value of $S_{min}$ can affect the exploration and exploitation capabilities of the algorithm during iterations. If this condition holds, the relocation of such flying squirrels is modeled by Eq. 23, which affects local and global optima as shown in Fig. 5.

The proposed ASSOA algorithm is explained step by step in Algorithm 2. The computational complexity of the proposed algorithm is discussed with reference to Algorithm 2. Let the population size be $n = n_1 + n_2 + n_3$ and the maximum number of iterations be $t_m$. For the parts of the ASSOA algorithm, the time complexity is defined as in the following points:
• Initialization of the ASSOA population: O(1).
• Initialization of the ASSOA parameters $R_1$, $R_2$, $R_3$, $n_1$, $n_2$, $n_3$, $P_{dp}$, $G_c$, $c_1$, $c_2$, $c_3$, $r$, $b$, $P_a$, $P_d$, $a$, $d$, $p$: O(1).
• Calculating the fitness function for each agent: O(n).
• Sorting agents in ascending order: O(n).
• Finding the first best individual, the next three best individuals, and the normal individuals: O(n).

Algorithm 2 Proposed ASSOA Algorithm
Initialize ASSOA population $FS_i$ ($i = 1, 2, \ldots, n$) with size $n$ using Eq. (16), maximum iterations $t_m$, and fitness function $F_n$
Initialize ASSOA parameters $R_1$, $R_2$, $R_3$, $n_1$, $n_2$, $n_3$, $P_{dp}$, $G_c$, $c_1$, $c_2$, $c_3$, $r$, $b$, $P_a$, $P_d$, $a$, $d$, $p$, $t = 1$
Calculate fitness function $F_n$ for each $FS_i$ using Eq. (5) and sort flying squirrels' locations in ascending order
Find the first best individual $FS_{ht}$, the next three best individuals $FS_{at}$, and the normal individuals $FS_{nt}$
while $t \leq t_m$ (stopping condition) do
    if ($p \geq 0.5$) then
        for ($t = 1 : n_1$) do
            if ($R_1 \geq P_{dp}$) then
                Update $FS_{at}^{t+1}$ as in Case 1
            else
                $FS_{at}^{t+1}$ = Random location
            end if
        end for
        for ($t = 1 : n_2$) do
            if ($R_2 \geq P_{dp}$) then
                Update $FS_{nt}^{t+1}$ as in Case 2
            else
                $FS_{nt}^{t+1}$ = Random location
            end if
        end for
        for ($t = 1 : n_3$) do
            if ($R_3 \geq P_{dp}$) then
                Update $FS_{nt}^{t+1}$ as in Case 3
            else
                $FS_{nt}^{t+1}$ = Random location
            end if
        end for
    else
        if ($P_a < a$) then
            Update the location as in Case 4
        else
            Choose random agent $FS_{rand}^{t}$ from normal agents $FS_{nt}^{t}$
            if ($P_d < d$) then
                Calculate fitness function $F_n(FS_{rand}^{t})$ for $FS_{rand}^{t}$
                if ($F_n(FS_{rand}^{t}) < F_n(FS_{nt}^{t})$) then
                    Move vertically as in Case 5
                else
                    Move horizontally as in Case 5
                end if
            else
                Move exponentially as in Case 6
            end if
        end if
    end if
    Calculate seasonal constant ($S_c^t$) using Eq. (9)
    Calculate minimum value of seasonal constant ($S_{min}$) using Eq. (10)
    if ($S_c^t < S_{min}$) then
        Relocate flying squirrels using Eq. (23)
    end if
    Set $t = t + 1$
end while
Return optimal solution $FS_{ht}$

For feature selection, the continuous solution is converted to a binary one as follows:

$$FS_d^{(t+1)} = \begin{cases} 1 & \text{if } \text{Sigmoid}(x) \geq 0.5 \\ 0 & \text{otherwise} \end{cases} \tag{24}$$

where $FS_d^{(t+1)}$ is the binary position at iteration $t$ in dimension $d$. The Sigmoid function scales the continuous values so that they can be mapped to zero or one. The condition Sigmoid($x$) ≥ 0.5 is employed here to set each value to 0 or 1. The $x$ value indicates the best solution of the algorithm, which is denoted as $FS_{ht}$ in Algorithm 2.

The fitness function measures the quality of the optimizer's solutions. The function depends on the classification error rate and the number of selected features. A good solution corresponds to a set of features that gives both a small number of features and a low classification error rate. To evaluate the solution quality, Eq. 25 can be employed:

$$F_n = h_1 \, Err(O) + h_2 \frac{|s|}{|f|} \tag{25}$$

where $Err(O)$ indicates the optimizer error rate, $s$ denotes the set of features selected by the optimizer, and $f$ denotes the total set of features. The values $h_1 \in [0, 1]$ and $h_2 = 1 - h_1$ manage the relative importance of the classification error rate and the number of selected features.

There are three scenarios in the experiments. The first scenario shows the effectiveness of four CNN models for classifying the chest X-ray cases and demonstrates the importance of feature extraction for the next stage. The second scenario is designed to test and compare the proposed ASSOA algorithm with other optimization algorithms for feature selection. The third scenario is conducted to test the ability of the proposed ASSOA algorithm to improve the classification accuracy as a classifier based on MLP. Wilcoxon's rank-sum test is performed to verify the proposed algorithm's superiority statistically. For the chest X-ray datasets, the images are separated randomly into training images (60%), validation images (20%), and testing images (20%). The training data is used to train the CNN model, the validation data is used for verification purposes, and the testing data is used to evaluate the efficiency of the proposed method on unknown chest X-ray cases.
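The binary conversion of a solution and the feature-selection fitness described earlier in this section can be sketched as follows. The sketch assumes the standard logistic function for Sigmoid and the usual weighted form that combines the error rate with the fraction of selected features; the function names are hypothetical, and the default `h1 = 0.99` matches the setting reported for the experiments.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function (assumed form of Sigmoid)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def to_binary(position):
    """Map each continuous coordinate to 1 if Sigmoid(x) >= 0.5, else 0."""
    return [1 if sigmoid(x) >= 0.5 else 0 for x in position]


def selection_fitness(error_rate, mask, h1=0.99):
    """Weighted sum of the classification error rate and the fraction of
    selected features, with h2 = 1 - h1 (lower is better)."""
    h2 = 1.0 - h1
    return h1 * error_rate + h2 * sum(mask) / len(mask)


# Example: a four-dimensional solution and a hypothetical error rate of 0.2.
mask = to_binary([-1.0, 0.0, 2.0, -3.5])
score = selection_fitness(0.2, mask)
```

With `h1` close to 1, the error rate dominates the score and the feature count acts mainly as a tie-breaker between solutions of similar accuracy.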
The classification accuracy of the four CNN models, namely AlexNet [56], VGGNet [57], GoogLeNet [58], and ResNet-50 [60], is calculated in this scenario for the tested chest X-ray dataset. Let TP indicate the true-positive value, FP the false-positive value, TN the true-negative value, and FN the false-negative value. The performance metrics, such as accuracy, precision, and F-score [29], are calculated to measure the classification performance of the CNN models as shown in Table 2. The results of this scenario, including the required CPU time, are shown in Table 3. Table 4 presents the settings of the CNN experimental setup in this scenario. The default parameters are used in this case since the current stage is employed to extract features of the chest X-ray images from the CNN model for use in the next scenario. The highest accuracy achieved in this case for the X-ray images is (91.0%) by the ResNet-50 model, with an F-score of (89.2%) and a required time of (203) seconds. Given the promising performance of the ResNet-50 model, a set of features is extracted from the model's earlier layers, since the model accuracy should be improved for the critical cases. In the second scenario, these features are used by the proposed ASSOA algorithm to select the best classification features.

In this scenario, the efficiency of feature selection by the proposed ASSOA algorithm is investigated. The performance of the ASSOA algorithm is compared with the basic Squirrel Search (SS) optimization algorithm [32], Grey Wolf Optimizer (GWO) [29], and Genetic Algorithm (GA) [33] based on the performance metrics shown in Table 5. Let $M$ be the number of runs of an optimizer; $g_j^*$ represents the best solution at run number $j$; size($g_j^*$) is the size of the vector $g_j^*$.
$N$ is the number of tested points; $C_i$ is the classifier's output label for a point $i$; $L_i$ is the class label for a point $i$; $D$ is the total number of features; and the matching between two inputs is calculated by the Match function. The metrics used in this scenario are average error, average select size, average fitness, best and worst fitness, and standard deviation of fitness. The configuration of the ASSOA algorithm is shown in Table 6. The $h_1$ parameter in the objective function is set to 0.99 and the $h_2$ parameter to 0.01. The configuration of the SS, GWO, and GA algorithms, including the number of iterations, agents, and parameters, is shown in Table 7. Table 8 shows the output results of the ASSOA, SS, GWO, and GA algorithms in this scenario. For the displayed results, the error is minimized if the optimizer can select the proper set of features. ASSOA achieves a minimum average error of (0.2113) for feature selection. Based on the minimum error for the tested problem, the ASSOA algorithm is the best, and GA is the worst. This means that the proposed ASSOA algorithm achieved better results than the original SS algorithm. In terms of standard deviation, the ASSOA algorithm has the lowest value among the algorithms, which indicates its robustness and stability. Figure 6 shows the ASSOA convergence curve compared to the other algorithms. The figure demonstrates the exploitation capability of the optimizer and its ability to avoid local optima. The results in the figure show the reliability and robustness of the ASSOA algorithm in obtaining the optimal set of features in minimum time.

The p-values of the ASSOA algorithm compared to the SS, GWO, and GA algorithms are obtained by Wilcoxon's rank-sum test. The test determines whether there is a significant difference between the results of the ASSOA algorithm and those of the other algorithms. If the p-value < 0.05, this indicates that the ASSOA algorithm results are significantly different from those of the other algorithms.
A p-value above 0.05 indicates that there is no significant difference between the algorithms' results. The p-values for this scenario are shown in Table 9. The p-values are less than 0.05, which supports the superiority of the proposed ASSOA algorithm and shows that its results are statistically significant.

The last scenario checks the classification accuracy of the ASSOA algorithm based on MLP (ASSOA + MLP) in comparison with the SS + MLP, GWO + MLP, and GA + MLP algorithms. The classification performance is tested for chest X-ray cases and for other cases based on chest X-ray COVID-19. The configurations of the proposed ASSOA algorithm and of the compared algorithms are shown in Table 6 and Table 7, respectively. The classification performance metrics used in this scenario are presented in Table 2. The accuracy results of the ASSOA + MLP algorithm and the other algorithms are shown in Table 10. According to the descriptive statistics in Table 10, the proposed algorithm (ASSOA + MLP) can achieve a mean accuracy of (99.26%) and a standard deviation of (0.001098) within (135) seconds to classify a new input chest X-ray image, which outperforms the other algorithms. The ROC curves of the proposed ASSOA algorithm based on MLP versus the compared classification algorithms are shown in Figure 7. From this figure, the proposed algorithm can distinguish well among the chest X-ray images, with a high AUC value of (0.9875). The box plot and histogram of accuracy are also produced, and the output figures are shown in Figures 8 and 9. These figures show the stability and consistency of the proposed algorithm for the classification of different cases. Wilcoxon's rank-sum and ANOVA tests are performed in this scenario to obtain the p-values of the ASSOA + MLP algorithm compared to the SS + MLP, GWO + MLP, and GA + MLP classification algorithms.
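The ANOVA test mentioned above compares the variability between groups with the variability within groups. The following is a minimal sketch of the one-way ANOVA F statistic, assuming each algorithm contributes one list of per-run accuracies. The function name `one_way_anova_f` and the sample values are hypothetical, and this is not the authors' code. Converting F into a p-value requires the F distribution, which the Python standard library does not provide, so that step is left to a statistics package.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples.

    Assumes at least two groups and nonzero within-group variation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-run accuracies of three classifiers (illustrative only).
runs = [
    [0.991, 0.993, 0.992, 0.994],
    [0.975, 0.978, 0.974, 0.977],
    [0.958, 0.961, 0.957, 0.960],
]
f_stat, df_b, df_w = one_way_anova_f(runs)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value relative to the F distribution with these degrees of freedom corresponds to a small p-value.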
These tests indicate whether there is a significant difference between the results of the ASSOA + MLP algorithm and those of the compared algorithms. The output p-values are shown in Table 12 for Wilcoxon's rank-sum test and in Table 11 for the ANOVA test. Note that the p-values are less than 0.05, which supports the superiority of the ASSOA + MLP algorithm and shows that its results are statistically significant. Possible problems can be observed more easily from the residual values and residual plots than from a plot of the original dataset, since some datasets are not good candidates for classification. The ideal case is achieved if the residual values are equally and randomly spaced around the horizontal axis. The residual value is calculated as (Actual value - Predicted value), with the mean and sum of the residuals equal to zero. A residual plot presents the residual values on the vertical axis and the independent variable on the horizontal axis. Figure 10 shows the residual plot. Patterns in a residual plot indicate whether a linear or a nonlinear model is appropriate. The homogeneity of variance (homoscedasticity) is examined visually by plotting the prediction errors against the predicted dependent variable scores. The heteroscedasticity plot, shown in Figure 12, can quickly reveal any violation of this assumption, which helps improve the accuracy of the research findings. Homoscedasticity describes a situation in which the variance of the error term (the random disturbance, or noise, in the relationship between the dependent variable and the independent variables) is the same across the values of the independent variables. The quantile-quantile (QQ) plot, also known as a probability plot, is also shown in Figure 12. It compares two probability distributions by plotting their quantiles against each other. It is noted from the figure that the points in the QQ plot lie approximately on the line.
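The residual and QQ-plot computations described above can be sketched as follows. This is an illustrative sketch rather than the authors' procedure: it assumes that the QQ plot compares standardized residuals against standard-normal quantiles with plotting positions (i + 0.5)/n, and the function names `residuals` and `normal_qq_points` and the sample values are hypothetical.

```python
from statistics import NormalDist, fmean, pstdev


def residuals(actual, predicted):
    """Residual = actual value - predicted value, element by element."""
    return [a - p for a, p in zip(actual, predicted)]


def normal_qq_points(values):
    """Pair standard-normal quantiles with sorted standardized values.

    Assumes the values are not all identical (nonzero standard deviation).
    """
    n = len(values)
    mu, sigma = fmean(values), pstdev(values)
    sample_q = sorted((v - mu) / sigma for v in values)
    # Plotting positions (i + 0.5) / n are an assumed, common convention.
    theory_q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theory_q, sample_q))


# Hypothetical actual and predicted scores (illustrative values only).
actual = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
predicted = [0.9, 0.2, 0.8, 1.1, -0.1, 0.7]
res = residuals(actual, predicted)
print("mean residual:", round(fmean(res), 4))
for theory, sample in normal_qq_points(res):
    print(f"{theory:+.3f}  {sample:+.3f}")
```

Points that fall close to the diagonal in such a plot suggest that the residuals follow the reference distribution.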
Because the points of the QQ plot lie approximately on the line, the actual and the predicted residuals are linearly related, which supports the performance of the proposed ASSOA + MLP classifier in identifying the chest X-ray images.

A chest X-ray COVID-19 dataset [31] is used in the experiments to evaluate the performance of the proposed ASSOA + MLP algorithm for the classification of chest X-ray COVID-19 cases. The output descriptive statistics of this experiment are shown in Table 13. The proposed (ASSOA + MLP) algorithm achieved a mean accuracy of (99.7%) for the COVID-19 dataset. The mean accuracies of the compared SS + MLP, GWO + MLP, and GA + MLP algorithms are (99.1%), (97.1%), and (95.9%), respectively. These results show that the proposed algorithm can improve the classification accuracy for COVID-19 patients from their chest X-ray images. The box plot of accuracy is shown in Figure 11. This figure shows the stability and consistency of the proposed algorithm for the classification of COVID-19 cases. An ANOVA test is also performed for this experiment to obtain the p-values of the ASSOA + MLP algorithm compared to the SS + MLP, GWO + MLP, and GA + MLP classification algorithms. The output p-values are shown in Table 11 for the ANOVA test. Note that the p-values are less than 0.05, which supports the superiority of the ASSOA + MLP algorithm and shows that its results are statistically significant.

The experiments are divided into three scenarios to assess the performance of the proposed method in classifying chest X-ray images. The first scenario shows that, given the promising performance of the ResNet-50 model, the features can be extracted from its earlier layers. The extracted features are fed to the next scenario for feature selection. The second scenario shows the robustness and reliability of the ASSOA algorithm in finding the optimal subset of features in a reasonable amount of time.
In this scenario, Wilcoxon's rank-sum test supports the superiority of the proposed ASSOA algorithm and shows that its results are statistically significant. In the third scenario, the experiments show that the proposed algorithm (ASSOA + MLP) can achieve a mean accuracy of (99.26%) and an AUC value of (0.9875) within (135) seconds to classify a new input chest X-ray image, which outperforms the other algorithms. The ASSOA + MLP algorithm also achieved a classification mean accuracy of (99.7%) for a chest X-ray COVID-19 dataset. Wilcoxon's rank-sum and ANOVA tests support the proposed algorithm's superiority and show that its results are statistically significant. The results and statistical tests demonstrate the high effectiveness of the proposed method in determining the infected cases.

Developing a classification model for diagnosing infected cases is considered one of the most critical problems, and diagnosis is still too expensive for mass screening. This paper proposes a classification model to detect infected cases from chest X-ray images, which may substantially reduce diagnosis costs, particularly in developing countries. The training and feature extraction processes are based on a convolutional neural network (CNN) model (ResNet-50) with fine-tuning and image augmentation. The classification of the X-ray images into normal, viral, and bacterial cases is based on an MLP neural network together with the proposed ASSOA algorithm. In this work, the chest X-ray images (Pneumonia) dataset, composed of 5,863 X-ray images, is used in the experiments. In the proposed model, a transfer learning technique is applied during the training and feature extraction stages. Experimental results show the efficiency of the proposed classification model in classifying the infected cases, with a mean accuracy of (99.26%), which surpasses the state-of-the-art methods reported in the literature.
The proposed (ASSOA + MLP) algorithm also achieved a classification mean accuracy of (99.7%) for another chest X-ray COVID-19 dataset.

MARWA METWALLY EID received the Ph.D. degree in electronics and communications engineering from the Faculty of Engineering, Mansoura University, Egypt, in 2015. Since 2011, she has been working as an Assistant Professor with the Delta Higher Institute for Engineering and Technology. Her current research interests include image processing, encryption, wireless communication systems, and field programmable gate array (FPGA) applications.

Image segmentation methods based on superpixel techniques: A survey
Deformation and refined features based lesion detection on chest X-ray
Breast cancer segmentation from thermal images based on chaotic salp swarm algorithm
Breast cancer detection and classification using thermography: A review
CXNet-m1: Anomaly detection on chest X-rays with image-based deep learning
Hybrid gray wolf and particle swarm optimization for feature selection
Human thermal face extraction based on superpixel technique
Human thermal face recognition based on random linear oracle (RLO) ensembles
Unsupervised dimensionality reduction for hyperspectral imagery via local geometric structure feature learning
Feature learning using spatial-spectral hypergraph discriminant analysis for hyperspectral image
Dimensionality reduction of hyperspectral image using spatial regularized local graph discriminant embedding
Dimensionality reduction and classification of hyperspectral images using ensemble discriminative local metric learning
DeTrac: Transfer learning of class decomposed medical images in convolutional neural networks
Current applications and future impact of machine learning in radiology
Assessment of convolutional neural networks for automated classification of chest radiographs
Novel feature selection and voting classifier algorithms for COVID-19 classification in CT images
An advanced patient health monitoring system
Anemia estimation for COVID-19 patients using a machine learning model
Can artificial intelligence improve the management of pneumonia
A novel transfer learning based approach for pneumonia detection in chest X-ray images
Optimizing convolutional neural network hyperparameters by enhanced swarm intelligence metaheuristics
Towards improving the convolutional neural networks for deep learning using the distributed artificial bee colony method
Efficient pneumonia detection for chest radiography using ResNet-based SVM
Dynamic group-based cooperative optimization algorithm
Optimized superpixel and AdaBoost classifier for human thermal face recognition
An imbalanced big data mining framework for improving optimization algorithms performance
PAPSO: A power-aware VM placement technique based on particle swarm optimization
An imbalanced big data classification framework using whale optimization and deep neural network
MbGWO-SFS: Modified binary grey wolf optimizer based on stochastic fractal search for feature selection
Identifying medical diagnoses and treatable diseases by image-based deep learning
COVID-19 image data collection: Prospective predictions are the future
A novel nature-inspired algorithm for optimization: Squirrel search algorithm
A new local search based hybrid genetic algorithm for feature selection
An efficient deep learning approach to pneumonia classification in healthcare
An end-to-end deep learning approach for landmark detection and matching in medical images
Machine learning approach to detect cardiac arrhythmias in ECG signals: A survey
RADnet: Radiologist level accuracy using deep learning for hemorrhage detection in CT scans
Fundus photograph-based deep learning algorithms in detecting diabetic retinopathy
Clinical factors, C-reactive protein point of care test and chest X-ray in patients with pneumonia: A survey in primary care
Deep neural network ensemble for pneumonia localization from a large-scale chest X-ray database
Identifying pneumonia in chest X-rays: A deep learning approach
Iterative weighted sparse representation for X-ray cardiovascular angiogram image denoising over learned dictionary
Spatially adaptive denoising for X-ray cardiovascular angiogram images
Diagnostic value of bedside lung ultrasonography in pneumonia
Lung ultrasound vs. chest X-ray in children with suspected pneumonia confirmed by chest computed tomography: A retrospective cohort study
Pneumonia detection using CNN based feature extraction
Deep convolutional neural network with transfer learning for detecting pneumonia on chest X-rays
Clinically applicable AI system for accurate diagnosis, quantitative measurements, and prognosis of COVID-19 pneumonia using computed tomography
Deep learning system to screen coronavirus disease 2019 pneumonia
Application of deep learning for fast detection of COVID-19 in X-rays using nCOVnet
A novel medical diagnosis model for COVID-19 infection detection based on deep features and Bayesian optimization
Weakly supervised deep learning for COVID-19 infection detection and classification from CT images
New machine learning method for image-based diagnosis of COVID-19
Applications and datasets for superpixel techniques: A survey
Convolutional Neural Network
Advanced deep-learning techniques for salient and category-specific object detection: A survey
Very deep convolutional networks for large-scale image recognition
Transfer deep learning along with binary support vector machine for abnormal behavior detection
A color-based approach for melanoma skin cancer detection
Learning long-term temporal features with deep neural networks for human action recognition
Automatically designing convolutional neural network architecture with artificial flora algorithm
Application of metaheuristic algorithms for determining the structure of a convolutional neural network with a small dataset
CPSO-CNN: An efficient PSO-based algorithm for fine-tuning hyper-parameters of convolutional neural networks
Efficient network architecture search via multiobjective particle swarm optimization based on decomposition
Malignant melanoma detection using multi layer perceptron with optimized network parameter selection by PSO, in Contemporary Advances in Innovative and Applicable Information Technology
A survey on image data augmentation for deep learning

The authors would like to thank Dr. Mohamed Elsayed Gawish and Dr. Shaaban Omar for their support in interpreting the X-ray datasets.

and separate X-ray images of the affected patients from other non-infected

EL-SAYED M. EL-KENAWY (Member, IEEE) is currently an Assistant Professor with the Delta Higher Institute for Engineering and Technology (DHIET), Mansoura, Egypt. He inspires and motivates students by providing a thorough understanding of a variety of computer concepts. He has published over 25 publications with over 550 citations and an H-index of 17. He has pioneered and launched independent research programs. He is adept at explaining sometimes complex concepts in an easy-to-understand manner. His research interests include optimization, artificial intelligence, machine learning, deep learning, data science, and digital marketing. He serves as a Reviewer for the journal IEEE ACCESS.

SEYEDALI MIRJALILI (Senior Member, IEEE) is currently the Director of the Centre for Artificial Intelligence Research and Optimization, Torrens University Australia, Brisbane, QLD, Australia. He is internationally recognized for his advances in swarm intelligence and optimization, including the first set of algorithms from a synthetic intelligence standpoint (a radical departure from how natural systems are typically understood) and a systematic design framework to reliably benchmark, evaluate, and propose computationally cheap robust optimization algorithms. He has published over 200 publications with over 27,000 citations and an H-index of 58.
As the most cited researcher in robust optimization, he is on the list of the 1% highly cited researchers and has been named one of the most influential researchers in the world by the Web of Science. He is also working on the applications of multi-objective and robust meta-heuristic optimization techniques. His research interests include robust optimization, engineering optimization, multi-objective optimization, swarm intelligence, evolutionary algorithms, and artificial neural networks. He is an Associate Editor of several journals, including Neurocomputing, Applied Soft Computing, Advances in Engineering Software, Applied Intelligence, PLOS One, and IEEE ACCESS.