key: cord-0788104-3men70wu
authors: Ezzat, Dalia; Hassanien, Aboul Ella; Ella, Hassan Aboul
title: An optimized deep learning architecture for the diagnosis of COVID-19 disease based on gravitational search optimization
date: 2020-09-22
journal: Appl Soft Comput
DOI: 10.1016/j.asoc.2020.106742
sha: fe8b30ddc096442e683b6157fab3d5181c42f38a
doc_id: 788104
cord_uid: 3men70wu

In this paper, a novel approach called GSA-DenseNet121-COVID-19, based on a hybrid convolutional neural network (CNN) architecture, is proposed using an optimization algorithm. The CNN architecture used is DenseNet121, and the optimization algorithm used is the gravitational search algorithm (GSA). The GSA is used to determine the best values for the hyperparameters of the DenseNet121 architecture, helping this architecture achieve a high level of accuracy in diagnosing COVID-19 from chest X-ray images. The obtained results showed that the proposed approach could classify 98.38% of the test set correctly. To test the efficacy of the GSA in setting the optimum values for the hyperparameters of DenseNet121, the GSA was compared to another approach called SSD-DenseNet121, which combines DenseNet121 with the optimization algorithm called social ski driver (SSD). The comparison results demonstrated the efficacy of the proposed GSA-DenseNet121-COVID-19: it was able to diagnose COVID-19 better than SSD-DenseNet121, which correctly diagnosed only 94% of the test set. The proposed approach was also compared to another method based on a CNN architecture called Inception-v3 with manual search to set the hyperparameter values. The comparison results showed that GSA-DenseNet121-COVID-19 outperformed this method as well, which correctly classified only 95% of the test set samples. The proposed GSA-DenseNet121-COVID-19 was also compared with some related work. The comparison results showed that GSA-DenseNet121-COVID-19 is very competitive.
On 11 March 2020, the World Health Organization (WHO) announced that the novel coronavirus disease 2019 (COVID-19) had become a pandemic. COVID-19 is a respiratory disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). SARS-CoV-2 is a virus that belongs to the Coronaviridae family, the same family as both severe acute respiratory syndrome coronavirus (SARS-CoV-1) and Middle East respiratory syndrome coronavirus (MERS-CoV) [1].
SARS-CoV-1 and MERS-CoV were the causative agents of the 2002 severe acute respiratory syndrome (SARS) epidemic and the 2012 Middle East respiratory syndrome (MERS) epidemic, respectively [2, 3]. This announcement was the beginning of the current health crisis faced and shared by the whole world over the last few months. Up till now, no efficient protective vaccines, neutralizing antisera, or curative medications have been developed or officially approved for use in COVID-19 patients worldwide. The continuous increase in morbidity and mortality due to SARS-CoV-2 worsens the international health situation day after day. This emerging COVID-19 pandemic has therefore become an ongoing challenge for all medical health workers and researchers. Applying the natural timeline of infectious diseases [4] to COVID-19, as shown in Fig. 1, makes clear the importance of shortening the period between the onset of symptoms and the usual diagnosis. An efficient rapid diagnostic test or protocol would enable proper early medical care for COVID-19 patients and, by its role, help save many lives worldwide; finding such a rapid, efficient diagnostic test or protocol has thus become a top critical priority. Quantitative reverse transcriptase-polymerase chain reaction (qPCR) is the gold standard test for confirmed laboratory diagnosis of COVID-19. Other rapid, bedside, field, and point-of-care tests, such as immunochromatographic lateral flow, nucleic acid lateral flow, nucleic immunochromatographic lateral flow, and CRISPR Cas-12 lateral flow assays, are under development [5]. COVID-19 is a pneumonic disease characterized by general pneumonic lung affection that is unique among pneumonia-causing coronaviruses, although its radiological imaging features closely resemble and overlap those associated with SARS and MERS.
Bilateral lung involvement on initial imaging is more likely to be seen with COVID-19, whereas the involvement associated with SARS and MERS is predominantly unilateral. Radiological imaging techniques such as X-rays and computed tomography (CT) are therefore of great value as a confirmatory, expert-dependent but rapid diagnostic approach, either separately or in combination with qPCR, helping to avoid the false negative/positive COVID-19 results recorded and reported for qPCR alone in the disease's early stage [6]. Deep learning (DL) is the most common and accurate method for dealing with medical datasets that contain a relatively large number of training samples, for instance in the classification of brain abnormalities, the classification of different types of cancer, the classification of pathogenic bacteria, and biomedical image segmentation [7] [8] [9] [10] [11]. One of DL's most important characteristics is its ability to handle relatively large amounts of data efficiently [12]. It also eliminates the need to extract essential features manually [13]. For these reasons, many efforts have relied on DL methods, especially CNN architectures, to diagnose COVID-19 through chest radiological imaging, especially X-rays. For instance, in [14], the authors used several state-of-the-art CNN architectures by applying the transfer learning method to diagnose COVID-19. The results of their experiments indicated that the MobileNet architecture was the best, with an accuracy of 96.78%. In another study [15], the DarkCovidNet model was proposed to diagnose COVID-19 using a two-class dataset (COVID-19, No-Findings) and a three-class dataset (COVID-19, No-Findings, Pneumonia). The experimental results showed that the DarkCovidNet model diagnosed COVID-19 with higher accuracy on the two-class dataset, where the accuracy reached 98.08%.
In [16], to improve CNN architectures' performance in diagnosing COVID-19, a new model called CovidGAN was built to generate new samples from the dataset used. The experimental results demonstrated that the CovidGAN model helped the VGG16 network diagnose COVID-19 with 95% accuracy. In [17], the authors produced a model called CoroNet, which was able to diagnose COVID-19 with 89.6% accuracy in a four-category classification (COVID-19 vs. bacterial pneumonia vs. viral pneumonia vs. normal). In a three-category classification (COVID vs. pneumonia vs. normal), the CoroNet model achieved 94% accuracy, while in a two-category classification (COVID vs. normal), it achieved a higher accuracy. Despite the promising results these CNN architectures have achieved in detecting COVID-19, the large number of hyperparameters therein represents an obstacle to achieving better results. Very few studies, such as [18], have recognized these hyperparameters' importance in obtaining high efficiency with CNN architectures and the necessity of treating them as an optimization problem. In [18], the authors presented a method for detecting COVID-19, the deep Bayes-SqueezeNet method, which is based on Bayesian optimization to fine-tune the hyperparameters of the CNN architecture called SqueezeNet. The deep Bayes-SqueezeNet reached 98.3% accuracy in a three-category classification (Normal vs. COVID-19 vs. Pneumonia). Many studies, such as [19, 20], have been conducted to determine the extent of hyperparameters' influence on various DL architectures. These studies found that the hyperparameters that offer significant performance improvements in simple networks do not have the same effect in more complex networks.
Likewise, the hyperparameters that fit one dataset do not fit another dataset with different properties. The choice of values for these hyperparameters often depends on a combination of human experience, trial and error, or a grid search method [21]. Because CNN architectures are computationally expensive and can take several days to train, the trial and error method is ineffective [22]. The grid search method is usually not suitable for CNN architectures either, because the number of combinations grows exponentially with the number of hyperparameters [23]. Therefore, the automatic optimization of the hyperparameters of CNN architectures is essential [24, 25]. This paper introduces an approach called GSA-DenseNet121-COVID-19. This approach relies on a pre-trained CNN architecture called DenseNet121 to diagnose COVID-19 by applying the transfer learning method. GSA [26] was used to select optimal values for the hyperparameters of DenseNet121. The aim of proposing this approach is to facilitate and expedite the analysis of chest X-rays taken during various COVID-19 diagnostic protocols. Especially now, after the regular increase in COVID-19 patients every day, it is complicated and exhausting for all medical field personnel to maintain the same high-quality analysis of chest X-rays around the clock, 24/7. Therefore, automating specific steps of diagnostic protocols is a must to preserve the diagnostic quality of medical field practitioners. Accordingly, the contributions of this paper can be summarized in the following points:
-This paper provides a diagnostic approach to COVID-19 that can be used alone or in combination with qPCR to reduce the false negative and false positive rates of qPCR.
-The proposed approach is called GSA-DenseNet121-COVID-19, and it adopts a transfer learning method using a pre-trained CNN architecture called DenseNet121 that has been hybridized with an optimization algorithm called GSA.
-The GSA is used to improve the classification performance of DenseNet121 by optimizing its hyperparameters.
-The proposed approach is scalable; that is, it can expand the classification, as the number of training, testing, and validation samples can be increased without having to specify the values of the hyperparameters of DenseNet121 manually.
-The results of the proposed approach in diagnosing COVID-19 were very promising, achieving a 98.38% accuracy level on the test set.
-The proposed approach was compared with other proposed approaches; the results of the comparison showed that the proposed approach is superior to the others despite being trained on smaller and more varied samples.
The rest of the paper is structured as follows: Section 2 presents the theoretical background of CNNs and the GSA. The dataset used is discussed in Section 3, while Section 4 shows details of the proposed approach. The results achieved by the proposed approach are illustrated in Section 5. GSA is an optimization technique that has gained attention in recent years, developed by Rashedi et al. [26]. It is based on the law of gravity, as shown in Eq. (1), and the second law of motion, as shown in Eq. (3) [27]. It also depends on the general physical concept that there are three types of mass: inertial mass, active gravitational mass, and passive gravitational mass [28]. The law of gravity states that every particle attracts every other particle with a gravitational force (F). The gravitational force (F) between two particles is directly proportional to the product of their masses (M_1 and M_2) and inversely proportional to the square of their distance (R^2). The second law of motion states that when a force (F) is applied to a particle, its acceleration (a) is determined by the force and its mass (M). G is the gravitational constant, which decreases with increasing time; it is calculated as Eq. (2) [29].
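The three referenced equations appear to have been lost during text extraction; in the standard GSA formulation of Rashedi et al. [26, 29] they read as follows (a reconstruction, with the decay constant \alpha an assumption of that standard formulation):

```latex
% Eq. (1): law of gravity between two particles of masses M_1, M_2 at distance R
F = G \frac{M_1 M_2}{R^2}

% Eq. (2): gravitational "constant" decaying over time, with initial value G_0,
% decay constant \alpha, current time t, and total time t_{max}
G(t) = G_0 \, e^{-\alpha t / t_{max}}

% Eq. (3): second law of motion
a = \frac{F}{M}
```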
The GSA follows the above basic laws with a minor modification in Eq. (1): Rashedi et al. [26] stated that, based on experimental results, inverse proportionality to the distance (R) produces better results than R^2. The GSA can be expressed as an isolated system of N particles whose masses measure their performance. All particles attract each other by the force of gravity, and this force causes a universal movement of all particles towards the particles that have heavier masses. Consequently, masses collaborate using a direct form of communication through the force of gravity. Heavy masses represent good solutions and move more slowly than lighter ones, while light masses represent worse solutions and move towards the heavier masses faster. Each mass has four specifications: active gravitational mass, passive gravitational mass, inertial mass, and position. The mass's position corresponds to a solution of the problem, and the other specifications of the mass (active gravitational mass, passive gravitational mass, inertial mass) are determined using the fitness function. The algorithm of GSA can be summarized in eight steps as follows:
• Step one: Initialization. Assuming there is an isolated system with N particles (masses), the position of the ith particle is denoted as $X_i = (p_i^1, \ldots, p_i^d, \ldots, p_i^n)$ (Eq. (4)), where $p_i^d$ represents the position of the ith particle in the dth dimension.
• Step two: Fitness evaluation of particles. In this step, the worst and best fitness are calculated as Eqs. (5) and (6), respectively, for a minimization problem, and as Eqs. (7) and (8), respectively, for a maximization problem; for minimization, $best(t) = \min_j fitness_j(t)$ and $worst(t) = \max_j fitness_j(t)$, where $fitness_j(t)$ is the fitness of the jth particle at time t (the roles of min and max are reversed for maximization).
• Step three: Calculate the gravitational constant G(t). In this step, the gravitational constant G(t) at time t is calculated as $G(t) = G_0 e^{-\alpha t / t_{max}}$ [30], where $G_0$ represents the initial value of the gravitational constant, initialized randomly, t is the current time, and $t_{max}$ is the total time.
• Step four: Update the inertial and gravitational masses. In this step, the inertial and gravitational masses are updated by the fitness function. Assuming the equality of the inertial and gravitational masses, the mass values are calculated as $m_i(t) = (fitness_i(t) - worst(t)) / (best(t) - worst(t))$ and $M_i(t) = m_i(t) / \sum_{j=1}^{N} m_j(t)$, where $fitness_i(t)$ is the fitness of the ith particle at time t and $M_i(t)$ is the mass of the ith particle at time t.
• Step five: Compute the total force. In this step, the total force $F_i^d(t)$ exerted on particle i in dimension d at time t is calculated as $F_i^d(t) = \sum_{j \in kbest, j \neq i} rand_j F_{ij}^d(t)$, where $rand_j$ is a random number in [0, 1] and kbest is the set of the first kbest particles with the best fitness values and the biggest masses. $F_{ij}^d(t)$ is the force exerted by mass j on mass i at time t and is calculated as $F_{ij}^d(t) = G(t) \frac{M_{pi}(t) M_{aj}(t)}{R_{ij}(t) + \tau} (p_j^d(t) - p_i^d(t))$, where $M_{pi}$ represents the passive gravitational mass associated with particle i, $M_{aj}$ represents the active gravitational mass associated with particle j, $\tau$ is a small positive constant to prevent division by zero, and $R_{ij}(t)$ represents the Euclidean distance between particles i and j.
• Step six: Compute the velocity and acceleration. In this step, based on $F_i^d(t)$, the acceleration of particle i at time t in the dth dimension, $a_i^d(t) = F_i^d(t) / M_{ii}(t)$, and the next velocity of particle i in the dth dimension, $u_i^d(t+1) = rand_i \, u_i^d(t) + a_i^d(t)$, are calculated, where $M_{ii}(t)$ represents the inertial mass of the ith particle and $rand_i$ is a random number in [0, 1].
• Step seven: Update particles' positions. In this step, the next position of particle i in the dth dimension is calculated as $p_i^d(t+1) = p_i^d(t) + u_i^d(t+1)$.
• Step eight: Repeat steps two to seven until the stop criteria are reached. These eight steps are illustrated by the flowchart shown in Fig. 2.
In this section, the main structure of any CNN architecture will be illustrated. Additionally, the transfer learning method and how to apply this method using a pre-trained CNN architecture will be explained.
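As a concrete illustration of the eight GSA steps summarized above, the following is a minimal, self-contained sketch for a minimization problem. It is not the authors' implementation: the function name `gsa_minimize`, the default parameter values, and the linearly shrinking `kbest` set are assumptions for illustration only.

```python
import math
import random

def gsa_minimize(fitness, bounds, n_particles=20, t_max=100,
                 G0=100.0, alpha=20.0, seed=0):
    """Minimal GSA sketch (after Rashedi et al.) for a minimization problem.
    `bounds` is a list of (low, high) pairs, one per dimension."""
    rng = random.Random(seed)
    dim = len(bounds)
    eps = 1e-10  # small constant (tau) to prevent division by zero
    # Step 1: random initialization of positions; zero initial velocities
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos, best_fit = None, float("inf")
    for t in range(t_max):
        # Step 2: fitness evaluation; best/worst for minimization
        fit = [fitness(p) for p in pos]
        best, worst = min(fit), max(fit)
        for f, p in zip(fit, pos):
            if f < best_fit:
                best_fit, best_pos = f, list(p)
        # Step 3: gravitational constant decays over time
        G = G0 * math.exp(-alpha * t / t_max)
        # Step 4: masses from normalized fitness
        if best == worst:
            M = [1.0 / n_particles] * n_particles
        else:
            m = [(f - worst) / (best - worst) for f in fit]
            s = sum(m)
            M = [mi / s for mi in m]
        # kbest shrinks linearly from N towards 1 (a common convention)
        k = max(1, round(n_particles * (1 - t / t_max)))
        kbest = sorted(range(n_particles), key=lambda i: fit[i])[:k]
        for i in range(n_particles):
            # Step 5: total force on particle i from the kbest heaviest particles
            force = [0.0] * dim
            for j in kbest:
                if j == i:
                    continue
                R = math.dist(pos[i], pos[j])
                for d in range(dim):
                    force[d] += (rng.random() * G * M[i] * M[j]
                                 * (pos[j][d] - pos[i][d]) / (R + eps))
            for d in range(dim):
                # Step 6: acceleration and velocity update
                a = force[d] / (M[i] + eps)
                vel[i][d] = rng.random() * vel[i][d] + a
                # Step 7: position update, clipped to the search bounds
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
    # Step 8: the loop above repeats steps 2-7 until t_max is reached
    return best_pos, best_fit
```

For example, `gsa_minimize(lambda p: sum(x * x for x in p), [(-5.0, 5.0)] * 3)` drives the sphere function towards its minimum at the origin.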
The convolutional base includes three main types of layers: convolutional layers [31], activation layers [32], and pooling layers [33]. These layers are used to discover the basic features of input images, which are called feature maps. A feature map is obtained by applying convolution operations to the input image or to prior feature maps using a linear filter, adding a bias term, and then passing the result through a non-linear activation function such as the sigmoid [34] or the rectified linear unit (RELU) [35]. In contrast, the classifier base includes the dense layers combined with the activation layers, which convert the feature maps into one-dimensional vectors to expedite the classification task using many neurons. Usually, one or more dropout layers [36] are added to the classifier base to minimize the overfitting that CNN architectures may encounter and to improve their generalization. Adding any dropout layer to the classifier base introduces a new hyperparameter called the dropout rate. This hyperparameter determines the probability at which outputs of the layer are removed, or reciprocally, the probability at which outputs of the layer are kept. Typically, the dropout rate is set in the range from 0.1 to 0.9 [37]. Transfer learning: One of the most famous and influential techniques for dealing with small datasets is using a pre-trained network. A pre-trained network is a network that was trained on a vast dataset, usually for an image classification task, after which its architecture and weights were preserved. If this initial dataset is big enough and general enough, the set of features that the pre-trained network has learned can serve as a general visual model. These features can therefore help with several different computer vision tasks, even if the new tasks involve completely different categories from the initial task [38, 39].
For example, networks that have been trained on the ImageNet database, such as DenseNet121 [40], can be repurposed for something as remote as extracting medical image features. Transfer learning from a pre-trained network can be applied in two ways, namely feature extraction and fine-tuning. Feature extraction involves taking the convolutional base of a pre-trained network to extract features from the new dataset and then training a new classifier on top of these outputs. Fine-tuning is complementary to the feature extraction method; it involves unfreezing the last layers of the frozen convolutional base used for feature extraction. The unfrozen layers are then retrained in combination with the new classifier previously learned in the feature extraction step. The fine-tuning method aims to adjust the pre-trained model's most abstract features to make them more relevant to the new task. The steps for applying these two methods are detailed in [41]. The binary COVID-19 dataset used in this paper is a combination of two datasets. The first dataset is the COVID19 Chest X-ray dataset made available by Dr. Joseph Paul Cohen of the University of Montreal [42]. The COVID19 Chest X-ray dataset consists of 150 chest X-ray and CT images as of the time of writing this paper. Of these, 121 images represent cases infected with COVID-19, 11 images represent cases infected with SARS, and four images represent cases infected with acute respiratory distress syndrome (ARDS). This dataset also contains five pneumocystis cases and six streptococcus cases, as shown in Table 1. (Table 1: Number of cases of COVID-19, SARS, ARDS, pneumocystis, and streptococcus; the number of cases diagnosed with X-rays and CT scans for each cause of pneumonia; and the total number of cases in the COVID19 Chest X-ray dataset. Fig. 3: Some images of the COVID19 Chest X-ray dataset; next to each image is its metadata.)
The COVID19 Chest X-ray dataset contains extensive metadata for each image, the most important of which are: offset, sex, age, finding, survival, modality, date, location, and clinical observations about the radiograph in particular, not just the patient. The offset is the number of days since the onset of symptoms or hospitalization; the offset values ranged from 0 to 32. The ages of the patients enrolled in this dataset ranged from 12 to 87. The finding field gives the cause of pneumonia, and the survival field clarifies whether the patient is still alive. The modality defines how the diagnosis was made, either X-ray or CT scan. Fig. 3 shows some samples of the COVID19 Chest X-ray dataset and the metadata for each sample. The second dataset is the Kaggle Chest X-ray dataset made available for a Data Science competition [43]. This dataset consists of 5811 X-ray images: 1538 images represent normal cases, and 4273 images represent pneumonia cases. The binary COVID-19 dataset was built to distinguish COVID-19 cases from those suffering from other diseases and from healthy cases using only X-ray images; the cases that were diagnosed using CT scans were excluded. The used dataset consists of two categories, positive and negative, as shown in Table 2. The positive category contains 99 X-ray images representing cases infected with COVID-19, taken from the COVID19 Chest X-ray dataset. The negative category contains 207 X-ray images: 104 images represent healthy cases and 80 images represent pneumonia cases, taken from the Kaggle Chest X-ray dataset, while the other 23 images represent cases affected by SARS, ARDS, pneumocystis, or streptococcus, taken from the COVID19 Chest X-ray dataset. Some images of each category of the binary COVID-19 dataset are shown in Fig. 4. The proposed approach GSA-DenseNet121-COVID-19 relies on transfer learning from a pre-trained CNN architecture.
The pre-trained architecture utilized in the proposed approach is DenseNet121. For this architecture's best performance, its hyperparameters have been optimized using the GSA. After determining the optimal values for these hyperparameters, DenseNet121 was trained using transfer learning techniques. Once this training was completed, the architecture was evaluated using a separate test set. In other words, the training and validation sets were used to determine the optimal values for the hyperparameters of DenseNet121 and to train it, whereas the fully trained DenseNet121 was then evaluated using the test set. The proposed approach consists of four main stages, as shown in Fig. 5. The first stage is data preparation, the second stage is hyperparameter selection, the third stage is learning, and the fourth stage is performance measurement. Each stage will be explained in detail in the following sections. As explained in the data description section, the positive category of the binary COVID-19 dataset contains 99 samples, while the negative category includes 207 samples, which means that this dataset is not balanced. Most ML algorithms cannot handle this type of dataset well, because most of the information available in such a dataset belongs to the dominant category, leading the algorithm to learn to categorize the dominant class while failing on the minor category. Therefore, the samples in the positive category were increased by randomly copying some images after cropping each image, so that the random copying does not cause the used CNN architecture to overfit the dataset. After that, the dataset became balanced, with each category containing 207 images. The balanced binary COVID-19 dataset was divided into three sets: a training set, a validation set, and a testing set. The training set contains 70% of the dataset; that is, it has about 146 images in each category.
Each of the validation and test sets contains 15% of the dataset samples; that is, each set includes 31 images in each category. Various data augmentation techniques [44] have been applied to increase the number of training samples, to reduce overfitting, and to improve generalization. The data augmentation techniques used are brightness, rotation, width shift, height shift, shearing, zooming, vertical flip, and horizontal flip, as well as featurewise centering, featurewise standard deviation normalization, and fill mode. Before the images were supplied to the other stages, they were resized to 180 x 180 pixels. As previously reviewed in Section 2.2, the transfer learning method takes the same structure as the pre-trained network after making minor changes. The most crucial change is to replace the classifier with a new one, which requires changing the values of some hyperparameters or adding new ones. Examples of the hyperparameters that require modification are the batch size, the learning rate, and the number of neurons in the dense layer. A hyperparameter that may be added is the rate of the dropout layer. In the proposed GSA-DenseNet121-COVID-19, three hyperparameters have been optimized, namely the batch size, the rate of the newly added dropout layer, and the number of neurons in the first dense layer. Therefore, the search space is three-dimensional, and each point in the space represents a combination of these three hyperparameters. The feature extraction and fine-tuning techniques are utilized to prepare the DenseNet121 architecture to learn from the binary COVID-19 dataset. In the feature extraction, the convolutional base is kept unchanged, whereas the original classifier base is replaced by a new one that fits the binary COVID-19 dataset. The new classifier consists of four stacked layers: a flatten layer and two dense layers separated by a new dropout layer.
GSA determines the number of neurons in the first dense layer, which uses RELU as an activation function, and the dropout layer rate. The second dense layer has one neuron with a sigmoid function. After training the new classifier for some epochs, the fine-tuning is configured by retraining the last two blocks of the convolutional base of DenseNet121 together with the newly added classifier. At this phase, the proposed approach is evaluated. Six measures are utilized to evaluate the proposed approach, namely accuracy, error rate, precision, recall, F1 score, and the confusion matrix. Accuracy is among the most commonly used measures of the performance of classification models. It is defined as the ratio of correctly classified samples to the overall number of samples, as shown in Eq. (19). The error rate is the complement of the accuracy; it represents the proportion of samples misclassified by the model and is calculated as Eq. (20) [45], where P is the number of positive samples and N is the number of negative samples. Precision, as shown in Eq. (21), is the number of true positives divided by the sum of true positives and false positives. In other words, it is the number of correct positive predictions divided by the total number of positive predictions. Precision can be considered a measure of a classifier's exactness; a low precision indicates a large number of false positives [46]. Recall, also termed sensitivity, is the number of true positives divided by the sum of true positives and false negatives, as shown in Eq. (22). In other words, it is the number of correct positive predictions divided by the number of positive class values in the test set. Recall can be considered a measure of a classifier's completeness; a low recall indicates many false negatives [46]. The F1 score, also termed the F score, is a function of precision and recall and is calculated as Eq. (23).
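The five referenced equations were lost during text extraction; in their standard form (a reconstruction consistent with the definitions above, with TP, TN, FP, FN denoting the true/false positives and negatives), they read:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{P + N} \quad (19) \\
\mathrm{Error\ rate} = \frac{FP + FN}{P + N} = 1 - \mathrm{Accuracy} \quad (20) \\
\mathrm{Precision} = \frac{TP}{TP + FP} \quad (21) \\
\mathrm{Recall} = \frac{TP}{TP + FN} \quad (22) \\
F1 = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \quad (23)
```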
It is used to seek a balance between precision and recall [46]. The confusion matrix is a summary of the prediction results for a classification problem. It gives insight not only into the mistakes committed by the classifier but, more importantly, into the types of errors that are made [47]. This section presents and analyzes the results obtained through the proposed approach described in detail in Section 4. All the proposed approach's procedures were executed on Google Colaboratory [48] and implemented in Python with Keras [49]. Keras is a high-level neural network API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. It was developed for rapid experimentation, making it possible to run several experiments and obtain results with as little delay as possible, which supports thorough research. The results are divided into five sections for clarity. The Keras ImageDataGenerator was utilized to implement the augmentation techniques that increase the number of images in the binary COVID-19 dataset's training set. The data augmentation techniques used and the range used for each technique are listed in Table 3. Fig. 6 illustrates some of the images obtained by applying the augmentation techniques to one image from each category. The search space for the hyperparameters whose values are to be set by the GSA was bounded as follows: the search range of the batch size was bounded by [1, 64], the search space of the dropout rate was bounded by [0.1, 0.9], and the search space of the number of neurons was bounded by [5, 500], as listed in Table 4. The GSA parameters' values were specified randomly, with the maximum number of iterations and the population size set to 15 and 30, respectively, as listed in Table 4. The number of DenseNet121 training epochs was chosen by experimenting with more than one value.
The experiments showed that when using more than ten epochs, the training process for each GSA evaluation takes an excessively long time, while with fewer than ten epochs, the results of DenseNet121 were not sufficiently accurate. Therefore, the number of epochs used to train DenseNet121 was set to ten. The goal of using the GSA is to reduce the loss rate on the validation set as much as possible: the suitability of the solutions proposed by the GSA is evaluated based on the validation loss achieved with those solutions after ten epochs of network training. After completing the approximately 11-hour GSA search, the optimum values for the batch size, the dropout rate, and the number of neurons in the first dense layer were determined. Table 5 shows the optimal values for the hyperparameters selected by GSA, where the batch size, the dropout rate, and the number of neurons are 8, 0.1, and 110, respectively. At this stage, DenseNet121 was trained using the optimal hyperparameter values chosen by GSA. The DenseNet121 architecture was trained on the training set and evaluated on the validation set for K epochs. To determine the value of K, several experiments were conducted, and it was found that DenseNet121 achieved its best results on the validation set around the 30th epoch within the feature extraction method and around the 40th epoch within the fine-tuning method, with no improvement observed after that. Thus, the value of K was set to 30 for feature extraction and 40 for fine-tuning. To minimize overfitting, the training process was forced to finish before epoch K if no improvement was observed for seven epochs; this control was implemented using early stopping [50]. As the COVID-19 dataset used poses a binary classification problem, DenseNet121 was compiled with the binary cross-entropy loss [51].
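The early-stopping control just described can be sketched as follows. This is an illustrative stand-in for the equivalent Keras callback the paper relies on [50]; the class name and interface here are assumptions.

```python
class EarlyStopping:
    """Halt training when the validation loss has not improved for
    `patience` consecutive epochs (illustrative sketch only)."""

    def __init__(self, patience=7):
        self.patience = patience
        self.best = float("inf")  # best validation loss seen so far
        self.wait = 0             # epochs since the last improvement

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss  # improvement: reset the counter
            self.wait = 0
        else:
            self.wait += 1        # no improvement this epoch
        return self.wait >= self.patience
```

With `patience=7`, as in the paper, training halts once seven consecutive epochs pass without the validation loss improving.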
The Adam optimizer [52] was used with a constant learning rate of 2e-5 in the feature-extraction stage. In the fine-tuning stage, a step decay schedule [53] was used, with an initial learning rate LR0 = 1e-5 that drops by a factor of 0.5 every 10 training epochs. A low learning rate is used in fine-tuning because the changes made in this stage should be very small, so that the features learned in the feature-extraction stage are not lost. This section presents the performance of the DenseNet121 architecture with the hyperparameter values specified by the GSA. The performance of the proposed GSA-DenseNet121-COVID-19 was evaluated using accuracy, loss rate, precision, recall, and F1 score. The proposed approach achieved 98.38% accuracy on the test set, and the average precision, recall, and F1 score were 98.5%, 98.5%, and 98%, respectively. The macro average and the weighted average of precision, recall, and F1 score were equal, both at 98%, as listed in Table 6. The confusion matrix, shown in Fig. 7, reports both the samples the proposed GSA-DenseNet121-COVID-19 classified correctly and those it misclassified. The dark shaded cells of the confusion matrix represent the correctly classified samples in each category, whereas the light shaded cells represent the misclassified samples. As the confusion matrix in Fig. 7 shows, the proposed approach misclassified only one sample of the test set and classified all the others correctly. Fig.
8 shows the results of the proposed approach for four images. The images surrounded by green rectangles are those the proposed approach classified correctly, while the image surrounded by a red rectangle is the only sample it misclassified. The image in the red rectangle belongs to the positive category, but the proposed approach assigned it to the negative category with a certainty of 83.8% as a first decision and to the positive category with a certainty of 16.2% as a second decision. To determine whether the proposed GSA-DenseNet121-COVID-19 reads X-rays as radiologists do, or instead learns unhelpful features to make its predictions, gradient-weighted class activation mapping (Grad-CAM) [54] was used. As shown in Fig. 9, the Grad-CAM visualizations of a positive sample and a negative sample confirm the effectiveness of the proposed approach in locating the important features relevant to each category. To verify the effectiveness of the GSA in determining hyperparameter values for the DenseNet121 architecture that achieve the highest accuracy, it was compared with the SSD algorithm [55], which has proven effective in tuning the hyperparameters of a CNN architecture used to classify nanoscience scanning electron microscope images [25]. For a fair comparison between the GSA and the SSD algorithm, the SSD parameters were set to the same values used for the GSA parameters, as shown in Table 4.
After the SSD algorithm completed its search, the batch size was set to 6, while the number of neurons and the dropout rate were set to 220 and 0.71, respectively. The comparative results showed that the GSA is better suited for pairing with DenseNet121 to classify the binary COVID-19 dataset: the GSA chose better hyperparameter values, which in turn allowed the architecture to reach higher accuracy. The SSD-DenseNet121 approach achieved an accuracy of 94% on the test set, and its macro averages of precision, recall, and F1 score were likewise all 94%, as listed in Table 7. To assess the performance of the proposed GSA-DenseNet121-COVID-19 as a whole, it was compared with the Inception-v3 architecture tuned by manual search, meaning that the Inception-v3 hyperparameter values were chosen by hand: the batch size, dropout rate, and number of neurons were set manually to 16, 0.5, and 250, respectively. The results of this comparison showed that the proposed GSA-DenseNet121-COVID-19 was superior to the manually tuned Inception-v3 architecture, whose accuracy was 95% and whose macro average precision, recall, and F1 score were 95%, 96%, and 95%, respectively, as shown in Table 7. The performance of the proposed GSA-DenseNet121-COVID-19 was also compared with other published approaches introduced for the same purpose of diagnosing COVID-19 from X-ray images. The approaches presented in [14-17,19] were selected for comparison, as they rely on CNN architectures and were trained on a variety of data samples.
The proposed approach was compared with the other approaches in terms of the number and variety of samples used, accuracy, precision, recall, and F1 score, as shown in Table 8. In [14], many CNN architectures were evaluated on two datasets; the best performance was achieved by the MobileNet architecture on the second dataset, which contains 224 COVID-19 samples, 504 healthy samples, and 714 pneumonia samples. MobileNet obtained an accuracy, precision, and recall of 96.78%, 96.46%, and 98.66%, respectively, when this dataset was treated as a binary classification problem, as shown in the second row of Table 8. In [15], the proposed DarkCovidNet was more accurate at classifying two categories than three, as shown in rows 3 and 4 of Table 8. In [16], after the CNN-SA approach was trained on healthy and COVID-19 cases only, it achieved 96% for each of accuracy, precision, recall, and F1 score. The CoroNet proposed in [17] performed better on the binary classification covering only healthy and COVID-19 cases than on the multi-category classification covering various types of samples, as shown in rows 5, 6, and 7 of Table 8. The Deep Bayes-SqueezeNet approach [19], having been trained on a relatively large number of samples, achieved 97% for each of accuracy, precision, recall, and F1 score, as shown in the eighth row of Table 8. Table 8 shows that although the proposed GSA-DenseNet121-COVID-19 was trained on a smaller and more diverse set of samples than its counterparts, it managed to outperform both DarkCovidNet and CNN-SA. Likewise, GSA-DenseNet121-COVID-19 outperformed MobileNet in accuracy and precision but was slightly lower in recall.
The proposed GSA-DenseNet121-COVID-19 is very competitive with Deep Bayes-SqueezeNet: its precision and recall are better, its accuracy is approximately equal, and its F1 score is slightly lower. GSA-DenseNet121-COVID-19 was also superior to CoroNet when the latter was trained on a variety of samples, as GSA-DenseNet121-COVID-19 was. This paper proposed an approach called GSA-DenseNet121-COVID-19 that can be used to diagnose COVID-19 cases from chest X-ray images. The proposed GSA-DenseNet121-COVID-19 consists of four main stages: (1) data preparation, (2) hyperparameter selection, (3) learning, and (4) performance measurement. In the first stage, the class imbalance of the binary COVID-19 dataset was handled, and the dataset was divided into three sets, namely a training set, a validation set, and a test set. After the number of training samples was increased in the first stage using different data augmentation techniques, the training and validation sets were used in the second stage, in which the GSA optimizes some of the hyperparameters of the CNN architecture used, DenseNet121. In the third stage, DenseNet121 was fully trained using the hyperparameter values identified in the previous stage, which in turn allowed the architecture to diagnose 98.38% of the test set correctly in the fourth stage. The proposed approach was compared with several other approaches, and the comparison results demonstrated its effectiveness in diagnosing COVID-19. In future work, the number of samples used to train the proposed approach can be increased to improve its performance in diagnosing COVID-19. In addition, the number of other pneumonia-causing diseases covered may be increased so that the proposed approach can be used to distinguish them from COVID-19.
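For reference, the macro-averaged metrics used throughout the comparisons above are unweighted means of the per-class scores. A minimal sketch follows, with made-up labels rather than the paper's data:

```python
import numpy as np

def macro_precision_recall_f1(y_true, y_pred):
    """Unweighted (macro) averages of per-class precision, recall, and F1."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))  # true positives for class c
        fp = np.sum((y_pred == c) & (y_true != c))  # false positives
        fn = np.sum((y_pred != c) & (y_true == c))  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    # Macro averaging weights both classes equally, regardless of class size.
    return np.mean(precisions), np.mean(recalls), np.mean(f1s)

# Hypothetical binary labels (0 = negative, 1 = COVID-19 positive).
p, r, f = macro_precision_recall_f1(np.array([0, 0, 1, 1]),
                                    np.array([0, 1, 1, 1]))
```

Because macro averaging gives each class equal weight, it is a sensible summary for the imbalanced COVID-19 dataset, where a size-weighted average would be dominated by the majority class.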
References
The emergence of SARS, MERS and novel SARS-2 coronaviruses in the 21st century
Aetiology: Koch's postulates fulfilled for SARS virus
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
Department of Health and Human Services
CRISPR-Cas12-based detection of SARS-CoV-2
Real-time RT-PCR in COVID-19 detection: issues affecting the results
Application of deep transfer learning for automated brain abnormality classification using MR images
An enhanced deep learning approach for brain cancer MRI images classification using residual networks
Artificial intelligence technique for gene expression by tumor RNA-seq data: A novel optimized deep learning approach
Rapid identification of pathogenic bacteria using Raman spectroscopy and deep learning
Deep learning approaches to biomedical image segmentation
Deep learning
Deep learning in radiology: Does one size fit all?
Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks
Automated detection of COVID-19 cases using deep neural networks with X-ray images
CovidGAN: Data augmentation using auxiliary classifier GAN for improved Covid-19 detection
CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest x-ray images
COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images
Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data
A framework for designing the architectures of deep convolutional neural networks
Random search for hyper-parameter optimization
Optimizing deep learning hyper-parameters through an evolutionary algorithm
Hyperparameter optimization of deep neural network using univariate dynamic encoding algorithm for searches, Knowl.-Based Syst.
An optimized model based on convolutional neural networks and orthogonal learning particle swarm optimization algorithm for plant diseases diagnosis
An optimized deep convolutional neural network to identify nanoscience scanning electron microscope images using social ski driver algorithm
GSA: A gravitational search algorithm
Fundamentals of Physics
Gravity from the Ground Up: An Introductory Guide to Gravity and General Relativity
Effective time variation of G in a model universe with variable space dimension
A new approach for unit commitment problem via binary gravitational search algorithm
Deep convolutional neural networks for image classification: A comprehensive review
Deep neural networks with a set of node-wise varying activation functions
Evaluation of pooling operations in convolutional architectures for drug-drug interaction extraction
The influence of the activation function in a convolution neural network model of facial expression recognition
Proceedings of the 14th International Conference on Artificial Intelligence and Statistics
Dropout: A simple way to prevent neural networks from overfitting
Deep Learning
Deep learning and transfer learning features for plankton classification
Impact of dataset size and variety on the effectiveness of deep learning and transfer learning for plant disease classification
Densely Connected Convolutional Networks
Convolutional neural networks: an overview and application in radiology
COVID-19 image data collection
Chest X-ray images (Pneumonia)
A survey on image data augmentation for deep learning
Classification assessment methods
A probabilistic interpretation of precision, recall and f-score, with implication for evaluation
Encyclopedia of Machine Learning
Performance analysis of google colaboratory as a tool for accelerating deep learning applications
Keras: Deep learning for humans
Early stopping - but when?, in: Neural Networks: Tricks of the Trade
Visualising basins of attraction for the cross-entropy and the squared error neural network loss functions
Adam: A method for stochastic optimization
An empirical study of learning rates in deep neural networks for speech recognition
Visual explanations from deep networks via gradient-based localization
Parameters optimization of support vector machines for imbalanced data using social ski driver algorithm

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.