key: cord-1020312-q3hz8ypc authors: Mittal, Himanshu; Pandey, Avinash Chandra; Pal, Raju; Tripathi, Ashish title: A new clustering method for the diagnosis of CoVID19 using medical images date: 2021-01-23 journal: Appl Intell DOI: 10.1007/s10489-020-02122-3 sha: 0a7dc8d7bc772bd9ccb938789f6688ac08f8beac doc_id: 1020312 cord_uid: q3hz8ypc
With the spread of COVID-19, there is an urgent need for a fast and reliable diagnostic aid. The literature shows that medical imaging plays a vital role here, and tools based on supervised methods have shown promising results. However, the limited number of medical images available for the diagnosis of CoVID19 may impair the generalization of such supervised methods. To alleviate this, a new clustering method is presented, in which a novel variant of the gravitational search algorithm is employed to obtain optimal clusters. To validate the performance of the proposed variant, a comparative analysis against recent metaheuristic algorithms is conducted. The experimental study includes two sets of benchmark functions, namely standard functions and CEC2013 functions, belonging to different categories such as unimodal, multimodal, and unconstrained optimization functions. The performance comparison is evaluated and statistically validated in terms of mean fitness value, the Friedman test, and box-plots. Further, the presented clustering method is tested on three types of publicly available CoVID19 medical images, namely X-ray, CT-scan, and Ultrasound images. Experiments demonstrate that the proposed method outperforms the compared methods in terms of accuracy, precision, sensitivity, specificity, and F1-score.
With the rapid outbreak of the Coronavirus Disease (CoVID19) around the globe [1], the World Health Organization (WHO) has declared CoVID19 a Public Health Emergency of International Concern (PHEIC) [2].
Pathogenically, the virus responsible for CoVID19 is the "severe acute respiratory syndrome coronavirus 2" (SARS-CoV-2), which is quite distinct from other respiratory viruses such as MERS-CoV, SARS-CoV, avian influenza, and influenza [3]. The usual symptoms of a COVID19 patient are fever, dry cough, and tiredness. Other symptoms experienced by patients include sore throat, diarrhea, headache, loss of taste or smell, and skin rash, while severely infected patients show breathing difficulty, chest pain, and Multi-Organ Failure (MOF) [4]. Despite worldwide efforts to control the spread of CoVID19, the number of cases is still rising, with a reproduction number of 3.77 [5]. Globally, there are over 6.15 million confirmed cases, of which more than 2.63 million patients have recovered while 371,700 patients have died, as of 1 June 2020 [6]. The USA, with more than 1.8 million confirmed cases, is the most severely affected country, followed by Brazil, Russia, the United Kingdom, Spain, and Italy, each with more than 200 thousand confirmed cases to date. India, the second most populous country, has also seen a significant rise in the number of CoVID19 cases. At the time of writing, there are nearly 200 thousand confirmed cases in India, with a death count of 5,394. Presently, there are no approved antiviral drugs or treatment regimens to cure COVID19 [4]. Meanwhile, real-time polymerase chain reaction (RT-PCR) can be conducted to confirm the diagnosis of CoVID19 [7]. However, this method has several limitations, such as being time-consuming and having low true-positive and high false-negative rates [7, 8]. Given the highly communicable nature of CoVID19, such limitations raise a major concern in the current epidemic situation.
Fig. 1 Representative sample images of non-CoVID19 patients: (a) chest X-ray, (b) chest CT-scan, and (c) lung Ultrasound
If an infected person is not diagnosed and treated in time, that person may act as a carrier of the virus and unknowingly spread it among healthy people. Therefore, early identification of infected people would help to quarantine them, which would be advantageous in slowing the growth of CoVID19. As an alternative to RT-PCR, clinical studies [9] have reported high sensitivity in diagnosing CoVID19 through medical imaging modalities: the chest images of infected people show irregular ground-glass opacities [10]. Figures 1 and 2 illustrate representative samples of three medical imaging modalities, i.e. chest X-ray, chest CT-scan, and lung ultrasound images, of non-CoVID19 and CoVID19 patients, respectively. In general, the manual analysis of such images requires expert radiologists, which makes the process costly and error-prone. To alleviate this, computer-aided methods could serve as diagnostic aids for CoVID19. Fang et al. [11] observed that the diagnosis of CoVID19 through imaging patterns in chest images is more accurate than RT-PCR. Similar observations are reported by Xie et al. [12]. Recently, many supervised learning methods have been proposed that mainly apply transfer learning, in which a pre-trained deep learning model is used as a feature extractor. Deep learning models are not employed directly because no large dataset of labelled chest images for CoVID19 is available. Therefore, researchers have extracted deep features through transfer learning and then performed classification with supervised models such as decision trees, AdaBoost, support vector machines, and many more [4, 13]. The reported accuracy of supervised learning on deep features, i.e. features extracted through transfer learning, is around 90%.
However, the limited number of CoVID19 medical images may result in overfitting and thus impair the generalization ability of these learned models. Moreover, the key requirement of supervised learning is labelled data [14, 15]. In such scenarios, the literature shows that unsupervised learning, especially clustering, is quite efficient in determining patterns within unlabelled data [16, 17]. Clustering methods draw inferences by discovering the inherent structure of an unlabelled dataset [18]. Therefore, the motivation of this paper is to employ unsupervised learning on deep features for the diagnosis of CoVID19 from medical chest images. In the literature, one of the most popular unsupervised methods is Kmeans, which partitions an unlabelled dataset into distinct clusters according to a similarity measure [19]. However, Kmeans suffers from several drawbacks: it is biased towards the initial clusters, converges prematurely, and may produce local optima [20]. To mitigate such limitations, metaheuristic methods have been applied successfully to clustering [21, 22]. Metaheuristic methods are mathematical models that mimic the optimization behavior of natural systems [23]. Generally, metaheuristic methods have exhibited good performance on numerous nonlinear and complex optimization problems [24] [25] [26]. Over the last two decades, they have been an integral part of many real-world applications ranging from design to business planning [27, 28]. Some popular metaheuristic methods are the genetic algorithm [29], differential algorithm [30], particle swarm optimization [31], ant colony optimization [32], biogeography-based optimizer [33], and artificial bee colony [34]. Many other natural phenomena have been utilized to solve other types of real-world problems [35]. One such phenomenon is Newton's law of gravity and motion, which inspired Rashedi et al.
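To make the initialization bias of Kmeans concrete, here is a minimal NumPy sketch of the standard Lloyd-style procedure, assuming squared Euclidean distance as the similarity measure and random selection of the initial centroids. The function name `kmeans` and its parameters are illustrative and not taken from the paper.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Illustrative Lloyd-style Kmeans; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Random initial centroids: k distinct data points. This step is what
    # makes the final partition depend on the starting clusters.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so assignments cannot change
        centroids = new_centroids
    # Final assignment against the returned centroids.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return centroids, d2.argmin(axis=1)
```

On well-separated data any seed gives the same partition; on overlapping data, different `seed` values can converge to different local optima, which is the weakness that metaheuristic clustering tries to address.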
[36] to introduce a new metaheuristic method termed the gravitational search algorithm (GSA). GSA has achieved effective results on many optimization problems such as image segmentation, energy conservation, parameter selection, and clustering. A comprehensive survey of different applications of GSA is given in [37]. The prominent attribute of GSA is the gravitational interaction among the considered solutions, which makes it a unique and stable method [38]. Moreover, it updates solutions based on the fitness of each solution and the distances among them [39]. Further, it considers the current solutions to reach an optimal solution and requires only a single predefined parameter, i.e. the gravitational constant [40]. Therefore, this paper leverages the strengths of GSA for the optimal clustering of image features. GSA [36] is inspired by the Newtonian laws of gravity and motion. It treats a swarm of solutions as objects that update the solutions collectively. In GSA, every object interacts with the other objects through gravitational force. Initially, GSA explores the search space by allowing a large number of objects to interact. The number of interacting objects is then decreased linearly to perform exploitation, which is controlled by the Kbest function [41]. In GSA, the fitness of an object determines its mass, and thus, a heavier object represents a better solution. Therefore, the position of the heaviest object corresponds to the optimal solution of the considered problem. Though GSA is quite efficient, its solution precision can still be improved. Therefore, this paper presents a novel variant of GSA, improved GSA (IGSA), to bring the solutions closer to the optimum. Further, the proposed variant is employed to obtain optimal clusters in the proposed clustering method for CoVID19 diagnosis.
The overall contribution of this paper is fourfold: The proposed KIGSA-C method is experimentally evaluated against two methods, namely Kmeans and Kmeans-based gravitational search algorithm clustering (KGSA-C). For the experimental analysis, three types of medical chest images, i.e. X-ray, CT-scan, and Ultrasound images, of non-CoVID19 and CoVID19 patients are taken from publicly available sources. The effectiveness of the considered methods is compared in terms of the confusion matrix along with five classification measures, namely sensitivity, specificity, precision, F1-score, and accuracy [42]. The paper is organized as follows: Section 2 discusses the related work. Feature extraction through transfer learning, along with GSA, is detailed in Section 3. Section 4 discusses the proposed method along with the new variant of GSA, followed by the experimental results in Section 5. Lastly, the conclusion is drawn in Section 6.
Xu et al. [43] used a convolutional neural network (CNN) for transfer learning on CT images and classified influenza-A viral pneumonia versus CoVID19 pneumonia with an accuracy of 86.7%. Along the same lines, Wang et al. [44] used a modified Inception model to obtain 89.5% accuracy. Further, Hemdan et al. [45] performed a comparative study of various deep learning models on chest X-ray images and reported a best accuracy of 90.00%. Sethy et al. [46] extracted deep features from X-ray images and then trained a support vector machine to classify positive and negative cases; their model achieved an accuracy of 95.38%. Further, Wang and Wong [47] designed a deep learning framework, COVID-Net, for multi-class classification of chest X-ray images into normal, bacterial infection, viral infection, and CoVID19 infection, achieving an accuracy of 83.5%. For the same classes, Farooq and Hafeez [48] utilized a pre-trained ResNet-50 architecture and reported an accuracy of 96.23%.
Further, Maghdid et al. [49] employed supervised methods on features obtained through AlexNet and obtained an accuracy of 94.00%.
In the literature, GSA has been modified or hybridized with different characteristics to improve its performance on various types of non-linear, non-differentiable, and large search space problems. Shaw et al. [50] used the concept of opposition-based learning to propose opposition-based GSA (OGSA), which achieved better solution precision. Chatterjee et al. [51] applied wavelet theory to GSA and presented enhanced GSA (EGSA(I)) to improve its exploration ability. In [52], Mirjalili et al. introduced enhanced GSA (EGSA(II)), in which the velocity update incorporates the exploitation feature of PSO. Sarafrazi et al. [53] introduced enhanced GSA (EGSA(III)), which modifies the position update with a disruption operator. Moreover, Niknam et al. [54] proposed a modified GSA that incorporates a self-adaptive mutation method. Li et al. [55] introduced chaotic behavior into the position equation and presented a chaotic gravitational search algorithm (CGSA). Mirjalili et al. [56] proposed adaptive gbest GSA (GGSA) to improve exploitation in later iterations. Further, Khajehzadeh et al. [57] employed an adaptive maximum velocity constraint in the modified gravitational search algorithm (MGSA(I)). The modified gravitational search algorithm MGSA(II) [58] uses a chaotic operator to prevent premature convergence. In piecewise GSA (PGSA), Li et al. [59] modified the gravitational constant with a piecewise equation. Similarly, Li et al. [60] proposed a weighted inertia mass for each object to accelerate convergence. To reduce the centre-bias behavior of GSA, Davarynejad et al. [61] introduced a mass-dispersed gravitational search algorithm (mDGSA). Moreover, the chaotic Kbest gravitational search algorithm (CKGSA) [62] introduces chaotic behavior, based on the logistic map, into the Kbest function.
This helps GSA escape local optima, as chaotic sequences have shown better search behavior than stochastic ones [58, 63]. However, chaotic behavior depends on the initial conditions, which makes it biased towards the initial settings. Mittal et al. [64] proposed logarithmic Kbest GSA (LKGSA), in which the Kbest function decreases logarithmically, allowing heavier objects to exert force for more iterations. However, LKGSA performs exploration for only a few iterations, which diminishes its ability to explore the search space exhaustively. Further, exponential Kbest GSA (EKGSA) [65] decreases Kbest exponentially, which allows the Kbest function to perform exploration at an early stage and exhaustive exploitation at a later stage. Although EKGSA exploits exhaustively in later iterations, its convergence precision is still lacking.
The limited availability of labelled data for a problem domain tends to drive the training of a deep learning model towards overfitting, especially when the number of samples is much smaller than the data dimensionality [4]. Moreover, the literature shows that the performance of raw deep learning models on CoVID19 images is not satisfactory even when strategies such as fine-tuning [66] and data augmentation [67] are applied to reduce the problem of limited or imbalanced data. To resolve this, transfer learning as a feature extractor (TLFE) is employed, in which a deep learning model pre-trained on a cross-domain dataset is used as a feature extractor for the current dataset [4]. The extracted features are low-dimensional and are then used with other learning models to perform the classification. This transfer-learning strategy benefits the learning process in many ways, such as overcoming overfitting, reducing computational time, and decreasing the required resources [13].
Two widely used types of deep learning models are the convolutional neural network (CNN) and the recurrent neural network (RNN). RNN models are better at modelling textual data, while CNN models are efficient at encoding visual patterns in images [68]. Therefore, a pre-trained CNN model is utilized to extract feature vectors from the image dataset. In a CNN [69], there are generally four kinds of layers, namely the convolutional layer, non-linear activation layer, pooling layer, and fully-connected layer. Different combinations of these layers yield different CNN models. Along with these layers, different CNN models use different classification layers, such as sigmoid or softmax, to perform the classification task. Figure 3 depicts the architecture of the AlexNet model for image classification. Many pre-trained CNN models, such as AlexNet, VGG, ResNet, Inception, and MobileNet, are publicly available for TLFE [70, 71]. In general, the convolutional layer filters the image by convolving it with 'K' kernels; the output of this layer is termed the feature maps. The non-linear activation layer introduces nonlinearity into the system by applying non-linear, continuous functions such as sigmoid, tanh, and the rectified linear unit (ReLU). CNNs generally prefer ReLU for the non-linear activation layer, which is defined in (1).
$f(x) = \max(0, x)$ (1)
where $f(x)$ is the output of the ReLU function for the input value $x$. ReLU returns 0 if $x$ is less than 0, and returns $x$ otherwise. Figure 4 illustrates this behavior of ReLU. Further, the pooling layer is applied in a CNN to perform down-sampling and retain important structural features through either max-pooling or average-pooling. Max-pooling computes the maximum value in the considered region of an image, while average-pooling calculates the average value over that region.
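The two operations just described can be sketched in a few lines of NumPy: ReLU from (1) and pooling over non-overlapping windows, assuming the common 2 × 2 window with stride 2 and a single-channel feature map. The names `relu` and `pool2x2` are illustrative; real CNN frameworks implement these layers over batched, multi-channel tensors.

```python
import numpy as np

def relu(x):
    """ReLU of (1), applied element-wise: f(x) = max(0, x)."""
    return np.maximum(0, x)

def pool2x2(fmap, mode="max"):
    """Pool a 2-D feature map over non-overlapping 2 x 2 windows (stride 2)."""
    h, w = fmap.shape
    # Drop a trailing odd row/column, then view the map as a grid of 2 x 2 blocks.
    blocks = fmap[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    if mode == "max":
        return blocks.max(axis=(1, 3))   # max-pooling: largest value per window
    if mode == "avg":
        return blocks.mean(axis=(1, 3))  # average-pooling: mean value per window
    raise ValueError("mode must be 'max' or 'avg'")

# A toy 4 x 4 feature map, activated with ReLU and then pooled to 2 x 2.
fmap = np.array([[1, -2, 3, 0],
                 [-1, 5, -6, 2],
                 [0, 0, 1, 1],
                 [4, -3, 2, -8]], dtype=float)
activated = relu(fmap)
pooled_max = pool2x2(activated, "max")  # window maxima: 5, 3 (top) and 4, 2 (bottom)
pooled_avg = pool2x2(activated, "avg")  # window means: 1.5, 1.25 (top) and 1, 1 (bottom)
```

Each pooled value summarizes one 2 × 2 region, so the spatial size is halved in both directions while the strongest (max) or average activation of each region is kept.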
Usually, a 2 × 2 filter is moved across the image to perform down-sampling. Moreover, it has been observed in the literature that max-pooling tends to perform better than average-pooling in CNN models. Lastly, the fully-connected layer, also called the dense layer, is a layer in which each node receives input from every node of the previous layer. To apply this layer in a CNN, a flattening operation is first performed, which transforms the visual features generated by the combination of convolutional, non-linear activation, and pooling layers into feature vectors. After that, the CNN employs a set of fully-connected layers up to the last layer, i.e. the classification layer. The classification layer evaluates class probabilities and maps the computed feature vectors to class labels [72]. If the classification problem is binary, the sigmoid function is usually applied in the last layer; otherwise, the softmax function is used for multi-class problems. To perform feature extraction, the last layer of a pre-trained CNN model is truncated, and the remaining layers, i.e. convolutional, non-linear activation, pooling, and fully-connected, are used to extract the feature vector.
In GSA, a swarm of solutions is considered to update the solutions collectively. Every solution is treated as an object, and objects interact with each other through gravitational force. Initially, GSA explores the search space by allowing a large number of objects to interact and then decreases the number of objects linearly to perform exploitation. Specifically, the search strategy is controlled by the Kbest function: a larger value of this function represents exploration, while a smaller value corresponds to exploitation. Moreover, the fitness of an object determines its mass, and thus, a heavier object means a better solution with better fitness.
Hence, GSA moves every object towards the heaviest one based on the laws of gravity and motion, and the position of the heaviest object corresponds to the optimal solution of the considered problem. Assume there are $P$ objects in a $d$-dimensional space; each object $X_i$ can be represented as (2), where $x_i^r$ corresponds to the position of the $i$th object in the $r$th dimension. The position of the $i$th object at the $t$th iteration is updated according to (3). In (3), $v_i^r(t+1)$ corresponds to the updated velocity of the $i$th object at the $t$th iteration and is computed by (4). According to the law of motion, the acceleration $a_i^r(t)$ of the $i$th object at the $t$th iteration is computed as in (5), where $F_i^r(t)$ and $M_i(t)$ are the total force on and the mass of the $i$th object at the $t$th iteration, respectively. GSA assumes that the gravitational mass and the inertial mass of an object are equal. Therefore, $M_i$ is calculated in every iteration $t$ by (6) for the $i$th object with fitness value $fit_i(t)$, where $best(t)$ and $worst(t)$ are measured by (8) and (9), respectively, for a minimization problem. Moreover, the total force $F_i^r(t)$ is defined in (10) as a randomly weighted sum of the forces exerted by the Kbest objects in the $r$th dimension, where $rand_j \in [0, 1]$ is a random number and $F_{ij}$ is the force of the $j$th object on the $i$th object, computed by (11). In (10), Kbest is computed at the $t$th iteration as in (12), where $max_{it}$ is the maximum number of iterations and $final_{per}$ is the percentage of objects that apply force to others at the end of the run. Equation (12) shows that the value of Kbest decreases linearly over iterations. To illustrate the trend of the Kbest function, Fig. 5 plots the number of iterations along the x-axis and the values attained by this function along the y-axis; the figure shows that the Kbest function decreases linearly.
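The update rules referenced in (2) to (12) can be sketched as a short minimizer. This is a minimal NumPy sketch of standard GSA under assumptions not spelled out in this text: it follows the usual formulation of Rashedi et al. [36], with the gravitational constant decaying as G(t) = G0 * exp(-alpha * t / max_it), masses normalized to sum to one, Kbest expressed as a percentage that falls linearly from 100% to final_per, and out-of-range positions clipped to the bounds. The function names and default values are illustrative and are not the settings of Table 3.

```python
import numpy as np

def linear_kbest(t, max_it, P, final_per=2.0):
    """Kbest as in (12): the percentage of force-exerting objects falls linearly
    from 100% at t = 0 to final_per% at t = max_it."""
    percent = final_per + (1.0 - t / max_it) * (100.0 - final_per)
    return max(1, int(round(P * percent / 100.0)))

def gsa_minimize(fitness, dim, lo, hi, P=30, max_it=200, G0=100.0, alpha=20.0,
                 final_per=2.0, seed=0):
    """Illustrative standard GSA for minimization; returns (best_x, best_f, history)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, size=(P, dim))  # row i is object X_i = (x_i^1, ..., x_i^d)
    V = np.zeros((P, dim))                  # velocities v_i
    best_x, best_f, history = None, np.inf, []
    eps = 1e-12                             # guards against R_ij = 0
    for t in range(max_it):
        fit = np.array([fitness(x) for x in X])
        if fit.min() < best_f:
            best_f, best_x = fit.min(), X[fit.argmin()].copy()
        history.append(best_f)              # best-so-far fitness per iteration
        best, worst = fit.min(), fit.max()  # best(t) and worst(t) for minimization
        # Masses: m_i = (fit_i - worst) / (best - worst), normalized so sum(M) = 1.
        m = np.ones(P) if best == worst else (fit - worst) / (best - worst)
        M = m / m.sum()
        G = G0 * np.exp(-alpha * t / max_it)  # decaying gravitational constant (assumed form)
        kb = np.argsort(M)[::-1][:linear_kbest(t, max_it, P, final_per)]  # Kbest heaviest objects
        diff = X[kb][None, :, :] - X[:, None, :]          # x_j - x_i, shape (P, K, dim)
        R = np.linalg.norm(diff, axis=2, keepdims=True)   # Euclidean distances R_ij
        rand_j = rng.random((P, len(kb), 1))              # random weights in the force sum (10)
        # Acceleration a_i = F_i / M_i: the M_i factor of F_ij cancels, so only M_j remains.
        # When j == i, diff is zero, so an object exerts no force on itself.
        A = (rand_j * G * M[kb][None, :, None] * diff / (R + eps)).sum(axis=1)
        V = rng.random((P, 1)) * V + A      # v_i(t+1) = rand_i * v_i(t) + a_i(t)
        X = np.clip(X + V, lo, hi)          # x_i(t+1) = x_i(t) + v_i(t+1), clipped to bounds
    return best_x, best_f, history
```

For example, `gsa_minimize(lambda x: float(np.sum(x ** 2)), dim=2, lo=-100.0, hi=100.0)` runs the sketch on a 2-D sphere function. Replacing `linear_kbest` is the only change needed to experiment with other Kbest schedules.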
When the stopping criterion is met, the position of the heaviest object among the Kbest objects defines the optimal solution. The pseudo-code of GSA is presented in Algorithm 1 [73].
This paper presents a new clustering method, the Kmeans-based improved gravitational search algorithm clustering (KIGSA-C) method, for the diagnosis of CoVID19 from medical chest images, i.e. X-ray, CT-scan, and Ultrasound images. Figure 6 presents the workflow of the proposed method. First, publicly available medical images of infected and non-infected persons are collected. Then, the visual quality of all the collected images is improved through pre-processing, in which standard image normalization techniques are applied. Next, in the feature extraction phase, a pre-trained CNN model acts as the TLFE and extracts prominent deep features for each input image. Since the initial population of a metaheuristic algorithm is initialized randomly, its convergence may suffer and it may be led into local optima. Therefore, the extracted features are optimally clustered through the Kmeans-based improved gravitational search algorithm (KIGSA), in which the solutions obtained from Kmeans are used to initialize the population of the proposed variant, the improved gravitational search algorithm (IGSA). IGSA then updates the solutions to obtain optimal cluster centroids. The objective function minimized by IGSA is the sum of squared Euclidean distances between the features and their corresponding cluster centroids, formulated in (13), where $C_i$ and $x_j$ correspond to the $i$th cluster centroid and the $j$th feature vector, respectively. Moreover, $N$ represents the number of feature vectors, while $n$ denotes the number of clusters to be formed.
Fig. 6 The workflow of the proposed method
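Because (13) itself is not reproduced in this text, the sketch below assumes the usual form of this clustering objective: every feature vector contributes its squared Euclidean distance to the nearest centroid. It also assumes that one IGSA object encodes all n centroids flattened into a single vector. Both `sse_objective` and `assign_clusters` are hypothetical helpers; the nearest-centroid assignment mirrors how the final centroids map image features to clusters.

```python
import numpy as np

def sse_objective(solution, features, n_clusters):
    """Sum of squared Euclidean distances from each feature vector to its nearest centroid.

    Assumes `solution` is one optimizer object holding all centroids flattened
    into a vector of length n_clusters * feature_dim (hypothetical encoding).
    """
    centroids = np.asarray(solution, dtype=float).reshape(n_clusters, features.shape[1])
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # shape (N, n)
    return float(d2.min(axis=1).sum())

def assign_clusters(centroids, features):
    """Map each feature vector to the index of its closest centroid."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Toy example with two clusters, matching the binary CoVID19 / non-CoVID19 setting.
features = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
solution = np.array([0.0, 0.5, 10.0, 10.5])
print(sse_objective(solution, features, n_clusters=2))    # 1.0
print(assign_clusters(solution.reshape(2, 2), features))  # [0 0 1 1]
```

Which cluster index corresponds to CoVID19 still has to be decided separately; the exact mapping rule used in the paper is not described in this text.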
As the diagnosis of CoVID19 is a binary classification problem, the number of clusters formed is two. Finally, the obtained optimal cluster centroids are used to map the input image features to labels. The pseudo-code of the proposed method is described in Algorithm 2. The proposed variant, the improved gravitational search algorithm (IGSA), is detailed in the following section.
In GSA, the Kbest function regulates the trade-off between exploration and exploitation. This function determines the number 'K' of best solutions that apply gravitational force in an iteration, and its value decreases linearly with increasing iterations. Hence, the transition from exploration to exploitation in GSA is linear [74]. Moreover, in later iterations, the small value of Kbest results in reduced exploration ability [75]. Consequently, GSA becomes more likely to be trapped in a local optimum [76], which limits the solution precision it can achieve. To alleviate this, a novel variant of GSA, the improved gravitational search algorithm (IGSA), is proposed, in which the Kbest function is modified. Equation (14) defines the new Kbest function at the $t$th iteration, where $P$, $max_{it}$, and $final_{per}$ correspond to the population size, the maximum number of iterations, and the minimum percentage of objects, respectively. Further, Fig. 7 illustrates the values attained by the modified Kbest over 1000 iterations for $P = 50$. It can be observed from Fig. 7 that the modified Kbest does not decrease linearly. Moreover, it starts favouring exploitation over exploration after a few iterations. This yields better solution precision, as heavier objects exploit the search space for more iterations. The time complexity of the proposed method to perform clustering is defined as follows, where $N$, $k$, and $t$ correspond to the number of feature vectors, the number of clusters, and the number of iterations, respectively.
Table 1 The considered standard benchmark functions [27] (table body lost in extraction; recoverable entries include Ackley, Rosenbrock, Yang's, and Step)
Further, each individual of the KIGSA-C method is initialized with Kmeans and updated according to IGSA based on the defined objective function. Therefore, the time complexity of the proposed method is $O(N \times k \times t + P^2 + N \times k)$.
The experimental evaluation is conducted in two parts. Table 1 describes the first set ($S_1$) of benchmark functions, consisting of functions $F_1$ to $F_{17}$ [27], along with their definitions, ranges of features, global minimum fitness, optimal position values, and categories. Moreover, to validate robustness, the second set ($S_2$) contains the real-parameter single-objective unconstrained optimization functions $C_1$ to $C_{28}$ [78], which are summarized in Table 2. The performance has been evaluated and statistically analyzed in terms of mean fitness value, the Friedman test, and box-plots. For the comparative analysis, the new variant has been compared with standard GSA and nine variants of GSA, namely OGSA [50], mDGSA [61], GGSA [56], PGSA [59], MGSA(I) [57], MGSA(II) [58], EGSA(I) [51], EGSA(II) [52], and EGSA(III) [53]. The comparison also includes three recent metaheuristic algorithms, namely modified particle swarm optimization (MPSO) [79], shuffled differential evolution (SDE) [80], and spiral biogeography-based optimizer (SBBO) [81]. Moreover, the algorithms are evaluated over 10, 30, 50, and 90 dimensions for each benchmark function. Each experiment is run 30 times to minimize the influence of randomness [82]. Table 3 details the parameter settings of all the considered algorithms, which are taken from their respective literature. The parameter values of the proposed algorithm have been decided empirically by testing its performance under different parameter values of the standard GSA algorithm.
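Table 1's definitions did not survive extraction, so the sketch below uses the widely used textbook forms of three of the named functions (Ackley, Rosenbrock, and Step), assuming they match the versions in the paper; "Yang's" function is omitted because its exact form cannot be recovered here. Each takes one candidate vector and returns a scalar fitness to be minimized.

```python
import numpy as np

def ackley(x):
    """Ackley function (multimodal); global minimum 0 at x = 0."""
    x = np.asarray(x, dtype=float)
    d = x.size
    return float(-20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / d))
                 - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / d) + 20.0 + np.e)

def rosenbrock(x):
    """Rosenbrock function; global minimum 0 at x = (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2))

def step(x):
    """Step function; minimum 0 wherever every x_i lies in [-0.5, 0.5)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(np.floor(x + 0.5) ** 2))
```

The same functions work for any dimension, so the 10, 30, 50, and 90 dimensional settings only change the length of the input vector.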
Tables 4 and 5 tabulate the mean fitness values of the considered algorithms for the $S_1$ functions $F_1$ to $F_{10}$ and $F_{11}$ to $F_{17}$, respectively. Furthermore, Tables 6, 7, and 8 show the results on the $S_2$ functions $C_1$ to $C_{10}$, $C_{11}$ to $C_{20}$, and $C_{21}$ to $C_{28}$, respectively. In the tables, the best values are in bold. It can be observed that IGSA shows better results than the compared algorithms. On the $S_1$ benchmark functions, IGSA obtained superior results on 53%, 71%, 59%, and 65% of the 17 problems for 10, 30, 50, and 90 dimensions, respectively, while on the $S_2$ problems, IGSA outperformed the compared algorithms on 54%, 61%, 50%, and 58% of the 28 problems in the respective dimensions. Thus, IGSA attains a better trade-off between exploration and exploitation on functions of different complexity. To validate IGSA statistically, Friedman's test [83] has been conducted between IGSA and the considered algorithms. Friedman's test is a non-parametric statistical test; it has been performed on the mean fitness values over 30 runs for each benchmark function in the $S_1$ and $S_2$ sets, and the results are presented in Tables 9 and 10 for each considered dimension. Friedman's test considers two hypotheses, namely the null hypothesis and the alternative hypothesis. The null hypothesis (H0) states that there is no significant difference among the samples, and the alternative hypothesis states that at least one sample differs significantly from the others. In the tables, the p-value is less than the considered significance level, i.e. α = 0.05, which indicates that the null hypothesis (H0) is rejected. Moreover, IGSA attained the minimum mean rank among the compared algorithms in all tables. Therefore, there is a significant difference between IGSA and the other algorithms for both sets over 10, 30, 50, and 90 dimensions.
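The mean ranks and the Friedman statistic behind Tables 9 and 10 can be computed as below. This is a minimal sketch assuming a matrix of mean fitness values with one row per benchmark function and one column per algorithm, where lower fitness receives rank 1; the function name `friedman` is illustrative.

```python
import numpy as np

def friedman(results):
    """Friedman test on a (problems x algorithms) matrix; lower values rank better.

    Returns (mean_ranks, statistic); the statistic is compared against a
    chi-square distribution with k - 1 degrees of freedom.
    """
    results = np.asarray(results, dtype=float)
    n, k = results.shape                 # n benchmark problems, k algorithms
    ranks = np.empty_like(results)
    for row in range(n):
        r = np.empty(k)
        r[np.argsort(results[row], kind="stable")] = np.arange(1, k + 1)
        # Tied values share the average of the ranks they occupy.
        for value in np.unique(results[row]):
            tied = results[row] == value
            r[tied] = r[tied].mean()
        ranks[row] = r
    mean_ranks = ranks.mean(axis=0)
    statistic = 12.0 * n / (k * (k + 1)) * (np.sum(mean_ranks ** 2) - k * (k + 1) ** 2 / 4.0)
    return mean_ranks, statistic
```

If SciPy is available, `scipy.stats.chi2.sf(statistic, k - 1)` gives the p-value, and `scipy.stats.friedmanchisquare` runs the whole test; SciPy additionally applies a tie correction, so its statistic can differ slightly when ties occur. A p-value below 0.05 rejects H0, and the algorithm with the smallest mean rank is ranked first.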
Further, box-plots have been drawn to analyze the consistency of the new variant for different dimensions in terms of the average failure rate. The failure rate corresponds to the number of runs in which an algorithm fails to achieve the optimum value. To compute it, each considered algorithm has been executed 30 times, where each execution consists of 30 runs. Figure 8 illustrates the box-plots over 10, 30, 50, and 90 dimensions. The considered algorithms are shown on the horizontal axis, while the average failure rate is shown on the vertical axis. In the figures, IGSA is more consistent, with smaller interquartile ranges and medians. Therefore, the experiments indicate that IGSA attains a better balance between global and local search along with high precision.
The proposed clustering (KIGSA-C) method for CoVID19 diagnosis is evaluated on three types of chest images, namely X-ray, CT-scan, and Ultrasound. Table 11 tabulates the considered datasets for the three types of chest images. The X-ray dataset contains 234 chest images of CoVID19 patients that are publicly available from [84]. The same number of non-CoVID19 images is taken from a publicly available dataset of chest X-rays of patients infected with other viral and bacterial pneumonias, such as MERS, SARS, and ARDS [85]. Further, four CNN models, namely DenseNet-161, ResNet-34, VGG-16, and MobileNet-V2, are used to perform feature extraction, as their top-5 error is less than 10% [70]. For feature extraction, the last layer of each pre-trained model is truncated, and the remaining layers, i.e. convolutional, non-linear activation, pooling, and fully-connected, are used to extract features. Table 12 lists the considered CNN models for TLFE along with the number of extracted features.
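Returning to the box-plots in Fig. 8, the failure counts they summarize can be sketched as follows. This assumes the final best fitness of every run is stored in an array of shape (executions, runs) and that a run "fails" when it ends farther than a tolerance `tol` from the known optimum; the tolerance value and the function name are assumptions, since the paper does not state them here.

```python
import numpy as np

def failure_counts(final_fitness, optimum, tol=1e-8):
    """Count failed runs per execution and summarize them as a box plot would.

    final_fitness: array of shape (executions, runs) with the best fitness of each run.
    A run fails when it ends farther than `tol` from the known optimum (assumed criterion).
    """
    failed = np.abs(np.asarray(final_fitness, dtype=float) - optimum) > tol
    per_execution = failed.sum(axis=1)                       # failures in each execution
    q1, median, q3 = np.percentile(per_execution, [25, 50, 75])
    return per_execution, median, q3 - q1                    # median and interquartile range
```

With one `per_execution` array per algorithm, `matplotlib.pyplot.boxplot` can draw the kind of comparison shown in Fig. 8; a lower median and a smaller interquartile range indicate a more consistent algorithm.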
After the deep features are extracted, optimal cluster centroids are obtained through the KIGSA-C method and used to label the considered images into two categories, namely CoVID19 and non-CoVID19. As a labelled dataset is used for experimentation, the performance of the proposed method can be evaluated on different classification measures, such as accuracy, precision, and F1-score, by comparing the predicted labels with the actual labels [42]. Moreover, the results of the proposed KIGSA-C method are compared against two methods, i.e. Kmeans and Kmeans-based gravitational search algorithm clustering (KGSA-C). Table 13 tabulates the confusion matrices between the CoVID19 and non-CoVID19 classes for the considered methods over different TLFE models, with the best values in bold. From Table 13, it is observable that the proposed method identifies more than 98.5% of the CoVID19 images correctly. Moreover, the false-positive rate of the proposed method is lower than that of the other methods. In terms of false negatives, the KIGSA-C method attained competitive results. For better visual comparison, the classification accuracy of the considered methods on the X-ray dataset is plotted as a bar chart in Fig. 9, where the x-axis denotes the methods and the y-axis the obtained accuracy. From the figure, the proposed KIGSA-C method outperforms the other considered methods by a margin of approximately 1% or more. Further, the performance of the proposed method is evaluated on four more measures, namely sensitivity, specificity, precision, and F1-score, formulated in (15) to (18), respectively [87, 88]. Table 14 presents the results of the considered methods on these measures for X-ray images, with the best values in bold. It can be observed from the table that the proposed method gives consistent results.
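The measures (15) to (18), together with accuracy, follow directly from the four confusion-matrix counts. A minimal sketch, assuming all denominators are non-zero; the function name `classification_measures` is illustrative.

```python
def classification_measures(tp, fp, tn, fn):
    """Measures (15)-(18) plus accuracy from confusion-matrix counts (non-zero denominators assumed)."""
    sensitivity = tp / (tp + fn)     # (15), identical to recall
    specificity = tn / (tn + fp)     # (16)
    precision = tp / (tp + fp)       # (17)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # (18), recall = sensitivity
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "f1": f1, "accuracy": accuracy}
```

Because recall and sensitivity are the same quantity, the F1-score in (18) is the harmonic mean of precision and sensitivity, which simplifies to 2TP / (2TP + FP + FN).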
$$\mathrm{Sensitivity} = \frac{\text{true positives}}{\text{true positives} + \text{false negatives}} \tag{15}$$

$$\mathrm{Specificity} = \frac{\text{true negatives}}{\text{true negatives} + \text{false positives}} \tag{16}$$

$$\mathrm{Precision} = \frac{\text{true positives}}{\text{true positives} + \text{false positives}} \tag{17}$$

$$F_1\text{-score} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{18}$$

where recall in (18) is the sensitivity defined in (15).

The performance of the considered methods is also examined on CT-scan and Ultrasound images. The CT-scan dataset is taken from a publicly available resource [86]. It consists of 392 CT-scan images of CoVID19 patients and an equal number of non-CoVID19 images.

Fig. 9 Classification accuracy of considered methods on X-ray dataset

In this paper, a new clustering method, the Kmeans-based improved gravitational search algorithm clustering (KIGSA-C) method, has been introduced for diagnosing CoVID19 from medical images. Although many supervised models exist in the literature, their ability to generalize may suffer because few datasets are available. To address this, a new clustering method is presented. In the proposed method, a deep learning model extracts features from the considered images. A novel variant, the improved gravitational search algorithm (IGSA), then clusters the extracted features optimally. The resulting cluster centroids are used to label the images as CoVID19 or non-CoVID19. IGSA has been validated against 16 recent metaheuristic algorithms on two sets of benchmark functions. These functions cover three types of problems: unimodal, multimodal, and the real-parameter single-objective optimization problems of the IEEE Congress on Evolutionary Computation (CEC) 2013. Performance has been evaluated over four problem dimensions (10, 30, 50, and 90) and analyzed in terms of mean fitness value, the Friedman test, and box-plots.
The experimental results show that IGSA outperforms the considered algorithms on the largest number of benchmark problems in each dimension. The Friedman test also ranks IGSA first among the considered algorithms. The box-plots indicate that IGSA searches more consistently than the other algorithms. Thus, IGSA achieves better precision with a balanced trade-off between exploration and exploitation. The new KIGSA-C method has also been evaluated against the Kmeans and Kmeans-based gravitational search algorithm clustering methods on three types of medical chest images, namely X-ray, CT-scan, and Ultrasound images. In these experiments, feature extraction is performed through four pre-trained deep learning models, namely DenseNet-161, ResNet-34, VGG-16, and MobileNet-V2. The results are measured in terms of sensitivity, specificity, precision, F1-score, and accuracy. The experiments confirm that KIGSA-C is efficient and can serve as an alternative tool to aid the diagnosis of CoVID19. In the future, real-time datasets may be explored to improve the efficiency of the proposed model. The method may also be scaled to a distributed framework to reduce computation time. Additionally, a parallel version of the proposed method could be developed on architectures such as Spark or Hadoop to handle big datasets.
Organization WH (2020) Novel coronavirus (2019-ncov): situation report
Covid-19 infection: origin, transmission, and characteristics of human coronaviruses
Automatic detection of coronavirus disease (covid-19) in x-ray and ct images: a machine learning-based approach
The epidemiology and pathogenesis of coronavirus disease (covid-19) outbreak
Correlation of chest ct and rt-pcr testing in coronavirus disease 2019 (covid-19) in China: a report of 1014 cases
Classification of covid-19 patients from chest ct images using multi-objective differential evolution-based convolutional neural networks
Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in Wuhan, China
Temporal changes of ct findings in 90 patients with covid-19 pneumonia: a longitudinal study
Ct manifestations of two cases of 2019 novel coronavirus (2019-ncov) pneumonia
Chest ct for typical 2019-ncov pneumonia: relationship to negative rt-pcr testing
Automatic detection of coronavirus disease (covid-19) using x-ray images and deep convolutional neural networks
Performance comparison of deep neural networks on image datasets
Enhancing text mining using deep learning models
Feature selection method based on hybrid data transformation and binary binomial cuckoo search
A novel clustering method using enhanced grey wolf optimizer and mapreduce
A new fuzzy cluster validity index for hyper-ellipsoid or hyper-spherical shape close clusters with distant centroids
Twitter sentiment analysis using hybrid cuckoo search method
Grey relational analysis based keypoints selection in bag-of-features for histopathological image classification
A parallel military dog based algorithm for clustering big data in cognitive industrial internet of things
Parallel bat algorithm-based clustering using mapreduce. In: Networking communication and data knowledge engineering
Classification of histopathological images through bag-of-visual-words and gravitational search algorithm
Dynamic frequency based parallel k-bat algorithm for massive data clustering (dfbpkba)
A novel differential evolution test case optimisation (detco) technique for branch coverage fault detection
An exhaustive survey on nature inspired optimization algorithms
Eewc: energy-efficient weighted clustering method based on genetic algorithm for hwsns
A genetic algorithm tutorial
Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces
Particle swarm optimization
Ant colony optimization
Biogeography-based optimization
Artificial bee colony algorithm with global and local neighborhoods
Enhancing sentiment analysis using enhanced whale optimisation algorithm
Gsa: a gravitational search algorithm
A comprehensive survey on gravitational search algorithm
Gravitational search algorithm: a comprehensive analysis of recent variants
ckgsa based fuzzy clustering method for image segmentation of rgb-d images
Interval type-2 fuzzy logic for dynamic parameter adaptation in a modified gravitational search algorithm
Histopathological image classification by optimized neural network using igsa
Unsupervised data classification using improved biogeography based optimization
Deep learning system to screen coronavirus disease 2019 pneumonia
A deep learning algorithm using ct images to screen for corona virus disease
Covidx-net: a framework of deep learning classifiers to diagnose covid-19 in x-ray images
Detection of coronavirus disease (covid-19) based on deep features
Covid-net: a tailored deep convolutional neural network design for detection of covid-19 cases from chest radiography images
Covid-resnet: a deep learning framework for screening of covid19 from radiographs
Diagnosing covid-19 pneumonia from x-ray and ct images using deep learning and transfer learning algorithms
A novel opposition-based gravitational search algorithm for combined economic and emission dispatch problems of power systems
A maiden application of gravitational search algorithm with wavelet mutation for the solution of economic load dispatch problems
A new hybrid psogsa algorithm for function optimization
Disruption: a new operator in gravitational search algorithm
Probabilistic energy and operation management of a microgrid containing wind/photovoltaic/fuel cell generation and energy storage devices based on point estimate method and self-adaptive gravitational search algorithm
Parameters identification of chaotic system by chaotic gravitational search algorithm
Adaptive gbest-guided gravitational search algorithm
A modified gravitational search algorithm for slope stability analysis
A chaotic digital secure communication based on a modified gravitational search algorithm filter
Piecewise function based gravitational search algorithm and its application on parameter identification of avr system
Path planning of unmanned aerial vehicle based on improved gravitational search algorithm
Mass-dispersed gravitational search algorithm for gene regulatory network model parameter identification
Chaotic inertia weight in particle swarm optimization
An image segmentation method using logarithmic kbest gravitational search algorithm based superpixel clustering
An optimum multi-level image thresholding segmentation using non-local means 2d histogram and exponential kbest gravitational search algorithm
Decision fusion-based fetal ultrasound image plane classification using convolutional neural networks
Automatic diagnosis of fungal keratitis using data augmentation and image fusion with deep convolutional neural network
The history began from alexnet: a comprehensive survey on deep learning approaches
A survey of deep neural network architectures and their applications
torchvision.models: pytorch 1.5.0 documentation
Enhanced bag of features using alexnet and improved biogeography-based optimization for histopathological image analysis
Improving sentiment analysis using hybrid deep learning model
Gsa: a gravitational search algorithm
Gravitational particle swarm
A band selection method for airborne hyperspectral image based on chaotic binary coded gravitational search algorithm
A review of gravitational search algorithm
An automatic nuclei segmentation method using intelligent gravitational search algorithm based superpixel clustering
Real parameter single objective optimization using self-adaptive differential evolution algorithm with more strategies
A modified particle swarm optimization for large-scale numerical optimizations and engineering design problems
Shuffled differential evolution-based combined heat and power economic dispatch
Histopathological image classification using enhanced bag-of-feature with spiral biogeography-based optimization
Spam review detection using spiral cuckoo search clustering method
Overview of friedman's test and post-hoc analysis
ieee8023/covid-chestxray-dataset: we are building an open database of covid-19 cases with chest x-ray or ct images
Chest x-ray images (pneumonia) - kaggle
Covid-ct-dataset: a ct scan dataset about covid-19
jannisborn/covid19 pocus ultrasound: open source ultrasound (pocus) data collection initiative for covid-19
A new weighted two-dimensional vector quantisation encoding method in bag-of-features for histopathological image classification