key: cord-0760936-f31gcurc authors: Yousri, Dalia; Abd Elaziz, Mohamed; Abualigah, Laith; Oliva, Diego; Al-qaness, Mohammed A.A.; Ewees, Ahmed A. title: COVID-19 X-ray images classification based on enhanced fractional-order cuckoo search optimizer using heavy-tailed distributions date: 2020-12-24 journal: Appl Soft Comput DOI: 10.1016/j.asoc.2020.107052 sha: 3631e6e30afa0f9fb29e3ca4528c6ccd5f8d1195 doc_id: 760936 cord_uid: f31gcurc Classification of COVID-19 X-ray images to determine a patient's health condition is a critical issue these days, since X-ray images provide rich information about the patient's lung status. To distinguish COVID-19 cases from other normal and abnormal cases, this work proposes an alternative method that extracts informative features from X-ray images and applies a new feature selection method to determine the relevant ones. To this end, an enhanced cuckoo search (CS) optimization algorithm is proposed using fractional-order calculus (FO) and four different heavy-tailed distributions in place of the Lévy flight, to strengthen the algorithm's performance on the COVID-19 multi-class classification task. The classification process includes three classes: normal patients, COVID-19 infected patients, and pneumonia patients. The distributions used are the Mittag-Leffler, Cauchy, Pareto, and Weibull distributions. The proposed FO-CS variants were first validated on eighteen UCI data-sets. In a second series of experiments, two data-sets of COVID-19 X-ray images were considered. The results of the proposed approach were compared with those of well-regarded optimization algorithms. The outcomes confirm the superiority of the proposed approach in providing accurate results on the UCI and COVID-19 data-sets, with remarkable improvements in the convergence curves, especially when the Weibull distribution is applied instead of the Lévy flight.
The first cases of coronavirus disease (COVID-19) were registered in Wuhan, an important city of China, in December 2019. COVID-19 is caused by a virus called SARS-CoV-2, and it is currently one of the major concerns worldwide. By the middle of April 2020, COVID-19 had caused more than 150,000 deaths and almost 1,700,000 confirmed cases worldwide [1]. These numbers are evidence of its fast dissemination through the population. The symptoms of COVID-19 known so far include fever, cough, sore throat, headache, fatigue, and muscle pain, among others [2]. The test commonly used for COVID-19 detection is invasive and is called reverse transcription-polymerase chain reaction (RT-PCR) [3]. COVID-19 does not only affect the health of nations; the consequences of the disease are also significant in other respects (e.g., economic and psychological [4, 5]). Another important consideration is that prompt detection can lead to early treatment. COVID-19 is thus a pandemic with a large number of challenges to face. Based on the above, tools are required that permit the timely detection of this mortal disease. On the other hand, the use of medical images to diagnose disease has increased in recent years. Different computer vision and image processing tools can be used to identify the abnormalities produced by illnesses. One of the main advantages of this kind of system is that the detection is fast and accurate; however, the results must still be validated by an expert. In this way, medical image processing algorithms can be used as a primary diagnosis that provides a clue about a possible disease. In the case of COVID-19, X-ray images make it possible to study how the virus affects the lungs. Since COVID-19 is caused by a recently discovered virus, only a few data-sets are available, and the number of works based on them is also small.
A number of studies have been proposed to classify COVID-19 from X-ray scan images using different techniques, such as DenseNet201 [6], MobileNet v2 [7], DarkNet [8], and others [9] [10] [11] [12]. In general, medical images can be analyzed directly to identify elements in the scene that permit a diagnosis. However, it is also common to extract different features from all the images and build a set containing the information about the objects depicted in the images [13]. In this case, not all the extracted features provide essential information about the disease to be detected; for that reason, it is necessary to use automatic tools that remove the undesired (irrelevant or duplicated) elements. Feature Selection (FS) is an important pre-processing tool that helps to separate the desired information from the irrelevant data [14] [15] [16]. FS reduces the dimensionality of the data by passing only the relevant information to the subsequent steps. With this pre-processing, machine learning can be applied more efficiently over the data-set by decreasing the computational effort [17, 18]. FS has been applied in many domains, for example in bio-medicine for the diagnosis of neuromuscular disorders [19], in signal processing for speech emotion recognition [20], in the Internet of Things, especially in medicine with EEG signals [21], in human activity recognition [22], and in text classification [23], to mention a few. FS works by maximizing the relevance of the features while also minimizing their redundancy. In comparison to other methods, FS preserves the data by not applying domain transformations: it operates by including and excluding attributes of the data-set without any modification.
FS has three main advantages: (1) it improves the prediction performance, (2) it permits a better understanding of the processes that generated the data, and (3) it yields more effective predictors with lower computational cost [24]. According to Pinheiro et al. [25], there exist two kinds of FS methods, namely wrapper and filter methods. The most popular approaches are wrapper-based, because they provide more accurate results [26]. A wrapper FS method needs an internal classifier to find a better subset of features; this can affect its performance, especially on large data-sets. Moreover, the backward and forward strategies used to add or remove features become unsuitable for large numbers of features. Considering these drawbacks, meta-heuristic algorithms (MA) are used to increase the performance of FS processes. MA have become more popular in recent years due to their flexibility and adaptability [27]. MA can be used in a wide range of applications, and FS is not an exception. Classical approaches such as Genetic Algorithms (GA) [28], Particle Swarm Optimization (PSO) [29], and Differential Evolution (DE) [30] have been successfully used to solve the FS problem. Moreover, modern MA have also been applied to FS, such as the Grasshopper Optimization Algorithm (GOA) [31], the Competitive Swarm Optimizer (CSO) [32], the Gravitational Search Algorithm (GSA) [33], and others. Since FS can be seen as an optimization problem, no single MA is able to handle all the difficulties of FS. This fact follows from the No-Free-Lunch (NFL) theorem [34]; therefore, it is necessary to continue exploring new alternative MA. An interesting MA is the Cuckoo Search (CS) algorithm introduced by Yang and Deb [35, 36] as a global optimization method inspired by the breeding behavior of Cuckoo birds (CB). CS is a population-based method that employs three basic rules that mimic the brood parasitism of the Cuckoo bird.
In nature, the female CB lays her eggs in the nests of other species. Some CB species can imitate the color of the eggs of the nest where they will be laid. In the CS context, a set of host nests containing eggs (candidate solutions) is randomly initialized; then one Cuckoo egg is generated in each generation using Lévy flights. After that, a random host nest is chosen, and its egg is evaluated using a fitness function. The fitness values of the Cuckoo egg and the one selected from the random host nest are compared, and the best egg is preserved in the host nest. A fraction of the worst nests is abandoned in each generation, and the best solutions are preserved [35]. The use of CS in various applications has increased in recent years [37, 38]. CS is an interesting alternative for solving complex optimization problems. However, CS has the disadvantage that it can fall into sub-optimal solutions [39], caused by an inappropriate balance between the exploration and exploitation phases. To overcome such drawbacks, several modifications of the standard CS have been introduced. Recently, an interesting approach called fractional-order cuckoo search (FO-CS) was proposed, which uses fractional calculus to enhance the performance of CS [40]. The FO-CS possesses better convergence speed than CS; besides, the balance between exploration and exploitation produced by the fractional calculus yields more accurate solutions to complex optimization problems. Nevertheless, other approaches can be incorporated into FO-CS for further performance enhancement. This paper presents the use of heavy-tailed distributions instead of Lévy flights in the FO-CS. This kind of distribution has been used to enhance the mutation operator in evolutionary algorithms and other MA [41, 42]. Using heavy-tailed distributions, it is possible to escape from non-promising regions of the search space [41].
The distributions used to enhance the FO-CS are the Mittag-Leffler distribution (ML_D) [43], the Pareto distribution (P_D) [44], the Cauchy distribution (C_D) [45], and the Weibull distribution (W_D) [46]. The FO-CS based on heavy-tailed distributions is proposed as an alternative method for solving feature selection problems. In this context, the experimental results include different tests on eighteen data-sets from the UCI repository [47]. The comparisons include different state-of-the-art MA, against which the proposed approach provides better results in terms of accuracy and convergence. Moreover, the most important contribution of this work is the use of the FO-CS based on heavy-tailed distributions for FS over a COVID-19 data-set. In this experiment, the results obtained by the proposed method make it possible to accurately identify three different classes, namely normal patients, COVID-19 infected patients, and pneumonia patients. Regarding the results on the UCI and COVID-19 data-sets, different statistical tests and metrics validate the good performance of the FO-CS based on heavy-tailed distributions, especially the Weibull distribution. The main contributions of the current study can be summarized as follows:
1. Providing an alternative COVID-19 X-ray image classification method that aims to distinguish COVID-19 patients from normal and abnormal cases.
2. Extracting features from X-ray images, then using a new feature selection method to select the relevant features.
3. Developing a feature selection method using a modified CS based on the fractional-order concept and heavy-tailed distributions.
4. Evaluating the developed method using two data-sets of real COVID-19 X-ray images.
5. Comparing the newly developed FS method with other recently implemented FS methods.
The rest of the paper is organized as follows. Section 2 presents the related works. Section 3 introduces the basics of heavy-tailed distributions, CS, and FO-CS.
Section 4 explains the proposed modified FO-CS, and Section 5 presents the experimental results and comparisons. Finally, in Section 6, the conclusions and future work are discussed. In this section, we present a brief review of existing works on COVID-19 medical image classification, applications of MA for feature selection, and recent applications of the CS. Recently, several studies have been presented to classify and identify COVID-19 from chest X-ray images using different techniques. Pereira et al. [48] applied a CNN to extract texture features from chest X-ray images. They applied resampling techniques to balance the distribution of the classes in order to implement a multi-class classification process. Ozturk et al. [8] applied a deep learning-based approach for determining COVID-19 from X-ray scan images. They used the DarkNet model for both binary and multi-class classification. For binary classification, images were classified into COVID or no-findings; for multi-class classification, images were classified into three classes: COVID, pneumonia, and no-findings. Elaziz et al. [49] proposed a machine learning-based model to classify COVID-19 X-ray images into two classes, COVID or non-COVID. They proposed a feature extraction method called fractional multichannel exponent moments to extract features from X-ray scan images. Thereafter, they applied an improved Manta-Ray Foraging Optimization algorithm as a feature selection method to select the most relevant features. In [9], the authors evaluated several convolutional neural network models on different X-ray scan images, including COVID-19, normal incidents, and bacterial pneumonia. Ouchicha et al. [50] proposed a deep learning-based approach for COVID-19 classification, called CVDNet, built using a residual neural network. The evaluation outcomes showed that CVDNet can achieve a high detection accuracy on a small data-set.
In [51], the authors proposed a deep learning approach, namely COVIDX-Net, for COVID-19 diagnosis from chest X-ray scan images. COVIDX-Net depends on seven deep learning architectures. It was evaluated on 50 X-ray images and showed good performance. In [52], a new model called CoroNet is proposed for COVID-19 detection from both CT and X-ray images. The Xception architecture is used to build CoroNet, and ImageNet data-sets are applied to train it. It showed acceptable performance for several classes, with an average accuracy of 89.6%. Toraman et al. [53] presented a Convolutional CapsNet model for COVID-19 detection from X-ray images. The proposed model was applied both to binary classification (COVID-19 and non-COVID) and to multi-class classification. Furthermore, Jain et al. [54] applied a deep learning model based on ResNet-101 to COVID-19 X-ray image classification; the proposed model achieved significant performance on several performance measures. Hassantabar et al. [55] applied a deep neural network (DNN) and a convolutional neural network (CNN) to detect infected lung tissue of COVID-19 patients from X-ray scans. Furthermore, there are other approaches for COVID-19 X-ray image classification, such as MobileNet v2 [7], DenseNet201 [6], and others [10] [11] [12]. In recent years, modern MA have been widely applied to FS. For example, in [56] the Grasshopper Optimization Algorithm (GOA) was proposed to identify the best subset of features, and the Crow Search Algorithm (CSA) with chaotic maps has also been proposed for FS on different benchmark data-sets [31]. Another interesting work is the implementation of the Competitive Swarm Optimizer (CSO) for FS [32]; the advantage of CSO is its capability to handle high-dimensional optimization problems.
In [33], an evolutionary version of the Gravitational Search Algorithm (GSA) is introduced for FS, where mutation and crossover operators enhance the performance of the standard GSA; a hybrid Particle Swarm Optimization (PSO) has likewise been used to extract the optimal features of different benchmark data-sets. Too and Mirjalili [57] proposed an FS approach based on a hyper-learning binary dragonfly algorithm and applied it to different FS data-sets, including a COVID-19 patient health prediction data-set. All these approaches are novel, and their performance is relevant. However, they have some disadvantages: first, like many other meta-heuristics, they can fall into sub-optimal solutions; second, they have mostly been applied only to benchmark sets. Finally, since FS can be seen as an optimization problem, the NFL theorem [34] implies that no single MA can handle all the difficulties of FS, so it is necessary to continue exploring new alternative MA. The cuckoo search (CS) is one such algorithm, and its application to diverse problems has increased in recent years [37, 38]. As noted above, the standard CS can fall into sub-optimal solutions [39] due to an inappropriate balance between the exploration and exploitation phases, and several modifications have been introduced to overcome this drawback. For example, in [58], the authors proposed hybridizing CS with the Chaotic Flower Pollination Algorithm (CFPA) for maximizing area coverage in Wireless Sensor Networks. The Reinforced Cuckoo Search Algorithm (RCSA) was presented by Thirugnanasambandam et al. [59] for multi-modal optimization; the RCSA contains three operators, called the modified selection strategy, the Patron-Prophet concept, and the self-adaptive strategy.
An enhanced version of the CS that includes quasi-opposition-based learning and two other strategies is introduced in [60] for the parameter estimation of photovoltaic models. Another recent approach, the fractional-order cuckoo search (FO-CS), incorporates fractional calculus and was proposed for identifying the unknown parameters of chaotic and hyper-chaotic fractional-order financial systems [40]. The FO-CS demonstrated its efficiency in solving a set of known mathematical benchmarks, with remarkable convergence speed in comparison with the basic CS. In this section, the essential background on the feature extraction and feature selection approaches for images is explained in detail. Moreover, we give an overall background on the parts integrated into the novel optimizer variants, namely the heavy-tailed distributions and the fractional-order cuckoo search optimizer (FO-CS), as described below. In this study, the image features are extracted using the following techniques: the Fractional Zernike Moments (FrZMs), the Wavelet Transform (WT), the Gabor Wavelet Transform (GW), and the Gray Level Co-Occurrence Matrix (GLCM). Brief descriptions of these techniques are given below. The FrZMs are used to extract features from gray images, as presented in [61]. This process is shown in Eq. (1) [62]:

Z^α_{n,m} = (α(n + 1)/π) ∫₀^{2π} ∫₀^{1} FrR_{αn,m}(r) g(r, θ) e^{−jmθ} r dr dθ (1)

where α ∈ R⁺ denotes the fractional parameter, g(r, θ) is a gray image, and FrR_{αn,m}(r) denotes the real-valued radial polynomial, calculated by Eq. (2); n and m define the order, where |m| ≤ n, n − |m| is even, and j² = −1. The calculation of FrZMs needs a linear mapping of the image coordinate system to a proper space inside a unit circle, because the moments are defined in polar coordinates (r, θ) with |r| ≤ 1 [63]. This step is calculated by Eq. (3), where N defines the number of pixels in the image. The WT is a method used for signal analysis and feature extraction [64].
It breaks a signal up into scaled and shifted versions of the mother wavelet: the signal is represented as a sum of the signal multiplied by shifted and scaled versions of the wavelet function ψ. The continuous WT (CWT) of a signal x(t) can be computed as:

CWT(a, b) = (1/√|a|) ∫ x(t) ψ*((t − b)/a) dt

where ψ denotes the mother wavelet, and a, b ∈ R represent the scale and shift parameters, respectively, with a ≠ 0. The GW is a popular technique that uses filters for extracting image features [65]. It is calculated by Eq. (7), where µ and ν represent the kernels' orientation and scale, and k_(µ,ν) denotes the Gabor vector as in Eq. (8), with k_ν = k_max/f^ν. Here, k_max represents the maximum frequency, f represents the spacing factor among kernels in the frequency space, and φ = πµ/8. The GLCM is a statistical technique applied to extract texture features from an image [66]. The GLCM uses five equations to perform its task, as follows:
• The contrast (con) represents the amount of local variation in an image I_ij; it is calculated by Eq. (9).
• The correlation (corr) represents the relationship among the image pixels; it is calculated by Eq. (10).
• The energy represents the textural uniformity; it is calculated by Eq. (11).
• The entropy defines the randomness of the intensity distribution; it is calculated by Eq. (12).
• The homogeneity (hg) defines the closeness of the distribution; it is calculated by Eq. (13).
where L defines the number of gray levels in the image and p_ij is the number of transitions between levels i and j. Random walks have an important effect on the efficiency and quality of MA. The Lévy flight may be considered the most popular random-walk distribution, in which the jump lengths have a heavy-tailed probability distribution. For this reason, the Lévy flight is employed in MA as a more effective source of random walks than the Gaussian distribution.
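The five GLCM descriptors above have standard closed forms (contrast Σ(i − j)²p_ij, correlation of the index pair, energy Σp_ij², entropy −Σp_ij log p_ij, homogeneity Σp_ij/(1 + (i − j)²)). Since Eqs. (9)–(13) did not survive extraction, the following sketch restates these standard formulas in code; the log base (2) and the assumption that p is normalized to sum to 1 are our choices, not taken from the paper.

```python
import numpy as np

def glcm_features(p):
    """Compute the five GLCM texture descriptors from a normalized
    co-occurrence matrix p (entries of p sum to 1)."""
    L = p.shape[0]
    i, j = np.meshgrid(np.arange(L), np.arange(L), indexing="ij")
    eps = 1e-12                                        # guards log(0) and /0
    mu_i = (i * p).sum(); mu_j = (j * p).sum()         # marginal means
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())        # marginal std devs
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    return {
        "contrast":    ((i - j) ** 2 * p).sum(),           # local variation
        "correlation": (((i - mu_i) * (j - mu_j) * p).sum()
                        / (sd_i * sd_j + eps)),            # pixel correlation
        "energy":      (p ** 2).sum(),                     # textural uniformity
        "entropy":     -(p * np.log2(p + eps)).sum(),      # intensity randomness
        "homogeneity": (p / (1.0 + (i - j) ** 2)).sum(),   # distribution closeness
    }
```

For a perfectly uniform co-occurrence matrix, energy is minimal and entropy maximal, which matches the interpretation of these descriptors given above.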
Several types of heavy-tailed distributions, such as the Mittag-Leffler, Cauchy, Pareto, and Weibull distributions, can be applied in MA to emulate the random walk of the algorithm. In this section, we present the details and mathematical formulas of these heavy-tailed distributions, as follows:
• Mittag-Leffler distribution (ML_D) [43]: a random variable follows the ML_D if its distribution can be defined via the following function, where the ML_D is heavy-tailed when U > 0 and δ ∈ [0, 1]. The symbols γ and δ denote the scale and shape parameters, which take the values 4.5 and 0.8, respectively; w and v denote uniform random numbers, and τ is a Mittag-Leffler random number.
• Pareto distribution (P_D) [44]: a random variable follows the Pareto distribution if it has the following tail pattern, where a and b are the scale and shape parameters, with values 1.5 and 4.5, respectively.
• Cauchy distribution (C_D) [45]: a random variable follows the Cauchy distribution if it has the following tail formula, where β and µ are the scale and location parameters, with values 4.5 and 0.8, respectively.
• Weibull distribution (W_D) [46]: a random variable follows the Weibull distribution if it has the following tail formula, where k and ζ are the scale and shape parameters. The Weibull distribution is heavy-tailed when ζ < 1; therefore, in the current work, the tuned values of k and ζ are 4 and 0.3, respectively.
The cuckoo search (CS) was introduced by Yang and Deb [36], inspired by the natural brood-parasitic breeding behavior of the cuckoo.
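As a concrete illustration, the four distributions with the parameter values quoted above can be sampled with NumPy. The Mittag-Leffler sampler below uses a Kozubowski-style inversion formula, which is one common choice; it is an assumption here, since the paper does not state which generator it uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def mittag_leffler(gamma=4.5, delta=0.8, size=1):
    # Kozubowski-style inversion sampler (assumed; not specified in the paper)
    u = rng.random(size); v = rng.random(size)
    return -gamma * np.log(u) * (np.sin(delta * np.pi) / np.tan(delta * np.pi * v)
                                 - np.cos(delta * np.pi)) ** (1.0 / delta)

def pareto(a=1.5, b=4.5, size=1):
    # Pareto-I with scale a and shape b (rng.pareto draws Lomax samples)
    return a * (1.0 + rng.pareto(b, size))

def cauchy(beta=4.5, mu=0.8, size=1):
    # location mu, scale beta
    return mu + beta * rng.standard_cauchy(size)

def weibull(k=4.0, zeta=0.3, size=1):
    # scale k, shape zeta; heavy-tailed when zeta < 1
    return k * rng.weibull(zeta, size)
```

Any of these samplers can be dropped in as the step-length source of the random walk, replacing the Lévy flight.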
In [36], the authors modeled this behavior mathematically via three main hypotheses: (I) each cuckoo lays one egg at a time; (II) the cuckoo puts its egg in a randomly chosen nest, and the fittest nests are retained for the next generation; and (III) the number of available host nests is bounded, and a host bird can detect a stranger egg with a probability P_s ∈ [0, 1]. These behaviors can be formulated mathematically using Lévy flights. For each cuckoo i, a new solution U^(t+1) is obtained based on a Lévy flight as in Eq. (18):

U_i^(t+1) = U_i^t + σ ⊗ Lévy(λ) (18)

where σ > 0 is the step size; as per the literature, it is tuned to 1. A Lévy flight is a random walk whose jumps are drawn from a Lévy distribution. For the local random walk, the solution is updated using the following equation:

U_i^(t+1) = U_i^t + s ⊗ H(P_s − ϵ) ⊗ (U_j^t − U_k^t) (19)

where the product ⊗ denotes entry-wise multiplication, U_j and U_k are two randomly selected solutions, H is the Heaviside function, s is the step size, and ϵ is a random number drawn from a uniform distribution. Yang and Deb [36] utilized a switching probability P_s to transition between the local and global random walks and set its value to 0.25 to achieve a balanced transition. Recently, Yousri et al. [40] proposed a novel CS variant that improves the global cuckoo walk of Eq. (18) by accounting for the memory of the cuckoo during motion, based on fractional calculus (FC). In the FO-CS, four previous terms from memory are saved for each cuckoo during its motion, and the memory is updated on a first-in-first-out basis. Moreover, the switching probability P_s is computed based on the Beta function and the FC parameters, as follows: the enhanced global random walk of the cuckoo based on the previous four memory terms (m = 4) is updated according to [40], where σ refers to the derivative-order coefficient.
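A minimal sketch of the two walks of Eqs. (18) and (19) in Python follows. Mantegna's algorithm is used to draw the Lévy steps; this is a standard choice but an assumption, since the paper does not state which Lévy sampler it uses.

```python
import numpy as np
from math import gamma, sin, pi

rng = np.random.default_rng(42)

def levy_flight(dim, lam=1.5):
    """Lévy-stable step lengths via Mantegna's algorithm (assumed sampler)."""
    sigma_u = (gamma(1 + lam) * sin(pi * lam / 2)
               / (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u = rng.normal(0.0, sigma_u, dim)
    v = rng.normal(0.0, 1.0, dim)
    return u / np.abs(v) ** (1 / lam)

def global_walk(U, sigma=1.0):
    # Eq. (18): U^(t+1) = U^t + sigma (x) Levy(lambda)
    return U + sigma * levy_flight(U.shape[0])

def local_walk(U, Uj, Uk, Ps=0.25):
    # Eq. (19): U^(t+1) = U^t + s (x) H(Ps - eps) (x) (U_j - U_k)
    s = rng.random()
    eps = rng.random(U.shape[0])
    H = (Ps - eps > 0).astype(float)       # Heaviside gate per dimension
    return U + s * H * (Uj - Uk)
```

Setting Ps = 0 closes the Heaviside gate entirely, so the local walk leaves the solution unchanged, which illustrates how P_s controls the fraction of perturbed dimensions.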
Increasing the number of memory terms increases the execution time; therefore, in the current study, the number of terms is selected as m = 4 to maintain an acceptable execution time. In the FO-CS, the switching probability P_s is generated using the B (Beta) distribution. The B function is calculated for σ from 0.1 to 1 with step 0.1 and m = 4 memory terms using Eq. (21b), where B refers to the Beta function with inputs of the derivative-order vector (σ_vec), varied from 0.1 to 1 with step 0.1, and the number of memory terms m = 4. The generated distribution Z has minimum and maximum values min(Z) and max(Z), and the normalization interval limits are c = 0.2 and d = 0.3. The index l = 1, 2, 3, ..., 10 denotes the index of the derivative order; the recommended value of the derivative order is 0.3, so the index l is 3 [40]. The structure of the proposed COVID-19 classification approach is given in Fig. 2, and the following steps summarize it. In general, the proposed COVID-19 classification method consists of two stages. The first stage extracts the features using a set of methods, as discussed in Section 3. In the second stage, depending on the variant (FO-CS_ML, FO-CS_P, FO-CS_C, or FO-CS_W), the Lévy flight term of the global walk is replaced by a sample from the corresponding heavy-tailed distribution. The details of the stages of the proposed method are presented below. In the first stage, the X-ray COVID-19 images are used as input to our methods. Then, for each image in the current data-set, a set of features is extracted using Fractional Zernike Moments, Wavelet, Gabor Wavelet, and GLCM, as discussed in Section 3.1. All of these features are collected in one matrix, where each row represents the features extracted from its corresponding COVID-19 image. This matrix of extracted features is used as input to the second stage (i.e., the developed FO-CS method) to determine the irrelevant features, which are removed, and the relevant features, which are kept.
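The memory-based global walk can be sketched as follows. The four-term weighting below is the Grünwald–Letnikov truncation commonly used in fractional-order swarm optimizers; the exact coefficients and update form of [40] may differ, so this is an illustrative sketch rather than the paper's equation.

```python
import numpy as np
from collections import deque

def fo_memory_step(memory, step, sigma=0.3):
    """One fractional-order position update using the last four positions.
    `memory` is a deque of the previous positions, newest first; `step` is
    the heavy-tailed (or Levy) perturbation for this iteration.
    Coefficients: 4-term Grunwald-Letnikov truncation (assumed form)."""
    c = [sigma,
         0.5 * sigma * (1 - sigma),
         (1 / 6) * sigma * (1 - sigma) * (2 - sigma),
         (1 / 24) * sigma * (1 - sigma) * (2 - sigma) * (3 - sigma)]
    new = sum(ci * ui for ci, ui in zip(c, memory)) + step
    memory.appendleft(new)   # FIFO: newest position in ...
    memory.pop()             # ... oldest position out
    return new
```

With an all-zero history the update reduces to the raw step, and as the history fills, the binomial-decaying weights blend the recent trajectory into each move, which is the memory effect the FO-CS exploits.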
This stage is considered the main core of the proposed COVID-19 detection using X-ray images, since it aims to separate normal and abnormal images first and then to determine COVID-19 patients among the abnormal images. This aim is achieved using the following steps. The developed FO-CS feature selection method begins by dividing the data-set (the features from the first stage) into training and testing sets, representing 80% and 20% of the data-set, respectively. Then, a population of N real-valued solutions is generated from a uniform random number. This step is formulated as:

U_{i,j} = LB_j + r_1 × (UB_j − LB_j)

In the above equation, LB_j and UB_j denote the lower and upper boundaries of the search space at dimension j, and r_1 is a random number drawn from a uniform distribution in the interval [0, 1]. In the next step, the solutions start updating by computing the fitness value of each of them. Since the FS problem is a discrete problem, the real value of each solution is converted to binary to deal with this type of problem. The features corresponding to 1's in BU are then selected, and the other features are removed, followed by using Eq. (28) to evaluate the performance of the selected features. The global and local walks (Eqs. (18) and (20)) are then applied to update the current solution, and Eq. (21b) is used to compute the switching probability P_s, which determines the solutions that will be replaced by new ones. Before starting a new iteration, the memory window is updated using a first-in-first-out approach, as depicted in Algorithm 1. The main target of the final step is to check whether the second step will be conducted again or not, by checking whether the termination conditions have been reached. If they have not been reached, step 2 (i.e., updating the solutions) is repeated; otherwise, the method returns the best solution U_b.
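The initialization, binarization, and fitness steps just described can be sketched as follows. The 0.5 binarization threshold and the weights in the fitness function are assumptions: the paper only states that real values are converted to binary and that Eq. (28) balances the error against the number of selected features.

```python
import numpy as np

rng = np.random.default_rng(1)

def init_population(N, dim, LB=0.0, UB=1.0):
    # U_{i,j} = LB_j + r1 * (UB_j - LB_j), r1 ~ U(0, 1)
    return LB + rng.random((N, dim)) * (UB - LB)

def binarize(U, threshold=0.5):
    # Entries above the threshold mark selected features (1), others
    # dropped (0); the 0.5 threshold is an assumed convention.
    return (U > threshold).astype(int)

def fitness(BU_row, error_rate, rho=0.99):
    # Assumed wrapper-FS fitness: weighted sum of classification error
    # and the fraction of selected features (weights rho, 1 - rho).
    frac = BU_row.sum() / BU_row.size
    return rho * error_rate + (1 - rho) * frac
```

Lower fitness is better: a solution is rewarded both for classifying accurately and for keeping few features.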
In this step, the testing set is used as input to the KNN classifier; however, the features corresponding to 0's in U_b are removed first. The target of the testing set is then predicted, and the performance of this output is computed using different metrics. The main loop of Algorithm 1 (abridged) proceeds as follows:
6: Get a cuckoo randomly.
7: if Dis is ML_D then
8: Utilize the modified equation of motion, Eq. (22).
...
Update the past solutions (memory terms, r) on a FIFO basis.
Compute the fitness function Fit_i (i = 1, 2, ..., N).
Select a nest j from the n nests randomly and compute its fitness Fit_j.
21: if Fit_i < Fit_j then
22: Replace U_j by the new solution U_i.
23: end if
24: Calculate P_s based on r = 4 and β = 0.3 via Eq. (21b).
25: Abandon a fraction P_s of the worst nests and build new ones via Eq. (19).
26: Retain the best solutions.
27: Sort the solutions and determine the current best.
28: t = t + 1.
29: end while
30: Return the best solution.
To validate the performance of the developed COVID-19 detection method, a set of experimental series is performed. The aim of the first experimental series is to evaluate the performance of the main core of our COVID-19 detection method (i.e., FO-CS based on heavy-tailed distributions) on a set of eighteen UCI data-sets. The second experimental series tests the applicability of the developed COVID-19 detection method on two real-world COVID-19 image data-sets with different characteristics. In the first series, eighteen well-regarded data-sets from the UCI repository [47] are used; their specifications are listed in Table 1. As shown in Table 1, the examined benchmark problems range from small to high dimension, in order to evaluate the efficiency of the proposed algorithm. In this series, the first stage of the proposed COVID-19 X-ray image classification (i.e., feature extraction) is not used.
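The testing step, where the best mask U_b filters the feature columns before KNN prediction, can be sketched with a plain NumPy nearest-neighbor classifier. The value k = 5 and the Euclidean metric are assumptions; the paper does not report the KNN configuration.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, mask, k=5):
    """Predict test labels after dropping the feature columns whose mask
    entry is 0 (i.e., features corresponding to 0's in the best solution).
    k = 5 and the Euclidean metric are assumed settings."""
    Xtr = X_train[:, mask == 1]
    Xte = X_test[:, mask == 1]
    preds = []
    for x in Xte:
        d = np.linalg.norm(Xtr - x, axis=1)            # distances to train set
        nearest = y_train[np.argsort(d)[:k]]           # labels of k closest
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])          # majority vote
    return np.array(preds)
```

Masking before prediction is what makes the wrapper evaluation honest: a noisy feature that U_b drops cannot distort the test-set distances.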
We compared the improved FO-CS variants with several existing MA, including the original CS, Henry gas solubility optimization (HGSO) [67], Harris hawks optimization (HHO) [68], the Genetic Algorithm (GA) [69], the Salp Swarm Algorithm (SSA) [70], the Whale Optimization Algorithm (WOA) [71], and the Grey Wolf Optimizer (GWO) [72]. These algorithms are employed with their original implementations and parameter values. All the algorithms were implemented with a population size of 15 and 50 iterations using Matlab R2018b on Windows 10, 64-bit; computations were performed on a 2.5 GHz CPU with 16 GB RAM. We employed several performance measures to evaluate the improved FO-CS variants, as follows.
• Average accuracy (Avg_acc): tests the efficiency of the algorithm in predicting the target labels of each class over a set of N_r runs:

Avg_acc = (1/N_r) Σ_{k=1}^{N_r} Acc^k_Best, Acc^k_Best = (TP + TN)/(TP + FN + FP + TN) (29)

• Standard deviation (STD): measures the dispersion of the accuracy values over all runs around their average.
• Average number of selected features (AVG_|BX_Best|): tests the ability of the algorithm to select the smallest set of relevant features over all runs, where |.| represents the cardinality of BX^k_Best at the kth run.
• Average fitness value (AVG_Fit): tests the performance of the algorithm in balancing the ratio of selected features against the error.
In this section, the results of the proposed fractional-order cuckoo search algorithm using five types of distributions are given: the Lévy distribution (FO-CS), the Mittag-Leffler distribution (FO-CS_ML), the Pareto distribution (FO-CS_P), the Cauchy distribution (FO-CS_C), and the Weibull distribution (FO-CS_W). Table 2 reports the accuracy of the proposed methods compared to the competitor optimization methods.
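The four measures above aggregate per-run bests into one summary; a small sketch makes the bookkeeping explicit (the population standard deviation is assumed, as the paper does not say whether the sample or population form is used).

```python
import numpy as np

def accuracy(tp, tn, fp, fn):
    # Eq. (29): Acc = (TP + TN) / (TP + FN + FP + TN)
    return (tp + tn) / (tp + tn + fp + fn)

def summarize_runs(best_accs, best_masks, best_fits):
    """Aggregate N_r independent runs into the four reported measures:
    best_accs  - best accuracy per run,
    best_masks - binary feature mask of the best solution per run,
    best_fits  - best fitness value per run."""
    return {
        "Avg_acc":  float(np.mean(best_accs)),                       # mean best accuracy
        "STD":      float(np.std(best_accs)),                        # dispersion (population form assumed)
        "AVG_|BX|": float(np.mean([m.sum() for m in best_masks])),   # mean #selected features
        "AVG_Fit":  float(np.mean(best_fits)),                       # mean best fitness
    }
```

Reporting all four together is what allows the later tables to separate accuracy gains from feature-count reductions.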
It is apparent from Table 2 that the proposed FO-CS_ML and FO-CS_P achieved almost the same performance among the proposed methods; each obtained the best results on five of the eighteen data-sets. They gave better results than the other proposed methods (i.e., FO-CS, FO-CS_C, and FO-CS_W) in terms of the classification accuracy measure. The proposed FO-CS and FO-CS_W also showed almost the same performance; each excelled in three cases, and they outperformed only one of the other proposed methods (i.e., FO-CS_C) according to this measure. The conventional CS performed better than only one of the proposed methods (FO-CS) over all the data-sets used in this paper in terms of classification accuracy. It is worth mentioning that the proposed FO-CS_ML and FO-CS_P obtained better results in five out of eighteen cases compared to all other comparative methods (i.e., HHO, HGSO, WOA, SSA, GWO, and GA). Meanwhile, SSA got the best results in four cases (i.e., DS1, DS11, DS17, and DS18) compared to all other methods. Generally, in terms of the accuracy measure, the proposed FO-CS with the various distribution approaches outperformed the other state-of-the-art methods, mainly when FO-CS_ML and FO-CS_P were used. Table 3 shows the number of selected features achieved by the feature selection algorithms. From this table, we can notice that the proposed FO-CS_W selected the smallest number of features on seven data-sets (DS6, DS7, DS11, DS13, DS16, DS17, and DS18). Among the proposed FO-CS methods with the used distribution approaches, FO-CS_W obtained the smallest number of selected features, followed by FO-CS_ML. Likewise, in comparison with all tested methods, most of the best cases (in terms of the smallest number of selected features) were achieved by FO-CS_W. WOA achieved the smallest number of selected features on three data-sets.
FO-CS_W gave a better performance than the other methods on most of the used data-sets according to the number-of-selected-features measure. The high performance of the FO-CS with the Weibull distribution (FO-CS_W) demonstrates its capability in reducing the number of selected features. Table 4 summarizes the mean fitness function values obtained by the proposed algorithms. According to this measure, the proposed FO-CS_W is superior to the other proposed methods based on the modified fractional-order cuckoo search algorithm with the five distribution approaches. The proposed FO-CS_W obtained the best mean fitness values in four cases, and it is not worse than any other method on most of the used data-sets. In comparison with all of the methods in Table 4, FO-CS_ML also outperformed the comparative methods in terms of the mean fitness function, except HGSO; these two obtained an equal number of best results (each achieved the best results in two cases). The standard deviation (STD) values of the proposed algorithms' fitness functions are given in Table 5. From these results, it can be seen that the proposed FO-CS_W exceeded the other proposed methods on most of the used data-sets, achieving the best results in four cases in terms of the STD measure. (Fig. 3: Average accuracy over all tested data-sets for each algorithm.) Moreover, FO-CS_W achieved better fitness function results with excellent STD values compared to the other comparative methods in the majority of data-sets. Among all comparative methods in Table 5, HGSO obtained the best cases in most data-sets, which makes HGSO a powerful competitor in this research; nevertheless, the proposed FO-CS_W beats all the comparative methods according to the STD measure. For all feature selection algorithms, the minimum fitness function values are given in Table 6.
The obtained results proved that the proposed FO-CS_W and FO-CS_ML are able to achieve better results than the other proposed methods; each obtained the best results in six out of eighteen cases. However, FO-CS_W provides a better average of the best fitness values than FO-CS_ML over all the tested data-sets. The high performance obtained by the proposed FO-CS_W algorithm demonstrates its capability to balance exploration and exploitation during the optimization process. According to the obtained results, the performance of the proposed FO-CS_W algorithm is proven on both small-sized and large-sized data-sets. The data-sets DS16 and DS17 are comparatively large, and the minimum fitness values of the proposed FO-CS_W are clearly lower than those of all other comparative methods. According to the last evaluation measure (the worst fitness function values), the proposed FO-CS algorithms with the used distribution approaches, as shown in Table 7, achieved relatively better results than the other comparative methods. The proposed FO-CS_W achieved better results than both the other proposed algorithms and the other comparative methods, obtaining the best worst-case fitness values in five out of eighteen experimented data-sets. In general, the previous results showed that the FO-CS variants obtained the best values in about 60% of all measures: 67% for the accuracy measure over all data-sets, 50% for the number of selected features, 78% for the standard deviation, and 56% for each of the mean, minimum, and worst fitness function values. These results indicate that the variants of the proposed method can effectively compete with other algorithms in selecting the most relevant features.
Besides, from the given results, it can be recognized that fractional-order calculus has increased the efficacy of the proposed algorithms, especially with the Weibull distribution approach, in terms of fitness function values. The main reason is that the proposed FO-CS_W avoids premature convergence toward local regions and improves its exploration when addressing more complex cases. Hence, it achieves an excellent trade-off between the exploration and exploitation strategies. Furthermore, Figs. 3-5 depict the average accuracy, fitness value, and number of selected features, respectively, for each algorithm over all the tested data-sets. These figures provide additional evidence of the high performance of the proposed FO-CS variants with heavy-tailed distributions as feature selection approaches, which produce better results than the other methods. For further analysis of the results of the FO-CS variants as feature selection approaches, the non-parametric Friedman test [73] is applied, as in Table 8. From this table, it can be noticed that FO-CS_P provides the best results in terms of accuracy over all tested data-sets, whereas, in terms of the number of selected features, FO-CS_W takes the first rank with a mean rank of nearly 3.5, and it also has the smallest mean rank in terms of fitness value. Moreover, FO-CS_W obtained the first rank in the fitness function and the smallest number of selected attributes (for these measures, the smallest value is best), whereas it was ranked third in the accuracy measure (for which the largest value is best). Although FO-CS_W was ranked third in accuracy, the first and second ranks went to other variants of the proposed method (i.e., FO-CS_P and FO-CS_ML); therefore, the proposed method's variants occupied the first five ranks compared with the other algorithms.
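As an illustration of this kind of rank analysis, the Friedman test and per-algorithm mean ranks can be computed with SciPy. The accuracy matrix below is made up for demonstration and does not reproduce the paper's Table 8 data:

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Hypothetical accuracies: rows = data-sets, columns = three algorithms.
acc = np.array([
    [0.91, 0.89, 0.93],
    [0.85, 0.84, 0.88],
    [0.78, 0.80, 0.83],
    [0.92, 0.90, 0.95],
])

# Friedman test: do the algorithms' rank distributions differ across data-sets?
stat, p_value = friedmanchisquare(acc[:, 0], acc[:, 1], acc[:, 2])

# Mean rank per algorithm (rank 1 = highest accuracy on a data-set,
# hence the sign flip before ranking each row).
ranks = np.apply_along_axis(rankdata, 1, -acc)
mean_ranks = ranks.mean(axis=0)
```

Here the third algorithm is best on every data-set, so its mean rank is 1; a small p-value would indicate that the ranking differences are statistically significant.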
Generally, the proposed FO-CS_W has a superior exploratory search capability owing to the fractional-order calculus applied between the candidate solutions, which strengthens its exploration abilities during the search process when needed; in the following phase, it efficiently converges on the neighborhood of the explored locations by using the Weibull distribution approach, chiefly through the final iterations. Consequently, the proposed FO-CS_W establishes a balance between the exploration and exploitation strategies, and its influence can be recognized in the improved fitness results and the smallest number of selected features compared to all other comparative methods in this research. Since convergence speed is another aspect that should be evaluated for the recommended FO-CS variants, the convergence curves based on the optimal fitness function and the mean convergence curves for the basic CS and the FO-CS variants are drawn for the eighteen data-sets, as in Figs. 6-7 and Figs. 8-9, respectively, to demonstrate the efficiency of the recommended FO-CS_W. By inspecting the convergence curves of the minimum fitness functions in Figs. 6-7, it can be seen that CS suffers from stagnation in nearly 80% of the studied data-sets. In contrast, the FO-CS variants exhibit highly qualified performance, especially FO-CS_W, FO-CS_ML, and FO-CS_C, which show successful performance on 6, 6, and 5 data-sets, respectively. The average convergence curves across the set of independent runs are drawn in Figs. 8-9 to achieve an unbiased comparison; inspecting these figures leads to the same conclusion.

In this section, the FO-CS variants are evaluated using two real-world COVID-19 image data-sets with different characteristics. In this study, two data-sets are used to assess the performance of the developed FO-CS as the COVID-19 X-ray classification method.
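The replacement of the Lévy flight by heavy-tailed samples, as discussed above for FO-CS_W and its siblings, can be sketched as follows. The shape parameters are illustrative values rather than the paper's tuned settings, and the Mittag-Leffler distribution is omitted because NumPy ships no built-in sampler for it:

```python
import numpy as np

def heavy_tailed_step(dist, size, rng):
    """Draw a step vector from one of the heavy-tailed distributions used
    in place of the Lévy flight when generating a new cuckoo position."""
    if dist == "weibull":
        return rng.weibull(1.5, size)        # shape a = 1.5 (illustrative)
    if dist == "pareto":
        return rng.pareto(1.5, size)         # shape alpha = 1.5 (illustrative)
    if dist == "cauchy":
        return rng.standard_cauchy(size)     # heavy tails on both sides
    raise ValueError(f"unknown distribution: {dist}")

rng = np.random.default_rng(42)
# A CS-style candidate move: x_new = x + alpha * step * (x_best - x)
step = heavy_tailed_step("weibull", 8, rng)
```

Occasional large draws from these tails push candidates far from their current nests (exploration), while the bulk of small draws keeps moves local (exploitation).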
These data-sets contain either two classes (i.e., COVID and No-Findings), used for binary classification, or three classes (i.e., COVID, No-Findings, and Pneumonia), used for multi-class classification. The descriptions of these data-sets are given in the following.

1. Data-set 1: This X-ray image data-set was collected from two different sources. The first source is the COVID-19 X-ray image collection of Cohen JP [74], which provides 127 COVID-19 X-ray images. The second source, named Chest X-ray, was collected by Wang et al. [58] and contains normal and pneumonia X-ray images; from this source, 500 no-findings and 500 pneumonia images are used. Fig. 10 depicts samples from this data-set (Data-set 1).

2. Data-set 2: This data-set has 219 positive and 1341 negative COVID-19 images, respectively. It was collected through a collaboration between different teams, including Qatar University, the University of Dhaka (Bangladesh), and collaborating medical doctors from Malaysia and Pakistan [75], in addition to the Italian Society of Medical and Interventional Radiology (SIRM) [76]. Fig. 11 shows samples of Data-set 2.

In this section, the comparison results between the variants of FO-CS and the other algorithms are discussed. All of the algorithms use the features extracted by the methods introduced in Section 5.1. The features extracted from the COVID-19 X-ray images are the input of the proposed approach, which aims to reduce these features by removing the irrelevant ones. The parameter settings of each algorithm are the same as in the previous experiments on the UCI data-sets. Table 9 shows the average results in terms of accuracy, fitness value, and the number of selected features. It can be observed from this table that the FO-CS_W algorithm provides better results than the other algorithms in terms of accuracy. In general, it holds the first rank according to the STD and the worst value of accuracy.
However, according to the best accuracy value, FO-CS_ML is the highest. By analyzing the results on COVID-19 Data-set 1 in terms of fitness value, it can be observed that FO-CS_C has better values according to the mean and best fitness value, whereas GA and FO-CS_W provide better results in terms of the STD and the worst fitness value, respectively. According to the results on the second COVID-19 data-set (i.e., Data-set 2), the mean and best accuracy of FO-CS_W are better than those of the others. Also, for the worst accuracy value, FO-CS_ML and FO-CS_W have the same performance, but FO-CS_ML is more stable in terms of accuracy. In addition, FO-CS_W provides better performance in terms of the mean, best, and worst fitness values, but FO-CS_C is more stable than FO-CS_W in terms of fitness value. From all of these results, the following can be noticed. Firstly, the variants of FO-CS, in general, provide better results than the other algorithms. Secondly, the heavy-tailed distributions have a large effect on FO-CS, yielding higher accuracy and smaller fitness values than the Lévy flight distribution. Finally, using the concept of FO improves the performance of CS, as can be concluded by comparing the results of FO-CS with the traditional CS algorithm on real-world COVID-19 X-ray images. Fig. 12(a) shows the average fitness value across the number of iterations. From this convergence curve, it can be observed that FO-CS_W and FO-CS_C are the most competitive FS algorithms applied to the COVID-19 X-ray images, converging faster than the other algorithms, followed by FO-CS_ML and FO-CS_P, respectively, for the first data-set. FO-CS_W is the superior one in the case of the second data-set. From the previous results, we notice that the proposed FO-CS based on heavy-tailed distributions provides better results than the other algorithms for both the UCI data-sets and the COVID-19 X-ray images.
However, these variants have some limitations, such as determining the optimal values of the parameters of the heavy-tailed distributions, as well as the number of memory terms and the derivative order coefficient.

In this study, we presented four variants of the recent fractional-order cuckoo search optimization algorithm (FO-CS) that use heavy-tailed distributions instead of Lévy flights to classify the features extracted from COVID-19 X-ray data-sets. The considered heavy-tailed distributions include the Mittag-Leffler distribution, the Pareto distribution, the Cauchy distribution, and the Weibull distribution. These distributions can be used to improve the mutation operators of MAs and to escape from non-promising regions of the search space. To appraise the performance of the proposed variants before applying them to the COVID-19 classification approach, we used eighteen data-sets from the UCI repository and compared their results with several well-regarded MAs using several statistical measures. To this end, the FO-CS variants based on heavy-tailed distributions were employed for the COVID-19 classification task, classifying the data into normal patients, COVID-19 infected patients, and pneumonia patients. Two different data-sets were studied, and the novel FO-CS approaches were compared with numerous MAs, introducing a reliable and robust technique that classifies the COVID-19 data-sets efficiently and with high accuracy. The FO-CS based on the Weibull distribution showed its superiority compared to the other proposed variants and to recent well-regarded MAs. In future work, we will evaluate the proposed method in different applications, such as parameter estimation and solving various engineering problems.
Declaration of competing interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgment. This project is supported financially by the Academy of Scientific Research and Technology (ASRT), Egypt, Grant No. 6619.

References
[1] Serial interval of novel coronavirus (COVID-19) infections
[2] A review of coronavirus disease-2019 (COVID-19)
[3] Coronavirus disease 2019 (COVID-19): a perspective from China
[4] Coronavirus stress, optimism-pessimism, psychological inflexibility, and psychological health: Psychometric properties of the coronavirus stress measure
[5] Economic effects of coronavirus outbreak (COVID-19) on the world economy
[6] Classification of the COVID-19 infected patients using DenseNet201 based deep transfer learning
[7] Extracting possibly representative COVID-19 biomarkers from X-ray images with deep learning approach and image data related to pulmonary diseases
[8] Automated detection of COVID-19 cases using deep neural networks with X-ray images
[9] Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks
[10] COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnostic of the coronavirus disease 2019 (COVID-19) from X-ray images
[11] A combined deep CNN-LSTM network for the detection of novel coronavirus (COVID-19) using X-ray images
[12] COVID-19 detection using deep learning models to exploit social mimic optimization and structured chest X-ray images using fuzzy color and stacking approaches
[13] Feature Extraction and Image Processing for Computer Vision
[14] Feature selection and enhanced krill herd algorithm for text document clustering
[15] A new feature selection method to improve the document clustering using particle swarm optimization algorithm
[16] Feature selection based on improved runner-root algorithm using chaotic singer map and opposition-based learning
[17] Galaxy images classification using hybrid brain storm optimization with moth flame optimization
[18] Boosting salp swarm algorithm by sine cosine algorithm and disrupt operator for feature selection
[19] EMG feature selection for diagnosis of neuromuscular disorders
[20] Attention and feature selection for automatic speech emotion recognition using utterance and syllable-level prosodic features, Circuits Systems Signal Process.
[21] A novel machine learning based feature selection for motor imagery EEG signal classification in internet of medical things environment
[22] Device-free human micro-activity recognition method using WiFi signals
[23] Feature selection for text classification: A review
[24] Data Mining: Concepts and Techniques
[25] A global-ranking local feature selection method for text categorization
[26] The Elements of Statistical Learning: Data Mining, Inference and Prediction
[27] Multi-verse optimizer algorithm: a comprehensive survey of its results, variants, and applications
[28] Genetic algorithms in feature and instance selection
[29] Particle swarm optimization for feature selection in classification: A multi-objective approach
[30] Differential evolution for feature selection: a fuzzy wrapper-filter approach
[31] Feature selection via a novel chaotic crow search algorithm
[32] Feature selection for high-dimensional classification using a competitive swarm optimizer
[33] An evolutionary gravitational search-based feature selection
[34] No free lunch theorems for optimization
[35] Cuckoo search via Lévy flights
[36] Cuckoo search algorithm: a metaheuristic approach to solve structural optimization problems
[37] Cuckoo search: recent advances and applications
[38] A survey on applications and variants of the cuckoo search algorithm
[39] Modified cuckoo search algorithm with variational parameters and logistic map
[40] Fractional-order cuckoo search algorithm for parameter identification of the fractional-order chaotic, chaotic with noise and hyper-chaotic financial systems
[41] Escaping large deceptive basins of attraction with heavy-tailed mutation operators
[42] Fast genetic algorithms
[43] On Mittag-Leffler distributions and related stochastic processes
[44] Pareto distribution
[45] The skew-Cauchy distribution
[46] Generalized weighted Weibull distribution
[47] UCI machine learning repository
[48] COVID-19 identification in chest X-ray images on flat and hierarchical classification scenarios
[49] New machine learning method for image-based diagnosis of COVID-19
[50] CVDNet: A novel deep learning architecture for detection of coronavirus (COVID-19) from chest X-ray images
[51] COVIDX-Net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images
[52] CoroNet: A deep neural network for detection and diagnosis of COVID-19 from chest X-ray images
[53] Convolutional CapsNet: A novel artificial neural network approach to detect COVID-19 disease from X-ray images using capsule networks
[54] A deep learning approach to detect COVID-19 coronavirus with X-ray images
[55] Diagnosis and detection of infected tissue of COVID-19 patients based on lung X-ray image using convolutional neural network approaches
[56] Efficient feature selection method using real-valued grasshopper optimization algorithm
[57] A hyper learning binary dragonfly algorithm for feature selection: A COVID-19 case study, Knowl.-Based Syst.
[58] Improved cuckoo search and chaotic flower pollination optimization algorithm for maximizing area coverage in wireless sensor networks
[59] Reinforced cuckoo search algorithm-based multimodal optimization
[60] A novel improved cuckoo search algorithm for parameter estimation of photovoltaic (PV) models
[61] Image analysis by fractional-order orthogonal moments
[62] Fractional quaternion Zernike moments for robust color image copy-move forgery detection
[63] Quaternion pseudo-Zernike moments combining both of RGB information and depth information for color image splicing detection
[64] Spike detection using the continuous wavelet transform
[65] Application of Gabor wavelet and locality sensitive discriminant analysis for automated identification of breast cancer using digitized mammogram images
[66] Opposition-based moth-flame optimization improved by differential evolution for feature selection
[67] Henry gas solubility optimization: A novel physics-based algorithm
[68] Harris hawks optimization: Algorithm and applications
[69] Genetic algorithm-based heuristic for feature selection in credit risk assessment
[70] Improved salp swarm algorithm based on particle swarm optimization for feature selection
[71] The whale optimization algorithm
[72] Chaotic opposition-based grey-wolf optimization algorithm based on differential evolution and disruption operator for global optimization
[73] Relative power of the Wilcoxon test, the Friedman test, and repeated-measures ANOVA on ranks
[74] COVID-19 image data collection
[75] Can AI help in screening viral and COVID-19 pneumonia? 2020, arXiv preprint
[76] Italian Society of Medical and Interventional Radiology