authors: Koyuncu, Hasan; Barstuğan, Mücahid
title: COVID-19 discrimination framework for X-ray images by considering radiomics, selective information, feature ranking, and a novel hybrid classifier
date: 2021-06-17
journal: Signal Process Image Commun
DOI: 10.1016/j.image.2021.116359

In medical imaging procedures for the detection of coronavirus, confirmation of the diagnosis has special significance apart from medical tests. Imaging procedures are also useful for detecting the damage caused by COVID-19. Chest X-ray imaging is frequently used to diagnose COVID-19 and different pneumonias. This paper presents a task-specific framework to detect coronavirus in X-ray images. Binary classification of three different labels (healthy, bacterial pneumonia, and COVID-19) was performed on two differentiated data sets in which COVID-19 is treated as the positive class. First-order statistics, gray level co-occurrence matrix, gray level run length matrix, and gray level size zone matrix features were analyzed to form fifteen sub-data sets and to ascertain the necessary radiomics. Two normalization methods were compared to make the data meaningful. Furthermore, five feature ranking approaches (Bhattacharyya, entropy, Roc, t-test, and Wilcoxon) were examined to provide the necessary information to a state-of-the-art classifier based on Gauss-map-based chaotic particle swarm optimization and neural networks. The proposed framework was designed according to the analyses of radiomics, normalization approaches, and filter-based feature ranking methods. In the experiments, seven metrics were evaluated to objectively determine the results: accuracy, area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, g-mean, precision, and f-measure. The proposed framework showed promising scores on two X-ray-based data sets, with the accuracy and area under the ROC curve rates exceeding 99% for the classification of coronavirus vs. others.

COVID-19 has become a mortal pandemic, and its detection is vital not only for COVID-19-positive individuals but also for people who have been in close proximity to such individuals [1]. The early detection of coronavirus is important to provide the necessary medical intervention despite the unknown prognosis [2]. In this context, studies on computer-aided diagnosis systems can help medical experts support the diagnostic decision [3-5]. Here, we propose a task-specific framework for detecting coronavirus in X-ray images. For this purpose, framework analyses were performed by using four radiomics, two normalization approaches, five filter-based feature selection methods, and an efficient optimized classifier. The optimized classifier is kept constant as the final stage of the framework. These analyses were examined in detail to design the remaining parts of the framework and to realize the highest classification performance of which the framework was capable.

In one study [7], features of structured chest X-ray images were selected with social mimic optimization (SMO) and classified by an SVM-based model; the proposed model achieved 99.27% accuracy for multiclass classification. Ucar and Korkmaz [8] designed a diagnosis model employing Deep Bayes-SqueezeNet to categorize coronavirus in X-ray images. The data set used by them contained X-ray images belonging to three categories (1583 normal, 4290 pneumonia, and 76 COVID-19), and the data were augmented to obtain higher performance.
After augmentation, the proposed model achieved an overall accuracy of 98.26% for multiclass classification. Apostolopoulos and Mpesiana [9] compared deep learning methods by considering two X-ray-based data sets: one consisting of 224 COVID-19, 700 bacterial pneumonia, and 504 normal images, and the other comprising 224 COVID-19, 714 bacterial and viral pneumonia, and 504 normal cases. Pretrained models such as VGG19, MobileNet-v2, and Inception-ResNet-v2 were implemented for binary and multiclass classification, and the best accuracy rate (96.78%) was recorded for binary classification with MobileNet-v2. Butt et al. [10] constructed a CT image data set comprising 184 COVID-19, 194 influenza-A viral pneumonia, and 145 healthy cases. They proposed a model employing a three-dimensional (3D) convolutional neural network (CNN) and used it to segment and identify the candidate regions. The model achieved an area under the ROC (receiver operating characteristic) curve (AUC) score of 99.6% for coronavirus vs. noncoronavirus classification. Li et al. [11] used 4356 CT images to train a 3D deep learning model named COVNet. The collected data set was divided into two classes: COVID-19 and others. In experiments, COVNet achieved an AUC score of 96% for the detection of coronavirus. Kang et al. [12] used 2522 CT images containing 1495 COVID-19 and 1027 pneumonia cases; the V-Net model was employed to obtain the lung, lung lobes, and pulmonary segments, and the lesion region was subsequently segmented. Radiomic and handcrafted features were extracted from each CT image, and five classifiers were used to categorize them. The best classification accuracy (95.5%) was achieved by a neural network (NN) algorithm. Afshar et al. [13] used a capsule network (CN)-based model to classify X-ray images belonging to four classes (individuals with COVID-19, those with a viral disease, those with a bacterial disease, and normal individuals). The study focused on coronavirus vs. noncoronavirus classification to detect COVID-19. The CN-based structure achieved 95.7% accuracy; subsequently, the application of transfer learning to the CN increased the accuracy up to 98.3%. Mahdy et al. [14] performed binary classification on X-ray images comprising 15 normal and 25 COVID-19 cases. Multilevel thresholding was first performed on the images, and the thresholding results were classified into two categories (COVID-19 and normal) with an SVM. Experiments showed that the suggested pipeline achieved 97.48% accuracy for binary classification. Hemdan et al. [15] used different deep learning methods to classify 25 COVID-19 and 25 normal cases in an X-ray data set. The best classification accuracy (90%) was obtained by the VGG19 and DenseNet201 algorithms. Table 1 presents a review of state-of-the-art methods. In addition to diagnoses with imaging modalities, blood tests [16,17] and clinical data [18] have also been considered for COVID-19 diagnosis. Deep-learning-based approaches have been used frequently for COVID-19 classification. However, a comprehensive, task-affirmative framework is still needed to achieve remarkable results in coronavirus classification. In particular, there is a need for a framework designed to distinguish samples involving coronavirus from other labels. Studies on coronavirus detection generally perform binary and multiclass classification; in particular, the binary categorization of multiclass data is important.
In other words, the design of a task-specific framework should focus on the detection of the target illness, and it should be capable of revealing the samples corresponding to the illness from among other labels, including normal samples. An illness-based classification model can be designed via the binary classification of multiclass data pertaining to the illness and other labels. Accordingly, a coronavirus-specific framework intended solely for the detection of coronavirus can be designed; this constitutes the second need identified in the literature. As the first step in the development of a task-affirmative framework, the radiomic features to be extracted should be identified; these features help achieve reliable performance. At this point, the combination of these features and their in-depth analysis constitute the selective information to be processed within the model. First-order statistics (FOS), gray level co-occurrence matrix (GLCM), gray level run length matrix (GLRLM), and gray level size zone matrix (GLSZM) features are efficient radiomics frequently used in tasks such as adrenal tumor classification, esophageal cancer categorization, and brain tumor grading [19-24]. These radiomics have also been used with various imaging modalities such as X-ray, magnetic resonance (MR), and computed tomography (CT) imaging [19-24]. Yang et al. [19] designed a pipeline for esophageal cancer classification in X-ray images; GLCM features were evaluated as textural information and treated as one class of feature extraction tools. Liao et al. [20] noted that different radiomics obtained from MR images, namely FOS, GLCM, and GLSZM features, among others, could be used to make survival predictions for patients. Koyuncu et al. [21] used discrete wavelet transform (DWT) and GLCM features for the binary categorization of adrenal tumors in 114 CT images involving eight types of tumors. Barstuğan et al. [22] added GLRLM features to the DWT and GLCM radiomics to classify adrenal tumors in 122 MR images involving nine types of tumors. Yang et al. [23] graded rheumatoid arthritis in ultrasound images by using a problem-specific model developed on the basis of tests with GLCM features and SVM derivatives. As the second step in the development of a task-oriented model, feature ranking approaches, which are as significant as radiomics, should be chosen, since the essential information should be provided to the classifier to achieve the best performance. Raghavendra et al. [24] compared Bhattacharyya, entropy, Roc, t-test, and Wilcoxon rankings to find the ranking most coherent with the classifier unit; in experiments, echocardiographic images were categorized into two classes, normal and ischemic heart disease. Nascimento et al. [25] proposed a special pipeline to classify lymphoma images, in which the Wilcoxon ranking was used in the feature selection part alongside other methods to improve the classification performance. Momenzadeh et al. [26] developed a hidden-Markov-model-based framework considering Bhattacharyya, entropy, Roc, t-test, and Wilcoxon rankings to classify microarray data sets; they considered B-cell lymphoma, leukemia cancer, and prostate data sets to evaluate the performance of the framework. Koyuncu et al. [27] considered FOS features along with five feature ranking approaches (Bhattacharyya, entropy, Roc, t-test, and Wilcoxon) to classify high- and low-grade gliomas (brain tumors) defined in different MRI-phase combinations.
As the third step in the development of an efficient framework, the classifier unit, which is the most important section, should be implemented with a state-of-the-art classifier. Koyuncu [28] examined chaotic particle swarm optimization (CPSO) and compared different chaotic maps to determine the most effective CPSO variant. Gauss-map-based CPSO (GM-CPSO) achieved remarkable convergence against state-of-the-art methods for global function optimization. To test the efficiency of GM-CPSO, Koyuncu hybridized it with an NN, with the design of a hybrid classifier considered as the second task. GM-CPSO-NN was compared with recent and efficient hybrid classifiers for epileptic seizure recognition. In the data set, five categories (tumor area, nontumor area, closed-eye and open-eye recordings with no seizure activity, and seizure activity) were considered, and the seizure vs. nonseizure discrimination was performed with the highest accuracy (97.24%) by GM-CPSO-NN. In another study by Koyuncu [29], GM-CPSO-NN was compared with robust hybrid NNs and their derivatives. Koyuncu et al. [27] used GM-CPSO-NN as the classifier unit to design a task-affirmative framework for brain tumor classification; two optimized classifiers were compared with GM-CPSO-NN to identify the classifier most appropriate for the construction of the framework. According to the results, the framework containing GM-CPSO-NN had the highest accuracy score (90.18%) for the binary characterization of brain tumors.

In this study, four efficient radiomics (FOS, GLCM, GLRLM, and GLSZM) were evaluated to determine the best data combination, to enrich the selective information, and to design a task-affirmative framework for binary (coronavirus vs. noncoronavirus) classification. Furthermore, two normalization methods (minmax and z-score) were compared across the radiomics to make the data meaningful. Additionally, five feature ranking approaches (Bhattacharyya, entropy, Roc, t-test, and Wilcoxon) were used to obtain the necessary information to be provided to the classifier unit for achieving the highest performance. GM-CPSO-NN was chosen as the classifier owing to its efficiency in different classification tasks and its status as a state-of-the-art classifier. The salient features of this paper are as follows:
• It presents a comprehensive study on the design of a specific COVID-19 framework by using radiomics, selective information, feature ranking, and an efficient classifier.
• It presents a detailed study on coronavirus vs. noncoronavirus discrimination.
• It is the first study to develop an optimized NN-based framework for COVID-19 categorization.

The paper is organized as follows. The GM-CPSO-NN method, the radiomics and feature ranking approaches used in this study, and the general scheme of the framework analysis are presented in Section 2. Information on the data sets used, the experimental analysis performed, and interpretations are discussed in Section 3. Based on the experimental results, the proposed framework is finalized and presented in Section 4, along with a comparison with the literature. Finally, Section 5 provides the concluding remarks.

In this section, the design of the optimized classifier (GM-CPSO-NN) is described, the radiomics and feature ranking approaches used are briefly discussed, and the overall flow and specifications of the designed framework are visually interpreted. GM-CPSO was developed by Koyuncu [28] through chaotic behavior analysis performed with a particle swarm optimization (PSO) algorithm.
GM-CPSO uses (1) and (2) to obtain the new velocity and new position vectors, respectively [28]:

$$v_i(t+1) = \omega\, v_i(t) + c_1 r_1 \big(p_i(t) - x_i(t)\big) + c_2 r_2 \big(g(t) - x_i(t)\big) \tag{1}$$

$$x_i(t+1) = x_i(t) + v_i(t+1) \tag{2}$$

In (1) and (2), $v_i(t)$ and $v_i(t+1)$ are the current and new velocities, while $x_i(t)$ and $x_i(t+1)$ denote the current and new positions, respectively. Here, the velocity can be defined as the step size that limits the movement of the position. $p_i(t)$ and $g(t)$ are the individual best position of the $i$th particle and the global best solution (position) for the entire swarm (all particles), respectively; $c_1$ and $c_2$ are the acceleration constants moving the particles toward the individual and global best solutions; $r_1$ and $r_2$ are random numbers generated within the range (0,1); and $\omega$ is the inertia weight restricting the velocity. In PSO, $\omega$ is frequently set to a constant value or to a variable that is changed iteratively according to a linear descent. In GM-CPSO, $\omega$ is a variable adjusted according to the Gauss map defined in (3):

$$\omega(t+1) = \begin{cases} 0, & \omega(t) = 0 \\ \dfrac{1}{\omega(t)} \bmod 1, & \omega(t) \neq 0 \end{cases} \tag{3}$$

Here, the initial value $\omega(0)$ is set to 0.7 to achieve robust chaotic behavior [28-30]. As evident in (3), the Gauss map can be equal to zero during iterations, which eliminates the effect of the current velocity on the new velocity. Thus, solution diversity is improved by using or eliminating the current velocity term, which directly affects the new velocity and indirectly the new position. Table 2 presents the pseudocode of GM-CPSO.

Table 2. Pseudocode of GM-CPSO.
- Determine the parameter values and ranges
- Assign the first positions
- Calculate the error or profit
- Define $p_i(0) = x_i(0)$ and $g(0) = \mathrm{best}(p_i(0))$
- Update the inertia weight vector according to the maximum iteration number → (3)
While (iteration < maximum iteration number)
  - Compute the new velocity and position vectors → (1) and (2)
  - Calculate the error or profit
  If (minimum error or maximum profit is reached)
    - Save the best particle vector and its fitness
    - Break
  End
  - Attain the $p_i$ vector for every particle and $g$ for the entire swarm
  - iteration++
End

An NN generally uses backpropagation-type feedback methods. An error metric, usually the mean squared error (MSE), is used to assess the fitness of networks according to the desired and obtained outputs. Backpropagation-type methods can cause fluctuations in the error, and optimization algorithms are employed to improve the convergence of the error rate toward the minimum level [31]. In optimized NNs, the feedback algorithm of the NN is generally replaced with an optimization algorithm, and the resulting structure is referred to as a hybrid, hybridized, or optimized NN. To simplify the presentation of optimized NNs, we can focus on the flow of the optimization algorithm. In every iteration of the training phase, the optimized NN evaluates different networks, and the number of networks equals the particle number of the optimization algorithm. Different networks are compared to improve the classification of the patterns at each iteration. Furthermore, these networks are generated according to the update rules declared in the optimization algorithm. In optimized NNs, every network includes different weight-bias values, and these networks are generally transformed into vectors to be processed by the optimization algorithm. In other words, every network is represented as a weight-bias vector. In particular, a network of the optimized NN can be referred to as a particle, position, or weight-bias vector.
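To connect Eqs. (1)-(3) with this particle view, a minimal Python sketch of one GM-CPSO iteration over a swarm of such vectors is given below. It is our illustration rather than code from [28]; the function names and the velocity-clipping choice are assumptions.

```python
import numpy as np

def gauss_map(w):
    """Gauss map of Eq. (3): 0 when w == 0, otherwise the fractional part of 1/w."""
    return 0.0 if w == 0.0 else (1.0 / w) % 1.0

def gm_cpso_step(x, v, pbest, gbest, w, c1, c2, v_min, v_max):
    """One GM-CPSO update for the whole swarm.

    x, v, pbest: (particles, dimensions) arrays; gbest: (dimensions,) vector.
    Returns the new positions and velocities plus the next inertia weight.
    """
    r1 = np.random.rand(*x.shape)                    # r1, r2 drawn in (0, 1)
    r2 = np.random.rand(*x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. (1)
    v_new = np.clip(v_new, v_min, v_max)             # bound the step size
    x_new = x + v_new                                # Eq. (2)
    return x_new, v_new, gauss_map(w)                # Eq. (3) drives the next w

# The inertia weight starts at 0.7, as recommended for robust chaotic behavior.
```

In a full run, this step would be wrapped in the loop of Table 2, with `pbest` and `gbest` refreshed from the fitness values after every call.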
Briefly, in the hybrid method, the calculation of the fitness in the optimization approach (e.g., the third and seventh items in Table 2) involves the operation of the NNs and the determination of the MSE rate for each network used [27,28]. In optimized NNs, the flowchart of the optimization algorithm can be considered to describe the training phase, in which the aim is to detect the best weight-bias vector (network). The test part consists only of the evaluation of the test data with the best network determined by the optimization algorithm. In other words, the test part employs the best weight-bias vector (the best network), yielding the minimum MSE found in the training phase, to test the system performance. Notably, the performance of optimized NNs depends on the quality of the optimization approach used. In addition to the parameters of the optimization algorithm, optimized NNs involve the general NN parameters (hidden node number, maximum iteration number, and minimum error). Concerning the general structure of optimized NNs, the GM-CPSO-NN algorithm can be described as in Fig. 1, with the flow of GM-CPSO forming the training part of the classifier. As declared before, every network (weight-bias vector) can be considered as a particle (position vector) in the optimization method [27-29,31]. In Fig. 1, the parameter definitions include the aforementioned PSO parameters besides the NN parameters. $[x_{\min}, x_{\max}]$ denotes the boundaries of the weight-bias vectors, and it is advisable to choose the velocity boundaries ($[v_{\min}, v_{\max}]$) as 0.2 times the position boundaries for better convergence [31]. Furthermore, the total network number is defined to be equal to the population size (particle number) of the optimization algorithm. To achieve the best classification performance, the population size, hidden node number, acceleration constants, and maximum iteration number should be analyzed in detail.

FOS features are the mean, standard deviation, skewness, kurtosis, energy, and entropy, obtained from a histogram or acquired directly from an image. As one of the most popular and preferred radiomics, FOS features do not vary among studies [27]. However, the GLCM, GLRLM, and GLSZM features can differ from one study to another, depending on work preferences or the advice of the literature. Accordingly, we explain the GLCM, GLRLM, and GLSZM features below; for details on the determination of the FOS features, the reader is referred to Ref. [27].

Information acquired from a single pixel value does not provide detailed information about the texture of an image. Although such statistical measurements are easy to obtain, no distinctive information is obtained via single-pixel-based analysis. The GLCM [32] is based on the pixel intensity distribution of an image and is part of second-order statistics. The co-occurrence matrix, which provides information on pixel neighborhoods, contains significant information about the positions of pixels with similar gray levels [32]. It is defined as $P = [p(i,j \mid d, \theta)]$ and is used to determine the frequency features of pixels (gray value $i$) and their neighboring pixels (gray value $j$) at distance $d$ and in direction $\theta$ [32].
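As a concrete illustration of this definition, the sketch below builds a normalized co-occurrence matrix for a single offset $(d, \theta)$ and computes three of the descriptors listed in the next paragraph. It is a minimal re-implementation for clarity, not the extraction code used in the paper; the quantization to a small number of gray levels and the function names are our assumptions.

```python
import numpy as np

def glcm(img, dx=1, dy=0, levels=8):
    """Normalized co-occurrence matrix p(i, j | d, theta) for one pixel offset."""
    q = (img.astype(float) / (img.max() + 1e-12) * (levels - 1)).astype(int)
    h, w = q.shape
    P = np.zeros((levels, levels))
    for y in range(max(0, -dy), h - max(0, dy)):      # stay inside the image
        for x in range(max(0, -dx), w - max(0, dx)):
            P[q[y, x], q[y + dy, x + dx]] += 1        # count the pixel pair
    return P / P.sum()

def glcm_features(p):
    """Three classic Haralick descriptors from a normalized GLCM [32]."""
    i, j = np.indices(p.shape)
    asm = np.sum(p ** 2)                              # angular second moment
    contrast = np.sum(p * (i - j) ** 2)
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return asm, contrast, entropy
```

For example, `glcm_features(glcm(image, dx=1, dy=0))` evaluates the 0° direction at distance 1; the remaining descriptors follow the same summation pattern over $p(i,j)$.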
In this study, 19 efficient features were extracted using the GLCM method: angular second moment, contrast, correlation, autocorrelation, entropy, cluster prominence, cluster shade, dissimilarity, sum of squares: variance, difference variance, inverse difference moment, inverse difference, maximum probability, sum average, sum entropy, sum variance, difference entropy, information measure of correlation 1, and information measure of correlation 2 were evaluated in the X-ray images. Let $N_g$ be the number of different gray levels in a quantized image, and let $p(i,j)$ denote the $(i,j)$th entry in a normalized gray level spatial dependence matrix. Let $p_x(i)$ and $p_y(j)$ signify the $i$th and $j$th entries in the marginal probability matrices obtained by summing the rows and columns of $p(i,j)$, respectively. Furthermore, let $\mu_x$ and $\mu_y$ be the mean values of $p_x$ and $p_y$, and let $\sigma_x$ and $\sigma_y$ represent their standard deviations. The features used can then be expressed as follows [32-34].

The GLRLM [35] reveals the size of homogeneous runs for each gray level. Let $G$ be the number of gray levels, $R$ the longest run, and $W$ the number of pixels. The obtained GLRLM has size $G \times R$, and each element $p(i,j \mid \theta)$ gives the number of occurrences of runs with direction $\theta$, gray level $i$, and run length $j$. This study extracted short run emphasis (SRE), long run emphasis (LRE), gray level nonuniformity (GLN), run length nonuniformity (RLN), run percentage (RP), low gray level run emphasis (LGRE), and high gray level run emphasis (HGRE) from the GLRLM [35]. The features used are presented in (29)-(35) [35].

The GLSZM [36] is an advanced statistical matrix used for texture characterization. In the GLSZM, the run length principle is used as the calculation method. The width of the matrix depends on the zone sizes/surfaces. If the texture homogeneity increases, the zones increase in size and the matrix width increases; therefore, the resulting matrix becomes wider. When the texture contains considerable noise, and hence many single-pixel zones, the number of zones is large, sometimes with close gray level values, which appear in one part of the matrix. By contrast, when the texture is homogeneous, there are fewer zones, and the matrix contains more zeros and is flattened. The GLSZM computes one matrix for all directions, unlike the GLCM and GLRLM approaches. In this study, 12 efficient features were considered for the GLSZM: small zone emphasis (SZE), large zone emphasis (LZE), gray level nonuniformity (GLNz), zone size nonuniformity (ZSN), zone percentage (ZP), low gray level zone emphasis (LGZE), high gray level zone emphasis (HGZE), small zone low gray level emphasis (SZLGE), small zone high gray level emphasis (SZHGE), large zone low gray level emphasis (LZLGE), large zone high gray level emphasis (LZHGE), and gray level variance (GLV). Let $P$ be the size zone matrix, $p$ the normalized size zone matrix, $N_g$ the number of discrete intensity values in the image, $Z_s$ the size of the largest homogeneous region in the volume of interest (VOI), and $N_z$ the number of homogeneous zones in the image [36]. The evaluated features can then be given by (36)-(49) [36].

This section presents a short summary of the five feature ranking approaches (Bhattacharyya, entropy, Roc, t-test, and Wilcoxon) that are frequently applied to upgrade system performance [27]. The Bhattacharyya distance (BD) measures the similarity between two probability distributions; for two classes with normal distributions, it is calculated as shown in (50):

$$BD = \frac{1}{4}\ln\!\left[\frac{1}{4}\left(\frac{\sigma_1^2}{\sigma_2^2}+\frac{\sigma_2^2}{\sigma_1^2}+2\right)\right]+\frac{1}{4}\,\frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2} \tag{50}$$
Here, $\mu_1$ and $\mu_2$ are defined as the means of the first and second classes, respectively, while $\sigma_1$ and $\sigma_2$ are the standard deviations of the classes [26,27]. The entropy test (e) is also referred to as the Kullback-Leibler distance or divergence, and it is used for classes with normal distributions. For two such classes, the symmetric divergence is computed as in (51):

$$e = \frac{1}{2}\left(\frac{\sigma_1^2}{\sigma_2^2}+\frac{\sigma_2^2}{\sigma_1^2}-2\right)+\frac{1}{2}\,(\mu_1-\mu_2)^2\left(\frac{1}{\sigma_1^2}+\frac{1}{\sigma_2^2}\right) \tag{51}$$

The t-test (t) score examines the mean values of two samples and focuses on the difference between them. The t score is obtained using the means, standard deviations, and sample sizes ($n_1$, $n_2$), as shown in (52) [26]:

$$t = \frac{\mu_1-\mu_2}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}} \tag{52}$$

The absolute value of t is used to sort the features; larger values indicate higher importance, as in the entropy test [37]. The ROC curve is obtained using the tail functions $\bar F_i(x)$, which are equal to $1-F_i(x)$ for $i=1,2$, where $F_i(x)$ denotes the distribution function of $X$ for class $i$, with $F_1(x)$ and $F_2(x)$ being the distribution functions of the two classes. The ROC is defined as follows [37]:

$$\mathrm{ROC}(u)=\bar F_1\!\left(\bar F_2^{-1}(u)\right),\quad u\in(0,1) \tag{53}$$

The Wilcoxon test, also termed the Mann-Whitney U test, calculates feature importance via medians, and it can be applied when a normal distribution is not observed. If $R$ denotes the sum of the ranks of the samples in the first class, the $z$ score used for ranking is obtained as follows [27]:

$$z=\frac{R-n_1(n_1+n_2+1)/2}{\sqrt{n_1 n_2 (n_1+n_2+1)/12}} \tag{54}$$

In this paper, a comprehensive framework is proposed for COVID-19 discrimination. X-ray images were first obtained from different databases to form two data sets. FOS, GLCM, GLRLM, and GLSZM features were extracted from the images to determine the radiomics to be used. Selective information was acquired by considering the necessary combination of the radiomics; fifteen sub-data sets with different radiomic(s) were evaluated to ascertain this combination. Two normalization approaches (minmax and z-score) were also tested to identify the appropriate one for the framework. After the identification of the necessary radiomics and normalization approach, five feature ranking methods (Bhattacharyya, entropy, Roc, t-test, and Wilcoxon) were used to determine the configuration for which the best performance of the framework was achieved. All evaluations were performed with GM-CPSO-NN, which has proven itself in the classification of different medical data sets. Fig. 2 shows the general scheme of the framework analysis. As evident in Fig. 2, data normalization was performed after the radiomics were obtained, and the data combination analysis was performed together with the normalization technique. Subsequently, feature ranking was applied to the data that included the required radiomics combination and had been normalized with the appropriate technique, finalizing the COVID-19 framework in which GM-CPSO-NN is kept as the classifier unit.

The first part of this section presents the features of the data used, and the second part discusses the experimental analysis and interpretations. Experiments were organized into three categories: (1) detection of the necessary combination of effective radiomics, (2) identification of the appropriate normalization approach, and (3) determination of the essential feature ranking method. Seven metrics (accuracy, AUC, sensitivity, specificity, g-mean, precision, and f-measure) were used to comprehensively evaluate the framework performance [10,17]; their equations are presented in (55)-(61). Twofold cross-validation was used as the test method since it offers a challenging way to evaluate the frameworks and to assess imbalanced data sets. For details on the determination of the metrics, the reader is referred to Ref. [28].
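Before moving to the data, the ranking criteria of (50)-(54) can be made concrete with the sketch below, which scores every feature column of a data matrix under the two-class assumptions stated above and returns a best-first feature order per criterion. It is a plain re-implementation, not the authors' code: the Roc score is approximated by the empirical per-feature AUC (folded around 0.5 so both directions of separation count) instead of Eq. (53), and SciPy supplies the t and Wilcoxon statistics.

```python
import numpy as np
from scipy.stats import rankdata, ranksums, ttest_ind

def rank_features(X, y):
    """Rank the columns of X for two classes (y in {0, 1}); larger score = better."""
    a, b = X[y == 0], X[y == 1]
    n1, n2 = len(a), len(b)
    scores = {"bhattacharyya": [], "entropy": [], "roc": [],
              "ttest": [], "wilcoxon": []}
    for k in range(X.shape[1]):
        m1, m2 = a[:, k].mean(), b[:, k].mean()
        v1 = a[:, k].var(ddof=1) + 1e-12              # guard against zero variance
        v2 = b[:, k].var(ddof=1) + 1e-12
        # Eq. (50): Bhattacharyya distance between two normal densities.
        scores["bhattacharyya"].append(
            0.25 * np.log(0.25 * (v1 / v2 + v2 / v1 + 2))
            + 0.25 * (m1 - m2) ** 2 / (v1 + v2))
        # Eq. (51): symmetric Kullback-Leibler divergence of the two normals.
        scores["entropy"].append(
            0.5 * (v1 / v2 + v2 / v1 - 2)
            + 0.5 * (m1 - m2) ** 2 * (1 / v1 + 1 / v2))
        # Empirical AUC via the rank-sum identity (stand-in for Eq. (53)).
        r1 = rankdata(np.concatenate([a[:, k], b[:, k]]))[:n1].sum()
        scores["roc"].append(abs((r1 - n1 * (n1 + 1) / 2) / (n1 * n2) - 0.5))
        # Eq. (52): absolute two-sample t statistic.
        scores["ttest"].append(
            abs(ttest_ind(a[:, k], b[:, k], equal_var=False).statistic))
        # Eq. (54): absolute z statistic of the Wilcoxon rank-sum test.
        scores["wilcoxon"].append(abs(ranksums(a[:, k], b[:, k]).statistic))
    return {name: np.argsort(s)[::-1] for name, s in scores.items()}
```

Feeding the framework then amounts to keeping only the top-ranked columns of the normalized radiomics matrix before training the classifier.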
The data used consisted of X-ray images with normal, bacterial pneumonia, and coronavirus labels. To construct a task-affirmative framework, we used a classification process in which normal and bacterial pneumonia were considered as the negative classes and COVID-19 was treated as the positive class. Two different databases were used to obtain a detailed data repository [38-40]. In our repository, the first data set (data #1) was constructed using 80 samples for each class. The second data set (data #2) comprised 400 images, with 80 COVID-19, 160 bacterial pneumonia, and 160 normal images. The second data set was constructed to stress the classification framework, with the number of coronavirus samples kept low in contrast to the normal and pneumonia samples. In other words, there existed a data imbalance between the positive and negative cases, which rendered the task of finding positive class labels challenging; that is why this data set was formed [38-40]. This process revealed the efficiency of our framework and showed whether it could perform the necessary task under both conditions. There were two handicaps in the X-ray images used (Fig. 3):
1. The first handicap was the similar contrast and gray level features in X-ray images involving different class labels.
2. The second disadvantage arose because of the different contrast and gray level features among X-ray images belonging to the same class labels.
The framework should accurately distinguish different labels even if they have similar features. Furthermore, it should determine the same class labels in the event that they have different characteristics. Though an efficient classifier (GM-CPSO-NN) was used to classify the challenging data sets, the feature analysis was the most important examination that could directly increase the classification success of a customized COVID-19 framework. To increase the classification success and perform a detailed performance comparison, we comprehensively determined the parameter settings of GM-CPSO-NN, which are presented in Table 3. The velocity ranges of the weight-bias values were maintained at 0.2 times the position ranges for achieving higher performance [27-29]. Four parameters (acceleration constants, hidden node number, population size, and maximum iteration number) were varied to observe the highest scores in trials by using seven metric-based comparisons. The necessary feature combination, reliable normalization, and appropriate feature ranking were determined by counting the number of best scores on the seven metrics.

Tables 4 and 5 show the performance comparison of the feature combinations (radiomics evaluation) to which the minmax and z-score approaches were applied. In the absence of a feature ranking method, this first comparison (Tables 4 and 5) relates to the identification of effective radiomics; the normalization analyses were performed subsequently. Table 4 shows that the best success rates on five metrics (accuracy, AUC, sensitivity, f-measure, and g-mean) were achieved by using FOS and FOS + GLRLM features, and the scores are identical for the two combinations. If the accuracy and AUC metrics are scrutinized, five combinations (FOS, FOS + GLCM, FOS + GLRLM, FOS + GLCM + GLRLM, and FOS + GLRLM + GLSZM) are observed to yield success scores over 97% for data #1. The key point of the analyses is the FOS features, and the addition of other features (except GLRLM) does not provide higher performance.
Even if the combination FOS + GLRLM reveals the same results and cannot enhance the performance, it can also be considered as an appropriate preference. Table 5 shows that the highest scores on five metrics (accuracy, AUC, sensitivity, f-measure, and g-mean) were observed for FOS + GLRLM + GLSZM features. Scores above 97% for the accuracy and AUC metrics were attained in only three conditions (FOS, FOS + GLRLM, and FOS + GLRLM + GLSZM). Although the choice of FOS + GLRLM provides a score above 97% for the accuracy and AUC metrics, it decreases the performance on six metrics relative to the rates obtained with FOS features. FOS + GLRLM + GLSZM features are superior in terms of attaining remarkable scores. Moreover, it can be inferred that some additions (GLRLM to FOS features) can decrease the performance, while some combinations (FOS + GLRLM + GLSZM) can offer higher performance despite adding new features (GLSZM) to a performance-reducing preference (FOS + GLRLM). Fig. 4 presents a summary of the best results assessed by using the minmax and z-score approaches for the FOS, FOS + GLRLM, and FOS + GLRLM + GLSZM features. The best results were obtained on six metrics by applying z-score to FOS + GLRLM + GLSZM features. Although the z-score-based results are higher than the minmax-based scores, the rates are very close. In the third part, feature ranking methods were applied after minmax normalization of the FOS features and z-score normalization of the FOS + GLRLM + GLSZM features to achieve the maximum performance. The proposed framework takes shape according to this comparison. Tables 6 and 7 show a comparison of the feature ranking methods used after minmax and z-score were applied to the FOS and FOS + GLRLM + GLSZM features, respectively. In Table 6, the highest accuracy (99.17%), AUC (99.06%), g-mean (99.06%), and f-measure (98.75%) scores are observed for the Bhattacharyya and Roc rankings. In Table 7, the highest accuracy, AUC, sensitivity, g-mean, and f-measure rates are observed for the t-test ranking. Fig. 5 provides a detailed view and an objective assessment of the best results of the two frameworks. As evident in Fig. 5, both preferences achieve results close to each other. However, if the first framework (FOS and minmax and Bhattacharyya or Roc) is chosen, the highest accuracy (99.17%) can be achieved, in addition to better scores on the specificity, precision, and f-measure metrics. Thus, the aforementioned feature combination, normalization approach, and ranking method were the best choice for the data #1-based experiments.

Tables 8 and 9 present performance evaluations of the feature combinations to which the minmax and z-score approaches were applied. In the absence of feature ranking, this first comparison (Tables 8 and 9) shows the effectiveness of the radiomic combinations; the necessary normalization method is examined next. In Table 8, FOS + GLRLM features show the highest accuracy (99%), specificity (99.38%), precision (97.50%), and f-measure (97.50%) scores, while FOS features yield the best results for the AUC, sensitivity, and g-mean metrics. With regard to the importance and number of first places on the basis of success rates, FOS + GLRLM features show high performance for the classification of data #2.
It is apparent that FOS features are necessary for achieving the best performances, either in combination with other features or by themselves. Five conditions (FOS, FOS + GLCM, FOS + GLRLM, FOS + GLCM + GLSZM, and FOS + GLRLM + GLSZM) showed a success rate above 97% in terms of the accuracy and AUC rates. However, the use of FOS + GLRLM features showed higher accuracy. In Table 9, the best accuracy (99%), specificity (99.38%), precision (97.50%), and f-measure (97.50%) rates are attained by FOS + GLRLM features. Furthermore, FOS + GLSZM features yield the highest scores for the AUC, sensitivity, and g-mean metrics. For four combinations (FOS, FOS + GLCM, FOS + GLRLM, and FOS + GLSZM), a success rate over 97% was achieved for the accuracy and AUC metrics. However, FOS + GLRLM features can help increase the accuracy rate. Consequently, FOS + GLRLM features remain the best choice in both experiments in which minmax or z-score is applied. If the highest scores of the FOS + GLRLM and minmax and FOS + GLRLM and z-score preferences are examined, it is observed that all scores are identical for data #2. Hence, both normalization methods can be preferred for application to FOS + GLRLM features. In the third part, feature ranking methods were tested on the combination of FOS + GLRLM and minmax and on FOS + GLRLM and z-score. Tables 10 and 11 show the comparisons for data #2. Herein, the general framework design takes shape according to this comparison, together with the conclusions of Section 3.2.1. In Table 10, the highest scores on all metrics are observed if the feature ranking approach is chosen as Bhattacharyya, entropy, or Roc for the FOS + GLRLM and minmax preference. In Table 11, the best scores are obtained only with t-test feature ranking for the FOS + GLRLM and z-score preference. Fig. 6 presents a comprehensive view of the best results of the two frameworks. As evident in Fig. 6, even if both preferences achieve very similar results, the first choice (FOS + GLRLM and minmax and Bhattacharyya, entropy, or Roc) is superior to the second one, and it provides the highest performances on six metrics. Consequently, the aforementioned feature combination, normalization approach, and ranking method were the best choice for the data #2-based experiments.

From the analyses of Section 3, the following inferences can be drawn.
• As in the study of Koyuncu et al. [27] pertaining to brain tumor classification in MR images, it was found that FOS features are necessary for the construction of a task-affirmative framework and for the discrimination of coronavirus in X-ray images. In other words, FOS features are necessary to construct a coronavirus-specific framework for the X-ray imaging modality.
• Among the radiomic combinations, FOS and FOS + GLRLM features can be chosen for data with normal and imbalanced distributions, respectively.
• The minmax approach can be preferred as the normalization approach when used with these radiomics and feature ranking methods.
• For the highest performance, the Bhattacharyya and Roc rankings appear remarkable on both data sets formed with the necessary radiomics to which minmax normalization has been applied.
• In this study, GM-CPSO-NN proved its efficiency again, as in [27-29,31], and can be chosen as a state-of-the-art classifier.
Our framework was designed according to these deductions, and it is presented in Fig. 7.
It involves FOS or FOS + GLRLM features as the radiomic features, minmax as the normalization approach, Bhattacharyya or Roc rankings as the feature selection method, and GM-CPSO-NN as the classifier unit. As declared before, data #1 has a normal (balanced) class distribution, while data #2 is specified as an imbalanced data set. At this juncture, the classification performance for data #1 is expected to be higher than the results for data #2, since the detection of the coronavirus patterns of data #2 is more challenging owing to the smaller number of samples for the COVID-19 label. Our framework achieved significant scores for data #1, especially for the accuracy (99.17%) and AUC (99.06%) metrics. However, it showed even better performance for data #2, with success scores of 99.25% and 99.53% for the accuracy and AUC metrics, respectively. These results show the robustness of the proposed framework, implying that it can be considered for use with both normal and imbalanced data sets. As evident in the framework, three parts, namely, normalization, feature ranking, and the classifier, remain the same for both data set conditions. Only the feature extraction (radiomics) differs with the data set handled; in other words, the necessary radiomics are chosen according to the data used. For real-time applications, if a designer has balanced data and proposes to use our supervised framework, it is suggested that FOS features be used as the radiomics. If the designer has an imbalanced data set and intends to use the framework, it is recommended that FOS + GLRLM features be used as the necessary radiomics so as to achieve the highest performance. After the training of the classifier in the framework, test or real-time applications can be implemented by selecting the best network found during training.

Apart from the seven metric-based comparisons, the computation time of the framework was evaluated. The training-test time and the test time were observed separately for both data sets. For data #1, the proposed framework resulted in a training-test time and a test time of 1.7316 s and 0.0031 s, respectively. For data #2, these times were 2.5425 s and 0.0039 s, respectively. Clearly, our framework achieves remarkable results in a very short time, especially for the test time, which directly reflects the operational time. Table 12 presents a comparison of our framework with those proposed in state-of-the-art studies. The proposed framework shows remarkable performance when compared with transfer-learning-based models, deep-learning-based pipelines, specific deep learning methods, and more complex systems comprising them. The accuracy and AUC rates achieved by the proposed framework are significant and compare favorably with those of other studies. This superiority can be better observed in an accuracy-based comparison, especially for the results of studies involving binary classification.

In this paper, a robust framework is presented to identify COVID-19 in X-ray images. The radiomics priority, normalization preferences, and active feature rankings were examined together to design the coronavirus-specific framework equipped with a state-of-the-art classifier. In the experiments, remarkable results were obtained, especially for the accuracy and AUC metrics, on both (normal and imbalanced) data sets.
It is concluded that the preferences including FOS and FOS + GLRLM features, normalized with the minmax approach and ranked with the Bhattacharyya or Roc methods, can be considered to distinguish COVID-19 patterns under normal- and imbalanced-data conditions. FOS features showed high efficiency and were found to be decisive for COVID-19 detection in X-ray images. Furthermore, in the design of a robust model, apart from the determination of appropriate feature extraction (radiomics and normalization evaluation), feature ranking is also an important factor for achieving the highest performance. In future work, we intend to test the applicability of our framework to CT-based coronavirus detection, evaluating its efficiency for both normal and imbalanced data sets.

References
[1] Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and corona virus disease-2019 (COVID-19): the epidemic and the challenges.
[2] Recent progress in understanding 2019 novel coronavirus associated with human respiratory disease: detection, mechanism and treatment.
[3] Classification of COVID-19 patients from chest CT images using multi-objective differential evolution-based convolutional neural networks.
[4] CT image visual quantitative evaluation and clinical classification of coronavirus disease (COVID-19).
[5] Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction.
[6] Automated detection of COVID-19 cases using deep neural networks with X-ray images.
[7] COVID-19 detection using deep learning models to exploit social mimic optimization and structured chest X-ray images using fuzzy color and stacking approaches.
[8] Covidiagnosis-net: Deep Bayes-SqueezeNet based diagnostic of the coronavirus disease 2019 (COVID-19) from X-ray images.
[9] Covid-19: automatic detection from X-ray images utilizing transfer learning with convolutional neural networks.
[10] Deep learning system to screen coronavirus disease 2019 pneumonia.
[11] Artificial intelligence distinguishes COVID-19 from community acquired pneumonia on chest CT.
[12] Diagnosis of coronavirus disease 2019 (COVID-19) with structured latent multi-view representation learning.
[13] Covid-caps: A capsule network-based framework for identification of COVID-19 cases from X-ray images.
[14] Automatic X-ray COVID-19 lung image classification system based on multi-level thresholding and support vector machine.
[15] Covidx-net: A framework of deep learning classifiers to diagnose COVID-19 in X-ray images.
[16] A machine learning-based model for survival prediction in patients with severe COVID-19 infection.
[17] Rapid and accurate identification of COVID-19 infection through machine learning based on clinical available blood test results.
[18] A novel triage tool of artificial intelligence assisted diagnosis aid system for suspected COVID-19 pneumonia in fever clinics.
[19] Feature extraction and classification on esophageal X-ray images of Xinjiang Kazak nationality.
[20] Machine-learning based radiogenomics analysis of MRI features and metagenes in glioblastoma multiforme patients with different survival time.
[21] An extensive study for binary characterisation of adrenal tumours.
[22] Adrenal tumor characterization on magnetic resonance images.
[23] Grading of metacarpophalangeal rheumatoid arthritis on ultrasound images using machine learning algorithms.
[24] Automated technique for coronary artery disease characterization and classification using DD-DTDWT in ultrasound images.
[25] Lymphoma images analysis using morphological and non-morphological descriptors for classification.
[26] A novel feature selection method for microarray data classification based on hidden Markov model.
[27] A comprehensive study about brain tumor discrimination by using feature ranking, mutual information and hybridized classifiers.
[28] GM-CPSO: a new viewpoint to chaotic particle swarm optimization via Gauss map.
[29] Parkinson's disease recognition using Gauss map based chaotic particle swarm-neural network.
[30] Biogeography-based optimisation with chaos.
[31] Loss function selection in NN based classifiers: Try-outs with a novel method.
[32] Textural features for image classification.
[33] Computer-aided grading of gliomas based on local and global MRI features.
[34] Analysis of co-occurrence texture statistics as a function of gray-level quantization for classifying breast ultrasound.
[35] Local relative GLRLM-based texture feature extraction for classifying ultrasound medical images.
[36] Advanced statistical matrices for texture characterization: application to cell classification.
[37] A novel aggregate gene selection method for microarray data classification.
[38] Identifying medical diagnoses and treatable diseases by image-based deep learning.

Declaration of competing interest. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.