key: cord-0658675-ykp1p97o
authors: Nouri-Moghaddam, Babak; Ghazanfari, Mehdi; Fathian, Mohammad
title: A Novel Bio-Inspired Hybrid Multi-Filter Wrapper Gene Selection Method with Ensemble Classifier for Microarray Data
date: 2021-01-04
journal: nan
DOI: nan
sha: 5f5a900fd282e68f9308e6d15f62475110290449
doc_id: 658675
cord_uid: ykp1p97o

Microarray technology is known as one of the most important tools for collecting DNA expression data. This technology allows researchers to investigate and examine types of diseases and their origins. However, microarray data are often associated with challenges such as small sample size, a significant number of genes, imbalanced data, etc. that make classification models inefficient. Thus, a new hybrid solution based on multi-filter and adaptive chaotic multi-objective forest optimization algorithm (AC-MOFOA) is presented to solve the gene selection problem and construct the Ensemble Classifier. In the proposed solution, to reduce the dataset's dimensions, a multi-filter model uses a combination of five filter methods to remove redundant and irrelevant genes. Then, an AC-MOFOA based on the concepts of non-dominated sorting, crowding distance, chaos theory, and adaptive operators is presented. AC-MOFOA as a wrapper method aimed at reducing dataset dimensions, optimizing KELM, and increasing the accuracy of the classification, simultaneously. Next, in this method, an ensemble classifier model is presented using AC-MOFOA results to classify microarray data. The performance of the proposed algorithm was evaluated on nine public microarray datasets, and its results were compared in terms of the number of selected genes, classification efficiency, execution time, time complexity, and hypervolume indicator criterion with five hybrid multi-objective methods. According to the results, the proposed hybrid method could increase the accuracy of the KELM in most datasets by reducing the dataset's dimensions and achieve similar or superior performance compared to other multi-objective methods. Furthermore, the proposed Ensemble Classifier model could provide better classification accuracy and generalizability in microarray data compared to conventional ensemble methods.

One of the most important advances in medical technology in the recent decade is the development of DNA Microarray technology, which makes it possible to solve the problem of gene expression profiling [1] . Cancer occurs mainly due to the changes in genes and their unwanted mutations. Identification of these genes and their effects on the development of cancer plays a major role in the diagnosis and treatment of such diseases. One of the important applications of microarray data is its use in identifying cancer patients [2] . Accordingly, in recent years, extensive studies have been conducted by relying on machine learning and data mining methods to provide classification methods in cancer diagnosis using microarray data. However, using machine learning methods to classify such data is very challenging due to reasons such as a small number of samples, a great number of genes, imbalanced data, data complexity, and data shift [2] .

The presence of a large number of genes makes it very difficult to identify and select genes that are effective in the disease. Also, it makes the classification models more complex and increases the training time of these models. Moreover, a small number of samples and the imbalanced data make classification models after training to be less generalizable to predict new data labels. To overcome the challenges of microarray data, solutions such as gene selection to reduce dimensions and ensemble learning to increase the generalizability of the classification model have been considered by many researchers [1] [2] [3] . The solutions of reducing dimension in microarray data are divided into two general classes of feature selection and feature extraction. They have positive results such as simplifying classification models, reducing learning time, and usually increasing classification accuracy. For dimension reduction, feature extraction methods maps the original gene space to lower dimension space [4] . However, this approach reduces the interpretability of the original genes and the possibility of identifying the effect of the original genes on the final result. Unlike the previous methods, feature selection aims at identifying a subset of prominent genes among all genes and uses statistical methods or meta-heuristic search methods to achieve this objective. In feature selection methods, subsets of selected genes are composed of genes of the main dataset. Therefore they allow researchers to have better interpretation and analysis [5, 6] . Thus, feature/gene selection is one of the suitable solutions to reduce the dimensions of microarray data and selects genes that are effective in different types of cancers. However, the problem of selecting the subset of prominent genes in microarray data is known as an NP-hard problem [7] .

Gene selection methods can generally be divided into five classes, namely filter, wrapper, embedded, ensemble, and hybrid [2] . In filtering methods, the inherent relationship between genes and output is measured using statistical techniques and information theory, and related genes are identified. These methods include Correlation Based Filter [8] , ReliefF [9] , Symmetrical Uncertainty(SU) [8] , Information Gain(IG) [4] , and so on. In addition, these methods have low computational overhead and high scalability, but they might reduce the classification accuracy because of not using a classification model in the gene selection process. On the other hand, Wrappers include a meta-heuristic search methods (e.g. GA, FOA, and PSO) and a classification model that tries to identify a quasi-optimal subset of genes by considering the classification performance criterion. Wrappers usually have high computational overhead, but they provide better results than the filter due to using classifiers in the search process. Hybrid methods typically use a combination of a filter and a wrapper to select genes. Hybrid methods try to exploit the positive points of both filter and wrapper methods simultaneously so that the filter method, as a preprocessing step, can reduce the dimensions of the gene space and then the wrapper method can select prominent genes among the remaining genes [1, 4, 10] .

Given the advantages of hybrid methods, they have been the subject of intense research in recent years. Based on the search method, hybrid methods can be divided into single-objective and multi-objective groups. Many hybrid single-objective methods such as GA [11, 12, 21, [13] [14] [15] [16] [17] [18] [19] [20] , PSO [22, 23] , FOA [24] , DE [25] , ACO ، [26, 27] , Gravitational Search Algorithm [28] , Shuffled Frog Algorithm [29] , Cuckoo search [30] have been proposed so far. Additionally, hybrid multi-objective methods have been considered by many researchers in recent years for the simultaneously optimizing the objectives of minimizing the number of genes and maximizing the efficiency of the classification model. Some of these methods are NSGA-II [31] [32] [33] [34] , MOPSO [35] , MOGA [36] , MOSHO [37, 38] , MOSSO [39] , MOACO [40] , and so on.

Imbalanced data and the small number of samples are other problems of Microarray data that challenge the performance of classification models in training and coping with unseen data. One method to cope with such challenges is to use ensemble learning models. The aim of developing such a system is to offer a trade-off solution between test error and training error in an automated classification model. Ensemble models have been used effectively so far in a range of problems like feature selection, missing features, imbalanced data, incremental learning, concept drift learning, and other applications [41] . The main difference among these methods stems from three factors: 1) the way of selecting the training data, 2) the process of creating ensemble learner members, 3) and the law of combining the output of classifiers [41] . The ability of ensemble classification models in reducing training error and increasing the generalizability of the model has made them usable in the area of feature selection studies [42] [43] [44] [45] [46] [47] [48] [49] [50] .

The problem of gene selection usually follows two conflicting objectives, which include reducing the number of genes and increasing the efficiency of classification. Accordingly, it can be solved as a multi-objective optimization problem (MOP). The final solution for MOPs is usually presented in the form of a set of non-dominated solutions, which is a trade-off between conflicting objectives [51] . Also, given the research literature, studies on solving the problem of gene selection using the hybrid multi-objective method are more limited compared to those of the hybrid single-objective method. Most of the proposed hybrid solutions have solved the problem of gene selection only by considering the criterion of classification efficiency as a singleobjective [2, 3, 5] .

Given what was stated above, to solve the problem of gene selection and to construct an ensemble classification model in Microarray data, a hybrid solution based on multi-filter and novel adaptive chaotic multi-objective forest algorithm (AC-MOFOA) wrapper method is presented in this paper. In the proposed solution, multi-filter is employed as a pre-processing step to reduce the dimensions of the dataset. Due to the combination of several filter methods (i.e., IG, mRMR, RelifF, CFS, and Fisher-score), the multi-filter method has less bias in selecting distinct genes compared to single-filter methods. Also, the multiobjective forest algorithm as a wrapper minimizes the number of genes, maximizes the classification criteria, and optimizes the ELM kernel parameters, simultaneously. Accordingly, the final output of the wrapper will be a set of non-dominated solutions that has solutions with different gene numbers and classification accuracy, and provides the diversity needed to construct an ensemble classification model. The results of the proposed algorithm are compared with five hybrid multi-objective methods (i.e., MOSSO, MOCEPO, C-HMOSHSSA, NSPSO, and MOBBBO) on nine public microarray datasets, which have the number of different genes, classes, and samples. The main objectives of this study are as follows:

 Providing a new solution based on multi-filter and wrapper for gene selection in the microarray  Introducing a new adaptive multi-objective forest algorithm based on chaos theory and non-dominated sorting  Using multi-filter to pre-process data and reduce the number of data genes  Selecting effective genes simultaneously with optimizing KELM classifier parameters  Constructing ensemble classifier using the results of the final Pareto front

The rest of this article is organized as follows: In Section 3, we will explain the fundamental principles of the theory and previous methods of gene selection. In Section 4, the proposed idea for solving the gene selection problem is described. Section 5 presents the design of the experiments and their details. In Section 6, the results of the experiments are analyzed. Finally, in Section 7, conclusions and suggestions for future works are presented.

The search process in wrapper methods is usually based on a metaheuristic method such as GA, and PSO. The FOA is one of the new metaheuristic algorithms that has been introduced in recent years. This algorithm tries to provide a solution for optimization problems by modeling trees' reproduction, growth, and competition in the forest.

FOA has features such as high speed, low number of function evaluations, effective global, and local search. In recent years, this algorithm has been successfully used to solve the problem of feature selection as a single-objective approach [52, 53] . The general steps of FOA are as follows:

1-Initial forest population formation: randomly initialize each tree with 0 age  . 2-Local seeding operator: At this stage, some trees are selected as parents among ones that their age is zero by methods such as the roulette wheel. This operator produces some children with the age of zero around each parent by applying a single change to parent variables. The number of children generated by this operator for each parent is known as Local Seeding Change (LSC). Besides, the age of all trees will be increased by one in exception of the best tree and new generated children which is considered zero in FOA.

To control the population of trees in the forest, FOA uses two parameters of age and Area-Limit. If the number of trees exceeds the Area-Limit, the trees that are older than the maximum allowable age are first removed from the forest and added to the candidate population. Next, if the number of trees is still more than Area-Limit, the trees are sorted in descending order based on the value of the objective function, are selected among the trees in the size of Area-Limit, and then are transferred to the next generation. Other remained trees are added to the candidate population. 4-Global Seeding Operator: To increase exploration capability in FOA, the global seedling operator is applied to trees in the candidate population. This operator randomly selects some trees among the trees of the candidate population and randomly initializes several variables in each tree. The number of changed variables and the number of selected trees in the global seedling operator are some FOA parameters that are denoted by Global Seeding Change (GSC) and Transfer rate, respectively. 5-Termination condition: Similar to other metaheuristic algorithms, one of the following conditions can be considered for termination of execution: 1) reaching the defined accuracy threshold, 2) reaching the specific iterations, or 3) reaching a specified number of function evaluation (NFE).

The ELM model, as one of the newest classification models based on artificial neural networks, was presented by Huang. ELM is based on Single-hidden Layer Feedforward Networks (SLFNs) [54] . This neural network model has been presented based on not adjusting the parameters of the hidden-layer network. Compared to conventional neural network training methods, ELM has features such as better generalizability, faster learning, and not being trapped in local optimum [55] . For linear separation of data and improving classification efficiency, Kernel ELM, as a version of ELM, maps data from a smaller space to a larger space. Based on the details provided in the Supplementary Information, the performance of KELM depends on the values of  of the kernel function and C (i.e., the regularization parameter). Hence, these two parameters need to be optimized.

Using statistical techniques and information theory, filter methods measure the intrinsic relationship between the genes and output and identify prominent genes. These methods have low computational overhead and high scalability. However, due to the lack of using a classification model in the gene selection process, they might compromise the accuracy of classification. Filter methods can be divided into two classes, including Univariate and Multivariate. Univariate methods independently examine the dependence of each gene on the target output, in which the relations between the genes are ignored. In contrast, multivariate methods try to reduce the redundancy of gene subsets by considering the dependence of each gene on the target output and relationships between the genes. Multivariate methods have a higher computational overhead compared to univariate methods. The description of the five filter method used in this paper (i.e., IG, mRMR, RelifF, CFS, and Fisher-score) is presented in Supplementary Information.

Problems in the real world usually include the objectives that need to be optimized simultaneously. To solve such problems, a set of solutions that represents a tradeoff among different objectives is required. The set of solutions to multi-objective problems is known as Pareto optimal solutions. A multi-objective optimization (MOO) problem is defined as follows [56, 57] : 

Definition 3: For a given MOP,   F v   , the optimal Pareto set *  is defined as follows:

Definition 4: For a given MOP,   G v   , the optimal Pareto set *  , and optimal Pareto set, * F  is defined as follows:

As a result, the optimal Pareto front of the set  of all decision variable vectors will include members that meet the criteria 2 and 3.

Gene selection is one of the important methods in the preprocessing of microarray data to reduce data dimensions and simplify classification models. In general, gene selection methods can be divided into 5 groups: Filter, Wrapper, Hybrid, Ensemble, and Embedded. Among the mentioned methods, hybrid methods have been considered by many researchers in recent years as they combine positive features of Filter and Wrapper methods. Filter methods use statistical models and information theory to identify genes related to the output. These methods have low computational overhead and high scalability. However, since they do not use a classification model, they usually cause a reduction in classification efficiency criterion. Wrapper methods use a search method to find subsets of quasi-optimal genes and use the accuracy/error of a classification model to evaluate the found subsets. Due to using a classification model in the search process, wrapper methods usually provide better results but they have a high computational overhead. Thus, in Hybrid methods, by combining the filter and wrapper methods, an attempt is made to use the advantages of both methods simultaneously in the final gene selection model. In hybrid methods, a filter is usually applied to the data as data preprocessing step to select the genes that have the most relationship with the output and the least relationship within the set, and to reduce the initial data set dimensions. Next, a wrapper is used to select a subset of quasioptimal genes among the remaining gene sets. We review the hybrid methods of gene selection in this section. Depending on the type of search method, hybrid methods can be divided into single-objective and multi-objective groups. Hybrid singleobjective algorithms usually follow the objectives of minimizing the number of selected genes or maximizing the efficiency of classification or a combination of these objectives.

Shreem et al. [58] proposed a hybrid solution based on Harmony search and the Markov blanket filter in which the Markov blanket was considered as a local search to improve harmony search solutions. In [27] , a hybrid solution is presented based on the Fisher-score filter to reduce the initial dimensions and the ACO algorithm for selecting the optimal gene subset. Elyasigomari et al. (2017) proposed a hybrid solution based on the mRMR filter and the hybrid algorithm of the cuckoo optimization algorithm (COA) and harmony search [30] . In this method, the Harmony search method is used as a local optimizer to improve COA solutions. In [20] , a hybrid method was proposed based on the T-test filter and the Nested-GA algorithm in which GA has two sections of outer and inner. The outer section selects genes based on the accuracy of the SVM classifier and the inner section is executed on the DNA Methylation dataset. Shukla et al. [19] proposed a new idea based on conditional MI and adaptive GA filters. In another study, a hybrid solution based on ensemble filter and GA was presented [18] . In this method, three filter methods of ReliefF, Chi-Square, and Symmetrical Uncertainty were used to construct the ensemble filter. Based on the conducted study, the Union merge function outperforms the Top N Gene. In [22] , a method was proposed based on correlation base filter and improved PSO. Gangavarapu et al. [59] proposed a new idea based on an ensemble filter that uses 5 filter methods of mRMR, IG, CFS, CFSS, and oneRFeatureEval to form the ensemble filter. Then, the parameters of the penalty functions are optimized using Greedy search and GA. In [60] , a hybrid solution is presented based on Artificial Bee Colony (ABC) called PrABC, which is an objective function as a combination of classification accuracy and the number of features. In this method, the dimensions of the dataset are reduced using an Ensemble Filter based on IG, Correlation, and Relief. Finally, PrABC is used to select the optimal gene subset. Bir-Jmel et al. [26] presented an innovative hybrid method using a filter based on the MWIS graph and ACO. In the proposed method, a local search is used to improve ACO solutions. In [24] , a hybrid solution is proposed based on ANOVA Filter and enhanced Jaya-based FOA. In this method, both parameters LSC and GSC in the FOA are considered in the range of 1%-50% and optimized using enhanced Jaya. Other hybrid single-objective methods include GA [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] , PSO [22, 23] , FOA [24] , DE [25] , ACO [26, 27] , Shuffled Frog Algorithm [29] , Cuckoo search [30] , and so on.

Multi-objective metaheuristic algorithms optimize several conflicting objectives in their search process [61] . Moreover, the problem of gene selection is inherently a multi-objective optimization problem that has at least two conflicting objectives: 1) minimizing the number of genes and 2) maximizing the criterion of classification efficiency. Many researchers have recently focused on solving the problem of multi-objective gene selection. The solution of such algorithms is a set of non-dominated solutions that provide a tradeoff among different objectives for users. The set of non-dominated solutions is also known as the Pareto Front solutions.

In [36] , a hybrid solution is proposed based on the relevance filter and MOGA. In this method, first, the relevance filter selects 100 genes and then MOGA continues the search by considering the objectives of the accuracy of classification and size of the subset of selected genes. Li et al. [62] proposed a new idea by combining the Fisher Score-Markov filter and the Multi-Objective Binary Biogeography Based Optimization (MOBBBO) algorithm. In this approach, Fisher-Markov as a preprocessing step reduces dimensions of the database by removing irrelevant genes. Next, MOBBBO simultaneously uses the search to find the optimal subset of genes and optimize the parameters of the SVM kernel function. In another proposed solution, a combination of correlation coefficient and NSGA-II is presented [32] . In [35] , a hybrid method is presented by considering Quartile filters and non-dominated sorting MOPSO.. Lai et al. proposed a hybrid idea based on the averaged filter method (AFM) and MOSSO [39] . AFM is an Ensemble Filter created by combining several filter methods. In [63] , a model is proposed by combining filter and wrapper methods in which the T-test filter is considered as one of the objectives. Baliarsingh et al. [64] applied the Fisher-score method along with the Multi-Objective Chaotic Emperor Penguin Optimization (MOCEPO). In the proposed method, conventional methods of generating random numbers were replaced with chaos theory to improve efficiency. In [65] , a solution called C-HMOSHSSA is presented by combining Fisher-score and multi-objective spotted hyena optimizer (MOSHO) and the Salp swarm algorithm (SSA). In another study, a combination of a Multi-filter and BOFS was used to solve the gene selection problem and construct an ensemble classifier [65] . In [66] , a hybrid solution was presented based on Fisher-score and MOFOA in which the concepts of repository and binary tournament selection are used to solve the problem of multi-objective gene selection. Divya et al [37] have proposed a hybrid solution based on IG filter and MOSHO considering the accuracy of SVM and the number of the selected genes. Some other hybrid multi-objective methods include NSGA-II [31] [32] [33] [34] , MOPSO [35, 39, 67] , MOGA [36] , MOSHO [37, 38] , MOSSO [39] , MOACO [40] , MOBAT [68, 69] , and so on.

Most of the studies conducted to solve the gene selection problem have focused on single-objective metaheuristic algorithms. In this regard, studies conducted on hybrid multi-objective gene selection methods are less than single-objective studies. Thus, our study aims to present a hybrid multi-filter and multi-objective wrapper solution based on the Forest Optimization Algorithm to solve the gene selection problem and construct an ensemble classifier. 

Given what was stated in the previous section, it can be concluded that the problem of gene selection is a multi-objective problem that must simultaneously solve two conflicting objectives of reducing the number of genes and increasing the efficiency of the classification model. The FOA algorithm is one of the latest metaheuristic algorithms used as a single-objective and multi-objective to solve the problem of feature/gene selection [24, 53, 66] . However, based on the "no free lunch" theory, no solution is optimal for all problems and a new and improved solution can always be proposed. Hence, this study presents a new solution based on multi-filter and AC-MOFOA to simultaneously solve the problem of gene selection and the construction of ensemble classifiers.

In the proposed solution, the Multi-filter first reduces the dimensions of the dataset by removing Irrelevant and redundant genes. The considered multi-filter consists of 5 filter methods of IG, Fisher-score, mMRM, CFS, and ReliefF. AC-MOFOA is then presented as a new version of the FOA algorithm based on the concepts of chaos theory, adaptive operators, non-dominated sorting (NS), and crowding distance. AC-MOFOA identifies quasi-optimal gene subsets, considering the objectives of maximizing KELM classification accuracy and minimizing the number of selected genes. Based on the concepts of MOO, the output of MOP problem-solving algorithms is presented as a set of solutions that are not superior to each other. Also, the members of the solutions set have the necessary variety to construct Ensemble classifiers due to training with different subsets of datasets and different KELM classifiers. Therefore, members of the final Pareto Front set are used to construct the Ensemble classifier to obtain a final classification model for microarray data classification. To have a good understanding, the general flowchart of the proposed method is shown in Figure 1 , followed by describing the main sections.

Filter methods have low computational load and high scalability. However, each filter approach has advantages and disadvantages and using only one filter method to identify related genes can cause bias in the final result. Thus, in the proposed method, a multi-filter is used for the initial preprocessing of the data and identification of subset of genes that have the highest correlation with the output and lowest internal correlation. The objective of multi-filter is to reduce the bias effect of filter methods in gene selection and to combine the strengths of different filter methods. The desired multi-filter includes five filters, including IG, Fisher-score, mMRM, CFS, and ReliefF. In this system, Fisher-score and IG are univariate filter methods and mMRM, CFS, and ReliefF are multivariate filter methods. In this step, each of the five methods in the multi-filter is first applied to the dataset and then 30% of the superior genes are extracted from the output of each method based on the rank of the genes. Then, using the voting mechanism and "min vote" threshold, the results of five filter methods are combined and the subset of output genes of the multi-filter step is determined. In this study, the min vote threshold was considered to be 3.

In the present study, a new version of the FOA called AC-MOFOA is proposed. This new algorithm is based on the concepts of chaos theory, adaptive operators, non-dominated sorting, and crowding distance. One of the important challenges in FOA is determining the optimal value of LSC and GSC parameters. Thus, adaptive local seeding and adaptive global seeding operators were proposed to automatically adapt the LSC and GSC values during the search process to solve this challenge in AC-MOFOA. Chaos theory was also used to improve AC-MOFOA diversity and faster convergence. AC-MOFOA optimizes KELM model parameters in addition to searching for quasi-optimal gene subsets. AC-MOFOA pseudo code is displayed in Algorithm 1, with its steps described in the following. Apply Non-Dominated Sorting to Forest 9:

F1 ← Extract Lowest Rank Pareto Front according to Non-Dominated Sort results 10:

Set Age of all trees in F1 Zero 11:

Calculate crowding distance for trees in F1 according to equation (9) 12:

Calculate LSC according to equation (10) 13:

for j = 1 to Area-Limit 14:

Apply Roulette wheel to select Parent Tree from F1 /* the solution with more crowding distance have higher selection probability */ 15:

Apply if Forest size > Area-limit 21:

Remove trees from Forest according to Pareto rank and Crowding-distance and them to Candidate population /* trees in Highest Pareto rank with lowest crowding distance will have high priority to remove */ 22: end if 23:

Randomly select some trees of the Candidate population according to the "transfer rate" 12:

Calculate GSC according to equation (11) In AC-MOFOA, each tree is denoted as a vector that represents the values of tree age, C and γ parameters related to KELM, and values related to gene selection 1 2 . ., , , , ) (

. In this method, the size of each tree is + 3, which includes n number of data genes, and three variables including age, C , and γ . Figure 2 illustrates the structure of a tree or solution. AC-MOFOA uses Chaos theory to initialize the trees. Chaos theory is one of the new techniques used to improve the search ability of metaheuristic algorithms. Chaos can be described as the behavior of a non-linear dynamic system which is highly sensitive to the initial state and can be calculated by deterministic algorithms. In addition to having random features, chaos theory has other features such as sensitivity, stochasticity, and ergodicity to preliminary situations. Based on these features, chaos theory ensures diversity among the population-based metaheuristic algorithm solutions and enhances the convergence performance of these algorithms [64, 70, 71] . For this purpose, the logistic map function will replace the generation of random numbers. The logistic map functions have main behaviors: chaotic and convergent. The chaotic and convergent behavior leads to exploration and exploitation, respectively [72] . The logistic map function is defined as follows:

where t i v represents the value of chaotic map for i -th variable, in which t and 1 t  represent he iteration number. Note that 0 i v is randomly initiated in the interval   0,1 . In the next iteration of AC-MOFOA to calculate 2 i v using seeding operators, the equation (7) will applied on . In the proposed solution, based on the KELM studies, the parameters C and γ are considered in range of 8 8 2 , 2      [73] . Due to initializing the values of C and γ , t i v in the equation (7) is replaced with t C and t  where the results of this equation mapped to 8 

 . The value of the age variable will be zero for all new trees generated by AC-MOFOA operators.

The values of the variables 1 2 , , , n v v v  belong to an interval of   0,1 , which are mapped to binary space based on the equation (8) . In equation (8), if the value of the i v is greater than0.5, the binary value of the gene will be equal to 1; otherwise, it will be equal to 1. A value of 1 means that the gene is selected while a value of 0 means that it is not selected.

To solve the problem of multi-objective gene selection, some changes must be applied to FOA. For this purpose, the idea of NS and crowding distance presented in NSGA-II [74] was employed. NS is a technique for ranking Pareto optimal solutions based on the concept of dominance. M and n are the number of objectives and solutions, respectively. NS first extracts the set of solutions that are not dominated by any of the solutions obtained and forms the Pareto front 1 F . Then, among the remaining solutions, the solutions that were dominated only by members of 1 F are extracted to form 2 F and this process continues accordingly. The set of solutions 1 F is considered as the Pareto optimal solutions. Crowding-distance is used in NSGA-II to prioritize members within each i F and direct search to less crowded areas. Crowding-distance is calculated using equation (9): 

A local seeding operator has been introduced in the FOA to generate exploitation capability. It generates an LSC number of new trees for each parent around trees with an age of zero. The local seeding operator in AC-MOFOA is a single-parent operator. Each parent is selected among the 1 F members using the roulette wheel. In this method, members with larger values are more likely to be selected to increase the diversity between solutions. This operator changes the value of a variable in the parent tree using logistic map function according to equation (7) . An example of this operator is illustrated in Figure 3 . Determining the LSC value is one of the challenges of FOA and is determined by trial and error based on the dimensions of the problem. In the proposed solution to overcome this challenge, the LSC value is determined adaptively by the algorithm itself. Accordingly, LSC value is first set at 20% of the data set dimensions and then it is updated in each step through the following equation.

where T represents the current number of iterations of the algorithm. Based on equation (10), the minimum LSC value will be 1.

Forest limiting is one of the most important operators of the FOA. In this process, old and improperly fitted trees are removed from the Forest and added to the candidate population. After implementing the local seeding operation, the age of all the trees in the forest increases by one unit. In AC-MOFOA, the age of 1 F member trees is considered to be 0. To limit the population, trees older than Max-Age are removed from the forest and added to the candidate population. If the forest size is still larger than the Area-Limit after removing the old trees, the trees with the highest Pareto rank will be removed from the forest and added to the candidate population. Additionally, if there is a need to remove a part of i F , trees based on cd value (from lower value to higher value) are removed from i F and added to the candidate population.

The global seedling operator is used to create exploration capability to generate new trees in FOA. This operator selects based on the size of the transfer rate parameter from the trees in the candidate population and then changes the GSC number of the variables in each tree. To change variables, a chaotic number in the allowable range of the variable is generated using equation (7) and replaces its previous value. Figure 4 illustrates the function of this operator. Another important parameter of FOA is the GSC parameter. Setting its optimal value is one of the challenges of FOA. In the proposed solution, the GSC value is set adaptively by the algorithm itself to overcome this challenge. In this method, the GSC value is first set at 30% of the dataset dimensions and then is updated in each step through equation (11):

Based on equation (11), the minimum value of GSC will be 2.

AC-MOFOA has two objectives of maximizing the accuracy of the classification and minimizing the number of genes. The classification accuracy is calculated using equation (12) :

The parameters FP, TP, FN, and TN represent false positive, true positive, false negative, and true negative, respectively.

Imbalanced data and the small number of samples are other issues of microarray data that challenge the performance of classification models in training and coping with unseen data. One way to cope with such challenges is to use ensemble learning models. The aim of developing such a system is to provide a trade-off solution between bias error (training error) and variance error (test error) in an automated classification system. One of the important features in constructing an ensemble model is to create diversity in basic classifiers. Several methods have been proposed to create diversity in basic classifiers; e.g. the use of different classification models or training each classification model with different subsets of datasets. The final solution of AC-MOFOA is a set of non-dominated solutions, each including a different subset of genes and a KELM with different kernel parameters. Therefore, it can be stated that the members of the final Pareto front have sufficient diversity in terms of basic classifiers and training subsets to construct an ensemble classifier. Accordingly, the output of the 1 F members is combined using the voting mechanism to form the final ensemble classifier.

To design the experimental scenarios, nine microarray datasets [75, 76] were used considering the diversity in the number of samples, genes, and classes (Table 1) . After reducing the dimensions in the multi-filter step, each of the datasets is randomly divided into two batches (80% for training and 20% for the test) while maintaining the ratio of classes. To evaluate the subset of selected genes, the KELM classifier with RBF kernel function is used and the kernel   γ and ELM   C parameters are optimized by the proposed solution. KELM and 10-fold cross-validation (CV) were used to evaluate the subset of selected genes [77] . In this study, five Hybrid multi-objective methods, namely C-HMOSHSSA [38] , MOBBBO [62] , MOCEPO [64] , MOSSO [39] , and NSPSO [35] , were selected for comparing and evaluating the AC-MOFOA results. Python 3.7 programming language is used to implement all methods. We also used the ELMClassifier function in the Python-ELM v0.3 Library with default settings. A PC with 16GB ram and Intel Core i7 6700HQ hardware was used for performing the tests. To compare the results fairly, the parameters of the selected algorithms are obtained based on the information of reference articles or through the Taguchi experiment design method [78] . In all algorithms, the number of individuals (i.e. tree, habitat, hyena, salp, and particle) is equal to 60 and the termination condition is considered to be 40000 evaluations. The results were also examined in 50 independent performances. In algorithms with continuous expression, the threshold for selecting or not selecting a feature is considered to be 0.5. Table 2 presents a summary of the parameters of compared algorithms. A total of 50 executions were used for comparing the results based on the Pareto front. Accordingly, the obtained Pareto sets in 50 independent executions are first placed in a Union set and then non-dominated solutions are extracted among them to form the final Pareto set [61] . Also, in the next section, Box-plot charts are used to represent some of the obtained results. These charts were plotted based on the data collected in the Union set.

One of the quantified criteria for comparing the performance of multi-objective algorithms is the success counting criterion (SCC) [79] . Based on this criterion, to quantify the performance of multi-objective algorithms, the final Pareto front of each method is aggregated in set P and then the Pareto front obtained by all methods is extracted from the members of set P. After this step, the level of contribution of each method in the formation of the optimal Pareto front is calculated using equation (13) .

In equation (13), if the solution belongs to the studied method, 1 i S  ; otherwise, 0 i S  . Also, n represents the total number of members of the optimal Pareto front. According to this criterion, a method with higher SCC values shows better performance in identifying the final Pareto front. Diversity and convergence measures are used to evaluate the performance of multi-objective metaheuristic algorithms in identifying the optimal Pareto front. The diversity measure assesses the level of diversity of Pareto front solutions and the convergence measure assesses the degree of convergence of solutions to the main Pareto front. Hypervolume is one of the criteria used for evaluating the performance of multi-objective metaheuristic algorithms. This criterion simultaneously evaluates both diversity and convergence measures [80, 81] . Hypervolume is calculated based on the following equation:

In this section, the test results are presented in five subsections: 1) providing multi-filter step results in terms of classification accuracy and dimension reduction rate, 2) comparison of AC-MOFOA with multi-objective algorithms, 3) AC-MOFOA performance evaluation using Student T-test, 4) presenting the comparison results of the examined methods in terms of CPU execution time, and space and time complexity, and 5) analyzing the results of the ensemble classifier. Each of the subsections is described in the following. 

The Multi-Filter step is the first step in the proposed hybrid solution for pre-processing and reducing the data dimensions. Thus, we first analyze the results of this step. Based on Table 3 , multi-filter consists of five filter methods of IG, Fisher-score, mMRM, CFS, and ReliefF. It could reduce the number of genes in all datasets by at least 73%. The highest reduction in the number of genes was observed in the Colon-Prostate dataset with 80.9% and the lowest reduction was observed in Rsctc_5 dataset with 73.43%. Also, comparing KELM accuracy results in two modes of training with all genes and training with selected genes suggests an improvement in classification accuracy for selected genes (Table 3 ). Due to employing the combination of Univariate and Multivariate filter methods, Multi-Filter ability and efficiency increased in selecting the most prominent genes from the dataset; also the bias of using a single filter method can be decreased. Thus, it can be concluded that the Multi-Filter step as preprocess step could effectively reduce dimensions of the datasets, and improve the classification accuracy. 

To evaluate the performance of AC-MOFOA and compare its results with C-HMOSHSSA, MOBBBO, MOCEPO, MOSSO, and NSPSO methods, Figures 5-9 were plotted based on the final Pareto front, classification accuracy distribution, and distribution of selected gene numbers in two modes of test and train. Based on Figure 5 , AC-MOFOA in 7 datasets could achieve higher classification accuracy by selecting fewer genes, and AC-MOFOA in 2 datasets (including SRBCT and Breast) showed performance similar to C-HMOSHSSA in terms of set objectives. Moreover, AC-MOFOA in all datasets could significantly increase the classification accuracy by considerably reducing the number of genes. For example, in the Breast dataset, AC-MOFOA by selecting 4 of 5592 genes (a set of genes selected by the Multi-filter) could achieve a classification accuracy of 100%.

To compare the level of contribution of each method in the formation of the final Pareto front, the SCC results on the test dataset were displayed in Table 4 . Based on Table 4 , in most datasets AC-MOFOA could achieve higher SCC values than other methods, suggesting that AC-MOFOA has a greater contribution in identifying the final Pareto front. Figure 7 , it can be seen that AC-MOFOA outperforms other multi-objective algorithms in most datasets in terms of the maximum and the minimum number of selected genes. Also, in 7 cases of datasets, the proposed method could provide better results than other methods in terms of the mean and median number of selected genes. Only in rsctc_5 and Tumors_9 datasets, C-SHMOSHSSA and MOSSO methods, respectively, outperformed AC-MOFOA in the objective function of minimizing the number of genes.

When comparing the distribution two objectives, they were examined independently and were different from the results in which both of these objectives were examined simultaneously using the concepts of dominance. For example, an algorithm may have achieved solutions with a higher number of genes but lower classification accuracy, while this solution has been dominated by other solutions in that algorithm. Table 5 . According to this table, AC-MOFOA had more contribution in identifying the final Pareto front and could identify more than 50% of the final Pareto front in 8 datasets. Figure 9 shows the results of comparing multi-objective algorithms based on the classification accuracy distribution on the train set. As can be seen from this figure, AC-MOFOA in all cases could achieve higher or similar accuracy to other methods. Furthermore, AC-MOFOA always outperforms other methods in terms of minimum, mean, and median of accuracy.

The obtained results indicate that AC-MOFOA outperformed other methods in terms of achieving the optimal Pareto front and involving in the formation of the final Pareto front. In detail, high initial values of LSC and GSC in the adaptive local and global seeding operators lead to enhance the exploitation and exploration capabilities of the AC-MOFOA. Local seeding operator is applied to the solution of 1 F that boosts the ability of AC-MOFOA to improve the Pareto solutions further. Moreover, employing the global seeding operator on the Candidate Population gives a chance to less fitted trees for better exploration of the problem space. In addition, AC-MOFOA uses NS to identify the Pareto front and rank the solutions. Furthermore, AC-MOFOA implements the crowding distance, which increases its ability to direct the search to the less crowded areas of the Pareto front, leading to a more uniform final Pareto front. Also, a separate study on distribution of each objective (namely, classification accuracy and the number of selected genes) shows that AC-MOFOA solutions were superior to other methods in most datasets in terms of statistical criteria such as upper bound and lower bound, mean and median. 

To evaluate the efficiency and effectiveness of AC-MOFOA compared to other multi-objective algorithms, Student T-test was performed on normalized values of Hypervolume. To perform the t-test, the Hypervolume values were first calculated in 50 independent executions for the MOSSO, MOCEPO, C-HMOSHSSA, MOBBBO, NSPSO, and AC-MOFOA algorithms. Next, the t-test was performed with a significance level of 0.05. Tables 6 and 7 show the t-test results in both test and train modes, respectively. Comparisons were made from left to top, respectively. Superior, similar, and worse performance of the AC-MOFOA compared to the corresponding algorithm are displayed by "+", "=", and "-", respectively. As can be seen from Table 6 , the AC-MOFOA results in all test datasets are significantly superior to the three methods of MOBBBO, NSPSO, and MOSSO. Moreover, MOCEPO provided similar results to AC-MOFOA only in the Lung database and showed worse performance in other cases. AC-MOFOA could also achieve superior results over C-HMOSHSSA in 5 out of 9 datasets and had similar results in 4 datasets. Table 7 shows the results of the t-test on the Hypervolume values of the train data. According to the results of the t-test, it can be concluded that AC-MOFOA outperformed other methods in most datasets. AC-MOFOA showed similar results with MOCEPO and C-HMOSHSSA in only a few cases. For example, we can refer to three datasets of Lung, SRBCT, and Leukemia3 in which AC-MOFOA had a similar performance to MOCEPO.

Time and space complexity analysis is commonly used to analyze the performance of computer algorithms. Therefore, in this section, we discuss AC-MOFOA complexity analyses. By analyzing the AC-MOFOA algorithm, it can be seen that the two steps of 8 and 11 have the highest time complexity among all the steps of the algorithm. In step 8, a non-dominated sorting operation is performed with the time order of   2 O MN , assuming that M is the number of objectives and N is the number of trees. Also, based on a study conducted by Jensen [82] , with an optimal implementation, the time complexity of Figure 10 presents the results of comparing the mean execution time of AC-MOFOA with other multi-objective algorithms. The mean execution time of algorithms is calculated based on the 50 independent executions time on each of the datasets.

Analyzing the AC-MOFOA execution time shows that this method had a shorter execution time than MOCEPO, MOBBBO, C-HMOSHSSA, and NSPSO. Due to its single-parent structure, AC-MOFOA only needs to select one parent among the Pareto Front members. Thus, it spends less time in the parent-selection step. Also, AC-MOFOA only needs to make limited changes in the structure of the parent tree to generate new trees, resulting in the reduced computational overhead of global and local seeding operators. Furthermore, implementing the chaos theory on local and global seeding operators increases the AC-MOFOA convergence speed. At the same time, it maintains the necessary diversity needed for effective search of the problem space. As a result of this, the execution time of AC-MOFOA can be reduced compare to the traditional methods. Among the methods compared, the MOSSO algorithm had a shorter execution time than AC-MOFOA in some scenarios because of using the Archive structure to maintain the Pareto front and being a single parent. 

In this section, we analyze the results of the Ensemble classifier made from members of the Pareto Front. The comparison results of the proposed Ensemble classifier with the three ensemble learning methods (i.e., Adaboost, Bagging, and Random Forest) are shown in Table 8 . Analysis of the results suggests that the proposed Ensemble classifier in all datasets could improve the classification accuracy. The proposed method also outperformed the other methods in 7 datasets. For example, in Breast dataset, the proposed method could improve the classification accuracy by more than 30% compared to the non-optimized KELM. In fact, due to the optimization of the RBF kernel   γ and ELM   C function parameters, AC-MOFOA could improve the classification efficiency of each of the Pareto Front KELMs simultaneously with reducing the number of genes. Accordingly, the combination of improved KELMs could provide better results compared to conventional ensemble learning methods. However, the proposed method in two datasets of Leukemia3 and Lung yielded less accurate solutions than the Random Forest and Bagging methods. 

Based on the obtained results, the proposed method could successfully achieve the set objectives (i.e., reducing the dimensions of Microarray datasets, increasing the accuracy of the classification, and constructing the Ensemble Classifier). The considered Multi-Filter includes five filter methods, including IG, Fisher-score, mRMR, CFS, and ReliefF. The selected combination includes Univariate and Multivariate Methods. In this system, Univariate methods rank genes based on their individual relation with output. On the other hand, Multivariate methods rank genes based on their intra-relations and relation with output. By reducing the bias of single-filter methods, Multi-filter could select prominent and influential genes of the dataset and reduce dimensions of them to an acceptable level. Additionally, Multi-filter considers the dependency of each gene with output and redundancy between genes. Furthermore, AC-MOFOA simultaneously reduced the size of the gene subset and improved classification accuracy using the NS, crowding distance, chaos theory, adaptive local, and adaptive global seeding operators. The use of adaptive local and global seeding operators enabled the AC-MOFOA to overcome the challenge of determining LSC and GSC, which leads to broader global and local search to find the optimal Pareto front in the early stages. Due to the implementation of the chaos theory concepts in local and global seeding operators, AC-MOFOA could achieve faster convergence compared to the traditional methods. Furthermore, the local seeding operator in AC-MOFOA uses the solutions of for generating new trees, which improves AC-MOFOA's search ability toward the optimal Pareto front. Besides, the use of NS and crowding distance enabled the algorithm to perform well in identifying the uniform and divers Pareto fronts. Finally, members of final Pareto front, including pairs of subset genes and KELM, were utilized to construct the Ensemble classifier. It is of note that the members of the Pareto front have the necessary diversity to construct an ensemble learner because of using different gene subsets and KELM classifiers with different optimized parameters. The results show that the proposed Ensemble classifier outperforms conventional Ensemble classification methods in terms of classification accuracy and generalizability.

In the present study, a hybrid method of Multi-filter and AC-MOFOA was presented to solve the problem of gene selection and construct Ensemble Classifiers for the microarray datasets. Based on experiments and results, it could successfully achieve the set objectives. The proposed multi-filter was developed by combining five filter methods, namely IG, Fisher-score, CFS, mRMR, and ReliefF, using the voting mechanism. The combination of several univariate and multivariate filter methods in constructing Multi-filter has reduced the bias effect compared to single filter mode and, as a result, the proposed Multi-filter as a pre-processing step could reduce the dimensions of the data and increase the classification accuracy. In the second step of the proposed hybrid method, AC-MOFOA was presented using the concepts of NS, crowding distance, chaos theory, and adaptive operators. The second step of the proposed method's objectives was selecting the quasi-optimal subset of genes, optimizing the KELM classification parameters, and increasing the classification accuracy. AC-MOFOA uses NS to identify the Pareto front and rank the solutions. In addition, using the crowding distance, the ability to direct the AC-MOFOA search to the less crowded areas of the Pareto front increased, leading to a more uniform final Pareto front. Using chaos theory and adaptive local and global seeding, AC-MOFOA could improve its global and local search without needing to set the LSC and GSC parameters. Moreover, utilizing the logistic map as a chaos function in global and local seeding operators, the convergence speed and search diversity of AC-MOFOA have been improved. Finally, in the third step of the proposed method, the Ensemble Classifier model was formed using the final Pareto front KELM classifiers. To evaluate the effectiveness and efficiency of the proposed solution, the results were compared with five hybrid multi-objective gene selection methods of MOSSO, MOCEPO, C-HMOSHSSA, NSPSO, and MOBBBO on nine microarray datasets with different dimensions. Based on the results, AC-MOFOA in most datasets could simultaneously increase classification accuracy by reducing the dimensions of the datasets. Moreover, the distribution analysis of the accuracy of solutions and the number of genes selected by AC-MOFOA based on statistical criteria confirm its effectiveness. Furthermore, the hypervolume indicator and SCC analysis show that AC-MOFOA achieves better results on identifying the optimal Pareto front. Additionally, the proposed Ensemble Classifier model provides better performance for microarray data classification by increasing the accuracy of classification compared to conventional ensemble learning methods (i.e., Adaboost, Bagging, and Random Forest). AC-MOFOA could successfully solve the problem of gene selection and construction of the Ensemble classifier.

However, due to developments and advances in DNA microarray that have resulted in increasing dimensions and samples of such data, conventional computers will lose their ability to solve the problem of gene selection and microarray data classification. Hence, there is an urgent need to develop methods based on big data architecture. In future studies, we will review and develop scalable methods based on multi-objective metaheuristic algorithms in the big data context.

A Survey on Different Feature Selection Methods for Microarray Data Analysis

A review of microarray datasets and applied feature selection methods

A Survey on Hybrid Feature Selection Methods in Microarray Gene Expression Data for Cancer Classification

A survey of feature selection and feature extraction techniques in machine learning

A review of feature selection methods in medical applications

A survey on swarm intelligence approaches to feature selection in data mining

On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems

Correlation-based Feature Selection for Machine Learning

A Practical Approach to Feature Selection

A Survey on Feature Selection

A hybrid feature selection algorithm for gene expression data classification

A hybrid feature selection method for DNA microarray data

A hybrid gene selection method for microarray recognition

A novel aggregate gene selection method for microarray data classification

Gene selection for microarray cancer classification using a new evolutionary method employing artificial intelligence concepts

A novel hybrid feature selection method for microarray data analysis

Two-Stage Hybrid Gene Selection Using Mutual Information and Genetic Algorithm for Cancer Data Classification

Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods

A two-stage gene selection method for biomarker discovery from microarray data for cancer classification

A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets

IG-GA: A hybrid filter/wrapper method for feature selection of microarray data

Correlation feature selection based improved-Binary Particle Swarm Optimization for gene selection and cancer classification

Hybrid particle swarm optimization and tabu search approach for selecting genes for tumor classification using gene expression data

A new optimal gene selection approach for cancer classification using enhanced Jaya-based forest optimization algorithm

Two hybrid wrapper-filter feature selection algorithms applied to high-dimensional microarray experiments

Gene Selection via a New Hybrid Ant Colony Optimization Algorithm for Cancer Classification in High-Dimensional Data

A hybrid gene selection approach for microarray data classification using cellular learning automata and ant colony optimization

Gene selection for cancer types classification using novel hybrid metaheuristics approach

An evolutionary framework based microarray gene selection and classification approach using binary shuffled frog leaping algorithm

Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification

Optimal genes selection with a new multi-objective evolutional algorithm hybriding NSGA-II with EDA

Feature selection in cancer microarray data using multi-objective genetic algorithm combined with correlation coefficient

Evolutionary rough feature selection in gene expression data

Multi-objective optimization using genetic algorithm for gene selection from microarray data

Cancer microarray data feature selection using multi-objective binary particle swarm optimization algorithm

Multi-objective optimization using pareto GA for gene-selection from microarray data for disease classification

Prediction of Gene Selection Features Using Improved Multi-objective Spotted Hyena Optimization Algorithm

C-HMOSHSSA: Gene selection for cancer classification using multi-objective meta-heuristic and machine learning methods

Multi-objective simplified swarm optimization with weighting scheme for gene selection

Dimension reduction for microarray data using multi-objective ant colony optimisation

Ensemble learning

A novel ensemble-based wrapper method for feature selection using extreme learning machine and genetic algorithm

A Hybrid Feature Selection with Ensemble Classification for Imbalanced Healthcare Data: A Case Study for Brain Tumor Diagnosis

Using an ensemble classifier based on sequential floating forward selection for financial distress prediction problem

Wrapper-and ensemble-based feature subset selection methods for biomarker discovery in targeted metabolomics

Wrapper Feature Subset Selection for Dimension Reduction Based on Ensemble Learning Algorithm

Ensemble-based wrapper methods for feature selection and class imbalance learning

Ensemble based on GA wrapper feature selection

Churn prediction system for telecom using filter-wrapper and ensemble classification

Building an efficient intrusion detection system based on feature selection and ensemble classifier

A Survey of Multiobjective Evolutionary Algorithms for Data Mining : Part I

Forest Optimization Algorithm

Feature selection using Forest Optimization Algorithm

Extreme learning machines: A survey

Extreme learning machine: theory and applications

Multi-objective optimization using evolutionary algorithms

Evolutionary Algorithms for Solving Multi-Objective Problems

Hybridising harmony search with a Markov blanket for gene selection problems

A novel filter-wrapper hybrid greedy ensemble approach optimized using the genetic algorithm to reduce the dimensionality of high-dimensional biomedical datasets

A probabilistic multi-objective artificial bee colony algorithm for gene selection

Pareto front feature selection based on artificial bee colony optimization

Multiobjective binary biogeography based optimization for feature selection using gene expression data

Feature Selection Using Multi-Objective Optimization Technique for Supervised Cancer Classification

Analysis of high-dimensional biomedical data using an evolutionary multi-objective emperor penguin optimizer

A multi-objective feature selection and classifier ensemble technique for microarray data analysis

A novel filter-wrapper hybrid gene selection approach for microarray data based on multi-objective forest optimization algorithm

Gene selection from large-scale gene expression data based on fuzzy interactive multi-objective binary optimization for medical diagnosis

A New Meta-heuristic Bat Inspired Classification Approach for Microarray Data

Gene selection for tumor classification using a novel bio-inspired multi-objective approach

Chaos genetic algorithm instead genetic algorithm

Scheduling of scientific workflows using a chaos-genetic algorithm

Genetic algorithm using theory of chaos

Analysis of high-dimensional genomic data employing a novel bio-inspired algorithm

A fast and elitist multiobjective genetic algorithm: NSGA-II

Markov blanket-embedded genetic algorithm for gene selection

Wrappers for feature subset selection

Taguchi's quality engineering handbook

Improving PSO-Based Multi-objective Optimization Using Crowding, Mutation and ∈-Dominance

Theory of the hypervolume indicator: Optimal μ-distributions and the choice of the reference point

Analyzing hypervolume indicator based algorithms

Reducing the Run-Time Complexity of Multiobjective EAs: The NSGA-II and Other Algorithms