A new elite opposite sparrow search algorithm-based optimized LightGBM approach for fault diagnosis

Qicheng Fang, Bo Shen, Jiankai Xue

J Ambient Intell Humaniz Comput, published 2022-01-23. DOI: 10.1007/s12652-022-03703-5

In this paper, a new fault diagnosis approach based on elite opposite sparrow search algorithm (EOSSA) optimized LightGBM is proposed. It is necessary to extract appropriate features when dealing with high-dimensional data. Since the distribution of high-dimensional data is not always approximately normal, approximating it with a normal distribution for feature extraction introduces errors. Dimension reduction algorithms based on Euclidean distance often ignore this change in data distribution. To address this problem, cam locally linear discriminant embedding (CLLDE), based on the cam weighted distance, is proposed, which improves on locally linear discriminant embedding (LLDE) when dealing with deformed data. The performance of CLLDE is better than that of LLDE on the iris dataset. It is also important to establish a classifier with optimized hyper-parameters for fault identification. The sparrow search algorithm (SSA) is a novel optimization algorithm that has achieved good results in many applications, but its optimization ability and convergence speed still need to be improved. EOSSA is proposed by introducing an elite opposite learning strategy and an orifice imaging opposite learning strategy into SSA. The optimization results on benchmark functions show that EOSSA converges faster and has better optimization ability than the five other algorithms compared. EOSSA is used to optimize the hyper-parameters of LightGBM to train a classifier with a better fault recognition rate. Finally, the effectiveness of the proposed fault diagnosis approach is verified on the Tennessee Eastman (TE) process dataset. Experimental results demonstrate that the EOSSA-LightGBM-based approach is superior to the other algorithms.

Fault diagnosis and detection are mostly data-driven nowadays. Although data acquisition has become easier, handling large amounts of high-dimensional data has become a difficult problem. High-dimensional data contain a lot of usable information, but much of it is redundant, and the sheer volume of data increases computational complexity, which may lead to the curse of dimensionality. A common feature extraction method for high-dimensional data is dimension reduction. Traditional linear dimension reduction methods include principal component analysis (PCA) (Duchene and Leclercq 1988; Wold et al 1987), linear discriminant analysis (LDA) (Duda et al 2001; Etemad and Chellapa 1997), and others. Later, Schölkopf et al. (1997) proposed kernel principal component analysis (KPCA) to deal with nonlinear data, which achieved good results. With the further development of manifold learning, there is now a better choice for dealing with nonlinear data. Manifold learning methods are mainly divided into local information preserving methods and global information preserving methods. Locally linear embedding (LLE) (Roweis and Saul 2000) is a classical manifold learning method based on local information, which generates an implicit function that maps data from a high-dimensional space to a low-dimensional space.
Nonlinear methods for preserving global information mainly include multi-dimensional scaling (MDS) (Cox and Cox 1994), isometric mapping (ISOMAP) (Law and Jain 2006; Tenenbaum et al 2000), and others. ISOMAP proves that it is feasible and effective to change the distance measure between data points. Li et al. (2008) combined the maximum margin criterion (MMC) (Li et al 2006) with LLE to propose locally linear discriminant embedding (LLDE). LLDE uses the category attributes of the data: after dimension reduction, data of different categories are farther apart and similar data are closer together. Besides, the algorithm avoids the small sample size (SSS) problem (Zheng et al 2004). As mentioned above, using category attributes is more conducive to classification. Because of attraction or repulsion between data samples (Zhou and Chen 2006), the distribution of data may be deformed. The cam weighted distance considers the scale of the data distribution and the direction of deformation, so it can measure deformed data well. When building a neighborhood with the cam weighted distance, samples with larger density are given smaller weight, and samples with smaller density larger weight (Zhou and Chen 2006). Fault data deserve more attention in fault diagnosis, and the cam weighted distance can give more attention to fault data. Owing to the way LLDE calculates distance, it has limitations when dealing with deformed data. To address this problem, cam locally linear discriminant embedding (CLLDE) is proposed based on the cam weighted distance.

It is necessary to classify the extracted data to realize fault identification after feature extraction. Zhang et al. (2018) used robust LLE to extract features, then used a support vector machine (SVM) to identify mechanical faults. Fan et al. (2019) used a convolutional neural network (CNN) for feature extraction, and then used LightGBM for mechanical fault prediction. LightGBM (Ke et al 2017) is a gradient boosting algorithm based on decision trees proposed by Microsoft. In this algorithm, histograms are used to build the decision trees. In the sampling process, gradient-based one-side sampling (GOSS) is used to select samples and exclusive feature bundling (EFB) is used to merge mutually exclusive features. In addition, the leaf-wise growth strategy improves the computational efficiency of the algorithm. The influence of hyper-parameters should be considered when training a classifier. LightGBM has several hyper-parameters that determine the final classification effect, so it is important to select appropriate hyper-parameters for the classifier.

Swarm intelligence optimization algorithms are widely applied in engineering and play an important role in the adjustment of model parameters. Zeng et al. (2017) applied an improved particle swarm optimization (PSO) to the short-term load forecasting problem. Liu et al. (2019) used an improved PSO and the K-means algorithm to solve an emergency-patient clustering problem. You et al. (2018) used a BP neural network optimized by hybrid PSO to study an electro-hydraulic control system. Zeng et al. (2018) proposed a switching delayed PSO to optimize the hyper-parameters of an SVM for medical diagnosis. Pathana et al. (2021) used an optimization algorithm to optimize a CNN to identify COVID-19 patients from lung X-ray images.
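To make the classifier concrete, the following is a minimal sketch of training a LightGBM model, assuming the open-source lightgbm Python package; the synthetic data, label rule, and parameter values are illustrative stand-ins, not the paper's settings. learning_rate and max_depth are the two hyper-parameters tuned later in the paper.

import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X = rng.normal(size=(980, 7))                        # stand-in for extracted features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)        # toy binary fault label
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

clf = lgb.LGBMClassifier(learning_rate=0.1, max_depth=7, n_estimators=100)
clf.fit(X_tr, y_tr)                                  # histogram-based GBDT training
print("validation F1:", f1_score(y_val, clf.predict(X_val)))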
Sparrow search algorithm (SSA) (Xue and Shen 2020) is a novel swarm intelligence algorithm that has been applied in many fields with good results (Xing et al 2021; Zhang and Ding 2021; Zhu and Yousefi 2021). Xing et al. (2021) applied SSA in fault diagnosis of a wheelset-bearing system. Zhang and Ding (2021) used a chaotic mapping strategy to improve SSA for optimizing a stochastic configuration network. Zhu and Yousefi (2021) proposed an adaptive SSA to address a parameter identification problem. However, the search efficiency and the convergence speed of SSA still need to be improved. To improve the algorithm's ability to find the optimal solution and to accelerate its convergence, the elite opposite sparrow search algorithm (EOSSA) is proposed based on the elite opposite learning strategy and the orifice imaging opposite learning strategy. EOSSA is used to optimize LightGBM to find the most suitable hyper-parameters for identifying fault data. The diagram of the proposed fault diagnosis approach is shown in Fig. 1.

Fig. 1 Diagram of the proposed framework based on the EOSSA-LightGBM approach

The main contributions of this paper are summarized as follows. CLLDE is proposed based on the cam weighted distance to overcome the limitations of LLDE when dealing with deformed data. EOSSA is proposed based on the elite opposite learning strategy and the orifice imaging opposite learning strategy, which improves the ability to find the optimal solution and accelerates convergence compared with SSA. SSA and EOSSA are used to optimize the hyper-parameters of LightGBM for the first time, and a new fault diagnosis approach based on EOSSA optimized LightGBM is proposed. Aimed at the problem that fault data may not be well approximated by a normal distribution for similarity measurement, an approach based on CLLDE and EOSSA-LightGBM is proposed to address the diagnosis of deformed fault data.

The rest of this paper is organized as follows. In Sect. 2, CLLDE is proposed based on the cam weighted distance, and its effectiveness is verified on the iris dataset. In Sect. 3, EOSSA is proposed by introducing the elite opposite learning strategy and the orifice imaging opposite learning strategy into SSA. The establishment of EOSSA-LightGBM and the fault diagnosis experiment are introduced in Sect. 4. The conclusion is given in Sect. 5.

Let the d-dimensional vector Z obey the standard normal distribution N(0, I); its probability density function is

p(z) = (2\pi)^{-d/2} \exp\left(-\tfrac{1}{2} z^\top z\right).    (1)

A random vector X can be defined by the transformation (Zhou and Chen 2006)

X = \left(a + b\,\frac{\phi^\top z}{\|z\|}\right) z,    (2)

where z denotes the original well-distributed data, a > b ≥ 0 reflects the scale of the data distribution and the degree of deformation in a certain direction, \phi is a normalized vector that describes the deformation orientation, and \|z\| = \sqrt{z^\top z}. X follows the deformed distribution, and a and b are the deformation parameters (Pan et al 2009). Writing \cos\theta = \phi^\top z / \|z\|, the deformed data can be transformed back to a normal distribution by z = X / (a + b\cos\theta), which leads to the definition of the cam weighted distance. Let x_0 \in R^d be the center of the cam distribution D_d(a, b, \phi). The cam weighted distance from a point x \in R^d to x_0 can then be defined as

\mathrm{CamDist}(x_0, x) = \frac{\|x - x_0\|}{a + b\cos\theta},    (3)

where \theta is the angle between x - x_0 and the orientation \phi.
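As a sketch under the reconstruction above, the cam weighted distance can be computed as ||x - x0|| / (a + b cos θ), with cos θ taken between x - x0 and the unit orientation vector; the function name cam_weighted_distance and the zero-vector handling are our own illustrative choices.

import numpy as np

def cam_weighted_distance(x, x0, a, b, phi):
    # Cam weighted distance from x to the distribution center x0, with a > b >= 0
    # and phi a unit vector giving the deformation orientation.
    v = np.asarray(x, float) - np.asarray(x0, float)
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return 0.0
    cos_theta = np.dot(phi, v) / norm_v
    return norm_v / (a + b * cos_theta)

phi = np.array([1.0, 0.0])
print(cam_weighted_distance([1, 0], [0, 0], 2.0, 1.0, phi))   # 1/3: shrunk along phi
print(cam_weighted_distance([-1, 0], [0, 0], 2.0, 1.0, phi))  # 1.0: stretched against phi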
According to Lemma 1, for an arbitrary x_i in the dataset, we can suppose that it is the center of a cam distribution. The points around it are subject to this cam distribution; let v_{ij} (j = 1, 2, \ldots, k) denote the vectors from x_i to its k nearest neighbors. The central vector \hat{G}_i and the average vector length \hat{L}_i can be presented as follows:

\hat{G}_i = \frac{1}{k}\sum_{j=1}^{k} v_{ij}, \qquad \hat{L}_i = \frac{1}{k}\sum_{j=1}^{k} \|v_{ij}\|.

\hat{G}_i, the mean of the v_{ij}, can be used to estimate E(X), and \hat{L}_i, the mean of the \|v_{ij}\|, can be used to estimate E(\|X\|). The estimates of the parameters a and b are then obtained from these quantities by Lemma 1.

In practical applications, most high-dimensional data are not normally distributed. Fault data, in particular, often differ from normal data during acquisition because of attraction, repulsion, and other factors, and these changes cannot be ignored. Under these conditions, assuming that the data are approximately subject to the standard normal distribution produces large errors, and the usual Euclidean distance based on the normal-distribution assumption is no longer applicable. Using the cam weighted distance amounts to transforming the deformed data back to a normal distribution, which eliminates the influence of the deformation and better describes similarity (Zhou and Chen 2006). Considering this, the cam weighted distance is used to replace the nearest-neighbor distance calculation of LLDE to make it more suitable for fault data diagnosis.

Cam locally linear discriminant embedding (CLLDE) uses the cam weighted distance to calculate the distance between samples; the other calculation steps are similar to those of LLDE. Changing the distance used to select nearest neighbors changes which neighbors the algorithm selects: instead of taking the closest points under the Euclidean metric, the cam weighted distance considers points in all directions around the center point. Based on the cam distance, combined with LLE and MMC, CLLDE is proposed to extract features.

LLE is a nonlinear dimension reduction method. Let X = \{X_1, X_2, \ldots, X_n\}, where X_i \in R^D denotes the i-th of n points in D-dimensional space, i = 1, 2, \ldots, n. The mapping of the high-dimensional data into the low-dimensional space can be recorded as Y = \{Y_1, Y_2, \ldots, Y_n\}, Y_i \in R^d with d < D. LLE achieves this goal in three steps (Lei et al 2010). Firstly, the kNN algorithm or the ε-ball criterion is used to find the nearest neighbors. Secondly, LLE finds the best reconstruction weight matrix by minimizing the local reconstruction error of each x_i:

\varepsilon(W) = \sum_{i=1}^{N} \Big\| x_i - \sum_{j=1}^{k} w_{ij} x_j \Big\|^2,

where N is the number of data points, x_i denotes the i-th data point, x_j denotes its j-th nearest neighbor, and w_{ij} represents the weight coefficient of the i-th data point with respect to its j-th nearest neighbor. Let N_i^k be the set of neighbor points of x_i. Each x_i is reconstructed by its neighbor points: if x_j is a neighbor point of x_i, then w_{ij} \neq 0; otherwise w_{ij} = 0. In addition, the weights should satisfy \sum_{j=1}^{k} w_{ij} = 1 to obtain a better data distribution after dimensionality reduction, so the restriction on the weights W can be written as

\sum_{j=1}^{k} w_{ij} = 1, \quad i = 1, 2, \ldots, N.

In the third step, the reconstruction matrix W is used to calculate the optimal embedding matrix Y after dimension reduction:

\arg\min_Y \sum_{i=1}^{N} \Big\| y_i - \sum_{j=1}^{k} w_{ij} y_j \Big\|^2.    (13)

(13) can be transformed into \arg\min_Y \operatorname{tr}\{Y^\top M Y\}, where M = (I - W)^\top (I - W) is an n \times n matrix. To avoid degenerate solutions, the embedding vectors are constrained to have unit covariance, \frac{1}{N} Y^\top Y = I. According to the Rayleigh-Ritz theorem (Li et al 2008), the eigenvectors corresponding to the smallest nonzero eigenvalues of the sparse matrix M are the solution of (13).
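The following is a compact sketch of the LLE steps just described: solve the constrained least-squares problem for the weights w_ij, build M = (I - W)^T (I - W), and embed using the eigenvectors of the smallest nonzero eigenvalues. Plain Euclidean kNN is used here; CLLDE would substitute the cam weighted distance when selecting neighbors. The regularization constant is an implementation choice, not from the paper.

import numpy as np

def lle_embed(X, k=10, d=2, reg=1e-3):
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dists)[1:k + 1]        # k nearest neighbors, skipping x_i itself
        Z = X[nbrs] - X[i]                       # neighbors shifted to the origin
        C = Z @ Z.T                              # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)       # regularize for numerical stability
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs] = w / w.sum()                 # enforce sum_j w_ij = 1
    M = (np.eye(n) - W).T @ (np.eye(n) - W)      # the sparse matrix M of (13)
    _, vecs = np.linalg.eigh(M)
    return vecs[:, 1:d + 1]                      # drop the constant zero-eigenvalue vector

Y = lle_embed(np.random.default_rng(0).normal(size=(60, 4)), k=8, d=2)
print(Y.shape)  # (60, 2)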
Li et al. (2008) linearized LLE by introducing Y = V^\top X, and (13) can then be rewritten as

\arg\min_V \operatorname{tr}\{V^\top X M X^\top V\}.    (14)

The maximum margin criterion (MMC) restricts the mapping relationship, and its purpose is to maximize the distance between classes of samples. It uses the category attributes of the data to improve separability. S_w is the within-class scatter matrix and S_b is the between-class scatter matrix; they can be calculated by (Li et al 2006)

S_w = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_j^i - m_i)(x_j^i - m_i)^\top, \qquad S_b = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^\top,

where c is the number of classes, m is the mean vector of all samples, m_i is the mean vector of the i-th class, n_i is the number of samples in the i-th class, and x_j^i is the j-th sample of the i-th class. The objective function of MMC under the projection matrix V can be described as

\arg\max_V \operatorname{tr}\{V^\top (S_b - S_w) V\}.    (17)

MMC considers the class relationship between samples: it obtains the optimal linear discriminant mapping while maintaining the internal geometric structure of the neighborhoods. The linearized LLE captures the manifold structure of the high-dimensional data, which ensures the scalability of the data. To minimize the reconstruction error and maximize the distance between classes simultaneously, a multi-objective optimization problem is obtained by combining (14) and (17), which share the same mapping matrix:

\min_V \operatorname{tr}\{V^\top X M X^\top V\} \quad \text{and} \quad \max_V \operatorname{tr}\{V^\top (S_b - S_w) V\}.    (18)

Since (14) and (17) have the same mapping matrix, a single-objective optimization problem can be constructed by taking their difference:

\arg\min_V \operatorname{tr}\{V^\top (X M X^\top - (S_b - S_w)) V\}, \quad \text{s.t. } V^\top X X^\top V = I.    (19)

The problem can be solved by the Lagrange multiplier method, and (19) can be transformed into the generalized eigenvalue problem

(X M X^\top - (S_b - S_w))\, v_i = \lambda_i\, X X^\top v_i,    (21)

where \lambda_i is a generalized eigenvalue of (X M X^\top - (S_b - S_w)) and X X^\top, and v_i is the corresponding generalized eigenvector. Therefore, the minimum of the objective function (19) is attained once the generalized eigenvectors of (21) are obtained. V is composed of the eigenvectors corresponding to the first d smallest eigenvalues of the generalized spectral decomposition, where d is the target dimension. After changing the distance used to select the nearest neighbors, the mapping Y can be obtained from (21) and Y = V^\top X, which better reflects the actual distribution of the deformed data. The main steps of CLLDE are summarized in Algorithm 1.

The next part compares the performance of CLLDE with LLDE. All experiments are implemented in Python 3.7.6, the CPU is an Intel(R) Core(TM) i7-8750H, and the operating system is Windows 10; the following experiments use the same environment. The iris dataset contains 150 records of three categories, called iris-setosa, iris-versicolour, and iris-virginica. Each category contains 50 samples, and each sample has 4 characteristic variables, i.e., the samples are 4-dimensional. Figure 2 shows the dimension reduction effect of LLDE on the iris data with 10, 20, 30, 40, and 50 neighbors, while Fig. 3 shows the corresponding effect of CLLDE. From the comparison, we can conclude that CLLDE achieves a better dimensionality reduction effect on the iris dataset than LLDE. In summary, the experiments on the iris dataset show that CLLDE can better distinguish different categories of data, and that it is more suitable for labeled pattern recognition on uneven or deformed data under the same neighborhood conditions.
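Circling back to the final step of CLLDE, the generalized eigenvalue problem (21) can be solved directly with SciPy, as in the sketch below; the random matrices are stand-ins just to show the call, whereas in CLLDE they would be X M X^T - (S_b - S_w) and X X^T built from the data, and the small ridge term is our own safeguard for positive definiteness.

import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
D, n, d = 4, 60, 2
X = rng.normal(size=(D, n))                  # columns are samples
A = rng.normal(size=(D, D)); A = A @ A.T     # stand-in for X M X^T - (S_b - S_w)
B = X @ X.T + 1e-6 * np.eye(D)               # X X^T, kept positive definite
vals, vecs = eigh(A, B)                      # generalized symmetric eigensolver, ascending
V = vecs[:, :d]                              # eigenvectors of the d smallest eigenvalues
Y = V.T @ X                                  # low-dimensional embedding, Y = V^T X
print(Y.shape)                               # (2, 60)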
SSA is a novel swarm intelligence algorithm inspired by the behaviors of sparrows. SSA searches for the solution of an optimization problem by imitating the foraging and anti-predatory behaviors of sparrows. Firstly, some sparrows are randomly selected as producers in the population. The main mission of the producers is to find food for the whole population, which determines the direction and area of foraging. Scroungers follow the producers to find food, and some scroungers keep a lookout over the producers and wrest food from them. Once a predator is detected, individuals send an anti-predatory signal. Finally, the population finds the most suitable foraging location by updating the positions of producers and scroungers over many iterations.

The dimension of the sparrow population is n × H, where n is the number of sparrows and H is the dimensionality of the variables to be optimized. The position of the i-th sparrow can be described as S_i = [s_{i,1}, s_{i,2}, \ldots, s_{i,H}], i \in [1, n], where s_{i,j} is the position of the i-th sparrow in the j-th dimension, j \in [1, H]. The fitness values of the sparrow population can be described as

F_S = [f(S_1), f(S_2), \ldots, f(S_n)]^\top,

where n represents the number of sparrows in the population, and each row of F_S is the fitness value of an individual sparrow.

In SSA, the mission of the producers is to find food for the population, and the main behavior of the scroungers is to follow the producers to get food. Besides, some scroungers may compete with producers for food, and a scrounger obtains the food if it wins. Based on the above description, the location of the producers is updated by

s_{i,j}^{t+1} = \begin{cases} s_{i,j}^{t} \cdot \exp\left(\frac{-i}{\alpha T}\right), & A_{alarm} < J_{th} \\ s_{i,j}^{t} + Q \cdot L, & A_{alarm} \ge J_{th} \end{cases}

where t represents the current iteration, s_{i,j}^t is the value of the i-th sparrow in the j-th dimension at the t-th iteration, T is the maximum number of iterations, and \alpha \in (0, 1] is a random number. A_{alarm} \in [0, 1] represents the warning value, while J_{th} \in [0.5, 1] expresses the safety threshold. Q is a random number subject to the normal distribution, and L is a 1 × H matrix in which all elements are one (Xue and Shen 2020). The position of a scrounger is updated by

s_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\frac{s_{worst}^{t} - s_{i,j}^{t}}{i^2}\right), & i > n/2 \\ s_{pbest}^{t+1} + |s_{i,j}^{t} - s_{pbest}^{t+1}| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}

where s_{pbest}^{t+1} represents the best position in the global search space at the (t+1)-th iteration, while s_{worst}^{t} represents the globally worst position at the t-th iteration. A is a 1 × H matrix whose elements are randomly assigned 1 or -1, and A^{+} satisfies A^{+} = A^\top (A A^\top)^{-1}.

When the sparrows in the population become aware of danger, the sparrows at the edge of the group fly to a safe place, while the sparrows in the middle of the group approach other sparrows in a random walk. Accordingly, the mathematical model can be expressed as

s_{i,j}^{t+1} = \begin{cases} s_{best}^{t} + \beta \cdot |s_{i,j}^{t} - s_{best}^{t}|, & f_i > f_g \\ s_{i,j}^{t} + K \cdot \left(\frac{|s_{i,j}^{t} - s_{worst}^{t}|}{(f_i - f_w) + \epsilon}\right), & f_i = f_g \end{cases}

where s_{best}^{t} expresses the optimal position in the global search space at the t-th iteration, \beta is a random number that is subject to the standard normal distribution and controls the step size, K \in [-1, 1] shows the moving direction of the sparrow, f_i is the fitness of the current sparrow, f_g and f_w are the current global best and worst fitness values, and \epsilon is a very small constant that avoids division by zero.

In a swarm intelligence algorithm, the initial solutions determine the distribution of the initial population in the solution space, and this spatial distribution affects the search ability and convergence efficiency of the algorithm. A high-quality initial population can accelerate the convergence of the algorithm, which is more conducive to finding the global optimal solution.
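As a sketch of the producer update reconstructed above (the scrounger and alarm updates follow the analogous piecewise formulas), with our own variable names and a toy setup:

import numpy as np

rng = np.random.default_rng(0)
n, H, T = 100, 30, 1000
S = rng.uniform(-100, 100, size=(n, H))       # initial sparrow positions

def producer_update(S, i, alarm, safety, alpha):
    if alarm < safety:                        # no predator detected: keep searching widely
        return S[i] * np.exp(-i / (alpha * T))
    Q = rng.normal()                          # normally distributed step size
    L = np.ones(H)                            # 1 x H matrix of ones
    return S[i] + Q * L                       # move toward a safer area

S[0] = producer_update(S, 0, alarm=rng.random(), safety=0.8,
                       alpha=rng.uniform(1e-8, 1.0))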
Based on the above, the elite opposite learning strategy and the orifice imaging opposite learning strategy are introduced into SSA, and the elite opposite sparrow search algorithm (EOSSA) is proposed. Before introducing EOSSA, it is necessary to explain the opposite point and the elite opposite solution. Let s = (s_1, s_2, \ldots, s_q) be a point in q-dimensional space, where s_j \in [l_j, u_j], j = 1, 2, \ldots, q, and l_j and u_j represent the lower and upper boundaries of the search space, respectively. The opposite point is defined as (Wang et al 2011)

s_j' = l_j + u_j - s_j.

Let s_{i,j} be the value of an ordinary individual s_i in the j-th dimension; then the elite opposite solution can be defined as (Sihwail et al 2020)

s_{i,j}^{e} = \eta\,(dl_j + du_j) - s_{i,j},

where \eta \in [0, 1] is a random number, [dl_j, du_j] is the dynamic interval constructed by the elite sparrows, and s_{i,j}^{e} \in [dl_j, du_j]. The elite opposite strategy provides more alternative solutions, and the dynamic boundary accelerates the convergence of SSA. EOSSA takes the sparrows with the top \rho% of fitness values as the elite solutions, where \rho is a constant that represents the ratio of elite sparrows. EOSSA obtains the dynamic boundary of the elite sparrows and then uses the opposite strategy to obtain the opposite solutions. Comparing a sparrow before and after the update, if the update yields a better value, it replaces the previous sparrow.

The rest of the population are the ordinary sparrows. For ordinary sparrows, their low fitness is mainly because their search area is far away from the search range of the excellent individuals. To expand the search range, give the algorithm the ability to jump out of the current poor search area, and increase the diversity of the population, a different opposite learning strategy from the one applied to the elite individuals is used for the ordinary sparrows. Orifice imaging opposite learning is an opposite learning strategy based on an optical principle. Assume there is a light source with height h whose projection point on the coordinate axis is s, and let the lower and upper boundaries of the coordinate axis be l_j and u_j, respectively. The image of height h^{*} formed by the light source through the orifice has projection s^{oi} on the coordinate axis, which is called the opposite point of s. The schematic diagram of orifice imaging is shown in Fig. 4. According to the principle of orifice imaging,

\frac{(l_j + u_j)/2 - s}{s^{oi} - (l_j + u_j)/2} = \frac{h}{h^{*}}.    (28)

Let h / h^{*} = \mu; then (28) can be transformed into

s^{oi} = \frac{l_j + u_j}{2} + \frac{l_j + u_j}{2\mu} - \frac{s}{\mu}.    (29)

Extending this process to EOSSA gives

s_{i,j}^{oi} = \frac{l_j + u_j}{2} + \frac{l_j + u_j}{2\mu} - \frac{s_{i,j}^{t}}{\mu},    (30)

where s_{i,j}^{t} is the j-th dimension of the i-th sparrow at the t-th iteration, s_{i,j}^{oi} formed by imaging through the orifice represents the opposite point of s_{i,j}^{t}, l_j and u_j represent the lower and upper boundaries of the search space, respectively, and \mu is the regulation factor.

Remark 1 SSA, as a swarm intelligence optimization algorithm, is essentially a greedy algorithm and may fall into a local extremum point. The orifice imaging opposite strategy and the elite opposite learning strategy are introduced into SSA to give it the ability to escape local extremum points. These strategies help the algorithm explore more of the space in a limited time. The opposite strategy makes a sparrow search in the opposite direction within its search space when the algorithm tends toward a local optimum. Besides, this approach increases the diversity of the search population, which is conducive to obtaining the optimal solution. The steps of EOSSA are summarized in Algorithm 2.
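A sketch of the two opposite-learning operators defined above; the regulation factor value and the example bounds are illustrative choices. Note that setting μ = 1 in the orifice imaging formula recovers the plain opposite point l + u - s.

import numpy as np

def elite_opposite(s, dl, du, rng):
    # Elite opposite solution: s_e = eta * (dl + du) - s, eta ~ U[0, 1],
    # with [dl, du] the dynamic bounds constructed from the elite sparrows.
    return rng.random() * (dl + du) - s

def orifice_opposite(s, l, u, mu=10.0):
    # Orifice imaging opposite point, Eq. (30): (l+u)/2 + (l+u)/(2*mu) - s/mu.
    return (l + u) / 2.0 + (l + u) / (2.0 * mu) - s / mu

rng = np.random.default_rng(0)
print(orifice_opposite(50.0, -100.0, 100.0))            # -5.0: contracted toward the center
print(elite_opposite(50.0, dl=-20.0, du=40.0, rng=rng))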
In this part, benchmark functions are used to verify the feasibility and effectiveness of EOSSA. SSA, chaotic SSA (CSSA) (Zhang and Ding 2021), chaos particle swarm optimization (CPSO) (Kennedy and Eberhart 1995; Su et al 2015; Pluhacek et al 2018), disturbance and somersault foraging grey wolf optimizer (DSFGWO) (Mirjalili et al 2014; Wang et al 2021), and Levy-flight based moth-flame optimization (LMFO) (Mirjalili 2015; Suja 2021) are used to compare convergence characteristics with EOSSA. The initial population size of all algorithms is set to 100, and the number of iterations is 1000. The parameters of each algorithm are set according to (Mirjalili 2015; Xue and Shen 2020). Detailed information on the standard benchmark functions is listed in Table 1, and the optimization results are listed in Table 2. The optimization results are the average value and standard deviation (Std) over 30 experiments. Figure 5 shows the optimization process curves of the algorithms on the unimodal benchmark functions F_1(x) to F_5(x). Figure 6 shows the curves on the multimodal benchmark functions F_6(x), F_7(x), F_8(x), and F_9(x), while Fig. 7 shows the curves on the fixed-dimension benchmark functions F_10(x) and F_11(x).

Remark 2 F_1 to F_5 are unimodal benchmark functions. This kind of benchmark function has only one extreme point; they can be used to verify the convergence speed, optimization accuracy, and local exploitation ability of an algorithm. F_6 to F_9 are multimodal benchmark functions with multiple local extremum points, which make it extremely easy for an algorithm to fall into a local extremum; they can be used to verify the ability to escape from local extrema and the global exploration ability. F_10 and F_11 are fixed-dimension benchmark functions, which further verify the convergence speed, stability, and convergence accuracy of an algorithm.

From the optimization results, because the opposite strategy provides more alternative solutions for the algorithm and improves its ability to avoid falling into local optima, the optimization performance of EOSSA is improved compared with SSA. Over the 30 experiments, the average result of EOSSA is better than that of the other algorithms, and its Std is also better on most benchmark functions, though the standard deviation is slightly worse than the others on a few benchmark functions, such as F_6(x). From the convergence curves, the convergence speed of EOSSA is improved compared with SSA. To better evaluate and compare the optimization performance of the algorithms, box plots are used for statistical comparison and analysis. Figure 8a-k shows the statistical results of SSA, EOSSA, CSSA, CPSO, DSFGWO, and LMFO on each benchmark function of this experiment. From the box plots, it can be concluded that the comprehensive performance of EOSSA on the benchmark functions is better than that of the other algorithms. In summary, the convergence characteristics of EOSSA are better on the whole than those of the other five optimization algorithms.

The accuracy of feature extraction affects the final recognition rate: simple and significant features can improve fault recognition accuracy and reduce computational complexity.
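The evaluation protocol above can be sketched as follows: run an optimizer 30 times on a benchmark and report the mean and standard deviation. The sphere function is a typical unimodal benchmark, and the random-search optimize function is only a placeholder for SSA, EOSSA, and the other compared algorithms.

import numpy as np

def sphere(x):                                   # unimodal, global minimum 0 at the origin
    return float(np.sum(x ** 2))

def optimize(f, dim=30, iters=1000, seed=0):     # placeholder random-search optimizer
    rng = np.random.default_rng(seed)
    best = np.inf
    for _ in range(iters):
        best = min(best, f(rng.uniform(-100, 100, size=dim)))
    return best

results = [optimize(sphere, seed=s) for s in range(30)]
print("mean:", np.mean(results), "std:", np.std(results))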
Manifold learning reduces redundant dimensions through the geometric structure of the data. LLE is a classical manifold learning algorithm that can reveal the internal structure of data, but for labeled data it does not use the category attributes. Therefore, LLDE was proposed by combining LLE with MMC. Because the data distribution may be deformed, the Euclidean distance in LLDE is replaced by the cam weighted distance, yielding CLLDE; the previous experiment showed that it works well on classification problems. The main steps of the fault diagnosis approach can be summarized as follows. Firstly, CLLDE is used to map the original high-dimensional data to the low-dimensional feature space. Secondly, EOSSA is used to optimize the hyper-parameters of LightGBM to establish a classifier with a better diagnosis effect. Finally, the trained EOSSA-LightGBM is used to identify new test fault data. The flowchart of EOSSA-LightGBM is shown in Fig. 9.

The fault data used in this paper are from the Tennessee Eastman (TE) process dataset (Yin et al 2012). Each sample of the simulation dataset contains 52 observed variables: the first 22 are process measurement variables, the 23rd to 41st are composition measurement variables, and the last 11 are manipulated variables. Considering the influence of a real environment, Gaussian noise is added to all observed variables. The No. 18 dataset is selected to test the effectiveness of the algorithm. There are 980 training samples, including 480 fault samples and 500 normal samples; the training samples are randomly divided into a training set and a validation set at a ratio of 7:3. There are 960 samples in the testing set, including 160 normal samples and 800 fault samples. The intrinsic dimension of CLLDE is set to 7 and the number of nearest neighbors is set to 13 after validation.

The F_1 score is used to analyze and evaluate the performance of the fault diagnosis approach. The F_1 score can be calculated from the precision rate P and the recall rate R; the three indicators can be written as

P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R},

where TP is the number of positive samples whose predicted label is the same as the real label, FN is the number of positive samples misjudged as negative, and FP represents the number of false positive samples.

The process of fault diagnosis is divided into feature extraction and fault recognition. Different commonly used feature extraction methods and classification algorithms are combined and then compared with the method that uses CLLDE for feature extraction and EOSSA-LightGBM for fault recognition; the feature extraction methods include PCA and the other methods discussed above. In order to obtain the best hyper-parameters of LightGBM, EOSSA is used to optimize its parameters. The major hyper-parameters of LightGBM, the learning rate and the decision tree depth, are selected for optimization, and the F_1 score is selected as the fitness function of EOSSA. When the learning rate is 0.1058 and the depth of the decision tree is 7, the F_1 score is the best, and its effectiveness is verified on the validation set. Therefore, these two hyper-parameters of LightGBM are set to 0.1058 and 7, respectively. The experimental results on the test set are listed in Tables 3 and 4. In Table 3, the horizontal comparison shows the performance of classifiers using different feature extraction algorithms, and the vertical comparison shows the performance of different classifiers using the same feature extraction algorithm.
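The hyper-parameter search can be sketched as the fitness function below, assuming the lightgbm package: a candidate (learning rate, tree depth) pair is scored by the F_1 score on the held-out validation set, and the optimizer maximizes this value. The toy data and split are illustrative stand-ins for the TE features.

import numpy as np
import lightgbm as lgb
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(980, 7))                  # stand-in for CLLDE-extracted features
y = (X[:, 0] - X[:, 2] > 0).astype(int)        # toy stand-in for fault labels
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

def fitness(params):
    lr, depth = float(params[0]), int(round(params[1]))
    clf = lgb.LGBMClassifier(learning_rate=lr, max_depth=depth, n_estimators=100)
    clf.fit(X_tr, y_tr)
    return f1_score(y_val, clf.predict(X_val))  # EOSSA maximizes this value

print(fitness([0.1058, 7]))                     # the hyper-parameters reported above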
Table 4 lists the results of the various optimization algorithms in fault diagnosis. In order to show the results more intuitively, a histogram is shown in Fig. 10. From Table 4 and Fig. 10, we can conclude that the proposed approach is better than the others; the experiment proves the effectiveness of the proposed approach. Also, to compare the performance of the various algorithms more intuitively, the receiver operating characteristic (ROC) curves and the precision-recall (PR) curves are shown in Figs. 11, 12, 13, 14, 15, and 16, which correspond to LightGBM optimized by SSA, EOSSA, CSSA, CPSO, DSFGWO, and LMFO, respectively. They show the performance of the different fault diagnosis approaches when using different feature extraction methods. From the area under curve (AUC) values, we can conclude that the proposed approach is superior to the other algorithms in fault identification in the contrast experiment.

In this paper, a new fault diagnosis approach based on EOSSA optimized LightGBM is proposed. Aimed at the deformation problem of data distribution, the cam weighted distance is introduced into LLDE for feature extraction. The experiments show that CLLDE is effective, and the influence of a randomly selected k value on CLLDE is smaller than on LLDE. In the fault diagnosis experiment, CLLDE is used as the feature extraction method and improves the performance of various classifiers, which shows that it can effectively extract data features. EOSSA is proposed by introducing the elite opposite strategy and the orifice imaging opposite learning strategy into SSA. The experimental results show that the strategies used in EOSSA work: they accelerate the convergence speed of SSA and make the solutions more effective. Overall, EOSSA shows superior performance on unimodal, multimodal, and fixed-dimension problems, but its statistical performance on benchmark function F_6 is not as good as that of the other algorithms. EOSSA finds the best solution of all the algorithms on F_6; however, its standard deviation is greater than that of the other algorithms, which indicates that its performance on F_6 is less stable because it cannot guarantee that the solution of every search run falls in an appropriate range. Compared with the other optimization algorithms, EOSSA still shows advantages in the fault diagnosis problem, which indicates that EOSSA is feasible and effective for this problem. In future work, we can study the application of deep learning (DL) and reinforcement learning (RL) in fault diagnosis (Song et al 2021).
References

Fault diagnosis in spur gears based on genetic algorithm and random forest
Multidimensional scaling
An optimal transformation for discriminant and principal component analysis
Discriminant analysis for recognition of human face images
Fault prediction for distribution network based on CNN and LightGBM algorithm
LightGBM: a highly efficient gradient boosting decision tree
Particle swarm optimization
Incremental nonlinear dimensionality reduction by manifold learning
Orthogonal linear local spline discriminant embedding for face recognition
Efficient and robust feature extraction by maximum margin criterion
Locally linear discriminant embedding: an efficient method for face recognition
A survey of deep neural network architectures and their applications
A novel particle swarm optimization approach for patient clustering from emergency departments
A geometric approach to support vector machine (SVM) classification
Moth-flame optimization algorithm: a novel nature-inspired heuristic paradigm
Grey wolf optimizer
Detection of seizures in EEG signal using weighted locally linear embedding and SVM classifier
Weighted locally linear embedding for dimension reduction
Chaos driven PSO with attractive search space border points
Nonlinear dimensionality reduction by locally linear embedding
Kernel principal component analysis
Improved Harris hawks optimization using elite opposition-based learning and novel search mechanism for feature selection
A bearing fault diagnosis model based on CNN with wide convolution kernels
Multi-fault diagnosis for rotating machinery based on orthogonal supervised linear local tangent space alignment and least square support vector machine
Mitigation of power quality issues in smart grid using Levy flight based moth flame optimization algorithm
A global geometric framework for nonlinear dimensionality reduction
Enhancing particle swarm optimization using generalized opposition-based learning
Grey wolf optimization based on disturbance and somersault foraging
Principal component analysis
Multi-component fault diagnosis of wheelset-bearing using shift-invariant impulsive dictionary matching pursuit and sparrow search algorithm
A novel swarm intelligence optimization approach: sparrow search algorithm
A comparison study of basic data-driven fault diagnosis and process monitoring methods on the benchmark Tennessee Eastman process
A heuristic fault diagnosis approach for electro-hydraulic control system based on hybrid particle swarm optimization and Levenberg-Marquardt algorithm
A switching delayed PSO optimized extreme learning machine for short-term load forecasting
A new switching-delayed-PSO-based optimized SVM algorithm for diagnosis of Alzheimer's disease
A stochastic configuration network based on chaotic sparrow search algorithm
Robust locally linear embedding algorithm for machinery fault diagnosis
A whale optimization algorithm based on embedding circle mapping and orifice imaging opposite learning with dimension by dimension
An efficient algorithm to solve the small sample size problem for LDA
Improving nearest neighbor classification with cam weighted distance
Optimal parameter identification of PEMFC stacks using adaptive sparrow search algorithm

Availability of data and material: Iris dataset https://scikit-learn.org/stable/. Tennessee Eastman process dataset https://github.com/caoyue123202/tennessee-eastman-profBraatz.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Code availability: The codes of the algorithms could be provided upon request.