An Improved Animal Migration Optimization Algorithm to Train the Feed-Forward Artificial Neural Networks
Şaban Gülcü (sgulcu@erbakan.edu.tr), Computer Engineering Department, Necmettin Erbakan University, Konya, Turkey
Arab J Sci Eng, 2021-11-10. DOI: 10.1007/s13369-021-06286-z

The most important and demanding part of the artificial neural network is the training process, which involves finding the most suitable values for the weights in the network architecture, a challenging optimization problem. Gradient approaches and meta-heuristic approaches are the two methods extensively used to optimize the weights in the network. Gradient approaches have serious disadvantages, including getting stuck in local optima and inadequate exploration. To overcome these disadvantages, meta-heuristic approaches are preferred over gradient methods in training the artificial neural network. Therefore, in this study, an improved animal migration optimization algorithm with the Lévy flight feature was proposed to train the multilayer perceptron. The proposed hybrid algorithm is named IAMO-MLP. The main contributions of this article are that the IAMO algorithm was developed, the IAMO-MLP algorithm can successfully escape from local optima, and the initial positions do not affect the performance of the IAMO-MLP algorithm. The improved algorithm was tested and validated on a set of benchmark functions, where it substantially outperformed the original implementation. Afterward, the IAMO-MLP was compared with ten algorithms on five classification problems (xor, balloon, iris, breast cancer, and heart) and one real-world problem in terms of mean squared error, classification accuracy, and the nonparametric Friedman test. According to the results, the IAMO is successful in training the multilayer perceptron.

The artificial neural network (ANN) is one of the popular topics in machine learning and artificial intelligence. The design of ANNs was inspired by the working principle of the biological nervous system in the 1940s. It has been used in many areas as a result of studies that gained speed in the 1980s, and its success has been proven [1]. Nowadays, the ANN is a method frequently used in engineering studies due to its effective problem-solving strategy for complex and difficult problems. It is formed by connecting many artificial processing elements (neurons) in its layers. ANNs learn from samples of a given problem and then decide, using the information they have obtained, when they encounter samples of the problem that they have never seen before [2]. The method has been successfully used in areas such as classification [3], system modeling [4], face recognition [5], speech recognition [6], and optimization [7]. The most important feature of ANNs is the ability to make inferences for different conditions using the experience gained by learning from the information. The use of an ANN consists of two stages: training (learning) and testing. In the first stage, the ANN is trained with training data. Then, the network is tested using the test data to evaluate the performance of the trained ANN [8]. The most important and demanding part of the method is the training process of the network, which involves finding the most suitable values for the weights in the network architecture, a challenging optimization problem.
As emphasized in [9], two approaches are extensively used to optimize the weights in the network: gradient approaches and meta-heuristic approaches. Gradient approaches have several disadvantages: (i) They can easily get stuck in local optima. (ii) The value of the learning rate is very important and affects the performance of the algorithm: if it is too small, the training process takes a long time and can stall, and if it is too large, the training process takes a short time but the algorithm converges prematurely. (iii) The wider the search space, the worse the results that gradient approaches generate. (iv) Gradient approaches depend heavily on the initial values of the weights; moreover, the same initial values in different runs generate the same results. Because of these problems, meta-heuristic approaches can be preferred over gradient methods in training ANNs.

Computer scientists have developed many meta-heuristic algorithms inspired by certain behaviors of creatures in nature to find solutions to optimization problems. The animal migration optimization (AMO) algorithm was inspired by the migration behavior of individuals found in all major animal groups, including birds, mammals, and insects. The developer of the AMO algorithm showed that it is able to improve the initial random population, converge toward the global optimum, provide very competitive results compared to other well-known algorithms in the literature, and solve different kinds of optimization problems. Thanks to its success, the AMO algorithm has been applied to many different optimization problems such as association rule mining [10], clustering [11], unmanned aerial vehicle placement [12], mobile ad hoc networks [13], the optimal power flow problem [14], the traveling salesman problem [15], reinforcement of bridges [16], and multilevel image thresholding [17]. These applications motivated our attempts to employ the AMO algorithm for training ANNs. In this study, the improved animal migration optimization (IAMO) algorithm with the Lévy flight feature was developed and used to find the optimum weights of the network and thereby increase the success of the ANN.

The most important and demanding part of ANNs is the training process of the network. To increase the success of the network, the network should be updated with optimum weights. In the literature, the training of ANNs has been carried out using several different optimization algorithms, whose success was determined by comparing them on different real-world problems. Aljarah et al. [18] proposed the whale optimization algorithm for training ANNs. This algorithm was proven to be able to solve a wide variety of optimization problems and surpass existing algorithms. Twenty datasets with different difficulty levels were used to test the success of the algorithm in training the feed-forward ANN. The success of the algorithm was determined by comparing it with the backpropagation (BP) algorithm and six different evolutionary optimization algorithms. It was shown that the proposed model performs better than the other algorithms in terms of both local optimum avoidance and convergence speed. Benmessahel et al. [19] proposed an advanced detection approach for intrusion detection systems by combining the multi-verse optimization algorithm and an ANN.
The main idea of the study was to train a feed-forward multilayer ANN using the multi-verse optimization algorithm to detect new intrusions. The NSL-KDD and UNSW-NB15 datasets were used to test the success of the approach, and the results for UNSW-NB15 were better than those for NSL-KDD. Moreover, the proposed method outperformed an ANN trained using the particle swarm optimization (PSO) algorithm. Chatterjee et al. [20] proposed an ANN trained using the PSO algorithm to predict the failure probability of multistory concrete structures by determining the causes of their structural failure. In the experimental studies, a database of 150 multistoried reinforced concrete buildings was used. The proposed method demonstrated a better success rate than the multilayer feed-forward ANN model, proving its effectiveness. Tezel et al. [21] used the artificial algae algorithm (AAA) to optimize the weights of the ANN: the ANN and AAA were combined in such a way that the training phase of the ANN was performed by the AAA. The proposed model was tested on three datasets (iris, thyroid, and dermatology) taken from the University of California, Irvine (UCI) machine learning database. The results were compared with the multilayer perceptron algorithm with backpropagation in terms of mean absolute error. It was stated that models to which the proposed method is applied can achieve a reduction of up to 96% in mean squared error. In [22], the moth-flame optimization algorithm was proposed for training a feed-forward multilayer ANN. The algorithm produces weight and bias values that yield minimum error and a high classification rate. Five classification datasets were used to evaluate the performance of the proposed method, which was compared with the genetic algorithm (GA), PSO, ant colony optimization (ACO), and evolution strategy. The experimental results proved that the moth-flame optimization algorithm solves the local minima problem and achieves high accuracy. Jaddi et al. [23] proposed a hybrid method based on the bat optimization algorithm (BAT) and the ANN. The BAT algorithm produces the weight and bias values of the ANN with minimum error and high classification success. To test the performance of the proposed approach in terms of classification and prediction accuracy, six classification and two time-series datasets were used. Statistical tests showed that the proposed method produces good results. The method was applied to a real-world problem of predicting future values of rainfall data, and the results confirmed its success. In [24], the particle swarm optimization algorithm was used to train the ANN; four datasets from the UCI machine learning database were used in the experiments. In [25], the grey wolf optimization (GWO) algorithm was applied to train a multilayer perceptron (GWO-MLP). Eight datasets were used in the experiments, and the GWO-MLP algorithm was compared to the PSO-MLP based on the PSO algorithm, the GA-MLP based on the genetic algorithm, the ACO-MLP based on the ACO algorithm, the ES-MLP based on the evolution strategy, and the PBIL-MLP based on the population-based incremental learning algorithm. In [26], the states of matter search (SMS) algorithm was used to train the ANN.
In the experimental studies, five datasets from the UCI machine learning database were used, and the SMS-MLP algorithm was compared to six algorithms in the literature; according to the experimental results, it outperformed all six. In [27], an improved version of the beetle antennae search algorithm was proposed, which outperformed the original implementation on the benchmark functions. The improved beetle antennae search algorithm was used to optimize the parameters of an adaptive neuro-fuzzy inference system and to improve the performance of the prediction model, and it was evaluated for COVID-19 case prediction using the World Health Organization's official data. In [28], the overfitting problem of convolutional neural networks was overcome by properly selecting a regularization parameter known as dropout using four distinct meta-heuristic techniques (particle swarm optimization, the bat algorithm, cuckoo search, and the firefly algorithm). The results of the four optimization methods were compared with the default dropout-less network and the default dropout ratio. The experiments were carried out on four public datasets in the context of image classification. The experimental results showed that the meta-heuristic-based dropout convolutional neural network is very promising.

In this study, we propose the IAMO algorithm, which is used to find the optimum weights of the network and thereby increase the success of the ANN. The contribution of this article is that the IAMO algorithm has the Lévy flight strategy. The proposed hybrid algorithm is called IAMO-MLP, and 13 benchmark functions, five classification problems (xor, balloon, iris, breast cancer, heart), and one real-world problem (a prediction problem in civil engineering) are used in the experiments. On the benchmark functions, the IAMO algorithm was compared with the original AMO algorithm. On the classification problems, the results of the IAMO-MLP algorithm were compared in detail with the results of the AMO-MLP algorithm, the BAT-MLP based on the bat optimization algorithm, the SMS-MLP [26] based on the states of matter search optimization algorithm [29], and the BP algorithm. The IAMO-MLP algorithm was also compared to the GWO-MLP, ACO-MLP, GA-MLP, PBIL-MLP, PSO-MLP, and ES-MLP algorithms in [25]. On the real-world problem, the results of the IAMO-MLP algorithm were compared with the results of the AMO-MLP, BAT-MLP, SMS-MLP, and BP. Moreover, the algorithms were run using different numbers of neurons in the hidden layer. The experimental results showed that the proposed IAMO-MLP algorithm is more efficient than the others. The main contributions of this article are as follows: (1) The proposed IAMO algorithm has the Lévy flight feature. (2) This article employs the AMO and IAMO to train the ANN for the first time, and the proposed algorithm is called IAMO-MLP. (3) The IAMO-MLP algorithm has the ability to escape successfully from local optima. (4) The initial parameters and positions do not affect the performance of the IAMO-MLP algorithm.
(5) The features of the IAMO-MLP algorithm are its simplicity, requiring only a few parameters, and its ability to solve a wide array of problems.

This article is organized as follows. Information about the training process of ANNs and some meta-heuristic algorithms employed in training ANNs is provided in the Introduction section. Information about the ANN algorithm, the AMO algorithm, the proposed IAMO algorithm, and the IAMO-MLP algorithm (training the ANN using the IAMO algorithm) is provided in the Material and Methods section. The experimental results of the algorithms on the benchmark functions, the classification problems, and the real-world problem are given in the Experimental Results section. Finally, the results obtained are evaluated, and suggestions for future studies are presented in the Conclusion section.

This section gives detailed information about the ANN algorithm, the AMO algorithm, the proposed IAMO algorithm, and the IAMO-MLP algorithm. The first studies on ANNs were carried out in the late nineteenth and early twentieth centuries. These first studies looked at physics, psychology, and neuropsychology [30], emphasizing general theories of learning, perception, and conditioning. Over time, new developments, such as the BP algorithm used to train multilayer networks, further strengthened the studies on ANNs. Over the years, many articles have been written about ANNs, and the developed ANN models have been used in many distinct areas. The ANN is frequently mentioned in the areas of machine learning and artificial intelligence nowadays. The main areas in which ANNs are used are classification, clustering, pattern recognition [38], estimation, and optimization. ANNs are used for problem-solving not only in engineering but also in many other fields including finance [31], medicine [32], physics [33], transportation [34], statistics [35], and mathematics [36]. ANNs have many features that lead to their use in many different areas, including the ability to produce nonlinear models, the ability to learn and generalize, and applicability to different problems.

The ANN is a problem-solving strategy that consists of artificial nerve cells similar in structure to biological nerve cells. The multilayer perceptron (MLP) is a version of the ANN which consists of three main layers: input, hidden, and output. The general structure of the MLP is given in Fig. 1. The input layer is where the input data in the dataset of the problem are given to the MLP. In the input layer, there are as many cells as input attributes in the dataset of the problem. The data given to the input layer are transmitted to the next layer for processing. The hidden layer processes the data taken from the input layer and transmits the results to the output layer. While some MLPs have only one hidden layer, others have more than one. The number of neurons in the hidden layer is independent of the number of neurons in the input layer and the output layer. The complexity of the algorithm and the solution time of the problem increase along with the number of neurons in the hidden layer, but this also enables the ANN to be used in solving more complex problems. The output layer is the layer where the output of the network is produced by processing the data coming from the hidden layers [2]. The MLP consists of artificial nerve cells, and an artificial nerve cell consists of five main parts: inputs, weights, an addition (aggregation) function, an activation function, and outputs.
The data coming to a neuron are called inputs. Weights are used to adjust the effect of the inputs on the output. The value of a weight can be positive, negative, or zero; if a weight is 0, the corresponding input does not affect the output of the neuron. The input data of an artificial nerve cell are multiplied by the weights of the connections, and the net input is calculated using the addition function; the bias value is also added to the net input. The activation function produces the output of the artificial nerve cell by processing the net input value obtained from the addition function. This process is shown in Fig. 2. When determining the activation function, nonlinear activation functions are generally preferred. Another point to consider is that the derivative of the function should be easy to calculate. The sigmoid activation function, given in Eq. (1), is generally preferred in the MLP model, which is widely used today; it is a continuous, nonlinear function with an easily computed derivative:

f(x) = 1 / (1 + e^(-x))    (1)

This function generates a value between 0 and 1 for each input value. The value obtained using the activation function corresponds to the output value of the artificial nerve cell [2]. In addition, all datasets are normalized using the min-max normalization function given in Eq. (2) to eliminate the effect of attributes that may have different effective ranges on the classification [18]:

x' = (x - x_min) / (x_max - x_min)    (2)

where x' is the normalized value of x, and x lies in the range [x_min, x_max]. The normalized value x' will be in the range [0, 1].
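To make the neuron computation and the min-max normalization above concrete, the following is a minimal sketch in Python with NumPy. The function names are illustrative and not from the paper; the sigmoid and normalization follow Eqs. (1) and (2) directly.

```python
import numpy as np

def sigmoid(x):
    # Eq. (1): continuous, nonlinear activation mapping any net input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def min_max_normalize(X):
    # Eq. (2): rescale each attribute (column) of X into [0, 1]
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min)

def neuron_output(inputs, weights, bias):
    # Addition function: weighted sum of the inputs plus the bias,
    # then the activation function produces the neuron's output.
    net = np.dot(weights, inputs) + bias
    return sigmoid(net)

# A single neuron with three inputs
x = np.array([0.2, 0.7, 0.1])
w = np.array([0.5, -1.2, 0.8])
print(neuron_output(x, w, bias=0.3))

# Normalizing a toy dataset with two attributes
data = np.array([[2.0, 10.0], [4.0, 20.0], [6.0, 40.0]])
print(min_max_normalize(data))
```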
Migration, a common animal behavior arising out of animals' survival efforts, is a behavioral movement that transports animals to new habitats. Animal migrations are movements of individuals over long distances, usually seasonal. Migration is a vital activity found in all animal groups, including birds, mammals, and insects; climate and insufficient food are the main reasons that force animals to migrate. During the migration process, individuals in animal groups act by following three main rules: (1) move in the same direction as neighbors, (2) stay close to neighbors, and (3) avoid colliding with neighbors. Recent studies on starlings have shown that each bird changes its position in direct relationship to the six or seven animals around it, regardless of how close or how far those animals are. These interactions between starlings in a flock are based on a topological rule [39]. Inspired by these rules, a new swarm-based algorithm called AMO was proposed by Li, Zhang, and Yin [40]. The main idea of the AMO algorithm is applied through concentric zones around each animal. In the innermost repulsion zone, the animal tries to distance itself from its neighbors to avoid collision. Moving a little away, in the harmony zone, the animal tries to align its direction of movement with its neighbors. In the outermost attraction zone, the animal tries to move toward its neighbors.

The AMO algorithm is a swarm-based optimization algorithm developed to solve global optimization problems, inspired by the migration behavior found in all large animal groups such as birds and fish. Two idealized assumptions describe the basic operation of the algorithm: (1) The animal with the highest quality in the herd is defined as the leader animal, and the leader animal is protected in future generations. (2) The number of animals in the herd is fixed, and each animal is replaced with a new individual with probability P_a; in this case, the animal leaves the group, and a new animal joins the group.

The AMO algorithm consists of two processes: the migration process and the population updating process. The migration process covers how the animals move from the current location to a new location. Animals must obey the three topological rules of migration in this process: (1) move in the same direction as neighbors, (2) stay close to neighbors, and (3) avoid colliding with neighbors. When these rules are followed, the animals migrate in an optimized way. The migratory animal population consists of a set of individuals, shown in Eq. (3):

X = {X_1, X_2, ..., X_NP}    (3)

where NP and X_i represent the population size and an individual in the population, respectively. Each individual in the population is a d-dimensional vector whose elements form a solution within the minimum and maximum limits of the search space. At the beginning of the algorithm, each individual is assigned a value within these limits using Eq. (4):

X_i = X_min + rand · (X_max - X_min)    (4)

where X_i, X_min, and X_max represent an individual in the population, the minimum bounds of the search space, and the maximum bounds of the search space, respectively, and rand is a uniformly distributed random number between 0 and 1.

A local neighbor set is needed to determine the new location of each individual in the population. The ring topology scheme given in Fig. 3 is used to define this set. In Fig. 3, the length of the neighborhood is set to five for each dimension of the individual. If the animal index is i, the neighborhood is formed by the animals with indices i-2, i-1, i, i+1, and i+2; if the animal index is 1, the neighborhood is formed by the animals with indices NP-1, NP, 1, 2, and 3, where NP is the total number of individuals in the migration population. Each individual calculates its new position according to the positions of the individuals in its neighborhood, obeying the rules above, using Eq. (5):

X_i^(t+1) = X_i^t + δ · (X_neighbor^t - X_i^t)    (5)

where X_neighbor^t is the current position of the neighbor selected from the set, X_i^t and X_i^(t+1) are the positions of the ith individual in iterations t and t+1, respectively, and δ is a random value between 0 and 1. This value may vary according to different real-world problems.

The population updating process covers how some animals leave the herd and how new animals are added to the herd. A probability value is assigned to each individual in the population according to its fitness value: the probability for the most compatible (best) individual is 1, and the probability for the least compatible individual is 1/NP, where NP is the population size. When the probability value of an individual is less than a randomly generated value, a new individual is created using Eq. (6):

X_i^(t+1) = X_r1^t + rand · (X_best^t - X_i^t) + rand · (X_r2^t - X_i^t)    (6)

where X_i^t and X_i^(t+1) are the positions of the ith individual in iterations t and t+1, respectively, X_r1^t and X_r2^t are randomly selected individuals from the population, X_best^t is the position of the leader animal with the highest quality, and rand is a random value between 0 and 1. If the quality of the new individual is better than that of the current individual, the current individual is removed from the population and the new individual is added.
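As a concrete illustration of the two processes, the sketch below implements one AMO generation for a minimization problem, consistent with Eqs. (3)-(6) as described above. It is a simplified, hypothetical Python/NumPy rendering: the original algorithm can select a different neighbor for each dimension, whereas this sketch draws one neighbor per individual, and all names are illustrative.

```python
import numpy as np

def amo_step(pop, fitness, objective, x_min, x_max):
    """One AMO generation: migration (Eq. 5), then population updating (Eq. 6).
    pop is an (NP, d) array; fitness holds the objective value of each row."""
    NP, d = pop.shape

    # Migration process: ring topology with a neighborhood of length five.
    for i in range(NP):
        neighbors = [(i + k) % NP for k in (-2, -1, 0, 1, 2)]  # wraps at the ends
        j = np.random.choice(neighbors)
        delta = np.random.rand(d)                              # delta in [0, 1]
        candidate = np.clip(pop[i] + delta * (pop[j] - pop[i]), x_min, x_max)  # Eq. (5)
        f = objective(candidate)
        if f < fitness[i]:                                     # keep the better position
            pop[i], fitness[i] = candidate, f

    # Population updating process: rank-based replacement probability,
    # 1 for the best individual and 1/NP for the worst.
    rank = np.argsort(np.argsort(fitness))                     # rank 0 = best (minimization)
    prob = (NP - rank) / NP
    best = pop[np.argmin(fitness)].copy()
    for i in range(NP):
        if np.random.rand() > prob[i]:
            r1, r2 = np.random.choice([k for k in range(NP) if k != i], 2, replace=False)
            candidate = (pop[r1]
                         + np.random.rand(d) * (best - pop[i])
                         + np.random.rand(d) * (pop[r2] - pop[i]))  # Eq. (6)
            candidate = np.clip(candidate, x_min, x_max)
            f = objective(candidate)
            if f < fitness[i]:
                pop[i], fitness[i] = candidate, f
    return pop, fitness

# Example usage on the sphere benchmark function
NP, d = 50, 10
pop = -50 + 100 * np.random.rand(NP, d)        # Eq. (4) with X_min = -50, X_max = 50
sphere = lambda x: float(np.sum(x ** 2))
fitness = np.array([sphere(x) for x in pop])
for _ in range(100):
    pop, fitness = amo_step(pop, fitness, sphere, -50, 50)
print(fitness.min())
```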
The flowchart of the AMO algorithm is shown in Fig. 4, and the pseudo-code of the AMO algorithm is presented in [40]. Although the AMO algorithm is one of the recent meta-heuristic algorithms and shows good performance in solving optimization problems, it has some bottlenecks. According to [41], the performance of AMO degrades rapidly when the dimensionality is larger than 30. According to [42], the bottlenecks of the AMO algorithm are premature convergence and falling into local optima. In order to overcome these bottlenecks, we propose the IAMO algorithm, which has the Lévy flight strategy.

The Lévy flight, developed by Paul Lévy, is a version of the random walk model. It is based on the Lévy distribution, a continuous probability distribution. Studies show that the distances traveled by many animals, including bees, ants, and fish, in foraging behavior correspond to the Lévy distribution [43]. The advantage of the Lévy flight is that it optimizes the distance traveled in foraging. Therefore, we applied the Lévy flight strategy to the individuals in the IAMO algorithm, and as a result, the Lévy flight improved the diversification and intensification of the IAMO algorithm. Normally, the position of an individual is updated using Eq. (5). But in IAMO, if an individual cannot improve its position in several consecutive iterations, it updates its position using Eq. (7), which contains the Lévy flight strategy:

X_i^(t+1) = X_i^t + Levy(D) · X_i^t    (7)

where X_i^t and X_i^(t+1) are the positions of the ith individual in iterations t and t+1, respectively, and D is the dimension of the position X_i^t. The Lévy flight is calculated using Eq. (8):

Levy = (r_1 · σ) / |r_2|^(1/β)    (8)

where β is a constant (1.5), and r_1 and r_2 are random numbers between 0 and 1. σ is calculated using Eq. (9):

σ = G^(1/β)    (9)

where G is calculated using Eq. (10):

G = Γ(1 + β) · sin(πβ/2) / (Γ((1 + β)/2) · β · 2^((β-1)/2))    (10)

In contrast to AMO, each individual in IAMO also has a counter variable that records the number of consecutive iterations in which the individual could not be improved. There is also a threshold variable in IAMO that controls the activation of the Lévy flight. If the counter of an individual exceeds the threshold value, the Lévy flight strategy is applied to this individual using Eq. (7). Figure 5 shows the flowchart of the IAMO algorithm, and Fig. 6 shows its pseudo-code. Firstly, the parameters are initialized, and the population is randomly generated. The fitness value of each individual is calculated, and the global best position is determined. Secondly, the migration process of the algorithm, which covers how the animals move from the current location to a new location, starts. An individual updates its position with the help of its neighbors. If an individual cannot improve its fitness value in several consecutive iterations, it updates its position using the Lévy flight strategy, and its counter is reset. The fitness value of the new position of each individual is calculated, and, if the new position is better than the current position, the individual moves to the new position.
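A minimal sketch of the Lévy step consistent with Eqs. (7)-(10) follows. It assumes the common Mantegna formulation, in which r1 and r2 are drawn from normal distributions (the scale of r1 being the σ of Eq. 9) rather than uniformly; the threshold value of 5 is the one used later in the experiments, and the function names are illustrative.

```python
import math
import numpy as np

BETA = 1.5       # the constant beta used in Eq. (8)
THRESHOLD = 5    # consecutive non-improving iterations before the Levy flight fires

def levy_flight(dim, beta=BETA):
    # Eq. (10): the Gamma-function term G, and Eq. (9): sigma = G^(1/beta)
    g = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
         / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)))
    sigma = g ** (1 / beta)
    r1 = np.random.normal(0.0, sigma, dim)       # Mantegna: r1 ~ N(0, sigma^2)
    r2 = np.random.normal(0.0, 1.0, dim)         # Mantegna: r2 ~ N(0, 1)
    return r1 / np.abs(r2) ** (1 / beta)         # Eq. (8)

def iamo_update(position, counter):
    """Apply Eq. (7) to a stagnant individual and reset its counter."""
    if counter > THRESHOLD:
        return position + levy_flight(position.size) * position, 0
    return position, counter
```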
In this work, a hybrid algorithm (IAMO-MLP) is proposed to train the multilayer perceptron (MLP) using the IAMO algorithm. In the proposed IAMO-MLP algorithm, the IAMO algorithm optimizes the weights and biases of the MLP. Figure 7 shows an example of an MLP with the 2-3-2 structure, and Fig. 8 shows the position vector for this MLP. In Fig. 7, x_i, o_i, w_ij, and b_i represent the inputs, the outputs, the weights, and the biases of the MLP, respectively.

The candidate solution vector is the same size for all individuals in the population and is equal to the total number of weights and biases that make up the network. The length of the candidate solution vector is calculated using Eq. (11):

L = (k × l) + l + (l × m) + m    (11)

where L, k, l, and m represent the length of the vector, the number of neurons in the input layer, the number of neurons in the hidden layer, and the number of neurons in the output layer, respectively. The IAMO-MLP algorithm optimizes the weights and biases of the MLP according to the input-output pattern. To find the optimum values, the IAMO-MLP algorithm tries to minimize the error between the real outputs and the predicted outputs. The mean squared error (MSE) shown in Eq. (12) is used as the objective function to calculate the fitness values of the solution vector, and the IAMO-MLP algorithm aims to minimize it:

MSE = (1/N) Σ_{i=1..N} Σ_{j=1..K} (r_ij - p_ij)²    (12)

where N is the number of training samples, K is the number of neurons in the output layer, and r_ij and p_ij are the real output and the predicted output of neuron j for the ith training sample, respectively.

The flowchart of the IAMO-MLP algorithm is shown in Fig. 9. The IAMO-MLP algorithm works as follows. Each individual in the algorithm represents an animal and offers a solution, and each individual has a fitness value calculated by the objective function in Eq. (12). Firstly, the population size and the length of the neighborhood are initialized, and the population is randomly generated using Eqs. (3) and (4). Then, the fitness value of each individual is calculated by the objective function; in this stage, each individual is assigned to an MLP, and the MLP is evaluated using the MSE on the training samples. Then, the migration process with the Lévy flight is run, in which each individual seeks better solutions in the search space with the help of its neighbors and the Lévy flight strategy using Eqs. (5) and (7). Then, the population updating process is run, in which animals with low fitness are removed from the population and new individuals with high fitness are added using Eq. (6). Thus, each individual searches for better solutions in the search space. This process continues until the termination criteria are met. At the end of the algorithm, the best position, namely the best MLP with minimum MSE, is reported as the output of the algorithm. Thus, the IAMO-MLP algorithm finds the most appropriate values of the weights and biases according to the input-output pattern. Consequently, the IAMO algorithm updates the values of the weights and biases of the MLP to minimize the MSE until the termination criteria of the training process are met.
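To make the encoding concrete, the sketch below computes the vector length of Eq. (11), decodes a candidate solution vector into the weights and biases of a k-l-m MLP, and evaluates the MSE of Eq. (12) on a training set. It is a minimal, hypothetical fitness function assuming sigmoid activations in both layers and averaging the squared error over the samples; all names are illustrative.

```python
import numpy as np

def vector_length(k, l, m):
    # Eq. (11): one weight per input-hidden and hidden-output connection,
    # plus one bias per hidden and output neuron.
    return (k * l) + l + (l * m) + m

def mse_fitness(vec, X, Y, k, l, m):
    """Decode a candidate solution vector into a k-l-m MLP and return
    the mean squared error (Eq. 12) over the training samples X, Y."""
    idx = 0
    W1 = vec[idx:idx + k * l].reshape(l, k); idx += k * l   # input-to-hidden weights
    b1 = vec[idx:idx + l];                   idx += l       # hidden biases
    W2 = vec[idx:idx + l * m].reshape(m, l); idx += l * m   # hidden-to-output weights
    b2 = vec[idx:idx + m]                                   # output biases
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    hidden = sigmoid(X @ W1.T + b1)                         # hidden-layer outputs
    pred = sigmoid(hidden @ W2.T + b2)                      # predicted outputs p_ij
    return float(np.mean(np.sum((Y - pred) ** 2, axis=1)))  # Eq. (12)

# For the 2-3-2 network of Fig. 7: (2*3) + 3 + (3*2) + 2 = 17
print(vector_length(2, 3, 2))

# Hypothetical evaluation of one candidate, weights drawn from [-10, 10]
rng = np.random.default_rng(0)
vec = rng.uniform(-10, 10, vector_length(2, 3, 2))
X, Y = rng.random((5, 2)), rng.random((5, 2))
print(mse_fitness(vec, X, Y, 2, 3, 2))
```

For the iris architecture used later (4 inputs, 2 × 4 + 1 = 9 hidden neurons, 3 outputs), the same formula gives a vector of length (4 × 9) + 9 + (9 × 3) + 3 = 75.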
The efficiency of an algorithm may be demonstrated using a theoretical analysis or an empirical analysis [44]. As an empirical analysis of IAMO-MLP, Table 6 presents the average computational time in seconds. In a theoretical analysis, the worst-case complexity of the algorithm is generally computed. The worst-case complexity of IAMO-MLP depends on the number of iterations, the number of animals, the structure of the MLP, the number of training instances, the number of attributes of the training instances, the migration process, and the population updating process. So, the worst-case complexity of IAMO-MLP is as follows:

O(g × N × (t × (i × h + h × o) + d))    (14)

where g is the number of iterations, N is the number of individuals, t is the number of training instances, i is the number of input nodes, h is the number of hidden nodes, o is the number of output nodes, and d is the dimension of the position vector of an individual.

In this section, to verify the accuracy and robustness of the proposed IAMO and IAMO-MLP algorithms, experimental studies were carried out on datasets with different difficulty levels and different features: 13 benchmark functions, five classification datasets taken from the UCI Machine Learning Repository, and a real-world problem taken from [45]. The specifications of the hardware and software used in the experiments are as follows: Intel(R) Core(TM) i5-3330 3.00 GHz, 4 GB memory, and Microsoft Windows 10. The algorithms were implemented in MATLAB R2015a. All statistical analyses in this study were performed with Microsoft Excel 2013.

To determine the success of the proposed IAMO algorithm, the 13 benchmark functions shown in Table 1 were used in the experiments. These functions have been widely used in the literature. Functions f1-f5 are unimodal, and the rest are multimodal. Table 1 also shows the dimensions, the global minimum values, and the search ranges. The performance of the proposed IAMO algorithm was compared with the performance of the following algorithms: AMO [40], particle swarm optimization (PSO) [46], differential evolution (DE) [47], biogeography-based optimization (BBO) [48], cuckoo search (CS) [49], the firefly algorithm (FA) [50], the gravitational search algorithm (GSA) [51], and artificial bee colony (ABC) [52]. The results of these algorithms were taken from [40] and are presented in Table 2. To compare the algorithms fairly, each algorithm uses the same number of function evaluations (FEs) at each run: 150,000 FEs for f1, f6, f10, f12, and f13; 200,000 FEs for f2 and f11; 300,000 FEs for f7, f8, and f9; and 500,000 FEs for f3, f4, and f5. The threshold and population size parameters of the IAMO algorithm are set to 5 and 50, respectively. The IAMO algorithm was independently run 30 times on each function. The average values (mean) of the results and their standard deviations (stdDev) over the 30 runs are provided in Table 2. According to the results on the benchmark functions in Table 2, it is evident that the IAMO algorithm outperforms the other algorithms in the majority of the test cases. The IAMO algorithm provides the best results on ten of the benchmark functions and the second-best results on two of them. According to Table 2, it is obvious that the proposed IAMO algorithm has better performance than the canonical AMO algorithm.

To determine the success of the proposed IAMO-MLP algorithm, five classification datasets were used in the experiments: xor, balloon, iris, breast cancer, and heart. These datasets were selected with different levels of difficulty to effectively test the performance of the IAMO-MLP algorithm. The properties of these five datasets (the numbers of attributes and classes and the numbers of training and test samples) are shown in Table 3.
The training and test subsets of the datasets are taken from the website www.seyedalimirjalili.com and were also used in [25]. Additionally, Table 3 shows the architecture of the MLP and the vector size of the candidate solutions. The number of neurons in the hidden layer was determined as 2 × n + 1, where n is the number of neurons in the input layer. The length of the vector is calculated using Eq. (11). The IAMO-MLP algorithm was independently run 30 times on each dataset; 12,500 function evaluations (FEs) for xor and balloon, and 50,000 FEs for the remaining datasets, were carried out at each run. The initial values of the weights and biases of the MLP were randomly determined in the range [-10, 10]. The proposed IAMO-MLP algorithm is also compared with the AMO-MLP, BAT-MLP, SMS-MLP, and BP algorithms; Table 4 shows the parameters of these algorithms. To compare all the algorithms fairly, they use the same number of FEs.

Table 5 shows the average and standard deviation of the MSE results of the algorithms on the training data. The lower the MSE, the better the performance of the algorithm. According to the results in Table 5, the IAMO-MLP algorithm has the lowest MSE value for each dataset. Figure 10 shows the convergence graphs of the algorithms in terms of the MSE value at each iteration. When the convergence graphs are analyzed, it is seen that the IAMO-MLP algorithm exhibits very good performance. Table 6 shows the average classification accuracy on the test data and the average computational time of the algorithms. The IAMO-MLP algorithm has better classification accuracy than the other algorithms on the xor, balloon, iris, and breast cancer datasets; on the heart dataset, the BAT-MLP algorithm has the best classification accuracy. In terms of average computational time, the fastest algorithm is the BP algorithm, and the slowest is the IAMO-MLP algorithm.

Table 7 shows the results for the performance metrics sensitivity, specificity, precision, and F1-score. According to the results, the IAMO-MLP algorithm has the highest percentages of sensitivity, specificity, and precision on the xor, balloon, and iris datasets among all the algorithms. Additionally, the IAMO-MLP algorithm has the highest percentages of specificity and precision on the heart dataset. In terms of the F1-score, the IAMO-MLP algorithm has better results than the other algorithms on the xor, balloon, iris, and breast cancer datasets. Overall, the IAMO-MLP algorithm is successful according to the performance metrics.

Figure 11 shows the boxplot charts of the classification rate results of the algorithms on the test data. Boxplot charts are an easy way to visually show the distribution of data; they are particularly used to summarize data in terms of central location, spread, skewness, and kurtosis and to identify outliers, presenting the minimum value, first quartile, median, mean, third quartile, and maximum value of the data. According to the boxplot charts in Fig. 11, the IAMO-MLP algorithm generates more robust results than the AMO-MLP, BAT-MLP, SMS-MLP, and BP algorithms on each dataset, although the BAT-MLP algorithm has a better average classification accuracy than the IAMO-MLP and AMO-MLP algorithms on the heart dataset. The IAMO-MLP algorithm is also compared with the GWO-MLP, PSO-MLP, GA-MLP, ACO-MLP, ES-MLP, and PBIL-MLP algorithms, whose results are taken from [25].
Table 8 shows this comparison according to the best classification accuracy (%). The results show that the IAMO-MLP algorithm outperforms the other six algorithms on all the datasets. On the xor dataset, the IAMO-MLP, GWO-MLP, and GA-MLP algorithms have 100% classification accuracy. On the balloon dataset, all the algorithms have 100% classification accuracy. On the iris dataset, the IAMO-MLP algorithm has 99.33% classification accuracy, and the GWO-MLP algorithm has the second-best result with 91.33%. On the breast cancer dataset, the IAMO-MLP algorithm and the GWO-MLP algorithm have the same classification accuracy, namely 99%. On the heart dataset, the IAMO-MLP algorithm has 79.14% classification accuracy, and the GWO-MLP algorithm has the second-best result with 75%.

The Friedman test is a nonparametric statistical test in which two or more samples are used to compare populations; it also gives a ranking of the populations. Table 9 shows the average ranking of the algorithms according to the classification rate. Because the aim is to maximize the classification rate, higher values in Table 9 are better. According to the Friedman test results in Table 9, the IAMO-MLP algorithm ranks higher than the other algorithms with a ranking score of 4.4. The AMO-MLP algorithm ranks second with a score of 4.2, the BAT-MLP algorithm third with 2.9, the BP algorithm fourth with 1.8, and the SMS-MLP algorithm last with 1.7.

In this section, the IAMO-MLP algorithm is applied to solve a real-world problem in the civil engineering area in which waste tires are added into cement, with the result that the waste tires are recycled and the strength of the concrete is increased [53-55]. However, the compressive strength of rubberized concrete varies according to the amounts of the substances added into it. To find the optimum value of the compressive strength, the amounts of the added substances are determined empirically. Recently, some models based on soft computing techniques, such as ANNs [45, 56], have been created to estimate the compressive strength. In this study, the IAMO-MLP algorithm was used to estimate the compressive strength. The dataset of the real-world problem was taken from [45]. It has three attributes (water-cement ratio w/c, superplasticizer sp, and granular skeleton gs), one output (compressive strength fc), and 112 instances; some of the data are shown in Table 10. Of these, 95 instances are used for training, and the remaining 17 instances are used for testing. The IAMO-MLP algorithm was compared to the AMO-MLP, BAT-MLP, SMS-MLP [26], and BP algorithms. In the previous section, the number of neurons in the hidden layer was calculated as 2 × n + 1, where n is the number of neurons in the input layer. In this section, we compared the algorithms using different numbers of neurons in the hidden layer; the number of hidden neurons (H) was set from 7 up to 20. We also compared the algorithms using different combinations of population size (P) and maximum iteration number (I): P = 50 and I = 250, P = 50 and I = 500, and P = 100 and I = 250. To compare the algorithms fairly, the number of function evaluations of each algorithm was kept equal.
Therefore, the maximum epoch number of the BP algorithm was 12,500 for 50 population-250 iterations, and 25,000 for 50 population-500 iterations and 100 population-250 iterations. The other parameters are shown in Table 4. The algorithms were independently run 30 times.

Table 11 shows the results of the algorithms on the training data according to the MSE for P = 50 and I = 250. The IAMO-MLP algorithm outperforms the other algorithms for all H values and has both the best average values and the best standard deviations. On the other hand, the BP algorithm has both the worst average MSE results and the worst standard deviations. Table 12 shows the results of the algorithms on the test data according to the MSE for P = 50 and I = 250, and Fig. 12 shows the boxplot charts of the MSE results on the test data for this setting. As seen from the table and boxplot charts, the IAMO-MLP algorithm outperforms the other algorithms for all H values except H = 8, 11, and 15. Besides, the results of the AMO-MLP algorithm are better than those of the SMS-MLP, BAT-MLP, and BP algorithms. The BP algorithm has the worst results.

Table 13 shows the results of the algorithms on the training data according to the MSE for P = 100 and I = 250. The IAMO-MLP algorithm outperforms the other algorithms for all H values and has both the best average values and low standard deviations. On the other hand, the BP algorithm has both the worst average MSE results and the worst standard deviations. Table 14 shows the results of the algorithms on the test data according to the MSE for P = 100 and I = 250, and Fig. 13 shows the boxplot charts of the MSE results on the test data for this setting. As seen from the table and boxplot charts, the IAMO-MLP algorithm outperforms the other algorithms for all H values except H = 8, 10, 12, and 13. The BP algorithm has the worst results.

Table 15 shows the results of the algorithms on the training data according to the MSE for P = 50 and I = 500; the IAMO-MLP algorithm outperforms the other algorithms. Table 16 shows the results of the algorithms on the test data according to the MSE for P = 50 and I = 500, and Fig. 14 shows the boxplot charts of the MSE results on the test data for this setting. As seen from the table and boxplot charts, the IAMO-MLP algorithm outperforms the other algorithms for all H values except H = 8, 9, and 11. The AMO-MLP algorithm has the second-best results, and the BP algorithm has the worst results.

Table 17 shows the Friedman test results. Because the goal was to minimize the MSE, lower values in the Friedman test results are better. For P = 50 and I = 250, the IAMO-MLP algorithm ranked higher than the other algorithms with a ranking score of 1.36; the AMO-MLP algorithm ranked second with 1.64, the SMS-MLP algorithm third with 3.07, the BAT-MLP algorithm fourth with 3.93, and the BP algorithm last with 5.0. For P = 100 and I = 250, the IAMO-MLP algorithm again ranked higher than the other algorithms with a ranking score of 1.36, and the AMO-MLP algorithm ranked second with 1.64.
The BAT-MLP algorithm ranked third with a score of 3.21, the SMS-MLP algorithm fourth with 3.79, and the BP algorithm last with 5.0. For P = 50 and I = 500, the IAMO-MLP algorithm ranked higher than the other algorithms with a ranking score of 1.21; the AMO-MLP algorithm ranked second with 1.79, the SMS-MLP algorithm third with 3.14, the BAT-MLP algorithm fourth with 3.86, and the BP algorithm last with 5.0. Finally, the IAMO-MLP algorithm achieves good performance on the real-world problem according to the MSE, the boxplot charts, and the Friedman test results.

The experimental results demonstrate that the IAMO algorithm is successful in training the MLP and that the IAMO-MLP algorithm has the ability to escape successfully from local optima. Moreover, the 30 independent runs prove that the randomly generated initial positions do not affect the performance of the IAMO-MLP algorithm. Different numbers of neurons in the hidden layer were also investigated, and the performance of the IAMO-MLP algorithm is satisfactory for different structures of the MLP. The IAMO-MLP algorithm was also used to solve a real-world problem in the civil engineering area, and the results indicate that it successfully predicts the compressive strength of rubberized concrete with an acceptable degree of accuracy.

In order to achieve an efficient method for training neural networks, the improved animal migration optimization algorithm with the Lévy flight feature was developed. The proposed approach was tested on several datasets, and the results showed that it was able to achieve the best training performance on most of them. The proposed IAMO algorithm was compared with the original AMO algorithm and seven algorithms in the literature. The experimental results proved that the IAMO outperforms the other algorithms in terms of both local optima avoidance and convergence speed. The high local optima avoidance is due to the intense exploration of this algorithm: the Lévy flight strategy assists the algorithm in avoiding the many local solutions encountered when training MLPs. Also, the AMO and IAMO have the neighborhood feature, in which an individual searches the space around its neighbors. Moreover, the AMO and IAMO have the population updating process, which covers how some animals leave the herd and how new animals are added to the herd; thanks to this process, the AMO and IAMO have a good global exploration ability. The superior convergence speed of the IAMO-MLP algorithm originates from the Lévy flight strategy, the migration process, and the population updating process of the IAMO algorithm. Therefore, the IAMO and IAMO-MLP algorithms manage to outperform the other algorithms on most of the datasets. Moreover, the convergence graphs, boxplot charts, and Friedman tests show that the initial positions do not affect the performance of the IAMO-MLP algorithm. According to this comprehensive study, the IAMO algorithm is highly recommended for use in hybrid intelligent optimization schemes such as training MLPs.
This recommendation is made because of its exploration behavior, which results in high local optima avoidance when training MLPs. The exploitation behavior is another reason why the IAMO-MLP can converge rapidly toward the global optimum for different datasets. The IAMO-MLP algorithm was compared with the BP algorithm in the experiments, and the results showed that it is very promising, since it obtained better results on almost all datasets. On the other hand, IAMO-MLP requires a higher computational load than BP because the fitness of each individual in IAMO-MLP must be evaluated under an MLP architecture; repeated over the iterations, this process takes a longer time to find the most suitable values for the weights in the network architecture. It is usually expected that more iterations and more individuals provide better results, but such a process comes at the price of a higher computational burden. Therefore, it should be noted that IAMO is highly recommended only when the dataset and the number of attributes are very large. Small datasets with very few features can be solved much faster by gradient-based training algorithms and without extra computational cost. In contrast, the IAMO algorithm is useful for large datasets due to the extreme number of local optima that makes conventional training algorithms almost ineffective.

The most important and demanding part of the artificial neural network (ANN) is the training process of the network. The training of the ANN is the process of finding the most suitable values for the weights in the network architecture, and this is a very difficult optimization problem. Therefore, this study set out to optimize the values of the weights and biases of the multilayer perceptron (MLP) using the proposed improved animal migration optimization (IAMO) algorithm; the resulting method is called IAMO-MLP. The original AMO algorithm was inspired by the migration behavior of individuals found in all major animal groups, including birds, mammals, and insects. The main contributions of this article are: (1) The proposed IAMO algorithm has the Lévy flight strategy. (2) This article employs the AMO and IAMO algorithms for training the ANN for the first time. (3) The IAMO-MLP algorithm has the ability to escape successfully from local optima. (4) The initial positions do not affect the performance of the IAMO-MLP algorithm. (5) The features of the IAMO-MLP algorithm are simplicity, requiring only a few parameters, and solving a wide array of problems. In the experiments, datasets with different difficulty levels and different features were used: 13 benchmark functions, five classification datasets taken from the UCI Machine Learning Repository, and a real-world problem taken from the literature. The IAMO-MLP algorithm was compared with the original AMO-MLP algorithm and nine algorithms in the literature in terms of mean squared error, classification accuracy, the nonparametric Friedman test, boxplot charts, and convergence graphs. The experimental results indicate that the proposed IAMO-MLP algorithm is successful not only in solving classification problems but also in predicting the compressive strength of rubberized concrete with an acceptable degree of accuracy. In conclusion, this study has shown that the proposed IAMO-MLP algorithm is successful in training the MLP.
The proposed IAMO-MLP algorithm can be successfully used in areas such as classification, face recognition, speech recognition, pattern recognition, prediction, and optimization. In future work, the IAMO-MLP algorithm can be applied to different datasets, such as COVID-19 data. Further research regarding the role of the activation function and the threshold parameter would be worthwhile. Some of the most representative computational intelligence algorithms, such as the krill herd algorithm [57], the monarch butterfly optimization [58], the earthworm optimization algorithm [59], the elephant herding optimization [60], the moth search algorithm [61], the slime mould algorithm [62], and the Harris hawks optimization [63], can be used to train the ANN in solving the civil engineering problem. Besides, the IAMO-MLP algorithm can be hybridized with another meta-heuristic algorithm to increase its success.

References
Training multi-layer perceptron with artificial algae algorithm
Yapay sinir ağları (Artificial Neural Networks)
Artificial Neural Networks
A feedforward artificial neural network model for classification and detection of type 2 diabetes
Modeling of removal of chromium (VI) from aqueous solutions using artificial neural network
Face recognition using artificial neural network and feature extraction
Speech recognition using neural network for mobile robot navigation
Optimization of flexure stiffness of FGM beams via artificial neural networks by mixed FEM
Training artificial neural network by bat optimization algorithms
Dynamic group optimisation algorithm for training feed-forward neural networks
ARM-AMO: an efficient association rule mining algorithm based on animal migration optimization
An information entropy-based animal migration optimization algorithm for data clustering
UAV placement with animal migration optimization algorithm
AMIGM: animal migration inspired group mobility model for mobile ad hoc networks
Optimal power flow using a new evolutionary approach: animal migration optimization
An elitist approach for solving the traveling salesman problem using an animal migration optimization algorithm
Optimization of bridges reinforcement by conversion to tied arch using an animal migration algorithm
A multilevel image thresholding using the animal migration optimization algorithm
Optimizing connection weights in neural networks using the whale optimization algorithm
A new evolutionary neural networks based on intrusion detection systems using multiverse optimization
Particle swarm optimization trained neural network for structural failure prediction of multistoried RC buildings
Combining artificial algae algorithm to artificial neural network for optimization of weights
Moth-flame optimization for training multi-layer perceptrons
Optimization of neural network model using modified bat-inspired algorithm
Training of artificial neural networks using meta-heuristic algorithms
How effective is the Grey Wolf optimizer in training multi-layer perceptrons
Training of the artificial neural networks using states of matter search algorithm
COVID-19 cases prediction by using hybrid machine learning and beetle antennae search approach
Handling dropout probability estimation in convolution neural networks using metaheuristics
An optimization algorithm inspired by the States of Matter that improves the balance between exploration and exploitation
Stock market index prediction using artificial neural network
Predicting medical expenses using artificial neural network
Solving the quantum many-body problem with artificial neural networks
An artificial neural network based decision support system for energy efficient ship operations
Application of artificial neural networks and multivariate statistics to predict UCS and E using physical properties of Asmari limestones
Real time prediction of drilling fluid rheological properties using artificial neural networks visible mathematical model (white box)
Prediction of wind pressure coefficients on building surfaces using artificial neural networks
Artificial Neural Networks: the Brain behind AI
Global civil unrest: contagion, self-organization, and prediction
Animal migration optimization: an optimization algorithm inspired by animal migration behavior
An intelligent algorithm with interactive learning mechanism for high-dimensional optimization problem based on improved animal migration optimization
Opposition-based animal migration optimization
Environmental context explains Lévy and Brownian movement patterns of marine predators
Metaheuristics: From Design to Implementation
Using artificial neural networks approach to estimate compressive strength for rubberized concrete
Particle swarm optimization
Differential evolution: a simple and efficient heuristic for global optimization over continuous spaces
Biogeography-based optimization
Cuckoo search via Lévy flights
Firefly algorithm, Lévy flights and global optimization
GSA: a gravitational search algorithm
A powerful and efficient algorithm for numerical function optimization: artificial bee colony (ABC) algorithm
Engineering properties of self-compacting rubberized concrete
Prediction of modulus of elasticity based on micromechanics theory and application to low-strength mortars
Investigating properties of pervious concretes containing waste tire rubbers
Prediction of properties of waste AAC aggregate concrete using artificial neural network
Krill herd: a new bio-inspired optimization algorithm
Monarch butterfly optimization
Earthworm optimisation algorithm: a bio-inspired metaheuristic algorithm for global optimisation problems
Elephant herding optimization: variants, hybrids, and applications
Moth search algorithm: a bio-inspired metaheuristic algorithm for global optimization problems
Slime mould algorithm: a new method for stochastic optimization
Harris hawks optimization: algorithm and applications

Acknowledgements: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author contributions: Şaban Gülcü contributed to conceptualization, formal analysis, investigation, methodology, project administration, software, validation, visualization, writing (original draft), and writing (review and editing).

Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.