Coronavirus herd immunity optimizer to solve classification problems

Mohammed Alweshah

Soft Computing, 2022-03-15. DOI: 10.1007/s00500-022-06917-z

Classification is a data mining technique used to predict the value of a categorical variable from input data. A classification algorithm makes use of training datasets to build a model that can allocate unclassified records to a defined class. In this paper, the coronavirus herd immunity optimizer (CHIO) algorithm is used to boost the efficiency of the probabilistic neural network (PNN) when solving classification problems. First, the PNN produces a random initial solution and submits it to the CHIO, which then attempts to refine the PNN weights. This is accomplished by managing the random phases and effectively identifying the regions of the search space that are likely to contain the optimal value. The proposed CHIO-PNN approach was applied to 11 benchmark datasets to assess its classification accuracy, and its results were compared with those of the PNN and three methods in the literature: the firefly algorithm, the African buffalo algorithm, and β-hill climbing. The results showed that CHIO-PNN achieved an overall classification rate of 90.3% across all datasets at a faster convergence speed, outperforming all the compared methods.

Supplementary information: The online version contains supplementary material available at 10.1007/s00500-022-06917-z.

In many domains, such as industry, academia, and medicine, data mining is defined as the science of extracting useful knowledge from vast datasets through the use of automated search processes that employ statistical and analytical techniques (Tomasevic et al. 2020).
To detect hidden associations in such datasets, it is necessary to identify meaningful patterns by processing and exploring the data they contain (Viloria et al. 2019). Data mining is used either for prediction, where some indicators are used to determine other indicators (classification), or for explanation, where trends that can be readily interpreted by the user are identified (clustering) (Berkhin 2006). Classification is an inherent aspect of daily life and is perceived to be the decision-making function most frequently undertaken by human beings (Singh and Singh 2020). Essentially, when we allocate an object to a predetermined class or category, we classify that object according to several predetermined characteristics that may relate to it (Khanbabaei et al. 2019). Data classification is an important data mining strategy that predicts the values of categorical variables from input data (Tharwat 2020). This can be achieved by constructing structures based on one or more categorical and/or numerical variables (Li et al. 2019). The aim of any data classification technique is to achieve the optimal output when it is applied to a dataset, partitioning that dataset into classes that can serve as potential data for a specific target problem. However, to properly solve a classification problem, an automated system first has to learn the relevant attributes, which involves the use of a training set (input dataset) that includes those attributes (El-Khatib et al. 2019). Many methods can be used to solve classification problems, such as naive Bayes (Zhang et al. 2020), the support vector machine (SVM) (Barman and Choudhury 2020), the neural network (NN) (Bau et al. 2020), and the decision tree (DT) (Rizvi et al. 2019).
One of the most widely employed techniques is the NN (Clark et al. 2003). The NN has been found to be very useful for the classification of data, and there are several subtypes of NN, such as the feed-forward network, the multilayer perceptron (MLP), the modular NN, and the probabilistic neural network (PNN) (Huang et al. 2018). Because of the parallel architecture of NNs, a speed advantage can be obtained by generating a significant number of hardware neurons. Neural networks are used in many problem domains to build models that perform tasks such as the identification of genes in uncharacterized DNA (Bae et al. 2020). Neural network learning algorithms have also been successfully extended to many unsupervised and supervised learning problems (Sun et al. 2018). The PNN approach is a common data mining method that has been adapted to solve many pattern identification and classification problems (Lapucci et al. 2020a). In the PNN, the process is managed by a multilayer network consisting of four layers: an input layer, a pattern layer, a summation layer, and an output layer. In the first layer, the dimension of the layer reflects the dimension (p) of the input vector. In the second layer, the size of the layer equals the number of instances in the training set. The third layer (summation) contains one node per class in the set, and in the fourth layer (output), the test sample is assigned to one of the classes (Dukov et al. 2019). One way to increase the efficiency of a PNN classifier is to modify its weights using the results of a search strategy (Sedighi et al. 2019). A metaheuristic algorithm offers an efficient method of solving complex problems by applying a finite sequence of instructions. This type of algorithm can be defined as an iterative search method that explores and exploits the solution space effectively to find nearly optimal solutions in an efficient manner (Hussain et al. 2019).
To direct the search process toward the optimal solution, metaheuristics take into account the data gathered during the search and then create new solutions by merging one or more good solutions (Roeva et al. 2020; Castillo and Amador-Angulo 2018). However, metaheuristics are typically imperfect techniques: they do not guarantee that the global optimum is identified, and instead find approximate solutions (Alweshah et al. 2015a, 2020a). A number of recently published studies have explored the hybridization of metaheuristic approaches with many different types of classifiers to produce hybrid models (Bernal et al. 2021; Yuan and Moayedi 2019). Generally, these hybrid approaches achieve greater accuracy and better performance than traditional classification processes (Alwaisi and Baykan 2017). Some of the metaheuristic approaches, both single-solution-based and population-based, that have been hybridized with classifiers include Tabu search (TS) (Alsmadi 2019), the harmony search algorithm (HSA) (Elyasigomari et al. 2017), the firefly algorithm (FA) (Alweshah and Abdullah 2015), differential evolution (Maulik and Saha 2010), ant colony optimization (Martens et al. 2007), the genetic algorithm (GA) (Li et al. 2017), biogeography-based optimization (BBO) (Alweshah 2019), the flower pollination algorithm (Alweshah et al. 2022), the salp swarm optimizer (SSA) (Kassaymeh et al. 2021), the African buffalo algorithm (ABA) (Alweshah et al. 2020b), and many others (Al-Muhaideb and Menai 2013; Kumar et al. 2020b; Suresh and Lal 2020; Alweshah 2021). As can be seen from the literature, there is a continuing trend to hybridize various types of classifiers and metaheuristic algorithms for optimization and classification problems. In line with this research direction, this paper presents a new hybridization approach that uses the coronavirus herd immunity optimizer (CHIO) algorithm to adjust the PNN weights (Al-Betar et al. 2020).
Herd immunity is said to occur when the majority of a population is immune, and it is considered to be a condition that contributes to preventing the transmission of a disease (John and Samuel 2000). The CHIO algorithm not only imitates the herd immunity condition, it also applies the social distancing principles that have been implemented to combat the coronavirus pandemic. It has been shown that the concept and mechanisms of herd immunity can be transposed and modeled for the optimization domain (Alweshah et al. 2015b). The rest of this paper is organized as follows. First, in Sect. 2, a review of the related work on the use of the PNN with metaheuristic algorithms is provided. Next, in Sect. 3, the CHIO is discussed. This is followed by Sect. 4, in which the specifics of the proposed approach, CHIO-PNN, are explained. Then, in Sect. 5, the experimental setup used to test the performance of CHIO-PNN is described and the results of the experiments are discussed. Finally, in Sect. 6, some conclusions are drawn and a number of recommendations for further research are made. The efficiency of metaheuristic algorithms lies in how effectively they identify and use the search space throughout the search procedure, which motivates investigating them in hybrid methods to tackle the classification problem. This is achieved by tuning the encountered parameter weights until they are close to the ideal weights. In the following, some relevant works that have used the NN as a classifier are reviewed. The techniques that were used for metaheuristic optimization to obtain a solution close to the optimal one are also highlighted. Many local search techniques have been used to tackle classification problems. The first publication of note mentioned in this review is that by AL-Qutami et al. (2017), who used a simulated annealing (SA) optimization approach to select the most effective subgroup of learners and the ideal combination strategy.
The approach was assessed by applying it to real-world test data and showed remarkable performance, with average error rates of 2.4% and 4.7% for gas and liquid flow rates, respectively. Moutsopoulos et al. (2017), on the other hand, focused on solving the optimal groundwater level problem using the GA and the TS algorithm to maximize the extracted flow rates. The authors found that the TS process was computationally more effective than the GA. In another study that used the GA, Khalid (2017) optimized the shunt active power filter (APF) method using the GA and the adaptive TS algorithm. The authors conducted a simulation in the Matlab programming language and demonstrated that their proposed control method for the aircraft shunt APF was extremely effective. Meanwhile, Alweshah (2018) investigated how an efficiently generated initial population can achieve increased convergence speed and more effective classification accuracy when solving classification problems. To this end, a local search (i.e., the SA algorithm) was exploited to produce an initial solution to the classification problem. A population-based method was also employed to solve classification problems by Juang and Yeh (2017), who proposed a fully connected recurrent NN based on advanced multi-objective continuous ant colony optimization (AMO-CACO) for the multi-objective gait generation of a biped robot (the NAO). Also, the authors in Chatterjee et al. (2017) proposed a modified cuckoo search (MCS)-trained NN (NN-MCS model) for the detection of chronic kidney disease (CKD). This model was used to overcome the problems observed when using local search-based learning algorithms to train the NN. In addition, Alweshah et al.
(2017) proposed a PNN method based on the BBO method to improve classification accuracy, while Alweshah (2018) investigated how efficiently generated preliminary populations can increase convergence speed and result in more effective classification accuracy when solving classification problems. Furthermore, an ANN approach with a multilayer perceptron (MLP) structure and feed-forward propagation was applied in Jamshidian et al. (2018) to estimate the capillary pressure curves for a target reservoir. The ANN method was optimized by adopting the cuckoo optimization algorithm. Another NN, the bacterial foraging optimization-based radial basis function neural network (BRBFNN), was implemented by Chouhan et al. (2018) to identify and classify diseases that affect the leaves of plants. The MLP was also used in a study by Deo et al. (2018), who developed a hybrid firefly algorithm with multilayer perceptron (MLP-FFA) method to resolve the issue of estimating long-term wind speed based on reference station input data, including feasibility studies on wind energy investment in data-scarce areas. The method was aimed at overcoming inadequate data by utilizing neighboring reference site data so that the target site wind speed could be forecast. The genetic algorithm (GA) has also been employed to solve classification problems. For instance, Mohammadi et al. (2017) investigated logical communication between independent and dependent variables, where a cost function that relies on the corresponding experimental data is defined. This function is then optimized using the GA, whereby the most effective value for every parameter is identified. The authors in Reynolds et al. (2018) applied the GA to an assessment engine aimed at reducing energy consumption. Bespoke 24-h heating set point schedules were created for every area inside a small office building located in the city of Cardiff in the UK. On the other hand, the HSA was applied in Bashiri et al.
(2018), in which the authors applied a parameter-varying method to increase the ability of the HSA. The results demonstrated that coupling an ANN with the HSA is an accurate and simple method for predicting the maximum scour depth downstream of sluice gates. In another approach, Qi et al. (2018) applied a method for modeling nonlinear relationships together with particle swarm optimization (PSO), which was used for ANN architecture tuning. The inputs of the ANN were the curing time, the solid content, the cement-tailing ratio, and the tailing type. The PSO approach was also applied together with an ANN and expectation maximization in Qiu et al. (2018) to develop a rapid and precise dispersion estimation and source estimation technique. Furthermore, Aljarah et al. (2018) introduced a novel training algorithm based on the whale optimization algorithm (WOA). The authors found that the WOA was able to resolve a large range of optimization issues and surpassed other related enhanced algorithms. The WOA was also implemented in Abdel-Basset et al. (2018) in a hybrid model together with a local search strategy to resolve the permutation flow shop scheduling problem. In another study related to the classification problem, Alweshah et al. (2019) used the local search capability of the β-hill climbing (β-HC) optimizer to find the best weights for the PNN by implementing a stochastic operator to escape local optima. The proposed approach was tested on 11 benchmark datasets, and the experimental results showed that the β-HC-PNN method performed better in terms of classification accuracy than the other methods in the comparison. Alweshah et al. also employed the African buffalo algorithm (ABO) and the water evaporation algorithm in Alweshah et al.
(2020b, c), respectively, to enhance the PNN weights and make them as accurate as possible, and all the results indicated that both of these algorithms were able to adjust the PNN weights and thereby obtain a high classification accuracy. In a more comprehensive study of the effect of metaheuristic algorithms on the classification process, Mousavirad et al. (2020) compared the output of 15 metaheuristic algorithms for neural network training, including state-of-the-art and some of the most recent algorithms, and evaluated their success on various classification tasks. In another recent study, Carrillo-Alarcón et al. (2020) addressed the unbalanced class problem: an unbalanced subset of the datasets was chosen to define eight categories of arrhythmia using combined under-sampling based on a clustering approach and a feature selection method. They compared two metaheuristic methods based on differential evolution and particle swarm optimization to investigate parameter estimation and boost sample classification. For training the higher-order neural network (HONN) for data classification, the salp swarm algorithm (SSA) was used in Panda and Majhi (2020). The proposed approach was validated by examining different classification indicators across benchmark datasets; it outperformed recent algorithms, confirming its superiority in terms of improved exploration and exploitation capabilities. From the above overview of the most important recent classification methods, the NN is superior to many other techniques and can be used to resolve numerous diverse problems. Moreover, it is obvious that no single classifier can be used to deal with all kinds of problems: no classification technique is optimal for all cases, because each approach has its own specific advantages for certain areas of concern.
Therefore, in this paper, the search capability of the CHIO algorithm is employed to attempt to produce more reliable results and increase efficiency in training the PNN to solve classification problems, through the management of random phases and the effective identification of the regions of the search space that are likely to contain the optimal value. The CHIO is a recent metaheuristic algorithm that was proposed in 2020 by Al-Betar et al. (2020). Like many other metaheuristic algorithms, it simulates the behavior of a natural process and was motivated by the emergence of a pathogenic coronavirus. The CHIO mimics the mechanism of obtaining natural immunity against a disease through herd immunity, which is considered to be one of the ways a population acquires immunity from infectious diseases. In 2020, a pathogenic coronavirus crossed habitats for the third time in as many decades to infect human populations (Melin et al. 2020a; Sun and Wang 2020). This virus, provisionally known as 2019-nCoV, was first detected in Wuhan, China, in persons exposed to a seafood or wet market (Castillo and Melin 2020). The quick reaction of the Chinese public health, clinical, and research communities led to the identification of the associated clinical illness and provided initial knowledge of the epidemiology of the infection (Melin et al. 2020b; Perlman 2020). Acquired immunity is formed either by natural infection with the pathogen or by vaccination. Herd immunity is derived from the impact of the level of individual immunity on the wider herd (Randolph and Barreiro 2020). It can be described as indirect immunity against infection that is provided to susceptible individuals when there is a relatively significant proportion of resistant individuals within a population (Boccaletti et al. 2020; Fontanet and Cauchemez 2020). The idea of coronavirus herd immunity was mathematically modeled to establish a conceptual optimization algorithm, named CHIO.
The algorithm is based on the idea of how best to defend society against a disease by transforming the bulk of the vulnerable, uninfected population into a resistant population (Al-Betar et al. 2020). As a result, even the remaining vulnerable cases will not be infected, and the resistant community will no longer spread the disease. The herd immunity population can be divided into three categories: susceptible, infected (or confirmed), and immunized (or recovered) individuals (Al-Betar et al. 2020; Lavine et al. 2011). A susceptible individual is a person who is not yet infected with the virus; however, a susceptible individual may become infected by coming into contact with infected persons who have failed to observe the prescribed social distancing. An infected individual is a person who can pass on the virus to susceptible persons who come into close contact without observing social distancing. The third category consists of persons who are listed as immunized. They are protected from infection and do not infect other people. This sort of individual can help the population to avoid transmitting the virus to others and causing a pandemic (Anderson and May 1990). Figure 1 illustrates how the three types of individuals in the population are represented. From the figure, it can be seen that herd immunity is represented as a tree in which an infected individual is the root, and the edges correspond to the other individuals that are contacted. The right-hand section of the figure indicates that the virus cannot be transmitted to contacted individuals if the root individual is immunized. The herd immunity strategy is modeled as an optimization algorithm, and the six main phases of the CHIO algorithm are discussed below. 3.1 Step 1: Initialize the CHIO parameters and the optimization problem. The CHIO parameters and the optimization problem are addressed in this step. In terms of the objective function, the optimization problem is formulated as shown in Eq.
(1):

min f(x), x = (x_1, x_2, ..., x_n),    (1)

where f(x) is the objective function (or immunity rate) computed for the individual x = (x_1, x_2, ..., x_n), x_i is the gene indexed by i, and n is the number of genes in each individual. Each gene's value range is x_i ∈ [lb_i, ub_i], where lb_i and ub_i are the lower and upper bounds of gene x_i. The CHIO algorithm has four algorithmic parameters and two control parameters. The four algorithmic parameters are (1) C_0, the number of initial infected cases, started with one individual; (2) HIS, the size of the population; (3) Max_Itr, the maximum number of iterations; and (4) n, the problem dimensionality. In this stage, the two major control parameters of the CHIO are also initialized: (1) the basic reproduction rate (BRr), which regulates the CHIO operators that propagate the coronavirus among individuals, and (2) the maximum age of infected cases (MaxAge), which determines whether an infected case is classified as recovered or dead.

3.2 Step 2: Generate the herd immunity population. The CHIO generates a set of HIS cases (individuals) randomly (or heuristically). The generated cases are stored in the herd immunity population (HIP), a two-dimensional matrix of size HIS × n with one row per case:

HIP = [x_1^1 x_2^1 ... x_n^1; x_1^2 x_2^2 ... x_n^2; ...; x_1^HIS x_2^HIS ... x_n^HIS],

in which each row j represents a case x^j whose genes are generated as x_i^j = lb_i + (ub_i − lb_i) × U(0, 1), ∀i = 1, 2, ..., n. The objective function (or immunity rate) of each case is computed using Eq. (1). In addition, the status vector (S) of length HIS is initialized for all HIP cases with either zero (susceptible case) or one (infected case), where the number of ones in S, assigned at random, equals C_0.

3.3 Step 3: Evolve herd immunity. The evolution phase is the CHIO's primary improvement loop, in which gene x_i^j of case x^j, according to the proportion BRr, either remains the same or changes under the influence of social distancing based on the following three rules (Fig. 1 shows the population hierarchy in the herd immunity scenario; Al-Betar et al. 2020):

x_i^j(t + 1) =
  x_i^j(t)       if r ≥ BRr (no change),
  C(x_i^j(t))    if r ∈ [0, (1/3)BRr) (infected case),
  N(x_i^j(t))    if r ∈ [(1/3)BRr, (2/3)BRr) (susceptible case),
  R(x_i^j(t))    if r ∈ [(2/3)BRr, BRr) (immune case),

where r is a random number between 0 and 1. The three rules are described below:

1. In the range r ∈ [0, (1/3)BRr), social distancing gives the new gene value x_i^j(t + 1), computed from the difference between the current gene and a gene taken from an infected case x^c:

   C(x_i^j(t)) = x_i^j(t) + r × (x_i^j(t) − x_i^c(t)),

   where the value x_i^c(t) is selected at random, on the basis of the status vector (S), from the infected cases, c ∈ {k | S(k) = 1}.

2. In the range r ∈ [(1/3)BRr, (2/3)BRr), the new gene value x_i^j(t + 1) is determined by the difference between the current gene and a gene taken from a susceptible case x^m:

   N(x_i^j(t)) = x_i^j(t) + r × (x_i^j(t) − x_i^m(t)),

   where x_i^m(t) is selected at random, on the basis of the status vector (S), from the susceptible cases, m ∈ {k | S(k) = 0}.

3. In the range r ∈ [(2/3)BRr, BRr), the new gene value x_i^j(t + 1) is determined by the difference between the current gene and a gene taken from an immunized case x^v:

   R(x_i^j(t)) = x_i^j(t) + r × (x_i^j(t) − x_i^v(t)),

   where x_i^v(t) is taken from the immunized case x^v with the best immunity rate, selected on the basis of the status vector (S) such that f(x^v) = min{f(x^k) | S(k) = 2}.

3.4 Step 4: Update herd immunity population. The immunity rate f(x^j(t + 1)) of each generated case x^j(t + 1) is computed, and the current case x^j(t) is replaced by the generated case if the latter is better, i.e., if f(x^j(t + 1)) < f(x^j(t)). Also, the age vector entry A_j is increased by 1 if S_j = 1.
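As an illustrative sketch, Steps 1 to 3 can be expressed in Python (rather than the Matlab used for the paper's experiments); all function and variable names here are my own, the immunity rate is a toy sphere function, and minimization is assumed:

```python
import numpy as np

def chio_init(n, HIS, lb, ub, C0, rng):
    """Steps 1-2: build the HIP matrix (HIS x n), status vector S
    (0 = susceptible, 1 = infected, 2 = immune) and age vector A."""
    HIP = lb + (ub - lb) * rng.random((HIS, n))      # x_i^j = lb + (ub - lb) * U(0,1)
    S = np.zeros(HIS, dtype=int)
    S[rng.choice(HIS, size=C0, replace=False)] = 1   # C0 initial infected cases
    A = np.zeros(HIS, dtype=int)                     # ages start at zero
    return HIP, S, A

def evolve_case(j, HIP, S, f_vals, BRr, rng):
    """Step 3: per-gene social-distancing update for case j (minimization)."""
    x_j = HIP[j]
    new = x_j.copy()
    infected = np.flatnonzero(S == 1)
    susceptible = np.flatnonzero(S == 0)
    immune = np.flatnonzero(S == 2)
    for i in range(x_j.size):
        r = rng.random()
        if r >= BRr:
            continue                                   # gene unchanged
        if r < BRr / 3 and infected.size:
            partner = HIP[rng.choice(infected), i]     # random infected case x^c
        elif r < 2 * BRr / 3 and susceptible.size:
            partner = HIP[rng.choice(susceptible), i]  # random susceptible case x^m
        elif immune.size:
            best = immune[np.argmin(f_vals[immune])]   # best immune case x^v
            partner = HIP[best, i]
        else:
            continue
        new[i] = x_j[i] + r * (x_j[i] - partner)       # social-distancing move
    return new

rng = np.random.default_rng(0)
HIP, S, A = chio_init(n=5, HIS=8, lb=-1.0, ub=1.0, C0=2, rng=rng)
f_vals = (HIP ** 2).sum(axis=1)    # toy immunity rate: sphere function
candidate = evolve_case(0, HIP, S, f_vals, BRr=0.5, rng=rng)
```

The sketch falls back to leaving a gene unchanged when the relevant category is empty, a detail the description above leaves open.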
For each case x^j, the status vector entry S_j is updated based on the herd immunity threshold using the following rule: S_j is set to 1 (infected) if f(x^j(t + 1)) < Δf(x), S_j = 0, and is_corona(x^j(t + 1)) = 1; S_j is set to 2 (immunized) if f(x^j(t + 1)) > Δf(x) and S_j = 1. Here, the binary value is_corona(x^j(t + 1)) equals 1 when the new case x^j(t + 1) has inherited a gene value from an infected case, and Δf(x) is the mean immunity rate of the population, Δf(x) = Σ_{i=1}^{HIS} f(x^i)/HIS. Notice that the immunity levels of the individuals in the population are altered depending on the social distancing applied earlier. If a newly produced individual's immunity rate is better than the population's average immunity rate, the population is becoming more immune to the virus; once the population as a whole is sufficiently immune, the herd immunity threshold has been reached. 3.5 Step 5: Fatal cases. In this phase, if the immunity rate f(x^j(t + 1)) of a currently infected case (S_j = 1) has not improved within the number of iterations defined by the MaxAge parameter (i.e., A_j ≥ MaxAge), this case is considered dead. It is then regenerated from scratch using x_i^j(t + 1) = lb_i + (ub_i − lb_i) × U(0, 1), ∀i = 1, 2, ..., n, and A_j and S_j are both reset to 0. This phase may be beneficial in diversifying the current population and thereby avoiding local optima. 3.6 Step 6: Stop criterion. The CHIO algorithm repeats step 3 to step 5 until the termination criterion is reached, which normally depends on whether the maximum number of iterations has been reached. At that point, the population is dominated by susceptible and immunized cases, and the infected cases have died out. Figure 2 shows the flowchart of the CHIO algorithm, and the pseudocode of the CHIO phases is given below. 4 Proposed CHIO with PNN approach. In this paper, the CHIO was combined with the PNN to adjust the NN weights with the aim of increasing the classification accuracy.
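The status-update and fatality rules of Steps 4 and 5 can be sketched for a single case as follows (a minimal Python illustration assuming minimization; the function name and the regenerate flag are my own):

```python
def update_case_status(f_new, S_j, A_j, mean_f, is_corona, max_age):
    """Steps 4-5 for one case: infection, recovery, and fatality.

    Status codes: 0 = susceptible, 1 = infected, 2 = immune.
    Returns (S_j, A_j, regenerate); regenerate = True means the case
    died (age limit reached) and must be re-initialized from scratch.
    """
    regenerate = False
    if S_j == 0 and is_corona and f_new < mean_f:
        S_j = 1            # better-than-average case that inherited from an infected one
    elif S_j == 1 and f_new > mean_f:
        S_j, A_j = 2, 0    # infected case recovers and becomes immune
    if S_j == 1:
        A_j += 1           # infected cases age each iteration
        if A_j >= max_age:
            S_j, A_j, regenerate = 0, 0, True   # fatal case
    return S_j, A_j, regenerate
```

For example, an infected case that has reached the age limit without improving is flagged for regeneration, while an infected case whose immunity rate rises above the population mean becomes immune.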
In the proposed approach, the PNN first generates random solutions. Then, the CHIO is applied to adapt the weights produced by the PNN, improving the solution by optimizing the PNN weights. The PNN technique is a widely used data mining process and has been applied to many classification and pattern recognition problems. In this type of NN, the operations are organized into a multilayered network consisting of four layers, namely, an input layer, a pattern layer, a summation layer, and an output layer. In the first layer (input), the dimension of the layer reflects the dimension (p) of the input vector. In the second layer (pattern), the size of the layer equals the number of examples in the training set. The third layer (summation) contains one node per class in the group. In the fourth layer (output), the validation example is assigned to one of the classes. The operational formulation of the PNN approach involves four major layers (Specht 1988):

• The input layer, where every neuron corresponds to a predictive variable whose value is fed to each of the neurons in the pattern layer.
• The pattern layer: a single neuron for every training sample, which forms the product of the input vector x and the weight vector w_i, z_i = x · w_i^T. The following nonlinear operation is then applied (Eq. 11), where i is the pattern number, T is the total number of training patterns, x_i is the ith training pattern from its category, and σ is the smoothing parameter.
• The summation layer: it aggregates the pattern-layer outputs for every class of inputs and generates the network output as a vector of probabilities (Eq. 12).
• The output layer generates the binary class decisions based on the decision classes X_r and X_s, r ≠ s, r, s = 1, 2, ..., q, and a classification criterion (Eq.
13): Such nodes possess only a single weight C, given by the prior membership probabilities and the number of training samples in every class, provided by the cost parameter (Eq. 14), where h_s denotes the prior probability that the currently created sample belongs to group n, and c_n denotes the misclassification cost. After constructing the NN, the set of network weights is tuned to approximate the required outputs. This procedure is carried out by a training algorithm, which modifies the weights until a number of error criteria are met. The CHIO algorithm is used to improve the performance of the PNN when applied to classification problems. As seen in Fig. 3, the PNN creates a random initial solution, and this solution is then submitted to the CHIO, which tries to optimize the PNN weights. Thus, the search capability of the CHIO is useful for improving the performance of the PNN; this improvement is achieved by managing the random phases of the search. Figure 4 shows the structure of the proposed algorithm. It consists of two main parts. In the first part (the left-hand side of the figure), the PNN is trained on the training datasets; the test datasets are then classified, and the accuracy is computed. In the second part, the CHIO is applied to adapt the weights of the PNN, and then the classification accuracy is calculated. The aim of the training process is to determine the most accurate weights to assign to the connector row. The output is computed repeatedly in this step, and the result is compared to the preferred output provided by the training/test datasets. The procedure begins with initial weights obtained at random by the original PNN classifier. The values from the data input are then multiplied by the weights w(ij) determined by the PNN algorithm. In the hybrid CHIO-PNN approach, by contrast, the CHIO algorithm determines the accurate weights through its search capabilities.
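The four-layer PNN computation described above can be sketched using the equivalent Gaussian-kernel form of the pattern layer (a minimal Python illustration with illustrative names; sigma is the smoothing parameter, and for normalized inputs this form coincides with the dot-product formulation z_i = x · w_i^T up to normalization):

```python
import numpy as np

def pnn_classify(x, X_train, y_train, sigma=0.5):
    """Minimal PNN sketch: the pattern layer computes a Gaussian kernel
    per training sample, the summation layer averages activations per
    class, and the output layer picks the most probable class."""
    # pattern layer: one Gaussian unit per training pattern
    d2 = np.sum((X_train - x) ** 2, axis=1)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    # summation layer: average activation per class
    classes = np.unique(y_train)
    scores = np.array([k[y_train == c].mean() for c in classes])
    # output layer: winning class
    return classes[np.argmax(scores)]

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
label = pnn_classify(np.array([0.05, 0.05]), X, y)   # query near the first cluster
```

In this toy example the query point lies near the class-0 cluster, so the summation layer's class-0 score dominates and the output layer returns class 0. In CHIO-PNN, the weights feeding this computation are what the CHIO adjusts.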
The CHIO was selected to obtain the highest accuracy and optimum parameter settings for training a PNN. The basic CHIO does not restrict or regulate the random step size during the search. The proper combination of the exploration and exploitation phases in the CHIO is critical to selecting accurate weights that enhance the PNN's classification performance. The correctness of the classification system is determined based on the number of true positives (TPs), true negatives (TNs), false positives (FPs), and false negatives (FNs) produced by the system. A TP is a positive instance that the classifier correctly predicts as positive. A TN is a negative instance that the classifier correctly predicts as negative. A FP is a negative instance that the classifier incorrectly predicts as positive. A FN is a positive instance that the classifier incorrectly predicts as negative. Hence, classification quality is calculated according to Eq. 15. Additionally, two other performance measurements are taken into account to assess classification quality, namely, specificity and sensitivity, which are calculated by Eqs. 16 and 17, respectively. In a binary classification problem, there is a single positive class and a single negative class; hence, the optimum classification accuracy in this context is achieved when the classifier achieves 100% accuracy and the error rate is 0. Sensitivity and specificity are statistical measures of binary classification and are commonly used when comparing the performance of different classifiers. In this section, first, the experimental setup used to test the CHIO algorithm with the PNN is described. The evaluation was based on a number of criteria, namely, the accuracy rate, the convergence speed, and some measures of central tendency.
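Eqs. 15 to 17 correspond to the standard confusion-matrix formulas, which can be written directly as follows (the function name is illustrative):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy (Eq. 15), sensitivity (Eq. 16) and specificity (Eq. 17)
    computed from the confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true-positive rate
    specificity = tn / (tn + fp)   # true-negative rate
    return accuracy, sensitivity, specificity

# e.g., 40 TPs, 45 TNs, 5 FPs, 10 FNs out of 100 test instances
acc, sens, spec = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these counts the accuracy is (40 + 45)/100 = 0.85, the sensitivity 40/50 = 0.80, and the specificity 45/50 = 0.90.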
Then the results of the performance testing are presented, followed by a comparison of these results with those reported in some previous related works. The experiments were carried out on a personal computer with an Intel(R) Core(TM) i7-6006U CPU @ 2.00 GHz (four CPUs) and 8 GB of RAM. The CHIO algorithm was implemented in Matlab R2016a. The datasets were split into 70% for training and 30% for testing. The experiments were executed over 30 runs for each dataset, with 100 iterations in each run. The CHIO approach applied to train the PNN was tested and benchmarked using 11 well-known real-world datasets from the University of California at Irvine (UCI) machine learning repository. The features of these datasets are summarized in Table 1. The 11 benchmark datasets can be accessed and downloaded from http://csc.lsu.edu/~huypham/HBA_CBA/datasets.html. In the experiments, a simple train/test split function was used to make the split, with test size = 0.3 and training size = 0.7. Some preliminary experiments were conducted to determine the most suitable parameters for testing the performance of the proposed CHIO-PNN algorithm. Table 2 shows the parameter values that were used in all the experiments. When applied to each of the 11 UCI datasets, the PNN classifier produces a tentative solution by generating the primary weights randomly. To adjust these weights, the CHIO is run together with the PNN technique. In a binary classification task, which contains a single positive class and a single negative class, the optimum classification accuracy is achieved when FP = 0, FN = 0, TP equals the number of positive instances, and TN equals the number of negative instances. In the proposed method, the values of FP, FN, TP and TN were determined effectively. To determine the precision of the proposed approach, Eqs.
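The experimental protocol above (a 70/30 holdout split, then 30 independent runs of the optimizer) can be sketched as follows; `train_test_split` and `run_protocol` are illustrative helper names, not the paper's Matlab code, and `evaluate_once` stands in for one full 100-iteration CHIO-PNN optimization:

```python
import random

def train_test_split(X, y, test_size=0.3, seed=None):
    """Shuffled holdout split as in the experiments (test size = 0.3)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    cut = int(len(X) * (1 - test_size))
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])

def run_protocol(evaluate_once, n_runs=30):
    """Repeat the train/optimize/test cycle over independent runs.
    evaluate_once(seed) is assumed to run one full optimization
    (e.g. 100 CHIO iterations) and return its final test accuracy."""
    accs = [evaluate_once(seed) for seed in range(n_runs)]
    return sum(accs) / n_runs, max(accs), min(accs)
```

Reported figures such as the 90.3% average accuracy would then correspond to the mean returned by `run_protocol`, averaged again across datasets.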
15, 16 and 17 were used to measure the accuracy, sensitivity and specificity of the proposed approach. The experiments tested the accuracy, error rate, sensitivity, and specificity of the two methods (PNN and CHIO-PNN) to determine whether or not the CHIO was successful in solving problems in the classification domain. The classification accuracy values show that CHIO-PNN achieves greater accuracy and efficiency than the general classification methods. Moreover, the CHIO with PNN approach achieved an improvement in convergence speed and yielded more successful results than some other algorithms in the literature, as explained in the following paragraphs. First, from Table 3, it can be seen that the proposed approach was able to adjust the weights of the PNN in all 11 datasets, thus increasing the accuracy and reducing the error with high efficiency. Good solutions for data classification problems can be found by escaping local optima during optimization, which the CHIO algorithm achieves by balancing global and local searches. The results of the proposed CHIO-PNN approach were compared with the results of the PNN and with those of some recent methods in the literature, namely the FA (Alweshah 2014), the ABO (Alweshah et al. 2020b), the b-HC, and the WEA (Alweshah et al. 2020c), each combined with the PNN. All the comparisons were made using the same datasets and parameters as in those studies. Table 4 shows the performance of the proposed CHIO-PNN approach against that of the other methods based on four criteria, namely, accuracy, sensitivity, specificity, and error rate. From Table 4 it is clear that CHIO-PNN outperformed FA-PNN in terms of classification accuracy in 10 out of the 11 datasets, and its performance was equal to that of FA-PNN on the remaining dataset, namely, Fourclass.
Also, CHIO-PNN outperformed ABO-PNN in seven datasets, namely, PID, HSS, BC, LD, GCD, SPECTF, and ACA, and produced the same results in two datasets, namely, Heart and Fourclass. Moreover, it outperformed b-HC-PNN in five datasets, namely, PID, BC, GCD, SPECTF, and ACA, and it generated the same result in one dataset, namely, Fourclass. The CHIO-PNN approach also produced results with high efficiency. Hence, the performance of CHIO-PNN was highly accurate. Overall, it outperformed the other methods, achieving 90.3% average accuracy across all datasets, whereas the PNN, FA-PNN, ABO-PNN and b-HC-PNN achieved average accuracy rates of 75.5%, 85.9%, 89%, and 89.6%, respectively. Figure 5 shows the average of the best accuracy values achieved by all of the methods. It is well known that a stable and faster convergence speed can lead to better solutions (Alweshah et al. 2020d). Therefore, to further evaluate the performance of the proposed CHIO-PNN approach, its convergence behavior curves were examined when it was implemented on the 11 datasets over 30 individual runs of 100 iterations each. The curves of CHIO-PNN were compared with those produced by FA-PNN to determine the efficiency of the proposed method. The experimental results displayed in Fig. 6 show that CHIO-PNN was able to enhance the randomly generated weight parameters of the PNN and thus provide an improvement in classification accuracy at a faster convergence speed than FA-PNN. The superiority of the proposed approach is due to the ability of the CHIO algorithm to achieve the optimum balance between exploitation and exploration. Furthermore, a t-test was used to compare the performance of the CHIO-PNN approach with that of the other optimization algorithms.
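Convergence curves of the kind shown in Fig. 6 are typically produced by recording the best-so-far accuracy at every iteration and averaging over the independent runs. A minimal sketch, where `optimize_step(run, it)` is an assumed callback standing in for one optimizer iteration and returning the accuracy of the solution it produced:

```python
def convergence_curve(optimize_step, n_iters=100, n_runs=30):
    """Average best-so-far accuracy per iteration over independent runs,
    as plotted in a convergence-speed comparison."""
    totals = [0.0] * n_iters
    for run in range(n_runs):
        best = float("-inf")
        for it in range(n_iters):
            # best-so-far within this run, accumulated across runs
            best = max(best, optimize_step(run, it))
            totals[it] += best
    return [t / n_runs for t in totals]
```

A method converges faster when its curve reaches its plateau at a smaller iteration index, which is the comparison made between CHIO-PNN and FA-PNN.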
The statistical analysis was carried out on the classification accuracy results obtained by the CHIO-PNN and FA-PNN methods on each dataset. A t-test with a 95% confidence level (alpha = 0.05) was performed on the classification accuracies, and the resulting p values are displayed in Table 5. From Table 5, it can be seen that the performance of CHIO is significantly better than that of FA, as most of the p values for the 11 datasets are less than 0.0001. These results indicate that the use of the CHIO is beneficial for solving classification problems when it is used to refine the randomly generated PNN weights, as the refinements lead to an improvement in classification accuracy. Additionally, the boxplot technique was used to view the data distribution based on a five-number summary (minimum, first quartile (Q1), median, third quartile (Q3), and maximum). A boxplot shows whether the data are symmetrical and how closely they are clustered, and it also reveals the positions of outliers. Figure 7 shows the boxplots that describe the distribution of the solution quality obtained by CHIO and FA when implemented on the 11 benchmark datasets over 30 runs each. The boxplots are used to analyze the variability of the PNN optimizers in obtaining the best accuracy values across all the runs. From Fig. 7, it is apparent that the boxplots confirm that the CHIO performs better than the FA when training the PNN. The main aim of this study is to adjust the neural network weights in an attempt to optimize classification accuracy while still achieving a fast convergence speed. To achieve the research goals, the original PNN was applied to classification problems, and its results were compared with those of a hybrid method based on the PNN and the CHIO.
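The two statistical tools used here, the two-sample t-test at alpha = 0.05 and the five-number summary behind each boxplot, can be computed with standard-library code. The sketch below uses the pooled-variance Student's t statistic, which is one common form; the paper does not state which variant it used, so this is an assumption:

```python
from statistics import mean, stdev, median

def two_sample_t(a, b):
    """Student's two-sample t statistic with pooled variance, for comparing
    two samples of per-run accuracies (e.g. 30 CHIO-PNN vs 30 FA-PNN runs).
    Significance is then judged against the critical value for alpha = 0.05
    with len(a) + len(b) - 2 degrees of freedom."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5

def five_number_summary(xs):
    """Minimum, Q1, median, Q3, maximum - the quantities drawn in a boxplot
    (quartiles via the median-of-halves convention, excluding the median)."""
    s = sorted(xs)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2:]
    return min(s), median(lower), median(s), median(upper), max(s)
```

With p values below 0.0001, as reported in Table 5, the t statistic lies far beyond the alpha = 0.05 critical value, so the accuracy difference is statistically significant.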
The PNN was used to produce random solutions, and the CHIO was used to develop them further by optimizing the PNN weights. Because of its exploration and exploitation abilities, the CHIO is able to discover promising areas in a reasonable time, and its balance between local and global search prevents it from becoming stuck in local optima. Pairing the PNN with the CHIO algorithm thus provided more accurate classification than the previous approaches on most datasets. The experimental results showed that the proposed CHIO-PNN approach produced highly accurate solutions at a fast convergence speed. In addition, the comparison of the proposed approach with three different algorithms in the literature revealed that it was, overall, more effective and had a higher average accuracy rate, offering high-quality solutions to problems in the classification domain with greater accuracy and an improved convergence speed. In this paper, the coronavirus herd immunity optimizer (CHIO) was combined with the probabilistic neural network (PNN) for the purpose of adjusting the weights generated by the PNN so as to increase classification accuracy. In the proposed approach, first, the PNN generated random solutions. Then, the CHIO was applied to adapt the weights of the PNN, thereby enhancing the solutions. The proposed approach, named CHIO-PNN, was applied to 11 UCI standard benchmark datasets to assess its performance in terms of classification accuracy, specificity, and sensitivity.
The experimental results showed that CHIO-PNN was able to enhance the randomly generated weight parameters of the PNN and to provide an improvement in classification accuracy and convergence speed compared to the PNN alone and also compared with the other methods, namely, the FA, the ABO, the b-HC, and the WEA. The CHIO-PNN approach outperformed all of these methods, achieving 90.3% average accuracy across all datasets. In future work, the proposed CHIO-PNN could be extended to other real-world and high-dimensional datasets to investigate how it behaves under various conditions in terms of the number of classes and attributes. It could also be applied to problems in many fields, such as the study of human chromosomes, handwriting identification, image segmentation, and feature selection. The online version contains supplementary material available at https://doi.org/10.1007/s00500-022-06917-z.
References
A hybrid whale optimization algorithm based on local search strategy for the permutation flow shop scheduling problem
Abu Doush I (2021) Coronavirus herd immunity optimizer (CHIO)
Hybrid metaheuristics for medical data classification
Virtual multiphase flow metering using diverse neural network ensemble and adaptive simulated annealing
Optimizing connection weights in neural networks using the whale optimization algorithm
Hybrid genetic algorithm with Tabu search with back-propagation algorithm for fish classification: determining the appropriate feature set
Training of artificial neural network using metaheuristic algorithm
Firefly algorithm with artificial neural network for time series problems
Construction biogeography-based optimization algorithm for solving classification problems
Solving feature selection problems by combining mutation and crossover operations with the monarch butterfly optimization algorithm
Hybridizing firefly algorithms with a probabilistic neural network for solving classification problems
Cluster based data reduction method for transaction datasets
Evolution of software reliability growth models: a comparison of auto-regression and genetic programming models
Biogeography-based optimisation for data classification problems
Coronavirus herd immunity optimizer with greedy crossover for feature selection in medical diagnosis
b-Hill climbing algorithm with probabilistic neural network for classification problems
A hybrid mine blast algorithm for feature selection problems
African Buffalo algorithm: training the probabilistic neural network to solve classification problems
Water evaporation algorithm with probabilistic neural network for solving classification problems
Flower pollination algorithm for solving classification problems
Immunisation and herd immunity
DNA privacy: analyzing malicious DNA sequences using deep neural networks
Soil texture classification using multi class support vector machine
Prediction of local scour depth downstream of sluice gates using harmony search algorithm and artificial neural networks
Understanding the role of individual units in a deep neural network
A survey of clustering data mining techniques
Optimization of type-2 fuzzy logic controller design using the GSO and FA algorithms
Modeling and forecasting of epidemic spreading: the case of Covid-19 and beyond
Algredo-Badillo I (2020) A metaheuristic optimization approach for parameter estimation in arrhythmia classification from unbalanced data
A generalized type-2 fuzzy logic approach for dynamic parameter adaptation in bee colony optimization applied to fuzzy controller design
Forecasting of COVID-19 time series for countries in the world based on a hybrid approach combining the fractal dimension and fuzzy logic
Cuckoo search coupled artificial neural network in detection of chronic kidney disease
Bacterial foraging optimization based radial basis function neural network (BRBFNN) for identification and classification of plant leaf diseases: an automatic approach towards plant pathology
A neural network based approach to automated e-mail classification
Multi-layer perceptron hybrid model integrated with the firefly optimizer algorithm for windspeed prediction of target site using a limited set of neighboring reference station data. Renew Energy
Evaluation of PNN pattern-layer activation function approximations in different training setups
Glass classification using artificial neural network
Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification
COVID-19 herd immunity: where are we?
Enterprise credit risk evaluation based on neural network algorithm
Metaheuristic research: a comprehensive survey
A novel estimation method for capillary pressure curves based on routine core analysis data using artificial neural networks optimized by Cuckoo algorithm: a case study
Herd immunity and herd effect: new insights and definitions
Multiobjective evolution of biped robot gaits using advanced continuous ant-colony optimized recurrent neural networks
Salp swarm optimizer for modeling the software fault prediction problem
Performance evaluation of Adaptive Tabu search and Genetic Algorithm optimized shunt active power filter using neural network control for aircraft power utility of 400 Hz
Applying clustering and classification data mining techniques for competitive and knowledge-intensive processes improvement
Ensemble classification technique for heart disease prediction with metaheuristic-enabled training system
PNN and KCNQ1OT1 can predict the efficacy of adjuvant fluoropyrimidine-based chemotherapy in colorectal cancer patients
Natural immune boosting in pertussis dynamics and the potential for long-term vaccine failure
Genetic algorithm for the optimization of features and neural networks in ECG signals classification
Deep learning for hyperspectral image classification: an overview
Classification with ant colony optimization
Automatic fuzzy clustering using modified differential evolution for image classification
Analysis of spatial spread relationships of coronavirus (COVID-19) pandemic in the world using self organizing maps
Multiple ensemble neural network models with fuzzy response aggregation for predicting COVID-19 time series: the case of Mexico
Intelligent parameter optimization of Savonius rotor using Artificial Neural Network and Genetic Algorithm
A benchmark of recent population-based metaheuristic algorithms for multi-layer neural network training
Management of groundwater resources using surface pumps: optimization using genetic algorithms and the Tabu search method
Majhi SK (2020) Effectiveness of swarm-based metaheuristic algorithm in data classification using Pi-sigma higher order neural network
Another decade, another coronavirus
Neural network and particle swarm optimization for predicting the unconfined compressive strength of cemented paste backfill
Atmospheric dispersion prediction and source estimation of hazardous gas using artificial neural network, particle swarm optimization and expectation maximization
Herd immunity: understanding COVID-19
A zone-level, building energy optimisation combining an artificial neural network, a genetic algorithm, and model predictive control
The role of demographics in online learning; a decision tree based approach
Joint set-up of parameters in genetic algorithms and the artificial bee colony algorithm: an approach for cultivation process modelling
A novel hybrid model for stock price forecasting based on metaheuristics and support vector machine
Role of data mining techniques in bioinformatics
Modeling COVID-19 epidemic in Heilongjiang province, China
Evolving unsupervised deep neural networks for learning meaningful representations
A metaheuristic framework based automated Spatial-Spectral graph for land cover classification from multispectral and hyperspectral satellite images
Classification assessment methods
An overview and comparison of supervised data mining techniques for student exam performance prediction
Integration of data mining techniques to PostgreSQL database manager system
Evaluation and comparison of the advanced metaheuristic and conventional machine learning methods for the prediction of landslide occurrence
Class-specific attribute value weighting for naive bayes
Acknowledgement This work has been carried out during sabbatical