An evolutionary voting for k-nearest neighbours

Daniel Mateos-García, Jorge García-Gutiérrez, José C. Riquelme-Santos

Keywords: Evolutionary computation; Nearest neighbour; Weighted voting

Abstract

This work presents an evolutionary approach, called EvoNN, to modify the voting system of the k-nearest neighbours (kNN) rule. Our approach results in a real-valued vector which provides the optimal relative contribution of the k nearest neighbours. We compare two possible versions of our algorithm. One of them (EvoNN1) introduces a constraint on the resulting real-valued vector so that the greatest value is assigned to the nearest neighbour. The second version (EvoNN2) does not include any particular constraint on the order of the weights. We compare both versions with the classical kNN rule and four other weighted variants of kNN on 48 datasets from the UCI repository. Results show that EvoNN1 outperforms EvoNN2 and statistically obtains better results than the rest of the compared methods.

1. Introduction

Weighting in machine learning is a common technique for emphasizing some characteristics of the data to improve the resulting models. For example, weighting has been used to outline the importance of some particular instances (Blachnik & Duch, 2011) or features (Zhi, Fan, & Zhao, 2014), or to rank a set of techniques in the context of ensembles (Berikov, 2014). In a broad sense, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) can also be seen as examples of using weights in learning models, but the k-nearest neighbours (kNN) rule has been the most common technique to benefit from weights (Mateos-García, García-Gutiérrez, & Riquelme-Santos, 2012).

kNN and its variants have been widely used in the literature to solve real problems. Rodger (2014) used a hybrid model to predict the demand for natural gas. The system was implemented by integrating regression, fuzzy logic, nearest neighbours and neural networks, and considering several variables such as the price, operating expenses, cost to drill new wells, etc. Focusing on biological data, Park and Kim (2015) selected significant genes from microarrays by using a nearest-neighbour-based ensemble of classifiers. On the other hand, Park, Park, Jung, and Lee (2015) tackled the problem of designing recommender systems. For this purpose the authors presented Reversed CF (RCF), a fast item-based collaborative filtering algorithm which utilizes a k-nearest neighbour graph.

The main goal of a weighting system lies in the optimization (commonly by metaheuristics) of a set of weights in the training step to obtain the highest accuracy while trying not to overfit the resulting model. If we focus on kNN weighting methods, many proposals weighting features or instances can be found. In Raymer, Punch, Goodman, Kuhn, and Jain (2000) a weighting method to obtain an optimal set of features was provided. The set of features was selected by means of a kNN-based genetic algorithm using a bit vector to indicate whether a feature was in the selection or not. In a later work, the same authors presented a hybrid evolutionary algorithm using a Bayesian discriminant function (Raymer, Doom, Kuhn, & Punch, 2003), trying to isolate characteristics belonging to large datasets of biomedical origin. Moreover, Paredes and Vidal (2006) used different similarity functions to improve the behaviour of the kNN rule.
In a first approximation, they considered a weight per feature and instance on the training data, resulting in a non-viable number of parameters in the learning process. Then, the authors proposed three types of reduction: a weight by class and feature (label dependency), a weight by prototype (prototype dependency) and a combination of the previous ones. The optimization process was carried out by gradient descent. In the same line, Tahir, Bouridane, and Kurugollu (2007) showed an approach that was able to both select and weight features simultaneously by using tabu search. Furthermore, Mohemmed and Zhang (2008) presented a nearest-centroid-based classifier. This method calculated prototypical instances by computing arithmetic averages from the training data. To classify an instance, the method calculated the distance to every prototype and then selected the nearest one. The optimization of the centroids that minimized the classification error was carried out through particle swarm optimization. Fernandez and Isasi (2008) also proposed a weighting system by using a prototype-based classifier. After a data normalization that was based on the position of the instances with respect to regions, the weights were iteratively calculated. More recently, AlSukker, Khushaba, and Al-Ani (2010) used differential evolution to find weights for different aspects of the data. They described four approaches: feature weighting, neighbour weighting, class weighting and mixed weighting (features and classes), with the latter being the one providing the best results.

Weighting has also been applied to the voting system of the kNN rule. Thus, the distance-weighted k-nearest neighbour rule (WKNN) proposed by Dudani (1976) has been known for long. WKNN weights the votes of the k nearest neighbours ($w_i$) according to Eq. (1), where $d_i$ is the distance of the ith nearest neighbour ($d_1$ being the distance of the nearest) with respect to the instance to be classified. A similar version using a uniform weighting (UWKNN) has also been proposed, where each weight is inversely proportional to the rank reached among the neighbours (i.e., $w^{u}_{i} = 1/i$). Recently, both techniques have been explored working together as a new classifier called Dual-Weighted kNN (DWKNN), showing promising results, where each weight was calculated according to Eq. (2) (Gou, Xiong, & Kuang, 2011). A later work of Gou, Du, Zhang, and Xiong (2012) provided another version of DWKNN where the calculation of the weights was different, according to Eq. (3).

$$w^{w}_{i} = \begin{cases} \dfrac{d_k - d_i}{d_k - d_1} & \text{if } d_i \neq d_1 \\ 1 & \text{if } d_i = d_1 \end{cases} \qquad (1)$$

$$w^{dw1}_{i} = w^{w}_{i} \cdot w^{u}_{i} \qquad (2)$$

$$w^{dw2}_{i} = \begin{cases} \dfrac{d_k - d_i}{d_k - d_1} \cdot \dfrac{d_k + d_1}{d_k + d_i} & \text{if } d_i \neq d_1 \\ 1 & \text{if } d_i = d_1 \end{cases} \qquad (3)$$

Although all the previous approaches provided improvements with regard to the classical kNN performance, they have not explored the possibly better suitability of evolutionary computation for the optimization of the neighbours' weights. Thus, we propose an evolutionary method to improve the kNN rule by altering the voting system according to knowledge obtained in the training phase. We also explore the use of constraints in learning the weights with two different versions of our approach, and we study their reliability compared with the classical kNN rule and four other weighted variants on UCI datasets (Lichman, 2013). Finally, results are statistically validated to reinforce the final conclusions.
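As a concrete illustration of the weighting schemes of Eqs. (1)–(3), the following sketch (Python with NumPy; the function names are ours and not part of any of the cited implementations) computes the WKNN, UWKNN and dual weights from the sorted distances of the k nearest neighbours:

```python
import numpy as np

def wknn_weights(d):
    """WKNN weights of Eq. (1); d holds the neighbour distances sorted in ascending order."""
    d = np.asarray(d, dtype=float)
    if d[-1] == d[0]:                          # degenerate case: all distances equal
        return np.ones_like(d)
    return (d[-1] - d) / (d[-1] - d[0])        # (d_k - d_i) / (d_k - d_1)

def uwknn_weights(k):
    """Uniform rank-based weights: w_i = 1/i."""
    return 1.0 / np.arange(1, k + 1)

def dwknn1_weights(d):
    """First dual-weighted variant, Eq. (2): product of the WKNN and UWKNN weights."""
    return wknn_weights(d) * uwknn_weights(len(d))

def dwknn2_weights(d):
    """Second dual-weighted variant, Eq. (3)."""
    d = np.asarray(d, dtype=float)
    if d[-1] == d[0]:
        return np.ones_like(d)
    return (d[-1] - d) / (d[-1] - d[0]) * (d[-1] + d[0]) / (d[-1] + d)

# Distances of k = 5 neighbours to a hypothetical query instance
distances = [0.2, 0.5, 0.7, 0.9, 1.3]
print(wknn_weights(distances), dwknn1_weights(distances), dwknn2_weights(distances))
```

Under Eq. (1) the nearest neighbour always receives weight 1 and the kth neighbour weight 0, while the dual weights of Eqs. (2) and (3) decay more sharply with the rank.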
The rest of the paper is organized as follows. Section 2 presents the elements of the two versions of the evolutionary algorithm designed to weight the voting system of the kNN rule. The results and several statistical tests are reported in Section 3. Finally, Section 4 presents the main findings and future work.

2. Method

In this section, two variants of our voting optimization system, called Evolutionary Voting of Nearest Neighbours (EvoNN), are described.

2.1. Purpose and functionality

The aim of our work was to find a set of weights to modify the influence of every nearest neighbour when voting to assign a label to an unlabelled instance. Moreover, our approach also provides a measurement of the influence of the proximity of the neighbours by means of the optimized weights. Thus, the evolutionary process provides the best weights found to improve the classification accuracy on the domain under study.

To formalize our approach we assume that the set of classes (or labels) L is represented by the natural numbers from 1 to b, with b being the number of labels. Thus, let $D = \{(e, l) \mid e \in \mathbb{R}^{f} \text{ and } l \in L = \{1, 2, \ldots, b\}\}$ be the dataset under study, with f being the number of features and b the number of labels. Let label be a function that assigns to every element e the real class to which it belongs. Let us suppose that D is divided into the sets TR and TS, the training set and the testing set respectively, such that $D = TR \cup TS$ and $TR \cap TS = \emptyset$. In the training step the classification error is minimized with the participation only of instances from TR. This classification error is calculated as follows. For each $x \in TR$ we compute its k nearest neighbours according to a distance function d. Let $x_i$ with $i = 1, \ldots, k$ be the neighbours of x ranked by distance, i.e., $d(x, x_1) \leq d(x, x_2) \leq \ldots \leq d(x, x_k)$ and $\forall y \in TR$ with $y \neq x_i$, $d(x, x_k) < d(x, y)$. According to the standard kNN rule, the prediction of the label of x from the labels of its neighbours can be formalized as:

$$predLabel(x) = \arg\max_{l \in \{1..b\}} \sum_{i=1}^{k} \delta(l, label(x_i)) \qquad (4)$$

where

$$\delta(l, label(x_i)) = \begin{cases} 1 & \text{if } label(x_i) = l \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

If, instead of a unitary vote, we consider that the ith neighbour contributes with a weight $\omega_i \in \mathbb{R}$, the function (4) is redefined as:

$$predLabel(x, w) = \arg\max_{l \in \{1..b\}} \sum_{i=1}^{k} \omega_i \, \delta(l, label(x_i)) \qquad (6)$$

The function to minimize is the sum of the prediction errors of every instance x from TR. Thus, if we define the error function as

$$error(x, w) = \begin{cases} 1 & \text{if } predLabel(x, w) \neq label(x) \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

then the function to optimize is:

$$\min_{w \in \mathbb{R}^{k}} \sum_{x \in TR} error(x, w) \qquad (8)$$

With regard to the two versions of the evolutionary algorithm, they were called EvoNN1 and EvoNN2. In EvoNN1, the nearest neighbours (those with the lowest distances to the unlabelled instance) were "heavier" and therefore their influence (weight) had to be greater. In EvoNN2 the weights had no constraints. Regarding the design of the evolutionary algorithm, EvoNN1 and EvoNN2 were identical except for the encoding, crossover and mutation.
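For clarity, the weighted voting rule of Eq. (6) can be written in a few lines. The following is a minimal Python sketch with illustrative names; it assumes the neighbour labels are already sorted by distance:

```python
def pred_label(neighbour_labels, weights):
    """Weighted kNN vote of Eq. (6): accumulate each neighbour's weight on its label
    and return the label with the largest accumulated score."""
    scores = {}
    for lab, w in zip(neighbour_labels, weights):
        scores[lab] = scores.get(lab, 0.0) + w
    return max(scores, key=scores.get)

# Classical kNN (Eq. 4) is the particular case of unit weights
print(pred_label([2, 1, 2, 3, 2], [1, 1, 1, 1, 1]))            # -> 2 (simple majority)
print(pred_label([2, 1, 1, 3, 1], [0.9, 0.4, 0.3, 0.2, 0.1]))  # -> 2 (the nearest neighbour dominates)
```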
2.2. Voting optimization

This subsection details the search algorithm, carried out by two versions of an evolutionary algorithm, that calculates the optimum contribution of the k nearest neighbours. It is then necessary to define their main characteristics, i.e., individual encoding, genetic operators, fitness function and generational replacement policy.

2.2.1. Individual encoding

In EvoNN1 and EvoNN2, an individual was a real-valued vector of size k symbolizing the relative contribution of the k nearest neighbours in the voting system of the kNN rule. Position 1 in the vector was associated with the nearest neighbour in distance and position k with the furthest. Moreover, a constraint was established in EvoNN1 to ensure that the closest neighbours were more important, i.e., $\omega_1 \geq \omega_2 \geq \ldots \geq \omega_k$.

Regarding the initial population, both designs integrated individuals with k random values between 0 and 1 (ordered in EvoNN1). To include individuals representing the classic kNN rule, the initial population also included several vectors with the first values set to 1 and the remaining ones set to 0, e.g., (1.0, 0.0, ..., 0.0) for k = 1, (1.0, 1.0, ..., 0.0) for k = 2, etc. Finally, the maximum value of 1 for a weight could be surpassed during the evolutionary process to stand out the importance of a concrete neighbour, as explained in the following section.

2.2.2. Crossover and mutation

The goal of the crossover operator was the generation of a new individual (offspring) from the genotypic characteristics of a set of selected parents (two in our case, $parent_1$ and $parent_2$). There is also a constraint on the order of the genes in EvoNN1. Thus, the crossover operator for the ith gene in EvoNN1 was defined as in Eq. (9), where $BLX\text{-}\alpha$ stands for the crossover operator defined in Eshelman and Schaffer (1993) calculated from $parent_1(i)$ and $parent_2(i)$, $\gamma$ is a random value between 0 and 1, $max = offspring(i-1)$ and $min = \mathrm{minimum}(parent_1(i), parent_2(i), offspring(i-1))$.

$$offspring(i) = \begin{cases} BLX\text{-}\alpha & \text{if } i = 1 \\ (max - min) \cdot \gamma + min & \text{otherwise} \end{cases} \qquad (9)$$

In EvoNN2, because the individuals were freely built, the crossover operator was simpler, see Eq. (10):

$$offspring(i) = BLX\text{-}\alpha \text{ from } parent_1(i) \text{ and } parent_2(i) \qquad (10)$$

Regarding the mutation operator, the ith gene of the individual indiv could change according to Eq. (11) in EvoNN1, where $\delta$ was initially a random value between 0 and 1. Then, to improve the fit through the generations, the $\delta$ value was taken from the interval [0, 1] during the first 10 generations, from [0, 0.9] for the next 10, and so on. On the other hand, the mutation operator in EvoNN2 worked as shown in Eq. (12).

$$indiv'(i) = \begin{cases} indiv(i) + indiv(i) \cdot \delta & \text{if } i = 1 \\ indiv(i) - indiv(i) \cdot \delta & \text{if } i = k \\ (indiv(i-1) - indiv(i+1)) \cdot \delta + indiv(i+1) & \text{otherwise} \end{cases} \qquad (11)$$

$$indiv'(i) = indiv(i) \pm indiv(i) \cdot \delta \qquad (12)$$
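The following Python sketch illustrates the order-preserving crossover of Eq. (9) built on BLX-α, together with the EvoNN1 mutation of Eq. (11). It is illustrative only; in particular it assumes that a single randomly chosen gene mutates per application of the operator, a detail the text does not fix.

```python
import random

def blx_alpha(a, b, alpha=0.5):
    """Standard BLX-alpha recombination of two real genes (Eshelman & Schaffer, 1993)."""
    lo, hi = min(a, b), max(a, b)
    spread = hi - lo
    return random.uniform(lo - alpha * spread, hi + alpha * spread)

def crossover_evonn1(parent1, parent2, alpha=0.5):
    """Crossover of Eq. (9): gene 1 comes from BLX-alpha; every later gene is drawn
    between min(parent1(i), parent2(i), offspring(i-1)) and offspring(i-1), so the
    offspring weights are non-increasing."""
    offspring = [blx_alpha(parent1[0], parent2[0], alpha)]
    for i in range(1, len(parent1)):
        upper = offspring[i - 1]
        lower = min(parent1[i], parent2[i], offspring[i - 1])
        gamma = random.random()
        offspring.append((upper - lower) * gamma + lower)
    return offspring

def mutate_evonn1(indiv, delta):
    """Mutation of Eq. (11) applied to one randomly chosen gene (an assumption of this
    sketch): the first gene may grow, the last may shrink, and any intermediate gene is
    resampled between its two neighbours, preserving the ordering."""
    k = len(indiv)
    mutated = list(indiv)
    i = random.randrange(k)
    if i == 0:
        mutated[i] = indiv[i] + indiv[i] * delta
    elif i == k - 1:
        mutated[i] = indiv[i] - indiv[i] * delta
    else:
        mutated[i] = (indiv[i - 1] - indiv[i + 1]) * delta + indiv[i + 1]
    return mutated

print(crossover_evonn1([0.9, 0.7, 0.5, 0.3, 0.1], [1.0, 0.6, 0.4, 0.4, 0.2]))
```

Because every gene after the first is bounded above by the previous offspring gene, the resulting vector satisfies the EvoNN1 constraint $\omega_1 \geq \omega_2 \geq \ldots \geq \omega_k$ by construction.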
2.2.3. Fitness function

EvoNN1 and EvoNN2 were defined by the same fitness function. This is because the proposed evolutionary designs were conceived to minimize the classification error, and therefore the order of the weights has no influence on the fitness function. Both approaches used the training set TR ⊂ D exclusively to obtain the contributions of the neighbours. Since we know the labels of the instances from TR, the fitness function was based on a cross-validation error rate using kNN with the weighted voting system.

Fig. 1 shows the fitness calculation with m × s cross validations, where m stands for the number of iterations of the validation process (line 3) and s is the number of partitions of the training data TR (line 4). Thus, the set TR is randomly divided into the bags B1, B2, ..., Bs for each validation. Then, every bag Bj is evaluated through a classification process using TR − Bj as the training set. This evaluation is driven by the function Evaluate, described below. The classification error on every Bj is accumulated and averaged by partialError (lines 7 and 9), and by error for every validation (line 10). Finally, the fitness value is the result of calculating the mean of all the validations (line 12).

1:  Fitness(W, k, TR) : error
2:    error = 0
3:    for i = 1 to m do
4:      Divide TR in s bags: B1...Bs
5:      partialError = 0
6:      for j = 1 to s do
7:        partialError = partialError + Evaluate(W, k, TR − Bj, Bj)
8:      end for
9:      partialError = partialError/s
10:     error = error + partialError
11:   end for
12:   error = error/m
13:   return error

14: Evaluate(W, k, Train, Test) : error
15:   error = 0
16:   for each instTest in Test do
17:     lab = NearestN(W, k, Train, instTest)
18:     if lab ≠ label(instTest) then
19:       error = error + 1
20:     end if
21:   end for
22:   error = error/size(Test)
23:   return error

24: NearestN(W, k, Train, y) : labY
25:   sortedInst and kNeighbours are empty sorted sets
26:   for each x in Train do
27:     insert x in sortedInst sorted by d(x, y)
28:   end for
29:   kNeighbours = sortedInst.get(k)
30:   labY = majorityLabel(kNeighbours, W)
31:   return labY

Fig. 1. Fitness function.

The input parameters of the function Evaluate are the weight vector W, the value of k, and the subsets TR − Bj and Bj (line 7). Therefore, the result of this function is the error rate on Bj taking TR − Bj as the reference to calculate the neighbours. For every single instance from the set used to measure the fitness (line 16), the label returned by the function NearestN is the majority one according to the k nearest instances belonging to the set used as training data (line 17). If the returned label does not correspond to the real label of the testing instance, the error is increased by 1 (line 19). Then, the resulting error is normalized by the size of the set used as testing data (line 22). Therefore, the value returned by Evaluate is a real number between 0 (all instances are well classified) and 1 (all instances are misclassified).

The function NearestN calculates the nearest instances to the example y belonging to the set under evaluation (lines 24 and following). Every example of the neighbours bag is inserted into a sorted set according to its distance to y. Thus, the example at the first position will be the nearest to y and the one at the last position will be the furthest (line 27). When the algorithm selects the k nearest neighbours from the sorted set (line 29), the majority label is returned according to the relative contribution of each neighbour expressed by the vector W and according to Eq. (6) (lines 30 and 31).
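A runnable counterpart of Fig. 1 can be written compactly with NumPy. This is a sketch under our own assumptions: Euclidean distance, NumPy arrays for TR_X and TR_y, and illustrative values for m and s, which the paper does not fix at this point.

```python
import numpy as np

def weighted_knn_error(W, k, train_X, train_y, test_X, test_y):
    """Function Evaluate of Fig. 1: error rate of the weighted-vote kNN on one bag."""
    W = np.asarray(W, dtype=float)
    errors = 0
    for x, y in zip(test_X, test_y):
        nn = np.argsort(np.linalg.norm(train_X - x, axis=1))[:k]   # k nearest training instances
        labels = train_y[nn]
        w = W[:len(labels)]
        scores = {c: w[labels == c].sum() for c in np.unique(labels)}
        errors += int(max(scores, key=scores.get) != y)
    return errors / len(test_y)

def fitness(W, k, TR_X, TR_y, m=5, s=3, seed=None):
    """Fitness of Fig. 1: mean error over m repetitions of an s-fold split of TR."""
    rng = np.random.default_rng(seed)
    error = 0.0
    for _ in range(m):
        idx = rng.permutation(len(TR_y))
        bags = np.array_split(idx, s)                               # divide TR into s bags
        partial = 0.0
        for bag in bags:
            rest = np.setdiff1d(idx, bag)                           # TR - Bj as training set
            partial += weighted_knn_error(W, k, TR_X[rest], TR_y[rest], TR_X[bag], TR_y[bag])
        error += partial / s
    return error / m
```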
2.2.4. Generational policy

Regarding the transition between generations, we chose an elitist design where the best individual was transferred from one generation to the next without being affected by the mutation operator. The remaining population was built as follows: with N being the number of individuals, C − 1 individuals were created by cloning the best individual from the previous generation, and the next N − C individuals resulted from the crossover operation. The selection of the individuals to cross was carried out by the tournament method. Furthermore, all individuals except the first one were subject to mutation with a probability of p.
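The generational step just described can be sketched as follows (Python; the tournament size and the way the crossover and mutation operators are passed in are our own assumptions, e.g., the crossover_evonn1 and mutate_evonn1 functions from the previous sketch with δ fixed for the current generation):

```python
import random

def tournament(population, fitnesses, size=2):
    """Tournament selection: the individual with the lowest fitness (error) among
    `size` randomly chosen contenders wins."""
    contenders = random.sample(range(len(population)), size)
    return population[min(contenders, key=lambda i: fitnesses[i])]

def next_generation(population, fitnesses, crossover, mutate, p, C):
    """One elitist step of Section 2.2.4: the best individual passes untouched,
    C - 1 clones of it are added, N - C offspring come from crossing tournament
    winners, and every individual except the elite mutates with probability p."""
    N = len(population)
    best = min(range(N), key=lambda i: fitnesses[i])
    new_pop = [list(population[best])]                    # elite, exempt from mutation
    new_pop += [list(population[best]) for _ in range(C - 1)]
    while len(new_pop) < N:
        new_pop.append(crossover(tournament(population, fitnesses),
                                 tournament(population, fitnesses)))
    for i in range(1, N):                                 # mutation spares only the elite
        if random.random() < p:
            new_pop[i] = mutate(new_pop[i])
    return new_pop
```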
3. Results

Applying optimization techniques to a kNN classifier is a line of research that has been widely discussed in the literature with considerable success. Finding an optimal weighting (for features or instances) increases the degrees of freedom of kNN. This allows the model to achieve a better fit in the training step and consequently improve the accuracy in testing. The novelty of this work lies in the fact that we do not weight features or instances but the influence of each of the k nearest neighbours. In the traditional kNN model, the selected neighbours participate with a unitary vote. Our approach modifies the weight of the votes to modulate the contribution of each neighbour. Thus, our approach introduces a flexibility mechanism able to "learn" geometric arrangements of the different classes that depend on the domain under study. For example, if we focus on a dataset where the first neighbour is twice as important as the second one, and the second neighbour is in turn twice as important as the third, fourth and fifth ones, we know that the traditional kNN rule is not able to exploit this singularity, but our proposal could suggest a weight vector (4, 2, 1, 1, 1), which would be an optimal solution.

To assess the quality of our approaches, we used 48 datasets from the UCI repository (Lichman, 2013) with different types of features and numbers of classes. All data were preprocessed with the same techniques, i.e., binarization of nominal features, replacement of missing values and normalization to avoid the Hughes effect. Regarding the evolutionary search configuration, and after a trial-and-error process, we used a population of 100 individuals, 100 generations, 10% of elitism and a mutation probability of 0.1. Regarding the parameters α (crossover), g (mutation) and k (number of neighbours), their values were set to 0.5, 20 and 5, respectively, for both EvoNN1 and EvoNN2. Although the experiments were carried out on four Intel® Xeon® E7-4820 processors, an increase of the computational cost is usually incurred when using evolutionary computation. Thus, the CPU time for the optimization process can be expressed as (#Generations × #Individuals × kNN time) / #threads, where the number of available threads depends on the workload. Taking the German Credit Data as an example ("credit g" in Table 1, with 1000 instances and 20 features), the evolutionary algorithm took about thirteen and a half minutes to run once.

In a first level, a comparison between EvoNN1 and EvoNN2 was established. Table 1 shows the results obtained. Every row represents the mean accuracy after five 10-fold cross-validations (10FCV) with different random seeds on a dataset. We provided the mean accuracy to decrease the influence of randomness in the evolutionary algorithms. The results show that EvoNN1 obtained better results for most of the datasets.

Table 1. Comparison of accuracies of EvoNN1 and EvoNN2. In bold, the best.

Dataset      EvoNN1  EvoNN2    Dataset           EvoNN1  EvoNN2
anneal       97.98   97.95     heart h           86.78   86.04
arrhythmia   60.13   59.76     heart stat        78.38   78.13
audiology    67.12   65.97     hepatitis         83.44   81.26
australian   83.89   84.11     hypothyroid       93.06   93.04
autos        75.42   75.12     ionosphere        90.45   90.36
balance      86.27   86.23     iris              99.78   99.77
breast c.    71.27   69.64     kr vs kp          98.5    98.55
breast w.    97.15   97.17     labor             81.69   81.22
bridges v1   70.7    67.86     landsat           93.86   93.77
bridges v2   69.79   66.42     liver disorders   61.49   60.72
car          83.43   86.89     lung cancer       81.4    75.79
cmc          63.21   63.07     lymph             80.87   80.47
colic        79.67   79.55     mfeat factors     97.1    97.13
credit a     84.34   83.75     mfeat fourier     86.19   85.91
credit g     72.17   72.81     mfeat karhunen    98.04   98
cylinder b   80.19   80.19     mfeat morph.      97.24   97.31
dermatology  95.39   95.62     mfeat pixel       96.93   96.92
diabetes     71.73   71.6      mfeat zernike     92.05   92.05
ecoli        93.17   92.81     molecular bio.    38.34   32.47
flags        55.4    54.73     monks 1           47.39   54.79
glass        60.8    62.57     monks 2           47.12   54.35
haberman     72.77   68.49     monks 3           47.78   46.22
hayes        62.35   62.26     mushroom          100     100
heart c      82.52   82.14     nursery           95.91   96.22

After testing the performance of both versions, we applied the Wilcoxon signed-rank test to try to assure that the differences between the two versions were statistically significant. The reason for using a non-parametric test lies in the fact that the results did not meet the necessary conditions to apply parametric tests, especially the sphericity condition (Demšar, 2006; García & Herrera, 2008). The statistic for Wilcoxon was 290.0 and the p-value was 0.0062, so we could state that the two versions behaved significantly differently from each other.

To measure the quality of our best approach, we established a second level of comparison among IBk (the implementation of kNN in the WEKA framework (Hall et al., 2009)), WKNN, UWKNN, DWKNNv1 (Gou et al., 2011), DWKNNv2 (Gou et al., 2012) and EvoNN1. All the weighting algorithms were developed from the original IBk but changing its voting system, and keeping the same k value of 5. Table 2 shows the mean accuracy obtained by the analyzed algorithms. Every dataset was evaluated for each technique as the mean of five 10FCV runs using five different seeds. As can be seen, the performance of our algorithm was the best in 25 out of the 48 datasets, and the second best in 7 out of the remaining 23.

Although our approach seemed to outperform the rest of the competitors, the results were statistically validated again to reinforce this conclusion. Thus, we carried out a non-parametric Friedman test and a Holm post-hoc procedure to elucidate whether the performance of the different algorithms was statistically different. The first step of the Friedman test is the calculation of the mean ranking reached by each technique (a ranking of 1 is the best).

Table 2. Accuracy of every studied algorithm throughout 48 datasets from UCI.

Dataset            EvoNN1  IBk    WKNN   UWKNN  DWKNN1  DWKNN2
anneal             97.98   97.1   97.85  98.43  99.14   98.23
arrhythmia         60.13   59.39  57.53  58.45  54.9    57.23
audiology          67.12   60.73  68.4   68.78  66.48   68.44
australian         83.89   84.38  82.16  83.46  80.4    82.11
autos              75.42   59.79  73.22  75.87  76.19   76.13
balance            86.27   88.28  86.81  82.65  82.01   86.81
breast c.          71.27   73.86  69.63  70.95  68.12   69.63
breast w.          97.15   97     96.09  96.36  95.39   96.04
bridges v1         70.7    58.34  65.11  64.93  64.08   65.24
bridges v2         69.79   60.47  62.75  63.37  62.15   62.75
car                83.43   93.13  93.13  88.16  88.16   93.13
cmc                63.21   45.83  45.26  45.37  44.16   44.56
colic              79.67   79.2   73.7   77.32  70.79   73.5
credit a           84.34   83.46  81.41  83.65  80.12   81.47
credit g           72.17   72.59  71.87  72.9   71.33   71.79
cylinder b         80.19   72.29  78.7   77.37  79.16   78.81
dermatology        95.39   96.26  96.32  95.79  95.16   96.32
diabetes           71.73   74.72  72.67  72.91  70.59   72.54
ecoli              93.17   86.49  82.84  83.78  80.24   82.18
flags              55.4    54.31  59.12  60.31  57.93   59.12
glass              60.8    66.13  70.27  68.05  69.49   70.07
haberman           72.77   71.08  70.46  67.55  64.92   69.77
hayes              62.35   27.66  67.52  67.47  74.62   67.52
heart c            82.52   83.31  80.6   80.14  76.66   80.27
heart h            86.78   79.41  78.56  78.56  76.67   78.54
heart stat         78.38   78.36  76.59  79.39  75.7    76.31
hepatitis          83.44   82.67  81.8   83.91  79.95   80.64
hypothyroid        93.06   93.25  92.62  93.13  91.2    92.5
ionosphere         90.45   85.61  87.46  86.33  86.89   87.62
iris               99.78   95.59  95.19  95.66  95.66   95.19
kr vs kp           98.5    96.2   95.95  95.12  93.7    95.95
labor              81.69   80.07  84.01  85.78  85.67   84.13
landsat            93.86   90.29  90.53  90.37  90.13   90.56
liver disorders    61.49   58.36  61.28  60.57  59.34   61.01
lung cancer        81.4    80.95  71.44  78.02  67.87   71.44
lymph              80.87   78.53  82.82  84.97  82.37   82.85
mfeat factors      97.1    96.56  96.69  96.51  96.08   96.68
mfeat fourier      86.19   81.56  81.02  81.26  80.07   80.94
mfeat karhunen     98.04   96.11  96.8   96.75  96.18   96.77
mfeat morph.       97.24   71.08  68.03  68.44  65.9    67.7
mfeat pixel        96.93   95.92  96.71  96.78  96.36   96.7
mfeat zernike      92.05   80.53  79.09  79.39  79.03   79.07
molecular bio.     38.34   32.57  33.56  34.76  35.24   33.56
monks 1            47.39   52.25  35.8   39.51  37.19   35.8
monks 2            47.12   54.13  38.27  40.78  38.03   38.27
monks 3            47.78   38.48  35.28  37.59  38.93   35.45
mushroom           100     100    100    100    100     100
nursery            95.91   98.36  93.63  96.1   93.23   93.63

Table 3. Mean ranking reached by every compared technique.

Technique  Ranking
EvoNN1     2.270
UWKNN      3.062
IBk        3.458
WKNN       3.594
DWKNN2     3.781
DWKNN1     4.833

Table 4. Results for the Holm post-hoc procedure.

Method    p         z      Holm's adjusted α
DWKNNv1   1.94E−11  6.710  0.010
DWKNNv2   7.65E−5   3.955  0.013
WKNN      5.32E−4   3.464  0.017
IBk       0.002     3.109  0.025
UWKNN     0.038     2.073  0.050

Table 3 shows the rankings obtained in our case. Then, we carried out the Friedman test. The statistic for Friedman was 48.95, distributed according to a chi-square with 5 degrees of freedom. The p-value for Friedman was around 2.302E−9, so the null hypothesis (no statistical difference among the compared techniques) could be rejected with α = 0.05.

Because the Friedman test should be accompanied by a post-hoc procedure for multiple pairwise comparisons, we used the Holm procedure. This procedure took our approach (EvoNN1) as the control method and compared it with each other technique by a pairwise Friedman test. The results of the procedure can be seen in Table 4 (p-value, Friedman statistic and adjusted α for the Holm procedure). In this case, every test rejected the hypothesis of no pairwise difference (p-value below the adjusted α), so we could state that our algorithm was significantly better than its competitors from a statistical point of view, taking into account that it obtained the best ranking and also behaved statistically differently from the rest.
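The statistical pipeline (Wilcoxon signed-rank test, Friedman test and a rank-based Holm post-hoc procedure against a control, in the spirit of Demšar (2006)) can be reproduced along the following lines. This is a hedged sketch: the accuracy matrix is a random placeholder for the values of Table 2, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare, norm, rankdata

methods = ["EvoNN1", "IBk", "WKNN", "UWKNN", "DWKNN1", "DWKNN2"]
acc = np.random.rand(48, len(methods))      # placeholder for the 48 x 6 accuracies of Table 2

# Wilcoxon signed-rank test between two methods (as used for EvoNN1 vs. EvoNN2 in Table 1)
w_stat, w_p = wilcoxon(acc[:, 0], acc[:, 1])

# Friedman test over all compared methods
f_stat, f_p = friedmanchisquare(*[acc[:, j] for j in range(len(methods))])

# Mean ranks per method (rank 1 = best accuracy on a dataset)
ranks = np.mean([rankdata(-row) for row in acc], axis=0)

# Holm post-hoc procedure with the best-ranked method as control
k, n = len(methods), acc.shape[0]
se = np.sqrt(k * (k + 1) / (6.0 * n))
control = int(np.argmin(ranks))
others = [j for j in range(k) if j != control]
pvals = [2 * norm.sf(abs(ranks[j] - ranks[control]) / se) for j in others]
order = np.argsort(pvals)
still_rejecting = True
for step, idx in enumerate(order, start=1):
    alpha_adj = 0.05 / (len(others) - step + 1)           # Holm's adjusted alpha
    still_rejecting = still_rejecting and pvals[idx] <= alpha_adj
    print(f"{methods[others[idx]]}: p={pvals[idx]:.3g}, alpha={alpha_adj:.3f}, "
          f"{'reject' if still_rejecting else 'do not reject'}")
```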
From our experimentation, some facts should be outlined. First, an evolutionary algorithm to adjust the contribution of the neighbourhood improved the behaviour of the classical kNN rule. This fact could not be completely assured for the rest of the weighting techniques, probably due to the better adaptability of metaheuristics compared with strict functions such as those suggested by DWKNN or UWKNN. Second, the correct working of the evolutionary approach was highly correlated with the distance of the weighted neighbours to the unlabelled instances in the training phase. In other words, although EvoNN2 had more degrees of freedom in the search space, EvoNN1 reached the best performance. Although this proved true, we think this fact can be related to problems in the evolutionary search in EvoNN2, so more work could be needed to make it reach its potentially better weights.

4. Conclusions

This work presented an evolutionary method to optimize the kNN algorithm. Unlike the classical approaches that use weights on the features and instances of the data, we focused on adjusting the contribution of each one of the k nearest neighbours. The results showed a satisfactory performance that was statistically supported.

On the other hand, despite the use of multitasking architectures, optimization techniques, and more specifically evolutionary heuristics, usually consume more computational resources than lazy classifiers. Therefore it will be necessary to establish flexible and scalable parallelization pathways that are in line with the growing amount of data.

In future work, we will focus on alternative metrics to accuracy for use in imbalanced data classification. Thus, precision, recall, false positive rate, false negative rate or F-measure could be considered as the objective function to optimize. In addition, Big Data is a clear research target. Although there are not yet many studies about using kNN in Big Data, we will try to extrapolate our method to apply it to massive datasets. Regression problems are approachable too. The kNN algorithm works effectively when the class to predict is numeric, so our method can be used in a natural way.

Finally, the strengths of our system were tested on 48 heterogeneous domains and the obtained results encourage us to implement future experiments on more specific problems. In this context, we have experience with remote sensing data (García-Gutierrez, Gonçalves-Seco, & Riquelme-Santos, 2011) and this approach could complement the cited study in a new work.

References

AlSukker, A., Khushaba, R., & Al-Ani, A. (2010). Optimizing the k-NN metric weights using differential evolution. In Proceedings of the international conference on multimedia computing and information technology (MCIT), 2010 (pp. 89–92).
Berikov, V. (2014). Weighted ensemble of algorithms for complex data clustering. Pattern Recognition Letters, 38, 99–106.
Blachnik, M., & Duch, W. (2011). LVQ algorithm with instance weighting for generation of prototype-based rules. Neural Networks, 24(8), 824–830. (Artificial Neural Networks: Selected Papers from ICANN 2010).
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, 6(4), 325–327.
Eshelman, L. J., & Schaffer, J. D. (1993). Real-coded genetic algorithms and interval-schemata. In D. L. Whitley (Ed.), Foundations of genetic algorithms 2 (pp. 187–202). San Mateo, CA: Morgan Kaufmann.
Fernandez, F., & Isasi, P. (2008). Local feature weighting in nearest prototype classification. IEEE Transactions on Neural Networks, 19(1), 40–53.
García-Gutierrez, J., Gonçalves-Seco, L., & Riquelme-Santos, J. C. (2011). Automatic environmental quality assessment for mixed-land zones using lidar and intelligent techniques. Expert Systems with Applications, 38(6), 6805–6813. http://dx.doi.org/10.1016/j.eswa.2010.12.065.
García, S., & Herrera, F. (2008). An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677–2694.
Gou, J., Du, L., Zhang, Y., & Xiong, T. (2012). A new distance-weighted k-nearest neighbor classifier. Journal of Information & Computational Science, 9(6), 1429–1436.
Gou, J., Xiong, T., & Kuang, J. (2011). A novel weighted voting for k-nearest neighbor rule. Journal of Computers, 6(5), 833–840.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. SIGKDD Explorations, 11(1).
Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Mateos-García, D., García-Gutiérrez, J., & Riquelme-Santos, J. C. (2012). On the evolutionary optimization of k-NN by label-dependent feature weighting. Pattern Recognition Letters, 33(16), 2232–2238. doi:10.1016/j.patrec.2012.08.011.
Mohemmed, A. W., & Zhang, M. (2008). Evaluation of particle swarm optimization based centroid classifier with different distance metrics. In Proceedings of the IEEE congress on evolutionary computation '08 (pp. 2929–2932).
Paredes, R., & Vidal, E. (2006). Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7), 1100–1110.
Park, C. H., & Kim, S. B. (2015). Sequential random k-nearest neighbor feature selection for high-dimensional data. Expert Systems with Applications, 42(5), 2336–2342. http://dx.doi.org/10.1016/j.eswa.2014.10.044.
Park, Y., Park, S., Jung, W., & Lee, S.-g. (2015). Reversed CF: a fast collaborative filtering algorithm using a k-nearest neighbor graph. Expert Systems with Applications, 42(8), 4022–4028. http://dx.doi.org/10.1016/j.eswa.2015.01.001.
Raymer, M., Doom, T., Kuhn, L., & Punch, W. (2003). Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 33(5), 802–813.
Raymer, M., Punch, W., Goodman, E., Kuhn, L., & Jain, A. (2000). Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(2), 164–171.
Rodger, J. A. (2014). A fuzzy nearest neighbor neural network statistical model for predicting demand for natural gas and energy cost savings in public buildings. Expert Systems with Applications, 41(4, Part 2), 1813–1829. http://dx.doi.org/10.1016/j.eswa.2013.08.080.
Tahir, M. A., Bouridane, A., & Kurugollu, F. (2007). Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier. Pattern Recognition Letters, 28(4), 438–446.
Zhi, X., Fan, J., & Zhao, F. (2014). Robust local feature weighting hard c-means clustering algorithm. Neurocomputing, 134, 20–29.