An evolutionary voting for k-nearest neighbours

Daniel Mateos-García, Jorge García-Gutiérrez, José C. Riquelme-Santos

Keywords: Evolutionary computation; Nearest neighbour; Weighted voting

Abstract

This work presents an evolutionary approach, called EvoNN, to modify the voting system of the k-nearest neighbours (kNN) rule. Our approach results in a real-valued vector which provides the optimal relative contribution of the k nearest neighbours. We compare two possible versions of our algorithm. One of them (EvoNN1) introduces a constraint on the resulting real-valued vector so that the greatest value is assigned to the nearest neighbour. The second version (EvoNN2) does not include any particular constraint on the order of the weights. We compare both versions with the classical kNN rule and four other weighted variants of kNN on 48 datasets from the UCI repository. Results show that EvoNN1 outperforms EvoNN2 and statistically obtains better results than the rest of the compared methods.

1. Introduction

Weighting in machine learning is a common technique for emphasizing some characteristics of the data to improve the resulting models. For example, weighting has been used to outline the importance of some particular instances (Blachnik & Duch, 2011) or features (Zhi, Fan, & Zhao, 2014), or to rank a set of techniques in the context of ensembles (Berikov, 2014). In a broad sense, Artificial Neural Networks (ANNs) and Support Vector Machines (SVMs) can also be seen as examples of using weights in learning models, but the k-nearest neighbours (kNN) rule has been the most common technique to benefit from weights (Mateos-García, García-Gutiérrez, & Riquelme-Santos, 2012).

kNN and its variants have been widely used in the literature to solve real problems. Rodger (2014) used a hybrid model to predict the demand for natural gas. The system was implemented by integrating regression, fuzzy logic, nearest neighbours and neural networks, and considering several variables such as the price, operating expenses, cost to drill new wells, etc. Focusing on biological data, Park and Kim (2015) selected significant genes from microarrays by using a nearest-neighbour-based ensemble of classifiers. On the other hand, Park, Park, Jung, and Lee (2015) tackled the problem of designing recommender systems. For this purpose the authors presented Reversed CF (RCF), a fast item-based collaborative filtering algorithm which utilizes a k-nearest neighbour graph.

The main goal of a weighting system lies in the optimization (commonly by metaheuristics) of a set of weights in the training step to obtain the highest accuracy while trying not to overfit the resulting model. If we focus on kNN weighting methods, many proposals weighting features or instances can be found. In Raymer, Punch, Goodman, Kuhn, and Jain (2000) a weighting method to obtain an optimal set of features was provided. The set of features was selected by means of a kNN-based genetic algorithm using a bit vector to indicate whether a feature was in the selection or not. In a later work, the same authors presented a hybrid evolutionary algorithm using a Bayesian discriminant function (Raymer, Doom, Kuhn, & Punch, 2003), trying to isolate characteristics belonging to large datasets of biomedical origin. Moreover, Paredes and Vidal (2006) used different similarity functions to improve the behaviour of the kNN rule.
In a first approximation, they considered a weight per feature and instance on the training data, resulting in a non-viable number of parameters in the learning process. Then, the authors proposed three types of reduction: a weight by class and feature (label dependency), a weight by prototype (prototype dependency) and a combination of the previous ones. The optimization process was carried out by gradient descent. In the same line, Tahir, Bouridane, and Kurugollu (2007) showed an approach that was able to both select and weight features simultaneously by using tabu search. Furthermore, Mohemmed and Zhang (2008) presented a nearest-centroid-based classifier. This method calculated prototypical instances by computing arithmetic averages from the training data. To classify an instance, the method calculated the distance to every prototype and then selected the nearest one. The optimization of the centroids that minimized the classification error was carried out through particle swarm optimization. Fernandez and Isasi (2008) also proposed a weighting system by using a prototype-based classifier. After a data normalization that was based on the position of the instances with respect to regions, the weights were iteratively calculated. More recently, AlSukker, Khushaba, and Al-Ani (2010) used differential evolution to find weights for different aspects of the data. They described four approaches: feature weighting, neighbour weighting, class weighting and mixed weighting (features and classes), with the latter being the one providing the best results.

Weighting has also been applied to the voting system of the kNN rule. Thus, the distance-weighted k-nearest neighbour rule (WKNN) proposed by Dudani (1976) has been known for long. WKNN weights the votes of the k nearest neighbours ($w_i$) according to Eq. (1), where $d_i$ is the distance of the ith nearest neighbour ($d_1$ being the distance of the nearest) with respect to the instance to be classified. A similar version using a uniform weighting (UWKNN) has also been proposed, where each weight is inversely proportional to the rank reached among the neighbours (i.e., $w^{u}_{i} = 1/i$). Recently, both techniques have been explored working together as a new classifier called Dual-Weighted kNN (DWKNN), showing promising results, where each weight was calculated according to Eq. (2) (Gou, Xiong, & Kuang, 2011). A later work of Gou, Du, Zhang, and Xiong (2012) provided another version of DWKNN where the calculation of the weights was different, according to Eq. (3).

$$w^{w}_{i} = \begin{cases} \dfrac{d_k - d_i}{d_k - d_1} & \text{if } d_i \neq d_1 \\ 1 & \text{if } d_i = d_1 \end{cases} \qquad (1)$$

$$w^{dw1}_{i} = w^{w}_{i} \cdot w^{u}_{i} \qquad (2)$$

$$w^{dw2}_{i} = \begin{cases} \dfrac{d_k - d_i}{d_k - d_1} \cdot \dfrac{d_k + d_1}{d_k + d_i} & \text{if } d_i \neq d_1 \\ 1 & \text{if } d_i = d_1 \end{cases} \qquad (3)$$

Although all the previous approaches provided improvements with regard to the classical kNN performance, they have not explored the possibly better suitability of evolutionary computation for the optimization of the neighbours' weights. Thus, we propose an evolutionary method to improve the kNN rule by altering the voting system according to knowledge obtained in the training phase. We also explore the use of constraints in learning the weights with two different versions of our approach, and we study their reliability compared with the classical kNN rule and four other weighted variants on UCI datasets (Lichman, 2013). Finally, results are statistically validated to reinforce the final conclusions.
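As a concrete illustration of the weighting schemes of Eqs. (1)–(3), the following sketch (Python with NumPy; the function names are ours and not part of any of the cited implementations) computes the WKNN, UWKNN and dual weights from the sorted distances of the k nearest neighbours:

```python
import numpy as np

def wknn_weights(d):
    """WKNN weights of Eq. (1); d holds the neighbour distances sorted in ascending order."""
    d = np.asarray(d, dtype=float)
    if d[-1] == d[0]:                          # degenerate case: all distances equal
        return np.ones_like(d)
    return (d[-1] - d) / (d[-1] - d[0])        # (d_k - d_i) / (d_k - d_1)

def uwknn_weights(k):
    """Uniform rank-based weights: w_i = 1/i."""
    return 1.0 / np.arange(1, k + 1)

def dwknn1_weights(d):
    """First dual-weighted variant, Eq. (2): product of the WKNN and UWKNN weights."""
    return wknn_weights(d) * uwknn_weights(len(d))

def dwknn2_weights(d):
    """Second dual-weighted variant, Eq. (3)."""
    d = np.asarray(d, dtype=float)
    if d[-1] == d[0]:
        return np.ones_like(d)
    return (d[-1] - d) / (d[-1] - d[0]) * (d[-1] + d[0]) / (d[-1] + d)

# Distances of k = 5 neighbours to a hypothetical query instance
distances = [0.2, 0.5, 0.7, 0.9, 1.3]
print(wknn_weights(distances), dwknn1_weights(distances), dwknn2_weights(distances))
```

Under Eq. (1) the nearest neighbour always receives weight 1 and the kth neighbour weight 0, while the dual weights of Eqs. (2) and (3) decay more sharply with the rank.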
The rest of the paper is organized as follows. Section 2 presents the elements of the two versions of the evolutionary algorithm designed to weight the voting system of the kNN rule. The results and several statistical tests are reported in Section 3. Finally, Section 4 presents the main findings and future work.

2. Method

In this section, two variants of our voting optimization system, called Evolutionary Voting of Nearest Neighbours (EvoNN), are described.

2.1. Purpose and functionality

The aim of our work was to find a set of weights to modify the influence of every nearest neighbour when voting to assign a label to an unlabelled instance. Moreover, our approach also provides a measurement of the influence of the proximity of the neighbours by means of the optimized weights. Thus, the evolutionary process provides the best weights found to improve the classification accuracy on the domain under study.

To formalize our approach we assume that the set of classes (or labels) L is represented by the natural numbers from 1 to b, with b being the number of labels. Thus, let $D = \{(e, l) \mid e \in \mathbb{R}^{f} \text{ and } l \in L = \{1, 2, \ldots, b\}\}$ be the dataset under study, with f being the number of features and b the number of labels. Let label be a function that assigns to every element e the real class to which it belongs. Let us suppose that D is divided into the sets TR and TS, the training set and the testing set respectively, such that $D = TR \cup TS$ and $TR \cap TS = \emptyset$. In the training step the classification error is minimized with the participation only of instances from TR. This classification error is calculated as follows. For each $x \in TR$ we compute its k nearest neighbours according to a distance function d. Let $x_i$ with $i = 1, \ldots, k$ be the neighbours of x ranked by distance, i.e., $d(x, x_1) \leq d(x, x_2) \leq \ldots \leq d(x, x_k)$ and $\forall y \in TR$ with $y \neq x_i$, $d(x, x_k) < d(x, y)$. According to the standard kNN rule, the prediction of the label of x from the labels of its neighbours can be formalized as:

$$predLabel(x) = \arg\max_{l \in \{1..b\}} \sum_{i=1}^{k} \delta(l, label(x_i)) \qquad (4)$$

where

$$\delta(l, label(x_i)) = \begin{cases} 1 & \text{if } label(x_i) = l \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

If, instead of a unitary vote, we consider that the ith neighbour contributes with a weight $\omega_i \in \mathbb{R}$, the function (4) is redefined as:

$$predLabel(x, w) = \arg\max_{l \in \{1..b\}} \sum_{i=1}^{k} \omega_i \, \delta(l, label(x_i)) \qquad (6)$$

The function to minimize is the sum of the prediction errors of every instance x from TR. Thus, if we define the error function as

$$error(x, w) = \begin{cases} 1 & \text{if } predLabel(x, w) \neq label(x) \\ 0 & \text{otherwise} \end{cases} \qquad (7)$$

then the function to optimize is:

$$\min_{w \in \mathbb{R}^{k}} \sum_{x \in TR} error(x, w) \qquad (8)$$

With regard to the two versions of the evolutionary algorithm, they were called EvoNN1 and EvoNN2. In EvoNN1, the nearest neighbours (those with the lowest distances to the unlabelled instance) were "heavier" and therefore their influence (weight) had to be greater. In EvoNN2 the weights had no constraints. Regarding the design of the evolutionary algorithm, EvoNN1 and EvoNN2 were identical except for the encoding, crossover and mutation.
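For clarity, the weighted voting rule of Eq. (6) can be written in a few lines. The following is a minimal Python sketch with illustrative names; it assumes the neighbour labels are already sorted by distance:

```python
def pred_label(neighbour_labels, weights):
    """Weighted kNN vote of Eq. (6): accumulate each neighbour's weight on its label
    and return the label with the largest accumulated score."""
    scores = {}
    for lab, w in zip(neighbour_labels, weights):
        scores[lab] = scores.get(lab, 0.0) + w
    return max(scores, key=scores.get)

# Classical kNN (Eq. 4) is the particular case of unit weights
print(pred_label([2, 1, 2, 3, 2], [1, 1, 1, 1, 1]))            # -> 2 (simple majority)
print(pred_label([2, 1, 1, 3, 1], [0.9, 0.4, 0.3, 0.2, 0.1]))  # -> 2 (the nearest neighbour dominates)
```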
2.2. Voting optimization

This subsection details the search algorithm, carried out by two versions of an evolutionary algorithm, that calculates the optimum contribution of the k nearest neighbours. It is then necessary to define their main characteristics, i.e., individual encoding, genetic operators, fitness function and generational replacement policy.

2.2.1. Individual encoding

In EvoNN1 and EvoNN2, an individual was a real-valued vector of size k symbolizing the relative contribution of the k nearest neighbours in the voting system of the kNN rule. Position 1 in the vector was associated with the nearest neighbour in distance and position k with the furthest. Moreover, a constraint was established in EvoNN1 to ensure that the closest neighbours were more important, i.e., $\omega_1 \geq \omega_2 \geq \ldots \geq \omega_k$.

Regarding the initial population, both designs integrated individuals with k random values between 0 and 1 (ordered in EvoNN1). To include individuals representing the classic kNN rule, the initial population also included several vectors with the first values set to 1 and the remaining ones set to 0, e.g., (1.0, 0.0, ..., 0.0) for k = 1, (1.0, 1.0, ..., 0.0) for k = 2, etc. Finally, the maximum value of 1 for a weight could be surpassed during the evolutionary process to stand out the importance of a concrete neighbour, as explained in the following section.

2.2.2. Crossover and mutation

The goal of the crossover operator was the generation of a new individual (offspring) from the genotypic characteristics of a set of selected parents (two in our case, $parent_1$ and $parent_2$). There is also a constraint on the order of the genes in EvoNN1. Thus, the crossover operator for the ith gene in EvoNN1 was defined as in Eq. (9), where $BLX\text{-}\alpha$ stands for the crossover operator defined in Eshelman and Schaffer (1993) calculated from $parent_1(i)$ and $parent_2(i)$, $\gamma$ is a random value between 0 and 1, $max = offspring(i-1)$ and $min = \mathrm{minimum}(parent_1(i), parent_2(i), offspring(i-1))$.

$$offspring(i) = \begin{cases} BLX\text{-}\alpha & \text{if } i = 1 \\ (max - min) \cdot \gamma + min & \text{otherwise} \end{cases} \qquad (9)$$

In EvoNN2, because the individuals were freely built, the crossover operator was simpler, see Eq. (10):

$$offspring(i) = BLX\text{-}\alpha \text{ from } parent_1(i) \text{ and } parent_2(i) \qquad (10)$$

Regarding the mutation operator, the ith gene of the individual indiv could change according to Eq. (11) in EvoNN1, where $\delta$ was initially a random value between 0 and 1. Then, to improve the fit through the generations, the $\delta$ value was taken from the interval [0, 1] during the first 10 generations, from [0, 0.9] for the next 10, and so on. On the other hand, the mutation operator in EvoNN2 worked as shown in Eq. (12).

$$indiv'(i) = \begin{cases} indiv(i) + indiv(i) \cdot \delta & \text{if } i = 1 \\ indiv(i) - indiv(i) \cdot \delta & \text{if } i = k \\ (indiv(i-1) - indiv(i+1)) \cdot \delta + indiv(i+1) & \text{otherwise} \end{cases} \qquad (11)$$

$$indiv'(i) = indiv(i) \pm indiv(i) \cdot \delta \qquad (12)$$
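The following Python sketch illustrates the order-preserving crossover of Eq. (9) built on BLX-α, together with the EvoNN1 mutation of Eq. (11). It is illustrative only; in particular it assumes that a single randomly chosen gene mutates per application of the operator, a detail the text does not fix.

```python
import random

def blx_alpha(a, b, alpha=0.5):
    """Standard BLX-alpha recombination of two real genes (Eshelman & Schaffer, 1993)."""
    lo, hi = min(a, b), max(a, b)
    spread = hi - lo
    return random.uniform(lo - alpha * spread, hi + alpha * spread)

def crossover_evonn1(parent1, parent2, alpha=0.5):
    """Crossover of Eq. (9): gene 1 comes from BLX-alpha; every later gene is drawn
    between min(parent1(i), parent2(i), offspring(i-1)) and offspring(i-1), so the
    offspring weights are non-increasing."""
    offspring = [blx_alpha(parent1[0], parent2[0], alpha)]
    for i in range(1, len(parent1)):
        upper = offspring[i - 1]
        lower = min(parent1[i], parent2[i], offspring[i - 1])
        gamma = random.random()
        offspring.append((upper - lower) * gamma + lower)
    return offspring

def mutate_evonn1(indiv, delta):
    """Mutation of Eq. (11) applied to one randomly chosen gene (an assumption of this
    sketch): the first gene may grow, the last may shrink, and any intermediate gene is
    resampled between its two neighbours, preserving the ordering."""
    k = len(indiv)
    mutated = list(indiv)
    i = random.randrange(k)
    if i == 0:
        mutated[i] = indiv[i] + indiv[i] * delta
    elif i == k - 1:
        mutated[i] = indiv[i] - indiv[i] * delta
    else:
        mutated[i] = (indiv[i - 1] - indiv[i + 1]) * delta + indiv[i + 1]
    return mutated

print(crossover_evonn1([0.9, 0.7, 0.5, 0.3, 0.1], [1.0, 0.6, 0.4, 0.4, 0.2]))
```

Because every gene after the first is bounded above by the previous offspring gene, the resulting vector satisfies the EvoNN1 constraint $\omega_1 \geq \omega_2 \geq \ldots \geq \omega_k$ by construction.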
2.2.3. Fitness function

EvoNN1 and EvoNN2 were defined by the same fitness function. This is because the proposed evolutionary designs were conceived to minimize the classification error, and therefore the order of the weights has no influence on the fitness function. Both approaches used the training set TR ⊂ D exclusively to obtain the contributions of the neighbours. Since we know the labels of the instances from TR, the fitness function was based on a cross-validation error rate using kNN with the weighted voting system.

Fig. 1 shows the fitness calculation with m × s cross validations, where m stands for the number of iterations of the validation process (line 3) and s is the number of partitions of the training data TR (line 4). Thus, the set TR is randomly divided into the bags B1, B2, ..., Bs for each validation. Then, every bag Bj is evaluated through a classification process using TR − Bj as the training set. This evaluation is driven by the function Evaluate, described below. The classification error on every Bj is accumulated and averaged by partialError (lines 7 and 9), and by error for every validation (line 10). Finally, the fitness value is the result of calculating the mean of all the validations (line 12).

1:  Fitness(W, k, TR) : error
2:    error = 0
3:    for i = 1 to m do
4:      Divide TR in s bags: B1...Bs
5:      partialError = 0
6:      for j = 1 to s do
7:        partialError = partialError + Evaluate(W, k, TR − Bj, Bj)
8:      end for
9:      partialError = partialError/s
10:     error = error + partialError
11:   end for
12:   error = error/m
13:   return error

14: Evaluate(W, k, Train, Test) : error
15:   error = 0
16:   for each instTest in Test do
17:     lab = NearestN(W, k, Train, instTest)
18:     if lab ≠ label(instTest) then
19:       error = error + 1
20:     end if
21:   end for
22:   error = error/size(Test)
23:   return error

24: NearestN(W, k, Train, y) : labY
25:   sortedInst and kNeighbours are empty sorted sets
26:   for each x in Train do
27:     insert x in sortedInst sorted by d(x, y)
28:   end for
29:   kNeighbours = sortedInst.get(k)
30:   labY = majorityLabel(kNeighbours, W)
31:   return labY

Fig. 1. Fitness function.

The input parameters of the function Evaluate are the weight vector W, the value of k, and the subsets TR − Bj and Bj (line 7). Therefore, the result of this function is the error rate on Bj taking TR − Bj as the reference to calculate the neighbours. For every single instance from the set used to measure the fitness (line 16), the label returned by the function NearestN is the majority one according to the k nearest instances belonging to the set used as training data (line 17). If the returned label does not correspond to the real label of the testing instance, the error is increased by 1 (line 19). Then, the resulting error is normalized by the size of the set used as testing data (line 22). Therefore, the value returned by Evaluate is a real number between 0 (all instances are well classified) and 1 (all instances are misclassified).

The function NearestN calculates the nearest instances to the example y belonging to the set under evaluation (lines 24 and following). Every example of the neighbours bag is inserted into a sorted set according to its distance to y. Thus, the example at the first position will be the nearest to y and the one at the last position will be the furthest (line 27). When the algorithm selects the k nearest neighbours from the sorted set (line 29), the majority label is returned according to the relative contribution of each neighbour expressed by the vector W and according to Eq. (6) (lines 30 and 31).
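A runnable counterpart of Fig. 1 can be written compactly with NumPy. This is a sketch under our own assumptions: Euclidean distance, NumPy arrays for TR_X and TR_y, and illustrative values for m and s, which the paper does not fix at this point.

```python
import numpy as np

def weighted_knn_error(W, k, train_X, train_y, test_X, test_y):
    """Function Evaluate of Fig. 1: error rate of the weighted-vote kNN on one bag."""
    W = np.asarray(W, dtype=float)
    errors = 0
    for x, y in zip(test_X, test_y):
        nn = np.argsort(np.linalg.norm(train_X - x, axis=1))[:k]   # k nearest training instances
        labels = train_y[nn]
        w = W[:len(labels)]
        scores = {c: w[labels == c].sum() for c in np.unique(labels)}
        errors += int(max(scores, key=scores.get) != y)
    return errors / len(test_y)

def fitness(W, k, TR_X, TR_y, m=5, s=3, seed=None):
    """Fitness of Fig. 1: mean error over m repetitions of an s-fold split of TR."""
    rng = np.random.default_rng(seed)
    error = 0.0
    for _ in range(m):
        idx = rng.permutation(len(TR_y))
        bags = np.array_split(idx, s)                               # divide TR into s bags
        partial = 0.0
        for bag in bags:
            rest = np.setdiff1d(idx, bag)                           # TR - Bj as training set
            partial += weighted_knn_error(W, k, TR_X[rest], TR_y[rest], TR_X[bag], TR_y[bag])
        error += partial / s
    return error / m
```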
2.2.4. Generational policy

Regarding the transition between generations, we chose an elitist design where the best individual was transferred from one generation to the next without being affected by the mutation operator. The remaining population was built as follows: with N being the number of individuals, C − 1 individuals were created by cloning the best individual from the previous generation, and the next N − C individuals resulted from the crossover operation. The selection of the individuals to cross was carried out by the tournament method. Furthermore, all individuals except the first one were subject to mutation with a probability of p.
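The generational step just described can be sketched as follows (Python; the tournament size and the way the crossover and mutation operators are passed in are our own assumptions, e.g., the crossover_evonn1 and mutate_evonn1 functions from the previous sketch with δ fixed for the current generation):

```python
import random

def tournament(population, fitnesses, size=2):
    """Tournament selection: the individual with the lowest fitness (error) among
    `size` randomly chosen contenders wins."""
    contenders = random.sample(range(len(population)), size)
    return population[min(contenders, key=lambda i: fitnesses[i])]

def next_generation(population, fitnesses, crossover, mutate, p, C):
    """One elitist step of Section 2.2.4: the best individual passes untouched,
    C - 1 clones of it are added, N - C offspring come from crossing tournament
    winners, and every individual except the elite mutates with probability p."""
    N = len(population)
    best = min(range(N), key=lambda i: fitnesses[i])
    new_pop = [list(population[best])]                    # elite, exempt from mutation
    new_pop += [list(population[best]) for _ in range(C - 1)]
    while len(new_pop) < N:
        new_pop.append(crossover(tournament(population, fitnesses),
                                 tournament(population, fitnesses)))
    for i in range(1, N):                                 # mutation spares only the elite
        if random.random() < p:
            new_pop[i] = mutate(new_pop[i])
    return new_pop
```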
3. Results

Applying optimization techniques to a kNN classifier is a line of research that has been widely discussed in the literature with considerable success. Finding an optimal weighting (for features or instances) increases the degrees of freedom of kNN. This allows the model to achieve a better fit in the training step and consequently improve the accuracy in testing. The novelty of this work lies in the fact that we do not weight features or instances but the influence of each of the k nearest neighbours. In the traditional kNN model, the selected neighbours participate with a unitary vote. Our approach modifies the weight of the votes to modulate the contribution of each neighbour. Thus, our approach introduces a flexibility mechanism able to "learn" geometric arrangements of the different classes that depend on the domain under study. For example, if we focus on a dataset where the first neighbour is twice as important as the second one, and the second neighbour is in turn twice as important as the third, fourth and fifth ones, we know that the traditional kNN rule is not able to exploit this singularity, but our proposal could suggest a weight vector (4, 2, 1, 1, 1), which would be an optimal solution.

To assess the quality of our approaches, we used 48 datasets from the UCI repository (Lichman, 2013) with different types of features and numbers of classes. All data were preprocessed with the same techniques, i.e., binarization of nominal features, replacement of missing values and normalization to avoid the Hughes effect. Regarding the evolutionary search configuration, and after a trial-and-error process, we used a population of 100 individuals, 100 generations, 10% of elitism and a mutation probability of 0.1. Regarding the parameters α (crossover), g (mutation) and k (number of neighbours), their values were set to 0.5, 20 and 5, respectively, for both EvoNN1 and EvoNN2. Although the experiments were carried out on four Intel® Xeon® E7-4820 processors, an increase of the computational cost is usually incurred when using evolutionary computation. Thus, the CPU time for the optimization process can be expressed as (#Generations × #Individuals × kNN time) / #threads, where the number of available threads depends on the workload. Taking the German Credit Data as an example ("credit g" in Table 1, with 1000 instances and 20 features), the evolutionary algorithm took about thirteen and a half minutes to run once.

In a first level, a comparison between EvoNN1 and EvoNN2 was established. Table 1 shows the results obtained. Every row represents the mean accuracy after five 10-fold cross-validations (10FCV) with different random seeds on a dataset. We provided the mean accuracy to decrease the influence of randomness in the evolutionary algorithms. The results show that EvoNN1 obtained better results for most of the datasets.

Table 1. Comparison of accuracies of EvoNN1 and EvoNN2. In bold, the best.

Dataset      EvoNN1  EvoNN2    Dataset           EvoNN1  EvoNN2
anneal       97.98   97.95     heart h           86.78   86.04
arrhythmia   60.13   59.76     heart stat        78.38   78.13
audiology    67.12   65.97     hepatitis         83.44   81.26
australian   83.89   84.11     hypothyroid       93.06   93.04
autos        75.42   75.12     ionosphere        90.45   90.36
balance      86.27   86.23     iris              99.78   99.77
breast c.    71.27   69.64     kr vs kp          98.5    98.55
breast w.    97.15   97.17     labor             81.69   81.22
bridges v1   70.7    67.86     landsat           93.86   93.77
bridges v2   69.79   66.42     liver disorders   61.49   60.72
car          83.43   86.89     lung cancer       81.4    75.79
cmc          63.21   63.07     lymph             80.87   80.47
colic        79.67   79.55     mfeat factors     97.1    97.13
credit a     84.34   83.75     mfeat fourier     86.19   85.91
credit g     72.17   72.81     mfeat karhunen    98.04   98
cylinder b   80.19   80.19     mfeat morph.      97.24   97.31
dermatology  95.39   95.62     mfeat pixel       96.93   96.92
diabetes     71.73   71.6      mfeat zernike     92.05   92.05
ecoli        93.17   92.81     molecular bio.    38.34   32.47
flags        55.4    54.73     monks 1           47.39   54.79
glass        60.8    62.57     monks 2           47.12   54.35
haberman     72.77   68.49     monks 3           47.78   46.22
hayes        62.35   62.26     mushroom          100     100
heart c      82.52   82.14     nursery           95.91   96.22

After testing the performance of both versions, we applied the Wilcoxon signed-rank test to try to assure that the differences between the two versions were statistically significant. The reason for using a non-parametric test lies in the fact that the results did not meet the necessary conditions to apply parametric tests, especially the sphericity condition (Demšar, 2006; García & Herrera, 2008). The statistic for Wilcoxon was 290.0 and the p-value was 0.0062, so we could state that the two versions behaved significantly differently from each other.

To measure the quality of our best approach, we established a second level of comparison among IBk (the implementation of kNN in the WEKA framework (Hall et al., 2009)), WKNN, UWKNN, DWKNNv1 (Gou et al., 2011), DWKNNv2 (Gou et al., 2012) and EvoNN1. All the weighting algorithms were developed from the original IBk but changing its voting system, and keeping the same k value of 5. Table 2 shows the mean accuracy obtained by the analyzed algorithms. Every dataset was evaluated for each technique as the mean of five 10FCV runs using five different seeds. As can be seen, the performance of our algorithm was the best in 25 out of the 48 datasets, and the second best in 7 out of the remaining 23.

Although our approach seemed to outperform the rest of the competitors, the results were statistically validated again to reinforce this conclusion. Thus, we carried out a non-parametric Friedman test and a Holm post-hoc procedure to elucidate whether the performance of the different algorithms was statistically different. The first step of the Friedman test is the calculation of the mean ranking reached by each technique (a ranking of 1 is the best).

Table 2. Accuracy of every studied algorithm throughout 48 datasets from UCI.

Dataset            EvoNN1  IBk    WKNN   UWKNN  DWKNN1  DWKNN2
anneal             97.98   97.1   97.85  98.43  99.14   98.23
arrhythmia         60.13   59.39  57.53  58.45  54.9    57.23
audiology          67.12   60.73  68.4   68.78  66.48   68.44
australian         83.89   84.38  82.16  83.46  80.4    82.11
autos              75.42   59.79  73.22  75.87  76.19   76.13
balance            86.27   88.28  86.81  82.65  82.01   86.81
breast c.          71.27   73.86  69.63  70.95  68.12   69.63
breast w.          97.15   97     96.09  96.36  95.39   96.04
bridges v1         70.7    58.34  65.11  64.93  64.08   65.24
bridges v2         69.79   60.47  62.75  63.37  62.15   62.75
car                83.43   93.13  93.13  88.16  88.16   93.13
cmc                63.21   45.83  45.26  45.37  44.16   44.56
colic              79.67   79.2   73.7   77.32  70.79   73.5
credit a           84.34   83.46  81.41  83.65  80.12   81.47
credit g           72.17   72.59  71.87  72.9   71.33   71.79
cylinder b         80.19   72.29  78.7   77.37  79.16   78.81
dermatology        95.39   96.26  96.32  95.79  95.16   96.32
diabetes           71.73   74.72  72.67  72.91  70.59   72.54
ecoli              93.17   86.49  82.84  83.78  80.24   82.18
flags              55.4    54.31  59.12  60.31  57.93   59.12
glass              60.8    66.13  70.27  68.05  69.49   70.07
haberman           72.77   71.08  70.46  67.55  64.92   69.77
hayes              62.35   27.66  67.52  67.47  74.62   67.52
heart c            82.52   83.31  80.6   80.14  76.66   80.27
heart h            86.78   79.41  78.56  78.56  76.67   78.54
heart stat         78.38   78.36  76.59  79.39  75.7    76.31
hepatitis          83.44   82.67  81.8   83.91  79.95   80.64
hypothyroid        93.06   93.25  92.62  93.13  91.2    92.5
ionosphere         90.45   85.61  87.46  86.33  86.89   87.62
iris               99.78   95.59  95.19  95.66  95.66   95.19
kr vs kp           98.5    96.2   95.95  95.12  93.7    95.95
labor              81.69   80.07  84.01  85.78  85.67   84.13
landsat            93.86   90.29  90.53  90.37  90.13   90.56
liver disorders    61.49   58.36  61.28  60.57  59.34   61.01
lung cancer        81.4    80.95  71.44  78.02  67.87   71.44
lymph              80.87   78.53  82.82  84.97  82.37   82.85
mfeat factors      97.1    96.56  96.69  96.51  96.08   96.68
mfeat fourier      86.19   81.56  81.02  81.26  80.07   80.94
mfeat karhunen     98.04   96.11  96.8   96.75  96.18   96.77
mfeat morph.       97.24   71.08  68.03  68.44  65.9    67.7
mfeat pixel        96.93   95.92  96.71  96.78  96.36   96.7
mfeat zernike      92.05   80.53  79.09  79.39  79.03   79.07
molecular bio.     38.34   32.57  33.56  34.76  35.24   33.56
monks 1            47.39   52.25  35.8   39.51  37.19   35.8
monks 2            47.12   54.13  38.27  40.78  38.03   38.27
monks 3            47.78   38.48  35.28  37.59  38.93   35.45
mushroom           100     100    100    100    100     100
nursery            95.91   98.36  93.63  96.1   93.23   93.63

Table 3. Mean ranking reached by every compared technique.

Technique  Ranking
EvoNN1     2.270
UWKNN      3.062
IBk        3.458
WKNN       3.594
DWKNN2     3.781
DWKNN1     4.833

Table 4. Results for the Holm post-hoc procedure.

Method    p         z      Holm's adjusted α
DWKNNv1   1.94E−11  6.710  0.010
DWKNNv2   7.65E−5   3.955  0.013
WKNN      5.32E−4   3.464  0.017
IBk       0.002     3.109  0.025
UWKNN     0.038     2.073  0.050

Table 3 shows the rankings obtained in our case. Then, we carried out the Friedman test. The statistic for Friedman was 48.95, distributed according to a chi-square with 5 degrees of freedom. The p-value for Friedman was around 2.302E−9, so the null hypothesis (no statistical difference among the compared techniques) could be rejected with α = 0.05.

Because the Friedman test should be accompanied by a post-hoc procedure for multiple pairwise comparisons, we used the Holm procedure. This procedure took our approach (EvoNN1) as the control method and compared it with each other technique by a pairwise Friedman test. The results of the procedure can be seen in Table 4 (p-value, Friedman statistic and adjusted α for the Holm procedure). In this case, every test rejected the hypothesis of no pairwise difference (p-value below the adjusted α), so we could state that our algorithm was significantly better than its competitors from a statistical point of view, taking into account that it obtained the best ranking and also behaved statistically differently from the rest.
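The statistical pipeline (Wilcoxon signed-rank test, Friedman test and a rank-based Holm post-hoc procedure against a control, in the spirit of Demšar (2006)) can be reproduced along the following lines. This is a hedged sketch: the accuracy matrix is a random placeholder for the values of Table 2, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.stats import wilcoxon, friedmanchisquare, norm, rankdata

methods = ["EvoNN1", "IBk", "WKNN", "UWKNN", "DWKNN1", "DWKNN2"]
acc = np.random.rand(48, len(methods))      # placeholder for the 48 x 6 accuracies of Table 2

# Wilcoxon signed-rank test between two methods (as used for EvoNN1 vs. EvoNN2 in Table 1)
w_stat, w_p = wilcoxon(acc[:, 0], acc[:, 1])

# Friedman test over all compared methods
f_stat, f_p = friedmanchisquare(*[acc[:, j] for j in range(len(methods))])

# Mean ranks per method (rank 1 = best accuracy on a dataset)
ranks = np.mean([rankdata(-row) for row in acc], axis=0)

# Holm post-hoc procedure with the best-ranked method as control
k, n = len(methods), acc.shape[0]
se = np.sqrt(k * (k + 1) / (6.0 * n))
control = int(np.argmin(ranks))
others = [j for j in range(k) if j != control]
pvals = [2 * norm.sf(abs(ranks[j] - ranks[control]) / se) for j in others]
order = np.argsort(pvals)
still_rejecting = True
for step, idx in enumerate(order, start=1):
    alpha_adj = 0.05 / (len(others) - step + 1)           # Holm's adjusted alpha
    still_rejecting = still_rejecting and pvals[idx] <= alpha_adj
    print(f"{methods[others[idx]]}: p={pvals[idx]:.3g}, alpha={alpha_adj:.3f}, "
          f"{'reject' if still_rejecting else 'do not reject'}")
```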
From our experimentation, some facts should be outlined. First, an evolutionary algorithm to adjust the contribution of the neighbourhood improved the behaviour of the classical kNN rule. This fact could not be completely assured for the rest of the weighting techniques, probably due to the better adaptability of metaheuristics compared with strict functions such as those suggested by DWKNN or UWKNN. Second, the correct working of the evolutionary approach was highly correlated with the distance of the weighted neighbours to the unlabelled instances in the training phase. In other words, although EvoNN2 had more degrees of freedom in the search space, EvoNN1 reached the best performance. Although this proved true, we think this fact can be related to problems in the evolutionary search in EvoNN2, so more work could be needed to make it reach its potentially better weights.

4. Conclusions

This work presented an evolutionary method to optimize the kNN algorithm. Unlike the classical approaches that use weights on the features and instances of the data, we focused on adjusting the contribution of each one of the k nearest neighbours. The results showed a satisfactory performance that was statistically supported.

On the other hand, despite the use of multitasking architectures, optimization techniques, and more specifically evolutionary heuristics, usually consume more computational resources than lazy classifiers. Therefore it will be necessary to establish flexible and scalable parallelization pathways that are in line with the growing amount of data.

In future work, we will focus on alternative metrics to accuracy for use in imbalanced data classification. Thus, precision, recall, false positive rate, false negative rate or F-measure could be considered as the objective function to optimize. In addition, Big Data is a clear research target. Although there are not yet many studies about using kNN in Big Data, we will try to extrapolate our method to apply it to massive datasets. Regression problems are approachable too. The kNN algorithm works effectively when the class to predict is numeric, so our method can be used in a natural way.

Finally, the strengths of our system were tested on 48 heterogeneous domains and the obtained results encourage us to implement future experiments on more specific problems. In this context, we have experience with remote sensing data (García-Gutierrez, Gonçalves-Seco, & Riquelme-Santos, 2011) and this approach could complement the cited study in a new work.

References

AlSukker, A., Khushaba, R., & Al-Ani, A. (2010). Optimizing the k-NN metric weights using differential evolution. In Proceedings of the international conference on multimedia computing and information technology (MCIT), 2010 (pp. 89–92).
Berikov, V. (2014). Weighted ensemble of algorithms for complex data clustering. Pattern Recognition Letters, 38, 99–106.
Blachnik, M., & Duch, W. (2011). LVQ algorithm with instance weighting for generation of prototype-based rules. Neural Networks, 24(8), 824–830. (Artificial Neural Networks: Selected Papers from ICANN 2010).
Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
Dudani, S. A. (1976). The distance-weighted k-nearest-neighbor rule. IEEE Transactions on Systems, Man and Cybernetics, 6(4), 325–327.
Eshelman, L. J., & Schaffer, J. D. (1993). Real-coded genetic algorithms and interval-schemata. In D. L. Whitley (Ed.), Foundations of genetic algorithms 2 (pp. 187–202). San Mateo, CA: Morgan Kaufmann.
Fernandez, F., & Isasi, P. (2008). Local feature weighting in nearest prototype classification. IEEE Transactions on Neural Networks, 19(1), 40–53.
García-Gutierrez, J., Gonçalves-Seco, L., & Riquelme-Santos, J. C. (2011). Automatic environmental quality assessment for mixed-land zones using lidar and intelligent techniques. Expert Systems with Applications, 38(6), 6805–6813. http://dx.doi.org/10.1016/j.eswa.2010.12.065.
García, S., & Herrera, F. (2008). An extension on "Statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677–2694.
Gou, J., Du, L., Zhang, Y., & Xiong, T. (2012). A new distance-weighted k-nearest neighbor classifier. Journal of Information & Computational Science, 9(6), 1429–1436.
Gou, J., Xiong, T., & Kuang, J. (2011). A novel weighted voting for k-nearest neighbor rule. Journal of Computers, 6(5), 833–840.
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: an update. SIGKDD Explorations, 11(1).
Lichman, M. (2013). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Mateos-García, D., García-Gutiérrez, J., & Riquelme-Santos, J. C. (2012). On the evolutionary optimization of k-NN by label-dependent feature weighting. Pattern Recognition Letters, 33(16), 2232–2238. doi:10.1016/j.patrec.2012.08.011.
Mohemmed, A. W., & Zhang, M. (2008). Evaluation of particle swarm optimization based centroid classifier with different distance metrics. In Proceedings of the IEEE congress on evolutionary computation '08 (pp. 2929–2932).
Paredes, R., & Vidal, E. (2006). Learning weighted metrics to minimize nearest-neighbor classification error. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(7), 1100–1110.
Park, C. H., & Kim, S. B. (2015). Sequential random k-nearest neighbor feature selection for high-dimensional data. Expert Systems with Applications, 42(5), 2336–2342. http://dx.doi.org/10.1016/j.eswa.2014.10.044.
Park, Y., Park, S., Jung, W., & Lee, S.-g. (2015). Reversed CF: a fast collaborative filtering algorithm using a k-nearest neighbor graph. Expert Systems with Applications, 42(8), 4022–4028. http://dx.doi.org/10.1016/j.eswa.2015.01.001.
Raymer, M., Doom, T., Kuhn, L., & Punch, W. (2003). Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 33(5), 802–813.
Raymer, M., Punch, W., Goodman, E., Kuhn, L., & Jain, A. (2000). Dimensionality reduction using genetic algorithms. IEEE Transactions on Evolutionary Computation, 4(2), 164–171.
Rodger, J. A. (2014). A fuzzy nearest neighbor neural network statistical model for predicting demand for natural gas and energy cost savings in public buildings. Expert Systems with Applications, 41(4, Part 2), 1813–1829. http://dx.doi.org/10.1016/j.eswa.2013.08.080.
Tahir, M. A., Bouridane, A., & Kurugollu, F. (2007). Simultaneous feature selection and feature weighting using hybrid tabu search/k-nearest neighbor classifier. Pattern Recognition Letters, 28(4), 438–446.
Zhi, X., Fan, J., & Zhao, F. (2014). Robust local feature weighting hard c-means clustering algorithm. Neurocomputing, 134, 20–29.