Error curves for evaluating the quality of feature rankings

Ivica Slavkov¹, Matej Petković¹,², Pierre Geurts³, Dragi Kocev¹,² and Sašo Džeroski¹,²
¹ Jožef Stefan Institute, Ljubljana, Slovenia
² Jožef Stefan International Postgraduate School, Ljubljana, Slovenia
³ Université de Liège, Liège, Belgium

ABSTRACT
In this article, we propose a method for evaluating feature ranking algorithms. A feature ranking algorithm estimates the importance of descriptive features when predicting the target variable, and the proposed method evaluates the correctness of these importance values by computing the error measures of two chains of predictive models. The models in the first chain are built on nested sets of top-ranked features, while the models in the other chain are built on nested sets of bottom-ranked features. We investigate which predictive models are appropriate for building these chains, showing empirically that the proposed method gives meaningful results and can detect differences in feature ranking quality. This is first demonstrated on synthetic data, and then on several real-world classification benchmark problems.

Subjects: Algorithms and Analysis of Algorithms, Artificial Intelligence, Data Mining and Machine Learning
Keywords: Feature ranking, Error curves, Evaluation

INTRODUCTION
In the era of data abundance, we face high-dimensional problems increasingly often. Sometimes, prior to applying predictive modeling (e.g., classification) algorithms to such problems, dimensionality reduction may be necessary for a number of reasons, including computational ones. By keeping only a limited number of descriptors (features), a classifier can also achieve better predictive performance, since typically only a portion of the features strongly influences the target variable, while the others can be regarded as (mostly) noise. This dimensionality reduction corresponds to the task of feature selection (Guyon et al., 2002). A related task is feature ranking. This is a generalization of feature selection where, in addition to simply telling apart relevant features from irrelevant ones (Nilsson et al., 2007), one also assesses how relevant they are for predicting the target variable.

In machine learning, feature ranking is typically seen either as a preprocessing or as a postprocessing step. In the former case, one actually tackles the feature selection problem by first computing the feature relevance values, and then keeping only the features whose relevance is above some user-defined threshold. In the latter case, a feature ranking is obtained after building a predictive model, in order to explain the model (e.g., Arceo-Vilas et al., 2020). For black-box models, such as neural networks, this may be the only way to understand their predictions.
In some application domains, such as biology or medicine, feature ranking may be the main point of interest. Given data about the expression of numerous genes for a group of patients, together with the patients' clinical state (diseased/healthy), one can identify good candidate genes that influence the health status of the patients, which gives us a deeper understanding of the disease.

Due to the prominence of the feature ranking task, many feature ranking methods exist. Simpler methods assess the relevance of each feature independently, ignoring the other features and their possible interactions (e.g., the χ² statistic, or the mutual information between the feature and the target variable). A prominent example that shows the myopic nature of such approaches is the case when the target variable y is defined as y = XOR(x1, x2), where x1 and x2 are two binary features. Ignoring x1 when computing the relevance of x2 (and vice versa) would result in assessing both features as completely irrelevant, that is, as random noise. More sophisticated methods assess the relevance of each feature in the context of the others. They are typically based on some predictive model, for example, Random Forest feature ranking (Breiman, 2001), or on an optimization problem (Nardone, Ciaramella & Staiano, 2019), but not necessarily; cf., for example, ReliefF (Robnik-Šikonja & Kononenko, 2003) and the work of Li & Gu (2015).

However, there is no unified definition of feature importance; in fact, every feature ranking algorithm comes with its own (implicit) definition. Different methods therefore typically introduce different feature importance scores: deciding which of them is the best is a very relevant, but also very challenging task, which we address in this article. More precisely, we continue and extend our previous work (Slavkov et al., 2018), where we proposed and evaluated a quantitative score for the assessment of feature rankings. Here, we propose a new feature ranking evaluation method that can evaluate feature rankings in a relative sense (deciding which of several feature rankings is better) or in an absolute sense (assessing how good a feature ranking is). The method is based on constructing two chains of predictive models that are built from the top-ranked and bottom-ranked features. The predictive performances of the models in the chains are then shown on graphs of so-called forward feature addition (FFA) and reverse feature addition (RFA) curves, which reveal how the relevant features are distributed in the ranking(s). An important property of the proposed method is that it does not need any prior ground-truth knowledge about the data.

We investigate the performance of the proposed evaluation approach under a range of scenarios. To begin with, we demonstrate the potential of the FFA and RFA curves in a setting that employs synthetic data. Next, we investigate the use of different types of predictive models for constructing the curves, thus considerably extending the preliminary experiments by Slavkov et al. (2018). Furthermore, we apply the proposed evaluation approach to a large collection of benchmark datasets. Compared to Slavkov et al.
(2018), we have included 11 new high-dimensional datasets. The results of the evaluation, in a nutshell, show that the FFA and RFA curves are able to discern the best ranking among multiple proposed feature rankings.

The remainder of this article is organized as follows. "Related Work" outlines related work, and "Method for Evaluating Feature Rankings" describes in detail the proposed method for constructing error curves. Next, "Empirical Evaluation of FFA/RFA Curve Properties" discusses the properties of the error curves when applied to synthetic data. We then give the results of the experimental evaluation on benchmark datasets in "Feature Ranking Comparison on Real-World Datasets". "Conclusions" concludes with a summary of our contributions and an outline of possible directions for further work. In the appendices, we give additional information about generating synthetic data (Appendix A1), measuring the distance between rankings (Appendix A2), and the comparative evaluation of feature ranking methods (Appendix A3). In Appendix A4, detailed experimental results are given.

RELATED WORK
The evaluation of feature rankings is a complex and unsolved problem. Typically, feature rankings are evaluated on artificially generated problems, while evaluation on real-world problems remains an issue that is approached only indirectly.

To begin with, when the ground truth ranking is known, one can transform the problem of feature ranking evaluation into the evaluation of a classification model (Jong et al., 2004) as follows. First, a ranking is computed. Then, for every threshold, the numbers of relevant features (true positives) and irrelevant features (false positives) whose feature relevance is above the threshold are computed. From these values, a ROC curve can be created and the area under it computed. Another possible approach is to compute separability (Robnik-Šikonja & Kononenko, 2003), that is, the minimal difference between the feature importance of a relevant feature and the feature importance of an irrelevant feature. If this difference is positive, the relevant features are separated from the irrelevant ones; otherwise, they are mixed. However, both approaches are more applicable to feature selection problems and are too coarse for the feature ranking problem, since they only differentiate between relevant and irrelevant features. Spearman's rank correlation coefficient between the computed and the ground truth ranking might be more appropriate.

The main shortcoming of the above approaches is that they require the ground truth ranking. In real-world scenarios, this is not known, which renders these approaches inapplicable. Nevertheless, using synthetic data and a controlled environment offers a good starting point for showing the usefulness of a feature ranking evaluation method, as we shall also see later.

An approach that overcomes the issue of the unknown ground truth ranking is based on selecting the k top-ranked features and building a predictive model that uses only these features to predict the target variable. The ranking whose top-ranked features result in the model with the highest predictive performance is proclaimed the best. Since it is not always clear which value of k should be chosen, this can be done for multiple values of k (Guyon et al., 2002; Furlanello et al., 2003; Paoli et al., 2005; Verikas, Gelzinis & Bacauskiene, 2011).
In addition to correctness, ranking stability is sometimes also part of the evaluation. The stability of a ranking algorithm can be measured by comparing the feature rankings obtained, for example, from different bootstrap replicates of a dataset or from the folds in cross-validation (Guzmán-Martnez & Alaiz-Rodrguez, 2011; Kalousis, Prados & Hilario, 2007; Jurman et al., 2008). In Saeys, Abeel & De Peer (2008), both stability and predictive performance are combined into a single feature ranking quality index.

Notions similar to FFA curves (though without any particular name) can also be found in the literature as a feature ranking evaluation method (Liu et al., 2003; Duch, Wieczorek & Biesiada, 2004; Biesiada et al., 2005; Liang, Yang & Winstanley, 2008). However, to the best of our knowledge, there is no discussion or detailed investigation of why FFA curves are an appropriate method for comparing feature rankings, nor of which learning methods should (or should not) be used for constructing them.

A METHOD FOR EVALUATING FEATURE RANKINGS
First of all, every feature ranking method should be able to tell apart relevant features from irrelevant ones (Nilsson et al., 2007). In addition to that, the method should order the features with respect to their relevance for the target variable, awarding the most relevant ones the top ranks. If a ground truth ranking exists, the method should return this ranking in the optimal case. The worst case is more complicated and has two possible answers. One is the inverse of the ground truth ranking. However, since the ground truth ranking is typically not known in real-world scenarios, a more useful definition of the worst ranking is the random ranking. This ranking contains as little information as possible about the distribution of the relevant features in the ranking. Moreover, this distribution can always be assessed, and it is the cornerstone of our ranking evaluation method.

The evaluation method
First, we define the notation used in the rest of the article: D denotes a dataset whose columns are input features Fi that form a set F, together with the target feature Ft. A feature ranking method takes the dataset as input and returns a list R = (F(1), …, F(n)) as the output, where F(i) is the feature with rank i.

We evaluate a ranking R by evaluating different subsets S of the features F. This is done by building a predictive model M(S, Ft) and assessing its predictive power. The evaluation of the predictive model provides a cumulative assessment of the information contained in the feature set S, and it can be quantified with an error measure err(M(S, Ft)). The question is how to generate the feature subsets from the feature ranking, so that the error estimates can provide insight into the correctness of the feature ranking and constitute an evaluation thereof.

The construction of the feature subsets should be guided by the feature ranking R: starting from the top-ranked feature F(1) and going towards the bottom-ranked feature F(n), the feature relevance should decrease. Following this basic intuition, we propose two methods for constructing feature subsets from the feature ranking: FFA and RFA.
Forward feature addition constructs the feature subsets Si by considering the i highest ranked features, starting with S1 = {F(1)}. The next set Si+1 is constructed by adding the next lower-ranked feature, namely Si+1 = Si ∪ {F(i+1)}. The process continues until i = n and Sn contains all of the features from R.

Reverse feature addition produces feature sets S̄i, constructed in the opposite manner to FFA. We start with S̄1 = {F(n)}, which contains only the lowest ranked feature. The next feature set S̄i+1 is constructed by adding the lowest-ranked feature not yet in S̄i, namely S̄i+1 = S̄i ∪ {F(n−i)}. In the same way as for FFA, the process of RFA continues until we include all of the features, that is, S̄n = F.

Note that FFA can be viewed as backward feature elimination: starting from Sn = F, at each step we remove the least relevant feature from Si to obtain Si−1. Similarly, RFA can be viewed as forward feature elimination. Finally, it holds that F = Sn−i ∪ S̄i for all i.

For each i and each constructed feature subset Si or S̄i, we build predictive models M(Si, Ft) and M(S̄i, Ft). We then estimate their respective prediction errors, erri and err̄i. This results in two error curves, named the FFA and RFA curves, each constructed by the corresponding feature subset construction method. The value of each point of the FFA curve is defined as FFA(i) = erri, and of the RFA curve as RFA(i) = err̄i. The process of FFA/RFA curve construction is summarized in Algorithm 1.

The computational complexity of the proposed algorithm for constructing a single (FFA or RFA) curve is O(n(M + T)), where n is the number of features, M = M(n) is the cost of constructing a predictive model, and T = T(n) is the cost of its evaluation. Note that M and T depend on the specific learning method used for inducing the model and on the procedure used for evaluating it. Typically, for large enough i, the points FFA(i), FFA(i + 1), … do not differ considerably, since only a small proportion of the features is expected to be relevant when the data is high-dimensional. This means that we can make the algorithm more efficient by constructing the set Si+δ(i) directly from the set Si, that is, by including δ(i) features at once. The construction of the RFA curves can be sped up analogously.

Algorithm 1: Generation of the FFA and RFA curves.
Input: feature ranking R = (F(1), …, F(n)), target feature Ft, type of curve (FFA or RFA)
  S ← Ø
  E ← list of length n
  for i = 1, 2, …, n do
    if curve type is FFA then
      S ← S ∪ {F(i)}
    else
      S ← S ∪ {F(n−i+1)}
    E[i] ← err(M(S, Ft))
  return E

Interpretation of error curves
The visualization and interpretation of the FFA and RFA curves can be explained by considering the example FFA and RFA curves given in Fig. 1. The y-axis is the same for both curves and depicts the error estimate of a feature subset. Point i on the x-axis corresponds to the moment when the feature F(i) is first included in a predictive model: Si for the FFA curve and S̄n−i+1 for the RFA curve. Thus, the FFA curve in Fig. 1 is constructed from left to right, as the top-ranked features are at the beginning of the ranking. In contrast, the RFA curve is constructed from right to left, starting at the end of the ranking and going towards its beginning.
Let us first focus on the FFA curve. We can observe that, as the number k of included features increases, the accuracy of the predictive models also increases. This can be interpreted as follows: by adding more and more of the top-k ranked features, the number of relevant features in the constructed feature subsets increases, which is reflected in the improvement of the accuracy (error) measure.

Next, if we inspect the RFA curve in Fig. 1 from right to left, we can notice that it is quite different from the FFA curve at the beginning. Namely, the accuracy of the models constructed with the bottom-ranked features is minimal, which means the ranking is correct in the sense that it puts only irrelevant features at the bottom of the ranking. As the number of bottom-k features increases, some relevant features are included and the accuracy of the models increases.

We now consider Fig. 1 as a whole. The FFA and RFA curves essentially provide an estimate of how the relevant features are spread throughout the feature ranking. Namely, the FFA curve provides an estimate of where the relevant features appear at the top of the ranking, while the RFA curve provides an estimate of where relevant features appear at the bottom of the ranking. In the specific case depicted in Fig. 1, the relevant features are located between the 1st and the 13th ranked feature.

Figure 1: Sample FFA and RFA curves.

Besides providing an estimate of the spread of the relevant features across the feature ranking, the real utility of the FFA/RFA curves becomes apparent if we consider them in a relative, or more precisely, a comparative context. Let us consider two arbitrary feature ranking methods rA and rB, which produce feature rankings RA and RB, respectively. For these two rankings, we present the corresponding FFA/RFA curves in Fig. 2.

We first inspect the FFA curves visually. We find that the values of the FFA curve of the ranking method rA are (most of the time) above the FFA curve of the other ranking method rB. This can be interpreted in the following way: for an arbitrary k, when considering the top-k features of the feature rankings RA and RB, more relevant features are included among the top-k features of ranking RA than among the top-k features of ranking RB. This implies that the ranking algorithm rA produces a better ranking than the ranking algorithm rB.

A similar discussion applies to the RFA curves. When one considers the bottom-k features of a given feature ranking, most of the time, feature ranking RA includes fewer relevant features than feature ranking RB, that is, the constructed predictive models are less accurate. Here, the logic is opposite to that of the FFA curve, so one can again conclude that the feature ranking algorithm rA produces a better feature ranking than the feature ranking algorithm rB.
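To make Algorithm 1 concrete, here is a minimal Python sketch, under the assumption of scikit-learn-style estimators; the function name error_curve and its signature are ours, not taken from the authors' implementation. The step argument mirrors the δ(i) speed-up discussed above, and accuracy is used as the (inverted) error measure, as in the figures.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import cross_val_score

def error_curve(estimator, X, y, ranking, curve="FFA", step=1, cv=10):
    """Algorithm 1: evaluate models built on nested feature subsets.

    ranking -- feature (column) indices, ordered from most to least relevant
    curve   -- "FFA" grows subsets from the top of the ranking,
               "RFA" grows them from the bottom
    step    -- how many features to add per point (the delta(i) speed-up)
    """
    order = list(ranking) if curve == "FFA" else list(ranking)[::-1]
    points = []
    for i in range(step, len(order) + 1, step):
        subset = order[:i]  # S_i for FFA, the bottom-i set for RFA
        acc = cross_val_score(clone(estimator), X[:, subset], y,
                              cv=cv, scoring="accuracy").mean()
        points.append((i, acc))  # one point of the curve
    return points

# Usage sketch: FFA curve of a mutual-information ranking, with an SVM
# (quadratic kernel) as the curve-building learner.
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 10)).astype(float)
y = X[:, 0].astype(int) ^ X[:, 1].astype(int)  # target = XOR(F1, F2)
ranking = np.argsort(mutual_info_classif(X, y))[::-1]
ffa = error_curve(SVC(kernel="poly", degree=2), X, y, ranking)
```

Repeating the loop with several cross-validation seeds and averaging the resulting curves point-wise would correspond to the 10-times 10-fold estimates used in the article.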
Expected FFA and RFA curves
When one wants to assess the quality of a single feature ranking in a real-world application, its forward (reverse) feature addition curves can only be compared to the curves that belong to a ranking generated uniformly at random, since the ground-truth ranking is not known. As discussed before, the random ranking RRND is the worst-case ranking, since it contains no information about the distribution of the relevant features. As such, it can also serve as a baseline.

Figure 2: Comparison of FFA curves (A) and RFA curves (B) of two ranking methods rA and rB.

The expected values of the points that define the FFA curve of the ranking RRND coincide with the expected values of the RFA curve of this ranking, since the corresponding values depend only on the data itself and on the number of features i at a given point of the curves. Thus, expected curves can serve as a common name for both types of curves that belong to RRND. Computing the exact average error estimates ES[erri] = ES[err̄i], where S ⊆ F and |S| = i, may be infeasible if the number of features n is large (e.g., for i = n/2, O((2n)!/(n!)²) models have to be evaluated), but one can overcome this by sampling the sets S.

Stability of feature ranking
An important aspect of feature ranking algorithms is their stability (Nogueira, Sechidis & Brown, 2017) or, more specifically, the stability of the ranked feature lists that they produce. Once we have the set $\mathcal{R}$ of m rankings Rt that were induced from different samples of the dataset D, the stability index $\mathrm{St}(\mathcal{R})$ can be calculated as

$$\mathrm{St}(\mathcal{R}) = \binom{m}{2}^{-1} \sum_{t=1}^{m-1} \sum_{s=t+1}^{m} \mathrm{SM}(R_t, R_s),$$

that is, the stability index is the average of the pairwise similarities SM over all pairs of rankings. In general, the function SM can be any (dis)similarity measure, for example, the Spearman rank correlation coefficient (Saeys, Abeel & De Peer, 2008; Khoshgoftaar et al., 2013), the Jaccard distance (Saeys, Abeel & De Peer, 2008; Kalousis, Prados & Hilario, 2007), an adaptation of the Tanimoto distance (Kalousis, Prados & Hilario, 2007), the fuzzy (Goodman and Kruskal's) gamma coefficient (Boucheham & Batouche, 2014; Henzgen & Hüllermeier, 2015), etc.

To assess the stability of feature rankings in our experimental work, we set SM = Ca, where Ca is the Canberra distance (Lance & Williams, 1966; Lance & Williams, 1967; Jurman et al., 2008). This is a weighted distance metric that puts greater emphasis on the stability of the top-ranked features. If we have two feature rankings RA and RB of n features, then the Canberra distance is defined as

$$\mathrm{Ca}(R_A, R_B) = \sum_{j=1}^{n} \frac{|\mathrm{rank}_A(F_j) - \mathrm{rank}_B(F_j)|}{\mathrm{rank}_A(F_j) + \mathrm{rank}_B(F_j)}. \qquad (1)$$

However, we do not only estimate the stability of the ranking as a whole. Rather, we also estimate the stability of the partial rankings based on the features from Si. In order for the distance to be applicable to such partial rankings with i < n features, the following adaptation is proposed: instead of using the ranks rankA,B(F), we use
rank^i_{A,B}(F) = min{rank_{A,B}(F), i + 1}, that is, all features with rank higher than i are treated as if they had rank i + 1:

$$\mathrm{Ca}(R_A, R_B) = \sum_{j=1}^{n} \frac{|\mathrm{rank}^i_A(F_j) - \mathrm{rank}^i_B(F_j)|}{\mathrm{rank}^i_A(F_j) + \mathrm{rank}^i_B(F_j)}. \qquad (2)$$

Additionally, we would like the stability indicator to be independent of the specific values of i and n. Hence, we normalize it by the expected Canberra distance between random rankings, denoted by Ca(n, i). It can be approximated (Jurman et al., 2008) as

$$\mathrm{Ca}(n, i) \approx \frac{(i+1)(2n-i)}{n}\,\log 4 + \frac{i(1+i)}{n} - 2i - 3, \qquad (3)$$

which we make use of when i ≥ 8 and the computation of the exact value becomes intractable. For i ≥ 8, the relative error of the approximation (3) is smaller than 1%. Our final stability indicator is thus the curve consisting of the points

$$\left(i,\ \frac{\mathrm{St}_i(\mathcal{R})}{\mathrm{Ca}(n, i)}\right), \qquad (4)$$

for 1 ≤ i ≤ n, where Sti denotes the stability index computed with the truncated ranks of Eq. (2). These points represent the relative change of the distance between top-i lists w.r.t. the expected top-i distance.

EMPIRICAL EVALUATION OF FFA/RFA CURVE PROPERTIES
We start with experiments on synthetic datasets. In such laboratory conditions, one has full control over the data, can establish the ground truth feature ranking, and can produce rankings of different quality. Such a setting facilitates a proper assessment of our proposed feature ranking evaluation method.

Before proceeding to the experiments, we briefly describe the constructed synthetic datasets; a detailed description is given in "Appendix A1". We construct three datasets, named single, pair and combined. Each of them consists of 1,000 instances and 100 features. The relevant features in the single dataset are individually correlated with the target, the relevant features in the pair dataset are related to the target via the XOR relation, and the relevant features in the combined dataset are the union of the relevant features in the single and pair datasets. The rest of the features in the datasets are random noise.

Evaluation by randomising the ground truth ranking
The appropriateness of the proposed method is first demonstrated on a family of feature rankings that contain more and more noise. In this way, we can show that the progressively lower quality of the feature rankings is reflected in the FFA and RFA curves, and is thus detected by the method. We start with the ground-truth ranking RGT and perturb it as follows: we randomly select a proportion θ of the features, which are then assigned random relevances, drawn uniformly from the unit interval. The other features preserve their ground truth relevance. This results in a ranking Rθ.

Experimental setup
We use the aforementioned single, pair and combined datasets. The following amounts θ of noise are introduced into the ground truth ranking: θ ∈ {0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 1}. The value θ = 1 corresponds to a completely random ranking. For every value of θ, we estimate the expected values of the FFA/RFA curves that belong to the ranking Rθ by first generating m = 100 realizations of the ranking, and then (point-wise) averaging the error estimates of the obtained predictive models. For constructing the FFA/RFA curves, SVMs were used, as noted and justified at the end of "Analysis of Different Learning Methods to Construct FFA and RFA Curves".
The curves were constructed via 10-times stratified 10-fold cross validation, using different random seeds.

Results
The obtained FFA and RFA curves are shown in Fig. 3, which gives the results for the combined dataset; the results for the single and pair datasets are similar. In addition to the curves that belong to the rankings Rθ with different amounts of noise, the curves of the ground truth ranking are also shown. Both the FFA curves (Fig. 3A) and the RFA curves (Fig. 3B) correctly detect the different amounts of noise θ: the higher the θ, the more distant the FFA and RFA curves of Rθ are from the curves of the ground truth ranking. An independent confirmation of these results is given in "Appendix A2". Additionally, note that the FFA curves alone cannot give all the information about a ranking: had we not plotted the RFA curves in Fig. 3B, we would have no proof that all of the rankings misplace some relevant features (note the considerable decrease in accuracy just before the 100th feature).

Figure 3: Dataset combined: forward feature addition curves (A) and reverse feature addition curves (B). The curves for the noisy rankings Rθ (0.05 ≤ θ ≤ 1) and the ground truth ranking are shown.

Analysis of different learning methods to construct FFA and RFA curves
According to Algorithm 1, the error curve estimates depend not only on the feature ranking method, but also on the learning method used to construct the predictive models. In this section, we investigate which learning methods (learners) are suitable for constructing the FFA and RFA curves. Note that we are not searching for a learner that would produce the most accurate predictive models. Rather, the requirement for a learner to be used in this context is that it should produce predictive models that exploit all the information that the features contain about the target concept, and can thus distinguish between feature rankings of different quality.

Experimental setup
When comparing the FFA and RFA curves of different ranking methods, constructed with different learners, we used the combined dataset described in detail in "Appendix A1". We consider the following four feature ranking methods:
• Information gain, where we calculate the information gain of each feature Fi as IG(Ft, Fi) = H(Ft) − H(Ft | Fi).
• SVM-RFE, which uses a series of support vector machines (SVMs) to recursively compute a feature ranking. A linear SVM was employed, as proposed by Guyon et al. (2002). Following Slavkov et al. (2018), we set ε = 10⁻¹² and C = 0.1.
• ReliefF, as proposed by Robnik-Šikonja & Kononenko (2003). The number of neighbors was set to 10, and the number of iterations was set to 100%.
• Random Forests, which can be used for estimating feature relevance as described by Breiman (2001). A forest of 100 trees was used, where each split considers a random subset of features of size equal to the rounded-up log2 of the number of features.

We compare the above ranking methods by using different learners to produce classifiers and generate error estimates for the FFA and RFA curves. More specifically, we consider:
• Naïve Bayes (John & Langley, 1995);
• Decision Trees (Quinlan, 1993);
• Random Forests (Breiman, 2001), where the number of trees was set to 100, and log2 of the number of features are considered in each split;
• SVMs (Cortes & Vapnik, 1995), where a polynomial (quadratic) kernel was employed, with ε = 10⁻¹² and C = 0.1;
• k-NN (Aha & Kibler, 1991), with k = 10.

The curves were constructed via 10-times stratified 10-fold cross validation, using different random seeds. The FFA and RFA curve comparisons of the four feature ranking methods, obtained with each of the five learning methods, are presented in the following section.

Results
The rankings are shown in Fig. 4, where each graph represents the distribution of the ground truth relevance values. The y-axis depicts the ground truth relevance value (Eq. (5)). Each point i represents the i-th ranked feature, F(i), as determined by the feature ranking method. We can see that the rankings fall into two groups: in Figs. 4A and 4B, the highly relevant features are concentrated on the left, while in Figs. 4C and 4D, they are evenly spread. ReliefF and Random Forests (Figs. 4A and 4B) are thus clearly better than Info Gain and SVM-RFE (Figs. 4C and 4D). Hence, the FFA and RFA curves should at least differentiate between the two groups of rankings. However, there should also be a visible difference between ReliefF and Random Forests at the beginning of the ranking. A detailed comparative evaluation of the obtained feature rankings is given in "Appendix A3".

Figure 4: Distribution of relevant features for each of the four ranking methods: ReliefF (A), Random Forests (B), Info Gain (C) and SVM-RFE (D).

In the case of the FFA curves, the learners can be divided into two groups: the FFA curves produced by Naïve Bayes, Decision Trees and Random Forests cannot capture any difference between the rankings, whereas those produced by SVMs and k-NN can.
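As a brief aside, rankings of the kinds listed above could be computed roughly as follows with scikit-learn. This is an illustrative sketch only, not the implementations or exact parameter settings used in the article; ReliefF is omitted, since scikit-learn does not include it (third-party packages such as skrebate provide implementations).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, mutual_info_classif
from sklearn.svm import LinearSVC

def example_rankings(X, y, seed=0):
    """Return rankings as arrays of column indices, most relevant first."""
    ranks = {}
    # Information-gain-style score: mutual information with the target.
    mi = mutual_info_classif(X, y, random_state=seed)
    ranks["infoGain"] = np.argsort(mi)[::-1]
    # Random Forest importances, log2 features considered per split.
    rf = RandomForestClassifier(n_estimators=100, max_features="log2",
                                random_state=seed).fit(X, y)
    ranks["rforest"] = np.argsort(rf.feature_importances_)[::-1]
    # SVM-RFE: recursively drop the feature with the smallest weight
    # of a linear SVM (Guyon et al., 2002).
    rfe = RFE(LinearSVC(C=0.1), n_features_to_select=1).fit(X, y)
    ranks["svmRfe"] = np.argsort(rfe.ranking_)  # RFE rank 1 = best
    return ranks
```

Any of these rankings can then be passed to the error_curve sketch shown earlier to obtain its FFA and RFA curves.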
It suffices to show one representative graph for each group (those for Naïve Bayes and k-NN are shown in Fig. 5), since there are no significant differences among the learning methods within the same group.¹ The FFA curves produced by SVMs and k-NN have all the desirable properties: the curves for ReliefF and Random Forests are better than those of Info Gain and SVM-RFE. Additionally, at the beginning, the Random Forest curve is below the ReliefF curve.

Figure 5: Comparison of FFA curves for the four different ranking methods for the combined dataset. The curves were obtained by using Naïve Bayes (A) and k-NN (B).

Figure 6: Comparison of RFA curves for the four different ranking methods for the combined dataset. The curves were obtained by using Decision Trees (A) and k-NN (B).

¹ Compare, for example, Fig. 5B (obtained by k-NN) with Fig. A1A (obtained by SVMs), which is given in Appendix A3 (note that Fig. A1A also contains the random ranking curve).

The reason why, for example, the Naïve Bayes classifier does not show any difference between the rankings is that it cannot use the information contained in higher-order feature interactions, since it assumes feature independence. Hence, it is not appropriate for use in the considered context.

Proceeding to the RFA curves, the Naïve Bayes classifier again does not show any difference between the rankings, whereas the other four methods do. In the case of RFA curves, we prefer Random Forests, SVMs and k-NN over Decision Trees, because Decision Trees generate quite unstable curves, as shown in Fig. 6A. Figure 6B shows the RFA curves obtained with k-NN; again, there is no quantitative difference between them and the RFA curves generated by SVMs.²

To sum up, one can use
• SVMs and k-NN models for constructing FFA curves, and
• SVMs, k-NN and Random Forests for constructing RFA curves.
Thus, only k-NN and SVMs are appropriate for constructing both FFA and RFA curves.
Since one should typically use approximate k-NN when the number of features is extremely high (Muja & Lowe, 2009), we use SVMs (with the settings described here) as the learner for constructing the FFA/RFA curves in all the remaining experiments in this work.

Discussion
A few additional notes are needed about choosing the best ranking when, for example, different learning methods prioritize different rankings. This is possible, since one learning method might make use of some features, whereas another learning method can make better use of some others. First, if we have computed feature rankings in order to learn a classifier that uses only a subset of (top-ranked) features, and we have already decided which classifier to use, we should use the same (type of) classifier to construct the curves, because we want to use the features that the chosen learning method prioritizes. Second, if our motivation for computing the feature rankings is to discover all relevant features for a given problem (e.g., the genes that influence the patients' clinical state), and learning method A prioritizes the ranking R = (x1, x2, …) over the ranking R′ = (x′1, x′2, …), whereas learning method B prioritizes R′ over R, this means that x1, x2, x′1 and x′2 are all important (provided that both learners achieve similar accuracy), so we can include them all in the subsequent experiments (and thus use both learning methods).

The decision between the two appropriate methods, k-NN and SVMs, might also depend on the properties of the dataset at hand. As mentioned before, k-NN could be too time-consuming when the number of features is extremely high. On the other hand, if the number of instances is high, SVMs could be too time-consuming, but speed-ups are possible (Tsang, Kwok & Cheung, 2005). As for noise, both methods are quite robust (Wang, Jha & Chaudhuri, 2018; Xu, Caramanis & Mannor, 2009), so this is not among the most influential factors.

² Compare Fig. 6B with Fig. A1B (given in "Appendix A3").

FEATURE RANKING COMPARISON ON REAL-WORLD DATASETS
In this section, we move from the synthetic data and show the appropriateness of the proposed feature ranking evaluation method on real-world data with unknown relevant and irrelevant features. To be consistent with the synthetic-data experiments, we evaluate the same four feature ranking methods as before, and compare them to random feature rankings, which now serve as the only baseline.

Datasets description
In this extensive study, 35 classification benchmark problems are used. They come in two groups. The first group has been part of the experiments in Slavkov et al. (2018) and, except for aapc (Džeroski et al., 1997) and water and diversity (Džeroski, Demšar & Grbović, 2000), mostly originates from the UCI data repository (Newman & Merz, 1998). These benchmark problems have a higher number of instances (up to 5,000) and a number of features that is not extremely high (up to 280), and they cover various domains: health, ecology, banking, etc. The second group is newly included and contains 11 high-dimensional micro-array benchmark problems (Mramor et al., 2007) with up to 12,625 features and a lower number of examples (up to 110). The main properties of the data are given in Table 1.
Experimental setup
We construct the curves based on the feature ranking methods described in the experimental setup of "Analysis of Different Learning Methods to Construct FFA and RFA Curves", as well as the curves that belong to the completely random ranking (i.e., the expected curves), which serve as a baseline. For the actual construction of the curves (once a ranking is obtained), support vector machines were used, as described and justified in the results of "Analysis of Different Learning Methods to Construct FFA and RFA Curves". The curves were constructed via 10-times stratified 10-fold cross validation, using different random seeds.

The expected error curves for random rankings were produced by generating 100 random rankings for each dataset under consideration. For each random ranking, error curves were produced, and the average of the error values was used as the expected error. This was done in a manner similar to the one described in "Evaluation by Randomising the Ground Truth Ranking".

As mentioned in "The Evaluation Method", building FFA/RFA curves by adding the features one by one to large feature subsets Si and S̄i might be too costly when n is large. In this set of experiments, we therefore add δ(i) features to the subset at a time, where δ(i) is defined as follows: δ(i) = 1 if 1 ≤ i ≤ 50; δ(i) = 5 if 50 < i ≤ 500; and δ(i) = n//20 otherwise, where // denotes integer division.

Results
In this section, we show representative examples of the three types of curves: FFA, RFA and stability curves. The curves are shown for two datasets with a lower and two with a higher number of features. The graphs for the other datasets can be found in "Appendix A4" (Figs. A2–A12).

Table 1: Properties of the benchmark datasets: number of instances, number of features, number of discrete/numeric attributes, and number of different class values.

Dataset | #Inst. | #Feat. (D/N) | #Cl.
aapc | 335 | 84 (83/1) | 3
arrhythmia | 452 | 280 (73/206) | 16
australian | 690 | 14 (8/6) | 2
balance | 625 | 4 (0/4) | 3
breast-cancer | 286 | 9 (9/0) | 2
breast-w | 699 | 9 (9/0) | 2
car | 1,728 | 6 (6/0) | 4
chess | 3,196 | 36 (36/0) | 2
diabetes | 768 | 8 (0/8) | 2
diversity | 292 | 86 (0/86) | 5
german | 1,000 | 20 (13/7) | 2
heart | 270 | 13 (6/7) | 2
heart-c | 303 | 13 (7/6) | 5
heart-h | 294 | 13 (7/6) | 5
hepatitis | 155 | 19 (13/6) | 2
image | 2,310 | 19 (0/19) | 7
ionosphere | 351 | 34 (0/34) | 2
iris | 150 | 4 (0/4) | 3
sonar | 208 | 60 (0/60) | 2
tic-tac-toe | 958 | 9 (9/0) | 2
vote | 435 | 16 (16/0) | 2
water | 292 | 80 (0/80) | 5
waveform | 5,000 | 21 (0/21) | 3
wine | 178 | 13 (0/13) | 3
- - - - - - - - - - - - - - - -
amlPrognosis | 54 | 12,625 (0/12,625) | 2
bladderCancer | 40 | 5,724 (0/5,724) | 3
breastCancer | 24 | 12,625 (0/12,625) | 2
childhoodAll | 110 | 8,280 (0/8,280) | 2
cmlTreatment | 28 | 12,625 (0/12,625) | 2
colon | 62 | 2,000 (0/2,000) | 2
dlbcl | 77 | 7,070 (0/7,070) | 2
leukemia | 72 | 5,147 (0/5,147) | 2
mll | 72 | 12,533 (0/12,533) | 3
prostate | 102 | 12,533 (0/12,533) | 2
srbct | 83 | 2,308 (0/2,308) | 4

Note: The datasets with a considerably high number of features are listed under the dashed line.

We start with the breast-w dataset. The FFA/RFA curves in Figs. 7A and 7B show that both types of curves are needed in order to evaluate the ranking completely. The FFA-
curves suggest that all feature ranking algorithms (except for the random ranking) place only relevant features at the beginning, since there is practically no difference if we compare the accuracy of the 89-feature (all) model and, for example, the 11-feature model. However, the RFA-curves show that all feature ranking algorithms, except for Info Gain, misplace some relevant features, since the Info-Gain-ranking-based models have by far the lowest accuracy in the RFA curves. Also, in the case of Info Gain, the accuracy ceases to decrease after only circa 40 top-ranked features have been removed.

Figure 7C shows that Info Gain also produces the most stable rankings. We can see that its top-ranked feature is always the same, since the stability index of Info Gain equals 0 at the point k = 1. The second most stable algorithm is ReliefF, the third is Random Forest, and the least stable is SVM-RFE, although the difference between Random Forest and SVM-RFE is not considerable.
Figure 7: Ranking quality assessment for datasets breast-w (A–C) and water (D–F) in terms of the FFA (A and D) and RFA curves (B and E), and rankings' stability estimates (C and F). The FFA/RFA curves are obtained by using SVMs.

Let us now take a look at the curves for the water dataset. From the FFA-curves in Fig. 7D, we see that ReliefF, Info Gain and Random Forest appear to have the same 21 top-ranked features and, as a consequence, the same last 59 = 80 − 21 features too. However, the first 21 features are ordered better by Info Gain and ReliefF, while the last 59 are ordered more properly by Random Forest. We can conclude that none of the rankings is ideal, but we can come close to the ideal one (in terms of FFA-curves) if we combine the first part of the ReliefF (or Info Gain) ranking and the second part of the Random Forest ranking. This claim is also confirmed by the RFA-curves of Info Gain and ReliefF (Fig. 7E): these two algorithms indeed misplace some relevant features, since the accuracy of the model abruptly decreases at the end of the ranking.

Figure 7F suggests that we should prefer Info Gain and ReliefF over Random Forest, since they are more stable. However, we can also notice that Random Forest is the least stable at the beginning of the ranking, but its stability increases as the number of features gets larger.
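As an illustration of how such stability curves can be computed, the following is a minimal sketch of the indicator from Eqs. (1)–(4): pairwise top-i Canberra distances between rankings, normalized by the approximation (3) of the expected distance between random rankings. Note that, for simplicity, the approximation is applied for all i here, whereas the article uses exact values for i < 8; the function names are ours.

```python
import numpy as np
from itertools import combinations

def canberra_top_i(rank_a, rank_b, i):
    """Eq. (2): Canberra distance on ranks truncated at i + 1 (1-based ranks)."""
    ra = np.minimum(rank_a, i + 1)
    rb = np.minimum(rank_b, i + 1)
    return float(np.sum(np.abs(ra - rb) / (ra + rb)))

def expected_canberra(n, i):
    """Eq. (3): approximate expected top-i distance between random rankings."""
    return (i + 1) * (2 * n - i) / n * np.log(4) + i * (1 + i) / n - 2 * i - 3

def stability_curve(rankings):
    """Eq. (4): the points St_i / Ca(n, i) for i = 1..n.

    rankings -- list of m integer arrays; rankings[t][j] is the (1-based)
    rank of feature j in the ranking induced from sample t of the data.
    """
    m, n = len(rankings), len(rankings[0])
    pairs = list(combinations(range(m), 2))
    return np.array([
        np.mean([canberra_top_i(rankings[a], rankings[b], i) for a, b in pairs])
        / expected_canberra(n, i)
        for i in range(1, n + 1)
    ])
```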
Figure 8: Ranking quality assessment for datasets mll (A–C) and amlPrognosis (D–F) in terms of the FFA (A and D) and RFA curves (B and E), and rankings' stability estimates (C and F). The FFA/RFA curves are obtained by using SVMs.

We begin the analysis of the high-dimensional datasets with mll. Figure 8B shows that Random Forest completely misplaces some relevant features, since its RFA-curve mostly goes above the random-ranking one. Even though it is evident from Random Forest's FFA-curves that some relevant features are also successfully captured, Random Forest produces the worst ranking.
Info Gain is slightly better, whereas ReliefF and SVM-RFE are again the best algorithms. From the FFA-curves, we can conclude that SVM-RFE places more features with higher relevance at the beginning of the ranking (its curve is higher than ReliefF's), while the RFA-curves reveal that SVM-RFE also misses some quite relevant features: ReliefF's curve is far below SVM-RFE's. Figure 8C shows that ReliefF is considerably more stable than SVM-RFE, hence we prefer the former over the latter on the mll dataset.

The last example shows that, sometimes, understanding the results is not that easy. In Figs. 8D and 8E, the FFA and RFA curves for the amlPrognosis dataset are presented. In this case, only Info Gain performs considerably better than the random ranking in terms of the FFA-curves. SVM-RFE is also able to find some relevant features at the beginning (the peak of its curve at 10 features), but beyond that point the models' accuracy decreases, hence mostly noisy features are positioned there. Some relevant features are also placed by ReliefF, Info Gain and Random Forest around the 2,000th position (the local peak of their curves in the right part of the FFA plot). The RFA-curves confirm that there is indeed much noise in these data, since removing features does not result in an (at least approximately) decreasing curve.

It may not come as a surprise that all ranking algorithms produce rankings that are very unstable at the beginning (Fig. 8F), but it is interesting that, after approximately 1,000 features, Info Gain and Random Forest produce quite stable rankings, even though these rankings have low quality. The reason for both the low quality of the rankings and their instability is probably the low number of instances combined with the high number of features (54 and 12,625, respectively).

CONCLUSIONS
We have proposed a method for evaluating and comparing feature rankings. The method is based on constructing two chains of predictive models that are built on two families of nested subsets of features. The first family consists of sets of top-ranked features, while the second family consists of sets of bottom-ranked features. The predictive performances of the models form an FFA curve in the former case and an RFA curve in the latter. We show in our experiments that both types of curves are necessary when comparing rankings: FFA curves detect whether important features are placed at the beginning of the ranking, whereas RFA curves detect whether important features can still be found at the end of the ranking.

In the first set of experiments, we show the usefulness of the proposed evaluation method and its sensitivity to rankings of different quality on synthetic data. The second set of experiments shows which of the learning methods are appropriate for building the FFA and RFA curves (SVMs, k-NN) and which are not (Naïve Bayes, Decision Trees, Random Forests). In the third set of experiments on synthetic data, we test several feature ranking algorithms and examine their properties. Considering data with different properties, we show that the ReliefF algorithm outperforms the other investigated approaches, both in terms of detecting relevant features and in terms of the stability of the feature rankings it produces. Moreover, we show the usefulness of the proposed approach in real-world scenarios.
We evaluate feature rankings computed by four feature ranking algorithms on 35 classification benchmark problems. The results reveal that no feature ranking algorithm dominates the others on every dataset.

A possible disadvantage of the proposed method is that it can be computationally quite intensive if we want to construct the curves in full resolution. Namely, every point of an FFA or RFA curve comes at the cost of building and evaluating a predictive model. However, as justified in the method description, full resolution is not necessary, especially when the number of features is very high; moreover, the construction of the curve can easily be parallelized.

The work presented in this article can continue in many directions. First of all, the proposed methodology could use other error measures, since accuracy is appropriate only for the task of classification when the distribution of the target variable is approximately uniform. The strong modularity of the FFA/RFA curves allows for their use in any other predictive modeling task; for example, for the task of regression, we could use root mean squared error instead of accuracy. However, even though regression versions exist for most of the learners considered for constructing the curves, the experiments should be repeated in that setting, since the conclusions (e.g., about the most appropriate learner for constructing the curves) may differ. Moreover, the method can be adapted not only to the regression setting, but also to different tasks of structured output prediction (Bakır et al., 2007) and time series prediction.

APPENDIX 1
In this section, we explain how we generate our synthetic datasets. For simplicity, we take both the features Fi and the target Ft to be binary, taking values from the set {0, 1}. We then partition the set of features F into feature interaction subsets Fint of cardinality one and two. The interaction sets with cardinality one are single features Fi that are in an individual interaction with the target Ft, while the features from the interaction sets with cardinality two determine the value of the target by the XOR relation.

The examples are generated as follows. For each example, we first randomly (from a uniform distribution) set the value of the target feature Ft. After that, if Fint = {Fi}, the value of the feature Fi is randomly chosen so that P(Fi = Ft) = p. Otherwise, we have Fint = {Fi, Fj}, and the values of the features Fi and Fj are randomly chosen so that P(XOR(Fi, Fj) = Ft) = p. We consider the probability values p ∈ {0.8, 0.7, 0.6, 0.5}. The feature sets with p = 0.5 are in fact independent of the target Ft, and can be considered irrelevant features.

With combinations of these feature interaction sets, three datasets were generated, each consisting of 1,000 instances and 100 features in total. The first dataset (named single) comprises only individually correlated features. The second dataset (named pair) contains relevant features related to the target via the XOR relation, as well as irrelevant features. The third (named combined) is a combination of the first two: it contains both XOR-related features and individually correlated features.
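To make this sampling scheme concrete, the following is a minimal Python sketch of the generation procedure just described, before the redundant copies discussed next are added. It assumes NumPy; the function name and signature are our own illustrative choices, not part of the original experimental code.

```python
import numpy as np

def generate_dataset(n_instances, single_ps, pair_ps, n_irrelevant, seed=0):
    """Sample a binary dataset as described above: the target F_t is uniform,
    each single feature agrees with F_t with probability p, and each XOR pair
    satisfies XOR(F_i, F_j) = F_t with probability p."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_instances)        # uniformly random target
    columns = []
    for p in single_ps:                             # individually relevant features
        agree = rng.random(n_instances) < p         # with probability p, F_i = F_t
        columns.append(np.where(agree, y, 1 - y))
    for p in pair_ps:                               # XOR-related feature pairs
        fi = rng.integers(0, 2, size=n_instances)   # the first feature is free
        agree = rng.random(n_instances) < p         # with probability p, XOR = F_t
        fj = np.where(agree, fi ^ y, 1 - (fi ^ y))  # fj solves XOR(fi, fj) = F_t
        columns.extend([fi, fj])
    for _ in range(n_irrelevant):                   # p = 0.5: independent noise
        columns.append(rng.integers(0, 2, size=n_instances))
    return np.column_stack(columns), y

# A combined-style dataset (the redundant copies would be obtained by
# duplicating the relevant columns, as explained below)
X, y = generate_dataset(1000, single_ps=[0.8, 0.7, 0.6],
                        pair_ps=[0.8, 0.7, 0.6], n_irrelevant=73)
```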
In order to simulate the redundancy of features, which occurs in real datasets, the three datasets are constructed in the following way: if the set Fint of relevant features is included in the dataset, we also include two redundant copies of Fint in the dataset. Irrelevant features are realized independently of each other. The properties of the generated datasets are summarized in Table 2, from which we can observe that there are 9 relevant features in the single dataset, 18 in the pair dataset, and 27 in the combined dataset.

Table 2. Properties of the synthetic datasets.

    Fint        p     #Copies in single   #Copies in pair   #Copies in combined
    {Fi}        0.8   3                   –                 3
    {Fi}        0.7   3                   –                 3
    {Fi}        0.6   3                   –                 3
    {Fi, Fj}    0.8   –                   3                 3
    {Fi, Fj}    0.7   –                   3                 3
    {Fi, Fj}    0.6   –                   3                 3
    –           0.5   91                  82                73

Note: If p > 0.5, #copies denotes the number of copies of the interaction set. In the last row, where p = 0.5, #copies corresponds to the number of independently realized irrelevant features.

The ground-truth feature relevances rel(Fi) of the features Fi are defined as follows. First, the relevance of each feature group Fint is defined as the mutual information between the group and Ft, namely rel(Fint) = I(Fint; Ft). Second, for Fi ∈ Fint, feature importances are obtained as

\mathrm{rel}(F_i) = \mathrm{rel}(F_{int}) / |F_{int}|. \quad (5)

For the particular three datasets, this ground-truth ranking RGT should also result in the optimal FFA and RFA curves, but this may not be the case in general. In the next section, we give the results of comparing it to the other feature rankings.

APPENDIX 2
When discussing the results in the “Evaluation by Randomising the Ground Truth Ranking”, we showed that, when the level of noise θ in the ranking increases, (i) the quality of the ranking Rθ decreases, and (ii) the rankings RGT and Rθ become more and more distant. However, for the second point, we need to define a distance dist(RGT, Rθ) between a noisy and the ground truth ranking. In the definition of dist(RGT, Rθ), we use the average Spearman rank correlation coefficient ρ(RA, RB), which is calculated as

\rho(R_A, R_B) = \frac{1}{n-1} \sum_{i=1}^{n} \frac{(\mathrm{rank}_A(F_i) - \overline{\mathrm{rank}}_A)\,(\mathrm{rank}_B(F_i) - \overline{\mathrm{rank}}_B)}{\sigma_A \sigma_B}

where n is the number of features; therefore, the average ranks \overline{\mathrm{rank}}_A and \overline{\mathrm{rank}}_B equal (n + 1)/2. The standard deviations σA and σB are computed as

\sigma_{A,B} = \sqrt{\sum_{i=1}^{n} \left(\mathrm{rank}_{A,B}(F_i) - \overline{\mathrm{rank}}_{A,B}\right)^2 / (n-1)}.

For a given θ, the distance between the rankings RGT and Rθ is then computed as

\mathrm{dist}(R_{GT}, R_\theta) = 1 - \frac{1}{m} \sum_{t=1}^{m} \rho(R_{GT}, R_{\theta,t}) \quad (6)

where m is the number of different realizations Rθ,t of the noisy ranking Rθ. Table 3 lists the values of the distances between the ground truth ranking RGT and its noisy versions Rθ. Note that, for all three synthetic datasets, the distances indeed increase when the amount of noise θ increases.

Table 3. The distances dist(RGT, Rθ) for different θ values, for each of the three synthetic datasets.

    θ          0.05    0.1     0.15    0.2     0.3     0.5     1
    Single     0.219   0.34    0.449   0.51    0.628   0.81    1.056
    Pair       0.126   0.239   0.327   0.397   0.519   0.726   1.091
    Combined   0.1     0.171   0.252   0.32    0.432   0.652   1.048
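As a small computational illustration of the distance (6), the average correlation over the noisy realizations can be computed directly from the rank vectors. The sketch below assumes NumPy; the function names are our own choices for illustration.

```python
import numpy as np

def spearman_rho(rank_a, rank_b):
    """Pearson correlation of two rank vectors (permutations of 1..n),
    i.e., Spearman's rho as defined in Appendix 2."""
    n = len(rank_a)
    da = rank_a - (n + 1) / 2          # deviations from the average rank
    db = rank_b - (n + 1) / 2
    sa = np.sqrt(np.sum(da ** 2) / (n - 1))
    sb = np.sqrt(np.sum(db ** 2) / (n - 1))
    return np.sum(da * db) / ((n - 1) * sa * sb)

def ranking_distance(rank_gt, noisy_ranks):
    """dist(R_GT, R_theta) = 1 - mean Spearman correlation over the
    m realizations of the noisy ranking."""
    rhos = [spearman_rho(rank_gt, r) for r in noisy_ranks]
    return 1.0 - float(np.mean(rhos))

# e.g., ground truth vs. two mildly perturbed rankings of 5 features
gt = np.array([1, 2, 3, 4, 5], dtype=float)
noisy = [np.array([2, 1, 3, 4, 5], float), np.array([1, 3, 2, 4, 5], float)]
print(ranking_distance(gt, noisy))     # approximately 0.1 for this toy example
```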
APPENDIX 3
In “Evaluation by Randomising the Ground Truth Ranking”, we analyzed rankings of different quality (with different amounts of noise) by comparing them to the ground truth ranking. In a real-world setting, the ground truth ranking is unknown and the feature rankings are induced directly from data. Therefore, in this section we analyze the feature rankings produced by the four feature ranking methods from “Analysis of Different Learning Methods to Construct FFA and RFA Curves” on the synthetic data described in “Generating Synthetic Data”. When comparing the rankings, stability should also be taken into account, as discussed earlier. Therefore, the stability indicator (4) is also included in the analysis.

Experimental setup
We have used the same parameter settings for the feature ranking algorithms as in “Analysis of Different Learning Methods to Construct FFA and RFA Curves”. As noted and justified in the corresponding “Results”, SVMs were used for constructing the FFA/RFA curves. The curves were constructed via 10-times stratified 10-fold cross-validation, using different random seeds. To complement them, we also estimate the stability of each feature ranking algorithm by using the stability indicator described in “Stability of Feature Ranking”. All feature ranking methods were tested only on the combined dataset.
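As a minimal sketch of how the curve points are obtained under this setup, the code below estimates the accuracy of an SVM trained on the top-k (FFA) or bottom-k (RFA) features with repeated stratified cross-validation. We use scikit-learn here purely for illustration, whereas the original experiments used Weka, so the function name, signature and SVM defaults are assumptions of ours rather than the paper's actual implementation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.svm import SVC

def error_curve(X, y, ranking, ks, reverse=False, n_repeats=10, seed=0):
    """Accuracy at each k: an SVM is built on the top-k features (FFA curve)
    or on the bottom-k features (RFA curve, reverse=True) and evaluated with
    repeated stratified 10-fold cross-validation."""
    order = list(ranking)[::-1] if reverse else list(ranking)
    curve = []
    for k in ks:
        feats = order[:k]
        scores = []
        for rep in range(n_repeats):                       # different random seeds
            cv = StratifiedKFold(n_splits=10, shuffle=True,
                                 random_state=seed + rep)
            scores.append(cross_val_score(SVC(), X[:, feats], y,
                                          cv=cv, scoring="accuracy").mean())
        curve.append(float(np.mean(scores)))               # one point of the curve
    return curve

# 'ranking' lists feature indices from most to least relevant, as produced
# by any feature ranking algorithm; a coarse resolution keeps the cost low:
# ffa = error_curve(X, y, ranking, ks=[1, 5, 10, 25, 50, 100])
# rfa = error_curve(X, y, ranking, ks=[1, 5, 10, 25, 50, 100], reverse=True)
```

Each point of the curve costs one round of model building and evaluation, which is why a coarse grid of k values (and parallelization over k) is attractive for high-dimensional data, as noted in the Conclusions.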
Results
For our analysis, we consider three types of graphs. The first two types are FFA curves (Fig. A1A) and RFA curves (Fig. A1B). The third is the stability estimate graph (Fig. A1C), where the y-axis refers to the value of the stability indicator (4): the higher the value, the less stable the ranking method. Each point k on the x-axis represents the size of the considered feature subsets, consisting of the top-ranked k features.

Figure A1: Ranking quality assessment for the dataset combined in terms of the FFA (A) and RFA curves (B), and rankings’ stability estimates (C). The FFA/RFA curves are obtained by using SVMs. (Full-size DOI: 10.7717/peerj-cs.310/fig-A1)

Upon a visual inspection of the overall results in Fig. 4, we can conclude that all of the feature ranking methods can correctly detect the features individually related to the target. However, Info Gain and SVM-RFE (Figs. 4C and 4D, respectively) exhibit random behavior for the XOR features, that is, they are unable to assign proper relevance values to them. Random Forests (Fig. 4B) separate the relevant from the irrelevant features, but the ordering of the relevant features is mixed. Finally, ReliefF (Fig. 4A) provides the ranking that is the closest to the ground truth.

These differences in behavior among the different ranking methods are also clearly reflected in the FFA and RFA curves in Figs. A1A and A1B. In Fig. A1B, the RFA curves for Info Gain and SVM-RFE behave similarly: a linearly increasing accuracy (from right to left) in the region where the relevant features are randomly distributed, and a sharp increase in accuracy in the region where the individually relevant features are included. On the other hand, the RFA curves of both Random Forests and ReliefF first remain constant and then increase abruptly when the top-ranked features are included. These two groups of methods can also be distinguished from the FFA curves. The FFA curves of all methods first increase abruptly and then slightly decrease, but the FFA curves of ReliefF and Random Forests increase over more steps and reach higher accuracy than those of Info Gain and SVM-RFE. This clearly indicates that while Info Gain and SVM-RFE manage to identify a proportion of the relevant features and put them at the top of the ranking, this proportion is nevertheless smaller than the one identified by Random Forests and ReliefF.

The FFA and RFA curves undoubtedly allow us to compare the quality of the different ranking algorithms. The FFA/RFA curves of all methods are clearly better than the curves of the random ranking. The ReliefF ranking algorithm, however, clearly outperforms the other methods: it has the best error curves, that is, the curves that are the closest to those of the ground truth ranking. The second-best method is Random Forests, which exhibits very similar performance but a slightly worse FFA curve. Both Info Gain and SVM-RFE are clearly inferior in terms of both FFA and RFA curves.

Stability-wise, as seen in Fig. A1C, all of the algorithms are stable in the region of the relevant features that they can detect, except for Random Forests, which have an instability peak exactly in this region. This means that Random Forests are in this case capable of detecting all the relevant features, but are highly unstable in the estimation of their ordering. Further inspection reveals that ReliefF generates not only the best rankings in terms of FFA/RFA curves, but also the most stable ones.
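The stability estimates in these graphs are based on a normalized Canberra distance between the rankings obtained across runs (indicator (4), defined earlier in the paper). As a rough illustration of the core quantity only — the exact truncation and normalization follow the paper's definition and are not reproduced here — one can average the pairwise Canberra distances between the rank vectors from repeated runs, restricting attention to the top-k part of each ranking:

```python
import numpy as np
from scipy.spatial.distance import canberra

def topk_instability(rank_vectors, k):
    """Average pairwise Canberra distance between rank vectors from repeated
    runs, after clipping all ranks beyond k to k + 1 (one common way of
    restricting attention to the top-k part of a ranking). Higher values
    indicate a less stable ranking method. Illustrative only: the paper's
    indicator (4) additionally normalizes this quantity."""
    clipped = [np.minimum(np.asarray(r, dtype=float), k + 1)
               for r in rank_vectors]
    pairs = [(i, j) for i in range(len(clipped))
             for j in range(i + 1, len(clipped))]
    return float(np.mean([canberra(clipped[i], clipped[j])
                          for i, j in pairs]))
```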
APPENDIX 4
In this section, we provide detailed per-dataset results from the experimental study. Each of the figures below shows a ranking quality assessment for two datasets: the FFA curves (panels A and B), the RFA curves (panels C and D), and the rankings’ stability estimates (panels E and F), where the first dataset occupies panels A, C and E and the second occupies panels B, D and F. All FFA/RFA curves are obtained by using SVMs.

Figure A2: datasets aapc and arrhythmia. (Full-size DOI: 10.7717/peerj-cs.310/fig-A2)
Figure A3: datasets australian and balance. (Full-size DOI: 10.7717/peerj-cs.310/fig-A3)
Figure A4: datasets breast-cancer and car. (Full-size DOI: 10.7717/peerj-cs.310/fig-A4)
Figure A5: datasets chess and diabetes. (Full-size DOI: 10.7717/peerj-cs.310/fig-A5)
Figure A6: datasets diversity and german. (Full-size DOI: 10.7717/peerj-cs.310/fig-A6)
Figure A7: datasets heart-c and heart-h. (Full-size DOI: 10.7717/peerj-cs.310/fig-A7)
Figure A8: datasets heart and hepatitis. (Full-size DOI: 10.7717/peerj-cs.310/fig-A8)
Figure A9: datasets image and ionosphere. (Full-size DOI: 10.7717/peerj-cs.310/fig-A9)
Figure A10: datasets iris and sonar. (Full-size DOI: 10.7717/peerj-cs.310/fig-A10)
Figure A11: datasets tic-tac-toe and vote. (Full-size DOI: 10.7717/peerj-cs.310/fig-A11)
Figure A12: datasets waveform and wine. (Full-size DOI: 10.7717/peerj-cs.310/fig-A12)

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
This work was supported by the Ad Futura Slovene Human Resources Development and Scholarship Fund, the Slovenian Research Agency (through the grants J2-9230 and N2-0128 and a young researcher grant), and the European Commission (through the grants TAILOR, H2020-ICT-952215, and AI4EU, H2020-ICT-825619). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Grant Disclosures
The following grant information was disclosed by the authors:
The Ad Futura Slovene Human Resources Development and Scholarship Fund, Slovenian Research Agency: J2-9230 and N2-0128.
TAILOR (H2020-ICT-952215) and AI4EU (H2020-ICT-825619).

Competing Interests
The authors declare that they have no competing interests.

Author Contributions
• Ivica Slavkov conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Matej Petković performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.
• Pierre Geurts conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Dragi Kocev conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.
• Sašo Džeroski conceived and designed the experiments, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability:
The code for constructing the curves is available at GitHub: https://github.com/Petkomat/fr-eval-curves.
The code for constructing the feature rankings and predictive models used the methods (feature ranking and predictive modeling) implemented in Weka 3.6: https://www.cs.waikato.ac.nz/~ml/weka/.
The following datasets are available from the UCI repository (https://archive.ics.uci.edu/ml/datasets.php): Credit Approval, Arrhythmia, Balance Scale, Breast Cancer Wisconsin (Original), Breast Cancer, Car Evaluation, Chess (King-Rook vs. King-Pawn), Statlog (German Credit Data), Statlog (Heart), Heart Disease (the Cleveland and Hungarian data), Hepatitis, Image Segmentation, Ionosphere, Iris, Connectionist Bench (Sonar, Mines vs. Rocks), Tic-Tac-Toe Endgame, Congressional Voting Records, Waveform Database Generator (Version 1), Wine.
The Pima Indians Diabetes data (previously available at the UCI repository) is now available at OpenML: https://www.openml.org/d/37.
The following datasets are available from the Bioinformatics Laboratory (https://file.biolab.si/biolab/supp/bi-cancer/projections/index.html): AML prognosis, bladder cancer, breast cancer, childhood ALL, CML treatment, breast & colon (the colon part of the dataset), DLBCL, leukemia, MLL, prostate, SRBCT.
Three additional datasets (aapc, diversity, water) are available at GitLab: http://source.ijs.si/data/classification-data.

REFERENCES
Aha D, Kibler D. 1991. Instance-based learning algorithms. Machine Learning 6:37–66.
Arceo-Vilas A, Fernandez-Lozano C, Pita S, Pértega-Díaz S, Pazos A. 2020. A redundancy-removing feature selection algorithm for nominal data. PeerJ Computer Science 1:e24 DOI 10.7717/peerj-cs.24.
Bakır GH, Hofmann T, Schölkopf B, Smola AJ, Taskar B, Vishwanathan SV. 2007. Predicting structured data. Cambridge: The MIT Press.
Biesiada J, Duch W, Kachel A, Paucha S. 2005. Feature ranking methods based on information entropy with Parzen windows. In: International Conference on Research in Electrotechnology and Applied Informatics, Katowice, Poland.
Boucheham A, Batouche M. 2014. Robust biomarker discovery for cancer diagnosis based on meta-ensemble feature selection. In: 2014 Science and Information Conference. 452–560.
Breiman L. 2001. Random forests. Machine Learning 45(1):5–32 DOI 10.1023/A:1010933404324.
Cortes C, Vapnik V. 1995. Support-vector networks. Machine Learning 20(3):273–297.
Duch W, Wieczorek T, Biesiada J. 2004. Comparison of feature ranking methods based on information entropy. In: IEEE International Conference on Neural Networks - Conference Proceedings. Vol. 2. Piscataway: IEEE, 1415–1419.
Džeroski S, Demšar D, Grbović J. 2000. Predicting chemical parameters of river water quality from bioindicator data. Applied Intelligence 13(1):7–17 DOI 10.1023/A:1008323212047.
Džeroski S, Potamias G, Moustakis V, Charissis G. 1997. Automated revision of expert rules for treating acute abdominal pain in children. In: Artificial Intelligence in Medicine - AIME, LNCS 1211, 98–109.
Furlanello C, Serafini M, Merler S, Jurman G. 2003. Entropy-based gene ranking without selection bias for the predictive classification of microarray data. BMC Bioinformatics 4(1):54 DOI 10.1186/1471-2105-4-54.
Guyon I, Weston J, Barnhill S, Vapnik V. 2002. Gene selection for cancer classification using support vector machines. Machine Learning 46(1/3):389–422 DOI 10.1023/A:1012487302797.
Guzmán-Martínez R, Alaiz-Rodríguez R. 2011. Feature selection stability assessment based on the Jensen-Shannon divergence. Lecture Notes in Computer Science 6911:597–612.
Henzgen S, Hüllermeier E. 2015. Weighted rank correlation: a flexible approach based on fuzzy order relations. In: Appice A, Rodrigues PP, Santos Costa V, Gama J, Jorge A, Soares C, eds. Machine Learning and Knowledge
Discovery in Databases. Berlin: Springer International Publishing, 422–437.
John GH, Langley P. 1995. Estimating continuous distributions in Bayesian classifiers. In: Proc. Eleventh Conference on Uncertainty in Artificial Intelligence, San Mateo, CA. Burlington: Morgan Kaufmann, 338–345.
Jong K, Mary J, Cornuéjols A, Marchiori E, Sebag M. 2004. Ensemble feature ranking. In: PKDD - LNCS 2302. 267–278.
Jurman G, Merler S, Barla A, Paoli S, Galea A, Furlanello C. 2008. Algebraic stability indicators for ranked lists in molecular profiling. Bioinformatics 24(2):258–264 DOI 10.1093/bioinformatics/btm550.
Kalousis A, Prados J, Hilario M. 2007. Stability of feature selection algorithms: a study on high-dimensional spaces. Knowledge and Information Systems 12(1):95–116 DOI 10.1007/s10115-006-0040-8.
Khoshgoftaar TM, Fazelpour A, Wang H, Wald R. 2013. A survey of stability analysis of feature subset selection techniques. In: IEEE 14th International Conference on Information Reuse Integration (IRI). Piscataway: IEEE, 424–431.
Lance GN, Williams WT. 1966. Computer programs for hierarchical polythetic classification (‘similarity analyses’). Computer Journal 9(1):60–64 DOI 10.1093/comjnl/9.1.60.
Lance GN, Williams WT. 1967. Mixed-data classificatory programs i-agglomerative systems. Australian Computer Journal 1:15–20.
Li Z, Gu W. 2015. A redundancy-removing feature selection algorithm for nominal data. PeerJ Computer Science 3:e1184 DOI 10.7287/peerj.preprints.1184v1.
Liang J, Yang S, Winstanley A. 2008. Invariant optimal feature selection: a distance discriminant and feature ranking based solution. Pattern Recognition 41(5):1429–1439 DOI 10.1016/j.patcog.2007.10.018.
Liu T, Liu S, Chen Z, Ma W-Y. 2003. An evaluation on feature selection for text clustering. In: Fawcett T, Mishra N, eds. ICML. Menlo Park: The AAAI Press, 488–495.
Mramor M, Leban G, Demšar J, Zupan B. 2007. Visualization-based cancer microarray data classification analysis. Bioinformatics 23(16):2147–2154 DOI 10.1093/bioinformatics/btm312.
Muja M, Lowe DG. 2009. Fast approximate nearest neighbors with automatic algorithm configuration. In: Ranchordas A, Araújo H, eds. VISAPP (1). Porto: INSTICC Press, 331–340.
Nardone D, Ciaramella A, Staiano A. 2019. A redundancy-removing feature selection algorithm for nominal data. PeerJ Computer Science 1:e24 DOI 10.7717/peerj-cs.24.
Newman CBD, Merz C. 1998. UCI repository of machine learning databases. Available at http://archive.ics.uci.edu/ml/index.php (accessed 13 December 2015).
Nilsson R, Peña JM, Björkegren J, Tegnér J. 2007. Consistent feature selection for pattern recognition in polynomial time. Journal of Machine Learning Research 8:589–612.
Nogueira S, Sechidis K, Brown G. 2017. On the stability of feature selection algorithms. Journal of Machine Learning Research 18(1):6345–6398.
Paoli S, Jurman G, Albanese D, Merler S, Furlanello C. 2005. Semisupervised profiling of gene expressions and clinical data. In: Proc. Sixth International Conference on Fuzzy Logic and Applications. 284–289.
Quinlan R. 1993. C4.5: programs for machine learning. San Mateo: Morgan Kaufmann Publishers.
Robnik-Šikonja M, Kononenko I. 2003. Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learning 53(1/2):23–69 DOI 10.1023/A:1025667309714.
Saeys Y, Abeel T, De Peer YV. 2008. Robust feature selection using ensemble feature selection techniques. In: Daelemans W, Goethals B, Morik K, eds. Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2008, Lecture Notes in Computer Science. Vol. 5212. Berlin: Springer, 313–325.
Slavkov I, Petković M, Kocev D, Džeroski S. 2018. Quantitative score for assessing the quality of feature rankings. Informatica 42(1):43–52.
Tsang IW, Kwok JT, Cheung P-M. 2005. Core vector machines: fast SVM training on very large data sets. Journal of Machine Learning Research 6:363–392.
Verikas A, Gelzinis A, Bacauskiene M. 2011. Mining data with random forests: a survey and results of new tests. Pattern Recognition 44(2):330–349 DOI 10.1016/j.patcog.2010.08.011.
Wang Y, Jha S, Chaudhuri K. 2018. Analyzing the robustness of nearest neighbors to adversarial examples. In: Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR 80. Stockholm, Sweden, 5120–5129.
Xu H, Caramanis C, Mannor S. 2009. Robustness and regularization of support vector machines. Journal of Machine Learning Research 10:1485–1510.