International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 420

Feature Selection for Intrusion Detection System
Using Ant Colony Optimization

Mehdi Hosseinzadeh Aghdam1, and Peyman Kabiri2

(Corresponding author:Mehdi Hosseinzadeh Aghdam)

Department of Computer Engineering and Information Technology, Payame Noor University (PNU)1

P.O. BOX 19395-3697, Tehran, Iran

School of Computer Engineering, Iran University of Science and Technology2

Tehran, 16846-13114, Iran

(Email: mhaghdam@iust.ac.ir)

(Received March 30, 2014; revised and accepted Jan. 16 & Mar. 4, 2015)

Abstract

Intrusion detection is a major research problem in net-
work security. Due to the nonlinear nature of the intru-
sion attempts, unpredictable behavior of the network traf-
fic and the large number of features in the problem space,
intrusion detection systems represent a complicated prob-
lem area. Choosing effective and key features for intrusion
detection is a very important topic in information secu-
rity. The purpose of this study is to identify important
features in building an intrusion detection system such
that they are computationally efficient and effective. To
improve the performance of intrusion detection system,
this paper proposes an intrusion detection system that
its features are optimally selected using ant colony op-
timization. The proposed method is easily implemented
and has a low computational complexity due to use of a
simplified feature set for the classification. The extensive
experimental results on the KDD Cup 99 and NSL-KDD
intrusion detection benchmark data sets demonstrate that
the proposed method outperforms previous approaches,
providing higher accuracy in detecting intrusion attempts
and lower false alarm with reduced number of features.

Keywords: Ant colony optimization, feature selection, in-
trusion detection system

1 Introduction

Intrusion Detection Systems (IDSs) have become impor-
tant and widely used tools for ensuring network security.
In recent years, intrusion detection based on statistical
pattern recognition methods has attracted a wide range
of interest in response to the growing demand of reliable
and intelligent IDSs, which are required to detect sophisti-
cated and polymorphous intrusion attacks [7]. In general,
IDSs are usually classified into two categories in the liter-
ature: signature-based intrusion detection and anomaly-

based intrusion detection [33]. Signature-based intrusion
detection, misuse detection, approach can reliably iden-
tify intrusion attacks in relation to the known signatures
of discovered vulnerabilities. Therefore, well-known in-
trusions can be detected efficiently with a very low false
positive rate. Intrusions are usually polymorphic, and
evolve continuously. Signature-based detection will fail
easily when facing unknown intrusions and emergent in-
tervention of security experts is required to define signa-
tures or accurate rules, which limits the application of
the signature-based detection approach to build intelli-
gent IDSs. Signature-based detection is commonly used
in the design of commercial IDSs [38].

Anomaly-based IDS usually deals with statistical anal-
ysis and pattern recognition problems. It can detect zero
day attacks if the classification model has the generaliza-
tion capability to extract intrusion pattern and knowledge
during the training period [33]. In the anomaly-based in-
trusion detection approach, the system builds models for
normal behavior in the network traffic and detects any de-
viation from this model as a potential intrusion attempt.
This approach commonly suffers from high false positive
rate on classifying normal network traffic. Due to the dy-
namic nature of the network traffic, discovering bound-
aries between normal and abnormal behavior is a major
difficulty in this approach [38]. To overcome the anomaly
intrusion detection problem, the data mining [14], ma-
chine learning [25] and immune system [37] approaches
have been proposed in recent years.

Considering the available computational power and
due to the large amount of audit data with a large number
of features that IDS needs to examine, even for a small
network, traffic analysis is a difficult task. Audit data
captures various features of the connections. For exam-
ple, the audit data would show the network service on the
destination (http, telnet, etc.), or the number of wrong
fragments or length of the connection (second). Selected
features in the problem space might be correlated, which


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 421

is difficult for humans to discover. An IDS operating in
real-time needs a reduced amount of data for the process-
ing. Some of these features are irrelevant and redundant
and thus can be eliminated before the processing [7]. Most
of these features are not applicable to intrusion detection;
even some noise data may badly affect the performance of
the process of detecting intrusions. Extra and noisy fea-
tures can increase the computation time, and can have a
negative impact on the accuracy of the IDS [28]. Hence,
we need to select some representative features from the
original feature space to reduce the dimensionality of fea-
ture space and improve the efficiency and performance of
IDS.

Feature Selection (FS) is a commonly used step in ma-
chine learning, especially when dealing with a high di-
mensional space of features. The objective of FS is to
simplify a data set by reducing its dimensionality and
identifying relevant underlying features without sacrific-
ing predictive accuracy. By doing that, it also reduces
redundancy in the information provided by the selected
features. In real world problems FS is a must due to the
abundance of noisy, irrelevant or misleading features. FS
is extensive and it spreads throughout many fields, includ-
ing text categorization, data mining, pattern recognition,
signal processing and intrusion detection [1, 2].

Recently, Ant Colony Optimization (ACO) is gaining
popularity as a new approach to the FS [1, 13, 36]. Some
literatures provided experimental results that illustrated
ACOs superiority over conventional approaches, such as
sequential search and the genetic algorithm [13]. Dorigo
and colleagues introduced meta-heuristic optimization al-
gorithm based on behavior of ants in the early 1990s.
ACO is a new solution finding method in artificial intel-
ligence called Swarm Intelligence (SI). In SI interaction
of cooperative individuals is such that a problem-solving
behavior emerges. SI is the property of a system whereby
the collective behaviors of unsophisticated individuals in-
teracting locally with their environment cause coherent
functional global patterns to emerge. Insects such as ants
and bees live in colonies. An individual can only do sim-
ple behavior on its own, while their colonial cooperative
work represents a complex behavior [10]. ACO algorithm
is inspired by social behavior of ant colonies. Although
they have no sight, ants are capable of finding the shortest
route between a food source and their nest by chemical
materials called pheromone that they leave when mov-
ing [6].

ACO algorithm was originally applied to the travelling
salesman problem and later on, it was successfully applied
to other optimization problems such as the quadratic as-
signment problem [18], routing in telecommunication net-
works, graph coloring problems, scheduling, etc. This
method is particularly attractive for FS as there seems
to be no heuristic that can lead the search to the optimal
minimal subset every time [1]. In this paper a novel in-
trusion detection system using ACO has been introduced.
The classifier performance and the length of selected fea-
ture subset are adopted as heuristic information for ACO.

Thus, the proposed method needs no prior knowledge of
features. Also the classifier performance and the length
of selected feature vector are considered for performance
evaluation. Finally, the proposed method is applied to
KDD Cup 99 and NSL-KDD, the new version of KDD
Cup 99, data sets.

The rest of this paper is organized as follows. Section 2
outlines the related work about feature selection methods
in intrusion detection. The proposed ACO-based method
for IDS is described in section 3. Section 4 reports compu-
tational experiments. It also includes a brief discussion of
the results obtained and finally the conclusion and future
works are offered in the last section.

2 Related Work

Feature selection is a process that selects a subset of orig-
inal features. The significance of FS can be viewed in two
facets. The frontier facet is to filter out noise and elimi-
nate irrelevant and redundant features. FS is compulsory
due to the abundance of noisy, irrelevant or misleading
features in a data set. Second, FS can be considered as
an optimization problem for an optimal subset of features
that better satisfy a desired measure [12]. Quality of the
FS optimization can be measured using certain evaluation
criteria. Since FS is a NP-hard problem, there is no prac-
tical solution to find its optimal feature subset [17]. A typ-
ical FS process includes subset generation, subset evalua-
tion, termination criteria and result validation. Subset se-
lection procedure implements a search method that selects
feature subsets for evaluation based on a certain search
strategy. It may start with empty subset, full subset, a
selected feature subset or some random feature subset.
Those methods that start with an initial subset usually
select this feature subset using heuristic methods. These
search methods include forward selection, backward elimi-
nation and forward/backward combination methods. The
process of subset selection and evaluation is repeated un-
til a given termination condition is satisfied. The selected
best feature subset usually needs to be validated using a
different test data set [17].

Volume of the network traffic data and the high di-
mensionality of the feature space are main causes for pro-
hibitively high processing overhead. This is why they are
major problems in IDS design. Therefore, FS is an im-
portant phase in IDS design. Variation of the normal
traffic often hinders anomaly-based IDS ability to accu-
rately model normal state of the operation of the network.
In any intrusion attempt there are some behavioral pat-
terns and interrelations that are unique and recognizable.
Since these patterns are hidden within the irrelevant and
redundant features it is often difficult to discover them.
Eliminating the less relevant features can improve both
the speed and the accuracy of the classification [29].

In accordance to the literature, approaches to FS
can be divided into filters, wrappers and embedded ap-
proaches [12]. The filter model separates FS from learning


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 422

algorithm and selects feature subsets that are independent
of any learning algorithm. It relies on various measures
of the general characteristics of the training data such as
distance, information, dependency, and consistency. In
the wrapper approach feature subset is selected using the
evaluation function based on the same learning algorithm
that will be used later for learning [17]. In this approach
the evaluation function finds the optimal feature subset
using the subset generation procedure. Finally, compar-
ing the result with best subset resulted from the previous
iterations subset evaluation procedure only keeps the best
subset. Subset evaluation procedure tests the best subset
against the termination criteria to determine if the selec-
tion process should end. Although, wrappers may gener-
ate a better result, their execution cost is high and may
encounter problem in dealing with feature spaces with the
very high dimensions. This is due to the use of learning
algorithms in the evaluation of subsets, some of which
can encounter problems while dealing with a high dimen-
sional space of features [4, 12]. If the FS and learning
algorithm are interleaved then the FS approach is a kind
of embedded approach [20].

Feature selection is an important issue in intrusion de-
tection. Elimination of the insignificant features may en-
hance accuracy of the detection. By concentrating on the
most important features execution speed of the process
can be increased without significant effect on the accu-
racy of detection. Chebrolu et al. employed the effective-
ness of IDS in terms of real-time performance and detec-
tion accuracy from the feature selection perspective [7].
In the reported work, features were selected using two
methods, Bayesian network and classification and regres-
sion trees. Four different sets of features were derived
and used in their ensemble method for IDS. In their ex-
periments, KDD Cup 99 data set is used. A very high
detection rate is reported in their experiments. Their ap-
proach uses only 5092 records for the training and 6890
records for the test. They have proposed no generic sub-
set. Their reported results show that different feature
subsets with different length are more effective in detect-
ing various types of attacks [7].

Sung and Mukkamala proposed a well-known closed-
loop FS method for SVM-based IDS, called SVM-RFE,
which recursively eliminated one feature at a time
and compared the resulting performance in each SVM
test [28]. They also ranked six significant features [29].
They used three methods and compared the performance
of these methods in terms of classification accuracy on the
test data set. In the reported work, they used support vec-
tor decision function ranking, linear genetic programming
and multivariate adaptive regression splines [29]. Sung et
al. reported a new feature subset that provides accurate
results for the detection of different types of attacks [30].
They have used genetic algorithm to maximize inter-class
difference and minimize size of the subset. Mukkamala
and Sung applied SVM-RFE method to the KDD Cup 99
data and performed the feature ranking for FS [22]. They
ranked the features into three categories: important, sec-

ondary, and insignificant according to three main perfor-
mance criteria: overall accuracy of classification, training
time, and testing time. 19 important features were iden-
tified and used in the experiments. This heuristic-based
technique is time consuming. Additionally, unknown at-
tack types are not considered in the reported work since
the same data set (kddcup.data-10-percent.gz) was ap-
plied to the experiment.

Zhang et al. considered the capability of rough set the-
ory in coming up with classification rules in detecting the
attacks [39]. They showed that rough set classification
using GA can produce a high detection rate. Speed of
the feature ranking in the reported work is fast. They did
not report the selected features used for classification pro-
cess. Ohn et al. adopted genetic algorithm to find optimal
feature subset for SVM [23]. 31 features were used with
radial kernel function in their experiment and a very high
detection rate was obtained for the original KDD Cup
99 test data set (corrected.gz). Since their training data
set was sampled from the full data set (kddcup.data.gz),
the challenge of the problem was reduced. The reason is
that the number of attack types in the original training
data set (kddcup.data-10-percent.gz) is intentionally em-
ployed more than the test data set (corrected.gz). These
two data sets can challenge the methods of detecting the
unknown type attack.

From these reported works, we can conclude that some
features are really significant in intrusion detection. Also,
it has been proven that there is no single generic classifier
that can best classify all the attack types. Instead, in
some cases, specific classifier performs better than others.
Thus, most of these works lead to an ensemble or fusion
of multiple classifier IDS.

3 Proposed ACO-based Method
for Intrusion Detection System

In the early nineties an algorithm called Ant System (AS)
was proposed by Dorigo and colleagues as a novel nature-
inspired meta-heuristic approach for the solution of com-
binatorial optimization problems. First, algorithm was
applied to the traveling salesman problem. Recently, it
was extended and/or modified both to improve its per-
formance and to apply it to other optimization prob-
lems. Improved versions of AS include, among others,
Ant Colony System (ACS), MAX-MIN, AS and AS-rank.
An ant colony optimization algorithm is essentially a sys-
tem based on individuals which simulate the natural be-
havior of ants, including mechanisms of cooperation and
adaptation. The inspiring source of ACO is the foraging
behavior of real ants [9]. The ACO algorithm is based on
a computational paradigm inspired by real ant colonies
and the way they function. The idea is to use several
constructive computational ants. Based on the results of
previous experiments stored in the ant dynamic memory
structure, each ant is guided to the constructed solution.
The paradigm is based on the observation made by ethol-


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 423

ogists about the medium used by ants to communicate
information regarding shortest paths to food by means of
pheromone trails. While an isolated ant moves practically
at random, exploration, an ant encountering a previously
laid trail can detect it and decide with high probability
to follow it, exploitation, and consequently reinforces the
trail with its own pheromone. What emerges is a form
of autocatalytic process through which the more the ants
follow a trail, the more attractive that trail becomes to be
followed. The process is thus characterized by a positive
feedback loop, during which the probability of choosing
a path increases with the number of ants that previously
chose the same path. The mechanism above is the inspi-
ration for the algorithms of the ACO family [21].

Before the training phase, a FS phase may also be con-
sidered. The FS process identifies which features are more
discriminative than the others. This has the benefit of
generally improving system performance by eliminating
irrelevant and redundant features. In general, FS is not
very popular procedure in IDS. However, a few studies
use different FS methods for their experiments. This im-
plies that FS could improve some certain level of classifi-
cation accuracy in IDS. Given a feature set of size N, the
FS problem is to find a minimal feature subset of size S
(S < N) while retaining the previous accuracy. There-
fore, there is no concept of solution path in FS problem.
A partial solution, i.e. subset, does not define any or-
der among the components, i.e. features of the solution,
and the next component to be selected is not necessarily
influenced by the last component added to the partial so-
lution [5, 15]. FS problem solutions are not necessarily
of the same size. The first step in FS using ACO is to
address the problem of redefining the way that the ACO
representation graph is used.

3.1 Graph Representation

The main goal of ACO algorithm is to model a prob-
lem as the search for a minimum cost path in a graph.
Here nodes can be considered as features, with the edges
between them denoting the choice of the next feature.
The search for the optimal feature subset is then an ant
traversal through the graph where a minimum number
of nodes, features, are visited that satisfies the traversal
stopping criterion. Nodes are fully connected to allow any
feature to be selected next. On the basis of this reformula-
tion of the graph representation, the transition rules and
pheromone update rules of standard ACO algorithms can
be applied. In this case, pheromone and heuristic value
are not associated with links. Instead, each feature has
its own pheromone value and heuristic value [1].

3.2 Heuristic Information

Generally, the representation of heuristic value is the at-
tractiveness of features and the basic ingredient of any
ACO algorithm is a constructive heuristic for probabilis-
tically constructing solutions. A constructive heuristic

assembles solutions as sequences of features from the fi-
nite set of features. A subset construction starts with an
empty subset. Then, at each construction step the cur-
rent subset is extended by adding a feature from the set
of features. A suitable heuristic desirability of travers-
ing between features could be any subset evaluation. In
proposed method classifier performance is mentioned as
heuristic information for FS. In other words, the classi-
fier accuracy of each feature on training set is considered
as heuristic information for each feature. The heuristic
information of traversal and node pheromone levels are
combined to form the so called probabilistic transition
rule, denoting the probability that ant k will include fea-
ture i in its subset at time step t:

Pki (t) =

{
[τi(t)]

α.[ηi]
β∑

u∈Jk[τu(t)]
α.[ηu]β

ifi ∈ Jk

0 otherwise
(1)

where Jk is the set of feasible features that ant k
can be added to its subset; τi and ηi are respectively
the pheromone value and heuristic information associated
with feature i. α and β are two parameters that determine
the relative weight of the pheromone value and heuristic
information. The transition probability used by ACO is
a balance between pheromone intensity (i.e. history of
previous successful moves), τi, and heuristic information
(expressing desirability of the move), ηi. This effectively
balances the exploitation-exploration trade-off. The best
balance between exploitation and exploration is achieved
through proper selection of the parameters α and β. If
α=0, no pheromone information is used, i.e. previous
search experience is neglected. The search then degrades
to a stochastic greedy search. If β=0, the attractiveness
(or potential benefit) of moves is neglected.

3.3 Pheromone Update

Pheromone updating is an important part for working the
ACO algorithm suitably. After all ants have completed
their solutions, pheromone evaporation on all nodes is
triggered using Equation (2) and then according to Equa-
tion (3) all ants deposit a quantity of pheromone, ∆τi(t),
on each node that they have used.

τi(t) = (1 −ρ)τi(t) (2)
τi(t + 1) = τi(t) + ∆τi(t) (3)

with

∆τi(t) =

m∑
k=1

∆τki (t) (4)

where m is the number of ants at each iteration and
ρ ∈ (0, 1) is the pheromone trail decay coefficient. The
main role of pheromone evaporation is to avoid stagna-
tion, that is, the situation in which all ants constructing
the same solution. All ants can update the pheromone ac-
cording to Equations (3,4). Where ∆τki (t) is the amount


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 424

of pheromone deposited by ant k on node i at time step
t:

∆τki (t) =

{
ω.γ(Sk(t)) + φ.(n/|Sk(t)|) ifi ∈ Sk(t)
0 otherwise

(5)

where Sk(t) is the feature subset found by ant k at iter-
ation t, and |Sk(t)| is its length. The pheromone is up-
dated according to both the measure of the classifier per-
formance, γ(Sk(t)), and feature subset length. ω and φ
are two parameters that control the relative importance of
classifier performance and feature subset length, ω ∈ [0, 1]
and φ = 1 − ω. This formula means that the classifier
performance and feature subset length have different sig-
nificance for FS process. In our experiment we assume
that classifier performance is more important than subset
length, so they were set as ω = 0.7, φ = 0.3.

Algorithm 1 ACO-based Feature Selection Algorithm
for IDS
1: Begin
2: Initialize all parameters, i.e. α,β,ρ,m,τ0,φ,ω,T.
3: Let t = 1.
4: for Each node i do
5: τi(t) = τ0.
6: end for
7: Place m ants, k = 1, ,m. // Initialize a population of

ants with random positions
8: while t ≤ T do
9: for Each ant k = 1, ...,m do

10: Sk(t) = {}
11: while Ant is able to increase the detection rate

do
12: From current node, select next node i using

Equation (1).
13: Add node i to subset Sk(t).
14: end while
15: Calculate the subset length |Sk(t)|.
16: Calculate the classifier performance γ(Sk(t)).
17: end for
18: for Each node i do
19: Apply pheromone evaporation using Equa-

tion (2).
20: Calculate ∆τi(t) using Equations (4,5).
21: Update pheromone using Equation (3).
22: end for
23: t = t + 1
24: end while
25: Return the subset Sk(t) with highest γ(Sk(t)) as the

solution.
26: End

3.4 Solution Construction

The overall process of ACO feature selection can be seen
in Figure 1. The process starts by generating a number
of ants which are then placed randomly on the graph i.e.

Figure 1: Overall process of proposed ACO-based IDS

each ant begins with one random feature. Alternatively,
the number of ants to place on the graph may be set equal
to the number of features within the data set; each ant
starts path construction at a different feature. From these
initial positions, they traverse nodes, features, probabilis-
tically until a traversal stopping criterion is satisfied. The
resulting subsets are collected and then evaluated. If an
optimal subset is found or the algorithm is executed a cer-
tain number of times, then the process halts and outputs
the best feature subset encountered. If none of these con-
ditions occurred, then the pheromone is updated, a new
set of ants are created and the process iterates once more.

In the proposed approach an IDS consists of several
essential parts including feature extraction and FS. Af-
ter preprocessing of compressed raw (binary) TCP dump
data of 7 weeks of network traffic (DARPA 98), feature
extraction is used to transform the input TCP dump data
into a feature set, feature vector. FS is applied to the fea-
ture set to select more informative features and to reduce
the dimensionality of the problem space. ACO algorithm
is used to explore the space of all subsets of given fea-
ture set. The performance of selected feature subsets is
measured by invoking an evaluation function with the cor-
responding reduced feature space and measuring the spec-
ified classification result. The best feature subset found


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 425

is then output as the recommended set of features to be
used in the actual design of the IDS.

4 Experiments and Results

The following sections describe the data sets and imple-
mentation results.

4.1 Data sets

Since 1999, KDD Cup 99 data set from UCI repository
is widely used as the benchmark data set for IDS eval-
uation [34]. The KDD Cup 99 contained 4,898,431 and
311,029 records in the training set and test set, respec-
tively. Each of record contains 41 features and is labeled
as either normal or an attack, with exactly one specific
attack type. In our experiments, we apply its 10% train-
ing data set consisting of 494021 connection records for
training. This data set is prepared by Stolfo et al. and
is built based on the data captured in DARPA 98 IDS
evaluation program [27]. The DARPA 98 intrusion detec-
tion evaluation program was prepared and managed by
MIT Lincoln Labs. Lincoln Labs set up an environment
to acquire nine weeks of raw TCP dump data for a LAN
simulating a typical U.S. Air Force LAN. DARPA 98 is
about 4 gigabytes of compressed raw (binary) TCP dump
data of 7 weeks of network traffic, which can be processed
into about 5 million connection records, each with about
100 bytes. The two weeks of test data have around 2
million connection records. All attacks fall in one of the
following four categories:

• Denial of Service (DoS): Is an attempt to con-
sume network resources in such a way that their ser-
vices become limited or unavailable for the legitimate
users.

• User to Root (U2R): Is an attack in which the
attacker starts accessing a normal user account on
a machine and gains root access to the machine by
exploiting vulnerabilities.

• Remote to Local (R2L): Occurs when an attacker
does not have an account on a remote system, but
who has the ability to send packets to a system over
a network and exploits vulnerabilities to gain local
access as a user of that system.

• Probing: An attack in which the attacker scans net-
work to collect information about its systems for the
apparent purpose of circumventing its security con-
trols.

KDD Cup 99 features can be classified into three
groups:

1) Basic features: this category encapsulates all the
attributes that can be extracted from a TCP/IP con-
nection. Most of these features leading to an implicit
delay in detection.

2) Traffic features: this category includes features
that are computed with respect to a window interval.

3) Content features: Most of the DoS and Probing
attacks have many intrusion frequent sequential pat-
terns, this is due to the fact that these attacks estab-
lish many connections to the host(s) in a very short
period of time. Unlike these attacks, the R2L and
U2R attacks donot have any intrusion frequent se-
quential patterns. The R2L and U2R attacks are em-
bedded in the payload of the packets, and normally
include only a single connection. To identify these
kinds of attacks, some relevant features are needed
to identify suspicious behavior in the packet payload.
These features are called content features [31].

Conducting a thorough analysis of the recent research
trend in anomaly detection, one will encounter several
machine learning methods reported to have a very high
detection rate of 98% while keeping the false alarm rate
at 1%. However, when we look at the state of the art IDS
solutions and commercial tools, there is few products us-
ing anomaly detection, and practitioners still think that
it is not a mature technology yet. Tavallaee et al. stud-
ied the details of a research in anomaly detection and
considered various aspects such as learning and detec-
tion approaches, training data sets, testing data sets, and
evaluation methods. Their study shows that there are
some inherent problems in the KDD Cup 99 data set [31],
which, is widely used as one of the few publicly available
data sets for network-based anomaly detection systems.

The first important deficiency in the KDD Cup 99 data
set is the large amount of redundant records. They found
approximately 78% and 75% of the records are duplicated
in the training and testing sets, respectively. This large
number of redundant records in the training set will cause
learning algorithms to be biased towards the more fre-
quent records, and thus prevent it from learning infre-
quent records which are usually more harmful to networks
such as U2R attacks. The existence of these repeated
records in the test set, on the other hand, will cause the
evaluation results to be biased by the methods which have
better detection rates on the frequent records [31].

Tavallaee et al. applied 21 learned machines to analyze
the difficulty level of the records in KDD Cup 99 data set.
They labeled the records of the entire train and test sets,
which provide 21 predicted labels for each record. All
the 21 methods applied on the data set classified about
98% of the records in the train set and 86% of the records
in the test set correctly. Tavallaee et al. reported these
statistics on both KDD Cup 99 train and test sets since
they have found similar results presented in many papers,
random parts of the KDD Cup 99 training set are used
as test sets. As a result, these papers obtain about 98%
detection rate. Even applying the KDD test set will result
in having a minimum detection rate of 86%, which makes
the comparison of IDSs quite difficult since they all vary
in the range of 86% to 100% [31]. In this paper, both
the KDD Cup 99 and more recent and revised version


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 426

of KDD Cup 99 data set, NSL-KDD data set, are used
for the experimentation. NSL-KDD data set is publicly
available for the researchers and does not suffer from any
of the mentioned problems [35]. Additionally, the number
of NSL-KDD records in the train and test sets is more
reasonable. This advantage makes it a good choice to run
the approaches on the complete KDD Cup 99 data set
without the need to randomly select a small portion.

It should be mentioned that the test set is not from
the same probability distribution as the training set, and
it includes unknown attack types that do not exist in the
training set that makes it more realistic. The data sets
contain a total number of 22 training attack types, with
an additional 17 types in the test data set [19]. In Table 2,
distribution of different attack types in the KDD Cup 99
and NSL-KDD data sets are listed.

In the KDD Cup 99 and NSL-KDD data sets there are
41 features (listed in Table 3) suggested for each record.

4.2 Performance Measures

The False Positive Rate (FPR) is defined as the number of
normal records that are incorrectly detected as intrusions
divided by the total number of normal records. The de-
tection rate is defined as the number of intrusion records
classified by the IDS divided by the total number of intru-
sion records present in the test data set. These are good
performance measures, since they measure what percent-
age of intrusions the system is able to detect and how
many incorrect classifications are made in the process.
Usually True Positive Rate (TPR) and false positive rate
are used for performance measurement. TPR also known
as Detection Rate (DR) or sensitivity or Recall and FPR
also known as the False Alarm Rate (FAR). They showed
in the following equations:

Recall = TPR =
TP

TP + FN
(6)

Precision =
TP

TP + FP
(7)

FDR =
FP

TN + FP
(8)

Accuracy =
TP + TN

TP + TN + FP + FN
(9)

where TP, TN, FP, FN are the numbers of true positives,
true negatives, false positives and false negatives, respec-
tively. Another commonly used measure is F-measure
that is defined in Equation (10) [26].

F −measure =
2 ×Recall×Precision
(Recall + Precision)

(10)

4.3 Results

A series of experiments was conducted to show the util-
ity of proposed method. All experiments are executed
on a machine with Intel(R) Core(TM) i7 CPU 3.2 GHz
and 4 GB of RAM. The operating system was Windows

7 Professional. All tested models were implemented on
MATLAB R2011b. Various values were tested for the pa-
rameters of the proposed method. The results show that
the highest performance is achieved by setting the param-
eters to values as follow: α = 0.4,β = 0.6,ρ = 0.2, the
initial pheromone intensity of each feature is equal to 1
(τ0 = 1) the number of ant in every iteration is 50 (m=50)
and the maximum number of iterations is 100 (T=100).
These values were empirically determined in our prelimi-
nary experiments; however, we make no claims that these
are optimal values. Parameter optimization is still a topic
for future research.

Many papers have focused on improving detection rates
of the IDSs using efficient classifiers, i.e. this is a quiet
difficult approach. This paper puts forward a modified
ACO-based FS algorithm aiming at building a classifier to
detect intrusion attempts. In the experiments, the whole
training set and test set were applied. Each record in
the training set or the test set consisted of 41 features.
Initially the proposed FS algorithm is used to select im-
portant features for each type of the previous discussed
attacks. Later on, a classification system was used to clas-
sify the attacks were the results are reported. Each fea-
ture value is standardized using Equation (11) and then
unimportant features are removed, i.e., leaving only the
important features, as listed in Table 4. For each type
of attacks, using selected features classification of the at-
tacked is performed and results are compared with those
using all 41 features. Finally, using the results of the com-
parisons, their performance in detecting attacks is evalu-
ated.

StandardScore =
X −µ
σ

(11)

where X is a feature value to be standardized, µ is the
mean of feature values and σ is the standard deviation of
the feature values.

Results of classification using the proposed method are
reported in Tables 5 and 6. In each class (normal or at-
tack), each row shows the performance of the proposed
method and baseline approach (using all features) in de-
tecting attacks. Experimental results show that FS phase
of the process improves the detection rate. In the normal
class, studying input features with regard to the output
shows that there is no linear relation between the input
features and output. Therefore, comparing it versus the
baseline approach, implementing the proposed method
has significantly improved accuracy of the classification
models. The best performance of this system, in terms of
its accuracy, is reported to be 98.90%, with only 2.59%
false positive rate.

Several experiments are performed to compare the two
different IDSs. Results show that an IDS combined with
the proposed ACO-based algorithm has higher detection
rates in detecting attacks than the baseline approach.
Figure 2 shows the true positive rate and the false positive
rate for the proposed method as we change the number
of selected features. The effect of selecting different fea-


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 427

Table 1: KDD Cup 99 and NSL-KDD data sets

Data set name # Instances Normal DoS U2R R2L PROBE
KDD Cup 99 Training set 494021 97278 391458 54 1124 4107

Testing set 311029 60593 229853 2636 13781 4166
NSL-KDD Training set 25192 13449 9234 11 209 2289

Testing set 22544 9711 7458 533 2421 2421

Table 2: The distribution of attack types in the KDD Cup 99 and NSL-KDD

Attack category Attack type KDD Cup 99 NSL-KDD
Training set Testing set Training set Testing set

kddcup.data10percent corrected.gz KDDTrain+20Percent KDDTest+
Neptune 107201 58001 8282 4657
Smurf 164091 280790 529 665
Pod 264 87 38 41
Teardrop 979 12 188 12

Denial of Service Land 21 9 1 7
(DoS) Back 2203 1098 196 359

Apache2 - 794 - 737
Udpstorm - 2 - 2
Process-table - 759 - 685
Mail-bomb - 5000 - 293
Buffer-overflow 30 22 6 20
Load-module 9 2 1 2
Perl 3 2 0 2
Rootkit 10 13 4 13

User to Root Spy 2 - 1 -
(U2R) Xterm - 13 - 13

Ps - 16 - 17
Http-tunnel - 158 - 133
Sql-attack - 2 - 2
Worm - 2 - 2
Snmp-guess - 2406 - 331
Guess-password 53 4367 10 1231
Ftp-write 8 3 1 3
Imap 12 1 5 1
Phf 4 2 2 2
Multihop 7 18 2 18

Remote to Local Warezmaster 20 1602 7 944
(R2L) Warezclient 1020 - 181 -

Snmpgetattack - 7741 - 178
Named - 17 - 17
Xlock - 9 - 9
Xsnoop - 4 - 4
Send-mail - 17 - 14
Port-sweep 1040 354 587 157
IP-sweep 1247 306 710 141

Probe Nmap 231 84 301 73
Satan 1589 1633 691 735
Saint - 736 - 319
Mscan - 1053 - 996


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 428

Table 3: Lists of features in KDD Cup 99 and NSL-KDD data sets

No. Feature name Description Type
1 Duration Length of the connection (second) Continuous
2 Protocol-type Type of protocol, e.g. tcp, udp, etc. Discrete
3 Service Network service on the destination, e.g., http, telnet, etc. Discrete
4 Flag Normal or error status of the connection Discrete
5 Src-bytes Number of data bytes from source to destination Continuous
6 Dst-bytes Number of data bytes from destination to source Continuous
7 Land 1 if connection is from/to the same host/port; 0 otherwise Discrete
8 Wrong-fragment Number of wrong fragments Continuous
9 Urgent Number of urgent packets Discrete
10 Hot Number of hot indicators Discrete
11 Num-failed-logins Number of failed login attempts Discrete
12 Logged-in 1 if successfully logged in; 0 otherwise Discrete
13 Num-compromised Number of compromised condition Discrete
14 Root-shell 1 if root shell is obtained; 0 otherwise Discrete
15 Su-attempted 1 if su root command attempted; 0 otherwise Discrete
16 Num-root Number of root accesses Discrete
17 Num-file-creations Number of file creation operations Discrete
18 Num-shells Number of shell prompts Discrete
19 Num-access-files Number of operations on access control files Discrete
20 Num-outbound-cmds Number of outbound commands in an ftp session Discrete
21 Is-host-login 1 if the login belongs to the hot list; 0 otherwise Discrete
22 Is-guest-login 1 if the login is a guest login; 0 otherwise Discrete
23 Count Number of connections to the same host as the current Discrete

connection in the past two seconds
24 Srv-count Number of connections to the same service as the current Discrete

connection in the past two seconds
25 Serror-rate Percent of connections that have SYN errors Discrete
26 Srv-serror-rate Percent of connections that have SYN errors Discrete
27 Rerror-rate Percent of connections that have REJ errors Discrete
28 Srv-rerror-rate Percent of connections that have REJ errors Discrete
29 Same-srv-rate Percent of connections to the same services Discrete
30 Diff-srv-rate Percent of connections to different services Discrete
31 Srv-diff-host-rate Percent of connections to different hosts Discrete
32 Dst-host-count Count for destination host Discrete
33 Dst-host-srv-count Srv-count for destination host Discrete
34 Dst-host-same-srv-rate Same-srv-rate for destination host Discrete
35 Dst-host-diff-srv-rate Diff-srv-rate for destination host Discrete
36 Dst-host-same-src-port-rate Same-src-port-rate for destination host Discrete
37 Dst-host-srv-diff-host-rate Diff-host-rate for destination host Discrete
38 Dst-host-serror-rate Serror-rate for destination host Discrete
39 Dst-host-srv-serror-rate Srv-serror-rate for destination host Discrete
40 Dst-host-rerror-rate Rerror-rate for destination host Discrete
41 Dst-host-srv-rerror-rate Srv-serror-rate for destination host Discrete


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 429

Table 4: Selected features out of the 41 features

Category # Feature Selected features
Normal 5 Urgent, Num-failed-logins, Count, Rerror-rate, Dst-host-srv-diff-host-rate

(9, 11, 23, 27, 37)
DoS 4 Duration, Flag, Root-shell, Dst-host-srv-diff-host-rate (1, 4, 14, 37)
U2R 4 Service, Dst-bytes, Count, Serror-rate (3, 6, 23, 25)
R2L 3 Count, Srv-count, Diff-srv-rate (23, 24, 30)
Probe 8 Protocol-type, Duration, Hot, Logged-in, Num-compromised, Num-access-files,

Diff-srv-rate, Dst-host-diff-srv-rate (2, 4, 10, 12, 13, 19, 30,35)

Table 5: Proposed IDS performance on the KDD Cup 99 data set

Category All features Proposed IDS
Normal DoS U2R R2L Probe Normal DoS U2R R2L Probe

#Correctly detected 45954 183928 29 537 2800 59024 229347 2465 13667 3110
# Miss detected 14639 45925 2607 13244 1366 1569 506 171 114 1056
Precision 77.74 87.86 46.77 82.23 6.68 69.60 81.66 6.53 13.07 86.41
Recall (TPR) 75.84 80.02 1.10 3.9 67.21 97.41 99.78 93.51 99.17 74.65
F-measure 76.78 83.76 2.15 7.45 12.15 81.19 89.82 12.21 23.10 80.10

tures for the all attack classes in each step of the proposed
method is depicted in Figure 2. The maximum difference
between true positive rate and false positive rate is in
the 5th step. Hence the first five features are selected for
modeling the system.

A comparison between the test results for the proposed
method versus other machine learning methods tested on
the KDD Cup 99 test set are presented in Table 7. It
can be stated that all the machine learning algorithms
tested on this data set offered an acceptable level of de-
tection performance for Normal, DoS and Probe attacks
but they did not have good performance on the U2R and
R2L types. The proposed method shows better TPR for
DoS, U2R and R2L attacks and offers an acceptable level
of detection rate for Normal and Probe attacks. There
are 18729 records of various new attacks in the KDD Cup
99 test set, which have never appeared in the training
set. These new attack records make an IDS trained by
a training set hard to achieve good performance for test
set. Experiments show that the proposed method shows
an acceptable detection rate for detecting new attacks.

Considering the reported results, ACO is faster in lo-
cating the optimal solution. In general, it can find the
optimal solution within tens of iterations. If exhaustive
search is used to find the optimal feature subset in the
KDD Cup 99 data set, there will be tens of billions of
candidate subsets, which, makes the search nearly impos-
sible. Using ACO, the optimal solution is found after
100th iterations. ACO has powerful exploration ability;
it is a gradual searching process that approaches optimal
solutions. The execution time for the ACO is mainly af-
fected by the dimensionality of the problem (number of
the features), and the size of the data. ACO can search in

Figure 2: (a) TPR is depicted in each step and (b) FPR
is shown in each step


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 430

Table 6: Proposed IDS performance on the NSL-KDD data set

Category All features Proposed IDS
Normal DoS U2R R2L Probe Normal DoS U2R R2L Probe

#Correctly detected 9455 5131 28 454 1434 9483 5613 392 597 1667
# Miss detected 256 2327 505 1967 987 228 1845 141 1824 754
Precision 65.46 87.34 68.29 90.80 85.10 61.31 79.22 8.17 93.14 85.79
Recall (TPR) 97.36 68.80 5.25 18.75 59.23 97.65 75.26 73.55 24.66 68.86
F-measure 78.29 76.97 9.75 31.08 69.85 75.33 77.19 14.71 39.00 76.40

Table 7: Detection rate per record of KDD Cup 99 for the different algorithms performances on the test data set
with corrected labels of KDD Cup 99 data set

Model Normal(%) DoS(%) U2R(%) R2L(%) Probe(%) Accuracy FPR
Proposed method 97.41 99.78 93.51 99.17 74.65 98.9 2.59
PLSSVM [3] 95.69 78.76 30.7 84.85 86.46 Not reported 4.3
Clustering feature [11] 99.3 99.5 19.7 28.8 97.5 95.7 0.7
ESC-IDS [32] 98.2 99.5 14.1 31.5 84.1 95.3 1.9
KDDwinner [24] 99.5 97.1 13.2 8.4 83.3 91.8 0.5
KDD99 runner-up [16] 99.4 97.5 11.8 7.3 84.5 91.5 0.6

the feature space until the optimal solution is found. ACO
comprises a very simple concept, and the ideas can be im-
plemented in a few lines of computer code. It requires only
primitive mathematical operators, and is computationally
inexpensive in terms of both memory requirements and
speed. Each ant has a separate memory. In ACO, each
ant that passes the optimum solutions are tagged so that
other ants can use them and knowledge of good solutions
is retained by all ants.

5 Conclusions and Future Works

This paper addresses the problem of dimensionality re-
duction using ACO in intrusion detection problem area.
ACO has the ability to converge quickly. It has a strong
search capability in the problem space and can efficiently
find minimal feature subset. Experimental results demon-
strate a competitive performance. More experimentation
and further investigation into this technique is required.
The pheromone trail decay coefficient and pheromone
amount have an important impact on the performance of
ACO. Selection of the parameters is proved to be problem-
dependent. The deposited pheromone expresses the qual-
ity of the corresponding solution. Evaporation becomes
more important for more complex problems. If ρ = 0,
i.e. no evaporation, the algorithm does not converge. If
pheromone evaporates too much (a large ρ is used), the
algorithm often converged to sub-optimal solutions. In
many practical problems, it is difficult to select the best
ρ without trial-and-error. α and β are also key factors in
ACO for FS. For large data sets, to speed up the calcu-
lation of FS process, a parallel algorithm can be imple-

mented.

The proposed method uses ACO algorithm and a sim-
ple classifier (nearest neighbor classifier) to select impor-
tant features and a trained classifier to identify any kind
of new attacks. Tests and comparisons are performed on
KDD Cup 99 and NSL-KDD data sets, the test sets con-
tains 17 kinds of different attacks. The proposed method
reduced the number of features by approximately 88% and
the detection error reduced by around 24% using KDD
Cup 99 test data set. The proposed method will signif-
icantly reduce both the memory size and the CPU time
required for intrusion detection by reducing number of
the features used for the detection. This shows that the
proposed method is very reliable for intrusion detection.
Results indicate that the proposed ACO-based detection
method outperforms other methods since it can provide
better and more robust representation of the data. This
is due to the fact that it can accurately detect a broader
range of attacks using smaller number of features.

As for the future work, intention is to apply the pro-
posed intrusion detection method using complicated clas-
sifiers to improve its performance and to combine the pro-
posed method with other population-based algorithms.
Analyzing packet payload is recently attracting lots of
attention and many researchers report works carried-out
in this area. It is notable that feature selection for the
payload-based intrusion detection is not mature yet. In-
tension will be to extract and selection appropriate fea-
tures from the packet payload to improve the detection
rate.


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 431

Acknowledgments

This study was supported by the Iran University of Sci-
ence and Technology. The authors gratefully acknowledge
the anonymous reviewers for their valuable comments.

References

[1] M. H. Aghdam, N. Ghasem-Aghaee, and M. E.
Basiri, “Application of ant colony optimization for fea-
ture selection in text categorization,” in Proceedings
of the IEEE Congress on Evolutionary Computation
(CEC’08), pp. 2872–2878, 2008.

[2] M. H. Aghdam, J. Tanha, A. R. Naghsh-Nilchi, and
M. E. Basiri, “Combination of Ant Colony Optimiza-
tion and Bayesian Classification for Feature Selection
in a Bioinformatics Dataset,” Journal of Computer
Science and System Biology, vol. 2, pp. 186–199, 2009.

[3] F. Amiri, M. M. R. Yousefi, C. Lucas, A. Shakery, and
N. Yazdani, “Mutual information-based feature selec-
tion for intrusion detection systems,” International
Journal of Network and Computer Applications, vol.
34, pp. 1184–1199, 2011.

[4] J. Bins, Feature Selection from Huge Feature Sets in
the Context of Computer Vision, Ph.D. dissertation,
Colorado State University, 2000.

[5] C. Blum, and M. Dorigo, “The hyper-cube framework
for ant colony optimization,” IEEE Transaction on
Systems, Man, and Cybernetics, Part B, vol. 34, no.
2, pp. 1161–1172, 2004.

[6] E. Bonabeau, M. Dorigo, and G. Theraulaz, Swarm
Intelligence: From Natural to Artificial Systems, Ox-
ford University Press, New York, 1999.

[7] S. Chebrolu, A. Abraham, and J. P. Thomas, “Fea-
ture deduction and ensemble design of intrusion de-
tection systems,” International Journal of Computers
and Security, vol. 24, pp. 295–307, 2005.

[8] P. S. Chung, C. W. Liu, and M. S. Hwang, “A Study
of Attribute-based Proxy Re-encryption Scheme in
Cloud Environments,” International Journal of Net-
work Security, vol. 16, no. 1, pp. 1–13, 2014.

[9] M. Dorigo, and C. Blum, “Ant colony optimization
theory: A survey,” Theoretical Computer Science, pp.
243–278, 2005.

[10] A. P. Engelbrecht, Fundamentals of Computational
Swarm Intelligence, John Wiley & Sons, London,
2005.

[11] S. J. Horng, M. Y. Su, Y. H. Chen, T. W. Kao, R.
J. Chen, J. L. Lai, C. D. Perkasa, “A novel intru-
sion detection system based on hierarchical clustering
and support vector machines,” International Journal
of Expert Systems with Applications, vol. 38, no. 1, pp.
306–313, 2011.

[12] R. Jensen, Combining Rough and Fuzzy Sets for Fea-
ture Selection, Ph.D. dissertation, School of Informa-
tion, Edinburgh University, 2005.

[13] K. J. Lee, J., Joo, J., Yang, and V. Honavar, “Exper-
imental comparison of feature subset selection using

GA and ACO algorithm,” in Advanced Data Mining
and Applications, LNCS 4093, pp. 465–472, Springer,
2006.

[14] W. Lee, S. J. Stolfo, K. W. Mok, “Adaptive intru-
sion detection: a data mining approach,” Interna-
tional Journal Artificial Inteligence Review, vol. 14,
no. 6, pp. 533–567, 2000.

[15] G. Leguizamon, and Z. Michalewicz, “A New Version
of Ant System for Subset Problems,” In Proceedings of
IEEE Congress on Evolutionary Computation, 1999.

[16] I Levin, “KDD-99 classifier learning contest LLSofts
results overview,” International Journal of SIGKDD
Explorations, vol. 1, no. 2, pp. 67–75, 2000.

[17] H. Liu, and L. Yu, “Toward integrating feature se-
lection algorithms for classification and clustering,”
IEEE Transactions on Knowledge and Data Engineer-
ing, pp. 491–502, 2005.

[18] V. Maniezzo, A. Colorni, “The Ant System Applied
to the Quadratic Assignment Problem,” IEEE Trans-
action on Knowledge and Data Engineering, vol. 11,
no. 5, pp. 769–778, 1999.

[19] MIT Lincoln Labs, 1998 DARPA Intru-
sion Detection Evaluation, Feb. 2012. (http:
//www.ll.mit.edu/mission/communications/ist/

corpora/ideval/index.html)
[20] D. Mladeni, “Feature Selection for Dimensionality

Reduction,” in Subspace, Latent Structure and Fea-
ture Selection, Statistical and Optimization, Perspec-
tives Workshop (SLSFS’06), LNCS 3940, pp. 84–102,
Springer, 2006.

[21] R. Montemanni, L. M. Gambardella, A. E. Rizzoli,
and A. V. Donati, “A new algorithm for a Dynamic
Vehicle Routing Problem based on Ant Colony Sys-
tem,” in Second International Workshop on Freight
Transportation and Logistics, pp. 27–30, 2003.

[22] S. Mukkamala, and A. H. Sung, “Feature ranking
and selection for intrusion detection systems using
support vector machines,” in Proceedings of the Inter-
national Conference on Information and Knowledge
Engineering, pp. 503–509, 2002.

[23] S. Y. Ohn, N. H. Nguyen, D. S. Kim, and J. S.
Park, “Determining Optimal Decision Model for Sup-
port Vector Machine by Genetic Algorithm,” in Com-
putational and Information Science, LNCS 3314, pp.
895–902, Springer, 2005.

[24] B. Pfahringer, “Winning the KDD 99 classifica-
tion cup: bagged boosting,” International Journal of
SIGKDD Explorations, vol. 1, no. 2, pp. 65–66, 2000.

[25] M. Ramadas, S. Ostermann, and B. Tjaden, “De-
tecting anomalous network traffic with self-organizing
maps,” in Proceedings of Sixth International Sympo-
sium on Recent Advances in Intrusion Detection, pp.
36–54, 2003.

[26] C. J. Rijsbergen, Information Retrieval (2nd ed.),
London, UK: Butterworth, 1979.

[27] S. J. Stolfo, W. Fan, W. Lee, A. Prodromidis, P.
K. Chan, “Cost-based modeling for fraud and intru-
sion detection: Results from the jam project,” in In


International Journal of Network Security, Vol.18, No.3, PP.420-432, May 2016 432

Proceedings DARPA Information Survivability Con-
ference and Exposition (DISCEX’00), pp. 130–144,
2000.

[28] A. H. Sung, and S. Mukkamala, “Identifying impor-
tant features for intrusion detection using support vec-
tor machines and neural network,” in Proceedings of
International Symposium on Applications and the In-
ternet, pp. 209–216, 2003.

[29] A. H. Sung, and S. Mukkamala, “The Feature Se-
lection and Intrusion Detection Problems,” in Pro-
ceedings of Advances in Computer Science - ASIAN:
Higher-Level Decision Making. 9th Asian Computing
Science Conference, LNCS 3321, pp. 468–482, 2004.

[30] W. S. Sung, H. L. Chi, “Using Attack-Specific Fea-
ture Subsets for Network Intrusion Detection,” in Ad-
vances in Artificial Intelligence, LNCS 4304, pp. 305–
311, Springer, 2006.

[31] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghor-
bani, “A Detailed Analysis of the KDD CUP 99 Data
set,” in Proceeding of the IEEE Symposium on Com-
putational Intelligence in Security and Defense Appli-
cations, pp. 57–62, 2007.

[32] A. N. Toosi, and M. Kahani ”A new approach to in-
trusion detection based on an evolutionary soft com-
puting model using neuro-fuzzy classifiers,” Interna-
tional Journal of Computer Communications, vol. 30,
pp. 2201–2212, 2007.

[33] C. Tsang, S. Kwong, and H. Wang, “Genetic-fuzzy
rule mining approach and evaluation of feature selec-
tion techniques for anomaly intrusion detection,” In-
ternational Journal of Pattern Recognition, vol. 40,
pp. 2373–2391, 2007.

[34] UCI, KDD Cup 1999 Data, The UCI KDD Archive
Information and Computer Science University of Cali-
fornia, Irvine, Oct. 2014. (http://kdd.ics.uci.edu/
databases/kddcup99/kddcup99.html)

[35] UNB, NSL-KDD Data Set for Network-based Intru-
sion Detection Systems, Mar. 2014. (http://nsl.cs.
unb.ca/NSL-KDD/)

[36] S. M. Vieira, J. M. C. Sousa, and T. A. Runkler,
“Fuzzy classification in ant feature selection,” in IEEE
International Conference on Fuzzy Systems, pp.1763–
1769, 2008.

[37] P.D. Williams, K. P. Anchor, J. L. Bebo, G. H. Gun-
sch, and G. D. Lamont, “CDIS: towards a computer
immune system for detecting network Intrusions,” in
Proceedings of Fourth International Symposium on
Recent Advances in Intrusion Detection, pp. 117–133,
2001.

[38] S. X. Wu, and W. Banzhaf, “The use of computa-
tional intelligence in intrusion detection systems: A
review,” International Journal of Applied Soft Com-
puting, vol. 10, pp. 1–35, 2010.

[39] L.H. Zhang, G. H. Zhang, L. Yu, J. Zhang, and Y.
C. Bai, “Intrusion Detection Using Rough Set Classifi-
cation,” International Journal of Zheijiang University
Science, vol. 5, no. 9, pp. 1076–1086, 2004.

Mehdi Hosseinzadeh Aghdam is a Ph.D. candidate
in the computer engineering department at the Iran
University of Science and Technology, he is also a lecturer
at Payame Noor University (PNU). He graduated with a
master’s in computer engineering Artificial Intelligence
from University of Isfahan (UI) in 2008. At UI, he worked
on swarm intelligence-based method for feature selection.
His main research interests are: Data Mining (Feature
Selection, Recommendation Systems), Computational
Intelligence, Pattern Recognition and Social Network
Analysis.

Peyman Kabiri is assistant professor of computer
engineering at the Iran University of Science and Tech-
nology (IUST). He received a B.Eng. degree in computer
hardware engineering from IUST in 1992 and the M.Sc.
and Ph.D. degrees in computer science at the Nottingham
Trent University, UK, in 1996 and 2000 respectively. He
is the director of the Intelligent Automation Laboratory
at the Department of Computer Engineering-IUST. Dr.
Kabiri has authored a number of publications including
journal articles, book chapters, and conference papers.
He is editor of a book in intrusion detection work area.