Submitted 17 December 2018
Accepted 30 March 2019
Published 18 November 2019

Corresponding author: Ramak Ghavamizadeh Meibodi, r-ghavami@sbu.ac.ir
Academic editor: Sebastian Ventura
DOI 10.7717/peerj-cs.188
Copyright 2019 Hasanpour et al.
Distributed under Creative Commons CC-BY 4.0

Improving rule-based classification using Harmony Search

Hesam Hasanpour, Ramak Ghavamizadeh Meibodi and Keivan Navi
Department of Computer Science and Engineering, Shahid Beheshti University, Tehran, Iran

ABSTRACT
Classification and association rule mining are two substantial areas of data mining. Some researchers have attempted to integrate these two fields, producing what are called rule-based classifiers. Rule-based classifiers can play a very important role in applications such as fraud detection and medical diagnosis. Numerous previous studies have shown that this type of classifier achieves higher classification accuracy than traditional classification algorithms. However, rule-based classifiers still suffer from a fundamental limitation: many of them use greedy techniques to prune redundant rules, which leads to missing some important rules. Another challenge that must be considered is the enormous set of mined rules, which results in high processing overhead. The consequence of these approaches is that the final selected rules may not be the globally best rules. These algorithms do not exploit the search space effectively in order to select the best subset of candidate rules. We merged the Apriori algorithm, Harmony Search, and the classification based on associations (CBA) algorithm in order to build a rule-based classifier. We applied a modified version of the Apriori algorithm with multiple minimum supports for extracting useful rules for each class in the dataset. Instead of keeping a large number of candidate rules, binary Harmony Search was utilized to select the best subset of rules appropriate for building a classification model. We applied the proposed method to seventeen benchmark datasets and compared its results with those of traditional association rule classification algorithms. The statistical results show that our proposed method outperformed the other rule-based approaches.

Subjects: Artificial Intelligence, Data Mining and Machine Learning
Keywords: Apriori algorithm, CBA algorithm, Harmony Search

INTRODUCTION
The availability of huge amounts of raw data has created an immense opportunity for knowledge discovery and data mining research to play an essential role in a wide range of applications such as industry, financial forecasting, weather forecasting and healthcare. Classification is one of the most important areas of data mining and has been applied in many domains such as bioinformatics, fraud detection, loan risk prediction, medical diagnosis, weather prediction, customer segmentation, target marketing, text classification and engineering fault detection. Association rule mining (ARM) is another popular and substantial technique in machine learning and data mining; it was introduced by Agrawal, Imieliński & Swami (1993) and has since remained one of the most active research areas in machine learning and knowledge discovery. Association rule mining finds interesting relationships among large sets of data items. Association rules show attribute-value
conditions that occur frequently together in a given dataset, and provide information of this type in the form of if-then statements. Unlike the if-then rules of logic, association rules are intrinsically probabilistic and are computed from the data. ARM is a powerful exploratory technique with a wide range of applications, including marketing policies, the medical domain (Ilayaraja & Meyyappan, 2013; Shin et al., 2010), financial forecasting, credit fraud detection (Sarno et al., 2015) and many other areas. A number of well-known association rule mining algorithms are accessible to researchers (Agrawal, Imieliński & Swami, 1993; Burdick, Calimlim & Gehrke, 2001; Scheffer, 2001a).

There is some evidence that integrating the benefits of classification and association rule mining can result in more accurate and efficient classification models than traditional classification algorithms (Ma & Liu, 1998). Producing a concise and accurate classifier by utilizing association rule mining is an attractive domain for data mining and machine learning researchers. A typical associative classification system is constructed in two stages: (1) discovering all the association rules inherent in a database; (2) selecting a small set of relevant association rules to construct a classifier. In the first step, some algorithms use the Apriori algorithm for rule generation, while others use approaches such as FOIL (First Order Inductive Learner). Mazid, Ali & Tickle (2009) compared rule-based classification and association rule mining algorithms in terms of classification performance and computational complexity. They concluded that Apriori is the better choice for rule-based mining tasks with respect to both accuracy and computational complexity. Usually a large number of rules are generated in the first step, and the main issues in the second step are how to efficiently find a small number of high-quality rules and how to generate a more accurate classifier. It must be noted that some researchers focus on the first step and try to find a minimal class association rule set (Li, Shen & Topor, 2002), but our focus is on the second step.

Traditional algorithms use greedy approaches for selecting a small subset of the generated rules for building a classifier. With this approach, the selected rules are not necessarily the best subset of the possible rules. Another challenge is that the resulting rules are biased toward prevalent classes, so classifying rare instances is a major problem: test samples belonging to the small classes are misclassified as prevalent classes (Chen, Hsu & Hsu, 2012; Sun, Wong & Kamel, 2009). Sometimes rules with low support and very high confidence are effective in identifying rare events.

In this paper, we present an association rule-based classification method to obtain an accurate and compact rule-based classifier. We used the Apriori algorithm for rule generation and Harmony Search for selecting the best subset of rules with which to build a classifier.
The plan of this paper is as follows: first, we present the necessary background on rule-based classification; in the next section we describe the proposed method; the Results section presents the obtained results and, finally, the Discussion section concludes the study.

PRELIMINARIES

Apriori algorithm and interestingness measures
Apriori is a standard and well-known basic algorithm in association rule mining that is used for mining frequent itemsets in a set of transactions. It was first introduced by Agrawal and Srikant (Agrawal, Imieliński & Swami, 1993). APRIORI-C is another Apriori-based algorithm that derives rules according to the parameters minimal confidence and minimal support of a rule (Jovanoski & Lavrač, 2001). Predictive Apriori (Scheffer, 2001b) is a further algorithm motivated by Apriori; unlike the confidence-related focus of Apriori, it tries to maximize the expected accuracy of an association rule on unseen data. While Apriori ranks the rules based on confidence only, Predictive Apriori considers both the confidence and the support. Nahar et al. considered three rule generation algorithms (Apriori, Predictive Apriori and Tertius) for extracting the meaningful factors for particular types of cancer (Nahar et al., 2011) and heart disease (Nahar et al., 2013). Their experimental results showed that Apriori is the most beneficial association rule mining algorithm.

The Apriori algorithm can produce a lot of rules, but many of them are superfluous. To select appropriate rules from the set of all possible rules, constraints on various measures of interestingness can be used. Support and confidence are two measures of rule interestingness that mirror the usefulness and the certainty of a rule, respectively (Agrawal et al., 1996). The support is the percentage of the total number of transactions that include all items in the antecedent (if) and consequent (then) parts of the rule. Frequent itemsets are those itemsets whose frequency is greater than a predefined minimum support (Minsup). Confidence is the ratio of the number of transactions that include all items in the consequent as well as the antecedent to the number of transactions that include all items in the antecedent. In other words, confidence is the accuracy of the rule, and Apriori usually uses it to rank the rules. The task of association rule mining is to generate all association rules from the set of transactions that have support greater than Minsup and confidence greater than Minconf. Since we need to discover the relationship between the input attributes and the class label, we need to find all rules of the form A → B whose antecedent consists of ordinary items and whose consequent can only be a class item.

Rules with high support and high confidence are not necessarily interesting. Instead of using only support and confidence, we also used the lift measure as a metric for evaluating the significance and reliability of association rules. Lift is the ratio of the confidence to the expected confidence; hence, lift indicates the increase in the probability of the consequent given the antecedent part of a rule. A lift ratio larger than 1.0 implies that the relationship between the antecedent and the consequent is more significant than would be expected by chance, and makes such rules potentially useful for predicting the consequent in unseen instances. The larger the lift ratio, the more significant the association. These measures are defined as follows:

Support(X ⇒ Y) = P(X ∩ Y)   (1)

Confidence(X ⇒ Y) = P(Y | X) = Support(X ∩ Y) / Support(X)   (2)

Lift(X, Y) = P(X ∩ Y) / (P(X) × P(Y))   (3)
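To make these definitions concrete, the following minimal Python sketch computes the three measures for a rule X ⇒ Y over a toy set of transactions. It is only an illustration (the experiments in this paper were implemented in MATLAB, and the transactions and the rule below are hypothetical).

def support(itemset, transactions):
    # Fraction of transactions that contain every item of the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    # Eq. (2): support of X and Y together, divided by the support of X.
    return support(X | Y, transactions) / support(X, transactions)

def lift(X, Y, transactions):
    # Eq. (3): confidence relative to the expected confidence.
    return confidence(X, Y, transactions) / support(Y, transactions)

transactions = [{"a", "b", "class=1"},
                {"a", "class=1"},
                {"b", "class=0"},
                {"a", "b", "class=1"}]
X, Y = {"a", "b"}, {"class=1"}
print(support(X | Y, transactions))    # 0.5
print(confidence(X, Y, transactions))  # 1.0
print(lift(X, Y, transactions))        # 1.33...: above 1, so X and Y
                                       # are positively associated

Note that Y being a class item reflects the restriction on rule consequents discussed above.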
Another issue that must be considered is the type of dataset that is appropriate for the Apriori algorithm. Consider a dataset for supervised learning which contains observations of a class label variable and a number of predictor variables. Such a dataset can be converted into an appropriate format for association rule mining if both the class label and the predictors are of the categorical type. Since our benchmark datasets contain continuous variables, we must use a method for handling numeric attributes. Several methods exist for this purpose; the traditional one is discretization, which can be static or based on the distribution of the data. We used the method proposed by Tsai, Lee & Yang (2008).
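For illustration, the sketch below shows the general shape of such a conversion in Python. It uses plain equal-width binning in place of the class-attribute contingency coefficient method actually employed, with hypothetical feature values and bin count, and it also performs the binary expansion described later in the Proposed method section, where a feature with N discrete values becomes N boolean items.

import numpy as np

def discretize(col, n_bins=3):
    # Map each continuous value to a bin index in {0, ..., n_bins - 1}
    # using equal-width cut points (a simple static scheme, not the
    # CAC-based method of Tsai, Lee & Yang (2008)).
    edges = np.linspace(col.min(), col.max(), n_bins + 1)[1:-1]
    return np.digitize(col, edges)

def binarize(bins, n_bins=3):
    # One boolean column (item) per discrete value.
    return np.stack([bins == b for b in range(n_bins)], axis=1)

col = np.array([0.1, 0.4, 0.35, 0.9, 0.7])
bins = discretize(col)    # array([0, 1, 0, 2, 2])
items = binarize(bins)    # 5 x 3 boolean item matrix, ready for Apriori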
Associative rules for classification
In recent years, some researchers have tried to combine association rule mining and classification (Cano, Zafra & Ventura, 2013; Li, Han & Pei, 2001; Ma & Liu, 1998; Wang, Zhou & He, 2000; Wang & Wong, 2003; Yin & Han, 2003). Their experiments show that this approach achieves better accuracy than conventional classification algorithms such as C4.5. The reason is that the associative classifier is composed of high-quality rules, which are generated from highly confident event associations that reflect the close dependencies among events. The Classification Based on Associations (CBA) algorithm is one of the first efforts to combine classification and association rule mining (Ma & Liu, 1998); it is described in detail in the next section. Li, Han & Pei (2001) suggested a weighted χ2 analysis to perform Classification based on Multiple Association Rules (CMAR). Unlike the CBA algorithm, the CMAR algorithm uses all the rules that cover the example to be classified instead of using just one rule. Yin & Han (2003) proposed CPAR (Classification based on Predictive Association Rules). CPAR does not generate a large number of candidate rules as in conventional associative classification; it pursues a greedy algorithm to produce rules directly from the training data and uses the best K rules at prediction time. An advantage of associative classifiers is that they are rule-based and thus lend themselves to being more easily understood by humans.

As previously stated, an associative classification system is built in two phases. In the first phase, the learning target is to discover the association patterns inherent in a database (also referred to as knowledge discovery). In the second phase, the goal is to select a small set of relevant association patterns to construct a classifier given the predictor attributes. To produce the best classifier out of the entire set of rules, we would need to consider all feasible subsets of rules and select the most accurate subset, which is clearly impractical.

In the classification phase, some methods (Ma & Liu, 1998; Thabtah, Cowling & Peng, 2004; Wang, Zhou & He, 2000) simply select the rule with a maximal user-defined measure, such as confidence. If no rule covers the example, then usually the prevalent class is taken to be the predicted class. However, identifying the rule that is most effective at classifying a new case is a big challenge. When classifying a new data object, more than one rule may satisfy the test conditions, and using all of them may increase the prediction accuracy (Li, Han & Pei, 2001).

CBA algorithm
The Classification Based on Associations (CBA) algorithm is one of the first algorithms to bring up the idea of classification using association rules (Ma & Liu, 1998). CBA implements the famous Apriori algorithm (Agrawal, Imieliński & Swami, 1993) in order to discover frequent items. Once the discovery of frequent items has finished, CBA proceeds by converting any frequent item that passes Minconf into a rule of the classifier. At the rule generation phase, CBA selects the special subset of association rules whose right-hand sides are restricted to the classification class attribute. This subset of rules is called class association rules (CARs). In the next step, the CBA algorithm builds a classifier using the CARs. At this step, CBA uses a heuristic approach: it sorts the rules according to their confidence and selects the top rules that cover the training samples. The algorithm first selects the best rule (the rule with the highest confidence) and then eliminates all the covered examples. If at least one example satisfies the rule's conditions, that rule is appended to the final rules. This procedure is repeated until there are no more rules to select or no more examples to cover. The algorithm then stops and returns the classifier in the form of an IF-THEN-ELSE rule list. One challenge with this approach is that greedily selecting the individually best rules may not yield the best subset of rules. The CBA system follows the original association rule model and uses a single Minsup in its rule generation, which seems inadequate for mining CARs because the class frequency distribution in many practical classification datasets is unbalanced.

We used the CBA algorithm with three small changes. The first change is that we use multiple Minsup values, which can be useful for imbalanced datasets. The second change concerns coverage: in the original CBA algorithm a sample is removed as soon as it is covered by one rule, whereas we define a parameter called Delta that specifies how many times each sample must be covered before it is removed (Li, Han & Pei, 2001). This approach leads to the generation of more rules. The third change occurs in the classification phase. In the classification phase of the original CBA algorithm, the rule with maximum confidence that covers the test conditions defines the class label of a test sample. Instead, we select the top K (a predefined parameter) rules from each class that cover the test sample's conditions and determine the class label according to the sum of the confidences of the selected rules. All data preprocessing and analyses were conducted using MATLAB version 2014a (The MathWorks Inc., Natick, MA, USA). A sketch of the modified covering and prediction steps is given below.
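The following Python sketch outlines these modified steps. The rule representation (items, label, confidence) and the parameter values are hypothetical, and the sketch omits details of the actual MATLAB implementation.

from collections import defaultdict, namedtuple

# Hypothetical rule representation produced by the rule-generation step.
Rule = namedtuple("Rule", "items label confidence")

def build_classifier(rules, samples, delta=2):
    # CBA-style covering: scan rules by descending confidence and keep a
    # rule if it covers at least one sample not yet covered delta times.
    rules = sorted(rules, key=lambda r: -r.confidence)
    counts = [0] * len(samples)
    selected = []
    for rule in rules:
        hits = [i for i, s in enumerate(samples)
                if counts[i] < delta and rule.items <= s]
        if hits:
            selected.append(rule)
            for i in hits:
                counts[i] += 1
        if all(c >= delta for c in counts):
            break   # every sample has been covered delta times
    return selected

def predict(selected, sample, k=3):
    # Sum the confidences of the top k matching rules of each class and
    # return the class with the largest sum (None if no rule matches).
    scores, used = defaultdict(float), defaultdict(int)
    for rule in selected:            # already sorted by confidence
        if rule.items <= sample and used[rule.label] < k:
            scores[rule.label] += rule.confidence
            used[rule.label] += 1
    return max(scores, key=scores.get) if scores else None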
Proposed method
The proposed method of rule selection based on HS is depicted in Fig. 1.

Figure 1: The framework of the proposed method (training data → discretization → conversion to the appropriate format → Apriori rule generation → rule selection with Harmony Search and the CBA algorithm → classification of the test data using the selected rules). Full-size DOI: 10.7717/peerjcs.188/fig-1

At the initial step, we perform some preprocessing on each dataset. One of the main preprocessing steps is the discretization of continuous features. We applied a discretization algorithm based on the class-attribute contingency coefficient proposed by Tsai, Lee & Yang (2008). After discretization, we convert each dataset into the appropriate format, such that the value of each feature can be True (1) or False (0). To this end, if a feature is discretized into N different values, we produce N features. Once the conditions for the Apriori algorithm are satisfied, we run the algorithm for each class with different Minsup and Minconf values.

The main novelty of our study lies in the next step. As previously mentioned, the Apriori algorithm produces many rules, and the CBA algorithm uses a greedy procedure to select a subset of the produced rules for building a classifier. Greedy approaches cause the selected rules not to be the best subset of rules. We believe that population-based evolutionary algorithms fit the rule selection problem well. Harmony Search (HS) is a population-based stochastic search algorithm inspired by the musical process of searching for a perfect state of harmony (Geem, Kim & Loganathan, 2001). The harmony in music is analogous to the optimization solution vector, and the musician's improvisations are analogous to the local and global search methods of optimization techniques. When a musician is improvising, he has three choices: (1) to play any pitch from memory; (2) to play a pitch adjacent to one in his memory; (3) to play a random pitch from the range of all possible pitches. These three options are employed in the HS algorithm by means of three main parameters: Harmony Memory (HM), Harmony Memory Consideration Rate (HMCR), and Pitch Adjustment Rate (PAR). The HMCR is defined as the probability of selecting a component from the present HM members, and the PAR determines the probability of mutating a candidate selected from the HM. The steps of the HS procedure are as follows:

Step 1. Initialize the harmony memory (HM). The initial HM consists of a given number of randomly generated solutions to the optimization problem under consideration.
Step 2. Improvise a new harmony from the HM.
Step 3. Update the HM. If the new harmony is better than the worst harmony in the HM, include the new harmony in the HM and exclude the worst harmony from it.
Step 4. If the stopping criteria are not satisfied, go to Step 2.

HS has been successfully applied to various discrete optimization problems such as the maximum clique problem (Afkhami, Ma & Soleimani, 2013), the traveling salesperson problem (Geem, Kim & Loganathan, 2001), tour routing (Geem, Tseng & Park, 2005), water network design (Geem, 2006), dynamic relocation of mobile base stations in wireless sensor networks (Moh'd Alia, 2017), and others.
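A minimal Python skeleton of this procedure, specialized to binary solution vectors, is given below. It mirrors Steps 1 to 4 above but is only an illustrative sketch: the cost function is assumed to return the classification error of the rule subset encoded by a 0/1 vector, the bit-flip pitch adjustment merely stands in for the operators of the binary variant adopted later, and the stand-in cost in the usage example is hypothetical.

import random

def harmony_search(n_vars, cost, hms=100, hmcr=0.75, par=0.1,
                   n_new=20, iters=20):
    # Step 1: fill the harmony memory with random 0/1 vectors.
    hm = [[random.randint(0, 1) for _ in range(n_vars)]
          for _ in range(hms)]
    costs = [cost(h) for h in hm]
    for _ in range(iters):                       # Step 4: stopping criterion
        for _ in range(n_new):
            # Step 2: improvise a new harmony component by component.
            new = []
            for j in range(n_vars):
                if random.random() < hmcr:       # memory consideration
                    bit = random.choice(hm)[j]
                    if random.random() < par:    # pitch adjustment (flip)
                        bit = 1 - bit
                else:                            # random selection
                    bit = random.randint(0, 1)
                new.append(bit)
            # Step 3: replace the worst member if the new one is better.
            c = cost(new)
            worst = max(range(hms), key=costs.__getitem__)
            if c < costs[worst]:
                hm[worst], costs[worst] = new, c
    best = min(range(hms), key=costs.__getitem__)
    return hm[best], costs[best]

# Stand-in cost that simply favors small rule subsets (hypothetical).
subset, err = harmony_search(100, cost=lambda v: sum(v) / len(v))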
In binary HS, the size of each solution equals the number of candidate rules. For example, if the Apriori algorithm produces 100 rules that satisfy the Minsup and Minconf conditions, then the size of each solution in HS will be 100. Each solution consists of a binary vector of rule incidences, indicating exclusion (0) or inclusion (1) of each rule in the combination. The standard HS is not suitable for binary representations, because the pitch adjusting operator cannot perform a local search in the binary space. Therefore, we used the binary implementation of HS proposed by Afkhami, Ma & Soleimani (2013). We ran the HS algorithm with the following parameters: maximum number of iterations = 20, harmony memory size = 100, number of new harmonies = 20, harmony memory consideration rate = 0.75.

We used Harmony Search, a music-inspired stochastic search algorithm, for selecting the best subset of rules as a classifier. One of the important parts of any meta-heuristic algorithm is the calculation of the cost function. For this purpose, we apply a modified version of the CBA algorithm to the selected rules and calculate the error rate of applying the resulting rules to the training and validation data. Finally, the solution with the minimum cost value is selected, and this solution (a subset of rules) is applied to the test data. Note that this flow covers one fold of cross-validation; in K-fold cross-validation it must be repeated K times, until all the samples in the dataset have been used as test data. The pseudocode of the proposed method is shown in Table 1.

Table 1: Pseudocode of the proposed method. The pseudocode assumes that training input/output, validation input/output and test input are available for each fold, and shows how we build a rule-based classifier and determine the test data output.

For i = 1 to K folds
    Determine Traininput, Trainoutput, Testinput, Testoutput, Valinput and Valoutput
    Finalrules = {}
    For j = 1 to number_class
        Rules_j = apply the Apriori algorithm(Traininput, Minsup_j, Minconf_j, class j)
        Finalrules = append Rules_j to Finalrules
    End % for j
    Selected_rules = apply the Harmony Search algorithm(Finalrules, Traininput, Trainoutput, Valinput, Valoutput)
    Testoutput = apply Selected_rules on Testinput
End % for i

The time complexity of the Apriori algorithm and association rule mining is a critical challenge that must be considered (Cano, Luna & Ventura, 2013; Cano, Zafra & Ventura, 2014; Luna et al., 2016; Thabtah, Cowling & Hammoud, 2006). As its time complexity is exponential, some preprocessing can decrease the running time. First, we can apply feature selection before applying the Apriori algorithm; feature selection can be done before or after discretization. The second option is related to the size of the rules: as small rules are favorable, we can limit the number of items that appear in a rule and consequently decrease the running time of the Apriori algorithm, as illustrated below.
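As an example of the second option, the snippet below caps the itemset length when mining frequent itemsets. It uses the third-party mlxtend library purely for illustration (our implementation was in MATLAB), and the toy item matrix and minimum support value are hypothetical.

import pandas as pd
from mlxtend.frequent_patterns import apriori

# Boolean item matrix: one column per binarized item, as described above.
df = pd.DataFrame({"a":       [1, 1, 0, 1],
                   "b":       [1, 0, 1, 1],
                   "class=1": [1, 1, 0, 1]}).astype(bool)

# max_len caps the number of items per frequent itemset, so only short
# rules can be derived and the exponential search space shrinks.
frequent = apriori(df, min_support=0.3, use_colnames=True, max_len=3)
print(frequent)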
RESULTS
We applied the proposed method to seventeen benchmark datasets and compared its results with those of traditional association rule classification algorithms: the CPAR, CBA and C4.5 algorithms, which are well known in rule-based classification (Ma & Liu, 1998; Quinlan, 1993; Yin & Han, 2003). The characteristics of the datasets used are shown in Table 2. We selected datasets with a variety of sizes in samples, attributes and number of classes. To run the experiments, stratified five-fold cross-validation was used to produce a reliable accuracy. Cross-validation is a standard evaluation measure for calculating the error rate on data in machine learning. At each run, we split each dataset into five parts: three parts for training, one part for validation and one part for testing. To increase reliability, the experiments for each dataset were repeated 10 times and the averages of the results are reported.

Table 2: Description of the datasets used. Each row shows the specifications of a dataset: its name, number of data items, number of features, number of classes and class distribution.

id | Dataset | # of data items | # of features | # of classes | Distribution
1 | Iris | 150 | 4 | 3 | 50-50-50
2 | Galaxy | 323 | 4 | 7 | 51-28-46-38-80-45-35
3 | Wine | 178 | 13 | 3 | 59-71-48
4 | Tictactoe | 958 | 9 | 2 | 626-332
5 | SAHeart | 462 | 9 | 2 | 160-302
6 | Car | 1,728 | 6 | 4 | 1,210-384-65-69
7 | Breast cancer | 699 | 19 | 2 | 458-241
8 | Yeast | 1,484 | 8 | 10 | 244-429-463-44-35-51-163-30-20-5
9 | Balance scale | 625 | 4 | 3 | 49-288-288
10 | Lymphography | 148 | 18 | 4 | 2-81-61-4
11 | Haberman | 306 | 3 | 2 | 225-81
12 | Mammographic | 830 | 5 | 2 | 427-403
13 | Phoneme | 5,404 | 5 | 2 | 3,818-1,586
14 | Pima | 768 | 8 | 2 | 267-500
15 | German | 1,000 | 20 | 2 | 700-300
16 | Monks-2 | 432 | 6 | 2 | 142-290
17 | Monks-3 | 432 | 6 | 2 | 228-204

The results of the proposed method are shown in Table 3.

Table 3: Experimental results. Each row shows the accuracy (%) of applying the four rule-based classification algorithms to a dataset; the last row gives each algorithm's mean rank.

Id | Dataset | Decision tree | CBA | CPAR | Proposed method
1 | Iris | 91.33 | 94.6 | 94.67 | 96.67
2 | Galaxy | 68.73 | 56.66 | 60.37 | 61.12
3 | Wine | 86.52 | 95.12 | 94.38 | 97.19
4 | Tictactoe | 91.22 | 80.48 | 97.39 | 92.81
5 | SAHeart | 64.94 | 66.45 | 71.21 | 73.43
6 | Car | 94.68 | 73.38 | 95.78 | 80.22
7 | Breast cancer | 92.85 | 85.69 | 95.42 | 96.28
8 | Yeast | 54.99 | 52.96 | 57.61 | 56.27
9 | Balance scale | 77.44 | 72.20 | 71.36 | 73.76
10 | Lymphography | 73.65 | 57.43 | 83.11 | 74.32
11 | Haberman | 71 | 73 | 73.84 | 75.16
12 | Mammographic | 81.08 | 80.81 | 81.49 | 83.20
13 | Phoneme | 85.79 | 70.65 | 70.73 | 77.22
14 | Pima | 71.08 | 71 | 64.84 | 72.01
15 | German | 73.1 | 67 | 71.40 | 68.7
16 | Monks-2 | 73.61 | 60.49 | 80.79 | 64.58
17 | Monks-3 | 100 | 100 | 100 | 100
Mean rank | | 2.55 | 3.5 | 2.14 | 1.79

As the results show, the decision tree gains the best accuracy on four datasets, the CPAR algorithm has the highest accuracy on five datasets, and our proposed method is the best on seven of the seventeen datasets. On one dataset, all algorithms are perfect and gain equal accuracy. The CBA algorithm is not the best on any of the datasets, and on all of the datasets our proposed method outperformed the CBA algorithm. It must be noted that the results of the decision tree, CBA and CPAR algorithms were reproduced on the same partitions.

We used the Friedman test (Friedman, 1940) as an appropriate choice for comparing multiple classification algorithms (Brazdil & Soares, 2000; Demšar, 2006). The Friedman test is a non-parametric statistical test developed by Milton Friedman (Friedman, 1937; Friedman, 1940). Similar to the parametric repeated-measures ANOVA, it is used to detect differences between groups when the dependent variable being measured is ordinal. Note that the classification algorithms are ranked on each of the datasets and then the Friedman test is applied. The last row of Table 3 shows the mean rank of each algorithm. As the results show, the proposed method gained the best position and CBA the worst one. The results also show that there is an overall statistically significant difference between the mean ranks of the algorithms (P = 0.0005).
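The test can be reproduced from the accuracies in Table 3 with a few lines of Python using scipy's implementation of the Friedman test (the original analysis was carried out in MATLAB, so tie handling may cause small numerical differences from the reported P = 0.0005).

from scipy.stats import friedmanchisquare

# Accuracies copied from Table 3, one entry per dataset, in row order.
tree     = [91.33, 68.73, 86.52, 91.22, 64.94, 94.68, 92.85, 54.99, 77.44,
            73.65, 71, 81.08, 85.79, 71.08, 73.1, 73.61, 100]
cba      = [94.6, 56.66, 95.12, 80.48, 66.45, 73.38, 85.69, 52.96, 72.20,
            57.43, 73, 80.81, 70.65, 71, 67, 60.49, 100]
cpar     = [94.67, 60.37, 94.38, 97.39, 71.21, 95.78, 95.42, 57.61, 71.36,
            83.11, 73.84, 81.49, 70.73, 64.84, 71.40, 80.79, 100]
proposed = [96.67, 61.12, 97.19, 92.81, 73.43, 80.22, 96.28, 56.27, 73.76,
            74.32, 75.16, 83.20, 77.22, 72.01, 68.7, 64.58, 100]

stat, p = friedmanchisquare(tree, cba, cpar, proposed)
print(stat, p)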
The accuracies reported by other studies may differ from ours for some algorithms. One of the main reasons for this discrepancy is that we had no information about their discretization algorithm, in particular the number of ranges used to discretize continuous attributes. Using different discretization approaches can produce different outputs.

DISCUSSION
This research has focused on the application of computational intelligence in association rule mining-based classifiers. Although rule-based classification algorithms have high classification accuracy, some of them suffer from a critical limitation: they use a heuristic approach for selecting a subset of rules for building a classifier, and the selected rules may not be the best subset of the possible rules. Another challenge of existing algorithms is related to rare classes. With greedy approaches, the resulting rules are biased toward prevalent classes, and classifying rare instances is a major problem.

We combined the Apriori, CBA and Harmony Search algorithms in order to build a rule-based classifier with high prediction accuracy. We used the Apriori algorithm with multiple Minsup values for rule generation. Since the number of rules that satisfy the Minsup and Minconf conditions is high and considering every subset of rules is not feasible, we applied the Harmony Search algorithm to find the best subset of rules that can be used as a classifier. Harmony Search (HS) is a relatively simple yet very efficient evolutionary algorithm. One of the main components of every population-based algorithm is the cost function: for every solution (subset of selected rules) we applied a modified version of the CBA algorithm to the training and validation data and assigned the resulting error rate to the cost function. The statistical and experimental results of applying the proposed method to seventeen benchmark datasets demonstrate that, in general, our proposed method outperformed well-known algorithms such as the C4.5 decision tree, CBA and CPAR.

One limitation of the proposed method is that it does not attain proper accuracy on datasets with a large number of classes. Another limitation of our study is that we used the accuracy measure for comparing the algorithms; measures such as precision and recall would better reflect the benefits of the proposed method. Our aim in future work is to tackle these problems.

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The authors received no funding for this work.

Competing Interests
The authors declare there are no competing interests.

Author Contributions
• Hesam Hasanpour conceived and designed the experiments, performed the experiments, analyzed the data, contributed reagents/materials/analysis tools, prepared figures and/or tables, performed the computation work, authored or reviewed drafts of the paper, approved the final draft.
• Ramak Ghavamizadeh Meibodi authored or reviewed drafts of the paper, approved the final draft.
• Keivan Navi authored or reviewed drafts of the paper, approved the final draft.

Data Availability
The following information was supplied regarding data availability:
This work uses standard benchmark datasets that can be downloaded from the following addresses: https://sci2s.ugr.es/keel/datasets.php; https://archive.ics.uci.edu/ml/index.php. The MATLAB code is available as a Supplemental File.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.188#supplemental-information.

REFERENCES
Afkhami S, Ma OR, Soleimani A. 2013. A binary Harmony Search algorithm for solving the maximum clique problem. International Journal of Computer Applications 69(12):38–43.
Agrawal R, Imieliński T, Swami A. 1993. Mining association rules between sets of items in large databases. In: Proceedings of the 1993 ACM SIGMOD international conference on management of data, Washington DC, 25-28 May 1993. New York: ACM, 207–216 DOI 10.1145/170035.170072.
Agrawal R, Mannila H, Srikant R, Toivonen H, Verkamo AI. 1996. Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining 12:307–328.
Brazdil PB, Soares C. 2000. A comparison of ranking methods for classification algorithm selection. In: European conference on machine learning. Springer, 63–75.
Burdick D, Calimlim M, Gehrke J. 2001. MAFIA: a maximal frequent itemset algorithm for transactional databases. In: Proceedings of the 17th international conference on data engineering. Piscataway: IEEE, 443–452.
Cano A, Luna JM, Ventura S. 2013. High performance evaluation of evolutionary-mined association rules on GPUs. The Journal of Supercomputing 66:1438–1461 DOI 10.1007/s11227-013-0937-4.
Cano A, Zafra A, Ventura S. 2013. An interpretable classification rule mining algorithm. Information Sciences 240:1–20 DOI 10.1016/j.ins.2013.03.038.
Cano A, Zafra A, Ventura S. 2014. Parallel evaluation of Pittsburgh rule-based classifiers on GPUs. Neurocomputing 126:45–57 DOI 10.1016/j.neucom.2013.01.049.
Chen W-C, Hsu C-C, Hsu J-N. 2012. Adjusting and generalizing CBA algorithm to handling class imbalance. Expert Systems with Applications 39:5907–5919 DOI 10.1016/j.eswa.2011.11.113.
Demšar J. 2006. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research 7:1–30.
Friedman M. 1937. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. Journal of the American Statistical Association 32:675–701 DOI 10.1080/01621459.1937.10503522.
Friedman M. 1940. A comparison of alternative tests of significance for the problem of m rankings. The Annals of Mathematical Statistics 11:86–92 DOI 10.1214/aoms/1177731944.
Geem ZW. 2006. Optimal cost design of water distribution networks using harmony search. Engineering Optimization 38:259–277 DOI 10.1080/03052150500467430.
Geem ZW, Kim JH, Loganathan GV. 2001. A new heuristic optimization algorithm: harmony search. Simulation 76:60–68 DOI 10.1177/003754970107600201.
Geem ZW, Tseng C-L, Park Y. 2005.
Harmony search for generalized orienteering problem: best touring in China. In: International conference on natural computation. Springer, 741–750.
Ilayaraja M, Meyyappan T. 2013. Mining medical data to identify frequent diseases using Apriori algorithm. In: Proceedings of the 2013 international conference on pattern recognition, informatics and mobile engineering (PRIME). Piscataway: IEEE, 194–199.
Jovanoski V, Lavrač N. 2001. Classification rule learning with APRIORI-C. In: Portuguese conference on artificial intelligence. Springer, 44–51.
Li W, Han J, Pei J. 2001. CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the 2001 IEEE international conference on data mining (ICDM). Piscataway: IEEE, 369–376.
Li J, Shen H, Topor R. 2002. Mining the optimal class association rule set. Knowledge-Based Systems 15:399–405 DOI 10.1016/S0950-7051(02)00024-2.
Luna JM, Cano A, Pechenizkiy M, Ventura S. 2016. Speeding-up association rule mining with inverted index compression. IEEE Transactions on Cybernetics 46:3059–3072 DOI 10.1109/TCYB.2015.2496175.
Ma BLWHY, Liu B. 1998. Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining.
Mazid MM, Ali AS, Tickle KS. 2009. A comparison between rule based and association rule mining algorithms. In: Proceedings of the 2009 third international conference on network and system security (NSS). Piscataway: IEEE, 452–455.
Moh'd Alia O. 2017. Dynamic relocation of mobile base station in wireless sensor networks using a cluster-based harmony search algorithm. Information Sciences 385:76–95.
Nahar J, Imam T, Tickle KS, Chen Y-PP. 2013. Association rule mining to detect factors which contribute to heart disease in males and females. Expert Systems with Applications 40:1086–1093 DOI 10.1016/j.eswa.2012.08.028.
Nahar J, Tickle KS, Ali AS, Chen Y-PP. 2011. Significant cancer prevention factor extraction: an association rule discovery approach. Journal of Medical Systems 35:353–367 DOI 10.1007/s10916-009-9372-8.
Quinlan JR. 1993. C4.5: programs for machine learning. Burlington: Morgan Kaufmann.
Sarno R, Dewandono RD, Ahmad T, Naufal MF, Sinaga F. 2015. Hybrid association rule learning and process mining for fraud detection. International Journal of Computer Science 42(2):59–72.
Scheffer T. 2001a. Finding association rules that trade support optimally against confidence. In: European conference on principles of data mining and knowledge discovery. Berlin, Heidelberg: Springer, 424–435.
Scheffer T. 2001b. Finding association rules that trade support optimally against confidence. Principles of Data Mining and Knowledge Discovery 424–435 DOI 10.1007/3-540-44794-6_35.
Shin AM, Lee IH, Lee GH, Park HJ, Park HS, Yoon KI, Lee JJ, Kim YN. 2010. Diagnostic analysis of patients with essential hypertension using association rule mining. Healthcare Informatics Research 16:77–81 DOI 10.4258/hir.2010.16.2.77.
Sun Y, Wong AK, Kamel MS. 2009.
Classification of imbalanced data: a review. International Journal of Pattern Recognition and Artificial Intelligence 23:687–719 DOI 10.1142/S0218001409007326.
Thabtah F, Cowling P, Hammoud S. 2006. Improving rule sorting, predictive accuracy and training time in associative classification. Expert Systems with Applications 31:414–426 DOI 10.1016/j.eswa.2005.09.039.
Thabtah FA, Cowling P, Peng Y. 2004. MMAC: a new multi-class, multi-label associative classification approach. In: Proceedings of the fourth IEEE international conference on data mining (ICDM'04). Piscataway: IEEE, 217–224.
Tsai C-J, Lee C-I, Yang W-P. 2008. A discretization algorithm based on class-attribute contingency coefficient. Information Sciences 178:714–731 DOI 10.1016/j.ins.2007.09.004.
Wang Y, Wong AKC. 2003. From association to classification: inference using weight of evidence. IEEE Transactions on Knowledge and Data Engineering 15:764–767 DOI 10.1109/TKDE.2003.1198405.
Wang K, Zhou S, He Y. 2000. Growing decision trees on support-less association rules. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining. New York: ACM, 265–269.
Yin X, Han J. 2003. CPAR: classification based on predictive association rules. In: Proceedings of the 2003 SIAM international conference on data mining. Philadelphia: SIAM, 331–335.