Mining the data from a hyperheuristic approach using associative classification

Fadi Thabtah a,*, Peter Cowling b

a Department of Computing and Engineering, University of Huddersfield, Huddersfield, UK
b MOSAIC Research Centre, University of Bradford, Bradford, UK

* Corresponding author. E-mail addresses: F.Thabtah@hud.ac.uk (F. Thabtah), P.I.Cowling@bradford.ac.uk (P. Cowling).

Abstract

Associative classification is a promising classification approach that utilises association rule mining to construct accurate classification models. In this paper, we investigate the potential of associative classifiers, as well as other traditional classifiers such as decision trees and rule inducers, on solutions (data sets) produced by a general-purpose optimisation heuristic called the hyperheuristic for a personnel scheduling problem. The hyperheuristic requires us to decide which of several simpler search neighbourhoods to apply at each step while constructing a solution. After experimenting with 16 different solutions generated by a hyperheuristic called Peckish using different classification approaches, the results indicated that the associative classification approach is the most applicable approach to this kind of problem with reference to accuracy. In particular, associative classification algorithms such as CBA, MCAR and MMAC were able to predict the selection of low-level heuristics from the data sets more accurately than the C4.5, RIPPER and PART algorithms, respectively. © 2007 Elsevier Ltd. All rights reserved.

Keywords: Associative classification; Classification; Data mining; Hyperheuristic; Scheduling

1. Introduction

Heuristic and metaheuristic approaches have been applied widely in personnel-scheduling problems (Blum & Roli, 2003; Cowling, Kendall, & Han, 2002). A metaheuristic could be defined as a recursive process which directs a simpler local search method by using different concepts for exploring and exploiting the search space in order to achieve good enough solutions (Blum & Roli, 2003). There are several different metaheuristic strategies for solving scheduling and optimisation problems, such as local search, tabu search, simulated annealing and variable neighbourhood search. Hamiez and Hao (2001) have used a tabu search-based method to solve the sports league scheduling problem (SLSP).
Their implementation of the enhanced tabu search algorithm was able to schedule a timetable for up to 40 teams, and its performance in terms of CPU time was excellent compared with previous algorithms, such as that of McAloon, TretKoff, and Wetzel (1997), which had been used for solving the same problem. Aickelin and Dowsland (2000) have used genetic algorithms to deal with a nurse rostering problem in major UK hospitals, and Hansen and Mladenovic (1997) have shown that variable neighbourhood search is an effective approach for solving optimisation problems, generating good or sometimes near-optimal solutions in a moderate time.

Cowling, Kendall, and Soubeiga (2000) and Cowling et al. (2002) argued that metaheuristic and heuristic approaches tend to be knowledge rich and require extensive experience in the problem domain and the selected heuristic techniques, and therefore they are expensive in terms of their implementation. A new general framework to deal with large and complex optimisation and scheduling problems, called a hyperheuristic, has been proposed by Cowling et al. (2000). It tends to robustly find good solutions for large and complex scheduling problems and has been proven to be effective in many experiments (Cowling & Chakhlevitch, 2003; Cowling et al., 2002).

A hyperheuristic approach can be described as a supervisor, which controls the choice of which local search neighbourhood to choose while constructing a solution/schedule. A local search neighbourhood, also known as a low-level heuristic, is a rule or a simple method that generally yields a small change in the schedule. Often these low-level heuristics are based on normal methods of constructing a schedule, such as adding an event, deleting an event or swapping two events. Fig. 1 represents the general hyperheuristic framework, which at each iteration selects and applies the low-level heuristic that has the largest improvement on the objective function, i.e. LLH5 in the figure. The arrows going from and to the hyperheuristic in Fig. 1 represent the selected low-level heuristics' improvement values on the objective function, obtained after the hyperheuristic has tried them.

Fig. 1. Hyperheuristic general framework.

The training scheduling problem that we consider in this paper is a complex optimisation problem for a large financial service company (Cowling et al., 2002). It involves a number of events, trainers, and locations to be scheduled over a period of time. The task is to create a timetable of courses distributed over several locations in a specific period of time using a known number of trainers. A more detailed description of the problem is presented in the next section.

In this paper, our aim is to determine an applicable data mining technique for the problem of deciding which low-level heuristic to apply in a given situation, using information about heuristic performance derived earlier. In particular, we would like to answer questions like: which learning algorithm can derive knowledge that could direct the search in order to produce good solutions?
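To make the selection step of Fig. 1 concrete, the following is a minimal sketch of one hyperheuristic iteration of the kind described above, combined with the random fallback used by the hybrid (Peckish) approach discussed in Section 2. The function names (the callable low-level heuristics and the `objective` function) are illustrative assumptions, not the authors' implementation.

```python
import random

def hyperheuristic_step(schedule, low_level_heuristics, objective):
    """One iteration: try every low-level heuristic on the current schedule
    and apply the one giving the largest improvement in the objective
    function; if none improves it, fall back to a random choice
    (the greedy-plus-random behaviour described in the paper)."""
    base_value = objective(schedule)
    candidates = []
    for llh in low_level_heuristics:
        new_schedule = llh(schedule)          # each LLH returns a modified copy
        candidates.append((objective(new_schedule) - base_value, new_schedule))

    best_gain, best_schedule = max(candidates, key=lambda c: c[0])
    if best_gain > 0:
        return best_schedule                  # greedy choice
    return random.choice(candidates)[1]       # escape when nothing improves
```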
To achieve our goal, we compare three associative classification techniques, CBA (Liu, Hsu, & Ma, 1998), MCAR (Thabtah, Cowling, & Peng, 2005) and MMAC (Thabtah, Cowling, & Peng, 2004), and two popular traditional classification techniques (PART (Frank & Witten, 1998) and RIPPER (Cohen, 1995)) on data sets generated using a hybrid hyperheuristic called Peckish (Cowling & Chakhlevitch, 2003) for the trainer scheduling problem. We analyse data from several solutions of the Peckish hyperheuristic, which combines greedy (best first) and random approaches. We identify that the learning task involves classifying low-level heuristics in terms of whether they improved the objective function in old solutions, in order to produce useful rules. These rules will then be used to decide the class attribute "low-level heuristic" while constructing new solutions. We use the classification algorithms mentioned above to learn the rules.

The training scheduling problem and the different hyperheuristic approaches utilised to solve it are discussed in Section 2. Section 3 is devoted to the applicability of data mining classification algorithms to predict the behaviour of low-level heuristics used by the Peckish hyperheuristic. Data sets, their features and experimental results are presented in Section 4, and finally conclusions are given in Section 5.

2. The training scheduling problem and hyperheuristics

A much simpler version of the training scheduling problem has been solved in Cowling et al. (2002) using a Hyper-Genetic algorithm. A larger and more complex problem, which has been described in Cowling and Chakhlevitch (2003), is summarised in this section. It involves a number of events, trainers, and locations to be scheduled over a period of time. The task is to create a timetable of geographically distributed courses over a period of time using different trainers, and the aim is to maximise the total priority of courses and to minimise the amount of travel for each trainer. The problem is associated with a large number of constraints such as:

• Each event is to be scheduled at one location from the available number of locations.
• Each event must start within a specified time period.
• Each event can occur at most once.
• Each event must be delivered by competent trainers from the available trainers.
• Each location has a limited number of rooms, and rooms have different capacities and capabilities.

The data used to build the solutions of the training scheduling problem is real data provided by a financial firm, where training is given by 50 trainers over a period of 3 months in 16 different locations. Further, there are about 200 events to be scheduled and 95 different low-level heuristics that can be used to build each solution. However, the solutions given to us by the authors of Cowling and Chakhlevitch (2003) have been constructed using only 10 low-level heuristics, where each low-level heuristic represents a local search neighbourhood. For example, selecting a location with the lowest possible travel penalty for a particular trainer to deliver a course as early as possible corresponds to a low-level heuristic.

In solving the trainer scheduling problem, three hyperheuristic approaches, random, greedy and hybrid, have been used. All of these approaches aim to manage the choice of which low-level heuristics to choose during the process of building the solution. The random approach
selects a low-level heuristic at random from the available ones in the problem. At each choice point in the search space, all low-level heuristics commonly have an equal chance of being selected. On the other hand, the greedy approach selects the low-level heuristic that yields the biggest improvement in the objective function. If none of the available ones improve the objective function, then the algorithm will be trapped in a local optimum. The hybrid approach is named Peckish and consists of a combination of the greedy and random approaches. It builds a solution by selecting and applying a single low-level heuristic during each iteration of the search. The choice is based on the low-level heuristic that has the largest improvement on the objective function in the problem (if one exists). In the case that none of the available low-level heuristics improve upon the objective function value, the choice is random.

In this paper, we choose a low-level heuristic from a candidate list of good low-level heuristics. By changing the length of this candidate list and considering how it is merged, we can trade off the degree of greediness and randomness in the Peckish hyperheuristic. As a result, several different solutions produced by the Peckish hyperheuristic are investigated.

We analysed the strategy used by the Peckish hyperheuristic to construct a solution and observed that all available low-level heuristics in the problem must be tested in order to record their effect on the objective function at each iteration, while only a single one is applied. Data mining could provide a much quicker prediction of effective low-level heuristics at each iteration. In the next section, we investigate some of the popular data mining techniques for learning the sets of low-level heuristics that improve the objective function and have been applied by the Peckish hyperheuristic.

3. Data mining for the selection of low-level heuristics

Since we are aiming to use knowledge derived from old solutions of the problem, data mining seems an appropriate technique to extract that knowledge. The next task is to identify which data mining method is applicable to extract knowledge from solutions generated by the Peckish hyperheuristic. As mentioned earlier, the Peckish hyperheuristic usually selects and applies the low-level heuristic that leads to the largest improvement on the objective function, and this is the class we want to find. In other words, we can learn rules that predict the performance of low-level heuristics in some solution runs and use these rules to forecast which low-level heuristics the hyperheuristic should choose in other runs. Since we are predicting a particular attribute (low-level heuristic), supervised learning approaches such as classification are appropriate.

There are many classification approaches for extracting knowledge from data that have been studied in the literature, e.g. Cendrowska (1987), Quinlan (1993) and Cohen (1995). Three common approaches, divide-and-conquer (Quinlan, 1987), rule induction (Cohen, 1995; Furnkranz & Widmer, 1994) and associative classification (Li, Han, & Pei, 2001; Liu et al., 1998), have been selected for our base comparison. Further, five classification techniques related to these approaches have been compared, which are PART (Frank & Witten, 1998), RIPPER (Cohen, 1995), CBA (Liu et al., 1998), MCAR (Thabtah et al., 2005) and MMAC (Thabtah et al., 2004).
Our choice of these methods is based on the different schemes they use in learning rules from data sets. In the next subsection, we briefly survey these algorithms.

3.1. Associative classification

Associative classification techniques employ association rule discovery methods to find the rules. This approach was introduced in 1997 by Ali, Manganaris, and Srikant (1997) to produce rules for describing relationships between attribute values and the class attribute, and not for prediction, which is the ultimate goal of classification. In 1998, associative classification was successfully employed to build classification models (classifiers) by Liu et al. (1998), and it later attracted many researchers, e.g. Yin and Han (2003), from the data mining and machine learning communities. In this subsection we survey the associative classification techniques used in this paper to generate rules from the hyperheuristic data.

3.1.1. Classification based on association (CBA)

The idea of using association rule mining in classification problems was first introduced in Liu et al. (1998), in which an algorithm called CBA is proposed, which operates in three main steps. Firstly, if the intended data set contains any real or integer attributes, it is discretised using the multi-interval discretisation method of Fayyad and Irani (1993). Secondly, the Apriori candidate generation step (Agrawal & Srikant, 1994) is adopted to find the potential rules. The Apriori candidate generation method necessitates multiple passes, where the potential rules found in the previous pass are used for the generation of potential rules in the current pass. These repetitive scans require high CPU time and main memory. Once all potential rules are produced, the subset that leads to the least error rate against the training data set is selected to form the classifier. The selection of this subset is accomplished using the database coverage heuristic, which ensures that every rule in the classifier must correctly cover at least one training data object.

3.1.2. MCAR: multi-class classification based on association rule

A recently developed associative classification algorithm called MCAR (Thabtah et al., 2005) employs tid-list intersections to quickly find the rules. This algorithm consists of two main phases: rule generation and classifier building. In the first phase, the training data set is scanned once to discover the potential rules of size one, and then MCAR intersects the tid-lists of the potential rules of size one to find potential rules of size two, and so forth. This rule discovery method does not require passing over the training data multiple times. In the second phase, the rules created are used to build a classifier by considering their effectiveness on the training data set. Potential rules that cover a certain number of training objects will be kept in the final classifier. Finally, MCAR improves upon previous rule ranking approaches in associative classification, which are based on (confidence, support, rule length), by looking at the class distribution frequencies in the training data and preferring rules that are associated with dominant classes. Experimental results showed that the MCAR rule ranking method reduces random rule selection during the process of ranking the rules, especially for dense classification data.
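The tid-list idea behind MCAR can be illustrated with a short sketch: every attribute value and every class keeps the set of training-row identifiers (tids) in which it occurs, and larger candidate rules are obtained by intersecting these sets instead of rescanning the data. This is a simplified illustration of the general technique under assumed data structures and thresholds, not the authors' MCAR implementation.

```python
from collections import defaultdict
from itertools import combinations

def tid_lists(rows, class_index=-1):
    """Map each (attribute, value) item and each class label to the set of
    row ids (tids) in which it occurs; rows are tuples with the class last."""
    item_tids, class_tids = defaultdict(set), defaultdict(set)
    for tid, row in enumerate(rows):
        for attr, value in enumerate(row[:class_index]):
            item_tids[(attr, value)].add(tid)
        class_tids[row[class_index]].add(tid)
    return item_tids, class_tids

def rules_of_size_two(rows, minsupp=0.02, minconf=0.30):
    """Form size-2 antecedents by intersecting the tid-lists of size-1 items,
    then derive <antecedent -> class> rules passing minsupp and minconf."""
    item_tids, class_tids = tid_lists(rows)
    n, rules = len(rows), []
    for (i1, t1), (i2, t2) in combinations(item_tids.items(), 2):
        if i1[0] == i2[0]:              # items must come from different attributes
            continue
        tids = t1 & t2                  # tid-list intersection, no data rescan
        if len(tids) / n < minsupp:
            continue
        for cls, ctids in class_tids.items():
            support = len(tids & ctids) / n
            confidence = len(tids & ctids) / len(tids)
            if support >= minsupp and confidence >= minconf:
                rules.append(((i1, i2), cls, support, confidence))
    return rules
```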
3.1.3. Multi-class, multi-label associative classification (MMAC)

The MMAC algorithm consists of three steps: rule generation, recursive learning and classification. It passes over the training data in the first step to discover and generate a complete set of rules. Training instances that are associated with the produced rules are discarded. In the second step, MMAC proceeds to discover more rules that pass the user-predefined thresholds, denoted minimum support and minimum confidence, from the remaining unclassified instances, until no further potential rules can be found. Finally, the rule sets derived during each iteration are merged to form a global multi-label classifier that is then tested against test data. The distinguishing feature of MMAC is its ability to generate rules with multiple classes from data sets where each data object is associated with just a single class. This provides decision makers with useful knowledge discarded by other current associative classification algorithms.

3.2. Traditional classification approaches

3.2.1. C4.5

The C4.5 algorithm was created by Quinlan (1993) as a decision tree method for extracting rules from a data set. C4.5 is an extension of the ID3 algorithm (Quinlan, 1979), which accounts for missing values, continuous attributes and pruning of decision trees. As in the ID3 algorithm, C4.5 uses information gain to select the root attribute. The algorithm selects a root attribute from the ones available in the training data set. C4.5 makes the selection of the root based on the most informative attribute, and the process of selecting an attribute is repeated recursively at the so-called child nodes of the root, excluding the attributes that have been chosen before, until the remaining training data objects cannot be split any more (Quinlan, 1979). At that point, a decision tree is output in which each node corresponds to an attribute and each arc to a possible value of that attribute. Each path from the root node to any given leaf in the tree corresponds to a rule. One of the major extensions of the ID3 algorithm that C4.5 proposed is that of pruning. Two known pruning methods used by C4.5 to simplify the decision trees constructed are sub-tree replacement and pessimistic error estimation (Witten & Frank, 2000).

3.2.2. Repeated incremental pruning to produce error reduction algorithm (RIPPER)

RIPPER is a rule induction algorithm developed by Cohen (1995). It builds the rule set as follows: the training data set is divided into two sets, a pruning set and a growing set. RIPPER constructs the classifier using these two sets by repeatedly inserting rules, starting from an empty rule set. The rule-growing algorithm starts with an empty rule and heuristically adds one condition at a time until the rule has no error rate on the growing set. In fact, RIPPER is a refined version of an earlier algorithm called Incremental Reduced Error Pruning (IREP) (Furnkranz & Widmer, 1994) that adds a post-pruning heuristic on the rules. This heuristic has been applied to the classifier produced by IREP as an optimisation phase, aiming to simplify the rule set. For each rule r_i in the rule set, two alternative rules are built: the replacement of r_i and the revision of r_i. The replacement of r_i is created by growing an empty rule r′_i and then pruning it in order to reduce the error rate of the rule set including r′_i on the pruning data set. The revision of r_i is constructed similarly, except that the revision rule is built heuristically by adding one condition at a time to the original r_i rather than to an empty rule. Then the three rules are examined on the pruning data to select the rule with the least error rate. The integration of IREP and the optimisation procedure forms the RIPPER algorithm.
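To illustrate the rule-growing step described above, the following is a simplified sketch of growing a single rule by greedily adding one attribute test at a time until it makes no errors on the growing set. The data representation (rows as dictionaries) and the crude scoring heuristic are assumptions made for illustration only; they are not Cohen's actual RIPPER implementation, which grows rules using an information gain criterion and prunes them on the separate pruning set.

```python
def grow_rule(grow_set, target_class):
    """Greedily add (attribute, value) conditions to an initially empty rule
    until it covers no examples of other classes in the growing set.
    Each row is a dict of attribute -> value plus a 'class' entry."""
    rule = {}                                  # the empty rule covers everything
    covered = list(grow_set)
    while any(r["class"] != target_class for r in covered):
        best = None
        for row in covered:
            if row["class"] != target_class:
                continue
            for attr, value in row.items():
                if attr == "class" or attr in rule:
                    continue
                trial = [r for r in covered if r.get(attr) == value]
                pos = sum(r["class"] == target_class for r in trial)
                score = pos - (len(trial) - pos)   # crude positives-minus-negatives score
                if best is None or score > best[0]:
                    best = (score, attr, value, trial)
        if best is None:                       # no further condition can be added
            break
        _, attr, value, covered = best
        rule[attr] = value
    return rule
```

A RIPPER-style learner would then prune the grown rule on the pruning set and keep adding rules in this way until the remaining training examples of the target class are covered.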
3.2.3. PART

Unlike the C4.5 and RIPPER techniques, which operate in two phases, the PART algorithm generates rules one at a time while avoiding extensive pruning (Frank & Witten, 1998). The C4.5 algorithm employs a divide-and-conquer approach, and the RIPPER algorithm uses a rule induction approach to derive the rules. PART combines both approaches to find and generate rules. It adopts a rule induction approach to generate a set of rules and uses divide-and-conquer to build partial decision trees. The way PART builds and prunes a partial decision tree is similar to that of C4.5, but PART avoids constructing a complete decision tree and builds partial decision trees. PART differs from RIPPER in the way rules are created: in PART, each rule corresponds to the leaf with the largest coverage in the partial decision tree, whereas RIPPER builds the rule in a greedy fashion, starting from an empty rule and adding conditions until the rule has no error rate, and the process is repeated. Missing values and pruning techniques are treated in the same way as in C4.5.

4. Data and experimental results

4.1. Data sets and their features

Data from 16 different solutions produced by the Peckish hyperheuristic for the training scheduling problem were provided by the authors of Cowling and Chakhlevitch (2003). Each solution represents 500 iterations of applied low-level heuristics and is given in a text file. Twelve of the data files each represent only a single solution, whereas each of the remaining four files represents ten combined solutions. Ten different low-level heuristics (LLH 1, LLH 2, LLH 20, LLH 27, LLH 37, LLH 43, LLH 47, LLH 58, LLH 68, LLH 74) have been used to produce each solution. Each file consists of 15 different attributes and 5000 instances. One iteration in a single solution is shown in Table 1, where the last row indicates that low-level heuristic number 74 was applied by the Peckish hyperheuristic because it has the largest improvement on the objective function.

In Table 1, column LLH represents the low-level heuristic tested, and the ESM and RSM columns stand for the event and resource selection methods, respectively, which specify how an event is scheduled. SE indicates the selected event number to be scheduled. UE reflects whether another event conflicts with the currently scheduled event, i.e. they share the same trainer, location, or timeslot. EID corresponds to the event identification number, and R stands for rescheduled, which means that if it is possible to swap an unscheduled event with the selected event in the schedule, then the removed event can be rescheduled back into the schedule. OP, NP, OPE and NPE correspond to old priority, new priority, old penalty and new penalty, respectively. The new priority and new penalty values represent the total priority and penalty of the schedule after applying a low-level heuristic. The difference between the new priority and new penalty values gives the value of the objective function.
IMP stands for the amount of improvement and represents the change in the value of the objective function after a low-level heuristic has been applied. CPU is the search time taken for the low-level heuristic to be applied, and SC reflects whether or not the current schedule has been changed, i.e. whether the current low-level heuristic had any effect at all ('1' changed or '0' not changed). Finally, the AP column indicates whether or not the current low-level heuristic has been applied ('1' applied or '0' not applied) by the hyperheuristic.

Table 1
One iteration of the Peckish hyperheuristic for the training scheduling problem

LLH  ESM  RSM  SE   UE   EID  R    OP      NP      OPE  NPE  IMP  CPU   SC  AP
1    0    1    1    0    -1   -1   72,999  72,999  830  830  0    0.03  0   0
2    0    2    128  0    -1   -1   72,999  72,999  830  830  0    0.02  0   0
20   4    0    12   0    -1   61   72,999  72,999  830  830  0    0.02  0   0
27   0    2    36   -1   -1   -1   72,999  72,999  830  830  0    0     0   0
37   1    2    70   -1   -1   58   72,999  72,999  830  830  0    0     0   0
43   1    8    142  -1   -1   -1   72,999  72,999  830  830  0    0     0   0
47   2    2    22   -1   -1   -1   72,999  72,999  830  830  0    0     0   0
58   3    3    18   -1   -1   -1   72,999  72,999  830  830  0    0     0   0
68   4    3    25   -1   -1   -1   72,999  72,999  830  830  0    0     0   0
74   4    9    57   -1   -1   -1   72,999  73,099  830  830  100  0     1   1

After analysing the data in each file, we identified that six attributes have some correlation with the class attribute (LLH), namely OP, NP, OPE, NPE, IMP and AP. However, we are interested in learning rules that represent useful sequences of applied low-level heuristics at different iterations and that lead to improvement of the objective function. Therefore, the solutions generated by the Peckish hyperheuristic have been filtered to retain those iterations where improvements upon the objective function have occurred. Furthermore, a PL/SQL program has been designed and implemented to generate a new structure for each solution in order to enable the extraction of rules that represent the sequence of applied low-level heuristics. In other words, each training instance in the new structure should contain the low-level heuristics that improved the objective function in the current iteration, along with those applied in the previous iterations. Specifically, in each solution run and for each iteration, we record the low-level heuristics applied in the previous three iterations along with the ones that improved the objective function in the current iteration.

Table 2 represents part of a solution run (data) generated in the new structure after applying the PL/SQL program to the initial solution features, where columns LLH_3, LLH_2 and LLH_1 represent the low-level heuristics applied at the previous three iterations. Column LLH represents the current low-level heuristic that improved the objective function, and column Imp represents the improvement in the objective function value. Finally, column Apply indicates whether or not the selected low-level heuristic has been applied by the hyperheuristic. As shown in Table 2, the data generated by the hyperheuristic have multiple labels, since there could be more than one low-level heuristic that improves the objective function at any given iteration. Hence, each training instance in the scheduling data may be associated with more than one class.

Table 2
Sample of optimisation data

Iteration  LLH_3  LLH_2  LLH_1  LLH  Imp   Apply
1          58     68     58     20   100   1
1          58     68     58     43   25    0
2          20     27     43     74   50    1
2          20     27     43     2    10    0
2          20     27     43     58   8     0
3          43     27     58     68   875   1
4          37     20     2      37   1055  1
4          37     20     2      2    950   0
4          37     20     2      74   69    0
4          37     20     2      58   9     0
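The restructuring described above was implemented by the authors as a PL/SQL program. Purely as an illustration, a hypothetical Python sketch of the same transformation, producing rows like those in Table 2 and assuming the raw log is available as a list of per-iteration records with the iteration number and the LLH, IMP and AP fields of Table 1, could look as follows.

```python
from collections import defaultdict

def restructure(log):
    """Turn a raw solution log into rows of the form
    (LLH_3, LLH_2, LLH_1, LLH, Imp, Apply): the heuristics applied in the
    previous three iterations plus each heuristic that improved the
    objective function in the current iteration.
    `log` is a list of dicts with keys 'iteration', 'LLH', 'IMP', 'AP'."""
    by_iteration = defaultdict(list)
    for record in log:
        by_iteration[record["iteration"]].append(record)

    # the single low-level heuristic actually applied at each iteration
    applied = {it: next((r["LLH"] for r in recs if r["AP"] == 1), None)
               for it, recs in by_iteration.items()}

    rows = []
    for it in sorted(by_iteration):
        history = tuple(applied.get(it - k) for k in (3, 2, 1))
        if None in history:                 # need three previous applied heuristics
            continue
        for r in by_iteration[it]:
            if r["IMP"] > 0:                # keep only improving low-level heuristics
                rows.append(history + (r["LLH"], r["IMP"], r["AP"]))
    return rows
```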
4.2. Experimental results

In this section, we describe the experiments to evaluate the classification accuracy and rule features produced by different rule learning algorithms from the optimisation data sets. We have performed a number of experiments using ten-fold cross validation on 16 different data files derived by the Peckish hyperheuristic for the trainer scheduling problem. The sizes of the data files after applying the PL/SQL program vary. There are 12 data files, each containing 182–750 training instances, where each file represents only one single run of the Peckish hyperheuristic. The remaining four data files contain 1500–2400 training instances and represent ten combined solutions. Two popular traditional classification algorithms (PART, C4.5) and three associative classification techniques (CBA, MCAR, MMAC) have been compared in terms of accuracy. The experiments for PART and C4.5 have been conducted using the Weka software system (WEKA, 2000), and the CBA experiments have been performed using a VC++ version provided by the authors of CBA (1998). Finally, the MCAR and MMAC algorithms were implemented in Java under Windows XP on a 1.7 GHz, 256 MB RAM machine.

The relative prediction accuracy, which corresponds to the difference of the classification accuracy of the CBA, PART and C4.5 algorithms with respect to (MCAR, MMAC), is shown in Figs. 2–4, respectively. Fig. 2, for instance, signifies the difference in accuracy between CBA classifiers relative to those derived by (MCAR, MMAC) on the 16 optimisation data sets. The relative prediction accuracy figures shown are computed using the formulae (Accuracy_MCAR − Accuracy_CBA) / Accuracy_CBA and (Accuracy_MMAC − Accuracy_CBA) / Accuracy_CBA for CBA. The same formulae have been used for the PART and C4.5 algorithms, respectively. We used a minsupp of 2% and a minconf of 30% in the experiments for the CBA, MCAR and MMAC algorithms. The label-weight evaluation measure (Thabtah et al., 2004) has been used to calculate the accuracy for the MMAC algorithm in the figures.

Fig. 2. Difference of accuracy between CBA and (MCAR, MMAC).

Fig. 3. Difference of accuracy between PART and (MCAR, MMAC).

The label-weight evaluation method assigns each class in a multi-label rule a value based on its number of occurrences with the rule's antecedent (left-hand side) in the training data. To clarify, a training object may belong to several classes, where each one is associated with it by a number of occurrences in the training data set. Each class can be assigned a weight according to how many times that class has been associated with the training object. Thus, unlike the error-rate method (Witten & Frank, 2000), which considers only one class for each rule in computing the correct predictions, label-weight gives a value for each possible class in a rule according to its frequency in the training data. This gives the top ranked class in a rule the highest weight, rather than all the weight as the error-rate method does.
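For concreteness, the relative differences plotted in Figs. 2–4 are simple ratios of accuracies. A trivial sketch of the computation is shown below; the accuracy values are placeholders and are not taken from the paper.

```python
def relative_difference(acc_new, acc_base):
    """Relative prediction accuracy, e.g. (Accuracy_MCAR - Accuracy_CBA) / Accuracy_CBA."""
    return (acc_new - acc_base) / acc_base

# hypothetical accuracies on a single data set (illustrative only)
accuracy = {"CBA": 0.72, "MCAR": 0.78, "MMAC": 0.81}
for name in ("MCAR", "MMAC"):
    diff = relative_difference(accuracy[name], accuracy["CBA"])
    print("%s vs CBA: %+.1f%%" % (name, 100 * diff))
```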
The accuracy results shown in the graphs indicated that the associative classification algorithms outperformed the other learning techniques over the majority of test instances. In particular, CBA, MCAR and MMAC outperformed the other learning algorithms on 5, 6, 7 and 5 benchmark problems, respectively. The won-loss-tied records of MCAR against CBA, C4.5 and PART are 11-5-0, 10-6-0 and 11-5-0, respectively. The MMAC won-loss-tied records against CBA, C4.5 and PART are 10-6-0, 9-7-0 and 11-5-0, respectively. These figures show that the associative classification approach is able to produce more accurate classifiers than the decision tree and rule induction approaches, respectively.

Fig. 4. Difference of accuracy between C4.5 and (MCAR, MMAC).

It should be noted that the CBA, PART and C4.5 algorithms outperformed (MCAR, MMAC) on data set number 14 in Figs. 2–4, respectively. After analysing the data in this particular set, it turned out that the classes in this set are not evenly distributed. For example, classes LLH2 and LLH20 were frequently applied by the hyperheuristic in this data set, whereas classes LLH27, LLH37, LLH43 and LLH58 were rarely used by the hyperheuristic. This is compounded by the limited number of training instances in this particular data set (182 training data objects).

Analysis of the classifiers produced revealed consistency in the accuracy of both PART and C4.5, because the difference on average in accuracy between them in all experiments is less than 1.6%. This supports the research conducted in Frank and Witten (1998), where they show that despite the simplicity of PART, it generates rules as accurately as C4.5 and RIPPER. The C4.5 and PART algorithms also showed consistency in the number of rules produced.

An analysis of the features of the rules generated from the hyperheuristic data has been carried out. Fig. 5 shows the number of rules extracted from nine data sets, categorised by the number of classes. MMAC is able to extract rules that are associated with up to four classes for this data. This is one of the principal reasons for improving the accuracy within applications. Fig. 5 also demonstrates that the majority of rules created from each solution are associated with one or two class labels. It turns out that this accurately reflects the nature of the hyperheuristic data, since during each iteration, normally only one or two low-level heuristics improve on the objective function in the scheduling problem. Thus, each training instance usually corresponds to just one or two classes.

Fig. 5. Distribution of the rules with regard to their labels.

The additional classes discovered by the MMAC algorithm from the real data represent useful knowledge discarded by the CBA, PART and RIPPER algorithms. The fact that MMAC is able to extract rules with multiple classes enables domain experts to benefit from this additional useful information. In addition, these multi-label rules can contribute to the prediction step, possibly improving the classification accuracy. The numbers of rules produced by the C4.5, PART, CBA and MCAR algorithms are listed in Table 3.
Table 3
Number of rules of C4.5, PART, CBA and MCAR on the optimisation data sets

Data  C4.5  PART  CBA  MCAR
1     11    3     3    14
2     55    16    18   32
3     46    19    12   28
4     82    17    11   33
5     46    17    10   31
6     19    16    23   31
7     91    69    1    20
8     100   51    3    9
9     163   71    2    12
10    163   145   1    21
11    46    21    10   22
12    64    19    22   37
13    46    19    6    33
14    64    24    23   42
15    55    23    16   31
16    64    56    5    18

Since MMAC produces rules with multiple classes, we did not record the number of rules it generates, for fair comparison. The values in Table 3 show that C4.5 always generates more rules than PART, CBA and MCAR. This contradicts some earlier results reported in Liu et al. (1998) and Thabtah et al. (2005) on classifier sizes obtained against the UCI data collection (Merz & Murphy, 1996), which show that associative classification approaches like CBA and MCAR normally generate more rules than decision trees. For this reason, we performed extensive analysis of the classifiers derived by C4.5 from the optimisation data sets.

After analysing the decision trees constructed by the C4.5 algorithm at each iteration, we observed that many rules are generated which do not cover any training data. The reason for these many useless rules appears to be the attribute on which C4.5 splits the training instances: if that attribute has many distinct values and only a few of these values appear in the training data, then a rule is generated for each branch, and hence only some of these branches cover training instances. The rest represent rules that do not cover even a single training instance. In other words, when the training data set consists of attributes that have several distinct values and a split occurs, the expected number of rules derived by C4.5 can be large. Since the data sets used to derive the results in Table 3 contain four attributes, each of which has 10 different values (low-level heuristics), and some of these low-level heuristics never result in a solution improvement for the hyperheuristic, this explains the large numbers of rules derived by C4.5.

5. Conclusions

In this paper, we have studied data sets produced from a complex personnel scheduling problem, called the training scheduling problem. These data sets represent solutions generated by a general hybrid approach, called the Peckish hyperheuristic, which is a robust and general-purpose optimisation heuristic that requires us to decide which of several simpler low-level heuristic techniques to apply at each step while building the schedule. Our study focused on analysing the behaviour of the low-level heuristics that were selected by the hyperheuristic and improved upon the quality of the current solution, in order to extract useful rules. These rules can be used later to quickly predict the appropriate low-level heuristics to call next. For this purpose, we have compared five data mining classification algorithms (PART, C4.5, CBA, MCAR, MMAC) on 16 different solutions produced by the Peckish hyperheuristic.

The experimental tests showed a better performance for the associative classification techniques (MCAR, MMAC, CBA) over decision trees (C4.5), rule induction (RIPPER) and the PART algorithm with reference to the accuracy of predicting the appropriate set of low-level heuristics. Since the MMAC algorithm was able to produce rules with multiple classes, including very useful information that the hyperheuristic can use in forecasting the behaviour of low-level heuristics while constructing a new solution, it is the most applicable data mining approach for predicting low-level heuristic performance within the Peckish hyperheuristic.

Furthermore, C4.5 generated more rules than the other rule learning algorithms, since useless rules were extracted by the C4.5 algorithm which do not have even a single representation in the training data.
The reason for these useless rules turns out to be the training data attributes: when some of these attributes are associated with many distinct values and only a subset of these values have sufficient representation in the training data, there will be valid rules for this subset and the rest will represent useless rules.

References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th international conference on very large data bases (pp. 487–499).

Aickelin, U., & Dowsland, K. (2000). Exploiting problem structure in a genetic approach to a nurse rostering problem. Journal of Scheduling, 3, 139–153.

Ali, K., Manganaris, S., & Srikant, R. (1997). Partial classification using association rules. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), Proceedings of the third international conference on knowledge discovery and data mining (pp. 115–118).

Blum, C., & Roli, A. (2003). Metaheuristics in combinatorial optimisation: overview and conceptual comparison. ACM Computing Surveys (CSUR), 35(3), 268–308.

CBA (1998).

Cendrowska, J. (1987). PRISM: an algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349–370.

Cohen, W. (1995). Fast effective rule induction. In Proceedings of the 12th international conference on machine learning (pp. 115–123). CA: Morgan Kaufmann.

Cowling, P., & Chakhlevitch, K. (2003). Hyperheuristics for managing a large collection of low level heuristics to schedule personnel. In Proceedings of the IEEE conference on evolutionary computation (pp. 1214–1221). Canberra, Australia.

Cowling, P., Kendall, G., & Han, L. (2002). An investigation of a hyperheuristic genetic algorithm applied to a trainer scheduling problem. In Proceedings of the congress on evolutionary computation (CEC 2002) (pp. 1185–1190). Hilton Hawaiian Village Hotel, Honolulu, Hawaii, May 12–17, ISBN 0-7803-7282-4.

Cowling, P., Kendall, G., & Soubeiga, E. (2000). A hyperheuristic approach to scheduling a sales summit. In Proceedings of the third international conference on the practice and theory of automated timetabling (PATAT 2000). LNCS (vol. 2079, pp. 176–190). Springer.

Fayyad, U., & Irani, K. (1993). Multi-interval discretisation of continuous-valued attributes for classification learning. In Proceedings of IJCAI (pp. 1022–1027).

Frank, E., & Witten, I. (1998). Generating accurate rule sets without global optimisation. In Proceedings of the fifteenth international conference on machine learning (pp. 144–151). Madison, Wisconsin: Morgan Kaufmann.

Furnkranz, J., & Widmer, G. (1994). Incremental reduced error pruning. In Machine learning: Proceedings of the 11th annual conference. New Brunswick, New Jersey: Morgan Kaufmann.

Hamiez, J., & Hao, J. (2001). Solving the sports league scheduling problem with tabu search. Lecture notes in artificial intelligence (vol. 2148, pp. 24–36). Springer.

Hansen, P., & Mladenovic, N. (1997). Variable neighbourhood search. Computers & Operations Research, 24(11), 1097–1100.

Li, W., Han, J., & Pei, J. (2001).
CMAR: accurate and efficient classification based on multiple-class association rules. In Proceedings of ICDM'01 (pp. 369–376). San Jose, CA.

Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In Proceedings of KDD (pp. 80–86). New York, NY.

McAloon, K., TretKoff, C., & Wetzel, G. (1997). Sports league scheduling. In Proceedings of the third ILOG optimisation suite international users conference. Paris, France.

Merz, C., & Murphy, P. (1996). UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.

Quinlan, J. (1979). Discovering rules from large collections of examples: a case study. In D. Michie (Ed.), Expert systems in the micro-electronic age (pp. 168–201). Edinburgh: Edinburgh University Press.

Quinlan, J. (1987). Generating production rules from decision trees. In Proceedings of the 10th international joint conference on artificial intelligence (pp. 304–307). CA: Morgan Kaufmann.

Quinlan, J. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.

Thabtah, F., Cowling, P., & Peng, Y. (2004). MMAC: a new multi-class, multi-label associative classification approach. In Proceedings of the fourth IEEE international conference on data mining (ICDM '04) (pp. 217–224). Brighton, UK. (Nominated for the best paper award).

Thabtah, F., Cowling, P., & Peng, Y. (2005). MCAR: multi-class classification based on association rule approach. In Proceedings of the 3rd IEEE international conference on computer systems and applications (pp. 1–7). Cairo, Egypt.

WEKA (2000). Data mining software in Java.

Witten, I., & Frank, E. (2000). Data mining: Practical machine learning tools and techniques with Java implementations. San Francisco, CA: Morgan Kaufmann.

Yin, X., & Han, J. (2003). CPAR: classification based on predictive association rules. In Proceedings of SDM (pp. 369–376). San Francisco, CA.