Association rule hiding using cuckoo optimization algorithm

Mahtab Hossein Afshari*, Mohammad Naderi Dehkordi, Mehdi Akbari
Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Najafabad, Isfahan, Iran

Expert Systems With Applications 64 (2016) 340–351

Article history: Received 25 December 2015; Revised 27 May 2016; Accepted 1 August 2016; Available online 2 August 2016

Keywords: Data mining; Privacy preserving data mining; Association rule hiding; Cuckoo optimization algorithm

Abstract

Privacy preserving data mining is a research field that aims to protect private information and prevent its leakage during the data mining process. One of the techniques in this field is privacy preserving association rule mining, which aims to hide sensitive association rules. Many algorithms with different approaches have been developed for this purpose. In this paper, a new and efficient approach is introduced that applies the cuckoo optimization algorithm to sensitive association rule hiding (COA4ARH). In this method, hiding is performed using the distortion technique. Three fitness functions are defined that make it possible to reach a solution with the fewest side effects, and an efficient immigration function improves the algorithm's ability to escape from local optima. The efficiency of the proposed approach was evaluated through experiments on several databases. The results of executing the proposed algorithm and three earlier algorithms on different databases indicate that the proposed algorithm has superior performance.

© 2016 Elsevier Ltd. All rights reserved.
1. Introduction

As the use of data mining to extract useful patterns from massive data increases, concern about the disclosure of private information during this process has also risen. Efforts to preserve the privacy of sensitive information while extracting useful patterns led to the formation of a new field in data mining known as privacy preserving data mining (PPDM). PPDM is applied in all data mining techniques, such as clustering, classification and association rule mining. The algorithms of this field prevent the disclosure of private information while preserving the utility of non-sensitive information as much as possible through modification and distortion of the database. However, manipulating the database in order to protect the confidentiality of sensitive information is not flawless and has some side effects:

• Hiding failure: occurs when a PPDM algorithm is not completely successful and, even after database sanitization, part of the sensitive information can still be extracted by data mining algorithms.
• Lost rule (misses cost): the sanitization process not only hides sensitive information but also ruins parts of the non-sensitive data, which can no longer be extracted.
• Ghost rule (artificial pattern): because of the sanitization process, new and unrealistic patterns that did not exist before may appear in the database.

So far, many algorithms have been introduced in this area, each of which has different strengths and weaknesses. The activities conducted in privacy preserving association rule mining until the present time can be divided into three main categories: border-based (Moustakides & Verykios, 2006, 2008; Sun & Philip, 2007; Sun & Yu, 2005), exact (Gkoulalas-Divanis & Verykios, 2006, 2009a, 2009b; Menon & Sarkar, 2007; Menon, Sarkar, & Mukherjee, 2005) and heuristic approaches (Dasseni, Verykios, Elmagarmid, & Bertino, 2001; Oliveira & Zaïane, 2002, 2003a, 2003b, 2006; Saygin, Verykios, & Clifton, 2001; Saygin, Verykios, & Elmagarmid, 2002; Verykios, Pontikakis, Theodoridis, & Chang, 2007; Wu, Chiang, & Chen, 2007). In recent years, metaheuristic algorithms have been employed for privacy preserving association rule mining. One of the newest metaheuristic algorithms is the cuckoo optimization algorithm (Walton, Hassan, Morgan, & Brown, 2011; Yang & Deb, 2013).

Considering the above, the main objective of this study is to use the cuckoo optimization algorithm to propose a new algorithm in the privacy preserving association rule mining area, able to hide the sensitive information completely and with minimal side effects, by:

• having no hiding failure;
• minimizing the number of ghost rules;
• decreasing lost rules considerably.

* Corresponding author. E-mail addresses: mahtab.h.afshar@gmail.com, Afshari@sco.iaun.ac.ir (M.H. Afshari), Naderi@iaun.ac.ir (M.N. Dehkordi), mehdi_akbari@hotmail.com, mehdi_akbari@pco.iaun.ac.ir (M. Akbari).

http://dx.doi.org/10.1016/j.eswa.2016.08.005
The rest of the article is organized as follows. Section 2 contains a review of the literature on privacy preserving association rule mining and a brief explanation of the cuckoo optimization algorithm (COA). In Section 3, the problem of hiding sensitive association rules is clearly explained. Section 4 then describes the proposed algorithm for hiding association rules and explains it in detail with an example. In Section 5, the results of experiments conducted on real and synthetic data sets are provided and the efficiency of the proposed approach is compared with some existing algorithms. The effects of sparse data are described in Section 6. Finally, in Section 7, conclusions and future work are discussed.

2. Background and related work

Many studies have so far been conducted in privacy preserving association rule mining, some of which are briefly reviewed in this section.

Hiding association rules from frequent itemsets through support reduction was first suggested by Atallah et al., who used a lattice-like graph for hiding; appropriate time complexity was one of the advantages of their method (Atallah, Bertino, Elmagarmid, Ibrahim, & Verykios, 1999). Five algorithms, namely 1.a, 1.b, 2.a, 2.b and 2.c, are provided by Verykios et al. in Verykios, Elmagarmid, Bertino, Saygin, & Dasseni (2004). The first three algorithms are rule-oriented and the next two are itemset-oriented. Removing just one sensitive rule at a time is one of the disadvantages of these algorithms. By removing the right-hand side of a sensitive rule, algorithm 1.b decreases the rule's support and hides it. In algorithm 1.b, as the number of sensitive rules increases, the number of lost rules also increases. Multiple rule hiding was first proposed by Oliveira & Zaïane. Their suggested algorithms were Naïve, MinFIA, MaxFIA and IGA, which hide sensitive rules based on the conflict level and were not efficient with regard to implementation time (Oliveira & Zaïane, 2002). The two algorithms ISL and DSR were introduced by Wang in Wang, Parikh, & Jafari (2007). The former conducts the hiding process by increasing the support of the left-hand-side items of sensitive rules, while the latter does so by decreasing the support of the right-hand-side items. The ISL and DSR algorithms are weak in, respectively, the hiding failure rate and the number of lost rules. A genetic algorithm for hiding sensitive rules is suggested by Dehkordi et al. in Dehkordi, Badie, & Zadeh (2009). Besides hiding sensitive data, the main purpose of this algorithm is to decrease the modifications made to the database. A preprocessing phase that selects the sensitive transactions of the database and changes them is defined in this algorithm, and four different strategies are provided for calculating fitness. By introducing a new fitness function, Khan et al. in Khan, Qureshi, & Hussain (2014) improved the algorithm proposed in Dehkordi et al. (2009).

The cuckoo algorithm was first developed by Yang & Deb (2009) and then investigated in detail by Rajabioun (2011). The interesting and unusual lifestyle and egg-laying behavior of the cuckoo were the main inspiration for this algorithm. This bird is able to brilliantly deceive other birds and make them participate in its own survival. Unlike other birds, the cuckoo never builds a nest and never protects its eggs; instead, it lays its eggs in the nests of other birds so that they are protected along with the hosts' eggs.
Like other evolutionary algorithms, COA begins its work from a random initial population, which is called a "habitat" and is formed by cuckoos. A habitat is expressed by a 1 × N_var array showing the current living position of a cuckoo. This array is identified by Eq. (1):

habitat = [x_1, x_2, ..., x_Nvar],  (1)

where N_var is the dimension of the problem.

Each cuckoo is allocated some random eggs that will be laid in the nests of a number of host birds. The nests of the host birds exist within a certain range called the maximum egg laying radius (ELR), which is identified by Eq. (2):

ELR = [α × (number of current cuckoo's eggs / total number of eggs)] × (Var_hi − Var_low),  (2)

where α is a number intended to control the maximum value of ELR, and Var_hi and Var_low are respectively the high and low limits of each variable in the optimization problem.

Cuckoo eggs with more similarity to the host eggs have a better chance of turning into mature cuckoos; the other eggs (about 10%) are identified by the host bird and removed. The more eggs survive in a region, the more benefit is allocated to that region; therefore, the situation in which the largest number of eggs survives is what COA aims to optimize. Surviving chicks grow in the hosts' nests and turn into mature cuckoos. At egg-laying time, cuckoos migrate to better and more beneficial habitats with a higher chance of egg survival. When migrating, cuckoos do not traverse the whole course toward the target point; they pass only part of it (λ% of the total path) toward the target and then deviate by an angle φ. This deviation helps them to search more areas. After arriving at the target place, each cuckoo owns a number of eggs according to which its ELR is determined, and egg-laying starts. Considering that there are always factors such as food shortage, hunting, etc. in nature that balance the number of birds, there is also a number N_max in the cuckoo algorithm that controls the maximum number of cuckoos. After several iterations, the cuckoos reach a point with maximum similarity of their eggs to the host eggs and maximum food resources; it is the place with the highest total benefit and the lowest number of destroyed eggs. For more detail on COA, refer to Rajabioun (2011).

3. Problem definition

Assume that I = {i_1, i_2, i_3, ..., i_m} is a set of items and D is a database including several transactions, each of which is a subset of I. The set of association rules extracted from D is R, and the set of sensitive association rules is R_s, where R_s ⊂ R. Each association rule is expressed as A → B, where A is the antecedent (left-hand side) and B is the consequent (right-hand side), such that A, B ⊂ I and A ∩ B = ∅. The purpose is to change D into D′ so that the rules in R_s cannot be mined from D′, while the rules in R − R_s can still be mined as far as possible. Two criteria are considered in association rule mining. The first is the support, which indicates the frequency of a rule in the database and is calculated by Eq. (3):

Sup(A → B) = |A ∪ B| / |D|,  (3)

where Sup(A → B) is the support of A → B, |A ∪ B| is the number of transactions that include both itemsets A and B, and |D| is the total number of transactions.

The second criterion is the rule confidence, which indicates the strength of the rule in the database and is calculated by Eq. (4):

Conf(A → B) = |A ∪ B| / |A|,  (4)

where Conf(A → B) is the confidence of A → B, |A ∪ B| is the number of transactions that include both itemsets A and B, and |A| is the number of transactions that include itemset A.

For each association rule, one minimum support threshold (MST) and one minimum confidence threshold (MCT) are defined.

Table 1
Definitions of notations.
Notation     Definition
I            An itemset
D            The original dataset
D′           A sanitized dataset
R            The set of association rules extracted from D
R′           The set of association rules extracted from D′
R_s          A set of sensitive association rules
∼R_s         A set of non-sensitive association rules
r            An association rule
A → B        An association rule
Sup(A → B)   The support of A → B
Conf(A → B)  The confidence of A → B
MST          The minimum support threshold
MCT          The minimum confidence threshold
Var_hi       The high limit of each variable
Var_low      The low limit of each variable
α            The total number of occurrences of right-hand-side items of sensitive rules
K            The number of new solutions
N            The number of the current solution's items that should be modified
N_max        The maximum number of survived solutions with better fitness values
N_var        The dimension of the problem
N_pop        The size of the initial population
MaxNNS       The maximum number of new solutions
MinNNS       The minimum number of new solutions
MR           The modification radius
HF           The hiding failure
LR           The lost rules
GR           The ghost rules
HD           The hiding distance
LD           The lost distance
RHD          The rules hiding distances
RLD          The rules lost distances

A rule is minable when its support is more than MST and its confidence is higher than MCT. Therefore, in order to hide a rule, its support or confidence must be reduced below the corresponding threshold. Table 1 shows the list of notations and their definitions used in this paper.

4. Proposed algorithm

The main steps of the proposed algorithm, named Cuckoo Optimization Algorithm for Association Rule Hiding (COA4ARH), are shown in Algorithm 1, and Fig. 1 shows the flowchart of the algorithm. The fitness function, the function to find the best solution and the immigration function are shown in Algorithms 2, 3 and 4 and are discussed in detail in the next sections.

4.1.
Preprocess of original dataset

The first step of the proposed algorithm is a preprocessing operation on the original dataset, which is one of the most important and influential phases of the proposed approach. Since the sensitive items exist only in some transactions of the database, changing and manipulating all transactions is a time-consuming and useless task that not only increases the size of the habitats, but also causes the production of useless and unrelated solutions and increases the time needed to obtain more effective solutions, hence increasing the sanitization time. To avoid such negative effects, a preprocessing operation with two phases has been considered. In the first phase, the original database is processed so that only its critical transactions are chosen; in the present study, a critical transaction is a transaction that fully supports one or more sensitive rules. In the second phase, only those sensitive items with a critical role in sanitization are marked for change. A sensitive item is defined as follows:

1. If a sensitive rule has only one item in its right-hand side, that item is selected as the sensitive item.
2. If a sensitive rule has more than one item in its right-hand side, the item chosen as sensitive is the one with, respectively, the highest frequency in the right-hand sides of sensitive rules and the lowest frequency in non-sensitive rules.

Therefore, preprocessing the original database leads to a decrease in the size of the habitats, a reduction of unnecessary changes in the database, prevention of irrelevant solutions and speedier access to an optimal solution.

4.2. Generating initial population

In order to solve an N_var-dimensional problem with population size N_pop, the cuckoo optimization algorithm randomly generates an initial population in the form of a habitat matrix of size N_pop × N_var.
Each member of this population shows the current habitat of a cuckoo. Since each habitat is allocated to one cuckoo, hereinafter, for brevity, "cuckoo's habitat" is contracted to "cuckoo". In the present problem, each cuckoo in the initial population represents a solution (a sanitized database), shown as a sequence of 0s and 1s; the presence and absence of an item in a transaction are represented by 1 and 0, respectively. The original database is considered as the first cuckoo (solution) of the initial population; indeed, the first solution is the sequence of critical transactions of the original database selected in the preprocessing operation. The other (N_pop − 1) solutions are generated randomly as follows: only those sensitive items that were marked in preprocessing are randomly assigned 0 or 1, while the other items are copied unchanged from the first solution (the original database). Thus, an initial population with N_pop solutions is generated. Each solution is an array whose length equals the total length of the critical transactions.

Fig. 1. Flowchart of COA4ARH algorithm.

Algorithm 1 COA4ARH algorithm.
Input: Original dataset D, R_s, MST and MCT; α, N_pop, N_max, MinNNS, MaxNNS, MaxIteration;
Output: A sanitized dataset D′;
begin
  Preprocess of original dataset;
  Generate an initial population;
  call FitnessFunction;
  call BestSolutionFunction;
  repeat
    // Generate new solutions
    for each solution in population
      K = (MaxNNS − MinNNS) × Rand[0,1] + MinNNS;  // dedicate K
      MR = [α × (current solution's K / total of all solutions' K)] × (Var_hi − Var_low);  // define MR
      for K times do
        for MR times do
          Select an item randomly from among the sensitive items;
          Set the selected item to Rand[0,1];
        end for
      end for
    end for
    call FitnessFunction;
    call BestSolutionFunction;
    Limit number of solutions to N_max;
    for each solution in population
      call ImmigrationFunction;  // move all solutions toward the best solution
    end for
    call FitnessFunction;
    call BestSolutionFunction;
  until the termination condition is satisfied;
end.

Algorithm 2 Fitness function.
begin
  for each solution in population
    Merge with non-critical transactions;
    Calculate minfit1;  // Eq. (5)
    Calculate minfit2;  // Eq. (8)
    Calculate minfit3;  // Eq. (11)
  end for
end.

4.3. Evaluate the fitness

In this part, fitness values are calculated for each solution in the initial population. To do so, each solution must first be merged with the part of the database that was separated in preprocessing (the non-critical transactions), so that the sanitization effect is considered for the whole database and the acquired fitness values represent the overall state of the database.
It should be mentioned that calculating fitness values is not necessary for the first solution in the initial population, since, as noted in the previous section, the first solution is the original database, which was placed unchanged in the initial population. The fitness values for the other solutions are calculated by Eqs. (5), (8) and (11):

minfit1 = |HF|,  (5)

where |HF| is the number of hiding failures, identified by Eq. (6):

|HF| = Σ_{i=1}^{|R_s|} r_i_state,  (6)

where |R_s| is the number of sensitive rules and r_i_state is calculated by Eq. (7):

r_i_state = { 0 if Sup(I_i) < MST or Conf(r_i) < MCT; 1 otherwise },  (7)

where I_i is the itemset of r_i.

The second fitness value is computed by Eq. (8):

minfit2 = |LR|,  (8)

where |LR| is the number of lost rules, defined by Eq. (9):

|LR| = Σ_{i=1}^{|∼R_s|} r_i_state,  (9)

where |∼R_s| is the number of non-sensitive rules and r_i_state is calculated by Eq. (10):

r_i_state = { 0 if Sup(I_i) ≥ MST and Conf(r_i) ≥ MCT; 1 otherwise }.  (10)

The third fitness value is identified by Eq. (11):

minfit3 = RHD + RLD,  (11)

where RHD is the rules hiding distance and RLD is the rules lost distance. RHD is calculated by Eq. (12):

RHD = Σ_{i=1}^{|R_s|} r_i_HD,  (12)

with r_i_HD defined by Eq. (13):

r_i_HD = { 0 if Sup(I_i) < MST; 0 else if Conf(r_i) < MCT; Conf(r_i) − MCT + 1 otherwise }.  (13)

RLD is computed by Eq. (14):

RLD = Σ_{i=1}^{|∼R_s|} r_i_LD,  (14)

and r_i_LD is calculated by Eq. (15):

r_i_LD = { 0 if Sup(I_i) ≥ MST and Conf(r_i) ≥ MCT; MCT − Conf(r_i) else if Conf(r_i) < MCT; MST − Sup(I_i) otherwise }.  (15)

Mining all the solutions of a population in each iteration would be a very time-consuming operation. Hence, the functions minfit1, minfit2 and minfit3 have been defined in such a manner that mining of the solutions is not necessary.
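As an illustration, the three fitness values (Eqs. (5)–(15)) can be sketched in Python. This is a minimal sketch, not the authors' implementation: it assumes each rule is given as an (antecedent, consequent) pair of item sets, each transaction as a set of items, and all function names are illustrative.

```python
def support(itemset, db):
    # Fraction of transactions containing every item of `itemset` (Eq. (3))
    return sum(itemset <= t for t in db) / len(db)

def confidence(ante, cons, db):
    # |A ∪ B| / |A| (Eq. (4)); returns 0 when the antecedent never occurs
    a = sum(ante <= t for t in db)
    return (sum((ante | cons) <= t for t in db) / a) if a else 0.0

def fitness(db, sensitive, nonsensitive, mst, mct):
    """Return (minfit1, minfit2, minfit3) for one candidate (merged) database."""
    minfit1 = minfit2 = 0
    rhd = rld = 0.0
    for ante, cons in sensitive:                       # Eqs. (5)-(7) and (12)-(13)
        sup, conf = support(ante | cons, db), confidence(ante, cons, db)
        if sup >= mst and conf >= mct:                 # rule still minable -> hiding failure
            minfit1 += 1
            rhd += conf - mct + 1
    for ante, cons in nonsensitive:                    # Eqs. (8)-(10) and (14)-(15)
        sup, conf = support(ante | cons, db), confidence(ante, cons, db)
        if sup < mst or conf < mct:                    # rule lost by sanitization
            minfit2 += 1
            rld += (mct - conf) if conf < mct else (mst - sup)
    return minfit1, minfit2, rhd + rld                 # minfit3 = RHD + RLD (Eq. (11))
```

Because only counting over transactions is involved, no rule mining of the candidate solutions is needed, which is exactly the point made above.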
In fact, these functions determine the number of hiding failures and lost rules for each solution without any need for data mining. As a result, in COA4ARH data mining is performed only twice: the first time, the original database is mined in order to determine the sensitive and non-sensitive rules; the second time, the final sanitized database is mined so that, in addition to hiding failures and lost rules, the number of ghost rules can also be determined. The fitness function is shown in Algorithm 2.

4.4. Select the best solution

After calculating the fitness of each solution, all fitness values must be compared in order to select the best solution. The first solution in the initial population does not have any fitness value and hence does not participate in the comparison. In order to select the best solution, all of the solutions are first sorted in rising order based on the value of minfit1. Then, those with equal values of minfit1 are sorted in rising order based on minfit2; finally, if two solutions are equal in the value of minfit2, they are sorted in rising order based on minfit3. The result is a sorted list of solutions, the first member of which is selected as the best solution. Then, the best solution gained in this step and the best solution of the previous step are compared in order to determine the global best solution. The best solution function is shown in Algorithm 3.

Algorithm 3 Best solution function.
begin
  Sort the solutions in increasing order of their minfit1;
  if two or more solutions have the same minfit1 then
    Sort those in increasing order of their minfit2;
    if two or more solutions have the same minfit2 then
      Sort those in increasing order of their minfit3;
    end if
  end if
  Set the first member of the sorted list as the best solution;
  if (the fitness values of the best solution ≤ the fitness values of the previous best solution)
    return best solution;
  else
    return previous best solution;
end.

Algorithm 4 Immigration function.
Input: Current Solution, Best Solution;
// Current Solution refers to each one of the existing solutions in the population;
// Best Solution is the solution with the best fitness value so far;
Output: Next Solution;
// Next Solution is a solution closer to the best solution;
begin
  Number of Different Items = Hamming distance (Current Solution, Best Solution);
  N = Rand(0, 1) × Number of Different Items;
  // N is the number of Current Solution's items that should be modified;
  Selected Items = Select randomly N items from the Different Items of Current Solution;
  Next Solution = 1 XOR each Selected Item in Current Solution;  // reduce Hamming distance
end.

4.5. Generating new solutions

Each existing solution should generate a number of new solutions. Therefore, according to Eq. (16), a random number K is dedicated to each solution; it represents the number of new solutions that should be generated by that parent solution:

K = (MaxNNS − MinNNS) × Rand[0,1] + MinNNS,  (16)

where K is the number of new solutions that should be generated by each parent solution, MaxNNS is the maximum number of new solutions (determined by the user) and MinNNS is the minimum number of new solutions (determined by the user).
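The new-solution generation step can be sketched as follows. This is a minimal sketch, not the authors' code, assuming solutions are 0/1 lists, `sensitive_idx` holds the positions of the sensitive items marked in preprocessing, the K values have been drawn per Eq. (16), MR follows Eq. (17) with Var_hi − Var_low = 1, and K and MR are rounded to integers (the paper's worked example uses integer values). All names are illustrative.

```python
import random

def draw_k(min_nns, max_nns, rng=random):
    # Eq. (16): K = (MaxNNS - MinNNS) * Rand[0,1] + MinNNS, rounded (assumption)
    return round((max_nns - min_nns) * rng.random() + min_nns)

def generate_new_solutions(population, ks, sensitive_idx, alpha, rng=random):
    """Each parent spawns K offspring, each mutating up to MR sensitive bits."""
    total_k = sum(ks)
    offspring = []
    for parent, k in zip(population, ks):
        mr = int(alpha * k / total_k)           # Eq. (17) with Var_hi - Var_low = 1
        for _ in range(k):
            child = parent[:]                   # copy all non-mutated items unchanged
            for _ in range(mr):
                i = rng.choice(sensitive_idx)   # pick a sensitive item position
                child[i] = rng.randint(0, 1)    # re-draw its bit at random
            offspring.append(child)
    return offspring
```

Note that only positions listed in `sensitive_idx` can change, mirroring the restriction that mutation touches only the sensitive items selected in preprocessing.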
For generating new solutions, each parent solution is only allowed to change a limited number of its items (bits). This number is called the modification radius, shown as MR. The value of MR for each parent solution is calculated by Eq. (17):

MR = [α × (current solution's K / total of all solutions' K)] × (Var_hi − Var_low),  (17)

where MR is the number of each parent solution's items that may be changed and α is the total number of occurrences of right-hand-side items of sensitive rules in the critical transactions, received as an algorithm input. Var_hi and Var_low are respectively the high and low limits of each variable in optimization problems. Considering that in the present problem the high and low limits of the variables are 1 and 0, respectively, the term (Var_hi − Var_low) can be removed from the above equation.

Given the value of K as well as the specified MR for each parent solution, new solutions are generated. The process of generating each new solution is as follows:

• For each parent solution, a number of items equal to its MR are randomly selected from among the sensitive items and randomly assigned 0 or 1.
• The other items are copied from the parent solution into the new solution without any change.
• These steps are iterated K times for each parent solution to which the value K is allocated.

4.6. Limit solutions' maximum number

The total number of existing solutions should always be controlled so as not to exceed the N_max threshold. For this reason, in each iteration, a number of the worst solutions are removed from the end of the sorted list, so that the number of remaining solutions stays equal to N_max.

4.7.
Immigration

In order to improve the remaining solutions, and with the purpose of improving the ability of the proposed algorithm to escape from local optima and find better solutions than those already gained, all solutions are changed in a way that makes them more similar and closer to the best solution. This change can lead to the generation of solutions better than the current best one; nevertheless, the best current solution is kept as the global best. Algorithm 4 shows this process. In order to clarify the functionality of this algorithm, an example is offered in Fig. 2.

The Hamming distance is defined as the number of positions at which the corresponding symbols of two strings of equal length differ. For binary strings a and b, the Hamming distance is equal to the number of ones in a XOR b (Hamming, 1950).

Based on this definition, the number of different items between the current solution and the best solution is equal to 7, illustrated in red in Fig. 2(a). The value of N can now be determined, which is a random number of the different items, for example 3. This means that 3 of the differing items of the current solution should be modified; hence, from those 7 items, 3 are chosen randomly. These items are illustrated in blue in Fig. 2(b). The value of each selected item is then modified (the value of the selected item XOR 1). In this way, the next solution is generated, which is more similar to the best solution, since the number of non-equal items between them has decreased. The next solution is illustrated in Fig. 2(c).

4.8. Case study

In this part, an example is provided for a better understanding of the proposed algorithm. Assume an original database including 10 transactions, shown in Table 2. Considering MST = 40% and MCT = 70%, the rules mined from the database are shown in Table 3.
Also assume that the set of sensitive rules to be hidden is defined as {b → c, b → e}.

The preprocess operation on the original database is shown in Fig. 3. Assume that the user has set N_pop to 4 cuckoos; a random initial population can then be shown as in Fig. 4. As seen in Fig. 4, the first solution is the set of critical transactions of the original database. The fitness values for the other solutions, after merging with the non-critical transactions, can be calculated by Eqs. (5), (8) and (11). For example, the fitness values for C2 are as follows:

minfit1(C2) = 0, minfit2(C2) = 8, minfit3(C2) = 2.56

After the fitness calculation, the sorted list of solutions is:

SortedList = {C4, C2, C3}

The first member of the list (C4) is determined as the best solution. The next stages continue with the determination of MR, the allocation of a K value to each solution and the generation of new solutions.

Fig. 2. Moving a solution toward the best solution.
Fig. 3. Preprocess of original dataset.
Fig. 4. Random initial population.

Table 2
Original dataset.

Tid  Items
T1   a, b, c, e
T2   e
T3   b, c, e, f
T4   d, f
T5   a, b, d
T6   b, c, e
T7   a, b, c, d, e
T8   a, b
T9   c, e
T10  a, b, c, e

Table 3
Association rules extracted from original dataset with MST = 40% and MCT = 70%.

Rule    Support  Confidence
a → b   50       100
b → a   50       71.43
b → c   50       71.43
c → b   50       83.33
b → e   50       71.43
e → b   50       71.43
c → e   60       100
e → c   60       85.71
b → ce  50       71.43
c → be  50       83.33
e → bc  50       71.43
ce → b  50       83.33
be → c  50       100
bc → e  50       100

Table 4
Final sanitized dataset.
Tid  Items
T1   a, b, c, e
T2   e
T3   b, c, e, f
T4   d, f
T5   a, b, d
T6   b, c, e
T7   a, b, c, d, e
T8   a, b
T9   c, e
T10  a, b, e

Assume that the user has set MaxNNS and MinNNS as follows:

MaxNNS = 3, MinNNS = 1,

so the random values of K are calculated as follows:

K(C1) = (3 − 1) × Rand[0,1] + 1 = 2,
K(C2) = (3 − 1) × Rand[0,1] + 1 = 1,
K(C3) = (3 − 1) × Rand[0,1] + 1 = 1,
K(C4) = (3 − 1) × Rand[0,1] + 1 = 2,

and, with α = 10, the value of MR for each parent solution is calculated as follows:

MR(C1) = [10 × 2/6] = 3,
MR(C2) = [10 × 1/6] = 2,
MR(C3) = [10 × 1/6] = 2,
MR(C4) = [10 × 2/6] = 3.

Then the new solutions are generated as shown in Fig. 5; the items which are randomly changed are highlighted in black.

The fitness values for all new solutions are calculated (after merging with the non-critical transactions) and the solutions are sorted by their fitness values:

SortedList = {C11, C31, C12, C21, C41, C42}

The first member of the list (C11) is determined as the best solution and compared with the previous best solution (C4). Since both have equal fitness values, C11 is retained as the best solution. Assume that the user has set N_max to 4; hence the 2 worst solutions are deleted from the end of the sorted list:

SortedList = {C11, C31, C12, C21}.

In the next step, the number of different items and N are calculated as follows:

Number of Different Items = Hamming Distance (C31, C11) = 4, N = Rand(0, 1) × 4 = 1
Number of Different Items = Hamming Distance (C12, C11) = 3, N = Rand(0, 1) × 3 = 2
Number of Different Items = Hamming Distance (C21, C11) = 2, N = Rand(0, 1) × 2 = 1

The solutions immigrated toward the best solution (C11) are shown in Fig. 6; the changed items are illustrated in black.
In this stage, again each immigrated solution is merged with the non-critical transactions and the fitness values are calculated for each of them; eventually the sorted list of solutions is the following:

SortedList = {C′12, C11, C′21, C′31}

The first member of the list (C′12) will be selected as the best solution and compared with C11 (the previous best solution); finally, C′12 is determined as the best solution in the first iteration of the algorithm. The next iteration starts with the determination of MR, the allocation of a K value to each solution, and the generation of new solutions. The sanitized dataset after 2 iterations is shown in Table 4.

5. Performance evaluation

In order to evaluate the efficiency of the proposed algorithm in the sanitization of real and synthetic databases, some experiments were conducted on the proposed algorithm and three others, namely the 1.b (Verykios et al., 2004), DSR (Wang et al., 2007), and Improved Genetic (Khan et al., 2014) algorithms, which are discussed here.

Fig. 5. Generate new solutions.
Fig. 6. Immigrate toward the best solution.

Table 5
Database characteristics.

Database    No. of transactions    Avg. trans. length    No. of items
Mushroom    8124                   23                    119
Chess       3196                   37                    75
T100        100                    19                    37

5.1. Datasets

Three databases were used in the experiments. The Chess and Mushroom databases were used as real data and the T100 database was used as synthetic data. Characteristics of the specified datasets are presented in Table 5.

5.2. Performance measures

The efficiency criteria used in the comparison of the proposed algorithm with the 1.b, DSR and Improved Genetic algorithms include:

• Hiding Failure (HF): indicates the number of sensitive rules which the sanitization algorithm could not hide and which are still mined from the sanitized database D′. It is calculated by Eq. (18).
HF = |RS(D′)| / |RS(D)|,  (18)

where |RS(D′)| is the number of sensitive rules explored in the sanitized database D′ and |RS(D)| is the number of sensitive rules explored in the original database D.

• Lost Rules (LR): indicates the number of non-sensitive rules that are lost due to the act of sanitization and will not be mined from the sanitized database D′. It is calculated by Eq. (19).

LR = (|∼RS(D)| − |∼RS(D′)|) / |∼RS(D)|,  (19)

where |∼RS(D)| is the number of non-sensitive rules explored in the original database D and |∼RS(D′)| is the number of non-sensitive rules explored in the sanitized database D′.

• Ghost Rules (GR): indicates the number of artificial rules which were not within the original database, are generated due to the act of sanitization, and are mined from database D′. It is calculated by Eq. (20).

GR = (|R′| − |R ∩ R′|) / |R′|,  (20)

where |R′| is the number of mined rules from D′ and |R| is the number of mined rules from D.

• Number of Iterations: the number of iterations required to attain the optimal solution.

5.3. Experimental results

In conducting the experiments, all the algorithms were implemented in C# on the same platform. The experiments were performed on a PC with a 2.20 GHz Intel Core i7 CPU and 6 GB of RAM under Windows 8 (64-bit). The results are shown in Figs. 7, 8 and 9.

Fig. 7. Comparative evaluation of the algorithms on Mushroom dataset.

As illustrated in Figs. 7, 8 and 9, in all three databases the value of HF in the proposed algorithm is equal to zero. This is due to an appropriate definition of the fitness function and also the high priority given to the minfit1 function. Further, by not using the
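Treating the mined rule sets as plain sets (a simplifying assumption of this sketch; the rule mining itself is not shown, and the toy rules below are invented for illustration), Eqs. (18)–(20) can be computed directly:

```python
def hiding_failure(sensitive, mined_sanitized):
    # HF (Eq. 18): share of sensitive rules still mined from D'.
    return len(sensitive & mined_sanitized) / len(sensitive)

def lost_rules(sensitive, mined_original, mined_sanitized):
    # LR (Eq. 19): share of the non-sensitive rules of D that are
    # no longer mined from D'.
    nonsensitive = mined_original - sensitive
    return len(nonsensitive - mined_sanitized) / len(nonsensitive)

def ghost_rules(mined_original, mined_sanitized):
    # GR (Eq. 20): share of rules mined from D' that did not exist in D.
    return len(mined_sanitized - mined_original) / len(mined_sanitized)

# Toy example: D yields {a->b, b->c, b->e}, of which {b->c, b->e} are
# sensitive; after sanitization D' yields {a->b, c->e}.
R, Rs = {"a->b", "b->c", "b->e"}, {"b->c", "b->e"}
R2 = {"a->b", "c->e"}
print(hiding_failure(Rs, R2),
      lost_rules(Rs, R, R2),
      ghost_rules(R, R2))  # → 0.0 0.0 0.5
```

In this toy run both sensitive rules are hidden (HF = 0), no non-sensitive rule is lost (LR = 0), but one of the two rules mined from D′ is artificial (GR = 0.5).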
Fig. 8. Comparative evaluation of the algorithms on Chess dataset.
Fig. 9. Comparative evaluation of the algorithms on Synthetic dataset.
Fig. 10. Comparing the number of hiding failures in each iteration on Mushroom database.
Fig. 11. Comparing the number of lost rules in each iteration on Mushroom database.
Fig. 12. Comparing the number of hiding failures in each iteration on Chess database.
Fig. 13. Comparing the number of lost rules in each iteration on Chess database.
Fig. 14. Comparing the number of hiding failures in each iteration on synthetic database.

insertion technique in the proposed algorithm, and by selecting the items with the minimum frequency in non-sensitive rules for deletion, the value of GR in this algorithm will be zero or only a small amount. Furthermore, as shown in Figs. 7, 8 and 9, there is an improvement in the value of LR in the proposed algorithm compared to the other algorithms, which is the result of selecting the items with the minimum frequency in non-sensitive rules for deletion. As shown in Fig.
9, the proposed algorithm has a higher value of LR compared to the DSR algorithm; this is because the proposed algorithm needs to apply more deletions to decrease HF to zero. Hence, compared to DSR with its higher HF, the proposed algorithm will have a higher value of LR.

5.3.1. Comparing the number of iterations

One of the most important evaluation criteria in meta-heuristic algorithms is the number of iterations required to attain the optimal solution. Since measures have been taken to minimize the number of iterations in the COA4ARH algorithm, the comparison between this algorithm and the Improved Genetic algorithm is done on the three databases in Table 5. The results of these comparisons are seen in Figs. 10 to 15.

Fig. 15. Comparing the number of lost rules in each iteration on synthetic database.

As can be seen in these figures, the COA4ARH algorithm converges with high speed and can attain the optimal solution in fewer than 10 iterations. Furthermore, this algorithm showed better results compared to the other algorithm even with a very low number of iterations. Due to the way preprocessing is defined, a significant decline occurs in the number of iterations required by the COA4ARH algorithm. The operation is defined in such a way that only the items involved in hiding sensitive rules are chosen to be removed. As a result, first, this prevents the production of useless solutions and increases the chance of producing suitable solutions; second, the number of possible ways to produce a solution is reduced significantly, because for a string of length L there are 2^L possible placements of 0 and 1.
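The search-space argument can be made concrete with hypothetical figures (the counts below are invented for illustration, not taken from the paper): if the critical transactions contain 40 modifiable item positions in total, but preprocessing marks only 12 of them as sensitive-rule items, the candidate space shrinks from 2^40 to 2^12 solution strings.

```python
def search_space(num_modifiable_positions):
    # Each modifiable position is either kept (1) or deleted (0),
    # so there are 2^L distinct candidate solution strings.
    return 2 ** num_modifiable_positions

full = search_space(40)      # all item positions modifiable
reduced = search_space(12)   # only sensitive-rule items modifiable
print(full // reduced)       # the space shrinks by a factor of 2**28
```

The reduction is exponential in the number of positions removed from consideration, which is why restricting deletion to sensitive items has such a strong effect on convergence speed.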
In fact, preprocessing reduces the size of L significantly by choosing only sensitive items for removal; as a result, the number of cases to examine for attaining the result is reduced.

Due to the manipulation of all the items of the database in the Improved Genetic algorithm, useless solutions are produced that decrease the chance of producing more effective solutions and increase the time needed to attain the optimal solution. As seen in Figs. 10–15, this algorithm converges only after more than 40 iterations.

6. The effects of sparse datasets

The existence of sparse datasets as input data to the proposed algorithm makes the extracted association rules have a lower degree of overlap. Therefore, the sensitive rules selected for deletion will also have little overlap. The low degree of overlap means that the number of frequent items shared between sensitive rules is low, so the number of sensitive items the algorithm chooses to delete increases in order to reduce the hiding failure, which has the highest priority among the side effects, to zero. This factor leads to an increase in the number of lost rules. Consequently, it can generally be concluded that the presence of sparse datasets in the proposed algorithm's input data increases the number of lost rules.

7. Conclusions and future works

The present article has proposed a new meta-heuristic method for hiding sensitive association rules using the cuckoo optimization algorithm. The proposed algorithm is capable of simultaneously hiding several sensitive association rules. The most important and influential feature of the present study is the definition of a two-phase preprocess operation at the beginning of the proposed algorithm. This preprocess operation causes a remarkable decrease in the number of iterations and speedy access to the optimal solution. In this study, three fitness functions have also been introduced which can find the solution with minimum side effects.
Further, an immigration algorithm has been defined which improves the ability of the proposed algorithm to escape from local optima. The proposed approach was compared to the three algorithms 1.b, DSR and Improved Genetic. For efficiency assessment, all of the algorithms were examined on both real and synthetic data. The results were analyzed based on the four criteria HF, LR, GR and the number of iterations. The results show a higher efficiency for the proposed algorithm compared to the others, since HF was 0, GR was close to 0 and LR showed a remarkable decrease compared to the other algorithms. Moreover, the proposed algorithm converges with high speed and can attain a better solution in a lower number of iterations compared to the Improved Genetic algorithm.

Defining a new fitness function that can decrease the number of lost rules while preserving the algorithm's capability of hiding sensitive rules and avoiding the generation of ghost rules will be the purpose of future studies. Also, through some computations, the number of sensitive items that should be deleted for hiding the sensitive rules can be calculated; only this number of sensitive items should then be deleted, so as to decrease the number of lost rules. Moreover, a self-adaptive mechanism will be added to the proposed algorithm. In this mechanism, the algorithm's input parameters will be changed in each iteration and will move toward optimal values.

References

Atallah, M., Bertino, E., Elmagarmid, A., Ibrahim, M., & Verykios, V. (1999). Disclosure limitation of sensitive rules. In IEEE 1999 knowledge and data engineering exchange (pp. 45–52). IEEE.

Dasseni, E., Verykios, V. S., Elmagarmid, A. K., & Bertino, E. (2001). Hiding association rules by using confidence and support. In Information hiding: 2137 (pp. 369–383). Pittsburgh, PA, USA: Springer Berlin Heidelberg.

Dehkordi, M. N., Badie, K., & Zadeh, A. K. (2009). A novel method for privacy preserving in association rule mining based on genetic algorithms.
Journal of Software, 4, 555–562.

Gkoulalas-Divanis, A., & Verykios, V. S. (2006). An integer programming approach for frequent itemset hiding. In Proceedings of the 15th ACM international conference on information and knowledge management (pp. 748–757). ACM.

Gkoulalas-Divanis, A., & Verykios, V. S. (2009a). Exact knowledge hiding through database extension. IEEE Transactions on Knowledge and Data Engineering, 21, 699–713.

Gkoulalas-Divanis, A., & Verykios, V. S. (2009b). Hiding sensitive knowledge without side effects. Knowledge and Information Systems, 20, 263–299.

Hamming, R. W. (1950). Error detecting and error correcting codes. Bell System Technical Journal, 29, 147–160.

Khan, A., Qureshi, M. S., & Hussain, A. (2014). Improved genetic algorithm approach for sensitive association rules hiding. World Applied Sciences Journal, 31, 2087–2092.

Menon, S., & Sarkar, S. (2007). Minimizing information loss and preserving privacy. Management Science, 53, 101–116.

Menon, S., Sarkar, S., & Mukherjee, S. (2005). Maximizing accuracy of shared databases when concealing sensitive patterns. Information Systems Research, 16, 256–270.

Moustakides, G. V., & Verykios, V. S. (2006). A max-min approach for hiding frequent itemsets. In Proceedings of the sixth IEEE international conference on data mining-workshop (pp. 502–506).

Moustakides, G. V., & Verykios, V. S. (2008). A max-min approach for hiding frequent itemsets. Data & Knowledge Engineering, 65, 75–89.

Oliveira, S. R., & Zaïane, O. R. (2002). Privacy preserving frequent itemset mining. In IEEE 2002 ICDM workshop on privacy, security and data mining: vol. 14 (pp. 43–54). Maebashi, Japan: Australian Computer Society, Inc.

Oliveira, S. R., & Zaïane, O. R. (2003a). Algorithms for balancing privacy and knowledge discovery in association rule mining. In IEEE 2003 7th international database engineering and applications symposium (pp. 54–63). IEEE.

Oliveira, S. R., & Zaïane, O. R. (2003b).
Protecting sensitive knowledge by data sanitization. In IEEE 2003 3rd international conference on data mining (pp. 613–616). IEEE.

Oliveira, S. R., & Zaïane, O. R. (2006). A unified framework for protecting sensitive association rules in business collaboration. International Journal of Business Intelligence and Data Mining, 1, 247–287.

Rajabioun, R. (2011). Cuckoo optimization algorithm. Applied Soft Computing, 11, 5508–5518.

Saygin, Y., Verykios, V. S., & Clifton, C. (2001). Using unknowns to prevent discovery of association rules. ACM SIGMOD Record, 30, 45–54.

Saygin, Y., Verykios, V. S., & Elmagarmid, A. K. (2002). Privacy preserving association rule mining. In IEEE 2002 12th international workshop on research issues in data engineering: engineering e-commerce/e-business systems (pp. 151–158). IEEE.

Sun, X., & Philip, S. Y. (2007). Hiding sensitive frequent itemsets by a border-based approach. Journal of Computing Science and Engineering, 1, 74–94.

Sun, X., & Yu, P. S. (2005). A border-based approach for hiding sensitive frequent itemsets. In IEEE 2005 5th international data mining conference (pp. 426–433). IEEE.

Verykios, V. S., Elmagarmid, A. K., Bertino, E., Saygin, Y., & Dasseni, E. (2004). Association rule hiding. IEEE Transactions on Knowledge and Data Engineering, 16, 434–447.
Verykios, V. S., Pontikakis, E. D., Theodoridis, Y., & Chang, L. (2007).
Efficient algorithms for distortion and blocking techniques in association rule hiding. Distributed and Parallel Databases, 22, 85–104.

Walton, S., Hassan, O., Morgan, K., & Brown, M. (2011). Modified cuckoo search: A new gradient free optimisation algorithm. Chaos, Solitons & Fractals, 44, 710–718.

Wang, S.-L., Parikh, B., & Jafari, A. (2007). Hiding informative association rule sets. Expert Systems with Applications, 33, 316–323.

Wu, Y.-H., Chiang, C.-M., & Chen, A. L. (2007). Hiding sensitive association rules with limited side effects. IEEE Transactions on Knowledge and Data Engineering, 19, 29–42.

Yang, X.-S., & Deb, S. (2009). Cuckoo search via Lévy flights. In Nature & biologically inspired computing, 2009. NaBIC 2009. World congress on (pp. 210–214). IEEE.

Yang, X.-S., & Deb, S. (2013). Multiobjective cuckoo search for design optimization. Computers & Operations Research, 40, 1616–1624.