title: Quick Maximum Distribution Reduction in Inconsistent Decision Tables
authors: Li, Baizhen; Chen, Wei; Wei, Zhihua; Zhang, Hongyun; Zhang, Nan; Sun, Lijun
date: 2020-06-10 journal: Rough Sets DOI: 10.1007/978-3-030-52705-1_12
Attribute reduction is a key issue in rough set theory, and this paper focuses on maximum distribution reduction for complete inconsistent decision tables. It is quite inconvenient to judge a maximum distribution reduct directly from its definition, and the existing heuristic judgment methods are inefficient because they lack the acceleration mechanisms that mainstream heuristic judgment methods enjoy. In this paper, we first point out the defect of the judgment method proposed by Li et al. [15]. After analyzing the root cause of the defect, we propose two novel heuristic attribute reduction algorithms for maximum distribution reduction. Experiments show that the proposed algorithms are more efficient. Rough set theory, introduced by Z. Pawlak [1] in 1982, is an efficient tool for processing imprecise, incomplete, and uncertain information [2-5]. Currently, rough set theory has been successfully applied to many practical problems, including machine learning [6,7], pattern recognition [8,9], data mining [10], and decision support systems [11]. Attribute reduction, the process of obtaining a minimal set of attributes that preserves the same classification ability as the entire attribute set, is one of the core concepts in rough set theory [12]. Maximum distribution reduction, proposed by Zhang et al. [14] in 2003 as a compromise between the capability of generalized decision preservation reduction and the complexity of distribution preservation reduction [13], guarantees that the decision values with maximum probability of the objects in an inconsistent decision table remain unchanged. Subsequently, Pei et al.
proposed a theorem for maximum distribution reduct judgment in 2005. Next, Li et al. [15] paid attention to the computational efficiency of the reduction definition and designed a new definition of maximum distribution reduction to speed up attribute reduction. Taking general reduction on inconsistent decision tables into consideration, Ge et al. [16] proposed a new definition of maximum distribution reduction. The heuristic approach is one of the most important methods in attribute reduction. It is composed of two parts: the attribute reduction heuristic and the search strategy [17]. The attribute reduction heuristic is the fitness function of a heuristic approach. Existing definitions of heuristics are mainly based on three aspects: dependency degree [18], entropy [19-21], and consistency [22,23]. The search strategy is the control structure of the heuristic approach. Loosely speaking, search strategies fall into three kinds: the deletion method, the addition method, and the addition-deletion method [24]. Existing methods for the judgment of maximum distribution reducts were only weakly associated with mainstream heuristics. As a result, the efficiency of heuristic maximum distribution reduction algorithms was limited by the lack of the acceleration policies that mainstream heuristics support. This paper focuses on quick reduction algorithms for maximum distribution reduction. First, we analyze the defect of the quick maximum distribution reduction algorithm (Q-MDRA) proposed in [15] and explore its root cause. Next, based on existing mainstream heuristic functions, we develop two heuristic maximum distribution reduction algorithms. Finally, we conduct experiments to evaluate the effectiveness and efficiency of the proposed algorithms. The rest of this paper is organized as follows. In Sect. 2, we review some basic notions related to maximum distribution reduction and three classic heuristic functions.
In Sect. 3, we show the defect of Q-MDRA with a calculation example of maximum distribution reduction. After exploring the root cause of the defect, we present two novel algorithms for maximum distribution reduction. In Sect. 4, we evaluate the efficiency of the proposed algorithms through complexity analysis and comparison experiments. In this section, we review some basic notions related to maximum distribution reduction and three classic heuristic functions. The research object of rough set theory is called the information system. An information system IS can be expressed as a four-tuple <U, A, V, f>, where U stands for the universe of discourse, a non-empty finite set of objects; A is the set of attributes; V = ∪_{a∈A} V_a is the set of all attribute values; and f : U × A → V is an information function that maps an object in U to exactly one value in V_a. For ∀x ∈ U, ∀a ∈ A, we have f(x, a) ∈ V_a. Specifically, in the classification problem the information table contains two kinds of attributes, which can be characterized by a decision table DT = (U, C ∪ D, V, f) with C ∩ D = ∅, where an element of C is called a condition attribute, C is called the condition attribute set, an element of D is called a decision attribute, and D is called the decision attribute set. For a condition attribute set B ⊆ C, the indiscernibility relation and the discernibility relation of B are defined respectively by IND(B) = {<x, y> ∈ U × U | ∀a ∈ B, f(x, a) = f(y, a)} and DIS(B) = (U × U) − IND(B). On the basis of the above notions, the concept of maximum distribution reduction was proposed by Zhang et al. [14] in 2003. B is said to be a maximum distribution consistent attribute set if B satisfies condition (1) mentioned above only. There are two kinds of methods for maximum distribution reduction: discernibility matrix based methods and heuristic methods. Because discernibility matrix based methods are inefficient, heuristic methods are the more reasonable choice for processing larger scale data.
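The notions above can be made concrete with a small sketch. We assume a decision table stored as a list of tuples, each holding the condition attribute values followed by a decision value; the layout, helper names, and toy data are our own illustration, not the paper's Table 1.

```python
from collections import defaultdict

# Toy decision table: each row holds condition attribute values followed
# by a decision value. The data are illustrative assumptions only.
U = [
    (0, 0, 1, 'P1'),
    (0, 0, 1, 'P2'),
    (1, 0, 0, 'P1'),
    (1, 0, 0, 'P1'),
    (1, 1, 0, 'P2'),
]

def partition(table, attrs):
    """Equivalence classes of IND(B): x ~ y iff f(x, a) == f(y, a)
    for every attribute a in B (attrs holds column indices)."""
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def max_decision_classes(table, attrs):
    """gamma_B(x): the set of decision values occurring with maximum
    frequency inside x's equivalence class under IND(B)."""
    gamma = {}
    for block in partition(table, attrs):
        counts = defaultdict(int)
        for i in block:
            counts[table[i][-1]] += 1
        top = max(counts.values())
        md = frozenset(d for d, c in counts.items() if c == top)
        for i in block:
            gamma[i] = md
    return gamma
```

In this toy table, objects 0 and 1 are indiscernible under all condition attributes but disagree on the decision, so the table is inconsistent and both objects receive the tied maximum decision set {P1, P2}.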
Heuristic attribute reduction algorithms comprise two parts: the heuristic function and the control strategy. We take addition strategy based heuristic algorithms as the research object of this paper. For the heuristic functions, we take three classic heuristic functions, i.e., the dependency degree, the conditional entropy, and the consistency, as alternatives for the construction of the improved algorithms. Given U/B = {X_1, X_2, ..., X_m} and U/D = {Y_1, Y_2, ..., Y_n}, the three classic heuristic functions (dependency degree, consistency, and conditional entropy) are defined by γ_B(D) = |POS_B(D)|/|U|, δ_B(D) = (Σ_{i=1}^{m} max_{1≤j≤n} |X_i ∩ Y_j|)/|U|, and H(D|B) = −Σ_{i=1}^{m} (|X_i|/|U|) Σ_{j=1}^{n} (|X_i ∩ Y_j|/|X_i|) log(|X_i ∩ Y_j|/|X_i|). In this section, we first point out the defect of Q-MDRA. After analyzing its cause, we construct two quick heuristic maximum distribution reduction algorithms based on classic heuristic functions. First, we review the quick maximum distribution reduction algorithm (Q-MDRA) proposed by Li et al. here. Based upon Definition 1, Li et al. [15] proposed the following theorem for the judgment of the maximum distribution reduct. This theorem appears as Theorem 6.11 of Ref. [15]. The condition γ_B^MD(D) = γ_C^MD(D) keeps the scale of the maximum decision classes unchanged, rather than the maximum decision classes of all objects in the decision table. That is to say, B may not be a maximum distribution reduct of C under some special conditions. We present the details in Sect. 3.1. Based on the variant of the dependency degree heuristic function in Theorem 1, Algorithm 1 was constructed using the addition strategy. Note that in Algorithm 1 we denote the assignment operation by ":=" and use "=" to state that two items are equal. Here, by way of a calculation example, we show in detail why Q-MDRA may not perform as expected. Assume that a decision table is given as Table 1 and we are asked to compute a maximum distribution reduct of Table 1. The process of Q-MDRA for obtaining a maximum distribution reduct of Table 1 is as follows.
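The three heuristics above can be sketched in a few lines over the same list-of-tuples table layout. This is a minimal sketch with our own helper names; we use base-2 logarithms for the conditional entropy, which is one common convention.

```python
import math
from collections import defaultdict

def blocks_of(table, attrs):
    """Equivalence classes (as index lists) of IND(attrs)."""
    groups = defaultdict(list)
    for i, row in enumerate(table):
        groups[tuple(row[a] for a in attrs)].append(i)
    return groups.values()

def dependency_degree(table, attrs):
    """gamma_B(D) = |POS_B(D)| / |U|: fraction of objects whose
    B-equivalence class is decision-pure."""
    pos = sum(len(b) for b in blocks_of(table, attrs)
              if len({table[i][-1] for i in b}) == 1)
    return pos / len(table)

def consistency(table, attrs):
    """delta_B(D): fraction of objects whose decision matches the
    majority decision of their B-equivalence class."""
    hit = 0
    for b in blocks_of(table, attrs):
        counts = defaultdict(int)
        for i in b:
            counts[table[i][-1]] += 1
        hit += max(counts.values())
    return hit / len(table)

def conditional_entropy(table, attrs):
    """H(D|B) = -sum_i p(X_i) sum_j p(Y_j|X_i) log2 p(Y_j|X_i)."""
    n = len(table)
    h = 0.0
    for b in blocks_of(table, attrs):
        counts = defaultdict(int)
        for i in b:
            counts[table[i][-1]] += 1
        for c in counts.values():
            p = c / len(b)
            h -= (len(b) / n) * p * math.log2(p)
    return h
```

On a consistent table all three agree: the dependency degree and consistency equal 1 and the conditional entropy equals 0, which is the monotone behaviour the addition strategy exploits.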
Step 1. red := ∅. Step 2. T := red, γ^Md. Using Q-MDRA, we obtain the attribute collection {a_1}. According to Theorem 1, {a_1} is a maximum distribution reduct of Table 1. Here we analyze the root cause of the defect of Theorem 1. Given a decision table, the measure γ_B^Md(D) used in Theorem 1 is not sensitive to changes in the maximum decision classes of objects that have two or more maximum decision classes. On the other hand, an attribute set red output by Q-MDRA does not always satisfy γ_red^Md(D) = γ_C^Md(D). To solve the problems identified in Q-MDRA, we first define the indiscernibility relation and the discernibility relation of maximum distribution with respect to a specific attribute set. Next, the maximum distribution reduct is defined using the indiscernibility relation of maximum distribution. Finally, we construct heuristic maximum distribution reduction algorithms with classic heuristic functions. Necessity (⇐): Assume that B satisfies DIS_md(C) ⊆ DIS(B) but ∃x ∈ U, γ_B(x) ≠ γ_C(x). By this assumption, we know ∃y ∈ [x]_B with γ_C(y) ≠ γ_C(x). That is to say, <x, y> ∈ DIS_md(C) and <x, y> ∉ DIS(B). This contradicts DIS_md(C) ⊆ DIS(B). Consequently, if B satisfies DIS_md(C) ⊆ DIS(B), then ∀x ∈ U, γ_B(x) = γ_C(x). Hence Theorem 2 holds. The above theorem is easy to understand but not convenient for computation, so we express maximum distribution reduction in terms of the classic heuristic functions. According to Definition 2, we can characterize the maximum distribution reduct by conditional entropy. This contradicts the assumption that B is a maximum distribution reduct. That is to say, if B is a maximum distribution consistent attribute set, then H(TGran|B) = 0. As a result, Theorem 3 holds. According to Definition 2, we can also use the dependency degree to characterize the maximum distribution reduct. Proof. According to Theorem 2 and Theorem 3, the conclusion is clearly established.
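The Theorem-2-style criterion can be tested directly: B is maximum distribution consistent iff every pair of objects whose maximum decision classes under C differ (i.e., every pair in DIS_md(C)) is discerned by B. A minimal sketch under the same assumed list-of-tuples layout, with helper names of our own, is:

```python
from collections import defaultdict
from itertools import combinations

def gamma(table, attrs):
    """gamma_B(x): maximum decision classes of each object under IND(B)."""
    blocks = defaultdict(list)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].append(i)
    g = {}
    for b in blocks.values():
        counts = defaultdict(int)
        for i in b:
            counts[table[i][-1]] += 1
        top = max(counts.values())
        md = frozenset(d for d, c in counts.items() if c == top)
        for i in b:
            g[i] = md
    return g

def is_md_consistent(table, B, C):
    """True iff B is a maximum distribution consistent attribute set,
    i.e., DIS_md(C) is a subset of DIS(B)."""
    gc = gamma(table, C)
    for x, y in combinations(range(len(table)), 2):
        if gc[x] != gc[y]:  # <x, y> in DIS_md(C)
            # pair not discerned by B => violation
            if all(table[x][a] == table[y][a] for a in B):
                return False
    return True
```

A candidate reduct can then be confirmed by checking that dropping any single attribute from it breaks the condition, per the usual minimality requirement.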
Since Γ_C(D) = 1 ⇔ δ_C(D) = 1, we have Γ_B(TGran) = 1 ⇔ δ_B(TGran) = 1. As a result, there is no need to construct a separate theorem for maximum distribution reduction based on δ_B(TGran). Based on the above theorems, the significance functions for maximum distribution reduction can be defined as follows. The objective of this section is to demonstrate the correctness and the efficiency of the attribute reduction algorithms proposed in this paper, i.e., MDRAUCE and MDRAUDD. To show the correctness of the two algorithms, we calculate the maximum distribution reduct of Table 1 using MDRAUCE and MDRAUDD, and check their outputs against the definition of maximum distribution reduction for validation. On the other hand, we employed 12 UCI data sets to compare the time consumption of MDRAUCE, MDRAUDD, and the existing maximum distribution reduction algorithms. In this part, we show the correctness of the two algorithms proposed in Sect. 3 by presenting the process of calculating the maximum distribution reduct of Table 1 using Algorithm 2 and Algorithm 3. After that, we check the outputs of the two algorithms against the maximum distribution definition. The process of MDRAUCE for finding the maximum distribution reduct of Table 1 is presented here. In the following description of the calculation process, "item1 = item2" denotes that the two items are equal, and ":=" stands for the assignment operation. Step 1. red := ∅. Step 2. Sig^outer. Step 5. Because U = ∅, the program terminates. The algorithm outputs red = {a_1, a_3, a_2} as the result. The process of MDRAUDD for obtaining the maximum distribution reduct of Table 1 is as follows. Step 1. Step 2. Sig^outer. Step 3. Sig^outer. Step 5. Because U = ∅, the program terminates. The algorithm outputs red = {a_3, a_2, a_1} as the result.
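The addition strategy with positive-region shrinking (the step TGran := TGran − POS_red(TGran) of the algorithm listing) can be sketched as follows. This is a hedged illustration in the spirit of MDRAUDD: we relabel each object with its set of maximum decision classes under C, then greedily add the attribute yielding the largest positive region and discard covered objects. The helper names, tie-breaking, and toy table are our assumptions; the paper's algorithm may differ in detail.

```python
from collections import defaultdict

def blocks_of(rows, attrs, idx):
    """Equivalence classes of IND(attrs), restricted to objects in idx."""
    groups = defaultdict(list)
    for i in idx:
        groups[tuple(rows[i][a] for a in attrs)].append(i)
    return groups.values()

def md_labels(rows, C):
    """T Gran: relabel each object with its set of maximum decision
    classes under the full condition attribute set C."""
    labels = {}
    for b in blocks_of(rows, C, range(len(rows))):
        counts = defaultdict(int)
        for i in b:
            counts[rows[i][-1]] += 1
        top = max(counts.values())
        lab = frozenset(d for d, c in counts.items() if c == top)
        for i in b:
            labels[i] = lab
    return labels

def mdraudd(rows, C):
    """Greedy addition strategy with dependency-degree significance,
    shrinking the universe by the positive region after each round."""
    tg = md_labels(rows, C)
    red, rest = [], set(range(len(rows)))
    while rest:
        best, best_pos = None, set()
        for a in C:
            if a in red:
                continue
            pos = set()
            for b in blocks_of(rows, red + [a], rest):
                if len({tg[i] for i in b}) == 1:  # block pure w.r.t. T Gran
                    pos.update(b)
            if best is None or len(pos) > len(best_pos):
                best, best_pos = a, pos
        red.append(best)
        rest -= best_pos  # T Gran := T Gran - POS_red(T Gran)
    return red
```

Discarding objects already in the positive region is the acceleration that makes the later rounds cheaper, since the equivalence classes are recomputed only over the shrinking universe.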
According to Definition 1, we know γ_red(x_1) = {P_1} and γ_red(... It is evident that for ∀x ∈ U, γ_red(x) = γ_C(x). Finally, we conclude that MDRAUCE and MDRAUDD are correct. In this part, we employed 12 data sets to compare the time consumption of MDRAUCE, MDRAUDD, Q-MDRA [15], and QGARA-FS [16]. We ran all the attribute reduction algorithms in the experiments on a personal computer with Windows 10, an Intel(R) Core(TM) i5-8265U CPU at 1.60 GHz, and 8 GB RAM. The software used was Visual Studio Code 1.3.8, and the programming language was Python 3.7. The data sets used in the experiments were all downloaded from the UCI repository of machine learning data sets [25]; their basic information is outlined in Table 2. Because the reduction algorithms can handle only symbolic data, data sets containing continuous attributes were preprocessed by the CAIM [26] discretization algorithm. For each data set, the positive region dependency degree, i.e., γ_C(D), is listed in the last column of Table 2. As we know, a data set is consistent if γ_C(D) = 1; otherwise, it is inconsistent. As shown in Table 2, Wpbc, Wine, and Sonar are consistent. Taking the value of γ_C(D) into consideration, we treat Sat, Segment, Wdbc, and Wave as nearly consistent data sets, whose values of γ_C(D) satisfy 0.981 ≤ γ_C(D) < 1. The other 5 data sets (Vehicle, Ion, Glass, Heart, and Pid) are inconsistent. The results in Table 3 show the computational times of MDRAUCE, MDRAUDD, Q-MDRA, and QGARA-FS for obtaining the maximum distribution reduct on the 12 data sets. We can see that MDRAUDD was the fastest of the four attribute reduction algorithms, as it was the best on 11 data sets, and MDRAUCE was faster than QGARA-FS. MDRAUCE performed better than Q-MDRA in obtaining the reducts of 9 data sets. Q-MDRA performed better than MDRAUCE and MDRAUDD on small data sets, e.g., the Wine data set. However, in processing larger scale data, Q-MDRA consumed more time than MDRAUCE and MDRAUDD.
From the results of the experiments on both consistent and inconsistent decision tables, the speeds of the four algorithms in obtaining the maximum distribution reduct followed this order: MDRAUDD ≥ MDRAUCE, Q-MDRA > QGARA-FS. In most cases in the experiments, the computational time of MDRAUDD was less than half that of QGARA-FS and Q-MDRA, e.g., on the data sets Wpbc, Glass, and Heart. Under the same conditions, from the row of average time consumption in obtaining the reducts of the 12 data sets, we know that MDRAUCE and MDRAUDD are more efficient and more stable in the time consumption of maximum distribution reduction than the existing maximum distribution reduction algorithms. In this paper, we focused on maximum distribution reduction for complete inconsistent decision tables. We pointed out the problems in Li's algorithm for obtaining the maximum distribution reduct and, based on classic heuristic functions, designed two novel heuristic algorithms, i.e., MDRAUCE and MDRAUDD, to efficiently find a maximum distribution reduct. Because the scale of the data to be processed keeps growing, the efficiency of attribute reduction algorithms remains the focus of our future research.
[1] Rough sets
[2] A multi-attribute decision-making model for the robust classification of multiple inputs and outputs datasets with uncertainty
[3] Approximations and uncertainty measures in incomplete information systems
[4] Enhanced rough-fuzzy c-means algorithm with strict rough sets properties
[5] On a novel uncertain soft set model: Z-soft fuzzy rough set model and corresponding decision making methods
[6] ieRSPOP: a novel incremental rough set-based pseudo outer-product with ensemble learning
[7] Test-cost-sensitive rough set based approach for minimum weight vertex cover problem
[8] Flow-based tolerance rough sets for pattern classification
[9] An enhanced classification method comprising a genetic algorithm, rough set theory and a modified PBMF-index function
[10] Attribute reduction for dynamic data sets
[11] A hybrid decision support system based on rough set and extreme learning machine for diagnosis of hepatitis disease
[12] Dimensionality reduction based on rough set theory: a review
[13] Comparative study of alternative types of knowledge reduction in inconsistent systems
[14] Knowledge reductions in inconsistent information systems
[15] Quick attribute reduction in inconsistent decision tables
[16] Quick general reduction algorithms for inconsistent decision tables
[17] On reduct construction algorithms
[18] Learning in relational databases: a rough set approach
[19] A rough set algorithm for attribute reduction via mutual information and conditional entropy
[20] Continuous attribute reduction method based on an automatic clustering algorithm and decision entropy
[21] Entropy based attribute reduction approach for incomplete decision table
[22] Consistency-based search in feature selection
[23] Consistency based attribute reduction
[24] Analysis on attribute reduction strategies of rough set
[25] UCI machine learning repository
[26] An effective discretization based on class-attribute coherence maximization
A special thanks is owed to Professor Li Min; the discussions with him on Q-MDRA contributed greatly to this paper.
This work is partially supported by the National Key Research and Development Project (No. 213), the National Natural Science Foundation of China (No. 61976160, No. 61673301), and the Key Lab of Information Network Security, Ministry of Public Security (No. C18608).