key: cord-0047098-ekj4klow authors: Restrepo, Mauricio; Cornelis, Chris title: Attribute Reduction from Closure Operators and Matroids in Rough Set Theory date: 2020-06-10 journal: Rough Sets DOI: 10.1007/978-3-030-52705-1_13 sha: 086ed2507dd8df4869f6d0dd957d25aa3906e5c8 doc_id: 47098 cord_uid: ekj4klow In this paper, we present a new closure operator defined on the set of attributes of an information system that satisfies the conditions for defining a matroid. We establish some basic relationships between equivalence classes and approximation operators where different sets of attributes are used. It is shown that the reducts of an information system can be obtained from dependent sets of a matroid. Finally, we show that the closure operator can be defined at least in three different ways. The main concept of rough set theory, proposed by Z. Pawlak, is the indiscernibility between objects given by an equivalence relation in a non-empty set U, called Universe. This theory has been used for the study of information systems and it has been successfully applied in artificial intelligence fields such as machine learning, pattern recognition, decision analysis, process control, knowledge discovery in databases, and expert systems. Matroids were introduced in 1935 by Whitney and have been used as a generalization of the concept of independence in different mathematical theories like linear spaces, graph theory, field theory, and fuzzy sets [5, 11] . In relation to rough sets, many papers have shown interesting connections with matroids [6] [7] [8] [9] [10] [14] [15] [16] [17] . Zhu et al., presented the concept of rough matroids based on a relation [21, 22] and rough matroids based on coverings [19] . An interesting generalization of rough matroids for a general approximation operator was proposed in [12] . The problem of attribute reduction is a problem of high computational complexity and turns out to be fundamental in the construction of machine learning models. This problem has attracted the attention of many researchers and has been addressed from the perspective of different theories. Some relationships of attribute reduction with matroids and rough sets, can be found in [4, 17, 18, 20] . However, in most publications related to matroids and the attribute reduction problem, matroidal structures have been defined in terms of the set of objects and not the set of attributes. This paper proposes a matroidal structure defined in terms of an attribute set whose maximal independent elements, matching reducts. Therefore the idea of independence in a set given by a matroid can also be applied to the set of attributes A, in order to address the problem of attribute reduction in a better way. The paper is organized as follows: Sect. 2 presents preliminary concepts regarding rough set theory, lower and upper approximations, closure operators and matroids. Section 3 establishes some basic relationships between equivalence classes and approximation operators where different sets of attributes are used. Section 4 presents three definitions of closure operators and the necessary conditions for them to define a matroidal structure. Finally, Sect. 5 presents the main conclusions of the paper and describes future work. Rough set theory involves a data table composed of a finite set U of objects described by a finite set A of attributes. The basic notions of rough set theory are: indiscernibility relation on U, lower and upper approximation operators, dependence among attributes, and decision rules derived from lower approximations [3] . The pair (U, A) is called an information system. A simple example can be seen in Table 1 . In this case U = {1, 2, 3, 4, 5, 6} and A = {a 1 , a 2 , a 3 , a 4 }. From each subset of attributes P ⊆ A, it is possible to define a binary relation E P : xE P y if and only if f a (x) = f a (y) for all a ∈ P. In this case, f a (x) is the value of an object x for the attribute a ∈ A. According to the same table, we have that f a 1 (2) = B, while f a 2 (1) = G. Clearly this is an equivalence relation, i.e., a reflexive, symmetric and transitive relation. If X ⊆ U, the operators: are called the lower and upper approximations of X. [x] P represents the equivalence class of x for the set of attributes P, and I P = U/E P represents the equivalence classes, defined from the equivalence relation E P . Each subset of attributes P ⊆ A defines a partition of U. In particular, for P = {a}, the partition is represented as P a . The pair (apr, apr) is a dual pair of approximation operators, that is, for X ⊆ U, apr(∼ X) =∼ apr(X), where ∼ X represents the complement of X, i.e., ∼ X = U \ X. Example 1. In Table 1 we have an information system with six objects and four condition attributes {a 1 , a 2 , a 3 , a 4 }. Also, it is easy to see that: Attribute reduction in rough set theory involves the removal of attributes that have no significance to the classification problem. An attribute reduct set (or simply reduct) is a subset P of the set of attributes A such that the quality of classification is the same [13] . According to the definition of superfluous attribute given in [3] we can say that if [x] P = [x] P∪{a} for all x ∈ U, then a is called a superfluous attribute of P; otherwise, a is called indispensable in P. The set P is independent if all of its attributes are indispensable. The subset Q of P is a reduct of P (denoted as Red(P)) if Q is independent and Matroids are related to the notion of linear independence. They can be introduced from an elementary point of view as a collection of sets of linearly independent vectors. Let } is a set of linearly dependent vectors, because the matrix A with columns v i has free variables. However, there are several subsets of S whose vectors are linearly independent. For example, A collection of sets with linearly independent vectors determines a structure that we will call a matroid. Additionally, we know that any subset of linearly independent vectors is also linearly independent. In this case, a collection of independent sets is: There are different definitions of a matroid. In this case, we consider the following definition in terms of independence. where I is a collection of subsets of U with the following properties: denotes the cardinality of the set I. If I ∈ I, then I is called a independent set. If a subset of U is not an independent set, then it is called a dependent set. By property 3, every two maximal independent sets in a matroid have the same cardinal number. Rank Function. The rank function of a matroid M = (U, I) is a function r : P(U) → N defined as: This function satisfies: The notion of closure operator usually is applied to ordered sets and topological spaces. Some relations between closure operators with upper approximation and matroids are presented in [1, 6] . We present some concepts about ordered structures, according to Blyth [2] . If M = (U, I) is a matroid and r its rank function, then is a closure operator. Also, the operator c M satisfies the following property: For any x, y ∈ U and any A ⊆ U, it follows from y ∈ c(A ∪ {x}) and y / ∈ c(A) that x ∈ c(A ∪ {y}). Any closure operator which satisfies property P 4 defines a matroidal structure according to the following proposition. Proposition 1. [11] If c is a closure operator on a finite set U that satisfied property P 4 , and The following propositions establish useful relationships between equivalence classes and approximation operators for different sets of attributes. Proof. If w ∈ [x] Q then f a (x) = f a (w) for all a ∈ Q, in particular for all a ∈ P, since P ⊆ Q. Therefore f a (x) = f a (w) for all a ∈ P, and so w ∈ [x] P . Using the order relation above it is possible to establish the following: For all P, Q ⊆ A and X ⊆ U we have: Example 2. In Table 2 we have the equivalence classes of each element x ∈ U, using some sets of attributes. For the particular cases of P = {a i }, we have the partitions given in Example 1. The sets of attributes Q 1 = {a 1 , a 2 , a 4 }, Q 2 = {a 1 , a 3 , a 4 }, and A = {a 1 , a 2 , a 3 , a 4 } have the same equivalence class for each x ∈ U. It is easy to see that lower and upper approximations of each X ⊆ U are also the same. In this case, Q 1 and Q 2 are reducts of A. Let (U, A) be a decision system, where U is a finite set and A is a finite set of attributes. For each P ⊆ A, we can define at least three closure operators, as follows. For each P ⊆ A subset of attributes, we define: Proposition 5. The operator c 1 is a closure on A. Proof. We will show the three properties: 1. P ⊆ c 1 (P). If a ∈ P, then P ∪ {a} = P and apr P∪a (X) = apr P (X). 2. If P ⊆ Q, then c 1 (P) ⊆ c 1 (Q). If a ∈ c 1 (P), we have that apr P∪{a} (X) = apr P (X) for all X ⊆ U. We will show that a ∈ c(Q), i.e., apr Q∪{a} (X) = apr Q (X). If w ∈ apr Q∪{a} (X), then [w] Q∪{a} ⊆ [w] P∪{a} = [w] P . Therefore, apr Q∪{a} (X) ⊆ apr Q (X). According to Proposition 2, apr Q (X) ⊆ apr Q∪{a} (X). So, apr Q∪{a} (X) = apr Q (X) and a ∈ c 2 (P). 3. c 1 (c 1 (P)) = c 1 (P). From properties 1 and 2 we have that c 1 (P) ⊆ c 1 (c 1 (P)). For the other relation let a be such that a ∈ c 1 (c 1 (P)). We have that apr c 1 (P)∪{a} (X) = apr c 1 (P) (X) for all X ⊆ U. We need to show that apr P∪{a} (X) = apr P (X). The relation apr c 1 (P) (X) ⊆ apr c 1 (c 1 (P)) (X) always holds. For X ∈ P a , we have that apr c 1 (P) (X) = X. Therefore apr c 1 (c 1 (P)) (X) = X, so ∈ c 1 (P), (see Definition 5) . There are at least other two definitions for the closure operator. For each P ⊆ A subset of attributes, we define: The following proposition establishes the equivalence between c 1 and c 2 operators. Proposition 6. c 2 = c 1 . Proof. We will see that c 1 (P) ⊆ c 2 (P) and c 2 (P) ⊆ c 1 (P), for all P ⊆ A. P∪{a} ⊆ X iff w ∈ apr P∪{a} (X). So apr P (X) = apr P∪{a} (X) for all X ⊆ U, and a ∈ c 1 (P). 2. c 1 ≤ c 2 . Suppose that a ∈ c 1 (P), and a / ∈ c 2 (P). For some a ∈ A, P ⊆ A. Since a / ∈ c 2 (P), there exists x ∈ U such that [x] P = [x] P∪{a} . Hence, there exists y ∈ U for which y ∈ [x] P but y / ∈ [x] P∪{a} . Suppose X = [x] P∪{a} , then apr P∪{a} (X) = X and since a ∈ c 1 (P), apr P (X) = X. Since x ∈ X, therefore [x] P ⊆ X. But this means that y ∈ X = [x] P∪{a} (X) = X, a contradiction. So c 1 ≤ c 2 . Operator c 2 satisfies property P 4 . This is: for any a, b ∈ A and any P ⊆ A, if b ∈ c 2 (P ∪ {a}) and b / ∈ c 2 (P), then a ∈ c 2 (P ∪ {b}). Proof. We will see that c 2 (P) ⊆ c 3 (P), and c 3 (P) ⊆ c 2 (P) for all P ⊆ A. From Propositions 1 and 6, follows this corollary. Example 3. The values of the closure operator c 1 for some sets of attributes in Example 1, are the following: According to Propositions 1 and 7, operator c 1 defines a matroidal structure, given by: a) , for all a ∈ A} (10) Figure 1 shows the matroidal structure obtained from the closure operator on the dataset in Example 1. The connecting lines represent the inclusion relation. According to Eq. 10, the set P = {a 2 , a 4 } belongs to I, because for all a i ∈ A, we have that a i / ∈ c 1 (P − a i ). On the other hand, the set P = {a 3 , a 4 } does not belong to I, because for a 1 ∈ A, we have that a 1 ∈ c 1 (P − a 1 ) = c 1 (P) = {a 1 , a 2 , a 3 , a 4 }. As we can see, the maximal independent sets {a 1 , a 4 }, {a 2 , a 3 } and {a 2 , a 4 } have the same number of elements. If P is different to c 1 (P), i.e. if P c 1 (P), then P is a dependent set, as is shown in the following proposition. Proposition 9. If P c 1 (P), then P is a dependent set Proof. If P c 1 (P), then there exists a ∈ c 1 (P) such that a / ∈ P. Since a / ∈ P, we have P − a = P, therefore a ∈ c 1 (P) = c 1 (P − a), so P / ∈ I. Figure 2 shows the structure obtained from the dependent set related with the matroidal structure. In this collection of dependent sets we can find all the possible reducts. For example, P 1 = {a 1 , a 2 } and P 2 = {a 1 , a 3 } produce the same partition as Q = {a 1 , a 2 , a 3 }, i.e., I P 1 = I Q and I P 2 = I Q . Also, we have that P 1 = {a 1 , a 3 , a 4 } and P 4 = {a 2 , a 3 , a 4 } produce the same partition as A. In this case, P 1 = {a 1 , a 3 , a 4 } and P 2 = {a 1 , a 2 , a 4 } are reducts of A. This paper introduces a new methodology to address the problem of attribute reduction of an information system in Rough Set Theory. Specifically, a closure operator is introduced into the set of attributes that satisfies the conditions for defining a matroid. It is shown that the dependent sets of the matroid contain the reducts. Two definitions of the closure operator are additionally presented and it is proved that they coincide with the proposed operator. The novelty of the proposal to define new structures on the set of attributes undoubtedly opens the possibility of analyzing the problem of attribute reduction from a new perspective. As future work, it is planned to extend the definition to the different generalizations of Rough Set Theory, in particular relation-based rough sets and coveringbased rough sets. Characterization of coverings for upper approximation operators being closure operators Lattices and Ordered Algebraic Structures Rough sets theory for multicriteria decision analysis Nullity-based matroid of rough sets and its application to attribute reduction Matroid Theory Matroidal approaches to rough sets via closure operators Rough sets and matroids from a lattice-theoretic viewpoint The relationships between degree rough sets and matroids Relation matroid and its relationship with generalized rough set based on relations Relationship between partition matroids and rough sets through k-rank matroids Matroid Theory Rough matroids based on dual approximation operators Rough set-based dimensionality reduction for supervised and unsupervised learning A matroidal approach to rough set theory On the upper approximations of covering generalized rough sets Transversal and function matroidal structures of covering-based rough sets Matroidal structure of rough sets and its characterization to attribute reduction. Knowl.-Based Syst Reduction about approximation spaces of covering generalized rough sets Rough matroids based on covering The lattice and matroid representations of definable sets in generalized rough sets based on relations Rough matroids Rough matroids based on relation