key: cord-0044788-5d3l5wr3 authors: Petiot, Guillaume title: Converting Possibilistic Networks by Using Uncertain Gates date: 2020-05-16 journal: Information Processing and Management of Uncertainty in Knowledge-Based Systems DOI: 10.1007/978-3-030-50153-2_44 sha: 64f628a88a0c01f11f52f3008b4ce53f5acb1be3 doc_id: 44788 cord_uid: 5d3l5wr3 The purpose of this paper is to define a general frame to convert the Conditional Possibility Tables (CPT) of an existing possibilistic network into uncertain gates. In possibilistic networks, CPT parameters must be elicited by an expert but when the number of parents of a variable grows, the number of parameters to elicit grows exponentially. This problem generates difficulties for experts to elicit all parameters because it is time-consuming. One solution consists in using uncertain gates to compute automatically CPTs. This is useful in knowledge engineering. When possibilistic networks already exist, it can be interesting to transform them by using uncertain gates because we can highlight the combination behaviour of the variables. To illustrate our approach, we will present at first a simple example of the estimation for 3 test CPTs with behaviours MIN, MAX and weighted average. Then, we will perform a more significant experimentation which will consist in converting a set of Bayesian networks into possibilistic networks to perform the estimation of CPTs by uncertain gates. Knowledge engineering often needs to describe how information is combined. Experts' knowledge can be represented by a Directional Acyclic Graph (DAG). Unfortunately, this kind of knowledge is often imprecise and uncertain. The use of possibility theory [21] allows us to take into account these imperfections. Possibilistic networks [1, 2] , as Bayesian networks [17, 20] in probability theory, can be used to propagate new information, called evidence, in a DAG. The main disadvantage of possibilistic networks is the need to elicit all parameters of Conditional Possibility Tables. When the number of parents of a variable grows, the number of parameters to elicit in the CPT grows exponentially. For example, if a variable of 2 states has 10 parents with 2 states, then we have 2 11 = 2048 parameters to elicit. Moreover, when experts define the parameters of a CPT in possibilistic networks, they often define an implicit behaviour. The most common behaviours are those of Boolean logic, such as AND, OR, XOR, etc. Noisy gates in probability theory [4] deal with this problem of parameters. They use a function to compute automatically the CPT. As a result, the number of parameters is reduced. Another advantage is the modelling of noise and incomplete knowledge. Indeed, it is often difficult to have all the knowledge during the task of modelling because of the complexity of the problem. The use of a variable which represents unknown knowledge leads to another kind of model. In possibility theory, authors Dubois et al. [5] proposed to use uncertain logical gates which present the same advantages as noisy gates. The behaviour of uncertain gates connectors depends on the choice of the function. The function can have the following behaviours: indulgent, compromise or severe. Nowadays, there are several applications of possibilistic networks that exist where experts have elicited all CPT parameters. We propose in this paper a solution to convert an existing possibilistic network into a possibilistic network with uncertain gates. It is very interesting to see which connector corresponds to each CPT. This can provide a better understanding of information processing, which is not highlighted when an expert elicits CPT parameters. This is also useful in knowledge engineering. This paper is structured as follows. In the first part, we will present possibility theory and uncertain gates. Then we will focus our interest on the estimation of uncertain gates and we will propose the examples of the estimation of uncertain gates for test CPTs and existing applications using Bayesian networks. So we will have to convert Bayesian networks into possibilistic networks and then we will perform the estimation of the CPTs. We will present the results of this experiment in the last part of this paper. Uncertain gates are an analogy of noisy gates in possibility theory [21] . Possibility theory proposes to model imprecise knowledge by using a possibility distribution. If Ω is the referential, then the possibility distribution π is defined from Ω to [0, 1] with max v∈Ω π(v) = 1. Dubois et al. [6] define the possibility measure Π and the necessity measure N from P (Ω) to [0, 1]: The possibility theory is not additive but maxitive: We can compute the possibility of a variable A by using the conditioning proposed by D. Dubois and H. Prade [6] : Possibilistic networks [1, 2] can be defined as follows for a DAG G = (V, E) where V is the set of Variables and E the set of edges between the variables: Pa stands for the parents of the variable X. There are two kinds of possibilistic networks: qualitative possibilistic networks (also called min-based possibilistic networks) where is the function min, and quantitative possibilistic networks (also called product-based possibilistic networks) where is the product. In this research, we consider the comparison of possibilistic values instead of the use of an intensity scale in [0,1] so we will use a min-based possibilistic network. We can propose as in noisy gates to use the Independence of Causal Influence [4, 9, 25] to define a possibilistic model with the ICI. In fact, as in probability theory, there is a set of causal variables X 1 , ..., X n which influence the result of an effect variable Y . We insert between each X i s and Y a variable Z i to take into account the noise, i.e. uncertainty. We can also introduce another variable Z l which represents the unknown knowledge [5] . The following figure presents the possibilistic model with ICI ( Fig. 1 ): We propose to compute π(Y |X 1 , ..., X n ) by marginalizing the variables Z i s: The ⊗ is the minimum and the ⊕ is the maximum in possibility theory. If we have a deterministic model where the value of Y can be computed from the values of Z i s by using a function f then: where π(y|z1, ..., zn) = 1 if y = f (z1, ..., zn) 0 else As a result, we obtain the following equation: If we add a leakage variable Z l in the previous equation, we obtain: π(y|x1, ..., xn) = z 1 ,...,zn,z l :y=f (z 1 ,...,zn,z l ) The CPT is computed by using this formula. The function f can be AND, OR or their generalizations uncertain MIN and uncertain MAX as proposed by Dubois et al. [5] . These authors developed an optimization for the computation of uncertain MIN and MAX connectors. We can also use a weighted average function or the operator OWA [22] , as described in our previous study [13, 14] . For this experimentation, we will use the connectors uncertain MIN (UMIN), uncertain MAX (UMAX) and uncertain weighted average (UWAVG). The function f must have the same domain as the variable Y . We can see that the connectors uncertain MIN and uncertain MAX satisfy this property. The weighted average of the intensity level of the variables X i can return a value outside the domain of Y . So in this case we have to combine a function noted g with a scaling function f s which returns a value in the domain of Y . We can write f = f s • g where function g is g(z 1 , ..., z n ) = ω 1 z 1 +...+ω n z n . If the state of the variable Y defines an ordered scale E = {ϑ 0 < ϑ 1 < ... < ϑ m }, the function f s can be for example: The parameters θ i allow us to adjust the behaviour of f s . The function g has n parameters which are the weights w i of the linear combination and n arguments. If all weights are equal to 1 n , then we calculate the average of the intensities. If ∀ i∈ [1,n] ω i = 1, then we make the sum of the intensities. To compute the Eq. 9, we have to define π(Z i |X i ), π(Z l ), and the function f . If the states of a variable are ordered, we can assign an intensity level to each state as done by Dubois et al. [5] . In our experimentation with test CPTs we propose to use 3 states for the variables such as low, medium and high. The intensity levels are 0 for low, 1 for medium and 2 for high. The table of π(Z i |X i ) is the following ( Table 1) : In the previous table, κ represents the possibility that an inhibitor exists if the cause is met and s i the possibility that a substitute exists when the cause is not met. If a cause of weak intensity cannot produce a strong effect, then all s i = 0. So there are 6 parameters at the most per variable and 2 parameters for π(Z l ). If all variables have the same intensity level and are graded variables with a normal state of intensity 0 (the first state is the normal state), there is no problem. In this case, the higher the intensity is, the more abnormal the situation becomes. But there are two other cases to discuss. The first case concerns the variables which are not graded variables. For example, if we have a variable where the domain is (decrease, normal, accelerate), we can see that the normal speed is the neutral state i.e. with the intensity level 1. The second case concerns the incompatibilities of the intensity levels of the causal variables X i s and their parents. Indeed, if some causal variables do not share the same domain with the effect variable, we can imagine that the computation of the function f can give the results that are out of range. This remark leads to a constraint on Z i s, which is that Z i s must have the same domain as Y . The estimation of uncertain gates leads to two problems discussed in [24] . The first one is the simple case where we would like to perform an estimation of uncertain gates from an existing CPT. The second case concerns the estimation of the connector from the data without any CPT. Indeed, in possibilistic networks, the number of parameters to elicit grows exponentially with the number of parents of a variable, leading to problems of knowledge engineering. So if data is available, we propose to perform the estimation of the CPT instead of eliciting all parameters. Unfortunately, it is often difficult to have enough data to estimate all parameters. In this paper, we will focus our interest on the first case. If a CPT already exists, it can be useful to know which uncertain gate corresponds to the CPT. The question can be which connector matches the target CPT among all existing connectors. To discover this, we have to compare the CPT generated by uncertain gates and the existing CPT. This leads to the problem of estimating the parameters of uncertain gates. We look for the closest CPT to a reference CPT. To do this, we must use a distance [8] . For our first experimentation we will use the Euclidean distance. The problems discussed in [24] for this distance are also present in our research. We propose to analyze them and to compare several distances in future works. With respect to this study, we will compare 4 estimation methods by using the distance of the CPTs as a cost function in order to improve the estimation: hill-climbing (HC), simulated annealing (SA), tabu search (TS), and genetic algorithm (GA). We propose to choose the connector with the smallest distanceθ to the target CPT. We consider the example of hill-climbing and the estimation of the parameters of the uncertain MIN. We must at first initialize all parameters of the uncertain MIN to a random value in [0, 1]. Then we generate the CPT by using the connector and we calculate the initial distance between the generated CPT and the target CPT. Next, we evaluate all neighbours of the parameters by adding and subtracting a step (Function GenNeighbour in the following algorithm). We consider only the parameters with the smallest distance. These parameters become our temporary solution. We reiterate this process until the distance is lower than a constant, or a maximum number of iterations is reached. The algorithm for the estimation of the parameters of the uncertain MIN with hill-climbing is the following: The simulated annealing algorithm was proposed [18] to describe a thermodynamic system. This is a probabilistic approach which leads to an approximation of a global optimum. We have proposed the following algorithm for the estimation of the parameters of the uncertain MIN: We also propose to use a genetic algorithm to perform the estimation. The chromosomes gather all the parameters of the connector. The population of chromosomes will evolve by favouring the best individuals. A fitness function, which measures the distance between a CPT generated by one chromosome and the target CPT, is used to compare the individuals. The best chromosome is the one with the smallest fitness. The genetic algorithm performs several processing operations in a loop: the selection of the parents, the crossover, and the mutation. We can resume this with the following algorithm: The tabu search was proposed by F. W. Glover [10] [11] [12] . This algorithm improves local search algorithms by using a list of previous moves to avoid processing a situation already explored. The Tabu search algorithm is the following: We now propose to test this approach on simulated CPTs for connectors MIN, MAX and weighted average. Next, we will compare the estimation of the CPTs. For this experimentation, we will use only two causal variables X 1 and X 2 and one effect variable Y . All the variables have states: low, medium and high with intensity levels 0, 1 and 2. We provide below the parameters used to compute the simulated CPTs (Table 2) : We used the same parameters for X 1 and X 2 . We defined π(Z l ) with the values π(ZL = 0) = 1 and π(ZL = 1) = π(ZL = 2) = 0.1. For the weighted average, we will simulate a mean behaviour by defining the weights equal to 0.5 for all variables but we will not take into account the leakage variable. We present the simulated CPTs obtained by using the above parameters in the following tables (Table 3) : Table 3 . Simulated CPTs. Low We have performed the estimation with several steps 0.1, 0.01 and 0.001 and a limited number of iteration of 40000. The result of the estimation is the following (Table 4) : Table 4 . Result of the estimation. Uncertain We can see in the above tables that all simulated CPTs are associated with the expected connector. We can see in bold in each line the smallest distance which corresponds to the best result for the estimation. By applying the decision rule, we select for each line the connector and the estimation algorithm which has produced the best result. For example, with step 0.001 we can see that the simulated MIN CPT was associated with the connector uncertain MIN and that the best result was provided by the simulated annealing algorithm with the final distance of 0. This means that the parameters are exactly estimated. The simulated MAX CPT was associated to the connector uncertain MAX and the best result was provided by tabu search. And finally, the simulated WAVG CPT was associated to uncertain WAVG connector as expected and the best result was provided by tabu search with a distance of 2 × 10 −3 . If we compare the results, we can see that the best results are obtained with step 0.1. So we provide below the comparison of the estimation algorithms which lead to the best results for this step (Fig. 2) : We can see in the above results that all algorithms did not converge to 0. The best result seems to be obtained by simulated annealing or tabu search but for step=0.001 and for the simulated MIN CPT the tabu algorithm didn't find the expected parameters. Nevertheless, we have tried several parameters for tabu search and we have obtained good results. We have performed another experimentation by using existing Bayesian networks: Asia [16] , Earthquake and Cancer [15] , Survey [19] , Sachs [23] , Andes [3] because of the small number of data set available for possibilistic networks. We have transformed the conditional probability tables into conditional possibility tables to obtain possibilistic networks. The structure of the networks is the same in Bayesian networks and possibilistic networks. Then we have applied our approach to replace all CPTs by appropriate connectors. The problem of converting the probability table has already been discussed by Dubois et al. [7] . The temptation can be to perform the normalization of the probability but it is not sufficient to ensure that the possibility property Π ≥ P is satisfied. For example, if we have p 1 > p 2 > ... > p n , we perform p i = pi p1 for all i. The correct solution proposed by Dubois et al. [7] consists in computing the possibilities as follows: We present below the result of this experimentation. At first, we have performed the estimation for small networks. We can see in the first column the name of the data set, then the name of the node and the number of the parents of the node. In the fourth column, we present the size of the CPT, and in the following columns, the minimal distance for all connectors (Table 5 ). This first experiment allows us to distinguish two cases. The first one concerns the distance close to 0. In this case, the computation of the connector matches the target CPT. The connector can replace the CPT. The second case concerns the distance not close enough to 0, such as the tables alarm, E or T. We cannot replace the CPT by a connector, nevertheless the information provided is meaningful because we can deduce the behaviour of the CPT between severe and indulgent. We have also performed another experiment with Andes, which is a big network, and we obtain the following result for the first 30 nodes (Table 6 ): Having computed all nodes, we obtain the following results: for the uncertain MIN connector, 94% of the final distance is less than 0.1; for the uncertain MAX connector, only 6 results are below 0.1; for the uncertain WAVG connector most of the results have a distance less than 0.1 because it can replace the other connectors. Nevertheless, its computation time is high. The best results for the uncertain MIN connector were obtained by using the hill-climbing algorithm and the genetic algorithm. Nevertheless, the use of simulated annealing and tabu search can sometimes provide good results too. As for the uncertain MAX and uncertain WAVG connectors, all algorithms gave the same number of good results. The use of concurrent algorithms for the estimation has improved the final result. In this paper, we have proposed a solution to convert the CPTs of an existing possibilistic network into uncertain gates. Uncertain gates provide a set of connectors with behaviours from severe to indulgent through the family of compromise (median, average, weighted average,...). Uncertain gates provide meaningful knowledge about how information is combined to compute the state of a variable. To convert the CPTs, we have used and compared optimization algorithms such as hill-climbing, simulated annealing, tabu search and the genetic algorithm. The goal was to find the closest CPT generated by using the connectors. The best result is provided by the connector which matches the best the initial CPT. To validate this approach, we have generated test CPTs with the behaviours MIN, MAX and weighted average and performed the estimation. The results correspond to our expectations. We have also proposed to test our solution on real datasets. To do this, we have converted Bayesian networks into possibilistic networks and performed the estimation of the connectors to select the most appropriate one for each table. For our future works we would like to improve the computation of the connectors by proposing a parallel algorithm. We would like to better analyze the problem of variables with different intensity scales. Also, it can be interesting to evaluate the effects on the decision of converting a CPT into uncertain gates. Possibilistic logic bases and possibilistic graphs Possibilistic graphical models On-line student modeling for coached problem solving using Bayesian networks Canonical probabilistic models for knowledge engineering Uncertain logical gates in possibilistic networks. an application to human geography Possibility Theory: An Approach to Computerized Processing of Uncertainty Probability-possibility transformations, triangular fuzzy sets, and probabilistic inequalities On the effectiveness of the skew divergence for statistical language analysis A new look at causal independence Future paths for integer programming and links to artificial intelligence Tabu Search-Part I Tabu Search-Part II The calculation of educational indicators by uncertain gates Merging information using uncertain gates: an application to educational indicators Bayesian Artificial Intelligence Local computations with probabilities on graphical structures and their application to expert systems Probabilistic Reasoning in Expert Systems: Theory and Algorithms Equation of state calculations by fast computing machines Bayesian Networks: with Examples in R A new look at causal independence Fuzzy sets as a basis for a theory of possibility On ordered weighted averaging aggregation operators in multicriteria decisionmaking Causal protein-signaling networks derived from multiparameter singlecell data Knowledge engineering for Bayesian networks: how common are noisy-MAX distributions in practice? Probabilistic independence of causal influences