title: Correction of Belief Function to Improve the Performances of a Fusion System
authors: Coquin, Didier; Boukezzoula, Reda; Ben Ameur, Rihab
date: 2020-05-15
journal: Information Processing and Management of Uncertainty in Knowledge-Based Systems
DOI: 10.1007/978-3-030-50143-3_23

Our application concerns the fusion of classifiers for the recognition of trees from their leaves, within the framework of belief function theory. To improve the rate of correct classification, it is necessary to correct the Bayesian mass functions. This correction is based on meta-knowledge estimated from the confusion matrix. The corrected mass functions considerably improve the recognition rate based on the decisions provided by the classifiers.

Tree species recognition is the problem of identifying the species of a given tree. This task may be easy for a botanist who has strong knowledge about trees; however, a novice or an amateur tree lover may have difficulties. This paper is part of the ReVeRIES project, which seeks to develop a mobile application that recognizes tree species without needing an internet connection. Indeed, the application will be used in nature, in forests and mountains, where an internet connection is not available.

In the context of tree species recognition, we treat a real-world problem, since the photos of leaves are taken in the natural environment. Processing this type of data is complicated because of its specificities, due firstly to the nature of the objects to be recognized (inter-species similarity and intra-species variability) and secondly to the environment. Errors can accumulate during the pre-fusion process. The merit of fusion is to take into account all the imperfections that can taint the available data and to model them well; fusion is more effective when the data are well modeled.

The theory of belief functions is one of the best theoretical frameworks for representing and managing uncertainty, imprecision, conflict, etc. This theory is valuable for its wealth of tools for managing the various sources of imperfection as well as the specificities of the available data. Within this framework, it is possible to model the data through the construction of mass functions. It is also possible to manage the computational complexity thanks to approximations that reduce the number of focal elements. Conflict, being one of the most prevalent sources of imperfection, can be dealt with through the selection of the best combination rule.

When merging sources of information with different degrees of reliability, the least reliable source may degrade the data issued from the most reliable one. One solution to this problem is to improve the performance of the least reliable source: when merged with the other sources, it will then provide useful information and in turn contribute to improving the performance of the fusion system. The performance of an information source can be improved through the correction of its mass functions. In this context, the correction can be made based on measures of the relevance or sincerity of the studied source [14]. Confusion matrices are a data source from which meta-knowledge characterizing the state of a source can be extracted [17].
In this paper, the proposed fusion system is a hierarchical fusion system set up within the framework of belief function theory. It merges data from leaves and provides the user with a list of the most likely species, while respecting the educational purpose of the application. The computational complexity of this fusion system is quite small, allowing, in the long term, the application to be implemented on a smartphone. The paper is organized as follows: Sect. 2 describes the fusion system. Section 3 presents the different mechanisms for correcting the mass functions. A correction method, which we have applied to our fusion system, is proposed in Sect. 4. Section 5 shows the improvement brought by this correction on real data.

The sub-classification strategy consists of recognizing a leaf through its botanical properties. The application consists of taking a photo of a leaf with a smartphone; the system then extracts morphological characteristics (the lobe shape $\vec{A}_F$ described by the polygonal model, the margin $\vec{A}_C$, and the base and apex $\vec{A}_{BS}$). The fusion system is composed of three classifiers based on random forests that have been previously trained on learning images. As output of each classifier $S_i$, we have a probability distribution $P_i(e_k)$ over the species frame, with $e_k \in \Omega$ and $|\Omega| = 72$ species [1]. The fusion system, shown in Fig. 1, consists of 3 steps:

1. Mass functions construction: the passage to belief function theory requires the construction of mass functions based on the probability distributions.
2. Mass functions approximation: to reduce the computational complexity.
3. Mass functions combination: within the framework of belief function theory, several combination rules exist, each of which manages one or more sources of imperfection.

Mass function construction from probability distributions is an important step in the fusion process. Indeed, the mass functions must be sufficiently representative of all available information. In our case, the data obtained at the output of the classifiers are uncertain, because the sources have different degrees of reliability and are sometimes conflicting. It is therefore important that the mass functions represent all these imperfections and take into account as much information as possible [1]. In our case, we have a large discernment frame ($|\Omega| = 72$ hypotheses): constructing $2^{72}$ focal elements and giving a mass to every possible combination would lead to significant computational complexity. In the literature, there are several methods for constructing a mass function from a probability distribution, the best known of which are Bayesian mass functions and consonant mass functions [2]. We have studied and analyzed these two methods in depth. We found that by building Bayesian mass functions we obtain precise and certain focal elements: the obtained mass functions are representative enough of the available information, but the focal elements are singletons, which causes the elimination of certain species. Concerning consonant mass functions, we noticed that this method leads to the construction of large focal elements (made up of several species); it has the advantage of allowing a better representation of all available information.
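To make the comparison concrete, here is a minimal sketch of the two constructions (illustrative Python, not the authors' implementation). A mass function is represented as a dict mapping frozensets of species to masses, and the consonant construction shown is the classical inverse-pignistic one; the paper does not specify which consonant construction it uses.

```python
def bayesian_mass(probabilities):
    """Bayesian mass function: each species with non-zero probability
    becomes a singleton focal element carrying that probability; all
    non-singleton subsets implicitly receive a zero mass."""
    return {frozenset([s]): p for s, p in probabilities.items() if p > 0}

def consonant_mass(probabilities):
    """Consonant mass function: focal elements are nested sets obtained
    by ranking species by decreasing probability; here the mass of the
    i-th nested set is i * (p_i - p_{i+1}), so the masses sum to 1."""
    ranked = sorted(probabilities.items(), key=lambda kv: -kv[1])
    mass = {}
    for i, (_, p) in enumerate(ranked):
        p_next = ranked[i + 1][1] if i + 1 < len(ranked) else 0.0
        if p > p_next:
            focal = frozenset(s for s, _ in ranked[:i + 1])
            mass[focal] = (i + 1) * (p - p_next)
    return mass
```

For example, with probabilities {a: 0.6, b: 0.3, c: 0.1}, the Bayesian construction keeps three singletons, while the consonant one yields m({a}) = 0.3, m({a, b}) = 0.4 and m({a, b, c}) = 0.3, illustrating the larger, nested focal elements discussed above.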
One of the major problems of belief function theory is its computational complexity, which is relatively high compared to other theories such as probability theory and possibility theory. Computational complexity increases exponentially with the number of sources to be combined, and becomes even more critical when the discernment frame is large. To limit the complexity of the calculations, existing approaches rely either on procedures that perform exact calculations or on approximation techniques.

In the first category, an optimal calculation algorithm was proposed by Kennes in [9]. This algorithm considerably optimizes the number of addition and multiplication operations when calculating mass functions and thus reduces the complexity of the calculations. The disadvantage of this method is that the number of focal elements is not reduced, since it is a procedure that performs exact calculations. This can be problematic in our case, because we have to manage a large discernment frame and perform the same calculation for several sources of information.

The second category of approaches is composed of mass function approximation procedures, which reduce the number of focal elements based on different selection criteria. In particular, we focused on the most commonly used procedures: the "Inner" and "Outer" approximations [4], the k-l-x method [21] and the "Summation" method [11]. Our analysis, as well as the experiments carried out, led to the choice of the summation approximation as the most suitable method for our case, since it offers the best trade-off between computational complexity and preservation of a maximum of useful information. The "Summation" approximation consists of keeping the $k-1$ focal elements having the largest masses; the union of the remaining focal elements forms the $k$-th focal element, to which the sum of their masses is transferred. The application of this approximation in our specific context is interesting: unlike the k-l-x approximation, even focal elements with low mass are kept, so a maximum of information is transferred, allowing more flexibility in the fusion process. In addition, this approximation is simple and does not require many calculation operations, which reduces the processing time.
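As an illustration, here is a minimal sketch of the summation approximation (illustrative Python, under the same dict-of-frozensets representation as above, not the authors' implementation):

```python
def summation_approximation(mass, k):
    """'Summation' method: keep the k-1 focal elements with the largest
    masses; the union of all remaining focal elements becomes the k-th
    focal element and receives the sum of their masses."""
    items = sorted(mass.items(), key=lambda kv: -kv[1])
    kept = dict(items[:k - 1])
    rest = items[k - 1:]
    if rest:
        union = frozenset().union(*(fe for fe, _ in rest))
        kept[union] = kept.get(union, 0.0) + sum(m for _, m in rest)
    return kept
```

The result has at most k focal elements and preserves the total mass, which is why even low-mass hypotheses keep contributing to the fusion.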
To better manage the imperfections of the data, several combination rules have been proposed within the theory of belief functions, each designed to handle one or more aspects of imperfection. Given the limited and variable performance of the data sources to be combined, we consider that we have to deal with unreliable sources of data. According to Lefevre et al. [10], the combination rules proposed in the literature represent solutions for managing the different sources of conflict. They divide these combination rules into two categories:

1. the first category groups the combination rules requiring that the sources of information to combine are reliable;
2. the second category groups the combination rules where at least one of the sources of information is reliable, without necessarily knowing which one.

In our case, the main problematic situations we face are the following:

- no intersection between two focal elements issued from two sources: here, our need is to transfer mass to the elements involved in the conflict;
- two or more sources propose the true species: here, our need is to transfer mass to the intersection;
- no intersection between the species proposed by the different sources (total conflict): here, our need is to transfer mass to partial ignorance.

After studying and analyzing the different existing combination rules, namely the Dempster combination rule [18], the Dubois and Prade combination rule [6], the mixed combination rule, the Florea and Jousselme combination rule [8], PCR6 [12] and the DPCR combination rule [13], we found that the DPCR combination rule is the best compromise between a good recognition rate for the first species and the processing time per species (see Table 1). Indeed, it transfers mass to the intersection, to the elements involved in the conflict, as well as to partial ignorance, and it offers the best compromise between classification rate and computational complexity. DPCR, the rule of proportional redistribution of the weakened conflict, was proposed by Martin [13]. It is an extension of the PCR5 combination rule and consists in applying a discounting procedure to transfer part of the partial conflict to partial ignorance (the union).

Since the objective of the application is to present a list of the top 10 most probable species, we are interested in the decision criterion of the maximum of pignistic probability, BetP, proposed by Smets in [20]. This decision criterion makes it possible to transform the mass functions into probability measures over all $w_k \in \Omega$; in our case we work with Bayesian bbas, for which BetP coincides with the mass function itself, so the results are the same. Thus, we can provide the user with an ordered list of the most likely species rather than an ordered list of the most likely sets of species. The criterion also makes it possible to lift the imprecision that can taint the data when consonant mass functions are used. The dashed colored lines in Fig. 1 represent the choices made for the different blocks of the fusion system, which have been explained above. The three sources $S_i$ are the three feature vectors (the polygonal model, the margin, and the base and apex) that are used as input for the three classifiers (random forests).

The theory of belief functions provides a flexible framework for modeling and managing information uncertainty. Several tools proposed in the literature allow some of the information to be modified or corrected based on additional knowledge about the relevance or sincerity of a source. Historically, discounting is the first correction tool proposed in the literature [19]. It is based on the degree of relevance of an information source S, used to weaken the mass of the information it provides [16]. Discounting was developed as part of the Transferable Belief Model (TBM) [20]. Suppose that an information source S provides information represented by a mass function $m_S$, and let $\beta \in [0, 1]$ be the degree of relevance of this source. The discounting is done as follows:

$$m(A) = \beta\, m_S(A) \quad \forall A \subset \Omega, \qquad m(\Omega) = \beta\, m_S(\Omega) + (1 - \beta).$$

Discounting therefore consists of weakening the initial mass function according to its degree of relevance and giving more weight to ignorance. The reverse operation, called de-discounting, was introduced in [5]: if, after discounting, it turns out that the information on the relevance of the source was false, de-discounting allows $m_S(A)$ to be recovered from $m(A)$. This correction method uses an overall degree of relevance to weaken the mass functions. However, this measure is insufficient, since the degree of relevance may vary from one class to another. Given the quality of the data, it would be more useful to apply contextual correction methods, where the weakening of a mass function takes each context into account.
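A minimal sketch of this classical discounting and its inverse (illustrative Python; the dict-of-frozensets representation and the OMEGA constant are assumptions of this sketch, not the authors' code):

```python
OMEGA = frozenset(range(72))  # discernment frame: the 72 species

def discount(m_s, beta):
    """Classical discounting: every focal element keeps beta * m_S(A);
    the remaining 1 - beta is transferred to total ignorance (Omega)."""
    m = {A: beta * v for A, v in m_s.items()}
    m[OMEGA] = m.get(OMEGA, 0.0) + (1.0 - beta)
    return m

def de_discount(m, beta):
    """De-discounting: recover m_S from a discounted m when the degree
    beta (> 0) turns out to have been wrongly applied."""
    m_s = {A: v / beta for A, v in m.items()}
    m_s[OMEGA] = m_s.get(OMEGA, 0.0) - (1.0 - beta) / beta
    return m_s
```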
In [15], Mercier et al. studied contextual discounting based on the degree of contextual relevance of an information source. Its application requires additional information on the relevance of the source conditionally on different contexts A of Ω, such that the union of the subsets A forms a partition of Ω. Suppose that $\beta_A \in [0, 1]$ is the degree of relevance of a source S in a context A, and that the context elements A of $\mathcal{A}$ form a partition of Ω. The belief obtained by taking the contextual elements A into account is defined as follows:

$${}^{\mathcal{A}}m \;=\; m \;\ovee\; \mathop{\ovee}_{A \in \mathcal{A}} A^{\beta_A},$$

where $\ovee_{A \in \mathcal{A}}$ represents the canonical disjunctive decomposition and $A^{\beta_A}$ represents a simple mass function having ∅ and A as focal elements, with corresponding masses $\beta_A$ and $1 - \beta_A$ respectively.

A more general case of contextual discounting is discussed in [14]. Its application is simpler and more general, since the context elements A are no longer required to form a partition of Ω. Suppose $\beta_A$ is the degree of relevance given to a source of information in a context A, and that $\mathcal{A}$ is the set of contexts for which the contextual degree of relevance of the source is known. The resulting mass function, taking these context elements into account, is calculated as follows:

$${}^{\mathcal{A}}m \;=\; m \;\owedge\; \mathop{\owedge}_{A \in \mathcal{A}} A^{\beta_A},$$

where $\owedge_{A \in \mathcal{A}}$ represents the canonical conjunctive decomposition.

The contextual discounting discussed in this section is based on contextual elements that provide information about the contextual relevance of the information source. This mass function correction technique is interesting in our case, since the relevance of our classifiers varies from one species to another. With this technique, it is possible to introduce additional information on the contextual relevance of each source of information with respect to each species.

Contextual discounting can also be applied to contextually modify a mass function based on another measure of reliability: sincerity. It is possible to estimate whether or not a source of information is sincere by assuming that it is reliable. Thus, the correct and incorrect decisions of the source respectively correspond to sincere and insincere behavior of the source of information. Suppose a source of information is sincere with degree β and not sincere with degree 1 − β. According to [7], the correction of mass functions taking into account meta-knowledge about the sincerity of a source can be done in 3 different ways:

- by weakening the initial mass while giving mass to ignorance:
$$m'(A) = \beta\, m(A) \quad \forall A \subset \Omega, \qquad m'(\Omega) = \beta\, m(\Omega) + (1 - \beta); \qquad (5)$$
- by weakening the initial mass given to each A (∀A ⊆ Ω) while transferring mass to its complement $\bar{A}$:
$$m'(A) = \beta\, m(A) + (1 - \beta)\, m(\bar{A}) \quad \forall A \subseteq \Omega; \qquad (6)$$
- by weakening the initial mass, as in the previous equation, by a factor $\beta = 1 - 2\epsilon$, where $\epsilon$ is the classification error rate, while granting mass to ignorance.

According to the degree of sincerity of the source of information, the initial mass is weakened while granting mass to ignorance, as shown by Eq. 5. Eq. 6 proceeds slightly differently, since the weakening of the initial mass is coupled with a mass transfer to $\bar{A}$, the complement of the focal element A in question. This approach is very interesting because it allows a better preservation of the specificity of the information. In the next section, we are particularly interested in this approach and will propose a method to contextually define the complement set $\bar{A}$ of A.
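Here is a sketch of the complement-transfer variant (Eq. 6), under the same illustrative representation as the earlier sketches; the frame is passed explicitly, and the handling of A = Ω (whose complement is empty) is our assumption:

```python
def sincerity_complement_correction(m, beta, frame):
    """Correction by sincerity with transfer to the complement (Eq. 6):
    each focal element A keeps beta * m(A), while (1 - beta) * m(A) is
    moved to its complement, preserving more specificity than a plain
    transfer to ignorance."""
    out = {}
    for A, v in m.items():
        out[A] = out.get(A, 0.0) + beta * v
        comp = frame - A
        # the complement of the full frame is empty; leave that part
        # of the mass on the frame (total ignorance) instead
        target = comp if comp else frame
        out[target] = out.get(target, 0.0) + (1.0 - beta) * v
    return out
```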
The definition of different forms of non-veracity in [17] made it possible to define a new form of contextual discounting based on the degree of sincerity. Suppose the source S is negatively untrue in A. This new form of contextual discounting is defined as follows:

$$m' \;=\; m \;\ovee\; A^{\beta_A}. \qquad (7)$$

In [14], Mercier et al. define a new correction mechanism called contextual reinforcement. Suppose the source S is positively untrue in A. This new form of contextual reinforcement is defined as follows:

$$m' \;=\; m \;\owedge\; \bar{A}^{\beta_A}, \qquad (8)$$

where $\bar{A}^{\beta_A}$ is a simple mass function with focal elements Ω and $\bar{A}$. These two new forms of correction, given in Eqs. 7 and 8, are interesting because the discounting takes into account the contextual sincerity of the source of information. Their application requires making assumptions about the sincerity of the source: we need meta-knowledge to assert whether the source is negatively untrue in A or positively untrue in A.

The application of the correction mechanisms requires knowledge of the state of the information source to be processed. This knowledge, called "meta-knowledge", is used to characterize the state of a data source: relevant or not, sincere or not. Meta-knowledge is generally uncertain and often represented by mass functions [17]. In the context of classifier fusion, we are interested in the estimation of meta-knowledge from the confusion matrices obtained for each classifier.

A confusion matrix $M_i = (n_{kl})_{k,l \in \{1,\dots,K\}}$ is a table characterizing the performance of a data source $S_i$ on a test set of K classes (Table 2). Each row k of the confusion matrix represents the decision made in favor of $w_k$; each column l represents the truth $w_l$. Note that $n_{k.} = \sum_{l=1}^{K} n_{kl}$ represents the number of objects for which the decision was $w_k$, and $n_{.l} = \sum_{k=1}^{K} n_{kl}$ is the number of objects truly belonging to $w_l$. The total number of objects is $n = \sum_{k=1}^{K} \sum_{l=1}^{K} n_{kl}$.

In [7], Elouedi et al. define two ways to calculate the contextual relevance for each context: using the percentage of correct classifications for each context, or using a distance to determine the reliability rate of a source in the different contexts. The contextual weakening of a mass function based on its degree of contextual relevance is interesting. In our case, however, applying this approach amounts to disjunctively combining the initial mass function with the 72 contextual elements that can be extracted from the confusion matrix. This is very costly from a computational point of view, since it involves 72 combinations for each mass function and for each source of information. In the following, we therefore use the confusion matrix to estimate the sincerity of the source.

It is possible to estimate the sincerity of a source, again using a confusion matrix. In [17], Pichon et al. suggest three approaches that can be used: generation of a Bayesian mass function characterizing the sincerity of a source, generation of a mass function from the pignistic probability, or generation of a mass function characterizing the sincerity of a source based on the work of Dempster [3].
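As a simple illustration, per-context sincerity degrees can be read off the confusion matrix. The row-normalization below is our assumption about how the diagonal is turned into degrees, not a formula stated by the authors:

```python
import numpy as np

def contextual_sincerity(confusion):
    """Estimate, for each context k, a degree of sincerity beta_k as
    the fraction of correct decisions among all objects for which the
    source decided class k (diagonal over row sums)."""
    decided = confusion.sum(axis=1).astype(float)   # n_k. per row
    diag = np.diag(confusion).astype(float)         # n_kk
    return np.divide(diag, decided,
                     out=np.zeros_like(diag), where=decided > 0)
```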
As we mentioned, we are interested in the weakening of the mass function as proposed in Eq. 10. Indeed, in some applications it is easy to define the complement set $\bar{A}$ of A. This is even simpler for a small discernment space or when relevant knowledge (such as expert opinion) is available. In our case, we can assimilate the complement set $\bar{A}$ of A to the set of all the classes belonging to the discernment space Ω but not belonging to A. However, if A is formed of 10 species, mass will then be transferred to a complement set $\bar{A}$ composed of 62 species, and the specificity of the information can be lost. For all these reasons, and considering the quality of the processed data as well as the available information, we are interested in defining the complement set for each context.

The proposed correction method consists in contextually reinforcing the mass attributed to some classes. This correction requires retrieving information from the test database's confusion matrix to define the complement set $\bar{A}_i$ of each context $c_i$. The proposed approach consists of the following 3 steps:

1. construction of the Bayesian mass functions from the probability distributions obtained at the output of a classifier;
2. extraction of contextual meta-knowledge from the confusion matrix; this meta-knowledge defines a complement set $\bar{A}_i$ for each context $c_i$;
3. contextual use of the meta-knowledge to correct the Bayesian mass functions.

At the output of a classifier, we obtain a probability distribution $P_i(e_k)$ over the species. To move from probability theory to belief function theory, we construct Bayesian mass functions and transfer all the available information to them as follows:

$$m_i(\{e_k\}) = P_i(e_k) \quad \forall e_k \in \Omega, \qquad m_i(X) = 0,$$

where X is any set containing more than one species. A mass $m_i$ equal to the probability $P_i$ is thus given to each element $e_k$ of Ω having a non-zero probability, and a zero mass is given to all other subsets.

The purpose of the next step is to use the confusion matrix to define, for each context $c_i$, the complement set $\bar{A}_i$. In our case, there are 72 contexts, corresponding to the 72 species forming the discernment space Ω. In a confusion matrix, each row k represents the decision made in favor of $w_k$ and each column l represents the truth $w_l$. By traversing the confusion matrix row by row, without considering the diagonal, it is possible to extract, for a decision $w_k$, the set of species in which the truth is actually found. This set is the complement set $\bar{A}_k$ for the context $c_k$ corresponding to the decision $w_k$.

Let us consider the confusion matrix shown in Table 3 and analyze it row by row, without taking the diagonal into account. The first row corresponds to the decision $e_1$. We notice through the third column that the classifier sometimes decides $e_1$ whereas the truth is $e_3$. We thus consider that for the context $c_1$, corresponding to the decision $e_1$, the complement set is $\bar{A}_1 = \{e_3\}$; the idea is to assign a mass to $e_3$ each time the classifier decides $e_1$. The second row corresponds to the decision $e_2$. We notice through the first column that once out of 5 the classifier decides $e_2$ when the truth is $e_1$, and through the third column that the classifier sometimes decides $e_2$ when the truth is $e_3$; so a mass will be given to $e_1$ and $e_3$ each time the classifier decides $e_2$. The third row corresponds to the decision $e_3$: according to the first column, once out of 5 the classifier decides $e_3$ whereas the truth is $e_1$. By going through the confusion matrix row by row in this way, the three complement sets corresponding respectively to a decision in favor of $e_1$, $e_2$ and $e_3$ are obtained:

$$\bar{A}_1 = \{e_3\}, \qquad \bar{A}_2 = \{e_1, e_3\}, \qquad \bar{A}_3 = \{e_1\}.$$

Following the extraction of the complement sets of each context, the correction of the mass functions is carried out by applying the approach presented in Eq. 10. Applying this equation must not change the Bayesian form of the mass function, so that we can then build consonant mass functions; the available information will therefore be represented in the best possible way.
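The extraction step can be sketched as follows (illustrative Python; the 3-class matrix is merely consistent with the worked example above, since Table 3 itself is not reproduced in this text):

```python
import numpy as np

def complement_sets(confusion):
    """Traverse the confusion matrix row by row, ignoring the diagonal:
    for a decision k, the complement set gathers every class l that was
    the truth at least once when the source decided k."""
    K = confusion.shape[0]
    return {k: frozenset(l for l in range(K)
                         if l != k and confusion[k, l] > 0)
            for k in range(K)}

# rows = decisions e1..e3, columns = truths e1..e3 (illustrative counts)
M = np.array([[4, 0, 1],
              [1, 3, 1],
              [1, 0, 4]])
print(complement_sets(M))
# {0: frozenset({2}), 1: frozenset({0, 2}), 2: frozenset({0})}
# i.e. the complement sets {e3}, {e1, e3} and {e1} obtained above
```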
The initial mass function is weakened by β, and a mass equal to $(1 - \beta)/|\bar{A}_i|$ is given to each element of the complement set. Depending on how the degree of sincerity β is calculated, the correction can be done in two ways:

1. β is the overall degree of sincerity of the information source;
2. β is the degree of contextual sincerity. In this case, the value of β is extracted from the diagonal of the confusion matrix for the 72 hypotheses.

To weaken a mass function, the idea is to vary the value of β contextually. Thus, if the classifier decides that the species is $e_2$, the contextual value of β is the one corresponding to the degree of sincerity of the classifier relative to the species $e_2$. It is important to note that, in our case, we consider that the decision of a classifier corresponds to the species with the maximum mass. Note that for most species the contextual value of β is zero. In these cases, the initial mass function is multiplied by zero and a null mass is given to the complement set; in other words, the mass function becomes zero.

The data set contains 5104 leaf photos unequally distributed over 72 species. Some species are well represented in the data set, while others are less represented, and some species have only two exemplars. We used 2572 leaf photos for training and the rest for testing. When the degree of sincerity is global, the value of β is the same for all contexts, whereas when the degree of sincerity is contextual, the value of β depends on the context. We notice that in many contexts the value of β is null: 21 contexts for the polygonal model classifier $S_1$, 22 contexts for the margin classifier $S_2$ and 34 contexts for the base and apex classifier $S_3$. Even though contextual values of β usually allow contextual truthfulness to be taken into account in each case, this approach is of little use here, since β has a null value in most contexts.

In order to evaluate the performance of the classifiers before and after the correction of the mass functions, we use a 10-fold cross-validation procedure. Meta-knowledge is extracted from nine tenths of the samples, and the effectiveness of the correction method is tested on the remaining tenth. This operation is repeated 10 times, and each time the configuration and test samples are randomly drawn. The end result is the average of the 10 results corresponding to the 10 repetitions. Figure 2 shows the results obtained before and after the correction of the mass functions of the test database. Since botanists like to see similar species, we present the rate of correct classification as a function of the number N of species selected. We note a significant improvement in classifier performance for both modalities. Nevertheless, Table 4, which gives the standard deviation for each classifier, shows that the standard deviation is often large. Indeed, by randomly drawing the configuration data, the sets formed can be quite representative of the variations existing in the data, or only weakly representative of them. The quality of the extracted meta-knowledge therefore varies with the relevance of the configuration database, which in turn influences the ability of the fusion system to generalize.
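Putting the pieces together, the following sketch applies the proposed correction to a Bayesian mass function, reusing the hypothetical beta_ctx and complements structures from the earlier sketches; the handling of contexts with no known complement set is our assumption, as the text does not specify it:

```python
def correct_bayesian_mass(mass, beta_ctx, complements):
    """Contextual correction of a Bayesian mass function: the context
    is the decided species (singleton with maximum mass), the whole
    distribution is weakened by the contextual sincerity beta of that
    species, and (1 - beta) / |complement| is added to each singleton
    of the decided species' complement set."""
    decision = max(mass, key=mass.get)     # singleton {e_k}
    k = next(iter(decision))
    beta, comp = beta_ctx[k], complements[k]
    if not comp:                           # no meta-knowledge for this
        return dict(mass)                  # context: keep mass as-is
    out = {A: beta * v for A, v in mass.items()}
    share = (1.0 - beta) / len(comp)
    for e in comp:
        s = frozenset([e])
        out[s] = out.get(s, 0.0) + share
    return out
```

The corrected function still sums to 1 (β + (1 − β) = 1) and remains Bayesian, so consonant mass functions can subsequently be built from it, as required above.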
In this paper, we presented the different mechanisms for correcting mass functions and applied the most appropriate ones to our application. We have shown that the use of the confusion matrix allows us to define meta-knowledge useful for the correction of mass functions. We applied this technique to our fusion system and showed that it significantly improves the rate of correct classification. Work is in progress on the addition of the bark modality, from which we can extract characteristics such as texture and color that would allow us to differentiate between neighboring species. The bark-derived characteristics could serve as meta-knowledge for resolving ambiguities when there are significant conflicts on the species associated with leaf recognition. This would improve the performance of the system.

References

1. Influence of basic belief assignments construction on the behaviour of a fusion system for tree species recognition
2. Constructing consonant belief functions from sample data using confidence sets of pignistic probabilities
3. New methods for reasoning towards posterior distributions based on sample data
4. Inner and outer approximation of belief structures using a hierarchical clustering approach
5. Classification using belief functions: relationship between case-based and model-based approaches
6. Representation and combination of uncertainty with belief functions and possibility measures
7. Discountings of a belief function using a confusion matrix
8. Robust combination rules for evidence theory
9. Computational aspects of the Mobius transformation of graphs
10. Belief function combination and conflict management
11. A framework for evidential-reasoning systems
12. A new generalization of the proportional conflict redistribution rule stable in terms of decision
13. Toward a combination rule to deal with partial conflict and specificity in belief functions theory
14. Belief functions contextual discounting and canonical decompositions
15. Refined modeling of sensor reliability in the belief function framework using contextual discounting
16. Relevance and truthfulness in information correction and fusion
17. Proposition and learning of some belief function contextual correction mechanisms
18. A Mathematical Theory of Evidence
19. Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem
20. The transferable belief model
21. Approximations for efficient computation in the theory of evidence

Acknowledgements. This work is part of the ReVeRIES project, supported by the French National Agency for Research under reference ANR-15-CE38-0004.