PII: 0004-3702(95)00009-7


ELSEVI.ER Artificial Intelligence 83 (1996) l-58 

Artificial 
Intelligence 

Measures of uncertainty in expert systems 

Peter Walley * 
Department of Mathematics, The Universiry of Western Australia, Nedlands, WA 6907, Australia 

Received April 1991; revised January 1995 

Abstract 

This paper compares four measures that have been advocated as models for uncertainty in 
expert systems. The measures are additive probabilities (used in the Bayesian theory), coherent 
lower (or upper) previsions, belief functions (used in the Dempster-Shafer theory) and possibil- 
ity measures (fuzzy logic). Special emphasis is given to the theory of coherent lower previsions, 
in which upper and lower probabilities, expectations and conditional probabilities are constructed 
from initia I assessments through a technique of natural extension. Mathematically, all the measures 
can be regarded as types of coherent lower or upper previsions, and this perspective gives some 
insight into the properties of belief functions and possibility measures. The measures are eval- 
uated according to six criteria: clarity of interpretation; ability to model partial information and 
imprecise assessments, especially judgements expressed in natural language; rules for combining 
and updating uncertainty, and their justification; consistency of models and inferences; feasibility 
of assessment; and feasibility of computations. Each of the four measures seems to be useful in 
special kinds of problems, but only lower and upper previsions appear to be sufficiently general 
to model t.he most common types of uncertainty. 

Keywords: Inference; Decision; Prevision; Bayesian theory; Dempster-Shafer theory; Belief functions; 

Possibility theory; Lower probability; Upper probability; Imprecise probabilities; Conditional probability; 

lndependerlce 

1. Introduction 

My aim in this paper is to compare and evaluate mathematical measures of uncer- 
tainty that can be used in expert systems. The measures that I will consider are Bayesian 
probabilities, coherent lower previsions, belief functions and possibility measures. Spe- 
cial emplhasis is given to the theory of coherent lower previsions as this approach has 

* E-mail: peter@maths.uwa.edu.au. 

0004-3702/96/$15.00 @ 1996 Elsevier Science B.V. All rights reserved 
SSDf 0004-3702(95)00009-7 


2 P Walley/ArtQicial intelligence 83 (1996) I-58 

received less attention than the others in the AI literature, although it is (in my view) 
the one that is most widely applicable. On a mathematical level, the other measures 

can be regarded as special types of lower or upper previsions. They can also be given 
the behavioural interpretation of lower or upper previsions. This point of view leads to 
some illuminating comparisons between the theories. 

For example, the four theories differ greatly in the calculus they use for defining, 
updating and combining measures of uncertainty, especially the rules they use to define 
conditional probabilities and expectations and how they model judgements of indepen- 
dence. The rules used in the theory of lower previsions, which are based on a general 

procedure called natural &ension, can be applied to belief functions and possibility 
measures, and thus they can be compared with the rules used in Dempster-Shafer 

theory and possibility theory. 
The paper is not intended to be a survey of the four theories from a neutral position 

and it is certainly not intended to be a wide-ranging survey of mathematical models for 

uncertainty. There is a substantial and quickly growing literature on numerical measures 
of uncertainty in expert systems. References [46,47,88] are good introductions to the 
relationships between probability measures, belief functions and possibility measures. 
Readers can learn more about the subject by consulting the surveys in [ 8,3 1,79,86,89] 
and the proceedings of the annual workshops on Uncertainty in Artificial Intelligence. I 
shall not discuss non-numerical methods of reasoning with uncertainty such as the theory 
of endorsements [ 131, default reasoning and nonmonotonic logics [ 59,63,67,68], com- 
parative probability [ 27,104] and modal logics [59,104]. Surveys of the non-numerical 
approaches can be found in [47,79,86]. 

I take it for granted that most practical reasoning involves uncertainty, partial ignorance 
and incomplete or conflicting information, and that it is often useful to formally model 

the uncertainty. This raises the questions (a) what is the best way to model uncertainty? 
(b) how should we assess, combine and update measures of uncertainty? and (c) how 
should we use the measures to make inferences and decisions? These questions are 
relevant to all kinds of reasoning with uncertainty, not only in expert systems. However 

expert systems are an especially good testing-ground for theories of uncertainty because 
they aim to formalise and automate as much as possible of the reasoning process. As 
an expert system is concerned only with a narrow domain of application, it is possible 
to formulate special assessment strategies, models and patterns of reasoning which are 
appropriate to that application. Many of the relevant uncertainties can be assessed by 
domain experts and these assessments can be encoded in the system. A user may be 
required to supply some further assessments but the expert system should be able to 
guide him through this process. 

The measures of uncertainty that are combined in an expert system may come from 
various sources. Some may be “objective” measures, based on relative frequencies or 
on well established statistical models. (In medical diagnosis, for example, there may be 
data concerning the frequency of a disease in a population, or the statistical association 
between a symptom and a disease.) Other assessments of uncertainty may be supplied by 
the domain expert (e.g., concerning the irrelevance of specific observations in diagnosis, 
or the association between a symptom and a disease when there is little statistical data). 
Indeed several experts may have contributed in building the system. Further assessments 


I! Walley/Art@cial Intelligence 83 (1996) l-58 3 

may be made by the user of the system (e.g. about the uncertainties in the symptoms 
exhibited by a patient). All these measures of uncertainty need to be combined by 
the expert system to make inferences and decisions (e.g. to make a diagnosis for this 
patient). 

What, then, is the meaning of the combined measures of uncertainty? Whose uncer- 

tainties do they measure? It is the user of the system who will act on the conclusions 
of the system, provided he is satisfied that its uncertainty assessments are acceptable 
to him. One can regard the expert system as a consultant that supplies various models 
and assessments, elicits others from the user, combines all the judgements, and finally 
informs the user “if you accept all these judgements then you should draw these con- 

clusions”. So the expert system constructs a single model for uncertainty which the user 
then considers adopting as a model for his own uncertainty and as a basis for action. 
The expert system should be able to justify its assessments of uncertainty and its rea- 
soning procedures, when requested by the user, to make its model and conclusions more 

convincing. It should also be able to modify some of its assumptions and assessments 
if requeste:d by the user. In the end, the uncertainty measures on which conclusions are 

based mus,t be acceptable to the user. 

2. Criteria for evaluating measures of uncertainty 

In the rest of this paper I will compare measures of uncertainty according to the 
following broad criteria. 

(a) Intl?);pretution. The measure should have a clear interpretation that is sufficiently 
definite to be used to guide assessment, to understand the conclusions of the 

system and use them as a basis for action, and to support the rules for combining 
and updating measures. 

(b) Imprecision. The measure should be able to model partial or complete ignorance, 
limited or conflicting information, and imprecise assessments of uncertainty. 

(c) Culculus. There should be rules for combining measures of uncertainty, updating 
them after receiving new information, and using them to calculate other uncer- 
tainties, to draw conclusions and to make decisions. Some justification must be 

given for the rules. Special attention will be given to the rules for computing 
conditional probabilities and expectations from unconditional probabilities. 

(d) Cortsistency. There should be methods for checking the consistency of all uncer- 
tainty assessments and default assumptions used by the system, and the rules of 
the calculus should ensure that the conclusions are consistent with these assess- 
ments. In the Bayesian theory and the theory of lower previsions, the intuitive 
notion of consistency is formalised in mathematical principles of coherence. 

(e) Assessment. It should be practicable for a user of the system to make (and feel 
comfortable with) all the uncertainty assessments that are needed as input. The 
system should give some guidance on how to make the assessments. It should be 
able to handle judgements of various types, including expressions of uncertainty 
in natural language such as “if A then probably B”, and to combine qualitative 
judgements with quantitative assessments of uncertainty. 


P Walley/Artijicial Intelligence 83 (1996) 1-58 

(f) Computation. It should be computationally feasible for the system to derive 
inferences and conclusions from the assessments. 

Readers may wish to add other desiderata to this list. It does seem to me that it 

is essential for a theory of uncertainty to attempt to satisfy all six criteria. We might 
distinguish the first four criteria, which are theoretical, from the last two, which are 
practical. The first four criteria are “theoretical” in the sense that one would expect an 
adequate theory of uncertainty to show that they can be satisfied, irrespective of the 
specific application. The last two criteria are “practical” in the sense that they will be 
satisfied in some applications but not in others, depending on the type of model involved, 
the number of assessments needed, practical constraints of time and computing power, 
and the abilities of the user. 

The two practical criteria are obviously necessary if an expert system is to be im- 
plemented in practice. The first criterion, interpretation, seems essential in order to give 
meaning to, and to justify, the conclusions of the system. The second, imprecision, is 

necessary because partial ignorance and conflicting information are common in practice. 

A calculus is needed in order to derive conclusions from the uncertainty assessments, 
so the third criterion is needed. The fourth criterion, consistency, is needed to avoid 
erroneous and irrational conclusions. See [99] for further justification of these criteria, 

especially (a), (b), (d) and (e). 
There is a striking divergence between workers in probability and philosophy, on 

the one hand, and those in expert systems, artificial intelligence, computer science and 
engineering, concerning their attitude to these criteria. Naturally enough, the second 
group has emphasised practical criteria, especially computation (f), and has mostly 
recognised the need for imprecise measures of uncertainty (b) and natural-language 
judgements (e). They have given much less attention to issues of interpretation (a), 
consistency (d) and justification of the calculus (c) . The literature in probability and 
philosophy pays more attention to these theoretical issues, but less to the practical issues 

of assessment and computation. There is surprisingly little attention, in all the literature, 
to the problem of assessment. 

Much of the work in expert systems is typified by MYCIN, perhaps the best-known 
and most influential expert system, which was designed to help physicians diagnose 
bacterial infections of the blood [ 6,47,80,81]. In MYCIN uncertainty is measured in 
terms of certainty factors. MYCIN does well on the practical criteria (e) and (f), largely 
because of the modularity of its inference system (dependencies amongst variables are 
ignored) and the use of simple rules to combine certainty factors, but it does badly 
on the theoretical criteria (a), (c) and (d), largely because of the lack of a clear 
interpretation of certainty factors and a lack of justification for the rules of combination. 

(An earlier version of this paper [ 1001 contained a detailed comparison of certainty 
factors with other measures of uncertainty but this has been omitted on the advice of 
the referees.) 

Although all six criteria seem essential, I regard the first criterion-the need for a 
clear interpretation-as the most fundamental, because an interpretation is needed to 
support the rules of the calculus, to formulate principles of coherence (consistency), to 
guide assessment, and to understand conclusions. Criterion (a) is therefore a prerequisite 
for criteria (c), (d) and (e). (This point is well made in [ 331.) The importance of 


P: Walley/Art@ial Intelligence 83 (1996) 1-58 5 

interpretation tends to be underrated by workers in expert systems, perhaps because 
it is unclear to them how an interpretation of uncertainty measures could be used to 

derive anrd justify rules for combining and updating the measures. Readers may find it 
illuminating to compare the theories of de Finetti [28,29] or Walley [99], in which 
a behavioural interpretation of linear previsions or lower previsions is used to support 
principles of coherence and hence to derive the entire calculus, with the literature on 
certainty factors and fuzzy logic, in which the rules of the calculus seem quite arbitrary. 

In the following sections I will outline de Finetti’s Bayesian theory and the theory 
of coherent lower previsions, which emphasise interpretation and consistency, followed 
by the Dempster-Shafer theory of belief functions, which emphasises a simple rule of 
combination, and finally consider the possibility measures used in fuzzy logic, which 
emphasises judgements in natural language. Although the four theories differ greatly 
in their interpretation of uncertainty measures, their methods of assessment and their 
calculus, all the measures can be regarded, from a mathematical point of view, as special 
types of coherent lower or upper previsions, and the four theories will be compared from 

this perspective. 

3. Bayesian probabilities 

The most highly developed and best understood theory of uncertainty is the Bayesian 
theory. The book [56] is a good introduction to the theory and [28,29] are wor- 
thy of careful study. In the field of expert systems, [47] is a good introduction and 
[50,60,63,90] are important references. Bayesian probability models have been used 
in quite different ways in the expert systems PROSPECTOR [20,21], GLADYS [91] 
and MUNIN [ 43,47,50]. More recent applications include HUGIN [ 21, PATHFINDER 
[ 371, BAIES [ 141, and [38]. The discussion here will be brief, as the theory and 
its strengths and weaknesses are quite well known. The main aim is to summarise 
the Bayesian approach in a form that will illuminate the comparison with the other 

approaches. 
In the theory uncertainty is measured by unconditional probabilities P(A) or by 

conditional probabilities P (A 1 B), numbers between zero and one which are interpreted 
as fair betting rates. That is, the person whose uncertainties are being modelled is taken 
to be willing to bet on or against event A at rates arbitrarily close to P(A), or at rates 
arbitrarily close to P(A 1 B) on condition that the bet is called off unless event B 
occurs. The key assumption here is that a person should always have the same marginal 
rate for betting on or against an event. This assumption is called the Buyesian dogma of 
precision. Bayesians have given several arguments to support the assumption, but none 
of the arguments is at all compelling; see 199, Chapter 51 for a thorough discussion. 

Apart from countable additivity, all the familiar properties of probabilities can be 
derived from the behavioural interpretation and the dogma of precision, together with 

the coherence principle that there should be no finite combination of acceptable bets that 
is certain to produce a net loss. These assumptions imply that the probability function P 
is a normalised, nonnegative, finitely-additive set function. Further coherence principles 
imply that updated probabilities after observing an event B should agree with conditional 


6 R Walley/Art@cial Intelligence 83 (1996) 1-58 

probabilities P(A 1 B), and that these should be related to unconditional probabilities 
through Bayed rule P( A n B) = P( A 1 B) P( B) . If the unconditional probabilities 
P( A n B) and P(B) have been assessed and P(B) is nonzero then the conditional 
probability P(A 1 B) is uniquely determined through Bayes’ rule. Thus Bayes’ rule can 
be used as a rule for computing conditional probabilities from unconditional probabilities. 
It can also be used as a rule for computing P( A fl B) from P( A 1 B) and P(B) , and 
indeed it is frequently used for that purpose. 

If an unconditional probability measure P is specified, the prevision (or expectation) 
of a random variable X, denoted by P(X), can be computed from P(X) = J XdP. 
When the possibility space fi is finite, P(X) = CoEn P(w)X(w). Again these should 
be regarded as coherence relationships. They can be used to compute the value of P(X) 
from assessments of probabilities, but they could also be used to provide information 
about probabilities from assessments of previsions, as in [ 281. In general, assessments 
of previsions determine upper and lower probabilities rather than precise values. 

If coherent probabilities are specified for all events then, for every random variable 
X, there is a unique value of P(X) that is coherent with the specified probabilities. This 
means that, for Bayesians, no information is lost by modelling uncertainty in terms of 
probabilities; previsions are uniquely determined by probabilities. This result does not 
carry over to imprecise probabilities: upper and lower previsions are not determined, in 

general, by upper and lower probabilities. 

The Bayesian theory of probability is closely related to a theory of decision making. 
Suppose that the utility resulting from each feasible action a can be measured by a 
precise number U(a, w) which depends on the unknown state w. Define the random 

variable X, by X,(w) = V( a, w) . A Bayesian would compute the prevision P( X,), the 
expected utility of action a, for each feasible action, and attempt to choose an action to 

maximise expected utility. 
The Bayesian theory is applied in practice by selecting some events or variables 

whose (conditional) probabilities or previsions can be precisely assessed, adding any 
judgements of independence or exchangeability, and then applying the rules of the 
theory to calculate other (conditional) probabilities or previsions. For example, if 

{A,, AZ,. . . , Ak} is an exhaustive set of mutually exclusive hypotheses and B is an 
observable event, one might make precise assessments of the prior probabilities P(Ai) 
and likelihoods P( B 1 Ai) for 1 6 i < k, and use these to calculate the predictive prob- 

ability P(B) = cff, P( B 1 Ai) P( Ai). After event B is observed, provided P(B) is not 
zero, one might update uncertainty about the hypotheses by calculating their posterior 
probabilities using Bayes’ rule, P(Ai 1 B) = P(B I Ai)P(Ai)/P(B). (There is nothing 
in the Bayesian theory that forces one to calculate P(Ai I B) in this way-it might be 
easier to assess P (Ai I B) directly than to assess the quantities P( Ai) and P( B I Ai) .) 
Uncertain conclusions will typically be expressed in terms of posterior probabilities 
that are conditional on all the available information. When k = 2, Bayes’ rule can be 
written more conveniently in the form p(B) = PA(B), where p = P(Al)/P(Az) is 
the prior odds on Al, p(B) = P(AI I B)/P(Az I B) is the posterior odds on AI, 
and A(B) = P(B 1 Al)/P(B I AZ) is the likelihood ratio generated by B. That is, 
posterior odds = prior odds x likelihood ratio. 


l? Walley/Artijicial Intelligence 83 (1996) l-58 I 

In principle, the Bayesian approach can be applied in any problem involving uncer- 
tainty. In practice, it can be difficult to make the many precise assessments of probabil- 
ities that are needed to determine a complete probability model, and to check that the 
assessments are coherent and that they determine a unique probability model. Unless 
sufficiently many assessments are made, the probabilities of interest will not be precisely 
determined-we obtain upper and lower probabilities [60,62]. But if a larger number 
of assessments are made, so the probabilities of interest are overdetermined, typically 
the assessments will be incoherent. There are also computational difficulties in verifying 
whether a given set of probability assessments and independence judgements is coherent, 
which is equivalent to checking whether a system of linear and quadratic equations has 
a solution. 

To alleviate the difficulties of assessment and computation, the early expert system 
PROSPECTOR incorporated the simplifying assumption that separate pieces of evidence 
are probabilistically independent conditional on the hypotheses of interest, and used 
simple rules (similar to the max/min rules of fuzzy logic) to combine pieces of evidence. 
However these rules are inconsistent with the Bayesian calculus and they can produce 

incoherent probabilities. 
More recently, attention has been directed to special types of models, notably the 

“belief nel works”, “causal networks” or “directed acyclic graphs” studied in [ 50,63,90]. 
These models involve judgements of conditional independence, based on an expert’s 
understanding of the causal relations between variables, which can be represented graph- 
ically by directed trees and which are reasonable in many practical problems. For these 
models, the effort of assessment and computation is greatly reduced: assessments of 
conditional1 probabilities are needed only for the links in the tree, and the effect of 
new information on probabilities can be propagated locally. Belief networks can also be 
elaborated into “influence diagrams” by adding information about possible actions and 
utilities, and thereby used to make decisions. 

Assuming that precise probabilities can be assessed for all events, the rules of the 
probability calculus are uncontroversial. It is the assumption of precision that is unaccept- 
able. When there is little information concerning a possible event A it is inappropriate 
to assess any precise probability P(A). (Suppose, for example, that I produce an urn 
containing coloured balls. Without any further information about the balls, how would 
you assess a precise probability that the first ball drawn from the urn will be red? See 
[ 1011 for discussion of this example.) Similarly the Bayesian approach cannot deal 
with imprecise, qualitative or natural-language judgements such as “if A then probably 
B” [ 1041. 

Conclusion 

The Bayesian theory does very well on criteria (a), (c) and (d), but poorly on 
(b) and (e) . Bayesian probabilities have a simple behavioural interpretation. The rules 
of the probability calculus can be justified through this interpretation, and the rules 
guarantee consistency (coherence). Computations are feasible for some important types 
of models, notably for singly-connected belief networks. The theory is highly developed, 
especially for dealing with judgements of conditional independence [ 631, and useful in 
many practical problems. 


8 I? Walley/ArttjIcial Intelligence 83 (1996) l-58 

The fundamental difficulties with the Bayesian theory concern the dogma of precision. 
Because they demand precise probability models, Bayesians cannot adequately model 
ignorance, partial information, assessments of uncertainty in natural language, or conflict 
between expert opinions. There is a compelling argument for allowing imprecise assess- 
ments of uncertainty (see [99] for detailed discussion), and this has been accepted in 
much of the expert systems literature. In the rest of this paper I consider measures of 
uncertainty which do admit imprecision. 

4. Coherent lower previsions 

This section summarises the theory of coherent lower previsions. The theory is de- 
veloped in detail in [ 991, using mathematical concepts suggested in [ 28,87,1 lo]. For 
related work in expert systems see [ 25,30,35,60,62,66,94,113], but note that these pa- 
pers (except the last one) differ in some important respects from the approach outlined 
here; they emphasise upper and lower probabilities rather than previsions, and they 

adopt a different interpretation which leads to a different concept of independence. 
Fuller discussion of many of the ideas in this section can be found in [99]. 

Assessment 

First consider how an expert system might, ideally, elicit assessments of uncertainty 
from the domain expert and the user. The system should give some guidance on how 

to make the assessments. It might suggest what probabilities could be assessed to 
constrain the probabilities of interest, or what kinds of independence judgements and 

probability models may be reasonable. But the user should not be forced to accept any 
of these judgements or to make assessments of any particular type. For example, the 

system may suggest default assumptions such as conditional independence, but if these 
are unacceptable to the user then the system should be able to operate without them, 
typically obtaining weaker conclusions. 

Generally the system should be able to work with whatever combination of judgements 
and expressions of uncertainty the user is able to make, including precise or imprecise 
assessments of unconditional or conditional probability, judgements in natural language 
such as “A is more probable than B”, “if A then probably B” or “A is very likely”, 
judgements of (conditional) independence, and various other kinds of judgements. The 
system would check whether these judgements are mutually consistent, combine them 
to construct an overall probability model and to draw conclusions about the questions 
of interest, and report the model and conclusions to the user. If the conclusions were 
indeterminate, the user might try to make further assessments in order to make the 
probability model more precise, but he may not always be able to do so. 

The assessment process involves a sequence of judgements. After each judgement, the 
expert system could compute and display summaries of the current probability model, 
and analyse the model to suggest what kinds of judgements should be considered next in 
order to reduce the indeterminacy in inferences or decisions. In the light of the current 
model, the user may choose to reconsider and modify earlier judgements, make further 


P. Walley/Arhjicial Intelligence 83 (1996) 1-58 9 

assessments, update the model to take account of new information, refine or reformulate 
the possibility space, or terminate the process. Each of these steps modifies the current 
model in a simple way: see [99, Section 4.31 for the mathematical details. For example, 
the user may decide to retract some of the assumptions or judgements (a kind of 
nonmonotonic reasoning) because the current model is incoherent or has unacceptable 
implications, he may recognise that his previous possibility space is not exhaustive and 
decide tc’ consider other possibilities, or he may use imprecise probabilistic information 
about one possibility space to provide information about a second possibility space that 
is related. to the first space through a multivalued mapping. 

The key step in this process is the construction of an overall probability model from 
an arbitrary combination of uncertainty judgements. This can be carried out by the 
expert system, without further input from a user, through a mathematical procedure 
called natural extension. 

Interpretation 

Before we can give a formal definition of natural extension we must characterise the 
kind of probability model that we aim to construct. In fact there are several types of 
models for imprecise probabilities that are more or less equivalent, subject to appropriate 

consistency requirements [99, Section 3.81. (Lower and upper probabilities are not an 
adequate model in general, for reasons to be explained later.) The simplest model is a 
lower prevision I’, which is a real-valued function defined on the set C of all gambles. 
A gamble is a bounded mapping from the possibility space of interest, 0 (a set whose 
elements represent possible states of affairs or “possible worlds”), to the real numbers, 
and is interpreted as an uncertain reward in units of utility. 

The interpretation of the quantity p(X) is that you are disposed to pay any price 
less than e(X) for the gamble (uncertain reward) X. Loosely, we may call p(X) a 
supremum buying price for X: it is the supremum of prices which the model asserts 
that you are willing to pay for X. A conjugate upper prevision P is defined by P(X) = 
-p( -X). The interpretation is that you are disposed to sell the gamble X for any 
price greater than p(X). (The theory could be presented equivalently in terms of 7, 
but here we concentrate on P.) The model does not say anything about whether you 
will buy or sell X if the price lies between p(X) and H(X); either course of action 
may be reasonable. Similarly, a conditional lower prevision e( X 1 B) is interpreted as a 
supremum of buying prices for X that you would be willing to pay if you learned only 
that event B has occurred. 

Buying and selling gambles are somewhat artificial activities and they are introduced 
here merely to give a simple interpretation for p(X) and P(X). These quantities also 
have implications in more practical decision problems (outlined near the end of Section 
4), and they could be interpreted in terms of their implications for other types of 
decisions. 

This interpretation of upper and lower previsions is epistemic and behavioural, but 
not necessarily subjective. In some problems, Z’(X) and F(X) can be given a logical 
interpretation, as marginal buying and selling prices that are uniquely determined by the 
available evidence; an expert system that is concerned with a sufficiently narrow domain 


10 I! Walley/Artifcial Intelligence 83 (1996) 1-58 

and that does not require any subjective input from users might encode a system of 
inductive logic [ 1011. The interpretation is epistemic in the sense that upper and lower 
previsions (like all the other measures considered in this paper) reflect a particular state 
of information and will usually change when more information is obtained or when 
information is reassessed. 

The quantities p(X) and P(X) need not be maximally precise. They may be merely 
lower and upper bounds for quantities that could be specified more precisely, in the same 
way that we could give lower and upper bounds for a person’s lifetime, e.g. 400 and 300 
B.C. in the case of Aristotle. The values p(X) and P(X) could be generated by merely 
qualitative judgements, as in the football example below. This should make it clear that, 
contrary to common objections, the specification of P(X) and P(X) does not demand 
“twice as much precision” as a Bayesian model. Note also that, while it may be possible 
to sharpen the assessments of F’(X) and p(X), there is no reason to suppose that a 
precise assessment p(X) = P(X) could be made, any more than Aristotle’s lifetime 
could be represented by a precise point in time. Imprecise (upper and lower) previsions 
may be needed to model incompleteness or conflict in the available information. 

In particular there is no justification for a Buyesiun sensitivity analysis interpreta- 
tion of Z’(X) and F(X), which regards them as lower and upper bounds for some 
underlying linear prevision P(X) that is not known precisely. Upper and lower prob- 
abilities are similarly interpreted as upper and lower bounds for an unknown, precise 

probability value. This interpretation seems to have been taken for granted in most pre- 
vious publications on upper and lower probability, including those in the AI literature 
[ 25,30,35,36,41,60,62,66,94]. Upper and lower probabilities with this interpretation are 
sometimes called “probability bounds ” “probability intervals” or “generalised probabil- , 
ities”. (The term “upper and lower probability” also may be misleading if it suggests 

upper and lower bounds for a precise probability; this seems to be why Shafer [ 711 
preferred the new term “belief function”.) 

The Bayesian sensitivity analysis interpretation is both misleading and unnecessary. 
It is misleading because, in most problems, no useful meaning can be given to the 
“underlying linear prevision”. It is unnecessary because upper and lower previsions can 
be given a direct behavioural interpretation, in terms of buying and selling prices for 

gambles or in terms of their implications in other decision problems, and the behavioural 
interpretation is sufficient to justify the axioms and calculus of the theory. The distinc- 
tion is important because the behavioural interpretation and sensitivity analysis lead to 
different methods for modelling independence and other structural judgements [ 991. 

One of the important contributions of the Dempster-Shafer theory is that it has 
emphasised that belief functions should not be given a sensitivity analysis interpretation 
[ 751. Every belief function can be represented as a lower envelope of a set of probability 
measures. This is merely a mathematical representation, however; it is misleading and 
unnecessary to regard a belief function as a lower bound for some unknown probability 
measure. In the same way, every coherent lower prevision can be represented as a lower 
envelope of a set of linear previsions, but this is no reason to regard the lower prevision 
as a model for partial information about an unknown linear prevision. 

In the theory of coherent lower previsions [ 991, all the axioms and rules are justified 
purely in terms of the behavioural interpretation, without appealing to a sensitivity 


R Walley/Artijicial Intelligence 83 (1996) l-58 11 

analysis interpretation. (This includes the axioms for conditional previsions and the 
rules for conditioning or updating.) So the theory of coherent lower previsions does not 
rely in any way on a sensitivity analysis interpretation, and in this respect it does not 
differ from the Dempster-Shafer theory or possibility theory. 

Of course there are some examples where E does represent partial information about 
an unknown Bayesian probability measure and a sensitivity analysis interpretation is 
justified. The behavioural interpretation still applies in these cases and they can be 
modellecl by coherent lower previsions. But the behavioural interpretation is much more 
general. It can be applied, for example, to belief functions and possibility measures, for 
which the sensitivity analysis interpretation is usually inappropriate. 

This enables us, in Sections 5 and 6, to view multivalued mappings and inexact 
judgements in natural language as sources of coherent lower and upper previsions. A 
behavioural interpretation of these kinds of models need not exclude the interpretations 
they are normally given as “measures of evidential-support” or “degrees of possibility”, 

although it does seem somewhat more definite and useful. 

Coherence 

Coherence of the lower prevision e can be characterised by three axioms: 

(Pl) p(X) 2 inf{X(w): w E 0) for all X E L, 
(P2) p( AX) = Ap(X) for all X E 13, A > 0, 
(P3) p(X+ Y) 2 p(X) +p(Y) for all X,Y E L. 

There are various equivalent characterisations of coherence in [ 991. Wilson and Moral 
[ 1131 have expressed the coherence axioms in the form of a logic, with an associated 
proof theory and semantics, and this may appeal to readers who are familiar with 
classical logic. 

Coherence implies the inequalities (for all gambles X and Y) 

P(X) +P(Y) <p(x+Y) <P(X) +p(Y) <P(x+Y) <P(X) +B(Y). 

The three axioms are consistency requirements which can be justified through the 
behavioural interpretation of L. Axiom (Pl) asserts a willingness to pay any price less 
than the infimum possible value of X for the gamble X. (Recall that X is interpreted 
as an uncertain reward in units of utility.) This is reasonable because such a transaction 

is certain to increase utility. Axiom (P2) says, in effect, that a gamble 2 is acceptable 
if and only if AZ is acceptable for any positive constant A. Thus the acceptability of 
a gamble does not depend on the unit of utility in which its rewards are expressed. To 
justify the superlinearity axiom (P3), consider two acceptable gambles which involve 
paying up to p(X) for X and up to p(Y) for Y. The combined gamble is equivalent to 
paying up to P(X) +P( Y) for X+ Y. Hence E( X+ Y), the supremum buying price for 
X + Y, should be at least E(X) + E( Y). Axioms (P2) and (P3) are compelling only if 
gambles are expressed in terms of a linear utility scale. They would not be reasonable if 
gambles were expressed in monetary units, as the acceptability of a gamble might then 
depend on whether its units were dollars or thousands of dollars and the combination 
of two acceptable monetary transactions might not be acceptable. 


12 P: Walley/Arti&ial Intelligence 83 (1996) 1-58 

In general, conditional previsions will be specified as well as unconditional ones. Co- 
herence axioms that apply to general specifications of conditional previsions are given in 

[ 991. In the simplest case, where unconditional lower previsions &’ and conditional lower 
previsions e(. 1 Bi) are specified for each event Bi in a finite partition {Bt , B2,. . . , Bk}, 
coherence can be characterised in terms of axioms such as 

(Cl) P(X) 2 min{P(XI Bi),P(X 1 &),...,P(X 1 &)}, 
(C2) P(X) 6 m={P(X I h),F(X I &I,. . . ,p(X I Bd}, 
(C3) l’(Bi(X-P(XI Bi)))=Ofori=1,2,...,k. 

The Bayesian theory of de Finetti [28,29] is presented in terms of linear previsions, 
which are simply coherent lower previsions that satisfy the precision condition p(X) = 
p(X) for all gambles X. Linear previsions are maximally precise. At the other extreme 
are the vucuous previsions l’(X) = inf{X( w): w E 0) and F(X) = sup{X( w): w E 
a}, which maximise the degree of imprecision P(X) - p(X) and model complete 
ignorance about a. 

In general, the degree of imprecision in previsions can reflect both the amount of 

information on which they are based and the degree of conflict between different types 
of information (e.g. between the assessments of several experts, or between prior infor- 
mation and statistical data). In turn, greater imprecision in previsions leads to greater 
indeterminacy in conclusions (we may be unable to say which of two hypotheses is 
more probable) and greater indecision (we may be unable to say which of two actions 

is better). 
For example, if several experts assess precise previsions Pt (X) , . . . , P,,(X) then it is 

natural to define p(X) = min{Pi( X) : 1 < i 6 n} and F(X) = max{ Pi( X) : 1 6 i 6 n}. 
Then p is the most precise model that is acceptable to every expert (it represents the 
behavioural dispositions that are common to all the experts, a kind of “consensus”), and 
the degree of imprecision P(X) - p(X) measures the extent of conflict or disagree- 
ment amongst the experts concerning X [ 1031. Other ways of aggregating information 
from several sources are studied in [97; 99, Sections 4.3, 5.3 and 5.4; 1021. The 
combination of assessments from different sources is an important problem in expert 

systems. 
Any coherent lower prevision E can be written as the lower envelope of a closed 

convex set M of linear previsions, i.e. Z’(X) = min{P(X): P E M}, or as the lower 

envelope of the set of extreme points ext M, i.e. E(X) = min{ P (X) : P E ext M}. 
Often the simplest way to characterise the lower prevision E is by specifying the 
probability mass function (or probability density function) of each linear prevision in 
ext M. 

Now we return to the problem of natural extension. Suppose that a user makes finitely 
many judgements of uncertainty. The problem for the expert system is to combine these 
with any other judgements in the system (including those supplied by domain experts) 
to compute a coherent lower prevision P. This can be done in three steps: ( 1) translate 
the judgements into behavioural terms, hence into constraints on E; (2) check that 
all the constraints are mutually consistent; (3) compute the minimal coherent lower 
prevision e that satisfies the constraints. This procedure is illustrated by the following 
simple example. 


P Walley/ArtQicial Intelligence 83 (1996) 1-58 13 

Example 1 (Football game). Consider a football game whose possible outcomes are 
win ( W), draw (D) or loss (L) for the home team. To express his uncertainty about 
the outcome, the user makes the judgements: 

(a) prclbably not W, 
(b) W is more probable than D, 

(c) D is more probable than L. 
To construct the natural extension of these judgements we first translate them into 

behavioural terms. We take (a) to mean that the user is willing to give up one unit 
of utility if W occurs, provided he gains one unit if not W. We will use de Finetti’s 
notation, in which the gamble that pays one unit if event W occurs and zero otherwise 

(the indicator function of W) is denoted also by W. Then (a) yields the constraint 
p( W) < i, or equivalently Z’( D + L) 3 i. According to (b), the subject is willing 
to pay one unit if D occurs, provided he receives one unit if W occurs. This yields the 

constraint Z’( W - D) 3 0. Similarly (c) yields Z’( D - L) 2 0. 
The natural extension of the three judgements can be computed by finding the set M 

of all probability mass functions (w, d, I) that are consistent with the judgements, by 
solving the system of linear inequalities: d + I 2 i, w 2 d, d 2 1, w 2 0, d 2 0, 1 > 0, 
w + d + I = 1. Because this system has solutions, the three judgements are consistent. 
The extreme points of M are the three probability mass functions (3,:) $), (i, 4 , 0) , 
(i, $, a). The lower prevision of any gamble X can then be calculated as the minimum 

expected value of X under these three mass functions. For example, 

EJW)=min{t,&,#}={, F(W)=max{4,i,$}=f. 

Many other kinds of qualitative or quantitative judgements could be added to the three 
we have considered, for example, 

(d) if rtot D then W is very likely, 

(e) W is between 1 and 2 times as probable as D, 
( f) I am willing to bet on L at odds of 4 to 1, 
(g) W has precise probability 0.4. 
Some orher natural-language judgements are considered in Section 6 and further exam- 

ples are given in [ 991. In more complicated problems, one might also make judgements 
of conditional probabilities, conditional independence, positive or negative dependence, 
permutabi‘lity or exchangeability, intervals of measures, upper and lower density func- 
tions or distribution functions or quantiles. All these judgements can be regarded as 

constraints on E. They can be combined (in principle) by natural extension, although 
the linear programming problem may become intractable when many assessments are 
made, and then it may be necessary to restrict attention to special types of assessment or 
model. Note especially that ordinary-language judgements such as (a), (b) and (c) can 
be combined with numerical assessments; the two types of judgement occur together in 
many applications. 

How would Bayesians deal with the football example? They would require a user to 
make more precise assessments in order to determine a single probability measure that 
is consistent with judgements (a), (b) and (c). But he may be unable to go beyond 
these qualitative judgements, except by choosing arbitrary numbers which would not 


14 P. Walley/Arti$cial Intelligence 83 (1996) 1-58 

reflect his state of uncertainty about the game, because he lacks either information about 
the game or expertise in assessing probabilities. 

One popular method for selecting a unique probability measure from M is by max- 
imising entropy. This gives the mass function ($, 3, 3) in the football example. But 
there are alternative methods, such as assigning a second-order probability distribution 
on M, which yield different answers, and any choice of a single probability measure 
seems arbitrary. Note that maximum entropy does not distinguish the partial information 
provided by the three judgements from complete ignorance: the same probability model 

would be selected if we had no information at all about the game. No precise probability 
measure can reflect the imprecision of the three judgements. For discussion of maximum 

entropy, see [ 4232,991. 

Upper and lower probability 

By applying the behavioural interpretation of lower and upper previsions, the lower 
(or upper) probability of event A can be interpreted as specifying acceptable rates for 
betting on (or against) A. Consider a choice of whether to bet on or against A at betting 
rate x, meaning odds of x to 1 - x on A. You will bet OIE A if x is less than P(A), you 
will bet against A if x is greater than F(A), and your choice is not determined (it may 
be reasonable to choose either way) if x is between &A) and p(A). 

Upper and lower probabilities have the basic properties E(8) = p(8) = 0, where 0 
denotes the empty set, P( 0) = p( 0) = 1, p(A) = 1 - E( AC), where AC denotes the 
complement of A (hence upper probabilities are determined by lower probabilities, and 
vice versa), 0 < l’(A) 6 p(A) < 1, and 

P(A)+P(B) <P(AUB) <P(A)+P(B) <P(AUB) <B(A)+B(B) 

when A and B are disjoint. 
These properties are necessary but not sufficient for coherence. The further property 

of 2-monotonicity, that p( A U B) +F’( A JIB) 2 c(A) +F’( B) for all events A and B, is 
sufficient for coherence but not necessary. Some of the mathematical theory of coherent 
lower probabilities simplifies when the lower probabilities are 2-monotone (also known 

as “Choquet capacities of order 2”); see especially the simple formulas for conditional 
probabilities and expectations at the end of this section, and the theory in [40,96,102]. 

One way of constructing a lower prevision R is to assess upper and lower probabilities 
p(A) and l’(A) for all events A and to construct _P_ by natural extension. Not all 
coherent lower previsions can be constructed in this way, because many coherent lower 
previsions can generate the same upper and lower probabilities. 

In the football example, suppose that the initial assessments are upper probabilities 
i, i, 4 and lower probabilities f , a , 0 for the events F D, L respectively. (These are 
the upper and lower probabilities generated by the three judgements considered earlier.) 
Their natural extension to a lower prevision &, calculated as in the previous example, is 
the lower envelope of the five probability mass functions (i,:, i), (i, i,O), (4, a, f), 

(i, i, i) and (5, i, 3). This is different from the lower prevision Et constructed earlier, 

e.g. if (X(W),X(D),X(L)) = (l,-1,O) then P,(X) =0 but&(X) =-i, although 
e, and & generate the same upper and lower probabilities for all events. 


l? Walley/Art$cial Intelligence 83 (1996) 1-58 15 

Thus lower previsions (defined on all gambles) contain more information than lower 
probabilities (defined on all events), and information may be lost when uncertainty is 
modelled solely in terms of upper and lower probabilities. That is why we take lower 
prevision to be the fundamental model, rather than lower probability. Lower (or upper) 
previsions are more expressive than lower (or upper) probabilities. 

The extra information in lower previsions is often crucial in defining conditional 
upper and lower probabilities. Without it the conditional probabilities may be excessively 
imprecise. That can be seen in the football example. Taking B = {K D}, the first (lower 

prevision) model gives p(W 1 B) = $ and F(W 1 B) = 3, whereas the second (lower 

probability) model gives less precise values E( W 1 B) = $ and F( W 1 B) = 3. (In both 
cases the conditional probabilities were calculated by conditioning the extreme points 

and finding upper and lower envelopes. This agrees with natural extension.) 

In general, conditional upper and lower probabilities are not uniquely determined by 
unconditional upper and lower probabilities through the coherence axioms. But con- 
ditional upper and lower probabilities and previsions are uniquely determined by un- 
conditional lower previsions, through axiom (C3), provided the conditioning event has 
nonzero lower probability. 

Another reason for regarding lower previsions as fundamental is that they can often 
be assessed more easily than lower probabilities. If we are primarily concerned with 
a real-valued quantity whose value is uncertain, such as a person’s age, it may be 
easier to assess upper and lower previsions for the quantity rather than upper and lower 
probabilities for the events that the quantity belongs to particular sets. Lower previsions 
are also needed when we start by assessing upper and lower probabilities, then observe 
an event and condition by natural extension; again information may be lost if only the 

updated upper and lower probabilities are reported [41]. 
Thus upper and lower probabilities are often inadequate models for uncertainty. This 

is a defect of all theories of upper and lower probability, including the theory of belief 
functions and, on the interpretation adopted here, possibility theory. A more general 
theory, dealing with upper and lower previsions or an equally general alternative, is 
needed. In particular, upper and lower probabilities are inadequate for modelling natural- 

language mdgements of uncertainty. 

Natural extension 

Now consider the general procedure of natural extension. Suppose that a user makes 
finitely many judgements which can be translated into constraints on conditional lower 
previsions, say p(Xi I Bi) > pi for 1 < i < k, where ,zi are specified real numbers. 
(The case of unconditional previsions is included by taking Bi = R, and the case of 
precise judgements P( X I B) = p by taking p(X 1 B) > ,x and J’( -X I B) 2 -p.) 
Then the natural extension to any conditional lower prevision P( X 1 B) can be computed 
by solving a linear program, using the formula 

k 

,UU: B(X-p) 2 CAiBi(Xi-pi) for some A; 20 (1) 
i=I 


16 t! Walley/Art@cial Intelligence 83 (1996) l-58 

where Y > Z denotes Y(w) > Z(w) for all w E 0, B and Bi stand for indicator 
functions, and p and pi denote constant gambles. This definition of natural extension 
can be justified through the behavioural interpretation of lower previsions: if you are 
willing to pay up to ,ui for Xi conditional on Bi then, by combining positive mul- 
tiples of such gambles, I can induce you to pay up to p(X ( B) for X conditional 
on B. The resulting value of p(X 1 B) is finite provided the initial judgements are 
consistent in the sense that they “avoid sure loss”, a much weaker requirement than 

coherence [ 991. 
Natural extension summarises all the inferences that can be derived from the initial 

judgements through the rules of coherence. In fact, the natural extensions e are the 
minimal lower previsions that satisfy the initial constraints and are coherent. There may 

be other coherent lower previsions p’ that satisfy the initial constraints, but they must 
dominate the natural extensions in the sense that E”( X 1 B) 2 I’( X 1 B) for all gambles 
X and all events B, and therefore they incorporate additional information that is not 
implied by the initial set of judgements. For example, Dempster’s rule of conditioning 
often disagrees with natural extension because it implicitly involves judgements of 
conditional independence that are not implied by the initial belief function and which 
may not be consistent with the initial belief function. 

Natural extension is a very general method of inference. Indeed the following impor- 
tant constructions can be regarded as special types of natural extension: construction of 
(upper and lower) expectations from probabilities; construction of conditional (upper 
and lower) probabilities from unconditional ones; the “Fundamental Theorem of Proba- 
bility” of de Finetti [ 281; construction of inner and outer measures; construction of joint 
probabilities from marginal and conditional ones; and construction of probability models 
from qualitative judgements. The initial constraints are quite general and this allows a 
user to express his uncertainty in whatever forms are most convenient, e.g. through 
qualitative judgements such as the natural-language expressions listed in Section 6, or 
through a combination of qualitative and quantitative judgements. The results of natural 
extension are also very general; it can be used to construct a lower prevision p( X 1 B) 
for any conceivable gamble X and event B, hence to construct upper and lower proba- 
bilities and preferences between gambles. Inferences can be made by computing lower 
previsions of important variables conditional on all available information, and decisions 

by computing lower previsions of differences between utility functions, both of which 
are linear programming problems. 

Alternatively the natural extension can be computed, as in the football example, by 
solving a set of linear inequalities to obtain the extreme probability mass functions that 
are consistent with the initial constraints, and then forming their lower envelope. This is 
the dual linear programming problem. In the special case where all the initial probability 
judgements are precise and no independence judgements are made, it is equivalent to 
what is known as “probabilistic logic” [60,62]. 

This alternative procedure breaks down when independence constraints are included 
because these are nonlinear. On the behavioural interpretation of lower previsions, a 
judgement that two events are independent means that the upper and lower probabilities 
of one event would not change if you learned whether or not the other event occurred. 
This is quite different from judging that the events are independent under some Bayesian 


P Walley/Artijicial Intelligence 83 (1996) 1-58 17 

probability measure that satisfies the other constraints, which would be the appropriate 
definition of independence under a Bayesian sensitivity analysis interpretation [ 1051. 

C0nside.r the simplest case where upper and lower probabilities are assessed for two 
events A and B that are judged to be independent, say P(A) = cx, p(A) = 6, p(B) = p 
and p(B) = j?. Then the behavioural interpretation of independence imposes the 12 
constraints P(A) 2 g, F’( A 1 B) 2 ct, l’(A 1 BC) 2 cy, . . ., P( BC 1 A) 2 1 - fi, 
Z’( BC ) AC) 3 1 - p, and the natural extension of the judgements can be computed by 

natural exlension of these constraints, using E!q. (1). 
Bayesian sensitivity analysis would model the judgements in a different way, by 

finding the: set of joint probability measures P which satisfy the constraints g < P(A) < 

5, j? 6 P(B) < fi and P(A n B) = P(A) P( B). (The independence constraint is 
nonlinear and this makes the computations difficult in more complicated examples.) 
Upper and lower previsions are then defined as upper and lower envelopes of the set of 
solutions. The two approaches do produce different numerical answers; see the example 
of two unreliable witnesses in Section 5. 

All previous work in AI, e.g. [ 25,30,35], seems to have taken the sensitivity analysis 
definition of independence for granted, without considering the behavioural definition. 
A comparative study of the two approaches is needed, with regard to both interpretation 
and computational methods. 

Using the behavioural definition of independence, the computation of natural extension 
is more complicated when many judgements of conditional independence, or other 

structural ‘constraints on lower previsions, are made. Computations can be carried out, 
in general, by solving a finite sequence of linear programs with progressively stronger 
constraints. At each stage the independence and structural constraints are applied to the 
conditional lower previsions that were produced by natural extension in the previous 
stage, giving a stronger set of constraints for the next application of natural extension. 
It is not yet clear whether this is feasible for complex problems with moderately large 
numbers of conditional independence judgements, or for the types of belief networks 
that have been studied by Bayesians. 

There has been a substantial amount of work in recent years concerning the propaga- 
tion of upper and lower probabilities [ 25,30,94] or convex sets of Bayesian probability 
measures [ 10,941 using the sensitivity analysis definition of independence. 

Natural extension can be applied to a completely arbitrary collection of assessments, 
but the linear programming computations will become impracticable when the number 
of assessments is sufficiently large, especially when many independence judgements 
are made. (Of course Bayesian computations also become intractable in unstructured 
problems.:1 When this happens, several approaches might be considered. 

First, we could use simpler formulas, involving only local computations, to give 
lower bounds for E’( X 1 B) and upper bounds for Ti( X 1 B). Some examples of 
such formulas are given in the following subsections (see also [ 66,941) . This approach 
produces conclusions that are always valid but less precise than those produced by natural 
extension. Alternatively we might try to approximate p(X I B) without necessarily 
finding a lower bound. Some approximation methods are suggested in [60] for the case 
in which ,a11 the probability assessments are precise. This approach is less appealing 
than the first as it may produce invalid conclusions. 


18 P Walley/Art$cial intelligence 83 (1996) l-58 

Finally, the approach that seems most likely to be useful in practical problems is to 
develop special types of imprecise probability models, such as the 2-monotone lower 
probabilities or models defined in terms of upper and lower density functions or mass 
functions, for which natural extensions can be computed explicitly, without linear pro- 
gramming. Examples are given in the following subsections. 

The expert system INFERNO [66], designed to diagnose faults on oil rigs, uses 
upper and lower probabilities to measure uncertainty but differs in two important ways 

from the approach suggested here. First, INFERNO works by propagating simple con- 
straints on upper and lower probabilities. This has the advantage of simplifying and 
localising computations. However, because the propagation method and constraints are 
much weaker than those given by natural extension, the conclusions of the system, 
while valid, may often be too weak to be useful. Second, the system provides no way 
of updating probabilities by conditioning on new evidence. Any new information can 
only be regarded as specifying further constraints on a fixed, unconditional probability 

measure. 

Calculus 

All the rules of the theory follow from the principles of coherence and natural 

extension, and they can therefore be justified through the behavioural interpretation of 
lower previsions. An important example is the generalised Bayes rule (GBR) : when 
p is a coherent lower prevision defined on a sufficiently large set of gambles and 
p(B) > 0, the conditional lower prevision !‘(X 1 B) is the unique solution n of the 
equation &‘( B( X - x) ) = 0. This equation can sometimes be solved explicitly, and 
otherwise by simple iterative algorithms. (For example, take xa to be any estimate of x, 
and define asequence ofestimates by x,,+t =x,+2f’(B(X--x,))/(p(B) +P(B)). 
Then the sequence converges to the solution x, and the error is bounded by CCP where 
a = (p(B) - I’(B)) /(P( B) + f’(B)) .) Thus conditional previsions are uniquely 
determined by unconditional ones, provided the lower previsions involved in the GBR 
have been specified and J’(B) > 0. The GBR can be used to update the initial prevision 

p after learning that event B has occurred. When p is a linear prevision, the GBR 

reduces to P (X 1 B) = P (BX) /P(B) , a version of the usual Bayes’ rule. 
As another example of natural extension, suppose {Al, AZ,. . . , Ak} is an exhaustive 

set of mutually exclusive hypotheses, B is an observable event, and upper and lower 
probabilities P(Ai),I’(Ai),P(B 1 Ai),P(B 1 Ai) are assessed for 1 < i < k. (This 
generalises the Bayesian model considered in Section 3.) Assume that the assessments 
satisfy the coherence conditions 0 < p( B 1 Ai) < P( B 1 Ai) 6 1, 0 < p( Ai) < 
P(A;) < 1, and F(Ai) +Cj#eP(Aj) < 1 < P(Ai) +C,P(Aj) for every 1 < i < k. 
We might wish to calculate the natural extensions of these assessments to predictive 
probabilities J’(B) and P(B), and to posterior probabilities P(Ai I B) and P( Ai I B). 

To do so, it is convenient to define the extremal probability measures P which satisfy 
l’(A,i) < P( A,/) < B(A,i) for 1 Q j < k. Any ordering of the hypotheses, denoted by 

A,,,&,..., A,,, determines an extremal P as follows. Set P(A,) equal to p( A,j) if 

j < r, equal to p( A,j) if j > r, and intermediate between e( A,,,) and F( A,,, ) if j = r, 

where the values of r and P( A,,,) are determined by $=i P(A,,) = 1. 


P Walley/Art@cial Intelligence 83 (1996) l-58 19 

Using [99, Theorem 6.7.21, the natural extension can be written as P(B) = 

C;=l P(13 I Aj)Pl (Aj)T a weighted mean of the values P( B 1 A,i), where Pt is the 
extremal probability measure determined b 

r- 
ordering the hypotheses to have decreasing 

values of F’( B 1 Al). Similarly fr( B) = cj=, P (B 1 Aj) P2( Aj), where P2 is defined by 

ordering the hypotheses to have increasing P( B I Aj). (These and the following for- 
mulas simplify when Z’( A,!) = P( Al) for all j, as then PI (Aj) = P2( Aj) = e( Aj) .) In 

general J?(B) is bounded by C;=t P(B I Aj)P(Aj) < P(B) 6 & P(B I A,/)P(Aj), 
but these bounds may be far from sharp. 

Using a result from [99, 8.5.41, the posterior lower and upper probabilities after 
observing B are 

PC4 I B) = 
P(B I Ai>P3(Ai) 

P(B I Ai>& + Cj#P(B 1 AjJP3CA.i) ’ 
(2) 

where 91 is defined by setting nl = i and ordering the other hypotheses to have increasing 

p( B 1 A,i), and 

F( Ai 1 B) = - 
P(B I Ai)fi(Ai) 

f’(B I A,)P4(Ai) + Cj#iP(B I Aj)fi(Aj) ’ 

where P4 is defined by setting nk = i and ordering the other hypotheses to have decreasing 
P( B 1 Al). In this problem the computations are quite simple. If needed, simpler bounds 
could be obtained by substituting either p( AJ) or P( Aj) for P3 (Aj) and Pd(Aj) in 
Eqs. (2) and (3). A special case of these results was used in [25]. More general 
models involving belief networks are studied in [ 941. Other results, allowing general 
prior upper and lower previsions concerning the hypotheses Aj, are in [ 96,991. 

These formulas simplify in the case where there are only two hypotheses, Al and A?. 
Let p = 7’( Al > /I’( AZ) and p = &( Al ) /P( AZ) denote the ptior upper and lower odds 

on AI, and let x(B) = F(B-1 Al)/l’(B I AZ) and A(B) = P(B I Al)/itr(B I AZ) be 
the upper and lower likelihood ratios generated by B. The posterior upper and lower 
probabilities of the hypotheses are determined by the posterior upper and lower odds 
on A 1, which are given by the multiplicative formulas 

F(B) = 
F’(A, 1 B) _- 
f’(A2 1 B) =pA(B)’ 

These generalise the Bayesian formula: 

Conditional probabilities 

P(AI I B) 
p(B) = = 

P(A2 I B) 
= phW. 

posterior odds = prior odds x likelihood ratio. 

Suppose that coherent upper and lower probabilities, P and p, are specified for all 
subsets of a, and that we wish to construct conditional upper and lower probabilities 
B(. ) B) and F’(. ( B), e.g. to update beliefs after observing the event B. This problem 
is of interest because the other theories examined in this paper (Bayesian, belief func- 
tions and possibility theory) attempt to define conditional probabilities and expectations 
in terms of unconditional probabilities. (This seems to be a hangover from the classical 


20 P. Walley/Art@cial Intelligence 83 (1996) 1-58 

theory of probability.) It is important to recognise, however, that this is a special case 
of a much more general problem. In general we will make whatever probability assess- 
ments we can and use natural extension to construct other probabilities and previsions; 
it may often be easier and more informative to assess some conditional probabilities 
and previsions directly than to assess all unconditional probabilities (e.g. consider the 
problem in the previous subsection). So the problem considered here, in which uncondi- 
tional upper and lower probabilities are assessed for all events but no other assessments 
are made, should be regarded as atypical. 

Let B be any subset of 0 such that f’(B) > 0, and let M denote the set of all 
probability measures P such that P(C) > p(C) for all C C 0. Then lower probabilities 
conditional on B can be computed by applying the general formula for natural extension, 

giving 

P(A 1 B) =SUp 
{ 

PU: B(A-,u) > kA,i(Ai-P(Ai)) 
i=l 

for some n 2 0, Ai G 0, hi 2 0 
1 

=inf{P(A fl B)/P(B): P E M}. (4) 

Provided fi is finite, P(A 1 B) can be computed by linear programming techniques. 
(Much simpler formulas can be used in the special case where E is 2-monotone, 
discussed below.) A more general formula, which applies whenever P(B) > 0, is 

&(A ) B) = inf{P(A fl B)/P(B): P E M, P(B) > 0). 

The conditional upper probabilities are defined by 

F(A 1 B) = sup{P(A n B)/P(B): P E M, P(B) > 0). 

The conditional probabilities p( e 1 B) and P(- 1 B) defined by these formulas are 
always coherent lower and upper probabilities. Thus conditioning by natural extension 

preserves coherence. Moreover, &a 1 B) and B( - 1 B) are always coherent with the 
unconditional probabilities e and P. We will see that the families of 2-monotone lower 
probabilities, belief functions and possibility measures are also closed under conditioning 

by natural extension. 
Provided p(B) > 0, the conditional probabilities satisfy the following inequalities: 

P(A n B) 
<P(A I B) < min 

J’(AnB) F(AnB) 

P(AnB)+P(ACnB> F’(B) ’ P(B) ’ 

P(AnB) 
>P(A I B) 2 max 

P(AnB) B(AnB) 

P(AnB)+lJACf7B) Z’(B) ’ B(B) > ’ 

These hold whenever p and P are coherent, but the lower bound for P(A I B) is 
actually achieved whenever p has the stronger property of 2-monotonicity. In that case 
the natural extensions are 


R Walley/Artifcial Intelligence 83 (1996) I-58 21 

&PI 1 B) = 
Pt.4 n B) 

P(AnB) +P(ACnB)’ 

P(P, 1 B) = _ 
F(AnB) 

P(AnB) +P(ACnB)’ 
(5) 

provided Z’(B) > 0 [9,15,23,96]. (The same formulas apply when F(B) > 0 and 
E(B) = 0, provided the denominators are nonzero; set P(A 1 B) = 1 if the first 
denominator is zero, and H(A 1 B) = 0 if the second denominator is zero. This case 
is uninteresting because &A ] B) = 0 and p(A 1 B) = 1 for every A that satisfies 
p( A n B) > 0 and F( AC n B) > 0.) Numerical examples of this rule will be discussed 
in Sections 5 and 6. It has been generalised in [96,102,108] to characterise the posterior 
probabilities generated by a statistical likelihood function and prior lower probabilities 
that are Z!-monotone. 

When p is 2-monotone, the conditional lower probabilities P(. 1 B) are 2-monotone 
as well as coherent. Thus conditioning by natural extension preserves 2-monotonicity. 

(See [96] for a proof.) 

Expectations 

Again suppose that coherent upper and lower probabilities are specified for all sub- 
sets of 0. We wish to construct upper and lower previsions or “expectations”, F(X) 
and P’(X) , for gambles (bounded random variables) X. For example, in order to make 
decisions we would need to compute upper and lower previsions of differences between 
utility functions. But again we must point out that the case considered here, in which 
unconditional probabilities are assessed for all events but no other previsions are di- 
rectly assessed, is atypical; especially as some judgements, such as the natural-language 
judgements in the football example, cannot be modelled adequately in terms of upper 
and lower probabilities. 

Again the upper and lower previsions can be computed from the general formula for 
natural extension ( 1)) giving 

p(X)=SUp ,UU: X-p>k&(Ai--P(Ai)) forsomen>O, AiG 0, Ai>O 
i=l 

= inf (6) 

P(X)=inf /LU: X-/L<eAi(Ai-P(Ai)) forsomen>O, AiLa, Ai>O 
i=l 

= sup {I XdP: P E M . > (7) 
Again these formulas simplify in the special case where the lower probability E is 

2-monotone. Define F;x and &, the upper and lower distribution functions of X, by 


22 R Walley/Art@cial Intelligence 83 (1996) 1-58 

Fx(x) = P({,: X(w) < x)1 and &(x> = p( {w: X(o) < x}). Provided c is 
2-monotone, the natural extensions can be written as Choquet integrals [96] 

M 02 

P(X) = s xdFx(x), P(X) = s x@,(x). (8) 
-co --oo 

Decision making 

Upper and lower previsions are used to make decisions in the following way. Suppose 
we need to choose an action from a finite set of possible actions {al, a2,. . . , ak}, where 
the utility U( a, w) of action a depends on the unknown w. (We assume that utilities 
are specified precisely; otherwise the decision problem is more complicated.) Define a 
gamble Xj by Xj( w) = U(aj, W) for each j = 1,2,. . . , k. TO compare two actions ai 
and a,j we compute the upper and lower previsions P( Xi - Xj) and p( Xi - X,j) based 
on all available information. Then action ai is preferred to aj if &(Xi - Xj) > 0, a,j is 

preferred to ai if P(Xi - X,j) < 0, and if neither condition holds there is insufficient 
information to determine a preference. Say that action ai is maximal if there is no other 
action aj that is preferred to ai. The non-maximal actions can be eliminated, and it is 
reasonable to choose any of the maximal actions. For more details and applications to 
real decision problems see [99, Sections 3.9 and 5.6; 101; 1031. 

This method produces only a partial preference ordering in general, and there may 
be more than one maximal action. Methods for generating complete preferences are 

discussed in [57,85,99]. All such methods seem somewhat arbitrary. It should be ac- 
knowledged that, when probability judgements are imprecise, there may be more than 

one reasonable course of action. 

Imprecise conclusions 

The inferences produced by natural extension are often imprecise. That can be seen in 
simple examples, e.g. if A and B are logically independent events and an expert assesses 
that each has precise probability i but makes no judgement about their degree of depen- 
dence, then the natural extension to upper and lower probabilities for their intersection 
is p( A n B) = i, E( A n B) = 0. When natural extension is used to compute conditional 
upper and lower probabilities from unconditional ones, the conditional probabilities may 
be highly imprecise, especially when the initial probabilities are 2-monotone. (Two ex- 
amples are discussed later in the paper.) In such cases, other methods of conditioning 
such as Dempster’s rule often produce conditional probabilities that are more precise. 
Advocates of the alternative methods have suggested that natural extension tends to 
produce excessive imprecision. 

Wilson and Moral [ 1131 have given an interesting example of this, also discussed in 
[ 1201. Suppose that an expert makes two assessments of conditional lower probabilities: 
E( B 1 A) = 1 and Z’(C 1 B) 2 0.999, where A, B and C are logically independent 
events. What do these judgements imply about p(C 1 A)? 


P WaNey/Arttj?cial Intelligence 83 (1996) 1-58 23 

It may seem that we can make a fairly strong inference in this case, because the 
second judgement is “close to” P(C 1 B) = 1 which, together with P(B 1 A) = 1, 
would yield the inference E( C 1 A) = 1 by natural extension. (To see that, apply the 
general formula for natural extension ( 1), and use A n Cc C (A rl BC) u (B n Cc) to 
show that A (C - 1) 2 A (B - 1) + B (C - 1) .) However, natural extension of the expert’s 
judgements produces only the trivial inferences p( C 1 A) = 0 and P( C 1 A) = 1. 

These inferences may indeed seem “excessively imprecise”, but I believe that they 
are reasonable because nothing more can be derived from the two explicit judgements 
without making further assumptions. Suppose that an alternative method of inference 
produced lthe conclusion p(C 1 A) 2 S from the two judgements, where 6 is a specified 
positive number. The expert might then make some further assessments about the events, 
Suppose he makes the further judgements that P(A) > 0.001 and P(A rl C) = 0. 
These judgements are perfectly consistent with the two initial judgements, since all four 
judgements are consistent with a Bayesian probability measure P that satisfies P(B) = 1, 

P(AflB) =P(A) =O.OOl,P(BnC) =P(C) =0.999andP(ArlC)=O.Howeverthe 
new judgements are not consistent with the inference p(C ( A) > S because they imply 
F(C I A) = 0. This shows that the inference goes beyond the information contained 
in the two initial judgements; it relies on extra (implicit) assumptions which may be 

inconsistent with the expert’s other beliefs. 
This argument can be generalised as follows. Let Dt and V2 denote two sets of 

judgements and let Et denote a set of inferences produced from Dt by applying the 
rules of the calculus. Then we require the following consistency principle: if the overall 
set of judgements (Dt U 2)~) is consistent then (Et U VT) should be consistent. 

The force of this principle depends on the technical meaning that is given to “consis- 
tency”. In the theory of lower previsions “consistency” is identified with “avoiding sure 
loss”, the rules of natural extension do satisfy the consistency principle, and they are 
the strongest rules which do so. That is, the inferences given by natural extension are 

the most precise inferences possible, if the consistency principle is to be satisfied. 
It seems, therefore, that natural extension produces exactly the inferences that are 

implied by the explicit judgements and assumptions. Inferences may be excessively 

imprecise, not because natural extension is the wrong method of inference, but rather 
because the judgements and assumptions are excessively imprecise. Indeed the compu- 
tation of natural extensions will often reveal indeterminacy in conclusions or decisions 
that compels us to make our judgements more precise. For example, the problem that 
unconditional upper and lower probabilities tend to generate very imprecise conditional 
probabilities can sometimes be resolved by assessing upper and lower previsions, which 
are more informative than probabilities and therefore generate more precise inferences. 

One way to sharpen inferences in many problems is to add judgements of condi- 
tional independence or dependence. Consider again the judgements P(B I A) = 1 and 
p( C I B) 2 0.999. The natural extensions must satisfy the coherence conditions 

P(CIA)>P(BnCIA)>R(CIAnB)P(BIA), 

hence p( C I A) 2 p(C ) A n B) in this case. This is useful only if we can relate 
p(C I An B) to P’(C I B). One way to do so is to judge that A and C are conditionafly 
independent given B, which gives p(C I A n B) = p(C 1 B) 2 0.999 and hence 


24 F! Walley/Art$cial Intelligence 83 (1996) l-58 

I’(C 1 A) 3 0.999, a very strong conclusion. The same inference is produced by the 
weaker judgement that A and C are nonnegatively correlated conditional on B, so that 
P(C 1 A n B) > p(C 1 B). Either condition suffices to rule out the possibility that 
A and C are essentially incompatible events, and this must be ruled out before any 
nontrivial inference can be obtained. 

Another way to sharpen inferences in this problem is to make a further numerical 
assessment of &(A 1 B). Coherence requires that 

p(C 1 A) Zp(C IAnB) aP(AnC 1 B)/(fJAnC 1 B) +F(AnCC 1 B)). 

Here 

Hence P( C I A) 2 1 - O.OOl/p( A I B), and any assessment of f’(A I B) greater than 
0.001 will produce a nontrivial lower bound for &( C 1 A). For example, the judgement 
that Z’( A 1 B) 2 0.01 gives the strong conclusion E( C I A) > 0.9. 

It appears that, in moderately complex expert systems, judgements or assumptions 
of conditional independence are needed to reduce the effort of assessment and produce 
useful conclusions, Many expert systems use expert knowledge about causal relationships 
to build “belief networks” based on assumptions of conditional independence. In other 
systems independence constraints are taken as default assumptions; if the expert or user 
supplies no information about the relationship between two events or variables then they 
are assumed to be independent. 

Such default assumptions may sometimes be needed to produce useful conclusions, but 
it is important that they always be made as explicit as possible (they may be inconsistent 

with an expert’s other beliefs), that users be encouraged to consider whether they are 
reasonable in a particular application, and that they can be easily retracted if the overall 
set of judgements becomes inconsistent. Ideally, inferences should be computed both 
with and without default assumptions so that a user can compare their effects. A logic 
of default assumptions that is compatible with the theory of lower previsions is outlined 
in [113]. 

Conclusion 

The theory of coherent lower previsions is a general theory of reasoning in the 

presence of uncertainty and partial ignorance. Lower previsions are more general and 
more expressive than lower and upper probabilities. They have a simple behavioural 
interpretation as supremum buying prices for gambles and they should not be interpreted, 
in general, as lower bounds for an unknown Bayesian prevision. 

The theory certainly satisfies criteria (a)-(d) of Section 2. Lower previsions have a 
clear behavioural interpretation which supports the principles of coherence and natural 
extension. All the rules of the theory can be derived from these principles, and they 
can be used to check consistency of the initial assessments and to ensure consistency of 
assessments with conclusions. The imprecision of lower previsions can be used to model 


P. Walley/Artifcial Intelligence 83 (1996) 1-58 25 

a lack of information, conflict between several types of information or between expert 
opinions, or the vagueness of probability judgements in natural language. Coherent 
models can be produced by combining qualitative (natural-language) judgements with 
precise or imprecise numerical assessments. Because lower previsions have a behavioural 
interpretation, it is quite easy to understand the practical meaning of conclusions that 
are expressed in terms of them, and to use them in making decisions. 

The task of assessment can be handled, in principle, by allowing the user to make 
whatever judgements he finds most comprehensible and natural from a wide variety 

of admissible judgements. In particular he can express his uncertainties in ordinary 
language; some ways of doing so are discussed in Section 6. Other important sources of 
coherent lower previsions include multivalued mappings (Section 5)) partial information 
about precise probabilities, combination of expert opinions [ 1031 and various models 
based on statistical data [99,101,102]. It seems that, in complex problems, assumptions 
of independence or conditional independence will be needed to reduce the effort of 
assessment and produce useful conclusions. Further work is needed to compare several 
ways of modelling independence judgements, to study how independence can be used 
as a default assumption in expert systems, and to determine its effect on the precision 
of coaclu,sions. 

The genera1 method for making inferences and decisions in the theory is natural 
extension. In general, the computation of natural extension can be reduced to a linear 
programming problem, or (in the case of independence judgements) to a finite sequence 
of linear programs. Again, these problems may be intractable in moderately large ex- 
pert systems when many assessments are made, and then (as in the Bayesian theory) 
special types of models are required. Again further work is needed to find computation- 

ally efficient methods for computing natural extensions, especially when independence 
judgements are involved, to develop tractable types of models (e.g. using 2-monotonicity 
or upper and lower densities), and to develop efficient methods for propagating lower 

previsions in belief networks. 

5. Belief functions 

The th’eory of belief functions was initiated by Dempster in a series of papers in 

the 1960s and developed by Shafer [ 711. Its relevance to expert systems is discussed 
in [ 34,741]. For more recent developments see [64,75] and the ensuing discussion 
[ 19,65,76,84,92,107,112]. Smets has developed an interesting variant of the theory 
called “the transferable belief model” [ 82-851. Applications of belief functions in expert 
systems include OASES [5] and its shell [4], MacEvidence [ 391, PSEIKI [44,47], 
PULCINELLA [69] and [49,58]. 

A belicffunction p is a real-valued function, defined on all subsets of a possibility 

space 0, which can be written in the form P(A) = CBCA m(B) for all subsets A, 
where m is a probability mass function on subsets of 0, i.e. m(0) = 0, m(B) 2 0 for 

all B C 0, and ~ecnm(B) = 1. Here C denotes set inclusion, not necessarily strict. 
(In Smets’ theory m@) may be positive.) The conjugate upper probabilities are defined 
by B(A)= 1 -f'(AC)=~,,,gm(B). 


26 P Walley/Art$cial Intelligence 83 (1996) l-58 

The mass function m is called the probability assignment for E. It is determined by 
p through the Mobius inversion formula m(B) = C,,,( -1) lB-Alp( A). Any lower 
probability function p determines a function m through-this formula, and p is a belief 

function if and only if m is a probability mass function. One can think of m(B) as a 
fluid probability mass that is free to move to any element of B. Bayesian probability 
measures are a special type of belief function for which m(B) = 0 unless B is a singleton 
set. The vacuous lower probability is a belief function, defined by m( 0) = 1. 

Interpretation 

The natural extension of a belief function E to a lower prevision is defined for all 
gambles X by p(X) = xBcam( B) inf{X( w): w E B}. It is easily verified that p 
satisfies the coherence axioms (Pl)-(P3) in Section 4, and it follows that every belief 
function is a coherent lower probability function. So belief functions can be given the 

behavioural interpretation of lower probabilities: P(A) is a supremum of acceptable 
rates for betting on event A. 

Various other interpretations of belief functions have been discussed in [36,63- 
65,72,75,76,85,104] ; see [ 831 for a survey. Shafer prefers to interpret belief functions 
by drawing an analogy with a “canonical example” of a “randomly coded message” 
[ 72,751. In Shafer’s canonical example, the belief function is generated from an under- 
lying precise probability measure through a multivalued mapping (defined below). The 
underlying probability measure does appear to have a behavioural interpretation in terms 
of rational betting rates, and the belief function inherits this behavioural interpretation 

through the multivalued mapping [99, p. 1821. So it seems to me that a behavioural 
interpretation of belief functions is not only compatible with Shafer’s interpretation, but 

also a necessary consequence of his interpretation. 
Indeed, I regard Shafer’s semantics and the other interpretations of belief functions 

as possible ways of elaborating the behavioural interpretation. On [ 99, pp. 20 and 611, 
I call the behavioural interpretation “minimal” because it is compatible with a wide 

variety of elaborations. It requires that belief functions have certain implications for 
betting and other decisions, but it does not exclude other semantics which relate belief 
functions to the evidence on which they are based. 

Shafer [75] describes the random coding example as a “metaphor” which may pro- 
vide some guidance in constructing belief functions. The behavioural interpretation is 
concerned with how belief functions are used in making decisions. The two types of 
interpretation are compatible and I think that both are needed. (I prefer to call Shafer’s 
canonical example an “assessment strategy” rather than an “interpretation” of belief 
functions, but that is not to deny its utility.) In practice we need to construct belief 
functions from evidence, but we also need to use belief functions to make decisions 
and this seems to require some kind of behavioural interpretation [ 331. Without one, 
the practical meaning of inferences that are expressed in terms of belief functions is 
somewhat unclear. See [ 1071 for an interesting comparison of the two interpretations. 

In [ 721 Shafer does accept the behavioural interpretation of belief functions, although 
he argues that other aspects of their meaning are more important. In [75] he argues 
vehemently against a Bayesian sensitivity analysis interpretation of belief functions, 


I? Walley/Art@cial Intelligence 83 (1996) l-58 27 

but that argument is irrelevant to the present discussion as I also reject the sensitivity 
analysis interpretation. The theory of coherent lower previsions and the rules of natural 

extension rely only on a behavioural interpretation. 

Of course I am not claiming that the whole Dempster-Shafer theory is compatible 
with a behavioural interpretation of belief functions. Much of the theory is based on 

Dempster’s rule for combining belief functions, and in many problems Dempster’s 
rule produces inferences that are unacceptable under a behavioural interpretation. Some 
presentations of the theory [75,76] give the impression that Dempster’s rule follows 
naturally from the “random coding” or “multivalued mapping” semantics for belief 
functions, together with an inocuous assumption of unconditional independence. In 
fact, even under Shafer’s own semantics, the justification of Dempster’s rule relies 

on stronger assumptions of conditional independence which seem to be unreasonable 
in many applications. (These assumptions are discussed later in this section.) Nor 
do the other interpretations of belief functions provide a convincing justification for 
Dempster’s rule. Shafer’s semantics and the concept of a multivalued mapping are 
compatible with a behavioural interpretation, but the indiscriminate use of Dempster’s 

rule is not. 

Assessment 

Belief functions are often generated by a multivalued mapping [ 151, which is a 
mapping A. from points of an underlying space P = ($1,. . . , &} to subsets of 0. For 
simplicity I assume that the sets A( @I ) , . . . , A( en) are distinct. It may be possible to 
assess a Bayesian probability measure P on !P, using either frequency information or 
subjective judgement, and this induces a belief function on 0 through m( A(q$)) = 

p($i). 

Example Z! (An unreliable witness). In this simplest example, an unreliable witness 
claims that he observed an event C. Either (@I ) he did observe C, so A(& ) = C, or 
(& ) he observed nothing, so A( &) = R. Suppose that, after hearing his report, we 
judge the witness to have credibility a, so P( $1) = (Y and P (I&) = 1 -a. This generates 
the probability assignment m(C) = (Y and m( 0) = 1 - (Y, and the corresponding belief 
function has P(C) = cy and Ti( C) = 1. Thus there is some evidence in favour of C (we 
would bet on C at any odds better than 1 - (Y to (Y), but no evidence against C (we are 
not prepared to bet against C at any odds). 

The imprecision of this belief function simply reflects the absence of information 
about the “base rate” frequency of C, P(C 1 &). A Bayesian would need to make a 
precise assessment of P(C 1 $2) and then compute P(C) = (Y + (1 - a)P(C 1 92). 
Depending on the practical context, there may or may not be information on which 
to assess P( C 1 t,&t). In general, we may only be able to assess lower probabilities 
c:<C 1142) andP(fit) =LX, and then weobtainP(C) =g+(l-(y)p(C 11,bz) and 
P(C) = 1 by natural extension. Note that Bayesians require two precise assessments, 
of a and I’( C 1 CCI;! ) , the multivalued mapping requires one precise assessment (a), but 
we can reach useful conclusions without making any precise assessments. 


28 R Walley/Artijicial Intelligence 83 (1996) l-58 

Belief functions can often be assessed by using the model of a multivalued mapping 
and assessing underlying Bayesian probabilities. (Although it is somewhat ironic that 
precise assessments should be needed in a theory of imprecise probabilities! A more 
general model, which does not assume that precise probabilities can be specified on 
the underlying space P’, is given in [99, Section 4.3.53.) In other problems we may 
be able to assess the probability assignment m more directly. For example there may 
be frequency information about the outcomes Bt , . . . , B, of previous trials, where the 
outcome is recorded as a subset Bi of 0 rather than a single element. We might then 
take the past relative frequency of B as our assessment of m(B) [ 19,104]. Some other 
assessment strategies have been suggested in [ 71,73,75]. 

Example 3 (Football example). It is important to recognise, however, that there are 
many types of information which cannot be modelled by belief functions. One simple 
example is the football example of Section 4, where three judgements were expressed 
in ordinary language. Let p be the lower probability function constructed by natural 
extension of the three judgements, which is the lower envelope of the three probability 
mass functions ($,i,i), (i,i,O), (i,i,i) onR={wD,L}.BytheMGbiusinversion 
formula, 

m(W=P(Q -P(WUD) -P(WUL) -P(DUL) +p(W) +p(D) +p(L) 

&$;_;+;+~+o=-~. 

As m( L?) is negative, m cannot be a probability mass function so P is not a belief 

function. 

It appears that probability judgements in natural language cannot be modelled, in 
general, by belief functions. Note also that, even when such judgements do yield a 
belief function, this may be less informative than the appropriate lower prevision. (As 
explained in Section 4, lower probabilities are less expressive than lower previsions.) 
Even when a belief function p is generated by a multivalued mapping, we may be 

able to make further assessments to sharpen p, and the resulting lower probability 
may not be a belief function. There are other examples in [48,64,96,99] of models 
and assessment strategies which produce coherent lower probabilities that are not belief 
functions. (Consider, for example, two tosses of a fair coin with unknown correlation 
between the outcomes.) There seems to be no good reason to restrict attention to belief 
functions, rather than coherent lower probabilities or coherent lower previsions. Nor is 

it clear why unconditional belief functions are taken to be the fundamental measures of 
uncertainty. Direct assessments of conditional lower and upper probabilities are often 
needed to measure the uncertainty associated with the “if-then” rules that are prevalent 
in expert systems [ 641. 

Dempster’s rule of combination 

Dempster’s rule is extensively used in the theory to combine and update belief func- 
tions. Let ml and m2 be probability assignments based on separate (“independent”) 


P Walley/Artijicial Intelligence 83 (1996) 1-58 29 

bodies of evidence. A combined probability assignment m is defined by 

m(C) =p-’ C ml (A)m2( B) for all non-empty sets C, 
ArlB=C 

where 

P= c ml (A)MB) 

(9) 

is a normalizing constant, provided p is nonzero. It appears to be computationally 
feasible Ito use Dempster’s rule to combine belief functions that are defined on cer- 

tain kinds of tree structures [ 77,78,114]. Other computational results are discussed in 
[34,61,79,111,112]. 

Example 4 (Two unreliabEe witnesses). As a simple example of Dempster’s rule, sup- 
pose that there are two unreliable witnesses of the type considered earlier. The first 
witness, who has credibility cyt, reports that he observed event Cr. The second wit- 
ness, with credibility a~, reports C2. So ml and m2 are defined by ml (Cl) = al, 
ml(O) q : 1 -at, mz(C2) = ff2, m?(a) = 1 - (~2. Suppose that Ct and C2 are logi- 
cally indlependent events. Then p = 1 in Dempster’s rule and the combined probability 
assignment [71] is m(C1 n C2) = cqcq, m(CI) = cut(l - Ly2), m(C2) = (1 - CWI)CQ, 
m(Q)=(l-cut)(l-a2). 

When should beliefs be combined by Dempster’s rule? The rule appears to give 
reasonable answers in some problems, but it produces unsatisfactory and seemingly er- 

roneous tconclusions in other problems (an example is given below). Its applicability 
seems to rely, in general, on some rather delicate judgements of conditional indepen- 
dence. In. the example, the rule is applicable provided that beliefs, based on the report 

of one witness, about whether her report is correct would be unchanged by further 
information which specified both the report of the other witness and whether his report 
was correct. 

This is a type of conditional independence, conditional on the reports of both wit- 
nesses. It is quite different from the type of unconditional independence that is suggested 
by Shafer [ 72,75,76] as a justification for Dempster’s rule. (A similar distinction is made 
in [ 951 .:I In the example, each witness sometimes reports correctly and sometimes in- 
correctly, and the two witnesses may be unconditionally independent in the sense that 
learning whether one witness reported correctly (without knowing what the report was) 

would not change our belief that the other witness reported correctly. 
Unconditional independence is a simple judgement and often a reasonable one, for 

instance when the witnesses are known to have no interaction. But it is not sufficient to 
justify the use of Dempster’s rule, because the belief functions involved in Dempster’s 
rule are based on (or “conditional on”) the specific reports of the two witnesses. The 
credibility of witness i (q) must be a posterior probability, conditional on the report 
of witness i, in order to determine beliefs based on all the available evidence. (Shafer 
[ 72, p. 51, equates this posterior probability with the prior probability that witness i 
will report correctly, but Levi [55] has shown that this is unjustified.) 


30 P: Walley/Art@cial Intelligence 83 (1996) 1-58 

Dempster’s rule therefore relies on an assumption of conditional independence and 
this is harder to justify. If the two witnesses gave detailed reports which agreed in 
most details and were otherwise compatible, for example, then learning that one report 
is correct would certainly give us more confidence that the other report is correct 
and conditional independence would fail. It would fail also if the two reports were 
inconsistent, even though unconditional independence might be reasonable in these two 
cases. 

I am not suggesting that Dempster’s rule of combination is always inappropriate in 
such examples, merely that the implicit judgements of conditional independence should 
be made explicit and carefully considered. But conditional independence seems less 

reasonable in most other, more complex examples than in the case of the two witnesses, 
because learning one body of evidence and which mechanism or “code” produced it 
will typically provide information about the true state w and hence change beliefs about 
which mechanism produced the other body of evidence. That is especially clear in cases 
of conditioning, where one body of evidence actually restricts w to a subset of the initial 
possibility space LL 

Dempster’s rule is therefore unreliable unless careful attention is given to the implicit 
assumptions of conditional independence. For a general mathematical formulation of 
these assumptions and for further discussion, see [99, Section 5.131. The conditions 
given in [99] are similar to, but slightly different from, those given by Voorbraak 
[ 95, p. 1881. Voorbraak reaches a similar conclusion about the limited applicability of 
Dempster’s rule. 

Example 5 (Independent witnesses). Even in the simple example of the two witnesses, 
there are other ways of modelling independence. One way is to make the assessments 
p( Ct ) = at, f3 C2) = q, and judge that events Ct and C2 are independent (conditional 
on the testimony of both witnesses), meaning that upper and lower probabilities for 
one event would not change if we learned whether or not the other event occurred. 
Coherent lower previsions can then be computed by natural extension of the judgements 

P(CI) = P(C1 I c2> = P(Cl I cg = ~1 and p(C2) =p(C2 1 Cl) =J’(C2 1 Cl”> = LYE. 

Natural extension produces a lower probability model that is not a belief function (it is 
not even 2-monotone), and which is more precise than the belief function produced by 

Dempster’s rule. For example, let A denote the event that Ct and C:! have the same truth 
value (i.e. both occur or neither occurs). Then Dempster’s rule gives &,(A) = LYICX~, 

whereas natural extension gives F’,(A) = alcy2/( cq + (~2 - qq) which is strictly 

larger than &(A) provided 0 < (~1 < 1 and 0 < LUZ < 1. (Both rules give F(A) = 1.) 
This is one type of application in which natural extension produces inferences that are 
more precise than Dempster’s rule. (Compare with conditioning.) 

A third approach, suggested by Bayesian sensitivity analysis, is to look for all Bayesian 
probability measures P which satisfy P( Cl ) > CYI and P(Q) > a~, and under which 
Ct and C2 are independent [ 10.51. By forming the lower envelope of all such measures, 
we obtain a lower probability model that is more precise than the two previous models 
and again is not a belief function. For this model 

P,(A) =min{crt,a:!,Qta2 + (1 -nt)(l -a2)} 


P: Walley/Arti&ial Intelligence 83 (1996) 1-58 31 

and &(.4) > &(A) > Z&(A), provided 0 < q < 1 and 0 < a2 < 1. If each witness 
has credibility $, for example, we obtain Pe (A) = 6, &(A) = 3 and Z’o (A) = $. 

The three models do agree on some probabilities, in particular p(Ct II C2) = qcq 

and p( Cl n C2) = 1 for each model. (A fourth model, based on possibility theory, 
will be given in Section 6. This gives &‘(CI II C2) = min{q, a~}, which seems less 
reasonable than the other solutions.) The differences between the four models may be 
important because judgements of independence are common in expert systems. It is not 
clear which model is most appropriate for practical applications, but see [99, Chapter 
9;105] for some comparisons. 

Dempster’s rule of conditioning 

Suppose that the second probability assignment in Dempster’s rule is defined by 

m2( B) I= 1, representing knowledge that event B has occurred. Dempster’s rule of 
combination then reduces to Dempster’s rule of conditioning, which can be written 
most simply in terms of upper probabilities as PO (A 1 B) = P( A fl B) /F( B), defined 
whenever p(B) > 0, with &( A 1 B) = 1 --FD(A~ ) B). This rule is used in the theory 
of belief functions to update beliefs after receiving new information. 

Compare this rule with the rule of natural extension given in Section 4. Because 
every belief function E is 2-monotone, the conditional probabilities defined by natural 
extension are given by the simple formulas 

&(A I B) = 
P(A n B) 

P(AnB) +P(ACnB)’ 

PE(A / B) = _ 
P(AnB) 

P(AnB)+I’(ATB) 
(10) 

provided the denominators are nonzero. Several authors [23,41,93] have shown that if 
p is a bselief function then so is &(a 1 B). (This is true whenever F(B) > 0; see 
[ 4 I] .) Thus conditioning a belief function by natural extension produces another belief 
function. 

The conditional probabilities defined by Dempster’s rule are always at least as precise 
as those defined by natural extension [ 151, in the sense that 

&(A 1 B) < &,(A 1 B) < FD(A 1 B) < BdA I B). (11) 

Both rules yield the same conditional probabilities when the initial probability measure 
is precise (and then both agree with Bayes’ rule), but, in other cases, Dempster’s rule 
typically yields conditional probabilities that are more precise than natural extension. 
(See [ 1!>,23,48] for interesting comparisons of the two rules.) Natural extension pro- 
duces conditional upper and lower probabilities that are coherent with the unconditional 
upper and lower probabilities. In some cases the conditional probabilities defined by 
Dempster’s rule may also be coherent with the unconditional probabilities, even when 
Dempster’s rule and natural extension produce different answers. For instance, Demp- 
ster’s rule seems to produce reasonable inferences when it is used to update possibility 


32 P: Walley/Art@cial intelligence 83 (1996) l-58 

measures (see the examples in Section 6). But there are examples, such as the following, 
in which Dempster’s rule produces inferences that seem to be seriously wrong. 

Example 6 (e-contamination model). To compare the two rules of conditioning, con- 
sider the following statistical model. A random variable X takes values in the sample 
space X = {1,2,3,... , N}. To be specific we will take N = 104. (In fact Dempster’s 
rule will produce the same type of inconsistency for any value of N greater than 1, but 
the degree of inconsistency increases with N.) With probability 0.99, X is generated 
by the uniform probability distribution on X. With probability 0.01, X is generated by 

another, completely unknown, probability distribution on X. Thus almost all observa- 
tions follow a uniform distribution but one observation in a hundred is expected to be 
a “gross error”, in the sense that it is generated by a completely unknown (but possi- 

bly drastically different) mechanism. This sampling model is called an &-contamination 
neighbourhood of the uniform distribution (here E = 0.01) [ 401. 

Let B, denote the event that X = x, let A denote the event that X is generated by 
the uniform distribution, and define y = 1 if A occurs and y = 0 otherwise. There is 
uncertainty about both the value of X and whether A occurs, so we take the possibility 

spacetobefi={(x,y): x=1,2 ,..., N; y=O,l}. 
The uncertainty about D can be modelled by a belief function whose probability 

assignment m is defined by m(A fl B,) = 0.99/N for x = 1,2,. . . , N, m(AC) = 0.01, 
and m(C) = 0 for all other sets C. This model is imprecise because we do not know how 
to distribute the probability mass 0.01 amongst the alternatives to a uniform distribution. 

Before observing X, the probability of A is precisely 0.99. Now suppose we make a 
single observation X = x. How should we update our uncertainty about A? 

Dempster’s rule of conditioning produces 

Fo(A I B,) = 
F(An B,) m(A n B,) 0.99/N 

F(&) = m(A fl B,) + m(AC) = 0.99/N + 0.01 

99 
=- 

99 + N 
< 0.01 when N = 104. 

Similarly PO ( AC 1 B,) = N/( 99 + N), hence 

&(A 1 B,) = 1 -Po(AC 1 B,) =PD(A 1 B,). 

Thus Po( A I B,) = &(A 1 B,) < 0.01 for every possible value of x. After observing 
x, the updated probability of A is precise and smaller than 0.01, whatever the value of x. 

Initially we are very confident that X will be generated by the uniform distribution. 
But if we use Dempster’s rule to update our beliefs then we will become very confident, 
whatever value of X we observe, that X was not generated by the uniform distribution! 

Intuitively there is a strong inconsistency between the initial and updated probabilities. 
Indeed an observer who knew that Dempster’s rule would be used to update probabilities 
could exploit the inconsistency to make a sure gain, by initially betting against event A 
and, after x is observed, betting on A at a more favourable rate. The initial and updated 
probabilities violate the coherence axioms (Cl) and (C2) of Section 4 and they “incur 
sure loss” in the mathematical sense of [ 991. 


P. Walley/Artifcial Intelligence 83 (1996) 1-58 33 

In simple terms, Dempster’s rule produces the wrong answer because it treats the 
probability mass m(AC) = 0.01, which is spread over all possible values of x, as if it 
were entirely focused on the x that is actually observed. Indeed Dempster’s rule produces 
the inferences that a Bayesian would obtain from the precise probability assessments 
P( A n B, 11 = 0.99/N and P( AC II B,) = 0.01, which are incoherent when asserted for 
every possible x. I mentioned earlier that Dempster’s rule is applicable when a particular 
type of conditional independence holds. In this problem the required condition is that, 
for each x, learning that X = x would not change the relative likelihood of A n 8, and 
AC. Of course this condition is unreasonable. 

There appear to be many problems in which Dempster’s rule of conditioning or com- 

bination produces incoherent or unacceptable inferences; other examples can be found 
in [99, Section 5.131 and in [36,48,64,65,98]. 

Compare Dempster’s rule of conditioning with the method of natural extension, which 
always produces coherent inferences. The conditional upper and lower probabilities gen- 
erated by natural extension can be easily computed as follows: 

F&A 1 B,) =_ 
F(AnB,) 0.99/N 

P(AnB,) +P(ACnB,) = 0.99/N+O = ” 

&(A I B,) = 
P(A i-7 Bx) 0.99/N 

P(A n B,) + P(Ac n B,) = 0.99/N + 0.01 
=Z<OOl 

99+N * ’ 

In this problem, these are the unique updated upper and lower probabilities that are 
coherent with the initial belief function. 

The two rules of conditioning produce the same updated lower probability for A, 
but they produce very different upper probabilities. The updated probabilities given by 

Dempster’s rule are precise, whereas those given by natural extension are highly im- 
precise. The reason is that, for each possible i, the initial model is consistent with the 
hypothesis Hi that, whenever AC occurs, X takes the value i for certain. If we knew 
H* to be true we would obtain precise probability P( A 1 B,) = 99/(99 + N), but if 
we knew /4 to be true (where i $ x) we would obtain P( A 1 B,) = 1. The range of 
lower to upper updated probabilities must cover both values. The initial probability of 
A is precisely 0.99 under all the hypotheses Hi, but the updated probabilities are very 
different under different hypotheses and this produces imprecision in the updated upper 
and lower probabilities. 

Despite this explanation, some readers may feel that observation of x should have 
absolutely no effect on uncertainty about A and that the updated probabilities should 
be P( A 1 B,) = 0.99 (precisely) for every possible x to agree with the initial prob- 
ability P (,4) = 0.99. However these updated probabilities are not coherent with the 
initial belief function. If you take the updated probabilities to be precisely 0.99 then 
you must modify the initial probability model to make it precise. Indeed, we can use 
natural extension to compute the unconditional probabilities that are generated by the 
assessments P(A I B,) = 0.99 and P(A rl B,) = 0.99/N for all x, and we obtain 
P(AC fl Bx) = 0.01/N (precisely) for all x. This is tantamount to assuming that the 
probability mass m(AC) = 0.01 can be distributed uniformly over X, i.e. that X is 
generated by a uniform distribution when AC occurs, as well as when A occurs! If 


34 I? Walley/Artificial Intelligence 83 (1996) I-58 

there is really no information about how X is generated when AC occurs then the initial 
probability model should be imprecise, and this inevitably leads to imprecision in the 
updated probabilities. 

Expectations 

It is assumed in the Dempster-Shafer theory that uncertainties are modelled in terms 
of upper and lower probabilities. As discussed in Section 4, upper and lower probabilities 
may not be sufficient. In many problems upper and lower previsions or conditional prob- 
abilities are needed and information is lost if these are defined in terms of unconditional 
probabilities. 

Assuming that beliefs are specified in terms of a belief function c with probability 
assignment m, lower and upper previsions of a gamble X can be computed by natural 
extension, through the formulas 

cc 

P(X) = 
s 

xdFx(x) = cm(B) inf{X(w): w E B}, 

-cc BCfJ 

cc 

B(X) = 

s 

xdF,(x) = cm(B) sup{X(w): w E B}, (12) 

--oo EC&f 

where Fx and & are the upper and lower distribution functions of X under 41 (see Eq. 
(8) ). Preferences between actions can be constructed by computing upper and lower 
previsions of differences between utility functions, as outlined in Section 4. Other ways 
of using belief functions in decision making are described in [ 85,921. 

Conclusion 

Belief functions are a special type of coherent lower probability. They can model 
various types of partial ignorance and limited or conflicting evidence. Because they can 
be represented in terms of a probability mass function m, belief functions appear to be 
mathematically and computationally simpler, in some ways, than the general class of 
coherent lower previsions. Computationally efficient methods have been developed for 
combining and propagating belief functions. 

Belief functions can be assessed through multivalued mappings or in other ways, 
and the assessment strategies suggested by the theory are useful in many applications. 
(Shafer’s emphasis on constructing belief functions from simple evaluations of evidence 
is especially valuable.) However, many other assessment strategies produce coherent 
lower probabilities that are not belief functions. Some important types of uncertainty, 
e.g. judgements of probability in ordinary language, cannot be adequately modelled by 
belief functions. In fact belief functions are much less expressive than coherent lower 
previsions. The theory gives no justification for restricting attention to belief functions 
rather than coherent lower probabilities, or to lower probabilities rather than lower 
previsions. 


P Walley/ArtijEcial Intelligence 83 (1996) I-58 35 

The calculus of belief functions relies heavily on Dempster’s rule of combination. 
Dempster’s rule may be useful in some problems where it is supported by explicit 
judgements of conditional independence but it is unreliable in general and it can produce 
inferences that are intuitively inconsistent and formally incoherent. More attention needs 
to be given to the exact conditions under which Dempster’s rule is applicable. 

Apart from the fundamental role it gives to Dempster’s rule, the Dempster-Shafer 
theory appears to be broadly compatible with the theory of coherent lower previsions. 
Belief functions that are constructed through a multivalued mapping have a behavioural 
interpretation and can be regarded as coherent lower probabilities, and they could be 
combined and updated through the rules of natural extension rather than by Dempster’s 

rule. 

6. Possibility measures 

The the.ory of fuzzy sets was introduced by Zadeh [ 1161 for the purpose of mod- 
elling the imprecision and ambiguity of ordinary language. Uncertainty is often mea- 
sured in this theory by possibility measures, which are described in [ 16,17,46,115,118]. 
Concerning the use of possibility measures and fuzzy logic in expert systems, see 
[ 17,120,1:2 1 ] . Specific applications to expert systems include DIABETO [ 71, CADIAG- 
2 [ I 1, SP.HINX [ 261, TAIGER [ 241, SPII-2 [ 5 11, PULCINELLA [ 53,691. 

Zadeh [ 1211 argues that “classical probability is insufficiently expressive to cope 
with the multiplicity of kinds of uncertainty which one encounters in AI and, more 
particularly, in expert systems”. In particular, Bayesian probabilities are inadequate for 
dealing wj th vague (inexactly defined) events or predicates, such as “young” or “tall”, 

or with natural-language expressions of probability, such as “probably”. These two types 
of vaguenfess occur together in the expression “it is likely that Mary is young”. Many 
of the production rules in expert systems such as MYCIN are fuzzy in these ways. (See 
[ 22,120] for many examples.) If expert systems are to be widely used, it does seem 
necessary to provide mathematical models for vague expressions in natural language. 

Dejnitions 

Consider the expression “Mary is young”. According to Zadeh, this provides some 
informatia’n about Mary’s precise age X, and the information can be modelled by a 

possibility distribution function TX, defined on the set D of possible ages. The number 
VQ (w) 1ie:s between zero and one and is read as “the degree to which it is possible that 
Mary has precise age o, given that she is young”. It is assumed that sup{~x( w) : w E 
0) = 1, t:hat is, the function 7~ is normalised to have supremum value one. (Zadeh 
[ 1181 did not require this normalisation but it has been assumed by most later authors.) 
Zadeh identifies the function 7~ with the membership function for the fuzzy set “young” 
on 0. Thus a possibility distribution function is a type of membership function, and this 
enables th,e basic theory of fuzzy sets to be carried over to possibility distributions. 

A possibility measure T is defined for all subsets A of L? by v(A) = sup{rx (w) : w E 
A}, with V( 8) = 0. It follows that +7r has the properties (for all subsets A and B) : 0 < 


36 P Walley/Art@cial Intelligence 83 (1996) l-58 

n(A) < 1, ~(0) = 1, max{rr(A),7r(AC)} = 1, and ?T(A UB) = max{r(A),T(B)}. 
In general rr( A n B) < min{lr( A), T(B)} but there is equality in the important special 
case where A and B are “non-interactive” events [ 1171. Non-interaction appears to 
correspond to independence in probability theory. A possibility distribution function is 
analogous to a probability mass function or density function, and a possibility measure 
is analogous to a probability measure. 

Interpretation 

Thus Zadeh translates natural-language expressions into the mathematical formalism 
of possibility measures. But how should we interpret these measures? Zadeh clearly 
wishes to distinguish degrees of possibility from degrees of probability: “A basic as- 
sumption which underlies our approach to approximate reasoning is that the imprecision 
which is intrinsic in natural languages is, in the main, possibilistic rather than probabilis- 
tic in nature. . . . possibility relates to our perception of the degree of feasibility or ease 
of attainment, whereas probability is associated with the degree of belief, likelihood, 
frequency, or proportion.” [ 1191 

It is widely recognised that possibility is distinct from probability and this distinction is 
central to all theories of probability, e.g. [ 281. Degrees of probability can be interpreted 
as betting rates or as relative frequencies, but it is less clear that it is meaningful to talk 
of “degrees of possibility”. Zadeh [ 1181 refers to expressions such as “slightly possible” 
and “quite possible” but admits that these usually indicate degrees of probability rather 
than degrees of possibility. 

Zadeh’s explication of degrees of possibility [ 118,119], and the example he gives to 
illustrate it, takes them to be measures of how easy or feasible it is to perform an action. 
This is somewhat different from the way that degrees of possibility are used in fuzzy 
logic, where they seem to measure how plausible it is that a proposition is true. Given 
the information “Mary is young”, for example, ~~(20) might measure how plausible 
it is that Mary is aged 20, but it is difficult to understand it as a measure of “how 

easy” or “how feasible” it is for Mary to be aged 20. The general usage is consistent 
with the following interpretation of degrees of possibility as upper probabilities, which 
are sometimes called “degrees of plausibility”. Various other interpretations of degrees 

of possibility have been suggested [ 16,171, but I do not know of any that justifies the 
usual min-max rules of combination. 

It is essential to have a clear interpretation of “degree of possibility” for the reasons 
mentioned earlier. If we do not understand the meaning of the numbers Q(W) then it 
will be difficult to assess them in practical problems, to understand conclusions that are 
expressed in terms of them, and to justify the calculus that is used to manipulate them. 

Interpretation as upper probability 

In fact, as suggested by Giles [ 32,331, possibility measures can be given a behavioural 
interpretation as upper probabilities. That is, we regard a possibility measure 7r as an 
upper probability measure, H(A) = T(A) = sup{7rx(w): w E A}, and interpret T(A) 
as a marginal acceptable betting rate for betting against A. The corresponding lower 


I? Walley/Artifcial Intelligence 83 (1996) l-58 37 

probabilities are defined by P(A) = 1 - sup{~(w): w E AC} = 1 - r(AC). The 
lower probabilities are sometimes called necessity measures [ 17,461. They have the 
properties (for all subsets A, B of 0): F(A U B) = max{P(A) ,B( B)}, P(A II B) = 
min{P(A),P(B)}, P(A) 6 F(A), min{P(A),P(AC)} = 0, and either F’(A) = 0 or 
F(A) = 1. 

Suppose, for example, that A is the event that Mary’s age is at least 30 years. Given 
the information “Mary is young”, we might assess r(A) = 0.4 and V( AC) = 1, so that 
p(A) = 0.4 and P(A) = 0. The behavioural interpretation is that we would bet against 
A at odds of 3 to 2 against, but we would not bet on A at any finite odds. (As noted 
in Section 4, a more general behavioural interpretation could be given in terms of the 

implications of possibility measures for decision making. Decisions would be made, as 
in Section 4, by computing upper and lower previsions of differences between utility 
functions, using Eqs. ( 13) and ( 14) below.) 

All possibility measures are coherent upper probabilities. One way to see that is 
to construct their natural extensions to lower and upper previsions. If the possibility 
measure has possibility distribution IQ, its natural extensions are (for any gamble Z 

defined on a) 

1 

p(Z) = J inf{Z(w): TX(O) > u}du 
0 

sup Z 

=supz - J sup{~xW: Z(w) < y)dy 
infZ 

(13) 

1 

F(Z) = 
J 

sup{Z(o): Q(W) 3 u}du 

0 

SUD z 

=infZ + J SUP{~X(W): Z(w) > y)dy (14) 
infZ 

where inf Z and sup Z denote the infimum and supremum values of Z(o) over all 
w E 0. The second versions of each formula follow from Eqs. (8). See [ 102, Lemma 
I] for a derivation of these formulas. 

From the first expression for ZJ Z), it is easy to verify that the lower prevision p 
satisfies axioms (Pl)-(P3) of Section 4 and is therefore coherent. Then verify that its 
restriction:5 to upper and lower probabilities are defined by P(A) = sup{rx( w): w E 
A} and P(A) = 1 - sup{~(w): w E AC}, hence these upper and lower probabilities 
are coherent. (Alternatively, verify that p is 2-monotone and use the result that all 
2-monotone lower probabilities are coherent.) 

There are great advantages in adopting this interpretation of possibility measures, 
especially as the theory of coherent upper and lower previsions could then be used 
to guide the assessment of possibility distributions and to derive rules for combining 
them. But note that possibility measures are a very special type of coherent upper 


38 I? Walley/Artijicial Intelligence 83 (1996) I-58 

probability. Their characteristic property P( AUB) = max{P( A), p( B)} is not necessary 
for coherence, e.g. if A and B are logically independent events and you assess F(A) = 
p(B) = i, the natural extension is P(A U B) = 1, and any assessment of P(A U B) 

between $ and 1 would be coherent with the initial assessments, whereas p( A U B) = i 
is necessary for P to be a possibility measure. 

For finite 0, a possibility measure r is just a consonant plausibility function in the 
sense of Shafer [ 711, and the corresponding lower probability p(A) = 1 - 7r( AC) is a 
special type of belief function, characterised by the property that the sets B for which 
m(B) > 0 form a nested sequence [46]. Indeed it seems more appropriate to call 
z-(A) a degree of plausibility rather than a degree of possibility. Event A is more or 
less plausible according to the amount of evidence pointing against A, e.g. if there is 
no evidence against A then A is fully plausible, T(A) = 1, and there is no reason to 
bet against A at any odds. We could also interpret F’(A) = 1 - r( AC) as a “degree 
of potential surprise” that we would experience if A failed to occur [54,70], or as a 
“degree of provability” of A [ 121. The uncertainty measures proposed by Shackle [ 701 
and Cohen [ 121 appear to be mathematically equivalent to possibility measures and thus 
they are special types of coherent lower probabilities. 

Imprecision 

Possibility measures can be used to model some types of imprecise or partial infor- 
mation. For example, complete ignorance about a variable w can be modelled by the 
possibility distribution Q(W) = 1 for all w in 0, which corresponds to the vucuous 
upper and lower probabilities. The degree of imprecision concerning an event A can 
be measured in general by p(A) - I’( A) = T(A) + T( AC) - 1 = min{rr(A), r( AC)}. 
Some natural-language judgements, such as “Mary is young” and the variants discussed 

in the following subsections, can be modelled by possibility measures. 
However, first-order possibility measures do not seem to be sufficiently flexible to 

model many common types of uncertainty. The football example and other examples 

involving natural-language judgements of uncertainty cannot be adequately modelled by 
belief functions, and certainly not by first-order possibility measures which correspond 

to a special type of belief function. (Second-order possibility measures, considered 
later, can be used to model natural-language judgements of uncertainty, but they are 
considerably more complicated than first-order measures.) Most examples of multival- 
ued mappings and belief functions, such as the e-contamination model in Section 5, 
involve coherent upper probabilities that are not possibility measures. Nor can (first- 
order) possibility measures model precise probability assessments. Bayesian probability 
measures are a special type of belief function or coherent lower probability, but not 
a special type of possibility measure. The upper and lower probabilities defined by a 
non-degenerate possibility distribution are always imprecise and usually very imprecise, 
i.e. P(A) - l’(A) is large for many events A. 

Example 7 (ModeDing vague predicates). Zadeh [ 1211 argues that Bayesian probabil- 
ities are unable to model judgements like “Mary is young” or “it is likely that Mary is 
young”. He shows how these judgements can be modelled by possibility distributions, 


t? Walley/Art@cial Intelligence 83 (1996) l-58 39 

although he does not give numerical assessments of the required distributions. I want 
to indicate how these judgements can be modelled using the theory of coherent lower 
previsions and the extent to which this is consistent with fuzzy reasoning. Other models 
for the same judgements are proposed in [ 11,181. 

Consid’er the proposition “Mary is young”. This provides some (but not much) infor- 
mation about Mary’s age. We could model our uncertainty about Mary’s age, given the 
information that she is young, by a coherent lower prevision. There are many possible 
ways of constructing a lower prevision. 

One way, which seems suitable for relating terms such as “young”, “old”, “tall” and 

“rich” to an appropriate numerical scale, is through assessments of upper and lower 
distribution functions [ 991. 

Let X denote Mary’s precise age. The upper and lower distribution functions of X 
are defined by F(w) = P(X < w) and F(w) = p(X 6 o) for all w > 0. Given 
only that Mary is young, it is entirely plausible that X < w for any specific w, so 
it is natural to take F(w) = 1 for all w. (This is a vacuous assessment, so F could 
be ignored altogether.) But “Mary is young” does provide some evidence that she is 

under 30, and strong evidence that she is under 40. On this basis, one might assess 
F(w) = 0 for 0 < w < 15, F(w) = (w - 15)/25 for 15 < o < 40, F(w) = 1 for 
w 3 40. (Alternatively, one might make a few qualitative judgements such as “probably 
X < 25” and “very probably X < 30” which can be translated into constraints on E. 
Note that the information provided by the assertion “Mary is young” is highly sensitive 
to the context in which it is made-is she “young to be walking” or “young to be 
retiring”?-- and it is debatable whether a useful model can be given that is independent 

of context:. We must assume, at least, that Mary is human!) 
A coherent lower prevision is then constructed by natural extension of these judge- 

ments. For any set A of possible ages (positive real numbers), we obtain upper and 
lower probabilities p(A) = sup{ 1 - E(w) : w E A}, p(A) = inf{& w) : w E AC}. This 
probability model is highly imprecise, as one would expect. 

Here the upper probability P is actually a possibility measure, with possibility dis- 

tribution Function 7~ (o) = 1 -F(w) = P(X > w), so 7rx(w> = 1 if 0 < w 6 15, 
Q(W) = (40-w)/25 if 15 < w < 40, and TX(W) = 0 if w > 40. In this case rrx(w), 

the degree to which it is possible that Mary has precise age w, can be identified with the 
upper probability that Mary’s age exceeds w, and the analysis based on natural extension 
of the lower distribution function F agrees with the analysis based on the possibility 
distribution rx; both generate the same possibility measure i? In general, when both the 
upper and lower distribution functions are non-vacuous, the upper probability produced 
by natural extension will not be a possibility measure. 

Conditional possibilities 

Suppose that unconditional upper probabilities are defined through a possibility dis- 
tribution :r by P(A) = sup{7r(o): w E A} and we wish to construct conditional upper 
probabilities P(. 1 B). Assume that r(w) > 0 for some w in B, so that F(‘(B) > 0. 
(Otherwise there is no useful information in 7r from which to construct P(. 1 B) .) 
The conditional probabilities can be constructed by natural extension, using Eq. (5). 


40 l? Walley/Art$cial Intelligence 83 (1996) 1-58 

Because the corresponding lower probabilities are 2-monotone, 

F(A 1 B)=- 
F(A n B) 

P(AnB) +p(ACnB) 

sup{~( w) : w E A r-7 B} 

=sup{a(w): w cAnB}+l -sup{~(w): we AUP} 

= sup{~( w 1 B) : o E A} 

where the conditional possibility distribution IT(. 1 B) is defined by 

(15) 

dw) 
T(W ( B) = T(W) + 1 - max{7r(w),/3}’ 

if w E B and n-(w) > 0, 
(16) 

0, if w E BC or T(W) = 0, 

and p =P(BC). 
It can be verified that sup{~( w 1 B): w E L?} = 1 so that T(. 1 B) is a possibility 

distribution. This shows that if the unconditional upper probability P is a possibility 
measure then so is the conditional upper probability p(. 1 B) defined by natural exten- 
sion, provided p(B) > 0. Thus conditioning a possibility measure by natural extension 
produces another possibility measure. 

Example 8 (Examples of conditioning). Consider the model for the judgement “Mary 
is young”, characterised by the possibility distribution T(W) = 1, if 0 < o < 15, 
z-(w) = (40 - w) /25 if 15 < w < 40, T(W) = 0 if w 2 40. Suppose we learn the 

additional information that Mary’s age belongs to a specified set of real numbers, B. 
We can update upper and lower probabilities concerning Mary’s age simply by updating 
the possibility distribution T to T(. 1 B). 

First let B be the event that Mary is no older than 30 years. Using Eq. (16) we find 
that T(W / B) = r(w) if 0 < w < 30, T(W 1 B) = 0 if w > 30. Here T is updated 
simply by truncating it at age 30; the plausibility of ages below 30 does not change. 

As a second example, if B is the event that Mary’s age is not between 20 and 30 
years, we find that TT( w 1 B) = n-(w) if 0 < w < 20, n-( w I B) = 0 if 20 < w < 30 or 
w 3 40, and T( w 1 B) = (40 - w) /(45 - w) if 30 < o < 40. Here the plausibility of 
ages below 20 does not change, but the new information makes ages between 30 and 
40 more plausible than before. 

Finally, notice that if B does not contain the entire interval (0,151 then the updated 
probabilities will be vacuous, in the sense that a(o I B) = 1 if w E B n (0.40) and 
T(W I B) = 0 otherwise. If we learn, for example, that Mary is at least 5 years old 
then T( w I B) = 1 if 5 < w < 40 and T(W 1 B) = 0 otherwise; all ages between 5 
and 40 become fully plausible. More generally, if BC contains a state w that is fully 
plausible (i.e. T(W) = 1) then F(BC) = 1 and hence the probabilities conditional on 
B are vacuous, in that rr(w 1 B) = 1 if w E B and T(W) > 0, while T(W I B) = 0 
otherwise. 

In this last case, where P(B) = 0, the conditional probabilities defined by natural 
extension are essentially vacuous. This is generally the case when natural extension 


P. Walley/Arti@ial Intelligence 83 (1996) l-58 41 

is used to condition on an event with lower probability zero. It can be understood 
through the consistency principle stated near the end of Section 4. In the example let A 
denote “Mary’s age is less than 30” and let B denote “Mary’s age is at least 5”. Then 
P(B) = 0, and it would be consistent with the initial possibility measure to make a 

further judgement that P(A rl B) = 0 precisely, which would produce P(A 1 B) = 0. To 

be consistent with this we need E(A 1 B) = 0. The same argument applies when A is 
any proper subset of the interval [ 5,40). 

So natural extension cannot produce any nontrivial inferences when the conditioning 

event B has P(B) = 0. (Unfortunately, for possibility measures there are many such 
events B. In fact, for every event B, either E(B) = 0 or E( BC) = 0.) The problem is 
that the initial model is too imprecise to determine conditional probabilities. One way 
to avoid this is to specify a more precise model, in particular one in which P(B) > 0. 
Of course the corresponding upper probability may no longer be a possibility measure. 

Dempster’s rule of conditioning 

Another option is to use a different rule of conditioning when p(B) = 0. When applied 
to a possibility measure ?r for which F(B) > 0, Dempster’s rule of conditioning gives 
the conditional upper probabilities FD(A 1 B) = sup{?s)(w 1 B): w E A}, where the 
conditional possibility distribution fl(. I B) is defined by 

,&u 1 B) = 
if o E B, 

if o E BC, 
(17) 

and k = sup{~-( w): w E B} = p(B). The conditional upper probability FD(. I B) is 
a possibility measure. Writing rrE(. 1 B) and FE(. I B) for the conditional possibility 
distribution and conditional upper probability defined by natural extension, we have 

#(w I 13) < T~(OJ I B) for all w, and PD(A I B) < F&A I B) for all sets A. 
Thus the conditional probabilities determined by Dempster’s rule are always at least as 

precise as those defined by natural extension. 
Consider the three examples of conditioning the information about Mary’s age. In the 

first example, where B is the interval (0,301, rD (. I B) = c#(. I B) and the two rules 
produce the same solution. In the second example, where B = [ 20,30]‘, T’ (w 1 B) 
agrees with ,rrE( w 1 B) except when 30 < o < 40, in which case &‘( w I B) = T(W) is 
smaller than 7rE ( w I B) . The biggest difference between the two rules occurs in the third 
example, where B = [ 5, cm): wD (w 1 B) agrees with the initial degree of possibility 
n-(w) if 5 < w < 40, whereas rE (w I B) = 1. In this case conditioning by Dempster’s 
rule preserves the information in the initial possibility distribution but natural extension 
does not, and Dempster’s rule is arguably more reasonable. (In fact &‘(w I B) = T(W) 
for all w in B in all three examples, because k = p(B) = 1 in each case.) 

Dempster’s rule for conditioning possibility distributions (17) can also be derived 
within possibility theory. The information that event B has occurred is represented 
by a possibility distribution rf which is simply the indicator function of B. This is 
combined with the initial possibility distribution r using the minimum rule, producing 
the conditional possibility distribution ~(w I B) 0: min{r(w), n’(w)}, so ~(0 I B) 


42 P. Walley/Arti$cial Intelligence 83 (1996) I-58 

c( z-(w) if w E B and V(W 1 B) = 0 if w E BC. The proportionality constant is 
determined by the fact that r(. 1 B) is a possibility distribution, and the solution agrees 
with Dempster’s rule ( 17). A different rule for conditioning possibility distributions is 

proposed in [ 171. 

Example 9 (Probabilistic quali$cation). Now consider the qualified judgement “it is 
likely that Mary is young”. Zadeh [ 1211 treats this as information about an underlying 
probability density function for Mary’s age, whereas I regard it simply as information 
about Mary’s age. Let B denote the event that Mary is young, according to the criteria 
used by the speaker. The conditional probabilities P(A 1 B) and Z’( A 1 B) can be 
identified with the (unconditional) probabilities based on the information “Mary is 
young” which were defined previously. The judgement that B is likely is translated as 
p(B) 2 i. The natural extensions of these assessments are 

P(A) =iP(A 1 B) = iinf{F(w): w E AC} = i - $sup{r(w): OJ E AC}, 

P(A) = i + iP(A 1 B) = 1 - i inf{F(w): w E A} = i + $ sup{rr(w): w E A}, 

for any set A of possible ages. 
Again the upper probability is a possibility measure, with possibility distribution 

function 7~( w) = 1 - iF( w). The effect of the qualification “it is likely that” is simply 

to reduce the lower distribution function I; by a factor of $. (Equivalently, replace r by 

i + ir.) Although it produces a possibility distribution for Mary’s age X, this analysis 
is much simpler than that of Zadeh [ 1211, who requires a possibility distribution to be 
defined on the space of all probability density functions for X. 

More generally, suppose that B(. I B) and p(. I B) are upper and lower probabilities 
based on information B, and this information is qualified in some way that can be 

translated as J’(B) 2 /?, e.g. by asserting that B is “very probable” or by using one of 
the other natural-language expressions listed later in this section. Then upper and lower 
probabilities based on the qualified information can be computed by e(A) = pl’( A I B) 
and p(A) = pP( A 1 B) + 1 - p. This is a simple method of discounting information. 

To see that such an analysis will not always produce a possibility measure, consider 
the upper probabilities generated by natural extension from the two judgements “it is 
likely that Mary is young” and “it is likely that Mary is older than 10 years”. Let A 
and B denote the events “Mary is younger than 10 years” and “Mary is older than 40 
years”. Then ‘ir( A) = P(B) = i but P( A U B) = 1 > max{P( A), p( B)}, so H is not 
a possibility measure. 

Cheeseman [ 1 l] suggests several Bayesian models for the uncertainty about Mary’s 
age, given the information “Mary is probably young”. He would first model the uncer- 
tainty given “Mary is young” by specifying a probability density function for Mary’s 
age, and similarly for the uncertainty given “Mary is not young”. To model “proba- 
bly” he would construct a second-order probability density on the unit interval. This is 
analogous to Zadeh’s second-order possibility distribution for “likely”. 

Thus Cheeseman’s analysis requires second-order assessments of similar complexity 
to Zadeh’s. However Cheeseman claims that “For the accuracy appropriate to this type 


P Walley/Artifcial Intelligence 83 (1996) l-58 43 

of linguistic information, it is sufficient to extract a single estimate (probability)” as an 
approximation to the second-order density, and he chooses the precise probability 0.9 
(the mean of his second-order density) to represent “probably”. So Cheeseman, given 
only the information “probably A”, would be willing to bet on A at odds of 9 to 1 
on! Both the second-order density and the precise value 0.9 have strong implications 
concerning betting rates and other decisions, and they are therefore inadequate models 
for an imprecise term like “probably”. Cheeseman’s overall model for the uncertainty 

about Mary’s age is a precise probability density function, which is similarly inadequate 
to model the vagueness of the information on which it is based. 

Vague probability judgements (second-order possibility measures) 

Now consider the translation of vague probability judgements such as “event A is 
likely” or “A is very likely” into possibility distributions. Zadeh [ 117,121] treats such 
a judgement as information about the precise probability (p) of event A, and models it 
through a possibility distribution function 7rt, defined on the probability interval [0, 11. 
For instance, he models “it is likely that A” by a possibility distribution function rP(p) 
that is zero when 0 < p < i and increases nonlinearly on i < p 6 1 to a maximum 
7rP( I) = 1. He models “it is very likely that A” by squaring the function rt,. Watson, 
Weiss and Donnell [ 1091 model “pretty likely” by a possibility distribution function 
that is zero on the interval 0 < p 6 0.55 and roughly constant on 0.65 < p < 0.9. More 
generally, a judgement such as “it is likely that Mary is young” would be translated by 
assigning a degree of possibility m(f) to every conceivable probability density function 
f for Mary’s age. 

It is strange that the theory of possibility measures, apparently designed to model the 
imprecisiamn in ordinary-language judgements of uncertainty, needs to refer to underlying 

probabilities that are precise. The number r,.,(p) measures the degree of possibility that 
the true probability of A is precisely p (given the judgement “it is likely that A”), 
i.e. it measures second-order uncertainty about a first-order probability. I have already 
remarked that it is unclear what is meant by “degree of possibility”, but there is a 
more basic difficulty here: it is not even clear what is meant by “the true probability of 

A”, and why this should be assumed to have the properties of a Bayesian probability 
measure. 

The most natural interpretation is that the “true probability of A” is the subjective 
probability assigned to A by the speaker, who provides only partial information about it 
when he asserts “it is likely that A”. But there is no reason to suppose that the speaker 
has made (or could make) a precise assessment of this probability-the vagueness of 
his assertion suggests just the opposite. (This issue is discussed at length in [99] .) 
Similarly, there will rarely be a “true probability” in any objective sense. If we do not 
understand what is meant by “the true probability of A”, how can we hope to assess the 
degree of possibility that p is the true probability? 

Compare Zadeh’s translation of “it is likely that A” with the behavioural translation 
of “probably A” that was suggested in Section 4. (Zadeh [ 1171 regards “likely” and 
“probable” as “more or less synonymous”.) The behavioural translation is that the 
speaker is willing to accept an even-money bet on A, which is modelled by P(A) 2 i. 


I! Walley/Art@cial Intelligence 83 (1996) I-58 45 

regarded as “non-interactive” and a joint distribution formed by taking minima. (This is 
questionable because the first judgement effectively restricts the range of possible values 
for w - d. But this kind of “interaction” between judgements is very common, and it 
is not clear how possibility theorists would combine the possibility distributions if not 
by the minimum rule.) Thus, assuming w + d + 1 = 1 so that (w, d, Z) is a probability 
distribution, 

~(w,d,Z)~min{~l(w),~~(W-d),~~(d-Z)} 

cx max{O, min{ 1 - 2w, w - d, d - I}}. 

It is necessary to renormalise the right-hand expression to have maximum value one, 
and this yields the joint possibility distribution 

7r( w, d, Z) = 9 max{O, min{ 1 - 2w, w - d, d - I}}. 

The maximum possibility is achieved by w = 8, d = f, I= $. 
Compare this with the lower prevision constructed in Section 4, which is the lower 

envelope of the set M of all probability distributions (w, d, 2) that are consistent with 
the three _iudgements, i.e. satisfy w < 1, w 2 d, d b 1. The joint possibility rTT( w, d, 1) 
is positive when (w, d, 1) lies in the interior of M, it increases with distance from the 
boundary of M, and it is zero otherwise. 

Marginal possibility distributions can be computed from r by maximisation. For 
example, after some calculations the marginal possibility distribution for 2 is found to be 
rrL(I) =max{7r(w,d,Z): 0 < w < 1, 0 < d < 1, w+d = 1-Z) =9max{O,min{$, i- 

1}}, which is unimodal with mode m = f . Hence, using Eqs. ( 19), this model generates 

upper and lower probabilities 

1 

P(L) =m+ 
s 

rL(y)dy= $ =0.2 

,?1 

78, 

p(L) =m - 
I 

rTTL(y) dy = ; =O.ll 

0 

1. 

These probabilities are more precise than the values P(L) = 5 and p(L) = 0 generated 

by the model in Section 4; here P(L) -c(L) = & r~(y) dy = i, compared with 3 for 
the earlier model. 

Calculus 

As seen in the preceding example, the rules for combining and manipulating possi- 

bility distributions involve operations of forming maxima and minima. These rules are 
derived from the basic rules of fuzzy sets proposed by Zadeh [ 1161. Without a definite 
interpretation of possibility measures the rules appear quite arbitrary. But if possibility 
measures are interpreted as coherent upper probabilities, as suggested earlier, the rules 


46 P: Walley/Artificial Intelligence 83 (1996) 1-58 

need to be evaluated according to whether they preserve coherence, in the sense that, 
after applying the rules, the overall upper probability model is coherent. In some spe- 
cial cases, such as (b) below, the rules agree with natural extension and then they do 
preserve coherence. This gives some justification for these rules. 

But other rules, such as (c), disagreee with natural extension in general. It needs to 
be investigated whether such rules can produce upper probabilities that are incoherent or 
incur sure loss. I am currently studying these questions and the results will be reported 
elsewhere. 

The most important rules are as follows. 

(a) Given a joint possibility distribution 7rx,r for two variables X and Y, a marginal 
possibility distribution for X is defined by Q(X) = sup{7~x,r(x, y): y E L$} 

where 0, = {y: (x, y) E 0) and D is the joint possibility space. This rule 
is essentially built into the definition of a possibility measure. It does preserve 

coherence of the corresponding upper probabilities. 
(b) Given two marginal possibility distributions TX and rr~~y, where X and Y are logi- 

cally independent variables, a joint possibility distribution for X and Y is defined 
by ?TX,J( x, y) = min{rx(x), 7ry( y)}. This implies, in terms of the correspond- 
ing upper probabilities, that Px,r( A x B) = min{Fx( A), &( B)} for product 
sets A x B, and this rule agrees with natural extension of the marginal upper 
probabilities to a joint upper probability. However the two rules can disagree for 
sets that are not products. 

(c) Given two possibility distributions ~1 and ~2, concerning the same unknown 
w but based on “non-interactive” bodies of evidence, a combined possibility 
distribution is defined by 7~( w) c( min{rt (w), 772( to)}, where the normalising 
factor is chosen so that n has supremum value 1. This rule was used to combine 
judgements in the football example. Rule (b) can be regarded as a special case 
of (c) with w = (x, y). Another special case is the rule of conditioning discussed 
earlier, where ~2 is the indicator function of the conditioning event, which agrees 
with Dempster’s rule of conditioning ( 17). 

The more general rule (c) appears to have a similar role in combining possibility 
distributions to Dempster’s rule for combining belief functions. It is used in expert 
systems to combine information from different sources. However the rule disagrees with 
natural extension in general (although one special case of agreement was noted in (b)), 
and it need not produce coherent inferences. The next example shows that rule (c) and 
Dempster’s rule of combination can produce quite different answers. 

Example 11 (Two unreliable witnesses). Consider the example in Section 5 of two 
unreliable witnesses. The first witness has credibility at and reports that event Ct 
occurred. This can be modelled by a marginal possibility distribution 7~1 that assigns 
~1 (Cl) = 1 and ~1 (C-f) = 1 - “1. This generates the upper and lower probabilities 
assumed in Section 5, PI (Cl ) = 1 and et (Ct ) = cyt . Similarly the information provided 
by the second witness is modelled by a possibility distribution ~2 (C2) = 1 and 7r2( Cg) = 
1 -cQ. 

If we combine the two marginal possibility distributions by the minimum rule, we 
obtain degrees of possibility 1, 1 - LYZ, 1 - LYI and 1 - max{Lyt , a~} for the elementary 


P. Walley/Artijicial Inrelligence 83 (1996) l-58 47 

events CI flC2, CI nC;, CfflC2 and CynCi. This generates upper and lower probabilities 
p( Cl n C;!) = 1 and ~(CI fl C2) = min{cul, CQ}. The lower probability is quite different 
from the value alay2 which is given by all three models in Section 5, based on different 
ways of formalising the judgement that the two reports are independent. There is also 

a discrepancy between the values of F(C,” fl Ci), which are 1 - max{crl , a~} for this 
model and ( 1 - CYI ) ( 1 - a~> for the three earlier models. 

In this example the combined possibility distribution seems less satisfactory than 
the three earlier models. When crl = cy:! = $, for example, the earlier models give 

p( Cl n Cz) = i whereas the rule for combining possibility measures gives p( Cl n CT) = 
4. The latter value seems unreasonable; one would expect p( Cl n C2) to be somewhat 

less than the marginal lower probabilities a and E( CZ), which are each i, unless 
there is evidence that the two reports are correlated in a particular way. 

In fact, for any joint possibility measure, the corresponding lower probability must 

satisfy p(Cl n C;?) = min{P(CI),P(C2)} since this is a general property of necessity 
measures. But this property seems inappropriate when the two sources of information are 
judged to be independent. So it does not seem possible to adequately model independence 
judgements using possibility measures. 

Computation 

The computation of inferences and decisions from possibility distributions requires, 
in general, the solution of a nonlinear programming problem. (Compare with the com- 
putation of natural extension suggested in Section 4, which involves only linear pro- 
gramming.) For example, decisions are made from second-order possibility distributions 
by computing “fuzzy expectations”, i.e. degrees of possibility for all possible expected 
values x of a random variable X, by maximising n-(P) over all probability distributions 
P which satisfy P(X) = x. Because rr( P) will generally be a nonlinear function of the 

probability masses, this entails a nonlinear programming problem-in fact, a separate 
problem for each possible value of x! (See [ 109,117,121] for more details.) Similarly 
the computation of a marginal possibility distribution involves, in general, many non- 

linear programs. So computations will often be difficult, despite the apparent simplicity 
of the calculus. The computations are generally easier for first-order possibility distri- 
butions than for second-order distributions because the optimisations are generally over 
lower-diml:nsional spaces. 

Assessment 

How di:fficult is it to make the assessments required for a fuzzy analysis? The aim 
is to allow users of the system (or domain experts) to express their uncertainty in 
natural language, in whatever forms are most convenient. This is valuable, as it makes 

the task OF assessing uncertainty relatively simple for the system user. The difficulties 
are passed on to the experts on fuzzy logic, who must translate the natural-language 
judgements into possibility distributions. For example, they must translate “it is likely 
that A” into a function rr defined on the probability interval [0, l] . More complex 


48 I? Walley/Art@cial Intelligence 83 (1996) 1-58 

judgements of uncertainty concerning 0 must be translated into a function 71 defined 
on the space of all probability distributions on 0. A great deal of input is needed to 
specify these functions, and there seems to be a great deal of arbitrariness in selecting 
them. Again it is generally easier to assess a first-order possibility distribution than a 
second-order distribution because the former is usually defined on a lower-dimensional 
space. 

There seems to be little guidance in the literature on possibility theory about how 

to select the required functions. (But see [ 17, p. 191, for some suggestions.) Specific 
assessments of possibility distributions are sometimes given to illustrate the methodology, 
but these appear quite arbitrary and no justification is offered. For example, Zadeh 
[ 1171 specifies two quite different possibility distributions to model the judgement 
“likely” in successive examples: his Example 1.1 has 40.8) = 0.9 while Example 1.2 
has 7r(O.8) = 0.5. Another translation, shown graphically in his Fig. 1, seems to be 
different again. No reasons are given to support any of these assessments, and it is 
hard to see how they could be supported until the meaning of the numbers m(p) is 

clarified. 
In practice one might use standard translations for common terms such as “likely”, 

“very likely” or “more likely than”. That is, all instances of the term “very likely” 
would be translated into the same possibility distribution. The choice of this standard 
distribution may still be somewhat arbitrary but no further input would be needed. 

The danger of using standard translations is that different people seem to use expres- 

sions like “very likely” in different ways and usage varies with context [ 3,106]. Ideally, 
one would like to use a different translation for each person and context. That would 
be impracticable in most cases, as it would require too much input from the person. 
However one could require the standard translation of “very likely” to encompass the 
meanings intended by most speakers in most contexts. We can illustrate the idea as 
follows, using the behavioural interpretation of upper and lower probabilities. 

Standard translations into lower probabilities 

Consider the expression “probably A”. Most people who use this expression would be 
willing to bet on A at even stakes. Hence they would accept the behavioural translation 
p(A) > 0.5. We could, in principle, determine a lower probability E(A) for person 
i by finding the least favourable odds at which he is willing to bet on A. Provided 
&(A) 2 0.5 for all persons i, the behavioural translation l’(A) 2 0.5 is acceptable 
to all, and we may say that it encompasses the meanings of “probably A” for all the 
persons. It could then be used as a standard (“cautious” or “default”) translation of 
“probably A” that is acceptable in the great majority of problems. In any particular 
problem we would try to (a) check with the person that he accepts the behavioural 
translation of his judgement, and (b) encourage him to make this more precise, e.g. if 
he is willing to bet on A at odds of 3 to 2 on then he will accept the more precise 
judgement P(A) > 0.6. If he was unwilling to do this, or if the checks could not be 
carried out, then the standard translation would be used. 

The standard translations should be based on empirical studies of the meanings of 
terms such as “probably”. It seems, for example, that most people do not apply the term 


P. Walley/Art@ial Intelligence 83 (1996) 1-58 49 

“probable” to events with very high probability, so that one might include the constraint 
P(A) 6 0.9, as well as l’(A) 2 0.5, in the standard translation of “A is probable”. In 
the light of empirical studies such as [ 3,45,106], I suggest the following translations of 
common judgements in natural language. In all these expressions I take “probable” to be 
synonymous with “likely”. I identify negative expressions such as “A is improbable” with 
positive expressions such as “AC is probable”. The latter is translated into &( AC) > 0.5, 
which is equivalent to P(A) < 0.5. This translation is somewhat cautious, as there is 
evidence that most people would accept a stronger translation P(A) < 0.4. Many of the 
other translations could be strengthened considerably in suitable contexts. 

l A is extremely probable + l’(A) 2 0.98. 
l A has very high probability --) l’(A) 2 0.9. 
l A is highly probable -+ I’(A) > 0.85. 

l A is very probable + P(A) 2 0.75. 
l A has a very good chance + P(A) 2 0.65. 
l A is quite probable + P(A) 2 0.6. 
l A is probable [likely] + l’(A) > 0.5. 

l A is improbable [unlikely] --+ F(A) < 0.5. 
l A is somewhat unlikely [quite improbable] + P(A) < 0.4. 

l A is very unlikely --f P(A) < 0.25. 
l A has little chance --+ P(A) < 0.2. 
l A is highly improbable + P(A) < 0.15. 
l A has very low probability + p(A) < 0.1. 
l A is extremely unlikely -+ P(A) < 0.02. 
l A has a good chance -+ Z’(A) 2 0.4, P(A) < 0.85. 

l The probability of A is about (Y --+ Z’(A) 2 a - 0.1, P(A) 6 a + 0.1. 
l A is more probable than B + F’(A - B) 2 0. 

There is some degree of arbitrariness in choosing a single number (0.85) to translate 
an expression such as “highly probable”, but this is much less than the arbitrariness 
in choosjng a possibility distribution function r, i.e. a degree of possibility r(p) for 
every value of p between 0 and 1. Similar translations, from imprecise probabilities 
to lingui:stic expressions, could be used to make a system’s conclusions and reasoning 
more comprehensible to a user. 

Possibility theory and fuzzy logic have made a substantial contribution to our un- 
derstanding of uncertainty by drawing attention to the important problem of modelling 
natural-language judgements and suggesting possibility measures as suitable models. 
Provided second-order measures are allowed, possibility measures can mode1 a wide 
variety of uncertainty judgements, including imprecise judgements in natural language. 
Bayesian probabilities are inadequate to mode1 vague predicates and vague probability 
judgements, as in “Mary is probably young”, but it appears that upper and lower previ- 
sions may be adequate, especially as possibility measures can be regarded as a special 
type of coherent upper probability. 


SO f? Walley/Art$cial Intelligence 83 (1996) 1-58 

It is important to distinguish between first- and second-order possibility measures. 
First-order possibility measures, which are used to model the uncertainty generated by 
vague predicates like “young” and “tall”, can be interpreted as coherent upper probabili- 
ties. They are mathematically simpler than the other types of coherent upper probabilities 
as they can be defined in terms of possibility distributions, functions whose domain is 
LJ rather than the power set of a. This may simplify computations and assessment, 
e.g. possibility measures can sometimes be assessed through upper or lower distribution 

functions. First-order possibility measures are likely to be useful models in many expert 
systems where information is elicited in terms of vague predicates. 

The main defect of first-order possibility measures is that they cannot model many 

of the common types of uncertainty. In particular they cannot model natural-language 
judgements of uncertainty or precise probability assessments. Possibility measures are 
a very special type of upper probability (in fact they correspond to a special type of 
belief function), and upper probability is itself inadequate in many problems; upper and 
lower previsions are needed in general. 

Second-order possibility measures, defined on the set of all probability distributions, 
are much more expressive than first-order measures. They can model both precise and 
imprecise judgements of uncertainty. But they are much more complicated than first- 
order measures and they have some serious defects. Second-order models are difficult to 

interpret and assess and they seem overly complicated to model qualitative judgements. 
Theoretical papers on fuzzy logic tend to ignore the practical problem of assessing 

second-order functions, each defined on a space of probability distributions. Compu- 
tations, e.g. of marginal possibility distributions or fuzzy expected values, generally 
require nonlinear programming. 

The rules for combining possibility measures are simple (they involve operations 
of maximising and minimising), but they do not appear to have any compelling jus- 
tification. If possibility measures are interpreted as coherent upper probabilities then 
the rules can be compared with the rules of natural extension and we can investi- 
gate whether they preserve coherence. They appear to do so in some cases but not in 
general. 

7. Comparison and evaluation 

Most practical reasoning involves uncertainty. In expert systems, as in many other 
fields, it is frequently necessary to measure the uncertainty. What measure should we 
use? Four measures have been considered in this paper. Here is a summary of the extent 
to which they satisfy the criteria proposed in Section 2. 

Interpretation, calculus and consistency 

These criteria are satisfied only by the theories of Bayesian probability and coherent 
lower previsions, which start with a simple behavioural interpretation and use this to jus- 
tify principles of coherence and to derive rules for combining and updating probabilities. 
There is a general method, called natural extension, for computing new previsions from 


R Walky/Art~cial Intelligence 83 (1996) 1-58 51 

an arbitrary set of judgements. Natural extension can be used to make inferences and 
decisions. There are general methods for checking consistency of the initial judgements, 
and the rules of natural extension ensure that conclusions will be consistent with the 

judgements. 
Possibility theory and the Dempster-Shafer theory do not do so well on these cri- 

teria. These theories propose mathematical properties to characterise their uncertainty 
measures and simple rules for combining measures, but they fail to give any compelling 
justification for their properties and rules. Belief functions can be interpreted in terms of 
multivalued mappings but this supports Dempster’s rule of combination only when some 

strong assumptions of conditional independence are added. Both theories lack methods 
for checking the consistency of models and conclusions, and their rules can produce 
conclusions that are intuitively inconsistent with the initial model. 

Imprecision 

Bayesian probabilities cannot adequately model ignorance, imprecise or qualitative 
judgements of uncertainty, or vague predicates in natural language. The other measures 
can do so to some extent, but belief functions and first-order possibility measures are 
not sufficiently general to model common types of imprecise judgements. 

Assessment 

Insufficient guidance is given in these theories (especially possibility theory) about 
how to make assessments of uncertainty, although the theories of belief functions and 
lower previsions do take this problem seriously. All the theories seem to need judgements 
of independence or non-interaction, in multivariate problems, to reduce the number of 
assessments. Lower previsions and second-order possibility measures can model a wide 
variety of uncertainty judgements, including qualitative judgements in ordinary language, 
although assessments of second-order possibility distributions seem complicated and 
arbitrary. Assessment is onerous for Bayesians because they require precise assessments 
and a complete probability model; the other theories are more flexible. 

Computation 

For all the measures, computational feasibility will depend on the type and complexity 
of the model and the number of assessments. For lower previsions, the computation 
of inferences and decisions by natural extension involves linear programming. More 
work is needed to develop computationally efficient methods and tractable models, 
particularly based on conditional independence. Bayesian probabilities, belief functions 
and first-order possibility measures, as special types of lower or upper previsions, may 
be computationally simpler in some cases. (They are tractable in singly-connected belief 
networks, for example.) Computational methods are most highly developed for Bayesian 
models. !$econd-order possibility measures are less tractable than the other measures- 
computations involve nonlinear programming. 


52 P. Walley/Artifcial Intelligence 83 (1996) l-58 

8. Conclusion 

So what measures of uncertainty should be used in expert systems? I believe that 
Bayesian probabilities, upper and lower probabilities, belief functions and first-order 
possibility measures can all be useful in different types of problems, for instance when 
the information is in the form of extensive statistical data, bounds on probabilities, 
multivalued mappings or natural-language judgements respectively. 

All these measures can be useful in special types of problems, but none of them is 
adequate as a general model of uncertainty. For example, none of them can adequately 
model the three qualitative judgements in the football example. Bayesian probabilities 
are not sufficiently genera1 because, in many problems, information is scarce and judge- 
ments are imprecise. Belief functions and possibility measures are not sufficiently genera1 
because many examples involve lower probabilities that are not even 2-monotone. Up- 
per and lower probabilities are not sufficiently general because they do not uniquely 
determine upper and lower previsions and conditional probabilities; upper and lower 
previsions produce greater precision in inferences and decisions. I suggest that upper 
and lower previsions, which include the other measures as special cases, are sufficiently 
general to model the most important types of uncertainty. 

This raises the question: to what extent is the theory of coherent lower previsions 
compatible with the other theories? It is compatible with the Bayesian theory as the two 
theories have a similar behavioural interpretation and the rules of the Bayesian theory 
are special types of natural extension. So the theory of lower previsions can be regarded 
as a generalisation of the Bayesian theory; the two theories agree in the special case 
where all probability models are precise. The theory of Bayesian sensitivity analysis or 
“probability intervals” is also, to a large extent, compatible with the theory of lower 

previsions. 
Possibility theory and Dempster-Shafer theory appear to be less compatible with 

the theory of lower previsions. However I believe that the theory of lower previsions 
can incorporate some of the most useful features of these theories. In particular, one 
of the main contributions of the two theories has been to suggest some flexible and 
powerful methods for modelling particular types of partial information, notably through 
multivalued mappings and natural-language judgements. These can be used as methods 
for assessing lower previsions. 

The theories differ in their interpretation of uncertainty measures. The interpretation 

of belief functions and possibility measures is unsettled and controversial, but it seems 
to me that at least some of the interpretations that have been proposed for these mea- 
sures are consistent with the behavioural interpretation of lower and upper previsions. 
Two advantages of the behavioural interpretation are that it relates uncertainty measures 
to decisions and thereby explains how they can be used, and it leads to coherence 
principles which can be used to check consistency of models with conclusions. (Both 
features are lacking in Dempster-Shafer and possibility theory.) The behavioural inter- 
pretation is sufficiently genera1 to encompass multivalued mappings, inexact judgements 
and lower envelopes of Bayesian probability measures as sources of lower previsions, 
yet specific enough to support the principles of coherence and the rules of natural 
extension. 


I? Walley/Art@cial Intelligence 83 (1996) 1-58 53 

The theories have quite different rules for combining and updating uncertainties, and 
in this respect they do appear to be incompatible. Dempster’s rule of combination and 
the minimum rule for combining possibility distributions can, in some problems, produce 
upper and lower probability models that are incoherent, and in these cases the rules are 
not compatible with the theory of lower previsions. These rules are controversial. They 
can lead to intuitive inconsistencies as well as mathematical incoherence and they seem 
to be applicable only in a limited range of problems. If these rules were given a more 
restricted role in the Dempster-Shafer theory and in possibility theory then the three 
theories would be considerably more compatible. 

Natural extension is an alternative to Dempster’s rule and the minimum rule, and it is 
worth investigating other rules that preserve coherence but produce stronger conclusions 
than natural extension. This may be a way of developing the calculus of belief functions 
and possibility measures. The behavioural interpretation and principles of coherence can 
impose some much needed discipline on the theories of belief functions and possibility 
measures, without which these theories can produce inconsistencies. 

Of course the theory of lower previsions is itself undeveloped in some important 

repects. Further investigation is needed into the practical problems of modelling, as- 
sessment and computation. Particular problems are to determine how best to model 
independence judgements, and how to compute natural extensions and propagate lower 
previsions efficiently. It is also important to compare the four approaches in some real- 

istically complex expert systems. I hope that this paper will persuade some people to 

consider these problems. 

Acknowledgements 

I am grateful to Dr. Hitoshi Furuta for inviting me to lecture on this subject in Osaka 
in April 1990 and for suggesting this paper. Many people commented on an earlier 
version of the paper [ 1001 which was circulated in April 1991. In particular I must 
thank Terrence Fine, Frank Hampel and Philippe Smets for stimulating discussions, and 
Smets, Nit Wilson and George Klir for sending me copies of relevant papers. Nit Wilson 
contributed some exceptionally detailed and insightful comments on an earlier version 

and the final version has benefited greatly from his suggestions. 

References 

1 1 1 K.-P. Adlasnig and G. Kolarz, CADIAG-2: computer-assisted medical diagnosis using fuzzy subsets, 
in: M M. Gupta and E. Sanchez, eds., Approximate Reasoning in Decision Analysis (North-Holland, 
Amsterdam, 1982) 219-247. 

12 1 S.K. Andersen, K.G. Olesen, F.V. Jensen and E Jensen, HUGIN-a shell for building Bayesian belief 
universes for expert systems, in: Proceedings IJCAI-89, Detroit, MI (Morgan Kaufman% San Mateo, 
CA, 1989) 1080-1085. 

13 1 R. Beyth-Marom, How probable is probable? Numerical translations of verbal probability expressions, 
J. Forecasting 1 (1982) 257-269. 


54 P Walley/Artijicial Intelligence 83 (1996) I-58 

14 1 G. Biswas and T.S. Anand, Using the Dempster-Shafer scheme in a mixed-initiative expert system 
shell, in: L.N. Kanal, T.S. Levitt and J.F. Lemmer, eds., Uncertainty in Artificial Intelligence 3 (North- 
Holland, Amsterdam, 1989) 223-239. 

15 I G. Biswas, M. Oliff and R. Abramczyk, OASES (Operations Analysis Expert System): an application 

in fiberglass manufacturing, Int. J. Expert Sysf. Res. Appl. 1 (1988) 193-216. 
16 I B.C. Buchanan and E.H. Shortliffe, eds., Rule-Eased Expert Systems (Addison-Wesley, Reading, MA, 

1984). 

17 I J.-C. Buisson, H. Farreny and H. Prade, The development of a medical expert system and the treatment 
of imprecision in the framework of possibility theory, InjI Sci. 37 (1985) 21 l-226. 

18 I R. Buxton, Modelling uncertainty in expert systems, Int. J. Man-Mach. Stud. 31 ( 1989) 415-476. 
19 I L.M. De Campos, M.T. Lamata and S. Moral, The concept of conditional fuzzy measure, fnr. J. Intell. 

Syst. 5 ( 1990) 237-246. 
I IO I J.E. Cano, S. Moral and J.F. Verdegay, Propagation of convex sets of probabilities in directed acyclic 

networks, in: Proceedings Fourth IPMU Conference (1992) 289-292. 
I 1 I I F Cheeseman, Probabilistic versus fuzzy reasoning, in: L.N. Kanal and J.F. Lemmer, eds., Uncertainty 

in Artificial bzfelligence (North-Holland, Amsterdam, 1986) 85-102. 
I I2 I L.J. Cohen, The Probable and fhe Provable (Clarendon Press, Oxford, 1977). 
I I3 I P.R. Cohen, Heuristic Reasoning about Uncertainty: An Art@cial Intelligence Approach (Pitman, 

London, 1985). 
1 I4 I R.G. Cowell, BAIES-a probabilistic expert system shell with qualitative and quantitative learning, 

in: J.M. Bemardo, J.O. Berger, AI? Dawid and A.F.M. Smith, eds., Bayesian Statistics 4 (Clarendon 
Press, Oxford, 1992) 595-600. 

[ IS I A.P. Dempster, Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Stat. 38 
(1967) 325-339. 

[ I6 I D. Dubois and H. Prade, An introduction to possibilistic and fuzzy logic& in: P Smets, A. Mamdani, D. 
Dubois and H. Prade, eds., Non-Standard Logics for Aufomated Reasoning (Academic Press, London, 
1988) 287-326. 

[ I7 I D. Dubois and H. Prade, Possibility Theory (Plenum Press, New York, 1988). 
1 I8 I D. Dubois and H. Prade, Modelling uncertain and vague knowledge in possibility and evidence theories, 

in: R.D. Shachter, T.S. Levitt, L.N. Kanal and J.F. Lemmer, eds., Uncertainty in Arti’cial Intelligence 
4 (North-Holland, Amsterdam, 1990) 303-318. 

I I9 I D. Dubois and H. Prade, Evidence, knowledge, and belief functions, Int. J. Approx. Reasoning 6 ( 1992) 
295-J 19. 

[ 20 I R.O. Duda, PE. Hart and N.J. Nilsson, Subjective Bayesian methods for rule-based inference systems, 
in: Proceedings 1976 National Computer Conference, AFIPS 45 (1976) 1075-1082; also in: B.W. 
Webber and N.J. Nilsson, eds., Readings in Artificial Infelligence (Morgan Kaufmann, Los Altos, CA, 
1981) 192-199. 

12 I I R.O. Duda and R. Reboh, AI and decision making: the PROSPECTOR experience, in: W. Reitman, 
ed., Artificial fntelligence Applicationsfor Business (Ablex, Norwood, NJ, 1984) I 11-147. 

I22 I A. Dutta, Reasoning with imprecise knowledge in expert systems, Inf Sci. 37 ( 1985) 3-24. 
I23 1 R. Fagin and J.Y. Halpern, A new approach to updating beliefs, in: PP. Bonissone, M. Henrion, L.N. 

Kanal and J.F. Lcmmer, eds., Uncertainty in Arttficial Intelligence 6 (North-Holland, Amsterdam, 
1991) 347-374. 

125 

I 24 I H. Farreny, H. Pnde and E. Wyss, Approximate reasoning in a rule-based expert system using possibility 
theory: a case study, in: H.J. Kugler, ed., Information Processing ‘86 (North-Holland, Amsterdam, 
1986) 407-413. 
K.W. Fertig and J.S. Breese, Interval influence diagrams, in: M. Henrion, R.D. Shachter, L.N. Kanal 

and J.F. Lemmer, eds., Uncertainty in Artificial Intelligence 5 (North-Holland, Amsterdam, 1990) 
149-161. 
M. Fieschi, M. Joubert, D. Fieschi, G. Botti and M. Roux, A program for expert diagnosis and 

therapeutic decision, Med. I$ 8 ( 1983) 127-135. 
T.L. Fine, Theories of Probability (Academic Press, New York, 1973). 
B. de Finetti, Theory of Probability Vol. I (Wiley, London, 1974). 
B. de Finetti, Theory of Probability Vol. 2 (Wiley, London, 1975). 

1% 

I 27 
128 
I 29 


I? Walley/Art$cial Intelligence 83 (1996) I-58 55 

I 30 ] L.C. van der Gaag, Computing probability intervals under independency constraints, in: l?P Bonissone, 
M. Hencion, L.N. Kanal and J.F. L.emmer, eds., Uncertainty in Art@cial Intelligence 6 (North-Holland, 
Amsterdam, 199 1) 457-466. 

I 3 I I W.A. Gale, ed., Artificial Intelligence and Statistics (Addison-Wesley, Reading, MA, 1986). 
1321 R. Gales, Foundations for a theory of possibility, in: M.M. Gupta and E. Sanchez, eds., Fuzzy 

133 
[34 

I35 

1% 

137 

138 

I39 

I42 I E.T. Jaynes, Pupers on Probability, Sfutistics and Statistical Physics (Reidel, Dordrecht, 1983). 
I43 ) F.V. Jensen, SK. Andersen, U. Kjaerulff and S. Andreassen, MUNIN-on the case for probabilities in 

medical expert systems, Lecture Notes in Medical Informutics 33 (Springer, Berlin, 1987). 
1441 A.C. Kak, K.M. Andress, C. Lopez-Abadia, MS. Carol1 and J.R. Lewis, Hierarchical evidence 

accumulation in the PSEIKI system, in: M. Henrion, R.D. Shachter, L.N. Kanal and J.F. Lemmer, 
eds., Uncertainty in Artijcial Intelligence 5 (North-Holland, Amsterdam, 1990). 

145 I S. Kellt, Words of estimated probability, Stud. Intell. 8 ( 1964) 49-65. 
I46 ] G.J. Klir and T.A. Folger, Fuzzy Sets, Uncertainty, und Information (Prentice-Hall, Englewood Cliffs, 

NJ, 1!)88). 
147 I F? Krause and D. Clark, Representing Uncertain Knowledge (Kluwer, Dordrecht, 1993). 
I48 I H.E. Kybuxg Jr, Bayesian and non-Bayesian evidential updating, Art& Intell. 31 (1987) 271-293. 
149 I K.B. Laskey, M.S. Cohen and A.W. Martin, Representing and eliciting knowledge about uncertain 

evidence and its implications, IEEE Trans. Sysf. Man Cybern. 19 ( 1989) 536-557. 
I 50 I S.L. Lauritzen and D.J. Spiegelhalter, Local computations with probabilities on graphical structures and 

their application to expert systems (with discussion), J. Roy. Stat. Sot. Ser. B 50 (1988) 157-224. 
15 I I J. Lebailly, R. Martin-Clouaire and H. Prade, Use of fuzzy logic in rule-based systems in petroleum 

geology, in: E. Sanchez and L.A. Zadeh, eds., Approximate Reasoning in Intelligent Systems, Decision 
und Control (Pergamon Press, Oxford, 1987) 125-144. 

I&m&on and Decision Processes (North-Holland, Amsterdam, 1982) 183-195. 
R. Giles, Semantics for fuzzy reasoning, Int. .I. Man-Mach. Stud. 17 (1982) 401-415. 
J. Gordon and E.H. Shortliffe, A method for managing evidential reasoning in a hierarchical hypothesis 
space. Artif: Intell. 26 (1985) 323-357. 

I B.N. Grosof, An inequality paradigm for probabilistic knowledge, in: L.N. Kanal and J.F. Lemmer, 
eds., Uncertainty in Artificial Intelligence (North-Holland, Amsterdam, 1986) 259-275. 
J.Y. Halpem and R. Fagin, Two views of belief: belief as generalized probability and belief as evidence, 
Artif: Intell. 54 (1992) 275-317. 
D. Heckerman, E. Horvitz and B. Nathwani, Toward normative expert systems 1: the PATHFINDER 
project, Methods Inky Med. 31 (1992) 90-105. 
M. Henrion, J.S. Breese and E.J. Horvitz, Decision analysis and expert systems, AI Muguzine 12 
(1991) 64-91. 
Y.-T. Hsia and PP. Shenoy, An evidential language for expert systems, in: 2. Ras, ed., Methodologies 
for Intelligenf Systems 4 (North-Holland, New York, 1989) 9-16. 

I 40 1 l?J. Huber, Robust Stafisfics (Wiley, New York, 198 1) 
141 ) J.-Y. Jaffmy, Bayesian updating and belief functions, IEEE Trans. Syst. Man Cybern. 22 ( 1992) 1144- 

1152. 

I 52 I J.F. Hemmer, Generalised Bayesian updating of incompletely specified distributions, Large Scale Syst. 
5 (1983) 51-68. 

I53 I K.S. Leung, W.S.F. Wong and W. Lam, Applications of a novel fuzzy expert system shell, Expert Syst. 
6 (1939) 2-10. 

I54 I 1. Levi, Potential surprise: its role in inference and decision making, in: L.J. Cohen and M. Hesse, eds., 
Applications of Inductive Logic (Oxford University Press, Oxford, 1980) l-27. 

I55 I 1. Levi, Consonance, dissonance and evidentiary mechanisms, in: P Gbdenfors, B. Hansson and N.E. 
Sahlin, eds., Evidentiury Value, Library of Theoria 15 (Gleerups, Lund, 1983) 27-43. 

I 56 I D.V. Lindley, Making Decisions (Wiley, London, 197 1). 
I57 I R.P. Loui, Interval-based decisions for reasoning systems, in: L.N. Kanal and J.F. Lemmer, eds., 

L’ncertuinty in Artijicial Intelligence (North-Holland, Amsterdam, 1986) 459-472. 
158 I J.D. Lowrance, Automated argument construction, J. Stat. Planning Inference 20 (1988) 369-387. 
I59 I R.C. Moore, Semantical considerations on nonmonotonic logic, Art$ Intell. 25 ( 1985) 75-94. 
I60 I N.J. Nilsson, Probabilistic logic, Artif Intell. 28 (1986) 71-87. 


56 P. Walley/Ar~ifcial Inrelligence 83 (1996) l-58 

I61 I P. Orponen, Dempster’s rule is #P-complete, A@ Intell. 44 ( 1990) 245-253. 
I62 I G. Paass, Probabilistic logic, in: P Smets, A. Mamdani, D. Dubois and H. Prade, eds., Non-Standard 

Logic.7 fiw Aufumated Reasoning (Academic Press, London, 1988) 213-25 1. 
I63 I J. Pearl, Probabilistic Reasoning in Intelligent Systems (Morgan Kaufmann, San Mateo, CA, 1988). 
I64 I J. Pearl, Reasoning with belief functions: an analysis of compatibility, Int. J. Approx. Reasoning 4 

(1990) 363-389. 

I65 I J. Pearl, Rejoinder to comments on “Reasoning with belief functions: an analysis of compatibility”, 
Int. J. Approx. Reasoning 6 (1992) 425-443. 

1661 J.R. Quinlan, Inferno: a cautious approach to uncertain inference, Cornput. J. 26 ( 1983) 255-269. 
I67 I R. Reiter, A logic for default reasoning, Artif Intell. 13 ( 1980) 81-132. 
168 1 R. Reiter, Nonmonotonic reasoning, Ann. Rev. Comput. Sci. 2 (1987) 147-186. 
I69 I A. Saffiotti and E. Umkehrer, PULCINELLA: a general tool for propagating uncertainty in valuation 

networks, in: B.D. D’Ambrosio, P. Smets and PP. Bonissone, eds., Proceedings Seventh Inrernarional 

Cwtference on Uncertainfy in AI, Los Angeles, CA (Morgan Kaufmann, San Mateo, CA, 1991) 
323-331. 

I70 I G.L.S. Shackle, Decision, Order; and 7ime in Human Affairs (Cambridge University Press, Cambridge, 
1969). 

I 7 I I G. Shafer, A Mathematical Theory of Evidence (Princeton Universtiy Press, Princeton, NJ, 1976). 
I72 I G. Shafer, Constructive probability, Synrhese 48 ( 1981) L-60. 
I73 I G. Shafer, Belief functions and parametric models (with discussion), J. Roy. Star. Sot. Ser. E 44 

( 1982) 322-352. 

I74 I G. Shafer, Probability judgement in artificial intelligence and expert systems, Star. Sci. 2 ( 1987) 3-16. 
I75 I G. Shafer, Perspectives on the theory and practice of belief functions, Int. J. Approx. Reasoning 4 

( 1990) 323-362. 
I76 I G. Shafer, Rejoinders to comments on “Perspectives on the theory and practice of belief functions”, 

Int. .I. Approx. Reasoning 6 ( 1992) 445-480. 

I77 I G. Shafer and R. Logan, Implementing Dempster’s rule for hierarchical evidence, A@ Intell. 33 

(1987) 271-298. 

I78 1 P. Shenoy and G. Shafer, Propagating belief functions with local computations, IEEE Expert 1 (3) 
( 1986) 43-52. 

I79 I F.K.J. Sheridan, A survey of techniques for inference under uncertainty, Artif: Intell. Rev. 5 ( 1991) 

89-l 19. 

I80 I E.H. Shortliffe, Computer-Based Medical Consultations: MYCIN (Elsevier, New York, 1976). 

181 

182 I 

183 

I 84 

E.H. Shortliffe and B.G. Buchanan, A model of inexact reasoning in medicine, Math. Biosci. 23 ( 1975) 

35 l-379. 
P. Smets, Belief functions, in: P Smets, A. Mamdani, D. Dubois and H. Prade, eds., Non-Standard 

Logics for Automated Reasoning (Academic Press, London, 1988) 253-286. 
P. Smets, The transferable belief model and other interpretations of Dempster-Shafer’s model, in: PP. 

Bonissone, M. Henrion, L.N. Kanal and J.F. Lemmer, eds., Uncertainty in Arrifcial Intelligence 6 

(North-Holland, Amsterdam, 1991) 375-383. 
P Smets, Resolving misunderstandings about belief functions, Int. J. Approx. Reasoning 6 ( 1992) 

321-344. 
I85 1 P. Smets and R. Kennes, The transferable belief model, Artif: Intell. 66 (1994) 191-234. 
I86 I P. Smets, A. Mamdani, D. Dubois and H. Prade, eds., Nun-Standard Logics for Automated Reasoning 

(Academic Press, London, 1988). 
I 87 ] C.A.B. Smith, Consistency in statistical inference and decision (with discussion), J. Roy. Stat. Sot. 

Ser. B 23 (1961) l-37. 
[ 88 I M. Smithson, Ignorance and Uncertainty (Springer, Berlin, 1989). 
1891 D.J. Spiegelhalter, A statistical view of uncertainty in expert systems, in: W.A. Gale, ed., Arrifcial 

Intelligence and Staristics (Addison-Wesley, Reading, MA, 1986). 
( 901 D.J. Spiegelhalter, AI? Dawid, S.L. Lauritzen and R.G. Cowell, Bayesian analysis in expert systems 

(with discussion), Stat. Sci. 8 (1993) 219-283. 
[ 9 I I D.J. Spiegelhalter and R.P. Knill-Jones, Statistical and knowledge-based approaches to clinical decision- 

support systems, with an application in gastroenterology (with discussion), J. Roy. Stat. Sot. Ser. A 
147 ( 1984) 35-76. 


P Walley/Arti$cial Intelligence 83 (1996) l-58 57 

[ 92 I T.M. Strat, Decision analysis using belief functions, ht. J. Approx. Reasoning 4 ( 1990) 391-417. 
1931 C. Smdberg and C. Wagner, Generalized finite differences and Bayesian conditioning of Choquet 

capacities, Unpublished manuscript (1991). 
I94 I B. Tessem, Interval probability propagation, Int. J. Approx. Reasoning 7 ( 1992) 95-120. 
I 95 I F. Vaorbraak, On the justification of Dempster’s rule of combination, Artif Intell. 48 ( 199 1) 17 l- 197. 
I 96 I F? Walley, Coherent lower (and upper) probabilities, Statistics Research Report, University of Warwick, 

Coventry (1981). 
I97 I P. Walley, The elicitation and aggregation of beliefs, Statistics Research Report, University of Warwick, 

Coventry (1982). 
I98 I F? Walley, Belief function representations of statistical evidence, Ann. Stat. 15 (1987) 1439-1465. 
I99 1 P. Walley, Statistical Reasoning with Imprecise Probabilities (Chapman and Hall, London, 1991). 

I 1001 P. Walley, Measures of uncertainty in expert systems, Statistics Research Report, Department of 

I101 

I102 

[ 103 

Mathematics, University of Western Australia, Perth ( 199 1) . 
I? W;rlley, Inferences from multinomial data: learning about a bag of marbles (with discussion), J. 
Roy. Stat. Sot. Ser. B 58 (1996) 3-57. 
P. Walley, Statistical inferences based on a second-order possibility distribution (with discussion), J. 
Ant. Srat. Assoc. 91 ( 1996). 
I! Walley and FM. Campello de Souza, Uncertainty and indeterminacy in assessing the economic 
viabil!ity of energy options: a case study of solar heating systems in Brazil, Energy Syst. Policy 14 
( 19913) 28 I-304. 

I 104 I P. Walley and T.L. Fine, Varieties of modal (classificatory) and comparative probability, Synthese 41 
( 197’9) 321-374. 

I 105 I P. Walley and T.L. Fine, Towards a frequentist theory of upper and lower probability, Ann. Stat. 10 
(1982) 741-761. 

I 1061 T.S. ‘Wallsten, D.V. Budescu, A. Rapoport, R. Zwick and B. Forsyth, Measuring the vague meanings 
of ptobability terms, J. Exper. Psych. General 115 (1986) 348-365. 

I 107 1 L.A. Wasserman, Comments on Shafer’s “Perspectives on the theory and practice of belief functions”, 
Int. J. Approx. Reasoning 6 (1992) 367-375. 

I 108 I L.A. Wasserman and J. Kadane, Bayes’ theorem for Choquet capacities, Ann. Stat. 18 (1990) 132% 
1339 

I 109 I S.R. Watson, J.J. Weiss and M.L. Donnell, Fuzzy decision analysis, IEEE Trans. Svst. Man Cvbern. 9 

1110 

1 III 

I112 

I113 

(1979) 1-9. 
P.M. Williams, Indeterminate probabilities, in: M. Przelecki, K. Szaniawski and R. Wojcicki, eds., 
Formal Methods in the Methodology of Empirical Sciences (Reidel, Dordrecht, 1976) 229-246. 
N. Wilson, A Monte-Carlo algorithm for Dempster-Shafer belief, in: B.D. D’Ambrosio, P Smets and 
PP Elonissone, eds., Proceedings Seventh International Conference on Uncertainty in AI, Los Angeles, 
CA (Morgan Kaufmann, San Mateo, CA, 1991) 414-417. 
N. Wilson, The combination of belief: when and how fast?, Int. J. Approx. Reasoning 6 (1992) 
377-,388. 
N. Wilson and S. Moral, A logical view of probability, in: A. Cohn, ed., Proceedings Eleventh European 
Conference on Artificial Intelligence (Wiley, London, 1994) 386-390. 

I 1 14 I H. XII, An efficient implementation of belief function propagation, in: B.D. D’Ambrosio, E! Smets and 
PP. Eionissone, eds., Proceedings Seventh International Conference on Uncertainty in AI, Los Angeles, 
CA (Morgan Kaufmann, San Mateo, CA, 1991) 425-432. 

I 1 IS I R.R. Yager, An introduction to applications of possibility theory (with discussion), Human Syst. 
Manage. 3 (1983) 246-269. 

I I 16 I L.A. Zadeh, Fuzzy sets, I@ Control 8 ( 1965) 338-353. 
I I 17 I L.A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning (Part 3), 

I@ Sci. 9 (1976) 43-80. 
I I 18 I L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets Syst. 1 (1978) 3-28. 
I II91 L.A. Zadeh, A theory of approximate reasoning, in: J. Hayes, D. Michie and L.I. Mikulich, eds., 

Machine Intelligence 9 (Halstead Press, New York, 1979) 149-194. 
I 1201 L.A. Zadeh, The role of fuzzy logic in the management of uncertainty in expert systems, Fuzzy Sets 

Syst. 11 (1983) 199-227. 


58 P: Walley/Artijcial Intelligence 83 (1996) I-58 

1 1211 L.A. Zadeh, Is probability theory sufficient for dealing with uncertainty in AI: a negative view, in: L.N. 
Kanal and J.E Jxmmer, eds., Uncertainty in Art$cial Intelligence (North-Holland, Amsterdam, 1986) 
103-l 16.