PII: 0898-1221(90)90153-B


Computers Math. Applic. Vol. 19, No. 11, pp. 105-119, 1990 0097-4943/90 $3.00+0.00 
Printed in Great Britain. All rights reserved Copyright © 1990 Pergamon Press plc 

COMPUTATIONAL MODELS OF UNCERTAINTY REASONING IN EXPERT SYSTEMS

J. F. BALDWIN 
Department of Engineering Mathematics, University of Bristol, Bristol BS8 1TR, England 

Abstract: The use of support pairs associated with the facts and rules of a knowledge base of an expert
system to capture various aspects of inductive reasoning is discussed. The concept of semantic unification
is introduced with reference to fuzzy set theory. A probabilistic interpretation of this semantic
unification is described using a population voting model. Examples are discussed, including
default reasoning using support logic.

INTRODUCTION

In this paper the use of support pairs associated with Prolog-type clauses to capture various aspects
of inductive reasoning will be discussed. This represents a support logic programming system, which
has been implemented in the form of the language Fril [1]. Fril can operate as a pure Prolog
system if no uncertainties are involved. False facts, equivalent statements and a true logic negation
can be used.

Fril is being used in a variety of applications involving reasoning with uncertainty. These include
AI applied to scene analysis, medical and fault diagnosis, expert systems in command and control,
analogical reasoning, probabilistic grammars, program evaluation etc.

Support logic programming [2-5] is an evidential reasoning system which, rather than proving
theorems, collects evidence to support a hypothesis and also to support the negation of the
hypothesis. These supports do not have to add up to unity. If no evidence is available then a support
pair (0 1) is returned, corresponding to zero support for and zero support against, i.e. total
uncertainty. A logic of support has been studied under various names by different people.
Koopman called it a logic of intuitive probability and Carnap a theory of confirmation [6]. These
and other theories are based on a single number representing the supports and some are
comparative only.

Zadeh's fuzzy set theory forms a basis for a possibility theory [7]. Bellman made contributions
to the analytical formalism of the theory of fuzzy sets [8]. Support logic programming uses this
theory. Fuzzy sets can be used as the referents of concepts and semantic unification is used to match
two fuzzy terms which are syntactically different but which semantically have something in
common. An interpretation of fuzzy sets which shows how semantic unification can be given a
probabilistic interpretation, necessary for support logic programming, is given below. The calculus
of Fril is consistent with probability theory and this supposes that all propositions are true or false,
although it may not be possible to acquire evidence which will allow a probability of 1 or 0 to be
obtained.

Expert systems and other knowledge engineering systems such as vision understanding and
speech recognition programs must be able to cope with knowledge bases with incomplete
information. Incomplete information can be of two types: one type concerns lack of data and the
other, lack of concept definition.

For example, in a vision system only part of the object may be in view because of occlusion.
This gives rise to uncertainty concerning what the object may be since the part that can be seen
could belong to several different objects. Possible extensions of the part of the object will give rise
to different interpretations and each extension will have a probability associated with it. This
probability is assessed from other evidence picked up from other parts of the picture and it may not
be possible to assess it with complete accuracy. It may only be possible to assess its value as being
contained within a certain interval.

On the other hand, the whole object may be in complete view but it is still difficult to say exactly
what it is. For example, if the object in total view was a bushy, tree-like object, no complete support
could be given for it being a tree or for it being a bush, simply because of a lack of precise definition
for bush and tree. Most concepts which we use in our daily lives are of this nature. For example
we cannot prescribe sufficient and/or necessary conditions for classifying what we mean by a
"humane society", "a good business venture", "a comfortable seat", "a well-structured program",
"a stable system", "a reliable system" etc. Even accepting that these definitions are context
dependent, we will still have difficulty in giving exact definitions within a given context.
Furthermore, the relevant context may again be a borderline decision.

In law, cases are often looked at by considering similar cases from the past where the judgements
are known. The present case may not precisely fit any of the historical ones but only have
similarities with each. The judgement on the present case will then be influenced in some form by
the combination of those judgements of similar past cases. What form this so-called combination
should take is not at all obvious.

Heuristics used by experts are probabilistic in nature. Truth is not guaranteed when certain
conditions hold. Difficulties of entailment when true propositions are replaced by highly probable
propositions are well known. Contraposition is valid for deductive entailments but does not hold
for high probabilities. Deductive entailment is transitive but strong inductive support is not. More
importantly, the following valid argument of deductive logic does not carry over into the inductive
case: if A entails B then A AND C entails B. In fact if Pr(B | A) is high, this provides no constraint
on the value of Pr(B | A, C) since

$$\Pr(B \mid A, C) = \frac{\Pr(C \mid A, B)\,\Pr(B \mid A)}{\Pr(C \mid A)}.$$

This, of course, is obvious from sample space considerations. All relevant criteria must be
considered when giving supports to predicates, as suggested by Hempel's maximum specificity
conditions [9].

Causal connections are important in expert systems. Any sensible theory of causation is
probabilistic. Frequent conjunctions often occur, constant conjunctions rarely [10]. The modelling
of causality is also discussed in Ref. [11].

Probabilistic reasoning can be viewed as constraint reasoning in which the various probabilistic
statements given provide evidence to constrain the probability of another statement to be contained
within a certain interval.

If we know that:

$$\Pr(P \rightarrow Q) = 2/3, \qquad \Pr(P) = 4/5,$$

what can we conclude about Pr(Q)? A point-value probability cannot be determined since the
above two probabilistic statements give insufficient information for this. The statements constrain
the interval which contains Pr(Q). Three possible cases must be analysed, since the case
corresponding to NOT(P → Q) AND (NOT P) is not possible because of the inconsistency of the
two statements.

Case 1          Case 2            Case 3

P → Q           NOT(P → Q)        P → Q
P               P                 NOT P

Q               NOT Q             World 1: Q
                                  World 2: NOT Q

Q: {1, 1}       Q: {0, 0}         Q: {0, 1}
x1              x2                x3

where xi is Pr(case i) and {a, b} means that NEC(Q) = a, POS(Q) = b, where a = 0 if NEC(Q) is
false and 1 if it is true, and b = 0 if POS(Q) is false and 1 if it is true. NEC and POS are modal
logic operators. Then since

$$x_1 + x_2 + x_3 = 1,$$

$$x_1 + x_2 = 4/5,$$

since Pr(P) = Pr(P AND (P → Q)) + Pr(P AND NOT(P → Q)), and

$$x_1 + x_3 = 2/3,$$

since Pr(P → Q) = Pr((P → Q) AND P) + Pr((P → Q) AND NOT P), so that

$$x_1 = 7/15, \qquad x_2 = 1/3, \qquad x_3 = 1/5.$$

Hence Pr(NEC Q) = 7/15 and Pr(POS Q) = 7/15 + 1/5 = 2/3. From Pr(NEC Q) ≤ Pr(Q) ≤
Pr(POS Q) it follows that Pr(Q) lies in the interval [7/15, 2/3].

This interval can be determined using linear programming formulations and this is discussed in 
Ref. [5]. 
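
The three-case argument above can be checked with exact arithmetic. The following is a minimal sketch, not the linear programming formulation of Ref. [5]; the function name `q_interval` is introduced here purely for illustration.

```python
from fractions import Fraction as Fr

def q_interval(pr_implication, pr_p):
    """Bounds on Pr(Q) given Pr(P -> Q) and Pr(P), following the three cases.

    x1 = Pr(P -> Q and P), x2 = Pr(not(P -> Q) and P), x3 = Pr(P -> Q and not P).
    """
    x3 = 1 - pr_p              # from x1 + x2 + x3 = 1 and x1 + x2 = Pr(P)
    x1 = pr_implication - x3   # from x1 + x3 = Pr(P -> Q)
    x2 = pr_p - x1
    if min(x1, x2, x3) < 0:
        raise ValueError("the two probability statements are inconsistent")
    # Q is necessarily true only in case 1 and possibly true in cases 1 and 3.
    return x1, x1 + x3

print(q_interval(Fr(2, 3), Fr(4, 5)))
```

For the values used in the text this reproduces the interval [7/15, 2/3].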

SUPPORT PAIRS

The theory of uncertainty which forms the basis of support logic programming is based on the
association of support pairs with Horn clauses as used in Prolog.

Any proposition P is assumed to be true or false. A two-valued logic is assumed. There is no
mention of truth values lying between 0 and 1. Furthermore any valid formula of first order logic,
F say, will be such that there is support of 1 for and 0 against.

Evidence, E, is used to assign a necessary support, Sn(P | E), for, and a necessary support,
Sn(NOT P | E), against any proposition P being true. Possible supports Sp(P | E) and Sp(NOT P | E)
are defined as

$$S_p(P \mid E) = 1 - S_n(\text{NOT } P \mid E); \qquad S_p(\text{NOT } P \mid E) = 1 - S_n(P \mid E).$$

These can be further interpreted in terms of the modal logic necessity and possibility operators,
namely

$$S_n(P \mid E) = \Pr(\text{NEC } P \mid E); \qquad S_p(P \mid E) = \Pr(\text{POS } P \mid E),$$

where modal operators are to be understood in the context of possible world semantics.

The belief that the truth value 1 can be assigned to P using evidence E, Pr(P | E), lies in an interval
determined by the necessary and possible supports for P:

$$\Pr(P \mid E) \in [S_n(P \mid E),\ S_p(P \mid E)].$$

It is necessarily true that

$$S_n(P \text{ AND NOT } P) = 0 \quad \text{and} \quad S_p(P \text{ AND NOT } P) = 0,$$

$$S_n(P \text{ OR NOT } P) = 1 \quad \text{and} \quad S_p(P \text{ OR NOT } P) = 1.$$

SUPPORT LOGIC PROGRAMMING

A support logic program consists of a sequence of support clauses.
A support clause is a clause with an associated support structure.
A clause is a list of one or more atoms.
An atom is an atomic formula which is a list whose first element is a predicate symbol or a relation
and the remaining elements are terms.
A term is a number, constant, variable or list.
The elements of a list are terms.
A support structure can be a single support pair or a list of two support pairs.
A support pair is a list of two elements; the first element being called the necessary support and
the second element the possible support.
Variables, constants, numbers and lists have their usual meaning.
Support clauses can be further divided into simple support clauses and compound support
clauses.


An example of a simple support clause is:

((com12 is a large senate committee)):(0.6 0.8),

which could mean that the degree of belief that com12 is a large senate committee is some number
in the interval [0.6 0.8]. The doubt expressed by using this interval arises because of the imprecise
definition of large. This support pair could be determined by asking a large representative sample
of university members to vote whether they accepted that senate committees of various sizes were
large. The vote could be "yes", "no" or "abstain" for each size presented. The proportion who
vote "yes" for a given size would represent the necessary support for it being a large committee.
This proportion plus the proportion of abstentions would give the possible support. The proportion
of "no" votes would give the necessary support against, and this together with the abstentions would
give the possible support against. Of course, the doubt could also arise because of uncertainty in
the actual numbers on a committee. The final support pair used must take account of both these
sources of uncertainty and this could be done using more rules to determine the support pair.
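
The voting construction of a support pair can be sketched in a few lines. The function name and the vote counts below (6 "yes", 2 "no", 2 abstentions) are hypothetical, chosen only so that the result matches the pair (0.6 0.8) of the example clause.

```python
from fractions import Fraction as Fr

def support_from_votes(yes, no, abstain):
    """Return (support pair for, support pair against) from raw vote counts."""
    total = yes + no + abstain
    necessary_for = Fr(yes, total)       # proportion voting "yes"
    necessary_against = Fr(no, total)    # proportion voting "no"
    # Abstentions widen the interval: possible = necessary + abstentions.
    pair_for = (necessary_for, 1 - necessary_against)
    pair_against = (necessary_against, 1 - necessary_for)
    return pair_for, pair_against

pair_for, pair_against = support_from_votes(yes=6, no=2, abstain=2)
print(pair_for, pair_against)
```

Note that the possible support for equals one minus the necessary support against, so the two pairs always carry the same information.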

An example of a compound statement with a single support pair is:

((committee C contains a professor)
(C is a large senate committee)):(0.9 1),

which says that at least 90% of large senate committees contain a professor. In other words the
conditional probability Pr(committee C contains a professor | C is a large committee) lies in the
interval [0.9 1]. In Prolog terms this corresponds to a fact.

The simple clause 

((p)): (0 0) 

says that p is false. 
An example of a compound statement with two support pairs is:

((performance X good)
(engineers_report X ok)
(efficiency X near_optimal)): ((0.9 1) (0 0.2)),

which says that if the body of the rule, in this case the conjunction of the two atoms
(engineers_report X ok) and (efficiency X near_optimal), is true then the probability that the
performance of X is good lies in the interval [0.9 1], while if the body is false this probability lies
in the interval [0 0.2].

A special case of this rule, namely

((p)(q)):((1 1)(0 0))

says that p is equivalent to q.

THE CALCULUS OF SUPPORT LOGIC

The calculus used in support logic programming is fully described in Ref. [5]. We will not repeat
this here but discuss a simple example to illustrate the main points of the calculus.

Since propositions are assumed to be either true or false, assuming one accepts the scoring
argument of De Finetti and Lindley [12], then if the support pairs correspond to single numbers,
a probability calculus must be used. The calculus for the support pairs is then easily determined
since probabilities are contained within intervals determined by the support pairs. A unique
probability is not determined and a simple constrained optimisation problem gives the required
support pair for any compound statement in terms of the support pairs of its parts.

Consider the following example in which we know that

$$\Pr(a \mid q) = 0.5, \qquad \Pr(a \mid \text{NOT } q) = 0.4,$$

$$\Pr(a \mid s) = 0.8, \qquad \Pr(a \mid \text{NOT } s) = 0.4,$$

$$\Pr(q) = 0.7, \qquad \Pr(s) = 0.175,$$

then we can determine Pr(a) in two ways, namely

$$\Pr(a) = \Pr(a \mid q)\,\Pr(q) + \Pr(a \mid \text{NOT } q)\,\Pr(\text{NOT } q) = 0.47,$$

$$\Pr(a) = \Pr(a \mid s)\,\Pr(s) + \Pr(a \mid \text{NOT } s)\,\Pr(\text{NOT } s) = 0.47.$$

If the answers in each case had been different, we would have concluded that the knowledge base
was inconsistent.
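
The two routes to Pr(a) in this point-valued example can be verified directly with the theorem of total probability. This is only an arithmetic check; `total_probability` is a helper name introduced here.

```python
from fractions import Fraction as Fr

def total_probability(p_if_true, p_if_false, p_condition):
    """Pr(a) = Pr(a | c) Pr(c) + Pr(a | NOT c) Pr(NOT c)."""
    return p_if_true * p_condition + p_if_false * (1 - p_condition)

via_q = total_probability(Fr("0.5"), Fr("0.4"), Fr("0.7"))
via_s = total_probability(Fr("0.8"), Fr("0.4"), Fr("0.175"))

# Equal answers mean the two proof paths are consistent.
print(float(via_q), float(via_s), via_q == via_s)
```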

The problem expressed in Fril is:

((a) (q)):((0.5 0.5)(0.4 0.4)),
((a) (s)):((0.8 0.8)(0.4 0.4)),
((q)):(0.7 0.7),
((s)):(0.175 0.175),

and the query

qs((a)),

yields the solution

(0.47 0.47).

Fril uses the two proof paths to provide the answer to the query and gives the answer (0.47 0.47)
in each case. These intervals are intersected to give (0.47 0.47) as the final solution.

We will now consider a modified problem in which the point probabilities are not precisely
known.

Pr(a | q) is in [0.45, 0.55],     Pr(a | NOT q) is in [0.35, 0.45],
Pr(a | s) is in [0.75, 0.85],     Pr(a | NOT s) is in [0.35, 0.45],
Pr(q) is in [0.65, 0.75],
Pr(s) is in [0.1, 0.2].

We can use the theorem of total probability as before to obtain Pr(a) but we must use interval
arithmetic. The two methods give [0.38, 0.57] and [0.355, 0.545], respectively, for Pr(a). Any point
in the final interval containing the point probability Pr(a) must lie in both these intervals using
a consistency argument. Therefore we must intersect the intervals to obtain the final interval. This
defines the rule of how solutions are combined from different proof paths in Fril. For this case
the final answer for Pr(a) is that it is contained in the interval [0.38, 0.545].

The Fril program for this case is

((a) (q)):((0.45 0.55)(0.35 0.45)),
((a) (s)):((0.75 0.85)(0.35 0.45)),
((q)):(0.65 0.75),
((s)):(0.1 0.2),

and the query

qs((a)),

yields the solution

(0.38 0.545).

The basic rule used to combine support pairs from different proof paths is the intersection rule.
An alternative method of combining proof paths is available in Fril and corresponds to using a
Dempster type rule [13]. This should only be used when the proof paths correspond to independent
viewpoints. In this case conflicts can occur and the Dempster rule is one way of resolving the
conflicts. If the user has some other way he wishes to combine solutions from different viewpoints
he can express this as a rule in Fril.

Nothing has been said about finding the support pairs of a conjunction or disjunction when given
support pairs for each atom. The rules used for Fril are consistent with probability theory.
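
The interval version of the example can be sketched as follows. The bound used in `rule_support` is an assumption, chosen because it reproduces both quoted intervals: the necessary support for a and the necessary support for NOT a are each computed from the lower bounds of the rule and of its body, and the possible support is one minus the latter. The full calculus is the one described in Ref. [5]; the function names are illustrative only.

```python
from fractions import Fraction as Fr

def pair(lo, hi):
    return (Fr(lo), Fr(hi))

def rule_support(if_true, if_false, body):
    """Support pair for the head of a rule with two support pairs.

    Assumed bound (not taken verbatim from the paper): lower bounds for the
    head and for its negation are built from lower bounds of the parts.
    """
    (l1, u1), (l2, u2), (lb, ub) = if_true, if_false, body
    necessary_for = l1 * lb + l2 * (1 - ub)
    necessary_against = (1 - u1) * lb + (1 - u2) * (1 - ub)
    return (necessary_for, 1 - necessary_against)

def intersect(p, q):
    """Intersection rule for combining two proof paths."""
    lo, hi = max(p[0], q[0]), min(p[1], q[1])
    if lo > hi:
        raise ValueError("inconsistent support pairs")
    return (lo, hi)

via_q = rule_support(pair("0.45", "0.55"), pair("0.35", "0.45"), pair("0.65", "0.75"))
via_s = rule_support(pair("0.75", "0.85"), pair("0.35", "0.45"), pair("0.1", "0.2"))
final = intersect(via_q, via_s)
print([float(v) for v in via_q], [float(v) for v in via_s], [float(v) for v in final])
```

Under this assumption the two proof paths give [0.38, 0.57] and [0.355, 0.545], and their intersection is [0.38, 0.545].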

DEFAULT REASONING

Consider the Fril program:

((live_another_five_years X)
(english X)
(age X 30)
(not suffers_from_lung_cancer X)):(0.9 1)

((live_another_five_years X)
(english X)
(age X 30)
(suffers_from_lung_cancer X)):(0 0.1).

If we do not know anything about the health of a 30-year-old English person, these two rules will
use ((suffers_from_lung_cancer person)):(0 1) and conclude (0 1) as the support pair for
(live_another_five_years person). Intuitively we may feel that an answer something like (0.85 1) should
have been given, since we know that most 30-year-old Englishmen do live another five years. We
could therefore add the additional rule to our program:

((live_another_five_years X)
(english X)
(age X 30)):(x y),

where x and y are chosen appropriately. What is appropriate? Strictly (x y) = (0 1) to be consistent
with the other two rules. But this would not satisfy the reason we are introducing this rule. We
will choose (x y) = (0.85 1).

If nothing is known about the health of the person then Fril will use each of the rules, obtaining
(0 1) from the first two and (0.85 1) from the last, giving the answer (0.85 1). If the second rule
is applicable then rules 2 and 3 will give inconsistent answers. Fril recognizes this and, because the
body of rule 3 is contained in the body of rule 2, ignores rule 3 and uses the first two rules only.
This is a consequence of the maximum specificity requirement.
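
The behaviour described for the three rules can be imitated with a small sketch. It is a simplified model and not Fril's actual inference procedure: rule bodies are plain sets of atoms, a rule fires only when its whole body is among the known facts (a rule that does not fire contributes (0 1)), and on a conflict the rules whose bodies are strictly contained in another fired body are dropped. All names below are illustrative.

```python
def intersect(pairs):
    lo = max(p[0] for p in pairs)
    hi = min(p[1] for p in pairs)
    return (lo, hi) if lo <= hi else None   # None signals a conflict

def answer(rules, facts):
    """rules: list of (body, support_pair) with body a frozenset of atoms."""
    fired = [(body, sp) for body, sp in rules if body <= facts]
    if not fired:
        return (0.0, 1.0)                   # no evidence: total uncertainty
    result = intersect([sp for _, sp in fired])
    if result is None:
        # Maximum specificity: ignore rules whose body is strictly contained
        # in the body of another rule that also fired.
        fired = [(b, sp) for b, sp in fired
                 if not any(b < other for other, _ in fired)]
        result = intersect([sp for _, sp in fired])
    return result

base = frozenset({"english", "age_30"})
rules = [
    (base | {"not_lung_cancer"}, (0.9, 1.0)),   # rule 1
    (base | {"lung_cancer"}, (0.0, 0.1)),       # rule 2
    (base, (0.85, 1.0)),                        # rule 3 (default)
]

print(answer(rules, base))                      # health unknown
print(answer(rules, base | {"lung_cancer"}))    # rule 3 is overridden
```

With health unknown only the default rule contributes, giving (0.85 1); once lung cancer is known, the conflict between rules 2 and 3 is resolved in favour of the more specific rule 2.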

Non-monotonic logic is used to avoid problems like this but these logics have inconsistencies
[14]. By using Fril there is no reason to introduce these various additional logics and default
reasoning. In the case of the standard problem that "all birds can fly", "a penguin is a bird", "a
penguin cannot fly" it is, of course, false to say that all birds can fly. Most birds can fly so that
if all that is known is that X is a bird there is a high probability that it can fly. This will not be
the case if X is a penguin, and this is treated in the same way as the above problem. Details of this
and similar problems can be found in Refs [3, 5].

A recursive definition of a concept "tall", for example, can be written in Fril. This uses the fact
that if you remove a little height from a tall man there is still a high support, but not a certain
support, for him still being tall. Details can be found in Refs [1, 4].

VOTING MODEL INTERPRETATION OF A FUZZY SET

A fuzzy subset f with respect to the set F is defined by means of a membership function
Mf: F → [0, 1]. In other words an element, e, of the set F belongs to the fuzzy subset f with a degree
of membership Mf(e). How can we interpret this membership level in more specific terms which
will give some justification to its actual value and also its existence and use?

One possible interpretation is in terms of the voting behaviour of a population P, say, of persons,
all of whom have their own understanding of the meaning of f. We all use the term "tall" in relation
to a person's height without having a precise understanding of what it means. When we use the
word in ordinary conversation, we assume others will be able to interpret it in more or less the
same way as ourselves. It is certainly true that there is a set of heights which everyone would accept
as satisfying the concept of "tall height" and there is a set of heights which everyone would accept
as not satisfying this concept. The difficulty arises for the set of heights in between these two sets.
If this intermediate set is null then we have an exact definition for "tall" but otherwise we do not.
Each member of this intermediate set can have a degree of membership in the set of heights
representing "tall", but how do we choose the actual degree?


Consider a set F and let f be a fuzzy subset of this set. Let each person belonging to a population
P vote on whether to accept or reject the membership of a given element e of set F as belonging
to f. Each person must accept or reject, agree or not agree to the element's membership. Abstentions
and partial agreements are not allowed. Mf(e) is equated to the proportion of persons who vote
for accepting e as a member of f. Similarly (1 - Mf(e)) is the proportion of persons who reject e
as belonging to f so that this is the membership level for e not belonging to f. Each individual will
have a threshold level such that if his doubt in an element e belonging to f increases above this
level he will reject its membership, otherwise he will accept it. It is similar to a member of a jury
having to say guilty or innocent for the person on trial. The evidence presented at the trial may
not be conclusive but the person must still make a final judgement. If people are allowed to abstain
we return to an analogue with support pairs.

Example

Let F be the set of positive integers {50, 55, 60, 65, 70, 75} and f be defined by the following
membership function

$$M_f(55) = 0.2, \quad M_f(60) = 0.5, \quad M_f(65) = 0.8, \quad M_f(70) = 1, \quad M_f(50) = M_f(75) = 0.$$

The fuzzy set f can therefore be represented as

$$f = 55\,|\,0.2 + 60\,|\,0.5 + 65\,|\,0.8 + 70\,|\,1.$$

If we take P as made up of 10 people then the voting pattern to give consistency with this definition
could be

Person   1    2    3    4    5    6    7    8    9    10

         70   70   70   70   70   70   70   70   70   70
         65   65   65   65   65   65   65   65
         60   60   60   60   60
         55   55

The integers given are those integers which the person accepted as satisfying f. Therefore person
1 accepts {70, 65, 60, 55} as satisfying f while person 7 only accepts {70, 65}.

This interpretation assumes that persons who vote yes for 55 also vote yes for 60 and for 65.
Similarly it assumes persons who vote yes for 60 vote yes for 65. The assumption that this
interpretation uses is that a person who votes for an element h in the set F with membership value
Mf(h) as belonging to f will also vote for any other element of F satisfying f if it has a higher
membership value than Mf(h). We will call this the constant threshold model since it corresponds
to each person having a threshold level for acceptance of an element of F in f which does not vary
with the element of F chosen.

An alternative interpretation could be:

Person 

1 2 3 4 5 6 7 8 9 10 

70 70 70 70 70 70 70 70 70 70 
65 65 65 65 65 65 65 65 
60 60 60 60 60 

55 55 

Other possible interpretations can be given but the one which intuitively seems more reasonable
is the constant threshold model.
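
The constant threshold voting pattern can be generated mechanically from the membership function. In this sketch, persons are ranked from the most to the least permissive, and person i accepts an element when i does not exceed Mf(e) times the population size; this ranking convention and the function names are assumptions made for illustration.

```python
from fractions import Fraction as Fr

def constant_threshold_votes(membership, n_people):
    """Map each person (1-based) to the set of elements that person accepts."""
    return {
        i: {e for e, m in membership.items() if i <= m * n_people}
        for i in range(1, n_people + 1)
    }

def membership_from_votes(votes, universe):
    """Recover Mf(e) as the proportion of persons who accept e."""
    n = len(votes)
    return {e: Fr(sum(e in accepted for accepted in votes.values()), n)
            for e in universe}

mf = {50: Fr(0), 55: Fr(1, 5), 60: Fr(1, 2), 65: Fr(4, 5), 70: Fr(1), 75: Fr(0)}
votes = constant_threshold_votes(mf, 10)

print(sorted(votes[1]), sorted(votes[7]))
print(membership_from_votes(votes, mf) == mf)
```

Person 1 accepts {55, 60, 65, 70} and person 7 accepts {65, 70}, as in the first voting table, and counting the votes returns the original membership values.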

INTERSECTION AND UNION OF FUZZY SETS

Consider two fuzzy sets f1, f2, with membership functions Mf1, Mf2, both defined as fuzzy
subsets of the set F.

Consider an element h of F. The proportion of persons of population P who vote for h satisfying
both the concepts defined by f1 and f2 is contained in the interval

$$I_{conj} = [\max\{M_{f1}(h) + M_{f2}(h) - 1,\ 0\},\ \min\{M_{f1}(h),\ M_{f2}(h)\}].$$

If we use the constant threshold model then one assumes that the threshold levels of the persons
P stay constant for judging different concepts. This means that if a person votes yes for one concept
when having a certain degree of doubt, that person will also vote yes for another concept if faced
with the same degree of doubt. In this case we assume that those people who voted yes for the
concept with the lower membership value will also have voted yes for the other. Thus for this
assumption the membership value for element h for the intersection of f1 and f2 will be
min{Mf1(h), Mf2(h)}.

Similarly for the union of f1 and f2 the membership value for element h will lie in the interval

$$I_{disj} = [\max\{M_{f1}(h),\ M_{f2}(h)\},\ \min\{M_{f1}(h) + M_{f2}(h),\ 1\}].$$

With the same assumption as above, the minimum number of persons would vote yes for the union,
so that the membership level for element h belonging to the union of f1 and f2 is max{Mf1(h),
Mf2(h)}.

This is the assumption we make in Fril for fuzzy sets and is the usual definition for fuzzy
conjunction and disjunction.
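
The two intervals can be computed directly from a pair of membership values. The sketch below uses hypothetical values Mf1(h) = 0.8 and Mf2(h) = 0.5; the function names are introduced here for illustration.

```python
from fractions import Fraction as Fr

def conj_interval(m1, m2):
    """Bounds on the proportion voting yes for both concepts."""
    return (max(m1 + m2 - 1, 0), min(m1, m2))

def disj_interval(m1, m2):
    """Bounds on the proportion voting yes for at least one concept."""
    return (max(m1, m2), min(m1 + m2, 1))

m1, m2 = Fr(4, 5), Fr(1, 2)   # hypothetical membership values for one element h
conj = conj_interval(m1, m2)
disj = disj_interval(m1, m2)

# The constant threshold model selects the upper end of the conjunction
# interval (min) and the lower end of the disjunction interval (max).
print(conj, conj[1] == min(m1, m2))
print(disj, disj[0] == max(m1, m2))
```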

More generally we can define a mapping T

$$T: [0, 1] \times [0, 1] \rightarrow [0, 1]$$

which satisfies the axioms

(1) T(a, 1) = a,
(2) T(a, b) = T(b, a),
(3) T(a, b) ≥ T(c, d) if a ≥ c and b ≥ d,
(4) T(a, T(b, c)) = T(T(a, b), c).

T is called a T-norm and generalizes the AND corresponding to conjunction.
Examples of instances of T are

(1) T(a, b) = min{a, b},
(2) T(a, b) = a·b,
(3) T(a, b) = max{a + b - 1, 0}.

A dual norm, called the T-conorm, S, exists which generalizes disjunction. For any T-norm T there
exists a dual norm S such that

$$S(a, b) = 1 - T((1 - a), (1 - b)).$$

The mapping

$$S: [0, 1] \times [0, 1] \rightarrow [0, 1]$$

satisfies the same axioms as for T except that (1) is replaced by (1') where

(1') S(a, 0) = a.

Examples of instances of S conorms corresponding to the T norms (1)-(3) above are respectively

(1) S(a, b) = max{a, b},
(2) S(a, b) = a + b - a·b,
(3) S(a, b) = min{a + b, 1}.

Assumptions can be made about the voting model to obtain each of these answers. For example,
if it is assumed that no preference can be made for any possible voting pattern for P in relation
to f1 or f2, then all possible distributions must be allowed. For each pair of distributions the
proportion of those persons voting for both and the proportion of those persons voting for at least
one can be determined. This gives the values of the conjunction and disjunction, respectively, for
this pair of distributions. This is repeated for all possible pairs of distributions and the values for
the conjunction and disjunction determined in each case. If it is assumed that any pair of
distributions is as likely as any other, then the expected values for the conjunction and disjunction
will be equal to Mf1(h)·Mf2(h) and Mf1(h) + Mf2(h) - Mf1(h)·Mf2(h), respectively.
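
Both claims of this passage can be checked on a small case: the duality between the three T-norms and their conorms, and the expected values under equally likely voting patterns. The sketch enumerates every pair of voting patterns for a hypothetical population of 5 persons in which 2 vote yes for f1 and 3 vote yes for f2; all names and sizes are illustrative assumptions.

```python
from fractions import Fraction as Fr
from itertools import combinations

T_NORMS = {
    "min": lambda a, b: min(a, b),
    "product": lambda a, b: a * b,
    "bounded": lambda a, b: max(a + b - 1, 0),
}

def dual(t):
    """T-conorm obtained from a T-norm via S(a, b) = 1 - T(1 - a, 1 - b)."""
    return lambda a, b: 1 - t(1 - a, 1 - b)

def expected_conj_disj(n, k1, k2):
    """Average, over all pairs of voting patterns, of the proportion voting
    for both concepts and for at least one (k1 and k2 yes-votes out of n)."""
    both = either = count = 0
    for yes1 in combinations(range(n), k1):
        for yes2 in combinations(range(n), k2):
            s1, s2 = set(yes1), set(yes2)
            both += len(s1 & s2)
            either += len(s1 | s2)
            count += 1
    return Fr(both, n * count), Fr(either, n * count)

a, b = Fr(2, 5), Fr(3, 5)
exp_conj, exp_disj = expected_conj_disj(5, 2, 3)
print(exp_conj == T_NORMS["product"](a, b))
print(exp_disj == dual(T_NORMS["product"])(a, b))
```

For this case the averages equal the product T-norm and its dual conorm evaluated at Mf1(h) = 2/5 and Mf2(h) = 3/5.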

SEMANTIC UNIFICATION

Let f1, f2 be two fuzzy subsets of the set F and suppose that each of these can be associated
with some object X. Then we can ask the question: what is the probability of "X is f1" given that
we know that "X is f2"?

Consider the more specific case in which we know that John is between 5 ft 10 in. and 6 ft. Then
the probability that John is over 5 ft 10 in. is 1. The probability that John is below 5 ft 9 in. is 0.
The probability that John is between 5 ft 9 in. and 5 ft 11 in. lies between 0 and 1. This is so since
John's actual height could belong to the interval [5 ft 10 in., 5 ft 11 in.], which would give a
probability of 1 of John being between 5 ft 9 in. and 5 ft 11 in., but it could also belong to [5 ft
11 in., 6 ft], which would give zero probability. The first case can occur with a probability x1 and the
second case with a probability x2. If all that we know about x1 and x2 is that both are non-negative
and they sum to one, then the probability of John being between 5 ft 9 in. and 5 ft 11 in. lies
anywhere in the interval [0, 1]. If we can estimate x1 and x2 then

$$\Pr(\text{John is between 5 ft 9 in. and 5 ft 11 in.}) = x_1.$$

If an equally likely distribution is assumed over the interval [5 ft 10 in., 6 ft] then x1 = 1/2.

This example illustrates the non-fuzzy version of the situation posed above, with respect to the
fuzzy subsets f1 and f2. We can ask a similar question for the fuzzy case. What is the probability
that John is tall given that we know that John is a little above average height? We should be able
to arrive at an answer using a similar approach to that used for the non-fuzzy case but taking into
account that not every height has membership level of 1 or 0 in the sets "tall" and "a little above
average height".
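
The non-fuzzy case can be sketched with closed intervals measured in inches. The helper names are hypothetical, and the upper limit of 120 in. used to represent "over 5 ft 10 in." is an arbitrary assumption.

```python
from fractions import Fraction as Fr

def crisp_support(query, known):
    """Support pair for 'X lies in query' when all we know is 'X lies in known'."""
    (ql, qh), (kl, kh) = query, known
    necessary = 1 if ql <= kl and kh <= qh else 0     # known is inside query
    possible = 1 if max(ql, kl) <= min(qh, kh) else 0  # the intervals overlap
    return (necessary, possible)

def uniform_probability(query, known):
    """Point probability when X is equally likely anywhere in 'known'."""
    (ql, qh), (kl, kh) = query, known
    overlap = max(0, min(qh, kh) - max(ql, kl))
    return Fr(overlap, kh - kl)

known = (70, 72)                                  # between 5 ft 10 in. and 6 ft
print(crisp_support((70, 120), known))            # over 5 ft 10 in.
print(crisp_support((0, 69), known))              # below 5 ft 9 in.
print(crisp_support((69, 71), known))             # between 5 ft 9 in. and 5 ft 11 in.
print(uniform_probability((69, 71), known))       # x1 under the uniform assumption
```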

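SEMANTIC UNIFICATION AND POPULATION VOTING MODEL

We will now return to the example above of determining Pr(X is f1 | X is f2) when we are given
actual definitions for f1, f2 and F.

The Pr(X is f1 | X is f2) can be interpreted in this voting model as Pr(P accepts X is f1 | P is told
X is f2). Now

$$\begin{aligned}
&\Pr(P \text{ accepts } X \text{ is } f_1 \mid P \text{ is told } X \text{ is } f_2) \\
&\quad = \sum_h \Pr(P \text{ accepts } X \text{ is } f_1 \mid P \text{ accepts } X \text{ is } h,\ P \text{ is told } X \text{ is } f_2) \cdot \Pr(P \text{ accepts } X \text{ is } h \mid P \text{ is told } X \text{ is } f_2) \\
&\quad = \sum_h \Pr(P \text{ accepts } X \text{ is } f_1 \mid P \text{ accepts } X \text{ is } h) \cdot \Pr(P \text{ accepts } X \text{ is } h \mid P \text{ is told } X \text{ is } f_2).
\end{aligned}$$

Now

$$\Pr(P \text{ accepts } X \text{ is } f_1 \mid P \text{ accepts } X \text{ is } h) = \Pr(h \text{ is accepted as } f_1) = M_{f1}(h).$$

Therefore

$$\Pr(P \text{ accepts } X \text{ is } f_1 \mid P \text{ is told } X \text{ is } f_2) = \sum_h M_{f1}(h) \cdot \Pr(P \text{ accepts } X \text{ is } h \mid P \text{ is told } X \text{ is } f_2).$$

$$\Pr(P \text{ accepts } X \text{ is } h \mid P \text{ is told } X \text{ is } f_2) = \sum_i \Pr(\text{person } i \text{ chooses } X \text{ is } h \mid X \text{ is } f_2).$$

Here the sum over the persons i is to be read as an average over the population, each person being
equally likely to be the voter, so that the right-hand side is a proportion.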



If person i is told that X is f2, then this person has an interpretation for the label f2 in the form
of a set of acceptable values. If h is not one of these values Pr(person i chooses X is h | X is f2) = 0.
If the set of values consists only of h then Pr(person i chooses X is h | X is f2) = 1. If the set contains
more than one value including h then probabilities must be assigned to each value. All that is known
about these probabilities is that they sum to 1.

Example

Consider

$$f_1 = 55|0.2 + 60|0.5 + 65|0.8 + 70|1,$$

$$f_2 = 55|1 + 60|0.2,$$

$$F = \{50, 55, 60, 65, 70, 75\},$$

then P interprets $f_2$ as

Person   1   2   3   4   5   6   7   8   9   10
Accepts  55  55  55  55  55  55  55  55  55  55
         60  60

so that

$$\Pr(P \text{ accepts } X \text{ is } 55 \mid P \text{ is told } X \text{ is } f_2) = 4/5 + x \cdot 1/5,$$

$$\Pr(P \text{ accepts } X \text{ is } 60 \mid P \text{ is told } X \text{ is } f_2) = (1 - x) \cdot 1/5,$$

where $0 \le x \le 1$.

This follows since persons 3-10 accept X is 55, as this is the only value they can choose. Persons 1 and 2 have a choice and $x$ represents their probability of choosing 55.

Therefore

$$\Pr(P \text{ accepts } X \text{ is } f_1 \mid P \text{ is told } X \text{ is } f_2) = 0.2(0.8 + 0.2x) + 0.5 \cdot 0.2(1 - x); \quad 0 \le x \le 1,$$

so that $\Pr(P \text{ accepts } X \text{ is } f_1 \mid P \text{ is told } X \text{ is } f_2)$ lies in the interval [0.2, 0.26].

We thus conclude that $\Pr(X \text{ is } f_1 \mid X \text{ is } f_2)$ is in [0.2, 0.26]. This is equivalent to solving the following optimization problem, in which $x_1, x_2, x_3, x_4$ denote the probabilities of the values 55, 60, 65 and 70:

$$\max/\min\ z = 0.2x_1 + 0.5x_2 + 0.8x_3 + 1x_4,$$

subject to

$$x_1 + x_2 \le 1,$$

$$x_2 \le 0.2,$$

$$x_3 = 0,$$

$$x_4 = 0,$$

$$x_1 + x_2 + x_3 + x_4 = 1,$$

for the constant threshold model used for interpreting $f_2$.

If all possible distributions corresponding to interpretations of $f_2$ are considered then the following optimization model results:

$$\max/\min\ z = 0.2x_1 + 0.5x_2 + 0.8x_3 + 1x_4,$$

subject to

$$x_1 \le 1,$$

$$x_2 \le 0.2,$$

$$x_3 = 0,$$

$$x_4 = 0,$$

$$x_1 + x_2 + x_3 + x_4 = 1.$$

In both cases, min $z$ gives the lower bound and max $z$ gives the upper bound for $\Pr(X \text{ is } f_1 \mid X \text{ is } f_2)$. In this example the support pair is the same whatever interpretation is used for $f_2$. The next example will yield different results for different interpretations.
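The voting argument of this example can be checked numerically. The following is a minimal sketch, assuming the ten-person population shown in the example (persons 1 and 2 accept 55 or 60, persons 3-10 accept only 55). The names `MF1`, `ACCEPTED` and `pr_f1_given_f2` are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Membership values of f1 on F = {50, ..., 75}, as given in the example.
MF1 = {50: Fraction(0), 55: Fraction(1, 5), 60: Fraction(1, 2),
       65: Fraction(4, 5), 70: Fraction(1), 75: Fraction(0)}

# Values each of the ten persons accepts when told "X is f2":
# persons 1 and 2 accept 55 or 60, persons 3-10 accept only 55.
ACCEPTED = [{55, 60}] * 2 + [{55}] * 8


def pr_f1_given_f2(x):
    """Pr(P accepts X is f1 | P is told X is f2) when each person with a
    choice picks 55 with probability x and 60 with probability 1 - x."""
    x = Fraction(x)
    n = len(ACCEPTED)
    pr_value = {v: Fraction(0) for v in MF1}
    for accepted in ACCEPTED:
        if accepted == {55}:
            pr_value[55] += Fraction(1, n)
        else:
            pr_value[55] += x / n
            pr_value[60] += (1 - x) / n
    # Weighted sum of the value probabilities; weights are memberships in f1.
    return sum(MF1[v] * p for v, p in pr_value.items())


lower = pr_f1_given_f2(1)  # both persons with a choice pick 55
upper = pr_f1_given_f2(0)  # both persons with a choice pick 60
print(float(lower), float(upper))
```

Because the expression is linear in $x$, evaluating it at $x = 1$ and $x = 0$ is enough to recover the two ends of the interval.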

A MORE COMPLEX EXAMPLE

Let

$$F = \{e_1, e_2, e_3, e_4, e_5, e_6\},$$

$$f_1 = e_1|0.1 + e_2|0.3 + e_3|0.5 + e_4|0.7 + e_5|1,$$

$$f_2 = e_1|0.2 + e_2|1 + e_3|0.7 + e_4|0.1.$$

$\Pr(X \text{ is } f_1 \mid X \text{ is } f_2)$ lies in $[z_{\min}, z_{\max}]$, where $z_{\min}$ and $z_{\max}$ are determined by solving one of the following optimization models.

(1) Using the constant threshold model:

$$\min/\max\ z = 0.1x_1 + 0.3x_2 + 0.5x_3 + 0.7x_4 + x_5,$$

subject to

$$x_4 \le 0.1,$$

$$x_1 + x_4 \le 0.2,$$

$$x_1 + x_3 + x_4 \le 0.7,$$

$$x_1 + x_2 + x_3 + x_4 = 1.$$

Therefore

$$z_{\min} = 0.1 \cdot 0.2 + 0.3 \cdot 0.8 = 0.26,$$

$$z_{\max} = 0.7 \cdot 0.1 + 0.5 \cdot 0.6 + 0.3 \cdot 0.3 = 0.46.$$

Thus $\Pr(X \text{ is } f_1 \mid X \text{ is } f_2)$ lies in [0.26, 0.46].

(2) Allowing for all possible interpretations of $f_2$:

$$\min/\max\ z = 0.1x_1 + 0.3x_2 + 0.5x_3 + 0.7x_4 + x_5,$$

subject to

$$x_4 \le 0.1,$$

$$x_1 \le 0.2,$$

$$x_3 \le 0.7,$$

$$x_2 \le 1,$$

$$x_1 + x_2 + x_3 + x_4 = 1.$$

Therefore

$$z_{\min} = 0.1 \cdot 0.2 + 0.3 \cdot 0.8 = 0.26,$$

$$z_{\max} = 0.7 \cdot 0.1 + 0.5 \cdot 0.7 + 0.3 \cdot 0.2 = 0.48.$$

Thus $\Pr(X \text{ is } f_1 \mid X \text{ is } f_2)$ lies in [0.26, 0.48].
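The text states the optimal values of $z$ but not the assignments that attain them. One assignment reaching each bound is sketched below; it is read off from the products in the stated values of $z_{\min}$ and $z_{\max}$, taking $x_5 = x_6 = 0$ because $e_5$ and $e_6$ have zero membership in $f_2$. The layout assumes the `amsmath` package.

```latex
\begin{align*}
\text{Model (1), } z_{\min}: \quad (x_1, x_2, x_3, x_4) &= (0.2,\ 0.8,\ 0,\ 0),
  & z &= 0.1(0.2) + 0.3(0.8) = 0.26, \\
\text{Model (1), } z_{\max}: \quad (x_1, x_2, x_3, x_4) &= (0,\ 0.3,\ 0.6,\ 0.1),
  & z &= 0.3(0.3) + 0.5(0.6) + 0.7(0.1) = 0.46, \\
\text{Model (2), } z_{\min}: \quad (x_1, x_2, x_3, x_4) &= (0.2,\ 0.8,\ 0,\ 0),
  & z &= 0.1(0.2) + 0.3(0.8) = 0.26, \\
\text{Model (2), } z_{\max}: \quad (x_1, x_2, x_3, x_4) &= (0,\ 0.2,\ 0.7,\ 0.1),
  & z &= 0.3(0.2) + 0.5(0.7) + 0.7(0.1) = 0.48.
\end{align*}
```

The model (2) maximizer has $x_1 + x_3 + x_4 = 0.8 > 0.7$, so it is not feasible under the cumulative constraints of model (1). This is why the upper bound is larger when all possible interpretations are allowed.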

RESTRICTION MODEL

The generalization of the linear programming solutions for general fuzzy subsets $f_1$ and $f_2$, defined on $F$, when X is $f_2$ is given and $\Pr(X \text{ is } f_1 \mid X \text{ is } f_2)$ is to be determined, is as follows:

Let

$$F = \{e_i\}; \quad i = 1, \ldots, n,$$

$$f_1 = \sum_{e_i} \{e_i | M_{f_1}(e_i)\},$$

$$f_2 = \sum_{e_i} \{e_i | M_{f_2}(e_i)\}.$$

(1) Using the constant threshold model for $f_2$:

$$\max/\min\ z = \sum_{e_i} M_{f_1}(e_i) \cdot x_{e_i},$$

subject to

$$\sum_{\{k :\, M_{f_2}(k) \le M_{f_2}(e_i)\}} x_k \le M_{f_2}(e_i) \quad \text{for all } e_i, \qquad \sum_{e_i} x_{e_i} = 1.$$

(2) Using all possible interpretations for $f_2$:

$$\max/\min\ z = \sum_{e_i} M_{f_1}(e_i) \cdot x_{e_i},$$

subject to

$$x_{e_i} \le M_{f_2}(e_i) \quad \text{for all } e_i, \qquad \sum_{e_i} x_{e_i} = 1.$$

If $z_{\min}$ and $z_{\max}$ correspond to min $z$ and max $z$, respectively, subject to the given constraints, then

$$\Pr(X \text{ is } f_1 \mid X \text{ is } f_2) \text{ lies in } [z_{\min}, z_{\max}].$$
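Both optimization models have enough structure to be solved without a general linear programming package. The sketch below is not the paper's procedure: it swaps in a greedy rule, filling the elements in order of their $f_1$ membership. For model (2) this is the usual fractional-knapsack argument. For model (1) it relies on the cumulative constraints being nested, and it assumes that the largest membership in $f_2$ is 1 and that all $x_{e_i} \ge 0$. The function names are hypothetical.

```python
from fractions import Fraction


def _by_f1_membership(mf1, mf2):
    """Every element named by either fuzzy set, sorted by Mf1 ascending."""
    return sorted(set(mf1) | set(mf2), key=lambda e: mf1.get(e, 0))


def bounds_all_interpretations(mf1, mf2):
    """Model (2): x[e] <= Mf2(e) for all e, sum(x) = 1. Returns (z_min, z_max).
    Exact with Fraction inputs; floats may leave rounding residue."""
    def solve(order):
        remaining, z = 1, 0
        for e in order:
            take = min(mf2.get(e, 0), remaining)  # fill up to the bound
            z += mf1.get(e, 0) * take
            remaining -= take
        if remaining > 0:
            raise ValueError("infeasible: memberships of f2 sum to less than 1")
        return z
    order = _by_f1_membership(mf1, mf2)
    return solve(order), solve(order[::-1])


def bounds_constant_threshold(mf1, mf2):
    """Model (1): sum of x[k] over {k: Mf2(k) <= Mf2(e)} is <= Mf2(e) for
    all e, and sum(x) = 1. Returns (z_min, z_max)."""
    def solve(order):
        level, z = 0, 0
        for e in order:
            # An element can only add the amount by which it raises the
            # largest f2 membership seen so far.
            new_level = max(level, mf2.get(e, 0))
            z += mf1.get(e, 0) * (new_level - level)
            level = new_level
        if level != 1:
            raise ValueError("sketch assumes the largest membership in f2 is 1")
        return z
    order = _by_f1_membership(mf1, mf2)
    return solve(order), solve(order[::-1])


F_ = Fraction
# Data of the more complex example (elements with zero membership omitted).
mf1 = {"e1": F_(1, 10), "e2": F_(3, 10), "e3": F_(1, 2), "e4": F_(7, 10), "e5": F_(1)}
mf2 = {"e1": F_(1, 5), "e2": F_(1), "e3": F_(7, 10), "e4": F_(1, 10)}

print([float(z) for z in bounds_constant_threshold(mf1, mf2)])
print([float(z) for z in bounds_all_interpretations(mf1, mf2)])
```

On the data of the more complex example this reproduces the two intervals stated there. A general-purpose LP solver would serve equally well and does not need the normalization assumption.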

This corresponds to interpreting membership functions of fuzzy sets as possibility restrictions. For some element $e_i$ in $F$, $M_{f_1}(e_i)$ gives the upper bound to the possibility that $e_i$ belongs to $f_1$; that is, $M_{f_1}$ restricts the possibility that $e_i$ belongs to $f_1$ to $M_{f_1}(e_i)$. If we know the probability distribution over the elements of $F$ consistent with the statement that X is $f_2$, i.e. we know $\Pr(X \text{ is } e_i \mid X \text{ is } f_2)$ for all $e_i$ in $F$, then

$$\Pr(X \text{ is } f_1 \mid X \text{ is } f_2) = \sum_{e_i} M_{f_1}(e_i) \cdot \Pr(X \text{ is } e_i \mid X \text{ is } f_2).$$

This uses a weighted sum of the conditional probabilities, where the weights are the membership values of the fuzzy set $f_1$. For the non-fuzzy case the values of the characteristic function for $f_1$ would be used.
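As an illustration of the weighted sum, the sketch below evaluates it for one known distribution, once with a fuzzy $f_1$ and once with a crisp set whose characteristic function takes only the values 0 and 1. The distribution `DIST` and the crisp set are hypothetical inputs chosen for the illustration, and `prob_fuzzy_event` is an assumed name.

```python
def prob_fuzzy_event(mf1, dist):
    """Sum over e of Mf1(e) * Pr(X is e | X is f2) for a known distribution."""
    if abs(sum(dist.values()) - 1) > 1e-9:
        raise ValueError("dist must sum to 1")
    return sum(mf1.get(e, 0) * p for e, p in dist.items())


# Fuzzy f1 taken from the more complex example.
MF1 = {"e1": 0.1, "e2": 0.3, "e3": 0.5, "e4": 0.7, "e5": 1.0}
# Hypothetical conditional distribution Pr(X is e | X is f2).
DIST = {"e1": 0.2, "e2": 0.8}
# Crisp (non-fuzzy) set {e2}: characteristic values are 0 or 1.
CRISP = {"e1": 0, "e2": 1}

fuzzy_value = prob_fuzzy_event(MF1, DIST)    # 0.1 * 0.2 + 0.3 * 0.8
crisp_value = prob_fuzzy_event(CRISP, DIST)  # ordinary probability of {e2}
print(fuzzy_value, crisp_value)
```

In the crisp case the weighted sum reduces to the ordinary probability of the set under the distribution.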

In fact we do not know what the values for $\{\Pr(X \text{ is } e_i \mid X \text{ is } f_2)\}$ are. All we know is that the value of $\Pr(X \text{ is } e_i \mid X \text{ is } f_2)$ will be constrained by the fact that X is $f_2$. We will use one of the following assumptions for the form that this constraint can take:

(1) Using the constant threshold model for $f_2$:

$$\Pr(X \text{ is one of } F_s \mid X \text{ is } f_2) \le \max\{M_{f_2}(e_i) : e_i \text{ in } F_s\} \quad \text{for any subset } F_s \text{ of } F.$$

(2) Using all possible interpretations for $f_2$:

$$\Pr(X \text{ is } e_i \mid X \text{ is } f_2) \le M_{f_2}(e_i) \quad \text{for all } e_i \text{ in } F.$$

If we put $\Pr(X \text{ is } e_i \mid X \text{ is } f_2) = x_{e_i}$, then this gives the constraints given above for each of the models. There is now no unique solution for $\Pr(X \text{ is } f_1 \mid X \text{ is } f_2)$, so we find upper and lower bounds for it by maximizing and minimizing $\sum_{e_i} M_{f_1}(e_i) \cdot x_{e_i}$ subject to the constraints on $x_{e_i}$ given above.
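The difference between the two assumptions can be seen by testing a candidate distribution against each. The sketch below checks assumption (1) by brute force over every non-empty subset $F_s$, and assumption (2) element by element. It checks only these upper bounds, not non-negativity or normalization. The function names and the small tolerance are assumptions of the sketch, and the data are those of the more complex example.

```python
from itertools import combinations


def satisfies_threshold_model(x, mf2, tol=1e-9):
    """Assumption (1): for every non-empty subset Fs of F,
    Pr(X is one of Fs) <= max{Mf2(e): e in Fs}."""
    elems = list(mf2)
    for r in range(1, len(elems) + 1):
        for subset in combinations(elems, r):
            mass = sum(x.get(e, 0) for e in subset)
            if mass > max(mf2[e] for e in subset) + tol:
                return False
    return True


def satisfies_all_interpretations(x, mf2, tol=1e-9):
    """Assumption (2): Pr(X is e) <= Mf2(e) for every single element e."""
    return all(x.get(e, 0) <= mf2[e] + tol for e in mf2)


# f2 of the more complex example, listed over all of F.
MF2 = {"e1": 0.2, "e2": 1.0, "e3": 0.7, "e4": 0.1, "e5": 0.0, "e6": 0.0}

# Candidate distributions: one attaining z = 0.46 and one attaining z = 0.48.
X_046 = {"e2": 0.3, "e3": 0.6, "e4": 0.1}
X_048 = {"e2": 0.2, "e3": 0.7, "e4": 0.1}

for name, x in (("z = 0.46", X_046), ("z = 0.48", X_048)):
    print(name, satisfies_threshold_model(x, MF2),
          satisfies_all_interpretations(x, MF2))
```

The second candidate passes the element-wise test but fails the subset test, because the subset $\{e_3, e_4\}$ carries probability 0.8 while its largest $f_2$ membership is 0.7.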



Furthermore, as given below, this can be generalized once again to handle continuous membership functions.

SPECIAL CASE

We will consider the special case of determining $\Pr(X \text{ is } f \mid X \text{ is } f)$. This often causes difficulty since its value is not necessarily 1, as many seem to expect. We will consider an example and then justify the answer by using the population model for interpreting the results.

Let

$$F = \{e_1, e_2, e_3, e_4, e_5, e_6, e_7, e_8, e_9, e_{10}\}$$

and

$$f = e_1|0.1 + e_2|0.2 + e_3|0.3 + e_4|0.4 + e_5|0.5 + e_6|0.6 + e_7|0.7 + e_8|0.8 + e_9|0.9 + e_{10}|1.0,$$

then

(1) Using the constant threshold model:

$$\max/\min\ z = 0.1x_1 + 0.2x_2 + 0.3x_3 + 0.4x_4 + 0.5x_5 + 0.6x_6 + 0.7x_7 + 0.8x_8 + 0.9x_9 + x_{10},$$

subject to

$$\begin{aligned}
x_1 &\le 0.1, \\
x_1 + x_2 &\le 0.2, \\
x_1 + x_2 + x_3 &\le 0.3, \\
x_1 + x_2 + x_3 + x_4 &\le 0.4, \\
x_1 + x_2 + x_3 + x_4 + x_5 &\le 0.5, \\
x_1 + x_2 + x_3 + x_4 + x_5 + x_6 &\le 0.6, \\
x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 &\le 0.7, \\
x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 &\le 0.8, \\
x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9 &\le 0.9, \\
x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9 + x_{10} &= 1,
\end{aligned}$$

so that

$$z_{\min} = 0.1 \cdot (0.1 + 0.2 + 0.3 + 0.4 + 0.5 + 0.6 + 0.7 + 0.8 + 0.9 + 1) = 0.1 \cdot 5.5 = 0.55,$$

$$z_{\max} = 1 \cdot 1 = 1.$$

Therefore, using the constant threshold model, $\Pr(X \text{ is } f \mid X \text{ is } f)$ lies in [0.55, 1].

We might note that if we had chosen $f$ to be

$$f = e_1|0 + e_2|0.1 + e_3|0.2 + e_4|0.3 + e_5|0.4 + e_6|0.5 + e_7|0.6 + e_8|0.7 + e_9|0.8 + e_{10}|0.9,$$

then $\Pr(X \text{ is } f \mid X \text{ is } f)$ would lie in [0.45, 1].
(2) Using all possible interpretations for $f$:

$$\max/\min\ z = 0.1x_1 + 0.2x_2 + 0.3x_3 + 0.4x_4 + 0.5x_5 + 0.6x_6 + 0.7x_7 + 0.8x_8 + 0.9x_9 + x_{10},$$

subject to

$$\begin{aligned}
x_1 &\le 0.1, & x_2 &\le 0.2, & x_3 &\le 0.3, & x_4 &\le 0.4, & x_5 &\le 0.5, \\
x_6 &\le 0.6, & x_7 &\le 0.7, & x_8 &\le 0.8, & x_9 &\le 0.9, & x_{10} &\le 1,
\end{aligned}$$

together with the normalization $x_1 + x_2 + \cdots + x_{10} = 1$ required by the general formulation, so that

$$z_{\min} = 0.1 \cdot 0.1 + 0.2 \cdot 0.2 + 0.3 \cdot 0.3 + 0.4 \cdot 0.4 = 0.3,$$

$$z_{\max} = 1,$$

and $\Pr(X \text{ is } f \mid X \text{ is } f)$ lies in [0.3, 1].

We might note that if we had chosen $f$ to be

$$f = e_1|0 + e_2|0.1 + e_3|0.2 + e_4|0.3 + e_5|0.4 + e_6|0.5 + e_7|0.6 + e_8|0.7 + e_9|0.8 + e_{10}|0.9,$$

then $\Pr(X \text{ is } f \mid X \text{ is } f)$ would lie in [0.4, 1].

It is quite easy to justify this result using the population voting model. If the population P is told that X is $f$ then P will interpret this. There will be some elements of the set $F$ which some members of P, but not all, will accept as possible values for X. When asked if X is $f$, these members of P will choose these values with a certain probability as possible values for X, but not all members of P will choose these values.

Consider the situation in which the population is told that John is tall. Some members of the population know that other members accept heights corresponding to tall which they do not. Therefore these members know that there is a probability that John's height will be a value which is unacceptable to them as representing tall. Therefore the probability that John is tall when the population is told that John is tall cannot be 1, since some members of the population will not vote yes for certain.
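The voting justification can be made concrete for the first choice of $f$ in this section. The sketch below assumes one particular population consistent with the constant threshold model: ten voters, where voter $j$ accepts $e_i$ exactly when $i \ge j$, so that $e_i$ is accepted by a fraction $i/10$ of the voters. This construction and the names `MF`, `ACCEPTED` and `pr_f_given_f` are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

N = 10
# Membership of e_i in f is i/10.
MF = {i: Fraction(i, N) for i in range(1, N + 1)}

# Assumed nested population: voter j accepts e_i exactly when i >= j.
ACCEPTED = {j: [i for i in range(1, N + 1) if i >= j] for j in range(1, N + 1)}


def pr_f_given_f(choose):
    """Average over voters of the membership in f of the value each voter
    picks; `choose` maps a voter's list of accepted indices to one index."""
    return sum(MF[choose(ACCEPTED[j])] for j in ACCEPTED) / N


# Every voter picks the accepted value with the lowest membership in f.
lower = pr_f_given_f(min)
# Every voter picks the accepted value with the highest membership in f.
upper = pr_f_given_f(max)
print(float(lower), float(upper))
```

Only when every voter happens to choose $e_{10}$ does the population vote yes for certain. Any other pattern of choices puts some probability on values that part of the population does not accept, which pulls the result below 1.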

CONTINUOUS CASE

The above is easily generalized to the continuous case.

CONCLUSIONS

A theory of reasoning with uncertainties applicable to expert systems and other AI applications has been described. It is being applied to many applications and evidence to date indicates that it is relatively easy to apply. Fril is a powerful AI systems language and ideal for writing expert system shells.

REFERENCES

1. J. F. Baldwin, T. P. Martin and B. W. Pilsworth, Fril Manual. Equipu AIR Ltd, Bristol (1987).
2. J. F. Baldwin, Support logic programming. Proc. NATO Advanced Study Institute on Fuzzy Sets Theory and Applications. Louvain-La-Neuve, Belgium (1985).
3. J. F. Baldwin, Support logic programming. In Fuzzy Sets Theory and Applications (Eds A. Jones and H. J. Zimmermann), pp. 133-170. Reidel, Dordrecht, Holland (1986).
4. J. F. Baldwin, Evidential support logic programming. Fuzzy Sets Systems 10, 1-26 (1987).
5. J. F. Baldwin, A theory of support pairs (in press).
6. I. Hacking, Logic of Statistical Inference. Cambridge Univ. Press (1965).
7. L. A. Zadeh, Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets Systems 1, 3-28 (1978).
8. R. Bellman and M. Giertz, On the analytic formalism of the theory of fuzzy sets. Inform. Sci. 5, 149-156 (1973).
9. C. G. Hempel, Maximum specificity and lawlikeness in probabilistic explanation. Phil. Sci. 35, 116-133 (1968).
10. P. Suppes, Probabilistic Metaphysics. Blackwell, Oxford (1984).
11. W. C. Salmon, Scientific Explanation and the Causal Structure of the World. Princeton Univ. Press, N.J. (1984).
12. D. V. Lindley, Scoring rules and the inevitability of probability. Int. Stat. Rev. 50, 1-26 (1982).
13. G. Shafer, A Mathematical Theory of Evidence. Princeton Univ. Press, N.J. (1976).
14. V. O. Homolka, The role of nonmonotonic reasoning in an intelligent maintenance system, ITRC73. Information Technology Research Centre, Bristol Univ. (1985).