Producing Modular Hybrid Rule Bases for Expert Systems


Published in the International Journal of Artificial Intelligence Tools (IJAIT), 
Vol. 10, No. 1-2 (2001), 87-105. 

 Copyright World Scientific Pub 2000. All rights reserved. 

CONSTRUCTING MODULAR HYBRID RULE BASES 
FOR EXPERT SYSTEMS 

 
I. HATZILYGEROUDIS, J. PRENTZAS 
University of Patras, School of Engineering 

Dept of Computer Engin. & Informatics, 26500 Patras, Hellas (Greece) 
Email: ihatz/prentzas@ceid.upatras.gr 

& 
Computer Technology Institute, P.O. Box 1122, 26110 Patras, Hellas (Greece) 

Email: ihatz@cti.gr 
 
 
ABSTRACT 

Neurules are a kind of hybrid rules integrating neurocomputing and production rules. Each 
neurule is represented as an adaline unit. Thus, the corresponding neurule base consists of a 
number of autonomous adaline units (neurules). Due to this fact, a modular and natural 
knowledge base is constructed, in contrast to existing connectionist knowledge bases. In this 
paper, we present a method for generating neurules from empirical data. To overcome the 
difficulty of the adaline unit to classify non-separable training examples, the notion of 
‘closeness’ between training examples is introduced. In case of a training failure, two subsets 
of ‘close’ examples are produced from the initial training set and a copy of the neurule for each 
subset is trained. Failure of training any copy, leads to production of further subsets as far as 
success is achieved. 

Keywords: hybrid knowledge representation, symbolic representation, connectionist 
representation, production rules, neural networks, modularity, naturalness. 

1. Introduction 

Recently, there has been extensive research activity at combining (or integrating) the 
symbolic and the connectionist approaches1, 2, 3. There are a number of efforts at 
combining symbolic rules and neural networks for knowledge representation4, 5, 6, 7. 
What they do is a kind of mapping from symbolic rules to a neural network. Also, 
connectionist expert systems8, 9, 10 are a type of integrated systems that represent 
relationships between concepts, considered as nodes of a neural network. The strong 
point of those approaches is that knowledge elicitation from the experts is reduced to 
a minimum. A weak point of them is that the resulted systems lack the naturalness 
and modularity of symbolic rules. This is mainly due to the fact that those 
approaches give pre-eminence to connectionism. For example, the systems in8, 10 are 
more or less like black boxes and, to introduce new knowledge, one has to modify a 


large part of the network. Connectionist knowledge bases cannot actually be 
incrementally developed. 

We use neurules11, 12, which achieve a uniform and tight integration of a symbolic 
component (production rules) and a connectionist one (the adaline unit). Each 
neurule is considered as an adaline unit. However, pre-eminence is given to the 
symbolic component. Thus, the constructed knowledge base retains the modularity 
of production rules, since it consists of autonomous units (neurules), and their 
naturalness, since neurules look much like symbolic rules. A difficult point in this 
approach is the inherent inability of the adaline unit to classify non-separable 
training examples. 

In this paper, we describe a method for generating neurules directly from 
empirical (training) data. We overcome the above difficulty of the adaline unit by 
introducing the notion of ‘closeness’, as far as the training examples are concerned. 
That is, in case of failure, we produce two subsets of the initial training set of the 
involved neurule, which contain ‘close’ success examples and train a copy of the 
neurule for each subset. Failure of training any copy leads to production of further 
subsets as far as success is achieved. This paper is a revised and extended version of 
the one presented at FLAIRS’200013. 

The structure of the paper is as follows. Section 2 presents neurules and the 
corresponding expert system architecture. Section 3 presents the basic ideas 
introduced in the paper. In Section 4, the algorithm for creating a hybrid knowledge 
base directly from empirical data is described. In Section 5 the hybrid inference 
mechanism is presented. Section 6 contains examples and experimental results. 
Section 7 discusses related work and finally, Section 8 concludes. 

2. Neurules 

2.1 Structure 
Neurules (: neural rules) are a kind of hybrid rules. Each neurule is considered as an 
adaline unit (Fig.1a). The inputs Ci (i=1,...,n) of the unit are the conditions of the 
rule. Each condition Ci is assigned a number sfi, called its significance factor, 
corresponding to the weight of the corresponding input of the adaline unit. 
Moreover, each rule itself is assigned a number sf0, called the bias factor, 
corresponding to the weight of the bias input (C0 = 1, not illustrated in Fig.1 for the 
sake of simplicity) of the unit.  

Each input takes a value from the following set of discrete values: [1 (true), 0 
(false), 0.5 (unknown)]. This gives the opportunity to easily distinguish between the 
falsity and the absence of a condition, in contrast to symbolic rules. The output D, 
which represents the conclusion (decision) of the rule, is calculated via the formulas: 

as usual14, where a is the activation value and f(x) the activation function, which is a 
threshold function (Fig.1b). Hence, the output can take one of two values, ‘-1’ and 
‘1’, representing failure and success of the rule respectively. 

D f sf sf Ci
i

n
i= +

=

∑(a),  a= 0
1


Fig.1. (a) A neu

2.2 Syntax and semantics 
The general syntax (structure
occurrences and ‘< >’ denote
<rule>::= (<bias-factor>) if <
<conditions>::= <condition> 
<conclusions>::= <conclusion
<condition>::= <variable> <l
<conclusion>::= <variable> <
where <variable> denotes a v
domain, e.g. ‘sex’, ‘pain’ et
concept’ etc in a tutoring do
variable or an intermediate v
in a conclusion can be eith
variable takes values from t
variables take values through
conclusions respectively. <l-
The symbolic predicates are 
=}. <r-predicate> can only be
be a symbol or a number. 
significance (weight) of the 
neurules see Fig. 3, Section 6

We distinguish between inp
is a neurule having only inpu
variables in its conclusions. A
intermediate variable in its c
An output neurule is one havi

C1 C2 

(sf1) 
(sf2) 

(sfn)

(sf0)

D

(a

f(x)

x 

1

0 
. . .
rule as an adaline unit (b) the activation function 

) of a rule is (where ‘{ }’ denotes zero, one or more 
s non-terminal symbols): 
conditions> then <conclusions> 
{, <condition>} 
> {, <conclusion>} 

-predicate> <value> (<significance-factor>) 
r-predicate> <value> 
ariable, that is a symbol representing a concept in the 

c, in a medical domain, or ‘learning-ability’, ‘related-
main. A variable in a condition can be either an input 
ariable or even an output variable, whereas a variable 
er an intermediate or an output variable. An input 

he user (input data), whereas intermediate and output 
 inference, since they represent intermediate and final 

predicate> denotes a symbolic or a numeric predicate. 
{is, isnot}, whereas the numeric predicates are {<, >, 
 a symbolic predicate. <value> denotes a value. It can 

The significance factor of a condition represents the 
condition in drawing the conclusion(s). (For example 
). 
ut, intermediate and output neurules. An input neurule 
t variables in its conditions and intermediate or output 
n intermediate neurule is a neurule having at least one 

onditions and intermediate variables in its conclusions. 
ng an output variable in its conclusions.  

Cn
)

-1 

(b)


2.3 The neurule based architecture 
In Fig.2, the architecture of a hybrid neurule-based expert system is illustrated. The 
run-time system (in the dashed rectangle) consists of three modules, functionally 
similar to those of a conventional rule-based system: the neurule base (NRB), the 
hybrid inference mechanism (HIM) and the working memory (WM). 
 

The NRB contain
4). The initial neur
mechanism (TRM) a
(training) data. The H
the input data in the
has the same forma
value the special sym
user or intermediate/

3. The Basic Ideas 
The main objective 
of as fine granularity
it is clear from the p
to and look much 
component, we retai
neurules are conside
neural units, which 
possible neural granu

The main problem
classify training pat
function. A well kn
non-separability in m

WM 

NRB HIM

M 

Input data Final conclusions 

Training data 

Facts 

Neurules 
TR
Fig.2 A neurule-based expert system architecture 
 

s neurules produced from empirical (training) data (see Section 
ules, constructed by the user, are trained using the training 
nd the training examples produced from the available empirical 

IM is responsible for making inferences by taking into account 
 WM and the rules in the NRB. The WM contains facts. A fact 
t as a condition/conclusion of a rule, however, it can have as 

bol “unknown”. Facts represent either conditions given by the 
final conclusions produced during an inference course.  

of our approach is to produce modular hybrid knowledge bases 
 as possible. Our main choice to this end is to use neurules. As 

revious section, neurules are hybrid rules that give pre-eminence 
like symbolic rules. By giving pre-eminence to the symbolic 
n the naturalness and modularity of symbolic rules. Internally, 
red as independent neural units, more specifically, as adaline 
are individually trained. As a single neural unit is the finest 
le, a neurule is the finest possible hybrid granule. 
 with this is the inherent inability of a single neural unit to 

terns (examples) that correspond to a non-separable (boolean) 
own such function is the XOR function, which is the basis of 

any cases. The training examples that correspond to the XOR 

Initial neurules 


function with two input variables are presented in Table 1, where v1 and v2 represent 
the input values (‘0’ and ‘1’ mean ‘false’ and ‘true’ respectively). Also, d represents 
the output value (‘-1’ and ‘1’ mean ‘inactive’ and ‘active’ respectively). We call 
success examples the patterns with d=1 and failure examples those with d=-1. From 
a knowledge representation point of view, success examples result in a cell’s 
activation and hence in (positive) knowledge production, whereas failure examples 
act like a protection from misactivations (produce negative knowledge).  

As it is known, there is no single-cell network that can represent the XOR 
function14. For example, a single unit cannot correctly classify all four patterns of 
Table 1. However, it can classify any three of them. Hence, if we remove one of the 
success examples, the remaining examples can be used to train a single unit 
(neurule). Furthermore, to be able to represent the removed success example, we 
need a second unit (neurule). However, to avoid misactivations we should use the 
failure examples, alongside the removed success example, to train the second neural 
unit. Thus, two independent neural units (neurules) are needed to represent the two-
input XOR function. The first unit is trained to classify the examples in the set {[0 0 
-1], [0 1 1], [1 1 -1]} and the second those in {[0 0 -1], [1 0 1], [1 1 -1]}. 
 
 
In crea

• Ea
ex

• Th
sam

The first 
activated,
two units 
satisfy the
examples
training s
that is the

Similar
four indep
represents
four succ

Table 1. Two-input XOR examples Table 2. Three-input XOR examples 
v1 v2 d 
0 0 -1 
ting these training sets, we had two i
ch unit (neurule) should be acti
ample. 
ere should be no two units (neuru
e success example. 

requirement assures that, there is 
 that is will never produce an outpu
(neurules) that will be activated by 
se requirements, contains one of th

. This means that the success examp
ubsets. Notice, here, that the two s
y have no common input values. 
 things hold for the three-input XO
endent neural units (neurules) are n
 a subset of the eight training examp
ess examples and all of the failure 

0 

1 

1 

1 

0 

1 

1 

1 

-1 
v1 v2 v3 d 
0 0 0 -1 
mplicit requirements in mind: 
vated by at least one success 

les) that can be activated by the 

no unit (neurule) that will never be 
t. The second assures that, there are no 
the same data. Each subset, in order to 
e success examples and all the failure 
les are the basis for the creation of the 
uccess examples are totally unrelated, 

R function (see Table 2). In this case, 
ecessary to represent it. Each of them 
les of Table 2 that includes one of the 
examples. Again, three of the success 

0 
0 
0 
1 
1 
1 
1 

0 
1 
1 
0 
0 
1 
1 

1 
0 
1 
0 
1 
0 
1 

 1 
 1 
-1 
 1 
-1 
-1 
1 


examples are totally unrelated, whereas the forth has only one common input value 
with the others. 

This analysis, which shows that totally or greatly unrelated success examples may 
cause non-separability, and an experience with a related problem12 led us to 
introduce the notion of ‘closeness’ between success examples (see Section 4.2) as a 
criterion for the creation of the training subsets with more than one success example. 
Closeness is used to guide distribution of success examples between subsets. 

In using the independent units (neurules) coming from the same training set, if 
one of them gets active, there is no need to evaluate the rest ones, since there is no 
possibility to have other activated units (neurules) for the same input data. 

4. Neurule Base Construction  

The algorithm for constructing a hybrid rule base from empirical (training) data is 
outlined below: 

1. Determine the input, intermediate and output variables and use dependency 
information to construct an initial neurule for each intermediate and output 
variable.  

2. Determine the training set for each initial neurule from the training data, train 
each initial neurule using its training set and produce the corresponding 
neurule(s). 

3. Put the produced neurules into the neurule base. 

In the sequel, we elaborate on each of the first two steps of the algorithm. 

4.1 Constructing the initial neurules 
To construct initial neurules, first we need to know or determine the input, 
intermediate and output variables. Then, we need dependency information. 
Dependency information indicates which variables (concepts) the intermediate and 
output variables (concepts) depend on. If dependency information is missing, then 
output variables depend only on input variables, as indicated by the training data. 

In constructing an initial neurule, all the conditions including the input, 
intermediate and the output variables that contribute in drawing a conclusion (which 
includes an intermediate or an output variable) constitute the inputs of the initial 
neurule and the conclusion its output. So, a neurule has as many conditions as the 
possible input, intermediate and output variable-value pairs. Also, one has to 
produce as many initial neurules as the different intermediate and output variable-
value pairs specified. Let’s assume that in the medical diagnosis domain there are 
four symptoms and two diseases. The symptoms are expressed by the conditions C1, 
C2, C3 and C4, and the diseases by the conclusions D1 and D2. We also know that 
D1 depends on C1, C2, C3 and D2 on C3, C4. Then, the following initial neurules 
are constructed: “(0) if C1 (0), C2 (0), C3 (0) then D1”, “(0) if C3 (0), C4 (0) then 
D2”. A zero initial value is assigned to each factor by default, except if the user 
assigns non-zero ones. For specific examples, see Section 6. 


4.2 Training the initial neurules 
From the initial training data, we extract as many (sub)sets as the initial neurules. 
Each such set, called a training set, contains training examples in the form [v1 v2 … 
vn d], where vi, i= 1, …,n are their component values, which correspond to the n 
inputs of the neurule, and d is the desired output (‘1’ for success, ‘-1’ for failure). 
Each training set is used to train the corresponding initial neurule and calculate its 
factors. The learning algorithm employed is the standard least mean square (LMS) 
algorithm14.  

However, there are cases where the LMS algorithm fails to specify the right 
significance factors for a number of neurules. That is, the corresponding adaline 
units of those rules do not correctly classify some of the training examples in their 
training sets. This means that the training examples correspond to a non-separable 
(boolean) function.  

To overcome this problem, the training set of the initial neurule is divided into 
subsets in a way that each subset contains success examples, which are “close” to 
each other in some degree. The closeness between two examples is defined as the 
number of common component values. For example, the closeness of [1 0 1 1 1] and 
[1 1 0 1 1] is ‘2’. Also, we define as least closeness pair (LCP), a pair of success 
examples with the least closeness in a training set. There may be more than one LCP 
in a training set. 

Initially, a LCP in the training set is found and two subsets are created each 
containing as its initial element one of the success examples of that pair, called its 
pivot. Each of the remaining success examples are distributed between the two 
subsets based on their closeness to their pivots. More specifically, each subset 
contains the success examples which are closer to its pivot. Then, the failure 
examples of the initial set are added to both subsets, to avoid neurule misfiring. 
After that, two copies of the initial neurule, one for each subset, are trained. If the 
factors of a copy misclassify some of its examples, the corresponding subset is 
further split into two other subsets, based on one of its LCPs. This continues, until 
all examples are classified. This means that from an initial neurule more than one 
final neurule may be produced, which are called sibling neurules. 

So, step 2 of the algorithm for each initial neurule is analyzed as follows: 
 2.1 From the initial training data, produce as many initial training sets as 

the number of the initial neurules. 
 2.2 Train each initial neurule, by applying the LMS algorithm to its initial 

training set. If the calculated factors classify correctly all the examples, 
produce the corresponding neurule. Otherwise, find a LCP and produce 
two subsets of the initial set. In each subset put its pivot, the success 
examples of the initial training set which are closer to the pivot, and all 
the failure examples. In constructing the subsets, success examples 
with the same closeness to both pivots are put in the subset containing 
a success example with the greatest closeness to it, otherwise to the 
one with the least number of success examples. 

 2.3 For each subset do the following: 


  2.3.1 Perform training of a copy of the corresponding initial neurule and 
calculate its factors.  

  2.3.2 If the calculated factors misclassify examples belonging to the 
subset, further divide the subset into smaller subsets as in step 2.2 
and apply step 2.3. 

  2.3.3 Produce the corresponding neurule. 

5. The Inference Mechanism 

Although the focus of this paper is not on the inference of the system, for the sake of 
completeness, we concisely refer to it in this section. The hybrid inference 
mechanism (HIM) implements the way neurules co-operate to reach a conclusion. 
HIM gives pre-eminence to symbolic reasoning, which is based on a backward 
chaining strategy. As soon as the initial input data is given and put in the WM, the 
output neurules are considered for evaluation. One of them is selected for 
evaluation. Selection is based on textual order. A rule succeeds if the output of the 
corresponding adaline unit is computed to be ‘1’, after evaluation of its conditions 
(inputs).  

A condition evaluates to ‘true’, if it matches a fact in the WM, that is there is a 
fact with the same variable, predicate and value. A condition evaluates to 
‘unknown’, if there is a fact with the same variable, predicate and ‘unknown’ as its 
value. A condition cannot be evaluated if there is no fact in the WM with the same 
variable. In this case, either a question is made to the user to provide data for the 
variable, in case of an input variable, or an intermediate neurule in NRB with a 
conclusion containing that variable is examined, in case of an intermediate variable. 
A condition with an input variable evaluates to ‘false’, if there is a fact in the WM 
with the same variable, predicate and different value. A condition with an 
intermediate variable evaluates to ‘false’ if additionally to the latter there is no 
unevaluated intermediate neurule in the NRB that has a conclusion with the same 
variable. Inference stops either when one or more output neurules are fired (success) 
or there is no further action (failure). 

In this process, because sibling neurules concern the same conclusion, if one of 
them fires, there is no need to evaluate any of the rest. To further increase inference 
efficiency, a number of heuristics are used12.  

6. Examples and Experimental Results 

In this section, we present application of our algorithm for constructing neurule 
bases to two different sets of training data. More specifically, we present the 
construction process and the resulted neurule base in each case. Also, we compare 
three methods for choosing the LCP. 

6.1 Fitting contact lenses 
The first data set was taken from a machine learning ftp repository15. It consists of 

24 patterns (p1-p24) and concerns empirical data for fitting contact lenses (see Table 


3). There are four input variables and one output variable. The input variables (with 
their possible values) are: age (young, pre-presbyopic, presbyopic), spectacle 
prescription (myope, hypermyope), astigmatic (no, yes), tear rate (reduced, 
normal). The output variable is: lenses-class (hard-lenses, soft-lenses, no-lenses). 
There are no intermediate variables, so there is no dependency information. Given 
that the output variable can take three possible values, we need three initial neurules, 
corresponding to the three possible conclusions. The output variable depends on all 
input variables, as empirical data shows. So, each initial neurule contains all the 
conditions related to the input variables. Each input variable produces as many 
conditions as its possible values. So, each neurule has nine conditions. The initial 
neurules are the same as the first three final neurules (see Fig. 3), except that the 
initial neurules have zero factors. 

 
Table 3. Data set for fitting contact lenses 

 
Pat. No age spect-pres astigmatic tear-rate lenses-class 

p1 young myope no reduced no-lenses 
p2 young myope no normal soft-lenses 
p3 young myope yes reduced no-lenses 
p4 young myope yes normal hard-lenses 
p5 young hypermetrope no reduced no-lenses 
p6 young hypermetrope no normal soft-lenses 
p7 young hypermetrope yes reduced no-lenses 
p8 young hypermetrope yes normal hard-lenses 
p9 pre-presbyopic myope no reduced no-lenses 

p10 pre-presbyopic myope no normal soft-lenses 
p11 pre-presbyopic myope yes reduced no-lenses 
p12 pre-presbyopic myope yes normal hard-lenses 
p13 pre-presbyopic hypermetrope no reduced no-lenses 
p14 pre-presbyopic hypermetrope no normal soft-lenses 
p15 pre-presbyopic hypermetrope yes reduced no-lenses 
p16 pre-presbyopic hypermetrope yes normal no-lenses 
p17 presbyopic myope no reduced no-lenses 
p18 presbyopic myope no normal no-lenses 
p19 presbyopic myope yes reduced no-lenses 
p20 presbyopic myope yes normal hard-lenses 
p21 presbyopic hypermetrope no reduced no-lenses 
p22 presbyopic hypermetrope no normal soft-lenses 
p23 presbyopic hypermetrope yes reduced no-lenses 
p24 presbyopic hypermetrope yes normal no-lenses 

 
The training sets of the initial neurules are extracted from the empirical data 

(Table 3) and are given in Table 4. In that table, the following correspondences are 
considered: age1 → ‘age is young’, age2 → ‘age is pre-presbyopic’, age3 → ‘age is 
presbyopic’, spec-pres1 → ‘spectacle-prescription is myope’, spec-pres2 → 
‘spectacle-prescription is hypermetrope’, astig1 → ‘astigmatic is no’, astig2 → 
‘astigmatic is yes’, tear-rate1 → ‘tear-rate is reduced’, tear-rate2 → ‘tear-rate is 
normal’, lenses-class1 → ‘lenses-class is hard-lenses’, lenses-class2 → ‘lenses-class 
is soft-lenses’, lenses-class3 → ‘lenses-class is no-lenses’. 
 
 
L1: (-2.4) if age is young (4.8), 
                    age is pre-presbyotic (-4.4), 
                    age is presbyotic (-4.5), 
                    spectacle is myope (-0.9), 
                    spectacle is hypermetrope (-2.1),
                    astigmatic is no (-6.4), 
                    astigmatic is yes (3.3), 
                    tear-rate is reduced (-7.5), 
                    tear-rate is normal (4.8) 
                then lenses-class is hard-lenses 
 
L2: (-2.4) if age is young (-0.5), 
                    age is pre-presbyotic (-0.3), 
                    age is presbyotic (-2.7), 
                    spectacle is myope (-4.0), 
                    spectacle is hypermetrope (0.9),
                    astigmatic is no (2.9), 
                    astigmatic is yes (-6.4), 
                    tear-rate is reduced (-7.4), 
                    tear-rate is normal (4.4) 
               then lenses-class is soft-lenses 

L3: (0.8) if age is young (-0.6), 
                   age is pre-presbyotic (-0.6), 
                   age is presbyotic (1.2), 
                   spectacle is myope (1.6), 
                   spectacle is hypermetrope (-0.2), 
                   astigmatic is no (2.7), 
                   astigmatic is yes (-2.0), 
                   tear-rate is reduced (4.4), 
                   tear-rate is normal (-4.6) 
              then lenses-class is no-lenses 
 
L4: (-0.7) if age is young (-6.2), 
                    age is pre-presbyotic (1.6), 
                    age is presbyotic (3.2), 
                    spectacle is myope (-5.8), 
                    spectacle is hypermetrope (4.7), 
                    astigmatic is no (-4.1), 
                    astigmatic is yes (2.6), 
                    tear-rate is reduced (3.4), 
                    tear-rate is normal (-4.5) 
               then lenses-class is no-lenses 

Table 4. Initial training sets for the contact lenses example 
 

lenses-class Ex. 
No 

age 
1 

age 
2 

age 
3 

spec-
pres

1 

spec-
pres

2 

astig
1 

astig
2 

tear-
rate

1 

tear-
rate

2 1 2 3 

p1 1 0 0 1 0 1 0 1 0 -1 -1 1 
p2 1 0 0 1 0 1 0 0 1 -1 1 -1 
p3 1 0 0 1 0 0 1 1 0 -1 -1 1 
p4 1 0 0 1 0 0 1 0 1 1 -1 -1 
p5 1 0 0 0 1 1 0 1 0 -1 -1 1 
p6 1 0 0 0 1 1 0 0 1 -1 1 -1 
p7 1 0 0 0 1 0 1 1 0 -1 -1 1 
p8 1 0 0 0 1 0 1 0 1 1 -1 -1 
p9 0 1 0 1 0 1 0 1 0 -1 -1 1 

p10 0 1 0 1 0 1 0 0 1 -1 1 -1 
p11 0 1 0 1 0 0 1 1 0 -1 -1 1 
p12 0 1 0 1 0 0 1 0 1 1 -1 -1 
p13 0 1 0 0 1 1 0 1 0 -1 -1 1 
p14 0 1 0 0 1 1 0 0 1 -1 1 -1 
p15 0 1 0 0 1 0 1 1 0 -1 -1 1 
p16 0 1 0 0 1 0 1 0 1 -1 -1 1 
p17 0 0 1 1 0 1 0 1 0 -1 -1 1 
p18 0 0 1 1 0 1 0 0 1 -1 -1 1 
p19 0 0 1 1 0 0 1 1 0 -1 -1 1 
p20 0 0 1 1 0 0 1 0 1 1 -1 -1 
p21 0 0 1 0 1 1 0 1 0 -1 -1 1 
p22 0 0 1 0 1 1 0 0 1 -1 1 -1 
p23 0 0 1 0 1 0 1 1 0 -1 -1 1 
p24 0 0 1 0 1 0 1 0 1 -1 -1 1 

 
Fig. 3. The neurule base for the contact lenses example 


Each of the three training sets consists of 24 examples. The examples in each of 
them have the same input patterns, but different output values, which are presented 
in Table 4 under the columns 1, 2 and 3 of the output variable ‘lenses-class’. 

For the first two initial neurules, the calculated factors successfully classified all 
training examples. The produced neurules L1 and L2 are presented in Fig. 3. 
However, it didn’t happen the same with the third initial neurule. So, from its initial 
training set two subsets were produced. 

There were six LCPs found (with closeness equal to ‘1’): (p1, p16), (p1, p24), 
(p7, p18), (p9, p24), (p15, p18) and (p16, p17). The first of them was chosen as the 
LCP. Then, from the third initial training set, two subsets were produced. The first, 
with pivot p1, included the success examples p3, p5, p9, p17, p18, p19 and p21, 
which were closer to p1 than to p16, and all failure examples. The second subset, 
with pivot p16, included the success examples p7, p11, p13, p15 and p24 and all the 
failure examples. Training of both copies of the third initial neurule was successful 
and two final neurules, L3 and L4, were produced (see Fig. 3). 

6.2 Diseases of the sarcophagus 
The second set of training data was taken from14. It includes 8 patterns that 

concern acute theoretical diseases of the sarcophagus. According to the example 
in14, there are six symptoms (Swollen feet, Red ears, Hair loss, Dizziness, Sensitive 
aretha, Placibin allergy), two diseases (Supercilliosis, Namastosis), whose diagnosis 
is based on the symptoms and three possible treatments (Placibin, Biramibio, 
Posiboost).  

 
Table 5. Data set for the diseases of sarcophagus 

 
Pat. 
No 

Sym
1 

Sym
2 

Sym
3 

Sym
4 

Dis  
1 

Dis  
2 

Treat 
1 

Treat 
2 

Treat 
3 

p1 T F ×××× F T F T F T 
p2 F T T F F T T T F 
p3 T T F T T T F F F 
p4 F F T F F F F F F 
p5 ×××× T T T T T F T T 
p6 F T T F T T T T F 
p7 T F F T T F F F F 
p8 T F T T F T F F F 
 
For reasons that will become clear in Section 7, we omit the symptoms ‘Swollen 

feet’ and ‘Red ears’ as well as their related data. Thus, symptoms are reduced to 
four. Also, we consider that ‘Supercilliosis’ does not any more depend on 
symptoms, because they have been removed, but it is given as input information. 
The training data for our example are given in Table 5, where Sym1 → ‘Hair loss’, 
Sym2 → ‘Dizziness’, Sym3 → ‘Sensitive aretha’, Sym4 → ‘Placibin allergy’, Dis1 
→ ‘Supercilliosis’, Dis2 → ‘Namastosis’, Treat1 → ‘Placibin’, Treat2 → 
‘Biramibio’ and Treat3 → ‘Posiboost’. Also, ‘T’and ‘F’ mean ‘true’and ‘false’ 


respectively and ‘××××’ means ‘unknown’. Finally, dependency information is provided 
(see Table 6), which shows the dependency between concepts. 

There is one input variable, namely symptom, which can take four possible values 
(the four symptoms). Also, there is one variable, called disease, which is both an 
input and an intermediate variable, depending on its value (see dependency 
information in Table 6). It can take two possible values (the two diseases). Finally, 
there is a last variable, treatment, which is both an intermediate and an output 
variable (see dependency information in Table 6) and can take three possible values 
(the three treatments). Because there are totally four possible values for the 
intermediate and output variables, four initial neurules are required.  

 
Table 6. Dependency information for the sarcophagus diseases problem 

 
 Sym
1 

Sym
2 

Sym
3 

Sym
4 

Dis  
1 

Dis  
2 

Treat 
1 

Treat 
2 

Namastosis 
(Dis2) √√√√ √√√√ √√√√      

Placibin 
(Treat1)    √√√√ √√√√ √√√√   

Biramibio 
(Treat2) √√√√    √√√√ √√√√   

Posiboost 
(Treat3)       √√√√ √√√√ 

 
The training sets for the four initial rules, which were extracted from the training 

data of Table 5, are presented in Tables 7-1 to 7-4. Notice that we didn’t use patterns 
including the ‘unknown’ value.  

 
Table 7-1. Training set for D1 

HairLoss 
(Sym3) 

Dizziness 
(Sym4) 

Sensitive aretha 
(Sym5) 

Namastosis 
(Dis2) 

1 0 1 1 
0 1 1 1 
1 0 0 -1 
0 0 1 -1 
1 1 0 1 

 
Table 7-2. Training set for D2 

Placibin allergy 
(Sym6) 

Supercilliosis 
(Dis1) 

Namastosis 
(Dis2) 

Placibin 
(Trea1) 

0 0 1 1 
0 1 0 1 
1 1 1 -1 
0 0 0 -1 
0 1 1 1 
1 1 0 -1 
1 0 1 -1 


Table 7-3. Training set for D3 

HairLoss 
(Sym3) 

Superscilliosis 
(Dis1) 

Namastosis 
(Dis2) 

Biramibio 
(Trea2) 

1 1 0 -1 
0 0 1 1 
1 1 1 -1 
0 0 0 -1 
0 1 1 1 
1 0 1 -1 

 
Table 7-4. Training set for D4-5 

Placibin 
(Trea1) 

Biramibio 
(Trea2) 

Posiboost 
(Trea3) 

1 0 1 
1 1 -1 
0 0 -1 
0 1 1 

 
Fig. 4. The neurule base for the sarcophagus diseases example 
 

D1: (-2.2) if Symptom is Dizziness (4.6), 
   Symptom is SensitiveAretha  (1.8), 
  Symptom is HairLoss (0.9) 
                  then Disease is Namastosis 
 
D2: (-0.4) if  Symptom is PlacibinAllergy (-5.4),
   Disease is Namastosis (1.8), 
  Disease is Supercilliosis (1.0) 
                  then Treatment is Placibin 
 
D3: (-0.4) if Symptom is HairLoss (-3.6), 
   Disease is Namastosis (1.8), 
  Disease is Supercilliosis (1.0) 
                  then Treatment is Biramibio 
 
D4: (-0.4) if Treatment is Biramibio (-4.4), 
  Treatment is Placibin (1.8) 
                  then Treatment is Posiboost 
 
D5: (-0.4) if Treatment is Placibin (-3.6), 
  Treatment is Biramibio (1.0) 
                  then Treatment is Posiboost 


The calculated factors of all the initial neurules, except the last one, successfully 
classified all the training examples, even those containing the ‘unknown’ value, 
which were not used (generalization capability). Thus, three final neurules were 
produced (D1-D3 in Fig. 4). The fourth initial neurule, pertaining to the treatment 
Posiboost, failed to classify its respective set of training examples (Table 7-4), 
because they correspond to a non-separable function (XOR type). Thus, two subsets 
were created containing the first three and the last three examples respectively. 
Finally, two neurules were produced (D4, D5).  

6.3 Choosing the least closeness pair 
A point of interest in training a neurule with a non-separable training set is how to 
choose a least closeness pair (LCP), in the process of producing the two subsets of 
the initial training set. Not all LCPs result in the same number of final neurules. So, 
we are looking for the pair that finally produces the minimum number of sibling 
neurules. We tried two heuristic methods for that. The best distribution method (BD) 
suggests choosing the pair that assures distribution of the two elements of the other 
pairs in different sets. So, examples with least closeness will be included in different 
sets, which may assure separability. The second, the mean closeness method (MC), 
computes the mean closeness of each of the two subsets to be created from each 
LCP. The mean closeness of a subset is the mean closeness of its examples. Then, 
calculates the mean closeness of the subsets created by each pair, which is the mean 
closeness of the two subsets, and chooses the pair with the greatest mean closeness. 

However, none of them faces all cases successfully. On the other hand, random 
choice method (RC) could be an alternative. In Table 8, results of using the above 
two heuristics and the random choice are presented. As random choice, we got the 
first LCP. The data used was indirectly taken from a medical rule base of 41 
symbolic rules. We extracted training patterns from the symbolic rules by the 
method for training sets specification described in12. We did so, because we knew 
the optimal number of the neurules to be produced. In Table 8, Opt No indicates the 
optimal number of neurules and OLCPs the optimal LCPs, that is the LCPs that lead 
to the optimal number of final neurules. 

 
Table 8. Comparison of methods for the least closeness pair choice 

Conclusion Exam-
ples 

Condi-
tions 

LCPs OLCPs Opt 
No 

MC BD RC 

inflammation 72 8 7 5 2 2/3 2 3 
arthritis 144 9 3 2 2 2 2/3 2 

primary-malignant 120 10 8 3 2 2 2/3 2 
secondary-
malignant 

72 7 2 1 2 2 2/3 3 

early-inflammation 324 11 7 7 4 4 4 4 
soft-tissue-early-

bone- inflammation 
288 11 2 2 2 2 2 2 

early-soft-tissue-
inflammation 

270 11 2 2 3 3 3 3 

 
As we can see from Table 8, none of the methods assures optimality of the 
number of the produced neurules in all cases. The expression ‘2/3’ means that the 
number of neurules can be 2 or 3. This is because there were more than one pairs 
that met the criterion of the method (e.g. had the same mean closeness or distributed 
the elements of the pairs in different subsets), but they weren’t all optimal. The MC 
method did a bit better than the BD method only in cases with a relatively small 
number of examples. However, random choice didn’t do bad, because,as a matter of 
fact, OLCPs is a large part, if not all, of LCPs. On the other hand, the MC method is 
computationally more expensive than the BD method and this latter than RC. So, 
given the computational effort required in the two heuristic methods, especially in 
the mean closeness, RC seems to be the best choice. Of course, some more 
experiments, with different rule bases, would give a more confident view on that. 

7. Discussion and Related Work 

The main contribution of this work is the representation of non-separable empirical 
data in a hybrid, natural and modular way, for use in expert systems, in contrast to 
existing connectionist approaches. A method for representing non-separable training 
examples in a connectionist expert system is presented in14. It is called the 
“distributed method”. What that method does is to introduce a number of 
intermediate cells, between the inputs and the related output, called “distributed 
cells”, whose bias and weights are randomly generated. This extra layer between 
inputs and output makes representation of non-separable examples possible. 

The problem with that method is that those intermediate cells have no meaning, 
that is there are no concepts related to the problem assigned to them, as happens 
with the other nodes in a connectionist knowledge base. Also, there is no specific 
way to determine the number of the required intermediate cells. Thus, the resulted 
knowledge base is unnatural and complicated.  

Following the process for generating a connectionist knowledge base from 
empirical data described in14, we constructed the connectionist knowledge base 
corresponding to the data set for fitting contact lenses (see Section 6.1). According 
to the process, each node is individually trained. In case of non-separable training 
examples, the distributed method is used. The resulted (real) knowledge base is 
depicted in Fig. 5 and its corresponding neural network in Fig. 6, where we used real 
numbers for the weights and biases, instead of integers. 

The knowledge base in Fig. 5 is actually a matrix that represents the connections, 
their weights and the biases of the cells (concepts) in the network of Fig. 6. A zero at 
a position in the matrix shows that there is no connection between the input cell 
(variable) of its column and the intermediate or output cell (variable) of its row. 

In the network of Fig. 6, there are three output cells (the cyclic ones) representing 
the three outputs (conclusions), three intermediate (distributed) cells (the triangles) 
introduced for representing the non-separable training examples of the ‘no-lenses’ 
training set and nine input cells representing the nine input values. For readability 
reasons, we didn’t draw all the connections neither put all the weights on the net of 
Fig. 6. Actually, all inputs are connected to all intermediate and all output cells and 
the outputs of all the intermediate cells are connected to the output cells. 


Lenses class 1 
-13.2 8.4 1.0 0.9 0.9 -2.1 -6.4 5.1 -5.7 4.8 0 0 0 
Lenses class 2 
-11.4 3.1 3.3 2.7 -4.0 2.7 2.9 -4.6 -7.4 6.2 0 0 0 
Intermediate variable 1 
3.2 0 -0.8 -3.6 7.8 -6.5 7.7 -6.7 -11 10.4 0 0 0 

Intermediate variable 2 
-4.0 -3.6 2.8 3.6 4.2 -2.9 4.1 -3.1 7.0 -7.6 0 0 0 

Intermediate variable 3 
-7.6 0 6.4 0 0.6 0.7 0.5 0.5 -0.2 -0.4 0 0 0 

Lenses class 3 
5.0 -5.4 -2.6 1.8 -1.2 2.5 -1.3 2.3 1.6 -2.2 -9.6 9.8 -5.3 

Bias age 
1 

age 
2 

age 
3 

spec
-pr1 

spec
-pr2 

astig
1 

astig
2 

tear-
r1 

tear-
r2 

Inter
1 

Inter
2 

Inter
3 

 
Fig. 5. The connectionist knowledge base for the contact lenses example 

 
age 
1 

age 
2 

age 
3 

spec-
pr1 

spec-
pr2 

astig 
1 

astig
2 

tear-
r1 

tear-
r2 

 
Fig. 6. The neural network for the contact lenses example 

 
A comparison of the knowledge base in Fig. 5 to the one in Fig. 3 demonstrates 

the advantages of neurules. It is clear that the benefits of symbolic rule-based 
representation, such as naturalness and modularity are retained. Neurules are 
understandable, since significance factors represent the contribution of 
corresponding conditions in drawing the conclusion. On the other hand, the 
connectionist knowledge base is a multilevel network with some meaningless 
intermediate units. Thus, it lacks the naturalness of neurules. 

The corresponding connectionist knowledge base for the diseases of the 
sarcophagus example is depicted in Fig. 7. It is a modified version of that presented 
in14, to fit our modified example. The corresponding network is a multi-level 
network with three distributed cells, introduced by the training algorithm.  

Let’s suppose now that we get some new knowledge, which says that 
‘Supercilliosis’ should not be given as input information, but it will be produced as 
an intermediate conclusion. Also, that it depends on ‘Hair loss’ and two new 

hard-lenses 
(lenses class 1) 

soft-lenses 
(lenses class 2) 

no-lenses 
(lenses class 3)

Inter1 
Inter2

Inter3

5.0 -11.4-13.2 

-5.4 

-2.6 
-2.2 

1.6 
-0.8 

3.2
-4 -7.6 


symptoms (inputs), namely ‘Swollen feet’ and ‘Red ears’, according to the (training) 
data given in Table 9. 

 
Namastosis 

-1 3 3 3 0 0 0 0 0 0 0 0 0 
Placibin 

-2 0 0 0 -4 2 2 0 0 0 0 0 0 
Biramibio 

-1 -4 0 0 0 1 3 0 0 0 0 0 0 
Intermediate variable 1 

2 0 0 0 0 0 0 -4 5 0 0 0 0 
Intermediate variable 2 

3 0 0 0 0 0 0 -2 2 0 0 0 0 
Intermediate variable 3 

0 0 0 0 0 0 0 -1 -3 0 0 0 0 
Posiboost 

3 0 0 0 0 0 0 -3 1 -3 -3 -1 0 
Bias Sym 

1 
Sym 

2 
Sym 

3 
Sym 

4 
Dis 
1 

Dis 
2 

Treat
1 

Treat
2 

Int
1 

Int
2 

Int
3 

Treat
3 

 
Fig. 7. The connectionist knowledge base for the diseases of the sarcophagus example. 

 
Table 9. Training data for Supercilliosis 

HairLoss 
(Sym1) 

SwollenFeet 
(Sym5) 

RedEars 
(Sym6) 

Supercilliosis 
(Dis1) 

1 1 1 1 
0 0 0 -1 
1 0 0 1 
0 1 1 -1 
0 1 0 1 
1 0 1 -1 

 
To introduce this new knowledge to our neurule base, we train a neurule with three 
conditions, corresponding to the three symptoms of Table 8, and a conclusion 
related to ‘Supercilliosis’. The result is the final neurule D6 depicted in Fig. 8, 
which is put into the neurule base. 

To introduce that knowledge into the connectionist knowledge base of Fig. 7, not 
only training of a new unit is needed, but also modifications should be made to the 
knowledge base. More specifically, a new row (concerning ‘Superscilliosis’) and 
two new columns (concerning ‘Swollen feet’ and ‘Red ears’) should be added. 

 
Fig. 8. The new neurule concerning ‘Supercilliosis’. 
 

D6: (-0.4) if Symptom is RedEars (-4.4), 
   Symptom is SwollenFeet (3.6), 
  Symptom is HairLoss (2.7) 
                  then Disease is Supercilliosis 


Furthermore, one can easily add new neurules to or remove old neurules from a 
neurule base without making any changes to the knowledge base, since neurules are 
functionally independent units, given that they do not affect existing knowledge. 
Thus, a type of incremental development of the knowledge base is still supported, 
although by larger knowledge chunks. This corresponds to introducing one or more 
networks in an existing connectionist knowledge base sharing or not inputs and/or 
intermediate cell outputs. This is either difficult or impossible to do. 

 
8. Conclusions 
In this paper, we introduce a method for generating neurules, a kind of hybrid rules, 
from empirical data of binary type. Neurules integrate neurocomputing and 
production rules. Each neurule is represented as an adaline unit. Thus, the 
corresponding rule base consists of a number of neurules (autonomous adaline 
units). In this way, the produced neurule base retains the modularity of symbolic 
rule bases. Also, it retains their naturalness, since neurules look much like symbolic 
rules. Furthermore, incremental development is still supported. This is in contrast to 
existing connectionist knowledge bases, which are not modular and thus do not 
actually offer incremental development. 

A difficult point in our approach is the inherent inability of the adaline unit to 
classify non-separable training examples. We overcome this difficulty by 
introducing the notion of ‘closeness’, as far as the training examples are concerned. 
That is, in case of failure, from the training set of the neurule two subsets of ‘close’ 
examples are produced and two copies of the neurule are trained. Failure of any 
copy training leads to further subsets production until success is achieved. 

A weak point of the neurules is the fact that we have multiple representations of 
the same knowledge, in case of sibling rules. 

Given the capability of producing neurules from empirical binary data and their 
advantages over symbolic rules, as far as inference efficiency and the rule base size 
are concerned12, we can argue that neurules are more suitable for representing 
knowledge in web-based intelligent tutoring systems than symbolic rules. This is our 
current continuation on this research. 

Acknowledgements 
This work was partially supported by the GSRT of Greece, Program ΠΕΝΕ∆’99, 

Project No Ε∆234. 

References 
[1] L. M. Fu (Ed), Proceedings of the International Symposium on Integrating Knowledge 

and Neural Heuristics (ISIKNH’94), Pensacola, FL (May 1994). 
[2] R. Sun and E. Alexandre (Eds), Connectionist-Symbolic Integration: From Unified to 

Hybrid Approaches, Lawrence Erlbaum (1997). 
[3] M. Hilario, An Overview of Strategies for Neurosymbolic Integration, ch.2 in [2]. 
[4] L-M Fu and L-C Fu, Mapping rule-based systems into neural architecture, Knowledge-

Based Systems 3 (1990) 48-56. 


[5] F. Kozato and Ph. De Wilde, How Neural Networks Help Rule-Based Problem Solving, 
Proceedings of the ICANN’91 (1991) 465-470. 

[6] Tan C.L. and T. S. Quash, Implementation of rule-based expert systems in a neural 
network architecture, The World Congress on Expert Systems Proceedings (1991) 1843-
1851. 

[7] Kuncicky D. C., S. I. Hruska and D. C. Lacher, Hybrid Systems: The equivalence of rule-
based expert system and artificial neural network inference, International Journal of 
Expert Systems, (1992) 4(3) 281-297. 

[8] S. I. Gallant, Connectionist Expert Systems, CACM, 31 (1988) 152-169.  
[9] B. Boutsinas and M. N. Vrahatis, Nonmonotonic Connectionist Expert Systems, 

Proceedings of the 2nd WSES/IEEE/IMACS International Conference on Circuits, 
Systems and Computers, Athens, Hellas (Oct. 1998). 

[10] A. Z. Ghalwash, A Recency Inference Engine for Connectionist Knowledge Bases, 
Applied Intelligence 9 (1998) 201-215. 

[11] I. Hatzilygeroudis, J. Prentzas, Neurules: Integrating Symbolic Rules and 
Neurocomputing, in D. Fotiades and S. Nikolopoulos (Eds), Advances in Informatics, 
World Scientific Pub., 2000, 122-133. 

[12] I. Hatzilygeroudis and J. Prentzas, Neurules: Improving the Performance of Symbolic 
Rules, International Journal on Artificial Intelligence Tools (IJAIT), 9(1) (2000) 113-
130. 

[13] I. Hatzilygeroudis and J. Prentzas, Producing Modular Hybrid Rule Bases for Expert 
Systems, Proceedings of the 13th International FLAIRS Conference, Orlando, FL (May 
2000) 181-185. 

[14] S. I. Gallant, Neural Network Learning and Expert Systems, MIT Press (1993). 
[15] ftp://ftp.ics.uci.edu/pub/machine-learning-databases/