Refinement of the HEPAR Expert System: Tools and Techniques∗

Peter Lucas
Department of Computer Science, Utrecht University
Padualaan 14, 3584 CH Utrecht, The Netherlands

June 26, 2013

Abstract

Methods and tools for the static and dynamic verification and validation of software systems are commonplace in the field of software engineering. In the field of expert systems, where it is more difficult to ensure that a system meets the specifications and expectations than in traditional software engineering, such tools are generally not available. In this paper, the need for more support of the development process by methods and tools is illustrated by the approach taken in building the HEPAR system, a rule-based expert system that can be used as a supportive tool in the diagnosis of disorders of the liver and biliary tract. At a certain stage in the development of this system an incremental development methodology was adopted, in which implementation of parts of the expert system was followed by dynamic validation. For this purpose, a collection of software tools was implemented as extensions to a rule-based expert-system shell. These tools provide valuable information about the effects of modification of the HEPAR knowledge base, and indicate places in the knowledge base for refinement. It is believed that similar software tools may prove helpful in the development of other expert systems as well.

Keywords. Knowledge engineering; validation of expert systems; refinement of expert systems; evaluation of expert systems; expert-system validation tools.

1 Introduction

In the field of software engineering it is generally recognized that the implementation of large software systems must be supported by methods and tools for their verification and validation [19]. Often a distinction is made between static and dynamic approaches to verification and validation.
Typical examples of static methods are program code inspection and methods for proving program correctness. The application of static methods does not require the program to be executed. In contrast, dynamic verification and validation methods involve executing the program and examining its output when it is presented with certain input. As has been pointed out repeatedly by many researchers, dynamic methods can be used only to demonstrate the presence of errors in a program, and never to demonstrate their absence. Despite this fundamental limitation, software engineers consider dynamic methods indispensable aids in the software-development process, and supporting software tools are invariably included in programming environments. It is therefore ironic that in the development of expert systems, where it is much more difficult to ensure that the system meets its specifications and expectations than in software engineering, tools that aid in the dynamic verification and validation of the system are not generally available.

∗Published in: Artificial Intelligence in Medicine, 6(2), 175–188, 1994.

In this paper we discuss several static and dynamic methods, including some software tools, which were applied during the development of the HEPAR system, a rule-based expert system in the field of hepatology that offers support in the diagnosis of disorders of the liver and biliary tract. For a description of this system and of two successive studies of its performance, the reader is referred to [10, 11, 12]. Taking the development of the HEPAR system as a real-life example, we describe where in the development process the need for static and dynamic verification and validation methods and tools arose. This experience provides further evidence that in the development process of expert systems more support by methods and tools tailored to verification and validation is needed than is presently offered.

The structure of the paper is as follows.
In Section 2, we review the design and implementation of the HEPAR system and analyse the problems encountered in the design stage. Several static verification and validation methods were employed at this stage of the project. In Section 3, the subject of knowledge-base refinement is related to expert-system validation using dynamic techniques. In Section 4, we describe some simple software tools which were used for the purpose of dynamic validation during the refinement of the HEPAR system.

2 Development of the HEPAR system

2.1 Knowledge acquisition and design

It is now well recognized that the acquisition of domain knowledge in the process of building an expert system is a difficult task [6]. In recent years, many methodologies have therefore been proposed, providing systematic methods to be followed in building an expert system. Examples of such methodologies are KADS [3] and KEATS [15, 16]. Some of these methodologies include a set of software tools which help the knowledge engineer in building a specific application, mainly by assisting in the analysis of the problem domain. Some assistance by software tools may be provided in the design of the expert system as well. Examples of such tools are Shelley [2] and Acquist [15, 16]. Most methodologies place considerable emphasis on the process of gathering domain knowledge to be incorporated into the expert system, and on the development of conceptual models of the domain, being the result of the analysis of the knowledge collected.

Although the HEPAR system was actually designed long before such methodologies came into play, the development of the system was initially carried out in a structured way, mainly following the top-down design methodology from software engineering. The knowledge concerning diagnosis in liver and biliary disease incorporated into the HEPAR system was derived from the experience of a specialist in internal medicine and hepatology and from the medical literature.
The analysis of the problem of diagnosis of disorders of the liver and biliary tract indicated that the following aspects were important in this domain:

• Expert hepatologists follow a clear and unambiguous strategy in diagnosis. The early classification of a patient’s disorder into rather general categories, such as whether or not the disorder is biliary-obstructive in nature, is used for the selection of supplementary tests to reduce the number of alternative diagnoses to be considered.

• Early in the diagnostic process only a limited amount of patient data is available, mainly obtained from medical history and physical examination. Still, a hepatologist is often capable of coming up with a working diagnosis of the patient’s disorder.

In the design of the HEPAR system, the strategy followed by the hepatologist has been taken as the point of departure for problem decomposition of the diagnostic process, by distinguishing several subtasks [11]. The requirement that the ultimate system ought to be able to assist the clinician in the initial assessment of the patient, for whom only a limited amount of data is available, as well as in the assessment of the patient for whom more specific test results are known, proved extremely difficult to meet. In general, explicitly dealing with data not available for the patient would yield an exponential number of combinations of conditions on known and unknown patient findings to be taken into account. Although the hepatologist involved in the project was able to reduce the number of useful combinations considerably, we felt quite uncertain about the suitability of this knowledge for classifying actual patient cases.

Note that current knowledge-acquisition methodologies do not offer much help in solving this problem. In most popular methodologies, the design process is essentially viewed as the process of abstraction from reality.
Our problem was that we required some form of experimental feedback in refining the expert system to accommodate it to reality. Likewise, only limited attention has been given to tools that provide information about the diagnostic quality of the advice produced by the expert system.

2.2 Implementation and experimental feedback

Implementation of the HEPAR system was carried out using the EMYCIN-like expert-system shell DELFI-2 [9]. The advice produced by the system is in the form of a collection of conclusions:

1. Whether the patient has a hepatocellular or biliary-obstructive disorder.
2. Whether the features found in the patient indicate benign or malignant disease.
3. A differential diagnosis consisting of a collection of specific disorders, selected out of a set of about 80 disorders, each explaining some of the findings observed in the patient.

In HEPAR, the first two conclusions are intermediate and the last is a final conclusion. After completing a considerable portion of the knowledge base, it was decided to carry out some experiments with the system using data from real patients to investigate whether the system was able to meet our expectations. It appeared that the system was unable to come up with acceptable advice in many cases. An analysis of the results of this initial experiment yielded the following reasons for the disappointing performance:

• Many rules were formulated too rigorously, such that these rules almost never applied in a patient with the given disease.

• Many rules were defined without explicitly mentioning the medical context in which they should hold. These rules frequently succeeded in patients for whom they had not been designed.
    if
         same(patient,complaint,abdominal pain) and
      ⇒  notsame(patient,complaint,fever) and
         same(patient,signs,hepatomegaly) and
         same(pain,character,continuous) and
         same(ultrasound liver,parenchyma,multiple cysts) and
         same(patient,nature-disorder,benign)
    then
         conclude(patient,diagnosis,polycystic disease) with CF = 1.00
    fi

Figure 1: Weakly formulated production rule.

    if
         (same(complab,duration,chronic) or
          same(patient,type-disorder,biliary obstructive)) and
         same(patient,sex,female)  ⇒  same(patient,sex,male) and
         same(serol,mitochondrial Ab,yes)
    then
         conclude(patient,diagnosis,primary biliary cirrhosis)
         with CF = 0.80  ⇒  with CF = 0.60
    fi

Figure 2: Rigorously formulated production rule.

These problems may actually be taken as an indication of the knowledge clinicians draw upon in medical practice. Firstly, the knowledge of the clinician is partly based on the descriptions given in medical textbooks, in which there is little place for the description of atypical disease patterns, and partly on experience in the management of specific disorders. Rules which have been formulated too rigorously tend to describe the typical picture of the disease, and may assume the availability of an unrealistic amount of data for the patient. Secondly, the clinician has considerable experience with disorders frequently observed in clinical practice. However, these disorders carry a clinical context which the clinician may not be able to make explicit. Formalizing such knowledge may yield rules with a wider application than intended.

As an example, consider the production rule depicted in Figure 1. In this HEPAR rule we use the object–attribute–value representation and certainty factors as employed in the DELFI-2 system [9, 13]. This rule was originally formulated without the second condition (indicated by the right arrow); for the modified rule to be applicable, fever must be absent in the patient. So, in its original form it is an example of a too weakly formulated production rule.
This missing condition caused the original rule to interact with Caroli’s disease, infected liver cysts and cystic liver metastases.

The production rule shown in Figure 2 is an example of a too rigorously formulated rule, since in the form given to the left of the arrows, the rule is only applicable to female patients. Typically, a patient having primary biliary cirrhosis is female, but the disorder is not limited to the female sex. A new production rule was therefore added to the knowledge base in which the expressions to the right of the arrows replaced the expressions to the left of the arrows.

As said above, clinicians often have to base their early decisions on incomplete clinical evidence; likewise, expert systems must be able to deal with incomplete evidence. In designing and implementing the HEPAR system we have tried to handle incomplete patient data explicitly in the following two ways:

1. By distinguishing several different conceptual levels in the diagnostic problem-solving process;
2. By explicitly incorporating knowledge about unknown diagnostic test results for a patient into the knowledge base.

To deal with the first source of incompleteness of information in the HEPAR system, rules were drafted covering only the symptoms and signs of a disorder obtained early in the diagnostic process, whereas other rules were drafted covering only the results of supplementary tests obtained later in the diagnostic process. In this way a more or less layered structure of the knowledge base was obtained. The second source of incompleteness was dealt with by inspecting rules for conditions on data not always available for the patient. Some of these rules were used as a basis for new rules containing conditions concerning unknown data. Mainly static verification and validation methods were employed at this stage; the early experiments were only carried out to validate our design rationale.
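The effect of the added condition in Figure 1 can be illustrated with a minimal Python sketch. It is not the DELFI-2 implementation: the names `Facts`, `Rule`, `same` and `notsame` below are hypothetical stand-ins, certainty-factor propagation is left out, and it is assumed that `notsame` succeeds whenever the value has not been recorded for the patient. The patient data are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# Facts are stored as (object, attribute) -> set of recorded values.
Facts = Dict[Tuple[str, str], Set[str]]
Condition = Callable[[Facts], bool]


def same(obj: str, attr: str, value: str) -> Condition:
    """Succeeds if the value is recorded for the object-attribute pair."""
    return lambda facts: value in facts.get((obj, attr), set())


def notsame(obj: str, attr: str, value: str) -> Condition:
    """Succeeds if the value is not recorded (assumption: unknown counts as absent)."""
    return lambda facts: value not in facts.get((obj, attr), set())


@dataclass
class Rule:
    conditions: List[Condition]
    conclusion: Tuple[str, str, str]
    cf: float  # certainty factor attached to the conclusion (not propagated here)

    def applies(self, facts: Facts) -> bool:
        # A rule succeeds only if every one of its conditions succeeds.
        return all(cond(facts) for cond in self.conditions)


CONCLUSION = ("patient", "diagnosis", "polycystic disease")

# Conditions of the original, too weakly formulated rule of Figure 1.
original_conditions = [
    same("patient", "complaint", "abdominal pain"),
    same("patient", "signs", "hepatomegaly"),
    same("pain", "character", "continuous"),
    same("ultrasound liver", "parenchyma", "multiple cysts"),
    same("patient", "nature-disorder", "benign"),
]
original_rule = Rule(original_conditions, CONCLUSION, cf=1.00)

# The refined rule adds the condition marked with the arrow in Figure 1.
refined_rule = Rule(
    original_conditions + [notsame("patient", "complaint", "fever")],
    CONCLUSION,
    cf=1.00,
)

# Invented patient with fever, for example a case of infected liver cysts.
febrile_patient: Facts = {
    ("patient", "complaint"): {"abdominal pain", "fever"},
    ("patient", "signs"): {"hepatomegaly"},
    ("pain", "character"): {"continuous"},
    ("ultrasound liver", "parenchyma"): {"multiple cysts"},
    ("patient", "nature-disorder"): {"benign"},
}

# The same invented patient without fever.
afebrile_patient: Facts = dict(febrile_patient)
afebrile_patient[("patient", "complaint")] = {"abdominal pain"}

original_fires_on_febrile = original_rule.applies(febrile_patient)
refined_fires_on_febrile = refined_rule.applies(febrile_patient)
refined_fires_on_afebrile = refined_rule.applies(afebrile_patient)
```

In this sketch the original rule succeeds for the febrile patient, whereas the refined rule succeeds only when fever is absent, which is the narrowing of context described above.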
3 Knowledge base refinement and validation

3.1 Refinement parameters

The process of refining an expert system may be viewed as the iterative process of validating, extending and adapting its knowledge base. Here, we are concerned with dynamic validation. Since the extension and adaptation are based on the results of the validation, it is clearly important to decide on the parameters used for validating an expert system. The point of departure for expert system validation is to consider a medical diagnostic expert system as a computer program that tries to construct a model of a given patient which it compares with prestored descriptions of disease patterns in its knowledge base. Under ideal circumstances, validation of a diagnostic expert system should therefore not only pertain to the advice produced by the expert system, but also to the assumptions made by the reasoning process on which the conclusions are based. It is, for instance, important to know whether it is possible for an expert system to arrive at conclusions which are correct, but based on incorrect assumptions. So, the conclusions of a medical diagnostic expert system should not be interpreted as unique answers, but as judgements of the patient’s status [8]. As a consequence, not unique answers but judgements should be validated. This view of validation is also able to cope with situations in which the expert system’s conclusions are incorrect, but nevertheless acceptable in the light of the data available. This approach to validation is particularly valuable when applied in refining an expert system; it is less appropriate in the final validation of an expert system [7].

3.2 Refinement by dynamic validation

For the purpose of the refinement of the HEPAR knowledge base, we have investigated several approaches to system validation. A diagnostic expert system like HEPAR may be validated against:

1. Patient cases with known clinical diagnosis;
2.
The conclusions of some other, but similar decision support system;
3. The judgement of human experts in the field.

In all three cases, there is a need for a test, or a combination of tests, that may be taken as a ‘gold standard’ for the comparison, although this is not as crucial as in the final validation of an expert system, because inspection of the knowledge base may provide additional information. Examples of tests that are suitable as a gold standard in the diagnosis of disorders of the liver and biliary tract are the histological examination of liver biopsies, ERCP and surgical exploration. There is not a single test available in hepatology that may be employed as a gold standard in the entire domain, because diagnosis of hepatocellular disorders differs considerably from diagnosis of biliary obstruction.

Early in the refinement, we used part of the data from more than 1000 patients obtained from the Danish COMIK group as a source for comparison. Originally, these data were used in the development of the Copenhagen Pocket Chart, a paper chart based on the statistical technique of logistic regression, which may assist the clinician in the early assessment of a patient with jaundice [14]. Unfortunately, this database was of limited value because it distinguished only 23 disease categories, whereas HEPAR distinguishes more than 80 disorders of the liver and biliary tract, and because not all data required for HEPAR to derive final conclusions were included in the database. Therefore, the database was mainly useful for getting insight into the extent to which HEPAR was capable of dealing with incomplete patient data. During the further development of the system, a database with patient data from the Leiden University Hospital was put together, which was applied as the main device for the refinement of the knowledge base.
Comparison of an expert system with some similar decision-support system is seldom straightforward, because of differences in required input and produced output among the systems. For the refinement of the HEPAR system, we have included production rules that map diagnostic conclusions of HEPAR to the possible diagnostic conclusions of the Copenhagen Pocket Chart. The results produced by the Copenhagen Pocket Chart could thus be used as a simple means for rapidly finding patient cases which deserved further study with regard to the conclusions produced by HEPAR.

The hepatologist involved in the project has studied the reasoning process of HEPAR for a considerable number of patients. Validation of the reasoning process of HEPAR turned out to be very time-consuming, and as a consequence we have not been able to involve other hepatologists. However, on a limited scale we have profited from discussion with other hepatologists in examining the results produced by HEPAR.

Taking the conclusions concerning the final diagnosis produced by the HEPAR system as a point of departure for refining the system was difficult, because the system’s advice consists of more than one conclusion. This problem is known as the multiple response problem [18]. Consider for example the situation that the expert system did produce the correct answer with highest ranking, as well as a conclusion with lower ranking which is however totally unacceptable to the clinician. Restricting the validation only to the single conclusion with the highest ranking will give a distorted account of the actual performance of the system. These problems cannot be solved by only collecting information concerning the number of conclusions generated by the system and the ranking of the clinical diagnosis. Furthermore, consider the situation in which the conclusion with highest ranking is incorrect, but where the differential diagnosis as a whole is acceptable to the clinician.
Taking only the conclusion with highest ranking into account will then give an inadequate impression of the system’s capabilities. However, due to the layered approach to diagnostic problem solving modelled in HEPAR, it is not only possible to compare specific disorders as a diagnosis with the clinical diagnosis confirmed in the patient, but also to check whether the patient’s disorder has been classified into the right diagnostic category (e.g. hepatocellular disorder). This layered approach makes it less likely that the differential diagnosis produced by the system is as a whole unacceptable to the clinician.

Most of the information obtained by the dynamic validation of the HEPAR knowledge base was automatically compiled by a collection of simple software tools which are discussed in the next section. Without the availability of these tools, dynamic validation would have been too time-consuming to be practically feasible.

4 Tools for knowledge base refinement

4.1 Testing tools in software engineering

Some of the software tools that have been developed for the dynamic validation of the HEPAR system, and applied in the refinement and the performance studies of the system, have been inspired by software tools commonly available in programming environments. Test-data generators, programs that systematically produce test data to be used as input to the program to be tested, are one type of program that may be useful in the dynamic validation of expert systems. However, for realistic testing, data from real-life cases are often indispensable. Another tool is the dynamic analyser, also known as the execution flow summarizer. A dynamic analyser adds instrumentation statements to a computer program in order to collect information on how many times a statement is executed. A display part of the dynamic analyser prints a summarizing execution report [19]. A tool with similar usage as the dynamic analyser is the call-graph profiler [5].
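The idea of a dynamic analyser can be sketched in a few lines of Python. This is only an illustration at the granularity of functions rather than statements, and the names `instrumented`, `execution_counts`, `execution_report` and `frequency_distribution`, as well as the two example functions, are hypothetical and not taken from any of the tools cited above.

```python
from collections import Counter
from functools import wraps
from typing import Callable, Dict, Mapping

# Global tally of how often each instrumented function was executed.
execution_counts: Counter = Counter()


def instrumented(func: Callable) -> Callable:
    """Wrap a function so that every call is counted (the instrumentation part)."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        execution_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


def execution_report(counts: Mapping[str, int]) -> str:
    """Summarize the counts, most frequently executed first (the display part)."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{name}: {count}" for name, count in ordered)


def frequency_distribution(counts: Mapping[str, int]) -> Dict[int, int]:
    """Map each execution frequency to the number of functions with that frequency."""
    return dict(sorted(Counter(counts.values()).items()))


# Two hypothetical program units to be analysed.
@instrumented
def is_positive(x: int) -> bool:
    return x > 0


@instrumented
def is_even(x: int) -> bool:
    return x % 2 == 0


# Run the instrumented program on some test input.
for value in [3, -2, 8]:
    if is_positive(value):
        is_even(value)

report = execution_report(execution_counts)
distribution = frequency_distribution(execution_counts)
print(report)
```

The same counting scheme carries over to a rule-based system if each production rule is treated as the unit being counted, so that the report lists how often each rule was applied over a set of cases.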
4.2 A tool for performance measurement

The environment of software tools developed for the dynamic validation of HEPAR consists of a non-interactive batch version of the expert-system shell DELFI-2 which is able to use a database of patient cases as its input. This system produces a report containing the results for each individual patient. The report, together with a file containing information about the final clinical diagnoses and the two intermediate conclusions concerning the patient, is then processed by a program which produces a table summarizing the results. Figure 3 shows the overall structure of the validation environment. The tools collect information with regard to the number of correct, incorrect and unclassified patient cases concerning:

1. the type of hepatobiliary disease (hepatocellular or biliary-obstructive);
2. the nature of the disorder (benign or malignant);
3. the final diagnosis.

[Figure 3: Environment for dynamic validation. Components shown: HEPAR knowledge base; database of patient cases; batch version of DELFI-2; report; statistics program; final clinical diagnosis of patients; summarizing tables; rule-application analyser; rule-application report/graph.]

With regard to the final diagnosis, the system computes the average number of conclusions and determines whether the clinical diagnosis occurs as the conclusion ranked first or among the list of alternatives generated. An example of such a table, produced after refinement of the HEPAR system using the software tools and a database with data from 82 patients from Leiden University Hospital, is reproduced in Table 1. The reader should note that the system does not reach 100% correctness, for reasons discussed in Section 3.1.

To obtain some insight into the capabilities of the system in handling incomplete data, the batch version of the expert-system shell can select part of the patient data from the database.
For example, the system is capable of selecting only data obtained from history and physical examination. An example of a table in which the results for incomplete data are produced using this environment is shown in Table 2. As can be seen in Table 2, the number of correct final diagnoses decreased considerably when the patient data entered into the system were more incomplete. However, the percentage of incorrectly classified cases did not increase significantly; only the percentage of unclassified cases did.

Table 1: Diagnostic results of HEPAR for a population of 82 patients with hepatobiliary disease.

    Conclusion                             Correct   Incorrect  Unclassified  Total
                                           n (%)     n (%)      n (%)         n (%)
    Type of hepatobiliary disorder         74 (90)   4 (5)      4 (5)         82 (100)
    Benign/malignant nature of disorder    78 (95)   4 (5)      0 (0)         82 (100)
    Final diagnosis                        71 (86)   8 (10)     3 (4)         82 (100)
    Clinical diagnosis among conclusions   76 (93)   3 (4)      3 (4)         82 (100)

Table 2: Assessment of the effects of incompleteness of information on the diagnostic conclusions of the system, for a database of 82 patients with hepatobiliary disease.

    Conclusion                               Correct (%)    Incorrect (%)   Unclassified (%)
                                             A    B    C    A    B    C     A    B    C
    Type of hepatobiliary derangement        90   90   50   5    5    0     5    5    50
    Benign or malignant nature of disorder   95   95   94   5    5    6     0    0    0
    Final diagnosis                          86   49   24   10   9    10    4    43   66

    A: All available data presented to system.
    B: Only data concerning symptoms, signs, haematology and blood chemistry (no data from ultrasound or serology presented).
    C: Only data from medical interview and physical examination.

Tables such as those presented here were used by the hepatologist as an indication of the effects of changes to the knowledge base. An accompanying textual report provided information for each individual patient, and also gave information about the rules applied in deriving the final conclusions. This information served as a point of departure for a more in-depth study of the reasoning behaviour of the system.
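The kind of summary the statistics program produces for the final diagnosis can be sketched as follows. This is a hedged illustration, not the original program: the `Case` record and the functions `outcome`, `summarize`, `among_conclusions` and `average_conclusions` are hypothetical, the four cases are invented, and it is assumed that a case counts as correct when the clinical diagnosis is ranked first, as incorrect when some other conclusion is ranked first, and as unclassified when no conclusion is produced.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Case:
    clinical_diagnosis: str
    conclusions: List[str]  # conclusions of the system, highest ranking first


def outcome(case: Case) -> str:
    """Classify one case as correct, incorrect or unclassified."""
    if not case.conclusions:
        return "unclassified"
    if case.conclusions[0] == case.clinical_diagnosis:
        return "correct"
    return "incorrect"


def summarize(cases: List[Case]) -> Dict[str, Tuple[int, int]]:
    """Return (n, rounded percentage) for each outcome category."""
    counts = Counter(outcome(case) for case in cases)
    total = len(cases)
    return {
        category: (counts[category], round(100 * counts[category] / total))
        for category in ("correct", "incorrect", "unclassified")
    }


def among_conclusions(cases: List[Case]) -> int:
    """Number of cases whose clinical diagnosis occurs anywhere in the conclusions."""
    return sum(case.clinical_diagnosis in case.conclusions for case in cases)


def average_conclusions(cases: List[Case]) -> float:
    """Average number of conclusions produced per case."""
    return sum(len(case.conclusions) for case in cases) / len(cases)


# Four invented cases, for illustration only.
cases = [
    Case("primary biliary cirrhosis", ["primary biliary cirrhosis", "polycystic disease"]),
    Case("polycystic disease", ["primary biliary cirrhosis", "polycystic disease"]),
    Case("polycystic disease", []),
    Case("polycystic disease", ["polycystic disease"]),
]

summary = summarize(cases)
n_among = among_conclusions(cases)
mean_conclusions = average_conclusions(cases)
```

Running the same summary on a database restricted to part of the patient data (for instance history and physical examination only) would give the columns of a table such as Table 2.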
4.3 Dynamic analysis of the HEPAR knowledge base

In the previous section, we have discussed how the study of the results of the HEPAR system for individual patient cases has been employed for refining the diagnostic quality of HEPAR. A second source of information that has been used for the refinement of the system was the contents of the HEPAR knowledge base itself, studied through the overall behaviour of the system when provided with a complete database of patient data. The tools used for this purpose bear some resemblance to a dynamic analyser as described in Section 4.1. In order to obtain information concerning the frequency of rule application over a given database of patient cases, the testing environment discussed in Section 4.2 includes a collection of tools which use the report produced by the batch version of the expert system for a database of patient cases. The following results are produced by these programs:

• An enumeration of all production rules used, with for each rule information about how often it has been used for a given database;

• An overview of the frequency distribution of the rule application, both in textual and in graphical form.

Figure 4, which was automatically produced by the environment, contains the results after refinement of the HEPAR knowledge base for the 82 patient cases from Leiden University Hospital. Most production rules (72 of about 500 rules contained in HEPAR) were applied only once. The accompanying textual form, which is reproduced in Figure 5, shows that of the rules that were applied several times, those with highest frequency were applied to conclude about the intermediate hypotheses. Only a few production rules, one to three, were applied several times to reach a final conclusion.

[Figure 4: Rule-application bar graph for 82 patients. Axes: absolute frequency (ticks 4 to 44) and number of rules (ticks 0 to 80).]

Because we have tried to obtain a rulebase
in which at most a few rules will succeed for a given case, the report does not provide results concerning failed rules. The reports were studied by the hepatologist involved in the project as another source for the refinement of HEPAR.

5 Discussion

Recent knowledge-engineering methodologies place considerable emphasis on the development of conceptual models. A suitable conceptual model may be of real help in designing and implementing an expert system, as was also observed in the development of the HEPAR system, where a diagnostic problem-solving strategy was used as the basis for problem decomposition and structuring of the knowledge base. However, in many fields of medicine, the development of an expert system is only possible with sufficient experimental feedback, for which software tools are required.

Some other software tools supporting the building of rule-based expert systems have been developed in the past. Teiresias was an experimental tool that assisted in the refinement of rule-based expert systems by interacting with the user in the analysis of the conclusions concerning single cases, applying meta-knowledge about the rulebase [4]. Although such an analysis is certainly useful, an approach as embodied by Teiresias does not give information about how well the system performs over a database of cases. Seek is a system that automatically suggests generalizations and specializations of production rules, based on the analysis of the success and failure of rules on processing case data [17]. This system is more in line with our approach. The broadness of the domain of hepatology, and the amount of patient data incorporated in the HEPAR system, suggest that automatic techniques as provided by Seek are as yet not powerful enough to be applied for refining a system like HEPAR. The parameters used as a point of departure for the refinement of the HEPAR knowledge base are only a few of the many that are possible.
Another elegant example of knowledge-base refinement has been proposed by Adlassnig and Scheithauer in the context of the CADIAG-2/PANCREAS system [1]. They have used ROC curves for the optimal adjustment of the internal classification threshold in this expert system. The technique is, however, not applicable in the HEPAR system, because here failure of classification is the result of logical falsification, and not of the failure of reaching an internal threshold.

    FREQUENCY  #RULES  RULE NUMBER
    1          72      161, 500, 510, 660, 700, 720, 790, 800, 860, 900, 930, 950,
                       980, 1100, 1140, 1150, 1330, 1340, 1370, 1410, 1480, 1540,
                       1610, 1630, 1710, 1730, 1820, 1980, 2000, 2010, 2030, 2050,
                       2140, 2240, 2250, 2380, 2530, 2680, 2690, 2750, 3011, 3020,
                       3022, 3090, 3100, 3101, 3120, 3160, 3171, 3192, 3220, 3340,
                       3370, 3425, 3430, 3450, 3460, 3470, 3520, 3530, 3610, 3780,
                       3820, 3870, 3902, 3930, 3960, 4000, 4111, 4122, 4162, 4260
    2          26      110, 120, 130, 440, 460, 470, 740, 830, 1090, 1130, 1490,
                       1940, 1950, 2130, 2180, 2370, 3010, 3041, 3110, 3390, 3410,
                       3427, 3428, 3440, 3920, 3940
    3          30      111, 112, 131, 132, 140, 190, 230, 240, 560, 580, 600, 620,
                       1220, 1900, 1970, 2060, 2080, 2110, 2170, 2220, 2280, 2310,
                       2720, 3190, 3350, 3970, 4230, 4240, 4250, 4252
    4          7       260, 341, 890, 1580, 2160, 3080, 4320
    5          8       1550, 1560, 2120, 2350, 3001, 3060, 3102, 3420
    6          7       160, 300, 2330, 3050, 3070, 3900, 3903
    7          11      90, 100, 332, 1210, 1570, 2200, 2320, 2340, 2970, 3000, 3910
    8          4       270, 290, 340, 350
    9          2       150, 180
    10         3       250, 4080, 4310
    12         2       540, 3103
    18         2       220, 4020
    22         1       370
    26         1       5010
    27         1       5000
    30         1       5030
    36         1       711
    45         1       5020

Figure 5: Textual rule application form.

Most current expert-system shells and expert-system builder tools do not offer facilities supporting the refinement of an expert system by dynamic validation. In the development of the HEPAR system, we have therefore developed a collection of simple software tools that provide useful information for the refinement of the system.
These tools have also been used in two successive final validation studies of HEPAR [10, 12]. Although there are many ways in which these tools can be improved, these validation studies would not have been possible without them. In our opinion, future software tools for building expert systems should offer a wider range of facilities for the detailed analysis, verification and validation of an expert system than is currently provided.

References

[1] K.P. Adlassnig and W. Scheithauer, Performance evaluation of medical expert systems using ROC curves, Computers and Biomedical Research 22 (1989) 297-313.
[2] A. Anjewierden, J. Wielemaker, C. Toussaint, Shelley – Computer aided knowledge engineering, in: B. Wielinga, J. Boose, B. Gaines, G. Schreiber and M. van Someren, eds., Current Trends in Knowledge Acquisition (IOS Press, Amsterdam, 1990) 41-59.
[3] J. Breuker and B. Wielinga, Models of expertise in knowledge acquisition, in: G. Guida and C. Tasso, eds., Topics in Expert Systems Design (North-Holland, Amsterdam, 1989) 265-295.
[4] R. Davis and D.B. Lenat, Knowledge-based Systems in Artificial Intelligence (McGraw-Hill, New York, 1982).
[5] S.L. Graham, P.B. Kessler and M.K. McKusick, Gprof: a call graph execution profiler, in: Proceedings of the SIGPLAN’82 Symposium on Compiler Construction, SIGPLAN Notices 17 (1982) 120-126.
[6] G. Guida and C. Tasso, Building expert systems: from life cycle to development methodology, in: G. Guida and C. Tasso, eds., Topics in Expert Systems Design: methodologies and tools (North-Holland, Amsterdam, 1989) 3-24.
[7] J. Hilden and J.D.F. Habbema, Evaluation of clinical decision aids – more to think about, Medical Informatics 15 (1990) 275-284.
[8] C.A. Kulikowski and S.M. Weiss, Representation of expert knowledge for consultation: the CASNET and EXPERT projects, in: P. Szolovits, ed., Artificial Intelligence in Medicine (Westview Press, Boulder, 1982) 21-56.
[9] P.J.F.
Lucas, Knowledge Representation and Inference in Rule-based Systems. Centre for Mathematics and Computer Science, Report CS-R8613, Amsterdam, 1986.
[10] P.J.F. Lucas, R.W. Segaar, A.R. Janssens, HEPAR: an expert system for the diagnosis of disorders of the liver and biliary tract, Liver 9 (1989) 266-275.
[11] P.J.F. Lucas, A.R. Janssens, Development and validation of HEPAR, an expert system for the diagnosis of disorders of the liver and biliary tract, Journal of Medical Informatics 16 (1991) 259-270.
[12] P.J.F. Lucas, A.R. Janssens, Second evaluation of HEPAR, an expert system for the diagnosis of disorders of the liver and biliary tract, Liver 11 (1991) 340-346.
[13] P.J.F. Lucas, L.C. van der Gaag, Principles of Expert Systems (Addison-Wesley, Wokingham, 1991).
[14] P. Matzen, A. Malchow-Møller, J. Hilden, C. Thomsen, L.B. Svendsen, J. Gammelgaard, E. Juhl, Differential diagnosis of jaundice: a pocket diagnostic chart, Liver 4 (1984) 360-71.
[15] E. Motta, T. Rajan and M. Eisenstadt, A methodology and tool for knowledge acquisition in KEATS-2, in: G. Guida and C. Tasso, eds., Topics in Expert Systems Design (North-Holland, Amsterdam, 1989) 297-322.
[16] E. Motta, T. Rajan, J. Domingue and M. Eisenstadt, Methodological foundation of KEATS, the knowledge engineer’s assistant, in: B. Wielinga, J. Boose, B. Gaines, G. Schreiber and M. van Someren, eds., Current Trends in Knowledge Acquisition (IOS Press, Amsterdam, 1990) 257-275.
[17] P.G. Politakis, Empirical Analysis for Expert Systems (Pitman, London, 1985).
[18] R.E. Shannon, Systems Simulation: the art and science (Prentice-Hall, Englewood Cliffs, New Jersey, 1975).
[19] I. Sommerville, Software Engineering (Addison-Wesley, Wokingham, 1992).