Energy Conversion and Management, Vol. 36, No. 4, pp. 257-261, 1995. Copyright © 1995 Elsevier Science Ltd.

AN EXPERT SYSTEM APPROACH TO THE UNIT COMMITMENT PROBLEM

D. P. KOTHARI(1) and AIJAZ AHMAD(2)
(1) Centre for Energy Studies, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110 016 and (2) Electrical Engineering Department, Regional Engineering College, Srinagar, Kashmir, India

(Received 26 March 1994; received for publication 8 December 1994)

Abstract--An expert system plays a key role in the better conservation and management of electrical energy in a power station having a number of units that have to be committed for a given load. This paper presents a hybrid expert system dynamic programming approach to the unit commitment problem. Here, the scheduling output of the usual dynamic programming is enhanced by supplementing it with the rule based expert system. The proposed system limits the number of constraints and also checks the possible constraint violations in the generated schedule. The expert system communicates with the operator in a friendly manner, and hence, the various program parameters can be adjusted to obtain an optimal, operationally acceptable schedule.

Keywords: Unit commitment; Dynamic programming; Expert system

1. INTRODUCTION

In order to select the most economical schedule of units in a power system, different program algorithms have been developed in the past. These algorithms are based on mathematical programming methods, such as dynamic programming (DP) [1, 7], Lagrangian relaxation [2], branch and bound, etc. Experience with unit commitment (UC) using a DP technique has shown that, in order to obtain a good schedule, operator experience should be included in setting up the control parameters. This results in tuning the heuristic data and input parameters to produce a lower cost and operationally acceptable schedule. A hybrid Artificial Neural Network (ANN)-DP approach [3] has also been used.

This paper proposes a hybrid expert system (ES) dynamic programming approach to unit commitment. Here, the scheduling output of the usual DP is enhanced by supplementing it with the rule based ES. This ES combines the knowledge of the UC programmer and a set of rules. The main feature of the ES, developed using Turbo Prolog V2.0, is that it makes the program usable by a non-expert user through its ability to guide the user through the important decision making processes in the problem set up period. The system guides the user in adjusting the marginal constraints, thereby ensuring a feasible solution within economical time limits. The method has been applied to a 10 unit system, wherein the schedule has been obtained by the ES to ensure a feasible solution. It provides a flexible programme structure that is easily adaptable to changes in the system operation.
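The DP formulation itself is not detailed in this paper. Purely as an illustration of the kind of hour-by-hour search that the ES supplements, the following Python sketch enumerates unit combinations over a short horizon. The three-unit data, the cost figures and the merit-order dispatch proxy are invented for the example, and minimum up/down times and spinning reserve are ignored.

UNITS = [(200, 20.0, 500.0), (150, 25.0, 300.0), (100, 30.0, 100.0)]  # (MW, $/MWh, start-up $)
DEMAND = [180, 320, 410, 250]          # MW for each hour of the horizon

def capacity(state):
    return sum(UNITS[i][0] for i in range(len(UNITS)) if state & (1 << i))

def run_cost(state, load):
    # Crude merit-order dispatch proxy; a real program would use the units'
    # cost curves and a proper economic dispatch step.
    remaining, cost = load, 0.0
    for i in sorted(range(len(UNITS)), key=lambda k: UNITS[k][1]):
        if state & (1 << i) and remaining > 0:
            take = min(UNITS[i][0], remaining)
            cost += take * UNITS[i][1]
            remaining -= take
    return cost

def start_cost(prev, new):
    return sum(UNITS[i][2] for i in range(len(UNITS))
               if (new & (1 << i)) and not (prev & (1 << i)))

def commit(demand):
    best = {0: (0.0, [])}                      # state before hour 1: all units off
    for load in demand:
        nxt = {}
        for prev, (cost, path) in best.items():
            for s in range(1 << len(UNITS)):
                if capacity(s) < load:         # this commitment cannot meet the load
                    continue
                c = cost + start_cost(prev, s) + run_cost(s, load)
                if s not in nxt or c < nxt[s][0]:
                    nxt[s] = (c, path + [s])
        best = nxt
    return min(best.values())                  # (total cost, committed set per hour)

total, schedule = commit(DEMAND)
print(total, [format(s, "03b") for s in schedule])

Even for this toy case the number of candidate states per hour grows as 2 to the power of the number of units, which is why operator experience and rule-based pruning of the search, of the kind proposed here, matter for realistic unit counts.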
2. EXPERT SYSTEM DEVELOPMENT

In the past, the ES approach for UC has been applied by Mokhtari et al. [5]. They developed an ES to analyze and improve the DP based UC. The ES tool selected is based on the inference engine of the well known Mycin ES and has been developed for the combustion turbine cycling problem. Ouang and Shahidepour [4] have proposed a UC ES which applies the EASE+ NEXPERT shell to obtain a commitment schedule by using heuristic rules. A user friendly interface and graphic utilities have also been set up.

In the work presented here, a rule based ES approach [6] for handling the UC problem is based on the application of an ES to supplement the DP technique so that a more appropriate solution is achieved.

2.1. Main features of the ES development

The main features of the ES developed here are:
(i) It combines the knowledge of the UC programmer with the rule base and, hence, can lead an inexperienced operator to a better unit schedule.
(ii) The knowledge base (KB) [9, 10], which is the data or knowledge used to make decisions, consists of two parts, i.e. the working memory and the rule base. The KB, containing the knowledge of both schedulers and mathematical programmers, is represented by if/then rule statements.
(iii) Each rule represents a piece of knowledge relevant to the problem, and rules can be added, removed and modified in the knowledge base conveniently.
(iv) Production rules are close to natural language and, therefore, easy to understand. This enables the user to communicate with the ES in a friendly manner in order to adjust various programme parameters, so that an optimal and acceptable solution is approached.
(v) It can give the steps that lead to the conclusions and answer the user's queries about the main phase of the problem. The user can confirm or correct the conclusion by examining the explanations given by the inference engine.

The ES developed functions in the following main capacities:
(i) Preprocessor of the DP results.
(ii) KB and reasoning.
(iii) Postprocessor of the result.
(iv) User consultant.

2.2. Preprocessor of the DP results

There are many constraints in the UC problem which are too difficult to include in the DP algorithm because they increase the program execution time exponentially. In order to adjust the input data properly, the operator must have a thorough knowledge of the UC programme. Constraints, both operational and complex, are accommodated in the ES as described below.

2.2.1. Inclusion of operational constraints. The operational constraints are included by the user, who is guided by the preprocessor. This is accomplished through an interactive question and answer session during which the user enters data and answers questions concerning the constraints, such as spinning reserve requirements, unit minimum up and down time, must run status, unit maintenance schedules, unit minimum and maximum generating limits, etc. The information regarding these constraints is requested by the system and has to be supplied by the user.
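As a minimal illustration of the data such a session might assemble (the field names below are invented for the sketch, not taken from the authors' program), one record per unit could look like this:

from dataclasses import dataclass, field

@dataclass
class UnitConstraints:
    name: str
    p_min: float               # minimum generating limit (MW)
    p_max: float               # maximum generating limit (MW)
    min_up: int                # minimum up time (h)
    min_down: int              # minimum down time (h)
    must_run: set = field(default_factory=set)   # hours designated "must run"
    must_out: set = field(default_factory=set)   # hours unavailable (maintenance or outage)

# The preprocessor would fill one such record per unit from the user's answers
# and hand the adjusted input data to the DP routine.
unit2 = UnitConstraints("unit-2", p_min=50.0, p_max=150.0, min_up=4, min_down=2,
                        must_run={4, 5, 6}, must_out=set())
print(unit2)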
Take the case of the constraint "must-run-status". Here, it is assumed that, during any given hour, one or more units in any group may be designated as "must run", i.e. those units for which there will never be any doubt about the need to commit them, either because of outstanding efficiency or exceptional capacity. There may also be certain units which are known, in advance, to be unavailable due to scheduled or forced outage. Such units are designated as "must out". All such information is supplied by the user.

2.2.2. Inclusion of complex constraints. The inclusion of complex constraints in the problem not only makes the problem difficult but also increases the program execution time. This is particularly true of constraints like "unit minimum up and down time", which are time dependent. Here, it is assumed that such a constraint is not violated frequently, so that it can be relaxed and handled external to the UC program itself by adjusting the input data which is fed to it by the preprocessor. Constraint violations can be detected by including rules in the knowledge base. Also, constraints like fuel constrained units, crew constraints and pollution constraints are present in actual system operation, and these can be generalized to form production rules that would be added to the knowledge base. In the present study, although an attempt was made to include as many constraints as possible, these constraints could not all be included, in order to avoid over-complicating the problem.

2.3. Knowledge base and reasoning

All the knowledge used in solving the UC problem is represented by either a property list or a production rule, i.e. facts and rules. The property list has the form object-attribute-value. For instance, if the maximum capacity of unit-1 is 500 MW, this knowledge is stored in the knowledge base by a triple: (Unit-1, maximum capacity, 500). Similarly, if the value of cost parameter A of unit 10 is Rs. 2500, this knowledge is stored as: (Unit-10, cost A, 2500). Other facts and system constraints are also represented in the same manner. The value portion of the triple can be updated if required.

Another form of knowledge is a production rule. A rule expresses the relationship between facts and has the form (if clause then clause). Such rule based representation allows the expert system to approach a problem in a way similar to a human expert. An example of knowledge representation is given below:

If the unit start up cost is low,
And the efficiency is high,
And the unit cost parameter values are low,
Then the unit has 'must-run-status'.

Such a rule is represented in Turbo Prolog as follows:

Hypothesis (must-run-status),
condition (start-up-cost-low),
condition (efficiency-high),
condition (cost-parameter-values-low).

The inference engine examines the first predicate of the rule. If the start up cost is low, the engine goes on to check the second predicate; if the efficiency is high, it goes on to check the third predicate; and if the cost parameter values are low, then the engine concludes that the unit has must-run-status. The facts for all units, like minimum and maximum generating limits, cost curve parameters, start up times etc., and the load curve for the period are stored in the data base. This corresponds to Prolog's static database. The preceding rule about must-run-status, for example, is a part of the rule base/static database.
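The same hypothesis/condition structure can be mimicked outside Turbo Prolog. The following Python fragment is only an illustration of how the engine checks each condition in turn; the unit facts and condition names are hypothetical.

# Illustrative facts about two units; the condition names mirror the rule above.
facts = {
    "unit-7": {"start-up-cost-low", "efficiency-high", "cost-parameter-values-low"},
    "unit-3": {"efficiency-high"},
}

RULES = {
    "must-run-status": ["start-up-cost-low", "efficiency-high",
                        "cost-parameter-values-low"],
}

def holds(unit, hypothesis):
    # Check each condition in turn, as the inference engine does.
    return all(condition in facts[unit] for condition in RULES[hypothesis])

for unit in facts:
    print(unit, "must-run" if holds(unit, "must-run-status") else "no conclusion")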
2.3.1. Production rules in the knowledge base. In order to obtain a fast solution, production rule based and priority list based heuristics are implemented. The rules used here take the priority list as the starting point, with additional heuristics leading to the optimal commitment decisions. The decisions in the ES developed here are made according to the following set of rules:

Rule 1: If a particular unit is at the top of the priority list of available uncommitted units but has a higher minimum up-time (say, 4 h) than the other units (say 1 h), and the spinning reserve requirement is insufficient only for, say, 1 h, then it is more economical to commit the second unit for 1 h rather than the first unit for at least its minimum up-time of 4 h.
Rule 2: If, at any stage, there is an outage associated with the normally committed units, then commit the next most economical unit.
Rule 3: Assign the value of maximum tolerable insecurity level (MTIL) to be 0.000445. If, due to some forced outage, this value of MTIL is not satisfied, then commit the units to the next nearest higher value of MTIL.
Rule 4: If the load is much less than expected, then the "must run" unit (if suggested by the user) should be one of the committed units in such a case.
Rule 5: If a unit is committed, decommitted and committed again, and the "off" status in between is close or equal to the minimum down time of the unit, commit the unit continuously at the required output.
Rule 6: At any instant, if some committed units of smaller capacity can be replaced in an "off" state, then do such replacement, preserving the spinning reserve constraint.
Rule 7: All the committed units must operate at least at their minimum output conditions.
Rule 8: If the load demand is not met at any instant, first let the most efficient unit generate more power until either its maximum output is reached or the load demand is fulfilled.
Rule 9: At any stage, if the sum of the maximum generating capacity of all committed units is greater than the load demand, then decommit only that unit which has the lowest efficiency.

2.4. Postprocessor

The function of the postprocessor is to guide the user in analyzing the schedule and also to make suggestions to the user about adjustments to program parameters to improve the overall schedule cost. The parameters adjusted here include minimum up and down time, unit initial conditions and priority ordering. Another function of the ES postprocessor is to detect constraint violations, which is done by including rules in the knowledge base. These rules can logically deduce a violation condition from the schedule output and, subsequently, advise the user and recommend corrections for the problem. We have seen that minimum up and down times are particularly difficult to model and cannot be incorporated directly in a DP routine. So, to detect such violations, a rule can be introduced in the knowledge base. For example, one such rule is:

Check constraint (up-time-violated):
condition (unit on),
condition (unit off),
condition (dur-less-than-1 h).

This rule states that, if the unit is committed, then decommitted, and the duration between on and off is less than 1 h, then the unit up time is violated.
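A postprocessor check corresponding to this rule can be sketched as follows; the example schedule and the minimum up-time threshold are illustrative only.

def up_time_violations(status, min_up=1):
    # status: list of 0/1 commitment flags per hour for one unit. Returns the
    # starting hours of "on" runs shorter than min_up hours.
    violations, run_start = [], None
    for hour, on in enumerate(status):
        if on and run_start is None:
            run_start = hour
        elif not on and run_start is not None:
            if hour - run_start < min_up:      # committed, then decommitted too soon
                violations.append(run_start)
            run_start = None
    return violations

print(up_time_violations([0, 1, 0, 1, 1, 0], min_up=2))   # -> [1]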
The condition "up-time-violated" will not be a fact in the database. There are a number of such rules which have been used in the programme for checking constraint violations.

2.5. User interface

A natural language interface permits the user to communicate with the ES during the consultation process using a human language. The main feature of the user interface here is that it works as an explanatory system which permits the user to answer a question with "why". The system then responds with any information regarding the commitment of a particular unit at a particular time and so helps the user to understand the reasoning process. Any question raised by the operator concerning the result is answered by the ES, and if the results seemed to be unsatisfactory, the KB was modified and the entire process repeated until satisfactory results were obtained.

3. SAMPLE SYSTEM

The rule based ES developed here is applied to analyze and improve the DP results of the sample system of Ref. [8]. From various consultation sessions with the ES, it was seen that the system is quite helpful in improving the results. A number of UC runs were made to test the capability and friendliness of the ES. The program has been run on an IBM-PC-AT. The consultation sessions provide almost all information about the various UC schedules given at different values of MTIL. The system also gives an explanation about any decisions taken by it. It was seen that, in order to increase the effectiveness of the system, a considerable database is required. As the rules included here in the KB have been obtained mainly from a thorough survey of the literature and are based on heuristics, it is expected that, if rules obtained from field experts were included, the system would pave the way for handling the practical UC problem.

The sample system consists of 10 thermal generating units with different generating capacities and start up times. The time span is 24 h. It was pointed out by the ES not to put unit 2 off for 1 h and then restart it again. It suggested that the unit must be kept on line from the fourth to the sixth hour. Further, it suggested that the value of MTIL at which the earlier schedule was computed by DP can be increased so that the overall schedule will be more economical. The final UC schedule is depicted in Table 1.

Table 1. Final UC schedule. The hourly loads are listed below; the corresponding on/off status columns for units 1-10 of the original table could not be recovered legibly and are not reproduced.

Hour:      1    2    3    4    5    6    7    8    9    10   11   12
Load (MW): 2000 1980 1940 1900 1840 1870 1820 1700 1510 1410 1320 1260
Hour:      13   14   15   16   17   18   19   20   21   22   23   24
Load (MW): 1200 1160 1140 1160 1260 1380 1560 1700 1820 1900 1950 1990

When one compares the schedule obtained here to that obtained purely by DP [8], there is not much variation except for the points discussed above, but the main feature is the capability of the ES to approach the operator in a friendly manner.
A better solution may be achieved if the knowledge of a good number of field experts is included in the database.

4. CONCLUSION

An expert system and dynamic programming hybrid has been presented for solving the unit commitment problem. The dynamic programming solution of a sample system of 10 units was supplemented by the expert system, resulting in better management of the units and, hence, better energy conservation.

REFERENCES
1. P. G. Lowery, IEEE Trans. Power Apparatus Systems PAS-85 (1966).
2. Slobodan Ruzic and Nikola Rajakovic, IEEE Trans. Power Systems 6, No. 1 (1991).
3. Z. Ouang and S. M. Shahidepour, IEEE Trans. Power Systems 6, No. 3 (1991).
4. Z. Ouang and S. M. Shahidepour, Electric Power System Res. 20, No. 3 (1990).
5. Sasan Mokhtari, Jagjit Singh and Bruce Wollenberg, IEEE Trans. Power Systems 3, No. 1 (1988).
6. S. K. Tong et al., IEEE Trans. Power Systems 6, No. 3 (1991).
7. C. K. Pang and H. C. Chen, IEEE Trans. Power Systems PAS-95, No. 4 (1976).
8. A. K. Ayub and A. D. Patton, IEEE Trans. Power Systems PAS-90 (1971).
9. T. Sakaguchi et al., IEEE Trans. Power Systems PAS-102, No. 2 (1983).
10. Tim Taylor et al., IEEE Trans. Power Systems 4, No. 1 (1989).
11. I. J. Nagrath and D. P. Kothari, Power System Engineering. McGraw-Hill, New Delhi (1994).

----

SOME EXPERT SYSTEMS NEED COMMON SENSE

John McCarthy
Computer Science Department, Stanford University, Stanford, CA 94305
jmc@cs.stanford.edu
http://www-formal.stanford.edu/jmc/
1984

Abstract

An expert system is a computer program intended to embody the knowledge and ability of an expert in a certain domain. The ideas behind them and several examples have been described in other lectures in this symposium. Their performance in their specialized domains is often very impressive. Nevertheless, hardly any of them have certain common sense knowledge and ability possessed by any non-feeble-minded human. This lack makes them "brittle". By this is meant that they are difficult to extend beyond the scope originally contemplated by their designers, and they usually don't recognize their own limitations. Many important applications will require common sense abilities. The object of this lecture is to describe common sense abilities and the problems that require them.

Common sense facts and methods are only very partially understood today, and extending this understanding is the key problem facing artificial intelligence.

This isn't exactly a new point of view. I have been advocating "Computer Programs with Common Sense" since I wrote a paper with that title in 1958. Studying common sense capability has sometimes been popular and sometimes unpopular among AI researchers. At present it's popular, perhaps because new AI knowledge offers new hope of progress. Certainly AI researchers today know a lot more about what common sense is than I knew in 1958 — or in 1969 when I wrote another paper on the subject. However, expressing common sense knowledge in formal terms has proved very difficult, and the number of scientists working in the area is still far too small.

One of the best known expert systems is MYCIN (Shortliffe 1976; Davis, Buchanan and Shortliffe 1977), a program for advising physicians on treating bacterial infections of the blood and meningitis. It does reasonably well without common sense, provided the user has common sense and understands the program's limitations. MYCIN conducts a question and answer dialog.
After asking basic facts about the patient such as name, sex and age, MYCIN asks about suspected bacterial organisms, suspected sites of infection, the presence of specific symptoms (e.g. fever, headache) relevant to diagnosis, the outcome of laboratory tests, and some others. It then recommends a certain course of antibiotics. While the dialog is in English, MYCIN avoids having to understand freely written English by controlling the dialog. It outputs sentences, but the user types only single words or standard phrases. Its major innovations over many previous expert systems were that it uses measures of uncertainty (not probabilities) for its diagnoses and the fact that it is prepared to explain its reasoning to the physician, so he can decide whether to accept it.

Our discussion of MYCIN begins with its ontology. The ontology of a program is the set of entities that its variables range over. Essentially this is what it can have information about. MYCIN's ontology includes bacteria, symptoms, tests, possible sites of infection, antibiotics and treatments. Doctors, hospitals, illness and death are absent. Even patients are not really part of the ontology, although MYCIN asks for many facts about the specific patient. This is because patients aren't values of variables, and MYCIN never compares the infections of two different patients. It would therefore be difficult to modify MYCIN to learn from its experience.

MYCIN's program, written in a general scheme called EMYCIN, is a so-called production system. A production system is a collection of rules, each of which has two parts — a pattern part and an action part. When a rule is activated, MYCIN tests whether the pattern part matches the database. If so, this results in the variables in the pattern being matched to whatever entities are required for the match of the database. If not, the pattern fails and MYCIN tries another. If the match is successful, then MYCIN performs the action part of the pattern using the values of the variables determined by the pattern part. The whole process of questioning and recommending is built up out of productions. The production formalism turned out to be suitable for representing a large amount of information about the diagnosis and treatment of bacterial infections.
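To make the match-act cycle concrete, here is a toy production system in Python. The triples and the single rule are invented for illustration; they are not MYCIN's actual rule base, and its certainty-factor machinery is omitted.

database = {("organism-1", "gram-stain", "negative"),
            ("organism-1", "morphology", "rod")}

rules = [
    # (pattern part, action part): the pattern is a set of (attribute, value)
    # pairs that must all hold of some object X; the action asserts a new
    # triple about X.
    ({("gram-stain", "negative"), ("morphology", "rod")},
     ("class", "enterobacteriaceae")),
]

def run(database, rules):
    changed = True
    while changed:                    # keep firing rules until nothing new matches
        changed = False
        objects = {obj for (obj, _, _) in database}
        for pattern, (attribute, value) in rules:
            for x in objects:
                if all((x, a, v) in database for (a, v) in pattern):
                    if (x, attribute, value) not in database:
                        database.add((x, attribute, value))
                        changed = True
    return database

print(run(set(database), rules))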
When MYCIN is used in its intended manner it scores better than medical students or interns or practicing physicians and on a par with experts in bacterial diseases when the latter are asked to perform in the same way. However, MYCIN has not been put into production use, and the reasons given by experts in the area varied when I asked whether it would be appropriate to sell MYCIN cassettes to doctors wanting to put it on their micro-computers. Some said it would be ok if there were a means of keeping MYCIN's database current with new discoveries in the field, i.e. with new tests, new theories, new diagnoses and new antibiotics. For example, MYCIN would have to be told about Legionnaire's disease and the associated Legionella bacteria which became understood only after MYCIN was finished. (MYCIN is very stubborn about new bacteria, and simply replies "unrecognized response".)

Others say that MYCIN is not even close to usable except experimentally, because it doesn't know its own limitations. I suppose this is partly a question of whether the doctor using MYCIN is trusted to understand the documentation about its limitations. Programmers always develop the idea that the users of their programs are idiots, so the opinion that doctors aren't smart enough not to be misled by MYCIN's limitations may be at least partly a consequence of this ideology.

An example of MYCIN not knowing its limitations can be excited by telling MYCIN that the patient has Cholerae Vibrio in his intestines. MYCIN will cheerfully recommend two weeks of tetracycline and nothing else. Presumably this would indeed kill the bacteria, but most likely the patient will be dead of cholera long before that. However, the physician will presumably know that the diarrhea has to be treated and look elsewhere for how to do it. On the other hand it may be really true that some measure of common sense is required for usefulness even in this narrow domain. We'll list some areas of common sense knowledge and reasoning ability and also apply the criteria to MYCIN and other hypothetical programs operating in MYCIN's domain.

1 WHAT IS COMMON SENSE?

Understanding common sense capability is now a hot area of research in artificial intelligence, but there is not yet any consensus. We will try to divide common sense capability into common sense knowledge and common sense reasoning, but even this cannot be made firm. Namely, what one man builds as a reasoning method into his program, another can express as a fact using a richer ontology. However, the latter can have problems in handling in a good way the generality he has introduced.

2 COMMON SENSE KNOWLEDGE

We shall discuss various areas of common sense knowledge.

1. The most salient common sense knowledge concerns situations that change in time as a result of events. The most important events are actions, and for a program to plan intelligently, it must be able to determine the effects of its own actions.

Consider the MYCIN domain as an example. The situation with which MYCIN deals includes the doctor, the patient and the illness. Since MYCIN's actions are advice to the doctor, full planning would have to include information about the effects of MYCIN's output on what the doctor will do. Since MYCIN doesn't know about the doctor, it might plan the effects of the course of treatment on the patient. However, it doesn't do this either. Its rules give the recommended treatment as a function of the information elicited about the patient, but MYCIN makes no prognosis of the effects of the treatment. Of course, the doctors who provided the information built into MYCIN considered the effects of the treatments.

Ignoring prognosis is possible because of the specific narrow domain in which MYCIN operates. Suppose, for example, a certain antibiotic had the precondition for its usefulness that the patient not have a fever. Then MYCIN might have to make a plan for getting rid of the patient's fever and verifying that it was gone as a part of the plan for using the antibiotic. In other domains, expert systems and other AI programs have to make plans, but MYCIN doesn't. Perhaps if I knew more about bacterial diseases, I would conclude that their treatment sometimes really does require planning and that lack of planning ability limits MYCIN's utility.

The fact that MYCIN doesn't give a prognosis is certainly a limitation. For example, MYCIN cannot be asked on behalf of the patient or the administration of the hospital when the patient is likely to be ready to go home. The doctor who uses MYCIN must do that part of the work himself. Moreover, MYCIN cannot answer a question about a hypothetical treatment, e.g. "What will happen if I give this patient penicillin?" or even "What bad things might happen if I give this patient penicillin?".

2. Various formalisms are used in artificial intelligence for representing facts about the effects of actions and other events. However, all systems that I know about give the effects of an event in a situation by describing a new situation that results from the event. This is often enough, but it doesn't cover the important case of concurrent events and actions. For example, if a patient has cholera, while the antibiotic is killing the cholera bacteria, the damage to his intestines is causing loss of fluids that are likely to be fatal. Inventing a formalism that will conveniently express people's common sense knowledge about concurrent events is a major unsolved problem of AI.

3. The world is extended in space and is occupied by objects that change their positions and are sometimes created and destroyed. The common sense facts about this are difficult to express but are probably not important in the MYCIN example. A major difficulty is in handling the kind of partial knowledge people ordinarily have. I can see part of the front of a person in the audience, and my idea of his shape uses this information to approximate his total shape. Thus I don't expect him to stick out two feet in back even though I can't see that he doesn't. However, my idea of the shape of his back is less definite than that of the parts I can see.

4. The ability to represent and use knowledge about knowledge is often required for intelligent behavior. What airline flights there are to Singapore is recorded in the issue of the International Airline Guide current for the proposed flight day. Travel agents know how to book airline flights and can compute what they cost. An advanced MYCIN might need to reason that Dr. Smith knows about cholera, because he is a specialist in tropical medicine.

5. A program that must co-operate or compete with people or other programs must be able to represent information about their knowledge, beliefs, goals, likes and dislikes, intentions and abilities. An advanced MYCIN might need to know that a patient won't take a bad tasting medicine unless he is convinced of its necessity.

6. Common sense includes much knowledge whose domain overlaps that of the exact sciences but differs from it epistemologically. For example, if I spill the glass of water on the podium, everyone knows that the glass will break and the water will spill. Everyone knows that this will take a fraction of a second and that the water will not splash even ten feet. However, this information is not obtained by using the formula for a falling body or the Navier-Stokes equations governing fluid flow. We don't have the input data for the equations, most of us don't know them, and we couldn't integrate them fast enough to decide whether to jump out of the way. This common sense physics is contiguous with scientific physics. In fact scientific physics is imbedded in common sense physics, because it is common sense physics that tells us what the equation s = ½gt² means. If MYCIN were extended to be a robot physician it would have to know common sense physics and maybe also some scientific physics.

It is doubtful that the facts of the common sense world can be represented adequately by production rules. Consider the fact that when two objects collide they often make a noise. This fact can be used to make a noise, to avoid making a noise, to explain a noise or to explain the absence of a noise.
It can also be used in specific situations involving a noise but also to understand general phenomena, e.g. should an intruder step on the gravel, the dog will hear it and bark. A production rule embodies a fact only as part of a specific procedure. Typically they match facts about specific objects, e.g. a specific bacterium, against a general rule and get a new fact about those objects. Much present AI research concerns how to represent facts in ways that permit them to be used for a wide variety of purposes.

3 COMMON SENSE REASONING

Our ability to use common sense knowledge depends on being able to do common sense reasoning. Much artificial intelligence inference is not designed to use directly the rules of inference of any of the well known systems of mathematical logic. There is often no clear separation in the program between determining what inferences are correct and the strategy for finding the inferences required to solve the problem at hand. Nevertheless, the logical system usually corresponds to a subset of first order logic. Systems provide for inferring a fact about one or two particular objects from other facts about these objects and a general rule containing variables. Most expert systems, including MYCIN, never infer general statements, i.e. quantified formulas.

Human reasoning also involves obtaining facts by observation of the world, and computer programs also do this. Robert Filman did an interesting thesis on observation in a chess world where many facts that could be obtained by deduction are in fact obtained by observation. MYCIN doesn't require this, but our hypothetical robot physician would have to draw conclusions from a patient's appearance, and computer vision is not ready for it.

An important new development in AI (since the middle 1970s) is the formalization of nonmonotonic reasoning.

Deductive reasoning in mathematical logic has the following property — called monotonicity by analogy with similar mathematical concepts. Suppose we have a set of assumptions from which follow certain conclusions. Now suppose we add additional assumptions. There may be some new conclusions, but every sentence that was a deductive consequence of the original hypotheses is still a consequence of the enlarged set. Ordinary human reasoning does not share this monotonicity property. If you know that I have a car, you may conclude that it is a good idea to ask me for a ride. If you then learn that my car is being fixed (which does not contradict what you knew before), you no longer conclude that you can get a ride. If you now learn that the car will be out in half an hour you reverse yourself again.

Several artificial intelligence researchers, for example Marvin Minsky (1974), have pointed out that intelligent computer programs will have to reason nonmonotonically. Some concluded that therefore logic is not an appropriate formalism. However, it has turned out that deduction in mathematical logic can be supplemented by additional modes of nonmonotonic reasoning, which are just as formal as deduction and just as susceptible to mathematical study and computer implementation. Formalized nonmonotonic reasoning turns out to give certain rules of conjecture rather than rules of inference — their conclusions are appropriate, but may be disconfirmed when more facts are obtained. One such method is circumscription, described in (McCarthy 1980). A mathematical description of circumscription is beyond the scope of this lecture, but the general idea is straightforward.
We have a property applicable to objects or a relation applicable to pairs or triplets, etc. of objects. This property or relation is constrained by some sentences taken as assumptions, but there is still some freedom left. Circumscription further constrains the property or relation by requiring it to be true of a minimal set of objects.

As an example, consider representing the facts about whether an object can fly in a database of common sense knowledge. We could try to provide axioms that will determine whether each kind of object can fly, but this would make the database very large. Circumscription allows us to express the assumption that only those objects can fly for which there is a positive statement about it. Thus there will be positive statements that birds and airplanes can fly and no statement that camels can fly. Since we don't include negative statements in the database, we could provide for flying camels, if there were any, by adding statements without removing existing statements. This much is often done by a simpler method — the closed world assumption discussed by Raymond Reiter. However, we also have exceptions to the general statement that birds can fly. For example, penguins, ostriches and birds with certain feathers removed can't fly. Moreover, more exceptions may be found and even exceptions to the exceptions. Circumscription allows us to make the known exceptions and to provide for additional exceptions to be added later — again without changing existing statements.

Nonmonotonic reasoning also seems to be involved in human communication. Suppose I hire you to build me a bird cage, and you build it without a top, and I refuse to pay on the grounds that my bird might fly away. A judge will side with me. On the other hand suppose you build it with a top, and I refuse to pay full price on the grounds that my bird is a penguin, and the top is a waste. Unless I told you that my bird couldn't fly, the judge will side with you. We can therefore regard it as a communication convention that if a bird can fly the fact need not be mentioned, but if the bird can't fly and it is relevant, then the fact must be mentioned.
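The flavor of this default-with-exceptions reasoning, though not circumscription itself, can be illustrated with a small program; the kinds and exception lists are of course invented.

positive = {"bird": True, "airplane": True}           # kinds with a positive flying statement
exceptions = {"penguin", "ostrich", "plucked-bird"}   # known non-flying birds
kind_of = {"penguin": "bird", "ostrich": "bird", "plucked-bird": "bird",
           "canary": "bird", "camel": "mammal"}

def flies(thing):
    if thing in exceptions:
        return False                  # a recorded exception defeats the default
    kind = kind_of.get(thing, thing)
    if kind in exceptions:
        return False
    return positive.get(kind, False)  # no positive statement: assume it cannot fly

for x in ["canary", "penguin", "camel", "airplane"]:
    print(x, flies(x))

Adding a new exception, or a new kind of flying thing, is a matter of adding entries; nothing already recorded has to be retracted, which is the nonmonotonic point.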
References

Davis, Randall; Buchanan, Bruce; and Shortliffe, Edward (1977). Production Rules as a Representation for a Knowledge-Based Consultation Program, Artificial Intelligence, Volume 8, Number 1, February.

McCarthy, John (1960). Programs with Common Sense, Proceedings of the Teddington Conference on the Mechanization of Thought Processes, London: Her Majesty's Stationery Office. (Reprinted in this volume, pp. 000-000.)

McCarthy, John and Patrick Hayes (1969). Some Philosophical Problems from the Standpoint of Artificial Intelligence, in B. Meltzer and D. Michie (eds), Machine Intelligence 4, Edinburgh University. (Reprinted in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, 1981, pp. 431-450; also in M. J. Ginsberg (ed.), Readings in Nonmonotonic Reasoning, Morgan Kaufmann, 1987, pp. 26-45; also in this volume, pp. 000-000.)

McCarthy, John (1980). Circumscription — A Form of Nonmonotonic Reasoning, Artificial Intelligence, Volume 13, Numbers 1, 2. (Reprinted in B. L. Webber and N. J. Nilsson (eds.), Readings in Artificial Intelligence, Tioga, 1981, pp. 466-472; also in M. J. Ginsberg (ed.), Readings in Nonmonotonic Reasoning, Morgan Kaufmann, 1987, pp. 145-152; also in this volume, pp. 000-000.)

Minsky, Marvin (1974). A Framework for Representing Knowledge, M.I.T. AI Memo 252.

Shortliffe, Edward H. (1976). Computer-Based Medical Consultations: MYCIN, American Elsevier, New York, NY.

ANSWERS TO QUESTIONS

DISCUSSION OF THE PAPER

QUESTION: You said the programs need common sense, but that's like saying, If I could fly I wouldn't have to pay Eastern Airlines $44 to haul me up here from Washington. So if the programs indeed need common sense, how do we go about it? Isn't that the point of the argument?

DR. MCCARTHY: I could have made this a defensive talk about artificial intelligence, but I chose to emphasize the problems that have been identified rather than the progress that has been made in solving them. Let me remind you that I have argued that the need for common sense is not a truism. Many useful things can be done without it, e.g. MYCIN and also chess programs.

QUESTION: There seemed to be a strong element in your talk about common sense, and even humans developing it, emphasizing an experiential component — particularly when you were giving your example of dropping a glass of water. I'm wondering whether the development of these programs is going to take similar amounts of time. Are you going to have to have them go through the sets of experiences and be evaluated? Is there work going on in terms of speeding up the process or is it going to take 20 years for a program from the time you've put in its initial state to work up to where it has a decent amount of common sense?

DR. MCCARTHY: Consider your 20 years. If anyone had known in 1963 how to make a program learn from its experience to do what a human does after 20 years, they might have done it, and it might be pretty smart by now. Already in 1958 there had been work on programs that learn from experience. However, all they could learn was to set optimal values of numerical parameters in the program, and they were quite limited in their ability to do that. Arthur Samuel's checker program learned optimal values for its parameters, but the problem was that certain kinds of desired behavior did not correspond to any setting of the parameters, because it depended on the recognition of a certain kind of strategic situation. Thus the first prerequisite for a program to be able to learn something is that it be able to represent internally the desired modification of behavior. Simple changes in behavior must have simple representations. Turing's universality theory convinces us that arbitrary behaviors can be represented, but they don't tell us how to represent them in such a way that a small change in behavior is a small change in representation. Present methods of changing programs amount to education by brain surgery.

QUESTION: I would ask you a question about programs needing common sense in a slightly different way, and I want to use the MYCIN program as an example. There are three actors there — the program, the physician, and the patient. Taking as a criterion the safety of the patient, I submit that you need at least two of these three actors to have common sense. For example if (and sometimes this is the case) one only were sufficient, it would have to be the patient because if the program didn't use common sense and the physician didn't use common sense, the patient would have to have common sense and just leave. But usually, if the program had common sense built in and the physician had common sense but the patient didn't, it really might not matter because the patient would do what he or she wants to do anyway. Let me take another possibility.
If only the program has common sense and neither the physician nor the patient has common sense, then in the long run the program also will not use the common sense. What I want to say is that these issues of common sense must be looked at in this kind of frame of reference.

DR. MCCARTHY: In the use of MYCIN, the physician is supposed to supply the common sense. The question is whether the program must also have common sense, and I would say that the answer is not clear in the MYCIN case. Purely computational programs don't require common sense, and none of the present chess programs have any. On the other hand, it seems clear that many other kinds of programs require common sense to be useful at all.

----

Expert Systems With Applications, Vol. 11, No. 3, pp. 351-360, 1996. Copyright © 1996 Elsevier Science Ltd.

Self-Integrating Knowledge-Based Brain Tumor Diagnostic System

CHING-HUNG WANG†
Institute of Computer and Information Science, National Chiao-Tung University, Hsin-Chu 30050, Taiwan, R.O.C.

TZUNG-PEI HONG
Department of Information Management, Kaohsiung Polytechnic Institute, Kaohsiung 84008, Taiwan, R.O.C.

SHIAN-SHYONG TSENG
Institute of Computer and Information Science, National Chiao-Tung University, Hsin-Chu 30050, Taiwan, R.O.C.

† Author for correspondence. Also Directorate General of Telecommunication Laboratories, Ministry of Transportation and Communications, Chung-Li, Taiwan 32617, Taiwan, R.O.C.

Abstract--In this paper, we present a self-integrating knowledge-based expert system for brain tumor diagnosis. The system we propose comprises knowledge building, knowledge inference and knowledge refinement. During knowledge building, an automatic knowledge-integration process, based on Darwin's theory of natural selection, integrates knowledge derived from knowledge-acquisition tools and machine-learning methods to construct an initial knowledge base, thus eliminating a major bottleneck in developing a brain tumor diagnostic system. During the knowledge inference process, an inference engine exploits rules in the knowledge base to help diagnosticians determine brain tumor etiologies according to computer tomography pictures. And, a simple knowledge refinement method is proposed to modify the existing knowledge base during inference, which dramatically improves the accuracy of the derived rules. The performance of the brain tumor diagnostic system has been evaluated on actual brain tumor cases.

1. INTRODUCTION

Recently, expert systems have been successfully applied to many fields and have shown excellent performance. Expert systems provide sound expertise in the form of diagnosis, instruction, prediction, consultation and so on. They can also be used as training tools to help new personnel interpret data and monitor observations (Waterman, 1986). Developing a successful expert system requires, however, effectively integrating knowledge from a variety of sources, such as that from domain experts, historical documentary evidence, or current records, to construct a complete, consistent and unambiguous knowledge base (Baral, 1991; Gragun, 1987).
For large-scale expert systems that generally cannot rely on a single knowledge source, the use of multiple knowledge inputs from many knowledge sources is especially important to ensure comprehensive coverage. Thus, integrating multiple knowledge sources plays a critical role in building successful expert systems.

In this paper, we present a brain tumor diagnostic system that can integrate multiple knowledge sources to quickly build a prototype knowledge base. This prototype knowledge base then adapts itself according to inference results from the expert system, consequently improving the accuracy of the rules it derives.

The brain tumor diagnostic system (BTDS) consists of three main functional units: knowledge building, knowledge inference and knowledge refinement (Wang & Tseng, 1995). The knowledge-building unit includes three modules: machine learning, knowledge acquisition and knowledge integration. The machine-learning module maintains a variety of machine-learning strategies (Cendrowska, 1987; Michalski, 1980; Mitchell, 1982; Quinlan, 1986) to induce knowledge from actual instances. The knowledge-acquisition module maintains different knowledge-acquisition tools that allow knowledge engineers to acquire domain knowledge from various experts (Kelly, 1955; Hwang & Tseng, 1990). The knowledge-integration module uses evolutionary theory to automatically integrate knowledge from multiple sources (which may be derived by knowledge-acquisition tools or machine-learning methods) into the initial knowledge base. The inference unit helps diagnosticians determine brain tumor etiologies according to computer axial tomography pictures. The knowledge-refinement unit uses a proposed knowledge-refinement method to modify the existing knowledge base during the inference process.

The remainder of this paper is organized as follows. The problem domain is introduced in Section 2. The architecture of the brain tumor diagnostic system is presented in Section 3. A knowledge-building unit is proposed in Section 4. A knowledge-inference unit is introduced in Section 5. A knowledge-refinement method is proposed in Section 6. The implementation of the brain tumor diagnostic system is presented in Section 7. Conclusions are given in Section 8.

2. THE PROBLEM DOMAIN

The field of brain tumor diagnosis is quite interesting and full of challenge since the brain is very complex and many causes of brain tumors are still unclear (Wills, 1982). Computer tomography (CT) is generally considered the most reliable diagnostic technique for locating and characterizing brain tumors. Nearly all intracranial lesions are detected using CT. The usual examination involves scanning the neurocranium in a series of parallel transverse "slices". The head is bent forward so that the sectional plane lies at an angle of 12° to the orbitomeatal lines (Fig. 1). Each slice is 8 mm thick, so that 8-15 slices are usually sufficient to visualize the intracranial structures to be examined. A patient with a meningiomal tumor is shown in Fig. 2.

FIGURE 1. Positions of six standard CT scans.
FIGURE 2. An example of a CT picture.

Normally, several stages are necessary for doctors to diagnose brain tumors. First, CT pictures of a patient's brain are analyzed and compared to determine the location and the density of the lesion.
Next, the CT pictures are further analyzed to obtain data on calcification, degree of edema, shape of edema, degree of enhancement, type of enhancement, general appearance, size of mass, mass effect and bone change. After that, some possible brain tumors could be concluded.

Brain tumor diagnosis is still difficult for inexperienced doctors due to the inherent complexity of brain tumors. Thus, combining multiple knowledge sources, including knowledge from domain experts, historical documentary information and current records of actual instances, to develop a successful brain tumor diagnostic system is very important.

From data supplied by Veterans' General Hospital (VGH) in Taipei, Taiwan, 12 parameters presently used in describing pictures derived by computerized axial tomography (CAT) scanning are shown in Table 1. One of six possible classes of brain tumors, including pituitary adenoma, meningioma, medulloblastoma, glioblastoma, astrocytoma and anaplastic protoplasmic astrocytoma (which are frequently found in Taiwan), must be identified. 348 actual cases of brain tumors from Veterans' General Hospital were used to evaluate the proposed system's performance. Table 2 shows an actual case expressed in terms of 12 features derived by computerized axial tomography (CAT) scanning, and a pathology report.

TABLE 1. Twelve Brain Tumor Attributes and their Possible Values

1. LOCATION: (1) Brain parenchyma: a. frontal b. temporal c. parietal d. occipital e. thalamus f. basal ganglia g. corpus callosum (2) Interior surface of brain: a. frontal horn b. body of lateral ventricle c. atrium d. occipital horn e. temporal horn f. third ventricle (anterior) g. posterior third ventricle h. pineal region (3) Brain surface (excluding skull base, vault): Convexity: a. frontal b. temporal c. parietal d. occipital; Parasagittal: e. frontal f. parietal g. occipital h. falx i. tentorium (4) Skull vault (5) Anterior skull base (6) Middle cranial fossa (excluding sella): a. clivus b. sphenoid ridge c. parasagittal skull base (7) Sellar (8) Sellar and suprasellar (9) Suprasellar (including tuberculum sellar) (10) Parasellar (11) Cerebellopontine angle, ambiens cisterna (12) Brain stem (13) Fourth ventricle (14) Cerebellum (a. hemisphere b. vermis) (15) Cerebellar surface (extra-axial) (16) Cisterna magna (extra-axial)
2. PRECONTRAST: (1) Low (2) Iso (3) High (4) Mixed (5) With fat density (6) With air density
3. CALCIFICATION: (1) No (2) Marginal (3) Vascular-like (4) Lumpy, solid, punctate
4. EDEMA: (1) No (2) <= 2 cm (3) <= 1/2 hemisphere (4) > 1/2 hemisphere
5. SHAPE_EDEMA: (1) No (2) Smooth, regular (3) Digital, irregular
6. DEGREE_ENHANCEMENT: (1) No enhancement (2) Less than vessel (3) Same as vessel (4) More than vessel
7. APPEARANCE_ENHANCEMENT: (1) Homogeneous (2) Thin regular marginal (3) Moderate regular marginal (4) Thick regular marginal (5) Gyrus-like (6) Grossly irregular (7) Mural nodule (8) Homogeneous with lucency inside (9) Thick irregular marginal
8. GENERAL_APPEARANCE: (1) Grossly cystic with fluid inside but no mural nodule (2) Cystic with mural nodule (3) Solid with small cyst/cysts (4) Solid with necrosis (5) Solid without necrosis or cyst (6) Mass with hemorrhage (7) Infiltrative lesion (8) Gyrus-like involvement (9) Leptomeningeal lesion
9. BONE_CHANGE: (1) No bony change (2) Sellar enlargement (3) Internal auditory meatus enlargement (4) Bony sclerosis (5) Bony erosion (6) Bony destruction
10. SIZE (cm)
11. MASS_EFFECT: (1) No mass effect, infiltrative type (2) With mass effect (3) Ipsilateral enlargement of ambiens cisterna
12. HYDROCEPHALUS: (1) No hydrocephalus, no previous shunting (2) Yes (3) No, but shunted previously

TABLE 2. A Case for Brain Tumor Diagnosis

(1) Location: sellar and suprasellar
(2) Precontrast: high
(3) Calcification: marginal
(4) Edema: no
(5) Shape edema: smooth and regular
(6) Size: 1.2 cm
(7) Enhancement degree: less than vessel
(8) Enhancement appearance: homogeneous with lucency
(9) General appearance: solid with small cyst/cysts
(10) Bone change: sellar enlargement
(11) Mass effect: with mass effect
(12) Hydrocephalus: no hydrocephalus
Pathology: pituitary adenoma

3. SYSTEM ARCHITECTURE

The brain tumor diagnostic system proposed here consists of three main units: knowledge building, knowledge inference and knowledge refinement. These three units respectively generate, use and alter the rules in the knowledge base (Fig. 3). The knowledge building unit includes three modules: machine learning, knowledge acquisition and knowledge integration. The machine-learning module maintains a variety of learning methods (Michalski, 1980; Mitchell, 1982; Quinlan, 1986; Cendrowska, 1987) to induce various knowledge sources from different instance sets. The knowledge-acquisition module maintains various knowledge-acquisition tools (Kelly, 1955; Hwang & Tseng, 1990) that allow domain experts to input knowledge. Knowledge might then be directly obtained by various human experts using different knowledge-acquisition tools, or derived from different machine-learning methods. The knowledge-integration module rapidly combines multiple knowledge derived by the machine-learning module or the knowledge-acquisition module to build a prototype knowledge base. The knowledge-integration approach is an adaptive search method, thus eliminating a major difficulty in knowledge integration.

The knowledge inference component includes several modules: user interface, working memory, inference engine and explanation facility. The user interface helps users communicate easily with the expert system. The working memory stores facts that will be used during the course of a consultation. The inference engine generates new facts based on the rules and facts currently known. The explanation facility, when requested, explains the system's reasoning to the user.
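As an illustration of how these components fit together, a minimal forward-chaining engine with a working memory and an explanation trace might look like the Python sketch below; the rule and the case are invented for the example and are not taken from the BTDS knowledge base.

rules = [
    {"name": "R1",
     "if": {("location", "sellar and suprasellar"), ("bone_change", "sellar enlargement")},
     "then": ("diagnosis", "pituitary adenoma")},
]

def infer(case):
    working_memory = set(case.items())          # facts known about this patient
    explanation = []
    fired = True
    while fired:
        fired = False
        for rule in rules:
            if rule["if"] <= working_memory and rule["then"] not in working_memory:
                working_memory.add(rule["then"])
                explanation.append(f'{rule["name"]}: {sorted(rule["if"])} -> {rule["then"]}')
                fired = True
    return working_memory, explanation

case = {"location": "sellar and suprasellar", "bone_change": "sellar enlargement",
        "precontrast": "high"}
memory, why = infer(case)
print(memory)
print(why)           # the explanation facility can replay this trace on request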
A knowledge base integrated from multiple knowledge sources is often only a prototype, with unsatisfactory classification accuracy. Therefore, the prototype knowledge base must be refined. The knowledge-refinement unit automatically modifies the knowledge base according to results derived from the inference engine. The refinement algorithm also adopts an adaptive search method to alter the rules in the knowledge base.

In the following sections, we will concentrate on the knowledge-building unit and the knowledge-refinement unit, since the knowledge-inference engine is similar to other widely used types.

4. KNOWLEDGE-BUILDING UNIT

Knowledge acquisition and machine learning are currently two major techniques for acquiring knowledge from experts and data respectively. These two techniques, however, have their own limitations, as Gaines pointed out (Gaines, 1989). Knowledge derived from machine-learning methods is quite dependent on the training data used, which easily makes the induced knowledge incomplete.
Knowledge acquired from experts is often biased toward the experts' opinions, which can easily make the derived knowledge subjective. In order to effectively construct a complete, consistent and objective knowledge base for brain tumor diagnosis, we were concerned with acquiring knowledge by integration of the two techniques. Our aim was to construct an integrated brain-tumor diagnostic knowledge base from several individual knowledge sources.

The knowledge-building unit can help knowledge engineers effectively acquire and integrate knowledge from various types of sources. In the following subsections, we introduce the knowledge acquisition, machine learning and knowledge integration functions.

4.1. Knowledge-Acquisition Module

Recently, much study has been devoted to eliciting different types of knowledge by interviewing experts. Various knowledge-acquisition tools have been successfully developed. In order to help BTDS easily acquire knowledge from various doctors, we included some commonly-used knowledge-acquisition tools in the knowledge-acquisition module.

FIGURE 3. Structure of the brain tumor diagnostic system.

The knowledge-acquisition module provides good flexibility, and new knowledge-acquisition tools can be easily added to it. Experts can thus, depending on their preferences, choose the tools for knowledge input. The knowledge-acquisition module has a knowledge-acquisition-tool manager that provides a user-friendly interface for operating the various knowledge-acquisition tools. The manager controls each knowledge-acquisition tool by invoking the services as required. Experts can thus easily apply any knowledge-acquisition tool to input their domain knowledge. Presently, two knowledge-acquisition tools, the Repertory Grid (Kelly, 1955) and EMCUD (Hwang & Tseng, 1990), are associated with the module. These tools meet the two general requirements described below:
(1) knowledge-acquisition tools must be domain-independent;
(2) the knowledge derived from tools must be easily translatable into the form of rules.

A brief description of these knowledge-acquisition tools is given as follows.

4.1.1. Knowledge-Acquisition Tool: Repertory Grid. Operation of the repertory grid (Kelly, 1955) by a single expert can be briefly described as follows:

Step 1. Elicit all the elements from the expert. At least two elements are needed to carry out the following procedure. Assume that five elements, E1, E2, E3, E4 and E5, are provided by the expert; we place them across the top of a grid.
Step 2. Elicit constructs (traits and their opposites) from the expert. Each time three elements are chosen, ask for a construct to distinguish one element from the other two. The constructs obtained are listed down the side of the grid.
Step 3. Rate all of the entries (elements, constructs) in the grid. Assume the traits C1, C2, C3, C4 and their opposites C1', C2', C3', C4' have been given by the experts. As an example, the following repertory grid may be constructed:

       E1  E2  E3  E4  E5
C1      5   1   5   1   1   C1'
C2      4   4   4   1   4   C2'
C3      1   4   5   1   4   C3'
C4      1   1   1   5   1   C4'

Step 4. Generate production rules from the grids.
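One simple way to carry out Step 4, assuming a rating of 4 or 5 selects the trait and a rating of 1 or 2 selects its opposite (the thresholds are chosen here only for illustration), is sketched below in Python.

elements = ["E1", "E2", "E3", "E4", "E5"]
constructs = ["C1", "C2", "C3", "C4"]
ratings = [
    [5, 1, 5, 1, 1],   # C1 ... C1'
    [4, 4, 4, 1, 4],   # C2 ... C2'
    [1, 4, 5, 1, 4],   # C3 ... C3'
    [1, 1, 1, 5, 1],   # C4 ... C4'
]

def grid_to_rules(high=4, low=2):
    rules = []
    for j, element in enumerate(elements):
        conditions = []
        for i, construct in enumerate(constructs):
            rating = ratings[i][j]
            if rating >= high:
                conditions.append(construct)         # trait side applies
            elif rating <= low:
                conditions.append(construct + "'")   # opposite side applies
        rules.append("IF " + " AND ".join(conditions) + " THEN " + element)
    return rules

for rule in grid_to_rules():
    print(rule)

Applied to the grid above, this yields one rule per element; for E1, for example, IF C1 AND C2 AND C3' AND C4' THEN E1.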
        E1   E2   E3   E4   E5
C1      5    1    5    1    1     C1'
C2      4    4    4    1    4     C2'
C3      1    4    5    1    4     C3'
C4      1    1    1    5    1     C4'

Step 4. Generate production rules from the grids.

4.1.2. Knowledge-Acquisition Tool: EMCUD. EMCUD (Embedded Meanings Capturing and Uncertainty Deciding) (Hwang & Tseng, 1990) is a table-based knowledge-acquisition method that can capture embedded meanings in given rules, and guide experts to decide certainty factors. The EMCUD strategy is briefly described as follows:

Step 1. Apply some repertory-grid-oriented method to derive the initial knowledge.
Step 2. Construct an Attribute-Ordering Table that records the importance of each attribute to each object.
Step 3. Elicit embedded meanings from the original rules and the Attribute-Ordering Table. Generate embedded rules for each original rule.
Step 4. Construct the constraint list to flag unwanted rules.
Step 5. Guide experts to decide certainty factors of the embedded rules.

4.2. Machine-Learning Module

Machine learning is another alternative for acquiring knowledge, from training data. Recently, several expert systems have been created that use machine-learning methods to generate rules from data (Gray, 1990). In order to help knowledge engineers easily acquire knowledge from various sources, we include some commonly used machine-learning tools in the machine-learning module. Knowledge engineers can, depending on the training-data representation, choose suitable tools for knowledge induction. Each machine-learning tool has a data store to hold the derived knowledge. If the derived knowledge is not expressed in the form of rules, it is then translated into the form of rules. Presently, four machine-learning tools, including Version Space (Mitchell, 1982), ID3 (Quinlan, 1986), PRISM (Cendrowska, 1987) and AQR (Michalski, 1980), are associated with the module. A brief description of these machine-learning tools is given as follows.

4.2.1. Machine-Learning Tool: Version Space. The Version Space learning strategy is mainly used for learning from training instances with only two classes: positive and negative (Mitchell, 1982). It attempts to induce concepts that include all positive training instances and exclude all negative training instances. The term "version space" is used to represent all legal hypotheses describable within a given concept-description language and consistent with all observed training instances. The term "consistent" means that each hypothesis includes all given positive training instances and excludes all given negative ones. A version space can then be represented by two sets of hypotheses: the set S and the dual set G, defined as:

S = {s | s is a hypothesis consistent with the observed instances, and no other hypothesis exists that is both more specific than s and also consistent with all observed instances};
G = {g | g is a hypothesis consistent with the observed instances, and no other hypothesis exists that is both more general than g and also consistent with all observed instances}.

Sets S and G, together, precisely delimit a version space in which each hypothesis is both more general than some hypothesis in S and more specific than some hypothesis in G. When a new positive training instance is presented, set S is generalized to include this training instance; when a new negative training instance is presented, set G is specialized to exclude this training instance. When the Version Space is used to learn concepts from a training set with multiple classes, one class is taken to be positive and all other classes are taken to be negative.
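The boundary-set update described above can be made concrete with a short program. The following is a minimal, illustrative sketch of candidate elimination for conjunctive hypotheses over discrete attributes; it is not the BTDS implementation, and the attribute values and toy examples are invented for illustration.

```python
# Hypotheses are tuples whose entries are either a required attribute value
# or "?" (don't care). S is the most-specific boundary, G the most-general.

def matches(hypothesis, instance):
    """True if the instance satisfies every constraint in the hypothesis."""
    return all(h == "?" or h == v for h, v in zip(hypothesis, instance))

def candidate_elimination(examples, n_attributes):
    """Return the boundary sets (S, G) after processing (instance, label) pairs."""
    S = [None] * n_attributes            # most specific: matches nothing yet
    G = [tuple(["?"] * n_attributes)]    # most general: matches everything

    for instance, positive in examples:
        if positive:
            # Generalize S just enough to cover the positive instance.
            if S[0] is None:
                S = list(instance)
            else:
                S = [s if s == v else "?" for s, v in zip(S, instance)]
            # Drop general hypotheses that fail to cover the positive instance.
            G = [g for g in G if matches(g, instance)]
        else:
            # Specialize G minimally so that it excludes the negative instance.
            new_G = []
            for g in G:
                if not matches(g, instance):
                    new_G.append(g)
                    continue
                for i in range(n_attributes):
                    if g[i] == "?" and S[i] not in (None, "?") and S[i] != instance[i]:
                        new_G.append(tuple(S[i] if j == i else g[j]
                                           for j in range(n_attributes)))
            G = new_G
    return tuple(S), G

if __name__ == "__main__":
    # Toy examples: (location, enhancement) -> does the instance belong to the class?
    data = [
        (("sellar", "homogeneous"), True),
        (("sellar", "marginal"), True),
        (("vermis", "homogeneous"), False),
    ]
    S, G = candidate_elimination(data, n_attributes=2)
    print("S boundary:", S)   # ('sellar', '?')
    print("G boundary:", G)   # [('sellar', '?')]
```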
4.2.2. Machine-Learning Tool: ID3. In 1983, Quinlan proposed the ID3 learning algorithm, which tries to form a decision tree from a set of training instances (Quinlan, 1986). ID3 uses the heuristic of minimizing "entropy" in determining which attribute should be selected next in the decision tree. If attribute A has m values (i.e. A1, A2, ..., Am) and the training set having attribute value Ai can be partitioned into n_i^+ positive training instances and n_i^- negative training instances, then the entropy of choosing A as the next attribute is calculated according to the following formula:

$$E = -\sum_{i=1}^{m}\left[\frac{n_i^{+}}{n_i^{+}+n_i^{-}}\log_2\frac{n_i^{+}}{n_i^{+}+n_i^{-}}+\frac{n_i^{-}}{n_i^{+}+n_i^{-}}\log_2\frac{n_i^{-}}{n_i^{+}+n_i^{-}}\right]$$

Among all the feasible attributes, the one that entails the least entropy is chosen as the next attribute. The same procedure is repeated until each terminal node in the decision tree contains only training instances of the same class.

4.2.3. Machine-Learning Tool: PRISM. The PRISM learning algorithm maximizes information gain, instead of minimizing entropy, in inducing modular rules (Cendrowska, 1987). Attribute-value pairs (selectors), in terms of information theory, can be thought of as discrete messages. Given a message i, the amount of information gain about an event is defined as:

$$I(i)=\log_2\left[\frac{\text{probability of the event after } i \text{ is received}}{\text{probability of the event before } i \text{ is received}}\right]$$

A selector (message) that provides more information gain is then chosen to describe a class with a higher priority. The task of the PRISM learning algorithm is to find the selector that contributes the most information gain about a specified classification, that is, the selector for which this quantity is maximal. The major difference between PRISM and ID3 is that PRISM concentrates on finding only relevant attribute-value pairs, while ID3 is concerned with finding only the attribute that is the most relevant overall, even though some values of that attribute may be irrelevant.

4.2.4. Machine-Learning Tool: AQR. AQR is an induction algorithm for generating a set of classification rules (Michalski, 1980). When building decision rules, AQR performs a heuristic search through the hypothesis space to find the rules that account for all positive examples and no negative examples. AQR processes the training examples in stages; each stage generates a single rule, and then removes the examples it covers from the training set. This step is repeated until enough rules have been found to cover all the examples in the chosen class.

4.3. Knowledge-Integration Module

The knowledge-integration module exploits all the available knowledge in the knowledge-acquisition module and the machine-learning module to construct a system with good performance. Some benefits of integrating multiple knowledge sources in developing an expert system are described below (Medsker, 1995).

(1) Knowledge acquired from different sources has good validity;
(2) domain knowledge is better understood from consensus among different knowledge sources;
(3) integrated knowledge can deal with more complex problems;
(4) knowledge integration may improve the performance of the knowledge base.

Since the opinions of different domain experts differ, the knowledge derived from each expert will be different, too. A similar problem also arises when separate knowledge sets are generated by individual learning methods. These various knowledge sets must be merged into a comprehensive knowledge base for the system to perform well. In the process, however, incompleteness, redundancy and inconsistency often arise.
Removing such incompleteness, redundancy and inconsistency during knowledge integration is thus very important in developing a good brain tumour diagnostic system. The knowledge-integration module uses the genetic algorithm (Holland, 1975) as its integration engine to effectively integrate knowledge from multiple sources and rapidly construct a knowledge base. Here, we assume that all knowledge derived from the knowledge-acquisition and machine-learning modules is represented by rules, since almost all knowledge derived by knowledge-acquisition tools or induced by machine-learning methods may easily be translated into or represented by rules.

[FIGURE 4. The flow chart of knowledge integration: doctors, historic documents and actual instances feed the knowledge-acquisition and machine-learning modules; the resulting rule sets are encoded, integrated by the knowledge-integration module, and decoded into the final knowledge base.]

The flow chart for knowledge input and knowledge integration is shown in Fig. 4. In the knowledge-input stage, knowledge is acquired from various experts or induced from different training sets, and is represented as rule sets. In the knowledge-integration stage, each rule set is encoded into a bit string. The knowledge-integration module maintains a population of possible rule sets (bit strings) and uses the genetic algorithm to search automatically for the best integrated rule set to use as the knowledge base (Liao, 1995).

The knowledge integration consists of three steps: encoding, integration and decoding. The encoding step transforms each rule set into a bit string. The integration step chooses bit-string rule sets for "mating", gradually creating good offspring. The offspring then undergo recursive "evolution" until an optimal or a nearly optimal individual is found (Fig. 5). The decoding step then transforms the optimal or nearly optimal offspring into the form of rules.

Since rule sets generated from different knowledge sources may vary in size, and rule-set sizes may not be known beforehand, using an appropriate data structure to encode rule sets is very important. In our system, variable-length bit strings are used to represent rule sets (De Jong, 1988). An example is given below.

Example. Assume that two classes {C1, C2} in rule set RS are to be distinguished using three features {F1, F2, F3}. Assume Feature F1 has three possible values {f11, f12, f13}, Feature F2 has four possible values {f21, f22, f23, f24}, and Feature F3 has three possible values {f31, f32, f33}. Also assume that the rule set RS has only two rules:

R1: If (F1 = f12) and (F2 = f21) then Class is C1;
R2: If (F1 = f11) and (F3 = f32) then Class is C2.

After encoding, the above rules are respectively
represented as follows:

        F1    F2     F3    Class
R1      010   1000   111   10
R2      100   1111   010   01

Finally, rule set RS is encoded into a chromosome:

RS = 010100011110 100111101001

with the two rule-head points marking the start of the segments that encode R1 and R2, respectively.

[FIGURE 5. The knowledge-integration procedure: an initial population of chromosomes (encoded rule sets) evolves from generation 0 to generation k under knowledge encoding and the genetic operators; the best chromosome is then decoded into the rule base.]

Four genetic operators, dynamic crossover, mutation, fusion and fission, are applied to the rule-set population during knowledge integration (Liao, 1995). The dynamic crossover operator takes two parent chromosomes and swaps parts of their genetic information to produce offspring chromosomes. Unlike the conventional crossover operator, the dynamic crossover operator selects crossover points that need not be at the same positions on both parent chromosomes; instead, the crossover points are at positions the same distance from the rule-head points. The mutation operator randomly changes some elements in a selected rule set to help the integration process escape from local-optimum "traps". The fusion operator checks and eliminates rule redundancy and subsumption relationships using an "OR" operation: if the string resulting from an "OR" operation on two rules is the same as one of the two rules, then a redundancy or subsumption relationship exists between the two rules. The fission operator selects the "closest" near-miss (Winston, 1992) rule to eliminate misclassifications and contradictions.

In order to evaluate the fitness of an integrated rule set, an evaluation function is defined. The evaluation function considers two factors: accuracy and complexity. Here, "complexity" is evaluated by the ratio of rule increase in the integrated rule set, and "accuracy" is evaluated by the degree to which the integrated rule set can correctly classify test instances. Accuracy and complexity are then combined to represent the fitness value of the rule set. The evaluation results are then fed back to the genetic algorithm to control how the solution space is searched, to promote the quality of the rule sets.

5. KNOWLEDGE-INFERENCE UNIT

Using the knowledge-integration approach proposed above, an integrated set of rules can be formed from multiple knowledge sources. These rules comprise a knowledge base for brain tumor diagnosis.
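Before listing the rules themselves, the encoding example of the preceding section can be reproduced with a few lines of code. The sketch below is illustrative only and is not the BTDS implementation; it assumes the feature and class definitions of the example, emitting one bit per feature value (all ones for an unused feature) followed by one bit per class.

```python
# A minimal sketch of the variable-length bit-string encoding of rule sets.
FEATURES = {"F1": ["f11", "f12", "f13"],
            "F2": ["f21", "f22", "f23", "f24"],
            "F3": ["f31", "f32", "f33"]}
CLASSES = ["C1", "C2"]

def encode_rule(conditions, rule_class):
    """conditions: dict feature -> required value; rule_class: class label."""
    bits = []
    for feature, values in FEATURES.items():
        if feature in conditions:
            bits += ["1" if v == conditions[feature] else "0" for v in values]
        else:
            bits += ["1"] * len(values)          # unused feature: don't care
    bits += ["1" if c == rule_class else "0" for c in CLASSES]
    return "".join(bits)

def encode_rule_set(rules):
    """Concatenate the encoded rules into one chromosome string."""
    return "".join(encode_rule(cond, cls) for cond, cls in rules)

if __name__ == "__main__":
    rs = [({"F1": "f12", "F2": "f21"}, "C1"),    # R1
          ({"F1": "f11", "F3": "f32"}, "C2")]    # R2
    print(encode_rule_set(rs))  # 010100011110 followed by 100111101001
```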
Some rules in the knowledge base are described below:

rule 1: IF Appearance_of_Enhancement = "Homogeneous" and Location = "Brain parenchyma, temporal" THEN Pathology is Meningioma
rule 2: IF Appearance_of_Enhancement = "Moderate regular marginal" and Location = "Brain parenchyma, temporal" THEN Pathology is Astrocytoma
rule 3: IF Edema <= "1/2 hemisphere" and Appearance_of_Enhancement = "Mural nodule" and Location = "Brain parenchyma, temporal" THEN Pathology is Anaplastic Protoplasmic Astrocytoma
rule 4: IF Appearance_of_Enhancement = "Homogeneous with lucency inside" and Location = "Brain parenchyma, temporal" THEN Pathology is Glioblastoma
rule 5: IF Appearance_of_Enhancement = "Moderate regular marginal" and Location = "Brain parenchyma, parietal" THEN Pathology is Anaplastic Protoplasmic Astrocytoma
rule 6: IF Appearance_of_Enhancement = "Homogeneous with lucency inside" and Location = "Brain parenchyma, parietal" THEN Pathology is Glioblastoma
rule 7: IF Precontrast = "Iso" and Location = "Brain parenchyma, occipital" THEN Pathology is Meningioma
rule 8: IF Bone_Change = "Sellar enlargement" and Location = "Sellar and suprasellar" THEN Pathology is Pituitary Adenoma
rule 9: IF Bone_Change = "Bony erosion" and Location = "Sellar and suprasellar" THEN Pathology is Meningioma
rule 10: IF Precontrast = "Low" and Appearance_of_Enhancement = "Grossly irregular" and Location = "Cerebellum, vermis" THEN Pathology is Astrocytoma
rule 11: IF Precontrast = "High" and Appearance_of_Enhancement = "Grossly irregular" and Location = "Cerebellum, vermis" THEN Pathology is Medulloblastoma
rule 12: IF Appearance_of_Enhancement = "Homogeneous with lucency inside" and Location = "Cerebellum, vermis" THEN Pathology is Medulloblastoma

In the diagnostic process, BTDS can assist doctors in determining brain tumor etiologies according to the features extracted from computer tomography pictures. Doctors first inspect the patient's symptoms and input the symptoms as facts into the diagnostic system. The inference engine then searches for diagnostic rules that match the patient's symptoms, and suggests a pathology.

[FIGURE 6. The knowledge refinement process: a wrongly classified input event is encoded and appended to the current best chromosome and to the test events; the genetic adaptive search (knowledge integration) over the new population then refines the current knowledge base into a new knowledge base.]

6. KNOWLEDGE-REFINEMENT UNIT

A knowledge base integrated from multiple knowledge sources is often only a prototype, with an unsatisfactory classification accuracy. During the inference process, rules in a knowledge base must be refined to improve the effectiveness of the knowledge-base system. In this section, a knowledge-refinement scheme is proposed to refine rules during the inference process.

The knowledge-refinement unit uses the knowledge-integration procedure as the basis for refining knowledge. A flow chart for the refinement process is shown in Fig. 6. During inference, an input event wrongly classified by the current knowledge base is appended to the set of test instances. It is also encoded as a bit string and appended to the current best rule set.
The new test set, including the wrongly classified element, is then presented to the genetic adaptive search algorithm to evaluate rule sets for a new population. The refinement process works until the exception events can be correctly classified by the knowledge base, making the new knowledge base more accurate than the old one.

7. IMPLEMENTATION

The brain tumor diagnostic system was implemented in the C language on a SUN SPARC/2 workstation. Ten initial knowledge items (rule sets) were obtained from different groups of experts using the knowledge-acquisition module, or derived from historical documents or current records of actual instances via machine-learning methods. The knowledge-integration module automatically integrated the ten initial rule sets into a comprehensive knowledge base. 348 real brain tumour cases were used to evaluate the performance of the knowledge base. After 2000 execution generations of the genetic algorithm, an accuracy rate of 91.42% was obtained, with 92 rules in the resulting knowledge base. The knowledge base must be continuously refined to improve the accuracy if misclassification occurs. These rules were then refined during the process of inference. Finally, an accuracy of 95.58% was achieved, with 103 rules in the resulting knowledge base.

8. CONCLUSIONS

This paper presents the design of a self-integrating knowledge-based brain tumor diagnostic system. The brain tumor diagnostic system proposed consists of three main units: knowledge building, knowledge inference, and knowledge refinement. Genetic techniques are also shown here to be good tools for knowledge integration and knowledge refinement. The system was successfully implemented on a Sun SPARC/2 workstation. 348 real brain tumor cases were used to evaluate the performance of the brain tumor diagnostic system, with a classification accuracy higher than 95%. We may then conclude that the brain tumor diagnostic system is a successful medical system.

Acknowledgements—The authors would like to thank Dr M. M. H. Tseng and Dr O. Y. Guo of VGH for their advice about brain tumors.

REFERENCES

Baral, C., Kraus, S. & Minker, J. (1991). Combining multiple knowledge bases. IEEE Transactions on Knowledge and Data Engineering, 3, 208-220.
Cendrowska, J. (1987). PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27, 349-370.
De Jong, K. A. (1988). Learning with genetic algorithm: An overview. Machine Learning, 3, 121-138.
Gaines, B. R. (1989). Integration issues in knowledge supports systems. International Journal of Man-Machine Studies, 26, 497-515.
Gragun, B. J. & Studel, H. J. (1987). A decision-table based processor for checking completeness and consistency in rule-base expert systems. International Journal of Man-Machine Studies, 26, 633-648.
Gray, N. A. B. (1990). Capturing knowledge through top-down induction of decision trees. IEEE Expert, 41-50.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.
Hwang, G. J. & Tseng, S. S. (1990). EMCUD: A knowledge acquisition method which captures embedded meanings under uncertainty. International Journal of Man-Machine Studies, 33, 431-451.
Kelly, G. A. (1955). The psychology of personal constructs. New York: Norton.
Liao, C. M. (1995). Using genetic algorithm technique for integrating multiple rule-sets. M.Sc. thesis, NCTU, Hsinchu, Taiwan.
Medsker, L., Tan, M. & Turban, E. (1995).
Knowledge acquisition from multiple experts: Problems and issues. Expert Systems With Ching Hung Wang et aL Applications, 9, 35-40. Michalski, R. S. & Chilausky, R. (1980). Learning by being told and learning from examples: An experimental comparison of the two methods of knowledge acquisition in the context of developing an expert system for soybean disease diagnosis. International Journal o f Policy Analysis and Information Systems, 4, 125-160. Mitchell, T. M, (1982). Generalization as search. Artificial Intelligence, 18, 203-226. Quinlan, J. (1986). Induction of decision tree. Machine Learning, 1, 81-106. Wang, C. H., Tseng, S. S. & Hong, T. P. (1995). Design of a self- adaptive brain tumor diagnostic system. Journal of Information Science and Engineering, 11, 275-294. Waterman, D. (1986). A guide to expert systems. Reading, MA: Addison-Wesley. Wills, K., Teather, D. & Innocent, P. (1982). An expert system for medical diagnosis of brain tumors. International Journal of Man- Machine Studies, 16, 341-349. Winston, P. H. (1992). Artificial intelligence (3rd edn). Reading,/VIA: Addison-Wesley. work_27g256ct6zf4fiivjunlac54um ---- pf-mim1 Toward Normative Expert Systems: Part I The Pathfinder Project David E. Heckerman Departments of Computer Science and Pathology University of Southern California HMR 204, 2025 Zonal Ave Los Angeles, CA 90033 Eric J. Horvitz Palo Alto Laboratory Rockwell International Science Center 444 High Street Palo Alto, California 94301 Bharat N. Nathwani Department of Pathology University of Southern California HMR 204, 2025 Zonal Ave Los Angeles, CA 90033 To appear in Methods of Information in Medicine, 1992 1 Abstract Pathfinder is an expert system that assists surgical pathologists with the diagnosis of lymph-node diseases. The program is one of a growing number of normative expert systems that use probability and decision theory to acquire, represent, manipulate, and explain uncertain medical knowledge. In this article, we describe Pathfinder and our research in uncertain-reasoning paradigms that was stimulated by the development of the program. We discuss limitations with early decision-theoretic methods for reason- ing under uncertainty and our initial attempts to use non-decision-theoretic methods. Then, we describe experimental and theoretical results that directed us to return to reasoning methods based in probability and decision theory. Keywords: expert systems, decision making, diagnosis, probability theory, decision theory, artificial intelligence, pathology 2 1 Introduction Decision-theoretic or normative expert systems have the potential to provide better decision support than do traditional expert systems in problem areas or domains where the accurate management of uncertainty is important. This potential for improvement arises because people, including experts, make mistakes when they make decisions under uncertainty. That is, people often deviate from the rules of decision theory, which provides a set of compelling principles or desiderata for how people should behave when reasoning or making decisions under uncertainty. Decision theory includes the rules of probability and the principle that a person should always choose the alternative that maximizes his expected utility. Traditional expert systems provide decision support by mimicking the recommendations of experts. They do so by managing uncertainty with non-decision-theoretic methods. 
Such systems are valuable, because they provide important information to a nonexpert who is confronted with a confusing decision, and because they offer reminders to users who may be stressed or fatigued. Nonetheless, they tend to duplicate the errors made by experts. In contrast, normative expert systems use decision theory to manage uncertainty. The word “normative” comes from decision analysts and cognitive psychologists who emphasize the importance of distinguishing between normative behavior, which is what we do when we follow the desiderata of decision theory, and descriptive behavior, which is what we do when unaided by these desiderata. By encoding expert knowledge in a decision-theoretic framework, we can reduce errors in reasoning, and thereby build expert systems that offer recommendations of higher quality. In this article, we describe Pathfinder, a normative expert system that assists surgical pathologists with the diagnosis of lymph-node diseases [1, 2]. The Pathfinder project began in 1983 as a joint project among researchers at Stanford University (David Heckerman, Eric Horvitz, and Larry Fagan) and the University of Southern California (Bharat Nathwani—the primary pathology expert—and Keung-Chi Ng) [3]. Currently, a commercial derivative of Pathfinder, called Intellipath, is being used by practicing pathologists and by pathologists in training as an educational tool [4]. Also in this article, we discuss the importance of the proper management of uncertainty for diagnosis of lymph-node diseases; and we discuss our research in uncertain-reasoning paradigms that was stimulated by the development of Pathfinder. In particular, we examine practical limitations with early decision-theoretic methods for reasoning under uncertainty and our initial attempts to overcome these limitations through the use of non-decision- theoretic reasoning paradigms. Then, we describe experiments with these non-decision- theoretic approaches as well as theoretical analyses that directed us to return to a method- ology based in probability and decision theory. In the companion to this article, we describe the decision-theoretic representations that we developed to make practical the construction of a normative version of Pathfinder. 2 Diagnosis in Surgical Pathology and Pathfinder Surgical pathologists perform diagnosis primarily by examining sections of tissue microscop- ically. Sometimes, pathologists also incorporate clinical, radiology, and laboratory informa- 3 tion, and examine tissue with expensive tests derived from immunology, microbiology, and cell-kinetics research. Based on this information, the pathologist provides a diagnosis to the surgeons and oncologists who participate in the patient’s treatment. That is, the pathologist tells these physicians, “the patient has disease x.” The well-being of patients depends strongly on the accuracy of the pathologist’s diag- nosis. In the case of lymph-node diagnosis, for example, let us suppose that the patient has Hodgkin’s disease, a malignant disease, but that the pathologist makes a diagnosis of mononucleosis, a benign disease that can resemble Hodgkin’s disease. In this situation, the patient’s chance of death is significantly greater than it would have been had the diagno- sis been correct, because he does not receive immediate treatment for his malignancy. In contrast, let us suppose that the patient has mononucleosis, and that the pathologist makes a diagnosis of Hodgkin’s disease. 
In this case, the patient likely will undergo expensive, painful, and debilitating treatment, to be “cured,” only because he never had the malignant disease in the first place. A general pathologist performs diagnosis on tissue sections from all parts of the body. When a general pathologist has difficulty with diagnosis, he frequently refers the case to a subspecialist, who has expertise in the diagnosis of a particular tissue type. This referral process usually incurs both a delay in diagnosis and an extra cost. Sometimes, the delay in diagnosis is unacceptable, and the pathologist cannot refer the case to a subspecialist. For example, surgeons often rely on pathologists for the timely diagnosis of disease in frozen tissue taken from patients under anesthesia [5, 6]. The subspecialty of lymph-node diagnosis is one of the most difficult areas in surgical pathology [7, 8, 9, 10]. For example, one multisite oncology study analyzed almost 9000 cases of malignant lymphoma. The study found that although experts show agreement with one another, the diagnoses rendered by general pathologists for certain diseases had to be changed by expert lymph-node pathologists in as many as 65 percent of the cases [10]. Our goal in building Pathfinder is to close the wide gap between the quality of lymph- node diagnoses made by general pathologists and those made by subspecialists. We hope to increase the accuracy of in-house pathology diagnoses, to reduce the frequency of referrals, and to assist pathologists with intraoperative diagnosis when there is insufficient time for expert consultation. Pathologists have difficulty with diagnosis for two reasons. First, they may misrecognize or fail to recognize microscopic features. Second, they may combine evidence inaccurately to form a diagnosis. The second problem arises because the pathologist must consider an enormous number of features and diseases, and because the relationships among diseases and features are uncertain. Most of Pathfinder research has concentrated on the evidence- combination problem. That is, we have worked to develop an expert system that can help pathologists cope with the many uncertain relationships in the diagnosis of lymph-node pathology. Indeed, Pathfinder reasons about more than 60 diseases that can invade the lymph node (25 benign diseases, 9 Hodgkin’s lymphomas, 18 non-Hodgkin’s lymphomas, and 10 metastatic diseases), using more than 130 microscopic, clinical, laboratory, immunologic, and molecular-biologic features. Similarly, in this article, we concentrate on the problem of managing uncertainty in large domains. Nevertheless, as we mention in Section 9, we also have addressed the feature-recognition problem. 4 3 A Pathfinder Dialog In rendering a diagnosis, a pathologist (1) identifies and quantifies features; (2) constructs a differential diagnosis, a set of diseases consistent with the observations; and (3) decides what additional features to evaluate and what costly tests to employ to narrow the differential diagnosis. He repeats these steps until he has observed all useful features. This procedure is called the hypothetico-deductive approach [11, 12, 13, 14]. Cognitive psychologists have found that physicians frequently employ this approach in performing clinical diagnosis [12, 14]. Pathfinder uses this same method, summarized in Figure 1, to assist pathologists with their task of diagnosis. Associated with each feature are two or more mutually exclusive and exhaustive instances. 
For example, the feature NECROSIS is associated with the instances ABSENT, PRESENT, and PROMINENT. The Pathfinder system allows a user to report instances for one or more salient features of a lymph-node section. Given these feature–instance pairs, the system displays a differential diagnosis ordered by likelihood of diseases. In response to a query from the user, Pathfinder recommends a set of features that are the most cost effective for narrowing the differential diagnosis. The pathologist can answer one or more of the recommended questions. This process continues until the differential diagnosis is a single disease, there are no additional tests or questions, or a pathologist determines that the informational benefits are not worth the costs of further observations or tests. The operation of the latest version of Pathfinder is illustrated by the set of screen photos in Figure 2. Figure 2(a) shows the initial Pathfinder screen. The FEATURE CATEGORY window displays the categories of features that are known to the system, the OBSERVED FEATURES window displays feature–instance pairs that will be observed by the pathologist, and the DIFFERENTIAL DIAGNOSIS window displays the list of possible diseases and their probabilities. The probabilities in Figure 2(a) are the prior probabilities of disease—the probabilities for disease given only that a patient’s lymph node has been removed and is being examined. If the user selects (double-clicks) the feature category SPHERICAL FEATURES, then Pathfinder displays a list of features for that category. To enter a particular feature, the user double-clicks on that feature, and then selects one of the mutually exclusive and ex- haustive instances for that feature. For example, Figure 2(b) shows what happens when the user selects the feature F % AREA (percent area of the lymph-node section that is occupied by follicles). In the figure, a third window appears that lists the instances for this feature: NA (not applicable), 1–10%, 11–50%, 51–75%, 76–90%, and >90%. Figure 2(c) shows the result of selecting the last instance for this feature. In particular, the feature–instance F % AREA: >90% appears in the middle column, and the differential diagnosis is revised, based on this observation. As we mentioned, the user can continue to enter any number of features of his own selection. Figure 2(d) shows the Pathfinder screen after the user has reported that follicles are in a back-to-back arrangement and show prominent polarity. Alternatively, the user can ask the program to recommend additional features for observation. Figure 2(e) shows that the most cost-effective feature to evaluate, given the current differential diagnosis, is monocytoid cells. If the user observes that monocytoid cells are prominent, then we obtain the differential diagnosis in Figure 2(f). In this case, the four features in the middle column have narrowed the differential diagnosis to a single disease: the early phase of AIDS. 5 Evidence-Gathering Decisions Salient Features Diagnosis Continue? Differential Diagnosis Figure 1: Hypothetico-deductive reasoning in Pathfinder. First, the pathologist reports instances of salient features to the system. The system then constructs a differential diagnosis—a list of hypotheses that are consistent with the observations, and an assignment of likelihood to each such hypothesis. Next, the system analyzes the current differential di- agnosis to identify the most useful features for the pathologist to observe. 
The process cycles until the differential diagnosis is narrowed to a single disease, there are no additional tests or questions, or the pathologist determines that the informational benefits are not worth the costs of further observations or tests. 6 (a) (b) (c) (d) (e) (f) Figure 2: A Pathfinder consultation. (a) Initially, Pathfinder displays (from left to right) the categories of features, an empty window that will contain feature-instance pairs reported by the user, and the prior probabilities of disease. (b) Double-clicking on the category SPHERICAL STRUCTURES and then on the feature F % AREA, the pathologist prepares to report to Pathfinder the percent area occupied by follicles. (c) Double-clicking on the instance >90%, the pathologist reports that more than 90% of the lymph-node is occupied by follicles. In response, the program produces a differential diagnosis in the right-hand window. (d) The pathologist now reports that follicles are in a back-to-back arrangement and show polarity. Pathfinder revises the differential diagnosis. (e) The pathologist has asked Pathfinder to display features that are useful for narrowing the differential diagnosis. The program displays the four most cost-effective features for the user to observe next. The most useful feature is monocytoid cells. (f) The user now reports that monocytoid cells are prominent. Pathfinder determines that only a single disease—AIDS EARLY (the early phase of AIDS)—is consistent with the four observations. (Adapted with permission from D. Heckerman, Probabilistic Similarity Networks, MIT Press, Cambridge, MA, 1991.)7 Figure 3: A graphical justification for the recommendation of MONOCYTOID CELLS. For each instance of the feature, the length and direction of a bar reflects the change in the probability of AIDS EARLY relative to that of FLORID FOLLIC HYPERP, given the observation of that feature–instance pair. The justification also includes the monetary cost of observing the feature. (Taken with permission from D. Heckerman, Probabilistic Similarity Networks, MIT Press, Cambridge, MA, 1991.) Pathfinder explains graphically its recommendations for additional observations. A bitmap of Pathfinder’s graphical justification of the diagnostic utility of the feature mono- cytoid cells is displayed in Figure 3. In this explanation, Pathfinder displays the change in probability of the two most likely hypotheses given the observation of each possible instance of the feature. The graph indicates that if monocytoid cells are absent, then the probability of FLORID FOLLIC HYPERP relative to that of AIDS EARLY increases slightly; if monocytoid cells are present or prominent, then the probability of FLORID FOLLIC HYPERP relative to that of AIDS EARLY decreases greatly; and if monocytoid cells show confluence, then the probability of FLORID FOLLIC HYPERP relative to that of AIDS EARLY remains unchanged. By glancing at this graph, we can see that this feature is useful for discriminating these two diseases. The window also displays the monetary cost of evaluating the feature, which is negligible in this case. 8 4 Decision-Theoretic Computations in Pathfinder Both an early version and the latest version of Pathfinder employ decision-theoretic compu- tations to assist pathologists with diagnosis. In this section, we examine these computations. 
In particular, we examine how Pathfinder (1) uses probabilistic inference to generate a dif- ferential diagnosis, (2) uses decision theory to recommend a diagnosis, and (3) uses decision theory to recommend features for observation. First, however, let us discuss some funda- mentals of probability and decision theory. 4.1 Probability Theory Probability theory has roots, more than 3 centuries ago, in the work of Bernoulli, Laplace, Fermat, and Pascal [15]. The theory describes how to infer the probability of one event from the probability of related events. The prevalent conception of the probability of some event x is that it is a measure of the frequency with which x occurs, when we repeat many times an experiment that has x as a possible outcome. A more general notion, however, is that the probability of x represents the degree of belief held by a person that the event x will occur in a single experiment. If a person assigns a probability of 1 to x, then he believes with certainty that x will occur. If he assigns a probability of 0 to x, then he believes with certainty that x will not happen. If he assigns a probability of between 0 and 1 to x, then he is to some degree unsure about whether or not x will occur. The interpretation of a probability as a frequency in a series of repeated experiments traditionally is referred to as the objective or frequentist interpretation. In contrast, the interpretation of a probability as a degree of belief is called the subjective or Bayesian in- terpretation, in honor of the Reverend Thomas Bayes, a scientist from the mid-1700s who helped to pioneer the theory of probabilistic inference [16, 15]. Both interpretations follow the same set of mathematical rules. In the Bayesian interpretation, a probability or belief will always depend on the state of knowledge of the person who provides that probability. For example, if we were to give someone a coin, he would likely assign a probability of 1/2 to the event that the coin would show heads on the next toss. If, however, we convinced that person that the coin was weighted in favor of heads, he would assign a higher probability to the event. Thus, we write the probability of x as p (x|ξ), which is read as the probability of x given ξ. The symbol ξ represents the state of knowledge or background knowledge of the person who provides the probability. The conception of probability as a measure of personal belief is central to research on the use of probability and decision theory for representing and reasoning with expert knowl- edge in computer-based reasoning systems. There is usually no alternative to acquiring from experts the bulk of probabilistic information used in an expert system. For example, there are more than 14 thousand probabilities in the latest version of Pathfinder; and some of these probabilities are on the order of 10−6. Thus, performing the experiments necessary to determine objective probabilities for Pathfinder would entail much time and great ex- pense. Fortunately, when experimental data is available, the Bayesian approach provides a mechanism for expert systems to update their probabilities, given this data [17, 18, 19]. 9 4.2 Decision Theory and Utility Assessment Decision theory extends the Bayesian interpretation of probability theory, and prescribes how a decision maker should choose among a set of alternatives or actions, given his utility or pref- erence for each possible outcome and his belief that each outcome will occur. 
In particular, decision theory includes the rules of probability theory and the maximum-expected-utility (MEU) principle, which states that a decision maker should always choose the alternative that maximizes his expected utility [20]. Utility assessment is nontrivial and is the subject of many debates. In this article, we mention only a few important points concerning utility assessment for Pathfinder. The interested reader should consult more general discussions by Keeney and Raiffa [21], McNeil et al. [22], and Howard [23]. For each disease pair (dj, dk) in Pathfinder, we encode the utility u(dj, dk), which sum- marizes the preferences of the decision maker for the situation in which a patient has disease dj, but is diagnosed as having disease dk. Factors that influence such preferences include the length of the patient’s expected life, the pain associated with treatment and with the disease itself, the psychological trauma to the patient and his family, and the monetary cost associated with treatment and with disability. An important consideration in the assessment of these (and any other) utilities is: Who is the decision maker? From our perspective, a pathologist is only a provider of information. Thus, the u(dj, dk) in the utility model of a computer-based diagnostic system should reflect the patient’s preferences. For example, consider the situation where a pathologist believes, after reviewing a case, that the probability of the benign infection mononucleosis is 0.9, and that the probability of Hodgkin’s disease is 0.1. Should the patient be treated for Hodgkin’s disease now, or should he wait for more definitive diagnostic signs to develop? As we discussed, delaying treatment of Hodgkin’s disease decreases the chances of long-term survival if the patient has this condition. On the other hand, the treatment for Hodgkin’s disease is highly invasive. The decision about therapy will depend on how the patient feels about the alternative outcomes. Different patients may have dramatically different preferences. As we discuss in Sections 4.4 and 4.5, differences in patient preferences can in principle affect recommendations made by an expert system for diagnosis. Thus, utility assessment poses a fundamental problem to any researcher who wants to develop such expert systems. Specifically, whenever a patient case is processed by an expert system, the system or a decision analyst should assess the utilities of that patient and provide these utilities to the system. Such utility assessment would be extremely time consuming and expensive. As we see in Sections 4.4 and 4.5, however, only Pathfinder’s diagnostic recommendations and not its recommendations for evidence gathering are sensitive to patient utilities. Thus, by allowing Pathfinder to make only evidence-gathering recommendations, we render the program’s recommendations insensitive to patient utilities. We can therefore encode in Pathfinder the utilities u(dj, dk) from one representative patient. To construct Pathfinder’s utility model, we assessed the utilities of Bharat Nathwani, the primary Pathfinder expert. We found it relatively easy to assess his utilities, because he was familiar with the ramifications of many specific correct and incorrect diagnoses. Another important consideration in utility assessment is the wide range of severities as- 10 sociated with outcomes. 
For example, if a patient has a viral infection and is incorrectly diagnosed as having cat-scratch disease—a disease caused by an organism that is killed with antibiotics—the consequences are not severe. In fact, the only non-negligible consequence is that the patient will take antibiotics unnecessarily for several weeks. If a patient has Hodgkin’s disease and is incorrectly diagnosed as having mononucleosis, however, the con- sequences are often lethal. It is important for us to measure preferences across such a wide range, because sometimes we must balance a large chance of a small loss with a small chance of a large loss. For example, even though the probability that a patient has syphilis is small—say, 0.001— treatment with antibiotics may be appropriate, because the patient may prefer the harmful effects of antibiotics to the small chance of the harmful effects of untreated disease. Early attempts to assess preferences for both minor and major outcomes in the same unit of measurement were fraught with paradoxes. For example, in a linear willingness-to-pay approach, a decision maker might be asked, “How much would you have to be paid to accept a one in ten-thousand chance of death?” If the decision maker answered, say, $1000, then the approach would dictate absurdly that he would be willing to be killed for $10 million. Howard (1980) constructed an approach that avoids many of the paradoxes of earlier models. Like several of its predecessors, the model determines what an individual is willing to pay to avoid a given chance of death, and what he is willing to be paid to assume a given chance of death. Also, like many of its predecessors, Howard’s model shows that, for small risks of death (typically, p < 0.001), the amount someone is willing to pay to avoid, or is willing to be paid to assume, such a risk is linear in p. That is, for small risks of death, an individual acts as would an expected-value decision maker with a finite value of life, called the small-risk value of life. For significant risks of death, however, the model deviates strongly from linearity. For example, the model shows that there is a maximum probability of death, beyond which an individual will accept no amount of money to risk that chance of death. Most people find this result to be intuitive.1 To use this model, we first determined Bharat Nathwani’s small-risk value of life. When asked what dollar amount he would be willing to pay to avoid chances of death ranging from 1 in 20 to 1 in 1000, he was consistent with the linear model to within a factor of 2, with a median small-risk value of life equal to $20 million (in 1988 dollars). To make the application of the model more convenient, we used Howard’s definition of a micromort: one–in–1-million chance of death [24]. In these units, the Pathfinder expert’s small-risk value of life was $20 per micromort.2 Given this small-risk value of life, we could then measure his preferences for major and minor outcomes in a common unit: 1 minus the probability of immediate, painless death that he was willing to accept to avoid a given outcome and to be once again healthy. 
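As a small illustration of this common unit, the conversion from a willingness-to-pay answer to a utility can be written directly. The sketch below assumes the $20-per-micromort small-risk value of life reported above, is valid only for the small risks for which the linear model holds, and uses a function name of our own; it is not the Pathfinder code.

```python
# Convert a willingness-to-pay answer for a minor outcome into a utility
# expressed as one minus an equivalent probability of death.
DOLLARS_PER_MICROMORT = 20.0      # the expert's small-risk value of life

def utility_from_willingness_to_pay(dollars):
    """Utility of a minor outcome the expert would pay `dollars` to avoid.
    Valid only for small equivalent risks, where the linear model applies."""
    micromorts = dollars / DOLLARS_PER_MICROMORT
    return 1.0 - micromorts * 1e-6

if __name__ == "__main__":
    # An answer of $100 corresponds to 5 micromorts, i.e. a utility of 0.999995.
    print(utility_from_willingness_to_pay(100))
```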
In particular, we assessed his preferences for minor outcomes with willingness-to-pay questions, such as "How much would you be willing to pay to avoid taking antibiotics for two weeks?" We then translated these answers, via the linearity result of Howard's model, to units of probability of death.

[Footnote 1: The result makes several assumptions, such as that the decision maker is not suicidal and is not concerned about how his legacy will affect other people.]
[Footnote 2: In general, the micromort is a useful unit of measurement, because it helps to emphasize that the linear relationship between risk of death and willingness to pay holds for only small probabilities of death.]

For example, an answer of $100 translated to a utility of

$$1 - 5 \text{ micromorts} = 1 - 0.000005 = 0.999995$$

We assessed his preferences for major outcomes directly in units of probability of death. For example, he imagined that he had—say—Hodgkin's disease, and that he had been misdiagnosed as having mononucleosis. He then imagined that there was a magic pill that would rid him of this disease with probability 1 − p, but would kill him, immediately and painlessly, with probability p. He then provided the value of p that made him indifferent between his current situation and the situation in which he takes the pill. The utility of this outcome is 1 − p.

4.3 Construction of a Differential Diagnosis

First, let us examine the problem of differential diagnosis in general. Let m and n denote the number of diseases and features in a medical domain, respectively. Also, let d1, d2, . . . , dm denote the disease entities. For the moment, let us suppose that each disease dj may be present or absent. Let Dk denote some instance of diseases. That is, Dk denotes some assignment of present or absent to each of the diseases d1, d2, . . . , dm. Further, let f1, f2, . . . , fn denote the features in the domain, and let ij denote the observed instance for the jth feature.

Now imagine that a user of a probabilistic expert system for this domain has observed instances for q features. To simplify the notation, let us renumber the n features so that the user has observed instances for the first q features. Typically, the user will want to know the probability of each disease instance, given the observations f1i1, f2i2, . . . , fqiq. This quantity for disease instance Dk is known as the posterior probability of Dk, and is denoted

$$p(D_k \mid f_1i_1, f_2i_2, \ldots, f_qi_q, \xi) \qquad (1)$$

Thus, the number of probabilities of interest is exponential both in the number of observed features and in the number of diseases. In principle, an expert could assess directly these posterior probabilities. Aside from the intractable nature of this task, however, most physicians are more comfortable assessing probabilities in the opposite direction. That is, they are more comfortable assessing the probabilities that the set of observations f1i1, f2i2, . . . , fqiq will appear given a particular disease instance Dk, denoted

$$p(f_1i_1, f_2i_2, \ldots, f_qi_q \mid D_k, \xi) \qquad (2)$$

Using Bayes' theorem, the expert system can compute from these probabilities and the prior probability of disease instances p(Dk|ξ) the desired posterior probabilities

$$p(D_k \mid f_1i_1, \ldots, f_qi_q, \xi) = \frac{p(f_1i_1, \ldots, f_qi_q \mid D_k, \xi)\, p(D_k \mid \xi)}{\sum_{D_l} p(f_1i_1, \ldots, f_qi_q \mid D_l, \xi)\, p(D_l \mid \xi)} \qquad (3)$$

where the sum over Dl runs over all disease instances. Unfortunately, this approach to the problem is also intractable, because the number of probabilities of the form $p(f_1i_1, f_2i_2, \ldots, f_qi_q \mid D_k, \xi)$
is exponential both in the number of diseases and in the number of features.

To manage the complexity of the general case, researchers who built the first probabilistic expert systems made two assumptions. First, they supposed that all findings were conditionally independent, given any disease instance. That is, they assumed that, if the true disease state of the patient was known, then the likelihood of seeing any observation fkik did not depend on observations made about any other features. Thus,

$$p(f_ji_j \mid D_k, f_1i_1, \ldots, f_{j-1}i_{j-1}, f_{j+1}i_{j+1}, \ldots, f_qi_q, \xi) = p(f_ji_j \mid D_k, \xi) \qquad (4)$$

Given this assumption, it follows from the rules of probability [25] that

$$p(f_1i_1, f_2i_2, \ldots, f_qi_q \mid D_k, \xi) = p(f_1i_1 \mid D_k, \xi)\, p(f_2i_2 \mid D_k, \xi) \cdots p(f_qi_q \mid D_k, \xi) \qquad (5)$$

Second, these researchers supposed that the traditional disease entities were mutually exclusive and exhaustive. That is, they assumed that each disease instance corresponded to a situation where only one disease was present. Given these two assumptions, the expert system can compute the posterior probabilities of disease from the tractable computation

$$p(d_k \mid f_1i_1, f_2i_2, \ldots, f_qi_q, \xi) = \frac{p(f_1i_1 \mid d_k, \xi)\, p(f_2i_2 \mid d_k, \xi) \cdots p(f_qi_q \mid d_k, \xi)\, p(d_k \mid \xi)}{\sum_{d_l} p(f_1i_1 \mid d_l, \xi)\, p(f_2i_2 \mid d_l, \xi) \cdots p(f_qi_q \mid d_l, \xi)\, p(d_l \mid \xi)} \qquad (6)$$

where dk represents the disease instance in which only disease dk is present. Thus, only the conditional probabilities p(fjij | dk, ξ) and the prior probabilities p(dk | ξ) are required for the computation. We call any model that employs these two assumptions a simple-Bayes model. Ledley and Lusted proposed this model for medical diagnosis in 1959 [26].

In the domain of lymph-node pathology, the assumption that diseases are mutually exclusive is appropriate, because co-occurring diseases almost always appear in different lymph nodes or in different regions of the same lymph node. Also, the large scope of Pathfinder makes reasonable the assumption that the set of diseases is exhaustive. The assumption of global conditional independence, however, is inaccurate. For example, given certain diseases, finding that follicles are abundant in the tissue section increases greatly the chances that sinuses in the interfollicular areas will be partially or completely destroyed. Nonetheless, to simplify our task, we used the simple-Bayes model to construct the first probabilistic version of Pathfinder. Later, after developing several graphical representation languages that we describe in the companion to this article, we encoded successfully the conditional nonindependence or conditional dependence among the features in the domain. We shall return to this discussion in Section 8.

4.4 Recommendation of a Diagnosis

As we mentioned, a diagnosis is a statement of the form: "The patient has disease x." Sometimes, as we saw in the patient case in Section 3, the posterior probability of one disease will equal 1 and the posterior probability of all other diseases will equal 0. In this case, making a diagnosis is not a decision. Rather, the diagnosis is a consequence of the rules of logic. In most cases, however, observations usually do not narrow the differential diagnosis to a single disease. In these situations, making a diagnosis is a decision: an irrevocable allocation of resources under uncertainty. Using the MEU principle, the system can determine a diagnosis from the probabilities of disease and the utilities u(dj, dk). Let φ denote the set of feature–instance pairs f1i1, f2i2, . . . , fqiq that we have observed thus far.
First, for each diagnosis dk, the system computes eu(dk|φ), the expected utility of that diagnosis given observations φ, using the formula

$$eu(d_k \mid \phi) = \sum_{d_j} p(d_j \mid \phi)\, u(d_j, d_k) \qquad (7)$$

To complete the determination, the system selects the optimal diagnosis, denoted dx(φ), using the equation

$$d_x(\phi) = \mathrm{argmax}_{d_k}\left[eu(d_k \mid \phi)\right] \qquad (8)$$

where the function argmax_dk[·] returns the disease that maximizes its argument.

We do not allow Pathfinder to recommend diagnoses, because we have observed that such recommendations are somewhat sensitive to the utility model. That is, when we change the utilities in the model from values appropriate for one patient to values appropriate for another patient, the program's recommendations can change significantly. By preventing Pathfinder from recommending diagnoses, we hope to encourage a change in the way pathologists and care-providing physicians communicate. In the short term, we hope that pathologists will begin to express clearly—in the language of probability—uncertainty associated with their observations. In the long term, we hope that each physician who is associated with the care of a patient—including the primary physician, the pathologist, the radiologist, the surgeon, the oncologist, and the radiotherapist—and the patient himself will communicate in decision-theoretic terms to determine the best treatment for that patient. Such communication could take place via a shared decision model embodied in an expanded version of Pathfinder.

4.5 Recommendation of Features to Narrow a Differential Diagnosis

Let us now consider how an expert system can use decision theory to recommend features for observation to narrow a differential diagnosis. First, the system enumerates all possible observation strategies. An example of an observation strategy is

    Observe f3. If f3 is present, then observe f2; otherwise, make no further observations and make the diagnosis. If f3 and f2 are present, then observe f7, and make the diagnosis. If f3 is present and f2 is absent, then make the diagnosis.

Next, the system computes the decision maker's expected utility of all strategies, including the strategy in which the user observes no additional features. Finally, the system chooses the strategy that maximizes the decision maker's expected utility. In practice, this approach is unfeasible, because there are more than 2^n strategies for n unobserved features. To make computations tractable, both the old and new versions of Pathfinder employ the myopic approximation, introduced by Gorry and Barnett in 1968 [27]. In this approximation, a system identifies the best single feature to observe, by maximizing the expected utility of the decision maker under the assumption that a diagnosis will be made after the user observes only one feature. Once the user observes the feature, the system repeats the myopic analysis, and may recommend additional features for observation.

Let us examine formally the computation in Pathfinder. First, the system computes eu(dx(φ)|φ), the expected utility of the optimal diagnosis when the user observes no additional features. From Equations 7 and 8, we have

$$eu(d_x(\phi) \mid \phi) = \sum_{d_j} p(d_j \mid \phi)\, u(d_j, d_x(\phi)) \qquad (9)$$

Now the system imagines that the user observes an additional feature fnew. Let φ′ denote the union of the original set of observations and the observation for fnew.
In principle, the myopic approximation could affect the diagnostic accuracy of an expert system. For example, suppose that two features remain unobserved. In this case, the net value of information for the feature pair could exceed 0, and thus the user should observe the features. A value-of-information analysis on each feature alone, however, may indicate that neither feature is cost effective for observation. Consequently, the user would fail to observe these features, and thereby possibly make an incorrect diagnosis. Nonetheless, there is evidence that the myopic approximation does not often cause this problem in practice. For example, Gorry and Barnett have demonstrated that the approximation does not diminish significantly the diagnostic accuracy of their program that assists physicians with the diagnosis of congenital heart disease [27]. In addition, although we have not yet conducted a similar experiment with Pathfinder, our expert almost always has been impressed by the questions generated by the myopic approximation.

As we mentioned in previous sections, Pathfinder’s diagnostic recommendations are sensitive to the utility model, and we therefore do not allow Pathfinder to make such recommendations. Fortunately, however, we have found in an informal study that Pathfinder’s recommendations for evidence gathering are insensitive to the model.
In particular, we have found that Pathfinder’s recommendations often are similar to those made by a second version of the program in which u(dj, dk) is equal to 1 when both dj and dk are benign diseases, u(dj, dk) is equal to 1 when both dj and dk are malignant diseases, and u(dj, dk) is equal to 0 otherwise. Consequently, we allow Pathfinder to make recommendations for evidence gathering. The observation that recommendations for evidence gathering are less sensitive to the utility model than are diagnostic recommendations may be due to the fact that more factors contribute to the computation of net value of information than to the computation of the diagnosis. That is, only the probabilities of diseases and the utilities u(dj, dk) contribute to the computation of the diagnosis, whereas these factors, the information content of a feature, and the cost of a feature contribute to the computation of net value of information. 5 Alternative Reasoning Methodologies The first medical expert systems employed computations based in probability theory. In particular, throughout the 1960s and early 1970s, medical expert systems used the simple- Bayes model to construct differential diagnoses. These systems included Warner’s system for the diagnosis of heart disease [30], Gorry’s program for the management of acute renal failure, and deDombal’s system for the diagnosis of acute abdominal pain [31]. Evaluations of most of these early systems showed that the programs performed well. In fact, the diagnoses rendered by several of them were more accurate than were those made by experienced physicians [31]. Nonetheless, in the early 1970s, researchers began to criticize these systems. They noted that the domains of these programs were small and did not reflect realistic clinical situations. Furthermore, researchers argued that errors due to the erroneous assumptions of the simple-Bayes model would become unacceptable as the domains of these systems were expanded [32, 33]. One group of investigators showed that 16 the diagnostic accuracy of an expert system based on the simple-Bayes model deteriorated significantly as the number of features in the system increased. These investigators traced the degradation in performance to violations of the conditional-independence assumptions in the simple-Bayes model [34]. Another group of researchers showed that the assumption of global conditional independence could be unrealistic in small domains as well [35]. In the early 1970s, perceptions of the inadequacy of the early decision-theoretic systems led to the development of alternative methods for reasoning under uncertainty [36, 37, 32]. Many of these developments occurred in the field of Artificial Intelligence in Medicine. Some of the alternative methods were ad hoc mechanisms, designed as custom-tailored techniques for particular domains or systems. These approaches included the MYCIN certainty-factor model and the QMR scoring scheme. Other methods were developed as alternative theoret- ical formalisms, such as the Dempster–Shafer theory of evidence and fuzzy decision theory. In the first year of Pathfinder research, we appreciated the limitations of the simple-Bayes model, and believed that the use of this model would significantly impair the performance of Pathfinder. Consequently, we examined several alternative reasoning methodologies. In this section, we introduce these approaches. 
In the following two sections, we describe our empirical and theoretical analyses of these methodologies in the context of Pathfinder. A well-known ad hoc method for managing uncertainty is the certainty-factor (CF) model [38]. Shortliffe and Buchanan designed the model to augment the rule-based approach to reasoning for MYCIN, a program for the diagnosis and treatment of bacteremias and meningitis [33]. In using the model, an expert attaches a certainty factor to each if–then rule. The certainty factor represents the expert’s change of the belief in the consequent of the rule, given the antecedent of the rule. In particular, a CF between 0 and 1 means that the expert’s belief in a consequent increases if the antecedent is true, whereas a CF between -1 and 0 means that the expert’s belief decreases. In a rule base, the consequent of one rule may serve as the antecedent of another rule. In addition, two or more rules may share the same antecedent or consequent. As a result, a rule base forms an inference network: a directed graph in which an arc from proposition a to proposition b corresponds to the rule “if a then b.” The CF model prescribes a method for propagating certainty factors through such a network. That is, given an observation of an antecedent in the network, we can use CF-model formulas to compute the effective certainty factor for any consequent in the network that is a descendent of that antecedent. Although the CF model was designed for MYCIN, the model has found many applications in other domains. Today, the model is the most popular method for managing uncertainty in rule-based systems. Quick Medical Reference or QMR (formerly Internist-1) uses another ad hoc method for managing uncertainty [39, 40]. The QMR project, now in its eighteenth year, assists internists with the diagnosis of more than 600 diseases, through the consideration of approximately 4000 manifestations or features of disease. In QMR, each feature has two instances; in particular, a feature is either absent or present. More important, each disease can be either absent or present. Thus, QMR can address cases in which more than one disease is present. The ad hoc scoring scheme employs two measures to represent the degree of association between a feature and a disease: an evoking strength and a frequency. The evoking strength for a given feature–disease pair represents the degree to which the presence of the disease causes that feature to be present. The frequency for a given feature–disease pair represents how often that feature is present in patients who have the disease. In addition, the scheme 17 represents the import of each feature, which is inversely proportional to the likelihood that an insignificant disease (such as the common cold) can cause the feature to be present. Given an assignment of present and absent to a subset of features, QMR uses evoking strengths, frequencies, and imports to assign a score to each disease. QMR then displays diseases in order of descending score. Like early decision-theoretic systems, QMR uses a hypothetico- deductive approach to reasoning. In particular, the system contains ad hoc algorithms for generating useful recommendations for additional evidence gathering based on the current differential diagnosis and on evoking strengths, frequencies, and imports. A more theoretical alternative to probabilistic reasoning was developed by Dempster and extended by Shafer [41, 42]. 
The approach, now called the Dempster–Shafer theory of evidence, was motivated by theoretical objections to the decision-theoretic approach [43]. Nonetheless, many artificial-intelligence (AI) researchers adopted special cases of the ap- proach to avoid the perceived computational intractability of decision theory [44, 45]. Cur- rently, the theory has many interpretations [42, 46, 47, 48]. One of the most popular inter- pretations is that given in Shafer’s original text. In this interpretation, an expert assesses the degree of support that a piece of evidence lends to hypotheses in the frame of discern- ment: a set of mutually exclusive and exhaustive hypotheses. He does so for every piece of evidence that may be observed. A combination rule can then be used to compute the degree of support that multiple pieces of evidence lend to hypotheses in the frame of discernment. In the interpretation, an expert assesses degrees of support for a single piece of evidence by constructing a basic probability assignment over the frame. That is, he assigns a mass, rang- ing from 0 to 1, to each subset of hypotheses in the frame. The mass for a particular piece of evidence and a subset of hypotheses represents the degree of support that the evidence lends to the subset. Like the CF model and the QMR scoring scheme, the Dempster–Shafer theory manipulates measures of change in belief. In Section 7.1, we examine the relationship among these methodologies. Another theoretical approach for managing uncertainty is fuzzy decision theory [49]. The theory addresses the presence of ambiguous terms such as “large” and “tall” in the specification of decision problems. Fuzzy decision theorists do not object to the use of probability theory or decision theory when events are defined precisely. They argue, however, that it is desirable to reason in situations where there is imprecision in the definition of events in addition to uncertainty about their occurrence. An example of a fuzzy decision problem is as follows: An urn contains many balls of various sizes, of which several are large. To draw a ball, you must pay a small sum of money. If you draw a large ball, however, you will win a valuable prize. Should you draw the ball? 6 Empirical Study of Reasoning Methods During the first year of Pathfinder research, we experimented with the reasoning method- ologies described in the previous section. In this section, we examine the results of those experiments. 18 6.1 Rule-Based Reasoning The first version of Pathfinder was a rule-based system that employed propositional logic for reasoning. After informally evaluating the system, we discovered two related problems. First, the program did not take into account the uncertainty associated with the relationships between observations and diseases. This deficiency of the program became apparent to us almost immediately. Indeed, as we have mentioned, proper management of uncertainty is crucial to accurate diagnosis in the domain of lymph-node pathology. We might have addressed this concern with the use of the CF model. We found, however, another problem with the rule-based approach, which forced us to abandon the methodology. In particular, our expert was frustrated by the system, because it asked many questions that were irrelevant to discriminating the diseases on the current differential diagnosis. This behavior was a result of the fact that the rule-based methodology generated recommendations for additional observations based on a fixed traversal through the rule base. 
As a result of our informal evaluations, we searched for a more flexible approach to the overall control of diagnostic reasoning. We discovered literature describing the hypothetico- deductive approach and systems such as QMR and Gorry’s diagnostic program that im- plemented the approach. We decided to construct a new version of Pathfinder modeled after QMR. Nonetheless, we were not satisfied with the scoring scheme of QMR because it had no theoretical foundation; and we searched for a more principled method for managing uncertainty. 6.2 Fuzzy Reasoning We considered fuzzy decision theory as a possible reasoning methodology for Pathfinder, but quickly rejected its use. We did so, because we found that neither general pathologists nor experts in hematopathology agreed on the meanings of fuzzy descriptions of feature– instance pairs such as “mild capsule thickening,” “rare Sternberg-Reed cells,” or “prominent necrosis.” For example, one expert stated that Sternberg-Reed cells were “rare” when there were one to five of these cells in any 4-square-centimeter section of a lymph node. Another expert stated that these cells were “rare” when there were one to ten of these cells in any 4-square-centimeter section. We did not believe that fuzzy decision theory—a scheme devised by researchers who were unfamiliar with the domain of lymph-node pathology—nor any other mechanism would provide meaningful inferences, if we continued to employ these fuzzy feature-instance descriptions. Instead, we asked the four hematopathology experts—Drs. Costan Berard, Jerome Burke, Ronald Dorfman, and Bharat Nathwani—to clarify the meanings of the descrip- tions that they were using. Although, as we have just discussed, the experts’ interpretations did not coincide initially, the experts did not find it difficult to construct unambiguous inter- pretations for each feature instance. The experts handled disagreements in a manner similar to that used by coauthors of a manuscript who are faced with a disagreement. That is, when their initial interpretations of a feature instance did not coincide, each pathologist put forth an argument for the merits of his interpretation of that feature instance. Then, in most cases, the four experts accepted unanimously one interpretation. When the experts could not agree, the primary author of the system (Bharat Nathwani) selected the interpretation. 19 6.3 The Dempster–Shafer Theory of Evidence Next, we examined the Dempster–Shafer theory of evidence. We discovered that the theory could be interpreted as a methodology for combining measures of change in belief, as could the CF model and QMR scoring scheme. We were attracted to the methodology, however, because it appeared to be a more principled approach to uncertainty management. Conse- quently, we constructed the second version of Pathfinder, using the Dempster–Shafer theory. In particular, we implemented a special case of the theory described by Barnett [44]. In this approach, the expert assigned nonzero masses only to (1) singleton subsets of the frame of discernment and (2) the entire frame of discernment. We refer to this simplified approach as the Dempster–Shafer–Barnett model. We then evaluated informally this version of Pathfinder by allowing the expert to exercise the system with real and imaginary cases. The expert was satisfied with the diagnostic accuracy of the system. 
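As a brief aside for readers unfamiliar with the machinery just described, the sketch below applies Dempster's rule of combination to two basic probability assignments of the restricted form used in the Dempster–Shafer–Barnett model, where mass is placed only on singleton hypotheses and on the entire frame of discernment. The diseases and masses are invented placeholders, not our expert's assessments.

```python
# Dempster's rule of combination for two basic probability assignments whose
# mass lies only on singletons and on the whole frame (the Barnett special case).
# All names and numbers are illustrative placeholders.

from itertools import product

def combine(m1, m2):
    combined, conflict = {}, 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b          # mass that falls on the empty set
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}

frame = frozenset({"hyperplasia", "lymphoma", "mononucleosis"})
evidence_1 = {frozenset({"lymphoma"}): 0.6, frame: 0.4}      # supports lymphoma
evidence_2 = {frozenset({"hyperplasia"}): 0.3, frame: 0.7}   # supports hyperplasia
for subset, mass in sorted(combine(evidence_1, evidence_2).items(), key=lambda kv: -kv[1]):
    print(sorted(subset), round(mass, 3))
```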
At this time, probability theory was low on our list as a method for combining evidence to build a differential diagnosis, because of the limitations of the simple-Bayes model. Nevertheless, we were interested in experimenting with probabilistic reasoning, given the pioneering work of Ledley, Lusted, Gorry, Barnett, and others. We re-examined the measures of uncertainty that we had assessed from our expert, and realized that these measures could be interpreted in probabilistic terms. We implemented the simple-Bayes model in Pathfinder, without assessing additional measures of uncertainty (see footnote 5). We then compared the performance of the Dempster–Shafer–Barnett and simple-Bayes versions of Pathfinder. Without informing the expert, we switched the scoring scheme of Pathfinder from the Dempster–Shafer–Barnett approach to the simple-Bayes model. To our surprise, after running several cases with the probabilistic scheme, the expert exclaimed excitedly that the diagnostic accuracy of the program had improved significantly.

Footnote 5: We assumed that the prior probability of each disease was equal.

Several years later, in a formal study, we compared the diagnostic accuracy of the Dempster–Shafer–Barnett, simple-Bayes, and CF models in the domain of lymph-node pathology. We verified our informal observation that the simple-Bayes model provided greater diagnostic accuracy (i.e., greater agreement with the expert) than did the Dempster–Shafer–Barnett model. We also found that the simple-Bayes model provided greater diagnostic accuracy than did the CF model [50].

7 Theoretical Study of Reasoning Methods

Surprised by the dominance of the simple-Bayes model over the alternative methods, we investigated the relationship between probability theory and alternative reasoning strategies over the next two years. We also studied the theoretical justifications for probabilistic and decision-theoretic reasoning.

7.1 Probabilistic Interpretations of Alternative Reasoning Methods

Heckerman and other researchers examined the relationship of the CF, QMR, and Dempster–Shafer–Barnett models with a simple probabilistic model for manipulating measures of change in belief called the odds–likelihood updating scheme. To understand the probabilistic model, let us suppose that we have a single disease d that can be true (d+) or false (d−). Further, suppose that we have n features f_1, \ldots, f_n, where each feature can be present or absent. Let us apply the simple-Bayes model to this situation. That is, let us assume that all features are conditionally independent, given d+ and given d−. Thus, as in Section 4.3, we can use Bayes’ theorem to compute the posterior probability of d+; we obtain

p(d^+ \mid f_1, \ldots, f_n, \xi) = \frac{p(f_1 \mid d^+, \xi) \cdots p(f_n \mid d^+, \xi) \, p(d^+ \mid \xi)}{p(f_1 \mid d^+, \xi) \cdots p(f_n \mid d^+, \xi) \, p(d^+ \mid \xi) + p(f_1 \mid d^-, \xi) \cdots p(f_n \mid d^-, \xi) \, p(d^- \mid \xi)}   (15)

where any f_i can be present or absent. In addition, we can apply Bayes’ theorem to compute the posterior probability of d−; we get

p(d^- \mid f_1, \ldots, f_n, \xi) = \frac{p(f_1 \mid d^-, \xi) \cdots p(f_n \mid d^-, \xi) \, p(d^- \mid \xi)}{p(f_1 \mid d^+, \xi) \cdots p(f_n \mid d^+, \xi) \, p(d^+ \mid \xi) + p(f_1 \mid d^-, \xi) \cdots p(f_n \mid d^-, \xi) \, p(d^- \mid \xi)}   (16)

When we divide Equation 15 by Equation 16, we obtain

\frac{p(d^+ \mid f_1, \ldots, f_n, \xi)}{p(d^- \mid f_1, \ldots, f_n, \xi)} = \frac{p(f_1 \mid d^+, \xi)}{p(f_1 \mid d^-, \xi)} \cdots \frac{p(f_n \mid d^+, \xi)}{p(f_n \mid d^-, \xi)} \cdot \frac{p(d^+ \mid \xi)}{p(d^- \mid \xi)}   (17)

We can rewrite Equation 17 as

O(d^+ \mid f_1, \ldots, f_n, \xi) = \lambda(f_1, d^+ \mid \xi) \cdots \lambda(f_n, d^+ \mid \xi) \, O(d^+ \mid \xi)   (18)

where

O(d^+ \mid \xi) = \frac{p(d^+ \mid \xi)}{p(d^- \mid \xi)}  \quad \text{and} \quad  O(d^+ \mid f_1, \ldots, f_n, \xi) = \frac{p(d^+ \mid f_1, \ldots, f_n, \xi)}{p(d^- \mid f_1, \ldots, f_n, \xi)}   (19)

are the prior and posterior odds of d+, respectively, and

\lambda(f_i, d^+ \mid \xi) = \frac{p(f_i \mid d^+, \xi)}{p(f_i \mid d^-, \xi)}   (20)

is the likelihood ratio for d+, given f_i. Equation 18 is the odds–likelihood updating scheme.

Heckerman showed that we can interpret the certainty factor for the rule “if f_i then d+,” denoted CF(f_i → d+ | ξ), as a monotonically increasing function of the likelihood ratio λ(f_i, d+ | ξ). In particular, he showed that, if we make the identification

CF(f_i \rightarrow d^+ \mid \xi) = \begin{cases} \frac{\lambda(f_i, d^+ \mid \xi) - 1}{\lambda(f_i, d^+ \mid \xi)} & \lambda(f_i, d^+ \mid \xi) \ge 1 \\ \lambda(f_i, d^+ \mid \xi) - 1 & \lambda(f_i, d^+ \mid \xi) < 1 \end{cases}   (21)

then the odds–likelihood updating scheme is identical to the formula in the CF model for combining certainty factors that is applied when a set of rules share the same consequent. In addition, Heckerman showed that with the identification in Equation 21, the remaining formulas in the CF model are a close approximation to the rules of probability [51]. Grosof then showed that the Dempster–Shafer–Barnett model was isomorphic to the odds–likelihood updating scheme [52] via a different transformation of the likelihood ratio. In addition, Heckerman and Miller demonstrated that QMR’s ad hoc scoring scheme was isomorphic to the odds–likelihood updating scheme [53] (see footnote 6).

Footnote 6: This work led to the construction of a probabilistic version of QMR, called QMR-DT [54, 55, 56, 57].
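As a small numerical companion to Equations 18 through 21, the sketch below updates the odds of a single disease from a list of likelihood ratios and reports the certainty factor that Heckerman's mapping assigns to each ratio. The prior and the likelihood ratios are arbitrary illustrative values.

```python
# Odds-likelihood updating (Equation 18) and the likelihood-ratio-to-CF
# mapping of Equation 21, on arbitrary illustrative numbers.

def posterior_odds(prior_odds, likelihood_ratios):
    odds = prior_odds
    for lam in likelihood_ratios:      # Equation 18: multiply in each lambda(f_i, d+ | xi)
        odds *= lam
    return odds

def probability_from_odds(odds):
    return odds / (1.0 + odds)

def certainty_factor(lam):
    # Equation 21
    return (lam - 1.0) / lam if lam >= 1.0 else lam - 1.0

prior_odds = 0.25                      # corresponds to p(d+ | xi) = 0.2
lambdas = [3.0, 0.5, 4.0]              # one likelihood ratio per observed feature
odds = posterior_odds(prior_odds, lambdas)
print(round(probability_from_odds(odds), 3))                   # posterior p(d+ | f1, f2, f3, xi)
print([round(certainty_factor(lam), 2) for lam in lambdas])    # CFs for the three rules
```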
These theoretical results helped us to understand the dominance of the simple-Bayes model over the nonprobabilistic alternatives. In particular, the other approaches did not avoid the assumptions of conditional independence of the simple-Bayes model; they merely obscured the assumptions. In fact, these approaches assumed that evidence was conditionally independent, given each disease and given the negation of each disease. When there are more than two mutually exclusive and exhaustive diseases in a domain, these conditional-independence assumptions are stronger than are the assumptions in the simple-Bayes model [58, 51].

We can understand the limitations of the alternative scoring schemes at a more intuitive level. Let us consider rule-based inference, in particular. The first rule-based inference schemes used the rules of logic. As a result, these schemes enjoyed a property known as modularity. That is, given the logical rule “if a then b,” and given that a is true, we can assert that b is true no matter how we established that a is true, and no matter what else we know to be true. For example, given the rule

if l_1 and l_2 are parallel lines then l_1 and l_2 do not intersect

we can assert that l_1 and l_2 do not intersect once we know that l_1 and l_2 are parallel lines. This assertion satisfies the property of modularity: The assertion depends on neither how we came to know that l_1 and l_2 are parallel, nor what else we know. The CF model is an extension of the rules of logic that imposes this same principle of modularity on inferences. For example, given the rule

if PERITONITIS then APPENDICITIS, CF = 0.7

and given that a patient has peritonitis, the CF model allows us to increase the likelihood that the patient has appendicitis by the amount corresponding to a CF of 0.7, no matter how we establish that peritonitis is present. Given the correspondences described in the first part of this section, we see that the odds–likelihood updating scheme, the QMR scoring scheme, and the Dempster–Shafer–Barnett model also incorporate the property of modularity. We shall refer to these methods collectively as modular belief updating schemes. Unfortunately, these schemes in reality do not satisfy the property of modularity. Continuing our example, suppose the patient has vaginal bleeding. This fact increases the likelihood that she has a ruptured ectopic pregnancy, and thus increases the likelihood that she has peritonitis. The chance that the patient has appendicitis decreases, however, because the presence of a ruptured ectopic pregnancy can account for the presence of peritonitis. Overall, we have that the likelihood of peritonitis increases, whereas the likelihood of appendicitis decreases. The modular rule linking peritonitis with appendicitis is inconsistent with these relationships.

In general, logical relationships represent complete models of interaction. In contrast, uncertain relationships encode invisible interactions. We summarize these hidden interactions with numerical measures, such as a certainty factor or likelihood ratio. In the process of such a summarization, we lose information about the detailed categorical interaction. Therefore, when we try to combine uncertain information, unexpected (nonmodular) interactions may occur. We should not expect that any modular belief updating scheme will be able to handle such subtle interactions.

7.2 A Practical Problem with Nonprobabilistic Methods

In continuing to explore the difference between probabilistic and nonprobabilistic alternatives, we encountered a practical limitation associated with the use of modular belief updating schemes. Specifically, these schemes require that we assess the strength of an uncertain relationship in the direction in which it is used. That is, we must specify the change in belief of an unobservable hypothesis, given an observable piece of evidence. Unfortunately, experts often are more comfortable quantifying uncertain relationships in the direction opposite to that in which they are used [59].

In particular, Kahneman and Tversky have shown that people usually are more comfortable when they assess the likelihood of an effect given a cause rather than when they assess the likelihood of a cause given an effect. For example, expert physicians prefer to assess the likelihood of a finding, given a disease, rather than the likelihood (or belief update) of a disease, given a finding [60]. Henrion attributes this phenomenon to the nature of causality. In particular, he notes that a predictive probability (the likelihood of a finding, given a disease) reflects a stable property of that disease. In contrast, a diagnostic probability (the likelihood of a disease, given a finding) depends on the incidence rates of that disease and of other diseases that may cause the finding. Thus, predictive probabilities are a more useful and parsimonious way to represent uncertain relationships—at least in medical domains (see [61], pages 252–3). The developers of QMR make a similar observation [39].

Unfortunately, effects are usually the observable pieces of evidence, and causes are the sought-after hypotheses. Thus, in using a modular belief updating scheme, we force experts to provide judgments of uncertainty in a direction that is more cognitively challenging. We thereby promote errors in assessment. In contrast, when we use probability theory to manage uncertainty, we can assess the strength of an uncertain relationship in one direction, and then reverse the relationship using Bayes’ theorem, when the need arises.
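The following sketch illustrates the closing point of this subsection: an expert assesses a predictive probability p(finding | disease) together with a prior, and Bayes' theorem recovers the diagnostic probability p(disease | finding) when it is needed. The prior, sensitivity, and false-positive rate are invented numbers used only for illustration.

```python
# Reversing a predictive probability into a diagnostic probability with
# Bayes' theorem, on invented numbers.

def diagnostic_probability(prior, p_finding_given_disease, p_finding_given_no_disease):
    # p(d+ | f) = p(f | d+) p(d+) / [ p(f | d+) p(d+) + p(f | d-) p(d-) ]
    numerator = p_finding_given_disease * prior
    denominator = numerator + p_finding_given_no_disease * (1.0 - prior)
    return numerator / denominator

# A finding seen in 90% of patients with the disease but in only 5% of
# patients without it, for a disease with a 1% prior probability.
print(round(diagnostic_probability(0.01, 0.90, 0.05), 3))   # about 0.154
```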
7.3 Compelling Principles for Uncertainty Management and De- cision Making After identifying limitations of non-decision-theoretic approaches for uncertain reasoning, we explored theoretical advantages of the decision-theoretic approach. Perhaps the most significant advantage we discovered was the fact that the rules of probability and the MEU principle follow from compelling principles, and that people often violate these principles when unaided by decision-theoretic systems. For example, Ramsey and deFinetti showed that anyone who does not follow the rules of probability theory would be willing to accept a “Dutch book”: a combination of bets 23 leading to a guaranteed loss of money under any circumstances [62, 63, 64]. In contrast, Cox developed a set of desiderata about fundamental properties of a measure of belief unre- lated to betting behavior that also imply the rules of probability [65, 25]. In addition, von Neumann and Morgenstern constructed a remarkable proof of the MEU principle [20]. They developed five compelling principles or desiderata that every decision maker should follow, and demonstrated that these desiderata imply the MEU principle. One desideratum is the principle of transitivity, which states that if a decision maker prefers outcome A to outcome B, and prefers outcome B to outcome C, then he must prefer outcome A to outcome C. To see that this desideratum is compelling, let us suppose that a decision maker’s preferences are not transitive. In particular, suppose he prefers A to B, B to C, and C to A. Because he prefers C to A, he should be willing to exchange A for C and a small payment. Similarly, this person should be willing to exchange C for B, and B for A. Thus, we can extract payments from him, and yet leave him with the same outcome. Repeating this procedure, called a money pump, we can extract an arbitrarily large payment from this person. Decision theorists and cognitive psychologists have devised justifications for each of von Neumann and Morgenstern’s desiderata [20, 66, 67]. Psychological studies have demonstrated that, in the real world, human decision makers exhibit a set of stereotypical deviations or biases from the desiderata of decision theory [68, 69]. From the perspective of decision theory, we can view these deviations as mistakes. For example, people sometimes have nontransitive preferences, and thus are vulnerable to a money pump. Also, physicians often forget to consider the prior probability of disease when making a diagnosis. As we mentioned in the introduction, we use the adjectives “normative” and “descriptive” to emphasize the differences between decision making consistent with the rules of decision theory and decision making in the real world. In medicine, descriptive errors in decision making are particularly dangerous, given the high stakes associated with some decisions. Such errors easily may lead to needless expen- ditures, pain and suffering, or loss of life. Fortunately, normative expert systems can help physicians to avoid such errors. For example, systems that encode explicitly the prior proba- bilities of diseases will likely improve the decisions made by physicians who would otherwise forget to consider this important information. Nonetheless, several researchers have argued that we should employ descriptive rather than normative methods for decision making in expert systems. 
In particular, investiga- tors have stated that decision-theoretic methods lack the expressiveness needed to encode expertise or to describe intelligent behavior [70, 37, 71, 43]. Indeed, several of these inves- tigators have argued that the CF model, fuzzy decision theory, and the Dempster–Shafer theory of evidence may be more appropriate than is decision theory for uncertainty man- agement, because these non-decision-theoretic methods are possibly more descriptive than is decision theory [71, 43, 72]. To our knowledge, however, there are no psychological studies that support these assertions. In fact, our experiment comparing the diagnostic accuracy of the simple-Bayes, CF, and Dempster–Shafer–Barnett models (see Section 6.3) demonstrated that, among these approaches, the simple-Bayes model is the most descriptive method for managing uncertainty. 24 8 A Return to Decision Theory: Practical Consider- ations Once we understood the limitations of non-decision-theoretic methods and the theoretical benefits of decision theory for managing uncertainty, we still faced the practical limitations of probability and decision theory. In particular, we were concerned that it would be unfeasible to relax the conditional independence assumptions of the simple-Bayes model for Pathfinder. Nonetheless, we speculated that adding the most salient conditional dependencies would not lead to a combinatorial explosion. Indeed, we conjectured that experts themselves could not appreciate all the subtle conditional dependencies that may exist among the features in large medical domains. To manage the complexity of their domain, these experts must be reasoning under many assumptions of conditional independence. We believed that, if we could capture these assumptions made by the experts, then we could produce normative expert systems that perform at least as well as do experts. Our work to achieve these goals in the domain of lymph-node pathology was successful. Over the next three years, we developed graphical knowledge representations that allowed us to capture the important conditional dependencies in this domain in a reasonable amount of time. In a formal evaluation, we demonstrated that the diagnostic accuracy of the new version of Pathfinder was at least as good as that of the Pathfinder expert [73]. In the companion to this article, we describe these new knowledge representations in detail. 9 Pathfinder in Clinical Practice Several years after the Pathfinder project was initiated, we re-engineered the program on MS-DOS computers, and made the system available commercially. The system, named Intellipath, consists of a normative expert system for lymph-node diagnosis and a set of supportive informational tools including an analog videodisc library of images, text infor- mation on diseases and microscopic features, references to the literature, and an automated report-generator. In 1988, the American Society of Clinical Pathologists began selling the Intellipath pro- gram to practicing pathologists and pathologists in training in North America. The program was a commercial success, and we constructed similar systems for other types of human tis- sue. Currently, approximately 200 pathologists are using the program, and systems for breast, bone, larynx, skin, small intestine, stomach, thymus, and urinary bladder pathology and for lung and thyroid cytology are available. 
In Section 2, we mentioned that pathologists have difficulty with diagnosis because they are unable to combine evidence accurately and because they misrecognize or fail to recognize microscopic features. Most of Pathfinder research addresses the first problem; the videodisc component of Intellipath, however, addresses the second problem. In particular, when a user has trouble recognizing a feature, he can ask Intellipath to display images of that feature. These images illustrate both typical and atypical presentations of the feature. Recently, we have integrated the latest version of the Pathfinder expert system with the Intellipath platform, including the videodisc for lymph-node pathology. We will evaluate this program, called Pathfinder II, in clinical trials funded by the National Cancer Institute. 25 In these studies, we shall determine whether pathologists who use the system perform better than do those pathologists who do not have access to the system. We shall quantify separately improvements in pathologists’ ability to combine evidence and improvements in their ability to recognize features. In addition, we shall determine whether pathologists can recognize situations in which they need assistance from an expert system or human expert. 10 Conclusions We are not alone in the investigation of probabilistic and decision-theoretic methods for uncertain reasoning in medical expert systems. Several other recent research projects have explored normative reasoning in medical expert systems, including the Nestor system for diagnosis of endocrinology disorders [74], the Glasgow Dyspepsia expert system for assisting in gastroenterology diagnosis [75], the Neurex system for diagnosis of neurological disorders [76], the Medas system for assisting physicians in emergency medicine [77], the Munin sys- tem for diagnosis of muscular disorders [78], and the Sleep Consultant system for diagnosis of sleep disorders [79]. Furthermore, over the last five years, a dedicated community of re- searchers addressing the management of uncertainty in reasoning systems has evolved. The proceedings of the main conference of this group of researchers, the Conference on Uncer- tainty in Artificial Intelligence, is a collection of the latest theoretical and empirical research on this topic [80, 81, 82, 83, 84]. We believe that the development of normative expert systems will lead to improvements in the capture and delivery of expert knowledge. This view is supported by our successes with Pathfinder. We hope that our experiences will inspire other medical-informatics investigators to develop normative expert systems for medicine. Acknowledgments We thank Greg Cooper, Lawrence Fagan, Thomas Lincoln, Randolph Miller, Keung-Chi Ng, Ramesh Patil, Edward Shortliffe, and Clive Taylor for advice and guidance on Pathfinder research. We also thank Costan Berard, Jerome Burke, and Ronald Dorfman for serving with Bharat Nathwani as domain experts in lymph-node pathology. Henri Suermondt, Mark Fischinger, Marty Chavez, and especially Keung-Chi Ng assisted with programming and data management. Lyn Dupré, Greg Cooper, and Keung-Chi Ng provided useful comments on this manuscript. Research on the Pathfinder project has been supported by the National Cancer Insti- tute under Grant RO1CA51729-01A, and by the National Library of Medicine under Grant RO1LM04529. Computational support has been provided in part by the SUMEX-AIM re- source under National Institutes of Health Grant LM05208. References [1] D.E. Heckerman, E.J. 
Horvitz, and B.N. Nathwani. Update on the Pathfinder project. In Proceedings of the Thirteenth Symposium on Computer Applications in Medical Care, 26 Washington, DC, pages 203–207. IEEE Computer Society Press, Silver Spring, MD, November 1989. [2] D.E. Heckerman, E.J. Horvitz, and B.N. Nathwani. The Pathfinder project. Technical Report KSL-90-08, Medical Computer Science Group, Section on Medical Informatics, Stanford University, Stanford, CA, February 1990. [3] D.E. Heckerman, E.J. Horvitz, and B.N. Nathwani. Pathfinder research directions. Technical Report KSL-89-64, Medical Computer Science Group, Section on Medical Informatics, Stanford University, Stanford, CA, October 1985. [4] B.N. Nathwani, D.E. Heckerman, E.J. Horvitz, and T.L. Lincoln. Integrated expert systems and videodisc in surgical pathology: An overview. Human Pathology, 21:11–27, 1990. [5] R.J. Rosai and L.V. Ackerman. The pathology of tumors: Part II. diagnostic techniques. Cancer, 29:22–39, 1979. [6] H.S. Levin. Operating room consultation by the pathologist. Urology Clinics of North America, 12:549–56, 1985. [7] G.E. Byrne. Rappaport classification of non-Hodgkin’s lymphoma: Histologic features and clinical significance. Cancer Treatment Reports, 61:935–944, 1977. [8] C.A. Coltman, R.A. Gams, J.H. Glick, and R.D. Jenkin. Lymphoma. In B. Hoogstraten, editor, Cancer Research Impact of the Cooperative Groups, pages 39–84. Masson Pub- lishing, 1980. [9] S.E. Jones, J.J. Butler, G.E. Byrne, C.A. Coltman, and T.E. Moon. Histopathologic review of lymphoma cases from the Southwest Oncology Group. Cancer, 39:1071–1076, 1977. [10] H. Kim, R.J. Zelman, M.A. Fox, J.M. Bennett, C.W. Berard, J.J. Butler, G.E. Byrne, R.F. Dorfman, R.J. Hartsock, R.J. Lukes, R.B. Mann, R.S. Neiman, J.W. Rebuck, W.W. Sheehan, D. Variakojis, J.F. Wilson, and H. Rappaport. Pathology panel for lymphoma clinical studies: A comprehensive analysis of cases accumulated since its inception. Journal of the National Cancer Institute, 68:43–67, 1982. [11] F.D. Bartlett. Thinking. Basic Books, New York, 1958. [12] A.S. Elstein, M.J. Loupe, and J.G. Erdman. An experimental study of medical diag- nostic thinking. Journal of Structural Learning, 2:45–53, 1971. [13] A.S. Elstein. Clinical judgment: Psychological research and medical practice. Science, 194:696–700, 1976. [14] A.S. Elstein, L.S. Shulman, and S.A. Sprafka. Medical Problem Solving: An Analysis of Clinical Reasoning. Harvard University Press, Cambridge, MA, 1978. [15] I. Hacking. The Emergence of Probability. Cambridge University Press, New York, 1975. 27 [16] T. Bayes. An essay towards solving a problem in the doctrine of chances. Biometrika, 46:293–298, 1958. Reprint of original work of 1763. [17] R.A. Howard. Decision analysis: Perspectives on inference, decision, and experimenta- tion. Proceedings of the IEEE, 58:632–643, 1970. [18] J. Pearl. How to do with probabilities what people say you can’t. In Proceedings of the Second Conference on Artificial Intelligence Applications, Miami Beach, FL, pages 6–12. IEEE Computer Society Press, Silver Spring, MD, December 1985. [19] D.J. Spiegelhalter. Probabilistic reasoning in predictive expert systems. In L.N. Kanal and J.F. Lemmer, editors, Uncertainty in Artificial Intelligence, pages 47–67, New York, 1986. North-Holland. [20] J. von Neumann and O. Morgenstern. Theory of Games and Economic Behavior. Prince- ton University Press, Princeton, NJ, 1947. [21] R.L. Keeney and H. Raiffa. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. 
Wiley and Sons, New York, 1976. [22] B. J. McNeil, S. G. Pauker, H. C. Sox, and A. Tversky. On the elicitation of preferences for alternative therapies. New England Journal of Medicine, 306:1259–62, 1982. [23] R.A. Howard. On making life and death decisions. In R.C. Schwing and W.A. Albers, Jr., editors, Societal Risk Assessment, pages 89–113. Plenum Publishing, New York, 1980. [24] R.A. Howard. Microrisks for medical decision analysis. International Journal of Tech- nology Assessment in Health Care, 5:357–370, 1989. [25] M. Tribus. Rational Descriptions, Decisions, and Designs. Pergamon Press, New York, 1969. [26] R.S. Ledley and L.B. Lusted. Reasoning foundations of medical diagnosis. Science, 130:9–21, 1959. [27] G.A. Gorry and G.O. Barnett. Experience with a model of sequential diagnosis. Com- puters and Biomedical Research, 1:490–507, 1968. [28] R.A. Howard. Value of information lotteries. IEEE Transactions of Systems Science and Cybernetics, SSC-3(1):54–60, 1967. [29] R.A. Howard. Decision Analysis, EES231 Course Notes. Department of Engineering– Economic Systems, Stanford University, Stanford, CA, 1985. [30] H.R. Warner, A.F. Toronto, L.G. Veasy, and R. Stephenson. A mathematical approach to medical diagnosis: Application to congenital heart disease. Journal of the American Medical Association, 177:177–183, 1961. 28 [31] F.T. de Dombal, D.J. Leaper, J.R. Staniland, A.P. McCann, and J.C. Horrocks. Computer-aided diagnosis of acute abdominal pain. British Medical Journal, 2:9–13, 1972. [32] G.A. Gorry. Computer-assisted clinical decision making. Methods of Information in Medicine, 12:45–51, 1973. [33] E.H. Shortliffe. Computer-based Medical Consultations: MYCIN. North-Holland, New York, 1976. [34] D.G. Fryback. Bayes’ theorem and conditional nonindependence of data in medical diagnosis. Computers and Biomedical Research, 11:423–434, 1978. [35] M.J. Norusis and J.A. Jacquez. Diagnosis 1: Symptom nonindependence in mathemat- ical models for diagnosis. Computers and Biomedical Research, 8:156–172, 1975. [36] P. Szolovits. Artificial intelligence in medicine. In P. Szolovits, editor, Artificial Intelli- gence In Medicine, pages 1–19. Westview Press, Boulder, CO, 1982. [37] R. Davis. Consultation, knowledge acquisition, and instruction. In P. Szolovits, editor, Artificial Intelligence In Medicine, pages 57–78. Westview Press, Boulder, CO, 1982. [38] E.H. Shortliffe and B.G. Buchanan. A model of inexact reasoning in medicine. Mathe- matical Biosciences, 23:351–379, 1975. [39] R.A. Miller, E.P. Pople, and J.D. Myers. INTERNIST-1: An experimental computer- based diagnostic consultant for general internal medicine. New England Journal of Medicine, 307:476–486, 1982. [40] R.A. Miller, M.A. McNeil, S.M. Challinor, F.E. Masarie, and J.D. Myers. The INTERNIST-1/Quick Medical Reference project: Status report. Western Journal of Medicine, 145:816–822, 1986. [41] A.P. Dempster. Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statistics, 38:325–339, 1967. [42] G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ, 1976. [43] G. Shafer. Savage revisited. Statistical Science, 1:463–501, 1986. [44] J. A. Barnett. Computational methods for a mathematical theory of evidence. In Proceedings of the Seventh International Joint Conference on Artificial Intelligence, Vancouver, BC, pages 868–875. International Joint Conference on Artificial Intelligence, August 1981. [45] J. Gordon and E.H. Shortliffe. 
A method for managing evidential reasoning in a hier- archical hypothesis space. Artificial Intelligence, 26:323–357, 1985. 29 [46] G. Shafer. Probability judgment in artificial intelligence. In Proceedings of the Workshop on Uncertainty and Probability in Artificial Intelligence, Los Angeles, CA, pages 91–98. Association for Uncertainty in Artificial Intelligence, Mountain View, CA, August 1985. Also in Kanal, L. and Lemmer, J., editors, Uncertainty in Artificial Intelligence, pages 127–136. North-Holland, New York, 1986. [47] R. Hummel and M. Landy. Evidence as opinions of experts. In Proceedings of the Second Workshop on Uncertainty in Artificial Intelligence, Philadelphia, PA, pages 135–143. Association for Uncertainty in Artificial Intelligence, Mountain View, CA, August 1986. Also in Kanal, L. and Lemmer, J., editors, Uncertainty in Artificial Intelligence 2, pages 43–53. North-Holland, New York, 1988. [48] P. Smets. Constructing the pignistic probability function in a context of uncertainty. In Proceedings of the Fifth Workshop on Uncertainty in Artificial Intelligence, Windsor, ON, pages 319–326. Association for Uncertainty in Artificial Intelligence, Mountain View, CA, August 1989. Also in Henrion, M., Shachter, R., Kanal, L., and Lemmer, J., editors, Uncertainty in Artificial Intelligence 5, pages 29–39. North-Holland, New York, 1990. [49] L. Zadeh. Is probability theory sufficient for dealing with uncertainty in AI: A negative view. In L.N Kanal. and J.F. Lemmer, editors, Uncertainty in Artificial Intelligence, pages 103–116. North-Holland, New York, 1986. [50] D.E. Heckerman. An empirical comparison of three inference methods. In Proceedings of the Fourth Workshop on Uncertainty in Artificial Intelligence, Minneapolis, MN, pages 158–169. Association for Uncertainty in Artificial Intelligence, Mountain View, CA, August 1988. Also in Shachter, R., Levitt, T., Kanal, L., and Lemmer, J., editors, Uncertainty in Artificial Intelligence 4, pages 283–302. North-Holland, New York, 1990. [51] D.E. Heckerman. Probabilistic interpretations for MYCIN’s certainty factors. In Pro- ceedings of the Workshop on Uncertainty and Probability in Artificial Intelligence, Los Angeles, CA, pages 9–20. Association for Uncertainty in Artificial Intelligence, Moun- tain View, CA, August 1985. Also in Kanal, L. and Lemmer, J., editors, Uncertainty in Artificial Intelligence, pages 167–196. North-Holland, New York, 1986. [52] B. Grosof. Evidential confirmation as transformed probability. In Proceedings of the Workshop on Uncertainty and Probability in Artificial Intelligence, Los Angeles, CA, pages 185–192. Association for Uncertainty in Artificial Intelligence, Mountain View, CA, August 1985. Also in Kanal, L. and Lemmer, J., editors, Uncertainty in Artificial Intelligence, pages 153–166. North-Holland, New York, 1986. [53] D.E. Heckerman and R.A. Miller. Towards a better understanding of the INTERNIST- 1 knowledge base. In Proceedings of Medinfo, Washington, DC, pages 27–31. North- Holland, New York, October 1986. [54] M. Henrion. Towards efficient probabilistic inference in multiply connected belief net- works. In R.M. Oliver and J.Q. Smith, editors, Influence Diagrams, Belief Nets, and Decision Analysis, chapter 17. Wiley and Sons, New York, 1990. 30 [55] D.E. Heckerman. A tractable algorithm for diagnosing multiple diseases. In Proceedings of the Fifth Workshop on Uncertainty in Artificial Intelligence, Windsor, ON, pages 174–181. 
Association for Uncertainty in Artificial Intelligence, Mountain View, CA, August 1989. Also in Henrion, M., Shachter, R., Kanal, L., and Lemmer, J., editors, Uncertainty in Artificial Intelligence 5, pages 163–171. North-Holland, New York, 1990. [56] M. Shwe, B. Middleton, D. Heckerman, M. Henrion, E. Horvitz, H. Lehmann, and G. Cooper. Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base: Part I. the probabilistic model and inference algorithms. Methods in Information and Medicine, 30:241–250, 1991. [57] B. Middleton, M. Shwe, D. Heckerman, M. Henrion, E. Horvitz, H. Lehmann, and G. Cooper. Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base: Part II. evaluation of diagnostic performance. Methods in Information and Medicine, 30:256–267, 1991. [58] R. Johnson. Independence and Bayesian updating methods. In Proceedings of the Workshop on Uncertainty and Probability in Artificial Intelligence, Los Angeles, CA, pages 28–30. Association for Uncertainty in Artificial Intelligence, Mountain View, CA, August 1985. Also in Kanal, L. and Lemmer J., editors, Uncertainty in Artificial Intel- ligence, pages 197–201. North-Holland, New York, 1986. [59] R.D. Shachter and D.E. Heckerman. Thinking backward for knowledge acquisition. AI Magazine, 8:55–63, 1987. [60] A. Tversky and D. Kahneman. Causal schemata in judgments under uncertainty. In D. Kahneman, P. Slovic, and A. Tversky, editors, Judgement Under Uncertainty: Heuristics and Biases. Cambridge University Press, New York, 1982. [61] E.J. Horvitz, J.S. Breese, and M. Henrion. Decision theory in expert systems and artificial intelligence. International Journal of Approximate Reasoning, 2:247–302, 1988. [62] F.P. Ramsey. Truth and probability. In R.B. Braithwaite, editor, The Foundations of Mathematics and other Logical Essays. Humanities Press, London, 1931. Reprinted in Kyburg and Smokler, 1964. [63] B. de Finetti. La prévision: See lois logiques, ses sources subjectives. Annales de l’Institut Henri Poincaré, 7:1–68, 1937. Translated in Kyburg and Smokler, 1964. [64] H.E. Kyburg and H.E. Smokler. Studies in Subjective Probability. Wiley and Sons, New York, 1964. [65] R. Cox. Probability, frequency and reasonable expectation. American Journal of Physics, 14:1–13, 1946. [66] L.J. Savage. The Foundations of Statistics. Dover, New York, 1954. [67] R.A. Howard. Risk preference. In R.A. Howard and J.E. Matheson, editors, Readings on the Principles and Applications of Decision Analysis, volume II, pages 629–663. Strategic Decisions Group, Menlo Park, CA, 1970. 31 [68] A. Tversky and D. Kahneman. Judgment under uncertainty: Heuristics and biases. Science, 185:1124–1131, 1974. [69] D. Kahneman, P. Slovic, and A. Tversky, editors. Judgment Under Uncertainty: Heuris- tics and Biases. Cambridge University Press, New York, 1982. [70] G.A. Gorry, J.P. Kassirer, A. Essig, and W.B. Schwartz. Decision analysis as the basis for computer-aided management of acute renal failure. American Journal of Medicine, 55:473–484, 1973. [71] B.G. Buchanan and E.H. Shortliffe, editors. Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project. Addison–Wesley, Reading, MA, 1984. [72] L. Zadeh. Personal communication. 1986. [73] D. Heckerman and B. Nathwani. An evaluation of the diagnostic accuracy of Pathfinder. Computers and Biomedical Research, In press. [74] G.F. Cooper. NESTOR: A Computer-based Medical Diagnostic Aid that Integrates Causal and Probabilistic Knowledge. 
PhD thesis, Medical Computer Science Group, Stanford University, Stanford, CA, November 1984. Report HPP-84-48. [75] D.J. Spiegelhalter and R.P. Knill-Jones. Statistical and knowledge-based approaches to clinical decision support systems, with an application in gastroenterology. Journal of the Royal Statistical Society, 147:35–77, 1984. [76] J.A. Reggia and B.T. Perricone. Answer justification in medical decision support systems based on Bayesian classification. Computers in Biology and Medicine, 15:161–167, 1985. [77] M. Ben-Bassat, V.K. Carlson, V.K. Puri, M.D. Davenport, J.A. Schriver, M.M. Latif, R. Smith, E.H. Lipnick, and M.H. Weil. Pattern-based interactive diagnosis of multiple disorders: The MEDAS system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2:148–160, 1980. [78] S. Andreassen, M. Woldbye, B. Falck, and S.K. Andersen. MUNIN: A causal probabilistic network for interpretation of electromyographic findings. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, Milan, Italy, pages 366–372. Morgan Kaufmann, San Mateo, CA, August 1987. [79] G. Nino-Murcia and M. Shwe. An expert system for diagnosis of sleep disorders. In M. Chase, R. Lydic, and C. O’Connor, editors, Sleep Research, volume 20, page 433. Brain Information Service, Los Angeles, CA, 1991. [80] L. Kanal and J. Lemmer, editors. Uncertainty in Artificial Intelligence. North-Holland, New York, 1986. [81] J. Lemmer and L. Kanal, editors. Uncertainty in Artificial Intelligence 2. North-Holland, New York, 1988. [82] L. Kanal, T. Levitt, and J. Lemmer, editors. Uncertainty in Artificial Intelligence 3. North-Holland, New York, 1989. [83] R. Shachter, T. Levitt, L. Kanal, and J. Lemmer, editors. Uncertainty in Artificial Intelligence 4. North-Holland, New York, 1990. [84] M. Henrion, R. Shachter, L. Kanal, and J. Lemmer, editors. Uncertainty in Artificial Intelligence 5. North-Holland, New York, 1990.
work_2jjz24x67jeijo4fjg7ba55s2q ---- A Web-enabled hybrid approach to strategic marketing planning: Group Delphi + a Web-based expert system Shuliang Li* Westminster Business School, University of Westminster, 35 Marylebone Road, London NW1 5LS, UK Abstract A Web-enabled hybrid approach to strategic marketing planning is established in this paper. The proposed approach combines the group Delphi technique with a Web-based expert system, called WebStra (developed by the author), to support some key stages of the strategic marketing planning process. The Web-enabled approach is based upon client–server architecture that enables the sharing and delivery of computerised planning models and knowledge via the Internet, intranets or extranets, which allows widespread access by authorised users around the clock, across the world or throughout the company. In order to assess the overall value of the proposed approach, case-based evaluation work has been undertaken. Evaluation findings indicate that the approach is effective and efficient in terms of overcoming time and geographical barriers, saving decision-making time, coupling analysis with human judgment, helping improve decision-making quality, etc. © 2005 Elsevier Ltd. All rights reserved. Keywords: Strategic planning; Marketing strategy; Decision support system; Web-based expert system; Group Delphi; World Wide Web 1. Introduction The globalisation, the complexity and the dynamics of the business environments present real challenges to strategic marketing planners in the 21st century. The needs for appropriate techniques and technologies in support of strategic marketing planning have never been so great. Over the past years, attempts have been made by researchers to develop computer-based systems to support the process of strategic marketing planning. Some related typical work in this field may be found in Belardo, Duchessi, and Coleman (1994), Carlsson, Walden, and Kokkonen (1996), Cavusgil and Evirgen (1997), Levy and Yoon (1995) and McDonald and Wilson (1990), Mitri et al. (1999), Li (2000), and Li and Davies (2001), etc. The relevant systems developed in the past, however, are mainly restricted to assisting individual users through standalone PC-based programs, which may limit users’ access to computerised models and support systems. Moreover, program distribution is a serious problem for many types of expert systems, because most knowledge bases must be updated regularly (Eriksson, 1996). In addition, the critical importance of capturing and combining managerial judgment, especially groups of decision-makers’ judgment and intuition, with computer-based support has not been highlighted adequately in previous research in this field. The World Wide Web is emerging as an increasingly important platform that can reduce the technological barriers and make it easier for users in different geographical locations to access the decision support models and tools (Shim et al., 2002). The widespread use of the World Wide Web and the Internet provides an opportunity for making computer-based decision support widely available. It is also argued that the Internet can complement traditional ways of competing (Porter, 2001). Appropriate Internet applications can improve communications with actual and potential customers, suppliers and collaborators; and can be a powerful promotion and sales tool (Hamill, 1997). Expert Systems with Applications 29 (2005) 393–400 www.elsevier.com/locate/eswa 0957-4174/$ - see front matter © 2005 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2005.04.018 * Tel.: +44 20 7911 5000x3429; fax: +44 20 7911 5839. E-mail address: lish@wmin.ac.uk
Internet technologies and e-commerce applications can be used strategically to create competitive advantage in global markets. Devising Internet marketing strategies and associated e-commerce strategies as an organic part of business strategy is something that strategic marketing plans today should not neglect.

The purpose of this study is to establish a Web-enabled approach that combines the advantages of a Web-based expert system with the benefits of the group Delphi technique and links strategic marketing planning with Internet and e-commerce strategy formulation.

The paper is structured as follows. The opening section outlines the logical process, the main functional components and the judgmental ingredients of the Web-enabled approach. This is followed by a discussion of the evaluation of the overall value of the approach. The final section offers some general conclusions and discussion.

2. A Web-enabled hybrid approach to strategic marketing planning

Strategic marketing planning presents challenges to the experience, knowledge, judgment and intuition of individual managers. A group of managers from different functional departments can bring a variety of perspectives, knowledge, judgment and intuition to the planning process (Bass, 1983; Minkes, 1987). Eden (1990) points out that those who have the power to act must be integrally involved in the process of strategy development. Porter (1987) argues that strategic planning should employ multifunctional planning teams. Beveridge, Gear, and Minkes (1997) note that strategic decisions usually require consensus and commitment to a course of action among groups of individuals because of differing perspectives, interests and functional biases. This type of group decision-making can involve the resolution of conflict. Turban and Aronson (2001) note that decision-making in groups has the following benefits: groups are better than individuals at catching errors; groups are better than individuals at understanding problems; a group has more information or knowledge; and synergy may be produced.

The group Delphi technique (Webler, Levine, Rakel, & Renn, 1991) provides a structured communication process in which a group of participants input, discuss and defend their judgment and intuition concerning existing knowledge and information. Firstly, views are discussed openly in the group Delphi; there is direct and immediate feedback, and any ambiguities are clarified immediately. Secondly, discussion or debate provides an internal check for consistency in accepted points of view (Webler et al., 1991). The group Delphi technique can be applied to resolve conflicting views and build consensus. It can also be used as a technique for reducing the uncertainty surrounding the managerial inputs to the decision-making process.

It is evident that the appropriate use of computer-based support systems can improve the process of marketing planning (Li, 2000; Wilson & McDonald, 1994). Keen and Scott Morton (1978) point out that computer-based decision support implies the use of computers to assist decision-makers in their decision processes and to support, rather than replace, managerial judgment.
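The group Delphi loop described above, in which panel members rate an issue, receive the group's summary as feedback, discuss, and re-rate until views converge, can be made concrete with a short sketch. This is an illustrative outline only, not part of WebStra nor the procedure evaluated in the paper: the 1–9 rating scale, the interquartile-range stopping rule and all function and parameter names are assumptions introduced here.

```python
# Illustrative sketch of a simplified group-Delphi rating loop. Not the paper's
# procedure: the scale, the stopping rule and the names are assumptions.
from statistics import median, quantiles
from typing import Callable, Dict, List


def delphi_consensus(
    panel: List[str],
    collect_round: Callable[[str, int, Dict[str, float]], float],
    max_rounds: int = 3,
    iqr_threshold: float = 1.0,
) -> Dict[str, object]:
    """Run up to `max_rounds` of rating and feedback and report the outcome.

    `collect_round(expert, round_no, feedback)` stands in for the facilitated
    discussion step: each expert returns a numeric rating (e.g. attractiveness
    of a strategic option on a 1-9 scale) after seeing the previous summary.
    """
    feedback: Dict[str, float] = {}
    ratings: List[float] = []
    for round_no in range(1, max_rounds + 1):
        ratings = [collect_round(expert, round_no, feedback) for expert in panel]
        q1, med, q3 = quantiles(ratings, n=4)           # quartiles of the panel ratings
        feedback = {"q1": q1, "median": med, "q3": q3}  # summary fed back to the group
        if q3 - q1 <= iqr_threshold:                    # narrow spread taken as consensus
            break
    return {"rounds": round_no, "median": median(ratings), "ratings": ratings}
```

In practice the `collect_round` step is the open discussion itself; the sketch only makes the feedback-and-stopping structure of the technique explicit.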
Computerised models are consistent and unbiased, but rigid, while managers are inconsistent and often lack analytical skills but are flexible in adapting to changing environments (Blattberg & Hoch, 1990; Li, Kinman, Duan, & Edwards, 2000) and have knowledge about their products and markets (Mintzberg, 1994a–c). It is argued that "the capacity of the human mind for formulating and solving complex problems is very small compared with the size of the problems whose solution is required for objective rational behaviour in the real world – or even for a reasonable approximation to such objective rationality" (Simon, 1957). The key ingredient is, therefore, the integration of the decision-makers into the model (McIntyre, 1982). It is also found that marketing planning is typically the shared, collective responsibility of managers in many large companies (Li et al., 2000). Thus, combining computerised models with a group of decision-makers' judgment and intuition would lead to better strategic decisions.

The World Wide Web provides a useful platform for developing, sharing and delivering decision support tools. The primary Web tools are Web servers using the Hypertext Transfer Protocol (HTTP) containing Web pages created with the Hypertext Markup Language (HTML), JavaScript, etc., accessed by client PCs running client software known as a Web browser (Shim et al., 2002). The driving forces for developing a Web-enabled hybrid approach towards strategic planning are: to publish marketing strategy expertise and guidelines on the Web; to transport the decision support models and tools over the Internet, extranets or intranets; to provide intelligent support round the clock, around the world or throughout the company; to link the formulation of marketing strategies with the development of e-commerce strategies; and to combine Web-based intelligent support with group judgment and intuition. In particular, the Web-enabled approach aims at supporting the following three key stages of the strategic marketing planning process:

• Assessing marketing environments and performing strategic analysis;
• Producing a strategic portfolio summary and generating strategic recommendations/options;
• Selecting marketing strategies and related Internet/e-commerce strategies.

The proposed approach is illustrated in Fig. 1. The Web-enabled approach is based on a client–server architecture that allows the sharing and delivery of computerised intelligent decision support models via the Internet, intranets or extranets, which enables widespread access by any authorised user. The client can be an Internet-connected computer with a suitable Web browser such as Microsoft Internet Explorer. Users interact with the system using the normal Web access techniques of

work_23cetdgvhrdpvdevpor2azsreu ---- Ontology matching: A literature review

Expert Systems with Applications 42 (2015) 949–971. http://dx.doi.org/10.1016/j.eswa.2014.08.032. © 2014 Elsevier Ltd. All rights reserved.
* Corresponding author. E-mail addresses: locerdeira@uvigo.es (L. Otero-Cerdeira), franjrm@uvigo.es (F.J. Rodríguez-Martínez), alma@uvigo.es (A. Gómez-Rodríguez).
Lorena Otero-Cerdeira*, Francisco J.
Rodríguez-Martínez, Alma Gómez-Rodríguez LIA2 Research Group, Computer Science Department, University of Vigo, Spain a r t i c l e i n f o a b s t r a c t Article history: Available online 30 August 2014 Keywords: Ontology matching Literature review Classification framework User survey The amount of research papers published nowadays related to ontology matching is remarkable and we believe that reflects the growing interest of the research community. However, for new practitioners that approach the field, this amount of information might seem overwhelming. Therefore, the purpose of this work is to help in guiding new practitioners get a general idea on the state of the field and to determine possible research lines. To do so, we first perform a literature review of the field in the last decade by means of an online search. The articles retrieved are sorted using a classification framework that we propose, and the differ- ent categories are revised and analyzed. The information in this review is extended and supported by the results obtained by a survey that we have designed and conducted among the practitioners. � 2014 Elsevier Ltd. All rights reserved. 1. Introduction Ontology matching is a complex process that helps in reducing the semantic gap between different overlapping representations of the same domain. The existence of such different representations obeys to the natural human instinct to have different perspectives and hence to model problems differently. When these domains are represented using ontologies, the solution typically involves the use of ontology matching techniques. Ontologies and ontology matching techniques are an increasing trend as ontologies provide probably the most interesting opportu- nity to encode meaning of information. The last decades have born witness to a period of extensive research in this field. Nowadays, far from dying down, the activity seems to be increasing and new publications, where the ontology matching field is addressed, are continuously being released. This reflects the global interest in ontology matching which we have studied by means of an analytical review of the literature so far. Other works and publications have successfully presented the state-of-the-art in the field such as, Euzenat (2004), Shvaiko and Euzenat (2013) and Kalfoglou and Schorlemmer (2003b), although our purpose is quite different. We aim at retrieving articles related to ontology matching that have been published in the last decade, to classify and identify research lines relevant for ontology match- ing. We also aim at providing a reference framework for the inte- gration and classification of such articles. Therefore practitioners approaching the field for the first time would be aware of the dif- ferent types of publications regarding the field to better choose those that better fits their needs, they would gain knowledge about the main issues where the researchers have been working and what are the main trends and challenges still to be addressed in the next years. To this end, the remainder of the paper is organized as follows. In Section 2 a methodology to extract the articles is presented. Sec- tion 3 presents general statistical results of the retrieved publica- tions. Then, Section 4 illustrates the classification framework proposed and describes each one of the categories defined. In Sec- tion 5 we describe the limitations of the literature review and sug- gest a practitioner-oriented survey to support the results of the review. 
Such review is detailed in Section 6, and its limitations in Section 7. Finally, in Section 8 we present our discussion, conclud- ing remarks and directions for future work. 2. Procedures To retrieve the articles for this literature review, several well- known online databases were queried to obtain articles related to the ontology matching field. As result over 1600 articles were obtained which were filtered to narrow down the selection to the final 694 articles that are included in this review. This screen- ing allowed dismissing 58.09% of the initially retrieved articles. Although this is a high percentage of the articles it is worth notic- ing that the original search was broadly defined so an important number of false positives among the initial results were already expected. The screening of the articles was manually done, and http://crossmark.crossref.org/dialog/?doi=10.1016/j.eswa.2014.08.032&domain=pdf http://dx.doi.org/10.1016/j.eswa.2014.08.032 mailto:locerdeira@uvigo.es mailto:franjrm@uvigo.es mailto:alma@uvigo.es http://dx.doi.org/10.1016/j.eswa.2014.08.032 http://www.sciencedirect.com/science/journal/09574174 http://www.elsevier.com/locate/eswa 950 L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 took over 3 months to complete it due to the several iterations and the amount of articles reviewed. This review covers journal articles and conference proceedings published within the last decade. Other publication forms such as books, newspapers, doctoral dissertations, posters, etc., were not considered as researchers use mainly journals and conference papers to obtain and spread knowledge, thus these types of publi- cations encompass the majority of the research papers published about this subject. The procedure followed to identify and filter the papers is reflected in Fig. 1. First, different online databases were queried using a combination of the following search strings: ontology matching, ontology mapping and ontology alignment. These dat- abases were: IEEEXplore Digital Library (IEEXplore, 2013), Science Direct (Direct, 2013), ACM Digital Library (ACM, 2013) and Scopus (Scopus, 2013). In addition to these databases, the articles published in the Ontology Alignment Evaluation Initiative (OAEI) (OAEI, 2013) were also included as this initiative is considered as the most prominent one regarding the evaluation of different matching systems and it helps practitioners improve their works on matching techniques. Sorting these data sources by amount of articles we have: Sco- pus (626), ACM Digital Library (383), Science Direct (267), OAEI (254) and IEEEXplore Digital Library (126), making this way a total of 1656 articles initially obtained. The next step was to dismiss those articles that were dupli- cated, i.e, that have been obtained through two or more data sources. When this situation arose, the criterion followed was to dismiss the articles belonging to the data source with a higher number of articles. By doing so, 335 articles were removed. Next, the 1321 remaining articles were analyzed considering their keywords and abstracts. Those whose keywords did not include specific mentions to the ontology matching field or whose abstracts did not introduce a research regarding this field were excluded. Of all the criteria considered, this was the one that pro- duced the sharpest cut down in the amount of articles. Addition- ally, while reviewing the keywords and abstracts, also the papers that corresponded to a poster publication were dismissed. 
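The screening steps described above, removing duplicates according to the size of the data source and then checking keywords and abstracts against the search terms, were carried out manually by the authors. A minimal sketch of the same decision rules is given below purely for illustration; the record fields, the helper names and the use of title matching to detect duplicates are assumptions of this example, not the authors' actual tooling. The source ordering follows the per-source article counts reported above (Scopus largest, IEEEXplore smallest).

```python
# Illustrative sketch of the screening rules described in the text:
# duplicates are kept in the smaller data source, then records are filtered by
# whether keywords/abstract mention the search terms. Field names, the title
# key and the helpers are assumptions made for this example.
from dataclasses import dataclass
from typing import Dict, List

SEARCH_TERMS = ("ontology matching", "ontology mapping", "ontology alignment")
# Sources from fewest to most retrieved articles, per the counts in the text.
SOURCE_PRIORITY = ["IEEEXplore", "OAEI", "ScienceDirect", "ACM", "Scopus"]


@dataclass
class Record:
    title: str
    source: str
    keywords: List[str]
    abstract: str


def deduplicate(records: List[Record]) -> List[Record]:
    """Keep one copy per title, preferring the data source with fewer articles."""
    best: Dict[str, Record] = {}
    for rec in records:
        key = rec.title.strip().lower()
        kept = best.get(key)
        if kept is None or SOURCE_PRIORITY.index(rec.source) < SOURCE_PRIORITY.index(kept.source):
            best[key] = rec
    return list(best.values())


def mentions_topic(rec: Record) -> bool:
    """Check whether the keywords or abstract refer to ontology matching."""
    text = " ".join(rec.keywords).lower() + " " + rec.abstract.lower()
    return any(term in text for term in SEARCH_TERMS)


def screen(records: List[Record]) -> List[Record]:
    return [rec for rec in deduplicate(records) if mentions_topic(rec)]
```

The remaining full-text check, dismissing articles where ontology matching is not the core contribution, is inherently a manual judgement and is not modelled here.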
Finally, the 795 articles remaining were carefully reviewed to dismiss those that did not consider ontology matching as their core Fig. 1. Procedure followed to retrieve th part. By applying this criterion another 101 articles were excluded, therefore leaving the 694 articles that are included in this litera- ture review. 3. Statistical results In this section some statistical information about the articles is presented and discussed. Articles were analyzed regarding publica- tion year and database from which they were obtained. 3.1. Articles sorted by publication year Fig. 2 represents the progression of the number of articles with respect to their publication years. The measurements and values that shape this progression are shown in Table 1. In this figure we can observe that the amount of published articles steadily increases from 2003 to 2012, where it peaks. The sharpest rise was found between 2005 and 2006 where the percentage of pub- lished articles rose from 3.75% to 8.36%. Between 2012 and 2013 we observe a pronounced decrease in the amount of published articles, although as this review covers only the first semester of 2013, it is highly likely that the amount of published articles by the end of the year would follow the increasing trend of the previ- ous years. This increasing pattern reflects the global interest of the research community in the ontology matching field. 3.2. Articles sorted by data source To retrieve the articles for this literature review, a total of five different data sources were used. Among these, four are well- known online databases that were queried to obtain the articles. The other source was the Ontology Alignment Evaluation Initiative site, where all the publications related to this initiative are published. The classification of the retrieved articles by data source is shown in Table 2 and graphically depicted in Fig. 3. These data state that Scopus provides the highest amount of articles to the total (41.79%, 290 articles) probably because it includes a wider variety of source journals. On the other hand, ScienceDirect pro- e articles for the literature review. Fig. 2. Articles sorted by publication year. Table 1 Articles with respect to publication year. Publication year Number of articles Percentage over total (%) 2003 4 0.58 2004 17 2.45 2005 26 3.75 2006 58 8.36 2007 75 10.81 2008 84 12.10 2009 83 11.96 2010 88 12.68 2011 100 14.41 2012 107 15.42 2013 52 7.49 Total 694 Table 2 Articles with respect to data source. Data source Number of articles Percentage over total (%) ACM Digital Library 72 10.37 IEEExplore Digital Library 114 16.43 Ontology Alignment Evaluation Initiative 174 25.07 ScienceDirect 44 6.34 Scopus 290 41.79 Total 694 Fig. 3. Articles sorted by data source. Fig. 4. Classification Framework. L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 951 vided significantly less articles than any of the other data sources analyzed (6.34%, 44 articles). The remaining data sources consid- ered respectively provide the 10.37% (72 articles, ACM Digital Library), 16.43% (114, IEEExplore Digital Library) and 25.07% (174 articles, OAEI). During the first filtering step (see Fig. 1) after initially retrieving the articles from the different data sources, they were processed in order to remove those duplicate results. 
In this situation the article was always removed from the data source that had more articles as the impact in the overall results for each database would be less significative than in the case of dismissing those from the data sources with less articles. 4. Classification Relying on the analysis of the articles selected for the literature review, we have defined an abstract framework that helps classify- ing them. This framework, depicted in Fig. 4, shows six different types of articles and it is in line with the general outline of the main issues in ontology matching proposed by Euzenat and Shvaiko in Euzenat and Shvaiko (2007, 2013). The different catego- ries identified cover the most prominent fields of interest in ontol- ogy matching. � Reviews. This category includes the publications devoted to sur- veying and reviewing the field of ontology matching. It also includes those articles focused on detailing the state-of-the- art as well as the future challenges in this field. � Matching techniques. This category covers the publications focused on different similarity measures, matching strategies and methodologies, that can be used in the matching systems. � Matching systems. This category includes those articles intro- ducing new matching systems and algorithms, and also those detailing enhancements to existing ones. � Processing frameworks. This category comprehends the articles that delve into the different uses of the alignments, i.e, those operations that can be performed from alignments, such as ontology merging, reasoning or mediation. � Practical applications. This category covers those articles that describe matching solutions applied to a real-life problem. � Evaluation. This category covers the articles describing different available approaches to evaluate the matching systems, as well as the different existing benchmarks and the most relevant per- formance measures. The results of classifying the articles within the framework defined are summarized in Table 3. According to our results the Table 3 General results of the classification. Category Number of articles Percentage (%) Reviews 46 6.63 Matching techniques 85 12.25 Matching systems 302 43.52 Processing frameworks 147 21.18 Practical applications 76 10.95 Evaluation 38 5.48 Total 694 952 L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 greater efforts have been focused on developing matching systems and frameworks that exploit the alignments, while the evaluation of such systems as well as the review of the field and the develop- ment of applied solutions have not been object of such dedication. In Table 4 the results of the classification are shown sorted by year, identifying the amount of articles that match into each cate- gory and its yearly percentage. These results are graphically com- pared in Fig. 5 where the evolution can be more clearly distinguished. The amount of articles into Reviews showed an slight but upward trend within the first five years of the time frame consid- ered, peaking in 2008. Ever since, the trend in the amount of reviews per year has remained almost constant. The evolution of the Matching Techniques did not show any significant change until 2007 when the amount of articles rose from three to thirteen. After this sharp increase, the values remained constant for the next year and then they fell to six in 2009. In 2010 there was another signif- icant increase, followed by yet another fall in the amount of arti- cles. 
Apparently the periods of intense work in this category are followed by others where the amount of publications is signifi- cantly lower. Regarding the Matching Systems, the amount of published arti- cles shows an upward trend in the considered period, since, as sta- ted before, the values for 2013 can not be considered definitive as the articles for this review were retrieved in the first semester of the year. Regardless of the general upward trend, the amount of articles in this category reached some local peaks in 2006, 2009 and 2012, followed by periods of less activity. Anyway, it is impor- tant to notice that the periods of lower activity in this category cor- respond with periods of higher activity in matching techniques and vice versa. It is highly likely that the same researchers that define new matching systems are the ones that had suggested new matching techniques to be used within them, which also validates the process of construction of a matching system. The evolution in the number of Processing Frameworks was increasingly constant until 2007 when it reached a local peak, just to show a slight downward trend for two years, before starting a continuos rise that took the amount of published articles regarding this category to its top value in 2012. Regarding the Practical applications, the amount of articles shows in general an upward trend. It reaches a local peak in 2007 with 8 articles. In 2008 it shows a slight decrease. However, since 2010, the number of articles continues to rise every year. It is worth noticing that even when the data for 2013 does not cover the whole year, the amount of articles devoted to practical applica- tions was already a 66.6% of the articles published in 2012 in the same category. Finally, the publications related to Evaluation show an evolution pattern really similar to the one detected for the reviews, not show- ing any significant behavior. After providing some general outline of the evolution of the dif- ferent categories over the years, in the following sections a deeper analysis of each one of them is included, considering inner classifi- cations for each one of these general categories. 4.1. ‘Reviews’ category Within this category the publications have been further sorted according to their scope, namely, the publications were identified as being of either general purpose or specific purpose, as depicted in Fig. 6. More than half of the 46 articles included in this category, 26, were considered general purpose reviews as they offer some insight into the ontology matching field without specifically emphasizing on any subject. Among these, there are for instance, surveys (Falconer, Noy, & Storey, 2007; Shvaiko & Euzenat, 2005; Thayasivam, Chaudhari, & Doshi, 2012; Zhu, 2012), state-of-the- art articles (Droge, 2010; Gal & Shvaiko, 2008; Ngo, Bellahsene, & Todorov, 2013) and publications unveiling the future challenges of the field (Kotis & Lanzenberger, 2008; Shvaiko & Euzenat, 2013). In turn, the remaining 20 articles show a more limited scope within the field. When analyzing these articles we have stated that they were devoted either to delving into a very specific area within the ontology matching field or to studying the feasibility of apply- ing ontology matching to a certain domain. 
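Sections 3 and 4 repeatedly reduce the classified article set to counts and percentages: per year (Table 1), per data source (Table 2), per category (Table 3), per category and year (Table 4), and the general/specific split just described for the 'Reviews' category. A minimal sketch of that bookkeeping is shown below, assuming the classification is available as (year, category) pairs; the input format and the function name are assumptions of this example, not the authors' tooling.

```python
# Illustrative sketch: per-category totals and annual percentages of the kind
# reported in Tables 3 and 4, assuming (year, category) pairs as input.
from collections import Counter
from typing import Iterable, Tuple


def classification_tables(records: Iterable[Tuple[int, str]]):
    records = list(records)
    total = len(records)
    by_category = Counter(cat for _, cat in records)
    by_year = Counter(year for year, _ in records)
    by_year_cat = Counter(records)

    # Table 3: count and percentage over the whole corpus, per category.
    overall = {cat: (n, 100.0 * n / total) for cat, n in by_category.items()}
    # Table 4: count and percentage within each year, per (year, category).
    annual = {
        (year, cat): (n, 100.0 * n / by_year[year])
        for (year, cat), n in by_year_cat.items()
    }
    return overall, annual
```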
Some of the specific fields within ontology matching that were addressed cover topics such as (i) matching across different lan- guages (Fu, Brennan, & O’Sullivan, 2009), (ii) instance-based match- ing (Castano, Ferrara, Lorusso, & Montanelli, 2008), which is one among the different types of techniques to perform ontology matching. Other articles developed the subject of (iii) external sources for ontology matching (Fugazza & Vaccari, 2011; Lin & Sandkuhl, 2008a), these techniques take advantage of auxiliary or external resources in order to find matchings to terms based on lin- guistic relations between them such as synonymy or hyponymy. These external resources are usually lexicons or thesauri. On the other hand, among the domains where ontology match- ing could be used, we have identified articles on domains such as geography (Tomaszewski & Holden, 2012), medicine (Wennerberg, 2009) or agriculture (Lauser et al., 2008). 4.2. ‘Matching Techniques’ category Ontology matching techniques propose different approaches for the matching that are implemented in ontology matching algorithms. When building an ontology matching system, differ- ent algorithms are usually used, exploiting therefore different ontology matching techniques. In this category two different types of articles have been identified. Some articles are devoted to describing new or enhanced similarity measures and to analyz- ing the building blocks of the ontology matching algorithms, while others make use of such artifacts to define matching strategies or methodologies. In total there are 85 articles in this category, where 57:65% (49 articles) belong to the first group of basic matching techniques and 42:35% (36 articles) belong to complex matching techniques. The different matching techniques have been subject of study in the latest years. For the purpose of this review, to sort the (i) basic matching techniques we have followed the classification proposed by Euzenat and Shvaiko (Euzenat & Shvaiko, 2013), depicted in Fig. 7, since to the best our knowledge is the most complete one and reflects most of the other previous classifications. This classifi- cation is an evolution of another previously proposed by the same authors in Euzenat and Shvaiko (2007). This classification can be followed top-down and therefore focusing on the interpretation that the different techniques offer to the input information, but also bottom-up, focusing on the type of the input that the matching techniques use. Despite the fol- lowed approach both meet at the concrete techniques tier. Following the top-down interpretation, the matching tech- niques can be classified in a first level as: Table 4 Results of the classification. 
Year Category Number of articles Total Annual percentage 2003 Reviews 1 4 25.00% Matching techniques – – Matching systems 1 25.00% Processing frameworks 2 50.00% Practical applications – – Evaluation 0 – 2004 Reviews 1 17 5.88% Matching techniques 5 29.41% Matching systems 6 35.29% Processing frameworks 3 17.65% Practical applications – – Evaluation 2 11.76% 2005 Reviews 4 26 15.38% Matching techniques 3 11.54% Matching systems 6 23.08% Processing frameworks 9 34.62% Practical applications 1 3.85% Evaluation 3 11.54% 2006 Reviews 2 58 3.45% Matching techniques 3 5.17% Matching systems 30 51.72% Processing frameworks 18 31.03% Practical applications 4 6.90% Evaluation 1 1.72% 2007 Reviews 4 75 5.33% Matching techniques 13 17.33% Matching systems 27 36.00% Processing frameworks 19 25.33% Practical applications 8 10.67% Evaluation 4 5.33% 2008 Reviews 8 84 9.52% Matching techniques 13 15.48% Matching systems 34 40.48% Processing frameworks 16 19.05% Practical applications 5 5.95% Evaluation 8 9.52% 2009 Reviews 5 83 6.02% Matching techniques 6 7.23% Matching systems 46 55.42% Processing frameworks 11 13.25% Practical applications 10 12.05% Evaluation 5 6.02% 2010 Reviews 3 88 3.41% Matching techniques 18 20.45% Matching systems 37 42.05% Processing frameworks 16 18.18% Practical applications 10 11.36% Evaluation 4 4.55% 2011 Reviews 4 100 4.00% Matching techniques 9 9.00% Matching systems 47 47.00% Processing frameworks 21 21.00% Practical applications 13 13.00% Evaluation 6 6.00% 2012 Reviews 6 107 5.61% Matching techniques 9 8.41% Matching systems 52 48.60% Processing frameworks 23 21.50% Practical applications 15 14.02% Evaluation 2 1.87% 2013 Reviews 8 52 15.38% Matching techniques 6 11.54% Matching systems 16 30.77% Processing frameworks 9 17.31% Practical applications 10 19.23% Evaluation 3 5.77% L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 953 Fig. 5. Evolution of the different categories. Fig. 6. Specific types of articles within ‘Reviews’ category. 954 L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 – Element-level matchers: these techniques obtain the corre- spondences by considering the entities in the ontologies in isolation, therefore ignoring that they are part of the struc- ture of the ontology. – Structure-level matchers: these techniques obtain the corre- spondences by analyzing how the entities fit in the structure of the ontology. In the second level of the classification, the techniques can be further classified as: – Syntactic: these techniques limit their input interpretation to the instructions stated in their corresponding algorithms. – Semantic: these techniques use some formal semantics to interpret their input and justify their results. If reading the classification bottom-up, the elementary match- ing techniques can be initially divided in two categories deter- mined by the origin of the information considered for the matching process: – Content-based: these techniques focus on the internal infor- mation coming from the ontologies to be matched. – Context-based: these techniques consider for the matching, external information that may come from relations between ontologies or other external resources (context). In the second level of the classification, both categories are fur- ther refined. The content-based category is further divided into four new groups, depending on the input that the techniques use: – Terminological: these methods consider their inputs as strings. 
– Structural: these methods are based on the structure of the entities (classes, individuals, relations) found in the ontology. – Extensional: these methods compute the correspondences by analyzing the set of instances of the classes (extension). – Semantic: these techniques need some semantic interpreta- tion of the input and usually use a reasoner to deduce the correspondences. In the second level of the classification for the context-based cat- egory, the techniques can be also further classified as syntactic or semantic techniques. The next level in any of both classifications already corresponds to the specific techniques. Following the different paths in this classification tree, several techniques may be reached. These cate- gories were used to further sort the 49 articles belonging to basic matching techniques. The classification of these articles was partic- ularly hard since most of them were not devoted to a single tech- nique but to several, so in the following we provide an example of articles that match into some categories but it is worth noticing that many of them may also be included in another one. � Formal Resource-based: these techniques use formal resources to support the matching process, such as upper level ontologies, domain-specific ontologies or the recorded alignments of previ- Fig. 7. Matching techniques classification. Extracted from the book ‘Ontology Matching’ (Euzenat & Shvaiko, 2013). L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 955 ously matched ontologies (alignment reuse). Examples of such techniques are Scharffe, Zamazal, and Fensel (2013) and Mascardi, Locoro, and Rosso (2010). � Informal Resource-based: these techniques, as those in the previ- ous category, also exploit an external resource, but in this case the external resources are informal ones. This group of tech- niques deduce relations between ontologies using the relation between the ontologies and such informal resources. An exam- ple of such category could not be found among the results of the review. � String-based: these techniques are based on the similarity of the strings that represent the names and descriptions of the entities in the ontologies. There are several string distance metrics that can be used in these methods Levenshtein, Jaccard, Jaro-Winkler, Euclidean, TFIDF, etc., (Cohen, Ravikumar, & Fienberg, 2003). Such techniques are present, for instance in the work of Akbari, Fathian, and Badie (2009). � Language-based: these techniques rely on Natural Language Pro- cessing, as these do not consider names as simply strings but words in some natural language. Techniques in this category are, for example, tokenisation, lemmatisation or stopword elimi- nation, some of which are applied by Shah and Syeda-Mahmood in Shah and Syeda-Mahmood (2004). This category also consid- ers those techniques that take advantage from external resources to find similarities between terms, using for instance, lexicons, dictionaries or thesauri. In He, Yang, and Huang (2011) for instance, the WordNet (WordNet, 2013) database is used as the external resource. � Constraint-based: these techniques consider criteria regarding the internal structure of the entities, such as the domain and range of the properties or the types of the attributes, to calculate the similarity between them. It is common to use these tech- niques in combination with others as in the work by Glückstad (2010). 
� Graph-based: these techniques consider the ontologies to match as labelled graphs, or even trees, and treat the ontology match- ing problem as a graph homomorphism problem. An example of these techniques can be found in the paper from Joslyn, Paulson, and White (2009). This category also considers those techniques that exploit as external resources, repositories where ontologies and their fragments, together with certain similarity measures are stored. A proposal in this line can be found in the paper from Aleksovski, Ten Kate, and Van Harmelen (2008). � Taxonomy-based: these techniques can be seen as a particular case of the previous ones which only consider the specialization relation. Examples were these techniques were applied could not be found in the articles belonging to basic matching tech- niques. However, an example of its application can be found in the work by Warin and Volk (2004). � Instance-based: these techniques exploit the extension of the classes in the ontologies, i.e., the individuals, with the intuition that if the individuals are alike, then the classes they belong to should also be similar. These techniques can use set-theoretic principles but also more elaborated statistical techniques. In this category we can classify the work by Loia, Fenza, De Maio and Salerno presented in Loia, Fenza, De Maio, and Salerno (2013). � Model-based: these techniques exploit the semantic interpreta- tion linked to the input ontologies. An example of this category are the description logics reasoning techniques, which are applied in the work published by Sánchez-Ruiz, Ontañón, González- Calero, and Plaza (2011). As mentioned at the beginning of this section, the techniques we have just presented, are the building blocks upon which (ii) complex matching techniques are built. This category includes meth- odologies that propose different ways to tackle the matching prob- lem but from a higher point of view. Examples of this can be found in Cohen et al. (2003) where Cohen, Ravikumar and Fienberg pro- pose the partitioning of the ontologies before starting the matching process, in Dargham and Fares (2008) where the Dargham and Fares present their methodology which takes as basis some well- known algorithms or in Acampora, Loia, Salerno, and Vitiello (2012) where Acampora, Loia, Salerno and Vitiello propose the use of a memetic algorithm to perform the alignment between two ontologies. 956 L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 Other matching techniques also included in this category are those that can not be considered basic or that take advantage of other aspects of building a matching solution that are not directly related to computing the alignments. There are for instance articles devoted to presenting (i) different ways of aggregating the results from differ- ent similarity measures (Lai et al., 2010; Lin & Sandkuhl, 2007; Tian & Guo, 2010), others that (ii) combine the results of different matchers (Liu et al., 2012). There are others that also include techniques inher- ited from other fields such as (iii) learning methods (Rubiolo, Caliusco, Stegmayer, Coronel, & Fabrizi, 2012; Todorov, Geibel, & Kühnberger, 2010), probabilistic methods (Calı̀, Lukasiewicz, Predoiu, & Stuckenschmidt, 2008; Spiliopoulos, Vouros, & Karkaletsis, 2010) or those that consider the user’s involvement (Lin & Sandkuhl, 2008b) in the matching process. 
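As a concrete illustration of the element-level, string-based techniques listed above, and of the simple weighted aggregation of measures that many of the complex techniques build on, the sketch below combines a normalised edit-distance similarity with a token-overlap (Jaccard) similarity over entity labels. It is a toy example under stated assumptions, not an implementation of any of the cited measures or systems; the weights, the 0.8 threshold and the function names are arbitrary choices made here.

```python
# Illustrative sketch of element-level string-based matching: a normalised
# Levenshtein similarity and a Jaccard token overlap, combined with fixed
# weights and a threshold. Weights and threshold are arbitrary assumptions.
def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def levenshtein_sim(a: str, b: str) -> float:
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def jaccard_sim(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def label_similarity(a: str, b: str, w_edit: float = 0.6, w_token: float = 0.4) -> float:
    """Weighted aggregation of the two basic measures into one score in [0, 1]."""
    return w_edit * levenshtein_sim(a.lower(), b.lower()) + w_token * jaccard_sim(a, b)


def match_labels(labels1, labels2, threshold: float = 0.8):
    """Return candidate correspondences between two label lists above the threshold."""
    return [(l1, l2, s) for l1 in labels1 for l2 in labels2
            if (s := label_similarity(l1, l2)) >= threshold]
```

Real matchers typically combine many more basic matchers and tune the weights, thresholds and filtering steps, which is precisely the design space explored by the systems reviewed in the next category.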
Despite the various types of matching techniques that have been presented in this section, both basic and complex ones, we are certain that many others will continue to arise. Some of these may offer a revisited version of previously exiting techniques, but more likely new types of techniques will be defined specially for those categories not fully explored yet. The main challenges these techniques face is their efficiency (Kotis & Lanzenberger, 2008; Shvaiko & Euzenat, 2013). Most of them describe approaches to matching ontologies which at a theoretical level may obtain a positive outcome. However, depending on the type of application where these techniques will be implemented, not all techniques will be equally valid. For instance, if considering a dynamic appli- cation, the amount of resources employed in terms of memory and time consumption should be kept to a minimum. 4.3. ‘Matching Systems’ category This category contains the articles focused on detailing new matching algorithms and systems as well as enhancements, mod- ifications or different approaches to previously defined ones. Some of these systems are well-known in the research community as they have participated for several years in the Ontology Alignment Evaluation Initiative. Although the purpose of this review is not to compile an exhaustive list of all the existing systems, it is worth mentioning some of the most relevant ones, such as: � AgreementMaker (Cruz, Antonelli, & Stroe, 2009a) is a schema and ontology matching system. It allows a high customization of the matching process, including several matching methods to be run on inputs with different levels of granularity, also allowing to define the amount of user participation and the for- mats that the input ontologies as well as the results of the align- ment may be stored in. This system has a high level of maturity because from 2007 there is a continuous flow of publications describing its foundations and enhancements: Sunna and Cruz (2007), Cruz, Antonelli, Stroe, Keles, and Maduko (2008), Cruz, Antonelli, and Stroe (2009b, 2009c), Cruz et al. (2010), Pesquita, Stroe, Cruz, and Couto (2010), Cross et al. (2011), Cruz, Stroe, Pesquita, Couto, and Cross (2011) and Cruz et al. (2011). � Anchor-Flood (Hanif & Aono, 2009) is an algorithm for ontology matching that starts out of an initial anchor, i.e, a pair of alike concepts between the ontologies. From this anchor using neigh- borhood concepts, new anchors are identified and the algorithm continues. Anchor-Flood was developed from 2008 to 2009, having in this period a total of 3 reported papers, (Hanif & Aono, 2008a, 2008b, 2009). This system has been tested in two different campaigns of OAEI. � AOAS (Zhang & Bodenreider, 2007) is an ontology matching sys- tem specifically devoted to aligning anatomical ontologies. It takes as input OWL ontologies and identifies 1:1, 1:n and n:m alignments. It is a hybrid approach that uses both direct tech- niques, such as lexical and structural ones, and indirect tech- niques, that consist in the identification of correspondences by means of a reference ontology. AOAS is a really specific sys- tem which, in the period from 2003 to 2013, only accounts for one publication, (Zhang & Bodenreider, 2007). � AROMA (David, 2011) finds equivalence and subsumption rela- tions between classes and properties of two different taxono- mies. It is defined as an hybrid, extensional and asymmetric approach that lays its foundations on the association rule para- digm and statistical measures. 
AROMA’s developers have been constantly working on it since 2006. There has been at least one paper published every year detailing this system and its evolution for the past 7 years. (David, Guillet, & Briand, 2006; David, 2007, 2008b, 2008a, 2011). This system has taken part in several editions of the OAEI. � ASCO (Le, Dieng-Kuntz, & Gandon, 2004) exploits all the infor- mation available from the entities in the ontologies, names, labels, descriptions, information about the structure, etc, to compute two types of similarities, a linguistic and a structural one, which are lately combined. The development of ASCO started in 2004, however it was discontinued until 2007 when it was reprised. In this 3 year-span, the publications describing it are: Le et al. (2004) and Thanh Le and Dieng-Kuntz (2007). � ASE (Kotis, Katasonov, & Leino, 2012a) is an automated ontology alignment tool based on AUTOMSv2 that computes equivalence and subsumption relations between two input ontologies. This system was released in 2012 and tested that year’s edition of the OAEI. So far, we have not found any other publication apart from Kotis et al. (2012a), describing it. However it is possible that other enhancements may be developed as this system was already based on a previous system by the same authors for which there was also a significant interval between the first release and the subsequent updates. � ASMOV (Behkamal, Naghibzadeh, & Moghadam, 2010) is an algorithm that derives an alignment from the lexical and struc- tural information of two input ontologies by computing a sim- ilarity measure between them. This algorithm also includes a step of semantic verification where the alignments are checked so that the final output does not contain semantic inconsisten- cies. The greatest efforts in maintaining ASMOV were concen- trated in 2008 when authors published Jean-Mary, Shironoshita, and Kabuka (2008) and Jean-Mary and Kabuka (2008), however, this system was constantly maintained from 2007 to 2010. In this period the following articles describe its features and performance: Jean-Mary and Kabuka (2007), Jean-Mary, Shironoshita, and Kabuka (2009, 2010) and Behkamal et al. (2010). Some of these articles detail the results obtained by ASMOV in the different editions it took part in. � AUTOMSv2 (Kotis, Katasonov, & Leino, 2012b) is an automated ontology matching tool that was build as an evolution of the previous tool AUTOMS (Kotis, Valarakos, & Vouros, 2006) which was enhanced with more alignment methods and synthesizing approaches as well as with multilingual support. Articles describing AUTOMSv2 were published in 2008 and 2012. This system shows one of the highest intervals of inactivity of the studied ones. Due to the short time interval from the last article, new articles describing AUTOMSv2 participation in OAEI or new versions could be published. � Coincidence-Based Weighting (Qazvinian, Abolhassani, (Hossein), & Hariri, 2008) uses an evolutionary approach for ontology matching. It takes as input OWL ontologies, which are pro- cessed as graphs, and a similarity matrix between the concepts of the ontologies, obtained from a string distance measure. It includes a genetic algorithm to iteratively refine the mappings. L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 957 This system was developed for 2 years, since 2007 to 2008, as reflect the articles describing it, (Haeri, Abolhassani, Qazvinian, & Hariri, 2007; Qazvinian et al., 2008). 
� CIDER (Gracia, Bernad, & Mena, 2011) is an ontology matching system that extracts the ontological context of the compared terms by using synonyms, hyponyms, domains, etc., and then enriches it by means of some lightweight inference rules. This system was developed using the Alignment API (David, Euzenat, Scharffe, & dos Santos, 2011). This system has been developed and maintained since 2008 to 2013. From the first paper describing it (Gracia & Mena, 2008) until the next one (Gracia et al., 2011), 3 years passed. Then again, publications regarding this system, were interrupted until last year with (Gracia & Asooja, 2013). This kind of systems developed over several years are usually the product of a deep effort of the developers to correct the issues detected in previous versions, and account for the time-span existing between the publication of the different articles. � CODI (Huber, Sztyler, Nößner, & Meilicke, 2011) uses some lex- ical similarity measures combined with schema information to output the alignments between concepts, properties and indi- viduals. It is based on the syntax and semantics of Markov logic, and turns every matching problem into an optimization prob- lem. According to the data in our review, CODI was developed and maintained from 2010 to 2011. In these years it took part in the corresponding OAEI contests (Huber et al., 2011; Noessner & Niepert, 2010). � COMA (Maßmann, Raunich, Aumüller, Arnold, & Rahm, 2011), COMA++ (Nasir & Noor, 2010) are ontology matching systems that have been developed over the last decade. They are highly evolved and customizable systems that support the combina- tion of different matching algorithms. These are highly evolved systems since they have been maintained and updated until 2011. In this period 5 articles were found, devoted to describing these systems, (Aumüller, Do, Massmann, & Rahm, 2005; Aumueller, Do, Massmann, & Rahm, 2005; Do & Rahm, 2002; Do, 2006; Engmann & Maßmann, 2007; Massmann, Engmann, & Rahm, 2006; Maßmann et al., 2011; Nasir & Noor, 2010). � DSSim (Nagy, Vargas-Vera, & Stolarski, 2009) is an ontology matching system which combines the similarity values pro- vided by both syntactic and semantic similarity algorithms to then refine the correctness of the outputs by means of a belief function. DSSim was developed from 2006 to 2009. In these 4 years, there was at least an article published every year detail- ing the system, its continuous enhancements, and its results in OAEI (Nagy, Vargas-Vera, & Motta, 2006, 2007; Nagy, Vargas- Vera, Stolarski, & Motta, 2008; Nagy et al., 2009). � Eff2Match (Chua & Kim, 2010) is an ontology matching tool that follows a process of anchor generation and expansion. This sys- tem is particularly focused on achieving an efficient perfor- mance and therefore it includes several techniques to reduce the amount of possible candidates to avoid unnecessary com- parisons. Considering the data obtained from our literature review, Eff2Match only has a related publication that was released in 2010 that presents its results for OAEI’10. � FalconAO (Hu & Qu, 2008) obtains the alignment between the input ontologies by internally running two algorithms, a lin- guistic one (LMO) (Zhang, Hu, & Qu, 2011) as a first step, to then use the alignments provided as an external output for the graph matching algorithm (GMO) (Hu, Jian, Qu, & Wang, 2005) subse- quently run. 
This system was continuously developed from 2005 to 2008 (Hu, Cheng, Zheng, Zhong, & Qu, 2006; Hu et al., 2007; Hu & Qu, 2008; Jian, Hu, Cheng, & Qu, 2005). In 2010, a new publication was released (Hu, Chen, Cheng, & Qu, 2010) with the results of this system in the OAEI. � FBEM (Stoermer & Rassadko, 2009a) is a matching system that is mainly focused on instance matching, this approach considers not only the similarity of entity features as keys and values, but also the fact that some features are more relevant for iden- tifying an entity than others. According to the data obtained in this review, this system is described in just 2 publications released in 2009 and 2010 respectively, (Stoermer & Rassadko, 2009b; Stoermer, Rassadko, & Vaidya, 2010). � FuzzyAlign (Fernández, Velasco, Marsa-Maestre, & Lopez- Carmona, 2012) is a fuzzy, rule-based ontology matching sys- tem that outputs the alignments between two input ontologies by exploiting the lexical and semantical information of the enti- ties’ names and the inner structure of the ontologies. This sys- tem was developed from 2009 to 2012, however, it was not a constant maintenance, since in this period only 2 articles account for its updates and enhancements, (Fernández, Velasco, & López-Carmona, 2009; Fernández et al., 2012). � GeRoMeSuite (Quix, Gal, Sagi, & Kensche, 2010) allows the matching of models represented in different languages, for instance XML Schemas with OWL ontologies. Besides it is a cus- tomizable system that includes several matching algorithms that may be combined according to different ways of aggrega- tion and filtering. GeRoMeSuite is a mature system that has been developed from 2007 to 2010 (Kensche, Quix, Li, & Li, 2007; Quix, Geisler, Kensche, & Li, 2008, 2009; Quix et al., 2010). How- ever, from 2010 to 2013 no new improvements were publicly released. It took part in OAEI editions from 2008 to 2010. � GLUE (Doan, Madhavan, Domingos, & Halevy, 2004) is a semi- automatic ontology matching system that uses a set of com- bined machine learning techniques to output the alignment between the taxonomies of two input ontologies. This system only has a publication released in 2004, which made us believe that its development has been discontinued. � GOMMA (Hartung, Kolb, Groß, & Rahm, 2013) uses several matchers to evaluate both the lexical and structural similarity of the entities from the input ontologies. It uses some enhanced comparison techniques to compute parallel string matching on graphical processing units. GOMMA was first released in 2011, and it has continued to be maintained up to date. The maturity level of this system is significative, as at least one publication regarding its results has been released every year since it was first developed (Groß, Hartung, Kirsten, & Rahm, 2012; Hartung et al., 2013; Kirsten, Gross, Hartung, & Rahm, 2011). This system has been tested so far in a edition of the OAEI, (Groß et al., 2012). � HCONE (Kotis & Vouros, 2004) is an approach to ontology merg- ing that uses WordNet (WordNet, 2013), a lexical database, as an external resource to obtain possible interpretations of the concepts being matched. HCONE was an stable system main- tained from 2004 to 2006 (Kotis & Vouros, 2004; Kotis, Vouros, & Stergiou, 2006; Vouros & Kotis, 2005). After 2006 we could not retrieve any publication were it was used or modified. 
� Hertuda (Hertling, 2012) is a very simple string matcher that separately handles the alignment of classes and properties, and select among the possible ones those reaching certain pre-established thresholds. Hertuda is a relatively young sys- tem which was released in 2012. Within the limits of this liter- ature review only two articles were found that described its overall behavior and results in OAEI: Hertling (2012) and Grau et al. (2013). � HotMatch (Dang et al., 2012a) combines several matching strat- egies, that exploit both the lexical and structural information, to obtain the alignments between the ontologies. Some filters are included to remove the false-positive mappings from the final 958 L. Otero-Cerdeira et al. / Expert Systems with Applications 42 (2015) 949–971 output. As happens with Hertuda, HotMatch is also a young sys- tem developed from 2012 up to date (Dang et al., 2012b; Grau et al., 2013), hence, new versions and enhancements are to be expected. This system, has also taken part in OAEI’12. � HMatch (Castano, Ferrara, & Messa, 2006) is an ontology match- ing system that linearly combines a linguistic affinity value and a contextual affinity one to compute the similarity of concept names and contexts. Internally it codifies the ontologies as graphs. HMatch was developed from 2006 to 2009 (Castano et al., 2006; Castano, Ferrara, Lorusso, & Montanelli, 2007; Castano et al., 2008). According to the articles retrieved for this review, HMatch only took part in the edition of 2006 of the OAEI. � IF-MAP (Kalfoglou & Schorlemmer, 2003a) is a matching system that lies its foundations on the mathematical theory of semantic information flow. To match two input ontologies it uses a refer- ence ontology used as common reference. In the time-span con- sidered in this literature review, only one article was found devoted to this system. Such publication was released in 2003, it is possible that there are other articles that had pub- lished before that, which would fall outside the scope of this review, anyhow, considering the time elapsed, we are prone to believing that this system has been discontinued. � iMatch (Albagli, Ben-Eliyahu-Zohary, & Shimony, 2012) is a probabilistic ontology matching system based on Markov net- works. It takes OWL ontologies as input and outputs 1:1 align- ments. The matching is tackled as a graph matching problem where the initial similarity between the nodes is provided, for instance, by the users. iMatch was first released in 2009 (Albagli, Ben-Eliyahu-Zohary, & Shimony, 2009) and then revis- ited in 2012 (Albagli et al., 2012). � KOSImap (Reul & Pan, 2010) uses description logic reasoning to firstly obtain implicit background knowledge for every entity in the ontologies, then build a similarity matrix out of the three types of similarities computed for the identified pairs of enti- ties, and finally dismiss those mappings considered false-posi- tives. KOSImap was only maintained for 2 years, since 2009, when it took part in the OAEI, (Reul & Pan, 2009) to 2010 (Reul & Pan, 2010). � LDOA (Kachroudi, Moussa, Zghal, & Yahia, 2011) combines some well-known terminological and structural similarity measures, but it also exploits an external resource by using Linked Data which provides additional information to the entities being matched. In the interval considered in this literature review, we have only retrieved one article devoted to describing this system, in 2011, describing its behavior in the OAEI. 
As it is rel- atively contemporary, new enhancements and publications are still to be expected. � Lily (Wang, 2011) combines different matching strategies to adapt itself to the problem being tackled at each moment, gen- eric ontology matching (GOM) for the normal-sized ontologies and large scale ontology matching (LOM) for more demanding matching tasks. It also includes a mapping debugging function used to improve the alignment results and to dismiss the faulty ones. Lily was first released in 2007 (Wang & Xu, 2007) and con- tinued to take part in the OAEI contests until 2011 (Wang & Xu, 2008; Wang & Xu, 2009; Wang, 2011). We consider that the reliability of this system has been enough proven as show the different results obtained in the OAEI. � LogMap (Jiménez-Ruiz & Cuenca Grau, 2011) is an ontology matching iterative process that starting with a set on anchor mappings obtained from lexical comparison, alternatively com- putes mapping repair and mapping discovery steps. To discover the new anchors structural information is also exploited. This system has a high level of maturity as in the last 3 years there have been at least 6 publication describing its performance and results in the OAEI contests (Jiménez-Ruiz & Cuenca Grau, 2011; Jiménez-Ruiz, Grau, & Zhou, 2011; Jiménez-Ruiz, Morant, & Grau, 2011; Jiménez-Ruiz, Grau, & Horrocks, 2012; Jiménez-Ruiz, Meilicke, Grau, & Horrocks, 2013). In this period LogMap’s developers have already implemented a light version of LogMap, LogMaplt. We are prone to believing that developers of this system will continue to improve it and include further functionalities and versions. � MaasMatch (Schadd & Roos, 2012a) computes a similarity cube between the concepts in the ontologies which is the result of aggregating a syntactic, a structural, a lexical and a virtual docu- ment similarity. An extraction algorithm is run to dismiss the faulty alignments from the final output. This system has been constantly updated since it was first released in 2011, its reliabil- ity and usefulness have been tested over the years by its partici- pation in the OAEI (Schadd & Roos, 2011, 2012a, 2012b, 2013). � MapPSO (Bock, Dänschel, & Stumpp, 2011) applies the particle swam optimization technique (PSO) to compute the alignment between two input ontologies. The MapEVO (Bock et al., 2011) system, developed by the same authors, relies on the use of evo- lutionary programming, another variant of population-based optimization algorithms. MapPSO has been described in at least 5 different publications between 2008 and 2011. Among those systems revised, MapPSO is one of those with the higher amount of publications (Bock & Hettenhausen, 2008, 2010; Bock, Liu, & Hettenhausen, 2009; Bock, 2010; Bock, Lenk, & Dänschel, 2010; Bock et al., 2011). These publications include the partici- pation of MapPSO in editions of OAEI from 2008 to 2011. � MapSSS (Cheatham, 2011) computes subsequently three types of metrics, syntactic, semantic and structural, and any positive result from any of them is included as a positive solution, and then it explores the neighborhood of the newly matched pair for new possible matches. Instead of defining a filtering system which would dismiss possible pair after being selected, this sys- tem works the other way round only selecting those nodes that match to only another node, and therefore not risking the pos- sibility of choosing a wrong solution. This system was released in 2011, and, ever since it took part in the annual contest of the OAEI. 
This system has therefore been significantly tested and evaluated (Cheatham, 2011; Cheatham & Hitzler, 2013).
• MEDLEY (Hassen, 2012) is an ontology alignment system that uses lexical and structural methods to compute the alignment between classes, properties and instances. It also uses an external dictionary to tackle the problem of having concepts expressed in different natural languages. This system was described in 2012 in Hassen (2012), where the results of its participation in that year's OAEI are summarized. Other publications, as well as its participation in new OAEI editions, are to be expected because this system is quite recent.
• MoTo (Fanizzi, d'Amato, & Esposito, 2011) takes OWL ontologies as input and obtains equivalence relations between concepts. It initially uses several matchers whose results are combined by means of a metalearner. The alignments obtained are sorted, discarding the invalid ones. The remaining ones are divided into certain and uncertain. For the uncertain ones a validation process is started aiming at recovering them. In this literature review we have found publications describing this system both in 2010 (Esposito, Fanizzi, & d'Amato, 2010) and 2011 (Fanizzi et al., 2011), but so far, no other publication has been released.
• OACAS (Zghal, Kachroudi, Yahia, & Nguifo, 2011) is an algorithm to align OWL-DL ontologies. It firstly transforms the ontologies into graphs and then combines and aggregates different similarity measures. At each moment the most suitable similarity measure is applied according to the type of the entities being matched. It also exploits the neighboring relations of the entities. The first article describing OACAS was published in 2009 (Zghal, Kachroudi, Yahia, & Nguifo, 2009); then, until 2011, no new articles were retrieved (Zghal et al., 2011). Therefore, considering the two-year span between both, it is possible that new articles will be published within this or the next year. This system was publicly tested in the OAEI'11.
• OLA (Kengue, Euzenat, & Valtchev, 2007) performs the alignment between two graph-represented ontologies and offers some extended features to manipulate the output alignment. This system was developed between 2004 (Euzenat & Valtchev, 2004) and 2005 (Euzenat, Guégan, & Valtchev, 2005); later, it was revisited in 2007 (Kengue et al., 2007). In this interval, it took part in the OAEI of 2005 and 2007. However, in the 6 years since then no additional articles regarding OLA were retrieved by means of the queries run for this literature review.
• oMap (Straccia & Troncy, 2005c) automatically aligns two OWL ontologies by using the predictions of different classifiers, such as terminological ones, machine learning-based ones, or some based on the structure and semantics of the OWL axioms. This system was deeply revised in 2005 (Straccia & Troncy, 2005a; Straccia & Troncy, 2005c; Straccia & Troncy, 2005b), when it was released, and then in 2006 (Straccia & Troncy, 2006). In 2005 it took part in the OAEI; however, since then, no new articles describing oMap have been published.
• OMEN (Mitra, Noy, & Jaiswal, 2005) uses a Bayesian Network to improve the results of the alignment process by deriving undetected correspondences and rejecting existing false positives. By using probabilistic methods it enhances existing ontology mappings by deriving missed matches and invalidating existing false matches.
This system was described in just one paper in 2005 (Mitra et al., 2005).
• OntoDNA (Kiu & Lee, 2007) is an automated and scalable system that uses hybrid unsupervised clustering techniques, which include Formal Concept Analysis (FCA) (Formica, 2006), Self-Organizing Map (SOM) and K-Means clustering, in addition to the Levenshtein edit distance as lexical measurement. OntoDNA was first described in 2006 (Kiu & Lee, 2006) and then revisited in 2007 to present its results in the OAEI (Kiu & Lee, 2007).
• ontoMATCH (Lu, 2010) exploits both semantic and structural information to compute the alignments. It is internally divided into five components: a preprocessor, three individual matchers (Element Matcher, Relationship Matcher and Property Matcher), a combiner and a selector. This system was described in one paper in 2010 (Lu, 2010).
• OPTIMA (Thayasivam et al., 2012) uses lexical information from the concepts to generate a seed alignment. Then it iteratively searches the space of candidate alignments following the technique of expectation-maximization until convergence. This system was presented in 2011, taking part in the OAEI (Thayasivam & Doshi, 2011). It also took part in the following year's contest (Thayasivam et al., 2012).
• OWL-CM (Yaghlane & Laamari, 2007) uses different matchers to compute the alignments, whose results are then combined. It also includes some belief functions in the alignment process to improve the computed results. In this literature review we only retrieved one article, from 2007 (Yaghlane & Laamari, 2007).
• PRIOR+ (Mao & Peng, 2007) is an ontology matching system that lays its foundations on propagation theory, information retrieval and artificial intelligence. It profits from the linguistic and structural information of the ontologies to match and measures the profile similarity of different elements in a vector space model. This system took part in the 2006 (Mao & Peng, 2006) and 2007 (Mao & Peng, 2007) editions of the OAEI, where its results and performance are described.
• QOM (Ehrig & Staab, 2004) is a variation of the NOM algorithm devoted to improving the efficiency of the system. Some basic matchers are used whose results are refined by means of a sigmoid function, to be later aggregated and sifted to output the final alignment. This system was developed in the first years of the interval considered in this literature review, in 2 papers (Ehrig & Staab, 2004; Ehrig & Sure, 2004).
• RiMOM (Wang et al., 2010) uses three different matching strategies, name-based, metadata-based and instance-based, whose results are then filtered and combined. A similarity propagation procedure is iteratively run until no more candidate mappings are discovered and the system converges. Within the limits of this review, RiMOM is the system that accounts for the highest number of individual publications describing it, 9. These articles span from 2004 to 2013. In this period, there have been interruptions in the flow of publications; however, we believe they account for the development of a new version of the system (Li, Li, Zhang, & Tang, 2006; Li, Zhong, Li, & Tang, 2007; Li, Tang, Li, & Luo, 2009; Tang, Liang, Li, & Wang, 2004; Tang et al., 2006; Wang et al., 2010; Zhang, Zhong, Li, & Tang, 2008; Zhang, Zhong, Shi, Li, & Tang, 2009). RiMOM has taken part in several editions of the OAEI, therefore it has been tested and evaluated over several years.
• SAMBO (Lambrix, Tan, & Liu, 2008) contains several matchers that exploit different features of the ontologies, and it is the user who decides to use one or several of them. If several are chosen, the combination of their results is computed by means of a weighted sum. Results are then filtered according to some thresholds and presented to the user as suggested alignments to be confirmed. This system was maintained from 2005 to 2008. In this period, at least 5 different articles were published describing its performance and its results in the OAEI (Lambrix & Tan, 2005, 2006; Lambrix et al., 2008; Tan, Jakoniene, Lambrix, Aberg, & Shahmehri, 2006; Tan & Lambrix, 2007).
• SEMA (Spiliopoulos, Valarakos, Vouros, & Karkaletsis, 2007) combines six different matching methods whose running sequence is pre-established and where each method takes as input the results of the previous methods. This procedure is iteratively applied until no new mappings are discovered. The matchers used are a lexical matcher, a latent features matcher, a vector space model matcher, an instance based matcher, a structural based matcher and a property based matcher. SEMA was described in 2 different articles in 2007 (Spiliopoulos et al., 2007), one of them presenting its results in the OAEI contest (Spiliopoulos et al., 2007).
• SERIMI (Araújo, de Vries, & Schwabe, 2011c) is a matching system developed for instance matching, which is mainly divided into two phases, a selection one and a disambiguation one. During these phases information retrieval strategies and string matching techniques are applied. This system was described in 4 articles between 2011 and 2012 (Araújo, Hidders, Schwabe, & de Vries, 2011a, 2011b; Araújo, Tran, DeVries, Hidders, & Schwabe, 2012). In 2011 it also took part in the OAEI (Araújo et al., 2011c).
• ServOMap (Ba & Diallo, 2013) is a large scale ontology matching system that also supports multilingual terminologies. It uses an Ontology Server (ServO) and takes advantage of Information Retrieval techniques to compute the similarity between the entities in the input ontologies. This system was recently developed, in 2012 (Ba & Diallo, 2012a, 2013; Diallo & Ba, 2012). However, it has already taken part in the OAEI (Ba & Diallo, 2012b). This points out that the developers are actively working on the maintenance of this system, therefore new improvements are to be expected, in addition to those already released in 2013 (Diallo & Kammoun, 2013; Kammoun & Diallo, 2013).
• SIGMa (Lacoste-Julien et al., 2013) is a knowledge base iterative propagation alignment algorithm that uses both the structural information from the relationship graph as well as some similarity measures between entity properties. SIGMa is also one of the systems that can be considered young, since the first publication where it is described is from 2012 (Lacoste-Julien et al., 2012). This system is being actively maintained, and a new publication was released in 2013 (Lacoste-Julien et al., 2013).
• S-Match (Giunchiglia, Autayeu, & Pane, 2012) is an open source semantic matching framework that transforms input tree-like structures such as catalogs, conceptual models, etc., into lightweight ontologies to then determine the semantic correspondences between them. It contains the implementation of several semantic matching algorithms, each one suitable for different purposes.
The amount of publications describing S-Match is one of the highest among those considered for this review, 7. This system has been maintained since 2003; however, it was not a steady process, as the last publications have intervals between them of at least 2 years (Giunchiglia, Shvaiko, & Yatskevich, 2004, 2005a, 2005b; Giunchiglia, Yatskevich, & Shvaiko, 2007; Giunchiglia et al., 2012; Shvaiko, Giunchiglia, & Yatskevich, 2009).
• SOBOM (Xu, Wang, Cheng, & Zang, 2010a) is an ontology matching algorithm that uses as starting point a set of anchors provided by a lexical matcher. It also uses the Semantic Inductive Similarity Flooding algorithm to compute the similarity between the concepts of the sub-ontologies obtained from the anchors. SOBOM has been described in publications ranging from 2008 to 2012 (Xu, Tao, Zang, & Wang, 2008; Xu et al., 2010a; Xu, Wang, Cheng, & Zang, 2010b; Xu, Wang, & Liu, 2012), including those about its participation in several editions of the OAEI. In the two-year period from 2010 to 2012, no publications were retrieved regarding this system. As the last article dates from 2012, new updates and enhancements are still to be expected.
• SODA (Zghal, Yahia, Nguifo, & Slimani, 2007) uses linguistic and structural similarity measures to compute the alignment between two OWL-DL ontologies, which are firstly transformed into graphs. These graphs undergo two successive phases, first a linguistic similarity comparison and then a structural similarity comparison, after which the semantic similarity of the graphs is obtained. This algorithm outputs the correspondences between the entities together with their similarity measure values. So far, only one article describing SODA was found (Zghal et al., 2007), published in 2007.
• TaxoMap (Hamdi, Safar, Niraula, & Reynaud, 2010) provides an alignment for two OWL ontologies by exploiting the information in the labels of the concepts and the subsumption links that connect those concepts in the hierarchy. TaxoMap was maintained from its release in 2007 up to 2010. In this interval, 6 articles were published describing the system and its participation in the OAEI (Hamdi, Zargayouna, Safar, & Reynaud, 2008; Hamdi, Safar, Niraula, & Reynaud, 2009; Hamdi et al., 2010; Safar & Reynaud, 2009; Safar, 2007; Zargayouna, Safar, & Reynaud, 2007).
• TOAST (Jachnik, Szwabe, Misiorek, & Walkowiak, 2012) is an ontology matching system based on statistical relational learning. This system needs a training set from which to learn the semantic equivalence relation on the basis of partial matches. TOAST is another system considered young, as all the publications describing it were found for 2012 (Szwabe, Misiorek, & Walkowiak, 2012). However, this system has already taken part in the OAEI (Jachnik et al., 2012). If the developers continue with this trend, new articles on this system are to be expected.
• WeSeE-Match (Paulheim, 2012) exploits the idea of using information available on the web to match the ontologies, which would supposedly be the procedure followed by a human trying to manually match some terms without being an expert in the domain of the matched terms. Therefore this approach uses a web search engine to retrieve documents relevant to the concepts to match and compares the results obtained; the more similar the search results, the higher the concepts' similarity value.
This system accounts for few publications; however, these report the results obtained by the system in the OAEI of 2012 and 2013 (Paulheim, 2012; Paulheim & Hertling, 2013).
• WikiMatch (Hertling & Paulheim, 2012a) exploits the use of Wikipedia's search engine to obtain documents related to the concepts being matched. Since there is no duplicity in the titles of the articles in Wikipedia for the same language, the algorithm compares the sets of retrieved titles to obtain the similarity between the two concepts. As happens with WeSeE-Match, WikiMatch has only been described so far in two articles, although these are quite recent, one of them presenting its overall behavior (Hertling & Paulheim, 2012a) and the other reporting its results for the OAEI'12 (Hertling & Paulheim, 2012b).
• X-SOM (Curino, Orsi, & Tanca, 2007b) combines the similarity maps output by different matching algorithms by means of a neural network and uses logical reasoning and heuristics to enhance the quality of the mappings. This system was developed between 2007 and 2010, adding up a total of 4 articles (Curino, Orsi, & Tanca, 2007a; Curino et al., 2007b; Merlin, Sorjamaa, Maillet, & Lendasse, 2009; Merlin, Sorjamaa, Maillet, & Lendasse, 2010), including those that describe its participation in OAEI'07.
• YAM++ (Ngo & Bellahsene, 2012a) uses machine learning techniques to discover the mappings between entities in two ontologies, even if these are not expressed in the same natural language. It uses matchers at element and structural level. At element level the similarity is computed by some terminological metrics which can be combined by machine learning based combination methods. At structural level the ontologies are transformed into graphs and, taking the results of the terminological metrics as the starting points, a similarity flooding propagation algorithm is run. This system has a high level of maturity as it has been continuously evolving from 2009 up to 2013 (Duchateau, Coletta, Bellahsene, & Miller, 2009b, 2009a; Ngo, Bellahsene, & Coletta, 2011; Ngo & Bellahsene, 2012a, 2012b, 2013). In this period, at least 6 articles describing its behavior and overall results in the different editions of the OAEI have been published. Hence its validity and maturity are well proven.
Further considering the evolution and maturity degree of these systems, Table 5 shows the amount of articles published regarding each one of the presented systems. These values show a varying level of development across the different systems, as the amount of publications ranges from 1 to 9. In Fig. 8 these results are disaggregated by year. As it suggests, systems receive on average 2.6 years of work, which are usually consecutive. However, as happens, for instance, with ASCO, the work was interrupted and resumed 3 years later. In other systems, such as RiMOM, this discontinuation in the amount of publications accounts for the development of a new version.
Table 5. Amount of articles devoted to each system (system: number of articles).
AgreementMaker: 9; Anchor-Flood: 3; AOAS: 1; AROMA: 8; ASCO: 2; ASE: 1; ASMOV: 5; AUTOMSv2: 2; CBW: 2; CIDER: 3; CODI: 2; COMA: 5; DSSim: 4; Eff2Match: 1; FalconAO: 5; FBEM: 2; FuzzyAlign: 2; GeRoMeSuite: 4; GLUE: 1; GOMMA: 3; HCONE: 3; Hertuda: 2; HotMatch: 2; Hmatch: 3; IF-MAP: 1; iMatch: 2; KOSImap: 2; LDOA: 1; Lily: 4; LogMap: 6; MaasMatch: 3; MapPSO: 5; MapSSS: 3; MEDLEY: 1; MoTo: 2; OACAS: 2; OLA: 3; oMap: 4; OMEN: 1; OntoDNA: 2; ontoMATCH: 1; OPTIMA: 2; OWL-CM: 1; PRIOR+: 2; QOM: 2; RiMOM: 9; SAMBO: 5; SEMA: 2; SERIMI: 4; ServOMap: 5; SIGMa: 2; S-Match: 7; SOBOM: 4; SODA: 1; TaxoMap: 6; TOAST: 2; WeSeE-Match: 2; WikiMatch: 2; X-SOM: 4; YAM++: 6.
Fig. 8. Evolution of the systems over the years.
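As referenced in the Hertuda entry above, many of the reviewed systems share the same element-level core: compare entity labels with one or more basic measures, combine the scores, and keep the pairs above a threshold, possibly enforcing a 1:1 alignment (as in the weighted-sum combination and threshold filtering described for SAMBO). The short Python sketch below only illustrates that common pattern under our own assumptions; the two measures, the weights, the 0.7 threshold and the example labels are hypothetical choices for the illustration and do not reproduce any particular system.

# Illustrative sketch only: a threshold-based, element-level lexical matcher.
# The weights and the 0.7 threshold are hypothetical, not parameters of any actual system.
from difflib import SequenceMatcher

def label_similarity(a: str, b: str) -> float:
    """Normalized string similarity between two entity labels (0..1)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the word tokens in the two labels (0..1)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def match_classes(classes1, classes2, w_string=0.7, w_tokens=0.3, threshold=0.7):
    """Combine two basic matchers by a weighted sum, keep pairs above a threshold,
    and greedily enforce a 1:1 alignment (each entity matched at most once)."""
    candidates = []
    for c1 in classes1:
        for c2 in classes2:
            score = w_string * label_similarity(c1, c2) + w_tokens * token_overlap(c1, c2)
            if score >= threshold:
                candidates.append((score, c1, c2))
    alignment, used1, used2 = [], set(), set()
    for score, c1, c2 in sorted(candidates, reverse=True):
        if c1 not in used1 and c2 not in used2:
            alignment.append((c1, c2, "=", round(score, 2)))
            used1.add(c1); used2.add(c2)
    return alignment

if __name__ == "__main__":
    onto1 = ["Conference Paper", "Author", "Program Committee"]
    onto2 = ["Paper", "Author", "Programme Committee", "Reviewer"]
    for corr in match_classes(onto1, onto2):
        print(corr)

Running the example aligns "Author" with "Author" and "Program Committee" with "Programme Committee", while "Conference Paper" and "Paper" stay below the threshold; recovering such borderline pairs is exactly what the structural and semantic matchers of the systems above are meant for.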
4.4. ‘Processing Frameworks’ category
This category covers two types of publications: articles devoted to researching the processing and exploiting of the ontology alignments (25.85%) and also those that describe some enhanced alignment frameworks and alignment formats (74.10%).
Among those articles devoted to processing and exploiting the alignments, the most common topics were related to (i) ontology merging (Kim, Kim, & Chung, 2011), i.e., integrating two ontologies from different sources into a single new one with the information from both of them, (ii) ontology transformation (Šváb-Zamazal, Svátek, & Iannone, 2010), which implies expressing an ontology with respect to another one, (iii) reasoning (Zhang, Lin, Huang, & Wu, 2011), which involves using the correspondences between the ontologies as rules for reasoning with them, and (iv) alignment argumentation (Trojahn, Quaresma, & Vieira, 2012), which is a way of explaining the alignments by providing arguments to support or dismiss them (a minimal sketch of how such correspondences can be exploited programmatically is given at the end of this subsection).
Despite the remarkable variety of topics in the articles from this sub-category, they represent a small percentage of the processing frameworks category, which in its majority was focused mainly on defining and developing alignment frameworks such as (Noy & Musen, 2003), where the process does not finish with the alignment but other actions are also available for the user, like manipulating the alignments or performing some of the procedures previously mentioned.
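To make the idea of exploiting an alignment concrete, the short sketch below shows the simplest programmatic use of a set of correspondences once a matcher has produced them: equivalence correspondences are turned into rewrite rules so that terms expressed against one ontology can be interpreted against the other. This is only an illustration under our own assumptions; the correspondence format and the URIs are invented for the example and do not correspond to any specific framework or alignment format.

# Minimal sketch: using equivalence correspondences as rewrite rules, a rough,
# programmatic view of the "ontology transformation" use mentioned above.
# The correspondences and URIs are invented for illustration only.

def build_rewrite_map(alignment):
    """Keep only equivalence correspondences and turn them into a source->target map."""
    return {src: tgt for src, tgt, rel in alignment if rel == "="}

def rewrite(annotations, rewrite_map):
    """Translate a list of entity URIs from the source vocabulary to the target one,
    leaving unmapped entities untouched."""
    return [rewrite_map.get(uri, uri) for uri in annotations]

if __name__ == "__main__":
    alignment = [("o1#Paper", "o2#Article", "="),
                 ("o1#Author", "o2#Writer", "="),
                 ("o1#Review", "o2#Assessment", "<")]   # non-equivalence, ignored here
    annotations = ["o1#Paper", "o1#Author", "o1#Keyword"]
    print(rewrite(annotations, build_rewrite_map(alignment)))
    # ['o2#Article', 'o2#Writer', 'o1#Keyword']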
4.5. ‘Practical Applications’ category
This category contains articles where ontology matching has been applied to a real-life problem. Within this category we found articles devoted to different subjects such as (i) semantic web and web services (Di Martino, 2009), where most publications presented ways of using ontology matching for service discovery or service composition, and (ii) P2P systems (Atencia, Euzenat, Pirrò, & Rousset, 2011), where ontology matching was used as a way to reduce the semantic heterogeneity between the queries the users pose to the system and the documents stored, therefore improving the accuracy of the returned results. Other fields worth mentioning are (iii) learning systems (Arch-int & Arch-int, 2013), which focus the use of ontology matching techniques either on narrowing down the distance between the user's and the stored documents or on easing knowledge sharing and reuse among users, and, last, (iv) multi-agent systems (Mascardi, Ancona, Bordini, & Ricci, 2011), where the use of ontology matching has always been related to guaranteeing that the different agents in a communication process are actually able to interact and achieve the common goals.
In spite of the growing tendency in the development of practical applications, in general lines it does not reach 30% of the matching systems implemented each year, as Fig. 9 reflects. This situation is quite remarkable, as it suggests that only a slight part of the matching systems developed have a practical application in real-life projects.
Fig. 9. Evolution of matching systems vs. practical applications.
To clarify this situation we have conducted a survey among ontology matching practitioners, where we asked them mainly about the future challenges of the field and its application in real-life projects. The description and results of this survey are further detailed in Section 6.
4.6. ‘Evaluation’ category
Out of the articles for this literature review, 38 are devoted to evaluating the performance of the matching systems. We can split these articles into two categories regarding their scope. There are 14 articles (36.84%) focused on studying the performance measures and on proposing different alternatives to evaluate the matching systems. We have included such articles in a category named elementary approaches. The remaining 24 articles (63.16%) delve into evaluation methods, where different existing platforms, systems or benchmarks to evaluate the matching systems are explored.
Regarding the (i) elementary approaches, several articles explore alternatives to the well-known information retrieval measures of precision and recall, which are used in this field to evaluate respectively the correctness and completeness of the matching systems. Examples of such publications are the works by Paulheim, Hertling and Ritze (Paulheim, Hertling, & Ritze, 2013), Niu, Wang, Wu, Qi and Yu (Niu, Wang, Wu, Qi, & Yu, 2011) or Euzenat (Euzenat, 2007). Additionally, in this category we have also included those papers that describe a new evaluation method or approach, such as the ones by Ferrara, Nikolov, Noessner and Scharffe (Ferrara, Nikolov, Noessner, & Scharffe, 2013) and by Tordai, van Ossenbruggen, Schreiber and Wielinga (Tordai, van Ossenbruggen, Schreiber, & Wielinga, 2011).
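Since all of the evaluation work above ultimately builds on the same precision and recall definitions, a minimal sketch may help fix them. Here an alignment is reduced to a set of (source entity, target entity, relation) triples and compared against a reference alignment; the correspondences are invented for the example, and the approaches cited above (for instance the semantic precision and recall of Euzenat, 2007) refine exactly this computation.

# Minimal sketch of the classical precision/recall evaluation of an alignment
# against a reference (gold standard) alignment. Correspondences are represented
# as (source, target, relation) triples; the example values are invented.

def evaluate(produced: set, reference: set):
    """Precision, recall and F1 of a produced alignment against a reference one."""
    correct = produced & reference                      # correspondences found and correct
    precision = len(correct) / len(produced) if produced else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    reference = {("o1#Paper", "o2#Article", "="),
                 ("o1#Author", "o2#Writer", "="),
                 ("o1#Review", "o2#Assessment", "=")}
    produced = {("o1#Paper", "o2#Article", "="),
                ("o1#Author", "o2#Writer", "="),
                ("o1#Topic", "o2#Subject", "=")}
    p, r, f = evaluate(produced, reference)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")   # 0.67, 0.67, 0.67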
These measures and approaches are usually included as part of the systems proposed to evaluate the matching systems, which belong to the (ii) evaluation methods category. In this category we have included those papers delving into the different existing platforms, systems and benchmarks used for evaluation, as well as the results of evaluating the most widespread systems against these benchmarks or in these platforms.
Regarding the data sets or benchmarks for evaluation, the most well-known and used are those developed for the Ontology Alignment Evaluation Initiative, which has been taken as a reference since 2004 (Euzenat, Stuckenschmidt, & Yatskevich, 2005) and which has been evolving over the years (Roşoiu, dos Santos, & Euzenat, 2011). This initiative contains various tracks with different data sets which evaluate several features of the tested systems. The benchmark test (Euzenat, Roşoiu, & Trojahn, 2013) is built around a seed ontology and many variations of it, and its purpose is to provide a stable and detailed picture of the contesting algorithms. These tests are organized into simple tests, where the objective is to compare the original ontology with itself, a random one and a generalization, systematic tests, where the original ontology is to be compared with others where some modifications have been included, such as removing names, translating into other languages, flattening or expanding the hierarchy, etc., and finally, real-life ontologies. The anatomy track evaluates the matching systems with the task of matching two large ontologies, the Adult Mouse Anatomy and the part of the NCI Thesaurus that describes the human anatomy. The conference track contains different ontologies from the conference organization domain. The interest of this track lies in the fact that these ontologies have been independently defined. The MultiFarm track aims at testing the ability of the systems to deal with multilingualism. The library track is a real-world task to match two thesauri, the STW and the TheSoz, both used in libraries for indexation and retrieval. The interactive track tests the results obtained by the systems when the user is somehow involved. The Large BioMed track consists of finding alignments between the Foundational Model of Anatomy (FMA, 2013), SNOMED Clinical Terms (SNOMED, 2013), and the National Cancer Institute Thesaurus (NCI, 2013). Finally, the Instance Matching track focuses its efforts on instance matching systems and techniques.
Besides the benchmarks, in this category we have also included some actual systems used to evaluate the matching systems, such as Wrigley, García-Castro, and Nixon (2012) and Tyl and Loufek (2009).
In this section we have presented our classification framework and further detailed the results of classifying the retrieved articles with it. In the following section we analyze the limitations of this review.
5. Limitations of the Literature Review
A literature review in the field of ontology matching is a very demanding task, firstly due to the amount of background knowledge necessary to properly sort the identified articles, and secondly due to the extent of the subject itself and the number of fields where the research on ontology matching is used.
The articles studied for this review were retrieved by querying online databases with different expressions regarding the ontology matching field. In spite of the high amount of articles retrieved, over 1600, it is possible that many others have not been recovered, as we are dealing with a very wide field of knowledge. Other databases besides those queried for this review could have also been used to raise the amount of retrieved articles and broaden the scope of the review; however, the databases used are considered the most relevant ones among practitioners. In addition, only those articles in English were finally included, even though some publications were written in other languages, as we considered English the predominant language in the research community.
In spite of the limitations previously described, this paper makes a brief review of the ontology matching field between 2000 and 2013. The articles written in this period were also sorted according to a classification framework which has allowed us to identify the different topics and problems that researchers were tackling over the last decade. Nevertheless, this has also brought up several questions regarding the current research interests of researchers and practitioners, mainly whether they have continued to research on the same topics or not, and to check, for instance, if they have changed topic or even field. Another main issue that we have detected is the fact that, in the last decade, lots of different matching systems and techniques have been developed; however, we could not state their use in real-life applications.
In order to clarify these doubts we have designed and performed a survey among the researchers.
Its structure as well as the results of this survey are detailed in Section 6.
6. Trends in Ontology Matching: Practitioner-oriented Survey
We conducted a survey to clarify those concerns that emerged from our literature review. Such concerns were mainly related to the current state of the research on ontology matching and its application in real-life projects.
6.1. Participants and Survey design
The participants in the survey were selected among those taking part in the OAEI contests. In a two-month period, from December 2013 to February 2014, they were individually contacted by email and presented with the questionnaire shown in Table 6. Even though the participants were directly contacted by email, their identities and responses were strictly confidential and only available to the team conducting the survey. Out of the 288 experts contacted, we received 46 replies.
Table 6. Questionnaire used for the survey (question number, question, type).
1. How long have you been researching in Ontology Matching? (Background)
2. What are your main purposes to do it? (Background)
3. How many research papers have you written on topics related to Ontology Matching? (Background)
4. Within the Ontology Matching field, in which particular topic are you currently working on? (Research field)
5. From your point of view, which are the main fields where the research on Ontology Matching is currently being applied? (Research field)
6. According to your expertise, which are the main challenges that are still to be addressed? (Future challenges)
7. Will you continue to research in Ontology Matching? Why? (Research field)
8. In which fields do you believe that Ontology Matching could also be used? (Future challenges)
The survey was designed with 8 short open-ended questions. Although we had initially considered defining some of these questions as multiple-choice, we discarded that idea as we did not want to influence the answers provided by the participants to any degree. These questions can be classified into three groups. Questions 1 to 3 are background questions, questions 4, 5 and 7 are research field questions and questions 6 and 8 are future challenges questions.
The background questions were included in the questionnaire to assess the suitability of the participants and contextualize the answers they may provide. The research field questions were designed to gain knowledge about the current fields and topics that have become more attractive to the research community, and finally the future challenges questions were designed to identify, according to the practitioners' point of view, the main challenges that are still to be addressed and the potential expansion fields for ontology matching.
6.2. Survey results
Out of the 46 answers that we received, only 5 declined participation in the survey, one answered it partially and 13 have not been researching on ontology matching for a while. Of the researchers that have stopped working in the field, some have recently stopped and answered the questionnaire anyhow, as their contribution was still relevant, while others suggested a more appropriate contact within their groups to redirect the requests, half of which were answered back. To sum up, the initial amount of replies was cut down to 33 actual answers with profitable information.
6.2.1. Background questions
The background of the participants in the research is quite broad and varied, and it includes different types of researchers in the field. From a temporal point of view, there are those who started in the field more than a decade ago but also those who have recently started to work in this field; specifically, values range from 1 to 14 years. Most of the practitioners (78.12%) who answered this questionnaire have been researching in the field for over 5 years, and the average number of years working in the field is slightly above 7. In Fig. 10, the number of years researching in the field is shown in relation to the number of researchers.
Fig. 10. Number of researchers in relation with the number of years working in the field.
Moreover, some researchers have directly tackled the ontology matching problem by focussing on very specific topics such as assessing the impact of using different similarity metrics in different ontology matching tasks, aligning large ontologies or improving ontology matching by using reasoning, while others arrived at ontology matching as a support tool for matters such as data integration, semantic interoperability or telecommunications systems interoperability.
In any case, the suitability of the participants is more than satisfactory, as they account on their own for over 460 publications of different types linked to this subject.
6.2.2. Research field questions
The research field questions were included in the questionnaire to learn about the fields where respondents are working, as well as the fields where they think ontology matching could also be applied. Out of the answers sent by the participants, we could determine that a high percentage of them (63%) are working on one of these four topics: instance matching, user involvement in the matching process, data interlinking and discovery of different types of correspondences, not only 1:1 equivalence relations. The rest of the respondents mentioned other topics such as parallel ontology matching, large-scale ontology matching, ontology matching negotiation and mapping reuse, which are more specific. However, besides these main research topics, most respondents included similarity metrics and combinations of methods to improve the coverage of ontology matching as a way to enhance or support their other research interests.
Regarding the question about the fields where ontology matching is being applied, the consensus shown was noticeable. This question provided mainly two types of answers. From a practical point of view, respondents agree that the medical and life science domain is the one that is using ontology matching the most. Other researchers offered a more theoretical type of answer, mostly mentioning data integration and interoperability as the fields where ontology matching is being applied.
We found it really meaningful that several researchers pointed out that nowadays the use of ontology matching techniques is reserved to spot cases and that the research at this time seems merely foundational. However, they also agree that ontology matching can be applied in any field where there are two parties that need to communicate and that employ potentially different protocols, which makes the list of potential use cases a long one.
Finally, when questioned about whether they would continue to research in this field, the majority, 63.64%, confirmed they would follow their present or related research lines, claiming as reasons, for instance, that there are still plenty of challenges to address and that the development of new domains will spark new matching problems. On the contrary, 30.30% of the respondents stated that they would, if they had not yet, change subjects. Among the reasons to do so, some mentioned they have moved to other related fields such as linking open information systems or knowledge transformation, while others definitely quit the field claiming the lack of usefulness for real applications or the little incentive coming from the application side. A small percentage, 6.06%, were still considering whether to change subject or not.
6.2.3. Future challenges questions
This group of questions was included in the questionnaire to gain knowledge about how practitioners see the future evolution of the field and the main challenges still to be addressed. The answers provided are really useful as they identify a variety of challenges to address and quote several new fields where matching techniques could also be used; hence they could be used to guide the research lines adopted by different research groups.
Regarding the main challenges still to be addressed in the ontology matching field, most respondents agree on the need to automatically discover complex relations, instead of 1:1 ones, to correctly align large ontologies and to focus on applying automatically created mappings to practical applications. Other topics that arose in the responses were not supported by so many respondents, but they point out anyway challenges that need to be addressed, such as:
1. Automated acquisition of reference alignments for evaluating large scale matching systems.
2. Creating large datasets to assess matching algorithms.
3. Define good tools that are easy to use for non-experts.
4. Develop high quality and fast intelligent combinations of string-based and new semantic-similarity measures.
5. Holistic ontology matching.
6. How to effectively complement automatic computation with human validation.
7. How to minimize involvement of users when turning matches into mappings.
8. Human readable explanations for matches.
9. Improving the mapping process through semi-automatic machine learning.
10. Integration of domain knowledge into alignment techniques.
11. Learning what metrics to choose in which scenario.
12. Precision and recall of automatic methods.
13. Scalability and parallelization of the matching.
14. Semantic mapping.
The answers provided by the practitioners for this question point out challenges that are in line with those highlighted by Euzenat and Shvaiko in Shvaiko and Euzenat (2013) and Euzenat and Shvaiko (2013).
Table 7 presents a comparison of those challenges outlined by Euzenat and Shvaiko and those mentioned by the respondents to our survey. As this table shows, most of the challenges issued by practitioners have also been considered in those by Euzenat and Shvaiko; however, there are certain mismatches worth noticing. As previously stated, most researchers mainly agree on 3 challenges, identified in Table 7 with '*'; however, just one of these can be classified into the categories by Euzenat and Shvaiko.
It is worth noticing that one of these challenges is the application of automatically created mappings to practical applications, which supports the working hypothesis behind this survey, namely that the application of ontology matching techniques in real-life projects is still a field to be further developed.
Some practitioners, in addition to pointing out the challenges, also used the answer to this question to mention certain situations that they consider a mistake, such as the fact that most approaches to ontology matching focus on lexicographic and structural information while language is more complex than that, and hence achieving perfect precision and recall is impossible in real-life applications. Another aspect they complained about was the benchmarks habitually used to test the matching systems (the OAEI benchmarks). They claim that, even if these are really useful, their main drawback is that there are still too many artificial datasets and tasks in them. Other practitioners took their remarks a step further, claiming that science culture does not reward creating and maintaining one tool and that instead everyone creates a prototype for a paper and then abandons it.
Finally, regarding the fields where ontology matching could also be applied, we have obtained two types of answers. Some practitioners gave a fuzzy answer mentioning that matching techniques could be used practically anywhere there is no standard for information exchange and where the domains are open to adopting ontology approaches, or, in a broad sense, in any information-related field. On the contrary, others actually mentioned fields to apply these techniques, such as: bioinformatics, information systems, e-commerce, web services, intrusion detection systems, cultural heritage, library science, government, education, banking, personal and social data management, law, etc. Most agree that the fact that in these fields ontologies and ontology matching techniques are not already in use is due to a lack of information regarding the potential benefits.
Table 7. Comparison of future challenges (each challenge category identified by Euzenat and Shvaiko is followed by the corresponding challenges mentioned by practitioners; '*' marks the three challenges most respondents agreed on).
LARGE-SCALE AND EFFICIENT MATCHING: Automated acquisition of reference alignments for evaluating large scale matching systems; Creating large datasets to assess matching algorithms; Scalability and parallelization of the matching; * Correctly align large ontologies.
MATCHING WITH BACKGROUND KNOWLEDGE: Integration of domain knowledge into alignment techniques.
MATCHER SELECTION, COMBINATION AND TUNING: Develop high quality and fast intelligent combinations of string-based and new semantic-similarity measures; Learning what metrics to choose in which scenario.
USER INVOLVEMENT: How to effectively complement automatic computation with human validation; How to minimize involvement of users when turning matches into mappings.
EXPLANATION OF MATCHING RESULTS: Human readable explanations for matches.
UNCERTAINTY IN ONTOLOGY MATCHING: –
ALIGNMENT MANAGEMENT: –
Other challenges that do not fit in the previous categories: Improving the mapping process through semi-automatic machine learning; Precision and recall of automatic methods; Semantic mapping; Define good tools that are easy to use for non-experts; Holistic ontology matching; * Automatically discover complex relations, instead of 1:1; * Focus on applying automatically created mappings to practical applications.
7. Limitations of the Survey
There are, of course, limitations to this survey, the foremost being the sampling size and the population. Although we feel that our 33 final responses offered a wide variety of useful remarks and points of view, it is true that the sample is still quite small and hence our analysis may be biased. Besides, in an effort to prevent the questions from influencing the answers and to obtain as much information as possible, we defined the questionnaire with 8 open-ended questions. This fact, possibly together with the way some questions were posed, led us to obtaining answers that,
however interesting, did not exactly match what we expected from them. Also, the participants targeted were obtained from the participants in the OAEI contests, most of whom are academically oriented; therefore our survey may be biased towards academic researchers rather than a balance between academic and industrial researchers. Finally, this survey was the answer to some concerns that arose while conducting the literature review, and we consider the results here as a first analysis. Our intention is to revisit these answers looking for deeper connections between them, to address questions such as: Is there any relation between how long a researcher has been working on a subject and the amount of publications? Which types of researchers are more prone to quitting?, etc.
8. Conclusions
In this paper, we have achieved a twofold goal. Initially we performed a literature review of the ontology matching field, whose results led to the definition and development of the survey later conducted.
To address the task of performing a literature review of the ontology matching field, we have defined a classification framework which helped in structuring our review by providing a comprehensive model to sort the different types of publications. This review was based on an online search of ontology matching related papers from 2003 to the first semester of 2013. The initial amount of articles obtained, over 1600, was reduced by filtering them according to their topics, keywords, abstracts and content. With the articles left after the several trimming iterations, we have initially performed a statistical evaluation and analysis. Later, we have sorted the articles following the framework and then we have analyzed each one of the categories in the framework, evaluating the different types of articles and topics treated.
While performing this deeper analysis of the articles and their topics, some concerns arose regarding the actual research interests of the practitioners, as we detected a high amount of papers related to theoretical solutions and approaches while the number of applied ones was significantly lower. The approach chosen to clarify these concerns was to ask the research community openly, by means of a practitioner-oriented survey. The purpose of this survey was to gain knowledge about the current state of the ontology matching field and the application of such techniques to real-life environments. We have noticed that most researchers share the same concerns about the practical application of ontology matching techniques, and the problem of having too many theoretical solutions but few applied ones. However, due to the nature of the survey, with open-ended questions, there is more information that we have not reflected in this work and which we plan on analyzing and exploiting in the future.
By means of this work we have provided a general overview of the ontology matching field in the last decade. It can be used as a starting point for new practitioners to get a general idea, but also to help in deciding on research lines, hopefully by tackling some of the challenges highlighted in the survey.
References
Acampora, G., Loia, V., Salerno, S., & Vitiello, A. (2012). A hybrid evolutionary approach for solving the ontology alignment problem. International Journal of Intelligent Systems, 27, 189–216.
ACM. (2013). ACM Digital Library. URL: .
Akbari, I., Fathian, M., & Badie, K. (2009). An improved mlma+ and its application in ontology matching. In Innovative technologies in intelligent systems and industrial applications, 2009. CITISIA 2009 (pp. 56–60).
Albagli, S., Ben-Eliyahu-Zohary, R., & Shimony, S. E. (2009). Markov network based ontology matching. In IJCAI (pp. 1884–1889).
Albagli, S., Ben-Eliyahu-Zohary, R., & Shimony, S. E. (2012). Markov network based ontology matching. Journal of Computer and System Sciences, 78, 105–118.
Aleksovski, Z., Ten Kate, W., & Van Harmelen, F. (2008). Using multiple ontologies as background knowledge in ontology matching. In Proceedings of the 1st international workshop on collective semantics: collective intelligence and the semantic web CISWeb'08 (pp. 35–49).
Araújo, S., Hidders, J., Schwabe, D., & de Vries, A. P. (2011a). Serimi – resource description similarity, rdf instance matching and interlinking. CoRR abs/1107.1104. URL: .
Araújo, S., Hidders, J., Schwabe, D., & de Vries, A. P. (2011b). Serimi – resource description similarity, rdf instance matching and interlinking. In P. Shvaiko, J. Euzenat, T. Heath, C. Quix, M. Mao, & I. F. Cruz (Eds.), OM, CEUR-WS.org. URL: .
Araújo, S., Tran, D., DeVries, A., Hidders, J., & Schwabe, D. (2012). Serimi: Class-based disambiguation for effective instance matching over heterogeneous web data. In Z. G. Ives, & Y. Velegrakis (Eds.), WebDB (pp. 25–30). URL: .
Araújo, S., de Vries, A. P., & Schwabe, D. (2011c). SERIMI results for OAEI 2011. In P. Shvaiko, J. Euzenat, T. Heath, C. Quix, M. Mao, & I. F. Cruz (Eds.), OM, CEUR-WS.org (pp. 212–219).
Arch-int, N., & Arch-int, S. (2013). Semantic ontology mapping for interoperability of learning resource systems using a rule-based reasoning approach. Expert Systems with Applications, 40, 7428–7443.
Atencia, M., Euzenat, J., Pirrò, G., & Rousset, M. C. (2011). Alignment-based trust for resource finding in semantic p2p networks.
In Proceedings of the 10th international conference on the semantic web – Volume Part I (pp. 51–66). Berlin, Heidelberg: Springer-Verlag.
Aumüller, D., Do, H. H., Massmann, S., & Rahm, E. (2005). Schema and ontology matching with coma++. In F. Özcan (Ed.), SIGMOD conference (pp. 906–908). ACM. URL: .
Ba, M., & Diallo, G. (2012a). Knowledge repository as entity similarity computing enabler. In SITIS (pp. 975–981).
Ba, M., & Diallo, G. (2012b). Servomap and servomap-lt results for oaei 2012. In P. Shvaiko, J. Euzenat, A. Kementsietsidis, M. Mao, N. F. Noy, & H. Stuckenschmidt (Eds.), OM, CEUR-WS.org. URL: .
Ba, M., & Diallo, G. (2013). Large-scale biomedical ontology matching with ServOMap. IRBM, 34, 56–59.
Behkamal, B., Naghibzadeh, M., & Moghadam, R. (2010). Using pattern detection techniques and refactoring to improve the performance of ASMOV. In 5th International symposium on telecommunications (IST) (pp. 979–984).
Bock, J. (2010). Mappso results for oaei 2010. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt, M. Mao, & I. F. Cruz (Eds.), OM, CEUR-WS.org. URL: .
Bock, J., Dänschel, C., & Stumpp, M. (2011). MapPSO and MapEVO results for OAEI 2011. In P. Shvaiko, J. Euzenat, T. Heath, C. Quix, M. Mao, & I. F. Cruz (Eds.), OM, CEUR-WS.org (pp. 179–183).
Bock, J., & Hettenhausen, J. (2008). Mappso results for oaei 2008. In P. Shvaiko, J. Euzenat, F. Giunchiglia, & H. Stuckenschmidt (Eds.), OM, CEUR-WS.org. URL: .
Bock, J., & Hettenhausen, J. (2010). Discrete particle swarm optimisation for ontology alignment. Information Sciences [in press]. doi: http://dx.doi.org/10.1016/j.ins.2010.08.013.
Bock, J., Lenk, A., & Dänschel, C. (2010). Ontology alignment in the cloud. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt, M. Mao, & I. Cruz (Eds.), Proceedings of the 5th international workshop on ontology matching (OM-2010), CEUR workshop proceedings (pp. 73–84).
Bock, J., Liu, P., & Hettenhausen, J. (2009). Mappso results for oaei 2009. In P. Shvaiko, J. Euzenat, F. Giunchiglia, H. Stuckenschmidt, N. F. Noy, & A. Rosenthal (Eds.), OM, CEUR-WS.org. URL: .
Calì, A., Lukasiewicz, T., Predoiu, L., & Stuckenschmidt, H. (2008). Tightly integrated probabilistic description logic programs for representing ontology mappings. In Proceedings of the 5th international conference on Foundations of information and knowledge systems (pp. 178–198). Berlin, Heidelberg: Springer-Verlag.
Castano, S., Ferrara, A., Lorusso, D., & Montanelli, S. (2007). The hmatch 2.0 suite for ontology matchmaking. In G. Semeraro, E. D.
work_2jzz4ykqmfbbxhw6m4crviv56i ---- Intelligent Program Support for Dynamic Integrated Expert Systems Construction
doi: 10.1016/j.procs.2016.07.426
Galina V. Rybina, Victor M. Rybin, Yury M. Blokhin, and Sergey S. Parondzhanov
National Research Nuclear University MEPhI (Moscow Engineering Physics Institute), Kashirskoye shosse, 31, Moscow, 115409, Russian Federation
galina@ailab.mephi.ru

Abstract. The problems of intellectualization of the development process of integrated expert systems, based on the problem-oriented methodology and the AT-TECHNOLOGY workbench, are considered, with the focus on automating the construction of dynamic integrated expert systems. The intelligent program environment and its basic components, including the standard design procedures, are reviewed, and a detailed description of the procedure for dynamic integrated expert system construction is given. Examples of applied integrated expert system prototypes developed with the described procedure are listed.

Keywords: dynamic integrated expert system, problem-oriented methodology, typical design procedure, intelligent planner, AT-TECHNOLOGY workbench

1 Introduction

Trends towards the integration of research in different fields of artificial intelligence manifested most clearly at the turn of the XX and XXI centuries and made it necessary to combine semantically different objects, models, methodologies, concepts and technologies. As a result, new intelligent system architectures emerged, in particular integrated expert systems (IES), i.e. systems with a scalable architecture and extensible functions [8, 9, 10, 11, 17, 3, 2]. The problem-oriented methodology for constructing a wide class (static, dynamic, tutoring and others) of applied IES was developed by G. V. Rybina [8] at the Department of Cybernetics of the National Research Nuclear University MEPhI. The problem-oriented methodology [8] is actively used and constantly evolving: about 30 applied IES prototypes for diagnosis, control, design, planning and tutoring problems have been constructed (a detailed description can be found in [8, 9, 10, 11]). Its core idea is the conceptual modeling of the IES architecture at all levels of the integration processes in the IES, with a focus on modeling the specific types of non-formalized tasks that are relevant to the technology of traditional expert systems.
In the laboratory "Intelligent Systems and Technologies" of the Department of Cybernetics at MEPhI, several generations of software tools for the automated support of the problem-oriented methodology have been created [8, 9, 10, 11, 13, 15, 16, 14]; collectively they are called the AT-TECHNOLOGY workbench.
A large part of the problems is linked to the high complexity of the design and implementation phases of an IES, as shown by the practical experience of creating a series of static, dynamic and tutoring IES with the problem-oriented methodology and the AT-TECHNOLOGY workbench. The problem domain and the human factor have a significant impact on the modeling process. Designing the architecture of a dynamic IES is the most difficult phase, because in a dynamic IES an important place is given to the integration of methods and means of temporal information acquisition, representation and processing with the methods and means of simulating the outside world in real time. This leads to an expansion of the architecture (for example, in intelligent control systems [16, 5, 7, 4]) built on the concept of a dynamic IES, with subsystems that adequately reflect all the processes and laws of functioning of the simulated systems [6, 16], as an integral phase of building a dynamic IES.
Therefore, the need to develop an intelligent software environment [8, 9] for the further development of the problem-oriented methodology and the AT-TECHNOLOGY workbench, with the aim of creating an intelligent technology for building specific classes of IES, has become urgent. Let us now review the basic features of the problem-oriented methodology for IES construction, which is described in detail in [8].

2 Features of Problem-Oriented Methodology for Integrated Expert Systems Construction

In the context of solving modern IES construction problems (in particular, for the management of complex discrete systems), the problem-oriented methodology has the following properties [8, 9, 10, 11]:
• a powerful combined method of knowledge acquisition that supports the automated acquisition of knowledge from sources of different typology (experts, databases, texts);
• a generalized knowledge representation language designed for building models of problem domains in dynamic IES, which allows temporal knowledge, based on a modified Allen interval logic [1] and a time control logic, to be represented together with the basic knowledge, including knowledge with uncertainty, imprecision and vagueness (an illustrative sketch of such interval relations is given after this list);
• support for various inference means (the universal AT-SOLVER and a specialized temporal solver designed for dynamic tasks);
• in the context of the enhanced functionality and deep integration of IES components, the possibility of implementing simulation techniques for modeling the external environment and the ways of interacting with it;
• high efficiency in developing a large number of applied IES, including in dynamic problem domains;
• instrumental support by modern workbench-type software (the AT-TECHNOLOGY workbench).
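To make the temporal side of the representation language more concrete, the sketch below classifies the qualitative relation between two time intervals using the basic relations of Allen's interval algebra, which the text above names as the basis of the temporal part of the language. This is only a minimal illustration under the assumption of simple numeric intervals; the Interval class and allen_relation function are hypothetical and are not part of the AT-TECHNOLOGY representation language.

```python
# Illustrative sketch only: standard Allen interval relations over numeric intervals.
# Not the AT-TECHNOLOGY generalized knowledge representation language.

from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: float  # interval begin time
    end: float    # interval end time (start < end is assumed)

def allen_relation(a: Interval, b: Interval) -> str:
    """Return the basic Allen relation holding between intervals a and b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start and a.end < b.end:
        return "starts"
    if a.start > b.start and a.end == b.end:
        return "finishes"
    if a.start > b.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.start < a.end < b.end:
        return "overlaps"
    # Remaining cases are the inverse relations (after, met-by, overlapped-by,
    # started-by, contains, finished-by); name them via the symmetric call.
    return allen_relation(b, a) + "-inverse"

if __name__ == "__main__":
    valve_open = Interval(0.0, 5.0)
    alarm_raised = Interval(5.0, 8.0)
    print(allen_relation(valve_open, alarm_raised))  # -> "meets"
```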
Significant place in the framework of the problem-oriented methodology for IES construct- ing (basic points are reflected in [8]) is given to the methods and means of intelligent software support for the development processes, which form general concept of ”intellectual environ- ment”. Complete formal description of the intellectual environment model and methods of the individual components implementation is presented in [8], so here only a brief description of the model in the form of quaternion is presented. MAT = 〈KB, K, P, TI〉 (1) KB is a technological knowledge base (KB) on the composition of the project, and typical design solutions used in development of IES. K = {Ki}, i = 1..m - set of current contexts Ki, Intelligent Programm Support Rybina, Rybin, Blokhin and Parondzhanov 206 consisting of a set of objects from the KB, editing or implementing on the current control step. P a special program - an intelligent planner that manages the development and IES testing process. TI = {TIi}, i = 1..n - many tools TIi, applied at various stages of IES development. A component of the KB is a declarative basis of intellectual support for the development of IES, acting as data storage in a given environment and defined as KB = 〈WKB, CKB, PKB〉 (2) WKB is a KB containing knowledge of the standard design procedures (SDP), describing the sequences and methods of using various tools to create applied IES and a sequence of steps for creating IES. CKB - is KB comprising knowledge about the use of SDP and reusable components (RUC), including fragments of previously created IES prototypes. PKB (optional) - is a KB containing specific knowledge used at various stages of creating IES prototype for solving problems that require innovative approaches. The current context Ki is represented as set of Ki = 〈KD, KP〉. KD here is a declarative context for storing static declarative information about the structure of the project, the knowl- edge engineer and the current user. KP is a procedural context, which includes objects clearly affecting the further planner steps (LC system phase, currently edited or executable object, the current target, the current executor, the global development plan, etc.). The main procedural (operational) component is intelligent planner. This model generally describes it. P = 〈SK, AF, Pa, Pb, I, GP〉 (3) SK here is the state of the current context, in which the planner was activated. AF = {AFi}, i = 1..k is a set of functional modules AFi, a part of planner. Pa is a selection procedure for the current target based on the global development plan. Pb is a selection procedure for the best executive function module from the list of possible candidates. I - procedures to ensure the interface with the corresponding components of the AT-TECHNOLOGY workbench; GP - operating procedures for the IES global development plan. Each reusable component (RUC), involved in the development of an IES prototype, is represented by tuple: RUC = 〈N, Arg, F, PINT, FN〉 (4) N in this model is the name of the component, by which it is registered in the workbench. Arg = {Argi}, i = 1..l - set of arguments containing current project database subtree serving the input parameters for the functions from the set. F = {Fi}, i = 1..s - a variety of methods (RUC interfaces) for this component at the implementation level. PINT - a set of other kinds of RUC interfaces, used by the methods of the RUC. FN = {FNi}, i = 1..v - set of functions names performed by this RUC. 
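To make the quaternion (1) and its components (2)-(4) easier to read, the following illustrative sketch renders them as plain data structures. It is only a reading aid: the class and field names simply mirror the symbols used above and are not taken from the actual AT-TECHNOLOGY code base.

```python
# A minimal rendering of the formal models (1)-(4) as plain data structures.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ReusableComponent:          # RUC = <N, Arg, F, PINT, FN>
    name: str                     # N: name under which the RUC is registered
    args: List[str]               # Arg: project-database subtrees used as inputs
    methods: Dict[str, Callable]  # F: RUC interfaces at the implementation level
    used_interfaces: List[str]    # PINT: interfaces of other RUCs it relies on
    function_names: List[str]     # FN: functions this RUC can perform

@dataclass
class TechnologicalKB:            # KB = <WKB, CKB, PKB>
    sdp_knowledge: list           # WKB: knowledge about standard design procedures
    ruc_knowledge: list           # CKB: knowledge about SDP and RUC usage
    project_knowledge: list = field(default_factory=list)  # PKB: optional, project-specific

@dataclass
class IntelligentEnvironment:     # MAT = <KB, K, P, TI>
    kb: TechnologicalKB           # technological knowledge base
    contexts: list                # K: current contexts (declarative + procedural parts)
    planner: object               # P: the intelligent planner
    tools: list                   # TI: tools applied at various development stages
```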
Any SDP can be represented as a tuple

SDP = 〈C, L, T〉 (5)

where C is the set of conditions under which the SDP can be applied; L is the implementation script, described in an internal language for specifying the actions of the SDP; and T is the set of parameters initialized by the intelligent planner when the SDP is included in the development plan of an IES prototype. Several SDPs have already been implemented and used, e.g. "Tutoring web-IES construction", "Distributed Knowledge Acquisition" and others. The SDP "Dynamic IES construction", which is currently being researched, is described below.

3 Description of Typical Design Procedure for Dynamic Integrated Expert Systems Construction

For dynamic IES construction the SDP model described above is specified as SDPD = 〈CD, LD, TD〉, where CD is the set of conditions for initializing the realization of SDPD; LD is the scenario of dynamic IES construction; and TD is the set of parameters initialized by the intelligent planner when SDPD is included in the dynamic IES construction plan.

Conditions set CD. The following conditions must be satisfied to include SDPD in the development plan:
• the current lifecycle stage is system requirements analysis;
• the dynamic IES architecture model (a hierarchy of EDFDs) contains an element describing the presence of a simulation model (here, complex technical systems are simulated);
• there is at least one EDFD element connected with the solving of a non-formalized problem.

Scenario LD. The SDPD scenario has the following stages:
1. System requirements analysis stage. The following actions are performed: automated knowledge acquisition based on the temporal knowledge acquisition method [8], which performs direct acquisition of knowledge containing temporal references; and development of the simulation model with the help of a specialized visual editor.
2. Design stage. The following actions are performed: forming of the reasoning tools (a combination of the universal AT-SOLVER and the Temporal Solver is currently supported); conversion of the previously obtained knowledge field into a knowledge base described in the generalized knowledge representation language; development of explanations; and configuration of the IES core components.
3. Implementation stage. During the final stage of dynamic IES prototype construction the following steps are performed: development of the visual representation of the simulation; development of components for communication with the user, using the language for dialogue scenario description; integration with external systems (databases, applied software modules, etc.); and aggregate integration of the IES components.

Parameters set TD. When SDPD is included in the development plan, two parameters are added to the current context: one identifies the currently executed SDP, the other the current scenario step.

Now, following [12], let us review the features of dynamic IES prototyping with SDPD, whose functions are supported by a set of operational RUCs at each IES construction stage. Let us assume that the knowledge engineer has already built the dynamic IES architecture model as an EDFD hierarchy. With the help of AT-SOLVER, an initial architecture layout of the IES prototype is formed in automated mode, as shown in Fig. 1. The global plan is also generated by the intelligent planner core. Detailed development plans are then generated iteratively; at each iteration the highest-priority task is selected and executed, with a corresponding modification of the architecture layout. The process is repeated until all tasks are done, and the detailed plan is regenerated whenever necessary.
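As an informal illustration of how the conditions CD and the iterative prototyping loop described above might be checked and driven, consider the following sketch. The type and function names (EDFDElement, sdp_d_applicable, and the callables passed to prototype_dynamic_ies) are hypothetical stand-ins for the corresponding workbench components (the intelligent planner, AT-SOLVER and the RUCs), not the real API.

```python
# Hedged sketch of the applicability test for SDP-D and the plan/execute loop
# described above. All names are invented for illustration.
from dataclasses import dataclass
from typing import List

@dataclass
class EDFDElement:
    name: str
    is_simulation_model: bool = False
    solves_unformalized_task: bool = False

def sdp_d_applicable(lifecycle_stage: str, edfd: List[EDFDElement]) -> bool:
    """Conditions CD: requirements-analysis stage, a simulation-model element,
    and at least one element tied to a non-formalized problem."""
    return (lifecycle_stage == "system requirements analysis"
            and any(e.is_simulation_model for e in edfd)
            and any(e.solves_unformalized_task for e in edfd))

def prototype_dynamic_ies(edfd, build_global_plan, detail_plan, execute_task):
    """Loop sketched in the text: generate the global plan once, then repeatedly
    detail it, execute the highest-priority task and extend the layout."""
    if not sdp_d_applicable("system requirements analysis", edfd):
        return None
    layout = []                                  # architecture layout fragments
    global_plan = build_global_plan(edfd)        # generated once from the EDFD hierarchy
    while global_plan.has_open_tasks():          # assumed interface of the plan object
        tasks = detail_plan(global_plan, layout) # regenerate the detailed plan
        layout.append(execute_task(tasks[0]))    # highest-priority task first
    return layout
```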
Because it is usually assumed that the actions in the IES construction planning problem are performed with deterministic results, a deterministic planning approach is used as the basis [12]. Planning is performed in the project state space, formed by the sets of possible values of the project parameters.

Figure 1: Dynamic IES construction planning scheme (technological KB, EDFD-based architecture model, global and detailed planning, task execution, synthesis of new architecture layout fragments by the universal AT-SOLVER, PDDL planner and RUCs).

The built EDFD hierarchy is analyzed by the intelligent planner in the following manner. Elements which have to be implemented are recognized, for example the simulation model, DB fragments, non-formalized tasks, etc. These elements represent large tasks, and a precedence relation between them is built, in particular with the use of PDDL planners. Generation of the initial architecture layout and of the global IES development plan is performed once, after the EDFD hierarchy is built. A detailed development plan is then generated iteratively with the help of the intelligent planner after each architecture layout fragment is added. The detailed plan is visualized with the visualization component, and the knowledge engineer can initiate the execution of a specific plan task. Each plan task is performed with the use of an appropriate RUC of the AT-TECHNOLOGY workbench. Examples of other complex SDPs, for example those connected with tutoring IES development, are described in [8, 9, 10, 11]. The difficulties of the tutoring IES development technology are caused by the need to support two different work modes: DesignTime, oriented to work with teachers (course/discipline ontology creation, creation of different types of training impacts, etc.), and RunTime, for working with students (building of the current student model, including the psychological model, etc.).

4 Conclusion

Currently, an experimental software study of the current version of the intelligent planner during educational IES prototyping on various courses is being carried out. This study is carried out, in particular, for the collective development of IES prototypes with limited resources. Research and development related to the use of the described SDP "Dynamic IES construction" with the intelligent software environment were successfully performed for two dynamic IES prototypes ("Management of medical forces and resources for major traffic accidents" and "Resource management for a satellite communications system between regional centers") and for tutoring IES prototypes for different courses/disciplines.

4.1 Acknowledgments

This work was supported by the Russian Foundation for Basic Research under grant no. 15-01-04696 and the Competitiveness Growth Program of the Federal Autonomous Educational Institution of Higher Professional Education National Research Nuclear University MEPhI (Moscow Engineering Physics Institute).

References

[1] James Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832–843, 1983. [2] B.E. Fedunov. Intelligent systems for the core of anthropocentric objects and its modeling. pages 394–400, 2012. [3] Crina Grosan and Ajith Abraham. Intelligent Systems - A Modern Approach, volume 17 of Intelligent Systems Reference Library. Springer, 2011. [4] I. M. Makarov, V. M. Lokhin, S. V. Manko, and M. P.
Romanov. Artificial intelligence and intelligent control systems. Nauka, Moscow, 2006. [5] G. S. Osipov. Artificial intelligence methods. FIZMATLIT, Moscow, 2011. [6] M. Pidd. Computer simulation in Management Science. Wiley, 5th ed. Chichester, 2004. [7] V.M. Rybin and S.S. Parondzhanov. Using of dynamic integrated expert systems for intelligent control. International Journal of Applied Engineering Research, 10(13):33202–33205, 2015. [8] G. V. Rybina. Theory and technology of construction of integrated expert systems. Monography. Nauchtehlitizdat, Moscow, 2008. [9] G. V. Rybina. Intelligent systems: from A to Z. Monography series in 3 books. Vol. 1. Knowledge- based systems. Integrated expert systems. Nauchtehlitizdat, Moscow, 2014. [10] G. V. Rybina. Intelligent systems: from A to Z. Monography series in 3 books. Vol. 2. Intelligent dialogue systems. Dynamic intelligent systems. Nauchtehlitizdat, Moscow, 2015. [11] G. V. Rybina. Intelligent systems: from A to Z. Monography series in 3 books. Vol. 3. Problem- oriented intelligent systems. Tools for intelligent system developing. Dynamic intelligent systems. Nauchtehlitizdat, Moscow, 2015. [12] G. V. Rybina and Y. M. Blokhin. Modern automated planning methods and tools and their use for control of process of integrated expert systems construction. Artificial intelligence and decision making, (1):75–93, 2015. [13] G. V. Rybina and A. O. Deineko. Distributed knowledge acquisition for the automatic construction of integrated expert systems. Scientific and Technical Information Processing, 38:1–7, 2011. [14] G.V. Rybina and Y.M. Blokhin. Methods and means of intellectual planning: Implementation of the management of process control in the construction of an integrated expert system. Scientific and Technical Information Processing, 42(6):432–447, 2015. [15] G.V. Rybina and A.V. Mozgachev. The use of temporal inferences in dynamic integrated expert systems. Scientific and Technical Information Processing, 41(6):390–399, 2014. [16] G.V. Rybina, V.M. Rybin, S.S. Parondzhanov, and S.T.H. Aung. Some aspects of simulation application in dynamic integrated expert systems. Life Science Journal, 11(SPEC. ISSUE 8):144– 149, 2014. [17] A. R. Tyler. Expert systems research trends. Nova Science Publishers, New York, 2007. 
work_2kjbpm2btjfo5fuxi67ouhrveu ---- paper.dvi

Applying KADS to KADS: knowledge based guidance for knowledge engineering
John K.C. Kingston
AIAI Technical Report
This paper was published in the journal 'Expert Systems: The International Journal of Knowledge Engineering', 12(1), February 1995.
Artificial Intelligence Applications Institute, University of Edinburgh, 80 South Bridge, Edinburgh EH1 1HN, United Kingdom
© The University of Edinburgh

Abstract

The KADS methodology (Schreiber et al, 1993) (Tansley & Hayball, 1993) and its successor, CommonKADS (Wielinga et al, 1993), have proved to be very useful approaches for modelling the various transformations involved between eliciting knowledge from an expert and encoding this knowledge in a computer program. These transformations are represented in a series of models. While it is widely agreed that these methods are excellent approaches from a theoretical viewpoint, the documentation provided concentrates on defining what models should be produced, with only general guidance on how the models should be produced. This has the advantage of making KADS and CommonKADS widely applicable, but it also means that considerable training and experience is required to become proficient in them.

This paper reviews three projects, which investigated the feasibility of producing specific guidance for certain decisions which are required when using KADS or CommonKADS to develop a knowledge based system. Guidance was produced for the identification of the generic task addressed by a knowledge based system, for the selection of appropriate AI techniques for implementing the analysed knowledge, and for selecting a suitable tool for implementing the system. Each set of guidance was encoded in its own knowledge based system, which was itself developed with the assistance of KADS or CommonKADS. These projects therefore both studied and applied KADS and CommonKADS in order to produce knowledge based guidance for knowledge engineers.

The projects showed that it was feasible to produce heuristic guidance which could be understood, applied, and occasionally overridden by knowledge engineers. The guidance provides reasonably experienced knowledge engineers with a framework for making the key decisions required by CommonKADS, in the same way that CommonKADS provides knowledge engineers with a framework for representing knowledge. The projects also produced some new insights about CommonKADS domain modelling and about the process of task identification.

1 Introduction

The KADS methodology (Schreiber et al, 1993) (Tansley & Hayball, 1993) and its successor, CommonKADS (Wielinga et al, 1993), are collections of structured methods for building knowledge based systems, analogous to methods such as SSADM for software engineering. The development of these methods was funded by the European Community's ESPRIT programme. KADS and CommonKADS view the construction of KBS as a modelling activity, and so these methods require a number of models to be constructed which represent different views on problem solving behaviour, in its organisational and application context. CommonKADS recommends the construction of six models:
- a model of the organisational function and structure;
- a model of the tasks required to perform a particular operation;
- a model of the capabilities required of the agents who perform that operation;
- a model of the communication required between
agents during the operation� � a model of the expertise required to perform the operation� which is divided into three sub�levels � models of declarative knowledge about the domain� � models of the inference processes required during problem solving� � an ordering of the inference processes� � a model of the design of a KBS to perform all or part of this operation� For more details on these models� see �deHoog et al� ������ Experience has shown that the models recommended by KADS and Com� monKADS provide an excellent basis for representing the various transformations required between eliciting knowledge from an expert and encoding it in a com� puter program� In addition� KADS and CommonKADS provide various libraries of generic models which have proved very useful to knowledge engineers �see �Kingston� ������ for example�� However� experience has also shown that the task of developing these models is non�trivial� There are a number of decisions to be taken at each stage in the modelling process� some of which have a major impact on the models� or on the implemented system� These decisions include �� Deciding the approach which should be taken to modelling should models be produced bottom�up from acquired knowledge� top�down by instantiating generic problem�solving methods which CommonKADS provides� or by an intermediate approach which selects a generic model and then modi es it in the light of domain knowledge� �� Selecting models from a library� The library of generic inference structures is indexed according to the type of task which is being tackled so the knowledge engineer must determine the most appropriate task type for the current task� �� Deciding whether the design should preserve the structure of the expertise model� �� Deciding which knowledge representations and inference techniques should be used within the design� � �� Deciding on the most appropriate tool for implementing this knowledge based system� Much of the CommonKADS documentation concentrates on de ning what should be done in order to produce models� while only specifying how it should be done in general terms� This has the advantage that it speci es the content of mod� els without enforcing an approach on practitioners� thus making CommonKADS widely applicable and compatible with many di�erent approaches to knowledge engineering� however� by the same token� it requires knowledge engineers to be fairly experienced at making good knowledge engineering decisions before they can make full use of KADS or CommonKADS� The CommonKADS project has pro� vided guidance on making some of the decisions outlined above �e�g� guidance on top�down�bottom�up approaches to model construction �Wielinga� ����� and model instantiation �L�ockenho� � Valente� ������� however� many of the organisa� tions which have recognised the value of KADS and CommonKADS have had to obtain training and accept a long learning curve for each of their sta� who wants to become pro cient in the KADS approach� An alternative approach to providing extensive training and experience for all sta� would be to provide speci c guidance on developing KADS models� thus pro� viding in�house KADS guidance based on the company�s own knowledge engineering experiences� The purpose of this paper is to review three projects which investi� gated the feasibility of producing guidance for some of the important decisions listed above� The tasks for which guidance was produced were � identifying the task type� � selecting appropriate AI techniques at the design stage�� � choosing a 
suitable implementation tool� These projects were all performed by students in the Department of Arti cial Intelligence� University of Edinburgh as part of an M�Sc course in Intelligent Knowl� edge Based Systems� The students were supervised by sta� from the Department of Arti cial Intelligence� and by members of the Knowledge Engineering Methods group at AIAI� the supervisors also acted as experts from whom knowledge was elicited� It is a requirement that all students on Edinburgh�s M�Sc course produce func� tioning software as part of their project� which meant that the students were re� quired to acquire knowledge� analyse the knowledge� and to implement this knowl� edge in a knowledge�based system� It therefore seemed sensible for the students to use KADS to help them in this process� since practical experience of using �Guidelines on whether a design should preserve the structure of an expertise model are cur� rently being considered� � KADS ought to help them in producing useful guidance� The main body of this paper therefore shows how KADS was used to model the tasks involved in making KADS�related decisions in these three projects� � Identifying task types The rst project looked at the task of identifying the most appropriate task type for a particular problem �Krueger� ������ ��� The task The student who took on this project was set the task of developing a technique for distinguishing and classifying di�erent expert tasks� with a focus on the tax� onomy of expert tasks developed by the KADS methodology �see Figure ��� This taxonomy de nes the contents of the library of generic inference structures� there is a generic inference structure associated with �almost� every leaf node in the tax� onomy� The selection of a suitable generic inference structure therefore boils down to the identi cation of the most appropriate task type from this taxonomy� SYSTEM MODIFICATION SYSTEM SYNTHESIS SYSTEM ANALYSIS Prediction Identifying Repairing Controlling Remedying Modelling Transformation Design Planning Configuration Refinement design Transformational design Monitoring Classification Prediction of values Prediction of behaviour Maintaining Assessment Diagnosis Simple classification Multiple fault diagnosis Single fault diagnosis Localisation Causal Tracing Systematic Diagnosis Heuristic Classification Multiple stream refinemt design Single stream refinemt design Exploration- based design Hierarchical design Figure �� Taxonomy of task types � ��� The project Given that the student has been presented with a problem to solve using knowledge� based techniques� the rst stage of KADS analysis is to identify the type of task which is carried out in order to select a generic inference structure� Fortunately� the obvious �chicken and egg situation which arose here was circumvented by reading an early KADS report �Breuker�� ������ which suggests that the appropriate task type here is assessment � that is� assessing how well each of the task types matches the task under consideration� and then selecting the one which matches most closely� The student therefore designed and implemented a KBS which used an assessment approach to the identi cation of task types� KADS� generic inference structure for assessment tasks �Figure �� recommends that assessment is carried out by abstracting �i�e� generalising� key features of a particular problem case description� and specifying the preferred value of these features from a system model which represents the �ideal world � The features of the case are then matched 
against the preferred features to determine the degree of acceptability of the case� For this task� the �problem case is an expert task� and the �ideal world is �one of� a set of generic inference structures� The generic inference structure for assessment tasks was therefore adapted and instantiated in the ways described below the resulting inference structure is shown in Figure �� Case description System model Norms Decision class Abstract case description match abstract specify Figure �� Generic inference structure for assessment tasks In order to instantiate this generic structure to problem in hand� the student was required to make some alterations to the generic inference structure � The key features of an expert task were obtained by asking the user� rather than by abstracting from a detailed description� The abstraction inference was therefore replaced by an obtain step�� �In CommonKADS� obtain is considerd to be a transfer task rather than an inference step� Transfer tasks are represented by rounded rectangles� � � The set of inference structures goes through a two�stage speci cation the rst stage determines the features which must be asked of the user� and the second stage speci es parameters of these features which can be compared against the user�s replies� � A �feedback from the di�erences discovered to the set of inference structures under consideration is explicitly recorded in the problem�speci c inference structure� user’s knowledge about expert task differences parameters expected (with weightings) features of structures under consideration set of inference structures under consideration user-supplied parameters refine compare specify-2 specify-1 obtain Figure �� Instantiated inference structure for assessing inference structures The resulting system� known as SEXTANT�� asks users to identify inputs of � Selection of EXpert TAsks by Nature of the Task � a task� outputs of a task� and knowledge roles which exist in the problem solving process� it then matches this information against the corresponding attributes of each generic inference structure in order to attach a likelihood weighting to each inference structure� As weightings increase or decrease� some inference structures are removed from consideration until only one �or a few� are left� The remaining inference structure�s� are then recommended to the user� As the project progressed� it became obvious that an alternative technique for identifying task types could be developed� based on the taxonomy of task types shown in Figure �� By starting at the topmost node in the taxonomy� it is possible to traverse the taxonomy by asking a single question at each node to decide which is the most appropriate subcategory for the current problem� For example� at the topmost level� the following question could be asked � Does the task involve �� Establishing unknown properties or behaviour of an object within the domain� �� Composing a new structural description of a possible object within the domain� �� A combination of the above� If the rst answer is chosen� then the most appropriate subcategory is System Analysis tasks� if the second� then System Synthesis� if the last� then System Mod� i�cation is the most appropriate task type� By de ning suitable questions for each non�leaf node in the taxonomy� it becomes possible to identify task types simply by answering a series of questions� E�ectively� the taxonomy is transformed into a decision tree� and the identi cation of task types becomes a comparatively simple classi cation task� 
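The "decision tree" reading of the taxonomy described above can be illustrated with a small sketch. Only the topmost question is taken from the text; the deeper branches are placeholders, and this is not SEXTANT's actual CLIPS implementation.

```python
# Illustrative sketch of traversing the task-type taxonomy as a decision tree.
# Node contents below the top level are placeholders invented for the example.
TAXONOMY = {
    "question": ("Does the task involve (1) establishing unknown properties or "
                 "behaviour of an object, (2) composing a new structural "
                 "description of a possible object, or (3) a combination of the above?"),
    "answers": {
        "1": {"task_type": "System Analysis"},       # a full tree would branch further here
        "2": {"task_type": "System Synthesis"},
        "3": {"task_type": "System Modification"},
    },
}

def identify_task_type(node, ask):
    """Walk the tree, asking one question per non-leaf node.
    `ask` maps a question string to the chosen answer key."""
    while "task_type" not in node:
        answer = ask(node["question"])
        node = node["answers"][answer]
    return node["task_type"]

# Example run with a canned answer instead of an interactive user:
print(identify_task_type(TAXONOMY, lambda q: "1"))   # -> "System Analysis"
```

In the full SEXTANT system, a user who cannot answer one of these questions is handed over to the assessment component, which weighs the remaining candidate inference structures instead of relying on a single branching choice.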
The nal version of the SEXTANT system incorporates both the �decision tree approach and the �assessment approach� Novice users of KADS can progress through the decision tree� however� if they are unable to answer questions in the decision tree� the assessment approach takes over� using the remaining candidates from the decision tree as the set of inference structures on which assessment is performed� Experienced users of KADS should be able to specify a relatively small set of task types at the outset� and so will only need to use the assessment part of the SEXTANT system in order to support them in their nal decision� The use of SEXTANT also forces knowledge engineers to consider the nature and the dependencies of each inference step carefully� which should provide assistance in the task of instantiating the chosen generic inference structure to the actual task being performed� SEXTANT was implemented in CLIPS ���� and is therefore capable of running on a range of hardware� � ��� Results This project demonstrated that it was feasible to provide guidance on task selec� tion� using either an assessment�based approach or a relatively simple decision tree approach� The creation of appropriate questions for the decision tree required con� siderable thought about the nature of the choices being made at each choice point� The questions which were generated were reasonably consistent in their format� which suggests that the tasks in the library form a coherent set� It is not clear that the tasks form a complete set of all possible expert tasks� however� and even those tasks which are in the library do not always have an associated generic inference structure� It is likely that more work is needed on these tasks types � particularly system synthesis and system modi cation tasks � in order to produce a complete set of knowledge based tasks �cf� �Tansley � Hayball� ����� �Kingston� ������� Since this project was performed� the CommonKADS project has rede ned generic inference structures� by simplifying the basic structures in the library and providing extensive guidance on con guring the generic structures to the require� ments and features of a particular task �L�ockenho� � Valente� ������ This alter� ation has greatly increased the scope for de ning many expert tasks� as variations of a smaller number of �basic tasks � It has also reduced the utility of the �as� sessment approach in SEXTANT� because the di�erences between the simpli ed inference structures in the library now require less detailed analysis� and because the con guration process requires users to think deeply about the nature of the inferences performed in the task� However� SEXTANT�s �decision tree approach is still useful� indeed� it could usefully be extended to incorporate guidance on con� guring inference structures� since this guidance consists of a set of questions which are used to recommend particular alterations to a basic inference structure� These �con guring questions could therefore be considered to be extensions to the lowest levels of the decision tree� The �decision tree component of SEXTANT has been used on other KADS� related projects� including the projects described later in this paper� � Selecting appropriate AI techniques The second project was intended to help knowledge engineers select appropriate knowledge representation and inference techniques when constructing a KBS design �MacNee� ������ ��� The task Once a task type has been identi ed� the acquired knowledge can be analysed by building the KADS 
expertise model� This consists of � � an inference structure� con gured and instantiated to represent the reasoning processes which take place in the task under consideration� � a set of domain models� identifying key concepts in the domain and the rela� tionships between them� � a task structure� enforcing an ordering on the inference steps�� The resulting expertise model must then be transformed into a program speci� cation� this is the task of the KADS design phase� The approach recommended for the CommonKADS design phase prescribes a ��stage design process� and includes suggested approaches for modelling the rst two phases �van de Velde et al� ����� � Application design typically consists of a conceptual decomposition of the expertise model into a number of functional units and�or data objects� � Architecture design requires decisions on whether to use rules� objects� or other representational techniques� for di�erent parts of the application design� � Platform design matches these chosen techniques with the facilities and environment o�ered by the chosen programming tool� with consequent alter� ations to the architecture design and�or the programming tool� This approach has been used successfully on some projects� and appears to represent a su�ciently detailed breakdown of the design process�� however� unless a strongly prescriptive top�down approach to modelling is being used� there is little speci c guidance on the design process� The aim of the project described in this section was to elicit and acquire some guidelines for the production of an architectural design� The approach used was based on the probing questions approach� developed at Rome Air Force Base� USA �Kline � Dolins� ����� and further developed at AIAI �Inder et al� ������ in which a KBS designer is asked a number of questions about the analysed knowledge� with the answers being used to produce some recommendations of suitable representa� tional techniques� The general format of the questions is if a certain feature exists in the analysed knowledge then consider using a particular knowledge representation� or imple� mentation technique� The designer is asked whether the if condition is true if it is� the recommen� dation supplied by the latter part of the �probing question is added to the set of recommendations� An example of a probing question might be �See �Wielinga et al� ���� for a fuller description of the expertise model� �These � phases approximately correspond to the stages of functional decomposition� be� havioural design and physical design speci ed in the original KADS project� � if the problem�solving task is such that a pre�enumerated set of so� lutions can be established �as distinct from the type of task in which solutions are constructed as a result of the satisfaction of constraints� then use goal�driven reasoning weighting� � else use data�driven reasoning weighting� � It can be seen that probing questions are e�ectively heuristics for knowledge engineers� encoded in a rule�based format� The student�s task was to acquire prob� ing questions �either by eliciting knowledge from experienced knowledge engineers� or by reading the available literature�� to collate the resulting collection of heuris� tic rules into a structure of some kind� and then to implement a knowledge based system to run these rules� ��� The project� acquisition and analysis The student chose to concentrate his e�orts on eliciting knowledge from experts� Three experts were used� each of whom was interviewed �or asked to make 
extensive comments on documents sent by electronic mail� on several occasions� Knowledge elicitation techniques used included introductory interviews� card sorting� and tri� adic comparison of actual KBS systems� One of the key results of knowledge elicitation was the identi cation of a number of functional requirements and design features for knowledge base systems� Many functional requirements relate to the KBS� need to model the domain objects and their relationships� and the need to model the inferences which are made about these domain objects� These requirements may lead to various needs for example� there may be requirements for certain knowledge representations� for the ability to represent uncertainty in knowledge� and for the ability to generate and examine large numbers of solutions� Design features are knowledge representation and in� ference techniques used to satisfy the functional requirements� Examples include data�driven reasoning� rules� semantic networks� and blackboard architectures� There is also a sizeable group of functional requirements� which are concerned with the capabilities of the KBS environment and with producing a smooth inter� action between the user and the KBS� These requirements may specify a need for dialogue with and explanations to the user� a need for a high level of computational e�ciency in the program� or a need to consider how �or whether� to present the user with large numbers of possible solutions� It was therefore decided that these envirnomental features would also be considered as design features� The categorisation of functional requirements and design features can be seen in Figure �� Note that �thoughts of God is intended to be a catch�all category for ill�de ned subjective in uences� such as design for elegance� or parsimonious design� �� expert knowledge Knowledge of KBS environment and user/KBS interaction (0) ‘THOUGHTS OF GOD’ (1a) KNOWLEDGE STRUCTURE -- (a): domain objects and relationships (1b) KNOWLEDGE STRUCTURE -- (b): inferences and generic task (2) UNCERTAINTY IN KNOWLEDGE (3) SOLUTIONS (4) DATA (5) DIALOGUE AND EXPLANATION (6) COMPUTATIONAL EFFICIENCY (7) DEVELOPMENT: Analysed construction, maintenance, and CATEGORIES OF DESIGN FEATURE expansion (8) SAFETY CRITICALITY (A) DEPTH-OF-REASONING AND ARCHITECTURE -- shallow, model-based, blackboard (B) KNOWLEDGE REPRESENTATION STRUCTURES -- rules, frames, OOP, networks, ... 
(C) INFERENCE TYPES -- goal-driven, data-driven (D) CONTROL OF FLOW OF INFERENCE: search strategy, constraints on search, meta-control (E) HANDLING OF UNCERTAINTY IN KNOWLEDGE AND OF INCOMPLETENESS AND INACCURACY OF DATA (F) USER INTERFACE: KBS input/output (G) KNOWLEDGE ACQUISITION CATEGORIES OF FUNCTIONAL REQUIREMENT Figure �� Categories of functional requirements and design features The next stage was to create a KADS expertise model to represent all the ac� quired knowledge from the di�erent viewpoints speci ed by KADS� In this project� the domain level of the model of expertise was developed rst� it was decided that the domain level consisted of the various functional requirements and design fea� tures� Once the domain level had been completed� the inference structure was developed� It was decided that the task type being performed was heuristic classi� �cation� The generic inference structure for heuristic classi cation and the instan� tiated inference structure which forms part of the expertise model can be seen in Figure �� It can be seen that the level of abstraction of the probing questions varies considerably� the conditions might be based on general knowledge of the operation of KBS systems� or on speci c knowledge of the task under consideration� and the recommendations might be to use a particular design feature� or to use one of a category of design features� Finally� a task structure was developed� which speci ed that the KBS should continue to ask questions until all possible relevant questions had been asked� in order to ensure that all functional requirements are considered� �� DESIGN FEATURE MATCH ANALYSED EXPERT KNOWLEDGE KNOWLEDGE OF KBS ENVIRONMENT CLASS FEATURE DESIGN REFINE FUNCTIONAL REQUIREMENTS Inference structure -- adapted and instantiated AND INTERACTION MATCH SOLUTION ABSTRACTION SOLUTION ABSTRACTION PROBLEM ABSTRACT PROBLEM REFINE Inference structure -- generic: heuristic classification Figure �� Generic and instantiated inference structures for a heuristic classi cation task ��� The project� design As the design is a relatively late stage in the development of the KBS� the student was able to apply the elicited probing questions to his own design problem� The re� sults of this exercise can be seen in an appendix to the project thesis �MacNee� ������ in which the nal implemented system was �retrospectively� applied to itself� The main resulting recommendations are summarised in the following table �� Design feature Recommendation Shallow reasoning Strong Rules Strong Goal driven reasoning Moderate Depth rst search Moderate Truth maintenance Moderate Certainty factors Moderate !Canned� text for explanations Moderate Data driven reasoning Weak Model�based reasoning Strong negative It is hardly surprising that a knowledge base which is based around heuristic probing questions should elicit strong recommendations for shallow reasoning and for the use of rules� The justi cation for some of the other recommendations is less obvious� this was noted during the project� and it was therefore decided that the implemented system would need to be able to justify its reasoning to the user in a clear and coherent manner�� An example of an explanation� based on the table above� is that the recommendation for depth� rst search arose because of the need to ask questions of the user� and because the student stated that the natural ow of dialogue was to ask detailed questions about one subject before asking general questions about another subject� Depth� rst search supports 
this mode of reasoning� and therefore makes it easy to ask questions in a natural order� ��� The project� platform design � implementation Having performed architectural design� the nal stage of design must be performed the matching up of the recommended design features with the chosen programming tool� The decisions required at this stage are described in more detail in section �� but for now� it is su�cient to note that the the probing questions should be consid� ered as a starting point for tool selection� not as a prescription� For this project� a choice of two programming tools was available CLIPS and Prolog� The table above indicates a stronger recommendation for goal�driven reasoning than for data� driven reasoning� which would favour Prolog� however� bearing in mind that the probing questions are heuristics� the factors contributing to this recommendation were examined in more detail� It turned out that the recommendation for goal� driven reasoning was based on a combination of three weak contributing factors� and one of these factors is actually not applicable in this case� because the system is designed to make the KBS ask all possible questions� instead of stopping when a single solution has been found� The recommendation for goal�driven reasoning was therefore downgraded to !weak�� �This decision accounts for the recommendation for �canned� text in the above table� �� CLIPS and Prolog are both equally strong in most of the other recommended de� sign features �although CLIPS does provide its own facilities for truth maintenance� which Prolog does not�� the conclusion is that� from the viewpoint of providing ad� equate design features� CLIPS and Prolog are both equally recommended for this project� The choice of tool was therefore heavily in uenced by other factors� such as the student�s previous experience� The tool chosen was CLIPS� a largely rule�based tool whose primary reasoning mechanism is depth� rst forward chaining� CLIPS also provides facilities for truth maintenance and certainty factors� Using CLIPS� the PDQ system �which� in this context� stands for �Probing Design Questions � was implemented in approximately � weeks� ��� Results PDQ is a workable system which produces recommendations which are helpful� although heuristic� as the above example concerning goal�driven reasoning shows� The questions require the knowledge engineer to have a good understanding of the problem� and some preliminary ideas about possible designs� for example� one question asks if the problem �can be subdivided into �� or more distinct modules � This suggests that PDQ is most approriate for knowledge engineers who have some experience of building knowledge based systems� Some further work has been done on the set of probing questions since the PDQ system was completed� in an attempt to separate those questions which produce abstract recommendations from those which recommend more speci c design fea� tures� This process has lead to the transferral of a small set of probing questions to the rst stage of the design process �application design�� because some design features �such as blackboard architectures or constraint�based programming� have such a profound e�ect on the architecture of a program that they must be con� sidered as alternative ways of performing the initial decomposition of the analysed knowledge� The probing questions are therefore able to provide some guidance on whether to perform a structure�preserving design� or whether to make use of an es� tablished AI paradigm� This 
decision is another of the key decisions for knowledge engineers listed in the introduction to this paper� It is possible that the �prob� ing questions technique could be extended to provide extensive support for this important decision� � Choosing a suitable implementation tool The nal project described in this paper aimed to produce guidance on the se� lection of a suitable shell or toolkit for implementing a knowledge based system �Robertson� ������ �� ��� The task The two projects described above have shown that knowledge�based guidance can be provided for certain areas of KADS modelling� However� the knowledge engineer�s decision making does not end when the �probing questions have been answered and the KADS design has been completed� as section ��� suggests� the choice of a programming tool requires careful consideration� taking into account both the recommendations of the probing questions and other factors external to the knowledge representation requirements� The requirements for this project were that the student should identify factors important to the selection of a KBS building tool� and develop a program which would recommend appropriate tools for a project� Given that the PDQ system had already been developed� it seemed sensible to use PDQ�s recommendations as a starting point for the identi cation of important factors� The student also referred to a number of books on knowledge engineering �e�g� �Price� ������ which gave their own advice on tool selection� ��� The project� acquisition Knowledge acquisition for this project was initially performed by reading various textbooks� and examining the output of the PDQ system� The results of this work were compiled� and represented graphically using a simple node and arc represen� tation� These diagrams� and associated text� were used as input to a knowledge elicitation interview with an experienced knowledge engineer� This interview pro� duced further useful information� which was used to alter and extend the diagrams� Finally� the student was directed to investigate a set of expert system building tools which was representative of all the categories which he had identi ed� The major result of the knowledge acquisition was two classi cations of KBS building tools � The rst classi cation divided tools into shells� toolkits and AI languages� These categories were further subdivided� shells were divided into those which evaluate rules by pattern matching �e�g� Xi Plus� OPS�� early versions of CLIPS� and those which are e�ectively �rule networks �e�g� Crystal��� Toolkits were subdivided into top�range toolkits �such as ART and KEE� and mid�range toolkits �such as Nexpert Object� ProKappa and Kappa PC�� Languages were not subdivided� since there are comparatively few popular languages for implementing knowledge based systems� � The second classi cation was based on the historical background of the tools� tools were classi ed as belonging to the �ART Camp �e�g� ART�IM� CLIPS� �This distinction was rst de ned in �Inder et al� ��� � �� ECLIPSE� or the �KEE Camp �including ProKappa and Kappa PC�� Tools in the same !camp� have similar features to each other� since many of them are derived from similar programming philosophies or tools� for example� all tools in the ART camp o�er e�cient forward chaining rules based on an implementation of the RETE algorithm� but are comparatively weak on backward chaining� because the algorithm used o�ers almost no support for backward chaining� whereas all tools in the KEE camp have good support for 
object�oriented programming� but comparatively ine�cient rules�� The other main conclusion from the knowledge acquisition phase �largely drawn from �Rothenberg� ������ was that the importance of particular features of a pro� gramming tool is dependent on the phase of the project� For example� at the very early stages of the project� rapid prototyping might be used to support knowledge acquisition� to investigate the feature of the tool� or to help the knowledge engineer learn to use the tool� at this stage� the training and debugging features of the tool are very important� However� when the nal system is delivered� the training and debugging tools are of little or no importance� whereas the speed and e�ciency of the execution of the program become very important� The student identi ed ve phases of program development �exploration� prototyping� development� elding and operation� and seven features of a tool �ease of use� e�ciency� extendability� exibility� portability� reliability and support� which are more or less desirable at various phases� For further details of the proposed relationship between phases of development and tool features� see �Rothenberg� ����� or �Robertson� ������ ��� The project� analysis � � � Domain level analysis The domain level of the expertise model comprised the classi cations identi ed in the knowledge acquisition phase� plus a selection of factors which in uence the choice of tool �such as the cost of the tool� the required platform for development� and the experience of programmers with the tool�� It also de ned a prototype !frame�� with a number of attributes �or� in the terminology of CommonKADS� a concept� with a number of properties� for de ning tools� This frame proved remarkably di�cult to de ne� because of di�culties in distinguishing properties of tools from tool�related concepts� For example� there was considerable discussion on whether the frame should have �forward chaining as a property �with values such as rete� procedural or none� or whether it should have �reasoning types as a multiple�valued property� with forward chaining being one of the values of �See �Mettrey� ���� for some benchmarking tests� Note� however� that the vendors of KAPPA� PC claim considerable performance improvements in newer versions of KAPPA�PC� which have appeared since Mettrey�s study was done� �� this property� This discussion led to an important obaservation concerning KADS� which is described in section ���� � � � Inference level analysis When the inference level is considered� it can be seen that this project is similar in structure to the PDQ project PDQ identi ed functional requirements which had to be mapped to design features� whereas this project uses various design features and other requirements to select a suitable tool� or class of tool� However� while the task of PDQ was to identify all relevant design features� the task required when selecting a tool is to select the best tool for the task� This requires comparison of a set of possible tools against all relevant factors� and assessment of how well each tool matches the overall set of criteria� The most appropriate task type in the KADS taxonomy is therefore assessment� rather than heuristic classi cation� By the time this project was carried out� some guidance had been published by members of the CommonKADS consortium �L�ockenho� � Valente� ����� on the con guration of inference structures to match particular assessment tasks� Starting with a minimal generic inference structure �Figure ��� this guidance 
consists of a set of questions which are asked of the user in order to decide which of the inference steps relevant to assessment tasks are actually performed in this task� The results of these questions are used to devise a problem�speci c version of the inference structure �e�g� Figure ��� which is then instantiated to the problem in hand by changing the generic labels on the knowledge roles into problem�speci c labels� The nal result �Figure �� forms the inference level of the model of expertise� A worked example of the con guration of an inference structure can be found in �Kingston� ������ System ModelCase Description Decision Class Match Cases Figure � Minimal generic inference structure for assessment tasks �� Abstract Case Description Measurement System Decision Classes Measure Case Set of Norms Specify Set of Norms Specify Conflict Resolution Conflict Resolution Criteria Resolve Conflicts Decision ClassDecision Class Resolve Conflicts Conflict Resolution Criteria Specify Conflict Resolution Specify Set of Norms Set of NormsMeasure Case Decision Classes Measurement System Abstract Case Description Abstract Case Description Measurement System Decision Classes Measure Case Set of Norms Specify Set of Norms Specify Conflict Resolution Conflict Resolution Criteria Resolve Conflicts Decision ClassDecision Class Resolve Conflicts Conflict Resolution Criteria Specify Conflict Resolution Specify Set of Norms Set of NormsMeasure Case Decision Classes Measurement System Abstract Case Description Figure �� Con gured inference structure for selecting a KBS tool �� Desired Product Features Available Products Measure Case Specify Set of Norms Available Product Features Chosen Products Conflict Resolution Criteria Specify Conflict Resolution Resolve Conflicts Ideal Product Figure �� Instantiated inference structure for selecting a KBS tool It can be seen that the task of selecting a KBS tool requires comparison of the features of available tools against the features desired for a particular project� this produces a shortlist of suggested tools� which is then further re ned to choose the ideal tool for the job� ��� The project� design and implementation The con gured inference structure contributed greatly to the quick and accurate development of an expertise model� Once the model was complete� design was per� formed� with the assistance of the PDQ system� PDQ produced very strong recom� mendations for using both rules �to represent the applicability of di�erent factors� �� and frames �to represent individual tools�� it also produced a strong recommen� dation for goal�driven reasoning� and a moderate recommendation for data�driven reasoning� At this point� the student was o�ered the choice of two tools CLIPS ��� or Sic� stus Prolog� The choice was restricted to two tools because these tools were easily available� and because the student had some familiarity with both these tools� the latter factor was important because of the tight timescales of the project� The lim� ited choice simpli ed the decision process for reasons described below� Prolog was chosen as the most appropriate tool� The system was therefore implemented using Prolog� using a goal�driven approach which investigated several di�erent categories of tool features one by one� The details of fteen di�erent KBS tools were also implemented� using a predicate called �frame which was de ned to allow the rep� resentation of object�like structures� The features of each tool were then matched against the features required by the user� the 
relative importance of each feature in the overall assessment was scaled according to the phase of development for which the tool would be used� and according to an importance value entered by the user� The nal list of tools� ordered by their overall score� was then displayed to the user� In order to test the system� it was retrospectively applied to the choice between CLIPS and Prolog for the student�s own project� The results of the consultation showed that Sicstus Prolog and CLIPS were among the more highly favoured tools� although they were not at the top of the list� however� most of the tools which were preferred were considerably more expensive� The main factors which led to Prolog being preferred were the recommendation for goal�driven reasoning from the probing questions� the problems in integrating rules and objects in version � of CLIPS� � and the student�s greater degree of familiarity with Prolog� which led the system to believe that features which were unavailable in Prolog �such as frames� could easily be programmed� It therefore seems that the choice of Prolog was the best decision from the available options� ��� Results This system can be considered to be a sophisticated prototype� Its reasoning ca� pabilities are wide�ranging �over many di�erent tool features and aspects of KBS development� and its algorithm for assessing tools provides results which closely resemble the opinions of expert knowledge engineers� in short� the heart of the sys� tem is su�ciently good to be commercially usable� It might nd favour as a tool for organisations to assess the adequacy of their existing tools for a proposed project� rather than to recommend a particular too� for a project which is already under Version � of CLIPS included a full object�oriented programming system� but these objects could not be pattern matched in the conditions of rules� Version � of CLIPS has introduced this facility� �� way� However� its knowledge base needs to be extended to include more tools� and it would bene t from improvements to its user interface and e�ciency� The use of KADS modelling and the probing questions has led to a knowledge base architecture which is well supported by documentation and justi cation more importantly it can be extended easily� because new tools can be added to the knowledge base simply by de ning a new !frame�� Ease of maintenance is a vital feature for a system which provides expertise about a domain which changes as rapidly as this one� � Discussion The three projects described above have produced knowledge based systems which provide guidance on particular decisions which users of CommonKADS have to make� The SEXTANT system assists knowledge engineers in the early stages of knowledge analysis� when they are selecting a suitable generic inference structure from CommonKADS� library of inference structures� PDQ provides suggestions for the architectural design of a knowledge based system� and the tool selection system provides guidance on matching the recommendeddesign with available shells or toolkits� The guidance which has been produced seems to be useful PDQ in particular has been used by later generations of students� and on commercial projects� It is important to realise that these systems are not intended to take over the decision�making role of a knowledge engineer� The guidance which these systems provide is heuristic� in some cases� careful analysis might suggest that the guidance should not be followed �see section ��� for an example�� The guidance also requires an 
understanding of knowledge engineering terminology, as well as an in-depth understanding of the problem to be solved, in order to make the best use of the advice. The key benefit of the guidance is in assisting knowledge engineers by providing a framework for performing the required analysis, just as CommonKADS provides a framework for representing knowledge. It does this by identifying the questions which need to be asked in order to make a particular decision, as well as providing suggested answers to those questions and, in some cases, justification for those answers.
The projects also produced new insights about the modelling techniques recommended by KADS and CommonKADS. The students found that using KADS helped them to think clearly about their knowledge bases and to identify specific areas where problems occurred. However, KADS and CommonKADS are intended to assist rather than to replace the judgement of a knowledge engineer; the students all discovered that the KADS approach did not provide a 'magic wand' solution to all the problems which they encountered. For example, the SEXTANT project originally considered the choice of an appropriate task type to be an assessment task, but it was later discovered that a simpler approach, based on a decision tree, could be used to accomplish most of the job. Another example can be seen in the domain modelling for the tool selection project, where the student had difficulty in deciding whether 'forward chaining' should be a property or a value of a property. The crux of the problem was that neither approach could be ruled incorrect at a theoretical level; the choice between them depended on purely pragmatic constraints, such as the number of possible values of the property, and the number of possible attributes of the concept. In other words, the definition of concepts and properties in a domain appears to be context-dependent. This observation has far-reaching implications for CommonKADS domain modelling, in that it suggests that there is more than one 'correct' model of any chosen domain of knowledge. There is therefore scope for further research on CommonKADS, identifying different styles of domain modelling and the circumstances in which different approaches are most appropriate. In terms of a single project, the choices of classifications at the domain level are unlikely to affect the functionality of the final system very much, although they may affect the ease of achieving that functionality.
Summary
In summary, CommonKADS provides a declarative framework for representing knowledge, which can be used to promote clearer thinking and structuring of a knowledge base. CommonKADS is most appropriate for knowledge engineers who have enough experience to be able to make their own decisions about knowledge analysis or design in those exceptional circumstances when CommonKADS' recommendations prove to be unhelpful; CommonKADS itself is sufficiently complex that knowledge engineers require experience with and/or guidance on the KADS approach to use it to its full extent. The projects described in this paper provide guidance for some of the key decisions which must be made when using CommonKADS. This guidance is aimed at those people who would be capable of using CommonKADS; it offers heuristic advice, which is normally good advice but may need to be overridden in exceptional circumstances. The results of these projects suggest that it is feasible to produce knowledge-based guidance for users of KADS or CommonKADS, as well as producing some new
insights about domain modelling and task selection. It is hoped that the results of the projects above can be amalgamated with other projects which are producing guidance on CommonKADS (for tasks such as configuring inference structures (Löckenhoff & Valente) and converting CommonKADS' Conceptual Modelling Language to its Formal Modelling Language (Aben)), in order to make CommonKADS a more powerful tool for structuring knowledge engineering.
Acknowledgements
I would like to thank Dr Dave Robertson and Dr Mandy Haggith of the Department of Artificial Intelligence, University of Edinburgh, Ian Filby of AIAI, and Dr Robert Inder of the Human Communications Research Centre, University of Edinburgh, for providing the students with supervision and expertise. I would also like to thank Dr Robertson, Dr Haggith and Robert Rae of AIAI for their comments on drafts of this paper.
References
[Aben] Aben, M. Formal Methods in Knowledge Engineering. Unpublished Ph.D. thesis, SWI, University of Amsterdam. The relevant chapter is also available as a CommonKADS (KADS-II) project report.
[Breuker] Breuker, J. A. Model-driven Knowledge Acquisition. University of Amsterdam and STL, ESPRIT project deliverable.
[de Hoog et al.] de Hoog, R., Martil, R., Wielinga, B., Taylor, R., Bright, C. and van de Velde, W. The CommonKADS Model Set. ESPRIT Project KADS-II report, University of Amsterdam and others.
[Inder et al.] Inder, R., Aylett, R., Bental, D., Lydiard, T. and Rae, R. Study on the Evaluation of Expert Systems Tools for Ground Segment Infrastructure: Final Report. AIAI Technical Report, AIAI.
[Kingston] Kingston, J. K. C. Re-engineering IMPRESS and X-MATE using CommonKADS. In Expert Systems, British Computer Society, Cambridge University Press. Also available as an AIAI technical report.
[Kingston] Kingston, J. K. C. Design by Exploration: A Proposed CommonKADS Inference Structure. Submitted to Knowledge Acquisition.
[Kline & Dolins] Kline, P. J. and Dolins, S. B. Designing Expert Systems: A Guide to Selecting Implementation Techniques. Wiley.
[Krueger] Krueger, A. Classification of Expert Tasks: the SEXTANT System. Unpublished M.Sc. thesis, Dept of Artificial Intelligence, University of Edinburgh.
[Löckenhoff & Valente] Löckenhoff, C. and Valente, A. A Library of Assessment Modelling Components. In Proceedings of the European KADS User Group Meeting, Siemens, Munich.
[MacNee] MacNee, C. PDQ: A Knowledge-Based System to Help Knowledge-Based System Designers to Select Knowledge Representation and Inference Techniques. Unpublished M.Sc. thesis, Dept of Artificial Intelligence, University of Edinburgh.
[Mettrey] Mettrey, W. Expert Systems and Tools: Myths and Realities. IEEE Expert.
[Price] Price, C. Knowledge Engineering Toolkits. Ellis Horwood. This book concentrates on available toolkits, and on how to go about selecting an appropriate toolkit. While some of the tools it mentions are now somewhat dated, the principles of tool selection are still valid. It also gives clear examples of implementation.
[Robertson] Robertson, S. A KBS to Advise on Selection of KBS Tools. Unpublished M.Sc. thesis, Dept of Artificial Intelligence, University of Edinburgh.
[Rothenberg] Rothenberg, J.
Expert System Tool Evaluation. In Guida, G. and Tasso, C. (eds.), Topics in Expert System Design. North-Holland.
[Schreiber et al.] Schreiber, A. Th., Wielinga, B. J. and Breuker, J. A. (eds.). KADS: A Principled Approach to Knowledge-Based System Development. Academic Press, London.
[Tansley & Hayball] Tansley, D. S. W. and Hayball, C. C. Knowledge-Based Systems Analysis and Design: A KADS Developer's Handbook. Prentice Hall.
[van de Velde et al.] van de Velde, W., Duursma, C. and Schreiber, G. Design Model and Process. Unpublished KADS-II report, Vrije Universiteit Brussel.
[Wielinga] Wielinga, B. Expertise Model: Model Definition Document. CommonKADS (KADS-II) Project Report, University of Amsterdam.
[Wielinga et al.] Wielinga, B., Van de Velde, W., Schreiber, G. and Akkermans, H. The KADS Knowledge Modelling Approach. In Proceedings of the Japanese Knowledge Acquisition Workshop (JKAW).
[Wielinga et al.] Wielinga, B., van de Velde, W., Schreiber, G. and Akkermans, H. Expertise Model Definition Document. CommonKADS Project Report, University of Amsterdam.
work_2l47wd4umbgptlckvrzotnbl3q ----
Transforming Standalone Expert Systems into a Community of Cooperating Agents
N. R. Jennings, L. Z. Varga, Dept. of Electronic Engineering, Queen Mary and Westfield College, University of London, Mile End Road, London E1 4NS, UK. email: n.r.jennings@qmw.ac.uk l.varga@qmw.ac.uk
R. P. Aarnts, Volmac Nederland B. V., Daltonlaan 300, 3584 BJ Utrecht, The Netherlands. email: rob_aarnts@eurokom.ie
J. Fuchs1 and P. Skarek, PS Division, CERN, 1211 Geneve 23, Switzerland. email: joachim@wgs.estec.esa.nl (Fuchs); pskpsk@cernvm.cern.ch (Skarek)
ABSTRACT
Distributed Artificial Intelligence (DAI) systems in which multiple problem solving agents cooperate to achieve a common objective are a rapidly emerging and promising technology. However, as yet, there have been relatively few reported cases of such systems being employed to tackle real-world problems in realistic domains. One of the reasons for this is that DAI researchers have given virtually no consideration to the process of incorporating pre-existing systems into a community of cooperating agents. Yet reuse is a primary consideration for any organisation with a large software base. To redress the balance, this paper reports on an experiment undertaken at the CERN laboratories in which two pre-existing and standalone expert systems for diagnosing faults in a particle accelerator were transformed into a community of cooperating agents. The experiences and insights gained during this process provide a valuable first step towards satisfying the needs of potential users of DAI technology - identifying the types of changes required for cooperative problem solving, quantifying the effort involved in transforming standalone systems to ones suitable for cooperation and highlighting the benefits of a cooperating systems approach in a realistic industrial application.
KEYWORDS: Distributed Artificial Intelligence, Multi-Agent Systems, Cooperating Expert Systems, Fault Diagnosis, Particle Accelerator Control
1. Now working at esa/estec, WGS, Keplerlaan 1, NL-2200 AS Noordwijk, The Netherlands
INTRODUCTION
As computer hardware and software becomes increasingly powerful, so applications which used to be considered beyond the scope of automation come into reach.
To cope with these increased demands, software systems are becoming correspondingly larger and more complex. However the problems encountered in building these large systems are not simply scaled up versions of those faced when constructing small ones. Since the late 1960’s, when the “software crisis” was first noted, it has been realised that large systems require radically different techniques and methods. One paradigm for overcoming the complexity barrier is to build systems of smaller more manageable components which can communicate and cooperate1,2,3. Such a Distributed Artificial Intelligence (DAI) approach has several potential advantages. Firstly, divide and conquer has long been championed as a means of constructing large systems because it limits the scope of each processor. The reduced size of the input domain means the complexity of the computation is lower, thus enabling the components to be simpler and more reliable. Secondly, decomposition aids problem conceptualisation; many tasks appear difficult because of their sheer size. Other benefits include reusability of problem solving components, greater robustness in the case of component failure, speed up due to parallel execution, enhanced problem solving due to the combination of multiple paradigms and sources of information and finally increased system modularity1,3. In DAI systems, individual problem solving entities are called agents; agents are grouped together to form communities which cooperate to achieve the goals of the individuals and of the system as a whole. It is assumed that each agent is capable of a range of useful problem solving activities in its own right, has its own aims and objectives and can communicate with others. The ability to solve some problems alone (coarse granularity4) distinguishes components of DAI systems from the components of neural systems in which individual nodes have very simple states (either on or off) and only by combining many thousands of them can problem solving expertise be recognised. 3 Agents in a community usually have problem solving expertise which is related, but distinct, and which frequently has to be combined to solve problems. Such joint work is needed because of the dependencies between agents’ actions, the necessity to meet global constraints and the fact that often no one individual has sufficient competence to solve the entire problem alone. There are two main causes of such interdependence (adapted from Davis and Smith5). Firstly, when problem partitioning yields components which cannot be solved in isolation. In speech recognition, for example, it is possible to segment an utterance and work on each component in isolation, but the amount of progress which can be made on each segment is limited. Allowing the sharing of hypotheses is a far more effective approach6. Secondly, even if subproblems are solvable in isolation, it may be impossible to synthesize their results because the solutions are incompatible or because they violate global constraints. For instance when constructing a house, many subproblems are highly interdependent (eg determining the size and location of rooms, wiring, plumbing, etc.). Each is solvable independently, but conflicts which arise when the solutions are collected are likely to be so severe that no amount of work can make them compatible. It is also unlikely that global constraints (eg total cost less than £70,000) would be satisfied. 
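To make the point about global constraints concrete, the following small Python fragment (a toy illustration invented for this discussion, not part of the original work) shows how sub-solutions that are individually acceptable can still violate a shared constraint once they are combined:

# Toy illustration: independently produced sub-solutions can each be locally
# acceptable yet jointly violate a global constraint, which is why the solvers
# must interact during problem solving rather than only at synthesis time.
from typing import Dict

def violates_global_budget(sub_solutions: Dict[str, float], budget: float) -> bool:
    """Return True if the combined cost of all sub-solutions exceeds the budget."""
    return sum(sub_solutions.values()) > budget

# Each subproblem (rooms, wiring, plumbing) was solved in isolation ...
plan = {"rooms": 45_000.0, "wiring": 15_000.0, "plumbing": 14_000.0}

# ... but the synthesised result breaks the global constraint (total under 70,000).
assert violates_global_budget(plan, budget=70_000.0)

Once the sub-solutions have been fixed independently, no amount of local rework on any single one of them removes the violation.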
In such cases, compatible solutions can only be developed by having interaction and agreements between the agents during problem solving. It is the need for significant amounts of cooperation to achieve tasks and the relative autonomy of the agents to determine their own activities which distinguishes DAI from more conventional distributed systems work. Despite these potential advantages, there are still relatively few DAI applications working on realistic problems in real world domains7. One of the reasons for this is the mismatch between the needs of organisations who require solutions to their problems and the research objectives of the DAI community. Typically organisations which have problems amenable to a cooperating systems approach already possess computer systems in which they have invested substantial amounts of time and money. Naturally enough they want the return on this investment to be maximised, meaning the systems should be utilised until they become obsolete or significantly better alternatives become available. However, most work in DAI assumes that the agents have been 4 purpose built using tools and techniques designed solely for cooperative problem solving. Virtually no consideration is given to the process of incorporating pre-existing systems into a cooperating community - yet this must be a central concern if DAI is to leave the laboratory and progress to real applications. To help redress the balance, this paper reports on an experiment carried out at the CERN laboratory, under the auspices of the ARCHON project8,9, in which two standalone and pre- existing expert systems for diagnosing faults in a particle accelerator were transformed into a community of cooperating agents. The problems faced during this process and the insights which emerged are recounted as an important first step towards tackling the larger issue of providing a methodology for describing how pre-existing systems can be incorporated into a cooperating community. The framework used to facilitate the cooperative problem solving was GRATE10,11 which is described more fully in the following section. This paper also indicates the types of social interaction which can be expected between large, coarse grained agents - an important indication of the appropriateness of theoretical research into techniques for cooperation and coordination for real-size industrial applications. Finally because DAI (and AI in general) techniques are so rarely applied to real applications, this experiment offers valuable insights into the problems and constraints encountered during this process - such issues often fail to emerge when idealised problems or simulated environments are used12. GRATE: A GENERAL FRAMEWORK FOR COOPERATIVE PROBLEM SOLVING GRATE (Generic Rules and Agent model Testbed Environment) is a general framework for constructing communities of cooperating agents for industrial applications. GRATE agents have two major components - a cooperation and control layer and a domain level system (see figure 1). The domain level system solves problems such as detecting and diagnosing faults, proposing remedial activities and checking the validity of operator actions. These problems are expressed as tasks - atomic units of processing when viewed from the cooperation and control layer. The cooperation layer is a meta-level controller which operates on the domain level system; its 5 objective is to ensure that the agent’s activities are coordinated with those of others within the community. 
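The division of responsibilities can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions - the actual expert systems were written in KEE and Common Lisp, and the class and method names below are invented rather than being GRATE's interface - but it conveys the essential idea of a meta-level layer driving a pre-existing problem solver purely through named, atomic tasks:

# Illustrative sketch only: a cooperation layer that treats the domain level
# system as a black box offering named, atomic tasks. All names are hypothetical.
from typing import Any, Callable, Dict

class DomainLevelSystem:
    """Pre-existing problem solver exposing its functionality as atomic tasks."""
    def __init__(self) -> None:
        self._tasks: Dict[str, Callable[..., Any]] = {}

    def register_task(self, name: str, fn: Callable[..., Any]) -> None:
        self._tasks[name] = fn

    def start(self, name: str, **inputs: Any) -> Any:
        # The cooperation layer never sees how the task is solved internally.
        return self._tasks[name](**inputs)

class CooperationAndControlLayer:
    """Meta-level controller: decides what to run locally and what to share."""
    def __init__(self, name: str, domain: DomainLevelSystem) -> None:
        self.name = name
        self.domain = domain
        self.information_store: Dict[str, Any] = {}

    def perform(self, task: str, **inputs: Any) -> Any:
        result = self.domain.start(task, **inputs)
        self.information_store[task] = result   # keep results for possible sharing
        return result

# Usage: wrap a stand-in for a pre-existing diagnoser without modifying it.
domain = DomainLevelSystem()
domain.register_task("diagnose", lambda reading: "fault" if reading > 0.8 else "ok")
agent = CooperationAndControlLayer("bedes-like-agent", domain)
print(agent.perform("diagnose", reading=0.9))

In GRATE the meta-level wrapper is, of course, considerably richer; the point of the sketch is simply that all domain problem solving stays behind the task interface while coordination decisions are taken in the cooperation and control layer.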
It decides which tasks should be performed locally, determines when social activity is appropriate, receives requests for cooperation from other community members, and so on. GRATE’s clear delineation of domain problem solving and knowledge related to cooperation and control has several advantages. Firstly, it increases software reusability in that the cooperation layer can be deployed in multiple applications without having to disentangle the knowledge used to guide social activity from that used to solve domain level problems. Secondly, the domain and cooperation layers can be developed independently provided that they respect the interface definition.
Figure 1: Detailed GRATE Agent Architecture (the cooperation and control layer - communication manager, cooperation module, situation assessment module, control module, acquaintance models, self model and information store - sitting above the interface to the domain level system and its tasks, with inter-agent communication entering at the communication manager).
This is especially important with respect to pre-existing software because it places very few restrictions or constraints on the types of system which can be incorporated. Thus the domain level systems may be written on different hardware and software platforms, use different knowledge representation formalisms and have different control regimes13. By providing a standard interface to the domain level system, all of the underlying heterogeneity can be masked; enabling the cooperation layer to be used in a wide range of applications. The division also simplifies the domain level system because it can continue to act on the basis of local information; it need not be concerned with the activities of other agents because any influence they have on its behaviour will be exerted through the cooperation layer. The disadvantage of the delineation, with respect to pre-existing systems, is that the control regime and some interface interpretation commands may need to be written to enable the cooperation layer to exact the appropriate control - this issue is discussed further in the section describing the implementation details. Also for systems which are purpose built for cooperative problem solving in a particular environment, this architecture may involve creating an artificial barrier in its local control regime if the interface definition is too rigid. In GRATE communities control is completely distributed, there is no hierarchy and all agents are equal. Agents have a degree of autonomy in generating new activities and in deciding which tasks to perform next. A global controller would have been the easiest way of ensuring the cooperating community acted in a coherent way; with its knowledge of the goals, actions and interactions of all community members the controller could have ensured: that misleading and distracting information was not spread, that multiple agents did not compete for unshareable resources simultaneously, that agents did not unwittingly undo the results of each other's activities and that the same actions were not carried out redundantly. However because of the complexity of most industrial applications this approach was deemed inappropriate because:
• Bandwidth limitations make it impossible for agents to be constantly informed of all developments in the system5.
• The controller would become a severe communication and computational bottleneck and would cause the whole system to collapse if it failed15. GRATE’s cooperation and control layer has three main problem solving modules. Each module is implemented as a separate forward-chaining production system with its own inference engine and local working memory. Communication between the modules is via message passing. The rules built into each module are generic, they are applicable for controlling activities in a broad class of industrial applications. Some credence to this claim is given by the fact that GRATE has been applied to the domain of electricity transportation management10 and detecting overloads in a telecommunication network16, as well as the problem of diagnosing faults in a particle accelerator which is reported here. A more comprehensive description of GRATE can be found in reference 17. The control module is GRATE’s interface to the domain level system and is responsible for managing all interactions with it. This interaction is controlled through the following set of primitives: • From GRATE to the domain level system: • From the domain level system to GRATE: 8 The situation assessment module makes decisions which affect both of the other modules. It decides which activities should be performed locally and which should be delegated, which requests for cooperation should be honoured, how requests should be realised, what actions should be taken as a result of freshly arriving information and so on. It issues instructions to, and receives feedback from, the other modules. Typical requests to the cooperation module include “get information X” and “send out information Y to interested acquaintances”. Requests to the control module are of the form “stop task T1” and “start task T2”. The cooperation module is responsible for managing the agent’s social activities. The need for such activity is detected by the situation assessment module, but its realisation is left to this module. Three primary objectives related to the agent’s role in a social problem solving context are supported. Firstly, the cooperation module has to establish new social interactions (eg find an agent capable of supplying a desired piece of information). Secondly, the module has to maintain cooperative activity once it has been established, tracking its progress until successful completion (eg sending out relevant intermediate and final results to interested agents). Finally the module has to respond to cooperative initiations from other agents. The cooperation layer’s other components provide support for the activities of the main problem solving modules. The information store provides a repository for all the data which the underlying domain level system has generated or which has been received as a result of interaction with others. Acquaintance and self models are representations of other agents and of the local domain level system respectively. They describe the agent’s current problem solving state, the tasks it is able to solve, the goals it is working towards and so on - a fuller description is given in reference 10 and an illustration of their use in the CERN experiment is given in the next section. The agent models and the information store contain all the domain dependent data needed at the cooperation and control layer - thus enabling the rules of the problem solving modules to be application independent. Agents communicate with one another via a message passing paradigm. 
This form of 9 communication has several advantages over a shared memory approach (such as a blackboard18,19). Firstly, message passing has well understood semantics and offers a more abstract means of communication20. No hidden interactions can occur; so there is greater comprehensibility, reliability and control over access rights. Secondly, message passing makes fewer assumptions about system architecture. Finally, shared memory systems do not easily scale up. If only a single blackboard exists then it becomes a severe bottleneck and if several exist the semantics revert to message passing21. DIAGNOSING FAULTS IN A PARTICLE ACCELERATOR This section describes the diagnosis problem of accelerator operation, details two pre-existing expert systems used for this task which are running at the CERN laboratories and outlines the potential benefits of cooperation in this application. Particle Accelerator Operation The CERN Proton Synchrotron (PS) accelerators are one of the world’s most sophisticated high energy research tools. The PS complex is at the heart of CERN’s experimental facilities acting as an injector for the larger accelerators - the Super Proton Synchrotron and the huge Large Electron Positron rings. In the PS, nuclear particles are focused into particle beams, accelerated, and directed through several linear and ring accelerators using electromagnetic fields. These beams are then used in the physicists’ experiments. Different experiments require different types of beam, the variations are provided by the accelerator’s different operational modes. The PS complex is controlled by a team of operators who maintain the beam performance and the operational modes of the accelerators. Accelerator operation is a task demanding technical competence, experience, diagnostic skill and judgement. The operator has to handle information coming from control system modules, accelerator components and the acceleration process itself. Observations are made via measurement devices and range from simple status displays of specific accelerator components to complicated application programs and graphical information. The 10 operator can change setpoint control values for the different components directly or he can use application programs which present the beam properties on a higher conceptual level. Different sections of the accelerator are controlled from different consoles in the same control room. In the present system if there is some suspicion about a problem in an overlapping area, operators communicate by talking to their colleagues on other consoles. The operator’s workload is constantly increasing as new accelerators and different operational modes are added to the system. To cope with this increase, automated tools are becoming an integral part of the control process. CERN has already equipped their control room with several supporting tools - ranging from simple ones that display status information to high level software based on Expert System technology22,23,24,25,26. This experiment on cooperative problem solving concentrated on two of the high level tools, BEDES and CODES, which employ expert systems technology for diagnosing faults in the accelerator. BEDES and CODES The main goal of BEDES (BEam Diagnostic Expert System) is to diagnose operational faults at the beam level for the PS Booster injection part of the PS complex. 
Operational faults occur, for example, if the intensity of the particle beam falls below a certain level or if the beam deviates considerably from its ideal trajectory. Such problems can be caused by the incorrect setting of a control parameter (e.g. wrong timing is set for a switching magnet), a breakdown in a controller (e.g. the switching magnet is not working), or an error in the control system (e.g. a module that controls the switching magnet is down). BEDES can diagnose the first two types of fault if the underlying control system is still working correctly. If such faults are detected, BEDES tries to recover from them by resetting the correct control value or by optimizing a control value respectively. CODES (COntrol Diagnostic Expert System) has a similar control structure to BEDES, the main difference is in the domain knowledge. Whilst BEDES works on the beam level, CODES 11 operates on the level of the accelerator’s control system. This control system consists of thousands of hardware and software modules in a large computer network and is used by the operator to ensure that the accelerator process works successfully. A fault in the control system usually manifests itself in terms of a deviation of the particle beam and so the problem will be picked up by BEDES. In such cases BEDES and CODES could work together to determine the source of the fault. BEDES could help to detect whether the problem is caused by wrong parameter settings or a breakdown of a controller, while CODES could determine whether the fault is caused by the control system itself. Preparing for Cooperative Problem Solving When the particle accelerator is running, the operator receives a myriad of information from which he has to make an overall judgement of the situation and take reasonable decisions. As humans are resource-bounded, the operator is only able to use a limited number of tools and correlate very few pieces of information in real time. Therefore assistance is required in producing a consistent interpretation of the information. At present BEDES and CODES report information independently to the operator who then has to translate it to a common domain, determine whether it is consistent and decide upon the appropriate course of action. If the tools were capable of exchanging information directly, they would be capable of producing a consistent view automatically, leaving the operator to concentrate on the more cognitive activities (eg interpreting and acting upon diverse sources of information, tweaking the system to enhance performance, etc.) for which he is better suited. From here stems the motivation for introducing cooperation between the expert systems27,28. BEDES and CODES have knowledge about the diagnosis of the same particle accelerator from different perspectives. In certain situations, faults can be identified or even recovered from using only one of the systems, but in many instances contributions from both of them are needed. By exchanging intermediate and final results, the expert systems are able to focus each other’s problem solving activity on promising areas and draw each other away from unprofitable avenues 12 of reasoning. Initially cooperation between BEDES and CODES was studied by means of several paper exercises. Later practical exercises started and the expert systems were enhanced with an application-specific cooperation software which sent hypotheses directly from BEDES to CODES29. 
These preliminary studies identified some key design issues which needed to be addressed, these include: how are local actions performed in one expert system?, what is the common language of the expert systems?, how does one expert system model the other expert system? and how does one expert system model itself? Each of these questions are addressed in turn before the GRATE experiment is described in detail. Local Actions The main unit of reasoning for both BEDES and CODES is the hypothesis. Hypotheses are stored on an agenda; the status of the agenda determines whether the expert system is active or just idling. At the beginning of each inference cycle the agenda is rearranged (see figure 2), which means that hypotheses are realigned according to their priority and any which have become obsolete are removed. The first hypothesis is then taken from the agenda, it is evaluated and possibly more detailed descendant hypotheses are created and injected into the agenda. Evaluation requires that data describing the current status is gathered from the accelerator’s control system (e.g. currently valid control values); reasoning about this data is then undertaken, the outcome of which is a change in state of the hypothesis being evaluated (e.g. from unconfirmed to confirmed). If the evaluated hypothesis is confirmed, the fault has been found and diagnosis stops - the results are then reported to the operator. 13 From this structure it is apparent that the natural unit of local activity is the basic inference cycle and that local action can be controlled through operations on the agenda (eg to stop diagnosis the agenda should be cleared and to focus on a promising hypothesis it should be moved to the beginning of the agenda). Using this strategy local and cooperative actions can be kept separate and well organised, cooperative features can be added to the expert systems without significantly modifying their existing reasoning mechanisms. Common Language As both BEDES and CODES represent their hypotheses using a similar structure, it was decided to use this as the basis of a common language between the two agents. Hypotheses are assertions together with accompanying knowledge about how to prove them. The assertion is about an element or parameter of the accelerator which might be in an incorrect state or could go wrong in the near future. The related knowledge is composed of the necessary inference steps to prove the assertion and has two parts: procedural steps including data acquisition (eg read values from the Agenda Empty? Retrieve Data from Control System Take First Hypothesis Wait Re-Arrange Agenda Create and Inject Hypotheses Evaluate Hypothesis ACTIVE IDLE Yes No Recover from Error Report ResultConfirmed NOT.Confirmed Figure 2: Expert System Control Loop 14 control system, filter uncertainty and discrepancies and compute the derived parameter) and declarative rules operating on the structural description of the diagnosed system stored in the knowledge base of the expert system (eg is the value close enough to “ideal”?, if not then create derived hypotheses for those parts that can cause the deviation). Hypotheses are implemented as frames. Although the structure of the hypotheses are the same for both expert systems, the contents of the slots are different. A suspected-entity slot describes the element which may be at fault. 
A state-of-entity slot provides detailed data about the state of a suspected entity - including information such as the element is in fault, is operating out of specification or is operating normally. During the evaluation cycle, this slot is used to indicate the progress of the fault finding. The state-of-hypothesis slot expresses the state of the hypothesis itself. Possible values include: NOT.EVALUATED The hypothesis is newly created and not yet evaluated. NOT.CONFIRMED The hypothesis could not be confirmed but no attempt has been made to deny it. CONFIRMED The hypothesis was confirmed but no recovery attempt has been made. The state of a hypothesis is important for cooperation, because it contains information on the current phase of the diagnosis process. For example if a hypothesis is confirmed by one agent, then the fault has been found and the other system should stop trying to locate it. Another example is that if a hypothesis is evaluated by an expert system because of a request from an acquaintance and it cannot be confirmed, then the originator should be informed since it affects its local problem solving behaviour. A rating slot indicates the priority of a hypothesis and is used to order items on the agenda. During evaluation the expert system might create new hypotheses of a more detailed level - resulting in a tree structure (see figure 3). BEDES and CODES are incapable of understanding each others hypotheses directly because they refer to different domains - BEDES to sub-systems and 15 elements, CODES to knobchains, modules and details. However there is a level of commonality, in that translation can be performed between element and knobchain level hypotheses. This process involves changing the value of some slots of the hypothesis and loading structural data into the knowledge base. For example if BEDES suspects that something is wrong with a controller element (the element is the suspected-entity of a BEDES hypothesis), then CODES cannot directly use this. CODES has to map the element to that set of control system elements (knobchain) which operates this controller element. It also has to load into its knowledge base the structure of the knobchain. The structural knowledge is physically stored in a centralised and separate database for ease of maintenance and because of its sheer size. However for the purposes of this experiment, the database was regarded as part of CODES’s domain level system. So that the agents are not unnecessarily distracted by extraneous hypotheses which they cannot understand (eg CODES G1 S1 S2 E1 E4E3E2 K1 K2 K6K3 K4 K5 M1 M2 M3 M4 M7M6 D1 D2 BEDES CODES general sub-system element knobchain module detail There is a correspondance between E1-4 and K1-4. K5 and K6 have been generated independently by CODES Level of Commonality Figure 3: Hierarchy of Hypotheses 16 cannot use those of a general or sub-system level), agents represent the types of hypotheses that their acquaintances can process in their agent models (see the following section for more details). This knowledge is then used to guide hypothesis interchange. The advantage of using the hypothesis as a common language is that it involves a minimal translation overhead. Also it is close to the language used by the domain level systems which reduces the amount of modification required in the pre-existing systems. The disadvantage is that any new agents which may be added to the community at a later stage must also be able to represent and understand knowledge in this particular format. 
A better approach in terms of extensibility would be to construct a domain independent interlingua in which assumptions about the knowledge representation commitments are stated explicitly - see the work on the knowledge interchange format30,31 for a more comprehensive discussion of this issue. Benefits of a Distributed AI Approach The benefits of a DAI approach in this particular domain include: 1) As the accelerator and its control system consist of huge numbers of elements, corresponding to more than 10,000 setpoint control values, it would be extremely difficult to maintain and develop a centralised knowledge base for the whole process. Decomposing the problem into smaller modules results in smaller subproblems which are much easier to tackle. A modularised approach also fits more naturally into the existing organisational structure - knowledge of different domains (located in different divisions or groups) can be kept separate, but can be combined by cooperative problem solving at runtime. 2) The overall system will be open. 32 New agents covering different aspects of the particle accelerator process can be added when they are developed without having to alter the application’s existing conceptual model. This is important because new accelerators are built and added during the lifetime of the accelerator complex, also new operational modes may be developed. 17 3) The computing power of several workstations connected together through a network can be utilised. Thus the agents can work in parallel and produce results faster by sharing the workload. 4) Some of the drudgery and non-cognitive aspects of the operator’s job are removed, leaving greater time for the higher level tasks which cannot be automated using the currently available technology. THE GRATE CERN EXPERIMENT This section describes how cooperative fault detection can be carried out using the methods and tools discussed above. A typical cooperative scenario involving BEDES and CODES is outlined, before the steps involved in transforming the stand-alone systems into a community of cooperating agents being controlled by GRATE are expanded upon. A Cooperative Scenario In the implemented scenario, the main form of cooperation manifests itself in terms of the intelligent sharing of information between the two agents. This information is used to indicate changes in the status of the particle accelerator and to direct agents’ problem solving by sharing intermediate and final results. There are three distinct phases to controlling the particle accelerator. Firstly, there is normal operating conditions in which no fault has been detected. In this phase BEDES monitors the accelerator system to identify possible discrepancies. Monitoring involves continually comparing measurable system properties (such as the particle beam’s intensity, efficiency and trajectory) with their archived “ideal” values. If there is a significant discrepancy, then there is a possible fault in the accelerator. When a possible fault is detected, BEDES carries out a preliminary diagnosis phase and produces a list of hypotheses about suspected subsystem components. Once BEDES has produced a list of hypotheses to explain the accelerator fault, the second 18 phase of verifying the cause of the problem begins. The fault may have occurred as the result of a problem at the beam level or a fault with the control system. Therefore as well as starting to verify the cause of the fault, BEDES also informs CODES that there is probably a faulty element in the accelerator. 
When CODES receives this notification, it starts a diagnostic process to determine whether the problem lies within the control system. At this stage BEDES and CODES share the common goal of trying to locate the accelerator’s fault; they are looking at different aspects of the problem but their work is related by the fact that the hypotheses of CODES are further specialisations of BEDES’s element level hypotheses. As the two agents proceed with their diagnoses, various possibilities for cooperation exist based on the exchange of information about hypotheses. When such information is received, the recipient will undertake one of the following courses of action. A practical illustration of each case follows. 1) take no action if the information is not relevant to its current problem solving context. 2) use the information to deflect the focus of its problem solving activity away from an unprofitable area. 3) use the information to concentrate its problem solving activity on a promising area. Case 1 As a result of its evaluation, BEDES creates some new element level hypotheses. These hypotheses are sent to CODES, which translates them, before adding them to its agenda. As they are new hypotheses (status NOT.EVALUATED) they are merely added to the list of things to do, they do not affect the focus of CODES’s current problem solving. Case 2 BEDES evaluates an element level hypothesis (denoted by H). As H is at the element level it 19 would have already been sent to CODES when it was created (see Case 1). However as a consequence of its evaluation task BEDES has produced more information about H. This additional information is either that H is NOT.CONFIRMED or that H is CONFIRMED (see Case 3 for the latter situation). In the former case, when CODES receives the NOT.CONFIRMED message it takes one of the following actions depending on its problem solving context: a) CODES has not started working on H yet. Since BEDES was unable to confirm H, the chance that CODES will find a fault with the derivatives of H has been lowered. Knowing this, CODES’s rating of H will be reduced meaning that other more likely hypotheses will be dealt with first. b) CODES has started work on H or its derivatives. The probability that one of the hypotheses of the derived tree can be confirmed has been decreased. If there are other high level hypotheses in its agenda, then CODES continues with those. CODES drops its attention on the hypothesis tree of H and will, after the next rearrange agenda, continue with a new tree. c) CODES has already finished the evaluation of H and all the hypotheses derived from it. If CODES was also unable to confirm any hypothesis in the tree of H, then this is further confirmation of BEDES’s result and can be used to increase the operator’s level of confidence in the information. However if CODES did find a fault, then there is a conflict. In this instance resolution is straightforward; since CODES works at a lower level than BEDES its results are assumed to be more reliable. The user is thus presented with the result of CODES and a short note about the conflict. Case 3 If BEDES can confirm H, then the problem solving effort of CODES now switches to concentrate on the hypothesis tree of H and its derivatives. If CODES also detects a fault in this tree then the user’s confidence in the diagnosis will be heightened. If it does not find a fault then 20 there is a conflict and the operator is informed. 
The above cases describe situations in which information supplied by BEDES is used to direct the problem solving of CODES. However there is also a valuable flow of information in the opposite direction. The exchange of hypotheses from CODES to BEDES works differently because BEDES cannot translate nor understand the hypotheses of CODES directly. So knobchain level hypotheses created by CODES (eg K5 and K6 in figure 2) are of no direct relevance to BEDES. BEDES is only interested in results related to the hypotheses which it has previously sent to CODES (K1-K4 in figure 2). There are two situations in which CODES should send a result to BEDES: a) CODES was able to confirm a hypothesis. Since BEDES cannot understand the hypothesis itself a back-translation is needed. This involves moving up the tree to find the root node from which all the other hypotheses were derived and then sending this hypothesis back to BEDES. Thus if CODES finds a fault in the detail level (say D1 in figure 2) then its root hypothesis (K3) should be translated to (E3) and sent back to BEDES. b) CODES could not confirm any of the hypotheses derived from those sent by BEDES. This only occurs if all hypothesis from the tree are evaluated and none of them could be confirmed. The result sent back (after translation) will be the original hypothesis of BEDES with the status NOT.CONFIRMED. In both of these cases, BEDES tries to integrate the information received into its own problem solving activity. The situations which might occur now are quite similar to those of Cases 1-3 in that the attention of BEDES can be drawn or dropped to a certain hypothesis depending on the status of the information supplied by CODES. If BEDES was ahead of CODES then the results will be compared; in the case of a conflict it is assumed that the agent which found a fault is more reliable. The cooperative fault finding phase comes to an end if all hypotheses are evaluated and none 21 of them could be confirmed (a transient fault or a false alarm) or a hypothesis has been confirmed. In either case the recovery phase will begin. This phase is outside the scope of this experiment. Integrating Pre-Existing Expert Systems Converting the standalone versions of BEDES and CODES into a community of cooperating agents being controlled by GRATE required three main activities to be carried out. Firstly, some adaptations to the control of BEDES and CODES were required and the domain level tasks had to be defined. Secondly, the interface between the expert systems and GRATE had to be constructed. Finally, the acquaintance and self models of the agents needed to be populated - this process includes specifying the recipes2 which control agent activity, enumerating the tasks which the domain level system can perform and representing the information which other agents would benefit from receiving. Expert System Adaptations As figure 2 illustrates, the initial control of both BEDES and CODES was a non-interruptible loop. However for the purpose of controlling local problem solving from an upper layer, such a coarse granularity was inappropriate. To utilise the benefits of cooperation, GRATE has to be capable of influencing the rating of hypotheses and of injecting new items into the agenda. Therefore the control cycle needed to be split into more manageable components. As the original coding of the control loop was carried out in a modular fashion it was relatively straightforward to decouple the cycling routines from the actual functionality which it drove. 
Having identified the control regime, the next step is to determine the domain level system tasks. When performing this analysis the overriding objective was to minimise the amount of change required to the structure of the pre-existing systems, whilst still permitting the benefits of interaction to take place. Inspection of the existing control loop suggested there should be six tasks 2. Recipes are sequences of actions known by an agent for achieving a particular objective33. 22 - one for each node of the graph. However a deeper examination of the system structure revealed that “retrieve data” and “evaluate hypothesis” are virtually indivisible because the former is deeply embedded within the latter. Also “create and inject” is intimately related to the evaluation process and also could not easily be separated. Selection of hypothesis from the agenda was regarded as the initialisation phase for evaluation, therefore it was decided to leave it hidden in the intelligent system. Thus the control loop was collapsed into two basic tasks - evaluating hypotheses and rearranging the agenda. • REARRANGE-AGENDA: re-arranges the agenda so that the highest priority tasks are near the beginning. Also removes any superfluous hypotheses. • EVALUATE-HYPO: takes the first hypothesis from the agenda and evaluates it. This level of control was considered appropriate for two reasons. Firstly because of the way the pre-existing systems were implemented; any other decomposition would have required significant modifications to the existing structure. Secondly, from the perspective of exploiting information gleaned from other agents, the advantages of a finer level control would have been negligible. As a consequence of this reconceptualisation it is apparent that some of the control which resided originally in the expert system has migrated up into GRATE’s cooperation and control layer. However not all of the control has been moved, a significant amount of lower level control remains with the domain level system. In this application GRATE exerts control over the domain level system through its agenda. Therefore some of the manipulation functions which existed within the domain level system needed to be made available at the cooperation and control layer if the benefits of information received from acquaintances is to be exploited. These functions include: • INJECT-HYPO: inject a hypothesis into the agenda • DELETE-HYPO: remove a hypothesis from the agenda 23 • GET-AGENDA: return the current contents of the agenda • CHANGE-RATING: modify the rating slot of a hypothesis in the agenda The responsibility for ensuring that these commands are executed in a coherent manner resides with the control of the domain level system. Thus if GRATE decides that the rating of a hypothesis should be modified, it issues the “change-rating” command to its domain level system. Once received this directive will not be acted upon immediately since, for example, the expert system may be in the middle of performing an evaluation. Only when it comes to its rearrange agenda task will the modification actually take place. From a design perspective, it is important that such domain specific control remains within the expert systems. Exporting it to the upper level would require GRATE’s control module to be at least as sophisticated as the control of the domain level system and would also mean that it was different for each and every application. 
Maintaining a clean separation of concerns allows GRATE’s control module to be simpler and more generic. Interfacing GRATE and the Domain Level Systems BEDES and CODES run on separate workstations and are implemented in KEE making use of SUN Common LISP; GRATE is written in Allegro Common LISP. Because of incompatibilities between the different pieces of software, and also for efficiency reasons, it was decided to run GRATE on a third workstation. Thus all the agents’ cooperation and control layers ran on one machine, this machine being different from the ones which were executing the domain level systems. To allow an agent’s control module to interact with its domain level system a communication package was utilised. This package established bidirectional communication between SUN Common LISP on one workstation and Allegro Common LISP on another. It was based on standard UNIX tools such as sockets and TCP/IP and had been developed as part of the application specific cooperation software used for preliminary experimental work. GRATE would issue commands such as: START(EVALUATE-HYPO). This directive would be picked up by the communication package which would send the message onto the workstation running the 24 appropriate domain level system. In addition to interacting with the domain level system, the cooperation and control layer needs to carry out reasoning about received and generated information. To facilitate this, some domain dependent functions needed to be written for inclusion into GRATE’s recipes. In this experiment these functions were primarily related to providing an interpretation of the common language (i.e. the hypotheses) and of presenting output to the operator. Examples of such functions include: • (has-slot-value ) boolean function which verifies if a specified slot of a hypothesis contains a certain value • (has-equal-slots ) boolean function which verifies if the slots of the two hypotheses contain the same value • (find-related-hypos ) returns all members of the hypothesis list which have the same value in the SUSPECTED.ENTITY slot as the specified hypothesis • (confirm ) displays message to the operator that a hypothesis has been confirmed by both agents Instantiating the Agent Models When building a GRATE application a significant proportion of the knowledge required to control cooperative problem solving is built into the system. For these experiments, no additions were needed to the generic rule set. Thus each agent had exactly the same rules in its cooperation and control layer and the application builder was only concerned with the domain-dependent 25 features of GRATE (i.e. the agent models). Firstly, the self models need to be instantiated. This involves describing the tasks which the domain level system is able to perform - including the name, the inputs it must receive in order to execute and the results which are produced. In this experiment the self models contained descriptions of the following tasks: rearrange-agenda, evaluate-hypo, inject-hypo, delete-hypo, get-agenda and change-rating. Two sample descriptions are given below: TASK NAME: EVALUATE-HYPO MANDATORY INPUTS:(HYPO) RESULTS PRODUCED:(STATUS NEW-HYPOS) TASK NAME: CHANGE-RATING MANDATORY INPUTS: (HYPOS RATING-CHANGE) RESULTS PRODUCED: NIL Tasks are grouped together into recipes. Recipes have trigger conditions which indicate when they should be activated, a body which describes the actions to be performed and a description of the results produced. 
The recipe which encodes the basic control loop for the agent’s fault verification phase is shown in figure 43. This recipe is triggered when the accelerator monitoring phase detects a problem; it loops continuously until the cause of the fault has been ascertained whereupon a recovery mode recipe is invoked. RECIPE NAME:(VERIFY-CAUSE-OF-FAULT) TRIGGER: (ENTER-FAULT-FINDING-MODE) ACTIONS:( (START (REARRANGE-AGENDA (> FIRST-HYPO)) 3. “>” means unbound variable and “<” indicates a bound variable. Thus the evaluate-hypo task takes one input (called first-hypo) and produces two outputs (respectively named status and new-hypos). 26 (START (EVALUATE-HYPO (< FIRST-HYPO) (> STATUS) (> NEW-HYPOS))) (LOOP-UNTIL (FAULT-CONFIRMED (< STATUS)))) RESULTS: (NEW-HYPOS) Figure 4: Basic Control Loop for Fault Verification Phase Figure 5 illustrates a recipe which describes how CODES utilises information about hypotheses received from BEDES. In particular it highlights the way in which information about the state of hypotheses can be used to draw or deflect CODES’s attention from a particular branch of the search space. It encodes cases two and three of the cooperative scenarios highlighted earlier; note to simplify the example, cases of conflict are not dealt with. The recipe is triggered when CODES receives a hypothesis which BEDES has evaluated (status CONFIRMED or NOT.CONFIRMED). According to cooperative scenario case 1, CODES will already have received information about the hypothesis when it was first generated (status NOT.EVALUATED). To carry out the necessary reasoning, CODES has to identify those hypotheses in its agenda which are related to the one just received from BEDES. This matching process is carried out by the recipe’s first two actions GET-AGENDA and FIND-RELATED- HYPOS. The remaining recipe actions are conditional upon CODES’s current problem solving context. The first condition tests whether BEDES has also verified a hypothesis which CODES has already confirmed (CONFIRMED.HYPO). If this is the case, then the level of confidence in the diagnosis is increased and the operator should be informed. The second conditional action draws the attention of CODES to the hypotheses related to the one confirmed by BEDES - this is achieved by increasing the rating of the related hypotheses by a value of 30. The final action deflects CODES’s attention away from a hypothesis tree which appears less promising. 27 RECIPE NAME:(USE-EVALUATED-HYPO-INFORMATION) TRIGGER: (AND(info-available HYPO) (not (has-slot-value HYPO STATE.OF.HYPO NOT.EVALUATED)) ACTIONS: ( (start(GET-AGENDA (> FULL-AGENDA))) (start (FIND-RELATED-HYPOS (< FULL-AGENDA)(< HYPO)(> RELATED-HYPOS))) (start-if (and (has-slot-value HYPO STATE.OF.HYPO CONFIRMED) (has-equal-slot HYPO CONFIRMED.HYPO SUSPECTED.ENTITY)) (CONFIRM (< HYPO))) (start-if (has-slot-value HYPO STAT.OF.HYPO CONFIRMED) (CHANGE-RATING (< RELATED-HYPOS) 30)) (start-if (has-slot-value HYPO STAT.OF.HYPO NOT.CONFIRMED) (CHANGE-RATING (< RELATED-HYPOS) -20))) RESULTS: NIL Figure 5: CODES recipe for exploiting information received from BEDES Once the self models have been completed the acquaintance models need to be populated. In the example cooperative scenarios the most important feature to model about another agent is the information which it is known to be interested in. 
For example, BEDES's model of CODES contains the information that CODES would benefit from receiving any newly generated hypotheses which are at the element level (cooperative scenario case 1), any element level hypotheses that it has been unable to confirm (cooperative scenario case 2), or any element level hypotheses that it has been able to confirm (cooperative scenario case 3).

INTERESTS: (...
  (HYPO (AND (AT-ELEMENT-LEVEL HYPO) (HAS-SLOT-VALUE HYPO STATE.OF.HYPO NOT.EVALUATED)))
  (HYPO (AND (AT-ELEMENT-LEVEL HYPO) (HAS-SLOT-VALUE HYPO STATE.OF.HYPO CONFIRMED)))
  (HYPO (AND (AT-ELEMENT-LEVEL HYPO) (HAS-SLOT-VALUE HYPO STATE.OF.HYPO NOT.CONFIRMED)))
...)

RESULTS AND EXPERIENCES

BEDES and CODES were successfully transformed from standalone expert systems into a community of cooperating agents under the control of GRATE. This transformation was achieved with minimal modifications to the pre-existing expert systems and with no augmentation to GRATE's generic knowledge. The cooperating system was tested using a special development mode of the accelerator in real time. As the accelerator operates in a time-sharing fashion it was possible to deliberately introduce faults into the system in the test mode without disturbing the other modes which were serving real physicists' experiments.

The results of this experiment highlighted some shortcomings in the design decision to map the BEDES and CODES expert systems directly into agents. This proved to be a less than optimal choice because of the large amount and diverse range of processing carried out in each system. As both systems were originally conceived as standalone pieces of software, they contain a vast array of functionality which does not logically belong together - including monitoring, data acquisition, fault diagnosis and recovery. Also, because of their sheer size, the expert systems were becoming unwieldy in their own right. Introducing such systems into a cooperating community merely exacerbated these problems. Using the cooperation metaphor it is possible to divide the systems into a number of simpler and logically separate agents which could work on dedicated areas of the problem. A new design was proposed in which the functionality contained in BEDES and CODES was split into seven agents [34]. This new approach offers greater system modularity and allows the benefits of parallelism and interaction to be exploited to an even greater degree.

Firstly, the data acquisition and treatment functionalities were separated out. It became apparent that the reasoning process in the different agents was heavily reliant on the correctness of the data. For this reason, the treatment of acquisitions is likely to become increasingly sophisticated in the future and so a dedicated agent is warranted. An additional advantage is that it is easier to provide treated data from a single source rather than having to do separate acquisition and treatment in each and every agent. The next decision was to remove the user interface functionality from the individual systems and provide homogeneous presentation of data through a specialised agent. This provides the operator with one entry point to the entire agent community. It also allows the operator to be presented with high level information about process parameters which can be obtained through interaction with the acquisition agent. This is not possible in the existing control system, and so the operator has to rely on raw data from the process.
BEDES and CODES were originally conceived as diagnostic expert systems; it was some time before their recovery facilities were incorporated. Because of their add-on nature, it was decided to separate the recovery actions from the diagnostic part. Finally, BEDES reasons on two conceptual levels: on the high level of beam parameters (which must be deduced from raw data rather than acquired directly) and on the direct level of the equipment (raw data). In the modified design these two levels are mapped into separate agents. This is beneficial because it frees the high level reasoning from the shackles imposed by the strict mechanism of the hypothesis verification process.

CONCLUSIONS

The stated aim of this work was to take two standalone and pre-existing expert systems and construct from them a community of cooperating agents. This was achieved by using the GRATE system to control the cooperative activity, and required only slight modifications to the expert systems. The cooperating community worked together to diagnose faults which occurred in the real particle accelerator process. Most reported systems work with highly idealised problem solvers and simplified domains. This naturally makes experimentation easier, but is dangerous in that the assumptions which have to be made may hide important issues which need to be addressed if the technology is eventually to be used in realistic environments. This experiment used real expert systems working on a real world problem. By adopting this approach the experiment provided many useful insights into the fundamental problem of incorporating pre-existing systems into a community of cooperating agents.

When undertaking this activity, the structure of the pre-existing system needs to be analysed to ensure it is open enough for the cooperation and control layer to exploit the information gleaned during interactions with fellow community members. Necessary modifications may include defining a finer granularity of control, making previously hidden control functions explicit, or developing completely new functions. From this experiment it is clear that some of the basic control which previously resided in the domain level system needs to be moved into the cooperation and control layer. However, lower level control which is more application specific is best left at the domain level. When deciding the separation of concerns there is a tradeoff between the amount of restructuring of the domain level system and the desired granularity of control. A detailed analysis of the existing system must be undertaken so that a balance can be struck which avoids a significant reformulation of the system but which still allows the benefits of interaction to be realised.

A second important consideration which this experiment uncovered is the relationship between pre-existing systems and agents - it is not always best to adopt a simple one-to-one mapping. Often standalone systems contain a number of logically disparate tasks which can be split into separate agents. This decomposition typically allows greater parallelism to be exploited in problem solving and makes better use of the cooperating systems metaphor. The types of interaction exhibited by BEDES and CODES are typical of a broad class of problem solving called Functionally Accurate, Cooperative (FA/C) [35]. In the FA/C paradigm agents asynchronously exchange partial results about the intermediate state of their processing to ensure the community arrives at a consistent interpretation of the whole problem.
From a traditional computer science perspective, this type of cooperative problem solving can be viewed as a form of distributed search which has multiple loci of control [36]. In the CERN experiment, the partial results are hypotheses, and as further evidence emerges (e.g. hypotheses become CONFIRMED or NOT.CONFIRMED) the problem solving behaviour of the community develops accordingly. The types of cooperation exhibited in this experiment are similar in nature to other industrial applications which have been studied within the context of the ARCHON project (e.g. electricity transport management [37] and management of electricity distribution [38]), which suggests that theoretical research into the FA/C paradigm has a practical use.

This experiment also provides further support for two of Lesser's observations about the FA/C paradigm [36]. Firstly, that in comparison to a standalone version, an FA/C agent is more complex. This can be seen in the refashioning of, and additions to, the expert systems' control regimes. Secondly, it is observed that effective control of cooperative problem solving requires local control decisions to be influenced by the state of problem solving in other agents. In this experiment, the behaviour of other agents is monitored by the cooperation and control layer and influence is exerted through modifications of the agent's agenda.

The implemented cooperation schemes are relatively straightforward, but there is scope for greater sophistication which may enhance performance still further. At present, when BEDES sends CODES its initial block of hypotheses, CODES starts processing them in the order in which they arrive. This means both agents start trying to verify the fault from the same position in the search space. As the hypotheses are unrated at this point, this focuses the community's efforts on an unnecessarily small portion of the accelerator. It would be better if the two agents worked on different areas of the search space while the information about the faults is limited; only when further information becomes available would they focus their joint efforts on promising areas. This could be realised by a relatively straightforward approach in which CODES starts working on hypotheses from the end of the list (sketched below), or in a more elegant manner by a form of negotiation [39, 40] in which agents decide upon the portion of the problem space on which they will concentrate their initial efforts. This division of labour is possible because both systems are usually capable of detecting the same fault. If, however, neither of them is able to find the fault in its part of the search space, then only at this stage should they start trying to work on areas which the other agent has already processed. A final enhancement to the application would be to incorporate a more sophisticated mechanism for resolving conflicting opinions between the agents. For example, rather than simply assuming the result produced by CODES is correct, it would be better if some conflict resolution expertise could be used to determine the source of the conflict and provide a means of resolving it [41]. Such a mechanism would enable better quality information to be presented to the operator.
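The following sketch illustrates the simpler of the two division-of-labour schemes mentioned above: while the hypotheses are still unrated, one agent works from the front of the shared list and the other from the back, so that the two initially cover different regions of the search space. The names and structure are assumptions for illustration, not code from the project.

```python
# Illustrative sketch of the straightforward division-of-labour variant suggested
# in the text: split the unrated hypothesis list so BEDES and CODES start from
# opposite ends of the search space.
from typing import List, Tuple

def split_initial_workload(hypotheses: List[str]) -> Tuple[List[str], List[str]]:
    midpoint = len(hypotheses) // 2
    bedes_share = hypotheses[:midpoint]                    # processed front-to-back
    codes_share = list(reversed(hypotheses[midpoint:]))    # processed back-to-front
    return bedes_share, codes_share

# Only if neither agent confirms a fault in its own share would each go on to
# re-examine the hypotheses already processed by the other.
```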
ACKNOWLEDGMENTS

The work described in this paper has been carried out in the ESPRIT II project ARCHON (P2256) whose partners are: Atlas Elektronik, JRC Ispra, Framentec-Cognitech, Labein, Queen Mary and Westfield College, IRIDIA, Iberdrola, EA Technology, Amber, Technical University of Athens, University of Amsterdam, Volmac, CERN and University of Porto.

REFERENCES

1. Bond, A. H. and Gasser, L. (eds), Readings in Distributed Artificial Intelligence, Morgan Kaufmann (1988).
2. Gasser, L. and Huhns, M. N. (eds), Distributed Artificial Intelligence Volume II, Pitman Publishing (1989).
3. Huhns, M. N. (ed), Distributed Artificial Intelligence, Pitman Publishing (1988).
4. Sridharan, N. S. 1986 Workshop on Distributed AI, AI Magazine, Fall, 75-85 (1987).
5. Davis, R. and Smith, R. G. Negotiation as a Metaphor for Distributed Problem Solving, Artificial Intelligence, 20, 63-109 (1983).
6. Lesser, V. R. and Erman, L. D. An Experiment in Distributed Interpretation, IEEE Trans. on Computers, 29(12), 1144-1163 (1980).
7. Jennings, N. R. and Wittig, T. ARCHON: Theory and Practice, in Distributed Artificial Intelligence: Theory and Praxis (eds L. Gasser and N. M. Avouris), 179-195, Kluwer Academic Press (1992).
8. Wittig, T. ARCHON: An Architecture for Multi-Agent Systems, Ellis Horwood (1992).
9. Jennings, N. R. Cooperation in Industrial Systems, Proc. ESPRIT Conference, Brussels, Belgium, 253-263 (1991).
10. Jennings, N. R., Mamdani, E. H., Laresgoiti, I., Perez, J. and Corera, J. GRATE: A General Framework for Cooperative Problem Solving, Journal of Intelligent Systems Engineering, 1(2), 102-114 (1992).
11. Jennings, N. R. Using GRATE to Build Cooperating Agents for Industrial Control, Proc. IFAC/IFIP/IMACS International Symposium on Artificial Intelligence in Real Time Control, 691-696, Delft, The Netherlands (1992).
12. Cohen, P. R. A Survey of the Eighth National Conference on Artificial Intelligence: Pulling Together or Pulling Apart, AI Magazine, 12(1), 16-41 (1991).
13. Roda, C. and Jennings, N. R. The Impact of Heterogeneity on Cooperating Agents, Proc. AAAI Workshop on Cooperation among Heterogeneous Intelligent Systems, Anaheim, Los Angeles, USA (1991).
14. Simon, H. A. Models of Man, New York, Wiley (1957).
15. Lesser, V. R. and Corkill, D. D. Distributed Problem Solving, Encyclopedia of Artificial Intelligence (ed S. C. Shapiro), 245-251, John Wiley and Sons (1987).
16. Whitney, C. Cooperating Intelligent Agents: A Study of GRATE, BT Report MAIN-WP1008, BTRL Martlesham Heath, Ipswich, UK (1992).
17. Jennings, N. R. Joint Intentions as a Model of Multi-Agent Cooperation in Complex Dynamic Environments, Ph.D. Thesis, Dept. Electronic Engineering, Queen Mary and Westfield College (1992).
18. Engelmore, R. and Morgan, T. (eds), Blackboard Systems, Addison Wesley (1988).
19. Hayes-Roth, B. The Blackboard Architecture: A General Framework for Problem Solving?, Stanford Heuristic Programming Project, HPP-83-30, Stanford University (1983).
20. Hewitt, C. E. and Kornfield, W. A. Message Passing Semantics, SIGART Newsletter, 48 (1980).
21. Hewitt, C. E. and Lieberman, H. Design Issues in Parallel Architectures for Artificial Intelligence, Proc. of IEEE Computer Society International Conference, 418-423 (1984).
22. Malandain, E., Pasinelli, S. and Skarek, P. A Fault Diagnostic Expert System Prototype for the CERN PS, Europhysics Conference on Control Systems for Experimental Physics, Villars-sur-Ollon, Switzerland (1987).
23. Skarek, P., Malandain, E., Pasinelli, S. and Alarcon, I. A Fault Diagnosis Expert System for CERN Using KEE, SEAS (SHARE European Association) Spring Meeting, Davos, Switzerland (1988).
24. Malandain, E., Pasinelli, S. and Skarek, P. Knowledge Engineering Methods for Accelerator Operation, European Particle Accelerator Conference, Rome, Italy (1988).
25. Malandain, E. and Skarek, P. Linking a Prototype Expert System to an Oracle Database, IASTED International Conference on Expert Systems, Theory and Applications, Zurich, Switzerland (1989).
26. Malandain, E. An Expert System in the Accelerator Domain, International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy Nuclear Physics, Lyon Villeurbanne, France (1990).
27. Fuchs, J., Skarek, P., Varga, L. and Wildner-Malandain, E. Integration of Generalized KB-Systems in Process Control and Diagnosis, invited paper for the SEAS conference, Lausanne, Switzerland (1991).
28. Fuchs, J., Skarek, P., Varga, L. and Wildner-Malandain, E. Distributed Cooperative Architecture for Accelerator Operation, 2nd International Workshop on Software Engineering, Artificial Intelligence and Expert Systems for High Energy and Nuclear Physics, L'Agelonde, La-Londe-les-Maures, France (1992).
29. Varga, L. Cooperation Between the Two Diagnostic Expert Systems BEDES and CODES, CERN Technical Report, PS/CO/WP 91-02 (1991).
30. Neches, R., Fikes, R., Finin, T., Gruber, T., Patil, R., Senator, T. and Swartout, W. R. Enabling Technology for Knowledge Sharing, AI Magazine, Fall, 36-56 (1991).
31. Ginsberg, M. L. Knowledge Interchange Format: The KIF of Death, AI Magazine, Fall, 57-63 (1991).
32. Hewitt, C. E. The Challenge of Open Systems, BYTE, 10(4), 223-244 (1985).
33. Pollack, M. E. Plans as Complex Mental Attitudes, in Intentions in Communication (eds P. R. Cohen, J. Morgan and M. E. Pollack), 77-105, MIT Press (1990).
34. Fuchs, J., Skarek, P., Varga, L. and Wildner-Malandain, E. Distributed Cooperative Architecture for Accelerator Operation, ARCHON Technical Report 26, CERN, Geneva (1992).
35. Lesser, V. R. and Corkill, D. D. Functionally Accurate, Cooperative Distributed Systems, IEEE Trans. on Systems, Man and Cybernetics, 11(1), 81-96 (1981).
36. Lesser, V. R. A Retrospective View of FA/C Distributed Problem Solving, IEEE Trans. on Systems, Man and Cybernetics, 21(6), 1347-1362 (1991).
37. Aarnts, R. P., Corera, J., Perez, J., Gureghian, D. and Jennings, N. R. Examples of Cooperative Situations and their Implementation, Vleermuis Journal of Software Research, 3(4), 74-81 (1991).
38. Cockburn, D., Varga, L. Z. and Jennings, N. R. Cooperating Intelligent Systems for Electricity Distribution, Proc. Expert Systems 1992 (Applications Track), Cambridge, UK (1992).
39. Laasri, B., Laasri, H. and Lesser, V. R. An Analysis of Negotiation and its Role in Cooperative Distributed Problem Solving, Proc. Second Generation Expert Systems Conference, Avignon, France (1991).
40. Conry, S. E., Kuwabara, K., Lesser, V. R. and Meyer, R. A. Multi-Stage Negotiation for Distributed Constraint Satisfaction, IEEE Trans. on Systems, Man and Cybernetics, 21(6), 1462-1477 (1991).
41. Klein, M. Supporting Conflict Resolution in Cooperative Design Systems, IEEE Trans. on Systems, Man and Cybernetics, 21(6), 1379-1390 (1991).
FIGURE LEGENDS

1) Detailed GRATE Agent Architecture
2) Expert System Control Loop
3) Hierarchy of Hypotheses
4) Basic Control Loop for Fault Verification Phase
5) CODES recipe for exploiting information received from BEDES

work_2ntpv4mpyre3xdhmwt3t2mqfve ---- Annals of Emerging Technologies in Computing (AETiC), Vol. 5, No. 1, 2021

Obada Alhabashneh, "Fuzzy-based Adaptive Framework for Module Advising Expert System", Annals of Emerging Technologies in Computing (AETiC), Print ISSN: 2516-0281, Online ISSN: 2516-029X, pp. 13-27, Vol. 5, No. 1, 1st January 2021, Published by International Association of Educators and Researchers (IAER), DOI: 10.33166/AETiC.2021.01.002, Available: http://aetic.theiaer.org/archive/v5/v5n1/p2.html.

Research Article

Fuzzy-based Adaptive Framework for Module Advising Expert System

Obada Alhabashneh, Mutah University, Karak, Jordan, o.alhabashneh@mutah.edu.jo

Received: 3rd November 2020; Accepted: 22nd December 2020; Published: 1st January 2021

Abstract: In the enrolment process, selecting the right module and lecturer is very important for students. The wrong choice may put them in a situation where they fail the module. This could lead to a more complicated situation, such as receiving an academic warning, being downgraded, or being withdrawn from the program or the university. However, module advising is time-consuming and requires knowledge of the university legislation, program requirements, available modules, lecturers, and the student's case. Therefore, the creation of effective and efficient systems and tools to support the process is highly needed. This paper discusses the development of a fuzzy-based framework for an expert recommender system for module advising. The proposed framework builds three main spaces, which are: student-space (SS), module-space (MS), and lecturer-space (LS). These spaces are used to estimate the risk level associated with each student, module, and lecturer. The framework then associates each abnormal student case in the students' grade history with the estimated risk levels in the SS, MS, and LS involved in that particular case. Fuzzy-based association-rule learning is then used to extract the dominant rules that classify the consequent situation for each eligible module if it is to be taken by the student in a specific semester. The proposed framework was developed and tested using real-life university data, which included student enrolment records and student grade records. A five-fold cross-validation process was used for testing and validating the classification accuracy of the fuzzy rule base. The best-performing fold achieved a 92% accuracy level in classifying the risk of enrolling on a specific module for a specific student case, and the average classification accuracy across folds was 89.2%, which is acceptable for this problem domain as it involves modelling human behaviour and decision making.

Keywords: Intelligent Academic Advisor; Module Adviser; Expert System; Fuzzy Logic; Fuzzy Rule-based.

1. Introduction

Every academic semester, students enroll in different modules offered by their universities. Enrolment might sound like a simple process, but it involves many complications and problems for students, particularly in universities where the credit hour system is adopted.
According to this system, although there are some constraints such as the academic plan, students still have the freedom to select the sections to enroll in based on their preferences, including the module, timeslot, and lecturer of the sections offered in the timetable. However, the lack of knowledge about the modules and the lecturers, coupled with the contradictory advice they receive from their colleagues and the complexity of the academic plan of their programs, may affect the students' ability to make the right choice. This creates a need for professional advice, as making a wrong choice may lead to unwanted situations such as failing the module or, more seriously, academic warning, program withdrawal, or even leaving the university [1-3]. To address this problem, universities usually have academic advisors to help students in selecting their modules and sections for the upcoming semester based on different factors, including the student's cumulative Grade Point Average (GPA), the available modules, timeslots, the module's lecturer, and the academic plan of the student. However, module advising is a challenging task as it is complicated, time-consuming, and needs qualified and knowledgeable human resources. This urges the need for developing intelligent module-advising systems to aid students in selecting the right modules at the right time, ensuring a smooth passage through their academic plans so that they finish their degrees with fewer problems.

Intelligent Recommender Systems (RSs) are widely used in different application areas including online shopping, movies, social networking, and others. However, they are less common in education and this area is still under development [1-3]. RSs can provide students with personalized recommendations on suitable modules based on the student's academic history. Such systems can also incorporate previous students' experiences to provide improved recommendations. Furthermore, they adapt to changes in the data and update their recommendations accordingly. The need for RSs and their requirements in the education sector have been discussed in [4-6]. More recently, the use of RSs in education has received more attention and their potential has been proven [7, 8].

This paper discusses the development of a fuzzy-based framework to be used in recommender systems for module advising. The proposed framework builds the recommendation based on creating three main spaces, which are: student-space (SS), module-space (MS), and lecturer-space (LS). These spaces are used to estimate the risk level associated with each student, module, and lecturer. The framework then associates each abnormal student case in the students' grade history with the estimated risk levels in the SS, MS, and LS involved in that particular case. Fuzzy-based association-rule learning [9, 10] is then used to extract and summarize a fuzzy rule base. The fuzzy rule base is used to predict the risk level associated with the combination of a specific student, module, and lecturer, and to build the recommendation to students on that basis. The proposed approach provides a novel mechanism to estimate the consequent risk associated with the student's selection of a specific module that is taught by a specific lecturer.
The risk estimation is based on creating and analyzing the three space elements.

2. Related Work

2.1. Course/Module Advising in Higher Education

Academic advising is vital for a good academic experience of students as it positively impacts their success and retention [11, 12]. However, it requires specific knowledge of the student's situation and history, the program's academic plan, modules, and lecturers [12]. Gordon et al. [13] defined academic advising as "situations in which an institutional representative gives insight or direction to a college student about an academic, social, or personal matter". Module advising is one of the main tasks of the academic advisor, and refers to the process by which the academic advisor supports the student in selecting the right modules in which to enroll. Given the importance and complexity of module advising, different researchers have argued that there is a need for developing intelligent systems to support this task [2, 4, 7, 14]. Developing such systems aims to minimize the demand for human advisors and gives them more time to focus on other important advising tasks, including career development and financial issues [12-16].

A novel approach for long-term module planning called Interactive Decision Support for Course Planning (IDiSC+) was proposed by Mohamed [12]. The approach used optimization techniques to support both the student and the advisor in building a recommended long-term academic plan (towards graduation) to be followed by the student. Laghari [14] developed an Automated Course Advising System (ACAS) for module planning. The system distributes the modules of the academic plan over different semesters based on the history of other students. There has been significant research effort in applying information technology to module advising. Roushan et al. [15] introduced an Internet-based approach to support the module-advising process which integrated the process of advising with enrolment, taking into account the constraints of the program plan and the university policy. However, the system did not replace the human academic advisor but rather facilitated the advisor's role by providing an automated tool for communication and information exchange. A decision-support system was proposed to support academic advisors in preparing a pre-enrolment plan for the students and to assist in the offering of appropriate modules for the upcoming semester [16]. Al-Ghamdi et al. [17] developed an advisor expert system (PAS) for postgraduate students. The proposed system was designed to assist them in the selection of the most relevant modules without referring to a human module advisor. A web-based advising system [18] was proposed which supports three types of users, including students, advisors, and heads of department, to make sure that a complete picture is available for the students. Mattei et al. [19] developed a decision-theory advising tool to enhance the advisor-student relationship. The tool allowed students to browse the module offerings, possible future scenarios, and their outcomes. Shatnawi et al. [20] used the enrolment and marking history from similar cases and applied an association rule-based system to give general recommendations when selecting the modules on which to enroll.

2.2. Content-based

Different researchers have used content-based academic recommendation systems for module selection.
For example, Lin et al. [21] utilized a multi-agent approach and an ontology to provide a dynamic and personalized recommended module list. The multi-agent approach, which included various agents, used a preference-driven planning algorithm supported by the ontology to build the recommendations. Daramola et al. [4] integrated Case-Based Reasoning (CBR) with Rule-Based Reasoning (RBR) techniques to provide an intelligent approach for module advising. The approach also used historical data to build a list of recommended modules for the following semester.

2.3. Collaborative filtering

Collaborative filtering has also been used for academic advising. Huang et al. [22] proposed a score-based prediction approach for course recommendation. The approach used cross-user-domain collaborative filtering to create the recommended course list. Courses were clustered based on student feedback in [23], and the resulting clusters were combined with a fuzzy rule association technique to create the course recommendation. Nafea et al. [24] developed a learning-style-based collaborative-filtering approach for module recommendations which utilized different metrics to identify similar profiles, including k-means, cosine similarity, and Pearson correlation. Chang et al. [25] proposed a user-based collaborative-filtering approach to predict student grades. Mortenson et al. [26] introduced a collaborative-filtering approach for module selection which utilizes an artificial immune mechanism for the prediction. Bydžovská [27] investigated the effect of student and module features on enrolment patterns and designed a collaborative-filtering-based system to predict the module grade. Yao et al. [28] attempted to increase the fairness of module recommendation by addressing the problem of biased recommendations against minority groups of students; they developed four different fairness metrics that can be optimized as learning objectives.

2.4. Knowledge-based

Different knowledge-based recommendation approaches have been proposed for module selection. Xu et al. [29] developed a knowledge-based approach to offer a personalized module sequence to new students. This approach utilized a dynamic learning algorithm that learns from the performance of other students on a specific module. Koutrika [30] argued that recommendation methods should not be 'hard-wired' but should be flexible. In that sense, a new paradigm for recommendation was introduced in which a recommendation approach can be defined declaratively as a high-level parameterized workflow comprising traditional relational operators and new operators that generate or combine recommendations. Keston et al. [31] utilized semantic web expert system technologies to build a knowledge base that is used by an intelligent web-based application to provide the required recommendations. Engin et al. [32] developed an expert rule-based system for module recommendation. The rules were captured from real advisors and then injected into a rule base using Oracle Policy Automation (OPA). Hashemi and Blondin [33] included several factors to be taken into consideration when recommending modules for students, such as the frequency of the module offering, balancing the module load, and shortening the graduation path. All these factors and others were included in a rule base which was used for recommending modules.
Ayman [34] proposed an expert system for module selection which included both prescriptive and developmental advising models and utilized object-oriented databases for data and rule representation. Abdullah Al-Ghamdi et al. [17] proposed a rule-based advising system. The system was designed for postgraduate students and built recommendations to support the students in the selection of modules related to the topic of their research thesis. Nambiar and Dutta [35] introduced a dynamic and flexible rule-based advising system in which the rules were separated from the execution, enabling the student to try different scenarios by updating the XML file where the rules are stored. Nguyen et al. [36] proposed a knowledge-based framework that utilized a learning data warehouse for discovering patterns in student behavior, including module selection and achievements. These patterns were then used to make the recommendations.

2.5. Hybrid

Other researchers have proposed hybrid approaches in which different perspectives were integrated. Daramola et al. [4] designed an intelligent expert system for module advising which integrated rule-based reasoning (RBR) with Case-Based Reasoning (CBR) using the academic history of the students. Sobecki [37] applied Ant Colony Optimization (ACO) to provide an efficient module advisor system. The system predicted the final grade of the students in a module based on a domain-specific representation integrated with the ACO. Abdulwahhab [38] integrated Genetic Algorithms (GAs) and a Decision Tree for short-term module scheduling.

2.6. Fuzzy Based

A few researchers have used fuzzy logic to develop module advising systems. Goodarzi and Rafe [39] developed a fuzzy-based expert system for student advising. The proposed system was a web-based module that can be integrated with the university portal. The module fuzzifies the business rules and the GPA of the students to advise them on which modules should be taken in the following term. Adak [40] used fuzzy techniques to recommend elective modules to students; the system analyzed student transcripts to extract fuzzy rules relating modules to the students who intend to take them. Baloul and Williams [41] used the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) to develop a fuzzy model for students on probation, to minimize the risk of taking the wrong modules in the early stage of their study.

3. The proposed approach

This paper discusses the development of a fuzzy-based framework for an expert system for module advising. The proposed framework assumes that recommendations are based on three main elements, which are: student-space (SS), module-space (MS), and lecturer-space (LS). SS contains the current student status, which includes the cumulative average, whether there is any academic warning, the closest abnormal academic status, and how close the student is to that status. It also contains the knowledge domains of the academic program and the progress of the student in each domain. MS contains the average module mark for the last five years and the knowledge domains to which the module belongs. LS contains the average of the lecturer's marks over all modules and the average of the lecturer's marks for each module. The three spaces and the result of the final calculation are then combined in a matrix called the case-space (CS).
Fuzzy-based association-rule learning is then used to extract the dominant rules to classify the consequent case for each eligible module if it were taken by the student in a specific semester. Fuzzy logic is used to handle the uncertainty involved in modeling the human decision and to provide a transparent and interpretable mechanism for estimating the risk of taking a module.

The main purpose of the proposed framework is to estimate the consequent risk level (Low, Medium, or High) of taking a specific module. The risk level is assigned based on a list of unwanted cases associated with the student failing or not progressing in the module. These cases include a GPA decrease, moving down from one GPA category to a lower one (degrading), receiving an academic warning, being withdrawn from the academic program, being withdrawn from the university, and graduation pending. In addition to the risk-level estimation, the framework provides the student with a justification (interpretation) for the estimated risk level based on the student's situation, the targeted module, and the lecturer. The framework was developed based on real-life university data which included historical enrolment records, student marks, cumulative GPAs, module offerings, academic plans for the programs, and the knowledge domains for both the programs and the modules. The proposed approach is depicted in Figure 1.

Figure 1: The proposed framework

Figure 2 shows the use case diagram of the proposed system.

Figure 2: System use case diagram

The proposed approach consists of six main steps:

3.1. Step-1: Creating the Three Spaces and Abnormal Case Matrix

In this step, the student-space (SS), module-space (MS), lecturer-space (LS), and abnormal-case matrix are created as follows:

3.1.1. Step-1.A: Creating the Three Spaces

The three spaces SS, MS, and LS are created as shown in Fig. 3. The spaces are extracted from the university database.

Figure 3: The three spaces and abnormal case matrix

Student space: this space contains information about the student as shown in Table 1.

Table 1: Student space
1. Student Id - a unique identification number for the student.
2. Cumulative GPA - the cumulative average of the student out of 100.
3. Student Average Mark for the Program Knowledge Domains - a matrix that includes the knowledge domains of the student's academic program and the average mark of the student for the modules which belong to each domain.
4. Potential Risk Situation - the abnormal situation closest to the student's current situation (e.g., the boundary between two GPA categories, such as "very good" and "satisfactory"). Each abnormal-situation type is given a code and a percentage out of 100 to indicate the level of risk assigned to that code.
5. Current Student Situation - the student's current abnormal situation, if any (e.g., the student is on the boundary between two GPA categories, such as "very good" and "satisfactory", or the student's GPA has dropped). Each abnormal-situation type is given a code and a percentage out of 100 to indicate the level of risk assigned to that code.
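In the paper, these spaces are database views over the student information system. Purely as an illustration of the information they hold (Table 1 above and Tables 2 and 3 below), a minimal sketch with assumed field names is given here; it is not the authors' implementation.

```python
# Minimal sketch (assumed field names) of the three spaces described in Tables 1-3.
from dataclasses import dataclass
from typing import Dict

@dataclass
class StudentSpace:
    student_id: str
    gpa: float                          # cumulative GPA out of 100
    domain_averages: Dict[str, float]   # knowledge domain -> average mark
    potential_abnormal_weight: float    # PAC, scaled out of 100
    current_abnormal_weight: float      # CAC, scaled out of 100

@dataclass
class ModuleSpace:
    module_id: str
    general_average: float                   # MAG
    program_averages: Dict[str, float]       # program id -> MPA
    lecturer_averages: Dict[str, float]      # lecturer id -> MLA
    fail_rate: float                         # MFR, percentage of students who failed

@dataclass
class LecturerSpace:
    lecturer_id: str
    general_average: float   # LAG
    fail_rate: float         # LFR, percentage of students who failed the lecturer's modules
```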
Module Space: this space contains information about the module as shown in Table 2.

Table 2: Module space
1. Module Id - a unique identification number for the module.
2. General Average of Grades - the average grade of the students in the module.
3. Average grade for each program - a matrix which includes the academic programs containing the module and the average module grade for each program.
4. Average grade for each lecturer who taught the module - a matrix which includes the lecturers who taught the module and the average grade for the module given by each lecturer.
5. Fail Rate - the percentage of students who failed the module.

Lecturer Space: this space contains information about the lecturer as shown in Table 3.

Table 3: Lecturer space
1. Lecturer Id - a unique identification number for the lecturer.
2. General Average of Grades - the average grade given by the lecturer.
3. Fail Rate - the percentage of students who failed the modules taught by the lecturer.

3.1.2. Step-1.B: Creating the Abnormal Case Matrix

In this sub-step, the abnormal-case matrix is created by selecting the student id, module id, lecturer id, and the student's abnormal situation for the abnormal cases in the student enrolment records of the university database. An abnormal case is usually indicated by a flag or a symbol in the database, as shown in Figure 4.

Figure 4: Abnormal-case matrix with case mapping

3.2. Step-2: Case Risk Analysis and Calculation

For each problematic case in the dataset, a risk weight is calculated for the three spaces SS, MS, and LS and for the associated abnormal situation. The product of this step is shown in Fig. 5. The risk was calculated for the three spaces using equations developed based on common sense and the opinion of field experts such as academic registry members, lecturers, and heads of departments. The results were then presented to the experts and evaluated by them.

Figure 5: Case-risk analysis and calculation

Risk estimation of student-space:

SR(SS) = (100 - (GPA(SID) + KDA(MID, SID)) / 2 + PAC + CAC) / 3     (Equation 1)

where SR is the risk estimation of the student space in relation to the current case; it takes the student space as input and returns a percentage value out of 100. SS is the student space, GPA is the student's cumulative GPA, SID is the student id, and KDA is the student's average mark in the knowledge domain to which the module belongs. PAC is the scaled weight (out of 100) of the potential abnormal situation, and CAC is the scaled weight (out of 100) of the current abnormal situation of the student.

Risk estimation of module-space:

MR(MS) = (100 - (MAG(MID) + MPA(MID, PID) + MLA(MID, LID)) / 3 + MFR(MID)) / 2     (Equation 2)

where MR is the risk estimation of the module in relation to the current student situation; it takes the module space (MS) as input and returns a percentage out of 100. MAG is the general average mark of the module, MID is the module identification number, MPA is the average module mark for the student's program, PID is the identification number of the student's academic program, LID is the lecturer identification number, MLA is the average mark of the module for the lecturer, and MFR is the module fail rate, i.e. the percentage of students who failed the module.

Risk estimation of lecturer-space:

LR(LS) = ((100 - LAG(LID)) + LFR(LID)) / 2     (Equation 3)

where LR is the risk estimation that comes from the lecturer side, out of 100. LS is the lecturer-space, LAG is the general average mark given by the lecturer, LID is the lecturer identification number, and LFR is the lecturer fail rate, i.e. the percentage of students who failed the modules taught by the lecturer.
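Equations 1-3 can be transcribed directly into code. The following sketch does so using the space structures from the earlier illustration; it is an illustration of the published formulas, not the authors' code.

```python
# Direct transcription of Equations 1-3, using the StudentSpace / ModuleSpace /
# LecturerSpace sketch given earlier (illustration only).
def student_risk(ss, module_domain):
    """Equation 1: SR(SS). `ss` is a StudentSpace; `module_domain` selects KDA."""
    kda = ss.domain_averages[module_domain]
    return (100 - (ss.gpa + kda) / 2
            + ss.potential_abnormal_weight + ss.current_abnormal_weight) / 3

def module_risk(ms, program_id, lecturer_id):
    """Equation 2: MR(MS). `ms` is a ModuleSpace."""
    mean_mark = (ms.general_average + ms.program_averages[program_id]
                 + ms.lecturer_averages[lecturer_id]) / 3
    return (100 - mean_mark + ms.fail_rate) / 2

def lecturer_risk(ls):
    """Equation 3: LR(LS). `ls` is a LecturerSpace."""
    return ((100 - ls.general_average) + ls.fail_rate) / 2
```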
Situation-risk estimation: the resulting situation types were provided to a set of experts (i.e., academic registry staff, advisors, and heads of department), who were asked to assign a specific weight reflecting the risk level. The risk level takes a value between 1 and 100. The net risk for each situation type is calculated by taking the average of the weights given by the experts.

3.3. Step-3: Linguistic Labelling

In this step, the final risk weights of the three spaces and of the resulting situation are given the linguistic labels H (high), M (medium), and L (low). These linguistic labels are generated using Type-1 fuzzy sets, predefined following the Wang-Mendel method [10]. In the proposed approach, the membership function of the fuzzy sets has a triangular shape, as shown in Figure 6, and is based on three parameters {a, b, c} as shown in Equation 4.

Triangle(x; a, b, c) =
  0,                  x <= a
  (x - a) / (b - a),  a <= x <= b
  (c - x) / (c - b),  b <= x <= c
  0,                  c <= x
(Equation 4: Triangular fuzzy set)

where the parameters {a, b, c} (with a < b < c) determine the corners of the triangle.

3.4. Step-4: Rule-Base Creation

In this step, each case is converted into a fuzzy rule of the form (If ... Then ...), where the linguistic labels of the three spaces are the antecedents and the linguistic label of the resulting situation is the consequent, as shown in Figure 8.

SS  MS  LS      Result
H   H   H   ->  H
(antecedents)   (consequent)

Figure 8: Rule-base creation

3.5. Step-5: Fuzzy Rule-Base Compression and Validation

The rule base extracted in the previous step could contain many rules, depending on the size of the dataset. It may contain a large amount of repetition (i.e., repeated rules) and contradiction. Contradiction here means rules that have the same antecedent with different consequents, as in the example shown in Table 4:

Table 4: Contradictory rule patterns
SS  MS  LS      Result
H   H   H   ->  H
H   H   H   ->  L
H   H   H   ->  M

To address these issues, a five-fold cross-validation [42] process was used to train the rule base and summarize it to the most dominant unique patterns. In this validation method the dataset is divided into 5 equal-size sets (D1, ..., D5). For each fold the following steps are applied:

1. One of the subsets Dn is selected as the testing set Tn and the remaining subsets are grouped into one training set.
2. The rule compression (summarization) technique [43] is applied to the training set to produce a summarized rule base.
3. The summarized rule base is applied to the test set to predict the risk level.
4. The predicted risk levels are compared with the actual risk levels from the dataset to identify the accuracy of the rule base in predicting the risk level.

The compression technique: this technique uses two measures of rule quality, "generality" and "reliability", which are used to identify the rule patterns with maximum firing strength. Generality measures the number of instances in the extracted rule base which support each rule pattern. Reliability measures the confidence level of each rule pattern [43].
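A minimal sketch of the Step-3 labelling is given below, using the triangular membership function of Equation 4. The breakpoints (a, b, c) chosen for Low, Medium and High are assumptions for illustration only; the paper does not list the actual values used.

```python
# Sketch of Step-3: triangular membership (Equation 4) and linguistic labelling.
# The breakpoints below are assumed for illustration over the 0-100 risk scale.
def triangle(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

FUZZY_SETS = {
    "L": (-50.0, 0.0, 50.0),
    "M": (0.0, 50.0, 100.0),
    "H": (50.0, 100.0, 150.0),
}

def linguistic_label(risk):
    """Return the label (L, M or H) with the highest membership degree for a risk value."""
    return max(FUZZY_SETS, key=lambda label: triangle(risk, *FUZZY_SETS[label]))
```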
In the proposed approach, generality is calculated using scaled fuzzy support, and reliability is calculated by multiplying the scaled fuzzy support by the firing strength of the rule pattern. The support of a rule pattern refers to the number of rules in the rule base that the pattern represents. The "confidence" refers to the strength of a specific rule pattern against the contradictory patterns (i.e., the other rule patterns that have the same antecedent and a different consequent) [43]. Fuzzy support is used to identify the unique rule patterns together with their occurrence in the extracted rule base. The fuzzy support can be scaled for each unique rule pattern using the total number of instances in the rule base which have the same consequent. Equation 5 shows how the scaled fuzzy support for a unique rule pattern is calculated:

scFuzzSup(RPl) = N(RPl) / (N(RPl) + N(RP'l))     (Equation 5: Fuzzy support)

where l = 1 to M is the index of the rule pattern, RPl is a unique rule pattern, e.g. (H, H, H -> M), N(RPl) is the number of instances in the extracted rule base supporting the rule pattern RPl, and N(RP'l) is the number of instances in the extracted rule base which support other patterns with the same consequent, e.g. ({M, H, H -> M}, {M, M, H -> M}, ...).

The "confidence" is a measure of the uniqueness of the pattern, as it indicates its strength against the contradictory patterns, which are the other patterns having the same antecedents but a different consequent. Equation 6 shows how confidence is calculated:

scConf(RPl) = scFuzzSup(RPl) / Co(RPl)     (Equation 6: Confidence)

where Co(RPl) is the number of instances in the extracted rule base which support the rule patterns contradictory with RPl.

The final scaled weight of the rule pattern is calculated as the product of the fuzzy support and the fuzzy confidence, as shown in Equation 7:

scWi = scFuzzSup x scConf     (Equation 7: Final scaled weight)

Each of the generated unique rule patterns is assigned the scaled fuzzy weight measure scWi, as follows:

Table 5: The scaled fuzzy weight of the unique rule patterns
SS  MS  LS      Result  scWi
H   H   H   ->  H       0.35
H   H   H   ->  L       0.02
H   H   H   ->  M       0.12

The scaled fuzzy weight of the unique rule patterns is then used to select the rules with the highest weights among each set of contradictory patterns. The result of the compression process is a summarized rule base that contains dominant and consistent rule patterns. The resulting rule base is used later to estimate the risk level for a specific combination of module, lecturer, and student.

3.6. Step-6: Compressed Rule-Base Selection

The accuracy levels of the five compressed rule bases resulting from the previous step are compared, and the compressed rule base with the highest accuracy level is selected to be included in the system.

4. Experiment and Results

4.1. Dataset

Real-life university data were used as the dataset for this paper. The dataset included student enrolment records, student mark records, module records, academic program records, and records for lecturers. It consisted of the records of 5000 students who faced problems during their studies. These problems included academic warnings, program withdrawals, program changes, and cumulative average downgradings.

4.2. Creating the Three Spaces (SS, MS, and LS)

The university uses an Oracle database for its student information system (SIS), which was used to create a view for each of the three space types. A view was also created to include the risk estimation for each space together with the risk estimation provided by the experts.

4.3. Creating the Fuzzy Rule-Base

To create the fuzzy rule base, the MATLAB Fuzzy Logic Toolkit was used to create the fuzzy sets, which were then used to create the linguistic labels for the data extracted from the three spaces.
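The compression measures of Equations 5-7 can be written compactly as code. The following sketch (an illustration, not the authors' implementation) takes a list of extracted rules, each written as (antecedent, consequent), computes the scaled support, confidence and weight of every unique pattern, and keeps the dominant pattern for each antecedent.

```python
# Sketch of the Step-5 compression (Equations 5-7). Rules are tuples such as
# (("H", "H", "H"), "M") meaning (SS, MS, LS) -> Result.
from collections import Counter
from typing import Dict, List, Tuple

Rule = Tuple[Tuple[str, str, str], str]

def compress(rules: List[Rule]) -> Dict[Tuple[str, str, str], str]:
    pattern_counts = Counter(rules)                       # N(RPl) for each unique pattern
    consequent_counts = Counter(c for _, c in rules)      # instances per consequent
    antecedent_counts = Counter(a for a, _ in rules)      # instances per antecedent

    best: Dict[Tuple[str, str, str], Tuple[str, float]] = {}
    for (antecedent, consequent), n in pattern_counts.items():
        same_consequent_others = consequent_counts[consequent] - n      # N(RP'l)
        sc_fuzz_sup = n / (n + same_consequent_others)                  # Equation 5
        contradictory = antecedent_counts[antecedent] - n               # Co(RPl)
        sc_conf = sc_fuzz_sup / contradictory if contradictory else sc_fuzz_sup  # Equation 6 (guarded)
        sc_w = sc_fuzz_sup * sc_conf                                    # Equation 7
        if antecedent not in best or sc_w > best[antecedent][1]:
            best[antecedent] = (consequent, sc_w)   # keep the dominant pattern
    return {antecedent: consequent for antecedent, (consequent, _) in best.items()}
```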
4.4. Fuzzy Rule-Base Training and Compression

A five-fold cross-validation technique was used to train the rule base: for each fold, the fuzzy rule base was divided into two subsets, training and testing, with the 80%-20% rule used to determine the size of each set. The compression technique discussed in Section 3 was applied to the training set to produce the compressed rule base. The resulting compressed rule base was then applied to the test set to predict the risk level. The predicted risk levels were compared with the actual risk levels provided by the experts to determine the accuracy of the rule base in predicting the risk level, as shown in Table 6.

Table 6: Rule base classification accuracy sample
Passed? | Predicted | Actual | Situation Type | L_Risk | L_ID   | M_Risk | M_ID | S_Risk | S_ID
1       | L         | L      | A              | M      | 210508 | H      | 1002 | M      | ****112
0       | M         | H      | D              | L      | 210125 | H      | 2511 | H      | ****210
1       | H         | H      | E              | H      | 030410 | M      | 1002 | H      | ****525
1       | H         | H      | D              | H      | 120215 | L      | 5147 | L      | ****402
1       | M         | M      | B              | M      | 130514 | H      | 1002 | M      | ****111
1       | M         | M      | F              | M      | 010816 | L      | 2142 | M      | ****237
1       | H         | H      | D              | H      | 070516 | M      | 2151 | H      | ****252
0       | M         | H      | D              | L      | 220589 | H      | 2101 | H      | ****332

The accuracy level of the test results for each fold is shown in Table 7. As shown in the table, the proposed approach achieved a best-fold classification accuracy of 92% and an average classification accuracy of 89.2%. A comparison between the proposed approach and other approaches is not straightforward, as the main focus of the proposed approach differs from those of the others: the proposed approach addresses risk estimation, while some others focus on long-term planning or support the process of academic advising as a whole. However, the results can still be compared to indicate the performance of the proposed approach. For example, the multi-agent system proposed in [22] achieved only 60% user satisfaction with its effectiveness, and user satisfaction with the intelligent advising system proposed in [4] was 77.8%, which indicates that the performance of the system proposed in this paper is acceptable.

Table 7: Five-fold accuracy test results
Fold      | Number of Rules | Accuracy
1         | 24              | 88%
2         | 25              | 91%
3         | 27              | 92%
4         | 23              | 86%
5         | 23              | 89%
Average   |                 | 89.2%

4.5. Best Compressed Rule-Base Selection

The accuracy levels of the five compressed rule bases resulting from the previous step were compared, and the compressed rule base with the highest accuracy level was selected to be included in the system, as shown in Table 8.

Table 8: Best rule set
#   SS  MS  LS      Result  scWi
1   H   H   H   ->  H       0.352
2   H   H   M   ->  H       0.311
3   H   H   L   ->  M       0.273
4   H   M   H   ->  H       0.218
5   H   M   M   ->  H       0.346
6   H   M   L   ->  M       0.438
7   H   L   H   ->  H       0.517
8   H   L   M   ->  M       0.593
9   M   L   L   ->  L       0.433
10  M   H   H   ->  H       0.214
11  M   H   M   ->  M       0.511
12  M   H   L   ->  M       0.162
13  M   M   H   ->  M       0.169
14  M   M   M   ->  M       0.283
15  M   M   L   ->  M       0.364
16  M   L   H   ->  M       0.407
17  M   L   M   ->  M       0.502
18  L   L   L   ->  L       0.584
19  L   H   H   ->  H       0.224
20  L   H   M   ->  M       0.436
21  L   H   L   ->  M       0.365
22  L   M   H   ->  M       0.156
23  L   M   M   ->  M       0.132
24  L   M   L   ->  L       0.214
25  L   L   H   ->  L       0.375
26  L   L   M   ->  L       0.325
27  L   L   L   ->  L       0.154
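To show how a selected rule base such as the one in Table 8 would be used at advising time, the following sketch combines the earlier illustrative functions: the three risk values are computed, labelled, and looked up, and a simple textual justification is returned. The fallback to "M" when no rule matches is an assumption; the paper does not specify this behaviour.

```python
# Sketch of applying the compressed rule base to a new (student, module, lecturer)
# case, reusing the earlier illustrative functions; not the authors' implementation.
def advise(ss, ms, ls, program_id, lecturer_id, module_domain, rule_base):
    sr = student_risk(ss, module_domain)
    mr = module_risk(ms, program_id, lecturer_id)
    lr = lecturer_risk(ls)
    antecedent = (linguistic_label(sr), linguistic_label(mr), linguistic_label(lr))
    risk = rule_base.get(antecedent, "M")   # assumed fallback when no rule matches
    justification = (f"student risk {sr:.0f} ({antecedent[0]}), "
                     f"module risk {mr:.0f} ({antecedent[1]}), "
                     f"lecturer risk {lr:.0f} ({antecedent[2]})")
    return risk, justification
```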
The fuzzy rules for all cases are combined in one rule-base and then compressed to extract those rules with the highest firing strength. The fuzzy logic was used to handle the uncertainty implied in the human judgment of the student case as well as to provide a transparent and interpretable mechanism for predicting the risk level of enrolling a student on a module. The approach was developed using real-life university data and achieved an acceptable level of accuracy of 92% which is expected to improve as more data is captured and used to train the rule base. Although the achieved accuracy might sound like it needs a bit of enhancement but having that the machine learning approach used for training the rule base the accuracy is expected to enhance as more data instances are included. Also, the accuracy level acceptance depends on the problem and context, especially with cases in which are molding human decisions or behavior. Although there are different approaches have been proposed in this area, this paper introduces a novel mechanism that creates three spaces to estimate the risk of the student situation associated with a specific module. The three spaces which are namely: Students, Lecturer, and Module provide a multi-angle view on the student case and makes the estimation more realistic. Also, applying the fuzzy logic provides a means to handle the uncertainty included in the human decision-making regarding module selection. AETiC 2021, Vol. 5, No. 1 25 www.aetic.theiaer.org The fuzzy rule base also provides a transparent mechanism to make the recommendation which means it doesn’t provide the risk level only but also the justification of that recommendation based on the risk level of the three spaces. References [1]. Lam, S. S. and Choi, S. P. M. (2013). Implementing an efficient preference-based academic advising system. International Journal of Applied Management Science, 5(4), pp 297–321, DOI: 10.1504/IJAMS.2013.057110. [2]. Almutawah, K. A. (2014). A decision support system for academic advisors. International Journal of Business Information Systems, 16(2), pp 177. DOI 10.1504/IJBIS.2014.062837. [3]. Daramola, O., Emebo, O., Afolabi, I. and Ayo, C. (2014). Implementation of an Intelligent Course Advisory Expert System Cased-Based Course Advisory Expert System. In (IJARAI) International Journal of Advanced Research in Artificial Intelligence, 3(5), Available: www.ijarai.thesai.org [4]. Bendakir, N. and Aïmeur, E. (2006). Using association rules for course recommendation. AAAI Workshop - Technical Report, vol. WS-06-05, pp 31–40, Available: https://www.aaai.org/Papers/Workshops/2006/WS- 06-05/WS06-05-005.pdf [5]. O’Mahony, M. P. and Smyth, B. (2007). A recommender system for on-line course enrolment: An initial study. RecSys’07: Proceedings of the 2007 ACM Conference on Recommender Systems, pp 133–136. DOI: 10.1145/1297231.1297254. [6]. Sandvig, J. and Burke, R. (2005). Aacorn: A CBR recommender for academic advising. Technical Report TR05- 015, Available: http://facweb.cs.depaul.edu/research/techreports/TR05-015.doc. [7]. Ajanovski, V. V. (2017). Guided Exploration of the Domain Space of Study Programs. In 4th Joint Workshop on Interfaces and Human Decision Making for Recommender Systems (IntRS), Available: http://ceur- ws.org/Vol-1884/paper8.pdf. [8]. Young, P. (2017). A Recommender System for Personalized Exploration of Majors, Minors, and Concentrations *, RecSys 2017 Poster Proceedings 27 (2017), Available: http://ceur-ws.org/Vol- 1905/recsys2017_poster12.pdf. [9]. Sharma, A. 
and Tiwari, N. (2013). Design and Analysis of Fuzzy based Association Rule Mining. International Journal of Computer Applications and Information Technology, 3(I), pp 12, Available: https://www.ijcait.com/IJCAIT/31/314.pdf. [10]. Wang, L.-X. X. and Mendel, J. M. (1992). Generating Fuzzy Rules by Learning from Examples. IEEE Transactions on Systems, Man and Cybernetics, 22(6), pp 1414–1427. DOI: 10.1109/21.199466. [11]. Young-Jones, A. D., Burt, T. D., Dixon, S. and Hawthorne, M. J. (2012). Academic Advising: Does it Really Impact Student Success? In Quality Assurance in Education. vol. 21, Available: www.emeraldinsight.com. [12]. Mohamed, A. (2015). A decision support model for long-term course planning. Decision Support Systems, vol. 74, pp 33–45. DOI: 10.1016/j.dss.2015.03.002. [13]. Harding, B. (2008). Students with specific advising needs. In V. N. Gordon, W. R. Habley and T. J. Grites (Eds.), Academic advising: A comprehensive handbook (2nd ed.), pp. 189–203, Jossey-Bass. [14]. Laghari, M. S. (2014). Automated Course Advising System. International Journal of Machine Learning and Computing, pp 47–51. DOI: 10.7763/ijmlc.2014.v4.384. [15]. Roushan, T., Chaki, D., Hasdak, O., Chowdhury, M. S., Rasel, A. A., Rahman, M. A. and Arif, H. (2014). University course advising: Overcoming the challenges using decision support system. 16th Int’l Conf. Computer and Information Technology, ICCIT 2013, pp 13–18. DOI: 10.1109/ICCITechn.2014.6997355. [16]. Talal Al-Nory, M. (2012). Simple decision support tool for university academic advising. Proceedings of 2012 International Symposium on Information Technologies in Medicine and Education, ITME 2012, vol. 1, pp 53–57. DOI: 10.1109/ITiME.2012.6291245. [17]. Abdullah, A.-G., Sumaia, A.-G., Fadel, A., AL-Ruhaili, F. and Thamary, A.-A. (2012). An Expert System for Advising Postgraduate Students. International Journal of Computer Science and Information Technology, 3(3), pp 4529–4532. [18]. Albalooshi, F. and Shatnawi, S. (2010). HE-Advisor: A Multidisciplinary Web-Based Higher Education Advisory System. In Global Journal of Computer Science and Technology, 10(7), pp. 37-49. [19]. Mattei, N., Dodson, T., Guerin, J. T., Goldsmith, J. and Mazur, J. M. (2014). Lessons Learned from Development of a Software Tool to Support Academic Advising. In Proceedings of the 2014 Zone 1 Conference of the American Society for Engineering Education - “Engineering Education: Industry Involvement and Interdisciplinary Trends”, ASEE Zone 1 2014. IEEE Computer Society. DOI: 10.1109/ASEEZone1.2014.6820659. [20]. Shatnawi, R., Althebyan, Q., Ghalib, B. and Al-Maolegi, M. (2014). Building A Smart Academic Advising System Using Association Rule Mining. arXiv, Available: http://arxiv.org/abs/1407.1807 AETiC 2021, Vol. 5, No. 1 26 www.aetic.theiaer.org [21]. Lin, F., Dunwei, S. L., Frank, W., Kinshuk, Z. and Mcgreal, R. (2007). e-Advisor: A Multi-agent System for Academic Advising, Available: http://io.acad.athabascau.ca/~oscarl/pub/ABSHL2007.pdf [22]. Huang, L., Wang, C. D., Chao, H. Y., Lai, J. H. and Yu, P. S. (2019). A Score Prediction Approach for Optional Course Recommendation via Cross-User-Domain Collaborative Filtering. IEEE Access, vol. 7, pp 19550– 19563, DOI: 10.1109/ACCESS.2019.2897979. [23]. Asadi, S. and Shokrollahi, Z. (2019). Developing a Course Recommender by Combining Clustering and Fuzzy Association Rules. Journal of AI and Data Mining, 7(2), pp. 249–262. DOI: 10.22044/jadm.2018.6260.1739. [24]. Nafea, S. M., Siewe, F. and He, Y. (2019). 
© 2020 by the author(s). Published by Annals of Emerging Technologies in Computing (AETiC), under the terms and conditions of the Creative Commons Attribution (CC BY) license, which can be accessed at http://creativecommons.org/licenses/by/4.0.
work_2o6cjbbw2bdj7nsiozoi6ob4ku ---- A novel expert system for objective masticatory efficiency assessment
RESEARCH ARTICLE
A novel expert system for objective masticatory efficiency assessment
Gustavo Vaccaro1☯*, José Ignacio Peláez2,3☯, José Antonio Gil-Montoya4☯
1 International Postgraduate School, School of Dentistry, Granada University, Granada, Spain, 2 Department of Languages and Computer Sciences, University of Malaga, Malaga, Spain, 3 Prometeo Project, National Secretary of Higher Education, Science, Technology and Innovation (SENESCYT), University of Guayaquil, Guayaquil, Ecuador, 4 Gerodontology Department, School of Dentistry, Granada University, Granada, Spain
☯ These authors contributed equally to this work. * fabianvaccaro@correo.ugr.es
Abstract
Most of the tools and diagnosis models of Masticatory Efficiency (ME) are not well documented or are severely limited to simple image processing approaches. This study presents a novel expert system for ME assessment based on automatic recognition of mixture patterns in masticated two-coloured chewing gums, using a combination of computational intelligence and image processing techniques. The hypotheses tested were that the proposed system could accurately relate specimens to the number of chewing cycles, and that it could identify differences between the mixture patterns of edentulous individuals prior to and after complete denture treatment. This study enrolled 80 fully dentate adults (41 females and 39 males, 25 ± 5 years of age) as the reference population, and 40 edentulous adults (21 females and 19 males, 72 ± 8.9 years of age) for the testing group. The system was calibrated using the features extracted from 400 samples covering 0, 5, 10, 15, and 20 chewing cycles.
The calibrated system was used to automatically analyse and classify a set of 160 specimens retrieved from individuals in the testing group over two appointments. The ME was then computed as the predicted number of chewing strokes that a healthy reference individual would need to achieve a similar degree of mixture, measured against the real number of cycles applied to the specimen. The trained classifier obtained a Matthews Correlation Coefficient score of 0.97. ME measurements showed almost perfect agreement considering pre- and post-treatment appointments separately (κ ≥ 0.95). A Wilcoxon signed-rank test showed that complete denture treatment for edentulous patients elicited a statistically significant increase in the ME measurements (Z = -2.31, p < 0.01). We conclude that the proposed expert system proved able and reliable to accurately identify patterns in mixture and provided useful ME measurements.
Citation: Vaccaro G, Peláez JI, Gil-Montoya JA (2018) A novel expert system for objective masticatory efficiency assessment. PLoS ONE 13(1): e0190386. https://doi.org/10.1371/journal.pone.0190386
Editor: Marco Magalhaes, University of Toronto, CANADA
Received: August 13, 2016; Accepted: December 13, 2017; Published: January 31, 2018
Copyright: © 2018 Vaccaro et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability Statement: All source code files and the MEPAT dataset will be available at the GitHub repository at: https://github.com/fabianvaccaro/perceptodent.
Funding: This work was funded by the Secretaría Nacional de Educación, Ciencia y Tecnología (SENESCYT) of the Government of Ecuador, with budget allocation No. 0099-SPP, http://www.educacionsuperior.gob.ec.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Objective evaluation of the masticatory function
Health care services for the elderly and physically disabled population are ever-increasing challenges where practitioners are required to evaluate the functional impairments of individuals in faster and more accurate ways while using less invasive methods.
One approach to this mat- ter is the objective evaluation of the human mastication, which is a complex biomechanical process that involves coordinated movements of the jaw, tongue, lips, and cheek; and one of the main functions of the stomatognathic system. [1]. Objective mastication assessment can be performed in two ways: firstly, by quantifying the changes that the food has suffered during mastication, i.e. the Masticatory Performance; and secondly, by calculating the number of chewing strokes that would be required to achieve a certain degree of food degradation, i.e. the Masticatory Efficiency [2]. The Masticatory Perfor- mance (MP) has been defined as: a measure of the comminution of food attainable under stan- dardized testing conditions [3]; and is considered an objective indicator of oral functional capabilities, widely used to measure the impact of dental treatments, to assess levels of disabil- ity and orofacial functional impairments following stroke [4,5], and has also been associated with malnutrition risk [6]. On the other hand, the Masticatory Efficiency (ME) has been origi- nally defined as the number of extra chewing strokes needed by the patient to achieve the same pulverization as the standard person [7]; however, the strict measurement of the ME is pres- ently in disuse, mainly because patients with impaired mastication would need to masticate for very large periods of time. Furthermore, it is important to notice that several studies used the terms MP and ME interchangeably while referring exclusively to the MP. Current MP assessment techniques are based on the objective quantification of the degra- dation of a test-food subjected to mastication. The degradation level is determined by measur- ing a property (colour, weight, median particle size, chemical concentration, etc.) of a piece of natural or artificial food (e.g. Optosil/Optocal TM , peanuts, ham, chewing gums, paraffin blocks, carrots, jelly gums, etc.), where the property is prone to changes related to the number of chewing strokes. The fastest and easiest routine for objective MP assessment is the mixture quantification of a two-coloured cohesive specimen subjected to mastication [8–10]. In a mixing test a test-food specimen is formed by two differently-coloured layers of chewing gum or paraffin stacked together. Previous studies suggest that there are similarities among the visual characteristics of chewing gums masticated for the same number of chewing strokes when considering young and healthy human subjects [11]. These similarities have allowed experts to subjectively iden- tify the mixture using comparison tables. An example set of masticated specimens for 3, 9, 15, and 25 chewing cycles is shown in Fig 1; where it is possible to notice that the red and white layers are mixed, to some extent, in a regular fashion. The amount of mixture reached with each chewing cycle would depend on the masticatory function of the individual and on the structural characteristics of the specimen such as the size, thickness, density, hardness, viscos- ity, and tinctures used for colouring. Several studies have proposed simple digital image analysis approaches for mixture quanti- fication that are more precise than visual inspection techniques, and modern mixing tests cur- rently focus on these kinds of procedures [8]. The first attempt to measure the mixture of food using digital image analysis employed several custom-made algorithms, but these were not fully described, hence not possible to replicate [12]. 
Later on, the magic wand tool of the Adobe Photoshop Elements1 software was used to select and count the pixels corresponding to the regions of the masticated bolus that were not mixed [8]. The “unmixed fraction” of the specimen was manually calculated using a Microsoft Excel1 spreadsheet (Microsoft Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 2 / 20 https://doi.org/10.1371/journal.pone.0190386 Corporation, One Microsoft Way, Redmond, WA, USA). The measurements provided by this procedure were highly affected by the user, meaning high variability of the results; also, it was time-consuming and difficult for clinical settings. These difficulties promoted the creation of a specialized tool called ViewGum (dHAL Soft- ware. Kifissia, Greece, www.dhal.com), which has been receiving special attention from the dental and medical communities because of its ease of usage [9,13]. ViewGum uses the Bai and Sapiro segmentation algorithm to isolate the bolus from the background of the image, and measures the Circular Standard Deviation of the Hue channel (SDHue) in the HSI colour space; however, SDHue measurements may not to be suitable for white-coloured chewing gums [9,11], thus limiting the range of potential test-foods. In a different work, the Wolfram Mathematica1 software (Wolfram Research, Champaign, IL, USA) was used to compute the custom visual feature “DiffPix” for mixing quantification [10]; nevertheless, neither the seg- mentation, nor filtering, nor the feature extraction procedure itself were clearly exposed, so the DiffPix computation approach is not available for reproduction. The problematics of masticatory performance assessment The MP quantification techniques based on digital image processing retain various weak- nesses. Firstly, most of the algorithms or tools employed for this task are not well-documented Fig 1. Example set of masticated chewing gums. https://doi.org/10.1371/journal.pone.0190386.g001 Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 3 / 20 http://www.dhal.com/ https://doi.org/10.1371/journal.pone.0190386.g001 https://doi.org/10.1371/journal.pone.0190386 [8,9,11–13]. Also, there is neither consensus about which properties of the test-food should be monitored, nor scales for the mixture. The mixture can be quantified by numerous features of the resultant bolus [9,11,14]; however, existing MP measurement approaches rely on a single feature as the MP indicator, thus leading to high variability in the measurements and severely limiting the assessment methodology to a narrow set of test-foods [11]. Furthermore, MP assessment techniques require specialized training, equipment, and considerable amounts of time. For all these reasons, a comprehensive and integral reference framework for assessing the mixture in two-coloured chewing gums is needed. Proposed approach and purpose of this work This study explores the possibility to accurately measure the level of mixture in two-coloured chewing gums subjected to mastication within a comprehensive reference scale (0 to 100%). Consequently, important questions arise: what aspect of the masticated bolus should be con- sidered as the mixture indicator? And, from other point of view: how do specimens mixed by 0%, 25%, 50% (and so on) look like? 
These are not easy questions, as there are limitless ways to describe a digital image; and on the other hand, mastication is an erratic process, thus samples mixed under similar conditions would surely present noticeable differences. To overcome these difficulties, this study redefines the MP and ME under the premise that it is possible to identify patterns in the visual characteristics of masticated two-coloured chew- ing gum specimens when considering a young and healthy reference population. Firstly, we have redefined the MP for mixing tests as “the set of measurements that characterize the state of a sample subjected to a given number of chewing strokes”, thus extending the original defi- nition to include more than one feature. On the other hand, the MP of a given specimen would not suffice to achieve a diagnosis of the masticatory function of the patient, because a reference dataset is needed for comparison. Therefore, we have also redefined the ME for mix- ing tests as “the equivalent number of chewing strokes that an individual from a healthy refer- ence population would need to achieve a similar degree of mixture under controlled experimental conditions measured against the known number of chewing strokes applied to the sample”. The key aspect of this new ME definition is the calculation of the number of chewing strokes that would produce a similar MP outcome for the reference population; there- fore, the challenge of evaluating the masticatory function of an individual can be summarized as a classification problem where the MP (observed state) of a masticated specimen is classified into a category represented by the number of chewing strokes needed by the reference popula- tion. Mathematically, the ME can be represented as: ME ¼ P T ð1Þ where ME � 0, P is the number of chewing strokes needed for an individual from a healthy reference population (P2 N,), and T is the known number of chewing strokes applied to the sample (T2 N, T > 0). Consequently, a specimen with a ME of 0 implies a total absence of mixture, while a ME of 1 implies a normal level of mixture, i.e. comparable to what a reference healthy person would achieve. Furthermore, an ME > 1 implies that the diagnosed individual chews better than the average healthy person. The ME can also be expressed as a percentage for easier interpretation and easily associated with linguistic tags as shown in Table 1. The issue of evaluating multiple visual characteristics and performing an accurate classifica- tion of the sample can be efficiently solved by the application of computational intelligence techniques for pattern identification and automatic classification. Therefore, the aim of this paper is to present and validate a novel expert system for mixture patterns recognition in Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 4 / 20 https://doi.org/10.1371/journal.pone.0190386 young and healthy individuals, and objective Masticatory Performance efficiency assessment in masticatory-compromised individuals. The hypotheses tested in this work are: 1. The proposed system can accurately classify masticated two-coloured chewing gum speci- mens into the corresponding group represented by the number of chewing cycles applied. 2. The proposed system is able to identify differences between the patterns of mixture of eden- tulous individuals prior and after treatment with complete removable dentures. 
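To make Eq 1 and the scale of Table 1 concrete, the following Python sketch computes the ME of a specimen from the predicted number of reference chewing strokes P and the known number of applied strokes T, and maps the resulting percentage onto the linguistic tags of Table 1. Snapping the percentage to the nearest tabulated level is an illustrative assumption, not a rule stated in the paper.

```python
# Sketch of Eq 1: ME = P / T, expressed as a percentage and mapped to the
# linguistic tags of Table 1. Rounding to the nearest tabulated level is an
# illustrative assumption, not part of the paper's definition.
TAGS = {  # Table 1
    0: "Totally impaired",
    25: "Impeded",
    50: "Limited",
    75: "Adequate",
    100: "Normal",
}

def masticatory_efficiency(p_reference: int, t_applied: int) -> float:
    """Eq 1: ratio of reference chewing strokes (P) to applied strokes (T)."""
    if t_applied <= 0:
        raise ValueError("T must be a positive number of chewing strokes")
    return p_reference / t_applied

def linguistic_tag(me: float) -> str:
    pct = me * 100.0
    if pct > 100.0:
        return "Better than the norm"
    # Assumption: snap to the closest level listed in Table 1.
    level = min(TAGS, key=lambda lvl: abs(lvl - pct))
    return TAGS[level]

# Worked example taken from the paper: P = 15 strokes against T = 20 applied strokes.
me = masticatory_efficiency(p_reference=15, t_applied=20)
print(f"ME = {me:.2f} ({me * 100:.0f}%), tag: {linguistic_tag(me)}")  # 0.75 (75%), Adequate
```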
Materials and methods Knowledgebase The proposed expert system involves the identification of mixture patterns in two-coloured chewing gums; these patterns are the basis of a classification procedure to compute the P score (see Eq 1) of new specimens obtained from masticatory-compromised individuals. However, it is important to notice that the structural characteristics of a chewing gum brand are related to its visual characteristics; besides, a brand’s availability may not be the same worldwide. To overcome these problems the proposed system comprises two main components, as shown in Fig 2: a calibration stage oriented to identify patterns in the mixture of a reference population for a selected chewing gum brand; and a diagnosis stage oriented determine the ME of a patient using the same brand of chewing gums. These are related by an auxiliary component called Masticatory Efficiency and Performance Assessment Technique (MEPAT) which con- tains the information generated from the calibration in order to accurately perform mastica- tory assessment tests. Calibration The calibration stage aims to identify patterns in the visual characteristics of masticated two- coloured chewing gums by analysing a broad distribution of reference samples. This process can be resource-intensive and time-consuming because it requires the participation of various reference individuals and the analysis of multiple samples per participant. In this work, the cal- ibration stage was performed in the Faculty of Dentistry of the University of Guayaquil, Ecua- dor; and was orchestrated by four trained clinicians. The calibration process comprises the test-food selection, reference population selection, sample retrieval, digitization, segmentation, feature extraction, feature selection, machine learning, and classifier validation steps; which are described in the following sections. Test-food selection. The test-food considered for this approach was a chewing gum wafer composed of two differently coloured layers. These colours were different enough to permit an adequate interpretation of the level of mixture with a mere visual inspection. Previous studies have proposed red-blue[8], green-blue [13], red-white [11], among other colour combinations. Table 1. Example of linguistic tags associated to Masticatory Efficiency levels. Linguistic tag Masticatory Efficiency level Totally impaired 0% Impeded 25% Limited 50% Adequate 75% Normal 100% Better than the norm Greater than 100% https://doi.org/10.1371/journal.pone.0190386.t001 Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 5 / 20 https://doi.org/10.1371/journal.pone.0190386.t001 https://doi.org/10.1371/journal.pone.0190386 In this work, the selected test-food was composed of two flavours of Trident1 chewing gums: watermelon (red dye) and spearmint (white dye); which are commercially available in Ecuador at the time of the experiment. The chewing gum strips came in the form of individually wrapped strips measuring 2.5 × 9 × 38 mm. Each specimen was formed by manually stacking the two pieces of chewing gum. Reference population selection. A total of eighty volunteers (N1 = 80) were recruited for the reference group (G1): 41 females aged 25 ± 4.2 years; and 39 males aged 25 ± 5.8 years. Sub- jects were students from the Faculty of Dentistry of the University of Guayaquil, Ecuador. 
The inclusion criteria were being 18 to 35 years old, having at least 28 natural teeth, Angle Class I occlusion, and a DMFT score of 2 or less. Exclusion criteria were TMJ dysfunction symptoms, orofacial pain, bruxism, tooth wear, and the presence of fixed or removable orthodontic appli- ances. Written informed consent was obtained from all participants. Formal approval through the Ethical Committee for Human and Animal Experimentation of the University of Guaya- quil was obtained for this experiment. Sample retrieval. An operator instructed the subjects in G1 to masticate five specimens by 0, 5, 10, 15, 20 chewing cycles correspondently, and silently counted the number of cycles. Sub- jects rested for 30–60 seconds between mastication sessions to prevent fatigue. The subjects expelled the resultant boluses and the operator located each bolus between two sheets of trans- parent film intended for document lamination, and immediately flattened them to a 1mm thick wafer using a wheel-driven screw press. The flattening step is important because this assembly provides resistance to manipulation and is ideal digitization [9]. A total of 400 sam- ples were collected this way. Fig 2. Proposed mastication assessment solution model. https://doi.org/10.1371/journal.pone.0190386.g002 Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 6 / 20 https://doi.org/10.1371/journal.pone.0190386.g002 https://doi.org/10.1371/journal.pone.0190386 Regarding the selection of the set of chewing cycles, previous studies indicate that this list must include 20 chewing strokes as the mean human mastication duration; and no more than 50 chewing strokes because of fatigue of the masticatory muscles [15]. On the other hand, 0 chewing strokes represents the “pre-mastication state” of the specimen, and it is obtained by retrieving the food specimen right after the subject introduces it into the mouth. This is impor- tant because saliva may play an important role during the image analysis process. Nevertheless, the task of selecting the adequate number of chewing cycles is currently a point of discussion among scholars. Digitization. All specimens were individually scanned on both sides using a Canoscan Lide 2201 flatbed scanner (300 dpi, standard calibration parameters for colour images). A flatbed scanner was chosen against digital photography because empirical experimentation showed that scanned images offered better image quality and even illumination. However, empirical tests during the calibration stage exhibited that digital photography under controlled conditions may also provide adequate images. Segmentation. The resultant digital images were segmented to isolate the area of the bolus against the background. Specimens often had irregular shapes, vague boundaries, and very heterogeneous coloration; therefore, these conditions made precise automatic segmenta- tion a nontrivial task. Active contour modelling have been used previously, but required addi- tional effort for the clinicians [13]; so a more general approach was needed. Watershed algorithms can serve to transform the brightness levels of the image into a set of clusters or “water pools” by identifying relevant markers on the image and filling their surroundings [16]; in this regard, improvements of the Watershed algorithm has been proposed in the literature which are specially tailored for complex medical imaging [17]. 
On the other hand, Watershed algorithms can become highly complex for colour image segmentation and usually identify many clusters per image, hence they often require post-processing methods. Another approach for this matter involves iterative shrinkage methods for the selection of a specific region in the image, which has been successfully used for complex medical imaging segmentation such as magnetic resonance imaging [18]; however, these are better suited for sparse level images with diffuse contours and may require supervised intervention to indicate the desired search region. In this regard, the proposed approach implemented a fully-automated colour-based seg- mentation algorithm, constructed upon the combination of Mean Shift [19,20], distance map, and K-Means classification algorithms [21]. Further details about this segmentation approach are detailed in S1 Appendix. This custom-made segmentation algorithm was used to divide the image into two regions: the bolus located in the centre of the image, which is considered as the Region of Interest (ROI); and the Background, surrounding the bolus. An example of the application of the proposed region classification process considering Mean Shift (MS) + dis- tance map (DM) + K-Means (KM) is shown in Fig 3, where it is possible to notice an improve- ment in the classification of regions, compared to the usage of KM, MS + KM, and DM + KM. Feature extraction and MP calculation. The proposed expert system approach considers the whole observed state of a specimen, i.e. the MP of the sample. Mathematically, the MP can be described as: MP ¼ðf 1 ; f 2 . . . ; fkÞ ð2Þ where fi represents the i th observed feature, and k represents the total number of features. It is important to notice that if just one feature is considered then it may lack of sufficient informa- tion about the sample; consequently, the MP comprises multiple features at once. Additionally, these features can be obtained from different extraction methodologies that are already Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 7 / 20 https://doi.org/10.1371/journal.pone.0190386 acknowledged by clinicians, e.g.: pixel counts, histogram analysis, etc. [10,13]. Under this premise, a custom set of feature extraction models was applied over the ROI of each pair of images corresponding to the same specimen. Feature extraction models included the mean and the absolute variance of the colour pixels; and the absolute variance, skewness, energy, entropy, and highest peaks of the colour histograms. These were computed for each compo- nent of the RGB, CIE-L�u�v� [22], HSI, and Normalized RGB colour spaces separately (see S2 Appendix). Additionally, the Circular Variance of the Hue channel of the HSI colour space was computed [9,13]. Consequently, the feature extraction stage considered a total of 121 fea- tures obtained from different image processing methodologies and colour space models (10 methods × 12 channels + CVOH). Heuristic extraction models such as Binary Particle Swarm Optimization or advanced texture characterization like Wavelet-based methods were not con- sidered for this experiment because of their high computational times compared to simple pixel and histogram feature extraction models; nevertheless, further improvement of the pro- posed solution should make extensive use of these kind of advanced characterization approaches [23,24]. The selection of the colour spaces was based on empirical experience. 
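As a rough illustration of the pixel- and histogram-level feature families listed above, the sketch below computes, for a segmented ROI, the absolute variance of a colour channel, the entropy of its 256-bin histogram, and the circular variance of the Hue channel. The formulas follow standard definitions and are assumptions insofar as the paper's exact feature definitions are given in its S2 Appendix and may differ in detail.

```python
# Sketch of three of the feature families described above, computed on a
# segmented ROI. Formulas follow standard definitions (variance, Shannon
# entropy of a 256-bin histogram, circular variance of an angular channel).
import numpy as np

def channel_variance(channel: np.ndarray, mask: np.ndarray) -> float:
    """Absolute variance of the pixel values inside the ROI mask."""
    return float(np.var(channel[mask]))

def histogram_entropy(channel: np.ndarray, mask: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (bits) of the ROI histogram of a single channel."""
    hist, _ = np.histogram(channel[mask], bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def circular_variance_of_hue(hue_deg: np.ndarray, mask: np.ndarray) -> float:
    """Circular variance of the Hue channel: 1 - |mean resultant vector|."""
    theta = np.deg2rad(hue_deg[mask].astype(float))
    r = np.hypot(np.cos(theta).mean(), np.sin(theta).mean())
    return float(1.0 - r)

# Tiny synthetic example: a 4x4 patch whose upper-left corner has a different hue.
hue = np.zeros((4, 4)); hue[:2, :2] = 180.0            # hue in degrees
red = np.full((4, 4), 200, dtype=np.uint8); red[:2, :2] = 90
roi = np.ones((4, 4), dtype=bool)                      # ROI mask from segmentation
print(channel_variance(red, roi), histogram_entropy(red, roi), circular_variance_of_hue(hue, roi))
```

In practice, functions of this kind would be evaluated per channel of the RGB, CIE-L*u*v*, HSI, and Normalized RGB representations of both scanned sides of a specimen, yielding the multi-feature MP vector of Eq 2.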
In this case the original images are represented in the RGB colour space, which is used for fast representation of 256 shades of Red, Green, and Blue. The CIE-L*u*v* is an easily computable transformation of the CIE XYZ (Tristimulus) colour space [25], extensively used in graphical computing. The HSI (Hue, Saturation, and Intensity) colour space is a cylindrical-coordinate representation of the points in the RGB with applications in computer vision [24], and previous studies suggest it is useful for characterizing the mixture of chewing gums [9]. Finally, the Normalized RGB colour space is obtained from the RGB by a normalization procedure [26]; the influx of the brightness is diminished by the normalization, so it is less susceptible to changes related to the light source.
Fig 3. Comparison of different automatic segmentation methods applied over chewing-gum identification. The goal of the segmentation process was to discriminate the chewing gum bolus in the centre of the image against the background. Three segmentation methods were employed: Mean Shift (MS), Distance Map (DM) and K-Means (KM).
Feature selection. The large number of features increases the chances of accurately characterizing a sample; however, it hinders the effectiveness of the pattern recognition processes that will be applied in further stages. It is possible to discard some of the features by computing a relevancy score (q) associated with each extracted feature, computed as:

q_i = \frac{1}{2}\left( |\rho(F_i)| + \frac{\gamma(F_i)}{\binom{n}{2}} \right) \quad (3)

where F_i represents the set of measurements obtained from extraction of the i-th feature, ρ(F_i) is the coefficient of correlation with the number of chewing cycles calculated with Spearman's rho, γ(F_i) is the number of statistically different pairs of chewing-stroke counts that the i-th feature can discriminate, and n is the number of different chewing-stroke counts (n = |C|). The value γ(F_i) is calculated with ANOVA, considering the number of chewing strokes as the fixed factor, corrected with post hoc Bonferroni, and considering that:

0 \le \gamma(F_i) \le \binom{n}{2} \quad (4)

A feature can be discarded if the associated q score is lower than 0.5. The above-mentioned feature selection process may serve to reduce the size of the Artificial Neural Network model that will be used for pattern recognition in this experiment. Additional dimensionality reduction can be achieved by applying principal component analysis (PCA) to the set of non-discarded features by converting the set of possibly correlated features into a set of linearly uncorrelated principal components [27]. Nonetheless, previous experimentation showed that dimensionality reduction using PCA may not necessarily improve the pattern recognition performance of this solution. In this regard, PCA results for the non-discarded features are shown in the Results section for comparison purposes. Machine learning.
The information extracted from each specimen (S) during the calibra- tion phase was summarized as a 2-touple consisting of the MP of the specimen and the number of chewing strokes (t), such that: S ¼ðMP; tÞ ð5Þ Then, the specimens were grouped as: Si ¼fSjS ¼ðMP;xÞ^ x ¼ tig ð6Þ where Si represents the set of MPs of all specimens that were masticated by ti chewing strokes. The proposed methodology used an artificial neural networks (ANN) algorithm for pattern identification [28–31]. ANNs are rough mathematical models of biological neurons where electrical signals are represented as numerical values; they mimic some features of biological brains, especially the ability to learn, i.e. to acquire the ability of processing information in cer- tain patterns [32]. The chosen ANN architecture was the Multilayer Perceptron (MLP) [33]. The MLP maps a set of inputs to a set of desired outputs through multiple layers of nodes; Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 9 / 20 https://doi.org/10.1371/journal.pone.0190386 where each node is an artificial neuron. The basic MLP structure consists of an Input layer, an Output layer, and any number of Hidden layers [24,34–36]. In our case, the MLP structure consisted of k nodes in the Input layer (where k is the num- ber of features extracted from the specimens), 1 node in the Output layer (binary: true or false), and a variable number h of nodes in the Hidden Layer (k/3 � h� k). A set of MLPs was constructed for each Si with the task of determining if the sample was masticated by pi chewing strokes and considering different numbers of hidden neurons. The best number of neurons h in the Hidden layer was computed by sequentially increasing by 1 the value of h from k/3 to k, and executing 10 trainings per h value. Inputs and outputs were obtained by breaking the information (i.e., the S) of each sample in two: the MP was considered the input, and the p as the output. The entire data set was ran- domly divided in three groups for each training execution: 40% for the Training Group (TG), 30% for the Validation Group (VG), and 30% for the Testing Group (SG). Each MLP was trained using the data in the TG, and using the VG as a reference to stop overfitting [37]. Classifier validation. The proposed system fed each trained network with the MP inputs from the SG group, and compared the predicted outputs (the P score) against the known num- ber of chewing strokes. Then, the system computed the Matthews Correlation Coefficient (MCC) to assess the performance of a trained network for each execution [38]. The MCC is a measurement of the quality of binary classification that considers true and false positives and negatives, and can be regarded as a correlation coefficient between the observed and predicted classifications, where MCC = 1 represents a perfect prediction, MCC = 0 represents a random equivalent prediction, and MCC = -1 represents a complete disagreement between prediction and observations. 
The MCC was computed as follows:

MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \quad (7)

where TP represents the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. The system considered an MLP as suitable if its MCC was over 0.95, and chose the MLP with the highest MCC. The calibration stage would be considered unsuccessful if no suitable MLPs were found after iterating through all the possible h values. The system assembled the set of suitable MLPs for each p_i as a unique classifier in the form of a binary cascade: new samples were classified for S_1, then for S_2, and so on. The system discarded any sample that could not be classified into any S_i group.
The MEPAT
The information obtained from the calibration stage includes details about the test-food, the features that best characterize it, relevant data about the experimental procedure, and the trained classifier. To perform diagnostic analyses over new specimens, a clinician operator must follow the same sample retrieval procedure, utilize the same feature extraction methods, and feed the trained classifier with information about a new sample formatted in the right way. This paper proposes a new standardized representation of the information, procedures, and tools required to perform Masticatory Efficiency and Performance diagnoses, named Masticatory Efficiency and Performance Assessment Technique (MEPAT). With this, the proposed solution explores the possibility to standardize the information resulting from a calibration execution following these objectives:
1. To be written in a comprehensive markup language.
2. To contain relevant information about the calibration experiment.
3. To contain all the information needed to execute a diagnosis over a new specimen.
4. To be portable between different users and devices.
5. To provide scalability for new feature extraction procedures.
6. To be open-source.
The MEPAT follows a custom-made XML structure (see S1 File), which is software-independent, and can be easily stored and transferred. The MEPAT can be graphically represented as shown in Fig 4; and mathematically consists of a tuple:

MEPAT = \langle TF, ES, CH, CLS, OP, PER \rangle \quad (8)

where TF represents the information about the test-food, ES represents the Experimental Settings, CH represents the set of selected features that characterize the test-food, CLS represents the trained classifier, OP represents the information about the Operator that orchestrated the calibration stage, and PER represents the performance of the overall MEPAT. On the other hand, additional information such as a Unique Identifier (UID), creation date, and upload date may be used for synchronization purposes. Additional information about the MEPAT components can be found in S3 Appendix. The resultant information obtained from the calibration process was included in a MEPAT structure and used in further stages of this work.
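As an illustration of how a calibration result could be serialized into a MEPAT-like XML document, the Python sketch below assembles the tuple components of Eq 8 (TF, ES, CH, CLS, OP, PER) together with a UID and creation date using only the standard library. The element and attribute names are hypothetical placeholders; the authoritative schema is the one provided in S1 File.

```python
# Illustrative sketch of serializing the MEPAT tuple of Eq 8 to XML with the
# Python standard library. Element and attribute names are placeholders; the
# actual schema is defined in the paper's S1 File.
import uuid
import datetime
import xml.etree.ElementTree as ET

def build_mepat(test_food, settings, features, classifier_blob, operator, performance):
    root = ET.Element("MEPAT", uid=str(uuid.uuid4()),
                      created=datetime.date.today().isoformat())
    ET.SubElement(root, "TF").text = test_food          # test-food description
    ET.SubElement(root, "ES").text = settings           # experimental settings
    ch = ET.SubElement(root, "CH")                      # selected feature codes
    for f in features:
        ET.SubElement(ch, "feature").text = f
    ET.SubElement(root, "CLS").text = classifier_blob   # serialized trained classifier
    ET.SubElement(root, "OP").text = operator           # calibration operator
    ET.SubElement(root, "PER").text = str(performance)  # overall performance (e.g. MCC)
    return ET.tostring(root, encoding="unicode")

print(build_mepat("Trident red/white two-layer wafer", "flatbed scan, 300 dpi",
                  ["VhH", "EhH"], "base64-encoded MLP cascade", "operator-01", 0.97))
```

A diagnosis client could then parse such a document to recover the feature list (CH) and the trained classifier (CLS) before analysing a new specimen, as described in the Diagnosis section below.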
Diagnosis The ultimate purpose of the calibration stage was to provide the necessary information to per- form an adequate diagnosis of the masticatory function of an individual. To do so, the system employed a MEPAT to execute a single diagnosis mixing-test, which is defined as a calibrated methodology for assessing the masticatory function of an individual using a single specimen of a two-coloured chewing gum. Fig 4. Graphical representation of the MEPAT XML schema. https://doi.org/10.1371/journal.pone.0190386.g004 Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 11 / 20 https://doi.org/10.1371/journal.pone.0190386.g004 https://doi.org/10.1371/journal.pone.0190386 MEPAT selection. This experiment considered the MEPAT that was created during the previous calibration execution because it contained all the information necessary to perform single diagnosis mixing-tests with the available resources at the University of Guayaquil in Ecuador. Clinical procedure and sample retrieval. A total of 40 volunteers (N2 = 40) were recruited for the testing group (G2): 21 females aged 73 ± 8.7 years; and 19 males aged 71 ± 9.1 years. Subjects were patients from the dental prosthetics clinic of the Faculty of Dentistry of the University of Guayaquil, Ecuador. The inclusion criteria were being older than 60 years old and complete edentulism. Exclusion criteria were TMJ dysfunction symptoms, orofacial pain, and severe cognitive impairment. Written informed consent was obtained from each subject after a full explanation of the research project. The Ethical Committee for Human and Animal Experimentation of the University of Guayaquil provided a formal approval for this experiment. Four samples were obtained from each patient: first, patients received two consecutive tests without wearing dental prosthesis; then, patients received the next two tests 30 days after com- plete removable dentures were fitted during a follow-up appointment. The overall procedure for sample retrieval for was conducted as follows: 1. An operator provided the patient with a specimen of the selected test-food. 2. The operator instructed the patient to masticate the test-food specimen by 20 chewing strokes on the preferred chewing side (notice that the largest number of chewing cycles dur- ing the calibration stage was 20). 3. The operator monitored the patient while silently counting the number of chewing strokes. 4. The masticated specimen was retrieved when 20 chewing cycles were achieved. 5. The operator put the specimen between two sheets of transparent film intended for docu- ment lamination. 6. The operator flattened the specimen to a 1mm thick wafer using a calibrated press. 7. The pressed wafer was scanned for both sides. Digital image analysis. The system segmented all the digital images obtained from the samples using the MS + DM + KM procedure (see S1 Appendix); then, the system extracted a set of features following the instructions stored in the CH component of the MEPAT. Classification. The proposed system extracted a set of features from each sample follow- ing the instructions in the CLS component of the MEPAT. Then, the classifier categorised each sample in one of the classes related to a number of chewing strokes. This classification provided the number of chewing strokes that a healthy reference individual would need to achieve a similar degree of mixture, i.e. the P score of the sample. Masticatory Efficiency quantification. 
The system computed the ME correspondent to each sample using Eq 1, considering T = 20 and the P score calculated in the Classification step. For instance, if a sample scored a P of 15 then the corresponding ME would be: ME = (15/20) × 100% = 75%. Statistical analysis Statistical analyses were performed on MATLAB 2015a (The MathWorks Inc., MA, USA) using the Statistic Toolbox. First, the Mathews Correlation Coefficient [38] was used for vali- dating the pattern identification performance for G1. Secondly, the inter-rater agreement Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 12 / 20 https://doi.org/10.1371/journal.pone.0190386 (consensus) of consecutive ME measurements of the G2 was verified using the Cohen’s Kappa statistic, considering the initial and follow-up appointments separately [39]. Thirdly, the differ- ences between the ME measured prior and after treatment with complete removable prosthe- ses (first appointment and follow-up appointment respectively) were evaluated using the Wilcoxon signed-rank test, considering the highest ME value from the two measurements. The performance of the proposed ANN classification approach was compared against a sin- gle-feature classification methodology. The goal was to verify the practicality of implementing a complex classification procedure instead of computing the ME directly from single MP mea- surements obtained from traditional methods. To do so we implemented a binary cascade to determine the closeness of each sample to the appropriate number of chewing strokes by com- puting its standard score (z-score) as the number of standard deviations by which the MP mea- surement is above the mean. The z-score has been used previously by Halazonetis et al. [13] to diagnose the state of the masticatory function by comparing the MP (CVOH, see S2 Appendix) measurements of a given sample against a known MP distribution. Mathematically, the z- score was computed as: zi;T ¼ mpi � mpT sT ð9Þ Where zi,T is the z-score of the i-th sample computed for T chewing strokes, mpi is the MP measured from a single feature, mpT is the mean MP value computed from the Training Group, and σT is the standard deviation. Then, samples from the Testing Group were pre-clas- sified as part to the T group if |zi,T| � 0.25; hence, the core binary classification performance per T group was computed using the MCC score. Finally, the samples were classified in the T group that provided the lowest absolute z-score; i.e., where the MP value was closest to the mean. Results Calibration stage This study retrieved a total of 400 specimens during the calibration stage (800 digital images). The complete calibration process required a combined total of 156.05 hours, which corre- sponded to 9 sessions of sample retrieval (~4 hours per session with 4 operators), and one ses- sion of image processing and classifier training (3.86 hours) which was performed on an Intel Core i7-5930K PC with 32GB of RAM. The Table 2 provides detailed information about the time required for the calibration phase; in this regard, the mastication, resting and digitization, and image processing steps accounted for 64.5%, 33%, and 2.5% of the calibration execution time respectively. A total of 35 features with a q score above 0.5 were selected as good mixture characterizers (calculated with Eq 3). On every case, the Kolmogorov-Smirnov test confirmed the normality Table 2. Distribution of time shares of the calibration stage for 400 samples. 
Execution time per sample in seconds: average (std. dev.) Total execution time in hours Time share Mastication 90.6 (19.1) 100.7 64.5% Digitization a 52.9 (4.2) 51.5 33.0% Image processing 3.5 (1.1) 3.9 2.5% Total 178.6 (5.8) 156.1 100.0% a Also includes the resting time https://doi.org/10.1371/journal.pone.0190386.t002 Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 13 / 20 https://doi.org/10.1371/journal.pone.0190386.t002 https://doi.org/10.1371/journal.pone.0190386 of the distribution per chewing stroke (p < 0.05); significant differences among mixing states were confirmed with ANOVA corrected with post hoc Bonferroni (p < 0.05); and showed moderate-to-high correlation with the number of chewing strokes (| ρ | > 0.5; p < 0.05). The PCA executed over the set of selected features indicated that 3 principal components (PC1 to PC3) explained 91.301%, 4.488%, and 4.211% of the variance respectively. The Table 3 lists the selected features along their corresponding q scores and PCA factors rotated with Promax rotation method. Classifiers trained with all the 35 selected features performed slightly better than those trained solely with the first 3 principal components after 200 iterations, although no significant differences were found between training groups (p = 0.412). The classifier that showed the best Table 3. List of features selected as mixture state characterizers, showing the model, the colour space channel, the relevancy q score, and the PCA factors for the first 3 principal components. Feature extraction model Channel q PC1 PC2 PC3 A. Variance of the histogram H 0.841 0.99 -0.02 -0.04 Entropy of the histogram H 0.827 0.01 0.00 0.00 Value of the 1st peak of the histogram u 0.825 0.00 0.35 0.00 Value of the 1st peak of the histogram B 0.820 0.00 0.00 0.08 A. Variance of the pixels G 0.805 0.00 0.00 0.00 Value of the 1st peak of the histogram S 0.802 0.00 0.00 0.00 Value of the 1st peak of the histogram Rn 0.800 0.00 0.00 0.00 A. Variance of the pixels u 0.799 0.00 0.00 0.00 A. Variance of the pixels S 0.795 0.11 0.01 0.19 A. Variance of the pixels Rn 0.786 0.00 0.00 0.00 A. Variance of the pixels Gn 0.783 0.70 0.26 0.10 A. Variance of the pixels L 0.776 0.00 0.00 0.00 Value of the 1st peak of the histogram G 0.774 0.00 0.00 0.00 Value of the 1st peak of the histogram Gn 0.772 0.01 -0.05 0.25 A. Variance of the pixels B 0.772 0.01 -0.06 0.27 Skewness of the histogram H 0.769 0.00 -0.01 0.28 Mean of the pixels H 0.706 0.00 0.00 0.00 Value of the 1st peak of the histogram L 0.705 0.00 0.00 0.00 Value of the 1st peak of the histogram Bn 0.692 0.00 0.00 0.00 A. Variance of the pixels Bn 0.671 -0.01 -0.05 0.61 Entropy of the histogram R 0.657 0.00 0.00 0.00 Entropy of the histogram I 0.634 0.51 0.00 0.00 A. Variance of the histogram R 0.625 -0.02 -0.03 0.41 Entropy of the histogram u 0.625 -0.02 -0.08 0.40 Entropy of the histogram G 0.624 -0.03 -0.05 0.46 A. Variance of the histogram B 0.623 0.00 0.00 0.00 A. Variance of the histogram G 0.604 0.00 0.00 0.00 Entropy of the histogram S 0.600 0.00 0.00 0.00 A. Variance of the histogram I 0.599 -0.03 0.23 0.32 Entropy of the histogram L 0.598 0.00 0.00 0.00 Entropy of the histogram B 0.595 0.00 0.00 0.00 A. Variance of the histogram S 0.590 -0.02 0.93 -0.10 A. 
Variance of the histogram L 0.576 0.00 0.00 0.00 Entropy of the histogram Gn 0.571 0.00 0.01 0.00 Entropy of the histogram Rn 0.558 0.00 0.00 0.74 https://doi.org/10.1371/journal.pone.0190386.t003 Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 14 / 20 https://doi.org/10.1371/journal.pone.0190386.t003 https://doi.org/10.1371/journal.pone.0190386 overall performance was trained using all the features listed in Table 3, with 16 neurons in the hidden layer (h = 16), and was obtained after 110 iterations. Further performance details of the resultant classifier are listed in Table 4. On the other hand, the core performances of single-feature classifiers are detailed in S4 Appendix. The name of each features is expressed using the corresponding Mixture Feature Code (MFC, see S3 Appendix). The MCC score was computed per group to better visualize the ability of the classifier to differentiate between numbers of chewing strokes. Finally, the global classification score (including T = 0, . . . 20) is presented. Diagnosis stage The complete diagnosis stage required a combined total of 6.23 hours, which corresponded to two sessions of sample retrieval (~2.6 hours per session with 4 operators), and two sessions of image processing and automatic classification (~ 12 minutes); with an average execution time per patient of 4.5 minutes. Some operators reported difficulties while counting the chewing cycles; in those cases, the operator asked an assistant for help and the final number of chewing cycles was determined by agreement. The Cohen’s Kappa statistic showed that repeated measurements of ME for the G2 group showed almost perfect agreement, considering pre- and post-treatment appointments sepa- rately (κ� 0.95). Furthermore, a Wilcoxon signed-rank test showed that a complete denture treatment for edentulous patients elicited a statistically significant increase in the ME measure- ments of the individuals (Z = -2.31, p < 0.01). Additionally, the Absolute Variance of the His- togram of the Hue channel (VhH) was evaluated separately for comparison reasons [11]; in this regard, complete denture treatment for edentulous patients elicited a statistically signifi- cant increase in VhH measurements (p < 0.001). The mean, median, and standard deviations of ME and VhH measurements are listed in Table 5 along with the corresponding linguistic tag associated to the ME level (see Table 1). Table 4. Performance details derived from the confusion matrix of the trained best trained classifier obtained from the calibration stage. Confusion matrix component Score Sensitivity 0.98 Specificity 0.99 Accuracy 0.99 MCC 0.97 https://doi.org/10.1371/journal.pone.0190386.t004 Table 5. Masticatory Efficiency (ME) and absolute variance of the histogram of the Hue (VhH) of edentulous individuals measured prior and after treatment with complete dentures. 
Statistic ME ME level tag VhH Prior treatment Mean 0.26 Impeded 10.26×106 Median 0.25 Impeded 9.56×106 Standard deviation 0.22 - 1897.01 Mode 0.50 Limited 10.11×106 After treatment Mean 0.71 Adequate 24.05×106 Median 0.75 Adequate 23.49×106 Standard deviation 0.23 - 3314.78 Mode 0.75 Adequate 23.99×106 https://doi.org/10.1371/journal.pone.0190386.t005 Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 15 / 20 https://doi.org/10.1371/journal.pone.0190386.t004 https://doi.org/10.1371/journal.pone.0190386.t005 https://doi.org/10.1371/journal.pone.0190386 Discussion This paper introduced new definitions for ME and MP. The differences between ME and MP were clearly distinguished, and the MP was used as a component for the calculation of the ME. Additionally, a reference scale for the ME is presented for the first time. The calibration stage required most of the experimental time and resources, as it involved a complex (yet easier than traditional) clinical execution. On the other hand, the average execu- tion time needed to obtain a full ME diagnosis from a patient was 5 minutes, which is consid- erably fast for a clinical setting. The disparity between calibration and diagnosis stages was predictable, as the calibration stage required more samples and more complex computational processing steps. The visual features selected in this experiment as characterizers of the mixture were com- puted using various deterministic algorithms applied over a broad range of colour channels. Two feature reduction approaches were followed, first, a relevance-based method selected a total of 35 bare features, implying that the mixture may be assessed by more than a single fea- ture, and that some features are better than others at characterizing the mixture. The variance of the histogram of the Hue channel of the HSI colour space provided the best overall rele- vancy, while the entropy of the histogram of the normalized-red channel of the normalized RGB colour space provided the lowest acceptable relevancy. On the other hand, a PCA approach selected 3 PCs that accounted for 99.9% of the variance, thus suggesting that most of the features provided little-to-none new variance information to the model. For both feature selection approaches pattern identification and classifier training showed very high performance scores, with an MCC = 0.97 for the best case using the 35 bare features as inputs. This suggest that the proposed expert system was reliable at identifying mixture pat- terns for the selected test-food. On the other hand, results at S4 Appendix show that single- feature classifiers performed poorly in comparison to ANN-based classifiers for this purpose. The best single-feature classifier employed the EhH feature as the MP indicator with global MCC = 0.321. However, it is interesting to notice that many single-feature classifiers provided good core classification performance when tasked to identify samples masticated for 20 chew- ing strokes (T = 20); in this case Nhu, NhB, NhGn, Ehu, EhGn, VhGn, P2H, NhRn, NhL, NhG, EhH, V1H, NhS, NhI, Nhv, VhB scored an MCC � 0.5, where the Nhu obtained the highest core classification performance score with MCC = 0.660. Diagnosis stage results showed that ME of edentulous individuals were significantly higher after receiving treatment with complete dentures (p < 0.01), implying an increase in the masti- cation process outcome from “impeded” to “adequate” (see Table 1). 
The proposed methodology can be improved in future works by strengthening the feature extraction and selection processes, as one of the key factors when computing the ME via pattern recognition is the quality of the MP indicators and the amount of relevant information that these provide to the model. Therefore, more sophisticated mixture quantification approaches such as texture analysis and wavelet transformations may significantly improve the performance of the proposed system. Also, better feature selection approaches may help to reduce the necessity of large datasets of samples during the training stage.

It could be argued that efforts to standardize MP evaluation on a worldwide scale may not be viable, as there are significant differences in digitization equipment and test-food availability among countries. Differences in the digitization equipment may produce undesirable effects on the classifier outcome, although this phenomenon has not been studied in depth, and the proposed calibration model can be easily adapted to handle different kinds of digitization devices. Additionally, choosing the right test-food was a crucial task, as it greatly influences the outcome of the masticatory assessment [14]. A very useful list of specifications for specimens intended for use in a two-colour mixing ability test was presented by Schimmel et al. (2015) [9]. In this regard, we recommend that specimens must have two different colours that mix when chewed, be sugar-free, be easy to chew, must not have a hard coating, and should not stick to artificial dentures. Specially made test-foods for masticatory performance assessment produced by Lotte (Lotte Co., Ltd., Tokyo, Japan) may be acquired in Japan and Korea, but they are not available worldwide. Nonetheless, we consider it plausible to find suitable specimens for mixing tests most of the time.

Further applications of the proposed expert system may be executed in two different scenarios: the calibration stage should be performed within a research context with more resources, and the diagnosis stage may be performed in a clinical practice context, linked by the sharing of experimental information in the form of a MEPAT.

Shortcomings of the proposed method

This study considered only one colour combination and brand of chewing gums; hence, further studies may include other colour combinations and brands available in other regions and countries. In addition, the study sample for the diagnosis stage should be extended to include more masticatory-compromising pathologies and treatments. In our case, treatment with complete oral dentures was chosen because it represents a large portion of the daily routine of dental practice in Ecuador, and Masticatory Efficiency evaluation provides useful diagnostic information and relevant data for treatment enhancement.

It is important to notice that the proposed system involved algorithms designed to reduce the influence of the operator during the image processing steps; therefore, subsequent analyses of the same set of images will always provide the same feature measurements. Nevertheless, pattern identification involved an aleatory component, which was required to ensure the robustness of the classifier. This means that subsequent executions of the machine learning step would provide different results each time.
In this experiment, the classifier validation step diminished the randomness of the calibration execution by requiring a high MCC score and the selection of the classifier with the best performance.

The proposed methodology comprises only simple pixel and histogram feature extraction models, so certain chewing gum colour combinations can affect the quality of the features. This phenomenon has been reported by Halazonetis et al., where the Circular Variance of the Hue channel performed poorly with colour combinations that included similar Hue values, although these were easily differentiable by direct observation [13]. In this regard, the present study employed red and white chewing gum samples; the mixture of these two colours produced pink shades that may have affected the performance of Hue-based features.

Conclusions

Within the limitations of this study, we conclude that the proposed expert system proved able to reliably and accurately identify patterns in the mixture and subsequently classify masticated two-coloured chewing gum specimens into the corresponding group, represented by the number of chewing cycles applied, considering a healthy reference population. Furthermore, the expert system proved able to identify differences in the ME of edentulous individuals with and without total prosthesis. Finally, we propose the inclusion of the newly presented ME definition and reference scale in further research studies in this field.

Supporting information

S1 Appendix. Border-preserving region segmentation and classification. (DOCX)
S2 Appendix. Feature extraction models. (DOCX)
S3 Appendix. Detailed information about the MEPAT construction. (DOCX)
S4 Appendix. Core classification performance of single-feature classifiers. (DOCX)
S1 File. XML schema for the MEPAT. (XML)

Acknowledgments

We express our gratitude to the volunteers who participated in this study. Our deepest appreciation goes to the Prometeo Project of the Secretaría de Educación Superior, Ciencia, Tecnología e Innovación (SENESCYT) of the Government of Ecuador for supporting this research work. We would also like to show our gratitude to the directive board of the Faculty of Dentistry of the University of Guayaquil for providing us with many of the necessary resources for this study.

Author Contributions

Conceptualization: Gustavo Vaccaro, José Antonio Gil-Montoya.
Data curation: José Ignacio Peláez, José Antonio Gil-Montoya.
Formal analysis: Gustavo Vaccaro, José Ignacio Peláez.
Funding acquisition: José Ignacio Peláez.
Investigation: Gustavo Vaccaro, José Antonio Gil-Montoya.
Methodology: Gustavo Vaccaro.
Project administration: José Ignacio Peláez.
Resources: José Ignacio Peláez.
Software: Gustavo Vaccaro, José Ignacio Peláez.
Supervision: José Ignacio Peláez, José Antonio Gil-Montoya.
Validation: José Antonio Gil-Montoya.
Visualization: Gustavo Vaccaro.
Writing – original draft: Gustavo Vaccaro.
Writing – review & editing: José Ignacio Peláez, José Antonio Gil-Montoya.

References
1. Ohira A, Ono Y, Yano N, Takagi Y. The effect of chewing exercise in preschool children on maximum bite force and masticatory performance. Int J Paediatr Dent. 2012; 22: 146–53. https://doi.org/10.1111/j.1365-263X.2011.01162.x PMID: 21781200
2.
Bates J, Stafford G, Harrison A. Masticatory function—a review of the literature. III. Masticatory perfor- mance and efficiency. J Oral Rehabil. 1976; 3: 57–67. Recuperado: http://onlinelibrary.wiley.com/doi/ 10.1111/j.1365-2842.1976.tb00929.x/full PMID: 772184 Objective masticatory efficiency assessment PLOS ONE | https://doi.org/10.1371/journal.pone.0190386 January 31, 2018 18 / 20 http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0190386.s002 http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0190386.s003 http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0190386.s004 http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0190386.s005 https://doi.org/10.1111/j.1365-263X.2011.01162.x https://doi.org/10.1111/j.1365-263X.2011.01162.x http://www.ncbi.nlm.nih.gov/pubmed/21781200 http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2842.1976.tb00929.x/full http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2842.1976.tb00929.x/full http://www.ncbi.nlm.nih.gov/pubmed/772184 https://doi.org/10.1371/journal.pone.0190386 3. The Glossary of Prosthodontic Terms. J Prosthet Dent. Elsevier; 2005; 94: 10–92. https://doi.org/10. 1016/j.prosdent.2005.03.013 PMID: 16080238 4. Dai R, Lam OLT, Lo ECM, Li LSW, Wen Y, McGrath C. Orofacial functional impairments among patients following stroke: a systematic review. Oral Dis. 2014; https://doi.org/10.1111/odi.12274 PMID: 25041135 5. Yamashita S, Hatch JP, Rugh JD. Does chewing performance depend upon a specific masticatory pat- tern? J Oral Rehabil. 1999; 26: 547–53. Recuperado: http://www.ncbi.nlm.nih.gov/pubmed/10445472 PMID: 10445472 6. Gil-Montoya JA, Subirá C, Ramón JM, González-Moles MA. Oral health-related quality of life and nutri- tional status. J Public Health Dent. 2008; 68: 88–93. https://doi.org/10.1111/j.1752-7325.2007.00082.x PMID: 18248335 7. Manly RS, Braley LC. Masticatory Performance and Efficiency. J Dent Res. 1950; 29: 448–462. https:// doi.org/10.1177/00220345500290040701 PMID: 15436916 8. Schimmel M, Christou P, Herrmann F, Müller F. A two-colour chewing gum test for masticatory effi- ciency: development of different assessment methods. J Oral Rehabil. 2007; 34: 671–8. https://doi.org/ 10.1111/j.1365-2842.2007.01773.x PMID: 17716266 9. Schimmel M, Christou P, Miyazaki H, Halazonetis D, Herrmann FR, Müller F. A novel colourimetric technique to assess chewing function using two-coloured specimens: validation and application. J Dent. 2015; 43: 955–964. https://doi.org/10.1016/j.jdent.2015.06.003 PMID: 26111925 10. Weijenberg RAF, Scherder EJA, Visscher CM, Gorissen T, Yoshida E, Lobbezoo F. Two-colour chew- ing gum mixing ability: digitalisation and spatial heterogeneity analysis. J Oral Rehabil. 2013; 40: 737– 43. https://doi.org/10.1111/joor.12090 PMID: 23927753 11. Vaccaro G, Pelaez JI, Gil JA. Choosing the best image processing method for masticatory performance assessment when using two-coloured specimens. J Oral Rehabil. 2016; 43: 496–504. https://doi.org/ 10.1111/joor.12392 PMID: 26968333 12. Prinz JF. Quantitative evaluation of the effect of bolus size and number of chewing strokes on the intra- oral mixing of a two-colour chewing gum. J Oral Rehabil. 1999; 26: 243–7. PMID: 10194734 13. Halazonetis DJ, Schimmel M, Antonarakis GS, Christou P. Novel software for quantitative evaluation and graphical representation of masticatory efficiency. J Oral Rehabil. 
2013; 40: 329–35. https://doi.org/ 10.1111/joor.12043 PMID: 23452188 14. van der Bilt A, Mojet J, Tekamp FA, Abbink JH. Comparing masticatory performance and mixing ability. J Oral Rehabil. 2010; 37: 79–84. https://doi.org/10.1111/j.1365-2842.2009.02040.x PMID: 19968766 15. van der Bilt A. Assessment of mastication with implications for oral rehabilitation: a review. J Oral Reha- bil. 2011; 38: 754–80. https://doi.org/10.1111/j.1365-2842.2010.02197.x PMID: 21241351 16. Beucher S, Lantuéjoul C. Use of Watersheds in Contour Detection. International workshop on image processing, real-time edge and motion detection. 1979. 17. Lu S, Wang S, Zhang Y. A note on the marker-based watershed method for X-ray image segmentation. Comput Methods Programs Biomed. 2017; 141: 1–2. https://doi.org/10.1016/j.cmpb.2017.01.014 PMID: 28241959 18. Zhang Y, Dong Z, Phillips P, Wang S, Ji G, Yang J. Exponential Wavelet Iterative Shrinkage Threshold- ing Algorithm for compressed sensing magnetic resonance imaging. Inf Sci (Ny). 2015; 322: 115–132. https://doi.org/10.1016/j.ins.2015.06.017 19. Comaniciu D, Meer P. Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell. IEEE; 2002; 24: 603–619. https://doi.org/10.1109/34.1000236 20. Christoudias CM, Georgescu B, Meer P. Synergism in low level vision. Object recognition supported by user interaction for service robots. IEEE Comput. Soc; 2002. pp. 150–155. https://doi.org/10.1109/ ICPR.2002.1047421 21. MacQueen J. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics. The Regents of the University of California; 1967. 22. Commission Internationale de L’Eclairage. Publication No. CIE 15.2. Colorimetry. 2nd ed. Vienna, Aus- tria: Central Bureau of the CIE.; 1986; 23. Zhang Y, Wang S, Phillips P, Ji G. Binary PSO with mutation operator for feature selection using deci- sion tree applied to spam detection. Knowledge-Based Syst. 2014; 64: 22–31. https://doi.org/10.1016/j. knosys.2014.03.015 24. Veredas F, Mesa H, Morente L. Binary tissue classification on wound images with neural networks and bayesian classifiers. IEEE Trans Med Imaging. IEEE; 2010; 29: 410–27. https://doi.org/10.1109/TMI. 
2009.2033595 PMID: 19825516
25. Judd DB. Hue Saturation and Lightness of Surface Colors with Chromatic Illumination. J Opt Soc Am. Optical Society of America; 1940; 30: 2. https://doi.org/10.1364/JOSA.30.000002
26. Vezhnevets V, Sazonov V, Andreeva A. A Survey on Pixel-Based Skin Color Detection Techniques. Graphicon-2003. Moscow, Russia; 2003. pp. 85–92.
27. Jolliffe I. Principal Component Analysis. New York: Springer-Verlag; 2002. https://doi.org/10.1007/b98835
28. Haykin S. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR; 1998.
29. Ripley BD. Pattern Recognition and Neural Networks. Cambridge University Press; 1996.
30. Zhang Y, Sun Y, Phillips P, Liu G, Zhou X, Wang S. A Multilayer Perceptron Based Smart Pathological Brain Detection System by Fractional Fourier Entropy. J Med Syst. Springer US; 2016; 40: 173. https://doi.org/10.1007/s10916-016-0525-2 PMID: 27250502
31. Wang S-H, Zhang Y, Li Y-J, Jia W-J, Liu F-Y, Yang M-M, et al. Single slice based detection for Alzheimer's disease via wavelet entropy and multilayer perceptron trained by biogeography-based optimization. Multimed Tools Appl. Springer US; 2016; 1–25. https://doi.org/10.1007/s11042-016-4222-4
32. Suzuki K, editor. Artificial Neural Networks—Architectures and Applications. InTech; 2013. https://doi.org/10.5772/3409
33. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986; 323: 533–536. https://doi.org/10.1038/323533a0
34. Čepek M, Šnorek M, Chudáček V.
Ecg signal classification using game neural network and its comparison to other classifiers. Artificial Neural Networks—ICANN 2008. Springer; 2008. pp. 768–777.
35. Callejón AM, Casado AM, Fernández MA, Peláez JI. A System of Insolvency Prediction for industrial companies using a financial alternative model with neural networks. Int J Comput Intell Syst. Taylor & Francis Group; 2013; 6: 29–37. https://doi.org/10.1080/18756891.2013.754167
36. Vaccaro G, Pelaez JI. Dental tissue classification using computational intelligence and digital image analysis. Biodental Engineering III—Proceedings of the 3rd International Conference on Biodental Engineering, BIODENTAL 2014. Taylor and Francis—Balkema; 2014. pp. 221–226. https://doi.org/10.1201/b17071
37. Lawrence S, Giles CL, Tsoi AC. Lessons in neural network training: Overfitting may be harder than expected. Proceedings of the Fourteenth National Conference on Artificial Intelligence, AAAI-97. Menlo Park, California: AAAI Press; 1997. pp. 540–545.
38. Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta—Protein Struct. 1975; 405: 442–451. https://doi.org/10.1016/0005-2795(75)90109-9
39. Cohen J. A Coefficient of Agreement for Nominal Scales. Educ Psychol Meas. 1960; 20: 37–46. https://doi.org/10.1177/001316446002000104

work_2pw4vkdn4jh4dohqkeifcbjleq ---- Label Ranking Forests

Cláudio Rebelo de Sá 1,3, Carlos Soares 2,3, Arno Knobbe 1, Paulo Cortez 4
1 LIACS, Universiteit Leiden, Netherlands
2 Faculdade de Engenharia, Universidade do Porto, Portugal
3 INESCTEC Porto, Porto, Portugal
4 ALGORITMI Centre, Department of Information Systems, University of Minho, Portugal

Abstract

The problem of Label Ranking is receiving increasing attention from several research communities. The algorithms that have been developed or adapted to treat rankings of a fixed set of labels as the target object include several different types of decision trees (DT). One DT-based algorithm, which has been very successful in other tasks but which has not been adapted for label ranking, is the Random Forests (RF) algorithm. RFs are an ensemble learning method that combines different trees obtained using different randomization techniques. In this work, we propose an ensemble of decision trees for Label Ranking, based on Random Forests, which we refer to as Label Ranking Forests (LRF). Two different algorithms that learn DT for label ranking are used to obtain the trees. We then compare and discuss the results of LRF with standalone decision tree approaches. The results indicate that the method is highly competitive.
1 Introduction

Label Ranking (LR) is an increasingly popular topic in the machine learning literature (Ribeiro et al., 2012; de Sá et al., 2011; Cheng & Hüllermeier, 2011; Cheng et al., 2012; Vembu & Gärtner, 2010). LR studies a problem of learning a mapping from instances to rankings over a finite number of predefined labels. It can be considered a natural generalization of the conventional classification problem, where the goal is to predict a single label instead of a ranking of all the labels (Cheng et al., 2009).

Some applications of Label Ranking approaches are (Hüllermeier et al., 2008): Meta-learning (Brazdil & Soares, 1999), where we try to predict a ranking of a set of algorithms according to the best expected accuracy on a given dataset; Microarray analysis (Hüllermeier et al., 2008), to find patterns in genes from yeast on five different micro-array experiments (spo, heat, dtt, cold and diau); and Image categorization (Fürnkranz et al., 2008) of landscape pictures from several categories (beach, sunset, field, fall foliage, mountain, urban).

There are two main approaches to the problem of LR: methods that transform the ranking problem into multiple binary problems and methods that were developed or adapted to treat the rankings as target objects, without any transformation. An example of the former is ranking by pairwise comparisons (Hüllermeier et al., 2008). Examples of algorithms that were adapted to deal with rankings as the target objects include decision trees (Todorovski et al., 2002; Cheng et al., 2009), naive Bayes (Aiguzhinov et al., 2010) and k-Nearest Neighbor (Brazdil et al., 2003; Cheng et al., 2009). Some of the latter adaptations are based on statistical distributions of rankings (e.g., Cheng et al., 2010), while others are based on ranking distance measures (e.g., Todorovski et al., 2002; de Sá et al., 2011).

Tree-based models have been used in classification (Quinlan, 1986), regression (Breiman et al., 1984) and also label ranking (Todorovski et al., 2002; Cheng et al., 2009; de Sá et al., 2015) tasks. These methods are popular for a number of reasons, including the fact that they can clearly express information about the problem, since their structure is relatively easy to interpret even for people without a background in learning algorithms.

In classification, combining the predictive power of an ensemble of trees often comes with significant accuracy improvements (Breiman, 2001). One of the earliest examples of ensemble methods is bagging (a contraction of bootstrap-aggregating) (Breiman, 1996). In bagging, an ensemble of trees is generated and each one is learned on a random selection of examples from the training set. A popular ensemble method is Random Forests (Breiman, 2001), which combines different randomization techniques.

Considering the success of Random Forests in terms of improved accuracy for classification and regression problems (Biau, 2012), some approaches have been proposed to deal with different targets, such as bipartite rankings (Clémençon et al., 2013). Label Ranking Forests should also be seen as a potential robust approach for LR. Adapting RF to Label Ranking can be a straightforward process once decision trees have been adapted.

In this work, we propose an approach of ensemble learners which we refer to as Label Ranking Forests (LRF).
The proposed method is a natural adaptation of Random Forests for LR, combining the task-independent RF algorithm with the traditional algorithm for top-down induction of decision trees adapted for label ranking. The available adaptations of decision tree algorithms for LR include Label Ranking Trees (LRT) (Cheng et al., 2009), Ranking Trees (Rebelo et al., 2008) and Entropy-based Ranking Trees (de Sá et al., 2015). Considering that the trees in the ensemble, in most cases, predict distinct rankings, one should also take into account ranking aggregation methods.

This paper extends previous work (de Sá et al., 2015), in which we proposed a new version of decision trees for LR, called the Entropy-based Ranking Trees, and empirically compared them to existing approaches. The main contribution in this paper is the new Label Ranking Forests algorithm, which is an adaptation of the RF ensemble method, using Entropy-based Ranking Trees as the base-level algorithm. The results indicate that LRF are competitive with state-of-the-art methods and improve the accuracy of standalone decision trees. An additional contribution is an extension of the original experimental study on Entropy-based Ranking Trees, by analyzing model complexity.

2 Label Ranking

In this section, we start by formalizing the problem of label ranking (Section 2.1) and then we discuss the adaptation of the decision tree algorithm for label ranking (Section 2.2) and one such adaptation, Entropy Ranking Trees (Section 2.3).

2.1 Formalization

The Label Ranking (LR) task is similar to classification. In classification, given an instance x from the instance space X, the goal is to predict the label (or class) λ to which x belongs, from a pre-defined set L = {λ1, . . . , λk}. In LR, the goal is to predict the ranking of the labels in L that is associated with x (Hüllermeier et al., 2008). A ranking can be represented as a total order over L defined on the permutation space Ω. A total order can be seen as a permutation π of the set {1, . . . , k}, such that π(a) is the position of λa in π.

As in classification, we do not assume the existence of a deterministic X → Ω mapping. Instead, every instance is associated with a probability distribution over Ω (Cheng et al., 2009). This means that, for each x ∈ X, there exists a probability distribution P(·|x) such that, for every π ∈ Ω, P(π|x) is the probability that π is the ranking associated with x. The goal in LR is to learn the mapping X → Ω. The training data is a set of instances D = {〈xi, πi〉}, i = 1, . . . , n, where xi is a vector containing the values x_i^j, j = 1, . . . , m, of m independent variables describing instance i, and πi is the corresponding target ranking.

Given an instance xi with label ranking πi, and the ranking π̂i predicted by an LR model, we can evaluate the accuracy of the prediction with loss functions on Ω. Some of these measures are based on the number of discordant label pairs:

  D(π, π̂) = #{(a,b) | π(a) > π(b) ∧ π̂(a) < π̂(b)}

If normalized to the interval [−1, 1], this function is equivalent to Kendall's τ coefficient, which is a correlation measure where D(π, π) = 1 and D(π, π⁻¹) = −1, with π⁻¹ denoting the inverse order of π (e.g. π = (1, 2, 3, 4) and π⁻¹ = (4, 3, 2, 1)). The accuracy of a model can be estimated by averaging this coefficient over a set of examples. Other correlation measures, such as Spearman's rank correlation coefficient (Spearman, 1904), have also been used (Brazdil et al., 2003).
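As an illustration of this loss function (a sketch added here for clarity, not code from the paper), Kendall's τ between two rankings can be computed directly from the concordant and discordant label pairs; ties, which the tau-b variant mentioned in the next paragraph accounts for, are ignored in this minimal version.

    from itertools import combinations

    def kendall_tau(pi, pi_hat):
        """Kendall's tau between two rankings given as rank vectors,
        where pi[a] is the position of label a (1 = ranked first)."""
        concordant = discordant = 0
        for a, b in combinations(range(len(pi)), 2):
            s = (pi[a] - pi[b]) * (pi_hat[a] - pi_hat[b])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
        n_pairs = len(pi) * (len(pi) - 1) / 2
        return (concordant - discordant) / n_pairs

    print(kendall_tau([1, 2, 3, 4], [1, 2, 3, 4]))   #  1.0 for identical rankings
    print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))   # -1.0 for the inverse order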
Although we assume total orders, it may be the case that two labels are tied in the same rank (i.e. πi(a) = πi(b), a ≠ b). In this case, a variation of Kendall's τ, the tau-b coefficient (Agresti, 2010), can be used.

2.2 Ranking Trees

Tree-based models have been used in classification (Quinlan, 1986), regression (Breiman et al., 1984), and label ranking (Todorovski et al., 2002; Cheng et al., 2009; de Sá et al., 2015) tasks. These methods are popular for a number of reasons, including the fact that they can clearly express information about the problem, since their structure is relatively easy to interpret even for people without a background in learning algorithms. It is also possible to obtain information about the importance of the various attributes for the prediction depending on how close to the root they are used.

The Top-Down Induction of Decision Trees (TDIDT) algorithm is commonly used for induction of decision trees (Mitchell, 1997). It is a recursive partitioning algorithm that iteratively splits data into smaller subsets which are increasingly more homogeneous in terms of the target variable (Algorithm 1). A split is a test on one of the attributes that divides the dataset into two disjoint subsets. For instance, given a numerical attribute x2, a split could be x2 ≥ 5. Given a splitting criterion that represents the gain in purity obtained with a split, the algorithm chooses the split that optimizes its value in each iteration. In its simplest form, the TDIDT algorithm only stops when the nodes are pure, i.e., when the value of the target attribute is the same for all examples in the node. This usually causes the algorithm to overfit, i.e., to generate models that capture the noise in the data as well as the regularities that are of general usefulness. One approach to address this problem is to introduce a stopping criterion in the algorithm that tests whether the best split is significantly improving the quality of the model. If not, the algorithm stops and returns a leaf node. The algorithm is executed recursively for the subsets of the data obtained based on the best split until the stopping criterion is met. A leaf node is represented by a value of the target attribute generated by a rule that solves potential conflicts in the set of training examples that are in the node. That value is the prediction that will be made for new examples that fall into that node. In classification, the prediction rule is usually the most frequent class among the training examples.

Algorithm 1 TDIDT algorithm
  Input: Dataset D
  BestSplit = Test of the attributes that optimizes the SPLITTING CRITERION
  if STOPPING CRITERION == TRUE then
    Determine leaf prediction based on the target values in D
    Return a leaf node with the corresponding LEAF PREDICTION
  else
    LeftSubtree = TDIDT(D¬BestSplit)
    RightSubtree = TDIDT(DBestSplit)
  end if

The adaptation of this algorithm for label ranking involves an appropriate choice of the splitting criterion, stopping criterion and the prediction rule (Algorithm 1).

Splitting Criterion  The splitting criterion is a measure that quantifies the quality of a given partition of the data. It is usually applied to all the possible splits of the data that can be made with tests on the values of individual attributes. In Ranking Trees (RT) the goal is to obtain leaf nodes that contain examples with target rankings as similar between themselves as possible.
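To make this goal concrete, the sketch below (illustrative only; the paper's implementation is in R) quantifies how homogeneous the target rankings in a node are, which is the quantity that the splitting criterion described next tries to maximise. SciPy's kendalltau is used here because it also copes with the ties discussed above.

    from itertools import combinations
    from scipy.stats import kendalltau

    def mean_ranking_correlation(rankings):
        """Mean pairwise Kendall tau of the target rankings in a node (1.0 = all agree)."""
        pairs = list(combinations(rankings, 2))
        if not pairs:
            return 1.0                      # a single ranking is trivially homogeneous
        return sum(kendalltau(p, q)[0] for p, q in pairs) / len(pairs)

    def split_quality(left_rankings, right_rankings):
        """Weighted mean correlation of a candidate binary split (the RT criterion)."""
        n = len(left_rankings) + len(right_rankings)
        return (len(left_rankings) * mean_ranking_correlation(left_rankings)
                + len(right_rankings) * mean_ranking_correlation(right_rankings)) / n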
To assess the similarity between the rankings of a set of training examples, the mean correlation between them is calculated using Kendall, Spearman or any other ranking correlation coefficient. The quality of the split is given by the weighted mean correlation of the values obtained for the subsets, where the weight is given by the number of examples in each subset.

For simplicity, if we ignore the weights, the splitting criterion of ranking trees is illustrated both for nominal and numerical attributes in Table 1. The nominal attribute x1 has three values (a, b and c). Therefore, three binary splits are possible. For the numerical attribute x2, a split can be made in between every pair of consecutive values. In this case, the best split is x1 = c, with a mean correlation of 0.5, in comparison to a mean correlation of 0.2 for the remaining, i.e., the training examples for which x1 = {a,b}.

Table 1: Illustration of the splitting criterion

  Attribute   Condition = true           Condition = false
              values    rank corr.       values    rank corr.
  x1          a         0.3              {b,c}     -0.2
              b         0.2              {a,c}      0.1
              c         0.5              {a,b}      0.2
  x2          < 5      -0.1              ≥ 5        0.1

Stopping Criterion  The stopping criterion is used to determine if it is worthwhile to make a split or if there is a significant risk of overfitting (Mitchell, 1997). A split should only be made if the similarity between examples in the subsets increases substantially. Let Sparent be the similarity between the examples in the parent node and Ssplit the weighted mean similarity in the subsets obtained with the best split. The stopping criterion is defined as follows (Rebelo et al., 2008):

  (1 + Sparent) ≥ γ(1 + Ssplit)    (1)

Note that the relevance of the increase in similarity is controlled by the γ parameter. A γ ≥ 1 does not ensure increased purity of child nodes. On the other hand, small γ values require splits with a very large increase in purity, which means that the algorithm will stop the recursion early.

Prediction Rule  The prediction rule is a method to generate a prediction from the (possibly conflicting) target values of the training examples in a leaf node. In LR, the aggregation of rankings is not so straightforward as in other tasks (e.g. classification or regression) and is known as the ranking aggregation problem (Yasutake et al., 2012). It is a classical problem in the social choice literature (de Borda, 1781) but also in information retrieval tasks (Dwork et al., 2001). A consensus ranking minimizes the distance to all rankings (Kemeny & Snell, 1972). A simple approach, which we adopted in this work, is to compute the average ranking (Brazdil et al., 2003) of the predictions. It is calculated by averaging the rank for each label λj, π̄(j) = Σ_i πi(j)/n. The predicted ranking π̂ is the ranking of the labels λj obtained based on the average ranks π̄(j). Table 2 illustrates the prediction rule used in this work.

Table 2: Illustration of the prediction rule

        λ1    λ2   λ3   λ4
  π1    1     3    2    4
  π2    2     1    4    3
  π̄     1.5   2    3    3.5
  π̂     1     2    3    4

2.3 Entropy Ranking Trees

Recently, we proposed an alternative approach to decision trees for ranking data, the Entropy-based Ranking Trees (ERT) (de Sá et al., 2015). ERT uses an adaptation of Information Gain (IG) (de Sá et al., 2016) to assess the splitting points and the Minimum Description Length Principle Cut (MDLPC) (Fayyad & Irani, 1993) as the stopping criterion. To explain this method, we start by presenting the IG for rankings measure and then the adapted splitting and stopping criteria.
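Before moving on, the prediction rule of Table 2 can be sketched in a few lines. The code below is an illustrative rendering of the average-ranking rule (not the paper's implementation) and reproduces the example in the table; ties in the average ranks are broken arbitrarily here.

    def average_ranking(rankings):
        """Consensus ranking by the average-ranking rule (Brazdil et al., 2003).
        rankings[i][j] is the rank of label j in the i-th ranking."""
        k = len(rankings[0])
        mean_ranks = [sum(r[j] for r in rankings) / len(rankings) for j in range(k)]
        order = sorted(range(k), key=lambda j: mean_ranks[j])   # labels by average rank
        consensus = [0] * k
        for position, label in enumerate(order, start=1):
            consensus[label] = position
        return consensus

    # The example of Table 2: pi1 = (1, 3, 2, 4) and pi2 = (2, 1, 4, 3)
    print(average_ranking([[1, 3, 2, 4], [2, 1, 4, 3]]))   # -> [1, 2, 3, 4]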
Decision trees for classification, such as ID3 (Quinlan, 1986), use Information Gain (IG) as a splitting criterion to determine the best split points. IG is a statistical property that measures the gain in entropy between the prior and actual state (Mitchell, 1997). In this case, we measure it in terms of the distribution of the target variable, before and after the split. In other words, considering a set S of size nS, since entropy, H, is a measure of disorder, IG is basically how much uncertainty in S is eliminated after splitting on a numerical attribute xa:

  IG(xa, T; S) = H(S) − (|S1|/nS) H(S1) − (|S2|/nS) H(S2)

where |S1| and |S2| are the number of instances on the left side (S1) and on the right side (S2), respectively, of the cut point T in attribute xa.

In cases where S is a set of rankings, we can use the entropy for rankings (de Sá et al., 2016), which is defined as:

  Hranking(S) = Σ_{i=1..K} P(πi, S) log(P(πi, S)) log(kt(S))    (2)

where P(πi, S) is the proportion of rankings equal to πi in S, K is the number of distinct rankings in S and kt(S) is the average normalized Kendall τ (Kendall & Gibbons, 1970) distance in the subset S:

  kt(S) = [ Σ_{i=1..K} Σ_{j=1..nS} (τ(πi, πj) + 1)/2 ] / (K × nS)

As in Section 2.2, the leaves of the tree should not be forced to be pure. Instead, a stopping criterion should be used to avoid overfitting and be robust to noise in rankings. Given an entropy measure, the adaptation of the splitting and stopping criteria comes in a natural way. As shown in (de Sá et al., 2016), the MDLPC Criterion can be used as a splitting criterion with the adapted version of entropy Hranking. This entropy measure also works with partial orders; however, in this work, we only use total orders.
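The quantities above — the ranking entropy of Eq. (2), kt(S) and the associated information gain — can be sketched compactly as follows. This is an illustrative reading added for clarity rather than the authors' implementation, and the logarithm base (natural log here) is not essential.

    import math
    from collections import Counter
    from itertools import combinations

    def kendall_tau(p, q):
        # Kendall tau between two rank vectors without ties (cf. Section 2.1).
        pairs = list(combinations(range(len(p)), 2))
        s = sum(1 if (p[a] - p[b]) * (q[a] - q[b]) > 0 else -1 for a, b in pairs)
        return s / len(pairs)

    def ranking_entropy(S):
        """Entropy for a set of rankings S, following Eq. (2)."""
        n = len(S)
        if n == 0:
            return 0.0
        counts = Counter(map(tuple, S))
        distinct = list(counts)                     # the K distinct rankings in S
        # k_t(S): average normalised Kendall tau between the distinct rankings and all of S
        kt = sum((kendall_tau(d, s) + 1) / 2 for d in distinct for s in S) / (len(distinct) * n)
        # P*log(P) and log(k_t) are both non-positive, so the entropy is non-negative
        # and vanishes when all rankings in S are identical (k_t = 1).
        return sum((c / n) * math.log(c / n) * math.log(kt) for c in counts.values())

    def information_gain(S, S1, S2):
        """IG of splitting S into S1 and S2 at a cut point, using the ranking entropy."""
        n = len(S)
        return (ranking_entropy(S)
                - len(S1) / n * ranking_entropy(S1)
                - len(S2) / n * ranking_entropy(S2))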
Moreover, considering that it uses s independent learners, it can be parallelized. One of the reasons that makes RF a popular approach is that it is possible to take advantage of the algorithm to assess variable importance (Genuer et al., 2010). 3.1 Label Ranking Forests Considering the success of Random Forests in terms of improved accuracy for classification and regression problems, some approaches have been proposed to deal with different targets, such as bipartite rankings (Clémençon et al., 2013). Label Ranking Forests should also be seen as a potential robust approach for LR. Adapting RF to Label Ranking can be a straightforward process once you have adapted decision trees. Thus, we propose a new ensemble LR algorithm, the Label Ranking Forests based on Random Forests. With this approach, we expect to in- crease the accuracy of Label Ranking tree methods. In classification and regression, the aggregation of predictions is done in a simple way, mode and mean, respectively. However, as discussed in Section 2.2, the aggregation of rankings is not so straightforward. Like in Ranking Trees, we use the average ranking (Brazdil et al., 2003) to aggregate the predictions. Given the similarity of the LR task to classification, the number of random subset features we use in each split is √ m, the same value that is used in RF for classification. When the algorithm is not able to find a good split on any of the √ m selected features for the root node, it looks for a split on all the m features instead. This prevents the random feature selection mechanism from gener- ating empty trees. 4 Empirical Study In this section we describe the empirical study to investigate the performance of LRF and the tree methods used at the base level. We start by describ- ing the experimental setup (Section 4.1), then the results of the base-level 9 algorithms (Section 4.2) and finally the results of the new algorithm (Sec- tion 4.3). 4.1 Experimental setup The experiments are carried out on datasets from the KEBI Data Repository at the Philipps University of Marburg (Cheng et al., 2009) that are typically used in LR research (Table 3). They are based on classification and regression datasets, obtained using two different transformation methods: A) the target ranking is a permutation of the classes of the original target attribute, derived from the probabilities generated by a Naive Bayes classifier; B) the target ranking is derived for each example from the order of the values of a set of numerical variables, which are then no longer used as independent variables. A few basic statistics of the datasets used in our experiments are presented in Table 3. Although these are somewhat artificial datasets, they are quite useful as benchmarks for LR algorithms. A simple measure of the diversity of the target rankings is the Unique Ranking’s Proportion, Uπ. Uπ is the proportion of distinct target rankings for a given dataset (Table 3). As a practical example, the iris dataset has 5 distinct rankings for 150 instances, which yields Uπ = 5 150 ≈ 3%. This means that all the 150 rankings are duplicates of these 5. The code for all the experiments presented in this paper has been written in R (R Development Core Team, 2010).1 The generalization performance of the LR methods was estimated using a methodology that has been used previously for this purpose (Hüllermeier et al., 2008). The evaluation measure is Kendall’s τ and the performance of the methods was estimated using ten-fold cross-validation. 
4.2 Results with Label Ranking Trees

We evaluate the two variants of ranking trees described earlier: Ranking Trees (RT) and Entropy-based Ranking Trees (ERT) (Sections 2.2 and 2.3). The RT algorithm has a parameter γ that can affect the accuracy of the model. Based on previous results, we use γ = 0.98 for RT (de Sá et al., 2015).

Table 4 presents the results obtained by the two decision tree approaches, RT and ERT, in comparison to the results for Label Ranking Trees (LRT), which are reproduced from the original paper (Cheng et al., 2009). We note that we have no information about the depth of the trees obtained with the latter and thus such information is omitted in Table 4.

Table 4: Results obtained for Ranking Trees on KEBI datasets (the mean accuracy is represented in terms of Kendall's tau, τ; the best mean accuracy values are in bold)

               RT                       ERT                      LRT
  Dataset      mean acc.   mean depth   mean acc.   mean depth   mean acc.
  authorship   .883        8.0          .889        4.0          .882
  bodyfat      .111        11.9         .182        2.7          .117
  calhousing   .182        1.0          .291        11.6         .324
  cpu-small    .458        17.2         .437        6.1          .447
  elevators    .746        18.9         .757        7.9          .760
  fried        .797        20.2         .774        13.2         .890
  glass        .871        8.2          .854        3.0          .883
  housing      .794        12.9         .704        3.4          .797
  iris         .963        4.3          .853        2.0          .947
  pendigits    .871        14.0         .838        5.9          .935
  segment      .929        12.0         .901        5.0          .949
  stock        .897        10.8         .859        5.0          .895
  vehicle      .817        11.0         .787        4.1          .827
  vowel        .833        12.5         .598        3.6          .794
  wine         .905        4.0          .906        2.0          .882
  wisconsin    .334        10.0         .337        2.3          .343
  average      .712        11.1         .685        5.1          .730

Even though LRT performs best in most of the cases presented, both RT and ERT are also competitive methods. Figure 1 shows how much smaller ERT trees are, in general. By generating smaller trees, ERT provides more interpretable models when compared with RT. An exception is the calhousing dataset, where ERT generates larger trees. However, in this case, the increase in size is justified by a reasonable increase in accuracy (Table 4).

To compare the different ranking methods, we use a combination of Friedman's test and Dunn's Multiple Comparison Procedure (Neave & Worthington, 1992), which has been used before for this purpose (Brazdil et al., 2003). First we run Friedman's test to check whether the results are different or not, with the following hypotheses:

  H0: The distributions of Kendall's τ are equal
  H1: The distributions of Kendall's τ are not equal

Using the Friedman test (implemented in the stats package (R Development Core Team, 2010)) we obtained a p-value < 1%, which shows strong evidence against H0. This means that there is a high probability that the three methods have different performance.
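A Friedman test of this kind can be reproduced with SciPy as sketched below; the paper's own analysis used R's stats package, and the Kendall τ values here are copied from the first five rows of Table 4 purely to illustrate the call (the original analysis used all 16 datasets).

    from scipy.stats import friedmanchisquare

    # Kendall tau per dataset (authorship, bodyfat, calhousing, cpu-small, elevators)
    rt  = [0.883, 0.111, 0.182, 0.458, 0.746]
    ert = [0.889, 0.182, 0.291, 0.437, 0.757]
    lrt = [0.882, 0.117, 0.324, 0.447, 0.760]

    stat, p_value = friedmanchisquare(rt, ert, lrt)
    print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")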
Thus, we tested which of the three methods are different from one another with Dunn's Multiple Comparison Procedure (Neave & Worthington, 1992). Using the R package dunn.test (Dinno, 2015), we tested the following hypotheses for each pair of methods a and b:

  H0: The distributions of Kendall's τ for a and b are equal
  H1: The distributions of Kendall's τ for a and b are not equal

Table 5 indicates that there is no statistical evidence that the methods are different. The statistical tests confirm our observation that, although LRT generally obtains better results than RT and ERT, the latter approaches are competitive.

Figure 1: Comparison of the average depth of the trees obtained with RT (blue) and ERT (red) on KEBI datasets

Table 5: Dunn's test results (p-values)

         RT     ERT    LRT
  RT     -      0.22   0.37
  ERT    0.22   -      0.13
  LRT    0.37   0.13   -

4.3 Results with Label Ranking Forests

We generated forests with 100 trees and aggregated the predicted rankings with the average ranking method (Brazdil et al., 2003). Table 6 presents the results obtained by the Label Ranking Forests using RT and ERT, referred to as LRF-RT and LRF-ERT, respectively.

The average depth of the trees for LRF-RT is, in most cases, smaller than that of the tree obtained with the RT algorithm, while the accuracy is better. On average, for each 0.019 increase in accuracy there was a decrease of 1.8 in the average depth of the trees. One exception is the elevators dataset, which suffered a significant decrease in accuracy when using the LRF method.

The comparison between ERT and LRF-ERT leads to different observations. The average depth of the trees increases when using LRF. This can be explained by the fact that the measure of entropy for rankings used in ERT is very robust to noise in rankings (de Sá et al., 2016). Hence, it requires a larger amount of dissimilarity in a set of rankings to find a partition. As noted in Section 4.2 (Figure 1), the depth of the trees is much smaller with ERT than with RT. An additional observation is that using LRF with ERT yielded a significant reduction in accuracy.

In Figure 2, we can observe how much the accuracy increases or decreases with LRF when compared to the corresponding base-level trees alone. In the vast majority of datasets, there is some improvement in accuracy. The only exception is the elevators dataset, as mentioned above.

Using the same statistical tests as before (Section 4.2), we compare LRF-RT and LRF-ERT with the RT, ERT and LRT methods. With Friedman's test we got a p-value < 1%, which shows strong evidence against H0. Knowing that there are some differences among the methods, we then tested which are different from one another with Dunn's Multiple Comparison Procedure. Since we got a p-value around 25% between LRF-RT and LRF-ERT, we cannot conclude that there is statistical evidence that the methods are different.

In the pairwise comparisons of the methods (Table 8), we measure how many times each method wins in terms of accuracy. In this analysis, we conclude that Label Ranking Forests using RT give the best results, proving the effectiveness of the approach. On the other hand, even though LRF-ERT shows some improvement in terms of accuracy relative to ERT, it did not behave much better than RT or LRT (Table 8). Again, this might be caused by the fact that the measure of entropy for rankings used in ERT is very robust to noise. For this reason, the depth of trees in LRF-ERT is, on average, 70% of the depth of trees in LRF-RT. While this can be an advantage for standalone Label Ranking Trees, in Label Ranking Forests it is less relevant because it is hard to interpret 100 trees per dataset.
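The win statistics of Table 8 can be computed directly from the per-dataset accuracies, as in the illustrative sketch below; the method names and scores are placeholders, and ties are not counted as wins.

    def win_table(scores):
        """Pairwise win counts between methods, as in Table 8.
        scores maps a method name to its per-dataset accuracies (same dataset order)."""
        methods = list(scores)
        wins = {a: {b: 0 for b in methods if b != a} for a in methods}
        for a in methods:
            for b in methods:
                if a != b:
                    wins[a][b] = sum(1 for sa, sb in zip(scores[a], scores[b]) if sa > sb)
        totals = {a: sum(row.values()) for a, row in wins.items()}
        return wins, totals

    # Toy example with three datasets only
    wins, totals = win_table({
        "RT":  [0.88, 0.11, 0.18],
        "ERT": [0.89, 0.18, 0.29],
        "LRT": [0.88, 0.12, 0.32],
    })
    print(totals)   # total pairwise wins per method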
While this can be an advantage in terms of Label Ranking Trees, 14 Table 6: Results obtained for Label Ranking Forests on KEBI datasets, using two different label ranking trees, RT and ERT (the mean accuracy is represented in terms of Kendall’s tau, τ; the best mean accuracy values are in bold) LRF-RT LRF-ERT mean mean accuracy depth accuracy depth authorship .912 8.3 .906 7.7 bodyfat .212 10.6 .211 5.3 calhousing .185 1.0 .294 8.3 cpu-small .469 13.9 .471 7.8 elevators .605 10.0 .721 9.5 fried .887 15.5 .841 14.5 glass .874 6.0 .849 2.7 housing .780 10.9 .699 3.7 iris .973 4.9 .933 2.3 segment .930 10.8 .917 5.2 stock .892 9.9 .869 5.5 vehicle .850 10.0 .849 9.4 vowel .844 11.5 .701 4.9 wine .932 4.3 .925 2.8 wisconsin .460 8.8 .429 3.7 average .720 9.1 .708 6.2 Table 7: Dunn’s test for all the methods (p-values) RT ERT LRT LRF-RT LRF-ERT RT 0.23 0.34 0.31 0.44 ERT 0.23 0.13 0.11 0.28 LRT 0.34 0.13 0.46 0.29 LRF-RT 0.31 0.11 0.46 0.25 LRF-ERT 0.44 0.28 0.29 0.25 15 Figure 2: Accuracy gained/lost per dataset for using the ensemble method LRF, instead of standalone decision trees RT (blue) and ERT (red) on KEBI datasets Table 8: Pairwise comparisons of the methods in terms of win statistics. RT ERT LRT LRF-RT LRF-ERT Total (Rank) RT 9 6 3 7 25 (4) ERT 6 3 2 4 15 (5) LRT 9 12 7 9 37 (2) LRF-RT 12 13 8 13 46 (1) LRF-ERT 8 11 6 2 27 (3) 16 in Label Ranking Forests it is less relevant because it is hard to interpret 100 trees per dataset. 5 Conclusions In this work, we propose an ensemble of decision tree methods for Label Ranking, called Label Ranking Forests (LRF). The method is tested with two different base-level methods Ranking Trees (RT) and Entropy-based Ranking Trees (ERT). We present an empirical evaluation using well known datasets in this field. We also extend the analysis from previous work for tree-based methods, RT and ERT, and compare with the state of the art Label Ranking Trees (LRT) approach. The analysis on the decision trees shows that both RT and ERT are valid and competitive approaches. While RT usually gives better accuracy, on the other hand, ERT generates trees with much smaller depth (around 50% less, in comparison to RT). Our results were also compared with the published results for Label Ranking Trees (LRT) (Cheng et al., 2009). LRT has in general better accuracy than RT and ERT, however, statistical tests showed that none of the methods is significantly different. This means that both RT and ERT are competitive approaches, and, since they are distance-based methods, we can also say that this kind of approaches is worth pursuing. The two ensemble approaches, LRF-RT and LRF-ERT, used the base ranking tree models RT and ERT, respectively. Similarly to the application of Random Forests to other tasks, there was a general increase in accuracy when compared to the corresponding base-level methods. The results confirm that both LRF-RT and LRF-ERT are highly competitive LR methods. LRF- RT, in particular, stands out as a clear winner in terms of accuracy. As future work, we might improve the comparison with LRT method (Cheng et al., 2009), by implementing it and testing it both as learning algorithm and as the base-level method for Label Ranking Forests. Also, LRF can po- tentially produce similar benefits as the Random Forest method, in terms of feature selection or input variable importance measurement, when applied to LR datasets. Finally, the experiments in this paper were carried out on a set of standard benchmark datasets, which represent artificial LR problems. 
We plan to apply these approaches on real world datasets e.g. related with user preferences (Kamishima, 2003). 17 Acknowledgments This work is financed by the ERDF - European Regional Development Fund through the Operational Programme for Competitiveness and Internationali- sation - COMPETE 2020 Programme within project POCI-01-0145-FEDER- 006961, and by National Funds through the FC - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013. References Agresti, A. (2010), Analysis of ordinal categorical data, Vol. 656, John Wiley & Sons. Aiguzhinov, A., Soares, C. & Serra, A. P. (2010), A similarity-based adap- tation of naive bayes for label ranking: Application to the metalearning problem of algorithm recommendation, in ‘Discovery Science - 13th In- ternational Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings’, pp. 16–26. Biau, G. (2012), ‘Analysis of a random forests model’, Journal of Machine Learning Research 13, 1063–1095. Brazdil, P. & Soares, C. (1999), Exploiting Past Experience in Ranking Clas- sifiers, in H. Bacelar-Nicolau, F. C. Nicolau & J. Janssen, eds, ‘Applied Stochastic Models and Data Analysis’, Instituto Nacional de Estat́ıstica, pp. 299–304. Brazdil, P., Soares, C. & da Costa, J. P. (2003), ‘Ranking learning algo- rithms: Using IBL and meta-learning on accuracy and time results’, Ma- chine Learning 50(3), 251–277. Breiman, L. (1996), ‘Bagging predictors’, Machine Learning 24(2), 123–140. Breiman, L. (2001), ‘Random forests’, Machine Learning 45(1), 5–32. Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. (1984), Classifi- cation and Regression Trees, Wadsworth. Cheng, W. & Hüllermeier, E. (2011), ‘Label ranking with abstention: Pre- dicting partial orders by thresholding probability distributions (extended abstract)’, Computing Research Repository, CoRR. 18 Cheng, W., Dembczynski, K. & Hüllermeier, E. (2010), Label ranking meth- ods based on the plackett-luce model, in ‘ICML’, pp. 215–222. Cheng, W., Huhn, J. C. & Hüllermeier, E. (2009), Decision tree and instance- based learning for label ranking, in ‘Proceedings of the 26th Annual Inter- national Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14-18, 2009’, pp. 161–168. Cheng, W., Hüllermeier, E., Waegeman, W. & Welker, V. (2012), Label ranking with partial abstention based on thresholded probabilistic models, in ‘Advances in Neural Information Processing Systems 25: 26th Annual Conference on Neural Information Processing Systems 2012. Proceedings of a meeting held December 3-6, 2012, Lake Tahoe, Nevada, United States.’, pp. 2510–2518. Clémençon, S., Depecker, M. & Vayatis, N. (2013), ‘Ranking forests’, Journal of Machine Learning Research 14(1), 39–73. de Borda, J. C. (1781), ‘Mémoire sur les élections au scrutin’. de Sá, C. R., Rebelo, C., Soares, C. & Knobbe, A. J. (2015), Distance- based decision tree algorithms for label ranking, in ‘Progress in Artificial Intelligence - 17th Portuguese Conference on Artificial Intelligence, EPIA 2015, Coimbra, Portugal, September 8-11, 2015. Proceedings’, pp. 525– 534. de Sá, C. R., Soares, C. & Knobbe, A. J. (2016), ‘Entropy-based discretiza- tion methods for ranking data’, Inf. Sci. 329, 921–936. de Sá, C. R., Soares, C., Jorge, A. M., Azevedo, P. J. & da Costa, J. P. 
(2011), Mining association rules for label ranking, in ‘Advances in Knowledge Dis- covery and Data Mining - 15th Pacific-Asia Conference, PAKDD 2011, Shenzhen, China, May 24-27, 2011, Proceedings, Part II’, pp. 432–443. Dinno, A. (2015), dunn.test: Dunn’s Test of Multiple Comparisons Using Rank Sums. R package version 1.2.3. Dwork, C., Kumar, R., Naor, M. & Sivakumar, D. (2001), Rank aggregation methods for the web, in ‘Proceedings of the Tenth International World Wide Web Conference, WWW 10, Hong Kong, China, May 1-5, 2001’, pp. 613–622. Fayyad, U. M. & Irani, K. B. (1993), Multi-interval discretization of continuous-valued attributes for classification learning, in ‘Proceedings 19 of the 13th International Joint Conference on Artificial Intelligence. Chambéry, France, August 28 - September 3, 1993’, pp. 1022–1029. Fürnkranz, J., Hüllermeier, E., Loza Menćıa, E. & Brinker, K. (2008), ‘Multilabel classification via calibrated label ranking’, Machine Learning 73(2), 133–153. Genuer, R., Poggi, J. & Tuleau-Malot, C. (2010), ‘Variable selection using random forests’, Pattern Recognition Letters 31(14), 2225–2236. Hüllermeier, E., Fürnkranz, J., Cheng, W. & Brinker, K. (2008), ‘Label ranking by learning pairwise preferences’, Artificial Intelligence 172(16- 17), 1897–1916. Kamishima, T. (2003), Nantonac collaborative filtering: recommendation based on order responses, in ‘Proceedings of the Ninth ACM SIGKDD In- ternational Conference on Knowledge Discovery and Data Mining, Wash- ington, DC, USA, August 24 - 27, 2003’, pp. 583–588. Kemeny, J. & Snell, J. (1972), Mathematical Models in the Social Sciences, MIT Press. Kendall, M. & Gibbons, J. (1970), Rank correlation methods, Griffin London. Mitchell, T. (1997), Machine Learning, McGraw-Hill. Neave, H. & Worthington, P. (1992), Distribution-free Tests, Routledge. Quinlan, J. R. (1986), ‘Induction of decision trees’, Machine Learning 1(1), 81–106. R Development Core Team (2010), R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. Rebelo, C., Soares, C. & Costa, J. (2008), Empirical Evaluation of Rank- ing Trees on Some Metalearning Problems, in J. Chomicki, V. Conitzer, U. Junker & P. Perny, eds, ‘Proceedings 4th AAAI Multidisciplinary Work- shop on Advances in Preference Handling’. Ribeiro, G., Duivesteijn, W., Soares, C. & Knobbe, A. J. (2012), Multilayer perceptron for label ranking, in ‘Artificial Neural Networks and Machine Learning - ICANN 2012 - 22nd International Conference on Artificial Neu- ral Networks, Lausanne, Switzerland, September 11-14, 2012, Proceedings, Part II’, pp. 25–32. 20 Scornet, E., Biau, G. & Vert, J.-P. (2014), ‘Consistency of random forests’, ArXiv e-prints. Spearman, C. (1904), ‘The proof and measurement of association between two things’, American Journal of Psychology 15, 72–101. Todorovski, L., Blockeel, H. & Džeroski, S. (2002), Ranking with Predictive Clustering Trees, in T. Elomaa, H. Mannila & H. Toivonen, eds, ‘Proc. of the 13th European Conf. on Machine Learning’, number 2430 in ‘LNAI’, Springer-Verlag, pp. 444–455. Vembu, S. & Gärtner, T. (2010), Label ranking algorithms: A survey, in J. Fürnkranz & E. Hüllermeier, eds, ‘Preference Learning’, Springer- Verlag, pp. 45–64. Yasutake, S., Hatano, K., Takimoto, E. & Takeda, M. (2012), Online rank ag- gregation, in ‘Proceedings of the 4th Asian Conference on Machine Learn- ing, ACML 2012, Singapore, Singapore, November 4-6, 2012’, pp. 539–553. 
work_2s7yi5h4orcibame2ndlhum7lu ---- Multi-faceted Assessment of Trademark Similarity

Multi-faceted Assessment of Trademark Similarity
Rossitza Setchi 1 and Fatahiyah Mohd Anuar 1,2
1 School of Engineering, Cardiff University, 14-17 The Parade, Cardiff CF24 3AA, UK. E-mail: setchi@cf.ac.uk
2 Faculty of Engineering, Multimedia University, Cyberjaya, 63100, Malaysia. E-mail: fatahiyah@mmu.edu.my
Published in Expert Systems With Applications 65 (2016), pp. 16-27. PII: S0957-4174(16)30421-3. DOI: 10.1016/j.eswa.2016.08.028. Reference: ESWA 10821. Received 22 December 2015; revised 3 August 2016; accepted 4 August 2016.

Highlights
- A novel method for the assessment of trademark similarity is proposed.
- The method blends together visual, semantic and phonetic similarity.
- It produces an aggregated score based on the individual assessments.
- Evaluation using information retrieval measures and human judgment.

Abstract—Trademarks are intellectual property assets with potentially high reputational value. Their infringement may lead to lost revenue, lower profits and damages to brand reputation. A test normally conducted to check whether a trademark is highly likely to infringe other existing, already registered, trademarks is called a likelihood of confusion test. One of the most influential factors in this test is establishing similarity in appearance, meaning or sound.
However, even though the trademark registration process suggests a multi-faceted similarity assessment, relevant research in expert systems mainly focuses on computing individual aspects of similarity between trademarks. Therefore, this paper contributes to the knowledge in this field by proposing a method, which, similar to the way people perceive trademarks, blends together the three fundamental aspects of trademark similarity and produces an aggregated score based on the individual visual, semantic and phonetic assessments. In particular, semantic similarity is a new aspect, which has not been considered by other researchers in approaches aimed at providing decision support in trademark similarity assessment. Another specific scientific contribution of this paper is the innovative integration, using a fuzzy engine, of three independent assessments, which collectively provide a more balanced and human-centered view on potential infringement problems. In addition, the paper introduces the concept of degree of similarity since the line between similar and dissimilar trademarks is not always easy to define especially when dealing with blending three very different assessments. The work described in the paper is evaluated using a database comprising 1,400 trademarks compiled from a collection of real legal cases of trademark disputes. The evaluation involved two experiments. The first ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T experiment employed information retrieval measures to test the classification accuracy of the proposed method while the second used human collective opinion to examine correlations between the trademark scoring/rating and the ranking of the proposed method, and human judgment. In the first experiment, the proposed method improved the F-score, precision and accuracy of classification by 12.5%, 35% and 8.3%, respectively, against the best score computed using individual similarity. In the second experiment, the proposed method produced a perfect positive Spearman rank correlation score of 1.00 in the ranking task and a pairwise Pearson correlation score of 0.92 in the rating task. The test of significance conducted on both scores rejected the null hypotheses of the experiment and showed that both scores correlated well with collective human judgment. The combined overall assessment could add value to existing support systems and be beneficial for both trademark examiners and trademark applicants. The method could be further used in addressing recent cyberspace phenomena related to trademark infringement such as customer hijacking and cybersquatting. Keywords—Trademark assessment, trademark infringement, trademark retrieval, degree of similarity, fuzzy aggregation, semantic similarity, phonetic similarity, visual similarity. 1. Introduction Trademarks are valuable intellectual property (IP) assets that identify the commercial source or origin of products or services. They are visual signs in the form of logos or brand names that allow goods or services to be easily recognized and distinguished by consumers. Similar to other intangible company assets, trademarks can be subject to legal protection. Trademark registration through an IP office provides legal protection for companies and individuals on registered marks in the jurisdiction(s) that the registration office covers. It therefore provides legal certainty and underpins the right of the trademark owner. 
ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Trademark infringement is a form of IP crime that may lead to lost revenue, lower profits and additional costs, such as the legal fees necessary to enforce a trademark. In addition, trademark infringement is time-consuming when enforcing rights and, perhaps more importantly, can lead to severe damage of brand reputation. Recent statistics show that trademark infringement has become a serious economic and legal issue. For example, the United States International Trade Commission, as reported by the Chairman of the Joint Economic Committee, stated that the number of investigated infringement cases rose from the year 2010 to 2011 by 23.2%. A total of 3,400 trademark infringement cases were filed in the US District Courts in 2012, which excluded a presumably even larger number of cases where settlements were reached prior to the filing of cases (Scott, 2013). Some of the reported cases involve new cybercrime phenomena such as customer hijacking and cybersquatting (Scott, 2013). In another investigation conducted by the US International Trade Commission in 2011, the average annual increase of trademark litigation cases concerning US-based companies from 2002–2011 was 39.8% (US International Trade Commission, 2011). Despite these alarming trademark infringement statistics, the number of newly registered trademarks together with the existing trademarks used in the market continues to grow (Office for Harmonization in the Internal Market [OHIM], 2012; Dodell, 2013). This trend, which has been observed worldwide, has recently created administrative problems for many trademark registration offices as the registration process has become more complex and lengthy. The trademark registration process includes a trademark similarity examination (OHIM, 2014), which requires a multi-faceted similarity assessment. One of the steps involved is making sure that the trademark to be registered is not similar to any trademark that has already been registered, as the registration of trademarks that are found to be identical or similar to any existing trademarks and provide identical or similar goods or services may potentially be opposed, as indicated in section 5 of the Trade Marks Act 1994 (Trade Marks Act, 1994). This is important in order to avoid infringements and protect the rights of existing registered trademarks. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T The current practice of examining trademark similarity generally involves a search to retrieve relevant trademarks from a very large trademark database on the basis of a specific type of similarity. For example, the Industrial Property Automation System (IPAS), a support system developed by the World Industrial Property Organisation (WIPO), provides three trademark search options, namely a bibliography search based on the filing date and registration number, a phonetic search based on phonetic rules and common prefixes and suffixes, and a logo search based on the Vienna classification code for figurative trademarks (WIPO, 2014). The research in this paper is motivated by the guidelines in the trademark examination manual, which require overall similarity assessment. From a theoretical point of view, the paper contributes to the body of knowledge in the area of intelligent human- centered decision support and in particular the use of fuzzy logic and semantics in complex evaluations and assessments related to infringement and the likelihood of confusion. 
Previous research has addressed some of these aspects to a certain degree. For example, the need to consider many facets or aspects in complex evaluations has been recognized by a number of researchers working in various domains. Many of them employ fuzzy logic, which is a particularly suitable reasoning technique in domains where the selection of the best alternative is highly complex and the judgement is based on subjective perceptions (Mardani et al., 2015). For example, a knowledge evaluation method aimed at estimating the quality of knowledge and its market value uses fuzzy logic to aggregate several aspects including knowledge complexity, marketable value, and the reputation of the knowledge supplier (Chen, 2011). Fuzzy numbers are also used to calculate the value of a patent and the chance of mitigation (Agliardi and Agliardi, 2011), which similar to quality of knowledge in the above example, are also parameters very difficult to measure objectively. Semantics and fuzzy logic are employed in group decision making (Gupta and Mohanty, 2016), consensus building (Li et al., 2017), opinion mining (Martínez-Cruz et al., 2016) and knowledge management (Li et al., 2011). ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T This paper offers an original approach to the problem of trademark infringement, which is based on multi-facet assessment and verified through human judgement. The proposed computational method for assessing trademark similarity employs multi-faceted evaluation of the three main aspects of trademark similarity: visual, semantic and phonetic. In particular, semantic similarity is a new aspect which has not been considered in any previous approaches aimed at developing decision support systems for trademark similarity assessment. Therefore, the specific scientific contribution of this paper is the innovative integration, using a fuzzy engine, of three independent assessments, which collectively provide a more balanced view on potential infringement problems. The combined overall assessment could add value to existing support systems and be beneficial for both trademark examiners and trademark applicants. The rest of the paper is organized as follows: The next section provides an overview of existing trademark search systems and briefly discusses fuzzy logic, the inference concept employed in this research. The proposed computational method is introduced in Section 3. Section 4 describes the experimental setup and presents the results. A discussion is provided in Section 5. Section 6 concludes the study. 2. Related Work This section reviews related work in the scope of this study. It consists of two subsections. The first subsection reviews existing trademark search systems, and the second subsection briefly discusses the concept of fuzzy inference, which inspired the development of the proposed method for the multi-faceted assessment of trademark similarity. 2.1 Existing Trademark Search Systems Table 1 shows examples of trademarks with different types of similarity: visual, semantic and phonetic. The trademark pair NEXT and NEST possess some degree of visual similarity due to the total number of letters and the number of identical letters used. In addition, although NEXT is a figurative trademark, its style/font is similar to the typeface font ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T of the trademark NEST, which contributes to the visual similarity between them. 
The second pair, MAGIC TIMES and MAGIC HOUR, are semantically similar due to the identical word that they share and the lexical relation between the non-identical words in the trademark text. The last pair, i.e. SVIZZEROTALER and SWISS TALER, are phonetically similar because although these trademarks are spelled differently, their pronunciation is similar. Many trademarks share more than one type of similarity; however, despite the existing variety in the types of similarity, most of the research in this area is focused on retrieving trademarks based on their visual similarity using low-level features. Examples of such systems include TRADEMARK (Kato et al., 1990), STAR (Wu et al., 1996) and ARTISAN (Eakins et al., 1996), which have been widely referred to by many researchers. TRADEMARK uses graphical descriptor vectors derived from shape features while STAR employs a traditional content-based image retrieval (CBIR) framework together with a set of shape-based descriptors, including Fourier descriptors, gray-level projection and moment invariants. In addition, it utilizes the spatial layout of the images although this has been found to be extremely challenging. ARTISAN also utilizes shape-based feature descriptors but includes Gestalt-based principles to retrieve abstract geometric trademark designs. These three studies have inspired further research in trademark image retrieval focused on the visual similarity aspect of trademarks. For example, Kim and Kim (1998) employed a moment-based shape descriptor and analyzed the distribution model of 90 moment orders for all the images in their database. A closed contour shape descriptor using angle code strings was developed by Peng and Chen (1998). Jain and Vailaya (1998) proposed the use of the edge direction histogram and improved the descriptor so that it became scale and rotation invariant. Other research includes a comparative study of several common shape-based descriptors for trademark similarity comparison (Eakins et al., 2003) and a compositional shape descriptor that combines several shape descriptors (Hong & Jiang, 2008; Wei et al., 2009). ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Despite the amount of work undertaken, visual similarity assessment is mainly limited to trademarks with figurative marks or logos. Notwithstanding, statistics of registered trademarks in five European countries have shown that only 30% of all trademarks employ logos as their proprietary marks (Schietse et al., 2007). The trademark similarity of the remaining 70% of registered trademarks is still insufficiently researched. For example, despite the recent advances in computational semantics, the existing trademark search systems that focus on text are primarily built around keyword-based retrieval or approximate string matching. Such systems return trademarks that match parts or entire words in query text. In Europe, OHIM recently launched a search system that allows trademark applicants and third parties to search for trademarks in different languages (OHIM, 2013). The system also provides an advanced search option that offers three search types: word prefix, full phrase and exact match. In the United Kingdom, the UK Intellectual Property Office (IPO) offers similar search options with an additional option that looks for similar query strings (UK IPO, 2013). 
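Such "similar string" options are typically built on approximate string matching. The following minimal sketch (an illustration only, not the IPO's actual implementation) shows a normalized edit-distance comparison of two word marks in Python:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def string_similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1 means identical strings."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest

print(string_similarity("NEXT", "NEST"))    # 0.75
print(string_similarity("1NDEX", "INDEX"))  # 0.80
```

Under such a purely character-level scheme the pair 1NDEX/INDEX scores 0.80, which coincides with the approximate string matching value reported later in Table 3; the comparison is blind to how similar the characters actually look, which is the limitation that the shape-based letter comparison proposed in this paper addresses.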
The IPO search system utilises an approximate string-matching technique, which looks for fairly similar patterns in strings, together with several predefined criteria including word length and the number of similar and dissimilar letters shared by the words. Despite their usefulness, the comparison mechanism employed in such systems limits their effectiveness as it does not cover all similarity aspects that are normally assessed during the trademark examination process. Advances in computational semantics provide an opportunity to overcome the limitations of traditional text-based retrieval by exploring semantic similarity. In the context of trademark similarity examination and analysis, it allows the comparison of trademarks based on their semantic similarity derived using external knowledge sources such as lexical ontologies. From the point of view of knowledge engineering, a lexical ontology is a framework that specifies the underlying structure and lexical relationships for knowledge representation and the organization of lexical information (Storey et al., 1998). On the other hand, advances in computational linguistics and genealogy provide a mechanism to compare trademarks based on their phonetic similarity (Covington, 1998; Kondrak, 2003; Pfeifer et al., 1996; Philips, 2000). This includes computational linguistics studies of similarities between cognates, i.e. words from different languages that share the same linguistic origin and etymology, and name-matching applications in genealogy, which retrieve similar names despite spelling variations. This research promotes the view that the existing work on visual similarity can be extended using the recent advances in semantic retrieval, computational linguistics and computational genealogy. This approach is consistent with the requirement for holistic assessment outlined in the OHIM trademark manual (OHIM, 2014). Trademark comparison based on visual, semantic and phonetic similarity, individually, has been the paramount focus of the present authors' previous work (Anuar et al., 2013; 2014; 2016). The main contribution of this paper is that it extends previous approaches by providing a consolidated holistic assessment process. In addition, the paper introduces the concept of degree of similarity since the line between similar and dissimilar is not always easy to define.

2.2 Fuzzy Logic
Studies on information retrieval of music and artist recommendations (McFee & Lanckriet, 2009; Zhang et al., 2009) compute multi-faceted similarity based on low-level features and subjective criteria. Fuzzy logic has not yet been applied to multi-faceted similarity assessment but has been used in many applications that require human reasoning and decision-making. Examples include control systems in the engineering domain, doctor-patient decision-making in the medical domain, and risk analysis in e-commerce (Abou & Saleh, 2011; Fazzolari et al., 2013; Ngai & Wat, 2005). Furthermore, the concept of fuzzy logic has long been recognized in legal studies (Cook, 2001; Kosko, 1994), which is an important consideration in the area of IP rights protection. This paper promotes the use of fuzzy logic to compute the degree of similarity between trademarks because of its ability to model the approximate, subjective reasoning involved in human judgments of similarity.
The concept of fuzzy logic was first introduced as a mathematical tool for dealing with uncertainty (Zadeh, 1965). From the point of view of set theory, fuzzy logic is an extension of the crisp set concept, in which every proposition must be either 'true' or 'false' rather than taking a value in a range. Fuzzy logic instead asserts that every proposition can simultaneously have a certain degree of membership in both the 'true' and the 'false' class. An inference system based on fuzzy logic uses fuzzy set operations and properties for reasoning and consists of a fuzzy rule base. A fuzzy rule generally has two components, the IF component, i.e. the antecedent, which describes a condition, and the THEN component, i.e. the consequent, which describes a conclusion. It follows the format:

IF <antecedent>, THEN <consequent>     (1)

In the context of a human-oriented process that requires approximate human reasoning or decision-making based on experiences and insights, a human inference system tends to use verbal variables to create verbal rules in a form similar to Eq. 1. Since the terms and variables used in human inference systems are normally fuzzy rather than precise, a fuzzy inference system is highly applicable in such applications. Verbal terms and variables can therefore be expressed mathematically as membership degrees and membership functions with symbolic verbal phrases rather than numeric values. Indirectly, this provides a systematic mechanism to utilize the uncertain and imprecise information used in human judgment. The implementation of the fuzzy inference approach in various applications commonly involves two inference models, i.e. the Mamdani inference model, which is based on a fuzzy relational model, and the Takagi–Sugeno inference model (Akgun et al., 2012). Both models employ slightly different approaches in the output aggregation process in that Mamdani uses defuzzification and Takagi–Sugeno employs a weighted average to compute the crisp output. An alternative approach is the Tsukamoto model, which represents the consequent of the fuzzy rules with monotonic membership functions (Jang et al., 1997). A more recent approach is the inference model based on a combination of adaptive neural networks and fuzzy logic (Leng et al., 2009). The Mamdani inference model is employed in this paper due to its intuitive and linguistic model applicability, which makes it very suitable for human-oriented applications.

3. Trademark Degree-of-Similarity Aggregation Method
This section introduces the proposed method and highlights the main steps involved in it. The method was based on a systematic analysis of 1,400 trademarks extracted from real dispute cases. This analysis revealed that the trademark cases in the collection were either real words/phrases such as 'MAGIC HOUR', out-of-vocabulary words/phrases such as 'SVIZZEROTALER' or a combination of both. In addition, the analysis also showed that in cases involving only out-of-vocabulary words, only visual and phonetic assessments were performed since such words do not carry any lexical meaning. The four different types of trademarks defined in OHIM (2014), namely word mark, figurative word mark, purely figurative mark and purely figurative mark with figurative word mark (Fig.
1), require different processing techniques and analytical approaches, hence the development of a method that facilitates the similarity comparison of both real words and out-of-vocabulary words. The conceptual model of the proposed system (Fig. 2) comprises four main modules. Three of these modules assess trademarks in terms of their visual, semantic and phonetic similarity while the fourth module, the fuzzy inference engine, aggregates the final score based on the three individual assessments. Each module has its individual functional requirements and uses a different approach to achieve its predefined function. For example, the visual similarity module employs visual descriptors based on the shape features of the individual letters included in the trademarks. Fig. 3 shows the flowchart of the proposed method and the four individual steps involved: (i) individual assessments, (ii) fuzzification, (iii) inference and (iv) defuzzification. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T 3.1 Step 1: Assessment of Visual, Semantic and Phonetic Similarity This step involves the assessment of the three main aspects of similarity. The visual similarity assessment of purely figurative trademarks such as logos is computed using an advanced algorithm (Anuar et al., 2013) that employs global and local shape features, i.e. Zernike moment and an edge-gradient co-occurrence matrix, represented as vectors. The similarity between the trademarks is then computed using normalized Euclidean distances between their corresponding vectors. The same approach, combined with the string algorithm (Navarro, 2001) is used to compute the visual similarity of trademarks with word marks and figurative word marks (Table 2). Unlike approximate string matching that uses binary values in the letter-to-letter comparison, such as ‘1’ and ‘I’ in the example shown in Fig. 4, the algorithm developed in this paper computes the visual similarity between letters using their shape descriptors. This provides a mechanism that differentiates between different letters and numbers that look similar, such as ‘1’ and ‘I’, and less similar letters and numbers, such as ‘1’ and ‘X’. Table 3 shows that the proposed algorithm exhibits better discriminating power compared to approximate string matching. The trademark semantic similarity assessment is based on a similarity computation model (Anuar et al., in press), which utilizes a lexical ontology, i.e. WordNet, as an external knowledge source. WordNet is a large electronic lexical database of the English language that is freely available and was developed based on psycholinguistic theories that model human semantic organization. It has been extended to over 30 different languages, ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T including Dutch, Spanish, German, Basque and Arabic (Abouenour et al., 2013; Fernandez- Montraveta et al., 2008; Gonzalo et al., 1999; Hinrichs et al., 2013; Pociello et al., 2011). The computation of semantic trademark similarity uses two sets of features to represent each trademark: the token feature set and the synonyms feature set. The token feature set consists of a set of words included in the trademark. For example, the token feature set for the trademark ‘Red Bull’ is (red, bull). The synonym feature set on the other hand comprises synonyms, direct hypernyms, i.e. words that are more general in meaning in the taxonomic hierarchy, and direct hyponyms, i.e. words that are instances of their corresponding trademark tokens. 
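The two feature sets just described can be assembled with an off-the-shelf WordNet interface. The sketch below uses NLTK, which is an assumption (the paper does not name a particular implementation), and simplifies the authors' feature extraction:

```python
# Sketch only: assumes `pip install nltk` and that the WordNet corpus
# has been downloaded via nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def token_features(trademark: str) -> set:
    """Token feature set: the words appearing in the trademark text."""
    return {t.lower() for t in trademark.split()}

def synonym_features(tokens: set) -> set:
    """Synonyms plus direct hypernyms and hyponyms of each token."""
    features = set()
    for token in tokens:
        for synset in wn.synsets(token):
            features.update(l.name().lower() for l in synset.lemmas())
            for related in synset.hypernyms() + synset.hyponyms():
                features.update(l.name().lower() for l in related.lemmas())
    return features

tokens = token_features("Red Bull")   # {'red', 'bull'}
synonyms = synonym_features(tokens)   # related WordNet terms for 'red' and 'bull'
print(len(tokens), len(synonyms))
```

The overlap between two such feature sets is what feeds the Tversky-style comparison described next.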
The similarity score is computed using a combination of Tversky's contrast model of similarity (Tversky, 1977), which considers the number of shared features, together with the edge-based word similarity score between the tokens, derived using lexical ontology, i.e. WordNet. Finally, the phonetic similarity assessment computes trademark similarity based on the phonological features of the phonemes in the trademark text combined with typographic mapping and a token rearrangement process (Anuar et al., 2014). The algorithm represents the phonemes in a word string as vectors with phonetic features where each vector consists of 10 binary main features and two multi-valued features extracted from the phonological properties of human speech production (Kondrak, 2003). The algorithm differentiates between more similar phoneme pairs, such as 'm' and 'n', and less similar phoneme pairs, such as 'm' and 'p'. In addition, the algorithm converts special characters or symbols in the trademark text to their corresponding meaning. For example, the ampersand symbol '&' is substituted by 'and'. This conversion allows typographic symbols to be processed in the way regular words appearing in trademarks are handled.

3.2 Step 2: Fuzzification
The fuzzification step is the process of mapping the crisp values of the input variables to fuzzy sets. Three input variables corresponding to the visual, semantic and phonetic assessments are fuzzified in this step using five triangular-based membership functions, as defined in Eq. 2. These functions were employed in this study because of their simplicity and good performance, which have been proven theoretically (Barua et al., 2014) and used in various engineering and non-engineering applications (Gañán et al., 2012; Kaur & Kaur, 2012; Ngai & Wat, 2005). Moreover, these functions have recently been used in a court case decision-making study that included traffic violations and crime cases (Sabahi & Akbarzadeh-T, 2014). A graphical representation of the input membership functions is shown in Fig. 5.

f_1(x) = \begin{cases} (0.25 - x)/0.25, & 0 \le x \le 0.25 \\ 0, & x \ge 0.25 \end{cases}
f_2(x) = \begin{cases} x/0.25, & 0 \le x \le 0.25 \\ (0.5 - x)/0.25, & 0.25 \le x \le 0.5 \\ 0, & x \ge 0.5 \end{cases}
f_3(x) = \begin{cases} 0, & x \le 0.25 \\ (x - 0.25)/0.25, & 0.25 \le x \le 0.5 \\ (0.75 - x)/0.25, & 0.5 \le x \le 0.75 \\ 0, & x \ge 0.75 \end{cases}
f_4(x) = \begin{cases} 0, & x \le 0.5 \\ (x - 0.5)/0.25, & 0.5 \le x \le 0.75 \\ (1 - x)/0.25, & 0.75 \le x \le 1 \end{cases}
f_5(x) = \begin{cases} 0, & x \le 0.75 \\ (x - 0.75)/0.25, & 0.75 \le x \le 1 \end{cases}     (2)

3.3 Step 3: Inference
This step uses the Mamdani fuzzy inference model, a well-known inference model used in various fuzzy logic-based applications (Abou & Saleh, 2011; Akgun et al., 2012; Chatzichristofis et al., 2012). A set of fuzzy rules was first developed based on the OHIM trademark examination manual (OHIM, 2014) and an empirical study of 1,400 trademarks involved in dispute cases. The rules are expressed in tabular form using five two-dimensional fuzzy associative matrices, which correspond to a total of 125 rules. Fig. 6 shows the five associative matrices of the developed rules. Five input and output conditions are associated with each rule: very low (VL), low (L), medium (M), high (H), and very high (VH). Each cell in the associative matrices corresponds to the output condition triggered by the rules associated with the condition of the input variables. For example, the verbal rule corresponding to the first cell of matrix (c) in Fig.
6 is translated as 'IF the phonetic score IS M (medium) and the semantic score IS VL (very low) and the visual score IS VL (very low), THEN the output score IS L (low)'. The output membership functions that correspond to the five output conditions also consist of five triangular-based functions, as in Eq. 3. A graphical representation of these functions is shown in Fig. 7.

f_1(x) = \begin{cases} (0.2 - x)/0.3, & 0 \le x \le 0.2 \\ 0, & x \ge 0.2 \end{cases}
f_2(x) = \begin{cases} (x + 0.1)/0.3, & 0 \le x \le 0.2 \\ (0.5 - x)/0.3, & 0.2 \le x \le 0.5 \\ 0, & x \ge 0.5 \end{cases}
f_3(x) = \begin{cases} 0, & x \le 0.2 \\ (x - 0.2)/0.3, & 0.2 \le x \le 0.5 \\ (0.8 - x)/0.3, & 0.5 \le x \le 0.8 \\ 0, & x \ge 0.8 \end{cases}
f_4(x) = \begin{cases} 0, & x \le 0.5 \\ (x - 0.5)/0.3, & 0.5 \le x \le 0.8 \\ (1.1 - x)/0.3, & 0.8 \le x \le 1 \end{cases}
f_5(x) = \begin{cases} 0, & x \le 0.8 \\ (x - 0.8)/0.3, & 0.8 \le x \le 1 \end{cases}     (3)

The aggregation of the compositional output involves a fuzzy operation between the fuzzified input and the fuzzy relations established by the rules. It is derived using the implication–aggregation (min–max) method (Akgun et al., 2012):

\mu_0 = \max_k \big( \min(\mu^i_1(k), \mu^i_2(k), \mu^i_3(k)) \big)     (4)

where \mu^i_1, \mu^i_2 and \mu^i_3 are the mappings of the first, second and third inputs from the crisp set to the fuzzy set, i.e. the visual, semantic and phonetic similarity scores, respectively, and k is the k-th IF–THEN proposition, or fuzzy rule.

3.4 Step 4: Defuzzification
This step uses the centroid or centre of mass defuzzification method to quantify the compositional output from the fuzzy set to the real output that corresponds to the degree-of-similarity value. It computes the centroid under the curve resulting from the compositional operation performed during the inference step. The centroid computation is given by the following equation:

\mathrm{centroid} = \frac{\int f(x)\,x\,dx}{\int f(x)\,dx}     (5)

where f(x) is the membership function associated with the compositional output. Fig. 8 shows an illustrative example of the proposed aggregation process for the trademark pair SKYPINE and SKYLINE. Their degree of similarity was computed as 0.798.

4. Experimental Setup and Results
This section describes the two experiments performed in this study and the evaluation method used to conduct them. The first experiment evaluated the proposed method from a computational point of view using information retrieval measures. The second experiment was designed to capture human perception, i.e. the way people view similarity in trademarks.

4.1 Experiment 1
The main objective of the first experiment was to test the classification performance of the proposed method when differentiating between possible cases of infringement. The developed method was compared to the traditional approach of considering the individual aspects of similarity. The experiment employed information retrieval measures such as F-score, precision score and accuracy. The scores were derived from the classification confusion matrix shown in Table 4, where TP, FP, FN and TN refer to true positive, false positive, false negative and true negative, respectively. A collection of real court cases comprising 1,400 trademarks (Schweizer, 2013) was analysed and used to create a database. An excerpt from a court case report for two disputed trademarks, AURA and AUREA, is shown below. It provides the conclusion and rationale of the experts investigating this particular case. Based on such findings, the database was then split into two groups, i.e. with a degree of similarity that may or may not lead to confusion as judged by the experts.
On the visual level, the trademarks have a strong similarity in the sense that the length of the verbal elements is almost identical (AURA/AUREA), i.e. four against five letters. Only the vowel 'E' of the contested trademark differs from the four letters of the 'AURA' trademark. The overall visual impression is therefore very similar. Aurally, the signs are also very similar. The vowel 'E' can be easily used. The overall phonetic impression is also very similar. Although there is no semantic similarity, the risk of misperception of the trademarks does exist due to the high visual and phonetic similarity. The fact that the opponent has an additional letter 'E' does not change the overall similarity finding. In view of that, the similarity of the trademarks is therefore recognized.

For evaluation purposes, a repeated holdout evaluation procedure was performed in which the database was divided into two random disjoint training (50%) and testing (50%) sets. The training set was used to obtain a threshold score to classify the dataset employed in this experiment. Pairwise degree-of-similarity scores between the trademark pairs in the training set were first computed using the proposed method. A histogram-based thresholding algorithm (Nobuyuki, 1979) was then used to estimate the threshold value of the computed degree-of-similarity scores by exhaustive searches for a value that minimized the intra-class variance of the binary classes. The threshold value obtained from the training set was then used to classify the data in the testing set. This procedure was repeated 1,000 times and in each repetition the F-score, precision and accuracy were computed using Eqs. 6-8:

\mathrm{F\text{-}score} = \frac{2TP}{(TP + FP) + (TP + FN)}     (6)
\mathrm{precision} = \frac{TP}{TP + FP}     (7)
\mathrm{accuracy} = \frac{TP + TN}{\mathrm{Total\ Data}}     (8)

where TP, TN, FP and FN are the true positive, true negative, false positive and false negative trademarks, respectively, as classified by the binary classification performed in this experiment, and Total Data (calculated as 700) is the total number of trademark pairs in the database. The average scores were then used to evaluate the overall performance of the proposed method. The procedure was repeated using the scores from the individual assessments of visual, semantic and phonetic similarity. Table 5 shows the classification results obtained using the three individual similarity assessments and the proposed method.

4.2 Experiment 2
The main objective of the second experiment was to prove the following two hypotheses:
1. The similarity ranking of the trademark pairs produced by the proposed method correlates with human collective judgment.
2. The similarity rating of each trademark pair produced by the proposed method correlates with human collective judgment.
Two significance tests were performed using the Spearman rank correlation score and the Pearson pairwise correlation score to statistically prove these hypotheses and reject the null hypotheses of this experiment. The Spearman rank correlation score, which takes values in the range of -1 to 1 (with -1 and 1 being perfect negative and positive correlations, respectively, and 0 indicating no correlation), is a measure of statistical dependence between two ranked variables. The score indicates how strong the relationship between the ranked variables is and is described using a monotonic function. The Pearson pairwise correlation score on the other hand measures the strength of a linear association between two variables.
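Both correlation scores can be computed directly from two lists of scores. The following minimal sketch assumes SciPy (the paper does not state how the statistics were obtained) and, purely for illustration, reuses a handful of values from rows 2 and 3 of Table 7:

```python
# Sketch only: `pip install scipy`; the values below are taken from Table 7.
from scipy.stats import pearsonr, spearmanr

proposed = [3.94, 2.17, 1.75, 4.23, 2.82, 1.86]  # proposed method scores (Table 7)
human    = [3.45, 2.05, 1.20, 4.05, 2.45, 1.00]  # average crowd ratings for the same results

rho, rho_p = spearmanr(proposed, human)  # rank (ordering) agreement
r, r_p = pearsonr(proposed, human)       # linear (rating) agreement

print(f"Spearman rho = {rho:.2f} (p = {rho_p:.4f})")
print(f"Pearson  r   = {r:.2f} (p = {r_p:.4f})")
```

Over the full set of 25 queries these statistics yield the values reported in Table 8.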
The Pearson correlation attempts to draw a line of best fit through the values ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T of two variables; the score itself describes the dispersion of the data points from the line of best fit. The Pearson correlation score has the same value range as the Spearman rank correlation score. As it involved human judgment, this experiment used a crowdsourcing platform for evaluation purposes. Crowdsourcing is an open call task recently introduced in information retrieval studies and has been proven to produce fast and reliable results in a cost-effective way (Corney, 2010; Fadzli & Setchi, 2012; Snow et al., 2008). This task, commonly known as a human intelligence task (HIT), is a small portion of an even larger task distributed among a large group of workers without any apparent contact. A total of 25 trademarks were randomly selected from the database used in Experiment 1 as a query set in Experiment 2. The trademark similarity assessment system developed in this study was then used to rank the set of trademarks returned from each query from the highest degree-of-similarity (ds) score to the lowest. Three trademarks with high (ds > 3.5), medium (2.0 < ds ≤ 3.5) and low (ds ≤ 2.0) distribution scores were selected from the retrieved set and used in the crowdsourcing task. Table 6 shows the 25 queries used in this experiment together with the three retrieved results classified by the proposed method as having high, medium and low similarity, respectively. Fig. 9 shows one of the HITs used in the experiment. In each HIT, the workers were presented with three different trademarks and asked to score their similarity with the query trademark using a scale from 1 to 5 (1 being the least similar and 5 being the most similar). Each query was evaluated by 20 different workers, which resulted in a total of 500 HITs. The selection of the HIT workers was based on two criteria: the number and acceptance rate of their previously completed assignments. The first criterion required the workers to have completed at least 1,000 HITs. The acceptance ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T rate of the previously completed HITs was set to 95%, indicating the approval level of the work done as evidenced by their HITs requestors. These two criteria were introduced to ensure the quality of the collected feedback. Next, the average similarity scores for each query given by the workers were computed and compared with the normalized similarity score produced by the proposed method (Table 7). The similarity scores (Fig. 10) were used to compute the Spearman rank correlation score and the Pearson pairwise correlation score shown in Table 8. 5. Discussion The first experiment verified the classification performance of the proposed multi- faceted method, which aggregated a similarity score based on all three similarity aspects (see Table 5). The method produced an F-score of 0.911, which translated into respective improvements of 15.2%, 150% and 12.5% compared to the F-scores produced using visual, semantic and phonetic similarity individually. Among these three similarity aspects, phonetic similarity produced the best F-score (0.810) while semantic similarity showed the worst performance in terms of F-score (0.364). The proposed method also surpassed the three individual similarity aspects in terms of precision. 
With a precision score of 0.924, it improved the individual performance of the visual, semantic and phonetic similarity assessments by 35%, 312% and 35.4%, respectively. Similar improvements were demonstrated in terms of accuracy. The proposed method produced an accuracy score of 0.910 compared to the accuracy produced using visual, semantic and phonetic similarity (0.819, 0.610 and 0.840, respectively), which resulted in improvements of 11%, 49% and 8.3%, respectively. Overall, the results from the first experiment clearly show that the proposed degree-of-similarity aggregation method has the best classification performance ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T compared to assessments based on individual similarity aspects. Moreover, this approach is well aligned to the recommended trademark examination procedure, which requires trademarks to be examined in a holistic way. The second experiment was designed to investigate the performance of the proposed method in comparison with human collective judgment. Two correlation measures, the Spearman rank correlation and the Pearson pairwise correlation, were used to statistically prove the hypotheses. The proposed method obtained a perfect Spearman rank score of 1 and a Pearson pairwise correlation score of 0.92. A statistical significance test performed on both correlation scores rejected the null hypotheses of the experiment and indirectly proved that the degree-of-similarity scores produced by the proposed method correlated well with human collective judgment on trademark overall similarity. This strong correlation can be also observed in the scatter plot shown in Fig. 10, which displays a concentration of almost all points along the best-fit line (the straight black line on the graph). 5. Conclusions A support system to assess the overall degree of similarity between trademarks is essential for trademark protection so the work presented in this paper was motivated by the need to help prevent trademark infringement by identifying existing similarities between trademarks. This paper contributes to the body of knowledge in this area by the development of a method that measures the degree of similarity between trademarks on the basis of all three aspects of similarity: visual, semantic and phonetic. The method uses fuzzy logic to aggregate the overall assessment, which provides a more balanced and human-centered view on potential infringement problems. In addition, the paper introduces the concept of degree of similarity since the line between similar and dissimilar trademarks is not always easy to define especially when dealing with blending three very different assessments. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T One of the strengths of the proposed method is its rigorous evaluation using a large, purpose-built collection of real legal cases of trademark disputes. Moreover, the experiments performed in this study examined the performance of the proposed method from two points of view. First, the relative performance of the method was investigated from an information retrieval perspective in terms of classification performance. Using a crowdsourcing platform, the second experiment investigated the performance of the method relative to human judgment. The results of the experiments confirmed that there is a significant improvement in trademark similarity assessment when all similarity aspects are carefully considered. 
The results also showed that the proposed method demonstrates a statistically significant correlation against human collective judgment. Therefore, the experiments convincingly validated both original hypotheses outlined in this study. In conclusion, the proposed system can provide a support mechanism in the trademark similarity analysis performed by trademark examiners during trademark registration. Moreover, the method for assessing the trademark similarity could be extended to address recent cyberspace phenomena such as consumer hijacking and cybersquatting. A particular limitation of the proposed work is its focus on only one aspect of the concept of likelihood of confusion, i.e. computing the similarity between trademarks. In reality, there are several other factors influencing the perceptions of the consumers. Such factors include strength of the registered trademarks, proximity of the channels of trade, product relatedness and consumer traits (sophistication and care). Such a study, which is currently underway, requires a multi-disciplinary approach, which involves experts from business studies, marketing, psychology and engineering. Acknowledgements—The authors wish to acknowledge the help of Christopher Harrison, Peter Evans, William Morell and Rich Corken from the UK Intellectual Property Office in finalizing some of the ideas behind this research. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T References Abou, A., and Saleh, E. (2011). A Fuzzy decision support system for management of breast cancer. International Journal of Advanced Computer Science and Applications, 2(3), 34-40. Abouenour, L., Bouzoubaa, K., & Rosso, P. (2013). On the evaluation and improvement of Arabic WordNet coverage and usability. Language Resources and Evaluation, 47(3), 891- 917. Agliardi, E. and Agliardi, R. (2011). An Application of Fuzzy Methods to Evaluate a Patent under the Chance of Litigation. Expert Systems with Applications, 10, 13143-13148. Akgun, A., Sezer, E. A., Nefeslioglu, H. A., Gokceoglu, C., & Pradhan, B. (2012). An easy- to-use MATLAB program (MamLand) for the assessment of landslide susceptibility using a Mamdani fuzzy algorithm. Computers and Geosciences, 38(1), 23-34. Anuar, F. M., Setchi, R., and Lai, Y. K. (2013). Trademark image retrieval using an integrated shape descriptor. Expert Systems with Applications, 40, 105-121. Anuar, F. M., Setchi, R., and Lai, Y. K. (2014). Trademark retrieval based on phonetic similarity. IEEE International Conference of Systems, Man and Cybernetics, San Diego, USA, 1642-1647. Anuar, F. M., Setchi, R., and Lai, Y. K. (2016). Semantic retrieval of trademarks based on conceptual similarity. IEEE Transaction of Systems, Man and Cybernetics: System, vol. 46(2), pp. 220-233. Barua, A., Mudunuri, L.S., and Kosheleva, O. (2014). Why trapezoidal and triangular membership functions work so well: Towards a theoretical explanation. Journal of Uncertain Systems, 8(3), 164-168. Chatzichristofis, S. A., Zagoris, K., Boutalis, Y., and Arampatzis, A. (2012). A fuzzy rank- based late fusion method for image retrieval. In: Schoeffmann, K., Merialdo, B., Hauptmann, ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T A., Ngo, C-W., Andreopoulos, Y., Breiteneder, C. (Eds). Lecture Notes in Computer Science, 7131, 463-472. Berlin Heidelberg: Springer. Chen, T.-Y. (2011). Value Ontology-Based Multi-Aspect Intellectual Asset Valuation Method for Decision-Making Support in k-Commerce, Expert Systems with Applications, 38(5), 5471-5485. Cook, B. B. (2001). 
Fuzzy logic and judicial decision making. Judicature, 85(2), 70-100. Corney, J. R., Torres-, C., Jagadeesan, A. P., Yan, X. T., Regli, W. C., and Medellin, H. (2010). Putting the crowd to work in a knowledge-based factory. Advanced Engineering Informatics, 24(3), 243-250. Covington, M. A. (1998). Alignment of multiple languages for historical comparison. International Conference on Computational Linguistics, Montreal, Canada (pp. 275-280). Dodell, L. (2013) The trademark problem: Casualty insurance’s dirty little secret. Retrieved from: http://www.carriermanagement.com (accessed 14 Dec 2015). Eakins, J. P., Riley, K. J., and Edwards, J. D. (2003). Shape feature matching for trademark image retrieval. International Conference of Image and Video Retrieval, Urbana Champaign, USA (pp. 28–38). Eakins, J. P., Shields, K., and Boardman, J. (1996). ARTISAN – A shape retrieval system based on boundary family indexing. Storage and Retrieval for Still Image and Video Databases, iv, 2670, 17–28. Fadzli, S. A., and Setchi, R. (2012). Concept-based indexing of annotated images using semantic DNA. Engineering Applications of Artificial Intelligence, 25(8), 1644-1655. Fazzolari, M., Alcala, R., Nojima, Y., Ishibuchi, H., and Herrera, F. (2013). A review of the application of multi objective evolutionary fuzzy systems: Current status and further directions. IEEE Transactions on Fuzzy Systems, 21(1), 45-65. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Fernandez-Montraveta, A., Vazquez, G., and Fellbaum, C. (2008). The Spanish version of WordNet 3.0. In: Text Resources and Lexical Knowledge, 8, 175-182. Berlin and New York: Mouton de Gruyter. Gañán, C., Muñoz, J. L., Esparza, O., Mata, J., and Alins, J. (2012). Risk-based decision making for public key infrastructures using fuzzy logic. International Journal of Innovative Computing, Information and Control, 8(11), 7925-7942. Gonzalo, J., Verdejo, F. and Chugur, I. (1999). Using EuroWordNet in a concept-based approach to cross-language text retrieval. Applied Artificial Intelligence. 13(7), 647-678. Gupta, M. and Mohanty, B. K. (2016). An Algorithmic Approach to Group Decision Making Problems under Fuzzy and Dynamic Environment. Expert Systems with Applications, 55, 118-132. Hinrichs, E., Henrich, V., and Barkey, R. (2013). Using part-whole relations for automatic deduction of compound-internal relations in GermaNet. Language Resources and Evaluation, 47(3), 839-858. Hong, Z., and Jiang, Q. (2008). Hybrid content-based trademark retrieval using region and contour features. Advanced Information Networking and Applications Workshop, Okinawa, Japan (pp. 1163–1168). Jain, A. K., and Vailaya, A. (1998). Shape-based retrieval: A case study with trademark image databases. Pattern Recognition, 31(9), 1369–1390. Jang, J. S. R., Sun, C. T., and Mizutani, E. (1997). Fuzzy inference systems. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence (pp. 73-91). Upper Saddle River, NJ: Prentice Hall. Kato, T., Fujimura, K., and Shimogaki, H. (1990). TRADEMARK. Multimedia image database system with intelligent human interface. Systems and Computers in Japan, 21(11), 33–46. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Kaur, A., and Kaur, A. (2012). Comparison of Mamdani-type and Sugeno-type fuzzy inference systems for air conditioning system. International Journal of Soft Computing and Engineering, 2, 323-325. Kim, Y. S., and Kim, W. Y. (1998). Content-based trademark retrieval system using a visually salient feature. 
Image and Vision Computing, 16(12–13), 931–939. Kondrak, G. (2003). Phonetic alignment and similarity. Computers and the Humanities, 37(3), 273-291. Kosko, B. (1994). Fuzzy Thinking: The New Science of Fuzzy Logic. New York: Hyperion. Leng, G., Zeng, X. J., and Keane, J. A. (2009). A hybrid learning algorithm with a similarity- based pruning strategy for self-adaptive neuro-fuzzy systems. Applied Soft Computing, 9(4), 1354-1366. Li, C.-C., Liu, L., and Li, C.-B. (2017). Personalized Individual Semantics In Computing With Words for Supporting Linguistic Group Decision Making. An Application on Consensus Reaching. Information Fusion, 33, 29-40. Li, M., Liu, L., and Li, C.-B. (2011). An approach to expert recommendation based on fuzzy linguistic method and fuzzy text classification in knowledge management systems. Expert Systems with Applications, 38, 8586-8596. Mardani, A., Jusoh, A. and Zavadskas, E. K. (2015). Fuzzy Multiple Criteria Decision- Making Techniques and Applications – Two Decades Review from 1994 to 2014. Expert Systems with Applications, 42(8), 4126-4148. McFee, B., and Lanckriet, G. R. (2009). Heterogeneous Embedding for Subjective Artist Similarity. International Society for Music Information Retrieval Conference, Kobe, Japan (pp. 513-518). Navarro, G. (2001). A guided tour to approximate string matching. ACM Computing Survey, 33(1), 31-88. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Ngai, E. W. T., and Wat, F. K. T. (2005). Fuzzy decision support system for risk analysis in e-commerce development. Decision Support Systems, 40(2), 235-255. Nobuyuki, O. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man and Cybernetics, 9(1), 62-66. OHIM (2012). Annual report 2012. Retrieved from: https://oami.europa.eu (accessed 10 Dec 2013). OHIM (2013). Trademark Search. Available at: https://oami.europa.eu. (accessed 10 Dec 2013). OHIM (2014). Guidelines for examination in the office for harmonization in the internal market on community trade marks, part c opposition, section 2 identity and likelihood of confusion, chapter 3 comparison of signs. Retrieved from: https://oami.europa.eu (accessed 1 Feb 2014). Peng, H. L., and Chen, S. Y. (1997). Trademark shape recognition using closed contours. Pattern Recognition Letters, 18(8), 791–803. Pfeifer, U., Poersch, T., and Fuhr, N. (1996). Retrieval effectiveness of proper name search methods. Information Processing and Management, 32(6), 667-679. Philips, L. (2000). The double metaphone search algorithm. C/C++ Users Journal, 18(6), 38- 43. Pociello, E., Agirre, E., and Aldezabal, I. (2011). Methodology and construction of the Basque WordNet. Language Resources and Evaluation, 45(2), 121-142. Sabahi, F., and Akbarzadeh-T, M.R. (2014). Introducing validity in fuzzy probability for judicial decision-making. International Journal of Approximate Reasoning, 55(6), 1383-1403. Schietse, J., Eakins, J. P., and Veltkamp, R. C. (2007). Practice and challenges in trademark image retrieval. International Conference on Image and Video Retrieval, Amsterdam, The Netherlands (pp. 518–524). ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Schweizer, M. (2013). Trade Marks Court Cases Database. Retrieved from: http://decisions.ch (accessed 10 Dec 2013). Scott, C. D. (2013). Trademark strategy in the internet age: Customer hijacking and the doctrine of initial interest confusion. Journal of Retailing, 89(2), 176-189. Storey, V. C., Dey, D., Ullrich, H., and Sundaresan, S. (1998). 
An ontology-based expert system for database design. Data and Knowledge Engineering, 28(1), 31-46.
Trade Marks Act 1994. (1994). Retrieved from: http://www.legislation.gov.uk/ (accessed 24 Nov 2014).
Tversky, A. (1977). Features of similarity. Psychology Review, 84, 327-352.
Snow, R., O'Connor, B., Jurafsky, D., and Ng, A. Y. (2008). Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. Conference on Empirical Methods in Natural Language Processing, Honolulu, Hawaii (pp. 254-263).
USITC (2011). China: Effects of intellectual property infringement and indigenous innovation policies on the U.S. economy. Retrieved from: http://www.usitc.gov/ (accessed 20 Dec 2013).
UKIPO (2013). Trademark Search. Available at: http://www.ipo.gov.uk (accessed 10 Dec 2013).
Wei, C. H., Li, Y., Chau, W. Y., and Li, C. T. (2009). Trademark image retrieval using synthetic features for describing global shape and interior structure. Pattern Recognition, 42(3), 386-394.
WIPO (2014). WIPO IPAS Functional and Technical Overview. Retrieved from: https://www3.wipo.int/confluence/display/wipoimd/WIPO+IPAS+Functional+and+Technical+Overview (accessed 24 Nov 2014).
Wu, J. K., Lam, C. P., Mehtre, B. M., Gao, Y. J., and Narasimhalu, A. D. (1996). Content-based retrieval for trademark registration. Multimedia Tools and Applications, 3(3), 245-267.
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338-353.
Zhang, B., Shen, J., Xiang, Q., and Wang, Y. (2009). CompositeMap: a novel framework for music similarity measure. 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA (pp. 403-410).

Table 1 Different types of trademark similarity
Trademark 1          Trademark 2    Similarity Aspect
NEXT (figurative)    NEST           Visual
MAGIC TIMES          MAGIC HOUR     Conceptual
SVIZZEROTALER        SWISS TALER    Phonetic

Table 2 Pseudocode of the visual similarity computation employed in the proposed algorithm
/* Visual similarity score computation for trademarks with text */
define Qt and Dt as the query and the trademark from the database
compute Aq and Ad as new strings that produce an optimal alignment between Qt and Dt
define score as the letter-to-letter visual similarity vector between Aq and Ad
define m = maximum(length(Aq), length(Ad))
for i = 0 until m
    if Aq(i) == Null || Ad(i) == Null
        score(i) = 0
    else
        score(i) = visual similarity score between Aq(i) and Ad(i)
    end
define total_score = sum(score) / m

Table 3 Degree-of-similarity computed using approximate string matching and visual similarity
                   Approximate String Matching    Visual Similarity
1NDEX :: INDEX     0.80                           0.923
1NDEX :: XNDEX     0.80                           0.861

Table 4 Confusion matrix employed for the computation of the F-score, precision score, and accuracy score.
                     Predicted Positive    Predicted Negative
Actual Positive      TP                    FN
Actual Negative      FP                    TN

Table 5 F-score, precision, and accuracy computed using visual, conceptual, and phonetic similarity and the proposed method.
Visual Similarity Conceptual Similarity Phonetic Similarity Proposed Method F-score 0.791 0.364 0.810 0.911 Precision 0.683 0.224 0.682 0.924 Accuracy 0.819 0.610 0.840 0.910 ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Table 6 List of 25 queries and their corresponding results used in this experiment Queries Result 1 Result 2 Result 3 WEBIATOR WebFOCUS autoscout24 FRUIT TIGER LION FRUIT SMOOTH FRUIT RED BULL GSTAR XSTAR sakira SVIZZEROTALER SWISS TALER SEVIKAR SCHNEIDER NEST Nexans SKYLINE SKY ROOM PREVISA BONITA SWEETLAND HEIDI LAND AMORA AMORE AXARA ARTOR RIMOSTIL Rivotril REBOVIR REFODERM CYRA CYREL ara adria GLOBRIX Globix ZYLORIC GRILON Lifestyle Living Style LIFE TEX SNOW LIFE WOOD STONE MOONSTONE WILTON SwissTron NATURE ELLA NATURESSA MARQUELA ecopower ECOPOWER HARRY POTTER TRIX TREAC TREAKOL SANTHERA SANZEZA SALFIRA sunirse MUROLINO MURINO MONARI MATTERHORN MAGIC TIMES MAGIC HOUR Maritimer MATCH WORLD RED BULL FLYING BULL Feel'n LEARN SEE'N LEARN FEEL GOOD FIGUREHEAD bonvita BONAVITA Botoceutical FMH FNH FTG MR ACTIVIA ACTEVA ADWISTA ACCET ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Table 7 Similarity scores obtained from the hit assignments and the proposed trademark degree-of-similarity aggregation method. No QUERIES Human Interactive Task Rating Proposed Method Result 1 Result 2 Result 3 Result 1 Result 2 Result 3 1 webautor 3.40 2.35 1.00 4.98 2.99 1.90 2 FRUIT TIGER 3.45 2.05 1.20 3.94 2.17 1.75 3 GSTAR 4.05 2.45 1.00 4.23 2.82 1.86 4 SVIZZEROTALER 3.70 2.10 1.15 3.84 2.77 1.82 5 NEXT 4.00 2.80 1.10 4.29 2.86 1.79 6 SKYPINE 4.20 2.65 1.60 3.99 2.84 1.93 7 Prevista 4.70 3.20 1.35 4.17 2.68 1.96 8 SWEETLAND 3.70 2.10 1.20 3.94 2.85 2.00 9 AMORA 4.50 2.35 1.85 4.28 2.67 1.05 10 RIMOSTRIL 3.95 2.30 1.65 4.04 2.22 1.76 11 CYRA 3.75 2.25 1.45 3.94 2.68 1.83 12 GLOBRIX 4.75 1.60 1.40 4.14 2.14 1.84 13 Lifestyle 4.25 2.35 1.50 3.98 2.43 1.82 14 WOOD STONE 3.60 1.70 1.45 4.32 2.30 1.91 15 NUTELLA 3.65 2.20 1.40 3.74 2.96 2.00 16 ecopower 4.45 2.80 1.10 5.00 2.96 0.87 17 TWIX 4.00 1.70 1.20 3.98 2.48 1.94 18 SANTHERA 3.20 2.05 1.15 3.86 2.96 1.96 19 MUROLINO 4.50 3.35 1.65 3.97 2.59 1.85 20 MAGIC TIMES 3.70 2.15 1.50 3.78 2.82 1.88 21 RED BULL 3.90 3.00 1.75 3.85 3.33 1.98 22 Feel'n LEARN 4.00 2.55 1.30 3.95 3.28 1.85 23 bonvita 4.90 2.65 1.55 4.20 2.69 1.85 24 FMH 4.40 2.75 1.40 4.43 2.07 1.57 25 ACTIVIA 4.25 2.00 1.65 4.20 2.22 1.98 Average 4.04 2.38 1.38 4.12 2.67 1.80 Table 8 Spearman rank correlation and Pearson pairwise correlation Spearman Rank Correlation Pearson Pairwise Correlation 1.00 (p<0.05) 0.92 (p<0.0001) ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Fig. 1 Different types of trademarks (OHIM, 2014) Fig. 2 Conceptual model of the proposed method for multi-faceted assessment of trademarks. Trademark Conceptual Comparison Algorithm Inference Engine Module Semantic Technology Set Similarity Theory CBIR Technology Shape Features Conceptual Module Phoneme Analysis Phonological Features Phonetic Module Fuzzy Logic Trademark Similarity Aggregation Score Algorithm Trademark Phonetic Comparison Algorithm Visual Module Trademark Visual Comparison Algorithm FORTIS Word mark Figurative word mark Purely figurative mark Purely figurative mark with figurative word mark ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Phonetic SimilarityConceptual SimilarityVisual Similarity Fuzzification Inference Defuzzification Input Individual Assessments Ranked Output Fig. 3 Flow chart of the proposed method. 
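Table 2 above specifies the visual-similarity computation only as pseudocode, and Fig. 4 below works through the 1NDEX/INDEX example. The following is a minimal runnable sketch of that computation: the alignment routine is reduced to simple padding, and the letter-to-letter similarity values are invented except for the 0.615 used for "1" vs "I" taken from Fig. 4, so it illustrates the shape of the algorithm rather than reproducing the authors' implementation.

```python
def align(query: str, candidate: str):
    """Toy alignment: pad the shorter string with None so both have equal length.
    The paper assumes an optimal alignment (Table 2); this is only a placeholder."""
    m = max(len(query), len(candidate))
    pad = lambda s: list(s) + [None] * (m - len(s))
    return pad(query), pad(candidate)

# Illustrative letter-to-letter visual similarities; only the 1/I value comes from Fig. 4.
VISUAL_SIMILARITY = {("1", "I"): 0.615, ("I", "1"): 0.615, ("O", "0"): 0.9, ("0", "O"): 0.9}

def letter_similarity(a, b):
    if a is None or b is None:
        return 0.0                      # gap: no visual resemblance
    if a.upper() == b.upper():
        return 1.0                      # identical letters
    return VISUAL_SIMILARITY.get((a.upper(), b.upper()), 0.0)

def visual_similarity(query: str, candidate: str) -> float:
    aq, ad = align(query, candidate)
    m = len(aq)
    scores = [letter_similarity(aq[i], ad[i]) for i in range(m)]
    return sum(scores) / m              # total_score in Table 2

print(visual_similarity("1NDEX", "INDEX"))   # about 0.923, matching Fig. 4
```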
I N D E X 1 N D E X .615 1 1 1 1 Shape-based Similarity Comparison Algorithm Similarity =4.615 / 5 = 0.923 Fig. 4 Illustrative example of the visual similarity score computation employed in the proposed method. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 very low low medium high very high 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Fig. 5 Input membership functions used. PHONETIC VL CONCEPTUAL VL L M H VH V I S U A L VL VL VL L H VH L VL L L H VH M L L M H VH H H H H H VH VH VH VH VH VH VH PHONETIC L CONCEPTUAL VL L M H VH V I S U A L VL VL VL L H VH L VL L L H VH M L M M H VH H H H H H VH VH VH VH VH VH VH PHONETIC M CONCEPTUAL VL L M H VH V I S U A L VL L M M H VH L M M M H VH M M M M H VH H H H H H VH VH VH VH VH VH VH PHONETIC H CONCEPTUAL VL L M H VH V I S U A L VL H H H H VH L H H H H VH M H H H H VH H H H H H VH VH VH VH VH VH VH PHONETIC VH CONCEPTUAL VL L M H VH V I S U A L VL H H VH VH VH L H H VH VH VH M VH VH VH VH VH H VH VH VH VH VH VH VH VH VH VH VH (a) (b) (c) (d) (e) Fig. 6 Associative matrices used for rule derivation in the inference process. very low low medium high very high 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Fig. 7 Output membership functions utilized in the inference process. ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Fig. 8 Illustrative example of the proposed aggregation method for the trademark pair Skypine and SKYLINE. HIT Preview Trade Marks: Degree of Similarity Scoring Trademarks may seem similar because they look similar (i.e. have visual similarity), have similar meaning (conceptual similarity) or sound similar (phonetic similarity). This task examines the degree of similarity between different trademarks. Based on the above explanation, please rank the following trademarks on the scale of 1 to 5, 5 being the most similar to the query and 1 being the least similar to it. Query trade mark: 1. TRIX 2. TREAC 3. TREAKOL 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 Submit Fig. 9 An example task used in Experiment 2. Phonetic Similarity Module Visual Similarity Module Conceptual Similarity Module 0.62 0.42 0.81 87 88 ฀ ฀ 92 93 ฀ ฀ 112 113 ฀ ฀ 117 118 Implication Operation Method (min) :: SKYLINE 0.798 87. IF visual is medium AND conceptual is low AND phonetic is high THEN output is high 88. IF visual is medium AND conceptual is medium AND phonetic is high THEN output is high 92. IF visual is high AND conceptual is low AND phonetic is high THEN output is high 93. IF visual is high AND conceptual is medium AND phonetic is high THEN output is high 112. IF visual is medium AND conceptual is low AND phonetic is very high THEN output is very high 113. IF visual is medium AND conceptual is medium AND phonetic is very high THEN output is very high 117. IF visual is high AND conceptual is low AND phonetic is very high THEN output is very high 118. IF visual is high AND conceptual is medium AND phonetic is very high THEN output is very high Centroid Defuzzification Method In fe re n c e E n g in e M o d u le Aggregation Operation (max) ACCEPTED MANUSCRIPT A C C E P T E D M A N U S C R IP T Fig. 10 Similarity scores obtained in Experiment 2. 
Fig. 10 axes: HIT similarity score (x-axis) vs. proposed algorithm degree-of-similarity score (y-axis), both on a 0.00–6.00 scale.

work_2vcf2a424rfptcuffnkojfx2hm ---- Sem-Fit: A semantic based expert system to provide recommendations in the tourism domain

Ángel García-Crespo, José Luis López-Cuadrado, Ricardo Colomo-Palacios (corresponding author), Israel González-Carrasco, Belén Ruiz-Mezcua. Computer Science Department, Universidad Carlos III de Madrid, Av. Universidad 30, Leganés 28911, Madrid, Spain. E-mail addresses: angel.garcia@uc3m.es (Á. García-Crespo), joseluis.lopez.cuadrado@uc3m.es (J.L. López-Cuadrado), ricardo.colomo@uc3m.es (R. Colomo-Palacios), Israel.gonzalez@uc3m.es (I. González-Carrasco), belen.ruiz@uc3m.es (B. Ruiz-Mezcua). Published in: Expert Systems with Applications, 38(10), 13310–13319 (15 September 2011).

Keywords: Semantic technologies; Semantic labeling; Fuzzy logic; Recommender systems; Hotels

Abstract--The hotel industry is one of the leading stakeholders in the tourism sector. In order to reduce the traveler's cost of seeking accommodations, enforce the return ratio efficiency of guest rooms and enhance total operating performance, evaluating and selecting a suitable hotel location has become one of the most critical issues for the hotel industry. In this scenario, recommender services are increasingly emerging which employ intelligent agents and artificial intelligence to "cut through" unlimited information and obtain personalized solutions. Taking this assumption into account, this paper presents Sem-Fit, a semantic hotel recommendation expert system, based on the consumer's experience about the recommendation provided by the system. Sem-Fit uses the consumer's experience point of view in order to apply fuzzy logic techniques to relating customer and hotel characteristics, represented by means of domain ontologies and affect grids. After receiving a recommendation, the customer provides a valuation about the recommendation generated by the system. Based on these valuations, the rules of the system are updated in order to adjust the new recommendations to past user experiences. To test the validity of Sem-Fit, the validation accomplished includes the interaction of the customer with the system, and then the results are compared with the expert recommendation for each customer profile. Moreover, the values of precision and recall and F1 have been calculated, based on three points of view, to measure the degree of relevance of the recommendations of the fuzzy system, showing that the system recommendations are on the same level as an expert in the domain.

1. Introduction

Tourism is one of the most powerful industries worldwide. With roughly 11% of the world's total employment or GDP, tourism is often presented as the first global industry, and Europe is by far the first tourist continent (Longhi, 2007). The World Tourism Organization (2006) predicts that by 2020, tourist arrivals around the world will increase by over 200%. Because tourism is an information intensive business, there are opportunities to apply information technology (IT) to support tourism and tourists (Watson, Akselsen, Monod, & Pitt, 2004), and, pursuing these opportunities, the tourism industry is leading eCommerce applications (Werthner & Ricci, 2004) and one of the fastest growing segments of eCommerce (Cao & Schniederjans, 2006). Not in vain, to be able to compete in an increasingly competitive and globalized market, companies need to be provided with new strategies that allow them to confront successfully the environment challenges (Acosta & Febles, 2010). Developments in search engines, carrying capacity and speed of networks, have influenced the number of travelers around the world that use technologies for planning and experiencing their travels (Buhalis & Law, 2008). Thus, according to Kenteris, Gavalas, and Economou (2009), the convergence of IT and communications technologies and the rapid evolution of the Internet have been some of the most influential factors in tourism that have changed travelers' behavior. Indeed the Internet is currently the primary source of tourist destination information for travelers (Chiu, Yueh, Leung, & Hung, 2009). Buhalis (1998) pointed out that the use of the Internet in the tourism industry provides access to a large number of people, as well as offering the opportunity to develop closer relationships with customers. ICTs have radically changed the efficiency and effectiveness of tourism organisations, the way that businesses are conducted in the marketplace, as well as how consumers interact with organisations (Buhalis, 2003). And in a highly competitive environment,
Recommender systems are commonly defined as applications that e commerce sites exploit to suggest products and provide consumers with information to facilitate their deci sion making processes (Niininen, Buhalis, & March, 2007). How to deliver relevant information to both potential and existing cus tomers has become an important task for the hospitality industry (Xian, Kim, Hu, & Fesenmaier, 2007), which, in many cases, is based on artificial intelligence techniques (Schiaffino & Amandi, 2009). Destination recommendation systems are mostly fed with subjec tive information provided by the tourism industry itself (Goosen, Meeuwsen, Franke, & Kuyper, 2009), but most of the relevant infor mation for recommendation should come from customers. Indeed, according to Klein (1998), travel products in general are arguably ‘‘experience goods’’ in that full information on certain attributes cannot be known without direct experience. Taking this assumption into account, this paper presents Sem Fit, a semantic hotel recommendation expert system, based on the consumer’s experience about the recommendation provided by the system. The proposed expert system uses the consumer’s experience point of view in order to apply fuzzy logic techniques to relating customer and hotel characteristics. Hotel characteris tics are represented by means of a domain ontology. After receiving a recommendation, the customer provides a valuation about the recommendation generated by the system. Based on the customer’s valuations, the rules of the system are updated in order to adjust the new recommendations to the past user experiences. The paper consists of five sections and is structured as follows. Section 2 reviews the relevant literature. Section 3 discusses the main features of Sem Fit including the conceptual model, algorith mic and architecture. Section 4 describes the evaluation of the tool performed including a description of the sample, the method, re sults and discussion. Finally, the paper ends with a discussion of re search findings, limitations and concluding remarks. 2. Background It is a widely recognized fact that information and decision making have become the foundation for the world economy (Wang, 2008). Among many enterprise assets, knowledge is trea ted as a critical driving force for attaining enterprise performance goals because knowledge facilitates better business decision 2 making in a timely fashion (Han & Park, 2009). And due to the importance of tourism, recommender systems and decision sup port systems for tourism have been a field of study since the very beginnings of artificial intelligence. A recommender system can provide a set of solutions that best fit the user, depending on different factors concerning the user, the objective or the context where it is applied. Such systems can re duce search efforts (Liang, Lai, & Ku, 2006) and provide valuable information to assist consumers’ decision making process (Ricci, 2002) in order to solve the problem of information overload (Kuo, Chen, & Liang, 2009). Adomavicius and Tuzhilin (2005) pro vide a survey of recommender systems as well as describe various limitations of current recommendation methods, and discuss pos sible extensions that can improve recommendation capabilities and make recommender systems applicable to an even broader range of applications. Due to the importance of tourism, many efforts had been de voted to recommender systems for tourism (e.g. 
Castillo et al., 2008; Loh, Lorenzi, Saldana, & Licthnow, 2004; Ricci & Nguyen, 2007; Wallace, Maglogiannis, Karpouzis, Kormentzas, & Kollias, 2003), often based on artificial intelligence techniques, as was pre dicted in the early nineties by Crouch (1991): intelligent agents (e.g. Aciar, Serarols Tarres, Royo Vela, & De la Rosa i Esteva, 2007; Schiaffino & Amandi, 2009), fuzzy approaches (e.g. Lenar & Sobecki, 2007; Ngai & Wat, 2003), Bayesian networks (e.g. Huang & Bian, 2009; Jiang, Shang, & Liu, 2009), to cite just a few. Nor is the field of using semantics in tourism new. Fodor and Werthner (2004) presented Harmonise, a project that deals with business integration in tourism using ontologies for mediation. The SATINE project by Dogac et al. (2004) describes how to deploy semantically enriched travel Web services and how to exploit semantics through Web service registries. Niemann, Mochol, and Tolksdorf (2008) propose how to enhance hotel search with Semantic Web Technologies. Jakkilinki, Georgievski, and Sharda (2007) proposed an ontology based e Tourism Planner AuSTO that enables users to create an itinerary in one single application by this intelligent tool that builds on semantic web technologies. The LA_DMS project (Kanellopoulos, 2008) provides semantic based information for tourism destinations by combining the P2P para digm with semantic web technologies. In the specific field of using semantics to provide better infor mation for tourists there have been relevant and recent efforts in the literature. For example, García Crespo et al. (2009) proposed a semantically enriched recommendation platform for tourists on route, later expanded to Destination Management Organizations (García Crespo, Colomo Palacios, Gómez Berbís, Chamizo, & Rive ra, 2010a). Lee, Chang, and Wang (2009) used ontologies to provide recommendation in the context of the city of Tainan (Taiwan). Fi nally, and in the most similar contribution to the literature, Huang and Bian (2009) integrated Bayesian networks and semantics to provide personalized recommendations for tourist attractions over the Internet. According to Huang and Bian (2009), there are two challenges in developing a system for personalized recommendations for tourism. One is the integration of heterogeneous online travel information. The other is the semantic matching of tourist attractions with travelers’ preferences. Taking into account that all information will be given to the system by the users using this means, the challenge of Sem Fit will be the semantic match ing between attractions and preferences. To do so, Sem Fit will use the fuzzy logic paradigm in order to express the relationship between the hotel characteristics and the customer preferences. The Sem Fit’s fuzzy engine will recalculate this fuzzy relation ships based on the customer feeling about previous recommen dations, allowing the automatic adaptation of the recommender system to the customers’ preferences. Young people 10 20 30 40 Fig. 2. Membership function for the fuzzy set ‘‘young people’’. 3. Sem-Fit: fundamentals and internals Usually experts and customers express their knowledge and preferences in the form of imprecise terms such as ‘‘young’’ or ‘‘near’’. The fuzzy logic paradigm allows the representation of these imprecise terms in order to implement intelligent systems. Based on this paradigm, Sem Fit allows the representation of fuzzy vari ables in order to describe hotels and customers. Later on, Sem Fit uses these variables to perform recommendations. 
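Sem-Fit describes hotels through a domain ontology and matches them against fuzzy customer preferences; the paper's implementation relies on Jena and OWL DL (Section 3.5). Purely as an illustration of the kind of semantic retrieval involved, the sketch below builds a toy RDF description of two hotels with rdflib and retrieves the ones matching given characteristics via SPARQL. Every URI, class and property name here is invented for the example.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Toy hotel descriptions; every URI and property here is invented for the example.
EX = Namespace("http://example.org/hotels#")
g = Graph()
g.bind("ex", EX)

for name, stars, spa in [("HotelMar", 5, True), ("HotelSol", 3, False)]:
    hotel = EX[name]
    g.add((hotel, RDF.type, EX.Hotel))
    g.add((hotel, EX.stars, Literal(stars)))
    g.add((hotel, EX.hasSpa, Literal(spa)))

# Retrieve hotels matching a set of (defuzzified) characteristics, e.g. a "luxury hotel".
query = """
PREFIX ex: <http://example.org/hotels#>
SELECT ?hotel WHERE {
    ?hotel a ex:Hotel ;
           ex:stars ?s ;
           ex:hasSpa true .
    FILTER (?s >= 4)
}
"""
for row in g.query(query):
    print(row.hotel)   # -> http://example.org/hotels#HotelMar
```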
This section is structured as follows. First, an explanation of the fuzzy logic paradigm is given. Secondly, the process of capturing fuzzy knowledge in hotel recommendation is depicted. Third, the recommendation process is defined and explained. Fourth, the customer feeling capturing procedure is portrayed. Finally, the Sem-Fit architecture and implementation are shown.

3.1. Fuzzy logic paradigm

The fuzzy sets theory provides a framework for the representation of the uncertainty of many aspects of human knowledge (Zadeh, 1965). Classic sets theory establishes that the elements of the universe may belong to a set or may not belong to that set. Then, given the universe U = {1, 2, 3, 4, 5, ...}, we can affirm that "3" belongs to the set of odd numbers. We can also affirm that the number "21" belongs to the set of numbers greater than 7, but number 3 does not belong to such a set. The membership function for a given set can be depicted as shown in Fig. 1. When the element belongs to the set, the function takes the value 1, and when the element does not belong to the set, the function takes the value 0. Fig. 1. Membership function for the set of numbers greater than 7.

For a given element, fuzzy sets theory proposes the use of intermediate degrees of membership to a set. In this way, if we consider the set of young people we can consider that a person who is 15 years old belongs to such a set with a degree of 1 (belongs), a person who is 30 years old belongs in some degree (for example 0.7) and a person who is 80 years old does not belong to this set (degree of 0). We can represent this membership function as depicted in Fig. 2.

Fuzzy sets provide a way for defining cases in which the membership to a given set is relative or the membership function is not defined at all, allowing the representation of imprecision or uncertainty. Imprecision and uncertainty modeling by means of fuzzy sets allows solving problems which cannot be solved by means of classic techniques. Some such domains are: classification problems, pattern recognition, signal processing, knowledge based systems or temporal reasoning.

Using this fuzzy theory, the purpose of Sem-Fit is to offer hotel recommendations based on expert criterion. This expert criterion will be represented by means of fuzzy sets based on the affect grid (Russell, Weiss, & Mendelsohn, 1989). However, the customer preferences may be different from the expert point of view or may change as time goes by. For this reason it is necessary to update the fuzzy rules which represent the expert criterion using the customers' feedback.

3.2. Capturing the fuzzy knowledge about hotel recommendation

Fig. 3 depicts the main elements in our proposal about capturing knowledge in hotel recommendations: 1. The semantic descriptions of the hotels. 2. The fuzzy relationship between the characteristics of the customers and the characteristics of the hotels. 3. The customer feeling about the recommendations.

The first step of the recommendation process consists of the representation of knowledge about how the hotels are selected. This knowledge is expressed using fuzzy sets. In a first stage, the fuzzy sets are defined based on the expert knowledge. An expert defines by means of fuzzy sets the characteristics of the hotels and the characteristics of the customers. Some fuzzy sets defined are: luxury hotel, relaxing trip or young people.
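As a minimal sketch of how such membership functions can be encoded, the code below implements the crisp set of Fig. 1 and the fuzzy set "young people" of Fig. 2. The breakpoints for the fuzzy set are chosen only so that the example degrees mentioned in the text (1 at 15, about 0.7 at 30, 0 at 80) are reproduced, so they are illustrative rather than the system's actual definition.

```python
def greater_than_7(x: float) -> float:
    """Crisp set of Fig. 1: membership is either 0 or 1."""
    return 1.0 if x > 7 else 0.0

def young_people(age: float) -> float:
    """Fuzzy set of Fig. 2 with illustrative breakpoints: full membership up to 18,
    then a linear descent reaching 0 at 58, so that age 30 gets roughly degree 0.7."""
    if age <= 18:
        return 1.0
    if age >= 58:
        return 0.0
    return (58.0 - age) / 40.0

for age in (15, 30, 80):
    print(age, round(young_people(age), 2))   # 15 -> 1.0, 30 -> 0.7, 80 -> 0.0
```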
After defining the fuzzy terms used by the expert to describe hotels and customers, the membership function is defined for each fuzzy set. This membership function allows for the transformation of the customer characteristics and the hotels characteristics into fuzzy values. Studies of emotion and affect in human beings have an estab lished history which originates in philosophy. As a result of this tradition, and using their own work as a basis (Russell, 1980), Russell et al. (1989) proposed a measure of affect which had a pro found impact on the field of social psychology. They termed the measure the affect grid, a scale designed as a quick means of assessing affect along the dimensions of pleasure displeasure and arousal sleepiness on a 1 9 scale. The affect grid may prove to be the instrument of choice when subjects are called on to make affective judgments in rapid succession or to make a large number of judgments, especially when those judgments are to be aggre gated (1989). Using this scale, researchers can collect emotion rat ings from stress and tension to calm, relaxation or serenity, and Fig. 3. System design. from ecstasy, excitement and joy to depression, melancholy, sad ness, and gloom. According to the studies of these authors, the affect grid is potentially suitable for any study that requires judgments about affect of either a descriptive or a subjective kind. Based on the cus tomer characteristics, the fuzzy system will estimate the custom ers affect grid for each hotel characteristic. Then, the fuzzy sets for the representation of the customer characteristics are related with the concepts of the hotel ontology by means of the affect grid. In this way, the results of the fuzzy recommender can be translated into concepts of the hotel ontology in order to obtain the recommendation. The measure of the affect grid will be translated to linguistic tags related to the hotel characteristics by mean of semantic annotations based on the rules defined by the expert options. Such tags are ade quate, very adequate, not adequate or never recommend. Each tag has a value related in order to rate each hotel based on the amount of positive or negative tags obtained from the fuzzy reasoning. Affect grid was previously employed by authors in previous works that combined semantic technologies with emotions (García Crespo, Colomo Palacios, Gómez Berbís, & García Sánchez, 2010b). 3.3. Recommendation process The main steps of the recommendation process are depicted in Fig. 4. The recommendation is primarily based on the customer char acteristics. The steps of the reasoning process are: 1. Obtaining the customer characteristics. The customer provides the information about his personal situation, preferences, etc. The information required is established based on the expert cri terion as mentioned in Section 3.1. 2. Fuzzification. The values provided by the customer are con verted into the fuzzy sets defined by the expert to describe the customer characteristics. The fuzzy value for each variable is obtained by means of the membership function related to each fuzzy set. 4 3. Once the customer profile has been fuzzified, the fuzzy rules are evaluated obtaining the set of fuzzy values for the hotel charac teristics, as well as its tag for defining the level of adequation based on the affect grid (Fig. 5). 4. The results obtained are defuzzified in order to obtain a set of concrete characteristics for the hotels. 5. 
With the defuzzified results, hotels with the obtained characteristics are retrieved based on the hotels ontology and the annotations. 6. Next, the fuzzy recommendation system will calculate the weight of each hotel based on the values of the suitability obtained from the decision matrix. The total weight can be configured in two modes, called "normal mode" and "sensible mode". The "normal" mode calculates the weight of the hotel as the sum of all the suitabilities:

P_{\mathrm{sum}} = \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij},

where W_{ij} is the defuzzified value of the association between the customer characteristic i and the hotel characteristic j. The "sensible" mode calculates the weight of the hotel as the product of all the suitabilities:

P_{\mathrm{product}} = \prod_{i=1}^{n} \prod_{j=1}^{m} W_{ij}.

The sensible mode allows greater differences between hotels, providing more differentiation between the recommendations. 7. The hotel which obtains the highest weight is the final recommendation, called "Star recommendation". Also a set of alternative recommendations, called "suggested hotels", will be shown.

These steps constitute the primary recommendation process. However, the expert criterion may not be adequate, because the customer tendencies can change. For this reason, the fuzzy relationships between the customer characteristics and the characteristics of the hotels must be dynamically recalculated based on the customer feeling. Fig. 4. Recommendation process. Fig. 5. Affect grid for hotel characteristics.

3.4. Capturing the customer feeling

As mentioned, the knowledge about the relationships between customer characteristics and hotel characteristics, represented by means of the affect grid (Russell et al., 1989), has to be updated. There are two possibilities to update such values. On the one hand, the expert can update the values of the affect grid and, on the other hand, these values can be automatically updated by means of heuristics. The first option is the classic approach in which an expert updates the knowledge after studying the customer tendencies. The second option allows the automatic adaptation of the recommender system based on the information acquired during the recommendation process. For this option it is necessary to take into account the level of agreement of the customer with the solution provided in order to make future recommendations. Such recommendations have an effect in two ways: on the one hand they allow the correction of the initial criterion of the expert, because a negative valuation by the customers may imply the need to redefine the initial affect grid. On the other hand, the feedback of the customer may be stored in order to adjust the criterion of the expert to the customer preferences.

For this reason, we propose the automatic reconfiguration of the fuzzy relationships using customer feedback. After receiving a recommendation the customer will rate his level of pleasure-displeasure about the characteristics of the recommendation received. The objective of this question is to obtain the first impression of the customer about the recommendation. The fact that the rating of the customer is about his feeling about the recommendation instead of his real experience is important. This is because the recommendation could be correct but the real experience of the customer could be negative, due to external factors (bad weather, personal problems, illness, etc.). The customer feelings are stored in the knowledge base.
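Returning briefly to step 6 above, the sketch below illustrates the two weight-aggregation modes on a pair of invented hotels: the "normal" mode sums the defuzzified suitabilities W_ij, while the "sensible" mode multiplies them, spreading the candidate hotels further apart. Hotel names and suitability values are illustrative only.

```python
import math

def hotel_weight(suitability, mode="normal"):
    """suitability: matrix W where W[i][j] is the defuzzified suitability of hotel
    characteristic j for customer characteristic i (values invented). 'normal'
    sums all W_ij; 'sensible' multiplies them, increasing differentiation."""
    values = [w for row in suitability for w in row]
    if mode == "normal":
        return sum(values)
    return math.prod(values)   # 'sensible' mode

# Two toy hotels scored against two customer characteristics x two hotel characteristics.
hotels = {
    "HotelMar": [[0.9, 0.8], [0.7, 0.9]],
    "HotelSol": [[0.6, 0.8], [0.7, 0.5]],
}
for mode in ("normal", "sensible"):
    ranking = sorted(hotels, key=lambda h: hotel_weight(hotels[h], mode), reverse=True)
    print(mode, ranking)   # the top-ranked hotel is the 'star recommendation'
```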
When the number of customer evaluations is greater than one hundred, the fuzzy system will evaluate the overall customer feelings based on fuzzy meta rules defined in order to determine when it will be necessary to update the affect grids. If the overall customer feeling is negative, then the affect grids related to the negative valued rec ommendations will be adjusted according to such meta rules. The updating process will be repeated every 100 customer valuations. 3.5. System architecture and implementation Fig. 6 depicts a three layer scheme that represents the recom mendation system architecture. The first layer corresponds to the user interface. There are two different elements in the user interface layer. On the one hand, the administrator GUI allows the administrator to define the fuzzy rules that represent knowledge about hotel recommen dation. The administrator GUI consists of a web based applica tion to allow the definition of the fuzzy sets as well as the membership function for each fuzzy set. This application also al lows the easy matching between customer characteristics and hotel characteristics by means of the affect grid. On the other hand, the GUI recommendation allows the customers to obtain intelligent recommendations about hotels. The GUI recommen dation consists of a web questionnaire in which the customer answers a set of questions proposed by the expert. Such ques tions determine the customer profile in order to determine the most suitable hotel. After the questions have been answered, the fuzzy engine evaluates the rules, and the most suitable hotel Fig. 6. System a 6 is shown to the customer. Optionally, a list of less suitable hotels can be displayed. After the recommendation, the customer pro vides his degree of pleasure displeasure about the recommenda tion received. This information will be used by the fuzzy engine for reconfiguring the fuzzy relationships between the customer characteristics and the hotel characteristics. The second layer represents the business logic. This layer con tains the fuzzy engine and the semantic engine. The fuzzy engine will evaluate the fuzzy rules defined by the expert in order to determine the most suitable hotel characteristics for a given cus tomer profile based on the affect grid. As mentioned, the fuzzy knowledge is defined by means of the GUI administrator. Once the hotel characteristics have been determined, the fuzzy engine retrieves the hotels with these characteristics by means of the semantic engine. The semantic engine, based on the Jena Frame work (Reynolds, 2006), will manage the hotels ontology and will provide the set of hotels with the characteristics determined by the fuzzy engine. Besides the recommendations, the fuzzy engine will recalculate the fuzzy affect grid based on the customer feed back. The fuzzy recalculator has been implemented as a daemon in the web server. Such a daemon monitors customer feedback, recalculates the fuzzy relationships between characteristics when the customer evaluation is significant and updates the affect grids according to such customer feedback. Finally, the persistence layer stores the knowledge about the hotel recommendation. As mentioned, on the one hand, the hotel ontology defines the relevant characteristics of each hotel. The concepts of the hotel ontology (Fig. 7) describe the category of each hotel based on stars, the room characteristics, the special equip ment of the hotel, etc. This ontology also describes special activi ties related to the hotel based on the season. 
All this information rchitecture. Fig. 7. Partial view of the hotels ontology. is used to describe the hotels available in the system. The hotels ontology has been defined using the Ontology Web Language (OWL) (Bechhofer et al., 2004). The OWL language has three vari ants: OWL Lite, OWL DL and OWL Full. OWL Lite provides a small set of features, while OWL DL is more expressive than OWL Lite providing decidability based on description logics. OWL Full allows full expressivity but decidability is not warranted. For this reason, we have used OWL DL for the ontology definition. The storage and ontology reasoning has been developed based on the Jena frame work. Besides the ontology specific storage, a database stores the information about the fuzzy sets and the fuzzy relationships with the ontology. On the other hand, the customer characteristics and their relationships with the hotel characteristics are repre sented by means of fuzzy sets and fuzzy relationships by means of affect grids and are stored in a database. Table 1 Comparison between system and expert recommendations. Precision Recall F1 Star recommendation vs. expert recommendation 0.58 0.58 0.58 Overall system suggestions vs. expert recommendation 0.19 0.96 0.32 4. Evaluation The subsequent section describes the empirical evaluation of the project. The final aim of this study is to work out if Sem Fit serves as a valid recommendation system in a controlled environment. 4.1. Research design Once the system has been developed and trained, it is necessary to prove the validity of our proposal; we must prove that the star recommendations provided by the system are accepted by the cus tomers. In case of disagreement with the star recommendation, it is of interest to know if the customer found a valid alternative in the list of suggested hotels. Additionally, given a set of customer profiles we have evaluated the similarity between the recommen dations provided by the fuzzy system and the recommendations provided by four experts in hotel recommendations. By means of this experiment we have evaluated the accuracy of the fuzzy sys tem comparing its recommendations to the recommendations of human experts. In case of disagreement between the expert crite ria and the fuzzy system’s star recommendation, it is of interest to know if the expert recommendation is included in the list of sug gested hotels. Finally, comparison between the expert recommen dation and the selection of the customer has been performed in order to validate the expert criterion. Precision, recall and F1 will be used in order to measure the de gree of relevance of the recommendations of the system. This tech nique has been employed based on two points of view: on the one hand, it is used to measure customer agreement with the recommendation received and, on the other hand, to measure the agreement between the system results and the expert recommen dations. As mentioned, this technique has been also employed to measure customer agreement with the expert’s recommendation. These measures had been used before to measure recommenda tions in semantic systems (e.g. García Crespo, Colomo Palacios, Gómez Berbís, & Ruiz Mezcua, 2010c, García Crespo, Rodríguez, Mencke, Gómez Berbís, & Colomo Palacios, 2010d, Paniagua Martín, García Crespo, Colomo Palacios, & Ruiz Mezcua, 2011). 
Precision, recall and F1 measures are defined as follows:

\mathrm{precision} = \frac{\mathrm{CorrectHotelsFound}}{\mathrm{TotalHotelsFound}}, \quad \mathrm{recall} = \frac{\mathrm{CorrectHotelsFound}}{\mathrm{TotalCorrectHotels}}, \quad F1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.

The correct hotels will be determined by the expert criteria or the customer depending on the test. The following subsections describe the experimentation carried out and the results obtained.

4.2. Sample

In order to determine the validity of the proposed approach, the designed evaluation was carried out. A set of 50 students (37 male and 13 female) in their final year of the Computer Science degree program at Universidad Carlos III de Madrid accessed the system in order to obtain a recommendation for their spring break. The system was trained with information on 10 hotels in Mallorca (Spain) based on the knowledge of an expert from a travel agency. With the results obtained from the system, each student selected the star preference or, in case of disagreement, one of the alternative recommendations provided or an empty alternative. Four experts in the travel domain recommended a hotel for each student in order to compare the recommendation of the system with the expert criteria. The recommendations generated by the fuzzy system were also compared with the expert recommendations in order to measure the validity of the star and alternative recommendations.

4.3. Results

The experimentation was carried out in two separate tests that will be analyzed in the next subsections.

4.3.1. Test 1. Comparison between the system results and the expert recommendation

Table 1 summarizes the precision, recall and F1 values obtained by means of the comparison between the expert recommendations and the results provided by the fuzzy system. This comparison has been divided into two stages. In the first stage, we have compared the star recommendation of the system with the expert recommendation. The star recommendation consists of a single recommendation (the top rated hotel for the customer characteristics) provided by the fuzzy system for each customer, and the expert recommendation is also a single recommendation suggested by the expert for each customer. The precision of the fuzzy system is 0.58. The value of recall is the same as precision because the number of returned categories is the same as the number of valid categories. We consider this value a promising result because more than 50% of the recommendations of the system are the same as the expert's, and there is only one possible result.

On the other hand, the fuzzy system provides a set of four alternatives for the star recommendation. We have compared the overall results of the system (five results, star recommendation and four suggested hotels) with the suggestion of the expert. In this scenario we have obtained a precision of 0.192, less than the 0.58 obtained for the star recommendation. It is a logical consequence because the number of returned values is greater (5 for each customer vs. only one star recommendation) and the number of correct values (recommendations of the expert) is the same. However, the recall value achieves the value of 0.96. It means that in 96% of the cases the fuzzy system includes the recommendation of the expert in the set of results returned (either as star or suggested recommendation). It is a good result because the system is capable of providing an expert recommendation in the set of suggestions in almost all of the cases.

4.3.2. Test 2.
Comparison between the customer selection and the suggestions of both system and expert Test 1 compared the results of the system with the suggestions of the expert. But, are the expert suggestions a good basis? In order to answer this question, Test 2 compares the recommendations of the expert with the selection of the customer. Table 2 shows the precision, recall and F1 values obtained in this test. The precision value is 0.76, and the recall value is the same. It means that the 76% of expert recommendations are accepted by the customer. The second stage of the Test 2 consisted of the comparison be tween the system result and the final customer selection. In this stage we consider separately, on the one hand, only the star recom mendation and, on the other hand, the star recommendation plus the alternative suggestions. Precision and recall and F1 values for test 2 are presented in Table 3. The first row shows the comparison between the star recommendation and the customer selection. In this case, the precision value is 0.48. It means that the star recommendation of the system is the customer selection in the 48% of the cases. The value of recall is the same because the valid values are the same as the amount of values considered. The second row of Table 3 presents the comparison between the overall results of the system (star recommendation plus 4 sugges tions) and the final customer selection. As in Test 1, the precision value is smaller because the number of suggestions is greater, but the recall value is 0.96. It means that in 96% of the cases, the system offers a star recommendation or an alternative suggestion that satisfies the customer. 4.4. Discussion After the interaction of the customer with the system, the re sults were compared with the expert recommendation for each customer profile. We found that 24 customers found the star rec Table 2 Comparison between expert recommendation and customer selection. Precision Recall F1 Expert recommendation vs. customer selection 0.76 0.76 0.76 Table 3 Comparison between system recommendation and customer selection. Precision Recall F1 Star recommendation vs. customer selection 0.48 0.48 0.48 Overall system suggestions vs. customer selection 0.19 0.96 0.32 8 ommendation provided by the fuzzy system suitable and 48 cus tomers found a recommendation included in the star and suggested hotels suitable. Only two customers did not find a suit able recommendation. On the other hand, we found that 29 star recommendations coincide with the expert recommendation, and 48 expert recommendations were included in the sum of the star recommendations with the suggested recommendations. We can conclude that the sum of suggested and star recommendations provides an accurate set of recommendations in which the cus tomer can find a hotel. The system recommendations fit with the expert recommendation taking into account star and suggested ho tels. Finally the expert recommendation coincides with the cus tomer selection in 38 cases. However, the sum of star and suggested hotels of the fuzzy systems obtain better results than the expert recommendation. As shown in the previous subsection, the values of precision and recall and F1 have been calculated based on three points of view. Test 1 measured the similarity between the system recommenda tions and the expert recommendations. If we only take into ac count the star recommendation, both precision and recall values for the system recommendation in this scenario are 0.58. 
It means that 58% of the system star recommendations coincide with the ex pert recommendation. As mentioned, it is an acceptable margin be cause there is only one valid value (the expert recommendation which is highly subjective) and the system only offers one star rec ommendation. However, if we include in the study the four alter native suggestions of the system we can see that the precision decreases because the number of categories found is greater and the correct value is only one, but the recall value is 0.96. It is an excellent result because in 96% of the cases the system offers the expert recommendation. It means that the system recommenda tions are on the same level as an expert in the domain and the cus tomer can express his feelings in the same way that he does with the expert. Test 2 studies, on the one hand, the results of the expert vs. cus tomer selection. In this case, the precision and recall values ob tained are 0.76. We can see that the expert obtains better results with only one recommendation. This is natural, because the expert can take into account more parameters such as the corporal expression of the customer, non verbal behavior) in order to pro vide a recommendation. It is a good value, because the expert pro vides 3 good recommendations out of 4 recommendations. It also means that the knowledge represented in the fuzzy system is accu rate. On the other hand, Test 2 compared the star recommendation of the system with the customer selection. In this case, the preci sion value was 0.48. It means that the star recommendation is ac cepted by the customer in 48% of the cases. However, when we add the additional suggestions of the system, the precision is smaller because the number of suggestions is greater and the number of correct values is only one (the customer selection), but the recall is 0.96. It is an excellent result because it means that the customer finds a suitable recommendation in 96% of the cases. It is a high rate of positive recommendations considering that the selection of a hotel is a highly subjective decision. Cao and Li (2007) obtain an average recall of 83.82% for recommendations of 15 laptops in all 138 laptops for seven different customers; however in Cao and Li’s study the customer can select more than one recommen dation. Zenebe and Norcio (2009) present a comparative study of several techniques for recommending systems obtaining a recall of 38% for the fuzzy set theoretical method in a highly subjective domain such as movies selection. Zenebe and Norcio’s experiment includes a large number of customers and movies, and it is natural that the recall value was smaller. In our case of study the number of customers and products are smaller. Based on the obtained results, the proposed system is able to estimate customers’ feelings based on their profile and is capable of offering a set of recommendations that will be accepted most of the time. Future research may include a larger number of hotels and customers in order to measure the scalability of the proposed system. 5. Conclusions and future work We have presented Sem Fit, a semantic hotel recommendation expert system, based on consumers’ experience about the recom mendation provided by the system. The proposed expert system uses the consumers’ experience point of view in order to apply fuz zy logic techniques to relate customers and hotels characteristics, represented by means of domain ontologies and affect grids. 
After receiving a recommendation, the customer provides a valuation about the recommendation generated by the system. Based on the customers’ valuations, the rules of the system are updated in order to adjust the new recommendations to the past user experi ences. The validation accomplished shows that the sum of star and suggested hotels of the fuzzy systems obtains better results than the expert recommendation. Moreover, the values of precision and recall and F1 reveal that the Sem Fit recommendations are on the same level as an expert in the domain and the customer will be able to express his feelings in the same way that he does with the expert. As an extension of this paper, the knowledge base could be im proved including more products, such as suitable destinies or asso ciated services. Furthermore, other ways of collecting data on customer feeling about a recommendation with greater accuracy and clarity could be studied. Acknowledgements This work is supported by the Spanish Ministry of Industry, Tourism, and Commerce under the EUREKA project SITIO (TSI 020400 2009 148), SONAR2 (TSI 020100 2008 665) and GO2 (TSI 020400 2009 127). References Aciar, S. V., Serarols-Tarres, C., Royo-Vela, M., & De la Rosa i Esteva, J. L. (2007). Increasing effectiveness in e-commerce: Recommendations applying intelligent agents. International Journal of Business and Systems Research, 1(1), 81–97. Acosta, Z., & Febles, J. (2010). The organizacional management as instrument to overcome the resistance to the innovative process: An application in the canary company. International Journal of Human Capital and Information Technology Professionals, 1(2), 49–64. Adomavicius, G., & Tuzhilin, A. (2005). Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6), 734–749. Bechhofer, S., van Harmelen, F., Hendler, J., Horrocks, I., McGuinness, D. L., Patel- Schneider, P. F., et al. (2004). OWLWeb ontology language reference. http:// www.w3.org/TR/owl-ref/. Benítez, J. M., Martín, J. C., & Román, C. (2007). Using fuzzy number for measuring quality of service in the hotel industry. Tourism Management, 28(2), 544–555. Buhalis, D. (1998). Strategic use of information technologies in the tourism industry. Tourism Management, 19(5), 409–421. Buhalis, D. (2003). eTourism: Information technology for strategic tourism management. London: Pearson. Buhalis, D., & Law, R. (2008). Progress in information technology and tourism management: 20 years on and 10 years after the Internet—The state of eTourism research. Tourism Management, 29(4), 609–623. Cao, Q., & Schniederjans, M. J. (2006). Agent-mediated architecture for reputation- based electronic tourism systems: A neural network approach. Information & Management, 43(5), 598–606. Cao, Y., & Li, Y. (2007). An intelligent fuzzy-based recommendation system for consumer electronic products. Expert Systems with Applications, 33(1), 230–240. Castillo, L., Armengol, E., Onaindía, E., Sebastiá, L., González-Boticario, J., Rodríguez, A., et al. (2008). SAMAP: An user-oriented adaptive system for planning tourist visits. Expert Systems with Applications, 34(2), 1318–1332. Chiu, D. K. W., Yueh, Y. T. F., Leung, H. F., & Hung, P. C. K. (2009). Towards ubiquitous tourist service coordination and process integration: A collaborative travel agent system architecture with semantic web services. Information Systems Frontiers, 11(3), 241–256. 9 Chou, T. Y., Hsu, C. L., & Chen, M. C. 
(2008). A fuzzy multi-decision model for international tourist hotels location selection. International Journal of Hospitality Management, 27(2), 293–301. Crouch, G. I. (1991). Expert computer systems in tourism: Emerging possibilities. Journal of Travel Research, 29(3), 3–10. Dogac, A., Kabak, Y., Laleci, G., Sinir, S., Yildiz, A., Kirbas, S., et al. (2004). Semantically enriched Web services for the travel industry. ACM Sigmod Record, 33(3), 21–27. Engel, J. F., Blackwell, R. D., & Miniard, P. W. (1995). Consumer behavior. Forth Worth, TX: The Dryden Press. Fodor, O., & Werthner, H. (2004). Harmonise: A step toward an interoperable e- tourism marketplace. International Journal of Electronic Commerce, 9(2), 11–39. García-Crespo, A., Chamizo, J., Rivera, I., Mencke, M., Colomo-Palacios, R., & Gómez- Berbís, J. M. (2009). SPETA: Social pervasive e-Tourism advisor. Telematics and Informatics, 26(3), 306–315. García-Crespo, A., Colomo-Palacios, R., Gómez-Berbís, J. M., Chamizo, J., & Rivera, I. (2010a). Intelligent decision-support systems for e-tourism: Using SPETA II as a knowledge management platform for DMOs and e-tourism service providers. International Journal of Decision Support System Technology, 2(1), 35–47. García-Crespo, A., Colomo-Palacios, R., Gómez-Berbís, J. M., & García-Sánchez, F. (2010b). SOLAR: Social link advanced recommendation system. Future Generation Computer Systems, 26(3), 374–380. García-Crespo, A., Colomo-Palacios, R., Gómez-Berbís, J. M., & Ruiz-Mezcua, B. (2010c). SEMO: A framework for customer social networks analysis based on semantics. Journal of Information Technology, 25(2), 178–188. García-Crespo, A., Rodríguez, A., Mencke, M., Gómez-Berbís, J. M., & Colomo- Palacios, R. (2010d). ODDIN: Ontology-driven differential diagnosis based on logical inference and probabilistic refinements. Expert Systems with Applications, 37(3), 2621–2628. Goosen, M., Meeuwsen, H., Franke, J., & Kuyper, M. (2009). My Ideal tourism destination: Personalized destination recommendation system combining individual preferences and GIS data. Information Technology & Tourism, 11(1), 17–30. Han, K. H., & Park, J. W. (2009). Process-centered knowledge model and enterprise ontology for the development of knowledge management system. Expert Systems with Applications, 36(4), 7441–7447. Huang, Y., & Bian, L. (2009). A Bayesian network and analytic hierarchy process based personalized recommendations for tourist attractions over the Internet. Expert Systems with Applications, 36(1), 933–943. Jakkilinki, R., Georgievski, M., & Sharda, N. (2007). Connecting destinations with an ontology-based e-tourism planner. In M. Sigala, L. Mich, & J. Murphy (Eds.), Information and communication technologies in tourism (pp. 21–32). Berlin, Germany: Springer Wien. Jiang, Y., Shang, J., & Liu, Y. (2009). Maximizing customer satisfaction through an online recommendation system: A novel associative classification model. Decision Support Systems, 48(3), 470–479. Kanellopoulos, D. N. (2008). An ontology-based system for intelligent matching of travellers’ needs for group package tours. International Journal of Digital Culture and Electronic Tourism, 1(1), 76–99. Kenteris, M., Gavalas, D., & Economou, D. (2009). An innovative mobile electronic tourist guide application. Personal and Ubiquitous Computing, 13(2), 103–118. Klein, L. R. (1998). Evaluating the potential of interactive media through a new lens: search versus experience goods. Journal of Business Research, 41(3), 195–203. Kuo, M. H., Chen, L. 
C., & Liang, C. W. (2009). Building and evaluating a location- based service recommendation system with a preference adjustment mechanism. Expert Systems with Applications, 36(2 Part 2), 3543–3554. Lee, C. S., Chang, Y. C., & Wang, M. H. (2009). Ontological recommendation multi- agent for Tainan City travel. Expert Systems with Applications, 36(1), 6740–6753. Lenar, M., & Sobecki, J. (2007). Using recommendation to improve negotiations in agent-based systems. Journal of Universal Computer Science, 13(2), 267–286. Liang, T. P., Lai, H. J., & Ku, Y. C. (2006). Personalized content recommendation and user satisfaction: Theoretical synthesis and empirical findings. Journal of Management Information Systems, 23(3), 45–70. Loh, S., Lorenzi, F., Saldana, R., & Licthnow, D. (2004). A tourism recommender system based on collaboration and text analysis. Information Technology & Tourism, 6(3), 157–165. Longhi, C. (2007). Usages of the Internet and e-tourism. Towards a new economy of tourism. In Proceedings of the second international conference advances in tourism economics, Portugal. Luo, M., Feng, R., & Cai, L. A. (2004). Information search behavior and tourist characteristics: The internet vis-à-vis other information sources. Journal of Travel & Tourism Marketing, 17(2-3), 15–25. Ngai, E. W. T., & Wat, F. K. T. (2003). Design and development of a fuzzy expert system for hotel selection. Omega, 31(4), 275–286. Niemann, M., Mochol, M., & Tolksdorf, R. (2008). Enhancing hotel search with semantic Web technologies. Journal of Theoretical and Applied Electronic Commerce Research, 3(2), 82–96. Niininen, O., Buhalis, D., & March, R. (2007). Customer empowerment in tourism through consumer centric marketing (CCM). Qualitative Market Research: An International Journal, 10(3), 265–281. Paniagua-Martín, F., García-Crespo, Á., Colomo-Palacios, R., & Ruiz-Mezcua, B. (2011). SSAAMAR: Semantic annotation architecture for accessible multimedia resources. IEEE Multimedia, 18(2), 16–25. Reynolds, D. (2006). Jena rules. In Proceedings of the 2006 Jena user conference. Ricci, F. (2002). Travel recommender systems. IEEE Intelligent Systems, 17(6), 55–57. http://www.w3.org/TR/owl-ref/ http://www.w3.org/TR/owl-ref/ Ricci, F., & Nguyen, Q. N. (2007). Acquiring and revising preferences in a critique- based mobile recommender system. IEEE Intelligent Systems, 22(3), 22–29. Ricci, F., & Werthner, H. (2006). Recommender systems. International Journal of Electronic Commerce, 11(2), 5–9. Russell, J. A. (1980). A circumplex model of affect. Journal of Personality and Social Psychology, 39, 1161–1178. Russell, J. A., Weiss, A., & Mendelsohn, G. A. (1989). The affect grid: A single-item scale of pleasure and arousal. Personality and Social Psychology, 57(3), 493–502. Schiaffino, S., & Amandi, A. (2009). Building an expert travel agent as a software agent. Expert Systems with Applications, 36(2), 1291–1299. Wallace, M., Maglogiannis, I., Karpouzis, K., Kormentzas, G., & Kollias, S. (2003). Intelligent one-stop-shop travel recommendations using an adaptive neural network and clustering of history. Information Technology & Tourism, 6, 181–193. Wang, J. (2008). Improving decision-making practices through information filtering. International Journal of Information and Decision Sciences, 1(1), 1–4. 10 Wang, Y., Yu, Q., & Fesenmaier, R. D. (2002). Defining the virtual tourist community: Implications for tourism marketing. Tourism Management, 23(4), 407–417. Watson, R. T., Akselsen, S., Monod, E., & Pitt, L. F. (2004). 
The open tourism consortium: laying the foundations for the future of tourism. European Management Journal, 22(3), 315–326. Werthner, H., & Ricci, F. (2004). E-Commerce and tourism. Communications of the ACM, 47(12), 101–105. World Tourism Organization (2006). Tourism market trends 2005: World overview & tourism topics. Madrid, Spain: World Tourism Organization. Xian, Z., Kim, S. E., Hu, C., & Fesenmaier, D. R. (2007). Language representation of restaurants: Implications for developing online recommender systems. Hospitality Management, 26(4), 1005–1018. Zadeh, L. (1965). Fuzzy sets. Information & Control, 8(3), 338–353. Zenebe, A., & Norcio, A. F. (2009). Representation, similarity measures and aggregation methods using fuzzy sets for content-based recommender systems. Fuzzy Sets and Systems, 160(1), 76–94. work_2wc5hvqsirhb3f4jngklcwbfky ---- International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 – 8958, Volume-9 Issue-2, December, 2019 4622 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Retrieval Number: B5116129219/2019©BEIESP DOI: 10.35940/ijeat.B5116.129219 ABSTRACT---Lung Cancer is the second most recurrent cancer in both men and women and which is the leading cause of cancer death worldwide. The American cancer Society (ACS) in US estimates nearly 228,150 new cases of lung cancer and 142,670 deaths from lung cancer for the year 2019. This paper proposes to build an ontology based expert system to diagnose Lung Cancer Disease and to identify the stage of Lung Cancer. Ontology is defined as a specification of conceptualization and describes knowledge about any domain in the form of concepts and relationships among them. It is a framework for representing shareable and reusable knowledge across a domain. The advantage of using ontology for knowledge representation of a particular domain is they are machine readable. We designed a System named OBESLC (Ontology Based Expert System for Lung Cancer) for lung cancer diagnosis, in that to construct an ontology we make use of Ontology Web Language (OWL) and Resource Description Framework (RDF) .The design of this system depends on knowledge about patient’s symptoms and the state of lung nodules to build knowledge base of Lung Cancer Disease. We verified our ontology OBESLC by querying it using SPARQL query language, a popular query language for extracting required information from Semantic web. We validate our ontology by developing reasoning rules using semantic Web Rule Language (SWRL).To provide the user interface, we implemented our approach in java using Jena API and Eclipse Editor. Keywords: Semantic Web, Ontology, Lung Cancer, RDF, OWL, SWRL, SPARQL. I. INTRODUCTION Lung cancer is the most common type of cancer and constitutes 24% of all cancer related deaths. About 13% of all new cancers are lung cancers and in US One in 16 people will be diagnosed with lung Cancer. Lung cancer occurs due to the uncontrolled growth of abnormal cells present in lungs which causes growth of tumors or lesions that reduces breathing ability of a person. The important identified reasons behind lung cancer disease are Smoking, Exposure to asbestos, Radon , some hazardous chemicals , exposure to continuous air pollution and Genetic factors etc. Many people with lung cancer were detected in advanced stages but not at early stage so the mortality rate is more in lung cancer patients. 
It is recommended that people who have family histology of Lung cancer and smokers must undergo tests like LCDT etc., periodically to identify lesions or nodules in the lungs and must consult physicians or general practitioners for diagnosing the disease. [8,9] Revised Manuscript Received on 2 December, 2019. J.Sirisha , Research Scholar,Krishna University, Machilipatnam,Andhra Pradesh, India.( Email: siri.jagannadham@gmail.com) Dr. M. Babu Reddy, Head(i/c), Department of Computer Science, Krishna University, Machilipatnam, Andhra Pradesh, India (email m_babureddy@yahoo.com_ In general an expert system is an intelligent system that supports decision making capabilities and solves complex problems through reasoning. Expert systems are Artificial intelligence based computational tools that consist of components like knowledge base and inference engine. Knowledge base consists of facts and rules related to a particular field or domain and Inference engine deduces new facts. Similarly in diagnosing any disease, A Medical expert system is a computer program consisting of knowledge regarding medical domain that provides accurate information about disease diagnosis there by it deduces prognosis and treatment plans etc. In our work we have designed an expert system which can consider the symptoms of patients and nodule size to detect the lung cancer patients in early stage thereby we can improve the perpetuity of a lung cancer patient.[1,2,7] Ontology is a main component of semantic web that refers to the science of describing different kinds of entities and relations among entities in a particular domain. The semantic web Technologies like RDF(Resourse Description Format),OWL(Ontology Web language) enables the machine to understand the knowledge stored in an ontology and do the complex work involved in searching , sharing and merging the information on the web. Building an Ontology based Expert system in medical domain is very much needed in present days because medical knowledge is increasingly more composite and uncontrollable.[13]. In this paper we are concentrating on expert system in medical domain and use of ontology as a computational aid by applying semantic aspects for generating rules to diagnose and identify the stage of disease. Here we developed a knowledge base related to lung cancer i.e., lung cancer ontology so that the people will know the details regarding this disease as their initial medical assistance .In general for detecting any type of cancer the physicians or medical experts depends on image analysis of lung nodules where images can be analyzed by radiologists using the scan reports and size of the nodule or nodules in the image which can be obtained through various types of scans like Computer Tomography(CT) and Magnetic Resonance Imaging(MRI).In our work, by taking the size of the nodule from image reports and some important symptoms into consideration our lung cancer ontology expert system (OBESLC) detects lung cancer patients and also identify the stage of patient using staging system by formulizing rules using SWRL language. An Ontology Based Expert System for Lung Cancer : OBESLC J.Sirisha, M. 
Babu Reddy An Ontology Based Expert System for Lung Cancer : OBESLC 4623 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Retrieval Number: B5116129219/2019©BEIESP DOI: 10.35940/ijeat.B5116.129219 The following diagram depicts the concepts ,data properties and individuals created for OBESLC system using protégé tool Fig III.b Data Properties in OBESLC Fig III.c Concepts in OBESLC Fig III.d Individuals created for the class LungCancer Phase 3: Rule generation using Semantic Web Rule Language(SWRL) The process of generating rules is very much important for reasoning the developed ontology using “Semantic Web Rule language” (SWRL). In our lung cancer ontology reasoning makes the process of diagnosis and detecting the Stage of patient very effectively and efficiently. Here we developed many rules for identifying whether the patient has Lung Cancer or not, If so detecting the stage of Lung Cancer etc. To generate rules „SWRL tab‟ must be incorporated through SWRL plug-in in Protégé Tool. [26] In the OBESLC system First Rule is used to diagnose whether the patient has Lung Cancer or not by considering the symptoms of patient and nodule size in the lung. Here as the disease was identified mainly based on symptoms in the initial diagnosis process we should consider symptoms and the reasons for those symptoms also. So we have divided the Rule1 into four parts as Rule1a,Rule1b, Rule1c, Rule1d where Rule1 will be completed with a combination of Rule1d and one among Rule1a,Rule1b ,Rule1c.The second rule is used to identify the stage of lung cancer according to TNM system, The third rule categorizes the lesion as benign or malignant which is combination of rules to identify the lesion category according to its diameter. The Fourth rule is also combination of several rules allows us to suggest the treatment according the stage of lung cancer patient. Examples of some rules generated for OBESLC System: Rule1 - > Lung Cancer Detection Symptom checking to identify Lung cancer patient Rule1a : hasWeightLoss(?x, true) ^ hasSymOfCPorPCorCWB(?x, true) ^ hashabitOfEitherSorA(?x, true) ^ Patient(?x) -> probabilityOfHaving(?x, LungDesease) Rule1b : HasFamilyHistoryofCancer(?x, true) ^ hasWeightLoss(?x, true) ^ hasSymOfCPorPCorCWB(?x, true) ^ Patient(?x) -> probabilityOfHaving(?x, LungDesease) Rule 1c: hasWeightLoss(?x, true) ^ hasExposureOfAorRorCAP(?x, true) ^ hasSymOfCPorPCorCWB(?x, true) ^ Patient(?x) -> probabilityOfHaving(?x, LungDesease) To Diagnose Lung Cancer Patient Rule1d: Patient(?p) ^ probabilityOfHaving(?p, LungDesease) ^ Nodule(?n) ^ hasNoduleSize(?n, ?s) ^ swrlb:greaterThan(?s, 2) -> hasDesease(?p, LC) Rule2 -> To Identify Stage of Lung cancer hasSpreadBV(?x, false) ^ hasT(?p, "T1") ^ hasN(?p, "N0") ^ hasSpreadOP(?p, false) ^ hasSpreadLN(?p, false) ^ hasM(?p, "M0") ^ LungCancer(?p) -> classifiedInto(?p, StageIA) International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 – 8958, Volume-9 Issue-2, December, 2019 4624 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Retrieval Number: B5116129219/2019©BEIESP DOI: 10.35940/ijeat.B5116.129219 Rule3 -> To categorize the lesion as benignant or malignant Patient(?p)^Nodule(?x)^foundIn(?p,?x)^hasDiameter(?x,? s)^swrlb:greaterThan(?s,8) ^swrlb:lessThan(?s,15)->hasCategory(?x,LRc4A) Rule 4 -> Required Treatment according to the Lung Cancer Stage LungCancer(?x) ^ classifiedInto(?x, StageIA) -> treatedBy(?x, Surgery) III. 
IMPLEMENTATION& RESULTS To implement our proposed system we use different types of programming languages and computing software like Protégé Editor is used construct the medical ontology of OBESLC system and to update RDF (Resourse Description Framework) files. we make use of Jena API which is a programming toolkit that depends on java programming language to interact with our OBESLC system. Eclipse IDE was used to develop graphical demonstrators for the purpose of user interaction . SWRL language is used for reasoning the ontology by generating rules and to query the developed ontology we make use of SPARQL. This type of implemented system is needed by clinicians, General practioners and medical students so that they can have initial knowledge base of lung cancer disease and can easily diagnose the Lung cancer disease ,Stage identification and suggest Treatment plans to the patients as a part of their initial screening process. Sample rule execution of OBESLC system in Protégé Tool Executing a rule in protégé involves the steps like i) Exporting OWL axioms into rule engine ii) Execution of rule using rule engine iii)Translate the inferred axioms into OWL model. If we perform all these actions successfully we can able to impose our rules into the ontology and draw inferences according to our proposed rules Fig IV.a Rule1d execution in protégé using rule engine and transferring it into OWL Knowledge Fig IV.b Rule2 execution in protégé using rule engine and transferring it into OWL Knowledge Fig IV.c Rule4 execution in protégé using rule engine and transferring it into OWL Knowledge Validating OBESLC system by Sample SWRL Rule extraction Using developed demonstrators in Jena with Eclipse editor: After executing and trasfering any rule into OWL Model, we can extract the same using user interface demonstrators by providing communication between Owl file generated by Protégé Tool and Jena API with Eclipse editor. Here we have shown results of SWRL rules in the developed interfaces for Lung cancer Treatment (d,e,f) ,Lung cancer detection(g) and Lung cancer stage identification(h) respectively. (d) (e) An Ontology Based Expert System for Lung Cancer : OBESLC 4625 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Retrieval Number: B5116129219/2019©BEIESP DOI: 10.35940/ijeat.B5116.129219 (f) (g) (h) Fig IV (d) Semantic representation of concepts before Rule execution e) Semantic representation of concepts after Rule execution f)The Result of the rule for treatment of lung cancer in developed interface (g) The Result of the rule for detecting lung cancer patient in developed interface (h) The Result of the rule for lung cancer stage identification in developed interface. Verifying the OBESLC System using SPARQL : To validate the developed ontology we make use of SPARQL query language which interrogate the ontology and produces the results . SPARQL stands for SPARQL Protocol And RDF Query Language.Querying and extracting information from the knowledge base is an important task in semantic web through which users and applications can interact with data in the ontologies. Here we have presented some sample SPARQL queries through which we can test our rules of OBESLC system by communicating with the developed user interface demonstrators. Query for Treatment : PREFIX rdf: PREFIX owl: PREFIX rdfs: PREFIX xsd: PREFIX lc: SELECT ?Treatment ?CancerStage WHERE { ?CancerStage lc:classifiedInto ?stage. 
?CancerStage lc:treatedBy ?Treatment } Query for identifying Stage of Lung Cancer: PREFIX rdf: PREFIX owl: PREFIX rdfs: PREFIX xsd: PREFIX lc: SELECT ?LCancer ?Stage WHERE {?LCancer lc:hasT ?t. ?LCancer lc:hasN ?n. ?LCancer lc:hasM ?m. ?LCancer lc:hasSpreadBV ?x. ?LCancer lc:hasSpreadLN ?y. ?LCancer lc:hasSpreadOP ?z. ?LCancer lc:classifiedInto ?Stage } V. CONCLUSION Developing an expert system using ontology in the field of medical domain produces better results with less complexity. In this paper we have developed OBESLC System (Ontology Based Expert System for Lung Cancer) using state of art semantic web technologies to diagnose, to identify lung cancer stage and to provide treatment plan according the stage of patient. We have successfully implemented(includes verification and validation) the OBESLC system using SPARQL queries ,SWRL rules and used the Apache Jena API along with Eclipse Editor to extract details from ontology and presented the query results to the user with the help of graphical demonstrators. International Journal of Engineering and Advanced Technology (IJEAT) ISSN: 2249 – 8958, Volume-9 Issue-2, December, 2019 4626 Published By: Blue Eyes Intelligence Engineering & Sciences Publication Retrieval Number: B5116129219/2019©BEIESP DOI: 10.35940/ijeat.B5116.129219 This system is very much help to the General Practioners and clinicians to assess the disease according to the patient symptoms along with scan(CT,MRI,PET etc)reports. OBESLC incorporates knowledge base of Lung cancer disease through which anybody who are not aware of this disease and especially medical students can have an idea regarding the reasons behind lung cancer attack, Types, stages of lung cancer and treatment plans available. This system contains probabilistic rules to diagnose the disease. In future we can enrich the system by incorporating a machine learning technique to learn from results of this system and use these results for further diagnosing. REFERENCES 1. Rawte, Vipula, and Bidisha Roy. "OBESTDD: Ontology based expert system for thyroid disease diagnosis." 2015 International Conference on Nascent Technologies in the Engineering Field (ICNTE). IEEE, 2015. 2. Al-Hamadani, Baydaa Taha, and Raad Fadhil Alwan. "An ontology- based expert system for general practitioners to diagnose cardiovascular diseases." Advances in Computational Sciences and Technology 8.1 (2015): 53-65. 3. NLCA: The National Clinical Lung Cancer Audit (LUCADA)Data Manual (2010), available from: http://www.hscic.gov.uk/lung 4. Sesen, M. Berkan, et al. "Lung cancer assistant: a hybrid clinical decision support application for lung cancer care." Journal of The Royal Society Interface 11.98 (2014): 20140534. 5. Sesen, M. Berkan, et al. "Lung Cancer Assistant: An ontology-driven, online decision support prototype for lung cancer treatment selection." OWL: Experiences and Directions Workshop (OWLED). 2012. 6. Klar, R., and A. Zaiss. "Medical expert systems: design and applications in pulmonary medicine." Lung 168.1 (1990): 1201-1209. 7. https://en.wikipedia.org/wiki/Expert_system 8. https://lungevity.org/for-supporters-advocates/lung-cancer-statistics 9. https://www.cancer.org/content/cancer/en/cancer/lung- cancer/about/key-statistics.html 10. https://www.mayoclinic.org/diseases-conditions/lung- cancer/symptoms-causes/syc-20374620 11. Thomas, Robert F. "The Benefits of expert systems in health care. practical experiences from CATEG05-ES." AIME 89. Springer, Berlin, Heidelberg, 1989. 93-97. 12. 
Brusa, Graciela, Ma Laura Caliusco, and Omar Chiotti. "A process for building a domain ontology: an experience in developing a government budgetary ontology." Proceedings of the second Australasian workshop on Advances in ontologies-Volume 72. Australian Computer Society, Inc., 2006. ------sparql 13. Noy, Natalya F., and Deborah L. McGuinness. "Ontology development 101: A guide to creating your first ontology." (2001). 14. Ganapathy, Gopinath, and S. Sagayaraj. "To generate the ontology from java source code OWL creation." (2011). 15. Tudorache, Tania, et al. "Supporting collaborative ontology development in Protégé." International Semantic Web Conference. Springer, Berlin, Heidelberg, 2008. 16. Fensel, Dieter, et al. "OIL: An ontology infrastructure for the semantic web." IEEE intelligent systems 16.2 (2001): 38-45. 17. Berners-Lee, Tim, James Hendler, and Ora Lassila. "The semantic web." Scientific american 284.5 (2001): 28-37. 18. Biswas, Dipanwita, et al. "Disease diagnosis system." International Journal of Computer Science & Informatics 1.2 (2011): 48-51. 19. Dehariya, Ashish, et al. "An effective approach for medical diagnosis preceded by artificial neural network ensemble." 2011 3rd International Conference on Electronics Computer Technology. Vol. 1. IEEE, 2011. 20. Wang, Hsien-Tseng, and Abdullah Uz Tansel. "Composite ontology- based medical diagnosis decision support system framework." Communications of the IIMA 13.2 (2013): 4 21. https://radiopaedia.org/articles/lung-rads 22. https://www.acr.org/Clinical-Resources/Reporting-and-Data- Systems/Lung-Rads 23. https://www.cancerresearchuk.org/about-cancer/lung-cancer/stages- types-grades/tnm-staging 24. The 8th lung cancer TNM classification and clinical staging system: review of the changes and clinical implications 25. https://radiologyassistant.nl/chest/lung-cancer-tnm-8th-edition 26. Liu, C-H., et al. "Ontology-based context representation and reasoning using owl and swrl." 2010 8th Annual Communication Networks and Services Research Conference. IEEE, 2010. 27. Sirisha, J., and M. Babu Reddy. "A Prototype Model for Developing Cancer Ontology in Medical Domain." (2017). 28. Knublauch, Holger, et al. "The Protégé OWL plugin: An open development environment for semantic web applications." International Semantic Web Conference. Vol. 3298. 2004. 29. Fernández-López, Mariano, Asunción Gómez-Pérez, and Natalia Juristo. "Methontology: from ontological art towards ontological engineering." (1997). 30. Harrison's Principles of Internal Medicine, 20e by J. Larry Jameson, Anthony S. Fauci, Dennis L. Kasper, Stephen L. Hauser, Dan L. Longo, Joseph Loscalzo ABOUT AUTHORS J.Sirisha, received her B.Tech degree in Computer science and Information Technology from Jawaharlal Nehru Technological University,Hyderabad and M.Tech degree in Computer Science and Engineering from Acharya Nagarjuna University,Guntur. She is working as Asst Professor in the Department of Information Technology, Prasad V. Potluri Siddhartha Institute of Technology, Vijayawada, Andhra Pradesh, India. She has 15 years of teaching experience and currently pursuing Ph.D from Krishna University, Machilipatnam.She hasPublished 14 research papers in various reputed Journals Her areas of interests include Data Mining, Semantic web Mining, and Social Networking. M.BabuReddy, received his Ph.D in Computer Science from Acharya Nagarjuna University,Guntur and M.Tech in Computer Science from GGIST,Patiala . 
He is presently working as Head(i/c) in the Department of Computer Science, Krishna University, Machilipatnam, Andhra Pradesh, India. He has 20 years of teaching experience .He has Published 62 research papers in various reputed Journals His areas of interests include Software Engineering, Artificial Intelligence, Machine Learning, Data Mining, Semantic web Mining work_2wnvavslprfthlo5orvk5l53du ---- Elsevier Editorial System(tm) for Expert Systems With Applications Manuscript Draft Manuscript Number: Title: An Ontology for Managing Network Services Quality Article Type: Full Length Article Keywords: Ontology; Multiservice IP Networks; Network Service Management; SLA/SLS; Semantics Corresponding Author: Prof. Paulo Carvalho, Ph.D. Corresponding Author's Institution: University of Minho First Author: Carlos Rodrigues, MSc Order of Authors: Carlos Rodrigues, MSc; Solange Rito Lima, Ph.D.; Luis Alvarez Sabucedo, Ph.D.; Paulo Carvalho, Ph.D. Highlights "An Ontology for Managing Network Services Quality" (Full length article) > Managing Internet services and resources urges for a formal and systematic approach. > Semantic description of network services fosters interoperability and self- management. > We propose an ontological model for multiservice IP networks. > Dynamic negotiation and auditing of network service contracts are improved. > An API is provided to allow service management platforms to access ontological contents. Highlights An Ontology for Managing Network Services Quality Carlos Rodrigues Department of Informatics, University of Minho Solange Rito Lima Department of Informatics, University of Minho Luis M. Álvarez Sabucedo Telematic Engineering Department, University of Vigo Paulo Carvalho Department of Informatics, University of Minho Preprint submitted to Expert Systems with Applications April 7, 2011 *Manuscript Click here to view linked References http://ees.elsevier.com/eswa/viewRCResults.aspx?pdf=1&docID=12406&rev=0&fileID=118483&msid={83070562-258D-42A8-8D62-36F27A54CEA1} An Ontology for Managing Network Services Quality Carlos Rodrigues Department of Informatics, University of Minho Solange Rito Lima Department of Informatics, University of Minho Luis M. Álvarez Sabucedo Telematic Engineering Department, University of Vigo Paulo Carvalho Department of Informatics, University of Minho Abstract The evolution of IP networks to a service-oriented paradigm poses new challenges to service providers regarding the management and auditing of network services. The road toward ubiquity, heterogeneity and virtualization of network services and resources urges for a formal and systematic approach to network management tasks. In this context, the semantic characterization and modeling of services provided to users assumes an essential role in fostering autonomic service management, service negotiation and configuration. The semantic and formal description of services and resources is also relevant to assist paradigms such as cloud computing, where a large diversity of resources have to be described and managed in a highly dynamic way. 
This paper is centered on the definition of an ontology for multiservice IP networks which intends to address multiple service management goals, namely: (i) to foster client and service provider interoperability; (ii) to man- age network service contracts, facilitating the dynamic negotiation between clients and ISPs; (iii) to access and query SLA/SLSs data on a individual or aggregated basis to assist service provisioning in the network; and (iv) to sus- tain service monitoring and auditing. In order to take full advantage of the proposed semantic model, a service model API is provided to allow service Preprint submitted to Expert Systems with Applications April 7, 2011 management platforms to access the ontological contents. This ontological development also takes advantage of SWRL to discover new knowledge, en- riching the possibilities of systems described using this support. Keywords: Ontology, Multiservice IP Networks, Network Service Management, SLA/SLS, Semantics 1. Introduction The evolution of the Internet as a convergent communication infrastruc- ture supporting a wide variety of applications and services poses new chal- lenges and needs to network management, which has to be more focused on managing services instead of network equipment. This approach requires the capability of viewing the network as a large distributed system, offering an encompassing set of services to users. Commonly, the type of service, its Quality of Service (QoS) requirements and other technical and administrative issues are settled between customers and Internet Service Providers (ISPs) through the establishment of Service Level Agreements (SLA). The technological component of this agreement is defined through Service Level Specifications (SLS). SLSs provide a valuable guidance to service deployment on network infrastructures and monitoring of contracts’ compliance. Attending to the ever growing number of home and business customers, contracted services and network heterogeneity, the implementation and management of network services are very demanding tasks for ISPs. Besides the inherent complexity, this process may lead to inefficient policy implementation and poor resource management. In fact, under the current variety of services offered, e.g. IP telephony, 3-play or 4-play solutions, the interaction between service providers and end customers is rigid and rather limitative regarding service negotiation and auditing tasks. For instance, from a user point-of-view, the possibility of a short-term upgrade on his access bandwidth to the Internet or a tight quality control of the subscribed service would be of undeniable relevance. From a service provider perspective, providing this sort of facilities, would clearly improve the level of service being offered, increasing competitiveness and resource management efficiency. These aspects are impelling ISPs to pursue autonomic solutions for service negotiation, configuration and management. Although several proposals exist in the literature toward achieving dy- namic service negotiation and management (D’Arienzo et al., 2004; Sarangan 3 and Chen, 2006; Cheng et al., 2008; Zaheer et al., 2010), the lack of a strong formal ground in addressing these tasks is evident and overcoming it is essen- tial (Atkinson and Floyd, 2004). A formal specification of network services management semantics is required as the building blocks to create reasoning mechanisms to allow developing self-managed ISPs. 
By using a knowledge based formal framework and an inference engine capable of reasoning over concepts, relations and changes of state, it is possible to create a more flexible and robust ground for specifying and implementing autonomic and adaptive management tasks. As a contribution in this context, this work proposes an ontology speci- fication in the domain of multiservice networks, which formally specifies the contractual and technical contents of SLAs, the network service management processes and their orchestration, promoting service autonomic management and configuration. This model provides support for a Service Management Platform that facilitates client and service provider interoperability, service contracts management including service data querying by the provider and, at some levels, by the client. This is enabled through a developed Service- Model API, which allows the applicational use of the proposed ontology. The multiservice network semantic model is developed in Web Ontology Language (OWL), assisted by the Protégé-OWL tool. The use of Seman- tic Web technologies enhances service management modeling expansiveness and reusability. This paper is structured as follows: research work on ontologies related to service definition and QoS is debated in Section 2; the developed model and each of its modules are presented in Section 3; the way semantics are applied based on the developed API is discussed in Section 4; examples of practical usage of the proposed model are provided in Section 5; the conclusions and future work are included in Section 6. 2. Related work Ontologies are being commonly used to bring semantics to the World- Wide Web (WWW). The WWW Consortium (W3C) developed the Resource Description Framework (RDF) (Brickley and Guha, 1999), a language for encoding knowledge on Web pages to make it understandable to electronic agents searching for information. The Defense Advanced Research Projects Agency (DARPA), in conjunction with the W3C, developed DARPA Agent Markup Language (DAML) by extending RDF with more expressive con- 4 structs aimed at facilitating agent interaction on the Web (Hendler and McGuinness, 2000). More recently, the W3C Web Ontology Working Group developed OWL (Web Ontology Language) (Bechhofer et al., 2004) based on description logic, maintaining as much compatibility as possible with the existing languages, including RDF and DAML. In the context of this work, several research studies focusing on ontologies for network services support and QoS are found within the research commu- nity. QoSOnt (Dobson et al., 2005) is an OWL ontology that centers on com- parative QoS metrics and requirements definition. For the purpose of us- ability and extensibility, the ontology is divided into different layers: the base layer, the domain usage layer, the attribute layer and the units layer. This structure allows replacing layers according to user needs. Although this ontology supplies the correct semantics for matchmaking, this was never demonstrated due to datatype limitations in OWL 1.0. To overcome this problem, a pure XML based solution was used, losing all of the virtues of OWL (Dobson and Sanchez-Macian, 2006). DAML-QoS (Zhou et al., 2004) is a QoS metrics ontology for Web Services (WS) developed in DAML+ OIL, with the aim of integrating the DAML-S framework (which evolved to OWL-S). The ontology is divided in three lay- ers: QoSProfile Layer, QoS Property Definition Layer and QoS Metrics Layer. 
The applicational scenario is defined in the QoSProfile, where customer in- quiring, QoS advertisement and templates’ definition can take place. The QoS properties domain rules are defined in the QoS Property Layer. The range of domain properties classes are defined in the QoS Metrics Layer. In (Zhou et al., 2005) a new Service Level Objective (SLO) concept, met- rics’ monitoring and statistical calculation semantics are presented. Through comparing SLOs, it is possible to infer that the initial WS performance ob- jectives are being met. For matchmaking bound restrictions, it is used the cardinality constraint of the ontology. The use of cardinality to express bounds upon QoS metrics is a misuse of ontology construction (Dobson and Sanchez-Macian, 2006). The second problem pointed to this model is the inexistent relation between the metrics definition and what they measure (Dobson and Sanchez-Macian, 2006) (Prudencio et al., 2009). MOQ (Kim et al., 2007) is another proposal of a QoS semantics model for WS, but it is not exactly an ontology. It only specifies axioms and does not present a taxonomy structure or a dictionary of concepts. This proposal covers in depth the concepts of composite requirements and service trace- 5 ability. It is divided into requirements ontology, measurement ontology and traceability ontology. The use of axioms allows requirement matching, re- quirement complexity identification, requirements compliance (through the establishment of conformance points) and service activities traceability. MonONTO (Moraes et al., 2008) ontology aims at creating a knowledge base to support a client recommendation system. The ontology serves as a support to a decision recommendation tool by providing high-level infor- mation to the user about the compliance of the network facing the service level demands. This process is primarily accomplished through the match- making of NetworkCharacteristics against ServiceCharacteristics individuals. These individuals are essentially concepts of QoS metrics. Some of the Net- workCharacteristics individuals relates to MeasurementTool individuals for monitoring tools conceptualization. This ontology was designed using OWL and SWRL (Semantic Web Rule Language). In (Alípio et al., 2006), it is proposed an ontology which aims at the au- tomation of network services management and mapping of services’ require- ments into the network. The ontology is viewed from three perspectives: the network service classification, the service level specification, the deploy- ment of network services. The network service categorization and the service level specification concepts follow (Babiarz et al., 2006) and Tequila (Goderis et al., 2003) guidelines. This ontology was developed in Flora-2 (based on F-Logic and Transaction Logic frameworks). Although being a more power- ful language, F-Logic lacks the interoperability and reusability of Semantic Web languages such as OWL. A group of generic ontologies to provide a framework for building SLAs is presented in (Green, 2006). The Unit Ontology contains all the comparable elements on SLA, with the intention of supporting the creation of any type of measurable unit. It also allows the definition of unit supported comparators and the creation of comparison operations. The other examples of avail- able ontologies are: the Temporal Ontology for temporal occurrences such as events and intervals; The Network Units Ontology for units related to telecommunications networks; and the SLA Ontology for basic SLA specifi- cation. 
Therefore, rather than a QoS ontology, it is proposed a set of reusable ontologies for providing support for other QoS semantic model implementa- tions. In (Royer et al., 2008), it is proposed an SLA ontology covering essentially Authentication, Authoring and Accounting (AAA) issues. It is applied a Profile-based solution, where the authorization for service use is dictated by 6 the user profile. A set of user profiles of a Customer entity are mapped into a set of SLAs. A user profile can be constrained to a defined SLS. For scalability reasons, users with the same user profile are settled into groups. There is also the notion of differentiated SLS and quantitative or qualitative type of metrics. Expressed through OWL, this ontology deepens the concepts of QoS Control Admission, which are not addressed by most of other QoS ontologies. The OWL developed ontology NetQoSOnt (Prudencio et al., 2009) in- tends mainly to be the support of a reasoning tool for service requirements matchmaking. It promotes the definition of SLSs containing quality param- eters that belong to the following levels: the Quality of Experience, the Quality in the Application Level, the Quality in the Network Level and the Quality in the Link Level. NetQoSOnt presents the concept of Layer and a separate module is created for each Layer defined. The layered structure of NetQoSOnt emulates the TCP/IP stack. A Base Module provides the skele- ton for layer creation. For QoS specification units, it is used the Measurement Units Ontology (MUO), a units specialized ontology. In the proposals discussed above, the lack of an unified and encompass- ing approach for semantic modeling of services and corresponding contracts in a multiservice environment is clear. In fact, most of the proposals are more focused on specification of network services metrics than on integrated service management. As mentioned, the focus is mainly on aspects such as: (i) the specification and characteristics of metrics; (ii) the process of met- ric compression and matchmaking; and (iii) description of services through metrics. In the present work, a holistic model for modeling multiservice networks is provided paying special attention to the characterization and auditing of services quality. This ontology focuses on service contracts to assist network services’ implementation by specifying how the defined contract elements are deployed in the network infrastructure, a feature not considered in the re- viewed works. Thus, the present ontology model provides a service contract description involving not only metrics, but other relevant entities to service management and network deployment, providing closer relations between classes of service and service contracts, and between service contracts and network infrastructure. Its modular structure leaves room to model expan- sion and integration with other proposals. 7 3. Multiservice Network Ontology The proposed model is divided in two main modules: the service man- agement module and the network module. As illustrated in Figure 1, these modules are organized as a layered structure where the upper layer has a dependency relation with the lower layer. This structure mimics real life where the management component is, indeed, above the physical network. This formal representation of a network is expressed in formal terms using the support of OWL, following the principles from Methontology (Fernández- López, M. et al., 1997). [Figure 1 about here.] The network module, as stated above, acts as the base layer. 
It includes concepts of network node, network interface and network equipment config- uration elements related to the implementation of contracted services in the network. The management module covers the domain network service man- agement related to service contracts, including service monitoring rules. This module uses several elements of the network module. Services are categorized by relating them to a type of SLS (Morand et al., 2005; Diaconescu et al., 2003; Goderis et al., 2003). According to recommendations from (Babiarz et al., 2006), ITU Y.1541 and 3GPP standards, current service types include: real-time services, multimedia services, data services, and default traffic ser- vice. Another important component of the proposed service model regards to multiservice monitoring (see Figure 1). This implies the definition of the main monitoring issues to include in the multiservice ontology to assist the auditing of Internet services both from an ISP and customer perspective. To service providers, it will also allow a tight control of services, network resources and related configuration procedures. On top of the Multiservice Ontology, a complete ServiceModel API offers to a Service Management Platform the access to the ontological contents. Without detailing the construction of the ontology at this point, it is relevant to highlight the identification of competence questions. These are the first and the last step in this methodology and fulfill the need to establish the requirements and the outcomes of the ontology itself, i.e., which questions the ontology will be able to answer. In the present case, the definition of an ontology for multiservice IP networks intends to address multiple service- oriented goals. Possible competence questions include: 8 (i) from a customer perspective: Which type of service packs are available for subscription at present? Which is the available bandwidth for a particular service from a specific access point? Is my contracted service being delivery within the negotiated QoS? (ii) from an ISP perspective: At an aggregate level, which is the allo- cated bandwidth for a particular service type? Which are the negotiated parameters per SLS? Which are the configuration parameters on each inter- face of edge network node and the available bandwidth per interface? Which services are supported between specific ingress and egress interfaces? Are the QoE/QoS requirements of a particular service being accomplished? On which points of the network are occurring QoS violations? In the description of modules provided in the sections below, a top-down approach will be followed to allow a broad view of the multiservice ontology. 3.1. Management Module The management module is where service contracts or SLAs are defined and managed. The first concept is the Client which identifies the customer part of the contract and stores all client information. A client is related to at least one SLA which represents a service contract. An SLA can have more than one SLS. The SLS structure, illustrated in Figure 2, follows the recommendations in (Morand et al., 2005; Diaconescu et al., 2003; Goderis et al., 2003), and is briefly described below. [Figure 2 about here.] • SLS Identification: This field identifies the SLS for management pur- poses, being used by both provider and customer. It is composed of a unique SLS id parameter and a Service id parameter, allowing to identify multiple SLSs within the same service. 
• Scope: The scope specifies the domain boundaries over which the ser- vice will be provided and managed, and where policies specified in a service contract are applied. Normally, SLSs are associated with uni- directional flows between at least one network entry point and at least one exit point. To cover bidirectionality, more than one SLS is associ- ated with a service. The entry points and the exit points are expressed through ingress and egress interfaces, respectively (see Section 3.2). At least two Interfaces (ingress and egress) instances must be specified. 9 The interface identification must be unique and is not restricted to the IP address (the identification can be defined at other protocol layer). • Traffic Classifier: The Traffic Classifier specifies how the nego- tiated service flows are identified for differentiated service treatment. Following Diffserv terminology, multifield (MF) classification and be- havior aggregate (BA) classification are supported (see Section 3.2). Usually, BA classification takes place over previously marked traffic, e.g. in network core nodes or, in the case of SLSs, between ISPs. Two traffic classifiers can be specified, an ingress traffic classifier and op- tionally an egress one. The ingress/egress classifier is then applied to each ingress/egress interface within the scope of the SLS. • Traffic Conditioner: This field specifies the policies and mechanisms applied to traffic flows in order to guarantee traffic conformance with the traffic profiles previously specified. Traffic conditioning occurs after traffic classification, so there is always a relation between the traffic classifier and the traffic conditioner specified within a SLS. An unlimited number of TrafficConditioner instances can be spec- ified. As in the traffic classifier property, the conditioners are divided into ingress and egress depending on their role. The ingress/egress conditioner is articulated with the ingress/egress classifier on each in- terface defined in the SLS scope as an ingress/egress QoS policy. This property is not mandatory. • Performance Guarantees: The Performance Guarantee fields specify the guarantees of service quality and performance provided by the ISP. Four quality metrics are considered: delay, jitter, bandwidth and packet loss, expressed through instances of the Bandwidth, Delay, Jitter and PacketLoss Metric subclasses. The definition of at least one instance of these Metric subclasses is mandatory, except on the Default Service type of SLS. Whenever there is a performance guarantee specification, a traffic conditioning action must also be specified. Delay and jitter are usually specified by their maximum allowed value or by a pair consisting of a maximum upper bound and a quantile. Packet loss (edge-to-edge) is represented by the ratio between the packet loss detected at the egress node and the number of packets sent at ingress node. Instead of quantitative, quality and performance parameters can also be specified in a qualitative manner. 10 • Reliability: The Reliability is usually specified by the mean downtime (MDT) and by the maximum allowed time to repair (TTR). The no compliance of the negotiated parameters may result in a penalty for the ISP. • Service Schedule: The Service Schedule defines the time period of service availability associated with an SLS. 
While a start date is al- ways specified, an end date is only specified in case of a reservation, ReservedServiceSchedule, in which the client requests the service during a specific period of time. In the default case, StandardService- Schedule, only the service start date is specified, i.e., the contract must be explicitly terminated by the client. • Monitoring: Monitoring refers to SLS’ performance parameters mon- itoring and reporting. For that purpose, a measurement period, a reporting date and a threshold notification are specified. Other pa- rameters such as the maximum outage time, total number of outage occurrences, reporting rules and reporting destination may be speci- fied. • Type of Service: The type of service is described by the Service class. This class allows the definition of services offered by the ISP to cus- tomers from a business-oriented perspective. Offered services are de- scribed through a set of qualitative metrics. The mapping from a qual- itative service description to a quantitative service specification is as- sured by the ISP. The Service class allows to relate the SLS with a specific instance of service offering. It also helps establishing SLS tem- plates on an application level. Services can be offered as a package (e.g. triple or quadruple play services) through the ServicePack class. 3.2. Network Module At present, an ISP is represented as a cloud network, where only edge (ingress and egress) nodes are visible. The abstract representation of domain internal nodes and inherent internal service configuration mechanisms are left for future work. Therefore, instead of representing configuration elements at per-hop level, the model is focused on a per-domain level. In this module (see Figure 3) there are three key elements: [Figure 3 about here.] 11 • Node: The Node class represents a network node (on the current model, corresponds to a domain border node). It is related to a set of Interface class instances. • Interface: The Interface class represents ingress and egress points of the ISP domain. Specifically it allows the mapping of external network interfaces or entry/exit points of ISP border nodes. The interface sup- ports a two-way traffic flow. It is possible to attach layer 2 and layer 3 addresses to the interface concept in order to relate it to a real net- work interface in the ISP domain. Each interface has a total bandwidth capacity and a reserved bandwidth capacity specified dynamically for ingress traffic and egress traffic. For QoS purposes, it is possible to specify a set of QoS policies. In this case, a QoS policy is a relation between a traffic classifier instance and a set of traffic conditioner in- stances applied to traffic classified by the former. A QoS policy can be an ingress policy or an egress policy. The Interface class, as illustrated in Figure 3, is defined by an iden- tifier, link and network layer addresses and total bandwidth capacity both downstream and upstream. It includes two counters for ingress and egress reserved bandwidth of all contracted services applied to this interface. Each instance can be related to a set of QoS policies applied on incoming and outgoing traffic. A boolean value is also defined for interface state indication. [Figure 4 about here.] • Traffic Classifier: The TrafficClassifier class, as represented in Fig- ure 4, has two subclasses: MF and BA. The BA class instances, applied to previously marked traffic, only have one field, a relation with a Mark class instance. 
The Mark class contemplates all forms of aggregated traffic marking (such as DSCP, IPv6 FlowLabel, MPLS Exp, etc.). The MF class allows the definition of traffic classification rules with multiple fields. There are no constraints on the number of allowed fields and these are divided into: link, network and transport header fields. This means that several types of fields can be used: IPv4 and IPv6 addresses, IPv6 Flow Label, ATM VPI/VCI and MPLS Labels. The fields used in the classification rule are combined through a logic operator rep- resented through the LogicOperator class instances AND and OR. For 12 a more complex classifying rule definition, other TrafficClassifier class instances can be stated as fields, working as nested classification rules. [Figure 5 about here.] • Traffic Conditioner: The traffic conditioner is designed to measure traf- fic flows against a predetermined traffic profile and, depending on the type of conditioner, take a predefined action based on that measure- ment. Traffic conditioning is important to ensure that traffic flows enter the ISP network in conformance with the established service profile. It is also an important policy for handling packets according to their con- formity level facing a certain traffic profile with the purpose of differen- tiating them in the network. According to their features, there are three TrafficConditioner subclasses (see Figure 5): the Marker, Policer and Shaper classes. The policer usually takes an immediate action on packets according to their compliance against predefined traffic pro- file. A Policer class instance must have a set of traffic measurement parameters and at least two levels of actions defined. Three different policers are defined in the current model. The TokenBucketPolicer represents a single rate policer with two level actions (for in profile and out of profile traffic). The SingleRateThreeColorMarker and TwoRateThreeColorMarker are examples of policers with three levels of conformance actions. The Shaper is the only conditioner subclass where no immediate action is taken on traffic flows. Instead, all pack- ets are buffered until traffic profile compliance is verified. The Marker class is a special type of conditioner which performs traffic marking and may be combined with other traffic conditioner elements. 3.3. Multiservice Monitoring Module As illustrated in Figure 1, the aim of the monitoring system to develop is twofold: (i) to monitor and control SLSs parameters in order to ensure that mea- sured values are in conformance with the negotiated service quality levels. This auditing purpose involves a prior characterization of each service re- quirements, monitoring parameters and corresponding metrics, and the defi- nition of appropriate measurement methodologies and tools to report multi- level QoS and performance metrics to users and system administrators; 13 (ii) to measure and control the usage of network resources. This includes the identification of network configuration aspects impacting on services per- formance, namely scheduling and queuing strategies on network nodes. In fact, monitoring network resources and triggering traffic control mechanisms accordingly will allow to maintain consistent quality levels for the supported services and the fulfillment of the negotiated SLSs. The monitoring process should provide measures reflecting the real status of services’ performance without introducing significant overhead or interfer- ing with network operation. 
Therefore, measurements have to be accurate, fast and carried out on a regular basis, while minimizing intrusion. For im- proving monitoring scalability, network services with identical QoS require- ments should be grouped and monitored as an aggregate, minimizing specific or customer dependent information. Another main concern of this task is to congregate users and ISPs perspec- tives regarding the description and control of services quality. This means that the perceived service quality for users (Quality of Experience - QoE), commonly expressed through subjective parameters, has to be identified and mapped into objective and quantifiable QoS parameters, able to be effec- tively controlled by network service providers. Therefore, the articulation of QoE and QoS, and the identification of appropriate measurement methodolo- gies for evaluating and controlling service quality levels in both perspectives (users and ISPs) is a main concern to cover in the present module. In this context, multiservice monitoring is expected to provide a clear identification and layering of all monitoring issues to include in the multi- service ontology to assist auditing and control of negotiated service levels through the proposed Service Management Platform. 3.4. The VoIP Service as Example As mentioned before, a service provider may describe each provided ser- vice through a set of qualitative metric values, which are then mapped to quantitative values to assist, for instance, configuration and service control. For example, a VoIP type of service can be described as: VoIP Service Bandwidth: Low_Bandwidth = at least 1 Mbps Delay: VeryLow_Delay = at most 100 ms Jitter: VeryLow_Jitter = at most 50 ms Packet Loss: VeryLow_Loss = at most 0.001 % of lost packets 14 This type of service description is used for SLS classification in accordance with the specified metrics. In other way, SLSs can be built based on the type of service description when required. An example of an SLS instance for the VoIP service is shown in Figure 6. [Figure 6 about here.] When the SLS instance is set, the TrafficClassifer and TrafficConditioner specified lead to QoS policy instances. A relation is then established between each QoS policy and network interfaces instances specified in the scope of the SLS (Figure 6). This policy information is useful for automating the deploy- ment of QoS mechanisms in the ISP network infrastructure. By establishing relations among all these entities, a change in one of them affects all other related entities. For example, a change in an SLS parameter is spread through all the corresponding SLS configurations in the network infrastructure. 4. Applying semantics This section discusses how the presented model is converted into an on- tological support. Thus, the characterization of the multiservice domain can be used in further software solutions ranging from web contents to complex software agents responsible for decision making. This ontology was devel- oped according to the basis proposed by Methontology (Fernández-López, M. et al., 1997). In this way, it is guaranteed its conformance with a set of methodological rules and the final product can be traced to its origin and reused in a simple and cost-effective manner. The proposed ontology provides the main concepts and properties re- quired to describe multiple services levels and corresponding quality in a network domain. 
For its implementation, according to the terminology pro- posed in Methontology, it was used Protégé to generate the OWL represen- tation. This representation uses not only classes and properties, but it also includes restrictions on the values of the previous ones. Therefore, it is en- sured the conformance of current contents and future pieces of information to the established parameters of the system. Besides per-class restrictions, a set of general rules are defined for es- tablishing new rule-based relations between individuals. These rules are ex- pressed using SWRL and they are applied to check information in order to 15 discover new possible instances and properties within the system. So far, there are defined rules for: • validation of interfaces capacity included on a contract scope; • compliance verification of monitored metrics in relation to service con- tract specifications; • changing interfaces network status; • qualitative classification of performance metrics; • classification of SLSs according to the type of service. For example, the following rule states that if all SLS performance metrics have qualitative values matching a definition of a type of service then the SLS specifies a service of that type. SLS(?sls)∧Service(?service) ∧ includesBandwidth(?sls, ?bandwidth) ∧ includesBandwidthQualV alue(?service, ?qualiBandwidth) ∧ includesDelay(?sls, ?delay) ∧ includesDelayQualV alue(?service, ?qualiDelay) ∧ includesJitter(?sls, ?jitter) ∧ includesJitterQualV alue(?service, ?qualiJitter) ∧ includesLossQualV alue(?service, ?qualiLoss) ∧ includesPacketLoss(?sls, ?loss) ∧ includesQualitativeV alue(?bandwidth, ?qualiBandwidth) ∧ includesQualitativeV alue(?delay, ?qualiDelay) ∧ includesQualitativeV alue(?jitter, ?qualiJitter) ∧ includesQualitativeV alue(?loss, ?qualiLoss) → definesSLS(?service, ?sls) Additional rules are defined for the above mentioned issues. Nevertheless, for other purposes, it is suggested to define rules at application level due to the complexity and limitations of SWRL (Zwaal et al., 2006) at using knowledge from different sources and involving advanced logical checks. On the top of the provided ontology, it is developed a complete software API. This API, referred as the ServiceModel API, is implemented following the diagram presented in Figure 7. 16 [Figure 7 about here.] The Jena Framework (McBride, 2002) plays a major role in the devel- oped software. It provides support for working with RDF and OWL based archives. The handling of OWL entities (classes, individuals and restrictions) is provided by the Jena Ontology API. Recall that the ontological content can be accessed from the local computer or from a remote server. The Pellet (Sirin et al., 2007) engine is the reasoner used due to its SWRL (Horrocks et al., 2004) support. Working on top of the Jena framework, the Jena beans API binds RDF resources (in this case, OWL classes) to Java beans simplifying the process of Java-to-RDF conversion. This feature enables users to work with individuals as Java objects. The persistence of the knowledge is guaranteed by the TDB (Owens et al., 2008) technology, which is clearly simpler and more efficient than the SDB solution (uses SQL databases for storing RDF datasets). However, the API integration of an SDB solution is not totally abandoned. 
The ServiceModel API intends to assist future projects in several goals: (i) to foster client and service provider interoperability; (ii) to manage network service contracts, facilitating the dynamic negotiation between clients and ISPs; (iii) to access and query SLA/SLS data on an individual or aggregated basis to assist service provisioning in the network; and (iv) to assist service monitoring and auditing. Therefore, this API, aimed at sustaining further developments, supports the following features for software developers in a straightforward manner:

• the insertion and removal of information in the Knowledge Base (creating/destroying individuals);
• the validation of the Knowledge Base information (classification and realization);
• the establishment of more complex rule-based relations (not possible through SWRL);
• Knowledge Base querying, implemented through SPARQL (Prud'hommeaux and Seaborne, 2008) and the ARQ Jena API;
• Knowledge Base persistence.

5. Practical Application

Once the semantic model for fully describing SLAs/SLSs is set, services can be provided on top of it. Semantics is not, in general, an end in itself; on the contrary, its use is motivated by further goals. In the case of this proposal, a framework is generated to boost interoperability and advanced data mining features. Bearing that in mind, services were derived as proof of concept. Firstly, RDFa support is suggested to introduce annotations in xhtml descriptions of SLAs and SLSs. Afterwards, the use of a semantic engine is suggested to recover information from a repository holding such descriptions.

RDFa (Birbeck and Adida, 2008) is a semantic technology that empowers users to include semantic annotations in XML (or xhtml) contents. These annotations are invisible to the human user but are easily recovered by software agents using GRDDL (Connolly, 2007). It is important to keep in mind that both technologies are official W3C recommendations. Taking the provided OWL model for describing the system as a basis, annotations can be included in xhtml describing SLAs and SLSs. For the sake of clarity, the following example is included:

[RDFa-annotated xhtml snippet about here; its rendered text reads: "The provided connection under the interface Service1 provides a total bandwidth of 100 Mbps."]

As shown in this xhtml snippet, a network interface and some of its properties are described. This definition of the capacities of the Network Module can be directly recovered using GRDDL. The use case expected for this functionality is related to the web pages of ISPs. Service providers, when offering their services, can include this information in their web pages. Users will be able to recover this information through software agents acting on their behalf, and include it in a data repository for further decisions.

Once these pieces of information are included in a semantic database, regardless of their origin, either from GRDDL extraction or from other sources, it is possible to obtain added-value services. Using SPARQL queries (Prud'hommeaux and Seaborne, 2008), for example, it is possible to locate SLAs/SLSs fulfilling specific properties the user is interested in. The only requirement is to identify the graph matching the desired properties and implement the corresponding SPARQL query. The authors successfully tested this feature by means of the API provided. It is actually rather simple to deploy a software tool that looks for, for instance, the cheapest SLA in the market or the one offering the fastest network access, among other features.

6. Conclusions and Future Work

This paper has presented an innovative approach to the development of a semantic model in the domain of multiservice networks. This model formally specifies concepts related to service and SLS definition, network service management, configuration and auditing, creating the reasoning mechanisms to ground the development of self-managed ISPs. Although conceptually aligned with the differentiated services model, the solution is generic and not tied to a specific QoS paradigm.

The usefulness of the present semantic service modeling has been pointed out for multiple applications in the context of multiservice management. In particular, aspects such as dynamic service negotiation between service providers and end customers, and the auditing of the Internet services being provided, may be strongly improved as a consequence of using the proposed ontology.

Possibilities and features of this ontology are also presented to software developers by means of a ServiceModel API. The functionality within this library can be used for the above-mentioned goals. Due to the modular schema of this software component, its inclusion in future projects constitutes a simple task that will provide useful support in further developments.

References

M. D'Arienzo, A. Pescapè, G. Ventre, Dynamic Service Management in Heterogeneous Networks, Journal of Network and System Management 12 (2).

V. Sarangan, J. Chen, Comparative Study of Protocols for Dynamic Service Negotiation in Next Generation Internet, IEEE Communications Magazine 44 (3) (2006) 151–156.

Y. Cheng, A. Leon-Garcia, I. Foster, Toward an Autonomic Service Management Framework: A Holistic Vision of SOA, AON, and Autonomic Computing, IEEE Communications Magazine 46 (5) (2008) 138–146, URL http://dx.doi.org/10.1109/MCOM.2008.4511662.

F.-E. Zaheer, J. Xiao, R. Boutaba, Multi-provider Service Negotiation and Contracting in Network Virtualization, in: IEEE NOMS'10, 471–478, 2010.

E. Atkinson, E. Floyd, IAB Concerns and Recommendations Regarding Internet Research and Evolution, RFC 3869, Internet Engineering Task Force, URL http://www.rfc-editor.org/rfc/rfc3869.txt, 2004.
D. Brickley, R. Guha, Resource Description Framework (RDF) Schema Specification, URL http://www.w3.org/TR/rdf-schema, W3C, 1999.

J. Hendler, D. McGuinness, DARPA Agent Markup Language, IEEE Intelligent Systems 15 (6).

S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. L. McGuinness, P. F. Patel-Schneider, L. A. Stein, OWL Web Ontology Language Reference, W3C, 2004.

G. Dobson, R. Lock, I. Sommerville, QoSOnt: a QoS Ontology for Service-Centric Systems, in: EUROMICRO '05: Proceedings of the 31st EUROMICRO Conference on Software Engineering and Advanced Applications, IEEE Computer Society, Washington, DC, USA, ISBN 0-7695-2431-1, 80–87, URL http://dx.doi.org/10.1109/EUROMICRO.2005.49, 2005.

G. Dobson, A. Sanchez-Macian, Towards Unified QoS/SLA Ontologies, in: SCW '06: Proceedings of the IEEE Services Computing Workshops, IEEE Computer Society, Washington, DC, USA, ISBN 0-7695-2681-0, 169–174, URL http://dx.doi.org/10.1109/SCW.2006.40, 2006.

C. Zhou, L.-T. Chia, B.-S. Lee, DAML-QoS Ontology for Web Services, in: IEEE International Conference on Web Services, IEEE Computer Society, Los Alamitos, CA, USA, ISBN 0-7695-2167-3, URL http://doi.ieeecomputersociety.org/10.1109/ICWS.2004.1314772, 2004.

C. Zhou, L.-T. Chia, B.-S. Lee, QoS Measurement Issues with DAML-QoS Ontology, in: IEEE International Conference on E-Business Engineering, IEEE Computer Society, Los Alamitos, CA, USA, ISBN 0-7695-2430-3, 395–403, URL http://doi.ieeecomputersociety.org/10.1109/ICEBE.2005.100, 2005.

A. C. Prudencio, R. Willrich, M. Diaz, S. Tazi, Quality of Service Specifications: A Semantic Approach, IEEE International Symposium on Network Computing and Applications (2009) 219–226, URL http://doi.ieeecomputersociety.org/10.1109/NCA.2009.36.

H. M. Kim, A. Sengupta, J. Evermann, MOQ: Web services ontologies for QoS and general quality evaluations, Int. Journal of Metadata, Semantics and Ontologies 2 (3) (2007) 195–200, ISSN 1744-2621, URL http://dx.doi.org/10.1504/IJMSO.2007.017612.

P. Moraes, L. Sampaio, J. Monteiro, M. Portnoi, MonONTO: A Domain Ontology for Network Monitoring and Recommendation for Advanced Internet Applications Users, in: Network Operations and Management Symposium Workshops, IEEE NOMS 2008, 116–123, URL http://dx.doi.org/10.1109/NOMSW.2007.21, 2008.

P. Alípio, J. Neves, P. Carvalho, An Ontology for Network Services, in: International Conference on Computational Science (3), 240–243, URL http://dx.doi.org/10.1007/11758532_33, 2006.

J. Babiarz, K. Chan, F. Baker, Configuration Guidelines for DiffServ Service Classes, RFC 4594 (Informational), URL http://www.ietf.org/rfc/rfc4594.txt, 2006.

D. Goderis, Y. T'joens, C. Jacquenet, G. Memenious, G. Pavlou, R. Egan, D. Griffin, P. Georgatsos, L. Georgiadis, P. V. Heuven, Service Level Specification Semantics, Parameters, and Negotiation Requirements, Internet-Draft, draft-tequila-sls-03.txt, 2003.

L. Green, Service level agreements: an ontological approach, in: ICEC '06: Proceedings of the 8th international conference on Electronic commerce, ACM, New York, NY, USA, ISBN 1-59593-392-1, 185–194, URL http://doi.acm.org/10.1145/1151454.1151490, 2006.

J. C. Royer, R. Willrich, M. Diaz, User Profile-Based Authorization Policies for Network QoS Services, in: IEEE International Symposium on Network Computing and Applications, IEEE Computer Society, Los Alamitos, CA, USA, ISBN 978-0-7695-3192-2, 68–75, URL http://doi.ieeecomputersociety.org/10.1109/NCA.2008.39, 2008.
Fernández-López, M., Gómez-Pérez, A., Juristo, N., METHONTOLOGY: From Ontological Art Towards Ontological Engineering, Symposium on Ontological Art Towards Ontological Engineering of AAAI (1997) 33–40.

P. Morand, M. Boucadair, R. E. P. Levis, H. Asgari, D. Griffin, J. Griem, J. Spencer, P. Trimintzios, M. Howarth, N. Wang, P. Flegkas, K. Ho, S. Georgoulas, G. Pavlou, P. Georgatsos, T. Damilatis, Mescal D1.3 - Final Specification of Protocols and Algorithms for Inter-domain SLS Management and Traffic Engineering for QoS-based IP Service Delivery and their Test Requirements, Mescal Project IST-2001-37961, 2005.

A. Diaconescu, S. Antonio, M. Esposito, S. Romano, M. Potts, Cadenus D2.3 - Resource Management in SLA Networks, Cadenus Project IST-1999-11017, 2003.

H. Zwaal, M. Hutschemaekers, M. Verheijen, Manipulating context information with SWRL, A-MUSE Deliverable D3.12, 2006.

B. McBride, Jena: a Semantic Web Toolkit, IEEE Internet Computing 6 (6) (2002) 55–59, ISSN 1089-7801, URL http://dx.doi.org/10.1109/MIC.2002.1067737.

E. Sirin, B. Parsia, B. Grau, A. Kalyanpur, Y. Katz, Pellet: A practical OWL-DL reasoner, Journal of Web Semantics 5 (2) (2007) 51–53, ISSN 1570-8268.

I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, M. Dean, SWRL: A Semantic Web Rule Language Combining OWL and RuleML, W3C Member Submission, W3C, URL http://www.w3.org/Submission/SWRL, 2004.

A. Owens, A. Seaborne, N. Gibbins, mc schraefel, Clustered TDB: A Clustered Triple Store for Jena, in: WWW 2009, URL http://eprints.ecs.soton.ac.uk/16974/, 2008.

E. Prud'hommeaux, A. Seaborne, SPARQL Query Language for RDF, W3C Recommendation, W3C, URL http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/, 2008.

M. Birbeck, B. Adida, RDFa Primer, W3C Note, W3C, URL http://www.w3.org/TR/2008/NOTE-xhtml-rdfa-primer-20081014/, 2008.

D. Connolly (Editor), Gleaning Resource Descriptions from Dialects of Languages (GRDDL), W3C Recommendation, W3C, URL http://www.w3.org/TR/grddl, 2007.

List of Figures

1 Service model diagram
2 SLS class diagram
3 Interface class diagram
4 Classifier class diagram
5 Conditioner class diagram
6 SLS example diagram
7 ServiceModel API structure diagram

Figure 1: Service model diagram
Figure 2: SLS class diagram

Figure 3: Interface class diagram

Figure 4: Classifier class diagram

Figure 5: Conditioner class diagram
Figure 6: SLS example diagram

Figure 7: ServiceModel API structure diagram (components: ServiceModel API, JenaBeans, Pellet, Jena Framework, TDB)

work_2xgnobbhpnbmhmgkqlirqfz66q ---- Expert Systems in Clinical Microbiology
T. Winstanley, P. Courvalin, Expert Systems in Clinical Microbiology, Clinical Microbiology Reviews 24 (2011) 515–556. DOI: 10.1128/CMR.00061-10, Corpus ID: 2239702.
SUMMARY: This review aims to discuss expert systems in general and how they may be used in medicine as a whole and clinical microbiology in particular (with the aid of interpretive reading). It considers rule-based systems, pattern-based systems, and data mining and introduces neural nets. A variety of noncommercial systems is described, and the central role played by the EUCAST is stressed. The need for expert rules in the environment of reset EUCAST breakpoints is also questioned. Commercial…
work_2xtf2gstifhyza6sahun3emrqi ---- Expert System for Computer-assisted Annotation of MS/MS Spectra*
Nadin Neuhauser‡¶, Annette Michalski‡¶, Jürgen Cox‡, and Matthias Mann‡§
An important step in mass spectrometry (MS)-based proteomics is the identification of peptides by their fragment spectra. Regardless of the identification score achieved, almost all tandem-MS (MS/MS) spectra contain remaining peaks that are not assigned by the search engine. These peaks may be explainable by human experts but the scale of modern proteomics experiments makes this impractical. In computer science, Expert Systems are a mature technology to implement a list of rules generated by interviews with practitioners. We here develop such an Expert System, making use of literature knowledge as well as a large body of high mass accuracy and pure fragmentation spectra. Interestingly, we find that even with high mass accuracy data, rule sets can quickly become too complex, leading to over-annotation. Therefore we establish a rigorous false discovery rate, calculated by random insertion of peaks from a large collection of other MS/MS spectra, and use it to develop an optimized knowledge base. This rule set correctly annotates almost all peaks of medium or high abundance. For high resolution HCD data, median intensity coverage of fragment peaks in MS/MS spectra increases from 58% by search engine annotation alone to 86%. The resulting annotation performance surpasses a human expert, especially on complex spectra such as those of larger phosphorylated peptides. Our system is also applicable to high resolution collision-induced dissociation data.
It is available both as a part of MaxQuant and via a webserver that only requires an MS/MS spectrum and the corresponding peptide sequence, and which outputs publication quality, annotated MS/MS spectra (www.biochem.mpg.de/mann/tools/). It provides expert knowledge to beginners in the field of MS-based proteomics and helps advanced users to focus on unusual and possibly novel types of fragment ions. Molecular & Cellular Proteomics 11: 10.1074/mcp.M112.020271, 1500–1509, 2012.

From the ‡Department of Proteomics and Signal Transduction, Max-Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany. Received May 5, 2012, and in revised form, July 19, 2012. Author's Choice - Final version full access. Published, MCP Papers in Press, August 10, 2012, DOI 10.1074/mcp.M112.020271. Technological Innovation and Resources. © 2012 by The American Society for Biochemistry and Molecular Biology, Inc. This paper is available on line at http://www.mcponline.org.

1 The abbreviations used are: PIF, Precursor Intensity Fraction; FDR, False Discovery Rate; MS/MS, Tandem mass spectrometry; HCD, Higher Energy Collisional Dissociation; PEP, Posterior Error Probability; PDF, Portable Document Format; IM, immonium ion; SC, side chain fragment ion; Th, Thomson.

In MS-based proteomics, peptides are matched to peptide sequences in databases using search engines (1–3). Statistical criteria are established for accepted versus rejected peptide spectra matches based on the search engine score, and usually a 99% certainty is required for reported peptides. The search engines typically only take sequence specific backbone fragmentation into account (i.e. a, b, and y ions) and some of their neutral losses. However, tandem mass spectra, especially of larger peptides, can be quite complex and contain a number of medium or even high abundance peptide fragments that are not annotated by the search engine result. This can result in uncertainty for the user, especially if only relatively few peaks are annotated, because it may reflect an incorrect identification. However, the most common cause of unlabeled peaks is that another peptide was present in the precursor selection window and was cofragmented. This has variously been termed "chimeric spectra" (4–6), or the problem of low precursor ion fraction (PIF)1 (7). Such spectra may still be identifiable with high confidence. The Andromeda search engine in MaxQuant, for instance, attempts to identify a second peptide in such cases (8, 9). However, even "pure" spectra (those with a high PIF) often still contain many unassigned peaks. These can be caused by different fragment types, such as internal ions, single or combined neutral losses as well as immonium and other ion types in the low mass region. A mass spectrometric expert can assign many or all of these peaks, based on expert knowledge of fragmentation and manual calculation of fragment masses, resulting in a higher degree of confidence for the identification. However, there are more and more practitioners of proteomics without in-depth training or experience in annotating MS/MS spectra, and such annotation would in any case be prohibitive for hundreds of thousands of spectra. Furthermore, even human experts may wrongly annotate a given peak, especially with low mass accuracy tandem mass spectra, or fail to consider every possibility that could have resulted in this fragment mass.

Given the desirability of annotating fragment peaks to the highest degree possible, we turned to "Expert Systems," a well-established technology in computer science. Expert Systems achieved prominence in the 1970s and 1980s and were meant to solve complex problems by reasoning about knowledge (10, 11). Interestingly, one of the first examples was developed by Nobel Prize winner Joshua Lederberg more than 40 years ago, and dealt with the interpretation of mass spectrometric data. The program's name was Heuristic DENDRAL (12), and it was capable of interpreting the mass spectra of aliphatic ethers and their fragments. The hypotheses produced by the program described molecular structures that are plausible explanations of the data. To infer these explanations from the data, the program incorporated a theory of chemical stability that provided limiting constraints as well as heuristic rules.

In general, the aim of an Expert System is to encode knowledge extracted from professionals in the field in question. This then powers a rule-based system that can be applied broadly and in an automated manner. A rule-based Expert System represents the information obtained from human specialists in the form of IF-THEN rules. These are used to perform operations on input data to reach appropriate conclusions. A generic Expert System is essentially a computer program that provides a framework for performing a large number of inferences in a predictable way, using forward or backward chains, backtracking, and other mechanisms (13). Therefore, in contrast to statistics based learning, the "expert program" does not know what it knows through the raw volume of facts in the computer's memory. Instead, like a human expert, it relies on a reasoning-like process of applying an empirically derived set of rules to the data.

Here we implemented an Expert System for the interpretation of high mass accuracy tandem mass spectrometry data of peptides. It was developed in an iterative manner together with human experts on peptide fragmentation, using the published literature on fragmentation pathways as well as large data sets of higher-energy collisional dissociation (HCD) (14) and collision-induced dissociation (CID) based peptide identifications. Our goal was to achieve an annotation performance similar to or better than experienced mass spectrometrists (15), thus making comprehensively annotated peptide spectra available in large scale proteomics.

EXPERIMENTAL PROCEDURES

The benchmark data set is from Michalski et al.2 Briefly, E. coli, yeast and HeLa proteomes were separated on 1D gel electrophoresis and in gel digested (16). Resulting peptides were analyzed by liquid chromatography (LC) MS/MS on a linear ion trap - Orbitrap instrument (LTQ Velos (17) or ELITE (18), Thermo Fisher Scientific). Peptides were fragmented by HCD (14) or by CID, but in either case fragments were transferred to the Orbitrap analyzer to obtain high resolution tandem mass spectra (7500 at m/z 400). We scanned tandem mass spectra already from m/z 80 to capture immonium ions as completely as possible. Data analysis was performed by MaxQuant using the Andromeda search engine (8, 9).
Maximum initial mass deviation for precursor peaks was 6 ppm and maximum deviation for fragment ions for both the search engine and for the Expert System was 20 ppm. MaxQuant preprocessed the spectra to be annotated by the Expert System in the same way as it does for the Andromeda search engine: Peaks were filtered to the 10 most abundant ones in a sliding 100 m/z window, de-isotoped and shifted to charge one where possible. From this data, sequence-spectra pairs were selected that had a certainty of identification of 99.99% PIF values (7) larger than 95% and that were sequence unique (more than 16,000 peptides). The Expert System was written in the programming language C#, using the Microsoft .NET framework version 3.5 and the Workflow. Activities library, which contains a rule engine to implement an Expert System (Microsoft Corporation, Redmond, WA). MaxQuant contains the Expert System as an integrated option in its Viewer—the component that allows visualization of raw and anno- tated MS data. MaxQuant can freely be downloaded from www. maxquant.org. It requires Microsoft .NET 3.5, which is either already installed with Microsoft Windows or can be installed as a free Win- dows update. In our group we have implemented the Expert System both on a Windows cluster and in a desktop version. Additionally, we provide an Expert System web server, which can be accessed at 2 Michalski, A., Neuhauser, N., Cox, J., and Mann, M., unpublished data. FIG. 1. Basic concept of the Expert System. A, An Expert System is constructed by interviewing an expert in the domain (here peptide fragmentation and the accumulated literature) and devising a set of rules with associated priority and dependence on each other. The knowledge base contains the rules whereas the rule engine is generic and applies the rules to the data. B, Data are automatically processed following the steps depicted. Expert System for Annotation of MS/MS Spectra Molecular & Cellular Proteomics 11.11 1501 www.biochem.mpg.de/mann/tools/. Although MaxQuant allows the Expert System annotation of arbitrary numbers of MS/MS spectra, the webserver is currently limited to the submission of one MS/MS spec- trum at a time. After upload of a list of peaks with m/z value and their intensities—together with the corresponding peptide sequence—the spectrum with all annotations is displayed. This can then be exported in different graphical formats. RESULTS AND DISCUSSION Construction of the Expert System—Human experts per- form a generic set of tasks when solving problems such as the interpretation of an MS/MS spectrum. These rules have to be codified in the Expert System, mainly in the form of a series of IF-THEN rules. Fig. 1 shows the major steps involved in build- ing and using the Expert System. It is important to acquire all relevant rules to interpret MS/MS spectra as comprehensively as possible. However, to avoid over-annotation leading to false positives (see below), the number of rules and their interactions should not become too large. This balance was struck by evaluating the performance of different set of rules on large data sets in conjunction with human experts. Rules were encoded in a table-like structure, where they could be activated, deactivated or modified. To create the knowledge base, the extent of interactions of the rules also had to be determined—for instance, which combination of neutral losses to allow. 
After iterative construction of the knowledge base, the rule engine then applied the encoded knowledge to MS/MS spectra and displayed the result to the user (Fig. 1A). The processing steps that are performed on the raw MS and MS/MS spectra are shown in Fig. 1B (see also EXPERIMENTAL PROCEDURES). Note that the workflow is entirely automated and that user interaction is possible but not required. Arbitrary numbers of annotated spectra of pep- tides of interest can be produced as interactive screen images or high resolution, printable PDF files. The Expert System is very fast, and 16,000 spectra can be annotated in less than four hours on a desktop system. The IF-THEN constraints of our Expert System can be di- vided into four major parts (Fig. 2). At first the Expert System calculates any specific backbone fragments (a, b, and y-ion series), the charged precursor ion, the immonium ions as well as side chain fragments in the low-mass region and places them into a queue. In the second part of the workflow every element in this queue is filtered with respect to the actual MS/MS spectrum. Even if there is a peak corresponding to a calculated item in the queue, it may still be filtered out (symbolized by missing annotations after the filter in Fig. 2). For instance, a b1 ion is only allowed in very restricted circumstances. In the third step, neutral losses and internal fragments for the filtered values are calculated and added to the queue. They are then subjected to the same filtering rules as in step 2. Step 3 is iterative, as several subsequent neutral losses may be allowed. In the fourth and last step each potential annotation is given a priority. If there is more than one possible annotation, the one with the highest priority is chosen (i.e. the one that trig- gered the rules with higher priority). However, in this case the Expert System provides a pop-up (or “tool-tip”) containing the other possibility when hovering the mouse over the peak. (This can still happen if the FDR is properly controlled and is then typically caused by two different chemical designations for the same ion; or by different ions with the same chemical composition, such as small internal fragments with different sequence but the same amino acids). Determining a False Discovery Rate for Peak Annotation— Use of a very high threshold for peptide identification (99.99%) ensured that virtually none of the peptides in our collection should be misidentified. However, when building FIG. 2. Work flow of the Expert System. ➀ From the database sequence of the peptide identified by the search engine, a list of possible fragment ions is created. ➁ Peaks from the measured spec- trum are compared with the possible fragments and preliminarily annotated if they pass the rules of the Expert System. ➂ Neutral losses and internal fragments are generated from the candidate, annotated peaks and exposed to the Expert System rules. ➃ Potential conflicts are resolved via the priority of the annotations and peaks are labeled. Note that possible internal fragment ‘CA’ is crossed out because the b2 ion has the higher priority. Expert System for Annotation of MS/MS Spectra 1502 Molecular & Cellular Proteomics 11.11 FIG. 3. Calculation of false discovery rate for peak annotations. A, The upper panels represent a large number of identified MS/MS spectra from which annotated peaks are drawn to form a large peak collection of possible fragment masses. 
From each identified spectrum in the data set, 10 random fragments are inserted and the number of annotations by the Expert System is counted. This process is repeated 500 times for each peptide. B, Median FDR as determined in A as a function of peptide length distinguished by the mass difference of fragment ion and theoretical mass. The FDR for peak annotation rises with peptide length and is strongly dependent on the mass difference. Box plot at the bottom shows that 50% of the peptides were between 12 and 18 amino acids long. The box plots on the right summarize the range of FDR values regardless of peptide length. C, Graph of the median FDR as a function of peptide length but separated by intensity classes of the false annotated fragment peaks. Most false positives come from the low abundant peaks (blue) rather than the medium (green) or high abundance fragment peaks (yellow). D, Same plot as above but differentiated by the fragment ion type of the false positives. Getting lower number of false positives from regular fragment annotations (blue), compared with internal fragment (green) and neutral loss annotations (yellow). Expert System for Annotation of MS/MS Spectra Molecular & Cellular Proteomics 11.11 1503 the Expert System, we noticed that it was still possible to over-interpret the MS/MS spectra. This was initially surprising to us because our large scale data set had good signal to noise and peaks was only candidates for annotation when their calculated mass was less than 20 ppm from the ob- served mass. The over-interpretation became apparent through conflicting annotations for the same peak, and was typically caused by a combination of rules, such as several neutral losses from major sequence specific backbone or internal ions. Be- cause conflicting or wrong annotations would undermine the entire rational for the Expert System, we devised a scheme to stringently control the false discovery rate for peak annotation. The false discovery rate (FDR) is meant to represent the percent probability that a fragment peak is annotated by FIG. 4. Example spectra before and after Expert System annotation. A, Based on the search engine result, 34% of the fragments by peak intensities and 24% by peak number are explained, whereas the Expert System almost completely annotates the spectrum (for further explanation see main text). Posterior Error Probability (PEP) a statistical expectation value for peptide identification in Andromeda. Apart from the large fraction of a-, b-, and y-ions (pale blue/dark blue/red) and ions with neutral losses (orange), one can find internal fragment ions (purple) and in the low mass region one immonium ion of Isoleucine (green) and a side chain loss from arginine (turquoise). B, Expert System annotation of a phosphorylated peptide. Apart from the internal ions, several phosphorylation-related fragment ions were found. The asterisk (*) denotes loss of H3O4P with a delta mass of 97.9768 from the phosphorylated fragment ion. Expert System for Annotation of MS/MS Spectra 1504 Molecular & Cellular Proteomics 11.11 chance because its mass fits one of the Expert System rules for the peptide sequence. To calculate a proper FDR, we therefore needed to provide a set of background peaks that would represent false positives when they are labeled by the Expert System. 
Producing realistic background peaks turned out to be far from trivial because they need to have possible masses that can in principle be generated from peptide se- quences and they need to be independent of the sequence of the peptide in question. The principle of our solution to this problem is shown in Fig. 3A. From the large data set under- lying this study, we collect the m/z values of all annotated peaks, except those coming from immonium or side chain ions. They were stored in a large peak collection of several million entries, together with the respective peptide se- quences and the relative intensity of the peak. For each spec- trum in which we wanted to determine the FDR, we then inserted a random set of 10 peaks from the collection, where after we checked if the sequence of the selected peaks was independent from the sequence of the current spectrum. If one of the inserted peaks overlapped with an existing peak, it was discarded. By definition these 10 peaks represent pos- sible peptide fragments and, because they are chosen ran- domly from millions of other peaks, they collectively represent a good approximation to a true background set. This would FIG. 4 —continued Expert System for Annotation of MS/MS Spectra Molecular & Cellular Proteomics 11.11 1505 not be the case for permutation of the sequence of the pre- cursor in question, for instance, because many of the frag- ment peaks in permutated sequences are identical. Whenever the Expert System annotated one of these peaks, it was counted as a false positive. To find the number of repeats necessary to obtain a stable FDR for this procedure, we chose a set of spectra and simulated a thousand times on each one. We found that the FDR was constant after 500 iterations. For the final FDR calculation, for each spectrum we added a different set of 10 random peaks from the collection and repeated this 500 times. This was then applied to each of the more than 16,000 pure (high PIF) spectra in the large scale data set. Beyond providing a solid FDR estimate for each rule set, this procedure also allowed us to identify the rules or rule combinations that were responsible for miss-annotation, i.e. the rules that falsely annotated the inserted peaks. These mostly turned out to be chains of subsequent neutral losses. In conjunction with detailed evaluation of the frequency of ion types, we iteratively designed an optimal rule set (supplemen- tal Table S1). For instance, neutral losses from a particular amino acid were allowed if they occurred in more than five percent of the fragment sequences that contained that amino acid. Likewise, of a set of about 42 possible neutral side chain losses, only six were sufficiently important to retain them in the Expert System. The Figs. 3B–3D show the results of the median FDR as a function of the peptide length based on this final rule set. The overall FDR—indicated in red—is the same in all plots and shows a clear growing trend in the number of false positives with the length of the peptides. For small peptides of 12 amino acids or less, the FDR was less than 2.1% and all peptides in the range investigated had a peak annotation FDR of less than 5%. With these settings, the annotations are correct in more than 97% of the cases for the vast majority of MS/MS spectra. The Expert System could of course be pruned to provide a lower FDR by narrowing the mass tolerance window; however, this would come at the ex- pense of discarding correct annotations. 
To explore the influ- ence of mass accuracy on potential false positive annotations, we repeated these calculations with required mass deviations no larger than 5 ppm or no larger than 10 ppm. As can be seen in Fig. 3B, this further reduced possible errors to less than 1%, or less than 0.3%, respectively. This highlights the value of high mass accuracy in unambiguously identifying fragment mass identity. Furthermore, peaks with a low signal to noise are more likely to be miss-annotated than more intense peaks. In Fig. 3C we sorted the peak intensity of the false positives into three intensity classes (Fig. 3C). The median FDR of peaks with high or medium abundance are only 0.1 or 0.5%. For low abundance peaks it is higher but still with a median of no more than 2.1%. Next we separately investigated the FDR as a function of peptide length for the different fragment ion types. As can be seen in Fig. 3C, regular ions and internal fragments contribute very little to overall false annotation (0.4 and 0.5%), whereas neutral loss ions are wrongly annotated in 1.8% of the case or even more. FIG. 5. Expert System performance on a large data set. Median sequence coverage by summed fragment ion intensity is plotted as a function of identification score. Statistics is based on more than 16,000 spectra. For every identification score, the Expert System adds a large proportion of explainable peaks. Box plot below the graph indicates that 50% of peptides in the set have an Andromeda score between 98 and 140. Box plots on the right indicate the range of values for the intensity coverage for standard and Expert System annotation. Expert System for Annotation of MS/MS Spectra 1506 Molecular & Cellular Proteomics 11.11 http://www.mcponline.org/cgi/content/full/M112.020271/DC1 http://www.mcponline.org/cgi/content/full/M112.020271/DC1 Performance of the Expert System—Fig. 4 shows an illus- trative example of an HCD fragmented peptide before and after Expert System evaluation. The peptide was identified with an Andromeda score of 136 and posterior error proba- bility (PEP) of 1.1E-21 (the corresponding Mascot score was 83). The spectrum features an uninterrupted b-ion series from b2 to b9 and an uninterrupted y-ion series from y1 to y12, together covering the entire peptide sequence. Despite this unambiguous identification, the peaks used by the search engine to identify the peptide only accounted for 35% of the summed intensity of the peaks in the fragmentation spectrum. Coverage by number of explained peaks was even lower at 24% (allowing up to 10 peaks per 100 Th in the measured spectrum see EXPERIMENTAL PROCEDURES). There is a series of high abundance, high m/z fragments as well as a large number of low abundance peaks in the low and me- dium m/z range that are unexplained by the search engine. After annotation by the Expert System, this situation changes entirely. The high m/z series is revealed to be a prominent loss of CH4SO from oxidized methionine. The low FIG. 6. Web interface for the Expert System. A, Text field to paste the spectrum in text format (m/z value; intensity in arbitrary units). B, Form to enter the peptide sequence, modifications and their positions. C, Detected backbone fragments and their neutral losses are indicated in the peptide logo. Scalable spectrum annotated by the Expert System. Note that neutral loss peaks are very small compared with the major backbone fragments. The spectrum can be downloaded with the desired resolution and in the desired graphical format. 
Expert System for Annotation of MS/MS Spectra Molecular & Cellular Proteomics 11.11 1507 mass ions are neutral losses, internal fragments and com- binations between them and they were unambiguously and correctly assigned. Altogether, the Expert System ac- counted for almost all prominent ions and explained a total of 88% of the ion current. Manual annotation of this spec- trum would have been possible but would have been very time consuming. Interpretation of phosphorylated peptides, especially large ones, is more difficult than that of unmodified peptides. Fur- thermore, accurate placement of the phosphorylation site can be challenging. We used literature knowledge (19, 20) and the results of a large-scale investigation into the fragmentation of phosphorylated peptides to derive suitable fragmentation rules for the Expert System. This led to an additional six rules, which were easily integrated, illustrating the extensibility of the Expert System. Fig. 4B depicts an example annotation of the relatively complex fragmentation spectra typical of phos- phorylated peptides. The large ion series from the low mass range to about mass 1000 is caused by an extensive and uninterrupted internal ion series starting from the proline in the second position of the peptide sequence. As these internal fragments contain several glutamines, they lead to additional water and ammonia losses. However, there are also newly annotated fragments resulting from neutral losses in addition to loss of the phosphorylation site. Moreover, the neutral loss of HPO3 is annotated. Large-scale Evaluation of the Performance of the Expert System—We used the population of 16,000 spectra with high PIF—identified with a false discovery rate of 0.01% by the search engine—and annotated them automatically using the Expert System. For each spectrum we calculated the intensity coverage obtained by the fragments used by the search en- gine and the fragments explained by the Expert System. Higher scoring fragmentation spectra would be expected to have a larger fraction of their ion current annotatable than lower scoring peptides. Fig. 5A shows a plot of the median of these values for all search engine scores. A total of 95% of these Andromeda scores are within a range of 96 to 138. Here the median intensity coverage by standard annotation varies from 55% at 96 to 64% at 138. The Expert System, in con- trast, annotated between 86 and 89% of the total ion current in the fragment spectra of the same peptides. This repre- sents an average increase of 28%. There was only a small percentage of peptides that were lower scoring than 96 and for these the increased annotation percentage of the Expert System was even larger (34%). Interestingly, even in very high scoring HCD fragment spectra there are still many peaks not directly annotated by the search engine. For these, the average increase of annotated ion current be- cause of the Expert System was still 23%. The rule set of the Expert System was derived from HCD data. However, HCD and CID appear to produce similar ion types, although with different abundances. We therefore tested if the derived rule set was also applicable to high resolution CID data. This was indeed the case, and a total of 85% of the ion current in high resolution CID spectra ex- plained by the Expert System, although in CID spectra a higher percentage (79%) of the peaks are already accounted for by standard ion types. 
Therefore we conclude that the Expert System can be used equally well for high resolution HCD and CID data although the benefits for CID are not as large as they are for HCD. Webserver for Expert System Annotation of Spectra—The Expert System is now part of the Viewer component of Max- Quant, which is freely available at www.maxquant.org. In this environment, the Expert System can annotate arbitrarily large data sets of identified peptides and visualize and export them in different graphical formats such as PDF. Additionally, we established a webserver to make the Expert System available to any proteomics scientist, regardless of the computational workflow that he or she is using. The webserver is located at http://www.biochem.mpg.de/mann/tools/and its graphical in- terface is shown in Fig. 6. The user needs to supply a mass spectrum in the form of an m/z and peak intensity list as well as the sequence of the identified peptide (Figs. 6A, 6B). Com- mon modifications and their position in the sequence can also be specified. The webserver then provides an annotation of the spectrum within the stated mass tolerance as shown in Fig. 6C. The graph is scalable to enable detailed study of complex fragmentation spectra. Mass deviations in ppm (cal- culated mass – measured mass) can also be depicted. This annotated spectrum can be downloaded in a number of graphical formats for use in publications. CONCLUSION AND OUTLOOK Here we have made use of Expert Systems—a well-known technology in computer science—to automatically but accu- rately interpret the fragmentation spectra of identified pep- tides. We have shown that the Expert System performs very well on high mass accuracy data, annotating the large major- ity of medium to high abundance peaks. For HCD spectra it explains on average 28% more of the peak intensities than the search engine results alone. We derived a rigorous false pos- itive rate, ensuing that less than 5% of peaks can be miss- annotated—this rate is even lower for spectra with at least median scores and fragment ion intensities of at least mod- erate abundance. The rule set was derived by iterative inter- pretation of large HCD data set but we show that the Expert System is equally applicable to high resolution CID spectra. We envision different uses for the Expert System: For be- ginners in MS-based proteomics, it enables efficient training in the interpretation of MS/MS spectra without requiring much input from a specialist. For advanced users, it allows focusing on unusual and potentially novel types of fragments. One caveat is that the Expert System currently cannot explain fragment peaks that belong to cofragmented precursors; a very common occurrence that we deliberately avoided here by selecting only pure MS/MS spectra. This limitation can be Expert System for Annotation of MS/MS Spectra 1508 Molecular & Cellular Proteomics 11.11 addressed if both precursors are identified and communi- cated to the Expert System. Such a feature might be partic- ularly useful for instruments that allow deliberate multiplexing of precursors, which leads to complex MS/MS spectra (21). The Expert System has been in routine use in our laboratory for a number of months. During this time we have found that it provides helpful confirmation of the identification of the peptide and the identity of the previously unlabeled fragment ions. This is particularly welcome in the case of complicated spectra of important peptides, such as the ones regulated in the biological function in question. 
Compared with a human expert, the principal advantages of the Expert System are its speed, its ability to check for all supplied rules in a consistent manner as well as its rigorously controlled false positive rate. Obviously, the Expert System is limited to the knowledge supplied whereas an experienced mass spectrometrist can go beyond these rules and discover the origin of novel fragmen- tation mechanisms. As we have shown here, Expert Systems can readily be applied to problems in computational proteomics. Given their relative ease of implementation, they may become useful in other areas in MS-based proteomics, too. Acknowledgments—We thank Forest White for critical comments on this manuscript. * This work was supported by funding from the European Union 7th Framework project PROSPECTS (Proteomics Specification in Time and Space, grant HEALTH-F4-2008-201645). □S This article contains supplemental Table S1. ¶ These authors contributed equally. § To whom correspondence should be addressed: Department of Proteomics and Signal Transduction, Max-Planck Institute of Bio- chemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany. E-mail: mmann@biochem.mpg.de. REFERENCES 1. Steen, H., and Mann, M. (2004) The ABC’s (and XYZ’s) of peptide sequenc- ing. Nat. Rev. Mol. Cell Biol. 5, 699 –711 2. Nesvizhskii, A. I., Vitek, O., and Aebersold, R. (2007) Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Meth- ods 4, 787–797 3. Granholm, V., and Käll, L. (2011) Quality assessments of peptide-spectrum matches in shotgun proteomics. Proteomics 11, 1086 –1093 4. Houel, S., Abernathy, R., Renganathan, K., Meyer-Arendt, K., Ahn, N. G., and Old, W. M. (2010) Quantifying the impact of chimera MS/MS spectra on peptide identification in large-scale proteomics studies. J. Proteome Res. 9, 4152– 4160 5. Zhang, N., Li, X. J., Ye, M., Pan, S., Schwikowski, B., and Aebersold, R. (2005) ProbIDtree: an automated software program capable of identify- ing multiple peptides from a single collision-induced dissociation spec- trum collected by a tandem mass spectrometer. Proteomics 5, 4096 – 4106 6. Bern, M., Finney, G., Hoopmann, M. R., Merrihew, G., Toth, M. J., and MacCoss, M. J. (2010) Deconvolution of mixture spectra from ion-trap data-independent-acquisition tandem mass spectrometry. Anal. Chem. 82, 833– 841 7. Michalski, A., Cox, J., and Mann, M. (2011) More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC-MS/MS. J. Proteome Res. 10, 1785–1793 8. Cox, J., and Mann, M. (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 9. Cox, J., Neuhauser, N., Michalski, A., Scheltema, R. A., Olsen, J. V., and Mann, M. (2011) Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794 –1805 10. Giarratano, J. C., and Riley, G. (2005) Expert systems: principles and programming. PWS Pub. Co., Boston 11. Liao, S. H. (2005) Expert system methodologies and applications - a dec- ade review from 1995 to 2004. Expert Syst. Appl. 28, 93–103 12. Schroll, G., Duffield, A. M., Djerassi, C., Buchanan, B. G., Sutherland, G. L., Feigenbaum, E. A., and Lederberg, J. (1969) Applications of artificial intelligence for chemical inference. III. Aliphatic ethers diagnosed by their low-resolution mass spectra and nuclear magnetic resonance data. J. Am. Chem. Soc. 
91, 7440 –7445 13. Russell, S. J., Norvig, P., and Davis, E. (2010) Artificial intelligence: a modern approach. Prentice Hall, Upper Saddle River, NJ 14. Olsen, J. V., Macek, B., Lange, O., Makarov, A., Horning, S., and Mann, M. (2007) Higher-energy C-trap dissociation for peptide modification anal- ysis. Nat. Methods 4, 709 –712 15. Bin, M., and Johnson, R. (2012) De novo sequencing and homology search- ing. Mol. Cell. Proteomics 11, O111.014902 16. Shevchenko, A., Tomas, H., Havlis, J., Olsen, J. V., and Mann, M. (2006) In-gel digestion for mass spectrometric characterization of proteins and proteomes. Nat. Protoc. 1, 2856 –2860 17. Olsen, J. V., Schwartz, J. C., Griep-Raming, J., Nielsen, M. L., Damoc, E., Denisov, E., Lange, O., Remes, P., Taylor, D., Splendore, M., Wouters, E. R., Senko, M., Makarov, A., Mann, M., and Horning, S. (2009) A dual pressure linear ion trap Orbitrap instrument with very high sequencing speed. Mol. Cell. Proteomics 8, 2759 –2769 18. Michalski, A., Damoc, E., Lange, O., Denisov, E., Nolting, D., Muller, M., Viner, R., Schwartz, J., Remes, P., Belford, M., Dunyach, J. J., Cox, J., Horning, S., Mann, M., and Makarov, A. (2012) Ultra high resolution linear ion trap Orbitrap mass spectrometer (Orbitrap Elite) facilitates top down LC MS/MS and versatile peptide fragmentation modes. Mol. Cell. Pro- teomics 11, 10.1074/mcp.O111.013698 19. Boersema, P. J., Mohammed, S., and Heck, A. J. (2009) Phosphopeptide fragmentation and analysis by mass spectrometry. J. Mass Spectrom. 44, 861– 878 20. Kelstrup, C. D., Hekmat, O., Francavilla, C., and Olsen, J. V. (2011) Pin- pointing phosphorylation sites: Quantitative filtering and a novel site- specific x-ion fragment. J. Proteome Res. 10, 2937–2948 21. Michalski, A., Damoc, E., Hauschild, J. P., Lange, O., Wieghaus, A., Ma- karov, A., Nagaraj, N., Cox, J., Mann, M., and Horning, S. (2011) Mass spectrometry-based proteomics using Q Exactive, a high-performance benchtop quadrupole Orbitrap mass spectrometer. Mol. Cell. Proteomics 10, 10.1074/mcp.M111.011015 Expert System for Annotation of MS/MS Spectra Molecular & Cellular Proteomics 11.11 1509 http://www.mcponline.org/cgi/content/full/M112.020271/DC1 work_2y55lv27fnazbowsoz2jmkrwtu ---- Analysis of prepositions near and away from Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 433 Analysis of prepositions: near and away from Frames of reference Nuria Flor Fabregat al081314@uji.es Nuria Flor Fabregat. Analysis of prepositions: near and away from. Frames of reference 434 I. Abstract Traditional strategies and procedures to learn a foreign language include the study of rules of grammar and doing exercises such as filling the gaps, repetition of words, drills, memorization of irregular verbs and sentences which may express usual expressions of everyday life. Even if the array of exercises is adequate, polysemy in prepositions causes difficulties in choosing the proper preposition conveying the meaning required by different contexts. Two prepositions of the horizontal axis (near and away from) are taken into consideration in this paper. Approaching the problem from the theory of polysemy and understanding, the use of these prepositions is explored along the dimensions of function, topology – which is the study of physical space–, and force dynamics – introduced in studies such as Navarro (1998)–, as well as the notion of frame of reference (Levinson, 2004). 
Then, the different senses and uses of these prepositions of the horizontal axis are systematized, explained and examples are used to illustrate the difficulties in learning a language and the doubts which students may have in some situations. Key words: horizontal directions, landmark, trajector, frames of reference, visual perception. II. Introduction Traditional approaches to the teaching of prepositions in English are often reduced to a series of rules and typical examples, but students often have doubts in applying these rules correctly in different contexts. Handbooks of foreign language usually present grammar through irregularities and expose students to mechanical exercises, and the memorization of phrases and paradigms. According to Langacker (2008), while norms are required to learn grammar, the importance of wonder and curiosity cannot be disregarded. Meanings may be elaborated and constructed with complex expressions such as phrases, clauses and sentences. Communication reflects the basic experience of moving, perceiving and acting on the world. Many students of English as a Foreign Language (EFL) do not find the use of prepositions to be an easy topic. A point in case is the connection between the English prepositions such as in, on and at and the Spanish preposition en. Since the English and Spanish systems overlap in the contents of space relationships in terms of prepositions like these, Spanish EFL students often find problems in Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 435 learning their proper use in a context (Navarro, Campoy & Caballero 2001). In my view, as a student but especially recently as a teacher, I have noticed that students of a foreign language do not know which prepositions are the correct option when producing oral or written English (compositions, letters or emails). Although they know about the rules of grammar to follow when dealing with prepositions, they do not master the nature of prepositions, their main senses and contextual uses. Based on their own answers in an exercise on writing compositions, they use their intuition. Practice, therefore, seems the most effective way towards correction. This research focuses specifically on the horizontal directions, near and away from, and will explore their main uses and senses. A previous paper dealt with the kind of practice that students need to improve their understanding of the preposition on: should the learning of rules and drills prevail or should learning rely on a practical approach as a more natural way to learn space relationships by focusing on the nature and contextual uses of prepositions? Therefore, from the perspective of cognitive linguistics and as a continuation of the strand of research explored by other studies of the approach to prepositional polysemy developed by Navarro (1998), I will devise and test this learning approach as regards two prepositions pertaining to the visual point of view. More specifically, I will focus on two prepositions of horizontal directions, near and away from, including examples from dictionaries. The main objectives of my research together with a detailed description of the procedures followed to carry out this study can be found under the objectives subsection and the method section below. Before diving into these sections, and for the purpose of clarity, a brief description of the theoretical bases that support this approach will be provided. III. 
Theoretical background: frame of reference The notion of frames of reference may be understood as places, the location of objects which they occupy and as the places containing the objects. Aristotle mentioned an example, the conundrum in which the river is the frame of reference and a boat is moving with the river. In the cognitive paradigm, prepositions are considered as particles which relate two elements, these are called the trajector and the landmark, which are referred to as TR and LM in short (Langacker, 1987). The trajector is the most significant entity, it is usually situated before the preposition and it can be changed more easily from one place to another. However, the landmark is the Nuria Flor Fabregat. Analysis of prepositions: near and away from. Frames of reference 436 entity to which the trajector is related. It is situated after the preposition and it is the point of reference for the trajector. A variety of frames of reference can be used when reading some sentences. For instance: I lived near the school (Macmillan dictionary) or I will be away from home for two weeks (Merriam Webster dictionary). It is clear that the sense of distance depends on the landmark and the speaker. Sometimes there is an ambiguity because of the position of the object and the landmark. Nevertheless, a question must be asked by some speakers when position and spatial information are referring from a frame to another direction of movement. There are some distinctions of spatial frames of reference according to Levinson (2004): relative, absolute, regarding space as relations between objects, directions and relations between objects, deictic, intrinsic, which depends on the landmark or visual perception and as well as viewer-centred, object-centred and environment-centred which are centred on the speaker, the object or the environment respectively. When relative space is being considered, it refers to the egocentric coordinate system and when absolute space is being considered, it refers to non-egocentric systems (Kant (1991), as cited by Levinson (2004)). The distinction between egocentric and allocentric refers to the coordinate system centred within the subjective body frame and the second one centred within elsewhere, the geographic orientation which is not often specified. Then, it is related to body-centred and environment-centred frames of reference. As philosophers argued (Campbell 1993), the egocentric frame is joined with body-centred, a speaker and a body-schema in a spatial interaction. Another distinction is based on the theory of vision, in which the notions are viewer-centred and object-centred. As Levinson (2004) pointed out, this theory of vision is the process of the vision of a retinal image to the recognition of an object itself, that is from 2.5D sketch to a model of 3D as a structural description. This distinction is related to the linguistic distinction of deictic and intrinsic perspectives. Then, the deictic perspective would be the viewer- centred, while the intrinsic perspective would be the object-centred. Indeed, there are also notions of orientation, called orientation- bound and orientation-free. The first orientation refers to both absolute and relative frames, while the second one, which is orientation-free, refers to intrinsic frames. In fact, linguists have distinguished deictic and intrinsic frames of reference. There are three different interpretations of deictic and intrinsic in Table 2.I (Levinson 2004). 
The first one is speaker-centric and non-speaker centric (Levelt 1989). The second one is centred on any of the speech participants and non-centred (Levinson 1983). The third one refers to ternary and binary spatial relations (Levelt 1984, 1996). Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 437 Carlson-Radvansky and Irwin (1993: 224, quoted in Levinson 2004: 32) explained frames for spatial relationships among objects. In a viewer-centred frame, objects are represented in a retinocentric, head-centric or body-centric coordinate system based on the perceiver’s perspective of the world. In an object- centred frame, objects are coded with respect to their intrinsic axes. In an environment- centred frame, objects are represented with respect to salient features of the environment [...]. In order to talk about space, vertical and horizontal coordinate axes must be oriented to one of these reference frames so that linguistic spatial terms such as «above» and «to the left of» can be assigned. Thus, the notions of deictic, intrinsic and extrinsic are related to the corresponding linguistic interpretations of viewer-centred, object-centred and environment-centred frames of reference. So, in a spatial representation system, the frame of reference is accepted for a coordination of perception and language. For instance, it is clear that egocentric refers to relative or viewer-centred, 2.5D sketch refers to deictic frame, intrinsic corresponds to object-centred or 3D model, absolute corresponds to environment-centred. Regarding frames of reference in a linguistic view, the notion absolute as the frame of reference is used by many languages such as fixed bearings (West, North). Otherwise some European languages would use the notion of relative or viewer-centred such as left. Essentially, the frames of reference absolute, relative and intrinsic are the main ones in order to describe the horizontal spatial directions. For instance, according to Levinson (2004), some sentences are viewed as a deictic frame: 1. The ball is in front of me. The ball is in front of the tree. (Levinson, 2004). (As speaker-centric.) Then, regarding the prepositions near and away from: 2. I lived near the school (Macmillan dictionary) or I will be away from home for two weeks (Merriam Webster dictionary). (As speaker-centric.) Other sentences are viewed as an intrinsic frame: 3. The ball is in front of the chair. (Levinson, 2004). (As non- speaker-centric, the chair.) 4. The ball is in front of you. (Levinson, 2004). (As non- speaker-centric, the addressee.) 5. The ball is to the right of the lamp, from your point of view. (Levinson, 2004). (As non-speaker-centric, the addressee.) Then, regarding prepositions near and away from: Nuria Flor Fabregat. Analysis of prepositions: near and away from. Frames of reference 438 6. They are near to solving the puzzle. Keep away from the stove – it’s very hot (Macmillan). (As non-speaker-centric, the addressee.) Coordinate systems are frames of reference and, in language, frames can be distinguished according to origins such as the speaker, the addressee, etc. Table. 1 Classification of frames of reference INTRINSIC ABSOLUTE RELATIVE Origin ≠ ego Origin ≠ ego Origin = ego Object-centred Intrinsic-perspective 3D Model Environment-centred Viewer-centred Deictic-perspective 2.5 D sketch Allocentric Egocentric Orientation-free Orientation-bound 2.1. 
Linguistic categories of prepositions and the three dimensions of perception In cognitive linguistics, there is some research about space semantics which is likely to arrange an order, despite the difficulty of the concepts, by means of the application of a system with radial networks, in which each prepositional or adverbial sense is located on a node in accordance with its centrality within the network. More literal (spatial or physical) meanings tend to be associated with most focuses in more central senses, from which abstract meanings result from the help of metaphoric and metonymic processes (Navarro, 1998). For example: the bottle is on the table. The bottle is the TR which can be moved easily and its resting side falls across the LM, in this case, the table, which can work as a supporting point for the TR. Another example: A group of students were standing near the entrance. The students are the TR which can be moved easily and its resting side falls across the LM, in this case, the entrance, which can work as a building for the TR. Thus, prepositions, in cognitive linguistics, are considered as «linguistic categories» themselves surrounding a series of elements such as meanings or senses arranged in the structure of a radial category, with a prototype and peripheral members; these are, therefore, polysemic elements with several meanings. Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 439 Among the range of research on prepositions from a cognitive linguistic perspective, Navarro’s model (1998, 2006) provides a fully- fledged model, which is a model completely developed, for the semantic representation of prepositions whose senses are derived and arranged in terms of three semantic dimensions of perceptual space or aspects of construal. Therefore, three dimensions that can help in determining the spatial relationship established between the two entities (trajector and landmark) as mentioned above, in human conceptualization, specifically: 1. Topology: The visual perception of objects gives the speaker clues for establishing and conceptualizing topological relations like coincidence, contact, inclusion, proximity, and the like. 2. Force-dynamics: Human beings have experience of self-motion and object motion, which provides the clues for conceptualizing patterns of interaction in terms of dynamics. 3. Function: Human beings have experience of the effects of interaction, as well as the consequences of those effects for survival and well-being. (Navarro, 2006: 171). Regarding the specialisation of meaning, metonymic and metaphoric extensions can be shown in senses. These senses can be detailed by profiling certain aspects of the conceptual schema. In force-dynamic configuration senses, the interaction axis between trajector and landmark is seen as the central aspect of the relation. Though still present, other aspects like the topological relation of contiguity and the functional orientation remain in the background. The direction of the movement is also determined by the trajector’s functional front and by the landmark’s accessible zone as well. The context for the sense of search for contiguity, which is derived from the central force-dynamic senses, requires motion verbs and other dynamic expressions like come, fly, go, run, swing, make, jump, dive and so on. In topological configuration senses, the topological relation of contiguity is prevalent over force-dynamic or functional aspects. 
The sense of coincidence reflects a «coincidence» between the trajector and the landmark. Some words such as location, place or point are frequent in this sense. The trajector is something understood as if it is attached to a part of an entity and the landmark designates that part. Some words are used in this sense like beginning, top, bottom, middle, centre, head, edge and so on. In functional configuration senses, from the conceptual image schema the functional space can be described with the relation of functionality itself. In the background it is remained the force- Nuria Flor Fabregat. Analysis of prepositions: near and away from. Frames of reference 440 dynamic dimension and the topological relationship. Here places are designated for the landmarks where people usually do certain activities or they participate in certain events. Landmarks are thus often buildings or public spaces. Then, trajectors are people who control or use them in a certain way with relation to them. Trajectors may also be concepts which realise these activities that are carried out by those people. (Navarro, 1998) 2.2. Vandeloise’s spatial relations The object to be located is called the trajector (Langacker), and this author uses the corresponding reference point, the landmark. In this case, Vandeloise (1991) refers to the object to be located as the target and to the object of reference as the landmark. In well-formed utterances, the target always corresponds with the subject of the relation, and the landmark coincides with its object. The linguistic principle may be expressed as follows: • subject of spatial relation = target • object of spatial relation = landmark What are the characteristics of target and landmark? It should be pointed out that the position of the target constitutes new information, while the position of landmark states known information. Although the target is difficult or small to perceive, generally the landmark is large and easy to be distinguished. Also the target is mobile, while the landmark is immobile and stable. For instance, look at the falling star! Near the church tower or look at the church tower! Near the falling star. The falling star, momentary and brief, is drawing the speaker’s attention. It seems as it is the ideal target, while the church tower, immobile and immense, shows the characteristics of the usual landmark. In contrast the second sentence is uncommon. Another example: the bus stop is near the house or the house is near the bus stop. In this contrast it is seen the reason of the sense from an element that is not understood explicitly. The pedestrians’ path is between the house and the bus stop. Here the speaker is recognised with the landmark. The first sentence may be understood by imagining a principal path from the house to the bus stop, whereas the second sentence proposes the opposite course. Indeed both target and landmark are immobile in both examples, and the degree of being acceptable is justified by the speaker who may be moved along the path possibly. When describing the scene, the speaker may choose among several different strategies for situating the landmark. When a non- egocentric landmark is eliminated, there is one strategy between not introducing an egocentric landmark and expressing a landmark with the correspondent preposition. Thus, the landmark may be omitted when the speaker is sufficiently identified with the landmark (e.g., Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. 
DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 441 Saint-Cloud).The degree to which the speaker locates the landmark determines the situation in which this landmark is expressed or not. Then, the landmark refers to the fact of the virtual position of the speaker. Thus, the speaker could change a variety of points of view, and a conversation may continue as long as the addressee is able to understand the situation and the movement of the speaker in such a case. The unexpressed landmark often identifies the speaker’s position, but this position may be either real or virtual. Concerning the expressions near and away from, what the speaker has to carry out is to move to the place of the landmark itself. These expressions near and away from are described as directions of distance between target and landmark. In terms of distance, it is considered to a certain norm which depends on the movement to be approached of the target/landmark and to the landmark/target. The expression près de (near) is reduced to spatial and temporal domains, whereas the other expression proche (close to) de suggests proximity in every domain. Thus, the expression near would refer to smaller distance than away from. For instance: Jupiter is near Saturn. The electron is far from its nucleus. In the first sentence near is identified as a larger distance between target and landmark than far from in the second example. So, the principal characteristic of this distance as a norm relate to the accessibility of the target/landmark to the landmark/target. According to Vandeloise (1991), this norm of distance depends on the trajector and the relation with the landmark as well as the dimension of the landmark, the speed of the target, the speed of the landmark, the size of the speaker, the speed of the speaker, the size and the speed of the addressee, the facility of access and types of access too. Regarding the dimension of the landmark, infrequently the landmark is smaller than the target. The norm may increase in proportion to the landmark. The distance between the target Jupiter and the landmark may be greater in the following example, Jupiter is near the Milky Way than in the example of Jupiter is near Saturn. Indeed, the speed of the target is also relevant, since the norm may increase with the speed of the target when the target is moving towards the landmark. For instance: the tortoise is far from the lake or the antelope is far from the lake. In this case the lake is seen further from the antelope than from the tortoise. However, when the target is moving away from the landmark, the extent of the norm will diminish as speed decreases. In other words, when the speed of the target is gathering with the landmark in an easier or more difficult way, the normal distance increases or decreases, respectively. Nuria Flor Fabregat. Analysis of prepositions: near and away from. Frames of reference 442 Thus, the speed of the landmark is not seen as a common context when the landmark is mobile. However, there are some sentences in which the distance increases with the speed of the landmark. For instance: the man is far from the helicopter. Here the landmark is moving from the target. The fox is near the rabbit. Here it depends on the speed of the landmark. Another case is the size of the speaker, which could be near when is a father or far when is a child and it depends on the age. Moreover, the speed of the speaker varies when the speaker is driving a car or walking. 
Also the size and the speed of the addressee could vary when the addressee is a hiker and then it is near –for example a farm from a village–, or if this is lame. Finally, the facility of access may be different when the path is easy or difficult to access and the distance increases or diminishes. For example, the red house is far. Here the speaker is walking up and it is far. The yellow house is near. When the speaker is walking down and it is near in such case. There are types of access such as the visual access and the physical access. For example, if the speaker sees the mountain from the hotel window, it may be near, or if the speaker wants to hike there, it may be far. Another situation is when a sailboat may be far from the visual access of the eyes, but it may be near through binocular. There is a value of distance which is changed by the types of access of the speaker or target. The access to the meeting point and the factors of making it easier or not easier between the target and the landmark play an important role. The main factors are pointed out below. Table 2. Classification of accessibility Accessibility Relative speed of target and landmark Distance Type of access The temporal sense is also used with the preposition near, which indicates the spatial reference. For example, it is near Christmas. In the domains of color terminology the expression proche de (close to) is preferred over près de (near). An example of this sense: mauve is close to blue. Thus, isn’t it more direct to define the prepositions near and away from concerning the accessibility and inaccessibility? Then, the distance itself would be one of several possible factors which may affect access. Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 443 IV. Research objectives The main objective of this research is to analyse and compare the two prepositions near and away from in order to systematise the main uses and illustrate them with some sentences. Several authors’ explanations and their background knowledge are detailed too. Some prepositions express where something is in physical relation to another thing: Example: There was a bird near the tree. There was a bird away from the tree. About these examples, the following can be said: • A bird is the trajector in the relationship expressed by the preposition. • The tree is the landmark of the preposition. • The preposition tells us where the trajector is in relation to the landmark. • Near / away from are prepositions of distance. Also, because both the subject and the landmark are concrete things, we can say that near/ away from are being used literally. Although there are less than one hundred English prepositions, they do not take ending positions, that is to say they are not usually written in the end of a sentence, and even if the structure of most prepositional phrases is usually easy, the use of English prepositions is complex. Then, most prepositions have more than one meaning, as described, for example, in the prepositional approaches of polysemy in Navarro’s model. As it is well known, English prepositions play an important role in sentences of many key notions, those pertaining to physical objects in their visual perceptions, arrangements, orientations and so forth. V. 
Methodology The methodology of this research is to look for examples of the prepositions near and away from on the following dictionaries to identify their main uses: • Cambridge dictionary • Merriam Webster • Macmillan • English Oxford dictionary After comparing the meaning of these prepositions, some examples are written to illustrate their uses. Then, each sentence and the preposition is analysed regarding the frames of reference, the three dimensions, the spatial relations and the two concepts which are trajector and landmark. Nuria Flor Fabregat. Analysis of prepositions: near and away from. Frames of reference 444 In order to understand the tree structures of sentences, some concepts of analysis will first be made clear. In traditional grammar and modern syntactic theory, the notion constituency is an essential construct. It is described by linguists as assigned fixed hierarchical structures which are considered as inverted «trees» metaphorically. Then, details vary and styles may change, but an example of the usual format is the nominal or noun phrase as a table near the door (noun phrase- article and noun, prepositional phrase- preposition, article and noun). Three kinds of information can be found in syntactic tree structures. Grammatical category (N, P, NP...), linear order (left to right on the page) and constituency (hierarchical grouping). Prime examples are the subject and object relations. Thus, a subject is designated as a nominal whose profile corresponds to the trajector of a profile relationship. An object is characterised as a landmark of a profile relationship. A complex activity must be called as the act of talking and something which people do rather than whatever they have must be viewed as a language. Some aspects of this activity are motor, perceptual and mental, which are established by the procedure in brain, that is to say in a wide sense a cognitive activity is talking. The knowledge of a language is a situation of controlling skills in distinct contexts. Some regions of the brain are involved in language and the processing activity constitutes linguistic units. Thus, these units are not separated or independent, but they overlap with other units or even add them as components. In general, units embody the rules and regularities of a language. Thus, linguists may consider rules in one of the three ways which are as constructive rules, as filters or as schemas. In cognitive grammar, rules take the form of schemas. These units are connected by relationships of categorization and they can form networks of any size. (Langacker, 2008) VI. Results and discussion According to Merriam Webster, Macmillan, Cambridge dictionary, and English Oxford dictionary near can be used in the following ways: 1) as a preposition (close to someone or something): A group of students were standing near the entrance. I lived near the school. I’ll write and let you know nearer the time (Macmillan). Beaches near the city (Merriam Webster). 2) as an adverb (not far away in distance): Come nearer, and I’ll tell you the whole story (Macmillan). Is there a restaurant near here? (Cambridge) Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 445 3) as an adjective: I went into the nearest room. A climb in the mountains led to near disaster (Macmillan). In the near future (not far distant in time (Merriam Webster). 4) in the preposition phrase near to: Pull your chair nearer to the table (Macmillan). 
a) getting close to a particular state or situation: (near to) Julian was near to panic as he suddenly realized that he was trapped (Macmillan). People near to retirement need to know their pension funds are sufficient (Macmillan). b) near to doing something: They are near to solving the puzzle (Macmillan). According to Merriam Webster, Macmillan, English Oxford dictionary and Cambridge dictionary, away from can be used in the following ways: 1) Somewhere else: somewhere else, or to or in a different place, position, or situation: as an adverb. Ms Watson is away on holiday until the end of the week (Cambridge). Keep/Stay away from him (Cambridge). a) Distant: at a distance (of or from here): as an adverb How far away is the station? (Cambridge). The office is a half-hour drive away (Cambridge). We live five kilometres away from each other (Cambridge). Keep away from the stove – it’s very hot (Macmillan). b) far from people, places, or things, especially so that you feel separated from them. I will be away from home for two weeks (Merriam Webster). It’s nice to have a weekend away from London (Macmillan). She’s been away from her family for too long (Macmillan). 2) Away as an adjective: - (of a sports fixture) played at the opponents’ ground: Tomorrow night’s away game at Leicester (English Oxford dictionary). An away victory (English Oxford dictionary). 3) Relating to or denoting a sports team that is playing at the opponents’ ground: The away side scored first (English Oxford dictionary). Away fans chanted and cheered (English Oxford dictionary). In this section, some examples of the preposition near are described with the purpose of understanding the use of frames of reference: - A group of students were standing near the entrance (Macmillan). The students are the trajector (TR) which can be moved easily and the landmark (LM) is situated after the preposition and it is the Nuria Flor Fabregat. Analysis of prepositions: near and away from. Frames of reference 446 point of reference for the trajector, in this case, the entrance, which can work as a building for the TR. The frame of reference is intrinsic which corresponds to object- centred, since it depends on the accessibility from the trajector to the landmark. The dimensions are topology because of the proximity and function when there is an interaction. The spatial relations of Vandeloise are the type of access and the facility of access to this entrance as a building and the speed of the target (a group of students). - I lived near the school (Macmillan). The speaker is the TR and the subject and the school is the LM and the place where the speaker is situated. The frames of reference are absolute which corresponds to environment-centred and relative which corresponds to viewer-centred, since it depends on the type of access. The dimensions are topology because of the proximity and function when there is an interaction and accessibility. The spatial relations of Vandeloise are the type of access and the facility of access of the school and the speed of the target as a speaker. - They are near to solving the puzzle (Macmillan). The subject, which is they, is the TR and the puzzle is the LM as the object which they try to solve. This preposition means near to doing something. The Frames of reference are intrinsic which corresponds to object-centred and relative which corresponds to viewer-centred. 
The dimensions are topology (metaphoric sense) because of the proximity as a temporal situation and function when there is an interaction and dynamic when there is a movement. The spatial relations of Vandeloise are the speed of the target as a subject of this sentence to solve the puzzle and the dimension of the landmark (the puzzle). Some examples of the preposition away from are the following: - Keep away from the stove – it’s very hot (Macmillan). The viewer is the TR and the stove is the LM as the object to be away from. The frames of reference are intrinsic which corresponds to object-centred and relative which corresponds to viewer-centred. The dimensions are topology because of the distant and function when there is an interaction and dynamic when the viewer has to keep away. The spatial relations of Vandeloise are the speed of the speaker to keep away from the stove and the dimension of the landmark (the stove). - I will be away from home for two weeks (Merriam Webster). The speaker is the TR and the subject and the home is the LM as the building. The frames of reference are absolute which Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 447 corresponds to environment-centred and relative which corresponds to viewer-centred. The dimensions are topology because of the distant and function when there is an interaction. The spatial relations of Vandeloise are the type of access and the facility of access and the space of time of two weeks. - It’s nice to have a weekend away from London (Macmillan). The speaker is the TR as an opinion of the speaker and the city itself is the LM where the speaker is situated. The frames of reference is absolute which corresponds to environment-centred, since it depends on the situation of the city on the map. The dimensions are topology because of the distance and function when there is an interaction. The spatial relations of Vandeloise are the facility of access and the type of access and the speed of the target as the subject of this sentence and as well as the dimension of the landmark (the city). VII. Conclusions In grammatical structure, prepositions relate two elements and these are considered as nouns. These elements are called the trajector and the landmark, in linguistics, which are referred to as TR and LM (Langacker, 1987). Briefly, the trajector is usually situated before the preposition, it is the most meaningful concept and it can be changed more easily from one place to another. However, the landmark is situated after the preposition and it is the concept to which the trajector is related. Then, this is the point of reference for the trajector. Regarding the English language as a foreign language, this subject is not an easy one for many students and they encounter great difficulty as they learn English prepositions in this field of knowledge and study. A case in point is the connection between the English prepositions near and away from and the Spanish prepositions cerca and lejos. The frames of reference absolute, relative and intrinsic are the main ones in order to describe the horizontal spatial directions which are the directions of the prepositions near and away from in this research. The following three terms are synthesised above (Levinson, 2004): • Absolute which corresponds to environment-centred • Relative which corresponds to viewer-centred. 
• Intrinsic which corresponds to object-centred In these spatial directions, the usual dimensions are topology when there is a proximity, contact or inclusion and function when there is an interaction and some effects between the trajector and the landmark. Nuria Flor Fabregat. Analysis of prepositions: near and away from. Frames of reference 448 The norm of distance, which Vandeloise (1991) explained, depends on the trajector and the relation with the landmark. These two terms (TR and LM) are determined by the dimension of the landmark, the speed of the target or the subject of the sentence, the speed of the landmark or the object of the sentence, the size of the speaker, the speed of the speaker, the size and the speed of the addressee, the facility of access and types of access regarding the building or the place where the speaker is situated. Therefore, some questions should be asked to consider: what are the two elements which prepositions relate?, how are the three dimensions called in spatial directions?, what are the main frames of reference in the horizontal directions? VIII. References Langacker, Ronald W. 2008. Cognitive Grammar. A basic introduction. Oxford: Oxford University Press. Levinson, Stephen C. 2004. Space in language and cognition. Explorations in cognitive diversity. Cambridge: Cambridge University Press. Navarro, Ignasi. 1998. A Cognitive Semantics Analysis of the Lexical Units AT, ON, and IN in English. Ph.D. dissertation. Castelló de la Plana: Publicacions de la Universitat Jaume I de Castelló. Navarro, Ignasi, Campoy, M. Carmen & Caballero, Rosario. 2001. «Thinking with English Prepositions and Adverbs». In Docència Universitària: Avanços Recents. Primera Jornada de Millora Educativa de la Universitat Jaume I, 243-252. Castelló: Publicacions de la Universitat Jaume I de Castelló. Vandeloise, Claude. 1991. Spatial prepositions. A case study from French. Chicago: University of Chicago Press. Web pages https://dictionary.cambridge.org/es/ https://www.merriam-webster.com/ https://www.macmillandictionary.com/ https://www.oxforddictionaries.com/ Images https://www.shutterstock.com/video/clip-17171377-stock-footage- lausanne-switzerland-circa-january-students-outside-college- building-university-students.html https://www.pinterest.es/pin/126452702008017071/ Fòrum de Recerca. Núm. 22/2017, p. 433-449 ISSN: 1139-5486. DOI: http://dx.doi.org/10.6035/ForumRecerca.2017.22.26 449 https://www.123rf.com/photo_11248579_illustration-of-a-cartoon- house-inside-rounded-landscape.html https://www.123rf.com/photo_12107090_illustration-of-kids- solving-a-giant-jigsaw-puzzle.html https://www.pinterest.es/explore/map-of-london-city/ work_2yanabvlyjeqnphthvu7dvtlvi ---- Fuzzy expert system for the detection of episodes of poor water quality through continuous measurement Cecilio Anguloa,∗, Joan Cabestanya, Pablo Rodŕıguezb, Montserrat Batlleb, Antonio Gonzálezb, Sergio de Camposb aUniversitat Politècnica de Catalunya, UPC-Barcelona Tech, Vilanova i la Geltrú, Spain bAdasa Sistemas S.A.U., Barcelona, Spain Abstract In order to prevent and reduce water pollution, promote a sustainable use, protect the environment and enhance the status of aquatic ecosystems, this article deals with the application of advanced mathematical techniques de- signed to aid in the management of records of different water quality moni- toring networks. 
These studies include the development of a software tool for decision support, based on the application of fuzzy logic techniques, which can indicate water quality episodes from the behaviour of variables measured at continuous automatic water control networks. Using a few physical-chemical variables recorded continuously, the expert system is able to obtain water quality phenomena indicators which can be associated, with a high probability of a cause-effect relationship, with human pressure on the water environment, such as urban discharges or diffuse agricultural pollution. In this sense, in the proposed expert system, automatic water quality control networks complement the manual sampling of official administrative networks and laboratory analysis, providing information related to specific events (discharges) or continuous processes (eutrophication, fish risk) which can hardly be detected by discrete sampling. Keywords: Water quality system, Automated measurement networks, Fuzzy logic, Fuzzy inference system, Guadiana river ∗Neàpolis Building, Rambla de l'Exposició 62, 08800 Vilanova i la Geltrú, Spain. Tel: +34 938967273; Fax: +34 938967200 Email address: cecilio.angulo@upc.edu (Cecilio Angulo) Preprint submitted to Expert Systems with Applications June 7, 2010 1. Introduction On October 23rd, 2000, the "Directive 2000/60/EC of the European Parliament and of the Council establishing a framework for the Community action in the field of water policy" or, in short, the EU Water Framework Directive (European Parliament, 2000) was finally adopted. The Directive addresses the organization of water management in order to prevent and reduce pollution, promote sustainable water use, protect the environment and enhance the status of aquatic ecosystems. As a starting point for putting the policy into practice in Spain, the anthropogenic impact on the waters was characterized and their condition was diagnosed on the basis of monitored indicators. Application of the Directive led to an analysis of the characteristics of each river basin and a study of the impact of human activity on the water. Based on the results of this initial analysis, as well as on determining factors such as cost-benefit analysis, citizen participation and others, each river basin's management unit should be able to develop a management plan and a program of measures. The objectives of this plan are to prevent deterioration, to enhance and restore the status of surface water bodies, ensuring that they reach good ecological and chemical status, and to reduce pollution due to discharges and emissions of hazardous substances. Moreover, the deployment of the stations of the Automated Information Water Quality System (SAICA, in Spanish) has made it possible to obtain automated and continuous information on the state of rivers (Serramià, 2005). Automated information differs from that usually collected through manual measurements. Unlike the latter, which are discrete and accurate measurements, the SAICA records offer immediacy and continuity. However, they involve managing large quantities of records, spread across multiple variables, that tend to be less precise. This article presents work developed through the Spanish project ECOWATCH in the context of implementing the EU Water Framework Directive.
It deals with the application of advanced mathematical techniques designed to aid in the management of records from different observational networks of water quality. Studies and developments performed in the project ECOW- ATCH include the development of a software tool for decision support, based on the application of fuzzy logic techniques, which can detect quality episodes from the behavior of continuous variables measured in SAICA networks. 2 The developed expert system is able to generate, in real time, a set of combined indicators of water quality based on: • data from SAICA automated networks • expert knowledge about episodes of poor water quality expressed as rules whose purpose is to detect two different types of events: • ad hoc and spurious events, like urban discharges or caused by a waste water treatment plant (WWTP), • long and cumulative over time events, like episodes of eutrophication and, as a novelty, fish risk, i.e. risky environmental conditions for fish communities The working assumption along this project is that expert definition of continuous indicators will provide information on the evolution of the water quality of a river basin allowing early detection of abnormal events. These indicators will anticipate and complete the results obtained with hand-made programmed sampling. In addition, this form of dual management: • will increase knowledge about the river basin, by summarizing in syn- thetic values the information contained in a set of complex variables • justifies the operation and maintenance costs of continuous automatic monitoring networks, currently used for warning and inspection pur- poses Automatic monitoring networks will supply an added value as knowledge generators about the environment and can help assess the effectiveness and appropriateness of the actions taking place on the basin or specific parts of the same: introducing a management system, construction of a water treatment infrastructure or others. Based on the working hypothesis and with the aim of combining expert knowledge with information collected by SAICA network, the methodology depicted in Figure 1 has been used as project development script. As shown therein, the generation of indicators by the expert system is based on the expression of empirical expert rules in the form of fuzzy logic. Such expert fuzzy rules are constructed from a set of examples that allow the formulation 3 of the phenomena by the experts. The acquisition of this knowledge through the use of fuzzy logic will allow to design an expert system able to generalize to new situations. Figure 1: Development process for setting-up fuzzy logic rules. 2. Selecting poor water quality episodes The Spanish side of the Guadiana river basin has been chosen as the study area for this work, due to the availability of information records collected by several automatic measurement stations belonging to the Guadiana Hydro- graphic Confederation’s SAICA network. Figure 2(a) shows the location of the basin and the selected stations are depicted in Figure 2(b). 2.1. Episodes and related variables The expert system to be developed will focus on the detection of three types of episodes: (i) discharges to the environment, (ii) eutrophication and, experimentally, (iii) fish risk. 
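The fish-risk indicator listed last rests on simple acid-base chemistry: the toxic, un-ionised ammonia (NH3) share of a given ammonium load grows steeply with pH and temperature, which is why an ammonium discharge into water made basic by advanced eutrophication is so dangerous for fish fauna. The sketch below computes that share with a commonly used textbook approximation (the Emerson et al., 1975 expression for the NH4+/NH3 equilibrium in freshwater); the formula is general chemistry background and is not taken from this paper.

def nh3_fraction(ph, temp_c=25.0):
    """Fraction of total ammoniacal nitrogen present as un-ionised ammonia (NH3).

    pKa = 0.09018 + 2729.92 / T  (T in kelvin), a widely used freshwater
    approximation; fraction = 1 / (1 + 10 ** (pKa - pH)).
    """
    pka = 0.09018 + 2729.92 / (273.15 + temp_c)
    return 1.0 / (1.0 + 10.0 ** (pka - ph))

# At 25 degrees C the un-ionised fraction rises from under 1% at pH 7
# to more than a third at pH 9, for the same total ammonium load.
for ph in (7.0, 8.0, 9.0):
    print(ph, round(100 * nh3_fraction(ph), 2), "% as NH3")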
The determination of quality variables to be used as inputs for each kind of episode to assess and their combined impact has been completed with the collaboration of experts, who have characterized a series of remarkable events, which have been used to calibrate the developed methods. 4 (a) Guadiana River basin. (b) Stations in the Guadiana River. Figure 2: Location of selected stations in the Guadiana River. The five variables extracted from expert knowledge to characterize episodes have been: pH, conductivity, turbidity, ammonium concentration and dis- solved oxygen (DO) concentration. Relevant cases corresponding to the studied episodes should be selected for training and development of automated fuzzy inference systems. In the following, episodes characterizing each type of analyzed event are discussed, along with a detailed description in terms of the consulted experts. 2.2. Episodes of Urban or WWTP discharge The two episodes of water discharge shown below correspond to data from the Guadajira automatic station, located downstream of a WWTP. 2.2.1. Episode 1. Guadajira, January 2009 The values of the variables selected by the experts for the duration of the episode are shown in Figure 3. Two different events can be distinguished, one of them is starting on January 22th, and another at the beginning of the 27th, marked with two separate rectangles. An explanation for the first event is a possible malfunction in the process of nitrification - denitrification of the upstream wastewater treatment plant, which would justify the increase of ammonium, as well as in the phosphate removal, while there is a decrease in the dissolved oxygen concentration. For the second event, aeration might not work properly in the WWTP, so no removal of phosphorus and ammonium exists, while there is an increase in 5 Figure 3: Measured variables for Episode 1 (discharge). organic matter, which is reflected in the decrease of dissolved oxygen. Both cases are similar, the second one having a largest pollution load than the first, although it could also be a direct discharge. This kind of situation is not unusual in this particular WWTP. A planned action exists for this specific WWTP in order to enlarge it to provide sufficient treatment capabilities for current needs. The proposed expert system would serve, in this case, as a tool for assessing the effectiveness and appropriateness of the actions currently being undertaken at the WWTP. 2.2.2. Episode 2. Guadajira, September 2008 In Figure 4 the values of the variables during the analyzed event are depicted. It can be seen that the event starts in the evening of September 22th, as marked with a rectangle. The increased presence of ammonium indicates that the process of nitri- fication - denitrification has not been completed, therefore the cause is in the WWTP. Moreover, there has neither been degradation of organic mat- ter before the discharge, since values for the dissolved oxygen concentration become null and turbidity increases due to the presence of organic matter. 6 Figure 4: Measured variables for Episode 2 (discharge). This episode is related to a direct urban discharge, from either the WWTP itself, or from some of the olive-related industries located in the area that are not networked to the sewerage system which serves the WWTP, and that are in the phase of requirements for implementation of appropriate treatment systems. 2.3. 
Episode of Eutrophication A process of eutrophication is an increase in the trophic status of water caused by nutrient enrichment (Smith et al., 1999). In the limit, the cumula- tive process of eutrophication can lead to a biological impact as it comes to situations of partial (night) or permanent anoxia (no oxygen), which could end in itself with fish life. The selected episode to be used for the expert system is shown in the next section. 2.3.1. Episode 3. Benavides, March 2007 A process of natural photosynthesis is going on: daytime primary pro- ducers consume CO2 and produce oxygen, thus decreasing H + concentration, and therefore pH increases. At night CO2 concentration increases, increasing H+, and low pH. As the process continues, the possible presence of N and P nutrients (not measured) causes biomass growth and increases production 7 Figure 5: Measured variables for Episode 3 (eutrophication and fish risk). / consumption of oxygen, which is manifested in the amplitude of oscillation in the daily cycle of this element. While N and P values are in excess the process will increase. From an initial state of eutrophication, low flow does not favor dilution, and the increase of conductivity may indicate a concentration of salts and increased nutrient concentrations (nitrates, phosphates...) that are a limit- ing factor for algae and macrophytes, favoring the uncontrolled growth of biomass. The system is no longer sustainable as the amount of biomass to degrade exceeds the system’s capacity, hence oxygen concentration de- creases until eventual anoxia situations. Values for quality variables during the episode are shown in Figure 5. 2.4. Episode of Fish risk According to the experience of operation managers, during the advanced stage of eutrophication an increase in average pH is observed. In these cir- cumstances, if a discharge of ammonium in a basic medium exists, it is trans- formed into ammonia, a highly toxic product and lethal to fish fauna, which was already under pressure due to periods of partial anoxia. Information about fish risk is useful for operation managers as an aid to 8 decision making for preventive action. In entering a risk area, when there is a capacity to act, dilution can be forced by releasing water from reservoirs to help breaking the cycle of eutrophication. An episode from a water station on the River Guadiana has been selected and is shown in the next section. 2.4.1. Episode 3. Benavides, March 2007 Ammonium (NH+4 ) is in the water in steady equilibrium with ammonia (NH3), highly toxic (NH3 +H + ↔ NH+4 ). Faced with a discharge of ammo- nium in the river, a small increase in pH (H+ decrease) moves the equilibrium towards the formation of NH3. While ammonium can be easily assimilated, ammonia is lethal to living beings. The situation can be particularly danger- ous on this zone of the river, which consists on a wide and clear channel with shallow waters, when eutrophication occurs. Both contributing factors lead to a strong proliferation of phytoplankton, which during the daily cycle cause significant variations in pH. During an episode with high levels of ammonium that can not be neutralized by oxidation, NH3 lethal concentrations to the fish stock can be reached. In this episode, the average pH values grow up close to 9, so a small discharge of ammonium is dangerous to the fish fauna, and a large one, lethal. Figure 5 shows the value for quality variables during the episode. 3. 
3. Fuzzy inference systems developed for the expert system
The set of examples of abnormal events to be used for the definition and development of the expert system has been established. The next stage in the methodology is the automation, by the working group, of the steps needed to generate the expert system (see Figure 1). A first step, which consumes a significant amount of time if it is not well automated, is the inspection and validation of the database provided by the SAICA network's stations. In particular, measurement errors such as out-of-range and missing values in some variables must be considered. Standardized solutions have been adopted: absent values have been filled in by linear regression, and outliers have been removed and replaced with the mean of the previous and following samples. The experts paid particular attention to a few small groups of samples that were difficult to fix automatically.

3.1. Description
Fuzzy inference systems are a widespread method in the treatment of information for the generation of expert systems for water quality monitoring (Nasiri et al., 2007; Lermontov et al., 2009; Hatzikos et al., 2009). Roughly speaking, a fuzzy inference system is an algorithm able to convert a strategy, or set of linguistic rules, into an automated strategy. Applications can be designed with fuzzy logic so that systems and processes handle imprecision with greater knowledge, seeking to mimic human behavior. Our approach focuses on the automated generation of fuzzy rules from recorded data according to a pre-defined structure of rules, or strategy, determined by human experts. Once this strategy is provided, some questions must be answered for the generation of the automated inference system. In particular:
1. How are membership functions for both input and output variables set up?
2. Which labels of the output variable are assigned to each rule?
3. How are linguistic labels determined for the defuzzified output?
Before developing the automated system, the effectiveness of using fuzzy inference systems was first validated by manually creating an expert system prototype, which was tested on the two episodes of urban discharge at Guadajira's SAICA station, in January 2009 and September 2008. As a result, for the first of the episodes, a set of 54 basic rules was designed. One example follows:
54. IF (IncConduct is Negative) and (IncOD is Zero) and (IncAmmonium is Negative) and (IncPhosphates is Negative) THEN (UrbanDischarge is Normal)
For the second episode, turbidity measurements are also available, so the initial knowledge base is completed with the information of whether or not there is increased turbidity. This resulted in 162 rules. One example follows (a minimal evaluation sketch for a rule of this form is given below, in Section 3.2.2):
1. IF (IncConduct is Positive) and (IncOD is Negative) and (IncTurbidity is Positive) and (IncAmmonium is Positive) and (mOD is Low) THEN (UrbanDischarge is Evidences)
After positive testing of the manually generated fuzzy inference systems on these two examples, the next step was to generalize, i.e. to develop a procedure to automatically obtain fuzzy rules from the provided data.

3.2. General Application
The continuous data obtained through the automated networks of the Guadiana river Hydrographic Confederation (CHG) have allowed the characterization of different episodes in its basin, for which experts have sought to identify possible causes.
For instance, in Figure 3 the expert was able to identify two episodes of increased ammonium coinciding with a decrease in oxygen, a slight decrease in pH and a slight increase in conductivity. In turn, a discharge of organic origin was assigned as a possible cause. This type of information is quite important for performing a systematic study of water quality, so it is essential to have the support of technical experts who can discern this kind of behavior and match it with real experiences. The creation of an expert system capable of collecting, objectifying and quantifying this experience can be made more transparent through an inference system based on fuzzy rules. The fuzzy inference systems should allow the identification of combined indices from the automatic stations that are representative of specific events or are indicators of possible trends in the evolution of the pressures. Figure 6 shows the two types of application processes that have been followed for the development and calibration of the inference system, distinguishing between an automatic formulation of rules, used once the principles of the phenomena are known, and a manual formulation for the fish risk phenomenon due to its special nature.

3.2.1. Expert strategy formulation
As a result of the analysis of the information gathered by the experts, a strategy formulation has been provided for the three types of events to be detected. The resulting algorithm for the detection of discharge episodes is the concatenation of three fuzzy inference systems, resulting in a value between 0 and 1 drawn from the linguistic labels “Normal”, “Evidence of discharge”, “Slight discharge” and “Serious discharge” for the output variable (Figure 7). Features were selected among incremental and mean values of the original variables, according to expert advice. A point to be studied later is how best to combine the outputs of the three inference systems, either by nesting them or by weighting the outputs.

Figure 6: Algorithm calibration process.
Figure 7: Fuzzy inference system strategy for the 'discharge indicator'.
Figure 8: Strategy for the 'eutrophication indicator'.

For the detection of eutrophication phenomena, the resulting algorithm is composed of a single fuzzy inference system, resulting in a value between 0 and 1 extracted from the labels “Normal”, “Natural photosynthesis”, “Slight eutrophication” and “Advanced eutrophication” in the output. A diagram of the obtained algorithm is shown in Figure 8. In this case, since eutrophication is a process lasting several days and depending on the day/night cycle, features were designed using a long temporal window. Finally, the algorithm for detecting fish risk situations consists of a single fuzzy inference system, resulting in a value between 0 and 1 drawn from the values of the membership functions on the linguistic labels “Normal”, “Possible risk”, “Moderate risk”, “High risk” and “Extreme risk” in the output. In this case, the membership functions for the input variables have been manually designed. Representations of the membership functions, as designed by the experts, are depicted in Figure 9.

3.2.2. Automated design
Once the strategies have been provided by the experts, the membership functions for the input and output variables and the labels for the output in the rules must be calculated in an automated way from the SAICA network data. Later, they should be tuned or scaled so that the response is adjusted to the expert opinion on the known cases.
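Before the automated construction is detailed, the following minimal Python sketch (an illustration written for this text, not the authors' MATLAB implementation) shows how a single manually authored rule of the form given in Section 3.1 can be evaluated with standard min/max fuzzy operators; the membership function shapes and the numeric breakpoints are assumptions.

    def ramp_up(x, a, b):
        """0 below a, 1 above b, linear in between (open right shoulder)."""
        if x <= a:
            return 0.0
        if x >= b:
            return 1.0
        return (x - a) / (b - a)

    def ramp_down(x, a, b):
        """1 below a, 0 above b, linear in between (open left shoulder)."""
        return 1.0 - ramp_up(x, a, b)

    # Assumed (illustrative) term definitions; units are those of each variable.
    is_positive = lambda dx: ramp_up(dx, 0.0, 0.5)     # increment clearly > 0
    is_negative = lambda dx: ramp_down(dx, -0.5, 0.0)  # increment clearly < 0
    od_is_low   = lambda od: ramp_down(od, 2.0, 4.0)   # mean DO, in mg/L

    def rule_1(inc_conduct, inc_od, inc_turbidity, inc_ammonium, m_od):
        """Firing strength of rule 1 of Section 3.1, with min as the AND operator:
        IF IncConduct is Positive AND IncOD is Negative AND IncTurbidity is
        Positive AND IncAmmonium is Positive AND mOD is Low
        THEN UrbanDischarge is Evidences."""
        return min(is_positive(inc_conduct), is_negative(inc_od),
                   is_positive(inc_turbidity), is_positive(inc_ammonium),
                   od_is_low(m_od))

    # Conductivity and turbidity up, oxygen down and low -> strong activation
    print(rule_1(inc_conduct=0.8, inc_od=-1.2, inc_turbidity=0.9,
                 inc_ammonium=1.5, m_od=2.5))   # prints 0.75

In the automated procedure described next, such term definitions are not hand-tuned: they are derived from the data ranges, and the rule consequents are assigned automatically.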
Input variables' membership functions. For the input variables, membership functions have been designed according to the following constraints:
• three membership functions for each input variable;
• the universe of discourse is defined by the database ranges and symmetry;
• the membership functions are Gaussian functions centered on both bounds of the range and on its middle;
• the width of the Gaussian functions is a compromise between the data quartiles and normalization to unity.

Figure 9: Fuzzy algorithm strategy for the 'fish risk indicator'.

As an illustrative example, the membership functions for the 'DO increment' variable are shown in Figure 10. For the 'fish risk indicator' case, the expert-designed membership functions for the input variables (depicted in Figure 9) have been translated into those shown in Figure 11.

Output variable's membership functions. For the output variable, membership functions have been designed according to the following constraints:
• the usual equally distributed triangular membership functions have been employed, with the universe of discourse in the range [0, 1], for the final output variable;
• for the concatenated fuzzy inference systems in the 'discharge indicator', the same distribution has been chosen with four membership functions.

Figure 10: Membership functions for the variable 'DO increment'.

Output variable's labels for each rule. Labels for the output variable in each rule are also automatically assigned by considering the strength of the input variables in the rule's antecedent, and their direct or inverse relationship with the rule's consequent. A simple linear formulation has been employed to assign a label to the output variable.

Linguistic labels for the defuzzified output. Thresholds used to assign linguistic labels to the indicators have been calculated after analyzing the results on the known episodes. Output normalization to the range [0, 1] could have been performed; however, since we are interested in generalization to other basins and rivers, it was decided not to scale the output. Output normalization has been postponed to future studies on data coming from further Spanish river basins.

4. Experimentation and result analysis
A MATLAB/GUI (Graphical User Interface) based tool has been developed as a fuzzy expert system for the detection of episodes of poor water quality (discharge, eutrophication and fish risk) through continuous measurement. The prototype has been evaluated on the SAICA network data provided by the river Guadiana Hydrographic Confederation, and the obtained results have been analyzed. An example of the GUI-based user-friendly interface is illustrated in Figure 12.

Figure 11: Strategy and input membership functions for the 'fish risk indicator'. (a) Fuzzy inference system for the 'fish risk indicator'. (b) Mean pH. (c) Mean DO. (d) Mean Ammonium.

The indicator of urban or WWTP discharge (the 'discharge indicator') has been specifically calibrated for the Guadajira station and for Episodes 1 and 2. It should be adjusted if implemented in other locations; this process could be performed with a set of selected episodes or with a long series of data. The 'eutrophication indicator' has been tuned for Episode 3, and it has been directly applied to other stations (in the same basin or a different one) without further adjustment. Finally, the 'fish risk indicator' is an absolute indicator, and no tuning process was performed in any case. Consequently, it can be directly applied to other stations or basins.

Figure 12: MATLAB/GUI prototype's example.
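As a concrete illustration of the membership-function constraints of Section 3.2.2, the following Python sketch (written for this text, not the authors' MATLAB code) builds three Gaussian input membership functions centered on the bounds and middle of a variable's observed range, with widths derived from the data quartiles, together with equally distributed triangular output functions on [0, 1]; the exact width heuristic and the sample values are assumptions.

    import math
    import statistics

    def gaussian(c, sigma):
        """Gaussian membership function centered at c with width sigma."""
        return lambda x: math.exp(-0.5 * ((x - c) / sigma) ** 2)

    def triangle(a, b, c):
        """Triangular membership function with support [a, c] and peak at b."""
        def mu(x):
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
        return mu

    def input_partitions(values):
        """Three Gaussians centered on the observed bounds and the middle of the
        range; the width is taken from the interquartile range as a stand-in for
        the paper's compromise between data quartiles and normalization."""
        lo, hi = min(values), max(values)
        mid = (lo + hi) / 2.0
        q1, _, q3 = statistics.quantiles(values, n=4)
        sigma = max((q3 - q1) / 2.0, 1e-9)
        return {"Negative": gaussian(lo, sigma),
                "Zero":     gaussian(mid, sigma),
                "Positive": gaussian(hi, sigma)}

    def output_partitions(labels):
        """Equally distributed triangles on [0, 1], one per linguistic label."""
        step = 1.0 / (len(labels) - 1)
        return {lab: triangle((i - 1) * step, i * step, (i + 1) * step)
                for i, lab in enumerate(labels)}

    # Example with a synthetic sample of 'DO increment' values
    do_increments = [-2.1, -1.4, -0.3, 0.0, 0.2, 0.9, 1.8]
    terms = input_partitions(do_increments)
    print(round(terms["Negative"](-1.5), 3), round(terms["Zero"](0.1), 3))
    out = output_partitions(["Normal", "Evidence", "Slight", "Serious"])
    print(round(out["Slight"](0.7), 2))   # 0.7 lies between the Slight and Serious peaks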
Below are the results of the implementation of the algorithms for calculating the event indicators.

4.1. Urban or WWTP discharge
Figure 13 (top) shows the results of the 'discharge indicator' for Episode 1. It can be noted that the first event has been characterized as “Evidence of discharge”, while the second has been labeled as “Serious discharge”. The indicator has identified yet another possible event at the beginning of the 24th day, mainly due to an increase in ammonium concentration and a lowering of dissolved oxygen, but not important enough to be considered an “Evidence of discharge”.

Figure 13 (bottom) shows the results of the 'discharge indicator' for Episode 2. In this case the indicator has a base value slightly below the threshold of “Evidence of discharge”, which is therefore occasionally exceeded. The event on the afternoon of the 22nd day is properly detected and characterized as “Serious discharge”. Just at the beginning of the 24th day there is another event that is detected at the edge of being considered a “Slight discharge”. An inspection of the original data shows that it is not a new event, but just a spike within the long-lasting event conditions already detected.

The results suggest using the 'discharge indicator' thresholds shown in Table 1 to characterize the detected events more easily.

Figure 13: 'Discharge indicator' results for Episodes 1 and 2.

Discharge    “Normal”    “Evidence”    “Slight”    “Serious”
Range        < 0.4       0.4 - 0.6     0.6 - 0.8   > 0.8
Table 1: Thresholds for the 'discharge indicator'.

Figure 14: 'Eutrophication indicator' and 'Fish risk indicator' results for Episode 3.

Eutrophication    “Normal”    “Photosynthesis”    “Slight”    “Advanced”
Range             < 0.2       0.2 - 0.3           0.3 - 0.4   > 0.4
Table 2: Thresholds for the 'eutrophication indicator'.

In measuring stations located downstream of the discharges, only values of at most “Evidence of discharge” were detected. This makes sense because, when the water flowed into the main channel of the Guadiana river, dilution occurred and decreased the effects of the discharge.

4.2. Eutrophication
The 'eutrophication indicator' for Episode 3 enters a state of “Natural photosynthesis” from the 9th day and gradually grows into “Slight eutrophication” in the middle of the study period. At that moment, the state moves rapidly to “Advanced eutrophication”, where it remains until the 30th day; from then on it decreases progressively, as shown in Figure 14 (top). The results of the 'eutrophication indicator' suggest using the thresholds defined in Table 2 to characterize the detected events. It can be noted that the values obtained for this indicator have been lower than in the case of a discharge, so the thresholds should be proportionally lower, or the indicator should be scaled to obtain uniformity in the system.

Fish risk    “Normal”    “Possible”    “Moderate”    “High”      “Extreme”
Range        < 0.2       0.2 - 0.3     0.3 - 0.4     0.4 - 0.5   > 0.5
Table 3: Thresholds for the 'fish risk indicator'.

4.3. Fish risk
Figure 14 (bottom) shows the results of the 'fish risk indicator' for Episode 3. The indicator enters a situation of concern around the middle of the study period, and the obtained values grow beyond the threshold of “Moderate risk”. However, the detected discharge of ammonium is not significant enough to generate a dangerous situation. From the 25th day the risk values decrease, reaching a “Normal” situation in the middle of the 29th.
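The thresholds of Tables 1-3 lend themselves to a direct implementation; the short Python sketch below (illustrative only, not the authors' code) maps a defuzzified indicator value to its linguistic label.

    DISCHARGE = [(0.4, "Normal"), (0.6, "Evidence"), (0.8, "Slight"),
                 (float("inf"), "Serious")]
    EUTROPHICATION = [(0.2, "Normal"), (0.3, "Photosynthesis"), (0.4, "Slight"),
                      (float("inf"), "Advanced")]
    FISH_RISK = [(0.2, "Normal"), (0.3, "Possible"), (0.4, "Moderate"),
                 (0.5, "High"), (float("inf"), "Extreme")]

    def label(value, thresholds):
        """Return the linguistic label of an indicator value in [0, 1],
        using the upper bounds listed in Tables 1-3."""
        for upper, name in thresholds:
            if value < upper:
                return name
        return thresholds[-1][1]

    print(label(0.85, DISCHARGE))       # Serious
    print(label(0.35, EUTROPHICATION))  # Slight
    print(label(0.33, FISH_RISK))       # Moderate

Boundary values are assigned to the higher label, matching the strict lower bounds of the tables.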
As can be observed, the 'eutrophication indicator' depicts a gradual process of degradation of the quality conditions in the river, reaching the “Advanced eutrophication” state. Under these conditions the pH is slightly basic, which is manifested in a “Moderate risk” label for the 'fish risk indicator'. This represents only a risk situation, but the conditions necessary to trigger a situation of danger to fish life would exist if there were an increased concentration of ammonium, and this could occur at any time through an urban discharge. It has been observed in other case studies that, in a situation of “Slight eutrophication”, a major discharge of ammonium occurred but the 'fish risk indicator' did not reach high-risk values. The reason for this phenomenon is that, although there is a high concentration of ammonium, environmental conditions with normal pH do not facilitate the formation of ammonia. The results obtained for the 'fish risk indicator' suggest using the thresholds defined in Table 3 to characterize the detected events.

The developed 'fish risk indicator' is a novel and helpful tool for better environmental management, as it provides objective criteria for risk mitigation actions, such as forcing the dilution of nutrients to break the cycle of eutrophication by providing flow from reservoirs, thereby avoiding possible river wildlife mortality.

5. Conclusions
From the methodological point of view, algorithms based on fuzzy logic that allow the detection and identification of episodes of urban or WWTP discharge, eutrophication and fish risk have been developed. These algorithms have been obtained from the extraction of the experience and knowledge of experts in this field, both for the characterization of the phenomena and for the selection of the variables to use. The presented work demonstrates the ability of fuzzy-logic-based methods to synthesize complex information, interpretable only by a few experts, and translate it into indicators that are more understandable for environmental managers and the general public.

The developed expert system is based on data from measuring stations belonging to the SAICA networks. Using a few physical-chemical variables recorded continuously, the expert system is able to obtain indicators of water quality phenomena that may be associated, with a high probability of a cause-effect relationship, with human pressure on the water environment, such as urban discharges or diffuse agricultural pollution. In this sense, SAICA networks for continuous water quality measurement complement the official administrative networks of manual sampling and laboratory analysis by providing information about episodic events (discharges) or continuous processes (eutrophication) which can hardly be detected by discrete sampling.

The next steps are the extension of the indicator calculation methods to more water quality phenomena, in particular the follow-up of episodes downstream, and their application in other basins with different dynamic behavior.

Acknowledgments
The SAICA network data have been provided by the Guadiana Hydrographic Confederation, which has also contributed its experience and knowledge of the basin, needed to characterize the episodes. The work has been developed in the context of the project ECOWATCH, led by Adasa Sistemas S.A.U., which has received a grant (ref.
022/SGTB/2007/6.1) from the “Environmental projects of scientific research, technological development and innovation” programme of the Spanish Ministry of the Environment and Rural and Marine Affairs, under the National Program of Environmental Science and Technology.

References
European Parliament and Council, December 2000. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the field of water policy. Official Journal L 327, 1-73.
Hatzikos, E., Hätönen, J., Bassiliades, N., Vlahavas, I., Fournou, E., 2009. Applying adaptive prediction to sea-water quality measurements. Expert Systems with Applications 36 (3, Part 2), 6773-6779.
Lermontov, A., Yokoyama, L., Lermontov, M., Machado, M. A. S., 2009. River quality analysis using fuzzy water quality index: Ribeira do Iguape river watershed, Brazil. Ecological Indicators 9 (6), 1188-1197.
Nasiri, F., Maqsood, I., Huang, G., Fuller, N., 2007. Water quality index: A fuzzy river-pollution decision support expert system. Journal of Water Resources Planning and Management 133 (2), 95-105.
Serramià, A., 2005. Information systems and water quality in Spain. Ingeniería Química 420, 155-158, in Spanish.
Smith, V. H., Tilman, G. D., Nekola, J. C., 1999. Eutrophication: impacts of excess nutrient inputs on freshwater, marine, and terrestrial ecosystems. Environmental Pollution 100 (1-3), 179-196.

work_2yfpqgwysjgvtfekdhm5l462zm ---- A Fuzzy Expert System Architecture For Data And Event Stream Processing

HAL Id: cea-01803819, https://hal-cea.archives-ouvertes.fr/cea-01803819, submitted on 7 Jan 2019.
To cite this version: Jean-Philippe Poli, L. Boudet. A fuzzy expert system architecture for data and event stream processing. Fuzzy Sets and Systems, Elsevier, 2018, 343 (SI), pp. 20-34. doi:10.1016/j.fss.2017.10.005.

A Fuzzy Expert System Architecture For Data And Event Stream Processing
Jean-Philippe Poli, Laurence Boudet
CEA, LIST, Data Analysis and System Intelligence Laboratory, 91191 Gif-sur-Yvette cedex, France.

Abstract
The Internet of Things was born from the proliferation of connected objects and is known as the third era of information technology. It results in the availability of a huge amount of continuously acquired data which need to be processed to be more valuable. This leads to a real paradigm shift: instead of processing fixed data like classical databases or files, the new algorithms have to deal with data streams which bring their own set of requirements. Researchers address new challenges in the way of storing, querying and processing those data which are always in motion. In many decision making scenarios, fuzzy expert systems have been useful to deduce a more conceptual knowledge from data.
With the emergence of the Internet of Things and the growing presence of cloud-based architectures, it is necessary to improve fuzzy expert systems to support higher level operators, large rule bases and an abundant flow of inputs. In this paper, we introduce a modular fuzzy expert system which takes data or event streams in input and which outputs decisions on the fly. Its architecture relies on both a graph-based representation of the rule base and the cooperation of four customizable modules. Stress tests regarding the number of rules have been carried out to characterize its efficiency. Keywords: Fuzzy expert system, complex event processing, data stream ∗Corresponding author Email address: jean-philippe.poli@cea.fr (Jean-Philippe Poli, Laurence Boudet) Preprint submitted to Fuzzy Sets and Systems October 11, 2017 processing, rule base representation, policies 1. Introduction The emergence of connected objects and mobile devices gave birth to the Internet of Things and is leading towards a continuous data acquisition from different devices and sensors. Before this third era of information technology, the data were stored in data warehouse, queried at once and manipulated by5 algorithms as a whole. With such data in motion, the use cases have changed: for instance, new database management paradigms are introduced, special efforts are made on data compression to avoid networks overload, and supervised or unsupervised learning algorithms are rethought. Cugola and Margara [1] define the Information Flow Processing (IFP) do-10 main as the domain of tools capable of processing information as it flows. Usu- ally, the flow is coming from multiple sources and processed to extract relevant knowledge. They distinguish two subdomains:Complex Event Processing (CEP) and Data Stream Processing (DSP). Algorithms for processing such flows have to be fast and incremental [2] and are evaluated regarding two criteria: the15 number of passes over the stream (which must be as close as possible to 1) and the size of the workspace in memory [3]. On the one hand, DSP consists in processing data flows and in producing a new data flow as output. The Federal Standard defines a data stream as a “sequence of digitally encoded signals used to represent information in transmis-20 sion”. More formally, data stream can be defined [2, 3, 4] as a sequence of data items x1, . . . , xi, . . . , xn such that the items are read once in increasing order of the indexes i. A lot of tools have been introduced to process data streams. For instance, traditional database management systems, which work on persistent data, are replaced by data stream management systems whose queries run con-25 tinuously: anytime new data arrive, the result of a query is updated [5, 6]. Other researches mainly include data streams mining with either clustering methods [7] or neural networks [8]. In [9, 10], the authors are revealing the open chal- 2 lenges which must be addressed in the domain of data stream mining, including privacy issues, developing a methodology for stream preprocessing, developing30 online monitoring systems and balancing resources. On the other hand, CEP differs by the type of data items it considers: an item is a notification of event [11]. CEP are also associated with velocity: it aims at managing thousands of events per second [12], for instance, up to 125000 events for a financial software [13]. 
In this domain, processing mainly consists35 in filtering, gathering and combining those events to build a higher level infor- mation [14], to raise alerts or trigger processes. It makes use of the relationships which exist between events: indeed, events may be related in various ways (by cause, by timing, or by membership) [11]. Nowaday, DSP and CEP became usual considerations in many real world40 applications. We can cite for example: system monitoring and fault detection, home automation, security and finance [1]. However, in both data and event streams, the information may be incomplete and imprecise by nature [15]. For instance, sensors may be out of order or inaccurate, and data may be noisy. Fuzzy logic [16] has been specifically designed to mathematically represent un-45 certainty and vagueness and is a popular tool for dealing with imprecision in many real world problems. Taking advantage of fuzzy logic, fuzzy expert sys- tems allow to easily represent human knowledge about data and phenomena and have been successfully applied to many domains [17, 18]. Fuzzy expert systems often come with a higher computational cost compared50 with boolean logic expert systems. For instance, fuzzy inference needs to assess the whole rule base to compute the outputs, whereas in boolean expert systems, a subset of the rules are applied one by one to produce the inference. Whereas fuzzy inference involves simple operations and simple functions to lower the computational cost, it has been showed that fuzzy rule bases need to be more55 complicated if only piecewise-linear functions (e.g. trapezoids...) are used in- stead of non-linear membership functions (e.g. sigmoids...) [19]. Consequently, expensive functions in terms of computation are needed to assess the aggrega- tion and the defuzzification [20]. In addition, in real-world applications, it is 3 possible to have very large rule bases which require a great amount of processor60 time [21]. Moreover, to describe the relations between the data or the events, more sophisticated operators are needed for temporal [22, 23], spatial [24, 25] or even spatio-temporal [26] reasoning. These operators imply a higher com- putational cost. In addition, traditional fuzzy expert systems compute output values only when input values have changed. This is not compliant with event65 streams whose events are potentially arriving in an irregular manner: in such a case, expressions may change before the next event arrival (see section 3.3). Our work aims at developing a fuzzy expert system to process information flows, handling the imprecision brought by noisy data, sensors or network prob- lems with fuzzy logic. The motivation of our work is to provide an efficient fuzzy70 expert system in operational contexts. To enable human experts to author more complex rules, our system is able to efficiently assess complex fuzzy relations [23]. To ensure it can interface easily with the various information systems of our partners, we chose to avoid specific architectures (like GPU) and to develop a software for data and event stream processing on regular CPU platforms. Fi-75 nally, in industrial applications, the efficiency is important not only because there must be a lot of rules, but also because the rules can be applied to a huge number of objects or events per second. The paper is structured as follows: section 2 presents the related work about large rule base handling. Section 3 describes the architecture of our fuzzy expert80 system. 
Section 4 presents the implementation, and then the protocol and the results of experiments on both data and event streams. Finally, section 5 points out the conclusions. 2. Related work To address real-world applications, fuzzy expert systems have to be able to85 process large rule bases very fast, and thus face the well-known combinatorial explosion problem. Indeed, in case of a conjunctive combination of the terms of all the inputs (also known as grid fuzzy rule structure), the number of rules 4 grows exponentially with respect to the number of inputs. For instance, there are pn possible rules for n inputs with p terms each.90 Combs and Andrews [27] tried to solve this problem by proposing a rule construction schema based on the union of single-antecedent rules, called Union Rule Configuration. In this case, the number of rules evolves only linearly in function of the number of inputs. Mendel and Liang [28] contested this approach stating that the two rule bases may not be equivalent and suggest to inquire95 whether the replacement makes sense. Large rule bases are rarely given by human experts and are rather automat- ically inducted from datasets. Thus, automatic rule induction also faces the curse of dimensionality. To fix this issue, one can generate a partial rule base which does not cover the whole input space. Approaches based on input space100 partitioning by k-d trees [29] or quadtrees [30] result in nonuniform overlapping membership functions which are difficult to label with an understandable lin- guistic term. Jin [31] takes advantage of the conclusion drawn in [32] : optimal rules cover the extrema. The generated rule base is then checked for redundancy and potential inconsistency and optimized by a genetic algorithm. Hence, Jin’s105 algorithm generates 27 rules from a training set of 20000 examples described by 11 inputs. In [33], authors present S-FRULER a genetic fuzzy system for regression problem capable of learning rules from big datasets. First, a multi- granularity fuzzy discretization is applied on the whole dataset, which is then split into several partitions. Each partition is then treated as an independent110 problem, which allows distribution. The processing consists in selecting vari- ables randomly and then using genetic algorithms to induce rules. The rules are then combined into a unique rule base. Authors claim they obtain simple rule bases in terms of number of rules while maintaining a good precision. How- ever, the different steps do not guarantee to have an optimal rule base since115 random variable selection, genetic algorithms and rule combination can add a lot of biases. The interpretability of the rule base is not the only reason to decrease the number of rules: applying on information streams needs a fast processing of 5 the rules to make decisions on the fly. Fuzzy controllers have been introduced120 to overcome these drawbacks and are able to process inputs in real-time [34]. More recently, papers address the acceleration of fuzzy computation either with dedicated hardware [35] or with the help of Graphics Processing Units (GPU) [36]. However, to our experience, fuzzy expert softwares which run on classic CPU platforms are more convenient for many reasons. Firstly, they are easier to125 interface with an existing system than electronic chipsets. Then, DSP and CEP both rely on software intensive architectures. 
Moreover, in terms of scalability, it is possible to use from a single core of a machine to several machines and it can all be done transparently for the user ; for instance, it can take advantage of the virtualization as in cloud-based services.130 The next section describes a suitable architecture for general-purpose fuzzy expert systems in which the problem of the rule base size in terms of computa- tion speed is addressed by eliminating redundancy in the assessment. We also distribute the different roles like gathering inputs, computing and announcing results in different modules to be able to process information streams.135 3. Architecture description Fuzzy expert systems can infer regarding two main paradigms: • Mamdani type, in which rule conclusions are fuzzy set; • Takagi-Sugeno type, in which rule conclusions are a function of the inputs, i.e. a crisp value.140 Whatever the type of inference, when a group of inputs change at a time t, all the rules containing at least one of those inputs have to be reevaluated. In information streams, inputs may change several times per second, or rules must be applied on thousands of incoming events per second ; the evaluation of the whole rule base may thus need a huge computation time. The distribution of145 the rule base is not always a good solution: for instance, monitoring of vehicles 6 or of patients need to process the same rule base over different input streams. However, the evaluation for each vehicle or each patient can be concurrent. In this article, we introduce an architecture which tends to avoid the system saturation. In the remainder, without loss of generality, the examples will be150 given for a Mamdani type inference system. 3.1. Architecture overview Figure 1 presents the overview of the proposed architecture. The modularity is ensured by a separation of the tasks and a customization provided by the use of policies. A policy is a set of parameters which customize the behavior of each155 module. The combination of the behaviors of all the modules enable to address a lot of applications and issues : regular or irregular data rate, delay before inference, etc. The architecture is composed of several modules : • the active input queue gathers and groups the inputs by timestamps, • the scheduler monitors the system (via the operating system) and to160 decide which inputs group has to be processed, • the evaluator is in charge of the evaluation of the rules, • the output change broadcaster informs the user about outputs changes. The different modules help avoiding a system overload (for instance, the ac- tive input queue selects the inputs which should be treated) or user overfeeding165 (for instance, the output change broadcaster displays only the relevant infor- mation). We first introduce how we optimize the rule base representation by common subexpression elimination and the concept of expiration of expressions. We then describe each module of the architecture and give some examples of policies.170 3.2. Rule base representation The rule base in-memory model plays a major role in the efficiency of the fuzzy expert system. Expressions are usually modeled with a tree [37], as in 7 Active input queue Scheduler Evaluator Output change broadcaster Rule base Input stream Output stream observes injects uses calls notifies notifies notifies Figure 1: Architecture overview. Figure 2(a). 
However, some expressions can be included in several rules or other expressions: thus, in a tree representation, it is difficult to check the175 redundancy of such expressions, and it is necessary to compute them several times when a group of inputs changed. This problem is known as common subexpression elimination (CSE). To address the CSE problem in our architecture, we chose to represent each expression by a unique node: thus, the rule base is not represented by a tree180 anymore but by a graph (figure 2(b)). More precisely, we use an acyclic directed graph to avoid loops during the evaluation. In the graph, an edge A −→ B means that if the value of the node A changes, it affects the node B and B has to be evaluated again. In this case, B is called a direct successor of A. In the graph we are considering, a node may have several direct successors. A node185 can represent fuzzy expressions (including fuzzy propositions) or rules, and we consider particular nodes for defuzzification and aggregation. Thus, the changes propagate from input nodes to output nodes. The propagation stops if there are no changes during the evaluation of the current node. The propagation is achieved as particular breadth-first traversal of the graph.190 However, for a fuzzy relation of cardinality n, it is necessary to assess its n 8 Input X Input Y X is A Y is B Input Z Z is C (X is A AND Y is B) OR NOT Z is C IF (X is A AND Y is B) OR NOT Z is C THEN O is D1 (X is A AND Y is B) OR Z is C (X is A AND Y is B) OR Z is C THEN IF O is D2 Aggregation output O Defuzzification output O X is A AND Y is B NOT Z is C Z is C Input Z Input X Input Y X is A Y is B X is A AND Y is B (a) Tree-based representation Input X Input Y X is A Y is B Input Z Z is C NOT Z is C (X is A AND Y is B) OR NOT Z is C IF (X is A AND Y is B) OR NOT Z is C THEN O is D1 (X is A AND Y is B) OR Z is C (X is A AND Y is B) OR Z is C THEN IF O is D2 Aggregation output O Defuzzification output O X is A AND Y is B (0)(0) (0) (1)(1)(1) (2) (2) (3)(3) (4) (4) (5) (6) (b) Graph-based representation Figure 2: Representations of a base of two Mamdani-type rules. predecessors before its own evaluation, otherwise it would be evaluated n times, and worst, at a certain time, its value would be inconsistent. To avoid this effect, we added a priority information to the nodes. Before starting the fuzzy inference engine, the graph is traversed and a recursive function priority : Node →195 integer is applied. Let N be the current node to be treated, the function priority is defined as follow: • if N is an input node, then priority(N) = 0, • otherwise, let si be the direct successors of N: priority(N) = maxi(priority(si)) + 1.200 Let X, Y and Z be three input linguistic variables, and A, B, C a term from respectively X, Y , Z. Let D1 and D2 be two terms of an output linguistic variable O. Then, the rule base is composed of two rules: • IF (X is A AND Y is B) OR NOT Z is C THEN O is D1, • IF (X is A AND Y is B) OR Z is C THEN O is D2.205 In Figure 2(b), numbers in brackets represent the evaluation priority of each node; the three inputs are at the bottom of the figure and have a null priority, 9 which means they need to be evaluated first. We will develop in section 3.6 the use of the priority during evaluation. To the best of our knowledge, current fuzzy expert system does not imple-210 ment CSE. This is due to the fact that they only use classical fuzzy logic opera- tors which are really fast to compute. 
For instance, t-norms and t-conorms use arithmetic operators and binary comparisons (Zadeh’s operator min and max), whose complexity is O(1), whereas most of temporal operators complexity is in O(l) where l is a number of samples used [23] and most of spatial operators are215 at least in O(n × m) where n × m is the dimension of the image [24]. We can easily assess the number of nodes in the two representations of the rule base. Let Nt be the number of nodes in the tree-based one and Ng in the graph-based one : Nt(n, p) = n︸︷︷︸ inputs + 1︸︷︷︸ aggregation + 1︸︷︷︸ defuzzification + pn︸︷︷︸ rules ( p︸︷︷︸ propositions + (n − 1)︸ ︷︷ ︸ conjonctions + 1︸︷︷︸ implication ) (1) Ng(n, p) = n︸︷︷︸ inputs + n × p︸ ︷︷ ︸ propositions + n∑ i=2 pi ︸ ︷︷ ︸ conjonctions + pn︸︷︷︸ implications + 1︸︷︷︸ aggregation + 1︸︷︷︸ defuzzification (2) Using Landau notations, these equations show the number of nodes is asymp-220 totically O((p + n) · pn) for the tree-based rule base, and O(pn) for the graph- based one. 3.3. Expiration Among the sophisticated relations we have implemented, temporal operators [23] and those which derived from them need a special attention when applied on225 event streams. The particularity of event streams is that the system is noticed of events irregularly. For instance, let us consider the fact “the temperature was 10 too hot on Monday from 2 am to 3 am”. The system has received two events: at 2am, a temperature high enough to activate the fuzzy proposition “the tem- perature is too hot”, and at 3am, a lower temperature such as “the temperature230 is too hot” is false. Now, we consider the temporal operator “occurrence” from [22] which indicates that a phenomenon has occurred on a certain scope in the past: for instance, it can express that “the temperature was too hot during the last 24 hours”. Until the next Tuesday 3 am, the degree of truth of this occurrence is strictly greater than 0. After 24 hours, its degree of truth equals235 0, whereas the system inputs have not changed since Monday 3 am. Classical fuzzy expert systems cannot perform this trick since they need that inputs change to compute the outputs. We thus introduce in our system the notion of expiration. Some expressions in the rule base are represented by special nodes in the rule base graph, which are marked as “expirable”. After being240 evaluated, the expirable nodes signal to the scheduler they must be evaluated again after a certain delay (see section 3.5). If an input which is connected to an expirable node changed before this delay, the expiration is simply postponed by the scheduler. Thus, expirable components must provide an expiration frequency and a set245 of criteria to stop the expiration. The expiration frequency is a parameter which depends on the application and the set of criteria depend only on the definition of the operator. For instance, in home health patient monitoring applications, expressions usually expire every day for symptoms persistence, because doctors cannot have alerts that change too frequently, whereas in control applications,250 the expiration rate is usually less than 1 second to have a smooth behavior. More details can be found in [23]. 3.4. Active input queue Sensor networks are a particular case of information stream. Some sensors measure (data stream) and some others detect (event stream), but they usu-255 ally work in an asynchronous way. Moreover, some delays can appear in such networks. The active input queue is thus in charge of several steps before the 11 engine can process them. 
Firstly, it listens to the information stream to fetch the interesting values it contains. Then, it groups the input values by timestamp and enqueue. Finally,260 it signals the scheduler that a new group has been enqueued. Different policies can be conceived for this component. For instance, in some applications, it is necessary to wait for delayed sensors or delayed network packets before signaling the scheduler. Conversely, it can ignore delays and late arrivals, and thus filter these data. It may also be seen as a firewall which265 protects the scheduler from irrelevant inputs. 3.5. Scheduler The scheduler has an important role to play to limit the delay between the arrival of the data and the decision making. When a new input set is announced, it decides, regarding its own policy, whether it is important to compute it im-270 mediately, later or not at all. In the simplest configuration, the scheduler just fetches the first element in the active input queue, asks the evaluator to assess this group of inputs and gives the results to the broadcaster. With the use of policies, his behavior can be more sophisticated. For instance, one particular configuration can monitor the275 system to determine how busy the CPU cores are and to decide whether a group of inputs can be skipped. Moreover, the scheduler implements the expiration. All the expirable components of the rule base whose evaluation has changed are placed in another queue, waiting to expire. Another configuration may consist in evaluating on different processor cores280 of the machine. Each core receives a sub-part of the input set. A simple al- gorithm based on the graph representation of the rule base is used to separate independent inputs on different sub-parts: this may simply be achieved by find- ing connected components of graph with well-known algorithms of graph theory [38].285 12 3.6. Evaluator The evaluator is the component which evaluates the different expressions and rules in the rule base. For a set of inputs, it gives a particular set of outputs. It also takes advantage of the rule base representation to perform the computation only when necessary.290 To compute the different nodes of the graph representing the rule base, the evaluator traverses the graph in a certain order. To ensure the right order, we use a priority queue Q. The priority queue Q places the nodes with the lowest priority at the front of the queue and can contain each node at most once. Algorithm 1 presents the general evaluation algorithm. The algorithm takes295 four parameters: I, a dictionary which maps each input which has changed to its value, E, a set of nodes which has expired and must be evaluated again, M, a dictionary which maps each node of the graph to its value, and finally G, the rule base graph. Eventually, I or E can be exclusively empty: when I is empty, the procedure is called just to evaluate the expired nodes, whereas when E is300 empty, it is called to evaluate the consequences of the input change. The assess function takes the node to evaluate and M : it fetches the operands values in M and applies the operator, then stores the value of the current node in M and finally returns false if the value of the current node has not changed, true otherwise.305 In figure 2(b), the priority queue ensures the node “(X is A AND Y is B) OR Z is C” is evaluated at the right time. It ensures that if several paths lead to the same node N, all nodes on the paths are assessed before N. 
In fuzzy logic, different functions can be used for operators (conjunction, disjunction, negation, implication, aggregation, defuzzification) evaluation. The310 policies of the evaluator indicate which version of the operators must be used. 3.7. Output change broadcast The broadcaster is also an important module because it is in charge of build- ing the output stream. The last step is indeed to inform on the fly the calling 13 Algorithm 1 Evaluation of the rule base graph Inputs: � I: dictionary mapping nodes and values representing inputs which have just changed � E: set of expired nodes � M: dictionary mapping nodes and values resulting of the previous evaluation � G: rule base graph /* Initializes the priority queue Q */ Q ← ∅ /* Adds changed inputs into Q and update memory */ for all pair 〈node, value〉 in I do Q ← priority enqueue(Q, node, priority(node)) M ← M ⋃ 〈node, value〉 end for /* Adds expired nodes into Q */ for all node n in E do Q ← priority enqueue(Q, n, priority(n)) end for /* Rule base graph browsing */ while Q �= ∅ do current ← priority first(Q) Q ← priority dequeue(Q) if assess(current, M) then for all n such as n ∈ successors(current, G) do Q ← priority enqueue(Q, n, priority(n)) end for end if end while return M 14 system or the user that some outputs have changed. The policies are used to315 determine when and how the outputs have to be broadcast. For instance, the changes can be gathered and sent at regular time intervals or only outputs which have changed are broadcast with their new values. In a more verbose style, the changes can be sent with a trace of the activated rules. It may gather information from the graph and the evaluation of its node to320 build justifications (to explain why the decision has been made). The next section introduces implementation considerations and shows some experiments we achieved to characterize the performances of the software on both data and event stream processing. 4. Implementation and experiments325 The software has been developed in C# as an API (Application Program- ming Interface). This language is not compiled as C++, but it offers a good compromise between efficiency and the ease of interfacing with other systems (for instance by webservices) and runs on different platforms without recompil- ing.330 The succession of the nodes evaluation has been implemented without any optimization regarding the current fuzzy inference methods : the software per- forms all the steps needed in Takagi-Sugeno or Mamdani type inferences. We wanted to be as general as possible to be able to take into account future or experimental inference methods. Moreover, all calculations involved in the dif-335 ferent inferences (integrals, curves intersection, ...) are processed analytically, in opposition with, for instance, Matlab which represents a membership function by a given number of points. In terms of memory usage, for now, all the rules are kept in memory at all time. We also have to store the values of the different nodes of the rule base.340 To implement efficiently expiration, which affects only some temporal operators [23], the values of expirable expression operands are stored in memory regarding a temporal scope to allow partial recalculation. Indeed, it is necessary to find 15 a good compromise between efficiency, scalability and flexibility regarding our industrial partners and our needs to experiment new operators and inference345 methods. Implementation is still a work in progress to improve performances. 
For instance, from the results in [39], we improved the efficiency of the hash function and some data structures and divided by up to 170 the computation time of large rule bases. From the results in [40], we also improved memory usage in order350 to load larger rule bases. Without loss of generality, we have then experimented the system in two different ways: either with a data stream or an event stream. On one hand, to test the ability of the system to process data streams, we measured the time to process rule bases, varying the number of inputs and terms to increase the355 number of rules at each test. On the other hand, we used an event simulator to test the event stream processing capabilities, which is used to benchmark CEP softwares. It is of course possible to address both event streams and data streams with the same rule base. In the first series of experiments, all the inputs are changing at each time360 whereas in the second series of experiments, few inputs change simultaneously to reveal the potential of our system in real world applications of stream processing. These tests have been processed on only one core of an Intel Xeon X5650 at 2.67GHz on a Windows server and 42GB of RAM to process several tests in parallel without interfering.365 4.1. Data stream experiment This experiment aims at comparing the performances of the evaluation of different rule bases regarding two different policies of the evaluator module : • full recalculation mode : all the expressions and nodes are reassessed each time an input changes,370 • partial recalculation mode : last values of the nodes are kept in memory and are reassessed only when needed, 16 and different rule base representations: • tree-based representation (without CSE), • graph-based representation (with CSE).375 The graph based representation with partial recalculation is the default be- havior of our system. Its modularity, through the use of policies, allows to easily switch between the different modes. The full recalculation mode is obtained by clearing M before each call of the evaluation procedure (algorithm 1) whereas the partial recalculation mode stores in memory the dictionary M.380 4.1.1. Protocol These experiments have been carried out on artificial rule bases and data sets whose generation is described hereafter. Let {vi}1≤i≤n be n input linguistic variables, each defined by p terms T 1i , . . . , T p i . Let w be an unique output linguistic variable whose terms are W1, ..., WK. Those input variables combine385 into rules by the full conjunctive combination principle : IF v1 is T l1 1 and . . . and vn is T ln n THEN w is Wk where T lii refers to a term of vi with 1 ≤ li ≤ p and k = ∑n i=1 li − n + 1. Thus, for a given couple (n, p), there are pn possible combinations of those inputs (i.e. rules) and w has K = n(p − 1) + 1 terms.390 For the sake of simplicity, the terms T lii of each variable vi are defined by triangular membership functions on the domain [0, p + 1]. By construction, the support of each term T lii is [li − 1; li + 1] and its kernel is {li}. The same construction is used for the terms Wk of w. Figure 3 shows an example of a linguistic variable characterized by 3 terms.395 Each input variable vi receives a data stream of 20 values, which have been generated following an uniform distribution U([0, p + 1]). The architecture has been configured as follows: the active input queue is set in DSP mode, i.e. it waits to receive a value for each input. 
The scheduler evaluates this group as soon as possible, then the new value of the output is400 17 0 1 2 3 4 0 0,5 1 Variable 1 Term 1 Term 2 Term 3 Figure 3: Linguistic variable with 3 terms defined on the domain [0, 4]. broadcasted. This is the most simple configuration of these modules. The dif- ferent modes of evaluation of the architecture have been obtained by configuring the policy of the evaluator : in one case, it uses its memory functionality; in the other case, it has to compute all the values of the nodes again. The same input data streams have been used for both cases.405 By making both the number of inputs n and the number of terms p vary from 2 to 10, we are able to assess the performance of the architecture on large rule bases and to draw some conclusions. Due to the computational cost, the largest configuration was obtained with 7 input variables and 9 linguistic terms with the graph-based representation (4782969 rules), whereas the tree-based410 representation allows to reach 9 input variables and 4 terms (262144 rules). Even if these are not realistic cases, it is useful to benchmark the proposed system. 4.1.2. Results In this section, we first compare the average number of nodes being reevalu-415 ated in each mode and then compare the average evaluation time of the different rule bases. The averages are computed over the 20 values of the data stream to decrease the possible biases. Figure 4 represents the number of nodes to be evaluated in each configuration and figure 5 represents the computation times regarding the number of rules.420 These figures show the results for the full and partial recalculation modes and 18 for the tree-based and graph-based representations of the rule base in log-scale. We can see that for the graph-based rule bases, both the computation time and the number of nodes are linear regarding the number of rules whereas for tree- based rule bases, it is weakly exponential. Point clouds in figure 5 confirm the425 intuition: storing the value of each node allows to stop propagating the changes, and strongly decreases the number of nodes to evaluate. For a rule base with 6 input variables and 8 terms (262144 rules), 1572920 nodes may be evaluated in the full recalculation mode with tree-based rule base and 561784 with a graph, whereas in the partial one, only 35248 nodes in average are evaluated for trees430 and 275 for graphs. 1,E+00 1,E+01 1,E+02 1,E+03 1,E+04 1,E+05 1,E+06 1,E+07 1,E+08 1,E+00 1,E+02 1,E+04 1,E+06 Partial recalculation mode (Graph) Full recalculation mode (Graph) Partial recalculation mode (Tree) Full recalculation mode (Tree) Number of rules Av er ag e nu m be r o f r ee va lu at ed n od es Figure 4: Average number of nodes to be evaluated for different rule base sizes in the all four modes (log-scale for both axes). In particular, figure 5 shows the duration of the evaluation of the rule bases in full and partial modes with graph-based rule bases. With the same data streams as before, to evaluate 262144 rules, full recalculation mode needs almost 2s whereas the partial one needs only 300ms, i.e. the latter one is more than 6435 times faster than the former one on this rule base structure. 
19 1,E-05 1,E-04 1,E-03 1,E-02 1,E-01 1,E+00 1,E+01 1,E+02 1,E+03 1,E+00 1,E+01 1,E+02 1,E+03 1,E+04 1,E+05 1,E+06 1,E+07 Partial recalculation mode (Graph) Full recalculation mode (Graph) Partial recalculation mode (Tree) Full recalculation mode (Tree) Number of rules Av er ag e co m pu ta tio n tim e (s ec ) Figure 5: Average computation time in seconds for different rule base sizes in all four modes (log-scale for both axes). 4.1.3. Discusssion The drastic reduction of the number of nodes to be evaluated can be ex- plained by a theoretical analysis. The fuzzy partitions used to create the terms of the linguistic variables explain why, at each time, for each variable, at most 2440 terms out of p are activated. Thus, at most Ng(n, 2) nodes (equ. 2) have to be evaluated: for n = 6, at most 208 nodes will be activated. But a large number of them are null because of the conjunctive combination of the inputs. Now, to count the number of needed reevaluations, we should consider the worst case : all the active elementary propositions become null, and the same number of445 propositions get a non-null value. This gives 2×Ng(n, 2) as a pessimistic upper bound of the number of nodes that need to be reevaluated. It seems that saved computational time is not as high as we could expect considering the saved computations shown just before. Indeed, saved compu- tations correspond to the evaluation of a null value by a quite simple function450 (mainly either by the membership function evaluation or by a conjunctive com- bination of two expressions) and its affectation to nodes that were already null. 20 We will see in section 4.2 that partial recalculation mode is really helpful in event stream processing. These tests are good stress tests because all the inputs change at the same455 time. For rule bases of conventional sizes, for instance 343 rules, the engine needs less than 0.5ms in the partial recalculation mode. Thus, we can handle inputs which change more than 2000 times per second on only one core. 4.2. Event stream experiment Streaming events poses some challenges to the design of a benchmark. As460 described in [41], the authors introduce “Linear Road”, a benchmark for event stream processing systems which matches with many requirements of such bench- marks. For instance, we cite the two most important requirements in our case: • the generated data must have a semantic validity, • the benchmark must be verifiable (even if the streams may vary depending465 on the moment they are generated). The linear road benchmark is inspired from the problem of variable tolling (also known as congestion pricing), i.e. the computation of tolls that vary according to different factors such as congestion levels and accident proximity [42]. The input data are generated by the MIT Traffic Simulator (MITSIM) [43].470 The benchmark consists in different challenges for stream data management systems, from historical queries to tolls assessment. In our case, we have only implemented one challenge out five: the detection of car accidents. The other challenges are not appropriate for a fuzzy expert system and need the capability to query historical data or an aggregated view of the expressway.475 4.2.1. Previous work In the following experiment, we used the “strictly persists” temporal oper- ator described in [23] and which evaluates how much a phenomenon persists during a given scope. 
In our case, the scope is fuzzy since some moments in the past are considered as more important.

Figure 6: Parameters used for the linear road benchmark. (a) Membership function for the fuzzy scope "the last 2 minutes"; (b) membership function of a very short distance.

Figure 6(a) shows the membership function used to define the scope: it considers the values from 2.5 minutes to 2 minutes before the present as more and more important, and moments from 2 minutes to the present as important. This fuzzy scope allows testing a representative case for the temporal operator in terms of computational cost.

4.2.2. Protocol
We generated data with the MITSIM simulator [43] in a flat file which represents a simulation of 6 minutes and 40 seconds, involving 5953 cars. We then select only a given number of cars inside this simulation. The data consist of car positions, emitted every 30 seconds on a bidirectional expressway. Each direction is composed of 3 travel lanes and one exit and one entrance ramp. The position is given by the direction, the lane number and an integer that represents a 1-D coordinate on a specific lane. In the original benchmark, a car accident is detected if at least two cars report the same position at least four times. More details can be found in [41] and [43]. We developed one program to play the simulation in real time and another one to check whether the accident notifications are true with respect to the simulation. To make use of the expiration mechanism, we send the position of a car only if it changes from its previous position. To match a fuzzy problem, we changed the benchmark rules. Firstly, we used a fuzzy definition of the distance, to take into account the GPS inaccuracy and the size of the car: figure 6(b) shows a narrow Gaussian membership function which defines a short distance. Then, we decided to use the "strictly persists" temporal operator, described in section 4.2.1, and to develop two new nodes for the rule graph:
• a distance node, which simply computes the absolute value of the difference between two input values,
• a "same lane" crisp node, which is boolean and indicates whether two cars are on the same lane and in the same direction.
For each unique pair of distinct cars (car_i, car_j), we wrote the two following rules:
• IF (car_i is on the same lane as car_j AND distance between car_i and car_j IS very short) STRICTLY PERSISTS during at least the last 2 minutes THEN Accident(i, j) IS true,
• IF NOT ((car_i is on the same lane as car_j AND distance between car_i and car_j IS very short) STRICTLY PERSISTS during at least the last 2 minutes) THEN Accident(i, j) IS false.
For n cars on the expressway, we thus have n² − n pairs of rules. When one car emits a position, it affects 2 × (n − 1) rules which may be reevaluated. In particular, for the "strictly persists" operator, the different values of the operand are kept in memory during the last 2.5 min as a compressed signal: i.e., as we receive at most one position every 30 s for each car, the signal for each pair of cars contains at most 10 values. However, we do not store the past values of the car positions.
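The exact rule syntax of the engine is not given, so the following is only a minimal sketch with hypothetical helper names. It shows the two new node types (absolute distance and crisp same-lane test) and the enumeration of the pairwise rules, which amounts to 2(n² − n) rules in total (4900 for 50 cars, matching Table 1); the "strictly persists" part is kept symbolic here.

```python
# Minimal sketch (hypothetical names, not the engine's real API) of the two
# ad-hoc nodes and of the pairwise rule enumeration for accident detection.
import math
from itertools import permutations


def distance_node(pos_i: float, pos_j: float) -> float:
    # Absolute difference of the two 1-D lane coordinates.
    return abs(pos_i - pos_j)


def same_lane_node(car_i: dict, car_j: dict) -> bool:
    # Crisp (boolean) test: same lane and same direction.
    return car_i["lane"] == car_j["lane"] and car_i["direction"] == car_j["direction"]


def very_short(distance: float, sigma: float = 5.0) -> float:
    # Narrow Gaussian membership function for a "very short" distance.
    # The width sigma is an assumption; figure 6(b) only shows the general shape.
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def build_rules(car_ids):
    """Enumerate the two rules per ordered pair of distinct cars.

    The 'STRICTLY PERSISTS during the last 2 minutes' part is kept symbolic;
    in the real system it is a temporal operator over a fuzzy scope [23].
    """
    rules = []
    for i, j in permutations(car_ids, 2):          # n * (n - 1) ordered pairs
        antecedent = (f"(car{i} same lane as car{j} AND distance IS very short) "
                      f"STRICTLY PERSISTS 2 min")
        rules.append((antecedent, f"Accident({i},{j}) IS true"))
        rules.append((f"NOT ({antecedent})", f"Accident({i},{j}) IS false"))
    return rules


if __name__ == "__main__":
    print(len(build_rules(range(50))))   # 2 * (50**2 - 50) = 4900 rules, as in Table 1
```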
During this test, we used the following configuration:
• the input queue groups inputs by timestamps with no delay,
• the scheduler processes on one core,
• the evaluator accepts temporal operators and uses the Zadeh norm and conorm,
• the broadcaster indicates the value of an output only when it changes,
• the persistence expires every ten seconds.

Number of cars | Simultaneous events (min / avg / max) | Number of rules | Response time (ms) | Total computation time (s)
50   | 1 / 4.7 / 9    | 4900    | 9.38   | 1.8
100  | 2 / 5.6 / 9    | 19800   | 19.22  | 5.8
250  | 2 / 8 / 20     | 124500  | 317.57 | 5.8
400  | 2 / 12.4 / 24  | 319200  | 806.51 | 21.9
500  | 2 / 15.2 / 24  | 499000  | 836.56 | 40.1
750  | 2 / 21.7 / 37  | 1123500 | 847.53 | 40.5
1000 | 2 / 27.2 / 43  | 1998000 | 862.6  | 41.5
1100 | 2 / 29.2 / 48  | 2417800 | 912.37 | 41.9
1200 | 2 / 31 / 50    | 2877600 | 880.11 | 41.4
1300 | 2 / 32.5 / 53  | 3377400 | 808.65 | 37.2
Table 1: Results of the partial linear road benchmark performed on a scenario of 6'40"

When an output changes from false to true, we record the time and the value to indicate that an accident occurs. Note that in the case where more than two cars are involved in the accident, more than one output will change: for instance, if an accident occurred between car1, car2 and car3, we will be notified that an accident happened between car1 and car2, car1 and car3, and car2 and car3 (thus, 3 notifications for the same accident).

4.2.3. Results
Table 1 shows the results of the car accident detection in the linear road benchmark. We iterated the tests with different numbers of cars, from 50 to 1300. The table characterizes the number of simultaneous inputs that change during the simulation by the minimum, the average and the maximum values. It then shows the number of involved rules, the response time, i.e. the average delay to evaluate the rule base in milliseconds, and finally the total computation time during the simulation. For instance, with 1300 cars, on average 32.5 cars report their positions simultaneously, and 808.65 ms were necessary to tell whether an accident happened or not. The total computation time amounts to only 37.2 s of the 401 s of simulation. Thus, the system was evaluating the rules during less than 10% of the total duration of the simulation. The results show that, thanks to the partial recalculation mode, the number of rules has little impact on the response time. Naturally, the latter is more affected by the number of simultaneous events.

4.2.4. Discussion
In this benchmark, given the current state of the software, we were limited by two main factors. First, the number of simultaneous events is always low because of the nature of the simulation. Indeed, to the best of our understanding, to increase the number of simultaneous events in MITSIM, we have to increase the number of cars. However, the number of rules grows too fast with the number of cars. This leads to the second limitation: the construction of such large rule bases. We have not been able to deal with more than 1300 cars because of the time needed to create the rule base. Indeed, in our architecture, to apply common subexpression elimination (CSE) on rule bases, we need to check whether a subexpression has been created before. Even with hashing functions, this step implies a certain amount of computation. Despite this cost, we must stress that without CSE we could not evaluate such large rule bases. To better handle the linear road benchmark, systems may consider dynamic rule bases (when a new car appears on the expressway, a new set of rules is added, and they are removed whenever the car disappears) or filter the rules to evaluate (e.g. the rules concerning two cars are created only if the two cars are close enough).
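The engine's API for adding and removing rules at runtime is not described, so this is only a minimal sketch of the dynamic rule-base idea suggested above, with hypothetical names: the pairwise rules for a car are created when it becomes active and discarded when it leaves the expressway.

```python
# Minimal sketch (hypothetical API, not the described system) of a dynamic rule
# base: pairwise accident rules are added when a car first reports a position
# and removed when the car disappears, so only rules for active cars are kept.
class DynamicRuleBase:
    def __init__(self):
        self.active_cars = set()
        self.rules = {}                      # (i, j) -> list of rule strings

    def _pair_rules(self, i, j):
        antecedent = (f"(car{i} same lane as car{j} AND distance IS very short) "
                      f"STRICTLY PERSISTS 2 min")
        return [f"IF {antecedent} THEN Accident({i},{j}) IS true",
                f"IF NOT ({antecedent}) THEN Accident({i},{j}) IS false"]

    def car_appeared(self, car_id):
        for other in self.active_cars:
            self.rules[(car_id, other)] = self._pair_rules(car_id, other)
            self.rules[(other, car_id)] = self._pair_rules(other, car_id)
        self.active_cars.add(car_id)

    def car_disappeared(self, car_id):
        self.active_cars.discard(car_id)
        for key in [k for k in self.rules if car_id in k]:
            del self.rules[key]


rb = DynamicRuleBase()
for car in range(3):
    rb.car_appeared(car)
assert len(rb.rules) == 3 * 2              # ordered pairs of the 3 active cars
rb.car_disappeared(1)
assert len(rb.rules) == 2                  # only the pair (0, 2) remains
```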
The goal of this experiment was to study the behavior of the system on large rule bases for event streams. If we wanted to address this benchmark fully, we should use a multi-core scheduler and higher-level rules, like "FOR ALL unique pairs of cars (x, y), IF ... THEN ...": we would thus have only two rules in memory for all the pairs of cars, while keeping the performance of the inference engine.

5. Conclusion
In this paper, we have presented a modular architecture for a fuzzy expert system designed to handle information streams (data streams or event streams). The architecture relies on two aspects. Firstly, the graph representation of the rule base indicates the dependencies between inputs, expressions, rules and outputs. More generally, it indicates what must be computed and in which order. Secondly, the use of four cooperating modules makes it possible to filter inputs and to decide when it is possible to process a set of inputs. The introduction of policies in the four modules allows their behaviors to be customized for the addressed projects or issues. Moreover, the flexibility of the rule base representation has been shown by the addition of two ad-hoc types of node in the graph (the "distance" node and the "same lane and direction" node). The described architecture has been implemented and used in several industrial projects in different domains: home automation, decision making in industry and home care services. All projects needed to process either data streams or event streams, sometimes both at the same time. Uncertainty and imprecision are real-world challenges, but others emerge. The different experiments presented in this paper show some limitations of the current system: the usage of only one core of the processor and the necessity to load a potentially huge number of rules. Considering CEP and several thousands of inputs per second, we should parallelize the computations. We should also consider higher-level rules that could be applied efficiently to a lot of inputs while keeping the performance of the inference engine. Moreover, users need more fuzzy relations to be able to describe their scenarios or to characterize what they want to extract from the streams. Finally, online rule base optimization will allow users to sketch first rules and then let the system evolve.

References
[1] G. Cugola, A. Margara, Processing flows of information: From data stream to complex event processing, ACM Comput. Surv. 44 (3) (2012) 15:1–15:62.
[2] J. A. Silva, E. R. Faria, R. C. Barros, E. R. Hruschka, A. C. P. L. F. de Carvalho, J. Gama, Data stream clustering: A survey, ACM Comput. Surv. 46 (1) (2013) 13:1–13:31.
[3] M. R. Henzinger, P. Raghavan, S. Rajagopalan, Computing on data streams, in: J. M. Abello, J. S. Vitter (Eds.), External Memory Algorithms, American Mathematical Society, Boston, MA, USA, 1999, Ch. Computing on Data Streams, pp. 107–118.
[4] S. Guha, A. Meyerson, N. Mishra, R. Motwani, L. O'Callaghan, Clustering data streams: Theory and practice, IEEE Trans. on Knowl. and Data Eng. 15 (3) (2003) 515–528.
[5] A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, J. Widom, Stream: The Stanford data stream management system, Technical Report 2004-20, Stanford InfoLab (2004).
[6] S. Chandrasekaran, O. Cooper, A. Deshpande, M. J. Franklin, J. M. Hellerstein, W. Hong, S. Krishnamurthy, S. R. Madden, F.
Reiss, M. A. Shah, Telegraphcq: Continuous dataflow processing, in: Proceedings of the 2003620 ACM SIGMOD International Conference on Management of Data, SIG- MOD ’03, ACM, New York, NY, USA, 2003, pp. 668–668. [7] M. Khalilian, M. Norwati, Data stream clustering: Challenges and issues, Proceedings of International Multi Conference of Engineers and Computer Scientists (2010) 566–569.625 [8] D. Cardoso, M. De Gregorio, P. Lima, J. Gama, F. França, A weight- less neural network-based approach for stream data clustering, in: H. Yin, J. Costa, G. Barreto (Eds.), Intelligent Data Engineering and Automated 27 Learning - IDEAL 2012, Vol. 7435 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 2012, pp. 328–335.630 [9] G. Krempl, I. Žliobaite, D. Brzeziński, E. Hüllermeier, M. Last, V. Lemaire, T. Noack, A. Shaker, S. Sievi, M. Spiliopoulou, J. Stefanowski, Open chal- lenges for data stream mining research, SIGKDD Explor. Newsl. 16 (1) (2014) 1–10. [10] S. Muthukrishnan, Data streams: Algorithms and applications, in: Pro-635 ceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’03, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2003, pp. 413–413. [11] D. C. Luckham, The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems, Addison-Wesley Longman640 Publishing Co., Inc., Boston, MA, USA, 2001. [12] E. Wu, Y. Diao, S. Rizvi, High-performance complex event processing over streams, in: Proceedings of the 2006 ACM SIGMOD International Con- ference on Management of Data, ACM, New York, NY, USA, 2006, pp. 407–418.645 [13] M. Stonebraker, U. Çetintemel, S. Zdonik, The 8 requirements of real-time stream processing, SIGMOD Rec. 34 (4) (2005) 42–47. [14] E. Alevizos, A. Skarlatidis, A. Artikis, G. Paliouras, Complex event process- ing under uncertainty: A short survey, in: P. M. Fischer, G. Alonso, M. Are- nas, F. Geerts (Eds.), Proceedings of the Workshops of the EDBT/ICDT650 2015 Joint Conference, Vol. 1330 of CEUR Workshop Proceedings, CEUR- WS.org, 2015, pp. 97–103. [15] A. Artikis, C. Baber, P. Bizarro, C. Canudas-de Wit, O. Etzion, F. Fournier, P. Goulart, A. Howes, J. Lygeros, G. Paliouras, A. Schuster, I. Sharfman, Scalable proactive event-driven decision making, Technology655 and Society Magazine, IEEE 33 (3) (2014) 35–41. 28 [16] L. Zadeh, Fuzzy sets, Information and Control 8 (3) (1965) 338 – 353. [17] J. M. Garibaldi, Do Smart Adaptive Systems Exist? Best Practice for Se- lection and Combination of Intelligent Methods, Springer, 2005, Ch. Fuzzy expert systems, pp. 105–132.660 [18] W. Siler, J. Buckley, Fuzzy expert systems and fuzzy reasoning, Wiley- Interscience, 2005. [19] K. Basterretxea, J. M. Tarela, del Campo I, G. Bosque, An experimental study on nonlinear function computation for neural/fuzzy hardware design, IEEE Transactions on Neural Networks 18 (1) (2007) 266–283.665 [20] A. Laurent, M. J. Lesot (Eds.), Scalable Fuzzy Algorithms for Data Man- agement and Analysis: Methods and Design, Information Science Refer- ence, 2010. [21] G. Acampora, V. Loia, Fuzzy control interoperability and scalability for adaptive domotic framework, IEEE Transactions on Industrial Informatics670 1 (2) (2005) 97–111. [22] P. Cariñena, A. Bugaŕın, M. Mucientes, S. Barro, A language for expressing fuzzy temporal rules, Mathware and Soft Computing 7 (2-3) (2000) 213– 227. [23] J.-P. Poli, L. Boudet, D. 
Mercier, Online temporal reasoning for event and675 data streams processing, IEEE Conference on Fuzzy Systems, FUZZ-IEEE (2016) 2257–2264. [24] I. Bloch, Fuzzy spatial relationships for image processing and interpreta- tion: a review, Image and Vision Computing 23 (2) (2005) 89 – 110. [25] S. Schockaert, M. D. Cock, E. Kerre, Reasoning about fuzzy temporal and680 spatial information from the web, World Scientific, 2010. 29 [26] J.-M. Le Yaouanc, J.-P. Poli, A fuzzy spatio-temporal-based approach for activity recognition, in: Advances in Conceptual Modeling, Vol. 7518 of Lecture Notes in Computer Science, 2012, pp. 314–323. [27] W. E. Combs, J. E. Andrews, Combinatorial rule explosion eliminated by685 a fuzzy rule configuration, Trans. Fuz Sys. 6 (1) (1998) 1–11. [28] J. Mendel, Q. Liang, Comments on ”combinatorial rule explosion elimi- nated by a fuzzy rule configuration”, IEEE Transactions on Fuzzy Systems 7 (3) (1999) 369–373. [29] M. Sugeno, K. Tanaka, Successive identification of a fuzzy model and its690 applications to prediction of a complex system, Fuzzy sets and systems 42 (3) (1991) 315–334. [30] C.-T. Sun, Rule-base structure identification in an adaptive-network-based inference systems, in: IEEE Transaction on Fuzzy Systems, Vol. 2, 1994, pp. 64–73.695 [31] Y. Jin, Fuzzy modeling of high-dimensional systems: Complexity reduction and interpretability improvement, Trans. Fuz Sys. 8 (2) (2000) 212–221. [32] B. Kosko, Optimal fuzzy rules cover extrema, International Journal of In- telligent Systems 10 (2) (1995) 249–255. [33] I. Rodŕıguez-Fdez, M. Mucientes, A. Bugaŕın, S-FRULER: Scalable fuzzy700 rule learning through evolution for regression, Knowledge-Based Systems 110 (2016) 255–266. [34] L. Reznik, Fuzzy Controllers Handbook, Newnes, 1997. [35] K. Basterretxea, I. Del Campo, Scalable Fuzzy Algorithms for Data Man- agement and Analysis: Methods and Design, Information Science Refer-705 ence, 2010, Ch. Electronic Hardware for Fuzzy Computation, pp. 1–30. 30 [36] N. Harvey, R. H. Luke, J. M. Keller, D. Anderson, Speedup of fuzzy logic through stream processing on graphics processing units, IEEE Congress on Evolutionary Computation (2008) 3809–3815. [37] B. R. Preiss, Data Structures and Algorithms with Object-Oriented Design710 Patterns in Java, Worldwide Series in Computer Science, Wiley, 2000. [38] J. Hopcroft, R. Tarjan, Algorithm 447: Efficient algorithms for graph ma- nipulation, Commun. ACM 16 (6) (1973) 372–378. [39] J.-P. Poli, L. Boudet, Une architecture moderne de système expert flou pour le traitement des flux d’information, Rencontres Francophones sur la715 Logique Floue et ses Applications, 2015. [40] J.-P. Poli, L. Boudet, A modular fuzzy expert system architecture for data and event streams processing, in: IPMU, 2016, pp. 717–728. [41] A. Arasu, M. Cherniack, E. Galvez, D. Maier, A. S. Maskey, E. Ryvkina, M. Stonebraker, R. Tibbetts, Linear road: A stream data management720 benchmark, in: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, VLDB ’04, VLDB Endowment, 2004, pp. 480–491. [42] A. D. Palma, R. Lindsey, Traffic congestion pricing methods and technolo- gies, Tech. rep., Ecole Polytechnique (2009).725 [43] Q. Yang, H. N. Koutsopoulos, A microscopic traffic simulator for evaluation of dynamic traffic management systems, Transportation Research Part C: Emerging Technologies 4 (3) (1996) 113 – 129. 31 work_2z3fjxhcrfhefbagjvcl4iyogm ---- Neural network ensemble operators for time series forecasting Kourentzes, N. 
, Barrow, D. K. and Crone, S. F. Author post-print (accepted) deposited in CURVE January 2016 Original citation & hyperlink: Kourentzes, N. , Barrow, D. K. and Crone, S. F. (2014) Neural network ensemble operators for time series forecasting. Expert Systems with Applications, volume 41 (9): 4235–4244 http://dx.doi.org/10.1016/j.eswa.2013.12.011 DOI 10.1016/j.eswa.2013.12.011 ISBN 978-86-7892-739-3 Publisher: Elsevier Copyright © and Moral Rights are retained by the author(s) and/ or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. This document is the author’s post-print version, incorporating any revisions agreed during the peer-review process. Some differences between the published version and this version may remain and you are advised to consult the published version if you wish to cite from it. CURVE is the Institutional Repository for Coventry University http://curve.coventry.ac.uk/open http://dx.doi.org/10.1016/j.eswa.2013.12.011 http://curve.coventry.ac.uk/open Neural Network Ensemble Operators for Time Series Forecasting Nikolaos Kourentzesa,∗, Devon K. Barrowa, Sven F. Cronea aLancaster University Management School Department of Management Science, Lancaster, LA1 4YX, UK Abstract The combination of forecasts resulting from an ensemble of neural networks has been shown to outperform the use of a single “best” network model. This is supported by an extensive body of literature, which shows that combining generally leads to improvements in forecasting accuracy and robustness, and that using the mean operator often outperforms more complex methods of combining forecasts. This paper proposes a mode ensemble operator based on kernel density estimation, which unlike the mean operator is insensitive to outliers and deviations from normality, and unlike the median operator does not require symmetric distributions. The three operators are compared empirically and the proposed mode ensemble operator is found to produce the most accurate forecasts, followed by the median, while the mean has relatively poor performance. The findings suggest that the mode operator should be considered as an alternative to the mean and median operators in forecasting applications. Experiments indicate that mode ensembles are useful in automating neural network models across a large number of time series, overcoming issues of uncertainty associated with data sampling, the stochasticity of neural network training and the distribution of the forecasts. Keywords: Time Series, Forecasting, Ensembles, Combination, Mode Estimation, Kernel Density Estimation, Neural Networks, Mean, Median ∗Correspondance: N Kourentzes, Department of Management Science, Lancaster Uni- versity Management School, Lancaster, Lancashire, LA1 4YX, UK. Tel.: +44-1524-592911 Email address: n.kourentzes@lancaster.ac.uk (Nikolaos Kourentzes) Preprint submitted to Expert Systems with Applications December 5, 2013 1. Introduction With the continuing increase in computing power and availability of data, there has been a growing interest in the use artificial Neural Networks (NNs) for forecasting purposes. 
NNs are typically used as ensembles of several net- work models to deal with sampling and modelling uncertainties that may otherwise impair their forecasting accuracy and robustness. Ensembles com- bine forecasts from the different models that comprise them. This paper proposes a new fundamental ensemble operator for neural networks that is based on estimating the mode of the forecast distribution, which has appeal- ing properties compared to established alternatives. Although the use of ensembles is nowadays accepted as the norm in fore- casting with NNs (Crone et al., 2011), their performance is a function of how the individual forecasts are combined (Stock and Watson, 2004). Improve- ments in the ensemble combination operators have direct impact on the re- sulting forecasting accuracy and the decision making that forecasts support. This has implications for multiple forecasting applications where NN ensem- bles have been used. Some examples include diverse forecasting applications such as: economic modelling and policy making (McAdam and McNelis, 2005; Inoue and Kilian, 2008), financial and commodities trading (Zhang and Berardi, 2001; Chen and Leung, 2004; Versace et al., 2004; Bodyanskiy and Popov, 2006; Yu et al., 2008), fast-moving consumer goods (Trapero et al., 2012), tourism (Pattie and Snyder, 1996), electricity load (Hippert et al., 2001; Taylor and Buizza, 2002), temperature and weather (Roebber et al., 2007; Langella et al., 2010), river flood (Campolo et al., 1999) and hydrolog- ical modelling (Dawson and Wilby, 2001), climate (Fildes and Kourentzes, 2011), and ecology (Araújo and New, 2007) to name a few. Zhang et al. (1998) lists multiple other forecasting applications where they have been em- ployed successfully. As NN ensembles are fundamental for producing accurate NN forecasts for these various applications; hence, improvements in the construction of the ensembles are important. In this paper, the performance of the proposed mode operator is investigated together with the two existing fundamental ensemble operators: the mean and the median. Two different datasets, in- cluding 3,443 real time series, are used to empirically evaluate the different operators. Furthermore, ensembles of both training initialisations and sam- pling (bagging) are used to investigate the performance of the operators. The proposed operator is found to be superior to established alternatives. 2 Moreover, the robustness and good performance of the median operator is validated. The findings provide useful insights for the application of NNs in large scale forecasting systems, where robustness and accuracy of the fore- casts are equally desirable. The rest of the paper is organised as follows: section 2 discusses the benefits of NN ensembles and the limitations of the established ensemble operators. Section 3 introduces multilayer perceptrons that will be used for this paper and section 4 discusses the three fundamental ensemble operators and presents the proposed method for mode ensembles. Sections 5 and 6 discuss the experimental design and the results respectively, followed by a discussion of the findings in section 7. 2. Forecasting with neural networks Over the last two decades there has been substantial research in the use of NNs for forecasting problems, with multiple successful applications (Zhang et al., 1998). Adya and Collopy (1998) found that NNs outperformed estab- lished statistical benchmarks in 73% of the papers reviewed. 
NNs are flexible nonlinear data driven models that have attractive properties for forecasting. They have been proven to be universal approximators (Hornik et al., 1989; Hornik, 1991), being able to fit to any underlying data generating process. NNs have been empirically shown to be able to forecast both linear (Zhang, 2001) and nonlinear (Zhang et al., 2001) time series of different forms. Their attractive properties have led to the rise of several types of NNs and appli- cations in the literature (for examples see Connor et al., 1994; Zhang et al., 1998; Efendigil et al., 2009; Khashei and Bijari, 2010). While NNs powerful approximation capabilities and self-adaptive data driven modelling approach allow them great flexibility in modelling time series data, it also complicates substantially model specification and the es- timation of their parameters. Direct optimisation through conventional min- imisation of error is not possible under the multilayer architecture of NNs and the back-propagation learning algorithm has been proposed to solve this problem (Rumelhart et al., 1986), later discussed in the context of time series by Werbos (1990). Several complex training (optimisation) algorithms have appeared in the literature, which may nevertheless be stuck in local optima (Hagan et al., 1996; Haykin, 2009). To alleviate this problem, training of the networks may be initialised several times and the best network model selected according to some fitting criteria. However, this may still lead to 3 suboptimal selection of parameters depending on the fitting criterion, result- ing in loss of predictive power in the out-of-sample set (Hansen and Salamon, 1990). Another challenge in the parameter estimation of NNs is due to the uncertainty associated with the training sample. Breiman (1996b) in his work on instability and stabilization in model selection showed that subset selection methods in regression, including artificial neural networks, are un- stable methods. Given a data set and a collection of models, a method is defined as unstable if a small change in the data results in large changes in the set of models. These issues pose a series of challenges in selecting the most appropriate model for practical applications and currently no universal guidelines exist on how best to do this. In dealing with the first, the NN literature has strongly argued, with supporting empirical evidence, that instead of selecting a single NN that may be susceptible to poor initial values (or model setup), it is preferable to consider a combination of different NN models (Hansen and Salamon, 1990; Zhang and Berardi, 2001; Versace et al., 2004; Barrow et al., 2010; Crone et al., 2011; Ben Taieb et al., 2012). Naftaly et al. (1997) showed that ensembles across NN training initialisations of the same model can improve accuracy while removing the need for identifying and choosing the best training initialisation. This has been verified numerous times in the literature (for example see Zhang and Berardi, 2001). These ensembles aim at reducing the parameter uncertainty due to the stochasticity of the training of the networks. Instead of relying on a single network that may be stuck to a local minima during its training, with poor forecasting performance, a combination of several networks is used. In the case of uncertainty about the training data, Breiman (1996a) proposed Bagging (Bootstrap aggregation and combination) for generating ensembles. 
The basic idea behind bagging is to train a model on permutations of the original sample and then combine the resulting models. The resulting ensemble is robust to small changes in the sample, alleviating this type of uncertainty. Recent research has lead to a series of studies involving the application of the Bagging algorithm for forecasting purposes with positive results in many application areas (Inoue and Kilian, 2008; Lee and Yang, 2006; Chen and Ren, 2009; Hillebrand and Medeiros, 2010; Langella et al., 2010). Apart from improving accuracy, using ensembles also avoids the problem of identifying and choosing the best trained network. In either case, neural network ensembles created from multiple initial- isations or from the application of the Bagging algorithm, require the use 4 of an ensemble combination operator. The forecast combination literature provides insights on how to best do this. Bates and Granger (1969) were amongst the first to show significant gains in forecasting accuracy through model combination. Newbold and Granger (1974) showed that a linear com- bination of univariate forecasts often outperformed individual models, while Ming Shi et al. (1999) provided similar evidence for nonlinear combinations. Makridakis and Winkler (1983) using simple averages concluded that the forecasting accuracy of the combined forecast improved, while the variabil- ity of accuracy among different combinations decreased as the number of methods in the average increased. The well known M competitions provided support to these results; model combination through averages improves ac- curacy (Makridakis et al., 1982; Makridakis and Hibon, 2000). Elliott and Timmermann (2004) showed that the good performance of equally weighted model averages is connected to the mean squared error loss function and under varying conditions optimally weighted averages can lead to better ac- curacy. Agnew (1985) found good accuracy of the median as an operator to combine forecasts. Stock and Watson (2004) considered simple averages, me- dians and trimmed averages of forecast, finding the average to be the most accurate, although one would expect the more robust median or trimmed mean to perform better. On the other hand, McNees (1992) found no sig- nificant differences between the performance of the mean and the median. Kourentzes et al. (2013) showed that combining models fitted on data sam- pled at different frequencies can achieve better forecasting accuracy at all short, medium and long term forecast horizons and found small differences in using either the mean or the median. There is a growing consensus that model combination has advantages over selecting a single model not only in terms of accuracy and error variability, but also simplifying model building and selection, and therefore the forecast- ing process as a whole. Nonetheless, the question of how to best combine different models has not been resolved. In the literature there are many dif- ferent ensemble methods, often based on the fundamental operators of mean and median, in an unweighted or weighted fashion. Barrow et al. (2010) ar- gued that the distribution of the forecasts involved in the calculation of the ensemble prediction may include outliers that may harm the performance of mean-based ensemble forecasts. Therefore, they proposed removing such elements from the ensemble, demonstrating improved performance. Jose and Winkler (2008) using a similar argument advocated the use of trimmed and winsorised means. 
On the other hand, median based ensembles are more robust to outliers and such special treatment may be unnecessary. However, the median, as a measure of central tendency, is not robust to deviations from symmetric distributions. The median will merely calculate the middle value that separates the higher half from the lower half of the dataset, which is not guaranteed to describe well the location of the distribution of the forecasts that are used to construct the ensemble. Taking a different perspective, ensembles provide an estimate of where most forecasts tend to be. Mean and median are merely measures of the central tendency of the forecast distribution. In the case of a normal distribution these coincide. Outliers and deviations from normality harm the quality of the estimation. An apparent alternative, that in theory is free of this problem, is the mode. This measure of central tendency has been overlooked in the combination literature because of the inherent difficulty of estimating it for unknown distributions. This paper exploits the properties of the mode to propose a new fundamental ensemble operator. In the following sections this operator is introduced and evaluated against established alternatives.

3. Multilayer Perceptrons
The most commonly used form of NNs for forecasting is the feedforward Multilayer Perceptron. The one-step ahead forecast ŷ_{t+1} is computed using inputs that are lagged observations of the time series or other explanatory variables. I denotes the number of inputs p_i of the NN. Their functional form is:

\hat{y}_{t+1} = \beta_0 + \sum_{h=1}^{H} \beta_h \, g\!\left( \gamma_{0i} + \sum_{i=1}^{I} \gamma_{hi} p_i \right). \qquad (1)

In eq. (1), w = (β, γ) are the network weights with β = [β_1, ..., β_H] and γ = [γ_{11}, ..., γ_{HI}] for the output and the hidden layers respectively. The β_0 and γ_{0i} are the biases of each neuron, which for each neuron act similarly to the intercept in a regression. H is the number of hidden nodes in the network and g(·) is a non-linear transfer function, which is usually either the sigmoid logistic or the hyperbolic tangent function. NNs can model interactions between inputs, if any. The outputs of the hidden nodes are connected to an output node that produces the forecast. The output node is often linear as in eq. (1).

Figure 1: Contour plot of the error surface of a neural network. The initial (⊕) and ending (•) weights for six different training initialisations are marked.

In the time series forecasting context, neural networks can be perceived as equivalent to nonlinear autoregressive models (Connor et al., 1994). Lags of the time series, potentially together with lagged observations of explanatory variables, are used as inputs to the network. During training, pairs of input vectors and targets are presented to the network. The network output is compared to the target and the resulting error is used to update the network weights. NN training is a complex nonlinear optimisation problem and the network can often get trapped in local minima of the error surface. In order to avoid poor quality results, training should be initialised several times with different random starting weights and biases to explore the error surface more fully. Figure 1 provides an example of an error surface of a very simple NN. The example network is tasked to model a time series with a simple autoregressive input and is of the form ŷ_{t+1} = g(w_2 g(w_1 y_{t−1})), where g(·) is the hyperbolic tangent and w_1 and w_2 are its weights.
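A minimal sketch of the forecast function in eq. (1), assuming a single hidden layer with tanh nodes and a linear output. The weights here are random, as at the start of training; training them from different random starting points is what produces the different end points shown in Figure 1.

```python
# Minimal sketch of eq. (1): a single-hidden-layer MLP with tanh hidden nodes
# and a linear output node, evaluated from several random initialisations.
import numpy as np


def mlp_forecast(p, beta0, beta, gamma0, gamma):
    """One-step-ahead forecast for an input vector p of length I.

    beta0: output bias, beta: (H,) output weights,
    gamma0: (H,) hidden biases, gamma: (H, I) hidden weights.
    """
    hidden = np.tanh(gamma0 + gamma @ p)      # g(gamma_0 + sum_i gamma_hi * p_i) per hidden node
    return beta0 + beta @ hidden              # linear output node


rng = np.random.default_rng(1)
I, H = 3, 5                                   # three lagged inputs, five hidden nodes
p = rng.normal(size=I)                        # e.g. y_t, y_{t-1}, y_{t-2} after scaling
for start in range(3):                        # three different random initialisations
    beta0, beta = rng.normal(), rng.normal(size=H)
    gamma0, gamma = rng.normal(size=H), rng.normal(size=(H, I))
    print(f"initialisation {start}: forecast = {mlp_forecast(p, beta0, beta, gamma0, gamma):.3f}")
```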
Six different training initialisations, with their respective final weights, are shown. Observe that minor differences in the starting weights can result in different estimates, even for such a simple model. In order to counter this uncertainty an ensemble of all trained networks can be used. As discussed before, this approach has been shown to be superior to choosing a single set of estimated weights. Note that the objective of training is not to identify the global optimum. This would result in the model over-fitting to the training sample, which would then generalise poorly to unseen data (Bishop, 1996), in particular given their powerful approximation capabilities (Hornik, 1991). Furthermore, as new data become available, the prior global optimum may no longer be an optimum. In general, as the fitting sample changes with the availability of new information, so do the final weights of the trained networks, even if the initial values of the network weights were kept constant. This sampling induced uncertainty can again be countered by using ensembles of models, following the concept of bagging.

4. Ensemble operators
Let ŷ_{mt} be a forecast from model m for period t, where m = 1, ..., M and M is the number of available forecasts to be combined in an ensemble forecast ỹ_t. In this section the construction of ỹ_t using the mean, median and the proposed mode operators is discussed. To apply any of these operators reliably a unimodal distribution is assumed.

4.1. Mean ensemble
The mean is one of the most commonly used measures of central tendency and can be weighted or unweighted. Let w_m be the weight for the forecasts from model m. Conventionally 0 ≤ w_m ≤ 1 and Σ_{m=1}^{M} w_m = 1. The ensemble forecast for period t is calculated as:

\tilde{y}_t^{\text{Mean}} = \sum_{m=1}^{M} w_m y_{mt}. \qquad (2)

If all w_m = M^{-1} the resulting combination is unweighted. The properties of the mean are well known, as well as its limitations. The mean is sensitive to outliers and unreliable for skewed distributions. To avoid some of its problems one might use a winsorised or truncated mean (Jose and Winkler, 2008). In this case the mean behaves more closely to the median. For distributions with finite variance, which is true for sets of forecasts, the maximum distance between the mean and the median is one standard deviation (Mallows, 1991).

4.2. Median ensemble
Similarly, the median can be unweighted or weighted, although the latter is rarely used. The median ensemble ỹ_t^{Median} is simply calculated by sorting w_m y_{mt} and picking the middle value if M is odd, or the mean of the two middle values otherwise. Although the median is more robust than the mean, it still suffers with non-symmetric distributions.

4.3. Mode Ensemble
The mode is defined as the most frequent value in a set of data. The mode is insensitive to outliers, in contrast to the mean and median. There is no formula to calculate the mode of an unknown distribution for continuous variables. There are two common ways to calculate it: either by discretising the data and identifying the most frequent bin, or by kernel density estimation. In this work the second approach is preferred in order to avoid the discretisation of the data. Furthermore, kernel density estimation lends itself well to the continuous-valued nature of forecasts. Kernel density estimation is a non-parametric way to estimate the probability density function of a random variable, in this case the forecasts.
Given forecasts of a distribution with unknown density f, we can approximate its shape using the kernel density estimator

\hat{f}_{th}(x) = (Mh)^{-1} \sum_{m=1}^{M} K\!\left( \frac{x - \hat{y}_{mt}}{h} \right), \qquad (3)

where K(·) is a function with the property ∫ K(x)dx = 1, called the kernel, and h > 0 is its bandwidth. The kernel is often chosen to be a unimodal symmetric density function, which makes f̂_h(x) a density function itself; for computational reasons, this is often the Gaussian kernel φ(x):

\varphi_h(x) = \frac{1}{\sqrt{2\pi}\,h} \, e^{-\frac{x^2}{2h^2}}. \qquad (4)

Figure 2 shows an example of the calculation of the kernel density. A kernel with bandwidth h is fitted around each observation and the resulting sum approximates the density function of the sample. A number of alternative kernel functions have been proposed in the literature; however, the choice of kernel has been found to have minimal impact on the outcome in most cases (Wand and Jones, 1995). The bandwidth of the kernel h controls the amount of smoothing.

Figure 2: Example calculation of kernel density estimation.

A high bandwidth results in more
The mode ensemble offers an intuitive way of identifying where forecasts from different models converge and provide a robust forecast, independent of distributional assumptions. 5. Empirical Evaluation 5.1. Datasets To empirically evaluate the performance of the mean, median and the proposed mode ensemble for NNs, two large datasets of real monthly time series are used. The first dataset comes from Federal Reserve Economic Data (FRED) of St. Luis.1 From the complete dataset 3,000 monthly time series that contain 108 or more observations (9 years) were sampled. Long time series were preferred to allow for adequate training, validation and test sets. The second dataset comes from the UK Office for National Statistics and contains 443 monthly retail sales time series.2 Again, only time series with 108 or more observations were retained for the empirical evaluation. A summary of the characteristics of the time series in each dataset is provided in table 1. To identify the presence of trend in a time series the cox- stuart test was employed on a 12-period centred moving average fitted to each time series. The test was performed on the centred moving average to smooth any effects from irregularities and seasonality. To identify the presence of seasonality, seasonal indices were calculated for the de-trended time series and then these were tested for significant deviations from each other by means of a Friedman test. This procedure, based on non-parametric tests, is robust, 1The dataset can be accessed at http://research.stlouisfed.org/fred2/. 2The dataset can be accessed at http://www.ons.gov.uk/ons/rel/rsi/ retail-sales/january-2012/tsd-retail-sales.html. 11 (a) 100 models (b) 10 models Figure 3: Example of the distribution of NN forecasts of different number of models, as estimated by Gaussian kernel density estimation, for the first four steps ahead. The forecasts by model selection, mean, median and mode ensembles are provided. 12 however different tests may provide slightly different percentages to those in table 1. Table 1: Dataset characteristics. Series Length Series Patterns Dataset Series Min Mean Max Level Trend Season Trend-Season FRED 3000 111 327 1124 5.37% 40.70% 5.80% 48.13% Retail 443 179 270 289 15.12% 48.98% 1.81% 34.09% The last 18 observations from each time series are withheld as test set. The prior 18 observations are used as validation set to accommodate NNs training. 5.2. Experimental Design A number of NN ensemble models are fitted to each time series. Two are based on mean, two on median and two on mode ensembles. Hereafter, these are named NN-Mean, NN-Median and NN-Mode respectively. All combina- tion operators are applied in their unweighted version, as the objective is to test their fundamental performance. In each pair of ensembles, the first is a training ensemble, combining multiple training initialisations and the sec- ond is based on bagging, as described by Kunsch (1989). This moving block bootstrap samples the original time series while preserving the temporal and spatial covariance structure, as well as the serial correlation of the time series data. By assessing the operators using different types of ensembles we aim to assess the consistency of their performance. Furthermore, different sizes of ensembles are evaluated, from 10 members up to 100 members, with steps of 10. 
Results for single NN models, based on selecting the best one, are not provided as there is compeling evidence in the literature that ensembles are superior (for example see Zhang and Berardi, 2001; Barrow et al., 2010). This was validated in our experiments as well. The individual neural networks have identical setup. Following the sug- gestions of the literature, if trend is identified in a time series it is removed through first differencing (Zhang and Qi, 2005). The time series is then li- nearly scaled between -0.5 and 0.5 to facilitate the NN training. The inputs are identified through means of stepwise regression, which has been shown to perform well for identifying univariate input lags for NNs (Crone and 13 Kourentzes, 2010; Kourentzes and Crone, 2010). All networks use the hy- perbolic tangent transfer function for the hidden nodes and a linear output node. The number of hidden nodes was identified experimentally for each time series. Up to 60 hidden nodes were evaluated for each time series and the number of hidden nodes that minimised the validation Mean Squared Error (MSE) was chosen. Each network was trained using the Levenberg-Marquardt (LM) algo- rithm. The algorithm requires setting a scalar µLM and its increase and decrease steps. When the scalar is zero, the LM algorithm becomes just Newton’s method, using the approximate Hessian matrix. On the other hand, when µLM is large, it becomes gradient descent with a small step size. Newton’s method is more accurate and faster near an error minimum, so the aim is to shift toward Newton’s method as quickly as possible. If a step would increase the fitting error then µLM is increased. Here µLM = 10 −3, with an increase factor of µinc = 10 and a decrease factor of µdec = 10 −1. For a detailed description of the algorithm and its parameters see Hagan et al. (1996). MSE was used as the training cost function. The maximum training epochs are set to 1000. The training can stop earlier if µLM becomes equal or greater than µmax = 10 10. The MSE error at the validation set is tracked while training. If the error increases consequently for 50 epochs then training is stopped. The weights that give the lowest validation error are selected at the end of each training. This is common practice in the literature and helps to achieve good out-of-sample performance, since it avoids over-fitting to the training sample (Haykin, 2009). Following the suggestions of the forecasting literature (Adya and Col- lopy, 1998) two statistical benchmarks are used in this study, namely the naive forecast (random walk) and exponential smoothing. This is done to assess the accuracy gains of using NNs against established simpler statistical methods. The Naive requires no parameterisation or setup, hence is used as a baseline that any more complex model should outperform. The appro- priate exponential smoothing model is selected for each time series, depend- ing on the presence of trend and/or seasonality using Akaike’s Information Criterion. Model parameters are identified by optimising the log-likelihood function (Hyndman et al., 2002, 2008). Exponential smoothing was selected as a benchmark based on its widely demonstrated forecasting accuracy and robustness (Makridakis and Hibon, 2000; Gardner, 2006) and will be named ETS in this work. The use of these benchmarks can help establish the rel- ative performance of the NN models. In total, eight forecasting models are 14 fitted to each time series, six NNs and two statistical benchmarks. 
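The bagging ensembles described above rely on the moving block bootstrap of Kunsch (1989). The following is a minimal sketch of that resampling step; the block length is an assumption, as the paper does not report the value it used.

```python
# Minimal sketch of the moving block bootstrap (Kunsch, 1989) used to build the
# bagging ensembles: overlapping blocks of consecutive observations are drawn
# with replacement and concatenated, preserving the serial correlation within
# each block. The block length of 12 is an assumption for illustration only.
import numpy as np


def moving_block_bootstrap(y, block_length=12, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    y = np.asarray(y, dtype=float)
    n = y.size
    n_blocks = int(np.ceil(n / block_length))
    starts = rng.integers(0, n - block_length + 1, size=n_blocks)   # overlapping block starts
    sample = np.concatenate([y[s:s + block_length] for s in starts])
    return sample[:n]                                               # trim to the original length


y = np.sin(np.arange(120) * 2 * np.pi / 12) + np.random.default_rng(1).normal(0, 0.1, 120)
bootstrap_series = [moving_block_bootstrap(y, rng=np.random.default_rng(k)) for k in range(10)]
# Each resampled series would then be used to train one member of the bagging ensemble.
```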
Rolling trace forecasts of 12 months are produced using each model. The rolling origin evaluation enables collecting a large sample of forecasts and their errors, while being robust to irregular forecast origins and outliers, thus providing reliable error measurements. Based on the long test set, 7 trace forecasts from t+1 up to t+12 months are collected for each time series. The reader is referred to Tashman (2000) for a detailed description of the evaluation scheme and its advantages. The forecasting accuracy is assessed using the Mean Absolute Scaled Er- ror (MASE). This is preferred due to its favourable statistical properties. MASE is calculated for each trace forecast as: MASE = m−1 ∑m j=1 |yj − ŷj| (n − 1)−1 ∑n r=2 |yr − yr−1| , (6) where yj and ŷj are the actual and forecasted value for j = 1, . . . , m out- of-sample observations. The denominator is the mean absolute error of the random walk in the fitting sample of n observations and is used to scale the error. MASE, being a scaled error, permits summarising model performance across time series of different scale and units, which mean squared or absolute errors cannot do, and is less biased from errors like the mean absolute per- centage error and its symmetric equivalent. Another advantage of this error is that it is very improbable that the denominator is zero, therefore making it easy to calculate in several scenarios and robust to time series with several values equal or close to zero (Hyndman and Koehler, 2006). Note that the Retail dataset contains several time series that do not permit the calculation of conventional percentage errors, due to zero values in the denominator. To summarise the results across the time series of each dataset the mean and median MASE across all series are calculated. 6. Results Table 2 presents the results for the FRED time series. Numbers in brack- ets refer to median MASE, while the rest to mean MASE. The table provides results for ensembles from 10 to 100 members. The results for bagging and training ensembles are presented separately to assess the impact of the ensem- ble type on the different ensemble operators. In each row the best performing method according to mean and median MASE is highlighted in boldface. 15 Table 2: Mean (Median) MASE for FRED dataset. 
Ensemble Size NN-Mean NN-Median NN-Mode Naive ETS Bagging 10 1.06 (0.66) 0.92 (0.64) 1.30 (0.77) 1.11 (0.87) 3.43 (0.62) 20 1.06 (0.65) 0.90 (0.63) 0.94 (0.65) 1.11 (0.87) 3.43 (0.62) 30 1.02 (0.65) 0.89 (0.62) 0.89 (0.62) 1.11 (0.87) 3.43 (0.62) 40 1.04 (0.65) 0.88 (0.62) 0.88 (0.61) 1.11 (0.87) 3.43 (0.62) 50 1.03 (0.64) 0.88 (0.62) 0.88 (0.61) 1.11 (0.87) 3.43 (0.62) 60 1.03 (0.64) 0.89 (0.62) 0.88 (0.61) 1.11 (0.87) 3.43 (0.62) 70 1.04 (0.65) 0.88 (0.62) 0.87 (0.61) 1.11 (0.87) 3.43 (0.62) 80 1.03 (0.65) 0.88 (0.62) 0.87 (0.61) 1.11 (0.87) 3.43 (0.62) 90 1.01 (0.65) 0.88 (0.61) 0.87 (0.61) 1.11 (0.87) 3.43 (0.62) 100 1.01 (0.65) 0.88 (0.61) 0.87 (0.61) 1.11 (0.87) 3.43 (0.62) Training ensemble 10 1.05 (0.64) 0.95 (0.62) 1.17 (0.70) 1.11 (0.87) 3.43 (0.62) 20 1.03 (0.65) 0.93 (0.62) 0.95 (0.64) 1.11 (0.87) 3.43 (0.62) 30 1.01 (0.64) 0.91 (0.62) 0.90 (0.62) 1.11 (0.87) 3.43 (0.62) 40 1.02 (0.64) 0.91 (0.62) 0.90 (0.61) 1.11 (0.87) 3.43 (0.62) 50 1.02 (0.64) 0.92 (0.62) 0.89 (0.61) 1.11 (0.87) 3.43 (0.62) 60 1.01 (0.64) 0.91 (0.62) 0.89 (0.62) 1.11 (0.87) 3.43 (0.62) 70 1.01 (0.64) 0.91 (0.61) 0.89 (0.61) 1.11 (0.87) 3.43 (0.62) 80 1.01 (0.64) 0.91 (0.62) 0.88 (0.61) 1.11 (0.87) 3.43 (0.62) 90 1.01 (0.64) 0.91 (0.61) 0.88 (0.61) 1.11 (0.87) 3.43 (0.62) 100 1.01 (0.64) 0.91 (0.62) 0.88 (0.61) 1.11 (0.87) 3.43 (0.62) Overall, the difference between the mean and median MASE results in- dicates that there are several difficult time series, particularly affecting the less robust mean MASE. Focusing on the bagging results, all NN-Mean, NN- Median and NN-Mode are more accurate than the benchmarks when con- sidering mean MASE. Furthermore, as the ensembles increase in size their accuracy improves. In particular, for NN-Mode after there are 30 or more members the forecasts are very accurate. This was to be expected since the kernel density estimation becomes reliable once there is an adequate num- ber of observations, as discussed in section 4. For ensembles of 70 or more members NN-Mode provides consistently the best accuracy, closely followed by NN-Median. Note that achieving large numbers of ensemble members is trivial with NNs, as this merely implies that more training initialisations or bootstrapped samples are used. Therefore, the requirement of the mode op- erator for 30 or more ensemble members is not a limiting factor. In contrast, 16 NN-Mean underperforms to the extent that ETS is more accurate for median MASE. This is an interesting finding, given how common is the mean oper- ator for ensembles in the literature. The more robust behaviour of median and the in-sensitive to outliers nature of the mode result in more accurate ensemble forecasts. Looking at mean MASE, all NNs behave more robust than ETS, the latter being severely affected by outliers. The results of the training ensembles are very similar. Again, as the number of members in the ensemble increases NN-Mode performs better and is the most accurate model for 40 or more ensemble members. NN-Median ranks second with small differences, while NN-Mean is substantially worse. Comparing the results between bagging and training ensembles we can see that the former is marginally more accurate for NN-Median and NN-Mode when mean MASE is considered. However, the same is not true for NN-Mean, indicating that the robustness and performance of this ensemble operator is affected by the type of ensemble. Table 3 presents the results for the Retail dataset. Its structure is the same as in table 2. 
The differences between mean and median MASE are smaller than the FRED results, showing that the time series in this dataset are better behaved. Considering the bagging results, NN-Median consistently outperforms the statistical benchmarks, while the same is true for NN-Mode, once there is an adequate number of members in the ensemble (again 30 or more). NN-Mode is the most accurate model with the lowest mean and median MASE. This is followed closely by NN-Median. On the other hand, NN-Mean often fails to outperform the benchmark ETS, although it is always better than the Naive. Looking at the accuracy of the training ensembles NN-Mode is overall more accurate for mean MASE, NN-Median is the most accurate for median MASE. Although all NN models outperform the Naive benchmark, the dif- ferences between either NN-Mode or NN-Median and ETS are very small. NN-Mean is worse than ETS in terms of mean MASE, while occasionally it is marginally better in terms of median MASE. Comparing accuracies between bagging and training ensembles there are differences in favour of the former when looking at NN-Median and NN-Mode, while the accuracy for NN-Mean is almost identical for both types of ensembles. Across both datasets NN-Mode and NN-Median are the most accurate models. NN-Mode seems to perform better when the size of ensemble is large enough. NN-Median has slightly lower accuracy. While large ensem- bles benefit NN-Median, it can perform well for small ensembles too. Both 17 Table 3: Mean (Median) MASE for Retail dataset. Ensemble Size NN-Mean NN-Median NN-Mode Naive ETS Bagging 10 1.33 (0.96) 1.11 (0.93) 1.44 (1.10) 1.45 (1.29) 1.12 (0.97) 20 1.37 (0.97) 1.10 (0.94) 1.14 (0.94) 1.45 (1.29) 1.12 (0.97) 30 1.29 (0.96) 1.10 (0.91) 1.09 (0.92) 1.45 (1.29) 1.12 (0.97) 40 1.31 (0.97) 1.10 (0.91) 1.09 (0.90) 1.45 (1.29) 1.12 (0.97) 50 1.30 (0.97) 1.10 (0.92) 1.09 (0.89) 1.45 (1.29) 1.12 (0.97) 60 1.26 (0.96) 1.09 (0.91) 1.09 (0.89) 1.45 (1.29) 1.12 (0.97) 70 1.26 (0.96) 1.09 (0.91) 1.09 (0.90) 1.45 (1.29) 1.12 (0.97) 80 1.26 (0.98) 1.09 (0.90) 1.08 (0.88) 1.45 (1.29) 1.12 (0.97) 90 1.27 (0.97) 1.09 (0.90) 1.09 (0.87) 1.45 (1.29) 1.12 (0.97) 100 1.27 (0.95) 1.09 (0.91) 1.09 (0.88) 1.45 (1.29) 1.12 (0.97) Training ensemble 10 1.34 (0.97) 1.14 (0.91) 1.27 (0.97) 1.45 (1.29) 1.12 (0.97) 20 1.33 (0.95) 1.14 (0.91) 1.14 (0.91) 1.45 (1.29) 1.12 (0.97) 30 1.31 (0.96) 1.13 (0.89) 1.11 (0.90) 1.45 (1.29) 1.12 (0.97) 40 1.28 (0.95) 1.12 (0.91) 1.11 (0.90) 1.45 (1.29) 1.12 (0.97) 50 1.28 (0.96) 1.12 (0.90) 1.11 (0.91) 1.45 (1.29) 1.12 (0.97) 60 1.29 (0.96) 1.12 (0.89) 1.11 (0.91) 1.45 (1.29) 1.12 (0.97) 70 1.29 (0.95) 1.13 (0.90) 1.11 (0.90) 1.45 (1.29) 1.12 (0.97) 80 1.29 (0.95) 1.13 (0.90) 1.11 (0.90) 1.45 (1.29) 1.12 (0.97) 90 1.28 (0.95) 1.13 (0.89) 1.12 (0.90) 1.45 (1.29) 1.12 (0.97) 100 1.28 (0.96) 1.13 (0.89) 1.12 (0.90) 1.45 (1.29) 1.12 (0.97) these models are on average more accurate than the statistical benchmarks ETS and Naive. On the other hand, NN-Mean provides mixed results. In both datasets it outperforms Naive, but not always ETS. It is substantially outperformed by both NN-Mode and NN-Median. 7. Discussion The value of ensembles for NNs has been argued theoretically and demon- strated empirically. The combination of the models has often involved some type of mean operator. The empirical evaluation in this paper found that the less commonly used median operator and the proposed mode operator are more accurate and thus preferable. 
The size of the ensemble was found to be important for the accuracy of all operators. Both mode and median, for the two datasets investigated here, seemed on average to converge for ensembles of 60 or more members, with any additional members offering minimal changes in forecasting performance. In particular, the mode, due to its reliance on kernel density estimation, required at least 30 members; after that point, however, it was found to be on average the most accurate ensemble operator. This is illustrated in Figures 4 and 5, which present the mean MASE for different numbers of ensemble members for the FRED and the Retail datasets respectively. The results for the different types of ensembles have been pooled together, since they had only small differences. Note that there is little evidence that the mean ensembles had converged even with 100 members. Even larger ensembles were not calculated due to the substantial computational resources required, especially when the objective is to forecast a large number of time series, which is common in supply chain and retailing forecasting applications.

[Figure 4: Mean MASE for different numbers of ensemble members (10 to 100) for the FRED dataset; one line each for the Mean, Median and Mode operators.]

In order to assess whether these differences are significant or not, we employ the testing methodology suggested by Koning et al. (2005), which is appropriate for comparing forecasts from multiple models. The comparison is done across all different ensemble sizes to highlight whether an operator is consistently statistically different. First, a Friedman test is used to assess whether the accuracy of any model is significantly different from the rest. Subsequently, the MCB test is used to reveal the exact ordering of the different operators and any significant differences between them. For both datasets the mode operator was significantly better than the median, which in turn was significantly different from the mean, at the 5% significance level.

[Figure 5: Mean MASE for different numbers of ensemble members (10 to 100) for the Retail dataset; one line each for the Mean, Median and Mode operators.]

At this point, it is useful to comment on the associated computational cost of the NN ensembles. The main cost comes from training the networks. Therefore, the more ensemble members that need to be trained, the less scalable forecasting with NNs becomes, and the operator that achieves good forecasting performance with the fewest members is preferable. Table 4 provides an overview of the average time required for forecasting across all series, for each dataset. As a different number of hidden nodes is used for each time series, the complexity of NN training changes, requiring different amounts of time. To keep the presentation of the values simple, we summarise the training time over the different series into the reported average time. The ensemble size that gave the minimum error for each operator in Figures 4 and 5 is used as the reference for the comparison. The average time in seconds, as well as the percentage difference over the time needed for the mode ensembles, are provided in the table. The networks were trained in parallel on an i7-3930K CPU clocked at 4.5 GHz with 12 logical cores.
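Returning briefly to the significance testing described above, the two-step procedure can be sketched as follows. This is only an illustration: the Friedman step uses SciPy directly, while the MCB step is reduced to reporting mean ranks, since the critical distance used for the MCB intervals depends on the exact formulation in Koning et al. (2005) and is not reproduced here.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

def friedman_and_ranks(errors):
    """errors: (n_cases, n_models) array of MASE values, one row per
    series/ensemble-size combination, one column per operator."""
    errors = np.asarray(errors, dtype=float)
    stat, p_value = friedmanchisquare(*errors.T)      # step 1: is any model different at all?
    ranks = np.apply_along_axis(rankdata, 1, errors)  # rank the operators within each row
    mean_ranks = ranks.mean(axis=0)                   # step 2 (MCB) orders these mean ranks
    return stat, p_value, mean_ranks

# toy usage with fabricated errors for three operators (Mean, Median, Mode)
rng = np.random.default_rng(1)
base = rng.uniform(0.8, 1.4, size=(50, 1))
errs = np.hstack([base + 0.10, base + 0.05, base + rng.normal(0.0, 0.01, (50, 1))])
print(friedman_and_ranks(errs))
```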
The mode operator needed the smallest number of ensemble members; across both datasets, the mean and median operators required from 25% up to 200% more computation time than the mode. Therefore, apart from the significant gains in forecasting accuracy, the proposed ensemble operator required the least computational resources. In particular, for the retailing dataset the run-time was more than halved. Figures 4 and 5 make clear that the median operator achieves similar performance over a large range of ensemble sizes. This allows exchanging marginal differences in accuracy for smaller run-times, thus improving its scalability as well. On the other hand, this is not the case with the mean operator, the accuracy of which improves with bigger ensembles.

Table 4: Average computational time comparison.
                        FRED                                      Retail
Ensemble operator       Ensemble size   Mean time   Difference    Ensemble size   Mean time   Difference
Mean                    100             9.74 secs   +25.0%        70              5.33 secs   +133.3%
Median                  100             9.74 secs   +25.0%        90              6.85 secs   +200.0%
Mode                    80              7.79 secs   -             30              2.28 secs   -

In the experiments, two types of ensembles were considered: bagging and training ensembles. Each one tackles a different type of parameter uncertainty. We examined whether the performance of the operators was affected by the type of ensemble. Again, median and mode had very similar performance, favouring bagging. For the mean this behaviour was not evident. We attribute this different behaviour to the sensitivity of the mean to extreme values, which both median and mode are designed to avoid, albeit with different success.

8. Conclusions
This paper evaluates different fundamental ensemble operators. The well known mean and the less commonly used median were compared, together with a proposed mode operator that is based on kernel density estimation. All three operators attempt to describe the location of the distribution of the forecasts of the members of an ensemble. However, they deal with outlying extreme values differently, with the mean being the most sensitive and the mode the least. Furthermore, distributional asymmetries can affect both the mean and the median, while the mode is immune.
The findings in this paper suggest that both median and mode are very useful operators, as they provided better accuracy than mean ensembles consistently across both datasets. The mode was found to be the most accurate, followed by the median. Based on this finding, we recommend investigating further the use of the mode and median operators in ensembles research and applications, as they have been largely overlooked in a literature that has mainly focused on the mean.
Furthermore, this work demonstrated that mode ensembles can robustly and accurately forecast a large number of time series automatically with neural networks, while the commonly used mean ensembles were often outperformed by exponential smoothing forecasts. Moreover, mean ensembles required a very large number of members, which neither mode nor median needed, with apparent implications for computational costs. In particular, the mode operator was found to require the least computational resources, due to the relatively small number of ensemble members that needed to be trained.
We have already mentioned a number of applications that can benefit from improved NN ensemble forecasts, ranging from economic and business forecasting to climate modelling. Most of these applications are characterised by forecasting a few, yet important, time series.
The improved scalability of mode ensembles over the commonly used mean ensembles allows NNs to be applied to areas that routinely require large-scale automatic forecasting and that can benefit from the nonlinear modelling capabilities of NNs. One such example is retailing, where one has to forecast a large number of products, the sales of which are affected by multiple factors that interact in a nonlinear fashion, such as pricing, promotional and temperature effects. The improved scalability of mode ensembles, combined with ever-increasing computing capabilities, provides opportunities for novel and important forecasting applications of NN ensembles. This paper found significant savings in computing time from the proposed operator, which over the complete set of time series amount to several hours of computation. Such a reduction will also help the use of NN ensembles in high-frequency forecasting cycles, where computational speed has been a limiting factor. Future work should explore these potentials.
The empirical evaluation in this paper focused on the unweighted version of all these operators, in order to assess their fundamental properties. Although their differences are attributed to their robustness to extreme values, future research should extend this work to weighted versions of the operators. This will allow considering their use in further ensemble types, such as boosting.

References

Adya, M., Collopy, F., 1998. How effective are neural networks at forecasting and prediction? A review and evaluation. Journal of Forecasting 17 (5-6), 481–495.
Agnew, C. E., 1985. Bayesian consensus forecasts of macroeconomic variables. Journal of Forecasting 4 (4), 363–376.
Araújo, M. B., New, M., 2007. Ensemble forecasting of species distributions. Trends in Ecology & Evolution 22 (1), 42–47.
Barrow, D., Crone, S., Kourentzes, N., 2010. An evaluation of neural network ensembles and model selection for time series prediction. In: Neural Networks (IJCNN), The 2010 International Joint Conference on. IEEE, pp. 1–8.
Bates, J. M., Granger, C. W. J., 1969. The combination of forecasts. Operational Research Quarterly 20 (4), 451–468.
Ben Taieb, S., Bontempi, G., Atiya, A. F., Sorjamaa, A., 2012. A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition. Expert Systems with Applications 39 (8), 7067–7083.
Bishop, C. M., 1996. Neural Networks for Pattern Recognition, 1st Edition. Oxford University Press, USA.
Bodyanskiy, Y., Popov, S., 2006. Neural network approach to forecasting of quasiperiodic financial time series. European Journal of Operational Research 175 (3), 1357–1366.
Botev, Z. I., Grotowski, J. F., Kroese, D. P., 2010. Kernel density estimation via diffusion. The Annals of Statistics 38 (5), 2916–2957.
Breiman, L., 1996a. Bagging predictors. Machine Learning 24 (2), 123–140.
Breiman, L., 1996b. Heuristics of instability and stabilization in model selection. The Annals of Statistics 24 (6), 2350–2383.
Campolo, M., Andreussi, P., Soldati, A., 1999. River flood forecasting with a neural network model. Water Resources Research 35 (4), 1191–1197.
Chen, A.-S., Leung, M. T., 2004. Regression neural network for error correction in foreign exchange forecasting and trading. Computers & Operations Research 31 (7), 1049–1068.
Chen, T., Ren, J., 2009. Bagging for Gaussian process regression. Neurocomputing 72 (7), 1605–1610.
Connor, J. T., Martin, R. D., Atlas, L. E., 1994.
Recurrent neural networks and robust time series prediction. IEEE Transactions on Neural Networks 5, 240–254. Crone, S. F., Hibon, M., Nikolopoulos, K., 2011. Advances in forecasting with neural networks? Empirical evidence from the NN3 competition on time series prediction. International Journal of Forecasting 27 (3), 635–660. Crone, S. F., Kourentzes, N., 2010. Feature selection for time series prediction - a combined filter and wrapper approach for neural networks. Neurocom- puting 73 (10-12), 1923–1936. Dawson, C., Wilby, R., 2001. Hydrological modelling using artificial neural networks. Progress in physical Geography 25 (1), 80–108. Efendigil, T., Önüt, S., Kahraman, C., 2009. A decision support system for demand forecasting with artificial neural networks and neuro-fuzzy models: A comparative analysis. Expert Systems with Applications 36 (3), 6697– 6707. Elliott, G., Timmermann, A., 2004. Optimal forecast combinations under general loss functions and forecast error distributions. Journal of Econo- metrics 122 (1), 47–79. Fildes, R., Kourentzes, N., 2011. Validation and forecasting accuracy in mod- els of climate change. International Journal of Forecasting 27 (4), 968–995. Gardner, E. S., 2006. Exponential smoothing: The state of the art - part ii. International Journal of Forecasting 22 (4), 637–666. Hagan, M. T., Demuth, H. B., Beale, M. H., 1996. Neural Network Design. MA: PWS Publishing, Boston. Hansen, L. K., Salamon, P., 1990. Neural network ensembles. IEEE Trans- actions Pattern Analysis and Machine Intelligence 12 (10), 993–1001. Haykin, S., 2009. Neural Networks and Learning Machines. Pearson Educa- tion, Inc. 24 Hillebrand, E., Medeiros, M. C., 2010. The benefits of bagging for forecast models of realized volatility. Econometric Reviews 29 (5-6), 571–593. Hippert, H. S., Pedreira, C. E., Souza, R. C., 2001. Neural networks for short- term load forecasting: A review and evaluation. Power Systems, IEEE Transactions on 16 (1), 44–55. Hornik, K., 1991. Approximation capabilities of multilayer feedforward net- works. Neural Networks 4 (2), 251–257. Hornik, K., Stinchcombe, M., White, H., 1989. Multilayer feedforward net- works are universal approximators. Neural Networks 2 (5), 359–366. Hyndman, R. J., Koehler, A. B., 2006. Another look at measures of forecast accuracy. International Journal of Forecasting 22, 679–688. Hyndman, R. J., Koehler, A. B., Ord, J. K., Snyder, R. D., 2008. Forecasting with Exponential Smoothing: The State Space Approach. Springer. Hyndman, R. J., Koehler, A. B., Snyder, R. D., Grose, S., 2002. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting 18 (3), 439–454. Inoue, A., Kilian, L., 2008. How useful is bagging in forecasting economic time series? a case study of us consumer price inflation. Journal of the American Statistical Association 103 (482), 511–522. Jose, V. R. R., Winkler, R. L., 2008. Simple robust averages of forecasts: Some empirical results. International Journal of Forecasting 24, 163–169. Khashei, M., Bijari, M., 2010. An artificial neural network (p, d, q) model for timeseries forecasting. Expert Systems with Applications 37 (1), 479–489. Koning, A. J., Franses, P. H., Hibon, M., Stekler, H. O., 2005. The M3 competition: Statistical tests of the results. International Journal of Fore- casting 21 (3), 397 – 409. Kourentzes, N., Crone, S. F., 2010. Frequency independent automatic input variable selection for neural networks for forecasting. 
In: Neural Networks (IJCNN), The 2010 International Joint Conference on. IEEE, pp. 1–8. 25 Kourentzes, N., Petropoulos, F., Trapero, J. R., 2013. Improving forecasting by estimating time series structural components across multiple frequen- cies. International Journal of Forecasting. Kunsch, H. R., 1989. The jackknife and the bootstrap for general stationary observations. The Annals of Statistics 17 (3), 1217–1241. Langella, G., Basile, A., Bonfante, A., Terribile, F., 2010. High-resolution space–time rainfall analysis using integrated ann inference systems. Journal of Hydrology 387 (3), 328–342. Lee, T.-H., Yang, Y., 2006. Bagging binary and quantile predictors for time series. Journal of Econometrics 135 (1), 465–497. Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., Winkler, R., 1982. The ac- curacy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting 1 (2), 111–153. Makridakis, S., Hibon, M., 2000. The M3-competition: results, conclusions and implications. International Journal of Forecasting 16 (4), 451–476. Makridakis, S., Winkler, R. L., 1983. Averages of forecasts: Some empirical results. Management Science 29 (9), 987–996. Mallows, C., 1991. Another comment on ocinneide. The American Statisti- cian 45, 257. McAdam, P., McNelis, P., 2005. Forecasting inflation with thick models and neural networks. Economic Modelling 22 (5), 848–867. McNees, S. K., 1992. The uses and abuses of ‘consensus’ forecasts. Journal of Forecasting 11, 703–711. Ming Shi, S., Da Xu, L., Liu, B., 1999. Improving the accuracy of nonlinear combined forecasting using neural networks. Expert Systems with Appli- cations 16 (1), 49–54. Naftaly, U., Intrator, N., Horn, D., 1997. Optimal ensemble averaging of neural networks. Network: Computation in Neural Systems 8, 283–296. 26 Newbold, P., Granger, C. W. J., 1974. Experience with forecasting univariate time series and the combination of forecasts. Journal of the Royal Statis- tical Society. Series A (General) 137 (2), 131–165. Pattie, D. C., Snyder, J., 1996. Using a neural network to forecast visitor behavior. Annals of Tourism Research 23 (1), 151–164. Roebber, P. J., Butt, M. R., Reinke, S. J., Grafenauer, T. J., 2007. Real-time forecasting of snowfall using a neural network. Weather and forecasting 22 (3), 676–684. Rumelhart, D. E., Hinton, G. E., Williams, R. J., 1986. Learning internal rep- resentations by error propagation. In: Rumelhart, D. E., McClelland, J. L. (Eds.), Parallel distributed processing: explorations in the microstructure of cognition. Vol. 1. MIT Press, Cambridge, MA, USA, Ch. Learning in- ternal representations by error propagation, pp. 318–362. Silverman, B., 1998. Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC, London. Silverman, B. W., 1981. Using kernel density estimates to investigate multi- modality. Journal of the Royal Statistical Society. Series B (Methodologi- cal) 43 (1), 97–99. Stock, J. H., Watson, M. W., 2004. Combination forecasts of output growth in a seven-country data set. Journal of Forecasting 23 (6), 405–430. Tashman, L. J., 2000. Out-of-sample tests of forecasting accuracy: an ana- lysis and review. International Journal of Forecasting 16 (4), 437–450. Taylor, J. W., Buizza, R., 2002. Neural network load forecasting with weather ensemble predictions. Power Systems, IEEE Transactions on 17 (3), 626– 632. Trapero, J. R., Kourentzes, N., Fildes, R., 2012. 
Impact of information exchange on supplier forecasting performance. Omega 40 (6), 738–747.
Versace, M., Bhatt, R., Hinds, O., Shiffer, M., 2004. Predicting the exchange traded fund DIA with a combination of genetic algorithms and neural networks. Expert Systems with Applications 27 (3), 417–425.
Wand, M. P., Jones, M. C., 1995. Kernel Smoothing. Chapman & Hall.
Werbos, P. J., 1990. Backpropagation through time - what it does and how to do it. Proceedings of the IEEE 78 (10), 1550–1560.
Yu, L., Wang, S., Lai, K. K., 2008. Forecasting crude oil price with an EMD-based neural network ensemble learning paradigm. Energy Economics 30 (5), 2623–2635.
Zhang, G., Patuwo, B. E., Hu, M. Y., 1998. Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting 14 (1), 35–62.
Zhang, G. P., 2001. An investigation of neural networks for linear time-series forecasting. Computers and Operations Research 28 (12), 1183–1202.
Zhang, G. P., Berardi, V. L., 2001. Time series forecasting with neural network ensembles: an application for exchange rate prediction. Journal of the Operational Research Society 52, 652–664.
Zhang, G. P., Patuwo, B. E., Hu, M. Y., 2001. A simulation study of artificial neural networks for nonlinear time-series forecasting. Computers and Operations Research 28 (4), 381–396.
Zhang, G. P., Qi, M., 2005. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research 160 (2), 501–514.

work_2zlvlsgvvbe5tke7aiw2ffir4u ----
Refinement of the HEPAR Expert System: Tools and Techniques∗
Peter Lucas
Department of Computer Science, Utrecht University, Padualaan 14, 3584 CH Utrecht, The Netherlands
June 26, 2013

Abstract
Methods and tools for the static and dynamic verification and validation of software systems are commonplace in the field of software engineering. In the field of expert systems, where it is more difficult to ensure that a system meets the specifications and expectations than in traditional software engineering, such tools are generally not available. In this paper, the need for more support of the development process by methods and tools is illustrated by the approach taken in building the HEPAR system, a rule-based expert system that can be used as a supportive tool in the diagnosis of disorders of the liver and biliary tract. At a certain stage in the development of this system an incremental development methodology was adopted, in which implementation of parts of the expert system was followed by dynamic validation. For this purpose, a collection of software tools was implemented as extensions to a rule-based expert-system shell. These tools provide valuable information about the effects of modification of the HEPAR knowledge base, and indicate places in the knowledge base for refinement. It is believed that similar software tools may prove helpful in the development of other expert systems as well.
Keywords. Knowledge engineering; validation of expert systems; refinement of expert systems; evaluation of expert systems; expert-system validation tools.

1 Introduction
In the field of software engineering it is generally recognized that the implementation of large software systems must be supported by methods and tools for their verification and validation [19]. Often a distinction is made between static and dynamic approaches to verification and validation.
Typical examples of static methods are program code inspection and meth- ods for proving program correctness. The application of static methods does not require the program to be executed. In contrast, dynamic verification and validation methods involve execution of the program and examining its output when it is presented with certain input. As has been pointed out repeatedly by many researchers, dynamic methods can be used only to demonstrate the presence of errors in a program, and never to demonstrate their absence. ∗Published in: Artificial Intelligence in Medicine, 6(2), 175–188, 1994. 1 Despite this fundamental limitation, software engineers consider dynamic methods indispen- sible aids in the software-development process, and supporting software tools are invariably included in programming environments. It is therefore ironical that in the development of expert systems, where it is much more difficult to ensure that the system meets its specifica- tions and expectations than in software engineering, tools that aid in the dynamic verification and validation of the system are not generally available. In this paper we discuss several static and dynamic methods, including some software tools, which were applied during the development of the HEPAR system, a rule-based expert system in the field of hepatology that offers support in the diagnosis of disorders of the liver and biliary tract. For a description of this system and of two successive studies of its performance, the reader is referred to [10, 11, 12]. Taking the development of the HEPAR system as a real-life example, we describe where in the development process the need for static and dynamic verification and validation methods and tools arose. This experience provides further evidence that in the development process of expert systems more support by methods and tools tailored to verification and validation is needed than is presently offered. The structure of the paper is as follows. In Section 2, we review the design and imple- mentation of the HEPAR system and analyse the problems encountered in the design stage. Several static verification and validation methods were employed at this stage of the project. In Section 3, the subject of knowledge-base refinement is related to expert-system validation using dynamic techniques. In Section 4, we describe some simple software tools which were used for the purpose of dynamic validation during the refinement of the HEPAR system. 2 Development of the HEPAR system 2.1 Knowledge acquisition and design It is now well-recognized that the acquisition of domain knowledge in the process of building an expert system is a difficult task [6]. In recent years, many methodologies have therefore been proposed, providing systematic methods to be followed in building an expert system. Examples of such methodologies are KADS [3] and KEATS [15, 16]. Some of these method- ologies include a set of software tools which help the knowledge engineer in building a specific application, mainly by assisting in the analysis of the problem domain. Some assistance by software tools may be provided in the design of the expert system as well. Examples of such tools are Shelley [2] and Acquist [15, 16]. Most methodologies place considerable emphasis on the process of gathering domain knowledge to be incorporated into the expert system, and on the development of conceptual models of the domain, being the result of the analysis of the knowledge collected. 
Although the HEPAR system was actually designed long before such methodologies came into play, the development of the system was initially carried out in a structured way, mainly following the top-down design methodology from software engineering. The knowledge con- cerning diagnosis in liver and biliary disease incorporated into the HEPAR system was derived from the experience of a specialist in internal medicine and hepatology and from the medical literature. The analysis of the problem of diagnosis of disorders of the liver and biliary tract indicated that the following aspect were important in this domain: • Expert hepatologists follow a clear and unambiguous strategy in diagnosis. The early classification of a patient’s disorder into rather general categories, such as whether or not 2 the disorder is biliary-obstructive in nature, is used for the selection of supplementary tests to reduce the number of alternative diagnoses to be considered. • Early in the diagnostic process only a limited amount of patient data is available, mainly obtained from medical history and physical examination. Still, a hepatologist is often capable of coming up with a working diagnosis of the patient’s disorder. In the design of the HEPAR system, the strategy followed by the hepatologist has been taken as the point of departure for problem decomposition of the diagnostic process, by distinguishing several subtasks [11]. The requirement that the ultimate system ought to be able to assist the clinician in the initial assessment of the patient, in whom only a limited amount of data is available, as well as in the assessment of the patient in whom more specific test results are known, proved to be extremely difficult. In general, to explicitly deal with data not available in the patient would yield an exponential number of combinations of conditions on known and unknown patient findings to be taken into account. Although the hepatologist involved in the project was able to reduce the number of useful combinations considerably, we felt quite uncertain with respect to the suitability of this knowledge for classifying actual patient cases. Note that current knowledge-acquisition methodologies do not offer much help in solving this problem. In most popular methodologies, the design process is essentially viewed as the process of abstraction from reality. Our problem was that we required some form of experimental feedback in refining the expert system to accommodate to reality. Likewise, only limited attention has been given to tools that provide information about the diagnostic quality of the advice produced by the expert system. 2.2 Implementation and experimental feedback Implementation of the HEPAR system was carried out using the EMYCIN-like expert-system shell DELFI-2 [9]. The advice produced by the system is in the form of a collection of conclusions: 1. Whether the patient has a hepatocellular of biliary-obstructive disorder. 2. Whether the features found in the patient indicate benign or malignant disease. 3. A differential diagnosis consisting of a collection of specific disorders, selected out of a set of about 80 disorders, each explaining some of the findings observed in the patient. In HEPAR, the first two conclusions are intermediate and the last is a final conclusion. After completing a considerable portion of the knowledge base, it was decided to carry out some experiments with the system using data from real patients to investigate whether the system was able to meet our expectations. 
It appeared that the system was unable to come up with acceptable advice in many cases. An analysis of the results of this initial experiment yielded the following reasons for the disappointing performance:
• Many rules were formulated too rigorously, such that these rules almost never applied in a patient with the given disease.
• Many rules were defined without explicitly mentioning the medical context in which they should hold. These rules frequently succeeded in patients for whom they had not been designed.

if    same(patient,complaint,abdominal pain) and
⇒     notsame(patient,complaint,fever) and
      same(patient,signs,hepatomegaly) and
      same(pain,character,continuous) and
      same(ultrasound liver,parenchyma,multiple cysts) and
      same(patient,nature-disorder,benign)
then  conclude(patient,diagnosis,polycystic disease) with CF = 1.00
fi

Figure 1: Weakly formulated production rule.

if    (same(complab,duration,chronic) or
       same(patient,type-disorder,biliary obstructive)) and
      same(patient,sex,female) ⇒ same(patient,sex,male) and
      same(serol,mitochondrial Ab,yes)
then  conclude(patient,diagnosis,primary biliary cirrhosis)
      with CF = 0.80 ⇒ with CF = 0.60
fi

Figure 2: Rigorously formulated production rule.

These problems may actually be taken as an indication of the knowledge clinicians draw upon in medical practice. Firstly, the knowledge of the clinician is partly based on the descriptions given in medical textbooks, in which there is little place for the description of atypical disease patterns, and partly on experience in the management of specific disorders. Rules which have been formulated too rigorously tend to describe the typical picture of the disease, and may assume the availability of an unrealistic amount of data for the patient. Secondly, the clinician has considerable experience with disorders frequently observed in clinical practice. However, these disorders carry a clinical context which the clinician may not be able to make explicit. Formalizing such knowledge may yield rules with a wider application than intended.
As an example, consider the production rule depicted in Figure 1. In this HEPAR rule we use the object–attribute–value representation and certainty factors as employed in the DELFI-2 system [9, 13]. This rule was originally formulated without the second condition (indicated by the right arrow); for the modified rule to be applicable, fever must be absent in the patient. So, in its original form it is an example of a too weakly formulated production rule. This missing condition caused the original rule to interact with Caroli's disease, infected liver cysts and cystic liver metastases.
The production rule shown in Figure 2 is an example of a too rigorously formulated rule, since in the form to the left of the right arrows the rule is only applicable to female patients. Typically, a patient having primary biliary cirrhosis is female, but the disorder is not limited to the female sex. A new production rule was therefore added to the knowledge base in which the expressions specified to the right of the arrow replaced the expressions to the left of the arrow.
As said above, clinicians often have to base their early decisions on incomplete clinical evidence; likewise, expert systems must be able to deal with incomplete evidence as well. In designing and implementing the HEPAR system we have tried to explicitly handle incomplete patient data in the following two ways:
1. By distinguishing several different conceptual levels in the diagnostic problem-solving process;
2.
By explicitly incorporating knowledge about unknown diagnostic test results for a pa- tient into the knowledge base. To deal with the first source of incompleteness of information in the HEPAR system, rules were drafted covering only the symptoms and signs of a disorder obtained early in the diagnostic process, whereas other rules were drafted only covering the results of supplementary tests obtained later in the diagnostic process. In this way a more or less layered structure of the knowledge base was obtained. The second source of incompleteness was dealt with by inspecting rules for conditions on data not always available in the patient. Some of these rules were used as a basis for new rules containing conditions concerning unknown data. Mainly static verification and validation methods were employed at this stage; the early experiments were only carried out to validate our design rationale. 3 Knowledge base refinement and validation 3.1 Refinement parameters The process of refining an expert system may be viewed as the iterative process of validating, extending and adapting its knowledge base. Here, we are concerned with dynamic validation. Since the extension and adaptation are based on the results of the validation, it is clearly important to decide on the parameters used for validating an expert system. The point of departure for expert system validation is to consider a medical diagnostic expert system as a computer program that tries to construct a model of a given patient which it compares with prestored descriptions of disease patterns in its knowledge base. Under ideal circum- stances, validation of a diagnostic expert system should therefore not only pertain to the advice produced by the expert system, but also to the assumptions made by the reasoning process on which the conclusions are based. It is, for instance, important to know whether it is possible for an expert system to arrive at conclusions which are correct, but based on incorrect assumptions. So, the conclusions of a medical diagnostic expert system should not be interpreted as unique answers, but as judgements of the patient’s status [8]. As a con- sequence, not unique answers but judgements should be validated. This view of validation is also able to cope with situations in which the expert system’s conclusions are incorrect, but nevertheless acceptable in the light of the data available. This approach to validation is particularly valuable when applied in refining an expert system; it is less appropriate in the final validation of an expert system [7]. 5 3.2 Refinement by dynamic validation For the purpose of the refinement of the HEPAR knowledge base, we have investigated several approaches to system validation. A diagnostic expert system like HEPAR may be validated against: 1. Patient cases with known clinical diagnosis; 2. The conclusions of some other, but similar decision support system; 3. The judgement of human experts in the field. In all three cases, there is a need for a test, or a combination of tests, that may be taken as a ‘gold standard’ for the comparison, although this is not as crucial as in the final validation of an expert system, because inspection of the knowledge base may provide additional information. Examples of tests that are suitable as a gold standard in the diagnosis of disorders of the liver and biliary tract are the histological examination of liver biopsies, ERCP and surgical exploration. 
There is not a single test available in hepatology that may be employed as a gold standard in the entire domain, because diagnosis of hepatocellular disorders differs considerably from diagnosis of biliary obstruction. Initially in the refinement, we have used part of the data from more than 1000 patients obtained from the Danish COMIK group as a source for comparison. Originally, these data have been used in the development of the Copenhagen Pocket Chart, a paper chart based on the statistical technique of logistic regression, which may assist the clinician in the early assessment of a patient with jaundice [14]. Unfortunately, this database was of limited value because only 23 disease categories were distinguished in this database, whereas in HEPAR more than 80 disorders of the liver and biliary tract are distinguished, and not all data required for HEPAR to derive final conclusions were included in the database. Therefore, the database was mainly useful for getting insight into the extent to which HEPAR was capable of dealing with incomplete patient data. During the further development of the system, a database with patient data from the Leiden University Hospital was put together, which was applied as the main device for the refinement of the knowledge base. Comparison of an expert system with some similar decision-support system is seldom straightforward, because of differences in required input and produced output among the sys- tems. For the refinement of the HEPAR system, we have included production rules that map diagnostic conclusions of HEPAR to the possible diagnostic conclusions of the Copenhagen Pocket Chart. The results produced by the Copenhagen Pocker Chart could thus be used as a simple means for rapidly finding patient case which deserved further study with regard to the conclusions produced by HEPAR. The hepatologist involved in the project has studied the reasoning process of HEPAR for a considerable number of patients. Validation of the reasoning process of HEPAR turned out to be very time-consuming, and as a consequence we have not been able to involve other hepatologists. However, on a limited scale we have profited from discussion with other hepatologists in examing the results produced by HEPAR. Taking the conclusions concerning the final diagnosis produced by the HEPAR system as a point of departure for refining the system was difficult, because the system’s advice consists of more than one conclusion. This problem is known as the multiple response problem [18]. Consider for example the situation that the expert system did produce the correct answer with highest ranking, as well as a conclusion with lower ranking which is however totally 6 unacceptable to the clinician. Restricting the validation only to the single conclusion with the highest ranking will give a distorted account of the actual performance of the system. These problems cannot be solved by only collecting information concerning the number of conclusions generated by the system and the ranking of the clinical diagnosis. Furthermore, consider the situation in which the conclusion with highest ranking is incorrect, but where the differential diagnosis as a whole is acceptable to the clinician. Taking only the conclusion with highest ranking into account will then give an inadequate impression of the system’s capabilities. 
However, due to the layered approach to diagnostic problem solving modelled in HEPAR, it is not only possible to compare specific disorders as a diagnosis with the clinical diagnosis confirmed in the patient, but also to check whether the patient's disorder has been classified into the right diagnostic category (e.g. hepatocellular disorder). This layered approach makes it less likely that the differential diagnosis produced by the system is as a whole unacceptable to the clinician. Most of the information obtained by the dynamic validation of the HEPAR knowledge base was automatically compiled by a collection of simple software tools which are discussed in the next section. Without the availability of these tools, dynamic validation would have been too time-consuming to be practically feasible.

4 Tools for knowledge base refinement

4.1 Testing tools in software engineering
Some of the software tools that have been developed for the dynamic validation of the HEPAR system, and applied in the refinement and the performance studies of the system, have been inspired by software tools commonly available in programming environments. Test-data generators, programs that systematically produce test data to be used as input to the program to be tested, are one type of program that may be useful in the dynamic validation of expert systems. However, for realistic testing, data from real-life cases are often indispensable. Another tool is the dynamic analyser, also known as the execution flow summarizer. A dynamic analyser adds instrumentation statements to a computer program in order to collect information on how many times a statement is executed. A display part of the dynamic analyser prints a summarizing execution report [19]. A tool with similar usage as the dynamic analyser is the call-graph profiler [5].

4.2 A tool for performance measurement
The environment of software tools developed for the dynamic validation of HEPAR consists of a non-interactive batch version of the expert-system shell DELFI-2 which is able to use a database of patient cases as its input. This system produces a report containing the results for each individual patient. The report, together with a file containing information about the final clinical diagnoses and the two intermediate conclusions concerning the patient, is then processed by a program which produces a table summarizing the results. Figure 3 shows the overall structure of the validation environment.

[Figure 3: Environment for dynamic validation. The HEPAR knowledge base and a database of patient cases feed a batch version of DELFI-2; its report, combined with the final clinical diagnoses of the patients, is processed by a statistics program producing summarizing tables and by a rule-application analyser producing a rule-application report/graph.]

The tools collect information with regard to the number of correct, incorrect and unclassified patient cases concerning:
1. the type of hepatobiliary disease (hepatocellular or biliary-obstructive);
2. the nature of the disorder (benign or malignant);
3. the final diagnosis.
With regard to the final diagnosis, the system computes the average number of conclusions, and whether the clinical diagnosis occurs as the conclusion ranked first or among the list of alternatives generated. An example of such a table, produced after refinement of the HEPAR system using the software tools and a database with data from 82 patients from Leiden University Hospital, is reproduced in Table 1, which shows the results for the patients after refinement.
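As a rough illustration of the kind of summary the statistics program produces, the sketch below tallies correct, incorrect and unclassified cases for each level of conclusion. The record layout, the field names and the notion of "unclassified" (no conclusion produced) are assumptions for illustration only, not the actual DELFI-2 batch report format.

```python
from collections import Counter

# each case: the confirmed clinical diagnosis and the system's conclusions per level
CASES = [
    {"type": ("hepatocellular", "hepatocellular"),
     "nature": ("benign", "benign"),
     "final": ("polycystic disease", ["polycystic disease", "liver cysts"])},
    {"type": ("biliary-obstructive", None),
     "nature": ("malignant", "malignant"),
     "final": ("pancreatic carcinoma", [])},
]

def summarise(cases):
    """Count correct / incorrect / unclassified cases per conclusion level."""
    table = {level: Counter() for level in ("type", "nature", "final")}
    for case in cases:
        for level in ("type", "nature"):
            truth, concluded = case[level]
            if concluded is None:
                table[level]["unclassified"] += 1
            else:
                table[level]["correct" if concluded == truth else "incorrect"] += 1
        truth, differential = case["final"]
        if not differential:
            table["final"]["unclassified"] += 1
        elif differential[0] == truth:
            table["final"]["correct"] += 1
        else:
            table["final"]["incorrect"] += 1
        # also record whether the clinical diagnosis appears anywhere in the differential
        table["final"]["clinical diagnosis among conclusions"] += int(truth in differential)
    return table

for level, counts in summarise(CASES).items():
    print(level, dict(counts))
```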
The reader should note that the system does not reach 100% correctness, for reasons discussed in Section 3.1.

Table 1: Diagnostic results of HEPAR for a population of 82 patients with hepatobiliary disease.
Conclusion                               Correct    Incorrect   Unclassified   Total
                                         n (%)      n (%)       n (%)          n (%)
Type of hepatobiliary disorder           74 (90)    4 (5)       4 (5)          82 (100)
Benign/malignant nature of disorder      78 (95)    4 (5)       0 (0)          82 (100)
Final diagnosis                          71 (86)    8 (10)      3 (4)          82 (100)
Clinical diagnosis among conclusions     76 (93)    3 (4)       3 (4)          82 (100)

To obtain some insight into the capabilities of the system in handling incomplete data, the batch version of the expert-system shell can select part of the patient data from the database. For example, the system is capable of selecting only data obtained from history and physical examination. An example of a table in which the results for incomplete data are produced using this environment is shown in Table 2. As can be seen in Table 2, the number of correct final diagnoses decreased considerably when the patient data entered into the system were more incomplete. However, the percentage of incorrectly classified cases did not increase significantly; only the percentage of unclassified cases did.

Table 2: Assessment of the effects of incompleteness of information on the diagnostic conclusions of the system, for a database of 82 patients with hepatobiliary disease.
                                          Correct (%)      Incorrect (%)     Unclassified (%)
Conclusion                                A    B    C      A    B    C       A    B    C
Type of hepatobiliary derangement         90   90   50     5    5    0       5    5    50
Benign or malignant nature of disorder    95   95   94     5    5    6       0    0    0
Final diagnosis                           86   49   24     10   9    10      4    43   66
A: All available data presented to system.
B: Only data concerning symptoms, signs, haematology and blood chemistry (no data from ultrasound or serology presented).
C: Only data from medical interview and physical examination.

Tables such as those presented here were used by the hepatologist as an indication of the effects of changes to the knowledge base. An accompanying textual report provided information for each individual patient, and also gave information about the rules applied in deriving the final conclusions. This information served as a point of departure for a more in-depth study of the reasoning behaviour of the system.

4.3 Dynamic analysis of the HEPAR knowledge base
In the previous section, we have discussed how the study of the results of the HEPAR system for individual patient cases has been employed for refining the diagnostic quality of HEPAR. A second source of information that has been used for the refinement of the system was the contents of the HEPAR knowledge base itself, by studying the overall behaviour of the system when provided with a complete database of patient data. These tools bear some resemblance to the dynamic analyser described in Section 4.1. In order to obtain information concerning the frequency of rule application over a given database of patient cases, the testing environment discussed in Section 4.2 includes a collection of tools which use the report produced by the batch version of the expert system for a database of patient cases. The following results are produced by these programs (a minimal sketch of such a rule-frequency count is given after this list):
• An enumeration of all production rules used, with, for each rule, information about how often it has been used for a given database;
• An overview of the frequency distribution of the rule application, both in textual and in graphical form.
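The sketch below shows a minimal rule-frequency count of this kind. The report format (one "applied rule <number>" entry per line) and the rule numbers are purely hypothetical; the actual DELFI-2 batch report layout is not described in enough detail here to reproduce it.

```python
from collections import Counter

# hypothetical lines from a batch report; the real report format will differ
REPORT_LINES = [
    "case 1: applied rule 220",
    "case 1: applied rule 370",
    "case 2: applied rule 220",
    "case 2: applied rule 5020",
]

def rule_frequencies(lines):
    """Count how often each rule was applied over the whole case database,
    then group the rule numbers by application frequency, as in the textual report."""
    counts = Counter(int(line.rsplit("rule", 1)[1]) for line in lines if "applied rule" in line)
    by_frequency = {}
    for rule, freq in counts.items():
        by_frequency.setdefault(freq, []).append(rule)
    return {freq: sorted(rules) for freq, rules in sorted(by_frequency.items())}

print(rule_frequencies(REPORT_LINES))   # e.g. {1: [370, 5020], 2: [220]}
```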
Figure 4, which was automatically produced by the environment, contains the results after refinement of the HEPAR knowledge base for the 82 patient cases from Leiden University Hospital. Most production rules (72 of the about 500 rules contained in HEPAR) were applied only once. The accompanying textual form, which is reproduced in Figure 5, shows that of the rules that were applied several times, those with the highest frequency were applied to conclude about the intermediate hypotheses. Only a few (one to three) production rules were applied several times to reach a final conclusion. Because we have tried to obtain a rulebase in which at most a few rules will succeed for a given case, the report does not provide results concerning failed rules. The reports were studied by the hepatologist involved in the project as another source for the refinement of HEPAR.

[Figure 4: Rule-application bar graph for 82 patients (axes: absolute frequency, number of rules).]

5 Discussion
Recent knowledge-engineering methodologies place considerable emphasis on the development of conceptual models. A suitable conceptual model may be of real help in designing and implementing an expert system, as was also observed in the development of the HEPAR system, where a diagnostic problem-solving strategy was used as the basis for problem decomposition and structuring of the knowledge base. However, in many fields of medicine, the development of an expert system is only possible with sufficient experimental feedback, for which software tools are required.
Some other software tools supporting the building of rule-based expert systems have been developed in the past. Teiresias was an experimental tool that assisted in the refinement of rule-based expert systems by interacting with the user in the analysis of the conclusions concerning single cases, applying meta-knowledge about the rulebase [4]. Although such an analysis is certainly useful, an approach as embodied by Teiresias does not give information about how well the system performs over a database of cases. Seek is a system that automatically suggests generalizations and specializations of production rules, based on the analysis of the success and failure of rules in processing case data [17]. This system is more in line with our approach. The broadness of the domain of hepatology, and the amount of patient data incorporated in the HEPAR system, suggest that automatic techniques as provided by Seek are as yet not powerful enough to be applied for refining a system like HEPAR. The parameters used as a point of departure for the refinement of the HEPAR knowledge base are only a few of the many that are possible. Another elegant example of
Another elegant example of 10 FREQUENCY #RULES RULE NUMBER 1 72 161, 500, 510, 660, 700, 720, 790, 800, 860, 900, 930, 950, 980, 1100, 1140, 1150, 1330, 1340, 1370, 1410, 1480, 1540, 1610, 1630, 1710, 1730, 1820, 1980, 2000, 2010, 2030, 2050, 2140, 2240, 2250, 2380, 2530, 2680, 2690, 2750, 3011, 3020, 3022, 3090, 3100, 3101, 3120, 3160, 3171, 3192, 3220, 3340, 3370, 3425, 3430, 3450, 3460, 3470, 3520, 3530, 3610, 3780, 3820, 3870, 3902, 3930, 3960, 4000, 4111, 4122, 4162, 4260 2 26 110, 120, 130, 440, 460, 470, 740, 830, 1090, 1130, 1490, 1940, 1950, 2130, 2180, 2370, 3010, 3041, 3110, 3390, 3410, 3427, 3428, 3440, 3920, 3940 3 30 111, 112, 131, 132, 140, 190, 230, 240, 560, 580, 600, 620, 1220, 1900, 1970, 2060, 2080, 2110, 2170, 2220, 2280, 2310, 2720, 3190, 3350, 3970, 4230, 4240, 4250, 4252 4 7 260, 341, 890, 1580, 2160, 3080, 4320 5 8 1550, 1560, 2120, 2350, 3001, 3060, 3102, 3420 6 7 160, 300, 2330, 3050, 3070, 3900, 3903 7 11 90, 100, 332, 1210, 1570, 2200, 2320, 2340, 2970, 3000, 3910 8 4 270, 290, 340, 350 9 2 150, 180 10 3 250, 4080, 4310 12 2 540, 3103 18 2 220, 4020 22 1 370 26 1 5010 27 1 5000 30 1 5030 36 1 711 45 1 5020 Figure 5: Textual rule application form. knowledge-base refinement has been proposed by Adlassnig and Scheithauer in the context of the CADIAG-2/PANCREAS system [1]. They have used ROC curves for the optimal adjust- ment of the internal classification threshold in this expert system. The technique is, however, not applicable in the HEPAR system, because here failure of classification is the result of logical falsification, and not of the failure of reaching an internal threshold. Most current expert-system shells and expert-system builder tools do not offer facilities supporting the refinement of an expert system by dynamic validation. In the development of the HEPAR system, we have therefore developed a collection of simple software tools that provide useful information for the refinement of the system. These tools have also been used in two successive final validation studies of HEPAR [10, 12]. Although there are many ways in which these tools can be improved, these validation studies would not have been possible without the assistance of these tools. In our opinion, future software tools for building expert systems should offer a wider range of facilities for the detailed analysis, verification and validation of an expert system than is currently provided. References [1] K.P. Adlassnig and W. Scheithauer, Performance evaluation of medical expert systems using ROC curves, Computers and Biomedical Research 22 (1989) 297-313. [2] A. Anjewierden, J. Wielemaker, C. Toussaint, Shelley – Computer aided knowledge engineering, in: B. Wielinga, J. Boose, B. Gaines, G. Schreiber and M. van Someren, eds., Current Trends in Knowledge Acquisition (IOS Press, Amsterdam, 1990) 41-59. 11 [3] J. Breuker and B. Wielinga, Models of expertise in knowledge acquisition, in: G. Guida and C. Tasso, eds., Topics in Expert Systems Design (North-Holland, Amsterdam, 1989) 265-295. [4] R. Davis and D.B. Lenat, Knowledge-based Systems in Artificial Intelligence (McGraw- Hill, New York, 1982). [5] S.L. Graham, P.B. Kessler and M.K. McKusick, Gprof: a call graph execution profiler, in: Proceedings of the SIGPLAN’82 Symposium on Compiler Construction, SIGPLAN Notices 17 (1982) 120-126. [6] G. Guida and C. Tasso, Building expert systems: from life cycle to development method- ology, in: G. Guida and C. 
Tasso, eds., Topics in Expert Systems Design: methodologies and tools (North-Holland, Amsterdam, 1989) 3-24.
[7] J. Hilden and J.D.F. Habbema, Evaluation of clinical decision aids – more to think about, Medical Informatics 15 (1990) 275-284.
[8] C.A. Kulikowski and S.M. Weis, Representation of expert knowledge for consultation: the CASNET and EXPERT projects, in: P. Szolovits, ed., Artificial Intelligence in Medicine (Westview Press, Boulder, 1982) 21-56.
[9] P.J.F. Lucas, Knowledge Representation and Inference in Rule-based Systems. Centre for Mathematics and Computer Science, Report CS-R8613, Amsterdam, 1986.
[10] P.J.F. Lucas, R.W. Segaar, A.R. Janssens, HEPAR: an expert system for the diagnosis of disorders of the liver and biliary tract, Liver 9 (1989) 266-275.
[11] P.J.F. Lucas, A.R. Janssens, Development and validation of HEPAR, an expert system for the diagnosis of disorders of the liver and biliary tract, Journal of Medical Informatics 16 (1991) 259-270.
[12] P.J.F. Lucas, A.R. Janssens, Second evaluation of HEPAR, an expert system for the diagnosis of disorders of the liver and biliary tract, Liver 11 (1991) 340-346.
[13] P.J.F. Lucas, L.C. van der Gaag, Principles of Expert Systems (Addison-Wesley, Wokingham, 1991).
[14] P. Matzen, A. Malchow-Møller, J. Hilden, C. Thomsen, L.B. Svendsen, J. Gammelgaard, E. Juhl, Differential diagnosis of jaundice: a pocket diagnostic chart, Liver 4 (1984) 360-71.
[15] E. Motta, T. Rajan and M. Eisenstadt, A methodology and tool for knowledge acquisition in KEATS-2, in: G. Guida and C. Tasso, eds., Topics in Expert Systems Design (North-Holland, Amsterdam, 1989) 297-322.
[16] E. Motta, T. Rajan, J. Domingue and M. Eisenstadt, Methodological foundation of KEATS, the knowledge engineer's assistant, in: B. Wielinga, J. Boose, B. Gaines, G. Schreiber and M. van Someren, eds., Current Trends in Knowledge Acquisition (IOS Press, Amsterdam, 1990) 257-275.
[17] P.G. Politakis, Empirical Analysis for Expert Systems (Pitman, London, 1985).
[18] R.E. Shannon, Systems Simulation: the art and science (Prentice-Hall, Englewood Cliffs, New Jersey, 1975).
[19] I. Sommerville, Software Engineering (Addison-Wesley, Wokingham, 1992).

work_345bu7ezubahfezuj3appo2jgy ----
AN EXPERT SYSTEM APPROACH FOR THE INTERNAL AUDIT OF ISO 9001: 2015
DOI: 10.17261/Pressacademia.2018.854
PAP- V.7-2018(7)-p.47-51
Emine Kizmaz1, Esma Deveci2, Huseyin Selcuk Kilic3
1Marmara University, Goztepe Yerleskesi 34722 Kadıköy, Istanbul, Turkey. eminekizmaz@marun.edu.tr, ORCID: 0000-0002-0009-6175
2Marmara University, Goztepe Yerleskesi 34722 Kadıköy, Istanbul, Turkey. esmadeveci@marmara.edu.tr, ORCID: 0000-0002-6601-9922
3Marmara University, Goztepe Yerleskesi 34722 Kadıköy, Istanbul, Turkey. huseyin.kilic@marmara.edu.tr, ORCID: 0000-0003-3356-0162
To cite this document
Kizmaz, E., Deveci, E., Kilic, H. S. (2018). An expert system approach for the internal audit of ISO 9001: 2015. PressAcademia Procedia (PAP), V.7, p.47-51.
Permanent link to this document: http://doi.org/10.17261/Pressacademia.2018.854
Copyright: Published by PressAcademia and limited licenced re-use rights only.

ABSTRACT
Purpose- This research proposes an expert system approach for the internal audit of ISO 9001: 2015.
Methodology- The International Organization for Standardization (ISO) 9001 Certificate is one of the main indicators of a good organization. One of the important steps of providing ISO 9001 quality management system is the internal audit. Depending on its importance, this study aims to facilitate the internal audit process with respect to ISO 9001: 2015 accreditation via Expert System approach. Expert systems are knowledge based systems which can be as successful as human experts in the solution of complicated problems by utilizing the expertise and knowledge of experts. However, rule based approach is used in the proposed expert system. Findings- The internal audit evaluations and scoring reports regarding ISO 9001: 2015 directives are obtained as a result of the expert system. Conclusion- The study provides an expert system for internal audit of ISO 9001: 2015 which is essential for all the organizations. With the proposed expert system, the organizations will be able to operate the internal audit process in a systematic way without the need of an expert. Moreover, the deficiencies that do not comply with ISO requirements can be easily identified. Keywords: Expert system, internal audit, ISO 9001: 2015 JEL Codes: L15, L50, O31 1. INTRODUCTION Expert systems are computer programs with intelligence and knowledge that resemble field experts in the solution of problems in a particular complex area. Initial work on expert systems began in the late 1950s, and nowadays many areas are being used, including medicine, geology, mathematics, chemistry, computer technology, management and military. The fact that the speed of change in technology and science is high today has brought the needs of experts in various fields to the highest level. The intense competition environment requires the right information, fast and at the lowest possible cost. The fact that the number of experts working on quality is low, the time it takes for quality specialists to grow up and the cost of operating these people is costly, has led to the idea of developing expert systems in this field. The objective of this study is to suggest an expert system approach for facilitating the internal audit process with respect to ISO 9001: 2015 QMS. ISO 9001 QMS Standardization is one of the standards of quality assurance system that has been widely accepted today. Implementation and continuity of ISO 9001 QMS require expertise for all organizations which are producing products and services. The continuity of the quality system established in the organizations must be ensured. It is important that internal audits are carried out to ensure that the ISO 9001 structure is applied at this stage and the sustainability of the ISO 9001 organization is ensured. In addition, these internal audits determine the deficiencies in the system, but there is no study to measure the audited ISO 9001 QMS sensitivity and system success. ISO 9001: 2008 QMS Standards are updated to ISO 9001:2015 QMS Standards. The firms must adapt their existing QMSs according to the standards of ISO 9001: 2015 until September 15, 2018 (URL 1). In this study, an expert system approach for measuring the success of the ISO 9001 QMS items and the system in general is proposed. With this expert system approach, internal audits can be performed more quickly and the system's performance can be measured. The remainder of this study is organized as follows: “Section 2” includes the literature review. “Section 3” provides the data and methodology. 
Findings are provided in "Section 4" and finally "Section 5" contains the conclusion, followed by the references.

2. LITERATURE REVIEW
The literature review of this study is based on two parts: Quality Management System (QMS) and Expert Systems (ES). The related studies on these two main topics are given below.

2.1. Quality Management System (QMS)
The concept of quality has been used since ancient times. After the industrial revolution, with the increase of mechanization, the production style shifted from workshop-type production to factory and mass production. As living conditions changed for much of the population, quality emerged as a distinct concept in the 19th century (Montgomery, 2009). Many developments happened related to quality. Juran published the Quality Control Handbook (1951) and separated quality into two components: quality of design and quality of conformance (Reeves and Bednar, 1994). Deming and Juran went to Japan to give quality-related lectures. Deming pointed out that many problems arising in production originated from the process, and that this could be controlled using statistical methods. Juran set up a managerial approach to quality control and focused on project-based teamwork and client satisfaction. Crosby revealed the concept of "zero error" (Günaydın, 2001). Moreover, at the end of the 1980s, the automotive industry started to apply Statistical Process Control (SPC). The Malcolm Baldrige National Quality Award (1986) for American companies and the European Foundation for Quality Management (EFQM) quality award for European companies began in 1992 (Boran, 2000).

Regarding quality management systems, the International Organization for Standardization (ISO) is the world's largest developer of quality standards. ISO was set up in Geneva, Switzerland in 1947. The International Standards which ISO develops are beneficial because they make the development, manufacturing and supply of products and services more effective, reliable and cleaner (Shouman et al., 2009). Gotzamani and Tsiotras (2001) emphasize that ISO 9000 standards enable companies to design and utilize an efficient and active quality system with the consideration of continuous improvement and adaptation. Arditi and Gunaydın (1997) point out that the ISO 9000 standards are composed of two basic concepts: quality management and quality assurance. However, besides implementing ISO standards in the company, the internal audit is also important. Internal audit is a function that controls and analyzes the processes and procedures within the organization, suggests improvement, performs risk analysis, and serves all the interest groups both inside and outside the company (Aslan and Özçelik, 2012). According to Türedi (2012), total quality management and internal auditing are two mutually reinforcing management elements that progress through innovations in achieving the company's goals; they interact and help each other to develop. Hence, depending on its importance, the focus of this study is on internal audit.

2.2.
Expert Systems (ES) Expert systems are problem-solver or decision-maker software packets that could achieve a stage of performance commonly in attenuated problem areas. Expertise is transferred to the computer from experts. This transmitted information is kept in the computer. Then users operate the computer for particular suggestions as required. The expert system inquiries about cases and could make inferences and attain to a particular result. Afterwards, such as a human advisor, it informs one who is not a specialist and declares, if required, the logic underlying the suggestion (Aronson et al., 2005). Initial work on the expert system began towards the end of 1950s and the leading system in this field is DENDRAL. This system was initiated in 1965 by E. Feigenbaum and colleagues at Stanford University in the USA to provide assistance in the identification of the structure of an organic compound by mass spectrogram and raw chemical formula (Altuntaş and Çelik, 1998). Expert system methodologies are based on eleven parts including knowledge-based systems, rule-based systems, fuzzy expert systems, neural networks, case-based reasoning, object- oriented methodology, intelligent agent systems, system architecture, modeling, database methodology and finally ontology (Liao, 2005). Regarding the structure of the expert systems, it can be stated that there is not a standard structure and the main reason why expert systems cannot have a standard structure is that each structure is more suitable for another application than the other. In general, although expert system structures differ from each other, the components that make up the expert system structure can be described as knowledge base, working memory, inference engine, explanatory system, knowledge acquisition facility and user interface (Fidan, 1994). 3. DATA AND METHODOLOGY The aim of this study is primarily to develop an expert system that can be used by all companies. This expert system is a program established within the scope of ISO 9001: 2015 QMS, to use in internal audits and to measure achievement. With this program, the deficiencies of the QMS used can be identified. The accuracy of the output of this program depends on the truthfulness of the answers to the questions to be asked and their objectivity. For this expert system, some rules were developed and all the rules were built on each part of the QMS. The weight points and the scoring structure were created and, as a result, the scoring structure was combined with the expert system. Thus, an expert system structure consisting of knowledge base, working memory, extraction engine, explanatory system, information acquisition facility and user interface was established and a system supporting the companies was established. Knowledge acquisition facility is based on ISO 9001:2015 QMS Standards in proposed expert system approach. The knowledge base names and rule numbers are shown in Table 1 as follows. 4th Global Business Research Congress (GBRC - 2018), Vol.7-p.47-51 Kizmaz, Deveci, Kilic _____________________________________________________________________________________________________ DOI: 10.17261/Pressacademia.2018.854 49 PressAcademia Procedia Table 1: Knowledge Bases and Number of Rules These rules constitute the knowledge base of the expert system. The questions created by these rules can be answered in three ways as “Yes, No or Partially”. These responses are retrieved by the inference engine for use and sent to the working memory. 
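As a concrete, purely hypothetical illustration of the interplay between the rule base, the user's Yes/No/Partially answers and the working memory described above, the following Java sketch stores answers as facts and fires simple condition–result rules against them. It is not the authors' implementation; the question identifiers, the clause reference and all other names are invented for this example.

```java
import java.util.*;

// Hypothetical sketch: audit answers become facts in a working memory,
// and rules whose condition parts are satisfied have their result parts executed.
public class AuditWorkingMemory {

    enum Answer { YES, PARTIALLY, NO }

    // A rule pairs a set of required answers (condition part) with a result message.
    record Rule(Map<String, Answer> conditions, String result) {}

    public static void main(String[] args) {
        // Working memory: facts collected from the question-and-answer session.
        Map<String, Answer> workingMemory = new HashMap<>();
        workingMemory.put("quality_policy_documented", Answer.YES);
        workingMemory.put("policy_communicated_to_staff", Answer.NO);

        List<Rule> rules = List.of(
            new Rule(Map.of("quality_policy_documented", Answer.YES,
                            "policy_communicated_to_staff", Answer.NO),
                     "Raise finding: policy exists but is not communicated (illustrative clause 5.2.2)"));

        // Fire every rule whose condition part matches the facts in working memory.
        for (Rule r : rules) {
            boolean satisfied = r.conditions().entrySet().stream()
                    .allMatch(e -> e.getValue().equals(workingMemory.get(e.getKey())));
            if (satisfied) {
                System.out.println(r.result());
            }
        }
    }
}
```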
Later the inference engine carries out the processes necessary for interrogating these knowledge bases by searching through the forward chained control strategy. The inference engine here, in order to initiate a rule, searches the case list for cases that satisfy the condition part of the rule, and performs the result part of that rule if it finds these cases. The designed system consists of two parts which are intertwined; The Internal Audit Part (1) and The Success Measurement (Scoring) Part (2). ISO 9001 QMSs, expert systems and internal audit related literature search and consulted experts have played an important role to identify the problem. The questions are prepared by the information obtained from experts and the accreditation of ISO 9001: 2015. Depending on the answers received from the user, the question flow can change. In internal audit part, there is a knowledge base that belongs to 4 processes of organizations. Each knowledge base is divided into the questions determined within the scope of ISO 9001: 2015 standards. A number of different questions are asked to the user regarding the auditing of the system under each process heading. The important point here is that the expert system determines the questions that the user will ask in the next step, according to the answer given by the user. In other words, the “No” answer given by the user to some questions may skip the next question and lead to other questions so it makes the expert system more complex. Depending on the problem type, the user is offered a choice of two kinds of answers; Yes / No or Yes / Partially / No. The approach controlled by the four process expert systems does not have the same designation. At this point, each Weight Score (WS) of processes is determined by information from experts and by literature review. The weight scores of these operations are given in Table 2 below. Table 2: Knowledge Bases and Weight Scores Here, each process is determined according to the importance of the Weight Score (WS) in the process and the Quality Management Standards. The Process Score (PS) for each process is calculated as follows. Process Total Score (Process TS) = 10 * Y (Yes) + 5 * P (Partially) + 0 * N (No) Process Maximum Score (Process MS) = number of questions answered * 10 Process Success Percentage (Process SP) = Process TS * 100 / Process MS Process Score (The value of the WS that corresponds to the Process' success) = Process WS* Process SP / 100 There are explanations in the software section of the program with respect to scoring information (section where constants and variables are assigned), functions, rules, score calculations, and module display sections found in this expert system. These explanations provide information about the processes that ensure the program to run and provide its results. This program is prepared using JAVA programming language. The implementation of the proposed expert system approach under the ISO 9001 QMS in an enterprise is accomplished. The questions that is asked by the proposed expert system during the audit are directed to the relevant people and the responses received are entered into the expert system. 
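The scoring formulas above translate directly into code. The following Java sketch is an illustration only, with invented example answer counts rather than data from the study; it computes each Process Score from the Yes/Partially/No counts and combines them into the weighted system total using the weight scores of Table 2.

```java
// Minimal sketch of the scoring rules described above (hypothetical counts and names).
public class ProcessScoring {

    /** Returns the weighted contribution of one process to the final score. */
    static double processScore(int yes, int partially, int no, double weightScore) {
        int answered = yes + partially + no;
        double totalScore = 10.0 * yes + 5.0 * partially;      // Process TS
        double maxScore = answered * 10.0;                      // Process MS
        double successPercent = totalScore * 100.0 / maxScore;  // Process SP
        return weightScore * successPercent / 100.0;            // Process Score (PS)
    }

    public static void main(String[] args) {
        // Example answer counts for the four processes (illustrative only).
        double finalScore =
              processScore(20, 3, 1, 30)   // Management Responsibility Process
            + processScore(22, 4, 0, 30)   // Quality Management System Process
            + processScore(30, 6, 2, 24)   // Production (New Product) Process
            + processScore(15, 4, 1, 16);  // Purchasing Process
        System.out.printf("System final score: %.1f / 100%n", finalScore);
    }
}
```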
Table 1: Knowledge Bases and Number of Rules

  Knowledge Base   ISO 9001 Quality System Submission subject to Audit   Number of Rules
  Process 1        Management Responsibility Process                     24
  Process 2        Quality Management System Process                     26
  Process 3        Production (New Product) Process                      38
  Process 4        Purchasing Process                                    20

Table 2: Knowledge Bases and Weight Scores

  Knowledge Base   ISO 9001 Quality System Submission subject to Audit   Weight Score
  Process 1        Management Responsibility Process                     30
  Process 2        Quality Management System Process                     30
  Process 3        Production (New Product) Process                      24
  Process 4        Purchasing Process                                    16
  TOTAL SCORE                                                            100

4. FINDINGS
After applying the expert system in a company in the plastics sector, the related scores are obtained. Taking a 70% score as the threshold for success, none of the processes has a success percentage below 70%. Since the final system score is 90.5%, it can be said that the internal audit of the company is successful and passes the required score threshold.

Table 3: Summary Display of Implementation Results and Success Measurement of the Company

5. CONCLUSION
Expert systems are knowledge based systems with software and hardware, which can be as successful as human experts in solving complicated problems by using the expertise and knowledge of experts. Expert systems have benefits such as using time efficiently, reducing costs, increasing productivity, and reducing errors. The ISO 9001: 2015 standard anticipates continuous improvement, and internal audit is one of the main processes that must be used for continuous improvement. In this study, the expert system approach is combined with a scoring technique to ensure that the quality systems of companies are checked according to the ISO 9001: 2015 standards before the quality control, and that the necessary improvements are made. The proposed expert system is based on rules and knowledge. For this reason, questions were prepared with the information obtained from ISO and quality management experts, and the question lists of the processes were determined. Since the new ISO 9001: 2015 accreditation is process based, the four main processes of the organizations are discussed. These processes have been prepared through extensive research and consultation of experts. The Management Responsibility Process, Quality Management System Process, Production (New Product) Process and Purchasing Process are the processes integrated into the system. The proposed system can carry out the internal audit process without the need for experts, which reduces the cost. Also, companies often try to keep the internal audit process as short as possible because they do not want to waste their workforce. Moreover, the system performance can be measured together with the scoring system. Hence, this system provides time saving and increases productivity and continuity. In further studies, the expert system approach proposed in this study can be improved or modified depending on technological developments and developments in the ISO QMS processes. Accordingly, more compatible systems can be used instead of the JAVA programming language, depending on developments that may occur in the software system.

REFERENCES
Altuntaş, E., Çelik, T. (1998). Yapay zekanın tarihçesi. Otak Yayıncılık.
Arditi, D., Gunaydin, H. M. (1997).
Total quality management in the construction process. International Journal of Project Management, 15(4), 235-243.
Aronson, J. E., Liang, T. P., Turban, E. (2005). Decision support systems and intelligent systems. Pearson Prentice-Hall.
Aslan, S., Özçelik, H. (2012). İç denetim ve toplam kalite yönetimi ilişkisi. Uluslararası Yönetim İktisat ve İşletme Dergisi, 5(10), 109-119.
Boran, S. (2000). Toplam kalite yönetimi.
Fidan, S. (1994). Endüstri mühendisliğinde uzman sistemler ve proje yönetim yazilimi seçiminde bir uzman sistem yaklaşimi. Doctoral Dissertation.
Gotzamani, K. D., Tsiotras, G. D. (2001). An empirical study of the ISO 9000 standards' contribution towards total quality management. International Journal of Operations & Production Management, 21(10), 1326-1342.
Günaydın, H. M. (2001). Toplam kalite yönetimi. Mimarlar Odası İzmir Şubesi.
Liao, S. H. (2005). Expert system methodologies and applications—a decade review from 1995 to 2004. Expert Systems with Applications, 28(1), 93-103.
Montgomery, D. C. (2009). Introduction to statistical quality control. John Wiley & Sons (New York).
Reeves, C. A., & Bednar, D. A. (1994). Defining quality: alternatives and implications. Academy of Management Review, 19(3), 419-445.
Shouman, M., Eldrandaly, K., & Tantawy, A. (2009). Software quality assurance models and expert systems.
Türedi, S. (2012). İç kontrol sistemi ve toplam kalite yönetimi ilişkisi. Uluslararası Alanya İşletme Fakültesi Dergisi, 4(1).
URL 1: https://www.iso.org/iso-9001-revision.html (11.12.2017)

work_34b345ljjbbzpisupoqauu7yee ---- Expert Systems and the Emergence of Teledesign

Anthony Crabbe, Dept of Design, Nottingham Trent University. anthony.crabbe@ntu.ac.uk
Preprint of article published in Design Studies 25(4): 415-423

ABSTRACT
This paper considers the extent to which the amateur use of expert systems for home design challenges traditional views of the design process. The issues are examined in the context of competing definitions of design. The emergence of a design process characterised as "teledesign" is then considered, wherein retailers provide a CAD/CAM service to consumers, allowing the latter to use expert systems to modify template designs and get products fabricated to their own specifications. Such a system may be seen to empower consumers as designers, rather than just selectors of products, and would differ considerably from established paradigms of design, manufacture and consumption, such as that given by Baudrillard.

Keywords: Expert systems, design theory, architectural design, design process, design models.

1. Introduction
An "expert system" is a computer program which performs many functions normally done by human experts. Expert systems generally comprise an interface between the user and a knowledge system that is built out of the experiences of expert practitioners in the given field. Generally, the interface comprises the communication hardware and software needed to enable the user to interact with the computer, and it may also include other peripherals such as sensing or manufacturing equipment. The knowledge system comprises two main parts.
The first is the "knowledge base", which stores expert human knowledge on a subject in a formal and hierarchical way, containing definitions like "A = list 7", and rule-based statements, such as "if A and B, then C, not D". The second part is the "inference engine", a program which determines the consequences of a user action or command by handling each rather like a query, searching for the closest match between requested actions and pre-set rules. By such means (usually invisible to the user), the engine finds and applies the closest "expert" solution for each user action. The characteristics of expert systems may be observed in recent word processor applications, where the program goes even further than performing the "craft" role of the typesetter and plays a "professional" role in, for instance, editing the grammar of the author's original text input.

Other examples of expert systems are medical diagnosis programs and the drafting of legal contracts. In design software, professional expertise is often incorporated within a straightforward drafting application, such as 3D Studio, which allows the mechanical behaviour of the model structure to be analysed by application of engineering formulae that provide a simulation known as "finite element analysis". The use of such computer applications is now pervasive in the professional design world, and the expertise is often disguised, being embedded as just another program function, or a command on the menu bar. The development of new expert design applications raises three major issues concerning the future nature of the design profession. The first concerns the extent to which expert systems are evolving either as assistants or as competitors to the professional. The second issue concerns the extent to which use of expert systems may alter notions of what the design process involves. The third concerns the ways in which the retailer may come to exploit the potential of expert systems as a customer service. These issues will now be examined by general reference to the medical and design professions and by specific reference to design of the home.

2. Expert systems in medicine
Medicine is a profession where expert systems are developing vigorously. Whole teams of doctors are developing expert databases for their own consulting areas; others are aimed at a much broader healthcare market, providing doctors, auxiliaries and even patients with such things as a preliminary diagnosis. 5GL Doctor is an example of the latter, being aimed at markets in the developing world, where access to a doctor may be difficult (see Figure 1). Accordingly, the system may be consulted at differing levels of expertise. At present, the advised role of the expert system in medicine is that of an assistant in practice management, finding data and second opinion. The main reservation within the profession seems to concern misuse of the assistant, for instance allowing it to make decisions in place of the professionals who properly carry the responsibility and indemnification.

Figure 1

There appear to be mixed feelings about the future effect of expert medical systems, particularly in light of their effect on the current roles played by professionals. The expert application can draw on a potentially limitless databank of case history and, unlike the doctor, can give an immediate quantitative calculation of the probability that its diagnosis is accurate.
With the right peripherals, the application might allow someone with lesser qualifications to conduct many examinations. In time, such developments could limit the general practitioner's role much more to surgical procedures, or to matters of strategy, such as practice management. To the optimists, this would represent only a change in role due to technological innovation, to the pessimists, a threat to job status and tenure. 3. Expert systems and design practices Expert systems are also beginning to affect the role played by professional designers. Computer technology encourages many more businesses to design and print their own materials, web pages, CDs and so forth. The graphic basis of design activity makes it particularly suitable for the functioning of home computers and thus ripe for further market penetration by the software industry. Following the boom in the home Do-It- Yourself market, there are now many cheap DIY programs offering the general public the opportunity to design much of their home environment from interiors and gardens to the house itself. Furthermore, influential retailers like Homebase, are actively encouraging this trend by creating their own software and catalogues of digital images that allow the customer to review products in a simulation of their own home and make their own Pr e- Pr int design choices. As witnessed by the popularity of the suffix "Pro" on many of these software applications, the marketing message being sent to home computer users is that computer technology will empower them to rapidly assume the creative role of professionals, in fields from which the untrained and unequipped would normally be excluded. Yet, these same applications are also used by professionals and could be seen to enhance their existing advantages over amateurs. So it is not easy to judge whether this computer empowerment is set to take work away from design professionals (as word processing did from printers), democratise the activity of designing (in an arts & crafts kind of sense), or merely maintain the creative gap between the professional and the DIY enthusiast. These questions oblige us to consider further the kinds of functions performed by those called "professional" designers, before examining how far they may be taken over by non-professionals. To characterise, even summarily, the activities undertaken by professional designers, is much harder than it would be for law or medicine. The design profession embraces a wide range of specialists, from those requiring chartered and indemnified status such as architects and engineers, to those who may simply designate themselves "designers", such as fashion and interior designers. Even well into the "Designer '80’s" the terms "commercial" and "applied" art were generally used to describe what most today would call design. Looking at design education in Britain, as recently as the 1970's, the study of design history was just a branch of art history. The teaching of design in higher education embraces a number of differing approaches, ranging from the arts and crafts like approach of the traditional art college, to the engineering and management emphasis of the traditional university. As reviewed by John Walker, recent commentators have sought to define design in conflicting ways, some in terms of arts and crafts tradition, some in terms of a common activity pertaining to manufacturing industries and yet others in terms of a generalised problem solving activity, not even restricted to professionals. 
To these views should also be added an even more pervasive one, that design amounts to a process of "valorising" utilitarian objects (in the sense of "adding" some hitherto unrecognised value to them). British versions of this idea of valorisation are to be found in ideologies as opposite as those found in the Arts and Crafts movement of the 19th Century and the Thatcher government of the 1980’s. At face value, the Thatcher government view appeared to be that design status (most obviously evidenced in "designer label" goods) added to the price that could be asked for them, and hence the profit gained from them. However, Thatcherites also seemed to have seen value addition as another instrument to be used in their bid to create a "market culture", where tensions between business and citizenry could be reconciled through a shared value system of consumer expectations and rights. In linking the role of design to social well being, Thatcherism continued to follow the precedents set by the more left-wing approach of the Pr e- Pr int Arts and Crafts movement. This influential international movement chose to valorise craft products by reference to the self-actualisation and personal fulfillment attained by those following craft processes. Such attainments were seen by the likes of William Morris to provide the bases for developing a harmonious social system, such as that imagined in News from Nowhere, where the profit motive became the least important driver of the production process. By comparison, the contemporary Italian version of valorisation seems to lay more stress on the values added by a more wilful, individualistic form of creativity, commonly associated with the activities of the artist. Such a view clearly encourages us to regard the outcomes of design as creations, rather than discoveries and to appreciate the individual and personal qualities of designed products, rather than their objective necessity and reproducibility. This bewildering array of perspectives suggests a fundamental dichotomy between design on the one hand as a description for a set of activities common to anyone involved in making and on the other hand, as a description for professional activities found in only certain types of commercial production. This makes it considerably harder to determine what role expert systems have to play in design, compared say to Medicine, and hence to predict what impact they may have on those practising design professionally. An instructive approach then, is to focus on just one example, the design of the home. The example is useful because it covers a multitude of products designed by professionals with very different status, ranging from the architect to the gardener. It is also useful because we have historical precedents for the homeowner taking over the functions of the professional. 4. Expert systems for DIY design of the home In addition to the store software mentioned earlier, there are now a number of inexpensive home computer software applications for house and garden design. 3D Dream House from Data Becker is typical. A simple command interface allows the novice to design either a new house, or simulate their existing one with a high degree of accuracy (see Figures 2-4). Expert knowledge is built into the programming so that correct parameters are automatically set for the following: plan drawing, perspective rendering, daylight simulation, wall and ceiling thickness, stair and roof geometry, window and door placement. 
For the interior, the consumer can specify from a wide range of colours, textures as well as a catalogue of furnishing items. Since these items are included as scaleable bitmaps, it would only be a very short step to manufacturers providing their own catalogues of such items for the consumer to try out in their simulated home. Pr e- Pr int Figure 2 Figure 3 Pr e- Pr int Figure 4 It would appear to be only the limitations of home computer performance and program cost that prevent inclusion of more detail such as plumbing and services, specific building regulations and materials costings. In other words, we may assume that we will shortly see expert programs which permit the homeowner to quite easily generate the complete design of a home that could be validated by regulatory authorities, without any professional input. Architects may retort that most of the results would lack the creativity, judgement and aesthetic sensitivity, which they alone could bring to the scheme and that just as with in- house graphics, there would be a proliferation of ill-conceived and unoriginal designs. This would be to emphasise a) the necessity of emotional and affective qualities in designing and b) the lack of such qualities in the mechanical operations of current computers. However, this view does not negate the great practical assistance dumb machinery can provide to human users. Following the example of graphic design, it is notable that applications like Photoshop allow a level of image manipulation and exploration that far exceeds the capacity of any designer to match using hand techniques. So expert systems present many, often unforeseen, possibilities for human review. In which case it could be argued that the primary difference remaining between the professional and DIY user of expert software, is the extent to which each has learnt to choose and act "well" from the solutions offered by the machine. As the argument shifts into consideration of design merit, we see how important the definitions of design mentioned earlier become in assessing the impact of DIY design. If, for instance, we elect to define design primarily in terms of a kind of problem solving Pr e- Pr int activity - as medical diagnosis might be chararcterised - then the only thing that really seems to separate professional from non-professional design is whether it occurs in the commercial or domestic arenas. However, doctors would be quick to affirm that proper diagnosis still depends on affects in the trained mind, like intuition and inspiration, to trigger innovative lines of enquiry, without which the science of diagnosis would never advance. This may explain the consensus in the medical profession that the expert system can never act as anything more than a passive assistant. Such a line of argument helps demonstrate the difficulty in choosing between views of design, either as problem solving, or as a problem solving/valorising activity. Those who subscribe to the former view can always argue that problem solving is not reducible simply to a set of formulaic inputs and outputs - the creative dimension is still essential and is based in human, rather than mechanical processes. Subscribers to the latter view may argue that the problem to be solved in design is broader still: it must address cultural and aesthetic needs and so the general notion of "function" in design must encompass both the utilitarian and aesthetic. 
Hence, work judged as failing to address this broader notion of function could fulfill some, but not all the requirements necessary to be counted as design "proper". These considerations make it easier to see the traditional strength of the valorising view, which helps professionals to protect their status by recourse to critical consensus. Whereas the amateur is usually designing for personal need, the professional has a vested interest in contributing through practice and commentary to the establishment of a recognisable discipline with its own rules and critical canon. Thus, the critical pitch seems very much tilted in favour of those who have established the critical canons. Yet, the significance of the non-professional contribution has often been promoted by professional champions. Historical examples are the rehabilitation of vernacular architecture by the likes of William Morris, and the artisan's aesthetic sense by Léger. The common factor linking these disparate examples is that they have come, by whatever means, to be seen as exemplars, both of products and creative processes. The primary limitation of the valorising view lies in the extent to which it naturally tends to exclude the more engineering based kind of design, so essential to technically sophisticated and mass manufactured products. Evidence of this kind of exclusion can be found in many production processes. For instance, the blueprints of many public buildings emerge from a loop where architects periodically review and amend their conceptualisations with consulting engineers. However, as the celebrity of architects like Norman Foster and Richard Rogers testifies, the credit for the cultural and aesthetic values of the final buildings is usually attributed to the creative/artistic vision of one outstanding architect, however much that individual may point to the effort of a multi- disciplinary team. Pr e- Pr int A most striking feature of current expert systems for design is that they provide no form of guidance on aesthetic principles, such as proportion, composition and colour. Unlike their historical antecedents, the hand and the pattern book, expert systems offer the novice no advice or means of exploring the different methods of composition and construction that have achieved exemplary status. It is then hard to imagine that an expert system would ever have an effect comparable to Palladio's Four Books on Architecture, which facilitated an entire amateur architectural movement amongst the landed classes of Britain and the United States in the 18th Century. In these books, the practical guidance about architectural elements presupposes that, for instance, the "proper" dimensions are those which have classic beauty as well as constructional or utilitarian function. Such an outlook remains embedded in the heart of modernist architectural practice, as in Le Corbusier's handbook for proportioning, The Modulor. Since theories of proportion can be presented by means of mathematical argument, it would not be hard to imagine the incorporation of a system like Modulor into a computer application like 3D Dream Home. On the other hand the justification for choosing the given mathematical method is clearly less objective. 
We have only to look at the present day schisms in the architectural profession between so-called "late" and "post" modernists to see how the incorporation of utilities like a proportioning system into the software would draw the DIY user even further into concerns that hitherto, had been largely professional. Palladianism seems to have emerged as a tendency exactly because the handbook Four Books on Architecture gave a wider audience access to design theory, especially the more intellectually able amateurs, who like Lord Burlington, might otherwise have continued to focus on their other interests, such as music. By comparison, current expert design systems seem utilitarian and much less useful as educational assistants, both because their knowledge base excludes cultural forms, and because they share the generic limitation, noted by some educationalists, of hiding their program arguments from the user, in the interests of functionality. Some architectural academics seek to overcome such limitations by exploiting the programs’ abilities to generate entire catalogues of possible solutions for the geometric relationships found in differing architectural types. By such means they hope to discover the "shape grammars" of architectural types like the Palladian Villa, from which they believe they could deduce rules which would rapidly lead the user to better design solutions. But the degree to which the "better" geometric solutions would be better architectural ones, is open to serious objections. As noted earlier, design problems usually require the designer to address the users’ cultural expectations, to some extent. So the assessment criteria for design success may be qualitative as well as quantitative. Public hostility in 1980’s Britain to the 1960’s high-rise dwelling type showed that however successful this type was in providing structural solutions for housing that were more cost and space efficient, the solutions were still examples of unsuccessful architecture, with many being demolished inside a fraction of their structural lifespan. It then remains to be seen what impact shape grammars will have on the expert systems Pr e- Pr int used by professional and amateur architects. The relative advantage of handbooks over expert systems is that they are created by experts whose interests reach beyond the performance of their own immediate tasks and out to the practice of others. This indicates a desire to lead the discipline, not just follow it. It is unlikely the same is true of authors of expert design systems, since they are in the main, anonymous consultants operating within a computer programming team. Seen from a market perspective, expert design systems look like another instance of the computer software industry trying to take market share off an established profession, by leading consumer demand away from professional decision making to the computer dependent kind. The professional architect may then feel that expert systems offer little threat and are unlikely to enable the amateur to produce work of comparable standard, any more than Four Books on Architecture enabled Burleigh’s Chiswick House to rival the success of it’s inspiration, Palladio’s Villa Rotonda. On the other hand, unlike handbooks, expert systems give their users the technical skills to rapidly generate schemes and make their own discoveries, free of the kind of prescription or burden of respect imposed by the handbook. 
It is to be hoped that this new functionality will, as in the shape grammar project, lead to new creative insights amongst practitioners of all kinds. It is to be feared that culturally empty expert systems will give the DIY enthusiast a limited understanding and false sense of confidence, which will lead to a proliferation of uninspiring design jobs. 5. Teledesign If only a minority of consumers have the time and inclination to use expert systems in place of professionals, expert systems still offer one more challenge to the professional designer, especially in the field of home furnishing. This is a more significant challenge since it involves both the marketing behaviour of suppliers and the shopping habits of the consumer. Tucked away in the files of 3D Dream Home is a folder of images labelled Ikea. This contains scaleable bitmap images of furnishings designed by that company. It is then easy to envisage the following scenario. There are shops and DIY stores that already offer a service whereby the shopper can provide details of say, their kitchen, to a sales assistant who can then use an easy to learn application, similar to 3D Dream Home, to make a virtual model of the kitchen. Guided by the shopper’s preferences, the assistant can drop in the images of various product ranges of kitchen fittings and indeed cost each ensemble to the customer’s satisfaction. The purchase of the final choice can then be fed into the supplier’s order system, which may in turn be linked to the manufacturer’s stock and production systems, giving both companies the benefit of "just in time" manufacture and sale. There are also after sales Pr e- Pr int benefits for manufacturer and retailer, in cases where the purchase is made electronically, rather than with cash. The companies can use the purchase transaction data to profile both their customers and each other in order to target their future marketing, exactly as is done in other stores, such as supermarkets. Going one step further, it is notable that the computer generated images of products can be easily re-dimensioned, re-coloured, indeed re-formulated, as can any images in a computer application. If the manufacturer constructs the images in a computer application that relates to his/her own design software, then these images, far from being bitmaps, could be the designs themselves that the customer modifies. Since the designer’s original software can have inbuilt expert functions, such as finite element analysis and costing, the customer’s modifications can be automatically appraised and relevant production parameters set. Because the design files can be generators for a CAD/CAM manufacturing system, the opportunity exists in the case of many home design products, for the customer’s final choice to be sent directly to the manufacturing line for fabrication as a bespoke item. Such a process could be characterised as "teledesign", a process where the consumer uses communications technology to execute the final design, remote from the design studio (represented in Figure 5). Indeed, technology for clothing design and manufacture is fast moving towards this point, with companies like Assyst and Lectra offering store-based or online software that allows customers to enter both their measurements and preferences into existing garment design templates. The customer may even design their own textile prints and the selected fabrics can then be marked and cut ready for sewing. 
Figure 5 Pr e- Pr int In this respect, expert systems seem set to have an even more profound impact upon the design profession than on medicine or law. Firstly, many designers may find themselves more engaged in the design of templates, analogous to those already found in presentation software like Powerpoint, where users may take a pre-designed graphic slide and from that adjust every element to their own needs. The final product is then created to the user’s design, following the designer’s earlier cues and hints – a system quite unlike Baudrillard’s famous characterisation of products as serial reproductions of the designers originating model. This suggests secondly, that teledesign may make the professional designer more contingent and less integral to the manufacturer’s needs. Thirdly, it suggests that the designer’s solutions will be less sacrosanct, the designer will be less of an arbiter of what is available to customers and more a facilitator of customer choice. Since few design specialisms involve indemnification, the consumer would be free to ignore professional advice in a way less likely found in medicine or law. Pictured in evolutionary terms, it would take a very strong initial design by a professional to survive without mutating into quite different forms driven by consumers’ desires. Such a production system could also lead to a different kind of valorisation of products, where the purchase of durable goods would no longer be just a matter of selection, but one of creation. There might even be parallels with the vision of News from Nowhere, where a technology (ironically, an industrial one) creates a production system that gives ordinary folk the power to create their own material and cultural environment, manifested in a sociable plurality of individual creations. However, this is only presuming as did Morris, that the creative impulse is common to most people, that they do not want to delegate creative responsibility to others (especially status figures) and thereby valorise their goods by association with a cultural milieu other than their own. Consider that Morris himself, the son of a Victorian brewery owner, aspired in thought and deed to the world views of both medieval knights and artisans. Yet, if e-commerce realises even part of its much vaunted potential, then it can be seen that expert systems would be essential tools, and teledesign even more prevalent. For instance, the consumer might store on his/her home computer, a virtual model of his home, created by expert software. This model would be linked to retailers’ catalogues, so that the consumer can review new items in situ, then customise and order his chosen design, all at his home computer. In such a future, do not be surprised to receive an unsolicited e-mail that reads "Valued Customer, our records show it is three years since you last re-furbished your kitchen. As you will see from the attached video file, our leading designers here at Ikeo have re-fitted your kitchen, using our latest Smørsgabord range, enhanced by a range of new paint finishes from…" Pr e- Pr int On the other hand, don’t be surprised to discover that it was an expert system which automatically loaded the new product ranges into the kitchen design file and the customer who took credit for the re-design. REFERENCES 1. Romiszoski, A. Artificial intelligence and expert systems in education: progress, promise and problems. Australian journal of Educational Technology, 3(1), 6-24 (1987). 2. Hillson SD. Connelly DP. Liu Y. 
The effects of computer-assisted electrocardiographic interpretation on physicians' diagnostic decisions. Medical Decision Making. 15(2):107-12 (1995, Apr-Jun). 3. Gardner RM. Lundsgaarde HP. Evaluation of user acceptance of a clinical expert system. Journal of the American Medical Informatics Association. 1(6):428-38 (1994, Nov-Dec). 4. Ridderikhoff J. van Herk E. A diagnostic support system in general practice: is it feasible? International Journal of Medical Informatics. 45(3):133-43 (1997, Jul). 5. Lovell NH. Celler BG. Implementation of a clinical workstation for general practice. Medinfo. 8 Pt 1:777 (1995). 6. Walker, JA. Design history and the history of design, pp28-32, Pluto Press, London (1990) 7. Jervis, S. The Penguin dictionary of design and designers, Introduction, Penguin, London (1984) 8. Bayley, S. In good shape: Style in Industrial Products 1900-60, Design Council, London (1979) 9. Papanek, V. Design for the real world: human ecology and social change. 2nd ed., Thames and Hudson (1985) 10. Pearman, H. Cautious optimism at Number Ten, Design , London, (March 1987) p459. 11. Morris, W. News from Nowhere and other writings, (1891), ed. Wilmer C. Penguin Books, London (1993) 12. Branzi, A. We are the primitives (1985), in Design Discourse: History, Theory, Criticisim. Ed. Margolin, V. University of Chicago Press, Chicago (1989). 13. Radice, B. Ettore Sottsass: a critical biography. Ch 3, Thames & Hudson, London (1993) 14. Léger, F. The machine aesthetic: The manufactured object, the artisan and the artist, 1924, reprinted in Benton T & Benton C. Form and function, Crosby Lockwood Staples, London (1975) 15. See e.g. Pawley, M. Norman Foster: a global architecture, Thames & Hudson, London (1999) 16. Palladio, A. The four books of architecture 1570, Trans. Ware I., Constable, London (1965) 17. Le Corbusier. The modulor. Trans de Francia P. & Bostock A. Faber, London (1954) 18. Op cit. note 1. 19. Stiny G, Mitchell W J, The Palladian Grammar Environment and Planning B: Planning and Design 5 5-18 (1978). 20. Madrazo L, The Concept of Type in Architecture, An Inquiry into the Nature of Architectural Form, Ph.D. Dissertation, University of Zurich, ETH Zentrum No. 11115 (1995) Epilogue. 21. http://www.assyst-intl.com and http://www.lectra.com 22. Baudrillard, J. The system of objects (1968), reprinted in Design after modernism: beyond the object ed. Thackera J. Thames and Hudson, London (1988) work_35b465fwlnexvbuebingsjly4y ---- doi:10.1016/j.eswa.2007.10.042 Available online at www.sciencedirect.com ARTICLE IN PRESS www.elsevier.com/locate/eswa Expert Systems with Applications xxx (2007) xxx–xxx Expert Systems with Applications Imbalanced text classification: A term weighting approach Ying Liu a,*, Han Tong Loh b, Aixin Sun c a Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong SAR, China b Department of Mechanical Engineering, National University of Singapore, 9 Engineering Drive 1, Singapore 117576, Singapore c School of Computer Engineering, Nanyang Technological University, Nanyang Avenue, Singapore 639798, Singapore Abstract The natural distribution of textual data used in text classification is often imbalanced. Categories with fewer examples are under-rep- resented and their classifiers often perform far below satisfactory. We tackle this problem using a simple probability based term weight- ing scheme to better distinguish documents in minor categories. 
This new scheme directly utilizes two critical information ratios, i.e. relevance indicators. Such relevance indicators are nicely supported by probability estimates which embody the category membership. Our experimental study using both Support Vector Machines and Naïve Bayes classifiers and extensive comparison with other classic weighting schemes over two benchmarking data sets, including Reuters-21578, shows significant improvement for minor categories, while the performance for major categories is not jeopardized. Our approach has suggested a simple and effective solution to boost the performance of text classification over skewed data sets.
© 2007 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +852 34003782. E-mail address: mfyliu@polyu.edu.hk (Y. Liu).

Keywords: Text classification; Imbalanced data; Term weighting scheme

1. Introduction

1.1. Motivation

Learning from imbalanced data has emerged as a new challenge to the machine learning (ML), data mining (DM) and text mining (TM) communities. Two recent workshops in 2000 (Japkowicz, 2000) and 2003 (Chawla, Japkowicz, & Kolcz, 2003) at the AAAI and ICML conferences, respectively, and a special issue in ACM SIGKDD Explorations (Chawla, Japkowicz, & Kolcz, 2004) were dedicated to this topic. It has been witnessing growing interest and attention among researchers and practitioners seeking solutions in handling imbalanced data. An excellent review of the state-of-the-art is given by Weiss (2004).

The data imbalance problem often occurs in classification and clustering scenarios when a portion of the classes possesses many more examples than others. As pointed out by Chawla et al. (2004), when standard classification algorithms are applied to such skewed data, they tend to be overwhelmed by the major categories and ignore the minor ones. There are two main reasons why the uneven cases happen. One is due to the intrinsic nature of such events, e.g. credit fraud, cancer detection, network intrusion, and earthquake prediction (Chawla et al., 2004). These are rare events presented as a unique category but only occupy a very small portion of the entire example space. The other reason is due to the expense of collecting learning examples and legal or privacy reasons. In our previous study of building a manufacturing centered technical paper corpus (Liu & Loh, 2007), due to the costly efforts demanded for human labeling and diverse interests in the papers, we ended up naturally with a skewed collection.

Automatic text classification (TC) has recently witnessed a booming interest, due to the increased availability of documents in digital form and the ensuing need to organize them (Sebastiani, 2002). In TC tasks, given that most test collections are composed of documents belonging to multiple classes, the performance is usually reported in terms of micro-averaged and macro-averaged scores (Sebastiani, 2002; Yang & Liu, 1999). Macro-averaging gives equal weights to the scores generated from each individual category.
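As a toy illustration of the difference between the two averaging modes (with invented counts, not data from this study), the following Java sketch computes micro- and macro-averaged F1 from per-category true-positive, false-positive and false-negative counts; pooling the counts lets a large, easy category dominate the micro-average, while the macro-average exposes a poorly performing minor category.

```java
// Toy illustration of micro- vs macro-averaged F1 over skewed categories.
public class AveragingDemo {

    static double f1(double tp, double fp, double fn) {
        double p = tp + fp == 0 ? 0 : tp / (tp + fp);
        double r = tp + fn == 0 ? 0 : tp / (tp + fn);
        return p + r == 0 ? 0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        // Per-category counts {tp, fp, fn}: one major category, one minor category.
        double[][] counts = { {900, 50, 50},   // major category, high F1
                              {5, 10, 45} };   // minor category, low F1

        double macro = 0, tp = 0, fp = 0, fn = 0;
        for (double[] c : counts) {
            macro += f1(c[0], c[1], c[2]) / counts.length;  // equal weight per category
            tp += c[0]; fp += c[1]; fn += c[2];             // pooled counts
        }
        double micro = f1(tp, fp, fn);                      // dominated by the major category

        System.out.printf("macro-F1 = %.3f, micro-F1 = %.3f%n", macro, micro);
    }
}
```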
In comparison, micro-averaging tends to be domi- nated by the categories with more positive training instances. Due to the fact that many of these test corpora used in TC are either naturally skewed or artificially imbal- anced especially in the binary and so called ‘‘one-against- all” settings, classifiers often perform far less than satisfac- torily for minor categories (Lewis, Yang, Rose, & Li, 2004; Sebastiani, 2002; Yang & Liu, 1999). Therefore, micro- averaging mostly yields much better results than macro- averaging does. 1.2. Related work There have been several strategies in handling imbal- anced data sets in TC. Here, we only focus on the approaches adopted in TC and group them based on their primary intent. The first approach is based on sampling strategy. Yang (1996) has tested two sampling methods, i.e. proportion-enforced sampling and completeness-driven sampling. Her empirical study using the ExpNet system shows that a global sampling strategy which favors com- mon categories over rare categories is critical for the suc- cess of TC based on a statistical learning approach. Without such a global control, the global optimal perfor- mance will be compromised and the learning efficiency can be substantially decreased. Nickerson, Japkowicz, and Milios (2001) provide a guided sampling approach based on a clustering algorithm called Principal Direction Divisive Partitioning to deal with the between-class imbal- ance problem. It has shown improvement over existing methods of equalizing class imbalances, especially when there is a large between-class imbalance together with severe imbalance in the relative densities of the subcompo- nents of each class. Liu’s recent efforts (Liu, 2004) in testing different sampling strategies, i.e. under-sampling and over- sampling, and several classification algorithms, i.e. Naı̈ve Bayes, k-Nearest Neighbors (kNN) and Support Vector Machines (SVMs), improve the understanding of interac- tions among sampling method, classifier and performance measurement. The second major effort emphasizes cost sensitive learn- ing (Dietterich, Margineantu, Provost, & Turney, 2000; Elkan, 2001; Weiss & Provost, 2003). In many real scenar- ios like risk management and medical diagnosis, making wrong decisions are usually associated with very different costs. A wrong prediction of the nonexistence of cancer, i.e. false negative, may lead to death, while the wrong pre- diction of cancer existence, i.e. false positive, only results in unnecessary anxiety and extra medical tests. In view of this, assigning different cost factors to false negatives and false positives will lead to better performance with respect to positive (rare) classes (Chawla et al., 2004). Brank, Grobel- nik, Milic-Frayling, and Mladenic (2003) have reported their work on cost sensitive learning using SVMs on TC. They obtain better results with methods that directly mod- Please cite this article in press as: Liu, Y. et al., Imbalanced text clas plications (2007), doi:10.1016/j.eswa.2007.10.042 ify the score threshold. They further propose a method based on the conditional class distributions for SVM scores that works well when only very few training examples are available. The recognition based approach, i.e. one-class learning, has provided another class of solutions (Japkowicz, Myers, & Gluck, 1995). One-class learning aims to create the deci- sion model based on the examples of the target category alone, which is different from the typical discriminative approach, i.e. the two classes setting. 
Manevitz and Yousef (2002) have applied one-class SVMs on TC. Raskutti and Kowalczyk (2004) claim that one-class learning is particu- larly helpful when data are extremely skewed and com- posed of many irrelevant features and very high dimensionality. Feature selection is often considered an important step in reducing the high dimensionality of the feature space in TC and many other problems in image processing and bioinformatics. However, its unique contribution in identi- fying the most salient features to boost the performance of minor categories has not been stressed until some recent work (Mladenic & Grobelnik, 1999). Yang and Pedersen (1997) has given a detailed evaluation of several feature selection schemes. We noted the marked difference between micro-averaged and macro-averaged values due to the poor performances over rare categories. Forman (2003) has done a very comprehensive study of various schemes for TC on a wide range of commonly used test corpora. He has recommended the best pair among different combina- tions of selection schemes and evaluation measures. The recent efforts from Zheng, Wu, and Srihari (2004) advance the understanding of feature selection in TC. They show the merits and great potential of explicitly combining posi- tive and negative features in a nearly optimal fashion according to the imbalanced data. Some recent work simply adapting existing machine learning techniques and not even directly targeting the issue of class imbalance have shown great potential with respect to the data imbalance problem. Castillo and Ser- rano (2004) and Fan, Yu, and Wang (2004) have reported the success using an ensemble approach, e.g. voting and boosting, to handle skewed data distribution. Challenged by real industry data with a huge number of records and an extremely skewed data distribution, Fan’s work shows that the ensemble approach is capable of improving the performance on rare classes. In their approaches, a set of weak classifiers using various learning algorithms are built up over minor categories. The final decision is reached based on the combination of outcomes from different clas- sifiers. Another promising approach which receives less attention falls into the category of semi-supervised learning or weakly supervised learning (Blum & Mitchell, 1998; Ghani, 2002; Goldman & Zhou, 2000; Lewis & Gale, 1994; Liu, Dai, Li, Lee, & Yu, 2003; Nigam, 2001; Yu, Zhai, & Han, 2003; Zelikovitz & Hirsh, 2000). The basic idea is to identify more positive examples from a large amount of unknown data. These approaches are especially sification: A term weighting approach, Expert Systems with Ap- Y. Liu et al. / Expert Systems with Applications xxx (2007) xxx–xxx 3 ARTICLE IN PRESS viable when unlabeled data are steadily available. The last effort attacking the imbalance problem uses parameter tun- ing in kNNs (Baoli, Qin, & Shiwen, 2004). The authors expect to set k dynamically according to the data distribu- tion, in which a large k is granted given a minor category. In this paper, we tackle the data imbalance problem in text classification from a different angle. We present a new approach assigning better weights to the features from minor categories. After a brief review of the classic term weighting scheme, e.g. tfidf, in Section 2 and inspired by the analysis of various feature selection methods in Section 3, we introduce a simple probability based term weighting scheme which directly utilizes two critical information ratios, i.e. relevance indicators, in Section 4. 
These relevance indicators are nicely supported by the probability estimates which embody the category membership. The setup of the experimental study is explained in Section 5. We carry out the evaluation and comparison of our new scheme with many other weighting forms over two skewed data sets. We report the experimental findings and discuss their performance in Section 6. Section 7 concludes and highlights some future work.

2. Term weighting scheme

Text classification (TC) is the task of categorizing documents into predefined thematic categories. In particular, it aims to find the mapping n from a set of documents D = {d_1, ..., d_i} to a set of thematic categories C = {C_1, ..., C_j}, i.e. n : D → C. In its current practice, which is dominated by supervised learning, the construction of a text classifier is often conducted in two main phases (Debole & Sebastiani, 2003; Sebastiani, 2002):

- Document indexing – the creation of numeric representations of documents:
  – Term selection – to select a subset of terms from all terms occurring in the collection to represent the documents in a better way, either to speed up computing or to achieve better effectiveness in classification.
  – Term weighting – to assign a numeric value to each term to weight its contribution, which helps a document stand out from others.
- Classifier induction – the building of a classifier by learning from the numeric representations of documents.

In information retrieval and machine learning, term weighting has long been formulated as term frequency times inverse document frequency, i.e. tfidf (Baeza-Yates & Ribeiro-Neto, 1999; Salton & Buckley, 1988; Salton & McGill, 1983; van-Rijsbergen, 1979). The more popular "ltc" form (Baeza-Yates & Ribeiro-Neto, 1999; Salton & Buckley, 1988; Salton & McGill, 1983) is given by

  \mathrm{tfidf}(t_i, d_j) = \mathrm{tf}(t_i, d_j) \cdot \log\left(\frac{N}{N(t_i)}\right)   (1)

and its normalized version is

  w_{i,j} = \frac{\mathrm{tfidf}(t_i, d_j)}{\sqrt{\sum_{k=1}^{|T|} \mathrm{tfidf}(t_k, d_j)^2}},   (2)

where N and |T| denote the total number of documents and unique terms contained in the collection, respectively, N(t_i) represents the number of documents in the collection in which term t_i occurs at least once, and

  \mathrm{tf}(t_i, d_j) = \begin{cases} 1 + \log(n(t_i, d_j)), & \text{if } n(t_i, d_j) > 0, \\ 0, & \text{otherwise,} \end{cases}   (3)

where n(t_i, d_j) is the number of times that term t_i occurs in document d_j. In practice, the summation in Eq. (2) only concerns the terms that occur in document d_j.

The significance of the classic term weighting schemes in Eqs. (1) and (2) is that they embody three fundamental assumptions about term frequency distribution in a collection of documents (Debole & Sebastiani, 2003; Sebastiani, 2002). These assumptions are:

- Rare terms are no less important than frequent terms – idf assumption.
- Multiple appearances of a term in a document are no less important than a single appearance – tf assumption.
- For the same quantity of term matching, long documents are no less important than short documents – normalization assumption.

Because of these, the "ltc" form and its normalized version have been extensively studied by many researchers and show good performance over a number of different data sets (Sebastiani, 2002). Therefore, they have become the default choice in TC.
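A minimal sketch of the "ltc" weighting of Eqs. (1)–(3) may make the notation concrete. This is an illustrative implementation written for this text, not the authors' software; the corpus representation (a list of token lists) and the helper name ltc_weights are assumptions for the example.

```python
import math
from collections import Counter

def ltc_weights(documents):
    """Cosine-normalized ltc tfidf weights (Eqs. 1-3).

    documents: list of token lists; returns one dict term -> weight per document.
    """
    N = len(documents)                 # total number of documents
    df = Counter()                     # N(t_i): number of documents containing t_i
    for doc in documents:
        df.update(set(doc))

    weighted = []
    for doc in documents:
        counts = Counter(doc)          # n(t_i, d_j)
        # Eq. (3): logarithmic term frequency; Eq. (1): tf * idf
        w = {t: (1 + math.log(n)) * math.log(N / df[t]) for t, n in counts.items()}
        # Eq. (2): cosine normalization over the terms occurring in d_j
        norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
        weighted.append({t: v / norm for t, v in w.items()})
    return weighted

# Toy usage
docs = [["gear", "lathe", "lathe"], ["lathe", "weld"], ["robot", "weld"]]
print(ltc_weights(docs)[0])
```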
3. Inspiration from feature selection

Feature selection serves as a key procedure to reduce the dimensionality of the input data space and save computational cost. It has been integrated as a default step for many learning algorithms, such as artificial neural networks, k-nearest neighbors, decision trees, etc. In the machine learning research community, the tradeoff between the computational constraints imposed by the high dimension of the input data space and the richness of information available to maximally identify each individual object is well known. The ability of feature selection to capture the salient information by selecting the most important attributes, and thus make the computing tasks tractable, has been shown in information retrieval and machine learning research (Forman, 2003; Ng, Goh, & Low, 1997; Ruiz & Srinivasan, 2002; Yang & Pedersen, 1997). Furthermore, feature selection is also beneficial since it tends to reduce the over-fitting problem, in which the trained objects are tuned to fit very well the data upon which they have been built, but perform poorly when applied to unseen data (Sebastiani, 2002).

Table 1. Several feature selection methods and their functions.

  Feature selection method   Mathematical form
  Information gain           P(t_k, c_i) \log\frac{P(t_k, c_i)}{P(t_k) P(c_i)} + P(\bar{t}_k, c_i) \log\frac{P(\bar{t}_k, c_i)}{P(\bar{t}_k) P(c_i)}
  Mutual information         \log\frac{P(t_k, c_i)}{P(t_k) P(c_i)}
  Chi-square                 \frac{N [P(t_k, c_i) P(\bar{t}_k, \bar{c}_i) - P(t_k, \bar{c}_i) P(\bar{t}_k, c_i)]^2}{P(t_k) P(\bar{t}_k) P(c_i) P(\bar{c}_i)}
  Odds ratio                 \log\frac{P(t_k \mid c_i) (1 - P(t_k \mid \bar{c}_i))}{(1 - P(t_k \mid c_i)) P(t_k \mid \bar{c}_i)}

  t_k denotes a term; c_i stands for a category; P(t_k, c_i) denotes the probability of documents from category c_i where term t_k occurs at least once; P(t_k, \bar{c}_i) denotes the probability of documents not from category c_i where term t_k occurs at least once; P(\bar{t}_k, c_i) denotes the probability of documents from category c_i where term t_k does not occur; P(\bar{t}_k, \bar{c}_i) denotes the probability of documents not from category c_i where term t_k does not occur.

Table 3. Feature selection methods and their formulations as represented by the information elements in Table 2.

  Method               Mathematical form represented by information elements
  Information gain     -\frac{A+C}{N} \log\frac{A+C}{N} + \frac{A}{N} \log\frac{A}{A+B} + \frac{C}{N} \log\frac{C}{C+D}
  Mutual information   \log\frac{AN}{(A+B)(A+C)}
  Chi-square           \frac{N (AD - BC)^2}{(A+C)(B+D)(A+B)(C+D)}
  Odds ratio           \log\frac{AD}{BC}

In TC, several feature selection methods have been intensively studied to distill the important terms while keeping the dimension small. Table 1 shows the main functions of several popular feature selection methods. These methods evolved either from the information theory or from the linear algebra literature (Sebastiani, 2002; Yang & Pedersen, 1997).

Basically, there are two distinct ways to rank and assess the features, i.e. globally and locally. Global feature selection aims to select features which are good across all categories. Local feature selection intends to differentiate those terms that are more distinguishable for certain categories only. The sense of either 'global' or 'local' does not have much effect on the selection of the method itself, but it does affect the performance of classifiers built upon different categories. In TC, the main purpose is to address whether a document belongs to a specific category. Obviously, we prefer salient features which are unique from one category to another, i.e. a 'local' approach.
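The global/local distinction can be illustrated with a short sketch. This is my own illustration rather than anything from the paper: it assumes a precomputed table of feature values (e.g. chi-square scores) per category and simply contrasts per-category top-k selection with a single corpus-wide ranking.

```python
def select_local(scores, k):
    """Local selection: keep the k best terms for each category separately.

    scores: dict category -> dict term -> feature value.
    """
    return {c: sorted(s, key=s.get, reverse=True)[:k] for c, s in scores.items()}

def select_global(scores, k):
    """Global selection: rank each term by its best score over all categories."""
    best = {}
    for s in scores.values():
        for term, value in s.items():
            best[term] = max(best.get(term, float("-inf")), value)
    return sorted(best, key=best.get, reverse=True)[:k]

scores = {
    "machining": {"lathe": 8.2, "gear": 5.1, "robot": 0.3},
    "assembly":  {"robot": 7.5, "weld": 4.0, "lathe": 0.2},
}
print(select_local(scores, 2))   # distinct salient terms per category
print(select_global(scores, 2))  # one shared list for all categories
```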
Ideally, the salient feature set from one category does not have any items overlapping with those from other categories. If this cannot be avoided, then how to better represent them becomes an issue.

While many previous works have shown the relative strengths and merits of these methods (Forman, 2003; Ng et al., 1997; Ruiz & Srinivasan, 2002; Sebastiani, 2002; Yang & Pedersen, 1997), our experience with feature selection over a number of standard or ad hoc data sets shows that the performance of such methods can be highly dependent on the data. This is partly due to the lack of understanding of different data sets in a quantitative way, and it needs further research. From our previous study of all feature selection methods and what has been reported in the literature (Yang & Pedersen, 1997), we noted that when these methods are applied to text classification for term selection purposes, they basically utilize the four fundamental information elements shown in Table 2.

Table 2. Fundamental information elements used for feature selection in text classification.

              c_i    \bar{c}_i
  t_k          A        B
  \bar{t}_k    C        D

  A denotes the number of documents belonging to category c_i where the term t_k occurs at least once; B denotes the number of documents not belonging to category c_i where the term t_k occurs at least once; C denotes the number of documents belonging to category c_i where the term t_k does not occur; D denotes the number of documents not belonging to category c_i where the term t_k does not occur.

These four information elements have been used to estimate the probabilities listed in Table 1. Table 3 shows the functions in Table 1 as represented by these four information elements A, B, C and D.

4. A probability based term weighting scheme

4.1. Revisit of tfidf

As stated before, while many researchers regard term weighting schemes of the tfidf form as representing the three aforementioned assumptions, we understand tfidf in a much simpler manner, i.e.

- Local weight – the tf term, either normalized or not, specifies the weight of t_k within a specific document, which is basically estimated based on the frequency or relative frequency of t_k within this document.
- Global weight – the idf term, either normalized or not, defines the contribution of t_k to a specific document in a global sense.

If we temporarily ignore how tfidf is defined, and focus on the core problem, i.e. whether this document is from this category, we realize that a set of terms is needed to represent the documents effectively and a reference framework is required to make the comparison possible. As previous research shows that tf is very important (Leopold & Kindermann, 2002; Salton & Buckley, 1988; Sebastiani, 2002) and that using tf alone can already achieve good performance, we retain the tf term. Now, let us consider idf, i.e. the global weighting of t_k.

The conjecture is that if term selection can effectively differentiate a set of terms t_k out of all terms t to represent category c_i, then it is desirable to transform that difference into some sort of numeric value for further processing. Our approach is to replace the idf term with a value that reflects the term's strength in representing a specific category. Since this procedure is performed jointly with the category membership, this basically implies that the weights of t_k are category specific. Therefore, the only problem left is how to compute such values.
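Before turning to the proposed weights, the following sketch shows how the four information elements of Table 2 can be counted for one term–category pair and then plugged into some of the Table 3 functions. It is an illustration written for this text, not the authors' code; the corpus format, the function names and the smoothing constant eps are assumptions.

```python
import math

def information_elements(docs, labels, term, category):
    """Count A, B, C, D of Table 2 for one (term, category) pair.

    docs: list of token lists; labels: list of category labels of the same length.
    """
    A = B = C = D = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        if label == category:
            A, C = A + int(present), C + int(not present)
        else:
            B, D = B + int(present), D + int(not present)
    return A, B, C, D

def chi_square(A, B, C, D):
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - B * C) ** 2 / denom if denom else 0.0

def odds_ratio(A, B, C, D, eps=0.5):
    # eps is an assumed smoothing constant to avoid division by zero
    return math.log(((A + eps) * (D + eps)) / ((B + eps) * (C + eps)))

def mutual_information(A, B, C, D, eps=0.5):
    N = A + B + C + D
    return math.log((A + eps) * N / ((A + B + eps) * (A + C + eps)))

docs = [["gear", "lathe"], ["lathe", "weld"], ["robot"], ["weld", "robot"]]
labels = ["machining", "machining", "assembly", "assembly"]
print(information_elements(docs, labels, "lathe", "machining"))  # (2, 0, 0, 2)
```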
4.2. Probability based term weights

We decided to compute those term values using the most direct information, e.g. A, B and C, and to combine them in a sensible way which is different from existing feature selection measures. From Table 2, two important ratios which directly indicate a term's relevance with respect to a specific category are noted, i.e. A/B and A/C:

- A/B: if term t_k is highly relevant to category c_i only, which basically indicates that t_k is a good feature to represent category c_i, then the value of A/B tends to be higher.
- A/C: given two terms t_k, t_l and a category c_i, the term with a higher value of A/C will be the better feature to represent c_i, since a larger portion of it occurs with category c_i.

In the remainder of this paper, we name A/B and A/C relevance indicators, since these two ratios immediately indicate the term's strength in representing a category. In fact, these two indicators are nicely supported by probability estimates. For instance, A/B can be extended as (A/N)/(B/N), where N is the total number of documents, A/N is the probability estimate of documents from category c_i where term t_k occurs at least once, and B/N is the probability estimate of documents not from category c_i where term t_k occurs at least once. In this manner, A/B can be interpreted as a relevance indicator of term t_k with respect to category c_i. Surely, the higher the ratio, the more strongly the term t_k is related to category c_i. A similar analysis can be made with respect to A/C. This ratio reflects the expectation that a term is deemed more relevant if it occurs in a larger portion of documents from category c_i than other terms do.

Since the computation of both A/B and A/C has an intrinsic connection with the probability estimates of category membership, we propose a new term weighting factor which utilizes the aforementioned two relevance indicators to replace idf in the classic tfidf weighting scheme. Considering the probability foundation of A/B and A/C, the most immediate choice is to take the product of these two ratios. Therefore, the proposed weighting scheme is formulated as

  \mathrm{tf} \cdot \log\left(1 + \frac{A}{B} \cdot \frac{A}{C}\right).   (4)
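A minimal sketch of Eq. (4) follows, reusing the Table 2 counts from the previous example. The smoothing constant eps is my own addition, since the paper does not state how zero values of B or C are handled; the function name is likewise an assumption.

```python
import math

def prob_term_weight(tf, A, B, C, eps=1.0):
    """Probability based weight of Eq. (4): tf * log(1 + (A/B) * (A/C)).

    A, B, C are the Table 2 counts for this (term, category) pair.
    eps is an assumed smoothing constant for the B == 0 or C == 0 cases.
    """
    relevance = (A / (B + eps)) * (A / (C + eps))   # product of the two relevance indicators
    return tf * math.log(1.0 + relevance)

# Example: a term occurring twice in a document, with counts A=2, B=0, C=0
print(prob_term_weight(tf=2, A=2, B=0, C=0))
```

In line with Table 4 below, the tf argument would typically be the normalized term frequency ntf rather than the raw count.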
5. Experiment setup

Two data sets were tested in our experiment, i.e. MCV1 and Reuters-21578. MCV1 is an archive of 1434 English language manufacturing-related engineering papers which we gathered by courtesy of the Society of Manufacturing Engineers (SME). It combines all engineering technical papers published by SME from 1998 to 2000. All documents were manually classified (Liu & Loh, 2007). There are a total of 18 major categories in MCV1. Fig. 1 gives the class distribution in MCV1.

Reuters-21578 is a widely used benchmarking collection (Sebastiani, 2002). We followed Sun's approach (Sun, Lim, Ng, & Srivastava, 2004) in generating the category information. Fig. 2 gives the class distribution of the Reuters data set used in our experiment. Unlike Sun et al. (2004), we did not randomly sample negative examples from categories not belonging to any of the categories in our data set; instead, we treated examples not from the target category in our data set as negatives.

We compared our probability based term weighting scheme with a number of other well established weighting schemes, e.g. TFIDF, 'ltc' and normalized 'ltc', on MCV1 and Reuters-21578. We also carried out benchmarking experiments between our conjecture and many other feature selection methods, e.g. chi-square (ChiS), correlation coefficient (CC), odds ratio (OddsR), and information gain (IG), by replacing the idf term with the feature selection value in the classic tfidf weighting scheme. Therefore, the schemes are largely formulated in the form tf × (feature value) (TFFV). Table 4 shows all eight weighting schemes tested in our experiments and their mathematical formulations. Please note that the majority of TFFV schemes are composed of two items, i.e. the normalized term frequency, tf(t_i, d_j)/max[tf(d_j)], and the term's feature value, e.g. N(AD − BC)^2/(A + C)(B + D)(A + B)(C + D) in the chi-square scheme, where tf(t_i, d_j) is the frequency of term t_i in the document d_j and max[tf(d_j)] is the maximum frequency of a term in the document d_j. The only different ones are the TFIDF weighting, the 'ltc' form and the normalized 'ltc' form, as specified in Table 4.

Table 4. All eight weighting schemes tested in the experiments and their mathematical formulations, where the normalized term frequency ntf is defined as tf(t_i, d_j)/max[tf(d_j)].

  Weighting scheme         Name    Mathematical formulation
  tf × chi-square          ChiS    ntf \cdot \frac{N(AD - BC)^2}{(A+C)(B+D)(A+B)(C+D)}
  tf × correlation coef.   CC      ntf \cdot \frac{\sqrt{N}\,(AD - BC)}{\sqrt{(A+C)(B+D)(A+B)(C+D)}}
  tf × odds ratio          OddsR   ntf \cdot \log\frac{AD}{BC}
  tf × info gain           IG      ntf \cdot \left(\frac{A}{N}\log\frac{AN}{(A+B)(A+C)} + \frac{C}{N}\log\frac{CN}{(C+D)(A+C)}\right)
  TFIDF                    TFIDF   ntf \cdot \log\frac{N}{N(t_i)}
  tfidf 'ltc'              ltc     \mathrm{tf}(t_i, d_j) \cdot \log\frac{N}{N(t_i)}
  Normalized ltc           nltc    \frac{\mathrm{tfidf}_{ltc}}{\sqrt{\sum \mathrm{tfidf}_{ltc}^2}}
  Probability based        Prob.   ntf \cdot \log\left(1 + \frac{A}{B}\cdot\frac{A}{C}\right)

Two popular classification algorithms were tested, i.e. Complement Naive Bayes (CompNB) (Rennie, Shih, Teevan, & Karger, 2003) and Support Vector Machines (SVM) (Vapnik, 1999). CompNB has recently been reported to significantly improve the performance of Naive Bayes over a number of well known data sets, including Reuters-21578 and 20 Newsgroups. Various correction steps are adopted in CompNB, e.g. data transformation, better handling of word occurrence dependencies and so on. In our experiments, we borrowed the package implemented in the Weka 3.5.3 Developer version (Witten & Frank, 2005). For SVM, we chose the well known implementation SVMLight (Joachims, 1998, 2001). A linear kernel was adopted, since previous work has shown its effectiveness in TC (Dumais & Chen, 2000; Joachims, 1998). As for the performance measurement, precision, recall and their harmonic combination, i.e. the F1-value, were calculated (Baeza-Yates & Ribeiro-Neto, 1999; van-Rijsbergen, 1979). Performance was assessed based on fivefold cross validation. Since we are very concerned about the performance of every category, we report the overall performance in a macro-averaged manner, i.e. macro-averaged F1, to avoid the bias against minor categories in imbalanced data associated with micro-averaged scores (Sebastiani, 2002; Yang & Liu, 1999).

Fig. 1. Class distribution in MCV1.
Fig. 2. Class distribution in Reuters-21578.

Major standard text preprocessing steps were applied in our experiments, including tokenization, stop word and punctuation removal, and stemming. However, feature selection was skipped and all terms left after stop word and punctuation removal and stemming were kept as features.

6. Experimental results and discussion

6.1. Overall performance

Fig. 3 shows the overall performance of the eight weighting schemes tested over MCV1 and Reuters-21578 using SVM and CompNB. They are reported in terms of macro-averaged F1-values.

Fig. 3. The macro-averaged F1-values of eight weighting schemes tested over MCV1 and Reuters-21578 using both SVM and CompNB.

Our first observation is that all TFFV weighting schemes, e.g. tf × chi-square, tf × information gain and our probability based one, outperform the classic ones, i.e. the TFIDF, 'ltc', and normalized 'ltc' schemes. TFIDF's performance on Reuters-21578 is in line with the literature (Sun et al., 2004; Yang & Liu, 1999). This demonstrates the overall effectiveness of TFFV based schemes. In general, the performance patterns of the eight weighting schemes on MCV1 and Reuters-21578 using the two classification algorithms match very well. For example, our probability based term weighting scheme always takes the lead among all eight schemes, including the TFFV ones, and the normalized 'ltc' always performs the worst. When compared to TFIDF, the prevailing choice for term weighting in TC, our weighting strategy improves the overall performance by 6% to more than 12%, as shown in Table 5. We also observe that when our scheme is adopted, CompNB delivers a result which is very close to the best one that SVM can achieve using the TFIDF scheme on Reuters-21578. This demonstrates the great potential of CompNB as a state-of-the-art classifier.

Table 5. Macro-averaged F1-values of TFIDF and probability based term weights on MCV1 and Reuters-21578.

  Classifier   MCV1 TFIDF   MCV1 Prob.   21578 TFIDF   21578 Prob.
  SVM          0.6729       0.7553       0.8381        0.8918
  CompNB       0.4517       0.5653       0.6940        0.8120

Among the three global based classic weighting schemes, i.e. TFIDF, 'ltc', and the normalized 'ltc' form, none can generate comparable results over either MCV1 or Reuters-21578. A close look into their performance reveals that classifiers built for minor categories, e.g. composite manufacturing, electronic manufacturing and others in MCV1, or rice, natgas, cocoa and others in Reuters-21578, do not produce satisfactory results. As a result, this has largely affected the overall performance negatively. Among all TFFVs, surprisingly, odds ratio does not perform as expected, since in the literature odds ratio is mentioned as one of the leading feature selection methods for TC (Ruiz & Srinivasan, 2002; Sebastiani, 2002).
This implies that it is always worthwhile to reassess the strength of a term selection method for a new data set, even if it tends to perform well.

6.2. Gains for minor categories

As shown in Figs. 1 and 2, both MCV1 and Reuters-21578 are skewed data sets. While MCV1 possesses 18 categories, with one major category occupying up to 25% of the whole population of supporting documents, there are six categories which each own only 1% of MCV1, and another 11 categories fall below the average share of 5.5% that would hold if MCV1 were evenly distributed. The same is true of the Reuters-21578 data set. While it has 13 categories, grain and crude, the two major categories, share around half of the population, and there are eight categories in total whose shares fall below the average. Previous literature did not report success stories over these minor categories (Sebastiani, 2002; Sun et al., 2004; Yang & Liu, 1999). Note that this imbalance situation is even worse when the training examples are arranged in the so called "one-against-all" setting for the induction of classifiers, i.e. examples from the target category (a minor category in this case) are considered as positive while examples from the rest of the categories are all deemed negative. Nevertheless, given that the nature of TC is to answer whether the document belongs to this particular category or not, the "one-against-all" setting is still the prevailing approach in TC, owing much to the fact that it dramatically reduces the number of classifiers to be induced.

Since the proposed probability based weighting scheme is the best in the benchmarking test over both MCV1 and Reuters-21578, we intend to examine why this is the case. Therefore, we plot its performance in detail against TFIDF in Figs. 4 and 5, respectively. This is largely because TFIDF is the best among the three classic approaches as well as the default choice for TC in its current research and application (Sebastiani, 2002).

Fig. 4. F1 scores of TFIDF and the probability based term weighting scheme tested over MCV1 using both SVM and CompNB.
Fig. 5. F1 scores of TFIDF and the probability based term weighting scheme tested over Reuters-21578 using both SVM and CompNB.

A close examination of Figs. 4 and 5 shows that the probability based scheme produces much better results over minor categories in both MCV1 and Reuters-21578, regardless of the classifier used. For all minor categories shown in both figures, we observed that a sharp increase of performance occurs when the system's weighting method switches from TFIDF to the probability based one.

Table 6 reveals more insights with respect to the system performance. In general, we observe that using the probability based term weighting scheme can greatly enhance the systems' recall. Although it falls slightly below TFIDF in terms of precision using SVM, it still improves the systems' precision with CompNB, far superior to what TFIDF can deliver. For SVM, while the averaged precision of TFIDF in MCV1 is 0.8355, which is about 5% higher than the probability scheme's, the averaged recall of TFIDF is only 0.6006, far less than the probability scheme's 0.7443. The case with Reuters-21578 is even more impressive.
While the averaged precision of TFIDF is 0.8982, which is only 1.8% higher than the other, the averaged recall of the probability based scheme reaches 0.9080, in contrast to TFIDF's 0.7935. Overall, the probability based weighting scheme surpasses TFIDF in terms of F1-values over both data sets.

Table 6. Macro-averaged precision and recall of TFIDF and probability based term weights on MCV1 and Reuters-21578.

  Data    Classifier   Precision TFIDF   Precision Prob.   Recall TFIDF   Recall Prob.
  MCV1    SVM          0.8355            0.7857            0.6006         0.7443
          CompNB       0.4342            0.6765            0.4788         0.5739
  21578   SVM          0.8982            0.8803            0.7935         0.9080
          CompNB       0.5671            0.7418            0.9678         0.9128

6.3. Significance test

To determine whether the performance improvements gained by the probability based scheme and the other TFFVs over these two imbalanced data sets are significant, we performed the macro sign test (S-test) and macro t-test (T-test) on the paired F1-values. As pointed out by Yang and Liu (1999), on the one hand, the S-test may be more robust in reducing the influence of outliers, but at the risk of being insensitive or not sufficiently sensitive in performance comparison because it ignores the absolute difference between F1-values; on the other hand, the T-test is sensitive to the absolute values but could be overly sensitive when F1-values are highly unstable, e.g. for the minor categories. Therefore, we adopt both tests here to give a comprehensive understanding of the performance improvement.

Since for both data sets TFIDF performs better than the other two classic approaches, we choose it as the representative of its peers. For both the S-test and the T-test, we actually conduct two sets of tests over the two data sets, respectively. One is to test all TFFV schemes, including the probability one, against TFIDF, and the other is to test the probability scheme against the others. While the first aims to assess the goodness of schemes in the form of TFFVs, the second intends to test whether the probability based scheme does generate better results. Table 7 summarizes the p-values of the S-test for TFFV schemes against TFIDF and for the probability one against the others over the two data sets. We consider two F1-values to be the same if their difference is not more than 0.01, i.e. 1%. Table 8 summarizes the t-values of the T-test for the identical comparison settings, where α = 0.001.

Table 7. p-Values of the pairwise S-test on MCV1 and Reuters-21578, where two F1-values are considered the same if their difference is not more than 0.01.

  Test                      TFIDF       ChiS        CC          OddsR       IG          Prob.
  MCV1, SVM
    XX vs. TFIDF            –           4.813E-02   2.090E-03   5.923E-02   1.544E-02   6.561E-04
    Prob. vs. XX            6.561E-04   6.363E-03   3.841E-02   6.561E-04   1.051E-01   –
  MCV1, CompNB
    XX vs. TFIDF            –           4.813E-02   4.813E-02   2.403E-01   4.813E-02   2.452E-02
    Prob. vs. XX            2.452E-02   4.813E-02   2.452E-02   6.363E-03   1.544E-02   –
  Reuters-21578, SVM
    XX vs. TFIDF            –           3.174E-03   3.174E-03   1.938E-01   5.859E-03   3.174E-03
    Prob. vs. XX            3.174E-03   2.744E-01   1.938E-01   1.929E-02   3.872E-01   –
  Reuters-21578, CompNB
    XX vs. TFIDF            –           3.271E-02   3.271E-02   1.133E-01   7.300E-02   3.271E-02
    Prob. vs. XX            3.271E-02   1.133E-01   1.133E-01   5.859E-03   1.334E-01   –

Table 8. t-Values of the pairwise T-test on MCV1 and Reuters-21578, where α = 0.001.

  Test                                   TFIDF       ChiS        CC          OddsR       IG          Prob.
  MCV1, SVM (t-critical = 3.354)
    XX vs. TFIDF                         –           1.963E+01   2.151E+01   8.343E+00   2.588E+01   3.017E+01
    Prob. vs. XX                         3.017E+01   1.347E+01   1.069E+01   2.400E+01   6.571E+00   –
  MCV1, CompNB (t-critical = 3.354)
    XX vs. TFIDF                         –           2.135E+01   2.049E+01   4.597E+00   2.419E+01   3.127E+01
    Prob. vs. XX                         3.127E+01   9.468E+00   1.043E+01   2.649E+01   8.192E+00   –
  Reuters-21578, SVM (t-critical = 3.467)
    XX vs. TFIDF                         –           2.516E+01   1.957E+01   1.692E+00   2.435E+01   2.889E+01
    Prob. vs. XX                         2.889E+01   3.682E+00   8.587E+00   2.262E+01   3.993E+00   –
  Reuters-21578, CompNB (t-critical = 3.467)
    XX vs. TFIDF                         –           2.157E+01   1.946E+01   5.318E+00   2.130E+01   3.064E+01
    Prob. vs. XX                         3.064E+01   4.926E+00   9.127E+00   2.167E+01   6.128E+00   –

From the results we can summarize the strength of the different schemes. Considering the merits of TFFVs evaluated against TFIDF, TFFVs have shown that they are the better approach in handling imbalanced data. Among the various TFFVs, our proposed scheme claims the leading performance in both MCV1 and Reuters-21578, regardless of the classifier used. However, the approach based on the odds ratio is not much superior to TFIDF. With respect to the evaluation of the merits of the probability scheme against the others, it is not surprising to see that the new scheme still takes the lead. It manages to perform better than the other TFFV approaches, e.g. information gain and chi-square, when the absolute difference of F1-values is considered. Finally, the results of information gain, chi-square and correlation coefficient shown in our tests are compatible with those in the literature (Forman, 2003; Yang & Pedersen, 1997). In general, the more minor categories the data set possesses, the more the overall performance can be elevated if the probability based weighting scheme is chosen.
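The following sketch illustrates the two tests described above on paired per-category F1-values: a macro sign test using a one-sided binomial over category-wise wins, with ties defined by the 0.01 threshold used in Table 7, and a paired t-statistic over the F1 differences. It is an illustrative reconstruction written for this text, not the authors' code, and the toy numbers are invented.

```python
from math import comb

def macro_sign_test(f1_a, f1_b, tie=0.01):
    """One-sided S-test p-value: does scheme A beat scheme B across categories?

    Pairs whose F1 difference is within `tie` are ignored, as in Table 7.
    """
    wins = sum(a - b > tie for a, b in zip(f1_a, f1_b))
    losses = sum(b - a > tie for a, b in zip(f1_a, f1_b))
    n = wins + losses
    # P(X >= wins) under Binomial(n, 0.5)
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n if n else 1.0

def macro_t_value(f1_a, f1_b):
    """Paired T-test statistic on the per-category F1 differences."""
    d = [a - b for a, b in zip(f1_a, f1_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance of the differences
    return mean / (var / n) ** 0.5

# Toy example with five categories
f1_prob  = [0.81, 0.75, 0.62, 0.58, 0.43]
f1_tfidf = [0.78, 0.70, 0.50, 0.49, 0.30]
print(macro_sign_test(f1_prob, f1_tfidf))   # ~0.031
print(macro_t_value(f1_prob, f1_tfidf))     # compare against the t-critical values in Table 8
```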
7. Conclusion and future work

Handling imbalanced data in TC has become an emerging challenge. In this paper, we introduce a new weighting paradigm which is generally formulated as tf × (feature value) (TFFV) to replace the classic TFIDF based approaches. We propose a probability based term weighting scheme, which directly makes use of two critical information ratios, as a new way to compute the term's weight. These two ratios are deemed to possess the most salient information reflecting the term's strength in associating with a category. Their computation does not impose any extra cost compared to the conventional feature selection methods. Our experimental study and extensive comparisons based on two imbalanced data sets, MCV1 and Reuters-21578, show the merits of TFFV based approaches and their ability to handle imbalanced data. Among the various TFFVs, our probability based scheme offers the best overall performance on both data sets regardless of the classifier used. Our approach suggests an effective solution to improve the performance of imbalanced TC.

Starting from the work reported in this paper, there are a few immediate tasks awaiting us. Since the probability scheme is derived from the understanding of feature selection, the product A/B × A/C itself can also be considered as a new feature selection method that reflects the relevance of terms with respect to different thematic categories. It is interesting to further explore its joint application with other algorithms in TC.
As for the slight decrease of precision noted, we intend to remedy the situation by switching the linear kernel with a string kernel in SVM (Lodhi, Saunders, Shawe-Taylor, Cristianini, & Watkins, 2002). Another challenge we are facing is to handle the situation where the critical information needed, e.g. A and B, cannot be easily secured, i.e. in text clustering. One potential direction is to infer these critical values from a small collection of labeled data and then test how robust these values or this probability approach could be, what strategies we can pro- pose to accommodate the variation of term occurrence in the unlabeled documents, and how to modify the critical values accordingly. The whole idea falls into the emerging paradigm of semi-supervised learning. We will report our study when the results become more solid. Acknowledgement The work described in this paper was partially sup- ported by a grant from the Research Grants Council of the Hong Kong Polytechnic University, Hong Kong Spe- cial Administrative Region, China (Project No. G-YF59). References Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Boston, MA, USA: Addison-Wesley. Baoli, L., Qin, L., & Shiwen, Y. (2004). An adaptive k-nearest neighbor text categorization strategy. ACM Transactions on Asian Language Information Processing (TALIP), 3(4), 215–226. Blum, A., & Mitchell, T. (1998). Combining labeled and unlabeled data with co-training. In COLT: Proceedings of the workshop on computa- tional learning theory (pp. 92–100). Brank, J., Grobelnik, M., Milic-Frayling, N., & Mladenic, D. (2003). Training text classifiers with SVM on very few positive examples. MSR- TR-2003-34. Castillo, M. D. D., & Serrano, J. I. (2004). A multistrategy approach for digital text categorization from imbalanced documents. ACM SIG- KDD Explorations Newsletter, 6(1) [Special issue on learning from imbalanced datasets]. Chawla, N., Japkowicz, N., & Kolcz, A. (Eds.). (2003). Proceedings of the ICML’2003 workshop on learning from imbalanced data sets. Chawla, N., Japkowicz, N., & Kolcz, A. (Eds.). (2004). ACM SIGKDD Explorations Newsletter, 6(1) [Special issue on learning from imbal- anced data sets]. Debole, F., & Sebastiani, F. (2003). Supervised term weighting for automated text categorization. In Proceedings of the 2003 ACM Please cite this article in press as: Liu, Y. et al., Imbalanced text clas plications (2007), doi:10.1016/j.eswa.2007.10.042 symposium on applied computing (pp. 784–788). Melbourne, Florida, USA. Dietterich, T., Margineantu, D., Provost, F., & Turney, P. (Eds.). (2000). In Proceedings of the ICML’2000 workshop on cost-sensitive learning. Dumais, S., & Chen, H. (2000). Hierarchical classification of Web content. In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (SIGIR2000) (pp. 256–263). Athens, Greece. Elkan, C. (2001). The foundations of cost-sensitive learning. In Proceed- ings of the 17th international joint conference on artificial intelligence (IJCAI’01) (pp. 973–978). Fan, W., Yu, P. S., & Wang, H. (2004). Mining extremely skewed trading anomalies. In Advances in database technology – EDBT 2004: Ninth international conference on extending database technology (pp. 801– 810). Heraklion Crete, Greece. Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. The Journal of Machine Learning Research, 3, 1289–1305 [Special issue on variable and feature selection]. Ghani, R. (2002). 
Combining labeled and unlabeled data for multiclass text categorization. In International conference on machine learning (ICML 2002), Sydney, Australia. Goldman, S., & Zhou, Y. (2000). Enhancing supervised learning with unlabeled data. In Proceedings of 17th international conference on machine learning (pp. 327–334). San Francisco, California, USA. Japkowicz, N. (Ed.). (2000). Proceedings of the AAAI’2000 workshop on learning from imbalanced data sets, AAAI Tech Report WS-00-05, AAAI. Japkowicz, N., Myers, C., & Gluck, M. A. (1995). A novelty detection approach to classification. In Proceedings of the 14th international joint conference on artificial intelligence (IJCAI-95) (pp. 518–523). Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. Machine learning: ECML-98, 10th European conference on machine learning (pp. 137–142). Berlin, Germany. Joachims, T. (2001). A statistical learning model of text classification with support vector machines. In Proceedings of the 24th annual interna- tional ACM SIGIR conference on research and development in information retrieval (pp. 128–136). New Orleans, Louisiana, United States. Leopold, E., & Kindermann, J. (2002). Text categorization with support vector machines – How to represent texts in input space. Machine Learning, 46(1–3), 423–444. Lewis, D. D., & Gale, W. A. (1994). A sequential algorithm for training text classifiers. In Proceedings of {SIGIR}-94, 17th ACM international conference on research and development in information retrieval (pp. 3– 12). Dublin, Ireland. Lewis, D. D., Yang, Y., Rose, T. G., & Li, F. (2004). RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5, 361–397. Liu, A. Y. C. (2004). The effect of oversampling and undersampling on classifying imbalanced text datasets. Masters thesis, University of Texas at Austin. Liu, Y., & Loh, H. T. (2007). Corpus building for corporate knowledge discovery and management: A case study of manufacturing. In Proceedings of the 11th international conference on knowledge-based and intelligent information and engineering systems, KES’07, Lecture notes in artificial intelligence, LNAI, Vol. 4692 (pp. 542–550). Vietri sul Mare, Italy. Liu, B., Dai, Y., Li, X., Lee, W. S., & Yu, P. (2003). Building text classifiers using positive and unlabeled examples. In Proceedings of the third IEEE international conference on data mining (ICDM’03), Melbourne, Florida. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., & Watkins, C. (2002). Text classification using string kernels. The Journal of Machine Learning Research, 2, 419–444. Manevitz, L. M., & Yousef, M. (2002). One-class SVMS for document classification. The Journal of Machine Learning Research, 2, 139–154. sification: A term weighting approach, Expert Systems with Ap- 12 Y. Liu et al. / Expert Systems with Applications xxx (2007) xxx–xxx ARTICLE IN PRESS Mladenic, D., & Grobelnik, M. (1999). Feature selection for unbalanced class distribution and Naive Bayes. In Proceedings of the 16th international conference on machine learning, ICML’99 (pp. 258–267). Ng, H. T., Goh, W. B., & Low, K. L. (1997). Feature selection, perception learning, and a usability case study for text categorization. In ACM SIGIR forum, Proceedings of the 20th annual international ACM SIGIR conference on research and development in information retrieval (pp. 67– 73). Philadelphia, Pennsylvania, United States. Nickerson, A., Japkowicz, N., & Milios, E. (2001). 
Using unsupervised learning to guide re-sampling in imbalanced data sets. In Proceedings of the eighth international workshop on AI and statistics (pp. 261– 265). Nigam, K. P. (2001). Using unlabeled data to improve text classification. PhD thesis, Carnegie Mellon University. Raskutti, B., & Kowalczyk, A. (2004). Extreme re-balancing for SVMs: A case study. ACM SIGKDD Explorations Newsletter, 6(1), 60–69 [Special issue on learning from imbalanced datasets]. Rennie, J. D. M., Shih, L., Teevan, J., & Karger, D. R. (2003). Tackling the poor assumptions of Naive Bayes text classifiers. In Proceedings of the 20th international conference on machine learning (pp. 616–623). Washington, DC, USA. Ruiz, M. E., & Srinivasan, P. (2002). Hierarchical text categorization using neural networks. Information Retrieval, 5(1), 87–118. Salton, G., & Buckley, C. (1988). Term weighting approaches in automatic text retrieval. Information Processing and Management, 24(5), 513– 523. Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. New York, USA: McGraw-Hill. Sebastiani, F. (2002). Machine learning in automated text categorization. ACM Computing Surveys (CSUR), 34(1), 1–47. Sun, A., Lim, E.-P., Ng, W.-K., & Srivastava, J. (2004). Blocking reduction strategies in hierarchical text classification. IEEE Transac- tions on Knowledge and Data Engineering (TKDE), 16(10), 1305– 1308. Please cite this article in press as: Liu, Y. et al., Imbalanced text clas plications (2007), doi:10.1016/j.eswa.2007.10.042 van-Rijsbergen, C. J. (1979). Information retrieval (2nd ed.). London, UK: Butterworths. Vapnik, V. N. (1999). The nature of statistical learning theory (2nd ed.). New York: Springer-Verlag. Weiss, G. M. (2004). Mining with rarity: A unifying framework. ACM SIGKDD Explorations Newsletter, 6(1), 7–19 [Special issue on learning from imbalanced datasets]. Weiss, G. M., & Provost, F. (2003). Learning when training data are costly: The effect of class distribution on tree induction. Journal of Artificial Intelligence Research, 19, 315–354. Witten, I. H., & Frank, E. (2005). Data mining: Practical machine learning tools and techniques (2nd ed.). San Francisco, CA, USA: Morgan Kaufman. Yang, Y. (1996). Sampling strategies and learning efficiency in text categorization. In Proceedings of the AAAI spring symposium on machine learning in information access (pp. 88–95). Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (pp. 42– 49). Berkeley, California, United States. Yang, Y., & Pedersen, J. O. (1997). A Comparative study on feature selection in text categorization. In Proceedings of ICML-97, 14th international conference on machine learning (pp. 412–420). Yu, H., Zhai, C., & Han, J. (2003). Text classification from positive and unlabeled documents. In Proceedings of the 12th international confer- ence on information and knowledge management (CIKM 2003) (pp. 232–239). New Orleans, LA, USA. Zelikovitz, S., & Hirsh, H. (2000). Improving short text classification using unlabeled background knowledge. In Proceedings of the 17th interna- tional conference on machine learning (ICML2000). Zheng, Z., Wu, X., & Srihari, R. (2004). Feature selection for text categorization on imbalanced data. ACM SIGKDD Explorations Newsletter, 6(1), 80–89 [Special issue on learning from imbalanced datasets]. 
sification: A term weighting approach, Expert Systems with Ap- Imbalanced text classification: A term weighting approach Introduction Motivation Related work Term weighting scheme Inspiration from feature selection A probability based term weighting scheme Revisit of tfidf Probability based term weights Experiment setup Experimental results and discussion Overall performance Gains for minor categories Significance test Conclusion and future work Acknowledgement References work_35cydgiokfe3bhmwz2zsowhzwi ---- untitled Supporting activity modelling from activity traces Olivier L. Georgeon,1 Alain Mille,1 Thierry Bellet,2 Benoit Mathern1 and Frank E. Ritter3 (1) Université de Lyon, 86 Rue Pasteur 69007 Lyon, France Email: olivier.georgeon@liris.cnrs.fr (2) Institut Français des Sciences et Technologies des Transports, de l’Aménagement et des Réseaux, 25, Avenue François Mitterrand, 69500 Bron, France (3) The Pennsylvania State University, University Park, PA 16802, USA Abstract: We present a new method and tool for activity modelling through qualitative sequential data analysis. In particular, we address the question of constructing a symbolic abstract representation of an activity from an activity trace. We use knowledge engineering techniques to help the analyst build an ontology of the activity, that is, a set of symbols and hierarchical semantics that supports the construction of activity models. The ontology construction is pragmatic, evolutionist and driven by the analyst in accordance with their modelling goals and their research questions. Our tool helps the analyst define transformation rules to process the raw trace into abstract traces based on the ontology. The analyst visualizes the abstract traces and iteratively tests the ontology, the transformation rules and the visualization format to confirm the models of activity. With this tool and this method, we found innovative ways to represent a car-driving activity at different levels of abstraction from activity traces collected from an instrumented vehicle. As examples, we report two new strategies of lane changing on motorways that we have found and modelled with this approach. Keywords: sequence mining, timeline analysis, activity trace, knowledge-based system, activity modelling 1. Introduction We introduce here new principles based on knowledge engineering techniques for designing systems to help analysts create models of activity from activity traces. We illustrate these princi- ples with a software tool that we have imple- mented, and with an example modelling analysis that we have performed using this tool. By activity trace we mean a set of multiple streams of quantitative or symbolic data that record (at least partially) an activity performed by a subject. The analysts may be psychologists seeking to build theories of the subject’s cogni- tion, ergonomists seeking to design better user interfaces, analysts seeking to predict the sub- ject’s behaviour in specific conditions, trainers seeking to improve training techniques or even the subjects themselves seeking to improve their understanding of their actions. In each case, the created models of activity constitute micro- theories proposed by the analysts to describe, explain and try to predict how the subject per- forms the activity. The principles and the tool that we introduce here address three needs for helping analysts construct models of activity from activity traces. 
The first need is for helping the analyst learn previously unknown aspects or details of the subject’s activity from the activity traces. The DOI: 10.1111/j.1468-0394.2011.00584.x Article _____________________________ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 c! 2011 Blackwell Publishing Ltd Expert Systems 1 E X S Y 5 8 4 B Dispatch: 11.2.11 Journal: EXSY CE: Sandeep Journal Name Manuscript No. Author Received: No. of pages: 15 PE: Chris/Satish EXSY 584 (B W U K E X SY 5 84 W eb pd f: =0 2/ 11 /2 01 1 04 :4 6: 26 5 51 35 5 B yt es 1 5 PA G E S n op er at or =) 2 /1 1/ 20 11 4 :4 6: 31 P M mailto:olivier.georgeon@liris.cnrs.fr mailto:olivier.georgeon@liris.cnrs.fr mailto:olivier.georgeon@liris.cnrs.fr mailto:olivier.georgeon@liris.cnrs.fr second need is for helping the analyst con- struct meaningful symbolic representations of interesting aspects of the activity. These repre- sentations, associated with the explanations proposed by the analyst, constitute the models of activity. The third need is for helping the analyst test and support the created models of activity with regard to the activity trace. Activity traces have also been called protocols (Ericsson & Simon, 1993) or simply sequential data (Sanderson & Fisher, 1994). We prefer the term activity trace because this term conveys the idea that the trace is intended to be interpreted by somebody (designated here as the analyst). We think of a trace as a footprint that helps who sees it understand what happened. Our activity traces yet differ from mere footprints in that they are not accidentally produced or unprocessed but they rather result from the analyst’s choices and set-up. Many software tools have been implemented for activity trace analysis. A recent review (Hil- bert & Redmiles, 2000) notes 40 of them. These tools cannot autonomously generate a compre- hensive explanation of human behaviour but they interactively support analysis. This analysis consists of identifying, categorising, labelling and transforming pieces of data and informa- tion in the activity trace. We summarize this process by the notion of abstraction. The ana- lysts use their expertise and knowledge to for- mulate a whole set of tiny hypotheses and choices concerning how to collect the data, how to filter it, how to cluster and label it and how to display and to report it so that it responds to the analysis purpose. Although most of the existing tools acknowl- edge the central role of the analysts and the importance of their knowledge and expertise in the analysing process, these tools still lack knowledge representation mechanisms to sup- port the management of the analysts’ knowl- edge. For instance, MacShapa (Sanderson et al., 1994) does help analysts label and cluster the behavioural data. It also acknowledges the usage of these labels as symbols to describe the activity. It, however, does not help the analyst formulate and manage the symbolic inferences she can make from these symbols. We formulated the hypothesis that aspects of knowledge engineering can help design software systems that address the three needs identified above. We use ontology management facilities and rule engines to capture the hypotheses and choices made by the analyst. When interactively used by the analyst or a group of analysts, these facilities help formalize the way analysts find interesting symbolic patterns and infer models from them. 
Once this knowledge is formalized, the system uses it to automatically compute new representations of the activity from the activity traces. The system also helps analysts organize and store the different concepts and rules that summarize different studies and help capitalize on these studies. To explain our principles and demonstrate the system and its design, we have organized this paper as follows: Section 2 presents the principles of activity trace modelling, based on a pragmatic and evolutionist approach. Section 3 presents the prototype software tool that we have implemented from these principles, its technical features, its architecture and its user interface. Section 4 presents an example study in which we have used this tool to create models of lane change on motorways from activity traces generated with an instrumented car. The meth- od is then summarized in the conclusion. 2. Modelling activity traces The notion of activity traces is widely used in the human behaviour literature, and we cannot attribute its origin to a specific author. Only more specific-related notions can be identified, such as pattern languages, as reviewed by Dear- den and Finlay (2006), or grammar representa- tions (Olson et al., 1994). Despite the wide usage of the term activity trace, we could not find a definition of it, which led us to propose the following definition: An activity trace is a meaningful inscription, from the viewpoint of an analyst, of the flow of what has happened, from the viewpoint of a subject. With this definition, we want to highlight that an activity trace always implies two viewpoints, 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 2 Expert Systems c! 2011 Blackwell Publishing Ltd EXSY 584 (B W U K E X SY 5 84 W eb pd f: =0 2/ 11 /2 01 1 04 :4 6: 26 5 51 35 5 B yt es 1 5 PA G E S n op er at or =) 2 /1 1/ 20 11 4 :4 6: 31 P M situated in two different moments. It implies the subject’s viewpoint, when he or she was performing the activity, and the analyst’s view- point, when he or she is analysing the activity trace. Indeed, an activity trace cannot be an inscription of all that happened (if that had any sense), because an activity only concerns what relates to the subject’s perspective, goals and intentions. Thus, inevitably, the analyst has to make assumptions about what is meaningful to the subject when the analyst sets up the tracing mechanism. In addition, the activity trace de- pends on what activity aspects interest the analyst and what makes sense to her according to her previous knowledge and to her analysis goals. Because an activity trace depends on the ana- lyst’s knowledge and assumptions, it can only be modelled in an iterative way, each iteration pro- ducing new knowledge leading to new hypotheses for the next iteration. Ericsson and Simon (1993) described this iterative nature of analysing human behaviour: ‘In designing our data-gathering schemes, we make minimal essential theoretical commitments, then try to use the data to test stronger theories’ (p. 274). Moreover, none of the iterations can produce knowledge that could be proven to be true in an absolute sense, but only knowledge that is more efficient and useful with regard to the analyst’s goals and that is more convincing to the analyst’s community than the knowledge from the previous iteration. 
More broadly, this conception of knowledge relates to a pragmatic epistemology (James, 1907) and an evolutionist epistemology (Popper, 1972). These pragmatic and evolutionist aspects are crucial when defining a methodology and a tool for activity trace modelling. By fully acknowl- edging these aspects, we have designed a tool that facilitates and accelerates the evolutionist model- ling process. The tool helps formulate a series of micro-hypotheses of possibly useful symbols, pos- sibly useful transformation rules to transform the low-level data into higher-level data and possibly useful representations of the activity trace based on the micro-hypotheses. If the obtained repre- sentation does not help the analyst understand the activity better, then she rejects these micro- hypotheses; if it helps, then she keeps them. The tool is designed to shorten this formula- tion=usage=validation-or-rejection loop. This process leads to the construction of a set of micro-hypotheses that are validated by the ana- lyst. This set constitutes a formalization of the analyst’s knowledge about how to understand the activity. The tool stores this knowledge, helps the analyst keep track of it and helps the analysts’ community discuss and question it. The next section details how the tool reaches this goal by helping the analyst define sequences at the right level of abstraction and simulta- neously identify interesting subsequences, define them precisely and query the whole trace in search for their occurrences. 2.1. Collecting a symbolic trace A raw activity trace can be made of any kind of data describing a subject’s activity flow and intended for an analyst’s usage. In a broad sense, it can range from video or audio record- ing to computer logs. The only common point is that the data are temporally organized, meaning that each data piece is associated with a time- stamp referring to a common time base. The first abstraction step consists of converting these raw traces into sequences of symbols. We refer to this step as the discretization of the raw trace into a symbolic trace. The symbols in the sym- bolic trace have to be meaningful to the analyst, and they are chosen on a pragmatic and evolu- tionist basis, in compliance with a pragmatic and evolutionist epistemology introduced above (introduction of Section 2). The discretization process can be manual, semi-automatic or auto- matic. The definition of the symbols may evolve in parallel with the implementation of the dis- cretization process. This is because the later interpretation of the symbols may differ from the meaning initially intended by the analyst when she specifies the discretization algorithm. For example, in a study of car driving, we have used the classical mathematical curve ana- lysis method to generate symbols of interest from numerical values of the vehicle speed, the steering wheel angle and the pedal positions. In this case, the symbols correspond to threshold 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 c! 2011 Blackwell Publishing Ltd Expert Systems 3 EXSY 584 (B W U K E X SY 5 84 W eb pd f: =0 2/ 11 /2 01 1 04 :4 6: 26 5 51 35 5 B yt es 1 5 PA G E S n op er at or =) 2 /1 1/ 20 11 4 :4 6: 31 P M crossing, local extremum and inflexion points. Figure 1 illustrates this discretization process. In Figure 1, the curves represent the vehicle speed in km=h and the brake pedal position in percentage of range. 
Symbols of interest are shown on these curves as circles (threshold crossings, inflexion points, local extremums). The symbols are merged into the symbolic trace that is represented at the bottom of Figure 1. The figure also shows a derivative value as an example property of interest associated with a symbol generated by a brake pedal threshold crossing. The analyst specifies the way to gen- erate these symbols so that they correspond to meaningful events that describe the activity. In this example, the threshold crossing indicates the extent of the braking action and the deriva- tive value indicates the abruptness of this action. Notably, while creating these symbols, the ana- lyst claims the existence of the events that these symbols represent. By so doing, the analyst defines an ontology of the activity. Our experience has taught us that the system must maintain a connection between the raw trace and the symbolic trace, and provide parallel displays of both of them. The analyst needs to tune many parameters of the discretiza- tion algorithms, like the threshold values or noise filters. The analyst validates the chosen symbols and algorithms by comparing the symbolic trace to the raw trace and ensuring that the symbolic trace represents what is hap- pening. While she defines and validates these symbols, the analyst also supports her claim that the events represented by these symbols ‘exist’. This support arises because the method to generate these symbols from the recorded data is formally specified and explained by the analyst. In this example, after we fully specified the discretization algorithm in accordance to our specific modelling goals, the discretization algorithm could then compute the symbols fully automatically. 2.2. Modelling the symbolic trace At the symbolic level, analysts most often want to focus on relations between events. Indeed, events are not meaningful by themselves, but they become meaningful in the context where they relate to each other (Sanderson & Fisher, 1994). Examples of such relations include a ‘sequence following within a certain period of time’, ‘co-occurrence within a certain period of time’ and ‘causality with regard to a certain explanative theory’. Building and understanding these relations between events is a part of the analysing process. By definition, a set of ele- ments connected through relations is a graph. Therefore, we model the symbolic traces with a graph structure. More precisely, our trace graph structure has two parts: a sequence and an ontology. The sequence is a part of the graph that is made of event instances and of relation instances between event instances. The ontology is a part of the graph that is made of event classes and of relations between event classes. Figure 2 illus- trates this graph structure with a simplified example taken from the car-driving study. The sequence is represented in the bottom part of Figure 2 and the ontology in the top part. In the sequence graph, event instances are represented as circles and triangles. Relation instances between event instances are repre- sented as solid arrows between these circles and triangles. Example properties of event instances are represented with grey dashed arrows point- ing to their value at the bottom of the figure (e.g. the duration of an eye movement and accelera- tion value associated with an inflexion point of the speed curve). 
Figure 1: Discretization of analogical traces (speed and brake curves; threshold crossings, local extrema and inflexions marked).

Our tool displays the sequence in a similar form as shown in Figure 2; the exact form is shown in Figure 4. This display uses two axes: the time axis and the 'abstraction' axis. That is, the events' time-code attributes determine their x coordinates and the analyst specifies their y coordinates when she configures these events' classes in the ontology. The analyst can use the y coordinate to express different meanings; our recommendation is to use it to express an idea of abstraction level related to a specific analysis. In this example, the lowest level (circles) represents the events obtained from the discretization process; the intermediary level represents events that describe the activity in usual driving terms: accelerate, glance, turn signal on/off (blinker); and the higher level represents events describing lane-change behaviour: indicator of intention to change lane and index of lane change. This example expresses the analyst's assumption that the conjunction of an acceleration and a glance to the left rear mirror can generate an indicator of the driver's intention to change lane (L.C. Indicator).

In the ontology graph, the nodes represent event classes and the edges represent the relation 'sub-class of' (dashed black arrows). The analyst defines the ontology during the modelling process. For example, the 'Collected events' class includes all the event classes that come from the discretization process. The 'Driving descriptors' class gathers the intermediary event classes that describe the activity in usual driving terms. The 'Lane change descriptors' class gathers the most abstract event classes describing lane changes. The dotted grey arrows in the figure represent the relation 'type of' going from event classes defined in the ontology to event instances in the sequence.

Again, like most ontologies, this ontology is made by the analyst on a pragmatic basis. It is likely that two analysts will create two different ontologies. While the software cannot demonstrate that one is better than the other, it does help the analysts formalize and discuss them. This discussion leads to the construction of a language for describing the activity that represents an agreement about the terms that can be used to describe the activity. In addition, the ontology also supports the analysts' agreement about how the trace should be visualized, because the visualization properties of the symbols are stored in the ontology. This trace formalism enables the analyst to conduct a hierarchical analysis of the activity.

Figure 2: Activity trace modelled in a knowledge engineering system.
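The idea that visualization properties live in the ontology can be sketched as follows; the attribute names and the mapping from class groups to display levels are assumptions for illustration, not the tool's actual configuration format.

# Sketch: the ontology stores display properties; the instance supplies the time code.
DISPLAY = {
    "Collected events":        {"level": 0, "shape": "circle"},
    "Driving descriptors":     {"level": 1, "shape": "triangle"},
    "Lane change descriptors": {"level": 2, "shape": "square"},
}

def plot_position(class_group, time_code):
    """x comes from the event's time code, y from the class's abstraction level."""
    style = DISPLAY[class_group]
    return {"x": time_code, "y": style["level"], "shape": style["shape"]}

print(plot_position("Driving descriptors", time_code=12.4))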
Event instances form a hierarchy where higher-level events are temporal patterns of lower-level events. The ontology also defines a hierarchy because some event classes are sub-classes of others. Notably, these two hierarchies are different, because lower-level event instances do not necessarily belong to a sub-class of the class of higher-level event instances.

3. System implementation

We have implemented a prototype system based on an assemblage of open-source knowledge engineering tools: an ontology editor, an inference engine, visualization facilities and documentation facilities. This system is named ABSTRACT (Analysis of Behavior and Situation for menTal Representation Assessment and Cognitive acTivity modelling). Figure 3 illustrates this assemblage. The system can be split into three levels: a lower level, at the bottom of the figure, which is the collection system; a core level, in the centre of the figure, which is the symbolic trace system itself; and a higher level, on the top of the figure, which is a documentation level.

3.1. The collection system

The collection system integrates tools to help the analyst prepare the symbolic trace. We call these tools collection agents. Collection agents may be automatic when specified once by the analyst or may require the analyst's intervention. Automatic collection agents can be tools for preprocessing sensor data or computer logs, as in the example of Section 2.1. Semi-automatic collection agents can be tools for helping the analyst take notes, record interviews or transcribe video data. As noted, this discretization cannot be done blindly, but must be driven by the analyst. Hence, this level requires visualization facilities. We use Microsoft Excel with specific Visual Basic for Applications (VBA) macros as a visualization tool for the collection system.

Figure 3: The architecture of the ABSTRACT activity analysis system (collection system, symbolic trace system and documentation system).

In this visualization, each event of the symbolic trace is displayed as a line in the spreadsheet. The lines are coloured according to the event's type and the event's properties are organized in different columns. We have implemented a specific video player and analogical data player that trigger VBA macros that automatically scroll down the spreadsheet in synchronization. Some of these facilities are also available in commercial qualitative data analysis tools such as MacShapa (Sanderson et al., 1994) and NVivo. These facilities allow the analyst to check and validate or reject the symbolic trace, that is, refine the discretization algorithm and its diverse parameters until she gets a satisfying and meaningful symbolic trace including appropriate properties of interest.
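A minimal sketch of this one-row-per-event, one-column-per-property view is given below; the file name and column set are illustrative assumptions, not the tool's actual spreadsheet layout.

# Sketch: write collected events as spreadsheet rows (one row per event).
import csv

events = [
    {"time": 12.4, "type": "Accelerate", "acceleration": 1.1, "duration": ""},
    {"time": 12.9, "type": "Glance", "acceleration": "", "duration": 0.2},
]

with open("symbolic_trace.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["time", "type", "acceleration", "duration"])
    writer.writeheader()
    for e in events:
        writer.writerow(e)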
3.2. The symbolic trace system

The symbolic trace system is the knowledge engineering system itself. At this level, the traces are modelled as described in Section 2.2. In addition, they are associated with a set of inference rules and a set of style sheets. A style sheet is a specification for displaying the trace on the screen. It specifies how semantic properties of the events that are defined in the ontology should be converted into visualization properties, such as shape and position. Style sheets also implement particular time scales, and particular filters to display only the interesting aspects of the trace for a particular analysis. They correspond to different ways of looking at the trace according to different modelling goals.

The inference rules produce inferred symbols from patterns of previously existing symbols. The principle is to query the graph in search for sub-graphs that match certain patterns, and to attach new nodes and arcs to the matching sub-graphs. These new nodes represent the inferred symbols and the new arcs represent the inference relations. The usage of this inference mechanism is further described in Section 3.5.

Technically, the sequence part of the activity traces is encoded as resource description framework (RDF) graphs. We chose RDF because it is the most widely used specification for graph encoding. We use XML as a serialization of RDF to store sequences, because XML makes RDF graphs easy to share with other applications. The ontology is encoded as RDF Schema (RDFS), because RDFS is the simplest ontology language based on RDF. We use Protégé as a graphical ontology editor. That is, an installation of Protégé is embedded in our tool, and the analyst uses it to define the ontology of her traces. The graphical displays of our traces are encoded under the scalable vector graphics (SVG) specification. Because Firefox natively supports SVG, we use it as a visualization tool, and we have implemented most of the tool as a web application in PHP. We use extensible stylesheet language (XSL) as a transformation language for transforming RDF traces into their SVG graphical representation. We use SPARQL as a query language for graphs, as we will explain in Section 3.5.

3.3. The documentation system

Analysts using our symbolic trace system expressed the need for a higher system layer providing a way to both index and attach documentation to episodes of interest. We implemented this by associating the symbolic trace system with a Wiki. As we have made the choice of implementing ABSTRACT as a web application, analysts can reference each episode of interest by its URL, and easily paste this URL into a Wiki page. Moreover, some new Wiki implementations, like Semantic MediaWiki (http://semantic-mediawiki.org), include semantic facilities. We are still investigating how these semantic facilities can be used to merge the ontology editor with the documentation system into a single semantic documentation system.

3.4. System usage

The user interface is accessible as a web page in any browser that supports SVG, such as Firefox. This interface is illustrated in Figure 4. It has four tabs: the Open tab that allows the analyst to select a trace in a list; the Info tab that displays general information about the selected trace, such as its creation date and its version,
and the management of stylesheets; the View tab that displays graphical visualizations of the trace; and the Edit tab that allows the analyst to write queries to transform the trace. The interface also provides a link to the ontology editor, Protégé.

The View tab shown in Figure 4 provides the following functionalities (noted with numbered boxes):

1. Unique ID of the analysis, including aspects of the trace file and analyses done.
2. Selection of different visualization style sheets in drop-down lists. Different visualizations can be displayed simultaneously on the screen and their time code is synchronized.
3. Time code: this value corresponds to the cursor position in the visualization modules (vertical red line). The analyst can enter a time code and click Go to to focus on it.
4. Visualization sample with a time span of 10 s. The analyst can scroll the trace left and right with the mouse. This visualization example corresponds to the simplified description given in Figure 2.
5. Visualization sample of an entire trace (20 min), with only the high-level symbols displayed. These trace examples come from our car-driving study and are further explained in Section 4.
6. The analyst can show the symbols' properties by clicking on the symbols.
7. The system can synchronize with a video player. When this box is checked, the system gives the time-code control to the video player and automatically follows it.

Figure 4: ABSTRACT user interface.

3.5. The transformation mechanism

The analyst uses the Edit tab to write queries that infer higher-level symbols from patterns of lower-level symbols. For instance, Table 1 illustrates a query to infer the Lane change indicator symbol shown in Figure 2, which indicates that a lane change is about to happen. In this example, the analyst wants to test the hypothesis that this indicator can be inferred from a conjunction of an accelerate event with an acceleration value of at least 1 m/s², followed by a glance event pointing to the left mirror (generated by an eye-tracker), both within 1 s of each other.

Table 1: Simplified inference query to infer a lane change

CONSTRUCT
  (?r1, infer, Indicator_Symbol)
  (?r2, infer, Indicator_Symbol)
  (Indicator_Symbol, type, Lane_Change_Indicator)
WHERE
  (?r1, type, Accelerate)
  (?r2, type, Left_Mirror_Glance)
  (?r1, time-code, ?d1)
  (?r2, time-code, ?d2)
  (?r1, Acceleration_Value, ?v1)
FILTER
  (?v1 > 1)
  (sequence(?d1, ?d2, 1))   # in order, within 1 second

The graph elements, either from the sequence or the ontology, are handled as triples [node, edge, node]. A query consists of a selection clause (WHERE) and a CONSTRUCT clause. The selection clause specifies a pattern of triples that should match the graph, and the CONSTRUCT clause specifies a pattern of triples that should be added to the graph wherever a pattern matches the selection clause. In addition, matching patterns can be restricted by a FILTER clause. The syntax shown in Table 1 has been simplified for clarity (the complete SPARQL syntax can be found in the SPARQL documentation, http://www.w3.org/TR/rdf-sparql-query/). In this query, ?r1, ?r2, ?d1, ?d2 and ?v1 represent variables. Each of them must match the same graph element each time it appears in the query. The sequence function tests that the time codes ?d1 and ?d2 occur in order and within the parameter of 1 s of each other. In our implementation, the analyst has to know SPARQL to specify queries on the trace.
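To make the matching logic of Table 1 concrete, the sketch below re-expresses it in plain Python over an in-memory list of events; this is only an illustration of what the query computes, not the system's SPARQL engine, and the event and property names are taken from the simplified example.

# Plain-Python sketch of the Table 1 inference (illustrative, not the tool's code).
def sequence(d1, d2, max_gap):
    """True if d1 occurs before d2 and within max_gap seconds."""
    return 0 <= d2 - d1 <= max_gap

def infer_lane_change_indicator(events):
    """events: list of dicts with 'type', 'time_code' and optional properties."""
    inferred = []
    for r1 in events:
        for r2 in events:
            if (r1["type"] == "Accelerate"
                    and r2["type"] == "Left_Mirror_Glance"
                    and r1.get("acceleration_value", 0) > 1
                    and sequence(r1["time_code"], r2["time_code"], 1)):
                inferred.append({"type": "Lane_Change_Indicator",
                                 "time_code": r2["time_code"],
                                 "inferred_from": (r1, r2)})
    return inferred

trace = [
    {"type": "Accelerate", "time_code": 12.4, "acceleration_value": 1.3},
    {"type": "Left_Mirror_Glance", "time_code": 12.9},
]
print(infer_lane_change_indicator(trace))   # one new Lane_Change_Indicator symbol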
To make it simpler, however, we have implemented a template mechanism that prepares skeletons of queries. We have also added some customized functions in our implementation of SPARQL, such as the sequence function presented above. These functions facilitate the specification of queries that compare time codes of events, and make it easier to specify temporal constraints. In so doing, we are implementing semantics of time, for example, the semantics of the relation of co-occurrence and of sequential ordering. In the future, we plan to add options to let the analyst specify these queries from a graphical interface based on the visualization of the trace.

The Edit tab allows the analyst to visualize the resulting trace in a similar way as the View tab, but it also allows her to reject the trace if she is not satisfied by the result. This feature helps the analyst refine the query in search for the best symbols and inference rules she can get to describe the activity from the actual data. Our system returns the number of times the pattern has matched in the trace. This number indicates the number of new symbols added. The system also provides an export function to a text file that can be imported into other tools like Microsoft Excel for further statistical computations. Queries are saved as independent files and the tool helps the user reference them. The database of queries associated with the ontology constitutes a representation of the analyst's understanding about how to make sense of the activity trace.

4. Example activity model

We report here an example activity modelling analysis taken from a car-driving study (Henning et al., 2007). Another example analysis, in a study of non-state political violence, is reported by Georgeon et al. (2010). Figure 5 shows a 10 s section of a car-driving activity trace focusing on a lane change on a motorway. The legend is presented in Figure 6. In Figure 5, the 'Button' is an index signal from the experimenter recorded during the experiment, the start thinking event comes from a verbal signal given by the driver in a video-based post-experiment interview, and the lane crossing event is the moment when the left front wheel crosses the lane, manually encoded from the video. The representation of the driving episode given in Figure 5 is then automatically generated with the inference rules defined by the analyst.

Using the trace querying facilities of ABSTRACT on this driving data set, we could identify two categories of lane changes that we explain by the performance of two strategies (Figures 7 and 8).
In these descriptions, the lower part is a representative trace episode from our database, while the upper part is drawn by hand as an abstract description of the strategies. This upper part also shows the car trajectory in the lanes, respecting the length/width scale ratio: about 300 m of 4 m wide lanes.

The strategy displayed in Figure 7 is characterized by beginning in a situation where the subject is impeded by a slow vehicle. In this case, the subject starts accelerating [1] almost at the same time as he looks at his left mirror [2]. Then, if there is no vehicle coming from behind, he starts looking at the left lane [3], he switches his blinker on [4] and he performs the lane change [5]. In this situation, the acceleration associated with a glance to the left mirror appears as a good predictor of the lane change. It occurs more than 1 s before the subject switches the blinker on.

In the situation of Figure 8, no slow vehicle impedes the subject and he performs the lane change 'on the fly'. In this case, we can find no behavioural sign of his intention to change lane before the blinker is switched on [1]. Nevertheless, the blinker appears to be a sufficient predictor in that case, because it is switched on in anticipation of the lane change, several seconds before the manoeuvre: looking to the left lane [2], looking to the left mirror [3] and starting steering [4].

In parallel to searching for and identifying these categories of situations and strategies, we define symbols to represent them and inference rules to generate these symbols. Finally, we have named the first strategy Lane_change_delayed, represented with white triangles, and the second strategy Lane_change_anticipated, represented with white squares.

Figure 5: Example of lane change on a motorway (screenshot with text labels added at the top).
Figure 6: Legend of the car driving symbolic trace.

Figure 9 shows that these two types of event occur four times in a representative 20-min motorway ride of a subject. Figure 9 displays only the symbols that are useful to see the lane changes: the index button pressed by the experimenter (blue circles), the blinker (orange triangles up and down), the accelerations (ochre triangles to the right), the left mirror glances (grey triangles to the left), main junctions on motorways given by the GPS position (green square and triangles on the bottom of the display) and the lane change category symbols (white triangles and squares). In this example, one lane change (marked by the vertical cursor line in the figure) was not categorized. The uncategorized cases would require further study to understand their specificity. Once the analyst has finished her analysis, she can export the abstract traces into a spreadsheet to compute and report the statistics of the occurrences of events of interest.

Despite the extensive existing studies of car driving (e.g. Groeger, 2000) and on lane change manoeuvres (e.g.
Salvucci & Liu, 2002), we could find no representation of the driving activity that could compare to ours in terms of comprehensiveness of the data and capacity to support higher-level understanding. In the case of lane changes, this innovative description helped us discover different strategies that have not been reported in the literature before. In so doing, this study shows how our principles and tool have addressed the needs set out in the introduction: our prototype tool helped us generate comprehensive symbolic representations of the activity at an appropriate abstraction level to discover previously unknown knowledge about the activity. It also helped us explain, report and back up our models with the collected traces.

Figure 7: Lane change with acceleration (Lane_change_delayed).
Figure 8: Lane change anticipated without acceleration (Lane_change_anticipated).

5. Related work

In this section, we situate our work in relation to two research areas: the area of qualitative data analysis and the area of trace-based reasoning (TBR).

In the area of qualitative data analysis, many software tools address the need for supporting the transcription of raw data into sequences of encoded events, for example, Dismal (Ritter & Larkin, 1994; Ritter & Wood, 2005), NVivo, INTERACT, InfoScope, MORAE, MacShapa (Sanderson et al., 1994) and MacVisSTA. As such, these tools relate to our collection system as described in Section 3.1. Among these tools, MacVisSTA (Rose et al., 2004) particularly relates to this aspect of our work in that it supports merging multi-modal data into a common timeline. At the symbolic level, also called the transcript level in some studies, tools like Theme (Magnusson, 2000) support the automatic discovery of temporal patterns based on statistical properties. We consider our tool complementary to these tools because our tool helps find the symbolic patterns based on the meaning they have to the analyst rather than on their statistical properties. In the car-driving example, our symbols of interest are not particularly frequent or infrequent, nor do they obey pre-assumed statistical laws.

HyperRESEARCH appears to be the only qualitative data analysis tool that supports the validation of hypothetical theories through rule-based expert system techniques (Hesse-Biber et al., 2001). We find HyperRESEARCH's underlying principles for theory building very similar to ours. It, however, does not focus on temporal semantics and does not offer symbolic timeline visualization facilities to support activity modelling. Its rule engine also does not exploit elaborated hierarchical semantics defined in an ontology, as opposed to our solution based on SPARQL and RDFS.

The other related research area, TBR (Cordier et al., 2009), comes from the domain of knowledge representation, and more precisely from case-based reasoning (CBR) (Aamodt & Plaza, 1994). CBR consists of helping users solve new problems by adapting solutions that have helped them solve previous problems. TBR extends CBR to retrieve useful cases as episodes from the stream of an activity trace (Mille, 2006).
Our work relates to TBR because both address the question of modelling and representing a stream of activity for future usage. In particular, we draw lessons from the work of Settouti et al. (2009), who implemented a trace-based system to support the management and the transformation of traces. TBR has been used to implement companions that provide assistance to the user based on previous usage (e.g. Cram et al., 2008) or to support reflexive learning by providing users with a dynamic display of their past activity (Ollagnier-Beldame, 2006). As opposed to these works, our work helps a user (the analyst) understand the activity of another user (the subject). In so doing, our work brings principles of qualitative data analysis to TBR and brings TBR techniques to address problems of quantitative data analysis.

Figure 9: Categorization of lane changes between Lyon and the airport (20 min long).

6. Conclusion

We have defined the principles of a methodology and a software tool to help an analyst create models of activity from activity traces in an iterative and interactive fashion. These principles relate to the notion of abductive reasoning in that they consist of helping analysts form, organize and test micro-hypotheses to explain and represent the activity. Specifically, they help investigate what concepts and semantics describe the activity best, represent these concepts in an ontology and apply these semantics through a rule engine. These principles are summarized in Figure 10.

In Figure 10, the collected raw data are represented as curves along the activity axis. The first analysis step consists of collecting this raw trace. The second step consists of producing a symbolic trace (symbols represented by circles) through the discretization of the raw trace. The third step consists of modelling the symbolic trace by inferring more abstract symbols (represented as squares and triangles) and organizing these symbols in an ontology (hierarchy of white rectangles). The fourth step consists of producing explained models of activity (round-angled rectangles) that are backed up by the abstract trace. During the modelling process, the analyst formalizes her understanding of the activity in the form of transformation rules, ontology and documentation that are stored in the system, which allows capitalizing on the analyst's knowledge across studies.

To illustrate these principles, we have implemented a prototype software tool through an assemblage of open-source knowledge engineering software modules. With this tool, we have modelled car-driving activity traces collected with an instrumented vehicle. This analysis allowed us to identify and describe two strategies of lane change on motorways. This example shows that our knowledge engineering approach to activity modelling from activity traces offers answers to the need for tools to help analysts understand better an observed activity, create models of this activity, and report and back up these models with the observational data.
The abstract activity traces that we have con- structed constitute a model of the car driver in their own, in that the analysts can use our tool to query these traces to answer new questions they may have about the driving activity. Our future developments will consist of simulating the activ- ity, for instance, in the field of car driving, generating realistic driving behaviour in a driving simulator, based on our abstract activity traces. Acknowledgements We would like to acknowledge support for this project from the European Commission through the HumanIST Network of Excellence. Partial support for this report was provided by ONR (contracts N00014-06-1-0164 and N00014-08-1- 0481). We also thank Jean–Marc Trémeaux for his participation in the software implementation, and Matthias Henning for his contribution to the car-driving experiment. We appreciate the com- ments on this report from Jonathan Morgan and Ryan Kaulakis. References AAMODT, A. and E. PLAZA (1994) Case-based reasoning: foundational issues, methodological variations, and system approaches, AI Communications, 7, 39–59. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 1. Raw trace 2. Symbolic trace 3. Modeled trace 4. Activity models Activity Analysis Figure 10: Activity modelling from activity traces c! 2011 Blackwell Publishing Ltd Expert Systems 13 EXSY 584 (B W U K E X SY 5 84 W eb pd f: =0 2/ 11 /2 01 1 04 :4 6: 26 5 51 35 5 B yt es 1 5 PA G E S n op er at or =) 2 /1 1/ 20 11 4 :4 6: 31 P M CORDIER, A., B. MASCRET and A. MILLE (2009) Extend- ing case-based reasoning with traces, Grand Chal- lenges for Reasoning from Experiences, Workshop at IJCAI, Pasadena, CA, pp. 23–32Q1 . CRAM, D., B. FUCHS, Y. PRIÉ and A. MILLE (2008) An approach to user-centric context-aware assistance based on interaction traces, Modeling and Reasoning in Context, Human Centered Processes, Delft, The NetherlandsQ2 . DEARDEN, A. and J. FINLAY (2006) Pattern languages in HCI: a critical review, Human–Computer Interac- tion, 21, 49–102. ERICSSON, K.A. and H.A. SIMON (1993) Protocol Analysis: Verbal Reports as Data, Cambridge, MA: MIT Press. GEORGEON, O., J. MORGAN, J. HORGAN and K. BRAD- DOCK (2010) Process modeling for the study of non- state political violence, 19th Annual Conference on Behavior Representation in Modeling Simulation, Charleston, NC, Brims Society, pp. 240–247Q3 . GROEGER, J. (2000) Understanding Driving, Hove, UK: Psychology Press. HENNING, M.J., O. GEORGEON and J.F. KREMS (2007) The quality of behavioral and environmental indica- tors used to infer the intention to change lanes, 4th International Driving Symposium on Human Factors in Driver Assessment, Stevenson, Washington, USA, pp. 231–237Q4 . HESSE-BIBER, S., P. DUPUIS and S. KINDER (2001) Testing hypotheses on qualitative data: the use of HyperRESEARCH computer-assisted software, Social Science Computer Review, 18, 320–328. HILBERT, D.M. and D.F. REDMILES (2000) Extracting usability information from user interface events, ACM Computing Surveys (CSUR), 32, 384–421. JAMES, W. (1907) PragmatismQ5 . MAGNUSSON, M.S. (2000) Discovering hidden time patterns in behavior: T-Patterns and their detection, Behavior Research Methods, Instruments and Com- puters, 32, 93–110. MILLE, A. (2006) From case-based reasoning to traces- based reasoning, Annual Reviews in Control, 30, 223–232. OLLAGNIER-BELDAME, M. 
(2006) Traces d’interactions et processus cognitifs en activité conjointe: Le cas d’une co-rédaction médiée par un artefact numérique, CNRS – LIRIS, Lyon, Université Lumière Lyon 2. OLSON, G.M., J.D. HERBSLEB and H.H. REUTER (1994) Characterizing the sequential structure of interactive behaviors through statistical and grammatical tech- niques, Human–Computer Interaction, 9, 427–472. POPPER, K. (1972) Objective Knowledge, Oxford, UK: Oxford University Press. RITTER, F.E. and J.H. LARKIN (1994) Developing pro- cess models as summaries of HCI action sequences, Human–Computer Interaction, 9, 345–383. RITTER, F.E. and A.B. WOOD (2005) Dismal: a spread- sheet for sequential data analysis and HCI experi- mentation, Behavior Research Methods, Instruments, and Computers, 37, 71–81. ROSE, R.T., F. QUEK and Y. SHI (2004) MacVisSTA: a system for multimodal analysis, International Con- ference on Multimodal Interfaces, State College, PA, pp. 259–264. SALVUCCI, D.D. and A. LIU (2002) The time course of a lane change Q6: driver control and eye-movement beha- vior, Transportation Research, F, 123–132. SANDERSON, P.M. and C.A. FISHER (1994) Exploratory sequential data analysis: foundations, Human–Com- puter Interaction, 9, 251–317. SANDERSON, P.M., M.D. MCNEESE and B.S. ZAFF (1994) Handling complex real-word data with two cognitive engineering tools: COGENT and MacSHAPA, Be- havior Research Methods, Instruments, and Compu- ters, 26, 117–124. SETTOUTI, L.S., Y. PRIÉ, J.-C. MARTY and A. MILLE (2009) A trace-based system for technology- enhanced learning systems personalisation, the 9th IEEE International Conference on Advanced Learn- ing Technologies, Riga, pp. 93–97 Q7. The authors Olivier L. Georgeon Olivier L. Georgeon is a research associate in the LIRIS department at the Claude Bernard Uni- versity (Lyon, France). He previously had a 12- year industrial experience as a software engineer, developer and project manager in the domain of automatization of industrial processes. He re- ceived a PhD in cognitive psychology from the Université Lumière (Lyon, France) in 2008. His research interests are in learning through activity. His work both produces practical applications and addresses epistemological questions concern- ing learning from and about an activity. Alain Mille Alain Mille has been a professor at the Claude Bernard University (Lyon, France) since 2000. He is a scientific director in the LIRIS depart- ment (UMR 5205 CNRS), and leads the research group Supporting Interactions and Learning through Experience. After a first career as a computer engineer and a project manager (Hospices Civils de Lyon), Alain Mille was 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 14 Expert Systems c! 2011 Blackwell Publishing Ltd EXSY 584 (B W U K E X SY 5 84 W eb pd f: =0 2/ 11 /2 01 1 04 :4 6: 26 5 51 35 5 B yt es 1 5 PA G E S n op er at or =) 2 /1 1/ 20 11 4 :4 6: 31 P M asked to build a computer science department in a college of engineering (CPE-Lyon). In this latter position, he created a research team on case-based reasoning. The main topics were decision helping and complex tasks assistance. Alain Mille is on several national and interna- tional conference programme committees, and on the editorial board of Intellectica (a cognitive sciences journal). He is specifically interested in dynamic modelling of knowledge during com- puter-mediated activities. 
Thierry Bellet

Thierry Bellet has been a researcher at the Ergonomics and Cognitive Sciences Laboratory of IFSTTAR (Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux) since 1999. His main areas of research concern human activity analysis and cognitive processes modelling (e.g. situational awareness, decision making, cognitive schemas). His research has produced computational simulations of mental activities of the car driver (e.g. COSMODRIVE: COgnitive Simulation MOdel of the DRIVEr).

Benoit Mathern

Benoit Mathern is a PhD student in computer science. He first worked as a computer engineer at the Ergonomics and Cognitive Sciences Laboratory of IFSTTAR (Institut Français des Sciences et Technologies des Transports, de l'Aménagement et des Réseaux), and then he started his PhD work in collaboration with the LIRIS department of the Claude Bernard University (Lyon, France). His main research interests focus on knowledge engineering, knowledge discovery and human-machine interaction. His work is applied to cognitive science in the field of transportation.

Frank E. Ritter

Frank E. Ritter is one of the founding faculty of the College of IST, an interdisciplinary academic unit at Penn State created to study how people process information using technology. Frank Ritter's current research is in the development, application and methodology of cognitive models, particularly as applied to interface design, predicting the effect of behavioural moderators and understanding learning. He edits the Oxford Series on Cognitive Models and Architectures, is associate editor of Cognitive Systems Research, an editorial board member of Human Factors and the Journal of Educational Psychology, and technical programme co-chair for the BRIMS 2009, 2010 and 2011 conferences and the associated special issues in Computational and Mathematical Organizational Theory.
work_35yj7gknozerbjznmviwkbytwi ---- PII: 0003-2670(93)80016-E Analytica Chimica Acta, 284 (1993) 131-136, Elsevier Science Publishers B.V., Amsterdam

Expert system for the interpretation of infrared spectra

G.N. Andreev, O.K. Argirov and P.N. Penchev
Department of Chemistry, University of Plovdiv, 4000-Plovdiv (Bulgaria)
(Received 7th December 1992; revised manuscript received 30th March 1993)
Correspondence to: G.N. Andreev, Department of Chemistry, University of Plovdiv, 4000-Plovdiv (Bulgaria).

Abstract

An expert system for the interpretation of infrared spectra, EXPIRS, was created. The main features of EXPIRS are: hierarchical organization of the characteristic groups, realized by frames; registration of the multiple use of spectral bands; taking into account the solvent absorption and the chemical inconsistencies; and documenting the interpretation course and providing explanations on request. The ten most important heuristics used by an expert for the interpretation of infrared spectra were formulated and some of them were tested with EXPIRS.

Keywords: Infrared spectrometry; Expert systems; Frames; Heuristics; Organic compounds

Computer-assisted interpretation of infrared (IR) spectra has drawn the attention of scientists for more than a decade. Several different approaches have been applied, including the utilization of correlation tables [1-5], symbolic [6-8] and fuzzy [9] logic, expert systems based on rules [10-14 and refs. cited therein] and a table-driven procedure [15], frames [16] and, recently, neural networks [17-19]. Most of the authors have formulated their results in terms of probability predictions for the characteristic groups in the studied compound. These systems produce results in the form of tables: functional groups vs. probability. The final decision on the presence of a given substructure was left to the user. In the general case, however, the user is not a specialist in IR spectroscopy.

An alternative approach is to base the interpretation rules on classical logic, leading to decisions clearly formulated by an expert. This can be achieved using the heuristic knowledge of a human expert in conjunction with the positive characteristics of computers. The present work deals with the formulation of the principal heuristics used by an expert to interpret IR spectra and the implementation of some of them in an expert system.

BASIC CONCEPTS OF THE EXPERT

Until now, we have not found in the literature clearly formulated heuristics used by an expert for the interpretation of vibrational spectra. Our experience in structural elucidation of organic compounds by IR spectroscopy and the practice with EXPIRS have led to the following ten most important heuristics:

(1) Taking into consideration the preliminary information about the studied sample.
(2) Correction of the spectral band intensities, when some of the bands are "obviously too intensive".

(3) Comparison between the parameters of the spectrum bands (position, intensity, width) and characteristic group data.

(4) Discussion of the alternatives for the functional group combinations, possibly existing in the compound at hand, satisfying the spectral data.

(5) Discussion of the alternatives for the explanation of the spectrum bands' origin: normal vibrations, overtones, combinations, Fermi resonances.

(6) Excluding from consideration the bands used during the interpretation of every alternative, i.e. single use of each band in the discussion of every alternative.

(7) Taking into consideration the band shapes as well as the whole spectrum or any of its sectors.

(8) Taking into consideration the absorption of moisture and carbon dioxide.

(9) Taking into consideration spectrum registration conditions: physical condition (influence upon characteristic group intervals and band intensities), solute-solvent interactions, blocking of spectrum intervals caused by solvent absorption, sample concentration (hydrogen bonds), sample thickness, pressure, etc.

(10) Taking into consideration non-spectral reasons: chemical inconsistencies of the functional groups simultaneously predicted in a given alternative; chemical interaction between the solvent used and the predicted functional groups.

It should be noted that this enumeration is not in order of importance, because the neglect of any heuristic can lead to wrong conclusions. Furthermore, such an order should not be accepted as an algorithm, because some of the heuristics must be repeatedly applied in the course of interpretation, depending on the nature of the studied molecule.

The third heuristic is the only heuristic used in all expert systems developed for the interpretation of IR spectra. Most of these also use some of the other heuristics described above. However, we have not found any utilization of heuristics 5, 7, 8 and 10, whereas 2 and 6 have been used, but not in the same manner as an expert would use them.

The band of each characteristic group has a different relative intensity depending on the other groups included in the same molecule. For example, the γ(CH) modes are very strong in the IR spectra of aromatic hydrocarbons, but they often appear medium to strong in the spectra of aromatic carbonyl compounds. The common solution for this problem in systems for automated interpretation is enlargement of the intensity interval in the knowledge base. However, this automatically brings one to the so-called "hyperprediction". On the other hand, the expert detecting the presence of the "very strong" ν(CO) modes "expands" the intensity of the other spectrum bands instead of extending the intensity intervals in his mind. The application of this principle in the expert system will avoid the hyperprediction. Heuristic 2 takes into account the latter approach. The results obtained from EXPIRS testing pointed out that the most significant reduction of the hyperprediction can be achieved by the application of heuristic 6.

EQUIPMENT AND MATERIALS

EXPIRS was developed on an IBM-PC compatible computer with 640 kbyte RAM and the program was written in PASCAL (the system is available on request).
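One possible reading of heuristic 2 is sketched below (in Python rather than the system's Pascal, and with made-up margins): bands are labelled against user-defined intensity margins, and when one band dominates the spectrum the remaining bands are judged relative to the next strongest band instead of widening the knowledge-base intervals. This is an assumption about how such a correction could be coded, not the EXPIRS implementation.

# Hypothetical sketch of intensity classification with a heuristic-2-style correction.
def classify_bands(bands, weak=0.3, medium=0.6, dominance=0.9):
    """bands: list of (wavenumber, relative_intensity), intensities in [0, 1]."""
    intensities = sorted((i for _, i in bands), reverse=True)
    # If one band dominates, rescale the others against the next strongest band.
    if intensities[0] >= dominance and len(intensities) > 1:
        reference = intensities[1]
    else:
        reference = intensities[0]
    labelled = []
    for wavenumber, intensity in bands:
        value = min(intensity / reference, 1.0)
        label = "s" if value > medium else "m" if value > weak else "w"
        labelled.append((wavenumber, label))
    return labelled

# A very strong carbonyl band no longer pushes the C-H bands down to 'medium'.
print(classify_bands([(1715, 0.98), (2930, 0.45), (1460, 0.25)]))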
200 spectra from the Sadtler Fourier transform (FT)-IR library were used to test the system, as well as spectra registered in our laboratory on a Perkin-Elmer 1750 FT-IR spectrometer with a resolution of 2 cm-1. More than 70 characteristic groups containing the elements C, H, O and N were incorporated in the system and the appropriate rules for their interpretation were programmed.

Program description

The expert system is based on the concept of "characteristic group intervals" [20]. There are three levels in EXPIRS (Fig. 1). The first two levels involve the process of interpretation; the third one (documentation) is designed to register the interpretation course and the number of spectrum peaks used during the work of the program.

Fig. 1. Flow chart of EXPIRS.

The data for the characteristic intervals, solvent absorption and chemical inconsistency were realized as arrays of the appropriate variables. Analysis of the literature, combined with our experience, reveals that the groups are best organized hierarchically. Such an approach avoids repeating the interpretation of the spectral intervals. For example, the primary (RCH2-OH), secondary (R2CH-OH) and tertiary (R3C-OH) alcohols have a common interval at 3600-3200 cm-1 due to the stretching vibration of the hydroxyl group, v(OH). The triple verification of this interval can be avoided if the characteristic OH group, which determines this interval, is used as a "parent group". In this case, the rules for the different alcohols (primary, secondary and tertiary) must take into account not only the entire spectrum, but also the status of the parent group OH. Such an approach was utilized in the description of other characteristic groups, including alkanes, alkenes, amines, aldehydes, etc. This type of group organization corresponds to the chemist's knowledge that the primary, secondary and tertiary alcohols are special cases of the concept "alcohol". In other words, such an organization reflects the different levels of abstraction of the chemical structure in the chemist's mind.

We have used frames to realize the hierarchical organization of the characteristic groups in our expert system (Fig. 2). The principal frame in the program is the frame "Characteristic group"; two specifications of this frame are given in Fig. 3 for the groups sp3C-H and CH2.

Fig. 2. Hierarchical organization of the frames in EXPIRS.

When the system checks for the presence of the methylene group CH2, it needs data for its parent group sp3C-H. If the status of the parent group has yet to be discovered, the procedure starts with its frame, etc. There are no limitations to the depth of such parent group checks. This makes the system independent of the group order in the data base and of the user's range of demand, which differentiates EXPIRS from the approaches utilized elsewhere [10,16,21]. One can readily see (Fig.
3) that the parent group and its "generics" belong to the same frame, and the hierarchy mentioned above is determined not as a frame hierarchy, but by means of the connection "parent group" between the different specifications of the same frame "Characteristic group".

Fig. 3. Two entities from the frame "Characteristic group": sp3C-H and CH2. (Each entity holds the group name, a string for the screen, the conclusion for presence, the parent group with its conclusion, and the characteristic intervals. For sp3C-H the interval is 3000-2800 cm-1, weak to strong; for CH2, whose parent group is sp3C-H, the interval is 1480-1430 cm-1, weak to strong. The conclusion for presence is received by a procedure based on the data for the parent group, the characteristic intervals, the spectrum and the solvent absorption.)

Operation of the system

The interaction between the user and the system is determined by the appropriate menus. There are three main options: INTERPRETATION, CONCLUSIONS, SAVING THE RESULTS.

In the option INTERPRETATION the user:
- puts in the name of the spectrum file; the interaction mode can be selected;
- defines the intensity margins for strong, medium and weak peaks;
- puts in the solvent or the disturbing media for the measured sample;
- gives preliminary information (based on analysis or the origin of the sample) for the absence of some groups as well as the kind of groups of interest.

The option CONCLUSIONS provides information on the groups whose presence in the studied molecule is positive, negative or uncertain. The user can request an explanation about the arguments on which the conclusion is based. Additional information on positively predicted groups which are not chemically consistent is available.

In the option SAVING THE RESULTS the user can save on disk or print the results of the interpretation.

RESULTS AND DISCUSSION

We have obtained excellent to satisfactory agreement between the predicted and existing characteristic groups by testing the developed expert system with IR spectra measured in our laboratory or included in the Sadtler FT-IR library. The following examples of EXPIRS' work illustrate the importance of some heuristics described above.

The interpretation of the spectrum of DL-2-methylbutanoic acid [CH3CH2CH(CH3)COOH] gives a report for the presence of the following characteristic groups: sp3C-H, CH3, CH2, COOH, alkyl-COOH. As seen, EXPIRS identifies all three functional groups of the studied substance. It further specifies that the carboxylic group is aliphatic. The presence of C-H with an sp3-hybridized carbon atom is also mentioned.

Along with the correct predictions we also received some incorrect answers in the spectral interpretation. The latter can be divided into two different types: (1) a negative statement for a characteristic group existing in the studied molecule; (2) a positive statement for a characteristic group that is absent in the studied compound. We found that one could eliminate the first kind of incorrect answers by improving the data base. The errors of the second type, which appear more frequently (hyperpredictions), could not be avoided in the same manner. The latter are connected with the multiple use of the same bands in the course of interpretation.
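A minimal sketch of this frame-like evaluation is given below (in Python, with illustrative group data rather than the EXPIRS knowledge base): each characteristic group carries an interval, an intensity range and an optional parent group, and a usage counter records how many times each peak is used, which is the behaviour discussed next.

# Hypothetical sketch of parent-group checking with a band-usage counter.
GROUPS = {
    "sp3C-H": {"parent": None,     "interval": (2800, 3000), "intensity": ("w", "s")},
    "CH2":    {"parent": "sp3C-H", "interval": (1430, 1480), "intensity": ("w", "s")},
}
ORDER = {"w": 0, "m": 1, "s": 2}

def check_group(name, peaks, usage):
    """peaks: list of (wavenumber, intensity_label); usage counts band use."""
    spec = GROUPS[name]
    # The parent group must be confirmed before its "generics" are examined.
    if spec["parent"] and not check_group(spec["parent"], peaks, usage):
        return False
    low, high = spec["interval"]
    lo_int, hi_int = (ORDER[x] for x in spec["intensity"])
    for wavenumber, label in peaks:
        if low <= wavenumber <= high and lo_int <= ORDER[label] <= hi_int:
            usage[wavenumber] = usage.get(wavenumber, 0) + 1  # register band usage
            return True
    return False

peaks = [(2929, "s"), (1465, "s")]
usage = {}
print(check_group("CH2", peaks, usage), usage)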
Our program was supplied with a counter in order to register the multiplicity of band usage. The hyperprediction will be illustrated by the interpretation of the spectrum of isopropylbenzene. EXPIRS reveals the existence of the following characteristic groups: sp3C-H, CH3, CH2, i-Pr, =C-H, C=C, cis-CH=CH, aryl, Ph-, o-aryl. This interpretation is based on the spectrum bands given in the following format: wavenumber(band intensity) - multiplicity of the band's usage: 3083(m) - 1, 3064(m) - 1, 3029(s) - 1, 2963(s) - 1, 2929(s) - 1, 2889(s) - 1, 2873(s) - 1, 2802(m) - 1, 1949(m) - 0, 1872(w) - 0, 1804(m) - 1, 1744(w) - 0, 1665(w) - 1, 1605(s) - 2, 1533(w) - 0, 1496(m) - 0, 1465(s) - 1, 1451(s) - 1, 1386(m) - 3, 1364(m) - 3, 1320(m) - 1, 1300(m) - 1, 1279(m) - 1, 1208(w) - 0, 1102(m) - 0, 1079(m) - 1, 1048(m) - 2, 1027(m) - 2, 921(m) - 0, 905(m) - 0, 777(w) - 0, 761(s) - 4, 697(s) - 4, 531(m) - 1, where s = strong, m = medium and w = weak. The multiple use of the 1605, 761 and 697 cm-1 bands is the reason for the hyperprediction of the C=C, cis-CH=CH and o-aryl groups. Obviously, one could eliminate such incorrect predictions by taking into account the multiple use of each spectral band.

The ability of EXPIRS to take into consideration the influence of the solvent or dispersion agent used will be illustrated by the interpretation of the spectrum of D-glucose measured in nujol mull (wavenumber/relative intensity, bandwidth): 3408/0.810, 3306/0.833br, 2925/0.905, 2855/0.828, 1459/0.720, 1376/0.662, 1340/0.630, 1296/0.568, 1224/0.568, 1203/0.570, 1149/0.695, 1111/0.751, 1078/0.611, 1050/0.753, 1024/0.824, 996/0.798, 916/0.597, 838/0.592, 775/0.559, 723/0.531, 614/0.709. The following message appears after the interpretation:

(1) There are spectral data for PRESENCE of the following groups: OH, RCH2-OH, R2CH-OH, R3C-OH, Ar-OH, R-O-R, R-O-Ar, >N-H;
(2) The presence of the following groups is UNCERTAIN: sp3C-H, CH3, (CH2)n, CH2, i-Pr, t-Bu, Ar-NH.

The second report deals with the groups whose characteristic intervals overlap with those of the dispersion media (or solvent used).

An important feature of EXPIRS is that the user can follow the logic of the interpretation upon request. If the user wants an explanation for any of the above results, he can use the option EXPLANATION. He goes to the option CONCLUSIONS and designates the corresponding group of interest. For example, the following messages will appear for the groups R2CH-OH and CH2:

R2CH-OH ...........................
REPORT: The conclusion for presence of OH is POSITIVE
Is there a strong band between 1120 and 1080? YES: 1111 (s)
There are spectral evidences for presence of R2CH-OH
Press any key to continue...

CH2 ...............................
REPORT: The conclusion for presence of sp3C-H is UNCERTAIN
TOTAL OVERLAPPING WITH THE SOLVENT BANDS
The presence of the CH2 is UNCERTAIN
Press any key to continue...

This possibility makes the expert system appropriate for educational purposes as well.

The authors wish to thank the Bulgarian National Fund of Research at the Ministry of Education and Science for partial financial support through Grant No. X-124/91.

REFERENCES
1 T. Visser and J.H. van der Maas, Anal. Chim. Acta, 122 (1980) 363; Anal. Chim. Acta, 133 (1981) 451.
2 M. Farkas, J. Markos, P. Szepesvaty, I. Bartha, G. Szalontai and Z. Simon, Anal. Chim. Acta, 133 (1981) 19.
3 G. Szalontai, Z. Simon, Z. Csapo, M. Farkas and Gy.
work_3665qv37y5dqbnw4huvnhsg4mu ---- 1052-2 Expert System Aid in Differentiating Among Paroxysmal Supraventricular Tachycardias
JACC February 1995, ABSTRACTS, 181A
were found to result in a variability below 5% in parameters measured directly from the average waveform, and up to 10% in those obtained from the time derivative. Subsequently, the feasibility of an automated version of the algorithm, based on objective, operator-independent criteria, was evaluated, and the parameters obtained were found to be in excellent agreement with those obtained using the manual approach. In summary, this algorithm provides a fast, easy and objective method for noise reduction in acoustic quantification signals. This algorithm may improve the on-line noninvasive assessment of systolic and diastolic LV function.
Computer Aided Instructions: II
Tuesday, March 21, 1995, 9:00 a.m.-12:30 p.m. Ernest N. Morial Convention Center, Hall B
Results of Reduced Antithrombotic Therapy Following Intra Coronary Stenting
Tuesday, March 21, 1995, 10:30 a.m.-Noon Ernest N. Morial Convention Center, La Louisiane A
10:30 1741-1 Clinical Experience with Heparin-Coated Stents - The Benestent II Pilot Phase 1
Håkan Emanuelsson, Patrick W. Serruys, Jorge Belardi, Hans Bonnier, Antonio Colombo, Jean Fajadet, Jean-Jacques Goy, Guy Heyndrickx, Peter de Jaegere, Victor Legrand, Carlos Macaya, Pierre Materne, Wolfgang Rutsch, Ulrich Sigwart, Harry Suryapranata, Benestent Study Group. Division of Cardiology, Sahlgrenska Hospital, Göteborg, Sweden; Thoraxcenter, Erasmus Univ. Rotterdam, The Netherlands
Krzysztof P. Wróblewski, Zhi Yun Tian. Children's Hospital of Philadelphia, Philadelphia, PA
Congenital heart disease (CHD) is the most common congenital malformation with an incidence rate of 0.7%.
Many pregnant women (in some countries all) are offered an ultrasound scan at around 18 weeks of pregnancy. The scan incorporates a detailed anatomical survey of the fetus and, if it includes at least a four-chamber view, it is an excellent opportunity to detect congenital heart defects. Therefore a Windows-based multimedia computer program for the instruction of Fetal Echo for ultrasound technologists, students, residents, fellows and primary care physicians has been developed. The application includes a step-by-step tutorial to instruct the user in reading Fetal Echo data, a browsing library of definitions of CHDs, graphics images and digitized echocardiograms, and the Expert System for automatic diagnosis. The pictures, images and descriptions are linked to a database and can be viewed by searching for the defect name or symptoms. This provides a powerful instructional tool in this very difficult area of medical education. The minimum hardware requirements to run this program are: an IBM PC or compatible computer running MS Windows version 3.1 with an 80386SX processor, 4 MB RAM, 30 MB free disk space, multimedia with dual-speed CD ROM, and a VGA card capable of displaying at least 256 colors.
Steven Georgeson. Somerset Medical Center, Somerville, NJ
The differentiation between atrioventricular reciprocating tachycardia via an accessory pathway (AVRT), AV nodal reentrant tachycardia (AVNRT) and atrial tachycardia (AT) in paroxysmal supraventricular tachycardia may be helpful in guiding pharmacologic therapy and in identifying patients for radiofrequency catheter ablation. To help in this differentiation, an expert system was developed using a commercially available expert system shell (EXSYS). A simplified version of this system was designed to run on a palm-top computer. Both programs use the MS-DOS operating system. The expert system is rule-based and assigns probability values to the goal states through the process of backward chaining. The goal states for this expert system were AVRT, AVNRT, and AT. The user is queried for the presence of various abnormalities on the presenting EKG (P wave location, QRS alternans, pseudo r wave in lead V1, pseudo S wave in the inferior leads), comparison with previous EKGs (presence of pre-excitation) and the effect of vagal maneuvers or adenosine infusion on the tachycardia. From published data, each abnormality is assigned a probability based on the positive predictive value of the abnormality for AVRT, AVNRT and AT. By combining the positive predictive values, the expert system assigns a final probability for AVRT, AVNRT and AT. This expert system may be useful as a diagnostic tool and a teaching aid in the differentiation between AVRT, AVNRT and AT in paroxysmal supraventricular tachycardia.
10:45 Full Antiplatelet Therapy without Anticoagulation After Coronary Stenting
The purpose of the Benestent II Pilot Phase was to explore the safety of reducing antithrombotic therapy in conjunction with implantation of heparin-coated stents. The study consists of three phases, where resumption of heparin therapy after stent implantation was progressively delayed in a stepped-care approach. Material and Methods. Palmaz-Schatz stents with heparin coating were implanted in 51 patients (88% male) with stable angina pectoris. Heparin treatment was withheld 6 hours following removal of the sheath introducer from the femoral artery.
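Referring back to abstract 1052-2 above: the abstract states that per-finding positive predictive values are combined into a final probability for AVRT, AVNRT and AT, but does not give the combination rule used by the EXSYS shell. The sketch below shows one plausible pooling rule (noisy-OR followed by renormalization); the function name and all numeric values are invented for illustration only.

```python
def combine_ppv(findings, hypotheses=("AVRT", "AVNRT", "AT")):
    """Combine per-finding positive predictive values into a final score
    per rhythm hypothesis using a noisy-OR style rule:
        score(h) = 1 - prod(1 - ppv_i(h)),
    then renormalize so the scores sum to one.
    `findings` is a list of dicts mapping hypothesis -> positive predictive
    value of one observed EKG abnormality."""
    scores = {}
    for h in hypotheses:
        miss = 1.0
        for f in findings:
            miss *= 1.0 - f.get(h, 0.0)
        scores[h] = 1.0 - miss
    total = sum(scores.values()) or 1.0
    return {h: s / total for h, s in scores.items()}

if __name__ == "__main__":
    # Invented positive predictive values, for illustration only.
    findings = [
        {"AVRT": 0.80, "AVNRT": 0.15, "AT": 0.05},  # finding whose PPV favours AVRT
        {"AVRT": 0.10, "AVNRT": 0.85, "AT": 0.05},  # finding whose PPV favours AVNRT
    ]
    print(combine_ppv(findings))
```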
The mean age was 59 years, 10% had a previous myocardial infarction, diabetes was prevalent in 8%, hypertension in 24% and 59% were current or previous smokers. Target lesion was located in the left anterior descending artery in 51%, left circumflex in 8% and right coronary artery in 41%. TIMI flow I-II was present in 14% and TIMI III in 86%. The mean pre-procedural minimal lumen diameter (MLD) was 1.10 mm and reference diameter 3.16 mm. Results. Stent implantation was successful in all patients and, following the procedure, mean MLD increased to 2.77 mm and percent diameter stenosis was reduced from 65% to 19%. The maximum balloon size was 3.45 ± 0.40 mm. Post-stent dilatation was performed in 40 patients (80%). High pressure (>12 atm) was used in 22 of these cases (54%). Two patients needed a second stent: one for an occlusive distal dissection and one for a distal lesion. There were no major complications, i.e. death, myocardial infarction, urgent CABG, re-PTCA or cerebrovascular accident. Peripheral vascular complications required vascular surgery in 3 patients (5.9%) and blood transfusion in one (2%). Conclusions. In this pilot study, implantation of heparinized stents was associated with a 100% success rate, absence of serious complications and a moderate incidence of vascular complications. Further reduction of antithrombotic treatment may be feasible.
Jean-Marc Lablanche, Gilles Grollier, Nicolas Danchin, Jean-Louis Bonnet, Eric Van Belle, Eugene Mc Fadden, Michel E. Bertrand. Universities of Lille, Caen, Nancy and Marseille, France
Subacute thrombosis remains a major limitation of coronary stenting. In addition, local complications related to the intensive anticoagulation that is commonly employed are frequent. We prospectively studied a regime of intensive antiplatelet therapy (aspirin 200 mg daily begun before percutaneous transluminal coronary angioplasty, ticlopidine 500 mg daily for 3 months begun just after angioplasty) with periprocedural dextran infusion continued for a period at the discretion of the local investigator. To date, 98 patients (85 men, 13 women) undergoing 102 procedures involving 125 stents have been enrolled; 71 patients had one stent implanted, 19 patients had 2 stents implanted, and 2 patients had 3 stents implanted. Symptoms before coronary angioplasty included effort angina (23%), angina at rest (6%), unstable angina (23%), and recent myocardial infarction (27%). The indications for stent implantation were occlusive dissection 36%, dissection without occlusion 32%, and suboptimal result or elective implantation 32%. The stented site was the LAD (37%), RCA (35%), LCx (17%), and vein graft (11%). Stent types were Wiktor (70%), Palmaz-Schatz (23%), Gianturco-Roubin (7%). The diameter of the stents implanted was 2.5 mm (2%), 3.0 mm (32%), 3.5 mm (44%), and 4.0 mm (22%). There were 2 deaths: 1 patient died from cardiogenic shock that was present before stent implantation and 1 patient committed suicide; 4 patients developed Q-wave AMI, of which 1 was already complete at the time of stent implant and 3 occurred in the hours after stent implant for occlusive dissection; 8 patients had non-Q AMI; 3 patients had CABG, 1 at 5 hours post stent for reocclusion (non-Q-wave AMI); 2 were performed electively without sequelae for an unsatisfactory angiographic result but without ongoing ischemia.
Blood transfusion for peripro- Multimedia Instructional System for Fetal Echo Expert System Aid in Differentiating Among Paroxysmal Supraventricular Tachycardias 11052-1 1 1 1052-21 work_37btiicl5fhs5lb2yfxzlmafnu ---- Local-Shapelets for Fast Classification of Spectrographic Measurements Local-Shapelets for Fast Classification of Spectrographic Measurements Daniel Gordona,c,∗, Danny Hendlera, Aryeh Kontorovicha, Lior Rokachb,c aDepartment of Computer Science, Ben-Gurion University of The Negev Be’er Sheva 84105, Israel bDepartment of Information Systems Engineering, Ben-Gurion University of The Negev Be’er Sheva 84105, Israel cTelekom Innovation Laboratories, Ben-Gurion University of The Negev Be’er Sheva 84105, Israel Abstract Spectroscopy is widely used in the food industry as a time-efficient alternative to chemical test- ing. Lightning-monitoring systems also employ spectroscopic measurements. The latter appli- cation is important as it can help predict the occurrence of severe storms, such as tornadoes. The shapelet based classification method is particularly well-suited for spectroscopic data sets. This technique for classifying time series extracts patterns unique to each class. A signif- icant downside of this approach is the time required to build the classification tree. In addition, for high throughput applications the classification time of long time series is inhibitive. Although some progress has been made in terms of reducing the time complexity of building shapelet based models, the problem of reducing classification time has remained an open challenge. We address this challenge by introducing local-shapelets. This variant of the shapelet method restricts the search for a match between shapelets and time series to the vicinity of the location from which each shapelet was extracted. This significantly reduces the time required to examine each shapelet during both the learning and classification phases. Classification based on local- shapelets is well-suited for spectroscopic data sets as these are typically very tightly aligned. Our experimental results on such data sets demonstrate that the new approach reduces learning and classification time by two orders of magnitude while retaining the accuracy of regular (non- local) shapelets-based classification. In addition, we provide some theoretical justification for local-shapelets. Keywords: Spectrography, time series, classification, shapelets, local Research highlights • We present an algorithm for classifying spectrographic measurements. • The concept of locality is introduced into an established time series algorithm. • A technique for estimating a tolerance parameter is presented. ∗Corresponding author. Tel: +972 (0)86428782; fax: +972 (0)86477650; Email addresses: gordonda@cs.bgu.ac.il (Daniel Gordon), hendlerd@cs.bgu.ac.il (Danny Hendler), karyeh@cs.bgu.ac.il (Aryeh Kontorovich), liorrk@bgu.ac.il (Lior Rokach) Preprint submitted to Expert Systems with Applications November 13, 2014 • Learning and classification times are reduced by two orders of magnitude. • Accuracy levels are retained. 1. Introduction Spectroscopy is a field devoted to the study and characterization of physical systems by measuring the electromagnetic frequencies they absorb or emit (Herrmann and Onkelinx, 1986). Items differing in their chemical composition or molecular bonds absorb or emit light at different wavelengths leaving a different spectroscopic fingerprint thus enabling differentiation between them. For example, Al-Jowder et al. 
(2002) used mid-infrared spectroscopy to detect meat adul- teration by comparing the spectra of adulterated meat with that of unadulterated meat. A study by Briandet et al. (1996) discriminated between two different types of coffee beans (Arabica and Robusta) using mid-infrared spectroscopy. Other methods for distinguishing between different types of food exist, which are based on wet chemical analysis (Bicchi et al., 1993; Lumley, 1996; Sharma et al., 1994). The advantages of spectroscopy over wet chemical analysis are in its sim- plicity (Briandet et al., 1996) and speed of response. Spectroscopic measurements are also generated by systems monitoring lightning (Eads et al., 2002). This application is important as relative percentages of different types of lightning can indicate the outbreak of severe storms, such as tornadoes. In addition to laboratory research, spectroscopic equipment is starting to be mass produced for every day use allowing anyone to analyze their surroundings with the aid of spectroscopic measurements (SCIO, 2014). The measurements are uploaded to a cloud service where they are analyzed and then the results of the analysis are made available. The service is cloud based, requiring algorithms with high throughput to enable a quick response to high volumes of queries by users. The outcome of the spectroscopic analysis of a physical system is a vector in which each index represents a frequency and each value is the measured intensity of that frequency. The representation of spectroscopic measures and time series are identical (Ye and Keogh, 2011a), as the only explicit data are the measurements and the meaning of each measurement is defined by its location in the vector. This equivalence allows the application of time series classifica- tion methods to the field of spectroscopy. A previous experimental study (Hills et al., 2013) showed that the shapelet based classification method is particularly suited for data sets from the field of spectroscopy, as it achieved a higher accuracy than other machine-learning classification methods. Recently, Ye and Keogh (2011a) introduced the shapelets approach for classifying time se- ries. A shapelet is a subsequence extracted from one of the time series in the data set. The shapelet is chosen by its ability to distinguish between time series from different classes. A test time series is classified based on its distance from the shapelet. In the case of multiple shapelets, these form the nodes of a classification tree. The intuition behind this approach is that the pattern best separating the classes is not necessarily an entire time series. Rather, a certain subsequence may best describe the discriminating pattern. Ye and Keogh’s algorithm considers all possible subsequences in the training set in order to identify those shapelets that yield the optimal split. Through the rest of this paper, we will refer to this algorithm as the YK-algorithm. Two key advantages of classification with shapelets are the accuracy and interpretability of the induced classification model, as it supplies information on the patterns characteristic of the different classes (Ye and Keogh, 2011a). 
A significant downside of this approach is the time re- quired for building the classification tree (Hills et al., 2013; Mueen et al., 2011; Rakthanmanon and Keogh, 2 0 50 100 150 200 250 Time series index A b so rb a n ce (a) Coffee data set 0 200 400 600 800 1000 Time series index A b so rb a n ce (b) Wheat data set Figure 1: Examples of two data sets (coffee and wheat) from the field of spectroscopy. The time series of each class are vertically separated and in different colors. As shown, the examples of each class are tightly aligned, i.e., similar patterns are exhibited at similar locations along the x-axis. 2013). The search for the best shapelet requires examining all subsequences of all lengths from all the time series in the training set, and for each shapelet calculating its distance to each time series at all possible locations. Even for small data sets, this process has a time scale of days, and for large data sets, the time scale becomes one of years. Hence, the original implementation on commonly available hardware is only practical on the smallest of data sets. Additionally, for high-throughput applications, the classification time may be prohibitively expensive for long time series. This is because at each node of the tree, all possible matches between the node’s shapelet and the time series to be classified need to be examined. 1.1. Our Contributions Our goal was to reduce both learning and classification time without impairing the accuracy of the induced model by exploiting a feature common to spectroscopic data – the localized nature of information in the time series. In the YK-algorithm, no importance is attributed to the location from which the shapelet was extracted. Hence, the best match between a shapelet and a time series is searched for anywhere in the time series. We observed that for many data sets from the field of spectroscopy, time series from the same class show similar behavior patterns at similar locations along the frequency axis. Fig. 1 presents examples of two data sets which strongly support this insight. Based on this insight, we propose a new property as part of the definition of a shapelet, derived from the location in the time series from which the shapelet was extracted. This property limits the scope of the search for the best match of a shapelet to a time series to the vicinity of the location from which the shapelet was extracted. The assumption of locality is justified as spectroscopic measurements of items with similar properties should have very similar spectroscopic fingerprints, especially in areas characteristic of a specimen which are not expected to be contaminated. Our current implementation assumes that all time series are of equal length. 3 Although the time series are generally aligned, some allowance for misalignment is neces- sary. We therefore introduce a method for learning the misalignment characteristic of a data set. We evaluate our approach on data sets from the field of spectroscopy, and show that local- shapelets can reduce learning and classification time (especially for data sets with long time series) by over two orders of magnitude without impairing accuracy. For reproducibility, we have made all our code available online (local shapelets, 2014). The rest of the article is organized as follows: First we present basic definitions required for understanding the article and shortly describe the YK-algorithm in Sect. 2. Then we present related work (Sect. 
3), followed by a description of local-shapelets and our proposed method for determining the range to examine (Sect. 4). Next we present our experimental evaluation (Sect. 5) followed by a brief statistical analysis, which provides a theoretical justification for our local-shapelets approach (Sect. 6). Finally, we summarize our results and present additional research directions to pursue (Sect. 7). 2. Background Here we present a number of definitions necessary for the proper understanding of this article and a short description of the original shapelet algorithm as it is the basis of our work. 2.1. Definitions Definition 1. A time series T of length m is a series of m consecutive equally spaced measure- ments: T = t0, t1, ..., tm−1. Definition 2. A subsequence S of length k extracted from time series T of length m at index i such that k ≤ m is a series of k consecutive measurements: S = ti, ti+1, ..., ti+k−1. Definition 3. The Euclidean distance between two time series T,R of length m is: dE (T,R) = √√√ m−1∑ i=0 (ti − ri) 2 . Definition 4. The Euclidean distance between a time series T of length m and a subsequence S of length k such that k ≤ m is: dE (T,S ) = min i dE (T [i : i + k − 1],S ) i∈[0,m−k]. This is the minimal distance between S and all subsequences of length k in T . Definition 5. Given a data set D, with c classes and n examples, each class i with ni examples, the entropy is: Ent(D) =− c−1∑ i=0 ni n log ni n . 4 Intuitively, a data set’s entropy is a measure of its class-homogeneity. Larger homogeneity corresponds to smaller entropy values. Specifically, the smallest entropy value (0) corresponds to a data set in which all members belong to the same class. Definition 6. Given a data set D with n examples, split into two subsets D1 and D2, containing n1 and n2 examples respectively, such that D1 ∪ D2 = D and D1 ∩ D2 = ∅ the information gain (IG) is: IG(D, D1, D2) = Ent(D) − (n1 n Ent(D1) + n2 n Ent(D2) ) . Intuitively, information gain is a measure of the class-homogeneity induced by a split of data set D. The larger the information gain, the larger the decrease in entropy of the split data set w.r.t. D, in turn implying better class-homogeneity. As we will soon see, each shapelet induces a data set split and its effectiveness is measured by the split’s information gain. 2.2. The YK-Algorithm For completeness, we briefly present the YK-algorithm as presented in Ye and Keogh (2011a). First, we present the algorithm for two classes; we then extend the description to a multi-class data set. Let D be a dataset with two classes and n time series. The YK-algorithm examines all possi- ble subsequences of every length (from a minimal length, usually 3, to a maximal length which is usually the length of the shortest time series) from every time series. For each subsequence S , the distance to each time series is calculated, as defined in Definition 4. Then, the time series are ordered by their distance from S . Using this induced order, the average distance of every two adjacent time series to S is calculated. We will refer to this average distance as the splitting distance. Each of the n splitting distances defines two subsets, one containing all time series with a distance to S smaller than or equal to the splitting distance, and the other containing all time series with a distance to S greater than the splitting distance. For every possible split into two subsets, the information gain is calculated (see Definition 6). 
If the current information gain is better than the best so far, the shapelet is kept along with the corresponding splitting distance. Tie breaking is done by keeping the shapelet which induces a larger average distance between the two subsets which is referred to as the margin. After checking all possible subsequences, the best shapelet and the corresponding splitting distance are returned. This method can be easily extended to a multi-class problem by building a tree, with a shapelet and splitting distance in each node. A new node receives one of the two subsets created by the shapelet found by the node above it, and learns the best shapelet and splitting distance for this subset of time series. The stopping criteria for this recursive algorithm is that all the time series in the subset be of one class. Two important implementation issues are that all distance calculations are computed after local normalization (Goldin and Kanellakis, 1995) and that the margin is normalized, by dividing it by the length of the subsequence. Classification of a time series T is accomplished by traveling down the tree. At each node the distance of the shapelet S associated with the node to T is calculated. The node decides to which of its child nodes T should be directed, depending on whether its distance from S is smaller or greater than the splitting distance. When T reaches a leaf, it is assigned the class associated with this leaf. 5 2.2.1. Time Complexity of the YK-Algorithm Let m denote the length of a time series and let us assume all time series are of equal length. Assuming all shapelet lengths from 3 to m are examined, the number of different shapelets to examine in a single time series is ∑m i=3 i = O(m 2). Let n denote the number of time series in data set D. The number of shapelets to examine in the entire data set is O(nm2). When searching for the minimal distance between a shapelet S of length k and a time series T , the distance of S to all subsequences of length k in T needs to be calculated (see Definition 4). The time complexity of this operation is O(m2). Calculating the distance of S to all time series requires O(nm2) calculations. The total number of calculations for all shapelets and time series is O(n2m4) which explains the formidable time required for learning a model even for small data sets. 3. Related Work The time complexity of the YK-algorithm is formidable (see Sect. 2.2.1) leading to a large number of attempts to reduce it. As we will show in this section, none of the previous approaches utilized the location from which the shapelet was extracted to reduce the time required to learn a model. In addition, most of these approaches do not reduce the time required to classify a time series. The first attempt to reduce the time complexity of the YK-algorithm was introduced in the paper first presenting shapelets (Ye and Keogh, 2011a). Two optimizations were suggested. The first optimizes the distance calculation of a shapelet to a time series. The distance calculation is terminated if it exceeds the minimum distance found so far between the current shapelet and time series. This optimization was coined early-abandon. The second optimization (named entropy- pruning) checks whether the most optimistic IG possible, given the distances of a shapelet to time series already computed, can be better than the best IG found so far. If the IG cannot be improved, the shapelet under examination is discarded. As pointed out by Lines et al. 
(2012) this optimization requires testing O(2c) different possibilities (c is the number of classes in the data set), which can greatly reduce the effectiveness of this optimization when the number of classes is large. Later, Mueen et al. (2011) introduced additional optimizations. One optimization manages to compute the distance of a shapelet to a time series in constant time by precomputing necessary statistics. This optimization manages to reduce the time complexity to O(n2m3). A major down- side is that for each two time series, a matrix of size m2 needs to be maintained. This leads to a total space complexity of O((nm)2) which is untenable for large data sets (Gordon et al., 2015; Rakthanmanon and Keogh, 2013). A second optimization discards shapelets similar to shapelets that were already discarded. A disadvantage of this approach is the large time overhead when applied to data sets with a large number of classes. Two recent attempts managed to dramatically reduce the time complexity for learning a shapelet based model. The first method (Rakthanmanon and Keogh, 2013) quickly picks out a small number of shapelets from each shapelet length which seem able to effectively divide the data set into its classes. Then only this subset of shapelets are fully analyzed. This approach manages to reduce the time complexity to O(nm3) but requires a considerable amount of space to accommodate this reduction in time complexity. We will refer to this solution as the hashing- algorithm. A second approach (Gordon et al., 2015), named SALSA-R, randomly samples a constant number of shapelets (10,000) which are examined, reducing the time complexity to O(10,000 × nm2) with no excess space requirements. 6 As shown, none of the aforementioned optimizations utilize the location from which the shapelet was extracted to reduce the time complexity of the learning process. In addition none of them significantly reduces classification time. Xing et al. (2011) introduced an optimization for fast classification of streaming time series with the aid of shapelets. Their main idea was to prefer shapelets which appear early in a time series over shapelets which appear later. This ensures that time series can be classified quickly once initial measurements have arrived with no need to wait for further measurements. They coined this new type of shapelets as local-shapelets. Although we both name our shapelets similarly, the concepts differ. The motivation of Xing et al. was to classify streaming time series as early on as possible while ours is to optimize learning and classification time of non-streaming data sets. Also, the implementations differ. Xing et al. did not preserve the location from which the shapelet was extracted. Conversely, with our method, the location from which the shapelet was extracted is exploited with no limitation on the location from which to extract a shapelet. 4. Local-Shapelets In the material that follows, the basic idea of local-shapelets is described as well as modifi- cations which make it useful in practice. Definition 7. A shapelet is a tuple <~S ,d>. ~S is a series of consecutive measurements extracted from one of the time series in the data set and d is a cutoff distance. Time series with a distance to S smaller or equal to d traverse one side of the tree while time series with a distance to S greater than d traverse the other side of the tree. Definition 8. A local-shapelet is a tuple <~S ,d, i>. ~S and d are as in Definition 7. 
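A brute-force implementation of the subsequence distance of Definition 4, with the early-abandon pruning described above, might look as follows. This is an illustrative sketch only (the function and variable names are ours), and it omits the local z-normalization that the published implementations apply to every candidate subsequence.

```python
import math

def subsequence_distance(ts, shapelet):
    """Minimum Euclidean distance between `shapelet` and every equal-length
    subsequence of `ts` (Definition 4). Early abandon: a partial squared sum
    that already exceeds the best value found so far stops the inner loop.
    Assumes len(ts) >= len(shapelet)."""
    k = len(shapelet)
    best_sq = math.inf
    for start in range(len(ts) - k + 1):
        acc = 0.0
        for j in range(k):
            acc += (ts[start + j] - shapelet[j]) ** 2
            if acc >= best_sq:   # early abandon
                break
        else:
            best_sq = acc        # completed without abandoning -> new best
    return math.sqrt(best_sq)

if __name__ == "__main__":
    print(subsequence_distance([0, 1, 2, 3, 2, 1, 0], [3, 2, 1]))  # 0.0
```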
i is the location from which the local-shapelet was extracted. Unlike a shapelet, a local-shapelet contains information regarding the location from which it was extracted, which is utilized when calculating the distance of the local-shapelet to a time series. Definition 9. The Euclidean distance between a time series T of length m and a local-shapelet <~S ,d, i> of length k such that k ≤ m given x which defines a range around i is: dE (T,S ) = min j dE (T [ j : j + k − 1],S ) j∈[max(0,i−x),min(i+x,m−k)]. The distance between T and a local-shapelet <~S ,d, i> adds a constraint on the subsequences of T to which the distance of subsequence ~S is calculated. Instead of calculating the distance of ~S to all subsequences of length k in T , the distance of ~S is calculated only to subsequences in the vicinity of the location i from which ~S was extracted. This vicinity is defined by the constant x. Thus, the time required for calculating the distance of a shapelet to a time series is reduced. 4.1. The Tolerance Range A naı̈ve implementation of local-shapelets is to calculate the distance of a shapelet only to the single subsequence in the time series at the exact location i from which it was extracted. This approach may have detrimental impact on the learning process as exemplified in Fig. 2 which presents two time series from the same class of the data set Lightning7. As is clearly shown, the characteristic spike may appear at slightly different locations. Restricting the distance calculation 7 Time series index 0 100 200 300 A b so rb a n ce (a) Example from class 2 Time series index 0 100 200 300 A b so rb a n ce (b) Example from class 3 Figure 2: Two examples from different classes of the Lightning7 data set illustrating the need for a tolerance range. Without a tolerance range the spike characteristic of each class cannot be utilized as its location is different in different time series. of a shapelet to the exact location from which it was extracted would cause characteristic patterns to be overlooked. To accommodate this issue some tolerance, which we will refer to as the radius, needs to be added to the index i, such that the distance of a local-shapelet to a time series T is the minimum distance to all subsequences starting in the range [i − radius, i + radius] (see definition 9). We will refer to this range as the tolerance range. In food spectroscopy there are many sources of noise, such as differences in the residual water content of the freeze-dried samples (Briandet et al., 1996). This noise may cause distortions in the pattern generated but will not cause a shift in the frequencies emitted, therefore we set the tolerance range to 0. In lightning spectroscopy, interferences such as frequency-dependent dispersion induced by the ionosphere (Moore et al., 1995) may lead to a shift in the frequencies measured requiring the introduction of a tolerance range greater than 0. We present a method for computing the value of the tolerance required, as described in Pro- cedure 1. This is achieved by experimentally observing the tolerance required for a small number of subsequences. Procedure 1 splits each time series into ten equal consecutive and disjoint sub- sequences (line 1). From each class (C) and for each subsequence location, a single subsequence (subseq) is extracted from a randomly selected time series (rand-ts) (lines 2-6). Then the distance of subseq to all time series in C is calculated using the global method for calculating distances (see Definition 4). 
The location of the best match of subseq to each time series is recorded in locations (line 8). Before utilizing the information available in the list of locations, it is necessary to filter out values which are obviously non-characteristic of the data set (line 9). A major motivation is that a larger tolerance leads to an increase in learning and classification time as more distance calculations are required for each shapelet. Filtering out locations which are atypical should not impair accuracy significantly. For example, let us suppose we receive the following list of locations: 1,30,30,31,32,34,37,38,40,40,41,81. It is quite clear that the values 1,81 are outliers 8 Procedure 1 Algorithm for computing the tolerance characteristic of a data set Compute-Tolerance(D) {Input is a data set} 1: start-indices ← indices, s.t. time series will be split into 10 subsequences which are as equal in length as possible 2: for each class C in D do 3: for each start-index i in start-indices do 4: subseq-length ← length of subsequence 5: rand-ts ← randomly selected time series from C 6: subseq ← rand-ts[i : i+subseq-length-1] 7: for each time series t in class C do 8: locations ← append(location-of-best-distance(t, subseq)) 9: locations ← outlier-filter(locations) {Using IQR-filtering} 10: Radiusc,i ← max(locations) - min(locations) 11: Radiusc ← min(Radiusc,i) 12: tolerance ← max(Radiusc) 13: return tolerance and should be filtered out. Without filtering, the size of the range is 81, while after filtering, the size of the range is only 12. Our method for outlier filtering is based on the interquartile range (IQR). Using this method, the first (Q1) and third (Q3) quartiles are calculated and the IQR is calculated as I QR = Q3 − Q1. All values greater than Q3 +3× I QR or smaller than Q1 −3× I QR are filtered out. Although it is customary to use 1.5 as the multiplication factor, we chose a multiplication factor of 3 so as to not filter out too many values which may lead to overfitting. Our tuning of the multiplication factor to a value of 3 is a heuristic as it cannot guarantee total avoidance of overfitting, because other parameters such as the model complexity also play a part. Three advantages of IQR filtering are that it is simple, that it is a-parametric and that it does not automatically drop extreme values if they are similar to the rest. Once we have filtered out the outliers, the characteristic radius as reflected by this subse- quence of class C is calculated and recorded (line 10). After a radius for each of the subse- quences of a class has been calculated, the minimum of all these radii is selected to represent the locality of the class (line 11). We chose the minimal radius as this leads to the smallest number of distance calculations of a shapelet to a time series, which promises the best possible speedup in runtime. The last stage is the selection of a single radius as the tolerance for the data set. We chose the maximum radius from all classes (line 12) as the tolerance must accommodate the most loosely localized class. Otherwise, for some of the classes, the ultimate matches may reside outside of the recommended range and will not be examined. The time complexity of this phase is negligible as only 10 subsequences per class are exam- ined and the distance of each subsequence is calculated only to time series of its class. 4.2. Random Selection of Shapelets For completeness, we present the procedure used by SALSA-R for randomly selecting shapelets in Procedure 2. 
We chose a distribution similar to the uniform distribution but simpler to imple- ment. Procedure 2 describes the method for randomly selecting the next shapelet to examine. First (line 2), the time series from which to extract the shapelet is chosen. Then (lines 3-4), the 9 Procedure 2 Algorithm for randomly selecting shapelets extract-shapelet(D,min-sh-length) {Input is a data set and the minimum length of a shapelet} 1: num-ts ← number-of-time-series-in-data-set(D) 2: ts-index ← random-selection(0,num-ts) 3: highest-index ← times-series-length - min-sh-length + 1 4: sh-index ← random-selection(0,highest-index) 5: longest-possible-shapelet ← times-series-length - sh-index +1 6: sh-length ← random-selection(min-sh-length,longest-possible-shapelet) 7: sh ← extract-shapelet(D,ts-index,sh-index,sh-length) 8: return sh index in the time series from which to extract the shapelet is randomly generated. The range of possible indices is between 0 and the last index from which the shortest possible shapelet can be extracted. Last (lines 5-6), the length of the shapelet is randomly selected. The upper limit on the length of the shapelet (longest-possible-shapelet) is calculated based on the location from which the shapelet is to be extracted (line 5). The function random-selection(a,b) randomly selects an integer from the range [a,b-1] with a uniform distribution. 5. Experimental Results Our goal is to show that for data sets from the field of spectroscopy, the usage of local- shapelets reduces the time complexity during both training and classification phases in com- parison with global-shapelets (i.e., non-local shapelets) without degrading accuracy. First, we present the data sets from the field of spectroscopy with which we evaluated local-shapelets. Then, we establish the utility of our method (Procedure 1) for calculating the tolerance range. In the next phase of our evaluation we compare local-shapelets vs. global-shapelets within the YK- algorithm. In the last phase, we re-implement SALSA-R to utilize the locality of shapelets. We compare accuracy and run-time with the original implementation of SALSA-R and the hashing- algorithm. We ran all experiments on an Intel Xeon E5620 computer comprising two 2.40GHz processors with 24GB of RAM and with a 64-bit version of Windows Server 2008 R2 as the operating system. 5.1. Description of Data Sets Our experiments were conducted on a collection of 6 data sets from the field of spectroscopy, available online (Ye and Keogh, 2011b; Keogh et al., 2014). The collection contains four data sets from the field of food spectroscopy (Beef, Coffee, OliveOil, Wheat) and two from the field of lightning spectroscopy (Lightning2, Lightning7). Table 1 contains information on the number of examples in the training and test sets, the number of classes, the length of the time series and the tolerance range used. All data sets were already split into train and test sets. We preserved the original division to train and test sets to allow easy reproduction of our results, as well as a fair comparison with other published results. As the test sets are very small, our initial measurements of classification times were inaccu- rate due to minor overheads which dampened the effect of locality on the outcome and due to the inaccuracy of computer time measurements at small time scales. We solved this by enlarging each test set to a size of 1GB. This was done by duplicating examples. 
To ensure that the ac- curacy obtained would be identical to that on the original data set, we duplicated each example 10 an equal number of times. As the classifier is deterministic, the classification of an example will always be identical no matter how many times it appears. Therefore, the proportion of correct classifications out of all classification examples will not change and the accuracy will remain the same. Table 1: Description of the data sets dataset train set test set num. time series tolerance size size classes length Beef 30 252,510 5 470 0 Coffee 28 383,264 2 286 0 Lighting2 60 106,140 2 637 0 Lighting7 70 209,510 7 319 78 OliveOil 30 208,770 4 570 0 Wheat 49 115,434 7 1,050 0 5.1.1. Food Spectrographs Beef. The beef data set (Al-Jowder et al., 2002) contains the spectral absorbance of one type of beef cut (silverside). One class contains the spectral absorbance of the beef cut without any contaminates. Each of the other four classes contains the spectral absorbance of the beef cut contaminated with a different type of offal (kidney, liver, heart and tripe) which is cheaper than the declared beef cut. Coffee. The coffee data set (Briandet et al., 1996) contains the spectral absorbance of instant coffee of two different types of coffee beans Arabica and Robusta. Coffee from Arabica beans is more highly estimated as it has a finer and more pronounced taste. Approximately 90% of world coffee production is from Arabica and another 9% is from Robusta. As the price of Arabica is higher than that of Robusta, it is important to be able to distinguish between them even after the long process that is required to produce instant coffee. OliveOil. Olive oil samples from four different European countries (Greece, Italy, Portugal and Spain) were collected (Tapp et al., 2003). The classification task is to be able to discern the country of origin using the spectrograph of the olive oil sample. Wheat. This data set (Ye and Keogh, 2011a) consists of spectrographs of wheat grown during the years 1998-2005. There are a number of different types of wheat in the data set but the class was assigned based only on the year in which the wheat was grown and not on the type of wheat. 5.1.2. Lightning Spectrographs Data on frequencies emitted during lightning events were collected and then a Fourier trans- form was applied to produce spectrographs (Eads et al., 2002). The lightning events were cat- egorized into 7 different classes differing in the charge of the lightning (positive or negative), whether the event was gradual or abrupt and whether the event was intra-cloud or from cloud to ground. The original authors of this data set (Eads et al., 2002) note that there is a large inter- class variation and intra-class similarity. The data set Lightning7 contains examples of all 7 classes, while Lightning2 is a simpler binary problem of distinguishing between cloud to ground events and intra-cloud events. 11 5.2. Utility of Tolerance Range Calculation The last column of Table 1 presents the tolerance range used during our experiments. For the four data sets of food spectroscopy, the value of the tolerance range was not calculated using Procedure 1; rather we set it to 0 based on prior knowledge in this domain. A comparison of the tolerance range based on prior knowledge with those recommended by Procedure 1 shows large agreement. For three data sets (Coffee, OliveOil and wheat) values are identical. 
For the Beef data set, although the calculated tolerance range was not exactly 0, it was very close with a value of 2. Our procedure also succeeds when a tolerance range greater than 0 is required. When applied to the Lightning7 data set for which it is clear a tolerance range larger than 0 is required, as shown in Fig. 2, the calculated tolerance range is 78. These findings show that our method for predicting the tolerance range manages to successfully estimate the required range. 5.3. Local YK-Algorithm Here we show that local-shapelets reduce the time consumption without impairing accuracy. We compare local and global shapelets within the YK-algorithm framework which is the initial implementation of the shapelet algorithm which examines all possible shapelets. Due to the large time and space requirements of the YK-algorithm we could not collect results for the Lightning2 data set. For these experiments, we used the original code used by Ye and Keogh (2011a). As pointed out by Hills et al. (2013), the entropy-pruning optimization (see Sec. 3) has an overhead which grows exponentially with the increase in the number of classes in the data set. We encountered this experimentally with the wheat data set which has 7 classes. The original implementation did not finish examining all shapelets for the first node of the tree after 5 days, while the same implementation without the entropy-pruning optimization finished learning the whole tree in this period of time. Therefore we conducted the experiments using the original code without the entropy-pruning optimization. Results of our experiments are presented in Table 2. Each two columns compare results of global-shapelets vs. local-shapelets. The first comparison is the accuracy achieved, the second is the time required to learn a model and the third is the time required to classify the test set. In all measures, the local approach outperforms the global approach. The average improvement in accuracy is 8%, the average speedup during the learning phase is 9.5 and during the classification phase it is 80. Table 2: Global YK-algorithm vs. Local YK-algorithm data set accuracy (%) learning time (sec) classification time (sec) global local global local global local Beef 46.67 56.67 24,307 1,364 76.03 1.10 Coffee 96.43 92.86 1,466 407 36.79 1.54 Lighting7 43.84 53.42 98,636 67,660 109.75 59.58 OliveOil 33.33 50.00 17,448 2,287 46.85 1.01 Wheat 57.71 65.43 433,630 25,221 158.36 0.59 5.4. Local-SALSA-R In this set of experiments we re-implemented SALSA-R to use local-shapelets instead of global-shapelets. The number of shapelets randomly selected was set to 10,000 which was found 12 to be an optimal number by Gordon et al. (2015). We compared local-SALSA-R with global- SALSA-R and the hashing-algorithm. We repeated our experiments thirty times as each of the three methods includes an element of randomness. Table 3 presents a comparison of the average accuracy of local-SALSA-R with global- SALSA-R and the hashing-algorithm. We applied a Friedman test (Friedman, 1937) to test if there is a significant difference between accuracy attained by the different methods. The p-value was 0.85, which clearly shows that there is no significant difference in the accuracy achieved by any of the methods. 
Table 3: Comparison of accuracy of local-SALSA-R with that of global-SALSA-R and the hashing-algorithm data set global-SALSA-R hashing-algorithm local-SALSA-R Beef 51.78 50.78 62.11 Coffee 94.17 92.74 97.02 Lighting2 67.98 67.22 66.17 Lighting7 58.45 61.62 59.13 OliveOil 73.11 73.33 73.33 Wheat 66.72 69.94 66.46 A comparison of average learning and classification times is presented in Table 4. A Fried- man test on the learning and classification times shows that there is a significant difference be- tween the methods during both the learning (p-value = 0.0057) and classification phases (p-value = 0.030). A one sided Wilcoxon-test (Wilcoxon, 1945) affirms our claim that local-shapelets are significantly faster than global-shapelets during the learning phase (p-value = 0.016 for both global-SALSA-R and the hashing-algorithm) and the classification phase (p-value = 0.016 for global-SALSA-R and p-value = 0.031 for the hashing-algorithm). On average, local-SALSA-R reduces the time required to learn a model by a factor of 120 and 440 in comparison with global- SALSA-R and the hashing-algorithm, respectively. During classification, local-SALSA-R is 180 times faster than global-SALSA-R and 110 times faster than the hashing-algorithm, on average. Table 4: Comparison of learning and classification time of local-SALSA-R with that of global-SALSA-R and the hashing-algorithm data set learning time (sec) classification time (sec) global- hashing- local- global- hashing- local- SALSA-R algorithm SALSA-R SALSA-R algorithm SALSA-R Beef 44.98 169.77 0.47 87.46 104.87 0.51 Coffee 16.15 12.81 0.26 31.61 34.06 0.47 Lighting2 164.09 539.11 1.00 144.78 86.10 0.85 Lighting7 56.51 205.17 35.39 152.61 95.97 111.30 OliveOil 59.39 114.37 0.50 65.52 45.27 0.51 Wheat 315.19 1693.48 1.17 231.46 78.71 0.42 Analytically, it is easy to argue that the longer the time series, the greater the time saved by using local-shapelets, as the number of distance calculations avoided increases. We confirmed this experimentally as can be seen in Fig. 3. The figure shows the ratio between the time required by global-SALSA-R and local-SALSA-R as a function of the length of the time series for all data sets for which the tolerance range used was 0 (all data sets apart from Lightning7). This 13 figure clearly confirms that the ratio increases with the length of the time series. We chose to compare local-SALSA-R with global-SALSA-R and not with the hashing-algorithm as their implementation apart from the aspect of locality is identical, allowing isolation of locality as the only parameter influencing the outcome. Time series length T im e s p e e d u p 0 200 400 600 800 1000 0 1 0 0 3 0 0 5 0 0 Learning time Classification time Figure 3: Each point is the ratio of the time required by global-SALSA-R and local-SALSA-R. One plot is for learning times and the second for classification times. As can be seen the ratio increases with the length of the time series. 6. Statistical analysis The approach proposed in this paper was mainly motivated by algorithmic considerations: restricting the search to a small subset of the possible shapelet locations significantly speeds up both training and classification. In this section, we will argue that as a by-product, our approach offers statistical advantages as well. By restricting the number of features, we are constraining the complexity of the hypothesis class. As we show below, hypothesis classes of low complexity require fewer training examples to attain a certain accuracy level. 
The argument is made precise in the language of learning theory. A learner faced with a classification task is trying to learn a function g : X → {−1,1}, where X is the instance space (in our case, it is the set of all possible time series). The learner gets to observe example-label pairs (Xi,Yi) ∈ X× {−1,1} generated iid from some unknown distribution P over X× {−1,1}. This corresponds to the intuition that the training labels may be noisy, and indeed, there may be no “correct” classifier g : X → {−1,1} that achieves perfect accuracy. Although there are universal approximators capable of fitting arbitrary labeled samples, if unconstrained they will necessarily overfit (Devroye et al., 1996). Hence, when choosing the learning model, its richness (i.e., hypothesis complexity) must be taken into account. 14 The learner’s n observed labeled examples (Xi,Yi) constitute the training set, based on which it will produce a hypothesis g : X→{−1,1}. We will denote by H the collection of all admissible hypotheses (formally, H ⊂ 2X) and associate with every h ∈ H two key quantities: its sample (or training) error, êrr(h) = 1 n n∑ i=1 1{h(Xi ),Yi} and generalization error, err(h) = E[1{h(Xi ),Yi}] = P(h(X) , Y ). In words, êrr(h) is the relative fraction of mistakes that h makes on the training set while err(h) is the probability that h makes a mistake on a freshly drawn (X,Y ) pair — crucially, drawn from the same distribution used to generate the training set. Note that while the typical goal is to guarantee a small generalization error, the latter quantity cannot be computed without knowledge of the sampling distribution. Instead, the readily computable êrr(h) may be used (under certain conditions, detailed below) as a proxy for err(h). In this setting, the learner’s task is twofold: (i) algorithmic: efficiently find an h ∈ H for which êrr(h) is small, and (ii) statistical: guarantee that, with high probability, err(h) will not be much greater than êrr(h), regardless of which h ∈H the learner chooses. The foregoing sections were devoted to the algorithmic aspects, and we shall focus on the statistical one here. In the case of finite H, a particularly simple connection exists between err(h) and êrr(h): Theorem 1 (Mohri et al. (2012)). Suppose that |H|<∞ and the learner observes a training set consisting of n examples. Then, for any δ > 0, we have that err(h) ≤ êrr(h) + √ log |H|+ log(1/δ) 2n (1) holds with probability at least 1 −δ, uniformly over all h ∈H. For simplicity, let us consider the case where H = 2X (i.e., H consists of all possible binary functions). In this case, |H| = 2|X|, and hence even a modest reduction of the instance space — by reducing the feature set, for example — can have a noticeable effect on the second term in the right-hand side of (1), and hence yield a faster convergence rate. The basic features used in this paper are distances from a shapelet to a time series. It is precisely this feature set that gets reduced when our algorithm considers only a subset of the possible locations. This observation provides a statistical justification to our local-shapelets approach, in addition to the algorithmic speedup. 7. Conclusions The objective of our investigation was to utilize the localization of characteristic patterns in spectrographic measurements using the shapelet algorithm, which has previously been found to be suited (Hills et al., 2013) for this domain. 
Our adaption to the shapelet algorithm reduces the number of distance calculations and thus shortens the time required to train a model and classify examples. 15 As pointed out by Ye and Keogh (2011a), one important advantage of the shapelet approach is its interpretability, i.e., the process extracts informative subsequences representative of each class. The chosen shapelets provide insights into patterns characteristic of each class. One such example can be seen in Fig. 4. The shapelet, shown as a dashed red line, is overlaid on each of the two time series at the location from which it was extracted and represents a pattern characteristic of class 2. In addition to identifying the discriminative pattern, local-shapelets also identify the discriminative frequencies. We compared the frequencies found to be discriminative by the shapelet with those found to be discriminative by Briandet et al. (1996) and found that they coincide. Time series index A b so rb a n ce 160 180 200 220 240 260 Class 1 Class 2 Figure 4: Interpretability of local-shapelets. Two time series from the coffee data set are presented in black. Each time series is from a different class. Only part of each time series is presented so as to focus on the important details. Overlaid on each of the time series is the shapelet selected by local-SALSA-R as most discriminative appearing as a dashed red line. As can be seen the shapelet represents time series from class 2 (Robusta coffee beans). The algorithm we presented searches for matches of a shapelet on a time series only in the vicinity of the location from which the shapelet was originally extracted. We proposed an algo- rithm (Procedure 1) for calculating the exact vicinity based on properties of each class in the data set. Our main result is that local-shapelets can indeed speedup both the learning and classifica- tion methods 100-fold when using SALSA-R without impairing accuracy. We also show that our estimation of the vicinity to examine is quite accurate. This research can be extended in many ways. It may be interesting to explore possible trade- offs between speedup and accuracy as a function of the vicinity to examine. Another research direction is the application of local-shapelets to domains other than spectroscopy which may also require the adaption of our algorithm for time series of different lengths. 16 References R. Herrmann, C. Onkelinx, Quantities and units in clinical chemistry: Nebulizer and flame properties in flame emission and absorption spectrometry (Recommendations 1986), Pure and Applied Chemistry 58 (12) (1986) 1737–1742. O. Al-Jowder, E. K. Kemsley, R. H. Wilson, Detection of adulteration in cooked meat products by mid-infrared spec- troscopy, Journal of Agricultural and Food Chemistry 50 (6) (2002) 1325–1329. R. Briandet, E. K. Kemsley, R. H. Wilson, Discrimination of Arabica and Robusta in instant coffee by Fourier transform infrared spectroscopy and chemometrics, Journal of Agricultural and Food Chemistry 44 (1) (1996) 170–174. C. P. Bicchi, A. E. Binello, M. M. Legovich, G. M. Pellegrino, A. C. Vanni, Characterization of roasted coffee by S- HSGC and HPLC-UV and principal component analysis, Journal of Agricultural and Food Chemistry 41 (12) (1993) 2324–2328. I. Lumley, Authenticity of meat and meat products, in: Food Authentication, Springer, 108–139, 1996. N. Sharma, A. Srivastava, J. Gill, D. Joshi, Differentiation of meat from food animals by enzyme assay, Food Control 5 (4) (1994) 219–221. D. R. Eads, D. Hill, S. Davis, S. J. Perkins, J. 
Ma, R. B. Porter, J. P. Theiler, Genetic algorithms and support vector machines for time series classification, in: International Symposium on Optical Science and Technology, International Society for Optics and Photonics, 74–85, 2002. SCIO, http://www.consumerphysics.com/myscio/, 2014. L. Ye, E. Keogh, Time series shapelets: a novel technique that allows accurate, interpretable and fast classification, Data Mining and Knowledge Discovery (2011a) 1–34. J. Hills, J. Lines, E. Baranauskas, J. Mapp, A. Bagnall, Classification of time series by shapelet transformation, Data Mining and Knowledge Discovery (2013) 1–31. A. Mueen, E. Keogh, N. Young, Logical-shapelets: an expressive primitive for time series classification, in: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 1154–1162, 2011. T. Rakthanmanon, E. Keogh, Fast Shapelets: A Scalable Algorithm for Discovering Time Series Shapelets, in: Proceed- ings of the Thirteenth SIAM Conference on Data Mining (SDM), SIAM, 668–676, 2013. local shapelets, ftp://www.ise.bgu.ac.il/, 2014. D. Goldin, P. Kanellakis, On similarity queries for time-series data: Constraint specification and implementation, in: Principles and Practice of Constraint Programming CP’95, Springer, 137–153, 1995. J. Lines, L. M. Davis, J. Hills, A. Bagnall, A shapelet transform for time series classification, in: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 289–297, 2012. D. Gordon, D. Hendler, L. Rokach, (in press) Fast and Space-Efficient Shapelets-Based Time-Series Classification, Intelligent Data Analysis 19 (5). Z. Xing, J. Pei, S. Y. Philip, K. Wang, Extracting Interpretable Features for Early Classification on Time Series., in: Eleventh SIAM International Conference on Data Mining (SDM), SIAM, 247–258, 2011. K. R. Moore, P. C. Blain, S. D. Briles, R. G. Jones, Classification of RF transients in space using digital signal processing and neural network techniques, in: SPIE’s 1995 Symposium on OE/Aerospace Sensing and Dual Use Photonics, International Society for Optics and Photonics, 995–1006, 1995. L. Ye, E. Keogh, shapelet data sets, http://alumni.cs.ucr.edu/~lexiangy/shapelet.html, 2011b. E. Keogh, Q. Zhu, B. Hu, H. Y., X. Xi, L. Wei, C. A. Ratanamahatana, The UCR Time Series Classification/Clustering Homepage, www.cs.ucr.edu/~eamonn/time_series_data/, 2014. H. S. Tapp, M. Defernez, E. K. Kemsley, FTIR spectroscopy and multivariate analysis can distinguish the geographic origin of extra virgin olive oils, Journal of Agricultural and Food Chemistry 51 (21) (2003) 6110–6115. M. Friedman, The use of ranks to avoid the assumption of normality implicit in the analysis of variance, Journal of the American Statistical Association 32 (200) (1937) 675–701. F. Wilcoxon, Individual comparisons by ranking methods, Biometrics Bulletin 1 (6) (1945) 80–83. L. Devroye, L. Györfi, G. Lugosi, A probabilistic theory of pattern recognition, vol. 31 of Applications of Mathematics (New York), Springer-Verlag, New York, ISBN 0-387-94618-7, 1996. M. Mohri, A. Rostamizadeh, A. Talwalkar, Foundations of machine learning, MIT press, 2012. 
work_37p7nlfftrhp3cya4xhtjt7c7y ---- An expert system for diagnostics and estimation of steam turbine components' condition
K. E. Aronson, et al., Int. J. of Energy Prod. & Mgmt., Vol. 5, No. 1 (2020) 70-81 © 2020 WIT Press, www.witpress.com ISSN: 2056-3272 (paper format), ISSN: 2056-3280 (online), http://www.witpress.com/journals DOI: 10.2495/EQ-V5-N1-70-81
AN EXPERT SYSTEM FOR DIAGNOSTICS AND ESTIMATION OF STEAM TURBINE COMPONENTS' CONDITION
KONSTANTIN E. ARONSON, BORIS E. MURMANSKY, ILIA B. MURMANSKII & YURI M. BRODOV, Ural Federal University named after the First President of Russia B.N. Yeltsin, Ekaterinburg, Russia
ABSTRACT
This article describes an expert system of probability type for diagnostics and state estimation of steam turbine technological subsystems' components. The expert system is based on Bayes' theorem and permits one to troubleshoot the equipment components, using expert experience, when there is a lack of baseline information on the indicators of turbine operation. Within a unified approach, the expert system solves the problems of diagnosing the flow steam path of the turbine, bearings, thermal expansion system, regulatory system, condensing unit, and the systems of regenerative feed-water and hot water heating. The knowledge base of the expert system for turbine unit rotors and bearings contains a description of 34 defects and 104 related diagnostic features that cause a change in its vibration state. The knowledge base for the condensing unit contains 12 hypotheses and 15 pieces of evidence (indications); the procedures are also designated for 20 state parameters' estimation. Similar knowledge bases containing the diagnostic features and fault hypotheses are formulated for other technological subsystems of a turbine unit. With the necessary initial information available, a number of problems can be solved within the expert system for various technological subsystems of steam turbine unit: for steam flow path, it is the correlation and regression analysis of multifactor relationship between the vibration and the regime parameters; for thermal expansion system, it is the evaluation of force acting on the longitudinal keys depending on the temperature state of the turbine cylinder; for condensing unit, it is the evaluation of separate effect of the heat exchange surface contamination and of the presence of air in condenser steam space on condenser thermal efficiency performance, as well as the evaluation of term for condenser cleaning and for tube system replacement. With the lack of initial information, the expert system formulates a diagnosis and calculates the probability of faults' origin.
Keywords: diagnostic, diagnostic features, expert system, evidence, faults, hypotheses, steam turbine.
1 INTRODUCTION
At present, the research on the state parameters of turbine unit equipment, forecasting of their changes and determination of their residual life has become widespread [1–3].
In world practice, technical condition monitoring and equipment diagnostics are carried out by the centers operating at the manufacturing plant or in a large operating organization. At the same time, the working methods of these centers at the enterprises have been developed individually for many years and are a commercial secret. Steam and gas turbine manufacturers in the territory of Russia currently do not have such centers [4, 5]. Some works of certain enterprises are known concerning individual areas of diagnosis [6–8]. The lack of common approaches to the development of diagnostic tasks for various elements of equipment makes it difficult to implement them at thermal power plants.
Expert systems (ES) seem to be the most promising approach for many diagnostic tasks, since this methodology can handle the huge number of factors involved in steam turbine operation. ES are very advantageous for steam turbine unit (STU) diagnosis. These systems are designed to solve problems that are difficult to formulate. The ES is based on Bayes' theorem and permits one to troubleshoot the equipment components when there is a lack of baseline information on the indicators of turbine unit operation. The system also employs the experience of experts. The inaccuracy and lack of initial information are taken into account by probabilistic methods using Bayes' formula. The value of each piece of evidence is determined by the Naylor method [9–12].
2 METHODOLOGY
An ES comprises a knowledge base and an information processing algorithm. The knowledge base contains information about STU failures as fault hypotheses and a table of evidence. The a priori probability of fault hypotheses and the evidence value are determined by the experts. The ES analyzes the evidence. If information is missing, the system receives it from a user or out of a database. The user sets the values of pieces of evidence and the ES calculates a posteriori probabilities of hypotheses and forms a conclusion about the cause of failure. The system then makes recommendations to the staff about how to eliminate the malfunction.
According to Bayes' theorem [9], the a posteriori probability of a hypothesis is calculated by the formula:
P(H/E) = P(E/H) · P(H) / P(E), (1)
where P(E/H) is the probability of evidence E if the hypothesis H is true; P(H) stands for the a priori probability of the hypothesis H; and P(E) stands for the probability of evidence E. P(E) is determined by the formula of total probability:
P(E) = P(E/H) · P(H) + P(E/not H) · P(not H), (2)
where P(E/not H) is the probability of evidence E if the hypothesis H is false, and P(not H) = 1 – P(H). From (1) and (2), the a posteriori probability of the hypothesis is calculated as:
P(H/E) = P(E/H) · P(H) / [P(E/H) · P(H) + P(E/not H) · P(not H)]. (3)
The ES comprises the knowledge bases designed for turbine flow part, for turbine bearings, for system of thermal expansions, for automatic regulatory system, for condensing unit, and for the systems of regenerative feed-water heating and hot water heating. For diagnostics, knowledge bases for various defects were collected. They were divided according to the turbine subsystem: vibration diagnostics, control system, heat expansion system, auxiliary equipment, etc. Each subsystem contains its own list of defects and diagnostic features.
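In operation, the ES applies this update once for every piece of evidence the user confirms or rejects, with the posterior obtained from one answer becoming the prior for the next question. The following minimal Python sketch illustrates that sequential recalculation of eqns (1)–(3); the hypothesis, the evidence names and all probability values are hypothetical placeholders rather than entries from the paper's knowledge base, and the Naylor weighting of evidence 'price' is not reproduced here.

```python
def posterior(p_h, p_e_given_h, p_e_given_not_h):
    """Eqn (3): a posteriori probability of hypothesis H after observing evidence E."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)  # eqn (2), total probability
    return (p_e_given_h * p_h) / p_e                         # eqn (1)

# Hypothetical fault hypothesis with a priori probability P(H), and two pieces of
# evidence with P(E/H) ("detected") and P(E/not H) ("not detected") probabilities.
# None of these values come from the paper's knowledge base.
p_h = 0.5
evidence = [
    {"name": "high condensate heating",       "p_pos": 0.8, "p_neg": 0.2},
    {"name": "condenser pressure above norm", "p_pos": 0.5, "p_neg": 0.05},
]

for e in evidence:
    observed = True  # in the real ES the operator answers Yes/No with a confidence level
    if observed:
        p_h = posterior(p_h, e["p_pos"], e["p_neg"])
    else:
        # evidence not detected: update with the complementary probabilities
        p_h = posterior(p_h, 1.0 - e["p_pos"], 1.0 - e["p_neg"])
    print(f'{e["name"]}: P(H/E) = {p_h:.3f}')
```

Because each answer only rescales P(H), the same routine can be run for every hypothesis in the knowledge base, and the hypotheses can then be ranked by their posterior probabilities to form the diagnosis.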
For example, the knowledge bases of the ES for vibration diagnostics of turbine rotors, bearings and other components include a description of 34 defects and 104 diagnostic features. These defects cause a change of vibration state of the turbine unit. All defects are divided into two groups: defects that occur during the turbine operation and defects added during turbine mounting and repair. The defects of turbine unit mounting and repair are identified in the analysis of start–stop actions. The defects of operation are revealed in the analysis of turbine component vibration, vibration changes or the relationship of vibration to the turbine operation. A number of connections are used to diagnose the system of steam distribution and turbine regulatory system (see Table 1). On the basis of these relationships the parameters of Table 2 are calculated.
When diagnosing turbine regulatory systems, the ES also makes use of the dynamic response of the actuators – servomotors and slide valves.
An ES consists of three main interconnected components: a knowledge base, an output mechanism and a database that provides diagnostic solutions, similar to how experts do it. The database is designed to store initial and intermediate data. It contains all the necessary data for diagnosis, obtained by measurements and observations. The knowledge base in the ES is intended to store long-term data describing the area under consideration and the rules that, when applied to the initial data, lead to the solution of the problem. Expert knowledge is presented in the knowledge base in the form of rules [11, 13]. The output mechanism interprets the rules and uses the facts of the knowledge base to solve the problems posed. It makes a diagnosis based on the information contained in the database.
Available ES can be divided into three types: logical ES, cause–effect ES and ‘intelligent’ ones. Logical models are presented as tables of faults, which list the signs of possible faults (usually the parameters that exceed the set point). In a broader sense, the emphasis is only set on the principle of tolerance control of parameters, while the description of malfunctions can be quite complicated. Cause–effect ES (PSM) are models that reflect the interrelations between the processes and conditions of the diagnostic object in case of malfunction occurrence and development (or the qualitative relationship between the malfunction causes and their consequences). PSM can have a hierarchical structure, which shows at each level in a graphical form the relationship of various faults leading to more serious malfunctions.
Table 1: Parametric relationships generated in the diagnosis of the turbine automatic regulatory (TAR) system.
№ Name
1 Turbine steam flow rate – setting of high-pressure part servomotor
2 Turbine steam flow rate – turbine capacity
3 Settings of high (medium) pressure control valves – settings of high (medium) pressure part servomotor
4 Steam pressure beyond the valves of high (medium) pressure part – settings of high (medium) pressure part servomotor
5 Force of high (medium, low) pressure part servomotor – settings of high (medium, low) pressure part servomotor
7 Force of medium (low) pressure part servomotor – steam pressure in a chamber of process or heating steam extraction
8 Steam pressure in control stage (or first stage) chamber – settings of high-pressure part servomotor
9 Steam pressure in control stage (or first stage) chamber – turbine steam flow rate
10 Settings of steam stop valve autoactuators – oil pressure above the autoactuators’ slide valves
11 Settings of steam stop valve autoactuators – oil pressure below the autoactuators’ slide valves
A characteristic feature of ‘intelligent’ ES is, first of all, the form of diagnostic knowledge presentation. As for the fundamental capabilities of these systems, they do not differ significantly from the capabilities of ES of other types, with the exception of fuzzy logic [14]. The latter approach gives great opportunities in the analysis of complex systems because it allows one to operate with subjective assessments of an expert or user and to employ incomplete or inaccurate information.
The program shell of a probabilistic ES includes a universal information processing program based on the Bayes’ formula taking into account the ‘price’ of each piece of evidence according to the Naylor method [11]. The specific filling of the knowledge base, that is, the formation of its content and the designation of a priori probabilities for the hypotheses and the ‘price’ of evidence, is carried out, as a rule, by the method of expert evaluations employing the specialists dealing with thermo-mechanical equipment of thermal power plants, thus taking into account the specific features of the equipment at a particular station.
To diagnose steam turbine subsystem parts, the ES employs various approaches, such as:
• for turbine flow part – a correlation and regression analysis of the multi-factor relationship between vibration and regime parameters;
• for thermal expansion system – the evaluation of forces acting on longitudinal keys under different temperatures of the left and right sides of turbine cylinder;
• for condensing unit – the estimation of the effect of cooling surface fouling as well as of air content in condenser steam chamber on condenser efficiency, optimization of condenser cleaning period, justification of periods of condenser tube system replacement, etc.
Table 2: Main features for diagnostics and adjustment of the TAR system.
№ Feature name – Electrohydraulic system of regulation and protection characteristic, determined from the recorded dependencies
1 Static characteristic of rotation speed control (RS) – degree of frequency response variations (FR); degree of frequency regulation insensitivity; areas and values of the local non-uniformity of frequency regulation
2 Cam-operated steam distribution device (CSD) and nozzle unit performances – CSD technical condition; quality of the control valve setting adjustment; nozzle unit condition; flow part condition (salt fouling)
3 Force margins of regulatory system servomotors – identification of unstable operation areas of the automatic regulatory system; regulation units lock detection; turbine regime optimization
4 Performances of steam stop valve autoactuators – technical condition of steam stop (safety) valve autoactuators; insensitivity of steam stop (safety) valve autoactuators
The algorithms of the data analysis for defects causing the vibration are presented below. The indications of these defects are divided as follows:
• boundary group, where the defect indications are determined by a measured parameter drop outside the normalized limits;
• factorial group, where the defect indications are determined by an occurrence of a previously unobserved factor;
• correlation group, where the defect indications are determined by a connection between the vibration and technological parameters.
Boundary indications are determined by a measured parameter drop outside the permissible limits, that is, beyond the zone of partial or full serviceability. As a rule, the boundaries of these zones are designated in the regulations. Factor indications are characterized by a qualitative change in vibration parameters, for example, by the rise of the rotational component of vibration in vertical or transverse direction or vibration components with frequency of 2ω, 3ω, 4ω and so on, by abrupt increase in high-frequency harmonics, by the emergence of new frequencies in the turbine unit vibration spectrum: frequencies from 500 to 2000 Hz point to leakage in the regulatory system, whereas frequencies from 1000 to 1050 Hz point to backlash in control valves. For the correlation indications, a change is estimated in the coefficient of correlation between the vibration characteristics and technological process parameters.
3 APPLICATION
The ES functioning can be described by taking a steam turbine condensing unit as an example [15, 16]. A knowledge base for the condensing unit contains more than 30 hypotheses and 25 pieces of evidence (or indications); estimation procedures for 20 parameters of state are also specified. Tables 3 and 4 show a sample from the knowledge base.
A preliminary list of malfunction hypotheses is set up on the basis of performed investigations, statistical processing of data on equipment damage and literature data. After expert examination, the final list of hypotheses is filled in the knowledge base. The evidence list is being set up during the condenser unit operation. The measurement circuit, the results of the condenser unit tests, the operation regimes and maintenance logs are analyzed. Then a priori probabilities of the hypotheses and evidence probabilities are entered in the table of probabilities. In addition, evidence probabilities are entered to detect the fault (to confirm the hypothesis) and not to detect the fault (reject the hypothesis). These probabilities are specified for each evidence.
In the first case, the evidence probability is denoted by the superscript (+), and in the second case, by the superscript (–). Table 5 shows an example of probability table for four hypotheses and three evidences. Malfunction evidence listed in Table 5: 1. high water heating; 2. Temperature difference between steam and cooling water outlet exceeds the norm; 3. Condenser pressure exceeds the norm. For hypotheses, the values of the maximal and minimal probability are also evaluated. This permits one to form a justified diagnosis for limiting values of probabilities. K. E. Aronson, et al., Int. J. of Energy Prod. & Mgmt., Vol. 5, No. 1 (2020) 75 Table 3: hypotheses of condenser unit (CU) equipment malfunction. CU equipment Malfunction hypotheses Condenser Tube plates fouling Overpressure in the drain pipe Deterioration of siphon rarefaction Cooling surface fouling on the steam side Cooling surface fouling on the water side Elevated quantity of induced air Incomplete opening of the drain valve Elevated hydraulic resistance in the pressure line Condensate flooding on lower tube rows Level regulator fault Cooling water suction in the steam space Air suction between the condenser and condensate removal pump Water leakage from water ejector into condenser Improper organization of various streams discharge into condenser Steam jet ejector Steam grates or working nozzle clogging Inadequate flow rate of the full-flow condensate entering the ejector cooler Cooler heat transfer surface fouling on the water side Cooler heat transfer surface fouling on the steam side Steam–air mixture recirculation through one of the ejector stages Leaks in the partitions separating the coolers Airmeter or the exhaust pipe clogging high temperature of the full-flow condensate Dead air zones occurence in the drain pipe Leakage in ejector cooler Reduced heat exchange surface of the ejector cooler Circulation path (geodesic height) Changes in the hydraulic regime of water reservoir Skid of coarse gratings with aquatic vegetation and debris Significant fouling of rotating grates due to late cleaning or to wash- ing device malfunction Accumulation of air released by heating water Incomplete opening of the drain valve Water level lowering in the admission chamber 76 K. E. Aronson, et al., Int. J. of Energy Prod. & Mgmt., Vol. 5, No. 1 (2020) During the ES function, it analyzes the equipment operation parameters and then, if there is lack of information, it asks the user a series of questions to make the information more precise. Figure 1 shows the form for evidence processing. The users click the buttons ‘Yes’ or ‘No’ and so indicate the degree of confidence in their answer to the question. The program corrects the a posteriori probabilities of the hypotheses taking into consideration the degree of Table 4: Evidence of CU equipment malfunction. 
CU equipment Malfunction evidence Condenser high water heating high steam pressure at the condenser inlet Low flow rate of cooling water high cooling water pressure at the condenser inlet Increased temperature difference between steam and cooling water outlet Oxygen presence in the full-flow condensate Condensate overcooling (tc – ts) < 0 high pressure in the cooling water drain pipes Low cooling water pressure at the condenser inlet Low pressure in the pressure line of the circulation pump high condensate hardness Exhaust steam pressure is above the standard (low vacuum) high hydraulic resistance of the condeser high condensate level in the condenser Steam jet ejector Pressure pulsations of steam–air mixture at the ejector inlet and discharge Low pressure in the pipeline upstream of the ejector high pressure of the working steam upstream of the ejector Increased working steam flow rate The sharp increase of the pressure in the suction chamber of the ejector when the exhaust air flow is within the range cor- responding to the design part of ejector performance A number of ejector cooler tubes are gagged high back pressure beyond the last stage of the ejector Water ejection from the exhaust Inlet pressure pulsations at the second and third ejector stages high pressure at the ejector suction Circulation path Unsatisfactory performance of the drain water siphon K. E. Aronson, et al., Int. J. of Energy Prod. & Mgmt., Vol. 5, No. 1 (2020) 77 confidence in the user answers. The number of such questions corresponds to the number of evidence in the database of the ES leaving the evidence already processed out. 4 EXAMPLE As an example, it is here suggested to consider the diagnostics of ejector malfunction. It is known that ejector is an auxiliary equipment with a very high influence on the turbine. Ejec- tor diagnostics is performed upon request. Most parameters (signs) are defined manually in case of usual lack of measurements. The other point is a difficulty of automation; some parameters such as knocking, cooler overflowing and others are qualitative, not quantitative. The ejector state is defined in the operation process. For better diagnostics, there is need to provide ejector tests. Also some of the defects could be found only when dissembling the ejector. The initial data for this task are presented in Table 6. In the ES, the defects (malfunctions) that can occur in the ejector, as well as a priori prob- abilities of these defects (P(h)), are entered as listed in Table 7. Table 5: A sample of the probability table. № Hypothesis A priori proba- bilities Evidence probability for detection (non-detec- tion) of failure (hypothesis) 1+ 1– 2+ 2– 3+ 3– 1 Condenser tube plates fouling 0.5 0.8 0.2 0.005 0.005 0.5 0.005 2 Overpressure in the drain pipe 0.5 0.8 0.2 0.05 0.05 0.5 0.05 3 Deterioration of siphon rarefaction 0.6 0.1 0.05 0.05 0.1 0.05 0.2 4 Elevated quantity of induced air 0.7 0.05 0.05 0.7 0.05 0.7 0.05 Figure 1: Evidence processing. 78 K. E. Aronson, et al., Int. J. of Energy Prod. & Mgmt., Vol. 5, No. 1 (2020) Each malfunction of the ejector has a list of signs (evidence) that correspond to it; this list is given in Table 8. As an example, the probabilities of evidence (malfunction signs) in the presence of a spe- cific defect (P (E: h)) and in the absence of the defect (P (E: not h)) are presented in Tables 9 and 10, respectively. 5 CONCLUSIONS An ES of probability type is presented for diagnostics and state estimation of components of steam turbine technological subsystems. 
The knowledge base is made up for rotors, bearings, turbine automatic control and protection system and for other components of the turbine unit, condensing unit equipment and other technological subsystems. The ES permits one to diagnose the condition of various subsystems and components of the turbine unit, to troubleshoot the equipment components and to formulate recommenda- tions about the ways and terms of defect elimination and of reducing the risk of their Table 6: Initial data for diagnostics. № Name Designation Units Notes 1 Condensate temperature at condensate pump input tc °С 2 Condensate temperature at ejector output t2c °С Manual input 3 Ejector motivating steam pressure Рm kg/cm 2 Manual input (indicator) for ejectors A and B 4 Dry air ejector flow rate Ga kg/h Determined by the ejector test results Table 7: List of defects. № Defect name A priori proba- bility, P(H) 1 Low motivating steam consumption 0.1 2 Inadequate flow rate of the main condensate 0.15 3 high temperature of the main condensate 0.15 4 Reduced heat transfer surface (high tube contamination or a large number of tubes plugged) 0.05 5 Leakage of partitions in the steam space 0.15 6 Tube system leaks 0.15 7 Drainage clogged 0.25 8 Flow path wear 0.1 K. E. Aronson, et al., Int. J. of Energy Prod. & Mgmt., Vol. 5, No. 1 (2020) 79 development. To do this, the system employs the experience of experts. Information from the ES can be used to adjust the turbine operation regimes and to optimize the amount and timing of equipment repair. Table 8: List of signs. № Evidence (sign) name Designation Evidence evaluation algorithm 1 high heating of the main condensate in the ejector Δtej t2c – tc > 5°C 2 Low motivating steam pres- sure in the ejector Рmf Рm < 13 3 Water hammering in ejector housing h — 4 Ejector steaming Est A steam–air mixture column is visible from the ejector exhaust 5 Ejector capacity is low Gaj Ga < 80 kg/h 6 Ejector stage flooding Z When tapping the ejector housing, a dull sound is heard (the housing is filled with main condensate) 7 high pressure at the inlet of the ejector first stage at zero air flow Pas Pas0 > 1.5 kPа Table 9: Probabilities of evidence (malfunction signs) in the presence of a specific defect (Р(E:h)). № of malfunction hypothesis 1 2 3 4 5 6 7 8№ of evidence 1 — 0.9 — — 0.1 0.05 — — 2 0.9 — — — — — 0.1 — … … … … … … … … … 7 0.9 0.5 0.9 0.1 — — — 0.95 Table 10: Probabilities of evidence (malfunction signs) in the absence of a specific defect (Р(E:not h)). № of evidence 1 2 3 4 5 6 7 8 1 — 0.077 — — 0.22 0.23 — — 2 0.12 — — 0.00 0.00 0.00 0.23 — … … … … … … … … … 7 0.46 0.50 0.43 0.60 — — — 0.39 80 K. E. Aronson, et al., Int. J. of Energy Prod. & Mgmt., Vol. 5, No. 1 (2020) NOMENCLATURE CSD – cam-operated steam distribution device CU – condensing unit ES – expert system STU – steam turbine unit AV – adjusting valve TAR – turbine automatic regulatory system RS – rotation speed FR – frequency response tc, ts – condensate temperature, saturation temperature Рm – pressure of primary steam Ga – «dry» air flow rate Δtej – heating of the main condensate Gaj – ejector capacity with ‘dry’ air Pas – pressure at the inlet of the ejector first stage at zero air flow h – water hammering in ejector housing Est – ejector steaming Z – ejector stage flooding REFERENCES [1] Mikhailov, V.E., Khomenok, L.А., Sudakov, A.V. & Obukhov, S.G., On complex di- agnostics and examination of the state of equipment of thermal power plants and hydro- power plants. Reliability and Safety of Energy, 2(9), pp. 
9–14, 2010. [2] Andryushin, A.V., Polushkina, E.N. & Shnyrov, E.Yu., Development of the mainte- nance service system in TGK and OGK after the completion of industry restructuring processes. Thermal Engineering, 1, pp. 69–73, 2010. [3] Aronson, K.E., Brodov, Yu.M. & Novoselov, V.B., Development of a technical condi- tion monitoring system for a cogenerating steam turbine equipment. Thermal Engineer- ing, 12, pp. 65–68, 2012. [4] GOST 20911-89. Technical Diagnostics: Basic Terms and Definitions, Publishing house of Standards: M., 14 p., 1990. [5] Uryev, E.V., Fundamentals of Reliability and Technical Diagnostics of Turbomachines, USTU: Ekaterinburg, 71 p., 1996. [6] hayet, S.I., Aronson, K.E., Brodov, Yu.M. & Shempelev, A.G., Development and test- ing of system elements for status monitoring and diagnostics of a steam turbine con- denser. Thermal Engineering, 7, pp. 67–69, 2003. [7] Kovalev, N.A., Algorithms development for the operation and recognition of defects for automatic system of vibration diagnostics. CKTI Proceedings, 19, pp. 27–33. [8] Mirzabekov, A.M., haimov, V.A. & Khrabrov, V.P., Optical diagnostic system for erosion damage of the input edges of steam turbine blades. Thermal Engineering, 4, pp. 52–56, 1991. [9] Panov, E.V. (ed.), Artificial Intellect: Directory, Radio & Communication: М., 1, 461 pp., 1990. [10] Bashlykov, А.А., Expert system architecture to back up decision-making processes in fault diagnosing of thermal power station heat exchanging equipment (SPRINT). Col- lected volume: An Expansion of Intellectual Abilities of ACS, ed. А.А. Bashlykov, Ener- goatomizdat: М., pp. 5–8, 1989. K. E. Aronson, et al., Int. J. of Energy Prod. & Mgmt., Vol. 5, No. 1 (2020) 81 [11] Naylor, C., Build your own Expert System, Energoatomizdat: М., 286 pp., 1991. [12] Brooking, А., Johns, P., Forsyth, R. & Cox, F., In: Expert Systems: Principles and Case Studies, ed. R. Forsyth. Radio & Communication: М., 191 pp., 1987. [13] Jackson, P., An Introduction to Expert Systems, Williams Publishers: М., 624 pp., 2001. [14] Perminov, I.A., Orlick, V.G. & Gordinsky, A.A., Flow path state diagnostics for steam turbines of high capacity with the use of power plant computational complexes. CKTI Transactions, 273, pp. 58–61, 1992. [15] Brodov, Yu.M., Aronson, K.E. & Nierenstein, M.A., The concept of diagnostics system for condensing steam turbine unit. Thermal Engineering, 7, pp. 34–38, 1997. [16] Khaet, S.I., Aronson, K.E., Brodov, Yu, M. & Shempelev, A.G., Development and test- ing of the monitoring system elements for condition control and diagnostic of steam turbine condenser. Thermal Engineering, 7, pp. 67–69, 2003. work_3ahleve2qbbjjgsmjg73bbl33a ---- Chapter 1: Introduction Page 1 of 27 Performance of supply chain collaboration - A simulation study Abstract In the past few decades several supply chain management initiatives such as Vendor Managed Inventory, Continuous Replenishment and Collaborative Planning Forecasting and Replenishment (CPFR) have been proposed in literature to improve the performance of supply chains. But, identifying the benefits of collaboration is still a big challenge for many supply chains. Confusion around the optimum number of partners, investment in collaboration and duration of partnership are some of the barriers of healthy collaborative arrangements. To evolve competitive supply chain collaboration (SCC), all SC processes need to be assessed from time to time for evaluating the performance. 
In a growing field, performance measurement is highly indispensable in order to make continuous improvement; in a new field, it is equally important to check the performance to test conduciveness of SCC. In this research, collaborative performance measurement will act as a testing tool to identify conducive environment to collaborate, by the way of pinpointing areas requiring improvements before initializing collaboration. We use actual industrial data and simulation to help managerial decision-making on the number of collaborating partners, the level of investments and the involvement in supply chain processes. This approach will help the supply chains to obtain maximum benefit of collaborative relationships. The use of simulation for understanding the performance of SCC is relatively a new approach and this can be used by companies that are interested in collaboration without having to invest a huge sum of money in establishing the actual collaboration. Key words: supply chain collaboration, simulation, performance measurement, CPFR 1. Introduction Supply chain management (SCM) organizes and manages the whole process of activities of supply network from suppliers through manufacturers, retailers/wholesales till end users (Christopher, 1998). Traditionally, supply chain (SC) was designed with more focus on movement of materials rather than information flow. Due to ever increasing competition in businesses, many SCs have taken some twists from traditional way of functioning, from time to time, to adapt to the situation. Existing literature describes the SCM of the 21st century as an integrative value adding process of planning and controlling of materials and information between the supplier and the end user in order to increase customer satisfaction by reduced cost and improved services (Cooper et al., 1997). In today’s competitive unpredicted business world, cost reduction and good customer services are not stand-alone effort of any single SC member. As success of any product lies in customers' response to that product, it is important for businesses to achieve customer satisfaction by having efficient and Page 2 of 27 effective SCs. This may be possible through collaboration among SC partners. Hence, it is important to coordinate SC activities to streamline planning, production and replenishment (Ramanathan, 2012a). Market demand and changing nature of end-users can create more opportunities for SC players. At the same time, to be viable in a competitive market, all SC members need to be innovative and productive (Lee, 2002). As operating alone in a tight competition seem to be no longer beneficial for SCs, the importance of partnership has been adopted in various stages of many SCs (Samros, 2007). In the past, several SCM practices such as Vendor Managed Inventory (VMI), Efficient Consumer Response (ECR), Continuous Replenishment (CR), and Electronic Data Interchange (EDI) have been suggested in the literature to increase benefits of SCs. VMI technique was developed in the mid 1980’s, in which customer’s inventory policy and replenishment process were managed by the manufacturer or supplier. However, SC visibility was not predominately powerful in VMI to avoid bullwhip effect (Barratt and Oliveira, 2001). Forecast driven VMI and integration of CR with EDI was used to reduce the information distortion in VMI. ECR developed in 1992, was based on the concept of value adding by all partners in the supply chain. 
Both VMI and EDI together with ECR tried to create more responsive supply chain with broader visibility of information across the whole SC. Ever increasing SC demands have led to the invention of Collaborative Planning Forecasting and Replenishment (CPFR), another supply chain management tool incorporating planning, forecasting and replenishment under a single framework (Fliedner, 2003). CPFR, a second generation ECR (Seifert, 2003) aims to be responsive to consumer demand. It was introduced as a pilot project between Wal-Mart and Warner-Lambert in mid- nineties. According to VICS (2002), CPFR is a new collaborative business perspective that combines the intelligence of multiple trading partners in the planning and fulfilment of customers demand by linking sales and marketing best practices. Collaboration among SC members is a topic of interest for many researchers and practitioners (Barratt and Oliveira, 2001; Danese, 2007; Nyaga et al., 2011; Ramanathan, 2012a). Simatupang and Sridharan (2004) evolved four profiles for supply chain collaboration (SCC), namely efficient, synergistic, underrating and prospective collaboration. They proposed decision synchronization, incentive alignment and information sharing as three performance indices. In an attempt to maximize benefits of SCs, all SC members share information (data sharing) and collectively forecast the demand for products to have effective replenishment process (Aviv, 2007; Gavirneni et al, 1999). SCC activities help to improve the performance of involved members in a structured framework with the aim of maximizing profit through improved logistical services (Stank et al., 2001). However, majority of the articles in the literature have not highlighted important factors of good SCC practice. In this paper, we will be analysing the environments conducive to initiate SCC such as CPFR. The focus of this research is to identify the Page 3 of 27 suitable environments to collaborate in SCs. Revealing the actual benefits of SC collaboration with certain number of partners with specific level of investments for a specified period will help to make decision on implementing SCC at various levels. This is further explained through evidence from the existing literature in the next section. The rest of the paper is organised as follows: Section 2 will briefly explain the existing literature on SCC. Section 3 will describe research methodology used in this research. Section 4 explains the development of performance measurement of supply chain collaboration. Section 5 will discuss the results and analysis of simulation. Finally, Section 6 will conclude the paper with key findings, managerial implications, limitations and future work. 2. Supply chain collaboration for performance improvement: A Literature review SCM is being practiced by many businesses around the globe and hence it has a great wealth of literature from time of evolution of business processes. But, SCC is a relatively new research area and the literature is growing at a tremendous pace. Various advantages and disadvantages have been revealed by academics and practitioners. This section discusses some of the advantages and barriers of SCC. On realizing the importance of collaborative efforts in SCs, many researchers have developed theoretical and mathematical models to improve the structure and functionality of SCs. 2.1. 
Advantages of SC collaboration In the field of SCM, there is an overlap in the meaning of cooperation, coordination, collaboration, joint action plan and partnership, representing more or less the same concept (Yu et al., 2001; Corsten and Felde 2005). However, CPFR is specifically defined as a web-based attempt (Fliedner, 2003) or internet tool to coordinate the various supply chain activities such as forecasting, production and purchasing in SCs to improve the visibility of consumer demand (Barratt and Oliveira, 2001), to reduce any variance between supply and demand (Steermann, 2003). Caridi et al. (2005) viewed CPFR as a process of correcting, adjusting, proposing prices and quantities to reach an agreement on common unique forecast that can be used by buyers and sellers. VICS (2002) claimed that CPFR would help cost savings and gain competitive advantage. Several case studies have been reported in literatures that have examined the impact of collaboration (see www.ecch.com and ECR Europe, 2002). In SCCs, through joint planning and decision making, the understanding of the replenishment process is becoming clearer (Barratt and Oliveira, 2001). For example, Wal-Mart’s initiative of creating profile on purchase pattern of customers, namely ‘personality traits’, has helped to increase visibility of demand throughout the value chain (Mclvor et al., 2003). Information exchange and demand forecast http://www.ecch.com/ Page 4 of 27 based on sales data helped ‘Sport Obermeyer’ to improve forecast accuracy during demand uncertainty (Fisher 1997). In recent years, many academics and practitioners have suggested using collaborative arrangement to improve SC performance. Ramanathan and Muyldermans (2011) used structural equation models to identify underlying demand factors of soft drink sales in collaborative supply chains. They suggested using those factors for demand forecasting. Cheung et al. (2012) used actionable quantitative information from a number of upstream and downstream partners in developing knowledge-based system in supply chains. They have used simulation experiments to test SC models. Ramanathan and Gunasekaran (2013), Nyaga et al., (2011) and several other researchers insisted the importance of transparent information sharing, joint efforts and investments to improve trust and commitments in SCCs. Any SC can improve visibility using five important factors namely responsiveness, planning, shared targets, trust and common forecast (Barratt and Oliveira 2001). Real benefit of information sharing among SC partners lies in its effective and efficient use (Lee et al., 2000; Raghunathan, 2001) and it is also supported by proper use of Information Technology (IT) (Sanders and Premus, 2005; Cachon and Fisher, 2000). From the cases of Wal-mart and P&G, it is understandable that the use of various IT platforms is based on the scale of operations. 2.2. Barriers of SC collaboration Barriers of SC collaboration can be broadly classified under two categories: organisational and operational. Smaros (2007) argued that lack of internal integration (organisational barrier) would be a great obstacle for manufacturers to efficiently use demand and forecast information (operational barrier). Sometimes behavioural issues within organisation may also lead to failure of collaborative relationships. Fliedner (2003) considered lack of trust, lack of internal forecast, and fear of collusion as three main obstacles to implement collaboration. Boddy et al. 
(1998) identified six underlying barriers for partnering: insufficient focus on the long term, improper definition of cost and benefit, over reliance on relations, conflicts on priority, underestimating the scale of change and turbulence surrounding partnering. Use of technology and levels of information exchange in SCs have been discussed in the literature as both the advantage and the disadvantage (Cadilhon and Fearne, 2005; Sanders and Premus 2005; Samros, 2007). Occasionally, even a basic level of information exchange will yield potential benefits to businesses. For example, Metro Cash & Carry Vietnam is a German-owned business to business grocery wholesaler successfully engaged in collaboration with a disarming degree of simplicity. The company shares information among SC partners using telephone calls and fax machine without much sophisticated IT (Cadilhon and Fearne, 2005). The case of Metro Cash & Carry clarifies that free access Page 5 of 27 to available data is imminent in SCCs for planning and forecasting. But technology may not be a barrier for the success of collaboration (Cadilhon and Fearne, 2005; Smaros, 2007). This argument on technology totally disagrees with the basic concept of CPFR, which is a web-based attempt to coordinate the various activities among supply chain partners (Fliedner, 2003). Though information sharing and the role of IT were commonly accepted as significant phenomena in SCC (Sanders and Premus 2005), the use of technology is not argued widely as a necessary condition for collaboration; this is mainly because the technology used in CPFR varies widely across different CPFR cases (Danese, 2007). Also, due to availability of wider variety of technology and tools, proper technology selection becomes a complicated task for collaborating partners. To handle this issue, Caridi et al., (2005) proposed a new ‘learning model’ to incorporate intelligent agents to CPFR to measure performance of SCs at different collaborative environments. Barriers of partnering could be avoided through supplier training programme (Smith, 2006) and identifying opportunities to increase scope (Boddy et al., 1998). Continuous efforts of academics and practitioners to improve SCC have helped creating many models of SCs. 2.3. Models in SC collaboration In general, the nature of complexity is instrumental in the development of models at various levels of SCCs. Also due to increase in SC dependencies, SCC requires different combination of tasks and resources (Simatupang and Sridharan, 2004). For instance, CPFR business model is based on experiences of practitioners and strategies of their business development process (Ireland and Crum, 2005). Though, the basic structure of CPFR model has been accepted by many practitioners, it is also commonly agreed by many that some value addition to the existing model, depending on the industry implementing CPFR, will make SCCs responsive to market changes (Smith 2006; Chung and Leung 2005). Theoretical model developed by Corsten and Felde (2005) is related to the impact of trust (Humphreys et al., 2001), dependence, supplier collaboration on innovation, purchase cost reduction and financial performance. They established that supplier collaboration and the level of trust have positive impact on innovation and success of SCs. In literature, many conceptual frameworks are designed to explain the organizational and functional aspects of SCC whereas mathematical or simulation models are focussing mainly on the performance evaluation. 
Examples of SC models, suggested in the literature after the development of CPFR framework (mid-nineties), are given in Table 1. Aviv (2001) compared the effect of collaboration in two different set-up: one with centralized information and another with decentralized information. Based on uncertainty measure he concluded that diversified forecasting capabilities can improve the benefits of collaborative forecasting; in other words forecasting accuracy is strongly dependent on the collaborative strength. Page 6 of 27 Lee et al., (2000) developed a model to verify value of demand information sharing especially when demands are correlated significantly over a period of time. In a counter argument, Ragunathan (2001) emphasised the importance on effective use of available internal information for forecasting in comparison to investing on inter-organizational information system for information sharing in the case of non-stationary demand. Only a few studies exist in the literature on the performance analysis of SCC using simulation. Kim and Oh (2005) used system dynamics model to identify the performance of collaborative SCs in three different scenarios: manufacturer dominated SCs, supplier dominated SCs and balanced decision making. The authors identified that the balanced SCC will yield high benefits. Angerhofer and Angelides (2006) created a system dynamics model to evaluate the performance of supply chain management. The impact of six constituents - stakeholders, topology, levels of collaboration, enabling technology, business strategy and processes, were tested on SCs to measure the performance. Chang, et al., (2007) introduced an idea of augmented CPFR (A-CPFR) as an improvement to existing CPFR model with access to market information through application service provider. The authors tested its forecast accuracy through a simulation model. In a recent paper, Ramanathan (2012a) used AHP model to compare performance of two companies based on use of SC information. The author concluded that the companies using frequent information exchange among SCs can be benefited with continuous improvement in planning and forecasting. Table 1: Some existing models in SCC Author Type of model Key concept Simulation models Cheung et al. (2012) Knowledge-based model The model helps to formulate long-term successful SC partnerships. Chan and Zhang (2011) Collaborative transportation management The model helps to identify the potential benefits of collaboration in transportation. Chang et al. (2007) Verification of forecast accuracy (Augmented CPFR), with application of service provider, will have access to market information and hence can improve forecast accuracy and achieve considerable reduction of inventory. Angerhofer and Angelides (2006) Performance measurement The model helps to identify the areas need improvement by measuring the performance of the supply chain Kim and Oh (2005) Performance measurement The model tests impact of different decision making process in collaborative supply chain performance. Fu and Piplani (2004) Evaluation of supply-side collaboration Supply-side collaboration can improve the distributor’s performance. Optimisation and mathematical models Sinha et al. (2011) Optimisation model The model helps to improve the performance of petroleum supply chain. Page 7 of 27 Author Type of model Key concept Aviv (2001) Mathematical model for forecasting Products with shorter lead time have more benefit from supply chain collaboration. 
Aviv (2007) Mathematical model for forecasting Dominance or power of partnership, agility of the supply chain and internal service rate affect the benefits of collaborative forecasting. Aviv (2002) Mathematical model for joint forecasting and replenishment Auto-regressive demand process can decrease the demand uncertainty in VMI and CFAR (Collaborative Forecasting and Replenishment) programmes. Chen and Chen, (2005) Mathematical model for joint replenishment Developed four decision making models to determine optimal inventory replenishment and production policies in a supply chain considering three-level inventory system in a two echelon supply chain; Model also included major and minor set-up cost for manufacturers, and major transportation and minor processing cost for the retailer. Raghunathan, (2001); Lee et al. (2000) Mathematical model Inventory reduction and cost reduction can be achieved with efficient use of information sharing (Lee et al, 2000) and there is no need to invest in inter- organizational systems for information sharing if order history is available (Raghunathan, 2001). Mishra and Shah (2009) Structural equation model New product development will benefit from collaborative effort of supplier and customer, and cross functional involvement. Nagya et al. (2011) Structural equation model Impact of collaborative efforts in overall satisfaction Ramanathan and Muyldermans (2010;2011) Structural equation model Impact of demand information in collaborative forecasting Ramanathan and Gunasekaran (2013) Structural equation model Impact of SC collaboration in success of long term partnership Ramanathan (2012a) AHP model Role of SC information in company’s decision making Other models Shafiei et al (2012) Multi-enterprise collaborative decision support system The model helps decision makers to explore various options of solutions under what-if scenarios. Singh and Power (2009) Structural Equation Model Firm performance will increase if both supplier and customer are involved in collaborative relationship. Kwon et al. (2007) Multi-agent model The model helps to provide flexible solutions to address SC uncertainties. Caridi et al. (2005) Multi-agent model Mutli-agent system can be used to automate and optimise supply chain collaboration. Chung and Leung (2005) An improvement to CPFR model Inclusion of ‘Engineering change management’ increases the responsiveness to market changes. Simatupang and Sridharan ( 2004) Collaborative performance system Collaborative enablers are directly linked with collaborative performance metrics. Four types of collaboration identified: Efficient, underrating, prospective and synergistic. Stank et al. (2001) Logistical service performance model Collaboration with external supply chain partners along with internal support will improve logistical services. McCarthy and Golicic (2002) Collaborative Increased revenues and earnings are possible with Page 8 of 27 Author Type of model Key concept forecasting model SCCs. Lambert and Pohlen (2001) Conceptual model Developed a framework with following seven steps: supply chain mapping, identifying value addition process, identifying the effect of relationship on profitability, realign supply chain processes accordingly, measure individual performance, compare value with supply chain objectives, replicate steps at each link in the supply chain 2.4. 
Performance measurement of SC collaborations Models in SC collaborations are mainly classified under two categories: performance measurement models and decision making frameworks. Some models are supported with mathematical/empirical evidence (Angerhofer and Angelides, 2006; Kim and Oh 2005; Forslund and Jonsson 2007), and other models are purely conceptual in nature (Chen and Paulraj, 2004; Simatupang and Sridharan, 2004). In general, these two types of models are interrelated to each other in their way of functioning with respect to cause and effect. For example, performance measurement will lead to decision making process and decisions will lead to improve future performance. The main purpose of measuring the performance of SC network is to identify the problems in order to improve the SC efficiency and also to identify the conduciveness of collaboration. Many researchers conducted a detailed study on performance measurement of SC network based on cost and service level (Lee and Padmanabhan, 1997). But in SCCs, communication technologies such as information exchange and proper use of data are of high importance to the success of collaboration (Danese, 2007). Hence, measuring the proper use of technology and information are also becoming important in SCCs. Some researchers developed theoretical frameworks to measure the performance using balanced score card with many performance perspective measures (Chen and Paulraj, 2004). But a very few researchers initiated benchmarking of SCs (Simatupang and Sridharan, 2004; Ramanathan el al., 2011). Evidences from the literature confirm that key measures for evaluating SC performance include cost, quality and responsiveness. In recent literature, forecast accuracy is also used as an indicator of proper use of information in SCCs (Ramanathan and Muyldermans, 2010). Meanwhile, lack of information exchange will result in greater variability of demand forecast for upstream SC members (Yu et al., 2001), which is the clear indication of SC problem. Chen and Paulraj (2004) tried to create a conceptual framework to understand problems and opportunities associated with SC management. As there are many dimensions for SCCs, the performance measurement is also becoming a complicated process. Verifying whether the environment is conducive to SCC will help the companies to Page 9 of 27 identify the areas to be modified before implementation. This was partly answered from the findings of Aviv (2001) and Smaros (2007). Aviv’s (2001) confirmed that the products with short lead time could achieve better forecast accuracy compared to the products with long lead time (Smaors, 2006). Danese (2007) through several case studies across SC networks such as manufacturers, customers and suppliers, identified that different levels of collaboration exist in SCs and the benefits attached to each level will differ. Based on the analysis of these case studies, Danese (2007) classified the degree of collaboration as low, medium or high. Ramanathan (2012a) compared two case companies performance on demand planning and forecasting and suggested three different levels of collaborations in SCs, namely preparatory level, progressive level and futuristic level. However, not many articles have discussed the benefits of SCC in terms of the number of partners, investment and duration of partnerships. Most of the studies discussed above have confirmed the role of supply chain partners and their involvement in SC performance and profit. 
However, there is no specific study that discusses in detail the role of investment, the number of partners or the duration in collaborative partnerships. To fill this gap, in this paper we use the combination of all three of these elements in SC collaboration. In order to find an environment conducive to SCC, based on the literature and the actual practices in SCs, we propose in this study that the degree of collaboration will depend on three factors, namely the investment in collaborating technology and partnerships, the number of collaborating partners and the duration of collaboration. We attempt to develop a performance model for SCC using a well-known methodology, simulation, in the following sections.
3. Research methodology
Performance evaluation of SCC is a complex task and research on this topic is still in its infancy. We make an attempt to quantify the benefits of SCCs through the factors discussed above. The choice of methodology is most important to identify the correct solution to a particular research problem (Yin, 1989). Case study based simulation is used in this research. Case study research will be beneficial to understand the role of the specified factors in the performance of SCC. Basic information such as the duration of collaboration, the level of investments and the number of partners from the case companies will be simulated to create similar scenarios. For this purpose, we have chosen two case companies from the packaging industry.
In this paper, we have used simulation to identify the performance of SC collaboration based on the factors of SCC. To initialize the process of simulation, a basic mathematical approach is used, as outlined in Section 4. All the measures are converted into ratios to avoid mixing units. Generally, the ratio of input to output is used as a performance indicator. Simulation will support analysing collaborative performance on supply chains for changing degrees over the collaborating period. This what-if analysis will be instrumental in decision making on the implementation of a collaborative supply chain (Angerhofer and Angelides, 2006). The advantage of using simulation is that an existing or proposed system can be designed using what-if analysis in order to optimize the benefits by identifying the pitfalls in the system. Some researchers attempted to use system dynamics simulation for what-if analysis (Kim and Oh, 2005; Angerhofer and Angelides, 2006; Chang et al., 2007). In this research, the purpose of the what-if analysis is to identify a conducive environment to implement SCCs. A schematic projection of the research methodology (see Figure 2) can further simplify the understanding of SC performance. This research intends to establish links among all the coordinating factors of collaboration. Creating links with different modules will in turn be powerful to identify a weaker node which needs improvement. Traditionally, supply chain performance is measured through demand amplification (Angerhofer and Angelides, 2006) and value additions in each node of the supply chain. But in the case of collaborative SCs, value addition is not an independent activity and hence a composite performance indicator is used to measure the performance of the collaborative supply chain. If the SC handles product returns, then the performance should include inventory management and disposition of the returned goods.
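To make the ratio-based what-if loop concrete, the sketch below shows one possible way to organise the computation in Python; the field names, the toy numbers and the additive combination of the non-financial ratios are illustrative assumptions rather than the actual spreadsheet implementation used in the study.

```python
# Hypothetical sketch of the what-if analysis: every measure is expressed as a ratio and
# the composite indicators are re-evaluated for a range of collaboration degrees.

def composite_performance(boo, bof, processes, info_sharing):
    # Non-financial performance sums the operational ratios; financial performance is BOF alone.
    return {"non_financial": boo + processes + info_sharing, "financial": bof}

def what_if(scenarios):
    # scenarios: mapping degree -> dict of the four ratio measures for that degree
    return {deg: composite_performance(**vals) for deg, vals in scenarios.items()}

if __name__ == "__main__":
    toy = {  # illustrative values loosely modelled on the sample data reported later
        0.01: dict(boo=0.02, bof=160.0, processes=0.79, info_sharing=0.77),
        0.04: dict(boo=0.89, bof=180.0, processes=0.54, info_sharing=0.95),
    }
    for degree, perf in what_if(toy).items():
        print(degree, perf)
```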
We have considered five important factors of SC collaboration for our further analysis, namely degree of SC collaboration, business objectives (operational and financial), information sharing and SC processes. We have categorised SC performance as financial and non-financial. Non-financial performance of the SC is measured through operational business objectives, SC processes and information sharing (see Figure 2).
Figure 2: Schematic projection of methodology. [The figure outlines the following steps: define the variables N, L, T, I, FA, CU, LT, Rv, HC, SC, Ap, P, R, D; define Degree = f(N, L, T), with L = f(I), IS = f(FA), BOO = f(CU, LT), BOF = f(Rv, HC, SC) and Pr = f(Ap, P, R, D); define performance in terms of the above metrics; for Dg = 1 to x periods, calculate IS_Dg, BOO_Dg, BOF_Dg and Pr_Dg, with collaborative performance (non-financial) = BOO_Dg + Pr_Dg + IS_Dg and collaborative performance (financial) = BOF_Dg; and analyse the performance at various degrees to identify the conduciveness. Notation: N – number of collaborating partners; L – level of collaboration; I – investment in collaboration; T – time (duration) of collaboration; IS – information sharing; FA – forecasting accuracy; BOO – business objectives, operational; CU – capacity utilization; LT – lead time; Rv – revenue; HC – holding cost; SC – stock-out cost; Pr – SC processes; BOF – business objectives, financial; Ap – adherence to plan; Ad – adherence to delivery plan; P – no. produced; R – no. returned; D – no. delivered (no. sold to wholesaler/retailer).]
4. Development of performance measures for supply chain collaboration
Though SC is a widely researched area, it needs a strong framework (Chen and Paulraj, 2004) for the development of more systematic principles that will help SCs to develop against all odds and barriers. In the recent business world, many companies collaborate for different purposes such as logistics, cost reduction and business expansion. Such SCC necessitates some value addition to business objectives along with the original SC operations models (ECR Europe, 2002). Also, information sharing is critical in modern SCs to meet fluctuating demand (Ramanathan, 2012a & b). In the literature, the degree of collaboration is not linked with the performance of SCs in an effective way (Danese, 2007; Larsen et al., 2003; Ramanathan, 2012a). In this research, based on the literature and actual SC practices in recent businesses, we consider five important factors of collaboration, namely business objectives (financial and operational), supply chain processes, information sharing and degree of collaboration.
4.1. Business Objectives – Financial (BOF)
Nowadays, many businesses are striving to maximize profit by improving the quality of products and services to the end users and by lowering costs. Many leading companies such as Wal-Mart and Procter & Gamble use SCC to achieve this objective. VICS (2002) claims that CPFR will help cost savings and gain competitive advantage. Commonly, SC collaboration is initiated among various SC members to meet customers' needs, to improve product availability, to increase business performance, to increase sales, to achieve reduced cost, to increase revenues and earnings, to improve forecast accuracy, and to increase visibility of demand (McCarthy and Golicic, 2002; Cooke, 2002; Ireland and Crum, 2005; Ramanathan et al., 2012b). Cost savings such as minimizing the logistics cost can possibly be one of the most important drivers of collaborations (Corsten and Felde, 2005; Chen and Chen, 2005). Chen and Chen (2005) developed a mathematical model for joint replenishment in the process of reducing cost.
For example, Ace Hardware's CPFR pilot project improved forecast accuracy from 80 to 90 percent, and product costs dropped from 7 to 2.5 percent (Cooke, 2002). In many cases, SC collaboration proved to be a promising tool to increase business performance, sales, revenues and earnings (McCarthy and Golicic, 2002; Cooke, 2002). In our research, sales revenue and the costs involved in production will be used to quantify financial business objectives. In general, cost involves fixed cost and variable costs such as production cost and stock-out or holding cost. Other hidden variable costs are not included for the purpose of calculations.
BOF = \sum_{j=0}^{T} [(No. of sales − No. of returns) × Unit sales price − (No. produced × Unit production cost) − Stockout or holding cost − Fixed cost]
    = \sum_{j=0}^{T} [(D_j − R_j)·SP − P_j·PC − OC_j] − Total fixed cost
Here D – No. delivered (i.e., sold to the retailer); R – No. returned; I – Current inventory; SP – Selling price; PC – Production cost; P – No. produced; OC – Other cost (holding cost or stock-out cost).
Variable cost = Production cost + Holding cost or stock-out cost
OC = HC·(I + R − D) if I + R ≥ D, and OC = SC·(D − I − R) if I + R < D,
where HC – holding cost and SC – stock-out cost. Stock-out cost or penalty cost is usually calculated for retailers but not for manufacturers (Aviv, 2007). Based on our interview with the case companies, we assume that manufacturers will also incur a penalty cost for not completing production on time to facilitate on-time delivery; this is similar to the stock-out cost of the retailer.
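The following sketch illustrates how the financial objective could be computed from the quantities defined above (D, R, P, I, SP, PC, HC, SC); the per-period fixed cost argument and the exact netting of the cost terms follow the reconstruction of the BOF formula and are therefore assumptions.

```python
# Minimal sketch of the financial business objective (BOF) of Section 4.1.

def other_cost(inventory, returned, delivered, hc, sc):
    # Holding cost when stock (plus returns) exceeds deliveries, stock-out penalty otherwise.
    leftover = inventory + returned - delivered
    return hc * leftover if leftover >= 0 else sc * (-leftover)

def bof(periods, sp, pc, hc, sc, fixed_cost=0.0):
    # periods: iterable of dicts with keys D, R, P, I for each collaborating period j
    total = 0.0
    for p in periods:
        revenue = (p["D"] - p["R"]) * sp
        variable = p["P"] * pc + other_cost(p["I"], p["R"], p["D"], hc, sc)
        total += revenue - variable - fixed_cost
    return total

print(bof([{"D": 100, "R": 5, "P": 110, "I": 20}], sp=12.0, pc=7.0, hc=0.5, sc=1.5))
```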
4.2. Business Objectives – Operational (BOO)
Customer retention is becoming a great challenge in the current competitive business market. Improved business performance through SCC can help to attract and retain customers (Matchette and Seikel, 2004). Customer loyalty can also be built by effective SC activities. For example, making stock of the right products available at the right time in the proper location of retail stores will help to attract and retain customers. This can be achieved through wider cooperation from all SC members. For instance, efficient capacity utilization can help reduce production time (Aviv, 2007). Customer loyalty can also be achieved if SC activities include customer service such as accepting and handling product returns (Dowlatshahi, 2000). From the literature, we have considered three important factors, namely the number of product returns, product lead time, and capacity utilization (production capacity), to measure the operational business objectives (Aviv, 2007; Dowlatshahi, 2000).
Capacity utilization: CU = PC·(P_{n,n} − μ)² / (PC·P), where P_{n,n} ≠ μ
(Aviv (2007) assumed capacity utilization as the product of the cost of production and the square of the difference between the production batch size for period n and the average production size.) If P_{n,n} = μ, capacity utilization is assumed to be 100%.
PC – product cost; P – number of items produced; P_{n,n} – production batch size suggested for the next n periods at the beginning of period n; μ – average production size.
Reduced return rate: RR = 1 − R/D, where R – number of returns and D – number delivered.
Adherence to the production plan will be reflected in the reduction of product lead time or production time (Aviv, 2007). Adherence to production plan (AP):
AP = 1 − |P_{n,j} − PP_{n,j}| / PP_{n,j}
Here, PP_{n,j} (0 ≤ j ≤ T, 1 ≤ n ≤ T) is the production plan at period 'n' for period 'j', n is the current period, and P_{n,j} is the quantity actually produced against that plan.
Hence, operational business objectives can be quantified as follows:
BOO = CU × RR × AP × 100% = [(P_{n,n} − μ)²/P] × [1 − R/D] × [1 − |P_{n,j} − PP_{n,j}|/PP_{n,j}] × 100%
4.3. Supply chain processes
The supply chain operations reference model (SCOR) classified processes as plan, source, make, deliver and return. Based on the type of products and market value, the length or degree of collaboration will differ (Ramanathan, 2012b). Products with a long production cycle time take more time to reach the market, while products with a short production cycle time take less time. Though collaboration in SCs can help to sell all products with variable lead times, products with shorter lead times have more benefit in SC collaboration (Aviv, 2001). In this research, we assume that the availability of raw material (source) is not difficult and, accordingly, we consider four processes, namely plan, produce, replenish and return. In SCCs, the planning stage will include forecasting as an integral part and hence forecasting is not treated as a separate process. SCs with product-return activities need to check the inventory level and to arrange a proper disposition for the product returns (Dowlatshahi, 2000). In this case, the performance of collaborative processes is a collective measure of the cost function of adherence to plan and the cost of inventory.
Adherence to plan cost function: C_{AP} = \sum_{j=0}^{T} PC·(P_{n,j} − PP_{n,j})² (based on Aviv, 2007), with the production plan PP_{n,j} defined as above (0 ≤ j ≤ T, 1 ≤ n ≤ T).
Product returns will increase the level of current inventory. In SCs with product returns, inventory holding cost can be quantified as follows:
Inventory holding cost = HC·(I + R − D), when I + R > D
Here D – No. delivered (i.e., sold to the retailer); R – No. returned; I – Current inventory; HC – Unit holding cost.
The performance of collaborative SC processes can be calculated as
Pr = Average[ \sum_{j=0}^{T} PC·(P_{n,j} − PP_{n,j})² / Production cost , \sum_{j=0}^{T} HC·(I + R − D)_j / Variable cost ]
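A small illustration of the operational measures of Sections 4.2 and 4.3 is given below; the capacity-utilisation term follows the reconstructed formula above (and is therefore an assumption), while RR and AP follow their definitions directly.

```python
# Sketch of the operational business objective (BOO) built from CU, RR and AP.

def capacity_utilisation(batch, mean_batch, produced):
    # Assumed 100% when production matches the average batch size, per the text.
    if batch == mean_batch:
        return 1.0
    return (batch - mean_batch) ** 2 / produced

def reduced_return_rate(returned, delivered):
    return 1.0 - returned / delivered

def adherence_to_plan(produced, planned):
    return 1.0 - abs(produced - planned) / planned

def boo(batch, mean_batch, produced, returned, delivered, planned):
    return (capacity_utilisation(batch, mean_batch, produced)
            * reduced_return_rate(returned, delivered)
            * adherence_to_plan(produced, planned) * 100.0)

print(boo(batch=105, mean_batch=100, produced=105, returned=4, delivered=98, planned=100))
```

The process measure Pr would be obtained in the same spirit, by averaging the adherence-to-plan cost and the inventory holding cost after normalising each by its respective cost base.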
4.4. Degree of collaboration
Previous case study research by Danese (2007) identified different levels of collaboration such as basic communication, limited collaboration and full collaboration. Larsen et al. (2003) and ECR Europe (2002) categorized the depth and level of collaboration into three different forms: basic collaboration, developing collaboration and advanced collaboration. Simatupang and Sridharan (2004), on the other hand, categorized the level of collaborative practices into low and high collaborations. In general, it is agreed that various levels and practices of collaboration can yield benefits across the whole SC. In our research, the degree of collaboration is measured in terms of the number of collaborating partners (which can be a two-echelon or multi-echelon SC), the duration of collaboration and the level of involvement. In this research, the level of involvement is defined as the involvement of top management in terms of investment in technology and people in SCC activities. In every SCC, active participation of each SC partner can help to enhance the overall performance (Lambert and Pohlen, 2001). Cooke (2002) identified the need to change corporate culture as a pre-requirement of collaboration. Long-term SCCs can change the attitude of workers. Normally, the level of involvement of top management in SCC will be reflected in their investment in collaborating technology and training (Ramanathan et al., 2011). Based on the literature, we define the degree of SCC in terms of the number of collaborating partners, the total number of years, and the investment in the collaborative effort:
Degree = (Number of SCC partners / Number of supply chain members) × (Collaborating years / Duration in business) × Level of Involvement
Here, the 'level' will be identified from the case company, and a percentage value will be assumed based on the collaborative operations (activities) in proportion to total activities.
Level of Involvement = Collaborative investment (on training and technology) per year / Total investment on training and technology per year
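As an illustration, the degree of collaboration and the level of involvement defined above can be computed as in the following sketch; the multiplicative combination mirrors the reconstructed formula, and the sample figures are only indicative.

```python
# Sketch of the degree-of-collaboration measure of Section 4.4.

def level_of_involvement(collab_investment_per_year, total_investment_per_year):
    # Share of the yearly training/technology budget devoted to the collaboration.
    return collab_investment_per_year / total_investment_per_year

def degree_of_collaboration(scc_partners, chain_members, collab_years, years_in_business,
                            collab_investment_per_year, total_investment_per_year):
    return ((scc_partners / chain_members)
            * (collab_years / years_in_business)
            * level_of_involvement(collab_investment_per_year, total_investment_per_year))

print(degree_of_collaboration(3, 12, 3, 25, 83_500, 1_000_000))
```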
4.5. Information sharing
In the recent competitive market, a great deal of business relies on SC information and the proper use of data. SCC can contribute to improved information sharing among SC partners (Yu et al., 2001). According to VICS (2002), accelerated information sharing among SC partners will increase the reliability of order generation. Li and Wang (2007) asserted that the benefit of information sharing depends on two factors: one is the context and the other is the proper use of information. Optimizing the supply chain will be possible through collaboration (Horvath, 2001) and information sharing (Horvath, 2001; Yu et al., 2001). Information sharing among SC partners will help improve forecast accuracy and hence will help potential cost savings (Aviv, 2007; Byrne and Heavey, 2006). An exceptional level of service can also be achieved through integrated data and information (Kim, 2006). Critical information sharing among SC partners varies widely depending on the industries involved (Smaros, 2007). Ovalle and Marquez (2003) summarized the types of information under three headings: product information, customer demand and transaction information, and inventory information. Yu et al. (2001) revealed that centralized information sharing benefits manufacturers more than retailers. Though information sharing and the role of IT were accepted as a significant phenomenon in collaboration (Sanders and Premus, 2005), the use of technology is not argued widely as a necessary condition for collaboration. This argument is evident from Smaros' (2007) statement that 'collaboration technology is not a key obstacle for large scale collaborative forecasting'.
In SCCs, product replenishment is a sub-process of forecasting (CPFR, 2002). Internal forecasting is the forecast generated by each collaborating partner based on time series data and other exceptional factors (such as sales promotions) and market criteria. Collective forecasting is based on all the individual internal forecast figures, which in turn facilitate order generation. Internal forecast accuracy will be reflected in the collective forecast figure and help to reduce the bullwhip effect (Aviv, 2001). In SCCs, forecasting accuracy and forecast information quality can improve the profit proportion (Forslund and Jonsson, 2007). From the above literature, we understand that the effectiveness of information sharing in SCC will be reflected in the forecasting accuracy (FA) of product demands (Ramanathan and Muyldermans, 2010) and returns. Accordingly, we calculate FA as follows:
Forecasting Accuracy (Sales) = 1 − |AD − FD| / AD
Forecasting Accuracy (Returns) = 1 − |AR − FR| / AR
Collectively, Forecasting Accuracy (FA) can be calculated as:
FA = \sum_{j=0}^{T} [1 − ( |AD_j − FD_j|/AD_j + |AR_j − FR_j|/AR_j ) / 2] × 100
We assume that the demand follows the normal distribution. FD – Expected demand; AD – Actual demand; FR – Expected returns; AR – Actual returns; j – SCC period (here, 0 < j < 6). The underlying assumption is that the use of technology and information systems helps to exchange real-time information without any delay in information sharing. Hence, the point-of-sale data are available to the manufacturer without any delay, i.e. accessibility is 100%.
The Standard Deviation (SD) of the Forecast Error (FE) describes the spread of errors, or the uncertainty about an error, which can be used for setting safety stock. The Forecasting Error is calculated as follows:
Absolute percentage of error (Sales) = |AD − FD| / AD × 100%
Absolute percentage of error (Returns) = |AR − FR| / AR × 100%
In this paper, we measure the degree of collaboration based on the level of involvement, the length of collaboration (period) and the number of partners. The impact of a change in the degree of collaboration will be identified in forecast accuracy, business objectives and processes. The overall performance of SCC is calculated as the sum of individual performance in terms of BOO, BOF, forecasting accuracy, and processes at various degrees of collaboration.
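The forecast-accuracy measure can be implemented directly from the formulas above, as in the sketch below; summing the per-period accuracies over the collaborating horizon follows the reconstruction and should be treated as an assumption.

```python
# Sketch of the forecast-accuracy (FA) measure of Section 4.5.

def forecast_accuracy(actual_demand, forecast_demand, actual_returns, forecast_returns):
    terms = []
    for ad, fd, ar, fr in zip(actual_demand, forecast_demand, actual_returns, forecast_returns):
        sales_err = abs(ad - fd) / ad        # absolute percentage error on sales
        returns_err = abs(ar - fr) / ar      # absolute percentage error on returns
        terms.append(1.0 - (sales_err + returns_err) / 2.0)
    return sum(terms) * 100.0

print(forecast_accuracy([100, 120], [95, 125], [10, 8], [9, 10]))
```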
5. Analysis and discussion of simulation results
Improving the overall performance of SCs, in terms of both quality and service, along with other business objectives such as maximising profit and minimising costs, is a common underlying feature of CPFR. But not many researchers have considered the impact of other underlying factors such as degree of collaboration, involvement of top management, information sharing, customer support, business objectives and SC processes. The magnitude of benefits from implementing SCC often varies widely across different industries, as substantial amounts of investment and time are involved (Ramanathan et al., 2011). For example, products that are mainly manufactured to stock (such as detergents and shampoo) will have a longer shelf life (Fisher, 1997) and hence their SCs may not require a high degree of collaboration. At the same time, fast-moving technology products such as laptops and software need to be sold in a short span of time in order to avoid obsolescence, which requires a higher degree of collaborative support from other SC members.
For the purpose of this research, we have contacted five different global companies from the packaging industry who practice SCC. Three of them have collaboration with either upstream or downstream SC partners but not with both. Finally, we have considered two manufacturing companies who have been involved in collaboration for over six years with both upstream and downstream customers. The SCC information selected for further analyses was mainly focussed on the five factors explained before. For each company, we have collected data on 10 collaborating partners and simulated the data using Excel. Table 2 describes the sample data of one of the companies collaborating with different supply chain partners at various degrees. The first three columns of the table represent SC investment in collaboration (in US dollars), the number of partners and the length of collaboration. All the remaining columns use the formulae described in Section 4.
Table 2: Analysis of sample data
Coll. investment | Coll. partners | Coll. period | Degree | BOO | Information sharing | Forecast accuracy | SC processes
83500 | 3 | 3 | 0.01 | 0.02 | 0.77 | 65% | 0.79
50000 | 10 | 3 | 0.02 | 0.99 | 0.92 | 96% | 0.62
55000 | 4 | 4 | 0.04 | 0.89 | 0.95 | 91% | 0.54
34000 | 10 | 4 | 0.01 | 0.00 | 0.91 | 96% | 0.57
48500 | 11 | 4 | 0.01 | 0.03 | 0.87 | 91% | 0.12
53500 | 7 | 5 | 0.04 | 0.01 | 0.91 | 91% | 0.01
133000 | 8 | 5 | 0.05 | 0.01 | 0.93 | 87% | 0.56
49000 | 4 | 5 | 0.02 | 0.99 | 0.76 | 69% | 0.49
45000 | 3 | 6 | 0.01 | 0.90 | 0.79 | 65% | 0.51
56000 | 6 | 6 | 0.02 | 0.00 | 0.97 | 95% | 0.54
59000 | 12 | 7 | 0.02 | 0.99 | 0.93 | 99% | 0.45
43590 | 7 | 7 | 0.03 | 0.01 | 0.94 | 97% | 0.45
We have simulated 1000 instances of SCC based on the company's data. The results indicate that the forecast accuracy becomes stable over a period of time with the same number of collaborating partners. Figure 3 indicates the effect of the levels of collaboration on the performance of the company in terms of financial and non-financial objectives (SC processes and information sharing). SC partners collaborating for a longer period of time have achieved increasing performance both financially and operationally. But it is not guaranteed that the company's individual financial business objectives will be achieved consistently in the case of high investments in collaboration. Also, a higher number of collaborating partners does not mean a proportionately higher level of performance. The performance of the company shows a very slow but incremental effect against the level of collaboration in terms of the number of partners (see Figure 3).
Our interview with the case companies revealed that collaborating partners who are in the same business for a long term will bring success for all SC collaborating partners. This is possible mainly due to the sharing of knowledge and a well-established SC network. But "new members need to wait to reveal the actual benefit of collaboration. Huge investment in SCCs will not always help to reap the benefit quickly. Time is the key success factor in collaboration. Committed SC partners make our SCs really profitable and successful in terms of performance". The results of the analysis suggest that companies do not need to invest in collaboration every year in order to yield high profit. Companies that believe in high investment in a collaborative relationship without having effective SC operations will find it difficult to survive in the competitive market. Even though new partnerships are encouraged in a competitive business scenario, it is vital for companies to continue the existing profitable partnership for a longer period of time to obtain consistent performance.
Figure 3: Effect of levels of collaboration on performance. [Six panels with fitted trend lines: financial performance (BOF) and non-financial performance (BOO, SC processes and information sharing) are plotted against the level of collaboration in terms of period (trend lines y = 28.97x + 155.67 and y = 0.1057x + 0.8232), investment (y = 21.986x + 158.25 and y = −0.244x + 0.9932) and partners (y = 1.2282x + 165.7 and y = 0.1057x + 0.8232).]
6. Managerial implications, conclusions and future research
This paper addresses a recent and practically relevant approach to SC collaboration in performance improvement. Understanding the important factors of SC collaboration and their impact on the potential benefits of the SC can help top management to understand the required degree of collaboration with upstream and downstream partners. One of the interesting managerial insights on the fundamental principle of collaboration is that neither investment, nor the number of partners, nor the duration of collaboration will independently contribute to improving the performance of SCs. This result helps to understand the importance of the involvement of each SC partner. Increasing the number of partners in SCs will complicate the decision making and hence slow down the performance. However, human interactions in SCs can assist appropriate investment decisions in IT and collaborations to improve SC processes. Long-term collaborating partners can help yield sustainable benefits to SCs.
In general, the financial performance of a company is an indicator of the success of its operational performance. From the data analysis, we identified that less involvement of top management in SC collaboration results in poorer overall performance. By measuring the performance, the top management of the company can decide whether to improve its investments in collaborative activities. Measuring the forecast accuracy can alert the managers to the usefulness of available information and can also point out the need for accessible information and technology (Ramanathan and Muyldermans, 2010). Different supply chain partners collaborating for various purposes will have individual business objectives. Successful collaboration will help the businesses to be successful in achieving those set objectives. By measuring both financial and operational objectives, any company can understand the current accomplishment of expected achievements. For example, in the given case company, the higher investment in collaboration has not shown a more substantial benefit in terms of revenue. Hence, the company can try to improve other aspects of the current collaborative arrangement instead of investing further in the collaboration. On knowing the potential benefits of SC collaboration, SC partners can extend their partnership further to increase profit, to reduce lead time and to improve customers' satisfaction. In this research, we have tested the SC collaboration with different levels of involvement and partnerships for a certain period of time using simulation techniques.
For different degrees of collaboration, the benefits of the SC are found to differ. In real businesses, it is risky to experiment with various degrees of collaboration as it can involve a huge amount of investment. The findings of this research suggest that the degree of collaboration should be revised after analysing the performance of the company (see Figure 4). The conduciveness of collaboration for any company depends on its flexibility in changing the degree of collaboration to achieve the business objectives. For example, if too many SC partners are involved in the collaboration, the partners with the highest investment may have the power of dominance in planning and decision making; this may affect the smaller players in SCC arrangements (Aviv, 2007). In this case, the top management of the focal company can alter the degree of collaboration, such as the duration of collaboration, level of involvement and number of participating members, to achieve the required performance. Irrespective of the degree of collaboration, another performance measure, namely the 'forecast accuracy of the company', will explicitly indicate the role of information exchange in the collaborative SC (Ramanathan and Muyldermans, 2010). Since products with shorter lead times can normally benefit more from collaborative forecasting (Aviv, 2001), in this research we suggest extending the use of collaborative forecasting to products with medium or longer lead times. In the case of poor forecast accuracy, top management can increase the accessibility of information exchange. The company can also think of revamping its IT technology in order to improve the efficiency of information sharing.
Achieving the predefined business objectives in terms of financial and operational activities will help the SC partners to sustain themselves in the competitive business market. Performance measurement, in terms of financial and operational business objectives, indicates the conduciveness of the current SC collaboration. The collaborating company can adjust the degree of collaboration to match its business objectives. For example, SCC can be strengthened to increase profit by reducing the cost of operations. Similarly, SC collaboration can help reduce product returns or help sell the returned products in secondary markets. Our research confirms that production lead time and capacity utilisation can also be improved with the collaboration of suppliers for on-time delivery of raw materials for timely planned production (see Figure 4). To evolve efficient and effective competitive supply chain collaboration, all SC processes need to be assessed from time to time for evaluating the performance. In a growing field, performance measurement is highly indispensable in order to improve further. In a new field, it is equally important to monitor the performance to test the conduciveness. Our research has indicated the importance of identifying conducive environments for successful supply chain collaborations. We have based our simulation study on data from two companies from the packaging industry. The same research can be extended further for different industries that have SC collaboration with many partners involving huge investments over long durations. This can help to draw a general conclusion on the suggested level of investment and supply chain partnership, specific to each business sector.
Figure 4: Areas of improvement in SC collaboration. [The figure links the degree of collaboration (duration, partners and level of involvement, driven by top-management investment in technology and people), the collaborative supply chain processes (plan, produce, replenish and return), information sharing and forecasting, and the financial and operational business objectives to overall performance, with the suggested actions: adjust the degree; increase frequency and accessibility and revamp technology; increase profit and reduce cost; require more involvement; adhere to plan, control production and delivery time and monitor frequently; and reduce returns, reduce product lead time and improve capacity utilization.]
References
Angerhofer, B.J. and Angelides, M.C. (2006). A model and a performance measurement system for collaborative supply chains. Decision Support Systems 42, 283-301.
Aviv, Y. (2001). The effect of collaborative forecasting on supply chain performance. Management Science 47 (10), 1326-1343.
Aviv, Y. (2002). Gaining benefits from joint forecasting and replenishment process: The case of auto-correlated demand. Manufacturing & Service Operations Management 4 (1), 55-74.
Aviv, Y. (2007). On the benefits of collaborative forecasting partnerships between retailers and manufacturers. Management Science 53 (5), 777-794.
Barratt, M. and Oliveira, A. (2001). Exploring the experiences of collaborative planning initiatives. International Journal of Physical Distribution & Logistics Management 31 (4), 266-289.
Boddy, D., Cahill, C., Charles, M., Fraser-Kraus, M. and MacBeth, D. (1998). Success and failure in implementing supply chain partnering: an empirical study. European Journal of Purchasing and Supply Management 2 (2-3), 143-151.
Byrne, P.J. and Heavey, C. (2006). The impact of information sharing and forecasting in capacitated industrial supply chains: A case study. International Journal of Production Economics 103 (1), 420-437.
Cachon, G.P. and Fisher, M. (2000). Supply chain inventory management and the value of shared information. Management Science 46 (8), 1032-1048.
Cadilhon, J. and Fearne, A.P. (2005). Lessons in collaboration: A case study from Vietnam. Supply Chain Management Review 9 (4), 11-12.
Caridi, M., Cigolini, R. and Marco, D.D. (2005). Improving supply-chain collaboration by linking intelligent agents to CPFR. International Journal of Production Research 43 (20), 4191-4218.
Chang, T., Fu, H., Lee, W., Lin, Y. and Hsueh, H. (2007). A study of an augmented CPFR model for the 3C retail industry. Supply Chain Management: An International Journal 12, 200-209.
Chan, Felix T.S. and Zhang, T. (2011). The impact of collaborative transportation management on supply chain performance: A simulation approach. Expert Systems with Applications 38 (3), 2319-2329.
Chen, I.J. and Paulraj, A. (2004). Towards a theory of supply chain management: the constructs and measurements. Journal of Operations Management 22, 119-150.
Chen, T.H. and Chen, J.M. (2005). Optimizing supply chain collaboration based on joint replenishment and channel coordination. Transportation Research Part E: Logistics and Transportation Review 41, 261-285.
Cheung, C.F., Cheung, C.M. and Kwok, S.K. (2012). A knowledge-based customization system for supply chain integration. Expert Systems with Applications 39 (4), 3906-3924.
Christopher, M. (1998). Logistics and Supply Chain Management, second ed. Financial Times Press, Prentice Hall, Englewood Cliffs, NJ.
Chung, W.C. and Leung, W.F. (2005). Collaborative planning, forecasting and replenishment: a case study in copper clad laminate industry. Production Planning & Control 16 (6), 563-574.
Cooke, 2002.
Cooper, M.C., Lambert, D.M. and Pagh, J.D. (1997). Supply chain management: more than just a new name for logistics. International Journal of Logistics Management 8 (1), 1-14.
Corsten, D. and Felde, J. (2005). Exploring the performance effects of key supplier collaboration. International Journal of Physical Distribution & Logistics Management 35 (6), 445-461.
Danese, P. (2007). Designing CPFR collaborations: insights from seven case studies. International Journal of Operations and Production Management 27 (2), 181-204.
Dowlatshahi, S. (2000). Developing a theory of reverse logistics. Interfaces 30 (3), 143-155.
ECR Europe (2002). European CPFR Insights, ECR Europe facilitated by Accenture, Brussels.
Fisher, M.L. (1997). What is the right supply chain for your product? Harvard Business Review 75 (2), 105-116.
Fliedner, G. (2003). CPFR: An emerging supply chain tool. Industrial Management + Data Systems 103 (1/2), 14-21.
Forslund, H. and Jonsson, P. (2007). The impact of forecast information quality on supply chain performance. International Journal of Operations & Production Management 27 (1), 90-107.
Fu, Y. and Piplani, R. (2004). Supply-side collaboration and its value in supply chains. European Journal of Operational Research 152, 281-288.
Gavirneni, S., Kapuscinski, R. and Tayur, S. (1999). Value of information in capacitated supply chains. Management Science 45 (1), 16-24.
Horvath, L. (2001). Collaboration: The key to value creation in supply chain management. Supply Chain Management 6 (5), 205.
Humphreys, P.K., Shiu, W.K. and Chan, F.T.S. (2001). Collaborative buyer-supplier relationships in Hong Kong manufacturing firms. Supply Chain Management 6 (3/4), 152-162.
Ireland, R.K. and Crum, C. (2005). Supply chain collaboration: How to implement CPFR and other best collaborative practices. J. Ross Publishing Inc.: Florida.
Kim, B. and Oh, H. (2005). The impact of decision-making sharing between supplier and manufacturer on their collaboration performance. Supply Chain Management 10 (3/4), 223-236.
Kwon, O., Im, G.P. and Lee, K.C. (2007). MACE-SCM: A multi-agent and case based reasoning collaboration mechanism for supply chain management under supply and demand uncertainties. Expert Systems with Applications 33 (3), 690-705.
Lambert, D.M. and Pohlen, T.L. (2001). Supply chain metrics. International Journal of Logistics Management 12 (1), 1-19.
Larsen, T.S., Thenoe, C. and Andresen, C. (2003). Supply chain collaboration: Theoretical perspectives and empirical evidence. International Journal of Physical Distribution & Logistics Management 33 (6), 531-549.
Lee, H. and Whang, S. (2001). Demand chain excellence: A tale of two retailers. Supply Chain Management Review 5 (2), 40-46.
Lee, H.L. (2002). Aligning supply chain strategies with product uncertainties. California Management Review 44 (3), 105-119.
Lee, H.L., So, K.C. and Tang, C.S. (2000). The value of information sharing in a two-level supply chain. Management Science 46 (5), 626-643.
Lee, H.L. and Padmanabhan, V. (1997). Information distortion in a supply chain: The bullwhip effect. Management Science 43, 546.
Li, X. and Wang, Q. (2007). Coordination mechanisms of supply chain systems. European Journal of Operational Research 179 (1), 1-16.
Matchette, J. and Seikel, A. (2004). How to win friends and influence supply chain partners. Logistics Today 45 (12), 40-42.
McCarthy, T.M. and Golicic, S.L. (2002). Implementing collaborative forecasting to improve supply chain performance. International Journal of Physical Distribution & Logistics Management 32 (6), 431-454.
Mishra, A.A. and Shah, R. (2009). In union lies strength: Collaborative competence in new product development and its performance effects. Journal of Operations Management 27 (4), 324-338.
Nyaga, G.N., Whipple, J.M. and Lynch, D.F. (2010). Examining supply chain relationships: do buyer and supplier perspectives on collaborative relationships differ? Journal of Operations Management 28, 101-114.
Ovalle, O.R. and Marquez, A.C. (2003). The effectiveness of using e-collaboration tools in the supply chain: An assessment study with system dynamics. Journal of Purchasing & Supply Management 9 (4), 151-163.
Raghunathan, S. (2001). Information sharing in a supply chain: A note on its value when demand is non stationary. Management Science 47 (4), 605-610.
Ramanathan, U. and Muyldermans, L. (2011). Identifying the underlying structure of demand during promotions: A structural equation modelling approach. Expert Systems with Applications 38 (5), 5544-5552.
Ramanathan, U. (2012a). Aligning supply chain collaboration using Analytic Hierarchy Process. Omega - The International Journal of Management Science (forthcoming, doi:10.1016/j.omega.2012.03.001).
Ramanathan, U. (2012b). Supply chain collaboration for improved forecast accuracy of promotional sales. International Journal of Operations and Production Management 32 (6) (forthcoming).
Ramanathan, U. and Muyldermans, L. (2010). Identifying demand factors for promotional planning and forecasting: A case of a soft drink company in the UK. International Journal of Production Economics 128 (2), 538-545.
Ramanathan, U., Gunasekaran, A. and Subramanian, N. (2011). Performance metrics for collaborative supply chain: A conceptual framework from case studies. Benchmarking: An International Journal 18 (6), 856-872.
Ramanathan, U. and Gunasekaran, A. (2013). Supply chain collaboration: Impact of success in long-term partnerships. International Journal of Production Economics (forthcoming, http://dx.doi.org/10.1016/j.ijpe.2012.06.002).
Sanders, N.R. and Premus, R. (2005). Modelling the relationship between firm IT capability, collaboration and performance. Journal of Business Logistics 26 (1), 1-23.
Seifert, D. (2003). Collaborative Planning Forecasting and Replenishment: How to create a supply chain advantage. Saranac Lake, NY, USA: AMACOM.
Shafiei, F., Sundaram, D. and Piramuthu, S. (2012). Multi-enterprise collaborative decision support system. Expert Systems with Applications 39 (9), 7637-7651.
Sinha, A.K., Aditya, H.K., Tiwari, M.K. and Chan, F.T.S. (2011). Agent oriented petroleum supply chain coordination: Co-evolutionary particle swarm optimisation based approach. Expert Systems with Applications 38 (5), 6132-6145.
Simatupang, T.M. and Sridharan, R. (2004). Benchmarking supply chain collaboration: An empirical study. Benchmarking: An International Journal 11 (5), 484-503.
Singh, P.F. and Power, D. (2009). The nature and effectiveness of collaboration between firms, their customers and suppliers: a supply chain perspective. Supply Chain Management: An International Journal 14 (3), 189-200.
Smaros, J. (2007). Forecasting collaboration in the European grocery sector: Observations from a case study. Journal of Operations Management 25 (3), 702-716.
Smith, L. (2006). West Marine: CPFR success story. Supply Chain Management Review 10 (2), 29-36.
Stank, T.P., Keller, S.B. and Daugherty, P.J. (2001). Supply chain collaboration and logistical service performance. Journal of Business Logistics 22 (1), 29-48.
Steermann, H. (2003). A practical look at CPFR: The Sears-Michelin experience. Supply Chain Management Review 7 (4), 46-53.
VICS (2002). CPFR guidelines. Voluntary Inter-industry Commerce Standards. Available at: www.cpfr.org (accessed January 2007).
Yu, Z., Yan, H. and Cheng, T.C.E. (2001). Benefits of information sharing with supply chain partnerships. Industrial Management & Data Systems 101 (3), 114-119.
Acknowledgement: The author would like to thank the two anonymous reviewers for their valuable comments and Dr Yongmei Bentley for her support in improving the paper.
work_3akpcw3alvfzpbwlc3yzkadwpa ---- An Expert System for Quantification of Bradykinesia Based on Wearable Inertial Sensors
Vladislava Bobić 1,2,*, Milica Djurić-Jovičić 2, Nataša Dragašević 3, Mirjana B. Popović 1,4, Vladimir S. Kostić 3 and Goran Kvaščev 1
1 University of Belgrade-School of Electrical Engineering, 11000 Belgrade, Serbia; mpo@etf.rs (M.B.P.); kvascev@etf.rs (G.K.)
2 Innovation Center, School of Electrical Engineering, University of Belgrade, 11000 Belgrade, Serbia; milica.djuric@etf.rs
3 Clinic of Neurology, School of Medicine, University of Belgrade, 11000 Belgrade, Serbia; ntdragasevic@gmail.com (N.D.); vladimir.s.kostic@gmail.com (V.S.K.)
4 Institute for Medical Research, University of Belgrade, 11000 Belgrade, Serbia
* Correspondence: vladislava.bobic@ic.etf.rs; Tel.: +381-11-3218-455
Received: 3 April 2019; Accepted: 4 June 2019; Published: 11 June 2019
Abstract: Wearable sensors and advanced algorithms can provide significant decision support for clinical practice. Currently, the motor symptoms of patients with neurological disorders are often visually observed and evaluated, which may result in rough and subjective quantification. Using small inertial wearable sensors, fine repetitive and clinically important movements can be captured and objectively evaluated. In this paper, a new methodology is designed for objective evaluation and automatic scoring of bradykinesia in repetitive finger-tapping movements for patients with idiopathic Parkinson's disease and atypical parkinsonism. The methodology comprises several simple and repeatable signal-processing techniques that are applied for the extraction of important movement features. The decision support system consists of simple rules designed to match universally defined criteria that are evaluated in clinical practice. The accuracy of the system is calculated based on the reference scores provided by two neurologists. The proposed expert system achieved an accuracy of 88.16% for files on which neurologists agreed with their scores. The introduced system is simple, repeatable, easy to implement, and can provide good assistance in clinical practice, providing a detailed analysis of finger-tapping performance and decision support for symptom evaluation.
Keywords: decision support system; wearable inertial sensors; finger-tapping; automatic scoring; Parkinson's disease; atypical parkinsonism; UPDRS
1. Introduction
Wearable sensors and advanced algorithms are increasingly being used for the development of new clinical support systems for more efficient diagnostics and the evaluation of symptom severity and disease progress in Parkinson's disease (PD) [1]. This covers a wide range of applications assessing different symptoms, such as tremor, hypokinesia, rigidity, and bradykinesia.
Bradykinesia is one of the main manifestations of PD. It is evidenced as slowness of body movements, especially in tasks that require fine motor control [2]. In clinical practice, bradykinesia (as well as other motor symptoms) is usually assessed using the Unified Parkinson's Disease Rating Scale (UPDRS), in which the third part of the examination is dedicated to motor skill evaluation (UPDRS III) [3]. Bradykinesia is evaluated using repetitive hand and leg movements, such as finger-tapping, hand opening/closing, pronation/supination, and foot (or toe) tapping [2]. As a part of the examination, patients are requested to repeatedly perform specified movements, as fast and with the biggest amplitude as possible, during some short period of time, usually 10–15 s [4–7], or for some specified number of repetitions, e.g., 10 times [8–10]. These movements are evaluated based on specifically defined criteria, including speed, amplitude, amplitude decrement, and number of hesitations or freezes. The performance is rated with scores ranging from 0 to 4, in which the lowest values correspond to normal movements, and higher values are given for more severe bradykinesia expressed through significant amplitude losses, decreasing speed, or an increased number of hesitations/freezes. However, in clinical practice, this examination is usually performed visually, which may result in subjective evaluation and rough quantification. Since precise evaluation represents a very important part of the long-term monitoring of the disease's progress and patients' response to therapy, researchers have dedicated their effort and time to design new systems that can be used for the objective evaluation and automatic scoring of symptom severity.
In the literature, different approaches are presented for the objective evaluation and quantification of PD motor symptom severity, including bradykinesia. The introduced methodologies differ in terms of applied instrumentation, analysed movements, measurement protocols, the size and composition of patients' groups, and implemented signal processing and learning techniques. Some studies implement RGB or infrared camera systems for measuring clinically important repetitive hand and leg movements [5,11,12]. Although such systems can provide high-precision measurements, they have some limitations. They are expensive and require dedicated space for recording (they are bulky), which significantly limits their applicability in clinical settings [13]. Due to these limitations, wearable systems, such as smartphones [14], magnetic sensors [13], and inertial measurement units (IMUs) [7–9,15–20], are increasingly being applied for bradykinesia assessment.
IMUs are small, lightweight, easy to mount, and do not require dedicated space for recording, which makes them more suitable for fast and reliable everyday clinical applications. In the literature, bradykinesia is assessed by analysing different repetitive movements, including finger-tapping [8,13,21], hand opening/closing [16,17], hand pronation/supination [9,10,22], and toe tapping [23], as well as by simultaneous analysis of different movements [12,20,24]. From finger-tapping (FT) accelerometer data, researchers extract different features, describing the frequency and biomechanical properties of the movements, and use them as input into the ordinal logistic regression model for prediction of UPDRS FT scores [8]. It is shown that scores can be predicted with high predictive power (the Goodman–Kruskal Gamma score is 0.961). Four parameters extracted from the FT gyro data were found to be statistically correlated with clinical scores (from r = 0.73 to r = −0.80) [7]. A similar approach is applied to a repetitive hand opening/closing task [16]. Signals are acquired with small IMUs and described by the dominant grasping frequency and mean angle, and fitted with the clinical UPDRS scores using a regression model [16]. It is shown that the predicted scores are highly correlated with the clinical scores (the determination coefficient is r² = 0.99). A methodology that combines principal component analysis and multiple linear regression is applied to quantify bradykinesia severity in FT movements [13]. The method is applied to features that are extracted from the data recorded using magnetic sensors. It is shown that this approach can provide scores with a mean square error of 0.45 compared to the reference UPDRS FT scores. Another study presented a new approach that uses a motion capture system and dynamical features rather than standard spectral features for automatic scoring of the FT performance [5]. The results show strong and significant correlations with clinical scores. In order to describe bradykinesia in multi-joint upper limb movements, researchers have introduced new performance indexes that are correlated with UPDRS bradykinesia scores and implemented for differentiation between PD patients with and without bradykinesia [22]. A similar approach is designed for the evaluation of bradykinesia in walking and sit-to-stand tasks, in which novel performance indices are successfully used for differentiation between healthy subjects and PD patients, and ON and OFF states in patients [25]. A support vector machine (SVM) classifier applied to spectral and nonlinear features achieves high-accuracy results (accuracy, sensitivity, and specificity above 97%) for the prediction of UPDRS FT scores (0–3) [19]. However, the method is applied to gyro signals recorded from healthy subjects who mimic the impaired movements of PD patients. In another study, SVM was successfully applied (error below 5%) for estimation of the severity of several symptoms (bradykinesia, tremor, and dyskinesia) in 12 PD patients using the features extracted from accelerometer data describing multiple upper and lower extremity movements [20]. SVM was also applied for estimation of bradykinesia severity in a study comprising 78 PD patients and 18 healthy subjects, who were instructed to perform hand opening/closing for 10 s [4]. It was shown that SVM can predict clinical scores with an accuracy of 95.349%.
Decision trees, applied to features extracted from inertial signals, were also used for prediction of UPDRS scores for a pronation/supination task, showing a mean agreement of 0.48 with clinical ratings [10]. Although supervised machine learning algorithms provide prediction of clinical scores with high accuracy, the applied models are trained on a smaller dataset with subjectively defined data labels, which may cause subjectivity in the results as well. Because of that, some researchers have introduced different approaches to this topic. Decision rules can be designed to match exactly the criteria of the decision-making process and instructions applied in clinical practice. Fuzzy rules were applied for prediction of clinical scores using inertial data recorded during foot tapping [23] and hand pronation/supination movements in [9]. Their designed rules provide good results, with an accuracy of about 90%. Great Lake Technologies proposed a commercialized smartphone application, called Kinesia One, that provides clinical scores and subscores for different criteria for several bradykinesia tasks using an inertial sensor positioned on the index finger [26].
In this paper, we propose a new decision support system for the provision of clinical scores based on the use of inertial data describing finger-tapping movements. The proposed system uses novel metric and decision rules that are especially designed to capture and evaluate the relevant characteristics of the finger-tapping movement. The system provides very good results for data obtained from patients with idiopathic Parkinson's disease but also with atypical parkinsonism. The output of the system comprises the kinematic features describing the finger-tapping performance, a graphical presentation of the recorded data with marked irregularities, and important changes in the signal and bradykinesia severity scores.
2. Materials and Methods
2.1. Measurement System
The used system comprises two miniature (10 × 12 mm) and lightweight inertial sensors with three-dimensional (3D) gyroscopes L3G4200 (STMicroelectronics, Geneva, Switzerland) positioned over the fingernails of the thumb and index finger, as shown in Figure 1 [6]. Inertial sensors are connected to sensor-control units (SCUs). An SCU acquires and wirelessly transmits sensor data to a remote computer, where custom-made software controls data acquisition (developed in CVI 9.0, NI LabWindows, National Instruments, Austin, Texas, USA).
Figure 1. Illustration of the inertial sensor system, with local coordinate systems of the thumb (X1, Y1, Z1) and index finger (X2, Y2, Z2) sensors. SCU - sensor-control unit.
2.2. Subjects
Fifty-six subjects were recruited for this study from the Clinic of Neurology, Clinical Centre of Serbia, Belgrade. The subjects included 13 patients (Gender: seven male/six female, Age: 62.23 ± 10.79 years) with idiopathic Parkinson's disease (PD), 17 patients (Gender: five male/12 female, Age: 58.41 ± 6.41 years) with atypical parkinsonism multiple system atrophy (MSA), 14 patients (Gender: 11 male/three female, Age: 65.71 ± 9.33 years) with atypical parkinsonism progressive supranuclear palsy (PSP), and 12 healthy controls (HC) (Gender: four male/eight female, Age: 58.40 ± 7.78 years). The patients were tested during their "off" phase (after at least 12 h of treatment withdrawal, if possible). Descriptive statistics (average ± standard deviation and median) of the clinical data for each group of subjects are presented in Table 1, including the Hoehn and Yahr (H&Y) scale, total UPDRS, UPDRS-III (complete Motor examination scores), and scores given solely for the finger-tapping task by two neurologists, separately for the less- and more-affected hand.
Table 1. Descriptive statistics of the subjects' data.
Group | Statistic | H&Y | UPDRS Total | UPDRS III | FTN1 Score (Less AH) | FTN1 Score (More AH) | FTN2 Score (Less AH) | FTN2 Score (More AH)
PD | Avg ± std | 1.80 ± 0.79 | 42.60 ± 16.93 | 24.60 ± 9.07 | 1.67 ± 0.89 | 2.17 ± 0.94 | 1.75 ± 0.97 | 2.17 ± 0.94
PD | Median | 2 | 36 | 19.5 | 2 | 2 | 2 | 2
MSA | Avg ± std | 3.18 ± 0.75 | 77.73 ± 13.70 | 46.64 ± 9.08 | 2.31 ± 0.70 | 2.81 ± 0.54 | 2.38 ± 0.72 | 2.81 ± 0.54
MSA | Median | 3 | 79 | 45 | 2 | 3 | 2.5 | 3
PSP | Avg ± std | 3.45 ± 0.93 | 74.45 ± 20.08 | 42.91 ± 13.14 | 2.17 ± 0.94 | 2.62 ± 0.77 | 2.08 ± 0.79 | 2.77 ± 0.73
PSP | Median | 4 | 79 | 46 | 2.5 | 3 | 2 | 3
HC | Avg ± std | / | / | / | 0.44 ± 0.63 * | 0.50 ± 0.73 *
HC | Median | / | / | / | 0 * | 0 *
* For HC, the finger-tapping scores are averaged over both hands. PD: Parkinson's disease; MSA: multiple system atrophy; PSP: progressive supranuclear palsy; HC: healthy controls; H&Y: Hoehn and Yahr scale; UPDRS: Unified Parkinson's Disease Rating Scale; UPDRS III: Unified Parkinson's Disease Rating Scale, Part III (Motor Examination); FTN1: finger-tapping score provided by the first neurologist; FTN2: finger-tapping score provided by the second neurologist; Less AH: the less-affected hand; More AH: the more-affected hand.
2.3. Measurement Methodology
During the recordings, the subjects were seated in a chair with their arms bent and supported at the elbow and their hands placed in front of them. They were instructed to perform the finger-tapping test by tapping their thumb and index finger as quickly and as widely as possible for 15 s. Although the instructions provided in the UPDRS state that patients should tap their fingers 10 times [3], longer recordings were acquired in this study to ensure that sufficient data were available for analysis. To allow the subjects to become accustomed to the instrumentation and measurement methodology, several trials were recorded per hand for each subject, with one minute of rest between trials. Each trial was also recorded with a video camera, which filmed the hand in a close-up view. The most representative recording (one for each hand) was used for further analysis. It was selected by the neurologists as the recording that fulfills the requirements regarding tapping duration and the patient's understanding of the given instructions. The testing of each subject was performed during one day at the Clinic of Neurology, Clinical Centre of Serbia, Belgrade. The examination was carried out in accordance with the ethical standards of the Declaration of Helsinki and approved by the Ethical Committee of the School of Medicine, University of Belgrade. All of the participants provided informed consent prior to participation in the study.
2.4. Scoring by Neurologists
The recorded video data were later examined and scored by two neurologists with more than 10 years of experience, based on their knowledge and experience and the instructions given in the UPDRS, Part III (Motor Examination), task 3.4 Finger Tapping. The neurologists were blinded to the subjects' identity, since the video data show a close-up view of each subject's hand. The scores were given for each patient, separately for the left and right hand, and are provided in the last two columns of Table 1. The scores for the patients are reported separately for the less- and more-affected hand (averaged over all patients per group), whereas, for the healthy controls, the scores were averaged over both hands and all HC participants.
2.5. Data Processing and Analysis
The gyro data were recorded with a sampling frequency fs = 200 Hz. Calibrated data were processed in Matlab 9.0 R2016a (MathWorks, Natick, MA, USA).
The flowchart of the expert system for the calculation of UPDRS finger-tapping scores is presented in Figure 2. The inputs to the expert system are the angular velocities from the thumb (ω1) and index finger (ω2) sensors. No pre-processing was performed on the input signals. Owing to the sensors' placement, the coordinate system of the thumb (X1, Y1, Z1) and the coordinate system of the index finger (X2, Y2, Z2) are rotated with respect to each other (Figure 1). The angular velocities were transformed to and analyzed in the index-finger coordinate system. The relative angular velocity of the thumb with respect to the index finger was calculated as ωr = ω1 − ω2 [27]. The dominant component of the relative angular velocity, ωrd, was automatically selected and used as the input for further data processing and analysis [27]. It was shown that, in most cases, the dominant component of the relative angular velocity is about the Y2-axis of the index-finger coordinate system. In the other cases (when this component is not dominant), the coordinate system of the index finger was rotated so that the new Y2-axis represents the dominant rotation.
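For illustration, a minimal Python/NumPy sketch of this step is given below. It assumes that both calibrated gyro streams are already expressed in the index-finger coordinate frame as N × 3 arrays; the function name and the energy-based criterion for selecting the dominant axis are illustrative and only approximate the selection procedure of [27].

```python
import numpy as np

def dominant_relative_velocity(omega_thumb, omega_index):
    """Relative angular velocity of the thumb w.r.t. the index finger and
    its dominant component.

    omega_thumb, omega_index: (N, 3) arrays of calibrated angular velocities,
    both already expressed in the index-finger coordinate frame.
    """
    omega_rel = omega_thumb - omega_index        # omega_r = omega_1 - omega_2
    energy = np.sum(omega_rel ** 2, axis=0)      # per-axis signal energy
    dominant_axis = int(np.argmax(energy))       # usually the Y2 axis
    return omega_rel, omega_rel[:, dominant_axis], dominant_axis
```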
Figure 2. Block diagram of the expert system for UPDRS finger-tapping score calculation.
Further analysis was divided into one pre-processing block for segmentation into individual taps and four blocks that calculate features describing the criteria defined in the UPDRS test: tapping amplitude, amplitude decrement, hesitations and freezes, and tapping speed. The calculated features are then used as the input to the decision support system. As the result, a complete analysis of the patient's finger-tapping performance, including the finger-tapping score (0–4), is provided.
2.5.1. Individual Taps
In order to evaluate the characteristics of the tapping performance for individual taps, segmentation of the dominant component of the relative angular velocity ωrd was performed. A moving-average filter with a span equal to (fs/f0)/2, where f0 is the basic tapping frequency extracted from the spectrum, was applied to the observed signal. The filtered signal was normalized to its maximum value. From the obtained sequence, areas above 0.1 and below −0.1 were identified, corresponding to the regions where positive peaks and negative valleys are located, respectively. Local extrema were identified for each of the extracted regions. Positive peaks correspond to the maximal closing velocity (circles, Figure 3), whereas negative valleys represent the moments when the fingers achieve the maximal opening velocity (squares, Figure 3). Between each neighbouring maximum and minimum marker, the sample in which the smoothed angular velocity ωrd passes through zero for the first time was identified. These samples represent the moments when the fingers are closed ("zero posture"). The sequence was complemented with the first and last sample. The finally obtained samples were used as time markers for drift removal and segmentation into individual taps (crosses, Figure 3).
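The segmentation described above can be sketched as follows (Python with SciPy); the basic tapping frequency f0 is assumed to be known from the spectrum, and the peak-picking details of the original Matlab implementation may differ from this simplified version.

```python
import numpy as np
from scipy.signal import find_peaks

def tap_markers(omega_rd, fs, f0):
    """Locate closing-velocity peaks, opening-velocity valleys, and the
    'zero posture' samples used as tap boundaries."""
    span = max(int((fs / f0) / 2), 1)
    smoothed = np.convolve(omega_rd, np.ones(span) / span, mode="same")  # moving average
    smoothed = smoothed / np.max(np.abs(smoothed))                       # normalize

    peaks, _ = find_peaks(smoothed, height=0.1)      # maximal closing velocity
    valleys, _ = find_peaks(-smoothed, height=0.1)   # maximal opening velocity

    # first zero crossing between each pair of neighbouring extrema
    extrema = np.sort(np.concatenate([peaks, valleys]))
    zero_marks = []
    for a, b in zip(extrema[:-1], extrema[1:]):
        crossings = np.flatnonzero(np.diff(np.sign(smoothed[a:b])) != 0)
        if crossings.size:
            zero_marks.append(a + crossings[0])
    markers = np.unique(np.r_[0, zero_marks, len(omega_rd) - 1]).astype(int)
    return smoothed, peaks, valleys, markers
```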
Figure 3. An example of the normalized dominant component of the relative angular velocity ωrd for one MSA patient (ID: MSA11) with the extracted markers.
2.5.2. Amplitude
One of the evaluation criteria is the tapping amplitude, which evaluates how widely subjects can tap their fingers. The finger-tapping amplitude is defined as the angle that the fingers form during the repetitive tapping movements. The tapping angle was calculated by integrating the dominant component of the relative angular velocity ωrd [27]. The drift was removed by using a third-order polynomial fitted through the markers corresponding to the moments when the fingers are closed (i.e., the angle is equal to zero; red crosses in Figure 4). After drift removal, the obtained angle sequence was segmented into individual taps using the same time markers. The highest aperture of the fingers (the largest angle that the fingers form) was found for each individual tap and is expressed in degrees, α(i) (°) (black circles in Figure 4, lower panel). The final parametric result was calculated as the average of the maximum angles over all taps, αav (°).
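A possible implementation of the angle estimation and drift removal is sketched below (Python/NumPy); cumulative trapezoidal integration and numpy.polyfit are used here as stand-ins for the original Matlab routines, and the markers are those produced by the segmentation step.

```python
import numpy as np

def tapping_angles(omega_rd_deg, fs, markers):
    """Integrate the dominant angular velocity (deg/s), remove drift with a
    third-order polynomial through the 'zero posture' markers, and return
    the per-tap maximum apertures alpha(i) and their average alpha_av."""
    t = np.arange(len(omega_rd_deg)) / fs
    # cumulative trapezoidal integration of the angular velocity
    angle = np.concatenate(([0.0],
                            np.cumsum((omega_rd_deg[1:] + omega_rd_deg[:-1]) / 2.0) / fs))

    drift = np.polyval(np.polyfit(t[markers], angle[markers], deg=3), t)
    angle_corrected = angle - drift

    alpha = np.array([angle_corrected[a:b + 1].max()
                      for a, b in zip(markers[:-1], markers[1:])])
    return angle_corrected, alpha, float(alpha.mean())
```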
Figure 4. Upper panel: angle estimation. The dashed grey line marks the drifted angle sequence, and the solid black line corresponds to the angle sequence after drift removal. Red crosses show the "zero posture" markers, and the dotted red line presents the polynomial fit used for drift removal. Lower panel: angle amplitude decrement. The solid grey line shows the angle sequence, whereas black circles mark the angle amplitudes (highest finger apertures) per tap. The dashed red line presents the threshold THα used for the detection of decreased amplitudes. The example is given for one MSA patient (ID: MSA11).
2.5.3. Amplitude Decrement
Physicians evaluate the amplitude decrement according to the part of the tapping sequence at which the amplitude starts to decrease. In order to objectively quantify changes of the tapping amplitude, we observed tap-to-tap changes in the highest finger apertures calculated for the individual taps, α(i). The angle amplitude of each individual tap was compared with the previously achieved maximum aperture of the fingers. The threshold THα = 75% of the previous maximum finger aperture was selected as the optimum (as shown in Figure 4, lower panel). This threshold was determined heuristically through an extensive analysis of the signal database: threshold values from 50% to 90% (in steps of 5%) were tested, and the threshold of 75% provided the best results for the prediction of scores. Higher threshold values cause the detection of very small amplitude changes, which can appear due to normal movement variability, whereas lower threshold values detect angle decrements later in the tapping sequence, with some delay compared to the first real significant decrement.
The indices of all taps that satisfy this criterion for amplitude decrease were extracted using the chosen threshold THα (in the example shown in the lower panel of Figure 4, all taps are below the threshold except the first one, which is used as the reference for the calculation of the threshold). The index of the first tap in the obtained sequence was selected as the final parametric result and is marked with idec (in the example in Figure 4, that is the second tap and, therefore, idec = 2).
2.5.4. Hesitations and Freezes
Hesitations and freezes are manifested as irregularities or breaks of the tapping rhythm that may occur at different moments of the tapping performance and represent an important part of the finger-tapping evaluation. The continuous wavelet transform (CWT) was applied for the detection and localization of disruptions of the tapping rhythmicity [28]. It is a time-frequency analysis method that is suitable for the analysis of transient changes and spikes in rhythmic behaviour [29]. The CWT was applied to the dominant component of the relative angular velocity (ωrd). The CWT method based on the fast Fourier transform algorithm was used, together with a mother wavelet function from the complex Morlet family (center frequency f0 = 1 Hz and time-frequency resolution σ = 0.7). A matrix of complex CWT coefficients was obtained as a result. We introduced a cross-sectional area by summing the CWT coefficients perpendicular to the time axis. The obtained characteristic was normalized with respect to its maximum value and is expressed as a percentage (CSAT (%)). In this way, we obtain a characteristic that describes the temporal changes in the tapping activity [28]. An example of the CSAT characteristic for one patient is presented in Figure 5, lower panel. The samples were then divided according to two thresholds: TH50 = 50% of the average CSAT value and TH25 = 25% of the average CSAT value (the dashed grey and dotted black horizontal lines in Figure 5, respectively). Samples with values below the TH50 threshold and above the TH25 threshold were considered to be parts of hesitation sequences (Figure 5, dotted grey vertical lines with an "H" mark), whereas samples with the smallest amplitude (below TH25) were considered to be parts of freezes (Figure 5, dotted grey vertical lines with an "F" mark). If a hesitation sequence lasts more than three times the subject's average tapping period, it is considered to be a freeze sequence. In addition, very short sequences (shorter than one half of the average tapping period) were discarded from the analysis. The parametric result comprises the number of hesitation sequences Hnum and the number of freeze sequences Fnum. Using the average CSAT value for the thresholds ensures that the detection of irregularities is adapted to the intrinsic properties of each signal, considering the signal parts with significant losses in power (below 50% and 25% of the average) as irregularities. The values of the applied thresholds were verified through an extensive search of the database. All detected irregularities were confirmed by the neurologists during their visual inspection of the video recordings.
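The thresholding of the CSAT characteristic can be sketched as follows (Python/NumPy). The CWT is assumed to be computed elsewhere (complex Morlet wavelet); the grouping of consecutive low-power samples into runs and their classification into hesitations and freezes is a simplified rendering of the procedure described above, not an exact reproduction of the implemented rule set.

```python
import numpy as np

def hesitations_and_freezes(cwt_coeffs, fs, f_tap_avg):
    """Count hesitation and freeze sequences from the CSA_T characteristic.

    cwt_coeffs: complex CWT coefficient matrix (scales x time samples);
    f_tap_avg: the subject's average tapping frequency (Hz)."""
    csa_t = np.sum(np.abs(cwt_coeffs), axis=0)       # cross-sectional area over time
    csa_t = 100.0 * csa_t / csa_t.max()              # normalized, in percent
    th50 = 0.50 * csa_t.mean()                       # hesitation threshold
    th25 = 0.25 * csa_t.mean()                       # freeze threshold
    tap_period = 1.0 / f_tap_avg

    low = (csa_t < th50).astype(int)                 # candidate irregularity samples
    edges = np.flatnonzero(np.diff(np.r_[0, low, 0]))
    h_num = f_num = 0
    for start, stop in zip(edges[::2], edges[1::2]): # contiguous low-power runs
        duration = (stop - start) / fs
        if duration < 0.5 * tap_period:
            continue                                 # too short: discarded
        if duration > 3.0 * tap_period or csa_t[start:stop].min() < th25:
            f_num += 1                               # freeze
        else:
            h_num += 1                               # hesitation
    return csa_t, h_num, f_num
```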
Figure 5. Calculation of hesitations and freezes: the angular velocity ωrd (upper panel) and the calculated CSAT characteristic (lower panel). The solid grey horizontal line marks the average CSAT value. The dashed grey horizontal line corresponds to the upper threshold TH50 = 50% of the average CSAT value. The dotted black horizontal line shows the lower threshold TH25 = 25% of the average CSAT value. Dotted grey vertical lines mark the areas classified as hesitations ("H") and freezes ("F"). The example is given for one PSP patient (ID: PSP14).
2.5.5. Speed
An important criterion for the evaluation of bradykinesia in the finger-tapping task is the tapping speed. During the evaluation, neurologists examine how fast subjects are tapping. If subjects tap faster, they perform a larger number of taps during the 15 s of the tapping test, and vice versa. Although this can also be evaluated from the number of performed taps and their duration, the calculated matrix of CWT coefficients allows the dominant tapping frequency to be found for each time sample. In this way, all changes of the tapping rhythm are detected and included in the analysis. The vector of coefficients corresponding to one sample was extracted from the CWT matrix. From the obtained vector, the most prominent frequency was identified as the frequency at which the coefficient with the highest value is located (as shown for the i-th sample in Figure 6). The procedure was repeated for all samples, yielding the new frequency characteristic f(i). The average value of the frequency characteristic f(i) was calculated and is marked as f(i)av (Hz).
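This per-sample frequency extraction reduces to an arg-max over the scalogram columns, as in the following sketch. It assumes that the CWT coefficient matrix and the vector of frequencies associated with the wavelet scales are available; the names are illustrative.

```python
import numpy as np

def frequency_characteristic(cwt_coeffs, freqs):
    """Per-sample dominant tapping frequency f(i) from the CWT scalogram.

    cwt_coeffs: (n_freqs, n_samples) complex CWT matrix;
    freqs: (n_freqs,) frequencies (Hz) associated with the wavelet scales."""
    strongest = np.argmax(np.abs(cwt_coeffs), axis=0)   # row of the largest coefficient
    f_i = np.asarray(freqs)[strongest]                  # frequency characteristic f(i)
    return f_i, float(f_i.mean())                       # f(i) and its average f(i)av
```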
Figure 6. Calculation of the frequency characteristic: scalogram of the obtained continuous wavelet transform (CWT) coefficients. The dashed black line marks the i-th sample. The CWT coefficients at the i-th sample are presented in the smaller upper panel. The red dashed line in the upper panel marks the frequency with the highest amplitude of the CWT coefficients for the i-th sample (referred to as f(i)). The example is given for one PSP patient (ID: PSP14).
2.5.6. Decision Support System
In the UPDRS motor scale, Part III (Motor Examination), task 3.4 Finger Tapping [3], the instructions for bradykinesia evaluation are given as follows:
0. Normal: Regular rhythm, without hesitations or freezes. Fast movement, large amplitude, no amplitude decrement.
1. Slight: Any of the following: (a) the regular rhythm is broken with one or two interruptions or hesitations of the tapping movement; (b) slight slowing; (c) the amplitude decrements near the end of the 10 taps.
2. Mild: Any of the following: (a) three to five interruptions during tapping; (b) mild slowing; (c) the amplitude decrements midway in the 10-tap sequence.
3. Moderate: Any of the following: (a) over five interruptions during tapping or at least one freeze in ongoing movement; (b) moderate slowing; (c) the amplitude decrements starting after the first tap.
4. Severe: Cannot or can only barely perform the task due to slowing, interruptions, or decrements.
Each hand is evaluated separately, in terms of speed, amplitude, hesitations and freezes, and decrementing amplitude. These criteria are described with the introduced features, which are then fed to the decision support system. The input feature set includes the average tapping angle αav, the average frequency f(i)av, the index of the first tap with a significant angle amplitude decrement idec, the number of hesitations Hnum, and the number of freezes Fnum. The rules are defined separately for each feature to give the subscores for each criterion, which are afterwards used for the calculation of the final score. As indicated, the lowest score corresponds to "normal" movements. Therefore, the first step is to find values that can be considered the reference for normal movements. The defined methodology was initially applied to a signal database from the control group, which included a subset of healthy controls with no signs of bradykinesia (scored with 0). During the examination of the video files, it was noticed that both patients and healthy controls performed the tapping task in two different ways. In the first group, the subjects tapped as widely as possible at the highest speed that allows for such tapping. In the other group, the subjects tapped with smaller amplitudes, but at their fastest pace. Using the parameters describing the tapping amplitude and speed (αav and f(i)av, respectively), the selected healthy controls were divided into two clusters using the k-means algorithm. The coordinates of the cluster centers were used as the measure for discriminating between the two types of tapping performance.
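As an illustration of this clustering step, the sketch below uses scikit-learn's KMeans as a stand-in for the clustering implementation that was actually used; hc_features and test_features are assumed to be arrays of (αav, f(i)av) pairs.

```python
import numpy as np
from sklearn.cluster import KMeans

def performance_clusters(hc_features, test_features):
    """Split the healthy-control (alpha_av, f_av) pairs into two performance
    types and assign each test recording to the nearer cluster centre."""
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(hc_features)
    centres = km.cluster_centers_                        # centres of C1 and C2
    dists = np.linalg.norm(test_features[:, None, :] - centres[None, :, :], axis=2)
    return centres, np.argmin(dists, axis=1)             # 0 -> C1, 1 -> C2
```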
From all three groups of patients, we randomly selected 50% of the files and assigned them to the testing group. By calculating the distance between the cluster centers and the data pair (αav, f(i)av) obtained for each recording in the testing group, each recording was assigned to one of the two defined clusters (C1, "wider and slower"; C2, "narrower and faster"). The scores provided by the neurologists are given as a single final score and do not provide information about the different aspects (characteristics) of the performance that were analyzed. Because of that, it was necessary to apply an unsupervised learning algorithm to analyze the properties of the features and find a natural grouping among the data. The testing data corresponding to one of the parameters (αav or f(i)av) and one of the clusters (C1 or C2) were additionally divided into four clusters (corresponding to scores 0–3) using the k-means algorithm. Although there are five scores in the UPDRS test, the data were divided into four clusters, since the highest score (corresponding to the worst performance) is assigned to patients who can barely perform the task (the movement is affected by multiple types of disturbances simultaneously). The coordinates of the cluster centers (c1, c2, c3, c4) were used for the calculation of the decision boundaries:
bi = (ci + ci+1)/2, i = 1, 2, 3, (1)
where ci and ci+1 represent the centers of two neighbouring clusters and bi represents the calculated boundary separating two scores. The procedure was repeated for both clusters C1 and C2 and for both parameters αav and f(i)av separately (1: C1 and αav; 2: C2 and αav; 3: C1 and f(i)av; 4: C2 and f(i)av), resulting in four sets of boundaries, each with three values. For each analyzed file, the decision boundaries for the αav and f(i)av features were selected from these four sets. If the coordinate pair (αav, f(i)av) is closer to the center of cluster C1 than to the center of cluster C2, the patient's file is assigned to cluster C1 and the decision boundaries for cluster C1 (bα1,2,3 and bf1,2,3) are selected, and vice versa. The decision boundaries for the remaining features (idec, and Hnum and Fnum) were set to match the instructions and criteria given within the UPDRS scale (as listed above). The block scheme of the decision support system is presented in Figure 7. The first part of the decision-making process is divided into four blocks (each bordered with a dashed black line). The inputs of these blocks are the calculated features: αav, f(i)av, idec, and Hnum and Fnum, respectively. For each feature, a subscore is calculated separately, based on the range within which the feature value is located. In this way, the four processing blocks result in four subscores: Sα, Sf, Sdec, and SHF, respectively. If the subscore "3–Moderate" is obtained for at least three out of the four features, then the final score SFT is set to "4–Severe". Otherwise, the final score SFT is selected as the maximum obtained subscore among the four subscores corresponding to the individual features.
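The boundary construction in Equation (1) and the score aggregation can be summarized in a few lines (a sketch under the assumptions above; the subscore mapping is shown generically and does not reproduce the exact rule tables of the implemented system).

```python
import numpy as np

def decision_boundaries(centres):
    """Equation (1): b_i = (c_i + c_{i+1}) / 2 for the sorted centres
    c_1..c_4 obtained for one feature and one performance cluster."""
    c = np.sort(np.asarray(centres, dtype=float))
    return (c[:-1] + c[1:]) / 2.0                              # three boundaries b_1..b_3

def subscore(value, boundaries, higher_is_better=True):
    """Map a feature value to a 0-3 subscore using three boundaries."""
    rank = int(np.searchsorted(np.sort(boundaries), value))    # 0..3, counted upward
    return 3 - rank if higher_is_better else rank

def final_score(subscores):
    """S_FT = 4 if at least three subscores equal 3, otherwise the maximum."""
    s = list(subscores)
    return 4 if sum(v == 3 for v in s) >= 3 else max(s)
```

For αav and f(i)av, higher_is_better=True, since larger amplitudes and frequencies correspond to less impaired tapping; for counts such as Hnum and Fnum the opposite holds.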
Figure 7. The block scheme of the decision support system. The system is divided into four processing blocks (bordered with dashed black rectangles). The inputs to the blocks are the calculated features: αav, f(i)av, idec, and Hnum and Fnum, respectively. Each block implements rules and assigns a subscore for the input feature. The final score SFT is decided based on the results obtained from all four blocks.
2.5.7. Statistical Analysis and Evaluation
To find the agreement between the scores obtained from the two neurologists (raters), Cohen's kappa statistic for inter-rater reliability among categorical data was applied. The results obtained from the decision support algorithm were compared with the scores given by the neurologists. The performance was measured using the confusion matrix and the accuracy of the proposed method, expressed as the percentage of equally assigned scores. Initially, the results were evaluated for all recordings (Case I), and then for the recordings equally scored by both raters (Case II).
3. Results
The inter-rater reliability, calculated with Cohen's kappa statistic, equals κ = 0.79, showing some discrepancy between the raters' scores. This result is expected, since the scores are based on the raters' visual and subjective estimation. Overall, 87 recordings obtained from 44 patients (PD: 26 recordings, MSA: 34, PSP: 27) were included in the analysis, as well as 24 recordings obtained from 12 healthy controls. The descriptive statistics for the introduced features are given in Table 2 for each group of subjects separately.
Table 2. Descriptive statistics (average ± standard deviation) for each feature and group of subjects.
Group | f(i)av (Hz) | αav (°) | idec (#) | Hnum (#) | Fnum (#)
PD | 2.04 ± 0.87 | 63.08 ± 8.54 | 5.00 ± 5.66 | 0–4 | 0
MSA | 1.71 ± 1.26 | 56.27 ± 36.11 | 4.03 ± 4.74 | 0–7 | 0–2
PSP | 2.37 ± 1.11 | 44.87 ± 31.74 | 5.62 ± 4.88 | 0–4 | 0–1
HC | 3.32 ± 0.89 | 80.48 ± 26.55 | 11.00 ± 10.99 | / | /
The highest values of the αav and f(i)av parameters were obtained for the HC group.
Among the patients, the PD group achieved the largest angle amplitude values (on average); however, their tapping frequency was found to be lower (on average) than that of the PSP group. This discrepancy shows that we need to discriminate between the two types of movements and, consequently, use two sets of decision boundaries for these two features. The feature describing the angle decrement (idec) was found to be comparable among the patient groups. Although some HC also show a decrease in the angle amplitude, it is observed later in the tapping sequence (usually after the 10th tap). PD patients did not experience any freezing during the performance, whereas the number of hesitations was found to be comparable among the patient groups. Among the HC, none of the subjects experienced either a hesitation or a freeze. Figure 8 shows the obtained angle αav and frequency f(i)av features versus the calculated scores. The performance clusters are shown using a color- and shape-coded representation. It can be seen that the αav and f(i)av features decrease with higher scores, which is in line with the criteria observed within the UPDRS. In addition, it can be confirmed that the files assigned to cluster C1 are characterized by larger angle values and a lower tapping frequency, whereas cluster C2 includes files with a lower angle amplitude and a larger tapping frequency.
Figure 8. Dependency of (a) the calculated scores and the αav feature; (b) the calculated scores and the f(i)av feature. Grey circles mark samples assigned to the C1 cluster ("wider and slower" performance), whereas black crosses correspond to members of the C2 cluster ("narrower and faster" performance).
The results of the expert system are presented in Table 3, for each group separately as well as summarized for all patients. The results in the left column were obtained using all the recordings (Case I: 87 recordings; PD: 26, MSA: 34, PSP: 27) and averaged over the two raters, whereas the right column shows the results obtained using only the recordings equally scored by both raters (Case II: 76 recordings; PD: 25, MSA: 29, PSP: 22). The results are also presented in Figure 9 as confusion matrices.
Table 3. Results of the decision support system for each group of patients separately and in total. The result is provided for two cases: when all recordings are included in the analysis (Case I) and when only recordings with the same score from both raters are included in the analysis (Case II).
Group | Case I Accuracy (%) | Case II Accuracy (%)
PD | 82.69 ± 2.72 | 84.00
MSA | 82.36 ± 8.32 | 89.65
PSP | 83.76 ± 7.86 | 90.91
TOTAL | 83.33 ± 6.50 | 88.16
PD: Parkinson's disease; MSA: multiple system atrophy; PSP: progressive supranuclear palsy.
Figure 9. Presentation of the results using confusion matrices. (a) Case I: the result obtained when all recordings are included. (b) Case II: the result obtained using only the recordings on which both raters agreed. The cells on the diagonal of the confusion matrix show the overall success rate for each score (expressed as a percentage), whereas the cells outside the diagonal show the error rate for the scores (expressed as a percentage).
When all recordings are included in the analysis, comparable results are obtained for all three groups of patients. The decision support system provides results that agree with the scores of the neurologists with good accuracy (above 80%). This result improves when only the recordings equally scored by both raters are included in the analysis, achieving nearly 90% agreement between the system results and the neurologists' estimates. The scores given by the proposed decision system and the scores given by the neurologists (Figure 9) do not differ by more than one score, except for one patient. In Figure 10, we present the output of the expert system. The example is given for two patients who were equally scored by both raters and by our expert system (score SFT = 3).
Figure 10. The result of the expert system, comprising a graphical representation with the detected irregularities, the calculated features, and the final score. The example is given for one MSA patient (ID: MSA11), right hand, and one PSP patient (ID: PSP14), right hand.
4. Discussion
In this paper, we introduced a new methodology that enables objective evaluation and quantification of the finger-tapping test that is commonly used for bradykinesia assessment in patients with Parkinson's disease.
The system comprises two miniature and lightweight gyro sensors that record the motion of the fingers. The methodology for signal quantification is based on the use of simple, automated, and repeatable signal-processing techniques. In this study, 15 s long finger-tapping sequences were recorded and analyzed; however, the methodology is applicable to other protocols as well (e.g., a 10-tap-long sequence). Patients with finger-tapping bradykinesia severity ranging from 0 to 4 were included in this study. Most of the studies in the literature include severity stages only up to 3, indicating that patients with the highest severity cannot perform the specified tasks at all. In this study, we included three patients who barely managed to perform the finger-tapping task with one of their hands. Their performance was poor and affected by multiple performance disturbances, and they were therefore evaluated with the highest bradykinesia severity score (4). In this way, the performance of the proposed system was examined over the entire range of severity stages. The parametric result comprises features that can be directly correlated with the biomechanical properties of the movement and can, therefore, be used to assist physicians during the assessment of bradykinesia in repetitive finger tapping. For this purpose, we implemented the continuous wavelet transform, a time-frequency method that allows us to evaluate temporal changes in the tapping frequency and to detect irregularities in the rhythmic tapping behaviour, such as hesitations and freezes. Numerical integration and drift removal were used for the estimation of the tapping angle apertures, which are calculated for each tap. Temporal changes in the angle apertures were detected and described by the index of the first tap with a significant amplitude decrement (compared to the previously achieved tapping angle amplitudes). The final feature set is used as the input to the decision support system. Although the literature suggests that machine learning algorithms can predict scores with a high degree of accuracy, the labels used for learning are given by physicians, which may introduce subjectivity into the obtained results. Typically, only a few dozen recordings are used for learning, which is not a sufficient number of training examples to obtain a clinically acceptable system. Therefore, the decision support system consists of simple rules with decision boundaries designed to match the UPDRS scoring criteria. The decision boundaries for the tapping frequency and angle amplitude are defined according to the feature values obtained from the healthy controls and the testing group of patients using clustering techniques. As shown in Figure 8, smaller angle amplitudes and frequencies correspond to higher scores, which is in line with the criteria defined within the UPDRS. The boundaries differ and are selected based on the type of movement: wider and slower (cluster C1, grey circles in Figure 8) or narrower and faster (cluster C2, black crosses in Figure 8). In addition, by using clustering techniques, the boundaries are not defined linearly or empirically, but solely based on the grouping of randomly selected testing data. The decision boundaries for the two remaining criteria are defined to match the rules given in the UPDRS.
The results of the decision support system (Table 3) demonstrate that the expert system achieved an overall accuracy of 83.33 ± 6.50% (averaged over both raters); per group, the accuracy is 82.69 ± 2.72% for PD, 82.36 ± 8.32% for MSA, and 83.76 ± 7.86% for PSP patients. When only the recordings evaluated with the same score by both raters are analyzed, the overall accuracy of the system increases to 88.16%, with 84.00% for PD, 89.65% for MSA, and 90.91% for PSP patients. In the latter case, the decision support system provides wrong scores for only nine of the 76 recordings. The decision support system provides very good results even for atypical parkinsonism, in which finger tapping can be performed differently compared to the typical PD form [27]. With a larger number of patients, fine tuning of the decision boundaries could be performed, providing even better results. The differences between the obtained parameters in Table 2 indicate that this methodology could also be used as a basis for the differential diagnosis of typical and atypical parkinsonism. As shown in the confusion matrices in Figure 9, the quantification errors are equal to a one-score difference, except for one PD patient, for whom our system provides a result two scores lower than the score given by both raters. The question that arises is whether a scale with such a small resolution, offering only five grades of performance (0–4), is sufficient. Figure 10 shows the results of the expert system for two patients evaluated with the same score (SFT = 3). These two performances are different: the first one has a lower tapping frequency and higher angle apertures, with a significant decrease in the angle amplitude after the first tap, whereas the second one has large variations in frequency and angle amplitude, as well as four hesitations and one freeze. For the first patient, the resulting score is given based on the early amplitude decrease, whereas, for the second patient, the score is given based on the number of performance irregularities. The UPDRS instructions state that a score is to be given if any of the criteria (speed, amplitude, amplitude decrease, hesitations/freezes) is satisfied. If different criteria, or more than one criterion, are satisfied for different patients, their performances cannot be directly compared. For this reason, some researchers have introduced continuous scoring of the repetitive hand motions used for bradykinesia evaluation. Although such approaches provide a more detailed scoring system, the evaluation does not correspond to the standardized clinical scores and may be confusing to physicians. The proposed system provides a complete analysis of repetitive finger-tapping performance, objective measures of the important biomechanical properties of the movement, and a graphical presentation of the recorded data with specific changes and irregularities marked. The system can differentiate between different types of performance and provides decision support through automatically calculated scores and subscores for the different criteria used in the evaluation of the movements. The scores are given using rules that are specifically designed to match the universal clinical criteria for the evaluation of bradykinesia severity in repetitive finger-tapping movements. In addition, the system was tested on patients with different forms of parkinsonism, at different disease stages, and over the full range of symptom severity (from normal to severely impaired movements).
The proposed expert system is detailed and objective and can therefore be used as a powerful support tool in clinical practice for the evaluation of symptom severity, for monitoring disease progression and a patient's response to therapy, and for comparisons with other patients. In the future, an intuitive graphical interface for a software application will be developed to provide a graphical presentation, numerical results for the features, scores and subscores, and a statistical analysis. Future work will also include collaborations with other researchers and groups to obtain larger databases and to augment the data for analysis in terms of the included subjects and the number of tests used to assess bradykinesia and other motor symptoms. We also plan to develop a metric for more efficient differential diagnosis of typical and atypical parkinsonism.
Author Contributions: V.B. participated in data acquisition, development of the proposed method, processing the results, and writing the manuscript. M.D.-J. participated in the design of the study, data acquisition, development of the proposed method, and writing the manuscript. N.D. participated in the design of the study, organization and coordination of data acquisition, testing and validation of the method, and writing the manuscript. M.B.P. participated in the design of the study and writing and reviewing the manuscript. V.S.K. participated in the design of the study and writing and reviewing the manuscript. G.K. participated in the development of the proposed method and writing and reviewing the manuscript.
Funding: This research was funded in part by the Serbian Ministry of Education, Science and Technological Development under Grant No. OI-175016.
Acknowledgments: We would like to acknowledge the PhD student Minja Belić for assisting with the recordings.
Conflicts of Interest: The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
1. Rovini, E.; Maremmani, C.; Cavallo, F. How wearable sensors can support Parkinson's disease diagnosis and treatment: A systematic review. Front. Neurosci. 2017, 11, 555. [CrossRef] [PubMed]
2. Jankovic, J. Parkinson's disease: Clinical features and diagnosis. J. Neurol. Neurosurg. Psychiatry 2008, 79, 368–376. [CrossRef] [PubMed]
3. MDS UPDRS Rating Scale. Available online: https://www.movementdisorders.org/MDS-Files1/PDFs/Rating-Scales/MDS-UPDRS_English_FINAL.pdf (accessed on 31 January 2019).
4. Lin, Z.; Xiong, Y.; Cai, G.; Dai, H.; Xia, X.; Tan, Y.; Lueth, T.C. Quantification of Parkinsonian Bradykinesia Based on Axis-Angle Representation and SVM Multiclass Classification Method. IEEE Access 2018, 6, 26895–26903. [CrossRef]
5. Lainscsek, C.; Rowat, P.; Schettino, L.; Lee, D.; Song, D.; Letellier, C.; Poizner, H. Finger tapping movements of Parkinson's disease patients automatically rated using nonlinear delay differential equations. Chaos: An Interdiscip. J. Nonlinear Sci. 2012, 22, 013119. [CrossRef] [PubMed]
Djuric-Jovicic, M.; Jovicic, N.; Radovanovic, S.; Jecmenica-Lukic, M.; Belic, M.; Popovic, M.; Kostic, V. Finger and foot tapping sensor system for objective motor assessment. Vojnosanit. Pregl. 2018. [CrossRef] 7. Kim, J.-W.; Lee, J.-H.; Kwon, Y.; Kim, C.-S.; Eom, G.-M.; Koh, S.-B.; Kwon, D.-Y.; Park, K.-W. Quantification of bradykinesia during clinical finger taps using a gyrosensor in patients with Parkinson’s disease. Med. Biol. Eng. Comput. 2011, 49, 365–371. [CrossRef] [PubMed] 8. Stamatakis, J.; Ambroise, J.; Crémers, J.; Sharei, H.; Delvaux, V.; Macq, B.; Garraux, G. Finger Tapping Clinimetric Score Prediction in Parkinson’s Disease Using Low-Cost Accelerometers. Comput. Intell. Neurosci. 2013, 2013, 1–13. [CrossRef] [PubMed] 9. Garza-Rodríguez, A.; Sánchez-Fernández, L.P.; Sánchez-Pérez, L.A.; Ornelas-Vences, C.; Ehrenberg-Inzunza, M. Pronation and supination analysis based on biomechanical signals from Parkinson’s disease patients. Artif. Intell. Med. 2018, 84, 7–22. [CrossRef] [PubMed] 10. Piro, N.; Piro, L.; Kassubek, J.; Blechschmidt-Trapp, R.; Piro, N.E.; Piro, L.K.; Kassubek, J.; Blechschmidt-Trapp, R.A. Analysis and Visualization of 3D Motion Data for UPDRS Rating of Patients with Parkinson’s Disease. Sensors 2016, 16, 930. [CrossRef] 11. Ferraris, C.; Nerino, R.; Chimienti, A.; Pettiti, G.; Cau, N.; Cimolin, V.; Azzaro, C.; Albani, G.; Priano, L.; Mauro, A.; et al. A Self-Managed System for Automated Assessment of UPDRS Upper Limb Tasks in Parkinson’s Disease. Sensors 2018, 18, 3523. [CrossRef] 12. Lee, W.L.; Sinclair, N.C.; Jones, M.; Tan, J.L.; Proud, E.L.; Peppard, R.; McDermott, H.J.; Perera, T. Objective evaluation of bradykinesia in Parkinson’s disease using an inexpensive marker-less motion tracking system. Physiol. Meas. 2019, 40, 014004. [CrossRef] [PubMed] 13. Sano, Y.; Kandori, A.; Shima, K.; Yamaguchi, Y.; Tsuji, T.; Noda, M.; Higashikawa, F.; Yokoe, M.; Sakoda, S. Quantifying Parkinson’s disease finger-tapping severity by extracting and synthesizing finger motion properties. Med. Biol. Eng. Comput. 2016, 54, 953–965. [CrossRef] [PubMed] 14. Arora, S.; Venkataraman, V.; Zhan, A.; Donohue, S.; Biglan, K.M.M.; Dorsey, E.R.R.; Little, M.A.A. Detecting and monitoring the symptoms of Parkinson’s disease using smartphones: A pilot study. Park. Relat. Disord. 2015, 21, 650–653. [CrossRef] [PubMed] 15. Van den Noort, J.C.; Verhagen, R.; van Dijk, K.J.; Veltink, P.H.; Vos, M.C.P.M.; de Bie, R.M.A.; Bour, L.J.; Heida, C.T. Quantification of Hand Motor Symptoms in Parkinson’s Disease: A Proof-of-Principle Study Using Inertial and Force Sensors. Ann. Biomed. Eng. 2017, 45, 2423–2436. [CrossRef] [PubMed] 16. Lin, Z.; Dai, H.; Xiong, Y.; Xia, X.; Horng, S.-J. Quantification assessment of bradykinesia in Parkinson’s disease based on a wearable device. In Proceedings of the 2017 39th IEEE Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju Island, Korea, 11–15 July 2017; pp. 803–806. 17. Dai, H.; Lin, H.; Lueth, T.C. Quantitative assessment of parkinsonian bradykinesia based on an inertial measurement unit. Biomed. Eng. Online 2015, 14, 68. [CrossRef] [PubMed] 18. McKay, G.N.; Harrigan, T.P.; Brašić, J.R. A low-cost quantitative continuous measurement of movements in the extremities of people with Parkinson’s disease. MethodsX 2019, 6, 169–189. [CrossRef] [PubMed] 19. Alam, M.; Tabassum, T.; Munia, K.; Tavakolian, K. 
A Quantitative Assessment of Bradykinesia Using Inertial Measurement Unit. In Proceedings of the 2017 Design of Medical Devices Conference, Minneapolis, MN, USA, 10–13 April 2017. 20. Patel, S.; Lorincz, K.; Hughes, R.; Huggins, N.; Growdon, J.; Standaert, D.; Akay, M.; Dy, J.; Welsh, M.; Bonato, P. Monitoring motor fluctuations in patients with Parkinson’s disease using wearable sensors. IEEE Trans. Inf. Technol. Biomed. 2009, 13, 864–873. [CrossRef] 21. Djurić-Jovičić, M.; Petrović, I.; Ječmenica-Lukić, M.; Radovanović, S.; Dragašević-Mišković, N.; Belić, M.; Miler-Jerković, V.; Popović, M.B.; Kostić, V.S. Finger tapping analysis in patients with Parkinson’s disease and atypical parkinsonism. J. Clin. Neurosci. 2016, 30, 49–55. [CrossRef] 22. Delrobaei, M.; Tran, S.; Gilmore, G.; McIsaac, K.; Jog, M. Characterization of multi-joint upper limb movements in a single task to assess bradykinesia. J. Neurol. Sci. 2016, 368, 337–342. [CrossRef] 23. Ornelas-Vences, C.; Sánchez-Fernández, L.P.; Sánchez-Pérez, L.A.; Martínez-Hernández, J.M. Computer model for leg agility quantification and assessment for Parkinson’s disease patients. Med. Biol. Eng. Comput. 2019, 57, 463–476. [CrossRef] 24. Mentzel, T.Q.; Lieverse, R.; Levens, A.; Mentzel, C.L.; Tenback, D.E.; Bakker, P.R.; Daanen, H.A.M.; van Harten, P.N. Reliability and validity of an instrument for the assessment of bradykinesia. Psychiatry Res. 2016, 238, 189–195. [CrossRef] [PubMed] 25. Memar, S.; Delrobaei, M.; Pieterman, M.; McIsaac, K.; Jog, M. Quantification of whole-body bradykinesia in Parkinson’s disease participants using multiple inertial sensors. J. Neurol. Sci. 2018, 387, 157–165. [CrossRef] [PubMed] 26. Kinesia ONE™. Available online: https://glneurotech.com/kinesia/products/kinesia-one/ (accessed on 13 May 2019). 27. Djurić-Jovičić, M.; Jovičić, N.S.; Roby-Brami, A.; Popović, M.B.; Kostić, V.S.; Djordjević, A. Quantification of Finger-Tapping Angle Based on Wearable Sensors. Sensors 2017, 17, 203. [CrossRef] [PubMed] 28. Bobic, V.; Djuric-Jovicic, M.; Jarrasse, N.; Jecmenica-Lukic, M.; Petrovic, I.; Radovanovic, S.; Dragasevic, N.; Kostic, V. Spectral parameters for finger tapping quantification. Facta Univ.-Ser. Electron. Energ. 2017, 30, 585–597.
[CrossRef] 29. Senhadji, L.; Wendling, F. Epileptic transient detection: Wavelets and time-frequency approaches. Neurophysiol. Clin. Neurophysiol. 2002, 32, 175–192. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

work_3b2a2vjlanay3ksoid7xvudtrq ---- Piecewise linear value functions for multi-criteria decision-making
Rezaei, Jafar (Delft University of Technology). Published in: Expert Systems with Applications. Publication date: 2018. Document version: final published version. DOI: 10.1016/j.eswa.2018.01.004
Citation (APA): Rezaei, J. (2018). Piecewise linear value functions for multi-criteria decision-making. Expert Systems with Applications, 98, 43-56. https://doi.org/10.1016/j.eswa.2018.01.004
Expert Systems With Applications 98 (2018) 43–56

Piecewise linear value functions for multi-criteria decision-making
Jafar Rezaei, Faculty of Technology Policy and Management, Delft University of Technology, Delft, The Netherlands. E-mail address: j.rezaei@tudelft.nl

Article history: Received 24 August 2017; Revised 1 January 2018; Accepted 2 January 2018; Available online 3 January 2018.
Keywords: Multi-criteria decision-making; MCDM; Decision criteria; Value function; Monotonicity

Abstract: Multi-criteria decision-making (MCDM) concerns selecting, ranking or sorting a set of alternatives which are evaluated with respect to a number of criteria. There are several MCDM methods, the two core elements of which are (i) evaluating the performance of the alternatives with respect to the criteria, and (ii) finding the importance (weight) of the criteria. There are several methods to find the weights of the criteria; however, when it comes to the alternative measures with respect to the criteria, the existing MCDM methods usually use simple monotonic linear value functions. Usually an increasing or decreasing linear function is assumed between a criterion level (over its entire range) and its value. This assumption, however, might lead to improper results. This study proposes a family of piecewise value functions which can be used for different decision criteria in different decision problems. Several real-world examples from the existing literature are provided to illustrate the applicability of the proposed value functions. A numerical example of supplier selection (including a comparison between simple monotonic linear value functions, piecewise linear value functions, and exponential value functions) shows how considering proper value functions could affect the final results of an MCDM problem. © 2018 Elsevier Ltd. All rights reserved.

1. Introduction

Decision theory is primarily concerned with identifying the best decision. In many real-world situations the decision is to select the best alternative(s) from among a set of alternatives considering a set of criteria. This subdivision of decision-making, which has gained enormous attention in the recent past due to its practical value, is called multi-criteria decision-making (MCDM). More precisely, MCDM concerns problems in which the decision-maker faces m alternatives (a_1, a_2, …, a_m), which should be evaluated with respect to n criteria (c_1, c_2, …, c_n), in order to find the best alternative(s), or to rank or sort them. In most cases, an additive value function is used to find the overall value of alternative i, U_i, as follows:

U_i = \sum_{j=1}^{n} w_j u_{ij},   (1)

where u_{ij} is the value of alternative i with respect to criterion j, and w_j shows the importance (weight) of criterion j. In some problems, the decision-maker is able to find u_{ij} from external sources as objective measures; in other problems, u_{ij} reflects a qualitative evaluation provided by the decision-maker(s), experts or users as subjective measures. The price of a car is an objective criterion, while the comfort of a car is a subjective one. For objective criteria, we usua-
lly use physical quantities, for instance, ‘International System of nits’ (SI), while for subjective criteria, we do not have such stan- ards, which is why we mostly use pairwise comparison, linguis- ic variables, or Likert scales in order to evaluate the alternatives ith regard to such criteria. In order to find the weights, w j , the ecision-maker might use different tools and methods, from the implest way, which is assigning weights to the criteria intuitively, o use simple methods like SMART (simple multi-attribute rating echnique) ( Edwards, 1977 ), to more structured methods like mul- iple attribute utility theory (MAUT) ( Keeney & Raiffa, 1976 ), ana- ytic hierarchy process (AHP) ( Saaty, 1977 ), and best worst method BWM) ( Rezaei, 2015, 2016 ). While these methods are usually alled ‘multi attribute utility and value theories’ ( Carrico, Hogan, yson, & Athanassopoulos, 1997 ), there is another class of meth- ds, called outranking methods, like ELECTRE (ELimination and hoice Expressing REality) family ( Roy, 1968 ), PROMETHEE meth- ds ( Brans, Mareschal, & Vincke, 1984 ) which do not necessarily eed the weights to select, rank or sort the alternatives. What, owever, is in common in these methods is the way they consider he nature of the criteria. That is to say, in the current literature, ne of the common assumptions about the criteria (most of the ime it is not explicitly mentioned in the literature), is monotonic- ty. efinition 1 ( Keeney & Raiffa, 1976 ). Let u represents a value func- ion for criterion X , then u is monotonically increasing if: x 1 > x 2 ] ⇔ [ u ( x 1 ) > u ( x 2 ) ] . (2) https://doi.org/10.1016/j.eswa.2018.01.004 http://www.ScienceDirect.com http://www.elsevier.com/locate/eswa http://crossmark.crossref.org/dialog/?doi=10.1016/j.eswa.2018.01.004&domain=pdf mailto:j.rezaei@tudelft.nl https://doi.org/10.1016/j.eswa.2018.01.004 44 J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 V Fig. 1. Increasing value function. t a w b s v t c f 2 fi t s w d s s a d t 2 f r i u less efficient fuel energy. 1 It is worth-mentioning that the studies we discuss to support each value func- tion have some theoretical or practical support for the proposed value functions. It does not, however, mean that those studies have used these value functions in their analysis. Definition 2 ( Keeney & Raiffa, 1976 ). Let u represents a value func- tion for criterion X , then u is monotonically decreasing if: [ x 1 > x 2 ] ⇔ [ u ( x 1 ) < u ( x 2 ) ] . (3) A function which is not monotonic is called non-monotonic and may have different shapes. For instance, a value function with the first part increasing and the second part decreasing called non- monotonic, by splitting of which, we have two monotonic func- tions. This assumption – monotonicity – however, is an oversimpli- fication in some real-world decision-making problems. Another simplification is the use of simple linear functions over the en- tire range of a criterion. Considering the two assumptions (mono- tonicity, linearity), we usually see simple increasing and decreas- ing linear value functions for the decision criteria in MCDM prob- lems. The literature is full of such applications. 
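To make this common assumption concrete, the following minimal Python sketch combines simple monotonic linear value functions of the kind described by Definitions 1 and 2 with the additive aggregation of Eq. (1). The function names, the criterion ranges and the weights are illustrative assumptions and are not taken from the paper; the price range reuses the car example mentioned in the text.

```python
# A minimal sketch (not the paper's code) of the two ingredients discussed above:
# simple monotonic linear value functions over a criterion range [d_l, d_u],
# and the additive aggregation of Eq. (1). Names and numbers are illustrative.

def linear_increasing(x, d_l, d_u):
    """Monotonically increasing linear value over [d_l, d_u] (cf. Definition 1)."""
    if d_l <= x <= d_u:
        return (x - d_l) / (d_u - d_l)
    return 0.0

def linear_decreasing(x, d_l, d_u):
    """Monotonically decreasing linear value over [d_l, d_u] (cf. Definition 2)."""
    if d_l <= x <= d_u:
        return (d_u - x) / (d_u - d_l)
    return 0.0

def overall_value(values, weights):
    """Additive model of Eq. (1): U_i = sum_j w_j * u_ij."""
    return sum(w * u for w, u in zip(weights, values))

if __name__ == "__main__":
    # Hypothetical car example: price (the lower the better) and quality (the higher the better).
    u_price = linear_decreasing(21000, 17000, 25000)   # -> 0.5
    u_quality = linear_increasing(80, 0, 100)          # -> 0.8
    print(overall_value([u_price, u_quality], [0.6, 0.4]))
```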
For instance, many of the studies reviewed in the following review papers implic- itly adopt such assumptions: the MCDM applications in supplier selection ( Ho, Xu, & Dey, 2010 ), in infrastructure management ( Kabir, Sadiq, & Tesfamariam, 2014 ), in sustainable energy plan- ning ( Pohekar & Ramachandran, 2004 ), and in forest management and planning ( Ananda & Herath, 2009 ). While in some studies the use of monotonic and/or linear value function might be logical, their use in some other applications might be unfitting. For in- stance, Alanne, Salo, Saari, and Gustafsson (2007) , for evaluation of residential energy supply systems use monotonic-linear value functions for all the selected evaluation criteria including “global warming potential (kg CO 2 m −2 a −1 )”, and “acidification potential (kg SO 2 m −2 a −1 )”. Considering a monotonic-linear value function for such criteria implies that the decision-maker accepts any level of such harmful environmental criteria for an energy supply sys- tem. However, if the decision-maker does not accept some high levels of such criteria (which seems logical), a piecewise linear function might better represent the preferences of the decision- maker (see the decrease-level value function in the next section). Some authors have discussed nonlinear monotonic value functions (e.g., exponential value functions by Kirkwood, 1997; Pratt, 1964 ). Others use qualitative scoring to address the non- monotonicity ( Brugha, 20 0 0; Kakeneno & Brugha, 2017; O’Brien & Brugha, 2010 ). We can also find some forms of eliciting piecewise linear value function in Jacquet-Lagreze and Siskos (2001 ), and Stewart and Janssen (2013 ). Some other value function construc- tion or elicitation frameworks can be found in Herrera, Herrera- iedma, and Verdegay (1996 ), Lahdelma and Salminen (2012 ), Mustajoki and Hämäläinen (20 0 0 ), Stewart and Janssen (2013 ), and Yager (1988 ). Although in PROMETHEE we use different types of piecewise functions for pairwise comparisons ( Brans, Mareschal, & Vincke, 1984 ), the functions are not used to evaluate the decision criteria. So, despite some effort s in literature, there is no a library of some standard piecewise linear value functions which can be used in different methods like AHP or BWM. It is also important to note that while in many studies value functions are elicited ac- cording to the preference data we have from the decision-maker(s), in MCDM, usually we use the value function as a subjective input. This implies that, in MCDM methods (except a few methods, such as UTA), the value function is not elicited, but an approximation is used. This also suggests that the rich literature on determining and eliciting value functions is not actually helping MCDM methods in this area. In this paper, first, a number of piecewise linear value functions with different shapes are proposed to be considered for the decision criteria. It is then shown, with some real-world ex- amples, how such consideration might change the final results of a decision problem. A comparison between simple monotonic linear value functions, piecewise linear value functions, and exponential value functions is conducted, which shows the effectiveness of the proposed pricewise value functions. This is a significant contribu- ion to this field and it is expected to be widely used by MCDM pplications. In the next section, some piecewise linear value functions along ith some real-world examples are presented, which is followed y some remarks in Section 3 . 
In Section 4 , some numerical analy- es are used to show the applicability of considering the proposed alue functions in a decision problem. In Section 5 , the determina- ion of the value functions is discussed. In Section 6 , the paper is oncluded, some limitations of the study are discussed, and some uture research directions are proposed. . Piecewise linear value functions In this section, a number of piecewise value functions are de- ned for decision criteria. We provide some example cases from he existing literature or practical decision-making problems to upport 1 each value function. In all the following value functions e consider [ d l j , d u j ] as the defined domain for the criterion by the ecision-maker; x ij shows the performance of alternative i with re- pect to criterion j ; and u ij shows the value of alternative i with re- pect to criterion j . For instance, if a decision-maker wants to buy car considering price as one criterion, if all the alternatives the ecision-maker considers are between €17,0 0 0 and €25,0 0 0, then he criterion might be defined for this range [17,0 0 0, 25,0 0 0] . .1. Increasing Increasing value function is perhaps the most commonly used unction in MCDM applications. It basically shows that as the crite- ion level, x ij , increases, its value, u ij , increases as well. It is shown n Fig. 1 and formulated as follows: i j = ⎧ ⎨ ⎩ x i j − d l j d u j − d l j , d l j ≤ x i j ≤ d u j , 0 , otherwise . (4) For this function we can think of: • Product quality in supplier selection ( Xia & Wu, 2007 ). Con- sidering a set of suppliers, a buyer may always prefer a sup- plier with a higher product quality compared to a supplier with lower product quality. • Energy efficiency in alternative-fuel bus selection ( Tzeng, Lin, & Opricovic, 2005 ). Considering a set of buses, a bus with more efficient fuel energy might always be preferred to a bus with J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 45 Fig. 2. Decreasing value function. Fig. 3. V-shape value function. 2 i l u 2 c a i u r t Fig. 4. Inverted V-shape value function. 2 x a F u 2 i t l u .2. Decreasing Decreasing value function shows that as the criterion level, x ij , ncreases, its value, u ij , decreases. It is shown in Fig. 2 and formu- ated as follows: i j = ⎧ ⎨ ⎩ d u j − x i j d u j − d l j , d l j ≤ x i j ≤ d u j , 0 , otherwise . (5) For this function we can think of: • Product price in supplier selection ( Xia & Wu, 2007 ). Consid- ering a set of suppliers, a supplier with a lower product price might always be preferred to a supplier with higher product price. So, a higher product price has a lower value. • Maintenance cost in alternative-fuel bus selection ( Tzeng et al., 2005 ). Considering a set of buses, a bus with less maintenance cost might be preferred to a bus with higher maintenance cost. So, a higher maintenance cost is associated with a lower value. .3. V-shape V-shape value function shows that as the criterion level, x ij , in- reases up to a certain level, d m j , its value, u ij , decreases gradually, nd after that certain level, d m j , its value, u ij , increases gradually. It s shown in Fig. 3 and formulated as follows: i j = ⎧ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎩ d m j − x i j d m j − d l j , d l j ≤ x i j ≤ d m j , x i j − d m j d u j − d m j , d m j ≤ x i j ≤ d u j , 0 , otherwise . (6) For this function, we could not find many examples, and it may epresent a small number of very particular decision criteria. 
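As a sketch of how such a V-shape criterion could be scored in practice, the snippet below implements the piecewise linear form of Eq. (6). The domain [0, 100] and the interior point d_m = 40 are hypothetical choices made only for illustration.

```python
# A small sketch (illustrative, not the paper's code) of the V-shape value
# function of Eq. (6): the value falls linearly from 1 at d_l to 0 at the
# interior point d_m, then rises linearly back to 1 at d_u.

def v_shape(x, d_l, d_m, d_u):
    """Piecewise linear V-shape value function on [d_l, d_u] with its minimum at d_m."""
    if d_l <= x <= d_m:
        return (d_m - x) / (d_m - d_l)
    if d_m < x <= d_u:
        return (x - d_m) / (d_u - d_m)
    return 0.0

if __name__ == "__main__":
    # Hypothetical criterion on [0, 100] whose extreme levels are preferred
    # to mid-range levels, with the least-valued level at d_m = 40.
    for level in (0, 20, 40, 70, 100):
        print(level, round(v_shape(level, 0, 40, 100), 2))
```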
For his function we can think of: • Relative market share in selecting a firm for investment ( Wilson & Anell, 1999 ). Wilson and Anell (1999) found that for invest- ment decision-making, firms with low and high market share are more desirable to the investors. This implies that the value of a firm decreases while its market share increases up to a cer- tain level, d m , and after that its value increases again. • Firm size in R&D productivity ( Tsai & Wang, 2005 ). Tsai and Wang (2005) found that both small and large firms have higher R&D productivity compared to medium-sized firms. This is true for both high-tech and traditional industries. This means that the relationship between size and value (measured by R&D pro- ductivity) is V-shape with a minimum level of value assigned to a certain size of d m j . .4. Inverted V-shape Inverted V-shape value function shows that as the criterion level, ij , increases up to a certain level, d m j , its value, u ij , increases, and fter that certain level, d m j , its value, u ij , decreases. It is shown in ig. 4 and formulated as follows: i j = ⎧ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎩ x i j − d l j d m j − d l j , d l j ≤ x i j ≤ d m j , d u j − x i j d u j − d m j , d m j ≤ x i j ≤ d u j , 0 , otherwise . (7) For this function we can think of: • Commute time in selecting a job ( Redmond & Mokhtar- ian, 2001 ). For many people, the ideal commute, d m j , is larger than zero. This implies that commute times between zero and the optimal commute time, and between the optimal commute time and larger times, have lower value than the optimal com- mute time for such individuals. This suggests an inverted V- shape value function. • Cognitive proximity in innovation partner selection ( Nooteboom, 20 0 0 ). For a company there is an optimal cognitive distance to the partner they are working on innova- tion ( d m j ). This implies that any distance less than d m j or larger than d m j has less value. .5. Increase-level Increase-level value function shows that as the criterion level, x ij , ncreases up to a certain level, d m j , its value, u ij , increases, and after hat certain level, d m j , its value, u ij , will remain at the maximum evel. It is shown in Fig. 5 and formulated as follows: i j = ⎧ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎩ x i j − d l j d m j − d l j , d l j ≤ x i j ≤ d m j , 1 , d m j ≤ x i j ≤ d u j , 0 , otherwise . (8) For this function we can think of: 46 J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 Fig. 5. Increase-level value function. Fig. 6. Level-decrease value function. Fig. 7. Level-increase value function. Fig. 8. Decrease-level value function. 2 x i I u 2 x a I u • Fill rate in supplier selection ( Chae, 2009 ). Although a buyer prefers suppliers with higher fill rate, which implies that as the fill rate increases its value increases, the buyer might be indif- ferent to any increase after a certain level, d m j , as usually buy- ers pre-identify a desirable service level which is satisfied by a certain minimum level of supplier’s fill rate. • Diversity of restaurants in hotel location selection ( Chou, Hsu, & Chen, 2008 ). In order to find the best location for an international hotel, a decision-maker prefers locations with more divers restaurants. However, reaching a level, d m j , might fully satisfy a decision-maker implying that the decision-maker might not be sensitive to any increase after that certain level. 2.6. 
Level-decrease Level-decrease value function shows that as the criterion level, x ij , increases up to a certain level, d m j , its value, u ij , remains at max- imum level, and after that certain level, d m j , its value, u ij , decreases gradually. It is shown in Fig. 6 and formulated as follows: u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1 , d l j ≤ x i j ≤ d m j , d u j − x i j d u j − d m j , d m j ≤ x i j ≤ d u j , 0 , otherwise . (9) For this function we can think of: • Distance in selecting a university ( Carrico et al., 1997 ). While a student prefers a closer university to a farther university, this preference might start after a certain distance, d m j , implying that any distance between [ d l j , d m j ] is optimal and indifferent for the student. • Lead time in supplier selection ( Çebi & Otay, 2016 ). Although a supplier with a shorter lead time is preferred, if the lead time is in a limit such that it does not negatively affect the com- pany’s production, the company might then be indifferent to that range. .7. Level-increase Level-increase value function shows that as the criterion level, ij , increases up to a certain level, d m j , its value, u ij , remains at min- mum level, and after that certain level, d m j , its value, u ij , increases. t is shown in Fig. 7 and formulated as follows: i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 0 , d l j ≤ x i j ≤ d m j , x i j − d m j d u j − d m j , d m j ≤ x i j ≤ d u j , 0 , otherwise . (10) For this function we can think of: • Level of trust in making a buyer–supplier relationship ( Ploetner & Ehret, 2006 ). Trust increases the level of partnership between a buyer and a supplier, however it is only effective after a cer- tain threshold, d m j . • Level of relational satisfaction in evaluating quality commu- nication in marriage ( Montgomery, 1981 ). Below a minimum level of relational satisfaction, d m j , quality communication can- not take place thus results in minimum value. .8. Decrease-level Decrease-level value function shows that as the criterion level, ij , increases up to a certain level, d m j , its value, u ij , decreases, and fter that certain level d m j , its value, u ij , will remain at minimum. t is shown in Fig. 8 and formulated as follows: i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ d m j − x i j d m j − d l j , d l j ≤ x i j ≤ d m j , 0 , d m j ≤ x i j ≤ d u j , 0 , otherwise . (11) For this function we can think of: • Carbon emission in transportation mode selection ( Hoen, Tan, Fransoo, & van Houtum, 2014 ). In selecting a transportation mode, the more the carbon emission by the mode, the less the value of that mode. A decision-maker, however might assign J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 47 Fig. 9. Increasing stepwise value function. 2 l a j I u w h t t s r v u w Fig. 10. Decreasing stepwise value function. 2 l a u s u w m c u w 3 s u R m e d i c b zero value to a mode with carbon emission higher than a cer- tain level, d m j . • Distance when selecting a school ( Frenette, 2004 ). It has been shown that the longer the distance to the school the less the preference to attend that school. It is also clear that, for some people, there is no value after a certain distance, d m j . .9. Increasing stepwise Increasing stepwise value function shows that as the criterion evel, x ij , increases up to a certain level, d m j , its value remains at certain level, u 0 , and after that certain level, d m j , its value, u ij , umps to a higher level (maximum) and remains at the maximum. t is shown in Fig. 
9 and formulated as follows: i j = ⎧ ⎨ ⎩ u 0 , d l j ≤ x i j ≤ d m j , 1 , d m j ≤ x i j ≤ d u j , 0 , otherwise . (12) here 0 < u 0 < 1. For this function we can think of: • Suppliers capabilities in supplier segmentation ( Rezaei & Ortt, 2012 ). Suppliers of a company are evaluated based on their capabilities, and then segmented based on two levels (low and high) with respect to their capabilities. As such a supplier scored between d l j and d m j is considered as a low-level capa- bilities supplier while a supplier scored between d m j and d u j is considered as a high-level capabilities supplier. • Symmetry in selecting a close type of partnership ( Lambert, Emmelhainz, & Gardner, 1996 ). In order to have a successful relationship between supply chain partners, there should be some demographical similarities (for instance, in terms of brand image, productivity) between them. So, more symmetry means closer relationship. However, if we consider two levels of closeness, it is clear that for some level of symmetry the value of closeness remains the same. For the increasing stepwise value function, a criterion might ave more than one jump. For instance, if a decision-maker wants o consider three levels low, medium, and high when segmenting he suppliers with respect to their capabilities, then an increasing tepwise function with two jumps should be defined for this crite- ion. The following value function is a general increasing stepwise alue function with k jumps. i j = ⎧ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎩ u 0 , d l j ≤ x i j ≤ d m 1 j , u 1 , d m 1 j ≤ x i j ≤ d m 2 j , . . . 1 , d mk j ≤ x i j ≤ d u j , 0 , otherwise . (13) here 0 < u < u < … < 1. 0 1 .10. Decreasing stepwise Decreasing stepwise value function shows that as the criterion evel, x ij , increases up to a certain level, d m j , its value, u ij , remains t a the maximum level, and after that certain level, d m j , its value, ij , jumps down to a lower level, u 0 , and remains at that level. It is hown in Fig. 10 and formulated as follows: i j = ⎧ ⎨ ⎩ 1 , d l j ≤ x i j ≤ d m j , u 0 , d m j ≤ x i j ≤ d u j , 0 , otherwise . (14) here 0 < u 0 < 1. For this function we can think of: • Considering supply risk in portfolio modeling ( Kraljic, 1983 ). For a company, an item with a higher level of risk results in less value, however, due to portfolio modeling, there is no dif- ference between all levels of risk in the domain [ d l j , d m j ] . Simi- larly, all levels of risk in the domain [ d m j , d u j ] result in the same value. • Delay in logistics service provider selection ( Qi, 2015 ). Some companies consider stepwise value function for delay in de- livering the items by a logistics service provider, which means that the value of that provider decreases when delay increases, however it is constant within certain intervals. For the decreasing stepwise function, a criterion might have ore than one jump. The following value function is a general de- reasing stepwise function with k jumps. i j = ⎧ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎩ 1 , d l j ≤ x i j ≤ d m 1 j , u 1 , d m 1 j ≤ x i j ≤ d m 2 j , . . . u k , d mk j ≤ x i j ≤ d u j , 0 , otherwise . (15) here 0 < u k < … < u 1 < 1. . Some remarks on the value functions Here, a number of remarks are discussed, shedding light on ome aspects of the proposed value functions, which might be sed in real-world applications. emark 1. 
Shape and parameters of a value function is decision- aker-dependent, implying that (i) while a decision-maker consid- rs, for instance, a level-increasing function for the size of gar- en when buying a house, another decision-maker considers an ncreasing stepwise function, and (ii) while two decision-makers onsider increasing stepwise function for the size of garden when uying a house, the parameters they consider for their functions ( d l j , d m j , d u j ) might be different. 48 J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 Fig. 11. Increasing-level-decreasing value function. Table 1 Suppliers performance with respect to different decision criteria ( x ij ). Criteria Supplier Quality Price ( €/item) Trust CO 2 (g/item) Delivery (day) 1 85 27 4 10 0 0 3 2 90 28 2 1500 4 3 80 26 5 20 0 0 3 4 75 25 5 10 0 0 2 5 95 29 7 1700 3 6 99 30 6 20 0 0 1 w w v e h a d t a r m n t t t d m o U w s T w w T v 2 4 v e u p a s b fi w a e g 2 Please note that we report some weights for the criteria as the aim of the study is not the weighing part. Remark 2. A decision-maker might consider a hybrid value func- tion for a criterion. For instance, a criterion might be character- ized with an increasing-level-decreasing, which is a combination of increasing-level and level-decreasing. This function can also be considered as a special form of inverted V-shape function. It is shown in Fig. 11 and formulated as follows: u i j = ⎧ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎪ ⎪ ⎪ ⎩ x i j − d l j d m j − d l j , d l j ≤ x i j ≤ d m 1 j , 1 , d m 1 ≤ x i j ≤ d m 2 , d u j − x i j d u j − d m j , d m 2 ≤ x i j ≤ d u j , 0 , otherwise . (16) For instance, a decision-maker has to select the best R&D part- ner among 10 partners. One of the criteria is distance and the com- pany gives less preference to very close or very distant partners, which are distributed in the range [10 km, 20 0 0 km]. The com- pany considers an optimal distance of [20 0 km, 50 0 km]. This im- plies that distance follows an ‘increasing-level-decreasing’ function for this decision-maker: [ d l j , d m 1 j , d m 2 j , d u j ] = [ 10 , 200 , 500 , 2000 ] . 4. Numerical and comparison analyses In this section, we show how to incorporate the proposed piecewise value functions into account when applying an MCDM method, and we show that the results might be different when we consider the proposed piecewise value functions. We consider an MCDM problem, where a buyer should select a supplier from among six qualified suppliers considering five crite- ria: quality which is measured by 1 −α, where α shows the lot-size average imperfect rate; price (euro) per item; trust, which is mea- sured by a Likert scale (1: very low to 7: very high); CO 2 (gram) per item; delivery (day), the amount of time which takes to deliver items from the supplier to the buyer (all the criteria are continuous except trust). Table 1 shows the performance of the six suppliers with respect to the five criteria. The buyer has used an elicitation method 2 to find the weights hich are as follows: ∗ quality = 0 . 20 , w ∗price = 0 . 30 , w ∗trust = 0 . 27 , w ∗CO 2 = 0 . 08 , w ∗ delivery = 0 . 15 . And we assume that the decision-maker considers piecewise alue functions for these criteria (see, Table 2 ). So, as can be seen from Table 2 , the decision-maker consid- rs a level-increase linear function for quality with the lowest and ighest values of 0 and 100, respectively. For the decision-maker ny number below 85 has no value at all. 
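As a quick numeric check of this level-increase function, the short sketch below applies the thresholds just described (zero value below 85, linear between 85 and 100) to the quality column of Table 1; it reproduces the quality scores 0.00, 0.33, 0.00, 0.00, 0.67 and 0.93 reported in Table 3 for suppliers 1 to 6. The helper name is ours.

```python
# Quick check (not the paper's code) of the quality value function of Table 2:
# zero value on [0, 85], then linear from 0 to 1 on [85, 100].

def quality_value(x, d_m=85.0, d_u=100.0):
    """Level-increase value function for the quality criterion."""
    if d_m <= x <= d_u:
        return (x - d_m) / (d_u - d_m)
    return 0.0  # below the 85 threshold (or outside the domain) the value is zero

if __name__ == "__main__":
    for q in (85, 90, 80, 75, 95, 99):   # quality of suppliers 1-6 (Table 1)
        print(q, round(quality_value(q), 2))
```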
For criterion price, the ecision-maker gives the highest value to any price below 15 (al- hough in the existing set of suppliers there is no supplier with price within this range), after which the value decreases till it eaches to the maximum price of 30. For criterion trust which is easured using a Likert scale (1: very low to 7: very high), any umber less than 3 has no value for the decision-maker, while he value gradually increases between 3 and 7. For CO 2 emission, here is a decreasing value function from 0 to 1500 g per item, af- er which till 20 0 0 g, all the numbers have zero value. Finally, for elivery there is a simple decreasing function with minimum and aximum values of 0 and 5 days. By using the following equation, we can find the overall value f each supplier and then rank them to find the best supplier. i = n ∑ j=1 w j u i j (17) here, u ij is the value of the performance of supplier i with re- pect to criterion j (using the equations in Table 2 for the data in able 1 ), and w j is the weight of criterion j as follows: ∗ qual ity = 0 . 20 , w ∗price = 0 . 30 , w ∗trust = 0 . 27 , ∗ CO 2 = 0 . 08 , w ∗delivery = 0 . 15 . The value scores and the aggregated values are presented in able 3 (see also Fig. 13 for the final results). As can be seen from Table 3 , supplier 6 with the greatest overall alue of 0.51 is ranked as the first supplier. Suppliers 5, 4, 3, 1, and are ranked in the next places. .1. Comparing with the simple linear value functions In existing literature, considering the nature of the criteria, the alues are calculated, for instance, using the following simple lin- ar value function: i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ x i j − d l j d u j − d l j , if more x i j is more desirable ( such as quality ) , d u j − x i j d u j − d l j , if more x i j is less desirable ( such as price ) . (18) Eq. (18) is used to find the values of the criteria for each sup- lier using the data in Table 1 . Considering the criteria weights, nd u ij ( Eq. (18) for the data in Table 1 ) using Eq. (17) the value cores and also the aggregated overall score of each alternative can e calculated which are shown in Table 4 (see also Fig. 13 for the nal results). In Table 4 it is assumed that quality and trust are criteria for hich the higher the better, while for the other criteria (price, CO 2 , nd delivery), the lower the better. In fact, we consider simple lin- ar functions (increasing and decreasing respectively) for the two roups of criteria. J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 49 Table 2 Piecewise value functions. Shape Value function u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 0 , d l j ≤ x i j ≤ d m j , x i j − d m j d u j − d m j d m j ≤ x i j ≤ d u j , 0 , otherwise . = ⎧ ⎪ ⎨ ⎪ ⎩ 0 , 0 ≤ x i j ≤ 85 , x i j − 85 100 − 85 , 85 ≤ x i j ≤ 100 , 0 , otherwise . u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1 , d l j ≤ x ≤ d m j , d u j − x i j d u j − d m j , d m j ≤ x ≤ d u j , 0 , otherwise . = ⎧ ⎪ ⎨ ⎪ ⎩ 1 , 13 ≤ x i j ≤ 15 , 30 − x i j 30 − 15 , 15 ≤ x i j ≤ 30 , 0 , otherwise . u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 0 , d l j ≤ x i j ≤ d m j , x i j − d m j d u j − d m j , d m j ≤ x i j ≤ d u j , 0 , otherwise . = ⎧ ⎪ ⎨ ⎪ ⎩ 0 , 1 ≤ x i j ≤ 3 , x i j − 3 7 − 3 , 3 ≤ x i j ≤ 7 , 0 , otherwise . u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ d m j − x i j d m j − d l j , d l j ≤ x i j ≤ d m j , 0 , d m j ≤ x i j ≤ d u j , 0 , otherwise . = ⎧ ⎪ ⎨ ⎪ ⎩ 1500 − x i j 1500 − 0 , 0 ≤ x i j ≤ 1500 , 0 , 1500 ≤ x i j ≤ 2000 , 0 , otherwise . u i j = ⎧ ⎨ ⎩ d u j − x i j d u j − d l j , d l j ≤ x i j ≤ d u j , 0 , otherwise . 
= { 5 − x i j 5 − 0 , 0 ≤ x i j ≤ 5 , 0 , otherwise . r ( a c i 3 t s v e t According to Table 4 , the best supplier is supplier 4, which is anked as the 3rd one considering the piecewise value functions Table 3 ). The ranking of the other suppliers is also different. So, s can be seen, such differences are associated to the way we cal- ulate the value of the criteria. If we look at the criterion trust, for nstance ( Table 3 ), we can see that only the numbers greater than can be used for compensating the other criteria. In other words, he values 1, 2 and 3 for this criterion have no selection power. No upplier can compensate its weakness in other criteria by having a alue between 1 and 3 for trust. However, such important issue is ntirely ignored in the simple way of determining the value func- ions which is very popular in existing studies. This consideration 50 J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 Table 3 Value scores, u ij , and the aggregated overall score considering the proposed piecewise value functions. Supplier Quality Price Trust CO 2 Delivery Aggregated value Rank 1 0.00 0.20 0.25 0.50 0.40 0.23 5 2 0.33 0.13 0.00 0.00 0.20 0.14 6 3 0.00 0.27 0.50 0.00 0.40 0.28 4 4 0.00 0.33 0.50 0.50 0.60 0.37 3 5 0.67 0.07 1.00 0.00 0.40 0.48 2 6 0.93 0.00 0.75 0.00 0.80 0.51 1 Table 4 Value scores, u ij , and the aggregated overall scores considering the simple linear value func- tions. Supplier Quality Price Trust CO 2 Delivery Aggregated value Rank 1 0.42 0.60 0.40 1.00 0.33 0.50 4 2 0.63 0.40 0.00 0.50 0.00 0.29 6 3 0.21 0.80 0.60 0.00 0.33 0.49 5 4 0.00 1.00 0.60 1.00 0.67 0.64 1 5 0.83 0.20 1.00 0.30 0.33 0.57 2 6 1.00 0.00 0.80 0.00 1.00 0.57 3 Fig. 12. Exponential value functions. s u f c F f v d M v ( f t 3 To see how these value functions are elicited considering the decision-maker is even of a higher importance for compensatory methods such as AHP and BWM. 4.2. Comparing with the exponential value functions Another important way to approximate the value functions in practice is the use of exponential value functions ( Kirkwood, 1997; Pratt, 1964 ). The exponential value functions can specifically be used when the preferences are monotonically increasing or de- creasing. Although this approach is not popular in MCDM do- main, and we were not able to find any application of these value functions particularly in MCDM field, we would like to compare our results to the results of applying these functions, which are, to some extent, close to some of our proposed piecewise value functions (such as level-increase, level-decrease, increase-level, and decrease-level). Using the same notations as before and consider- ing a shape parameter ρ which is called ‘risk tolerance’, a mono- tonically increasing exponential value function can be shown as follows: u i j = ⎧ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎩ 1 − exp [ − ( x i j − d l j ) /ρ ] 1 − exp [ − ( d u j − d l j ) /ρ ] , ρ � = Infinity x i j − d l j d u j − d l j , otherwise . (19) r A monotonically decreasing exponential value function can be hown as follows: i j = ⎧ ⎪ ⎪ ⎪ ⎨ ⎪ ⎪ ⎪ ⎩ 1 − exp [ − ( d u j − x i j ) /ρ ] 1 − exp [ − ( d u j − d l j ) /ρ ] , ρ � = Infinity d u j − x i j d u j − d l j , otherwise . (20) Fig. 12 shows the monotonically increasing exponential value unctions (for different values of ρ) (a), and the monotonically de- reasing exponential value functions (for different values of ρ) (b). Risk-averse decision-makers have ρ > 0 (hill-like functions in ig. 
12 ), while risk-seeking decision-makers have ρ < 0 (bowl-like unctions in Fig. 12 ). ρ = Infinity (straight-line in Fig. 12 ) shows the alue for the risk neutral decision-makers. In fact, ρ = Infinity pro- uces the simple linear value functions which are very popular in CDM field. In order to do the comparison analysis, we use exponential alue functions for the criteria of the aforementioned example Table 1 ) to check the similarities and differences. To make a air comparison, we try to generate 3 the corresponding exponen- ial value functions of the piecewise value functions ( Table 2 ) as isk tolerance, refer to Kirkwood (1997) . J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 51 c t a t t p l ρ f s o p lose as possible. For quality, a monotonically increasing exponen- ial value function with negative ρ would be appropriate. For price monotonically decreasing exponential value function with a posi- ive ρ, for trust, a monotonically increasing exponential value func- ion with a negative ρ, for CO 2 , a monotonically decreasing ex- onential value function with a negative ρ, and, finally, for de- ivery a monotonically decreasing exponential value function with Table 5 Exponential value functions. Shape = Infinity would be suitable. Table 5 shows the functions, where unctions with different ρ′ s are shown and a more suitable one is hown in bold. Using the exponential value functions of Table 5 , for the data f Table 1 , we get the value scores and the aggregated values as resented in Table 6 (see also Fig. 13 for the final results). Function u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1 − exp [ −( x i j − d l j ) /ρ] 1 − exp[ −( d u j − d l j ) /ρ] , ρ � = Infinity x i j − d l j d u j − d l j , otherwise . = ⎧ ⎨ ⎩ 1 − exp [ −( x i j − 0 ) /ρ] 1 − exp[ −( 100 − 0 ) /ρ] , ρ � = Infinity x i j − 0 100 − 0 , otherwise . u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1 − exp [ −( d u j − x i j ) /ρ] 1 − exp[ −( d u j − d l j ) /ρ] , ρ � = Infinity d u j − x i j d u j − d l j , otherwise . = ⎧ ⎨ ⎩ 1 − exp [ −( 30 − x i j ) /ρ] 1 − exp[ −( 30 − 0 ) /ρ] , ρ � = Infinity 30 − x i j 30 − 0 , otherwise . u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1 − exp [ −( x i j − d l j ) /ρ] 1 − exp[ −( d u j − d l j ) /ρ] , ρ � = Infinity x i j − d l j d u j − d l j , otherwise . = ⎧ ⎨ ⎩ 1 − exp [ −( x i j − 1 ) /ρ] 1 − exp[ −( 7 − 1 ) /ρ] , ρ � = Infinity x i j − 1 7 − 1 , otherwise . u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1 − exp [ −( d u j − x i j ) /ρ] 1 − exp[ −( d u j − d l j ) /ρ] , ρ � = Infinity d u j − x i j d u j − d l j , otherwise . = ⎧ ⎨ ⎩ 1 − exp [ −( 20 0 0 − x i j ) /ρ] 1 − exp[ −( 20 0 0 − 1500 ) /ρ] , ρ � = Infinity 20 0 0 − x i j 20 0 0 − 0 , otherwise . ( continued on next page ) 52 J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 Table 5 ( continued ) Shape Function u i j = ⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩ 1 − exp [ −( d u j − x i j ) /ρ] 1 − exp[ −( d u j − d l j ) /ρ] , Infinity d u j − x i j d u j − d l j , otherwise . = ⎧ ⎨ ⎩ 1 − exp [ −( 5 − x i j ) /ρ] 1 − exp[ −( 5 − 0 ) /ρ] , ρ � = Infinity 5 − x i j 5 − 0 , otherwise . Table 6 Value scores, u ij , and the aggregated overall scores considering the exponential value func- tions. Supplier Quality Price Trust CO 2 Delivery Aggregated value Rank 1 0.85 0.45 0.05 0.08 0.40 0.22 5 2 0.90 0.33 0.00 0.02 0.20 0.16 6 3 0.80 0.55 0.13 0.00 0.40 0.26 4 4 0.75 0.63 0.13 0.08 0.60 0.32 3 5 0.95 0.18 1.00 0.00 0.40 0.46 1 6 0.99 0.00 0.37 0.00 0.80 0.38 2 Fig. 13. Final results using three types of value functions. 
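The exponential forms of Eqs. (19) and (20) can be sketched as follows; here rho = None is used as a stand-in for ρ = Infinity (the risk-neutral case, which reduces to the simple linear value functions), and the sample price levels on [0, 30] euro/item are illustrative.

```python
import math

# A minimal sketch (illustrative, not the paper's code) of the monotonically
# increasing and decreasing exponential value functions of Eqs. (19)-(20),
# parameterized by the risk tolerance rho.

def exp_increasing(x, d_l, d_u, rho=None):
    """Eq. (19); rho=None plays the role of rho = Infinity (linear case)."""
    if rho is None:
        return (x - d_l) / (d_u - d_l)
    return (1 - math.exp(-(x - d_l) / rho)) / (1 - math.exp(-(d_u - d_l) / rho))

def exp_decreasing(x, d_l, d_u, rho=None):
    """Eq. (20); rho=None plays the role of rho = Infinity (linear case)."""
    if rho is None:
        return (d_u - x) / (d_u - d_l)
    return (1 - math.exp(-(d_u - x) / rho)) / (1 - math.exp(-(d_u - d_l) / rho))

if __name__ == "__main__":
    # rho > 0 gives the risk-averse (hill-like) shape, rho < 0 the risk-seeking
    # (bowl-like) shape, and None the linear special case.
    for rho in (8, -8, None):
        print(rho, [round(exp_decreasing(p, 0, 30, rho), 2) for p in (10, 20, 28)])
```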
v o t p t l c c a As can be seen from Table 6 , supplier 5 with the greatest over- all value of 0.46 is ranked as the first supplier, which is different from what we get from the proposed piecewise value functions ( Table 3 ). While supplier 5 was ranked the 2nd based on our pro- posed value functions, using the exponential value functions, this supplier becomes number 2. Other suppliers (1, 2, 3, 4) have the same ranking based on the two approaches. The differences are ob- viously associated to the way we get the values of the criteria. We also checked some other close ρ values for the exponential value functions. There are some changes in the aggregated values, yet, the ranking is the same. As it can be seen, the results of the two approaches (piecewise alue functions and exponential functions are much closer to each ther than to the results of the regular simple linear value func- ions). Our observation is that the exponential value functions can lay a role close to a number of proposed value functions in his study such as increase-level, decrease-level, level-increase, and evel-decrease. In order to make an exponential value functions lose to one of the mentioned proposed value functions we should hoose ρ values close to zero. If we try to make the ρ as close s we perfectly make the “level” part of the criterion, then the J. Rezaei / Expert Systems With Applications 98 (2018) 43–56 53 Fig. 14. Fitting an exponential value function to a level-decrease value function. o h s i e 1 c i t p O c t i c t i o t c v f o l r a t a t w e e ( e e p e a i 5 fi h l n a v m fi s 1 & t s e fl n r u m T p m f t m s t c t 6 m i i v ther part becomes very steep and not representative. On the other and, if we want to choose a ρ value which better represents the lope of the function for the “increase” or the “decrease” part, then t is impossible to cover the “level” part of the value function prop- rly. For example, let us consider the criterion price again. In Fig. 4 , it can be seen that, if we consider ρ = 1, the level part is fully overed, but the decrease part of the exponential value function s very much different from the decrease part of the linear func- ion. Even if we choose ρ = 3, which does not fully cover the level art of the proposed function, the decrease part is really different. n the other hand, we can find ρ = 8 as a close one to the de- rease part of the linear function, but this time it is not possible o cover the level part properly. So, although a very good approx- mation, the exponential value functions might not be suitable for ases in which a decision-maker has a clear value-indifference in- erval (a level part) for a criterion. However, these functions are ndeed suitable when the decision-maker has different preferences n the lower and on the upper parts of the criterion measure. From the figure we can also see that while we could make the wo piecewise and exponential value functions, to some degree, lose to each other, they are too different from the simple linear alue function. It is also clear that none of the simple linear value unctions or the exponential value functions can represent the V- r inverted V-shape value functions. As a general conclusion, we think that the proposed piecewise inear functions have two salient features: (i) simplicity; and (ii) epresentativeness. 
That is, it is easy to work with linear functions, nd it is easy for a practitioner to find a more representative func- ion from the proposed library of the pricewise value functions for particular criterion. The cut-off points can also be estimated by he decision-maker. The simple monotonic-linear value functions, hich are dominant in existing literature, are very simple. How- ver, they might not be representative in some cases. Finally, the xponential value functions might have a better representativeness compared to the simple monotonic-linear value functions). How- ver, they are not simple. Working with non-liner functions is not asy for practitioner, and, more importantly, it is very difficult for a ractitioner to estimate a value for ρ (the shape parameter of the xponential value functions), as it cannot be easily interpreted by practitioner (please note that we consider a value function as an nput for an MCDM problem in this study). s I t l . Determining the value functions One of the big challenges in real-world decision-making is to nd a proper value function for a decision criterion. This, per- aps, has been one of the main reasons why the use of simple inear value functions in multi-criteria decision-making is domi- ant. The linear value functions are easy for modeling purposes nd can, to some extent, represent the reality. More complicated alue functions, although might be closer to reality of the decision- aker’s preferences, are more difficult to be elicited and are dif- cult for modeling purposes. We refer the interested readers to ome existing procedures for identifying value functions ( Fishburn, 967; Keeney & Nair, 1976; Kirkwood, 1997; Pratt, 1964; Stewart Janssen, 2013 ). We think that the proposed value functions in his paper do not have the disadvantage of nonlinearity and at the ame time have the advantages of being closer to the real pref- rences of the decision-maker as they provide some diversity and exibility in modeling the functions. As we do not consider the onlinearity of the value functions we do not use the concept of isk tolerance in determining the value functions as it has been sed by others. We rather propose a simple procedure, which is ore practical. A decision analyst, could first show the value functions in able 2 to the decision-maker to see which one most suits the reference structure of the decision-maker. Once the decision- aker selects a particular value function, the other details of the unction, such as the lower bound, the upper bound and the hresholds can be determined. We should highlight again that in ost MCDM methods, the value function is not elicited. It is rather imply assumed to have a particular shape, and this is why we hink having a pre-specified set of standard value functions which an be used as subjective approximation of the real preferences of he decision-maker can make a significant impact on the results. . Conclusion, limitations and future research This study proposes a set of piecewise value functions for ulti-criteria decision-making (MCDM) problems. While the exist- ng applications of MCDM methods usually use two general simple ncreasing and decreasing linear value functions, this study pro- ides several real-world examples to support the applicability of ome other forms of value functions for the criteria used in MCDM. t is also explicated how, in some decision problems, a combina- ion of two or more value functions can be used for a particu- ar decision criterion. The proposed functions can be used for dif- 54 J. 
Rezaei / Expert Systems With Applications 98 (2018) 43–56 Ç C E F F H J K K K K L M O P P P Q R R R R R S T ferent MCDM methods in different decision problems. A numeri- cal example of supplier selection problem (including a comparison between simple monotonic linear value functions, piecewise lin- ear value functions, and exponential value functions) showed how the use of the proposed value functions could affect the final re- sults. Considering these value functions could better represent the real preferences of the decision-maker. It can also help reduce the inappropriate compensations of the decision criteria, for instance, through using a level-increasing function which assigns zero value to any value of the criterion below a certain threshold. The pro- posed value functions are presented in a general form such that they can be tailor-made for a specific decision-maker. That is to say, not only it is possible for two different decision-makers to use two different value functions for a single criterion. It is also pos- sible to use different domain (e.g. min and max) values for that particular value function. Despite the advantages of the proposed value functions, they have some limitations. Although the proposed value functions con- sider some real-world features of the decision criteria, they are lin- ear which might be, to some degree, a simplification. We think that the bigger problem in existing literature of MCDM is mono- tonicity assumption and not linearity assumption. Nevertheless, more research needs to be conducted to empirically find the share of each. Furthermore, to formulate the decision criteria one should pay enough attention to check the real contribution of the decision criterion into the ultimate goal of the decision-making problem. For instance, if a criterion contributes to another criterion which has a real role in making the decision, one should exclude the ini- tial one. For a detailed discussion on this matter, interested read- ers are referred to Brugha (1998) . One interesting future direction would be to apply the proposed value functions in some real-world MCDM problems and compare their fitness to the other value func- tions. In this regard, finding a more systematic approach to deter- mine the value functions in practice would be also very interest- ing. It would be also interesting to study the cases in which there are more than one decision-maker. As different decision-makers may choose different value functions, different domains, and dif- ferent thresholds for a single criterion, proposing a way to find the final output of the MCDM problem for the group would be an interesting future research. Finally, finding a sensitivity anal- ysis for the proposed value functions is recommended. Consider- ing the studies of Bertsch and Fichtner (2016 ), Bertsch, Treitz, Gel- dermann, and Rentz (2007 ), Insua and French (1991 ), Wulf and Bertsch (2017 ) could give interesting ideas to make such sensitivity analysis framework. References Alanne, K. , Salo, A. , Saari, A. , & Gustafsson, S.-I. (2007). Multi-criteria evaluation of residential energy supply systems. Energy and Buildings, 39 , 1218–1226 . Ananda, J. , & Herath, G. (2009). A critical review of multi-criteria decision making methods with special reference to forest management and planning. Ecological Economics, 68 , 2535–2548 . Bertsch, V. , & Fichtner, W. (2016). A participatory multi-criteria approach for power generation and transmission planning. Annals of Operations Research, 245 , 177–207 . 
Bertsch, V., Treitz, M., Geldermann, J., & Rentz, O. (2007). Sensitivity analyses in multi-attribute decision support for off-site nuclear emergency and recovery management. International Journal of Energy Sector Management, 1, 342–365.
Brans, J. P., Mareschal, B., Vincke, P., & Brans, J. P. (1984). PROMETHEE: A new family of outranking methods in multicriteria analysis. In Proceedings of the 1984 conference of the international federation of operational research societies (IFORS) (pp. 477–490). Amsterdam: North Holland.
Brugha, C. M. (1998). Structuring and weighting criteria in multi criteria decision making (MCDM). In Trends in multicriteria decision making (pp. 229–242). Springer.
Brugha, C. M. (2000). Relative measurement and the power function. European Journal of Operational Research, 121, 627–640.
Carrico, C. S., Hogan, S. M., Dyson, R. G., & Athanassopoulos, A. D. (1997). Data envelopment analysis and university selection. The Journal of the Operational Research Society, 48, 1163–1177.
Çebi, F., & Otay, İ. (2016). A two-stage fuzzy approach for supplier evaluation and order allocation problem with quantity discounts and lead time. Information Sciences, 339, 143–157.
Chae, B. (2009). Developing key performance indicators for supply chain: An industry perspective. Supply Chain Management: An International Journal, 14, 422–428.
Chou, T.-Y., Hsu, C.-L., & Chen, M.-C. (2008). A fuzzy multi-criteria decision model for international tourist hotels location selection. International Journal of Hospitality Management, 27, 293–301.
Edwards, W. (1977). How to use multiattribute utility measurement for social decision making. IEEE Transactions on Systems, Man and Cybernetics, 7, 326–340.
Fishburn, P. C. (1967). Methods of estimating additive utilities. Management Science, 13, 435–453.
Frenette, M. (2004). Access to college and university: Does distance to school matter. Canadian Public Policy/Analyse de Politiques, 30, 427–443.
Herrera, F., Herrera-Viedma, E., & Verdegay, J. (1996). Direct approach processes in group decision making using linguistic OWA operators. Fuzzy Sets and Systems, 79, 175–190.
Ho, W., Xu, X., & Dey, P. K. (2010). Multi-criteria decision making approaches for supplier evaluation and selection: A literature review. European Journal of Operational Research, 202, 16–24.
Hoen, K., Tan, T., Fransoo, J., & van Houtum, G. (2014). Effect of carbon emission regulations on transport mode selection under stochastic demand. Flexible Services and Manufacturing Journal, 26, 170–195.
Insua, D. R., & French, S. (1991). A framework for sensitivity analysis in discrete multi-objective decision-making. European Journal of Operational Research, 54, 176–190.
Jacquet-Lagreze, E., & Siskos, Y. (2001). Preference disaggregation: 20 years of MCDA experience. European Journal of Operational Research, 130, 233–245.
Kabir, G., Sadiq, R., & Tesfamariam, S. (2014). A review of multi-criteria decision-making methods for infrastructure management. Structure and Infrastructure Engineering, 10, 1176–1210.
Kakeneno, J. R., & Brugha, C. M. (2017). Usability of nomology-based methodologies in supporting problem structuring across cultures: The case of participatory decision-making in Tanzania rural communities. Central European Journal of Operations Research, 25, 393–415.
Keeney, R. L., & Nair, K. (1976). Evaluating potential nuclear power plant sites in the Pacific Northwest using decision analysis. IIASA Professional Paper PP-76-001. Laxenburg, Austria: IIASA.
Keeney, R. L., & Raiffa, H. (1976). Decisions with multiple objectives: Preferences and value tradeoffs. USA: John Wiley & Sons, Inc.
Kirkwood, C. W. (1997). Strategic decision making. California, USA: Wadsworth Publishing Company.
Kraljic, P. (1983). Purchasing must become supply management. Harvard Business Review, 61, 109–117.
Lahdelma, R., & Salminen, P. (2012). The shape of the utility or value function in stochastic multicriteria acceptability analysis. OR Spectrum, 34, 785–802.
Lambert, D. M., Emmelhainz, M. A., & Gardner, J. T. (1996). Developing and implementing supply chain partnerships. The International Journal of Logistics Management, 7, 1–18.
Montgomery, B. M. (1981). The form and function of quality communication in marriage. Family Relations, 30, 21–30.
Mustajoki, J., & Hämäläinen, R. P. (2000). Web-HIPRE: Global decision support by value tree and AHP analysis. INFOR: Information Systems and Operational Research, 38, 208–220.
Nooteboom, B. (2000). Learning and innovation in organizations and economies. Oxford: Oxford University Press.
O'Brien, D. B., & Brugha, C. M. (2010). Adapting and refining in multi-criteria decision-making. Journal of the Operational Research Society, 61, 756–767.
Ploetner, O., & Ehret, M. (2006). From relationships to partnerships—New forms of cooperation between buyer and seller. Industrial Marketing Management, 35, 4–9.
Pohekar, S., & Ramachandran, M. (2004). Application of multi-criteria decision making to sustainable energy planning—a review. Renewable and Sustainable Energy Reviews, 8, 365–381.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122–136.
Qi, X. (2015). Disruption management for liner shipping. In Handbook of ocean container transport logistics (pp. 231–249). Springer.
Redmond, L. S., & Mokhtarian, P. L. (2001). The positive utility of the commute: Modeling ideal commute time and relative desired commute amount. Transportation, 28, 179–205.
Rezaei, J. (2015). Best-worst multi-criteria decision-making method. Omega, 53, 49–57.
Rezaei, J. (2016). Best-worst multi-criteria decision-making method: Some properties and a linear model. Omega, 64, 126–130.
Rezaei, J., & Ortt, R. (2012). A multi-variable approach to supplier segmentation. International Journal of Production Research, 50, 4593–4611.
Roy, B. (1968). Classement et choix en présence de points de vue multiples (la méthode ELECTRE). RIRO, 2, 57–75.
Saaty, T. L. (1977). A scaling method for priorities in hierarchical structures. Journal of Mathematical Psychology, 15, 234–281.
Stewart, T. J., & Janssen, R. (2013). Integrated value function construction with application to impact assessments. International Transactions in Operational Research, 20, 559–578.
Tsai, K.-H., & Wang, J.-C. (2005). Does R&D performance decline with firm size?—A re-examination in terms of elasticity. Research Policy, 34, 966–976.
Tzeng, G.-H., Lin, C.-W., & Opricovic, S. (2005). Multi-criteria analysis of alternative-fuel buses for public transportation. Energy Policy, 33, 1373–1383.
Wilson, T. L., & Anell, B. I. (1999). Business service firms and market share. Journal of Small Business Strategy, 10, 41–53.
Wulf, D., & Bertsch, V. (2017). A natural language generation approach to support understanding and traceability of multi-dimensional preferential sensitivity analysis in multi-criteria decision making. Expert Systems with Applications, 83, 131–144.
Xia, W., & Wu, Z. (2007). Supplier selection with multiple criteria in volume discount environments. Omega, 35, 494–504.
Yager, R. R. (1988). On ordered weighted averaging aggregation operators in multicriteria decision making. IEEE Transactions on Systems, Man, and Cybernetics, 18, 183–190.
Jafar Rezaei is an associate professor of operations and supply chain management at Delft University of Technology, the Netherlands, where he also obtained his Ph.D. One of his main research interests is in the area of multi-criteria decision-making (MCDM) analysis. He has published in various academic journals, including International Journal of Production Economics, International Journal of Production Research, Industrial Marketing Management, Applied Soft Computing, Applied Mathematical Modelling, IEEE Transactions on Engineering Management, Journal of Cleaner Production, European Journal of Operational Research, Information Science, Omega, and Expert Systems with Applications.

work_3cy3ym4zi5ht5g5z7rnaurcdty ----

Intuitionistic fuzzy rough sets: at the crossroads of imperfect knowledge

Chris Cornelis, Martine De Cock and Etienne E. Kerre
Department of Applied Mathematics and Computer Science, Ghent University, Fuzziness and Uncertainty Modelling Research Unit, Krijgslaan 281 (S9), B-9000 Ghent, Belgium
E-mail: {chris.cornelis, martine.decock, etienne.kerre}@rug.ac.be

Abstract: Just like rough set theory, fuzzy set theory addresses the topic of dealing with imperfect knowledge. Recent investigations have shown how both theories can be combined into a more flexible, more expressive framework for modelling and processing incomplete information in information systems. At the same time, intuitionistic fuzzy sets have been proposed as an attractive extension of fuzzy sets, enriching the latter with extra features to represent uncertainty (on top of vagueness).
Unfortunately, the various tentative definitions of the concept of an ‘intuitionistic fuzzy rough set’ that were raised in their wake are a far cry from the original objectives of rough set theory. We intend to fill an obvious gap by introducing a new definition of intuitionistic fuzzy rough sets, as the most natural generalization of Pawlak’s original concept of rough sets. Keywords: rough set theory, intuitionistic fuzzy set theory, L-fuzzy set theory, incomplete information, lower and upper approximation 1. Introduction As a new trend in the attempts to combine the best of several worlds, very recently all kinds of suggestions for approaches merging rough set theory and intuitionistic fuzzy set theory have started to appear. The present evolution vividly reminds us of the origin of fuzzy rough set theory, as the (so far) happy marriage of fuzzy set theory and rough set theory. A remarkable difference, however, is that in the latter case a long engagement period with intense discussions concerning the relation between fuzzy set theory and rough set theory preceded the marriage, (and in a way is still going on, simultaneously with the research on the new hybrid theory). So far, this comparison stage seems very limited for the combination of intuitionistic fuzzy set theory and (fuzzy) rough set theory. As far as we know, only Çoker (1998) went into the matter by claiming that fuzzy rough sets are intuitionistic fuzzy sets (Chakra- barty et al., 1998; Samanta & Mondal, 2001; Jena & Ghosh, 2002; Rizvi et al., 2002), which appears to be shattering the dream of a new hybrid theory. On the other hand, there exist many views on the notion ‘rough set’ which can be grouped into two main streams. Several suggested options for fuzzification have led to an even greater number of views on the notion ‘fuzzy rough set’. Typically, under the same formal umbrella, they can be further generalized to the notion ‘L-fuzzy rough set’ where the membership degrees are taken from some suitable lattice L which is not necessarily the unit interval. On top of this, there exist semantically different interpretations of intuitionistic fuzzy set theory (which is a special kind of L-fuzzy set theory). Needless to say, when trying to compare and/or to combine rough set theory, fuzzy set theory and intuitionistic fuzzy set theory, one finds oneself at a complicated crossroads with an abundance of possible ways to proceed. The aim of this paper is to provide the reader with a road map. We do this by mapping out research results obtained so far in the literature, as well as by exploring by our- selves a very important road which was virgin territory until now. However, before we can prepare a hybrid theory, it is absolutely necessary to check the origin of all ingredients, for they can have an important influence on the flavour of the resulting product! For this reason we start the paper with a short overview of all set theoretical models involved (Section 2). Making such a study forces us into thinking about relationships between them. Staying within the scope of the paper, we will only focus on intuitionistic fuzzy set theory versus fuzzy rough set theory (Section 3). After a critical examination of the added value of a hybrid intuitionistic fuzzy rough set theory, in Section 4 we present an overview of existing approaches (all originated independently from the others). Finally we fill an obvious gap by a very natural generalization of Pawlak’s original concept of a rough set. 
2. Short overview of some set theoretical models

2.1. Rough set theory

Rough sets, remarkably enough, are not a uniquely defined notion. A very accurate classification of various approaches to definitions was given by Yao (1996). For our purposes, it is sufficient to distinguish between two main streams in the literature, denoted 'Pawlak rough sets' and 'Iwinski rough sets', respectively; both come with their own particular perception of the notion 'rough set'.

Pawlak rough sets. The first stream was initiated by Pawlak (1982), who launched rough set theory as a framework for the construction of approximations of concepts when only incomplete information is available. The available information consists of a set A of examples (a subset of a universe X, X being a non-empty set of objects we want to say something about) of a concept C, and a relation R in X. R models 'indiscernibility' or 'indistinguishability' and therefore generally is a tolerance relation (i.e. a reflexive and symmetrical relation) and in most cases even an equivalence relation (i.e. a transitive tolerance relation). In this paper we will use R-foresets to denote equivalence classes. The R-foreset of an element y of X is the set $Ry = \{x \mid (x, y) \in R\}$. The couple (X, R) is called an approximation space.

Rough set analysis makes statements about the membership of some element x of X to the concept C of which A is a set of examples, based on the indistinguishability between x and the elements of A. To arrive at such statements, A is approximated in two ways. The lower and upper approximations of A in (X, R) are respectively the following subsets of X:

$\underline{A} = \{y \mid y \in X \text{ and } Ry \subseteq A\}$
$\overline{A} = \{y \mid y \in X \text{ and } Ry \cap A \neq \emptyset\}$

The underlying meaning is that $\overline{A}$ is the set of elements possibly belonging to the concept C (weak membership), while $\underline{A}$ is the set of elements necessarily belonging to C (strong membership); for y belongs to $\underline{A}$ if all elements of X indistinguishable from y belong to A (hence there is no doubt that y also belongs to A), while y belongs to $\overline{A}$ as soon as an element of A is indistinguishable from y. If y belongs to the boundary region $\overline{A} \setminus \underline{A}$, then there is doubt, because in this case y is at the same time indistinguishable from at least one element of A and at least one element of X that is not in A.

A set A is called definable if $\underline{A} = \overline{A}$ (Radzikowska & Kerre, 2002). The existing variety in terminology on this concept of definability might indicate the high importance individual researchers attach to it. Thiele for instance uses the term 'R-exact' as a synonym for 'definable' (see Thiele, 1998). Pawlak (1982) defines a 'composed set' as a finite union of equivalence classes. In 1985 he generalizes this definition to any union of equivalence classes; it can be verified that this notion of composed set coincides with that of definable set. Still other terms to denote the same notion are mentioned in Iwinski (1987). A similar phenomenon occurs concerning the definition of the concept 'rough set': some authors say that a set A in X is a 'rough set' if $\underline{A} \neq \overline{A}$ (see for example Komorowski et al., 1999). Hence they call a set 'rough' if the boundary region is not empty, i.e. if there is at least one object which cannot be classified with certainty as a member of the concept, nor as a member of its complement.
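To make the construction above concrete, the following is a minimal sketch (ours, not part of the original paper) of how the lower and upper approximations of a crisp set can be computed when the equivalence relation R is given by its partition into equivalence classes; the function names and the toy data are our own assumptions.

```python
# Illustrative sketch (not from the paper): lower and upper approximations
# of a crisp set A in an approximation space (X, R), where the equivalence
# relation R is represented by a partition of X into its classes.

def foreset(y, partition):
    """Return the R-foreset (equivalence class) of y."""
    for block in partition:
        if y in block:
            return block
    raise ValueError(f"{y!r} is not covered by the partition")

def lower_approximation(A, X, partition):
    # y belongs iff its whole class is contained in A (strong membership)
    return {y for y in X if foreset(y, partition) <= A}

def upper_approximation(A, X, partition):
    # y belongs iff its class has a non-empty intersection with A (weak membership)
    return {y for y in X if foreset(y, partition) & A}

# Toy example (our own data): classes {1,2}, {3,4}, {5,6}
X = {1, 2, 3, 4, 5, 6}
partition = [{1, 2}, {3, 4}, {5, 6}]
A = {1, 2, 3}
print(lower_approximation(A, X, partition))  # {1, 2}
print(upper_approximation(A, X, partition))  # {1, 2, 3, 4}
```

The boundary region is then simply the set difference of the upper and the lower approximation.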
Pawlak (1982), on the other hand, originally defined a rough set (A1, A2) as the class of all sets that have A1 and A2 respectively as lower and upper approximations. This convention is also adopted by Thiele (2001). Still others call (A1, A2) a rough set (in (X, R)) as soon as there is a set A in X such that $\underline{A} = A_1$ and $\overline{A} = A_2$ (see for example Radzikowska & Kerre, 2002); (A1, A2) is then also called the rough set of A. Without loss of generality of the discussion, we follow the latter approach.

Rough set theory is strongly related to Gentilhomme's (1968) flou set theory. A flou set in a universe X is a couple (A1, A2) satisfying $A_1 \subseteq A_2 \subseteq X$. The first coordinate A1 is called the certain area, the second coordinate A2 the area of maximal extension and $A_2 \setminus A_1$ the flou zone (or vague zone). The idea of dividing the universe into three areas is a very strong linkage between flou sets and rough sets. One could say that with the introduction of rough set theory an important next step forward was made, providing a semantically justifiable and computationally feasible way to construct the three areas involved, by approximating a set A by its lower and upper approximations. One can verify that for a reflexive relation R

$\underline{A} \subseteq \overline{A}$

which justifies the statement that all rough sets are flou sets. Interestingly enough, given a flou set (A1, A2) in X, it is not always possible to find an approximation space (X, R) such that (A1, A2) is a rough set in (X, R). To illustrate this, we rely on the fact that if the relation R is symmetrical then

$\overline{(\underline{A})} \subseteq A \subseteq \underline{(\overline{A})}$

Example 1. Suppose that A2 contains one more element than A1, i.e. $A_2 = A_1 \cup \{x_0\}$, $x_0 \in X \setminus A_1$. If (A1, A2) is the rough set of A then

$A_1 = \underline{A} \subseteq A \subseteq \overline{A} = A_2$

Hence either $A_1 = A \subset A_2$ or $A_1 \subset A = A_2$. In the first case $A_1 = A$ implies $\underline{A} = A$. Hence $\overline{(\underline{A})} = \overline{A}$ and, because of the symmetry of R, $\overline{A} \subseteq A$, which contradicts $A \subset \overline{A} = A_2$. In the second case a similar line of reasoning leads to the contradiction $A \subseteq \underline{A}$.

Iwinski rough sets. The second stream in the literature was initiated by Iwinski (1987), who did not use an equivalence
A fuzzy set A in a universe X is characterized by an X-[0, 1] mapping, usually denoted by A or by mA, called the membership function. For all x in X, A(x)corresponds to the degree to which x belongs to A. While fundamental research on the theory is still very topical, the application of the fuzzy set theoretical representation of vague concepts also has many working applications nowadays. The rapid growth of the series Studies in Fuzziness and Soft Computing (Kacprzyk, 1992–2002), consisting of both monographs and edited volumes, is a striking illustration of the continuous evolution and the high number of achievements in this field. y to L-fuzzy sets The mapping of elements of the universe to the interval [0, 1], however, implies a crisp, linear ordering of these elements, making [0, 1]-valued fuzzy set theory inadequate to deal with incomparable information. From the beginning therefore some attention has been paid to other partially ordered sets as well. In 1967 Goguen formally introduced the notion of an L-fuzzy set with a membership function taking values in a lattice L. In this paper we assume that (L, rL) is a complete lattice with smallest element 0L and greatest element 1L. An L-fuzzy set A in a universe X is a mapping from X to L, again called the membership function. The L-fuzzy set A is said to be included in the L-fuzzy set B, usually denoted by ADB, if A(x)rL B(x) for all x in X. An L-fuzzy set R in X � X is called a binary fuzzy relation on X. For all x and y in X, R(x, y) expresses the degree to which x and y are related through R. For every y in X, the R-foreset of y is an L-fuzzy set in X, denoted as Ry and defined by Ry(x) ¼ R(x, y) for all x in X. Logical operators L-fuzzy set theoretical operations such as complement, intersection and union can be defined by means of suitable generalizations of the well-known connectives from Boolean logic. Negation, conjunction, disjunction and implication can be generalized respectively to negator, triangular norm, triangular conorm and implicator, all mappings taking values in L. More specifically, a negator in L is any decreasing L-L mapping N satisfying N(0L) ¼ 1L. It is called involutive if N(N(x)) ¼ x for all x in L. A triangular norm (t-norm for short) T in L is any increasing, commutative and associative L 2-L mapping satisfying T(1L, x) ¼ x, for all x in L. A triangular conorm (t-conorm for short) S in L is any increasing, commutative and associative L 2-L mapping satisfying S(0L, x) ¼ x, for all x in L. The N- complement of an L-fuzzy set A in X as well as the T- intersection and the S-union of L-fuzzy sets A and B in X are the L-fuzzy sets coN(A), A-T B and A,S B defined by coNðAÞðxÞ ¼ NðAðxÞÞ A \T BðxÞ ¼ TðAðxÞ; BðxÞÞ A [S BðxÞ ¼ SðAðxÞ; BðxÞÞ for all x in X. The dual of a t-conorm S in L with respect to a negator N in L is a t-norm T in L defined as T(x, y) ¼ N(S(N(x), N(y))). An implicator in L is any L2-L mapping I satisfying I(0L, 0L) ¼ 1L, I(1L, x) ¼ x for all x in L. Moreover we require I to be decreasing in its first, and increasing in its second, component. If S and N are respectively a t-conorm and a negator in L, then it is well known that the mapping IS,N defined by IS;Nðx; yÞ ¼ SðNðxÞ; yÞ is an implicator in L, usually called S-implicator (induced by S and N). Likewise, if T is a t-norm in L, the mapping IT defined by ITðx; yÞ ¼ supfljl 2 L and Tðx; lÞ �L yg is an implicator in L, usually called the residual implicator (of T). 
The partial mappings of a t-norm T in L are sup- morphisms if T sup i2I xi; y � � ¼ sup i2I Tðxi; yÞ for every family I of indexes. Examples It is easy to verify that the meet and the join operation on L are respectively a t-norm and a t-conorm on L. We denote them by TM and SM respectively. Also A-B is a shorter notation for A-TMB, while A,B corresponds to A,SMB. The [0, 1]-[0, 1] mapping Ns defined as Ns(x) ¼ 1 � x, for all x in [0, 1], is a negator on [0, 1], often called the standard negator. For a [0, 1]-fuzzy set A, coNsðAÞ is commonly denoted by co(A). Table 1 depicts the values of well-known t-norms and t-conorms on [0, 1], for all x and y in [0, 1]. The first column of Table 2 Q1 Q2 262 ________________________________________________________________ Expert Systems, November 2003, Vol. 20, No. 5 EXSY : 02005003 BWUK EXSY 02005003.PDF 08-Aug-03 16:38 226131 Bytes 10 PAGES UN CO RR EC TE D PR O O F shows the values of the S-implicators on [0, 1] induced by the t-conorms of Table 1 and the standard negator Ns, while the second column lists the values of the corresponding residual implicators. Finally we mention that an L-fuzzy relation R on X is called an L-fuzzy T-equivalence relation if it is reflexive (R(x, x) ¼ 1L, for all x in X), symmetrical (R(x, y) ¼ R(y, x), for all x and y in X) and T-transitive (T(R(x, y), R(y, z))rR(x, z), for all x, y and z in X). When L ¼ {0, 1}, L-fuzzy set theory coincides with traditional set theory, in this context also called crisp set theory. {0, 1}-fuzzy sets, {0, 1}-fuzzy relations,y are usually also called crisp sets, crisp relations,y. When L ¼ [0, 1], fuzzy set theory in the sense of Zadeh is recovered. [0, 1]-fuzzy sets, [0, 1]-fuzzy relations,y are commonly called fuzzy sets, fuzzy relations,y. Furthermore it is customary to omit the indication ‘in [0, 1]’ when describing the logical operators, and hence to talk about negators, triangular norms etc. Yet another interesting choice for L gives rise to intuitionistic fuzzy set theory, as described below. 2.3. Intuitionistic fuzzy set theory A particularly interesting lattice of membership degrees leads to intuitionistic fuzzy set (IFS) theory (Atanassov, 1999). This theory basically defies the claim that, from the fact that an element x ‘belongs’ to a given degree (say mA(x)) to a fuzzy set A, it naturally follows that x should ‘not belong’ to A to the extent 1 � mA(x), an assertion implicit in the concept of a fuzzy set. On the contrary, IFSs assign to each element x of the universe both a degree of membership mA(x) and one of non-membership nA(x) such that mA(x) þ nA(x)r1, thus relaxing the enforced duality nA(x) ¼ 1 � mA(x) from fuzzy set theory. Obviously, when mA(x) þ nA(x) ¼ 1 for all elements of the universe, the traditional fuzzy set concept is recovered. By complementing the membership degree with a non- membership degree that expresses to what extent the element does not belong to the IFS, such that the sum of the degrees does not exceed 1, a whole spectrum of knowledge not accessible to fuzzy sets can be accessed. The applica- tions of this simple idea are manyfold indeed: it may be used to express positive as well as negative preferences; in a logical context, with a proposition a degree of truth and one of falsity may be associated; within databases, it can serve to evaluate the satisfaction as well as the violation of relational constraints. 
More generally, IFSs address the fundamental two-sidedness of knowledge, of positive versus negative information, and by not treating the two sides as exactly complementary (like fuzzy sets do), a margin of hesitation is created. This hesitation is quantified for each x in X by the number pA(x) ¼ 1 � mA(x) � nA(x). IFSs can be considered as special instances of L-fuzzy sets (Deschrijver et al., 2002). Let (L*,rL*) be the complete, bounded lattice defined by L� ¼ fðx1; x2Þ 2 ½0; 1�2jx1 þ x2 � 1g ðx1; x2Þ �L� ðy1; y2Þ , x1 � y1 and x2 � y2 The units of this lattice are denoted 0L* ¼ (0, 1) and 1L* ¼ (1, 0). For each element xAL*, by x1 and x2 we denote its first and second components, respectively. An IFS A in a universe X is a mapping from X to L*. For every xAX, the value mA(x) ¼ (A(x))1 is called the membership degree of x to A; the value nA(x) ¼ (A(x))2 is called the non-member- ship degree of x to A; and the value pA(x) is called the hesitation degree of x to A. Just as L*-fuzzy sets are called IFSs, L*-fuzzy relations are called IF relations. Logical operators The terms IF negator, IF t-norm, IF t- conorm and IF implicator are used to denote respectively a negator in L*, a t-norm in L*, a t-conorm in L* and an implicator in L*. A t-norm T on L* (respectively t-conorm S) is called t-representable (Deschrijver et al., 2002) if there exists a t-norm T and a t-conorm S on [0, 1] (respectively a t-conorm S0 and a t-norm T0 on [0, 1]) such that, for x ¼ (x1, x2), y ¼ (y1, y2)AL*, Tðx; yÞ ¼ ðTðx1; y1Þ; Sðx2; y2ÞÞ Sðx; yÞ ¼ ðS0ðx1; y1Þ; T0ðx2; y2ÞÞ T and S (respectively S0 and T0) are called the representants of T (respectively S). Finally, denoting the first projection mapping on L* by pr1, we recall from Deschrijver et al., (2002) that the [0, 1]-[0, 1] mapping N defined by N(a)¼ pr1N(a, 1� a) for all a in [0, 1] is an involutive negator on [0, 1], if N is an involutive negator on L*. N is called the negator induced by N. Furthermore N(x1, x2)¼ (N(1� x2),1� N(x1)), for all x in L*. Examples The standard IF negator is defined by Ns(x) ¼ (x2, x1), for all x in L*. The meet and the join Table 1: Triangular norms and conorms on [0, 1] t-norm t-conorm TM(x, y) ¼ min(x, y) SM(x, y) ¼ max(x, y) TP(x, y) ¼ x,y SP(x, y) ¼ x þ y � x,y TW(x, y) ¼ max(x þ y � 1, 0) SW(x, y) ¼ min(x þ y, 1) Table 2: S-implicators and residual implicators on [0, 1] S-implicator Residual implicator ISM;Nsðx; yÞ ¼ maxð1 � x; yÞ ITM ðx; yÞ ¼ 1 if x � y y else � ISP;Nsðx; yÞ ¼ 1 � x þ x; y ITP ðx; yÞ ¼ 1 if x � y y x else � ISW;Nsðx; yÞ ¼ minð1 � x þ y; 1Þ ITW ðx; yÞ ¼ minð1 � x þ y; 1Þ Expert Systems, November 2003, Vol. 20, No. 5 ________________________________________________________________ 263 EXSY : 02005003 BWUK EXSY 02005003.PDF 08-Aug-03 16:38 226131 Bytes 10 PAGES UN CO RR EC TE D PR O O F operators on L* are respectively the IF t-norm TM and the IF t-conorm SM defined by TMðx; yÞ ¼ ðminðx1; y1Þ; maxðx2; y2ÞÞ SMðx; yÞ ¼ ðmaxðx1; y1Þ; minðx2; y2ÞÞ Combining TW and SW of Table 1 gives rise to the t- representable IF t-norm TW and IF t-conorm SW defined by TWðx; yÞ ¼ ðmaxð0; x1 þ y1 � 1Þ; minð1; x2 þ y2ÞÞ SWðx; yÞ ¼ ðminð1; x1 þ y1Þ; maxð0; x2 þ y2 � 1ÞÞ However, TL and SL are also possible extensions of TW and SW to IF theory: TLðx; yÞ ¼ ðmaxð0; x1 þ y1 � 1Þ; minð1; x2 þ 1 � y1; y2 þ 1 � x1ÞÞ SLðx; yÞ ¼ ðminð1; x1 þ 1 � y2; y1 þ 1 � x2Þ; maxð0; x2 þ y2 � 1ÞÞ They are not, however, t-representable. 
All of these IF t- conorms induce IF S-implicators ISM;Nsðx; yÞ ¼ðmaxðx2; y1Þ; minðx1; y2ÞÞ ISW;Nsðx; yÞ ¼ðminð1; x2 þ y1Þ; maxð0; x1 þ y2 � 1ÞÞ ISL; Nsðx; yÞ ¼ðminð1; y1 þ 1 � x1; x2 þ 1 � y2Þ; maxð0; y2 þ x1 � 1ÞÞ while the IF t-norms have residual IF implicators: ITMðx; yÞ ¼ 1L� if x1 � y1 and x2 � y2 ð1 � y2; y2Þ if x1 � y1 and x2oy2 ðy1; 0Þ if x14y1 and x2 � y2 ðy1; y2Þ if x14y1 and x2oy2 8>>< >>: ITWðx; yÞ ¼ ðminð1; 1 þ y1 � x1; 1 þ x2 � y2Þ; maxð0; y2 � x2ÞÞ ITL equals TsL;Ns. 2.4. Fuzzy rough set theory The two main streams in the perception of the notion ‘rough set’ both invoke generalizations to a ‘fuzzy rough set’ notion, intended to approximate a fuzzy set in a fuzzy approximation space, i.e. defined by a fuzzy relation. 1 Many people worked on the fuzzification of upper and lower approximations in the spirit of Pawlak (e.g. Nakamura, 1988; Dubois & Prade, 1990; Yao, 1997, 1998; Thiele, 1998; Radzikowska & Kerre, 2002). In doing so, the central focus moved from elements’ indistinguish- ability (with respect to their attribute values in an information system) to their similarity, again with respect to those attribute values: objects are categorized into classes with ‘soft’ boundaries based on their similarity to one another. A concrete advantage of such a scheme is that abrupt transitions between classes are replaced by gradual ones, allowing an element to belong (to varying degrees) to more than one class. An example at hand is an attribute in an information table that records ages: in order to restrict the number of equivalence classes, the classical rough set theory advises to discretize age values by a crisp partition of the universe, e.g. using intervals [0, 10], [10, 20],y. This does not always reflect our intuition, however: by imposing such harsh boundaries, a person who has just turned 11 will not be taken into account in the [0, 10] class, even when he is only at a minimal remove from full membership in that class. A general definition of fuzzy rough set, absorbing earlier suggestions in the same direction, was given by Radzi- kowska and Kerre (2002). Paraphrasing the following relations which hold in the crisp case y 2 A , ð8x 2 XÞððx; yÞ 2 R ) x 2 AÞ y 2 �AA , ð9x 2 XÞððx; yÞ 2 R and x 2 AÞ they define the lower and upper approximations of a fuzzy set A in X as the fuzzy sets A and �AA in X, constructed by means of an implicator I, a t–norm T and a fuzzy T- equivalence relation R in X: AðyÞ ¼ inf x2X IðRðx; yÞ; AðxÞÞ AðyÞ ¼ sup x2X TðRðx; yÞ; AðxÞÞ for all y in X. For an element y in X, its membership degree in the lower approximation of A is determined by looking at the elements resembling y (the foreset Ry) and by computing to what extent Ry is contained in A. Its membership degree in the upper approximation on the other hand is determined by the overlap between Ry and A. Technically, these operations amount to taking the super- direct image and the direct image respectively of A under the fuzzy relation R, (De Cock, 2002). A couple of fuzzy sets (A1, A2) is called a rough set in the approximation space (X, R, T, I) if there exists a fuzzy set A such that A ¼ A1 and �AA ¼ A2. A suggestion for fuzzification of the second stream view was launched by Nanda and Majumdar (1992). Bearing in mind Iwinski’s original view one would expect a fuzzy rough set to be a couple (A, B) of fuzzy sets A and B, both coming from some kind of algebra and such that ADB. However, Nanda and Majumdar chose to construct something that might be called ‘the fuzzy rough set of a rough set’. 
Starting from an Iwinski rough set, i.e. a couple (P, Q) of sets P and Q in X, a fuzzy rough set is a couple of fuzzy sets (A, B). A is a fuzzy set in P, while B is a fuzzy set in Q. Furthermore the first should be included in the second. When trying to express such a requirement one 1 Yao (1997) used ‘fuzzy rough set’ to denote the approximation of a crisp set in a fuzzy approximation space, whereas in his view a ‘rough fuzzy set’ gives an approximation of a fuzzy set in a crisp approximation space. Most authors, like us, use ‘fuzzy rough sets’ as a general term to cover ‘fuzzified rough set theory’. 264 ________________________________________________________________ Expert Systems, November 2003, Vol. 20, No. 5 EXSY : 02005003 BWUK EXSY 02005003.PDF 08-Aug-03 16:38 226131 Bytes 10 PAGES UN CO RR EC TE D PR O O F already notices that it is not very convenient that both fuzzy sets are defined on different universes. This is why authors introduce their own ‘tricks’ for extending the universe (e.g. extending the universes of A and B to X (Chakrabarty et al., 1998; Çoker, 1998), extending the smaller universe (that of A) to the bigger one (that of B) (Samanta & Mondal, 2001)). In the literature we did not find any semantic motivation behind this kind of fuzzy rough sets, or any applications using them. Still for many authors this notion of fuzzy rough set seems to be an attractive starting point for the introduction of intuitionistic fuzzy rough sets (Chakrabarty et al., 1998; Samanta & Mondal, 2001; Jena & Ghosh, 2002). Before we go into that topic, however, we will focus on the links between fuzzy rough set theory and IFS theory. 3. Fuzzy rough set theory versus IFS theory Along with the lower and upper approximations, rough set theory provides two kinds of membership: if an element belongs to the lower (respectively upper) approximation of A, we are dealing with strong (respectively weak) member- ship of A (Pawlak, 1982). It is very natural to extend this idea to fuzzy rough set theory: the strong membership function of A is the membership function of the lower approximation of A, while the weak membership function of A is the membership function of the upper approxima- tion of A. In IFS theory we are also dealing with two kinds of functions, namely a membership function m and a non- membership function n such that m � coðnÞ ð1Þ Note that the strong membership function of the fuzzy rough set can be considered as the membership function m of an IFS, while the complement of the weak membership function of the fuzzy rough set can be used as the non- membership function n. This remark (Çoker, 1998) immediately reveals the strong link between rough set theory and IFS theory. One might even argue that IFS theory could really draw profit from fuzzy rough set theory, because the latter brings a means to construct the membership and non-membership functions of an IFS from a fuzzy set of examples and a fuzzy information relation (a fuzzy relation modelling similarity). It is a far greater challenge, however, to imagine what we could do if the input fed to a knowledge-based system carries information not only about the positive side but also about the negative side. How can we additionally benefit if the set of examples is IF, if the information relation is IF? To answer this question, in the next section we lift the study yet one level higher, by exploring different ways of introducing the concept ‘intuitionistic fuzzy rough set’. 
Keeping in mind what we learned in this section, we expect it to be ‘a couple of couples of functions’ (be it (strong or weak) membership functions, or non-membership func- tions). 4. Intuitionistic fuzzy rough sets A very natural way to extend concepts from fuzzy set theory to their generalizations in IFS theory is the replacement of [0, 1] by L* as the evaluation set for the membership degrees. � Doing so in Nanda and Majumdar’s view on fuzzy rough sets leads to Chakrabarty et al.’s (1998), approach to intuitionistic fuzzy rough sets; they construct an IF rough set (A, B) of a rough set (P, Q). A and B are both IFSs in X such that A � B; i:e: mA � mB and nA nB From this point of view the lower approximation A and the upper approximation B are both IFSs. In other words, the strong membership is itself characterized by a membership function mA and a non-membership function nA, while the membership function mB and the non-membership function nB together constitute the weak membership. In this way we can reflect hesitation on the strong and the weak membership. Jena and Ghosh (2002) reintroduce the same notion. � Samanta and Mondal (2001) also introduce this notion but they call it a rough IF set. Furthermore they also define their concept of IF rough set, looking at it from a different angle: in their approach an IF rough set is a couple (A, B) such that A and B are both fuzzy rough sets (in the sense of Nanda and Majumdar) and A is included in the complement of B, i.e. A � coðBÞ ð2Þ Even though we omit the details of the definition of complement and inclusion of fuzzy rough sets here (cf. (Samanta and Mondal (2001) for the full story), the reader can still compare (2) to (1) to see that A refers to membership of the IF rough set, while B corresponds to non-membership. � Obviously to Samanta and Mondal an intuitionistic fuzzy rough set (A, B) is a generalization of an IFS in which membership and non-membership functions are no longer fuzzy sets but fuzzy rough sets A and B. Note that for Chakrabarty et al. on the other hand an intuitionistic fuzzy rough set (A, B) is a generalization of a fuzzy rough set, in which upper and lower approximations are no longer fuzzy sets but IFSs A and B. � In contrast to the proposals above, the approach of Rizvi et al. (2002) is along the lines of Pawlak’s rough sets, explicitly indicating how lower and upper approx- imations of an IFS should be derived in an approxima- tion space. Rizvi et al. describe their proposal as ‘rough intuitionistic fuzzy set’, and in comparison with the Expert Systems, November 2003, Vol. 20, No. 5 ________________________________________________________________ 265 EXSY : 02005003 BWUK EXSY 02005003.PDF 08-Aug-03 16:38 226131 Bytes 10 PAGES UN CO RR EC TE D PR O O F approaches above one could say that it is again about having a hesitation margin on the weak and strong memberships, i.e. on the upper and lower approxima- tions. The hesitation margin on the lower and upper approximations can be seen as a consequence of the hesitation margin of the original IFS to be approxi- mated. Remarkably enough, the lower and upper approximations themselves are not IFSs in X but IFSs in the class of equivalence classes of R! The question remains unanswered why the definition of Radzikowska and Kerre (2002) — which has existed already in a more specific form for more than a decade — did not undergo the natural transformation process towards intuitionistic fuzzy rough set theory until now. 
In the remainder of this paper it becomes clear that it in fact leads to a mathematically elegant and semantically interpretable concept whereas the above-mentioned pro- posals all suffer from various drawbacks making them less eligible for applications. Until now we have been using the notations A and �AA to denote the lower and the upper approximation of A (whatever A may be), tacitly assuming that the (fuzzy) relation as well as the t-norm and the implicator are used. This was done not only for ease of notation, but also to keep a uniform notation throughout all the different models. However, to maintain clearness in the remainder, we switch to a slightly different notation. Furthermore from now on we assume that T is an IF triangular norm, I an IF implicator and R an IF T- equivalence relation in X. Together they constitute the approximation space (X, R, T, I). We use A and B to denote IFSs in X. As usual we start by defining concepts of lower and upper approximation. Definition 1 The lower and upper approximations of A are respectively the IFSs RkIA and RmTA in X defined by R #I AðyÞ ¼ inf x2X IðRðx; yÞ; AðxÞÞ R "T AðyÞ ¼ sup x2X TðRðx; yÞ; AðxÞÞ for all y in X. We say that a couple of IFSs (A1, A2) is an intuitionistic fuzzy rough set in the approximation space (X, R,T, I) if there exists an IFS A such that RkIA ¼ A1 and RmTA ¼ A2. The lower approximation of A is included in A, while A is included in its upper approximation. Proposition 1 (De Cock, 2002) RkIADADRmTA. The following propositions describe how the lower and upper approximations behave with respect to a refinement of the IFS to be approximated, or a refinement of the IF relation that defines the approximation space. Proposition 2 (De Cock, 2002) If ADB then RkIAD RkIB and RmTADRmTB. Proposition 3 (De Cock, 2002) If R1 and R2 are IF T- equivalence relations such that R1DR2 then R1 #I A R2 #I A R1 "T A � R2 "T A We now define a concept of definability, similar to the one in rough set theory. Definition 2 A is called definable if and only if RkIA ¼ RmTA. In classical rough set theory, a set is definable if and only if it is a union of equivalence classes. This property no longer holds in intuitionistic fuzzy rough set theory. However, if we imply sufficient conditions on the IF t-norm T and the IF implicator I defining the approximation space, we can still establish a weakened theorem. For the proof we rely on the following two propositions. Proposition 4 (De Cock, 2002) If the partial mappings of T are sup-morphisms and I is the residual implicator of T then the following are equivalent (1) A ¼ RkIA (2) A ¼ RmTA Proposition 5 (De Cock, 2002) If the partial mappings of T are sup-morphisms and I is the residual implicator of T then R "T ðR "T AÞ ¼ R "T A R #I ðR #I AÞ ¼ R #I A Corollary 1 If the partial mappings of T are sup- morphisms and I is the residual implicator of T then R "T ðR #I AÞ ¼ R #I A R #I ðR "T AÞ ¼ R "T A Theorem 1 If the partial mappings of T are sup-morphisms and I is the residual implicator of T then any union of R- foresets is definable, i.e. ð9BÞ B � X and A ¼ [ z2B Rz ! implies R #I A ¼ R "T A Proof Due to the symmetry and the T-transitivity of R and the fact that the partial mappings of T are sup- morphisms, we obtain R "T AðyÞ ¼ sup x2X TðRðx; yÞ; AðxÞÞ ¼ sup x2X TðRðx; yÞ; sup z2B Rðz; xÞÞ ¼ sup z2B sup x2X TðRðx; yÞ; Rðz; xÞÞ � sup z2B Rðz; yÞ ¼AðyÞ 266 ________________________________________________________________ Expert Systems, November 2003, Vol. 20, No. 
5 EXSY : 02005003 BWUK EXSY 02005003.PDF 08-Aug-03 16:38 226131 Bytes 10 PAGES UN CO RR EC TE D PR O O F Proposition 1 now implies A ¼ RmTA. Combining this result with Proposition 4 we obtain the definability of A. The following example illustrates that the opposite of Theorem 1 does not generally hold, i.e. not every definable set corresponds to a union of R-foresets. Example 2 Let X ¼ {x0}, R(x0, x0) ¼ (1, 0), A(x0) ¼ (0.5, 0.5). Then R "T Aðx0Þ ¼ TðRðx0; x0Þ; Aðx0ÞÞ ¼ ð0:5; 0:5Þ ¼ Aðx0Þ That is, A is definable. Since the only R-foreset is Rx0 and Rx0(x0)>L* A(x0) we cannot rewrite A as a union of R- foresets. Under the same conditions implied on T and I as in Theorem 1, the SM-union and TM-intersection of two definable IFSs is definable. This is a corollary of the next proposition. Proposition 6 (De Cock, 2002) If the partial mappings of T are sup-morphisms and I is the residual implicator of T then R "T ðA [ BÞ ¼ R "T A [ R "T B R #I ðA \ BÞ ¼ R #I A \ R #I B with AB and A-B defined as in Section 2.2, i.e. A [ BðxÞ ¼ ðminðmAðxÞ; mBðxÞÞ; maxðnAðxÞ; nBðxÞÞÞ A \ BðxÞ ¼ ðmaxðmAðxÞ; mBðxÞÞ; minðnAðxÞ; nBðxÞÞÞ Proposition 7 Let T be a t-representable IF t-norm such that T ¼ (T, S) and such that S(x, y) ¼ 1 � T(1 � x, 1 � y), and let R1, R2 be two fuzzy T-equivalence relations such that R1(x, y)rR2(x, y), for all x and x in X. Then R defined by Rðx; yÞ ¼ ðR1ðx; yÞ; 1 � R2ðx; yÞÞ for x and y in X, is an IF T-equivalence relation. Proof Reflexivity and symmetry of R follow immediately from the corresponding properties of R1 and R2. To prove T-transitivity, let x, y and zAX. TðRðx; yÞ; Rðy; zÞÞ ¼ðTðR1ðx; yÞ; R1ðy; zÞÞ; Sð1 � R2ðx; yÞ; 1 � R2ðy; zÞÞÞ ¼ðTðR1ðx; yÞ; R1ðy; zÞÞ; 1 � TðR2ðx; yÞ; R2ðy; zÞÞÞ �L� ðR1ðx; zÞ; 1 � R2ðx; zÞÞ ¼Rðx; zÞ Example 3 Let X ¼ [0, 100], and let the fuzzy TW- equivalence relation Ec on X be defined by Ecðx; yÞ ¼ max 1 � jx � yj c ; 0 � � for all x and y in X, and with real parameter c>0. Obviously, if c1rc2 then Ec1ðx; yÞ � Ec2ðx; yÞ. By Proposi- tion 7, ðEc1; coðEc2ÞÞ is an IF TW-equivalence relation. Example 4 Figure 1 shows the membership function mA and the non-membership function nA of the IFS A in the universe X ¼ [0, 100]. Using the non-t-representable IF t- norm TL, its residual IF implicator ITL and the IF relation R defined by Rðx; yÞ ¼ ðE40ðx; yÞ; 1 � E40ðx; yÞÞ for all x and y in [0, 100] we computed the lower approximation of A( ¼ A1) as well as the upper approxima- tion of A( ¼ A2). They are both depicted in Figure 1. An IFS A is characterized by means of a membership function mA and a non-membership function nA. A natural question which arises is whether the lower approximation and the upper approximation of A could be defined in terms of the lower and the upper approximation of mA and nA (all within the proper approximation spaces of course). Generally such a ‘divide and conquer’ approach is every- thing but trivial in IFS theory, and sometimes even impossible. However, some conditions implied on the logical operators involved can allow for some results in this direction. Particularly attractive are the t-representable t- norms and t-conorms, and the S-implicators that can be associated with them. Lemma 1 For every family (ai, bi)iAI in L* sup i2I ðai; biÞ ¼ sup i2I ai; inf i2I bi � � inf i2I ðai; biÞ ¼ inf i2I ai; sup i2I bi � � Proposition 8 Let T be t-representable such that T ¼ (T, S), let N be an involutive negator on [0, 1], and let I Figure 1: A (solid lines), and the upper (broken lines) and lower (dotted lines) approximations of A. 
Proposition 8 Let T be t-representable such that T = (T, S), let N be an involutive negator on [0, 1], and let I be the S-implicator on [0, 1] induced by S and N. Then

R↑T A = (μR ↑T μA, (coN(νR)) ↓I νA)

Proof For all y in X

R↑T A(y) = sup_{x ∈ X} T(R(x, y), A(x))
         = sup_{x ∈ X} (T(μR(x, y), μA(x)), S(νR(x, y), νA(x)))
         = (sup_{x ∈ X} T(μR(x, y), μA(x)), inf_{x ∈ X} I(N(νR(x, y)), νA(x)))
         = ((μR ↑T μA)(y), (coN(νR) ↓I νA)(y))

Proposition 9 Let S be a t-representable IF t-conorm such that S = (S, T), let N be an involutive IF negator, let I be the IF S-implicator induced by S and N, let N be the negator on [0, 1] induced by N and let I be the S-implicator induced by S and N. Then

R↓I A = ((co νR) ↓I μA, co(coN μR) ↑T νA)

Proof For all y in X

R↓I A(y) = inf_{x ∈ X} I(R(x, y), A(x))
         = inf_{x ∈ X} S(N(R(x, y)), A(x))
         = (inf_{x ∈ X} S(N(1 − νR(x, y)), μA(x)), sup_{x ∈ X} T(1 − N(μR(x, y)), νA(x)))
         = (inf_{x ∈ X} I(co νR(x, y), μA(x)), sup_{x ∈ X} T(co(coN(μR))(x, y), νA(x)))
         = ((co νR ↓I μA)(y), (co(coN(μR)) ↑T νA)(y))

Observe that in both propositions on the 'fuzzy level' the approximations are taken under the membership function μR, or something semantically very much related, such as the N-complement of the non-membership function νR, or once even the standard complement of the N-complement of μR. Presented in this way, the resulting formulae look quite complicated. For better understanding, assume for a moment that the negator N is the standard negator, and that the IF relation R is in reality a fuzzy relation, i.e. μR = co(νR); then the formulae reduce to

R↑T A = (μR ↑T μA, μR ↓I νA)
R↓I A = (μR ↓I μA, μR ↑T νA)

Apparently in this case the membership function of the upper approximation of A is the upper approximation of the membership function of A, while the non-membership function of the upper approximation of A is the lower approximation of the non-membership function of A. For the lower approximation of A the dual proposition holds.

Example 5 Figure 2 shows the same IFS A we used in Example 4. However, to compute its lower approximation A1 and its upper approximation A2, this time we used the t-representable IF t-norm TW, its residual IF implicator and the IF relation R defined by

R(x, y) = (E30(x, y), 1 − E50(x, y))

for all x and y in [0, 100].

5. Conclusion

Rough sets and IFSs both capture particular facets of the same notion: imprecision. In this paper, it was shown how they can be usefully combined into a single framework encapsulating the best of (so far) largely separate worlds. The link, on the syntactical level, between fuzzy rough sets and IFSs, identified by Çoker, has not proven much of an obstacle in this sense: indeed, by allowing each ingredient to retain its own distinguishing semantics, it was possible to create an end product which is both syntactically sound and semantically meaningful. We feel especially justified in our cause since we have exploited to the fullest a time-honoured adage: namely, that to every object there is a positive and a negative side which need to be addressed individually in order to come up with a true representation of that object. Future work will involve tailoring this general framework to the specific needs of everyday applications.

Acknowledgement

Chris Cornelis and Martine De Cock would like to thank the Fund for Scientific Research Flanders (FWO) for funding the research reported in this paper.

Figure 2: Upper (broken lines) and lower (dotted lines) approximations of A.
References

ATANASSOV, K.T. (1999) Intuitionistic Fuzzy Sets, Heidelberg: Physica.
CHAKRABARTY, K., T. GEDEON and L. KOCZY (1998) Intuitionistic fuzzy rough sets, in Proceedings of the Fourth Joint Conference on Information Sciences, 211–214.
ÇOKER, D. (1998) Fuzzy rough sets are intuitionistic L-fuzzy sets, Fuzzy Sets and Systems, 96, 381–383.
DE COCK, M. (2002) A thorough study of linguistic modifiers in fuzzy set theory, PhD thesis (in Dutch), Ghent University.
DESCHRIJVER, G., C. CORNELIS and E.E. KERRE (2002) Intuitionistic fuzzy connectives revisited, in Proceedings of the 9th International Conference on Information Processing and Management of Uncertainty in Knowledge-based Systems, 1839–1844.
DUBOIS, D. and H. PRADE (1990) Rough fuzzy sets and fuzzy rough sets, International Journal of General Systems, 17, 191–209.
GENTILHOMME, Y. (1968) Les ensembles floues en linguistique, Cahiers de Linguistique Théorique, 5, 47–63.
GOGUEN, J. (1967) L-fuzzy sets, Journal of Mathematical Analysis and Applications, 18, 145–174.
IWINSKI, T.B. (1987) Algebraic approach to rough sets, Bulletin of the Polish Academy of Science and Mathematics, 35, 673–683.
JENA, S.P. and S.K. GHOSH (2002) Intuitionistic fuzzy rough sets, Notes on Intuitionistic Fuzzy Sets, 8 (1), 1–18.
KACPRZYK, J. (ed.) (1992–2002) Studies in Fuzziness and Soft Computing, Heidelberg: Physica, Vols 1–100.
KOMOROWSKI, J., Z. PAWLAK, L. POLKOWSKI and A. SKOWRON (1999) Rough sets: a tutorial, in Rough Fuzzy Hybridization: A New Trend in Decision-Making, S.K. Pal and A. Skowron (eds), Singapore: Springer.
NAKAMURA, A. (1988) Fuzzy rough sets, Note on Multiple-Valued Logic in Japan, 9, 1–8.
NANDA, S. and S. MAJUMDAR (1992) Fuzzy rough sets, Fuzzy Sets and Systems, 45, 157–160.
PAWLAK, Z. (1982) Rough sets, International Journal of Computer and Information Sciences, 11 (5), 341–356.
PAWLAK, Z. (1985) Rough sets and fuzzy sets, Fuzzy Sets and Systems, 17, 99–102.
RADZIKOWSKA, A.M. and E.E. KERRE (2002) A comparative study of fuzzy rough sets, Fuzzy Sets and Systems, 126, 137–156.
RIZVI, S., H.J. NAQVI and D. NADEEM (2002) Rough intuitionistic fuzzy sets, in Proceedings of the 6th Joint Conference on Information Sciences, H.J. Caulfield, S. Chen, H. Chen, R. Duro, V. Honavar, E.E. Kerre, M. Lu, M.G. Romay, T.K. Shih, D. Ventura, P.P. Wang and Y. Yang (eds), 101–104.
SAMANTA, S.K. and T.K. MONDAL (2001) Intuitionistic fuzzy rough sets and rough intuitionistic fuzzy sets, Journal of Fuzzy Mathematics, 9 (3), 561–582.
THIELE, H. (1998) Fuzzy rough sets versus rough fuzzy sets—an interpretation and a comparative study using concepts of modal logic, Technical Report ISSN 1433-3325, University of Dortmund.
THIELE, H. (2001) Generalizing the explicit concept of rough set on the basis of modal logic, in Computational Intelligence in Theory and Practice, B. Reusch and K.H. Temme (eds), Berlin: Physica.
YAO, Y.Y. (1996) Two views of the theory of rough sets in finite universes, International Journal of Approximate Reasoning, 15 (4), 291–317.
YAO, Y.Y. (1997) Combination of rough and fuzzy sets based on alpha-level sets, in Rough Sets and Data Mining: Analysis for Imprecise Data, T.Y. Lin and N. Cercone (eds), Boston, MA: Kluwer Academic, 301–321.
YAO, Y.Y.
(1998) A comparative study of fuzzy sets and rough sets, Information Sciences, 109, 227–242.
ZADEH, L.A. (1965) Fuzzy sets, Information and Control, 8, 338–353.

The authors

Chris Cornelis

Chris Cornelis received the BSc and MSc degrees in computer science from Ghent University, Belgium, in 1998 and 2000, respectively, and is currently working towards his PhD degree, sponsored by a grant from the National Science Foundation—Flanders. His research interests involve mathematical models of imprecision, including interval-valued fuzzy sets, intuitionistic fuzzy sets, L-fuzzy sets, rough sets and various combinations of these. He has applied these models in approximate reasoning, possibility theory and data mining.

Martine De Cock

Martine De Cock received an MSc degree in computer science and a teaching degree at Ghent University in 1998. In 2002 she obtained a PhD in computer science with a thesis on the representation of linguistic modifiers in fuzzy set theory. Currently she is working in the Fuzziness and Uncertainty Modelling Research Unit (Ghent University) as a postdoctoral researcher of the Fund of Scientific Research—Flanders (FWO). Her current research interests include rough set theory, L-fuzzy set theory, fuzzy association rules, and the representation of approximate equality.

Etienne E. Kerre

Etienne E. Kerre was born in Zele, Belgium, on 8 May 1945. He obtained his MSc degree in mathematics in 1967 and his PhD in mathematics in 1970 from Ghent University. Since 1984, he has been a lecturer, and since 1991 a full professor, at Ghent University. He is a referee for more than 30 international scientific journals, and also a member of the editorial board of international journals and conferences on fuzzy set theory. He has been an honorary chairman at various international conferences. In 1976, he founded the Fuzziness and Uncertainty Modelling Research Unit (Ghent University) and since then his research has been focused on the modeling of fuzziness and uncertainty, and has resulted in a great number of contributions in fuzzy set theory and its various generalizations, and in evidence theory. He has been particularly involved in the theories of fuzzy relational calculus and of fuzzy mathematical structures. Over the years he has also been a promotor of 16 PhDs on fuzzy set theory. His current research interests include fuzzy and intuitionistic fuzzy relations, fuzzy topology and fuzzy image processing. He has authored or co-authored 11 books, and more than 100 papers in international refereed journals.

work_3czyinsxobhzjdakbts7whwj5q ---- Solid/liquid separation equipment simulation & design – an expert systems approach

Solid/liquid separation equipment simulation & design – an expert systems approach
Journal contribution posted on 30.09.2009 by Richard J. Wakeman and Steve Tarleton (Tarleton 1.pdf, 260.79 kB).

Published texts on solid/liquid separation technology have allowed a limited amount of knowledge to be available widely, but none put forward the rules of thumb in such a manner that they are readily assimilable by the non-expert. There are many unpublished techniques and approaches. The computer technologist might target solid/liquid separation as a technology ripe for the application of expert systems.
It is argued here that expert systems on their own are inadequate in this subject area, but the most effective software utilises a well-chosen mix of algorithm, graphics, expert system, and interactive input from the engineer. In this paper some examples are given of both public and private knowledge, and an example of the application of the combined approach to equipment selection demonstrates that efficient software can save time and enable decisions to be made rapidly. Equipment selection using the pC-SELECT software is demonstrated.

Citation: WAKEMAN, R.J. and TARLETON, E.S., 1991. Solid/liquid separation equipment simulation & design – an expert systems approach. Filtration & Separation, 28 (4), pp. 268-274. Publisher: © Elsevier. Version: AM (Accepted Manuscript). DOI: https://doi.org/10.1016/0015-1882(91)80118-O. ISSN: 0015-1882. Licence: CC BY-NC-ND 4.0. Notes: This article was published in the journal Filtration & Separation [© Elsevier] and the definitive version is available at: http://dx.doi.org/10.1016/0015-1882(91)80118-O.

work_3f36lrvqwjdjdmkat65ktfoney ---- Expert System and Heuristics Algorithm for Cloud Resource Scheduling

Romanian Statistical Review nr. 1 / 2017

Expert System and Heuristics Algorithm for Cloud Resource Scheduling

Mamatha E (sricsrmax@gmail.com), Dept of Engineering Mathematics, GITAM University, Bangalore, India
Sasritha S, Dept of Engineering Mathematics, GITAM University, Bangalore, India
CS Reddy, School of Computing, SASTRA University, Thanjavur, India

ABSTRACT

Rule-based scheduling algorithms have been widely used on cloud computing systems and there is still plenty of room to improve their performance. This paper proposes to develop an expert system to allocate resources in the cloud by using a rule-based algorithm, measuring the performance of the system by letting the system adapt new rules based on feedback. Here the performance of each action helps to make a better allocation of the resources, improving quality of service, scalability and flexibility. The performance measure is based on how the allocation of the resources is dynamically optimized and how properly the resources are utilized. It aims to maximize the utilization of the resources. The data and resources are given to the algorithm, which allocates the data to resources, and an output is obtained based on the action that occurred. Once the action is completed, the performance of every action is measured, covering how the resources were allocated and how efficiently they worked. In addition to performance, resource allocation in a cloud environment is also considered.
Keywords: Cloud computing, Scheduling and Expert System, Heuristic Models JEL Classifi cation: C87 INTRODUCTION Cloud Computing, the long-held dream of computing industry, has the capability to change large IT industry, making software even more usable as a service and changing the way IT hardware is made and purchased. Developers with creative ideas for new Internet services no longer need the large capital costs in hardware to install their service or the human expense to operate it [1-5]. They need not be bothered about over-provisioning for a service whose attractiveness does not meet their predictions, thus wasting the Romanian Statistical Review nr. 1 / 20174 resources, or under-provisioning for one that becomes wildly accepted, thus missing potential customers and revenue. The prominent feature of cloud computing that it allows the consumption of services over internet with subscription based model. Based on the level of abstraction, various models for cloud computing like Infrastructure- as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS). This model of service consumption is extremely suitable for many workloads and cloud computing has become highly successful technology [5]. It allows its users to pay for what they use and also remove the upfront infrastructure cost. Cloud service providers receive resource requests from a number of users through the use of virtualization. It is very essential for cloud providers to operate very effi ciently in multiplexing at the scale to remain profi table. The increase in the use of cloud computing has risen to develop massive data centers with very large number of servers. The resource management at this scale is the concerned issue. Scheduling is responsible for arbitrate of resources and is at the center of resource management. The issue of effi ciency at this rate and developing model of consumption of cloud providers needs new approaches and techniques to be applied to the age old problem of scheduling. Virtual machine is the primary unit of scheduling in this model. In this study, we deal with problem of virtual machine scheduling over physical machines. We aim to understand and solve the various aspect of scheduling in cloud environments [6,7]. Specifi cally, we leverage different fi ne grained monitoring information to make better scheduling decisions and used learning based approach of scheduling in widely different environments. There is increasing concern over energy consumption by cloud data centers and cloud operators are focusing on energy savings through effective utilization of resources [8,9]. SLAs is also very important for them for the performance of applications running. We propose algorithms which try to minimize the energy consumption in the data center duly maintaining the SLA ensures. The algorithms try to use very less number of physical machines in the data center by dynamically rebalancing the physical machines based on their resource utilization. The algorithms also do an optimization of virtual machines on a physical machine, reducing SLA violations. For large scale distributed system autonomic management [10,11] is one of the most wanted features and even more important in dynamic infrastructures such as Clouds. This is self-managing such as self-healing, self-regulating, self-protecting, and self-improving. Good work in both academia and industry has been already carried out. 
Tsai [12] reported an overview of the early efforts in developing Metaheuristic scheduling for autonomic systems for storage management. Computing Grids have benefi ted Romanian Statistical Review nr. 1 / 2017 5 from the application of autonomic models for management of resources and the scheduling of applications [13,14,15]. Solutions for secure Cloud platforms have been proposed in the literature [16,17]. However, existing works are yet to address issues related to recognition of attacks against SaaS with the aim of exploiting elasticity. The basic idea of the proposed algorithm is to leverage the strengths of heuristic algorithms, such as swarm optimization [18], Scheduling Computer and Manufacturing Processes [19], Hadoop map- task scheduling [21,22] and ant colony optimization [20], by integrating them into a single algorithm. One of the important tasks for cloud provider is scheduling data packets for better achievement and effi cient resource pooling along with elasticity. In cloud computing scenarios scheduling algorithm becomes very signifi cant where the cloud service providers have to operate at very much competent to be competitive and take advantage at scale[23,24]. The wide acceptance of cloud computing means data centers with many more machines and the usage model is much different than traditional clusters, like hour boundaries, auction based prices are to name a few. Thus scheduling in cloud data center is more challenging than traditional cluster schedulers. Also, these data center run many different kinds of applications with varying expectations from infrastructure. Resource usage patterns in traditional data centers are have less variance in than the unpredictability faced by cloud data centers. PROBLEM ANALYSIS AND DEFINITION Since Rule based algorithms are straight forward and can be easily implemented, so there is lot of possibility to improve the performance of these algorithms especially in cloud environments. Conventional models and its corresponding algorithms are pretty good for small scale environments, however owing to the advent improvement of computer and related internet technological improvements; there is a huge range to enhancements in these algorithms to get better performance in large scale system. Extreme use of number of servers in the recent period has reduced the usage of traditional scheduling techniques. The resource management at this scale is the concerned as an issue. Scheduling is responsible for arbitrate of resources and is at the center of resource management. The issue of effi ciency at this rate and developing model of consumption of cloud providers wants new methodologies and techniques that to be extended to apply to presently available old algorithms of scheduling. The main objective of the paper is to develop and implement more effi cient scheduling algorithm suitable for cloud system. Since parallel and Romanian Statistical Review nr. 1 / 20176 distributed computing technologies was widely used to improve the diversity performance of computer systems, a number of models and thoughts have been anticipated for different approaches and congenital limitations in diverse eras. Whatever it may be the contemplation it is for; the way to competently use computer resources is a key research matter. Among all of them, scheduling is indispensable in the success of escalating the performance of the computing system. 
The wide acceptance of cloud computing means data centers with many more machines and the usage model is much different than traditional clusters, like hour boundaries, auction based prices are to name a few. Thus scheduling in cloud data center is more challenging than traditional cluster schedulers. PRESENT SYSTEM AND PROPOSED SCHEDULING ALGORITHM In this research paper, by the simple scheduling problems, we mean problems for which all the solutions can be checked in a reasonable time by using classical exhaustive algorithms running on modern computer systems. In comparison, with the large scale scheduling problems, like the problems for which not all the solutions can be examined in a reasonable time by using the same algorithms running on the same computer systems. These observations make it easy to understand that exhaustive algorithms will take a prohibitive amount of time to check all the candidate solutions for large scheduling problems because the number of candidate solutions is simply way too large to be checked in a reasonable time. As a result, researchers have paid their attention to the development of scheduling algorithms that are effi cient and effective, such as heuristics. Workfl ow is used with the automation of procedures where by fi les and data are passed between participants according to a defi ned set of rules to achieve an overall goal. A workfl ow management system is the one which manages and executes workfl ows on computing resources. Workfl ow Scheduling: It is a kind of global task scheduling as it focuses on mapping and managing the execution of inter-dependent tasks on shared resources that are not directly under its control. The authors classify and review hyper-heuristic approaches into the following four categories: based on the random choice of low level heuristics, greedy and puckish, meta heuristic-based, and those employing learning mechanisms to manage low level heuristics. The hyper heuristics can be used to operate at a higher level of abstraction. Meta heuristic techniques are expensive techniques that require knowledge in problem domain and heuristic technique. Hyper heuristic technique does not require problem Romanian Statistical Review nr. 1 / 2017 7 specifi c knowledge. In order to solve hard computational search problems the hyper heuristic techniques can be used. The hyper heuristic techniques can be operated on the search space of heuristics. Proposed System: The basic idea of the proposed algorithm is to use the diversity detection and improvement detection operators to balance the intensifi cation and diversifi cation in the search of the solutions during the convergence process. The proposed algorithm, called hyper-heuristic scheduling algorithm (HHSA).The parameters max and ni, where max denotes the maximum number of iterations the selected low-level heuristic algorithm is to be run; ni the number of iterations the solutions of the selected low- level heuristic algorithm are not improved. Line 2 reads in the tasks and jobs to be scheduled, i.e., the problem question. Line 3 initializes the population of solutions Z = fz1; z2;:::; zNg, where N is the population size. Online 4, a heuristic algorithm Hi is randomly selected from the candidate pool H = fH1;H2; :::;Hng. Hyper-heuristics are high level problem independent heuristics that work with any set of problem dependent heuristics and adaptively apply and combine them to solve a specifi c problem. 
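The numbered HHSA pseudocode referred to above (the population Z, the candidate pool H, and the parameters max and ni) is not reproduced in this paper, so the sketch below only mirrors the described control flow: a low-level heuristic is picked at random from the pool, run for a bounded number of iterations, and abandoned once it stops improving the best schedule. The low-level heuristic, the makespan evaluation and all parameter values here are stand-ins chosen purely for illustration.

import random

def hhsa(evaluate, low_level_heuristics, init_population,
         max_total_iter=200, per_heuristic_max=20, non_improve_limit=5, seed=0):
    # evaluate(schedule) -> cost to minimise (e.g. makespan)
    # low_level_heuristics: pool H of functions f(population, evaluate) -> new population
    # per_heuristic_max plays the role of "max", non_improve_limit the role of "ni"
    rng = random.Random(seed)
    population = list(init_population)
    best = min(population, key=evaluate)
    iteration = 0
    while iteration < max_total_iter:
        h = rng.choice(low_level_heuristics)   # random selection from the candidate pool
        stale = 0
        for _ in range(per_heuristic_max):     # run the selected heuristic for at most "max" rounds
            iteration += 1
            population = h(population, evaluate)
            candidate = min(population, key=evaluate)
            if evaluate(candidate) < evaluate(best):
                best, stale = candidate, 0
            else:
                stale += 1                     # improvement detection: count non-improving rounds
            if stale >= non_improve_limit or iteration >= max_total_iter:
                break                          # hand control back and pick another heuristic
    return best

# Toy usage: assign 6 tasks to 3 machines, cost = makespan.
def makespan(assignment, lengths=(4, 7, 2, 9, 3, 6), machines=3):
    loads = [0.0] * machines
    for task, m in enumerate(assignment):
        loads[m] += lengths[task]
    return max(loads)

def mutate(population, evaluate, _rng=random.Random(1)):
    # single placeholder low-level heuristic: greedy point mutation of each schedule
    out = []
    for sol in population:
        s = list(sol)
        s[_rng.randrange(len(s))] = _rng.randrange(3)
        out.append(min([sol, tuple(s)], key=evaluate))
    return out

if __name__ == "__main__":
    rng0 = random.Random(42)
    start = [tuple(rng0.randrange(3) for _ in range(6)) for _ in range(8)]
    print(makespan(hhsa(makespan, [mutate], start)))

In a full implementation the pool would contain several heuristics (for example the differential-evolution variants mentioned below) and a diversity-detection test would also influence when control is handed back.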
This could be due to the fact that variants of differential evolution, which we mainly use as basic heuristics due to their competitive performance and simple confi guration, strongly depend on the population distribution. Hyper-heuristics might be regarded as a special form of genetic programming, the key intuition underlying research in this area is that, for a given type of problem, there are often a number of straightforward heuristics already in existence that can work well (but perhaps not optimally) for certain sorts of instances of that type of problem. Perhaps it is possible to combine those existing heuristics into some more elaborate algorithm that will work well across a range of problems. Cloud Computing denotes both the applications delivered as services, the hardware and systems software in the datacenters that provide those services. The services has been refered to as Software as a Service (SaaS). The datacenter hardware and software together combinedly called a Cloud. When a Cloud is of the kind pay-as-you-go manner to the public, then it is a Public Cloud; if the service is being sold then it is a Utility Computing. The term Private Cloud refers to internal datacenters of a business or other organizations, thart are not made available to the public. Thus, Cloud Computing is the combination of SaaS and Utility Computing, but does not comprise Private Clouds. Anyone can be users or providers of Software as a Service, and users or providers of Utility Computing. Romanian Statistical Review nr. 1 / 20178 Advantage:  Statistical multiplexing is a method that can be used to maximize utilization when compared to a private cloud.  Cloud computing can also offer services at a cost of a medium- sized datacenter and can even make a good profi t. Disadvantage:  The one important disadvantage is the chance of data loss.  User can leave the site permanently after facing a poor service; this negative feedback may result in a permanent loss of a portion of the revenue stream. WORKFLOW ENGINE: A Workfl ow engine is a software service that is used to provide the run-time environment in order to create, maintain and develop workfl ow instances. The representation of a workfl ow process is in a form which supports automated manipulation. Invoked Applications: Interfaces to support interaction with a variety of IT applications Workfl ow Client Applications: Interfaces to support interaction with the user interface. Administration and Monitoring: Interfaces to provide system monitoring and metric functions to facilitate the management of composite workfl ow application environments. It can be seen that scheduling is a functional module of a Workfl ow Engine, becoming a signifi cant part of workfl ow management systems. Workfl ow is concerned with the automation of procedures whereby fi les and data are passed between Participants according to a defi ned set of rules to achieve an overall goal. A workfl ow management system defi nes, maintains and develops workfl ows on computing resources. VIRTUAL GRID EXECUTION SYSTEM (VGES): vgES provides an uniform qualitative resource abstraction over grid and cloud systems. We apply vgES for scheduling a set of deadline sensitive weather forecasting workfl ows. Specifi cally, this paper reports on our experiences with: 1. Virtualized reservations for batch queue systems, 2. Coordinated usage of Tera Grid (batch queue),Amazon EC2 (cloud), our own clusters (batch queue) and Eucalyptus(cloud) resources, and Romanian Statistical Review nr. 
1 / 2017 9 3. Fault tolerance through automated task replication. The combined effect of these techniques was to enable a new workfl ow planning method to balance the performance, reliability and the cost considerations. These results point toward improved resource selection and execution management support for a variety of e-Science applications over grids and cloud systems. This paper brings together many of the results from the VGrADS project demonstrating the effectiveness of virtual grids for scheduling LEAD workfl ows. In the process, it demonstrates a seamless merging of cloud and HPC resources in service of a scientifi c application. It also applies advanced scheduling techniques for both performance improvement and fault tolerance in a realistic context. LEAD has been run as a distributed application since its inception, but VGrADS methods have opened new capabilities for resource management and adaptation in its execution. This paper details the vgES implementation of virtual grids and their use in fault tolerant workfl ow planning of workfl ow sets with time and accuracy constraints. Our experiments show the effi ciency of the implementation and the effectiveness of the overall approach. Modules Description: A simple random method is used to select the low-level heuristic Hi from the candidate pool H. The diversity detection operator is used by HHSA to decide “when” to change the low-level heuristic algorithm Hi. This mechanism implies that the higher the temperature, the higher the opportunity to escape from the local search space to fi nd better solutions. The timer is fi xed at startup and end up mode of an application. We associate with such an event the workload dependent. Hyper heuristic Algorithm: Time-based consists in setting a timer that schedules the time instant when the Scheduled task has to be performed. The timer is fi xed at startup mode of an application. Hyper-heuristic algorithms can then maintain a high search diversity to increase the chance of fi nding better solutions at later iterations while not increasing the computation time. A time-based rejuvenation policy intends to identify the optimal time to rejuvenate with respect to one or more performance indices. The VMM does not degrade, and therefore, it is only necessary to keep memory of the age that was reached at the workload changing point. Romanian Statistical Review nr. 1 / 201710 Class Diagram to Scheduling Algorithm Sequence Diagram for Cloud Scheduling Algorithm Romanian Statistical Review nr. 1 / 2017 11 ALGORITHM DESCRIPTION: A hyper-heuristic is a heuristic search method that learns to automate, repeatedly by the incorporation of machine learning techniques, in which the process of selecting, combining, and generating or adapting a number of simpler heuristics (or components of such heuristics) to effi ciently solve the computational search problems. One of the main motivations for studying the hyper-heuristics is to build systems that which can handle the classes of problems rather than solving just a single problem. There might be multiple heuristics from which one can choose for solving the problem, and then the each heuristic has its own strength and also the weakness. The idea is to automatically devise algorithms by combining the strength and compensating for the weakness which are known heuristics. In the typical hyper-heuristic framework there consists of the high-level methodologies and a set of low-level heuristics (either constructive or perturbative heuristics). 
When a problem instance is given, the high-level method selects which low-level heuristic should be applied at any given time, based upon the current problem state or the search stage.

Scheduling Flow Diagram
Primitive model of Server Side

Sample Code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.SqlClient;

namespace server
{
    public partial class Form2 : Form
    {
        SqlConnection cn = new SqlConnection("Data Source=SRUJAN\\SQLEXPRESS;Initial Catalog=schedule;Integrated Security=True");
        SqlCommand cmd;
        SqlDataReader dr, dr1;

        public Form2()
        {
            InitializeComponent();
        }

        private void Form2_Load(object sender, EventArgs e)
        {
            cn.Open();
        }

        private void button6_Click(object sender, EventArgs e)
        {
            //cal1();
            //string sta = rsc + "N State";
            //textBox6.Text = sta.ToString();
        }

        private void button4_Click(object sender, EventArgs e)
        {
        }

        string t, t1, t2;

        private void button2_Click(object sender, EventArgs e)
        {
            SqlCommand cmd = new SqlCommand("select * from vm1load", cn);
            SqlDataReader dr = cmd.ExecuteReader();
            while (dr.Read())
            {
                t1 = dr[0].ToString();
            }
            dr.Dispose();
            cmd.Dispose();
            if (t1 == null)
            {
                MessageBox.Show("Start sharing file");
            }
            else
            {
                textBox7.Text = t1.ToString();
            }
        }

        private void button3_Click(object sender, EventArgs e)
        {
            cmd = new SqlCommand("select * from vm2load", cn);
            dr = cmd.ExecuteReader();
            while (dr.Read())
            {
                t2 = dr[0].ToString();
            }
            dr.Dispose();
            cmd.Dispose();
            if (t2 == null)
            {
                MessageBox.Show("Start sharing file");
            }
            else
            {
                textBox1.Text = t2.ToString();
            }
        }

        private void button1_Click(object sender, EventArgs e)
        {
            //int v = Convert.ToInt32(textBox1.Text);
            int v1 = Convert.ToInt32(textBox7.Text);
            int v2 = Convert.ToInt32(textBox1.Text);
            if (v1 < 100)
            {
                label1.Visible = true;
                label1.Text = "Failure Occurred: Unable to Load";
            }
            if (v2 < 100)
            {
                label2.Visible = true;
                label2.Text = "Failure Occurred: Unable to Load";
            }
        }

        private void pictureBox1_Click(object sender, EventArgs e)
        {
            if (textBox7.Text == "" && textBox1.Text == "")
            {
                MessageBox.Show("Start any of Your Server And Proceed");
            }
            else
            {
                Form2 f2 = new Form2();
                f2.Show();
            }
        }

        private void button4_Click_1(object sender, EventArgs e)
        {
            Form4 f4 = new Form4();
            f4.Show();
            this.Hide();
        }

        private void label9_Click(object sender, EventArgs e)
        {
        }
    }
}

CONCLUSION

The proposed algorithm uses two detection operators to automatically determine when to change the low-level heuristic algorithm, and a perturbation operator to fine-tune the solutions obtained by each low-level algorithm to further improve the scheduling results in terms of makespan. As the simulation results show, the proposed algorithm can not only provide better results than the traditional rule-based scheduling algorithms, it also outperforms the other heuristic scheduling algorithms in solving the workflow scheduling and Hadoop map-task scheduling problems on cloud computing environments. With the incorporation of genetic programming into hyper-heuristic research, a new level of approaches is found that we have termed 'heuristics to generate heuristics'. These approaches provide richer heuristic search spaces, and thus the freedom to create new methodologies for solving the underlying combinatorial problems.

In addition, the simulation results show further that the proposed algorithm converges faster than the other heuristic algorithms evaluated in this
In addition, the simulation results show further that the proposed algorithm converges faster than the other heuristic algorithms evaluated in this Romanian Statistical Review nr. 1 / 201716 study for most of the datasets. In brief, the basic idea of the proposed hyper heuristic algorithm is to leverage the strengths of all the low level algorithms while not increasing the computation time, by running one and only one low-level algorithm at each iteration. This is fundamentally different from the so-called hybrid heuristic algorithm, which runs more than one low level algorithm for each iteration; thus requiring a much longer computation time. REFERENCES 1. P. Chr´etienne, E. G. Coffman, J. K. Lenstra, and Z. Liu, Eds., 1995, Scheduling Theory and Its Applications. John Wiely & Sons Ltd. 2. A. Allahverdi, C. T. Ng, T. C. E. Cheng, and M. Y. Kovalyov, 2008, “A survey of scheduling problems with setup times or costs,” European Journal of Operational Research, vol. 187, no. 3, pp. 985–1032. 3. M. Armbrust, A. Fox, R. Griffi th, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A view of cloud computing,” Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010. 4. Reddy, C. S., et al., 2016, “Obtaining Description for Simple Images using Surface Realization Techniques and Natural Language Processing” Indian Journal of Science and Technology 9.22. 5. I. Foster, Y. Zhao, I. Raicu, and S. Lu, 2008 , “Cloud computing and grid computing 360-degree compared” in Proceedings of the Grid Computing Environments Workshop, pp. 1–10. 6. Mamatha, E., C. S. Reddy and Ramakrishna Prasad,, 2012, “Mathematical Modeling of Markovian Queuing Network with Repairs, Breakdown and fi xed Buffer” i-Manager’s Journal on Software Engineering 6.3 (2012): 21. 7. Tsai, Chun-Wei, et al., 2014, “A hyper-heuristic scheduling algorithm for cloud” IEEE Transactions on Cloud Computing 2.2 (2014): 236-250. 8. X. Lei, X. Liao, T. Huang, H. Li, and C. Hu, 2013, “Outsourcing large matrix inversion computation to a public cloud” IEEE Transactions on Cloud Computing, vol. 1, no. 1, pp. 78–87 9. Mamatha, E., C. S. Reddy, and K. R. Prasad, 2016, “Antialiased Digital Pixel Plotting for Raster Scan Lines Using Area Evaluation”, Emerging Research in Computing, Information, Communication and Applications. Springer Singapore, 461-468 10. J. H. Holland, 1975, Adaptation in Natural and Artifi cial Systems: An Introductory Analysis with Applications to Biology, Control, and Artifi cial Intelligence. University of Michigan Press. 11. Mamatha et al., An Effi cient Line Clipping Algorithm in 2D Space, in press, International Arab Journal of Information Technology, 12. C. W. Tsai and J. Rodrigues, “Metaheuristic scheduling for cloud: A survey”, 2014, IEEE Systems Journal, vol. 8, no. 1, pp. 279–297. 13. Y. T. J. Leung, 2004, Handbook of Scheduling: Algorithms, Models and Performance Analysis. Chapman & Hall/CRC. 14. K. M. Elsayed and A. K. Khattab, 2006, “Channel-aware earliest deadline due fair scheduling for wireless multimedia networks” Wireless Personal Communications, vol. 38, no. 2, pp. 233–252. 15. Mamatha, et al., 2008, “Performance evaluation of homogeneous parallel processor system of Markov modeled queue” i-Manager’s Journal on Software Engineering 3.2: 58. 16. Z. Wu, X. Liu, Z. Ni, D. Yuan and Y. Yang, 2011, “A market-oriented hierarchical scheduling strategy in cloud workfl ow systems” The Journal of Supercomputing, pp. 1–38. 17. M. R. Garey, D. S. Johnson and R. 
Sethi, 1976, "The complexity of flowshop and jobshop scheduling," Mathematics of Operations Research, vol. 1, no. 2, pp. 117–129.
18. J. Błażewicz, K. H. Ecker, E. Pesch, G. Schmidt and J. Węglarz, 2001, Scheduling Computer and Manufacturing Processes. Springer-Verlag New York, Inc.
19. D. Shi and T. Chen, 2013, "Optimal periodic scheduling of sensor networks: A branch and bound approach," Systems & Control Letters, vol. 62, no. 9, pp. 732–738.
20. M. Dorigo and L. M. Gambardella, 1997, "Ant colony system: A cooperative learning approach to the traveling salesman problem," IEEE Transactions on Evolutionary Computation, vol. 1, no. 1, pp. 53–66.
21. Apache Hadoop. [Online]. Available: http://hadoop.apache.org
22. Y. M. Huang and Z. H. Nie, "Cloud computing with Linux and Apache Hadoop." [Online]. Available: http://www.ibm.com/developerworks/aix/library/au-cloud apache.
23. Mamatha, E., C. S. Reddy and S. Krishna Anand, 2016, "Focal point computation and homogeneous geometrical transformation for linear curves," Perspectives in Science.
24. M. Rahman, X. Li, and H. Palit, 2011, "Hybrid heuristic for scheduling data analytics workflow applications in hybrid cloud environment," in Proceedings of the IEEE International Symposium on Parallel and Distributed Processing Workshops, pp. 966–974.

RESULTS AND DIAGRAMS

work_3f4o5vkwirca5doewxivlid55u ---- Human action recognition using an ensemble of body-part detectors

Human action recognition using an ensemble of body-part detectors
Bhaskar Chakraborty, Andrew D. Bagdanov, Jordi Gonzàlez and Xavier Roca
{bhaskar, bagdanov, poal, xavir}@cvc.uab.es
Universitat Autònoma de Barcelona, Computer Vision Center, Campus UAB Edifici O, 08193 Bellaterra, Spain
May 3, 2010

Abstract

This paper describes an approach to human action recognition based on the probabilistic optimization model of body parts using Hidden Markov Models (HMM). Our proposed method is able to distinguish between similar actions by only considering the body parts having major contribution to the actions, for example, legs for walking, jogging and running; hands for boxing, waving and clapping. We apply HMMs to model the stochastic movement of the body parts for action recognition. The HMM construction requires an ensemble of body-part detectors, followed by grouping of part detections to perform human identification. Three example-based body part detectors are trained to detect three components of the human body: the head, the legs and the arms. These detectors cope with viewpoint changes and self-occlusions through the use of ten sub-classifiers that detect body parts under a specific range of viewpoints. Each sub-classifier is a Support Vector Machine (SVM) trained on features selected for their discriminative power for each particular part/viewpoint combination. Grouping of these detections is then performed using a simple geometric constraint model which yields a viewpoint invariant human detector. We test our approach on the most commonly used action dataset, the KTH
Good solutions to these problems would yield huge potential for many applications such as the search and structuring of large video archives, video surveillance, human-computer interaction, gesture recognition and video editing. Human detection and action recognition are extremely challenging due to the non-rigid nature of humans in video sequences caused by changes in pose, changing illumination conditions and erratic non-linear motion. When viewed as a stochastic estimation problem, the critical issue in action recog- nition becomes the definition and computation of the likelihood, Pr(a|H,I), of action a given the human H in the image sequence I, since the human H is the main agent performing the action a. The difficulty in working with this likelihood is directly related to the complexity of the joint distribution Pr(H,I) over all possible human figures H in the image sequence I. Holistic approaches which attempt to model the entire human figure, generally, must resort to very sophisticated and complex models of this joint distribution, resulting in very demanding model estimation and optimization problems. Basic human actions can be represented using the local motion of individual body parts. Actions like walking, jogging, running, boxing and waving are systematic com- binations of the motion of different human body components. From this perspective, it can also be observed that not all body parts contribute equally to all action classes. For example, actions like walking, running and jogging are characterized mostly by the movement of the legs. Boxing, waving and hand clapping, on the other hand, mostly depend on the arms. Our approach is based on these observations and we define the action likelihood Pr(a|H,I) instead as Pr(a|B,I), where B is an ensemble of body parts 2 T ra in in g T ra in in g Labeled Body-parts Labeled Body-parts Legs Arms Head HOG Feature Extraction and Selection HOG Feature Extraction and Selection Head Arms Legs View-invariant Body-part Detectors View-invariant Body-part Detectors Head SVMs Arm SVMs Leg SVMs Gaussian Mixture Model (GMM) Gaussian Mixture Model (GMM) Arm Cluster Leg Cluster Walking HMM Running HMM Clapping HMM Action HMMs Action HMMs (a) Q u e ry Q u e ry Query Video Body-part Detector HOG Feature Extraction Feature Quantization Using GMM Classification by Action HMMs Label the Query Video (b) Figure 1: The proposed framework for action recognition based on probabilistic optimiza- tion model of body parts using Hidden Markov Models (a) Construction of body-part detectors and Action HMMs (b) Action recognition. which the human H is composed of. Moreover, the likelihood is further simplified by conditioning actions only on those body parts which contribute most to a particular ac- tion. Features from body parts B are used to model the action likelihood Pr(a|B,I), and optimizing this likelihood over all known actions yields a maximum likelihood estimate of the action in the image sequence. The ensemble of body part detectors is an important component of our method and we use SVM-based body part detectors over a range of viewpoints to build viewpoint- invariant body part detectors. We model human actions as a repeating chain of body- 3 part poses (Chakraborty, et al. 2008). Similar poses are grouped together by applying a Gaussian Mixture Model (GMM) on the features of the body parts in order to identify key-poses. 
These key-poses then serve as a vocabulary of hidden action-states for Hidden Markov Models (HMMs) that model the temporal-stochastic evolution of each action class. Figure 1 shows the overview of our proposed method. In the next section we review relevant work in the literature. Section 3 describes our probabilistic model for action recognition. In section 4 we describe the viewpoint- invariant part detectors used to define the part ensembles in an image sequence and the HMM-based approach to model the likelihood. Section 4.2 describes the HMM learning process. Experimental results are reported in section 5 and section 6 concludes the paper with some discussions and indications of future research directions. 2 Related work Several methods for learning and recognizing human actions directly from image mea- surements have been proposed in the literature (Black & Jepson 1996, Davis & Bobick 1997, Zelnik-Manor & Irani 2001, Chomat & Crowley 1999). These global feature based action recognition techniques mainly employ flow, interest points and silhouette features. These methods may work well under fixed, well-defined conditions, but are not generally applicable, especially for varying viewpoints. For example, accurately estimating the po- sitions of joints in different view-points is an extremely difficult problem and is computa- tionally expensive. Optical flow images are the least affected by appearance but are often too noisy due to inaccuracies in flow calculation. A common issue with interest point detectors is that the detected points are sometimes too few to sufficiently characterize human actions, and hence reduce recognition performance. This issue has been avoided in (Niebles, et al. 2008) by employing a separable linear filter (Dollár, et al. 2005), rather than space-time interest point detectors, to obtain motion features using a quadrature pair of 1D temporal Gabor filters. View-invariance in action recognition is addressed in (Yilmaz & Shah 2005, Rao, et al. 2002, Parameswaran & Chellappa 2006). In our 4 approach we overcome these problems by identifying human body parts under different view-points and explicitly modelling their influences on action interpretation. For action modelling and classification, many researchers use HMMs due to the sequential-stochastic nature of human actions. The HMMs and AdaBoost are used to recognize 3D human action, considering the joint position and pose angles (Fengjun & Ramkant 2006). In (Ahmad & Lee 2006) action recognition is performed using silhouette features and the Cartesian component of optical flow velocity. HMMs are then used to classify actions. Also (Mendoza & de la Blanca 2007), detect human actions using HMMs based on the contour histogram of full bodies. Again, all these methods use features from the whole body to model the HMMs. Human silhouette changes intensely under different view-points and the extracted features from the whole body do not represent the action for all views. In our approach we reduce the complexity of HMM learning by identifying the contributing body parts for an action. In this way action modelling becomes more explicit and can be generalized in different view points. Although conceptually appealing and promising, the merits of part-based models have not yet been widely recognized in action recognition. One goal of this work is to explore this area. Recent works on part-based event detection use hidden conditional random fields (Wang & Mori 2008), flow based shape feature from body parts (Ke, et al. 
2005) and Histograms of Oriented Rectangles (HORs) (Ikizler & Duygulu 2009). These part-based approaches to action recognition employ complex learning techniques and the general concept of view invariance is missing. Although very promising, these approaches have several problems. Learning the parameters of a hidden conditional random field is a very complex procedure and requires a lot of training examples. Methods like HOR need to extract contours and filled silhouettes and the success of the method depends strongly on the quality of those silhouettes. Our approach, in contrast, is built upon example- based body part detectors which are simple to design and yet robust in functionality. In our method we combine the advantages of full body and part-based action recog- nition approaches. We use simple SVM-based body part detectors inspired by (Mohan, et al. 2001), but instead of using Haar-like features we use HOGs (Dalal & Triggs 2005). 5 The disadvantage of Haar-like features is that they are affected by human appearance, while HOG features are designed to extract shape information from the human body contour. Furthermore, we apply a feature selection technique to reduce the dimension- ality of the resulting SVMs. Those selected features from the body parts are then used to learn an HMM for each action class. We use Gaussian Mixture Models (GMM) to identify the group of similar body-parts, that have major contribution in an action, to obtain the action-states or key-poses. View-invariance is addressed by designing multiple sub-classifiers for each body-part, responsible for detecting the body-limbs in different view-points. 3 A probabilistic model for action recognition The task of human action recognition can be formulated as an optimization problem. Let A = {a1,a2, . . . ,an} denote a set of possible actions, where each ai denotes a specific action such as walking, running or hand clapping. We write the likelihood of a specific action ai in a given image sequence I with human H as: Pr(ai|H,I) for ai ∈ A. Given an instance of I with detected human H, a maximum likelihood estimation of the action being performed is: a∗ = argmax ai∈A Pr(ai|H,I). (1) Rather than holistically modelling the entire human H, we consider it to be an ensemble of detectable body parts: B = {B1,B2, . . . ,Bm}, where each Bi represents one of m-body parts such as the head, legs, arms or torso. The likelihood can now be expressed as: Pr(ai|H,I) = Pr(ai|{B1,B2, ...,Bm},I) (2) 6 Our model strengthen the fact that not all actions depend equally on all body parts. Actions like walking, jogging and running, for example, depend primarily on the legs. Actions like boxing, waving and clapping, on the other hand, depend primarily on the arms. To simplify the likelihood in (Equation 2), we define a dependence function over the set of all subsets of body parts: d(ai) : A −→P(B), where P(B) is the power set of B, or {c|c ⊆ B}. The set d(ai) determines the subset of body parts on which action ai most strongly depends. The likelihood is now approximated as: Pr(ai|H,I) ≈ Pr(ai|d(ai),I), (3) and the maximum likelihood estimate of the action given an ensemble of body parts and an image sequence is: a∗ = argmax ai∈A Pr(ai|d(ai),I). (4) The approximate likelihood in Equation 3 makes explicit the dependence of an action class on a subset of body parts. 
This approximation assumes that the action ai is inde- pendent of the body parts excluded from d(ai) and thus the likelihood can be computed by simply excluding the irrelevant parts rather than optimizing (or integrating) over them. In the following sections we describe how we model the approximate likelihood in Equation 3 using viewpoint invariant body-part detectors and HMMs over detected features of these body parts, and then from these arrive at the estimate of Equation 4 for action classification. 4 Action modelling by HMM Human action is a stochastic movement of an ensemble body parts. Moreover, for many actions it is possible to find the body parts that have a major contribution on it. We choose Hidden Markov Models for modelling Pr(ai|d(ai),I) [Equation 3] where d(ai) is 7 either Blegs or Barms. That is, we use d(ai) to indicate whether action ai depends mostly on the arms or on the legs. An HMM is a collection of finite states connected by transitions. Each state is char- acterized by two sets of probabilities: a transition probability, and either a discrete out- put probability distribution or a continuous output probability density function. These functions define the conditional probability of emitting each output symbol from a finite alphabet, conditioned on an unknown state. More formally, it is defined by: (1) A set of states S, with an initial state SI and a final state SF ; (2) The transition probability matrix, T = {tij}, where tij is the probability of taking the transition from state i to state j and (3) The output probability matrix R. For a discrete HMM, R = {rj(Ok)}, where Ok represents a discrete observation symbol. The initial state distribution is π = {πi}, and the complete parameter set of the HMM can be expressed as: λ = (T,R,π). (5) Here, for each action ai one discrete HMM is constructed using features from the con- tributing body parts: Blegs or Barms. We obtain the set of hidden states, S, using Gaussian Mixture Model (GMM) on the features from the detected body-parts. The transition probability matrix, T, is learnt and action classification is done after com- puting the output probability matrix, R, accordingly. Following sections describe the body-part detection and HMM learning. 4.1 Body-part detection Here, we use three body part detectors for the head, leg and arms respectively. Body part detection is based on sliding-window technique. Specific sized rectangular bounding boxes are used for each body part. These bounding boxes are slided over the image-frame, taking the section of the image-frame as an input to the head, leg and arm detectors. These inputs are then independently classified as either a respective body part or a non- body part. In order to prevent possible false positives, those detected components are combined into a proper geometrical configuration into another specific sized bounding 8 -1 0 1 -2 20 1 0 1 1 2 -1 0 00 -1 2 1 Feature Selection π/2 - π/2 Selected Blocks of HOG Gx Gy Training Image Gradient Image Dividing the Gradient Image into (8X8) Blocks Computing 6-bin HOG features Figure 2: Feature extraction and selection method from a training image. The training image is divided into several blocks and then HOG features are extracted. Finally standard deviation based feature selection method is applied to obtain feature vector for SVM. box as a full human. All these bounding boxes are obtained from the training samples. Furthermore, the image itself is processed at several scales. This allows the system to be scale invariant. 
4.1.1 Features extraction and selection In our approach (Algorithm 1), labeled training images for each body part detector are divided into (8 × 8) blocks after applying Sobel mask on them. HOG features are extracted from those blocks and a 6 bin histogram is computed over the angle range (-π 2 , +π 2 ). So, this gives one 6-dim feature vector for each of those (8 × 8) pixel blocks. Next, we select the best feature vector group among all of them. This feature selection method is based on the minimization of the standard deviation (σ) of different dimensions of those feature vectors. 9 Algorithm 1 Feature extraction for body component SVM Require: Training images of body component. Ensure: Features for SVM. 1: for every training image do 2: Apply Sobel operator 3: Divide Sobel image into (8 × 8) blocks 4: for each blocks do 5: Compute 6 bin histogram of HOG features within the range (-π 2 , +π 2 ) 6: Keep it in feature array. 7: end for 8: Apply feature selection algorithm over the feature array. 9: end for 10: Selected features are used to learn SVM. Let there be N number of training image samples of size W × H, divided into n number of (8 × 8) blocks. For each of these (8 × 8) blocks we have 6-bin HOG features as, G = 〈g1,g2, . . . ,g6〉. We define the standard deviation, σij, of ith block and jth bin of HOG feature as: σij = ( 1 N ) N∑ t=1 (g(t)ij −µij) 2 (6) where i = 1, 2,. . . , n; j = 1, 2,. . . , 6; t = 1, 2, . . . , N; g(t)ij is defined as jth bin HOG feature of ith block of tth training image and µij is defined as the mean of jth gradient of ith block over all the training images. The values of σij, in Equation 6, are sorted and those 6dim feature vector packets are taken for which the σij is smaller than a predefined threshold. In our case we ran the experiment several times to obtain this threshold empirically. These selected features are used to train SVMs for the body part detectors. Figure 2 shows the feature extraction technique. 10 4.1.2 Geometric constraints on body-part detection The obtained outcome of those component detectors are combined based on the geo- metric constraint of full human body (Algorithm 2) to avoid the false positives obtained from the part detectors. We define the detected body-part component bounding boxes as, Head Bounding Box (RH), Leg Bounding Box (RL) and Arm Bounding Box (RA) obtained from body-part detectors and the full human bounding box (RF ). Algorithm 2 Geometric constraints on body-part detection Require: Detected body-part bounding boxes: RH, RL and RA; Width and height of full human bounding box: WRF and HRF . Ensure: Full human bounding box: RF , if geometric constraints are satisfied; NULL, otherwise. 1: (XCH ,YCH ) ← CENTROID(RH). 2: (XCL,YCL ) ← CENTROID(RL). 3: if (XCH − WRH 2 ) < XCL < (XCH + WRH 2 ) then 4: if YCL < HRF 2 then 5: Obtain RF using {(XCH − WRF 2 ), (YCH + HRH 2 ), WF , HF}. 6: (XCF ,YCF ) ← CENTROID(RF ). 7: (XCA,YCA ) ← CENTROID(RA). 8: if (XCF − WRF 2 ) < XCA < (XCF + WRF 2 ) then 9: if YCH > YCA > YCF then 10: Return(RF ). 11: end if 12: end if 13: end if 14: else 15: Return(NULL). 16: end if Removal of false positives are depicted in Figure 3. When the detectors are applied 11 (a) (b) (c) (d) Figure 3: Removal of false positives using geometric constraints of different component detectors. 
(a) original image (b) detection of head and legs with overlapping bounding boxes (c) after getting single detection window for head and leg including false positives (d) detection of head and leg after removal of false positives using geometric constraint. in the image there are usually many overlapping detected windows for a particular body component. We obtain a single bounding box from those overlapping detected windows in the following way. If two bounding boxes share more than 70% overlapping area they are merged to obtain one single box. In this way overlapping detection windows are converted into a single one. After that, the above geometric constraint is applied to remove possible false positives and we obtain an ensemble of body-part detectors resulting in a full human detection. 4.2 Construction of action HMMs For learning an HMM for a particular action, several sequences of frames are chosen and every such sequence is called action-cycle/cycle for that action. The frames that define one cycle depends on the training sequences and the action itself. For example, the number of frames in an action cycle varies when it is performed in a circular path. Let 12 Figure 4: Training dataset for each body-part detectors. It shows training samples for each sub-classifier. there be M frames inside one action cycle and in each of these frames the body parts are detected using component detectors. Let assume that in the kth frame we detect body part bounding boxes where there are n best 6dim feature vectors from the Section: 4.1.1 as {Gk1 , G k 2 , ..., G k n} where each Gki = 〈 gi1,g i 2, . . . ,g i 6 〉k . Then we compute mean over all these Gki ’s to get the features from the kth frame. So, we have 〈µ1,µ2, . . . ,µM〉 as a feature to construct the HMM where each of these µi’s can be expressed as: µi = ( 1 M ) M∑ i=1 〈 gi1,g i 2, . . . ,g i 6 〉 . (7) The significance of taking the mean is to get the general orientation of the body part which in turn signifies one pose or a series of similar poses in an action. We fit a Gaussian Mixture Model (GMM) using Expectation Maximization (EM) on these mean features to obtain key-pose alphabets for a particular action. These key-poses are the centre of each cluster obtained from the GMM. Once we obtain these key-pose alphabets, we assign every detected body pose in the frames of each of the action cycles using the nearest-neighbour approach to get a sequence of HMM states for them. These state sequences are used to learn the parameters of the HMM in Equation 5. In this way, we obtain the model to compute probability value in Equation 3. 13 Figure 5: Results of head, arm and leg detection as a validation process. Here detection of profile head and leg is shown. Arm detections are shown where both arms are visible and one is occluded. For an unknown action, the detected body poses in all frames are mapped into the state sequences in the similar way, but without dividing them into cycles, since in such cases the cycle information is not known. P(Ot|st = s), the probability of an observation sequence Ot given the state s at time t, is computed using both Forward and Baum- Welch (Ghahramani 2002, Rabiner 1989) algorithms for every action class HMM. The action class that gives the maximum of these probability values (Equation 4), is the class label of that unknown class. 5 Experiments We use three publicly available datasets for our experiments on body-part detection and action recognition. 
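Before describing the datasets, the classification rule above (a forward pass over each action HMM followed by the arg max of Equation 4) can be sketched in code. This is a minimal illustration: the structure and function names are ours, the forward pass is unscaled (log-space or scaling would be used in practice), and it stands in for the Forward/Baum-Welch routines cited above rather than reproducing them.

#include <cstddef>
#include <limits>
#include <vector>

// A discrete HMM in the sense of Equation 5: lambda = (T, R, pi).
struct DiscreteHMM {
    std::vector<std::vector<double>> T;   // T[i][j]: transition i -> j
    std::vector<std::vector<double>> R;   // R[j][k]: P(symbol k | state j)
    std::vector<double> pi;               // initial state distribution
};

// Forward algorithm: probability of an observation sequence under the model.
// obs[t] is the index of the key-pose alphabet symbol observed at frame t.
double sequenceLikelihood(const DiscreteHMM& hmm, const std::vector<int>& obs) {
    const std::size_t n = hmm.pi.size();
    if (obs.empty() || n == 0) return 0.0;
    std::vector<double> alpha(n), next(n);
    for (std::size_t i = 0; i < n; ++i)
        alpha[i] = hmm.pi[i] * hmm.R[i][obs[0]];
    for (std::size_t t = 1; t < obs.size(); ++t) {
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t i = 0; i < n; ++i)
                sum += alpha[i] * hmm.T[i][j];
            next[j] = sum * hmm.R[j][obs[t]];
        }
        alpha.swap(next);
    }
    double p = 0.0;
    for (double a : alpha) p += a;
    return p;
}

// Equation 4: label an unknown sequence with the action whose HMM gives the
// highest likelihood.
int classifyAction(const std::vector<DiscreteHMM>& actionModels,
                   const std::vector<int>& obs) {
    int best = -1;
    double bestP = -std::numeric_limits<double>::infinity();
    for (std::size_t a = 0; a < actionModels.size(); ++a) {
        double p = sequenceLikelihood(actionModels[a], obs);
        if (p > bestP) { bestP = p; best = int(a); }
    }
    return best;
}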
HumanEva dataset1: This dataset is introduced in (Sigal & Black 2006). We use this to create our human body-part component dataset. HumanEva dataset contains 7 calibrated video sequences (4 grayscale and 3 colour) that are synchronized with 3D body poses obtained from a motion capture system. The dataset contains 4 subjects performing 6 common actions (e.g. walking, jogging and gesturing etc.). KTH dataset2: This dataset is the most common dataset for the action recognition 1http://vision.cs.brown.edu/humaneva/ 2http://www.nada.kth.se/cvap/actions/ 14 introduced in (Schuldt, et al. 2004). It covers 25 subjects and four different recording conditions of the videos. There are six actions in this dataset: walking, running, jogging, boxing, clapping and waving. The resolution, (120×160), of this dataset is quite low. HERMES indoor dataset3: This dataset is described in (González, et al. 2009). In this HERMES sequence (2003 frames @ 15 fps, 1392 × 1040 pixels) there are three people in a room. They act in a discussion sequence sitting around a table, where bags are carried, left and picked from the floor, and bottles are carried, left and picked from the vending machine and from the table. In this discussion sequence several agents are involved in different simultaneous grouping, grouped and splitting events, while they are partially or completely occluded. 5.1 Training of the body part detectors Our system uses four head detectors, two leg detectors and four arm detectors. The four head detectors correspond to view angle ranges (π 4 , 3π 4 ), ( 3π 4 , 5π 4 ), ( 5π 4 , 7π 4 ) and ( 7π 4 , π 4 ). We chose this division so that the consecutive ranges have smooth transition of head poses. For arms, there are four classifiers corresponding to different arm positions, grouped in the same angular views of the head. Detecting arms is a difficult task since arms have more degrees of freedom compared to the other body parts. For each action we use four major arm poses and considering the pose symmetry the detection of other pose variation is achieved. To detect the legs, two sub-classifiers have been added: one representing front(rear) view and the other one for profile views of the legs. Figure 4 shows some training samples from our body-part dataset. In our method for head, leg and arms detection the bounding box sizes are fixed to (72×48), (184×108) and (124×64) pixels respectively. A (264×124) pixel bounding box is applied for full human. Since the test image is zoomed to various sizes, and in each zoomed image the components of those sizes are searched, the fixed component sizes do not affect scale invariance of human detection. To train each component detector, 10, 000 true positives and 20, 000 false positives selected from the HumanEva dataset are used. The 3http://iselab.cvc.uab.es/indoor-database 15 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 T u re P o si tiv e R a te False Positive Rate Head Detection Leg Detection Arm Detection 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False Positive Rate T ru e P o si tiv e R a te Head Detection Leg Detection Arm Detection (a) HumanEva dataset (b) KTH dataset 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 False Positive Rate T ru e P o si tiv e R a te Head Detection Leg Detection Arm Detection (c) HERMES indoor sequence Figure 6: ROC curves for different body part detectors on various datasets. 
The false alarm rate is the number of false positive detections per window inspected. sizes of different body-part component bounding boxes are determined after learning the statistics of height and width of each component from all the sequences of HumanEva dataset. A tolerance of 10 pixels is also used along the height and width of the each bounding box. 5.2 Performance Evaluation of Body-part Detection We performed extensive experiments and quantitative evaluation of the proposed ap- proach of body part detection. We validate our part detector using HumanEva dataset. For a particular body-part detector we usually obtain different detection scores from the 16 Figure 7: Performance of body part detectors on the KTH dataset. Detection of head, arms and leg shown for profile poses. sub-classifiers that the detector is composed of. We chose the detection result based on the best detection score among themselves. Figure 5 shows the results of the compo- nent detectors in HumanEva dataset. These detections are the strong examples of the view invariant detection since in the HumanEva dataset the agents perform actions in a circular path and our component detector show robust performance in detecting them. To test the performance of body part detectors, KTH Dataset (Schuldt et al. 2004) and HERMES indoor sequence dataset (González et al. 2009) are used. The Receiver Operating Characteristic (ROC) curves are shown in the Figure 6 for three component detectors: head, legs and arms of different datasets and also for the detection of full human. These ROC curves are generated considering the overall performance of all the sub-classifiers of each group of the three classifiers. ROC analysis reveals that head detection and leg detection are quite accurate. Although there are false positives, but the geometric constraint eliminates them. For the arms, however, the detection rate is not very high since arm pose varies dramatically while performing actions. In the HumanEva and Hermes indoor datasets we obtain high detection rates for full human compared to the KTH dataset. In the KTH dataset, there are several image sequences where it is impossible to detect arms due to different clothes than the other two datasets. In such cases we are able to detect head and legs, and when they are in proper geometrical arrangement, the full human bounding box is constructed. There are some sequences where only legs are 17 detected due to low resolution. Figure 7 shows some examples of human detection on the KTH dataset. In those images the detected bounding boxes are found at different scales and they are drawn after rescaling them to 1:1 scale. For low resolution image sequences much information has been lost due to scaling and the Sobel mask hardly found important edges from the particular body parts. Our system works well in high resolution datasets like, the HumanEva (resolution 644×484) and the HERMES indoor sequences (resolution 1392×1040). It gives an average 97% recognition rate for walking, jogging and boxing actions. However, on low resolution datasets like the KTH (resolution 160 × 120) we achieve lower performance on the actions like jogging and hand clapping. In the KTH dataset there are many cases where the human is visible at the original resolution but when it is zoomed to 640 × 480 to detect the body-parts, the objects are blurred and detection suffers. Table 1: Comparison of mean classification accuracy on the KTH dataset. (Ke et al. 2005) 62.9% (Wong & Cipolla 2007) 71.16% (Schuldt et al. 
2004) 71.72% Our Approach 79.2% (Dollár et al. 2005) 81.17% (Niebles et al. 2008) 81.5%
5.3 Action Recognition
We train the walking, jogging and boxing HMMs on the HumanEva dataset and test them on the KTH dataset. For the actions running, boxing, hand waving and hand clapping we use the KTH dataset for both training and testing. In these cases we take a random sample of 50% of the video sequences for training and the rest for testing. Table 1 shows a comparison of our recognition rate with other methods. The average action recognition rate obtained by our method is quite promising, but fails to surpass (Dollár et al. 2005) and (Niebles et al. 2008). However, in both of these cases the training and testing are done on the KTH dataset. We, instead, use HumanEva as the training dataset for the actions walking, boxing and jogging, and test on the KTH dataset.
Table 2 shows the confusion matrix of our approach on the KTH dataset. In the confusion matrix we can see that our approach can clearly distinguish the leg-related and arm-related actions. We obtain a 100% recognition rate for the walking and boxing actions, which is quite impressive, but jogging and hand clapping have poor recognition rates. For jogging, the major confusion occurs with walking (40%), and 23.1% of running actions are confused with jogging. In the case of hand clapping, the confusions are with the other two arm actions, boxing and hand waving. These confusions occur due to the low resolution of the KTH dataset, which affects the performance of the body-part detectors.
Table 2: Confusion matrix of action recognition using our component-wise HMM approach (KTH dataset).
       Walk   Jog    Run    Box    Wave   Clap
Walk   100.0  0.0    0.0    0.0    0.0    0.0
Jog    40.0   60.0   0.0    0.0    0.0    0.0
Run    0.0    23.1   76.9   0.0    0.0    0.0
Box    0.0    0.0    0.0    100.0  0.0    0.0
Wave   0.0    0.0    0.0    20.0   73.4   6.6
Clap   0.0    0.0    0.0    13.5   19.8   66.7
6 Discussion and Conclusion
This work presents a novel approach for recognizing actions based on a probabilistic optimization model of body limbs. View-point invariant human detection is achieved using example-based body-part detectors, and the stochastic changes of body components are modelled using HMMs. This method is able to distinguish very similar actions like walking, jogging and running (considering features from the legs), and boxing, hand waving and hand clapping (considering features from the hands). The leg movements are sometimes very similar in actions like jogging and running. Also, in some cases there are problems of resolution and contrast which make it difficult to distinguish those actions. Actions involving hand motion suffer from similar problems: some parts of the hand waving action look similar to hand clapping, which in turn causes ambiguity. We observe that the major confusions occur for the jogging and clapping actions. In the KTH dataset some sequences of the jogging action resemble the walking and running actions. Moreover, other state-of-the-art methods suffer in the same way. The confusion between clapping and waving could be caused by the fact that arm detectors are difficult to design for every possible degree of freedom; in these two actions there is a large variety of arm pose changes. On the other hand, the arm detectors perform well for boxing, since in this case the pose changes do not vary a lot. The advantage of our approach is twofold: first, the body part detection is robust except for resolution-limited images.
Second, the HMM-based action model is capable of recognizing actions even when the body part detectors fail on some frames of an action sequence. The performance can be improved in human detection by adding more training sam- ples and introducing more angular views. There is no good dataset for different body- part components, so building a component dataset is an important task. For action recognition, higher order HMM can be applied. Information of other body parts which have minor contribution in the action like, arms for the walking can be included in order to minimize the misclassification rate. References M. Ahmad & S.-W. Lee (2006). ‘HMM-based Human Action Recognition Using Multi- view Image Sequences’. In ICPR ’06: Proceedings of the 18th International Conference on Pattern Recognition, vol. 1, pp. 263–266, Washington, DC, USA. IEEE Computer Society. 20 M. Black & A. Jepson (1996). ‘Eigentracking: Robust matching and tracking of articu- lated objects using a view-based representation’. B. Chakraborty, et al. (2008). ‘View-Invariant Human Action Detection Using Component-Wise HMM of Body Parts’. In V Conference on Articulated Motion and Deformable Objects, pp. 208–217, Andratx, Mallorca, Spain. O. Chomat & J. L. Crowley (1999). ‘Probabilistic recognition of activity using local appearance’. In In Conference on Computer Vision and Pattern Recognition (CVPR ’99), Fort Collins, pp. 104–109. N. Dalal & B. Triggs (2005). ‘Histograms of Oriented Gradients for Human Detection’. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 01, pp. 886–893, Washington, DC, USA. IEEE Computer Society. J. W. Davis & A. F. Bobick (1997). ‘The Representation and Recognition of Action Using Temporal Templates’. In CVPR ’97: Proceedings of the IEEE Computer Vision and Pattern Recognition, pp. 928–934. P. Dollár, et al. (2005). ‘Behavior recognition via sparse spatio-temporal features’. pp. 65–72. L. Fengjun & N. Ramkant (2006). ‘Recognition and Segmentation of 3-D Human Action Using HMM and Multi-class AdaBoost’. In 9th European Conference on Computer Vision, pp. 359–372, Graz, Austria. Z. Ghahramani (2002). ‘An introduction to hidden Markov models and Bayesian net- works’ pp. 9–42. J. González, et al. (2009). ‘Understanding dynamic scenes based on human sequence evaluation’. Image and Vision Computing 27(10):1433–1444. N. Ikizler & P. Duygulu (2009). ‘Histogram of oriented rectangles: A new pose descriptor for human action recognition’. Image and Vision Computing 27(10):1515–1526. 21 Y. Ke, et al. (2005). ‘Efficient Visual Event Detection using Volumetric Features’. In IEEE International Conference on Computer Vision, vol. 1, pp. 166–173. M. Mendoza & N. P. de la Blanca (2007). ‘HMM-Based Action Recognition Using Con- tour Histograms’. In Iberian Conference on Pattern Recognition and Image Analysis, Part I, pp. 394–401, Girona, Spain. T. Moeslund, et al. (2006). ‘A Survey of Advances in Vision-Based Human Motion Capture and Analysis’. In Computer Vision and Image Understanding, vol. 8, pp. 231–268. A. Mohan, et al. (2001). ‘Example-based Object Detection in Images by Components’. IEEE Transaction on Pattern Analysis and Machine Intelligence 23(4):349–361. J. C. Niebles, et al. (2008). ‘Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words’. International Journal of Computer Vision 79(3):299–318. V. Parameswaran & R. Chellappa (2006). ‘View Invariance for Human Action Recogni- tion’. 
International Journal of Computer Vision 66(1):83–101. L. R. Rabiner (1989). ‘A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.’. Institute of Electrical and Electronics Engineers 2:257–286. C. Rao, et al. (2002). ‘View-Invariant Representation and Recognition of Actions’. International Journal of Computer Vision 50(2):203–226. C. Schuldt, et al. (2004). ‘Recognizing Human Actions: a Local SVM Approach.’. In International Conference on Pattern Recognition, pp. 32–36, Cambridge, UK. T. F. Shipley & J. M. Zacks (2008). In ”Understanding Events From Perception to Action”. Oxford University Press. L. Sigal & M. J. Black (2006). ‘HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion’. In Technical Report CS-06-08, Brown University. 22 P. Turaga, et al. (2008). ‘Machine Recognition of Human Activities: A Survey’. Circuits and Systems for Video Technology, IEEE Transactions on 18(11):1473–1488. L. Wang, et al. (2003). ‘Recent developments in human motion analysis’. Pattern Recognition 36(3):585–601. Y. Wang & G. Mori (2008). ‘Learning a discriminative hidden part model for human action recognition’. In Advances in Neural Information Processing Systems, vol. 21, pp. 1721–1728. MIT Press. S. Wong & R. Cipolla (2007). ‘Extracting Spatiotemporal Interest Points using Global Information’. In 11th IEEE International Conference on Computer Vision, pp. 1–8. A. Yilmaz & M. Shah (2005). ‘Recognizing Human Actions in Videos Acquired by Uncalibrated Moving Cameras’. IEEE International Conference on Computer Vision 1:150–157. L. Zelnik-Manor & M. Irani (2001). ‘Event-based analysis in video’. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 123–130. 23 work_3g2qgpmfbnbqxfqrqvxk4quel4 ---- This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit: http://www.elsevier.com/copyright http://www.elsevier.com/copyright Author's personal copy Effective synchronizing algorithms R. Kudłacik a, A. Roman b,⇑, H. Wagner b a IBM Poland, SWG Krakow Laboratory, Armii Krajowej 18, 30-150 Krakow, Poland b Institute of Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348 Krakow, Poland a r t i c l e i n f o Keywords: Circuit testing Conformance testing Synchronizing sequences Synchronizing automata Reset word Synchronizing algorithm a b s t r a c t The notion of a synchronizing sequence plays an important role in the model-based testing of reactive systems, such as sequential circuits or communication protocols. The main problem in this approach is to find the shortest possible sequence which synchronizes the automaton being a model of the system under test. This can be done with a synchronizing algorithm. In this paper we analyze the synchronizing algorithms described in the literature, both exact (with exponential runtime) and greedy (polynomial). 
We investigate the implementation of the exact algorithm and show how this implementation can be optimized by use of some efficient data structures. We also propose a new greedy algorithm, which relies on some new heuristics. We compare our algorithms with the existing ones, with respect to both runtime and quality aspect. � 2012 Elsevier Ltd. All rights reserved. 1. Introduction Synchronizing words (called also: synchronizing sequences, re- set sequences, reset words or recurrent words) play an important role in the model-based testing of reactive systems (Broy, Jonsson, Katoen, Leucker, & Pretschner, 2005). In presence, with advanced computer technology, systems are getting larger and more compli- cated, but also less reliable. Therefore, testing is indispensable part of system design and implementation. Finite automata are the most frequently used models that describe structure and behavior of the reactive systems, such as sequential circuits, certain types of programs, and, more recently, communication protocols (Fukada, Nakata, Kitamichi, Higashino, & Cavalli, 2001; Ponce, Csopaki, & Tarnay, 1994; Zhao, Liu, Guo, & Zhang, 2010). Because of its prac- tical importance and theoretical interest, the problem of testing fi- nite state machines has been studied in different areas and at various times. Originally, in 1950s and 1960s, the researchers work in this area was motivated by automata theory and sequential cir- cuit testing. The area seemed to have mostly died down, but in 1990s the problem was resurrected due to its applications to con- formance testing of communication protocols. The problem of conformance testing can be described as follows (Lee & Yannakakis, 1996). Let there be given a finite state machine MS which acts as the system specification and for which we know completely its internal structure. Let MI be another machine, which is the alleged implementation of the system and for which we can only observe its behavior. We want to test whether MI correctly implements or conforms to MS. Synchronizing words allow us to bring the machine into one state, no matter which state we currently are in. This helps much in designing effective test cases, e.g. for sequential circuits. In Pomeranz and Reddy (1998) authors show a class of faults for which a synchronizing word for the faulty circuit can be easily determined from the synchronizing word of the fault free circuit. They also consider circuits that have a reset mechanism, and show how reset can ensure that no single fault would cause the circuit to become unsynchronizable. In Hyunwoo, Somenzi, and Pixley (1993) a framework and algo- rithms for test generation based on the multiple observation time strategy are developed by taking advantage of synchronizing words. When a circuit is synchronizable, test generation can em- ploy the multiple observation time strategy and provide better fault coverage, while using the conventional tester operation mod- el. The authors investigate how a synchronizing word simplifies test generation. The central problem in the approach based on the synchroniz- ing words is to find the shortest one for a given automaton. As the problem is NP-hard (see Section 2), the polynomial algorithms cannot be optimal, that is they cannot find the shortest possible synchronizing words (unless P ¼NP, which is strongly believed to be false). In last years some efforts were made in the field of algorithmic approach for finding short synchronizing words (Deshmukh & Hawat, 1994). 
Pixley, Jeong, and Hachtel (1994) presented an efficient method based upon the universal alignment theorem and binary decision diagrams to compute a synchronizing word. There are also the Natarajan (1986) and Eppstein (1990) algorithms.
The problem of synchronizing finite state automata has a long history. While its statement is simple (find a word that sends all states to one state), there are still some important questions to be answered. One of the most intriguing issues is the famous Černý Conjecture (Černý, Pirická, & Rosenauerová, 1971), which states that for any n-state synchronizing automaton there exists a synchronizing word of length at most (n - 1)^2. Should the conjecture be true, this would be a strict upper bound, as there exist automata with minimal synchronizing words of length exactly (n - 1)^2. The Černý Conjecture has profound theoretical significance (remaining one of the last 'basic' unanswered questions in the field of automata theory, especially after the Road Coloring Problem has been recently solved by Trahtman (2009)). On the other hand, there are several practical applications of finding short reset sequences: part orienters (Natarajan, 1986), finding one's location on a map/graph (Kari, 2002), resetting biocomputers (Ananichev & Volkov, 2003), networking (determining a leader in a network) (Kari, 2002) and testing electronic circuits, mentioned above. Clearly, finding short reset words is important both for theoretical and practical reasons.
The paper is organized as follows. In Section 2 we give the basic definitions on automata and synchronizing words. In Section 3 we introduce two auxiliary constructions which are commonly used in the synchronizing algorithms. In Section 4 we present the well-known synchronizing algorithms, both exact and greedy. In Sections 5 and 6 we present our two main results: application of efficient data structures to the exact synchronizing algorithm and a new, efficient heuristic algorithm. Both sections end with the experimental results and an efficiency comparison to other algorithms.
2. Synchronizing words
An alphabet is a nonempty, finite set. A word over some alphabet A is a sequence of letters from A. The length of a word w is the number of its letters and is denoted by |w|. By ε we denote the empty word of length 0. If A is an alphabet, by A* we denote the set of all words over A. For example, if A = {a, b}, then A* = {ε, a, b, aa, ab, ba, bb, aaa, . . .}. The catenation of words is denoted by a dot: if u, v ∈ A*, then u·v = uv. A finite state automaton is a triple A = (Q, A, δ), where Q is a finite set of states, A is an alphabet and δ is a transition function, δ: Q × A → Q. Note that initial and terminal states are not marked – we are not interested in languages accepted by automata, but rather in the automaton action itself. In the following, for the sake of simplicity, we will use the word 'automaton' instead of 'a finite state automaton'.
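For later reference, the definition above translates directly into a small data structure. The sketch below is our illustrative encoding (states and letters as integers), not code from the paper; the Černý automaton used as an example is defined formally by Equation (1) in the next subsection.

#include <vector>

// A finite state automaton (Q, A, delta) with Q = {0, ..., n-1} and
// A = {0, ..., k-1}; delta[q][a] is the state reached from q on letter a.
struct Automaton {
    int n = 0;                               // |Q|
    int k = 0;                               // |A|
    std::vector<std::vector<int>> delta;     // n x k transition table
};

// The 4-state Cerny automaton C4 over A = {a, b}, encoded as a = 0, b = 1.
Automaton cerny4() {
    Automaton c;
    c.n = 4; c.k = 2;
    c.delta.assign(4, std::vector<int>(2));
    for (int q = 0; q < 4; ++q) {
        c.delta[q][0] = (q + 1) % 4;         // letter a: cyclic shift
        c.delta[q][1] = (q == 3) ? 0 : q;    // letter b: moves only state n-1
    }
    return c;
}

// Image of a single state under a word given as a sequence of letter indices.
int applyWord(const Automaton& aut, int q, const std::vector<int>& w) {
    for (int letter : w) q = aut.delta[q][letter];
    return q;
}

For C4 the word baaabaaab (nine letters, i.e. (4 - 1)^2) maps every state to state 0, which is exactly the kind of word the algorithms below search for.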
The transition function can be extended to $\mathcal{P}(Q) \times A^*$, that is, to sets of states and words over A. The same symbol δ will be used to refer to the extended function $\delta: \mathcal{P}(Q) \times A^* \to \mathcal{P}(Q)$. This causes no confusion: for all P ⊆ Q, a ∈ A, w ∈ A*,
$$\delta(P, \varepsilon) = P, \qquad \delta(P, aw) = \bigcup_{p \in P} \{\delta(\delta(p, a), w)\}.$$
A word w is called a synchronizing word for A = (Q, A, δ) iff |δ(Q, w)| = 1. We say that such a word synchronizes A. We also say that A = (Q, A, δ) is synchronizing if there exists w ∈ A* that synchronizes it. If, for a given A, there is no shorter synchronizing word than w, the word w is called the shortest synchronizing word (SSW) for A. There are two main algorithmic problems in the synchronization theory: in the first one, given a synchronizing automaton A = (Q, A, δ), we ask for an SSW for A. In the second one we ask to find any synchronizing word, not necessarily the shortest (but, of course, the shorter the word found, the better), in a reasonable time. These problems can be restated in the form of the following decision problems:
Problem FIND-SSW. Input: a synchronizing automaton A and k ∈ N. Output: YES iff the shortest word synchronizing A has length k.
Problem FIND-SW-OF-LENGTH-K. Input: a synchronizing automaton A and k ∈ N. Output: YES iff there exists a synchronizing word of length k for A.
The decision problem FIND-SSW has been recently shown to be DP-complete (Olschewski & Ummels, 2010). The decision problem FIND-SW-OF-LENGTH-K is NP-complete (Eppstein, 1990). It is well known that the length of the SSW for an n-state synchronizing automaton is at most $\frac{n^3 - n}{6}$ (Klyachko, Rystsov, & Spivak, 1987; Pin, 1983). The Černý conjecture states that this length can be bounded by (n - 1)^2. Černý showed (Černý, 1964) that for each n ≥ 1 there exists an automaton with an SSW of length (n - 1)^2, so the conjectured bound is tight. These automata are called the Černý automata. An n-state Černý automaton will be denoted by Cn. The Černý automaton is defined over a two-element alphabet A = {a, b} and its transition function is as follows:
$$\forall q \in \{0, \ldots, n-1\} \quad \delta(q, x) = \begin{cases} (q + 1) \bmod n & \text{if } x = a,\\ q & \text{if } x = b \wedge q \neq n - 1,\\ 0 & \text{if } x = b \wedge q = n - 1. \end{cases} \qquad (1)$$
Černý automata are very important, as automata with |SSW| = (n - 1)^2 are very rare. Only eight such automata are known that are not isomorphic with the Černý automata (Trahtman, 2006).
3. Auxiliary constructions
In this section we describe two auxiliary constructions used throughout the paper. Let A = (Q, A, δ) be a synchronizing automaton. A pair automaton for A is the automaton A² = (Q′, A, δ′), where:
$$Q' = \bigcup_{p, q \in Q,\ p \neq q} \{\{p, q\}\} \cup \{0\}, \qquad \delta': Q' \times A \to Q',$$
$$\delta'(\{p, q\}, l) = \begin{cases} \{\delta(p, l), \delta(q, l)\} & \text{if } \delta(p, l) \neq \delta(q, l),\\ 0 & \text{otherwise,} \end{cases} \qquad \delta'(0, l) = 0 \quad \forall l \in A.$$
Let A = (Q, A, δ) be an automaton. A sequence (q1, q2), (q2, q3), . . . , (ql, ql+1), q1, . . . , ql+1 ∈ Q, is called a path in A, if for each i = 1, . . . , l there exists ai ∈ A such that δ(qi, ai) = qi+1. We will identify such a path with a word a1a2 . . . al (notice that if there is more than one letter transforming some qi into qi+1, then the path including (qi, qi+1) can be identified with more than one word). The pair automaton shows how the pairs of states behave when words are applied to the original automaton. If p, q ∈ S ⊆ Q and w is a path leading from {p, q} to 0, it means that |δ(S, w)| < |S|, where p, q ∈ S. In such a situation we say that the pair {p, q} of states was synchronized by w. The pair automaton is utilized in all heuristic algorithms, see Sections 4.3, 4.4 and 4.5.
The next proposition is a straightforward but very important fact, utilized in all heuristic algorithms.
Proposition 1. A word w ∈ A* synchronizes A² iff w synchronizes A.
Proposition 1 implies the following necessary and sufficient condition for A to be synchronizing:
Proposition 2. A is synchronizing iff each pair of its states is synchronizing.
The problem of finding an SSW can be restated as a problem of path-searching in a so-called power-set automaton (or power automaton for short) of A. A power-set automaton for A = (Q, A, δ) is an automaton P(A) = (2^Q, A, Δ), where:
$$2^Q = \{P \subseteq Q\} \setminus \{\emptyset\}, \qquad \Delta: 2^Q \times A \to 2^Q, \qquad \Delta(q, l) = \bigcup_{s \in q} \{\delta(s, l)\} \quad \forall q \in 2^Q,\ l \in A.$$
Like for δ, we can extend Δ to 2^Q × A*. Let P(A) = (2^Q, A, Δ) be a power automaton of A = (Q, A, δ). The state Q ∈ 2^Q will be called the start state of P(A). The size of the power automaton is exponential in the size of the original automaton. There are 2^|Q| − 1 states and |A|(2^|Q| − 1) edges. States of the power automaton represent subsets of states of the input automaton and are labeled by the corresponding subsets. We will only consider the subautomaton of the power automaton which is reachable from the start state. Specifically, when we say that the power automaton is small, we mean that the reachable subautomaton is small. The Černý automata Cn are interesting examples here, as all states of their power automata are reachable from the start state. We will sometimes refer to the "size of state s". This means that s is a state of the power automaton and it represents an |s|-element subset s of Q. Since edges in the power automaton represent transitions between subsets of states, the power automaton can be thought of as a way of expressing the global behavior of the input automaton when a certain letter (or word) is applied. In contrast, the pair automaton describes local behavior only (it shows how the pairs of states are transformed).
Proposition 3. The sets of synchronizing words of A and P(A) coincide. In particular, A is synchronizing iff P(A) is synchronizing.
It is clear that in a power automaton a path leading from Q ∈ 2^Q to any state F ∈ 2^Q such that |F| = 1 represents a synchronizing word for A. Also, the shortest such path determines the shortest synchronizing word for A. So the entire problem can be rephrased as a basic graph problem. This is convenient, as single-source path-searching algorithms (exact or otherwise) have been extensively studied. Also, augmenting the generic path-searching methods with knowledge specific to this problem may give some interesting results.
4. Synchronizing algorithms
In this section we describe five synchronizing algorithms, that is, algorithms that find a synchronizing word for a given automaton. Two of them (EXACT and SEMIGROUP) are exponential ones that always find the shortest synchronizing words. Three others (NATARAJAN, EPPSTEIN and SYNCHROP) are heuristic algorithms working in polynomial time, so they are faster, but they do not necessarily find the shortest synchronizing words. In the following sections we assume |Q| = n.
4.1. Exact exponential algorithm
There are two well-known algorithms for finding the shortest synchronizing words. Due to the fact that this problem is NP-hard, their runtime complexity is exponential in the size of the input automaton, which limits their use.
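Both of them operate, explicitly or implicitly, on the power automaton, whose transition function is cheap to evaluate on demand once subsets of Q are encoded as bit masks. The sketch below shows this encoding for n <= 64; it is our illustration of the idea, not the authors' code.

#include <cstdint>
#include <vector>

// Subsets of Q = {0, ..., n-1}, n <= 64, encoded as bit masks:
// bit q of the mask is set iff state q belongs to the subset.
using StateSet = std::uint64_t;

// Power automaton transition Delta(S, a), computed on demand from the
// transition table delta[q][a] of the input automaton.
StateSet powerStep(const std::vector<std::vector<int>>& delta,
                   StateSet S, int a) {
    StateSet image = 0;
    for (int q = 0; S != 0; ++q, S >>= 1)
        if (S & 1u)
            image |= StateSet(1) << delta[q][a];
    return image;
}

// |Delta(S, a)| = 1 exactly when the image mask is a power of two.
bool isSingleton(StateSet S) { return S != 0 && (S & (S - 1)) == 0; }

The start state of P(A) is then the mask with the n lowest bits set, and a word w synchronizes A exactly when applying powerStep letter by letter to that mask yields a singleton.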
The standard exact algorithm is a simple breadth-first-search in the power automaton. The runtime is X(2n) in the worst case. Standard implementation requires X(2nn) space. Due to these discouraging fact this algorithm is often disregarded in the literature. Algorithm 1 EXACT ALGORITHMðAÞ 1: Input: an automaton A¼ðQ; A; dÞ 2: Output: SSW of A (if exists) 3: queue Q empty 4: push Q into queue Q 5: mark Q as visited 6: while Q is not empty 7: S popðQÞ 8: foreach a 2 A 9: T d(S, a) 10: if jTj = 1 11: return reversed path from T to Q 12: if T is not visited 13: push T into Q 14: mark T as visited 15: return A is not synchronizing 4.2. Semigroup algorithm Another algorithm (which is typically more memory-efficient) was described in Trahtman (2006) and uses a notion of syntactic semigroup. Let A¼ðQ; A; dÞ be an automaton. Alphabet letters (and also words over A) represent functions Q ? Q, so if f is a func- tion from Q to Q and w 2 A⁄, by f�w we denote the composition of two functions: f and a function corresponding to w. Syntactic semi- group for A is constructed as follows: process all words over A⁄ in the lexicographic order. If a processed word defines a new function f: Q ? Q, add f to list L. The procedure is stopped in two cases: (1) when "f 2 L "a 2 A f�a 2 L, that is, when no new function can be de- fined; (2) when a constant function (mapping all elements into one element) is found. The word corresponding to the constant func- tion is SSW, as words were processed in the lexicographic order. The semigroup algorithm does not require a costly power automa- ton construction phase, but its standard implementation is terribly inefficient in the worst case. So it is slightly better than the power automaton algorithm, but the above fact limits its use only to small automata. The algorithm’s runtime complexity is O(jAjn � s2) with O(n � s) space required (Trahtman, 2006), where s is the size of the syn- tactic semigroup S. Syntactic semigroup size can be as big as nn, but since only a subset of S (containing only words no longer than SSW) is considered, the average runtime is usually much lower. The semigroup algorithm is used in a well-known synchroniza- tion package TESTAS. Its worst-time complexity can be drastically reduced and we show it in Section 5. 4.3. Natarajan algorithm One of the first heuristic algorithms for finding short synchro- nizing words was provided by Natarajan (1986). The algorithm is shown in Listing 2. 11748 R. Kudłacik et al. / Expert Systems with Applications 39 (2012) 11746–11757 Author's personal copy Algorithm 2 NATARAJANðAÞ 1: Input: synchronizing automaton A¼ðQ; A; dÞ 2: Output: synchronizing word for A 3: Q {1, 2, . . . , n}; s e 4: while jQj > 1: 5: choose two states p, q 2 Q 6: w the shortest path from {p, q} to 0 7: Q d(Q, w) 8: s s.w 9: return s A loop in line 4. is performed O(n) times. The shortest path (line 6.) can be found in O(jAjn2). Transformation in line 7. is done in O(n3), because jQj = O(n) and jwj = O(n2). Hence, the total complex- ity is O(jAjn3 + n4). 4.4. Eppstein and cycle algorithms Eppstein proposed a modification of Natarajan’s algorithm. The modification is based on a preprocessing in which for each pair of states we compute the first letter of the shortest word synchroniz- ing these states. Eppstein has shown that this preprocessing allows us to reduce the complexity to O(n3 + jAjn2). CYCLE is a slight modification of EPPSTEIN. 
In CYCLE, when a pair of states in synchronized into some state q, it is required that in the next step q must be one of the elements in the chosen pair. CYCLE works optimally for Černý automata, that is, always returns SSW. 4.5. SYNCHROP algorithm SYNCHROP (and its modified version, SYNCHROPL) algorithm (Roman, 2009) is, in comparison to NATARAJAN, a ‘one-step-ahead’ procedure – we do not choose arbitrary pair of states as in line 5. of NATARAJAN. Let w(p, q) be the shortest word synchronizing {p, q}. For each {p, q} we check how the set of states in the pair automaton will be transformed if we apply w(p, q) to all states we currently are in. Each transformation is rated in terms of some heuristically de- fined cost function. We choose the pair with the lowest cost func- tion. The remaining part of the algorithm is exactly the same as in NATARAJAN. In its original version, SYNCHROP algorithm does not use the preprocessing introduced in EPPSTEIN. Therefore, its complexity is O(n5 + jAjn2). The detailed description and discussion on SYNCHROP properties and complexity is given in Section 6. 5. Optimizing exponential algorithms In this section we deal with the exact synchronizing algorithms. We show how the selection of the efficient data structures affects on the time complexity. Let us consider the basic version of the algorithm shown in Listing 1. While the algorithm looks very simple, its performance greatly depends on the data-structures used. The following aspects of the algorithm must be considered: 1. transition function computation, 2. state representation, 3. queue implementation, 4. visited states’ set implementation, 5. predecessor tree implementation (required, if the actual SSW must be returned rather than its length). Judging by the complexity (given in Trahtman (2006)) of the SEMIGROUP algorithm implementation in TESTAS, checking if t was previously visited (Listing 1, line 12.) is assumed to be performed in H(nm), where m is the number of elements visited. This step can easily be done in time nlog m using the standard tree-based dictionaries. So the worst case runtime complexity can easily be re- duced from H(jAjns2) to H(jAjnslog s). Another simple optimization is possible. Note that the original algorithm generates sequences of states of size n (and not sets). We can treat these elements as sets without losing any valuable information. This should also speed the process up: semigroup size s can reach the value of nn (and reaches 22n for Cn ), while there are at most 2n subsets of the considered set. Finally, for small n, a trick can be used: sets can be mapped to integers. This technique will be described in more details later. An ordinary array can be used to check if a set was previously added to the visited sets (actually, similar trick can be applied when sequences of size n are considered: radix n rather than 2 must be used). This allows us to skip the logarithmic part in H(jAjnslog s) yielding H(jAjns) (assuming that n is small). Of course, one could argue that there is no point in analyzing asymp- totic complexity with bounded values of input size. In such cases this notation should be understood as a way to express the order of complexity. This makes a really big difference. For example, TESTAS (which apparently uses this algorithm) cannot handle calculating SSW of Cn for n P 16. After these simple optimizations, all automata of size up to 26 (or more) should be handled easily (space complexity be- comes a bigger problem in case of slowly-synchronizing auto- mata). 
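Concretely, the optimizations described in this section (subsets encoded as integers, an ordinary array serving both as the visited marker and as the predecessor tree) combine into the following sketch of the exact search for small n (here n < 32 is assumed). This is our illustration of the approach, not the TESTAS implementation.

#include <algorithm>
#include <cstdint>
#include <queue>
#include <vector>

// Exact BFS over the power automaton for small n: every subset of Q is its
// own index into the dense parent/letter arrays (Section 5.1.4).  Returns a
// shortest synchronizing word as a sequence of letter indices; 'found' is
// set to false when the automaton is not synchronizing.
std::vector<int> shortestSynchronizingWord(
        const std::vector<std::vector<int>>& delta, bool& found) {
    const int n = int(delta.size());                 // n < 32 assumed
    const int k = n > 0 ? int(delta[0].size()) : 0;
    found = true;
    if (n <= 1) return {};                           // already a singleton
    const std::uint32_t start = (1u << n) - 1u;      // the set Q
    std::vector<std::int32_t> parent(std::size_t(1) << n, -1);
    std::vector<std::int8_t>  letter(std::size_t(1) << n, -1);
    std::queue<std::uint32_t> bfs;
    parent[start] = std::int32_t(start);             // mark start as visited
    bfs.push(start);
    while (!bfs.empty()) {
        std::uint32_t S = bfs.front(); bfs.pop();
        for (int a = 0; a < k; ++a) {
            std::uint32_t T = 0, rest = S;           // T = Delta(S, a)
            for (int q = 0; rest != 0; ++q, rest >>= 1)
                if (rest & 1u) T |= 1u << delta[q][a];
            if (parent[T] != -1) continue;           // subset already visited
            parent[T] = std::int32_t(S);
            letter[T] = std::int8_t(a);
            if ((T & (T - 1)) == 0) {                // singleton reached: SSW found
                std::vector<int> word;
                for (std::uint32_t cur = T; cur != start;
                     cur = std::uint32_t(parent[cur]))
                    word.push_back(letter[cur]);
                std::reverse(word.begin(), word.end());
                return word;
            }
            bfs.push(T);
        }
    }
    found = false;                                   // no singleton reachable
    return {};
}

For larger n the dense arrays no longer fit in memory and the sparse mappings discussed in Section 5.1.5 must be used instead.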
More detailed comparison will be shown later. Interestingly, after performing these optimizations, the algo- rithm is very similar to the power-automaton-based approach. In fact, we will try to merge the best features of these two approaches. 5.1. Power automaton traversal We would like to focus now on the algorithm that utilizes the concept of the power-automaton: a breadth-first-search is per- formed, beginning with the start state (set of all states of the input automaton). When a singleton state is found, the SSW has been found and the computation can be terminated. In the effect, typi- cally only a subset of states of the power automaton is utilized. This is an important fact that will enable us to examine larger automata. 5.1.1. On-line transition function computation The power automaton transition function can be calculated on- line (i.e. whenever it is required). In order to compute a single power automaton transition, one to n transition functions of the original automaton must be calculated. This is reasonable and is, in fact, a standard approach used. Typically the resulting runtime complexity is O(jAjn2nM(n)), where M(n) is the cost of performing one mapping of state into an associated value. For small n it can be assumed to be O(jAjn2n) (this is a slight abuse of the O notation, as this algorithm is limited to small n; in fact we are not interested in asymptotic behavior of this function). 5.1.2. Off-line transition function computation It is possible to generate all transitions in the power automaton in an amortized constant time. This approach leads us to H(jAj2n) complexity of calculating SSW. Currently, memory requirements make it usable only for n < 30, so we will only investigate this case. Power automaton’s states S1 . . . S2n�1 are generated in Gray’s code order, which can be done in amortized constant time. This way, consecutive states differ on exactly one position (i.e jSi � Si+1j = 1, where � denotes a symmetric-difference operation). Power automaton transition for certain l 2 A can be expressed as DðSi; lÞ¼ dðSi \ Si�1; lÞ[ DðSi � Si�1; lÞ; ð2Þ R. Kudłacik et al. / Expert Systems with Applications 39 (2012) 11746–11757 11749 Author's personal copy where jSi \ Si�1j ¼ jSij� 1; jSi � Si�1j ¼ 1: If Si is the ith generated state, d(Si \ Si�1,l) can be determined in con- stant time. This can be done by storing the count of each element of the current transition’s function value. This information can be up- dated in constant time. Some additional pre-computation is necessary, so that logarithm of a number can be computed in constant time. This is required to map a power automaton’s state (representing the sym- metric difference of consecutive states) into input automaton’s state. Since jSi � Si�1j = 1, Si � Si�1 is encoded as an integer I being a power of two. So log(I) denotes the state’s number. Listing 3 shows how to calculate this value quickly. Finally, the entire algo- rithm shown in Listing 4 can be run in a constant time. 
Algorithm 3 FAST_LOG2(n) 1: Input: integer n = 2k 2: Output: log2n = k 3: r right (n)// right half, r < 216 4: if r > 0: 5: return log2[r]// precomputed value 6: l left (n) shr 16 7: return 16 + log2[l]// assuming 32 bit integers are used Algorithm 4 FAST POWER AUTOMATON TRANSITION (Si,Si+1, prev_trans, image_count, l) 1: Input: States Si, Si+1; transition function for Si; count of each element in the transition function’s image; letter l 2 A for which the transition function is calculated 2: Output: power automaton transition for a given state and letter 3: ex Si XOR Si+1// bitwise symmetric difference 4: in Si AND next// bitwise and 5: ret prev_trans 6: change fast_log2(ex)// changed state’s number 7: change_to delta (change, l) 8: if in == prev:// change was added 9: image_count[change]+ = 1 10: ret = ret OR change_to 11: else// change was removed 12: image_count[change_to]� = 1 13: if image_count[change_to] == 0 14: ret = ret XOR change_to 15: return ret A full graph representing the power automaton is constructed in memory. This approach is good for calculating SSW of slowly-synchroniz- ing automata. It takes exactly H(jAj2n) integer operations (that is, never takes less, unlike other implementations). It is not suitable for larger (n > 30) automata due to memory requirements. It is also not recommended for random and fast-synchronizing automata as they tend to have small power automata. This should be a good choice when the inexact algorithm (run prior to the exact one) shows that SSW may be long. 5.1.3. Mapping states to arbitrary information In the algorithm a set of visited states must be maintained. We will consider a more general solution: mapping states into arbi- trary values (when a Boolean value is used, this approach is equiv- alent to defining a subset by its characteristic function). This method will be useful for storing predecessor tree. 5.1.4. Dense mapping When we can afford keeping the entire mapping in memory, the situation is rather simple. To each set can be assigned an integer that will uniquely identify this set. There exists a convenient corre- spondence r: r : 2f1;...;ng ! N; r�1 : f0; . . . ; 2n � 1g! 2f1;...;ng; rðSÞ¼ X i2S 2i�1; r�1ðIÞ¼ fi 6 n : bit ði � 1Þ of I is set to 1g: In other words, subsets of a fixed set can be represented as its characteristic vector. This binary vector can be encoded as an inte- ger. It is a common technique. All relevant operations can be performed quickly. Hence, an ordinary array (of size 2n) can be used to map power automaton’s states into arbitrary data (for example, an information if a given set was visited earlier during the search). Since the array must fit into memory, n must be small. 5.1.5. Sparse mapping In case where only a subset (of unknown size) must be mapped, sparse data structures should be used. Such structures include tree-based dictionaries (like various BSTs) and hash tables. Trees require a strict weak ordering on elements (that is a proper ’ < ’ predicate must be supplied). Note that x ¼ y () :ðx < yÞ^:ðy < xÞ. Hash tables require an equality predicate (=). Also, it is required that hash value h can be calculated for each element. Tree-based dictionaries typically guarantee that insertions and retrievals perform H(log2m) key comparisons, where m is the number of stored keys. Hence, the complexity of performing key comparisons must be included in the estimation of the total com- plexity (this fact seems to be often omitted). Hash tables promise insertions and retrievals in O(1) time on average. 
Once more, the complexity of comparing keys and calcu- lating hashes must be taken into account. Using a trie data structure is also an option, but no efficient implementations seem to be available. An optimized implementa- tion based on a compact two-array approach would be suitable for our needs. There exist standard implementations of tree-based and hash- based dictionaries and we will not delve into further details here. We used std80). In effect, power automaton’s states can no longer be encoded into a single integer value. An alternative representation must be devised. Essentially, we need to represent subsets of {1, 2, . . . , n}. We will now investi- gate various data-structures that can be used for this purpose. Let a data structure S represent a set S # {1, 2, . . . , n}. By itera- tion over S we mean enumerating elements of S, preferably in sorted order. So if S represents the set {1, 2, 3}, iterating over S yields elements 1, 2, 3, and signalizes that no more elements are present. 5.1.7. Tree-based sets Using the standard set structures based on AVL or red–black trees (like std from 6: return down (tree, n, node, left (node)) 7: return down (tree, n, node, right (node)) The UPPERBOUND goes up to the root searching for the non-empty nodes. When a node is found, we go down, following the leftmost path (leading to leaves with indices greater than the start node) of non-empty nodes. Clearly, at most 2log n nodes are visited. This happens, for example, when only one value is present. The entire structure can be iterated using the function ITERATE (see Listing 7). Algorithm 7 ITERATE (tree, n) 1: Input: structure S 2: Output: next elements of S 3: elem = �1 4: while(elem != �1) 5: elem = UPPERBOUND(tree, n, elem + 1) 1: yield elem // the next element was found Let us investigate the amortized complexity of iterating over a full set. This is trivial, since UPPERBOUND function returns immedi- ately in line 5. Hence, the amortized complexity is constant when all elements are present. We already mentioned that when one ele- ment is present, the amortized complexity (as well as total) is 2log n. Instead of going up to a parent we can visit our grand-sibling’s parent without skipping any valuable nodes (sometimes we could visit the grand-sibling itself if it is non-empty, but this happens only once per UPPERBOUND call and lowers performance in the worst case). The remaining part of the algorithm is unchanged: if the cur- rent node is non-empty (marked), we start moving down. Other- wise, we continue moving right-up. This small change is important, as it enables us to perform more detailed complexity analysis. Note that the cost (counted in the number of visited nodes) between two leaves p, q is bounded by 3log(jp � qj). The search process can be divided into two phases. The search starts from node p. 1. Right-up jumps are performed until a non-empty node is found. Successive right-up jumps cull sub-trees of expo- nentially growing sizes: 1, 2, 4 etc. Since there are jp � qj empty leaves between the considered leaves, it is enough to perform logjp � qj jumps. 2. Now, the left-most non-empty path is followed. For each performed up or right-up jump a down-move must be performed (there were at most logjp � qj such jumps). Each down-move requires checking two nodes. It can be seen that at most 3log(jp � qj) nodes will be accessed. Let us now investigate the worst case. 
Assuming there are at least two elements (and one of them equals 0 for simplicity), they are iterated left-to-right: yi is the number of elements checked dur- ing step i (that is, during the ith call to UPPERBOUND). Let us put xi = yi+1 � yi "i < n. Now, the cost of iterating through all elements equals P i xi. Note that P i yi 6 n, but we will assume P iyi ¼ n, which is the worst case. The total cost is maxxð P i logðxiÞÞ¼ maxxðlogðPi xiÞÞ. From loga- rithm’s monotonicity we only have to maximize the product, which happens when xi = xj "i, j. This leads to function a n/a, where a is a parameter. Differentiating shows that it is maximized for a = e. Taking into account that we operate on positive integer values, we finally obtain that the cost is bounded by Oðlog 3 n 3 � � Þ¼ O n3 log 3 � � ¼ OðnÞ. The worst case amortized complexity has been improved from H(n) to H(logn), while worst case total complexity is still O(n). Memory consumption is increased twice (two bits per value are required). 5.1.11. Other data-structures Van Emde Boas Tree is an extension of the indexing tree concept described above. This heavy, recursive data-structure (each node contains a smaller VEB-Tree that is used as an index) enables iter- ation of s in H(jsjlog logn), so it guarantees H(log logn) amortized complexity. This structure is complex, so it would be an improve- ment only for much bigger n. It is of no use in our applications. 5.1.12. Partial power-set automaton The following technique can be used to reduce the amount of computations involved in calculating power automaton’s transi- tions. This, as far as we know, novel technique is most useful for small n. Let us define a partial power automaton of A for X � Q as PXðAÞ¼ ð2Q ; A; DjXÞ. In other words, the transition function domain is restricted to X. Note that the co-domain does not change. It is obvious that S # 2Q ^ [ S ¼ Q )PðAÞ¼ [ x2S PxðAÞ: From a mathematical standpoint it is a trivial tautology, but it turns out that it can be useful for our purposes. Let us consider a simple case. Put Q = {1, . . . , 32}, X = {1, . . . , 16}, Y = {17, . . . , 32}. Then jPXðAÞj¼ jPYðAÞj¼ 216 � 1. Clearly, values of DX and DY can be precomputed in 32 � 28 ( n ffiffiffiffiffi 2n p in general). Later, they can be used to construct the transition function of the entire power automaton by taking a union of transitions of the partial power automaton. Assuming that the union operation is faster then per- forming the transition in the original automaton, a speedup will occur. This is a low-level optimization, but experiments show that it can boost overall performance by a factor of 10. It is indeed possible to perform a fast set-union operation. When the represented universe is fixed (in this case: {1, . . . , 32}), a set can be represented as its characteristic vector, en- coded by an integer. Set-union is done by bit-wise OR operation, so the union of two states of size 32 is performed in one CPU instruction. This approach can be generalized. Let m (dividing n, for simplic- ity) be the maximum value such that m2n/m transitions can be pre- computed. Let us assume that our machine’s word size is 32 bits. There are d n32e words necessary to store one transition value. We need m values, so we need to bit-or m values, which takes mn32 oper- ations. Normally we would need to perform exactly n transitions in the original automaton. For m < 32 this technique should be faster. When m = 2, n = 32, exactly two results must be united. This can be done using one bit-or operation. 
In the result three integer oper- ations must be performed (instead of n/2 on average). The prepro- cessing phase takes 2 � 216 operations. 5.2. Possible further improvements As it was noted in Volkov (2008), an average length of SSW should be quite low (comparing to the conjectured (n � 1)2). According to the considerations of Volkov (referring to the paper 11752 R. Kudłacik et al. / Expert Systems with Applications 39 (2012) 11746–11757 Author's personal copy of Higgins (1988)), the expected SSW length is O(n). More precisely, it can be proved that a randomly chosen n-state autom- aton with a sufficiently large alphabet is synchronizing with prob- ability 1 as n goes to infinity and the length of its SSW does not exceed 2n. Therefore, statistically, only a small part of the power automaton must be visited in the search process. Recently, Skvort- sov and Tipikin (2011) provided experimentally the average length of SSW for a random n-state automaton over binary alphabet – it is O(n0.55). Regarding to the above facts, the described algorithm is not only exact, but also turns out to be quite efficient in an average case. Also, for small n, such an algorithm can be more effective than some polynomial-time solutions. One more improvement can be introduced. It aims at reducing the runtime complexity for certain types of automata (such as Černý automaton) whose SSWs are of the form wkv, w, v 2 A⁄. A similar heuristics was described in Trahtman (2006) to enhance the greedy algorithm. For each word wa such that jd(Q, w)j > jd(Q, wa)j, w 2 A⁄, a 2 A, the powers of this word are checked. It may turn out that (wa)k is a synchronizing word (just as in case of the Černý automaton). Actually, many slowly synchronizing automata (i.e. with long SSW) fall into this category. Note that by using this heuristics only a small fraction of the power automaton is visited, even though the length of SSW is quadratic and (normally) vast number of states would have to be traversed. Experimental data show that employing this heuristics for the Černý automata reduces the number of visited states to Hð1:5nÞ Hð ffiffiffiffiffi 2n p Þ. 5.3. Performance comparison Table 1 shows how the described optimizations affected perfor- mance. The results are compared with the popular synchronization tool, TESTAS (v. 1.0). Fig. 2 shows the results graphically. The measurements clearly show that the optimizations are very effective. Version 4 (using raw arrays of integers and partial power automaton concept), while significantly faster, cannot be used for larger automata. Other solutions, while slower, scale better and can be used for much larger automata, as long as the reachable power automaton size fits in memory. Algorithm 3 (using hash tables and the described fast bit-vec- tor) should be used for medium size automata. It can be more effective than algorithm 4 for smaller automata if the power automaton is relatively small. Approach 4 is superior for small (n < 30), slowly-synchronizing automata. Note that all of these approaches are faster than the algorithm used in TESTAS, due to the described complexity improvements. Fig. 2 shows performance for growing n. It must be noted that the memory conservation plays a major role here. In case of slowly synchronizing automata the compact structures enable cache-friendly (therefore fast) computations for small n, while for larger n they are necessary for the algorithm to work, due to memory requirements. Performance of algorithm 3 supports this conclusion. 
Algorithm 3 should be about as fast as standard implementations using the power automaton. Unlike them, it can handle larger automata (n > 30) as long as the reachable power automaton is small. Computations were performed on a laptop manufactured in 2004 (Athlon XP-M 2800+ @ 2.13 GHz, 768 MB of DDR RAM @ 133 MHz).

Table 1
Runtime (in seconds) for different implementations and different sizes of input Černý automata.

  No.  Algorithm version            C12       C14      C18
  0    TESTAS: SSW                  < 1.0 s   18 s     Time-out
  1    Set of int                   0.34 s    3.3 s    60 s
  2    hash_set of vector of int    0.07 s    0.34 s   9.7 s
  3    hash_set of fast_bitset      0.01 s    0.07 s   2.0 s
  4    Int array                    0.0 s     0.01 s   0.2 s

Fig. 2. Graphical representation of performance for C_n.

6. New greedy algorithm

In this section we introduce a new greedy algorithm. It is based on SYNCHROP, which itself is based on EPPSTEIN. Both algorithms focus on choosing a pair of states that should be synchronized in the next step and applying the corresponding word to the whole automaton. We can reformulate this task equivalently in terms of the pair automaton (both EPPSTEIN and SYNCHROP use this structure) as choosing the state that should be transformed into the singleton state 0. From now on, all procedures described below refer to the pair automaton A' = (Q', A, δ') of the original input automaton A = (Q, A, δ).

The choice of state depends on the set P of states we are currently in, that is, the set P = δ'(Q', w), where w is the concatenation of the words found in all previous steps. This set will be called the active states set. The choice is based on an evaluation of the arrangement of the active states in the pair automaton. In SYNCHROP the evaluation is based on the following heuristic:

Heuristic 1. Let us define the distance d(p) of a state p = {s1, s2} to the singleton state 0 as

    d(p) = min { |w| : w ∈ Σ*, δ(s1, w) = δ(s2, w) }.   (3)

Let w be a synchronizing word for some state q. The bigger the difference between d(p) and d(δ(p, w)), the more profitable the selection of w in the next algorithm step, because the distance of δ(p, w) to the singleton state is smaller.

The idea behind this heuristic is used in SYNCHROP in the following way: we compute the differences between states p and δ(p, w), where w is the shortest synchronizing word for the pair q, as

    D_q(p, w) = d(δ(p, w)) − d(p)   if p ≠ q,
    D_q(p, w) = 0                   if p = q.   (4)

We compute D_q(p, w) for all active states except the singleton state. Let X be the set of all active states in the pair automaton. We define

    U1(w) = Σ_{p∈X} D_q(p, w).   (5)

Having U1(w) for all the shortest words that synchronize pairs of states, we choose the one with the smallest U1 and apply it to the automaton. The modified version of SYNCHROP, SYNCHROPL, uses the function U2, a modification of U1 which also takes the length of the word into account:

    U2(w) = Σ_{p∈X} D_q(p, w) + |w| = U1(w) + |w|.   (6)

Thanks to this penalty component shorter words are preferred, which is helpful when two or more candidate words have the same U1 value.

Let us consider the complexity of SYNCHROP. The preprocessing part is the same as in EPPSTEIN and can be done in O(n^3 + |A|n^2). The new part is the choice of s1 and s2, which are to be synchronized; this is done O(n) times. To compute U1(w) we have to process all active states in order to compute the D((s1, s2), w) values, of which there are O(n^2).
The U1 value has to be computed for all pairs of states of the input automaton (O(n^2)). Hence, the total complexity of SYNCHROP is O(n^3 + |A|n^2 + n·(n^2 · n^2)) = O(n^5 + |A|n^2).

6.1. FASTSYNCHRO – a better SYNCHROP

Compared to NATARAJAN and EPPSTEIN, SYNCHROP and SYNCHROPL give good results but have high complexity. In this section we present a modification of SYNCHROP with improved complexity that still preserves the quality. This algorithm will be called FASTSYNCHRO.

The first modification with respect to SYNCHROP is the way the synchronizing word is created. Instead of computing U for the words synchronizing pairs of states of A, we compute it for all letters from A. This modified function will be denoted by U3. We then choose the letter that minimizes U3; this letter will be denoted by s. Let X be the set of active states in the pair automaton, A the alphabet and d(p) the distance of p to the singleton state. We define

    U3(l) = Σ_{p∈X} (d(δ'(p, l)) − d(p)),   l ∈ A,
    s = argmin_{l∈A} U3(l).

Letter s is applied to all active states and added at the end of the currently found word, increasing its length by 1. The drawback of this solution is that it does not guarantee that we finally find a synchronizing word: we do not know whether the number of active states will decrease after some number of steps. Therefore, we use U3 only when it improves the arrangement of all active states in the pair automaton (that is, when U3 < 0). If U3 ≥ 0, we use the U2 function to find the word that guarantees synchronization and hence necessarily decreases the number of active states.

However, we introduce one restriction. In SYNCHROP the greatest impact on the complexity comes from the computation of U for all pairs of states and all the shortest words that synchronize these pairs, which can be done in O(n^4). We reduce it to O(n^3) by reducing the number of processed words from quadratic to linear order of magnitude: we choose only the n shortest words synchronizing the pairs (if there are fewer than n such words, we choose all of them). Such a choice, inspired by EPPSTEIN, is simple to implement and gives better results on average than the other choices of words that we checked. The above modifications are given in Listing 8.

Algorithm 8 FASTSYNCHRO(A)
1: Input: automaton A = (Q, A, δ)
2: Output: synchronizing word w (if one exists)
3: w ← ε
4: A' ← pair automaton of A
5: if A is not synchronizing return null
6: perform Eppstein preprocessing // see Section 4.4
7: X ← Q // X is the set of active states
8: count ← 0 // a counter
9: while (|X| > 1)
10:   a ← argmin_{l∈A} {U3(l)}
11:   if (U3(a) < 0 AND count++ < |Q|^2)
12:     w ← w·a; X ← δ(X, a)
13:   else
14:     compute U2 for min{|Q|, (|X|^2 − |X|)/2} shortest words synchronizing the active states (denote by Y this set of words)
15:     v ← argmin_{y∈Y} {|y|}
16:     w ← w·v; X ← δ(X, v)
17: return w

Let n = |Q|. The complexity of line 4 is O(|A|n^2). In line 5 we check whether the automaton is synchronizing; this requires a BFS on the pair automaton with reversed transitions, which takes O(|A|n^2). Eppstein preprocessing in line 6 takes O(|A|n^2 + n^3). Now consider the instructions in the while loop. The cost of line 10 is the cost of computing U3 for all letters, O(|A|n^2). Applying a letter or word to all active states (lines 12 and 16) costs O(n), thanks to the Eppstein preprocessing. The cost of line 14 is O(n^3), thanks to the restriction on the number of processed words. It remains to compute the number of while-loop iterations.
Each application of v to the set of active states reduces their number by at least 1, so lines 14–16 will be executed at most n − 1 times. Alternatively, line 12 may be executed, but it does not necessarily reduce the number of active states. Therefore, to keep a tight rein on the total complexity, we restrict the number of line 12 executions. In tests we noticed that setting the limit to n^2 had no influence on the algorithm's results. Summarizing: the total cost of lines 4–6 is O(|A|n^2 + n^3), and the cost of the instructions inside the while loop is O(n · n^3) + O(n^2 · |A|n^2). This gives us the following theorem.

Theorem 1. FASTSYNCHRO works in O(|A|n^4) time complexity.

6.2. Experiments and comparison

In this section we present the results of the experiments on heuristic algorithms. We focus on the efficiency (the running time) and the quality (the length of the synchronizing word found). We also make some remarks on the FASTSYNCHRO algorithm. We tested five heuristic algorithms: EPPSTEIN, CYCLE, SYNCHROP, SYNCHROPL and FASTSYNCHRO.

6.2.1. Efficiency

Efficiency has a great impact on the usability of the algorithms. In this subsection we present the efficiency comparison for EPPSTEIN, NATARAJAN, SYNCHROP and FASTSYNCHRO. The algorithms were tested for n = {10, 20, ..., 300}. For each n, one hundred random automata were generated such that for all a ∈ A and all p, q ∈ Q, Pr(δ(p, a) = q) = 1/n. If the generated automaton was not synchronizing, the procedure was repeated. Tests were performed for |A| ∈ {2, 10} (Figs. 3 and 4). We also performed tests for Černý automata (Fig. 5).

It is clearly seen from Fig. 3 that SYNCHROP, due to its high complexity, is worse than the other algorithms. The runtimes of the other algorithms are comparable. The results for |A| = 10 do not differ much from those for |A| = 2: the running times are of course higher, but the differences between NATARAJAN, EPPSTEIN and FASTSYNCHRO remain small. Tests on Černý automata were performed to check how the algorithms work for automata with long synchronizing words. We can see a noticeable decrease of efficiency in the case of FASTSYNCHRO; EPPSTEIN and NATARAJAN work much faster in this case.

6.2.2. Quality

In order to compare the algorithms in the widest possible context, the tests were performed for three different quality measures. If the number of all automata for a given number of states and alphabet size was reasonable, the algorithms were tested on all such automata; if the number of such automata was too big, we reduced the tests to a subset of all possible random automata. All algorithms were run on the same set of automata.

The first quality measure is the mean difference between the length of the word found by the algorithm and the SSW length. Denote by ALG(A) the word returned by algorithm ALG for automaton A, and let X be the set of all automata given as input to ALG. Formally, we can define the first measure as

    M1(X) = ( Σ_{A∈X} (|ALG(A)| − |SSW(A)|) ) / |X|.   (7)

Table 2 shows that CYCLE and EPPSTEIN, despite their speed, do not give good results in terms of M1. SYNCHROP and SYNCHROPL, based on Heuristic 1, are much better. FASTSYNCHRO is comparable to them and sometimes outperforms them (n = 5, 6, 10). The second quality measure, M2, is the ratio of the cases in which ALG found the SSW to all cases:

    M2(X) = ( Σ_{A∈X} [|ALG(A)| = |SSW(A)|] ) / |X|,   (8)

where [expr] = 1 if expr is true and 0 otherwise.
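As an illustration of the two measures defined in Eqs. (7) and (8), the following sketch computes M1 and M2 from precomputed word lengths. It is not the authors' evaluation code; the function names and the use of plain Python lists are assumptions made purely for the example.

    def m1(alg_lengths, ssw_lengths):
        """Mean excess of the found word length over the SSW length, Eq. (7)."""
        assert len(alg_lengths) == len(ssw_lengths)
        return sum(a - s for a, s in zip(alg_lengths, ssw_lengths)) / len(alg_lengths)

    def m2(alg_lengths, ssw_lengths):
        """Fraction of automata for which a word of SSW length was found, Eq. (8)."""
        assert len(alg_lengths) == len(ssw_lengths)
        hits = sum(1 for a, s in zip(alg_lengths, ssw_lengths) if a == s)
        return hits / len(alg_lengths)

    # Example: three automata, found lengths vs. exact SSW lengths.
    print(m1([5, 7, 9], [5, 6, 9]))   # 0.333...
    print(m2([5, 7, 9], [5, 6, 9]))   # 0.666...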
Notice how the quality decreases when the alphabet size increases from 2 to 10. The ordering of the algorithms is the same as for the M1 measure. To test the algorithms on automata with a larger number of states, we need a measure which does not involve computing the SSW length. Therefore, as M3 we took the mean length of the synchronizing words found by a given ALG. The use of this measure is meaningful only in a relative comparison of two or more algorithms:

    M3(X) = ( Σ_{A∈X} |ALG(A)| ) / |X|.   (9)

In terms of M3, FASTSYNCHRO gives slightly worse results than SYNCHROP and SYNCHROPL; however, these results are still much better than those of EPPSTEIN and CYCLE.

Fig. 3. Efficiency for automata with |A| = 2.
Fig. 4. Efficiency for automata with |A| = 10.
Fig. 5. Efficiency for Černý automata.

Table 2
Quality of algorithms in terms of M1, M2 and M3. C, EP, SP, SPL, FS correspond to the CYCLE, EPPSTEIN, SYNCHROP, SYNCHROPL and FASTSYNCHRO algorithms.

  n, |A|   C      EP     SP     SPL    FS     |X|
  Measure M1
  3, 2     0.20   0.16   0      0      0      All automata
  4, 2     0.36   0.33   0.06   0.03   0.04   All automata
  4, 3     0.45   0.42   0.08   0.05   0.05   All automata
  5, 2     0.64   0.55   0.21   0.17   0.13   All automata
  6, 2     0.91   0.77   0.38   0.34   0.24   All automata
  10, 2    1.97   1.63   1.00   0.96   0.78   10^5 random automata
  10, 10   1.78   1.77   0.72   0.69   0.54   10^5 random automata
  20, 2    4.18   3.49   2.18   2.07   2.01   10^5 random automata
  20, 10   3.45   3.31   1.61   1.54   1.54   10^5 random automata
  Measure M2
  3, 2     0.80   0.84   1      1      1      All automata
  4, 2     0.71   0.72   0.94   0.97   0.97   All automata
  4, 3     0.64   0.65   0.93   0.96   0.95   All automata
  5, 2     0.57   0.60   0.85   0.87   0.89   All automata
  6, 2     0.47   0.51   0.76   0.77   0.82   All automata
  10, 2    0.25   0.28   0.49   0.50   0.57   10^5 random automata
  10, 10   0.12   0.12   0.41   0.43   0.54   10^5 random automata
  20, 2    0.08   0.10   0.23   0.24   0.28   10^5 random automata
  20, 10   0.02   0.02   0.13   0.14   0.16   10^5 random automata
  Measure M3
  50, 2    26.21  24.44  21.75  21.53  21.93  10^4 random automata
  50, 10   16.32  15.49  12.84  12.71  13.00  10^4 random automata
  100, 2   40.75  37.53  33.16  32.84  33.95  10^3 random automata
  100, 10  25.30  23.41  19.84  19.61  20.78  10^3 random automata

6.3. Analysis of FASTSYNCHRO behavior

In this section we investigate the behavior of FASTSYNCHRO. The analysis will allow us to explain the decrease of FASTSYNCHRO efficiency shown in Fig. 5. We check the impact of the different parts of the algorithm on the process of building the synchronizing word. Recall that in FASTSYNCHRO a synchronizing word can be obtained in two ways: the first is to choose a letter a ∈ A and apply it to the set of active states; the other is to use U2 to find a pair of states and transform the set of active states by the word synchronizing this pair. We will refer to these two ways as the first and the second part of the algorithm.

We performed an experiment for random automata. By g1 (resp. g2) we denote the number of executions of the first (resp. second) part of the algorithm. By k we denote the mean length of the synchronizing word found by FASTSYNCHRO, and by k* the estimated SSW length for random automata over a binary alphabet (Skvortsov & Tipikin, 2011). Notice that each execution of the first part of the algorithm corresponds to exactly one letter being added to the synchronizing word under construction; the value k − g1 therefore expresses the number of letters added as a result of executions of the second part.
From Table 3 we can see that as the number of states increases, the fraction of second-part executions also grows. The ratio (k − g1)/g2 is the mean length of the word added during a second-part execution. This value remains relatively small and grows only slightly with the number of states. When |A| is increased to 10, we can see that the influence of the second part decreases: although its frequency is almost the same as for |A| = 2, the mean length of the word added by each execution is 2–3 times shorter.

The same experiment was performed for Černý automata (Table 4). These experiments explain why the runtime depends so strongly on the length of the synchronizing word found by the algorithm: the number of executions of the algorithm's first part increases significantly, and the words found by executions of the second part are no longer short. Fortunately, automata with long SSWs are very rare, so the case of Černý automata is exceptional.

Table 3
Behavior of FASTSYNCHRO for random automata.

  n, |A|   g1     g2    k      k*     k − g1  (k − g1)/g2  |X|
  10, 2    6.52   0.44  7.52   6.92   0.99    2.28         10 000
  20, 2    10.11  0.77  12.26  10.13  2.15    2.80         10 000
  50, 2    16.12  1.63  22.07  16.77  5.95    3.65         10 000
  100, 2   21.69  2.83  34.07  24.55  12.38   4.38         1 000
  200, 2   27.38  4.56  51.81  35.94  24.42   5.35         1 000
  10, 10   4.01   0.02  4.04   n/a    0.02    1.00         10 000
  20, 10   6.68   0.21  6.91   n/a    0.22    1.07         10 000
  50, 10   12.00  0.76  13.02  n/a    1.01    1.34         10 000
  100, 10  17.60  1.92  20.88  n/a    3.28    1.71         1 000
  200, 10  24.32  4.16  32.72  n/a    8.40    2.02         1 000

Table 4
Behavior of FASTSYNCHRO for Černý automata.

  n    g1    g2  k      k − g1  (k − g1)/g2
  10   24    3   81     57      19
  20   67    4   361    294     73
  50   211   5   2401   2190    438
  100  520   6   9801   9281    1546
  200  1238  7   39601  38363   5480

7. Conclusions

We presented some efficient data structures for the exact (exponential) synchronizing algorithm. Their application to the well-known algorithm that uses a power-set automaton makes the algorithm more effective than existing implementations. We also presented a new greedy synchronizing algorithm and compared it with some previously known greedy algorithms. Experiments show that our FASTSYNCHRO algorithm in general works better (that is, finds shorter synchronizing words) and usually runs in comparable time or faster than the other methods. For larger automata FASTSYNCHRO takes twice as long as EPPSTEIN, but it finds much shorter synchronizing words.

When one wants to find a synchronizing word, two factors have to be considered: the quality (the length of the synchronizing word) and the time. If time is the key issue, the optimal choice would be the EPPSTEIN algorithm. But if quality is much more important (and this is usually the case in industrial testing of electronic circuits, when one has to apply the same synchronizing word to thousands or millions of copies of a circuit), the best choice is to use our new FASTSYNCHRO algorithm.

References

Ananichev, D. S., & Volkov, M. V. (2003). Synchronizing monotonic automata. Lecture Notes in Computer Science, 2710, 111–121.
Björn, K. (2005). Beyond the C++ standard library: An introduction to Boost. Addison-Wesley.
Broy, M., Jonsson, B., Katoen, J.-P., Leucker, M., & Pretschner, A. (2005). Model-based testing of reactive systems: Advanced lectures. Lecture Notes in Computer Science, 3072.
Černý, J. (1964). Poznámka k homogénnym experimentom s konečnými automatmi. Matematicko-fyzikálny Časopis Slovenskej Akadémie Vied, 14, 208–215.
Černý, J., Pirická, A., & Rosenauerová, B. (1971). On directable automata. Kybernetika, 7(4), 289–298.
Deshmukh, R. G., & Hawat, G. N. (1994). An algorithm to determine shortest length distinguishing, homing, and synchronizing sequences for sequential machines. In Proc. Southcon 94 conference (pp. 496–501).
Eppstein, D. (1990). Reset sequences for monotonic automata. SIAM Journal on Computing, 19(3), 500–510.
Fukada, A., Nakata, A., Kitamichi, J., Higashino, T., & Cavalli, A. R. (2001). A conformance testing method for communication protocols modeled as concurrent DFSMs. In ICOIN (pp. 155–162).
Higgins, P. M. (1988). The range order of a product of i transformations from a finite full transformation semigroup. Semigroup Forum, 37, 31–36.
Hyunwoo, C., Somenzi, F., & Pixley, C. (1993). Multiple observation time single reference test generation using synchronizing sequences. In Proc. IEEE European conf. on design automation (pp. 494–498).
Kari, J. (2002). Synchronization and stability of finite automata. Journal of Universal Computer Science, 8(2), 270–277.
Klyachko, A. A., Rystsov, I. K., & Spivak, M. A. (1987). An extremal combinatorial problem associated with the bound of the length of a synchronizing word in an automaton. Cybernetics and Systems Analysis, 23(2) (translated from Kibernetika, 1987, No. 2, pp. 16–20, 25).
Lee, D., & Yannakakis, M. (1996). Principles and methods of testing finite state machines – a survey. In Proceedings of the IEEE (Vol. 84, pp. 1090–1123).
Natarajan, B. K. (1986). An algorithmic approach to the automated design of part orienters. In Proc. IEEE symposium on foundations of computer science (pp. 132–142).
Olschewski, J., & Ummels, M. (2010). The complexity of finding reset words in finite automata. Lecture Notes in Computer Science, 6281, 568–579.
Pin, J.-E. (1983). On two combinatorial problems arising from automata theory. Annals of Discrete Mathematics, 17, 535–548.
Pixley, C., Jeong, S.-W., & Hachtel, G. D. (1994). Exact calculation of synchronizing sequences based on binary decision diagrams. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 13(8), 1024–1034.
Pomeranz, I., & Reddy, S. M. (1998). On synchronizing sequences and test sequence partitioning. In Proc. 16th IEEE VLSI test symposium (pp. 158–167).
Ponce, A. M., Csopaki, G., & Tarnay, K. (1994). Formal specification of conformance testing documents for communication protocols. In 5th IEEE international symposium on personal, indoor and mobile radio communications (Vol. 4, pp. 1167–1172).
Roman, A. (2009). Synchronizing finite automata with short reset words. Applied Mathematics and Computation, 209(1), 125–136.
Skvortsov, E., & Tipikin, E. (2011). Experimental study of the shortest reset word of random automata. Lecture Notes in Computer Science, 6807, 290–298.
Trahtman, A. N. (2006). An efficient algorithm finds noticeable trends and examples concerning the Černý conjecture. Lecture Notes in Computer Science, 4162, 789–800.
Trahtman, A. N. (2009). The road coloring problem. Israel Journal of Mathematics, 172, 51–60.
Volkov, M. V. (2008). Synchronizing automata and the Černý conjecture. Lecture Notes in Computer Science, 5196, 11–27.
Zhao, Y., Liu, Y., Guo, X., & Zhang, C. (2010). Conformance testing for IS-IS protocol based on E-LOTOS. In IEEE int. conf. on information theory and information security (pp. 54–57).
work_3gmfusebffdg3m6qdqgyme5u7e ---- Mining the data from a hyperheuristic approach using associative classification

Thabtah, F. A., & Cowling, P. I. (2008). Mining the data from a hyperheuristic approach using associative classification. Expert Systems with Applications, 34, 1093–1101. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2006.12.018 (submitted version deposited at http://eprints.whiterose.ac.uk/75051/).

Mining the data from a hyperheuristic approach using associative classification

Fadi Thabtah a,*, Peter Cowling b
a Department of Computing and Engineering, University of Huddersfield, Huddersfield, UK
b MOSAIC Research Centre, University of Bradford, Bradford, UK

Abstract

Associative classification is a promising classification approach that utilises association rule mining to construct accurate classification models. In this paper, we investigate the potential of associative classifiers, as well as other traditional classifiers such as decision trees and rule inducers, on solutions (data sets) produced by a general-purpose optimisation heuristic called a hyperheuristic for a personnel scheduling problem. The hyperheuristic requires us to decide which of several simpler search neighbourhoods to apply at each step while constructing a solution. After experimenting with 16 different solutions generated by a hyperheuristic called Peckish using different classification approaches, the results indicated that associative classification is the most applicable approach to this kind of problem with reference to accuracy. In particular, the associative classification algorithms CBA, MCAR and MMAC were able to predict the selection of low-level heuristics from the data sets more accurately than the C4.5, RIPPER and PART algorithms, respectively. © 2007 Elsevier Ltd. All rights reserved.

Keywords: Associative classification; Classification; Data mining; Hyperheuristic; Scheduling

1. Introduction

Heuristic and metaheuristic approaches have been applied widely to personnel-scheduling problems (Blum & Roli, 2003; Cowling, Kendall, & Han, 2002). A metaheuristic can be defined as a recursive process which directs a simpler local search method by using different concepts for exploring and exploiting the search space in order to achieve good enough solutions (Blum & Roli, 2003). There are several different metaheuristic strategies for solving scheduling and optimisation problems, such as local search, tabu search, simulated annealing and variable neighbourhood search.
Hamiez and Hao (2001) used a tabu search-based method to solve the sports league scheduling problem (SLSP). Their implementation of the enhanced tabu search algorithm was able to schedule a timetable for up to 40 teams, and its performance in terms of CPU time was excellent compared with previous algorithms, such as that of McAloon, TretKoff, and Wetzel (1997), which had been used for solving the same problem. Aicklen and Dowsland (2000) used genetic algorithms to deal with a nurse rostering problem in major UK hospitals, and Hansen and Mladenovic (1997) showed that variable neighbourhood search is an effective approach for solving optimisation problems, generating good or sometimes near-optimal solutions in a moderate time.

Cowling, Kendall, and Soubeiga (2000) and Cowling et al. (2002) argued that metaheuristic and heuristic approaches tend to be knowledge rich and require extensive experience in the problem domain and the selected heuristic techniques, and are therefore expensive to implement. A new general framework to deal with large and complex optimisation and scheduling problems, called a hyperheuristic, has been proposed by Cowling et al. (2000). It tends to robustly find good solutions for large and complex scheduling problems and has been proven to be effective in many experiments (Cowling & Chakhlevitch, 2003; Cowling et al., 2002).

A hyperheuristic approach can be described as a supervisor which controls the choice of which local search neighbourhood to apply while constructing a solution/schedule. A local search neighbourhood, also known as a low-level heuristic, is a rule or a simple method that generally yields a small change in the schedule. Often these low-level heuristics are based on normal methods of constructing a schedule, such as adding an event, deleting an event or swapping two events. Fig. 1 represents the general hyperheuristic framework, which at each iteration selects and applies the low-level heuristic that has the largest improvement on the objective function, i.e. LLH5 in the figure shown below. The arrows going from and to the hyperheuristic in Fig. 1 represent the improvement values on the objective function of the selected low-level heuristics, obtained after the hyperheuristic has tried them.

The training scheduling problem that we consider in this paper is a complex optimisation problem for a large financial service company (Cowling et al., 2002). It involves a number of events, trainers and locations to be scheduled over a period of time. The task is to create a timetable of courses distributed over several locations in a specific period of time using a known number of trainers. A more detailed description of the problem is presented in the next section. In this paper, our aim is to determine an applicable data mining technique for the problem of deciding which low-level heuristic to apply in a given situation, using information about heuristic performance derived earlier.
In particular, we would like to answer questions like: which learning algorithm can derive knowledge that could direct the search in order to produce good solutions? To achieve our goal, we compare three associative classification techniques, CBA (Liu, Hsu, & Ma, 1998), MCAR (Thabtah, Cowling, & Peng, 2005) and MMAC (Thabtah, Cowling, & Peng, 2004), and two popular traditional classification techniques, PART (Frank & Witten, 1998) and RIPPER (Cohen, 1995), on data sets generated using a hybrid hyperheuristic called Peckish (Cowling & Chakhlevitch, 2003) for the trainer scheduling problem. We analyse data from several solutions of the Peckish hyperheuristic, which combines greedy (best first) and random approaches. We identify that the learning task involves classifying low-level heuristics in terms of whether they improved the objective function in old solutions, in order to produce useful rules. These rules will then be used to decide the class attribute "low-level heuristic" while constructing new solutions. We use the classification algorithms mentioned above to learn the rules.

The training scheduling problem and the different hyperheuristic approaches utilised to solve it are discussed in Section 2. Section 3 is devoted to the applicability of data mining classification algorithms to predicting the behaviour of low-level heuristics used by the Peckish hyperheuristic. Data sets, their features and experimental results are presented in Section 4, and finally conclusions are given in Section 5.

2. The training scheduling problem and hyperheuristics

A much simpler version of the training scheduling problem has been solved in Cowling et al. (2002) using a Hyper-Genetic algorithm. A larger and more complex problem, described in Cowling and Chakhlevitch (2003), is summarised in this section. It involves a number of events, trainers and locations to be scheduled over a period of time. The task is to create a timetable of geographically distributed courses over a period of time using different trainers, and the aim is to maximise the total priority of courses and to minimise the amount of travel for each trainer. The problem is associated with a large number of constraints, such as:

• Each event is to be scheduled at one location from the available number of locations.
• Each event must start within a specified time period.
• Each event can occur at most once.
• Each event must be delivered by competent trainers from the available trainers.
• Each location has a limited number of rooms, and rooms have different capacities and capabilities.

The data used to build the solutions of the training scheduling problem is real data provided by a financial firm, where training is given by 50 trainers over a period of 3 months in 16 different locations. Further, there are about 200 events to be scheduled and 95 different low-level heuristics that can be used to build each solution. However, the solutions given to us by the authors of Cowling and Chakhlevitch (2003) have been constructed using only 10 low-level heuristics, where each low-level heuristic represents a local search neighbourhood. For example, selecting a location with the lowest possible travel penalty for a particular trainer to deliver a course as early as possible corresponds to a low-level heuristic. In solving the trainer scheduling problem, three hyperheuristic approaches, random, greedy and hybrid, have been used.
All of these approaches aim to manage the choice of which low-level heuristics to apply during the process of building the solution.

Fig. 1. Hyperheuristic general framework.

The random approach selects a low-level heuristic at random from the available ones in the problem; at each choice point in the search space, all low-level heuristics commonly have an equal chance of being selected. On the other hand, the greedy approach selects the low-level heuristic that yields the biggest improvement in the objective function; if none of the available ones improves the objective function, the algorithm will be trapped in a local optimum. The hybrid approach is named Peckish and consists of a combination of the greedy and random approaches. It builds a solution by selecting a single low-level heuristic to apply during each iteration of the search. The choice is based on the low-level heuristic that has the largest improvement on the objective function in the problem (if one exists); in the case that none of the available low-level heuristics improves the objective function value, the choice is random.

In this paper, we choose a low-level heuristic from a candidate list of good low-level heuristics. By changing the length of this candidate list and considering how it is merged, we can trade off the degree of greediness and randomness in the Peckish hyperheuristic; a short illustrative sketch of this selection rule is given below. As a result, several different solutions produced by the Peckish hyperheuristic are investigated. We analysed the strategy used by the Peckish hyperheuristic to construct a solution and observed that all available low-level heuristics in the problem must be tested at each iteration in order to record their effect on the objective function, while only a single one is applied. Data mining could provide a much quicker prediction of effective low-level heuristics at each iteration. In the next section, we investigate some popular data mining techniques for learning the sets of low-level heuristics that improve the objective function and have been applied by the Peckish hyperheuristic.

3. Data mining for the selection of low-level heuristics

Since we are aiming to use knowledge derived from old solutions of the problem, data mining seems an appropriate technique to extract that knowledge. The next task is to identify which data mining method is applicable for extracting knowledge from solutions generated by the Peckish hyperheuristic. As mentioned earlier, the Peckish hyperheuristic usually selects and applies the low-level heuristic that leads to the largest improvement on the objective function, and this is the class we want to find. In other words, we can learn rules that predict the performance of low-level heuristics in some solution runs and use these rules to forecast which low-level heuristics the hyperheuristic should choose in other runs. Since we are predicting a particular attribute (low-level heuristic), supervised learning approaches such as classification are appropriate. There are many classification approaches for extracting knowledge from data that have been studied in the literature, e.g. Cendrowska (1987), Quinlan (1993) and Cohen (1995).
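Before turning to the individual classifiers, the Peckish selection rule described in Section 2 can be summarised in a short sketch: greedy over a candidate list of improving low-level heuristics, with a random fallback. This is not the authors' implementation; the function names, the evaluate callback and the way the candidate list is used are assumptions made purely for illustration.

    import random

    def peckish_step(low_level_heuristics, evaluate, candidate_list_length):
        """Pick one low-level heuristic for the current iteration.

        evaluate(llh) is assumed to return the change in the objective
        function that applying llh would produce (positive = improvement).
        """
        improvements = {llh: evaluate(llh) for llh in low_level_heuristics}
        # Candidate list: the best few improving heuristics (greedy part).
        candidates = sorted((llh for llh, imp in improvements.items() if imp > 0),
                            key=lambda llh: improvements[llh], reverse=True)
        candidates = candidates[:candidate_list_length]
        if candidates:
            # Choosing among the best few trades off greediness and randomness;
            # a candidate list of length 1 is purely greedy.
            return random.choice(candidates)
        return random.choice(low_level_heuristics)  # no improvement: random choice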
Three common approaches, divide-and-conquer (Quinlan, 1987), rule induction (Cohen, 1995; Furnkranz & Widmer, 1994) and associative classification (Li, Han, & Pei, 2001; Liu et al., 1998), have been selected for our base comparison. Further, five classification techniques related to these approaches have been compared: PART (Frank & Witten, 1998), RIPPER (Cohen, 1995), CBA (Liu et al., 1998), MCAR (Thabtah et al., 2005) and MMAC (Thabtah et al., 2004). Our choice of these methods is based on the different schemes they use for learning rules from data sets. In the next subsection, we briefly survey these algorithms.

3.1. Associative classification

Associative classification techniques employ association rule discovery methods to find the rules. This approach was introduced in 1997 by Ali, Manganaris, and Srikant (1997) to produce rules for describing relationships between attribute values and the class attribute, not for prediction, which is the ultimate goal of classification. In 1998, associative classification was successfully employed to build classification models (classifiers) by Liu et al. (1998), and it later attracted many researchers, e.g. Yin and Han (2003), from the data mining and machine learning communities. In this subsection we survey the associative classification techniques used in this paper to generate rules from the hyperheuristic data.

3.1.1. Classification based on association (CBA)

The idea of using association rule mining in classification problems was first introduced in Liu et al. (1998), in which an algorithm called CBA is proposed; it operates in three main steps. Firstly, if the intended data set contains any real or integer attributes, it is discretised using the multi-interval discretisation method of Fayyad and Irani (1993). Secondly, the Apriori candidate generation step (Agrawal & Srikant, 1994) is adopted to find the potential rules. The Apriori candidate generation method necessitates multiple passes, where the potential rules found in the previous pass are used to generate the potential rules in the current pass; this repeated scanning requires considerable CPU time and main memory. Once all potential rules are produced, the subset that leads to the lowest error rate against the training data set is selected to form the classifier. The selection of this subset is accomplished using the database coverage heuristic, which ensures that every rule in the classifier correctly covers at least one training data object.

3.1.2. MCAR: multi-class classification based on association rule

A recently developed associative classification algorithm called MCAR (Thabtah et al., 2005) employs tid-list intersections to find the rules quickly. The algorithm consists of two main phases: rule generation and classifier building. In the first phase, the training data set is scanned once to discover the potential rules of size one; MCAR then intersects the tid-lists of the potential rules of size one to find potential rules of size two, and so forth (a small illustration of this tid-list intersection is given below). This rule discovery method does not require passing over the training data multiple times. In the second phase, the rules created are used to build a classifier by considering their effectiveness on the training data set: potential rules that cover a certain number of training objects are kept in the final classifier.
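The following sketch illustrates the tid-list idea referred to above: each item (attribute value) keeps the set of training-row identifiers in which it occurs, and the support of a two-item rule body is obtained by intersecting two such sets. It is a simplified illustration, not MCAR itself; the data layout, attribute names and function names are assumptions.

    from collections import defaultdict

    def build_tid_lists(rows):
        """Map each (attribute, value) item to the set of row ids containing it."""
        tids = defaultdict(set)
        for tid, row in enumerate(rows):
            for attribute, value in row.items():
                tids[(attribute, value)].add(tid)
        return tids

    # Toy training data: previously applied low-level heuristics as attributes.
    rows = [{"LLH_1": 58, "LLH_2": 68},
            {"LLH_1": 58, "LLH_2": 27},
            {"LLH_1": 20, "LLH_2": 68}]
    tids = build_tid_lists(rows)

    # Support of a single item ...
    print(len(tids[("LLH_1", 58)]))                        # 2
    # ... and of the two-item combination, via one set intersection.
    print(len(tids[("LLH_1", 58)] & tids[("LLH_2", 68)]))  # 1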
Finally, MCAR extends previous rule ranking approaches in associative classification, which are based on (confidence, support, rule length), by also looking at the class distribution frequencies in the training data and preferring rules associated with dominant classes. Experimental results showed that the MCAR rule ranking method reduces random rule selection during the ranking process, especially for dense classification data.

3.1.3. Multi-class, multi-label associative classification (MMAC)

The MMAC algorithm consists of three steps: rule generation, recursive learning and classification. In the first step it passes over the training data to discover and generate a complete set of rules; training instances associated with the produced rules are discarded. In the second step, MMAC proceeds to discover, from the remaining unclassified instances, further rules that pass the user-predefined thresholds denoted minimum support and minimum confidence, until no further potential rules can be found. Finally, the rule sets derived in each iteration are merged to form a global multi-label classifier, which is then tested against test data. The distinguishing feature of MMAC is its ability to generate rules with multiple classes from data sets in which each data object is associated with just a single class. This provides decision makers with useful knowledge discarded by other current associative classification algorithms.

3.2. Traditional classification approaches

3.2.1. C4.5

The C4.5 algorithm was created by Quinlan (1993) as a decision tree method for extracting rules from a data set. C4.5 is an extension of the ID3 algorithm (Quinlan, 1979) which accounts for missing values, continuous attributes and pruning of decision trees. Like the ID3 algorithm, C4.5 uses information gain to select the root attribute. The algorithm selects a root attribute from those available in the training data set, basing the selection on the most informative attribute, and the process of selecting an attribute is repeated recursively at the so-called child nodes of the root, excluding the attributes chosen before, until the remaining training data objects cannot be split any more (Quinlan, 1979). At that point, a decision tree is output in which each node corresponds to an attribute and each arc to a possible value of that attribute; each path from the root node to any given leaf in the tree corresponds to a rule. One of the major extensions of the ID3 algorithm that C4.5 proposed is pruning. Two known pruning methods used by C4.5 to simplify the constructed decision trees are sub-tree replacement and pessimistic error estimation (Witten & Frank, 2000).

3.2.2. Repeated incremental pruning to produce error reduction (RIPPER)

RIPPER is a rule induction algorithm developed by Cohen (1995). It builds the rule set as follows: the training data set is divided into two sets, a pruning set and a growing set, and RIPPER constructs the classifier using these two sets by repeatedly inserting rules, starting from an empty rule set. The rule-growing algorithm starts with an empty rule and heuristically adds one condition at a time until the rule makes no errors on the growing set. In fact, RIPPER is a refined version of an earlier algorithm called Incremental Reduced Error Pruning (IREP) (Furnkranz & Widmer, 1994) that adds a post-pruning heuristic to the rules.
This heuristic is applied to the classifier produced by IREP as an optimisation phase, aiming to simplify the rule set. For each rule ri in the rule set, two alternative rules are built: the replacement of ri and the revision of ri. The replacement of ri is created by growing an empty rule r'i and then pruning it in order to reduce the error rate of the rule set including r'i on the pruning data set. The revision of ri is constructed similarly, except that the revision rule is built heuristically by adding one condition at a time to the original ri rather than to an empty rule. The three rules are then examined on the pruning data to select the rule with the lowest error rate. The integration of IREP and this optimisation procedure forms the RIPPER algorithm.

3.2.3. PART

Unlike the C4.5 and RIPPER techniques, which operate in two phases, the PART algorithm generates rules one at a time while avoiding extensive pruning (Frank & Witten, 1998). The C4.5 algorithm employs a divide-and-conquer approach and the RIPPER algorithm uses a rule induction approach to derive the rules; PART combines both approaches to find and generate rules. It adopts the rule induction approach to generate a set of rules and uses divide-and-conquer to build partial decision trees. The way PART builds and prunes a partial decision tree is similar to that of C4.5, but PART avoids constructing a complete decision tree and builds partial decision trees instead. PART differs from RIPPER in the way rules are created: in PART, each rule corresponds to the leaf with the largest coverage in the partial decision tree, whereas RIPPER builds each rule in a greedy fashion, starting from an empty rule and adding conditions until the rule makes no errors, and the process is repeated. Missing values and pruning techniques are treated in the same way as in C4.5.

4. Data and experimental results

4.1. Data sets and their features

Data from 16 different solutions produced by the Peckish hyperheuristic for the training scheduling problem were provided by the authors of Cowling and Chakhlevitch (2003). Each solution represents 500 iterations of applied low-level heuristics and is given in a text file. Twelve of the data files each represent only a single solution, whereas each of the remaining four files represents ten combined solutions. Ten different low-level heuristics (LLH 1, LLH 2, LLH 20, LLH 27, LLH 37, LLH 43, LLH 47, LLH 58, LLH 68, LLH 74) have been used to produce each solution. Each file consists of 15 different attributes and 5000 instances. One iteration in a single solution is shown in Table 1, where the highlighted row indicates that low-level heuristic number 74 was applied by the Peckish hyperheuristic because it has the largest improvement on the objective function.

In Table 1, column LLH represents the low-level heuristic tested, and the ESM and RSM columns stand for the event and resource selection methods, respectively, which specify how an event is scheduled. SE indicates the number of the selected event to be scheduled. UE reflects whether another event conflicts with the currently scheduled event, i.e. they share the same trainer, location or timeslot. EID corresponds to the event identification number, and R stands for rescheduled, which means that if it is possible to swap an unscheduled event with the selected event in the schedule, the removed event can be rescheduled back into the schedule.
OP, NP, OPE and NPE correspond to old priority, new priority, old penalty and new penalty, respectively. The new priority and new penalty values represent the total priority and penalty of the schedule after applying a low-level heuristic, and the difference between the new priority and new penalty values gives the value of the objective function. IMP stands for the amount of improvement and represents the change in the objective function after a low-level heuristic has been applied. CPU is the time spent in the search for the low-level heuristic to be applied, and SC reflects whether or not the current schedule has been changed, i.e. whether the current low-level heuristic had any effect at all ('1' changed, '0' not changed). Finally, the AP column indicates whether or not the current low-level heuristic has been applied ('1' applied, '0' not applied) by the hyperheuristic.

After analysing the data in each file, we identified six attributes that have some correlation with the class attribute (LLH), namely (OP, NP, OPE, NPE, IMP, AP). However, we are interested in learning rules that represent useful sequences of applied low-level heuristics at different iterations which lead to improvement of the objective function. Therefore, the solutions generated by the Peckish hyperheuristic have been filtered to retain those iterations in which improvements of the objective function occurred. Furthermore, a PL/SQL program has been designed and implemented to generate a new structure for each solution, in order to enable the extraction of rules that represent the sequence of applied low-level heuristics. In other words, each training instance in the new structure should contain the low-level heuristics that improved the objective function in the current iteration along with those applied in the previous iterations. Specifically, in each solution run and for each iteration, we record the low-level heuristics applied in the previous three iterations along with the ones that improved the objective function in the current iteration.

Table 2 represents part of a solution run generated in the new structure after applying the PL/SQL program to the initial solution features, where columns LLH_3, LLH_2 and LLH_1 represent the low-level heuristics applied at the previous three iterations. Column LLH represents the current low-level heuristic that improved the objective function, and column Imp represents the improvement of the objective function value. Finally, column Apply indicates whether or not the selected low-level heuristic has been applied by the hyperheuristic. As shown in Table 2, the data generated by the hyperheuristic have multiple labels, since there can be more than one low-level heuristic that improves the objective function at any given iteration. Hence, each training instance in the scheduling data may be associated with more than one class.

Table 1
One iteration of the Peckish hyperheuristic for the training scheduling problem.

  LLH  ESM  RSM  SE   UE  EID  R   OP      NP      OPE  NPE  IMP  CPU   SC  AP
  1    0    1    1    0   -1   -1  72,999  72,999  830  830  0    0.03  0   0
  2    0    2    128  0   -1   -1  72,999  72,999  830  830  0    0.02  0   0
  20   4    0    12   0   -1   61  72,999  72,999  830  830  0    0.02  0   0
  27   0    2    36   -1  -1   -1  72,999  72,999  830  830  0    0     0   0
  37   1    2    70   -1  -1   58  72,999  72,999  830  830  0    0     0   0
  43   1    8    142  -1  -1   -1  72,999  72,999  830  830  0    0     0   0
  47   2    2    22   -1  -1   -1  72,999  72,999  830  830  0    0     0   0
  58   3    3    18   -1  -1   -1  72,999  72,999  830  830  0    0     0   0
  68   4    3    25   -1  -1   -1  72,999  72,999  830  830  0    0     0   0
  74   4    9    57   -1  -1   -1  72,999  73,099  830  830  100  0     1   1
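The restructuring step described above (originally implemented as a PL/SQL program) can be sketched as a simple windowing transformation: for every iteration in which at least one low-level heuristic improved the objective function, emit one record per improving heuristic together with the heuristics applied in the previous three iterations. The sketch below is an illustrative Python reimplementation under those assumptions, not the authors' code; the field names follow the new structure (LLH_3, LLH_2, LLH_1, LLH, Imp, Apply).

    def restructure(applied_history, improvements):
        """Build training records in the (LLH_3, LLH_2, LLH_1, LLH, Imp, Apply) format.

        applied_history[i] -- low-level heuristic actually applied at iteration i
        improvements[i]    -- list of (llh, improvement, applied_flag) tuples for the
                              heuristics that improved the objective at iteration i
        """
        records = []
        for i, improving in enumerate(improvements):
            if i < 3 or not improving:
                continue  # need three previous iterations and at least one improvement
            prev3 = (applied_history[i - 3], applied_history[i - 2], applied_history[i - 1])
            for llh, imp, applied in improving:
                records.append({"LLH_3": prev3[0], "LLH_2": prev3[1], "LLH_1": prev3[2],
                                "LLH": llh, "Imp": imp, "Apply": applied})
        return records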
4.2. Experimental results

In this section, we describe the experiments carried out to evaluate the classification accuracy and the features of the rules produced by the different rule learning algorithms on the optimisation data sets. We performed a number of experiments using ten-fold cross-validation on 16 different data files derived by the Peckish hyperheuristic for the trainer scheduling problem. The sizes of the data files after applying the PL/SQL program vary: 12 data files each contain 182–750 training instances, where each file represents only a single run of the Peckish hyperheuristic, and the remaining four data files contain 1500–2400 training instances and represent ten combined solutions. Two popular traditional classification algorithms (PART, C4.5) and three associative classification techniques (CBA, MCAR, MMAC) have been compared in terms of accuracy. The experiments with PART and C4.5 were conducted using the Weka software system (WEKA, 2000), the CBA experiments were performed using a VC++ version provided by the authors of CBA (1998), and the MCAR and MMAC algorithms were implemented in Java under Windows XP on a 1.7 GHz machine with 256 MB of RAM.

The relative prediction accuracies, corresponding to the difference between the classification accuracy of the CBA, PART and C4.5 algorithms and that of (MCAR, MMAC), are shown in Figs. 2–4, respectively. Fig. 2, for instance, shows the difference in accuracy between the CBA classifiers and those derived by (MCAR, MMAC) on the 16 optimisation data sets. The relative prediction accuracy figures are computed using the formulae

    (Accuracy_MCAR − Accuracy_CBA) / Accuracy_CBA   and   (Accuracy_MMAC − Accuracy_CBA) / Accuracy_CBA

for CBA, and the same kind of formulae for the PART and C4.5 algorithms, respectively. We used a minsupp of 2% and a minconf of 30% in the experiments with the CBA, MCAR and MMAC algorithms. The label-weight evaluation measure (Thabtah et al., 2004) has been used to calculate the accuracy of the MMAC algorithm in the figures.

Fig. 2. Difference of accuracy between CBA and (MCAR, MMAC).
Fig. 3. Difference of accuracy between PART and (MCAR, MMAC).

Table 2
Sample of optimisation data.

  Iteration  LLH_3  LLH_2  LLH_1  LLH  Imp   Apply
  1          58     68     58     20   100   1
  1          58     68     58     43   25    0
  2          20     27     43     74   50    1
  2          20     27     43     2    10    0
  2          20     27     43     58   8     0
  3          43     27     58     68   875   1
  4          37     20     2      37   1055  1
  4          37     20     2      2    950   0
  4          37     20     2      74   69    0
  4          37     20     2      58   9     0

The label-weight evaluation method assigns each class in a multi-label rule a value based on its number of occurrences with the rule's consequent (left-hand side) in the training data. To clarify, a training object may belong to several classes, each associated with it by a number of occurrences in the training data set. Each class can be assigned a weight according to how many times that class has been associated with the training object.
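A small sketch may help to make the label-weight idea concrete: each class attached to a rule receives a weight proportional to how often it occurs for that rule in the training data, and a prediction is credited with the weight of the matching class rather than all or nothing. This is an illustrative reading of the measure, not the exact procedure of Thabtah et al. (2004); the names and data layout are assumptions.

    from collections import Counter

    def label_weights(class_occurrences):
        """Weights of a rule's classes, from their training-data occurrence counts."""
        counts = Counter(class_occurrences)
        total = sum(counts.values())
        return {label: c / total for label, c in counts.items()}

    def label_weight_score(rule_classes_per_case, true_class_per_case):
        """Average credit over test cases: the weight of the true class in the
        multi-label prediction, or 0 if the true class is not predicted at all."""
        score = 0.0
        for occurrences, truth in zip(rule_classes_per_case, true_class_per_case):
            score += label_weights(occurrences).get(truth, 0.0)
        return score / len(true_class_per_case)

    # A rule seen with class LLH20 three times and LLH43 once:
    print(label_weights(["LLH20", "LLH20", "LLH20", "LLH43"]))  # {'LLH20': 0.75, 'LLH43': 0.25}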
Thus, unlike the error-rate method (Witten & Frank, 2000), which considers only one class for each rule when computing the correct predictions, label-weight gives a value to each possible class in a rule according to its frequency in the training data. This gives the top-ranked class in a rule the highest weight, rather than all the weight as the error-rate method does.

The accuracy results shown in the graphs indicate that the associative classification algorithms outperformed the other learning techniques over the majority of test instances. In particular, CBA, MCAR and MMAC outperformed the other learning algorithms on 5, 6, 7 and 5 benchmark problems, respectively. The won-loss-tied records of MCAR against CBA, C4.5 and PART are 11-5-0, 10-6-0 and 11-5-0, respectively, and the MMAC won-loss-tied records against CBA, C4.5 and PART are 10-6-0, 9-7-0 and 11-5-0, respectively. These figures show that the associative classification approach is able to produce more accurate classifiers than the decision tree and rule induction approaches, respectively.

It should be noted that the CBA, PART and C4.5 algorithms outperformed (MCAR, MMAC) on data set number 14 in Figs. 2–4, respectively. After analysing the data in this particular set, it turned out that the classes in this set are not evenly distributed. For example, classes LLH2 and LLH20 were frequently applied by the hyperheuristic in this data set, whereas classes LLH27, LLH37, LLH43 and LLH58 were rarely used. This is compounded by the limited number of training instances in this particular data set (182 training data objects). Analysis of the classifiers produced revealed consistency in the accuracy of both PART and C4.5, because the average difference in accuracy between them over all experiments is less than 1.6%. This supports the research of Frank and Witten (1998), which shows that despite its simplicity, PART generates rules as accurately as C4.5 and RIPPER. C4.5 and PART also showed consistency in the number of rules produced.

An analysis of the features of the rules generated from the hyperheuristic data has been carried out. Fig. 5 shows the number of rules extracted from nine data sets, categorised by the number of classes. MMAC is able to extract rules that are associated with up to four classes for this data, which is one of the principal reasons for improving accuracy in applications. Fig. 5 also demonstrates that the majority of rules created from each solution are associated with one or two class labels. It turns out that this accurately reflects the nature of the hyperheuristic data, since during each iteration normally only one or two low-level heuristics improve the objective function in the scheduling problem; thus, each training instance usually corresponds to just one or two classes. The additional classes discovered by the MMAC algorithm from the real data represent useful knowledge discarded by the CBA, PART and RIPPER algorithms.

Fig. 4. Difference of accuracy between C4.5 and (MCAR, MMAC).
Fig. 5. Distribution of the rules with regard to their labels.

The fact that
MMAC is able to extract rules with multiple classes enables domain experts to benefit from this additional useful information. In addition, these multi-label rules can contribute to the prediction step, possibly improving the classification accuracy.

The numbers of rules produced by the C4.5, PART, CBA and MCAR algorithms are listed in Table 3. Since MMAC produces rules with multiple classes, we did not record its numbers of generated rules, for fairness of comparison. The values in Table 3 show that C4.5 always generates more rules than PART, CBA and MCAR. This contradicts some earlier results reported in Liu et al. (1998) and Thabtah et al. (2005) on classifier sizes obtained on the UCI data collection (Merz & Murphy, 1996), which show that associative classification approaches like CBA and MCAR normally generate more rules than decision trees. For this reason, we performed an extensive analysis of the classifiers derived by C4.5 from the optimisation data sets.

Table 3
Number of rules of C4.5, PART, CBA and MCAR on the optimisation data sets.

  Data  C4.5  PART  CBA  MCAR
  1     11    3     3    14
  2     55    16    18   32
  3     46    19    12   28
  4     82    17    11   33
  5     46    17    10   31
  6     19    16    23   31
  7     91    69    1    20
  8     100   51    3    9
  9     163   71    2    12
  10    163   145   1    21
  11    46    21    10   22
  12    64    19    22   37
  13    46    19    6    33
  14    64    24    23   42
  15    55    23    16   31
  16    64    56    5    18

After analysing the decision trees constructed by the C4.5 algorithm at each iteration, we observed that many rules are generated which do not cover any training data. The reason for these useless rules appears to be the attribute on which C4.5 splits the training instances: if that attribute has many distinct values and only a few of these values appear in the training data, a rule is generated for each branch, and hence only some of the branches cover training instances; the rest represent rules that do not cover even a single training instance. In other words, when the training data set contains attributes with several distinct values and a split occurs, the expected number of rules derived by C4.5 can be large. Since the data sets used to derive the results in Table 3 contain four attributes, each with 10 different values (low-level heuristics), and some of these low-level heuristics never result in a solution improvement for the hyperheuristic, this explains the large numbers of rules derived by C4.5.

5. Conclusions

In this paper, we have studied data sets produced from a complex personnel scheduling problem called the training scheduling problem. These data sets represent solutions generated by a general hybrid approach, called the Peckish hyperheuristic, which is a robust and general-purpose optimisation heuristic that requires us to decide which of several simpler low-level heuristic techniques to apply at each step while building the schedule. Our study focused on analysing the behaviour of the low-level heuristics that were selected by the hyperheuristic and improved the quality of the current solution, in order to extract useful rules. These rules can later be used to quickly predict the appropriate low-level heuristics to call next. For this purpose, we compared five data mining classification algorithms (PART, C4.5, CBA, MCAR, MMAC) on 16 different solutions produced by the Peckish hyperheuristic.

The experimental tests showed better performance of the associative classification techniques (MCAR, MMAC, CBA) over decision trees (C4.5), rule induction (RIPPER) and the PART algorithm with reference to the accuracy of predicting the appropriate set of low-level heuristics. Since the MMAC algorithm was able to produce rules with multiple classes, including very useful information that the hyperheuristic can use in forecasting the behaviour of low-level heuristics while constructing a new solution, it is the most applicable data mining approach for predicting low-level heuristic performance within the Peckish hyperheuristic.

Furthermore, C4.5 generated more rules than the other rule learning algorithms, since the C4.5 algorithm extracted useless rules that do not have even a single representation in the training data. The reason for these useless rules turns out to be the training data attributes: when some attributes are associated with many distinct values and only a subset of these values has sufficient representation in the training data, there will be valid rules for this subset and the rest will represent useless rules.

References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th international conference on very large data bases (pp. 487–499).
Aicklen, U., & Dowsland, K. (2000). Exploiting problem structure in a genetic approach to a nurse rostering problem. Journal of Scheduling, 3, 139–153.
Ali, K., Manganaris, S., & Srikant, R. (1997). Partial classification using association rules. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), Proceedings of the third international conference on knowledge discovery and data mining (pp. 115–118).
Blum, C., & Roli, A. (2003). Metaheuristics in combinatorial optimisation: overview and conceptual comparison. ACM Computing Surveys (pp. 268–308).
CBA (1998).
Cendrowska, J. (1987). PRISM: an algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349–370.
Since the MMAC algorithm was able to produce rules with multiple classes, including very useful information that the hyperheuristic can use in forecasting the behaviour of low-level heuristics while constructing a new solution, it is the most applicable data mining approach for predicting low-level heuristic performance within the Peckish hyperheuristic.
Furthermore, C4.5 generated more rules than the other rule learning algorithms because useless rules were extracted by the C4.5 algorithm which have not even a single representation in the training data. The reason for these useless rules turns out to be the training data attributes: when some of these attributes are associated with many distinct values and only a subset of these values have sufficient representation in the training data, there will be valid rules for this subset and the rest will represent useless rules.
Table 3. Number of rules of C4.5, PART, CBA and MCAR on the optimisation data sets
Data   C4.5   PART   CBA   MCAR
1      11     3      3     14
2      55     16     18    32
3      46     19     12    28
4      82     17     11    33
5      46     17     10    31
6      19     16     23    31
7      91     69     1     20
8      100    51     3     9
9      163    71     2     12
10     163    145    1     21
11     46     21     10    22
12     64     19     22    37
13     46     19     6     33
14     64     24     23    42
15     55     23     16    31
16     64     56     5     18
References
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of the 20th international conference on very large data bases (pp. 487-499).
Aickelin, U., & Dowsland, K. (2000). Exploiting problem structure in a genetic approach to a nurse rostering problem. Journal of Scheduling, 3, 139-153.
Ali, K., Manganaris, S., & Srikant, R. (1997). Partial classification using association rules. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), Proceedings of the third international conference on knowledge discovery and data mining (pp. 115-118).
Blum, C., & Roli, A. (2003). Metaheuristics in combinatorial optimisation: overview and conceptual comparison. ACM Computing Surveys (CSUR) (pp. 268-308).
CBA (1998).
Cendrowska, J. (1987). PRISM: an algorithm for inducing modular rules. International Journal of Man-Machine Studies, 27(4), 349-370.
Cohen, W. (1995). Fast effective rule induction. In Proceedings of the 12th international conference on machine learning (pp. 115-123). CA: Morgan Kaufmann.
Cowling, P., & Chakhlevitch, K. (2003). Hyperheuristics for managing a large collection of low level heuristics to schedule personnel. In Proceedings of IEEE conference on evolutionary computation (pp. 1214-1221). Canberra, Australia.
Cowling, P., Kendall, G., & Han, L. (2002). An investigation of a hyperheuristic genetic algorithm applied to a trainer scheduling problem. In Proceedings of congress on evolutionary computation (CEC 2002) (pp. 1185-1190). Hilton Hawaiian Village Hotel, Honolulu, Hawaii, May 12-17, ISBN 0-7803-7282-4.
Cowling, P., Kendall, G., & Soubeiga, E. (2000). A hyperheuristic approach to scheduling a sales summit. In Proceedings of the third international conference of practice and theory of automated timetabling (PATAT 2000). LNCS (vol. 2079, pp. 176-190). Springer.
Fayyad, U., & Irani, K. (1993). Multi-interval discretisation of continuous-valued attributes for classification learning. In Proceedings of IJCAI (pp. 1022-1027).
Frank, E., & Witten, I. (1998). Generating accurate rule sets without global optimisation. In Proceedings of the fifteenth international conference on machine learning (pp. 144-151).
Madison, Wisconsin: Morgan Kaufmann.
Furnkranz, J., & Widmer, G. (1994). Incremental reduced error pruning. In Machine learning: Proceedings of the 11th annual conference. New Brunswick, New Jersey: Morgan Kaufmann.
Hamiez, J., & Hao, J. (2001). Solving the sports league scheduling problem with Tabu search. Lecture notes in artificial intelligence (vol. 2148, pp. 24-36). Springer.
Hansen, P., & Mladenovic, N. (1997). Variable neighbourhood search. Computers and Operations Research, 1097-1100.
Li, W., Han, J., & Pei, J. (2001). CMAR: accurate and efficient classification based on multiple-class association rules. In Proceedings of the ICDM'01 (pp. 369-376). San Jose, CA.
Liu, B., Hsu, W., & Ma, Y. (1998). Integrating classification and association rule mining. In Proceedings of the KDD (pp. 80-86). New York, NY.
McAloon, K., Tretkoff, C., & Wetzel, G. (1997). Sports league scheduling. In Proceedings of the third ILOG optimisation suite international users conference. Paris, France.
Merz, C., & Murphy, P. (1996). UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.
Quinlan, J. (1979). Discovering rules from large collections of examples: a case study. In D. Michie (Ed.), Expert systems in the micro-electronic age (pp. 168-201). Edinburgh: Edinburgh University Press.
Quinlan, J. (1987). Generating production rules from decision trees. In Proceedings of the 10th international joint conference on artificial intelligence (pp. 304-307). CA: Morgan Kaufmann.
Quinlan, J. (1993). C4.5: Programs for machine learning. San Mateo, CA: Morgan Kaufmann.
Thabtah, F., Cowling, P., & Peng, Y. (2004). MMAC: a new multi-class, multi-label associative classification approach. In Proceedings of the fourth IEEE international conference on data mining (ICDM '04) (pp. 217-224). Brighton, UK. (Nominated for the Best Paper award.)
Thabtah, F., Cowling, P., & Peng, Y. (2005). MCAR: multi-class classification based on association rule approach. In Proceedings of the 3rd IEEE international conference on computer systems and applications (pp. 1-7). Cairo, Egypt.
WEKA (2000). Data Mining Software in Java.
Witten, I., & Frank, E. (2000). Data mining: practical machine learning tools and techniques association rules. In Proceedings of the 3rd KDD conference (pp. 283-286).
Yin, X., & Han, J. (2003). CPAR: classification based on predictive association rules. In Proceedings of the SDM (pp. 369-376). San Francisco, CA.
Outline of the article (Mining the data from a hyperheuristic approach using associative classification): Introduction; The training scheduling problem and hyperheuristics; Data mining for the selection of low-level heuristics; Associative classification; Classification based on association (CBA); MCAR: multi-class classification based on association rule; Multi-class, multi-label associative classification (MMAC); Traditional classification approaches; C4.5; Repeated incremental pruning to produce error reduction algorithm (RIPPER); PART; Data and experimental results; Data sets and their features; Experimental results; Conclusions; References.
work_3hopjr7buneu3mpkb67fz4jd6a ----
work_3ikurlx7ifgmperterkvnzglka ---- Intelligent modeling of e-business maturity
George Xirogiannis a,*, Michael Glykas b
a University of Piraeus, Department of Informatics, 80, Karaoli & Dimitriou St., 185 34 Piraeus, Athens, Greece
b University of Aegean, Department of Financial and Management Engineering, 31, Fostini Street, 82 100 Chios, Greece
Abstract
E-business has a significant impact on managers and academics. Despite the rhetoric surrounding e-business, strategy formulation mechanisms which support reasoning about the effect of strategic change activities on the maturity of e-business models are still emerging. This paper describes an attempt to build and operate such a reasoning mechanism as a novel supplement to e-business strategy formulation exercises. This new approach proposes the utilization of the fuzzy causal characteristics of Fuzzy Cognitive Maps (FCMs) as the underlying methodology in order to generate a hierarchical and dynamic network of interconnected maturity indicators. By using FCMs, this research aims at simulating complex strategic models with imprecise relationships while quantifying the impact of strategic changes to the overall e-business efficiency. This research establishes generic adaptive domains (maps) in order to implement the integration of hierarchical FCMs into e-business strategy formulation activities. Finally, this paper discusses experiments with the proposed mechanism and comments on its usability. © 2006 Elsevier Ltd. All rights reserved.
Keywords: Fuzzy cognitive maps; E-business modeling; Strategy planning; Decision support
1. Introduction
Today, there is an increasing demand for a strategic-level assessment of e-business capabilities that can be assembled and analyzed rapidly at low cost and without significant intrusion into the subject enterprises. The benefits from completing such an exercise are quite straightforward, for instance, identification of significant strengths and weaknesses, establishment of a rationale for action, a reference point for measuring future progress, etc.
This paper proposes a novel supplement to strategic-level maturity assessment methodologies based on fuzzy cognitive maps (FCMs). This decision aid mechanism proposes a new approach to supplement the current status analysis and objectives composition phases of typical e-business strategy formulation projects, by supporting "intelligent" modeling of e-business maturity and "intelligent" reasoning of the anticipated impact of e-business strategic change initiatives. The proposed mechanism utilizes the fuzzy causal characteristics of FCMs as a new modeling technique to develop a causal representation of dynamic e-business maturity domains. This research proposes a holistic set of adaptive domains in order to generate a hierarchical network of interconnected e-business maturity indicators. The proposed mechanism aims at simulating the operational efficiency of complex hierarchical strategy models with imprecise relationships while quantifying the impact of strategic alignment to the overall e-business efficiency. Also, this paper proposes an updated FCM algorithm to model effectively the hierarchical and distributed nature of e-business maturity.
This application of FCMs in modeling the maturity of e-business is considered to be novel. Moreover, it is the belief of this paper that the fuzzy reasoning capabilities enhance considerably the usefulness of the proposed mechanism while reducing the effort to identify precise maturity measurements. The proposed model has both theoretical and practical benefits. Given the demand for effective strategic positioning of e-business initiatives, such a succinct mechanism of conveying the essential dynamics of e-business fundamental principles is believed to be useful for anyone contemplating or undertaking an e-business strategy formulation exercise. Primarily, the proposed model targets the principal beneficiaries and stakeholders of strategy formulation projects (enterprise top administration, strategic decision makers, internal auditors, etc.), assisting them to reason effectively about the status of e-business maturity metrics, given the (actual or hypothetical) implementation of a set of strategic changes. Nevertheless, the explanatory nature of the mechanism can prove to be useful in a wider educational setting.
This paper consists of five sections. Section 2 presents a short literature overview, Section 3 presents an overview of the FCM based system, while Section 4 discusses the new approach to e-business maturity modeling based on FCMs. Finally, Section 5 concludes this paper and briefly discusses future research activities.
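As background for the FCM-based reasoning referred to above, the following minimal Python sketch illustrates the commonly used FCM inference step, in which concept activations are updated through a squashing function of their weighted causal inputs. The three maturity concepts and the weight values below are invented for illustration and are not taken from this paper; the sketch only shows the general mechanism on which such maturity models are built.

import math

# Hypothetical maturity indicators (concepts) and assumed causal weights w[(j, i)],
# meaning "concept j influences concept i with this signed strength".
concepts = ["channel_integration", "process_automation", "ebusiness_maturity"]
weights = {
    ("channel_integration", "ebusiness_maturity"): 0.6,
    ("process_automation", "ebusiness_maturity"): 0.7,
    ("channel_integration", "process_automation"): 0.4,
}

def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(activation):
    # A_i(t+1) = f( A_i(t) + sum_j w_ji * A_j(t) ), a standard FCM update rule.
    nxt = {}
    for i in concepts:
        total = activation[i]
        for j in concepts:
            w = weights.get((j, i))
            if w is not None:
                total += w * activation[j]
        nxt[i] = sigmoid(total)
    return nxt

state = {"channel_integration": 0.8, "process_automation": 0.5, "ebusiness_maturity": 0.2}
for _ in range(10):  # iterate a fixed number of steps; in practice, until the map settles
    state = fcm_step(state)
print(state)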
2. Literature overview
2.1. E-business drivers
E-business offers promise to apply web and other electronic channel technologies to enable fully the integration of end-to-end processes. It involves both core and support business aspects, and it focuses on information sharing efficiency, not just financial transactions. The primary objective of e-business is business improvement through:
• Deployment of new technologies in the value chain.
• Connection of the value chains between enterprises (B2B) and between enterprises and consumers (B2C) in order to improve service, exploit alternative distribution/communication channels and support cost reduction due to the associated value chain optimization.
• Increase of the speed of information processing (mainly at real-time) and responsiveness by utilizing common information sources (both external and internal).
E-business has a significant impact on every business function. Integrated information technology causes a shift in the value chain of the enterprise. It causes a considerable deflation of prices due to radical cost reductions, annihilation of profit margins, disintermediation of companies and industries due to the transparent product/service delivery to the end customer, increase in cross selling volumes and so forth. On the other hand, no industry is immune to intense competition due to chain reactions that affect all electronic network partners (Palmer, 2002). This may cause a higher level of uncertainty of future business prospects, but it is only fair to say that adaptive risk management may reduce such pitfalls. Also, the current enterprise valuation can be radically altered by this new business environment; therefore, enterprises must reconsider their core competencies and strategies to maintain their competitive advantages.
The new economy associated with e-business has broken down many of the traditional barriers. The fundamental shift in focus from optimizing the efficiency of individual enterprises to optimizing the efficiency of a network of enterprises for competitive advantage is a considerable challenge (Chung, Yam, Chan, & Potter, 2005). E-business activities now operate across an extended network of digitally connected partners to enable demand/capacity/price optimizations while offering self-service client relationships at multiple channels with a significant communication speed.
It is the view of this paper that while e-business solution providers promise financial prosperity and sales volumes, case studies clearly indicate that awareness, targeted strategic planning and holistic organizational alignment are the key success factors for managing business in the digital age. Understanding the speed and scope of e-business impacts while generating the essential momentum forms the basis for setting realistic strategic priorities, mapping out a go-forward plan while evaluating the critical factors for e-business success. Effective service/product delivery through electronic channels requires efficient process control and management of measurable targets, in order to maintain the necessary range of organizational buy-in, to manage risk and assure accountability.
2.2. Relevant research in business modeling
2.2.1. Modeling traditional business activities
Enterprises usually employ modeling techniques to drive re-design activities and communicate the impact of internal change. Such modeling techniques may loosely fit within the area of decision support systems (Carlsson & Turban, 2002; Shim et al., 2002; Sprague & Watson, 1986).
Several modeling approaches can be brought to bear on the task of supplementing business modeling activities. In particular, the field of knowledge-based systems (Harmon & King, 1985; Metaxiotis, Psarras, & Samouilidis, 2003) could fulfill the desire for more accurate predictive business modeling tools. The research presented by Lin, Yang, and Pai (2002) proposed generic structures with no formal reasoning capabilities to model traditional business processes, which could represent a business process in various concerns and multiple layers of abstraction.
The research presented by Burgess (1998) modeled business process models with system dynamics to support the feasibility stage of business process re-engineering (BPR). Similarly, research (Burgess, 1998) modeled the interaction between competitive capabilities of quality and cost during total quality management (TQM) initiatives (Burgess, 1996). This model did not decompose hierarchical relationships, nor did it allow the connection of the sub-models. Finally, the model required formal definition of causal relationships (e.g. functions), which posed a significant overhead in supplementing the business modeling exercise. The research presented by Crowe, Fong, Bauman, and Zayas-Castro (2002) reported the development of a tool
work_3jqspwm2ifdpjfftgomw57gcc4 ---- Identifying objects by touch: An "expert system"
Perception & Psychophysics 1985, 37 (4), 299-302
ROBERTA L. KLATZKY, University of California, Santa Barbara, California, and SUSAN J. LEDERMAN and VICTORIA A. METZGER, Queen's University, Kingston, Ontario, Canada
How good are we at recognizing objects by touch? Intuition may suggest that the haptic system is a poor recognition device, and previous research with nonsense shapes and tangible-graphics displays supports this opinion. We argue that the recognition capabilities of touch are best assessed with three-dimensional, familiar objects. The present study provides a baseline measure of recognition under those circumstances, and it indicates that haptic object recognition can be both rapid and accurate.
This work was based on an honors psychology thesis conducted by the third author under the supervision of the second author at Queen's University, 1982. It was supported by Natural Sciences and Engineering Research Council of Canada Grant A9854 to the second author. The first author acknowledges the support of the Center for Advanced Study in the Behavioral Sciences. Reprint requests may be sent to either the first or second author: R. Klatzky, Psychology-UCSB, Santa Barbara, CA 93106; or S. Lederman, Psychology, Queen's University, Kingston, Ontario, Canada K7L 3N6.
Many of you may remember a children's game in which the goal is to identify, by touch alone, several common objects plucked from a bag or a tray. The joy of playing lies in the discovery that success is possible. For those of us who have played the game and experienced that success, it may not be surprising that objects can be identified through tactual exploration. However, the present report documents that this is no trivial observation. In a systematic study of haptic object identification, it was found that people are highly accurate and quite fast at identifying a large number (100) of objects.
We deem this observation surprising, because it has often been argued, both empirically and theoretically, that the haptic system (i.e., purposive touch, as defined in Gibson, 1966) is inadequate for object identification, especially when compared with vision. Two lines of research are used to support this claim. One compares haptic and visual identification of raised two-dimensional (and sometimes fully three-dimensional) nonsense shapes (e.g., Bryant & Raz, 1975; Cashdan, 1968; Rock & Victor, 1964).
The second uses tangible graphics displays - raised line drawings of objects, maps, graphs, etc. - in which raised symbols represent spatial and structural information (e.g., Lederman & Campbell, 1982; Lederman, Klatzky, & Barber, 1985; Magee & Kennedy, 1980). Such displays are intended to be read by hand rather than by eye. Although the research assesses the level of performance that can be achieved by touch alone, comparisons with vision are nevertheless implicit. These studies demonstrate either that touch is ineffective for reading and identification or that touch is so dominated by vision that its contribution to pattern perception is minimal. However, we will suggest reasons why performance with arbitrary, unfamiliar three-dimensional or raised two-dimensional stimuli might underestimate the capacity for haptic object recognition. Further, we offer the present study as an "existence proof" that haptics can be highly effective for object identification. Our argument is similar to one made by Reed, Durlach, and Braida (1982), who have been developing a sensory substitution system for the deaf. They suggest that the success of "Tadoma" - a method of understanding speech in which the "listener's" fingers and thumb(s) are placed on the cheeks, lips, and jaws of the speaker - is proof that the tactual system can process complex spatiotemporal patterns at rates close to those of auditory speech perception. Similarly, we argue by demonstration that touch can perform a very common identification task with considerable competence, despite its poor performance in other tasks (particularly those with less naturalistic stimuli).
One reason for caution in generalizing results from studies with artificial objects or raised graphics displays to haptic performance as a whole is that such studies generally require pattern "apprehension" - obtaining information about volumetric, topographical, and other attributes of the stimuli - as opposed to categorization. Even when artificial displays depict real objects and a categorical response is required, the stimuli generally fail to retain many of the properties of the objects themselves, such as temperature, size, or texture. The cues that these displays do provide are usually dictated by the original visual master from which a raised replica was derived. It therefore becomes necessary to determine the shape of the stimulus, perhaps even to form a visual image, in order to identify it. In contrast, real objects might be recognized on the basis of nonstructural cues. A kitchen sponge, for example, could be identified by its texture, without regard for its shape or size.
As mentioned above, vision-touch comparisons have often been interpreted as evidence that haptic performance is poor.
Several considerations, however, suggest that these comparisons are inappropriate to assess haptic ob- ject recognition. One concern is the degree of practice, which has been found to improve haptic discrimination performance (Gibson, 1966; Simons & Locher, 1979). A lack of familiarity with haptic identification of artifi- cial displays might be critical to its inferiority relative to vision. Another issue is whether the displays that have been used in previous research adequately allow for fun- damental differences between the visual and haptic sen- sory systems (Berla, 1982; Ikeda & Uchikawa, 1978; Lederman, 1979). For example, the resolving power of the fingertip is much less than that of the eye (Weinstein, 1968), and the tactile system may have inherent difficulty in representing stimulus orientation (Pick, Klein, & Pick, 1966). Both of these factors could influence haptic per- formance profoundly; yet stimulus construction cannot easily compensate for either. Consider that changing the size of a stimulus to accommodate the poor resolution of touch also changes the rate at which it can be explored and, accordingly, changes the temporal-integration and memory demands of the task (Berla, 1982). Similarly, us- ing explicit orientation cues may aid tactual apprehension (Lederman, 1979; Lederman & Campbell, 1982), but may also add irrelevant information that interferes with the per- ception of structural properties of the display. The foregoing suggests that in order to assess haptic object identification, one should avoid artificial objects or two-dimensional displays and, instead, use real objects. The cues that real objects provide are ecologically deter- mined, rather than based on a visual replica. Haptic manipulation of objects is commonplace and therefore familiar. Real objects maintain in full scale the attributes that contribute to haptic identification, and their proper orientation is determined by such intrinsic characteristics as principal axes, flat surfaces, and center of gravity. Thus, objects seem ideally suited to recognition through haptic exploration. Some extant studies suggest, in agreement with this reasoning, that haptic identification of common objects is quite accurate (Bigelow, 1981; Hoop, 1971; Schiff & Dytell, 1972; Simpkins, 1979). However, those investi- gations do not provide a general assessment of haptic iden- tification capabilities, because of various limitations- using very young subjects, a small sample of objects, or a task other than identification per se. The present study directly assessed adults' haptic identification of hand-size common objects that were readily identifiable through vi- sion. Its goal was to provide baseline measures of speed and accuracy. METHOD The subjects were students at Queen's University, 20-23 years of age. Three females and 2 males took part in a visual identifica- tion task, and 10 females and 10 males participated in a haptic iden- tification task. The stimuli were 100 common objects, of a size that could be held within the hands (see Appendix). None made a noise, either functionally or in the course of manual exploration, and none had a clear identifiable odor. Forty-one of the stimuli were selected from a list of objects (Snodgrass & Vanderwart, 1980) as having high (>70%) name agreement when they were presented pictorially; the remainder were selected by the experimenters as being unam- biguously identifiable by name. 
The objects were roughly classifi- able as personal articles, articles for entertainment, foods, cloth- ing, tools, kitchen supplies, office supplies, and household articles. There were from 8 to 23 items in each class. The visual identification task served as a pretest for verifying the nameability of the stimuli by sight. In addition to the 100 stimuli described above, an additional 10 were included, but were discarded when they were incorrectly named by a subject. A name was con- sidered correct if it was commonly applied to objects of the given type, was not commonly applied to distinctly different objects, and was not the name of a relatively abstract category. (Thus, for ex- ample, "thread" or "spool" would be acceptable for that object, but "dowel" or "sewing implement" would not.) Each object was placed, in turn, on a table before the subject, who was asked to name it. All of the stimuli ultimately used were named correctly by all five subjects. Although reaction times were not measured, they were generally brief, on the order of a second. For the haptic identification task, each subject sat at a table that had been padded with a towel to reduce noise. He or she was blind- folded and wore headphones through which white noise was deli- vered, in order to mask inadvertent noise from the exploration. The subject's chin was placed in a chinrest clamped to the table, and he or she made responses into a microphone that triggered a voice key for purposes of timing responses. The subject was free to use both hands and to pick up the objects while exploring. On each trial, the experimenter set an object on the padded table and turned on a tape recorder. (The order of objects was determined randomly for each subject.) She then tapped the subject's hand, to indicate that the object was available for exploration. When the sub- ject first touched the object, the experimenter pressed a finger switch, starting a timer that terminated with the subject's vocal response. The subject's task was to identify each object as quickly and ac- curately as possible, or, if he or she could not do so, to say, "I don't know." In addition, following vocalization of a name, the subject was asked to describe the properties that had been used to identify the object. RESULTS The principal dependent variables were reaction time (between the first tactile contact with the object and vocali- zation) and errors, either misnaming or omission ("1- don't-know" responses). Misnaming errors were further categorized as superordinate (giving the name of a higher order category, e.g., "vegetable" for a pumpkin), cate- gorically related (e.g., "sock" instead of "sweater"), corrected superordinate or related (following an initial su- perordinate or categorically related name with a correct response, e.g. "clothing ... sweater"), and categorically unrelated (e.g., "rock" for a potato). Of the 2,000 responses, only 83 (4.2 %) were errors. Four errors were omissions. There were 22 superordinate errors, 29 categorically related, 14 corrected superor- dinate or related, and 14 unrelated. Males and females did not differ in their error rates, by t test. Analysis of the reaction times for correct responses in- dicated that the model response latency was 1-2 sec and that 68 % of responses occurred within 3 sec of contact. Only 6% of responses took longer than 5 sec of contact. These data differed little by gender of subject. When mean response latencies were computed for each stimulus, all but two items had mean values of 5 sec or less. 
The devi- ant items were rice (mean latency = 6.6 sec) and T-shirt (mean latency = 10.6 sec). These items also accounted for relatively high proportions of the errors (6 per object). Another type of data was subjects' phenomenological reports of the object properties that had led to their iden- tifications. These were divided into 16 categories, derived from an initial sampling of the data, which distinguished among reports of a distinctive component and of the size, shape, texture, temperature, and function of the whole object or a component. Two independent raters assigned the reports to the 16 categories, with 85 % agreement. An average of 1.9 comments was made about each item. Global shape (e.g., of a whistle), global texture (e.g., of sandpaper), and presence of a distinct component (e.g., cap on a pen) were predominantly mentioned as the basis for identification (on 46%, 36%, and 35% of trials, respectively), with component texture (16%), global size (15%), and component shape (7%) next most often reported. DISCUSSION The principal finding from this study is that haptic iden- tification of a wide range of objects can be remarkably fast and accurate. Given the present scoring system, 96% of the naming responses were correct. With a more lenient system, allowing superordinate category names and related categorical responses, and permitting false starts, the accuracy rate would be 99 %. Moreover, 94 % of cor- rect (under strict scoring) names were given within a 5-sec interval. This performance is all the more remarkable when it is compared with haptic reading and identifica- tion performance observed in past research. Difficulties with raised drawings are informally confirmed in our laboratories as well; frequently, people are unable to iden- tify even simple outlines after 2-3 min of exploration! The observed levels of haptic identification are also con- siderably better than the level of identification of com- mon odors (Cain, 1979, 1982; Desor & Beauchamp, 1974; Engen & Ross, 1973)-generally estimated at about 40%-50%. An important factor underlying the relatively low level of identification by odor is label availability; thus, performance is subject to considerable improvement with practice (Cain, 1979; Desor & Beauchamp, 1974). One might argue, then, that by equating stimuli for the natural-language frequency of their labels, identification rates for touch and olfaction would be comparable. However, even after accounting for difficulties in associa- tive retrieval of names, a residual error in odor identifi- cation remains, apparently reflecting discrimination failures imposed by the sensory system itself (Cain, 1979). Odor identifications also have considerably longer laten- cies than the responses in the present experiment. In the Desor and Beauchamp study, unpracticed subjects took approximately 10 sec to give correct responses and highly HAPTIC OBJECT IDENTIFICATION 301 practiced subjects still averaged over 2.5 sec to respond. The rapid and accurate responses by unpracticed subjects in the present experiment suggest that the superiority of haptic over olfactory identification cannot be explained solely by superior accessibility to labels. Earlier it was suggested that studies with arbitrary con- figurations or two-dimensional simulations might underes- timate the capacity for haptic object identification, because these types of stimuli deprive the haptic system of some of its most effective cues. The present data provide sup- port for this suggestion. 
Most bases for identification that are emphasized in the phenomenological reports of these subjects-global shape, texture, and size cues-are not readily available with simulated objects. Raised-line draw- ings might appear to provide shape information, but the usual nature of that information is a projection to the reti- nal plane, not the tangible, grasped shape of haptic ex- ploration. Texture cues in raised drawings are normally minimal and usually arbitrarily assigned to the concepts they are intended to represent. Moreover, often only rela- tive size can be portrayed. Also relevant to this argument is a study (Krantz, 1972) that used factor analysis to assess the basis for haptic ob- ject identification. It identified five factors-amount of exertion needed to explore (related to compliance), rough- ness, size, temperature, and sharpness-none of which is adequately represented by raised-line drawing tech- niques in current use. Although we have argued that the haptic system is well equipped to identify familiar objects, we do not mean to claim that the perception of form through touch is gener- ally accurate and efficient. It is one thing to assign an ob- ject to a known category and quite another to apprehend its structural characteristics. Studies using artificial ob- jects and graphics displays appear well suited to assess haptic perception in this latter sense-and they assess it . as poor. In contrast, past experience with objects-visual as well as tactual-might enable categorization on the basis of minimal cues, without full apprehension of surface and shape. The present study indicates that, in this latter task, the haptic system can be very competent indeed, more competent than might be expected from intuition or ex- trapolation from experiments on haptic perception of form. We see the results of the current study as a neces- sary first (phenomenological) step towards developing a model of haptic object identification, one component of which may be a representation that is readily accessed by both haptics and vision (as suggested by Garbin & Bern- stein, 1984; Gibson, 1966). The present work also has implications for applied con- cerns. One application that motivated this research was the design of effective tangible graphics for the visually handicapped. Our study suggests that effective graphic aids should eschew simple mimicry of two-dimensional visual displays. Instead, they might incorporate the three- dimensionality of real objects and retain critical proper- ties such as texture. A rather different area in which research on haptic object identification might be applied 302 KLATZKY, LEDERMAN, AND METZGER is the development of intelligent robotic systems that use tactile sensing (Harmon, 1982). By understanding the mediators between haptic exploration and identification, we may discern levels of information representation that are appropriate for robots to simulate human behavior, thus rendering them capable of highly accurate identifi- cation performance. REFERENCES BERLA, E. P. (1982). Haptic perception of tangible graphic displays. In W. Schiff & E. Foulke (Eds.), Tactual perceptions: A sourcebook (pp. 364-386). Cambridge: Cambridge University Press. BIGELOW, A. E. (1981). Children's tactile identification of miniaturized common objects. Developmental Psychology, 17,111-114. BRYANT, P., & RAZ, I. (1975). Visual and tactual perception of shape by young children. Developmental Psychology, 11, 525-526. CAIN, W. S. (1979). 
To know with the nose: Keys to odor identifica- tion. Science, 203, 467-469. CAIN, W. S. (1982). Odor identification by males and females: Predic- tions vs performance. Chemical Senses, 7, 129-142. CASHDAN, S. (1968). Visual and haptic form discrimination under con- ditions of successive stimulation. Journal of Experimental Psychol- ogy Monograph, 76(2, Pt. I). DESOR, J. A., & BEAUCHAMP, G. K. (1974). The human capacity to transmit olfactory information. Perception & Psychophysics, 16, 551-556. ENGEN, L. T., & Ross, B. (1973). Long-term memory of odors with and without verbal descriptions. Journal ofExperimental Psychology, 100, 221-227. GARBIN, C. P., & BERSTEIN, I. H. (1984). Visual and haptic percep- tion of three-dimensional solid forms. Perception & Psychophysics, 36, 104-110. GIBSON, J. J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin. HARMON, L. (1982). Automated tactile sensing. Robotics Research, 1, 3-32. Hoop, N. H. (1971). Haptic perception in preschool children. Ameri- can Journal of Occupational Therapy, 25, 340-344. IKEDA, M., & UCHIKAWA, K. (1978). Integrating time for visual pat- tern perception and a comparison with the tactile model. Vision Research, 18, 1565-1571. KRANTZ, M. (1972). Haptic recognition of objects in children. Journal of Genetic Psychology, 120,121-133. LEDERMAN, S. J. (1979). Tactual mapping from a psychologist's per- spective. Bulletin ofthe Association of Canadian Map Libraries, 32, 21-25. LEDERMAN, S., & CAMPBELL, J. (1982). Tangible graphs for the blind. Human Factors, 24, 85-100. LEDERMAN, S., KLATZKY, R. L., & BARBER, P. (1985). Spatial and movement-based heuristics for encoding pattern information through touch. Journal of Experimental Psychology: General, 114, 33-49. MAGEE, L. E., & KENNEDY, J. M. (1980). Exploring pictures tactu- ally. Nature, 283, 287-288. PICK, H. L., KLEIN, R. E., & PICK, A. D. (1966). Visual and tactual identification of form orientation. Journal ofExperimental Child Psy- chology, 4, 391-397. REED, C., DURLACH, N., & BRAIDA, L. (1982). Research on tactile com- munication of speech: A review. American Speech-Language-Hearing Association Monographs, No. 20, 1-23. ROCK, I., & VICTOR, J. (1964). Vision and touch: An experimentally created conflict between the two senses. Science, 143, 594-596. SCHIFF, W., & DYTELL, R. S. (1972). Deafand hearing children's per- formance on a tactual perception battery. Perceptual and Motor Skills, 35, 683-706. SIMONS, R. W., & LOCHER, P. J. (1979). Role of extended perceptual experience upon haptic perception of nonrepresentational shapes. Per- ceptual and Motor Skills, 48, 987-991. SIMPKINS, K. E. (1979). Tactual discrimination of household objects. Journal of Visual Impairment and Blindness, 73, 86-92. SNODGRASS, J. G., & VANDERWART, M. (1980). A standardized set of 260pictures: Norms for name agreement, image agreement. familiar- ity. and visual complexity. Journal ofExperimental Psychology: Hu- man Learning and Memory, 6, 174-215. WEINSTEIN, S. (1968). Intensive and extensive aspects of tactile sensi- tivity as a function of body pan, sex, and laterality. In D. R. Ken- shalo (Ed.), The skin senses (pp. 195-218). Springfield, II.: Thomas. 
APPENDIX
Stimulus Objects, by category:
Personal: comb, emery board, glasses, hair dryer, ring, swab, toothbrush, wallet, watch.
Entertainment: balloon, baseball bat, baseball glove, birdie, crayons, golf ball, playing cards, record, tennis racket, whistle.
Clothing: belt, boot, mitten, scarf, shoelace, sock, sweater, T-shirt, tie.
Foods: carrot, cracker, egg, onion, potato, pumpkin, rice, tea bag.
Office Supplies: binder, book, bow, calculator, clipboard, envelope, eraser, notebook, paper clip, paper pad, pen, pencil, pencil sharpener, ruler, stapler, tape.
Tools: clamp, hammer, paintbrush, sandpaper, scissors, screw, screwdriver, twine, wrench.
Household: ash tray, bandage, button, candle, clothespin, dust pan, electric cord, flashlight, flower pot, hook, key, light bulb, match book, padlock, rubber band, safety pin, scrub brush, sponge, thread, toilet paper, toothpicks, umbrella, watering can.
Kitchen Supplies: baby bottle, bottle opener, bowl, fork, funnel, glass, kettle, knife, ladle, muffin pan, mug, plate, pot, spatula, strainer, wooden spoon.
(Manuscript received November 21, 1984; revision accepted for publication January 20, 1985.)
work_3klorapqwnd2pmpmvo24rrzmta ---- Recommender system based on pairwise association rules
Expert Systems With Applications 115 (2019) 535-542
Timur Osadchiy a,*, Ivan Poliakov a, Patrick Olivier a, Maisie Rowland b, Emma Foster b
a Open Lab, School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom
b Institute of Health and Society, Newcastle University, Newcastle upon Tyne, United Kingdom
* Corresponding author at: Open Lab, School of Computing, Newcastle University, Urban Sciences Building, 1 Science Square, Science Central, Newcastle upon Tyne, NE4 5TG, United Kingdom. E-mail addresses: t.osadchiy@newcastle.ac.uk (T. Osadchiy), ivan.poliakov@newcastle.ac.uk (I. Poliakov), patrick.olivier@newcastle.ac.uk (P. Olivier), maisie.rowland@newcastle.ac.uk (M. Rowland), emma.foster@newcastle.ac.uk (E. Foster).
Article history: Received 4 April 2018; Revised 9 July 2018; Accepted 10 July 2018; Available online 21 August 2018.
Keywords: Association rules; Cold-start problem; Data mining; Ontologies; Recommender systems
Abstract
Recommender systems based on methods such as collaborative and content-based filtering rely on extensive user profiles and item descriptors as well as on an extensive history of user preferences. Such methods face a number of challenges, including the cold-start problem in systems characterized by irregular usage, privacy concerns, and contexts where the range of indicators representing user interests is limited. We describe a recommender algorithm that builds a model of collective preferences independently of personal user interests and does not require a complex system of ratings. The performance of the algorithm is analyzed on a large transactional data set generated by a real-world dietary intake recall system. © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ ).
1. Introduction
Recommender systems aim to identify consumer preferences and accurately suggest relevant items (e.g. products, services, content). They are used in various application domains, including online retail, tourism and entertainment ( Covington, Adams, & Sargin, 2016; Linden, Smith, & York, 2003 ). Widely adopted recommendation techniques often utilize collaborative filtering or content-based recommendation methods ( Pazzani & Billsus, 2007 ).
Collaborative filtering produces recommendations based on user preference models that are generated from explicit and/or implicit characteristics and metrics corresponding to user interests. Explicit indicators normally imply users assigning ratings to items; for example, to products viewed in, or purchased from, an online store. Examples of implicit indicators include the amount of time users spend interacting with content (e.g. watching a video) or levels of interaction (e.g. scroll offset of a web page containing an article they are reading). Items that are positively rated or purchased by consumers with similar preference models are used as recommendations for target users. User similarity can be expressed through correlations in purchasing history or ratings given to the same products, which can be further amplified with demographics (e.g. age, gender, occupation). Content-based filtering identifies similarities between items based on a set of their descriptors (e.g. purpose of an item, author, artist, keywords). Items similar to those positively rated or purchased by the target user are used as recommendations.
Collaborative and content-based filtering recommender systems are therefore heavily dependent on extensive user or item profile information and are most effective when there is a rich history of user preferences or behavior. Sparse data sets and lean user profiles typically result in low-quality recommendations or an inability to produce recommendations at all. This is referred to as the cold-start problem, where new users are added into the system with empty behavior profiles or new items are added that have not been reviewed or rated by anyone ( Shaw, Xu, & Geva, 2010 ). Many solutions to the cold-start problem have been considered, including hybrid methods that combine collaborative and content-based filtering ( Schein, Popescul, Ungar, & Pennock, 2002 ), and methods that aim to predict user preferences from demographics ( Lika, Kolomvatsos, & Hadjiefthymiades, 2014 ), or knowledge of social relationships ( Carrer-Neto, Hernández-Alcaraz, Valencia-García, & García-Sánchez, 2012 ).
There are, however, a number of application contexts where users interact anonymously; for example, online shops where an unregistered user browses and adds products to their basket to check out later. In other contexts, such as email recipient recommendation ( Roth et al., 2010 ), applications requiring high levels of privacy, or those where individual interactions with a system are necessarily infrequent (e.g. dietary surveys; Bradley et al., 2016 ), there are no features and ratings to exploit, and the construction of personal behavior and preference models is not possible.
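To make concrete what such a personal preference model involves, here is a minimal, illustrative user-based collaborative filtering sketch in Python. The users, items and ratings are invented, and this is not code from the paper; it simply shows that scoring an unseen item for a target user depends entirely on that user's existing rating history and on neighbours with correlated histories, which is exactly what is unavailable in the anonymous or infrequent-use contexts described above.

from math import sqrt

# Hypothetical explicit ratings: user -> {item: rating}
ratings = {
    "u1": {"bread": 5, "butter": 4, "jam": 3},
    "u2": {"bread": 4, "butter": 5, "tea": 2},
    "u3": {"tea": 5, "jam": 1},
}

def cosine(a, b):
    # Cosine similarity over the items two users have both rated.
    common = set(a) & set(b)
    if not common:
        return 0.0
    num = sum(a[i] * b[i] for i in common)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den

def predict(target, item):
    # Weighted average of neighbours' ratings for the item, weighted by user similarity.
    num = den = 0.0
    for user, prefs in ratings.items():
        if user == target or item not in prefs:
            continue
        sim = cosine(ratings[target], prefs)
        num += sim * prefs[item]
        den += sim
    return num / den if den else None

print(predict("u1", "tea"))  # fails gracefully (returns None) if u1 had no rating history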
To address these challenges we describe a recommender algorithm that is independent of any personal user model and does not require a complex system of ratings. Based on a set of observed items selected by a user, the algorithm produces a set of items ranked by confidence of their being observed next. In designing the underlying algorithm, we review existing methods that aim to address similar tasks, adapt them to meet the constraints of the application context that is our primary concern (dietary surveys), and propose a novel alternative. The performance of three methods is compared through the task of recommending omitted foods in a real world dietary recall system.
2. Related work
While various approaches have been proposed to address the cold-start problem in recommender systems, the majority of these rely on knowledge of content ( Popescul, Pennock, & Lawrence, 2001 ) and users ( Lika et al., 2014 ), including social relationships between users ( Carrer-Neto et al., 2012 ), whereas our concern is with contexts where such information is not available. Shaw et al. addressed the cold-start problem in recommender systems by using a data analysis technique which is applied to large data sets for discovering items that frequently appear together in a single transaction. This technique is known as association rules ( Agrawal, Imielinski, & Swami, 1993; Shaw et al., 2010 ). Pazzani and Billsus consider the list of topics of books users voted for as transactions, which allowed them to extract association rules for topics that frequently appeared together as part of a user's interests ( Pazzani & Billsus, 2007 ). To expand preferences for each user, the algorithm then generates all possible combinations of topics for every book the user voted for and filters association rules where the antecedent part of the rule matches one of the combinations. The consequent list of topics is added to the preferences of that user.
In combination with a domain ontology, association rules can be effectively employed for extracting, understanding and formalizing new knowledge ( Ruiz, Foguem, & Grabot, 2014; Sene, Kamsu-Foguem, & Rumeau, 2018 ). However, association rules have to be adapted for recommendation tasks since they are primarily designed to be used as exploratory tools ( Rudin, Letham, Salleb-Aouissi, Kogan, & Madigan, 2011 ) to discover previously unknown relations that need to be analyzed for their interestingness ( Atkinson, Figueroa, & Pérez, 2013 ).
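As a concrete reminder of the support and confidence measures referred to throughout this discussion, the small Python sketch below (toy transactions, not data from the paper) computes both quantities for a candidate rule such as {bread} => butter over a list of transactions.

# Toy transactions; each transaction is the set of items observed together.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "tea"},
    {"tea", "jam"},
]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    # P(consequent | antecedent) = support(antecedent + consequent) / support(antecedent).
    return support(antecedent | {consequent}) / support(antecedent)

print(support({"bread", "butter"}))     # 0.5
print(confidence({"bread"}, "butter"))  # about 0.67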
As we will probably want to provide more specificity, and recommend the exact titles of the books instead of generic categories, this potentially leads to a vast number of mined association rules, and matching all possible combinations of the observed items may not result in rules being found. Furthermore, a consequent item may appear in multiple matching rules, meaning that a function must be introduced that aggregates the confidences of found rules into a single score for the consequent item. Finally, only the associations with a support (i.e. how often a rule holds as true across the data set) higher than a defined threshold are normally extracted ( Li & Deng, 2007 ). The produced list of rules is supposed to be of a reasonable size, to allow manual examination. In a recommender system, even associations with low frequencies could still be relevant, if other relevant rules with higher confidence are not found. This requires the extraction of as many rules as possible, making the mining process a computationally expensive task ( Zheng, Kohavi, & Mason, 2001 ).
Roth et al. (2010) introduced a method for building implicit social graphs based on histories of interaction between users and estimations of their affinity, and applied it to the problem of email recipient recommendation. Based on a set of email addresses selected by a user (the seed group), the algorithm extracts all groups of contacts with whom the user has ever exchanged emails. Here a group of contacts is a set of email contacts that were observed together in a recipient list. For each of the contacts in each group, excluding members of the seed group, an interaction score is calculated based on the volume of messages exchanged with that group, the recency of those messages, and the number of intersections of the seed group with the group of contacts that is being considered. Interaction scores of contacts that are present in multiple groups are aggregated into a single interaction score, which is then used to rank the set of email recommendations.
The implicit social graph is a promising alternative to mining association rules. It instead measures the confidence of recommended items based solely on observed transactions that are prefiltered by intersections with given items. However, the method also estimates the relevance of a group of recommended emails based on the strength of social interaction of the target user with that group, which is not a meaningful metric for applications that do not assume communication (or other interaction) between users.
Association rules and implicit social graphs are data entities that represent item-to-group relationships. However, DuMouchel and Pregibon (2001) suggested that a more efficient approach to discovering "interesting" associations is to first find pairs of items that frequently appear together and then analyze larger sized item sets that contain those pairs. For example, if ABC appears in a data set with a certain frequency, then the pairs AB, BC and AC would be at least as frequent as the triplet. Raeder and Chawla (2011) effectively analyzed associations through a graph of individual items connected to each other with edges that are weighted by the frequency of the two items appearing together. Items that have stronger relationships with each other, compared to other items, form clusters, which are then targeted for further analysis. Similar to the implicit social graph, this method avoids the need to mine all possible association rules, but without requiring any additional indicators of relevance except for the item pair frequencies. The relevance of produced recommendations is effectively inferred from the likelihood of their appearing with the observed items.
3. Associated food recommender algorithm
3.1.
Similar to the implicit social graph this method avoids he need to mine all possible association rules, but without re- uiring any additional indicators of relevance except for the item airs frequencies. The relevance of produced recommendations is ffectively inf erred from the likelihood of their appearing with the bserved items. . Associated food recommender algorithm .1. Intake24 We introduce a new recommendation algorithm that was de- eloped for Intake24, a system for conducting 24-h multiple-pass ietary recall surveys ( Bradley et al., 2016 ). Intake24 is designed to e a cost-effective alternative to interviewer-led 24-h recalls and rovides respondents with a web-based interface through which hey enter their dietary intake for the previous day. Respondents ill likely only ever use the system if they are a part of a dietary tudy and only for a short period of time. Within a survey, a respondent typically records their dietary ntake for the previous day on three separate occasions. A single ay normally consists of four to seven meals (e.g. breakfast, morn- ng snack, lunch etc.) which include a selection of foods, drinks, esserts, condiments, and such (referred to generically as foods). uring the first step of a recall session, a respondent reports a ist of names of foods consumed during each of the previous day’s eals in a free-text format. For each text entry, the system re- urns a list of relevant foods selected from a taxonomy of around 800 foods, organized in a tree-based, multi-level structure. Spe- ific foods are terminal nodes of this taxonomy and are linked to heir nutrient values and portion size estimation methods. Respon- ents select one food from the returned list to add to their meal; or example: Coca-Cola (not diet); Beef, stir fried (meat only); Toma- oes, tinned; Basmati rice; Onions, fried; Chilli powder; Kidney beans . T. Osadchiy et al. / Expert Systems With Applications 115 (2019) 535–542 537 t 4 a o I t m s o d a t c i w t d m r s t n f t d p a p 3 a s o a r o a a o e t p t s s s o o 3 r s E q w s i r m t e l h d I n r s d d s t p 3 R s c p f t i s s i p m t ( ( i s t b i c d t c Completing an accurate recall requires respondents to be able o identify foods they ate from a database that covers more than 800 foods; for example, there are more than 30 types of bread lone. Thus, one of the key features in determining the usability f a dietary recall system is its presentation of food search results. f respondents are not able to readily identify items from a list re- urned in response to their textual description of the food, they are ore likely to select foods perceived as the closest match or even kip reporting the intake of that food. In other words, the relevance f search results, in terms of prioritizing them appropriately, may irectly affect the accuracy of dietary recall through level of effort nd time required to select the correct foods and report intake. The main application for the recommender algorithm is to au- omate the extraction of questions about foods that are commonly onsumed together (associated foods). In Intake24, this feature is mplemented as a link between an antecedent food (e.g. toast, hite bread ) and the consequent associated food category (e.g. but- er/margarine/oils ) along with a question that is asked if a respon- ent selects the antecedent food (e.g. Did you have any butter or argarine on your toast? ). 
Such food association prompts are currently hand-crafted by trained nutritional experts, which for thousands of foods is inevitably a time consuming process that is prone to omissions. Eating habits depend on region, culture, diet, and a number of other factors, which requires defining new associations for every context in which a system is deployed. Furthermore, over time new foods and recipes emerge and dietary trends change. Indeed, existing rules are often curated based only on personal experience or previous research, and no published study has evaluated their appropriateness or explored alternative data driven approaches to extracting such associations.

3.2. Generic procedure

Our approach assumes that the patterns in the eating behaviors of an observed population (that is, the respondents who took part in surveys conducted in a given country) have some relevance to those of an individual in that population. The recommender algorithm assumes no prior knowledge about an individual except their currently selected food items. The algorithm is trained on a large set of observed meals and produces a model of the eating behavior of a given population, where a meal is a group of uniquely identifiable foods (e.g. vanilla ice cream, pear juice) reported to be eaten on a single occasion. Each individual food can be recorded as being eaten only once during a meal. During the recommendation step, the resulting model accepts a set of foods, which we refer to as input foods IF, and returns a set of recommended foods RF mapped to likelihoods of being reported along with IF (recommendation scores). IF are excluded from recommendations. In the following sections we discuss three possible implementations that were considered for the recommender algorithm. Along with the description of our methods we include examples of generated models and recommendations for a sample transaction data set.

3.3. Association rules

We introduce a recommender algorithm based on association rules (AR) that generates a model of eating behavior from a data set of meals (in the training step) in the form of association rules. Each rule consists of a set of antecedent foods and a single consequent food, together with the confidence that the consequent food will be present in a meal given the antecedent foods that were observed. The procedure for retrieving association rules is described in Agrawal et al. (1993).

The AR algorithm makes predictions from stored association rules with antecedent part antc similar to IF, and produces recommendations from the consequent parts of the rules. To do so, AR takes association rules that have a consequent food that is different from any of IF and antecedent foods antc that include at least one of IF (Algorithm 1). The algorithm calculates the likelihood of a recommended food f being selected next as the confidence of the rule c multiplied by the similarity between antc and IF (i.e. match score ms).

Algorithm 1: Recommendations based on association rules.
function Recommend
  input:   AM, association rules based model
           IF, foods selected by a respondent
  returns: RF, list of food recommendations
  1   RF ← ∅
  2   foreach rule rl ∈ AM & rl.consequent ∉ IF:
  3       f ← rl.consequent
  4       if ∃ af : af ∈ rl.antecedent & af ∈ IF:
  5           if f ∉ RF:
  6               RF[f] ← 0
  7           antc ← rl.antecedent
  8           c ← rl.confidence
  9           intr ← size({af : af ∈ antc & af ∈ IF})
  10          ms ← intr² / (size(antc) * size(IF))
  11          RF[f] ← RF[f] + c * ms
  12  return RF
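As an illustration of Algorithm 1, a minimal Python sketch is shown below; representing each rule as an (antecedent set, consequent, confidence) tuple is our own assumption rather than the paper's actual implementation.

def recommend_ar(rules, input_foods):
    """Score candidate foods from association rules (cf. Algorithm 1).

    rules: iterable of (antecedent: frozenset, consequent: str, confidence: float)
    input_foods: set of currently selected foods (IF)
    """
    scores = {}
    for antecedent, consequent, confidence in rules:
        if consequent in input_foods:
            continue
        overlap = len(antecedent & input_foods)            # intr
        if overlap == 0:
            continue
        # match score: squared overlap normalised by both set sizes
        ms = overlap ** 2 / (len(antecedent) * len(input_foods))
        scores[consequent] = scores.get(consequent, 0.0) + confidence * ms
    return scores

# e.g. recommend_ar(rules, {"a", "b"}) on rules mined from {abcd, ade, de, ab}
# should reproduce the ranking d > c > e shown in Table 1.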
The match score ms is calculated as the number of foods that appear both in IF and antc (i.e. intersections) raised to the power of two and divided by the size of IF and the size of antc. We introduce the match score so that rules with antc that are more similar to IF produce recommendations that appear higher. We then sum the scores for every f as its single recommendation score RF[f].

Recommendations produced by AR applied to the example transaction data set {abcd, ade, de, ab} and given items {ab} are provided in Table 1.

Table 1. Recommender algorithm based on association rules applied to the example data set.
Model based on AR:
  1. a ⇒ b 0.67, d 0.67, c 0.33, e 0.33
  2. b ⇒ a 1.00, c 0.50, d 0.50
  3. c ⇒ a 1.00, b 1.00, d 1.00
  4. d ⇒ a 0.67, e 0.67, b 0.33, c 0.33
  5. e ⇒ d 1.00, a 0.50
  6. a, b ⇒ c 0.50, d 0.50
  7. a, c ⇒ b 1.00, d 1.00
  8. a, d ⇒ b 0.50, c 0.50, e 0.50
  9. a, e ⇒ d 1.00
  10. b, c ⇒ a 1.00, d 1.00
  11. b, d ⇒ a 1.00, c 1.00
  12. c, d ⇒ a 1.00, b 1.00
  13. d, e ⇒ a 0.50
  14. a, b, c ⇒ d 1.00
  15. a, b, d ⇒ c 1.00
  16. a, c, d ⇒ b 1.00
  17. b, c, d ⇒ a 1.00
Filtered rules (given items {ab}):
  1. a ⇒ d 0.67, c 0.33, e 0.33
  2. b ⇒ c 0.50, d 0.50
  6. a, b ⇒ c 0.50, d 0.50
  7. a, c ⇒ d 1.00
  8. a, d ⇒ c 0.50, e 0.50
  9. a, e ⇒ d 1.00
  10. b, c ⇒ d 1.00
  11. b, d ⇒ c 1.00
  14. a, b, c ⇒ d 1.00
  15. a, b, d ⇒ c 1.00
Recommendations: d: 2.50; c: 1.96; e: 0.29

3.4. Transactional item confidence

We adapted the implicit social graph method described in Roth et al. (2010) for our food recommendation task, which resulted in a recommender algorithm based on transactional item confidence (TIC). One key difference to our food recommendation problem is that the original email recipient recommendation task for which the implicit social graph was developed assumed two types of relationships between items in a data set (outgoing and incoming emails). Our data set assumes only one type of relationship, which is the co-occurrence of foods in a meal. For that reason, TIC produces recommendations based on the similarity of historically observed transactions to IF and the frequency of foods appearing in those transactions.

During the training step, TIC converts all reported meals to a map of unique meals (or transactions) TM, so that there are no two transactions of the same length containing the same foods (Algorithm 2). For every food f in a transaction m, the confidence (conditional probability) TM[m, f] of f being present in m, given that the rest of the foods from m were observed, is calculated. To do so, the algorithm counts the number cm of reported meals that contain all of the foods from m and divides it by the number cf of reported meals that contain all the foods from m excluding f. This is similar to the confidence measured in AR, but in this case we calculate it only for the full-sized meals that were observed in the data set M, and not for all possible combinations of foods within those meals.

At the recommend step, the algorithm retrieves all transactions containing any of the input foods IF (Algorithm 3).
Algorithm 2: Training the model based on transactional item confidence.
function Train
  input:   M, data set of all meals
  returns: TM, map of unique meals with confidence for every food
  1   TM ← ∅
  2   foreach meal m ∈ M:
  3       if m ∉ TM:
  4           TM[m] ← ∅
  5           cm ← size({m1 : m1 ∈ M & m ⊆ m1})
  6           foreach food f ∈ m:
  7               m2 ← {f1 : f1 ∈ m & f1 ≠ f}
  8               cf ← size({m3 : m3 ∈ M & m2 ⊆ m3})
  9               TM[m, f] ← cm / cf
  10  return TM

Algorithm 3: Recommendations based on transactional item confidence.
function Recommend
  input:   TM, map of unique meals with confidence for every food
           IF, foods selected by a respondent
  returns: RF, list of food recommendations
  1   RF ← ∅
  2   foreach meal m ∈ TM:
  3       if ∃ f1 : f1 ∈ m & f1 ∈ IF:
  4           foreach food f ∈ m & f ∉ IF:
  5               if f ∉ RF:
  6                   RF[f] ← 0
  7               conf ← TM[m, f]
  8               inter ← size({f2 : f2 ∈ m & f2 ∈ IF})
  9               RF[f] ← RF[f] + inter * conf
  10  return RF

Within each of the retrieved transactions m, foods f that are not included in IF are mapped to a score that is calculated as the number of intersections of m with IF (i.e. similarity) multiplied by the food's confidence TM[m, f]. Multiple scores for f measured from different transactions are summed into a final recommendation score RF[f].

Recommendations produced by TIC applied to the example transaction data set {abcd, ade, de, ab} given items {ab} are provided in Table 2.

Table 2. Recommender algorithm based on transactional confidence applied to the example data set.
Model based on TIC:
  1. a 1.00, b 1.00, c 1.00, d 1.00
  2. d 1.00, a 0.50, e 0.50
  3. d 1.00, e 0.67
  4. a 1.00, b 0.67
Filtered rules (given items {ab}):
  1. a 1.00, b 1.00, c 1.00, d 1.00
  2. d 1.00, a 0.50, e 0.50
Recommendations: d: 3.00; c: 2.00; e: 0.50

3.5. Pairwise association rules

Unlike the previous two algorithms, which produce recommendations from association rules and transactions similar to the currently observed IF, the recommender algorithm based on pairwise association rules (PAR) recommends foods that are likely to be observed with any of IF in pairs. During the training stage (Algorithm 4), PAR counts, for every observed food f, the number OD[f] of meals that contain that food. For every observed pair of foods {f, f1}, it also counts the number CD[f, f1] of reported meals that contain that pair.

Algorithm 4: Training the model based on pairwise association rules.
function Train
  input:   M, data set of all meals
  returns: PM, pairwise association rules
  1   OD ← ∅  (food occurrences)
  2   CD ← ∅  (food co-occurrences)
  3   foreach meal m ∈ M:
  4       foreach food f ∈ m:
  5           if f ∉ OD & f ∉ CD:
  6               OD[f] ← 0
  7               CD[f] ← ∅
  8           OD[f] ← OD[f] + 1
  9           foreach food f1 ∈ m & f1 ≠ f:
  10              if f1 ∉ CD[f]:
  11                  CD[f, f1] ← 0
  12              CD[f, f1] ← CD[f, f1] + 1
  13  PM ← [OD, CD]
  14  return PM
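A minimal Python sketch of this counting step is given below; the use of plain dictionaries keyed by foods and food pairs is our own assumption rather than the paper's data structures.

def train_par(meals):
    """Count food occurrences (OD) and pairwise co-occurrences (CD), cf. Algorithm 4."""
    occurrences = {}      # OD[f]: number of meals containing food f
    co_occurrences = {}   # CD[(f, f1)]: number of meals containing both f and f1
    for meal in meals:
        for food in meal:
            occurrences[food] = occurrences.get(food, 0) + 1
            for other in meal:
                if other != food:
                    key = (food, other)
                    co_occurrences[key] = co_occurrences.get(key, 0) + 1
    return occurrences, co_occurrences

# On the example data set {abcd, ade, de, ab} this yields
# occurrences: a -> 3, b -> 2, c -> 1, d -> 3, e -> 2, matching the counts in Table 3.
OD, CD = train_par([{"a", "b", "c", "d"}, {"a", "d", "e"}, {"d", "e"}, {"a", "b"}])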
At the recommend step (Algorithm 5), PAR retrieves pairs CD[inf], where one food inf is observed in IF. For every pair {inf, f}, the algorithm calculates the conditional probability p of f being in a meal, given that inf was observed, as the number of meals that contain that pair, CD[inf, f], divided by the number of meals OD[inf] that contain inf. For example, if item A appeared 10 times in the data set and co-occurred with item B only 2 times, then the conditional probability that item B will occur the next time A is present is 0.2. Multiple probabilities retrieved for f from different associations are summed into its single recommendation score RF[f].

Algorithm 5: Recommendations based on pairwise association rules.
function Recommend
  input:   PM, pairwise association rules
           IF, foods selected by a respondent
  returns: RF, list of food recommendations
  1   RF ← ∅
  2   P ← ∅  (conditional probabilities of foods)
  3   W ← ∅  (conditional probability weights)
  4   OD ← PM[OD]  (food occurrences)
  5   CD ← PM[CD]  (food co-occurrences)
  6   foreach input food inf ∈ IF:
  7       foreach food f ∈ CD[inf] & f ∉ IF:
  8           if f ∉ P & f ∉ W:
  9               P[f] ← ∅
  10              W[f] ← ∅
  11          p ← CD[inf, f] / OD[inf]
  12          P[f] ← P[f] + {p}
  13          W[f] ← W[f] + {OD[inf]}
  14  foreach food f ∈ P:
  15      RF[f] ← sum(P[f]) * sum(W[f])
  16  return RF

As demonstrated in Roth et al. (2010), the number of times items are observed together is an important relevance metric. Indeed, if we simply aggregate the probabilities derived from multiple associations, we lose information as to whether a recommended food has ever been observed with all IF. For example, given two input items C and D, the aggregation may produce two scores R_CD(A) = 0.5, where item A appeared with both C and D, and R_C(B) = 0.5, where item B appeared only with item C. Therefore A should receive a higher score. Likewise, we take into account the frequency of an input food inf that matched a retrieved pair. For example, we may have two equal scores, R_C(A) = 0.5 and R_D(B) = 0.5, where A and B historically appeared only with items C and D respectively; but C appeared 10 times and item D appeared 100 times, which implies that the recommendation produced by D should have a higher score. For these reasons, the algorithm weights the aggregated probabilities P[f] by multiplying them by the summed frequency of inf.

Recommendations produced by PAR applied to the example transaction data set {abcd, ade, de, ab} given items {ab} are provided in Table 3.

4. Methodology

We compare the three algorithms for 20 000 randomly sampled meals, each containing no fewer than two foods, reported by participants of various ages in the UK between 2014 and 2018. We also randomize the order of foods in each meal. We use k-fold (k = 10) cross validation to segment the data set into training and testing sets (Salzberg, 1997). On each step we use nine subsets for training a model, leaving out one subset for testing. The testing procedure is similar to the procedure described in Roth et al. (2010): we sample a few foods from each meal (input foods), leaving the rest (at least one food) to simulate respondents' omitted foods to be guessed by the algorithm. Every trained model makes predictions, starting from an input size of one food and gradually incrementing it to five.

In the course of the evaluation, we plot the precision-recall (PR) curves for every algorithm on every increment. For the purposes of the evaluation, we measure the recall as the percentage of correct predictions out of the total number of foods selected by the respondent, and the precision as the percentage of correct predictions out of the total number of predictions made by the algorithm. We count predictions that were present in the set of foods actually entered by the respondent (excluding input foods) as correct (true positives).
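As a sketch of how these measures could be computed for a single simulated recall, the fragment below counts true positives among the top-k recommendations; the top-k cut-off and the function name are our own illustration, not the paper's evaluation code.

def precision_recall(recommended, truth, k=15):
    """Precision/recall of the top-k recommendations against the foods actually reported.

    recommended: dict mapping food -> recommendation score
    truth: set of foods the respondent actually reported (excluding input foods)
    """
    top_k = sorted(recommended, key=recommended.get, reverse=True)[:k]
    true_positives = sum(1 for food in top_k if food in truth)
    precision = true_positives / len(top_k) if top_k else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall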
We analyze the quality of the top 15 recommendations, which is a slightly larger size than viewed by most users (Burges et al., 2005; Van Deursen & Van Dijk, 2009). As the measure of algorithm ranking quality for every size of input foods we calculate the mean value of Normalized Discounted Cumulative Gain (nDCG) at rank 15 (Burges et al., 2005) as nDCG_15 = DCG_15 / IDCG_15. Discounted cumulative gain is measured as DCG_15 = Σ_{i=1..15} (2^{r(i)} − 1) / log(i + 1), where r(i) is the relevance score of the i-th food. As the relevance score, we use 0 for a wrong prediction and 1 for a correct prediction. Thus, the Ideal Discounted Cumulative Gain (IDCG) in our case is always 1, which is a single correct prediction as the first result. We then select the implementation that demonstrates the highest performance and apply it to the task of recommending foods omitted by respondents with a lower level of specificity, and for ranking search results returned in response to their text queries.

For the implementation of AR we use the FP-growth algorithm (frequent patterns algorithm) (Li & Deng, 2007). FP-growth is an efficient and scalable association rules mining algorithm that is based on building a frequent-pattern tree structure. In contrast to Apriori-like algorithms that serve the same purpose, FP-growth compresses a large database into a much smaller data structure, avoiding costly repeated database scans and the generation of a large number of candidate sets. We use a parallel version of FP-growth implemented in the Apache Spark framework (Li, Wang, Zhang, Zhang, & Chang, 2008; Meng et al., 2016). As a parameter, this implementation accepts the minimum support for an item set to be identified as frequent and the minimum confidence for the generated association rules. To gather as many association rules as possible we set both the minimum support and the minimum confidence to the lowest value (3 × 10^-4) that allows the completion of the mining process of our data set on our machine within a time limit of 5 minutes. The evaluation is conducted on a Mac Pro (2.9 GHz Intel Core i5, 16 GB).

Table 3. Recommender algorithm based on pairwise association rules applied to the example data set.
Model based on PAR:
  1. a 3.0 ⇒ b 2.0, d 2.0, c 1.0, e 1.0
  2. b 2.0 ⇒ a 2.0, c 1.0, d 1.0
  3. c 1.0 ⇒ a 1.0, b 1.0, d 1.0
  4. d 3.0 ⇒ a 2.0, e 2.0, b 1.0, c 1.0
  5. e 2.0 ⇒ d 2.0, a 1.0
Filtered rules (given items {ab}):
  1. a 3.0 ⇒ d 2.0, c 1.0, e 1.0
  2. b 2.0 ⇒ c 1.0, d 1.0
Recommendations: d: 5.8; c: 4.2; e: 1

Table 4. Mean training and recommendation times in milliseconds.
  Model   Training   Mean recommendation
  AR      3905.1     39.5
  PAR     6904.9     2.5
  TIC     93710.2    32.0

Fig. 1. Precision-Recall curves for an input size of 2 foods.
Fig. 2. Precision-Recall curves for an input size of 4 foods.
Fig. 3. The ratio of mean nDCG for the top 15 results to the number of input foods.
Fig. 4. The ratio of recall for the 15 results to the number of input foods for pairwise association rules with the first and the second levels of specificities and manually entered associated food prompts.

5. Results

5.1. General performance

As can be observed from the PR curves (Figs. 1 and 2), PAR produces the largest area under the curve, which increases with the size of input foods. PAR also demonstrates higher nDCG than TIC and AR for all input sizes (Fig. 3).
PAR is the second fastest algorithm to produce a model (after AR) but the fastest to produce a single set of recommendations (Table 4). Based on this comparison, PAR is selected to be used for the implementation of the associated foods recommender algorithm. At the same time, these results demonstrate that the quality of predictions produced by PAR is still relatively low. In the following experiments we aim to improve the performance of the algorithm by exploiting the context of the task it is used for.

5.2. Associated food questions

To compare the efficacy of recommendations produced by the recommender algorithm to the existing hand-coded associated food questions, we go through the same evaluation procedure as above, except that on the recommend step a trained model returns food categories instead of exact foods. In this case, true positives are considered to be foods selected by the respondent (excluding input foods) that belong to one of the food categories predicted by the recommender algorithm. The taxonomy of foods implemented in Intake24 allows control of the specificity of the returned categories. So, we demonstrate the performance of the algorithm in returning the direct parent category of a food (first level, e.g. Flake cereals is the parent category for Choco flakes) and a more generic category (second level, e.g. Breakfast cereals) that is a parent of the category with the first-level specificity (Fig. 4). Since the existing associated food questions do not store any relevance scores, plotting their PR curves or assessing their nDCG is impossible. For that reason we compare the recall of the top 15 recommendations produced by the algorithm to the recall of all hand-coded associated foods rules extracted for given input foods.

In the simulation of respondents omitting foods, hand-coded associated food rules recognize 8.3% of omitted foods at most, whereas the recommender algorithm's peak recall is at 58.0% and 79.1% for the first and the second levels of specificity respectively.

Fig. 5. The ratio of mean nDCG for the top 15 results to the number of input foods for the search results ranked based on pairwise association rules and FRC.

Table 5. Omitted foods captured with pairwise association rules but not with manually entered associated food prompts (input foods -> first-level specificity -> second-level specificity).
  Chicken breast; Fanta; Instant potato -> Gravy -> Sauces, condiments, gravy and stock
  Bananas; Fruit and yoghurt smoothie; Semi skimmed milk -> Sugar -> Sugar, jams, marmalades, spreads and pates
  Blackcurrant squash (juice), e.g. ribena; Heinz beans and sausages -> Brown bread toasted -> Brown, wholemeal and 50:50 bread
  Porridge, made with skimmed milk; Tea; White sugar -> Butter -> Butter/margarine/oils
  Tuna mayo sandwich; Volvic mineral water, still or fizzy -> Chocolate covered biscuits -> Sweet biscuits
  Bread sticks; Coffee -> Dips -> Pickles, olives, dips and dressings
  Cheese and tomato pizza (includes homemade); Raspberries -> Ice cream -> Ice cream & ice lollies
  Cheese sandwich; Tea -> Crisps and snacks -> Crisps, snacks and nuts
  Green Olives; Water -> Wine -> Alcohol
  Bottled mineral water; Chicken breast fillet; Chips, fried; Hot sauce -> Fizzy drinks -> Drinks
  Still energy drink, eg Lucozade Hydroactive, Gatorade, Powerade; Tuna in brine, tinned; White bread sliced -> Mayonnaise -> Sauces/condiments/gravy/stock
Table 5 includes examples of commonly forgotten foods established in the validation of Intake24 (Bradley et al., 2016) but correctly predicted by the recommender algorithm with two levels of specificity. At the time of writing this paper, none of these associations were covered by hand-coded associated foods rules in Intake24. In addition to that, controlling the specificity of the returned recommendations allows us to address the cold-start problem, so that new foods that have not been reported by any respondents can still be captured by their categories. However, the names of some food categories predicted with the second-level specificity could be perceived as too generic (e.g. "Pickles, olives, dips and dressings") and may require being assigned names that would be easier to understand by respondents when displayed in associated food prompts.

5.3. Search ranking

In response to a respondent's text query, the existing Intake24 search algorithm ranks foods based on two types of scores. The first is the matching cost of the known food description against the query. The matching cost is based on several metrics, including the edit distance between matched words (the approximate string matching is performed using Levenshtein automata, Burges et al., 2005); phonetic similarity of words (using a pluggable phonetic encoding algorithm that depends on the localization language, e.g. Soundex or Metaphone for English, Elmagarmid, Ipeirotis, & Verykios, 2007); the relative ordering of words; the number of words not matched; and so forth. The lower the matching cost, the better the food name matches the query. The second score is the likelihood of the food being selected, which is measured by the number of times the food was previously reported. The results are then sorted, first by decreasing food report count (FRC) and then by increasing matching cost.

The evaluation of the associated foods recommender algorithm applied to the task of ranking search results follows the same evaluation procedure, with some variations. In response to each text query that was recorded into the Intake24 database for each reported food (excluding input foods), we retrieve a list of foods using the existing search algorithm. We count foods selected by a respondent as true positives and the rest of the results as false negatives. We compare the mean nDCG produced by the existing search algorithm and by the new search algorithm, where FRC is replaced with PAR. As we can see from Fig. 5, PAR slightly outperforms FRC starting from an input size of two foods, with the gap gradually widening as the number of input foods increases.

6. Conclusions

We aimed to address one of the key issues in automated dietary assessment, which is unintentional under-reporting. To do so, we developed an associated foods recommender algorithm to remind respondents of omitted foods and improve the ranking quality of search results returned in response to respondents' free-text food name queries. The algorithm, in contrast with collaborative and content-based filtering approaches, is independent of personal user profiles and does not require an extensive history of users' preferences or a multitude of item descriptors. Instead, the algorithm uses transactions performed by respondents from a given population to build a collective model of preferences.
We considered three approaches to the implementation of the recommender algorithm, based on an implicit social graph (Roth et al., 2010), association rules (Agrawal et al., 1993), and analyzing pairwise association rules (DuMouchel & Pregibon, 2001). The evaluation, performed on a large data set of real dietary recalls, has demonstrated that the implementation based on pairwise association rules performs better for the defined task. By controlling the specificity of the produced recommendations within a reasonable level we achieved a recall of 79.1%. That is significantly higher than food associations hand-coded by trained nutritionists, the recall for which reached only 8.3%. Where a respondent filled in at least one food, the recommender algorithm improves the ranking of search results.

The algorithm was evaluated on dietary recalls of respondents from the UK. As future work we are planning to analyze how dietary specificities of different regions affect the accuracy of the recommender algorithm. Although the evaluation results described in this paper were produced by analyzing food contents of meals reported by respondents in Intake24, the described methods apply to any recommender tasks where selection of items by the target user can be observed (e.g. email recipients or tag recommendations on community platforms).

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi: 10.1016/j.eswa.2018.07.077.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. In Proc. 1993 ACM SIGMOD international conference on management of data, SIGMOD'93: 22 (pp. 207-216). ACM. doi: 10.1145/170036.170072.
Atkinson, J., Figueroa, A., & Pérez, C. (2013). A semantically-based lattice approach for assessing patterns in text mining tasks. Computación y Sistemas, 17(4).
Bradley, J., Simpson, E., Poliakov, I., Matthews, J. N. S., Olivier, P., Adamson, A. J., & Foster, E. (2016). Comparison of INTAKE24 (an online 24-h dietary recall tool) with interviewer-led 24-h recall in 11-24 year-olds. Nutrients, 8(6), 358. doi: 10.3390/nu8060358.
Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In Proceedings of the 22nd international conference on machine learning (pp. 89-96). ACM. doi: 10.1145/1102351.1102363.
Carrer-Neto, W., Hernández-Alcaraz, M. L., Valencia-García, R., & García-Sánchez, F. (2012). Social knowledge-based recommender system. Application to the movies domain. Expert Systems with Applications, 39(12), 10990-11000. doi: 10.1016/j.eswa.2012.03.025.
Covington, P., Adams, J., & Sargin, E. (2016). Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM conference on recommender systems, RecSys '16 (pp. 191-198). ACM. doi: 10.1145/2959100.2959190.
DuMouchel, W., & Pregibon, D. (2001). Empirical Bayes screening for multi-item associations. In Proceedings of the 7th ACM SIGKDD international conference on knowledge discovery and data mining, KDD'01 (pp. 67-76). ACM. doi: 10.1145/502512.502526.
Elmagarmid, A. K., Ipeirotis, P. G., & Verykios, V. S. (2007). Duplicate record detection: A survey. IEEE Transactions on Knowledge and Data Engineering, 19(1), 1-16. doi: 10.1109/TKDE.2007.250581.
Li, C. R. J., & Deng, Z. H. (2007).
Mining frequent ordered patterns without candidate generation. In Proceedings 4th international conference on fuzzy systems and knowledge discovery, FSKD 2007: Vol. 1 (pp. 402-406). IEEE. doi: 10.1109/FSKD.2007.402.
Li, H., Wang, Y., Zhang, D., Zhang, M., & Chang, E. (2008). Pfp: Parallel fp-growth for query recommendation. In Proceedings of the 2008 ACM conference on recommender systems (pp. 107-114). ACM. doi: 10.1145/1454008.1454027.
Lika, B., Kolomvatsos, K., & Hadjiefthymiades, S. (2014). Facing the cold start problem in recommender systems. Expert Systems with Applications, 41(4 PART 2), 2065-2073. doi: 10.1016/j.eswa.2013.09.005.
Linden, G., Smith, B., & York, J. (2003). Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1), 76-80. doi: 10.1109/MIC.2003.1167344.
Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., et al. (2016). Mllib: Machine learning in Apache Spark. The Journal of Machine Learning Research, 17(1), 1235-1241. doi: 10.1145/2882903.2912565.
Pazzani, M. J., & Billsus, D. (2007). Content-based recommendation systems. In The adaptive web: Vol. 4321 (pp. 325-341). Springer. doi: 10.1007/978-3-540-72079-9_10.
Popescul, A., Pennock, D. M., & Lawrence, S. (2001). Probabilistic models for unified collaborative and content-based recommendation in sparse-data environments. In Proceedings of 17th conference on uncertainty in artificial intelligence (pp. 437-444). Morgan Kaufmann Publishers Inc.
Raeder, T., & Chawla, N. V. (2011). Market basket analysis with networks. Social Network Analysis and Mining, 1(2), 97-113. doi: 10.1007/s13278-010-0003-7.
Roth, M., Ben-David, A., Deutscher, D., Flysher, G., Horn, I., Leichtberg, A., et al. (2010). Suggesting friends using the implicit social graph. In Proceedings of 16th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 233-242). ACM. doi: 10.1145/1835804.1835836.
Rudin, C., Letham, B., Salleb-Aouissi, A., Kogan, E., & Madigan, D. (2011). Sequential event prediction with association rules. In Proceedings of 24th annual conference on learning theory (pp. 615-634).
Ruiz, P. P., Foguem, B. K., & Grabot, B. (2014). Generating knowledge in maintenance from experience feedback. Knowledge-Based Systems, 68, 4-20.
Salzberg, S. L. (1997). On comparing classifiers: Pitfalls to avoid and a recommended approach. Data Mining and Knowledge Discovery, 1(3), 317-328.
Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and metrics for cold-start recommendations. In Proceedings of the 25th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR '02 (pp. 253-260). ACM. doi: 10.1145/564376.564421.
Sene, A., Kamsu-Foguem, B., & Rumeau, P. (2018). Discovering frequent patterns for in-flight incidents. Cognitive Systems Research, 49, 97-113.
Shaw, G., Xu, Y., & Geva, S. (2010). Using association rules to solve the cold-start problem in recommender systems. In Pacific-Asia conference on knowledge discovery and data mining: 6118 (pp. 340-347). Springer. doi: 10.1007/978-3-642-13657-3_37.
Van Deursen, A. J., & Van Dijk, J. A. (2009). Using the Internet: Skill related problems in users' online behavior. Interacting with Computers, 21(5-6), 393-402. doi: 10.1016/j.intcom.2009.06.005.
Zheng, Z., Kohavi, R., & Mason, L. (2001). Real world performance of association rule algorithms.
In Proceedings of 7th ACM SIGKDD international conference on knowledge discovery and data mining, KDD'01 (pp. 401-406). ACM. doi: 10.1145/502512.502572.
work_3lzp6xtt5javxo757ggrue4gse ---- PII: S0957-4174(03)00066-6 An application of expert systems to botanical taxonomy W. Fajardo Contreras a, E. Gibaja Galindo a,*, A. Bailón Morillas b, P. Moral Lorenzo a
a Universidad de Granada, E.T.S. de Ingeniería Informática, Departamento de Ciencias de la Computación e Inteligencia Artificial, C/Periodista Daniel Saucedo Aranda, 18071 Granada, Spain.
b Universidad de Almerı́a, Escuela Politécnica Superior, Departamento de Lenguajes y Computación, Carretera Sacramento s/n, 04120 La Cañada de San Urbano, Almerı́a, Spain. Abstract The implementation of intelligent systems is not particularly widespread in the field of Botany and even less so on Internet. At present, we can currently only find hypertext documents or databases which store unprocessed information. The GREEN (Gymnosperms Remote Expert Executed Over Networks) System is presented as the application of Artificial Intelligence techniques to the problem of botanical identification. GREEN is an Expert System for the identification of Iberian Gymnosperms which allows online queries to be made. It can be consulted in: http://drimys.ugr.es/experto/index.html q 2003 Elsevier Ltd. All rights reserved. Keywords: Gymnosperms; Identification keys; Expert Systems; Artificial Intelligence; World Wide Web; Iberian Peninsula 1. Introduction Plant Taxonomy is a complex, meticulous science which allows taxa to be identified by retrieving information contained on them in a classification system. There are various ways which this identification may be carried out, although the one most commonly used employs dichotomic keys (a process which requires knowledge of botanical terminology and organography). As a result of the complex- ity of this process, botany-related activities are not particularly automated. In fact, the systems which exist are basically databases which store files on the specimens. Artificial Intelligence can offer a more productive approach to these systems by processing the information they contain in order to obtain knowledge which has not been stored explicitly in the database. Within the wealth and variety offered by the plant kingdom, the subject of scientific disclosure has been dealt with using Artificial Intelligence techniques with a specific study of the group of Gymnosperms (Gymnospermae) in the Iberian peninsula. This group was chosen due to the presence of important forest species which it contains. In addition, many of these offer resources or are cultivated as ornamental, which makes their identification useful for non botanical expert users. This has all given rise to GREEN (Gymnosperms Remote Expert Executed Over Networks), a pioneering system in the application of Artificial Intelligence techniques to the field of botany. GREEN is an online decision aid system, resulting in a much greater and faster diffusion of knowledge and a broader receptor spectrum. 2. Material and methods We have divided this study on GREEN into 5 parts: † A first part (Sections 2.1 and 2.2) in which we describe the structure of the system, and defines the main modules which comprise the system and the knowledge gathered. † A second part (Sections 2.3, 2.4, 2.5, 2.6) in which we develop the process for acquiring and validating the knowledge available on the problem domain until a knowledge base is finally obtained. In this part, the processing of imprecise information, common to this type of problem, is also discussed. † A third part (Section 2.7) is devoted to the reasoning process which the System uses. 0957-4174/03/$ - see front matter q 2003 Elsevier Ltd. All rights reserved. doi:10.1016/S0957-4174(03)00066-6 Expert Systems with Applications 25 (2003) 425–430 www.elsevier.com/locate/eswa * Corresponding author. Tel.: þ34-958240468; fax.: þ 34-958243317. E-mail addresses: gibaja@decsai.ugr.es (E.G. Galindo). 
http://www.elsevier.com/locate/eswa † A fourth part (Section 2.8) in which we discuss other important features of the System. † Finally, we finish (Section 3) with conclusions drawn directly from what has been presented in this article and from the bibliography used. 2.1. System structure The system structure is directly derived from the way in which botanical experts work. Dichotomic keys of the type IF – THEN are used for the classification and recognition of plant species. That is to say, that each key leads to either another key or a plant species. In this way, when a botanist wants to classify a particular species, it is possible to distinguish: † A source of knowledge comprising all the available information on each plant species in the form of dichotomic rules. † A process of the use of this knowledge in order to solve the particular problem (keys are searched until a particular species is identified). This description coincides perfectly with that of a Knowledge-Based System and more specifically with that of a rule-based Expert System (Luger and Stubblefield, 1993) with: a Knowledge Base which stores knowledge about the domain of the problem in the form of rules and an Inference Engine which extracts information from the Knowledge Base. In addition to the two essential modules described in the previous paragraph and reflecting the ideal structure of a Knowledge-Based System, the System has: † An uncertainty processing module fitting the nature and subjectivity of the observer. † A justifying module which will explain the results achieved to the System in a language close to the natural language. † We will also add user support modules. † A multimedia database to reference known species. † A glossary of scientific terms to make the System more accessible to users who are not botanical experts. Additionally to design and implement a server which will deal with user (or client) requests and send the results by Internet is needed. In Section 2.2 we outline the process for the design and implementation of the System, detailing the Artificial Intelligence techniques which have been applied. 2.2. Knowledge gathered by the system The first stage is to determine its application domain, that is, the type of knowledge the System will manage. As we have mentioned, the group of Gymnosperms has been chosen from which information is provided on 46 taxa present in the Iberian Peninsula (Castroviejo et al., 1986) both autochthonous and cultivated. In addition to the Knowledge Base, which has been optimized in order to obtain results in the queries, the System gathers information on the System domain in other formats and these are incorporated into a multimedia database which provides images and data about its distribution and ecology and a glossary of botanical terms which make the arduous task of species identification easier and more enjoyable. 2.3. Knowledge acquisition and elicitation The first problem when developing the System is that the information available on the problem domain does not have a structure which may be directly translated to a Computer System. The information is dispersed, incomplete; it is imprecise and unstructured. In order to be able to represent the knowledge in an appropriate way, a process of knowledge acquisition and elicitation is needed, and on which the final functions of the System depend to a large extent. 
In order to begin the acquisition and elicitation process, we begin with different keys (Blanca and Morales, 1991, Font Quer 1979, López González 1982, Garcı́a Rollán, 1983, Krüssmann, 1972). We gather and summarize their information, thereby producing a list of diagnostic characters (descriptors or attributes) at family, genus, species and subspecies level. This hierarchical organization of the information offers the advantage of multilevel answers so that, even with little information, some objective may be reached in the higher levels of the hierarchy. This has a simple explanation: Generally, in order to reach an objective in the higher levels of the hierarchy only a small amount of information is needed, which is also what is observed more easily. Heuristically, this leads us to suppose that the minimum amount of information which the user knows will be that which will allow inference in the highest levels. As information becomes known, the response will be refined until the lower and less general levels of the hierarchy are reached. The more information we have, the more we will know, nevertheless, results may generally be obtained with little information. All information has subsequently been compared by observing nature and consulting herbalist documents and experts. The most important taxonomical characters in Gymnos- perms have been divided into different groups: general aspect of the taxon, characteristics of the leaf, of the branches, of the shoots, monoecious or dioecious, charac- teristics of the fructification (cone and ‘berry’ cone), of the seeds, and ecology of the taxon. With these characters, decision tables have been compiled (Durkin, 1994), which gather the identifying diagnostic characters for each taxon ‘Table 1’. As it is not W.F. Contreras et al. / Expert Systems with Applications 25 (2003) 425–430426 advisable for these tables to have many empty cells, they have been filled in since many were not necessary when the taxon were identified using the traditional method. Although initially filling in a table of this type supposes a greater effort than using dichotomic keys directly, this investment is easily compensated for since these will enable us to apply Artificial Intelligence techniques in order to obtain keys which are different from the standard ones. Botany uses identification keys, whereas applied Artifi- cial Intelligence techniques determine the minimum set of diagnostic characters in order to recognize the different taxa. Artificial Intelligence allows us to find determining characters, which exclude others, and this enables quicker identification than that provided by the traditional method. 2.4. Obtaining the Knowledge Base A set of rules (represented in the Knowledge Base) is obtained automatically from the tables. For this, we apply Artificial Intelligence learning techniques (Machine Learn- ing), in particular we modify the ID3 algorithm proposed by Quinlan (Ignizio, 1991), so that it allows us to obtain more than one rule per objective. For this: † We use Occam’s razor criterion as a heuristic for ramification (simple explanations are preferable to more complex explanations) quantifying this criterion through the use of the concept of entropy. In this way, rules of minimum length are created which exclude irrelevant knowledge, since irrelevant descriptors will not be taken into account. 
† We obtain a Knowledge Base, the content of which is more complete than that of the dichotomic keys, since it contains all the consistent rules which may be obtained according to the selected descriptors in order to determine the objectives.

The rules provide a structuring of the knowledge which the user can understand and which is similar to the dichotomic keys used by expert botanists. When the System presents its conclusions in the form of rules, the user understands the reasoning followed by GREEN perfectly and the user becomes familiar with the reasoning process followed by the human experts who have contributed their knowledge to the System (learning).

2.5. Treatment of uncertainty

Information about the domain is based on what normally happens, but every rule has its exceptions. As it is usual for some data not to be known with absolute certainty, and since expert knowledge is not always defined with complete certainty, errors of measurement may be committed. But this does not mean that the information that we have should be rejected, as not only are experts able to work with uncertainty but good results can also be obtained regardless. Given this large number of sources of uncertainty, GREEN incorporates a module to deal with uncertainty. Uncertainty is modeled using certainty factors (Shortliffe & Buchanan, 1975) since it is a simple computational model which allows experts to estimate confidence in each hypothesis and in the conclusion, facilitating the expression of subjective certainty estimations. This model also enables knowledge to be represented easily in the form of rules and has successfully been used in many other systems.

2.6. Consistency reinforcer

During the development of the Knowledge Base, inconsistencies may arise, mainly due to errors during the knowledge acquisition and elicitation stage or during the design or implementation of the technique for automatically obtaining the rules. Another important note is that GREEN is capable of accommodating uncertainty, which is why inconsistencies about the certainty of results cause an additional impact. Consequently, this makes it necessary for GREEN to incorporate a consistency reinforcer which systematically analyzes each of the rules in the Knowledge Base in order to be able to detect possible errors (Gonzalez and Dankel, 1993) which have been introduced during the design process, thereby guaranteeing that the Knowledge Base has been correctly designed and implemented.

Table 1. Decision table (excerpt). Columns: arrangement of the 'berry' cones; colour of the 'berry' cone; pruinose 'berry' cone; size of the 'berry' cone; no. of seeds in the 'berry' cone.
  Juniperus communis subsp. communis: axillary; bluish-black; pruinose; between 0.6 and 1 cm; 3 seeds
  Juniperus communis subsp. hemisphaerica: axillary; bluish-black; pruinose; between 0.6 and 1 cm; 3 seeds
  Juniperus communis subsp. alpina: axillary; bluish-black; pruinose; between 0.6 and 1 cm; 3 seeds
  Juniperus oxycedrus subsp. oxycedrus: axillary; brown; not pruinose; between 0.6 and 1 cm; 1-3 seeds
  Juniperus oxycedrus subsp. badia: axillary; brown; not pruinose; more than 1 cm; 1-3 seeds
  (...)

2.7. System reasoning

The Inference Engine provides the control mechanism and knowledge inference (a process used in an expert System in order to derive new information from information known). It combines the input facts with the knowledge gathered in the Knowledge Base, thereby responding to user queries.
In order to design the Inference Motor, Ignizio’s BASELINE with forward chaining has been taken as a model (Ignizio, 1991). The Inference Engine incorporated into the System is quite a different module from the Knowledge Base. This differentiation is important since: 1. Knowledge may be represented more naturally. The knowledge model together with the inference process reflects the problem-solving mechanism followed by a human being better than a model which incrusts knowledge within the inference process. 2. The System designers can focus on capturing and organizing the knowledge common to the problem domain independently of its implementation. 3. It enables the content of Knowledge Base to be changed without the need to change the control System so that a) the Knowledge Base may be updated without changing the Inference Engine b) a single Inference Engine may be used to solve different problems. 2.8. Other characteristics As we have already mentioned, GREEN is extremely easy to use (see Fig. 1). The specimen descriptors are grouped into general categories (general appearance, leaf, branch, cone, etc.) with names which are familiar to all users. Within each category, users select the descriptor they know and enter a value for the degree of belief. The System has been provided with two methods for entering the query: basic and advanced. In the basic mode, the user has a set of options, so that the use of certainty Fig. 1. A screen shot for the user interface for introduction of data. Author: Eva Lucrecia Gibaja Galindo. W.F. Contreras et al. / Expert Systems with Applications 25 (2003) 425–430428 Fig. 2. A screen shot for the user interface for identification results. Author: Eva Lucrecia Gibaja Galindo. Fig. 3. A screen shot for the user interface for additional information. Author: Eva Lucrecia Gibaja Galindo. W.F. Contreras et al. / Expert Systems with Applications 25 (2003) 425–430 429 factors is clear. In the advanced mode, the user must manually enter the certainty value of the observation. After entering the data, the inference process is executed and the System gives the user a set of results ordered according to how well they fit the query and an outline of the reasoning followed in order to reach these conclusions. If the user wishes, it is possible to increase the information about the specimen by accessing the multimedia database. GREEN is specifically designed to work on Internet which is why interaction with the user is carried out using forms which send the data and the queries to a remote server. The entire transfer of information online has been minimized so as not to overload the server and in order to obtain a satisfactory System response time for the user. GREEN has been designed independently of the type of botanical database on which it is employed, so that it may be easily adapted in order to classify species other than Gymnosperms. Figs. 1 – 4. 3. Conclusions 1. Computing offers new advantages to the popularization of Botany, including the production of automatic keys or computer-generated keys, which will make it possible for non-experts to identify plants. 2. In this paper, an expert System is presented which will offer the user a new ‘interactive’ species identification method. 3. The GREEN System is a practical tool which may be used online and which will enable different taxa comprising the Iberian Gymnosperm flora to be recognized. References Blanca, G., & Morales, C. (1991). Flora del Parque Natural de la Sierra de Baza. 
Granada: Servicio de Publicaciones de la Universidad de Granada.
Castroviejo, S., Laínz, M., López González, G., Montserrat, P., Muñoz Garmendia, F., Paiva, J., & Villar, L. (1986). Flora Ibérica. Plantas vasculares de la Península Ibérica e Islas Baleares (Vol. 1). Lycopodiaceae-Papaveraceae, Madrid: Real Jardín Botánico.
Durkin, J. (1994). Expert systems. Design and development. London: Prentice Hall International.
Font Quer, P. (1979). Diccionario de Botánica. Barcelona: Labor.
García Rollán, M. (1983). Claves de la flora de España (Vol. I). Península y Baleares, Madrid: Mundi-Prensa.
Gonzalez, A. J., & Dankel, D. D. (1993). The engineering of knowledge-based systems. Theory and practice. Englewood Cliffs, NJ: Prentice-Hall International.
Ignizio, J. P. (1991). Introduction to expert systems. The development and implementation of rule-based expert systems. New York: McGraw-Hill.
Krüssmann, G. (1972). Manual of cultivated conifers. Portland: Timber Press.
López González, G. (1982). La Guía de Incafo de los árboles y arbustos de la Península Ibérica. Madrid: INCAFO.
Luger, G. F., & Stubblefield, W. A. (1993). Artificial intelligence. Structures and strategies for complex problem solving. The Benjamin/Cummings series in artificial intelligence, Redwood City: Benjamin/Cummings.
Shortliffe, E., & Buchanan, B. G. (1975). A model of inexact reasoning in medicine. Mathematical Biosciences, 23, 351-379.
Fig. 4. Other screen shot for the user interface for additional information. Author: Eva Lucrecia Gibaja Galindo.
work_3rdd7ykfkveu5asatfqcf24ohq ----
Introduction The narrative cited above as an observation by the noted psychologist and computer scientist John Holland was in response to my query to him regarding the possibility of using intelligent information technologies for devising self-adaptive organizations. As meaning seems to be a crucial construct in understanding how humans convert information into action [and consequently performance], it is evident that information-processing based ®elds of arti®- cial intelligence and expert systems could bene®t from understanding how humans translate information into mean- ings that guide their actions. In essence, this issue is relevant to the design of both human- and machine-based knowledge management systems. Most such systems had been tradi- tionally based on consensus and convergence-oriented information processing systems, often based on mathema- tical and computation models. Increasing radical discontin- uous change (cf. Huber & Glick, 1993; Nadler, Shaw, & Walton, 1995) that characterizes business environments of today and tomorrow, however, requires systems that are capable of multiple Ð complementary and contradictory Ð interpretations. Despite observations made by Churchman (1971) and Mason and Mitroff (1973), the paradigm of information systems, arti®cial intelligence (AI) and expert systems have yet to address the needs posed by wicked environ- ments that defy the logic of pre-determination, pre- diction and pre-speci®cation of information, control and performance systems (cf. Malhotra, 1997). Wicked Expert Systems with Applications 20 (2001) 7±16PERGAMON Expert Systems with Applications 0957-4174/01/$ - see front matter q 2001 Elsevier Science Ltd. All rights reserved. PII: S 0 9 5 7 - 4 1 7 4 ( 0 0 ) 0 0 0 4 5 - 2 www.elsevier.com/locate/eswa * Tel.: 11-954-916-1585. E-mail address: yogesh.malhotra@brint.com (Y. Malhotra). 1 Considering organizational adaptation for survival and competence as the key driver for most organizational information and knowledge processes (cf. Malhotra, 2000a,b,c), it seemed logical to develop the model of IT-enabled self-adaptive organizations based upon technologies that are often considered as a benchmark for self-adaptive behavior. In this context, genetic algorithms (also referred to as adaptive computation) offer the closest archetype for devising technology-enabled organizations that could possibly exhibit self-adaptive behavior given the dynamically chan- ging environment. By offering the basis for evolution of solutions to parti- cular problems, controlling the generation, variation, adaptation and selection of possible solutions using genetically based processes, it seemed probable that genetic algorithms could offer the basis for self-adaptive evolution of organizations. As solutions alter and combine, the worst ones are discarded and the better ones survive to go on and produce even better solutions. Thus, genetic algorithms breed programs that solve problems even when no person can fully understand their structure. business environments Ð characterized by radical discontinuous change Ð impose upon organizations the need for capabilities for developing multiple mean- ings or interpretations and continuously renewing those meanings given the changing dynamics of the environ- ment. Scholars in business strategy have advocated human and social processes such as `creative abrasion' and `creative con¯ict' (cf. 
Eisenhardt, Kahwajy, & Bourgeois, 1997; Leonard, 1997) for enabling the interpretive flexibility (Nonaka & Takeuchi, 1995) of the organization.
It is also evident that there is an imperative need for relating the static notion of information captured in databases or processed through computing machinery to the dynamic notion of human sense making. More importantly, our current understanding of information as the [indirect] enabler of performance can immensely benefit from unraveling the intervening processes of human sense making that are more directly related to action (or inaction) and resulting performance outcomes (or lack thereof).
Based upon a review of the current state of AI and expert systems research and practice in knowledge management, this article develops the bases for AI and expert systems researchers to develop knowledge management systems for addressing the above needs. Section 2 provides an overview of the state-of-the-art expert systems research and practice issues related to knowledge management, highlighting key relationships with the key theses of the article. Section 3 offers a more current understanding of knowledge management as it relates to organizational adaptability and sustainability by drawing upon information systems and business strategy research. Section 4 highlights the contrast between the computational model of information processing and human sense making while recognizing both as valid meaning making processes. Finally, sense-making bases of human action and performance are discussed in Section 6, followed by conclusions and recommendations for future research in Section 8.
2. State of related research and practice in AI and expert systems
Faced with uncertain and unpredictable business environments, organizations have been turning to AI and expert systems to develop knowledge management systems that can provide the bases for future sustainability and competence. For instance, faced with competition and uncertainty in the finance industry, banks are using neural networks to make better sense of a plethora of data for functions such as asset management, trading, credit card fraud detection and portfolio management (Young, 1999). Similarly, insurance and underwriting industries are relying upon knowledge management and AI technologies to offer multiple channels for rapid response to customers (Rabkin & Tingley, 1999). Many such knowledge management implementations using AI and expert systems rely upon the meaning making and sense-making capabilities of AI and expert systems technologies and humans using them.
In recent years, there have been significant advances in endowing inanimate objects with limited sense-making capabilities characteristic of self-adaptive behavior of humans. For instance, some proponents of 'perceptual intelligence' (cf. Pentland, 2000) have suggested such capabilities derived from a computer's ability to isolate variables of interest by classifying any situation based on categorization heuristics for taking appropriate action. Their suggestion is that once a computer has the perceptual ability to know who, what, when, where and why, then the probabilistic rules derived by statistical learning methods are normally sufficient for the computer to determine a course of action.
However, these models, though helpful for procedural decision making, need to advance beyond the static, pre-specified and pre-determined logic to account for dynamically changing environments that may require fundamental and radical redefinition of underlying rules as well as the behavior of the actors.
Similarly, research on 'perceptual interfaces' has been trying to unravel how people experience information that computers deliver (cf. Reeves & Nass, 2000). This stream of research is based on the premise that human experience with information is caused by stimulation of the senses. While paying attention to the chemical senses (taste and olfaction), the cutaneous senses (skin and its receptors), vision and hearing, this research has yet to take into consideration the interpretive, meaning making and sense-making processes that occur at a more cerebral level. The personal constructivist theory discussed in this article could help better relate information to meaning and consequent behavior (or actions) in the above cases.
Simultaneously, the state-of-the-art research and practice in data mining, often described as "knowledge discovery from databases," "advanced data analysis," and machine learning, has been trying to decipher how computers might automatically learn from past experience to predict future outcomes (Mitchell, 1999). However, as explained later, current thinking in business strategy is imposing upon the organization the need to move beyond prediction of the future to anticipation of surprise (Malhotra, 2000a,b). The most advanced machine learning capabilities, such as those of the most advanced chess-playing computer (cf. Campbell, 1999), are still limited by the pre-specified, pre-determined definition of problems that are solved based on the pre-specified rules of the game. Though interesting, such capabilities may have limited use in the emerging game of strategy that is being redefined as it is being played. In such a game, all "rules are up for grabs" even though computational machinery has yet to evolve to the stage of sensing changes that it has not been pre-programmed to sense and to re-evaluate the rules embedded in the logic devised by human programmers. In contrast to machine learning, humans are endowed with
work_3sxzzd3jjneznfcxhrsajjc32i ---- Data mining with various optimization methods
Vladimir Nedic a, Slobodan Cvetanovic b, Danijela Despotovic c, Milan Despotovic d,*, Sasa Babic e
a Faculty of Phil. and Arts, University of Kragujevac, Jovana Cvijica bb, 34000 Kragujevac, Serbia
b Faculty of Economics, University of Nis, Trg kralja Aleksandra Ujedinitelja 11, 18000 Nis, Serbia
c Faculty of Economics, University of Kragujevac, Djure Pucara Starog 3, 34000 Kragujevac, Serbia
d Faculty of Engineering, University of Kragujevac, Sestre Janjic 6, 34000 Kragujevac, Serbia
e College of Applied Mechanical Engineering, Trstenik, Serbia
Keywords: Traffic noise; Artificial intelligence; Genetic algorithm; Hooke and Jeeves; Simulated annealing; Particle swarm optimization; Software
Abstract Road traffic represents the main source of noise in urban environments that is proven to significantly affect human mental and physical health and labour productivity. Thus, in order to control the noise sound level in urban areas, it is very important to develop methods for modelling the road traffic noise.
As observed in the literature, the models that deal with this issue are mainly based on regression analysis, while other approaches are very rare. In this paper a novel approach for modelling traffic noise that is based on optimization is presented. Four optimization techniques were used in simulation in this work: genetic algorithms, the Hooke and Jeeves algorithm, simulated annealing and particle swarm optimization. Two different scenarios are presented in this paper. In the first scenario the optimization methods use the whole measurement dataset to find the most suitable parameters, whereas in the second scenario optimized parameters were found using only some of the measurement data, while the rest of the data was used to evaluate the predictive capabilities of the model. The goodness of the model is evaluated by the coefficient of determination and other statistical parameters, and the results show a high degree of agreement between measured data and calculated values in both scenarios. In addition, the model was compared with a classical statistical model, and the superior capabilities of the proposed model were demonstrated. The simulations were done using the originally developed user friendly software package. © 2013 Elsevier Ltd. All rights reserved.
1. Introduction
Road traffic noise, along with the noise coming from railways and industries, represents a very important factor regarding environmental pollution in urban areas. The influence of traffic noise on human health has been studied on numerous occasions in recent years (Brink, 2011; Fyhri & Klæboe, 2009; Pirrera, De Valck, & Cluydts, 2010), showing that this kind of annoyance significantly affects both mental and physical health in many ways: causing anxiety, stress, hearing impediments, sleep disturbance, cardiovascular problems, etc. Thus, in order to control the noise sound level in urban areas, it is very important to develop methods for prediction of the traffic noise. Due to the rapid development of means of transportation and road traffic, the influence of the traffic flow structure on the level of traffic noise is an important area of research. Through the monitoring of basic flow parameters and their trends it is possible to predict and monitor noise that appears in a certain part of the transport network. In this way, the effect of noise reduction can be achieved through different modes of traffic management, which is particularly important for human health and environmental improvement.
The first traffic noise prediction (TNP) models date back to the early 1950s. Since then, a large number of methods and models for traffic noise prediction have been developed. Critical reviews of the most used ones are given in Steele (2001) and Quartieri et al. (2009). Most of the TNP models presented in the literature are based on linear regression analysis. The main limit of those models, as concluded in Quartieri et al. (2009) and Guarnaccia, Lenza, Mastorakis, and Quartieri (2011), is "that they do not take into account the intrinsic random nature of traffic flow, in the sense that they do not take care of how vehicles really run, considering only how many they are". More advanced models involve artificial neural networks (ANN) (Cammarata, Cavalieri, & Fichera, 1995; Givargis & Karimi, 2010) and genetic algorithms (Gündoğdu, Gökdağ, & Yüksel, 2005; Rahmani, Mousavi, & Kamali, 2011). The ANN model that was used in Cammarata et al.
(1995) has 3 inputs: the equivalent number of vehicles, which was obtained by adding to the number of cars the number of motorcycles multiplied by 3 and the number of trucks multiplied by 6, the average height of the buildings on the sides of the road, and the width of the road. In order to increase the number of inputs, the authors decomposed the equivalent number of vehicles into the number of cars, the number of motorcycles, and the number of trucks, and got an ANN model with 5 inputs. In terms of the parameters involved in the CoRTN (Calculation of Road Traffic Noise) model (Quartieri et al., 2009), which was initially developed in 1975 by the Transport and Road Research Laboratory and the Department of Transport of the United Kingdom, the ANN model that was used in Givargis and Karimi (2010) has 5 input variables: the total hourly traffic flow, the percentage of heavy vehicles, the hourly mean traffic speed, the gradient of the road, and the angle of view. The authors tested the developed model on data collected on Tehran's roads, and found no significant differences between the outputs of the developed ANN and the calibrated CoRTN model. In Gündoğdu et al. (2005) a genetic algorithm was used to model the traffic noise in relation to traffic composition (vehicles per hour), the road gradient and the ratio of building height to road width. In Rahmani et al. (2011) the proposed model is a function of the total equivalent traffic flow and the equivalent traffic speed. In both papers the authors used MATLAB to find the optimized values of the model parameters.
In this paper an application of four optimization techniques for the prediction of traffic noise is presented. These techniques are: genetic algorithms, the Hooke and Jeeves algorithm, simulated annealing, and particle swarm optimization. The model that is proposed consists of five variables: the number of light motor vehicles, the number of medium trucks, the number of heavy trucks, the number of buses and the average traffic flow speed. All optimized models are tested on data measured on a Serbian road using the originally developed user friendly software package.
2. Problem formulation
The most suitable measure for depicting traffic noise emission is the equivalent sound pressure level (Leq), which is expressed in units of dBA and corresponds to a fictitious noise source emitting steady noise, which in a specific period of time contains the same acoustic energy as the observed source with fluctuating noise. For a number of discrete measurements (N), Leq for time period T is expressed by the following equation:
L_{eq} = 10 \log_{10}\left(\frac{1}{T}\sum_{i=1}^{N} 10^{L_i/10}\right)    (1)
where L_i is the sound pressure level that corresponds to the i-th measurement.
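To make Eq. (1) concrete, the short sketch below computes Leq from a list of discrete A-weighted levels. It assumes equally long measurement intervals, so the 1/T averaging factor reduces to 1/N over the N samples; the sample values in the example are invented, not taken from the paper's 124 surveys.

```python
import math

def leq(levels_db):
    """Equivalent sound pressure level, Eq. (1), for N equally spaced
    discrete samples: Leq = 10*log10((1/N) * sum(10**(Li/10)))."""
    n = len(levels_db)
    energy_sum = sum(10 ** (li / 10.0) for li in levels_db)
    return 10.0 * math.log10(energy_sum / n)

# Example: three hypothetical 15-minute levels in dBA
print(round(leq([62.3, 65.1, 63.7]), 2))
```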
In order to reduce the noise it is necessary to know the functional relationship between the equivalent sound pressure level and the influential parameters. Leq is correlated to numerous parameters, such as the numbers and types of vehicles, their velocities, the type of road surface, the width and slope of the road, the height of buildings facing the road, etc. As mentioned in the introduction, in this paper the following variables were considered: the number of light motor vehicles (LMV), the number of medium trucks (STV), the number of heavy trucks (TTV), the number of buses (BUS) and the average traffic flow speed (Vavg). A brief description of how these variables were measured is given in the following chapter.
3. Data sampling
For traffic data measurement and for noise measurement on the road M5, automatic traffic counters QLTC-10C and a sound level meter Brüel & Kjær type 2230 class 1, respectively, were used. The equivalent sound pressure levels were measured for a time period of 15 min. In order to include a greater number of scenarios that might occur in urban environments, a total of 124 measurements of equivalent noise levels for time periods of 15 min were carried out. Measurements of Leq for a time period of 15 min were performed at various times to include the diversity of the traffic flow as much as possible. Simultaneously, variations in traffic flow, traffic speed and the composition of the traffic flow were measured. For that reason the surveys at the same time also included the following parameters: the number of light motor vehicles, the number of medium trucks, the number of heavy trucks, the number of buses, and the average traffic speed in the given time periods. Measurements were taken in accordance with recommendations for road traffic noise measurement; the microphone was mounted away from reflecting facades, at a height of 1.2 m above the ground level and 7.5 m away from the central line of the road. During the measurements care was taken that climate conditions were as similar as possible (no wind, no rain) in order to eliminate their influence.
4. Mathematical model and methods
The equivalent sound pressure level is supposed to be modeled by the following equation:
L_{eq} = N_1 \cdot \log_{10}(LMV) + N_2 \cdot \log_{10}(STV) + N_3 \cdot \log_{10}(TTV) + N_4 \cdot \log_{10}(BUS) + N_5 \cdot V_{avg}^{N_6} + N_7 \cdot \log_{10}(V_{avg})    (2)
where N_i (i = 1, ..., 7) are coefficients. The problem transforms into finding the coefficients N_i such that the supposed model best fits the experimental data. For that purpose genetic algorithms, the Hooke and Jeeves algorithm, simulated annealing, and particle swarm optimization are used. These techniques are briefly described in the following subchapters.
4.1. Genetic algorithms
Genetic algorithms (Rao, 1996) are a class of evolutionary algorithms that can be used for a large number of different application areas. The principle of genetic algorithms is based on Darwin's theory of evolution, by which the fittest individuals have the best chances to survive. Genetic algorithms operate with a set of individuals (chromosomes) called a population. The information
Fig. 1. Flowchart of the Genetic algorithm workflow.
work_3mu6jnosszdnhdltvokblqh2oy ---- Cloud computing service composition: A systematic literature review
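To illustrate how the coefficients of Eq. (2) can be fitted in practice, the sketch below minimizes the sum of squared errors between measured and modelled Leq using a minimal pure-Python version of one of the paper's four optimizers, the Hooke and Jeeves pattern search. The data layout, starting point, step size and stopping tolerance are assumptions of this illustration rather than the authors' settings, and the exponent form V_avg^N6 follows the reconstruction of Eq. (2) above.

```python
import math

def predict_leq(x, n):
    """Eq. (2): x = (LMV, STV, TTV, BUS, Vavg) for one 15-min survey,
    n = (N1, ..., N7).  Vehicle counts are assumed to be positive."""
    lmv, stv, ttv, bus, v = x
    return (n[0] * math.log10(lmv) + n[1] * math.log10(stv)
            + n[2] * math.log10(ttv) + n[3] * math.log10(bus)
            + n[4] * v ** n[5] + n[6] * math.log10(v))

def sse(n, data):
    """Sum of squared errors between measured and modelled Leq."""
    return sum((meas - predict_leq(x, n)) ** 2 for x, meas in data)

def explore(f, point, step):
    """Exploratory moves: try +/- step on each coordinate, keep improvements."""
    best = list(point)
    for i in range(len(best)):
        for delta in (step, -step):
            trial = list(best)
            trial[i] += delta
            if f(trial) < f(best):
                best = trial
                break
    return best

def hooke_jeeves(f, start, step=0.5, eps=1e-4, shrink=0.5):
    """Minimal Hooke and Jeeves pattern search (derivative-free)."""
    base = list(start)
    while step > eps:
        new = explore(f, base, step)
        if f(new) < f(base):
            while True:           # pattern moves while they keep paying off
                pattern = [2 * a - b for a, b in zip(new, base)]
                trial = explore(f, pattern, step)
                if f(trial) < f(new):
                    base, new = new, trial
                else:
                    base = new
                    break
        else:
            step *= shrink        # no improvement around the base: refine mesh
    return base

# Hypothetical surveys: ((LMV, STV, TTV, BUS, Vavg in km/h), measured Leq in dBA)
data = [((420, 35, 12, 8, 54.0), 66.8),
        ((510, 40, 18, 10, 50.5), 68.1),
        ((260, 22, 6, 5, 58.0), 64.2)]
coeffs = hooke_jeeves(lambda n: sse(n, data), start=[1.0] * 7)
print([round(c, 3) for c in coeffs], round(sse(coeffs, data), 3))
```

Swapping in the genetic algorithm, simulated annealing or particle swarm optimization changes only the optimizer; the objective function stays the same.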
Amin Jula a,*, Elankovan Sundararajan b, Zalinda Othman a
a Data Mining and Optimization Research Group, Centre for Artificial Intelligence, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi, 43600 Selangor, Malaysia
b Centre of Software Technology and Management, Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, UKM Bangi, 43600 Selangor, Malaysia
Keywords: Cloud computing service composition; Systematic literature review; Quality of service parameter; QoS; Research objectives; Importance percentage of quality of service parameters
Abstract The increasing tendency of network service users to use cloud computing encourages web service vendors to supply services that have different functional and nonfunctional (quality of service) features and provide them in a service pool. Based on supply and demand rules and because of the exuberant growth of the services that are offered, cloud service brokers face tough competition against each other in providing quality of service enhancements. Such competition leads to a difficult and complicated process to provide simple service selection and composition in supplying composite services in the cloud, which should be considered an NP-hard problem. How to select appropriate services from the service pool, overcome composition restrictions, determine the importance of different quality of service parameters, focus on the dynamic characteristics of the problem, and address rapid changes in the properties of the services and network appear to be among the most important issues that must be investigated and addressed. In this paper, utilizing a systematic literature review, important questions that can be raised about the research performed in addressing the above-mentioned problem have been extracted and put forth. Then, by dividing the research into four main groups based on the problem-solving approaches and identifying the investigated quality of service parameters, intended objectives, and developing environments, beneficial results and statistics are obtained that can contribute to future research. © 2013 Elsevier Ltd. All rights reserved.
1. Introduction
In recent years, the importance of affordable access to reliable high-performance hardware and software resources and avoiding maintenance costs and security concerns has encouraged large institution managers and stakeholders of information technology companies to migrate to cloud computing. The birth of giant trustworthy clouds has led to a dramatic reduction in apprehension toward such an approach.
There are two challenges to address from the standpoint of the significance of all of the needed service accessibilities and efficient allocation possibilities. First, anticipating all of the possible required services is extremely difficult, particularly for software services.
Designing and providing simple and single fundamental services by different service providers will be considered constitutive and constructive parts of complicated required services and can be utilized in encountering this problem. The second challenge is in selecting the optimum required single services, which are provided by different service providers with different quality of service (QoS) attributes; an optimal combination for forming a complicated service must be composed. Addressing this challenge as an optimization problem is an NP-hard problem because it exposes a very large number of similar single services to different service providers in the cloud.
Service composition is one of the best approaches that has been proposed by researchers and applied by cloud providers; this approach can consider both of the mentioned challenges simultaneously. Selecting appropriate services from a service pool, addressing service composition restrictions, determining the important QoS parameters, understanding the dynamic characteristics of the problem, and having rapid changes in the properties of the services and network are some important issues that must be addressed in this approach to assure the service users' satisfaction.
In the early 2000s and in the years before applications in cloud computing, service composition was introduced and investigated for web services (Kosuga et al., 2002; Milanovic & Malek, 2004; Schmid, Chart, Sifalakis, & Scott, 2002; Singh, 2001). Different artificial and evolutionary algorithms (Ai & Tang, 2008; Canfora, Di Penta, Esposito, & Villani, 2005; Liao et al., 2011; Luo et al., 2011; Tang, Ai, & IEEE, 2010; Wang, 2009; Zhao, Ma, Wang, & Wen, 2012; Zhao et al., 2012) and classic algorithms (Gabrel, Manouvrier, Megdiche, Murat, & IEEE, 2012; Gao, Yang, Tang, Zhang, & Society, 2005; Gekas & Fasli, 2005; Liu, Li, & Wang, 2012; Liu, Wang, Shen, Luo, & Yan, 2012; Liu, Xiong, Zhao, Dong, & Yao, 2012; Liu, Zheng, Zhang, & Ren, 2011; Torkashvan & Haghighi, 2012) have been applied extensively to solve the problem. Designing workflows and frameworks for the composition of single services to achieve specific goals is another approach that has been observed in the field (Chen, Li, & Cao, 2006; He, Yan, Jin, & Yang, 2008; Song, Dou, & Chen, 2011; Song, Dou, & Chen, 2011). Service composition techniques were first applied in cloud computing systems in 2009 (Kofler, ul Haq, & Schikuta, 2009; Zeng, Guo, Ou, & Han, 2009). Afterward, a substantial effort was made in this area. The number of ongoing studies in cloud computing service composition is rapidly increasing due to the increasing tendency of researchers within different areas of expertise to address the problem. A glimpse of reliable published works shows that researchers who are interested in this promising area face a tremendous number of novel ideas, mechanisms, frameworks, algorithms and approaches, and further expanding the scope of problems. Furthermore, several existing datasets, effective QoS parameters and implementation environments with different features and effects should be recognized.
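To see why exhaustive selection quickly becomes infeasible, the toy sketch below enumerates every possible composition over a few hypothetical candidate pools; the service names, prices, availabilities and threshold are invented for the example. The number of plans is the product of the pool sizes, so the same loop over realistic cloud-scale pools is exactly the exponential blow-up that pushes the surveyed work toward heuristic and evolutionary methods.

```python
from itertools import product

# Hypothetical candidate pools: 3 abstract tasks, each with a few concrete
# services described as (name, price, availability).  Real pools are far larger.
candidates = {
    "storage":  [("S1", 1.0, 0.99), ("S2", 0.8, 0.95)],
    "compute":  [("C1", 2.0, 0.98), ("C2", 1.5, 0.97), ("C3", 2.5, 0.999)],
    "delivery": [("D1", 0.5, 0.96), ("D2", 0.7, 0.99)],
}

def brute_force_best(pools):
    """Enumerate every composition (|pool1| * |pool2| * ... plans) and keep
    the cheapest one whose end-to-end availability stays above a threshold.
    The loop grows exponentially with the number of tasks."""
    best = None
    for plan in product(*pools.values()):
        price = sum(service[1] for service in plan)
        availability = 1.0
        for service in plan:
            availability *= service[2]
        if availability >= 0.9 and (best is None or price < best[0]):
            best = (price, availability, [service[0] for service in plan])
    return best

print(brute_force_best(candidates))
```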
Hence, and due to the ab- sence of related surveys, a systematic review on cloud computing service composition is necessary and will help facilitate future re- searches. A systematic review in which the most important aspects of the accomplished researchers must be investigated, and useful information and statistics must be extracted. This paper provides a systematic literature review on state-of- the art approaches and techniques in addressing cloud computing service composition. The discussed advancements and develop- ments of this topic provide useful information to motivate further investigations in this area. Identifying the different objectives of performing cloud computing service composition studies and the reasonable and purposeful classifications of such divergent ap- proaches and mechanisms is a major achievement of the review. Furthermore, this study extracts all of the considered QoS param- eters, introduces the most significant and the least considered parameters and calculates the importance parameter percentages, aiming to eliminate barriers to future research efforts, such as pro- posing comprehensive and reliable mathematical models for calcu- lating composite service QoS values. The goals of this research also include declaring the appropriate, investigated QoS datasets, utiliz- ing software in generating problems for evaluating methods and discussing the most widely used implementation environments. Because the investigated subject is very extensive, it is impossi- ble to include all relevant topics. Hence, some related subjects do not fall within the research scope of this review but will be briefly mentioned. Providing network services requires common lan- guages and protocols and is closely related to service composition. However, services that provide details will not be considered in this work. Strong and independent studies are required to address research methodologies and experimental and statistical perfor- mance evaluation strategies. Describing the accurate significance of the parameters for real cloud customers is also beyond the scope of this report but can be achieved by conducting comprehensive studies, utilizing interviews and questionnaires and adopting appropriate statistical methods. The structure of this paper is organized as follows. After the introduction, the main aims of the paper and research questions will be defined and described in Section 2. Briefly, cloud computing and its characteristics will be explained in Section 3. Cloud com- puting service composition (CCSC) will be defined in Section 4, including its challenges and classified applied approaches. In Sec- tion 5, an extensive discussion on the objectives of the investigated research, their approaches, and utilized datasets is provided, and effective information and statistics are extracted for future re- search. The final sections of the paper contain the conclusions and references. 2. Survey goals and execution Survey goals and research questions are described in Section 2.1, and the statistics on published and presented papers in different journals and conferences are presented in Section 2.2. The authors of the present paper have profited from ‘‘Guidelines for performing Systematic Literature Reviews in Software Engineering’’ (Kitchenham & Charters, 2007) and have also used (Garousi & Zhi, 2013) to conduct and perform this research. 2.1. 
Survey goals and research questions The present research aims at collecting and investigating all of the credible and effective studies that have examined CCSC. More specifically, the extraction of salient features and methods of pa- pers will be considered, and their characteristics will be described. To achieve the above-mentioned goals and identify the methods that have been selected by researchers for their studies and result assessment methods, case studies are covered for which new methods are proposed and datasets and benchmarks are used. Most researchers have considered QoS parameters and proposed objective functions and user trends that are important in designing these functions. The following research questions (RQs) are raised. RQ 1. What are the main goals of the researches? RQ 2. What is the proposed approach and what are the methods used? How have the researchers conducted the research? RQ 3. What datasets or benchmarks are used and what case studies are considered? RQ 4. What evaluating procedures have been used to assess the results in each paper? RQ 5. What other research has been considered in each paper to compare the results? RQ 6. What QoS parameters are accounted for? RQ 7. How can the user requirements and tendencies be consid- ered in the applied objective function? 2.2. Publication statistics In this study, attempts have been made to examine all of the pa- pers that have been published on CCSC in particular and that in- clude novel methods or interesting ideas. To achieve this goal and to answer the research questions in Section 2.1, 34 papers that were published from 2009 to December 2013 were selected from different high-level refereed journals and prestigious international conferences and are considered in Section 4. In each study, if it was required to be familiar with some concepts and methods and to read further on the topic, other books and papers are proposed and referred to. The result of this effort is a comprehensive collec- tion of resources that can provide an acceptable level of concepts and information about the service composition problem in cloud computing and the different views of addressing this problem that are introduced in the literature. 3. Cloud computing 3.1. Cloud definition There are different definitions for cloud computing in the liter- ature, many of which do not cover all of the features of the cloud. In one attempt, Vaquero et al. attempted to extract a comprehensive definition using 22 different compliments (Vaquero, Rodero- Merino, Caceres, & Lindner, 2008). Efforts have been made to stan- dardize the definition of the cloud, in which we accept the cloud definition provided by the National Institute of Standards and Technology (NIST) (Peter Mell, 2011). The NIST cloud computing definition: ‘‘Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, A. Jula et al. / Expert Systems with Applications 41 (2014) 3809–3824 3811 servers, storage, applications, and services) that can be rapidly provi- sioned and released with minimal management effort or service pro- vider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models’’. 
The cloud cannot be considered to be a new concept or technol- ogy that arose in recent years; instead, its root can be found in what John McCarthy described as the ability to provide computa- tional resources as a ‘‘utility’’ (Hamdaqa & Tahvildari, 2012). Based on standard material presented by NIST, cloud computing is com- posed of five main characteristics, and two other characteristics are added based on the literature, with three spanning service models and four models of deployment, which will be described in some detail in the following sections (see Fig. 1). 3.2. Cloud computing characteristics On-demand self-service. A user can request one or more services whenever he needs them and can pay using a ‘‘pay-and-go’’ method without having to interact with humans using an online control panel. Broad network access. Resources and services that are located in different vendor areas in the cloud can be available from an extensive range of locations and can be provisioned Fig. 1. Cloud computing, characteristics, deployment m through standard mechanisms by inharmonious thin or thick clients. The terms ‘‘easy-to-access standardized mechanisms’’ and ‘‘global reach capability’’ are also used to refer to this characteristic (Hamdaqa & Tahvildari, 2012; Yakimenko et al., 2009). Resource pooling. Providing a collection of resources simulates the behavior of a single blended resource (Wischik, Handley, & Braun, 2008). In other words, the user does not have knowl- edge and does not need to know about the location of the pro- vided resources. This approach helps vendors to provide several different real or virtual resources in the cloud in a dynamic manner. Rapid elasticity. Fundamentally, elasticity is another name for scalability; elasticity means the ability to scale up (or scale down) resources whenever required. Users can request different ser- vices and resources as much as they need at any time. This char- acteristic is so admirable that Amazon, as a well-known cloud service vendor, has named one of its most popular and commonly used services the Elastic Compute Cloud (EC2). Measured service. Different aspects of the cloud should automat- ically be controlled, monitored, optimized, and reported at sev- eral abstract levels for the resources of both the vendors and consumers. odels, service pool, and types of services and users. 3812 A. Jula et al. / Expert Systems with Applications 41 (2014) 3809–3824 Multi-Tenacity. This concept is the fifth cloud characteristic that is suggested by the Cloud Security Alliance. Multi-tenacity means that it is essential to have models for policy-driven enforcement, segmentation, isolation, governance, service lev- els, and chargeback/billing for different consumer categories (Espadas et al., 2013). Auditability and certifiability. It is important for services to pre- pare logs and trails to make it possible to evaluate the degree to which regulations and policies are observed (Hamdaqa & Tahvildari, 2012). 3.3. Cloud computing service models Definition 1 (Service). A service is a mechanism that is capable of providing one or more functionalities, which it is possible to use in compliance with provider-defined restrictions and rules and through an interface (Ellinger, 2013). Definition 2 (Platform). A platform is a fundamental computer system that includes hardware equipment, operating systems, and, in some cases, application development tools and user inter- faces on which applications can be deployed and executed. Definition 3 (Infrastructure). 
Infrastructure refers to underlying physical components that are required for a system to perform its functionalities. In information systems, these components can contain processors, storage, network equipment, and, in some cases, database management systems and operating systems. Software as a Service (SaaS). A software or application that is executing on a vendor’s infrastructure is recognized as a service provided that the consumer has limited permission to access; the provision is through a thin client (e.g., a web browser) or a program interface for sending data and receiving results. The consumer is unaware of the application provider’s infra- structure and has limited authority to configure some settings. Platform as a Service (PaaS). In this service model, the service vendor provides moderate basic requisites, including the oper- ating system, network, and servers, and development tools to allow the consumer to develop acquired applications or soft- ware and manage their configuration settings. Infrastructure as a Service (IaaS). The consumer has developed the required applications and needs only a basic infrastructure. In such cases, processors, networks, and storage can be provided by vendors as services with consumer provisions. 3.4. Cloud computing deployment models Public cloud. This approach is the major model of cloud comput- ing; here, the cloud owner provides public services in the vast majority of cases on the Internet based on predefined rules, pol- icies, and a pricing model. Possessing a large number of wide- spread world resources enables providers to offer a consumer different choices to select appropriate resources while consider- ing the QoS. Private cloud. A private cloud is designed and established to pre- pare most of the benefits of a public cloud exclusively for an organization or institute. Setting up such a system because of the utilization of corporate firewalls can lead to decreased security concerns. Because the organization that implements a private cloud is responsible for all of the affairs of the system, facing abundant costs is the blind spot of establishing a private cloud. Community cloud. Based on their similar requirements, con- cerns, and policies, a number of organizations establish a com- munity and share cloud computing to be used by their community member’s consumers. A third-party service pro- vider or a series of community members can be responsible for providing the required infrastructure of the cloud comput- ing. Lowering costs and dividing expenses between community members along with supporting high security are the most important advantages of a community cloud (Dillon, Chen, & Chang, 2010). Hybrid cloud. A combination of two or more different public, pri- vate, or community clouds led to the creation of a different cloud model called hybrid cloud, in which constitutive infra- structures not only keep their specific properties but also require standardized or agreed functionalities to enable them to communicate with each other with respect to interoperabil- ity and portability on applications and data. 4. The cloud computing service composition problem (CCSC) 4.1. CCSC definition Fast development in the utilization of cloud computing leads to publishing more cloud services on the worldwide service pool. Because of the presence of complex and diverse services, a single simple service cannot satisfy the existing functional requirements for many real-world cases. 
To complete a complex service, it is essential to have a batch of atomic simple services that work with each other. Therefore, there is a strong need to embed a service composition (SC) system in cloud computing. The process of service introduction, requesting, and binding, as shown in Fig. 2, can be accounted for in such a way that service providers introduce their available services to the broker to expose them to user requests. However, users also send their service re- quests to the broker, who must select the best service or set of ser- vices on the basis of user requirements and tendencies; the broker wants providers to bind selected services to the users with respect to predefined rules and agreements. Increasing the number of available services causes an increase in the number of similar-function services for different servers. These similar services are located in different places and have dis- tinct values in terms of the QoS parameters. For this reason, SC ap- plies appropriate techniques to select an atomic service among the different similar services that are located on distinct servers to al- low the highest QoS to be achieved according to the end-user requirements and priorities. Because of intrinsic changes in cloud environments, available services, and end-user requirements, SC should be designed dynamically, with automated function capabilities. Therefore, selecting appropriate and optimal simple services to be combined together to provide composite complex services is one of the most important problems in service composition. The SC problem in cloud computing can be defined as determining what atomic simple services should be selected such that the ob- tained complex composite service satisfies both the functional and QoS requirements based on the end-user requirements. Be- cause of various and abundant effective parameters and a large number of simple services provided by many service providers in the cloud pool, CCSC is considered an NP-hard problem (Fei, Dongming, Yefa, & Zude, 2008). In this paper, it is assumed that every composed service (CS) in the cloud consists of n unique services (USs) and has p QoS param- eters. To terminate a CS, a combination of unique services act sequentially in an ordinal workflow (wf), as shown in Fig. 3. Fig. 2. Process of service introduction, requesting, and binding. A. Jula et al. / Expert Systems with Applications 41 (2014) 3809–3824 3813 Define qi (USj) as the value of QoS parameter i for unique service j. Accordingly, the QoS vector of unique service j is defined as Eq. (1): qðUSjÞ¼ ðq1ðUSjÞ; q2ðUSjÞ; . . . ; qpðUSjÞÞ ð1Þ However, if wfk is the workflow of CSk, then it is possible to define Qi (wfk) to obtain the value of QoS parameter i for workflow k. Then, the QoS vector of workflow k is described in Eq. (2): QðwfkÞ¼ ðQ 1ðwfkÞ; Q 2ðwfkÞ; . . . ; Q pðwfkÞÞ ð2Þ 4.2. CCSC challenges The dynamic nature of cloud environments involves occasional and consciously planned changes; these changes expose cloud computing to different challenges in the SC. The most remarkable challenges are the following: Dynamically contracting service providers. The pricing policy of most service providers is determined by service fees based on supply and demand. Thus, mechanisms for updating the table of available resource characteristics must be predicted (Gutierrez-Garcia & Sim, 2013). Addressing incomplete cloud resources. Optimal service selec- tion by a broker depends on the availability of complete and updated information on the services. 
Facing several changes in the service characteristics could result in the loss of some data (Gutierrez-Garcia & Sim, 2013; Yi & Blake, 2010). Describing and measuring QoS attributes of network services. A lack of consensus on the definition and description of network services’ QoS parameters among worldwide cloud service Fig. 3. Cloud service co vendors is still an important challenge for cloud developers. The absence of agreed-upon ways to measure network QoS is another problem that is not completely solved and that should be considered (Qiang, Yuhong, & Vasilakos, 2012). Interservice dependency/conflict. Dependency or conflicts that exist among two or more services leads to a complicated service composition problem. In real-world scenarios, encountering dependency and conflict among services is quite common and should be considered in SC (Strunk, 2010). Security. Designing and apprising security rules, policies, and instructions are among the basic responsibilities of cloud ser- vice vendors. However, a principled self-administered frame- work for supplying and provisioning services in which vendors’ security concerns and policies are observed must be provided (Takabi, Joshi, & Gail-Joon, 2010; Zissis & Lekkas, 2012). 4.3. Current approaches for CCSC The different approaches proposed in the literature can be divided into five distinct categories: including classic and graph- based algorithms (CGBAs), combinatorial algorithms (CAs), ma- chine-based approaches (MBAs), structures (STs), and frameworks (FWs). Different studies that belong to these categories are investi- gated in Sections 4.3.1 to 4.3.5. 4.3.1. Classic and graph-based algorithms CCSC is a problem in which there are many potential solutions, among which one or a limited number of solutions are optimal. Thus, CCSC is known as an optimization problem (Anselmi, Arda- gna, & Cremonesi, 2007; Yu & Lin, 2005). Some types of classic algorithms, such as backtracking and branch-and-bound, can be used to solve optimization problems. These algorithms can guaran- tee that an optimal solution will be found solely by taking expo- nential time complexity (Neapolitan & Naimipour, 2009). Thus, using classic algorithms to solve optimization problems is possible only with some modifications and improvements for decreasing the execution time. To achieve a feasible concrete workflow for CCSC with respect to the consumer QoS requirement, the problem is considered to be equivalent to a multi-dimensional multi-choice knapsack problem (MMKP) in which a parameter called happiness that is calculated based on QoS parameters is used as the utility (Kofler et al., 2009). To solve the MMKP and obtain the benefits of heterologous multi-processing environments (e.g., grid computing (Preve, 2011)), a parallel form of the branch-and-bound algorithm is proposed in which each node of the decision tree is an independent mposition process. 3814 A. Jula et al. / Expert Systems with Applications 41 (2014) 3809–3824 instance of a main routine that could be assigned to separated computational nodes. To evaluate the algorithm, researchers ran- domly generated four different size problems; they executed the problems in serial and parallel modes and compared the results against one another. It is obvious that the most important problem that the proposed algorithm faces is the exponential complexity execution time. 
The performance evaluation of the algorithm could also be completed and could be more reliable if the results were compared with similar algorithms’ results on real-world datasets. To rectify the high execution time problem in the method, a two- phase approach is also proposed in Kofler, Haq, and Schikuta (2010). In this approach, the first phase is similar to the previous phase, whereas for second phase, the previous existing solutions are typically reused for similar requests, and some changes are ap- plied to correspond with the new situation. After designing a new cloud resource data-saving method, a matching algorithm called SMA is applied in Zeng et al. (2009) to check whether the output parameters of a service and the input parameters of another service are compatible with each other. In this algorithm, the service matching problem is mapped to calcu- late the semantic similarity of the different input and output parameters of different services. The matching degrees of each pair of services that are stored in a table lead to the composition of a weighted directed graph in which finding all reachable paths for two nodes can yield all of the possible service composition meth- ods. Researchers proposed an improved Fast-EP algorithm called FastB+�EP to obtain all possible paths in a shorter amount of time. However, two-step graph building and searching leads to increased execution time and decreasing algorithm performance, especially in cases when there is an increase in the size of the problem and the number of required services. In view of all of the cloud participants (e.g., resource vendors and service consumers) as agents and simulating their activities by a colored petri net (Barzegar, Davoudpour, Meybodi, Sadeghian, & Tirandazian, 2011; Jensen, 1995), an agent-based algorithm for CCSC is introduced (Gutierrez-Garcia & Sim, 2010). In the proposed algorithm, the broker agent receives a consumer agent’s require- ments and attempts to find a service vendor agent for each single service. Thereafter, the contract net protocol (Smith, 1981) is used to select proper single services to compose the solution. Using the experimental results, it is proven that the proposed method can achieve solutions in linear time with respect to the number of ser- vice vendor agents and that it can consider parallel activities to ad- dress heterogeneous services. A two-phase service composition approach is proposed for dy- namic cloud environments in which the service performance changes in different transactions (Shangguang, Zibin, Qibo, Hua, & Fangchun, 2011). In the first phase, the cloud model (Li, Cheung, Shi, & Ng, 1998) is applied to change the qualitative value of the QoS parameters to their quantitative equivalent to calculate the uncertainty level. Thereafter, mixed integer programming (MIP) is utilized in the second phase to find appropriate services in which the binary decision vector is used to determine whether a service is selected. Another two-phase method in which MIP is also applied to focus on service performance fluctuations in the dynamic cloud environment by solving CCSC is developed in Zhu, Li, Luo, and Zheng (2012). To decrease the number of candidate single services, some appropriate services are selected based on the history of sin- gle services and using K-means in the first phase of the proposed method. In the second phase, MIP is used to select the best single services among the preliminary selected services. 
The authors proved that their two-phase approach can outperform HireSome (Lin, Dou, Luo, & Jinjun, 2011). Applying linear programming (LP) (Korte & Vygen, 2012; Vanderbei, 2008) for optimizing virtual machine resource alloca- tion in physical servers for video surveillance was proposed in Hossain, Hassan, Al Qurishi, and Alghamdi (2012) for the first time. Composing an optimal set of media services to prepare a composite service over virtual machine resource allocation was also consid- ered. To reach this goal, the authors mapped the virtual machine resource allocation problem to the multi-dimensional bin-packing problem and used LP to solve it. They also suggested the use of a customized best-fit decreasing method (Kou & Markowsky, 1977) to solve the problem, which considerably increases the probability of finding appropriate results compared to LP but cannot guarantee that an optimal solution will be obtained. To evaluate two pro- posed methods, randomly generated service composition problems were considered, and the results are compared to fractional knap- sack mapping and round-robin allocation execution results. For generating composite services with the lowest execution cost a two-phase method is applied in Liu et al. (2012). The first phase includes utilizing a state transition matrix for the analysis of the dynamic process of composite service execution. In this phase, each composite service status was modeled on a state tran- sition diagram, which is used to produce a state transition matrix. The execution cost of each composite service can be calculated using its state transition matrix. In phase two, Business Process Execution Language for Web Services (BPEL4WS) (Huang, Lo, Chao, & Younas, 2006) is used to find an optimal solution. Because the process of BPEL4WS, which is known as the distributed traffic model, is extremely time consuming, researchers divided it into three parts, each of which is executed by an independent computer. To consider the user preferences and different resource charac- teristics bundled together, Liu et al. proposed a three-layer hierar- chical structure; the first layer includes optimal service selection; the second layer is called the criterion layer and includes timeli- ness, stability, and security, based on which services are divided into three categories; and the third layer is designed for the repre- sentation of nine additional QoS parameters (Liu et al., 2012). Next, the phases of the proposed algorithm (SSUP) include generating and normalizing a user requirement matrix, generating a QoS weight specification using the analytic hierarchy process (AHP) (Jeonghwan, Rothrock, McDermott, & Barnes, 2010), generating and normalizing the service attribute matrix, and performing the service-demand similarity calculation. The experimental results indicate that SSUP could outperform AHP when solving the CCSC problem. Worm et al. successfully addressing two antithetical aims for the same provider, namely, the aims of revenue maximization and quality assurance (Worm, Zivkovic, van den Berg, & van der Mei, 2012). The authors based their method on three decision cri- teria: service availability at the decision instant, executing cost, and resting time to deadline. With respect to the response time and with an emphasis on service availability, dynamic program- ming (DP) is used to achieve the main goals of the study, in which it is necessary to save intermediate results to select the best service among all available services. 
In addition, in the case of a deadline, an arrival decision rule would select services with the lowest price for the remaining tasks. The proposed algorithm is executed for static and dynamic service composition but uses real-world data- sets and benchmarks, comparing the results with the previous re- search; calculation time and memory complexity are neglected. Zhou and Mao proposed a cloud-based semantics approach for the composition of web services utilizing a Bayesian decision (Zhou & Mao, 2012). The authors applied a Bayesian approach to antici- pate the semantics of a web service for which a discovery graph is generated based on a cloud to use for implementation. They also obtained relations that are based on the graph that could encoun- ter a Markov chain (Song, Geng, Kusiak, & Xu, 2011). In addition, using an equation for the cosine theorem (Xiangkun & Yi, 2010), it is possible to achieve a similarity of services, which helps to A. Jula et al. / Expert Systems with Applications 41 (2014) 3809–3824 3815 identify the cloud service interface through Web services and de- velop a composition system. The proposed method has been ap- plied to Amazon services and can be compared with an existing approach (Stewart, 2009). On the basis of variability modeling (Sinnema & Deelstra, 2007), cloud feature models (CFMs) are presented as mechanisms to explain cloud services and user requirements together to pre- pare a suitable ground for their cloud service selection process (CSSP) (Wittern, Kuhlenkamp, & Menzel, 2012). A CFM is repre- sented by a directed graph in which the nodes play service roles and the edges denote the relations between services. Further- more, the CSSP should satisfy user objectives and also respect the requirements. To satisfy these goals, the user must enter the objectives and requirements to start the process, and the CSSP uses decision support tools to generate a list of offered services based on the input model. Focusing on the dynamic characteris- tics of cloud computing and updating the graph throughout the execution could help the proposed method to enhance its capabilities. A two-step procedure to satisfy users’ QoS requirements is proposed (Huang, Liu, Yu, Duan, & Tanaka, 2013) in which in the first step, for each user’s request, the proposed method at- tempts to select single services that meet the first two types of user functional requirements and that eliminate the remaining requirements. Then, a virtual network of service providers of se- lected services is generated and modeled by a directed acyclic graph (DAG). In the second step, in the case of a single user’s QoS requirement, it is sufficient to apply an algorithm to deter- mine the shortest path in the DAG. However, for the two QoS parameters, the problem is changed to a multi-constrained path problem (Korkmaz & Krunz, 2001) when this constructing auxil- iary graph FPTAS (CAGF) is used. In the CAGF, an auxiliary graph is used in which the two-weight preliminary DAG is changed to a one-weight DAG (which can be solved as in the previous case) by considering the first QoS parameter as a weight and merging the second parameter into the connectivity among the expanded vertices. Researchers have also proposed a method for consider- ing more than two QoS parameters. 
The execution time and re- turned path weight (the final QoS) are two parameters that are considered when making a comparison with existing DAG-based algorithms to determine the performance of the method; however, generating several graphs in the algorithm is a time- consuming activity that increases the execution time. This inter- esting study could also be enriched by addressing known datasets. To obtain the best set of single services for service composition, three algorithms are proposed (Qi & Bouguettaya, 2013) for gener- ating a service skyline that can be considered a specific set of ser- vice vendors that other possible sets cannot dominate with respect to the QoS parameters (Yu & Bouguettaya, 2010). The basic idea of these algorithms is based on reducing the search space. The first algorithm is called OPA, and it examines all of the service execution plans, one for each phase, and saves the best found solution. The researchers also applied some improvements on OPA for decreas- ing the processing time and the consumed space. DPA is a second algorithm; it uses a tree structure and is based on the progressive examination of service execution plans that are sorted in ascending order based on their score and progressive results output. This pro- gressive method provides the algorithm pipeline capabilities. To overcome the problem of the repetition of nodes and thus over- head, researchers have proposed reducing the parent table data structure. For a third attempt, a linear approach is used to design a bottom-up algorithm (BUA) to address the expensive computa- tional costs of DPA for an increasing number of services. Evaluating the proposed algorithms by using different models of problems is another advantage of this study. HireSome-II is proposed (Dou, Zhang, Liu, & Chen, 2013) to im- prove the reliability of composition QoS values. With HireSome-II, the QoS history of services is investigated for evaluation instead of what the service providers claim. Thus, the context of service com- position is implemented using a two-layer tree structure in which the required service is located in the root, and the nodes are lo- cated in the second layer as candidate services. In addition, the K-means clustering algorithm can be used to categorize the history of each candidate service into two peer groups. Inserting these two peer groups into the tree will provide a three-layer tree, the Task- Service-History Record tree, which identifies the best performance history for candidate services. Based on the required accuracy, additional history layers can be inserted into the tree. Based on experimental results, the performance of the proposed method proved to be superior to the authors’ previous method, HireSome-I, especially in the case of facing abundant candidate services. Despite its strengths, with increasing required services, service providers and history volume, this method will face significant increases in execution time due to the use of K-means for categorizing the history of all candidates. In (Wu, Zhang, Zheng, Lou, & Wei, 2013) a model for QoS-satis- fied predictions in CCSC is proposed based on the hidden Markov model (HMM) (Li, Fang, & Xia, 2014; Zeng, Duan, & Wu, 2010). The proposed model was to ensure customer satisfaction of the composite service QoS, utilizing the basic form of HMM, in which QoS parameters are considered as a state-set, and distinct observa- tion-probability functions are each defined for one of the parame- ters. 
Due to the high computational complexity of the functions, the researchers also applied a two-part, forward–backward structure to reduce the complexity, in which each part includes three stages: initialization, recursion and termination. To evaluate the proposed model, a real cloud computing simulation system was constructed that provides more than 100 different services. The model parameters can be adjusted with a support vector machine-based algorithm. The obtained evaluation error rates were small, but given the complicated structure of the model, which will face many services in the real world, the model can be expected to be confronted with larger error rates and an unacceptable execution time.

A novel method for QoS mapping is proposed in which a set of three rules is used to map QoS specifications and guarantees together in the cloud (Karim, Chen, & Miri, 2013). To overcome the complexity, the analytic hierarchy process (AHP) (Ishizaka & Labib, 2011; Jeonghwan et al., 2010) is utilized, in which customer preferences, possible solutions and the ranking of services correspond to the conditions, the search space and the goals, respectively. The AHP is also used to specify QoS weights to rank and select candidate services. To evaluate the method, a case study was described and the obtained results were discussed. Because the AHP was constructed from a graph that included many-to-many relationships, increasing the number of required services and the search space led to increased time complexity of the method, which is unacceptable in real environments.

Two novel cloud service ranking approaches, CloudRank1 and CloudRank2, are proposed in which the basic strategy ranks the services based on the prediction of their QoS (Zibin, Xinmiao, Yilei, Lyu, & Jianmin, 2013). These two algorithms are composed of three steps, namely similarity computation, preference value computation and ranking prediction, and differ with respect to the treatment of preference values. Because of the negative effects of treating preference values equally on the accuracy of ranking prediction in the first algorithm, a confidence value for service QoS preference values is defined and used in the treatment process of the second algorithm to remove these effects and reach higher accuracy. The researchers have also provided a new dataset called tpds 2012, which includes the response time and throughput of 500 services provided by 300 users, to evaluate the proposed algorithms. The evaluation results indicate that the algorithms outperformed previous approaches; however, based on mathematical considerations, the time complexity of the algorithms is O(n²m) + O(n²m) + O(n²), where n and m are the number of services and users, respectively. Because the number of services is typically much greater than the number of users, apparent increases in the execution time of the algorithms may occur.

4.3.2. Combinatorial algorithms

CCSC is an optimization problem, and a distinctive feature of this problem is that it involves very large search spaces in the quest for optimal results. In addition, the importance of achieving an optimal solution in a shorter time frame forces researchers to seriously consider the use of combinatorial algorithms (Knuth, 2011; Abonyi et al., 2013). Thus, different efforts using different models of combinatorial algorithms have been conducted to solve the CCSC problem, and these efforts are investigated in the following studies.
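The scale of that search space can be illustrated with a small enumeration: with m candidates per task and t tasks there are on the order of m^t possible compositions, so exhaustive scoring quickly becomes impractical. The sketch below uses invented QoS values and assumes, purely for illustration, an additive cost and a multiplicative availability aggregation; it is not taken from any of the surveyed studies.

from itertools import product

# Hypothetical candidates: one list of (cost, availability) pairs per task.
candidates = [
    [(2.0, 0.99), (3.0, 0.999)],              # task 1
    [(1.0, 0.95), (1.5, 0.98), (2.5, 0.99)],  # task 2
    [(4.0, 0.97), (3.5, 0.99)],               # task 3
]

def aggregate(plan):
    """Illustrative aggregation: costs add up, availabilities multiply."""
    cost = sum(c for c, _ in plan)
    avail = 1.0
    for _, a in plan:
        avail *= a
    return cost, avail

# Exhaustive enumeration: the number of plans grows as the product of
# the per-task candidate counts (m**t for m candidates and t tasks).
best = min(product(*candidates), key=lambda plan: aggregate(plan)[0])
print(len(list(product(*candidates))), aggregate(best))  # 12 plans in this toy case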
To simplify the composition of services in cloud computing without priority weights, an algebra of communicating processes (ACP) (Fokkink, 2000) based QoS value aggregation algorithm is applied (Zhang & Dou, 2010) in which an artificial neural network (Haykin, 1998) is used to focus on service consumer requirements without predefined user priority parameters. A three-layer perceptron (Shrme, 2011) neural network is used in this study, employing the back-propagation (Wang, Wang, Zhang, & Guo, 2011) and least-mean-square-error (Sayed, 2003) algorithms to train the network and adjust the weights, which are initialized randomly. In the proposed neural network, the number of neurons in the input layer is equal to the number of QoS parameters considered. The number of neurons in the second layer is also equal to the number of neurons in the first layer, and the third layer has one output neuron. During the training process, the users' priority information is entered into the neural network to obtain suitable values for the weights. Weight parameter efficiency is evaluated by applying the mean square error (Zhou & Bovik, 2009) as the objective function. In this research, datasets and benchmarks are not investigated, and the results are not compared.

By dividing the QoS parameters into three groups, namely ascending, descending, and equal QoS attributes, and by using simple additive weighting to normalize the values of those parameters, a new model for calculating the QoS of composite services (Ye, Zhou, & Bouguettaya, 2011) is proposed. These authors also applied a genetic algorithm to solve the CCSC problem in which a roulette wheel selection algorithm is used to select chromosomes for the crossover operation. The proposed model uses the achieved QoS of services as the fitness value. The results obtained with the proposed method were compared with different existing algorithms, such as LP and the cultural algorithm (Reynolds, 1999), and the method was shown to be more efficient.

To achieve the goal of representing automatic service compositions for dynamic cloud environments, in Jungmann and Kleinjohann (2012) the problem is modeled as a Markov decision process (Arapostathis, Borkar, Fernández-Gaucherand, Ghosh, & Marcus, 1993; Chang & Yuan, 2009). A set of all possible compositions of services and a set of all possible actions for changing compositions by adding new valid services are generated, and the composite service provider is considered an agent. A reward function is also used to learn the optimal policy and the optimal composition for a user request, utilizing utility values such as previous user feedback and former execution information. The reward function results are used by the agent for subsequent decisions and are updated using reinforcement learning techniques (Kaelbling, Littman, & Moore, 1996; Liu & Zeng, 2009). A judgment about this research cannot be made due to a lack of appropriate performance evaluation and comparison of results.

Because of the growing importance of networks in the QoS of cloud service composition, in Klein, Ishikawa, and Honiden (2012) an approach is suggested for considering the non-network and network QoS of services separately. To this end, the authors estimated the real network latency between the desired services and their applicants with low time complexity by proposing a network model that allows services with lower latency to be selected.
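A toy illustration of latency-aware candidate selection of this kind is sketched below; the latency matrix, service names and the greedy per-task choice are invented assumptions for illustration, not the network model of Klein, Ishikawa, and Honiden (2012).

# Hypothetical estimated network latencies (ms) from the requester's region
# to the region hosting each candidate service.
ESTIMATED_LATENCY = {
    "storage-eu": 25.0, "storage-us": 95.0,
    "transcode-eu": 40.0, "transcode-asia": 140.0,
}

# Candidate services per abstract task of the composition.
CANDIDATES = {
    "storage":   ["storage-eu", "storage-us"],
    "transcode": ["transcode-eu", "transcode-asia"],
}

def pick_lowest_latency(candidates, latency):
    """Greedy choice: for every task, keep the candidate whose estimated
    network latency to the requester is smallest."""
    return {task: min(options, key=latency.__getitem__)
            for task, options in candidates.items()}

print(pick_lowest_latency(CANDIDATES, ESTIMATED_LATENCY))
# {'storage': 'storage-eu', 'transcode': 'transcode-eu'}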
Klein et al. also introduced a QoS equation to calculate the network QoS, latency and transfer rate. In the last phase of the approach, a genetic-algorithm-based selection algorithm is designed to apply the proposed models and generate composite services, and its results are compared with the Dijkstra algorithm (Neapolitan & Naimipour, 2009) and random selection. The results of this interesting study could be enriched by the use of real-world datasets.

With respect to self-adaptivity (Denaro, Pezze, & Tosi, 2007) in the service provider system, an improved genetic algorithm is proposed (Ludwig, 2012) in which a clonal selection algorithm (de Castro & Von Zuben, 2002) is used instead of tournament selection to select individuals for the crossover and mutation operators in a more discernible manner. The explanation of the proposed algorithm and the experimental results in the original paper do not go into detail regarding the researchers' work on self-adaptivity.

In Yang, Mi, and Sun (2012) game theory is used to propose a service level agreement (SLA)-based service composition algorithm. In this research, an SLA is defined as a quadruple, including the SLA main information, service vendor and consumer information, service type and parameters, and a set of responsibilities for service vendors and consumers. To establish an SLA, SC is modeled as a multiple dynamic game, called a bid game, in which service vendors and service consumers are players that aim to achieve their goals. In this competitive game, every consumer should promulgate a price for each requested service based on effective parameters and other consumers' proposed prices, and vendors can assign their services according to the received proposed prices with respect to the requested level of quality of service that is agreed upon and signed in the SLA. The reliability of this method is limited due to the lack of comparison with other techniques and with real-world datasets.

A parallel form of the chaos optimization algorithm (Jiang, Kwong, Chen, & Ysim, 2012), called FC-PACO-RM, is proposed to solve the CCSC problem (Fei, Yuanjun, Lida, & Lin, 2013). The researchers attempted to dynamically modify the sequence length based on the evolutionary state of the solutions. They also utilized the roulette wheel selection algorithm before executing the chaos operator to eliminate improper randomly generated solutions and escape from their destructive consequences. Because one of the main goals of this study has been to reduce the execution time, parallelization of the proposed algorithm is also considered. To accomplish this goal, a full-connection topology has been selected because of its high searching capability, together with the message passing interface (MPI) (Barney, 2012). Finally, a novel migration method called Way-Reflex Migration is introduced and applied to reduce the communication overhead of fully connected topologies. Compared with genetic, chaos genetic and chaos optimization algorithms, the proposed method showed better results in terms of best solution fitness and execution time.

In Wang, Sun, Zou, and Yang (2013) particle swarm optimization (PSO) with integer array coding is applied to achieve a fast method for solving the CCSC problem. To achieve this goal, the skyline operator with binary decision variables is used to eliminate improper services from the search space.
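A minimal sketch of the pair-wise dominance test that such a skyline filter relies on is given below; the (response time, cost) values are invented and lower is assumed to be better, so this illustrates the operator itself rather than the code of Wang, Sun, Zou, and Yang (2013).

def dominates(a, b):
    """Service a dominates b if it is no worse in every QoS attribute
    (lower is better here) and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(services):
    """Keep only the services that no other service dominates."""
    return [s for s in services
            if not any(dominates(other, s) for other in services if other is not s)]

# Hypothetical (response_time, cost) vectors for candidate services.
services = [(120, 0.8), (100, 1.2), (150, 0.7), (130, 0.9), (90, 1.5)]
print(skyline(services))  # [(120, 0.8), (100, 1.2), (150, 0.7), (90, 1.5)]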
Because it is necessary to compare all of the services pair-wise when using the skyline operator, its execution time is not acceptable for an increasing number of services. Thus, the researchers used offline skyline services to decrease the response time (Borzsony, Kossmann, & Stocker, 2001). The QWS dataset (Al-Masri & Mahmoud, 2008) and the Synthetic Generator (Wang, Sun, & Yang, 2010) are used to evaluate the algorithms. Comparing the results of the proposed algorithm with another service composition approach called GOA (Ardagna & Pernici, 2007) demonstrates that the proposed method achieves positive results.

Simultaneous optimization of the response time and execution cost of the service composition process motivated Jula et al. to propose the imperialist competitive-gravitational attraction search algorithm (ICGAS), a new memetic algorithm (Moscato & Cotta, 2003), for solving the CCSC problem (Jula, Othman, & Sundararajan, 2013). Because they wanted to address the enormous number of services provided by several service vendors, they attempted to apply the advantages of the exploration capability of the imperialist competitive algorithm (ICA) (Atashpaz-Gargari & Lucas, 2007) and the significant exploitation ability of the gravitational search algorithm (GSA) (Jula & Naseri, 2012) simultaneously. In the proposed algorithm, to balance the execution of the two algorithms and to increase performance, the number of GSA population members is set equal to 20% of the number of ICA population members. The researchers also introduced a new mathematical model for calculating the QoS of the nominated composite services based on user-defined weights for the different QoS parameters. Using a very large real-world dataset to evaluate the algorithm and comparing its results with those of a genetic algorithm, PSO and the classic ICA demonstrated suitable results for ICGAS.

A negative selection algorithm, which is inspired by the negative selection mechanism in biological immune recognition and can eliminate improper solutions during the execution process, is used in Zhao, Wen, and Li (2013). In the proposed algorithm, the solutions and candidate services are represented as vectors and integer strings, respectively. Two different models are applied for calculating the local and global fitness. To compute the fitness of the solutions for a local search, users are responsible for defining their criteria with related weights and a set of constraints, based on which quality vectors are generated for all of the services for the decision-making process. Global fitness is then computed using an equation that is designed based on the local fitness. The performance capability of the proposed algorithm is proven by comparing its results with the results obtained from standard particle swarm intelligence and the clonal selection immune algorithm.

Attempting to select the best services based on the overall quality of services, a model is designed in which customer feedback and objective performance analysis results are considered as two inputs, and the output is the quality of services (Lie, Yan, & Orgun, 2013). Because customer feedback is composed of linguistic variables, a mapping table is applied to change the inputs to fuzzy numbers.
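The mapping step can be pictured as follows; the linguistic terms, triangular fuzzy numbers, weights and centroid defuzzification below are illustrative assumptions and not the table or aggregation used by Lie, Yan, and Orgun (2013).

# Hypothetical mapping from linguistic feedback to triangular fuzzy numbers
# (low, mid, high) on a 0-1 satisfaction scale.
FUZZY_TABLE = {
    "poor":      (0.0, 0.1, 0.3),
    "fair":      (0.2, 0.4, 0.6),
    "good":      (0.5, 0.7, 0.9),
    "excellent": (0.8, 0.95, 1.0),
}

def to_fuzzy(feedback):
    """Turn a list of linguistic ratings into triangular fuzzy numbers."""
    return [FUZZY_TABLE[term] for term in feedback]

def weighted_average(fuzzy_ratings, weights):
    """Aggregate triangular fuzzy numbers with a simple weighted average,
    then defuzzify by taking the centroid of the resulting triangle."""
    total = sum(weights)
    agg = tuple(sum(w * r[i] for w, r in zip(weights, fuzzy_ratings)) / total
                for i in range(3))
    return sum(agg) / 3.0  # centroid of a triangular fuzzy number

feedback = ["good", "excellent", "fair"]
print(weighted_average(to_fuzzy(feedback), weights=[1.0, 2.0, 1.0]))  # about 0.73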
Then, the two input values that were changed to fuzzy ratings and a fuzzy simple additive weighting system (Chou, Chang, & Shen, 2008) are utilized to obtain the model output. The researchers have also applied a filtering mechanism for removing misleading values given by unprofessional customers. The method was evaluated with a case study; however, the results have not been compared to other approaches. This method has also not been applied to select a set of services to assess its effectiveness in service composition. Thus, this method cannot provide proper efficiency in the case of facing many QoS parameters, customer feedbacks and requests.

4.3.3. Machine-based methods

In Bao and Dou (2012) the researchers designed finite state machines (FSMs) (Koshy, 2004) to consider service correlations and rightful task execution sequences; one FSM is used to implement a group of services that have limitations on their invocation and execution sequence, called the CWS community, and another FSM is used for the desired composite service, called the target process. The proposed method is composed of two phases. In the first phase, a composition service tree is created on the basis of the CWS community and the target process. A pruning policy is also applied to eliminate improper paths of the tree and to reduce the processing time. The QoS of all paths in the tree is calculated in the second phase based on user requirements, and the path with the highest QoS is selected as the final solution. To reach this goal, a dichotomous simple additive weighting technique (Liangzhao et al., 2004) is used: in the first part, the scaling of paths is performed by dividing them into negative and positive criteria, and in the second part, the total QoS of all of the paths is calculated. The researchers demonstrated the efficiency of their proposed method by comparing its execution time with that of the enumeration method (Gerede, Hull, Ibarra, & Su, 2004).

4.3.4. Structures

Based on the B+-tree (Bayer & Unterauer, 1977), an index structure has been designed in Sundareswaran, Squicciarini, and Lin (2012) to simplify the process of information insertion and retrieval for cloud service providers. In the proposed structure, different properties, such as the service type, security, QoS, and measurement and pricing units, have specific locations in which to be stored and considered. To increase the speed at which the information management operations are executed and appropriate vendor queries can be answered, service vendors with the same characteristics should be stored together in adjacent rows. The researchers also proposed a query algorithm based on the designed structure to search the providers' database for the best vendors; after finding the k vendors closest to the optimum for each desired service, a refinement procedure is applied to reduce the number of selected vendors and sort them according to their Hamming distances, starting with the optimum and progressing in ascending order to facilitate the selection of better providers. The proposed method is compared with a brute-force search algorithm and has shown almost 100 times better execution speed for solving the CCSC problem with 10,000 service providers.

4.3.5. Frameworks

Pham, Jamjoom, Jordan, and Shae (2010) proposed a new framework for service composition in which a composition agent is responsible for receiving the request and providing service management.
The agent analyzes each request and divides it into the required single services. Using a knowledge base, the service dependencies are identified, and a service recovery section extracts similar service information. A composition is successful provided that all of the required single services are available, and successful compositions are used to update the knowledge base. A packaging engine generates a new software package by combining existing compositions with the new composition and registers it in a service catalog. Finally, a service delivery section utilizes the service catalog information for service provisioning.

Chunqing, Shixing, Guopeng, Bu Sung, and Singhal (2012) proposed a systematic framework for automatic service conflict detection while satisfying policies and user requirements. The first phase of the proposed framework is a conflict analysis section that includes two sub-sections called the comparator and the analyzer. The comparator checks the conformity of the policies and user requirements based on their priorities, and the analyzer is responsible for uncovering the contradictions between the user requirements and their affiliate relations while applying Satisfiability Modulo Theories (SMT) (Moura & Bjørner, 2011). The filter, allocator and solver are the three parts of the second phase of the framework, which is called solution derivation. This section finds appropriate single services and eliminates policy-violating services. It determines a set of appropriate single services for each user's needs and finally concludes with the best composite service with respect to the policies and requirements. It uses the backtracking algorithm (Neapolitan & Naimipour, 2009) for the assigned tasks.

According to the definition of trust as the conceptual probability with which a composite service is expected to execute a task as well as the user expects, a trust-based method and a framework are proposed by Xiaona, Bixin, Rui, Cuicui, and Shanshan (2012) to solve the CCSC problem. Applying trust in the process of service composition is divided into three parts: trustworthiness in service selection, guaranteeing trust in the composition processes, and trustworthiness in binding the generated plan. In the designed framework, the requirement analyzer classifies the requirements of the user into their different elements, including functional and non-functional requirements and expected input–output parameters. The service retriever restores information on the services from the resource pool using query requests. Inappropriate services are eliminated from the candidate list by a service filter, and the name and type of the remaining services are identified by a WSDL analyzer. The clustering component, template generator and binding optimizer are responsible for checking the services' composability, checking the matching interfaces, and evaluating the trust of the binding plan, respectively.

CloudRecommender is a cloud-based, three-layer structured service composition system that was proposed by Zhang, Ranjan, Nepal, Menzel, and Haller (2012). The first layer is a configuration management layer in which a cloud service ontology and a cloud QoS ontology are located, with two parts for uncovering services based on their functionality and QoS parameters; the services are mapped to a relational model and data structure.
Application logic is the second layer, which is implemented to select single services in the form of SQL queries that include criteria, views and stored procedures. The third layer is a widget that divides the user interface into four objects: computing resources, storage resources, network resources, and recommendation. This layer is implemented using RESTful (Richardson, 2007) and several JavaScript frameworks.

A novel framework is proposed for adaptive service selection in mobile cloud computing (Wu et al., 2013). The framework extracts the QoS preferences of the customer immediately after receiving a request. Then, based on the Euclidean distance, some of the nearest customer-preference services are selected and suggested to the service adapter. Finally, the service adapter selects the best service among the suggested services to be assigned to the customer, with respect to device context matching and the effectiveness of the service option. To reach the context-matching service based on the input information, a fuzzy cognitive map model is also utilized in the service adapter module. The weaknesses of this method include that the proposed framework can only be used to select a single service; furthermore, this strategy has not been compared to other approaches.

5. Discussion

5.1. Objectives of the researches

To respond to the first research question RQ1 and achieve a comprehensive view of the topic, it is essential to categorize the goals of the researches. Objective scrutiny of the considered papers yields nine categories, RO1 to RO9, in which each paper can be placed in one or more categories, as described in Table 1 and Fig. 4(a) and (b). Based on Table 1, the largest amount of researchers' attention has been focused on RO3, and there is a large difference in the literature between the attention paid to RO3 and to RO1. This difference may be because decreasing the time complexity of the algorithms is a priority of researchers, and user satisfaction is the second-most important objective for researchers.

Table 1
Desired objectives in the investigated researches.
Reference | Approach
Kofler et al. (2009) | CGBA
Zeng et al. (2009) | CGBA
Gutierrez-Garcia and Sim (2010) | CGBA
Zhang and Dou (2010) | CA
Pham et al. (2010) | FW
Wang et al. (2011) | CGBA
Ye et al. (2011) | CA
Zhu et al. (2012) | CGBA
Hossain et al. (2012) | CGBA
Liu et al. (2012) | CGBA
Liu et al. (2012) | CGBA
Worm et al. (2012) | CGBA
Zhou and Mao (2012) | CGBA
Wittern et al. (2012) | CGBA
Jungmann and Kleinjohann (2012) | CA
Ludwig (2012) | CA
Yang et al. (2012) | CA
Bao and Dou (2012) | MBM
Sundareswaran et al. (2012) | ST
Chunqing et al. (2012) | FW
Xiaona et al. (2012) | FW
Zhang et al. (2012) | FW
Huang et al. (2013) | CGBA
Qi and Bouguettaya (2013) | CGBA
Wang et al. (2013) | CA
Jula et al. (2013) | CA
Zhao et al. (2013) | CA
Dou et al. (2013) | CGBA
Wu et al. (2013) | CGBA
Karim et al. (2013) | CGBA
Zibin et al. (2013) | CGBA
Wu et al. (2013) | FW
Lie et al. (2013) | CA
Fei Tao et al. (2013) | CA
CGBA, classic and graph-based algorithms; CA, combinatorial algorithms; MBM, machine-based methods; ST, structures; FW, frameworks.

The objective categories of the research are defined as follows:
1. User requirement satisfaction (RO1). Strictly abiding by customers' requirements and providing facilities for them to determine or describe their needs would make them more satisfied.
2. Qualitative to quantitative transformation of QoS parameters (RO2). When accurately considering the qualitative QoS parameters and applying them in decision making, it is important to transform them into quantitative values using reliable methods.
3. Algorithm improvements (RO3). There are many different heuristic and non-heuristic algorithms introduced by algorithm designers and applied to solve NP-hard problems. The effort to improve the current algorithms and specialize them for CCSC to obtain the best solutions or reduce the execution time is one of the most often investigated objectives in the literature.
4. Introducing data storage and indexing structures (RO4). Possessing appropriate and well-defined data structures and databases can play an important role in the design of an efficient algorithm. Using a suitable indexing method is also useful for increasing the search speed, especially when the number of cases is very high.
5. Self-adaptability, automaticity, increasing reliability and accuracy, and quality assurance (RO5). Establishing automated and self-adaptable service brokers is unavoidable (Denaro et al., 2007) because of the increasing complexity, the number of requests, the number of available services in the pool, their diversity, and the limitation of human abilities. One of the main factors that attracts customers and retains them in utilizing cloud computing is reliability. Because of the importance of providing a reliable and self-adaptable service composition organization, the most important part of a cloud is its direct contact with customers; researchers must consider this direct contact more seriously than before.
6. Proposing an improved QoS mathematical model (RO6). Calculating a QoS value for composite services requires a mathematical model in which all aspects, parameters, user requirements and tendencies are investigated. To reach this goal, some researchers have attempted to present improved models that focus more on these objectives.
7. Revenue maximization (RO7). Encouraging service providers to expose their high-quality services depends on the ability to amass significant profits. Thus, revenue maximization can be noted as an important fundamental aim.
8. Optimization of the service discovery process (RO8). If the service composer's policy is not to register services based on predefined requirements, then it must discover the required available simple services in the network. It is critical that this type of policy use optimal discovery methods.
9. Proposing new frameworks and structures (RO9). Reaching some basic goals, e.g., the definition of new roles, requires changes in the framework of CCSC and in its structures. In some cases, to achieve a goal, there could be a need to design a new framework. This scenario has encouraged some researchers to design new frameworks or to change existing ones.

Fig. 4. (a) Importance percentage of objective categories and (b) number of repetitions of objective categories.

5.2. Applied approaches

To address RQ2, all of the proposed or applied approaches are divided into five distinct categories, as mentioned in Section 4.3. These categories and their statistics are summarized in Table 2 and Fig. 5(a) and (b), from which it can be inferred that the highest and lowest percentages of usage are seen in the classic and graph-based algorithms (52%) and the combinatorial algorithms (24%), respectively.
5.3. Investigated datasets

The answers obtained for the third question RQ3 are not promising. Possessing different datasets, each of which can support several QoS parameters and predefined composition problems, is useful and unavoidable for evaluating the proposed approaches and comparing their results. Unfortunately, the number of datasets that are available to all in this research domain is very low and is limited to three datasets, QWS (Al-Masri & Mahmoud, 2009), WS-DREAM (Zibin, Yilei, & Lyu, 2010) and tpds 2012 (Zibin et al., 2013), and an unknown randomly generated dataset RG (Shangguang et al., 2011). Researchers have also, though rarely, used a synthetic generator. The datasets used in each study are listed in Table 2.

5.4. Significance of QoS parameters and their percentage

Based on the literature review presented in this paper, the priority and importance percentage of the QoS parameters and their relevant factors have not yet been studied comprehensively. Thus, an analysis of the QoS parameters considered in the observed papers and their statistics is essential. In this section, we attempted to extract the importance of the different QoS parameters and their percentages in this field of research.

As specified in Table 2 and to answer RQ6, according to their significance and priority, most researchers have accounted for different QoS parameters, although others have neglected them. Based on the abovementioned parameters and their frequency of occurrence in the literature, it is important to obtain the most important and effective QoS parameters and their importance percentages. To reach this goal, using Eq. (3), the number of occurrences of a parameter has been counted separately and divided by the sum of the numbers of occurrences of all parameters. The importance percentage of each parameter is obtained by multiplying its calculated value by 100. The result obtained is the ratio of the importance percentage of every parameter to all of the other parameters. The numbers of occurrences of all of the parameters and their importance percentages are shown in Fig. 6(a) and (b).

imp_percentage(i) = occurr_no(i) / Σ_{j=1}^{param_no} occurr_no(j)    (3)

where imp_percentage(i) is the importance percentage of QoS parameter i, occurr_no(i) is the number of repetitions of QoS parameter i in the investigated papers, and param_no is the number of observed QoS parameters in the literature.
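As a small worked example of Eq. (3), the following sketch, with invented occurrence counts, computes importance percentages exactly as defined above.

from collections import Counter

# Hypothetical occurrence counts extracted from a set of surveyed papers.
occurrences = Counter({"Cost": 12, "Response time": 11, "Availability": 8,
                       "Reliability": 6, "Reputation": 4})

total = sum(occurrences.values())
importance = {param: 100.0 * count / total for param, count in occurrences.items()}
for param, pct in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{param}: {pct:.1f}%")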
There could be another way to look at the importance percentage of the QoS parameters. It is possible to consider this issue in terms of the ratio of the frequency of occurrence of the parameters to the number of papers. To increase the total number of papers, it is possible to account for all of the papers, including those that did not mention any parameters, or to exclude them. To reach the results, it is sufficient to divide the number of occurrences of each parameter by the total number of papers. The obtained results are shown in Fig. 7(a) and (b) when including and excluding papers that do not consider the QoS parameters, respectively.

Table 2
Utilized tools, datasets and QoS parameters.
Reference | Tools | Dataset | QoS considered parameters
Kofler et al. (2009) | Kepler Workflow tool, CORBA, C++ | – | Response time (RT), Cost
Zeng et al. (2009) | Visual C++ 6, PostgreSQL 8.4 | – | Response time (RT), Availability (Avail), Reliability (Reli)
Gutierrez-Garcia and Sim (2010) | Not mentioned | – | Not mentioned
Zhang and Dou (2010) | Not mentioned | – | Not mentioned
Pham et al. (2010) | Not mentioned | – | Not mentioned
Wang et al. (2011) | Matlab 7.6 | WS-DREAM, RG | Response time (RT), Cost, Throughput (Throu), Reputation (Reput), Availability (Avail), Reliability (Reli)
Ye et al. (2011) | Not mentioned | – | Response time (RT), Cost, Availability (Avail), Reputation (Reput)
Zhu et al. (2012) | Visual C#.NET | – | Not mentioned
Hossain et al. (2012) | Not mentioned | – | Response time (RT), Cost
Liu et al. (2012) | Not mentioned | – | Cost
Liu et al. (2012) | Not mentioned | WS-DREAM | Not mentioned
Worm et al. (2012) | Not mentioned | – | Response time (RT), Cost, Availability (Avail)
Zhou and Mao (2012) | Not mentioned | – | Not mentioned
Wittern et al. (2012) | Researchers built a tool based on the Eclipse modeling framework | – | Not mentioned
Jungmann and Kleinjohann (2012) | Not mentioned | – | Not mentioned
Ludwig (2012) | Java | – | Response time (RT), Cost, Availability (Avail), Reliability (Reli)
Yang et al. (2012) | Not mentioned | – | Not mentioned
Bao and Dou (2012) | Not mentioned | – | Response time (RT), Cost, Reliability (Reli), Availability (Avail), Reputation (Reput)
Sundareswaran et al. (2012) | C | – | Not mentioned
Chunqing et al. (2012) | Java, Cauldron, zChaff solver | – | Cost
Xiaona et al. (2012) | Not mentioned | Seekda | Availability (Avail), Durability (Dur)
Zhang et al. (2012) | JavaScript, RESTful | – | Not mentioned
Huang et al. (2013) | Not mentioned | – | Not mentioned
Qi and Bouguettaya (2013) | Java | Using Synthetic Generator | Response time (RT), Cost
Wang et al. (2013) | Matlab 7.6, Lp-Solve 5.5 | QWS, Synthetic Generator | Not mentioned
Jula et al. (2013) | Visual C#.NET | WS-DREAM | Response time (RT), Cost
Zhao et al. (2013) | Not mentioned | – | Response time (RT), Cost, Availability (Avail), Reliability (Reli)
Dou et al. (2013) | Not mentioned | – | Cost, Latency, Reputation (Reput)
Wu et al. (2013) | Not mentioned | – | Not mentioned
Karim et al. (2013) | Not mentioned | – | Cost, Response Time (RT), Security (Secur), Reputation (Reput), Availability (Avail), Reliability (Reli), Durability (Dur), Data Control (DC)
Zibin et al. (2013) | Planet-lab, Axis2 | tpds 2012 | Response Time (RT), Throughput (Thr)
Wu et al. (2013) | Not mentioned | – | Not mentioned
Lie et al. (2013) | Not mentioned | – | Not mentioned
Fei Tao et al. (2013) | Not mentioned | – | Response time (RT), Cost, Reliability (Reli), Energy, Trust, Maintainability (Maint), Function similarity (FS)

Fig. 5. (a) Approach categories and (b) number of papers in each category.
Fig. 6. (a) Percentage of repetition of QoS parameters and (b) number of repetitions of QoS parameters.
Fig. 7. (a) Percentage of repetition of QoS parameters in all investigated papers and (b) percentage of repetition of QoS parameters in all investigated papers, excluding those that did not mention any parameters.

6. Conclusion and future works

Showcasing pertinent achievements and findings with a comprehensive review generally increases the research efforts and papers in that particular scientific field. Proposing novel techniques and approaches for addressing cloud computing service composition from different aspects, in addition to addressing the lack of comparable activities, were the strong motivating factors for preparing this systematic literature review.
This paper has provided a complete definition of the CCSC in combination with its associated concepts and a comprehensive analysis of the different applied algorithms, mechanisms, frameworks and techniques extracted from 34 authentic published papers spanning 2009–2013. The achievements of this review shed light on the research grounds of CCSC for future studies.

This investigation demonstrates that all cloud computing service composition innovations and improvements can be categorized into the following five groups: classic and graph-based algorithms, combinatorial algorithms, machine-based methods, structures and frameworks. The most widely applied category is the classic and graph-based algorithms group (52%), and the least-used categories are machine-based methods and structures (3% and 4%, respectively). The objectives of these reports can also be divided into 9 categories, among which algorithm improvements (RO3) and user requirement satisfaction (RO1) have attracted the most attention (36% and 25%, respectively), while revenue maximization (RO7) has been the least important objective (2%).

Counting the number of QoS parameter occurrences indicated that 14 different parameters have been considered in the literature, among which service cost and response time are the most repeated ones (24% and 22%, respectively). Calculating the importance percentage of the QoS parameters also revealed that the importance percentages of the two mentioned parameters, when including and excluding papers that do not consider the QoS parameters, were 44% and 41% and 88% and 82%, respectively. To evaluate the proposed approaches, 4 QoS datasets have been previously identified, WS-DREAM, QWS, tpds 2012 and RG, the first three of which are extracted from the real world and the last of which is generated randomly. Synthetic generators and Seekda are also two applications that have been applied for generating QoS values at runtime.

With respect to the findings in this paper, achieving certain goals is of utmost importance for the planning of future work. Aiming to prepare an identical, competitive environment for comparing the proposed algorithms and approaches, it is indispensable to provide a set of differently sized, standard problems and a comprehensive QoS dataset. The dataset should include a great number of unique services and service providers and encompass cost, response time, availability, reliability and reputation as significant QoS parameters. Another essential research goal is to focus on designing comprehensive mathematical models for calculating the QoS values of composite services that cover all of the involved parameters and their importance percentages. Proposing real-time algorithms that can obtain a few optimal composite services for the given requests would represent a significant achievement in the field. Furthermore, the less considered objectives (e.g., RO2, RO6, RO7, and RO8) should be regarded. Finally, acknowledging the stunning growth in mobile computing, future research efforts should be directed towards this forward-looking area.

References

Ai, L. F., & Tang, M. L. (2008).
A penalty-based genetic algorithm for QoS-aware web service composition with inter-service dependencies and conflicts. New York: IEEE. Al-Masri, E., & Mahmoud, Q. H. (2008). Investigating web services on the world wide web. In Proceedings of the 17th international conference on world wide web (pp. 795–804). Beijing, China: ACM. Al-Masri, E., & Mahmoud, Q. H. (2009). Discovering the best web service: A neural network-based solution. In systems, man and cybernetics, 2009. SMC 2009. IEEE international conference on (pp. 4250–4255). Anselmi, J., Ardagna, D., & Cremonesi, P. (2007). A QoS-based selection approach of autonomic grid services. In Proceedings of the 2007 workshop on service-oriented computing performance. Aspects, issues, and approaches (pp. 1–8). Monterey, California, USA: ACM. Arapostathis, A., Borkar, V. S., Fernández-Gaucherand, E., Ghosh, M. K., & Marcus, S. I. (1993). Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM Journal on Control and Optimization, 31, 282–344. Ardagna, D., & Pernici, B. (2007). Adaptive service composition in flexible processes. IEEE Transactions on Software Engineering, 33, 369–384. Atashpaz-Gargari, E., & Lucas, C. (2007). Imperialist competitive algorithm: An algorithm for optimization inspired by imperialistic competition. In Evolutionary Computation, 2007. CEC 2007. IEEE Congress on (pp. 4661–4667). Bao, H. H., & Dou, W. C. (2012). A QoS-aware service selection method for cloud service composition. In 2012 IEEE 26th international parallel and distributed processing symposium workshops & Phd Forum (pp. 2254–2261). New York: IEEE. Barney, B. (2012). Message Passing Interface (MPI). In Lawrence Livermore National Laboratory (Vol. 2013). Barzegar, S., Davoudpour, M., Meybodi, M. R., Sadeghian, A., & Tirandazian, M. (2011). Formalized learning automata with adaptive fuzzy coloured petri net; an application specific to managing traffic signals. Scientia Iranica, 18, 554–565. Bayer, R., & Unterauer, K. (1977). Prefix B-trees. ACM Transaction on Database Systems, 2, 11–26. Borzsony, S., Kossmann, D., & Stocker, K. (2001). The Skyline operator. In Data Engineering, 2001. Proceedings. 17th international conference on (pp. 421–430). Canfora, G., Di Penta, M., Esposito, R., & Villani, M. L. (2005). An approach for QoS- aware service composition based on genetic algorithms. New York: Assoc Computing Machinery. Chang, W.-L., & Yuan, S.-T. (2009). A Markov-based collaborative pricing system for information goods bundling. Expert Systems with Applications, 36, 1660–1674. Chen, L., Li, M. L., & Cao, J. (2006). ECA rule-based workflow modeling and implementation for service composition. IEICE Transactions on Information and Systems, E89D, 624–630. Chou, S.-Y., Chang, Y.-H., & Shen, C.-Y. (2008). A fuzzy simple additive weighting system under group decision-making for facility location selection with objective/subjective attributes. European Journal of Operational Research, 189, 132–145. Chunqing, C., Shixing, Y., Guopeng, Z., Bu Sung, L., & Singhal, S. (2012). A Systematic Framework Enabling Automatic Conflict Detection and Explanation in Cloud Service Selection for Enterprises. In Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on (pp. 883–890). de Castro, L. N., & Von Zuben, F. J. (2002). Learning and optimization using the clonal selection principle. IEEE Transactions on Evolutionary Computation, 6, 239–251. Denaro, G., Pezze, M., & Tosi, D. (2007). Designing self-adaptive service-oriented applications. 
In Autonomic Computing, 2007. ICAC ‘07. Fourth International Conference on (pp. 16–16). Dillon, T., Chen, W., & Chang, E. (2010). Cloud computing: issues and challenges. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on (pp. 27–33). Dou, W., Zhang, X., Liu, J., & Chen, J. (2013). HireSome-II: Towards privacy-aware cross-cloud service composition for big data applications. IEEE Transactions on Parallel and Distributed Systems, 1. Ellinger, R. S. (2013). Governance in SOA Patterns (white paper). In The Northrop Grumman Corporation for Consideration by OASIS SOA Reference Architecture Team (pp. 1–11). Espadas, J., Molina, A., Jiménez, G., Molina, M., Ramírez, R., & Concha, D. (2013). A tenant-based resource allocation model for scaling software-as-a-service applications over cloud computing infrastructures. Future Generation Computer Systems, 29, 273–286. Fei, T., Dongming, Z., Yefa, H., & Zude, Z. (2008). Resource service composition and its optimal-selection based on particle swarm optimization in manufacturing grid system. IEEE Transactions on Industrial Informatics, 4, 315–327. Fei, T., Yuanjun, L., Lida, X., & Lin, Z. (2013). FC-PACO-RM: a parallel method for service composition optimal-selection in cloud manufacturing system. IEEE Transactions on Industrial Informatics, 9, 2023–2033. Fokkink, W. (2000). Introduction to Process Algebra. Springer-Verlag New York, Inc.. Gabrel, V., Manouvrier, M., Megdiche, I., & Murat, C.IEEE. (2012). A new 0-1 linear program for QoS and transactional-aware web service composition. New York: IEEE. Gao, A. Q., Yang, D. Q., Tang, S. W., Zhang, M., & Society, I. C. (2005). Web service composition using integer programming-based models. Los Alamitos: IEEE Computer Soc. Garousi, V., & Zhi, J. (2013). A survey of software testing practices in Canada. Journal of Systems and Software, 86, 1354–1376. Gekas, J., & Fasli, M. (2005). Automatic Web service composition based on graph network analysis metrics. In R. Meersman, Z. Tari, M. S. Hacid, J. Mylopoulos, B. Pernici, O. Babaoglu, H. A. Jacobsen, J. Loyall, M. Kifer, & S. Spaccapietra (Eds.). On the Move to Meaningful Internet Systems 2005: Coopis, Doa, and Odbase, Pt 2, Proceedings (3761, pp. 1571–1587). Berlin: Springer-Verlag Berlin. Gerede, E., Hull, R., Ibarra, O. H., & Su, J. (2004). Automated composition of e- services: lookaheads. In Proceedings of the 2nd International Conference on Service Oriented Computing (pp. 252–262). New York, NY, USA: ACM. Gutierrez-Garcia, J. O., & Sim, K. (2013). Agent-based cloud service composition. Applied Intelligence, 38, 436–464. Gutierrez-Garcia, J. O., & Sim, K. M. (2010). Agent-based service composition in cloud computing. In T. H. Kim, S. S. Yau, O. Gervasi, B. H. Kang, A. Stoica, & D. Slezak (Eds.). Grid and Distributed Computing, Control and Automation (121, pp. 1–10). Berlin: Springer-Verlag Berlin. Hamdaqa, M., & Tahvildari, L. (2012). Cloud Computing Uncovered: A Research Landscape. In H. Ali & M. Atif (Eds.), Advances in Computers (86, pp. 41–85). Elsevier. Haykin, S. (1998). Neural Networks: A Comprehensive Foundation. Prentice Hall PTR. He, Q., Yan, J., Jin, H., & Yang, Y. (2008). Adaptation of web service composition based on workflow patterns. In A. Bouguettaya, I. Krueger, & T. Margaria (Eds.). Service-Oriented Computing – ICSOC 2008, Proceedings (5364, pp. 22–37). Berlin: Springer-Verlag Berlin. Hossain, M. S., Hassan, M. M., Al Qurishi, M., & Alghamdi, A. (2012). 
Resource Allocation for Service Composition in Cloud-based Video Surveillance Platform. New York: IEEE. Huang, C.-L., Lo, C.-C., Chao, K.-M., & Younas, M. (2006). Reaching consensus: a moderated fuzzy web services discovery method. Information and Software Technology, 48, 410–423.
Huang, J., Liu, Y., Yu, R., Duan, Q., & Tanaka, Y. (2013). Modeling and algorithms for QoS-aware service composition in virtualization-based cloud computing. IEICE Transactions on Communications, E96.B, 10–19. Ishizaka, A., & Labib, A. (2011). Review of the main developments in the analytic hierarchy process. Expert Systems with Applications, 38, 14336–14345. Jensen, K. (1995). Coloured Petri Nets: Basic Concepts, Analysis Methods and Practical Use (2). Springer-Verlag. Jeonghwan, J., Rothrock, L., McDermott, P. L., & Barnes, M. (2010). Using the analytic hierarchy process to examine judgment consistency in a complex multiattribute task. IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 40, 1105–1115. Jiang, H., Kwong, C. K., Chen, Z., & Ysim, Y. C. (2012). Chaos particle swarm optimization and T-S fuzzy modeling approaches to constrained predictive control. Expert Systems with Applications, 39, 194–201. Jula, A., & Naseri, N. K. (2012). A hybrid genetic algorithm-gravitational attraction search algorithm (HYGAGA) to solve grid task scheduling problem. In International Conference on Soft Computing and its Applications (ICSCA'2012) (pp. 158–162). Planetary Scientific Research Center (PSRC). Jula, A., Othman, Z., & Sundararajan, E. (2013). A hybrid imperialist competitive-gravitational attraction search algorithm to optimize cloud service composition. In Memetic Computing (MC), 2013 IEEE Workshop on (pp. 37–43). Jungmann, A., & Kleinjohann, B. (2012). Towards the Application of Reinforcement Learning Techniques for Quality-Based Service Selection in Automated Service Composition. In Services Computing (SCC), 2012 IEEE Ninth International Conference on (pp. 701–702). Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: a survey. Journal of Artificial Intelligence Research, 4, 237–285. Karim, R., Chen, D., & Miri, A. (2013). An end-to-end QoS mapping approach for cloud service selection.
work_3rmubmxsfbbjlk6w2bf2tmt7nm ---- Article Reference
Scherer, Klaus R. (1993). Studying the Emotion-Antecedent Appraisal Process: An Expert System Approach. Cognition and Emotion, 7(3/4), 325-355. DOI: 10.1080/02699939308409192. Available at: http://archive-ouverte.unige.ch/unige:102025
COGNITION AND EMOTION, 1993, 7(3/4), 325-355

Studying the Emotion-Antecedent Appraisal Process: An Expert System Approach

Klaus R. Scherer
University of Geneva, Switzerland

The surprising convergence between independently developed appraisal theories of emotion elicitation and differentiation is briefly reviewed. It is argued that three problems are responsible for the lack of more widespread acceptance of such theories: (1) the criticism of excessive cognitivism raised by psychologists working on affective phenomena; (2) the lack of process orientation in linking appraisal to the complex unfolding of emotion episodes over time; and (3) the lack of consensus on the number and types of appraisal criteria between theorists in this domain. Although readers are referred to recent theoretical discussions and evidence from the neurosciences with respect to the first two issues, an empirical study using computerised experimentation is reported with respect to the third issue. Data obtained with an expert system based on Scherer's (1984a) "stimulus evaluation check" predictions show the feasibility of this approach in determining the number and types of appraisal criteria needed to explain emotion differentiation. It is suggested to use computer modelling and experimentation as a powerful tool to further theoretical development and collect pertinent data on the emotion-antecedent appraisal process.

INTRODUCTION

The notion that emotions are elicited and differentiated via appraisal of situations or events as centrally important to a person has a venerable history. The idea can be traced from the writings of early philosophers such as Aristotle, Descartes, and Spinoza to theoretical suggestions by pioneering emotion psychologists such as Stumpf (see Reisenzein & Schonpflug, 1992). In the 1960s, Arnold (1960) and Lazarus (1968) had explicitly formulated theories incorporating rudimentary appraisal criteria in an effort to explain the emotional consequences of being faced with a particular event.

Requests for reprints should be sent to Klaus R. Scherer, Department of Psychology, University of Geneva, 9, Rte de Drize, Carouge, CH-1227 Geneva, Switzerland. This paper was specifically prepared for the special issue of Cognition and Emotion on Appraisal and Beyond. The author gratefully acknowledges important contributions and suggestions by George Chwelos, Nico Frijda, Keith Oatley, Ursula Scherer, and two anonymous reviewers.

© 1993 Lawrence Erlbaum Associates Limited
At the beginning of the 1980s a number of psychologists independently proposed detailed and comprehensive sets of appraisal criteria to explain the elicitation and differentiation of the emotions (De Rivera, 1977; Frijda, 1986; Johnson-Laird & Oatley, 1989; Mees, 1985; Ortony, Clore, & Collins, 1988; Roseman, 1984, 1991; Scherer, 1981, 1982, 1983, 1984a,b, 1986; Smith & Ellsworth, 1985, 1987; Solomon, 1976; Weiner, 1982) and engaged in empirical research to demonstrate the validity of these hypothetical suggestions (Ellsworth & Smith, 1988; Frijda, 1987; Frijda, Kuipers, & ter Schure, 1989; Gehm & Scherer, 1988; Manstead & Tetlock, 1989; Reisenzein & Hofmann, 1990; Roseman, 1984, 1991; Roseman, Spindel, & Jose, 1990; Smith & Ellsworth, 1985, 1987; Tesser, 1990; Weiner, 1986). In a comparative review of such "appraisal theories of emotion differentiation" Scherer (1988) attempted to show the extraordinary degree of convergence of the different theoretical suggestions, especially with respect to the central criteria postulated in the different approaches (see Table 1, reproduced from Scherer, 1988). This convergence is all the more surprising since the theorists concerned come from widely different traditions in psychology and philosophy. The impression that appraisal theories of emotion differentiation have generated a highly cumulative body of research has been confirmed in more recent reviews as well as in some comparative empirical studies (Lazarus & Smith, 1988; Manstead & Tetlock, 1989; Reisenzein & Hofmann, 1990; Roseman et al., 1990; Scherer, 1988). It seems reasonable to take such theoretical and empirical convergence as an indication of the plausibility and validity of appraisal theories, particularly in the light of the absence of rival theories that could reasonably claim to explain emotion differentiation by alternative conceptual frameworks. Yet, appraisal theories currently face three major challenges which seem to prevent more widespread acceptance of this explanatory framework: (1) the reproach of excessive cognitivism; (2) the lack of process orientation; and (3) the lack of consensus on the number and types of appraisal criteria.

1. The Reproach of Excessive Cognitivism

Appraisal theorists are often accused of excessive cognitivism by psychologists dealing with a wide variety of different affective phenomena. Critics question the likelihood that elaborate cognitive evaluations are performed during the few milliseconds that seem sufficient to bring about an emotion episode. It is further suggested that affective arousal can be triggered without any evaluative processing at all (Zajonc, 1980).
[Table 1. Convergence of Sets of Appraisal Criteria as Suggested by Different Appraisal Theorists. Columns compare the criteria proposed by Scherer, Frijda, Ortony/Clore, Roseman, Smith/Ellsworth, Solomon, and Weiner. Scherer's criteria (rows): Novelty (suddenness, familiarity, predictability); Intrinsic pleasantness; Goal significance (concern relevance, outcome probability, expectation, conduciveness, urgency); Coping potential (cause: agent, cause: motive, control, power, adjustment); Compatibility standards (external, internal). Reproduced from Scherer (1988, p. 92).]

The "cognition-emotion controversy" (Lazarus, 1984a,b; LeDoux, 1987, 1989; Leventhal & Scherer, 1987; Zajonc, 1980, 1984; Zajonc & Markus, 1984) is centrally concerned with this issue. The crux of the matter, however, is the definition of cognition, a term which has not gained in precision by becoming increasingly fashionable. Although the formulations used by some theorists may suggest that appraisal is viewed as a conscious, and consequently exclusively cortically based process, other theorists in this tradition have insisted early on that the cognitivistic connotations of the terms "appraisal" or "evaluation" do not preclude that a substantial part of these processes occur in an unconscious fashion, mediated via subcortical, e.g. limbic system, structures (Scherer, 1984a,b). Leventhal and Scherer (1987) have pointed out that evaluation can occur at the sensorimotor, schematic, or conceptual levels, respectively, and that, rather than discussing the cognition issue on an abstract level, one should determine the precise nature of the information-processing involved. LeDoux (1989), from a neuropsychological point of view, has likewise advocated to address the issue of the nature of emotion-antecedent information-processing and its underlying neural pathways rather than getting sidetracked by the issue of the definition of cognition: "The process involved in stimulus evaluation could, if one chose, be called cognitive processes. The meaning of the stimulus is not given in physical characteristics of the stimulus but instead is determined by computations performed by the brain. As computation is the benchmark of the cognitive, the computation of affective significance could be considered a cognitive process" (LeDoux, 1989, p. 271).
LeDoux and his coworkers have in fact empirically demonstrated the existence of subcortical stimulus evaluation patterns for affect eliciting situations in rats (LeDoux, 1987, 1989; LeDoux, Farb, & Ruggiero, 1990). The empirical demonstration of such patterns in humans is hardly to be expected at present because most current research on emotion-antecedent appraisal in human subjects uses self-report of emotional experiences (necessarily involving higher centres of the brain). Subjects are generally asked to recall or infer the nature of their event or situation appraisal, often with the help of rating scales constructed on the basis of the theoretically assumed appraisal dimensions. Clearly, verbally reported appraisal patterns are mediated via conscious, almost exclusively cortically controlled information-processing, and are thus easy targets for charges of excessive cognitivism. They are also subject to the criticism that such recall or inference illustrates social representations of emotions rather than reflecting the actual emotion-eliciting process. Given the difficulty of settling these issues empirically, Scherer (1993) has suggested to look toward potential contributions from the neurosciences to better understand the nature of the appraisal process. The author denotes a number of possibilities of empirically studying controversial questions related to the appraisal notion with the help of modern neuroscience technology, such as electroencephalographic signal analysis and imaging techniques, and adopting neuropsychologically oriented experimental designs as well as case studies of neurologically impaired patients. Such procedures might help to overcome one of the most serious limitations of current empirical research on emotion-antecedent appraisal: the reliance on respondents' verbal reports of recalled or inferred situation evaluations.

2. The Lack of Process Orientation

The second problem mentioned earlier, lack of a process orientation in many appraisal theories, is responsible for the frequently encountered opinion that appraisal theories basically provide a semantic grid for the comprehension of the use of emotion terms or labels, and are thus limited to structural analyses or explications of semantic fields of emotion terms. This impression is due partly to the explicit semantic orientation of some of the models that have been proposed (Ortony et al., 1988), and partly to the use of verbal labels in all theories to identify the emotional states that are seen to be elicited and differentiated by the appraisal process. It is certainly one of the legitimate applications of appraisal theories to identify the nature of the emotion-antecedent appraisal process that determines which verbal label will be chosen to communicate the nature of the emotion episode. However, appraisal theories need to go beyond semantics and attempt to specify the true nature of the emotion-antecedent appraisal process. This process might result in an emotional state that the person concerned is unable or unwilling to label with one of the standard emotion terms that are currently used in emotion research. Scherer (1984a) has argued that the stimulus or event evaluation process can elicit as many different emotional states as there are distinguishable outcomes of the appraisal process.
This suggestion clearly contradicts the notion that there are a very limited number of "basic" or "fundamental" discrete emotions (Ekman, 1984, 1992; Izard, 1977; Tomkins, 1984). In order to allow systematic discussion of this issue, it is necessary to agree on a consensual definition of emotion that helps to explicate the boundaries between different emotional states and their components (see Scherer, 1993). A further requirement for advancing in the debate on this issue is the specification of the micro-genetic process of appraisal and reaction. Although many emotion theories give the impression that emotions are static states that can be conveniently labelled with a single term, there can be little doubt that we need to talk about emotion episodes that are characterised by continuously occurring changes in the underlying appraisal and reaction processes (see Folkman & Lazarus, 1985; Frijda, 1986; Scherer, 1984a,b). In consequence, it is not sufficient to specify a pattern of appraisal results that is supposed to explain a static emotion as indexed by a label. The nature of the appraisal process and the immediate effects of the evaluation results on the other components of emotion (such as subjective feeling, physiological responses, motor expression, and action tendencies) need to be explored. Unfortunately, most of the appraisal theorists have so far devoted only very limited attention to the process underlying the evaluation of situations, events, or actions. An exception to this general pattern is the component process theory suggested by Scherer (1984a,b, 1986, 1988), which postulates that the appraisal criteria (stimulus evaluation checks, abbreviated as SECs) proposed occur in an invariant sequence (in the order shown in Table 2). The sequence notion, which is based on phylogenetic, ontogenetic, and micro-genetic (logical) considerations, cannot be discussed in detail in the present context. Generally speaking, it is assumed that the appraisal process is constantly operative, with evaluations being continuously performed to update the organism's information on an event or situation (including the current needs or goals of the organism and the possibility to act on these). In consequence, the sequential stimulus evaluation checks are expected to occur in very rapid succession (similar to a rotating radar antenna updating the reflection patterns on the screen). This continuous operation can explain the sudden changes that can occur during emotion episodes and which are often based on re-evaluations of the event or of one's coping potential (cf. Lazarus', 1968, "secondary appraisal"; see Scherer, 1984a,b, for further details on the hypothesised sequential processing). Many different objections have been raised against this sequence notion. Quite a few of these can be refuted on logical grounds or on the basis of recent insights into the neural bases of information-processing, particularly with respect to neural networks (see Scherer, 1993, for a detailed discussion). However, empirical research is needed to demonstrate the feasibility of the sequence hypothesis and to encourage further work in this direction. Unfortunately, our dependence on verbal report of recalled or inferred appraisal processes does not lend itself to the study of the sequence hypothesis.
It is likely that the different steps of the evaluation process occur extremely rapidly and are not generally represented in awareness. Any reconstruction of these processes is likely to miss the temporal dynamics of the process. In the future, neuroscience technology might allow us to monitor such rapidly occurring evaluation sequences directly. Also, it seems feasible to develop sophisticated research designs making use of latency time measures in carefully designed stimulus presentation modes to shed some light on these time-critical processes (see Scherer, 1993, for concrete suggestions on adopting appropriate paradigms from the cognitive neurosciences). Unfortunately, such studies might well be slow in the making.

3. The Lack of Consensus on the Number and Types of Appraisal Criteria

The third problem concerns the issue of how many and precisely which evaluation or appraisal dimensions are necessary to account for the degree of emotion differentiation that can be empirically demonstrated. Although, as mentioned earlier, there is much convergence in this field, authors do differ with respect to the number and definition of appraisal dimensions that are proposed. A few recent studies have attempted to compare different appraisal theories and to empirically determine how many dimensions are needed and which dimensions seem to account for most of the variance (Manstead & Tetlock, 1989; Mauro, Sato, & Tucker, 1992; Reisenzein & Hofmann, 1990; Roseman et al., 1990). All of these studies are limited to post hoc evaluation of how well the dimensions studied explain differentiation between the emotions reported by the subjects. In other words, the same group of subjects provides both the emotion and the appraisal information and statistical analysis is limited to identifying the shared variance. Needless to say, the results cannot be generalised beyond the respective set of emotions and dimensions studied. Even though such information is eminently useful for the further development of appraisal theories, it seems desirable to develop a model that emphasises the prediction of emotional states on the basis of a minimal set of necessary and sufficient dimensions or criteria of appraisal. The empirical study to be reported in this paper suggests such a predictive approach. Based on Scherer's component process model of emotion (1984a,b, 1986, 1988), an expert system on emotion differentiation that contains such a minimal set of evaluation criteria is presented and submitted to a first empirical test. As shown earlier, the question of how many and which appraisal criteria are minimally needed to explain emotion differentiation is one of the central issues in research on emotion-antecedent appraisal. It is argued here that one can work towards settling the issue by constructing, and continuously refining, an expert system that attempts to diagnose the nature of an emotional experience based exclusively on information about the results of the stimulus or event evaluation processes that have elicited the emotion.
The knowledge base of the expert system would contain a limited set of evaluation or appraisal criteria together with theoretically defined (and empirically updated) predictions about which pattern of evaluation results is likely to produce a particular emotion out of a limited
set of possibilities. At present, this system is limited to predicting the verbal labels given to the emotions experienced and to obtaining the required information about appraisal processes by requesting verbal report of recalled or inferred evaluation results. As shown earlier, this is a highly imperfect approach to studying the dynamic appraisal and reaction processes involved in emotional episodes, many of which do not require involvement of consciousness or language, or may not even be accessible to them. However, even an approximative approach to a predictive model seems useful at our present state of knowledge.

[Table 2. Patterns of Stimulus Evaluation Checks (SEC) Predicted to Differentiate 14 Major Emotions. Rows: the SECs and subchecks (novelty: suddenness, familiarity, predictability; intrinsic pleasantness; goal significance: concern relevance, outcome probability, expectation, conduciveness, urgency; coping potential: cause agent, cause motive, control, power, adjustment; compatibility standards: external, internal). Columns: ENJ/HAP (enjoyment/happiness), ELA/JOY (elation/joy), DISP/DISG (displeasure/disgust), CON/SCO (contempt/scorn), SAD/DEJ (sadness/dejection), DESPAIR, ANX/WOR (anxiety/worry), FEAR, IRR/COA (irritation/cold anger), RAGE/HOA (rage/hot anger), BOR/IND (boredom/indifference), SHAME, GUILT, PRIDE. Cell entries give the predicted check outcome (e.g. very low, low, medium, high, very high, or open). Reproduced from Scherer (1988, p. 112).]

METHOD

Designing the Expert System

The aim was to develop a computer program that would allow a user to enter information on a situation in which a strong emotion had been experienced and have the program predict or diagnose the nature of that emotional state (as represented by a verbal label).(1) Using Turbo Pascal 3.0, a program called GENESE (Geneva Expert System on Emotions) was developed.(2) In contrast to expert systems based on IF-THEN rules, the present system is of the type that employs algorithms determining the relative similarity between input vectors and prototypical category vectors representing the knowledge base. In the present case the "knowledge base" consists of a set of vectors (one for each emotion) which contain quantified predictions relative to the typical stimulus evaluation check outcomes for specific emotions. These vectors have been derived from the prediction tables published by the author in earlier work (Scherer, 1984a,b, 1986, 1988). The most recent set of predictions is shown in Table 2 (reproduced from Scherer, 1988). Concretely, then, for each of the specific emotions contained in the expert system, a vector of numbers (which represent the predicted results of selected stimulus evaluation checks for the respective emotions) constitutes the prototypical pattern which will be used to classify user-generated input vectors. The input vector for a target emotion to be classified (which is determined by the user's choice of a recalled emotional experience he or she wants to have diagnosed) is determined by the computer asking the 15 questions listed in Table 3 and requiring the user to answer with the help of predefined answer categories. Each of these questions corresponds to a particular stimulus evaluation check or subcheck. The numbers representing the predicted prototypical answer alternatives for each question constitute the entries for the stimulus evaluation checks into the prediction vector for the respective emotion. These prediction vectors are shown in the second row of the vector matrices for the 14 emotions in Table 5.

(1) A similar approach was independently developed by Frijda and Swagerman (1987).
(2) The prototype of the system was written in 1987 by Philippe Narkl and Roland Bapst based on specifications by the author, who has continuously modified the program since.
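The matching step just described can be illustrated with a minimal sketch in Python (the original GENESE was written in Turbo Pascal; the emotion labels shown and all numeric values below are invented placeholders, not the published prediction vectors of Table 5): each emotion is stored as a 15-element prototype, the user's answers form an input vector, and the labels are ranked by distance to that vector.

```python
import math

# Hypothetical prototype vectors: one 15-element entry per emotion, each value
# standing for the predicted answer (0-5) to one of the 15 questions.
# The numbers are illustrative placeholders, NOT the published predictions.
PREDICTION_VECTORS = {
    "enjoyment/happiness": [2, 3, 5, 4, 4, 5, 1, 2, 2, 2, 3, 3, 5, 4, 3],
    "fear/terror":         [5, 3, 1, 5, 1, 1, 5, 1, 3, 3, 2, 1, 2, 3, 0],
    "guilt feelings":      [2, 2, 2, 4, 3, 2, 3, 5, 1, 1, 4, 3, 3, 2, 1],
    # the remaining emotions of the 14-emotion knowledge base would follow
}

def euclidean(a, b):
    """Euclidean distance between an input vector and a prototype vector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def diagnose(answers, vectors=PREDICTION_VECTORS):
    """Rank emotion labels by increasing distance to the user's answer vector."""
    return sorted(vectors, key=lambda label: euclidean(answers, vectors[label]))

# Example: a recalled situation encoded as answers to the 15 questions.
user_input = [5, 3, 1, 5, 1, 1, 5, 1, 3, 3, 2, 1, 2, 3, 0]
print(diagnose(user_input)[:2])  # first diagnosis and second guess
```

In such a ranking the nearest label becomes the first diagnosis and the runner-up the second guess offered to the user; in GENESE the raw distances are additionally corrected by the heuristic adjustments described below.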
It should be noted that although the prediction vectors have been derived from earlier prediction tables, not all of the stimulus evaluation subchecks listed in Table 2 have been included in the quantified prediction vectors of the GENESE expert system. The need for a selection of what seemed to be the most important and differentiating checks was imposed by the necessity to curtail the number of questions posed to the user. Furthermore, for some subchecks, e.g. agent of causation, several questions had to be asked to obtain the required quantitative information. Table 3 shows the correspondence between the stimulus evaluation checks or subchecks and the specific questions. It should be noted further that the prediction vectors (as contained in the system and shown in Table 5) are based on but do not necessarily correspond exactly to the earlier prediction tables (e.g. Table 2). The author considers theory development a dynamic process. Consequently, predictions change and evolve over time. For example, the prediction vectors in Table 5 show some changes over earlier hypothesising. In particular, an attempt has been made to reduce the number of "open" or "not pertinent" predictions (see Table 2), particularly in the case of shame and guilt, as these reduce the discriminative power of the vectors in the expert system. The present version of GENESE contains prediction vectors for the 14 emotions listed in Table 2. The choice of these 14 emotions was determined by the arguments advanced in Scherer (1986) advocating to distinguish between more quiet and more aroused varieties of some of the major emotions, e.g. irritation/cold anger vs. rage/hot anger. The input vector, as based on the user's answers to the 15 questions, is systematically compared to the 14 predicted emotion vectors, using Euclidian distance measures. The distance indices obtained in this fashion are then adjusted on the basis of theoretical considerations concerning the need to weight particular combinations of input values. The following adjustments of the distance indices are used in the present version of the expert system:

- 0.3 for shame and guilt if the causal agent is "self"
+ 0.8 for all positive emotions if the event is evaluated as unpleasant
+ 0.3 for contempt, except in cases in which another person is the causal agent hindering goal attainment and the act is highly immoral (- 0.6)
[see text] (0) not pertinent (1) the event had happened a long time ago (2) it happened in the recent past (4) it was to be expected for the near future 3. This type of event, independent of your personal evaluation, would it be generally considered as pleasant or unpleasant? [SEC?-MTRINSIC PLEASANTNESS] (0) not pertinent (1) very unpleasant ( 2 ) rather unpleasant (3) indifferent (4) rather pleasant 4. Was the event relevant for your general well-being, for urgent needs you felt, or for specific goals or plans you were pursuing at the time? [SEC>RELEVANCE] (0) not pertinent (1) not at all ( 2 ) a little (3) moderately (4) strongly ( 5 ) extremely 5 . Did you expect the event and its consequences before the situation actually happened? [ SEC3-EXPECTATION) (0) not pertinent (4) a little ( 5 ) strongly 6. Did the event help or hinder you in satisfying your needs, in pursuing your plans or in attaining your goals? [SECICONDUCTVENESS] (0) not pertinent (4) it helped a little 7. Did you feel that action on your part w a s urgently required to cope with the event and its consequences? [SECI-URGENCY] (0) not wrtinent (1) not at all (2) a little (3) moderately (4) strongly ( 5 ) extremely [SECl-NOVELTY] (3) it had just happened at that moment ( 5 ) it was to be expected in the long run ( 5 ) very pleasant (1) never in my life (2) not really (3) I did not exclude it (1) it hindered a lot (2) it hindered a little (3) it had no effect ( 5 ) it helped a lot (Continued) D ow nl oa de d by [ U ni ve rs it é de G en èv e] a t 02 :2 8 08 J an ua ry 2 01 8 STUDYING EMOTION-ANTECEDENT APPRAISAL 337 TABLE 3 (Con tinued) 8. Was t h e event caused by your own action-n o t h e r words, were you partially or fully responsible for what happened? [ S E C M A U S A T I O N ] (0) not pertinent ( I ) not at all (2) a little, but unintentionally (3) somewhat, but I was unaware of the consequences (4) quite responsible, 1 knew what I was doing (5) fully responsible, I absolutely wanted t o d o what I did 9. W a s t h e event caused by o n e or several o t h e r p e r s o n s i n o t h e r words, were o t h e r people partially or fully responsible for what happened? [ S E C M A U S A T I O N ] (0) not pertinent (2) a little, but unintentionally (3) somewhat, but helshdthey were unaware of the consequences (4) quite responsible, he/she/they knew what they were doing (5) fully responsible, h e k h d t h e y absolutely wanted t o d o what they did 10. Was t h e event mainly d u e to chance? [ S E C M A U S A T I O N ] (0) not pertinent (3) somewhat, but human action contributed to it 1 1 . Can t h e occurrence and the consequences of this type of event generally be controlled or modified by human action? (SEC4-CONTROLI (0) not pertinent (1) not at all (2) a little (3) moderately (4) strongly (5) extremely 12. Did you feel that you had enough power t o cope with the e v e n t 4 . e . being able t o influence what was happening or t o modify the consequences? [SEC&POWER] (0) not pertinent ( I ) not at all (2) a little (3) moderately (4) strongly ( 5 ) e x m m e l y 13. Did you feel that, after having used all your means of intervention, you could live with the situation a n d adapt to the consequences? [SEC&ADIUSTMENT] (0) not pertinent (1) not at all (2) with much difficulty (3) somewhat (4) quite easily ( 5 ) without any problem a t all 14. Would the large majority of people consider what happened t o be quite in accordance with social norms and morally acceptable? 
[SECS-NORM C O M P A T I B I L I T Y ] (0) not pertinent ( 1 ) certainly not (2) not really (3) probably (4) most likely (5) certainly 15. If you were personally responsible for what happened, did your action correspond to your self-image? [SECS-SELF COMPATIBILITY] (0) not pertinent ( I was not responsible) (1) not at all (2) not really (3) somewhat (4) strongly (5) extremely well (1) not at all (1) not at all (2) a little, but human action was t h e decisive factor (4) strongly ( 5 ) exclusively The emotion with the smallest overall distance measure is suggested to the user as diagnosis of the experienced emotional state. If t h e user does not accept the diagnosis as valid, the emotion with the vector that shows the second smallest distance is proposed as a second guess. If the user rejects this one also, he or she is prompted to provide the correct response in the list of the 14 emotions contained in the standard version of the system. If the user identifies one of these 14 emotions as correct, the D ow nl oa de d by [ U ni ve rs it é de G en èv e] a t 02 :2 8 08 J an ua ry 2 01 8 338 SCHERER respective prediction vector is changed in the direction of the empirical input vector (using an adaptable weighting function) to establish an empirically updated prediction matrix for this particular user. The user can indicate that none of the 14 emotion labels proposed corresponds to the real emotion that was felt. He or she then has the possibility to enter a freely chosen verbal label for that particular state. This label, together with the input vector, is then added to the personalised prediction matrix. In this manner, an unlimited number of new emotions can be added to the personalised knowledge base of a user. It should be noted that the personalised knowledge base for a particular user no longer represents pure theoretical predictions because the prediction vectors have been adapted to fit the empirical input. After prolonged usage by a particular user, the prediction matrix may actually represent a true empirical knowledge base-at least for that particular user. The system stores all the information provided by the user in two separate data files, one which contains the complete protocol of the session and one that contains the personalised vector matrix. Procedure The system h a s been designed in such a fashion that it does not require any intervention by an experimenter. Users are expected to start the program and follow instructions on the screen which should be self- explanatory. I n the following, a brief summary of the typical procedure is given. Following a title page and the entry of a user code that permits repeated access and establishes a personalised data base, the user is requested to remember a situation that has produced a strong emotional response: Please recall a situation in which you experienced a strong emotional feeling. The emotion might have been elicited by an event that happened t o you or by the consequences of your own behaviour. This might have happened recently or quite some time ago. I will ask you a certain number of questions concerning this situation and will then attempt to diagnose t h e emotion you felt at that time. Before continuing, please recall the situation as best as you can and t r y to reconstruct the details of what happened. The program then pauses until the subject confirms to now recall a situation very vividly by pressing a key. He o r she is t h e n asked to type a brief description of the situation on the keyboard. 
Procedure

The system has been designed in such a fashion that it does not require any intervention by an experimenter. Users are expected to start the program and follow instructions on the screen, which should be self-explanatory. In the following, a brief summary of the typical procedure is given. Following a title page and the entry of a user code that permits repeated access and establishes a personalised data base, the user is requested to remember a situation that has produced a strong emotional response:

Please recall a situation in which you experienced a strong emotional feeling. The emotion might have been elicited by an event that happened to you or by the consequences of your own behaviour. This might have happened recently or quite some time ago. I will ask you a certain number of questions concerning this situation and will then attempt to diagnose the emotion you felt at that time. Before continuing, please recall the situation as best as you can and try to reconstruct the details of what happened.

The program then pauses until the subject confirms, by pressing a key, that he or she now recalls a situation very vividly. He or she is then asked to type a brief description of the situation on the keyboard. To ensure anonymity and privacy, the text typed is not shown on the screen. Then, the 15 questions shown in Table 3 are presented consecutively. The questions are always presented in the order given in Table 3 because the underlying theory predicts that this is the natural micro-genetic appraisal sequence. It is hypothesised that following the original sequence of the appraisal in assessing the different checks may help the subject to recall the appraisal process faster and with fewer errors. The subject is then asked to enter the intensity with which the emotion was felt on a scale from very weak to extremely strong, as well as his/her age group and gender. Then, the subject is presented with the following message:

I have now completed a first diagnosis of the affective state elicited by the situation you described and I am about to present you with a label that I consider to be a good description of the emotion you experienced. Please remember that, at the time, you may not have been conscious of all aspects of your emotional experience. Therefore, it is quite possible that the verbal label you normally use to describe your feelings in that situation does not exactly correspond to the term I will suggest in my diagnosis. If that is the case, please consider the possibility that the diagnosis which I suggest might reflect some part of what you felt in the situation, possibly without realising it.

FIG. 1. Feedback screen showing relative distances of the input vector to the predicted vectors for different emotion concepts. The screen lists the 14 emotion labels and explains: "The shorter the line in the following graphic display, the more appropriate should be the respective label as a description of your feeling state."

The system then presents the first diagnosis or suggested hypothesis and asks the user to indicate whether it is correct or not. If the user enters "incorrect", a second diagnosis is presented. If that is again incorrect, the user is presented with the list of 14 emotions and asked to indicate the correct emotion. Following any of these three cases, the user is presented with feedback on the diagnostic process in the form of a graph showing the relative distances of the various emotion concepts to the situation described (see Fig. 1). If the subject indicates that none of the 14 emotion labels in the list describes the felt affect accurately, he or she is given the opportunity to enter a new concept:

Aha, something new!? Do you really want to teach me a new emotion? It will change your personal knowledge base! Your decision (y = yes / n = no) [followed by the new concept]

After each of the four possible options: (1) first diagnosis correct; (2) second diagnosis correct; (3) correct emotion identified in list; (4) new concept entered, the user is given the possibility to enter a new situation or to exit.

Administration

This expert system was used in a number of pilot studies, using English, German, and French versions.
The French version was used for a first major study of the accuracy of the expert system in diagnosing emotional states on the basis of theoretically predicted appraisal patterns. The program was inserted in a batch-file environment that allows automatic administration. After each user completes a session, the system returns to a title screen inviting potential users to test the power of the system to diagnose emotional states. To avoid the possibility that users would start a session and leave in the middle, a time limit for the responses to each screen was set. If the time limit is exceeded, the system returns automatically to the title page. A personal computer (Olivetti M240) on which this batch-file system had been installed was placed in the exhibition of the University of Geneva at the 1990 Geneva book fair (Salon du Livre). This is a large international book show with exhibitors and visitors from different countries, mostly French speaking. Posters positioned around the PC invited passers-by to test the GENESE emotion expert system. During three days of the exhibition, 201 persons used the system, entering generally one, but sometimes two or three, situations. In addition, 35 first year students in psychology at the University of Geneva (in their first 2 months of study) used the system as part of a course exercise (also in a completely automatised fashion). In all, 236 persons entered the data for a total of 282 emotional situations in this manner.

Data Analysis

A major concern for the analysis is the possibility that some users may have entered nonsensical information while just playing with or trying to mislead the system. However, there were very few cases where the text entered suggested that this was the case. In some cases no text was entered and it was difficult to decide whether the input vector constituted a serious trial or not. To avoid biasing the data by subjective judgement of the "seriousness" of the entry, it was decided to retain all situations, assuming that nonsensical entries should work against finding accurate diagnoses and thus lead to a conservative estimate of the power of the system. Data were excluded only in the following, clearly discernible cases: In some situations there was virtually no variability in the responses to the questions, e.g. a user responding with 1 to all questions. Fifteen situations in which 13 or more of the answers had the same numerical value were excluded. In 14 cases of the total of 282 situations entered, users neither judged any of the diagnoses as correct nor identified any of the 14 emotion labels suggested as the correct response. In these cases new concepts were entered. Because the number of such cases was small, and because in some cases strange concepts like "le spleen total" were entered, it was decided to exclude these cases from analysis. After having excluded these cases, a total of 253 situations were analysed with respect to the number of hits and misses and the correlation between the predicted and the empirically obtained appraisal profiles for the 14 emotions studied.
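The two exclusion rules just described can be expressed as a minimal screening sketch. The record layout (an "answers" list of 15 values and a "new_concept" flag) is a hypothetical structure chosen for illustration, not the format actually used by GENESE.

```python
from collections import Counter

def has_too_little_variability(answers, threshold=13):
    """True if `threshold` or more of the 15 answers share the same numerical value."""
    return Counter(answers).most_common(1)[0][1] >= threshold

def screen_situations(situations):
    """Drop low-variability situations and those resolved only by a freely
    entered new concept, keeping the remainder for the hit/miss analysis."""
    return [s for s in situations
            if not has_too_little_variability(s["answers"]) and not s["new_concept"]]
```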
RESULTS

Tables 4 and 5 show the major results of the analyses. In Table 4, the first column contains the total number of situations that were entered for each of the 14 emotions (using the final indication of a correct diagnosis or the user correction as a criterion). Column 2 shows in how many of these cases the first diagnosis was correct, and Col. 3 in how many cases the second diagnosis was correct. Column 4 shows the total number of misses. However, some of the latter cases can be considered as "dubious misses", as the input vectors not only deviate strongly from the predicted vectors but also from the empirically obtained mean vectors for each of the emotions (as shown in Table 5). It is highly probable, then, that the appraisal information was not provided in the correct manner. Twenty-two situations were considered dubious because the absolute value of the sum of the differences (deviations) obtained by deducting the individual values for each question from the mean value (Row 1 in Table 5) exceeded the value corresponding to a standard deviation for all difference scores. Column 5 shows the number of these "dubious misses" per emotion. Column 6 shows the percentage of correct diagnoses (excluding the "dubious misses", which are considered to be the result of incorrect input).

TABLE 4
Results of Expert System Runs for 253 Emotion Situations

Emotion                  Total Situat  1st Hit  2nd Hit  Total Misses  Dubious Misses  % Correct  Marked Diff  Profile Correl
Happiness/Feel good      27            21       1        5             2               88.0       3            0.76
Joy/Elation              34            30       3        1             1               100.0      3            0.63
Displeasure/Disgust      10            4        3        3             1               77.8       1            0.60
Contempt/Scorn           3             3        0        0             0               100.0      2            0.51
Sadness/Depression       34            19       3        12            3               71.0       2            0.24
Desperation/Grief        58            49       8        1             1               100.0      2            0.75
Anxiety/Worry            19            2        0        17            5               14.3       4            0.13
Fear/Terror              19            2        2        15            4               26.7       10           0.61
Irritation/Cold anger    11            3        4        4             1               70.0       3            0.30
Hot anger/Rage           19            10       4        5             1               77.7       6            0.74
Indifference/Boredom     3             1        0        2             0               33.3       3            0.36
Embarrassment/Shame      4             1        0        3             1               33.3       4            0.23
Pride/Jubilation         9             6        0        3             1               75.0       6            0.75
Guilt feelings           3             1        0        2             1               50.0       9            -0.10
Totals                   253           152      28       73            22              77.9

Notes: Total Situat, total number of situations clearly categorised by respondents; 1st Hit, number correctly recognised on first attempt; 2nd Hit, number correctly recognised on second attempt; Total Misses, number missed as shown by user correction; Dubious Misses, cases in which the deviation of the input profile from the mean empirical profile exceeded half a standard deviation; % Correct, percentage of total hits (first plus second) on the basis of the total number of situations minus dubious misses; Marked Diff, marked difference between predicted and empirically obtained vectors; Profile Correl, Pearson r between the mean empirical input profile and the theoretically specified prediction profile over N = 15 questions (0 in prediction vector treated as missing observation).

Table 5 lists, for each of the 14 emotions, the mean input vector (Row 1), the theoretically predicted SEC vector as represented in the knowledge base (Row 2), the difference between the two (Row 3), and the standard deviations of the empirical values in the input vector (Row 4). This table allows one to compare the theoretically predicted SEC vectors in the knowledge base with the empirically obtained input vectors. Thus it permits one to determine the stimulus evaluation checks for which the empirical values greatly differ from the predicted value and for which, in consequence, a revision of the prediction might be required (if there is reason to believe that the input value does not represent artifacts or errors).
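The two derived quantities reported in Table 4 (% Correct and Profile Correl) follow directly from the definitions in the notes above. A minimal sketch, with illustrative helper names, using the happiness/feel good row as a check:

```python
import numpy as np

def percent_correct(first_hits, second_hits, total_situations, dubious_misses):
    """% Correct: total hits relative to the situations minus the dubious misses."""
    return 100.0 * (first_hits + second_hits) / (total_situations - dubious_misses)

def profile_correlation(mean_input_profile, prediction_profile):
    """Pearson r over the 15 questions; a 0 ('not pertinent or open') in the
    prediction vector is treated as a missing observation and skipped."""
    x = np.asarray(mean_input_profile, float)
    y = np.asarray(prediction_profile, float)
    keep = y != 0
    return float(np.corrcoef(x[keep], y[keep])[0, 1])

print(round(percent_correct(21, 1, 27, 2), 1))  # 88.0, as in the first row of Table 4
```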
For this decision, the standard deviations of the empirical values are useful: high discordance of the values entered for a particular stimulus evaluation check can be taken to indicate that there may not be a standard appraisal pattern (or that the respondents did not understand the question). The data in Table 5 generate a large number of interesting issues to be explored. A detailed discussion of these points would exceed the space available in this paper. However, some general trends can be inferred by doing a rough analysis of the size of the difference between theoretical and empirical patterns across stimulus evaluation checks and across emotions. Counting the number of cases in which a difference score exceeds the value of 1 (absolute) for different emotions yields an indication of where the predicted patterns deviate most strongly from the empirically obtained patterns (these values are shown in Col. 7, Table 4). Another way of evaluating the fit between theoretical and empirical patterns is to correlate the two vectors for each emotion. Column 8 in Table 4 shows the mean Pearson correlation coefficients (over respondents) between the predicted profile or vector in the knowledge base and the empirically obtained input vectors for each of the emotions across the 15 appraisal criteria (as shown in Table 5).

It should be noted that the differences should not be interpreted in cases in which the theoretical prediction is "not pertinent", as the difference score is not interpretable. Also, as explained in the description of the expert system design, the quantitative prediction vectors are based on, but not identical to, the patterns in the published prediction tables.

A first observation, although not directly pertinent to the questions outlined earlier, concerns the relative frequency of the different emotions which were presented to the expert system (Col. 1, Table 4). The categories mentioned most frequently are sadness/depression and desperation/grief, both of which are closely linked to some kind of permanent loss. Positive emotions, happiness/feel good and joy/elation, are also mentioned relatively frequently. Anxiety/worry and fear/terror, both related to apprehension about impending dangers, are in third position with respect to frequency. Anger states (irritation/cold anger and hot anger/rage) are mentioned the least frequently of the four major fundamental emotion types. The remaining emotions are all relatively low in occurrence.

The most important question concerns the accuracy of the expert system in diagnosing the emotional state descriptions entered by the users. Column 6, Table 4 shows the percentage of correct diagnoses on either the first or second guess. The data in Cols 2 and 3 show that first hits are generally much more frequent (84.4% of all correct diagnoses) than second guesses.

TABLE 5
Empirically Obtained and Theoretically Predicted Appraisal Vectors for 14 Emotions
[Table body: four rows of values (see Notes) for each of the 14 emotions over the 15 appraisal criteria.]
Notes: First row: mean empirical input vector. Second row: theoretically predicted vector. Third row: difference between empirical and theoretical vectors. Fourth row: standard deviation of the values in the empirical vector. A dot signifies cases in which the difference score is not meaningful because the prediction is "0, not pertinent or open". Abbreviations: Nov, novelty; Time, when did the event happen; Plea, intrinsic pleasantness; Rele, relevance; Expec, expectation; Condu, goal conduciveness; Urgen, urgency; EgoC, self as causal agent; OthC, other(s) as causal agent; ChaC, chance as causal agent; Cont, control; Power, power to cope; Adjus, capacity to adjust; Ext, external; Int, internal.

The accuracy percentage in Col. 6 is based on a comparison of all hits with true misses (excluding the dubious misses because there is a very high probability that the information on the appraisal criteria was entered incorrectly). As shown in the row for the totals, the overall percentage of hits is 77.9% (180 first and second hits compared to 51 true misses). However, averaging the accuracy percentages in Col. 6 across all emotion categories yields a mean accuracy percentage of only 65.5%. This difference between the total accuracy percentage and the average percentage across the different categories is due to marked differences in the number of situations per category.
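The gap between the pooled figure and the category average is simply a weighting effect, as the following arithmetic check using the Table 4 values shows.

```python
# Pooled ("micro") accuracy: 152 + 28 = 180 hits out of 253 - 22 = 231 comparable cases.
overall = 100 * 180 / 231
# Unweighted ("macro") mean of the per-category % Correct values from Table 4.
per_category = [88.0, 100.0, 77.8, 100.0, 71.0, 100.0, 14.3, 26.7,
                70.0, 77.7, 33.3, 33.3, 75.0, 50.0]
mean_across_categories = sum(per_category) / len(per_category)
print(round(overall, 1), round(mean_across_categories, 1))  # 77.9 65.5
```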
Because the accuracy percentage is rather low in some of the categories containing a small number of cases, the average percentage drops. It is difficult to decide whether this lower accuracy is due to the small number of cases or to greater difficulties in predicting the respective categories. One has to assume that the true accuracy of the present version of GENESE lies somewhere between 65% and 80%. In view of the fact that with 14 emotion alternatives one would expect 7.14% accuracy if the system operated on chance level, this result seems quite respectable.

Closer inspection of the accuracy percentages for the individual emotions shows that the average across the emotion categories is reduced by very low percentages for anxiety/worry and fear/terror, on the one hand, and indifference/boredom, embarrassment/shame, and guilt feelings, on the other. With respect to the latter group, it is difficult to evaluate the lack of precision in the diagnoses, because only very few cases are involved and the results may not be very stable. However, it is possible that the low performance for indifference/boredom is due to the fact that the SEC profile for this state is not highly differentiated across the different stimulus evaluation checks (see Table 4). Shame and guilt are among the most complex human emotions and the current prediction profiles might well be too simplistic to differentiate these emotions. The comparatively low correlations between predicted and actually obtained profiles (shown in Table 4, Col. 8) suggest important divergences between prediction and empirical means. In consequence, it is not too surprising to find low accuracy for these emotions.

In contrast, the abnormally low accuracy percentages for anxiety and fear are quite unexpected. One possible explanation is the rapidity with which fear situations tend to change, particularly due to the occurrence of events that eliminate the danger or due to a re-evaluation of an event or stimulus as less dangerous. Because of the low accuracy in the anxiety and fear cases, the individual data files and particularly the input profiles were closely scrutinised. This qualitative analysis showed that in many cases subjects entered appraisal results from both the danger anticipation and the resolution part of the emotion process.

A concrete example may demonstrate this phenomenon: A man between 41 and 60 years of age describes a situation in which his daughter leaned over a burning candle in such a way that her hair started to catch fire. The input vector constituted by the answers to the SEC-based questions and the predicted fear vector are reproduced below (see Table 3 for the exact text of the questions corresponding to the vector entries):

Question    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
Criterion   nov  tem  ple  rel  exp  con  urg  ego  oth  cha  con  pow  adj  ext  int
Input       5.0  3.0  1.0  0.0  1.0  0.0  5.0  1.0  1.0  5.0  4.0  4.0  4.0  0.0  0.0
Prediction  5.0  4.0  2.0  5.0  1.0  1.0  5.0  1.0  4.0  4.0  2.0  1.0  2.0  0.0  0.0
Difference  0    -1   -1   *    0    *    0    0    -3   1    2    3    2    *    *

The comparison between the input and prediction vectors shows that for the first 10 questions there is rather good correspondence (the difference for "other responsibility", question 9, is due to the cause of the event being exclusively seen in chance factors, which is of course possible in the present case).
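Under the Euclidean-distance scheme used by the system, it is the mismatch on the coping items that pushes this case away from the fear prototype. A rough check with the two vectors above; the handling of "0 = not pertinent" entries is simplified here and the idea that nothing else enters the distance is an assumption:

```python
import numpy as np

input_vec = [5, 3, 1, 0, 1, 0, 5, 1, 1, 5, 4, 4, 4, 0, 0]
fear_pred = [5, 4, 2, 5, 1, 1, 5, 1, 4, 4, 2, 1, 2, 0, 0]

diff = np.subtract(input_vec, fear_pred)
print(diff.tolist())
# The control, power and adjustment entries (questions 11-13) alone contribute
# 2**2 + 3**2 + 2**2 = 17 to the squared distance from the fear prototype.
print(float(np.dot(diff, diff)))  # total squared Euclidean distance for this example
```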
However, the answers concerning coping potential (control, power, and adjustment) are clearly related to a phase in the continuous appraisal process in which the danger has already passed (e.g. the flames having been extinguished) and the situation is under control. Otherwise it would be difficult to understand that "very intensive" fear results in spite of the strong ability to control and master the event (an input of 4, "strongly", for both control and power) and it being "quite easy" (4) to adjust to the consequences of the situation. In this situation, fear was probably quickly followed by relief after realising that no serious consequences had ensued. Yet, the total situation was stored and referred to under the most prominent and distinctive label, in this case fear. Given that the appraisal results reported by the subject are likely to come partly from the fear phase and partly from the relief phase of this emotion episode, it is not surprising that the expert system does not correctly diagnose the target emotion, in this case fear.

Many other similar examples for the anxiety and fear cases could be listed. This probably reflects a tendency of the subjects to respond with respect to the total situation, which may be characterised by a rapid change in the type of emotion, especially in the case of fear, which has been empirically shown to be of very brief duration (Frijda, Mesquita, Sonnemans, & van Goozen, 1991; Scherer & Wallbott, submitted; Scherer, Wallbott, Matsumoto, & Kudoh, 1988; Scherer, Wallbott, & Summerfield, 1986; Wallbott & Scherer, 1986). In consequence, some of the lack of accuracy may well be due to the respondents' tendency to report appraisals from different phases of an emotion situation rather than responding to all SEC appraisal questions with respect to a singular and well-defined slice of time. In addition, further refinement of the prediction profiles is required to improve the predictions for fear and anxiety. The correlation between predicted and obtained profiles is comparatively low for fear and particularly anxiety, which would account for low accuracy. It is possible, then, that the theoretically predicted profiles for fear and anxiety are quite unrepresentative of reality and require major changes. Alternatively, it is possible that anxiety/worry situations, in particular, are very variable in their appraisal patterns, which would imply that no clear-cut prototype profile can be defined. In this case, it could be one or more central criteria which determine the special nature of this emotion (one or all of which might be missing from the list of stimulus evaluation checks). Thus, the differentiation might hinge on one or more very central criteria which may not be contained in the list of stimulus evaluation checks or are imperfectly measured by the questions. This might imply the need for a revision in the theoretical underpinnings of GENESE, i.e. the list of stimulus evaluation checks.

At present, the second question in the system (see Table 3) requires the subject to indicate whether the emotion-inducing event happened in the past, is about to happen, or is likely to happen in the future. This is not based on a particular stimulus evaluation check but is part of the facets of situations which are part of component process theory (see Scherer, 1984a).
This particular facet was added to the prediction profile precisely because of the need to differentiate anxiety and fear, which imply threats of negative outcomes in the future, from other negative emotions. However, it may be necessary to go beyond the straightforward timing issue and include dimensions such as certainty (Frijda, 1987; Roseman, 1984, 1991; Smith & Ellsworth, 1985, 1987; see also Reisenzein & Hofmann, 1990). Although "outcome probability" was added to the prediction table in Scherer (1988), this check was not implemented as a question in the expert system (due to the reasons given above in the description of the expert system design). The present results could be interpreted to show that this appraisal dimension might be a major discriminating factor for fear and anxiety and thus needs to be added to the prediction vectors in the expert system. These considerations demonstrate one of the major uses of the GENESE expert system: providing impetus and direction for theory development. The comparison between predicted and actual appraisal, as well as the precision of diagnosis, should help to identify the points where emotion-specific appraisals are badly represented in the theoretical predictions or where appropriate appraisal criteria are still lacking.

CONCLUSIONS

This paper illustrates how an empirical expert system approach to the study of emotion-antecedent appraisal can go beyond the established paradigms of obtaining correlational evidence between self-report of verbally labelled emotional experiences and inferred appraisal dimensions. In particular, the study examined the feasibility of using an expert system to empirically test the author's predictions on emotion differentiation as based on a limited number of stimulus evaluation checks. The results of a first major study reported here demonstrated an accuracy of post hoc diagnosis that substantially exceeds chance for many of the emotions studied and that lends support to the specific appraisal theory suggested by the author. The present results might well underestimate the actual capacity of the system (and the support for the SEC predictions) as there is some evidence for incorrect input by some users. One particular problem is the reporting of appraisal results from different points in time during the emotional episode, which may reflect different emotions (e.g. fear and relief). Because most real-life emotion episodes seem to consist of rapid sequences of changing emotional states (see also Scherer & Tannenbaum, 1986), it is necessary to make the requirement that appraisal reports need to be focused on one clearly defined point or time slice in the emotion episode more apparent to the users of the expert system. One possibility would be to ask the users to segment the recalled emotion episode into several clearly distinguishable segments and to report the appraisal process separately for each of these segments. In this case, GENESE could attempt to diagnose a sequence of emotions rather than an overall state.

One of the major sources for possible errors in reporting the recalled appraisal results is the wording of the questions. For example, even the use of the word "consequence" might have the effect of focusing the respondent's attention on the aftermath of the emotion episode rather than the crucial period of appraisal at the onset of an emotion-eliciting event.
This would obviously lead to a reporting of appraisal results from totally different time periods in the emotion episode (and thus render an accurate expert system diagnosis impossible). This problem is one that the expert system approach shares with all other research paradigms in appraisal research that attempt to elicit verbal report of appraisal processes via questionnaires or interviewing. The process of appraisal is clearly non-verbal and probably occurs largely outside of awareness. Thus, the attempt to obtain a verbal report of many fine details from recall of a process that generally occurs in a split second is obviously fraught with many dangers.

A particular problem is the conceptualisation of some of the major appraisal dimensions. In the process of developing GENESE it became clear that many subjects had great difficulty in understanding the concept of goal conduciveness (even in the simple formulation used in question 6, Table 3). In the further development of GENESE much attention will have to be paid to this problem. Providing copious HELP screens that the respondent can call up to get more information about a particular question is part of such efforts to avoid noise in the data that is due to incorrect responding to the questions.

The discussion of the results has attempted to show how the expert system approach yields precise suggestions as to where the theoretical assumptions need sharpening or modification. For example, the low accuracy for anxiety and fear clearly indicated the need to add a dimension or check likely to capture the future orientation of the respective appraisals. In consequence, the check of "outcome probability" (or certainty) has been added to the revised version of GENESE (and is given strong weight). First informal observations of some trial runs seem to show that the accuracy of GENESE in diagnosing these emotions has improved quite dramatically. Further studies like the one described here will be necessary to fine tune the prediction vectors with respect to the question of how many and which specific types of appraisal dimensions are required, and how they should be weighted, to satisfactorily diagnose the emotional states reported by the users of the system (see also Frijda & Swagerman, 1987).

Obviously, the expert system approach could provide a principled way of comparing rival appraisal theories and bring about further convergence. The requirement for using this approach in critical experiments opposing different theories is that pertinent questions for the hypothesised appraisal dimensions or criteria can be formulated and that explicit, quantified predictions for an overlapping set of emotion concepts are made by each of the respective theories. In principle, these requirements could be met by most, if not all, of the appraisal theories reviewed in the introduction. Although the present version of GENESE is based on the determination of Euclidean distance in a vector space, it is certainly feasible to implement a configurational, rule-based algorithm if that were to be preferable for a comparison between theories.

The automatic computer-based administration of GENESE allows for easy and economical administration of the procedure to large numbers of subjects, providing a high degree of anonymity.
In consequence, the system seems to be well suited to collecting large sets of data that would allow predictions to be based, at least in part, on stable empirical patterns. Although some scholars in this area seem convinced that theoretical predictions should be made totally independently of empirical evidence, the present author believes that theory development and refinement must occur in a constant interactive process with empirical data collection. Thus, the predictions made on the basis of the stimulus evaluation check notion of the component process model (Scherer, 1984a,b, 1986, 1988) will change as a result of continuous empirical research. Concretely, the empirically found input patterns (as aggregated over many respondents) for the emotions reported in the study above, in so far as errors in answering the questions can be excluded, will be used to modify the theoretical prediction vectors in the standard version of GENESE.

The development and use of the expert system as a tool for refining theory has just started and new ways of making use of the information provided by the system are being explored. Although at present only the experienced intensity of the emotion is to be judged, future versions of the system will contain additional questions on the duration of the emotion episode and on expressive and psychophysiological responding. This should allow the relationship between appraisal patterns on the one hand and specific response patterning on the other to be examined. One possible use of this procedure might be to determine to what extent theoretical predictions only work in situations characterised by particular response profiles. Such refinement may also help to study the issue of pure vs. blended emotions.

More generally, GENESE might allow empirical access to the issue of whether there are basic or fundamental emotions and how many of them there are. Users can specify new emotion concepts not contained in the "knowledge base" if neither of two attempts at a diagnosis has provided a satisfactory classification. The name given to this state is then associated with the input vector provided for the respective appraisal. Once a large number of such added emotion concepts has been obtained, one can determine which states (as defined by highly similar appraisal vectors) recur very frequently and ought to be added to the basic version of the system. In addition, these data allow the labelling used for specific appraisal patterns to be studied in a more inductive fashion.

GENESE also allows us to study individual differences in the appraisal process. Because information about age and gender is obtained, it will be possible, given a large number of respondents, to investigate the effect of these variables on the appraisal patterns reported for specific emotional experiences. More background information could be obtained to refine this kind of analysis. Even more importantly, as the system is able to learn, i.e. modify the appraisal vectors on the basis of the empirical input (see the section on the design of GENESE earlier), it is also possible to determine user-specific emotion appraisal patterns. For example, a group of users could be asked to use the system repeatedly over a period of some months, entering each week some of the major emotions that occurred.
It would then be possible to compare the resulting matrices, which have been adjusted to the empirical appraisal pattern input for each situation, in order to find interindividual differences in emotion-antecedent appraisal. This might provide interesting insights into the issue of habitual emotionality and may even lead to a better understanding of moods or affective disturbance.

Manuscript received 23 March 1992
Revised manuscript received 15 August 1992

REFERENCES

Arnold, M.B. (1960). Emotion and personality. Vols 1 and 2. New York: Columbia University Press.
De Rivera, J. (1977). A structural theory of the emotions. Psychological Issues, 10(4), Monograph 40.
Ekman, P. (1984). Expression and the nature of emotion. In K.R. Scherer & P. Ekman (Eds), Approaches to emotion. Hillsdale, NJ: Lawrence Erlbaum Associates Inc, pp. 319-344.
Ekman, P. (1992). An argument for basic emotions. Cognition and Emotion, 6, 169-200.
Ellsworth, P.C. & Smith, C.A. (1988). From appraisal to emotion: Differences among unpleasant feelings. Motivation and Emotion, 12, 271-302.
Folkman, S. & Lazarus, R.S. (1985). If it changes it must be a process: Study of emotion and coping during three stages of a college examination. Journal of Personality and Social Psychology, 48, 150-170.
Frijda, N. (1986). The emotions. Cambridge University Press.
Frijda, N.H. (1987). Emotion, cognitive structure, and action tendency. Cognition and Emotion, 1, 115-143.
Frijda, N.H., Kuipers, P., & ter Schure, E. (1989). Relation among emotion, appraisal, and emotional action readiness. Journal of Personality and Social Psychology, 57, 212-228.
Frijda, N. & Swagerman, J. (1987). Can computers feel? Theory and design of an emotional system. Cognition and Emotion, 1, 235-258.
Frijda, N.H., Mesquita, B., Sonnemans, J., & van Goozen, S. (1991). The duration of affective phenomena, or emotions, sentiments and passions. In K. Strongman (Ed.), International review of emotion and motivation. New York: Wiley, pp. 187-225.
Gehm, Th. & Scherer, K.R. (1988). Relating situation evaluation to emotion differentiation: Nonmetric analysis of cross-cultural questionnaire data. In K.R. Scherer (Ed.), Facets of emotion: Recent research. Hillsdale, NJ: Lawrence Erlbaum Associates Inc, pp. 61-78.
Izard, C.E. (1977). Human emotions. New York: Plenum.
Johnson-Laird, P.N. & Oatley, K. (1989). The language of emotions: An analysis of a semantic field. Cognition and Emotion, 3, 81-123.
Lazarus, R.S. (1968). Emotions and adaptation: Conceptual and empirical relations. In W.J. Arnold (Ed.), Nebraska Symposium on Motivation, Vol. 16. Lincoln, NE: University of Nebraska Press, pp. 175-270.
Lazarus, R.S. (1984a). Thoughts on the relations between emotion and cognition. In K.R. Scherer & P. Ekman (Eds), Approaches to emotion. Hillsdale, NJ: Lawrence Erlbaum Associates Inc, pp. 247-257.
Lazarus, R.S. (1984b). On the primacy of cognition. American Psychologist, 39, 124-129.
Lazarus, R.S. & Smith, C.A. (1988). Knowledge and appraisal in the cognition-emotion relationship. Cognition and Emotion, 2, 281-300.
LeDoux, J.E. (1987). Emotion. In F. Plum & V. Mountcastle (Eds), Handbook of physiology. Nervous system, Vol. 5: Higher functions. Washington, DC: American Physiological Society, pp. 419-459.
LeDoux, J.E. (1989). Cognitive-emotional interactions in the brain. Cognition and Emotion, 3, 267-289.
LeDoux, J.E., Farb, C., & Ruggiero, D.A. (1990). Topographic organization of neurons in the acoustic thalamus that project to the amygdala. The Journal of Neuroscience, 10, 1043-1054.
Leventhal, H. & Scherer, K.R. (1987). The relationship of emotion and cognition: A functional approach to a semantic controversy. Cognition and Emotion, 1, 3-28.
Manstead, A.S.R. & Tetlock, P.E. (1989). Cognitive appraisals and emotional experience: Further evidence. Cognition and Emotion, 3, 225-240.
Mauro, R., Sato, K., & Tucker, J. (1992). The role of appraisal in human emotions: A cross-cultural study. Journal of Personality and Social Psychology, 62, 301-317.
Mees, U. (1985). Was meinen wir, wenn wir von Gefühlen reden? Zur psychologischen Textur von Emotionswörtern. Sprache und Kognition, 1, 2-20.
Ortony, A., Clore, G.L., & Collins, A. (1988). The cognitive structure of emotions. Cambridge University Press.
Reisenzein, R. & Hofmann, T. (1990). An investigation of dimensions of cognitive appraisal in emotion using the repertory grid technique. Motivation and Emotion, 14, 1-26.
Reisenzein, R. & Schönpflug, W. (1992). Stumpf's cognitive-evaluative theory of emotion. American Psychologist, 47, 34-45.
Roseman, I.J. (1984). Cognitive determinants of emotion: A structural theory. In P. Shaver (Ed.), Review of personality and social psychology, Vol. 5. Emotions, relationships, and health. Beverly Hills, CA: Sage, pp. 11-36.
Roseman, I.J. (1991). Appraisal determinants of discrete emotions. Cognition and Emotion, 5, 161-200.
Roseman, I.J., Spindel, M.S., & Jose, P.E. (1990). Appraisal of emotion-eliciting events: Testing a theory of discrete emotions. Journal of Personality and Social Psychology, 59, 899-915.
Scherer, K.R. (1981). Wider die Vernachlässigung der Emotion in der Psychologie. In W. Michaelis (Ed.), Bericht über den 32. Kongreß der Deutschen Gesellschaft für Psychologie in Zürich 1980. Göttingen: Hogrefe, pp. 304-317.
Scherer, K.R. (1982). Emotion as a process: Function, origin, and regulation. Social Science Information, 21, 555-570.
Scherer, K.R. (1983). Prolegomena zu einer Taxonomie affektiver Zustände: Ein Komponenten-Prozeß-Modell. In G. Lüer (Ed.), Bericht über den 33. Kongreß der Deutschen Gesellschaft für Psychologie in Mainz. Göttingen: Hogrefe, pp. 415-423.
Scherer, K.R. (1984a). On the nature and function of emotion: A component process approach. In K.R. Scherer & P. Ekman (Eds), Approaches to emotion. Hillsdale, NJ: Lawrence Erlbaum Associates Inc, pp. 293-317.
Scherer, K.R. (1984b). Emotion as a multicomponent process: A model and some cross-cultural data. In P. Shaver (Ed.), Review of Personality and Social Psychology, Vol. 5. Beverly Hills, CA: Sage, pp. 37-63.
Scherer, K.R. (1986). Vocal affect expression: A review and a model for future research. Psychological Bulletin, 99, 143-165.
Scherer, K.R. (1988). Criteria for emotion-antecedent appraisal: A review. In V. Hamilton, G.H. Bower, & N.H. Frijda (Eds), Cognitive perspectives on emotion and motivation. Dordrecht: Nijhoff, pp. 89-126.
Scherer, K.R. (1993). Neuroscience projections to current debates in emotion psychology. Cognition and Emotion, 7, 1-41.
Scherer, K.R. & Tannenbaum, P.H. (1986). Emotional experiences in everyday life: A survey approach. Motivation and Emotion, 10, 295-314.
Scherer, K.R. & Wallbott, H.G. (Submitted). Evidence for universality and cultural variation of differential emotion response patterning. University of Geneva.
Scherer, K.R., Wallbott, H.G., Matsumoto, D., & Kudoh, T. (1988). Emotional experience in cultural context: A comparison between Europe, Japan, and the USA. In K.R. Scherer (Ed.), Facets of emotion: Recent research. Hillsdale, NJ: Lawrence Erlbaum Associates Inc, pp. 5-30.
Scherer, K.R., Wallbott, H.G., & Summerfield, A.B. (Eds) (1986). Experiencing emotion: A cross-cultural study. Cambridge University Press.
Smith, C.A. & Ellsworth, P.C. (1985). Patterns of cognitive appraisal in emotion. Journal of Personality and Social Psychology, 48, 813-838.
Smith, C.A. & Ellsworth, P.C. (1987). Patterns of appraisal and emotion related to taking an exam. Journal of Personality and Social Psychology, 52, 475-488.
Solomon, R.C. (1976). The passions. The myth and nature of human emotion. Garden City, NY: Doubleday.
Tesser, A. (1990). Smith and Ellsworth's appraisal model of emotion: A replication, extension, and test. Personality and Social Psychology Bulletin, 16, 210-223.
Tomkins, S.S. (1984). Affect theory. In K.R. Scherer & P. Ekman (Eds), Approaches to emotion. Hillsdale, NJ: Lawrence Erlbaum Associates Inc, pp. 163-196.
Wallbott, H. & Scherer, K.R. (1986). How universal and specific is emotional experience? Social Science Information, 26, 763-795.
Weiner, B. (1982). The emotional consequences of causal attributions. In M.S. Clark & S.T. Fiske (Eds), Affect and cognition. The 7th Annual Carnegie Symposium on Cognition. Hillsdale, NJ: Lawrence Erlbaum Associates Inc, pp. 185-209.
Weiner, B. (1986). An attributional theory of motivation and emotion. New York: Springer.
Zajonc, R.B. (1980). Thinking and feeling: Preferences need no inferences. American Psychologist, 35, 151-175.
Zajonc, R.B. (1984). On primacy of affect. In K.R. Scherer & P. Ekman (Eds), Approaches to emotion. Hillsdale, NJ: Lawrence Erlbaum Associates Inc, pp. 259-270.
Zajonc, R.B. & Markus, H. (1984). Affect and cognition: The hard interface. In C.E. Izard, J. Kagan, & R.B. Zajonc (Eds), Emotions, cognition, and behavior. Cambridge University Press, pp. 73-102.
The path that the organism travels through the image is defined by a priori knowledge about the environment and how it should move in it. Here, we modeled the control and perception centers of the organism, as well as the simulation of its actions and their effects on the environment. To demonstrate the efficiency of our method, quantitative and qualitative results of the enhancement of synthetic and real images with low contrast and different levels of noise are presented. The obtained results confirm the ability of the new artificial life model to improve the contrast of the objects in the input images.

Keywords: Image Processing; Image Enhancement; Artificial Intelligence; Artificial Life Model.

1. Introduction

The acquisition, transmission and compression processes of images can damage them, making it difficult, for example, to locate and extract information about the represented objects. Image enhancement techniques have been developed to soften the effect of this damage. These techniques aim to improve the quality of the images and the contrast between the represented objects, highlighting their most significant features and improving the visual perception of relevant features. Moreover, they allow the images to be represented more appropriately for further analysis by computational methods. There are many factors that can contribute to the loss of contrast, such as loss of focus, noise, reflections, shadows and insufficient illumination of the environment during the image acquisition process [Kao et al., 2010].

The enhancement of degraded images can be achieved using different techniques, which can be divided into techniques based on the spatial domain, the ones that directly manipulate the pixels of the damaged image, and techniques based on the frequency domain, which work with the image frequency information, generally obtained by Fourier and Wavelet transforms [Hanmandlu et al., 2003, Trifas et al., 2006]. The image enhancement techniques applied in the spatial domain are widely used and try to manipulate the image pixels directly, performing computational operations on these pixels based on their values. For example, considering an image in gray scale, where the intensities can take values from 0 (zero) to 255, we may verify that the intensities of the original image use only a reduced range of these values. So, the enhancement methods in the spatial domain try to redistribute the intensities in a way that they use a wider range of values, improving the perception of the different objects in the image. Some of the traditional techniques applied in the spatial domain are based on histogram equalization, normalization, quadratic enhancement, square-root enhancement and logarithmic enhancement [Gonzalez and Woods, 2006] (a minimal example is sketched below). Enhancement in the frequency domain has its techniques based on the convolution theorem and can be interpreted as an extension of enhancement in the spatial domain, with its operations applied to the image frequencies, usually obtained by the application of a transform, such as the Fourier Transform. After the frequency manipulation, an inverse transform is applied so that the image can be represented again in the spatial domain. This kind of enhancement is generally obtained through filtering techniques, such as the high-pass filter [Gonzalez and Woods, 2006].
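As an illustration of the classical spatial-domain operations just listed, the sketch below applies a logarithmic enhancement to an 8-bit grayscale image. This is a textbook transform included only for context; it is not the artificial life method proposed in this paper, and the function name is an illustrative choice.

```python
import numpy as np

def log_enhance(gray):
    """Logarithmic enhancement of an 8-bit grayscale image: low (dark)
    intensities are spread over a wider range, bright ones are compressed."""
    gray = gray.astype(np.float64)
    c = 255.0 / np.log1p(max(float(gray.max()), 1.0))   # scale so the output spans 0..255
    return np.clip(c * np.log1p(gray), 0, 255).astype(np.uint8)
```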
Recent studies have adopted models based on artificial life to perform computational image processing and analysis tasks, obtaining effective results. Examples include brain segmentation in magnetic resonance images [McInerney et al., 2002, Farag et al., 2007, Feng et al., 2008, Prasad et al., 2011a, Prasad et al., 2011b], and feature extraction and classification from medical images [Srinivasan and Sundaram, 2013]. The use of these artificial life models, better known in computational image processing, aims to build a deformable "living" model provided with a "primitive brain" capable of making decisions in search of the best results in the segmentation process.

The objective of this paper is to present a new image enhancement method based on artificial life, built on the modelling of an organism, in particular its control and perception centers. Contrary to what is common in image segmentation processes using artificial life models, our model is not inspired by the organism's shape, that is, the geometry of the organism's body, but by its behavior when it is located in a certain environment and performs the food selection process. The proposed enhancement method is intended to enhance images with low contrast and images affected by noise, highlighting the transitions between the represented objects while avoiding the enhancement of the noise that affects the original image quality. The results obtained with the developed model allow us to conclude that the method is promising, being capable of considerably improving the visual perception and the quality of the affected images. Another contribution of this work is the innovative use of artificial life models as image enhancement techniques. This paper is organized as follows: in the next section, a review of existing artificial life models is presented; in section 3, the proposed model is described; experimental results are presented and discussed in section 4, followed by the conclusions and suggestions for future work.

2. Review about methods based on Artificial Life

Studies related to computational image processing and analysis try to develop methods capable of manipulating complex images with different features in an efficient, robust and reliable way. Various computational methods for different tasks can be found in the literature, such as noise removal [Chen et al., 2010, Cao et al., 2011, Zhang et al., 2010a], enhancement [Cheng and Xu, 2000, Cheng et al., 2003, Lai et al., 2012, Cho and Bui, 2014], segmentation [Xue et al., 2003, Nomura et al., 2005, Ma et al., 2010a], tracking [Pinho and Tavares, 2009, Nguyen and Ranganath, 2012], registration [Oliveira et al., 2011, Oliveira et al., 2010] and recognition [Papa et al., 2012]. Besides the diverse tasks that can be carried out by these methods, many techniques have been used, such as Genetic Algorithms [Wang and Tan, 2011, Hashemi et al., 2010, Yi Kim and Jung, 2005], Artificial Neural Networks [Petersen et al., 2002, Shkvarko et al., 2011], Active Contours [Ghita and Whelan, 2010, Ma et al., 2010b], Region Growing [Fan et al., 2005, Peter et al., 2008], and models based on differential equations and finite elements [Chao and Tsai, 2010, Yu et al., 2010]. A research field based on artificial life has emerged in the computer graphics area for modeling animation behavior and the evolution of plants and animals [Kim and Cho, 2006].
This research area has also formed the basis for new research in computational image processing and analysis [McInerney et al., 2002, Farag et al., 2007, Feng et al., 2008, Prasad et al., 2011a, Prasad et al., 2011b, Osuna-Enciso et al., 2013, Horng, 2011], giving rise to the methods called image processing methods based on artificial life models. The artificial life models address the diverse biological processes that characterize living species in an attempt to overcome the problems found in image processing, such as the complexity of the represented objects and the low contrast between them and the rest of the represented scene. The majority of the artificial life models used in this area apply techniques based on physical and geometrical principles, aiming to make the used deformable models [Ma et al., 2010b] more flexible and capable of a better exploitation of the image, using a priori knowledge of the area associated with the images and analyzing the neighborhood of the active model used to control the deformation in a more adequate and robust way. Making an analogy with artificial life systems, the geometry of the deformable model is generally considered as being an "organism" (a worm) that keeps changing according to the available a priori knowledge, associated with the information that its sensorial organs return while the organism changes its shape and moves through the image (or environment). Despite this being the most common use of artificial life models in image processing, there are many other biological processes that can be used as a basis for such models, such as growth, natural selection, evolution, locomotion and learning [Kim and Cho, 2006, McInerney et al., 2002]. Another important issue is that the models based on artificial animal life are more relevant to image processing techniques, because they present bodies with motor and sensorial organs and, mainly, a brain with learning (cognitive system), perception and behavioral centers [McInerney et al., 2002, Horng, 2011]. The motor center coordinates the muscular actions to perform specific actions, such as locomotion and the control of the sensorial center. The latter is used so that the artificial organism acquires information about what happens around it and sends this information to the cognitive center, which decides which actions must be performed to obtain the best results according to each situation. The perception center is the part of the brain that allows the organism to have its sensors trained, being formed by mechanisms that allow it to acquire and understand sensorial information about the environment that surrounds it. This behavior is useful to perceive the modifications that occur during the processing, since the individual actions can produce significant changes in the environment. The learning center allows the individual to learn how to control all its organs and its behavior through practice. The behavioral center holds the routines with the actions that the individual must perform, considering its perception.

For the creation of an artificial life model, the work-tasks pyramid shown in Figure 1 is usually followed.

(Insert Figure 1 about here)

Observing the pyramid in Figure 1 from the bottom up, there is the organism's geometric modeling at its base. In this layer, the organism's type is defined, as well as its appearance and geometric morphology.
In the second layer, there is the physical modeling where the biomechanical principles are modeled for simulation and formation of the biological tissues, such as muscles. The next layer incorporates the motor control routines, which are responsible for stimulating the muscle actuators 6 to allow the locomotion movements. In the fourth layer, there is the behaviour and perception modeling, with an important part, for example, in detecting and navigating between obstacles. And on the top of the pyramid, the cognitive layer, which is responsible for controlling the acquired knowledge by the organism during the learning process, as well as the planning of the actions to be performed with some level of intelligence. 3. Proposed Model The computational image processing based on artificial life models uses an analogy between biological processes from certain organisms and the desired operations to do on the images being processed. One of the objectives of these models is to automate the applied computational methods, making them more robust, efficient and automated to deal with the existing variations in the involved image set. The organisms used in the models have features that deserve highlighting: they are endowed with a prior knowledge about some features of the environment, namely from the scene, in which they are found; they have sensors that operate in making decisions about the actions to be performed during processing; they have a set of actuators that in association with a knowledge system, allows the organism to adapt to the environment according to the collected data from the sensors. Some models based on artificial life, such as the ones based on worm, called deformable organisms, have been successfully used in the image segmentation; as in, for example, [McInerney et al., 2002, Farag et al., 2007, Prasad et al., 2011b]. Usually these models have an analogy to worms, considering their great flexibility and their subsequent facility to deformation. Thus, each organism has a "skeleton" defined to describe its initial shape. When the organism is placed in an image, it starts a search for a compatible region with the geometry of its skeleton and with the a priori knowledge that it has about the input scene. When a compatible object is found, a set of sensors and 7 actuators start to work, deforming the worm’s "skeleton" evolving this “skeleton” in order to fully enclose the object’s region in the input image. Although this technique of artificial intelligence has been wide used for image segmentation with considerably success, it has not been so common for image enhancement. Hence, we extend it use to enhance the contrast of images by proposing a new artificial life model. The adopted model is not based on analogy with the organism’s body shape, but on the behaviour of a herbivore organism when it is in a specific environment and performs the process of selection of its food. The organism’s body shape was not considered in this approach because what matters here is the effect that the behaviour of this organism produces in the environment, and the shape of its body has no direct influence on the operations considered in this work. In Figure 2, there is the flow diagram of the proposed method, with the steps of the method described in detail in the following sections. (Insert Figure 2 about here) 3.1. 
Model Description The low contrast images problem can be reduced using computational methods for image enhancement, making the difference between different objects more enhanced, allowing us to obtain better results in the later stages of image processing and analysis, such as segmentation and features extraction. Considering a grayscale image, it is desired that the pixels with low and high intensities are represented so that their intensities are as distinct as possible. Based on this principle, a good enhancement method in the spatial domain is the one able to verify all pixels of an original image, and recalculate their intensities taking into account the neighbours so that the intensity differences between the pixels belonging to neighbouring and distinct regions is increased. Thus, the model based on artificial life used in the proposed image enhancement method was inspired by the behaviour of the herbivorous organisms when they are in a specific environment and perform 8 the selection process of its food. To this end, it was considered a natural pasture as environment. These considerations have resulted from the observation that while feeding on a natural pasture, these organisms make a selection of the food, setting a "priority order" of the food to be eaten first. If we consider an area composed by a single type of food with different heights, there is a tendency that the smaller food will be eaten first. Usually this happens because the smaller parts are smoother and more nutritious than the bigger ones. Furthermore, the organism is not fixed in a single point until it eats all of the food available there. In general, it is always moving in the environment while feeding itself, guided by its cognitive system using the information collected about the food around it. Thus, the difference between the small and big foods present in a given environment tends to become more evident, at the same way as it is desired in the image enhancement operation. In the built model, the organism moves iteratively on the image until a threshold for the height of the food is reached. 3.2. Modeling the control center The control center has been modeled from the movement operations of the organism, and reduces the height of the food as a result of the eating process performed by organism. Two control operations are defined in this step: moving and reducing the height ("eat"). The movement consists in moving the organism from one point to another one on the image, avoiding passing through the same point more than one time at the same iteration. In each iteration, all pixels of the image are visited and their intensities are recalculated. To simulate how the organism feeds, the intensity of each point is reduced by considering the following assumptions: usually the organism eats a bit of food that is in the region where it is, but it does not always eat all the food at the same time; the smaller foods are fresher and more nutritious, and they have priority to be consumed. These assumptions are part of the a priori knowledge set that the organism has about their actions and the environment. So, considering a local analysis on the image, the smaller its current value, the bigger 9 the reduction of the intensity value of each pixel; i.e., in a very bright pixel a small reduction occurs while in darker pixels a greater reduction, in proportion to its size, is performed in each iteration. 
To model this behavior, we adopted the following rational function: ,)( 2 kx kx xf + = (1) where 255...,,1,0=x is the possible intensity values for pixels in the 8-bits images, and k is a positive constant used to control how much of the food height should be reduced at each iteration. The higher the value of k, the faster the food is reduced and less time is spent to reach the stop criterion. However, very high values for k cause similar regions to merge quickly, and distinct objects may be wrongly joined. On the other hand, if its value is too low, the growth of the food can be greater than the amount that is eaten, so the different objects represented in the image can be joined to the image foreground. As the intensities of the pixels in the input images range from 0 to 255, k was defined as equal to the maximum possible value, i.e., 255=k . As such, if the intensity of a pixel is 0 (zero), the reduction is also 0 (zero), but if the intensity is 1 (one), then the reduction is almost equal to the total height associated to the pixel. This behavior is repeated for all pixels with low intensity, and for pixels with high values, the reduction is smaller. It should be noted that, in the images considered in this work, the pixels belonging to the regions of interest are almost black. For this reason, it was considered that the best “food” (i.e., the darkest pixels) has the maximum possible height, which was experimentally considered as equal to 10% of the maximum available height. The function of Equation 1 generates a curve which shows a more significant loss for the pixels with smaller intensity, while those with bigger intensities are less affected, Figure 3. Thus, the darkest pixels tend to darken faster than the lighter ones, increasing the difference between them and, consequently, improving the enhancement of the image associated area. (Insert Figure 3 about here) 10 3.3. Modeling the perception center The perception center was modeled according to the a priori knowledge of the organism's behaviour, considering the information about the neighbourhood acquired by the organism. The perception center will act primarily to receive the height information of the neighboring points of the organism; i.e., the intensity level, and which points have been visited in a given iteration. This information is important to choose the way for the organism to go. To determine how to move in the environment, all 8 (eight) pixels neighboring the pixel occupied by the organism are visited, and at the same time that the intensity of the visited pixel is reduced, the organism stores the coordinates of the pixel with smaller intensity, which is added to its path at the end of the visit to the neighboring pixels. In addition, the visited pixels are marked as visited, being released for further visits after the end of the current iteration. This operation allows the organism to walk in a coordinated way in the image, following the darker regions. However, it may happen that the organism is isolated in a pixel whose all the neighbors have already been visited. In this case, the organism carries out a search for the nearest pixel that has not yet been visited and continues processing. This search for the nearest pixel ensures that all pixels on the image are visited, and have their values updated at each iteration. 3.4. 
Effects of the organism’s actions in the environment The organism walking process in the environment following the points with best food generates a secondary effect, which is the damage to the bigger foods during the movement process, since the organism moves over them. This effect is much common because the big foods are less flexible and are frequently broken when they are pressed, and the broken parts are usually discarded because when food is damaged, it loses its nutritive capability, and it will hardly be consumed by the organism. 11 Another interesting effect that happens with the adopted model is a consequence of the time that the organism needs to eat all the food available. Under the considered conditions in the used analogy, many days are usually needed for this process to be performed. So, as the food is a natural herb, the pieces not yet fully eaten tend to grow as time goes by. However, very big foods and even those small ones tend to be broken rather than grow, and the high ones break due to their low flexibility, as already previously explained, and the small ones do too because they are young and have low resistance. Thus, only those foods with middle height usually grow effectively. So to simulate these effects the function of Equation 2 was adopted, because this one represents an undulating curve, ranging from the approximate range of ],[ 21 yy− , where 1y and 2y belong to the set of real numbers, as can be seen from the curve of Figure 4. In that curve, the values 1y and 2y are the smallest and the biggest values along the Y-axis, respectively. This curve allows the performance of an approximate simulation of the degrading operations of the big and small foods and the growth of the intermediate foods according to the course of time: i4 2 sin 3 180 4 f( ) . 180 i i x x x π π π π   +   = (2) (Insert Figure 4 about here) 3.5 Stopping Criterion In the implemented method, we tried to integrate an automatic stopping criterion based on a threshold defined from the height of the food present in the environment. For this purpose, while the organism moves in the environment, the heights of the foods are summed and compared with the sum of the heights in the input image, and the processing is finished as soon as at least 30% of the 12 height of all foods has been eaten. This threshold was defined experimentally for the images to be studied. 4. Results and Discussion To test the efficiency of the proposed model, it was necessary to take images where it is possible to quantitatively measure the loss of contrast in the output image when compared with the ideal image, as well as the restoration rate obtained from the application of the proposed enhancement method. For this purpose, 8 images were used for testing, where 4 grayscale images were synthetically created and the other 4 were real ones, including the traditional images "Lena" and "Cameramen". After that, each image was damaged twice, resulting in 16 grayscale images with changed contrast. Both reduction contrast processes were applied over the original images, and in each of them a Gaussian noise was added with intensity equal to 0.3, and in one of these reduction contrast processes a circular median filter was applied with radius equal to 2 for blurring the image before applying the noise. 
The results of the image enhancement obtained from the images with the contrast affected were visually and statistically analysed based on the analysis of the indexes returned by PSNR (Peak Signal Noise Ratio), SSIM (Structutal Similarity) and Detail Variance and Background variance (DV-BV) metrics. The PSNR and SSIM indexes were calculated comparing the original image and the images resulting from the controlled degradation process. In addition, we analyzed the profile of a random line of the images, and compared the proposed method with other enhancement methods traditionally used in the spatial domain: normalization, histogram equalization, square enhancement, square root enhancement and logarithmic enhancement methods [Gonzalez and Woods, 2006]. 4.1. Adopted evaluation metrics 13 Generally, the comparison between two images is a natural task for the human visual system, but the realization of this same task for a computer system is more complex and not so natural. However, there are many studies attempting to provide techniques able to compare images, including techniques to statistically compare the performance of image processing methods [Wang et al., 2004, Ramponi et al., 1996]. 4.1.1 Analysis based on error The indexes of comparison based on error try to estimate the perceived error between the processed image and the original image to quantify the image degradation. The techniques of this category have as their main disadvantage the fact that they can fail in cases where translations in the images happen. For example, similar images where one of the objects has been displaced can be classified as distinct by this process. Moreover, these indexes perform the comparison based on the variation of the intensities of the pixels in the images, and images with different types of distortion can have similar indexes [Wang et al., 2004]. Despite of that, the indexes based on error are often used to compare the quality of image enhancement [Hashemi et al., 2010, Shkvarko et al., 2011, Ghita and Whelan, 2010] and image smoothing [Chen et al., 2010, Jin and Yang, 2011] methods, due to their simplicity and the fact that these images are usually affected by a few displacements during the computational processing. Some of the most known indexes based on error are the PSNR (Peak Signal Noise Ratio), RMS (Root Mean Square), MSE (Mean Squared Error) and ReErr (Relative Error) [Wang et al., 2002]. However, the most commonly used to analyze the performance of the restoration and enhancement methods is the PSNR, which attempts to calculate the relationship between the highest possible force strength of a signal (in the case of the image it is the highest intensity value) and its strength affected by noise [Dash et al., 2011, Yang et al., 2009]. In this case, the PSNR is represented in 14 function to a logarithmic scale on the base 10 (decibel), because some signals have a very high value. The PSNR can be calculated from the MSE, which can be defined as: [ ]∑ ∑ −= − = − = 1 0 21 1 ),(),( 1 m i n j r jiIjiInm MSE , (3) where m and n are the dimensions of the input image, I is the original image, and rI is the affected image by processing or by any artifact. From the MSE index, the PSNR can be calculated as follows:       = MSE MAX PSNR I 2 10log10 , (4) where IMAX is the maximum value that a pixel can be. To the grayscale images represented by 8 bits, for example, 255=IMAX . During the interpretation of the PSNR index, the higher its value, the more similar are the two compared images. 
It is important to note that for identical images, the MSE index value will be zero, and thus the PSNR is undefined. 4.1.2 Analysis based on structural information This approach attempts to note the changes in the structural information of the image to quantify the occurred degradation. The analysis of the structural information assumes that the human vision system is adapted for extracting structural information from what is seen, searching for changes in these structures to detect changes and consequently, possible degradation generated by some process [Wang et al., 2002]. The SSIM index (Structutal Similarity) is an index of this class most often used to analyze the quality of computational methods for image processing [Zhang et al., 2010b, Chen et al., 2010]. The SSIM index was proposed by Wang and colleagues [Wang et al., 2004] in an attempt to avoid that the images with very different visual qualities have the same index, as can happen in the 15 indexes based on error. This index is calculated based on three components of comparison: luminance, contrast and image structure. The first parameter is calculated from the average intensity of the signal, in this case the input image. The contrast is calculated from the standard deviation, and the structure parameter is computed from the normalized image using the standard deviation of the same image. So, the SSIM index can be obtained using the equation: ,),(.),(.),(),( γβα yxsyxcyxlyxSSIM = (5) where, 0>α , 0>β and 0>γ are constant parameters used to define the weight of each component in the final index. The component l refers to the luminance, c to the contrast and s to the structure. The three components are relatively independent, and changes in one of them do not affect the other ones. In [Wang et al., 2004] a more detailed analysis of each component is presented, showing in detail how they are calculated. The SSIM value shows an index for each pixel of the image and, to make its use more practical, it is common to use a mean SSIM index, also called MSSIM, which is calculated from the average of the elements obtained from SSIM. For equal images, this index is equal to 1 (one positive), being reduced as the images differ, reaching the value -1 (negative one) for two images exactly opposite (one image is the negative of the other one). 4.1.3 Detail Variance and Background Variance values The Detail Variance (DV) and Background variance (BV) values are used to give indications about the level of enhancement in an image, without necessarily using the other image as a benchmark. These values are calculated from the local variance of the n neighbors of all image pixels, creating a matrix of variances in a first step. Afterwards, each pixel is classified into two classes: the variance of each pixel is compared to a threshold, and if it is greater than the threshold value, the pixel is classified as belonging to the image foreground, otherwise the pixel belongs to the image background. 16 After classifying all pixels of the image, the mean variance of the pixels belonging to two classes are calculated, and respectively called DV and BV. To check the level of enhancement applied to the image, these two values are analyzed as follows: if the DV index of the processed image increases when it is compared to the DV index of the input image, while the BV value suffers a little change, it is considered that the image was efficiently enhanced. 
In this work, we adopted a neighborhood of n n× neighbors, and the threshold was calculated using the Otsu method [Osuna- Enciso et al., 2013]. 4.2. Experimental Tests In Figures 5, 6 and 7, the results obtained by applying the square root enhancement method are presented (c); quadratic enhancement method (d); logarithmic enhancement method (e); normalization method (f); histogram equalization method (g); and proposed enhancement method (h), on the input image with reduced contrast (b). In each figure, the image (a) represents the original image and the images together with the symbol (-) represent the difference between the processed images and their respective original images, for example, the image (b-) is the difference between the images (b) and (a), (c-) is the difference between the images (c) and (a), and so on. Note in these figures that the proposed method (figures 5h, 6h and 7h) allowed to enhance properly the objects represented in the input images (figures 5b, 6b and 7b, respectively), restoring the transition details between such objects, as can be seen with more accuracy in image (h) of Figure 5. In addition, our method allowed to maintain the transitions in the areas with linear transitions of the gray levels, as noted in image (h) of Figure 6. Analyzing the difference between the processed images (b-, c-, d-, and f-) and their respective original images, it is possible to note that the proposed method returned images more similar to their respective original images (figures 5a, 6a and 7a). Note that the difference between two identical images generates an image in black (image 17 a-), i.e., the darker the image resulting from the difference, the more the processed image is similar to the original image, indicating a better result. (Insert Figure 5 about here) (Insert Figure 6 about here) (Insert Figure 7 about here) In Figure 8, the results of the same enhancement methods applied in 4 real images are presented, including images "Lena" and "Cameramen" with contrast affected by the addition of synthetic Gaussian noise. The respective PSNR and SSIM indexes for the 4 images are presented in Tables 1 and 2, in lines 13, 14, 15 and 16, respectively. It is possible to note from the analysis of the indexes and the images of Figure 8 that the proposed method obtained promising results, being able to restore the contrast in the input images. (Insert Figure 8 about here) To complement the comparison of the results, a quantitative analysis of the results was made using the PSNR and SSIM indexes. These indexes were obtained from the comparison between the original images and their respective images resulting from the application of the enhancement methods, Tables 1 and 2. In these tables, the values obtained for the synthetic images shown in Figs. 5–7 are indicated in bold. 18 (Insert Table 1 about here) (Insert Table 2 about here) The PSNR index allows the performance of an analysis based on error, showing the ability of our enhancement method to restore the information intensity of the pixels of the degraded image. The PSNR indexes for different methods of enhancement tabulated in Table 1, and plotted in Figure 9 allow us to check the best performance of the proposed method. From the presented graph, it is possible to clearly note that the results of the new method were better for all images from the test set, since the PSNR index had the highest values for this method. 
(Insert Figure 9 about here) The SSIM index shows the ability of the processing method to preserve the structural information of the processed image. The graph of Figure 10 has been plotted from the data presented in Table 2, and together with the analysis of Figure 9, it allows us to conclude that our method was able to enhance the input images, preserving the structural information of the processed images better than the other methods presented in the literature, as indicated by the highest values for the SSIM index. (Insert Figure 10 about here) To complement the analysis of the proposed enhancement method, the profile of a line of the original images, affected and resulting from the application of the method was extracted and plotted in the graphs of Figure 11. In this figure, the profiles of a random row of images of the images of 19 Figures 5, 6 and 7, respectively, are presented. The graph of the line profile for each image has three components: the blue component for the line profile in the original image; the red line for the affected image; and the green one for the image result of the proposed method. The extraction of the line profile showed that the image affected had the line profile very irregular due to interference caused by damage in the original image with the addition of noise. In addition, the line profile of the image presented in Figure 5 shows the smoothed transition due to the blurring operation. Through the analysis of these graphs, it is clear that the method has greatly approximated the input image to the original image, reducing the smoothing transitions between objects and also reducing the interference caused by noise. This behavior was observed for almost all images studied. (Insert Figure 11 about here) Another objective analysis that can be made about the results of the image enhancement methods is through the DV and BV values. In Table 3 the DV and BV values for the 4 images of Figure 8 (Cup, Lena, Cameramen and Parrot) are presented. In the second column of the table, the values of the original images are indicated in bold as are the values that should be approximated after the application of the enhancement methods under comparison. It is possible to note from these data that the proposed enhancement method obtained the best results for three of the four analyzed images (Cup, Lena and Cameramen), returning images with high DV values and BV values close to the respective values in the original image. (Insert Table 3 about here) To better visualize the results of this analysis, the points classified as belonging to detail value (DV) of the four images of Figure 8 were marked in black color and the images are presented in Figure 20 12. Note that the detected details from the images processed by the proposed method (h) are closer to the detail detected from the original images (a) indicating that the proposed method performed a more efficient processing than the other methods tested. (Insert Figure 12 about here) In Figure 13, we have the result of the proposed method and the enhancement methods studied in this paper applied in real images. It is observed that the resulting images of the enhancement proposed method allowed a more efficient segmentation of the images, due to the enhancement of the transitions between the inner and the outer regions of the objects. It is noted from the edges extracted by the Chan-Vese method [Chan and Vese, 2001]. 
In the ultrasound image of the pelvic cavity, it is noted that a greater part of the bladder was detected in the image enhanced by the proposed method (image h). In image "Cameramen", it is also realized that the edges extracted from the image enhanced by the proposed method is more regular, and the objects are better involved. In the image of the skin lesion, an edge with less discontinuity was obtained, even though the lesion was in an area affected by the strong presence of hair. This work presents a pioneer application of an artificial life model for image enhancement. The results obtained using both synthetic and real images exposed that one advantage of the proposed model is a superior enhancement of the image transitions, making the structures presented from the image background more distinguishable, and the higher preservation of the structural information presented. Another advantage of the model developed is that it is based on the behavior of the organism mimicked instead of being based on the organism morphological characteristics, as happens with the deformable models commonly used in image segmentation. Although the evident advantages, the presented model can still be improved, for example, by integrating more 21 sophisticated cognitive and sensorial centers that will increase the efficiency and robustness of the enhancement process. (Insert Figure 13 about here) Being an iterative method, the proposed method requires a higher computational time than the other methods discussed, Table 4. In that table, the times required to enhance the images "Lena" and "Cameramen" are indicated, both images are in grayscale and with size equal to 256x256 pixels. The tests were performed using a core of an Intel Core i5 processor at 3.2 GHz, 8 GB of RAM and 64-bits bus. The time for each method was obtained by averaging 150 executions, in order to increase the accuracy of the indicated values. (Insert Table 4 about here) 5. Conclusions The proposed method for image enhancement in the spatial domain is based in a new artificial life model, which processes the original image using operations based on the pixels’ values in order to highlight the intensity differences between neighbour regions. In comparison to other well known enhancement methods presented in the literature, the suggested method showed very promising experimental results. A quantitative and qualitative analysis of experimental results concluded that the proposed method is able to improve the transitions between the objects presented in degraded images, and got better results than the traditional enhancement methods adopted and compared in this work. In addition, tests on real images, such as ultrasound of the pelvic cavity, "Cameramen" and the image of a skin 22 lesion, showed that the proposed method is capable of enhancing the transitions between the structures present in these images, increasing, for example, the efficiency of the segmentation methods. The analysis of the results obtained by the proposed model confirmed that, as in several applications of image segmentation, the artificial life models can accomplish acceptable results in applications of image enhancement, suggesting these models also for the field of computational image processing. The effective image enhancement makes the image segmentation step easier and more accurate as it was demonstrated by using the Chan-Vese segmentation method on enhanced images. 
Additionally, the characterization and classification of structures from images can be more robust, once these tasks are highly dependent on the quality of the image segmentation step. Moreover, the experimental results obtained using the proposed image enhancement method allow us to conclude that the new artificial life model is very interesting for ultrasound imaging, once the ultrasound images are commonly affected by artifacts during the acquisition process, and the transitions between the structures presented are attenuated. Hence, the method proposed can effectively contribute for a more successfull ultrasound image processing and analysis by emphasizing the transitions between the structures presented and reducing the artifacts. In future studies, we intend to integrate the modeling of the cognitive center to the organism of the proposed model, giving it the ability to make smarter and elaborated decisions, as well as add some other control parameters of the environment, such as its relief, which can influence in the way that the organism moves in the enviroment. We also intend to extend the proposed method to improve the enhancement of image sequences using the relationship between consecutive frames. Furthermore, we want to apply techniques of high performance computing to reduce the computational time of the proposed method, such as parallel computing and processing using multiresolution. Acknowledgments: 23 The first author would like to tanks Fundação para a Ciência e Tecnologia (FCT), in Portugal, for his PhD grant with reference SFRH/BD/61983/2009. This work was partially done in the scope of the project “A novel framework for Supervised Mobile Assessment and Risk Triage of Skin lesions via Non-invasive Screening”, with reference PTDC/BBB-BMD/3088/2012, financially supported by Fundação para a Ciência e a Tecnologia (FCT) in Portugal. References [Cao et al., 2011] Cao, Y., Luo, Y., and Yang, S. (2011). Image denoising based on hierarchical markov random field. Pattern Recognition Letters, 32(2):368–374. [Chan and Vese, 2001] Chan, T. and Vese, L. (2001). Active contours without edges. IEEE Transactions on Image Processing, 10(2):266 –277. [Chao and Tsai, 2010] Chao, S.-M. and Tsai, D.-M. (2010). An improved anisotropic diffusion model for detail- and edge-preserving smoothing. Pattern Recognition Letters, 31(13):2012–2023. [Chen et al., 2010] Chen, Q., Sun, Q., and Xia, D. (2010). Homogeneity similarity based image denoising. Journal of Pattern Recognition, 43(12):4089–4100. [Cheng and Xu, 2000] Cheng, H. and Xu, H. (2000). A novel fuzzy logic approach to contrast enhancement. Pattern Recognition, 33(5):809–819. [Cheng et al., 2003] Cheng, H. D., Xue, M., and Shi, X. J. (2003). Contrast enhancement based on a novel homogeneity measurement. Pattern Recognition, 36(11):2687–2697. [Cho and Bui, 2014] Cho, D. and Bui, T. D. (2014). Fast image enhancement in compressed wavelet domain. Signal Processing, 98(0):295 – 307. 24 [Dash et al., 2011] Dash, R., Sa, P. K., and Majhi, B. (2011). Restoration of images corrupted with blur and impulse noise. In Proceedings the International Conference on Communication, Computing & Security, volume 1, pages 377–382. [Fan et al., 2005] Fan, J., Zeng, G., Body, M., and Hacid, M.-S. (2005). Seeded region growing: an extensive and comparative study. Pattern Recognition Letters, 26(8):1139–1156. [Farag et al., 2007] Farag, A. A., Suri, J. S., Micheli-Tzanakou, E., Hamarneh, G., and McIntosh, C. (2007). Deformable organisms for medical image analysis. 
In Deformable Models, Topics in Biomedical Engineering. International Book Series, pages 387–443. Springer New York. [Feng et al., 2008] Feng, J., Wang, X., and Luo, S. (2008). Medical imaging and informatics. chapter A Worm Model Based on Artificial Life for Automatic Segmentation of Medical Images, pages 35–43. Springer-Verlag. [Ghita and Whelan, 2010] Ghita, O. and Whelan, P. F. (2010). A new gvf-based image enhancement formulation for use in the presence of mixed noise. Pattern Recognition, 43(8):2646– 2658. [Gonzalez and Woods, 2006] Gonzalez, R. C. and Woods, R. E. (2006). Digital Image Processing (3rd Edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA. [Hanmandlu et al., 2003] Hanmandlu, M., Jha, D., and Sharma, R. (2003). Color image enhancement by fuzzy intensification. Pattern Recognition Letters, 24(1-3):81–87. [Hashemi et al., 2010] Hashemi, S., Kiani, S., Noroozi, N., and Moghaddam, M. E. (2010). An image contrast enhancement method based on genetic algorithm. Pattern Recognition Letters, 31(13):1816–1824. [Horng, 2011] Horng, M.-H. (2011). Multilevel thresholding selection based on the artificial bee colony algorithm for image segmentation. Expert Systems with Applications, 38(11):13785 – 13791. 25 [Jin and Yang, 2011] Jin, Z. and Yang, X. (2011). A variational model to remove the multiplicative noise in ultrasound images. Journal of Mathematical Imaging and Vision, 39(1):62–74. [Kao et al., 2010] Kao, W.-C., Hsu, M.-C., and Yang, Y.-Y. (2010). Local contrast enhancement and adaptive feature extraction for illumination-invariant face recognition. Pattern Recognition, 43(5):1736–1747. [Kim and Cho, 2006] Kim, K.-J. and Cho, S.-B. (2006). A comprehensive overview of the applications of artificial life. Artificial Life, 12(1):153–182. [Lai et al., 2012] Lai, Y.-R., Chung, K.-L., Lin, G.-Y., and Chen, C.-H. (2012). Gaussian mixture modeling of histograms for contrast enhancement. Expert Systems with Applications, 39(8):6720 – 6728. [Ma et al., 2010a] Ma, Z., Jorge, R. N. M., and ao Manuel R. S. Tavares, J. (2010a). A shape guided c-v model to segment the levator ani muscle in axial magnetic resonance images. Medical Engineering & Physics, 32(7):766–774. [Ma et al., 2010b] Ma, Z., Tavares, J. M. R., Jorge, R. N., and Mascarenhas, T. (2010b). A review of algorithms for medical image segmentation and their applications to the female pelvic cavity. Computer Methods in Biomechanics and Biomedical Engineering, 13(2):235–246. [McInerney et al., 2002] McInerney, T., Hamarneh, G., Shenton, M., and Terzopoulos, D. (2002). Deformable organisms for automatic medical image analysis. Medical Image Analysis, 6(3):251–266. [Nguyen and Ranganath, 2012] Nguyen, T. D. and Ranganath, S. (2012). Facial expressions in american sign language: Tracking and recognition. Pattern Recognition, 45(5):1877–1891. [Nomura et al., 2005] Nomura, S., Yamanaka, K., Katai, O., Kawakami, H., and Shiose, T. (2005). A novel adaptive morphological approach for degraded character image segmentation. Pattern Recognition, 38(11):1961–1975. 26 [Oliveira et al., 2010] Oliveira, F. P., Pataky, T. C., and Tavares, J. M. R. (2010). Registration of pedobarographic image data in the frequency domain. Computer Methods in Biomechanics and Biomedical Engineering, 3(6):731–740. [Oliveira et al., 2011] Oliveira, F. P. M., Sousa, A., Santos, R., and Tavares, J. M. R. S. (2011). Spatio-temporal alignment of pedobarographic image sequences. Medical and Biological Engineering and Computing, 49(7):843–850. 
[Osuna-Enciso et al., 2013] Osuna-Enciso, V., Cuevas, E., and Sossa, H. (2013). A comparison of nature inspired algorithms for multi-threshold imagesegmentation. Expert Systems with Applications, 40(4):1213 – 1219. [Papa et al., 2012] Papa, J. a. P., Falcão, A. X., de Albuquerque, V. H. C., and Tavares, J. a. M. R. S. (2012). Efficient supervised optimum-path forest classification for large datasets. Pattern Recognition, 45(1):512–520. [Peter et al., 2008] Peter, Z., Bousson, V., Bergot, C., and Peyrin, F. (2008). A constrained region growing approach based on watershed for the segmentation of low contrast structures in bone micro-ct images. Pattern Recognition, 41(7):2358–2368. [Petersen et al., 2002] Petersen, E. M., Deridder, D., and Handels, H. (2002). Image processing with neural networks - a review. Pattern Recognition, 35(10):2279–2301. [Pinho and Tavares, 2009] Pinho, R. R. and Tavares, J. M. R. (2009). Tracking features in image sequences with kalman filtering, global optimization, mahalanobis distance and a management model. Computer Modeling in Engineering & Sciences, 46(1):55–75. [Prasad et al., 2011a] Prasad, G., Joshi, A. A., Feng, A., Barysheva, M., Mcmahon, K. L., De Zubicaray, G. I., Martin, N. G., Wright, M. J., Toga, A. W., Terzopoulos, D., and Thompson, P. M. (2011a). Deformable Organisms and Error Learning for Brain Segmentation. In Pennec, X., Joshi, S., and Nielsen, M., editors, Proceedings of the Third International Workshop on 27 Mathematical Foundations of Computational Anatomy - Geometrical and Statistical Methods for Modelling Biological Shape Variability, pages 135–147. [Prasad et al., 2011b] Prasad, G., Joshi, A. A., Thompson, P. M., Toga, A. W., Shattuck, D. W., and Terzopoulos, D. (2011b). Skull-stripping with deformable organisms. In ISBI’11, pages 1662–1665. [Ramponi et al., 1996] Ramponi, G., Strobel, N. K., Mitra, S. K., and Yu, T.-H. (1996). Nonlinear unsharp masking methods for image contrast enhancement. Journal of Electronic Imaging, 5(3):353–367. [Shkvarko et al., 2011] Shkvarko, Y., Atoche, A. C., and Torres-Roman, D. (2011). Near real time enhancement of geospatial imagery via systolic implementation of neural network-adapted convex regularization techniques. Pattern Recognition Letters, 32(16):2197–2205. [Srinivasan and Sundaram, 2013] Srinivasan, A. and Sundaram, S. (2013). Applications of deformable models for in-dopth analysis and feature extraction from medical images–a review. Pattern Recognition and Image Analysis, 23(2):296–318. [Terzopoulos, 1999] Terzopoulos, D. (1999). Artificial life for computer graphics. Communications of the ACM, 42(8):32–42. [Trifas et al., 2006] Trifas, M. A., Tyler, J. M., and Pianykh, O. S. (2006). Applying multiresolution methods to medical image enhancement. In Proceedings of the 44th annual Southeast regional conference, ACM-SE 44, pages 254–259, New York, NY, USA. ACM. [Wang and Tan, 2011] Wang, J. and Tan, Y. (2011). Morphological image enhancement procedure design by using genetic programming. In Proceedings of the 13th annual conference on Genetic and evolutionary computation, GECCO ’11, pages 1435–1442, New York, NY, USA. ACM. [Wang et al., 2002] Wang, Z., Bovik, A. C., and Lu, L. (2002). Why is image quality assessment so difficult? In Proceedings of IEEE International Conference on Acoustic, Speech, and Signal Processing, volume 4. 28 [Wang et al., 2004] Wang, Z., Bovik, A. C., Sheikh, H. R., and Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. 
IEEE Transactions on Image Processing, 13(4):600–612. [Xue et al., 2003] Xue, J.-H., Pizurica, A., Philips, W., Kerre, E., Van De Walle, R., and Lemahieu, I. (2003). An integrated method of adaptive enhancement for unsupervised segmentation of mri brain images. Pattern Recognition Letters, 24(15):2549–2560. [Yang et al., 2009] Yang, J., Wang, Y., Xu, W., and Dai, Q. (2009). Image and video denoising using adaptative dual-tree discrete wavelet packets. IEEE Transactions on Circuits and Systems for Video Technology, 19(5):642–655. [Yi Kim and Jung, 2005] Yi Kim, E. and Jung, K. (2005). Genetic algorithms for video segmentation. Pattern Recognition, 38(1):59–73. [Yu et al., 2010] Yu, J., Tan, J., and Wang, Y. (2010). Ultrasound speckle reduction by a susan-controlled anisotropic diffusion method. Pattern Recognition, 43(9):3083–3092. [Zhang et al., 2010a] Zhang, L., Dong, W., Zhang, D., and Shi, G. (2010a). Two-stage image denoising by principal component analysis with local pixel grouping. Pattern Recognition, 43(4):1531–1549. [Zhang et al., 2010b] Zhang, L., Dong, W., Zhang, D., and Shi, G. (2010b). Two-stage image denoising by principal component analysis with local pixel grouping. Journal of Pattern Recognition, 43(4):1531–1549. 29 FIGURE CAPTIONS Figure 1: Pyramid usually adopted to create an artificial life model (adapted from [Terzopoulos, 1999]). Figure 2: Flow diagram of the proposed image enhancement method. Figure 3: Graph for intensity reduction function for 255=k . Figure 4: Graph of food’s height loss due to the degradation caused by the organism and food’s high gain due to the growing of the food over time. Figure 5: Results of the enhancement methods on synthetic images I: (a) original image, (b) affected image with the addition of noise and blurring; images returned after the applying the methods (c) square root enhancement, (d) quadratic enhancement (e) logarithmic enhancement, (f) normalization, (g) histogram equalization and (h) using our method based on a new artificial life model. (The images (a-), (b-), (c-), (d-), (e-), (f-) (g-) and (h-) represent the difference between the respective processed and original images.) Figure 6: Results of the enhancement methods on syntetic images II: (a) original image, (b) affected image with the addition of noise and blurring; images returned after the applying the methods (c) square root enhancement, (d) quadratic enhancement (e) logarithmic enhancement, (f) normalization, (g) histogram equalization and (h) using our method based on a new artificial life model. (The images (a-), (b-), (c-), (d-), (e-), (f-) (g-) and (h-) represent the difference between the respective processed and original images.) 30 Figure 7: Results of the enhancement methods on synthetic images III: (a) original image, (b) affected image with the addition of noise and blurring; images returned after the applying the methods (c) square root enhancement, (d) quadratic enhancement (e) logarithmic enhancement, (f) normalization, (g) histogram equalization and (h) using our method based on a new artificial life model. (The images (a-), (b-), (c-), (d-), (e-), (f-) (g-) and (h-) represent the difference between the respective processed and original images.) 
Figure 8: Results of the enhancement methods on real images: (a) original image, (b) affected image with the addition of noise and blurring; images returned after the applying the methods (c) square root enhancement, (d) quadratic enhancement (e) logarithmic enhancement, (f) normalization, (g) histogram equalization and (h) using our method based on a new artificial life model. Figure 9: PSNR graph plotted from the data in Table 1. Figure 10: SSIM graph plotted from the data in Table 2. Figure 11: Profile of the 128th line (randomly chosen) of the original images, affected images, and image returned from our method applied on the images of Figures 5, 6 and 7. Figure 12: Representation of the points classified as belonging to the class DV marked in black: (a) original image, (b) damaged image with the addition of noise and blurring operations; images after application of the (c) square root enhancement method, (d) quadratic enhancement method, (e) logarithmic enhancement method, (f) normalization, (g) histogram equalization, and (h) using our method based on a new artificial life model. 31 Figure 13: Results of the enhancement methods on real images: the original image (a), original image segmented (b), segmentation of images returned by the square root enhancement method (c), quadratic enhancement method, (d) logarithmic enhancement method (e), normalization (f), histogram equalization (g), and the proposed enhancement method (h). 32 TABLE CAPTIONS Table 1: PSNR indexes of the proposed method and other methods for image enhancement presented in the literature. (The lines with values in bold are related to the images of Figs. 5-7.) Table 2: SSIM indexes of the proposed method and other methods for image enhancement presented in the literature. (The lines with values in bold are related to the images of Figs. 5-7.) Table 3: DV and BV values for the images of Figure 8. (The values indicated in bold are the ones that should be approximated.) Table 4: Computational times (in ms) to enhance the images "Lena" and "Cameramen". 
33 TABLES Table 1 Image Affected Image Histogram Equalization Normalization Quadratic Enhancement Square Root Enhancement Logarithmic Enhancement Proposed Method 1 5.780 1.327 4.312 1.553 9.787 10.402 12.457 2 5.009 1.406 4.462 2.218 7.794 10.936 12.930 3 3.602 1.716 3.083 1.644 7.597 9.723 12.020 4 9.090 0.455 6.640 2.635 17.130 14.151 18.394 5 5.791 1.348 4.353 1.571 10.392 10.671 15.153 6 4.952 1.326 4.240 2.236 7.990 11.558 15.337 7 3.698 1.666 2.693 1.667 8.220 10.398 16.091 8 8.892 0.443 6.220 2.650 16.543 14.244 18.307 9 6.156 0.946 4.210 2.261 9.787 10.183 12.191 10 4.731 1.560 2.538 2.103 6.951 6.891 8.215 11 4.813 1.479 4.109 2.099 6.241 7.025 8.388 12 4.106 2.260 3.690 3.781 4.856 4.878 5.628 13 6.351 1.024 4.176 2.191 10.978 10.202 13.726 14 5.090 1.522 2.641 2.111 7.910 6.891 9.473 15 4.959 1.427 3.595 2.109 7.142 7.148 10.004 16 5.018 2.202 4.451 4.222 5.235 4.885 6.397 Table 2 Image Affected Image Histogram Equalization Normalization Quadratic Enhancement Square Root Enhancement Logarithmic Enhancement Proposed Method 1 0.147 0.078 0.103 0.061 0.597 0.578 0.728 2 0.265 0.059 0.234 0.113 0.559 0.597 0.747 3 0.134 0.091 0.127 0.120 0.435 0.533 0.722 4 0.448 0.011 0.338 0.113 0.884 0.828 0.917 5 0.169 0.092 0.127 0.078 0.631 0.601 0.844 6 0.272 0.060 0.240 0.118 0.610 0.637 0.843 7 0.162 0.100 0.136 0.129 0.536 0.598 0.874 8 0.440 0.012 0.324 0.127 0.871 0.829 0.912 9 0.308 0.039 0.199 0.033 0.654 0.666 0.765 10 0.133 0.025 0.073 0.060 0.376 0.412 0.536 11 0.195 0.007 0.126 0.012 0.418 0.482 0.560 12 0.100 0.022 0.116 0.135 0.168 0.208 0.268 13 0.337 0.070 0.197 0.044 0.700 0.667 0.814 14 0.185 0.025 0.094 0.071 0.439 0.412 0.603 15 0.224 0.024 0.117 0.026 0.482 0.498 0.650 16 0.179 0.028 0.190 0.212 0.211 0.210 0.337 34 Table 3 Image Original Image Affected Image Histogram Equalization Normalization Quadratic Enhancement Square Root Enhancement Logarithmic Enhancement Proposed Method DV BV DV BV DV BV DV BV DV BV DV BV DV BV DV BV Cup 3.18 0.05 1.41 0.24 5.37 0.30 1.82 0.25 2.21 0.29 1.06 0.15 0.67 0.04 4.48 0.03 Lena 2.81 0.07 1.34 0.22 5.21 0.26 2.39 0.18 2.62 0.18 0.82 0.18 0.54 0.04 2.65 0.06 Came- ramen 4.27 0.04 1.63 0.19 5.34 0.24 2.28 0.20 2.81 0.21 1.02 0.13 0.51 0.06 5.47 0.03 Parrot 2.51 0.15 1.40 0.15 6.29 0.22 1.97 0.13 2.66 0.14 0.92 0.13 0.71 0.03 3.06 0.06 Table 4 Image Histogram Equalization Normalization Quadratic Enhancement Square Root Enhancement Logarithmic Enhancement Proposed Method Lena 0.11 0.62 5.40 1.35 4.68 1460.05 Camer amen 0.11 0.52 5.41 1.35 4.68 2926.87 35 FIGURES Figure 1 Figure 2 36 Figure 3 Figure 4 37 Figure 5 38 Figure 6 39 Figure 7 40 Figure 8 Figure 9 Figure 10 41 Figure 11 42 Figure 12 43 Figure 13 44 New artificial life model for image enhancement Alex F. de Araujo¹, Christos E. Constantinou² and João Manuel R. S. Tavares¹ work_3tgwtllla5dxldjqob45z2m7se ---- w d r - ci d o 6 / c f g - , / STRUCTURAL DYNAMICS TEST SIMULATION AND OPTIMIZATION FOR AEROSPACE COMPONENTS S. E. Klenke and T. J. Baca Experimental Structural Dynamics Department Sandia National Laboratories Albuquerque, New Mexico 87185-0557 ABSTRACT: This paper initially describes an innovative approach to product realization called Knowledge Based Testing (KBT). This research program integrates test simulation and optimization software, rapid fabrication techniques and computational model validation to support a new experimentally-based design concept. 
This design concept implements well defined tests earlier in the design cycle enabling the realization of simulation and optimization software environment provides engineers with an essential tool needed to support this KBT approach. This software environment, called the Virtual Environment for Test Optimization (VETO), integrates analysis and test based models to support optimal structural dynamic test design. A goal in developing this software tool is to provide test and analysis engineers with a capability of mathematically simulating the complete structural dynamics test environment within a computer. A developed computational model of an aerospace component can be combined with analytical and/or experimentally derived models of typical structural dynamic test instrumentation within the VETO to determine an optimal test design. The VETO provides the user with a unique analysis and visualization environment to evaluate new and existing test methods in addition to simuIating specific experiments designed to maximize test based information needed to validate computational models. The results of both a modal and a vibration test design are presented for a reentry vehicle and a space truss structure. INTRODUCTION: This paper presents the Knowledge Based Testing (KBT) concept which incorporates aspects of design, analysis, and test with rapid prototyping to optimize test based product information from an experiment. The purpose of developing a KBT program is to utilize testing earlier in the design cycle. There are a number % ?- . < highly reliable aerospace components. A test i of research activities that will be discussed in this paper that help modify the conventional testing paradigm that normally tests a product at the end of development. One of these research areas being developed is the Virtual Environment for Test Optimization, VETO'. This innovative tesdanalysis tool reduces test instrumentation iterations, producing better tests through optimal test design. Communication between test and analysis engineers is enhanced early in the design cycle. Traditionally, the role of testing in the product realization process is limited to the end of the design cycle, after hardware manufacturing decisions have been made. As a result, data analysis and test requirements for a component are only considered when the hardware is scheduled for testing. Thus, the full benefit of the analysis in guiding the test is not realized. A goal in developing this software tool is to provide test and analysis organizations with a capability of mathematically simulating the complete test environment within a computer. Derived models of test equipment, instrumentation and hardware, called virtual instruments, can be combined within VETO to provide the user with a unique analysis and visualization capability to evaluate new and existing test methods. By providing engineers with a tool that allows them to optimize an experimental design within a computer environment, pre-test analysis can be performed using analytical models to rapidly evaluate components before manufacturing has occurred. The benefits of using this type of experimental design tool can be very extensive. The user can evaluate the use of different types of test instrumentation and equipment as well as investigating new testing techniques for system identification used to experimentally validate analytical models. 
A second research activity has focused on using the recently developed stereolithography to reduce the time between concept and product realization by evaluating plastic models to predict the structural behavior of actual metal parts. This TER Page 1 American Institute of Aeronautics and Astronautics DISTRiBUTION OF TW DOCUMENT If3 Portions of this document may be iIlegible in electronic image products. Images are produced from the best available original dolument. DISCLAIMER This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or use- fulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any spe- cific commercial product, process, or service by trade name, trademark, manufac- turer, or otherwise does not necessarily constitute or imply its endorsement, recom- mendation. or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof. stereolithography process has provided the means of economically generating very exact plastic prototype parts from three dimensional solid models. This process rapidly produces prototype plastic parts with astounding accuracy including bolt holes, countersinks, fillets, cutouts, etc. These stereolithography plastic prototypes include all the detail and are nearly perfect replicas of the actual parts. They afford detail that cannot be included in finite element models without extremely expensive high order meshes of very small features which may be important to the structural performance of the design. These prototypes are currently used primarily to verify geometries, for interference checks, and for product visualization. The ability to perform mechanical tests on these plastic prototypes and infer actual metal part structural performance could significantly reduce design cycle times. Mechanical tests of interest include static loading, modal testing and vibration testing. These rapid prototype test results could also be used to validate analytical models early in the design process to provide predictive models for effective design iterations and tradeoff studies. These very exact plastic prototypes offer a new realm of opportunity for the use of plastic models to predict the structural performance of actual metal parts. A third area of ongoing research related to the KBT program is in computational model validation4. Improved finite element modeling techniques and the ever decreasing cost of computing have been a major contributor to the increased capability of computationally-based engineering simulations. The current trend toward increased reliance on simulations for design and performance evaluation is pervasive throughout the industry. However, within the KBT program it is also recognized that there is a need for increasingly sophisticated testing and physical simulation that is essential to the development and validation of engineering models. 
Thus, by combining the use of the VETO to optimally design an experiment with the use of rapid prototyping techniques for generating component parts, the processes of model updating and validation become much more efficient. These tools are critical to providing the test based information needed to produce confidence in the predictive capabilities of computational models of aerospace components.

KNOWLEDGE BASED TESTING (KBT) OVERVIEW: As was mentioned in the previous section, the goal of the KBT program is to position testing earlier in the design cycle. This new vision or role for testing is partly motivated by increased modeling capabilities as well as the recent increase in computational power. In this vision, the definition of testing will not simply mean providing the "admiral's tests" but will be an underlying tool that is essential to model development and validation. A view of the KBT concept is shown in Figure 1 (Knowledge Based Testing chart). This bubble chart depicts the important interactions between design (starting with requirements and specifications), analysis and test used to support rapid product realization. The initial step in the KBT program is the generation of a computer aided design model which represents the geometry of the component or component housing to be tested. This model is generally driven by certain requirements and specifications. This geometric model is then used to assist component visualization, to generate a computational model used for dynamic analysis, and to produce a rapid prototype component through a stereolithography or "fastcast"* process. The developed computational model of the component is then combined with analytical and/or experimentally derived models of the test instrumentation within the VETO software to determine an optimal test design. The VETO is then used to simulate the structural dynamics experiment in order to maximize the test based information gathered from the experiment. The next step in the KBT process is the performance of the experimental test or characterization test using the rapid prototype component given the VETO test design. The results of this experiment are then used to update and validate the computational model. This systematic approach to rapid prototype evaluation integrated with optimal test design software is an important part of the KBT concept that helps bridge the gap between the old testing paradigm and this new testing vision. (*Fastcast uses a stereolithography or laser sintered part as a mold for an automatic metal casting system.) Design and analysis methods are integrated to support test simulations. These computationally based simulations provide predictions of component or system response to testing environments, and the results of these simulations are then used to guide an actual characterization test. By completing the test design within the virtual environment, the user can effectively evaluate what test information is needed from the experiment to update and validate the computational model. After this validation process is complete and some level of confidence is established in the computational model, further optimization and simulation studies can be performed using the model before any manufacturing or building of the component is done. It should be noted that the KBT concept also includes aspects of testing necessary for product verification.
The KBT definition of product verification includes certification testing, design margin testing [6] and failure mode/mechanism identification testing. Long term monitoring of component performance is the final type of KBT, which involves health monitoring [7].

ROLE OF STRUCTURAL DYNAMIC TEST SIMULATION AND OPTIMIZATION IN KBT: The VETO software environment currently integrates analysis and test based models to support optimal structural dynamic test design. The structural dynamics testing environment was selected as the initial VETO environment for investigation into areas of design/analysis/test interfaces, visualization, versatility and repeatability. This initial VETO effort has focused on assisting engineers to maximize the value and information gathered from these tests. A major objective of the VETO software development effort is flexibility. Because the virtual environment is a prototype software system, a primary concern for its design is that the code be easy to develop. To minimize this effort, existing software tools are used wherever possible, provided that the necessary functionality and flexibility are available. Another significant design objective is to provide a final software system that can be used by a variety of individuals who have not been involved in its development. As described below, VETO integrates several commercial tools to meet these objectives. Currently, the VETO software tool supports a modal virtual test environment, while a vibration virtual environment is still under development. Many of the tools needed to support simulations of these two structural dynamic test capabilities are similar; however, because of the various differences in these technologies, two distinct simulation approaches have been developed. The first approach, which was formulated to support modal test optimization, provided the user with an environment where numerical integration is performed on a system of state space models which describe the modal test configuration. This method served very well in the support of simulations which did not include closed loop control. However, in our second approach an effort is being made to address simulations that do include control aspects, namely vibration tests. The vibration virtual environment will include hardware-in-the-loop elements (both control and data acquisition) in addition to instrument and equipment models to support the simulation. Further details on these approaches are described below. For both of the structural dynamic testing environments, the database, integration, utility and user interface functions are performed in the Vetomain module (Figure 2, Vetomain graphical interface). This main interface provides the communication links between the two commercial software packages: AVS, which is used for visualization, and MATLAB, which is used to perform modeling and time integration. The "File" option of the Vetomain menu bar allows users to load finite element (FE) models and previously defined virtual test files into the VETO software. The setup for the test simulation is performed using Vetomain to construct parametric models of the instruments and to formulate interconnections between the models. The user is able to interact with the virtual instrument to provide and view information on the devices needed in the simulation.
This user interaction includes the selection of the number of desired response locations, the location of input excitation and the type of instruments needed to support the simulation. This interface module also provides numerous tools to assist the engineer in setting up and understanding how the various virtual instruments interact together to support a specific structural dynamic test simulation. These tools help guide the engineer in the design of tests that will accurately identify all the desired modes of the structure, select the appropriate test instrumentation and identify the proper excitation levels for the test.

MODAL TEST SIMULATIONS AND OPTIMIZATION: The SIMULINK Dynamic System Simulation Software provided with MATLAB is used as the environment to assemble and ultimately integrate mathematical models of the modal test system. This same software controls the simulation processing. Dynamic response equations are integrated by SIMULINK to provide system output time histories. Within the VETO software, inputs such as the type of device and the interconnection of instrumentation models are combined to facilitate the rapid connection of the various models (including models of test instrumentation, equipment and hardware) which comprise a given modal testing process. In order to achieve rapid set up of this virtual environment, models representing the instrumentation and test equipment need to be developed. These models consist of mathematical descriptions of the dynamic response of the instruments, derived either theoretically or experimentally. Most of the instruments modeled to date have been modeled in the discrete state space domain. A number of system identification tools in MATLAB were used to generate the mathematical models; development was based on an experimental frequency response function of the instrument or equipment. The models of the different types of instruments and equipment (transducers, amplifiers, filters, etc.) needed to represent a complete modal testing environment are located in a SIMULINK Virtual Test Equipment Library (VTELib). When preparing for a test simulation, the selection of the desired test instrumentation from Vetomain is performed with the assistance of a MATLAB code which searches the VTELib for available instrument models. Optimal experimental design and simulation of the complete test environment is further facilitated by the VETO's ability to include models of external inputs and electronic instrumentation noise. In addition, complex instrumentation models, such as the data acquisition system ("Front End"), are constructed by combining multiple submodels to simulate the dynamic response behavior of the hardware. As these models are added to the new simulation system, interconnecting lines are placed between the block models within SIMULINK. These lines represent the flow of signals in the actual modal test system and are specified using the "wire" instrument in Vetomain. Using these interconnecting lines, the input signals from the actuator devices (e.g. impact hammers) are fed both to the device under test (DUT) model, for simulation of system excitation, and to the "Front End" device, for simulation of data acquisition. This DUT model is directly generated using the FE modal data given the user's desired input and sensor locations for the simulation.
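The instrument-library idea above, in which transducers, amplifiers and filters are stored as small linear dynamic blocks and wired in sequence, can be illustrated outside SIMULINK. The sketch below uses Python with SciPy rather than MATLAB/SIMULINK, and all parameter values (mode frequencies, accelerometer resonance, amplifier gain, filter cutoff, sampling rate) are illustrative assumptions, not values from the paper or the VTELib.

```python
# Minimal sketch (not the VETO/VTELib implementation): chain simple linear
# instrument models (accelerometer, conditioning amplifier, anti-aliasing
# filter) to show how instrumentation dynamics color a simulated response.
import numpy as np
from scipy import signal

fs = 2048.0                        # sampling rate [Hz] (assumed)
t = np.arange(0, 2.0, 1.0 / fs)    # 2 s time record

# "True" structural response at a sensor location: two decaying modes (assumed).
resp = (np.exp(-3.0 * t) * np.sin(2 * np.pi * 45.0 * t)
        + 0.4 * np.exp(-5.0 * t) * np.sin(2 * np.pi * 120.0 * t))

# Accelerometer as a second-order system with resonance above the band (assumed).
wn, zeta = 2 * np.pi * 600.0, 0.03
accel = signal.TransferFunction([wn**2], [1.0, 2 * zeta * wn, wn**2])

# Conditioning amplifier: gain of 10 with a sub-hertz AC-coupling roll-off (assumed).
amp = signal.TransferFunction([10.0, 0.0], [1.0, 2 * np.pi * 0.5])

# Anti-aliasing filter: 4th-order Butterworth low-pass at 400 Hz.
b_aa, a_aa = signal.butter(4, 400.0, btype="low", fs=fs)

# Pass the structural response through the instrument chain.
_, v_accel, _ = signal.lsim(accel, resp, t)      # accelerometer output
_, v_amp, _ = signal.lsim(amp, v_accel, t)       # amplifier output
measured = signal.lfilter(b_aa, a_aa, v_amp)     # signal the "Front End" would digitize

print("peak structural response:", np.max(np.abs(resp)))
print("peak measured signal    :", np.max(np.abs(measured)))
```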
The "Front End" device also receives the signals from the sensors that have been attached to the DUT to simulate structural system response to the actuator input. Both the simulated actuator and sensor signals are linked through amplifier and filtering blocks to represent preconditioning of the signals. The process of simulation begins when the user selects "Run" from the "Simulate" option on Vetomain. The data files which define the dynamics of the desired instrumentation are loaded into the test simulation system and the "Simulation Monitor" is created and displayed. This monitor allows the user to observe the estimated system response based on the numerical integration. The Simulation Monitor represents the data acquisition environment commonly used to gather data in a physical test and is a graphical interface through which the user interacts with the modal test simulation system. After collecting the desired amount of simulated data, the user can activate a window providing an interface to analysis routines for computing measures such .as frequency response functions, power spectral densities and coherence based on the simulated data. MODAL TEST APPLICATION: An aerospace component was selected as a test case for application in the VETO environment. The VETO software simulation tool was used to design an optimal experiment for this mock reentry vehicle, Figure 3. The Figure 3. Mock Re-entry Vehicle Model goal of performing this test design optimization was to observe the vibration modes of interest and to study the interaction of the support flanges with the reentry vehicle housing. The initial steps in the test design , Page 4 American Institute of Aeronautics and Astronautics were to select an appropriate set of instrumentation (including sensors and actuators) needed to perform a modal experiment within the VETO environment and to simulate responses on the aerospace component. A symmetric finite element model of the structure was loaded into the VETO environment for use in the modal test simulation. The test design was performed over a frequency band, up to 250 Hz, which included fifteen vibration modes of the reentry vehicle. The outcome of the VETO test design "Setup" was to excite the structure using an impact hammer and to measure acceleration responses on the reentry vehicle at 40 different locations in order to characterize the dynamic behavior of the component. Approximately half of the response locations were automatically selected using an analysis code to optimize the sensor locations. Some care was taken in utilizing this code to ensure that redundant or closely spaced response locations on the structure were not used in the simulation. Other instrumentation such as the signal conditioning amplifiers and the data acquisition system were also set up with the use of Vetomain in preparation for the test simulation. Data acquisition parameters for sampling, averaging and acquiring the desired analysis measurements were also selected for use in the post-simulation analysis. A number of "Pre-simulation" tools were used to determine the completeness of the test design. First, the effects of mass loading the component were calculated given the test design sensor set. Small accelerometers, Endevco 2250s, were selected in the test design in order to minimize the mass loading effects that might occur during the experimentation. 
This analysis predicted that very small changes in the frequencies of vibration (approximately 0.1%) would be experienced if an experimental test were conducted based on the selection of small accelerometers in the test design. Second, a normal mode indicator function and a driving point frequency response function were viewed before conducting the test simulation in order to assess whether the selected sensor and actuator (selected impact location) set would accurately identify all the desired modes of interest on the aerospace component (Figures 4 and 5). By using the normal mode indicator function, it was determined that a single input location at the nose of the reentry vehicle would not excite all of the modes of the structure. Therefore, additional excitation locations would need to be included in the test setup so that all the modes of vibration of the reentry vehicle could be observed. Finally, the Modal Assurance Criterion (MAC) was calculated for the test design to determine if the modes of vibration of the structure could easily be distinguished from one another given the selected sensor set. Small values on the off-diagonal terms of this MAC matrix (Figure 6) indicate the relative independence of the modes of vibration, thus facilitating correlation with analysis.

Figure 4. Normal Mode Indicator Function
Figure 5. Driving Point Frequency Response Function
Figure 6. Modal Assurance Criterion

With the complete test design within the VETO environment, a SIMULINK block model of the test environment is automatically generated to support the simulation of the modal test. Figure 7 (Modal Test Simulation Block Model) shows a partial block model of the SIMULINK environment. The next step in the modal test simulation is the numerical integration of the mathematical models within SIMULINK to estimate the system responses. Using the Simulation Monitor (Figure 8), these responses are observed for each set or frame of data to be collected. Once the data are gathered to support the desired measurement set, the test simulation within SIMULINK is concluded. A window which provides an interface to the post-simulation analysis routines is then used to download the data for measurement analysis. A number of analysis routines for computing desired measures such as frequency response functions, power spectral densities and coherences are available. The simulated data, which are based on the FE dynamic analysis, were used to generate frequency response functions (Figure 9, Measured FRF using Simulated Data).

VIBRATION TEST SIMULATION AND OPTIMIZATION: As was mentioned earlier, the vibration virtual environment is currently under development. Some of the issues that must be addressed in this particular simulation environment that are not present in the modal virtual environment are: closed loop vibration control, detailed shaker/amplifier models and shaker/fixture/component interaction. Because of these issues as well as others, it was determined that a new simulation approach would be developed using some of the existing hardware that directly supports vibration tests.
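Before moving to the vibration environment, a minimal sketch of the Modal Assurance Criterion used above is given below. The mode-shape matrices in the example are random placeholders, not data from the reentry-vehicle model.

```python
# Minimal MAC sketch: mac[i, j] compares mode i of one set with mode j of
# another; values near 1 on the diagonal and near 0 off-diagonal indicate
# modes that the retained sensor set can distinguish. Placeholder data only.
import numpy as np

def mac(phi_a: np.ndarray, phi_b: np.ndarray) -> np.ndarray:
    """Modal Assurance Criterion between columns of phi_a and phi_b
    (rows = sensor degrees of freedom kept in the test design, columns = modes)."""
    num = np.abs(phi_a.conj().T @ phi_b) ** 2
    den = np.outer(np.einsum("ij,ij->j", phi_a.conj(), phi_a).real,
                   np.einsum("ij,ij->j", phi_b.conj(), phi_b).real)
    return num / den

rng = np.random.default_rng(1)
phi = rng.standard_normal((40, 15))   # e.g. 40 sensor locations, 15 modes (assumed sizes)
print(np.round(mac(phi, phi), 2))     # self-MAC: off-diagonal terms show mode independence
```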
These simulations, called hardware-in-the-loop simulations, would provide the user with the ability to combine actual vibration test hardware (such as vibration control and data acquisition systems) with instrumentation models to support vibration test design. Models of the external load (the shaker/amplifier, interface block, fixturing, device under test, accelerometers and signal conditioning elements) will be combined into a single state space model using MATLAB routines and then downloaded onto a real-time control processor. With the simulation model of the external load residing on the processor, the physical hardware is then connected to the processor for performing the vibration test simulation. The particular processor used in our simulation environment was developed in-house and has the capability of handling 16 inputs and 16 outputs with a total of 128 states. The sampling rate of the processor is based on the size, or number of states, of the model. Figure 10 (Vibration VETO Concept: real-time control processor, Sun workstation running MATLAB, and vibration control computer) shows a simple representation of this hardware-in-the-loop simulation environment. The virtual vibration environment will allow the user to evaluate the overall testability of a component or system. The test engineer will be able to observe the effects that different control parameters might have on the DUT without putting the physical hardware through an actual vibration test. An advantage of this environment is that it can help limit unnecessary vibration inputs to flight hardware. New and existing fixture designs can also be studied through the development of analytical models that can be integrated into this simulation environment. Testing methodologies such as the number of control transducers, the location of control transducers and the control strategy or method can also be investigated within the vibration environment. These developments will help assist analysis, design and test engineers in maximizing the value of each vibration test.

VIBRATION TEST APPLICATION: A large gamma truss structure was selected as the test case for the vibration test simulation environment [8]. Figure 11 (Gamma Truss) shows a picture of the actual truss hardware that was used to demonstrate the hardware-in-the-loop simulation concept. It should be noted that, due to the ongoing development of the vibration virtual environment, not all of the instrumentation models (namely the shaker/amplifier model) were included in this demonstration of the vibration simulation capability. Our goal at this point is simply to show the hardware-in-the-loop simulation process. The truss structure was designed with integrated sensors and actuators to provide a testbed for studying structural controls applications. Through numerous control studies, experimental data had been gathered in the form of input/output models for the truss. For use in the vibration test simulation, an eight-input/eight-output experimental model was selected as the model to represent the truss in the simulation. This state space model of the truss was combined with models of the sensors (accelerometers) and signal conditioning elements to form a single state space model of the vibration external load. This combined state space model was then downloaded onto the real-time control processor in order to support running the test simulation.
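Combining the truss, sensor and signal-conditioning submodels into one external-load model is a standard series connection of state-space systems. The sketch below shows the block-matrix form in Python/NumPy and checks the quoted 16-input/16-output/128-state processor limit; the submodel dimensions and values are assumptions for illustration, not the actual truss or instrument models.

```python
# Sketch of series-connecting state-space submodels (truss -> accelerometers ->
# signal conditioning) into a single external-load model, then checking it
# against the stated real-time processor limits (16 in, 16 out, 128 states).
import numpy as np

def series(sys1, sys2):
    """Cascade sys2 after sys1; each sys is a tuple (A, B, C, D)."""
    A1, B1, C1, D1 = sys1
    A2, B2, C2, D2 = sys2
    n1, n2 = A1.shape[0], A2.shape[0]
    A = np.block([[A1, np.zeros((n1, n2))],
                  [B2 @ C1, A2]])
    B = np.vstack([B1, B2 @ D1])
    C = np.hstack([D2 @ C1, C2])
    D = D2 @ D1
    return A, B, C, D

def random_ss(n_states, n_in, n_out, rng):
    # Placeholder system of the requested dimensions (values are not physical).
    return (rng.standard_normal((n_states, n_states)) - 2 * np.eye(n_states),
            rng.standard_normal((n_states, n_in)),
            rng.standard_normal((n_out, n_states)),
            np.zeros((n_out, n_in)))

rng = np.random.default_rng(2)
truss = random_ss(40, 8, 8, rng)        # 8-input/8-output truss model (assumed order)
accels = random_ss(16, 8, 8, rng)       # accelerometer dynamics per channel (assumed)
conditioning = random_ss(8, 8, 8, rng)  # signal-conditioning elements (assumed)

A, B, C, D = series(series(truss, accels), conditioning)
n_states, n_in, n_out = A.shape[0], B.shape[1], C.shape[0]
print(f"combined external-load model: {n_states} states, {n_in} inputs, {n_out} outputs")
assert n_states <= 128 and n_in <= 16 and n_out <= 16, "exceeds real-time processor capacity"
```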
An actual vibration test control and data acquisition system was then connected to the real-time processor for the simulation study. A single vibration drive signal was generated from an arbitrary shaped random spectrum (up to 200 Hz) and then used in the simulation to excite the external load model on the processor. The first output or response channel from the processor was fed back into the vibration control computer as the control channel. The drive signal was updated in order to match this shaped random spectrum for the chosen control channel. The drive signal was notching at 44 Hz due to a lightly damped mode at that frequency. Figure 12 (Vibration Drive Spectrum) shows the drive spectrum for this vibration simulation. By utilizing this vibration test simulation environment, the engineer is able to select desired control parameters such as control degrees of freedom, frequency resolution and time constants to successfully control a specified vibration test (Figure 13). This simulation example shows the strong advantage of having an environment to set up and customize vibration tests. Using this tool, test engineers can easily change parameters within the simulation to optimize vibration tests.

CONCLUSIONS: An important goal in developing the KBT program is to provide an environment in which designers and analysts are given earlier access to test based information in order to make intelligent design decisions. There are a number of significant ongoing research activities that directly support this new testing vision. A test simulation and optimization software tool called VETO is one of these activities that helps position testing earlier in the design cycle. Using this simulation software and rapid prototype parts, characterization tests can be performed in order to generate test data needed to update and validate computational models. Within the VETO software environment, engineers are able to investigate the testing of aerospace components, using computational models, prior to the existence of any hardware. A goal in developing this software tool is to provide test and analysis organizations with a capability of mathematically simulating the complete test environment within a computer. Applications of this test simulation environment have been shown for both modal and vibration capabilities essential to the development of aerospace components.

ACKNOWLEDGEMENT: This work was supported by the United States Department of Energy under Contract DE-AC04-94AL85000.

REFERENCES:
[1] Klenke, S., Reese, G., Schoof, L. and Shierling, C., "Modal Test Optimization Using VETO (Virtual Environment for Test Optimization)", Sandia Report SAND95-2591, January 1996.
[2] Gregory, D. and Hansche, B., "Rapid Prototype and Test", Sandia Report, 1996.
[3] Jacobs, P., "Rapid Prototyping & Manufacturing: Fundamentals of Stereolithography", McGraw-Hill, 1992.
[4] Dalton, E., Chambers, B., Bishnoi, K., Bateman, V., and Baca, T., "Test Validation of MANTA: A Code for the Practical Simulation of Shock in Complex Structures", Proceedings of the 65th Shock and Vibration Symposium, Vol. 1, San Diego, CA, October 1994, pp. 66-75.
[5] Zanner, F., and Maguire, M., "FASTCAST, A Program to Remove Uncertainty from Investment Casting", Modern Casting, 8/93.
[6] Baca, T., Bell, R., and Robbins, S., "Conservatism Implications of Shock Test Tailoring for Multiple Design Environments", Proceedings of the 58th Shock and Vibration Symposium, Vol. 1, October 1987, pp. 29-47.
[7] James III, G., "Development of Structural Health Monitoring Techniques Using Dynamic Testing", Sandia Report SAND96-0810, April 1996.
[8] Lauffer, J., Allen, J., and Peterson, L., "Structural Dynamics Considerations for Structural Control", Proceedings of the 7th International Modal Analysis Conference, Vol. II, February 1989, pp. 1079-1086.

work_3tqvfeholvburhnwt43i3vso7i ----

TOWARDS SPECIALIZED EXPERT SYSTEM BUILDING TOOLS: A TOOL FOR BUILDING IRRIGATION EXPERT SYSTEMS

Samhaa R. El-Beltagy, Gamal Al-Shorbagi, Hesham Hassan and Ahmed Rafea
Central Lab for Agricultural Expert Systems, Agricultural Research Center, Ministry of Agriculture and Land Reclamation, Cairo, Egypt
El-Nour St., P.O. Box 438, Dokki, Giza, Egypt
E-mail: {samhaa, gamal_sh, hesham, rafea}@claes.sci.eg

Abstract: Expert system development is very often an expensive process which requires significant time and effort. This paper investigates the idea of building a specialized tool for a specific task in order to realize very rapid development times with very little effort on the part of the developer, thus considerably reducing the overall development cost. The particular task that this paper addresses is that of irrigation scheduling. The work presented describes various aspects of a tool that was built for rapidly developing irrigation expert systems for vegetable crops in Egypt, and shows how greatly this tool simplifies the process of building such systems. Copyright © 2004 IFAC

Keywords: Agriculture, Expert systems, Knowledge tools, Knowledge engineering, Modelling.

1. INTRODUCTION

Despite the availability of a wide spectrum of expert system tools and shells, developing an expert system is still often a time consuming and expensive process. Furthermore, the development of a 'good' expert system is still also highly dependent on the skills of its knowledge engineers. Reducing expert system development costs in terms of development time and required skills continues to be the goal of many research bodies. When addressing this issue for expert system development in general, improvements in cost and effort are likely to come solely from enhancements in development shells. However, much more significant improvements can be achieved in domains where expert system development is a recurring activity for a set of recurring tasks. Building agricultural expert systems for a wide variety of crops and a number of tasks (diagnosis, treatment, irrigation, fertilization, etc.) is an example of such domains. Instead of replicating the same effort every time an expert system is being developed, commonalities can be captured and re-used. Such re-use can be achieved by standardizing ontologies and by building specialized tools for specific tasks within a specific domain. Such tools would not only contain a problem solving model for some given task, but would also include all knowledge that can be re-used for this particular task. The goal of this paper is to present an example of such a tool built specifically for irrigation expert systems. Section 2 of this paper briefly reviews some related work, section 3 explains the goals of irrigation expert systems as well as the goal of the tool to be presented, section 4 describes the tool that has been built, and section 5 concludes this paper and presents some ideas for future work.

2. RELATED WORK
Knowledge system technology has been applied to a variety of agricultural problems since the early 1980s. Generally, agricultural activities can be classified into activities that are done on the farm prior to cultivation and activities which are done during cultivation operations. The scope of this research concentrates on an activity type that is done during cultivation; specifically, this work focuses on the irrigation scheduling activity. Many systems have been built to address the irrigation scheduling problem. These include, for example, the NEPER Wheat expert system (Kamel, et al., 1995), which was developed for handling most of the production management activities, including irrigation, for the wheat crop. The generic task knowledge system development methodology (Chandrasekaran, 1988) was utilized to develop this system. The generic task methodology is one that has recognized the importance of task re-use for speeding up the process of building knowledge based systems. The methodology has identified a number of problem solving building blocks that target specific problems. For example, the hierarchical classification problem solver can be used to build a diagnosis system, while a routine design problem solver can be used to build a scheduling system. Other examples of irrigation expert systems include the barley crop management expert system (Boner et al., 1992), which was designed to produce water and fertilizer recommendations to maximize yield, and the CALEX/Cotton expert system (Ostergard and Goodell, 1992; Plant, 1989), which was designed to link diverse production components including irrigation, pest management and agronomy. The CUPTEX (El-Dessouki, et al., 1993) and CITEX (Edrees et al., 1997) systems for cucumber and citrus production management respectively cover most of the agricultural practices for cucumber crop under tunnel and citrus cultivation in open fields, including irrigation and fertilization. References for other expert systems in this domain may be found in (Rafea, 1998). However, each of these systems was built from scratch to target a specific crop. The extent to which these systems could make use of previously built ones was limited to identifying a suitable knowledge representation scheme and/or a suitable problem solving model. While these do in fact reduce the effort involved in building an irrigation expert system, they do not capitalize on the fact that one generic model containing common knowledge, equations and a problem solving model can be applied to all. Through this application, the reduction in both time and effort can be much more significant.

3. IRRIGATION EXPERT SYSTEMS: A CLOSER LOOK

The main goal of most irrigation expert systems is to produce an irrigation schedule for a particular crop in a particular farm. The output schedule is a plan of water quantities to be applied and the time of application according to the requirements of the plant and the affecting factors like the soil type, climate, source of water, etc. Determining a crop's water requirements is no trivial task, but is one that involves many equations that in turn involve an extensive number of complex variables relative to the soil, water, climate, crop, etc. This work capitalizes on the fact that even though the irrigation requirements for various crops may vary, a number of basic concepts and irrigation determination techniques are shared among all.
By recognizing this fact, a tool was built to hide all the complexities of an irrigation system's equations and knowledge from the developer, while highlighting any missing knowledge that can vary from one crop to another, thus guiding the developer as to what knowledge needs to be acquired. In a sense, the developer is simply offered an empty template for specifying the inputs and rules for determining or calculating the value of a property of some pre-identified concept. This greatly simplifies the knowledge acquisition task, and makes it possible for any person with some computer knowledge to easily build an irrigation expert system. In the next section we present this tool.

4. THE DEVELOPED IRRIGATION TOOL

A typical irrigation expert system is made up of concepts, relations, and equations, on top of which a task layer is built. The developed specialized irrigation tool was built on top of an expert system development tool called KSR. KSR was designed and implemented at the Central Laboratory for Agricultural Expert Systems in Egypt. In this work, we have aimed to identify and capture all knowledge that is related to the irrigation task regardless of the crop, as well as identify knowledge that may vary from one crop to another. The steps for developing the specialized irrigation building tool can be summarized as follows:
1. Identify the main tasks involved in the production of an irrigation schedule.
2. Acquire and model knowledge related to each of these tasks (concepts, equations, relations, etc.).
3. Identify concepts and relations that may vary from one crop to another.
4. Develop a generic irrigation task layer.
5. Build an interface that hides non-changing knowledge from the user and enables him/her to access and edit modifiable knowledge.
The result of following these steps was a tool capable of assisting developers in rapidly building an irrigation scheduling system for any vegetable crop in Egypt.

4.1 Tool features and description

The tool enables the production of an irrigation schedule on a weekly basis provided that climate reference data is available. The irrigation model caters for flooding or drip irrigation systems. The model also includes equations to cover the case of planting within an open field, high tunnels, or low tunnels. Since the method of determining the value of potential evapotranspiration (et0), which is essential for determining a crop's water requirements, differs across planting environments, two well known methods for calculating this value, known respectively as Hargreaves (Hargreaves and Samani, 1982) and Penman (Doorenbos and Pruitt, 1984), have been adopted. The Hargreaves model applies only to high tunnels, while the Penman model is applicable to the other two planting environments. The tool currently has about ninety five built-in concepts, thirty five relations, eleven tables, and fifty two equations, as well as the irrigation scheduling problem solving model.

4.2 Using the tool

To develop an irrigation expert system, all the developer has to do is to examine the major tasks, identify any missing knowledge and enter that into the tool. Within the developed tool, eight major tasks for generating an irrigation schedule have been identified (a sketch of the et0 step follows the list):
1. Determine the growth stage of the plant.
2. Determine factors affecting the plantation.
3. Calculate the value of et0 (potential evapotranspiration).
4. Calculate the value of eta (exhaustion of actual water).
5. Calculate the value of pawc (water available in soil).
6. Calculate the irrigation interval.
7. Calculate water requirements.
8. Calculate required irrigation units.
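As an illustration of step 3, the sketch below shows one common published form of the Hargreaves-Samani et0 estimate in Python. The exact equations and coefficients built into the tool are not given in the paper, so this is only an assumed, simplified version; the function, variable names and the sample climate values are hypothetical.

```python
# Hedged sketch of a Hargreaves-Samani reference-evapotranspiration estimate
# (one common published form), used only to illustrate the kind of equation
# hidden inside the tool's "calculate et0" task. The coefficient 0.0023 and
# the sample inputs are assumptions, not the tool's actual knowledge base.
import math

def et0_hargreaves(t_max: float, t_min: float, ra: float) -> float:
    """Reference evapotranspiration [mm/day].
    t_max, t_min: daily max/min air temperature [deg C]
    ra: extraterrestrial radiation expressed as equivalent evaporation [mm/day]
    """
    t_mean = (t_max + t_min) / 2.0
    return 0.0023 * ra * (t_mean + 17.8) * math.sqrt(max(t_max - t_min, 0.0))

# Hypothetical weekly climate record for a high-tunnel site: (t_max, t_min, ra).
week = [(31.0, 18.0, 14.2), (32.5, 19.0, 14.0), (30.0, 17.5, 14.4),
        (33.0, 20.0, 13.9), (31.5, 18.5, 14.1), (30.5, 18.0, 14.3), (32.0, 19.5, 14.0)]
weekly_et0 = sum(et0_hargreaves(tx, tn, ra) for tx, tn, ra in week)
print(f"estimated et0 for the week: {weekly_et0:.1f} mm")
```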
For each of these tasks, any relations or rules that may vary from crop to crop were identified. These have been termed 'modifiable steps', as they can be modified depending on the crop under consideration. For each modifiable step, the output is determined beforehand, but the inputs that influence that output are left for the developer to specify. Each modifiable step is either represented by a set of rules constituting a relation, or by a table. The expert system developer need only inspect the 'modifiable steps' associated with each of the eight major identified tasks and fill in the missing knowledge for each. These are clearly represented in the developed tool, as shown in Figure 1 (irrigation expert system developer interface). After examining each step, the task of the developer is to acquire the knowledge which will accurately determine the required output of the step. So, the steps for developing an irrigation system can be summarized as follows:
• Determine the factors that influence any of the output(s) of a 'modifiable step'.
• Understand how the variation of these factors can influence the output.
• Through the developed tool, locate the concepts and properties representing the identified factors and use them to write the rules or to fill in the tables associated with each step, so as to reflect the acquired knowledge.
To further assist the developer in this task, each of the modifiable steps is described and the requirements from the developer are clearly laid out for him/her. An example of the description of modifiable steps associated with two of the identified major tasks is shown in Figure 2. In total, the developer needs only modify five relations and four tables. While the developer can view any of the built-in equations, he/she cannot edit any of these. For modifying relations and tables, two specialized editors are provided; the relation editor is shown in Figure 3. A separate interface is also provided for the entry of reference data (for the farm, soil, water, climate, etc.) by either the expert system developer or the end user.

4.3 Knowledge Representation

For the irrigation task, mathematical dependency graphs were chosen to model the irrigation knowledge. The concept of dependency networks (DNs) was first introduced in the context of developing truth maintenance systems (Forbus and Kleer, 1993). A DN is a directed graph consisting of nodes, representing variables or concepts, and links, representing the dependencies between these nodes. A mathematical dependency graph is a novel knowledge representation scheme suitable for representing mathematical knowledge in a declarative manner, such that it can be traversed to calculate a quantity of a numeric object. A dependency graph consists of a set of nodes related to each other by directed arcs. An arc between a node A and a node B means that the value of node A depends on the value of node B, or in other words, that there is a depend-on relation between these two nodes. The mathematical dependency between a node and any other nodes means that the value of this node is computed through a mathematical function which includes the other nodes as parameters of this mathematical function. This form of knowledge representation was chosen for the following reasons:
1. To avoid recalculations.
2. To enhance readability and maintainability.
3. To promote reusability.
4. To enable explanation.
Figure 4 shows a simplified diagram for part of the dependency graph used in our tool; F in the diagram denotes a function while R denotes a relation. A small sketch of how such a graph can be evaluated is given below.
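The following is a hypothetical, minimal illustration of evaluating a mathematical dependency graph with memoization, which is the "avoid recalculations" motivation above. The node names and toy formulas are placeholders, not the tool's actual et0 graph or knowledge base.

```python
# Minimal dependency-graph sketch: each node is either a leaf value (reference
# data) or a function of its parent nodes; evaluation is memoized so shared
# nodes are computed only once ("avoid recalculations"). Node names and the
# toy formulas are hypothetical placeholders, not the tool's knowledge base.
class DependencyGraph:
    def __init__(self):
        self.values = {}      # leaf nodes: reference data entered by the user
        self.functions = {}   # derived nodes: (function, parent node names)
        self._cache = {}

    def set_value(self, name, value):
        self.values[name] = value

    def add_function(self, name, parents, fn):
        self.functions[name] = (fn, parents)

    def evaluate(self, name):
        if name in self._cache:
            return self._cache[name]      # reuse an already-computed node
        if name in self.values:
            return self.values[name]
        fn, parents = self.functions[name]
        result = fn(*[self.evaluate(p) for p in parents])
        self._cache[name] = result
        return result

g = DependencyGraph()
g.set_value("et0", 5.1)          # mm/day, e.g. produced by the et0 sub-graph
g.set_value("crop_coeff", 0.85)  # hypothetical crop coefficient
g.set_value("area", 1000.0)      # m^2
g.add_function("eta", ["et0", "crop_coeff"], lambda et0, kc: et0 * kc)
g.add_function("water_req", ["eta", "area"], lambda eta, a: eta * a / 1000.0)  # m^3/day
print(g.evaluate("water_req"))
```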
Figure 2: A sample of the help provided for filling in modifiable steps:
Steps associated with Task 1: Determine the Growth Stage of the Plant. There is only one step associated with this task, the name of which matches the name of the task itself, "Determine Growth Stage". The step is of type 'Relation', meaning that it is composed of a set of rules. The purpose of this step is to be able to determine the length of each of a crop's growth stages in days. Example growth stages include: Initiation, Vegetative, Flowering, etc. Factors that can affect these can include the crop's variety and/or other factors depending on the crop itself.
Steps associated with Task 2: Determine factors affecting the Plantation.
Step 1: Calculate Expected Fruit Yield (efy). This modifiable step is represented by a table. In this table you should place any factor that can affect expected fruit yield as a column. There is only one possible output (efy), the value of which is specified based on the variation of the affecting factors.
Step 2: Calculate Average Expected Fruit Yield. This step is also represented by a table, and steps similar to those described for efy should be followed to fill it. In this table the only possible output is that of the average expected fruit yield.
Step 3: Determine Variety Characteristic Bases. This step is of type 'Relation' and its output is the value of the 'variety factor'. The developer should thus aim to write the rules that determine the 'variety factor'.
Step 4: Determine Optimal Number of Plants. This step is also of type 'Relation' and its output is the value of the 'optimal number of plants factor'.

Figure 3: A snapshot of the relation editor.

5. CONCLUSIONS AND FUTURE WORK

An irrigation scheduling expert system is a complex system that requires a lot of effort to build. A typical system will have a considerable number of concepts, relations, tables and equations built into it. By capturing re-usable knowledge across various irrigation systems, the required development time can be drastically reduced. This paper presented a tool that can achieve this for irrigation expert systems for vegetable crops in Egypt. Through this tool the process of developing an irrigation expert system is greatly expedited and the effort for developing such a system is greatly reduced. The developer does not need to concern himself/herself with the complexities of the built-in model. All he/she has to do is focus on acquiring knowledge to determine the output of a very small number of relations and tables. Of the thirty five relations built into the system, only 5 need be modified, and of the eleven tables only 4, while none of the fifty two built-in equations need be changed.
The problem-solving model is also built in and hidden from the developer. This not only cuts down the development time of an irrigation expert system, but also alleviates the need for a skilled knowledge engineer to develop a powerful system. By providing clear instructions for how to use the tool, almost any person with access to a domain expert can easily build an irrigation expert system. In fact, domain experts themselves can easily use this tool for research purposes. The tool was used for training non-computer specialists on developing irrigation expert systems as part of a one week workshop on expert system development in general. Only two days were designated for training on developing a simple irrigation expert system, and all workshop attendees were able to develop a working system within this limited time span. Future enhancements to the tool will extend it to cover horticultural crops. Work is also currently being carried out to augment the developed tool with explanation facilities, which is very likely to transform the tool into a powerful education instrument.

Figure 4: A simplified diagram of a section of the dependency graph employed (nodes include current_month, et0_penman, et0_hargreaves and et0, climate inputs such as avg_tc, avg_rh and ra, with branches for the planting environment and for daily or weekly irrigation).

REFERENCES

Boner I., Parente A. and Thompson K. (1992). Knowledge-Based systems for crop management. In: Fourth International Conference on Computers in Agriculture, Orlando, Florida. American Society of Agricultural Engineers.
Chandrasekaran, B. (1988). Generic Tasks as Building Blocks for Knowledge-Based Systems: The Diagnosis and Routine Design Examples. The Knowledge Engineering Review 3, 183-210.
Doorenbos J. & Pruitt W.O. (1984). Crop Water Requirements. Food and Agriculture Organization of the United Nations.
Edrees S., El-Azhary E., Tawifik K. and Rafea A. (1997). An Expert System for Citrus Production Management. In: Proceedings of the Fifth International Conference on Artificial Intelligence Applications, Cairo, Egypt.
El-Dessouki, A., S. El-Azhary, and S. Edrees (1993). CUPTEX: An Integrated Expert System for Crop Management of Cucumber. In: Proceedings of ESADW-93, Cairo, Egypt. MOALR.
Forbus K. & Kleer J. (1993). Building Problem Solvers, pp. 151-170. The MIT Press, Cambridge, Massachusetts, London.
Hargreaves, G.H. and Z.A. Samani (1982). Estimating potential evapotranspiration. Journal of Irrigation and Drainage Division 108, 225-230.
Kamel, A., K. Schroeder, J. Sticklen, A. Rafea, A. Salah, U. Schulthess, R. Ward, and J. Ritchie (1995). An Integrated Wheat Crop Management System Based on Generic Task Knowledge Based Systems and CERES Numerical Simulation. AI Applications 9.
Ostergard M. and Goodell P. (1992). Delivering Expert Systems to Agriculture: Experiences with CALEX/Cotton. In: Fourth International Conference on Computers in Agriculture, Orlando, Florida. American Society of Agricultural Engineers.
Plant R. (1989). An integrated expert decision support system for agricultural management. Agricultural Systems 29, 49-66.
Rafea A. (1998). Agricultural Expert Systems. In: The Handbook of Applied Expert Systems (J. Liebowitz (Ed)). CRC Press, LLC, USA.

work_3vo6ohjwjzcebnzidwoautzbu4 ----
Expert Systems with Applications 41 (2014) 7579-7595

Bivariate quality control using two-stage intelligent monitoring scheme

Ibrahim Masood a,*, Adnan Hassan b
a Faculty of Mechanical and Manufacturing Engineering, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Batu Pahat, Johor, Malaysia
b Faculty of Mechanical Engineering, Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor, Malaysia

ARTICLE INFO
Article history: Available online 5 June 2014
Keywords: Balanced monitoring; Bivariate quality control; Statistical features; Synergistic artificial neural network; Two-stage monitoring

ABSTRACT
In manufacturing industries, it is well known that process variation is a major source of poor quality products. As such, monitoring and diagnosis of variation is essential towards continuous quality improvement. This becomes more challenging when two correlated variables (bivariate) are involved, whereby selection of the statistical process control (SPC) scheme becomes more critical. Nevertheless, the existing traditional SPC schemes for bivariate quality control (BQC) were mainly designed for rapid detection of unnatural variation, with limited capability in avoiding false alarms, that is, imbalanced monitoring performance. Another issue is the difficulty in identifying the source of unnatural variation, that is, lack of diagnosis, especially when dealing with small shifts. In this research, a scheme to address balanced monitoring and accurate diagnosis was investigated. Design consideration involved extensive simulation experiments to select the input representation based on raw data and statistical features, an artificial neural network recognizer design based on a synergistic model, and a monitoring-diagnosis approach based on a two-stage technique. The study focused on a bivariate process with cross correlation function ρ = 0.1-0.9 and mean shifts μ = ±0.75-3.00 standard deviations. The proposed two-stage intelligent monitoring scheme (2s-IMS) gave superior performance, namely, average run length ARL1 = 3.18-16.75 (for out-of-control processes), ARL0 = 335.01-543.93 (for in-control processes) and recognition accuracy RA = 89.5-98.5%. This scheme was validated in manufacturing of an audio video device component. This research has provided a new perspective in realizing balanced monitoring and accurate diagnosis in BQC.
© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

In manufacturing industries, when the quality feature of a product involves two correlated variables (bivariate), an appropriate SPC charting scheme is necessary to monitor and diagnose these variables jointly. Specifically, process monitoring refers to the identification of the process condition as either statistically in-control or out-of-control, whereas process diagnosis refers to the identification of the source variable(s) of an out-of-control condition. In addressing this issue, the traditional SPC charting schemes for BQC such as the χ² chart (Hotelling, 1947), multivariate cumulative sum (MCUSUM) (Crosier, 1988), and multivariate exponentially weighted moving average (MEWMA) (Lowry, Woodall, Champ, & Rigdon, 1992; Prabhu & Runger, 1997) are known to be effective in the monitoring aspect. Unfortunately, they are unable to provide diagnosis information, which is greatly useful for a quality practitioner in finding the root cause error and a solution for corrective action. Since then, major research has been focused on the diagnosis aspect.

* Corresponding author. Tel.: +607 4537700.
Shewhart-based control charts with Bonferroni-type control limits (Alt, 1985), principal component analysis (PCA) (Jackson, 1991), multivariate profile charts (Fuchs & Benjamini, 1994), T² decomposition (Mason, Tracy, & Young, 1995) and the Minimax control chart (Sepulveda & Nachlas, 1997), among others, have been investigated for this purpose. Further discussion of this issue can be found in Lowry and Montgomery (1995). Nevertheless, the existing traditional SPC schemes were mainly designed for rapid detection of the out-of-control condition (ARL1 ≈ 1) and have limited capability in avoiding false alarms (ARL0 ≤ 200). Fig. 1 illustrates the concepts of imbalanced monitoring vs. balanced monitoring as the central theme for this investigation.

[Fig. 1. Current state and desired state towards balanced monitoring: the current state is imbalanced monitoring, able to detect process mean shifts rapidly (ARL1 ≈ 1) but with limited capability to avoid false alarms (ARL0 ≤ 200); the desired state for this research is balanced monitoring, detecting mean shifts rapidly while maintaining small false alarms (ARL0 >> 200); the ideal state is a perfect balance, detecting shifts as soon as possible (ARL1 = 1) without triggering any false alarm (ARL0 = ∞).]

From the diagnosis viewpoint, an effective bivariate SPC scheme should be able to identify the source variable(s) of an out-of-control condition as accurately as possible. Nevertheless, it is difficult to recognize the source correctly when dealing with small shifts (≤1.0 standard deviation). Chih and Rollier (1994), Chih and Rollier (1995), Zorriassatine, Tannock, and O'Brien (2003), Chen and Wang (2004) and Yu and Xi (2009), for example, have reported less than 80% accuracy for diagnosing mean shifts at 1.0 standard deviation. Among others, only Guh (2007) and Yu et al. (2009) reported satisfactory results (>90% accuracy).

The imbalanced monitoring and lack of diagnosis capability mentioned above need further investigation. In order to minimize erroneous decision making in BQC, it is essential to enhance the overall performance towards achieving balanced monitoring (rapidly detecting process variation/mean shifts with small false alarms, as shown in Fig. 1) and accurate diagnosis (accurately identifying the sources of variation/mean shifts). Additionally, BQC applications are still relevant in today's manufacturing industries. In solving this issue, a two-stage intelligent monitoring scheme (2s-IMS) was designed to deal with dynamic correlated data streams of a bivariate process. This paper is organized as follows. Section 2 describes the modeling of bivariate process data streams and patterns. Section 3 presents the framework and procedures of the 2s-IMS. Section 4 discusses the performance of the proposed scheme in comparison to the traditional SPC. Section 5 finally outlines some conclusions.

2. Modeling of bivariate process data streams and patterns

A large amount of bivariate samples is required for evaluating the performance of the 2s-IMS. Ideally, such samples should be tapped from the real world. Unfortunately, they are not economically available or are too limited. As such, there is a need for modeling of synthetic samples based on the Lehman (1977) mathematical model. Further discussion on the data generator can be found in Masood and Hassan (2013). In a bivariate process, two variables are monitored jointly. Let X1-i = (X1-1, ..., X1-24) and X2-i = (X2-1, ..., X2-24) represent 24 observation samples for process variable 1 and process variable 2, respectively. The observation window for both variables starts with samples i = (1, ..., 24); it is dynamically followed by (i + 1), (i + 2) and so on. When a process is in a statistically in-control state, samples from both variables can be assumed to be identically and independently distributed (i.i.d.)
with zero mean (μ0 = 0) and unity standard deviation (σ0 = 1). Depending on the process situation, the bivariate samples can be in low correlation (ρ = 0.1-0.3), moderate correlation (ρ = 0.4-0.6) or high correlation (ρ = 0.7-0.9). Data correlation (ρ) measures the degree of linear relationship between the two variables. Generally, this relationship is difficult to identify using a Shewhart control chart, as shown in Fig. 2; on the other hand, it can be clearly indicated using a scatter diagram. Low correlated samples yield a circular pattern (circularly distributed scatter plot), moderately correlated samples yield a perfect ellipse pattern, whereas highly correlated samples yield a slim ellipse pattern.

[Fig. 2. Shewhart control charts and their respective scatter diagrams for low, moderate and high correlation.]

Disturbance from assignable causes on the component variables (variable 1 only, variable 2 only, or both variables) is a major source of process variation. This occurrence can be identified by various causable patterns such as mean shifts (sudden shifts), trends, cyclic, systematic or mixture patterns. In this research, the investigation focused on sudden shift patterns (upward and downward shifts) with positive correlation (ρ > 0). Seven possible categories of bivariate patterns were considered in representing the bivariate process variation in mean shifts, as follows:
N (0,0): both variables X1-i and X2-i remain in-control.
US (1,0): X1-i shifted upwards, while X2-i remains in-control.
US (0,1): X2-i shifted upwards, while X1-i remains in-control.
US (1,1): both variables X1-i and X2-i shifted upwards.
DS (1,0): X1-i shifted downwards, while X2-i remains in-control.
DS (0,1): X2-i shifted downwards, while X1-i remains in-control.
DS (1,1): both variables X1-i and X2-i shifted downwards.
Reference bivariate shift patterns based on mean shifts of ±3.00 standard deviations are summarized in Fig. 3. Their structures are unique in indicating the changes in process mean shifts and data correlation. The degree of mean shift can be identified when the center position shifts away from the zero point (0,0).
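The synthetic windows behind these pattern categories can be illustrated generically. The sketch below uses a plain correlated-normal construction in NumPy, which is an assumption for illustration and not the Lehman (1977) generator used by the authors; the shift magnitude and correlation are example values.

```python
# Generic sketch of synthetic bivariate data-stream generation (not the
# Lehman (1977) generator used in the paper): a 24-sample observation window
# of two standardized variables with correlation rho, optionally with a
# sustained mean shift applied to one or both variables.
import numpy as np

def bivariate_window(rho, shift=(0.0, 0.0), n=24, seed=None):
    """Return an (n, 2) window of correlated samples with mean 0 and std 1,
    plus the given mean shifts (in standard deviations)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    return z + np.asarray(shift)

normal = bivariate_window(rho=0.5, seed=10)                   # N(0,0) pattern
us_10 = bivariate_window(rho=0.5, shift=(1.5, 0.0), seed=10)  # US(1,0): variable 1 shifted up
print(np.corrcoef(normal.T)[0, 1].round(2), us_10.mean(axis=0).round(2))
```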
3. Two-stage intelligent monitoring scheme

As noted in Section 1, an integrated MSPC-ANN combined in a single-stage monitoring scheme (direct monitoring-diagnosis) was proposed in Chen and Wang (2004), Niaki and Abbasi (2005), and Yu et al. (2009). Other schemes based on fully ANN-based models, as proposed in Zorriassatine, Tannock, and O'Brien (2003), Guh (2007), Yu and Xi (2009) and El-Midany et al. (2010), can also be classified as single-stage monitoring schemes. In this research, a two-stage monitoring scheme was investigated by integrating the strengths of the MEWMA control chart and a Synergistic-ANN model to improve the monitoring-diagnosis performance. The framework and pseudo-code (algorithm) for the proposed scheme are summarized in Figs. 4 and 5, respectively. It should be noted that the following initial settings need to be performed before it can be put into application:
- Load the trained raw data-ANN recognizer into the system.
- Set the values of the means (μ01, μ02) and standard deviations (σ01, σ02) of the bivariate in-control process (for variables X1-i and X2-i). These parameters can be obtained based on historical or preliminary samples.
- Perform in-process quality control inspection until 24 observation samples (individual or subgroup) are available to begin the system.

The recognition window size is set to 24 observation samples (for variables X1-i and X2-i) since this provided sufficient training results and is statistically acceptable to represent a normal distribution. Preliminary experiments suggested that a smaller window size (<24) gave lower training results due to insufficient pattern properties, while a larger window size (>24) did not increase the training results but burdened the ANN training.

The rationale for integrating the MEWMA control chart and the Synergistic-ANN model is based on preliminary experiments. Generally, the MEWMA control chart is known to be effective in detecting bivariate process mean shifts more rapidly than the χ² control chart. Furthermore, it is very sensitive when dealing with small shifts (≤1.00 standard deviation). Unfortunately, being based on the one-point out-of-control detection technique, it gives limited capability to avoid false alarms (ARL0 ≤ 200). This becomes more critical when the variables are highly correlated. In a related study, a pattern recognition scheme using a Synergistic-ANN model gave better capability in avoiding false alarms (ARL0 > 200). As such, it can be concluded that process identification based on recognition of process data stream patterns (Synergistic-ANN model) is more effective than detection of one point out-of-control (MEWMA control chart). Nevertheless, the different techniques have their respective advantages in terms of point/pattern discrimination properties. In order to further improve the monitoring performance (ARL1 ≈ 1, ARL0 >> 200), it is useful to combine both discrimination properties (MEWMA control chart and Synergistic-ANN recognizer) by approaching two-stage monitoring and diagnosis.

[Fig. 4. Framework for the 2s-IMS: first-stage monitoring with the MEWMA control chart; when a point signals out-of-control, second-stage monitoring and diagnosis with the Synergistic-ANN recognizer identifies the sources of mean shift, followed by troubleshooting and renewed settings.]
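The pseudo-code itself (Fig. 5) is not reproduced here. The fragment below is only a hedged sketch of the loop implied by the framework: mewma_chart and synergistic_recognizer are placeholder objects standing in for Sections 3.1 and 3.2, the control limit H and window size follow the text, and everything else (data source, troubleshooting hook) is assumed.

```python
# Hedged sketch of the two-stage monitoring loop suggested by the framework
# (Fig. 4). The chart and recognizer are placeholders; only H and the
# 24-sample window come from the paper, the rest is assumed.
import numpy as np

H = 8.64          # MEWMA control limit for lambda = 0.10 (Prabhu & Runger, 1997)
WINDOW = 24       # recognition window size

def two_stage_monitoring(stream, mewma_chart, synergistic_recognizer, h=H):
    """stream: iterable of (z1, z2) standardized observation pairs.
    mewma_chart: object whose update(sample) returns the current MEWMA statistic.
    synergistic_recognizer: maps a (24, 2) window to a pattern category string."""
    window = []
    for i, sample in enumerate(stream, start=1):
        window = (window + [sample])[-WINDOW:]        # keep the latest 24 samples
        statistic = mewma_chart.update(sample)        # first stage: point monitoring
        if i >= WINDOW and statistic > h:
            category = synergistic_recognizer(np.asarray(window))   # second stage
            if category != "N(0,0)":
                return i, category    # out-of-control: report the sources of mean shift
    return None, "N(0,0)"             # process judged in-control over the stream
```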
This approach is suited to a 'recognition only when necessary' concept, that is, it is unnecessary to perform recognition while the process lies within a statistically in-control state; recognition is only necessary for identifying patterns suspected of belonging to a statistically out-of-control state. Besides producing fewer false alarms, this approach also reduces the computational effort and the time consumed by the pattern recognition operation.

3.1. MEWMA control chart

The MEWMA control chart developed by Lowry et al. (1992) is a logical extension of the univariate EWMA control chart. In the bivariate case, the MEWMA statistic can be written in expanded form as

MEWMA_i = [σ2²(EWMA1,i − μ1)² + σ1²(EWMA2,i − μ2)² − 2σ12(EWMA1,i − μ1)(EWMA2,i − μ2)] / (σ1²σ2² − σ12²)

where (σ1², σ2², σ12) are the elements of the covariance matrix of the EWMA vectors, Σ_EWMA = [λ/(2 − λ)]Σ. The standardized samples (Z1i, Z2i) with cross correlation (ρ) were used; thus σ1 = σ2 = 1 and σ12 = ρ. The notations λ and i represent the constant (smoothing) parameter and the sample number. The starting value of the EWMA (EWMA0) was set to zero to represent the process target (μ0). A MEWMA statistic sample is declared out-of-control if it exceeds the control limit (H). In this research, three sets of design parameters (λ, H) = (0.05, 7.35), (0.10, 8.64) and (0.20, 9.65), as reported in Prabhu and Runger (1997), were investigated.

3.2. Synergistic-ANN model pattern recognizer

The Synergistic-ANN model, as shown in Fig. 6, was developed for pattern recognition. It is a parallel combination of two individual ANNs, namely (i) a raw data-based ANN and (ii) a statistical features-based ANN, as shown in Fig. 7. Let O_RD = (O_RD-1, ..., O_RD-7) and O_F = (O_F-1, ..., O_F-7) represent the seven outputs from the raw data-based ANN and the statistical features-based ANN recognizers respectively. The outputs from these individual recognizers are combined using simple summation, O_II-i = O_RD-i + O_F-i, where i = 1, ..., 7 indexes the outputs. The final decision (O_synergy) is determined by the maximum value among the combined outputs:

O_synergy = max(O_II-1, ..., O_II-7)     (5)

Fig. 3. Summary of bivariate shift patterns for ρ = 0.1, 0.5 and 0.9 (for each correlation, panels show partially developed and fully developed shift patterns for the up-shift, normal and down-shift combinations of X1 and X2).

A multilayer perceptron (MLP) model trained with the back-propagation (BPN) algorithm was applied for the individual ANNs. This model comprises an input layer, one or more hidden layer(s) and an output layer. The size of the input representation determines the number of input neurons: the raw data input representation requires 48 neurons, while the statistical features input representation requires only 14 neurons. The output layer contains seven neurons, determined according to the number of pattern categories. Based on preliminary experiments, one hidden layer with 26 neurons and one with 22 neurons were selected for the raw data-based ANN and the statistical features-based ANN respectively. The experiments revealed that, initially, the training results improved in line with the increase in the number of hidden neurons; once the number of neurons exceeded the required level, further increases did not improve the training results but instead produced poorer ones. Such excess neurons can burden the network computationally, reduce its generalization capability and increase the training time.
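The following minimal sketch (NumPy only, with the trained weights assumed to be supplied from elsewhere) illustrates the forward pass of one such MLP and the summation plus maximum decision of Eq. (5); it is an illustration of the combination rule, not the authors' code.

import numpy as np

CATEGORIES = ["N (0,0)", "US (1,0)", "US (0,1)", "US (1,1)",
              "DS (1,0)", "DS (0,1)", "DS (1,1)"]

def mlp_forward(x, w1, b1, w2, b2):
    """One-hidden-layer MLP forward pass: tanh hidden layer, sigmoid output,
    mirroring the architecture described in Section 3.2 (weights are assumed
    to have been trained beforehand)."""
    h = np.tanh(x @ w1 + b1)                          # hidden layer (26 or 22 neurons)
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))       # seven sigmoid outputs

def synergistic_decision(o_rd, o_f):
    """Combine the seven outputs of the raw data-based ANN (o_rd) and the
    statistical features-based ANN (o_f) by summation, then take the maximum
    combined output as the final decision (Eq. (5))."""
    o_combined = np.asarray(o_rd) + np.asarray(o_f)   # O_II-i = O_RD-i + O_F-i
    return CATEGORIES[int(np.argmax(o_combined))], o_combined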
Fig. 4. Framework for the 2S-IMS (first-stage monitoring by the MEWMA control chart; when a point is signalled out-of-control, second-stage monitoring-diagnosis identifies the sources of mean shift, followed by troubleshooting and renewal of the setting).

3.3. Input representation

Input representation is a technique for presenting an input signal to the ANN so as to achieve effective recognition. Various approaches can be used to represent the input signal. Raw data (standardized samples) is the basic approach (Zorriassatine, Tannock, & O'Brien, 2003). Besides raw data, a feature-based approach involving features extracted from the raw data has been one of the successful techniques in image processing (Brunzell & Eriksson, 2000) and in process controlling or monitoring. Insufficient denoising will distort waveforms and introduce errors; conversely, excessive denoising will over-smooth the sharp features of the underlying signals by treating them as noise or outliers.

Fig. 6. Synergistic-ANN model.

In this research, raw data and an improved set of statistical features were applied separately in training the Synergistic-ANN recognizer to improve its pattern discrimination capability. The raw data input representation consists of 48 data, i.e. 24 consecutive standardized samples of the bivariate process (Z1-P1, Z1-P2, ..., Z24-P1, Z24-P2). The statistical features input representation consists of the last value of the exponentially weighted moving average (LEWMAλ) with λ = [0.25, 0.20, 0.15, 0.10], the mean (μ), the multiplication of the mean with the standard deviation (MSD), and the multiplication of the mean with the mean square value (MMSV). Each bivariate pattern was therefore represented by 14 data: LEWMA0.25-P1, LEWMA0.20-P1, LEWMA0.15-P1, LEWMA0.10-P1, μP1, MSDP1, MMSVP1, LEWMA0.25-P2, LEWMA0.20-P2, LEWMA0.15-P2, LEWMA0.10-P2, μP2, MSDP2, MMSVP2.

Fig. 7. Individual ANN recognizer (raw data-based ANN with 48 input neurons and statistical features-based ANN with 14 input neurons, each producing seven outputs O1-O7).

The LEWMAλ features were taken based on an observation window of 24. The EWMA statistic, as derived in Eq. (6), incorporates historical data in the form of a weighted average of all past and current observation samples (Lucas & Saccucci, 1990):

EWMA_i = λX_i + (1 − λ)EWMA_(i−1)     (6)

Here X_i represents the original samples. In this study, the standardized samples (Z_i) were used instead of X_i, so that Eq. (6) becomes

EWMA_i = λZ_i + (1 − λ)EWMA_(i−1)

where 0 < λ ≤ 1 is a constant parameter and i = 1, 2, ..., 24 indexes the samples.
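A minimal sketch of how the 14-element statistical features vector described above could be computed from a 24 x 2 window of standardized samples is given below. The MSD and MMSV definitions follow the verbal description in the text (mean multiplied by standard deviation, and mean multiplied by mean square value), so their exact forms here should be read as an assumption rather than the authors' formulas.

import numpy as np

def last_ewma(z, lam):
    """Last value of the EWMA of a 1-D series z, with EWMA_0 = 0 (Eq. (6))."""
    ewma = 0.0
    for zi in z:
        ewma = lam * zi + (1.0 - lam) * ewma
    return ewma

def statistical_features(window, lams=(0.25, 0.20, 0.15, 0.10)):
    """Build the 14-element feature vector for a 24 x 2 window of standardized
    samples: four LEWMA values, mean, MSD and MMSV per variable."""
    feats = []
    for j in range(window.shape[1]):              # variable X1 then X2
        z = window[:, j]
        mean = z.mean()
        std = z.std(ddof=1)
        msv = np.mean(z ** 2)                     # mean square value
        feats += [last_ewma(z, lam) for lam in lams]
        feats += [mean, mean * std, mean * msv]   # mu, MSD, MMSV (assumed forms)
    return np.array(feats)                        # length 14 for a bivariate window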
The starting value of the EWMA (EWMA0) was set to zero to represent the process target (μ0). Four values of the constant parameter (λ = 0.25, 0.20, 0.15, 0.10) were selected from within the range [0.05, 0.40] recommended by Lucas and Saccucci (1990). Besides resulting in a longer ARL0, these parameters influence the performance of the EWMA control chart in detecting process mean shifts. Preliminary experiments suggested that an EWMA with a small constant parameter (λ = 0.05) is more sensitive in identifying small shifts (<0.75 standard deviations), while an EWMA with a large constant parameter (λ = 0.40) is more sensitive in identifying large shifts (>2.00 standard deviations).

The MSD and MMSV features were used to magnify the magnitude of the mean shifts (μ1, μ2), with MSD_j = μ_j × σ_j and MMSV_j = μ_j × MSV_j (j = 1, 2), where (μ1, μ2), (σ1, σ2) and (MSV1, MSV2) are the means, standard deviations and mean square values of the two variables respectively. The mathematical expressions for the mean and the standard deviation are widely available in textbooks on SPC. The mean square value feature can be derived as in Hassan et al. (2003). Further discussion on the selection of statistical features can be found in Masood and Hassan (2013).

3.4. Recognizer training and testing

Partially developed shift patterns and dynamic patterns were applied in the ANN training and testing respectively, since these approaches have proven effective for on-line process situations (Guh, 2007). Detailed parameters for the training patterns are summarized in Tables 1 and 2. In order to achieve the best training result over all pattern categories, the amounts of training patterns were set as follows: (i) bivariate normal patterns = [1500 × (total combinations of data correlation)] and (ii) bivariate shift patterns = [100 × (total combinations of mean shifts) × (total combinations of data correlation)]. In order to improve the discrimination capability between normal and shift patterns, a large number of N (0,0) patterns was applied in the ANN training. The US (1,1) and DS (1,1) pattern categories also require a large number of training patterns, since they contain a more complex combination of mean shifts than the other bivariate shift pattern categories.

Guh (2007) reported that the utilization of partially developed shift patterns in ANN training can provide shorter ARL1 results. In order to achieve the best ARL1 results for this scheme, different percentages of partially developed shift patterns were utilized for different ranges of mean shifts, as shown in Table 2. The starting points of the sudden shifts (SS) were determined empirically. The actual value of the data correlation depends on the variability in the bivariate samples; the simulated values (ρ = 0.1, 0.3, 0.5, 0.7, 0.9) shown in Table 1 can only be achieved when the process data streams are in a fully normal pattern or in a fully developed shift pattern. Input representations were normalized to the compact range [−1, +1]; the maximum and minimum values for normalization were taken from the overall data of the training patterns.

Based on the BPN algorithm, 'gradient descent with momentum and adaptive learning rate' (traingdx) was used for training the MLP model. The other training parameter settings were the learning rate (0.05), learning rate increment (1.05), maximum number of epochs (500) and error goal (0.001), while the network performance was evaluated based on the mean square error (MSE). A hyperbolic tangent function was used for the hidden layer, while a sigmoid function was used for the output layer. The training session was stopped either when the maximum number of training epochs was reached or when the required MSE had been achieved.
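The sketch below illustrates, under stated assumptions, how a partially developed sudden-shift training pattern could be generated for a 24-sample window (in-control samples before an assumed shift starting point, shifted samples from that point onward), together with the [−1, +1] normalization step. The starting point and the normalization bounds are illustrative and are not the exact values of Table 2.

import numpy as np

def partially_developed_shift(shift=(1.5, 1.5), rho=0.5, start=17,
                              window=24, seed=None):
    """Generate one partially developed sudden-shift pattern: in-control
    bivariate samples up to sample `start` - 1, shifted samples from `start`
    onward (sample indices are 1-based, as in the paper)."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size=window)
    z[start - 1:, 0] += shift[0]              # sudden shift on X1 from `start`
    z[start - 1:, 1] += shift[1]              # sudden shift on X2 from `start`
    return z

def normalize(patterns, lo=None, hi=None):
    """Scale training patterns to [-1, +1] using overall min/max values."""
    lo = patterns.min() if lo is None else lo
    hi = patterns.max() if hi is None else hi
    return 2.0 * (patterns - lo) / (hi - lo) - 1.0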
4. Performance results and discussion

The monitoring and diagnosis performances of the 2S-IMS were evaluated based on the average run lengths (ARL0, ARL1) and the recognition accuracy (RA), as summarized in Table 4. The ARL results were also compared with those of traditional multivariate statistical process control (MSPC) charting schemes such as the χ2 chart (Hotelling, 1947), MCUSUM (Pignatiello & Runger, 1990) and MEWMA (Lowry et al., 1992), as reported in the literature. In order to achieve balanced monitoring and accurate diagnosis, the proposed 2S-IMS should achieve the following target performances:

(i) ARL0 >> 200, to maintain a small false alarm rate when monitoring a bivariate in-control process.
(ii) Short ARL1 (average ARL1 ≤ 7.5 for the shift range ±0.75-3.00 standard deviations), to rapidly detect bivariate process mean shifts.
(iii) High RA (average RA ≥ 95% for the shift range ±0.75-3.00 standard deviations), to accurately identify the sources of mean shifts.

Table 1. Parameters for the training patterns.

Pattern category | Mean shift (in standard deviations) | Data correlation (ρ) | Amount of training patterns
N (0,0) | X1: 0.00; X2: 0.00 | 0.1, 0.3, 0.5, 0.7, 0.9 | 1500 × 5 = 7500
US (1,0) | X1: 1.00, 1.25, ..., 3.00; X2: 0.00 | 0.1, 0.3, 0.5, 0.7, 0.9 | 100 × 9 × 5 = 4500
US (0,1) | X1: 0.00; X2: 1.00, 1.25, ..., 3.00 | 0.1, 0.3, 0.5, 0.7, 0.9 | 100 × 9 × 5 = 4500
US (1,1) | X1: 1.00, 1.00, 1.25, 1.25, ..., 3.00; X2: 1.00, 1.25, 1.00, 1.25, ..., 3.00 | 0.1, 0.3, 0.5, 0.7, 0.9 | 100 × 25 × 5 = 12,500
DS (1,0) | X1: −1.00, −1.25, ..., −3.00; X2: 0.00 | 0.1, 0.3, 0.5, 0.7, 0.9 | 100 × 9 × 5 = 4500
DS (0,1) | X1: 0.00; X2: −1.00, −1.25, ..., −3.00 | 0.1, 0.3, 0.5, 0.7, 0.9 | 100 × 9 × 5 = 4500
DS (1,1) | X1: −1.00, −1.00, −1.25, −1.25, ..., −3.00; X2: −1.00, −1.25, −1.00, −1.25, ..., −3.00 | 0.1, 0.3, 0.5, 0.7, 0.9 | 100 × 25 × 5 = 12,500

Table 2. Parameters for the partially developed shift patterns: for each range of mean shifts (in standard deviations), the amount of partially developed shift patterns and the starting point of the sudden shift (SS) at sample 9, 13 or 17.

Table 3. Summary of monitoring-diagnosis capabilities.

Traditional MSPC | 2S-IMS
Effective in monitoring (identifying an out-of-control signal) | Comparable to the traditional MSPC in the monitoring aspect
Limited capability to avoid false alarms (ARL0 ≈ 200) | Capable of maintaining a smaller false alarm rate (ARL0 >> 200)
Unable to identify the sources of variation (mean shifts) | High accuracy in identifying the sources of variation (mean shifts)

4.1. Monitoring performance

In the monitoring aspect, the ARL0 represents the average number of natural observation samples from an in-control process before the first out-of-control signal appears as a false alarm. In other words, the ARL0 measures how long an SPC scheme can keep an in-control process running without any false alarm. On the other hand, the ARL1 represents the average number of unnatural observation samples before the process is truly identified as giving an out-of-control signal. In other words, the ARL1 measures how fast an SPC scheme can detect process mean shifts. Further discussion of these measures can be found in Montgomery (2009).
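As a concrete illustration of how such run lengths can be estimated by simulation, the sketch below repeatedly generates in-control standardized samples and records the run length until a MEWMA statistic first exceeds H; it is a generic Monte Carlo estimate under assumed parameters, not the procedure used to produce Table 4.

import numpy as np

def estimate_arl0(lam=0.10, H=8.64, rho=0.5, n_runs=1000, max_len=10000,
                  seed=0):
    """Monte Carlo estimate of ARL0 for a bivariate MEWMA chart: average
    number of in-control samples until the statistic first exceeds H."""
    rng = np.random.default_rng(seed)
    sigma = np.array([[1.0, rho], [rho, 1.0]])
    cov_w_inv = np.linalg.inv((lam / (2.0 - lam)) * sigma)
    run_lengths = []
    for _ in range(n_runs):
        w = np.zeros(2)
        for i in range(1, max_len + 1):
            z = rng.multivariate_normal([0.0, 0.0], sigma)
            w = lam * z + (1.0 - lam) * w
            if w @ cov_w_inv @ w > H:         # false alarm on in-control data
                run_lengths.append(i)
                break
        else:
            run_lengths.append(max_len)       # censored run
    return float(np.mean(run_lengths))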
Ideally, an SPC scheme should provide an ARL0 that is as long as possible, in order to minimize the cost of investigating discrepancies and troubleshooting while the process is still in control. Meanwhile, it should provide an ARL1 that is as short as possible, in order to minimize the cost of rework or wasted materials. Since false alarms cannot be eliminated, ARL0 >> 200 is considered the de facto level for balanced monitoring.

In this research, the ARL results of the 2S-IMS were simulated based on correctly classified patterns. Generally, it can be observed that the smaller the mean shifts, the longer the ARL1 values. This trend supports the conclusion that process mean shifts of smaller magnitude are more difficult to detect. Specifically, the 2S-IMS indicated rapid detection capability for large shifts (shift = 3σ, ARL1 = 3.18-3.19) and moderate shifts (shift = 2σ, ARL1 = 4.76-4.78), with short ranges of ARL1. It was also capable of dealing with smaller shifts (shifts = [1σ, 0.75σ], ARL1 = [10.33-10.60, 15.69-16.75]).

In comparison with the χ2 charting scheme, the detection capability shown by the 2S-IMS was faster for small and moderate shifts (shifts = 0.75σ-2σ). In comparison with the MCUSUM and the MEWMA, it was broadly comparable in rapid detection for large shifts (shift = 2.5σ, ARL1: 2S-IMS = 3.80-3.81, MCUSUM = 2.91, MEWMA = 3.51) and moderate shifts (shift = 1.5σ, ARL1: 2S-IMS = 6.41-6.52, MCUSUM = 5.23, MEWMA = 6.12). A similar trend can also be found when dealing with smaller shifts (shift = 1σ, ARL1: 2S-IMS = 10.33-10.60, MCUSUM = 9.28, MEWMA = 10.20). Meanwhile, based on the range of ARL0 results (ρ = 0.1, 0.5, 0.9; ARL0 = 335.01, 543.93, 477.45), the 2S-IMS was observed to be more effective in maintaining a smaller false alarm rate compared with the traditional MSPC (ARL0 ≈ 200). It should be noted that the results for the medium and high correlation processes exceeded 370, the in-control ARL commonly associated with the Shewhart control chart (Nelson, 1985; Shewhart, 1931). Overall, it can be concluded that the proposed scheme achieved balanced monitoring performance.

4.2. Diagnosis performance

In the diagnosis aspect, the RA measures how accurately an SPC scheme can identify the sources of mean shifts, towards diagnosing the root cause error and conducting troubleshooting. Generally, it can be observed that the smaller the mean shifts, the lower the RA results. This trend supports the conclusion that diagnosis information for small process mean shifts (≤1.0 standard deviations) would be more difficult to identify.
Table 4. Performance comparison between the 2S-IMS and the traditional MSPC. ARL and RA values for the 2S-IMS are listed for ρ = 0.1, 0.5, 0.9; the χ2 (UCL = 10.6), MCUSUM (k = 0.50, h = 4.75) and MEWMA (λ = 0.10, H = 8.66) reference values are for ρ = 0.0.

Pattern category | Mean shifts (X1, X2) | ARL 2S-IMS (ρ = 0.1, 0.5, 0.9) | χ2 | MCUSUM | MEWMA | RA 2S-IMS (%, ρ = 0.1, 0.5, 0.9)
N (0,0) | 0.00, 0.00 | ARL0: 335.01, 543.93, 477.45 | 200 (0.005) | 203 (0.0049) | 200 (0.005) | NA
US (1,0) | 0.75, 0.00 | 17.60, 18.34, 20.00 | | | | 92.7, 90.4, 89.5
US (0,1) | 0.00, 0.75 | 16.20, 15.99, 16.21 | | | | 92.9, 89.3, 90.6
US (1,1) | 0.75, 0.75 | 13.64, 13.28, 14.17 | | | | 82.4, 94.8, 99.9
DS (1,0) | −0.75, 0.00 | 16.31, 16.43, 17.35 | | | | 92.3, 89.2, 89.4
DS (0,1) | 0.00, −0.75 | 16.94, 17.44, 18.75 | | | | 92.3, 87.8, 88.5
DS (1,1) | −0.75, −0.75 | 13.46, 13.37, 14.03 | | | | 84.1, 96.1, 99.9
Average (0.75) | | 15.69, 15.81, 16.75 | | | | 89.5, 91.3, 93.0
US (1,0) | 1.00, 0.00 | 11.52, 11.57, 11.70 | 42 (0.976) | 9.28 (0.892) | | 95.3, 93.1, 94.4
US (0,1) | 0.00, 1.00 | 10.50, 10.22, 10.20 | | | | 95.8, 93.5, 94.4
US (1,1) | 1.00, 1.00 | 9.16, 9.09, 9.66 | | | | 90.0, 96.5, 100
DS (1,0) | −1.00, 0.00 | 10.99, 10.86, 11.06 | | | | 95.3, 93.2, 92.3
DS (0,1) | 0.00, −1.00 | 11.08, 11.12, 11.36 | | | | 93.8, 92.1, 92.6
DS (1,1) | −1.00, −1.00 | 9.15, 9.12, 9.63 | | | | 89.5, 98.0, 100
Average (1.00) | | 10.40, 10.33, 10.60 | | | | 93.3, 94.4, 95.6
US (1,0) | 1.50, 0.00 | 7.02, 7.07, 7.03 | 15.8 (0.937) | 5.23 (0.809) | | 97.4, 96.5, 97.1
US (0,1) | 0.00, 1.50 | 6.54, 6.33, 6.40 | | | | 97.1, 96.5, 96.2
US (1,1) | 1.50, 1.50 | 5.82, 5.73, 5.94 | | | | 91.7, 97.9, 100
DS (1,0) | −1.50, 0.00 | 6.81, 6.81, 6.92 | | | | 97.4, 96.3, 95.5
DS (0,1) | 0.00, −1.50 | 6.82, 6.80, 6.85 | | | | 96.2, 95.8, 95.6
DS (1,1) | −1.50, −1.50 | 5.81, 5.69, 5.98 | | | | 93.2, 99.0, 100
Average (1.50) | | 6.47, 6.41, 6.52 | | | | 95.5, 97.0, 97.4
US (1,0) | 2.00, 0.00 | 5.23, 5.15, 5.19 | | | | 97.8, 97.1, 97.6
US (0,1) | 0.00, 2.00 | 4.80, 4.72, 4.70 | | | | 97.7, 97.8, 97.1
US (1,1) | 2.00, 2.00 | 4.36, 4.32, 4.39 | | | | 91.6, 98.4, 100
DS (1,0) | −2.00, 0.00 | 5.04, 5.04, 5.02 | | | | 96.8, 96.7, 96.6
DS (0,1) | 0.00, −2.00 | 4.97, 5.03, 4.98 | | | | 96.5, 96.5, 95.6
DS (1,1) | −2.00, −2.00 | 4.29, 4.27, 4.33 | | | | 93.7, 98.9, 100
Average (2.00) | | 4.78, 4.76, 4.77 | | | | 95.7, 97.6, 97.8
US (1,0) | 2.50, 0.00 | 4.10, 4.14, 4.12 | | | | 98.0, 98.4, 98.0
US (0,1) | 0.00, 2.50 | 3.83, 3.81, 3.81 | | | | 97.3, 97.4, 97.0
US (1,1) | 2.50, 2.50 | 3.54, 3.49, 3.53 | | | | 93.2, 98.4, 100
DS (1,0) | −2.50, 0.00 | 3.99, 3.96, 3.95 | | | | 97.3, 97.3, 97.0
DS (0,1) | 0.00, −2.50 | 3.97, 4.02, 3.98 | | | | 96.5, 96.6, 97.0
DS (1,1) | −2.50, −2.50 | 3.41, 3.40, 3.46 | | | | 94.9, 98.8, 100
Average (2.50) | | 3.81, 3.80, 3.81 | | | | 96.2, 97.8, 98.2
US (1,0) | 3.00, 0.00 | 3.47, 3.46, 3.46 | | | | 98.6, 98.3, 98.2
US (0,1) | 0.00, 3.00 | 3.20, 3.20, 3.21 | | | | 97.8, 97.8, 98.0
US (1,1) | 3.00, 3.00 | 2.98, 2.93, 2.98 | | | | 93.8, 98.4, 100
DS (1,0) | −3.00, 0.00 | 3.31, 3.30, 3.27 | | | | 98.0, 97.1, 97.6
DS (0,1) | 0.00, −3.00 | 3.33, 3.32, 3.32 | | | | 96.7, 97.1, 97.1
DS (1,1) | −3.00, −3.00 | 2.84, 2.85, 2.90 | | | | 94.6, 99.1, 100
Average (3.00) | | 3.19, 3.18, 3.19 | | | | 96.6, 98.0, 98.5
Grand average ±(0.75-3.00) | | 7.39, 7.38, 7.61 | | | | 94.5, 96.0, 96.8

Note: Design parameters for the MEWMA control chart in the 2S-IMS are λ = 0.1, H = 8.64.

Specifically, the 2S-IMS indicated accurate diagnosis capability for large shifts (shift = 3σ, RA = 96.6-98.5%) and moderate shifts (shift = 2σ, RA = 95.7-97.8%), with high ranges of RA. Although the results were slightly degraded, it was still effective in dealing with smaller shifts (shifts = [1σ, 0.75σ], RA = [93.3-95.6%, 89.5-93.0%]). It should be noted that the RA results for the medium and highly correlated processes were higher than for the low correlation process, which is beneficial in practical cases. Since the traditional MSPC charting schemes are unable to provide diagnosis information, the diagnosis capability shown by the 2S-IMS directly addresses this issue. Overall, it can be concluded that the proposed scheme achieved accurate diagnosis performance. Table 3 summarizes the comparison of monitoring-diagnosis capabilities between the 2S-IMS and the traditional MSPC.

Fig. 8. Functional features of roller head (groove and flange view; internal diameters).

Fig. 9. Process plan for the manufacture of roller head: extrusion round bar, turning to rough size, turning to size, honing of inner diameters, nickel electroplating, bearings assembly.

5. Industrial case study

Broadly, the need for BQC can be found in manufacturing industries involved in the production of mating, rotational or moving parts. The investigation in this study was focused on the manufacture of an audio-video device (AVD) component, namely the roller head. The investigation was based on the author's working experience in the manufacturing industry in Johor, Malaysia. In an AVD, the roller head functions to guide and control the movement path of a film tape.
The inner diameters of the roller head (ID1 and ID2), as shown in Fig. 8, are two dependent quality characteristics (bivariate) that require joint monitoring-diagnosis. In current practice, such functional features are still widely monitored independently using Shewhart control charts. It is unclear why MSPC has not been implemented; from the author's point of view, it could be due to a lack of motivation, knowledge and skills to adopt the newer technology.

The process plan for the manufacture of the roller head is illustrated in Fig. 9. Initially, an aluminium extrusion round bar is turned to rough size (rough-cut machining). It is then turned to size (finish-cut machining) to form functional features such as the inner diameters and the groove and flange, among others. The machining of the inner diameters is then extended to a honing process to achieve the tight tolerances required for bearing assembly. A hard-coated surface is also necessary; as such, the machined workpiece is electroplated with nickel alloy before assembly.

Fig. 10. Process variation occurring in the turning-to-size operation: tool bluntness (decrement in ID) and loading error (increment in ID).

Bivariate process variation can be found in the turning-to-size operation due to tool bluntness and loading error, as illustrated in Fig. 10. These disturbances cause unnatural changes in the process data streams, as shown in Table 5. The workpiece is automatically loaded into a pneumatic chuck by a robotic system. Bluntness of the cutting tool causes a gradual decrement in both inner diameters (ID1, ID2) with positive cross correlation (ρ > 0). In another situation, the inner diameters can increase suddenly and simultaneously, again yielding positive cross correlation (ρ > 0), due to a loading error. Based on these two examples of bivariate process variation, industrial process samples were simulated in the 2S-IMS to validate its applicability in the real world.

Table 5. Sources of variation in machining inner diameters: under a stable process (process noise, N (0,0)) the data streams of X1-i (ID1) and X2-i (ID2) appear normal; under tool bluntness (DS (1,1)) both show a down-trend; under loading error (US (1,1)) both show an up-shift, with the corresponding scatter diagrams reflecting each condition.

The first case study involves tool bluntness. The mean (μ) and standard deviation (σ) of the bivariate in-control process were determined based on the first 24 samples (observations 1-24). Tool bluntness begins between observation samples 41-50. The validation results are summarized in Table 6, whereby the determination of the process status (monitoring) and the sources of variation (diagnosis) is based on the outputs of the scheme as shown in Table 7.
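The standardized samples and moving recognition window reported in Table 6 can be reproduced conceptually with the following sketch; μ0 and σ0 are estimated from the first 24 observations, as described above, and the variable names are illustrative rather than the authors'.

import numpy as np

def standardize_stream(x, n_ref=24):
    """Standardize a bivariate stream x (shape n x 2) using the mean and
    standard deviation of the first n_ref in-control observations,
    Z_i = (X_i - mu0) / sigma0."""
    mu0 = x[:n_ref].mean(axis=0)
    sigma0 = x[:n_ref].std(axis=0, ddof=1)
    return (x - mu0) / sigma0

def recognition_windows(z, window=24):
    """Yield (start, end, window_samples) for the moving recognition window,
    e.g. samples 1-24, 2-25, ... as listed in Table 6 (1-based ranges)."""
    for end in range(window, len(z) + 1):
        yield end - window + 1, end, z[end - window:end]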
Table 6. Inspection results based on the tool bluntness case.

i | Xi-1 (ID1) | Xi-2 (ID2) | Zi-1 (ID1) | Zi-2 (ID2) | Window range | Monitoring-diagnosis decision
1 | 7.9420 | 7.9428 | 0.3393 | 1.0790 | |
2 | 7.9412 | 7.9420 | −1.1414 | −0.5917 | |
3 | 7.9412 | 7.9416 | −1.1414 | −1.4271 | |
4 | 7.9420 | 7.9428 | 0.3393 | 1.0790 | |
5 | 7.9412 | 7.9420 | −1.1414 | −0.5917 | |
6 | 7.9412 | 7.9416 | −1.1414 | −1.4271 | |
7 | 7.9420 | 7.9428 | 0.3393 | 1.0790 | |
8 | 7.9424 | 7.9420 | 1.0797 | −0.5917 | |
9 | 7.9416 | 7.9420 | −0.4010 | −0.5917 | |
10 | 7.9412 | 7.9416 | −1.1414 | −1.4271 | |
11 | 7.9416 | 7.9424 | −0.4010 | 0.2437 | |
12 | 7.9428 | 7.9432 | 1.8201 | 1.9144 | |
13 | 7.9420 | 7.9424 | 0.3393 | 0.2437 | |
14 | 7.9416 | 7.9424 | −0.4010 | 0.2437 | |
15 | 7.9424 | 7.9428 | 1.0797 | 1.0790 | |
16 | 7.9412 | 7.9420 | −1.1414 | −0.5917 | |
17 | 7.9412 | 7.9416 | −1.1414 | −1.4271 | |
18 | 7.9420 | 7.9424 | 0.3393 | 0.2437 | |
19 | 7.9428 | 7.9428 | 1.8201 | 1.0790 | |
20 | 7.9420 | 7.9424 | 0.3393 | 0.2437 | |
21 | 7.9412 | 7.9416 | −1.1414 | −1.4271 | |
22 | 7.9424 | 7.9428 | 1.0797 | 1.0790 | |
23 | 7.9424 | 7.9424 | 1.0797 | 0.2437 | |
24 | 7.9420 | 7.9424 | 0.3393 | 0.2437 | 1-24 | N
25 | 7.9412 | 7.9416 | −1.1414 | −1.4271 | 2-25 | N
26 | 7.9424 | 7.9420 | 1.0797 | −0.5917 | 3-26 | N
27 | 7.9424 | 7.9428 | 1.0797 | 1.0790 | 4-27 | N
28 | 7.9412 | 7.9420 | −1.1414 | −0.5917 | 5-28 | N
29 | 7.9420 | 7.9428 | 0.3393 | 1.0790 | 6-29 | N
30 | 7.9420 | 7.9424 | 0.3393 | 0.2437 | 7-30 | N
31 | 7.9412 | 7.9420 | −1.1414 | −0.5917 | 8-31 | N
32 | 7.9420 | 7.9428 | 0.3393 | 1.0790 | 9-32 | N
33 | 7.9428 | 7.9424 | 1.8201 | 0.2437 | 10-33 | N
34 | 7.9416 | 7.9424 | −0.4010 | 0.2437 | 11-34 | N
35 | 7.9424 | 7.9432 | 1.0797 | 1.9144 | 12-35 | N
36 | 7.9428 | 7.9424 | 1.8201 | 0.2437 | 13-36 | N
37 | 7.9416 | 7.9420 | −0.4010 | −0.5917 | 14-37 | N
38 | 7.9420 | 7.9424 | 0.3393 | 0.2437 | 15-38 | N
39 | 7.9424 | 7.9420 | 1.0797 | −0.5917 | 16-39 | N
40 | 7.9416 | 7.9420 | −0.4010 | −0.5917 | 17-40 | N
41 | 7.9408 | 7.9412 | −1.8818 | −2.2625 | 18-41 | N
42 | 7.9408 | 7.9408 | −1.8818 | −3.0978 | 19-42 | N
43 | 7.9404 | 7.9408 | −2.6222 | −3.0978 | 20-43 | N
44 | 7.9404 | 7.9408 | −2.6222 | −3.0978 | 21-44 | DS (1,1)
45 | 7.9404 | 7.9404 | −2.6222 | −3.9332 | 22-45 | DS (1,1)
46 | 7.9400 | 7.9404 | −3.3626 | −3.9332 | 23-46 | DS (1,1)
47 | 7.9400 | 7.9400 | −3.3626 | −4.7686 | 24-47 | DS (1,1)
48 | 7.9396 | 7.9400 | −4.1029 | −4.7686 | 25-48 | DS (1,1)
49 | 7.9396 | 7.9396 | −4.1029 | −5.6040 | 26-49 | DS (1,1)
50 | 7.9396 | 7.9396 | −4.1029 | −5.6040 | 27-50 | DS (1,1)

(μ1, μ2) = (7.9417, 7.9422); (σ1, σ2) = (4.6687 × 10⁻⁴, 4.2495 × 10⁻⁴). Note: observation samples 41-50 represent the out-of-control process.

Table 7. Outputs of the scheme for the tool bluntness case: for each recognition window (RW = 1-24, 2-25, 3-26, ...), the table lists the value P, the decision based on the MEWMA control chart and, once a shift is triggered, the seven ANN outputs for N, US (1,0), US (0,1), US (1,1), DS (1,0), DS (0,1) and DS (1,1); the maximum ANN output determines the pattern category.

In the first 40 samples, the scheme correctly recognized the bivariate process data streams as in-control patterns (N). In this case, it was effective in identifying the bivariate in-control process without triggering any false alarm. Bluntness of the cutting tool begins at sample 41, and the scheme correctly recognized the bivariate process data streams as Down-Shift patterns (DS (1,1)) starting from sample 44 (at window range 21-44). In the overall diagnosis aspect, the scheme was observed to be effective in identifying the sources of variation in mean shifts without mistake.

The second case study involves loading error.
Similar to the first case study, the mean (μ) and the standard deviation (σ) of the bivariate in-control process were computed based on the first 24 observation samples. The loading error exists between samples 40-50. The validation results and the related outputs of the scheme are summarized in Tables 8 and 9 respectively.

Based on the first 39 samples, the scheme correctly recognized the bivariate process data streams as in-control patterns (N). In this situation, the process was running smoothly without false alarms. An improper condition of the pneumatic chuck and robotic arm caused a loading error between samples 40-50. In this situation, the scheme was able to correctly recognize the bivariate process data streams as Up-Shift patterns (US (1,1)) starting from sample 40 (at window range 17-40). In the overall diagnosis aspect, the scheme was capable of correctly identifying the sources of variation in mean shifts without mistake.

Table 8. Inspection results based on the loading error case.

i | Xi-1 (ID1) | Xi-2 (ID2) | Zi-1 (ID1) | Zi-2 (ID2) | Window range | Monitoring-diagnosis decision
1 | 7.9416 | 7.9420 | −0.2856 | −0.5099 | |
2 | 7.9412 | 7.9428 | −1.1424 | 1.3727 | |
3 | 7.9420 | 7.9424 | 0.5712 | 0.4314 | |
4 | 7.9412 | 7.9416 | −1.1424 | −1.4512 | |
5 | 7.9420 | 7.9428 | 0.5712 | 1.3727 | |
6 | 7.9412 | 7.9420 | −1.1424 | −0.5099 | |
7 | 7.9412 | 7.9416 | −1.1424 | −1.4512 | |
8 | 7.9416 | 7.9424 | −0.2856 | 0.4314 | |
9 | 7.9424 | 7.9420 | 1.4279 | −0.5099 | |
10 | 7.9416 | 7.9420 | −0.2856 | −0.5099 | |
11 | 7.9412 | 7.9416 | −1.1424 | −1.4512 | |
12 | 7.9424 | 7.9428 | 1.4279 | 1.3727 | |
13 | 7.9420 | 7.9424 | 0.5712 | 0.4314 | |
14 | 7.9416 | 7.9424 | −0.2856 | 0.4314 | |
15 | 7.9412 | 7.9416 | −1.1424 | −1.4512 | |
16 | 7.9424 | 7.9428 | 1.4279 | 1.3727 | |
17 | 7.9416 | 7.9420 | −0.2856 | −0.5099 | |
18 | 7.9420 | 7.9424 | 0.5712 | 0.4314 | |
19 | 7.9412 | 7.9420 | −1.1424 | −0.5099 | |
20 | 7.9424 | 7.9424 | 1.4279 | 0.4314 | |
21 | 7.9420 | 7.9424 | 0.5712 | 0.4314 | |
22 | 7.9420 | 7.9424 | 0.5712 | 0.4314 | |
23 | 7.9412 | 7.9416 | −1.1424 | −1.4512 | |
24 | 7.9424 | 7.9428 | 1.4279 | 1.3727 | 1-24 | N
25 | 7.9420 | 7.9424 | 0.5712 | 0.4314 | 2-25 | N
26 | 7.9412 | 7.9416 | −1.1424 | −1.4512 | 3-26 | N
27 | 7.9424 | 7.9420 | 1.4279 | −0.5099 | 4-27 | N
28 | 7.9424 | 7.9428 | 1.4279 | 1.3727 | 5-28 | N
29 | 7.9412 | 7.9420 | −1.1424 | −0.5099 | 6-29 | N
30 | 7.9420 | 7.9428 | 0.5712 | 1.3727 | 7-30 | N
31 | 7.9428 | 7.9424 | 2.2847 | 0.4314 | 8-31 | N
32 | 7.9420 | 7.9424 | 0.5712 | 0.4314 | 9-32 | N
33 | 7.9412 | 7.9420 | −1.1424 | −0.5099 | 10-33 | N
34 | 7.9420 | 7.9428 | 0.5712 | 1.3727 | 11-34 | N
35 | 7.9428 | 7.9424 | 2.2847 | 0.4314 | 12-35 | N
36 | 7.9416 | 7.9424 | −0.2856 | 0.4314 | 13-36 | N
37 | 7.9424 | 7.9428 | 1.4279 | 1.3727 | 14-37 | N
38 | 7.9416 | 7.9420 | −0.2856 | −0.5099 | 15-38 | N
39 | 7.9428 | 7.9424 | 2.2847 | 0.4314 | 16-39 | N
40 | 7.9428 | 7.9432 | 2.2847 | 2.3140 | 17-40 | US (1,1)
41 | 7.9432 | 7.9428 | 3.1415 | 1.3727 | 18-41 | US (1,1)
42 | 7.9436 | 7.9432 | 3.9982 | 2.3140 | 19-42 | US (1,1)
43 | 7.9428 | 7.9432 | 2.2847 | 2.3140 | 20-43 | US (1,1)
44 | 7.9432 | 7.9428 | 3.1415 | 1.3727 | 21-44 | US (1,1)
45 | 7.9436 | 7.9432 | 3.9982 | 2.3140 | 22-45 | US (1,1)
46 | 7.9428 | 7.9432 | 2.2847 | 2.3140 | 23-46 | US (1,1)
47 | 7.9432 | 7.9428 | 3.1415 | 1.3727 | 24-47 | US (1,1)
48 | 7.9428 | 7.9436 | 2.2847 | 3.2553 | 25-48 | US (1,1)
49 | 7.9428 | 7.9432 | 2.2847 | 2.3140 | 26-49 | US (1,1)
50 | 7.9436 | 7.9432 | 3.9982 | 2.3140 | 27-50 | US (1,1)

(μ1, μ2) = (7.9417, 7.9422); (σ1, σ2) = (4.6687 × 10⁻⁴, 4.2495 × 10⁻⁴). Note: observation samples 40-50 represent the out-of-control process.
Table 9. Outputs of the scheme for the loading error case. For each recognition window (RW), the table lists the value P, the decision based on the MEWMA control chart and, where given, the seven ANN outputs (N, US (1,0), US (0,1), US (1,1), DS (1,0), DS (0,1), DS (1,1)); the maximum ANN output determines the pattern category.

RW | 1-24 | 2-25 | 3-26 | 4-27 | 5-28 | 6-29 | 7-30 | 8-31 | 9-32
P | 0.6896 | 0.6910 | 0.8333 | 0.7723 | 0.7733 | 0.7822 | 0.7733 | 0.7234 | 0.7407
MEWMA decision | N | N | N | N | N | N | N | N | N

RW | 10-33 | 11-34 | 12-35 | 13-36 | 14-37 | 15-38 | 16-39 | 17-40 | 18-41
P | 0.7924 | 0.7753 | 0.7202 | 0.6941 | 0.7075 | 0.7254 | 0.6693 | 0.6973 | 0.7040
MEWMA decision | N | N | N | N | N | N | N | |
N | | | | | | | | 0.8220 | 0.4510
US (1,0) | | | | | | | | 0.5067 | 0.6528
US (0,1) | | | | | | | | 0.1665 | 0.1370
US (1,1) | | | | | | | | 0.9479 | 1.1855
DS (1,0) | | | | | | | | 0.0985 | 0.0719
DS (0,1) | | | | | | | | 0.1533 | 0.1453
DS (1,1) | | | | | | | | 0.1257 | 0.1129

RW | 19-42 | 20-43 | 21-44 | 22-45 | 23-46 | 24-47 | 25-48 | 26-49 | 27-50
P | 0.7546 | 0.7504 | 0.7561 | 0.7815 | 0.7806 | 0.7486 | 0.7258 | 0.7228 | 0.6886
N | 0.1775 | 0.1609 | 0.1053 | 0.0434 | 0.0403 | 0.0376 | 0.0313 | 0.0318 | 0.0200
US (1,0) | 0.7116 | 0.4926 | 0.5886 | 0.6606 | 0.4464 | 0.5521 | 0.3923 | 0.2824 | 0.3099
US (0,1) | 0.1184 | 0.1124 | 0.1317 | 0.1724 | 0.1631 | 0.1540 | 0.1762 | 0.1711 | 0.2213
US (1,1) | 1.4012 | 1.6147 | 1.5717 | 1.5235 | 1.6681 | 1.5983 | 1.7006 | 1.7666 | 1.7163
DS (1,0) | 0.0632 | 0.0852 | 0.0817 | 0.0766 | 0.0802 | 0.0970 | 0.1002 | 0.1142 | 0.1221
DS (0,1) | 0.1537 | 0.1508 | 0.1350 | 0.1830 | 0.2023 | 0.1735 | 0.1754 | 0.2060 | 0.2468
DS (1,1) | 0.1304 | 0.1144 | 0.0875 | 0.0636 | 0.0597 | 0.0384 | 0.0434 | 0.0493 | 0.0283

Note: The maximum of the seven ANN outputs in each window determines the pattern category.

6. Conclusions

This paper proposed a two-stage monitoring approach for the monitoring and diagnosis of bivariate process variation in mean shifts. Based on the framework of the 2S-IMS, which integrates the strengths of the MEWMA control chart and the Synergistic-ANN recognizer, the scheme achieved a smaller false alarm rate (ARL0 = 335.01-543.93), rapid shift detection (ARL1 = 3.18-16.75) and accurate diagnosis capability (RA = 89.5-98.5%) compared with the traditional SPC charting schemes for BQC. Since the monitoring and diagnosis performances were evaluated using modelling data, real industrial data were used for the purpose of validation. The case studies involved tool bluntness and loading error in machining operations, whereby the proposed scheme showed an effective monitoring capability in identifying the bivariate in-control process without any false alarm. The scheme was also effective in the diagnosis aspect, that is, in correctly identifying the sources of mean shifts when the process became out-of-control. Based on these promising results, the 2S-IMS could be a reference in realizing balanced monitoring and accurate diagnosis of bivariate process variation. In future work, the investigation will be extended to other unnatural patterns such as trends and cyclic patterns.

Acknowledgements

The authors would like to thank Universiti Tun Hussein Onn Malaysia (UTHM), Universiti Teknologi Malaysia (UTM), and the Ministry of Higher Education (MOHE) of Malaysia for sponsoring this work.

References

Al-Assaf, Y. (2004). Recognition of control chart patterns using multi-resolution wavelets analysis and neural networks. Computers and Industrial Engineering, 47, 17-29.
Alt, F. B. (1985). Multivariate quality control. In N. L. Johnson & S. Kotz (Eds.), Encyclopedia of statistical sciences (Vol. 6). New York: Wiley.
Assaleh, K., & Al-Assaf, Y. (2005). Features extraction and analysis for classifying causable patterns in control charts. Computers and Industrial Engineering, 49, 168-181.
Bersimis, S., Psarakis, S., & Panaretos, J. (2007). Multivariate statistical process control charts: An overview. Quality and Reliability Engineering International, 23, 517-543.
Brunzell, H., & Eriksson, J. (2000).
Feature reduction for classification of multidimensional data. Pattern Recognition, 33(12), 1741-1748.
Chen, Z., Lu, S., & Lam, S. (2007). A hybrid system for SPC concurrent pattern recognition. Advanced Engineering Informatics, 21, 303-310.
Chen, L. H., & Wang, T. Y. (2004). Artificial neural networks to classify mean shifts from multivariate χ2 chart signals. Computers and Industrial Engineering, 47, 195-205.
Cheng, C. S., & Cheng, H. P. (2008). Identifying the source of variance shifts in the multivariate process using neural networks and support vector machines. Expert Systems with Applications, 35, 198-206.
Chih, W. H., & Rollier, D. A. (1994). Diagnosis characteristics for bivariate pattern recognition scheme in SPC. International Journal of Quality and Reliability Management, 11(1), 53-66.
Chih, W. H., & Rollier, D. A. (1995). A methodology of pattern recognition schemes for two variables in SPC. International Journal of Quality and Reliability Management, 12(3), 86-107.
Crosier, R. B. (1988). Multivariate generalizations of cumulative sum quality control schemes. Technometrics, 30(3), 291-303.
El-Midany, T. T., El-Baz, M. A., & Abd-Elwahed, M. S. (2010). A proposed framework for control chart pattern recognition in multivariate process using artificial neural networks. Expert Systems with Applications, 37, 1035-1042.
Fuchs, C., & Benjamini, Y. (1994). Multivariate profile charts for statistical process control. Technometrics, 36(2), 182-195.
Gauri, S. K., & Chakraborty, S. (2006). Feature-based recognition of control chart patterns. Computers and Industrial Engineering, 51, 726-742.
Gauri, S. K., & Chakraborty, S. (2008). Improved recognition of control chart patterns using artificial neural networks. International Journal of Advanced Manufacturing Technology, 36, 1191-1201.
Guh, R. S. (2007). On-line identification and quantification of mean shifts in bivariate processes using a NN-based approach. Quality and Reliability Engineering International, 23, 367-385.
Guh, R. S. (2010). Simultaneous process mean and variance monitoring using artificial neural networks. Computers and Industrial Engineering, 58, 739-753.
Guh, R. S., & Shiue, Y. R. (2005). On-line identification of control chart patterns using self-organizing approaches. International Journal of Production Research, 43(6), 1225-1254.
Hachicha, W., & Ghorbel, A. (2012). A survey of control-chart pattern-recognition literature (1991-2010) based on a new conceptual classification scheme. Computers and Industrial Engineering, 63, 204-222.
Hassan, A., Nabi Baksh, M. S., Shaharoun, M. A., & Jamaludin, H. (2003). Improved SPC chart pattern recognition using statistical features. International Journal of Production Research, 41(7), 1587-1603.
Hotelling, H. (1947). Multivariate quality control. In C. Eisenhart, M. W. Hastay, & W. A. Wallis (Eds.), Techniques of statistical analysis. New York: McGraw-Hill.
Jackson, J. E. (1991). A user's guide to principal components. New Jersey: Wiley.
Klosgen, W., & Zytkow, J. M. (2002). Handbook of data mining and knowledge discovery.
London: Oxford University Press.