An Expert System for Automatic Query Reformulation

Susan Gauch* and John B. Smith+

*To whom correspondence should be sent.
Biological Knowledge Laboratory, College of Computer Science, Northeastern University, Boston, MA 02115
sgauch@flora.ccs.northeastern.edu
(617) 437-3850

+Department of Computer Science, University of North Carolina, Chapel Hill, NC 27599-3175

This work was supported in part by ONR contract N00014-86-K-0680.

ABSTRACT

Unfamiliarity with search tactics creates difficulties for many users of online retrieval systems. User observations indicate that even experienced searchers use vocabulary incorrectly and rarely reformulate their queries. To address these problems, an expert system for online search assistance was developed. This prototype automatically reformulates queries to improve the search results, and ranks the retrieved passages to speed the identification of relevant information. Users' search performance with the expert system was compared with their performance searching on their own and with their performance using an online thesaurus. The following conclusions were reached:
1) The expert system significantly reduced the number of queries necessary to find relevant passages compared with the user searching alone or with the thesaurus.
2) The expert system produced marginally significant improvements in precision compared with the user searching on their own. There was no significant difference in the recall achieved by the three system configurations.
3) Overall, the expert system ranked relevant passages above irrelevant passages.

1 INTRODUCTION

1.1 Driving Problem

Technology to produce, store, and distribute massive quantities of electronic information has matured. Textbases, online full-text databases, are being created in many fields. Personal workstations have become common. To make use of the information accessible from their desks, end-users need technology that allows them to search effectively.

One user study found that whereas system mechanics are rarely a problem for any but very inexperienced and infrequent users, even experienced searchers have significant problems with search strategy and output performance (Borgman, 1986a). Another found that experienced searchers lost sight of the search logic, missed obvious synonyms, and searched too simply (Fenichel, 1981). In spite of low recall, half of the searchers never modified the original query in an attempt to improve their results.

Studies of inexperienced searchers find even more problems with search strategy. In one study, a quarter of the subjects were unable to pass a benchmark test of minimum searching skill (Borgman, 1986b). In an experiment contrasting the searching of novices versus experienced searchers, the novices found some relevant documents easily, but they failed to achieve high recall and were unable to reformulate queries well (Oldroyd, 1984). The experienced searchers in this study were more persistent and willing to experiment than the novices.

Blair and Maron (1985) paint an even bleaker picture for searching full-text databases. Legal assistants searching a legal database achieved only 20% recall, although they were attempting to do a high recall search. The factors, as identified by the authors, leading to this poor performance were poor searching technique (failure to use stemming and synonyms), stopping the query iteration too soon, and the inability to search on inter-document relationships.
The authors argued that vocabulary problems make high recall impossible on full-text databases.

1.2 Related Work

In our system, queries are reformulated by a knowledge-based online search assistant acting as the front-end to existing retrieval systems. Research in this area is summarized in Section 1.2.1. For a more detailed discussion, see (Gauch, 1992). The knowledge base for our system is built on existing searching practice. Current knowledge on good search technique is presented in Section 1.2.2.

1.2.1 Expert Systems

CANSEARCH (Pollitt, 1984; Pollitt, 1987) is one of the earliest expert systems for bibliographic retrieval. It is designed to enable doctors to search the MEDLINE medical database for cancer literature. The expert system contains knowledge of a single domain, cancer, rather than search strategies in general. During the query reformulation process, the expert system guides the searcher through a hierarchy of menus.

IR-NLI II (Brajnik, Guida, & Tasso, 1988) incorporates user modeling into a domain-independent bibliographic retrieval expert system. Domain knowledge is supplied separately by an online thesaurus. A user model is built based on the user's amount of domain knowledge and search experience. This model is used to tailor the dialogue between the system and the user. Initially, the user lists some terms which describe his interests. The expert system, through a lengthy dialogue, clarifies its model of the query, proposes terms to expand the query, and comments on the user's search strategy. No automatic query reformulation is done.

Shoval (1985) developed an expert system to assist users in selecting the right vocabulary terms for a database search. The knowledge base of words, concepts, and phrases and their semantic relationships is stored in a semantic network. Decision rules, based on common search practice, are used to locate candidate vocabulary terms in the semantic network and suggest them to the user for possible query expansion. The user then decides whether or not the candidate terms are relevant and should be used to replace the terms in the nodes which point to them.

PLEXUS (Vickery & Brooks, 1987) is an expert system to help novice users find information about gardening. The initial query formation consists of a dialogue with the user. Natural language queries are accepted, and information is extracted to fill in frames. The system has a knowledge base of search strategies and term classifications similar to a thesaurus. Most of the domain knowledge is in the classification, but some appears in the rule base itself. If queries are too broad (defined as more than 10 references), no narrowing is attempted. The references are displayed 5 at a time to the user. If the query is too narrow (defined as nothing retrieved at all), three strategies are attempted: 1) if two or more terms appear in the same subcategory, OR them together rather than AND; 2) drop one of the terms; 3) replace a term by its parent.

IOTA (Chiaramella & Defude, 1987) is an expert system which incorporates a natural language interface. Passages are retrieved from an online book based on keywords which index each passage. Much of the research effort has gone into processing the user's queries, but some simple query reformulation is also done. Specifically, queries are broadened by replacing a term by its parent from an online thesaurus and narrowed by removing OR terms.
Their results, based on a small-scale experiment, indicate an increase in precision and recall using the expert system.

I3R (Croft & Thompson, 1987) incorporates user modeling and relevance feedback. The query formation process is a dialogue between the user and the system, during which the user supplies a short natural language query or an initial relevant document. The domain knowledge expert infers related concepts from the query and presents them to the user for confirmation. If the thesaurus-like knowledge base does not contain related information and the initial query contained too many high-frequency terms, the user may be asked to provide additional keywords. A ranked list of documents is presented to the user. The user then indicates which terms in each document are interesting. These new terms may be used to modify the query.

EP-X (Krawczak et al, 1987; Smith et al, 1989) is a prototype knowledge-based system that assists users in conducting bibliographic searches of the environmental pollution literature. This system makes extensive use of domain knowledge, represented as hierarchically-defined semantic primitives and frames. The user enters a query as a list of keywords and the system interacts with him to suggest possible broadening or narrowing operations. In spite of the rich domain knowledge, the final search results were mediocre. In particular, users did not take advantage of the available query refinement strategies when they should have.

Fewer projects have attempted to provide intelligent assistance for full-text searching. One such system is RUBRIC (McCune et al, 1985; Tong et al, 1987), which has the user describe his query in terms of rules. These rules describe the domain knowledge for the system as a hierarchy of topics and subtopics. Rules may have weights representing the certainty and/or importance of the defined relationships. The lowest level subtopics define patterns in the text which indicate the presence of that subtopic. Although the query process is very powerful, it places a heavy burden on the user.

Ongoing projects are developing models on which future expert systems will be based (Belkin & Marchetti, 1990; Chen, 1990). Recent projects focus on incorporating other aspects of artificial intelligence, particularly natural language processing (Jacobs & Rau, 1990) and probabilistic inference over networks of documents (Croft & Turtle, 1992).

1.2.2 Search Strategies

The automatic query reformulation incorporated in the systems described in the previous section is, in general, very primitive. However, search strategies employed by both novice and experienced searchers have been widely studied. These studies formed the basis of our expert system's searching knowledge base, which is described in detail in Section 2.5.

Searching Studies

Bates (1979) has compiled a thorough catalogue of search tactics. She outlines 29 search tactics in four areas: monitoring, file structure, search formulation, and term manipulation. The tactics for search formulation and term manipulation describe the available techniques to broaden and narrow queries. The search formulation tactics include the selection of appropriate initial search terms and the manipulation of query structure; the term manipulation tactics describe the use of context, thesaural terms, and stemming to modify queries. The tactics she lists provide the basic operations for our expert system; however, she includes no guidelines as to when each tactic is appropriate.
By analyzing discourses between an expert intermediary and 17 real information seekers, Smith et al (1989) identified a set of search tactics. They noted when each of these tactics was applied, and whether the intermediary used the tactic spontaneously or in response to some cue in the retrieved document. They concluded that the intermediary makes extensive use of domain knowledge to suggest topic refinements, generates most knowledge-based suggestions spontaneously, and rarely changes logical operators. The results of this study are being used as the basis for EP-X, an online search intermediary (see Section 1.2.1).

Based on observation of 47 professional online searchers, Fidel (1991) has developed a formal decision tree that represents the intuitive rules searchers use when they select search terms. The options available to the searchers are enumerated, as are the conditions under which each option is selected. The author defines rules for deciding when to use textwords or descriptors or both to search indexed databases, and guidelines for including thesaural relationships. In contrast to the tactics proposed by Smith et al (1989), the emphasis is on identifying techniques which are domain independent.

Effects of Query Expansion

A study of the effects of query expansion on retrieval performance found that automatically adding terms based on their statistical relationships to the user's search terms degrades retrieval performance (Smeaton & van Rijsbergen, 1983). Clearly, a better criterion for selecting terms to add is needed. Another study (Harman, 1988) also showed performance degradation when adding terms from a statistically constructed thesaurus. However, when only those thesaural terms which occur in relevant documents are added, retrieval performance improves over that achieved by the original query. The best performance is achieved when user filtering of three types of candidate terms (thesaural, term variants, and terms statistically selected from relevant documents) is simulated.

2 SYSTEM ARCHITECTURE

The prototype system consists of five major components (see Figure 1):
1) MICROARRAS (Smith, Weiss, & Ferguson, 1987), which serves as the full-text search and retrieval engine;
2) a full-text database of over 188,000 words;
3) a hierarchical thesaurus of approximately 7,424 words specific to the textbase's domain;
4) an expert system of 85 OPS83 rules and over 5,000 lines of C code, which interprets the user's queries, controls the search process, analyzes the retrieved text, and ranks the search results;
5) a user interface, which accepts the user's queries, presents requests for information from the expert system, and displays the search results.

[Figure 1. System Architecture. Components: User, User Interface, Expert System, MICROARRAS, Textbase, Thesaurus.]

The system is implemented on a Sun 3 workstation. MICROARRAS and the thesaurus construction and access routines are written in the C language. The expert system consists of a knowledge base of production rules, written in OPS83, and a set of C language functions to carry out the actions prescribed by the rule-base. The textual database for the current demonstration project consists of an unpublished manuscript on computer architecture written by Gerrit A. Blaauw and Frederick P. Brooks, Jr. (1986).

The search process consists of a dialogue between the user and the expert system. The user enters the initial Boolean query and the number of passages (i.e. paragraphs) he would like to retrieve.
The expert system parses the query and translates it into a request for information from MICROARRAS. MICROARRAS retrieves text passages from the full-text database and informs the expert system of the number of passages that satisfy the request. The expert system compares the number retrieved with the target number to decide whether or not to reformulate the query, and, if so, how. Once the target number has been reached, or the expert system has run out of reformulations to try, the retrieved passages are presented to the user in rank-order.

A major advantage of this architecture is the separation of strategic knowledge, contained in the knowledge base for the expert system, from domain knowledge, contained in the thesaurus. Now that the search strategy rules have been developed and tested with the existing textbase, the expert system can be tested with other content domains by simply providing a suitable thesaurus for the new textbase.

2.1 MICROARRAS

MICROARRAS is a full-text retrieval and analysis system. The system provides immediate access to any passage in the textbase, regardless of the length of that document. Contexts for searches can be indicated in terms of words, sentences, paragraphs, etc., for the entire search expression or for different parts of it. To be inserted into MICROARRAS' textbase, documents must first be inverted (i.e. a dictionary is created with an entry for each word in the text; each entry contains the word and the numerical position in the text of each occurrence of that word). However, they require no semantic preprocessing. Once stored in the textbase, they can be examined individually or in groups. They can also be moved from one textbase to another. Thus, documents can be processed on a workstation or microcomputer, uploaded into a textbase on a mainframe or textbase server, searched and analyzed there, or downloaded for local use once again.

2.2 Textbase

The textbase contains the Fall, 1986 draft of Computer Architecture, Volume 1 - Design Decisions by Blaauw and Brooks. The manuscript consists of 188,278 words comprising 8 chapters, titled: "Introduction", "Machine Language", "Addresses", "Data", "Operations", "Instruction Sequence", "Supervision", and "Input/Output". TeX format marks were already present and were used in displaying the retrieved text (line, italics, label), as well as to provide structural information (chapter, section, subsection, subsubsection, paragraph, sentence, item).

2.3 Thesaurus

All domain-specific knowledge is contained in a hierarchical thesaurus. The expert system uses this information to reformulate queries. The thesaurus was built by the author from the Blaauw and Brooks text, and it strongly reflects the word usage of that textbase. In general, it should not be necessary to provide a unique thesaurus for each textbase. An existing thesaurus for the domain could be used, as long as there is a good match between thesaurus classes and textbase word usage.

Word types which share a common stem are grouped into stemgroups. The members of a given stemgroup are called stemwords. Each word type in the Blaauw and Brooks text appears in exactly one stemgroup. Thesaurus classes contain stemgroups which are synonyms for each other. Stemgroups may appear in zero, one, or more than one thesaurus class. Because the thesaurus classes are linked together with parent-child links, they are also referred to as nodes.
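The organization just described — word types grouped into stemgroups, stemgroups gathered into synonym classes, and classes linked into a hierarchy — can be pictured with a short sketch. The C fragment below is our illustration only; the type and field names are hypothetical and are not taken from the system's thesaurus code.

/* Hedged sketch of the thesaurus organization described above.
 * All type and field names are illustrative, not the system's actual code. */
#include <stdio.h>

#define MAX_STEMWORDS 8
#define MAX_MEMBERS   4
#define MAX_CHILDREN  4

/* A stemgroup collects the word types that share a common stem. */
typedef struct {
    const char *stemname;                   /* e.g. "Boundary"               */
    const char *stemwords[MAX_STEMWORDS];   /* e.g. "boundary", "boundaries" */
    int         nwords;
} StemGroup;

/* A thesaurus class (node) groups synonymous stemgroups and is linked to its
 * parent and children, forming the hierarchy used for query reformulation.  */
typedef struct ThesaurusClass {
    StemGroup              *members[MAX_MEMBERS];   /* synonymous stemgroups */
    int                     nmembers;
    struct ThesaurusClass  *parent;                  /* NULL at the root      */
    struct ThesaurusClass  *children[MAX_CHILDREN];
    int                     nchildren;
} ThesaurusClass;

int main(void)
{
    /* Build the Boundary/Limit pairing used later in the sample scenario;
     * the particular stemwords shown here are assumptions.                  */
    StemGroup boundary = { "Boundary", { "boundary", "boundaries" }, 2 };
    StemGroup limit    = { "Limit",    { "limit", "limits" },        2 };

    ThesaurusClass node = { { &boundary, &limit }, 2, NULL, { NULL }, 0 };

    printf("class with %d synonymous stemgroups; first stemname: %s\n",
           node.nmembers, node.members[0]->stemname);
    return 0;
}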
The arrangement of the words into stemgroups, stemgroups into thesaurus classes, and the classes into a hierarchy is discussed in (Gauch, 1991). To capture the relationships between thesaurus classes for use by the expert system and the user, high-frequency terms are included in the thesaurus.

2.4 Query Language

A Boolean query language was chosen because it is the most common type available on existing systems. The operators provided, in decreasing order of operator precedence, are: ANDNOT, AND, and OR. A default context of one sentence is used for the AND and ANDNOT operators.

When a query is parsed, the expert system interprets each search term to represent a unique concept. The concepts, and the operators, are flagged as positive or negative based on whether they are specifying information the user does, or does not, wish to receive. For example, the query 'i/o ANDNOT (device OR interrupt)' contains three concepts: i/o, device, and interrupt. I/O is a concept on which the user wishes information, so it is considered a positive concept. Device and interrupt indicate concepts on which the user does not wish information, so they are considered negative concepts. The ANDNOT and OR operators are followed by negative concepts, so they too are flagged as negative.

When the user is searching with the expert system, the expert system controls the context. Initially, the default of one sentence is used, but the expert system may adjust the context during query reformulation. However, when the user is searching without the expert system, the AND and ANDNOT operators may be augmented with a user-specified context. The user may define the search context for AND or ANDNOT in terms of words, sentences, and/or paragraphs.

2.5 Knowledge Base

The expert system performs three main functions: 1) it controls the operation of the system as a whole; 2) it reformulates the Boolean query based on previous search results; 3) it ranks the retrieved passages in decreasing order of estimated relevance for presentation to the user. To perform these functions the expert system contains a knowledge base of the search process, search strategies, and passage ranking procedures. Domain knowledge is contained in the hierarchically structured thesaurus. The system has no knowledge of the user's true information needs, other than the target number they specify to indicate how many passages they wish to retrieve.

2.5.1 Query Reformulation

Queries are reformulated based on the target number, the number of passages retrieved, and the history of broadening and narrowing techniques already applied. The expert system has a collection of reformulation tactics at its disposal. Bates (1979) and others have identified successful search tactics and Fidel (1991) discusses when to use free-text terms versus descriptors. However, no one has outlined an overall query reformulation strategy for free-text searching. The guiding principles for the expert system's query reformulation knowledge base were:
1) each search term in the initial query represents one concept on which the user does, or explicitly does not, want information (see the sketch below);
2) the user's initial search terms are the best indication of the user's areas of interest;
3) some terms from the thesaurus may be helpful, but others will not be;
4) the expert system should never discard concepts in which the user has indicated an interest.
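Principle 1 above — each search term stands for a concept that is either wanted or explicitly unwanted — can be made concrete with a small sketch of the positive/negative flagging described in Section 2.4 for the query 'i/o ANDNOT (device OR interrupt)'. The fragment is ours; the type names and the flat concept list are hypothetical simplifications (the real system also keeps the Boolean parse tree).

/* Hedged sketch: concepts of a parsed query flagged positive or negative.
 * Names are illustrative only, not taken from the OPS83/C implementation.  */
#include <stdio.h>

typedef enum { POSITIVE, NEGATIVE } Polarity;

typedef struct {
    const char *term;       /* the user's original search term           */
    Polarity    polarity;   /* does the user want this concept, or not?  */
} Concept;

int main(void)
{
    /* 'i/o ANDNOT (device OR interrupt)': i/o is wanted; device and
     * interrupt (and the ANDNOT and OR joining them) are negative.       */
    Concept query[] = {
        { "i/o",       POSITIVE },
        { "device",    NEGATIVE },
        { "interrupt", NEGATIVE },
    };
    int n = (int)(sizeof query / sizeof query[0]);

    for (int i = 0; i < n; i++)
        printf("%-10s %s\n", query[i].term,
               query[i].polarity == POSITIVE ? "positive" : "negative");
    return 0;
}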
Query Reformulation Techniques

The expert system reformulates queries using three different techniques: 1) expanding concepts; 2) adjusting context; and 3) changing the query structure.

Expanding Concepts

To broaden a query, search terms are added to the positive concepts, whereas narrowing a query adds search terms to negative concepts. Concepts may be expanded by stemming, adding synonyms, and adding related search terms from the thesaurus. The order in which the terms are added from the thesaurus is: parents, then siblings, then children.

Replacing a term with its parent to broaden a query is a common practice, both by searchers (Bates, 1979; Salton, 1986), and in systems which automatically reformulate queries (Chiaramella & Defude, 1987; Vickery & Brooks, 1987). The rationale, based on experiences with keyworded bibliographic systems, is that since parent terms represent broader concepts, adding the parent term should broaden the scope of the query. In full-text databases, we believe that the reverse order may make more sense. Broadening a concept containing virtual memory with children terms, yielding 'virtual_memory OR paging OR segmentation', seems more likely to retrieve relevant passages than broadening with the parent terms, yielding 'virtual_memory OR memory'.

Crouch (1988) found that augmenting a query with thesaurus terms, rather than replacing the original search terms, led to improved results. With this in mind, concepts are expanded by adding thesaural terms (ORing them with the terms already in the concept) rather than by replacing the terms already present. The belief that some stemgroups from the thesaurus will be useful, while others will not, is the basis for providing user filtering of the candidate thesaurus terms. One expert system uses domain-dependent search strategies to choose the appropriate terms from a thesaurus (Smith et al, 1989). In addition, Harman (1988) showed that search results improved when thesaural terms were filtered by the user. Based on these two studies, we decided to allow the users to select which stemgroups to add from a set of thesaural candidates. Finally, candidate search terms selected from the thesaurus are filtered to remove those which already occur in the query and extremely high-frequency terms. The remaining terms are added one at a time, in reverse order of frequency, and the new number of retrieved passages is compared to the target number.

Adjusting Context

The expert system manipulates four different contexts; it adjusts the distance between words in positive and negative multiword phrases as well as the distance between positive and negative search concepts. The expert system broadens queries by increasing the positive contexts and decreasing the negative ones. Narrowing is done by decreasing the positive contexts and increasing the negative ones.
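The expansion procedure described above — filter the candidate stemgroups, OR the survivors into a concept one at a time, and compare the new retrieval count to the target — is summarized in the sketch below. This is an illustration rather than the system's code: the candidate list, the frequency cutoff, and the count_passages stub are hypothetical, and the candidates are assumed to be pre-ordered as the rules prescribe.

/* Hedged sketch of concept expansion: candidates already in the query or with
 * extremely high textbase frequency are filtered out; the rest are ORed into
 * the concept one at a time until the passage count nears the target.
 * All names and numbers are illustrative stand-ins.                          */
#include <stdio.h>

typedef struct { const char *stemgroup; int textbase_freq; int in_query; } Candidate;

#define HIGH_FREQ_CUTOFF 500    /* hypothetical "extremely high frequency" cutoff */

/* Stub standing in for a real MICROARRAS evaluation of the expanded query. */
static int count_passages(int terms_added) { return 1 + 3 * terms_added; }

/* Success is defined as falling within 20% of the target number. */
static int close_enough(int retrieved, int target)
{
    return retrieved >= 0.8 * target && retrieved <= 1.2 * target;
}

int main(void)
{
    Candidate cand[] = {            /* e.g. synonyms/parents of one concept     */
        { "Limit",   40, 0 },
        { "Block",  120, 0 },
        { "Memory", 900, 0 },       /* filtered: extremely high frequency       */
        { "Word",    60, 1 },       /* filtered: already present in the query   */
    };
    int n = (int)(sizeof cand / sizeof cand[0]);
    int target = 15, added = 0, retrieved = count_passages(added);

    for (int i = 0; i < n && !close_enough(retrieved, target); i++) {
        if (cand[i].in_query || cand[i].textbase_freq > HIGH_FREQ_CUTOFF)
            continue;                          /* filtering step                      */
        added++;                               /* OR this stemgroup into the concept  */
        retrieved = count_passages(added);     /* re-evaluate against the textbase    */
        printf("added %-8s -> %d passages\n", cand[i].stemgroup, retrieved);
    }
    printf("stopped with %d passages (target %d)\n", retrieved, target);
    return 0;
}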
Changing Query Structure

The final variable the expert system can manipulate is the query structure. The query can be broadened in two different ways: first, the positive AND operators can be switched to OR operators (and the negative OR operators switched to ANDs); second, the negative parts of the query can be dropped altogether. All of the AND operators are replaced at the same time. A better strategy would be to replace them one at a time, in inverse order of the frequency of occurrence of the concepts. Similarly, the query can be narrowed by replacing OR operators with ANDs. The expert system does not have enough information about the user's information needs to decide which positive parts of the query to drop, so this technique is not employed to narrow queries. Because manipulating query structure causes major changes to the user's original query, these techniques are only tried as a last resort. It is not likely that the new query will find passages that the user will find highly relevant, but the goal is to find somewhat relevant passages that users can read in order to reformulate their own queries and try again.

Flow of Control

[Figure 2. Query Reformulation Techniques. Starting from the initial query, the diagram traces the broadening path (positive stemwords, synonyms, increased context, parents, siblings, children, loosened operators, removal of negative parts) and the narrowing path (negative stemwords, synonyms, decreased context, parents, siblings, children, tightened operators), ending in success or failure. Legend: B = broaden, N = narrow, S = success, +ve = positive concepts, -ve = negative concepts.]

Figure 2 diagrams the flow of control among the reformulation techniques. The left side of Figure 2 diagrams the broadening techniques, the right side the narrowing techniques. This figure is somewhat simplified since it does not show the use of context to converge to the target number once queries have been found which bracket the target number from above and below. The expert system records the type of initial query reformulation as the global objective. If the reformulations in the original direction overshoot the target number without achieving success, reformulations in the opposite, or local, direction are tried, beginning at the top node on that side of the diagram. Reformulation never continues in the local direction farther than it reached in the global direction. At this point, queries have already been formed which bracket the target number from below and above, otherwise the system would not have tried both narrowing and broadening techniques. Rather than using techniques which are considered less likely to produce good results, the expert system adjusts the context.

Stopping

Knowing when to stop a search is a difficult problem. We partially side-step this problem by having the user explicitly state the number of passages he wishes to retrieve. Since the target number he supplies is likely to be a rough guess, any result within 20% of that number is considered successful. A larger range may be desirable, but since the user is able to stop the reformulation process himself, the size of the range is not important. Left on its own, the expert system stops the reformulation process when it achieves success, or it has run out of techniques to try.
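The flow of control and the stopping rule can be condensed into a small loop. The sketch below is ours, not the OPS83 rule base: the technique lists, the evaluate stub, and the numbers are hypothetical, and the switch to the opposite (local) direction after an overshoot is only noted in a comment.

/* Hedged sketch of the reformulation control loop: apply the next technique
 * in the current direction until the retrieval count falls within 20% of the
 * target or the techniques run out.  Everything here is an illustrative
 * stand-in for the behavior described above.                                 */
#include <stdio.h>

enum { N_TECHNIQUES = 5 };
static const char *broaden_seq[N_TECHNIQUES] = {
    "+ve stemwords", "+ve synonyms", "increase context", "+ve parents", "+ve children" };
static const char *narrow_seq[N_TECHNIQUES] = {
    "-ve stemwords", "-ve synonyms", "decrease context", "-ve parents", "-ve children" };

/* Stub standing in for a real MICROARRAS evaluation of the current query. */
static int evaluate(int step, int broadening)
{
    return broadening ? 3 + 4 * step : 40 - 9 * step;
}

static int success(int got, int target) { return got >= 0.8 * target && got <= 1.2 * target; }

int main(void)
{
    int target = 15;
    int retrieved = evaluate(0, 1);
    int broadening = retrieved < target;       /* the global objective          */
    int step = 0;

    while (!success(retrieved, target) && step < N_TECHNIQUES) {
        const char *tactic = broadening ? broaden_seq[step] : narrow_seq[step];
        retrieved = evaluate(++step, broadening);
        printf("applied %-16s -> %d passages\n", tactic, retrieved);
        /* An overshoot would switch to the opposite (local) direction, never
         * proceeding farther than the global direction reached; once the
         * target is bracketed, only the context is adjusted (omitted here).   */
    }
    if (success(retrieved, target))
        printf("success: %d passages (target %d)\n", retrieved, target);
    else
        printf("out of techniques: %d passages (target %d)\n", retrieved, target);
    return 0;
}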
Sample Scenario

The following sample scenario illustrates how the reformulation rules are applied. Since our current textbase concerns the domain of computer architecture, the example describes the interactions of the system with a user searching for information on the alignment of word boundaries in memory. The user might enter a query 'boundary AND word ANDNOT page', which indicates that he wishes to retrieve passages containing information on word boundaries but not page boundaries. Assume a target number of 15.

Applied to this textbase, the original query would retrieve only one passage, so the expert system would attempt to broaden the query. The first step would be to replace the word types boundary and word with their stemgroups. The resulting query would be 'Boundary AND Word ANDNOT page', where the capitalized search terms indicate the whole stemgroup is included. Notice that page has not been expanded to its stemgroup, as it is a negative, or excluded, concept. Four passages would now be retrieved. The next step would be to broaden the query by including synonym stemgroups for each of the positive search terms, in turn. From the thesaurus it is found that Boundary has one synonym, Limit; however, there is no synonym for Word. The query now becomes '(Boundary OR Limit) AND Word ANDNOT page', which retrieves seven passages. Relaxing the context around the AND operator to adjacent sentences while decreasing the context around the ANDNOT operator to within 5 words increases the number of passages retrieved to nine. To further broaden the query, the parent stemgroups for the positive concepts are added. Block and Segment are added to the concept Boundary. The Word concept remains unchanged, since Word has no parent in the thesaurus. The query becomes '(Boundary OR Limit OR Block OR Segment) AND Word ANDNOT page', which retrieves twelve passages. Twelve is within 20% of the fifteen passages requested, so the reformulation stops. If the user requests to see the retrieved passages, the expert system would rank the retrieved passages and present them to the user in decreasing rank-order.

2.5.2 Passage Ranking Rules

The dialogue between the expert system and MICROARRAS normally produces a set of passages to be displayed to the user. The last task performed by the expert system is to rank-order those passages in terms of their probable interest to the user. To do this, it performs an elementary content analysis on each passage and computes a weight representing probable interest.

Ranking algorithms for document retrieval systems have been extensively studied (Harman, 1986). There has been less work done on ranking for passage retrieval systems. The FAIR system (Chang & Chow, 1988) performs a simple ranking based on the distance between word pairs, the number of search terms represented, and the number of occurrences of the terms. Ro (1988) did an extensive comparison of different ranking algorithms for full text, but failed to demonstrate significant differences in performance. Al-Hawamdeh et al (1991) rank full-text paragraphs based on their similarity to a query expressed as an unstructured list of keywords, finding that nearest-neighbour searching is as effective as Boolean searching.
The ranking algorithm used by our expert system considers the following factors: the number of different concepts represented in the passage; the number of different word types for each concept; the relationship of the concept's word types to the user's original search terms; the number of occurrences for each word type from the search expression appearing in the passage; and the contextual distance between search terms. The passages are then ranked according to their respective index values and presented to the user in order of decreasing rank.

Calculating Passage Weights

The weight Wpq of passage p for query q, 0 <= Wpq <= 1, is a function of the weight Cip of each query concept i in p, the relationship between the concepts (determined by the parse tree), and the contextual closeness between the concepts. The concept weights are combined by applying the rules of fuzzy logic (Zadeh, 1965) to the Boolean structure of the query. Additionally, a closeness factor is associated with each of the AND and ANDNOT operators. The closeness factor for the AND operator is set to one of three values (1.0 for same sentence, 0.9 for adjacent sentences, 0.8 for same paragraph). The closer two positive concepts appear in the passage, the higher the weight that passage receives. Complementary closeness values are used for the ANDNOT operator (0.8 for same sentence, 0.9 for adjacent sentences, 1.0 for same paragraph).

Wp(Ci AND Cj) = min(Cip, Cjp) * PositiveCloseness    (1)
Wp(Ci OR Cj) = max(Cip, Cjp)    (2)
Wp(NOT Cj) = 1 - Cjp    (3)

From (1) and (3),

Wp(Ci ANDNOT Cj) = min(Cip, 1 - Cjp) * NegativeCloseness    (4)

The concept weights and closeness factors fall in the range [0,1]; therefore the passage weights also fall in the range [0,1].

Calculating Concept Weights

The weight of concept i in passage p, Cip, is a function of the weight of each concept term in query q (denoted Tjq for search term j), the weight of each concept term in the passage (denoted Tjp for search term j), and the number of search terms for the concept. The weight of a search term in the passage is multiplied by the weight of that search term in the query. Thus, the highest weight search terms are those which are important in the query as well as the passage. The weights for all the concept's search terms are summed together and normalized by the number of search terms for the concept, N.

Cip = (1/N) Σ (j = 1..N) Tjq * Tjp, where term j is in concept i    (5)

The term weights fall in the range [0,1]; therefore, the concept weights also fall in the range [0,1].

Calculating Term Weights

Two different term weights, T, are calculated: the weight of the search term i in query q, Tiq, and the weight of the search term i in passage p, Tip.

Query Term Weights

The weight of the search term i in query q, Tiq, reflects the relationship of the search term to the user's original term. The relationships, from closest to most remote, are: same word, stemgroup, synonym, parent, sibling, child. These distances reflect the order in which search terms are added to the concepts, which in turn reflects confidence in the closeness of the relation of the search term to the original term.

Tiq = 1.0 (word), 0.9 (stemgroup), 0.8 (synonym), 0.6 (parent), 0.5 (sibling), 0.4 (child)    (6)

The query term weights fall in the range [0,1] as required, with the original word receiving a weight of 1.0. Terms added by the expert system receive weights which decrease by 0.1 for every step away from the original term, except for the step from synonym to parent terms. This step decreases the term weight by 0.2, reflecting the large decrease in confidence which occurs when terms are added from outside the thesaurus class.

Passage Term Weights

The weight of the search term i in passage p, Tip, reflects the frequency of the search term in the passage, fip, and the frequency of the search term in the textbase, fit. An evaluation of several full-text ranking algorithms concluded that those based on relative document frequency provided acceptable performance (Ro, 1988). Since this is also simple to calculate, we chose relative frequency for the term passage weights.

Tip = fip / fit    (7)

The term passage weights fall in the range [0,1], as required.
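Equations (1) through (7) can be combined into a short calculation. The sketch below illustrates the weighting scheme but is not the system's code: the example query 'Boundary AND Word', the frequencies, and the closeness value are hypothetical numbers chosen only to exercise the formulas.

/* Hedged sketch of the passage-weighting scheme of equations (1)-(7) for a
 * two-concept query "Ci AND Cj".  All names and numbers are illustrative.   */
#include <stdio.h>

typedef enum { WORD, STEMGROUP, SYNONYM, PARENT, SIBLING, CHILD } Relation;

/* Equation (6): query term weight, by relation to the user's original term. */
static double query_term_weight(Relation r)
{
    static const double w[] = { 1.0, 0.9, 0.8, 0.6, 0.5, 0.4 };
    return w[r];
}

/* Equation (7): passage term weight = relative frequency fip / fit. */
static double passage_term_weight(int f_passage, int f_textbase)
{
    return (double)f_passage / (double)f_textbase;
}

typedef struct { Relation rel; int f_passage; int f_textbase; } Term;

/* Equation (5): concept weight = (1/N) * sum over the concept's terms of Tjq * Tjp. */
static double concept_weight(const Term *t, int n)
{
    double sum = 0.0;
    for (int j = 0; j < n; j++)
        sum += query_term_weight(t[j].rel) *
               passage_term_weight(t[j].f_passage, t[j].f_textbase);
    return sum / n;
}

static double min2(double a, double b) { return a < b ? a : b; }

int main(void)
{
    /* Concept i ("Boundary"): the original word plus a synonym added later. */
    Term ci[] = { { WORD, 3, 12 }, { SYNONYM, 1, 20 } };
    /* Concept j ("Word"): the stemgroup of the original term.               */
    Term cj[] = { { STEMGROUP, 4, 40 } };

    double Cip = concept_weight(ci, 2);
    double Cjp = concept_weight(cj, 1);

    /* Equation (1): AND combines concepts with min(), scaled by the closeness
     * factor (1.0 same sentence, 0.9 adjacent sentences, 0.8 same paragraph). */
    double positive_closeness = 0.9;            /* concepts in adjacent sentences */
    double Wpq = min2(Cip, Cjp) * positive_closeness;

    printf("Cip = %.3f, Cjp = %.3f, Wpq = %.3f\n", Cip, Cjp, Wpq);
    return 0;
}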
3 EVALUATION

Our primary goal is to demonstrate that using an expert system to reformulate queries can improve search performance for novice searchers. To evaluate the system, users queried the textbase using three interfaces with different capabilities: an interface whose only function was to accept contextual Boolean queries and display search results; a similar interface which also allowed the user to explore the online thesaurus; and a third which incorporated the searching expert system. Each subject's search performance with the three interfaces was monitored and compared.

3.1 Hypotheses

Hypothesis 1: The expert system improves the search effectiveness for a novice searcher.
Hypothesis 2: The expert system improves the search efficiency for a novice searcher.
Hypothesis 3: The expert system can rank the passages retrieved by the search in decreasing order of relevance.

The effectiveness of the retrieval output is evaluated by looking at recall (the number of relevant items found / the total number of relevant items in the database) and precision (the number of relevant items retrieved / the number of items retrieved). Two estimates of the number of relevant items retrieved are examined: the number of passages the users mark as relevant and the number of passages retrieved from the set of passages deemed relevant by the author. The efficiency of the systems is measured by the number of Boolean queries the subjects entered for each of several high-level questions, and by the amount of time they spent searching for relevant passages for each question. The ranking algorithm was evaluated by comparing the order of appearance of relevant passages after they have been ranked with a random order of appearance.

3.2 Method

3.2.1 Subjects

Twelve computer science graduate students participated as subjects in the study. All subjects were knowledgeable in the use of computers, but unfamiliar with online searching. Thus, they were representative of the anticipated users of future information retrieval systems.

3.2.2 Apparatus

Information Retrieval Systems

The user-alone configuration consisted of a Sun 3 running MICROARRAS and a rudimentary expert system. This expert system performed only the system control function, and did no query reformulation or ranking of retrieved passages. The user was prompted for a contextual Boolean query, this query was sent to MICROARRAS, and the number of passages retrieved was reported back to the user. The user could display the passages retrieved, if there were fewer than 25, or try another query.
Typing was minimized by using the Sun's windowing package to cut and paste the previous query, edit, and re-run it.

The user-thesaurus version consisted of a Sun 3 with one window running MICROARRAS, as in the user-alone system, and a second window running a thesaurus access function. In the thesaurus window the user had access to all the thesaurus information available to the expert system. He could find out the stemname for a specific word's stemgroup. For any stemname, he could ask for the stemnames of the corresponding synonym, parent, sibling, or child stemgroups. These stemnames could be used in the user's query to MICROARRAS. Typing was minimized by using the Sun's windowing package to cut the stemgroup from the thesaurus window and paste it into the appropriate concept of the query.

In the user-expert system version the user did not have access to the online thesaurus. Context and the addition of stemgroups were controlled by the expert system. Thus, the user entered a Boolean query and a target number of passages and the expert system reformulated the user's query to attempt to get close to the target number. The user was prompted to filter search terms found in the thesaurus, and to continue or abandon the current reformulation. To keep the response time approximately the same as for the other two configurations, it was necessary to run MICROARRAS remotely on the Sun 4 file server containing the textbase. The user worked with one window on a Sun 3 which ran the full version of the query reformulation expert system. The expert system communicated with MICROARRAS over the network. This setup was approximately twice as fast as when MICROARRAS was run on the user's Sun 3. This speedup was necessary, not because the expert system code itself was slow, but rather because the expert system tended to form very long queries involving many MICROARRAS categories, and MICROARRAS slows down linearly with the number of search terms in a query.

Questions

Three sets of five questions were devised. Each set contained one training question and four questions on which the subjects were monitored. The questions covered material ranging over the whole textbase. For the fifteen questions, the number of relevant passages in the textbase ranged from a low of 7 to a high of 23 with a mean of 15.4.

3.2.3 Procedure

Subjects were asked to try to find on the order of ten relevant passages from the textbase in response to the questions they would be given. They were informed that they might not always be able to find that many, and they were allowed to stop working on a query whenever they were satisfied that they had found as much as they could. The target number of ten was chosen because it was large enough to require a high recall search, yet small enough that the users would not become tired reading passages.

Each subject worked with each of the three systems, in turn. This was done to compensate for the large individual differences found in searching ability (Borgman, 1987). To compensate for learning during the experiment, the order of presentation of the three systems was counterbalanced among subjects. The order of presentation of the question sets was the same for all subjects (Set A first, then B, then C). Thus, each question set was searched on each system four times. The subjects received a training session with each system before they began their monitored searches.
When they had completed all three sessions, they were asked to fill out the questionnaire stating their preferences and opinions.

3.2.4 Data Collection

Raw Data

Data was collected in a trace file while the subjects worked with the system. Each communication from the subject to the retrieval system, and vice versa, was stored with a time stamp. Thus, timing information was collected along with the history of queries entered by the subject and the search results. When the subject chose to display the retrieved passages, those passages and the subject's relevance judgment of them were also stored.

Definitions

A unique query was any error-free query entered by a subject. If a subject entered a query which contained a typographic or logical error, and he indicated that he noticed the error by aborting the search and re-entering a corrected version, then the erroneous query was not considered a unique query. However, if the subject gave no indication that he was aware of the error, but instead moved on to a different query altogether, then the erroneous query was considered unique.

The relevance weight of a passage is the relevance number assigned to the passage by the subject. A very relevant (user) passage is one assigned a relevance weight of two. A somewhat relevant (user) passage has a relevance weight of one. A relevant passage (user) is one that is either very relevant or somewhat relevant, as judged by the user. An irrelevant passage (user) is a passage given a relevance number of zero.

It is necessary to have an estimate of the total number of relevant passages available for each question, in order to calculate recall. This estimate was calculated by forming the union, for each question, of the set of passages judged very relevant by any subject. Passages in this set judged irrelevant by the author were removed. The remaining passages form the absolute retrieval set and are called the relevant passages. It was necessary to remove some passages marked very relevant by a subject because, perhaps due to a misinterpretation of the question or a misunderstanding of the passage, some subjects gave a relevance weight of two to irrelevant or marginally relevant passages. This tendency to overestimate the relevance of passages may also be because, in some cases, subjects were unable to find the truly relevant passages, and thought that they had retrieved the best passages available when in fact they had not.

A successful retrieval set is a retrieval set containing at least five relevant passages. Since the subjects were attempting to find ten relevant passages, a successful retrieval set contains at least half the number for which they were looking. The textbase contained approximately the same number of relevant passages for each question, allowing the target number and size of the successful retrieval set to be held constant. The final retrieval set was chosen as the last successful retrieval set. If a subject never retrieved a successful retrieval set for a given question, the retrieval set with the highest number of relevant passages, as judged by the subject, was chosen. The final query is the query input by the user which resulted in the final retrieval set.

Variables

Total time per question is calculated from the entry of the subject's first query for the question until after the display, or decision not to display, of the final set of retrieved passages.
Number of queries per question is determined by counting the number of unique queries the subject entered for a given question.

Number of relevant passages (user) found per question is determined by counting the number of user-indicated relevant passages in the final retrieval set for the question.

User precision is calculated for the final retrieval set using the standard formula of: number of relevant passages (user) retrieved / number of passages retrieved.

Number of relevant passages found per question is determined by counting the number of passages in the final retrieval set for the question that are members of the absolute retrieval set.

Precision is calculated for the final retrieval set using the standard formula of: number of relevant passages retrieved (absolute) / number of passages retrieved.

Recall is calculated for the final retrieval set using the standard formula of: number of relevant passages retrieved (absolute) / total number of relevant passages available.

The ranking balance point (R) for each retrieval set (not just the final one) is calculated by

R = [ Σ (i = 1..n) i * relevance_i ] / [ Σ (i = 1..n) relevance_i ]

where
n = number of passages in the retrieval set
i = position of the passage in the retrieval set
relevance_i = relevance weight of passage i

This calculates where the midpoint of the relevant passages lies, accounting for the relevance weight. The earlier in the retrieval set the relevant passages occur, the smaller their midpoint. For example, consider a retrieval set of five passages of which one is very relevant (weight = 2), two are somewhat relevant (weight = 1) and two are irrelevant. An example ranking might present the very relevant passage first, then a somewhat relevant passage, an irrelevant one, the other somewhat relevant passage and then the final irrelevant passage, represented as (2, 1, 0, 1, 0). Using this formula, the example balance point would be

((1*2) + (2*1) + (3*0) + (4*1) + (5*0)) / (2 + 1 + 1 + 0 + 0) = 2

The best case balance point (BC) for each retrieval set is calculated by applying the ranking balance point formula to the case where all very relevant passages precede all somewhat relevant passages, which in turn precede all non-relevant passages in the set. In this case, the best case ranking would be (2, 1, 1, 0, 0), yielding a balance point of 1.75. For comparison, the worst case ranking, (0, 0, 1, 1, 2), would yield a balance point of 4.25.

The midpoint (M) for each retrieval set is calculated by (n+1)/2 where n is the number of passages in the retrieval set. The midpoint is used for comparison with the ranking balance point. A ranking algorithm which distributed the relevant passages evenly throughout the set, in our example (1, 0, 2, 0, 1), would yield a ranking balance point of 3. This is also the midpoint of the set. A good ranking algorithm would produce distributions with balance points less than the midpoint (i.e. relevant passages presented earlier).

The normalized ranking balance points were calculated from the ranking balance points by moving the midpoint to 0 and adjusting the range so that the best case balance point fell on 1, and the worst case balance point on -1. The normalization performed was:

Normalized ranking balance point (NR) = (M - R) / (M - BC)

For the example retrieval set, the normalized ranking balance point would be (3 - 2) / (3 - 1.75) = 0.8. Note that this normalization is not possible when BC equals M, since a division by zero would be attempted. This can arise when all the retrieved passages receive the identical relevance weight (e.g. (2, 2, 2, 2, 2)). Since any order of presentation is as good as any other when the passages are all of equal relevance, it would be safe to ignore these cases. However, this situation never arose in the experiment.
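For concreteness, the sketch below computes R, BC, M, and NR for the example retrieval set (2, 1, 0, 1, 0) discussed above. It is our illustration, not part of the evaluation software; only the relevance weights themselves come from the text.

/* Hedged sketch of the ranking balance point calculation for the example
 * retrieval set (2, 1, 0, 1, 0).                                            */
#include <stdio.h>

/* R = sum(i * relevance_i) / sum(relevance_i), with positions i starting at 1. */
static double balance_point(const int *rel, int n)
{
    double num = 0.0, den = 0.0;
    for (int i = 0; i < n; i++) {
        num += (i + 1) * rel[i];
        den += rel[i];
    }
    return num / den;
}

int main(void)
{
    int example[]   = { 2, 1, 0, 1, 0 };    /* observed ranking            */
    int best_case[] = { 2, 1, 1, 0, 0 };    /* all relevant passages first */
    int n = 5;

    double R  = balance_point(example, n);      /* 2.00 */
    double BC = balance_point(best_case, n);    /* 1.75 */
    double M  = (n + 1) / 2.0;                  /* 3.00 */
    double NR = (M - R) / (M - BC);             /* 0.80 */

    printf("R = %.2f  BC = %.2f  M = %.2f  NR = %.2f\n", R, BC, M, NR);
    return 0;
}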
Summaries Calculated for Each System

For each system the means calculated were:
• number of queries per question
• time per question (seconds)
• number of relevant passages (user) per question
• user precision
• number of relevant passages (from absolute retrieval set)
• precision
• recall

For each ranking algorithm (the expert system's, and a random ordering) the normalized balance points were calculated.

3.3 Results

The means were compared to determine if their differences were statistically significant. Pairwise two-tailed t-tests were performed. A difference was considered significant if its probability of occurring due to chance was less than 5%; a probability of less than 10% was considered marginally significant. Pairs of means with statistically significant differences are flagged with asterisks.

3.3.1 Search Effectiveness

All three systems retrieved comparable numbers of relevant passages. Whereas there seemed to be higher recall with the thesaurus, shown by a mean of 7.688 compared to a mean of 7.292 with the expert system, this difference was not significant (p = 0.5333).

• number of relevant passages (user) per question
  • user alone: 7.375
  • user and thesaurus: 7.688
  • user and expert system: 7.292

All three systems produced comparable precision, based on the subjects' relevance judgments.

• user precision
  • user alone: 0.763
  • user and thesaurus: 0.786
  • user and expert system: 0.761

All three systems retrieved approximately the same number of passages from the absolute retrieval set.

• number of passages from absolute retrieval set
  • user alone: 5.521
  • user and thesaurus: 5.708
  • user and expert system: 5.729

Recall was comparable across all three systems. There was a slight improvement in recall for the user and expert system configuration, but the advantage over the user-alone configuration was not significant (p < 0.6988).

• recall
  • user alone: 0.364
  • user and thesaurus: 0.368
  • user and expert system: 0.379

The user and expert system configuration produced marginally significant improvements in precision when compared with the user-alone configuration.

• precision
  • user alone: 0.530 * (p < 0.0817)
  • user and thesaurus: 0.576
  • user and expert system: 0.604 *

3.3.2 Search Efficiency

The user was marginally significantly slower when using a thesaurus. The expert system was not significantly slower than the other two systems. However, MICROARRAS was being executed by a Sun 4 with the user-expert system configuration, resulting in approximately a doubling of its speed.

• mean time per question (seconds)
  • user alone: 474.5 * (p < 0.101)
  • user and thesaurus: 571.5 *
  • user and expert system: 539.8

The expert system improved search efficiency, as measured by the number of user queries, over both the user alone and the user plus thesaurus.

• number of queries per question
  • user alone: 4.833 * (p < 0.0001)
  • user and thesaurus: 5.458 ** (p < 0.0001)
  • user and expert system: 2.354 *, **

3.3.3 Ranking

The expert system ranked relevant documents more highly than would be predicted by randomness.
The expert system's ranking was compared to a random distribution for 74 sets of retrieved passages.

• balance points
  • random: 5.00 * (p < 0.0165)
  • expert system: 4.53 *

• normalized balance points (on a range of -1 to +1)
  • random: 0.000 * (p < 0.0025)
  • expert system: 0.195 *

3.4 Analysis

Effectiveness

The first hypothesis, that the expert system can improve the search effectiveness for a novice user, was not supported by this study. However, the expert system produced marginally significant improvements in precision, and seemed to indicate improvements in recall. Providing the online thesaurus produced no improvement in search effectiveness.

The suggested improvements in precision may result from the expert system applying better broadening techniques. The subjects, when searching unassisted, would often stop with a very broad query and examine a large set of retrieved passages (over fifteen) looking for relevant information. This type of strategy results in the lower precision observed when the subjects search on their own.

The subjects' browsing strategy may account for their ability to produce recall comparable to the expert system when there were a large number of relevant passages in the textbase. For example, in two questions with large absolute retrieval sets the subjects were able to retrieve, on average, 10 and 10.25 relevant passages on their own compared with the expert system's retrieval of 8 and 7.75 passages respectively. By using a target number of 10 for these broader questions, the expert system was operating at a disadvantage. More relevant information was easily found, judging by the high recall of the subjects, but the expert system did not even attempt to further broaden the query. Clearly, 10 is not the ideal target number for all queries.

Efficiency

The second hypothesis, that the expert system can improve the search efficiency of novice searchers, was supported. Using the expert system significantly reduced the number of queries subjects needed to answer a given question. On average, subjects required fewer than half as many queries per question with the expert system as with the systems in which they queried without it, a substantial improvement. The expert system reduced the amount of user effort required by decreasing the number of queries a user needs to design to express their information needs.

If efficiency is measured in terms of total user time, the expert system fares less well. The expert system was not significantly slower than either of the other two systems, but it was necessary to run MICROARRAS on a faster machine to achieve this. However, this version of the expert system was designed with correctness rather than efficiency in mind, and there are several ways that it could be sped up. In particular, when a stemgroup is added to a concept, the entire query is re-evaluated against the textbase. A large speed improvement could be gained by unioning the passages retrieved by the new stemgroup with those retrieved by the rest of the concept (which has already been calculated).

Allowing the subjects to access the online thesaurus actually decreased the subjects' efficiency. They took significantly more time than when they searched on their own and generated as many queries. This seems to indicate that the improvement in efficiency seen above was due to the expert system's searching knowledge base, not just access to an online thesaurus.

Ranking

The third hypothesis, that the expert system could rank passages in decreasing order of relevance, was supported.
Although the expert system did present relevant passages significantly earlier than would be predicted by randomness, the improvement was not large enough to be considered truly successful. The current algorithm needs to be evaluated with different weights, or a somewhat different algorithm needs to be tried, in order to further improve the ranking function. Decreasing the query term weights more quickly as the query terms move farther from the original may improve the ranking by placing more emphasis on the user's original search terms. Using a more sophisticated closeness factor, one that took into account how many words apart the search terms were in the passage, in addition to the sentence and paragraph measures considered in this version, could also lead to improved ranking.

Reformulations

Finally, some discussion of the number of reformulations performed by the expert system seems appropriate. The number of reformulations performed for a given user on a given question varies, since some unsuccessful starting queries were reformulated before (and sometimes after) a successful starting query was found. The following statistics are given for the final query. The average number of reformulations performed on the starting query for the twelve questions, in order, was: 4.25, 5.75, 4.25, 2.75, 6, 5.75, 3.25, 3.25, 4.25, 2.75, 4, 1. This gives an average of 3.65 reformulations over all final retrieval set queries. It is interesting to note that the highest average number of reformulations is six. If the expert system is continually broadening the query (which is the most common case), this means that even on the question requiring the most reformulations it stops, on average, just after adding child stemgroups. In fact, examining the 48 final retrieval set queries reveals that only in 3 cases did the expert system go past this point. Twice it went one more step and adjusted the context, and once it performed all ten reformulations on the broadening side before the user was satisfied with the number of passages retrieved.

3.5 Questionnaire

The twelve subjects were asked which features of the expert system they liked best. The automatic addition of terms from the thesaurus was the most frequently mentioned (8 subjects), whereas the automatic context adjustment was the second most popular feature (3 subjects). Many subjects (8) mentioned the decreased amount of work needed to perform a search, with three of them specifically mentioning that they did not have to think as much. Other features mentioned which decreased the user effort were the simplified syntax, decreased typing, and the fewer queries to remember.

System slowness was the feature most disliked (6 subjects). Although the amount of time necessary to answer a question was no greater with the expert system (see Section 3.3.2), there was less work for the user, so the time may have seemed longer. The other main complaints concerned the user interface. The subjects were fairly evenly split: some wanted the system to proceed more automatically, with less prompting (4 subjects), whereas others wanted the system to explain what it was doing and/or allow the user to direct it (5 subjects). These comments led to the conclusion that if a usable system is to be built based on the success of this research prototype, the execution of the system must be sped up and more work on interface design is needed.
Reformulations

Finally, some discussion of the number of reformulations performed by the expert system seems appropriate. The number of reformulations performed for a given user on a given question varies, since some unsuccessful starting queries were reformulated before (and sometimes after) a successful starting query was found. The statistics below are therefore given for the final query only. The average numbers of reformulations performed on the starting query for the twelve questions were, in order: 4.25, 5.75, 4.25, 2.75, 6, 5.75, 3.25, 3.25, 4.25, 2.75, 4, 1. This gives an average of 3.65 reformulations over all final retrieval set queries. It is interesting to note that the highest average number of reformulations is six. If the expert system is continually broadening the query (which is the most common case), this means that even on the question requiring the most reformulations it stops, on average, just after adding child stemgroups. In fact, examining the 48 final retrieval set queries reveals that in only 3 cases did the expert system go past this point. Twice it went one more step and adjusted the context, and once it performed all ten reformulations on the broadening side before the user was satisfied with the number of passages retrieved.

3.5 Questionnaire

The twelve subjects were asked which features of the expert system they liked best. The automatic addition of terms from the thesaurus was mentioned most frequently (8 subjects), and the automatic context adjustment was the second most popular feature (3 subjects). Many subjects (8) mentioned the decreased amount of work needed to perform a search, with three of them specifically mentioning that they did not have to think as much. Other features mentioned as reducing user effort were the simplified syntax, the decreased typing, and having fewer queries to remember.

System slowness was the feature most disliked (6 subjects). Although the amount of time needed to answer a question was no greater with the expert system (see Section 3.3.2), there was less work for the user, so the time seemed longer. The other main complaints concerned the user interface. The subjects were fairly evenly split between those who wanted the system to proceed more automatically, with less prompting (4 subjects), and those who wanted the system to explain what it was doing and/or allow the user to direct it (5 subjects). These comments lead to the conclusion that if a usable system is to be built on the success of this research prototype, the execution of the system must be sped up and more work on interface design is needed.

Almost all the subjects (10) found the user-expert system version the easiest to use, with the remaining two subjects split between the other two versions. Not surprisingly, given the comparable effectiveness of the three systems, the subjects were split on which system they felt gave the best results. Three voted for the user-alone version, two for the user-thesaurus version, and three for the expert system. Three said it was a tie between the user-thesaurus version and the expert system, and one abstained.

4 FUTURE WORK

Running the experiment suggested several possible refinements to the system. The experimental subjects had many useful comments, the bulk of which dealt with the desire for a more sophisticated user interface. Desirable changes include: provision of a non-Boolean query language; allowing users to adjust the amount of system interaction; having the user specify the type of search desired rather than a specific target number; and increasing the speed of the system by improving the way the expert system uses MICROARRAS.

Observing the expert system reformulate real queries gave invaluable insight into which types of queries it handled well and which it did not. A common type of query that required broadening was one containing the intersection of three or more concepts. In this case, broadening the context and adding search terms to each concept fail to address the fundamental problem of intersecting too many concepts, and the next step of replacing the ANDs with ORs is too drastic a change; it invariably leads to too broad a query. Instead, the expert system should take the original query and drop each of the concepts in turn (a sketch of this strategy appears at the end of this section).

The most common type of query requiring narrowing consisted of a single, high-frequency concept. None of the current reformulation techniques was of any use in this case: there were no operators to change, no context to adjust, and adding search terms only makes the query broader. This type of query should be treated as a special case. The concept's child concepts from the thesaurus should be presented as alternative, more specific queries, and the user could also be encouraged to AND the concept with another.

The treatment of multiword phrases entered by the user that do not appear in the thesaurus should also be changed. Currently, the only expansions performed are expanding each word to its stemgroup and loosening the context allowed between the words of the phrase. It would be preferable to treat the words of the phrase as separate concepts that are ANDed together with adjacent context. Each phrase word could then be expanded using the full range of thesaural relationships, as is the case with regular search terms.

Finally, more work is needed to improve the ranking of the retrieved passages. The current ranking algorithm should be tried with different weights for the query search terms and the closeness factor. It may be necessary to try entirely different algorithms, possibly incorporating syntactic or semantic information, to achieve high-quality ranking.
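Returning to the over-constrained AND queries discussed above, the sketch below illustrates the drop-one-concept broadening strategy. The query representation, the function names, and the rule for choosing among the variants (keeping the one whose result size is closest to the target number) are all assumptions for illustration, not taken from the prototype.

    # Hypothetical sketch of drop-one-concept broadening for queries that AND
    # together three or more concepts.
    from typing import List, Set

    def evaluate(concepts: List[Set[int]]) -> Set[int]:
        """A query is the intersection (AND) of its concepts' passage sets."""
        result = concepts[0].copy()
        for c in concepts[1:]:
            result &= c
        return result

    def broaden_by_dropping(concepts: List[Set[int]], target: int) -> List[Set[int]]:
        """If the full query retrieves too few passages, try dropping each concept
        in turn and keep the variant whose result size is closest to the target."""
        if len(concepts) < 3 or len(evaluate(concepts)) >= target:
            return concepts
        best, best_gap = concepts, abs(len(evaluate(concepts)) - target)
        for i in range(len(concepts)):
            variant = concepts[:i] + concepts[i + 1:]
            gap = abs(len(evaluate(variant)) - target)
            if gap < best_gap:
                best, best_gap = variant, gap
        return best

    # Example: three concepts whose intersection is empty; dropping one recovers passages.
    a, b, c = {1, 2, 3}, {2, 3, 4}, {9}
    print(len(evaluate([a, b, c])))                                   # 0
    print(len(evaluate(broaden_by_dropping([a, b, c], target=10))))   # 2 (drops c)

Presenting each dropped-concept variant to the user, rather than choosing one automatically, would be an equally plausible design.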
In conclusion, we have demonstrated that an expert system can provide online search assistance that improves the efficiency of novice searchers. Although more research is necessary to develop a better search assistant, we have shown that a useful search assistant can be built which separates the search strategies from the domain knowledge, and that implementation of such a system is feasible now.

REFERENCES

Al-Hawamdeh, S., de Vere, R., Smith, G., and Willett, P. (1991). Using nearest-neighbour searching techniques to access full-text documents. Online Review, 15(3/4), 173-191.

Bates, M.J. (1979). Information search tactics. Journal of the ASIS, 30(4), 205-214.

Belkin, N.J., and Marchetti, P.G. (1990). Determining the functionality and features of an intelligent interface to an information retrieval system. Proceedings of the Thirteenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, Brussels, ACM Press, 151-174.

Blair, D.C., and Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28(3), 289-299.

Blaauw, G.A., and Brooks, F.P., Jr. (1986, Fall). Computer Architecture, Volume 1: Design Decisions. Draft.

Borgman, C.L. (1986a). Why are online catalogs hard to use? Journal of the ASIS, 37(6), 387-400.

Borgman, C.L. (1986b). The user's mental model of an information retrieval system: an experiment on a prototype online catalog. International Journal of Man-Machine Studies, 24(1), 47-64.

Borgman, C.L. (1987). Individual differences in the use of information retrieval systems: Some issues and some data. Proceedings of the Tenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, New Orleans, ACM Press, 61-69.

Brajnik, G., Guida, G., and Tasso, C. (1988). IR-NLI II: Applying man-machine interaction and artificial intelligence concepts to information retrieval. Proceedings of the Eleventh Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, Grenoble, ACM Press, 387-399.

Chang, S., and Chow, A. (1988). Towards a friendly adaptable information retrieval system. Proceedings of RIAO 88, M.I.T., 172-182.

Chen, H., and Dhar, V. (1990). Online query refinement on information retrieval systems: A process model of searcher/system interactions. Proceedings of the Thirteenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, Brussels, ACM Press, 115-133.

Chiaramella, Y., and Defude, B. (1987). A prototype of an intelligent system for information retrieval: IOTA. Information Processing & Management, 23(4), 285-303.

Croft, W.B., and Thompson, R.H. (1987). I3R: A new approach to the design of document retrieval systems. Journal of the ASIS, 38(6), 389-404.

Croft, W.B., and Turtle, H.R. (1992). Text retrieval and inference. In P.S. Jacobs (Ed.), Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Crouch, C.J. (1988). A cluster-based approach to thesaurus construction. Proceedings of the Eleventh Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, Grenoble, ACM Press, 309-320.

Fenichel, C.H. (1981). Online searching: measures that discriminate among users with different types of experience. Journal of the ASIS, 32(1), 23-32.

Fidel, R. (1991). Searcher's selection of search keys. Journal of the ASIS, 43(7), 490-500.

Gauch, S. (1991). Search improvement via automatic query reformulation. ACM Transactions on Information Systems, 9(3), 249-280.

Gauch, S. (1992). Intelligent information retrieval: An introduction. Journal of the ASIS, 43(2), 175-182.

Harman, D. (1986). Individual differences in the use of information retrieval systems: Some issues and some data. Proceedings of the 1986 ACM Conference on Research & Development in Information Retrieval, Pisa, ACM Press, 61-69.
Harman, D. (1988). Towards interactive query expansion. Proceedings of the Eleventh Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, Grenoble, ACM Press, 321-331.

Jacobs, P.S., and Rau, L.F. (1990). SCISOR: Extracting information from on-line news. Communications of the ACM, 33, 88-97.

Krawcsak, D., Smith, P.J., and Shute, S.J. (1987). EP-X: A demonstration of semantically-based search of bibliographic databases. In C.T. Yu and C.J. van Rijsbergen (Eds.), Proceedings of the Tenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval (pp. 263-271). New Orleans, LA.

McCune, B.P., Tong, R.M., Dean, J.S., and Shapiro, D.G. (1985). RUBRIC: A system for rule-based information retrieval. IEEE Transactions on Software Engineering, SE-11(9), 939-945.

Oldroyd, B.K. (1984). Study of strategies used in online searching 5: differences between the experienced and the inexperienced searcher. Online Review, 8(3), 233-244.

Pollitt, A.S. (1984). A 'front-end' system: An expert system as an online search intermediary. ASLIB Proceedings, 36(5), 229-234.

Pollitt, A.S. (1987). CANSEARCH: An expert systems approach to document retrieval. Information Processing & Management, 23(2), 119-136.

Ro, J.S. (1988). An evaluation of the applicability of ranking algorithms to improve the effectiveness of full-text retrieval. II. On the effectiveness of ranking algorithms on full-text retrieval. Journal of the ASIS, 39(3), 147-160.

Salton, G. (1986). Another look at automatic text-retrieval systems. Communications of the ACM, 29(7), 648-656.

Shoval, P. (1985). Principles, procedures and rules in an expert system for information retrieval. Information Processing & Management, 21(6), 475-487.

Smeaton, A.F., and van Rijsbergen, C.J. (1983). The retrieval effects of query expansion on a feedback document retrieval system. The Computer Journal, 26(3), 239-246.

Smith, J.B., Weiss, S.F., and Ferguson, G.J. (1987). MICROARRAS: An advanced full-text retrieval and analysis system. Proceedings of the Tenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, New Orleans, ACM Press, 187-195.

Smith, P.J., Shute, S.J., Galdes, D., and Chignell, M.H. (1989). Knowledge-based search tactics for an intelligent intermediary system. ACM Transactions on Information Systems, 7(3), 246-270.

Tong, R.M., Applebaum, L.A., Askmann, V.N., and Cunningham, J.F. (1987). Conceptual information retrieval using RUBRIC. Proceedings of the Tenth Annual International ACM SIGIR Conference on Research & Development in Information Retrieval, New Orleans, ACM Press, 247-253.

Vickery, A., and Brooks, H.M. (1987). PLEXUS - the expert system for referral. Information Processing & Management, 23(2), 99-117.

Zadeh, L.A. (1965). Fuzzy sets. Information and Control, 8, 338-353.