Journal of Distance Education Technologies, 1(3), 46-58, July-Sept 2003. Copyright © 2003, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

Data Mining and Case-Based Reasoning for Distance Learning

Ruimin Shen, Peng Han and Fan Yang, Shanghai Jiaotong University, China
Qiang Yang, Hong Kong University of Science and Technology, China
Joshua Zhexue Huang, University of Hong Kong, China

ABSTRACT

Web-based learning gives more students access to the distance-learning environment, and provides students and teachers with unprecedented flexibility and convenience. However, early experience with this new means of learning in China has exposed a few problems. Among others, teachers accustomed to traditional teaching methods often find it difficult to put their courses online, and some students, especially adult students, find themselves overloaded with information. In this paper, we present an open framework to solve these two problems. This framework allows students to interact with an automated question-answering system to get answers to their questions. It enables teachers to analyze students’ learning patterns and organize the web-based contents efficiently. The framework is intelligent due to its data mining and case-based reasoning features, and user-friendly because of its personalized services to both teachers and students.

INTRODUCTION

As distance learning becomes one of the hotspots in network research and applications, many web-based education systems have been established. Two good examples are Virtual-U (Groeneboer, Stockley & Calvert, 1997) and Web-CT (http://www.webct.com).
To cover the entire spectrum of the learning process, these systems have implemented a number of fundamental components such as synchronous and asynchronous teaching systems, course-content delivery tools, polling and quiz modules, virtual workspaces for sharing resources, whiteboards, grade reporting systems, and assignment submission components. These research and commercial e-learning systems enable large groups of dispersed individuals to interact, collaborate and study on the Web.

Information Science Publishing, 701 E. Chocolate Avenue, Hershey PA 17033, USA. Tel: 717/533-8845; Fax: 717/533-8661; URL: http://www.idea-group.com. This chapter appears in the journal, International Journal of Distance Education Technology, edited by Qing Li and Weijia Jia. Copyright © 2003, Idea Group Publishing.

As distance learning becomes popular, the demand for more advanced features increases. For example, to satisfy the requirements of multimedia-based courses, teachers need to spend a lot of time learning course-creation tools. This proves difficult for senior teachers who are accustomed to traditional ways of teaching. Another issue is that both the number of students using the web-based learning environment and the volume of e-learning materials are growing very fast. This creates a problem of information overload for both students and teachers. Demand for personalized services increases.
We note that existing web-based systems often do not provide sufficient support for giving personalized services to each individual student, or for helping students find their desired courses and answers to their questions. This problem has a great impact on the quality of network-based education and has contributed largely to the student dropout rate.

In this paper, we present an intelligent distance-learning environment, which was developed and is used at the Network Education College of Shanghai Jiao Tong University. The motivation of our work is to build a new distance-learning system that enables students to conduct online studies easily according to their own educational backgrounds, study habits and paces. We are particularly interested in providing solutions to the information overload problem and in personalized service. In short, our efforts are dedicated to making teachers feel that “everything is easy” and making students feel that “everything is available” and “everyone is different.” Our system is used regularly by thousands of adult students in Shanghai, China. In the following, we present the framework with an emphasis on providing answers to students’ questions and making personalized recommendations to students. We discuss data mining and case-based reasoning techniques to solve these problems.

To support this framework, in which smart and personalized distance learning is realized, we employ the tools of data mining and case-based reasoning. Data mining allows us to study the user patterns and behaviors that are buried in the massive data that we track, and case-based reasoning allows us to configure our question-answering system so that the user can pose questions to a virtual teacher interactively. In this paper, we explain both the functionalities and the algorithms behind these features.
OVERVIEW OF THE SYSTEM ARCHITECTURE

The system is composed of a real-time classroom, an EOD (Education on Demand) course centre, a CBIR (Content Based Indexing and Retrieval) search interface, a learning assistance center and a data analysis center. During a class session, all the data the lecturer and students need, including video, audio, handwriting materials and screen operations, are transmitted simultaneously to each student’s desktop. In the meantime, all interactions are recorded and public materials are published on the Web. After the class session, students who were unable to attend can view the same content on the Web as that shown in class. The CBIR search interface enables students to find their desired materials conveniently and quickly. The learning assistance center consists of an assignment subsystem, an examination subsystem and an answer-machine subsystem, which help students complete assignments and exams on the Web and answer their questions automatically. All the didactical and user access data are collected in log files and analyzed by the data analysis center.

The system can provide personalized service to the students according to the analysis results. The details of these components are discussed in the following sections.

[Figure 1: System Overview. Labels: Input the Index keyword; Teacher’s Video; PPT Tutorial; Matching Page]

[Figure 2: Framework of the Data Analysis Centre]

The “Everything Is Easy” Teaching Environment

Although multimedia tools have been built to help teachers create online courseware, some teachers still prefer to use blackboards. In particular, teachers of mathematics and chemistry find it difficult to write complex symbols and formulas on computer screens. To make “everything easy” for these teachers, we have developed an intelligent board transfer system. Teachers can write anything on a computerized whiteboard, and the content is transferred simultaneously to the students’ desktops and integrated with the teachers’ video and audio teaching materials. The students can write notes on the teachers’ handwriting window. The combined information is stored on the network so the students can review it at any time. We call such content personalized notes. The teachers can also load their previously prepared PowerPoint and Word documents into the transfer system, and then both the teachers and students can navigate these documents synchronously. Using this subsystem, the teacher can focus on the teaching content instead of formats.

All the useful data from a class session are stored and published on the Web. Students who miss the class session can study the material on their own at any time after the class. We also convert these contents to CDs for students who are unable to view the live online lessons due to limited bandwidth.

With such an environment, teachers and students can always find a time to communicate that suits their work and preferences. This conforms to our philosophy of “everything is easy.”

The “Everything Is Available” Assistance Tool

A distance-learning environment often contains too many materials for students to choose from. It is important to provide a tool that helps students find the materials they need. A lot of work has been done in the past on this aspect.
However, much of this effort has been devoted to standardizing courseware with a unified data specification such as XML so that it can be indexed on the Web. We believe that it is even more important to design an interface that lets a student determine whether the knowledge he is searching for is inside the courseware, and locate it. For example, if a student wants to review “The First Law of Thermodynamics,” he can input the phrase through a textbox or microphone, and the computer can then locate the relevant materials in the courseware automatically through an answer machine system and a speech recognition system.

In our system, we use a Content-Based Information Retrieval technology to implement this function. As described above, the courseware includes such information as the teacher’s video, audio and tutorials. We consider the audio and tutorial information to be the most important materials and index them. The students can see both the teacher’s video and the didactical materials such as the PowerPoint slides, as shown in Figure 1. They can also hear the teacher’s voice. In addition, the system supports courseware on demand through index keyword input.

Because the number of students is large, usually 10 or more times that of a conventional class, many teaching tasks have to be supported by the computer. Take the Q&A (Question and Answer) system as an example. If there are 200 students online and each student asks only one question, it will take a teacher several hours to answer all these questions. In our experience, many questions, although expressed differently, have the same or similar meanings. The solution to this problem is to share the answers among the students and let a computer recognize similar questions and answer them automatically.
If the computer cannot find an answer, it transfers the question to a teacher. After the teacher answers the question, the answer is added to the Q&A database and shared among students. Therefore, as the Q&A database accumulates questions and answers, the hit rate grows over time.

Several question-answering systems are already in use. In comparison, our system emphasizes efficiency rather than comprehension of the language. We have observed that only a limited number of questions are asked in each course and that the questions are usually very simple. Thus, we adopt an improved keyword-matching algorithm to find the answer. After a period of accumulation, the hit rate of our Q&A system has risen to 90% and the corresponding time to answer each question has been reduced to two seconds.

We first discuss the structure of our answer machine system in detail. The questions and answers are obtained through a standard Web interface. Students using the system leave behind many questions and potential answers. Over time, these questions and answers accumulate in a log file. The log file can then be used for training an indexing structure for the question-to-answer association. This process continues whenever the system is in use, making the answer machine system a closed-loop system. We adopt the lifetime learning paradigm of Zhang and Yang (2001) for acquiring indexical knowledge about cases in a case-based reasoning paradigm. In this paradigm, the answers are cases to be stored in a case base. The questions provide keywords that trigger the cases and rank them according to how well they can provide an answer to the questions. An important issue then is how to rank the keyword-to-answer associations.
We call this the index-learning problem.

The structure of a case base can be conceptualized as a two-layer structure, where the feature-values form one layer and the cases another. The feature-value layer is connected to the case layer through a set of weights to be maintained. We now extend the original two-layer structure of a case base into a three-layer structure, taking the two-layer architecture as a special case. In the case layer, we extract the answers from each case and put them onto a third layer. This makes it possible for different questions to share a solution, and for a question to have access to alternative answers. An important motivation for separating the structure of a case in this way is to reduce the redundancy in the case base. Given N questions and M solutions, a case base of size $N \times M$ is now reduced to one of size $N + M$. This approach eases the scale-up problem and makes case base maintenance easier, since when the need arises, each question and answer need be revised only once. To make this change possible, we introduce a second set of weights, attached to the connections between cases and their possible solutions. This second set of weights represents how important an answer is to a particular question if this answer is a potential candidate.

The weights correspond to a mapping function between the input questions and the final answers. Different questions may in fact correspond to the same answer. When many students ask questions, over time this mapping can be learned by a relevance feedback algorithm. We adopt the relevance-feedback learning algorithm proposed by Zhang and Yang (2001) for our case-based reasoning system, where the weights are incrementally updated based on whether or not a particular case provides a right answer for an input question.
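The three-layer structure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `ThreeLayerCaseBase`, its method names, the initial weight of 0.5 and the learning rate are all hypothetical, and the update rule (move each weight a fraction of the way toward 1 or 0) is a simple stand-in, not the algorithm of Zhang and Yang (2001), whose details this paper does not reproduce.

```python
from collections import defaultdict


class ThreeLayerCaseBase:
    """Keywords -> stored questions (cases) -> answers, with a weight on every link."""

    def __init__(self, learning_rate=0.1):
        self.learning_rate = learning_rate  # hypothetical step size
        # First weight set: keyword -> question links.
        self.keyword_weights = defaultdict(dict)  # question_id -> {keyword: weight}
        # Second weight set: question -> answer links.
        self.answer_weights = defaultdict(dict)   # question_id -> {answer_id: weight}
        # Third layer: each answer is stored once, however many questions use it.
        self.answers = {}

    def add_answer(self, answer_id, text):
        self.answers[answer_id] = text

    def link(self, question_id, keywords, answer_id, weight=0.5):
        for keyword in keywords:
            self.keyword_weights[question_id].setdefault(keyword, weight)
        self.answer_weights[question_id][answer_id] = weight

    def rank_answers(self, query_keywords):
        """Score each answer by propagating keyword activation through both weight sets."""
        scores = defaultdict(float)
        for question_id, weights in self.keyword_weights.items():
            activation = sum(weights.get(k, 0.0) for k in query_keywords)
            for answer_id, w in self.answer_weights[question_id].items():
                scores[answer_id] += activation * w
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    def feedback(self, query_keywords, question_id, answer_id, correct):
        """Assumed update rule: nudge the links that fired toward 1 (right) or 0 (wrong)."""
        target = 1.0 if correct else 0.0
        rate = self.learning_rate
        keyword_weights = self.keyword_weights[question_id]
        for keyword in query_keywords:
            if keyword in keyword_weights:
                keyword_weights[keyword] += rate * (target - keyword_weights[keyword])
        answer_weights = self.answer_weights[question_id]
        answer_weights[answer_id] += rate * (target - answer_weights[answer_id])


cb = ThreeLayerCaseBase(learning_rate=0.5)
cb.add_answer("a1", "placeholder answer about the first law")
cb.add_answer("a2", "placeholder answer about the second law")
cb.link("q1", ["first", "law", "thermodynamics"], "a1")
cb.link("q2", ["energy", "conservation"], "a1")  # two questions share one answer
cb.link("q3", ["second", "law", "thermodynamics"], "a2")

before = dict(cb.rank_answers(["first", "law"]))
cb.feedback(["first", "law"], "q1", "a1", correct=True)
after = dict(cb.rank_answers(["first", "law"]))
```

In this toy run, three stored questions point at only two stored answers, which is the $N + M$ saving in miniature, and the score of answer `a1` for the query rises after one round of positive feedback.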
In order to validate the system, we have to gather more data from the students. The data should reflect not only what questions the students asked, as in search engine query logs, but also how they rank the returned results. Given these question-answer log files, we can apply the above learning algorithm and keep the question-to-answer mapping current (Zhang & Yang, 2001; Yang & Wu, 2001).

The “Everyone Is Different” Personalized Service

In a traditional education system, the course content is static and the teacher gives the same assignments to all students. In reality, students have different backgrounds and the knowledge structure is dynamic. Given such diversity, how do we analyze students’ learning behaviors, characteristics and knowledge structures? Furthermore, how do we send feedback on learning states to teachers? In addition, how do we visualize the analysis results intelligibly for teachers and students? To answer these questions, we propose a subsystem, the Data Analysis Centre, which includes an analysis tool to support the analysis of student study behavior. Figure 2 gives the framework of the subsystem.

In this subsystem, the resource database is composed of two kinds of data: the log files, which follow the W3C specification, and the attribute tables in the sub-function database. The data-preprocessing module cleans up the original data. The first task is to transfer the log files into database files with DTS (Data Transformation Services) tools. The second task is to create the corresponding tables of User_ID and IP. The transformation also solves the problem of the one-to-many relation between students’ User_ID and IP attributes. The third task is to calculate the click-time and browse-span of each URL, which is very important for mining the data structure of students.
The last task is to create new tables and views for further analyses.

The preprocessing creates clean data. Since we organize data sources according to knowledge points and build relation tables of sources and knowledge points, we can assess the knowledge points from two aspects: the general information, used to calculate the Interest Measure and the Mastery Measure of each chapter-point and knowledge-point based on the statistical data; and the personalized information, used to assign the Interest Measure and the Mastery Measure to each student.

We use three techniques to discover knowledge and rules. The first technique is to use a classification algorithm to classify students into different classes based on their learning actions. Based on the classification, the teacher can organize different course contents and assign homework at different difficulty levels to each class. The second is to find association rules among different knowledge points, together with their support and confidence values. The third is to organize and map the knowledge points using a concept map algorithm.

[Figure 3: Visualization of the Analysis Results. Axes: Knowledge Point; Interest Measure. Legend: Knowledge-group/chapter; Knowledge-group]

Using a visualization module, we can visualize all the analysis results in different forms. Figure 3 shows the “interestingness” measure of knowledge points, based on the visit frequency of a certain chapter in a course, or the number of questions posted on the answer machine. It also shows the students’ mastery measure of a given subject, determined by the students’ feedback on whether they find the material satisfactory.
The teacher can provide more scientific explanations online about a particular knowledge point with a high interestingness measure. He can also choose a knowledge point with a low mastery measure to teach in detail and supply more reference materials to the students.

Figure 3 also shows the multidimensional association of knowledge points. The ellipses represent knowledge point groups, such as chapters. Each circle represents a knowledge point. We can see not only the relationships between the knowledge points within a group but also the relationships between knowledge points in different groups. Such information can direct the teacher to reorganize the knowledge points more effectively. Furthermore, we can also present a knowledge-point map, which shows the relationships between the knowledge points and provides hints for the students as to which knowledge points are prerequisites for the current one.

In our tests, the Data Analysis Center found some interesting rules and created useful graphs of the knowledge point structure. These results enable the teacher to adjust the didactical progress and enable students to learn in a more personalized way.

Once we obtain the knowledge points, we consider how to utilize the Web log data accumulated by the Web servers to derive interesting and useful association rules on the knowledge points of interest.

[Figure 4: Learning and Submitting Questions. Labels: Answer Center; Raise Question; Submit]

Given a Web log, the first step is to clean the raw data. We filter out documents that are not requested directly by users. These are image requests in the log that are retrieved automatically after a request for a document containing links to these files.
Their presence does not help us compare the different methods.

We consider Web log data as a sequence of distinct Web pages, where subsequences, such as user sessions, can be observed by unusually long gaps between consecutive requests. For example, assume that the Web log consists of the following user visit sequence: (A (by user 1), B (by user 2), C (by user 2), D (by user 3), E (by user 1)) (we use “(…)” to denote a sequence of Web accesses in this paper). This sequence can be divided into user sessions according to IP address: Session 1 (by user 1): (A, E); Session 2 (by user 2): (B, C); Session 3 (by user 3): (D), where each user session corresponds to a user IP address. In deciding on the boundaries of the sessions, we studied the time interval distribution of successive accesses by all users and used a constant large gap in the time interval as the indicator of a new session.

To capture the sequential and time-limited nature of prediction, we define two windows. The first is called the antecedent window, which holds all visited pages within a given number of user requests up to the current instant in time. The second, called the consequent window, holds all future visited pages within a given number of user requests from the current time instant. In subsequent discussions, we refer to the antecedent window as W1 and the consequent window as W2. Intuitively, a certain pattern of Web pages already occurring in an antecedent window can be used to determine which documents are going to occur in the consequent window.

The moving windows define a table in which data mining can occur. Each row of the table corresponds to the URLs captured by each pair of moving windows. The number of columns in the table corresponds to the sizes of the moving windows. This table will be referred to as the Log Table, which represents all sessions in the Web log.
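The session splitting and moving-window construction can be sketched in Python as follows. This is an illustrative sketch only: the function names `split_sessions` and `log_table`, the `(user, timestamp, url)` record layout, the timestamps and the `max_gap` threshold are assumptions made for the example and do not come from the system described here.

```python
from collections import defaultdict


def split_sessions(log, max_gap):
    """Group (user, timestamp, url) records into per-user sessions.

    A new session starts whenever two consecutive requests by the same user
    are more than max_gap apart (same unit as the timestamps).
    """
    per_user = defaultdict(list)
    for user, timestamp, url in sorted(log, key=lambda record: record[1]):
        per_user[user].append((timestamp, url))

    sessions = []
    for user, requests in per_user.items():
        current = [requests[0][1]]
        for (previous_time, _), (time, url) in zip(requests, requests[1:]):
            if time - previous_time > max_gap:
                sessions.append((user, current))
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


def log_table(session, w1_size, w2_size):
    """Slide an antecedent window W1 and a consequent window W2 over one session."""
    rows = []
    width = w1_size + w2_size
    for start in range(len(session) - width + 1):
        w1 = tuple(session[start:start + w1_size])
        w2 = tuple(session[start + w1_size:start + width])
        rows.append((w1, w2))
    return rows


# The visit sequence from the text, with made-up timestamps.
log = [("user1", 0, "A"), ("user2", 1, "B"), ("user2", 2, "C"),
       ("user3", 3, "D"), ("user1", 4, "E")]
sessions = split_sessions(log, max_gap=10)
# sessions: user1 -> [A, E], user2 -> [B, C], user3 -> [D]

rows = log_table(list("ABCACDG"), w1_size=3, w2_size=2)
# rows: (A, B, C | A, C), (B, C, A | C, D), (C, A, C | D, G)
```

Each tuple in `rows` is one row of the Log Table: the first element holds the W1 columns and the second the W2 columns. A session shorter than the combined window size contributes no rows.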
Table 1 shows an example of such a table corresponding to the sequence (A, B, C, A, C, D, G), where the size of W1 is three and the size of W2 is two. In this table, under W1, A1, A2 and A3 denote the locations of the last three objects requested in the antecedent window, and P1 and P2 are the two objects in the consequent window.

Table 1: A Portion of the Log Table Extracted by a Moving Window Pair of Size [3, 2]

W1              W2
A1   A2   A3    P1   P2
A    B    C     A    C
B    C    A     C    D
C    A    C     D    G

We now discuss how to extract sequential association rules of the form LHS → RHS from the session table. Here LHS refers to the left-hand side of a rule, and RHS to the right-hand side. Association rules have been a main subject of study in data mining (Agrawal & Srikant, 1994; Han & Fu, 1995; Srikant & Agrawal, 1995, 1996; Chee, Han & Wang, 2001; Yang, Zhang & Li, 2001). Our different methods below extract rules based on different criteria for selecting the LHS. In this work, we restrict the RHS in the following way. Let {U1, U2, …, Un} be the candidate URLs for the RHS that can be predicted based on the same LHS. We build a rule LHS → Uk where the pair {LHS, Uk} occurs most frequently in the rows of the table among all Ui in the set {U1, U2, …, Un}. Ties are broken arbitrarily. This is the rule with the highest support among all LHS → Ui rules.

The first rule representation we consider is called the subset rules. These rules are the same as traditional association rules, which simply ignore the order and adjacency of accesses. Thus, when association rule mining methods, such as the Apriori method (Han & Fu, 1995; Srikant & Agrawal, 1995, 1996), are applied to the log table, we obtain the subset rules.

The second rule representation is called the subsequence rules, which take into account the order information in the sessions. A subsequence within the antecedent window is formed by a series of URLs that appear in the same sequential order as they were accessed in the Web log data set. However, they do not have to occur right next to each other, nor are they required to end with the antecedent window. When this type of rule is extracted from the log table, the left-hand side of the rule includes the order information.

For each rule of the form LHS → RHS, we define the support and confidence as follows:

$$\mathrm{sup} = \frac{\mathrm{count}(LHS, RHS)}{\mathrm{count}(Table)} \qquad (1)$$

$$\mathrm{conf} = \frac{\mathrm{sup}(LHS, RHS)}{\mathrm{sup}(LHS)} \qquad (2)$$

In the equations above, the function count(Table) returns the number of rows in the log table, and

$$\mathrm{sup}(LHS) = \frac{\mathrm{count}(LHS)}{\mathrm{count}(Table)} \qquad (3)$$

From these rules, we can obtain interesting association relations between courses. For example, our rules can inform the teachers that “Students who find Chapter 3 useful also find Chapter 5 useful.” Knowledge like this allows the teachers to organize the two chapters together in the Web structure. It also allows teachers to recommend new chapters to students based on their current reading. Similarly, the same associations can be used to help organize the material better or form better student study groups. For example, a rule such as “Students who attend Wednesday classes often have difficulty with Calculus I” enables the teacher to improve the online Calculus I material, or to organize the students in that class to work together with students from other classes. We also plan to use different user information and log data to perform collaborative filtering analysis and provide recommendations (Breese, Heckerman & Kadie, 1998) using Pearson correlation.

The framework discussed above assumes that the knowledge points are given beforehand. However, these knowledge points can be discovered from the Web logs as well. Pitkow and Pirolli (1999) provide a longest subsequence mining method for extracting user profiles. Su et al. (2002) provide an interesting method for clustering based on the Web logs alone. In our study, we plan to combine both the content information and the user behavior information from the Web logs to derive the clusters. The method that we propose to use is called clustering. Due to space limitations, we will not go into detail on this subject.

A DISTANCE-LEARNING CASE STUDY

When a student connects to our NEC (Network Education College) homepage (http://www.nec.sjtu.edu.cn), he can select which chapter or section to study. Our system provides multimedia study materials for students, including video, audio, images and text documents. The learning resources are well organized for study convenience. During a learning session, a student may have a question to ask. Our system provides a button on every study page that links the student to the Answer Machine at any time. When the student clicks the “Answer Center” button, he sees the Ask Question page. In this window, he can input the question in natural language and submit it, as shown in Figure 4.

After receiving this initial query, the system shows a list of similar questions to the student. The student can choose the most similar one to see the answer. If none of the listed questions is relevant, the student can submit the question to a teacher (see Figure 5).
Beyond these functions, the Answer Center also provides other services, such as the Hot Spot of Lesson, the Hot Spot of Chapter, Search Answer and so on. For example, the Hot Spot of Chapter provides the hotspot discussions of every chapter. The hotspot discussions help students find out what questions other students have asked and what the correct answers are.

The user can see the distribution of questions for a chapter or section in the selected time span. The results can be shown in graphs, pie charts, histograms and so on. The user can choose whichever form he likes and look into the details by clicking each part of the diagram (see Figure 6).

[Figure 5: Answering the Questions in Answer Centre]

[Figure 6: Framework of the Data Analysis Centre]

In addition, the relations among knowledge points can be shown in 2D or 3D graphs. According to what precedes and follows a knowledge point, the system can recommend the essential knowledge to learn or to prepare.

CASE-BASED REASONING FOR PERSONALIZED INTERACTION

To interact with students in such a way that they feel they are talking to a virtual teacher, we employ case-based reasoning to reuse previous questions and answers. Case-based reasoning (Yang & Wu, 2001; Kolodner, 1993; Leake, 1996) is a technique for reusing past problem-solving experiences to solve future problems. The basic idea is based on analogy, whereby similar problems are found and their solutions are retrieved and adapted for solving the new problem. The effectiveness of a CBR system critically depends on the speed and quality of the case base retrieval process.
If the retrieved cases are not accurate or the retrieval performance is too low, a CBR system cannot function as expected. If too many seemingly similar solutions are retrieved, as in the case of some Web browsers where thousands of items are returned, a CBR system cannot provide its users with much assistance either.

In using a CBR system, we must first accumulate a set of cases. The cases in our domain are the questions and their corresponding answers that students and teachers have used in the past. These questions and answers form what we call question-answer pairs. Each question can be further divided into a number of important keywords using methods from information retrieval. The keywords correspond to features or attributes in a machine learning system. These features are linked to their answers through weighted links, where the weights encompass much of the domain knowledge in teaching the course. These weights can be learned or trained using the previously obtained questions and answers.

Given the input feature-value pairs, the first-layer features are set to their values. For example, a keyword may be used by a student in describing a problem. In this case, that keyword gets a value of one. If a keyword does not appear in a question, it obtains a value of zero. A similarity function is then used to score the stored questions against the input question. The similarity function we use is the TF-IDF formula used in information retrieval. The documents in this domain correspond to the questions for which the system has answers from previous problem-solving sessions. The TF-IDF scores are calculated by comparing the input question with all stored questions. The top-n most similar questions are chosen, and their answers are provided as potential answers for the student.
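The retrieval step can be sketched in Python as follows. The paper does not give the exact formula it uses, so the details here are assumptions: whitespace tokenization, a smoothed IDF of the form ln((1 + n) / (1 + df)) + 1, and cosine similarity between TF-IDF vectors. The function names and the sample questions are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(questions):
    """Return (idf, vectors): IDF weights and one TF-IDF vector per stored question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n = len(docs)
    # Document frequency: in how many stored questions each term appears.
    df = Counter(term for doc in docs for term in doc)
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_n(query, questions, n=3):
    """Return up to n (stored question, score) pairs, most similar first."""
    idf, vectors = build_index(questions)
    query_tf = Counter(tokenize(query))
    # Query terms never seen in a stored question carry no weight.
    query_vec = {t: tf * idf[t] for t, tf in query_tf.items() if t in idf}
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(questions[i], score) for score, i in scored[:n] if score > 0]


stored = [
    "what is the first law of thermodynamics",
    "how do I submit an assignment",
    "what is the second law of thermodynamics",
]
matches = top_n("explain the first law", stored, n=2)
# matches lists the first-law question ahead of the second-law question
```

An empty result from `top_n` corresponds to the situation in which no stored question resembles the input, which is the point at which the question would be handed to a teacher.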
If the system cannot find a similar question with an answer, the student always has the choice of contacting a teacher directly. The system then routes the question to the most qualified teacher in its knowledge base. The routing module is another interesting case of using data mining, where the capabilities of teachers are modeled and updated as more questions are answered for the students.

CONCLUSIONS AND FUTURE WORK

In this paper, we have presented an open, adaptive framework to organize the course material. The heart of the intelligent system lies in a smart front-end system we call Answer Machine, and an intelligent back-end system using Web log association analysis and clustering analysis. In the future, we plan to run more tests on the system’s performance using the data we accumulate through real teaching sessions. Such validation will allow us to select the best intelligent teaching methods for an open virtual teaching environment.

REFERENCES

Agrawal, R. and Srikant, R. (1994). Fast algorithms for mining association rules. In Proceedings of VLDB’94, Santiago, Chile, 487-499.

Breese, J., Heckerman, D. and Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In Proceedings of the Fourteenth Conference on Uncertainty in AI, Madison, WI.

Chee, S., Han, J. and Wang, K. (2001). RecTree: An Efficient Collaborative Filtering Method. Proceedings of the DaWaK 2001, 141-151.

Ganti, V., Gehrke, J. and Ramakrishnan, R. (1999). Mining very large databases. Computer, 32(8), 38-45.

Groeneboer, C., Stockley, D. and Calvert, T. (1997). Virtual-U: A collaborative model for online learning environments. Proceedings of the Second International Conference on Computer Support for Collaborative Learning, Toronto, Ontario, Canada.

Han, J. and Fu, Y. (1995). Discovery of multiple-level association rules from large databases. Proceedings of VLDB’95, Zürich, Switzerland, 420-431.

Kolodner, J. (1993). Case-Based Reasoning. San Mateo, CA: Morgan Kaufmann Publishers, Inc.

Leake, D.B. (1996). Case-based Reasoning - Experiences, Lessons and Future Directions. Boston, MA: AAAI Press/The MIT Press.

Li, I.T., Yang, Q. and Wang, K. (2001). Classification Pruning for Web-request Prediction. Poster Proceedings of the 10th World Wide Web Conference (WWW10), Hong Kong, China.

Pitkow, J. and Pirolli, P. (1999). Mining Longest Repeating Subsequences to Predict WWW Surfing. Proceedings of the USENIX Annual Technical Conference.

Srikant, R. and Agrawal, R. (1995). Mining generalized association rules. Proceedings of VLDB’95, Zürich, Switzerland, 407-419.

Srikant, R. and Agrawal, R. (1996). Mining quantitative association rules in large relational tables. In Proceedings of SIGMOD’96, Montreal, Canada, 1-12.

Su, Z., Yang, Q., Zhang, H.J., Xu, X., Hu, Y. and Ma, S. (2002). Correlation-based Web-Document Clustering for Web Interface Design. International Journal of Knowledge and Information Systems, 4, 141-167.

WebCT: Available online at: http://www.webct.com.

Yang, Q., Zhang, H. and Li, I.T. (2001). Mining Web Logs for Prediction Models in WWW Caching and Prefetching. In Proceedings of the 7th ACM International Conference on Knowledge Discovery and Data Mining (KDD’01), San Francisco, 473-478.

Yang, Q. and Wu, J. (2001). Enhancing the Effectiveness of Interactive Case-Based Reasoning with Clustering and Decision Forests. Applied Intelligence Journal, 14(1), 49-64.

Zhang, Z. and Yang, Q. (2001). Feature Weight Maintenance in Case Bases Using Introspective Learning. Journal of Intelligent Information Systems, 16, 95-116.

Ruimin Shen received the BS and MS degrees in Computer Science from Qing Hua University, Beijing, China, in 1991. He became a Professor and PhD supervisor in the Department of Computer Science and Engineering, Shanghai Jiaotong University, in 1998. His research interests include Network Information Process, Knowledge Discovery and Data Mining, Multimedia Network Cooperation, Content Based Index, E-Learning and Wireless Network Education Technology.

Peng Han received the BS degree from the Institute of Communication Engineering, Nanjing, China, in 1998, and the MS degree in Computer Science from the University of Science and Technology, Nanjing, China, in 2001. He is now a PhD student in Computer Science and Technology at Shanghai Jiaotong University, Shanghai, China. His research interests include Content Based Index and Retrieval, Information Retrieval, and Data Management.

Fan Yang received the BS degree from the Institute of Communication Engineering, Nanjing, China, in 1998, and the MS degree in Computer Science from the University of Science and Technology, Nanjing, China, in 2001. She is now a PhD student in Computer Science and Technology at Shanghai Jiaotong University, Shanghai, China. Her research interests include Data Mining, Web Mining, Case Based Reasoning, and Collaborative Filtering.

Qiang Yang is an associate professor in the Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, China. His specialties are AI planning, case-based reasoning and data mining. He obtained his PhD from the University of Maryland in 1989, and from 1989 was a faculty member at the University of Waterloo and Simon Fraser University in Canada. He is a member of the IEEE and AAAI.
Joshua Zhexue Huang is the Assistant Director of the E-Business Technology Institute (ETI) of the University of Hong Kong. His research interests are data mining, text classification, data warehousing, business intelligence and CRM. Before joining ETI in early 2000, he worked for three years at MIP Australia as a senior consultant, helping Australian companies implement business intelligence solutions. Before MIP, he was a research scientist at the Mathematics and Information Sciences Division of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia. He received his PhD degree from the Royal Institute of Technology in Sweden.