Automatic Re-Organization of Group-Wised Web Courseware

Changjie Tang, Sichuan University, China
Rynson W. H. Lau, City University of Hong Kong, China
Qing Li, City University of Hong Kong, China
Tianqing Zhang, Sichuan University, China
Danny Kilis, Pacific Century CyberWorks (PCCW), Hong Kong

ABSTRACT

With the increasing popularity of the Internet, there is a growing demand for web-based education, which allows students to study and learn at their own pace over the Internet. To improve teaching quality, however, such systems should be able to adapt the teaching to each individual student's ability and progress. Focusing on this objective, this paper proposes a new method to construct group-wised courseware by mining both the context and the structure of the courseware to build personalized Web tutor trees. To this end, the concept of Web tutor units and the notion of their similarity are presented. Five algorithms are proposed, including the Naive Algorithm for the tutor concept tree and the Level-Generate Algorithm for generating Web tutor units of level k+1. Experimental results are presented to demonstrate the effectiveness of the new method.

Keywords: distance learning, student profiling, web tutor unit, group-wised tutor tree

INTRODUCTION

With the rapid advancement in multimedia technologies and the availability of Web infrastructure, distance learning is now widely adopted in higher education in China. The e-Teacher system, an experimental software system for distance learning, has been jointly developed by the City University of Hong Kong and Sichuan University. It runs on Windows NT with IIS (Internet Information Server) and ASP (Active Server Pages). The main idea of e-Teacher is its capability of adapting the teaching to the progress of individual students. Its key mechanisms are as follows:

1) Clustering students into different groups according to their abilities. For example, group11 = (Theory, Excellent), group12 = (Theory, Medium), group13 = (Theory, Not good), group21 = (Practice, Excellent), group22 = (Practice, Medium), group23 = (Practice, Not good), etc. This teaching style is therefore called "group-wised teaching." In the extreme situation where each group has only one student, it becomes "personalized teaching." In the e-Teacher system, this function is implemented in a data warehouse called ETDW. A minimal sketch of this grouping idea is given below.

2) Constructing a group-wised courseware that can be accessed through common web browsers, such as Microsoft IE and Netscape Navigator. It is implemented based on our experience in the course "Reading Selected Articles on Web" (RSAW). RSAW is one of the core courses of the distance learning M.Sc. and Ph.D. degree programs. One of the authors is currently teaching this course to students across several provinces in China.
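Mechanism (1) can be pictured with a minimal Python sketch (ours, not e-Teacher code; the profile fields "prefers_theory" and "score", and the thresholds 85/60, are hypothetical):

  # Sketch of ability-based grouping into (track, ability) groups.
  def assign_group(profile):
      track = "Theory" if profile["prefers_theory"] else "Practice"
      if profile["score"] >= 85:
          ability = "Excellent"
      elif profile["score"] >= 60:
          ability = "Medium"
      else:
          ability = "Not good"
      return (track, ability)

  students = {
      "s01": {"prefers_theory": True,  "score": 92},
      "s02": {"prefers_theory": False, "score": 55},
  }
  groups = {}
  for sid, profile in students.items():
      groups.setdefault(assign_group(profile), []).append(sid)
  # groups == {("Theory", "Excellent"): ["s01"], ("Practice", "Not good"): ["s02"]}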
To organize the course RSAW in a group-wised style, the distance-teacher needs the following:

• A set of profiles: To store the profile information of each student, such as name, age, class, interests, background, academic records, etc.

• A tutor tree: This is a learning schema designed for each cluster of students in accordance with their abilities. Each tree node is a 2-tuple <WTUnit, Weight>, where WTUnit is a Web tutor unit (an article or a sub tutor tree) organized in a multi-resolution form, and Weight is an array of integers (containing the cluster number, course importance, and teaching hours).

• A set of evaluation and upgrading facilities: To automatically evaluate the answer sheets and exercise forms of each student, dynamically upgrade the student profiles (as feedback from the evaluation), and reorganize the student grouping based on the evaluation results.

In this paper we focus on the design of a good tutor tree. A group-wised tutor tree allows students to find the articles satisfying their personal demands in a short time. The basic idea and main steps to construct the tutor tree are as follows:

1) Use an existing (usually naive) URL tree as the initial tutor tree.
2) Configure the model to calculate similarity by adjusting the weights of the Web tutor units, and evaluate the similarity of Web tutor units.
3) Reorganize the tutor tree by similarity and group-wised keywords.
4) Establish the new tutor tree.

In current practice, distance-teachers use an existing collection of URLs, which may be considered a naive form of the tutor tree. As shown in Table 1, it works in an "eagerly collecting style" by collecting everything available, with low efficiency. Our interest in this study is in how to build efficient, group-wised tutor trees for effective distance learning.

Table 1: Sample of naive tutor tree of RSAW

  Topic of selected articles                The root of tutor tree
  Knowledge Discovery                       http://www.kdnuggets.com/
  Machine Learning Database Repositories    http://www.ics.uci.edu/~mlrepository.html

Related Work

Several research efforts related to our work have been conducted. Jiang et al. (1999) attempted to discover structures from documents. They expounded the concept of a Structural Document and developed a formula to calculate the similarity of two structural documents; the authors built a similarity matrix that can be updated by different clustering algorithms. Mannila and Toivonen (1999) proposed a method to discover generalized episodes using minimal occurrences. Agrawal et al. (1995) proposed a fast similarity search in the presence of noise in a time-series database. Tang et al. (1999, 2000) investigated methods to extract knowledge from semi-structured Web data and to discover quasi-periodicity from Web data. Spertus (1997) considered information clustering (grouping Web documents) according to some predefined profiles. Song et al. (2000) proposed a model to analyze the semantic similarity between Web documents, and their system supports manipulation of Web documents such as exchange, search and evolution. Unfortunately, most of these existing systems were developed for specific purposes, and there is no satisfactory way of constructing efficient group-wised (or personalized) tutor trees for distance learning.
Paper Organization

The rest of the paper is organized as follows. The next section introduces the concepts of the group-wised tutor tree and the Web tutor unit. Then we present five algorithms for constructing the group-wised tutor tree and show some experimental results of the new method. Finally, the last section draws a brief conclusion.

WEB-BASED TUTORING FACILITIES

Figure 1 gives a sample courseware of the course RSAW for distance-learning students. There are eight articles, some of which appeared in the Proceedings of the 16th National Database Conference in China. Each entry includes the URL, name of the proceedings, pages, title, keywords, author name, first author's sex, name of the supervisor (for student authors), field, and special topic.

Figure 1: RSAW courseware for distance learning

1. www.pru.edu.cn, 16DB, P1, "Implementation of Storage of Large Object Data", {large object, spatial multi-pointer, bitmap page}, Zhang Xiao, Male, Prof. Wang, DB, Database theory.
2. www.fudan.edu.cn, 16DB, P6, "The Query Language and Data Model of Constraint Based Spatial-Temporal Data in Digital Library", {constraint database, query language, data model}, Wang Wei, Male, Prof. Shi, DB, Database theory.
3. www.fudan.edu.cn, 16DB, P77, "Non-Monotonic Inheritance of Objects", {Deductive OO database, non-monotonic, inheritance canonical model, the inheritance diagram}, Liu Hong Liang, Male, Prof. Shi, DB, Advanced Database.
4. www.nju.edu.cn, 16DB, P93, "The Implementation of EXPRESS Object-Oriented Data Model with Relational Database Systems", {OO data model, relational data model, EXPRESS modeling language, RDBS}, Yu Yong Hong, Male, Prof. Xu, DB, Advanced Database.
5. www.scu.edu.cn, 16DB, P215, "Aggregation on Data Cube", {KDD, cube OLAP, B-Tree, dependency tree}, Liu Xin, Male, Prof. Tang, DB, Data Warehouse.
6. www.pku.edu.cn, 16DB, P308, "A Client Analysis System Prototype Based on Spatial Data Mining", {attribute-oriented induction, spatial data mining, spatial attribute-oriented induction}, Xu Qi Chang, Male, Prof. Yang, DB, Data mining.
7. www.fudan.edu.cn, 16DB, P319, "Scaling DBSCAN Algorithm to Large-scale Database by Data Sampling", {spatial database, data clustering, sampling, DBSCAN}, Fan Ye, Male, Prof. Zhou, DB, Data mining.
8. www.scu.edu.cn, 16DB, P250, "Mining Associations of Objects with Relaxed Periodicity and its Applications in Seismic Research", {KDD, relaxed periodicity, Seism}, Yang Lu, Female, Prof. Tang, DB, Data mining.

Group-Wised Tutor Tree

Two group-wised tutor trees for the course, organized in different ways, are shown in Figures 2(a) and 2(b), where the symbol '⊥' indicates "unknown." However, there may be professors and students who prefer to access the Web tutor tree in a way similar to the one shown in Figure 2(c), which organizes the selected articles according to subject. The objective of group-wised courseware is to meet such individual needs. As mentioned before, we cluster students into different groups according to their ability. When there is only one member in each group, "group-wised" becomes "personalized." In this paper, "personalized" is considered a special case of "group-wised" without further explanation.

Figure 2: Different organizations of the Web tutor tree

(a) By "University→Sex→Topic→PaperID"
  Renmin University:  male: database theory: 1;  female: ⊥
  Fudan University:   male: database theory: 2; special database: 3; data mining: 7;  female: ⊥
  Nanjing University: male: special database: 4;  female: ⊥
  Peking University:  male: data mining: 6;  female: ⊥
  Sichuan University: male: data warehouse: 5;  female: data mining: 8

(b) By "Director→Topic→PaperID"
  Prof. Wang: database theory: 1
  Prof. Shi:  database theory: 2; special database: 3
  Prof. Zhou: data mining: 7
  Prof. Xu:   special database: 4
  Prof. Tang: data warehouse: 5; data mining: 8
  Prof. Yang: data mining: 6

(c) By "Field→University→PaperID"
  database theory:  Renmin University: 1; Fudan University: 2
  special database: Fudan University: 3; Nanjing University: 4
  data warehouse:   Sichuan University: 5
  data mining:      Sichuan University: 8; Peking University: 6; Fudan University: 7

Basic Concepts and Definitions

In general, the selected articles for distance-learning students include HTML files, bookmarks, personal home pages, images, and voice files. To formalize these observations, we now define the Web tutor unit.

Definition 1 (Web Tutor Unit).

1. A Web Tutor Unit (abbreviated as WTUnit) is a recursive structure:

  struct WTUnit {
    CString ObjTitle;          // title of web page or bookmark
    SetOfCString Keywords;     // standard keywords given by the author
    CString URL_Dir;           // such as www.scu.edu.cn\CS\DB\
    WTUnit *pChildrenUnit[];   // array of children units
  };

2. Let Nk\Nk-1\...\N2\N1\ be the URL_Dir of the Web tutor unit w. The ordered set {N1, N2, ..., Nk}, arranged from the last component to the first, is called an Ordered Ancestor Set of the Web tutor unit, abbreviated as its Ancestor Set. •

It is clear that there is a 1-1 correspondence between a Web tutor unit and its URL. Thus, we can refer to a Web tutor unit by its URL when needed.
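The following minimal Python sketch (our own illustration, not code from e-Teacher) shows one way to realize Definition 1 and to derive the ancestor set from URL_Dir; following Example 2 below, the host is treated as the outermost ancestor:

  # A sketch of Definition 1 in Python; fields mirror the struct above.
  class WTUnit:
      def __init__(self, obj_title, keywords, url_dir, children=None):
          self.obj_title = obj_title        # title of web page or bookmark
          self.keywords = set(keywords)     # standard keywords given by the author
          self.url_dir = url_dir            # e.g., "www.scu.edu.cn\\CS\\DB\\"
          self.children = children or []    # children units

      def ancestor_set(self):
          # URL_Dir Nk\...\N2\N1\ yields [N1, N2, ..., Nk]: the path
          # components ordered from the last to the first.
          parts = [p for p in self.url_dir.split("\\") if p]
          return list(reversed(parts))

  w = WTUnit("papers99", {"KDD", "Time Series"}, "www.scu.edu.cn\\CS\\DB\\")
  print(w.ancestor_set())   # ['DB', 'CS', 'www.scu.edu.cn']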
Example 1. Let w be the Web tutor unit with URL = www.scu.edu.cn\CS\DB\KDD.html#papers99. ObjTitle is bookmarked as "papers99" in the file "KDD.html"; the string "papers99" is shown as the title of the current page. URL_Dir is www.scu.edu.cn\CS\DB\. Assuming that there are three hyperlinks named "Relations," "Time Series," and "Classify" in w, pChildrenUnit is the set of these hyperlinks. The author of the article gives the set of keywords. Finally, {"CS", "DB"} is the ordered ancestor set. •

We make the following observations:

1. Let w and w' be two Web tutor units. If their first two members (i.e., ObjTitle and Keywords) are similar, then from the student's point of view, w and w' are similar in content.

2. Let the URL_Dir of w be Nm\Nm-1\...\Nk\...\N2\N1\, the URL_Dir of w' be N'm\N'm-1\...\N'k\...\N'2\N'1\, and Ni = N'i for 1 ≤ i ≤ k. This indicates that the positions of w and w' in the Web organization are similar, and the larger k is, the more similar they are. Let β(Ni) be the degree of contribution of Ni to this similarity. Obviously, β(Nk) ≤ ... ≤ β(N2) ≤ β(N1).

Example 2. Suppose that w and w' correspond to the URLs "www.scu.edu.cn\CS\KDD\A.html" and "www.pku.edu.cn\Math\KDD\B.html", respectively. From observation (2), we have N1 = N'1 = KDD, N2 = CS (Department of Computer Science), N'2 = Math (Department of Mathematics), N3 is Sichuan University, and N'3 is Peking University. Files A.html and B.html are on different pages of different departments in different universities, but are similar in terms of having the same immediate ancestor (i.e., 'KDD'). •
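As a small illustration of observation (2), the sketch below (ours; it reuses the hypothetical WTUnit class above) computes the length k of the common ancestor prefix for the two units of Example 2:

  # Length k of the shared ancestor prefix (observation 2).
  def common_prefix_len(w1, w2):
      k = 0
      for a, b in zip(w1.ancestor_set(), w2.ancestor_set()):
          if a != b:
              break
          k += 1
      return k

  wa = WTUnit("A", set(), "www.scu.edu.cn\\CS\\KDD\\")
  wb = WTUnit("B", set(), "www.pku.edu.cn\\Math\\KDD\\")
  print(common_prefix_len(wa, wb))  # 1 -- only the immediate ancestor 'KDD' is shared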
Definition 2. Let UnitSet be a set of Web tutor units. The function Same is defined as:

  Same: UnitSet × UnitSet → {1, 0}

The value of Same(UnitSet1, UnitSet2) is 1 if UnitSet1 = UnitSet2, and 0 otherwise. •

Definition 3 (Similarity of Web tutor units). Let w1 and w2 be two Web tutor units.

1. The set of personalized keywords, PersKeySet = {K1, K2, ..., Kn}, is selected from the domain's standard keywords by users according to the personalized guideline.

2. Let K_SETi = wi.Keywords ∩ PersKeySet, for i = 1, 2. Then K(w1, w2) = |K_SET1 ∩ K_SET2| / |PersKeySet| is called the personalized keyword similarity of w1 and w2.

3. Let wi have ni child Web units, with children units Ci = {wi.pChildrenUnit[k] | 0 ≤ k ≤ ni} for i = 1, 2. Then C(w1, w2) = |C1 ∩ C2| / |C1 ∪ C2| is called the children similarity of w1 and w2.

4. Let the ancestor sets of w1 and w2 be {N11, N12, ..., N1p} and {N21, N22, ..., N2q}, respectively. If there exists a number k, 0 ≤ k ≤ min(p, q), such that N1i = N2i for 1 ≤ i ≤ k and N1(k+1) ≠ N2(k+1), then r = 1/2 + 1/4 + ... + 1/2^k is called the inheritance similarity of w1 and w2, denoted as A(w1, w2).

5. Let k, c and a be non-negative weights with k + c + a = 1. Then Group_Similarity(w1, w2) = k × K(w1, w2) + c × C(w1, w2) + a × A(w1, w2) is called the group-wised (or, when the group size is one, personalized) similarity of w1 and w2.

Note that:

1) The inheritance similarity is the binary number 0.11...1 = 2^-1 + 2^-2 + 2^-3 + ... + 2^-k, whose fractional part has length k. The contributions of ancestors to the similarity decrease as the series 2^-n.

2) The inheritance similarity and the children similarity reflect the resemblance of units within a Web organization, while the personalized keyword similarity describes the resemblance of units under the users' guideline.

3) The parameters a, c and k are given by the distance-teacher. (The default values used in our experiments are a = 0.2, c = 0.1, and k = 0.7.)
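Definition 3 translates directly into code. The sketch below (our own reading of the definition, continuing the hypothetical WTUnit class and common_prefix_len above) computes K, C, A and the combined Group_Similarity; the weights default to the paper's experimental values:

  def keyword_sim(w1, w2, pers_keys):                  # K(w1, w2)
      s1 = w1.keywords & pers_keys
      s2 = w2.keywords & pers_keys
      return len(s1 & s2) / len(pers_keys)

  def children_sim(w1, w2):                            # C(w1, w2)
      c1 = {c.url_dir + c.obj_title for c in w1.children}
      c2 = {c.url_dir + c.obj_title for c in w2.children}
      return len(c1 & c2) / len(c1 | c2) if c1 | c2 else 0.0

  def inheritance_sim(w1, w2):                         # A(w1, w2)
      k = common_prefix_len(w1, w2)
      return sum(2.0 ** -(i + 1) for i in range(k))    # 1/2 + 1/4 + ... + 1/2^k

  def group_similarity(w1, w2, pers_keys, k=0.7, c=0.1, a=0.2):
      return (k * keyword_sim(w1, w2, pers_keys)
              + c * children_sim(w1, w2)
              + a * inheritance_sim(w1, w2))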
ALGORITHMS FOR GROUP-WISED TUTOR TREE

We now describe how to construct the group-wised tutor tree, by introducing the five algorithms that we have developed for this purpose.

A Naive Way to Construct the Web Tutor Tree

Parameters

In the Naive Algorithm for the group-wised tutor tree, the parameters are set as follows: a = c = 0 and k = 1 (see Definition 3). That is, the role of the old organization of the Web courseware is ignored; only the keywords of the Web tutor units are used as the clustering criterion. Thus, the personalized similarity of w1 and w2 is:

  P(w1, w2) = K(w1, w2) = |K_SET1 ∩ K_SET2| / |PersKeySet|.

Training Set and Personalized Order of Keywords

The primary training set, denoted as PrimaryTutorSet, is selected by the distance-teacher from the Web courseware under the following criteria: the size of PrimaryTutorSet is big enough, say more than 100 pages; and PrimaryTutorSet is typical enough, involving typical tutor contents with typical keywords, ancestors and children units.

Let PrimaryTutorSet = {w1, w2, ..., wm} and the personalized keyword set PersKeySet = {K1, K2, ..., Kn}. We construct the Tutor Unit-Key Matrix as shown in Table 2. If the keyword ki is contained in wj, then aij = 1; otherwise aij = 0. The number si = ai1 + ai2 + ... + aim is the number of units wj containing ki, called the key activity of ki. The number tj = a1j + a2j + ... + anj is the number of keywords contained in wj.

Table 2: Tutor Unit-Key Matrix

  Unit-Key     w1    w2    ...   wj    ...   wm    Sum
  k1           a11   a12   ...   a1j   ...   a1m   s1
  ...          ...   ...   ...   ...   ...   ...   ...
  ki           ai1   ai2   ...   aij   ...   aim   si
  ...          ...   ...   ...   ...   ...   ...   ...
  kn           an1   an2   ...   anj   ...   anm   sn
  Total keys   t1    t2    ...   tj    ...   tm

Formally, we have:

Definition 4. Let the context be as in Table 2.

1) The set of personalized keywords in descending order of si is called the Descending Key Set. The number tj in Table 2 is called the group-wised intensity of the Web tutor unit wj.

2) The function that arranges a Web tutor unit set WTUnitSet in descending order of group-wised intensity is denoted as Ordered_WTUnitSet = GD_Sort(WTUnitSet).

3) Let Delta be a threshold of group-wised intensity. The set TrainWTUnitSet = GD_Sort({wj | tj > Delta}) is called the personalized training set with respect to Delta. •

It is straightforward to build the Tutor Unit-Key Matrix, to sort the keywords, and to generate the personalized training set. We thus assume that these steps are done in a preprocessing phase.
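A compact sketch of this preprocessing phase (ours, continuing the WTUnit sketch) builds the Unit-Key matrix, the key activities si, the group-wised intensities tj, and GD_Sort:

  # Preprocessing sketch: Unit-Key matrix, key activity, intensity, GD_Sort.
  def unit_key_matrix(units, pers_keys):
      keys = sorted(pers_keys)
      # a[i][j] = 1 iff key i is contained in unit j (Table 2)
      return keys, [[1 if k in w.keywords else 0 for w in units] for k in keys]

  def key_activity(matrix):              # si: row sums
      return [sum(row) for row in matrix]

  def intensity(matrix):                 # tj: column sums
      return [sum(col) for col in zip(*matrix)]

  def gd_sort(units, pers_keys):         # GD_Sort of Definition 4(2)
      keys, m = unit_key_matrix(units, pers_keys)
      t = intensity(m)
      order = sorted(range(len(units)), key=lambda j: t[j], reverse=True)
      return [units[j] for j in order]

  def training_set(units, pers_keys, delta):   # Definition 4(3)
      keys, m = unit_key_matrix(units, pers_keys)
      t = intensity(m)
      kept = [w for j, w in enumerate(units) if t[j] > delta]
      return gd_sort(kept, pers_keys)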
Candidate Tutor Concept Set

During the procedure of reorganizing the Web tutor tree, a set of similar Web tutor units can be viewed as a (new) tutor concept. To prepare for and simplify the concept-generating procedure, we define the following:

Definition 5. Let TrainWTUnitSet = {w1, w2, ..., wm} and the ancestor set of wi be Ai = {Si1, Si2, ..., Sik[i]}. Then the Candidate Concept Set is defined as: CandidateConceptSet = A1 ∪ A2 ∪ ... ∪ Am. •

Note that Sik is a candidate concept if and only if Sik is an ancestor of some wi. By "candidate" we mean that it can be selected as the basic material from which to compose a new concept, as illustrated by the following example.

Example 3. (1) Consider Figure 2(a). CandidateConceptSet = {Renmin University, Fudan University, Nanjing University, Peking University, Sichuan University, male, female, Database theory, Special database, Data mining, Data warehouse}. (2) Consider Figure 2(b). CandidateConceptSet = {Prof. Wang, Prof. Shi, Prof. Xu, Prof. Tang, Prof. Yang, Prof. Zhou, database theory, special database, data mining, data warehouse}. •

Definition 6. Let S1, S2 and S3 be strings of characters, where S3 is the longest common substring of S1 and S2. If |S3| > 0, then S1 and S2 are said to be partially similar, with similarity Sim(S1, S2) = |S3| / max(|S1|, |S2|). •

Based on a set of similar Web tutor units, new tutor concepts (or topics) can be generalized. The function GenerateConcept_Similarity(S1, S2, NewConcept, Sim) is defined as follows and explained in Example 4.

Function GenerateConcept_Similarity
Input: Web tutor concept names (strings) S1, S2.
Output: NewConcept, and the similarity Sim of S1 and S2 as the return value.
Steps:
1. L = 0; Best = ""; // initialize the longest common feature
2. for (i = 1; i ≤ |S1|; i++)
3.   for (j = 1; j ≤ |S1| - i + 1; j++) {
4.     Extract the substring S3 of length j starting at S1[i];
5.     if (S2 contains S3 and j > L) then { L = j; Best = S3; }
6.   }
7. Sim = L / max(|S1|, |S2|);
8. NewConcept = Best + "_Set";
9. Output NewConcept and Sim; •

Example 4. Based on Example 3(1) and the function GenerateConcept_Similarity, we can generate new concepts from similar concepts with similarity Sim ≥ 0.4, as shown below:

1) University_Set = {Renmin University, Fudan University, Nanjing University, Peking University, Sichuan University}, Sim = 0.5.
2) database_Set = {database theory, special database}, Sim = 0.5.
3) data_Set = {database theory, special database, data mining, data warehouse}, Sim = 0.5.

The Meta Concept Base

Some concept names are derived from their elements semantically rather than lexically, such as "Sex" = {"male", "female"}. To generate such tutor concepts (or topics) automatically, we need a Meta concept base as shown in Table 3.

Algorithm 1 (Generate new concept)
Input: CandidateConceptSet, Meta concept base, threshold of similarity δ > 0.
Output: The set of new Web tutor concepts NewConcepts within which the similarity is ≥ δ, with all elements sorted by descending group-wised intensity.
Steps:
1. NewConcept = NULL; // initialize as NULL
2. for each Ci and Cj (i < j) { ... }

Algorithm 2 inserts a tutor unit into the concept tree according to the concept hierarchy model; it is invoked in step 4 of the Naive Algorithm below.

Algorithm 3 (Naive Algorithm for the tutor concept tree)
Input: WTUnitSet, TrainWTUnitSet, threshold of similarity δ > 0, Max_L (the maximum level of the Web tutor tree), and PersKeySet ordered by key activity.
Output: Concept hierarchy model (such as "Topic→University→PaperID") and tutor tree (such as Figure 2(c)).
Procedure:
1. Build CandidateConceptSet from TrainWTUnitSet; // see Example 3
2. Invoke Algorithm 1, sort its result, and generate SortedNewConceptSet with similarity ≥ δ;
3. Assume SortedNewConceptSet = {C1, C2, ..., Cn}, and let m = min(n, Max_L). The concept hierarchy model is then C1→C2→...→Cm.
4. For each Web tutor unit w in TrainWTUnitSet, invoke Algorithm 2 and insert the tutor unit according to the concept hierarchy model C1→C2→...→Cm. Denote the resulting tree as ConceptTree.
5. // Insert all remaining Web tutor units of WTUnitSet into ConceptTree
   TempUnit = NULL; TempSim = 0;
   for (each wi ∈ WTUnitSet - TrainWTUnitSet and each unit wj in ConceptTree) {
     Simij = Group_Similarity(wi, wj); // personalized similarity, see Definition 3(5)
     if (Simij > δ and Simij > TempSim) {
       TempUnit = wj; TempSim = Simij;
       // now TempUnit is the Web tutor unit with maximum similarity to wi
     }
   }
6. Output ConceptTree as the tutor tree. •

Proposition 1. Let n be the number of Web tutor units. Assume t is the maximum number of ancestors of a tutor unit, r is the maximum length of keywords and concept names, and m is the number of levels of the concept tree. Then the complexity of the Naive Algorithm is O(p^4 + n×2^m), where p = max(n, r, t).

Proof. The complexity of step (1) is n × t. Consider step (2) of the Naive Algorithm: the size of CandidateConceptSet is not greater than n × t, and as the complexity of the function GenerateConcept_Similarity is O(r^2), the complexity of step (2) is O(r^2 × (n×t)^2). In step (3), the concept tree has m levels; thus the complexities of step (4) and step (5) are both O(n×2^m). Letting p = max(n, r, t) in the worst case, the complexity of the Naive Algorithm is O(p^4 + n×2^m). •

In practice, usually t ≤ 10, r ≤ 10, l ≤ 4, and m < n, so the worst-case complexity can be simply evaluated as O(n^4 + n×2^m).
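A rough sketch (ours, not the paper's exact Algorithm 1) of the concept-merging idea: candidate concept names are grouped whenever their longest-common-substring similarity (Definition 6) reaches the threshold δ, reproducing item (2) of Example 4:

  # Merge candidate concepts via longest-common-substring similarity.
  def generate_concept_similarity(s1, s2):
      best, L = "", 0
      for i in range(len(s1)):
          for j in range(i + 1, len(s1) + 1):
              sub = s1[i:j]
              if len(sub) > L and sub in s2:
                  best, L = sub, len(sub)
      sim = L / max(len(s1), len(s2))
      return best.strip() + "_Set", sim        # e.g., "database_Set"

  def merge_concepts(candidates, delta):
      concepts = {}
      for i, c1 in enumerate(candidates):
          for c2 in candidates[i + 1:]:
              name, sim = generate_concept_similarity(c1, c2)
              if sim >= delta:
                  concepts.setdefault(name, set()).update({c1, c2})
      return concepts

  cands = ["database theory", "special database", "data mining", "data warehouse"]
  print(merge_concepts(cands, 0.4))
  # {'database_Set': {'database theory', 'special database'}}  (Sim = 8/16 = 0.5)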
Example 5. Consider the Web tutor units (i.e., the selected articles) in Figure 1. WTUnitSet = {w1, w2, w3, w4, w5, w6, w7, w8}, where

  w1 = Renmin University \ CS \ database \ Prof. Wang \ database theory \ male \ 1.htm,
  w2 = Fudan University \ CS \ database \ Prof. Shi \ database theory \ male \ 2.htm,
  w3 = Fudan University \ CS \ database \ Prof. Shi \ special database \ male \ 3.htm,
  w4 = Nanjing University \ CS \ database \ Prof. Xu \ special database \ male \ 4.htm,
  w5 = Sichuan University \ CS \ database \ Prof. Tang \ data warehouse \ male \ 5.htm,
  w6 = Peking University \ CS \ database \ Prof. Yang \ data mining \ male \ 6.htm,
  w7 = Fudan University \ CS \ database \ Prof. Zhou \ data mining \ male \ 7.htm,
  w8 = Sichuan University \ CS \ database \ Prof. Tang \ data mining \ female \ 8.htm.

The inputs of Algorithm 3 are given below:

• WTUnitSet = {w1, w2, w3, w4, w5, w6, w7, w8}.
• TrainWTUnitSet = {w1, w2, w4, w5, w6, w8}, selected by the teacher based on the demands of the students and the personalized rules.
• PersKeySet = {data mining, database theory, special database, data warehouse, Sichuan University, 4}.

The stepwise outputs of Algorithm 3 are as follows:

1. TrainWTUnitSet = {w1, w2, w4, w5, w6, w8}.
2. CandidateConceptSet = {Renmin University, Fudan University, Nanjing University, Peking University, Sichuan University, CS, database, database theory, special database, data mining, data warehouse, male, female, Prof. Wang, Prof. Shi, Prof. Xu, Prof. Tang, Prof. Yang}.
3. The similar concept sets are:
   • University_Set = {Renmin University, Fudan University, Nanjing University, Peking University, Sichuan University}.
   • database_Set = {database theory, special database}.
   • data_Set = {database theory, special database, data mining, data warehouse}.
   • Prof._Set = {Prof. Wang, Prof. Shi, Prof. Xu, Prof. Tang, Prof. Yang}.
   • Sex_Set = {male, female}.
4. Insert w3 and w7 after obtaining the concept tree. The final concept tree is as shown in Figure 2(c). •

Group-Wised Algorithm

Level Generating Algorithm

The Naive Algorithm (Algorithm 3) is simple and acceptably fast, but its concept hierarchy cannot be modified once constructed. In particular, step (5) of the Naive Algorithm only inserts Web tutor units according to similarity; it cannot change the concept hierarchy. To develop a more flexible algorithm, we define the following:

Definition 8 (multi-level tutor units).

1. A tutor unit at the 0-th level, v^0, is a usual Web tutor unit v, with v^0.Keywords = v.Keywords.

2. Let v1, v2, ..., vn be tutor units at the k-th level. Then a (k+1)-th level tutor unit v^(k+1) composed from v1, v2, ..., vn is a Web unit satisfying the following conditions:
   a. v^(k+1) has exactly the n children units v1, v2, ..., vn, i.e., *pChildrenUnit[i] = vi for i = 1, 2, ..., n.
   b. v^(k+1).Keywords = ∩ vi.Keywords over all i, 0 < i ≤ n.
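A one-line sketch (ours, reusing the hypothetical WTUnit class) of how a (k+1)-th level unit is composed per Definition 8, with its keyword set as the intersection of its children's:

  # Compose a (k+1)-th level tutor unit from k-th level units (Definition 8).
  def compose_level_up(children, title="Title_I"):
      keywords = set.intersection(*(c.keywords for c in children))
      return WTUnit(title, keywords, url_dir="", children=list(children))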
Algorithm 4 (Level-Generate Algorithm, generating the Web tutor units of the (k+1)-th level)
Input: Tutor units at the k-th level: v1^k, v2^k, ..., vn^k, a threshold of similarity δ > 0, and the personalized keyword set PersKeySet.
Output: Tutor units at the (k+1)-th level: v1^(k+1), v2^(k+1), ..., such that the similarity within each vi^(k+1) is not less than δ.
Procedure: Level_Generate(k+1)
1. CurrWTUnitSet = GD_Sort({v1^k, v2^k, ..., vn^k}, ti); // ti and GD_Sort as in Definition 4: initialize CurrWTUnitSet as the input, ordered by descending personalized intensity
2. OutputWTUnitSet = NULL;
3. for (each vi^k in CurrWTUnitSet) {
4.   CurrWTUnitSet = CurrWTUnitSet - {vi^k}; // to avoid an endless loop
5.   v = new WTUnit; // generate a new Web tutor unit v as a buffer
6.   v.Keywords = PersKeySet ∩ vi^k.Keywords; // initialize the keywords of the new WTUnit
7.   v.pChildrenUnit = {vi^k}; // insert vi^k as the first child tutor unit of v
8.   v.ObjTitle = "Title_I"; // set a default title; it is modifiable
9.   for (each vj^k in CurrWTUnitSet)
10.    if (Group_Similarity(vi^k, vj^k) > δ) { // personalized similarity, see Definition 3(5)
11.      CurrWTUnitSet = CurrWTUnitSet - {vj^k};
12.      v.pChildrenUnit = v.pChildrenUnit ∪ {vj^k}; // insert vj^k as a child tutor unit of v
13.      v.Keywords = v.Keywords ∩ vj^k.Keywords;
14.    } // end of if
15.  // end of inner for
16.  for (each pair (vj^p, vj^q) of children units of v) {
17.    call GenerateConcept_Similarity(vj^p.ObjTitle, vj^q.ObjTitle, Title, Sim);
18.    if (Sim is the maximum so far) then v.ObjTitle = Title;
19.  }
20.  if (2 > the number of children units in v) then v = v ∪ {ε}; // ε is a zero unit
21.  v^(k+1) = v; // v is the Web tutor unit of level (k+1)
22.  OutputWTUnitSet = OutputWTUnitSet ∪ {v^(k+1)};
23. } // end of outer for
24. Output OutputWTUnitSet; •

Proposition 2. Let n be the number of k-th level Web units input to Algorithm 4. The complexity of Algorithm 4 is O(n^2).

Proof. In Algorithm 4, the complexity of line 1 is O(n×log(n)), and the complexity of lines 3 to 15 is O(n). Line 17 (the function GenerateConcept_Similarity) is called at most n(n-1)/2 times, at a total cost of O(n^2). The cost of the comparisons in line 18 is no greater than O(n^2). Thus the total complexity can be evaluated as O(n^2). •

To simplify the algorithm, a zero tutor unit ε is introduced, which can be viewed as a bookmark pointing to an empty URL. ε is inserted into a unit of OutputWTUnitSet if that unit has only one child unit. In the following, Algorithm 5 calls the procedure Level_Generate(k+1) from k = 0 until its output set equals its input set.

To simplify Algorithm 5, we also need the concept of the embedded level tree. Suppose that there are three sets: A = {1, 2, 3, 4}, B = {{1, 2}, {3, 4}}, and C = {1, {2, {3, 4}}}. The results of expanding them are shown in Figures 3(a), 3(b) and 3(c), and are called embedded level trees. The procedure that expands an embedded level tree is denoted as Tree_Expand; its details are omitted due to its simplicity.

Figure 3: Embedded Level Tree. (a) The flat set A expands to a root with the four leaves 1, 2, 3, 4; (b) B expands to a root with two internal nodes {1, 2} and {3, 4}, each carrying two leaves; (c) C expands to a chain of nested nodes ending in the leaves 3 and 4.
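Since Tree_Expand is omitted in the text, here is one possible reading (ours) in Python, using nested tuples as the embedded level tree:

  # A possible Tree_Expand: recursively print a nested "embedded level tree"
  # such as C = {1, {2, {3, 4}}}; tuples stand in for nested sets.
  def tree_expand(node, depth=0):
      print("  " * depth + ("*" if isinstance(node, tuple) else str(node)))
      if isinstance(node, tuple):
          for child in node:
              tree_expand(child, depth + 1)

  A = (1, 2, 3, 4)          # Figure 3(a)
  B = ((1, 2), (3, 4))      # Figure 3(b)
  C = (1, (2, (3, 4)))      # Figure 3(c)
  tree_expand(C)
  # *
  #   1
  #   *
  #     2
  #     *
  #       3
  #       4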
Algorithm 5 (Group-Wised Algorithm for generating the multi-level Web tutor tree)
Input: 0-level Web tutor unit set {v1^0, v2^0, ..., vn^0}, threshold of similarity δ > 0, and the personalized keyword set PersKeySet.
Output: The multi-level Web tutor tree and its root node.
Procedure:
1. InputWTUnitSet = {v1^0, v2^0, ..., vn^0};
2. OutputWTUnitSet = NULL;
3. Root = NULL; // initialize the multi-level Web tutor unit
4. k = 0;
5. while (k ≥ 0) {
6.   CurrWTUnitSet = InputWTUnitSet; // initialize it
7.   Level_Generate(k+1); // Algorithm 4: generate the Web tutor units of the (k+1)-th level
8.   if (OutputWTUnitSet == InputWTUnitSet) {
9.     Insert each element of OutputWTUnitSet into Root;
10.    Exit;
11.  } else { InputWTUnitSet = OutputWTUnitSet; k = k + 1; }
12. } // end of while
13. Tree_Expand(Root); // expand Root into a tree; see Figure 3
14. Delete all zero units ε in Root and output it; •

Proposition 3. Let n be the number of Web tutor units considered. The complexity of Algorithm 5 is O(n^3).

Proof. In Algorithm 5, the complexity of lines 1-4 and lines 13-14 is O(n). In the while loop, line 7 incurs the maximum complexity; the complexity of the other lines is O(n). In the worst case, n Web tutor units can be embedded in n levels, so the while statement loops at most n times. By Proposition 2, each call in line 7 costs O(n^2). Thus the total complexity can be evaluated as O(n^3). •
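Putting the pieces together, a condensed sketch (ours; it reuses group_similarity and compose_level_up from the earlier sketches and simplifies Algorithm 4 to its grouping core) shows the driver loop of Algorithm 5:

  # Simplified driver in the spirit of Algorithms 4 and 5: repeatedly group
  # units whose pairwise similarity exceeds delta into higher-level units,
  # until the set of units no longer shrinks (the fixpoint in Algorithm 5).
  def level_generate(units, pers_keys, delta):
      remaining, output = list(units), []
      while remaining:
          seed = remaining.pop(0)
          cluster = [seed]
          for u in remaining[:]:
              if group_similarity(seed, u, pers_keys) > delta:
                  cluster.append(u)
                  remaining.remove(u)
          # 1-element clusters keep a lone child (the paper pads with a zero unit)
          output.append(compose_level_up(cluster))
      return output

  def build_tutor_tree(units, pers_keys, delta):
      current = list(units)
      while True:
          nxt = level_generate(current, pers_keys, delta)
          if len(nxt) == len(current):   # fixpoint: no further merging occurred
              return nxt                 # roots of the multi-level tutor tree
          current = nxt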
Comparison of Algorithms 3 and 5

Tables 4-7 compare Algorithm 3 (the Naive Algorithm) and Algorithm 5 (the Group-Wised Algorithm). There is an interesting observation that Algorithm 5 looks simpler than Algorithm 3 yet has higher efficiency. The reason is that Algorithm 4 (which generates the (k+1)-th level WTUnits) has already absorbed the difficulties.

Table 4: The input of Algorithms 3 and 5

  Algorithm 3: WTUnitSet, TrainWTUnitSet, threshold of similarity δ > 0, PersKeySet.
  Algorithm 5: WTUnitSet as the 0-level Web tutor unit set (no need for TrainWTUnitSet), threshold of similarity δ > 0, PersKeySet.

Table 5: The output of Algorithms 3 and 5

  Algorithm 3: Topic hierarchy model, such as Field→University→PaperID, and Web tutor tree (cf. Figure 2(c)).
  Algorithm 5: Web tutor tree.

Table 6: The calling features of Algorithms 3 and 5

  Algorithm 3: Calls the procedure Group_Similarity(...) and the meta concept base; it works in the style of supervised classification.
  Algorithm 5: Calls Algorithm 4 to generate the tutor units of the (k+1)-th level from the units of the k-th level; it works in the style of unsupervised classification.

Table 7: The data-flow features of Algorithms 3 and 5

  Algorithm 3: Primary training set → candidate concept set → tutor concept hierarchy model (by Algorithm 1) → tutor concept tree (by Algorithm 2).
  Algorithm 5: 0-th level objects → 1st level objects → ... → k-th level objects (by Algorithm 4).

EXPERIMENTAL RESULTS AND ANALYSES

We have done an initial experimental study based on an implementation of Algorithm 5. To avoid the side effects of network bottlenecks, we first downloaded the selected Web tutor units from the Web, in the format illustrated in Figure 1; during testing, all inputs were available from a local computer. The experimental setting is as follows:

Input: 0-level Web tutor units WTUnitSet = {u4, u5, u6, u7, u8}; k = 0.7, c = 0.1, a = 0.2, δ = 0.3; PersKeySet = {Data Mining, Association Rule}. The items "keywords" and "field (of research)" and the abstracts of all articles in Figure 1 are taken as the Keywords of the tutor units.

Output: Shown in Figure 4.

Figure 4: Calculated result. The root has the children {u8, u6, u7}, u4 and u5; the node {u8, u6, u7} has the three leaves u8, u6 and u7.

Stepwise Analysis: Because these WTUnits have no child tutor units, the children similarity C = 0 for all WTUnits. The program can be traced in a stepwise manner, which allows us to view the intermediate results as follows:

1. k = 0, call Level_Generate(1): After GD_Sort(WTUnitSet, ti), CurrWTUnitSet = {u8, u6, u7, u4, u5}. Using the functions K(x,y), C(x,y) and A(x,y) defined in Definition 3, we have: K(u8, u6) = 2/2 and A(u8, u6) = 0, so Group_Similarity(u8, u6) = 0.7×2/2 + 0.2×0 = 0.7; Group_Similarity(u8, u7) = 0.7×1/2 = 0.35; and Group_Similarity(u8, u4) = 0. Since K(u8, u5) = 0 and A(u8, u5) = 1/2 + 1/4 + 1/8 + 1/16 = 15/16, we get Group_Similarity(u8, u5) = 0.7×0 + 0.2×15/16 = 0.19. Since Group_Similarity(u8, u6) = 0.7 > δ and Group_Similarity(u8, u7) = 0.35 > δ, we obtain v1^1 = {u8, u6, u7} with v1^1.Keywords = {Data Mining}, leaving CurrWTUnitSet = {u4, u5}. In the same way, we obtain v2^1 = {u4, ε} with v2^1.Keywords = ∅, and v3^1 = {u5, ε} with v3^1.Keywords = ∅. So OutputWTUnitSet = {{u8, u6, u7}, {u4, ε}, {u5, ε}}. Since OutputWTUnitSet and InputWTUnitSet have different values, we take the further step below.

2. k = 1, InputWTUnitSet = OutputWTUnitSet, call Level_Generate(2). After calling Level_Generate(2), the set OutputWTUnitSet is equal to the set InputWTUnitSet. Hence, Root = OutputWTUnitSet = {{u8, u6, u7}, {u4, ε}, {u5, ε}}.

3. Deleting all zero Web tutor units, we have Root = {{u8, u6, u7}, {u4}, {u5}}. Then, calling Tree_Expand(Root), the final result is as shown in Figure 4.

CONCLUSION

In this paper, we have presented concepts for Web-based group-wised distance learning, such as the multi-level Web tutor unit, its similarity, and the personalized intensity. We presented five algorithms, including the Naive Algorithm for the simple Web tutor tree, the Level-Generate Algorithm for generating Web tutor concepts of level k+1, and the Group-Wised Algorithm for the personalized Web tutor tree. Experimental results have been provided to illustrate our approach. We are currently studying several further research issues, such as performance improvement and the design and incorporation of the weighting concept into the personalized Web tutor tree.

ACKNOWLEDGMENTS

This project is supported in part by a CERG grant from the Research Grants Council of Hong Kong (RGC Reference Number: CPHK 77/95E) and by National Science Foundation of China grant #60073046.

REFERENCES

Agrawal, A., & Mannila, H. (1996). Fast Discovery of Association Rules. In Advances in Knowledge Discovery and Data Mining. AAAI Press and The MIT Press.

Agrawal, A., Ling, K., Harpreet, S., & Shim, S. (1995). Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. Proc. of VLDB.

Chim, J., Lau, R.W.H., Si, A., Leong, H.V., Green, M., & Lam, M. (1998). Multi-Resolution Model Transmission in Distributed Virtual Environments. Proceedings of ACM VRST'98, 25-34.

Fayyad, U., & Piatetsky-Shapiro, G. (Eds.) (1996). Advances in Knowledge Discovery and Data Mining. AAAI Press and The MIT Press, 1-5.

Guo, Y., Zhang, T., Yin, H., & Tang, C. (2000). The Implementation of Data Warehouse ETDW in Distance Education System E-Teacher. Proc. of National Database Conference of China (NDBC'00), B-Series, 249-252.

Jiang, M., Tseng, S., & Tsai, C. (1999). Discovering Structure from Document Databases. Proc. of PAKDD'99, 169-173.

Mannila, H., & Toivonen, H. (1999). Discovering Generalized Episodes Using Minimal Occurrences. Proc. of Int'l Conf. on Knowledge Discovery and Data Mining.

Song, W., Cheung, D., & Tan, C. (2000). A Semantic Similarity Approach to Electronic Document Modeling and Integration. Proceedings of the International Conference on Web Information Systems Engineering (WISE'00), 108-116.

Spertus, E. (1997). ParaSite: Mining Structural Information on the Web. Proc. of the International World Wide Web Conference.
Tang, C., Yu, Z., You, Z., Zhang, T., & Lu, Y. (2000). Mine the Quasi-Periodicity From Web Data. Journal of Computer, 23(1), 52-59.

Tang, C., Lau, R.W.H., Yin, H., Li, Q., Lu, Y., Yu, Z., Xiang, L., & Zhang, T. (1999). Discovering Tendency Association Between Objects with Relaxed Periodicity and its Application in Seismology. Proc. of ICSC'99, LNCS 1749, 51-62.

Changjie Tang received his M.Sc. from the Department of Mathematics, Sichuan University, in 1981. He is a Professor and the Head of the Computer Science Department of Sichuan University. His current interests are in Web data mining. He has published eight books and more than 100 research papers in journals and international conferences, such as FODO, IFIP, SIGMOD, DASFAA, ICSC, TCS (Theoretical Computer Science), LNCS, JOS, JCST, and SC (Science of China). Prof. Tang has served as a PC member of VLDB'97, DASFAA'99, ICSC'99, PAKDD2001, DASFAA2001, WAIM2000, WAIM02 and WAIM03. He is also a vice director of the Database Society of the Chinese Computer Federation. (Email: chjtang2002@sohu.com or chjtang@scu.edu.cn)

Qing Li received his B.Eng. degree from Hunan University (Changsha, China), and his M.Sc. and Ph.D. degrees from the University of Southern California (Los Angeles, USA), all in computer science. He is currently an Associate Professor at the City University of Hong Kong, as well as an adjunct Professor of Hunan University. His research interests include database modeling, multimedia retrieval and management, and e-learning systems. Dr. Li has published over 100 papers in technical journals and international conferences in these areas, and is actively involved in the research community by serving as a journal reviewer, programme committee chair/co-chair, and as an organizer/co-organizer of several international conferences. Currently he serves as a councillor of the Database Society of the Chinese Computer Federation, and as a Steering Committee member of the international WISE Society. (Email: itqli@cityu.edu.hk)

Rynson W.H. Lau received a (top) first class honors degree in Computer Systems Engineering from the University of Kent at Canterbury, England, and a Ph.D. degree in Computer Graphics from the University of Cambridge, England. He is currently an Associate Professor at the City University of Hong Kong. His research interests include computer graphics, virtual reality and multimedia systems. (Email: rynson@cs.cityu.edu.hk)

Tianqing Zhang received the B.E. and M.E. degrees in computer science from Sichuan University, China, in 1993 and 1996, respectively. He is currently a Ph.D. candidate in the Computer Science Department, Sichuan University. His main research interests are database systems. (Email: zhangtianqing@cs.scu.edu.cn)

Danny Kilis received advanced academic training in both Computer Science and Electrical Engineering from the University of Minnesota, Minneapolis. He has over 17 years of experience in leading and managing eCommerce and Information Technology projects. His work experience spans a wide spectrum of functional areas, including eCommerce (B2B and B2C) product development, pre- and post-sales consulting, product marketing and academic research.
He is currently the Senior Vice President for the eProduct Development and Services teams in Hong Kong and Shenzhen (China) for PCCW. His teams focus on developing the company's strategic eCommerce software products. He has pioneered the product strategy for the company's ConXerto Product Suite (http://conxerto.pccw.com), a set of software products targeted at the collaborative commerce area. In addition to his product role, Dr. Kilis advocates the group's object-oriented and component development strategies and processes. He is well experienced in such technologies as Internet/extranet architecture, object orientation, component software engineering, software reuse, and artificial intelligence. Prior to joining PCCW, he co-founded an offshore eCommerce Solutions Development Center in Manila for GE Information Services, which, in turn, developed the company's global eCommerce software products. He has also single-authored several U.S. patents on software technology for IBM.