Journal of Distance Education Technologies, 4(3), 36-47, July-September 2006
Copyright © 2006, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.

Information Retrieval in Virtual Universities

Juha Puustjärvi, Helsinki University of Technology, Finland
Päivi Pöyry, Helsinki University of Technology, Finland

ABSTRACT

Information retrieval in the context of virtual universities deals with the representation and organization of, and access to, learning objects. The representation and organization of learning objects should give the learner easy access to them. In this article, we give an overview of the ONES system and analyze the relevance of two information retrieval models for virtual universities. We argue that keyword-based search (i.e., the Boolean model), though well suited for Web searches, is overly coarse for virtual universities. Instead, the vector model, on which our implemented search engine is based, seems more appropriate because it provides a similarity measure (i.e., the learning object having the best match is presented first). We also compare the performance of four algorithms for computing the similarities (matching).

Keywords: algorithms; case study; distance learning; information retrieval; Web-based education

INTRODUCTION

Today people in all professions are faced with increasing demands. Technology develops at an ever-increasing speed, and the roles of people in work, society, and industry are shifting constantly. Keeping up with the pace of change requires continuous education and learning. Traditional campus universities are trying to answer this need for lifelong learning by building virtual universities, whilst facing competition from commercial continuing education providers in the form of e-learning.
This paper appears in the publication International Journal of Distance Education Technologies, Volume 4, Issue 3, edited by Shi-Kuo Chang and Timothy Shih. © 2006, Idea Group Inc. IDEA GROUP PUBLISHING, 701 E. Chocolate Avenue, Suite 200, Hershey PA 17033-1240, USA. Tel: 717/533-8845; Fax: 717/533-8661; URL: http://www.idea-group.com. ITJ3292

E-learning can be defined as an information technology enabled and supported form of distance learning, in which the traditional restrictions of classroom learning have disappeared. The main tool of e-learning is a personal computer, and the Internet serves as the principal communication and distribution channel. The learners can participate in online Web-based courses and interact with peers, instructors, and the learning materials.

E-learning sets new requirements for universities: they have to build global learning infrastructures, course material has to be in digital form, course material has to be distributed, and learners must have access to various virtual universities.

As individual virtual universities are created independently, they may provide very heterogeneous functionalities and user interfaces. Ideally, the learner should be able to access all the virtual universities in a similar way (i.e., the heterogeneity of various virtual universities should not burden the learner). How this goal can be achieved is the main topic of the ONES project. Consequently, the main functions of the ONES system are to hide the distribution of e-learning portals and to hide the semantic heterogeneity (i.e., problems arising from using the same words with different meanings and vice versa).
In order to achieve these goals, the system will deploy many new technologies such as “one-stop portals,” Web services, service-oriented architecture, RDF-based annotation, ontology editors, and distance measures in searching learning objects.

In this article, we restrict ourselves to the role of searches in the ONES system. In particular, we analyze the applicability of different information retrieval technologies. Our main argument is that the technology based on the Boolean model (Yan & Garcia-Molina, 1994), though well suited for searches on the Web, is not suitable for the emerging virtual universities. Instead, for virtual universities we have to develop methods that allow learners to be more concerned with retrieving information about a subject than with retrieving data that satisfy a given query. For example, a learner may be interested in courses dealing with object-oriented programming rather than in the courses where the term “java” or “C++” is stated.

When searching for information about a subject (e.g., object-oriented programming), the search engine must somehow interpret the metadata of the learning objects and rank them according to their degree of relevance to the learner’s query. The primary goal is to retrieve all the learning objects that are relevant to a learner’s query while retrieving as few non-relevant objects as possible. Unfortunately, characterizing the learner’s information need is not a simple task. Furthermore, the difficulty lies not only in expressing the information need but also in knowing how the learning objects should be characterized with the help of the metadata descriptions.

The rest of this article is organized as follows. First, in the second section we give an overview of the architecture of the ONES system. In the third section we characterize virtual universities.
In particular, we give an overview of the e-learning environment and specify what the notion of resource-based learning incorporates. Then, in the fourth section, the role of metadata and ontologies in virtual universities is illustrated. In addition, the usability of the Boolean and the vector model in a virtual university is analyzed. Specifically, two interpretations of a hierarchical ontology in the context of the vector model, called weighted leaves and multilevel weighting, are introduced. Then, in the fifth section, the performance of four matching algorithms based on the weighted leaves and multilevel weighting principles is compared. Finally, the sixth section concludes the article by summarizing the feasibility of the proposed ideas.

THE ARCHITECTURE OF THE ONES SYSTEM

The name ONES stands for One Stop e-learning Portal. As this name suggests, a salient feature of the system is the aggregation of distance learning information from different learning sources in one portal. The idea of one-stop portals originated from one-stop shops, and it was later adopted in e-government applications. All one-stop applications have the same goal: to hide the heterogeneity and distribution of local systems. So, from the user’s point of view, a one-stop portal behaves like a centralized system.

The four main components of the ONES system are (see Figure 1):
• Aggregation portal (mediator),
• Wrappers,
• E-learning portals, and
• Course providers’ tools.

The aggregation portal supports the learners in searching for courses that match their specific needs. It differs from traditional database interfaces in that, in addition to traditional database queries, it supports fuzzy queries.
Fuzzy queries are similarity based, which means that if the similarity between a course’s profile and the learner’s query exceeds a certain threshold, they are said to match. A problem is that current database management systems do not support fuzzy queries, and therefore the ONES system itself has to support them.

From a technological point of view, the aggregation portal is a mediator (Garcia-Molina, Ullman, & Widom, 2000). It supports a virtual view that integrates several learning sources in much the same way as data warehouses do. However, since the mediator does not store any data, the mechanisms of mediators and warehouses are rather different. Since the mediator has no data of its own, it must get the relevant data from its sources and use that data to form the answer to the learner’s query.

As the data sources (e-learning portals) are created independently, it is obvious that they provide heterogeneous interfaces (e.g., they may provide different kinds of functionality, or the same functionality may be provided by different operations).

In order to hide this heterogeneity, there is a wrapper (Garcia-Molina et al., 2000) between the mediator and each e-learning portal. A wrapper is a software module that extracts data from a local e-learning portal. This implies that the wrapper must be able to accept a variety of queries from the mediator and translate any of them into the terms of the local e-learning portal. The wrapper must also communicate the result to the mediator. An important point is that each wrapper provides the same functionality to the mediator. Ideally, each wrapper provides an interface for requesting the metadata of learning objects (i.e., descriptive information about courses, course packages, and programs offered by educational institutions, e.g., universities).

From a technological point of view, each e-learning portal is a Web service (Vasudevan, 2001).
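The mediator and wrapper arrangement described above can be sketched in Python as follows. This is only an illustration of the idea, not the ONES implementation: the class names Wrapper, InMemoryWrapper, and Mediator, the method fetch_metadata, and the dictionary-shaped records are all hypothetical, and real wrappers would call remote portals rather than search an in-memory list.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Wrapper(ABC):
    """Uniform interface the mediator sees for every portal (hypothetical)."""

    @abstractmethod
    def fetch_metadata(self, subject: str) -> List[Dict[str, str]]:
        """Translate the request into the portal's terms and return records."""


class InMemoryWrapper(Wrapper):
    """Stand-in for a real portal; each portal names its subject field differently."""

    def __init__(self, portal_name: str, records: List[Dict[str, str]], subject_field: str):
        self.portal_name = portal_name
        self.records = records
        self.subject_field = subject_field

    def fetch_metadata(self, subject: str) -> List[Dict[str, str]]:
        # Map the portal's local field name onto the mediator's common vocabulary.
        return [
            {"portal": self.portal_name, "title": r["title"], "subject": r[self.subject_field]}
            for r in self.records
            if r[self.subject_field] == subject
        ]


class Mediator:
    """Stores no data of its own; answers a query by asking every wrapper."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def search(self, subject: str) -> List[Dict[str, str]]:
        results: List[Dict[str, str]] = []
        for wrapper in self.wrappers:
            results.extend(wrapper.fetch_metadata(subject))
        return results


# Two portals that store the same kind of information under different field names.
portal_a = InMemoryWrapper("A", [{"title": "Databases I", "topic": "H.2"}], subject_field="topic")
portal_b = InMemoryWrapper(
    "B",
    [{"title": "DB Design", "class": "H.2"}, {"title": "Compilers", "class": "D"}],
    subject_field="class",
)
mediator = Mediator([portal_a, portal_b])
hits = mediator.search("H.2")
print([h["title"] for h in hits])
```

The point of the design is that the mediator depends only on the common wrapper interface, so adding a new portal means writing one new wrapper and leaving the mediator unchanged.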
Web services are self-describing modular applications that can be published, located, and invoked across the Web. Once a service is deployed, other applications (e.g., an aggregation portal) can invoke the deployed service. In general, a Web service can be anything from a simple request to a complicated business process.

[Figure 1. ONES-architecture. Components shown: learners, the aggregation portal (a mediator), wrappers, eLearning portals, course providers’ tools, and course providers.]

A course provider can enter data about a course through the course provider’s tool. The main function of this tool is to provide an interface that facilitates the creation of the metadata attached to learning objects. Basically, this tool is analogous to the tools that support the content providers of electronic newspapers (Yli-Koivisto & Puustjärvi, 2002) in creating metadata items for news articles. The tool may even generate suggestions for suitable metadata items, after which the author can make the necessary modifications and enter this information into the system.

CHARACTERISTICS OF VIRTUAL UNIVERSITIES

E-Learning Environment

E-learning can be defined as an information technology enabled and supported form of distance learning, in which the traditional restrictions of classroom learning have disappeared (Liu, Chan, Hung, & Lee, 2002). The main tool of e-learning is a personal computer, and the Internet serves as the principal communication and distribution channel. The learners can participate in online Web-based courses and interact with peers and instructors as well as with the learning materials.
The teacher-centeredness of traditional learning does not hold for e-learning, where the learning process has become more and more learner centered. The learning process and the resources may be customized according to the individual needs of the learner. At the same time, the role of the teacher becomes that of a facilitator or a mentor guiding and supporting the individual process of learning (Liu et al., 2002).

Typical e-learning environments, such as WebCT and Virtual-U, offer the basic elements for delivering e-learning courses: course content delivery tools, synchronous and asynchronous discussion forums and conferencing systems, quizzes and polling, workspaces for sharing resources, white boards, evaluation and grading, logbooks, assignment submission, and so forth (Liu et al., 2002).

Studying in Virtual Universities

In recent years, the idea of a virtual university has become more and more popular in many countries all over the world. The enormous development in the field of information and communication technologies has enabled the rise of e-learning and virtual learning environments. As a result, the traditional universities face a new challenge emerging from the commercial sector of education. There is a growing need for new kinds of learning and teaching as technology advances rapidly and the skills and competencies required in working life become more demanding and increasingly dynamic.

A virtual university has been defined as a space where students are provided with higher education courses with the help of the newest information and communication technology (Niemi, 2002). The degree to which technology is used in organizing the studies may vary from purely technology-based studies to face-to-face or mixed studies that are supported by learning technologies.
The main channel of communication and delivery of teaching is the Internet (Niemi, 2002; Ryan, Scott, Freeman, & Patel, 2000). Thus, a virtual university can be seen as closely related to e-learning, which provides learning opportunities via the Internet. The difference between these two concepts is the level of studies offered: a virtual university aims to offer higher education studies, while e-learning can be used at all educational levels.

A virtual university may be an institution that uses information and communication technologies for its core activities, such as providing learning opportunities, administration, materials development and distribution, delivering teaching and tuition, and providing counseling, advising, and examinations. On the other hand, a virtual university may also be a virtual organization created through partnerships between traditional universities and other educational institutes. In addition, traditional campus universities may be regarded as virtual universities if they offer learning opportunities via the Internet or combine traditional ways of learning with e-learning (Ryan et al., 2000).

Virtual universities are expected to offer opportunities for lifelong learning to audiences otherwise excluded from university studies. The emerging virtual university can be seen as especially beneficial for industry, as technology-supported learning can be brought to workplaces and integrated more closely with work. Moreover, a virtual university can enhance organizational learning and bring competitive advantage by continuously developing the skills and knowledge of employees (Teare, Davies, & Sandelands, 1999).
Resource-Based Learning

The Internet is able to store and transmit vast amounts of information in different forms and formats. Therefore, the Internet is an ideal support for resource-based learning (RBL), which is one of the cornerstones of learning and teaching in the virtual university. RBL has been defined as a student-centered way of learning that exploits various specially designed learning materials, interactive media, and technologies. RBL can be realized as self-study or as interactive group learning, both at a distance and in the face-to-face mode (Ryan et al., 2000).

The Internet can be used to enable and support RBL in several ways (Ryan et al., 2000):
• Courses can be delivered via the Internet.
• Resources can be identified and used.
• The Internet serves as a communication and conferencing channel.
• Learning activities and assessment can be done on the Net.
• Collaborative work is enabled.
• Student management and support is enabled.

In the next section, we focus on the starting point of RBL, namely on searching for learning resources.

INFORMATION RETRIEVAL MODELS

Information retrieval in the context of virtual universities deals with the representation and organization of, and access to, learning objects. The representation and organization of learning objects should give the learner easy access to them. The system should retrieve all the learning objects that are relevant to the learner while retrieving as few non-relevant learning objects as possible.

In this section, we analyze the usefulness of different information retrieval models (Baeza-Yates & Ribeiro-Neto, 1999) for a virtual university. The model used determines the way the metadata of the learning objects are given as well as the way the learner’s queries (information needs) are presented. Before analyzing the information retrieval models, we characterize the role of metadata and ontologies in virtual universities.
Metadata and Ontologies

In order to transfer data seamlessly and efficiently in the virtual university, there has to be a standard way for both people and computer systems to communicate all necessary knowledge (Stojanovic, Staab, & Studer, 2001). One possible solution is to use metadata, and an ontology attached to it, for describing the learning objects.

The term metadata has variable interpretations depending upon the circumstances in which it is used. For example, in the context of documents the common forms of metadata include the author(s), the source of publication, the length of the document, and so forth. This kind of metadata is commonly called descriptive metadata. For example, the metadata elements of the Dublin Core (Pöyry, Pelto-Aho, & Puustjärvi, 2002) represent descriptive metadata.

Educational metadata is needed for improving the retrieval of learning objects, for supporting the management of collections of learning objects, and for supporting the decision process of the learners looking for educational resources. LOM seems to be the most powerful and most widely used metadata standard for educational information systems (Holzinger, Kleinberger, & Müller, 2001; Lamminaho, 2000). More generally, educational metadata can be used by educational institutes and professionals as well as by learners in order to describe, for example, the content, structures, and relationships of the learning objects, and to search for educational objects (Lamminaho, 2000; Stojanovic et al., 2001).

Educational metadata may describe any class of educational objects, such as study courses.
The pedagogical features of the course, the contents, special target groups, and the technical requirements of the study course can be described with the help of a metadata schema (Lamminaho, 2000). More generally, educational metadata can be used to describe, for example, the content, structures, and relationships of the learning objects (Stojanovic et al., 2001).

Educational metadata can be utilized by educational and pedagogical professionals, by the institutions offering education, and by the students searching for education. Well-designed and sufficient metadata aid the decision-making process of the students and help the educational institutions to provide suitable information about their educational supply (Lamminaho, 2000). Educational metadata is very much semantic metadata, but a thorough metadata schema must also include at least structural metadata in order to describe the learning objects efficiently.

The idea of using standardized metadata schemas is to make it possible to develop universally applicable tools for dealing with the metadata descriptions of the learning objects. In order to create metadata records containing the resource descriptions, specific tools are needed for creating the metadata according to the standards (Kassanke, El-Saddik, & Steinacker, 2001). Metadata is also useful for guiding inexperienced users through a large collection of learning resources (Strijker, 2001). Moreover, metadata is seen as value-added information that is used to arrange, describe, track, or otherwise enhance access to the object content. At the moment, metadata is becoming increasingly important as digital government and e-commerce emerge. Metadata enables increased accessibility, expanded use of objects, multi-versioning, and system improvement. The granularity of metadata, which refers to the level of detail in the description, is an important question when developing a metadata set (Gilliland-Swetland, 2000).
A salient feature of descriptive metadata is that it is external to the meaning of the document (i.e., it describes the creation of the document rather than the content of the document). The metadata describing the content of the document is commonly called semantic metadata. For example, the keywords attached to many scientific articles represent semantic metadata (Jokela, 2001).

An ontology provides a general vocabulary of a certain domain (Fridman & McGuinness, 2001), and it can be defined as “an explicit specification of a conceptualisation” (Gruber, 1993). In essence, an ontology gives the semantics to the metadata. Ontologies are formal, explicit, and shared specifications of some conceptualizations. Formal means that the ontology should be machine readable, and explicit means that the types of concepts and the constraints on their use are explicitly defined. Shared refers to the fact that an ontology must reflect a consensus (Fensel, 2001). Ontologies together with metadata enhance efficient access to information by offering possibilities to organize and categorize the content of the information system in question. In this context, an ontology is defined as a means to formalize and to specify a common terminology for a defined area of interest (Turpeinen, 2000).

In order to standardize semantic metadata, specific ontologies have been introduced in many disciplines. Typically, such ontologies are hierarchical taxonomies of terms describing certain topics. For example, the ACM Computing Classification System is a hierarchy (a tree) in which the nodes represent the classes of the taxonomy. In Figure 2, a subset of that hierarchy is represented.
The Boolean Model

Applying the Boolean model in searches requires that each learning object be augmented by a set of metadata items such as keywords or classification identifiers (e.g., the searches in the CUBER system (Pöyry et al., 2002; Pöyry & Puustjärvi, 2003) are based on the Boolean model). A learner can then query learning objects by Boolean expressions consisting of operands and operators. The operands are keywords, and the operators are typically “and,” “or,” and “not.” For example, using the ACM Computing Classification System (Figure 2), the keywords attached to a learning object might be D, H.1, and H.2.2 (corresponding to the classes Software, Models and Principles, and Physical Design). Now, if a learner presents the query “D and (B or H.1)” (i.e., learning objects having the keyword “Software” and at least one of the keywords “Hardware” and “Models and Principles”), then this learning object will match the query.

The Boolean model is intuitive and clear. Moreover, it can be implemented efficiently even for a huge number of objects. For example, many Web search engines are based on this model. However, using the model in a virtual university gives rise to the following drawbacks:
• First, the model is based on a binary decision criterion, meaning that each learning object is predicted to be either relevant or non-relevant. In reality, the resulting learning objects fit the query to a greater or lesser degree (i.e., some kind of grading should be possible).
• Second, expressing the requirements for learning objects by a Boolean expression may be difficult.
• Third, a typical problem with search engines based on the Boolean model is that the result of the query includes either too many or too few learning objects.

In the next section, we consider a more advanced model, which avoids many of the drawbacks just described.
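The Boolean matching described in this section can be sketched in Python as follows. This is a minimal illustration, not the CUBER or ONES implementation: the function name matches and the nested-tuple encoding of queries are assumptions made for the example.

```python
def matches(keywords, query):
    """Evaluate a nested Boolean query against a set of keywords.

    A query is either a keyword string or a tuple of the form
    ("and", q1, q2, ...), ("or", q1, q2, ...), or ("not", q).
    (This tuple encoding is a hypothetical choice for the sketch.)
    """
    if isinstance(query, str):
        return query in keywords
    op, *args = query
    if op == "and":
        return all(matches(keywords, a) for a in args)
    if op == "or":
        return any(matches(keywords, a) for a in args)
    if op == "not":
        (arg,) = args  # "not" takes exactly one operand
        return not matches(keywords, arg)
    raise ValueError(f"unknown operator: {op!r}")


# The learning object and the query "D and (B or H.1)" from the text.
learning_object = {"D", "H.1", "H.2.2"}
query = ("and", "D", ("or", "B", "H.1"))
print(matches(learning_object, query))  # True
```

The result is a bare True or False, which shows the first drawback listed above: the model cannot say how well an object fits the query.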
The Vector Model

The vector model differs from the Boolean model in that weights can be assigned to each metadata item of a document as well as to the keywords of the query. The idea behind this model is that we can specify the queries and the contents of the documents (e.g., learning objects) more accurately.

[Figure 2. A subset of the ACM Computing Classification System. Under the root “Subject” are the classes B. Hardware, D. Software, and H. Information Systems; H contains H.1. Models and Principles and H.2. Database Management; H.2 contains H.2.1. Logical Design, H.2.2. Physical Design, H.2.3. Languages, and H.2.4. Systems.]

Assuming that the standard metadata items (e.g., the classes in Figure 2) specify a vector space (i.e., each item (keyword) in the hierarchy represents a dimension in the vector space), we can represent each document and query as a vector in that vector space. Then we can process the query by computing the distance between the query vector and the document vectors. This kind of computing requires that the sum of the weights of each document and of each query equal a predefined constant. For convenience, the constant used is usually one.

As the result of the query, the documents are sorted in the order determined by the similarity (i.e., the document having the best match with the query is presented first). The number of documents in the result should be restricted by requiring a certain degree of similarity.

Using the vector model in a virtual university requires that the course provider assign the metadata items and their weights to each learning object. The metadata items to be used are selected from the domain ontology in use.
Depending on the course provider’s interface, this can be done in various ways. For example, as in our prototype system, there may be an ontology structure on which the course provider inserts the weights. In Figure 3, the ontology structure of Figure 2 is augmented by setting weights on the nodes “B,” “H.2,” and “H.2.2.” Note that a node having no weight actually has weight zero. Hence, the profile of the learning object can be presented by a vector in a 9-dimensional vector space as follows: [0 x D, 0 x H, 0.3 x B, 0 x H.1, 0.6 x H.2, 0 x H.2.1, 0.1 x H.2.2, 0 x H.2.3, 0 x H.2.4]. That is, the profile is a point in an orthogonal 9-dimensional vector space.

[Figure 3. A metadata specification of a learning object: the hierarchy of Figure 2 with weight 0.3 on B. Hardware, 0.6 on H.2. Database Management, and 0.1 on H.2.2. Physical Design.]

The gain from attaching metadata descriptions to learning objects is that we can use mathematical distance measures in computing learners’ queries. Further, computing the distance requires that the descriptions (vectors) be specified in an orthogonal vector space. In other words, the nodes in the hierarchy that are used in profile vectors must be independent. In practice, this means that we have to follow one or the other of the following interpretations:
• Multilevel weighting interpretation: The leaves and the inner nodes of the ontology hierarchy represent independent concepts.
• Weighted leaves interpretation: A parent node represents the union of its children. In other words, each child represents a subset of its parent. Yet the children represent independent concepts.

The intuition behind multilevel weighting is that we can express the level of a learning object (as well as of a query) by altering the weights on a node and its children. To illustrate this, let us consider the weighting of the course “Physical design in database management systems.” Now, it is obvious that the weights should be given on the node H.2 (Database management) and its children H.2.2 (Physical design) and H.2.4 (Systems). Assuming that approximately half of the course deals with databases in general and the other part deals with physical design and database management systems, giving weight 0.4 to H.2 (Database management), 0.3 to H.2.2 (Physical design), and 0.3 to H.2.4 (Systems) could be an appropriate assignment. On the other hand, if the course is very specific, then the weight of H.2 could be zero.

If we follow the weighted leaves interpretation, then in determining the profile of a learning object weights are set only on the leaf nodes of the hierarchy. Consequently, the profiles of the learning objects are specified by vectors in an orthogonal vector space, which is determined by the leaf nodes of the hierarchy. To illustrate this approach, let us again consider the weighting of the course “Physical design in database management systems.” In this case, all the weights are given on the nodes H.2.2 (Physical design) and H.2.4 (Systems) independently of the level of the course.

PROCESSING LEARNER’S QUERIES

The learner presents queries in the same way as the content provider determines the weights of the learning object; both are presented by vectors. Hence the query presents an ideal profile of the learning objects that satisfy the learner’s requirements.
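The profile vectors and the two weighting interpretations can be sketched in Python as follows. This is an illustrative sketch only: the names CLASSES, PARENT, LEAVES, and make_profile are hypothetical, and the even 0.5/0.5 split used for the weighted leaves example is an assumption, since the text does not give leaf weights for that course.

```python
import math

# Dimension order follows the 9-dimensional vector given in the text.
CLASSES = ["D", "H", "B", "H.1", "H.2", "H.2.1", "H.2.2", "H.2.3", "H.2.4"]
# Parent of each class in the Figure 2 subset; top-level classes hang off "Subject".
PARENT = {
    "D": "Subject", "H": "Subject", "B": "Subject",
    "H.1": "H", "H.2": "H",
    "H.2.1": "H.2", "H.2.2": "H.2", "H.2.3": "H.2", "H.2.4": "H.2",
}
# Leaves are the classes that are not the parent of any other class.
LEAVES = [c for c in CLASSES if c not in PARENT.values()]


def make_profile(weights, leaves_only=False):
    """Turn {class: weight} into a vector over CLASSES; unweighted classes get 0."""
    unknown = set(weights) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    if leaves_only and not set(weights) <= set(LEAVES):
        raise ValueError("weighted leaves interpretation: weights on leaf nodes only")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to one")
    return [weights.get(c, 0.0) for c in CLASSES]


# The learning object of Figure 3.
figure3 = make_profile({"B": 0.3, "H.2": 0.6, "H.2.2": 0.1})
# Multilevel weighting of the course "Physical design in database management systems".
multilevel = make_profile({"H.2": 0.4, "H.2.2": 0.3, "H.2.4": 0.3})
# Weighted leaves: the 0.5/0.5 split is an assumption made for this example.
weighted_leaves = make_profile({"H.2.2": 0.5, "H.2.4": 0.5}, leaves_only=True)
print(figure3)
```

A learner’s query can be built with the same function, since queries and learning object profiles are vectors over the same dimensions.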
For example, assume that the multilevel weighting interpretation of the ontology is used and a learner wants to find basic courses concerning database management. In this case the learner will set a rather heavy weight on H.2 (Database Management) and lighter weights on H.2.1 (Logical Design), H.2.2 (Physical Design), and H.2.3 (Languages). In contrast, if a student is looking for more advanced courses on database management, then the student will give a lighter weight to H.2 and heavier weights to its children.

As the learners interact with the system by submitting queries, it is reasonable to require that the response times be only a few seconds. We investigated the effects of different matching algorithms and of the number of stored learning objects on response times. The test environment was equipped with a Pentium II processor and 192 MB of memory. The computers were running the Sun Solaris 5.8 operating system. We implemented and tested four matching algorithms, which compute the distance measures between learning objects and learners’ queries. We next give a short description of the algorithms.

The Cosine matching algorithm (Baeza-Yates & Ribeiro-Neto, 1999) calculates the cosine measure between the query (a vector) and the documents’ profiles. As a matter of fact, the algorithm does not compute distance measures but rather approximates them by computing the angles between the query vector and the vectors representing documents, such as the learning objects. The Euclidean matching algorithm (Friedman, Bentley, & Finkel, 1977) calculates the Euclidean distance from the query profile to all learning objects’ profiles. The Manhattan distance algorithm (Bentley, Weide, & Yao, 1980) calculates a so-called “city block distance.” The name comes from the fact that this measure in two dimensions tells how many blocks in a city one would have to walk between two points.
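The three measures just described, together with the threshold-restricted ranking of the vector model, can be sketched in Python as follows. These are the textbook definitions of the measures, not the code that was benchmarked; the function rank, its parameter max_distance, and the three-dimensional example vectors (taken over H.2, H.2.2, and H.2.4 only) are assumptions made for the illustration.

```python
import math


def cosine_similarity(q, d):
    """Cosine of the angle between q and d (1.0 means the same direction)."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0


def euclidean_distance(q, d):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, d)))


def manhattan_distance(q, d):
    """City block distance: sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(q, d))


def rank(query, profiles, distance, max_distance):
    """Return (name, distance) pairs within max_distance, closest first."""
    scored = [(name, distance(query, vec)) for name, vec in profiles.items()]
    return sorted((s for s in scored if s[1] <= max_distance), key=lambda s: s[1])


# Dimensions: (H.2, H.2.2, H.2.4). A learner looking for a basic course
# puts a heavy weight on H.2 and lighter weights on its children.
query = [0.6, 0.2, 0.2]
profiles = {
    "basic": [0.8, 0.1, 0.1],
    "advanced": [0.0, 0.5, 0.5],
}
ranked = rank(query, profiles, manhattan_distance, max_distance=1.0)
print([name for name, _ in ranked])
```

Because the cosine measure is a similarity rather than a distance, it can be passed to rank as `lambda q, d: 1 - cosine_similarity(q, d)` so that smaller values still mean better matches.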
Our Fuzzy matching algorithm attempts to achieve a more efficient matching procedure than the “exact” matching algorithms. The improved efficiency is achieved by performing the actual matching on a preselected subset of all learning objects. The subset of the documents’ profiles is determined by choosing the three largest weights from the query and then computing the subset based on these weights: only the profiles whose weights are within a specified tolerance interval are selected for the final query processing. Therefore the result set is not guaranteed to contain all the profiles that are closest to the query profile. However, the closeness values of the profiles in the actual result set are exact, since they are calculated using the Euclidean measure.

The computing time for matching of each algorithm is presented in Table 1. The test was performed for different numbers (1000, 5000, and 10 000) of learning objects. The differences between the Euclidean, Cosine, and Manhattan algorithms were rather small (less than 10%). The Fuzzy matching algorithm required the least computing time (about 20% less than the others). The test also shows that all the algorithms are quick enough in the test environment, as the response times are about 1.2 seconds or less. If the number of learning objects or the dimensions of the vector space (i.e., the attributes used in the profile) increases, the advantage of the Fuzzy matching algorithm over the other algorithms can be expected to grow. In our test environment the vector space comprised 15 dimensions (i.e., each profile could have at most 15 attributes).
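A minimal sketch of the preselection idea behind the Fuzzy matching algorithm follows. The text does not specify how the subset is computed from the three largest query weights or which tolerance is used, so the rule below (keep a profile only if it lies within `tolerance` of the query on each of those three dimensions) and the value 0.2 are assumptions, as are all function names and sample profiles.

```python
import math


def euclidean_distance(q, p):
    """Exact Euclidean distance, used only on the preselected candidates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, p)))


def fuzzy_match(query, profiles, tolerance=0.2, top_k=3):
    """Preselect profiles near the query on its top_k heaviest dimensions,
    then rank the survivors by exact Euclidean distance.

    The preselection rule and the default tolerance are assumptions.
    """
    # Indices of the top_k largest query weights.
    top_dims = sorted(range(len(query)), key=lambda i: query[i], reverse=True)[:top_k]

    # Cheap filter: compare only top_k coordinates per profile.
    candidates = {
        name: p
        for name, p in profiles.items()
        if all(abs(p[i] - query[i]) <= tolerance for i in top_dims)
    }

    # Exact matching on the (usually much smaller) candidate set.
    return sorted((euclidean_distance(query, p), name) for name, p in candidates.items())


# Illustrative data over five dimensions (H.2, H.2.1, H.2.2, H.2.3, H.2.4).
query = [0.7, 0.1, 0.1, 0.1, 0.0]
profiles = {
    "basic_db": [0.6, 0.2, 0.1, 0.1, 0.0],
    "physical_design": [0.4, 0.0, 0.3, 0.0, 0.3],
    "languages": [0.0, 0.0, 0.0, 1.0, 0.0],
}

for distance, name in fuzzy_match(query, profiles):
    print(name, round(distance, 3))
```

With a tight tolerance the candidate set shrinks and fewer exact distances are computed, at the cost of possibly missing some close profiles; this is the trade-off described above.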
In practice, the number of attributes cannot increase significantly, as otherwise determining the weights for learning objects would overly burden the course creators. In addition, as the system is developed for universities, it is not evident that the number of learning objects would become very large (e.g., over 10,000).

CONCLUSION

A virtual university has been defined as a space where students are provided with higher education courses with the help of the newest information and communication technology (Niemi, 2002). The degree of utilizing technology in organizing the studies may vary from pure technology-based studies to face-to-face or mixed studies that are supported by learning technologies.

A virtual university may be an institution that uses information and communication technologies for its core activities, such as providing learning opportunities, administration, materials development and distribution, delivering teaching and tuition, and providing counseling, advising, and examinations. On the other hand, a virtual university may also be a virtual organization created through partnerships between traditional universities and other educational institutes. In addition, traditional campus universities may be regarded as virtual universities if they offer learning opportunities via the Internet or combine traditional ways of learning with e-learning (Ryan et al., 2000).

E-learning sets new requirements for universities: they have to build global learning infrastructures, course material has to be offered also in digital form, course material has to be distributed via the Internet, and learners must have access to various virtual universities. A problem is that the current virtual university portals provide heterogeneous functionalities, which in turn hampers the learner in accessing various virtual universities.
The main goal of the ONES-project is to investigate ways of integrating various virtual universities so that the aggregated virtual university would be as easily accessible to a learner as a single virtual university. Achieving such a goal requires mutual understanding of the used technology and standardized descriptions of the learning objects. Furthermore, searching from various virtual universities requires mutual understanding of the information retrieval model to be used.

We argue that keyword-based search (i.e., the Boolean model), though well suited for general Web searches, is unsuitable for the virtual universities’ purposes. Instead, the vector model (on which our implemented search engine is based) seems to be more appropriate, as it provides a similarity measure (i.e., the learning object having the best match is presented first). We also introduced two interpretations for the hierarchical ontologies, which allow increasing the power of the used metadata descriptions. Finally, we compared the performance of four algorithms for computing the similarities of the profiles. It turned out that our Fuzzy matching algorithm requires less computing time than the other “exact matching” algorithms presented in the literature.

Table 1. Matching times for the algorithms (in seconds), by number of learning objects

Algorithm    1000    5000    10 000
Manhattan    0.80    0.93    1.07
Cosine       0.83    0.96    1.16
Euclidean    0.83    0.98    1.23
Fuzzy        0.61    0.73    0.89

REFERENCES

Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. New York: Addison Wesley.
Bentley, J., Weide, B., & Yao, A. (1980). Optimal expected-time algorithms for closest point problem.
ACM Transactions on Mathematical Software, 6(4), 563-580.
Fensel, D. (2001). Ontologies: Silver bullet for knowledge management and electronic commerce. Berlin: Springer Verlag.
Fridman, N. N., & McGuinness, D. L. (2001, March). Ontology development 101: A guide to creating your first ontology (Stanford Knowledge Systems Laboratory Technical Report KSL-01-05, Stanford Medical Informatics Technical Report SMI-2001-0880).
Friedman, J., Bentley, J., & Finkel, R. (1977). An algorithm for finding best matches in logarithmic expected time. ACM Transactions on Mathematical Software, 3(3), 209-226.
Garcia-Molina, H., Ullman, J., & Widom, J. (2000). Database system implementation. New Jersey: Prentice Hall.
Gilliland-Swetland, A. J. (2000). Introduction to metadata, setting the stage. Retrieved December 20, 2004, from http://www.getty.edu/research/institute/standards/intrometadata/
Gruber, T. R. (1993, March). Toward principles for the design of ontologies used for knowledge sharing. In Padua Workshop on Formal Ontology (p. 23).
Holzinger, A., Kleinberger, T., & Müller, P. (2001). Multimedia learning systems based on IEEE Learning Object Metadata (LOM). In Proceedings of ED-MEDIA 2001, Tampere, Finland.
Jokela, S. (2001). Metadata enhanced content management in media companies. In Acta Polytechnica Scandinavica. Mathematics and computing series no. 114. Doctoral thesis, Helsinki University of Technology.
Kassanke, S., El-Saddik, A., & Steinacker, A. (2001). Learning objects metadata and tools in the area of operations research. In Proceedings of ED-MEDIA 2001, Tampere, Finland.
Lamminaho, V. (2000). Metadata specification: Forms, menus for description of courses and all other objects. CUBER project, Deliverable D3.1.
Liu, J., Chan, S., Hung, A., & Lee, R. (2002). Facilitators and inhibitors of e-learning. In L. C. Jain, R. J. Howlett, N. S. Ichalkaranje, & G.
Tonfoni (Eds.), Virtual environments for teaching and learning, Series on innovative intelligence (Vol. 1, pp. 75-109). World Scientific.
Niemi, H. (2002). Empowering learners in the virtual university. In H. Niemi, & P. Ruohotie (Eds.), Theoretical understandings for learning in the virtual university. University of Tampere, Research Center for Vocational Education and Training.
Pöyry, P., Pelto-Aho, K., & Puustjärvi, J. (2002). The role of meta data in the CUBER system. In Proceedings of the Annual Conference of the SAICSIT 2002 (pp. 172-178).
Pöyry, P., & Puustjärvi, J. (2003). CUBER: A personalised curriculum builder. In Proceedings of the 3rd IEEE International Conference on Advanced Learning Technologies, Athens, Greece (pp. 326-327).
Ryan, S., Scott, B., Freeman, H., & Patel, D. (2000). The virtual university. The Internet and resource-based learning. London: Kogan Page.
Stojanovic, L., Staab, S., & Studer, R. (2001). E-learning based on the Semantic Web. In Proceedings of WebNet2001: World Conference on the WWW and Internet, Orlando, FL.
Strijker, A. (2001). Using metadata for re-using material and providing user support tools. In Proceedings of ED-MEDIA 2001, Tampere, Finland.
Teare, R., Davies, D., & Sandelands, E. (1999). The virtual university: An action paradigm and process for workplace learning. Cassell.
Turpeinen, M. (2000). Customizing news content for individuals and communities. In Acta Polytechnica Scandinavica. Mathematics and computing series no. 103. Doctoral thesis, Helsinki University of Technology.
Vasudevan, V. (2001). A Web service primer. Retrieved December 20, 2004, from http://www.xml/lpt/a/2001/04/04/Webservices/indeax.html
Yan, T., & Garcia-Molina, H. (1994).
Index structures for selective dissemination of information under the Boolean Model. ACM Transactions on Database Systems, 19(2), 332-364.
Yli-Koivisto, J., & Puustjärvi, J. (2002). CoMet: An electronic newspaper prototype. Workshop on XML in Digital Media. In Proceedings of the 8th International Conference on Distributed Multimedia Systems (DMS’2002) (pp. 703-707).

J. Puustjärvi obtained his BSc and MSc in computer science in 1985 and 1990, respectively, and his PhD in computer science in 1999, all from the University of Helsinki, Finland. Currently he is a professor of information society technologies at the Technical University of Lappeenranta. He is also a docent of e-business technologies at the Technical University of Helsinki, and a docent of computer science at the University of Helsinki. His research interests include e-learning, e-business, knowledge management, Semantic Web, and databases.

P. Pöyry obtained her BA (Educ) and MA (Educ) in 2001 and 2002 from the University of Helsinki, Finland. Ms. Pöyry obtained her LicSc (Tech) in 2004 from the Helsinki University of Technology, where she is a doctoral student. Currently she works as a researcher and prepares her PhD thesis at the Helsinki University of Technology in the Software Business and Engineering Institute as a member of the Information Ergonomics Research Group. Her research interests include e-learning, knowledge management, CSCW, and usability research.