GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 195 José R. Hilera, Carmen Pagés, J. Javier Martínez, J. Antonio Gutiérrez, and Luis de-Marcos An Evolutive Process to Convert Glossaries into Ontologies dictionary, the outcome will be limited by the richness of the definition of terms included in that dictionary. It would be what is normally called a “lightweight” ontol- ogy,6 which could later be converted into a “heavyweight” ontology by implementing, in the form of axioms, know- ledge not contained in the dictionary. This paper describes the process of creating a lightweight ontology of the domain of software engineering, starting from the IEEE Standard Glossary of Software Engineering Terminology.7 ■■ Ontologies, the Semantic Web, and Libraries Within the field of librarianship, ontologies are already being used as alternative tools to traditional controlled vocabularies. This may be observed particularly within the realm of digital libraries, although, as Krause asserts, objections to their use have often been raised by the digital library community.8 One of the core objections is the difficulty of creating ontologies as compared to other vocabularies such as taxonomies or thesauri. Nonetheless, the semantic richness of an ontology offers a wide range of possibilities concerning indexing and searching of library documents. The term ontology (used in philosophy to refer to the “theory about existence”) has been adopted by the artificial intelligence research community to define a cate- gorization of a knowledge domain in a shared and agreed form, based on concepts and relationships, which may be formally represented in a computer readable and usable format. The term has been widely employed since 2001, when Berners-Lee et al. envisaged the Semantic Web, which aims to turn the information stored on the Web into knowledge by transforming data stored in every webpage into a common scheme accepted in a specific domain.9 To accomplish that task, knowledge must be represented in an agreed-upon and reusable computer-readable format. To do this, machines will require access to structured collections of information and to formalisms which are based on mathematical logic that permits higher levels of automatic processing. Technologies for the Semantic Web have been devel- oped by the World Wide Web Consortium (W3C). The most relevant technologies are RDF (Resource Description This paper describes a method to generate ontologies from glossaries of terms. The proposed method presupposes an evolutionary life cycle based on successive transforma- tions of the original glossary that lead to products of intermediate knowledge representation (dictionary, tax- onomy, and thesaurus). These products are characterized by an increase in semantic expressiveness in comparison to the product obtained in the previous transformation, with the ontology as the end product. Although this method has been applied to produce an ontology from the “IEEE Standard Glossary of Software Engineering Terminology,” it could be applied to any glossary of any knowledge domain to generate an ontology that may be used to index or search for information resources and documents stored in libraries or on the Semantic Web. F rom the point of view of their expressiveness or semantic richness, knowledge representation tools can be classified at four levels: at the basic level (level 0), to which dictionaries belong, tools include defini- tions of concepts without formal semantic primitives; at the taxonomies level (level 1), tools include a vocabulary, implicit or explicit, as well as descriptions of specialized relationships between concepts; at the thesauri level (level 2), tools further include lexical (synonymy, hyperonymy, etc.) and equivalence relationships; and at the reference models level (level 3), tools combine the previous relation- ships with other more complex relationships between concepts to completely represent a certain knowledge domain.1 Ontologies belong at this last level. According to the hierarchic classification above, knowledge representation tools of a particular level add semantic expressiveness to those in the lowest levels in such a way that a dictionary or glossary of terms might develop into a taxonomy or a thesaurus, and later into an ontology. There are a variety of comparative studies of these tools,2 as well as varying proposals for systematically generating ontologies from lower-level knowledge repre- sentation systems, especially from descriptor thesauri.3 This paper proposes a process for generating a termino- logical ontology from a dictionary of a specific knowledge domain.4 Given the definition offered by Neches et al. (“an ontology is an instrument that defines the basic terms and relations comprising the vocabulary of a topic area as well as the rules for combining terms and relations to define extensions to the vocabulary”)5 it is evident that the ontology creation process will be easier if there is a vocabulary to be extended than if it is developed from scratch. If the developed ontology is based exclusively on the José r. Hilera (jose.hilera@uah.es) is Professor, carmen Pagés (carmina.pages@uah.es) is assistant Professor, J. Javier Mar- tínez (josej.martinez@uah.es) is Professor, J. Antonio Gutiér- rez (jantonio.gutierrez@uah.es) is assistant Professor, and luis de-Marcos (luis.demarcos@uah.es) is Professor, Department of computer Science, Faculty of librarianship and Documentation, university of alcalá, Madrid, Spain. 196 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 configuration management; data types; errors, faults, and failures; evaluation techniques; instruction types; language types; libraries; microprogramming; operating systems; quality attributes; software documentation; soft- ware and system testing; software architecture; software development process; software development techniques; and software tools.15 In the glossary, entries are arranged alphabetically. An entry may consist of a single word, such as “software,” a phrase, such as “test case,” or an acronym, such as “CM.” If a term has more than one definition, the definitions are numbered. In most cases, noun definitions are given first, followed by verb and adjective definitions as applicable. Examples, notes, and illustrations have been added to clarify selected definitions. Cross-references are used to show a term’s relations with other terms in the dictionary: “contrast with” refers to a term with an opposite or substantially different mean- ing; “syn” refers to a synonymous term; “see also” refers to a related term; and “see” refers to a preferred term or to a term where the desired definition can be found. Figure 2 shows an example of one of the definitions of the glossary terms. Note that definitions can also include Framework),10 which defines a common data model to specify metadata, and OWL (Ontology Web Language),11 which is a new markup language for publishing and sharing data using Web ontologies. More recently, the W3C has presented a proposal for a new RDF-based markup system that will be especially useful in the con- text of libraries. It is called SKOS (Simple Knowledge Organization System), and it provides a model for expressing the basic structure and content of concept schemes, such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other simi- lar types of controlled vocabularies.12 The emergence of the Semantic Web has created great interest within librarianship because of the new possibili- ties it offers in the areas of publication of bibliographical data and development of better indexes and better displays than those that we have now in ILS OPACs.13 For that rea- son, it is important to strive for semantic interoperability between the different vocabularies that may be used in libraries’ indexing and search systems, and to have com- patible vocabularies (dictionaries, taxonomies, thesauri, ontologies, etc.) based on a shared standard like RDF. There are, at the present time, several proposals for using knowledge organization systems as alternatives to controlled vocabularies. For example, folksonomies, though originating within the Web context, have been proposed by different authors for use within libraries “as a powerful, flexible tool for increasing the user-friendliness and inter- activity of public library catalogs.”14 Authors argue that the best approach would be to create interoperable controlled vocabularies using shared and agreed-upon glossaries and dictionaries from different domains as a departure point, and then to complete evolutive processes aimed at semantic extension to create ontologies, which could then be com- bined with other ontologies used in information systems running in both conventional and digital libraries for index- ing as well as for supporting document searches. There are examples of glossaries that have been transformed into ontologies, such as the Cambridge Healthtech Institute’s “Pharmaceutical Ontologies Glossary and Taxonomy” (http://www.genomicglossaries.com/content/ontolo gies.asp), which is an “evolving terminology for emerging technologies.” ■■ IEEE Standard Glossary of Software Engineering Terminology To demonstrate our proposed method, we will use a real glossary belonging to the computer science field, although it is possible to use any other. The glossary, available in electronic format (PDF), defines approxi- mately 1,300 terms in the domain of software engineering (figure 1). Topics include addressing assembling, compil- ing, linking, loading; computer performance evaluation; Figure 1. Cover of the Glossary document GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 197 4. Define the classes and the class hierarchy 5. Define the properties of classes (slots) 6. Define the facets of the slots 7. Create instances As outlined in the Introduction, the ontology devel- oped using our method is a terminological one. Therefore we can ignore the first two steps in Noy’s and McGuinness’ process as the concepts of the ontology coincide with the terms of the glossary used. Any ontology development process must take into account the basic stages of the life cycle, but the way of organizing the stages can be different in different meth- ods. In our case, since the ontology has a terminological character, we have established an incremental develop- ment process that supposes the natural evolution of the glossary from its original format (dictionary or vocabu- lary format) into an ontology. The proposed life cycle establishes a series of steps or phases that will result in intermediate knowledge representation tools, with the final product, the ontology, being the most semantically rich (figure 4). Therefore this is a product-driven process, in which the aim of every step is to obtain an intermediate product useful on its own. The intermediate products and the final examples associated with the described concept. In the resulting ontology, the examples were included as instances of the corresponding class. In figure 2, it can be seen that the definition refers to another glossary on programming languages (Std 610.13), which is a part of the series of dic- tionaries related to computer science (“IEEE Std 610,” figure 3). Other glossaries which are men- tioned in relation to some references about term definitions are 610.1, 610.5, 610.7, 610.8, and 610.9. To avoid redundant definitions and pos- sible inconsistencies, links must be implemented between ontologies developed from those glossa- ries that include common concepts. The ontology generation process presented in this paper is meant to allow for integration with other ontolo- gies that will be developed in the future from the other glossaries. In addition to the explicit references to other terms within the glossary and to terms from other glos- saries, the textual definition of a concept also has implicit references to other terms. For example, from the phrase “provides features designed to facilitate expression of data structures” included in the definition of the term high order language (figure 2), it is possible to determine that there is an implicit relationship between this term and the term data structure, also included in the glossary. These relationships have been considered in establishing the properties of the concepts in the developed ontology. ■■ Ontology Development Process Many ontology development methods presuppose a life cycle and suggest technologies to apply during the pro- cess of developing an ontology.16 The method described by Noy and McGuinness is helpful when beginning this process for the first time.17 They establish a seven-step process: 1. Determine the domain and scope of the ontology 2. Consider reusing existing ontologies 3. Enumerate important terms in the ontology Figure 2. Example of term definition in the IEEE Glossary Figure 3. IEEE Computer Science Glossaries 610—Standard Dictionary of Computer Terminology 610.1—Standard Glossary of Mathematics of Computing Terminology 610.2—Standard Glossary of Computer Applications Terminology 610.3—Standard Glossary of Modeling and Simulation Terminology 610.4—Standard Glossary of Image Processing Terminology 610.5—Standard Glossary of Data Management Terminology 610.6—Standard Glossary of Computer Graphics Terminology 610.7—Standard Glossary of Computer Networking Terminology 610.8—Standard Glossary of Artificial Intelligence Terminology 610.9—Standard Glossary of Computer Security and Privacy Terminology 610.10—Standard Glossary of Computer Hardware Terminology 610.11—Standard Glossary of Theory of Computation Terminology 610.12—Standard Glossary of Software Engineering Terminology 610.13—Standard Glossary of Computer Languages Terminology high order language (HOL). A programming language that requires little knowledge of the computer on which a program will run, can be translated into several difference machine languages, allows symbolic naming of operations and addresses, provides features designed to facilitate expression of data structures and program logic, and usually results in several machine instructions for each program state- ment. Examples include Ada, COBOL, FORTRAN, ALGOL, PASCAL. Syn: high level language; higher order language; third gen- eration language. Contrast with: assembly language; fifth generation language; fourth generation language; machine language. Note: Specific languages are defined in P610.13 198 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 Since there are terms with different meanings (up to five in some cases) in the IEEE Glossary of Software Engineering Terminology, during dictionary development we decided to create different concepts (classes) for the same term, associating a number to these concepts to differentiate them. For example, there are five different definitions for the term test, which is why there are five concepts (Test1–Test5), corresponding to the five meanings of the term: (1) An activity in which a system or compo- nent is executed under specified conditions, the results are observed or recorded, and an evaluation is made of some aspect of the system or component; (2) To conduct an activity as in (1); (3) A set of one or more test cases; (4) A set of one or more test procedures; (5) A set of one or more test cases and procedures. taxonomy The proposed lifecycle establishes a stage for the con- version of a dictionary into a taxonomy, understanding taxonomy as an instrument of concepts categorization, product are a dictionary, which has a formal and computer processed structure, with the terms and their definitions in XML format; a taxonomy, which reflects the hierarchic rela- tionships between the terms; a thesaurus, which includes other relationships between the terms (for example, the synonymy relationship); and, finally, the ontology, which will include the hierarchy, the basic relationships of the the- saurus, new and more complex semantic relationships, and restrictions in form of axioms expressed using description logics.18 The following paragraphs describe the way each of these products is obtained. Dictionary The first step of the proposed development process con- sists of the creation of a dictionary in XML format with all the terms included in the IEEE Standard Glossary of Software Engineering Terminology and their related defini- tions. This activity is particularly mechanical and does not need human intervention as it is basically a transfor- mation of the glossary from its original format (PDF) into a format better suited to the development process. All formats considered for the dictionary are based on XML, and specifically on RDF and RDF schema. In the end, we decided to work with the standards DAML+OIL and OWL,19 though we are not opposed to working with other languages, such as SKOS or XMI,20 in the future. (In the latter case, it would be possible to model the intermediate products and the ontology in UML graphic models stored in xml files.)21 In our project, the design and implementation of all products has been made using an ontology editor. We have used OilEd (with OilViz Plugin) as editor, both because of its simplicity and because it allows the exportation to OWL and DAML formats. However, with future maintenance and testing in mind, we decided to use Protégé (with OWL plugin) in the last step of the process, because this is a more flexible environment with extensible mod- ules that integrate more functionality such as ontology annotation, evaluation, middleware service, query and inference, etc. Figure 5 shows the dictionary entry for “high order language,” which appears in figure 2. Note that the dic- tionary includes only owl:class (or daml:class) to mark the term; rdf:label to indicate the term name; and rdf:comment to provide the definition included in the original glossary. Figure 4. Ontology development process HighOrderLanguage Figure 5. Example of dictionary entry GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 199 example, when analyzing the definition of the term com- piler: “(Is) A computer program that translates programs expressed in a high order language into their machine language equivalent,” it is possible to deduce that com- piler is a subconcept of computer program, which is also included in the glossary.) In addition to the lexical or syn- tactic analysis, it is necessary for an expert in the domain to perform a semantic analysis to complete the develop- ment of the taxonomy. The implementation of the hierarchical relation- ships among the concepts is made using rdfs:subClassOf, regardless of whether the taxonomy is implemented in OWL or DAML format, since both languages specify this type of relationship in the same way. Figure 6 shows an example of a hierarchical relationship included in the definition of the concept pictured in figure 5. thesaurus According to the International Organization for Standardization (ISO), a thesaurus is “the vocabulary of a controlled indexing language, formally organized in order to make explicit the a priori relations between concepts (for example ‘broader’ and ‘narrower’).”25 This definition establishes the lexical units and the semantic relationships between these units as the elements that constitute a the- saurus. The following is a sample of the lexical units: ■■ Descriptors (also called “preferred terms”): the terms used consistently when indexing to represent a con- cept that can be in documents or in queries to these documents. The ISO standard introduces the option of adding a definition or an application note to every term to establish explicitly the chosen meaning. This note is identified by the abbreviation SN (Scope Note), as shown in figure 7. ■■ Non-descriptors (“non-preferred terms”): the syn- onyms or quasi-synonyms of a preferred term. A nonpreferred term is not assigned to documents submitted to an indexing process, but is provided as an entry point in a thesaurus to point to the appropri- ate descriptor. Usually the descriptors are written in capital letters and the nondescriptors in small letters. ■■ Compound descriptors: the terms used to represent complex concepts and groups of descriptors, which allow for the structuring of large numbers of thesau- rus descriptors into subsets called micro-thesauri. In addition to lexical units, other fundamental elements of a thesaurus are semantic relationships between these units. The more common relationships between lexical units are the following: ■■ Equivalence: the relationship between the descrip- tors and the nondescriptors (synonymous and that is, as a systematical classification in a traditional way. As Gilchrist states, there is no consensus on the meaning of terms like taxonomy, thesaurus, or ontology.22 In addi- tion, much work in the field of ontologies has been done without taking advantage of similar work performed in the fields of linguistics and library science.23 This situa- tion is changing because of the increasing publication of works that relate the development of ontologies to the development of “classic” terminological tools (vocabular- ies, taxonomies, and thesauri). This paper emphasizes the importance and useful- ness of the intermediate products created at each stage of the evolutive process from glossary to ontology. The end product of the initial stage is a dictionary expressed as XML. The next stage in the evolutive process (figure 4) is the transformation of that dictionary into a tax- onomy through the addition of hierarchical relationships between concepts. To do this, it is necessary to undertake a lexical- semantic analysis of the original glossary. This can be done in a semiautomatic way by applying natural language processing (NLP) techniques, such as those recommended by Morales-del-Castillo et al.,24 for creat- ing thesauri. The basic processing sequence in linguistic engineering comprises the following steps: (1) incorpo- rate the original documents (in our case the dictionary obtained in the previous stage) into the information sys- tem; (2) identify the language in which they are written, distinguishing independent words; (3) “understand” the processed material at the appropriate level; (4) use this understanding to transform, search, or traduce data; (5) produce the new media required to present the produced outcomes; and finally, (6) present the final outcome to human users by means of the most appropriate periph- eral device—screen, speakers, printer, etc. An important aspect of this process is natural lan- guage comprehension. For that reason, several different kinds of programs are employed, including lemmatizers (which implement stemming algorithms to extract the lexeme or root of a word), morphologic analyzers (which glean sentence information from their constituent ele- ments: morphemes, words, and parts of speech), syntactic analyzers (which group sentence constituents to extract elements larger than words), and semantic models (which represent language semantics in terms of concepts and their relations, using abstraction, logical reasoning, orga- nization and data structuring capabilities). From the information in the software engineering dictionary and from a lexical analysis of it, it is possible to determine a hierarchical relationship when the name of a term contains the name of another one (for example, the term language and the terms programming language and hardware design language), or when expressions such as “is a” linked to the name of another term included in the glossary appear in the text of the term definition. (For 200 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 indicating that high order language relates to both assembly and machine languages. The life cycle proposed in this paper (figure 4) includes a third step or phase that transforms the taxonomy obtained in the previous phase into a thesaurus through the incorporation of relationships between the concepts that complement the hierarchical relations included in the taxonomy. Basically, we have to add two types of relation- ships—equivalence and associative, represented in the standard thesauri with UF (and USE) and RT respectively. We will continue using XML to implement this new product. There are different ways of implementing a thesaurus using a language based on XML. For example, Matthews et al. proposed a standard RDF format,26 where as Hall created an ontology in DAML.27 In both cases, the authors modeled the general structure of quasi-synonymous). ISO establishes that the abbrevia- tion UF (Used For) precedes the nondescriptors linked to a descriptor; and the abbreviation USE is used in the opposite case. For example, a thesaurus developed from the IEEE glossary might include a descriptor “high order language” and an equivalence relationship with a nondescriptor “high level language” (figure 7). ■■ Hierarchical: a relationship between two descrip- tors. In the thesaurus one of these descriptors has been defined as superior to the other one. There are no hierarchical relationships between nondescrip- tors, nor between nondescriptors and descriptors. A descriptor can have no lower descriptors or several of them, and no higher descriptors or several of them. According to the ISO standard, hierarchy is expressed by means of the abbreviations BT (Broader Term), to indicate the generic or higher descriptors, and NT (Narrower Term), to indicate the specific or lower descriptors. The term at the head of the hierarchy to which a term belongs can be included, using the abbreviation TT (Top Term). Figure 7 presents these hierarchical relationships. ■■ Associative: a reciprocal relationship that is estab- lished between terms that are neither equivalent nor hierarchical, but are semantically or conceptually associated to such an extent that the link between them should be made explicit in the controlled vocabulary on the grounds that it may suggest additional terms for use in indexing or retrieval. It is generally indicated by the abbreviation RT (Related Term). There are no associative relationships between nondescriptors and descriptors, or between descriptors already linked by a hierarchical relation. It is possible to establish associative relationships between descriptors belonging to the same or differ- ent category. The associative relationships can be of very different types. For example, they can represent causality, instrumentation, location, similarity, origin, action, etc. Figure 7 shows two associative relations, .. HIGH ORDER LANGUAGE (descriptor) SN A programming language that... UF High level language (no-descriptor) UF Third generation language (no-descriptor) TT LANGUAGE BT PROGRAMMING LANGUAGE NT OBJECT ORIENTED LANGUAGE NT DECLARATIVE LANGUAGE RT ASSEMBLY LANGUAGE (contrast with) RT MACHINE LANGUAGE (contrast with) .. High level language USE HIGH ORDER LANGUAGE .. Third generation language USE HIGH ORDER LANGUAGE .. Figure 7. Fragment of a thesaurus entry Figure 6. Example of taxonomy entry ... GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 201 terms. For example: . Or using the glossary notation: . ■■ The rest of the associative relationships (RT) that were included in the thesaurus correspond to the cross-references of the type “Contrast with” and “See also” that appear explicitly in the IEEE glossary. ■■ Neither compound descriptors nor groups of descrip- tors have been implemented because there is no such structure in the glossary. Ontology Ding and Foo state that “ontology promotes standard- ization and reusability of information representation through identifying common and shared knowledge. Ontology adds values to traditional thesauri through deeper semantics in digital objects, both conceptually, relationally and machine understandably.”29 This seman- tic richness may imply deeper hierarchical levels, richer relationships between concepts, the definition of axioms or inference rules, etc. The final stage of the evolutive process is the transfor- mation of the thesaurus created in the previous stage into an ontology. This is achieved through the addition of one or more of the basic elements of semantic complexity that differentiates ontologies from other knowledge represen- tation standards (such as dictionaries, taxonomies, and thesauri). For example: ■■ Semantic relationships between the concepts (classes) of the thesaurus have been added as properties or ontology slots. ■■ Axioms of classes and axioms of properties. These are restriction rules that are declared to be sat- isfied by elements of ontology. For example, to establish disjunctive classes ( ), have been defined, and quantification restrictions (existential or universal) and cardinality restrictions in the relation- ships have been implemented as properties. Software based on techniques of linguistic analysis has been developed to facilitate the establishment of the properties and restrictions. This software analyzes the definition text for each of the more than 1,500 glossary terms (in thesaurus format), isolating those words that a thesaurus from classes (rdf:Class or daml:class) and properties (rdf:Property or daml:ObjectProperty). In the first case they proposed five classes: ThesaurusObject, Concept, TopConcept, Term, ScopeNote; and several properties to implement the relations, like hasScope- Note (SN), IsIndicatedBy, PreferredTerm, UsedFor (UF), ConceptRelation, BroaderConcept (BT), NarrowerConcept (NT), TopOfHierarchy (TT) and isRelatedTo (RT). Recently the W3C has developed the SKOS specifica- tion, created to define knowledge organization schemes. In the case of thesauri, SKOS includes specific tags, such as skos:Concept, skos:scopeNote (SN), skos:broader (BT), skos:narrower (NT), skos:related (RT), etc., that are equivalent to those listed in the previous paragraph. Our specification does not make any statement about the formal relationship between the class of SKOS concept schemes and the class of OWL ontologies, which will allow different design patterns to be explored for using SKOS in combination with OWL. Although any of the above-mentioned formats could be used to implement the thesaurus, given that the end- product of our process is to be an ontology, our proposal is that the product to be generated during this phase should have a format compatible with the final ontology and with the previous taxonomy. Therefore a minimal number of changes will be carried out on the product created in the previous step, resulting in a knowledge representation tool similar to a thesaurus. That tool does not need to be modified during the following (final) phase of transformation into an ontology. Nevertheless, if for some reason it is necessary to have the thesaurus in one of the other formats (such as SKOS), it is possible to apply a simple XSLT transformation to the product. Another option would be to integrate a thesaurus ontology, such as the one proposed by Hall,28 with the ontology represent- ing the IEEE glossary. In the thesaurus implementation carried out in our project, the following limitations have been considered: ■■ Only the hierarchical relationships implemented in the taxonomy have been considered. These include relationsips of type “is-a,” that is, generalization rela- tionships or type–subset relationships. Relationships that can be included in the thesaurus marked with TT, BT, and NT, like relations of type “part of” (that is, partative relationships) have not been considered. Instead of considering them as hierarchical relation- ships, the final ontology includes the possibility of describing classes as a union of classes. ■■ The relationships of synonymy (UF and USE) used to model the cross-references in the IEEE glossary (“Syn” and “See,” respectively) were implemented as equiv- alent terms, that is, as equivalent axioms between classes (owl:equivalentClass or daml:sameClassAs), with inverse properties to reflect the preference of the 202 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 match the name of other glossary terms (or a word in the definition text of other glossary terms). The isolated words will then be candidates for a relationship between both of them. (Figure 8 shows the candidate properties obtained from the Software Engineering glossary.) The user then has the option of creating relationships with the identified candidate words. The user must indicate, for every relationship to be created, the restriction type that it represents as well as existential or universal quan- tification or cardinality (minimum or maximum). After confirming this information, the program updates the file containing the ontology (OWL or DAML), adding the property to the class that represents the processed term. Figure 9 shows an example of the definition of two prop- erties and its application to the class HighOrderLanguage: a property Express with existential quantification over the class DataStructure to indicate that a language must repre- sent at least one data structure; and a property TranslateTo of universal type to indicate that any high-level language is translated into machine language (MachineLanguage). ■■ Results, Conclusions, and Future Work The existence of ontologies of specific knowledge domains (software engineering in this case) facilitates the process of finding resources about this discipline on the Semantic Web and in digital libraries, as well as the reuse of learn- ing objects of the same domain stored in repositories available on the Web.30 When a new resource is indexed in a library catalog, a new record that conforms to the ontology conceptual data model may be included. It will be necessary to assign its properties according to the concept definition included in the ontology. The user may later execute semantic queries that will be run by the search system that will traverse the ontology to identify the concept in which the user was interested to launch a wider query including the resources indexed under the concept. Ontologies, like the one that has been “evolved,” may also be used in an open way to index and search for resources on the Web. In that case, however, semantic search engines such as Swoogle (http://swoogle.umbc .edu/), are required in place of traditional syntactic search engines, such as Google. The creation of a complete ontology of a knowledge domain is a complex task. In the case of the domain presented in this paper, that of software engineering, although there have been initiatives toward ontology cre- ation that have yielded publications by renowned authors in the field,31 a complete ontology has yet to be created and published. This paper has described a process for developing a modest but complete ontology from a glossary of ter- minology, both in OWL format and DAML+OIL format, accept access accomplish account achieve adapt add adjust advance affect aggregate aid allocate allow allow symbolic naming alter analyze apply approach approve arrangement arrive assign assigned by assume avoid await begin break bring broke down builds call called by can be can be input can be used as can operate in cannot be usedas carry out cause change characterize combine communicate compare comply comprise conduct conform consist constrain construct contain contains no contribute control convert copy correct correspond count create debugs decompiles decomposedinto decrease define degree delineate denote depend depict describe design designate detect determine develop development direct disable disassembles display distribute divide document employ enable encapsulate encounter ensure enter establish estimate establish evaluate examine exchange execute after execute in executes expand express express as extract facilitate fetch fill follow fulfil generate give give partial given constrain govern have have associated have met have no hold identify identify request ignore implement imply improve incapacitate include incorporate increase indicate inform initiate insert install intend interact with interprets interrelate investigate invokes is is a defect in is a form of is a method of is a mode of is a part is a part of is a sequence is a sequenceof is a technique is a techniqueof is a type is a type of is ability is activated by is adjusted by is applied to is based is called by is composed is contained is contained in is establish is established is executed after is executed by is incorrect is independent of is manifest is measured in is not is not subdivided in is part is part of is performed by is performed on is portion is process by is produce by is produce in is ratio is represented by is the output is the result of is translated by is type is used is used in isolate know link list load locate maintain make make up may be measure meet mix modify monitors move no contain no execute no relate no use not be connected not erase not fill not have not involve not involving not translate not use occur occur in occur in a operate operatewith optimize order output parses pas pass test perform permit permitexecute permit the execution pertaining place preclude predict prepare prescribe present present for prevent preventaccessto process produce produce no propose provide rank reads realize receive reconstruct records recovery refine reflect reformat relate relation release relocates remove repair replace represent request require reserve reside restore restructure result resume retain retest returncontrolto reviews satisfy schedule send server set share show shutdown specify store store in structure submission of supervise supports suppress suspend swap synchronize take terminate test there are no through throughout transfer transform translate transmit treat through understand update use use in use to utilize value verify work in writes Figure 8. Candidate properties obtained from the linguistic analysis of the Software Engineering glossary GeNerAtiNG cOllABOrAtive sYsteMs FOr DiGitAl liBrAries | HilerA et Al. 203 to each term.) We defined 324 properties or relationships between these classes. These are based on a semiauto- mated linguistic analysis of the glossary content (for example, Allow, Convert, Execute, OperateWith, Produces, Translate, Transform, Utilize, WorkIn, etc.), which will be refined in future versions. The authors’ aim is to use this ontology, which we have called OntoGLOSE (Ontology GLossary Software Engineering), to unify the vocabulary. OntoGLOSE will be used in a more ambitious project, whose purpose is the development of a complete ontology in software engi- neering from the SWEBOK Guide.32 Although this paper has focused on this ontology, the method that has been described may be used to generate an ontology from any dictionary. The flexibility that OWL permits for ontology description, along with its compat- ibility with other RDF-based metadata languages, makes possible interoperability between ontologies and between ontologies and other controlled vocabularies and allows for the building of merged representations of multiple knowledge domains. These representations may eventu- ally be used in libraries and repositories to index and search for any kind of resource, not only those related to the original field. ■■ Acknowledgments This research is co-funded by the Spanish Ministry of Industry, Tourism and Commerce PROFIT program (grant TSI-020100-2008-23). The authors also want to acknowledge support from the TIFyC research group at the University of Alcala. References and Notes 1. M. Dörr et al., State of the Art in Content Standards (Amster- dam: OntoWeb Consortium, 2001). 2. D. Soergel, “The Rise of Ontologies or the Reinvention of Classification,” Journal of the American Society for Information Science 50, no. 12 (1999): 1119–20; A. Gilchrist, “Thesauri, Tax- onomies and Ontologies—An Etymological Note,” Journal of Documentation 59, no. 1 (2003): 7–18. 3. B. J. Wielinga et al., “From Thesaurus to Ontology,” Pro- ceedings of the 1st International Conference on Knowledge Capture (New York: ACM, 2001): 194–201: J. Qin and S. Paling, “Con- verting a Controlled Vocabulary into an Ontology: The Case of GEM,” Information Research 6 (2001): 2. 4. According to Van Heijst, Schereiber, and Wielinga, ontolo- gies can be classified as terminological ontologies, information ontologies, and knowledge modeling ontologies; terminological ontologies specify the terms that are used to represent knowl- edge in the domain of discourse, and they are in use principally to unify vocabulary in a certain domain. G. Van Heijst, A. T. which is ready to use in the Semantic Web. As described at the opening of this article, our aim has been to create a lightweight ontology as a first version, which will later be improved by including more axioms and relationships that increase its semantic expressiveness. We have tried to make this first version as tailored as possible to the initial glossary, knowing that later versions will be improved by others who might take on the work. Such improvements will increase the ontology’s utility, but will make it a less- faithful representation of the IEEE glossary from which it was derived. The ontology we have developed includes 1,521 classes that correspond to the same number of concepts represented in the IEEE glossary. (Included in this num- ber are the different meanings that the glossary assigns ... Figure 9. Example of ontology entry 204 iNFOrMAtiON tecHNOlOGY AND liBrAries | DeceMBer 2010 20. W3C, SKOS; Object Management Group, XML Metadata Interchange (XMI), 2003, http://www.omg.org/technology/doc- uments/formal/xmi.htm (accessed Oct. 5, 2009). 21. UML (Unified Modeling Language) is a standardized general-purpose modeling language (http://www.uml.org). Nowadays, different UML plugins for ontologies’ editors exist. These plugins allow working with UML graphic models. Also, it is possible to realize the UML models with a CASE tool, to export them to XML format, and to transform them to the ontol- ogy format (for example, OWL) using a XSLT sheet, as the one published in D. Gasevic, “UMLtoOWL: Converter from UML to OWL,” http://www.sfu.ca/~dgasevic/projects/UMLtoOWL/ (accessed Oct. 5, 2009). 22. Gilchrist, “Thesauri, Taxonomies and Ontologies.” 23. Soergel, “The Rise of Ontologies or the Reinvention of Classification.” 24. J. M. Morales-del-Castillo et al., “A Semantic Model of Selective Dissemination of Information for Digital Libraries,” Information Technology & Libraries 28, no. 1 (2009): 22–31. 25. International Standards Organization, ISO 2788:1986 Doc- umentation—Guidelines for the Establishment and Develop- ment of Monolingual Thesauri (Geneve: International Standards Organization, 1986). 26. B. M. Matthews, K. Miller, and M. D. Wilson, “A Thesau- rus Interchange Format in RDF,” 2002, http://www.w3c.rl.ac .uk/SWAD/thes_links.htm (accessed Feb. 10, 2009). 27. M. Hall, “CALL Thesaurus Ontology in DAML,” Dynam- ics Research Corporation, 2001, http://orlando.drc.com/daml/ ontology/CALL-Thesaurus (accessed Oct. 5, 2009). 28. Ibid. 29. Y. Ding and S. Foo, “Ontology Research and Develop- ment. Part 1—A Review of Ontology Generation,” Journal of Information Science 28, no. 2 (2002): 123–36. See also B. H. Kwas- nik, “The Role of Classification in Knowledge Representation and Discover,” Library Trends 48 (1999): 22–47. 30. S. Otón et al., “Service Oriented Architecture for the Imple- mentation of Distributed Repositories of Learning Objects,” International Journal of Innovative Computing, Information & Con- trol (2010), forthcoming. 31. O. Mendes and A. Abran, “Software Engineering Ontol- ogy: A Development Methodology,” Metrics News 9 (2004): 68–76; C. Calero, F. Ruiz, and M. Piattini, Ontologies for Software Engineering and Software Technology (Berlin: Springer, 2006). 32. IEEE, Guide to the Software Engineering Body of Knowledge (SWEBOK) (Los Alamitos, Calif.: IEEE Computer Society, 2004), http:// www.swebok.org (accessed Oct. 5, 2009). Schereiber, and B. J. Wielinga, “Using Explicit Ontologies in KBS Development,” International Journal of Human & Computer Studies 46, no. 2/3 (1996): 183–292. 5. R. Neches et al., “Enabling Technology for Knowledge Sharing,” AI Magazine 12, no. 3 (1991): 36–56. 6. O. Corcho, F. Fernández-López, and A. Gómez-Pérez, “Methodologies, Tools and Languages for Buildings Ontologies. Where Is Their Meeting Point?” Data & Knowledge Engineering 46, no. 1 (2003): 41–64. 7. Intitute of Electrical and Electronics Engineers (IEEE), IEEE Std 610.12-1990(R2002): IEEE Standard Glossary of Software Engineering Terminology (Reaffirmed 2002) (New York: IEEE, 2002). 8. J. Krause, “Semantic Heterogeneity: Comparing New Semantic Web Approaches with those of Digital Libraries,” Library Review 57, no. 3 (2008): 235–48. 9. T. Berners-Lee, J. Hendler, and O. Lassila, “The Semantic Web,” Scientific American 284, no. 5 (2001): 34–43. 10. World Wide Web Consortium (W3C), Resource Description Framework (RDF): Concepts and Abstract Syntax, W3C Recommen- dation 10 February 2004, http://www.w3.org/TR/rdf-concepts/ (accessed Oct. 5, 2009). 11. World Wide Web Consortium (W3C), Web Ontology Lan- guage (OWL), 2004, http://www.w3.org/2004/OWL (accessed Oct. 5, 2009). 12. World Wide Web Consortium (W3C), SKOS Simple Knowledge Organization System, 2009, http://www.w3.org/ TR/2009/REC-skos-reference-20090818/ (accessed Oct. 5, 2009). 13. M. M. Yee, “Can Bibliographic Data be Put Directly onto the Semantic Web?” Information Technology & Libraries 28, no. 2 (2009): 55-80. 14. L. F. Spiteri, “The Structure and Form of Folksonomy Tags: The Road to the Public Library Catalog,” Information Technology & Libraries 26, no. 3 (2007): 13–25. 15. Corcho, Fernández-López, and Gómez-Pérez, “Method- ologies, Tools and Languages for Buildings Ontologies.” 16. IEEE, IEEE Std 610.12-1990(R2002). 17. N. F. Noy and D. L. McGuinness, “Ontology Develop- ment 101: A Guide to Creating Your First Ontology,” 2001, Stan- ford University, http://www-ksl.stanford.edu/people/dlm/ papers/ontology-tutorial-noy-mcguinness.pdf (accessed Sept 10, 2010). 18. D. Baader et al., The Description Logic Handbook (Cam- bridge: Cambridge Univ. Pr., 2003). 19. World Wide Web Consortium, DAML+OIL Reference Description, 2001, http://www.w3.org/TR/daml+oil-reference (accessed Oct. 5, 2009); W3C, OWL.