References as Knowledge Management

Erik Wilde
Computer Engineering and Networks Laboratory
Swiss Federal Institute of Technology, Zürich
net.dret@dret.net

Abstract

For many researchers, managing bibliographic and web references is the closest thing to knowledge management they will ever do. This article describes ShaRef, a new approach to reference management that focuses on the user and enhances traditional reference management approaches with collaboration features and lightweight knowledge management. While ShaRef is primarily targeted at providing individual users and user groups with a better tool, it also creates a new and interesting link to libraries, because it enables users to go from their own references directly to the library through the use of OpenURL. Thus, libraries must adjust to these new types of users, who are using new technologies to access a library.

In the context of libraries, a "reference desk" and "reference services" usually refer to library employees helping library users find library resources. For most researchers, however, a "reference" is something more abstract: a pointer such as a bibliography entry pointing to a resource such as a journal article. Reference librarians, on the other hand, are often needed to help library users convert very fuzzy references into more concrete references, which then point to actual resources. Thus it could be argued that these "human references" cover a much wider area of locating actual resources; nevertheless, this article concentrates on the abstract references and their management.

Looking up "reference" in the dictionary, one finds, among other entries, "one referred to or consulted," and "something that refers." We will concentrate on the latter definition of a reference as an abstract entity, arguing that most researchers routinely collect references as part of their personal -- rather lightweight -- knowledge management. We present some statistics gathered through a university-wide survey at ETH Zürich, and then present a system which is currently being developed, aimed at bringing together the references collected by researchers, groupware aspects of collaboration among researchers, and the traditional library world.

The ShaRef Project

The ShaRef (Shared References) Project is funded by and carried out at ETH Zürich, the third-largest Swiss university. ShaRef's goal is to improve the way ETH members (mainly researchers and students) manage their collections of references. These collections contain primarily bibliographic references and web bookmarks. In an initial study (Wilde 2004), it became apparent that many university members (mostly the researchers) are collecting large numbers of references (most of them between 100 and 500), often through their entire professional careers. They often augment these references with individual information such as keywords or comments. The goal of the ShaRef Project is to help people improve the management of this information, both by providing better tools to work with and by providing tools to share it with the research group, project collaborators, students, the university, or the entire web population.

Surprisingly, when looking into how people manage their reference collections, we found that most of this is done using rather simple tools such as BibTeX or EndNote (over 60% of all respondents used BibTeX or EndNote). While these tools are appropriate for the task of compiling bibliographic entries for inclusion in documents, they have not been built for tasks going beyond this, such as cross-linking references and extending the basic reference model with additional information. A review of the literature showed that libraries, on the other hand, are not very interested in individual users collecting their own references, because from the libraries' point of view, a library catalog compiled and maintained by trained catalogers is far superior to anything an individual user could ever create. It is interesting to note that EndNote supports this library-centered view of the world by integrating library access through Z39.50, enabling EndNote users to import library catalog records with the click of a button.

The ShaRef vision, on the other hand, is to create a tool and an environment with the individual user as the central and most important entity. If the references are less perfect than a complete MARC record, this should not be a problem, as long as the information is sufficient for the individual user's knowledge management requirements. However, the challenging part for the project is to provide as many links to the outside world as possible by implementing sharing functionality with other users (via ShaRef or via web-based publishing), and by providing back-links to a library through OpenURL technology, which enables users to easily find resources in their preferred library.

Knowledge Management Today

Even though research and vendors have created an astounding number of technologies and tools for personal knowledge management, it is surprising to see that the vast majority of researchers do not use any kind of formal knowledge management. The closest thing to formal knowledge management many researchers ever use is managing bibliographic references. Many also collect web bookmarks, but very few view these types of references as being similar, and for many researchers bibliographic information is far more important than web bookmarks.

This, however, is going to change, as the boundaries between traditional publishing and web-based information resources are slowly but surely disappearing. Traditional publishing material is made more web-compatible by assigning Digital Object Identifiers (DOIs), and an increasing number of high-quality and peer-reviewed publications are available online only. Thus, reference management solutions designed for the future should not make the increasingly artificial distinction between traditional publishing and web-based publishing. One thing that should be taken into account, however, is the much shorter average life span of a web-based resource; tools must provide appropriate functionality such as caching and automatic link checking. After all, a broken link is not significantly different from a reference to a book that has gone out of print; both refer to resources that cannot be easily retrieved, even though they may have been archived somewhere.
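To make the idea of automatic link checking concrete, the following Python sketch tests a list of web references and reports which ones still resolve. It is an illustration only and not ShaRef code: the function names are made up for this example, and treating any status from 200 to 399 as "alive" is an assumption. The status lookup is passed in as a parameter so that the check can be exercised without network access.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was obtained."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code
    except (urllib.error.URLError, OSError, ValueError):
        # Unreachable host, timeout, or malformed URL.
        return 0


def check_links(
    urls: Iterable[str],
    fetch_status: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each URL to True if it still resolves, False if it looks broken."""
    return {url: 200 <= fetch_status(url) < 400 for url in urls}
```

A tool could run such a check periodically and flag broken entries, leaving it to the user to decide whether to look for an archived copy.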

ShaRef's goal is to take reference management to a level that makes it easy to seamlessly integrate bibliographic and web references. This not only reflects the changing world of publishing, it also makes it easier to manage all references uniformly, so that ShaRef's knowledge management capabilities can be used across all types of references.

Knowledge Management with ShaRef

ShaRef's first goal is to provide an environment within which references to bibliographic and web resources can be treated uniformly. Continuing from there, ShaRef provides minimal but useful support for knowledge management. Most casual users are not willing to invest a lot of time in learning a new tool and the underlying model, so ShaRef aims at providing the most benefit with a minimum of effort. ShaRef does so by supporting two concepts: keywords and cross-references.

Keywords can be used to identify concepts that should be referenced within ShaRef. For example, keywords could be used to index a paper based on its subject. ShaRef makes no attempt to define a given keyword vocabulary, or to structure keywords in any way, so keywords are simply an unstructured set of named concepts that can be referenced throughout ShaRef. Thus, more advanced concepts of structuring keywords such as ontologies (Bechhofer 2004) are not supported by ShaRef, but it is possible to use such a structure as an overlay over ShaRef's keywords and thus connect ShaRef's keywords with an externally defined ontology.
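The relationship between flat keywords and an external overlay can be sketched as follows. This is a minimal Python illustration under stated assumptions, not ShaRef's data model: the reference identifiers, the keywords, and the "broader" mapping are all invented for the example. The point is that the keyword assignments stay unstructured, while the structure lives in a separate mapping that can be applied at query time.

```python
from typing import Dict, List, Set

# Flat keyword assignments: reference id -> unstructured set of keywords.
keywords_by_ref: Dict[str, Set[str]] = {
    "wil04h": {"reference management", "survey"},
    "van01": {"OpenURL", "linking"},
    "bec04": {"OWL", "ontology"},
}

# External overlay (hypothetical): keyword -> broader keyword.
# It is kept apart from the reference data.
broader: Dict[str, str] = {
    "OpenURL": "linking",
    "OWL": "ontology",
    "ontology": "knowledge representation",
}


def refs_with_keyword(assignments: Dict[str, Set[str]], keyword: str) -> List[str]:
    """Plain lookup: references tagged with exactly this keyword."""
    return sorted(ref for ref, kws in assignments.items() if keyword in kws)


def expand(keyword: str, overlay: Dict[str, str]) -> Set[str]:
    """Return the keyword plus every keyword that the overlay places below it."""
    result = {keyword}
    changed = True
    while changed:
        changed = False
        for narrow, broad in overlay.items():
            if broad in result and narrow not in result:
                result.add(narrow)
                changed = True
    return result


def refs_with_concept(
    assignments: Dict[str, Set[str]], keyword: str, overlay: Dict[str, str]
) -> List[str]:
    """Overlay lookup: references tagged with the keyword or anything below it."""
    wanted = expand(keyword, overlay)
    return sorted(ref for ref, kws in assignments.items() if kws & wanted)
```

With this data, a plain lookup for "knowledge representation" finds nothing, while the overlay lookup finds the reference tagged "OWL" and "ontology".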

Cross-references connect references. BibTeX users already know this concept, which can be used to link entries from collections (such as papers from conference proceedings) to the complete volume (the proceedings itself). ShaRef generalizes this concept by enabling users to make generic cross-references. This can be used to create annotations that point to other references, such as an annotation to a paper stating that the claim made in this paper has been rejected by other publications. ShaRef does not assign semantics to these cross-references and thus does not go as far as other systems providing well-defined semantics and even reasoning capabilities (Uren 2003), but ShaRef's approach is lightweight and sufficient to turn isolated references into a web of related metadata resources.
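A generic cross-reference without assigned semantics can be pictured as a link that carries only a free-text note. The Python sketch below is an assumption-laden illustration rather than ShaRef's implementation: the class names, method names, and sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reference:
    ref_id: str
    title: str
    # Target reference id -> free-text note. The note is for human readers
    # only; no formal meaning is attached to it.
    cross_refs: Dict[str, str] = field(default_factory=dict)


class Collection:
    def __init__(self) -> None:
        self.refs: Dict[str, Reference] = {}

    def add(self, ref: Reference) -> None:
        self.refs[ref.ref_id] = ref

    def link(self, source_id: str, target_id: str, note: str = "") -> None:
        """Create a cross-reference from one stored reference to another."""
        if target_id not in self.refs:
            raise KeyError(f"unknown target reference: {target_id}")
        self.refs[source_id].cross_refs[target_id] = note

    def linked_from(self, target_id: str) -> List[str]:
        """Return the ids of all references that point at target_id."""
        return sorted(
            ref.ref_id for ref in self.refs.values() if target_id in ref.cross_refs
        )


collection = Collection()
collection.add(Reference("proc03", "Proceedings of a conference"))
collection.add(Reference("paper03", "A paper in those proceedings"))
collection.add(Reference("rebuttal04", "A later paper"))
# BibTeX-style link from a paper to its volume.
collection.link("paper03", "proc03", "published in")
# Generic annotation pointing at another reference.
collection.link("rebuttal04", "paper03", "disputes the main claim")
```

Following links in both directions (`cross_refs` forward, `linked_from` backward) is what turns the individual entries into a navigable web.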

Sharing Information

Since knowledge is not only about connecting information, but also connecting with other people, ShaRef supports sharing of information. It does so on a very basic level by providing publishing features which make it easy to make the reference information available to other people. This can be done through ShaRef itself, or by publishing it in suitable formats, such as HTML on the web or PDF for printed reference lists.

Sharing is also supported on a collaborative basis, where people can decide to manage a set of references collaboratively. ShaRef supports users and user groups through identification and authorization features, and enables user groups to collaboratively manage references. These user groups may be research groups, university departments, or students attending some lecture. However, ShaRef does not require users to share their information; it also allows users to keep their references completely private.

Connecting to the Library

Even though the user is the center of the ShaRef environment, it is important for users to have proper support for locating resources. For web bookmarks, this is very easy: a browser handles the bookmark's URI. For bibliographic references, however, this can be harder, and depending on the quality of the reference (for example, does it contain well-defined identifiers such as DOI, ISBN or ISSN?), finding the appropriate resource for a reference can be a non-trivial problem. This brings us back to the thoughts from the beginning of this article, the point where a human reference (i.e., a knowledgeable library employee) would be very helpful to find a resource for a poorly defined reference.

Apart from the fact that researchers will always be able to visit their libraries and get individual help for locating a resource, ShaRef also supports a mechanism for an automated process through the use of OpenURL (Van de Sompel 2001). ShaRef users have to configure their local library's OpenURL resolver, and in turn can access the library's OpenURL service directly from within ShaRef. This way, ShaRef users can get from their personal references to the respective holdings of their library with a few clicks, very often without having to manually enter any additional information at all.
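As a rough illustration of what such a link can look like, the Python sketch below builds a query URL from the fields of a reference, using key names in the style of the original OpenURL proposal (genre, aulast, atitle, title, volume, issue, and so on). It is a sketch under stated assumptions, not ShaRef's implementation: the resolver address and the `sid` value are placeholders, the function name is invented, and the set of keys a given resolver honors depends on its configuration.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Descriptive keys in the style of the original OpenURL proposal.
OPENURL_KEYS = (
    "genre", "aulast", "aufirst", "atitle", "title",
    "issn", "isbn", "volume", "issue", "spage", "date",
)


def build_openurl(
    resolver_base: str,
    reference: Dict[str, str],
    sid: Optional[str] = None,
) -> str:
    """Build a resolver URL from whatever fields the reference provides."""
    params = []
    if sid:
        params.append(("sid", sid))  # identifies the system creating the link
    if reference.get("doi"):
        params.append(("id", "doi:" + reference["doi"]))
    for key in OPENURL_KEYS:
        value = reference.get(key)
        if value:  # missing or empty fields are simply left out
            params.append((key, str(value)))
    return resolver_base + "?" + urlencode(params)


# Fields taken from the Van de Sompel and Beit-Arie entry in the reference list.
article = {
    "genre": "article",
    "aulast": "Van de Sompel",
    "atitle": "Open linking in the scholarly information environment "
              "using the OpenURL framework",
    "title": "D-Lib Magazine",
    "volume": "7",
    "issue": "3",
    "date": "2001",
}

# Placeholder resolver address and source id; a real setup would use the
# address configured for the user's own library.
url = build_openurl("http://resolver.example.org/openurl", article, sid="example:sharef")
```

The reference itself never changes; only the resolver address is specific to the user's library, which is why the same personal reference can lead different users to different holdings.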

For this scenario to work, the library's OpenURL resolver must be configured appropriately, and as initial experiments with our library have shown, this is not trivial. The new task of human references within libraries, at least partly, could thus be to set up and maintain the OpenURL resolver. Doing this is an interesting and challenging task, because it involves thinking ahead about how to best serve all possible OpenURL queries with respect to the information sources of the library. In many cases, it may even be possible to directly guide users to the full text of resources that are available online. In other cases, guiding them to the most appropriate records in the library catalog is the best response. Overall, providing a good OpenURL service is key to the library of the future.

How to Find Users

In our initial study (Wilde 2004), the majority of users (71%) responded that they did not want to manage their bibliographies and bookmarks using a single tool. However, we believe that this answer is caused by the current lack of a uniting model and tool, and that a model and tool along the lines of ShaRef will show users the possible advantages of unified reference management. It is our goal to convince users that ShaRef is the more productive tool for managing references, and by providing import and export features for popular bibliography formats (BibTeX and EndNote) and bookmarks we offer them an easy path to test and maybe switch.

However, we are also aware that our model and our tool may not be the most appropriate choice for every user. Along with our goal of unifying reference handling by providing an environment that supports reference management and sharing, we have therefore identified a number of non-goals that we explicitly do not want to support, such as library-scale cataloging, advanced ontology management, and advanced query or even reasoning features. Because ShaRef is open and extensible, however, some of this functionality can be added as additional layers on top of ShaRef.

Openness and Extensibility

ShaRef is based on XML technologies and the data model is defined as an XML Schema. It is thus easily possible to add new fields to ShaRef records, which are handled transparently in ShaRef. Openness means that ShaRef will accept unknown XML structures (where they are allowed) and handle them transparently, so that they will be exported from ShaRef in the same way as they have been imported. For example, a user adding sophisticated XML-based descriptions of resources to references using a self-defined field type will be able to retrieve this field together with the reference unaltered. Thus, ShaRef can be used to store data that it cannot interpret.
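The round-trip behavior described here can be illustrated with a few lines of Python using the standard library's ElementTree. The record layout is invented for this example and is not ShaRef's actual XML Schema; `myAnnotation` stands in for a self-defined field that the tool does not understand.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical record: "title" and "year" are known fields,
# "myAnnotation" is a user-defined structure the tool cannot interpret.
RECORD = (
    '<reference id="r1">'
    "<title>Open linking</title>"
    "<year>2001</year>"
    '<myAnnotation lang="en"><claim strength="high">Needs follow-up</claim></myAnnotation>'
    "</reference>"
)

KNOWN_FIELDS = {"title", "year"}


def update_title(xml_text: str, new_title: str) -> str:
    """Edit a known field and serialize the record, leaving the rest untouched."""
    root = ET.fromstring(xml_text)
    root.find("title").text = new_title
    return ET.tostring(root, encoding="unicode")


def unknown_fields(xml_text: str) -> List[str]:
    """Return the serialized form of every child element the tool does not know."""
    root = ET.fromstring(xml_text)
    return [
        ET.tostring(child, encoding="unicode")
        for child in root
        if child.tag not in KNOWN_FIELDS
    ]


updated = update_title(RECORD, "Open linking with OpenURL")
```

After the edit, `unknown_fields(updated)` equals `unknown_fields(RECORD)`: the unknown element, its attributes, and its nested content come back exactly as they went in.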

Extensibility means that ShaRef allows users to define their own fields, choosing from a small set of predefined field types. These fields will be handled by ShaRef as if they were standard built-in fields, and users can thus extend ShaRef's data model by defining their own fields.
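One way to picture user-defined fields drawn from a small set of predefined types is a registry that maps field names to type names. The Python sketch below is hypothetical throughout: the article does not list ShaRef's field types, so the four types, the built-in fields, and the class and method names are assumptions chosen for illustration.

```python
from datetime import date
from typing import Any, Callable, Dict
from urllib.parse import urlsplit


def _parse_uri(value: str) -> str:
    """Accept a value only if it has a URI scheme such as http or urn."""
    if not urlsplit(value).scheme:
        raise ValueError(f"not an absolute URI: {value!r}")
    return value


# Assumed set of predefined field types, each with a converter.
FIELD_TYPES: Dict[str, Callable[[str], Any]] = {
    "text": str,
    "integer": int,
    "date": date.fromisoformat,
    "uri": _parse_uri,
}


class FieldRegistry:
    def __init__(self) -> None:
        # Built-in fields and user-defined fields share one table,
        # so both are handled by the same code path.
        self._types: Dict[str, str] = {"title": "text", "year": "integer"}

    def define(self, name: str, type_name: str) -> None:
        """Add a user-defined field of one of the predefined types."""
        if type_name not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {type_name}")
        if name in self._types:
            raise ValueError(f"field already defined: {name}")
        self._types[name] = type_name

    def validate(self, record: Dict[str, str]) -> Dict[str, Any]:
        """Convert every field value according to its declared type."""
        return {
            name: FIELD_TYPES[self._types[name]](value)
            for name, value in record.items()
        }


registry = FieldRegistry()
registry.define("read_on", "date")
registry.define("local_copy", "uri")
```

Because user-defined fields are validated through the same table as built-in ones, code that works on records does not need to distinguish between the two.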

ShaRef is designed to avoid lock-in: it supports various import formats (BibTeX, EndNote, and bookmarks) and also supports these formats as output formats. However, because of the inherent limitations of these formats, some information will be lost when exporting data. Therefore, ShaRef data can also be exported in XML, in which case no information will be lost.
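Why an export to a fixed format loses information can be seen in a small Python sketch. It is not ShaRef's exporter: the function name, the restriction to a handful of BibTeX article fields, and the extra record fields (`comment_thread`, `related_to`) are assumptions made for the example. Fields that have no counterpart in the target format are reported rather than silently dropped.

```python
from typing import Dict, List, Tuple

# A few standard fields of a BibTeX @article entry.
BIBTEX_FIELDS = ("author", "title", "journal", "year", "volume", "number")


def to_bibtex(key: str, record: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return a BibTeX entry plus the names of fields that could not be exported."""
    kept = [(name, record[name]) for name in BIBTEX_FIELDS if name in record]
    dropped = sorted(set(record) - set(BIBTEX_FIELDS))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in kept)
    return f"@article{{{key},\n{body}\n}}", dropped


record = {
    "author": "Van de Sompel, Herbert and Beit-Arie, Oren",
    "title": "Open linking in the scholarly information environment "
             "using the OpenURL framework",
    "journal": "D-Lib Magazine",
    "year": "2001",
    "volume": "7",
    "number": "3",
    # Hypothetical fields with no BibTeX counterpart in this sketch.
    "comment_thread": "discussed in group meeting",
    "related_to": "wil04h",
}

entry, dropped = to_bibtex("van01", record)
```

Here `dropped` lists `comment_thread` and `related_to`, which is the kind of information that only a lossless export format can carry.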

Because of its openness and extensibility, ShaRef can be used as a foundation for additional layers of software. As outlined above, more advanced technologies for keyword handling, for example Topic Maps, and more advanced technologies for handling cross-referencing, such as ClaiMaker (Uren 2003), could be added on top of ShaRef. Since ShaRef also has an API, applications wishing to use ShaRef as a back-end technology can do so by using the ShaRef API and providing an interface of their own.

Conclusions

In addition to the features described above, two interesting aspects of ShaRef have been left out so far. The first is that ShaRef is mainly designed as a Java-based GUI application built on a client/server architecture. Since client and server communicate through a web service API, this API is also available to other clients that wish to access the ShaRef server. Thus, ShaRef is not only a tool, but can also be regarded as a service.

Furthermore, ShaRef supports online and offline modes. In the online mode, the Java client communicates with the server. In offline mode, however, all data resides on the client. This configuration is ideal for traveling or for users who are only interested in the management functionality of ShaRef, but not in the publishing and sharing features.

ShaRef is under construction, but we believe that the blend of features combining personal knowledge management, collaboration, and reference handling is unique, and we hope it will make ShaRef a success at ETH Zürich and elsewhere. We are closely collaborating with the local library to integrate ShaRef as well as possible with the library's OpenURL resolver, and we hope that ShaRef will play a useful role in bringing users closer to the library, and in making the library services more easily accessible for users.

References

Bechhofer, Sean, et al. 2004. OWL Web Ontology Language Reference: W3C Recommendation 10 February 2004. [Online]. Available: http://www.w3.org/TR/2004/REC-owl-ref-20040210/ [Accessed November 5, 2004].

Uren, Victoria, et al. 2003. Scholarly Publishing and Argument in Hyperspace. In The Twelfth International World Wide Web Conference, pp. 244-250, Budapest, Hungary: ACM Press. [Online]. Available: http://www2003.org/cdrom/papers/refereed/p137/p137-uren.html [Accessed November 5, 2004].

Van de Sompel, Herbert and Beit-Arie, Oren. 2001. Open linking in the scholarly information environment using the OpenURL framework. D-Lib Magazine 7(3). [Online]. Available: http://www.dlib.org/dlib/march01/vandesompel/03vandesompel.html [Accessed November 5, 2004].

Wilde, Erik. 2004. Usage and Management of Collections of References. Zürich, Switzerland: Computer Engineering and Networks Laboratory, Swiss Federal Institute of Technology, TIK-Report No. 194. [Online]. Available: http://dret.net/netdret/publications#wil04h [Accessed November 5, 2004].

Issues in Science and Technology Librarianship, Fall 2004
DOI:10.5062/F4CR5R94