Dewey linked data: making connections with old friends and new acquaintances

Joan S. Mitchell, Michael Panzer

We address the history, use cases, and future plans associated with the availability of the Dewey Decimal Classification (DDC) system as linked data. Parts of the DDC have been available as linked data since 2009. Our initial offering included the DDC Summaries (the top three levels of the DDC) in eleven languages exposed as linked data in dewey.info, an experimental web service. In 2010, we extended the content of dewey.info [1] by adding assignable numbers and captions from the Abridged Edition 14 data files in English, Italian, and Vietnamese. In mid-2012, we extended the content of dewey.info once again by adding assignable numbers and captions from the schedules and geographic table in the latest full edition database, DDC 23. We will discuss the behind-the-scenes development and data transformation efforts that have supported these offerings, and then turn our attention to some uses of Dewey linked data and to future plans for Dewey linked data services.

[1] http://dewey.info.

J.S. Mitchell, Dewey linked data. JLIS.it. Vol. 4, n. 1 (Gennaio/January 2013). DOI: 10.4403/jlis.it-5467 http://dx.doi.org/10.4403/jlis.it-5467

History

The history of Dewey linked data is an evolving story of opportunity and experimentation, with an eye toward usability and use of the data. In 2009, the DDC 22 Summaries, an authorized derivative work based on the top three levels of DDC 22, had already been translated into ten languages (more languages than the full edition of the DDC on which the data were based). We decided to experiment with making the DDC Summaries available as linked data in an experimental web service, dewey.info.
Our initial design goals included:

• provide an actionable URI for every class;
• encode the classification semantics in RDF/SKOS;
• provide representations for machines and for humans;
• make the data usable under a widely understood license used in the Semantic Web community.

Publishing Dewey as linked data required development decisions on several fronts. First, we had to develop a URI pattern that would support the identification of several different kinds of entities and relationships. The URIs had to act as dereferenceable identifiers that could deliver representations of the referenced resources in a RESTful manner. Each class had to be identified with a URI, and the data had to be presented in a reusable way. In developing the URI pattern, we had to provide for the full complexity of the DDC at any time: identification of the scheme, parts of the scheme, edition, language, and time slice.

Figure 1: Versions of the DDC based on DDC 22.

Figure 1 shows the status of DDC 22 at the time of initial development of URIs for the DDC. DDC 22 was initially published in 2003; the various DDC 22 translations were published in 2005 (German), 2007 (French), 2009 (Italian), and 2011 (Swedish-English mixed version). Abridged Edition 14 (a logical abridgment of DDC 22) was published in 2004; translations followed in 2005 (French), 2006 (Italian and Vietnamese), and 2008 (Hebrew and Spanish). The DDC Summaries based on DDC 22 were published in English and ten other languages at the time of the introduction of dewey.info.
Besides the DDC Summaries, figure 1 includes three other authorized derivative works based on DDC 22: 200 Religion Class (2004), an updated subset of DDC 22; Guide de la classification décimale de Dewey, a French-language customized abridgment of DDC 22; and DDC Sachgruppen, a German translation of selected DDC 22 top-level classes (including some below the three-digit level) developed for the primary use case of organizing the national bibliographies of Germany, Austria, and Switzerland (the four languages in the box on the right-hand side of figure 1 are translations of DDC Sachgruppen; all five language versions are used in the national bibliography of Switzerland).

Dewey.info includes representations for machines and for humans; the latter are particularly important for illustrating the DDC data offerings to a wider community beyond traditional users of value vocabularies from the library community. The data in dewey.info are presented in human (XHTML+RDFa) and machine (RDF) versions (the machine version of dewey.info has three different RDF serializations: RDF/XML, Turtle, and JSON).

The Dewey URIs have the following general pattern:

http://dewey.info/{object-collection}/{object}/{snapshot-collection}/{snapshot}/about

Specific documents have a variable resource name component and allow specification of content language and type (format):

http://dewey.info/{object-collection}/{object}/{snapshot-collection}/{snapshot}/{resource-name}.{language}.{content-type}

An object is a member of the DDC domain and part of an object collection. The object collection specifies the type of the object. The object collection is a mandatory component and can have one of the values "scheme," "table," "class," "manual," "index," "summary," and "id." A specific object from that collection follows if required.
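As a rough illustration, the two URI patterns above can be assembled by a small helper. This is a sketch only and not part of dewey.info: the function name dewey_uri and its parameter names are hypothetical, and the rule that omitted components are simply skipped is an assumption drawn from the patterns as quoted.

```python
from typing import Optional

BASE = "http://dewey.info"

# Object collections named in the text; the collection is mandatory.
OBJECT_COLLECTIONS = {"scheme", "table", "class", "manual", "index", "summary", "id"}


def dewey_uri(
    collection: str,
    obj: Optional[str] = None,
    snapshot_collection: Optional[str] = None,
    snapshot: Optional[str] = None,
    resource: Optional[str] = None,
    language: Optional[str] = None,
    content_type: Optional[str] = None,
) -> str:
    """Assemble a URI following the general pattern (hypothetical helper).

    Assumption: any component that is not supplied is left out of the path.
    """
    if collection not in OBJECT_COLLECTIONS:
        raise ValueError(f"unknown object collection: {collection!r}")
    if resource is None and (language or content_type):
        raise ValueError("language/content type require a resource name")

    # Path part: collection, then the optional object and snapshot components.
    parts = [BASE, collection]
    parts += [p for p in (obj, snapshot_collection, snapshot) if p]
    uri = "/".join(parts) + "/"

    # Document part: {resource-name}.{language}.{content-type}
    if resource:
        uri += ".".join(p for p in (resource, language, content_type) if p)
    return uri
```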
For example:

http://dewey.info/class/576.83/
http://dewey.info/scheme/
http://dewey.info/table/2/

A snapshot is used to refer to versions of objects at specific points in time. Snapshots can be part of a snapshot collection, e.g., "e22," referring to every concept version that is part of Edition 22 of the DDC. In the following examples, the first URI is an example of a snapshot, the second is an example of a snapshot collection, and the third is an example of a snapshot-collection/snapshot/ combination.

http://dewey.info/class/641/2009/
http://dewey.info/class/641/e22/
http://dewey.info/class/641/e23/2012-08/

Language and format are also accommodated in the URI:

http://dewey.info/class/641/about.it
http://dewey.info/class/641/about.rdf
http://dewey.info/class/641/about.it.html

While SKOS is often the RDF vocabulary of choice for representing controlled vocabularies on the Web, its initial development was largely informed by thesaurus-like knowledge structures. Panzer ("DDC, SKOS, and linked data on the web") and Panzer and Zeng ("Modeling Classification Systems in SKOS: Some Challenges and Best-practice Recommendations") have noted some of the challenges in representing classification data in SKOS. Since the initial DDC linked data offering did not include complicated note types or relationships between classes other than those expressed by the notational hierarchy, the shortcomings in SKOS noted elsewhere with respect to the representation of classification data did not pose a major roadblock in the exposure of the DDC 22 Summaries in dewey.info.

The query http://dewey.info/class/641/about.it.rdf delivers the following machine-actionable representation in RDF/SKOS, which focuses on presenting concept metadata together with number and caption information plus basic semantic relationships.
Note that the two main entities retrieved are http://dewey.info/class/641/ and http://dewey.info/class/641/2007/02/about.it, connected through a dct:hasVersion relationship:

Listing 1: Example of concept metadata representation in RDF/SKOS.
[RDF/XML markup not preserved. Values remaining from the listing: "OCLC Online Computer Library Center, Inc."; language "it"; http://dewey.info/class/641/about.it.rdf; http://dewey.info/class/641/; http://dewey.info/class/641/2007/02/about.it; number 641 with caption "Cibi e bevande"; timestamps 2000-01-01T00:00:00.0+01:00 and 2006-01-28T22:04:16.000+0100.]

Finally, we needed an appropriate license model. We make data on dewey.info available under a Creative Commons BY-NC-ND license [2]. Licensing information is embedded in RDF and RDFa following the Creative Commons Rights Expression Language (ccREL) specification [3]. In the RDF/SKOS extract above, the following licensing information is embedded in the RDF:

Listing 2: CC license embedded in RDF/SKOS.
[RDF/XML markup not preserved. Value remaining from the listing: "OCLC Online Computer Library Center, Inc."]

[2] http://creativecommons.org/licenses/by-nc-nd/3.0.
[3] http://wiki.creativecommons.org/CcREL.

A year after the initial offering, we extended the data available in dewey.info with the addition of assignable numbers and captions from Abridged Edition 14 in three languages (English, Italian, and Vietnamese). This extension added about 3500 additional records for each language to the data already available in dewey.info.
While the DDC Summaries represented a broader set of languages than was available in the full and abridged translations, the new abridged-edition offerings were a subset of the languages into which the edition had been translated. Why were English, Italian, and Vietnamese chosen? The simple answer is that each was available in the same proprietary format, ESS XML, for which we already had an RDF/SKOS transformation.

Parallel to the linked data work, the Dewey editorial team was carrying out a major data transformation of another type: moving from the proprietary "ESS" format to one based on the MARC 21 Classification and Authority formats. In 2009, the DDC Summaries were transformed from ESS XML to RDF/SKOS; we used the same transformation to make the Abridged Edition 14 data available in dewey.info. In 2010, OCLC moved to a new underlying representation for the DDC, adopting one based on the MARC 21 formats for classification data (to represent class records) and authority data (to represent Relative Index and mapped terminologies associated with class records). At the same time, OCLC adopted MARCXML as the distribution and ingest format for DDC data across versions, and moved to a new data distribution and ingest model (previously, data transfers were handled at the individual file level over an FTP site). We decided to delay the distribution of additional DDC data in dewey.info until we could put into production the data transformation and distribution process operating on the new format and within the distribution environment.
This meant taking the data encoded in MARCXML from the distribution server, applying the RDF/SKOS transformation stylesheet, and associating the result with a "subscription," automatically creating an Atom feed of data sets that a user agent (in this case, dewey.info) could pick up from the distribution server over a RESTful interface. A model of the process is shown in figure 2.

Figure 2: Dewey distribution environment.

We installed the pieces on the distribution server that would make this possible in May 2012. In mid-June 2012, we added assignable numbers and captions from the DDC 23 schedules to dewey.info; this addition of over 38,000 numbers increased the available Dewey linked data nearly tenfold. In August 2012, we further extended Dewey linked data by adding the assignable notation and captions from Table 2 (the Dewey geographic table).

Next steps

Our next planned offering is the linking of a "new acquaintance," GeoNames, to Table 2 data. Because we want to manage all editorially curated data (including mappings) with the OCLC ESS system, this will require short-term and long-term changes to geographic data within the system. In order to allow the provision of geographic data on the class level, the Dewey editorial team developed MARC Proposal No. 2011-10 [4], which was approved by MARBI in June 2011. The proposal defines new fields that allow for the storage and display of geographic codes in MARC classification records, thereby enabling the reuse of parts of the Relative Index links to GeoNames (generated by the matching algorithm) on the class level in applications downstream, e.g., in linked data representations of the DDC.

Use cases

In addition to linking plans, we report on use cases that facilitate machine-assisted categorization and support discovery in the Semantic Web environment.
It is important to have use cases for Dewey linked data, and to solicit new use cases that might inform decisions about our data offering. Institutions such as Bibliothèque nationale de France, the British Library, and Deutsche Nationalbibliothek have made use of Dewey linked data in bibliographic records and authority files. FAO has linked AGROVOC to our data at a general level. We are also exploring links between the DDC and other value vocabularies such as VIAF, FAST, ISO 639-3 language codes, and MSC (Mathematics Subject Classification). Here we focus on three use cases: a caption service, the "old friend" of DDC synthesized number components associated with categorized content, and the "new acquaintance" of DDC-GeoNames links.

[4] http://www.loc.gov/marc/marbi/2011/2011-10.html.

Caption service

Querying Dewey linked data

The first use case is a simple one: querying Dewey linked data by a Dewey number to have the associated caption delivered as an explanation of the number. For example, the query http://dewey.info/class/945.5/about will return information about class 945.5, including the captions "Regione della Toscana" and "Tuscany (Toscana) region." There are also two ways in which these data are made accessible to machines and can therefore be used in an automated way as part of a library catalog or other discovery tool. The HTML page for class 945.5 contains structured data in RDFa markup, which means that user agents can distill caption information as regular RDF triples. Another very powerful and flexible way is to access the triple store directly through the SPARQL endpoint.
Listing 3: Query that returns all distinct captions associated with class number 945.5

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT DISTINCT ?caption
    WHERE {
      { GRAPH ?g {
          ?concept skos:notation "945.5"^^[datatype IRI not preserved] ;
                   skos:prefLabel ?caption
      } }
    }

Note that the endpoint supports HTTP bindings of the SPARQL protocol, meaning that the endpoint serves as a general web service interface (in case the linked data presentation is not preferred).

DDC-DDC number components links

The second use case is an enhancement of data in dewey.info taken from the DDC itself: links to Dewey synthesized number components. The concept is simple: What if we linked every synthesized number to its component parts? For example, 641.59455 represents the cooking of Tuscany (641.59 Cooking characteristic of specific continents, countries, localities + T2—455 Tuscany [Toscana] region). The underlying Dewey data include the MARC 21 765 Synthesized Number Components field:

    765 0# $b 641.59 $z 2 $s 455 $u 641.59455

By establishing a link between 641.59455 and T2—455 (represented as "$z 2 $s 455" in the 765 field and as "2–455" in the URI string), it is possible to isolate the geographic facet and use it to foster alternative approaches to discovery. The potential enhancements to such discovery are discussed in the next section.

DDC-GeoNames links

Linking Dewey data with GeoNames offers the opportunity to extend the boundaries of categorization and discovery. Since GeoNames has emerged not only as the dominant source for geographic coordinates in the linked data space, but also as a leading provider of identifiers (URIs) for geographic entities, a GeoNames term can act as a general equivalent or a boundary object for data from different domains that have never been directly mapped to each other.
The linking of two concepts in different schemes or from different domains to the same GeoNames entity helps to establish a common "aboutness" of these two terms. Figure 3 illustrates how a common link to a GeoNames term from a geographic class in dewey.info and from a New York Times subject heading for the same geographic area establishes a strong (albeit implicit and untyped) relationship between these two terms, because both entities are "about" the same city. By extension, it can be assumed that all articles and other resources indexed with the NYT heading should be discoverable by the DDC class, which adds to the amount of categorized content that can be retrieved by using this DDC number in a discovery interaction.

Figure 3: Links to GeoNames.

Links to datasets like GeoNames extend the boundaries of DDC classes on a conceptual level as well. Whereas a traditional mapping between knowledge organization systems (KOS) usually connects entities of the same type (e.g., concepts), linking in the sense of the Semantic Web can connect different kinds of named/identified entities. While a mapping between concepts often operates with variations of semantic relationships traditionally employed by thesauri (e.g., broader/narrower, related, whole/part), linking of different types of entities requires a new set of relationships tailored to the domain model of the linked dataset or value vocabulary. In the case of GeoNames, in order to store the links in MARC, we have to use a traditional mapping relationship.
However, in a linked data version, the SKOS mapping relationships (corresponding to traditional thesaurus relationships) cannot be used to link Dewey classes and GeoNames terms, because GeoNames URIs identify a gn:Feature, which is defined as "a geographical object" and, being a subclass of http://schema.org/Place, as an entity with a "physical extension." In other words, GeoNames (like many other ontologies) does not contain descriptions of or identifiers for concepts of places; it contains descriptions of and identifiers for the places themselves. A GeoNames URI identifies a locality, not a concept of a locality. In such cases, a relationship like foaf:focus should be used, which "relates a conceptualisation of something to the thing itself."

This operation effectively connects a Dewey concept with a different set of relationships, which can be used to offer information seekers compelling tools to identify and select geographic features for resource discovery. In essence, it opens up a new perspective or viewpoint on the arrangement of classes in Dewey.

Figure 4 shows in parallel two different kinds of neighborhoods applicable to T2—6626 Niger. The established Dewey "neighborhood" shows the class in the context of the DDC notational hierarchy. Linking this class to its corresponding GeoNames feature, however, allows for reusing GeoNames' gn:neighbour relationship and applying it directly to this Dewey class. The right-hand side shows the concept T2—6626 surrounded by the features that neighbor, in the physical world, the country that is its foaf:focus.

Figure 4: Two views of T2—6626 Niger.
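The right-hand view of figure 4 can be approximated with a toy example. The sketch below is an illustration under stated assumptions, not the dewey.info implementation: the triples are hand-written sample data, the "feature:..." identifiers are hypothetical placeholders rather than real GeoNames URIs, the Dewey notations other than T2--6626 are included for illustration only, and the helper names are invented. It follows foaf:focus from a Dewey concept to its feature, follows gn:neighbour to adjacent features, and then follows foaf:focus backwards to the Dewey concepts for those features.

```python
FOCUS = "foaf:focus"
NEIGHBOUR = "gn:neighbour"

# Hand-written sample triples (subject, predicate, object). The feature
# identifiers are placeholders, not real GeoNames URIs.
TRIPLES = {
    ("T2--6626", FOCUS, "feature:niger"),
    ("T2--6623", FOCUS, "feature:mali"),
    ("T2--669", FOCUS, "feature:nigeria"),
    ("T2--6743", FOCUS, "feature:chad"),
    ("feature:niger", NEIGHBOUR, "feature:mali"),
    ("feature:niger", NEIGHBOUR, "feature:nigeria"),
    ("feature:niger", NEIGHBOUR, "feature:chad"),
    # A neighbouring feature with no Dewey concept linked in this toy data.
    ("feature:niger", NEIGHBOUR, "feature:algeria"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the data."""
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def neighbouring_concepts(concept):
    """Dewey concepts whose focus is a neighbour of this concept's focus."""
    found = set()
    for feature in objects(concept, FOCUS):          # concept -> feature
        for nearby in objects(feature, NEIGHBOUR):   # feature -> neighbours
            found |= subjects(FOCUS, nearby)         # neighbours -> concepts
    found.discard(concept)
    return sorted(found)


print(neighbouring_concepts("T2--6626"))
```

Neighbouring features that have no linked Dewey concept (here, the placeholder for Algeria) simply drop out of the result, which is one reason complete linking of Table 2 matters for this use case.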
Taking this one step further, linking all geographic Dewey concepts to GeoNames allows for on-the-fly switching of the viewpoint as needed, effectively transforming the concepts temporarily into features and, by using inherited properties like geographic coordinates, placing them on a map (figure 5).

Figure 5: Blending of Dewey viewpoint and geographic viewpoints.

Furthermore, DDC classes can utilize more than just relationships inherited from geographic features. The links also allow for a more expressive typing of related DDC entities and open the door to geospatial reasoning over the underlying DDC data. For example, it is usually not clear whether a Dewey number represents a country (or another type of entity). But in the above example, the "inherited" types allow for basic viewpoint-transgressing queries such as: "Display all Dewey numbers that represent countries that are adjacent to T2—6626."

Figure 6 shows another example of transgressing viewpoints. Table 2 is mainly arranged by continents, which means that countries that span different continents are separated notationally, i.e., they do not occupy a contiguous span of Dewey numbers. This may even be true for cities in these countries, e.g., Istanbul in Turkey occupies subdivisions of both T2—4 and T2—5. While Dewey provides all the relationships necessary to relate the European and Asian parts of Turkey, they are divided notationally, so it is not a simple task for a discovery system to offer the user a compelling way of selecting subentities for retrieval.
Using the inherited gn:neighbour relationship, however, makes it easy to display classes about the European part of Turkey (e.g., T2—49618, shown with its Relative Index terms in yellow) and the Asian part (e.g., T2—5632, shown with its Relative Index terms in green) together in a geobrowser like Google Earth using the geographic viewpoint.

Figure 6: Overlaying Dewey classes and Relative Index terms on a map using properties of linked entries.

Conclusion

The contents of dewey.info and links to Dewey data have evolved over time as we have taken advantage of various opportunities for experimentation. With each addition, we have considered possible use cases for the additional data. The following statement appears in the last paragraph of the final report of the W3C Linked Library Data Incubator Group (2011):

    Linked data follows an open-world assumption: the assumption that data cannot generally be assumed to be complete and that, in principle, more data may become available for any given entity.

The schema-less RDF data model allows for a substantial degree of freedom (compared to the relational database paradigm) in leveraging existing data by enrichment and addition of new connections almost ad hoc. Our efforts to publish the DDC as a linked data value vocabulary have taken place in a rich and evolving Dewey ecosystem. Figure 7 shows the current state of translations and versions published, planned, or under way based on DDC 23 data; where known, expected publication dates are shown in parentheses. Figure 8 shows the current mappings and crosswalks between the DDC and other knowledge organization systems. We expect to continue extending linked DDC data within the rich environment described in figures 7 and 8 to meet use cases in categorization and discovery.

Figure 7: Editions and versions based on DDC 23.

Figure 8: Mappings and crosswalks to the DDC.

References

Panzer, Michael. "DDC, SKOS, and linked data on the web". Proc. of Everything Need Not Be Miscellaneous: Controlled Vocabularies and Classification in a Web World, Montréal, Canada, August 5 2008. 2008. http://www.oclc.org/news/events/presentations/2008/ISKO/20080805-deweyskos-panzer.ppt.

Panzer, Michael and Marcia Lei Zeng. "Modeling Classification Systems in SKOS: Some Challenges and Best-practice Recommendations". Semantic interoperability of linked data: Proceedings of the International Conference on Dublin Core and Metadata Applications. Seoul, October 12-16 2009. Ed. S. Oh, S. Sugimoto, and S. A. Sutton. Seoul: Dublin Core Metadata Initiative and National Library of Korea, 2009. 3–14. http://dcpapers.dublincore.org/ojs/pubs/article/view/9748.

JOAN S. MITCHELL, OCLC. mitchelj@oclc.org http://staff.oclc.org/~dewey/joan.htm

MICHAEL PANZER, OCLC. panzerm@oclc.org http://staff.oclc.org/~dewey/michael.htm

Mitchell, J.S., M. Panzer. "Dewey linked data: making connections with old friends and new acquaintances". JLIS.it. Vol. 4, n. 1 (Gennaio/January 2013): Art: #5467. DOI: 10.4403/jlis.it-5467. Web.

ABSTRACT: This paper explores the history, use cases, and future plans associated with the availability of the Dewey Decimal Classification (DDC) system as linked data. Parts of the DDC system have been available as linked data since 2009. Initial efforts included the DDC Summaries in eleven languages exposed as linked data in dewey.info. In 2010, the content of dewey.info was further extended by the addition of assignable numbers and captions from the Abridged Edition 14 data files in English, Italian, and Vietnamese.
During 2012, we will add assignable numbers and captions from the latest full edition database, DDC 23. In addition to the "old friends" of different Dewey language versions, institutions such as the British Library and Deutsche Nationalbibliothek have made use of Dewey linked data in bibliographic records and authority files, and AGROVOC has linked to our data at a general level. We expect to extend our linked data network shortly to "new acquaintances" such as GeoNames, ISO 639-3 language codes, and Mathematics Subject Classification. In particular, the paper examines the linking process to GeoNames as an example of cross-domain vocabulary alignment. In addition to linking plans, the paper reports on use cases that facilitate machine-assisted categorization and support discovery in the Semantic Web environment.

KEYWORDS: DDC; Dewey linked data; Dewey Decimal Classification

Submitted: 2012-04-25
Accepted: 2012-08-31
Published: 2013-01-15