Outgoing
Library metadata techniques and trends by Thom Hickey

Astronomical FITS images

Now for something a little different: Since retiring from OCLC I don't do a lot with library metadata, but I've recently had some fun exploring astronomical images, which come with their own data/metadata format: FITS, the Flexible Image Transport System.

Everyone who wants to share astronomy data uses FITS. It was developed in the early 1980's and has a strong FORTRAN flavor in how the data is stored. Having processed the variable-length fields inherent in MARC records with FORTRAN, I can appreciate the attractiveness of fixed-length blocks, arrays of binary data and 80-byte card images to the engineers/scientists of the time.

One of the wonders of our time is all the astronomical work that is being done, and that within a year or two of the observations much of the data is publicly available. The image at the head of this post came from the Hubble Legacy Archive, which has an interface that will allow you to search by name and star catalog numbers, select the type of image you are interested in, and view previews of images before downloading the FITS file.

Of course most of the fun in working with the images is writing some of the code that makes it possible. There are lots of programs available that will help you look at FITS files, such as FITS Liberator, which will 'liberate' FITS images into something that Photoshop can process. Those are nice, but farther away from bare metal than I like to be. So I wrote a little program in J that does some rudimentary processing with FITS data.

J (download it here) is a slightly obscure (but actively used and maintained) language derived from APL. Possibly more accurately, J evolved from APL in an effort led by Kenneth Iverson, the inventor of APL. While it does take some initial effort to become proficient in array languages such as J, it is remarkable how much can be done in a few characters.
Admittedly those few characters may take some deciphering, but so would the much longer code they replace. In some ways it reminds me of trying to use a new alphabet, such as Cyrillic. At first the script is confusing or actually misleading, but once learned the characters just become letters.

I got introduced to APL in the late 1970's when I first joined OCLC. At the time OCLC ran on Sigma computers from Xerox/Honeywell. Xerox tried to compete with IBM in the early 70's, and APL was one of the few languages available on Sigma machines (in general we did most things in CP-V assembler, which was really quite nice). Their APL was clunky and slow and OCLC didn't have an APL terminal, but it worked and I used it to do some research into how people were using search keys on the OCLC system (not so well!).

J can be described as a fusion of APL and Backus's FP. It is open source, easy to install, does not require a special alphabet and the things it can do with arrays are amazing, if not always immediately obvious. One of the things I like about it is the brevity of the code. Having experimented with compact code in Python (Z39.50 client on a t-shirt), I find it surprisingly easy to work with dense code because you can see so much of it at once.

J does invite a certain amount of point-free coding, a style that confuses me at times, but can be quite elegant. Map-reduce is another style of functional programming that can be difficult at first to get comfortable with, but turns out to be very powerful. We used map-reduce extensively at OCLC, so I came to J with some familiarity with that aspect of the language.

The code that produced the image at the top of the post can be found at github.com/ThomasBHickey/JFits. It consists of about 60 lines of code in two files and includes a couple of sample FITS images as well.
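For readers who don't read J, the header layout mentioned earlier (80-byte card images packed into fixed-length blocks) can be sketched in a few lines of Python. This is not the JFits code. It assumes the 2880-byte block size from the FITS standard, parses a made-up header rather than a real file, and the names parse_header and make_card are purely illustrative. It also splits values from comments naively at the first '/', which would break on a quoted string containing a slash.

```python
CARD_LEN = 80     # each header "card image" is 80 bytes of ASCII
BLOCK_LEN = 2880  # assumed FITS block size: 36 cards per block


def parse_header(data):
    """Return (cards, data_offset) for the first header found in `data` (bytes)."""
    cards = {}
    pos = 0
    while pos + CARD_LEN <= len(data):
        card = data[pos:pos + CARD_LEN].decode("ascii")
        pos += CARD_LEN
        keyword = card[:8].strip()
        if keyword == "END":
            # The data array starts at the next block boundary after END.
            data_offset = -(-pos // BLOCK_LEN) * BLOCK_LEN
            return cards, data_offset
        if card[8:10] == "= ":
            # Naive: everything up to the first '/' is treated as the value.
            cards[keyword] = card[10:].split("/", 1)[0].strip()
    raise ValueError("no END card found")


def make_card(keyword, value=None, comment=""):
    """Build one 80-byte card; used only to create the sample data below."""
    if value is None:
        text = keyword
    else:
        text = "%-8s= %20s" % (keyword, value)
        if comment:
            text += " / " + comment
    return text.ljust(CARD_LEN).encode("ascii")


# A made-up header describing a 100 x 50 image of 16-bit integers.
sample = b"".join([
    make_card("SIMPLE", "T", "conforms to FITS standard"),
    make_card("BITPIX", 16, "bits per pixel"),
    make_card("NAXIS", 2),
    make_card("NAXIS1", 100),
    make_card("NAXIS2", 50),
    make_card("END"),
]).ljust(BLOCK_LEN, b" ")

header, data_offset = parse_header(sample)
print(header, data_offset)
```

With a real file, the bytes would come from reading the file in binary mode, and the pixel array would begin at data_offset, stored big-endian with the width given by BITPIX.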
Since JFits is one of my first J programs, I asked the J programming forum to take a look at it, and they came back with a number of suggestions, so most of the clever code probably came from them. I've tried to keep the code reasonably straightforward so it might be worth a look, but if you are interested in astronomy, it isn't that hard to explore what's available without doing any programming at all. Don't expect the images to look just like the ones you see published, however. Those have had a fair amount of processing (often in Photoshop) to tease out the most pleasing parts and suppress many of the instrument artifacts that seem to be in all the images.

--Th

April 22, 2018 | Permalink | Comments (0)

FRBR and Humphry Clinker

Some may remember OCLC Research's work (obsession?) with Tobias Smollett's The Expedition of Humphry Clinker. I believe it was Ed O'Neill who got us started with it, using it as an example of a work with a well-defined text (it was Smollett's last novel and evidently never revised by him), but with many manifestations since it first appeared in 1771. It is an important early (picaresque epistolary) novel, and was popular through most of the 19th century.

At any rate, we spent quite a bit of time with the bibliographic records in WorldCat that describe the various editions of Humphry Clinker, and I recently happened upon a notebook that had printouts of 106 Humphry Clinker records as they were in WorldCat in August of 1988. The highest OCLC number in the group is just under 17 million, and we thought that was a lot (they are now nearing one billion).

106 records isn't that many, so I thought it would be interesting to compare them to current WorldCat and our FRBR work clustering.

The first thing that struck me was how old fashioned the records look now. Comparing them to the current records, they have all been touched in some way.
They now have many more subject headings and class numbers, RDA fields, and corrected typos, and quite a few have been merged as duplicates.

Here's a summary of what I found, comparing them to the 'enhanced' version of WorldCat used for FRBR processing. 10 of the 106 records have been merged into other WorldCat records (properly as far as I could tell). All of the others except one are collected together in one FRBR 'work' and linked to the VIAF work record http://viaf.org/viaf/180810175. The one exception turns out to be bound with Smollett's Peregrine Pickle, and so qualifies as a collected work and currently is not linked to either one.

In fact, the FRBR cluster found an additional 14 records created before August 1988 that it considers Humphry Clinker. Looking at them, they all either spelled Humphry as Humphrey, or didn't have the title in English. Evidently I didn't pull in the Humphrey Clinkers, either by design or oversight. In fact, back in the 1980's, our software wasn't sophisticated enough to find even small spelling variants such as Humphry vs Humphrey, much less non-English versions.

As part of this retrospective, I pulled all the WorldCat records in the current Humphry Clinker work set: 730 records!

I mentioned earlier that all the records appear to have been 'touched' since 1988. To get some feel for that, I looked at the records' 040 field, which shows who modified the record. The earliest 20 of the 1988 records had 14 modifications made to them, half by OCLC and half by other libraries. The earliest 20 in the current sample show almost 10 times that: 136 modifications, 85 of those made by OCLC. In contrast, the most recent 20 records added to WorldCat have been modified 9 times, all by OCLC. Altogether, the 730 current records show 1,856 modifications, 1,502 of those by OCLC.

Of course, one of the most striking changes that WorldCat has undergone since 1988 is the addition of metadata in languages other than English.
In fact, 301 of the 730 current Humphry Clinker records are non-English descriptions; altogether the descriptions are in 14 different languages: English, German, French, Danish, Polish, Spanish, Italian, Dutch, Catalan, Swedish, Romanian, Hungarian, Slovenian, and Serbian. Looking at the language of the books being described, 49 of the 730 were not in English, not counting the 15 'undetermined': German, Russian, French, Hungarian, Romanian and Danish. VIAF was able to find (or create) 8 non-English expression records.

--Th

The image at the top is by Isaac Cruikshank from the Fine Arts Museums of San Francisco.

February 26, 2016 | Permalink | Comments (1)

More about justlinks

We had an earlier post about the 'justlinks' view of VIAF clusters, but I thought it would be worthwhile to explore how that can combine with other VIAF functionality.

First a reminder of how the justlinks view works. While the default view of clusters to Web browsers is the HTML interface, VIAF clusters can be displayed in several ways, including the raw XML, RDF XML, MARC-21 and justlinks JSON. Here's a request for justlinks.json:

    http://viaf.org/viaf/36978042/justlinks.json

which returns:

    {
    "viafID":"36978042",
    "B2Q":["0000279733"],
    "BAV":["ADV11117013"],
    "BNE":["XX904401"],
    "BNF":["http://catalogue.bnf.fr/ark:/12148/cb122767803"],
    "DNB":["http://d-nb.info/gnd/114712638"],
    "ISNI":["000000010888091X"],
    "LAC":["0064G7865"],
    "LC":["n90602202"],
    "LNB":["LNC10-000054199"],
    "N6I":["vtls000101241"],
    "NKC":["js20080511012"],
    "NLA":["000035338539"],
    "NLI":["000501536"],
    "NLP":["a11737736"],
    "NSK":["000051380"],
    "NTA":["073902861"],
    "NUKAT":["vtls000205390"],
    "PTBNP":["70922"],
    "SELIBR":["256753"],
    "SUDOC":["031580661"],
    "WKP":["Q6678817"],
    "XA":["2219"],
    "ORCID":["http://orcid.org/0000000229258764"],
    "Wikipedia":["http://en.wikipedia.org/wiki/Lorcan_Dempsey"]}

Ralph LeVan came up with this and we think it is pretty neat! But wait, it gets even better!
Each of the IDs in this record that is a 'source record' ID to VIAF (in this case everything except the ORCID ID and the en.wikipedia URI) can be used to retrieve the cluster. Here's how to pull justlinks.json using the LC ID:

    http://viaf.org/viaf/sourceID/LC|n90602202/justlinks.json

HTTPS works too:

    https://viaf.org/viaf/sourceID/NSK|000051380/justlinks.json

All the different views of the clusters can be requested either through the explicit URIs shown here, or through HTTP headers, and they in turn can be combined with sourceID redirection.

--Th

November 09, 2015 | Permalink | Comments (0)

Extracting information from VIAF

Occasionally I run into someone trying to extract information out of VIAF and having a difficult time. Here's a simple example of how I'd begin extracting titles for a given VIAF ID. Far from industrial strength, but it might get you started.

The problem: Have a file of VIAF IDs (one/line). Want a file of the titles, each preceded by the VIAF ID of the record they were found in.

There are lots of ways to do this, but my inclination is to do it in Python (I ran this in version 2.7.1) and to use the raw VIAF XML record:

    from __future__ import print_function
    import sys, urllib
    from xml.etree import cElementTree as ET

    # reads in list of VIAF IDs one/line
    # writes out VIAFID\tTitle one/line

    # worry about the name space
    ns = {'v':'http://viaf.org/viaf/terms#'}
    ttlPath='v:titles/v:work/v:title'

    def titlesFromVIAF(viafXML, path):
        vel = ET.fromstring(viafXML)
        for el in vel.findall(path, ns):
            yield el.text

    for line in sys.stdin:
        viafid = line.strip()
        viafURL = 'https://viaf.org/viaf/%s'%viafid
        viafXML = urllib.urlopen(viafURL).read()
        for ttl in titlesFromVIAF(viafXML, ttlPath):
            print('%s\t%s'%(viafid, ttl.encode('utf-8')))

That's about as short as I could get it and have it readable in this narrow window. We've been using the new print function (and division!) for some time now, with an eye towards Python 3.
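For anyone running Python 3, a minimal sketch of the same idea might look like the following. It reuses the namespace and title path from the script above, and it assumes VIAF still serves raw XML at the same URL, which may not hold. The names titles_from_viaf, main and SAMPLE are illustrative, and SAMPLE is a made-up fragment (not a real VIAF record) so the parsing step can be tried without touching the network.

```python
import sys
import urllib.request
import xml.etree.ElementTree as ET

# Namespace and path taken from the Python 2 script above.
NS = {"v": "http://viaf.org/viaf/terms#"}
TITLE_PATH = "v:titles/v:work/v:title"


def titles_from_viaf(viaf_xml, path=TITLE_PATH):
    """Yield each non-empty title found in one VIAF XML record (bytes or str)."""
    root = ET.fromstring(viaf_xml)
    for el in root.findall(path, NS):
        if el.text:
            yield el.text


def main(lines=sys.stdin, out=sys.stdout):
    """Read VIAF IDs one/line, write VIAFID<TAB>Title one/line."""
    for line in lines:
        viafid = line.strip()
        if not viafid:
            continue
        # Assumption: same URL pattern as the original script.
        with urllib.request.urlopen("https://viaf.org/viaf/%s" % viafid) as resp:
            viaf_xml = resp.read()
        for title in titles_from_viaf(viaf_xml):
            print("%s\t%s" % (viafid, title), file=out)


# Made-up sample record, only to exercise the parsing without a network call.
SAMPLE = b"""<v:VIAFCluster xmlns:v="http://viaf.org/viaf/terms#">
  <v:titles>
    <v:work><v:title>First title</v:title></v:work>
    <v:work><v:title>Second title</v:title></v:work>
  </v:titles>
</v:VIAFCluster>"""

sample_titles = list(titles_from_viaf(SAMPLE))
print(sample_titles)
```

Calling main() would then process IDs from standard input as before. Since Python 3 strings are already Unicode, the explicit encode step goes away.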
--Th

Update 2015.09.16: Cleaned up how namespace is specified

September 14, 2015 | Permalink | Comments (0)