Issues in Science and Technology Librarianship | Summer 2000
DOI:10.5062/F4RV0KPJ
URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.
Searching print reference sources for thermodynamic data can be a tedious and often frustrating task for librarians and scientists alike. This article describes ThermoDex, a web-based finding aid developed at the University of Texas at Austin, that indexes over 200 thermodynamic data collections and handbooks. ThermoDex allows a user to identify specific resources that might contain particular types of data, and offers a way to rediscover underutilized sources that might otherwise be overlooked. It strives to serve both end users and reference librarians as a link between traditional print resources and the web tools preferred by most users.
"I need to find the eutectic point for a mixture of naphthalene and diphenylamine. Where would I find that?"
Many science librarians dread questions like this, and for good reason. The search for data in the physical sciences can be time-consuming and frustrating for librarians and researchers alike, and the failure rate is probably quite high, especially for patrons who choose not to seek help.
The importance of accurate property data in science and engineering has been well documented over the years. (Arny 1984; Lide 1981; Maizell 1998) If people who need data cannot find them readily, or if the data they do find are erroneous, inaccurate, or of dubious origin, potentially costly and dangerous mistakes can be made both in the laboratory and in the real world.
A survey of American Chemical Society members in 1965 demonstrated that scientists did not think too highly of print data compilations at the time. One respondent neatly summarized the problems inherent in published data compilations: 1) They are never current; 2) One can't be sure of the reliability of data collected from primary sources; 3) Data are difficult to locate without knowing in advance which compilations contain which data; 4) Researchers often prefer more expedient but less effective ways of hunting for data (e.g., searching in a bibliographic index). (Weisman 1967) These observations certainly hold true today, but despite the added presence of various kinds of digital data resources, print compilations remain important sources of this information.
The rapid acceptance of remote access to digital library resources has made many researchers more reluctant than ever to visit the library and to consult librarians. In addition, researchers accustomed to split-second search results from online tools are much less likely than previous generations to spend long periods hunting for printed data in their library. Paradoxically, as the volume of scientific data expands, the patience and mobility of researchers are diminishing. These factors tend to marginalize the print collections that librarians have spent decades developing and maintaining. Highly useful tools that exist only in print form are unknown to new generations of potential users, and their "old timer" users are gradually disappearing.
What can libraries do to keep useful print resources on the radar screen? Librarians have created innumerable pathfinders, reference guides, bibliographies, card files, and other finding aids over the years. Some have been published, and proved useful for a time. (e.g., Northup & Cromer 1993) But ultimately a bibliography is hobbled by being frozen in time, and the frequent revisions needed to keep it current are very time-consuming. Finally, print guides, whether published or used as local library handouts, reach a fairly limited audience.
ThermoDex does not contain the actual property data. Rather, it is a "meta-index" or finding aid in which a searcher can locate books that might contain the data sought. For example, for the question above, one would select "eutectic temperature" from a list of Properties, and (optionally) "organic" and/or "liquids" from a list of Compound types. The search engine queries the database and returns a list of possible sources of these data. The sources are listed with title and library call number, and the searcher can then consult those books individually in the library, or e-mail the librarian for help in doing so.
ThermoDex serves three primary purposes:
The database includes both well-known and obscure sources, but they have generally been selected for their usefulness and availability, and they represent a wide variety of properties and compounds. There is definitely overlap among the sources: some data are common and relatively easy to find in many places, while other data are much more difficult to locate. One criterion for inclusion is that a book must be predominantly composed of data tables or graphs, rather than text and theory. Most data-intensive books include some explanatory and theoretical background, but this should be minimal. Most indexed sources also fall into the category of secondary sources, i.e., those that gather and organize data originally reported elsewhere, usually in the primary journal literature. Some compilations are critically evaluated; however, most are not (see note 2).
ThermoDex is not meant to be the first, last, or only resort in the quest for hard-to-find data. Users are advised to consult well-known handbooks first, because one can answer a great many questions with standard tools such as the CRC Handbook of Chemistry and Physics or Lange's Handbook of Chemistry. Consequently, these resources are generally excluded from ThermoDex. On the other end of the spectrum, large chemistry series such as the Beilstein and Gmelin Handbooks (and their online equivalents, Beilstein/Gmelin Crossfire) contain so much data at the compound level that they should also be separately consulted. A third major handbook series, Landolt-Börnstein, is selectively represented in ThermoDex, as is the old National Standard Reference Data Series (NSRDS) published by the National Bureau of Standards. Fee-based online datafiles, such as those found on STN, are excluded.
New books are selected for addition to ThermoDex as the library acquires them; anyone is encouraged to suggest additions to the database. Since the database has reached a level of critical mass, it is no longer growing very rapidly -- perhaps 10-20 new records are added each year. While most are part of the Chemistry Library's reference collection, items from other science branches on campus also appear. Expanding the content beyond UT's collection has always been a goal, and for a time Penn State University contributed indexing for additional titles not held at Texas.
(Property 1 OR Property 2 OR Property 3 ...) AND (Compound 1 OR Compound 2 OR Compound 3 ...)
The user does not need to select both a property and a compound. A search on one section alone will yield hits that do not specify anything from the other section. The checkboxes represent the most commonly sought terms within each section, while the scrolling box offers a much more extensive list of terms to choose from. Multiple items within a scrolling box can be selected by pressing the Control key (Windows) or the Apple key (Macintosh) when clicking on them.
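The Boolean logic described above can be sketched in a few lines of Python. This is only an illustration, not the actual ThermoDex implementation: the `Source` record, its field names, and the sample titles and call numbers are all invented for the example. It also assumes that leaving a section empty places no restriction on that section, which is one reading of the behavior described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One indexed handbook (all field names are assumptions)."""
    title: str
    call_number: str
    properties: frozenset
    compounds: frozenset


def search(sources, properties=(), compounds=()):
    """(P1 OR P2 ...) AND (C1 OR C2 ...); an empty section matches everything."""
    wanted_props = set(properties)
    wanted_comps = set(compounds)
    hits = []
    for source in sources:
        # OR within a section: any shared term is enough.
        prop_ok = not wanted_props or bool(wanted_props & source.properties)
        comp_ok = not wanted_comps or bool(wanted_comps & source.compounds)
        # AND between the two sections.
        if prop_ok and comp_ok:
            hits.append(source)
    return hits


# Invented sample records; these are not real ThermoDex entries.
SOURCES = [
    Source("Hypothetical Phase Diagram Tables", "QD 000 H1",
           frozenset({"eutectic temperature", "melting point"}),
           frozenset({"organic", "liquids"})),
    Source("Hypothetical Alloy Data Book", "TN 000 H2",
           frozenset({"eutectic temperature"}),
           frozenset({"metals"})),
    Source("Hypothetical Vapor Pressure Handbook", "QC 000 H3",
           frozenset({"vapor pressure"}),
           frozenset({"organic"})),
]

for hit in search(SOURCES, ["eutectic temperature"], ["organic", "liquids"]):
    print(hit.title, "|", hit.call_number)
```

Searching on "eutectic temperature" alone would return both the phase diagram and alloy records in this toy data set, since no compound restriction is applied.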
Sample Results List
Sample Full Record
The fact that any given chemical substance can have an almost infinite variety of names -- common names, systematic names from various conventions, trade names, acronyms, etc. -- makes working with chemical information very complicated. Thermodynamic handbooks are prepared by compilers who use chemical names as they themselves see fit, and there is little or no standardization. Some recent compilations contain Chemical Abstracts Registry Number indexes to reduce confusion, but earlier books lack this feature. Users often are ignorant of the CAS Registry system anyway, and will not have a registry number in hand. Many books have molecular formula indexes, but if the formula is not known in advance, these are of little use. Most disturbing is the number of handbooks that contain no indexes at all, although these are mainly the older ones.
The Compounds list in ThermoDex uses common names for substances that appear frequently in handbooks. (Examples: methane, argon, ethanol.) More complex compounds are lumped under a general heading describing the type of compound (Examples: hydrocarbons, organic).
For handbooks that cover hundreds or thousands of chemical compounds, it is not feasible to create indexing for each and every compound, so general compound-type headings are used instead. Creating compound-specific indexing for all the handbooks in ThermoDex would require thousands of hours of expert labor and would have to be based on a system that collated standard synonyms under CAS registry numbers, with reference to the actual data points included for each. The impossibility of doing this is unfortunate, given that most requests for thermodynamic data are indeed compound-specific, and the general headings like "organic" and "inorganic" that must be assigned to books in the name of brevity are often too broad to be useful.
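To make the idea of collating synonyms under CAS registry numbers concrete, the following is a minimal sketch of what such a lookup might look like. It is not something ThermoDex provides; the data structure and function names are assumptions, the synonym lists are deliberately tiny, and a real system would also need to record which data points each handbook reports for each compound.

```python
# Synonyms grouped under CAS Registry Numbers (a deliberately tiny sample).
SYNONYMS_BY_CAS = {
    "64-17-5": ["ethanol", "ethyl alcohol"],
    "106-46-7": ["1,4-dichlorobenzene", "p-dichlorobenzene"],
}


def build_name_index(synonyms_by_cas):
    """Invert the table so any known name leads to its registry number."""
    index = {}
    for cas, names in synonyms_by_cas.items():
        for name in names:
            # casefold() makes the lookup case-insensitive.
            index[name.casefold()] = cas
    return index


def resolve(name, index):
    """Return the registry number for a name, or None if it is unknown."""
    return index.get(name.strip().casefold())


NAME_INDEX = build_name_index(SYNONYMS_BY_CAS)

print(resolve("Ethyl Alcohol", NAME_INDEX))
print(resolve("p-Dichlorobenzene", NAME_INDEX))
```

The lookup itself is simple; the expense noted above lies in building and verifying the synonym table and the per-handbook coverage data, which is the part that requires expert labor.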
Like chemical nomenclature, there is often little agreement on what to call certain thermodynamic properties. Non-specialists, including the creator of ThermoDex, can be bewildered by the array of confusing and overlapping terminology, symbols, and units. (Northup 1993, p.61.) Without specialized knowledge, it is often difficult to interpret a table filled with numbers, Greek characters, subscripts, and cryptic notations. (Even specialists can have trouble with them, and this can usually be blamed on the compiler, who may be the only person to whom a data table is actually clear! User-friendliness does not seem to be a consideration for some compilers, editors, and publishers of data handbooks.)
This makes indexing handbooks a challenging task, and the rule of thumb is to keep it simple. The goal of ThermoDex is to point users to potential sources of data; the onus of interpretation and use of the data found is left mainly to the user. Some basic decisions on terminology were made early on (e.g., using "heat of X" in place of "enthalpy of X" in all cases), but there is definitely some overlap and inconsistency in some of the Properties headings. Experts who use ThermoDex are encouraged to send in corrections and clarifications at any time.
If a handbook contains data for only one compound, such as ethanol, it makes sense to assign the compound term "ethanol". Ideally the more general terms "alcohols" and "organic" would also be assigned, and in some cases this has been done. But inconsistencies grow over time, and it is difficult to systematically review the database locating and correcting them. The current architecture of the database does not allow global subject authority revisions.
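One way such inconsistencies might be located systematically is sketched below. This assumes a simple table of broader terms (each heading pointing to at most one more general heading), which is not how the ThermoDex database is described as working; the table, record titles, and function names are all hypothetical.

```python
# Hypothetical broader-term table: each heading maps to one more general heading.
BROADER = {
    "ethanol": "alcohols",
    "alcohols": "organic",
}


def with_broader_terms(term, broader=BROADER):
    """Return the term followed by each successively broader heading."""
    chain = []
    seen = set()
    while term is not None and term not in seen:  # guard against cycles
        seen.add(term)
        chain.append(term)
        term = broader.get(term)
    return chain


def missing_broader_terms(records, broader=BROADER):
    """Report, per record, the broader headings that were never assigned."""
    report = {}
    for title, assigned in records.items():
        expected = set()
        for term in assigned:
            expected.update(with_broader_terms(term, broader))
        missing = expected - set(assigned)
        if missing:
            report[title] = sorted(missing)
    return report


# Invented records for illustration only.
RECORDS = {
    "Hypothetical Ethanol Data Book": {"ethanol"},
    "Hypothetical Organic Handbook": {"organic"},
}

print(missing_broader_terms(RECORDS))
```

A report like this would only flag candidates for review; deciding whether a broader heading really belongs on a record would remain an indexing judgment.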
After several years of user feedback, it is clear that the principal obstacle that users encounter in ThermoDex is the overspecificity of queries. For example, suppose that a researcher is looking for heat of formation data for 1,4-dichlorobenzene. There is, predictably, no specific compound entry in ThermoDex for 1,4-dichlorobenzene. But if one searched for "heat of formation" with "organic", several handbooks would be pulled up, some of which would likely contain these data. Unfortunately, some searchers give up before making this generalization. The Help page attempts to advise users of the best search techniques, and a related {Thermodynamics page} offers additional commentary for those hunting this kind of data. But as librarians know, most people don't consult help screens.
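The generalization step that searchers are asked to make by hand can be illustrated with a short sketch. ThermoDex is not described as doing this automatically; the code below simply shows the reasoning, assuming a hypothetical table of broader headings and an invented index keyed by (property, compound) pairs.

```python
# Hypothetical mapping from a specific term to a more general heading.
BROADER = {
    "1,4-dichlorobenzene": "organic",
    "ethanol": "alcohols",
    "alcohols": "organic",
}

# Invented index: (property, compound heading) -> titles of possible sources.
INDEX = {
    ("heat of formation", "organic"): [
        "Hypothetical Handbook A",
        "Hypothetical Handbook B",
    ],
    ("heat of formation", "ethanol"): ["Hypothetical Ethanol Data Book"],
}


def search_with_broadening(prop, compound):
    """Try the specific compound first, then each broader heading in turn."""
    term = compound
    seen = set()
    while term is not None and term not in seen:  # guard against cycles
        seen.add(term)
        hits = INDEX.get((prop, term), [])
        if hits:
            return term, hits
        term = BROADER.get(term)
    return None, []


matched, titles = search_with_broadening("heat of formation", "1,4-dichlorobenzene")
print(matched, titles)
```

In this toy example the specific compound has no entry, so the search falls back to "organic" and returns the two general handbooks, which the searcher would then check individually.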
Another enhancement is to include links to Web-based resources, such as the NIST Chemistry WebBook. At present ThermoDex is limited to printed materials, but it could be programmed to include electronic tools in the same way. This is a priority for further development.
ThermoDex has proven to be an interesting and positive experiment in opening up library collections to all kinds of users. It addresses one of the most difficult and time-consuming areas of reference work. Perhaps most importantly, it attempts to use digital library technology, however simply, to link remote patrons with print collections that are underutilized but still highly useful.
2. Critical evaluation of data is the systematic evaluation of reported data for accuracy, consistency, and clarity of presentation. In the U.S., university, private and governmental data evaluation centers carry out this work. Principal among them is the Standard Reference Data Program at the National Institute of Standards and Technology (NIST; URL: http://www.nist.gov/srd/intro.htm). Uncritical compilations gather and republish reported data without thorough evaluation for accuracy. (see Arny 1984, pp.17-25.)
Lide, David R. 1981. "Critical data for critical needs." Science 212(4501) 1343-49.
Maizell, Robert E. 1998. How to Find Chemical Information. 3rd ed. Wiley, New York. p.403-40.
NIST Chemistry WebBook. [Online] Available: http://webbook.nist.gov/ [August 10, 2000]
Northup, Diana, and Cromer, Donna. 1993. "Thermodynamic properties of substances: a selected annotated guide to the printed literature." Science & Technology Libraries 14(1) 57-95.
Weisman, Herman. 1967. "Needs of American Chemical Society members for property data." Journal of Chemical Documentation 7(1) 9-14.
Young, Robyn V., ed. 2000. World of Chemistry. Gale Group, Detroit, p.1084.
This article is based on poster sessions presented at the American Chemical Society Spring National Meeting in 1998, and at the ACRL-STS program at the ALA Annual Conference in Chicago in 2000.
The author owes a debt to current and past staff in the UT General Libraries Digital Libraries Services Division (DLSD) for their ongoing assistance and expertise in making ThermoDex work: Audrey Templeton, Erik Grostic, Ladd Hanson, and Mark McFarland (Division Head).