Issues in Science and Technology Librarianship | Fall 2008
DOI:10.5062/F4348H87
URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.
This article is the second part of the previously published review about the DOE Data Explorer database (Ayers 2008). It discusses the types of data that can be found using the Data Explorer by focusing on the content provided by the Data Resource Centers. Please remember that the Data Explorer is just a tool for pointing the user in the right direction. The Explorer sends the user to other sites that may have the data needed. These sites do not share the structure or format of the DOE Data Explorer, and the user may need to search further once at the data collection site. As shown in the previous article, users access the list of all data centers by selecting "DOE Data Center" with the browse feature. This is the starting point for the examples in this article.
The Department of Energy (DOE) {Data Explorer} is a relatively new and currently unsophisticated research tool that helps researchers, students, and the public find stored and maintained data sets. The site claims to have cited over 200 data sets and is continuing to grow. The DOE does not claim responsibility for the accuracy and availability of the stored data. The purpose of the engine is to make both archived and active data easier to find. The Data Explorer is operated and maintained by the DOE's Office of Scientific and Technical Information (OSTI), which is responsible for providing all the bibliographic information in the database based on the information found at the web sites hosting the data.
The Data Explorer indexes collections of scientific research data, figures and plots, numeric files, scientific images, interactive maps, multimedia, and computer simulations. The data collections themselves reside on various servers in numerous possible locations. Examples of such locations would be: national laboratories, data centers, colleges and universities, corporations, or international organizations, just to name a few. Access to all the data collections is free; however, some may require password registration. Users should also be aware that they may have trouble accessing some of the data if they do not have the appropriate program or software on their computers.
There are nine data resource centers listed by the Data Explorer. This article goes into some detail about what is available at each data center and gives the user some keywords that can be used to find the data. Please keep in mind that these are summaries of what is available at the Data Resource Centers. This is not a complete list of all the types of data available, nor will all possible subject categories and keywords be included in the summaries. A list of browsable subject categories can be found at {http://www.osti.gov/dataexplorer/}. The list of subject category definitions is available at {http://www.osti.gov/dataexplorer/subjcats.html}. Links to these lists can also be found through the "site index" link in the menu bar on the home page. Keywords range from very broad to subject specific and are used to describe the entire data set, not just the individual data. Geographic regions, project names, and site locations are often used as keywords.
The content type of the first data center consists of interactive data maps and numeric files/data sets. The subject categories listed are 9-Biomass Fuels, 10-Synthetic Fuels, 29-Energy Planning, Policy and Economy, and 54-Environmental Sciences. Keywords that will lead the user to data at this site include: flexible fuel vehicles, fuel efficiency, biomass fuels and resources, alternative fuels, clean cities, and trends. Clicking on "Data Collection" will take the viewer directly to the AFDC homepage. From here the user has a menu of 10 options to choose from:
Some of these have interactive maps leading to desired data. The data are often in Excel format with graphs and data points. Many of the analysis and trend reports are in PDF format. For example, if the user is looking for information on U.S. corn production and the amount used to make ethanol, then the user would select "Biomass Resources" from the list above and click on the file for "U.S. Total Corn Production and Amount Used for Ethanol" to get the spreadsheet in Figure 1. To see the data, the user simply clicks on the data tab in the spreadsheet.
The primary subject category for this data center is 54-Environmental Sciences. Content types include figures/plots, numeric files/data sets, scientific images, and multimedia. Global climate change, aerosols, radiometric, cloud properties, radiative divergence, marine stratus clouds, and paleoclimate are some of the keywords used to access the data provided by ARM. There are 17 data collections listed under this data center. Clicking on "Data Collection" will take one directly to the appropriate ARM web site for general information about sites, the data collection process, and instrumentation. The data themselves can only be accessed through the ACRF archive data page. Here the user has several options (Figure 2), all of which require a login to access. There is the "Data Browser," which has two interfaces: the "Novice Interface" and the "DataStream Interface." The Data Cart holds the data streams the user wishes to order; there is no charge for the data, though registration is required for access. The "Catalog Browser" allows the user to view a summary of all the data available in an interactive table. The "Thumbnail Browser" is used to find plots and images quickly. There is a separate browser for Intensive Operation Periods (IOPS) and PI data. ARM also provides access to NCVweb, a NetCDF tool for plotting the data users have acquired from the archive. This is a source for atmospheric state, aerosols, atmospheric carbon, cloud properties, radiometric, and surface properties data.
Thirty-four data collections are listed under this heading. The content comes mostly in the form of numeric files and data sets, with the occasional interactive data map and computer model/simulation for certain topics. The primary subject heading for the data collections is 54-Environmental Sciences. Some of the common keywords in this section are carbon dioxide, vegetation, carbon cycles, carbon isotopes, and carbon sequestration. Other keywords include: DIC, fine particle management, meteorological data, volcanoes, carboocean, soil data, salinity, World Ocean Circulation Experiment (WOCE), Greenhouse effect, global climate change, and stream discharge, to name a few. A few of these collections are located at separate sites and will bring up different home pages, but the majority are accessed through the CDIAC's Global Change Data and Information Products By Subject page. Links to this page from the DOE Data Explorer will take you directly to the desired subject heading on the CDIAC's subject page. The data are found in a range of formats, from PDF, Excel, and ASCII to netCDF. Some sites require the use of anonymous FTP to access data, while others require the use of interactive maps to get the data the user may be looking for. Examples of the type of data that can be found through this data center are: Historical CO2 record from the Law Dome DE08, DE08-2, and DSS ice cores; carbon-related and hydrographic data from WOCE Pacific Ocean cruises; historic CH4 records from Antarctic and Greenland ice cores; and global, regional, and national fossil-fuel CO2 emissions, among others.
If the user clicks on "Data Collection" under the heading "Atmospheric Trace Gases, Carbon Isotopes, Radionuclides, and Aerosols: Aerosols Data from the Carbon Dioxide Information Analysis Center (CDIAC)," a list of data sets is displayed. Selecting the data set named "Historical atmospheric CO2 record from the extended Vostok ice core (2003)" will take the user to another web page. Here the user selects graph or digital data to view the information (Figure 3).
The four data collections listed under this data center all lead to the same page and provide cancer, radiation exposure, and mortality data. Since the majority of the data are related to humans, registration and acceptance of a confidentiality agreement are required to access the free data provided by this data center. A few files that contain no restricted data (RERF's) are set up so the public can view a sample of the data before actually registering. Instructions for registering to gain access to the complete data set are clearly presented. The subject categories for this data center include: 59-Basic Biological Sciences; 61-Radiation Protection and Dosimetry; and 63-Radiation, Thermal, and other Environmental Pollutant Effects on Living Organisms and Biological Material. Some of the keywords that can be used to access these data are epidemiologic data, occupational exposure, ionizing radiations, hazardous materials, and environmental doses. The content type is listed as specialized mix. The "Death Summary Tables" are available in PDF without authorization. The example shown in Figure 4 is for {Mortality Among Female Nuclear Weapons Workers (mffegwa1)}. To access this sample, click on "Data Collection" under any of the data collection headings, scroll down to "Death Summary Tables," and click on the above title in the list.
The content type for the five data collections here is mostly numeric files/data sets, with figures/plots available from the Electron-Impact Ionization of Multicharged Ions data collection from Oak Ridge National Laboratory (ORNL). The primary subject category for these collections is 70-Plasma Physics and Fusion Technology, although a few also list 74-Atomic and Molecular Physics as a subject category. Some of the keywords listed are electron collisions, thermonuclear fusion, transport cross sections, ion optics, double ionization, charge transfer, and quantum mechanical calculation. The information from the A&M Collision Databases and Particle-Surface Interaction Databases provides a list of articles. Clicking on the article number will give access, in text format, to the data used by the author for the publication. The remaining four data collections provide access to information, data, and graphs in a web format. The ORNL CFADC Redbooks volumes 1, 3, 4, and 5 can be viewed on the web and are listed as one of the data collections. The remaining three data collections are interactive and require the user to select the desired data sets and the type of graph on which the data will be plotted. The user is able to view the graphs and the data in a new window. The data available include cross sections (single, double, or triple ionizations), charge transfers, and vibrationally resolved transitions. If the user wishes to access the A+M Collision and Particle Surface Interaction Databases, all the user has to do is click on "Data Collection" under the title. A web site comes up listing citations to published articles. The user then clicks on the number next to the citation to view the data used in the article. Figure 5 shows some of the data for the following citation:
"Atomic and Molecular Data for Fusion, Part I - Recommended Cross Sections and Rates for Electron Ionization of Light Atoms and Ions" K. L. Bell, H. B. Gilbody, J. G. Hughes, A. E. Kingston, F. J. Smith. J. Phys. Chem. Ref. Data 12, 891 (1983). [This file contains databases 1 and 5]
The subject category for both collections listed here is 59-Basic Biological Sciences. Some of the keywords for these sites are genomics, eukaryotic genomics, microbial genomic, gene sequences, tree of life, and blast, to name a few. The content listed for both collections is specialized mix, which includes pictures, description, and text. Both collections, Eukaryotic Genomics Data and Microbial Genomics Data, provide pictures and descriptions of the selected microbe or eukaryote. Downloads of assembly, filtered models (best), functional annotations, all models filtered and unfiltered, and ETS are available after the user has accepted the data release policy. Please note that not all of the options just listed are available for each record. There are differences depending on whether the item searched is a microbe or a eukaryote. For eukaryotes, KOG and KEGG data are also available. Special software may be needed to view some of the downloadable material correctly. Both data collections offer access to the BLAST Alignment Search Program, where the user needs to input the parameters for the search. This search screen is not for novices. Some knowledge of genomics and its terminology is necessary to use it correctly. Another search option available at this site is the Tree of Life. This is an interactive map which can also be used to access information on eukaryotes, archaea, and bacteria.
Clicking on "Data Collection" for either one leads to a page where users can select the eukaryotic or microbial genome they are interested in from a dropdown menu. The site then displays a page about the selected organism, where the user can read about it and select more options from the menu at the top (Figure 6).
These 11 collections consist mostly of numeric files/data sets, but there are also some interactive data maps, and at least one is listed as being a specialized mix. Subject categories for this section generally include 73-Nuclear Physics and Radiation Physics, 72-Physics of Elementary Particles and Fields, and 74-Atomic and Molecular Physics.
Some of the keywords for these data collections are isotopes, nuclear decay, neutron reactions, photonuclear reactions, resonance parameters, particle-induced reactions, fission product yields, nuclear structure, hypernucleus, nuclides, radiation, and gamma radiation. Among the things that can be found here are an interactive map which gives the half-life and radioactive decay of a radionuclide, nuclear reaction data, experimental nuclear reaction data, and bibliographic data. Many of these sites have forms in which the user inputs the parameters of interest and retrieves data pertaining to those parameters. Most of the data are available in HTML format. Some downloadable files may require specialized software to be viewed correctly.
Figure 7 is an example of an interactive map where one clicks on the point of interest and the data are displayed below. In this instance the map is the Interactive Chart of Nuclides found by clicking on the term "Data Collection" under the title of Chart of Nuclides from the National Nuclear Data Center (NNDC).
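Once a half-life has been looked up on a chart such as this, it can be put to use with the standard exponential decay law. The following is a minimal sketch in Python, not part of the NNDC site or the Data Explorer; the function names are hypothetical, and the half-life of 10 time units is a placeholder rather than a value for any real nuclide.

```python
import math


def remaining_fraction(elapsed, half_life):
    """Fraction of the original nuclei left after `elapsed` time.

    Uses N(t)/N0 = (1/2) ** (t / T_half); both arguments must be
    expressed in the same time unit.
    """
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return 0.5 ** (elapsed / half_life)


def decay_constant(half_life):
    """Decay constant lambda = ln(2) / T_half, in inverse time units."""
    if half_life <= 0:
        raise ValueError("half_life must be positive")
    return math.log(2) / half_life


# Placeholder half-life (NOT a real nuclide): 10 time units.
# After three half-lives, one eighth of the sample remains.
print(remaining_fraction(30.0, 10.0))  # prints 0.125
```

A half-life read from the chart would replace the placeholder, with the elapsed time given in the same unit.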
Among the subject headings used for the 10 data collections under the RReDC heading are: 14-Solar Energy, 54-Environmental Sciences, 15-Geothermal Energy, and 17-Wind Energy. While the majority of the data are numeric files/data sets, at least three collections have interactive data maps and at least one has scientific images and figures/plots. Some of the keywords used to describe these data are solar radiation, circumsolar radiation, weather data, seismic data, solar irradiance, flux measurements, meteorological data, wind speed, scattering, and irradiance. Much of the data are available in text, PDF, Excel, or HTML format, with some larger files available as ZIP files. Most of the data are organized by geographic location or by location of site measurements and then by month and year at those locations. A few of the sites also have some historical data available. Types of data that can be found include solar irradiance (one site offers a daily solar calendar), wind roses, temperatures, geothermal energy locations, and ideal locations for wind energy.
A user looking for wind resources for the state of Illinois would click on "Data Collection" under the title Wind Resource Maps for U.S. States from Wind Powering America. This would take the user to a web page with a map. The user would then click on the state of Illinois on the map to access the desired data (Figure 8).
The data in these three collections are either numeric files/data sets or a specialized mix. The subject headings for these collections include 60-Applied Life Sciences and 63-Radiation, Thermal, and other Environmental Pollutant Effects on Living Organisms and Biological Material. Keywords for these collections include radiochemistry, pathology, human tissue samples, inhalation toxicology, and radiobiology. All three collections take the user to different pages of the same web site. The data are available in text, Excel, and PDF, depending on the type of data (for example, narrative, numerical, or tabular). Information on radiation as it relates to physical health can be found here.
Figure 9 is an Excel document of the de-identified case 269, which can be found by selecting "Data Collection" under the title De-identified Case Data from the United States Transuranium and Uranium Registries (USTUR). On the next page the user would click on "Access Radiochemistry Data" and then select "0269" under "Whole Body Donations." The user should note that not all the case files are available.
The above are examples of some of the information that can be found using the DOE Data Explorer. The article provides a selection of the keywords and subject headings available for narrowing searches or finding information. This is not a complete or comprehensive list, and exploration of the Data Explorer and related data collection sites is encouraged.