
 
Collaborative Batch Creation for Open Access E-Books: A Case Study 
 
Philip Young, Rebecca Culbertson, Kelley McGrath   
 
Abstract 
  
When the National Academies Press announced that more than 4,000 electronic books 
would be made freely available for download, many academic libraries expressed interest 
in obtaining MARC records for them.  Using cataloging listservs, volunteers were 
recruited for a project to identify and upgrade bibliographic records for aggregation into a 
batch that could be easily loaded into catalogs.  Project organization, documentation, 
quality control measures, and problems are described, as well as processes for adding 
new titles.  The project’s implications for future efforts are assessed, as are the numerous 
challenges for network-level cataloging.    
 
 
Introduction 
 
In June 2011, the National Academies Press (NAP) issued a press release announcing that 
portable document format (PDF) versions of their books could be freely downloaded 
from their website.1  While about 65 percent had previously been available for download, 
and almost all were available to read online through a web reader, over 4,000 books were 
now downloadable, as would be most books issued in the future.   
 
NAP books are primarily reports from scientific panels on a variety of topics, and are often
ordered in print by academic libraries.  A link to the online version is frequently added to
the print record in either the library catalog or OCLC.  A library’s catalog web page for
a book can also include a link to the e-book via a link resolver or JavaScript that uses 
elements of the MARC record to create a search.  However, these methods depend on the 
presence of the print version in the catalog, and most libraries do not have all of the 
available titles.  The announcement by NAP presented an opportunity to fill those gaps 
and add nearly all of the content in electronic form to a library’s catalog. 
 
In several academic libraries, collection development librarians expressed interest in 
providing access to the e-books in their catalogs.  A week after the press release, a 
catalog librarian made an inquiry on the Batch cataloging listserv2 to discover whether 
records were available for the newly accessible e-books.  An OCLC collection set could 
be purchased for 2,580 books published through 2008, but the records were for the print 
version with a link added.  In 2008 the library contributing these print records stopped
cataloging titles for this set because it was switching to separate records for print and
electronic versions in its local catalog and could no longer support adding links to print
records.  Most libraries now use separate records for print and electronic versions of
monographs.  One respondent to the inquiry contacted NAP and was told that MARC
bibliographic records were not available from them, though records could be ordered 
through NetLibrary or Ebrary.  Neither vendor answered inquiries about the availability
and cost of a NAP record set.  A third vendor was discovered but offered only a subset of 
the available titles. 
  
Soon, another listserv respondent offered to organize a project to create a list of OCLC 
record numbers for the NAP e-books that any library could batch search and download.  
The record batch would therefore be free (that is, no cost above the OCLC subscription)
and weeded of the duplicates that plague e-book cataloging, and the project would offer an
opportunity to upgrade the records.  Doing so at the network level would save individual
libraries from the work of batch searching and weeding, as well as the quality control 
often necessary after loading records into the local catalog.  While lacking an explicit 
cost, the record batch would depend on time volunteered by catalog librarians to create it. 
 
Libraries in general add fewer online open access resources to their catalogs than might 
be expected. The most commonly added materials in this category are government 
documents. Most libraries receive government document records in batches through a 
service such as Marcive or OCLC. Many libraries load records for e-journals in the 
Directory of Open Access Journals, especially since the records are often available from 
vendors who provide journal records for electronic resource management systems 
(ERMs). Some libraries also add selected websites to their catalogs. Although there are 
many freely available online e-books, these are not often added to library catalogs in 
large numbers. One reason for this is the lack of organized record batches that could be
quickly loaded into the catalog. By contrast, vendors often provide MARC bibliographic
records as an inducement for customers to buy a particular collection of e-books.
However, there are potentially many advantages to including open access e-books
in library catalogs. In addition to expanding a library’s collection, open access e-books 
can also be used as a collection weeding tool,3 and libraries will likely want to provide
access to the increasing number of open access textbooks.4    
 
Unlike some very large open access e-book collections, the NAP collection is small
enough that collaborative batch creation seemed a reasonable, attainable goal.  The
creation of a curated record batch would ensure record quality and reduce the burden on 
any individual library wishing to provide local access to this collection. Additionally, the 
collection consists of recent and ongoing scholarship, whereas larger freely available 
collections tend to be dominated by older public domain content.  Since all libraries have 
access to the content, the NAP e-books and similar collections seem ideally suited for 
large-scale cooperative cataloging of record batches. 
 
 
Literature Review 
 
Academic consortia most frequently report collaborative work on record sets, usually 
involving quality control of vendor records.  Cary and Ogburn describe the origins of 
what may be the earliest effort, involving a group of Virginia academic libraries with 
access to the same content.5  In an attempt to avoid duplication of work, they contacted 
other consortia, but no other shared cataloging agreements were discovered.  Their first 
project involved a set of vendor records that were improved and shared via file transfer
protocol (FTP).  Catalog librarians in the consortium differed in skill and experience, and 
the project revealed “a significant need for training and help in interpreting and applying 
cataloging rules and standards.”  Shieh, Summers, and Day subsequently provided a more 
detailed account of the same project, including difficulties in loading, record quality, and 
authority work.6  They note, “further research is needed on administrative implications of 
cooperative cataloging in consortia, addressing equitable allocation of personnel, 
scheduling in conjunction with local projects, and cost/benefit for participating 
institutions.”7  
 
Martin and Mundle relate a consortial effort to improve vendor records through 
communication rather than through record editing.8  Record problems were reported on a 
discussion list by libraries in the consortium, then aggregated and forwarded to the 
vendor.  They found that communicating with each other and the vendor to improve 
records before receipt (by reviewing sample record sets) was the best way to ensure
quality metadata, aided by the greater influence a consortium has compared with a single
library.  In contrast to the pre-distribution quality control employed by the NAP project,
the authors suggest that libraries may best serve their users by “working to improve 
accuracy, completeness, and discoverability after access has been established.”9  
 
Chew and Braxton describe an Illinois consortium using a shared system and its effort to 
establish consortial standards for cataloging electronic resources.10  Among the issues
discussed are vendor restrictions on record sharing and the importance of record
identifiers, particularly for vendor records.
 
Preston focuses on how cooperative e-book cataloging work was “organized, negotiated, 
and divided among project participants” in an Ohio consortium.11  Work on specific 
record sets was negotiated by members at bimonthly meetings, and assignments depended
largely on the skills needed.  Issues of fairness can arise when only a small minority
contributes but all benefit. 
 
Cataloging work can be distributed in a variety of ways.  A post on the blog All Things 
Cataloged described a method used by the Bavarian library network, in which 
 

for one year one library ‘adopts’ one e-book package, taking responsibility for 
improving that package’s metadata (which includes adding subject headings, 
doing authority work and, where possible, linking print version and e-book). 
These automatic and manual improvements are then shared cooperatively.12 

 
Such consortial efforts toward batch record improvement, however, are rarely shared on a 
wider scale. 
 
Little research has addressed record batches for open access e-books.  Beall
discusses loading a record set for a very large collection of open access e-books
(Mbooks, now HathiTrust) into the catalog.13  Records were stripped of metadata to comply
with the requirements of OCLC’s member agreement, made available via the Open Archives
Initiative Protocol for Metadata Harvesting (OAI-PMH) standard, then crosswalked back
into MARC using the editing software MarcEdit.  Records were then improved using 
global update in the catalog.  Despite low metadata quality, the author felt that providing 
access to the content through the catalog was far more important than record accuracy 
and completeness.  
 
The Cooperative Online Serials group (CONSER) completed the first year of an Open 
Access Journal Project begun in April 2010 to catalog any e-journals in the Directory of 
Open Access Journals (DOAJ) lacking a CONSER record.14  The project recognizes the 
increasing importance of open access resources as libraries undergo journal cancellation 
projects and provide support for open access publishing initiatives.  Journals in DOAJ 
must meet certain criteria, such as scholarly content, peer review, and assignment of an 
ISSN.  The DOAJ project is similar to the NAP project in size, collection growth, and 
multidisciplinary content.  Records already exist for most of the content, and the project 
“decreases duplicative cataloging efforts.”  Titles were assigned according to catalogers’
expertise (e.g., language or subject knowledge), and a frequently asked questions (FAQ)
page on the project website is provided to assist participants.15  This project was so successful that
CONSER libraries signed up for a second round of cataloging new DOAJ titles.  
 
Hellman points to several large open access e-book collections and notes that libraries 
have not done a good job of including them in their catalogs.16 One effort singled out is 
the University of Pennsylvania’s Online Books Page, edited by John Mark Ockerbloom.17  
Over a million open access e-books are indexed, including large collections such as 
HathiTrust and Project Gutenberg. The metadata needs for these collections are 
sometimes great, and “libraries can make significant contributions, especially when they 
work cooperatively.”18  
 
 
Project Organization 
 
Once catalogers were in agreement about the significance of the project, attention quickly 
turned to implementation. Considerable work was required to provide participants with a 
spreadsheet of the NAP titles.  An initial title list was established based on a coverage
load for SFX (3,860 titles, current as of sometime in 2008).  SFX, an Ex Libris product, is
best known as an OpenURL link resolver, but also contains a knowledgebase.  The SFX
spreadsheet included only minimal metadata: title proper, provider name, and URL.  The
NAP identifier was extracted from the URL, and student workers filled out the
spreadsheet with ISBNs for searching and with each title’s availability status.  NAP
identifiers do not appear to be issued sequentially, and there are unused numbers in the
sequence.  Newer titles were identified by student workers who checked a range of NAP
identifiers on the NAP website.  They started from a number low enough to overlap
substantially with the identifiers provided by SFX, so that no titles would be missed, and
stopped when they encountered a long series of unused numbers.  All titles were sorted
into the following categories.
 

1. Available in PDF and assigned an ISBN  
2. Available in PDF but not assigned an ISBN
3. Available only in HTML/openbook format 
4. Forthcoming and prepublication titles 
5. Various categories for which a free e-book is not available
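The identifier scan described above (start low, stop after a long run of unused numbers) can be sketched as follows. This is a minimal illustration under stated assumptions, not a tool the project used; the article does not say how the student workers performed each check. `scan_identifiers` and its `is_used` callback are hypothetical, and the default gap of 50 is an arbitrary stand-in for "a long series of unused numbers."

```python
from typing import Callable, List


def scan_identifiers(start: int, is_used: Callable[[int], bool],
                     max_gap: int = 50) -> List[int]:
    """Scan upward from `start` and return every identifier that resolves to a title.

    Scanning stops after `max_gap` consecutive unused identifiers. `is_used` is a
    hypothetical lookup, e.g. a check of whether a NAP catalog page exists for
    that identifier.
    """
    found: List[int] = []
    gap = 0  # consecutive unused identifiers seen so far
    current = start
    while gap < max_gap:
        if is_used(current):
            found.append(current)
            gap = 0  # a hit resets the run of unused numbers
        else:
            gap += 1
        current += 1
    return found


# Usage with made-up identifiers: a threshold that is too small misses 130.
used = {100, 101, 105, 130}
print(scan_identifiers(100, used.__contains__, max_gap=10))  # [100, 101, 105]
print(scan_identifiers(100, used.__contains__, max_gap=30))  # [100, 101, 105, 130]
```

Starting below the highest SFX identifier guards against missing titles just past the end of the SFX list; the gap threshold is the real trade-off, since too small a value ends the scan early, as the first usage line shows.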

 
This information was imported into Microsoft Access, which was used to track the status 
of the project, sort titles into categories, and generate Excel spreadsheets for participants.  
 
Early in the project, there were about eight participants, but when progress was slower 
than hoped for, a second call for volunteers was made on the Batch and Autocat listservs.  
The number of participants then swelled to about twenty, though the amount and quality 
of work varied by participant. 
 
Project documentation was prepared by the organizer and evolved through four versions 
as participants added suggestions (see appendix).  Other guidance provided by the 
organizer included a procedure for batch searching on OCLC Connexion, and directions 
for using a macro that converted e-book records to the provider-neutral standard.19  The 
project had no explicit decision-making process, and the project’s direction depended on
participants’ e-mail responses to the organizer’s questions.
 
One discussion centered on whether the URL should be standardized, and if so, which 
form of it should be used.  A wide variety of NAP links have been attached to OCLC 
records (see table 1).   A standard URL would be easier to manipulate in MarcEdit or 
other batch editing programs. Some forms of the URL lead to NAP’s web reader, where 
it is less clear that a downloadable PDF is also available. These forms were rejected.  A 
search on OCLC determined the form that was used with the greatest frequency, and this 
became the standard.   
 
Table 1.  Examples of NAP link variation in OCLC

URL form                                   Destination
nap.edu/catalog.php?record_id=9999         General e-book page (selected as standard)
books.nap.edu/catalog.php?record_id=9999   General e-book page
nap.edu/catalog/9999.html                  General e-book page
nap.edu/books/NI000136/html/               HTML/openbook version
nap.edu/books/030907603X/html/             HTML/openbook version
nap.edu/openbook.php?record_id=9999        HTML/openbook version
nap.edu/catalog.php?record_id=9999#toc     HTML version table of contents
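Standardizing the links in Table 1 could be scripted along these lines. This is a minimal sketch, assuming that the numeric `record_id` is shared by the catalog, openbook, and `catalog/9999.html` forms (as the 9999 placeholders in the table suggest). `normalize_nap_url` is a hypothetical helper, and the scheme and any `www.` prefix are omitted, as in the table. The forms keyed on an ISBN or an NI number cannot be converted without a lookup, so the sketch returns `None` for them.

```python
import re
from typing import Optional

# Standard form: the row marked "selected as standard" in Table 1.
STANDARD_FORM = "nap.edu/catalog.php?record_id={}"

# Variant forms from Table 1 that carry a numeric record_id.
_ID_PATTERNS = [
    re.compile(r"nap\.edu/catalog\.php\?record_id=(\d+)"),   # also matches books.nap.edu and #toc
    re.compile(r"nap\.edu/catalog/(\d+)\.html"),
    re.compile(r"nap\.edu/openbook\.php\?record_id=(\d+)"),
]


def normalize_nap_url(url: str) -> Optional[str]:
    """Rewrite a NAP link to the standard form, or return None if the
    record_id cannot be recovered from the URL alone."""
    for pattern in _ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return STANDARD_FORM.format(match.group(1))
    return None


print(normalize_nap_url("nap.edu/openbook.php?record_id=9999"))  # nap.edu/catalog.php?record_id=9999
print(normalize_nap_url("nap.edu/books/030907603X/html/"))       # None
```

Returning `None` rather than guessing keeps the unconvertible links visible for manual review.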
 
 
Project Workflow  
 
Upon receiving a spreadsheet of 50 titles, participants were to carry out several tasks.  
One of the most important was identifying the best record for a title and recording its 
OCLC record number.  At the project’s end, this would enable any library to import a text 
file of all the record numbers into Connexion’s batch searching module to retrieve a 
record batch that could then be exported to the library’s catalog.  The high number of
duplicate records made selection difficult. A few participants reported duplicates to 
OCLC, but this was a very time-consuming process. 
 
Once a record was selected, participants verified that the URL for the NAP version was
present in the agreed-upon form and that it worked properly.  Other edits ensured that any
e-ISBNs, where present, were recorded in the MARC field 020 $a, and that any print
ISBNs were in a 020 $z.  Participants also checked that headings were authorized, in 
order to save work at the end of the project.  Some edits were optional, such as making a 
record provider-neutral or adding Medical Subject Headings (MeSH) and National 
Library of Medicine (NLM) classification.  Macros were available for automating 
provider-neutral record conversion, and for deriving an original e-book record from a 
print version.  While PDF download does require registration on the NAP site (at a 
minimum, users must provide an e-mail address), no note about this was added to 
records.  Individual libraries may choose to add a note, if desired, after records are 
retrieved. 
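The ISBN rule above (e-ISBNs in 020 $a, print ISBNs in 020 $z) is mechanical once the print ISBNs are known, for example from the print record. A minimal sketch, assuming a simplified model in which each 020 field is a (subfield code, ISBN string) pair rather than an object from any particular MARC library; `place_isbns` is hypothetical. Note that it would move a genuinely canceled e-ISBN out of $z, so it needs adjustment where that case occurs.

```python
from typing import List, Set, Tuple

# One entry per 020 field, modeled as (subfield code, ISBN string); a
# simplified stand-in for a real MARC record, not any library's API.
Isbn020 = Tuple[str, str]


def place_isbns(fields: List[Isbn020], print_isbns: Set[str]) -> List[Isbn020]:
    """Put print ISBNs in $z and every other ISBN in $a, preserving order."""
    placed: List[Isbn020] = []
    for _code, value in fields:
        # Compare only the ISBN itself, ignoring qualifiers such as "(pbk.)".
        tokens = value.split()
        isbn = tokens[0] if tokens else value
        placed.append(("z" if isbn in print_isbns else "a", value))
    return placed


# Placeholder strings, not real ISBNs.
fields = [("a", "PRINT-ISBN (pbk.)"), ("z", "E-ISBN")]
print(place_isbns(fields, {"PRINT-ISBN"}))
# [('z', 'PRINT-ISBN (pbk.)'), ('a', 'E-ISBN')]
```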
 
Participants were also asked to note special situations or problems.  Some OCLC records 
needing editing were Program for Cooperative Cataloging (PCC) records, and non-PCC 
libraries were restricted from editing them.  The PCC status of records was noted so that 
they could be gathered and sent to a PCC participant for editing.  A recent change in 
OCLC policy allows those with a Name Authority Cooperative (NACO) authorization to 
edit PCC records.  This might have improved project workflow had the change taken 
place before the project began.  In a very few cases, an e-book record for a title did not 
exist, and an original record was created.  Name headings (usually for committees or 
conferences) without authority records (those that could not be “controlled” in OCLC) 
were also noted for NACO participants to address later.  
  
When all of the spreadsheets had been returned to the organizer, it became apparent that 
the varying skill levels of the participants resulted in quality control problems.  A follow-
up project was begun using selected volunteers.  Record numbers were searched in batch 
mode in OCLC Connexion and sent to the local save file where a macro was used to 
identify record errors and anomalies for cleanup.  The macro focused on identifying 
information that would affect the usability of the records, such as lack of a NAP URL in 
the agreed-upon form and the presence of uncontrolled headings that might not be 
supported by authority records.  Many of the name headings had authority records, but a 
few participants had not been familiar with the “control” function in OCLC which links a 
heading to its authority record. Some participants did not understand that the instructions 
from the first part of the project asked them to report headings that could not be 
controlled in the notes column of the spreadsheet so that they could be followed up on. In 
addition, there were many headings of the form  
 
710 2b $a National Research Council (U.S.). $b Committee on Fire Research.
 
where the base part of the heading in $a could be controlled, but the whole heading was 
not controllable. Some participants did not understand that these should be reported as 
problems. Because the Connexion macro command CS.IsHeadingControlled identifies
these partially controlled headings as controlled, they could not be identified during the 
second stage. Although a goal of this project was to support all headings with authority 
records and control the headings in Connexion, this goal was not met. This is primarily 
because not all of the uncontrolled headings were flagged at the point where a person was 
looking at the record, and it is not currently possible to retroactively identify them by 
automated means.  
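The project’s review macro ran inside Connexion; the logic of its two main checks can be sketched outside Connexion as follows. This is a hypothetical Python sketch, not the project’s macro: the record model, `review`, and the `controlled` flag are assumptions. Because `controlled` simply mirrors what the cataloging client reports, the sketch inherits the blind spot described above: a 710 whose $a controls but whose full heading does not would still pass.

```python
from dataclasses import dataclass, field
from typing import List

# Agreed-upon URL form (see Table 1); a substring test keeps the sketch simple.
STANDARD_URL_PREFIX = "nap.edu/catalog.php?record_id="


@dataclass
class Heading:
    tag: str          # MARC tag, e.g. "710"
    text: str
    controlled: bool  # as reported by the cataloging client


@dataclass
class Record:
    oclc_number: str
    urls: List[str] = field(default_factory=list)          # 856 $u values
    headings: List[Heading] = field(default_factory=list)


def review(record: Record) -> List[str]:
    """Return problems that would affect the usability of the record."""
    problems: List[str] = []
    if not any(STANDARD_URL_PREFIX in url for url in record.urls):
        problems.append("no NAP URL in the agreed-upon form")
    for heading in record.headings:
        if not heading.controlled:
            problems.append(f"uncontrolled heading: {heading.tag} {heading.text}")
    return problems


rec = Record(
    oclc_number="000000000",  # placeholder, not a real OCLC number
    urls=["nap.edu/openbook.php?record_id=9999"],
    headings=[Heading("710", "Example committee", controlled=False)],
)
for problem in review(rec):
    print(problem)
```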
 
A second obstacle to our goal of comprehensive authority control is that records in OCLC 
Connexion are not static. Even if the record was complete and accurate at the time one of 
the participants last edited it, record quality may be enhanced or degraded before a library 
retrieves it.  Due to record merging, unauthorized headings were introduced into some of 
the project’s previously cleaned-up records. 
 
Completed reviews were reported to the organizer, along with any problems encountered.  
A second follow-up project addressed titles that were part of multi-volume sets or had 
been cataloged as serials in print.  While e-book versions of multi-volume sets are
commonly cataloged as individual volumes, a decision was made to use set records where
the Library of Congress had done so for the print version.  However, for cases where the
print version was on a serial record, such as Biographical Memoirs, cataloging as 
individual volumes was thought to be more practical.  Authority record creation was also 
completed during this stage. 
 
OCLC record numbers were then compiled into a text file and uploaded to the web.  A 
separate text file was made available for multi-volume set records.  Availability of the 
files was first announced to project participants for testing before wider distribution. The 
records downloaded from OCLC based on this list will need some editing by each library 
for loading into the local catalog. At a minimum, non-NAP URLs present on these 
provider-neutral records will need to be removed and information needed for the loading 
process will have to be added. This takes little time with tools such as MarcEdit, but is an 
extra step that will have to be performed.  The record batch will also be available as a
WorldCat collection set that both OCLC and non-OCLC libraries can obtain, though at a
cost.  Collection set records are pre-processed by OCLC, so if they are properly set up, the
records can be loaded immediately.  Usage data from text file downloads and WorldCat
collection set purchases could give an indication of the project’s usefulness.
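Removing non-NAP URLs from the provider-neutral records is the kind of local edit usually done in MarcEdit. The same filtering logic is sketched below in Python on a plain list of 856 $u values, a simplification; `keep_nap_links` is hypothetical. It keeps only links whose host is nap.edu or a subdomain of it.

```python
from typing import List
from urllib.parse import urlsplit


def keep_nap_links(urls: List[str]) -> List[str]:
    """Keep only URLs whose host is nap.edu or one of its subdomains."""
    kept: List[str] = []
    for url in urls:
        # urlsplit only fills in the host when a scheme is present.
        parts = urlsplit(url if "://" in url else "http://" + url)
        host = parts.hostname or ""  # .hostname is already lowercased
        if host == "nap.edu" or host.endswith(".nap.edu"):
            kept.append(url)
    return kept


links = [
    "http://www.nap.edu/catalog.php?record_id=9999",
    "https://vendor.example.com/book/9999",  # hypothetical vendor link
    "nap.edu/catalog/9999.html",
    "http://example.org/nap.edu",            # mentions nap.edu only in the path
]
print(keep_nap_links(links))
```

Checking the host rather than searching the whole string for "nap.edu" avoids keeping a vendor link that merely mentions NAP in its path, as the last example shows.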
 
 
Future Plans 
 
Plans for updating the record batch are ongoing.  According to NAP, about 200 new titles 
are issued per year.  Options for keeping up with new titles include the NAP weekly e-
mail newsletter,20 the new books web page21 (although it only lists new books for the last 
30 days), and possibly a vendor knowledgebase.  Volunteers could be assigned to titles 
on a monthly basis.  However, different skills are required for creating new records (i.e., 
“original” cataloging).  Deriving new records from the print version would not be 
possible in most cases, since the online version usually precedes the print version.  
Therefore the pool of qualified volunteers will likely be smaller.  Rather than create new
records, another option would be to periodically search OCLC for new records entered by 
others, and add those OCLC record numbers to an updated text file.  
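Under the second option, maintaining the distributed text file amounts to merging newly found OCLC numbers into the existing list without duplicates. A minimal sketch; the function names are hypothetical, and the one-number-per-line layout is an assumption to check against the Connexion batch-search import requirements. Numbers are compared as strings, so differently padded forms of the same number, or prefixes such as ocm/ocn, would need normalizing first.

```python
from typing import Iterable, List


def merge_record_numbers(existing: Iterable[str], new: Iterable[str]) -> List[str]:
    """Append new OCLC numbers to the existing list, skipping blanks and duplicates."""
    seen = set()
    merged: List[str] = []
    for number in [*existing, *new]:
        cleaned = number.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            merged.append(cleaned)
    return merged


def write_batch_file(numbers: List[str], path: str) -> None:
    """Write one OCLC number per line (assumed layout for batch searching)."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(numbers) + "\n")


# Placeholder numbers, not real OCLC records.
print(merge_record_numbers(["111", "222"], ["222", " 333 ", ""]))  # ['111', '222', '333']
```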
 
An additional problem is that many of the new titles are issued in a prepublication 
version before being replaced with the published version.  This process can take a few 
months.  Should titles be cataloged as prepublication versions, or should catalogers wait 
for the final publication version?  Advantages of the former are that catalogers would be 
providing timelier metadata for available content that will likely change little.  The vast 
majority of the description, including the URL, would remain the same once a 
publication version was issued, although NAP does appear to use different ISBNs for the 
prepublication and published versions.  However, records would need to be marked in 
some way (either through a MARC field or a list kept by the project) so they could be 
finalized against the publication version and the “description based on” language and 
physical description changed.  While print prepublication versions continue to exist, 
online prepublications often do not.  A library could download and keep a prepublication
PDF, but this seems unlikely.  If records are not updated, then they will describe a
manifestation that no longer exists in its online form, and another record for the final 
version could be created.  NAP notifications for replacement of the prepublication 
version with the published version could be used to update the record.  Closer 
collaboration with NAP could also help in distinguishing titles with and without PDF 
versions, and in identifying any removed items.  
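If the project kept its own list rather than marking records in a MARC field, checking that list against NAP’s notifications could look like the sketch below. It is hypothetical throughout: `PrepubEntry` and `due_for_update` are illustrative names, and the list of newly published identifiers is assumed to be compiled from NAP notifications. Entries are keyed on the NAP identifier rather than the ISBN because, as noted above, the URL stays the same while the ISBN changes between versions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PrepubEntry:
    oclc_number: str
    nap_record_id: str      # taken from the URL, which survives publication
    finalized: bool = False


def due_for_update(entries: List[PrepubEntry], published_ids: Iterable[str]) -> List[str]:
    """Return OCLC numbers whose prepublication record should now be finalized."""
    published = set(published_ids)
    return [e.oclc_number for e in entries
            if not e.finalized and e.nap_record_id in published]


# Placeholder OCLC numbers and identifiers.
tracking = [
    PrepubEntry("000000001", "9998"),
    PrepubEntry("000000002", "9999"),
    PrepubEntry("000000003", "9997", finalized=True),
]
print(due_for_update(tracking, ["9999", "9997"]))  # ['000000002']
```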
 
 
Discussion   
 
The NAP batch creation project was far more time-consuming than expected, and placed 
inordinate demands on the organizer.  The project began in June 2011 and the OCLC 
numbers for PDF e-books with ISBNs were distributed in February 2012. Work is 
ongoing on the NAP e-books for PDFs without ISBNs. It has not yet been decided if the 
project will incorporate e-books available only in HTML format.  Except for very recent 
releases, there were e-book records available in WorldCat for all the NAP PDF e-books 
with ISBNs. The larger problem was sorting through multiple records to identify the best 
record.  The project did not offer explicit guidelines for selection of the best record, 
although these were implied in the instructions (see appendix) and participants were 
expected to have sufficient expertise to evaluate records.  In e-mails among the 
volunteers, selecting the record with the most holdings was frequently suggested.  As 
long as a record met criteria in the instructions, or was upgraded to that standard, record 
choice was not crucial. 
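The holdings heuristic from the volunteers’ e-mails works best as a tie-breaker among records that already meet the project’s criteria. A minimal sketch; the `Candidate` fields and `choose_record` are hypothetical, and `meets_criteria` stands in for the judgment a cataloger makes against the instructions in the appendix.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    oclc_number: str
    holdings: int           # number of libraries attached to the record
    meets_criteria: bool    # stand-in for a cataloger's review against the instructions


def choose_record(candidates: List[Candidate]) -> Optional[Candidate]:
    """Among acceptable records, prefer the one with the most holdings."""
    acceptable = [c for c in candidates if c.meets_criteria]
    if not acceptable:
        return None  # no record qualifies: upgrade one, or create an original record
    return max(acceptable, key=lambda c: c.holdings)


duplicates = [
    Candidate("000000001", holdings=40, meets_criteria=True),
    Candidate("000000002", holdings=500, meets_criteria=False),  # e.g. a print record
    Candidate("000000003", holdings=120, meets_criteria=True),
]
best = choose_record(duplicates)
print(best.oclc_number if best else None)  # 000000003
```

Because any acceptable record would serve, the holdings count only breaks ties; it does not replace the review.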
 
In contrast, a significant proportion of the NAP e-books without ISBNs lack e-book or
even print records in WorldCat. Even where records exist, they are often of lesser quality. 
Completing the process of identifying and upgrading or creating these records requires a 
different and more advanced skill set than the initial part of the project and it is not clear 
that the current pool of volunteers has the necessary resources or is willing to make the 
required time commitment.

The project proceeded more slowly than anticipated in part because the organizer lacked 
time to devote to the project at key points and became a bottleneck in the workflow.  
Likewise, some participants did not complete as many batches or work as quickly as
would have been necessary for a more timely finish. This reflects the reality that many 
catalog librarians have extensive demands on their time and that this is a volunteer 
project added to participants' regular duties. Other factors delaying completion were the 
relatively low number of volunteers and the need for error checking.  The second review 
of the records identified errors made by a few volunteers without the appropriate skills, as 
well as record problems that are inevitably and inadvertently missed in a project on this 
scale.  
 
It is difficult, if not impossible, to anticipate the problems encountered in such a project, 
and some policy changes were made after work had begun.  Therefore the editing done 
by participants was not always consistent from beginning to end.  Consistency was also
affected by making some edits optional, such as adding the MeSH headings and NLM
classification that some participants wanted.
 
Most batch creation projects will likely require too much work for a single organizer. 
Duties should be well distributed in order to prevent overload and project bottlenecks. A 
model such as the CONSER project for the DOAJ could be implemented by the PCC for 
open access e-book record sets such as those included in the new Directory of Open 
Access Books,22 though many skilled catalogers not at PCC institutions would be 
excluded.  Documentation proved hard to write, and it was even more difficult to get a 
disparate group of participants to follow it.  The project revealed wide variation in the 
skill and knowledge of its volunteers.  Organizers should not assume any particular level
of skill, particularly given the lack of an accountability mechanism.  This has been a
long-standing problem that was noted in the earliest consortial effort in collaborative batch
cataloging.23  
 
Though crowdsourcing batch creation through a global cataloging network has 
tremendous potential, it is difficult to ensure that quality work will result, even when 
specific guidance is provided.  One potential way to improve the existing project 
directions would be to include more “before and after” examples of editing records, 
including screenshots. It might also be possible to create a macro for participants to run 
after editing a record that would alert them immediately to areas of the record needing 
attention. Of course, this is limited to the sorts of errors that are amenable to automated 
identification. It would also be wise, although more time-intensive, to initially distribute a 
few records to participants as a test to see if any cataloging misunderstandings exist, 
which would enable faster feedback.  Spreadsheets would only be distributed after 
participants had demonstrated the ability to meet the project's standards for record 
quality.  Alternatively, if batch e-book projects were taken on by the PCC, organizational
responsibilities could be distributed, and participants could be expected to be well trained.
 
The ever-changing nature of OCLC’s bibliographic database presents the practical 
problem of maintaining record consistency.  In many of the consortial projects described 
in the literature review, edits were made to records received from vendors and then
distributed directly to consortial members. In this scenario, the quality of the distributed 
records can be closely controlled. For our project, since the current OCLC record use 
policy24 does not allow the redistribution of records, we are limited to distributing OCLC 
record numbers that OCLC members can re-search to obtain the records.  Updating
and building upon existing metadata in OCLC records is usually a positive development, 
for example when different subject vocabularies or genre headings are added to records. 
However, the extensive record merging taking place in OCLC frequently changes record 
content, sometimes for the worse.  In addition to the introduction of unauthorized
headings mentioned earlier, the wide variety of URLs used in OCLC records to access 
NAP e-books means that the links probably will not remain standardized. 
 
The pre-distribution quality control employed by this project differs from common 
practice, where records are improved locally, whether before or after record load.25  Local 
quality control is a vastly inefficient process due to the duplicative work done by each 
library receiving the records.  Pre-distribution quality control, even using mostly manual 
record editing, saves a tremendous amount of duplicative effort by ensuring that records 
are largely error-free. One challenge for the local editing model has been the lack of a
method to batch upload record improvements to the network level once they have been
distributed to individual libraries.  Locally, efficient batch editing can take place via 
MarcEdit or global update in an individual catalog, but transferring these improvements 
to OCLC would require editing individual records one at a time. There is a conflict 
between the need to quickly load records for immediate access to electronic resources, 
and the desire to reduce duplication of effort in the editing and authority control of the 
records. 
 
While network-level cataloging work remains the exception rather than the rule, 
encouraging recent steps have been taken by OCLC toward that ideal.  These include the 
expansion of the pilot Expert Community Experiment into a permanent program, the 
extension of PCC record editing privileges to NACO members, an algorithm to 
programmatically perform heading control on new and existing records, and WorldCat 
Local.  At the same time, network-level cataloging is hindered by OCLC’s record use 
policy and the proliferation of other record sources.  It is also difficult or impossible in 
OCLC’s Connexion interface to do the kinds of efficient batch editing of records 
supported by MarcEdit and many local ILSs.  Newer bibliographic utilities such as 
SkyRiver and Biblios.net freely share records, but OCLC, MARC record services, and 
some vendors place restrictions on record sharing.  Cataloging may become more 
network-like yet remain in restricted silos.  Truly network-level cataloging will require 
freely sharable records.  
 
 
Future Research 
 
Possibilities for new metadata elements emerged during this project.  A code for the 
identification of open access resources would serve two purposes for libraries.  First, it 
would enable catalogers to find and gather resources for adding to the catalog.  Second, if 
the code were part of the catalog display, education and advocacy about open access in 
academic libraries might be furthered.  Also, a metadata element to indicate publication 
status could help solve the problem of prepublication versions encountered in this project.  This 
indicator could alert a cataloger that the record description needed updating when the 
final publication version was available.  The result would be faster delivery of electronic 
content to catalog users.  Finally, a metadata element to describe e-book formats is badly 
needed.  The term “e-book” has been used for a wide variety of online textual material, 
but users need to know whether it is downloadable (and if so in what format) or only 
readable in a web browser (thus requiring an internet connection). 
 
Further work is needed to ensure metadata consistency between formats. In the course of 
upgrading NAP e-book records, participants often consulted the print record.  Print 
records sometimes contained desirable metadata not present on the e-book record, such as 
MeSH and NLM classification.  Conversely, e-book records often contained contents 
notes and summaries not present on the print record.  It was also more common for e-
book records to link to the print version (using the MARC 776 tag) than vice versa.  The 
two formats occasionally presented conflicting metadata in the form of a differing main 
entry or call number.  This metadata divergence for identical content does not help the 
catalog user.  Implementation of FRBR-aware catalogs, in which metadata at the work 
level can be applied to all formats, may solve this problem.  FRBR may also help in 
situations where multi-volume sets were cataloged on a single record in print, but each 
volume on a separate record as an e-book. 
 
The case for open metadata should be made more clearly and forcefully.  Freely sharable 
metadata will mitigate the effects of the competing MARC record silos that are 
developing.  Open metadata may be a requirement for the linked data environment the 
library world is currently exploring,26 but there is no need to wait until then. 
    
 
Conclusion 
 
The NAP e-book project represents a unique and successful collaboration resulting in a 
batch of over 3,500 records available for loading into local catalogs.  No other cataloging 
project, to our knowledge, has been accomplished with such a wide variety of volunteers.  
While this aspect of the project hindered consistency, future projects can implement the 
controls suggested in this article.  The CONSER project for DOAJ could serve as an organizational 
model.  As open access resources increase in numbers and prominence, libraries will 
need to devote greater attention to metadata for them. 
 
Elimination of duplicative record editing is badly needed.  Lacking a mechanism to 
upload a batch of corrected records, this project employed pre-distribution quality control 
at the individual record level.  This quality control was affected by the skills of the 
volunteers as well as the dynamic nature of a large bibliographic database.  Open 
metadata is needed if network-level cataloging is to be realized.  Despite the problems 
encountered in this project, organized batch creation projects are an effective way to 
provide access to important collections.  
 
 
Notes 
 
1. “The National Academies Press Makes All PDF Books Free To Download; More Than 
4,000 Titles Now Available Free To All Readers,” accessed February 28, 2012, 
http://www8.nationalacademies.org/onpinews/newsitem.aspx?RecordID=06022011  
 
2.  Batch listserv, http://listserv.vt.edu/cgi-bin/wa?A0=BATCH.  
 
3. Kirstin Steele, "Free electronic books and weeding," Bottom Line: Managing Library 
Finances 24(3) (2011): 160-161, http://dx.doi.org/10.1108/08880451111185982. 
 
4. Steven Ovadia, "Open-Access Electronic Textbooks: An Overview," Behavioral & 
Social Sciences Librarian 30(1) (2011): 52-56, 
http://dx.doi.org/10.1080/01639269.2011.546767. 
 
5. Karen Cary and Joyce L. Ogburn, “Developing a Consortial Approach to Cataloging 
and Intellectual Access,” Library Collections, Acquisitions, & Technical Services 24 
(2000): 45-51, http://dx.doi.org/10.1016/S1464-9055(99)00095-0. 
 
6. Jackie Shieh, Ed Summers, and Elaine Day, “A Consortial Approach to Cooperative 
Cataloging and Authority Control: The Virtual Library of Virginia (VIVA) Experience,” 
Resource Sharing & Information Networks 16(1) (2002): 33-52, 
http://dx.doi.org/10.1300/J121v16n01_04.  
 
7.  Shieh, Summers, and Day, “A Consortial Approach,” p. 48. 
 
8. Kristin E. Martin and Kavita Mundle, “Cataloging E-books and Vendor Records: A 
Case Study at the University of Illinois at Chicago,” Library Resources & Technical 
Services 54(4) (2010): 227-237, http://alcts.metapress.com/content/h1455767637633x8/. 
 
9.  Martin and Mundle, “Cataloging E-books,” p. 235. 
 
10. Chiat Naun Chew and Susan M. Braxton, “Developing Recommendations for 
Consortial Cataloging of Electronic Resources: Lessons Learned,” Library Collections, 
Acquisitions, & Technical Services 29 (2005): 307-325, 
http://dx.doi.org/10.1016/j.lcats.2005.08.005. 
 
11. Carrie A. Preston, “Cooperative E-Book Cataloging in the OhioLINK Library 
Consortium,” Cataloging & Classification Quarterly 49 (2011): 257-276, 
http://dx.doi.org/10.1080/01639374.2011.571147. 
 
12.  All Things Cataloged, “Publisher e-book metadata,” (July 7, 2011), accessed 
February 28, 2012, https://allthingscataloged.wordpress.com/2011/07/07/publisher-e-book-metadata/.  
 
13. Jeffrey Beall, “Free Books: Loading Brief MARC Records for Open-Access Books 
in an Academic Library Catalog,” Cataloging & Classification Quarterly 47 (2009): 452-463, 
http://dx.doi.org/10.1080/01639370902870215.   
 
14. CONSER, “Cooperative Open Access Journal Project Planning Group Report, April 
30, 2010,” http://www.loc.gov/acq/conser/Open-Access-Report.pdf.  
 
15. CONSER, “Open Access Journal Project FAQ,” updated August 26, 2010, 
http://www.loc.gov/acq/conser/Open-Access-FAQ.html. 
 
16. E.S. Hellman, “Open Access E-books,” The No Shelf Required Guide to E-book 
Purchasing, Library Technology Reports 47(8) (2011): 18-27, 
http://alatechsource.metapress.com/content/r7u235k327mm3q3h/. 
 
17.  “The Online Books Page,” edited by John Mark Ockerbloom, accessed February 28, 
2012, http://onlinebooks.library.upenn.edu. 
 
18.  Hellman, “Open Access E-Books,” p. 24. 
 
19. Program for Cooperative Cataloging, Provider-Neutral E-Monograph MARC Record 
Guide (prepared by Becky Culbertson, Yael Mandelstam, and George Prager; includes 
revisions to September 2011), accessed February 7, 2012, 
http://www.loc.gov/catdir/pcc/bibco/PN_Guide_20110915.pdf. 
 
20. “Subscribe to the NAP Newsletter,” The National Academies Press, accessed March  
6, 2012, http://www.nap.edu/updates/index.html. 
 
21. “New Releases,” The National Academies Press, accessed March 6, 2012, 
http://www.nap.edu/new.html.   
 
22. “A New Service for Open Access Monographs: the Directory of Open Access 
Books,” Open Access Publishing in European Networks, accessed March 5, 2012, 
http://project.oapen.org/index.php/news/46-doab-press-release.  
 
23. Cary and Ogburn, “Developing a Consortial Approach,” p. 50. 
 
24. “WorldCat Record Use Policy,” OCLC, accessed February 28, 2012, 
http://www.oclc.org/worldcat/recorduse/default.htm. 
 
25. Elaine Sanchez, Leslie Fatout, Aleene Howser, and Charles Vance, “Cleanup of 
NetLibrary Cataloging Records: A Methodical Front-End Process,” Technical Services 
Quarterly 23(4) (2006): 51-71, http://dx.doi.org/10.1300/J124v23n04_04. 
 
26. Raymond Bérard, “Free Library Data?,” Liber Quarterly 20(3/4) (2011): 321-331, 
http://liber.library.uu.nl/publish/articles/000512/article.pdf.  
 
 
Appendix 
 

National Academies Press Free Ebook Project 

Goals 
• Update OCLC ebook master record for free NAP ebooks to include NAP URL in the form http://www.nap.edu/catalog.php?record_id=????? 
• Perform basic quality control on the master record 
• Compile a list of OCLC numbers that can be used by participants and others to batch search, attach holdings and load the complete set of records into local catalogs 

Initial set up 
A list of NAP title IDs, titles, and (where available) ISBNs has been prepared covering 
the free NAP ebooks. Spreadsheets have been prepared that include batch search strings 
and 856 fields that can be cut and pasted into Connexion records. 
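The spreadsheets' 856 column can be generated mechanically from the NAP title IDs. The Python sketch below is one hypothetical way to do it; it assumes the IDs are plain digit strings and writes the subfield delimiter as “$”, as this document does, although Connexion displays delimiters differently. The function names are illustrative and not part of any project tool; only the URL pattern, the 856 40 indicators, and the $3 label come from these instructions.

```python
# Hypothetical helpers: only the URL pattern, the 856 40 indicators and the
# $3 label come from the project instructions.
NAP_URL_TEMPLATE = "http://www.nap.edu/catalog.php?record_id={}"


def nap_url(record_id: str) -> str:
    """Return the NAP catalog URL for one NAP title ID (assumed to be all digits)."""
    record_id = record_id.strip()
    if not record_id.isdigit():
        raise ValueError(f"unexpected NAP record ID: {record_id!r}")
    return NAP_URL_TEMPLATE.format(record_id)


def nap_856(record_id: str) -> str:
    """Return a paste-ready 856 field: first indicator 4, second indicator 0,
    $3 naming the provider, and no $z. "$" stands in for the subfield delimiter."""
    return f"856 40 $3 National Academies Press $u {nap_url(record_id)}"


if __name__ == "__main__":
    print(nap_856("12815"))
    # prints: 856 40 $3 National Academies Press $u http://www.nap.edu/catalog.php?record_id=12815
```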

Perform batch search on assigned record range 
Each participant in this project will receive one or more spreadsheets with a list of titles 
to search and upgrade. The spreadsheet has columns with various search strategies that 
can be used for batch or individual searches and a column with URLs for pasting into 
Connexion. 
 
The columns that can be used for batch searches are: ISBN; ISBN limited to records held 
by Ebrary; and title keyword combined with “National Academ*” as publisher (to pick up 
National Academy or National Academies), both as a plain search and as a search limited 
to records held by Ebrary. Each search is limited to mt:cai (cataloged as internet 
resource) and ll:eng (for English language records). 
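The four search columns can likewise be generated from the title list. In this sketch the index labels bn: (ISBN), ti: (title keyword), and pb: (publisher keyword), the use of “and” to combine terms, and the Ebrary holdings limit are assumptions made for illustration; the holdings limit in particular is a placeholder, not a real OCLC symbol. Only mt:cai, ll:eng, and “National Academ*” are taken from the instructions, so the exact syntax should be checked against Connexion's searching documentation before use.

```python
# ASSUMPTIONS (illustrative only): the bn:, ti: and pb: index labels, "and" as the
# connector, and EBRARY_LIMIT, which is a placeholder rather than a real holdings limit.
# mt:cai, ll:eng and "National Academ*" come from the project instructions.
FORMAT_LIMITS = "mt:cai and ll:eng"     # cataloged as internet resource; English records
PUBLISHER_TERM = "pb:National Academ*"  # picks up National Academy or National Academies
EBRARY_LIMIT = "li:XXXXX"               # hypothetical placeholder; substitute the real limit


def search_variants(isbn: str, title_words: list[str]) -> dict[str, str]:
    """Build one search string per spreadsheet column for a single title."""
    variants: dict[str, str] = {}
    isbn = isbn.replace("-", "").strip()
    if isbn:  # ISBNs are only available for some titles
        isbn_search = f"bn:{isbn} and {FORMAT_LIMITS}"
        variants["isbn"] = isbn_search
        variants["isbn_ebrary"] = f"{isbn_search} and {EBRARY_LIMIT}"
    title_search = f"ti:{' '.join(title_words)} and {PUBLISHER_TERM} and {FORMAT_LIMITS}"
    variants["title_publisher"] = title_search
    variants["title_publisher_ebrary"] = f"{title_search} and {EBRARY_LIMIT}"
    return variants
```

Titles without an ISBN simply get the two title/publisher variants, which mirrors the “(where available)” caveat in the initial set up.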
 
To use for batch searching, select and copy the cells from the column you intend to use 
and paste them into a Notepad text file. Separate instructions are provided for using 
the text file with the selected searches to generate a batch search in Connexion if you are 
not familiar with this process. 
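If the spreadsheet is saved as CSV, the copy-and-paste step can also be scripted. This sketch assumes a CSV export with a header row, UTF-8 encoding, and an output file holding one search per line; the column and file names are hypothetical, and Connexion's batch import may expect a different encoding.

```python
import csv
from pathlib import Path


def column_to_batch_file(csv_path: str, column: str, out_path: str) -> int:
    """Write the non-empty cells of one CSV column to a text file, one search
    per line, and return the number of searches written."""
    searches: list[str] = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cell = (row.get(column) or "").strip()
            if cell:  # skip titles that have no search in this column
                searches.append(cell)
    text = "\n".join(searches) + ("\n" if searches else "")
    Path(out_path).write_text(text, encoding="utf-8")
    return len(searches)
```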
 
Searches from any of these columns can also be copied and pasted into the command line 
search box and run individually. 

Select record to use and make the following edits 
If more than one record is retrieved, select the best record. Once you have selected a 
record to use, make the following edits. 
 
Try to select a record that has an LC call number and LC subject headings that appears to 
be based on the print record. If the record does not have an LC call number and LC 
subject headings, add them if possible (note: I am finding some records with 588 
Description based on print record with no 776 and no print record that I can find in 
OCLC; these probably should not be cataloged as “based on print record.”) 
 
Choose a record for an online item (no print records with URLs) 
Encoding level I or L if possible (upgrade if you feel comfortable) 
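The selection criteria above can be expressed as a ranking. The sketch below is a hypothetical illustration: it keeps only records whose 008/23 form code marks them as electronic (“o”, or the older generic “s”), then prefers encoding level I or L, an LC call number, and LC subject headings, in that order. The weighting is an assumption, since the instructions do not rank these criteria, and the final choice remains the cataloger's judgment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical summary of one search result, filled in by the cataloger."""
    ocn: str                  # OCLC number
    form: str                 # 008/23 form of item: "o" online, "s" electronic
    encoding_level: str       # OCLC encoding level, e.g. "I", "L", "K", "M"
    has_lc_call_number: bool  # 050 present
    has_lcsh: bool            # LC subject headings present


def best_record(candidates: list[Candidate]) -> Optional[Candidate]:
    """Return the preferred candidate, or None if none describes an online item.

    Ranking (an assumption): encoding level I or L first, then an LC call
    number, then LCSH. On a tie, max() keeps the earliest candidate."""
    online = [c for c in candidates if c.form in ("o", "s")]
    if not online:
        return None  # e.g. only print records with URLs were retrieved
    return max(
        online,
        key=lambda c: (c.encoding_level in ("I", "L"), c.has_lc_call_number, c.has_lcsh),
    )
```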
 
Check and fix if needed: 
008/23/Form = o 
006 = m\\\\\\\\d 
007 = cr [leave any additional codes if already on the record, but it is not necessary to add them unless it is your local practice] 
020: add $z in front of any print ISBNs 
050 _4  This is preferred to the 090 for LC call numbers not assigned by LC; ditto for the 060 _4 
245|h = [electronic resource] [If an AACR2 record; there is at least one RDA record in this set] 
Control headings if possible 
Provider-Neutral tidbits: If you are cataloging as P-N, then you will either be using 500  Title from PDF t.p. (National Academies Press, viewed July 1, 2011) OR 588  Description based on print version record.  In either case, with P-N, delete the 538 Mode of access note.  [NOTE: The 500 field in the sentence above has now been changed to:  “...then you will either be using 588  Description based on online resource; title from PDF t.p. (National Academies Press, viewed...”  ] 

Make sure there are LCSH subject headings in the record; would be nice to add NLM/MeSH headings if they are available and you have time 
Delete any institution-specific or proxied 856 fields 
856 40  Remember that the second indicator is zero 
Add National Academies Press catalog record URL with $3 for National Academies Press (can copy from spreadsheet; might be a good idea to make first URL). Do not include $z 
URL should be in the form: http://www.nap.edu/catalog.php?record_id=12815 
A good practice would be to make this the first URL (since it will be accessible to all users) and to delete other URLs going to the NAP site (so that we have consistent results for manipulation with MarcEdit) 

If the record is not provider-neutral, you may choose to make it into a provider neutral 
record, but this is not required. 
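Some of the fixed-field and 020 checks in the list above are mechanical enough to script. This sketch assumes field data is available as plain strings (008, 006, and 007 as raw control-field values) and each 020 as a (subfield code, ISBN) pair; it checks only 008/23, 006/00 and 006/09 (type of computer file), and the first two characters of 007, and it relies on the cataloger to supply the print ISBNs from the print record. These helpers are hypothetical, not MarcEdit or Connexion functions.

```python
# Hypothetical helpers; field data is assumed to be available as plain strings.


def check_fixed_fields(f008: str, f006: str, f007: str) -> list[str]:
    """Return a description of each problem found (an empty list means all checks pass)."""
    problems = []
    if len(f008) < 24 or f008[23] != "o":
        problems.append("008/23 (Form) should be 'o'")
    if len(f006) < 10 or f006[0] != "m" or f006[9] != "d":
        problems.append("006/00 should be 'm' and 006/09 should be 'd'")
    if not f007.startswith("cr"):
        problems.append("007 should begin with 'cr'")
    return problems


def relabel_print_isbns(
    isbns: list[tuple[str, str]], print_isbns: set[str]
) -> list[tuple[str, str]]:
    """Move ISBNs known to belong to the print version into subfield z.

    Each 020 is given as a (subfield code, ISBN) pair; hyphens are ignored
    when comparing against the cataloger-supplied set of print ISBNs."""
    print_keys = {i.replace("-", "") for i in print_isbns}
    return [
        ("z" if isbn.replace("-", "") in print_keys else code, isbn)
        for code, isbn in isbns
    ]
```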

 
**IMPORTANT** Click on the URL in the 856 and make sure it works. 
 
After making changes, replace master record. 
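The 856 edits described above (drop proxied links and other NAP links, then put the NAP catalog URL first) can be sketched as a small list transformation. Detecting proxied links by the substrings “login?url=” or “ezproxy” is a heuristic assumption; institution-specific links that are not proxied still need a human eye, and the sketch handles URLs only, not the $3 label or indicators. It does not replace clicking each link to confirm that it works.

```python
from urllib.parse import urlparse

NAP_URL_TEMPLATE = "http://www.nap.edu/catalog.php?record_id={}"
# Heuristic (assumption): substrings that suggest an EZproxy-style proxied link.
PROXY_MARKERS = ("login?url=", "ezproxy")


def is_nap_link(url: str) -> bool:
    """True if the URL points at the NAP site (nap.edu or any subdomain)."""
    host = (urlparse(url).hostname or "").lower()
    return host == "nap.edu" or host.endswith(".nap.edu")


def normalize_856_urls(urls: list[str], record_id: str) -> list[str]:
    """Return 856 URLs with the NAP catalog URL first, other NAP links and
    apparently proxied links removed, and remaining links in original order."""
    kept = [
        url
        for url in urls
        if not is_nap_link(url)  # the canonical NAP link is re-added below
        and not any(marker in url.lower() for marker in PROXY_MARKERS)
    ]
    return [NAP_URL_TEMPLATE.format(record_id)] + kept
```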
Update spreadsheet and return to coordinator 
Update your copy of the spreadsheet with the relevant OCLC numbers. Just insert the 
plain OCLC numbers; it is not necessary to add any prefixes such as * or #. Add any 
questions or concerns or describe any unusual situations in the notes column. For 
example, it would be useful to note RDA records in the notes column. Return your 
completed spreadsheet to the coordinator by email. This will be used to compile a list of 
OCLC numbers that will be distributed to participants and posted publicly somewhere. 
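Compiling the returned spreadsheets into a single list of OCLC numbers can be sketched as follows. The sketch assumes the OCLC number cells have already been gathered into one Python list; it tolerates the optional * or # prefixes mentioned above and, as an additional assumption, the ocm/ocn/on prefixes some systems add. It then removes duplicates and sorts numerically. Cells it cannot parse are returned separately for the coordinator to review rather than guessed at.

```python
import re

# Optional "*" or "#" (mentioned in the instructions) and, as an assumption,
# the ocm/ocn/on prefixes some systems add to OCLC numbers.
OCN_PATTERN = re.compile(r"\s*[*#]?\s*(?:ocm|ocn|on)?\s*(\d+)\s*", re.IGNORECASE)


def compile_oclc_numbers(cells: list[str]) -> tuple[list[str], list[str]]:
    """Return (deduplicated OCLC numbers sorted numerically, unparseable cells)."""
    numbers: set[int] = set()
    problems: list[str] = []
    for cell in cells:
        if not cell.strip():
            continue  # empty cell: nothing to compile
        match = OCN_PATTERN.fullmatch(cell)
        if match:
            numbers.add(int(match.group(1)))  # int() also drops leading zeros
        else:
            problems.append(cell)  # leave for the coordinator to review
    return [str(n) for n in sorted(numbers)], problems
```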
