Practical Limits to the Scope of Digital Preservation

Mike Kastellec
 

PRACTICAL LIMITS TO THE SCOPE OF DIGITAL PRESERVATION | KASTELLEC  63 
 

ABSTRACT 

This paper examines factors that limit the ability of institutions to digitally preserve the cultural heritage of the modern era. The author takes a wide-ranging approach to shed light on limitations to the scope of digital preservation. The author finds that technological limitations to digital preservation have been addressed but still exist, and that non-technical aspects (access, selection, law, and finances) move into the foreground as technological limitations recede. The author proposes a nested model of constraints to the scope of digital preservation and concludes that costs are digital preservation’s most pervasive limitation.

INTRODUCTION 

Imagine for a moment what perfect digital preservation would entail. A perfect archive would capture all the content generated by humanity, instantly and continuously. It would catalog that information and make it available to users, yet it would not stifle creativity by undermining creators’ right to control their creations. Most of all, it would safeguard all the information it ingested perfectly and forever, at a cost society is willing and able to sustain.

Now return to reality: digital preservation is decidedly imperfect. Today’s archives fall far short of the possibilities outlined above. Much previous scholarship debates the quality of different digital preservation strategies; this paper looks past these arguments to shed light on limitations to the scope of digital preservation. What are the factors that limit the ability of libraries, archives, and museums (henceforth collectively referred to as archival institutions) to digitally preserve the cultural heritage of the modern era?¹ I first examine the degree to which technological limitations to digital preservation have been addressed. Next, I identify the non-technical factors that limit the archiving of digital objects. Finally, I propose a conceptual model of limitations to digital preservation.

TECHNOLOGY 

Any discussion of digital preservation naturally begins with consideration of the limits of digital preservation technology. While all aspects of digital preservation are by definition related to technology, there are two purely technical issues at the core of digital preservation: data loss and technological obsolescence.²

Many things can cause data loss. The constant risk is physical deterioration. At its most basic level, a digital file consists of binary code written to some form of physical media. Just like analog media (paper, vinyl recordings), digital media (optical discs, hard drives) are subject to degradation at a rate determined by the inherent properties of the medium and of the environment in which it is stored.³ When the physical medium of a digital file decays to the point where one or more bits lose their definition, the file becomes partially or wholly unreadable. Other causes of data loss include software bugs, human action (e.g., accidental deletion or purposeful alteration), and environmental dangers (e.g., fire, flood, war).

Mike Kastellec (makastel@ncsu.edu) is Libraries Fellow, North Carolina State University Libraries, Raleigh, NC.

INFORMATION TECHNOLOGY AND LIBRARIES | JUNE 2012  64

Assuming a digital archive can overcome the problem of physical deterioration, it then faces the issue of technological obsolescence. Binary code is simply a string of zeroes and ones (sometimes called a bitstream), and like any encoded information, this code is only useful if it can be decoded into an intelligible format. This process depends on hardware, used to read a bitstream from a piece of physical media, and software, which decodes the bitstream into an intelligible object, such as a document or video displayed on a screen, a printout, or an audio output. Technological obsolescence occurs when either the hardware or the software needed to render a bitstream usable is no longer available. Given the rapid pace of change in computer hardware and software, technological obsolescence is a constant concern.⁴
 

Most digital preservation strategies involve staying ahead of deterioration and obsolescence by copying data from older to current generations of file formats and storage media (migration) or by keeping many copies that are tested against one another to find and correct errors (data redundancy).⁵ Other strategies to overcome obsolescence include preemptively converting data to standardized formats (normalization) or avoiding conversion and instead using virtualized hardware and software to simulate the original digital environment needed to access obsolete formats (emulation). As may be expected of a young field,⁶ there is a great deal of debate over the merits of each of these strategies. To date, the arguments mostly concern the quality of preservation, which is beyond the scope of this work. What should not be contentious is that each strategy also imposes limitations on the potential scale of digital preservation. Migration and normalization are labor-intensive processes, in the sense that they normally require some level of human interaction. Any human-mediated process limits the scale of an archival institution’s preservation activities, as trained staff are a limited and expensive resource. Emulation postpones the processing of data until it is accessed, potentially allowing greater ingest of information. As a strategy, however, it remains at least partly theoretical and untested, increasing the possibility that future access will be limited. Data redundancy deserves closer examination, as it has emerged as the gold standard in recent years.
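The data redundancy strategy described above keeps several copies and compares them to find and correct errors. What follows is a minimal sketch of that idea, assuming each copy is held in memory as a byte string and that a simple majority vote over SHA-256 digests decides which copy is authoritative. The function names `digest` and `repair_replicas` are hypothetical, and production preservation systems are considerably more elaborate.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def digest(data: bytes) -> str:
    """SHA-256 hex digest used to compare replicas cheaply."""
    return hashlib.sha256(data).hexdigest()


def repair_replicas(replicas: list[bytes]) -> tuple[list[bytes], list[int]]:
    """Overwrite every replica that disagrees with the majority copy.

    Returns the repaired replicas and the indices that were replaced.
    Raises ValueError when no digest is held by a strict majority,
    since the vote cannot then tell which copy is correct.
    """
    if not replicas:
        raise ValueError("no replicas supplied")
    digests = [digest(r) for r in replicas]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no majority; cannot determine the correct copy")
    good = replicas[digests.index(winner)]
    repaired = [i for i, d in enumerate(digests) if d != winner]
    fixed = [good if d != winner else r for r, d in zip(replicas, digests)]
    return fixed, repaired


# Three copies of one file; the second has suffered a single flipped bit.
original = b"cultural heritage"
damaged = bytes([original[0] ^ 0b1]) + original[1:]
fixed, repaired_indices = repair_replicas([original, damaged, original])
print(repaired_indices)  # → [1]
print(all(copy == original for copy in fixed))  # → True
```

Requiring a strict majority is a deliberate choice: with two copies that disagree, the sketch refuses to guess, which illustrates why voting-based repair needs at least three copies.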

The limitations data redundancy imposes on digital preservation are twofold. The first is that simply maintaining multiple copies necessarily increases expenses; therefore, given equal levels of funding, less information can be preserved redundantly than can be preserved without such measures. (Cost considerations are inextricably linked to every other limitation on digital preservation and are examined in greater detail in “Finances,” below.) There are also practical, technical limits on the bandwidth, disk access, and processing speeds needed to perform parity checks (tests of each bit’s validity) of large datasets to guard against data loss. Pushing against these limits incurs dramatic costs, limiting the scale of digital preservation. Current technology and funding are many orders of magnitude short of what is required to archive the amount of information desired by society over the long term.⁷
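The arithmetic behind this first limitation is simple: if every copy costs the same to keep, a fixed budget sustains proportionally less content as the number of copies rises. The sketch below assumes a uniform, hypothetical cost per terabyte-year for each copy; the function `preservable_terabytes` and all figures are illustrative and are not taken from the paper or its sources.

```python
def preservable_terabytes(annual_budget: float, cost_per_tb_year: float, copies: int) -> float:
    """Terabytes a fixed annual budget can sustain when every copy costs the same.

    Real per-copy costs vary with medium, location, and how often
    copies are audited; this toy model ignores all of that.
    """
    if copies < 1:
        raise ValueError("at least one copy is required")
    return annual_budget / (cost_per_tb_year * copies)


# Hypothetical figures for illustration only.
budget = 100_000.0  # dollars per year
unit_cost = 500.0   # dollars per terabyte-year for a single copy
for n in (1, 3, 5):
    print(f"copies={n}: {preservable_terabytes(budget, unit_cost, n):.1f} TB")
```

With these inputs, the preservable volume falls from 200.0 TB with one copy to 40.0 TB with five.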
 

The second way technology limits digital preservation is more complex: it concerns the error rates of archived data. Non-redundant storage strategies are also subject to errors, of course. Only redundant systems have been proposed as a theoretical solution to the technological problem of digital preservation,⁸ though, so it is necessary to examine their error rates in particular. On a theoretical level, given sufficient copies, redundant backup is all but infallible. In practice, technological limitations emerge.⁹ The number of copies required to ensure perfect bit preservation is a function of the reliability of the hardware storing each copy. Multiple studies have found that hardware failure rates greatly exceed manufacturers’ claims.¹⁰ Rosenthal argues that, given the extreme time spans under consideration, storage reliability is not just unknown but untestable.¹¹ He therefore concludes that it cannot be known with certainty how many copies are needed to sustain acceptably low error rates. Even today’s best digital preservation technologies are subject to some degree of loss and error.
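The dependence of copy count on hardware reliability can be made concrete with a deliberately simplified model, assuming each copy is lost independently with probability p over some period, so that all n copies are lost with probability p^n. The function `copies_needed`, the per-copy probabilities, and the target below are hypothetical. The independence assumption is optimistic, since copies can fail together (for example, from a shared software bug or a single disaster).

```python
def copies_needed(p_loss: float, target: float) -> int:
    """Smallest number of copies n such that p_loss ** n <= target.

    Assumes each copy is lost independently with probability p_loss
    over the period considered, so all n copies are lost with
    probability p_loss ** n. Correlated failures would make the real
    risk higher than this model suggests.
    """
    if not (0.0 < p_loss < 1.0 and 0.0 < target < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    n = 1
    while p_loss ** n > target:
        n += 1
    return n


# Hypothetical per-copy loss probabilities: an optimistic figure and one
# five times worse, standing in for failure rates above manufacturers' claims.
target = 1e-9  # hypothetical acceptable chance of losing every copy
print(copies_needed(0.01, target))  # → 5
print(copies_needed(0.05, target))  # → 7
```

Under these assumptions, a fivefold error in the per-copy estimate raises the required count from five copies to seven. If, as Rosenthal argues, the true figure cannot be tested over the time spans involved, any such calculation is only as good as its inputs.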

Analog materials are also inevitably subject to deterioration, of course, but the promise of digital media leads many to unrealistic expectations of perfection. Nevertheless, modern digital preservation technology addresses the fundamental needs of archival institutions to a workable degree. Technological limitations to digital preservation still exist, but the aspects of digital preservation beyond purely technical considerations (access, selection, law, and finances) should gain greater relative importance than they have in the past.

ACCESS 

With regard to digital preservation, two different dimensions of access are important. At one end of a digital preservation operation, authorized users must be able to access an archival institution’s holdings, and unauthorized users must be restricted from doing so. This is largely a question of technology and rights management: users must be able to access preserved information and be permitted to do so. This dimension of access is addressed in the Technology and Law sections of this paper. The other dimension of access occurs at the other end of a digital preservation operation: an archival institution must be able to access a digital object to preserve it. This simple fact leads to serious restrictions on the scope of digital preservation, because much of the world’s digital information is inaccessible for the purposes of archiving by libraries and archives.

There are a number of reasons why a given digital object may be inaccessible. Large-scale harvesting of webpages requires automated programs that “crawl” the Web, discovering and capturing pages as they go. Web crawlers cannot access password-protected sites (e.g., Facebook) or database-backed sites (all manner of sites, including many blogs, news sites, e-commerce sites, and countless collections of data). This inaccessible portion of the Web is estimated to dwarf the readily accessible portion by orders of magnitude. There is also an enormous amount of inaccessible digital information that is not part of the Web at all, such as emails, company intranets, and digital objects created and stored by individuals.¹²

 

Additionally, there is a temporal limit to access. Some digital objects are only accessible (or even exist) for a short window of time, and all require some measure of active preservation to avoid permanent loss.¹³ The lifespans of many webpages are vanishingly short. Other pages, like some news items, are publicly accessible for a short window before they are hidden behind paywalls. Even long-lasting digital objects are often dynamic: the ads accompanying a webpage may change with each visit; news articles and other documents are revised; blog posts and comments are deleted. If an archival institution cannot access a digital object quickly or frequently enough, the object cannot be archived, at least not completely. Large-scale digital preservation, which in practice necessarily relies on periodic automated harvesting of content, is therefore limited to capturing snapshots of the changes digital objects undergo over their lifespans.
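The snapshot limitation can be illustrated with a small model, assuming an object that changes on known days and a crawler that harvests on a fixed schedule. Only the version live at each harvest is captured; any version that appears and is replaced between two harvests is never recorded. The function `captured_versions` and the revision and harvest days are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right


def captured_versions(revision_days: list[int], harvest_days: list[int]) -> set[int]:
    """Return the indices of versions that at least one harvest captured.

    revision_days must be sorted ascending: version i appears on
    revision_days[i] and stays live until the next revision. A harvest
    on day d captures only the version live on that day.
    """
    captured: set[int] = set()
    for day in harvest_days:
        live = bisect_right(revision_days, day) - 1  # last revision on or before day
        if live >= 0:  # a harvest before the first version captures nothing
            captured.add(live)
    return captured


# Hypothetical page revised on days 0, 2, 3, 10, and 11, harvested weekly.
revisions = [0, 2, 3, 10, 11]
harvests = [0, 7, 14]
kept = captured_versions(revisions, harvests)
print(sorted(kept))  # → [0, 2, 4]
print(sorted(set(range(len(revisions))) - kept))  # → [1, 3]
```

Here versions 1 and 3 exist only between harvests, so weekly crawling never records them; harvesting more often would catch them, at a higher cost in bandwidth and processing.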

LAW 

Existing copyright law does not translate well to the digital realm. Leaving aside the complexities of international copyright law, in the United States it is not clear, for example, whether an archival institution like the Library of Congress is bound by licensing restrictions, whether it can require deposit of digital objects, or whether content on the Web or in databases should be treated as published or unpublished.¹⁴ “Many of the uncertainties come from applying laws to technologies and methods of distribution they were not designed to address.”¹⁵ The lack of revised laws, or even of relevant court decisions, significantly limits the potential scale of digital preservation, as few archival institutions will venture to preserve digital objects without legal protection for doing so. Given this unclear legal environment, efforts at large-scale digital preservation are hampered by the need to secure permission to archive from the rights holder of each piece of content.¹⁶ This obviously has an enormous impact on preserving the Web, but even scholarly databases and periodical archives may not hold full rights to all of their published content. Additionally, a single digital object can include content owned by any number of authors, each of whose permission is needed to archive it legally.

Without stronger legal protection for archival institutions, the scope of digital preservation is severely limited by copyright restrictions. Digital preservation is further limited by licensing agreements, which can be even more restrictive than general copyright law. Frequently, purchase of a digital object does not transfer ownership to the end user but rather grants limited, licensed access to the object. In this case, libraries do not enjoy the customary right of first sale that, among other things, allows for actions related to preservation that would otherwise breach copyright.¹⁷ Preservation of licensed works requires that libraries either cede archival responsibility to rights holders, negotiate the right to archive licensed copies, or create dark archives that preserve objects in an inaccessible state until their copyright expires.

SELECTION 

The limitation selection imposes on digital preservation hinges on the act of intellectual appraisal. The total digital content created each year already outstrips the total current storage capacity of the world by a wide margin.¹⁸ Libraries and archives clearly cannot preserve everything, so deciding what to preserve is more critical than ever.¹⁹

 

Models of selection for digital objects can be plotted on a scale according to the degree of human mediation they entail. At one end, the selective model is closest to selection in the analog world, with librarians individually identifying digital objects worthy of preservation. At the other end of the scale, the whole domain model involves minimal human mediation, relying on automated harvesting of digital objects. The collaborative model, in which archival institutions negotiate agreements with publishers to deposit content, falls somewhere between these two extremes, as does the thematic model, which can apply either selective or whole-domain approaches to relatively narrow sets of digital objects defined by event, topic, or community.

Each of these approaches limits the scope of digital preservation. The human mediation of the selective model limits the scale of what can be preserved, as objects can only be acquired as quickly as staff can appraise them. The collaborative and thematic models offer the potential for thorough coverage of their targets but by definition are limited in scope. The whole domain model avoids the bottleneck of human appraisal but, more than any other model, is subject to the access limitations discussed above. Whole domain harvesting is also inherently wasteful, as it is an anti-selection approach: everything found is kept, irrespective of potential value. This wastefulness makes the whole domain model extremely expensive because of the technological resources required to manage information at such a scale.

FINANCES 

The ultimate limiting factor is financial reality. Considerations of funding and cost have both broad and narrow effects. The narrow effects fall on each of the other limitations previously identified: financial constraints are intertwined with the constraints imposed by technology, access, law, and selection. The technological model of digital preservation that offers the highest quality and lowest risk, redundant offsite copies, also carries hard-to-sustain costs. While the cost of storage continues to drop, hardware actually makes up only a small percentage of the total cost of digital preservation. Power, cooling, and (for offsite copy strategies) bandwidth costs are significant and do not decrease as scale increases to the same degree that storage costs do. Cost considerations similarly fuel non-technical limitations. Increased funding can increase the rate at which digital objects are accessed for preservation and can enable development of systems to mine deep Web resources. Selection is limited by the number of staff who can evaluate objects or by the need to develop systems to automate appraisal. Negotiating perpetual access to objects or arranging to purchase archival copies creates additional costs.
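The point that falling storage prices do not make preservation cheap can be made concrete with a toy model, assuming the hardware cost per terabyte declines by a fixed fraction each year while power, cooling, and bandwidth per terabyte stay constant. The function `cost_per_tb` and all parameter values are hypothetical and serve only to show the shape of the curve.

```python
def cost_per_tb(year: int, hardware0: float, decline: float, other: float) -> float:
    """Annual cost per terabyte in a given year under a toy model.

    hardware0: hardware cost per TB in year 0.
    decline:   fractional yearly drop in hardware cost (0.2 means 20%).
    other:     power, cooling, and bandwidth cost per TB, held constant.
    """
    return hardware0 * (1 - decline) ** year + other


# Hypothetical inputs chosen only to show the shape of the curve.
HARDWARE0, DECLINE, OTHER = 100.0, 0.2, 900.0
for year in (0, 5, 10):
    total = cost_per_tb(year, HARDWARE0, DECLINE, OTHER)
    hardware_share = (total - OTHER) / total
    print(f"year {year}: {total:.2f} per TB, hardware share {hardware_share:.1%}")
```

With these inputs, hardware's share of the annual cost falls from 10.0% to about 1.2% over ten years, yet the total per terabyte only falls from 1,000.00 to about 910.74, because the non-hardware costs set a floor.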

The broad financial effect is that any digital preservation requires dedicated funding over an indefinite timespan. Lavoie outlines the problem:

Much of the discussion in the digital preservation community focuses on the problem of ensuring that digital materials survive for future generations. In comparison, however, there has been relatively little discussion of how we can ensure that digital preservation activities survive beyond the current availability of soft-money funding; or the transition from a project's first-generation management to the second; or even how they might be supplied with sufficient resources to get underway at all.²⁰

 

There are many possible funding models for digital preservation,²¹ each with its own limitations. Creators and rights holders can preserve their own content but normally have little incentive to do so over the long term, as demand for access slackens. Publicly funded agencies can preserve content, but they may lack a clear mandate for doing so, and they are chronically underfunded. Preservation may be voluntarily funded, as is the case for Wikipedia, although it is not clear whether there is enough potential volunteer funding for more than a few preservation efforts. Fees may support preservation, either through charging users for access or by third-party organizations charging content owners for archival services; in such cases, however, fees may also discourage access or provision of content, respectively.

A Nested Model of Limitations 

These aspects can be seen as a series of nested constraints (see figure 1). 



 


 

Figure 1. Nested Model of Limitations 

At the highest level, there are technical limitations on how much digital information can be preserved at an acceptable quality. Within that constraint, archival institutions can access only a limited portion of what could possibly be preserved. Next, within that which is accessible, there are legal limitations on what may be archived. The subset defined by technological, access, and legal limitations still holds far more information than archival institutions are capable of archiving; therefore, selection is required, entailing either the limited quality of automated gathering or the limited quantity of human-mediated appraisal. Finally, each of these constraints is in turn limited by financial considerations, so finances exert pressure at each level.
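The nested model can be sketched as a chain of filters, each applied only to what survived the one before it, followed by a capacity limit standing in for selection. The function `apply_nested_limits`, the predicates, and the capacity are hypothetical placeholders with no empirical meaning. Finances are not modeled as a separate stage, because in the model described above they press on every level at once.

```python
from __future__ import annotations

from typing import Callable, Iterable


def apply_nested_limits(
    objects: Iterable[int],
    stages: list[tuple[str, Callable[[int], bool]]],
    capacity: int,
) -> list[tuple[str, int]]:
    """Apply each limitation to whatever survived the previous one.

    Returns (stage name, surviving count) pairs. The final selection
    stage keeps at most `capacity` objects, a crude stand-in for appraisal.
    """
    survivors = list(objects)
    report = [("all content", len(survivors))]
    for name, keep in stages:
        survivors = [o for o in survivors if keep(o)]
        report.append((name, len(survivors)))
    report.append(("selection", len(survivors[:capacity])))
    return report


# Hypothetical predicates standing in for each limitation.
stages = [
    ("technology", lambda o: o % 10 != 0),
    ("access", lambda o: o % 2 == 0),
    ("law", lambda o: o % 3 != 0),
]
for name, count in apply_nested_limits(range(100), stages, capacity=20):
    print(name, count)
```

The counts printed are 100, 90, 40, 27, and 20: each stage can only shrink the set, which is what makes the constraints nested.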

CONCLUSION 

It is possible to envision alternative ways to model this series of constraints: the order could be different, or the constraints could all be centered on a single point without being nested within each other. Thus, undue attention should not be given to the specific sequence outlined above. One important conclusion that may be drawn, however, is that the identified limitations are related but distinct. The preponderance of digital preservation research to date has understandably focused on overcoming technological limitations. With the establishment of the redundant backup model, which addresses technological limitations to a workable degree, the field would be well served by greater efforts to push back the non-technical limitations of access, law, and selection. The other conclusion is that costs are digital preservation’s most pervasive limitation. As Rosenthal plainly states it, “Society’s ever-increasing demands for vast amounts of data to be kept for the future are not matched by suitably lavish funds.”²² If funding cannot be increased, expectations must be tempered.

Perhaps it has always been the case, but the scale of the digital landscape makes it clear that preservation is a process of triage. For the foreseeable future, the amount of digital information that could possibly be preserved far outstrips the amount that feasibly can be preserved. It is useful to put the advances in digital preservation technology in perspective and to recognize that non-technical factors also play a large role in determining how much of our cultural heritage may be preserved for the benefit of future generations.

REFERENCES AND NOTES
 

1. Issues specific to digitized objects (i.e., digital versions of analog originals) are not specifically addressed herein. Technological limitations apply equally to digitized and born-digital objects, however, and the remaining limitations overlap greatly in either case.

2. Francine Berman et al., Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information (Blue Ribbon Task Force on Sustainable Digital Preservation and Access, 2010), http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf (accessed Apr. 23, 2011).

3. Marilyn Deegan and Simon Tanner, “Some Key Issues in Digital Preservation,” in Digital Convergence—Libraries of the Future, ed. Rae Earnshaw and John Vince, 219–37 (London: Springer London, 2007), www.springerlink.com.proxy-remote.galib.uga.edu/content/h12631/#section=339742&page=1 (accessed Nov. 18, 2010).

4. Berman et al., Sustainable Economics for a Digital Planet; Deegan and Tanner, “Digital Convergence.”

5. Data redundancy normally will also entail hardware migration; it may or may not also incorporate file format migration.

6. The Library of Congress, for instance, only began digital preservation in 2000 (www.digitalpreservation.gov/partners/pioneers/index.html [accessed Apr. 24, 2011]).

7. David S. H. Rosenthal, “Bit Preservation: A Solved Problem?” International Journal of Digital Curation 5, no. 1 (July 21, 2010), www.ijdc.net/index.php/ijdc/article/view/151 (accessed Mar. 14, 2011).

8. H. M. Gladney, “Durable Digital Objects Rather Than Digital Preservation,” January 1, 2008, http://eprints.erpanet.org/149 (accessed Mar. 14, 2011).

9. Rosenthal, “Bit Preservation.”

10. Ibid. Rosenthal cites studies by Schroeder and Gibson (2007) and Pinheiro (2007).

11. Ibid.


12. Peter Lyman, “Archiving the World Wide Web,” in Building a National Strategy for Digital Preservation: Issues in Digital Media Archiving (Washington, DC: Council on Library and Information Resources and Library of Congress, 2002), 38–51, www.clir.org/pubs/reports/pub106/pub106.pdf (accessed Dec. 1, 2010); F. McCown, C. C. Marshall, and M. L. Nelson, “Why Web Sites Are Lost (and How They’re Sometimes Found),” Communications of the ACM 52, no. 11 (2009): 141–45; Margaret E. Phillips, “What Should We Preserve? The Question for Heritage Libraries in a Digital World,” Library Trends 54, no. 1 (Summer 2005): 57–71.

13. Deegan and Tanner, “Digital Convergence”; McCown, Marshall, and Nelson, “Why Web Sites Are Lost (and How They’re Sometimes Found).”

14. June Besek, Copyright Issues Relevant to the Creation of a Digital Archive: A Preliminary Assessment (The Council on Library and Information Resources and the Library of Congress, 2003), www.clir.org/pubs/reports/pub112/contents.html (accessed Mar. 15, 2011).

15. Ibid., 17.

16. Archival institutions that do not pay heed to this restriction, such as the Internet Archive (www.archive.org), claim their actions constitute fair use. The legality of this claim is as yet untested.

17. Berman et al., Sustainable Economics for a Digital Planet.

18. Francine Berman, “Got Data?” Communications of the ACM 51, no. 12 (December 2008): 50, http://portal.acm.org/citation.cfm?id=1409360.1409376&coll=portal&dl=ACM&idx=J79&part=magazine&WantType=Magazines&title=Communications (accessed Nov. 20, 2010).

19. Phillips, “What Should We Preserve?”

20. Brian F. Lavoie, “The Fifth Blackbird,” D-Lib Magazine 14, no. 3/4 (March 2008): I, www.dlib.org/dlib/march08/lavoie/03lavoie.html (accessed Mar. 14, 2011).

21. Berman et al., Sustainable Economics for a Digital Planet.

22. Rosenthal, “Bit Preservation.”
