Using Cited Half-life to Adjust Download Statistics

J. Parker Ladwig and Andrew J. Sommese

“Supplying accurate CPU [cost-per-serial use] information to faculty and appropriate marketing of the alternate modes of delivery ... become the key to achieving an optimal cost-efficient serials collection in an academic library.”1

A model is presented for adjusting use statistics using a journal’s ISI Journal Citation Reports cited half-life. The goal is to improve the method used to evaluate the raw electronic download figure. The proposed model will still undercount total use, but the undercounting will be proportional across disciplines and less severe. By using this model, librarians can avoid making cancellation decisions that may cost their libraries more money in the long run.

In the spring of 2004, the University Libraries of Notre Dame began another round of journal cancellations. One overall goal was to be as cost-efficient as possible. First, the subscription cost of a journal was divided by the number of full-text downloads for one year to calculate a cost per download. Then, this figure was compared to the average commercial document delivery (docdel) cost. Those journals that cost more per download than the estimated docdel cost became candidates for cancellation.

When the methodology was explained to the Mathematics Department’s library committee, questions were raised not only about the cancellations, but also about the methodology employed. One obvious question was, “How could only one year’s worth of download statistics be a fair measure of use?” Upon reflection, the second author of this article uncovered an even more serious flaw: the downloads had not been adjusted for the journals’ half-lives.
Because the electronic runs are short (e.g., six years for many Springer Verlag journals as of 2003), raw download numbers would be reasonable for short half-life journals but would significantly undercount the downloads for long half-life journals. To demonstrate the significance of the ISI Journal Citation Reports cited half-life, this article will discuss the importance of journal use statistics, explain cited half-life and its importance, and then present a model for adjusting download statistics (including the model’s assumptions and problems).

J. Parker Ladwig is the Mathematics Librarian in University Libraries at the University of Notre Dame; e-mail: ladwig.1@nd.edu. Andrew J. Sommese is the Duncan Professor of Mathematics in the Department of Mathematics at the University of Notre Dame; e-mail: sommese@nd.edu. The authors would like to express their thanks to the referees of this article for their helpful suggestions.

College & Research Libraries November 2005

Importance of Use Statistics

A main reason for the library to collect use statistics is so that it can maximize the return on its investment (ROI). For example, when someone buys a car, he or she expects to maximize his or her investment. If the $20,000 car is expected to last ten years, this is equivalent to expecting to get at least $2,000 worth of use from it each year (ignoring the effects of inflation). In the same way, if a library pays $500 for a subscription to one volume of a journal (assuming that only one volume is published for the year), the library expects that it will get at least $500 worth of use over the volume’s lifetime. Journal use statistics are collected in order to perform this sort of calculation.
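The screening step described at the start of this article (subscription cost divided by one year's downloads, then compared with an average docdel cost) can be sketched in a few lines of Python. This is a minimal illustration, not Notre Dame's actual procedure: the function names are hypothetical, and the $30 docdel cost is an assumed placeholder because the article does not report the figure used. The cost and download numbers are three rows of table 4.

```python
# Minimal sketch of the raw cost-per-download screen (not the authors' code).
# DOCDEL_COST is a hypothetical placeholder: the article does not state the
# average docdel cost that was used.
DOCDEL_COST = 30.00


def cost_per_download(subscription_cost: float, downloads: int) -> float:
    """Annual subscription cost divided by one year's full-text downloads."""
    if downloads <= 0:
        return float("inf")  # no recorded use at all
    return subscription_cost / downloads


def cancellation_candidates(journals, docdel_cost: float = DOCDEL_COST):
    """Titles whose cost per download exceeds the docdel cost."""
    return [title for title, cost, downloads in journals
            if cost_per_download(cost, downloads) > docdel_cost]


# (title, 2004 cost in dollars, 2004 downloads), taken from table 4
sample = [
    ("Nature Cell Biology", 899, 348),
    ("SIAM Journal on Numerical Analysis", 508, 69),
    ("Poetics", 433, 3),
]

for title, cost, downloads in sample:
    print(f"{title}: ${cost_per_download(cost, downloads):.2f} per download")
print("Candidates:", cancellation_candidates(sample))
```

Because it uses raw downloads, this is exactly the unadjusted calculation whose bias against long half-life journals the rest of the article sets out to correct.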
Because it is difficult to separate the use of one volume of a journal over its lifetime (assuming one volume is published per year), a library examines the use of the entire run, calculates the cost per use (CPU), and asks whether the CPU is greater than the expected ROI. One measure of expected ROI is the cost of alternate modes of access, namely, interlibrary loan (ILL) and docdel. Because docdel is clearly more expensive than ILL, only the cost of docdel is compared to the journal’s CPU. If the CPU (i.e., the annual subscription cost divided by the year’s uses of the entire run) is more than the cost of one article via docdel, there is a strong argument for canceling the subscription and investing any net savings in docdel or ILL.

This sort of analysis does not work for decisions to subscribe to a journal. Even if more is spent on docdel per year for a particular journal than the annual subscription would cost, there is not sufficient information to make a subscription decision, because the publication years of the requested articles are generally unknown. Even if most of the articles are from the last few years and their docdel cost is significantly higher than a subscription, the cost for the first year or two will include both the subscription cost and the cost of docdel (or ILL) for articles from the years the library does not own. Thus, the model presented in this paper is useful for cancellation decisions but would need to be modified for decisions about new subscriptions.

Definition of Cited Half-life

ISI defines and explains cited half-life as follows:

The cited half-life is the number of publication years from the current year which account for 50% of current citations received. This figure helps you evaluate the age of the majority of cited articles published in a journal. Each journal’s cited half-life is shown in the Journal Rankings Window. Only those journals cited 100 or more times have a cited half-life.
The chronological distribution of the cumulative percent of citations received per publication year is shown in the Cited Half-Life Calculation dialog box. A higher or lower cited half-life does not imply any particular value for a journal. For instance, a primary research journal might have a longer cited half-life than a journal that provides rapid communication of current information. Cited Half-Life figures may be useful to assist in collection management and archiving decisions. Dramatic changes in Cited Half-Lifes [sic] over time may indicate a change in a journal’s format. Studying the half-life data of the journals in a comparative study may indicate differences in format and publication history.2

To illustrate cited half-life, consider the ISI citation figures for Nature Cell Biology, Communications in Partial Differential Equations, and Mathematische Annalen listed in table 1. Data from 2003 are the most recent available.

TABLE 1 Percent of Citations to Each Journal
“Breakdown of the citations to the journal by the cumulative percent of 2003 cites to articles published in the following years” (JCR)

Journal | 2003 | 2002 | 2001 | 2000 | 1999 | 1998 | 1997 | 1996 | 1995 | 1994 | 1993-all
Nat. Cell. Biol. | 5.1 | 28.8 | 57.6 | 84.4 | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 | 99.9 | 100.0
Comm. PDE | 0.5 | 3.6 | 10.7 | 18.0 | 24.3 | 29.5 | 35.9 | 42.2 | 47.5 | 53.1 | 100.0
Math. Ann. | 0.5 | 2.5 | 6.4 | 9.3 | 11.8 | 15.1 | 17.5 | 19.6 | 21.7 | 24.2 | 100.0

In 2003, Nature Cell Biology had an ISI cited half-life of 2.7 years. Specifically, of the total number of citations referring to Nat. Cell. Biol. from all the journals tracked by ISI, 5.08 percent were to articles published in 2003; 28.84 percent to articles published in 2002 or 2003; 57.62 percent to articles published in 2001, 2002, or 2003; and 84.39 percent to articles published in 2000, 2001, 2002, or 2003. Thus, with just the four most recent years of Nat. Cell. Biol.
readily available, a researcher would be able to retrieve more than 80 percent of the current citations to it.

Communications in Partial Differential Equations had a cited half-life of 9.5 years. Of the total number of citations referring to Comm. PDE, 0.52 percent were to articles published in 2003; 3.60 percent to articles published in 2002 or 2003; 10.72 percent to articles published in 2001, 2002, or 2003; and 17.99 percent to articles published in 2000, 2001, 2002, or 2003. Thus, with just the four most recent years of Comm. PDE available, a researcher would be able to retrieve less than 20 percent of the citations to it.

For Mathematische Annalen, a leading pure mathematics journal, the situation is even more dramatic. The cited half-life is listed as > 10 (if the ISI cited half-life is calculated as described below, it would be approximately 23 years). Of the total number of citations referring to Math. Ann., 0.51 percent were to articles published in 2003; 2.53 percent to articles published in 2002 or 2003; 6.39 percent to articles published in 2001, 2002, or 2003; and 9.31 percent to articles published in 2000, 2001, 2002, or 2003. Thus, with just the four most recent years of Math. Ann. available, a researcher would be able to retrieve less than 10 percent of the citations to it.

FIGURE 1 ISI Half-lives for Three Journals [plot: cumulative percentage of 2003 citations from other journals vs. number of years before 2003 that cited articles were published; Nat. Cell. Biol., Comm. PDE, Math. Ann.]

FIGURE 2 Least-squares Estimate for Nature Cell Biology [plot: cumulative percentage of 2003 citations from other journals vs. number of years before 2003 that cited articles were published; ISI data and least-squares estimate]
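The cumulative percentages in table 1 can be turned back into an ISI-style half-life by finding where the cumulative share crosses 50 percent, counting the current year as year one. The sketch below assumes linear interpolation between yearly values, which is our assumption rather than ISI's documented method, and the function name is hypothetical. With the rounded table values it gives 2.7 years for Nat. Cell. Biol., about 9.4 for Comm. PDE (ISI reports 9.5), and no value for Math. Ann., consistent with ISI's "> 10".

```python
# Cumulative percent of 2003 citations to articles published in
# 2003, 2002, ..., 1994 (table 1, rounded to one decimal place).
TABLE_1 = {
    "Nat. Cell. Biol.": [5.1, 28.8, 57.6, 84.4, 99.9, 99.9, 99.9, 99.9, 99.9, 99.9],
    "Comm. PDE": [0.5, 3.6, 10.7, 18.0, 24.3, 29.5, 35.9, 42.2, 47.5, 53.1],
    "Math. Ann.": [0.5, 2.5, 6.4, 9.3, 11.8, 15.1, 17.5, 19.6, 21.7, 24.2],
}


def isi_style_half_life(cumulative):
    """Years (current year counted as year one) until the cumulative share
    reaches 50%, by linear interpolation; None if it stays below 50%."""
    prev_years, prev_pct = 0, 0.0
    for years, pct in enumerate(cumulative, start=1):
        if pct >= 50.0:
            # prev_pct < 50 <= pct here, so the denominator is positive
            return prev_years + (50.0 - prev_pct) / (pct - prev_pct)
        prev_years, prev_pct = years, pct
    return None  # ISI would report "> 10"


for name, cumulative in TABLE_1.items():
    hl = isi_style_half_life(cumulative)
    label = f"{hl:.1f}" if hl is not None else "> 10"
    print(f"{name}: half-life {label}; four most recent years = {cumulative[3]}%")
```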
Please note, however, one oddity in ISI’s method of determining half-life. ISI considers the publication year (e.g., 2003) as “year one” of half-life rather than “year zero.” Thus, when the half-life is 3.0 years, ISI does not mean that half of the current citations were to articles published from 2000 (= 2003 − 3) onward but, rather, from 2001 onward (2003 was the first year, 2002 the second, 2001 the third). For our calculations, the publication year must count as year zero; therefore, the ISI half-life is simply adjusted by subtracting one year. (See figure 1.)

Cited half-life as an Exponential Decay Curve

As the term half-life suggests, the fraction of citations from a given year satisfies, to a first approximation, a curve for exponential decay. One can illustrate this by plotting the ISI data against a least-squares fitted exponential decay curve (discussed below). (See figure 2 for an example.)

Many physical models serve as analogies. For example, in the field of physics,

the half-life of a radioactive substance is the time required for half of it to decay away.... A plot of the remaining nuclei as a function of time shows a steady decrease as the curve tends to, but never actually reaches, zero. The kind of behavior is called exponential decay.... The fraction of the original material remaining after n generations is $(1/2)^n$, instead of $2^n$ for exponential growth.3

The fractional amount decaying each year does not change as the source disappears. Say, for example, that the half-life of a chunk of radium is 1,620 years. After 1,620 years, one half of the chunk is left. After 3,240 years, the original chunk does not have 0 percent radium left but, rather, 100% · 1/2 · 1/2 = 100% · 1/4 = 25%. After T years the fraction of the total atoms of radium left is $(1/2)^{T/1620}$.

The empirical data provided by ISI may be used to find the exponential decay curve with the best least-squares fit to the data.
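The radium arithmetic and the one-year correction to ISI's half-life are simple enough to check directly. A minimal sketch; the function names are hypothetical:

```python
def fraction_remaining(t_years: float, half_life: float) -> float:
    """Fraction of the original material left after t_years: (1/2) ** (t / half_life)."""
    return 0.5 ** (t_years / half_life)


def fraction_spent(t_years: float, half_life: float) -> float:
    """Fraction that has decayed after t_years."""
    return 1.0 - fraction_remaining(t_years, half_life)


def corrected_half_life(isi_half_life: float) -> float:
    """Shift ISI's 'publication year = year one' convention to year zero."""
    return isi_half_life - 1.0


print(fraction_remaining(1620, 1620))  # 0.5
print(fraction_remaining(3240, 1620))  # 0.25
print(round(corrected_half_life(2.7), 1), round(corrected_half_life(9.5), 1))  # 1.7 8.5
```

Read for citations rather than atoms, `fraction_spent(T, HL)` with the corrected half-life is the share of a volume's eventual citations that have accumulated T years after publication under the exponential model.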
As illustrated in figures 2, 3, and 4 (with the half-life corrected as indicated above), the least-squares fitted decay curves fit the data surprisingly well, especially for long half-life journals.

FIGURE 3 Least-squares Estimate for Communications in Partial Differential Equations [plot: cumulative percentage of 2003 citations from other journals vs. number of years before 2003 that cited articles were published; ISI data and least-squares estimate]

FIGURE 4 Least-squares Estimate for Mathematische Annalen [plot: cumulative percentage of 2003 citations from other journals vs. number of years before 2003 that cited articles were published; ISI data and least-squares estimate]

Importance of Cited Half-life

Among its many advantages, the half-life is particularly important for providing more accurate use data for all types of journals. For example, assume that only the print version of a journal is available, that the library has a complete run, and that the library is gathering reasonably accurate annual use statistics. If each of two of the example journals was used in print one hundred times over the course of a year, it would be expected that fifty of the uses of Nat. Cell. Biol. were of articles published in the past 1.7 years (= 2.7 − 1), and fifty of the uses of Comm. PDE were of articles published in the past 8.5 years (= 9.5 − 1).

Imagine now a second scenario where the electronic version is available for the past seven years, and prior to that, only the print was available (back to volume one, issue one). Further, assume that the year’s use data include only electronic downloads. For Nat. Cell. Biol., because its corrected half-life is 1.7 years and the electronic version is available for seven
years, virtually 100 percent of the total use (print and electronic) is reflected in the download statistics (assuming that one electronic download is equal to one print use). For Comm. PDE, because its corrected half-life is 8.5 years, only 36 percent of the total use is reflected in the downloads; and for Math. Ann., because of its very long half-life, only 18 percent is reflected in the downloads. (These shares are the cumulative 1997–2003 percentages in table 1.)

Not only is the total use undercounted in the second scenario (only electronic downloads were captured), but the undercounting is not comparable. For one journal, the electronic use captures nearly 100 percent of the total use and for another, less than 20 percent. The striking difference in these figures also points to a difference across fields that, if not adjusted for, could lead to cancellations ultimately costing the library more than was initially saved. For example, of the 327 mathematics and applied mathematics journals listed in ISI, 39 percent have an ISI cited half-life of at least ten years, 70 percent have an ISI cited half-life of at least six years, and 11 percent have no half-life given (e.g., new journals). Compare this with biochemistry and molecular biology. Of the 261 journals listed in ISI, 17 percent have an ISI cited half-life of at least ten years, only 38 percent have an ISI cited half-life of at least six years, and only 1 percent have no half-life given. Thus, to compare fairly the total use of one journal with another across disciplines, electronic download statistics should be adjusted by incorporating cited half-lives.

Model for Adjusting the Download Statistics

Let DL denote the number of electronic downloads in a given year. This is the “use statistics” that publishers provide. Let ER denote the electronic run (i.e., the number of years T, counting back from the present, for which the journal is available electronically but not in paper). Let FC denote the full-count of use in a given year. The goal is the steady-state full-count at some date in the future. To understand this, recall how this adjusted figure will be used. It will be divided into the yearly subscription cost to obtain the CPU. Because costs are involved in canceling a subscription and then resubscribing at a later date, the model is calculating the electronic downloads per year that one would expect in the future assuming the ER increases one more year each year.

To derive AF, the overall adjustment factor to be applied to DL, convert DL to FC as follows:

$$\mathrm{FC} = \mathrm{DL} \cdot \mathrm{AF}$$

Letting HL denote the adjusted ISI cited half-life of the journal (i.e., ISI HL − 1),

$$\mathrm{AF} = \frac{1}{1 - \left(\tfrac{1}{2}\right)^{\mathrm{ER}/\mathrm{HL}}} \qquad \text{(equation 1)}$$

The Mathematics

The mathematics is straightforward. As noted in the radium example above, the fraction of the total atoms of radium remaining after T years is $(1/2)^{T/1620}$. Thus, the total percentage spent after T years is

$$\left[1 - \left(\tfrac{1}{2}\right)^{T/1620}\right] \cdot 100\%.$$

By modeling citations to a given volume of a journal as atoms decaying from a chunk of radium, the fraction of total expected citations unaccounted for j years after publication is found to be $(1/2)^{j/\mathrm{HL}}$. Thus, for a single volume of a journal, the total citations accounted for in the jth year after publication is

$$C\left[\left(\tfrac{1}{2}\right)^{(j-1)/\mathrm{HL}} - \left(\tfrac{1}{2}\right)^{j/\mathrm{HL}}\right],$$

where C is the total number of citations from a given volume of a journal. For simplicity, one volume is assumed to equal 1 year’s worth of a journal’s articles.

Thus, assuming that the total number of citations for each volume of the journal is the same number C, the total number of citations TC in the current year from all volumes up to T years ago is the sum of the above quantities for j from 1 to T; the sum telescopes, that is:

$$\mathrm{TC} = C\sum_{j=1}^{T}\left[\left(\tfrac{1}{2}\right)^{(j-1)/\mathrm{HL}} - \left(\tfrac{1}{2}\right)^{j/\mathrm{HL}}\right] = C\left[1 - \left(\tfrac{1}{2}\right)^{T/\mathrm{HL}}\right].$$

Note that as T increases TC approaches C. Thus the citations in a given year from the most recent T volumes of the journal account for a fraction equal to $1 - (1/2)^{T/\mathrm{HL}}$ of the number of total citations of the journal per year that will be approached in the future.

Assume that DL is proportional to TC (i.e., DL = k · TC for some constant k) and T equal to ER. Thus, by the above, FC = k · C. Notice that this FC is the future steady-state count. From this, we conclude that

$$\mathrm{FC} = k \cdot C = \frac{k \cdot \mathrm{TC}}{1 - \left(\tfrac{1}{2}\right)^{\mathrm{ER}/\mathrm{HL}}} = \frac{\mathrm{DL}}{1 - \left(\tfrac{1}{2}\right)^{\mathrm{ER}/\mathrm{HL}}}.$$

Thus,

$$\mathrm{AF} = \frac{1}{1 - \left(\tfrac{1}{2}\right)^{\mathrm{ER}/\mathrm{HL}}}.$$

Note that if ER was equal to the full run of the journal, this would still give a small increase to DL.

Model Assumptions

The model is based on a number of assumptions summarized in table 2. For each assumption, difficulties are first presented, then a justification.

TABLE 2 The Model’s Assumptions
Assumption | Difficulty
1. Citations decay exponentially. | May be linear, but doesn’t fit data
2. Half-life is continuous. | ISI assumes discrete, but doesn’t fit publishing practices
3. Half-life is relevant for journals. | Not true for all articles, but more true for journals
4. Half-life is similar across all volumes. | Probably does change over time, but difficult to correct
5. Local half-life and ISI half-life are proportional. | Proportionality fits data, but needs more research
6. Citation and use are proportional. | Proportionality fits data, but needs more research

1. Citations Decay Exponentially

Assumption: The fraction of this year’s citations of the journal that go to the volumes from the current one back to the volume of T years ago will be $1 - (1/2)^{T/\mathrm{HL}}$.

Difficulties: First, half-life itself may not be the best way to describe journal obsolescence. This is discussed by Endre Száva-Kováts, whose article, “Unfounded Attribution of the ‘Half-Life’ Index-Number of Literature Obsolescence to Burton and Kebler,” is required reading for anyone fond of half-life data. “[I]n their 1960 article Burton and Kebler first made critical and later ambiguous statements, and finally attribute only ‘some validity’ to the idea of literature half-life.”4

Second, in 1961, R. E. Burton and B. A. Green Jr. suggested that statistical “median-age” be used instead of half-life, implying a linear rather than an exponential relationship.5 Despite the term median-age, Burton and Green employ an exponential decay curve to graph the citation pattern. Further, the use of a linear median-age rather than an exponential half-life does not fit the data presented by ISI. (See figures 2, 3, and 4.) Even if median-age were used instead of half-life, however, it would still adjust the download figures in such a way as to give a better estimate of use than a simple cost per download calculation.

Third, the model is not needed for short half-life journals because nearly all the use is likely to occur in the run of electronic access available. Thus, the model is intentionally designed to be most helpful for medium to long half-life journals.

Justification: Despite these difficulties, the model’s assumption of exponential decay is a good first approximation as illustrated by the remarkably good fit of the model data to the ISI data in figures 2, 3, and 4.

2. Half-life Is Continuous

Assumption: The exponential decay model is a continuous time model; treating time as discrete leads to error.

Difficulty: One particularly worrisome area is implicit in the ISI measurement of half-life. The ISI half-life figure assumes that a year’s issues of a given journal may be treated as if they appeared at the beginning of the year, when, in fact, they are spread out over the year. ISI’s assumption does not cause harm for long half-life journals, but it does for those with a short half-life.

Justification: Because there is little effect on long half-life journals and, in fact, more accuracy for short half-life journals, the model employs a more straightforward calculation based on continuity.

3. Half-life Applies to Journals

Assumption: The exponential decay model applies to journals, but not to articles.

Difficulties: Helmut M. Artus argued forcefully that “the generally accepted assumption of the steady obsolescence of scientific literature is refuted,” and one would agree that half-life undoubtedly does not usefully describe every article.6 As an example of this, consider Leonard Roth’s article, “On the Projective Classification of Surfaces.”7 Algebraic geometry was at the center of mathematics at the end of the nineteenth century. For a variety of reasons, objects of such complication were being studied that controversies arose over what had been proved and not proved. The subject slumbered until the middle of the twentieth century, when general tools (e.g., sheaf cohomology and algebra) had advanced to the point that many of the difficult complications could be handled by the new machinery. Many of the invariants of the classical period that had not been rigorously defined had natural interpretations as invariants of the new machinery. This led to a renaissance of the subject in the middle of the twentieth century.
Sommese (one of the authors of this paper) discovered the important article by Roth and quoted it prominently in his article, “Hyperplane Sections of Projective Surfaces I—The Adjunction Mapping.”8 Performing a citation search on Roth’s article shows that the first citation in ISI is by Sommese. Thereafter, a sequence of twenty citations (excluding Sommese’s) continues through 2004.

Justification: Some articles continue to be cited despite their age; others are cited once or twice and then forgotten. This distinction is an important one to remember, but for the present model, what is being used is the citation pattern for a collection of articles (i.e., a journal), not the citations of any individual article.

4. Half-life Is Similar across All Volumes

Assumption: Different volumes of the same journal have the same half-life.

Difficulties: As the explanation of the cited half-life from the JCR points out, “Dramatic changes in Cited Half-Lifes [sic] over time may indicate a change in a journal’s format.” This could be caused by a change in the number of articles published in a given year (e.g., more pages with the same density of print, or the same number of pages but a denser format); a change in editorial policies; or a change in a field of study.

Justification: An investigation into half-lives changing over time was not pursued. It would be worthwhile to explore the effect of such a change further.

5. Local Half-life and ISI Half-life Are Proportional

Assumption: Local half-life is proportional to the corrected ISI half-life figures.

Difficulty: Even if ISI’s half-life figures are a good proxy for the citation patterns of the general scholarly community, they might differ markedly from the citation pattern of a particular university or research institute. In “Library Journal Use and Citation Half-Life in Medical Science,” Ming-Yueh Tsay studied cited half-life and local use half-life.
“[I]n general, journals with shorter citation half-lives also have shorter use half-lives.” But, “[t]here is ... a [statistically] significant difference between the mean citation half-life and the mean use-half life for journals of each category [studied].”9

Justification: Further research is needed on this assumption.

6. Citation and Use Are Proportional

Assumption: A journal’s citations and its downloads should be in about the same proportions (with some time lag).

Difficulty: The other, more significant problem is that in-house use half-lives may differ significantly from citation half-lives. Tsay demonstrates that for the journals held by the medical library she studied, use half-life was less than cited half-life (e.g., for 266 clinical medicine titles, the use half-life was 3.02 years, but the citation half-life was 6.06 years).10 The implication is that in this case, citation half-lives may overestimate local use. In “Biology Journal Use at an Academic Library: A Comparison of Use Studies,” Diane Schmidt and Elizabeth B. Davis argued that “this technique [of studying citations] does not address the influence of background reading or information gathered for personal, as opposed to professional, use. Another problem ... is that, in general, they measure only the use of journals by faculty or occasionally graduate students.”11 The implication here is that citation half-lives underestimate local use.

Justification: Because the estimate is applied across the board, one journal’s cost per use will be more comparable to another’s than with an unadjusted calculation. However, further investigation is needed on this assumption.

Practical Problems

Practical problems stemming from the model’s assumptions and methods for handling them are summarized in table 3. Because Notre Dame’s cancellation project was postponed, the effects of these practical decisions are demonstrated with a sample set of journals listed in table 4.

1. Do the download statistics need to be adjusted at all?

The first thing a user of download statistics will have to decide is whether to adjust them. The authors are convinced that download statistics should be adjusted, except in the case of a small set of journals with generally short half-lives. Even if the corrected download statistics do not figure largely in a library’s cancellation decisions, the adjusted figures can inform decisions about borderline cases.

TABLE 3 Practical Difficulties
Choice | Our Decision
1. Do the download statistics really need to be adjusted? | We had many long half-life journals, so the adjusted download figures were calculated.
2. How should you generate one year’s download statistics? | We had data problems, so only one complete year’s worth of data was used.
3. How should you calculate the electronic run? | We had runs available for each journal, so they were used.
4. What do you do if the print and electronic runs overlap? | We didn’t have good print statistics, so the print was ignored.
5. What do you do if part of the run is available for little or no cost? | We didn’t consider this at the time, so the current issues were focused on.
6. What do you do with a half-life > 10 years? | We used a corrected half-life of 9.0 and were ready to perform least-squares analysis for the borderline cases.
7. What if the half-life is unavailable from ISI? | We were prepared to use a corrected half-life of 9.0 or to calculate the half-life needed to make a given cost-per-use cutoff.
8. How do you convert print use to downloads? | We needed to make some estimate, so a ratio of five downloads to one print use was used.

Corrected half-lives for the sample set of journals are listed in table 5. Of the seventeen journals listed, seven have half-lives greater than eight years.

2. What time period should be used to compute DL, the download statistic for a year?
Should the DL be from the most recent year or should an average of several years be used? If there are K years of download data, and the total count of all downloads is TDL for those K years, TDL/K could be used for DL to smooth out fluctuations. Caution should be exercised, however, because:

• Usage might increase from year to year as comfort with, and dependence on, electronic journals continue to increase. TDL/K could be replaced with TDL/K times the overall ratio of increase over the K years (e.g., if the total downloads for a given university increased overall by 3%, the TDL/K for each journal could be adjusted upward by 3%).
• If K is not an integer, seasonal variations in downloads will skew the figures (e.g., if the data covered only 18 months, K would be 1.5). Because the variation could be caused by whether classes are in session or not, for example, one could attempt to make an adjustment for seasonal variation. However, it is probably best for K to be an integer.
• The electronic run will not be constant over the K years (e.g., this year there might be five years of electronic access, but last year only four).
• Lastly, publishers generally acknowledge that there were problems with the statistics when they first began to collect them. A library should begin with one year’s worth of statistics and build from there over time.

In table 4, download statistics are listed for 2003 and 2004; in table 5, only download statistics from 2004 are used.

3. What method should be used to calculate the electronic run?

Another minor difficulty is calculating the number of years of electronic availability, ER. Even for the same publisher, the ER for individual journals can vary from a few years (especially for new journals) to more than ten. Also, the calculations assume no overlap between electronic and print versions. (See the discussion below.)
The library should use the number of years for the entire electronic run, realizing, of course, that even the adjusted download figures from the model will undercount, especially for disciplines more comfortable with paper journals than electronic journals. These subjects also appear to be ones with the preponderance of long half-life journals (e.g., journals in the humanities). However, because the model has been applied across the board, the adjusted download figures are more comparable than the raw download figures by themselves. (See table 5 for ER figures for each journal. Notice that Nature Cell Biology only began in 1999 and is adjusted accordingly.)

4. What should be done if the print and electronic runs overlap?

At Notre Dame, many print subscriptions have been cancelled in favor of the electronic version only. With few exceptions, however, there was a period of time when the library received both the print and electronic versions of a journal. If a library is gathering reasonably accurate print and electronic use statistics, the statistics could simply be added together for the overlapping time period. Of course, there is the problem of the definition of “use.” Download statistics count the use of one article as one use; print statistics generally count one current issue or one bound issue as one use. Obviously, these are not comparable.
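Items 2 and 4 suggest two small calculations: averaging downloads over K complete years (optionally scaled by overall growth) and, where print counts exist, adding print use in download equivalents using the five-downloads-per-print-use ratio listed in table 3. The helpers below are a hypothetical sketch, not the authors' procedure; the 3 percent growth and the four print uses in the example are made-up inputs, while the download counts are the SIAM Journal on Numerical Analysis figures from table 4.

```python
# Hypothetical helpers illustrating items 2 and 4 (not the authors' code).
PRINT_TO_DOWNLOADS = 5  # table 3: five downloads counted per print use


def smoothed_downloads(yearly_counts, growth_ratio: float = 1.0) -> float:
    """TDL / K over K complete years, optionally scaled by overall growth.

    yearly_counts holds one total per complete year, so K is always an
    integer, as the text recommends.
    """
    if not yearly_counts:
        raise ValueError("need at least one complete year of download data")
    k = len(yearly_counts)
    return sum(yearly_counts) / k * growth_ratio


def combined_use(downloads: float, print_uses: int = 0,
                 ratio: int = PRINT_TO_DOWNLOADS) -> float:
    """Add print use, expressed in download equivalents, to DL."""
    return downloads + ratio * print_uses


# SIAM Journal on Numerical Analysis: 61 downloads in 2003, 69 in 2004.
dl = smoothed_downloads([61, 69])                            # 65.0
dl_growth = smoothed_downloads([61, 69], growth_ratio=1.03)  # assumed 3% growth
print(dl, round(dl_growth, 2))
print(combined_use(dl, print_uses=4))  # 4 print uses is a made-up figure
```

Keeping one total per complete year in the input list is a simple way to honor the advice that K be an integer, since partial years cannot be passed in.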
For the sample set of journals, the assumption was that there was no overlap between the electronic and print runs and that the print use statistics could be ignored (they are generally not collected well or with regularity). The adjustments were made using only the complete electronic run.

TABLE 4
Sample Set of Journals

| Rank | Title | 2004 Cost | 2003 DL | 2004 DL | 2004 CPU |
|---|---|---|---|---|---|
| 1 | Nature Cell Biology | $899 | n/a | 348 | $2.58 |
| 2 | SIAM Journal on Numerical Analysis | $508 | 61 | 69 | $7.36 |
| 3 | Evolution and Human Behavior | $808 | 89 | 109 | $7.41 |
| 4 | Library & Information Science Research | $315 | 17 | 13 | $24.23 |
| 5 | Accounting, Organizations & Society | $1,633 | 65 | 50 | $32.66 |
| 6 | Acta Psychologica | $936 | 43 | 28 | $33.43 |
| 7 | Probabilistic Engineering Mechanics | $832 | 771 | 24 | $34.67 |
| 8 | International Journal of Industrial Organization | $1,169 | 40 | 32 | $36.53 |
| 9 | Physics Reports | $5,599 | 198 | 153 | $36.59 |
| 10 | Immunology Letters | $2,734 | 89 | 69 | $39.62 |
| 11 | Journal of Molecular Structure: Theochem | $7,633 | 216 | 192 | $39.76 |
| 12 | Earth Science Reviews | $1,334 | 13 | 33 | $40.42 |
| 13 | Mathematische Annalen | $2,760 | 58 | 60 | $46.00 |
| 14 | Communications in Partial Differential Equations | $1,995 | n/a | 38 | $52.50 |
| 15 | Poetics | $433 | 19 | 3 | $144.33 |
| 16 | Technological Forecasting & Social Change | $839 | 5 | 4 | $209.75 |
| 17 | Journal of Logic and Algebraic Programming | $923 | 1 | 1 | $923.00 |

5. What should be done if part of the run is available for little or no cost?

Because the ultimate goal is to maximize use while minimizing cost (i.e., to maximize the ROI), there are implications for journals that become available for little or no cost after some number of years (e.g., through JSTOR or via various open-access arrangements).

As a first example, consider Mathematische Annalen. The years 1996 to the present of this journal are available electronically by subscription from Springer Verlag. However, EMANI (Electronic Mathematical Archiving Network Initiative) makes all the issues from 1996 and earlier available for free. This is laudable, but because the older issues are free to anyone (even nonsubscribers), the download statistics from Springer for the subscribed issues should not be adjusted. As a result, the cost per download is higher, and it is more likely that the subscription should be replaced by some other means of access (such as commercial document delivery).

As a second example, consider journals whose issues before a certain moving wall are available at a cost lower than that of a subscription. The pre-1996 volumes of the SIAM Journal on Numerical Analysis (SINUM) were available through JSTOR (JSTOR:SINUM). Because the decisions for these two subscriptions are separate, and because “all the years for each period” are available electronically, the raw download statistics should be used with no half-life corrections. A 2004 subscription to SINUM is $508, and the cost of JSTOR:SINUM is about $25. During 2004, the raw downloads were sixty-nine and sixty-six, respectively. Here are the results of the analysis:

• The cost per download was $7.36 for SINUM and $0.38 for JSTOR:SINUM.
• If JSTOR:SINUM were not available, the figures for SINUM would be adjusted, resulting in a cost of $3.39 per download, as indicated in table 5.

TABLE 5
First Adjustment to the Sample Set of Journals

| Rank | Adj. Rank | Title | 2004 Cost | 2004 DL | 2004 ER (years) | 2003 HL (years) | AF | FC | Adj. CPU |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Nature Cell Biology | $899 | 348 | 6 | 1.7 | 1.1 | 381.0 | $2.36 |
| 2 | 2 | SIAM Journal on Numerical Analysis | $508 | 69 | 8 | 9* | 2.2 | 150.0 | $3.39 |
| 3 | 3 | Evolution and Human Behavior | $808 | 109 | 8 | 3.0 | 1.2 | 129.4 | $6.25 |
| 14 | 4 | Communications in Partial Differential Equations | $1,995 | 38 | 4 | 8.5 | 3.6 | 136.5 | $14.61 |
| 9 | 5 | Physics Reports | $5,599 | 153 | 7 | 7.7 | 2.1 | 327.3 | $17.11 |
| 5 | 6 | Accounting, Organizations & Society | $1,633 | 50 | 10 | 9* | 1.9 | 93.1 | $17.54 |
| 4 | 7 | Library & Information Science Research | $315 | 13 | 10 | 5.3 | 1.4 | 17.8 | $17.68 |
| 6 | 8 | Acta Psychologica | $936 | 28 | 10 | 9* | 1.9 | 52.1 | $17.95 |
| 12 | 9 | Earth Science Reviews | $1,334 | 33 | 10 | 9* | 1.9 | 61.4 | $21.71 |
| 13 | 10 | Mathematische Annalen | $2,760 | 60 | 9 | 9* | 2.0 | 120.0 | $23.00 |
| 7 | 11 | Probabilistic Engineering Mechanics | $832 | 24 | 10 | 6.1 | 1.5 | 35.3 | $23.54 |
| 8 | 12 | International Journal of Industrial Organization | $1,169 | 32 | 10 | 6.6 | 1.5 | 49.2 | $23.75 |
| 10 | 13 | Immunology Letters | $2,734 | 69 | 10 | 4.4 | 1.3 | 87.0 | $31.42 |
| 11 | 14 | Journal of Molecular Structure: Theochem | $7,633 | 192 | 10 | 4.3 | 1.2 | 239.8 | $31.82 |
| 16 | 15 | Technological Forecasting & Social Change | $839 | 4 | 10 | 8.6 | 1.8 | 7.2 | $116.07 |
| 15 | 16 | Poetics | $433 | 3 | 10 | n/a | | | $144.33 |
| 17 | 17 | Journal of Logic and Algebraic Programming | $923 | 1 | 10 | n/a | | | $923.00 |

* ISI half-life > 10

6. What should be done with an ISI half-life of “> 10” years?

For a journal with an uncorrected ISI half-life of at least ten years, ISI simply gives > 10 as the half-life. When ISI’s data are corrected, the journal has a half-life > 9 years. Can this be calculated more accurately? For these cases, a least-squares fitted exponential curve can be employed to compute a more accurate estimate of the half-life. The procedure would be as follows: Let $C_i$ denote the cumulative fraction of the current year’s (for us, 2003) citations to the journal that go to its volumes from the current year plus the i years before that. Find the value of HL such that

$$\sum_{i=0}^{9}\left[C_i - \left(1 - \left(\tfrac{1}{2}\right)^{i/HL}\right)\right]^2$$

is minimized (e.g., using the solver function in Excel™). The citation data were adjusted so that 0 percent was the prediction for cites from year zero (i.e., cites in 2003 journals to the 2003 volume of the journal in question). For long half-life journals this is close to true, because the fraction $C_0$ is typically less than 0.5 percent. More accurate computations of the > 10 half-lives using least-squares minimization could give valuable guidance in contested cases.
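The least-squares procedure above can be sketched in Python. This is a minimal sketch, not the authors' spreadsheet: a golden-section search (a swapped-in technique standing in for Excel's solver) looks for the HL that minimizes the sum of squares, assuming a single minimum within a bracket of 0.5 to 200 years. The sketch also computes the cost per adjusted download and, in reverse, the half-life a journal would need to reach a CPU cutoff (the goal-seek step described under question 7). For those two steps it assumes an adjustment factor AF = 1 / (1 - (1/2)^(ER/HL)), with FC = DL × AF and Adj. CPU = cost / FC; that relation is inferred from, and reproduces, the AF, FC, and Adj. CPU columns of tables 5 and 6 rather than quoted from the article. All function names are hypothetical.

```python
import math
from typing import Sequence


def predicted_cumulative(i: int, hl: float) -> float:
    """Model value for C_i: 1 - (1/2)**(i/HL); it is 0 for year zero (i = 0)."""
    return 1.0 - 0.5 ** (i / hl)


def sum_of_squares(c: Sequence[float], hl: float) -> float:
    """Sum over i = 0..len(c)-1 of (C_i - model value)**2."""
    return sum((ci - predicted_cumulative(i, hl)) ** 2 for i, ci in enumerate(c))


def fit_half_life(c: Sequence[float], lo: float = 0.5, hi: float = 200.0,
                  tol: float = 1e-6) -> float:
    """Golden-section search for the HL minimizing sum_of_squares on [lo, hi].

    Assumes the objective has a single minimum inside the bracket.
    """
    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    x1, x2 = b - r * (b - a), a + r * (b - a)
    f1, f2 = sum_of_squares(c, x1), sum_of_squares(c, x2)
    while b - a > tol:
        if f1 < f2:  # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r * (b - a)
            f1 = sum_of_squares(c, x1)
        else:        # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r * (b - a)
            f2 = sum_of_squares(c, x2)
    return (a + b) / 2.0


def adjusted_cpu(cost: float, downloads: float, er_years: float, hl: float) -> float:
    """Cost per adjusted download, assuming AF = 1 / (1 - (1/2)**(ER/HL))."""
    af = 1.0 / (1.0 - 0.5 ** (er_years / hl))
    return cost / (downloads * af)


def required_half_life(cost: float, downloads: float, er_years: float,
                       cutoff: float) -> float:
    """Closed-form 'goal seek': the HL at which adjusted_cpu equals cutoff.

    Solving cost * (1 - (1/2)**(ER/HL)) / downloads = cutoff for HL gives
    HL = -ER / log2(1 - cutoff * downloads / cost).
    """
    share = cutoff * downloads / cost
    if share >= 1.0:
        return 0.0  # raw CPU already meets the cutoff; any half-life will do
    return -er_years / math.log2(1.0 - share)


# Synthetic C_0..C_9 generated from the model itself with HL = 13.7 (not ISI data)
synthetic = [predicted_cumulative(i, 13.7) for i in range(10)]
print(f"{fit_half_life(synthetic):.2f}")              # 13.70
print(f"{adjusted_cpu(2760, 60, 9, 9.0):.2f}")        # Math. Ann., table 5: 23.00
print(f"{adjusted_cpu(2760, 60, 9, 13.7):.2f}")       # Math. Ann., table 6: 16.83
print(f"{required_half_life(433, 3, 10, 40):.1f}")    # Poetics: 21.4
print(f"{required_half_life(923, 1, 10, 40):.0f}")    # J. Logic & Alg. Prog.: 156
```

With real data, `c` would hold a journal's cumulative citation fractions for the current year back through nine years earlier, and the fitted HL would take the place of a "9*" entry before the adjusted CPU is computed. Because the model has a single parameter, the reverse calculation can be solved directly instead of by iterative goal seeking.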
However, for journals with half-lives greater than four or five years but less than ten, least-squares minimization tends to give a half-life slightly less than the half-life given by ISI. For journals with half-lives less than four years, least-squares minimization should not be used. (For the sample set’s results employing least-squares minimization, see table 6.)

TABLE 6
Final Adjustment to the Sample Set of Journals

| Rank | Adj. Rank | LS Adj. Rank | Title | 2004 Cost | 2004 DL | 2004 ER (years) | 2003 LS HL (years) | AF | FC | Adj. CPU |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | Nature Cell Biology | $899 | 348 | 6 | 1.7 | 1.1 | 381.0 | $2.36 |
| 2 | 2 | 2 | SIAM Journal on Numerical Analysis | $508 | 69 | 8 | 11.6 | 2.6 | 181.5 | $2.80 |
| 3 | 3 | 3 | Evolution and Human Behavior | $808 | 109 | 8 | 3.0 | 1.2 | 129.4 | $6.25 |
| 5 | 6 | 4 | Accounting, Organizations & Society | $1,633 | 50 | 10 | 15.3 | 2.7 | 136.9 | $11.93 |
| 14 | 4 | 5 | Communications in Partial Differential Equations | $1,995 | 38 | 4 | 8.5 | 3.6 | 136.5 | $14.61 |
| 6 | 8 | 6 | Acta Psychologica | $936 | 28 | 10 | 10.8 | 2.1 | 59.2 | $15.82 |
| 13 | 10 | 7 | Mathematische Annalen | $2,760 | 60 | 9 | 13.7 | 2.7 | 164.0 | $16.83 |
| 9 | 5 | 8 | Physics Reports | $5,599 | 153 | 7 | 7.7 | 2.1 | 327.3 | $17.11 |
| 4 | 7 | 9 | Library & Information Science Research | $315 | 13 | 10 | 5.3 | 1.4 | 17.8 | $17.68 |
| 12 | 9 | 10 | Earth Science Reviews | $1,334 | 33 | 10 | 10.7 | 2.1 | 69.1 | $19.31 |
| 7 | 11 | 11 | Probabilistic Engineering Mechanics | $832 | 24 | 10 | 6.1 | 1.5 | 35.3 | $23.54 |
| 8 | 12 | 12 | International Journal of Industrial Organization | $1,169 | 32 | 10 | 6.6 | 1.5 | 49.2 | $23.75 |
| 10 | 13 | 13 | Immunology Letters | $2,734 | 69 | 10 | 4.4 | 1.3 | 87.0 | $31.42 |
| 11 | 14 | 14 | Journal of Molecular Structure: Theochem | $7,633 | 192 | 10 | 4.3 | 1.2 | 239.8 | $31.82 |
| 16 | 15 | 15 | Technological Forecasting & Social Change | $839 | 4 | 10 | 8.6 | 1.8 | 7.2 | $116.07 |
| 15 | 16 | 16 | Poetics | $433 | 3 | 10 | n/a | | | $144.33 |
| 17 | 17 | 17 | Journal of Logic and Algebraic Programming | $923 | 1 | 10 | n/a | | | $923.00 |

7. What if the half-life is unavailable from ISI?

JCR does not provide cited half-lives for journals cited fewer than one hundred times, nor does it provide half-lives for every journal published (for example, new journals). As noted above, of the 327 mathematics and applied mathematics journals listed in ISI, 11 percent (36 journals) have no half-life given. Of the 261 biochemistry and molecular biology journals listed in ISI, 1 percent (3 journals) have no half-life given. The analysis uncovered the fact that nearly 50 percent of the journals across all subjects did not have half-life data available.

If ISI has data on a particular journal (even if it has been cited fewer than one hundred times), it might be possible to estimate the half-life as was done above for those with > 10 years. Another approach is simply to use a figure of 9.0 years, across all subjects, for every journal without a half-life (i.e., an ISI half-life of 10.0 minus the one-year correction). If journals having a half-life of > 10 were left with a corrected half-life of 9.0, this estimate gives journals without half-lives the maximum benefit.

For borderline cases, the model could be employed in reverse using the goal-seek function in Excel™. If there were a particular CPU cutoff, say, $40 per download, one could calculate what the half-life would need to be for a particular journal to make that cutoff. In table 5, Poetics would need a half-life of approximately twenty-one years to make the $40 CPU cutoff. This might be reasonable. The Journal of Logic and Algebraic Programming, however, would need a half-life of 156 years. This is clearly unreasonable, and the journal should be a candidate for cancellation.

8. How is use from electronic downloads converted to expected document delivery requests?
One of the major problems with analyzing use statistics is estimating the conversion factors between downloads, print uses, and commercial document delivery requests. Would a journal used five times electronically be used only once in paper? Would a print volume used once result in one commercial document delivery request? Does it matter whether the docdel is mediated or unmediated? Estimating this conversion factor is critical when determining whether a subscription (print or electronic) is more cost-effective than docdel. Unfortunately, there appears to have been no previous published research in this area.

During the project, the plan was to use a conversion factor of five downloads to one mediated docdel request. Conversion factors for print use were not estimated. The download-to-docdel figure was based on docdel requests from before the library had electronic access to journals, and on a sense that patrons would request fewer articles if they had to ask for them (rather than clicking on a hyperlink).

Final Remarks

For the sample set, the average cost of docdel for the University of Notre Dame, roughly $35 per article, was used. An estimated internal processing cost of $5 was added, and the cutoff was set at $40 per article. Journals that cost more than $40 per adjusted, converted use would be candidates for cancellation. In tables 4, 5, and 6, journals are ranked from lowest to highest CPU. In tables 5 and 6, previous rankings for the journals are included. Even the results in table 5 are better than those in table 4. No model is perfect, but the half-life model fits the citation data surprisingly well.

The goal was to improve the method used to evaluate the raw download figures. Indeed, the proposed model will still undercount the same areas that the raw download figures undercount, but the undercounting will be proportional across disciplines and less severe. When evaluating whether approximations are “acceptable,” it is important to keep the goal in mind.
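The cancellation rule in these remarks can be sketched as follows. This is a minimal sketch under stated assumptions: it applies the planned five-downloads-to-one-request conversion to a journal's adjusted download figure (FC) and compares the resulting cost per converted use with the $40 cutoff ($35 average docdel cost plus $5 internal processing). The function names and the example journals are hypothetical; the article does not report this calculation for its sample set.

```python
DOCDEL_COST = 35.0       # average Notre Dame docdel cost per article (from the text)
PROCESSING_COST = 5.0    # estimated internal processing cost (from the text)
CUTOFF = DOCDEL_COST + PROCESSING_COST  # $40 per converted use
DOWNLOADS_PER_REQUEST = 5.0  # planned: 5 downloads ~ 1 mediated docdel request


def cost_per_converted_use(cost: float, adjusted_downloads: float,
                           downloads_per_request: float = DOWNLOADS_PER_REQUEST) -> float:
    """Subscription cost divided by the docdel requests the downloads are assumed to replace."""
    expected_requests = adjusted_downloads / downloads_per_request
    return cost / expected_requests


def is_cancellation_candidate(cost: float, adjusted_downloads: float) -> bool:
    """True when the subscription costs more per converted use than the cutoff."""
    return cost_per_converted_use(cost, adjusted_downloads) > CUTOFF


# Hypothetical journal: $1,000 subscription, 200 adjusted downloads -> 40 expected requests
print(cost_per_converted_use(1000, 200))      # 25.0
print(is_cancellation_candidate(1000, 200))   # False
```

Keeping the conversion factor as a parameter makes it easy to rerun the comparison with a different download-to-request ratio, which matters because, as noted above, no published research appears to support any particular value.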
The goal is a reduction in undercounting, and undercounting is much more severe for long half-life journals with only short electronic runs available. For example, consider a journal with a very long half-life, Mathematische Annalen. According to JCR, this respected mathematics journal has a half-life of > 10 years, while its electronic run is nine years. The ISI database does not give the exact half-life; however, only 24.2 percent of citations are to articles published between 1994 and 2003. Using least-squares minimization, the “ISI half-life” is 23.2 years. Math. Ann. moves from a $46 CPU in table 4 to a $23 CPU in table 5 and to a $17 CPU in table 6.

As time passes, the electronic runs of journals will lengthen, and there will be sufficient year-to-year raw download figures to make reasonable extrapolations. Thus, the model for adjusting downloads that this article proposes will become less urgent.

Notes

1. Marisa Scigliano, “Serial Use in a Small Academic Library: Determining Cost-effectiveness,” Serials Review 26 (2000): 43–52.
2. JCR Glossary. Available online at http://jcr4.isiknowledge.com/www/help/hjcrgls2.htm. [Accessed 27 February 2005].
3. Lawrence A. Coleman, “Exponential Growth and Decay,” Macmillan Encyclopedia of Physics, vol. 2 (New York: Simon and Schuster Macmillan, 1996), 533.
4. Endre Száva-Kováts, “Unfounded Attribution of the ‘Half-Life’ Index-Number of Literature Obsolescence to Burton and Kebler: A Literature Science Study,” Journal of the American Society for Information Science 53 (Nov. 2002): 1098–1105.
5. R. E. Burton and B. A. Green Jr., “Technical Reports in Physics Literature,” Physics Today (Oct. 1961): 35–37. “While the phrase ‘literature half-life’ has been applied to this figure, it more properly should be referred to as the median age.”
6. Helmut M. Artus, “‘Halbwertzeit wissenschaftlicher Literatur’—Naturgesetz oder Forschungsartefakt?” Nachrichten für Dokumentation 34 (Apr. 1983): 79–86. He also claims: “Ein bloß quantifizierendes Vorgehen reicht nicht aus, um das gleichermaßen kognitive wie soziale Phänomen ‘Literaturnutzung’ in den Griff zu bekommen.” “A purely quantitative procedure is insufficient to come to terms with the cognitive and likewise social phenomenon of ‘literature use.’” (Translated by Robert L. Kusmer, Associate Librarian, University Libraries of Notre Dame.)
7. Leonard Roth, “On the Projective Classification of Surfaces,” Proceedings of the London Mathematical Society, Second Series 42 (1937): 142–70.
8. Andrew J. Sommese, “Hyperplane Sections of Projective Surfaces I—The Adjunction Mapping,” Duke Mathematical Journal 46 (1979): 377–401.
9. Ming-Yueh Tsay, “Library Journal Use and Citation Half-life in Medical Science,” Journal of the American Society for Information Science 49 (Dec. 1998): 1283–92.
10. Ibid., 1286.
11. Diane Schmidt and Elizabeth B. Davis, “Biology Journal Use at an Academic Library: A Comparison of Use Studies,” Serials Review 20 (summer 1994): 45–63.