Issues in Science and Technology Librarianship
Fall 2002
DOI:10.5062/F4057CWP

URLs in this document have been updated. Links enclosed in {curly brackets} have been changed. If a replacement link was located, the new URL was added and the link is active; if a new site could not be identified, the broken link was removed.

[Refereed]

Scholarly Communication: The Use and Non-Use of E-Print Archives for the Dissemination of Scientific Information

Ibironke Lawal
Engineering and Science Librarian
Virginia Commonwealth University
iolawal@vcu.edu

Abstract

This study surveyed a randomly chosen sample from a population of about 240,000 scholars in nine scientific disciplines at private and public colleges and universities across the United States and Canada. The disciplines, grouped into six categories, were physics/astronomy, chemistry, mathematics/computer science, engineering, cognitive science/psychology, and biological sciences. The survey sought to determine the use and non-use of e-print archives in the different disciplines. Results show that 18 percent of the researchers use at least one archive while 82 percent do not use any. Scholars in physics use e-print archives the most and those in chemistry the least. ArXiv receives the most use, and authors' own web sites are among the least used. Reasons for use include rapid dissemination of research results and visibility and exposure for authors. Reasons for non-use include publishers' policies and technology constraints.

Introduction

Scientific journals started in the mid-seventeenth century with Le Journal des Savants and Philosophical Transactions of the Royal Society of London. Their purpose was to communicate the results of laboratory experiments, inventions, and meteorological data in physics, chemistry and anatomy (Primack 1992). As the number of articles increased and the publication process became slower, scientists began circulating drafts of their manuscripts. These preliminary publications were called preprints. Initially, distributing preprints required mailing multiple copies of each manuscript. Distribution became faster with the spread of the facsimile machine in the 1970s (Bellis 2002), and the Internet and e-mail accelerated it further. The invention of the World Wide Web in the early 1990s revolutionized preprint distribution, and the integration of multimedia and graphics added considerable value to preprints. Preprints in digital format are known as e-prints, and the online databases from which they are distributed are called e-print archives. Until recently, not every scientific discipline had an e-print archive. Physics has a long history of using preprints, and in 1991 physicists established ArXiv, an e-print archive at Los Alamos National Laboratory. Other disciplines have since established e-print archives as well.

Previous studies of e-prints include a citation study by Brown (2001) and a survey of the use and non-use of ArXiv and CogPrints conducted by {E-prints.org} in 2000/2001. Brown, using the various archives in ArXiv and the SPIRES-HEP database, examined the rates at which e-prints cite other e-prints. High Energy Physics Experiment (hep-ex) has the highest citation rate, at 14.5%, while Mathematical Physics (math-ph) has the lowest, at 0.95%.

She also used the SciSearch database to analyze the pattern of citations from journal articles to e-prints. Results show that High Energy Physics Theory (hep-th) has the highest citation rate, while Physics has the lowest, at 0.07%. The rate at which e-prints cite e-prints was 20 times greater than the rate at which journal articles cite e-prints.

The Problem

The high prices of science, technical and medical (STM) journals, combined with library budget constraints, often prevent institutions from purchasing needed journals. According to Association of Research Libraries (ARL) statistics, the serial unit cost for an ARL library has increased by an average of 8.8% a year since 1986. This amounts to a cumulative increase of 226% from 1986 to date, while the consumer price index increased 57% over the same period. In real dollars, libraries spent almost three times as much on serials in 2001 as they did in 1986, even though they acquired 7% fewer titles. This phenomenon is more pronounced in the STM fields, where journals are of primary importance (Case 2001). Harnad (2001) calls this 'toll-gating access' to research findings and says it is as counterproductive as 'toll-gating access' to commercial products. Until now, the problem has attracted little attention from researchers themselves.
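
The link between an 8.8% average annual increase and a 226% cumulative increase is compound growth: after n years at an average annual rate r, the total increase is (1 + r)^n - 1. The short Python sketch below is only an illustrative check, and it assumes that ARL's 8.8% figure is a compound (geometric) average rate, which the source does not state. The function names are hypothetical.

```python
def cumulative_increase(annual_rate: float, years: int) -> float:
    """Total fractional increase after compounding `annual_rate` for `years` years."""
    return (1.0 + annual_rate) ** years - 1.0


def implied_annual_rate(total_increase: float, years: int) -> float:
    """Average (geometric) annual rate that produces `total_increase` over `years` years."""
    return (1.0 + total_increase) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Assumption: ARL's 8.8% is a compound average annual rate.
    for years in (14, 15):
        pct = 100 * cumulative_increase(0.088, years)
        print(f"{years} years at 8.8% per year: {pct:.0f}% cumulative increase")
    # Going the other way: the annual rate implied by a 226% rise over 14 years.
    print(f"implied rate: {100 * implied_annual_rate(2.26, 14):.1f}% per year")
```

Under this reading, the 226% figure corresponds to 14 annual increments (for example, 1986 to 2000); compounding the same rate over 15 years would give roughly 254%.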

Publishers of science journals share research results with the media before the academy. Science and Nature give reporters a preview of research articles a week before they appear. The New England Journal of Medicine (NEJM) and The Journal of the American Medical Association (JAMA) send advance copies to reporters. Reporters' e-mail boxes and fax machines fill up with announcements from other journals, universities, and institutes promoting new scientific findings. Most of this information carries a warning label: EMBARGOED. Public use of the information is forbidden until a specified date and hour, chosen to coincide with the journal's publication date (Marshall 1998). The practice has persisted for years because it benefits both publishers and journalists. Science is supposed to progress through rapid communication of results among scientists, but the embargo system is a barrier to this free exchange of information. Publishers understandably do not want to feed the public incomplete or inaccurate information, but other scientists in the academy would welcome the same privilege that is extended to the media. Even with the Internet and the World Wide Web, there is still an uneasy alliance between publishers and journalists to keep information from the public. In addition, the regular publishing channel is too slow for today's fast-paced flow of information.

In an attempt to free the literature from these impediments, scientists and scholars began to seek alternatives, instituting reforms and establishing free e-print archives in which authors need only deposit their articles. In March 2000, the Association of American Universities (AAU), the Association of Research Libraries and the Merrill Advanced Studies Center of the University of Kansas sponsored a meeting in Tempe, Arizona to formulate principles that could help transform the scholarly publishing system. The participants came up with nine principles. The first states: 'the cost to the academy of published research should be contained so that access to relevant research publications for faculty and students can be maintained and even expanded. Members of the university community should collaborate to develop strategies that further this end. Faculty participation is essential to the success of the process' (ARL 2000). Fortunately, improvements in technology can foster the easy and wide distribution of research results and papers to anyone, anywhere.

Varmus (Marshall 1999) proposed that PubMed Central would make research literature in biomedicine, plant and agricultural science widely available. Eisen and Brown (2001), in proposing the Public Library of Science (PLoS), argued that scientific progress and public welfare would be much better served by a scientific literature that belongs to the public and is accessible and usable by anyone, anywhere, without barriers, charges or restrictions. The PLoS initiative was started by a group of concerned scholars who circulated an open letter urging scholars to publish in, edit or review for, and personally subscribe to only those scholarly and scientific journals that agreed to grant unrestricted distribution rights to any and all original research reports they had published, through PubMed Central or similar online public resources, within 6 months of the original publication date. As of October 2002, 30,798 scientists from 182 countries had signed the letter. In response to dysfunction in the scholarly communication system, ARL formed the Scholarly Publishing and Academic Resources Coalition (SPARC), which seeks to bring high-quality research to a greater audience. Buckholtz (2001) describes SPARC's initiatives, which focus on researchers declaring independence and on restoring competition to the scientific journal marketplace. Mellman (2001), editor of the Journal of Cell Biology, advocates removing barriers to the free exchange of scientific information. Case (2001) points out that it is the scientists who will have to decide how they want their work to be made available. Thus the building of e-print archives or servers and other initiatives in various scientific disciplines, which began in the late 1990s, escalated at the turn of the millennium. According to the Office of Scientific and Technical Information (OSTI) of the United States Department of Energy (DOE), owner of the Preprint Network site, there are about 7,000 scientific and technical preprint servers around the world (Warnick 2001).

Publishers became apprehensive about these initiatives and have relied on restrictive policies. Under the editorship of Franz Ingelfinger, NEJM adopted a policy, first announced in 1969, of declining to referee or publish research that had previously been published or publicized elsewhere. Other biomedical and broad-spectrum journals such as Science and Nature have since adopted this 'Ingelfinger Rule' (Harnad 2000). Karow (2001) reports that publishers also worry that hosting by outside archives will introduce errors into the files, lowering the reliability of the information.

Objectives

This study seeks to determine the use and non-use of e-print archives across scientific disciplines. Since this is the first study of its kind, the desired outcomes are: 1) to gather data that will give insight into the research culture and scholarly communication process in each discipline; 2) to collect data that could be used to remove barriers to the easy exchange of information; and 3) to provide research methods that could be used in future studies.

Methods

The survey population consists of researchers and scholars in colleges and universities across the United States and Canada. According to National Science Board statistics, 240,200 doctoral scientists and engineers were employed in academia in 1999 (National Science Board 2002). The disciplines chosen for study were Chemistry, Biological Sciences, Engineering, Cognitive Science and Psychology, Mathematics and Computer Science, and Physics and Astronomy. Physics and Astronomy, Mathematics and Computer Science, and Cognitive Science and Psychology were combined in accordance with most faculty lists. The sample size of 473 was calculated from the total population using a 95% confidence level and a confidence interval (margin of error) of ±4.5%, with software designed by Creative Research Systems of Petaluma, California. Respondents were randomly chosen from institutions' web directories, and the same number of respondents was chosen for each discipline. This may be a limitation, since some fields have more scholars than others. The randomization does not take into account the status of the respondents: each person was selected regardless of whether he or she is tenured, tenure track, non-tenured or visiting faculty. Professors emeriti were excluded because they may no longer be actively involved in the research process.

The survey instrument was a web questionnaire. Four e-mail messages were sent to each potential respondent. The first was an introduction and invitation to participate; the second contained a waiver of written consent form and the URL of the questionnaire. The third message was a reminder and the fourth a letter of thanks.
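
The reported sample size can be approximately reproduced with Cochran's formula for estimating a proportion, n0 = z²p(1 - p)/e², followed by the finite population correction n = n0 / (1 + (n0 - 1)/N). The source does not say which formula the Creative Research Systems software applies, so the Python sketch below is an assumption-laden check rather than the author's procedure: it assumes p = 0.5 (maximum variability), a two-sided z-value for 95% confidence, and e = 0.045. The function names are hypothetical. With N = 240,200 it yields about 473.3, which rounds to the reported 473.

```python
import math
from statistics import NormalDist


def z_value(confidence: float) -> float:
    """Two-sided critical value of the standard normal (about 1.96 for 95%)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size(population: int, confidence: float = 0.95,
                margin: float = 0.045, p: float = 0.5) -> float:
    """Sample size for estimating a proportion, with finite population correction.

    Assumes Cochran's formula; p = 0.5 is the most conservative choice.
    """
    z = z_value(confidence)
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # size for an infinite population
    return n0 / (1 + (n0 - 1) / population)   # finite population correction


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Margin of error for a proportion estimated from n responses (no population correction)."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    n = sample_size(240_200)
    print(f"required sample: {n:.1f}, rounded to {round(n)}")
    # Hypothetical even split of 473 across the six discipline groups (~79 each).
    print(f"margin per group of 79: +/-{100 * margin_of_error(79):.1f} points")
```

Because the population is so large, the correction removes less than one respondent; the chosen margin of error drives the result. Running the calculation in reverse shows that if the 473 were divided evenly among the six discipline groups (about 79 each, a hypothetical split), each group's estimate would carry a margin of roughly ±11 percentage points, so within-discipline percentages are considerably less precise than the overall figures.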

Results

The demographics of respondents were as follows: 82.2% were from research universities, 12.7% from four-year colleges and 8.5% from liberal arts colleges. Respondents from Physics and Astronomy represented 20.3%, 17.7% were from Biological Sciences, 17.1% from Engineering, 17.1% from Mathematics and Computer Science, 16.4% from Chemistry and 9.5% from Cognitive Science and Psychology. Sixty-nine percent were tenured, 22.4% were tenure track and 8.4% were non-tenured.

Eighteen percent of the respondents use e-print archives and 82% do not. Of those who use them, 54.2% were in Physics/Astronomy, 27.7% in Mathematics/Computer Science, 7.4% in Engineering, 3.7% in Biological Sciences and 1.85% in Cognitive Science/Psychology. All respondents who use e-print archives also search them, but only 90.7% cite e-prints in their articles; 9.3% do not. Table 1 shows the percentages of use and non-use of e-print archives within each discipline. The archives used are listed in Table 2.

Table 1: Percent Use of E-Print Archives by Individual Discipline
Discipline                        Use (%)    Non-use (%)
Physics/Astronomy                 51.6       48.3
Mathematics/Computer Science      28.8       71.1
Engineering                       7.4        92.3
Cognitive Science/Psychology      6.8        93.1
Biological Sciences               3.7        96.2
Chemistry                         0          100

Table 2: Archives Used and Percent Use
Archive                     Use (%)
ArXiv                       77.7
PubMed                      3.7
Google                      1.85
Own web site                1.85
CiteSeer                    1.85
Interior Point Archive      1.85
Hopf                        1.85
Not specified               5.5

A comparison of reasons for non-use by discipline (Figure 1) shows that Chemistry has the highest percentage of non-use attributed to publishers' policies. A large number of respondents in all disciplines felt that e-print archives were not relevant to them, while a relatively small number named technology constraints as a barrier to use. Would use increase if the barriers were removed? Respondents' answers (Figure 2) indicate that use would increase by 62.5% in Engineering, 59.2% in Cognitive Science/Psychology, 44.2% in Biological Sciences, 40% in Physics/Astronomy, 35.1% in Mathematics/Computer Science and 32% in Chemistry.

Figure 1: Reasons for Non-Use of E-Print Archives

Figure 2: Change in Use of E-Print Archives if Barriers Were Removed

Patterns of use also differ among the disciplines. Table 3 shows the percentage of respondents in each discipline who post articles to the web before publication, after publication, or after acceptance for publication. Table 4 shows that, regardless of discipline, most respondents who post preprints later publish them as articles. The disciplines with the highest percentage of respondents who post preprints also show the highest percentage of respondents who formally publish those articles.

Table 3: Pattern of Use by Discipline (% of respondents who post articles to the web)
Discipline                        Before publication    After publication    After acceptance
Physics/Astronomy                 81.25                 17.5                 0
Engineering                       50                    50                   0
Mathematics/Computer Science      93.3                  6.6                  0
Cognitive Science/Psychology      100                   0                    0
Biological Sciences               50                    0                    50

Table 4: Comparison of Publishing with Pattern of Use (%)
Discipline                        Post before publication    Of these, later publish their articles
Physics/Astronomy                 81.25                      84.3
Engineering                       50                         75
Mathematics/Computer Science      93.3                       93.3
Cognitive Science/Psychology      100                        100

Discussion

Seventy-two percent of respondents who use e-print archives said they do so for rapid and wide dissemination of information, and fourteen percent said they do so for visibility and exposure. Though these respondents see the benefits of using e-prints, the majority of scholars surveyed still do not take advantage of this medium for disseminating research findings. The nature of the literature and the information-seeking behavior in each discipline may shed some light on the use and patterns of use of e-print archives in that discipline. In chemistry, serial literature is the most important medium of communication, and fast access to current literature ranks first among chemists' information needs. Since chemical information is of fundamental importance to other scientific disciplines such as biochemistry, genetics, medicine, pharmacy and environmental science, it should be widely disseminated. Why, then, do chemists not use e-print archives for rapid and wide dissemination of chemical information? The chemistry respondents take the 'Ingelfinger Rule' seriously because the primary chemistry publishers strictly apply it. Another drawback for chemistry is that the only preprint server available is relatively new; it is owned by the commercial publisher Elsevier and powered by {ChemWeb.com}. Other considerations include the way chemists work and the nature of chemical information. Chemists generally work in small groups and are involved in every aspect of an investigation, so publications in the field do not involve large numbers of authors. Chemical information is distinctive in that it deals with atomic and molecular species, which are precisely and unambiguously defined by their molecular structure. Since physical and chemical properties do not change over time, older literature is as essential as current literature (Gould & Pearce 1991). Patent literature is also vital to research in chemistry, and patents are sometimes the only source of particular chemical information. The prospect of patenting a research finding may discourage researchers from placing that information in the public domain before a patent is applied for and granted. A significant number of chemists do not think that e-print archives are relevant to their field.

The literatures of astronomy and physics are so intertwined that it is sometimes difficult to separate them. Astronomy is data-dependent, and collected data do not change over the years. Preprints in radio astronomy date back to 1978, and a considerable number of astronomy articles appear in the physics e-print archives. The literature of physics is found largely in journal publications, and physics has the best-organized literature in the sciences (Gould & Pearce 1991). Physicists were the first to establish a preprint server, whose web version is now called ArXiv. There is a sharp difference between the information-seeking behavior of theoretical and experimental physicists. Theoreticians depend on the work of their predecessors, and the information most important to them is often too recent to have been published; hence they use e-print archives. Experimentalists are more concerned with the way in which experimental procedures are carried out. Experiments in high-energy physics are very expensive, and physicists often cannot wait for formal publication. High-energy physicists have depended on preprints for a long time. According to Brown (2001), 'the e-prints from the four high energy particle archives receive the highest number of citations by both e-prints and journal articles'. Preprints are most valued in physics because they provide an instantaneous publication channel. Physics is also collaborative in nature; it is not unusual to find a physics paper with over one hundred authors. These factors, together with a long-established e-print archive, explain why physicists make the most use of e-print archives.

Chemistry and physics lie at opposite ends of a continuum, with the other disciplines falling in between. Because biology is diverse, approaches to research vary, and because the field relies on experimentation and observation, biologists depend heavily on reports of previous research in the periodical literature (Gould & Pearce 1991). The exponential growth of the periodical literature in biology makes it difficult to keep current. Also, publishing research findings through the traditional print medium takes up to eight months. Since collaboration is fundamental to biological research, biologists explore other avenues to disseminate information. They use symposia such as those at Cold Spring Harbor, conferences, informal networks such as the Drosophila Information Service, and electronic newsletters. Only a small fraction of biologists use e-print archives, mainly because they do not consider the existing archives relevant to their work. Many more e-print archives, covering the different narrow sub-fields, would be needed to make a significant difference.

Questions asked in engineering generally focus on understanding what is happening in a given system. This requires knowledge of general scientific principles drawn mainly from physics, mathematics and chemistry. There are no e-print archives devoted solely to engineering; engineers use the physics preprint archive and publish in physics, mathematics and computer science journals. Patent literature is important in engineering, but technical reports are the mainstay of research in many engineering sub-fields, and engineers of all types use standards information. As a discipline, engineering is not optimally compatible with e-print archives, so it is not surprising that engineering has the highest share of 'not relevant' responses of all the disciplines, at 47.9%.

Mathematics literature retains its value over a long period of time, and mathematicians frequently make use of the core literature. As in other scientific disciplines, communication with other researchers is vital to mathematics researchers, and preprints are the most important medium for consultation among scholars. Use of e-print archives in mathematics is second only to physics. A significant number of publications necessary for research in mathematics come from other countries in languages other than English; English versions are obtained through a number of translation publishers. Mathematics publishers are the most liberal in applying the 'Ingelfinger Rule'. Unlike mathematics, computer science depends on recent literature, with the oldest technical reports dating to the early 1950s (Gould & Pearce 1991). The main body of the literature consists largely of technical reports, and computer scientists depend heavily on conference proceedings for scholarly communication.

Psychologists and cognitive scientists depend heavily on journal literature, and the use of computerized information systems to identify information ranked very low (Folster 1995). It is therefore not surprising to see low use of e-print archives and a relatively high percentage of respondents naming technology constraints as a reason for non-use.

Conclusions

The survey establishes that respondents perceive e-print archives mainly as a means of rapid and wide dissemination of information, which is necessary where the peer review process and regular publication take too long. Not all disciplines have taken up e-print archives to the same degree, partly because of the culture of information use in each discipline and partly because of low awareness. Self-archiving initiatives might gain ground as every discipline becomes aware of their potential value for the rapid and wide exchange of scientific information, fostering scholarly communication. Future studies could look into whether the situation in each discipline changes as the use of e-print archives matures.

References

Association of Research Libraries (ARL). 2000. Principles for emerging systems of scholarly publishing. ARL Newsletter. [Online]. Available: {http://www.arl.org/resources/pubs/tempe/index.shtml} [November 8, 2002].

Bellis, Mary. 2002. Fax, fax machine and facsimile invention. [Online]. Available: http://inventors.about.com/library/inventors/blfax.htm [November 8, 2002].

Brown, Cecelia. 2001. The coming of age of e-prints in the literature of physics. Issues in Science and Technology Librarianship 31. [Online]. Available: http://www.istl.org/01-summer/refereed.html [November 8, 2002].

Buckholtz, Alison. 2001. Returning scientific publishing to scientists. Journal of Electronic Publishing 7(1). [Online]. Available: http://www.press.umich.edu/jep/07-01/buckholtz.html [November 8, 2002].

Case, Mary M. 2001. The impact of serial costs on library collections. ARL Newsletter (218):9. [Online]. Available: {http://www.arl.org/bm~doc/costimpact.pdf} [November 8, 2002].

Eisen, Michael and Pat Brown. 2001. Should scientific literature be privately owned and controlled? Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/Eisen.htm [November 8, 2002].

Folster, Mary B. 1995. Information seeking patterns: social sciences. Reference Librarian 49/50:83-93.

Gould, Constance C. and Karla Pearce. 1991. Information Needs in the Sciences: An Assessment. Mountain View, California: Research Libraries Group.

Harnad, S. 2000. Ingelfinger over-ruled: The role of the web in the future of refereed medical journal publishing. Lancet Perspectives 256 (December supplement: s16). [Online]. Available: {http://www.ecs.soton.ac.uk/~harnad/Papers/Harnad/harnad00.lancet.htm} [November 8, 2002].

________. 2001. The self-archiving initiative. Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/harnad.html [November 8, 2002].

Karow, Julia. 2001. Publish free or perish. Scientific American April 23. [Online]. Available: {http://www.sciam.com/article.cfm?id=publish-free-or-perish&catID=4&pageNumber=1} [November 8, 2002].

Marshall, Eliot. 1998. Good, bad, or necessary evil? Science 282 (5390): 860-867.

________. 1999. Varmus defends E-biomed proposal, prepares to push ahead. Science 284 (5423): 2062-2063.

Mellman, Ira. 2001. Setting logical priorities. Nature webdebates. [Online]. Available: {http://www.nature.com/nature/debates/e-access/Articles/mellman.html} [November 8, 2002].

National Science Board. 2002. Doctoral scientists and engineers in academia. In Science and Engineering Indicators. Washington D.C.: National Science Board. [Online]. Available: {https://wayback.archive-it.org/5902/20150818085944/http://www.nsf.gov/statistics/seind02/c5/c5s2.htm} [November 8, 2002].

Primack, Alice Lefler. 1992. Journal literature of the Physical Sciences: A Manual. Metuchen, N. J.: Scarecrow Press.

Warnick, Walter. 2001. Tailoring access to the source: preprints, grey literature and journal articles. Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/warnick.html [November 8, 2002].

Further Reading

Bachrach, S. R. et al. 1998. Who should own scientific papers? Science 281(5382): 1459-60.

Brody, T. 2000. Mining the social life of an e-print archive. [Online]. Available: {http://opcit.eprints.org/ijh198/} [November 8, 2002].

Brophy, Peter. 2001. The Library in the Twenty-First Century. London: Library Association Publishing.

Campbell, Robert. 2001. Information access: what is to be done? Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/campbell.html [November 8, 2002].

E-Print survey: what do authors think? British Medical Journal. [Online]. Available: {http://bmj.bmjjournals.com/cgi/content/full/319/7202/DC1} [November 8, 2002].

Eysenbach, Gunther. 2000. The impact of preprint servers and electronic publishing on biomedical research. Current Opinion in Immunology 12: 499-503.

Haank, Derk. 2001. Content and context in one service, tailored to meet the needs of scientists. Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/hank.html [November 8, 2002].

Harnad, S. 1991. Post-Gutenberg galaxy: the fourth revolution in the means of production of knowledge. Public-Access Computer Systems Review 2(1): 39-53. [Online]. Available: {http://epress.lib.uh.edu/pr/v2/n1/harnad.2n1} [November 8, 2002].

Kling, Rob and Geoffrey McKim. 1999. Scholarly communication and the continuum of electronic publishing. Journal of the American Society for Information Science 50(10): 890-906.

Lawrence, Steve. 2001. Free online availability substantially increases a paper's impact. Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/lawrence.html [November 8, 2002].

Luzi, Daniela. 1998. E-Print archives: A new communication pattern for grey literature. Interlending & Document Supply 25(3): 130-39.

Marshall, Eliot. 1999. E-biomed morphs to e-biosci, focus shifts to reviewed papers. Science 285(5429): 810-11.

Odlyzko, Andrew. 2000. The future of scientific communication. In Access to Publicly Financed Research: The Global Research Village III, edited by P. Schroeder and P. Wouters, Amsterdam: NIWI, 273-78.

________. The public library of science and the ongoing revolution in scholarly communication. Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/odlyzko.html [November 8, 2002].

Okerson, Ann. 2001. What price 'free?' Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/okerson.html [November 8, 2002].

O'Reilly, Tim. 2001. Information wants to be valuable. Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/oreilly.html [November 8, 2002].

Public Library of Science. 2001. PLoS Open Letter. [Online]. Available: {http://www.plos.org/support/openletter.shtml} [November 8, 2002].

Roberts, Richard J. et al. 2001. Building a "Genbank" of the published literature. Science 291: 2318-19. [Online]. Available: http://www.sciencemag.org/cgi/content/full/291/5512/2318a [November 8, 2002].

Rzepa, Henry S. 2002. Chemistry preprints. Computer Software Reviews 42(3): 767.

Smith, Richard. 1997. Peer review: reform or revolution? British Medical Journal 315: 759-60.

Stallman, Richard. 2001. Science must 'push copyright aside.' Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/stallman.html [November 8, 2002].

Walker, Thomas J. 2001. Authors willing to pay for instant web access. Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/walker.html [November 8, 2002].

Wells, Robert D. and Herbert Tabor. 2001. Position statement by the American Society for Biochemistry and Molecular Biology. Nature webdebates. [Online]. Available: http://www.nature.com/nature/debates/e-access/Articles/asbmn.html [November 8, 2002].

