Issues in Science and Technology Librarianship
Winter 2004
DOI:10.5062/F40C4SQB

[Refereed article]

Database Reviews and Reports

A Comparison of Updating Frequency Between Web of Science and Current Contents Connect

Nancy J. Butkovich
Head, Physical Sciences Library
The Pennsylvania State University
njb2@psu.edu

Helen F. Smith
Agricultural Sciences Librarian
The Pennsylvania State University
hfs1@psu.edu

Claire E. Hoffman
Head Librarian, Abington College Library
The Pennsylvania State University
ceh8@psu.edu

Abstract

The Libraries at the Pennsylvania State University subscribe to the online databases Web of Science and Current Contents Connect. Concern was expressed regarding the great similarity in coverage between them. A comparison of title coverage found that Web of Science was more inclusive than Current Contents Connect across all disciplines. When updating frequency was compared, new science and social science journal issues appeared in both databases the same week approximately three quarters of the time. In the arts and humanities this is true only about half the time but these data are not as conclusive due to the small sample size. Each database has unique features. Web of Science has superior title coverage, while Current Contents Connect updates faster about 25% of the time. Unexpected significant problems were noted with updates to Current Contents Connect regarding timing of the updates and the definition of a "current week." The relative importance of the advantages and disadvantages of the two databases will vary depending on institutional needs.

Introduction

Traditionally, distinctions were made between "indexing and abstracting" sources and "current awareness" publications. The former provided a wealth of access points to the intellectual content of journals, but this came at the expense of speed, since the indexing to the content of any particular article often did not appear for many months after the article was published (Bottle 1979). The advent of electronic publication has resulted in a blurring of the roles of these two types of indexes. Rapid indexing and publishing now appear to be standard for the "indexing and abstracting" sources as well as the "current awareness" publications.

The Institute for Scientific Information (ISI) in Philadelphia, PA, produces publications of both types. Their flagship print publication, Science Citation Index (SCI), falls into the first category. This index, along with its two sister publications, Social Sciences Citation Index (SSCI) and Arts and Humanities Citation Index (AHCI), uses what ISI calls Permuterm Subject Indexing. Keywords within an article title are combined with other keywords in the same title in a rudimentary form of Boolean logic. This allows the user to be more precise in identifying relevant articles in a subject search. Corporate and individual author searching are also available. What makes these publications unique is that they also index the references cited at the end of the source papers. As is typical of indexing publications, these three sacrifice speed of publication for the extra features. The paper AHCI comes out twice a year, SSCI three times a year, and SCI six times a year.

ISI's best-known current awareness publications are a septet known collectively as Current Contents (CC). Each part of this set is devoted to a particular group of subjects: Life Sciences; Agriculture, Biology, and Environmental Sciences; Physical, Chemical and Earth Sciences; Clinical Medicine; Engineering, Computing, and Technology; Social and Behavioral Sciences; and Arts and Humanities. These publications are largely reproductions of journal tables of contents. They have minimal author and title indexing. The science and technology sections are published weekly; other sections are published at least every other week. In the case of the science sections, ISI claims a two-week lag between receiving a journal issue and publishing its information in a CC issue (Institute for Scientific Information 2000).

Current Contents Connect, the electronic version of the print septet, is updated daily; Web of Science, the electronic incarnation of the three citation indexes, is updated every week. This is comparable in frequency to many print current awareness products, including Current Contents. Although separate subscriptions are available to each of the citation indexes and the seven Current Contents editions, the Pennsylvania State University has subscribed to all of them via the Web of Science (WOS) and Current Contents Connect (CCC) services (Table 1).

Table 1. Correspondence of sections between Web of Science and Current Contents Connect.
| Web of Science | Current Contents Connect |
| Science Citation Index | Agriculture, Biology & Environmental Sciences; Clinical Medicine; Engineering, Computing & Technology; Life Sciences; Physical, Chemical & Earth Sciences |
| Social Sciences Citation Index | Social and Behavioral Sciences |
| Arts & Humanities Citation Index | Arts and Humanities |

Anecdotal information suggested that the two databases were more obviously similar to each other than their print predecessors were. Title coverage was compared, and Table 2 shows the results of this comparison. With the exception of one arts and humanities title, all the titles in CCC were also indexed in WOS. The opposite was not true: an average of 10 percent of the titles indexed in WOS were not indexed in CCC. Some titles were indexed in more than one section, causing the total title number for "All Sections" to be less than the sum of the titles from each section.

Table 2. Title coverage in Web of Science and Current Contents Connect.
Percentages have been rounded off to the nearest whole number.
| Description | All Sections | Sciences | Social Sci. | Arts & Hum. |
| Total titles (number) | 8356 | 5829 | 1740 | 1139 |
| Titles in both WOS & CCC | 90% | 86% | 92% | 98% |
| Titles in WOS only | 10% | 15% | 8% | 2% |
| Titles in CCC only | <0.1% | 0% | 0% | <0.1% |

Searching and interface features aside, the other area of interest is the updating frequency of the databases. The purpose of this study was to compare the updating frequency of Current Contents Connect and Web of Science.

Methods

In order to compare the time of indexing for specific titles between the products, random samples were taken from the title lists in proportion to the number of titles in each subject section. The numbers of titles needed were determined using the following formula:

n = N / (1 + Ne^2)

as described by Yamane (1973). In this formula, n represents the total sample size, N is the size of the total population, and e represents the rate of error, which we chose to be 5 percent. This was the set of titles chosen by the random method.
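
As a rough check of the arithmetic, the sketch below evaluates Yamane's formula, read as n = N / (1 + N e^2), for the 8,356 titles reported in Table 2 and then splits the result across the three subject groups. The function names are illustrative only. The allocation step assumes that "in proportion" means proportional to the per-section title counts in Table 2 (which sum to more than 8,356 because some titles are indexed in more than one section) and that each share is simply rounded; the article does not describe that step in detail.

```python
import math


def yamane_sample_size(population: int, error: float = 0.05) -> int:
    """Yamane's formula n = N / (1 + N * e^2), rounded up to a whole title."""
    return math.ceil(population / (1 + population * error ** 2))


def proportional_allocation(total_sample: int, section_sizes: dict) -> dict:
    """Split a sample across sections in proportion to section size.

    Assumption: each share is rounded to the nearest whole title; the
    study's exact allocation procedure is not described in the article.
    """
    population = sum(section_sizes.values())
    return {
        name: round(total_sample * size / population)
        for name, size in section_sizes.items()
    }


# Title counts taken from Table 2.
sections = {"sciences": 5829, "social sciences": 1740, "arts & humanities": 1139}

n = yamane_sample_size(8356)
allocation = proportional_allocation(n, sections)
print(n, allocation)
```

Under these assumptions the result is 382 titles, split 256, 76, and 50 across the sciences, social sciences, and arts and humanities, which agrees with the total sample sizes listed in Table 3.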

Several problems arose with the titles chosen by this random method:

Although some extra titles had been selected, there were not enough for the arts and humanities or the science sections. Additional titles were added from a list of journals most frequently cited by authors at Penn State-University Park. These titles were used in order from most to least frequently cited. The ranking was compiled by averaging the number of citations to each journal for the years 1997-1999, according to data from each of the three citation indexes. This resulted in a set of titles chosen via the citation method.

The social sciences list did not experience this problem, and at the end of the study, we actually had more titles than we needed. In order to retain the correct subject proportions, a few titles were randomly eliminated from the list in order to reduce it to the correct size. These titles were chosen for elimination using a random number table (Beyer 1987). The sample size for each method of choosing titles (random or citation) is listed in Table 3.

Table 3. Sample sizes for the study.
| Description | Random Sample Size | Citation Sample Size | Total Sample Size |
| Number of science titles | 219 | 37 | 256 |
| Number of social sciences titles | 76 | 0 | 76 |
| Number of arts & humanities titles | 47 | 3 | 50 |

The study was conducted for a total of ten weeks between 28 July 2000 and 22 September 2000. Each Friday all the titles in the study were searched in both databases to determine whether any new issues of the title had been added to the database during the previous week. This process continued for the next eight weeks. The tenth week was a "wrap-up" week: if an issue had been added to one database but not to the other, then that title was searched.

Results

Some journals did not publish an issue during the study period. Data regarding these are shown in Table 4. Although the science group had the largest number of titles that did not update, the arts and humanities group had the highest percentage, followed very closely by the social sciences. This severely restricted the number of issues for the updating analysis in these two groups. In fact the arts and humanities group ended up with only about half the minimum desirable number of issues for the updating analysis.

Table 4. Titles not updated during study period.
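| Description | Random Titles Not Updated (#) | Random Titles Not Updated (%) | Citation Titles Not Updated (#) | Citation Titles Not Updated (%) | Total Not Updated (#) | Total Not Updated (%) |
| Number of science titles | 48 | 22 | 0 | 0 | 48 | 19 |
| Number of social sciences titles | 33 | 43 | 0 | 0 | 33 | 43 |
| Number of arts & humanities titles | 27 |  | 2 |  | 30 |  |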

Update Comparisons: The percentages for the sciences and the social sciences were similar. Both showed almost three quarters of the issues appearing in both databases within the same calendar week (Tables 5 and 6). Essentially all of the remaining titles appeared in Current Contents Connect between one and two calendar weeks before they appeared in Web of Science. The percentages for the arts and humanities showed a much higher level of variability, with just over half of the issues appearing in both databases in the same week (Table 7). Sixty percent of the journals in the arts and humanities did not publish an issue during the study period, so the sample for this area is much smaller than for the sciences or social sciences. Because of this the results for the arts and humanities are not conclusive.

Table 5. Update comparisons for science.
| Description | Random Titles Updated (#) | Random Titles Updated (%) | Citation Titles Updated (#) | Citation Titles Updated (%) | Total Titles Updated (#) | Total Titles Updated (%) |
| Number usable issues | 300 | 58 | 214 | 42 | 514 | 100 |
| Number issues updated same time | 217 | 72 | 162 | 76 | 379 | 74 |
| Number issues CCC updated first | 82 | 27 | 51 | 24 | 133 | 26 |
| Number of issues WOS updated first | 1 | <1 | 1 | <1 | 2 | <1 |

Table 6. Update comparisons for social sciences.
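| Description | Random Titles Updated (#) | Random Titles Updated (%) | Citation Titles Updated (#) | Citation Titles Updated (%) | Total Titles Updated (#) | Total Titles Updated (%) |
| Number usable issues | 55 | 100 | 0 | 0 | 55 | 100 |
| Number issues updated same time | 40 | 73 | 0 | 0 | 40 | 73 |
| Number issues CCC updated first | 15 | 27 | 0 | 0 | 15 | 27 |
| Number of issues WOS updated first | 0 | 0 | 0 | 0 | 0 | 0 |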

Table 7. Update comparisons for arts and humanities.
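| Description | Random Titles Updated (#) | Random Titles Updated (%) | Citation Titles Updated (#) | Citation Titles Updated (%) | Total Titles Updated (#) | Total Titles Updated (%) |
| Number usable issues | 26 | 96 | 1 | 4 | 27 | 100 |
| Number issues updated same time | 14 | 54 | 0 | 0 | 14 | 52 |
| Number issues CCC updated first | 11 | 42 | 1 | 100 | 12 | 44 |
| Number of issues WOS updated first | 1 | 4 | 0 | 0 | 1 | 4 |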
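
The percentages in the "Total Titles Updated" columns of Tables 5 through 7 follow directly from the issue counts. The short sketch below recomputes them; the function and dictionary names are illustrative and not part of the study.

```python
def shares(counts: dict) -> dict:
    """Convert issue counts into whole-number percentages of the usable issues."""
    usable = sum(counts.values())
    return {outcome: round(100 * n / usable) for outcome, n in counts.items()}


# "Total Titles Updated" counts from Tables 5, 6, and 7.
totals = {
    "sciences": {"same week": 379, "CCC first": 133, "WOS first": 2},
    "social sciences": {"same week": 40, "CCC first": 15, "WOS first": 0},
    "arts & humanities": {"same week": 14, "CCC first": 12, "WOS first": 1},
}

for group, counts in totals.items():
    print(group, sum(counts.values()), shares(counts))
```

The rounded shares match the tables. The only visible difference is that the two science issues that WOS updated first round to 0 here, where Table 5 reports them as "<1". With only 27 usable issues in the arts and humanities, each issue moves the percentages by almost four points, which is why those figures are treated as inconclusive.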

Definition of Current Week:

In the course of this project, a significant and disturbing fact was noted regarding Current Contents Connect's definition of a current week. CCC allows users to limit their searches to selected date spans. Since Current Contents has traditionally been a print publication that appeared weekly, it is logical to assume that one of the limit periods in the electronic version would be a week's worth of data. CCC has a limit labeled "current week", which according to the internal database help "includes journal issues and Current Book Contents for the current week. The span of dates given in parentheses defines the current week. Because Current Contents data are updated daily, this date span changes daily." However, the phrase "current week" implies to a user that it represents a seven-day period. It was found that the "Current Week" could be anywhere from one to eight days. The date ranges defining "current week" were recorded when the searches were done. These are shown in Table 8.

Table 8. Current week definitions in Current Contents Connect by day.
| Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|  |  |  |  | 28 July: 27th-27th | 29 July: 27th-29th |
| 31 July: 27th-29th | 1 August: 27th-31st | 2 August: 27th-1st | 3 August: 27th-3rd | 4 August: 3rd-3rd | 5 August |
| 7 August: 3rd-4th | 8 August: 3rd-4th | 9 August: 3rd-8th | 10 August: 3rd-10th | 11 August: 10th-10th | 12 August |
| 14 August: 10th-11th | 15 August: 10th-14th | 16 August: 10th-15th | 17 August: 10th-15th | 18 August: 17th-17th | 19 August |
| 21 August: 17th-18th | 22 August: 17th-21st | 23 August: 17th-22nd | 24 August: 17th-24th | 25 August: 24th-24th | 26 August |
| 28 August: 24th-25th | 29 August: 24th-28th | 30 August: 24th-30th | 31 August: 31st-31st | 1 September: 31st-1st | 2 September |
| 4 September: HOLIDAY | 5 September: 31st-1st | 6 September: 31st-6th | 7 September: 31st-6th | 8 September: 7th-7th | 9 September |
| 11 September: 7th-8th | 12 September: 7th-11th | 13 September: 7th-12th | 14 September: 7th-12th | 15 September: 7:21am: 7th-14th; 11:48am: 14th-14th | 16 September |
| 18 September: 14th-15th | 19 September: 14th-18th | 20 September: 7:37am: 14th-18th; 11:25am: 14th-18th; 11:26am: 14th-20th | 21 September: 14th-21st | 22 September: 7:30am: 14th-21st; 9:00am: 21st-21st | 23 September |
| 25 September: 21st-22nd | 26 September: 21st-25th | 27 September: 21st-25th | 28 September: 21st-28th | 29 September: 21st-28th | 30 September |

There are numerous days on which the database was not updated (16 Aug., 24 Aug., 6 Sept., 13 Sept., 26 Sept., 28 Sept.). There are also several days on which the "Current Week" consisted of one day. To further complicate the situation, there were several instances in which it appeared that the database was actually updated during the working day (Eastern Time) rather than at night (15 Sept., 20 Sept., 22 Sept.). There does not appear to be any day of the week on which one could consistently expect to obtain the previous seven days of data. For example, on Thursdays one could retrieve anywhere from eight days of data (24 August) to one day of data (31 August).
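
To make the span lengths concrete, the following sketch counts the days covered by a "current week" range as recorded in Table 8. It is only an illustration: the function names are hypothetical, and it assumes that each day-of-month in a recorded range refers to the most recent such day on or before the search date, with the year 2000 taken from the study period.

```python
from datetime import date, timedelta


def resolve_day(day_of_month: int, not_after: date) -> date:
    """Return the latest date on or before `not_after` with this day of month."""
    if not 1 <= day_of_month <= 31:
        raise ValueError("day_of_month must be between 1 and 31")
    candidate = not_after
    while candidate.day != day_of_month:
        candidate -= timedelta(days=1)
    return candidate


def span_days(search_date: date, start_day: int, end_day: int) -> int:
    """Number of days covered by a 'current week' range, counted inclusively."""
    end = resolve_day(end_day, search_date)
    start = resolve_day(start_day, end)
    return (end - start).days + 1


# Thursday examples from Table 8 (study year 2000).
print(span_days(date(2000, 8, 24), 17, 24))  # range 17th-24th
print(span_days(date(2000, 8, 31), 31, 31))  # range 31st-31st
# A range that crosses a month boundary: 27th-1st, searched on 2 August.
print(span_days(date(2000, 8, 2), 27, 1))
```

For the two Thursdays cited above, this yields eight days and one day respectively; the range recorded on 2 August spans six days because it runs from 27 July through 1 August.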

Although the study was conducted in 2000, an examination of the time spans indicated for current weeks during September 2002 indicates that the situation still exists. The database is still apparently updated during the working day (Eastern Time), and the time period covered by a "current week" can be as little as one day or as much as seven. If a scholar regularly runs a search each week, the coverage of the material retrieved might show significant gaps, depending on the day of the week and the time of day that the search was conducted. Unlike CCC, Web of Science is consistently updated once a week. No matter how early the checks were run on Fridays during the study period, Web of Science was already showing its new update.

Conclusion

The online Web of Science and Current Contents Connect are more obviously similar in terms of indexing updates than their print counterparts had been. Clearly their unique functions still exist and are reflected in the interface and searching features of the online products. The print Current Contents sections were designed for researchers wanting to browse current issues of specific journals in their fields, and consequently Current Contents Connect allows relatively easy browsing of subject areas and of the tables of contents of specific journal titles. Is this feature necessary when users can now get journal contents emailed directly to them from many journal web sites and other internet resources? While there is an email notification feature in CCC, there is a restriction on the number of alerts allowed. The print citation indexes catered to scholars wanting to do subject or citation searching, and, logically, Web of Science allows unique access to articles via the references they cite. In terms of title coverage, the Web of Science product is significantly superior. CCC does have some advantage over WOS with regard to the frequency of updating. A major concern with Current Contents Connect is the way in which "current week" is defined. Because of the apparently erratic nature of the updates to this database, a scholar could potentially miss references to very important literature. It remains up to each individual institution to determine whether the differences between the databases are important enough to justify subscribing to each product.

References

Beyer, W.H., ed. 1987. CRC Standard Mathematical Tables. 28th ed. Boca Raton, FL: CRC Press.

Bottle, R.T., ed. 1979. Use of Chemical Literature. 3rd ed. London: Butterworths.

Institute for Scientific Information. 2000. Current Contents: Physical, Chemical & Earth Sciences. 40(3): 1.

Yamane, T. 1973. Statistics, an Introductory Analysis. 3rd ed. New York: Harper and Row.

Acknowledgements

Soma Nag, graduate assistant in the Life Sciences Library, Penn State, for her assistance in comparing journal title lists.

Linda Musser, Head, Earth & Mineral Sciences Library, Penn State, for her comments and suggestions on the manuscript.
