Evidence Summary
Multiple Databases Are Needed to Search the Journal Literature on
Computer Science
A Review of:
Cavacini, A. (2015). What is the best database for computer science journal articles? Scientometrics, 102(3), 2059-2071. http://dx.doi.org/10.1007/s11192-014-1506-1
Reviewed by:
Giovanna Badia
Liaison Librarian
Schulich Library of Science & Engineering
McGill University
Montreal, Quebec, Canada
Email: giovanna.badia@mcgill.ca
Received: 18 Sep. 2015    Accepted: 04 Nov. 2015
© 2015 Badia.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-nc-sa/4.0/),
which permits unrestricted use, distribution, and reproduction in any medium,
provided the original work is properly attributed, not used for commercial
purposes, and, if transformed, the resulting work is redistributed under the
same or similar license to this one.
Objective – To
compare the coverage of computer science literature in four bibliographic
databases by checking the indexing of a selection of journal articles. The
purpose of this comparison was to identify the most comprehensive database in
computer science and determine whether more than one database is needed to
search for articles on computer science topics.
Design – Comparative
database evaluation using citation analysis.
Setting – Computer
science journal literature found within the INSPEC, Scopus, Web of Science, and
DBLP databases.
Subjects – 1,135
computer science journal articles published by an Italian university’s
researchers from 1979 to 2014.
Methods – The University
of Milan’s institutional repository (AIR), containing publications authored by
the university’s researchers, was searched in October 2014 for journal articles
that were assigned the subject heading “informatica”
(the word for computer science in Italian). The author then searched the titles
of these journal articles in each of the databases to check whether they were
indexed. For articles indexed in all four databases, the author also examined
the quality of the bibliographic records, checking each database's record for the presence of 20 elements (e.g., a "cited by" option, the ranking of search results, and the precision of results). These overlapping articles were also
searched in Google Scholar to help compare the quality of the records between
the databases.
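Readers wishing to apply this approach to their own repository data can reduce it to set operations on normalized titles. The following Python sketch is a minimal illustration under that assumption; the title lists are hypothetical, and this is not the author's actual procedure.

    # Minimal sketch of the coverage check; all titles are hypothetical
    # and assumed to be normalized (e.g., lowercased, punctuation stripped).
    repository_titles = {"paper a", "paper b", "paper c", "paper d"}
    database_titles = {
        "Scopus": {"paper a", "paper b", "paper c"},
        "Web of Science": {"paper a", "paper b"},
        "DBLP": {"paper a", "paper d"},
        "INSPEC": {"paper a", "paper b"},
    }

    # Coverage: the share of repository articles indexed by each database.
    for name, titles in database_titles.items():
        covered = repository_titles & titles
        print(f"{name}: {len(covered) / len(repository_titles):.2%}")

    # Articles indexed by all four databases: the subset whose records
    # would then be compared on the 20 quality elements.
    in_all = set.intersection(*database_titles.values())
    print(f"Indexed by all four: {sorted(in_all)}")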
Main Results – Scopus
indexed 75.86% of the journal articles found in AIR, Web of Science indexed
64.49%, DBLP indexed 61.15%, and INSPEC indexed 53.39%. Web of Science and INSPEC combined covered 74.80% of the articles, which is comparable to the proportion indexed by Scopus alone. DBLP and Scopus contained the highest numbers of
references to articles that were not found in the other databases, about 4%
each. Of the 1,135 journal articles, 391 (34.45%) were indexed by all four databases, with Web of Science scoring highest for the quality of its bibliographic records for these articles.
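All of these percentages are fractions of the 1,135 sampled articles. As a quick check, the underlying counts can be back-calculated from the reported figures; the article reports only percentages, so the counts below are inferred assumptions.

    # Counts back-calculated from the reported percentages of 1,135
    # articles; the original article reports only the percentages.
    total = 1135
    inferred_counts = {
        "Scopus": 861,           # 861/1135 = 75.86%
        "Web of Science": 732,   # 732/1135 = 64.49%
        "DBLP": 694,             # 694/1135 = 61.15%
        "INSPEC": 606,           # 606/1135 = 53.39%
    }
    for name, n in inferred_counts.items():
        print(f"{name}: {n / total:.2%}")
    print(f"Web of Science + INSPEC combined: {849 / total:.2%}")  # 74.80%
    print(f"Indexed by all four: {391 / total:.2%}")               # 34.45%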
Conclusions – According
to the author, the findings showed that INSPEC, Scopus, Web of Science, and
DBLP “complemented each other, in a way that neither one could replace the
other” (p. 2068) when searching the computer science literature. While there
was overlap between databases, they each also contained unique articles.
Commentary
Based on the author's literature review, many published studies have compared two or more of the databases contrasted in this study, or have examined one or more databases that index the computer science literature. However, it seems that none, thus far, have
compared INSPEC, Scopus, Web of Science, and DBLP for their coverage of the
computer science literature, or used publications from an institutional
repository as the source of data for making database comparisons.
The EBL Critical Appraisal Checklist (Glynn, 2006) was
employed to help objectively determine the strengths and limitations of this
study. Its strengths lie in its data collection and study design. The author clearly outlines the procedure he used to conduct the research and describes the results concisely. Readers could easily replicate the methodology.
The major limitation of this study concerns the study
population, in this case the source of data used to compare the databases. As
noted by the author, only journal article titles that were assigned the computer
science subject heading were extracted from the University of Milan’s
institutional repository and used to compare the four databases. These journal
articles accounted for only 29.65% of the repository's computer science documents (1,135 out of 3,828 documents). The author states
that “in the computer science field, proceedings are usually a prime avenue for
publications, and selected conference proceedings are as prestigious as journal
articles” (p. 2069). Conference papers account for more of the published
literature in computer science than journal articles. Therefore, the sample
used as the data source in this study is not representative of the computer
science literature, which means the results cannot be generalized to searching
the entire corpus of published literature in computer science.
Another limitation pointed out by the author is that databases continuously update their lists of indexed journals, so
that “identical searches might give different results if repeated over time”
(p. 2069). The results in the study are a snapshot in time, requiring that the
study be repeated to confirm the findings.
The author suggests that a future direction for
research would involve searching the databases under investigation for the
conference papers found in the institutional repository. This reviewer believes that this step should have been taken within the present study to validate the findings. The author could have randomly selected documents from
the repository to obtain a more representative sample or searched the titles of
all 3,828 documents in the different databases.
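Drawing such a sample is simple to implement. A minimal Python sketch follows, assuming the repository's 3,828 records can be exported as a list; the record identifiers and sample size here are hypothetical.

    import random

    # Hypothetical identifiers for the 3,828 repository documents.
    records = [f"record-{i}" for i in range(1, 3829)]

    # Simple random sample; fixing the seed makes the draw reproducible.
    random.seed(2015)
    sample = random.sample(records, k=350)  # sample size is illustrative
    print(len(sample), sample[:3])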
Despite its limitations, this study will be of
interest to librarians seeking to compare databases in a specific discipline
for teaching, reference, or collection development purposes. The study
successfully demonstrates that the source of data for a comparative database
evaluation can also be taken from an institutional repository that provides
references to all the scholarly output of its researchers. Readers who adopt this approach should, however, take a representative sample of documents from the institutional repository to ensure the validity of their results.
Reference
Glynn, L. (2006). A critical appraisal tool for library and information
research. Library Hi Tech, 24(3),
387-399. http://dx.doi.org/10.1108/07378830610692145