Issues in Science and Technology Librarianship
Winter 2006
DOI:10.5062/F45H7D7K

Database Reviews and Reports

Google Scholar -- Science & Technology

Marian Burright
Life Sciences Librarian
University of Maryland
College Park, Maryland
mburrigh@umd.edu

Introduction and Content

Google Scholar is a freely available Internet search engine for academic resources in all subject fields. The Google Scholar robot crawler searches content in peer-reviewed journal literature, books, dissertations, preprint repositories, academic society papers (if available), and technical reports. Google Scholar does not provide information about its content for STM research. The user can determine its scientific and technical content only by searching or by directly contacting major publishers and academic societies about their agreements with Google. Major STM publishers represented in Google Scholar include ACM, Annual Reviews, Blackwell, IEEE, Ingenta, IOP, Nature Publishing Group, Springer, Wiley, and others (Notess 2005). According to the Publications Division of the American Chemical Society, ACS journal articles will be indexed by Google Scholar using bibliographic information and abstracts. ACS will evaluate this initiative early in 2006 as a basis for future efforts with Google Scholar. It does not appear that Google Scholar crawls content published by Elsevier. For example, an Advanced Search in Google Scholar for an article published in the December 2005 issue of Trends in Ecology and Evolution, "Comparative Evaluation and its implications for mate choice" by Melissa Bateson and Susan Healy, instead retrieves another article by Bateson published in Trends in Ecology and Evolution in 1994. Google Scholar provides access to this earlier article through the Cambridge Scientific Abstracts interface, which suggests that current Elsevier publications are unavailable through Google Scholar.

Google Scholar describes its scope and content only in general terms. Unlike the major science and technology bibliographic databases such as PubMed, ISI Web of Science, or Cambridge Scientific Abstracts, the search engine does not provide source lists of the publications searched, authority files for author names or journal titles, or a controlled vocabulary for subjects. These omissions greatly limit Google Scholar's usefulness as an exclusive tool for STM research. The search engine does, however, return a large number of search results very quickly.

Although ostensibly rich in STM content, Google Scholar exhibits limitations in accuracy and timeliness. Its timeliness is limited in comparison to other free (and subscription-based) search engines such as PubMed. For example, a keyword search on "avian flu virus" in PubMed returns 134 articles and 14 reviews. The most recent search result is a case report published in the October 20 issue of Nature magazine. To compare Google Scholar's timeliness with PubMed's, I conducted an Advanced Search in Google Scholar on the keywords avian flu virus with the following limits: published in Nature and publication year 2005. This search retrieved a total of 16 results.

The most recent publication retrieved was a Brief Communication on the avian flu virus from the 14 July 2005 issue of Nature. This indicates a lag of at least two months in Google Scholar's coverage of Nature publications, with no coverage of research articles. Lastly, updates to the index appear to be infrequent: a search on "copepods feeding behavior," for example, retrieved the same documents throughout a one-month period.
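One way to make the lag concrete is to compute the gap between the most recent Nature item found in each tool. The short Python sketch below does only that arithmetic. It assumes that the October 20 issue cited for PubMed is from 2005 (the review does not state the year), and the function name lag_in_days is introduced here purely for illustration.

```python
from datetime import date


def lag_in_days(newer: date, older: date) -> int:
    """Return the number of days between two publication dates."""
    return (newer - older).days


# Most recent Nature item found through PubMed.
# Assumption: the October 20 issue is from 2005; the review gives no year.
pubmed_latest = date(2005, 10, 20)

# Most recent Nature item found through Google Scholar (14 July 2005).
scholar_latest = date(2005, 7, 14)

print(lag_in_days(pubmed_latest, scholar_latest))  # prints 98
```

Under that assumption the gap is 98 days, which is consistent with the lag of at least two months reported above.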

Search Features

Simple Search

Google Scholar offers Simple Search and Advanced Search options. The Simple Search is a powerful tool in a number of ways. It automatically supports Boolean operators and, in place of truncation symbols, uses stemming technology, which retrieves documents containing variations of the keywords entered. For example, a search on feeding behavior will retrieve documents with feeding or feed and behavior or behaviors. Stemming technology is a useful search feature because it does not require the user to enter specific truncation operators (or even to know that they exist). For those accustomed to truncation symbols, however, an explanation of stemming technology is not available in the Google Scholar Help, so the user must peruse the main Google Help for advanced searching techniques. The Simple Search engine is remarkably fast: it retrieved 504 items on the topic feeding behavior copepods Chesapeake Bay in 0.06 seconds.
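The effect of stemming described above can be illustrated with a deliberately tiny Python sketch. The suffix list, the function names toy_stem and matches, and the matching rule are all assumptions made for illustration; Google Scholar's actual stemming rules are not documented in the sources reviewed here.

```python
# Toy illustration only: this is NOT Google Scholar's algorithm.
# Hypothetical, deliberately small suffix list.
SUFFIXES = ("ing", "s")


def toy_stem(word: str) -> str:
    """Strip one known suffix, keeping at least three letters of the stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query: str, document: str) -> bool:
    """True if every stemmed query term appears among the stemmed document words."""
    doc_stems = {toy_stem(w) for w in document.split()}
    return all(toy_stem(term) in doc_stems for term in query.split())


# "feeding" and "feed" share a stem, as do "behavior" and "behaviors".
print(matches("feeding behavior", "copepods feed with varied behaviors"))
```

With truncation symbols the searcher would have to request the variants explicitly; with stemming, variants are matched without any operator being typed.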

The Simple Search also offers searching on author names in a variety of ways. A user can search by both an author's last name and a keyword to locate all publications by an author on a given topic. For example, Greene string theory will locate all publications by author Greene on string theory. The strength of this search is its speed; it retrieved 4,540 records in 0.09 seconds. One disadvantage of a combined author and keyword Simple Search is the large number of documents it retrieves. If the user were interested only in peer-reviewed papers on string theory by Greene, the Advanced Search would be a better option because it can limit results to articles only.

In general, searching by author name in Google Scholar is most effective when the author's initials are known. Unlike bibliographic databases and catalogs, Google Scholar lacks an authority file for author names, which greatly limits its ability to function as a bibliographic search engine.

Advanced Search

The Google Scholar Advanced Search offers a number of search options for articles. It supports keyword and author searching and allows the user to restrict results by range of publication years, by name of publication, and by subject area. Keyword searching is more sophisticated than in the Simple Search. It includes searching by all words, exact phrase, at least one of the words, without the words, and where the words occur in the document. Publication searching, like author searching, is problematic because it lacks authority control. Publication data, according to the Advanced Google Scholar Help, can be incomplete or incorrect. Data on journal names are compiled automatically from the web pages searched, so no human intervention occurs to verify publication names and history. This is a crucial limitation of Google Scholar as a search engine for scientific and technical information, considering the importance of accurate publication data to those fields. Subject searching in Google Scholar is equally limited due to the lack of a subject hierarchy. In the biological sciences, for example, searching for taxonomic data is more efficient in a database such as BIOSIS Previews due to its hierarchical mapping of living organisms. In Google Scholar, it is possible to retrieve articles about copepods, but the user cannot search by other essential data such as the taxonomy or geographic distribution of copepods, as is possible in BIOSIS Previews.
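The four keyword options named above (all words, exact phrase, at least one of the words, without the words) can be read as simple tests applied to each document. The Python sketch below is one possible reading; the function name advanced_match and its parameters are hypothetical and do not describe how Google Scholar evaluates a query internally.

```python
def advanced_match(text, all_words=(), exact_phrase="", any_words=(), without_words=()):
    """Apply four keyword options to one document's text, ignoring case."""
    lowered = text.lower()
    words = set(lowered.split())

    # "with all of the words": every term must be present.
    if not all(w.lower() in words for w in all_words):
        return False
    # "with the exact phrase": naive substring test, for illustration only.
    if exact_phrase and exact_phrase.lower() not in lowered:
        return False
    # "with at least one of the words": at least one term must be present.
    if any_words and not any(w.lower() in words for w in any_words):
        return False
    # "without the words": no excluded term may be present.
    if any(w.lower() in words for w in without_words):
        return False
    return True


title = "Feeding behavior of copepods in Chesapeake Bay"
print(advanced_match(title, all_words=["copepods"], exact_phrase="feeding behavior"))
```

None of these tests depends on a controlled vocabulary or an authority file, which is why they cannot compensate for inconsistent publication names or missing subject hierarchies.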

Search Result Display

The Google Scholar search display presents a number of challenges to the user. First, the order in which results appear is unclear. Google Scholar Help indicates that search results are ranked in descending order by the number of times an article is cited; however, the rank order of the articles retrieved indicates otherwise. Several keyword searches have confirmed that results are not returned according to citation ranking. The screen shot below shows that results are not sorted in descending order by number of times cited.

Second, the user has no control over the types of documents returned, their level of full-text access, the reliability of the access points, or the interface from which the user will access the papers retrieved, if available. Third, Google Scholar does not offer sorting by date, making it difficult for the user to ascertain information about the most recent publications, especially for timely issues. Most bibliographic databases for STM research offer options to limit and sort search results in a variety of ways, advantages that Google Scholar does not provide.
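The ranking claim discussed above can be checked mechanically by reading the "cited by" counts off a results page in display order and testing whether they ever increase. The Python sketch below shows such a check. The function names are hypothetical, and the sample counts are invented for illustration; they are not data from the searches described in this review.

```python
def is_sorted_by_citations(cited_by_counts):
    """True if the counts never increase from one result to the next."""
    return all(a >= b for a, b in zip(cited_by_counts, cited_by_counts[1:]))


def first_out_of_order(cited_by_counts):
    """Return the 0-based position of the first result whose count exceeds
    that of the result displayed just before it, or None if there is none."""
    for i in range(1, len(cited_by_counts)):
        if cited_by_counts[i] > cited_by_counts[i - 1]:
            return i
    return None


# Invented example counts, listed in the order the results were displayed.
sample_counts = [120, 45, 88, 12]
print(is_sorted_by_citations(sample_counts))
print(first_out_of_order(sample_counts))
```

A single out-of-order pair is enough to show that citation count alone does not determine the display order.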

The Google Scholar search display offers a "cited by" feature, similar to the citation searching provided by ISI Web of Science; however, it is unclear how Google Scholar calculates citation rates. Google Scholar is an inadequate search tool for citation data, since it neither provides reliably accurate data on publication names nor explains how citation rates are calculated.

Summary

Google Scholar offers various search options for free academic resources on the Internet. Its lack of authority control for basic data elements such as author names and publication titles greatly limits its ability to serve a serious scientific and technical research audience as an exclusive source of literature. Its speedy search engine and voluminous output are advantages that a researcher must weigh against accuracy and thoroughness in a literature search. As a free Internet search engine, Google Scholar falls short of other free search engines such as PubMed.

Notes

Notess, Gregory. 2005. Scholarly Web Searching: Google Scholar and Scirus. Online 29(4): 39-41.
