College and Research Libraries


Research Notes 
Ratings and Rankings: 

Multiple Comparisons of Mean Ratings 

William E. McGrath 
Ranking of journals or other objects according 
to mean ratings computed from an opinion sur-
vey is shown to be inappropriate if a test of sig-
nificance shows no difference between them. A 
Scheffe test for comparisons of mean ratings of 
journals ranked by Kohl and Davis [C&RL 
46:40-47 (Jan. 1985)] was performed. The 
results indicate no significant difference be-
tween means. Confidence intervals for every 
adjacent pair of journals in the list of ratings by 
ARL directors were also computed. The results 
indicate that every adjacent interval overlaps, 
and that the means are essentially tie scores. 
Treating them as significantly different, there-
fore, is a Type 1 error. 

Rank ordering of mean ratings, a common 
practice in library science research, can 
lead to serious Type 1 errors if the mean 
ratings are not first submitted to tests of 
significance. "Type 1" errors are those in 
which a hypothesis assuming no differ-
ence between two means, say, is actually 
true but is treated as untrue by the re-
searcher. In turn, Type 1 errors, if not rec-
ognized, may lead to unjustified social or 
administrative actions or other errors of 
judgment or policy. 
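
The risk described above can be illustrated with a small simulation. The following is a minimal sketch, not part of the original analysis: it assumes twenty hypothetical journals whose ratings are all drawn from the same uniform 1-to-5 distribution, rated by 42 hypothetical respondents each. The function name, the seed, and the uniform distribution are assumptions made only for illustration.

```python
import random
from statistics import mean

def simulate_null_ranking(n_journals=20, n_raters=42, seed=0):
    """Rank hypothetical journals whose ratings share ONE distribution.

    Every journal has the same true mean, so any ordering of the
    sample means is produced by sampling noise alone.
    """
    rng = random.Random(seed)
    results = []
    for j in range(n_journals):
        # Each respondent gives a rating from 1 to 5, uniformly at random.
        ratings = [rng.randint(1, 5) for _ in range(n_raters)]
        results.append((mean(ratings), f"Journal {j + 1}"))
    # Sort from highest to lowest mean, as a naive ranking would.
    return sorted(results, reverse=True)

ranking = simulate_null_ranking()
for rank, (m, name) in enumerate(ranking, start=1):
    print(f"{rank:2d}. {name:<11} mean = {m:.4f}")
```

The printed list looks like a ranking, yet by construction no journal is truly rated higher than any other. Treating such an ordering as real is the Type 1 error at issue.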

Two examples will illustrate. The first is 
from my own research some years ago, 
which inconclusively attempted to correlate
mean ratings of subject-area characteristics
(computed from a 10-point scale) with
variables of library circulation. 1 The absence
of strong correlations may be attributed
to the probable absence of significant
differences between the mean ratings of 
subject areas. Had those differences been 
tested, the limitations of my design might 
have been realized. Fortunately the long-
term consequences were as negligible as 
the correlation, as I had merely failed to 
build good theory. 

The second example appears in an arti-
cle by Kohl and Davis. 2 These authors 
asked ARL library directors and deans of 
accredited library schools to rate thirty-
one library journals in terms of their im-
portance to evaluations of publications by 
librarians or faculty being considered for 
promotion and tenure. Each journal title 
was rated by each respondent on a 5-point 
Likert scale. The authors computed the 
mean rating of each journal, then ranked 
the journals according to these means. As 
in my own research, the authors did not 
test to determine whether means were significantly
different from each other,
although they did compare directors' ratings
to deans' ratings. Without such a test
there is no evidence that one mean rating 
is any different from any other. 

The rankings in question appear in their 

William E. McGrath is Associate Professor in the School of Information and Library Studies, State University 
of New York at Buffalo, New York 14260. 


table 1. 3 These ranks seem to assume that
each mean is different (for example, that
the mean for Library Quarterly, 4.4048, is
different from that for Journal of Academic
Librarianship, 4.3810) when in fact they
are probably not different. That is precisely
the same error cited in the first
example: a Type 1 error.

Kohl and Davis, however, did seek to 
avoid Type 1 errors, first by performing 
t-tests for the differences between the 
means of ARL directors and library school 
deans, then by looking at internal consensus.
They report the results of that test in
their table 2. They conclude that, because
deans and directors appear to agree on
their ratings of journal "importance,"
there is a "perceived hierarchy of journal
prestige."

However, their Type 1 errors are be-
tween journals, not between deans and di-
rectors. Thus, their finding of a "per-
ceived hierarchy of journal prestige" is 
not supported. Although a perceived hier-
archy may exist, it cannot be determined 
from their table 1. Therefore, acceptance 
of these journal ranks at face value for the 
purpose of determining promotion and 
tenure of librarians and faculty could lead 
to inappropriate evaluation. 

The small visual differences between 
the means in table 1 and the small sample 
size from which the journal means were 
computed also cast suspicion on conclu-
sions drawn from them. The Scheffe test is 
appropriate for all possible comparisons. 4 

The data reported in their tables 1 (mean 
ratings) and 2 (sample sizes and standard 
deviations) make it possible to compute an 
overall mean square within (MSw), which is 
required to compute an F statistic, which,
in turn, is required to perform the test.
The equation for F is

$$F = \frac{(M_1 - M_2)^2}{MS_w \left( \frac{1}{n_1} + \frac{1}{n_2} \right) (k - 1)} \tag{a}$$

with df = k - 1, N - k.

Working backwards, it is possible to compute
MSw from the statistics reported in
table 3, as follows:

$$MS_w = \frac{\sum_j s_j^2 n_j - \sum_j s_j^2}{N - k}, \tag{b}$$

March 1987 

where $s_j^2$ and $n_j$ are the squares of the standard
deviations and the sample sizes for
each journal respectively. A sample size of 
42 for each journal, reported in Kohl and 
Davis' table 3, is assumed in computing 
the above equations. 
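
The backward computation of MSw can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `ms_within` is hypothetical, and the standard deviations below are made-up values, not the figures from Kohl and Davis' table 3.

```python
def ms_within(std_devs, sample_sizes):
    """Mean square within from per-group standard deviations and sizes.

    Computes (sum of s_j^2 * n_j  -  sum of s_j^2) / (N - k), which is
    the same as the pooled variance sum((n_j - 1) * s_j^2) / (N - k).
    """
    if len(std_devs) != len(sample_sizes):
        raise ValueError("need one sample size per standard deviation")
    k = len(std_devs)                 # number of groups (journals)
    big_n = sum(sample_sizes)         # total number of ratings
    variances = [s ** 2 for s in std_devs]
    weighted = sum(v * n for v, n in zip(variances, sample_sizes))
    return (weighted - sum(variances)) / (big_n - k)

# Hypothetical standard deviations, NOT taken from Kohl and Davis.
example_sds = [0.5, 1.0, 1.5, 2.0]
example_ns = [42] * len(example_sds)  # 42 respondents per journal, as assumed in the text
print(ms_within(example_sds, example_ns))
```

When every journal has the same sample size, as assumed here, the result reduces to the simple average of the squared standard deviations.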

The Scheffe test was performed on 
means of ARL directors' ratings (left 
column of table 1) but only for the journals 
in Kohl and Davis' table 3, which contains 
the standard deviations necessary for the 
computation. From (b) above, MSw = 
2.23. This value was used in (a) to com-
pute F values for the Scheffe tests appear-
ing in table A. 
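
As a check on the arithmetic, the pairwise F of equation (a) can be sketched as follows. The function name `scheffe_f` is an assumption for illustration; the means, MSw = 2.23, the sample size of 42, and k = 20 journals are the values given in the text and in table A.

```python
def scheffe_f(m1, m2, n1, n2, ms_w, k):
    """Scheffe F for the difference between two group means, equation (a)."""
    return (m1 - m2) ** 2 / (ms_w * (1 / n1 + 1 / n2) * (k - 1))

MS_W = 2.23   # mean square within, as reported in the text
N_PER = 42    # respondents per journal, as assumed in the text
K = 20        # number of journals compared in table A

# College & Research Libraries vs. Library Quarterly (adjacent pair)
f_crl_lq = scheffe_f(4.7381, 4.4048, N_PER, N_PER, MS_W, K)
# College & Research Libraries vs. Library and Information Science Research
f_crl_lisr = scheffe_f(4.7381, 2.8810, N_PER, N_PER, MS_W, K)

print(f"C&RL vs. LQ:   F = {f_crl_lq:.2f}")
print(f"C&RL vs. LISR: F = {f_crl_lisr:.2f}")
```

Under these inputs the two values round to 0.06 and 1.71, matching the first pairwise F in table A and the F reported for the comparison of the two cluster-leading titles.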

For no adjacent pair of journals did the 
computed values of F exceed the test value
of 1.57, indicating true null hypotheses in
every comparison, i.e., that the means
for every adjacent pair in the list are not 
significantly different from each other. 
Not until the journal at the top of the rank-
ings, College & Research Libraries, was com-
pared with one well down in the list, 
namely Library and Information Science Re-
search, was a significant difference ob-
served. Furthermore, Library and Informa-
tion Science Research is not significantly 
different from the journals following it in 
the list. This general lack of significance 
does not appear to support the rationale 
for strict ranking of these journals. At 
best, one might postulate two clusters of 
journals, with each journal in the first 
cluster essentially tied for first place and 
each in the second cluster tied for second 
place. To paraphrase Consumer Reports, 
journals within clusters are approximately 
equal in importance. 
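
The scan over the whole list can be sketched in the same way. This is a rough illustration, assuming the means of table A, MSw = 2.23, n = 42 per journal, and the test value of 1.57; the variable names are hypothetical. It also derives the smallest difference between two means that would reach significance under those assumptions.

```python
from math import sqrt

# Mean ratings by ARL directors, in the order listed in table A.
MEANS = [4.7381, 4.4048, 4.3810, 4.3810, 4.2381, 4.1429, 4.0952,
         3.8571, 3.5000, 3.3810, 3.1667, 2.8810, 2.5238, 1.9286,
         1.7381, 1.5714, 1.5714, 1.5714, 1.5476, 1.5238]
MS_W, N_PER, K, F_CRIT = 2.23, 42, len(MEANS), 1.57

def scheffe_f(m1, m2):
    """Scheffe F for two means with equal sample sizes."""
    return (m1 - m2) ** 2 / (MS_W * (2 / N_PER) * (K - 1))

# F for each title and the one immediately following it.
adjacent = [scheffe_f(a, b) for a, b in zip(MEANS, MEANS[1:])]
largest_adjacent = max(adjacent)

# Smallest difference between two means that exceeds the test value.
min_diff = sqrt(F_CRIT * MS_W * (2 / N_PER) * (K - 1))

# Position (0-based) of the first title significantly below the top one.
first_sig = next(i for i, m in enumerate(MEANS)
                 if scheffe_f(MEANS[0], m) > F_CRIT)

print(f"largest adjacent F = {largest_adjacent:.2f}")
print(f"minimum significant difference = {min_diff:.2f}")
print(f"first significant position = {first_sig + 1}")
```

On these assumptions, two means must differ by roughly 1.78 rating points before the Scheffe test separates them, which is why the first significant contrast with the top title does not appear until the twelfth position, Library and Information Science Research.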

Nearly identical results were obtained 
when a t-test for independent samples 
(though these samples may not be truly 
independent) was performed, again 
working backward from the standard de-
viations to obtain sums of squares and 
standard errors of the differences between 
each pair of means. 
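
A minimal sketch of that backward computation follows. The means are those of College & Research Libraries and Library Quarterly from table A, but the two standard deviations (0.5 and 1.0) are hypothetical placeholders, since the actual values appear only in Kohl and Davis' table 3. The function name `pooled_t` is likewise an assumption.

```python
from math import sqrt

def pooled_t(m1, s1, n1, m2, s2, n2):
    """Independent-samples t from means, standard deviations, and sizes."""
    # Recover each group's sum of squares from its standard deviation.
    ss1 = (n1 - 1) * s1 ** 2
    ss2 = (n2 - 1) * s2 ** 2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    # Standard error of the difference between the two means.
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff, df

# Standard deviations here are hypothetical, for illustration only.
t_stat, df = pooled_t(4.7381, 0.5, 42, 4.4048, 1.0, 42)
print(f"t = {t_stat:.3f} with df = {df}")
```

The resulting t would then be compared with the tabled critical value for the given degrees of freedom.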

TABLE A

SCHEFFE TEST FOR DIFFERENCES BETWEEN PAIRS
AND CLUSTERS OF JOURNAL MEANS

Journal Title               Mean     Pair-wise F value*   Possible Cluster+
Coll. & Res. Libr.          4.7381   0.06                 1
Libr. Quart.                4.4048   0.00                 1
J. Acad. Libr.              4.3810   0.00                 1
Libr. Res. & Tech. Serv.    4.3810   0.01                 1
Library Trends              4.2381   0.00                 1
Info. Tech. and Libr.       4.1429   0.00                 1
JASIS                       4.0952   0.03                 1
Library Journal             3.8571   0.06                 1
American Libraries          3.5000   0.01                 1
RQ                          3.3810   0.02                 1
Special Libraries           3.1667   0.04                 1
Libr. & Info. Sci. Res.     2.8810   0.06                 2
Collect. Management         2.5238   0.18                 2
Info. Proc. & Mgmnt.        1.9286   0.02                 2
School Library Journal      1.7381   0.01                 2
Intern. Libr. Rev.          1.5714   0.00                 2
Micrographics Today         1.5714   0.00                 2
School Library Media Q      1.5714   0.00                 2
Intern. J. Law Libraries    1.5476   0.00                 2
Law Library Journal         1.5238   xxxx                 2

*F(df: k - 1 = 19, N - k = 820), .05 level = 1.57.
The F value refers to pairs of titles: the title listed and the one immediately following. Thus, the first F listed, 0.06, refers to College and
Research Libraries and Library Quarterly. F values must exceed 1.57 to be significant. None are.

+Means for journals within "possible clusters" are not significantly different from each other. But the first title in cluster 1 (C&RL) is
significantly different [F(.05) = 1.71] from the first title in cluster 2 (Library and Information Science Research), while the last title in cluster 1
(Special Libraries) is significantly different (F = 1.71) from the last title in cluster 2 (Law Library Journal); clusters 1 and 2 overlap each other
with Special Libraries. The difference between the average of cluster 1 and the average of cluster 2 is significant [F(.05) = 22.7].

Finally, confidence intervals for all
means in the ARL directors' list were computed,
again at the .05 significance level.
For every journal, the confidence interval
overlapped the one above it and below it.
For example, the lower and upper limits
for C&RL were 4.60 and 4.87, respectively,
while the lower and upper limits for LQ
were 4.09 and 4.72. Clearly, the upper
limit of LQ falls well within the interval for
C&RL, indicating that their means cannot
be distinguished from each other.
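
The overlap test itself is simple enough to sketch. The interval limits below are the ones quoted above for C&RL and LQ; the function name `intervals_overlap` is an assumption for illustration.

```python
def intervals_overlap(a, b):
    """Return True when two (lower, upper) intervals share any point."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return a_lo <= b_hi and b_lo <= a_hi

crl = (4.60, 4.87)  # limits quoted in the text for C&RL
lq = (4.09, 4.72)   # limits quoted in the text for LQ

print(intervals_overlap(crl, lq))
# Width of the shared region: smaller upper limit minus larger lower limit.
shared_width = min(crl[1], lq[1]) - max(crl[0], lq[0])
print(f"shared width = {shared_width:.2f}")
```

Applying the same check to each adjacent pair in a ranked list gives a quick screen for tie scores before any ranks are assigned.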

Visual inspection of the means for li-
brary school deans' rankings (right 
column of table 1) suggests that few signif-
icant differences would be found between 
adjacent journals in that list either. 

This analysis suggests that ranking av-
erage ratings without submitting them to 
appropriate tests of significance cannot be 
trusted. Such tests are necessary even 
when data are trustworthy, for example,
when the sample is large, or when it otherwise
represents the population with a
high degree of confidence. Here, a distinc-
tion should be made between performing 
tests of significance to guard against sam-
pling errors on the one hand and measure-
ment errors on the other. Here, the rating 
scores can properly be considered as mea-
surements subject to error. For example, 
an average score can hide a great diversity 
of opinion. If we ask 100 respondents to 
rate journals on a 1-to-5 scale, a particular 
journal could receive an average of 3.0 in
several ways. At the extremes, all respondents
could give the journal a rating of 3;
or 50 respondents could give a rating of 1; 
and 50, a rating of 5. Both scenarios pro-
duce an average of 3.0, but the first repre-
sents exact consensus. In the second, the 
average score hides a considerable degree 
of measurement error. In fact, in the sec-
ond scenario no individual respondent 
gives the journal a rating of 3.0 and we 
might well question whether a real con-
sensus exists that a journal with a rating of 
3.0 is really higher than one with a rating 
of 2.9.
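
The two extreme scenarios can be written out directly. This short sketch uses only the figures given above (100 respondents, a 1-to-5 scale); the population standard deviation is used here as one possible measure of spread, which is an assumption of the illustration.

```python
from statistics import mean, pstdev

# Scenario 1: all 100 respondents give a rating of 3.
consensus = [3] * 100
# Scenario 2: 50 respondents give a rating of 1 and 50 give a rating of 5.
polarized = [1] * 50 + [5] * 50

for label, ratings in (("consensus", consensus), ("polarized", polarized)):
    print(f"{label}: mean = {mean(ratings)}, spread = {pstdev(ratings)}")
```

Both lists average 3, but the spread is 0 in the first case and 2 in the second, so the mean alone says nothing about whether respondents agree.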

Kohl and Davis sprinkle cautions 
throughout their study, noting that it has 
"important limitations" that must be con-
sidered "to maintain a proper perspective 
on the findings." Perhaps the major cau-
tion should address the use of these or 
similar ranks for determining tenure and 
promotion. 

If journal prestige and importance must 
be studied, then many related questions,
including those raised here and by Kohl
and Davis, must also be studied. Which
journals does the larger population of non-ARL
directors and ACRL members feel are
important? What is the relationship be-
tween a respondent's own specialized 
area and the subject area of the journal be-
ing rated? What are the correlates of 
"prestige" or "importance"? Can pres-
tige or importance be predicted from other 
variables? What is the basis for equating 
prestige and importance? Is prestige a var-
iable of real utility, or does it merely make 
an author feel good? Do studies of prestige 
contribute to the knowledge base of our 
profession? Or does the knowledge base 
contribute to prestige? Prestige is not a 
guarantee of quality, say Kohl and Davis. 
Likewise quality is not a guarantee of prestige.
Then what is quality, and what is the
relationship between prestige and qual-
ity? Kohl and Davis suggest citation analy-
sis; other kinds of impact should also be 
examined. It seems that whenever we at-
tempt to measure attitudinal variables, we 
can never really pin them down without 
reference to behavioral variables. Under-
standing of behavioral variables has much 
the greater potential for contributing to 
good theory. 

In conclusion, whenever rating scores 
are used to produce rankings of items be-
ing rated, those rankings should be sub-
jected to appropriate tests of statistical sig-
nificance. 

REFERENCES AND NOTES 

1. William E. McGrath, "Predicting Book Circulation by Subject in a University Library," Collection
Management 1, no.3/4:7-26 (Fall/Winter 1976-77). Average ratings in this research were for the variables
Hard/Soft, Pure/Applied, and Life/Nonlife.

2. David F. Kohl and Charles H. Davis, "Ratings of Journals by ARL Library Directors and Deans of
Library and Information Science Schools," C&RL 46:40-47 (Jan. 1985).

3. All references to tables are to Kohl and Davis except for table A.

4. John T. Roscoe, Fundamental Research Statistics for the Behavioral Sciences (New York: Holt, 1975),
p.313.

Authors' Reply 

David F. Kohl and Charles H. Davis 
We read William McGrath's comments
on our study with considerable interest.
Our only concern is that in order to make
his point he has to make us say more than
we were, in fact, comfortable saying. It
frankly never occurred to us that anyone
would take the listing in Table 1 as some
kind of precise ranking where "each mean
is different," since that is obviously not
the case. Not only did a number of the
journals listed in Table 1 have identical
means and were, in those cases,
"ranked" in alphabetical order but in addition
we present two other possible
"rankings" which vary in detail from the 
lists in Table 1. The point of the article, 
which was fairly explicitly made, was not 
that any one journal stood in a specific re-
lationship to any other journal, but that a 
clearly recognizable general pattern did 
exist with some journals consistently 
emerging toward the top, others toward 
the middle, and others toward the bot-
tom. 

David F. Kohl is Assistant Director for Public Services, University of Colorado, Boulder, Colorado 80309.
Charles H. Davis is Professor, Graduate School of Library and Information Science, University of Illinois,
Urbana, Illinois 61801.



In fact, Professor McGrath's own analy-
sis seems to confirm this general hierarchy 
or, as he calls it, clustering. It should be 
noted that he finds this very general clus-
tering (into two groups) using the Scheffe 
test, the most conservative test of this
kind possible. A less restrictive test such 
as the Duncan, Tukey, etc., would invari-
ably have suggested finer distinctions 
among the journals. The issue, which McGrath's
comments may obscure, is not
whether there is or is not some hierarchy
or ranked clustering but how fine the gra-
dations of the hierarchy or clustering are. 

We agree with McGrath's point that av-
erages don't necessarily constitute a de-
tailed ranking and hope that his com-
ments may help prevent a misreading of 
Table 1 of our study by casual readers. We 
do feel, however, that his misinterpreta-
tion of Table 1 created a bit of a straw man 
in our case. 
