 BOOKS BY H. L. HOLLINGWORTH. 
 
 The Inaccuracy of Movement, Archives of Psychology, No. 13, 
 (Columbia Contributions to Philosophy and Psychology, Vol. 
 XVII, No. 3) pp. 87. June, 1909. New York. The Science 
 Press. 80 cents. 
 
 The Influence of Caffein on Mental and Motor Efficiency, 
 Archives of Psychology, No. 22, (Columbia Contributions to 
 Philosophy and Psychology, Vol. XX, No. 4) pp. 167. April, 
 1912. New York. The Science Press. $1.50 (paper), $1.75 
 (cloth). 
 
 Principles of Appeal and Response, (A Systematic Textbook of 
 Business Psychology) pp. 315. New York. 1913. D. Appleton 
 and Company. $2.00 net. By mail, $2.16. 
 
 Experimental Studies in Judgment, Archives of Psychology, No. 
 29, (Columbia Contributions to Philosophy and Psychology, 
 Vol. XXII, No. 3). pp. 125. December, 1913. New York. 
 The Science Press. $1.25 (paper), $1.50 (cloth).
 
 
 EXPERIMENTAL STUDIES 
 IN JUDGMENT 
 
 H. L. HOLLINGWORTH 
 
 COLUMBIA UNIVERSITY 
 
 ARCHIVES OF PSYCHOLOGY 
 
 EDITED BY 
 
 R. S. WOODWORTH 
 
 No. 29, DECEMBER, 1913 
 
 COLUMBIA CONTRIBUTIONS TO PHILOSOPHY AND PSYCHOLOGY, 
 VOL. XXII, NO. 3 
 
 NEW YORK 
 THE SCIENCE PRESS 
 
 AGENTS: G. E. STECHERT & CO.; London (2 Star Yard, Carey St., W. C.); Leipzig (Hospital St., 10); 
 Paris (76, rue de Rennes).
 
 
 
 
 TABLE OF CONTENTS 
 
 INTRODUCTION 
 
 CHAPTER 
    I. Judgments of Personal Efficiency 
   II. Perceptual Criteria of Judgments of Efficiency 
  III. Performer and Witness as Judges of Efficiency 
   IV. The Central Tendency of Judgment 
    V. The Direction of Judgment 
   VI. Natural or Habitual Tendencies of Judgment 
  VII. Judgments of Similarity and Difference 
 VIII. Influence of Form and Category on the Outcome of Judgment 
   IX. The Perceptual Basis for Judgments of Extent of Movement 
    X. Some Characteristics of Judgments of Evaluation 
 
 INTRODUCTION 
 
 A GENERAL title, such as that given to this monograph, can give 
 very little preliminary indication of the nature of the problems 
 therein suggested or investigated. In the study of those mental 
 processes, acts or resultants which we vaguely call judgments there 
 are perhaps four chief problems with which special researches may be 
 concerned : 
 
 (a) The nature and mechanism of judgments. Studies which 
 have sought for introspective ear-marks or criteria of the judgment 
 process, qualitative differentia between judgments and other ele- 
 mentary or complex states or processes or acts, belong here. Here 
 also would belong any attempt to describe or hypothecate the physio- 
 logical correlate of judgments. With these problems the studies here 
 presented are not concerned. 
 
 (b) The forms, varieties and classification of judgments. This 
 may be conceived as a task for logical rather than for psychological 
 inquiry. It may suffice here merely to indicate that these studies 
 are in no primary way concerned with problems of classification. 
 
 (c) The basis or perceptual criteria of typical judgments, the 
 data which determine the content, direction, or outcome of special 
 varieties of judgments under given conditions. Two of the studies 
 here presented are specifically directed toward this type of problem. 
 Thus in Chapter II. and in Chapter IX. attempts are made to dis- 
 cover on what data one relies when he judges the efficiency of a work 
 process or the extent or duration of a voluntary movement. 
 
 (d) The laws or behavior of judgments, and the ways in which 
 the laws are modified or the behavior conditioned by specific varia- 
 tions of the judgment situation. Among these specific variations of 
 the judgment situation may be mentioned, by way of examples, the 
 form in which the judgment is expressed, the category employed, the 
 nature of the material to be judged, individual, age, sex and group 
 differences, previous practise, preceding judgments, habitual judg- 
 ment tendencies, etc. On problems of this sort all of the studies here 
 presented have more or less direct bearing. 
 
 The studies have been made from a fairly definite point of view, or 
 at least they have been actuated by a fairly permanent interest. 
 Stated in general terms, this has been an interest in the way in which 
 mind works rather than in what is in the mind at the moment of its 
 operation. As I have elsewhere remarked, such an interest finds but 
 little use for the introspective method. It is an interest "not in the 
 momentary content of a conscious moment; nor in the descriptive 
 character of the sensory fragment which may at that moment be the 
 bearer of meaning; nor in the instrument, criterion or vehicle of an 
 act of apprehension, a comparison, a feeling, or a choice." It is above 
 all an interest in "the outcome of this moment in the form of 
 behavior, an act, a choice, a judgment, and in the character, 
 reliability, constancy, and significance which the outcome of such a 
 mental operation possesses." 
 
 Of the ten studies which the volume contains, six are entirely new 
 and have not been elsewhere reported. The remaining four have 
 already appeared in the psychological periodicals. They are re- 
 printed here because of their relevance to the later studies and 
 because they were originally part of the larger plan of which this 
 monograph is a partial result.
 
 
 EXPERIMENTAL STUDIES IN JUDGMENT 
 
 CHAPTER I 
 
 JUDGMENTS OF PERSONAL EFFICIENCY 
 
 INVESTIGATORS of fatigue have frequently found occasion for the 
 remark that the individual's judgments of the quality of his own 
 performance in a piece of work in progress or just completed are far 
 from being a reliable index either of the capacity of his organism at 
 the time, or of the actual amount, speed, or quality of the work done. 
 The matter usually rests, however, with this generalization. No 
 attempts seem to have been made to determine experimentally the 
 reliability of such judgments, except in the cases of a few studies of 
 the confidence of simple sensory discriminations. In a sense, of 
 course, the task of judging the intensity, extent, or duration of two 
 sensory impressions may be called work, even though no emphasis be 
 laid on the number of such judgments to be made in a given unit of 
 time. But sensory discrimination is not to be called work in the 
 active sense indicated in such processes as the production of ergo- 
 grams, the execution of tapping movements at maximal speed, or the 
 similar high speed performances of "naming opposites," "naming 
 colors," or mental calculation. 
 
 In this chapter will be reported a preliminary attempt to inves- 
 tigate the characteristics, conditions, tendencies, and reliability of a 
 worker's judgments of the efficiency of his own performance in such 
 active processes as those just mentioned. Such questions as the fol- 
 lowing will define the nature of the problem, indicate the direction 
 taken by the present inquiry, and suggest the importance of the topic 
 to that sort of psychology which is interested in the dynamic aspects 
 of the life of psycho-physical organisms. 
 
 1. How reliably can a performer judge the quality of his own 
 performance when no objective measures are at his disposal? To 
 what extent is the conscious concomitant of an action a guarantee of 
 the quality or effects of that action? 
 
 2. What are the criteria which constitute the basis of one's 
 judgments of his own efficiency at a given moment, or through a given 
 period of time? 
 
 3. What are the conditions which modify the character and 
 accuracy of such judgments, both in the same task and in the case of 
 different tasks? How do the characteristics of the judgment of 
 personal efficiency change with the conditions of variation and with the 
 nature of the performance? 
 
 4. What relations exist between the certainty or degree of 
 confidence of such judgments and their accuracy as shown by objective 
 record? 
 
 5. How do the judgments of the performer compare in these 
 respects with the judgments of a witness who observes the progress 
 of the work without participating in it, and without knowledge of the 
 objective records? 
 
 6. Do practise, fatigue, transfer, and similar processes affect the 
 course and reliability of these judgments? 
 
 7. What individual differences exist in these various respects? 
 How does proficiency in performance correlate with reliability of 
 judgment? 
 
 Such questions as these open up a large field of inquiry which has 
 hardly been explored in even a preliminary way. The present study 
 is limited to perhaps three of these problems, and must even here be 
 considered as hardly more than suggestive. It will achieve its main 
 purpose if it succeeds in directing attention toward the general field 
 in which it lies. Further problems of a similar kind will be taken up 
 in Chapters II. and III. 
 
 Several investigators, interested mainly in the determination of 
 the differential threshold, in the examination of the psycho-physical 
 relations and methods in the field of sensation, and in the measure- 
 ment of recognition memory, have taken occasion to instruct their 
 observers to state, in the case of each judgment of sensory discrimina- 
 tion, recognition, etc., the degree of confidence with which the judg- 
 ment was expressed. Since the present study constitutes the applica- 
 tion of a similar procedure to judgments of the efficiency of perform- 
 ance in a work process, a brief account of the most important results 
 of these studies may well be given here. 
 
 Fullerton and Cattell,¹ while investigating the perception of small 
 differences in extent and speed of movement, lifted weights, and 
 intensity of lights, proceeded mainly by the methods of right and 
 wrong cases and average error. But these methods were combined 
 with the method of just observable differences by requesting the 
 observer to state, after each judgment of difference, the degree of his 
 confidence in his judgment. Three degrees of confidence were used, 
 A, B, and C, indicating, respectively, "quite confident," "fairly 
 confident," and "less confident." Among the conclusions based on 
 these results the following are of special interest in the present 
 connection : 
 
 ¹ Fullerton and Cattell, "On the Perception of Small Differences." 
 
 Extent of Movement. ". . . with regard to the degrees of 
 confidence a, b, and c, it may be objected that the terms 'quite confident,' 
 'fairly confident,' and 'less confident' are extremely vague. In a 
 series of experiments with the one observer each of these terms may 
 be assumed, perhaps, to have approximately the same meaning in 
 different parts of the series; but the quantitative relations of the 
 subjective feeling of confidence in the three cases remain very 
 obscure, nor can it be assumed that they may be measured by the 
 percentage of right cases corresponding to each degree of confidence. 
 The fact that an observer is always right when he feels quite 
 confident, and right 97 per cent. of the time when he feels fairly 
 confident, does not prove that the amount or degree of his confidence in 
 the two instances is as 100 to 97" (p. 63). 
 
 Weights. "The confidence (of A and B judgments) varies nearly 
 as the percentage of right cases (with varying sense differences) and 
 some reliance may therefore be placed on such introspection. We see 
 however . . . that different individuals place very different meanings 
 on the degree of confidence. . . . Those observers who felt the great- 
 est degree of confidence in their judgment had the largest probable 
 error, while those who were least seldom quite confident had the 
 smallest probable error. . . . We see that an observer is more apt to 
 be right than wrong, even when he feels very little confidence in the 
 correctness of his decision. We also obtain a rough measure of what 
 reliance may be placed on the judgment of the observer" (p. 126). 
 
 Lights. "The confidence of the observer is hence a fair measure 
 of the correctness of his judgment, but it is evident that A and B have 
 a widely different meaning in the case of the several observers. . . . 
 It is worth noting that when the discrimination was equally good the 
 confidence was less with lights than with weights" (p. 144). 
 
 Griffing's² observers, in judging sensations of pressure and 
 impact, also estimated their degree of confidence in each judgment in 
 some experiments. Griffing concludes, on this point: "The degree of 
 confidence in the perception of intensive differences varies greatly 
 for individuals, the proportion of wrong judgments of which 
 observers were confident ranging from 1/3 to 1/50. The probability of 
 correctness was for most observers from .8 to .9. There is no relation 
 between either of these quantities and the accuracy of discrimination. 
 The percentage of correct guesses (D judgments) varied from 52 per 
 cent. to 70 per cent., the average being 59 per cent." 
 
 Henmon,³ in a study the chief object of which was the correlation 
 of the speed with the accuracy of judgments of visual linear 
 magnitudes, also instructed his observers to assign their degree of confidence 
 to each judgment. He used four degrees of certainty, designated as 
 "perfectly confident," "fairly confident," "with little confidence," 
 and "doubtful." Henmon's chief conclusions on this aspect of his 
 problem are as follows: 
 
 ² Griffing, "On Sensations from Pressure and Impact," Psych. Mon., Vol. I., 
 No. 1. 
 
 ³ Henmon, "Time and Accuracy of Judgment," Psych. Rev., May, 1911. 
 
 "The time of judgment increases uniformly as the degree of con- 
 fidence decreases. The time of wrong judgments is on the average 
 longer than that of right judgments, while under each category the 
 wrong judgments are in general shorter. The time of wrong judg- 
 ments is more variable than that of right, and there are indications 
 of two kinds of wrong judgments, those too quick and those 
 prolonged beyond a certain optimal time. The degree of confidence 
 varies, from subjects who are perfectly confident in 90 per cent. of 
 500 judgments to those who are perfectly confident in less than 10 
 per cent. While there is a positive correlation on the whole between 
 accuracy and degree of confidence, the latter is not a reliable index of 
 the former. Subjects whose judgments are quick are neither more 
 nor less accurate than those whose judgments are slow." 
 
 In experiments on the effect of length of series on recognition 
 memory, Strong instructed his subjects to grade the confidence of 
 their recognitions of pages of advertisements. Three degrees of cer- 
 tainty were used, "absolutely certain," "reasonably sure," and 
 "very doubtful." Pure guesses were not required. So far as his 
 conclusions bear on the subject of the present study they are as 
 follows : 
 
 "The accuracy approximates with 'very doubtful' recognitions, 
 regardless of the length of the series. . . . Recognitions not accom- 
 panied by a feeling of absolute certainty are practically no better 
 than random guesses. . . . As the difficulty of the task increases, the 
 ratio of 'absolutely certain' recognitions to 'reasonably sure' and 
 'doubtful' recognitions decreases." In general, "we have approxi- 
 mately three fourths the accuracy in pile No. 2 ('reasonably sure') 
 that we find in pile No. 1 ('absolutely certain') and one half the accu- 
 racy in pile No. 3 ('doubtful') that we find in pile No. 1." These 
 results were found only when the various observers and the various 
 tasks were combined. "It was not the case with the individual sub- 
 jects. . . . With each successive series, implying a difference in the 
 difficulty of the task, the relationship between the three piles 
 changed."⁴ In a later study Strong has also investigated the degree 
 of confidence of recognitions of words, after varying intervals. 
 
 ⁴ Strong, "Effect of Length of Series on Recognition Memory," Psych. 
 Rev., Nov., 1912. 
 
 The Present Experiments 
 
 In order to secure an adequate situation for the study of judg- 
 ments of personal efficiency in an active work process, four features 
 must be provided for : 
 
 1. The task should be one in which the performer has reached a 
 practise level of performance which closely approximates his physio- 
 logical or psychological limit. Work on this level of performance 
 will show variations in both directions from an average degree of 
 proficiency. These variations in the directions of "better" and 
 "worse" performance will be approximately equal, except that occa- 
 sional large inferior records may be made, thus producing variations 
 which can not be equalled by deviations in the direction of "better." 
 It is possible that because of this fact, the ideal place for such work 
 would be on the secondary slope of the practise curve. But there 
 should at any rate be no considerable excess of superior performances 
 such as would occur if the worker were still on the primary slope of 
 the curve of practise. 
 
 2. The conditions of performance and the technique of record 
 should be such that, although objective measure of the work is 
 secured, the performer shall have no direct knowledge of these data. 
 The judgment should be based solely on his introspective impressions 
 of the ease, smoothness, agreeableness, or speed of his work. For this 
 situation to be attained the most successful plan is to keep the amount 
 and quality of the work constant and to make the speed of perform- 
 ance (recorded by a second person) the objective measure of efficiency. 
 
 3. Various types of tasks should be examined, ranging from 
 work which is chiefly motor and fairly automatic to work which 
 is mainly mental in character. An intermediate stage should 
 also be represented, and is afforded by tests involving perceptional 
 reactions. In the motor work the observer will be enabled to attend 
 more or less directly and objectively to the progress of the work; on 
 the perceptual level more or less attention will be demanded by the 
 details of the process, and observation will be less direct. In the more 
 exclusively mental work attention may be supposed to be quite occu- 
 pied with the immediate details of performance, and the judgment 
 will be still less direct in character. It is quite conceivable that as 
 one passes from stage to stage the criteria of the judgment of effi- 
 ciency will shift from one ground to another or others. The intro- 
 spective analysis of these criteria constitutes a profitable direction of 
 inquiry. 
 
 4. The various tasks, to be strictly comparable, should be about 
 equally difficult, should continue for about the same time, should be 
 equally practised, and should yield about the same per cent. of 
 correct judgments. 
 
 As tasks which satisfy the above requirements and which are at 
 the same time technically convenient and fairly well standardized, 
 the following three well-known laboratory tests were chosen. 
 
 Stage 1. The Tapping Test. Performer, holding short stylus in 
 right hand, elbow resting on table, tapped 400 times on metal plate at 
 maximal speed. Each tap was recorded by an electric counter and 
 the total time taken with the stop-watch. 
 
 Stage 2. The Color-naming Test. The Woodworth-Wells blank 
 was used, the colors being named in the same order at each 
 trial. The test blank shows 100 patches of color, each 1 cm. square, 
 and separated by spaces of 1 cm. from its neighbors. Each of the five 
 colors blue, red, green, black, and yellow, is repeated twice in each of 
 the 10 lines of 10 colors each. All sequences of the same color are 
 avoided, as are frequent occurrences of the same sequence of colors. 
 The colors are to be named in order, as in reading, as rapidly as pos- 
 sible. The total time was taken with the stop-watch. No errors were 
 permitted. 
 
 Stage 3. Naming Opposites of Words. A series of 50 adjectives 
 used by the writer in a previous study. The performer was required 
 to go down the list, giving in turn the opposite (antonym) of each 
 word and to complete the list as quickly as possible. The total time 
 was recorded with the stop-watch. At each successive trial the order 
 of occurrence of the words was changed, each order being a chance 
 one. No errors were permitted. 
 
 Each test was repeated daily during the major part of the experi- 
 ment. During the later days two daily trials were made. In order 
 to eliminate practise effect, 60 trials of each test were made (cover- 
 ing a period of two months) before the feature of the experiment here 
 reported was introduced. By this time all the performers (three in 
 number) had practically reached a practise level and during the suc- 
 ceeding 72 trials, on which the present study is based, the average 
 amount of gain in the three tests was but slight. The only exception 
 is the color-naming test, which allowed a certain amount of memory. 
 The average records at the beginning of the practise curve, after the 
 60 preliminary trials, and at the close of the experiment, were as 
 follows : 
 
 Test               Initial Average   Average after 60 Trials   Average at Close of Experiment 
 
 Tapping            45.5 sec.         39.0 sec.                 38.0 sec. 
 Color-naming       44.0              37.0                      28.0 
 Naming opposites   46.7              29.0                      26.0 
 
 The three tests seem to satisfy to a sufficient degree the conditions 
 just enumerated as requisite. Each observer, after each trial in each 
 task, judged his performance to have been either "better than usual" 
 or "worse than usual," and assigned a degree of confidence to his 
 judgment. Four degrees of confidence were used, A (absolutely 
 certain), B (fairly certain), C (slightly certain), and D (a mere 
 guess). All records were kept from the performer's knowledge and 
 no computations were made, on the point under investigation, until 
 the experiment was completed. One of the observers (H) was the 
 writer. Of the other two (G and L) G was a college undergraduate 
 music student, with no psychological training. L was a graduate 
 student, with psychological training and with considerable experi- 
 ence both as subject and as experimenter. 
 
 The experiment thus required 132 trials in each of three tasks, 
 by each of three observers, a total of 1,188 trials. The first 60 trials 
 in each test were used for the two purposes of reaching practise level 
 and of giving some sort of definition to the term "as well as usual." 
 The remaining trials (648 in all) were used for the judgments of 
 personal efficiency. In computing results, the median of the 7 trials 
 preceding the trial being judged was taken as the standard of com- 
 parison. The term "as usual" was found to refer no further back 
 than the previous half dozen days or trials. The median was chosen 
 rather than the average because it makes due allowance for occa- 
 sional large variations, which the introspections of the observers 
 showed to be allowed for in the judgments of performance. Each 
 trial is thus compared with the median of the 7 trials immediately 
 preceding it. The direction and amount of difference between the 
 two serve as the objective measure of the efficiency of the trial in 
 question. Comparison of this measure with the observer's subjective 
 estimate of his performance will in this way afford a measure of the 
 correctness of his judgment. Comparison of the amount of this 
 difference with the degree of confidence will show the relation of the 
 feeling of certainty to the variation in performance. Since the 
 time of the performance is not quite the same in all tests nor for all 
 observers (although very nearly so in both cases) in some of the 
 tables the absolute differences between standard and single trial are 
 converted into percentages of the total time for the individual or 
 task in question. 
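 The scoring rule just described can be sketched in a few lines of 
 Python. This is only an illustration under stated assumptions, not a 
 record of how the original computations were carried out: the function 
 name, the sample times, and the treatment of exact ties are all 
 hypothetical, and the percentage is taken against an average total time 
 supplied by the caller. 

```python
from statistics import median

WINDOW = 7  # number of preceding trials that define "usual" (from the text)


def score_trial(times, index, judgment, average_time):
    """Score one judged trial against the median of the 7 trials before it.

    times        -- completion times in seconds, in chronological order
    index        -- position of the judged trial (must be >= WINDOW)
    judgment     -- "better" or "worse", the performer's own verdict
    average_time -- average total time for this test (assumed to be supplied)
    """
    if index < WINDOW:
        raise ValueError("need at least 7 preceding trials")
    standard = median(times[index - WINDOW:index])
    deviation = times[index] - standard        # negative = less time = better
    percent = 100.0 * deviation / average_time
    if deviation == 0:
        correct = None                         # ties: handling is an assumption
    else:
        actual = "better" if deviation < 0 else "worse"
        correct = (judgment == actual)
    return standard, deviation, percent, correct


# Hypothetical tapping times: seven earlier trials, then the judged trial.
times = [39.0, 38.6, 39.4, 38.2, 39.8, 38.8, 39.2, 37.5]
standard, deviation, percent, correct = score_trial(times, 7, "better", 39.0)
print(standard, deviation, round(percent, 1), correct)
```

 With these invented times the standard is 39.0 sec., the judged trial is 
 1.5 sec. faster, and a judgment of "better" is scored as correct. 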
 
 Table I. gives the average results for the three observers, for each 
 of the three processes, showing the average deviation from the usual 
 performance on which each degree of confidence was based (A.S.D. = 
 Average Stimulus Difference). The first part of the table gives the 
 absolute variations in seconds, the latter part giving these variations 
 when expressed as per cent. of the average total time required for 
 the test in question. Table II. gives the same results, when assembled 
 regardless of sign or direction of variation, but classified according 
 to degree of confidence only. Table III. gives the per cent. 
 correctness of all these degrees of confidence and in both directions of 
 variation. This table also gives the distribution of these judgments, thus 
 showing the number of cases on which each average is based. Table 
 IV. gives these same records, regardless of sign. Table V. gives the 
 total distribution of the judgments when classified merely as 
 "judgments of better" and "judgments of worse." The table also gives 
 the actual distribution of the records when thus classified. Both 
 absolute numbers and percentages are given. The sign - is used 
 to indicate "better" (requiring less time) and + to indicate "worse" 
 (requiring longer time) than usual. 
 
 TABLE I 
 
 SHOWING ABSOLUTE AND PERCENTILE DEVIATIONS FROM "USUAL" ON WHICH 
 THE VARIOUS DEGREES OF CONFIDENCE WERE BASED; CALLED, IN FOLLOWING 
 PAGES, A.S.D. (AVERAGE STIMULUS DIFFERENCE). TABLE GIVES 
 AVERAGE CONSTANT ERRORS AND AVERAGE M.V.'s 
 FROM THESE CONSTANT ERRORS 
 
 Better 
 
                               A               B               C               D 
 Test                    A.S.D.  M.V.    A.S.D.  M.V.    A.S.D.  M.V.    A.S.D.  M.V. 
 
 Seconds:    Tapping      -1.5   0.8      -1.2   0.9      -0.7   0.8      -0.5   1.3 
             Colors       -2.9   1.3      -1.5   1.5      -0.6   1.3      -0.8   1.3 
             Opposites    -3.0   0.8      -1.7   1.3      -0.7   1.2      -0.1   1.4 
 
 Per cent.:  Tapping      -3.3            -3.2            -1.7            -1.3 
             Colors       -9.6            -5.0            -2.0            -2.8 
             Opposites   -11.6            -6.6            -2.7            -0.4 
 
 Av. per cent.            -8.4            -4.9            -2.1            -1.5 
 
 Worse 
 
                               A               B               C               D 
 Test                    A.S.D.  M.V.    A.S.D.  M.V.    A.S.D.  M.V.    A.S.D.  M.V. 
 
 Seconds:    Tapping      +2.5   0.9      +1.4   1.0      +0.9   0.6      +0.9   1.0 
             Colors       +3.7   0.6      +2.0   1.2      +1.6   1.3      +0.6   1.9 
             Opposites    +5.4   1.2      +1.6   1.2      +1.3   1.7      +0.7   1.4 
 
 Per cent.:  Tapping      +6.5            +3.6            +2.4            +2.4 
             Colors      +12.3            +6.6            +5.1            +2.0 
             Opposites   +21.0            +6.0            +5.2            +2.6 
 
 Av. per cent.           +13.3            +5.4            +4.2            +2.3 
 
 TABLE II 
 
 SHOWING STIMULUS DIFFERENCES REGARDLESS OF THEIR DIRECTION 
 
                  Absolute Differences       Percentile Differences 
 Test              A     B     C     D         A      B     C     D 
 
 Tapping          2.0   1.3    .8    .7        5.2   3.4   2.0   1.9 
 Color-naming     3.3   2.0   1.1    .7       10.9   5.8   3.6   2.4 
 Opposites        4.2   1.6   1.0   [illegible]  16.3   6.3   3.9   1.5 
 Averages         3.2   1.6   1.0    .7       10.8   5.2   3.2   1.9 
 
 TABLE III 
 
 SHOWING THE CORRECTNESS AND DISTRIBUTION OF THE VARIOUS DEGREES OF 
 CONFIDENCE 
 
                                         Better                 Worse 
                     Test           -A   -B   -C   -D      +A   +B   +C   +D 
 
 Per cent. correct   Tapping        87   82   74   57     100   77   79   72 
                     Color-naming  100   82   58   59     100   80   79   58 
                     Opposites     100   85   69   53     100   78   70   56 
                     Averages       96   83   67   56     100   79   76   62 
 
 Distribution of     Tapping        31   37   36   27      15   19   24   27 
 the Judgments       Color-naming   16   38   43   43       9   15   19   33 
                     Opposites      12   42   40   27      15   15   28   37 
                     Totals         59  117  119   97      39   49   71   97 
 
 TABLE IV 
 
 SHOWING CORRECTNESS AND DISTRIBUTION OF THE JUDGMENTS REGARDLESS OF 
 SIGN 
 
                  Percentile Correctness           Distribution 
 Test              A     B     C     D             A     B     C     D 
 
 Tapping           94    80    77    65            46    56    60    54 
 Color-naming     100    81    68    59            25    53    62    76 
 Opposites        100    81    70    54            27    57    68    64 
 Averages          98    81    73    59   Totals   98   166   190   194 
 
 TABLE V 
 
 SHOWING THE DISTRIBUTION OF THE JUDGMENTS AND OF THE ACTUAL RECORDS, 
 WITH RESPECT TO "BETTER" AND "WORSE" 
 
                     Test            Worse        Better       Total 
 
 Distribution of     Tapping          85 (39%)    131 (61%)     216 
 the Judgments       Color-naming     76 (35%)    140 (65%)     216 
                     Opposites        95 (44%)    121 (56%)     216 
                     Totals          256 (39%)    392 (61%)     648 
 
 Distribution of     Tapping          99 (46%)    117 (54%)     216 
 the Actual Cases    Color-naming     94 (43%)    122 (57%)     216 
                     Opposites        92 (42%)    124 (58%)     216 
                     Totals          285 (44%)    363 (56%)     648 
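 
 Tables III., IV., and V. report the same 648 judgments grouped in three 
 different ways, so their counts can be checked against one another. The 
 following Python sketch is offered only as a cross-check and is not part 
 of the original analysis; the variable names are hypothetical. The 
 "better" counts for opposites are obtained here by subtraction from the 
 totals row of Table III. rather than entered directly. 

```python
# Counts per degree of confidence (A, B, C, D), taken from Table III.
better = {
    "Tapping":      [31, 37, 36, 27],
    "Color-naming": [16, 38, 43, 43],
}
worse = {
    "Tapping":      [15, 19, 24, 27],
    "Color-naming": [9, 15, 19, 33],
    "Opposites":    [15, 15, 28, 37],
}
better_totals = [59, 117, 119, 97]  # totals row of Table III, "better" half

# Recover the Opposites "better" row as totals minus the other two tests.
better["Opposites"] = [
    total - better["Tapping"][i] - better["Color-naming"][i]
    for i, total in enumerate(better_totals)
]

# Table IV: distribution regardless of sign.
table_iv = {
    "Tapping":      [46, 56, 60, 54],
    "Color-naming": [25, 53, 62, 76],
    "Opposites":    [27, 57, 68, 64],
}
# Table V: (worse, better) counts of judgments per test.
table_v = {
    "Tapping":      (85, 131),
    "Color-naming": (76, 140),
    "Opposites":    (95, 121),
}

for test in table_iv:
    combined = [b + w for b, w in zip(better[test], worse[test])]
    assert combined == table_iv[test], test
    assert (sum(worse[test]), sum(better[test])) == table_v[test], test

grand_total = sum(sum(row) for row in table_iv.values())
print(better["Opposites"], grand_total)  # prints [12, 42, 40, 27] 648
```

 Every check passes, which indicates that the three tables are mutually 
 consistent in their counts. 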
 
 Several interesting points are suggested by these tables : 
 1. The observer's judgments of the efficiency of his own 
 performance, in successive daily trials in these tests, have a reliability which 
 varies with the confidence of the judgments. Judgments of 
 "absolutely certain" are always correct (100 per cent.) except in the case 
 of judgments of superior performance in tapping, where the average 
 per cent. correctness of the three observers is 87 per cent. 
 Judgments which are "fairly certain" and "slightly certain" show 80 
 per cent. and 70 per cent. correctness respectively. "Pure guesses" 
 are correct in 60 per cent. of the cases. In all tests with all observers 
 the correctness of pure guesses is greater than that to be expected 
 from mere chance. This result accords with those of earlier 
 investigations on judgments of sensory discrimination (Cattell, Griffing, 
 Henmon, Jastrow, etc.). 
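 
 The claim that pure guesses exceed chance can be illustrated with an 
 exact binomial calculation. The sketch below is a modern illustration 
 and not a computation from the original study. It assumes that the 194 
 D judgments of Table IV. may be treated as independent trials, and it 
 rounds the 59 per cent. average to a whole number of correct guesses, 
 since the exact count is not printed. The function name is hypothetical. 

```python
from math import comb


def upper_tail(successes, trials, p=0.5):
    """Exact probability of at least `successes` hits in `trials` draws
    when each draw succeeds independently with probability p."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


n_guesses = 194                      # D judgments, Table IV
n_correct = round(0.59 * n_guesses)  # assumed count; only the per cent. is given
p_value = upper_tail(n_correct, n_guesses)
print(n_correct, p_value)
```

 Under these assumptions the probability of doing this well by coin 
 tossing alone comes out well below one in twenty, which agrees with the 
 statement in the text. 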
 
 2. Judgments of "better" seem to be based on smaller variations 
 than are judgments of "worse." If the average of all three tests is 
 regarded this is true of all degrees of confidence. Almost twice as 
 great per cent. inferiority is found for a given type of judgment of 
 "worse" as that per cent. of superiority required to produce a 
 judgment of "better." Considering the tests separately this rule holds 
 of all the judgments except in the cases of the B judgments in oppo- 
 sites and the D judgments in color-naming, in which cases no con- 
 siderable difference whatever is present. In the case of the three 
 observers this rule holds without exception in the case of the A 
 judgments in all tests. The remaining degrees of confidence do not 
 show the relation clearly in the individual records. 
 
 There are three possible explanations of this apparently finer dis- 
 crimination in the case of judgments of superior efficiency. 
 
 A. It may indicate merely a predisposition on the part of the 
 performer to judge his work as good rather than as poor, thus 
 revealing only a prejudice in favor of judgments of "better." If 
 this is the case, the variations in performance on which these "better" 
 judgments are based will be small because of the frequent occurrence 
 of inferior trials which are judged to be superior. This would result 
 in a reduction of the threshold for the class of judgments in question, 
 since frequent + variations would cancel the larger variations. 
 But if this were the case the judgments of "better" would show a 
 lower percentage of correctness than that of the judgments of 
 "worse" since the latter would have been based for the most part 
 on only the more pronounced cases of inferior performance. 
 
 But reference to the table which gives the correctness of the 
 various classes of judgments does not clearly show this to have been 
 the case. In the case of opposites the "better" judgments are no less 
 correct than are the judgments of "worse." In fact the total correct- 
 ness is slightly higher in the case of the former. In color-naming 
 the same thing is true for A, B, and D judgments. Only in the case 
 of the C judgments is there an exception. Tapping alone affords a 
slightly greater percentage of correctness in the case of the "worse" 
judgments. The average results of the three tests give 76 per cent. 
and 79 per cent. correct in the two directions. Or if the categories 
be disregarded in the computation of correctness, 75 per cent. of the 
"worse" judgments are correct and 76 per cent. of the "better."
 
 The judgments of "better" are then about as correct on the whole as 
 those of "worse," and this in spite of the fact that the former are 
 based on much smaller variations in efficiency. It does not yet seem 
 then that prejudice in favor of efficiency judgments affords adequate 
 explanation of the differences in threshold. 
 
 B. The relation may be supposed to follow from the mere fact 
 that, when a performer is approximating his physiological level there 
 will occur very few large deviations in the direction of superiority, 
 whereas occasional lapses, interferences, distractions, and accidents 
 might produce large deviations in the direction of inferiority. These 
 large deviations then would tend to increase the average variations 
 from the standard in the case of the judgments of "worse" beyond 
 the point which might be actually necessary as the ground for the 
 given type of judgment. 
 
TABLE VI 

SHOWING THE DISTRIBUTION OF THE ACTUAL RECORDS (DEVIATIONS FROM 
"USUAL"), WITH RESPECT TO THEIR MAGNITUDE 

                    Sec.  0-1  1-2  2-3  3-4  4-5  5-6  6-7  7-8  8-9
Tapping:      H     +       9   15    7    1
                    −      17    8   11    1
              G     +       6   12    8    3         1    1
                    −      10   13   12    3    1
              L     +      14   13    3    1
                    −      25   14    2
              Total +      29   30   18    5         1    1
                    −      52   35   25    4    1

Color naming: H     +       9    6    4    5    2    3    3    1
                    −       9    6    7    8    4    2    1
              G     +      10    8    4    4         1
                    −      12   14    5    5    2    1
              L     +       5    6    2    3    2    2    1
                    −      20   10    9    5    …
              Total +      24   20   19   12    …
                    −      41   30   21   18    …

Opposites:    H     +       7    7    8    4    4    2    1         1
                    −      13   12    8    1    3    1
              G     +      11    2    8    2    3    3
                    −      14   11    9    6    2         1
              L     +       6    9    5    4    2         1
                    −      18   12    6    4    1    0    0
              Total +      24   18   21   10    9    5    2
                    −      45   35   23   11    6    1    1

The possibility that the larger variations for "worse" judgments 
are merely the result of accidental large inferior deviations is not so 
easily disposed of, but there seems to be sufficient evidence to show 
that this factor is not the only one at work. As a matter of fact, when 
the + and − variations are grouped, as in Table VI., according 
 to their magnitude, there is found to be no excessive number of infe- 
 rior records, although such were theoretically possible, and would 
 perhaps have occurred had not the performers been both zealous and 
 competitive, and nearly on a practise level. In tapping, the largest 
variations (over 3 sec.) show almost equal distribution for all subjects. 
The − variations predominate in the smaller groups as the 
result of slight practise in the course of the experiment. In color-naming 
the variations are larger than in tapping, but since there continued 
to be considerable practise in this test, large − deviations are 
just as frequent as are large inferior records. In fact, with G the 
 former are more numerous. But the color-naming shows the supe- 
 riority of judgments of "better" for A, B, and C degrees of con- 
 fidence, and one need not expect to find it in the D judgments, which 
were pure guesses. In the case of opposites we clearly have a preponderance 
of large inferior trials, with all observers. If this is 
the factor which is responsible for the higher averages of the "worse" 
 judgments, we ought then to find this result most striking in the 
 opposites test. But just the reverse is the case. Opposites is just the 
 test which affords several exceptions to the generalization. 
 
 Moreover, even if there were a considerable excess of large positive 
 deviations (worse) these would only affect necessarily the A judg- 
 ments. The B, C, and D judgments would still be based on variations 
 chosen by the observer at will. But the B, C, and D judgments show 
 the same tendency, on the whole, as do the A judgments, smaller 
 variations for judgments of "better" than for equally confident 
 judgments of "worse." 
 
 C. The present indication seems to be, then, that efficiency is 
 judged on the basis of smaller variations than is inefficiency. Does 
 this mean that the criteria of judgments of efficiency are more definite 
 or more numerous or more clearly detected, and hence that the 
 "feeling of efficiency" arises on smaller provocation than does the 
 "feeling of inefficiency"? The point constitutes an interesting 
 problem for future work, and will be taken up again in a later 
 chapter. 
 
 3. Progressively larger variations in performance (both absolute 
 and relative) are required as the basis of judgments of a given degree 
 of confidence, as one passes from tapping, through color-naming, to 
 opposites. With A judgments (see also records regardless of sign) 
 this increase is very apparent. Judgments are passed with absolute 
certainty on the basis of an average deviation of 5.2 per cent. in 
tapping, but in color-naming 10.9 per cent. and in opposites 16.3 
per cent. deviation is necessary to produce A judgments. The C 
 judgments show this same increase without exception, and the B 
 judgments differ only in the case of judgments of "worse" in oppo- 
 sites. The D judgments (pure guesses) show, as might be expected, 
 no clear differences. 
 
 These differences in performance required for judgments of a 
 given degree of confidence are not entirely a function of the varia- 
 bility of the trials in the three tests. The largest number of A 
 judgments, as well as the smallest percentile variation for a given 
 kind of judgment, comes in the tapping test, which is the least 
variable performance with all three individuals, in terms of per cent. 
 variability. If the absolute variability be considered, the three tests 
 all show practically the same mean variability, which varies from 
 1 to 2 seconds. Table VII. shows the average total time and the M.V. 
 of 25 consecutive trials in each test, the trials being taken from the 
 middle section of the experiment. 
 
TABLE VII 

SHOWING THE VARIABILITY OF THE TESTS 

                      H                    G                    L 
Test            Av.   M.V.  M.V.%    Av.   M.V.  M.V.%    Av.   M.V.  M.V.% 
Tapping         40.3  1.5   3.7      40.1  2.0   5.0      36.5  1.0   2.7 
Color-naming    28.9  2.0   6.6      27.3  1.4   5.1      27.7  2.0   7.2 
Opposites       30.3  2.0   6.6      26.6  1.8   6.7      23.7  1.6   6.7 
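
The M.V.% columns of Table VII. appear to be the mean variation expressed as a percentage of the average time. A minimal sketch of that arithmetic, assuming this reading is correct (the function name `mv_percent` and the dictionary layout are illustrative and do not come from the text): 

```python
# Sketch: mean variation (M.V.) expressed as a percentage of the average time.
# Assumption: M.V.% = 100 * M.V. / Av.; all names here are illustrative only.


def mv_percent(average: float, mean_variation: float) -> float:
    """Return the mean variation as a percentage of the average."""
    return 100.0 * mean_variation / average


# (Av., M.V.) pairs in seconds, as printed in Table VII.
TABLE_VII = {
    "Tapping": {"H": (40.3, 1.5), "G": (40.1, 2.0), "L": (36.5, 1.0)},
    "Color-naming": {"H": (28.9, 2.0), "G": (27.3, 1.4), "L": (27.7, 2.0)},
    "Opposites": {"H": (30.3, 2.0), "G": (26.6, 1.8), "L": (23.7, 1.6)},
}

if __name__ == "__main__":
    for test, observers in TABLE_VII.items():
        for observer, (av, mv) in observers.items():
            print(f"{test:<13} {observer}: {mv_percent(av, mv):.1f} per cent.")
```

Values recomputed in this way may differ from the printed column in the last digit, since the printed averages and mean variations are themselves rounded. 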
 
This progression is doubtless partly dependent on decrease in 
 the objectivity and automatic character of the three kinds of work. 
 The more automatic and motor the work the greater the precision of 
 the judgment of efficiency of performance. As the task comes to in- 
 volve a greater proportion of more strictly mental work (association, 
 memory, discrimination, choice, etc.) the judgments delivered with a 
 given degree of confidence come to require larger and larger varia- 
 tions. Does this change involve a shift in the criteria (as for ex- 
 ample, a shift from estimates of mere duration to reliance on affec- 
 tive processes, feelings of ease, smoothness, pleasantness, etc.) ? Or 
 does it involve merely a greater degree of some fairly constant cri- 
 terion or criteria-complex? Is it perhaps due to the mere fact that 
 there is better opportunity to observe the efficiency of an automatic 
process since it requires little attention itself? Systematic introspection 
during such an experiment would doubtless throw interesting 
light on the basis of the feeling of efficiency, and perhaps on the affective 
consciousness generally. Comparison of the judgments of a 
witness with judgments of the performer would be especially interesting. 
The following two chapters will report an experiment in 
which these additional factors were studied. 
 
4. In all three tests the various degrees of confidence have a very 
constant ratio of correctness. About 60 per cent. of the D judgments, 
70 per cent. of the C judgments, 80 per cent. of the B judgments, 
and 98 per cent. of the A judgments, are correct. Fullerton 
and Cattell point out that these ratios of correctness do not measure 
the intensity or amount of the feeling of confidence. The truth of 
this statement is obvious when one reflects that 50 per cent. of the 
judgments should be correct by mere chance. Perhaps a fairer measure 
of the amount of confidence is secured by subtracting this 50 per 
cent. chance correctness from each total correctness, thus leaving the 
various degrees of confidence as represented by magnitudes A (48), 
B (30), C (20), D (10). This would make the zero point the amount 
of confidence possessed by a judge who had absolutely no knowledge 
of what had happened. A still fairer measure would perhaps be the 
P.E. required for the given per cent. correctness. That there is, in 
the present experiments at least, a greater distance between the feeling 
of absolute certainty (A) and the first degree of uncertainty (B) 
than there is between the various degrees of uncertainty, agrees with 
the writer's own introspections. This is also borne out by the fact 
that the variations underlying these degrees of confidence do not 
increase by equal steps, but almost by equal multiples. The average 
deviation of the C judgments, regardless of sign, is about twice that 
of the D judgments, that of the B's twice that of the C's, and that of 
the A's twice that of the B's, if the percentile deviations be considered. 
If the absolute deviations be taken, they increase by 50 per 
cent. increments from D to C and from C to B, but the step from 
B to A represents an addition of 100 per cent. over the B judgments. 
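
The subtraction of chance correctness described above can be written out as a short sketch. The names `CHANCE`, `PER_CENT_CORRECT`, and `above_chance` are illustrative; the figures are the round values quoted in the paragraph. 

```python
# Sketch: confidence measured as per cent. correctness in excess of chance.
# The 50 per cent. chance level and the four round figures are those quoted
# in the text; the variable and function names are illustrative only.

CHANCE = 50  # per cent. correct expected from pure guessing between two answers

PER_CENT_CORRECT = {"A": 98, "B": 80, "C": 70, "D": 60}


def above_chance(per_cent_correct: dict, chance: int = CHANCE) -> dict:
    """Subtract the chance level from each degree's per cent. correctness."""
    return {degree: value - chance for degree, value in per_cent_correct.items()}


if __name__ == "__main__":
    for degree, excess in above_chance(PER_CENT_CORRECT).items():
        print(f"{degree}: {excess} per cent. above chance")
```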
 
 By referring to tables for determining the P.E. from the per- 
 centage of right cases and amount of difference, as in the method of 
 right and wrong cases, we get: 
 
Degree of confidence              A      B      C      D 
Average difference, per cent.   10.8    5.2    3.2    1.9 
Per cent. right judgments        98     81     73     59 
Diff./P.E.                      3.05   1.30    .91    .34 
P.E.                             3.1    4.0    3.5    5.6 

That is to say, the average probable error, the amount of variation 
which will be judged correctly in 75 per cent. of the cases, is about 
4 per cent. of the "usual" record. 
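
The table look-up used above can be approximated numerically. The sketch below assumes that the tables consulted rest on the normal law of error, so that the ratio Diff./P.E. is the normal deviate for the given proportion of right cases divided by the deviate for 75 per cent.; that assumption, and the names `diff_over_pe` and `probable_error`, are not taken from the text. 

```python
# Sketch: probable error (P.E.) from the percentage of right cases.
# Assumption (not stated in the text): the tables follow the normal law, so
# Diff./P.E. = z(p) / z(0.75), where z is the inverse normal distribution.
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()
_Z_75 = _STANDARD_NORMAL.inv_cdf(0.75)  # deviate judged right 75 times in 100


def diff_over_pe(proportion_right: float) -> float:
    """Ratio of the difference to the probable error for a proportion right."""
    return _STANDARD_NORMAL.inv_cdf(proportion_right) / _Z_75


def probable_error(average_difference: float, proportion_right: float) -> float:
    """Probable error in the same units as the average difference."""
    return average_difference / diff_over_pe(proportion_right)


if __name__ == "__main__":
    rows = {"A": (10.8, 0.98), "B": (5.2, 0.81), "C": (3.2, 0.73), "D": (1.9, 0.59)}
    for degree, (difference, proportion) in rows.items():
        ratio = diff_over_pe(proportion)
        pe = probable_error(difference, proportion)
        print(f"{degree}: Diff./P.E. = {ratio:.2f}, P.E. = {pe:.1f}")
```

Under this assumption the four ratios agree closely with the printed row, and the B, C, and D columns reproduce the printed P.E. For the A column the same arithmetic gives a value nearer 3.5 than the printed 3.1, a difference this sketch does not attempt to explain. 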
 
5. Individual differences in the use of the various degrees of confidence, 
in the percentile correctness, and in the probable error, have 
been pointed out by Fullerton and Cattell and by Henmon. The 
 present study of but three observers does not afford sufficient mate- 
 rial for individual comparisons of any reliability. The numbers of 
 cases of a given sort vary from individual to individual and in some 
 instances are small. With respect to the amount of variation on 
 which the various judgments are based, the results are much the same 
 for all observers in those cases in which the number of trials is large 
 enough to make comparison reliable. The same thing must be said 
 of the correctness of judgment. Such differences as are found are 
 either small or are in no consistent direction. With respect to the 
distribution of judgments ("better" or "worse") in tapping no individual 
differences are present. The judgments of "better" are 
somewhat in excess, but so are the actual cases of superior performance, 
to a slight degree. In color-naming the judgments of G and L 
are skewed considerably toward the "worse" side, but the actual cases 
 are distributed in much the same way with these two observers. With 
 H the actual cases of each sort are equal and the distribution of judg- 
 ments is uniform. In the case of opposites much the same situations 
 are present. 
 
 Summary 
 
 1. The study of the conditions, validity, and laws of judgments of 
 personal efficiency offers a fruitful field of inquiry, with respect to 
 the psychology of judgment, the learning process, affective conscious- 
 ness, the psychology of work, and individual differences. 
 
 2. In the tests examined, an individual's judgments of the effi- 
 ciency or inefficiency of his own performance possess a degree of 
correctness which varies with his degree of confidence. In this respect 
judgments of performance resemble judgments of sensory discrimination 
and of recognition memory. The relative per cent. correctness 
of the four degrees of certainty is 98, 80, 70, and 60. Pure 
guesses are more likely to be right than wrong. 
 
 3. The feeling of efficiency arises on slighter provocation than 
 does the feeling of inefficiency. Judgments of greater efficiency, hav- 
 ing a given degree of confidence, are based on smaller variations in 
 performance than are equally confident judgments of inferior per- 
 formance. 
 
4. Judgments of "better than usual" show nearly as high per 
cent. correctness as do judgments of "worse than usual," although 
 the former are based on variations about one half as great as those on 
 which the latter are based. 
 
5. There is a slight predisposition toward the delivery of judgments 
of "better," the distribution being, however, on the average, 
within 5 per cent. of the actual ratio of occurrence of superior and 
inferior trials. 
 
 6. Progressively larger variations in performance are required as 
 the basis for judgments of a given degree of confidence as one passes 
 from an automatic, objectively observable, motor performance (such 
 as tapping), through work involving perceptional reactions (color- 
 naming), to work of a more strictly mental and less objectively ob- 
 servable character (opposites). 
 
 7. No evidence is here afforded on the question as to whether this 
 decreasing precision of judgment depends on a shift in criteria (as 
 from estimates of duration to reliance on affective processes) or on 
 the greater intensity or clearness of some fairly constant criterion or 
 criteria-complex. 
 
8. A variation of 4 per cent. from "usual" will be judged correctly 
in 75 per cent. of the cases in which it occurs. 
 
9. Judgments of A, B, C, and D degrees of confidence show a per 
cent. correctness which is respectively 48 per cent., 30 per cent., 20 
per cent., and 10 per cent. greater than would result from chance 
estimates (Δ/P.E. = 3.05, 1.30, .90, .34). These ratios are confirmed 
by introspection as approximate measures of the intensity of 
 the "feeling of certainty" in the four cases. These ratios do not 
 differ essentially from the corresponding degrees of correctness of 
 similar judgments of sensory discrimination and recognition memory. 
 They depend, in part, however, on the character and difficulty of the 
 task and on the range of variation in stimulus, stimulus difference, 
 and time of performance. 
 
 10. The number of observers is insufficient for the determination 
 of the nature or degree of individual differences.
 
 CHAPTER II 
 
 PERCEPTUAL CRITERIA OF JUDGMENTS OF EFFICIENCY 
 
 IN daily life these judgments of personal efficiency are frequently 
 expressed. A worker asserts that his work is "going unusually 
 well," that he is "in fine form," or, on the other hand, that he is 
 "not himself," that his work is not "up to its usual standard," etc. 
 Not only does the performer himself pass such judgments, but wit- 
 nesses may make similar remarks. These judgments may be deliv- 
 ered with varying degrees of confidence, ranging from pure guessing 
 to absolute assurance. They are passed on muscular work involving 
 only strength or endurance, on work requiring more or less coordi- 
 nation, on work involving sensory discrimination and perceptional 
 reaction, and on more exclusively mental work. Shovelling coal, 
 riding a bicycle, playing tennis, target shooting, mathematical calcu- 
 lation, and writing sonnets represent such gradations in daily life. 
 
 In many of these concrete situations the judgments of personal 
 efficiency may be determined or supported by reference to the ob- 
 jective result of the work, the wages earned, the score attained, etc. 
 In such cases we should perhaps speak of "inferences" rather than 
 of "judgments." But even in the absence of knowledge of the ob- 
 jective results a worker may estimate the efficiency of his work, and 
 in these cases he does it by some direct process which seems, before 
 analysis at any rate, to be correctly described as "judgment" of the 
 most primary sort. Such judgments are often said to be the expres- 
 sion of "feelings," feelings of efficiency, of inefficiency, etc. 
 
 In the preceding chapter was reported a study of the distribution, 
 confidence, and accuracy of such judgments. The present chapter 
 presents the results of a further study designed to investigate the 
 characteristics and criteria of these judgments, the way in which 
 these features vary with the nature of the task, the effects of practise 
 on the correctness of the judgments, and the relation, in all these 
 respects, between judgments of one's own performance and judg- 
 ments of the work of another person. 
 
 Four observers have taken part in the experiment, two men and 
 two women, the two men being professional psychologists, one of the 
 women an experienced psychological observer, the other a beginner. 
 The work consisted in the repeated performance of four standard 
laboratory tests,1 as follows: 

1 For further discussion of the nature, technique and significance of these 
 
 (a) Color-naming, the Woodworth-Wells blank, containing 20 
 repetitions of each of 5 colors, the four positions of the card being 
 used in succession. 
 
(b) Naming Opposites, a list of 50 adjectives, the antonyms to 
 which were to be given as quickly as possible. The list was one used 
 by the writer in previous studies, the average time of naming the 
 opposites ranging from 2 to 5 seconds per word. The 50 words oc- 
 curred in chance order, the order being changed at each trial. 
 
 (c) Cancellation, crossing out the 3's and 5's from the Wood- 
 worth-Wells form of this test, the first 10 lines, containing 50 repeti- 
 tions of each digit, being used. 
 
 (d) Addition, adding 17 mentally to each of 50 two-place num- 
 bers and calling out the correct answer. The numbers occurred in 
 changed random order at each trial. 
 
 The time of performance was taken, in fifth-seconds, for each 
 trial, the quantity and quality of the work being maintained con- 
 stant. Each observer made 104 trials at each test, the first 4 trials 
 being considered preliminary. After the completion of the trial, and 
 before the operator had recorded or even noticed the time measure- 
 ment, both performer and operator judged the performance to have 
 been either "better than usual" or "worse than usual," and as- 
 signed to the judgment one of four degrees of confidence, A, B, C, 
 or D (A representing absolute certainty, and D a mere guess). Both 
 judgments were recorded independently, after which the objective 
 measurement was recorded. This procedure thus yields, for each of 
 the four tests, 100 judgments from each of four performers and 100 
 judgments from each of four witnesses, a total of 3,200 judgments. 
 The experiment occupied a two-hour session on each of 9 successive 
 days, and 10 to 12 trials of each test were made at each sitting. 
 
 Toward the close of the experiment each subject was given the 
 following schema for systematic introspection. The two arrange- 
 ments of criteria were made on separate occasions, the first on the 
 eighth and the second on the ninth day. After the completion of the 
experiment each observer was asked to answer the supplementary 
questions. 
 
 SCHEMA FOR INTROSPECTION 
 
A. Feelings of ease and comfort or of strain and uncertainty as the test proceeds. 

B. Feelings of pleasantness and satisfaction or of unpleasantness and dissatisfaction, 
   either during the test or after its completion. 

C. The perception of the smoothness and regular flow or of the roughness and 
   irregularity of the performance. 
 
tests, and their usefulness as psychological instruments, the reader is referred to 
the writer's monograph, "The Influence of Caffein on Mental and Motor Efficiency," 
ARCHIVES OF PSYCHOLOGY, No. 22. 1912. Science Press. 
 
D. Direct estimate of the total time interval or duration of the test from beginning 
   to end, regardless of what happens during the performance of the test. 

E. Perception of the speed or rate of succession of the separate acts which the 
   test involves (as each word, each problem, etc.). 

F. Inference based on the number or amount of specific mistakes, hesitations, 
   successes, observed during the test or remembered after its completion. 

G. Feelings of surprise, or of fulfilled or unfulfilled expectation, when the end 
   of the test is reached. 
 
 H. Unanalysable and indefinable feeling of efficiency or of inefficiency. 
 J. Any other specific criteria which you may have noted. 
 
 QUESTIONS ON THE SCHEMA 
 
 1. Think over the way in which you judge your own performance in each of 
 the tests. Arrange the above factors in the order of their importance with re- 
 spect to the degree to which they constitute the basis or criteria for your judg- 
 ments of your own work. Place the most important first, then the next in im- 
 portance, etc. Do this separately for each of the four tests. 
 
 2. Now think over the way in which you judge the performance of another 
 person, and arrange the above criteria in the order of their importance, sepa- 
 rately for each of the four tests, as was done in question 1. 
 
 SUPPLEMENTARY QUESTIONS 
 
 1. When do you feel the greater security or certainty, when judging your 
 own performance or when judging that of another person? 
 
 2. In which case do you think you can detect smaller changes or variations 
 in efficiency of performance, when judging yourself or when judging another 
 person, in these tests? 
 
 3. In which of these four tests do you think your judgments are delivered 
 with the greatest degree of confidence? Arrange the four tests in order of de- 
 creasing confidence, both for when judging yourself and for when judging 
 another person. 
 
 4. In which of the tests do you believe you can detect the smallest changes 
 in performance? Arrange the four tests in order, for this point, as in the pre- 
 ceding question. 
 
 5. When judging your own performance and that of another, which of the 
 following is or are true? 
 
 (a) A judgment is made tentatively during the performance and this judg- 
 ment is modified and corrected as the test proceeds, the judgment thus being 
 ready at the moment when the test is completed. 
 
(b) No judgment is made until the test is all completed, when the judgment 
 is formed by thinking back over the test as a whole, as it was performed on the 
 given occasion. 
 
 (c) At the end of the test the judgment simply comes, of its own accord, 
 and fully formed. It is not made tentatively during the test, nor is it necessary 
 to think back over the particular performance. 
 
The present paper will present the results of this systematic introspection, 
an examination of the total per cent. correctness of the 
judgments, a statement of the influence of practise on correctness, a 
comparison of the process of judging one's self with that of judging 
another person, and some points on individual and test differences. 

TABLE VIII 
INDIVIDUAL ARRANGEMENTS OF THE CRITERIA OF JUDGMENT 

                             Order when Judging Self       Order when Judging Another Person 
                                   Observers                         Observers 
The Test            Position  H     P     L     R             H     P     L     R 

Color naming:          1      C     E     E     F             C     C     F     H 
                       2      E     C     F     A             E     E     C     C 
                       3      A     F     C     C             F     F     D     A 
                       4      F     A     B     D             G     D     E     E 
                       5      G     B     A     H             D    (A)    G     F 
                       6      B     H     D     B             A    (B)   (H)    D 
                       7      D    (D)   (G)    G             B    (G)   (A)    B 
                       8      H    (G)   (H)    E             H    (H)   (B)    G 

Naming opposites:      1      E     E     F     H             E     C     F     A 
                       2      A     C     C     C             C     E     C     C 
                       3      C     F     G     E             D     F     D     E 
                       4      F     A     B     B             F     D     E     F 
                       5      B     B     E     G             G    (A)    G     H 
                       6      G     H     A     F             A    (B)   (H)    B 
                       7      D    (D)    D     D             B    (G)   (A)    G 
                       8      H    (G)   (H)    A             H    (H)   (B)    D 

Cancellation:          1      E     C     A     C             E     C     E     H 
                       2      C     F     E     E             F     E     F     F 
                       3      F     E     F     F             G     F     C     E 
                       4      D     A     H     A             A     D     D     C 
                       5      A     H     B     B             B    (B)    G     B 
                       6      B    (B)    D     D             C    (G)   (H)    D 
                       7      G    (D)    C     G             D    (A)    A     G 
                       8      H    (G)   (G)    H             H    (H)    B     A 

Adding:                1      A     F     E     C             E     C     F     H 
                       2      E     E     A     H             G     E     E     A 
                       3      C     C     B     G             F     F     C     C 
                       4      F     A     C     E             C     D     D     B 
                       5      D     B     F     F             A    (A)    G     E 
                       6      G     H     G     B             B    (B)   (H)    F 
                       7      B    (D)    D     D             D    (G)    A     D 
                       8      H    (G)    H     A             H    (H)   (B)    G 

Brackets indicate criteria not used. 

 The Criteria of Judgment. The eight items included in the 
 schema proved to be a complete enumeration of the criteria used by 
 all four observers. These eight criteria being arranged in order of 
 importance by each observer, for each test, and both for judging as 
 performer and for judging as witness, the final position of importance 
for each criterion is determined by averaging the four arrangements 
for the given situation. The individual orders are given in Table 
VIII. The average positions of the eight criteria are given in 
Table IX. 
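
The averaging of positions described above can be sketched as follows. The four orders are transcribed from the color-naming rows of Table VIII. (judging self); because the scan of that table is damaged, the transcription should be treated as an assumption, and the function name `average_positions` is illustrative. 

```python
# Sketch: average position of each criterion across the observers' rankings.
# Each string lists criteria from position 1 to position 8 for one observer.
# Assumption: orders transcribed from Table VIII (color-naming, judging self).

COLOR_NAMING_SELF = {
    "H": "CEAFGBDH",
    "P": "ECFABHDG",
    "L": "EFCBADGH",
    "R": "FACDHBGE",
}


def average_positions(orders: dict) -> dict:
    """Return the mean rank position of every criterion over all observers."""
    totals = {}
    for ranking in orders.values():
        for position, criterion in enumerate(ranking, start=1):
            totals[criterion] = totals.get(criterion, 0) + position
    observers = len(orders)
    return {criterion: totals[criterion] / observers for criterion in sorted(totals)}


if __name__ == "__main__":
    for criterion, mean_rank in average_positions(COLOR_NAMING_SELF).items():
        print(f"{criterion}: {mean_rank:.2f}")
```

A lower average denotes a more important criterion. Rounded to one decimal, these averages can be set beside the "Colors" column of Table IX. 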
 
 It is clear at once, from Table IX., that criteria E (perception of 
 speed or rate of succession of the separate elements), C (perception 
 of the smoothness and regular flow or of the roughness and irregularity 
 of performance), and F (inference based on number and amount of 
 specific mistakes, hesitations, successes, etc.) are considered, and in the 
 order here given, the most important criteria, both for personal judg- 
 ments and for judgments as witness. This is further confirmed by 
 observation of the number of times each criterion was reported "not 
 used," out of a total of 32 possible situations (4 tests, 4 observers, as 
 performer and as witness). The figures are as follows: 
 
Criterion    Times Reported as Not Used 
    A                  7 
    B                  8 
    C                  0 
    D                  4 
    E                  0 
    F                  0 
    G                  8 
    H                  8 
 
TABLE IX 
FINAL AVERAGE POSITIONS OF ALL CRITERIA 

When Judging One's Own Performance 

Criterion   Colors   Opposites   Cancellation   Adding   Grand Av. 
    A        3.5        5.0          3.5          3.8       3.9 
    B        5.3        4.5          4.0          5.3       4.8 
    C        2.3        2.3          2.8          2.8       2.5 
    D        6.0        7.0          5.8          6.5       6.3 
    E        3.0        2.5          2.0          2.3       2.4 
    F        2.5        3.5          2.8          3.8       3.1 
    G        6.8        5.5          7.5          5.8       6.4 
    H        6.8        5.8          6.3          6.0       6.2 
Final Order, E-C-F-A-B-H-D-G 

When Judging the Performance of Another Person 

Criterion   Colors   Opposites   Cancellation   Adding   Grand Av. 
    A        5.3        4.8          6.5          4.8       5.3 
    B        7.0        6.8          5.8          6.0       6.4 
    C        1.5        1.8          3.5          2.8       2.4 
    D        4.5        4.5          5.3          4.5       4.7 
    E        2.5        2.5          1.8          2.5       2.3 
    F        3.0        3.0          2.3          3.3       2.9 
    G        6.0        6.0          5.3          5.5       5.7 
    H        5.8        6.8          5.8          5.8       6.3 
Final Order, E-C-F-D-A-G-H-B
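
The "Final Order" lines of Table IX. follow from sorting the criteria by their grand averages, the smallest average standing first. A minimal sketch, using the printed grand averages (the names `OWN`, `OTHER`, and `final_order` are illustrative): 

```python
# Sketch: derive the final order of criteria from the grand average positions.
# Grand averages are copied from Table IX; a smaller value means greater
# importance. All names are illustrative only.

OWN = {"A": 3.9, "B": 4.8, "C": 2.5, "D": 6.3, "E": 2.4, "F": 3.1, "G": 6.4, "H": 6.2}
OTHER = {"A": 5.3, "B": 6.4, "C": 2.4, "D": 4.7, "E": 2.3, "F": 2.9, "G": 5.7, "H": 6.3}


def final_order(grand_averages: dict) -> str:
    """Join the criteria in order of increasing grand average position."""
    return "-".join(sorted(grand_averages, key=grand_averages.get))


if __name__ == "__main__":
    print("Judging one's own performance:", final_order(OWN))
    print("Judging another person:       ", final_order(OTHER))
```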
 
 Criteria C, E, and F are the only ones never reported "not used." 
 The direct estimate of total time interval or duration (D) is given a 
 higher value (4.7) when judging another than when judging one's 
self (6.3). Feelings of surprise (G) show a similar difference, which 
 is, however, only slight (5.7 and 6.4). Feelings of pleasantness or 
 unpleasantness (B) have a much higher value when judging one's 
 self (4.8) than when judging another person (6.4). Unanalyzable 
 feelings of efficiency or of inefficiency (H) average only slightly 
 higher when judging one's self and as a matter of fact only the un- 
 trained observer ever places this criterion higher than the sixth 
 position. 
 
 In general, then, the affective processes do not, in the opinion of 
 these four observers, play any considerable role as criteria of judg- 
 ments of efficiency in these tests. The criteria chiefly relied on are 
 directly perceptual in character (speed, smoothness or roughness) or 
 are inferences from particular delays or successes. Trained observ- 
 ers do not report an "unanalyzable feeling of efficiency," but point 
 to specific criteria of a perceptual character; nor is the estimate of 
 total time interval or duration important. The great difference be- 
 tween the positions of E (speed) and D (duration) seems to indicate 
 a probable direct and independent basis for judgments of speed of 
 performance, as is also found to be the case with judgments of the 
 characteristics of voluntary movements. 2 Because of the importance 
 of these perceptual factors, the judgments of the performance of 
 another person are based on the same criteria as are those of one's 
 own work. 
 
Correctness of the Judgments. All four observers report greater 
confidence when judging themselves, and believe themselves to be 
more sensitive to changes in their own performance than in that of 
another person. Table X. shows the per cent. correctness of the 
judgments in all situations. In computing these results, the median 
of the five trials preceding a given test was used as the standard of 
comparison, or as a measure of "usual" performance. By usual is 
thus meant the median record of the half-day's work immediately 
preceding the trial in question. This standard was adopted after 
questioning the observers as to the meaning which the term "usual" 
had for them, and its use accords with the introspections of all four 
observers. In the table the degree of confidence of the judgments is 
ignored, since this matter will be taken up in the following chapter. 
The judgment is counted correct or incorrect according as the record 
did or did not differ, in the direction asserted, from the median of the 
five preceding trials, regardless of both the amount of this deviation 
and the degree of confidence. In connection with this table three 
points are to be especially noted. 

2 See Hollingworth, "The Inaccuracy of Movement," pp. 40-62. 
 
 TABLE X

 SHOWING THE PER CENT. OF CORRECT JUDGMENTS

 When Judging One's Self

                         Observers
 Test              H     P     L     R    Average
 Color-naming     72    67    65    67      68
 Opposites        80    68    69    73      72
 Cancellation     74    67    71    72      71
 Addition         69    70    74    69      70
 Averages         74    68    70    70      70

 When Judging Another Person

 Test              H     P     L     R    Average
 Color-naming     60    69    60    52      60
 Opposites        67    62    66    64      60
 Cancellation     59    79    67    61      66
 Addition         73    67    80    63      71
 Averages         65    69    68    60      65
 
 1. Within any given judgment situation there are no consider- 
 able individual differences in correctness. Such differences as are 
 present are not consistently individual. 
 
 2. Correctness when judging one's self is, on the average, only 
 about 5 per cent. higher than when judging another person. This
 difference, such as it is, confirms the introspective reports of the four 
 observers. Its slight amount bears additional witness to the per- 
 ceptual character of the criteria of the judgments. Factors E, C, 
 and F are as directly observable in estimating another's work as 
 when judging one's own performance. The slight difference found 
 may be accounted for in part by the greater degree of attention given 
 to the process when one judges himself. 
 
 3. This average difference of about 5 per cent. is due to the first
 three tests on the list (color-naming, opposites, and cancellation).
 The per cent. superiority in the correctness of the personal judgments
 in the various tests is +12 per cent. for opposites, +8 per cent. for
 color-naming, +5 per cent. for cancellation, and -1 per cent. for
 adding. For the individual subjects these differences are as shown in
 Table XI. If these small differences are at all significant, they
 probably indicate only differences in the degree to which one is able
 to take an objective attitude toward his own performance, and
 color-naming and opposites would thus seem to involve processes more
 reflex in character than those involved in cancellation and adding.
 
 TABLE XI

 SHOWING FOR EACH SUBJECT AND EACH TEST THE SUPERIORITY OF THE
 CORRECTNESS OF PERSONAL JUDGMENTS OVER THAT OF JUDGMENTS OF THE
 PERFORMANCE OF ANOTHER PERSON, IN PER CENT.

 Observer          H     P     L     R
 Color-naming     13     6     3     9
 Opposites        12    -2     5    15
 Cancellation     15   -12     4    11
 Addition          4     3     6     6
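
 The subtraction behind a table of this kind can be sketched as follows.
 This is only an illustration, assuming that each entry is the per cent.
 correct when judging one's self minus the per cent. correct when judging
 another person, using the figures of Table X. The names SELF, OTHER and
 superiority are invented for the sketch. Computed this way the
 cancellation row agrees with Table XI; the other rows do not all match
 Table XI as it stands in this copy, so the sketch shows the arithmetic
 only and is not a check on the printed table.

```python
# Per cent. of correct judgments, transcribed from Table X.
# Observer order in every list: H, P, L, R.
SELF = {
    "Color-naming": [72, 67, 65, 67],
    "Opposites":    [80, 68, 69, 73],
    "Cancellation": [74, 67, 71, 72],
    "Addition":     [69, 70, 74, 69],
}
OTHER = {
    "Color-naming": [60, 69, 60, 52],
    "Opposites":    [67, 62, 66, 64],
    "Cancellation": [59, 79, 67, 61],
    "Addition":     [73, 67, 80, 63],
}


def superiority(self_pct, other_pct):
    """Return self minus other, per test and per observer, in per cent."""
    return {
        test: [s - o for s, o in zip(self_pct[test], other_pct[test])]
        for test in self_pct
    }


diffs = superiority(SELF, OTHER)
for test, row in diffs.items():
    print(f"{test:13s}", row)
```

 A positive entry means the observer was more often correct about his
 own work than about the work of another person.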
 
 Practise Effects. Table XII. gives the per cent. of correct judgments
 for each section of 20 trials. There is no considerable practise
 gain in correctness in the separate tests nor with the different
 observers. The fourth section (trials 61 to 80) tends to show greatest
 correctness, and quite uniformly. But in the personal judgments there
 is, aside from this, no gain. In judgments as witness there is, if the
 grand totals be considered, a fairly well marked increase in
 correctness in the successive sections of the experiment. Further than
 stating these points it is difficult to analyze out the practise factor.
 The real gain is probably in all cases greater than the figures reveal,
 because, as the experiment proceeded, the magnitude of the variations
 from trial to trial grew smaller and smaller, as the result of
 practise in the tests themselves. Meanwhile the "usual" record also
 became better and better. The same per cent. correctness (and, as
 witness, a higher correctness) is maintained in spite of this decrease
 in absolute variability. On the other hand, this is what would be
 expected if something like Weber's law holds in such judgments.

 TABLE XII

 SHOWING THE EFFECT OF PRACTISE ON CORRECTNESS OF JUDGMENT. THE FIGURES
 INDICATE THE TOTAL NUMBER OF CORRECT JUDGMENTS DELIVERED
 BY ALL FOUR OBSERVERS, IN EACH SITUATION

            Color Naming       Opposites         Cancellation      Addition
 Trials     Perf. Wit. Tot.    Perf. Wit. Tot.   Perf. Wit. Tot.   Perf. Wit. Tot.
 1-20        48    41    89     59    51   110    53    44    97    56    52   108
 21-40       53    49   102     54    45    99    53    53   106    52    50   102
 41-60       45    44    89     55    50   105    52    52   104    61    58   119
 61-80       59    54   113     57    54   111    64    58   122    54    57   111
 81-100      56    48   104     53    49   102    53    54   107    57    60   117

 The slightly superior correctness of the personal judgments is
 present in all five sections of the experiment (see Tables XII. and
 XIII.), but it decreases somewhat as the later sections are passed
 through. This decrease seems to depend solely on such practise gain
 as comes when the judgments are directed toward the work of another
 person.
 
 TABLE XIII

 GRAND TOTAL CORRECTNESS, ALL TESTS, ALL OBSERVERS

 Trials     Judging Self   Judging Another   Totals
 1-20           216             188            404
 21-40          212             197            409
 41-60          213             204            417
 61-80          234             223            457
 81-100         219             211            430

 Showing Effect of Practise on the Correctness of the Judgments. Witness gains,
 approximating finally the correctness of the performer.
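
 The grand totals of Table XIII. are the column sums of Table XII., and
 the relation can be sketched as below. The sketch makes one assumption
 that should be noted: the witness figure for cancellation in trials 1-20
 is taken as 44, which is the value that makes both the row total of 97
 and the grand total of 188 come out. The names TABLE_XII and
 grand_totals are invented for the illustration.

```python
# Correct judgments per section of 20 trials, after Table XII.
# Each entry is (performer, witness) for one test.
# ASSUMPTION: witness, cancellation, trials 1-20 is taken as 44 (see lead-in).
TABLE_XII = {
    "1-20":   {"Color": (48, 41), "Opposites": (59, 51),
               "Cancellation": (53, 44), "Addition": (56, 52)},
    "21-40":  {"Color": (53, 49), "Opposites": (54, 45),
               "Cancellation": (53, 53), "Addition": (52, 50)},
    "41-60":  {"Color": (45, 44), "Opposites": (55, 50),
               "Cancellation": (52, 52), "Addition": (61, 58)},
    "61-80":  {"Color": (59, 54), "Opposites": (57, 54),
               "Cancellation": (64, 58), "Addition": (54, 57)},
    "81-100": {"Color": (56, 48), "Opposites": (53, 49),
               "Cancellation": (53, 54), "Addition": (57, 60)},
}


def grand_totals(table):
    """Sum performer and witness counts over the four tests in each section."""
    totals = {}
    for section, tests in table.items():
        performer = sum(p for p, _ in tests.values())
        witness = sum(w for _, w in tests.values())
        totals[section] = (performer, witness, performer + witness)
    return totals


totals = grand_totals(TABLE_XII)
for section, row in totals.items():
    print(f"{section:7s}", row)
```

 The witness column rises from section to section while the performer
 column stays nearly level until the fourth section, which is the
 pattern the text describes.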
 
 Formulation of the Judgments. Individuals differ somewhat in 
 their methods of formulating the judgments, and the process varies 
 also with the test and with the judgment situation. Thus observer 
 H reports: "When judging myself, no judgment is usually formed 
 until the test is completed, in which case the judgment may either 
 seem to come of its own accord, fully formed, or it may require 
 thinking back over the trial and comparing it with other trials. But 
 when judging another person a tentative judgment is usually made 
 early in the performance and this judgment is modified as the test 
 proceeds, and is ready for delivery at the moment the test is com- 
 pleted. This is particularly true of cancellation and of addition. In 
 color-naming and opposites it is less true."
 
 Similarly, observer L reports: "I seem to form judgments in all
 three ways suggested, sometimes in one way, sometimes in another."
 The other two observers describe themselves as having relied chiefly 
 on the method of tentative formulation and modification, regardless 
 of the test or of the judgment situation (as performer or as witness). 
 
 Summary 
 
 The chief results of the study may be summarized as follows:
 
 1. The important criteria of judgments of efficiency in these tests 
 are either directly perceptual in character or are inferences from 
 such data. Affective processes do not play an important role. 
 
 2. A direct and independent basis or set of sensory criteria for 
 judgments of speed of performance is indicated. 
 
 3. The same criteria are relied on when judging one's own effi- 
 ciency as when judging that of another person. 
 
 4. Direct estimate of duration and feelings of surprise are more 
 important when judging another than when judging one's self. 
 With feelings of pleasantness and unpleasantness and with unanalyz- 
 able feelings of efficiency or inefficiency the reverse is the case. 
 
 
 5. Trained observers do not report unanalyzable feelings, but
 point to specific criteria of a perceptual character. 
 
 6. Judgments of one's own work tend to be only slightly better, 
 from the point of view of correctness, than judgments of the work 
 of another person. This superior correctness of the personal judg- 
 ments varies somewhat with the test. It is greater for color-naming 
 and opposites than for cancellation and addition. 
 
 7. Practise results in an absolute increase in correctness in the 
 case of judgments as witness. Personal judgments show no absolute 
 gain but the initial per cent. correctness is maintained along with
 a decrease in the absolute variability of the trials. There is thus in 
 both cases a real improvement, which is greater than the figures 
 indicate. 
 
 8. The process of judgment formulation, as introspectively de- 
 scribed, differs with the individual, with the test, and with the judg- 
 ment situation.
 
 
 CHAPTER III 
 
 PERFORMER AND WITNESS AS JUDGES OF EFFICIENCY 
 
 THE two previous chapters have presented results bearing on the 
 judgment of personal efficiency in a work process, the characteristics, 
 reliability and laws, and the basis or criteria of these judgments. In 
 the first chapter it was shown (1) that an individual's judgment of 
 his own efficiency in a task just completed possesses a degree of cor- 
 rectness which varies in a definite and measurable way with his feel- 
 ing of confidence in the judgment; (2) that judgments of "better 
 than usual" are nearly as often correct as are judgments of "worse 
 than usual," although the former do tend to be somewhat in excess 
 of the number of actual cases; (3) that the magnitude of the average
 constant variation required as the basis of judgments of a given de- 
 gree of confidence varies with the nature of the task; and (4) that 
 judgments of "better" arise on slighter provocation than do judg- 
 ments of "worse." 
 
 The second chapter gave the results of an introspective study of 
 the judgment of efficiency, both when judging one's self and when 
 judging the performance of another person. It was here indicated 
 that (1) the important criteria relied on in making these judgments 
 are either directly perceptual in character or inferences from such 
 data; (2) that affective processes do not play an important role as 
 criteria of these judgments, and that unanalyzable feelings of effi- 
 ciency or feelings of inefficiency are not reported; (3) that the same 
 criteria are relied on when judging one's own performance as when 
 judging that of another person; (4) that the specific criteria and the
 process of formulating the judgment vary with the task and with the 
 judgment situation, and (5) that one's judgments of his own per- 
 formance are only slightly more correct than his judgments of the 
 work of another person, the latter judgments improving somewhat in 
 correctness as the result of practise. 
 
 The present chapter reports a continuation of this series of
 investigations, designed to check up the previous results by securing a
 larger number of judgments from more observers and in new tasks,
 and to make a thorough quantitative and qualitative comparison of
 the judgments of performer and of witness. Since the method used
 here was identical with that described in the earlier studies no
 detailed account of it need be given here. Four observers, two men
 and two women, took part in the experiments. Four tests were
 employed, described in earlier papers: Color-naming, Naming Opposites,
 Cancellation, and Addition. The data discussed in this chapter
 were secured in connection with the experiment described in
 Chapter II.
 
 The time of performance was taken in fifth-seconds. After four
 preliminary trials each observer made 100 further trials. After each
 trial, and before the operator had noted the record, both performer
 and operator judged the performance to have been either "better
 than usual" or "worse than usual," and each assigned to his or her
 judgment one of the four degrees of confidence (A, B, C, or D).
 Both judgments were independently recorded, and after this was
 done the objective measurement was noted by the operator only.
 Each person served in turn as operator and as performer. This
 procedure gives 100 judgments from each of four performers and 100
 from each of four witnesses, the two sets of judgments referring to
 the same records. Since there were four tasks this gives a total of
 3,200 judgments. The experiments occupied a two-hour period on
 each of 9 successive days, 10 to 12 trials of each task being made at
 each sitting by each person.

 TABLE XIV

 SHOWING THE AMOUNT OF PRACTISE GAIN IN THE VARIOUS TESTS

                            Average of    Average of    General
                            First 10      Last 10       Average    Gain
 Test            Observer   Trials (Sec.) Trials (Sec.) (Sec.)     (Sec.)
 Color-naming:      R           34            36          35         -2
                    L           51            45          48          6
                    P           45            40          43          5
                    H           42            38          40          4
 Opposites:         R           28            23          26          5
                    L           25            22          24          3
                    P           50            34          42         16
                    H           32            28          30          4
 Cancellation:      R           75            55          65         20
                    L           56            46          51         10
                    P           60            40          50         20
                    H           54            40          47         14
 Addition:          R           90            52          71         38
                    L           86            50          68         36
                    P          100            58          79         42
                    H           83            60          72         23
 
 In computing results the median of the five trials preceding the
 given record was used as the standard of comparison, or as the
 measure of "usual" performance. This standard was adopted after
 questioning the observers as to the meaning which the term "usual"
 had for them. By "usual" is thus meant the median record of the
 half-day's work immediately preceding the trial in question. It may
 be well to point out that this method was used (rather than, for
 instance, comparison with the preceding trial) in order to make the
 experiment as nearly as possible comparable with daily life, in which
 our impressions and verdicts of momentary efficiency of ourselves or
 of others are usually expressed in these general terms.
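
 The scoring rule just described can be sketched as follows. This is a
 minimal illustration and not the author's own procedure in detail. It
 assumes that a judgment of "better" is correct when the time is less
 than the median of the five preceding trials, that a judgment of
 "worse" is correct when the time is greater, that a record equal to the
 median counts as incorrect, and that trials with fewer than five
 predecessors are left unscored. The function names and the sample
 times are invented for the sketch.

```python
from statistics import median


def score_judgments(times, judgments, window=5):
    """Mark each judgment correct (True), incorrect (False) or unscored (None).

    times     -- performance times, in trial order
    judgments -- "better" or "worse" for each trial
    window    -- number of preceding trials whose median defines "usual"
    """
    results = []
    for i, (time, judgment) in enumerate(zip(times, judgments)):
        if i < window:
            # Assumption: too few preceding trials to define "usual".
            results.append(None)
            continue
        usual = median(times[i - window:i])
        if judgment == "better":
            results.append(time < usual)  # better = less time than usual
        else:
            results.append(time > usual)  # worse = more time than usual
    return results


def per_cent_correct(results):
    """Per cent. of scored judgments that were correct."""
    scored = [r for r in results if r is not None]
    return 100 * sum(scored) / len(scored)


# Hypothetical times in seconds; not taken from the experiment.
times = [40, 42, 41, 39, 43, 38, 44]
judgments = ["worse", "worse", "better", "better", "worse", "better", "better"]
results = score_judgments(times, judgments)
print(results, per_cent_correct(results))
```

 In the sample the sixth trial (38 against a median of 41) is rightly
 judged "better," while the seventh (44 against a median of 41) is
 wrongly judged "better."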
 
 TABLE XV

 ABSOLUTE DEVIATIONS FROM USUAL. JUDGING SELF. GIVING ALSO THE
 RELIABILITY

                              Better                  Worse
 Test    Obs.            -A    -B    -C    -D    +A    +B    +C    +D
 Colors: H    A.S.D.   -2.5  -1.5  -1.0  -2.2   4.9   3.0   1.2  -0.4
              P.E.       .4    .7    .5    .7    .2    .6    .8    .2
         P    A.S.D.   -1.7  -0.8  -0.8   0.7   5.2   4.2   0.6
              P.E.       .5    .4    .3    .6    .8    .6    .6
         R    A.S.D.   -0.5  -2.2  -1.1  -0.9   6.8   0.8   1.9   5.3
              P.E.       .8    .7    .4    .9   1.0    .3    .8   1.0
         L    A.S.D.   -4.6  -3.2  -0.6  -1.2   6.0   2.7   0.9   0.3
              P.E.       .6    .8    .6    .9   1.3   1.4    .7    .6
 Opps.:  H    A.S.D.   -4.0  -3.5  -2.9  -0.1   4.5   3.4   0.8  -0.4
              P.E.       .4    .5    .4    .7    .6    .6    .4   1.1
         P    A.S.D.   -3.2  -1.4   0.3   0.1   7.4   3.6   0.5
              P.E.       .6    .2    .3    .6   1.0    .7    .7
         R    A.S.D.   -1.6  -0.7   0.1         5.4   1.6   0.9   2.3
              P.E.       .2    .2    .2    .2    .4    .3    .5    .7
         L    A.S.D.   -2.4  -0.7  -0.1   1.1   1.9   3.4   3.2   0.7
              P.E.       .3    .3    .3    .4    .4    .5    .6    .3
 Canc.:  H    A.S.D.   -6.0  -3.6  -1.5  -1.5   4.9   5.8   0.1   0.9
              P.E.       .6   1.0    .7    .5   1.0    .7    .8    .5
         P    A.S.D.   -4.6  -1.0   2.2   2.2  14.3   6.7   3.0  -0.8
              P.E.       .4    .4    .5   2.6   8.0    .8    .7
         R    A.S.D.   -8.0  -4.4  -3.0  -1.5   8.6    .6   1.5  -3.3
              P.E.             .8    .9    .8   1.1   1.2    .4   1.4
         L    A.S.D.   -6.4  -1.9   0.8  -0.3   7.0   5.9   3.3   0.7
              P.E.       .8    .7    .7    .6   1.4    .8    .8    .8
 Add.:   H    A.S.D.   -9.8  -5.6  -5.7  -5.5   8.3   3.4  -1.2  -2.4
              P.E.      1.0   2.1   1.5   1.0   1.0   1.4   1.1   1.1
         P    A.S.D.   -7.2  -1.3  -1.0  -0.8   5.9   2.0   0.8   1.7
              P.E.       .6    .7    .8         1.0    .8    .7   2.5
         R    A.S.D.   -9.4  -2.8  -3.1  -3.2   3.6   0.2   0.4  -1.4
              P.E.      1.3    .9    .9    .3   1.4   1.1    .7   1.5
         L    A.S.D.   -4.6  -1.9  -0.6  -1.7   7.7   7.7   2.9   1.9
              P.E.       .4    .4    .7    .9   1.8   3.3   1.4    .5

 A.S.D. = Average Stimulus Difference.
 
 Three of the observers (L, R, and H) had had prolonged previous
 practise in color-naming. Observer P had not, but since repetition
 brings little improvement in this test the gains by the end of the
 experiment were very slight in all cases. The same three observers
 were practised in opposites but P, who was not, shows a gain of some
 16 seconds by the end of the experiment. In the cases of cancellation
 and addition the amounts of previous practise were unequal. In all
 cases the four preliminary trials served to overcome the initial
 difficulties and to bring the performer close to the secondary slope
 of the practise curve. Table XIV. gives the averages of the first 10
 trials (excluding the preliminaries) and of the last 10 trials, the
 average of these two averages, and the difference between them, thus
 affording an approximate statement of the general tendency to gain for
 each individual.

 TABLE XVI

 ABSOLUTE DEVIATIONS FROM USUAL. JUDGING AS WITNESS. GIVING ALSO THE
 RELIABILITY OF THE MEASURES

                              Better                  Worse
 Test    Obs.            -A    -B    -C    -D    +A    +B    +C    +D
 Colors: H    A.S.D.   -7.8  -4.8  -2.1  -0.3   9.8   1.7   0.3   0.8
              P.E.       .5    .9   1.0   1.1    .7    .5    .5    .4
         P    A.S.D.   -3.8  -0.7   0.8   0.8   3.3   3.7   1.9  -0.3
              P.E.       .4    .5    .5    .7   1.8   1.3    .5    .4
         R    A.S.D.   -1.1  -0.1   2.9  -4.2   3.4  -0.2   0.2  -1.4
              P.E.       .3    .5    .8    .7   1.2    .8    .7
         L    A.S.D.   -1.3  -2.3   0.3   1.4   2.4   4.5   5.1   1.6
              P.E.       .6    .5    .5    .7          .3    .6    .9
 Opps.:  H    A.S.D.   -2.5  -0.9  -0.4               2.1   2.1   2.2
              P.E.       .6    .3    .3    .5          .8    .5    .4
         P    A.S.D.   -0.5  -0.1   0.6   2.9   5.0   1.2   2.9
              P.E.       .3    .3    .5   1.1          .9    .8
         R    A.S.D.   -0.8  -0.9   0.4  -1.8   4.8   5.0   2.4   2.8
              P.E.       .3    .4    .6          .5   2.1    .9
         L    A.S.D.   -2.1  -1.3  -2.1   1.4   5.0   5.5   3.2   0.6
              P.E.       .5    .5    .7    .8   1.3    .5    .9    .7
 Canc.:  H    A.S.D.   -7.5  -1.3  -3.0   0.5   7.8   1.6   2.4  -0.3
              P.E.      1.2   1.1   1.2    .5    .4   1.0    .7    .6
         P    A.S.D.         -4.4  -2.5  -1.9  11.3   5.4   1.8   3.5
              P.E.             .5    .6   1.3   2.1   1.1    .8   1.4
         R    A.S.D.   -3.8  -0.7   0.7   2.8        -0.6   0.5   0.8
              P.E.      1.1    .5    .9    .6         1.1   1.6
         L    A.S.D.   -4.1   0.1  -0.8  -0.3  11.6  -1.1   3.8  -0.4
              P.E.       .6    .7    .6    .6   1.6   3.6    .7    .8
 Add.:   H    A.S.D.   -5.9  -4.2  -2.7  -4.0   6.3   2.0        -0.9
              P.E.       .2    .7   1.3    .9   2.4    .9    .6   1.2
         P    A.S.D.   -9.7  -4.0   0.1  -4.7  12.9  -0.8  -0.6
              P.E.      1.9    .6    .5    .5   4.0   1.4    .9
         R    A.S.D.   -4.0  -2.5        -1.6   7.9  -0.7   1.3  -0.3
              P.E.       .7    .8    .9         1.0   1.0    .9   2.9
         L    A.S.D.   -6.1  -7.5  -5.0  -3.6  11.5   2.9   3.8   1.0
              P.E.      1.1   1.4   1.1    .7   1.2   1.7   1.4   1.2

 A.S.D. = Average Stimulus Difference.

 TABLE XVII

 JUDGING SELF

 Average Constant Deviations from Usual, in Terms of Per Cent. of Average

                Average                  Better                       Worse
 Test           Record  Obs.     -A     -B     -C     -D     +A     +B     +C     +D
 Colors:          43     H     -5.8   -3.5   -2.3   -5.1   11.4    7.0    2.8   -0.9
                  40     P     -4.2   -2.0   -2.0    1.7   13.0   10.5    1.5
                  35     R     -1.4   -6.3   -3.1   -2.6   19.4    2.3    5.4    1.4
                  48     L     -9.6   -6.6   -1.2   -2.5   12.5    5.6    1.9    0.6
 Averages                      -5.2   -4.6   -2.2   -2.1   14.3    6.4    2.9    0.4
 Total No. of cases              44     76     71     50     28     52     50     29

 Opposites:       30     H    -13.6  -11.7   -9.7   -0.3   15.0   11.3    2.7   -1.3
                  42     P     -7.6   -3.3    0.7    0.2   17.6    8.6    1.2
                  26     R     -6.4   -2.8    0.4          21.6    6.4    3.6    9.2
                  24     L     -9.6   -2.8   -0.4    4.4    7.6   13.6   12.8    2.8
 Averages                      -9.3   -5.1   -2.2    1.1   15.5   10.0    5.1    3.6
 Total cases                     75     91     62     29     33     41     43     26

 Cancellation:    47     H    -12.7   -7.6   -3.2   -3.2   10.5   12.3     .2    1.9
                  50     P     -9.2   -2.0    4.4    4.4   28.6   13.4    6.0   -1.6
                  65     R    -12.3   -6.7   -4.6   -2.3   13.2    0.9    2.3   -5.1
                  51     L    -12.8   -3.8    1.6   -0.6   14.0   11.8    6.6    1.4
 Averages                     -11.7   -5.0   -0.4   -0.4   16.6    9.6    3.8   -0.8
 Total cases                     52     82     68     44     30     33     56     35

 Adding:          72     H    -13.6   -7.8   -7.9   -7.6   11.5    4.6   -1.7   -3.3
                  79     P     -9.1   -1.6   -1.3   -1.0    7.5    2.5    1.0   -2.1
                  71     R    -13.2   -3.9   -4.3   -4.5    5.0    0.3    0.6   -1.9
                  68     L     -6.8   -2.8   -0.9   -2.5   11.3   11.3    4.3    2.8
 Averages                     -10.7   -4.0   -3.6   -3.9    8.8    4.7    1.0   -1.1
 Total cases                     73     86     48     25     39     52     43     34
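
 The arithmetic of Table XIV. is simple enough to be sketched directly.
 The gain is the average of the first 10 trials minus the average of the
 last 10, and the general average is the mean of the two. The rounding
 rule (halves rounded upward) is an inference from the printed figures,
 not something the text states, and the function names are invented for
 the sketch.

```python
import math


def round_half_up(value):
    """Round to the nearest whole second, halves upward (an inferred rule)."""
    return math.floor(value + 0.5)


def practise_summary(first_ten_avg, last_ten_avg):
    """Return (general average, gain) in seconds for one observer and test."""
    general_average = round_half_up((first_ten_avg + last_ten_avg) / 2)
    gain = first_ten_avg - last_ten_avg  # positive = faster at the end
    return general_average, gain


# Figures from Table XIV: (first 10, last 10).
print(practise_summary(45, 40))   # observer P, color-naming
print(practise_summary(34, 36))   # observer R, color-naming
print(practise_summary(100, 58))  # observer P, addition
```

 A negative gain, as for observer R in color-naming, means that the last
 ten trials were on the average slower than the first ten.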
 
 In the case of each trial the difference between the record made 
 and the appropriate measure of "usual" was found. These differ- 
 ences were then assembled according to the judgments passed on 
 them, the judgments of "better" and of "worse," each with the 
 four degrees of confidence, being tabulated separately. The average 
 constant deviation from usual was then computed for each type of 
 judgment, for each test, and for each individual, both as performer 
 and as witness. Tables XV. and XVI. give these absolute constant 
 deviations, along with their variability.
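
 The assembling of differences by type of judgment, as described above,
 might be sketched as follows. The sketch covers only the average
 constant deviation. The variability measure (P.E.) is left out because
 the text does not state how it was computed. The record layout, the
 function name and the sample values are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def constant_deviations(records):
    """Average the deviation from usual within each type of judgment.

    records -- iterable of (deviation_sec, direction, confidence), where
               deviation_sec is the record minus the "usual" measure,
               direction is "better" or "worse",
               confidence is one of "A", "B", "C", "D".
    """
    groups = defaultdict(list)
    for deviation, direction, confidence in records:
        sign = "-" if direction == "better" else "+"
        groups[sign + confidence].append(deviation)
    return {label: round(mean(values), 1) for label, values in groups.items()}


# Hypothetical trials; not taken from the experiment.
records = [
    (-3.0, "better", "A"),
    (-2.0, "better", "A"),
    (1.5, "worse", "C"),
    (-0.5, "worse", "C"),  # judged "worse" although actually faster
]
deviations = constant_deviations(records)
print(deviations)
```

 A wrong judgment, such as the last record in the sample, pulls the
 average for its type toward zero, which is why the low-confidence
 columns of the tables lie close to zero or even carry the opposite
 sign.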
 
 In these tables, as in those which follow, the sign (-) means
 "better" (i. e., requiring less time than usual) and the sign (+)
 means "worse" than usual.

 In Tables XVII. and XVIII. these absolute deviations have been
 transformed into per cent. of the average time of performance in the
 case of each person. This makes it possible to treat all the deviations
 as comparable magnitudes. In these two tables the deviations are
 assembled according to the test, and test-averages are also computed.
 In Tables XIX. and XX. the same measures are reassembled according
 to the individual, and individual averages are computed. Table
 XXI. represents the individual averages and the combined averages,
 for all types of judgment. Table XXII. presents the combined test
 averages for all types of judgment, and is a convenient summary of
 many of the most interesting results of the experiment. Table XXIII.
 gives the test averages for A, B, C, and D judgments regardless of
 sign, secured by averaging the thresholds for "better" and "worse"
 judgments for each degree of confidence. Table XXIV. shows the
 distribution of the judgments for all types of situation and indicates
 the per cent. correctness in each case. The remaining tables are
 described later. In the discussion which follows these tables will
 be referred to by number.

 TABLE XVIII

 JUDGING AS WITNESS

 Average Constant Deviations from Usual, in Terms of Per Cent. of Average of
 the Time of the Performer

                Average                  Better                       Worse
 Test           Record  Obs.     -A     -B     -C     -D     +A     +B     +C     +D
 Colors:          48     H    -15.6   -9.6   -4.2   -0.6   19.8    3.4    0.6    1.8
                  35     P    -11.4   -2.1    2.4    2.4    9.9   11.1    5.7   -0.9
                  40     R     -2.7   -0.2    7.2  -10.5    8.5   -0.5    0.5   -3.5
                  43     L     -3.0   -5.4    0.7    3.3    5.6   10.5   11.9    3.7
 Averages                      -8.2   -4.3    1.5   -1.4   10.9    6.1    4.7   -0.2
 Total cases                     80     85     56     35     14     43     47     40

 Opposites:       24     H    -10.0   -3.6   -1.6                  8.4    8.4    8.8
                  26     P     -2.0   -0.4    2.4   11.6   20.0    4.8   11.6
                  42     R     -1.9   -2.1    0.6   -4.3   11.5   11.9    5.7    6.6
                  30     L     -7.0   -4.3   -7.0    4.3   16.6   18.3   10.7    2.0
 Averages                      -5.2   -2.6   -1.4    2.9   16.0   10.8    9.1    5.8
 Total cases                    129     93     52     29     18     17     29     33

 Cancellation:    51     H    -15.0   -2.6   -6.0    1.0   15.6    3.2    4.8   -0.6
                  65     P            -6.8   -3.8   -2.9   17.4    8.3    2.8    5.4
                  50     R     -7.6   -1.4    1.4    5.6           1.2    1.0    1.6
                  47     L     -8.7    0.2   -1.7   -6.4   24.7   -2.3    8.1   -0.9
 Averages                     -10.4   -2.6   -2.5   -0.7   19.2    2.0    4.2    1.4
 Total cases                     36    115     67     46     11     38     51     36

 Adding:          68     H     -8.7   -6.2   -3.9   -5.9    9.2    2.9          -1.3
                  71     P    -13.7   -5.6    0.1   -6.6   18.2   -1.1   -0.8
                  79     R     -5.0   -3.2          -2.0   10.0   -0.9    1.6   -0.4
                  72     L     -8.5  -10.5   -6.9   -5.0   16.0    4.0    5.3    1.4
 Averages                      -8.9   -6.4   -2.7   -4.9   13.3    1.2    1.5   -0.1
 Total cases                     62     86     58     23     28     47     65     31

 TABLE XIX

 INDIVIDUAL RECORDS, JUDGING SELF

 In Terms of the Per Cent. Constant Deviation from Usual

                           Better                       Worse
 Obs.  Test          -A     -B     -C     -D     +A     +B     +C     +D
 H:    Col         -5.8   -3.5   -2.3   -5.1   11.4    7.0    2.8   -0.9
       Opps       -13.6  -11.7   -9.7   -0.3   15.0   11.3    2.7   -1.3
       Canc       -12.7   -7.6   -3.2   -3.2   10.5   12.3    0.2    1.9
       Add        -13.6   -7.8   -7.9   -7.6   11.5    4.6   -1.7   -3.3
       Average    -11.4   -7.6   -5.8   -4.0   12.1    8.8    1.0   -0.9

 P:    Col         -4.2   -2.0   -2.0    1.7   13.0   10.5    1.5
       Opps        -7.6   -3.3    0.7    0.2   17.6    8.6    1.2
       Canc        -9.2   -2.0    4.4    4.4   28.6   13.4    6.0   -1.6
       Add         -9.1   -1.6   -1.3   -1.0    7.5    2.5    1.0   -2.1
       Average     -7.5   -2.2    0.4    1.3   14.2    8.7    2.4   -1.9

 R:    Col         -1.4   -6.3   -3.1   -2.6   19.4    2.3    5.4    1.4
       Opps        -6.4   -2.8    0.4          21.6    6.4    3.6    9.2
       Canc       -12.3   -6.7   -4.6   -2.3   13.2    0.9    2.3   -5.1
       Add        -13.2   -3.9   -4.3   -4.5    5.0    0.3    0.6   -1.9
       Average     -8.3   -4.9   -2.9   -2.3   14.8    2.5    3.0    0.9

 L:    Col         -9.6   -6.6   -1.2   -2.5   12.5    5.6    1.9    0.6
       Opps        -9.6   -2.8   -0.4    4.4    7.6   13.6   12.8    2.8
       Canc       -12.8   -3.8    1.6   -0.6   14.0   11.8    6.6    1.4
       Add         -6.8   -2.8   -0.9   -2.5   11.3   11.3    4.3    2.8
       Average     -9.7   -4.0   -0.2   -0.3   11.3   10.6    6.4    1.9
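
 The transformation of absolute deviations into per cent. of the average
 time of performance, described above for Tables XVII. and XVIII., can
 be sketched in a line or two. The example uses the figures for observer
 H in color-naming when judging self (deviations in seconds from Table
 XV., average record of 43 from Table XVII.). The function name is
 invented, and rounding to one decimal place is assumed from the printed
 tables.

```python
def per_cent_of_average(deviation_sec, average_record_sec):
    """Express a deviation from usual as per cent. of the average record."""
    return round(100 * deviation_sec / average_record_sec, 1)


# Observer H, color-naming, judging self: -A, +A and +C deviations.
average_record = 43
for label, deviation in [("-A", -2.5), ("+A", 4.9), ("+C", 1.2)]:
    print(label, per_cent_of_average(deviation, average_record))
```

 Expressed in this way a deviation of 2.5 seconds on a task averaging 43
 seconds becomes comparable with a deviation of 9.8 seconds on a task
 averaging 72 seconds.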
 
 Results 
 
 1. Judgments of "better" are based on smaller constant deviations
 in efficiency than are judgments of "worse." Considering the
 average percentile results from the four tests combined (Tables 
 XVII., XVIII., and XXII.) this is true (1) for all four observers, 
 (2) for all four degrees of confidence, and (3) both when judging self 
 and when judging the performance of another. The difference is 
 somewhat greater when judging another than when judging one's 
 own performance. The average amounts of change required as the 
 basis for judgments of any given degree of confidence are almost 
 twice as large when judging inefficiency as when judging efficiency.
 
 TABLE XX

 INDIVIDUAL RECORDS. JUDGING AS WITNESS

                           Better                       Worse
 Obs.  Test          -A     -B     -C     -D     +A     +B     +C     +D
 H:    Col        -15.6   -9.6   -4.2   -0.6   19.8    3.4    0.6    1.6
       Opps       -10.0   -3.6   -1.6                  8.4    8.4    8.8
       Canc       -15.0   -2.6   -6.0    1.0   15.6    3.2    4.8   -0.6
       Add         -8.7   -6.2   -3.9   -5.9    9.2    2.9      0   -1.3
       Average    -12.3   -5.5   -3.9   -1.4   14.9    4.5    3.5    2.1

 P:    Col        -11.4   -2.1    2.4    2.4    9.9   11.1    5.7   -0.9
       Opps        -2.0   -0.4    2.4   11.6   20.0    4.8   11.6
       Canc               -6.8   -3.8   -2.9   17.4    3.2    4.8   -0.6
       Add        -13.7   -5.6    0.1   -6.6   18.2   -1.1   -0.8
       Average     -9.0   -3.7    0.3    1.1   16.4    2.2    5.3   -0.8

 R:    Col         -2.7   -0.2    7.2  -10.5    8.5   -0.5    0.5   -3.5
       Opps        -1.9   -2.1    0.6   -4.3   11.5   11.9    5.7    6.6
       Canc        -7.6   -1.4    1.4    5.6           1.2    1.0    1.6
       Add         -5.0   -3.2          -2.0   10.0   -0.9    1.6   -0.4
       Average     -4.3   -1.7    2.3   -2.8   10.0    2.3    2.2    1.1

 L:    Col         -3.0   -5.4    0.7    3.3    5.6   10.5   11.9    3.7
       Opps        -7.0   -4.3   -7.0    4.3   16.6   18.3   10.7    2.0
       Canc        -8.7    0.2   -1.7   -6.4   24.7   -2.3    8.1   -0.9
       Add         -8.5  -10.5   -6.9   -5.0   16.0    4.0    5.3    1.4
       Average     -6.8   -5.0   -3.7   -0.9   15.7    7.6    9.0    1.6

 TABLE XXI

 COMBINED AVERAGES OF ALL TESTS

                             Better                       Worse
 Obs.  Situation       -A     -B     -C     -D     +A     +B     +C     +D
 H:    Self         -11.4   -7.6   -5.8   -4.0   12.1    8.8    1.0   -0.9
       Witness      -12.3   -5.5   -3.9   -1.4   14.9    4.5    3.5    2.1
 P:    Self          -7.5   -2.2    0.4    1.3   14.2    8.7    2.4   -1.9
       Witness       -9.0   -3.7    0.3    1.1   16.4    2.2    5.3   -0.8
 R:    Self          -8.3   -4.9   -2.9   -2.3   14.8    2.5    3.0    0.9
       Witness       -4.3   -1.7    2.3   -2.8   10.0    2.3    2.2    1.1
 L:    Self          -9.7   -4.0   -0.2   -0.3   11.3   10.6    6.4    1.9
       Witness       -6.8   -5.0   -3.7   -0.9   15.7    7.6    9.0    1.6

 Average, self       -9.2   -4.7   -2.1   -1.3   10.6    7.6    3.2
 No. of cases         244    335    249    148    130    178    192    124

 Average, witness    -8.1   -3.9   -1.3   -1.0   14.5    5.0    4.9    1.7
 No. of cases         307    379    233    133     71    145    192    140
 
 When the four individuals are averaged for each test, as in Tables 
 XIX., XX., and XXII., this law holds for all tests with the excep- 
 tion of addition. Here it holds only for B judgments of one's own 
 performance and for A judgments as witness.
 
 These results quite confirm the similar finding reported in the
 earlier experiment (see Chapter I.). It was there questioned whether
 this law results from a predisposition toward judgments of "better,"
 since these judgments show a somewhat lower per cent. correctness
 than do judgments of "worse." The result does not follow from the
 possibility of larger variations in the direction of inferiority, since
 these variations are, as a matter of fact, no more frequent, and even
 if they were would affect only the A judgments, whereas the law holds
 for all degrees of confidence. The only other explanation suggested
 was that the criteria of judgments of "better" are either different,
 more numerous, or more definite and more clearly detected, and that
 for this reason the "feeling of efficiency" arises on slighter
 provocation (smaller changes in performance) than does the "feeling of
 inefficiency."
 
 TABLE XXII

 COMPARISON OF WITNESS AND PERFORMER

                              Better                       Worse
 Test    Situation      -A     -B     -C     -D     +A     +B     +C     +D
 Col.:   Self         -5.2   -4.6   -2.2   -2.1   14.3    6.4    2.9    0.4
         Witness      -8.2   -4.3    1.5   -1.4   10.9    6.1    4.7   -0.2
 Opps.:  Self         -9.3   -5.1   -2.2    1.1   15.5   10.0    5.1    3.6
         Witness      -5.2   -2.6   -1.4    2.9   16.0   10.8    9.1    5.8
 Canc.:  Self        -11.7   -5.0   -0.4   -0.4   16.6    9.6    3.8   -0.8
         Witness     -10.4   -2.6   -2.5   -0.7   19.2    2.0    4.2    1.4
 Add.:   Self        -10.7   -4.0   -3.6   -3.9    8.8    4.7    1.0   -1.1
         Witness      -8.9   -6.4   -2.7   -4.9   13.3    1.2    1.5   -0.1

 Grand average        -8.6   -4.3   -1.7   -1.2   14.1    6.3    4.0    0.8
 Total cases           551    714    482    281    201    323    384    264

 For averages, for self and for witness, see end of Table XXI.
 
 Some information on this point is offered by the introspective 
 accounts of the relative importance of various criteria relied on in 
 making these judgments. (See Ch. II.) Each observer was given, 
 toward the close of the experiments, the list of criteria, and was asked 
 at the end of the investigation to arrange these various criteria in 
 order of importance, according to the degree to which the criteria 
 were used in judging "better" and also in judging "worse." 
 
 POSSIBLE CRITERIA OF JUDGMENT

 A. Feelings of ease and comfort or of strain and uncertainty as the test
    proceeds.
 B. Feelings of pleasantness and satisfaction or of unpleasantness and
    dissatisfaction, either during the test or after its completion.
 C. Perception of the smoothness and regular flow or of the roughness and
    irregularity of the performance.
 D. Direct estimate of the total time interval or duration of the test from
    beginning to end, regardless of what happens during the performance of
    the test.
 E. Perception of the speed or rate of succession of the separate acts which
    the test involves (as each word, problem, etc.).
 F. Inference, based on the number or amount of specific mistakes,
    hesitations, successes, observed during the test or remembered after its
    completion.
 G. Feelings of surprise, or of fulfilled or unfulfilled expectation, when the
    end of the test is reached.
 H. Unanalyzable and indefinable feeling of efficiency or of inefficiency.
 I. Any other specific criteria which you may have noted.
 
The following table shows the arrangements by each individual:

               Criteria of Better              Criteria of Worse
                   Observers                       Observers
Criterion    H   P   L   R   Av. Pos.        H   P   L   R   Av. Pos.
A            3   5   6   8     5.5           3   5   5   8     5.2
B            4   8   2   6     5.0           6   8   6   4     6.0
C            2   2   4   1     2.2           2   3   3   2     2.5
D            6   4   5   7     5.5           5   4   4   7     5.0
E            1   1   3   2     1.7           4   2   2   3     2.7
F            7   3   1   3     3.5           1   1   1   1     1.0
G            5   7   7   4     5.7           7   7   7   6     6.7
H            8   6   8   5     7.0           8   6   8   5     7.0
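The average positions in the table can be checked with a short computation. The following is only a sketch: the rank values are copied from the table, while the names `BETTER`, `WORSE`, `average_positions`, and `leading_criteria` are illustrative and not part of the original study. The exact means may differ slightly from the printed figures, which appear to be rounded or truncated to one decimal.

```python
from statistics import mean

# Ranks assigned by observers H, P, L, R (1 = most relied on),
# copied from the table of arrangements.
BETTER = {
    "A": [3, 5, 6, 8], "B": [4, 8, 2, 6], "C": [2, 2, 4, 1], "D": [6, 4, 5, 7],
    "E": [1, 1, 3, 2], "F": [7, 3, 1, 3], "G": [5, 7, 7, 4], "H": [8, 6, 8, 5],
}
WORSE = {
    "A": [3, 5, 5, 8], "B": [6, 8, 6, 4], "C": [2, 3, 3, 2], "D": [5, 4, 4, 7],
    "E": [4, 2, 2, 3], "F": [1, 1, 1, 1], "G": [7, 7, 7, 6], "H": [8, 6, 8, 5],
}


def average_positions(ranks):
    """Return the mean rank of each criterion across the four observers."""
    return {criterion: mean(values) for criterion, values in ranks.items()}


def leading_criteria(ranks, n=3):
    """Return the n criteria with the lowest (most important) mean rank."""
    averages = average_positions(ranks)
    return sorted(averages, key=averages.get)[:n]


if __name__ == "__main__":
    print("better:", leading_criteria(BETTER))
    print("worse: ", leading_criteria(WORSE))
```

On these figures the three leading criteria are E, C, and F in both arrangements, with F moving to first place for judgments of "worse."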
 
In both cases criteria F, C, and E stand higher than the remaining
criteria. But there are nevertheless differences in position among the
various criteria which seem sufficient to be significant, viz., the higher
positions of F and D in the case of judgments of "worse." Inference
on the basis of specific failures or successes, and direct estimate
of total time interval or duration, are relied on less when judging
"better" than when judging "worse." This means that in judgments of
"worse" the direct perceptions of smoothness and of speed are less
prominent, as are also feelings of pleasantness and unpleasantness. The
judgment of "better," that is to say, is the result of a direct perceptual
process. The judgment of "worse" is somewhat more likely to be at least
one step removed from direct perception, to resemble an inference.
 This seems to mean that the "positive" qualities of smoothness and 
 speed are appreciated immediately and in their own right, while the 
 logically opposite qualities of roughness and slowness are not appre- 
 ciated in so direct a manner. If this be true, it falls in line, in an 
 interesting way, with previous findings as to the way in which judg- 
 ments which are logically opposite are psychologically related to each 
 other. Thus two sets of judgments of dislike or of stupidity, in the 
 case of photographs of human faces, show lower correlation than do 
 similar sets of judgments of preference or intelligence, and also yield 
 a higher variability (see Chapter VIII.). Further, before the two 
 categories have been explicitly brought together in the consciousness 
of the observer, the personal consistency coefficient of two arrangements
of given materials on the basis of resemblance to a given standard
is higher than that of two arrangements for unlikeness (see
 Chapter VII.). Moreover, if an observer is left free to choose the 
 direction of his judgment in comparing two sensory stimuli, there is 
 found to be a strong tendency to direct the judgment toward the 
 stimulus described as "positive" in quality (see Chapter VI.). All 
 of these facts go to show that logical opposites are not necessarily 
 psychological opposites. 
 
TABLE XXIII

COMBINED AVERAGES OF "BETTER" AND "WORSE" JUDGMENTS
Average of 4 Observers for Each Test. Also Number of Cases

                    A                 B                 C                 D
Test          Self  Witness     Self  Witness     Self  Witness     Self  Witness
Col.           9.8     9.5       5.5     5.2       2.5     3.1       1.3     0.8
  Cases         72      94       128     128       121     103        79      75
Opps.         12.4    10.6       7.5     6.7       3.6     5.3       2.3     4.3
  Cases        108     147       132     110       105      81        55      62
Canc.         14.2    14.8       7.3     2.3       1.7     3.4       0.6     1.0
  Cases         82      47       115     153       124     118        79      82
Add.           9.8    11.1       4.3     3.8       2.3     2.1       2.5     2.5
  Cases        112      90       138     133        91     123        59      54

Average       11.5    11.5       6.1     4.5       2.5     3.5       1.7     2.2
  Cases        374     378       513     524       441     425       272     273

Grand av.         11.5               5.3               3.0               2.0
  Cases            752             1,037               866               545
 
 But it will be shown later that this difference in the nature of the 
 criteria is not responsible for the difference in the magnitude of the 
constant deviations. It will be shown that although the constant
deviations are consistently different, they are so related to the per
cent. of correct judgments that the probable error (the difference
correctly reported in 75 per cent. of the cases) is the same for all
circumstances.
 
2. When the four degrees of confidence are considered, regardless
of direction, there is seen to be no appreciable difference between
 judgments of performer and judgments of witness. The thresholds 
 are not consistently different, and the distribution of judgments 
 among the various degrees of confidence is almost identical in the 
 two cases (Table X.). 
 
3. Correctness of Judgment. If the judgment be classed as right
or wrong according as the record on which it was based did or did not
depart from the usual performance in the direction indicated in the
judgment (regardless of amount), the per cent. of correct judgments
may be correlated with the degree of confidence. Table XXIV.
summarizes the results of this classification. As in the previous
study, correctness increases with certainty, and even pure guesses
are more likely to be right than wrong. Roughly, the per cent.
correctness (averaging both performer and witness, and both "better"
and "worse" judgments) is A 90 per cent., B 75 per cent., C 65
per cent., D 60 per cent. In the previous study, in which the three
practised observers only were concerned, these percentages were
somewhat higher, viz., A 98 per cent., B 80 per cent., C 70 per cent.,
D 60 per cent. Judgments as witness are correct nearly as often, in
the long run, as are those of the performer, and, in the case of both,
judgments of "better" are somewhat less likely to be correct than
are those of "worse" (the average difference being 4 to 5 per cent.).

TABLE XXIV
PER CENT. CORRECT JUDGMENTS

                                 Better                      Worse
Test       Situation       -A    -B    -C    -D        +A    +B    +C    +D
Col.:      Self            80    74    61    53       100    72    59    56
           Cases           44    76    71    50        28    52    50    29
           Witness         81    71    39    52        92    63    71    50
           Cases           80    85    56    35        14    43    47    40
Opps.:     Self            92    78    58    42        93    88    72    81
           Cases           75    91    62    29        33    41    43    26
           Witness         71    63    57    43        96    79    81    74
           Cases          129    93    52    29        18    17    29    33
Canc.:     Self            96    77    58    63        97    86    81    45
           Cases           52    82    68    44        30    33    56    35
           Witness         93    69    63    69       100    77    78    64
           Cases           36   115    67    46        11    38    51    36
Add.:      Self            93    75    67    75        89    68    52    54
           Cases           73    86    48    25        39    52    43    34
           Witness         95    79    61    92        97    66    59    44
           Cases           62    86    58    23        28    47    65    31

Average:   Self            90    71    61    58        95    78    66    59
           Cases          244   335   249   148       130   178   192   124
Average:   Witness         85    70    55    64        96    71    72    58
           Cases          307   379   233   133        71   145   192   140

Grand average          A = 92     B = 73     C = 63     D = 60
Total cases               686        366        216        332
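The right-or-wrong classification described in this section can be sketched as follows. This is a minimal illustration, not the author's procedure in detail: it assumes, following the sign convention of Table XXII., that a minus variation from the usual record means a better performance and a plus variation a worse one. The function names and the sample records are hypothetical.

```python
from collections import defaultdict


def is_correct(direction, deviation):
    """Classify one judgment as right or wrong, regardless of amount.

    direction: "better" or "worse", as reported by the judge.
    deviation: signed per cent. departure from the usual record
               (assumed convention: negative = better, positive = worse).
    A deviation of exactly zero is counted as wrong in either direction.
    """
    if direction == "better":
        return deviation < 0
    if direction == "worse":
        return deviation > 0
    raise ValueError(f"unknown direction: {direction!r}")


def percent_correct_by_confidence(records):
    """Return {confidence: per cent. correct} for (confidence, direction, deviation) records."""
    tallies = defaultdict(lambda: [0, 0])  # confidence -> [right, total]
    for confidence, direction, deviation in records:
        tallies[confidence][0] += is_correct(direction, deviation)
        tallies[confidence][1] += 1
    return {c: 100.0 * right / total for c, (right, total) in tallies.items()}


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    sample = [
        ("A", "better", -9.0),
        ("A", "worse", 12.0),
        ("D", "better", 1.5),
        ("D", "worse", 0.8),
    ]
    print(percent_correct_by_confidence(sample))
```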
 4. The threshold variation in performance for the judgments of 
 all degrees of confidence varies with the general situation in which 
 the judgment is passed. Within each degree of confidence there are 
 four different judgment situations : 
 
 A. Witness judging performer to be worse than usual. 
 
 B. Performer judging self to be worse than usual. 
 
 C. Performer judging self to be better than usual. 
 
D. Witness judging performer to be better than usual.

The highest threshold is required for situation A, then come, in
order of diminishing threshold, B, C, and D. Similarly, situations
requiring large thresholds show a smaller number of judgments of
the given degree of confidence. If it were correct to refer to these
facts as the "sensitivity" of the judgments, those judgments being
most "sensitive" which require the smallest variations of performance
as their basis, the result might be stated as follows:
 
 A. The most "sensitive" judgments are those in which the wit- 
 ness affirms superior performance on the part of another person. 
 
 B. Next come the performer's own judgments of himself as 
 "better than usual." 
 
 C. Then come the performer's judgments of himself as "worse." 
 
 D. Finally, least "sensitive" of all, the witness's judgments of 
 inferiority on the part of another person. 
 
 In other words, the thresholds for the witness, as compared with 
 those for the performer, are lower for efficiency and higher for 
 inefficiency. This is also shown in the distribution of the judgments. 
 On the question as to whether these differences indicate genuine 
 differences in "sensitivity" or whether they merely show different 
 judgment attitudes or degrees of predisposition, more will be said 
 later. 
 
 5. Test Differences. The four tests may be compared from three 
 different points of view: 
 
A. Average Amount of Variation Required as the Basis for a
Judgment of a Given Degree of Confidence. This comparison may
be most easily made by reference to Table XXV., in which the per
cent. variation for each degree of confidence is given, the direction of
the variation being disregarded and the results of performer and
witness being combined.

The test differences are neither considerable nor very consistent;
addition, color-naming, cancellation, opposites, is the order (from
most to least sensitive) in the long run. In the previous study, in
which tapping, color-naming, and opposites were used as tests,
judgments in color-naming were, as in the present instance, more
sensitive than those in opposites, while tapping was twice as sensitive
as either of these. It was there suggested that "progressively larger
variations in performance are required as the basis for judgments of a
given degree of confidence as one passes from an automatic, objectively
observable performance (such as tapping), through work involving
perceptional reactions (color-naming), to work of a more strictly
mental and less objectively observable character (opposites)." No new
information on this point is afforded by the present study.

TABLE XXV
TEST DIFFERENCES

                               A       B       C       D
Colors:        Threshold      9.6     5.3     2.8     1.0
               Cases          166     256     224     154
Opposites:     Threshold     11.5     7.1     4.4     3.3
               Cases          255     242     186     117
Cancellation:  Threshold     14.5     4.8     2.5      .8
               Cases          129     268     242     161
Addition:      Threshold     10.4     4.1     2.2     2.5
               Cases          202     271     214     113
 
 B. Correctness. In this respect no consistent test differences 
seem to be present. The lowest per cent. correctness for A confidence
 is found in judgments of "better" in color-naming, but in other 
 cases this test shows up as well as any of the others. 
 
 C. Conformity to the Law of Smaller Thresholds for Judgments 
 of "Better." The chief point to be made here concerns addition. 
 This is the only test in which the law does not hold, the usual rela- 
 tion of thresholds being found here only in the B judgments of the 
 performer and the A judgments of the witness. Addition, that is, 
which is the most sensitive test, shows the law least emphatically.
 Color-naming and cancellation, which are about equally sensitive, 
 show the law about equally strikingly. Opposites, which is the 
 least sensitive, shows the law most clearly. This seems to mean that 
 the more difficult the judgments (difficulty being measured by the 
 average constant variation required for a given degree of confidence) 
 the stronger is the predisposition toward "better" judgments. 
 Much the same thing was found in the earlier study in which tapping, 
 color-naming, and opposites were compared with each other. 
 
 6. Individual Differences. Tables XX. and XXI. show the indi- 
 vidual thresholds when the four tests are averaged. All individuals 
 show the same tendency to pass judgments of "better" on smaller 
 average constant deviations, but they do not show it equally clearly. 
 Observer L, whether acting as performer or as witness, always shows 
 the tendency, and under all four degrees of confidence. H (the 
 writer) shows the tendency least clearly. R and P offer occasional 
 exceptions with the lower degrees of confidence. With respect to the 
 magnitude of the variations no consistent individual differences 
 seem to be present. 
 
 7. Amount of Variation and Per Cent. Correctness. In the pre- 
 vious study the figures in the first part of the following table were 
 secured, and in the present study those in the latter part of the table. 
The Δ/P.E. (difference divided by probable error) for these various
percentages of correctness is given as found by using tables presenting
this relation when the per cent. correctness for a given difference is
given (Fullerton and Cattell, "Small Differences").
 
TABLE XXVI
PROBABLE ERRORS

Degree of Confidence           A              B              C              D
                           1st    2d      1st    2d      1st    2d      1st    2d
Av. per cent. diff.       10.8  11.5      5.2   5.3      3.2   3.0      1.9   2.0
Per cent. correctness       98    92       81    73       73    63       59    60
Diff. divided by P.E.     3.05  2.08     1.30   .91      .91   .49      .34   .38
Probable error             3.1   5.5      4.0   5.9      3.5   6.1      5.6   5.3
Av. probable error           4.3            4.9            4.8            5.5
 
Or if the results of the two experiments be averaged, the following
table results:

TABLE XXVII
PROBABLE ERRORS

Degree of Confidence            A       B       C       D
Av. per cent. difference      11.2     5.3     3.1     2.0
Per cent. correctness           95      77      69      60
Diff. divided by P.E.         2.44    1.10     .69     .38
Probable error                 4.6     4.8     4.5     5.2
 
When the probable error is computed in this way, it gives the
amount of difference which will be correctly reported 75 per cent. of
the times. This P.E. is found to be uniformly about 4.8 per cent.
variation from "usual," for all degrees of confidence.
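The computation behind these tables can be sketched as follows. The sketch rests on an assumption: that the conversion tables referred to follow the normal law of error, so that the difference divided by P.E. equals the normal deviate for the observed per cent. correct, divided by the deviate for 75 per cent. (about 0.6745). Most of the printed ratios agree with this to within rounding, but it is not stated in the text. The function names are illustrative.

```python
from statistics import NormalDist

# Normal deviate at which 75 per cent. of judgments are correct (about 0.6745).
_Z_AT_75 = NormalDist().inv_cdf(0.75)


def diff_over_pe(percent_correct):
    """Ratio of the observed difference to the probable error.

    percent_correct must lie strictly between 0 and 100.
    Assumes normally distributed errors (an assumption of this sketch).
    """
    return NormalDist().inv_cdf(percent_correct / 100.0) / _Z_AT_75


def probable_error(avg_percent_diff, percent_correct):
    """Difference that would be correctly reported 75 per cent. of the time.

    Not defined at exactly 50 per cent. correct, where the ratio is zero.
    """
    return avg_percent_diff / diff_over_pe(percent_correct)


if __name__ == "__main__":
    # Degrees of confidence A and B from Table XXVII.
    for label, diff, pct in [("A", 11.2, 95), ("B", 5.3, 77)]:
        print(label, round(diff_over_pe(pct), 2), round(probable_error(diff, pct), 1))
```

Dividing each average difference by its ratio in this way is what brings judgments of very different confidence onto a single scale, which is why the resulting probable errors can be compared directly.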
 
In the same way may be compared the P.E. for "better" and for
"worse" judgments. The results are as follows. The table gives the
results when the two experiments are combined to give averages.

TABLE XXVIII
PROBABLE ERRORS

                               Judgments of "Better"        Judgments of "Worse"
Degree of Confidence            A     B     C     D          A     B     C     D
Av. per cent. difference       8.5   4.6   1.9   1.3       13.7   5.8   4.1   1.5
Per cent. correctness           90    77    63    58         98    77    72    60
Diff. divided by P.E.         1.90  1.10   .49   .30       3.05  1.10   .86   .38
Probable error                 4.5   4.2   4.0   4.3        4.5   5.3   4.3   4.0
 
The probable error is seen to be, in all cases, about 4.5 per cent.
This same P.E. is indicated regardless of the degree of confidence or
of the direction of the variation. When the per cent. correctness and
the amount of the difference are both taken into account the actual
thresholds for judgments of efficiency differ in no way from those of
judgments of inefficiency. Reference to the tables shows that when a
judgment with a given degree of confidence is passed on the basis of
smaller average minus variations than in the case of plus variations
there is usually a falling off in the per cent. correctness. An
 observer is, then, no more sensitive to gain in efficiency than he is 
 to loss, but he is predisposed to judge both himself and a performer 
 whom he is watching as having done "better than usual" rather than 
 "worse than usual." The consequence is that smaller degrees of 
 superiority tend to be judged as better with higher degrees of con- 
 fidence, and that a certain slight degree of inferiority tends to be 
 incorrectly judged as "better." It is this situation which is chiefly 
 responsible for the smaller constant variations on which judgments 
 of "better" are based. 
 
 If the four different judgment situations be now considered, it 
 will be seen that we were not dealing with genuine differences in 
 "sensitiveness" in the earlier tables. The following table shows that 
the probable error for all four judgment situations is quite the same, the
 differences in threshold measuring, in reality, not the sensitiveness 
 of judgments but the strength of a predisposition. We are predis- 
 posed to judge "better" rather than "worse" and we are, further- 
 more, predisposed in favor of the other man rather than of ourselves. 
 
TABLE XXIX
JUDGMENT SITUATIONS

Situation                       Degree of Confidence             A      B      C      D

Witness judging performer to    Av. per cent. difference       14.5    5.0    4.9    1.7
be "worse than usual":          Per cent. correct                96     71     72     58
                                Diff. div. by P.E.             2.60    .82    .86    .30
                                Probable error                  5.6    6.1    5.7    5.6
                                Av. P.E., disregarding D
                                  judgments                     5.8

Performer judging self to be    Av. per cent. difference       13.8    7.7    3.2     .2
"worse than usual":             Per cent. correct                95     78     66     59
                                Diff. div. by P.E.             2.44   1.14    .61    .34
                                Probable error                  5.6    6.7    5.3     .6
                                Av. P.E., disregarding D
                                  judgments                     5.8

Performer judging self to be    Av. per cent. difference        9.2    4.7    2.6    1.3
"better than usual":            Per cent. correct                90     71     61     58
                                Diff. div. by P.E.             1.90    .82    .41    .30
                                Probable error                  4.8    5.7    6.3    4.3
                                Av. P.E., disregarding D
                                  judgments                     5.6

Witness judging performer to    Av. per cent. difference        8.1    3.9    1.3    1.0
be "better than usual":         Per cent. correct                85     70     55     64
                                Diff. div. by P.E.             1.54    .78    .19    .53
                                Probable error                  5.3    5.0    6.8    2.0
                                Av. P.E., disregarding D
                                  judgments                     5.7
 
 The differences found do not then indicate real differences in sen- 
 sitivity under the various judgment situations, they measure the 
 relative strength of these various predispositions, tendencies, and 
 inclinations. These observers were, under all circumstances, dis- 
 inclined to judge any trial as "worse than usual," and the disinclina- 
 tion was stronger when judging as witness than when judging as 
 performer. This results in a combination of optimism and altruism 
 which, if found to be a common occurrence, would seem to have 
exceedingly interesting psychological and perhaps social implications.
 Further investigation will perhaps show that these predispositions are 
 conditioned, under different circumstances, by a variety of factors, 
 such as competition, education, motive, age, sex of performer and wit- 
 ness, and perhaps by individual differences of a temperamental sort.
 
 CHAPTER IV 
 
THE CENTRAL TENDENCY OF JUDGMENT 1
 
SINCE the work of the early investigators of the time sense the
concept of the "indifference point" (I.P.) has played an ever-present
role in experiments on judgments of magnitude, duration,
 and intensity. Judgments of time, weight, force, brightness, extent 
 of movement, length, area, size of angles, have all shown the same 
 tendency to gravitate toward a mean magnitude, the result being 
 that stimuli above that point in the objective scale were underesti- 
 mated and stimuli below overestimated, while the mean magnitude 
 itself was invested with no constant error. This region in the scale, 
 flanked above and below by negative and positive constant errors, 
 was called the indifference point, or more properly the region of 
 indifference. 
 
 The tendency has been throughout to infer that the I.P. dis- 
 closed in any particular experiment was in some way an absolute 
 quantity and should be found in other experiments on the same 
quality of stimulus. In this way arose the ideas of a "most favorable
extent" (Kramer and Moskiewicz, Jaensch) and a "most favorable
time" (Vierordt, Höring, Estel, etc.). Among the investigators
of the time sense, since an I.P. was found for every group of
intervals employed, grew up the doctrine of periodic I.P.'s, those
for regions higher up in the scale being multiples of the I.P.'s found
in the experiment in which the shortest intervals were used. Attempts
were made to correlate the unit of periodicity with various
bodily processes: the swing of the leg, breathing time, pulse beat
(Wundt, Münsterberg). All of this speculation passed the criticism
of laboratory workers and was incorporated in the general
 texts as a curious fact, productive of many illusions and constant 
 errors, but the analysis was carried no farther. 
 
 In an earlier study 2 the writer undertook an experimental analy- 
 sis of the phenomenon of the I.P. in judgments of the duration and 
 extent of rectilinear arm movements. The results of this investiga- 
 tion showed conclusively that, with the method of reproduction, the 
 following principles hold. 
 
1 Reprinted from The Journal of Philosophy, Psychology, and Scientific
Methods, Vol. VII., No. 17, August 18, 1910.

2 "The Inaccuracy of Movement," H. L. Hollingworth, Columbia Contributions,
Vol. XVII., No. 3, June, 1909.
 
 I. The I.P. is relative, not absolute. It is a function of the 
 series limits of the stimuli employed. Given the series of magni- 
 tudes with which we are to work, we may be quite certain that a 
 region of indifference will occur at about the midpoint of that 
 particular scale. 
 
II. A periodic I.P. can be found within a total series (S) by
working with its special sections (A, B, and C).
 
 III. The same absolute magnitude may be either an I.P., or af- 
 fected with a positive constant error, or with a negative constant 
 error, according to the particular range or section in which it occurs. 
 
 IV. The gradual extension of the series limits is accompanied 
 by a corresponding shift in the region of indifference. 
 
 V. No magnitude estimated out of relation to a series or group 
 of which it is a member evinces any considerable constant error. 
 
 VI. The phenomenon of the I.P. disappears as the interval 
 between separate judgments is extended. The first disposition is 
 soon dissipated and is no longer adequate to affect the second 
 performance. 
 
VII. In a parallel tabulation of the I.P.'s and the ranges of
intervals used in the various time-sense studies the influence of the
latter on the magnitude of the I.P. is clearly seen.

VIII. The phenomenon of the I.P. and the so-called positive
and negative time errors result from a general law, the central
tendency of judgment. In all estimates of stimuli belonging to a
given range or group we tend to form our judgments around the
median value of the series; toward this mean each judgment is
shifted by virtue of a mental set corresponding to the particular
range in question. This central tendency is not a "law of sense
memory." It is a law of immediate perception and disappears as the
experiment becomes a memory test.
 
 IX. In experiments by the method of reproduction this central 
 tendency is reenforced by the law of motor habit. 
 
 For an account of the experiments on which these conclusions 
 rest and for detailed exposition of their significance the reader 
 must be referred to the earlier study. 
 
 The Present Study 
 
 Purpose. On account of the reenforcing value of the law of 
 motor habit the earlier experiments did not indicate how clearly or 
 in how far the results secured were a function of the method of 
 motor reproduction. In order to support the case completely it 
 should be shown that the same law of judgment is present in ex- 
 periments into which the method of reproduction does not enter.
 
 
 In order to put the generalization to such a test the following ex- 
 periments have been made on judgments of the size of squares, by 
 the method of selection. 
 
 Observers. The observers were all women students in Barnard 
 College with from one and a half to two and a half years of train- 
 ing in psychology. Different observers were used in the two ex- 
 periments and none of them knew the purpose of the experiment, 
 nor were they familiar with the results of the earlier study. 
 
 Material. The material used in both experiments A and B was 
 the same, the chief differences between the experiments consisting 
 in the way in which the series limits were varied. On a dark gray 
 wall were placed 30 squares of light gray cardboard, ranging in 
 size from 2.5 cm. on a side to 50 cm. and increasing from 2.5 to 
 7 cm. by increments of 0.5 cm., from 7 to 15 cm. by increments of 
 1 cm., from 15 to 40 cm. by increments of 2.5 cm., and on to 50 cm. 
 by increments of 5 cm. Each card was numbered in consecutive 
 order. Alongside these standard cards and at the same distance 
 from the observer was an exposure apparatus, by means of which, 
 at proper intervals, the fourteen test cards could be presented one 
 at a time. These test cards varied in size from 3 cm. to 40 cm. on 
 the side, ranging from 3 to 7 cm. by increments of 1 cm., from 7 to 
 15 cm. by increments of 2 cm., from 15 to 40 cm. by increments 
 of 5 cm. 
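The two sets of cards can be regenerated from the increments just described, which gives a quick check on the stated counts of 30 standard and 14 test cards. This is an illustrative sketch; the helper `stepped` and the names `STANDARD_CARDS` and `TEST_CARDS` are not from the original.

```python
def stepped(start, stop, step):
    """Sizes after `start`, up to and including `stop`, in increments of `step`."""
    count = round((stop - start) / step)
    return [start + i * step for i in range(1, count + 1)]


# Standard cards: 2.5 to 7 cm. by 0.5, 7 to 15 by 1, 15 to 40 by 2.5, on to 50 by 5.
STANDARD_CARDS = (
    [2.5]
    + stepped(2.5, 7, 0.5)
    + stepped(7, 15, 1)
    + stepped(15, 40, 2.5)
    + stepped(40, 50, 5)
)

# Test cards: 3 to 7 cm. by 1, 7 to 15 by 2, 15 to 40 by 5.
TEST_CARDS = [3] + stepped(3, 7, 1) + stepped(7, 15, 2) + stepped(15, 40, 5)

if __name__ == "__main__":
    print(len(STANDARD_CARDS), "standard cards")
    print(len(TEST_CARDS), "test cards:", TEST_CARDS)
    # Every test size has an exact counterpart among the standards.
    print(all(size in STANDARD_CARDS for size in TEST_CARDS))
```

Every test size has an exact match in the standard series, so a constant error of zero was always attainable.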
 
 Procedure. In each experiment a test card was exposed for 5 
 seconds. The observer then waited for 5 seconds, the eyes resting 
 meanwhile on a dark screen. She then turned to the standard 
 series and was allowed 5 seconds in which to select a card corre- 
 sponding in size to the one just exposed and to write its number in 
 her record. A second test card was then exposed, and so on through- 
 out the experiment. By keeping a record of the order in which the 
 test cards were shown, the experimenter was able subsequently to 
 compare the observer's judgment with the actual magnitude. As a 
 result of this method of selection all constant errors due to the law 
 of motor habit in reproduction are eliminated and any error dis- 
 closed will be entirely an error of judgment of visual magnitude. 
 
 Experiment A 
 
 This experiment began with series 3, 4, 5, 6, 7, three trials for 
 each magnitude, in chance order. The smallest card (3) was then 
 dropped and the larger card (9) substituted, and three trials taken 
 in chance order, for each member in the new series 4, 5, 6, 7, 9. In 
this way the successive series moved up along the total range, dropping
at each change the lowest member and including the one next
larger than the greatest number. The series, that is to say, always
 consisted of 5 test cards, and as the experiment progressed, magni- 
 tudes were dropped from the lower end and new ones added to the 
 upper end. Ten observers were used, 150 trials being taken on each 
observer. Table XXX. gives the C.E. of the 10 observers in terms of
the square root of the area, that is, in terms of the length of one
side of the square. Each figure is the C.E. resulting from 30
judgments.
 
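TABLE XXX

GIVES THE C.E. IN CM. OF EACH CARD IN EXPERIMENT A.
10 OBSERVERS, 1,500 TRIALS

Series   Cards in series        C.E. of the cards, in order
  1      3, 4, 5, 6, 7          -.13   -.23   -.24   -.21   [one entry missing]
  2      4, 5, 6, 7, 9          +.15   +.52   +.53   -.01   +.44
  3      5, 6, 7, 9, 11         +.51   +.15   -.11   +.32   +.31
  4      6, 7, 9, 11, 13        +.19   +.39   +.55   +.21   -.02
  5      7, 9, 11, 13, 15       +.31   +.21   +.42   -.13   [one entry missing]
  6      9, 11, 13, 15, 20      +.74   +.75   +.64   +.56   +.48
  7      11, 13, 15, 20, 25    +1.31   +.80  +1.37  +1.73  +2.15
  8      13, 15, 20, 25, 30    +1.39  +1.60  +1.84  +1.43  +1.92
  9      15, 20, 25, 30, 35     +.94  +1.72  +2.15   +.98   +.90
 10      20, 25, 30, 35, 40    +2.40  +2.65  +1.50   +.45  +1.78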
 
 Experiment B 
 
 This experiment began with the series 3, 4, 5, 6, 7, 9. Three 
 trials for each magnitude were taken in chance order. The next 
 higher magnitude (11) was then added to the series and again 3 
 trials for each magnitude (3-11) were taken in chance order. At 
 this point the next magnitude (13) was introduced, 3 trials for each 
 card taken, and the process continued until in the ninth series the 
 whole range of test cards from 3 to 40 was included. Six observers 
 were used, 270 records being taken from each observer. Table XXXI. 
gives the C.E. of the 6 observers for each magnitude in each succeeding
series. As in Table XXX., the errors are given in terms of
one side of the square. Each figure in the table is the C.E. of 18
judgments of the same card.

TABLE XXXI

GIVES THE C.E. OF EACH CARD IN EXPERIMENT B.
6 OBSERVERS, 1,620 TRIALS

Series     3     4     5     6     7     9    11    13    15    20    25    30    35    40
  1       .03   .10   .08   .42   .25   .58
  2       .03   .17   .15   .45   .25   .65   .86
  3       .03   .26   .48   .60   .11   .80   .89   .60
  4       .03   .53   .73   .88   .45   .40   .53   .65  1.43
  5       .03   .65   .98   .83  1.05   .43   .36   .52  1.60  2.63
  6       .05   .65  1.05   .78   .85   .72   .43  -.25  1.62  2.05  2.40
  7       .03   .76  1.05   .90   .92   .93   .80  1.00  1.35  1.73  2.25  4.82
  8       .05   .87  1.12   .73  1.23   .70   .82  1.83  1.27  1.77  1.63  1.85  3.08
  9       .08   .68  1.08   .87  1.10   .75   .42   .92  1.52  1.57  1.43   .97  2.10  4.42
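The two designs differ only in how the series limits move: in Experiment A a window of five cards slides up the range, while in Experiment B the series grows from six cards to all fourteen. A short sketch, with illustrative function names not taken from the original, reproduces the series and confirms the trial counts given in the text.

```python
TEST_CARDS = [3, 4, 5, 6, 7, 9, 11, 13, 15, 20, 25, 30, 35, 40]


def sliding_series(cards, width=5):
    """Experiment A: drop the lowest card and add the next larger at each change."""
    return [cards[i:i + width] for i in range(len(cards) - width + 1)]


def growing_series(cards, first_width=6):
    """Experiment B: keep the lower cards and add the next larger at each change."""
    return [cards[:width] for width in range(first_width, len(cards) + 1)]


def trials_per_observer(series, trials_per_card=3):
    """Total judgments made by one observer over all series."""
    return trials_per_card * sum(len(s) for s in series)


if __name__ == "__main__":
    exp_a = sliding_series(TEST_CARDS)
    exp_b = growing_series(TEST_CARDS)
    print(len(exp_a), "series,", trials_per_observer(exp_a), "trials per observer (A)")
    print(len(exp_b), "series,", trials_per_observer(exp_b), "trials per observer (B)")
    print(10 * trials_per_observer(exp_a), "trials in all (A)")
    print(6 * trials_per_observer(exp_b), "trials in all (B)")
```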
 
 In each of these experiments we have another case of the grad- 
 ual extension of series limits, and if the law of central tendency is 
operative, I.P.'s might be expected to occur in each series and
 gradually to rise in the range as the larger magnitudes are added. 
 The A.E. and its variability are not given in the tables, since only 
 the C.E. is of interest for the problem in hand. As a matter of fact 
 the phenomenon of the I.P. is concealed in both experiments by a 
 strong positive constant error which comes from a general tendency 
 to overestimation in judgments of square magnitudes. This tend- 
 ency has been found by other investigators. Woodworth and 
 Thorndike find a positive constant error in estimates of area by a 
 mental standard. Baldwin, Shaw, and Warren find the same tend- 
 ency in judgments of the size of squares and attribute it to a 
 change in the memory image. This error, however, is irrelevant to 
 the present problem. The important fact is that underneath this 
 ever-present overestimation the law of central tendency is also 
 operative, and its presence can be clearly shown by a proper analy- 
 sis of the figures. 
 
Casual examination of Table XXX. shows that the positive constant
error for any one magnitude increases as the place of the magnitude
in the series descends. Thus the - C.E. (-.21) for card 7 in
series 1 changes to a decided + C.E. (+.39, +.31) in series 4 and
5. The + C.E. (+.31) of card 11 increases to +1.31 in series 7,
and the errors of the other cards undergo in a strikingly uniform
way the same transformation. This is a clear indication that in any
one series the magnitude is influenced by other magnitudes occurring
above and below it and is in every case shifted toward the center of
the series. Thus in series 1 card 7 is drawn toward the smaller
magnitudes, and its judgment results in a - C.E. In series 5 the
same card is drawn toward a higher set of magnitudes and hence
acquires a decided + C.E.
 
 The process is clearly shown by an examination of the 6 cards 
 (7 to 20, inclusive) that occurred in all 10 series. Each of these 
 cards occupied, in the course of the experiment, all 5 positions. 
Thus card 11 is in series 3 the largest magnitude; in series 7 it is
 the lowest ; in series 5 it is the central card ; while in series 4 and 6 
 it occupies the intermediate positions on either side of the center. 
 The same, in appropriate series, is true of all 6 cards, from 7 to 20 
 inclusive. Now if there were no source of error present except the 
central tendency of judgment each card should have theoretically
no C.E. when it occurred in the middle of a series, i. e., it should
be the I.P. for that series. But, since there is another error present
due to the general tendency to overestimation in judgments of square
size, the theoretical conditions are not fulfilled, and each card, even
when it occurs in the central position, shows an actual + C.E.
We may assume, then, that the error shown in this central position
is due to the character of the material, and that so far as the law
 of central tendency is concerned it may be considered 0, or what we 
 might call the normal error. If the errors of any magnitude in the 
 successive series from 1-10 be calculated with respect to this normal 
 error, the operation of the law of central tendency should lead to the 
 following results. As the series progress the relative errors of any 
 magnitude, that is, the deviations of the actual from the normal 
errors, should show an I.P. phenomenon: they should be negative
 above the normal, zero at the normal, and positive below it. The 
 facts are shown in Table XXXII., in which, for cards 7-20, the error 
 of each card when it occurred in central position is assumed to be 
 normal. It will be seen that above the normal the errors are, with 
 a single exception, negative, while below they are, with only three 
exceptions, positive. The transformation is from a high - value
through 0 to a high + value.
 
TABLE XXXII

Card      7       9      11      13      15      20

        -.10    -.11    -.11    -.66   -1.37   -1.32
        +.10    -.23    -.21    -.77    -.81    -.11
          0       0       0       0       0       0
        +.28    -.34    +.33    +.16    +.23    -.12
        +.20    +.19    +.89    +.75    +.43    +.68
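
The relation between the actual and the relative errors may be
illustrated by a short computational sketch in Python. It is offered
only as an illustration and forms no part of the original treatment.
The function names are hypothetical, and the pairing of the first and
last entries of the card 11 column of Table XXXII. with the two actual
errors quoted above for that card (+.31 and +1.31) is an assumption.
Under that assumption both entries imply one and the same normal error.

```python
# Constant errors are held in hundredths of a unit so that the
# subtraction is exact integer arithmetic.

def relative_error(actual, normal):
    """Deviation of an actual constant error from the normal error."""
    return actual - normal


def implied_normal(actual, relative):
    """Normal error implied by an actual error and its tabled deviation."""
    return actual - relative


# Actual errors of card 11 quoted in the text: +.31 and +1.31 (series 7).
ACTUAL_FIRST, ACTUAL_LAST = 31, 131
# First and last entries of the card 11 column of Table XXXII: -.11, +.89.
# Pairing these entries with the two actual errors is an assumption.
RELATIVE_FIRST, RELATIVE_LAST = -11, 89

normal_from_first = implied_normal(ACTUAL_FIRST, RELATIVE_FIRST)
normal_from_last = implied_normal(ACTUAL_LAST, RELATIVE_LAST)
print(normal_from_first, normal_from_last)  # 42 42
```

If the pairing is correct, the error of card 11 in its central
position would be about +.42, and every entry of its column is simply
the actual error less that amount.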
 
 Thus from any point of view in which the figures may be re- 
 garded the central tendency of judgment is revealed, working, how- 
 ever, underneath a general tendency to overestimation. This result is 
 confirmed by the results of Experiment B, in which the lower mag- 
 nitudes were allowed to remain in the series while the higher were 
 being added. The results appear in Table XXXI. Again there is 
 present the positive constant error due to the character of the 
 material, but underneath the central tendency is clearly to be seen. 
 
The magnitudes here used fall into three groups. To the first
group belong cards 3-9, present in all 9 series, and influenced in
judgment by the gradual inclusion of the higher magnitudes 11-40.
According to the aforestated law the effect of these higher magnitudes
should be to draw the lower cards toward a constantly augmenting
center, that is, as the higher cards appear one by one, the
central tendency of the respective series rises. The positive errors
of cards 3-9 should thus become constantly greater as the experiment
proceeds. Again the deductions are strikingly verified. Thus the
error of card 4 increases from + .10 in series 1 to + .68 in series 9;
that of 5 from + .08 in series 1 to + 1.08 in series 9, etc. This effect
is due, in any one series, partly to the introduction of still higher
magnitudes, partly to habituation to the larger cards already
introduced and now being repeated.
 
 The second group of magnitudes consists of cards 20 to 40 
 inclusive. When any one of these cards, say 20, is introduced, the 
 observer is already considerably adapted to the lower magnitudes, 
 and as the next higher card (25) is introduced in the following series 
 this adaptation to the lower cards is much furthered by the fact that 
 each of the 9 cards below 20 is again repeated three times, while 
 adaptation to magnitudes higher than 20 is only slightly begun by 
 the threefold repetition of card 25. The consequence is that as the 
 experiment proceeds habituation to the lower range increases much 
 more rapidly, at first, than that to the upper range, on account of 
 the greater number of lower cards. In this group, then, we should 
expect transformations just the reverse of those in group I., that is,
the + C.E.'s should become constantly smaller as the high card is
drawn more and more in judgment toward the center of the series.
Again expectation is confirmed. The error of card 20 falls from
+ 2.63 in series 5 to + 1.57 in series 9; that of card 25 from
+ 2.40 to + 1.43; that of card 30 from + 4.82 to + .97; and that
of card 35 from + 3.08 to + 2.10.
 
 There remain yet to be considered the three cards 11, 13, and 15, 
 comprising group three. This group, standing as it does midway 
 between groups one and two, which show directly opposite trans- 
 formations, might be expected to show either of two results. First, 
 the two tendencies might neutralize each other, the errors in group 
 three remaining approximately constant or varying irregularly. 
 Second, the first tendency might operate in the first few series, after 
 which, by virtue of increasing habituation to the larger cards the 
 second tendency might begin to assert itself in the later series. So 
 far as the figures go they are sufficiently irregular to admit of either 
 interpretation. There is neither uniform increase nor decrease 
throughout. There is, in fact, a strong suggestion of the second
possible result, initial decrease followed by increase as habituation
to higher magnitudes grows. Thus the errors of card 11 fall from
+ .86 in series 2 to + .36 in series 5, then increase to + .80 and
+ .82 in later series. Card 13 falls from + .60 in series 1 to
.25 in series 6, then increases to over + 1.00 in series 7-9. Card
15 falls to + 1.35 in series 7, increasing to + 1.50 in the last series.

TABLE XXXIII

            Card
Series      3       4       5       6       7       9

  1       -.01    -.42    -.67    -.30    -.44    -.08
  2       -.01    -.35    -.60    -.27    -.44    -.01
  3       -.01    -.26    -.27    -.12    -.58    +.14
  4       -.01    +.01    -.02    +.16    -.24    -.26
  5       -.01    +.13    +.23    +.11    +.36    -.23
  6       +.01    +.13    +.30    +.06    +.16    +.06
  7       -.01    +.24    +.30    +.18    +.23    +.27
  8       +.01    +.35    +.37    +.01    +.59    +.04
  9       +.04    +.16    +.33    +.15    +.41    +.09
 
 One could scarcely ask for more convincing evidence of the law 
 of central tendency than that afforded by the behavior of the C.E.'s 
 in these three groups of magnitudes. The evidence may be re- 
 enforced, however, and the process more clearly exhibited by further 
treatment of the errors in group I., consisting of cards which were
 present in all 9 series. In the case of this experiment we have no 
 means of determining, as we did in experiment A, the normal error 
 due to the character of the material. We may, however, observe the 
 deviations of the errors in a given series from the average of the 
 errors in the whole 9 series. These deviations should show, as did 
 Table XXXII. for experiment A, an indifference point phenomenon 
 for the errors of any given magnitude in successive series. Such a 
 calculation results in Table XXXIII. As was to be expected, the I.P. 
 phenomenon is clearly present. The successive deviations from the 
 average, in the case of the errors for any given magnitude, pass 
 from pronounced negative direction through an approximate zero 
 point to a pronounced positive direction. This change was caused 
 in every case by the inclusion of higher magnitudes in the series, 
 thus producing an upward shift in the central tendency or median 
 of the series, toward which each lower magnitude was assimilated 
 in greater or less degree, according to the amount of habituation to 
 the upper range. 
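
The calculation behind Table XXXIII. may be sketched in Python as
follows. This is an illustration only, not the writer's own procedure
in any detail: the function names are hypothetical, the three sample
errors passed to the first function are invented for the example, and
the values are held in hundredths. The second function merely locates
the series at which the deviations for one card first become positive,
here applied to the card 5 column of Table XXXIII.

```python
def deviations_from_mean(errors):
    """Deviation of each error from the average of all the errors."""
    mean = sum(errors) / len(errors)
    return [e - mean for e in errors]


def crossing_series(deviations):
    """1-based series number at which a deviation first exceeds zero."""
    for series, deviation in enumerate(deviations, start=1):
        if deviation > 0:
            return series
    return None  # the deviations never become positive


# Invented errors (hundredths), used only to show the subtraction.
print(deviations_from_mean([10, 30, 50]))  # [-20.0, 0.0, 20.0]

# Card 5 column of Table XXXIII, series 1 to 9, in hundredths.
CARD_5_DEVIATIONS = [-67, -60, -27, -2, 23, 30, 30, 37, 33]
print(crossing_series(CARD_5_DEVIATIONS))  # 5
```

For card 5 the deviations are negative through series 4 and positive
from series 5 onward, which is the passage through an approximate zero
point described above.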
 
It is not necessary to go further into the theoretical and interpretative
consideration of the law of central tendency, since the
writer has already discussed this elsewhere. 3 But it should be
pointed out that none of the factors usually introduced to explain
the occurrence of indifference points are adequate. Unexplained
differences in time error (Fechner), mechanical sources of error in
apparatus (Schumann), peculiarity of the sense organ (Vierordt),
lack of current motor control (Delabarre), relative expenditure of
energy (Wundt), change in the memory image (Wreschner, Leuba),
fatigue and dynamogeny, all these may contribute their share toward
the actual magnitude of a given error, but their influence can hardly
be conceived as varying up and down a scale of objective magnitudes
in such a way as to account for the shifting I.P. with extension of
the series limits.

3 "Inaccuracy of Movement," Chapter III.
 
 Nor is the phenomenon in any way the result of contrast. It is, 
on the contrary, just the reverse: a case of two magnitudes approximating
each other in judgment by virtue of their temporal contiguity.
 The tendency seems explicable only in terms of itself. Just as our 
 experience with a race, class, or social group results in the conception 
 of a type which shall in some way represent the central tendency of 
 the group, and from which the separate members shall deviate the 
 least, so in an experiment on sensible discrimination we become 
 adapted to the median value of the series, tend to expect it, to as- 
 similate all other values toward it, and to greater or less degree to 
 substitute it for them. Either this tendency is the rudimentary 
 process out of which the higher acts of conception grow, or it is the 
 habit of conception extended to sensory fields and interfering with 
 a quite elementary process of comparison and recognition. The 
 importance of the law in any series of psychophysical measurements 
 should be apparent. The error to which it leads is distinctly an 
 error of judgment, and is quite independent of sensory or physiolog- 
 ical conditions which may of themselves be sources of other types 
 of errors.
 
 CHAPTER V 
 
 THE DIRECTION OF JUDGMENT 
 
 So far as the writer is aware the only discussion of the influence 
 of the direction of judgment is to be found in the works on psycho- 
 physics. In these works the problem is handled chiefly as a point in 
 experimental technique and treated as an issue which must be dis- 
 posed of before some further problem can be most precisely ap- 
 proached. In the several papers that follow this chapter the phe- 
 nomena of preferred or accustomed directions, inclinations, or tend- 
 encies of judgment, and the influence, on the outcome of the 
 judgment, of the form or category in which it is expressed, are them- 
 selves to have the place of chief interest. In place of the simple 
 stimuli used in the psycho-physical studies, material of a more com- 
 plex sort has been employed. This has been done partly because of 
 immediate interest in these little-studied subjective types of judg- 
 ment, and partly because of a preliminary assumption that this kind 
 of material would involve processes and criteria which might be more 
 sensitive to the influences just mentioned than might be the case with 
 descriptively simpler and more objectively measurable material. 
 
 By way of introduction to the three chapters which follow it may 
 be of interest to sketch briefly some of the chief sections in the litera- 
 ture of psycho-physics in which the problem of the direction of 
 judgment has been raised. 
 
 In Fechner's experiments on the discrimination of weights the 
observer was required to pass one of two kinds of judgments: he
 might designate the heavier weight or he might express himself as 
 uncertain. When a comparison was expressed, that is to say, the 
 direction of judgment was determined by the quality of the stimulus 
 rather than by its time or space order. The subject of the proposi- 
 tion expressing the judgment was always the heavier weight, which 
 might be either the right or left, the first or second, in order of 
 presentation. 
 
G. E. Müller 1 devotes considerable space to a preliminary discussion
of "die Urtheilsrichtung." Müller points out that which of the
six possible ways of expressing the relation between a standard and a
variable stimulus is used (indicating the heaviest or lightest, or
describing the first or second, right or left) is not a matter of indifference.
For at least three reasons the observer should always, on beginning
an experiment, be given definite instructions with respect to the
direction of his judgments, and these instructions should be recorded.
In the first place the six directions differ in convenience and ease, both
for operator and for observer. In the second place, the results of
some methods of instruction are more informative than others.
Finally the part played by "absolute impression" depends somewhat
on the direction of attention toward the one or the other stimulus.
These remarks hold whether the order of presentation be simultaneous
or successive.

1 "Die Gesichtspunkte und die Tatsachen der psychophysischen Methodik,"
p. 16 ff.
 
The instruction to direct the judgment always toward the standard
or toward the variable, Müller dismisses because of the danger of
confusion, either in the mind of the observer or in the records. Nor
is the method of periodically changing direction felt to be satisfactory.
When the two stimuli are simultaneous the preferable procedure is
held to be that of "free direction" in which, whether the judgment
shall relate to the first or second, standard or variable, heavier or
lighter, is left to the option of the observer. Two reasons are given
for this preference for the method of "free direction." The first
is found in the statement that such procedure "does least possible
violence to the psychological tendency of the observer." The second
is the fact that, given a good observer and an appropriately planned
experiment, information can be secured concerning the observer's
type and his attention characteristics by examining the frequency of
the various forms of judgment. It was by utilizing this method that
Müller classified his observers as positive or negative in type.
 
In the case of successive stimuli Müller believes that the method
in which the judgment always relates to the second stimulus is far
superior to any other method of "prescribed direction," "because
this is the simplest, most natural method, and the one most free from
omissions and confusions." No experiments with successive presentation
and free direction of judgment are recorded. Müller, however,
asserts that the method of "free direction" with respect to space
position is always to be recommended. With respect to temporal
position no experiments are recorded. The same is true of procedure
with absolutely free direction in which judgment may refer, at the
discretion of the observer, toward either the right or left, first or
second, stimulus.
 
Three points are to be noted in Müller's discussion. One is the
statement that if the direction of judgment is to be prescribed, the
direction should always be toward the second stimulus because this is
the "simplest and most natural method." The second is the statement
that observers have psychological tendencies which may be
violated. The third is the assertion that the direction of attention
may influence the distribution of the judgments.
 
Müller and Schumann instructed their observers to direct their
judgment toward the second stimulus presented. Martin and Müller
(Unterschiedsempfindlichkeit) experimented by various methods, such
as judgment on the (a) variable, (b) standard, (c) heavier, (d)
second, ignoring the method of judging always on the first. Fechner's
method (judging which is heavier) is said to complicate unduly
the process of judgment. It is asserted that if the difference between
standard and variable is clear the observer always decides at once
how the second compares with the first, and the reply is made much
more easily under the Müller and Schumann method (judgment on
the second). "If the observer must say which is lighter or which is
heavier the psychological process is too complex. Subjects complain
of having to hold the impression in memory while deciding its position."
This method is also objected to because of difficulties in
recording, on the part of the operator.
 
Methods with judgment always on standard or on variable are
also reported to be both unnatural and too complex, and to present
difficulties in the matter of records. This leaves the Müller and
Schumann method as the preferable procedure. But it should not
fail to be noticed that the "difficulties" and "complexities" of the
other methods are, for the most part, reported by the operator, or on
the part of observers already long practised in the Müller and
Schumann method.
 
Fullerton and Cattell 2 in experiments on extent of movement, on
lifted weights and on lights, instructed their observers to state the
relation of the second to the first stimulus. In the general discussion
of the psycho-physical methods these investigators state that "the
method of right and wrong cases in which two stimuli nearly alike
are presented to an observer and he is required to say which seems
the greater is the most accurate method" (p. 150). But this seems,
in the light of their procedure, to have meant not that the category
of greatness should be employed, but that the magnitude or intensity
of the second be compared with that of the first. The second stimulus
was always the subject of the proposition expressing the judgment.
Fullerton and Cattell do not take up the question of "direction of
judgment" for its own sake.

Titchener 3 in describing the method of right and wrong cases
advises that "O judges always in terms of the weight lifted second,"
and refers, by way of reasons for this procedure, to Müller's
discussion.

2 "Small Differences."
3 "Experimental Psychology, Student's Quantitative Manual," p. 119.
 
Warner Brown, in an interesting study of the various factors
influencing the judgment of difference in the case of lifted weights,*
compared, with one observer, the Fechnerian method with that of
Müller and Schumann. In discussing the difference between them
Brown remarks on the way in which the form of expression may, by
inducing a particular mental set or bias, modify the total distribution
of the judgments. The following paragraphs are quoted.
 
 "The group which appears to better advantage here is that which 
adopts the procedure recommended by Müller and Schumann. It
 has less errors in all and a less dispersion of errors toward the larger 
 differences. It also shows a less exaggerated constant error. So far 
 as the small number of cases warrants any conclusion, it seems also 
 to present a more symmetrical distribution of plus and minus errors, 
 and to have greater regularity. . . . The results leave no doubt that 
 a difference in the framing of two propositions which are precisely 
 equivalent logically will be a governing factor in making a compari- 
 son. Evidently no comparison is complete with the mere apprehen- 
 sion of the presented stimuli. These are apprehended in the light of 
 other stimuli which have gone before, but even then the analysis is 
 not complete without taking account of what the observer has to do 
 in the matter. Even the slightest differences in the task which he has 
to perform seem to govern to some extent his decisions."
 
 "To speak of the 'perception of difference' in such a case is to 
 obscure some of the factors in the actual situation. The difference 
 is not merely perceived. The process of comparison involves the 
 active operation of the mind in the expression of a judgment upon 
 the situation in which the difference is only one factor. When this 
 difference is acted upon through one set of categories and with one 
 mental set it occasions one definite reaction, while if it is taken into 
 another set of categories it goes through different mental machinery 
 and comes out different. If it were possible to catch an instantane- 
 ous view of the two experimental groups under consideration, there 
 is no doubt that a weight of 95.5 grams would be sensibly lighter than 
 100 in the one and heavier in the other. The stimuli to be compared 
 are identical and the difference involved is not conceivably other than 
 identical. Moreover the logical relations of the terms are equivalent. 
 And yet this difference comes out plus in one group and minus in the 
 other. In the instantaneous view it is judged to be sensibly other; 
to be two distinct differences."

". . . If it be true that the mind will more readily give expression
to 'greater' than to 'less,' the fault is certainly not in the perception
of the particular difference but rather in the mind's attitude
toward all differences. Such a defect would permeate all quantitative
judgments and would, in fact, be a defect of judgment itself.
There seems to be evidence that some of the abnormalities observed in
the comparison of weights are traceable to such subtle eccentricities
in the machinery by which all judgments of difference, in any
material, are expressed."

* "The Judgment of Difference," California Studies in Psychology, No. 1.
 
Henmon has recently reported observation of decided preferences
in the direction of judgments of length of lines. 5 "One curious constant
error in judgments of the shorter line appeared in the results.
All of the subjects, particularly Br and H, noted early in the experiments
that judgments could be more easily given, more quickly, and
with greater confidence when reaction was to be made to the shorter
line. The feeling that the most accurate judgments would be secured
with the shorter line was very marked. . . . The results in part confirm
the introspections and in part do not. The general averages show
in each case that the greater number of wrong judgments was obtained
to the shorter line though the differences are not significant except in
the case of Br. However the number of right A judgments (judgments
with high degree of assurance) to the shorter line is almost
twice as great as to the longer line, except in the case of Bl where the
difference is not marked."
 
Burt 6 remarks: "It may be of interest to note, as bearing on the
psychological theory of comparison of sense impressions, that the
natural tendency of the boys seemed invariably to be to indicate, by
pointing or naming, the heavier of the two weights, rather than to
pronounce a judgment directly expressing an 'absolute impression' of
the heaviness or lightness of the last lifted."

5 "Time and Accuracy of Judgment," Psych. Rev., May, 1911, p. 193.
6 "Experimental Tests of General Intelligence," Brit. Jour. Psychol., 1909,
p. 20.

The Present Studies

In the three following chapters, on "Natural or Habitual Tendencies
of Judgment," "Judgments of Similarity and Difference,"
and "The Influence of Form and Category on the Outcome of a
Judgment," will be reported a series of experimental inquiries designed
chiefly to discover the character and degree of such natural or
habitual tendencies or inclinations of judgment as are revealed under
experimental conditions, to investigate any individual differences
that may be indicated, and to examine into the way in which changes
in logical category or form of expression may influence the outcome,
the consistency, and the variability of judgment. Special attention
will be given to the psychological process and criteria underlying
judgments which are, from a grammatical or logical point of view,
only two sides or modes of expression of one and the same intellectual
act. The interest throughout will not be in technique of experimental
procedure as has been the case for the most part in the studies just
referred to, nor will any attention be given to the relation between
objective measurement and subjective estimation. The interest will
be in the judgments themselves, their behavior and criteria, and the
way in which these are influenced by changes in the task, situation,
or mental set in the interest of which the judgment is passed.
 
 CHAPTER VI 
 
NATURAL OR HABITUAL TENDENCIES OF JUDGMENT 1
 
 THE preceding studies have demonstrated the important part 
 played by direction, form, and category in determining the outcome, 
 consistency, and variability of judgment. The present study reports 
 an attempt to learn whether there are some tendencies, categories, or 
 forms of expression which are most naturally or habitually employed, 
 and to learn how such inclinations, if present, vary with individual, 
 with age, and with the modality or general situation in which the 
 judgment is passed. The experiments have been performed on naive 
 subjects, who neither knew the purpose of the experiments nor were 
 practised in any of the psycho-physical methods. They are more- 
 over limited to results from a group of school children and a group 
 of college students (women). The original plan included a group of 
 male observers but the conditions under which the work was done 
 have made it impossible to secure this third group of observers. The 
 original plan included also a study of the way in which the preferred 
 direction of judgment might vary with the position of the group of 
 stimuli in the total possible range of magnitudes, intensities, etc. 
 But this first section (here reported) proved to require a longer 
 time for its completion than had been expected. Unavoidable inter- 
 ruptions also occurred, so that by the time it was finished the same 
 observers and assistance were no longer available. These further 
 questions, although not discussed in this paper, seem to constitute 
extremely interesting topics of research and it is hoped that on some
 later occasion or by some other investigator they may be taken up 
 anew. The method and procedure are here described in detail in 
 order that such later work may be planned on a comparable basis. 
 
 The Method of the Experiment 
 
 Fifteen sets of stimuli were provided, so chosen as (1) to enable 
 the study of several modalities of sensation, (2) to call for a variety 
 of typical kinds of judgment categories, and (3) to afford, in each 
 set, three degrees of difference, all of which should, however, be 
 easily perceptible. The stimuli used, and their measurements or 
 quality, are here listed. 
 
1 This experiment was conducted, under the writer's general supervision, by
Miss M. E. Bishop, who is also responsible for the tabulation of the data.
 
 Three weights, weighing respectively 25, 40, and 70 grams. 
 
 Three heavily drawn horizontal lines, 6, 7, and 8.7 cm. in length. 
 
 Cards bearing squares, the sides being 1.5, 2, and 2.5 cm. 
 
 Balls of rubber, three different sizes. 
 
 Three tuning forks, pitch C, E, and G. 
 
 Tones on monochord, lengths of string, 50, 60, and 70 cm. String 
 constant. 
 
 Three shades of gray paper, easily discriminable. 
 
 Cards bearing in figures amounts of money, $197.35, $205.72, 
 and $628.43. 
 
 A pain point (thorn) applied with three degrees of force. 
 
 Bottles of violet perfume, two strengths, and a bottle of clear 
 water. 
 
Cards bearing the following dates: 1492, 1609, 1776.
 
 Hard rubber ball, falling on floor from heights of 1, 2, and 3 ft. 
 
 Three sheets of sand paper, of different degrees of roughness. 
 
 Metronome beating at three rates, 76, 100, and 126. 
 
 Three wrapped bottles, two containing old cheese of different 
 strength, the remaining bottle containing only water. 
 
These stimuli were presented to 31 observers (21 adults, for the
most part students in Barnard College or teachers, all women, and
10 children in the Speyer School, 5 boys and 5 girls, ages 11 or 12).
In each case two of the stimuli from a given group were given in
succession, with an interval of a few seconds. Six trials were made
within each group of stimuli, thus giving a total of 90 judgments for
each of the 31 observers, in all 2,790 judgments. Three of these
six trials were what will be designated as "positive first, followed by
negative." The remaining three were "negative first, followed by
positive." The use of the terms "positive" and "negative" in this
connection is chiefly a matter of convenience. By a negative stimulus
is meant simply a stimulus which presents a smaller amount or degree
of that quality, force, or property, etc., which characterizes the
group. Thus

In judging volumes (balls)          "Positive" means larger.
           pitches (forks)          "Positive" means higher.
           shades of gray           "Positive" means darker.
           amounts of money         "Positive" means greater.
           pains (prick of point)   "Positive" means more acute.
           perfumes (violet)        "Positive" means more agreeable.
           stinks (cheese)          "Positive" means more agreeable.
           dates                    "Positive" means later.
           weights (pressure)       "Positive" means heavier.
           sounds (intensity)       "Positive" means louder.
           surfaces (sandpaper)     "Positive" means rougher.
           speeds (metronome)       "Positive" means faster.
           weights (lifted)         "Positive" means heavier.
           lines                    "Positive" means longer.
           squares                  "Positive" means larger.

The observer was requested to compare the two stimuli with
respect to some category which was more general than either the
positive or negative quality, care being taken not to suggest either
the one or the other quality or form of expression. Thus, "Compare
these two tones in pitch," "Compare these two squares as to size,"
"these two odors, as to how they affect you," etc., etc. In the case of
the grays, the surfaces, and the lines, however, it was not so easy to
give a general instruction which should not more or less directly
suggest one or other of the forms of expression available for the
judgment. In these cases the observer was simply asked to compare
the two. If he hit upon the right comparison, the experiment was
continued without further instruction for that group. If the comparison
was not of the type desired, he was asked to compare them
in still another respect. When the desired comparison was once
made, he was asked to compare the remaining stimuli of the group.
 
 That is to say, the observer was left free to select both the direc- 
 tion of the judgment (as to first or second stimulus) and the form of 
 expression (positive or negative quality). This was of course the 
 whole point of the experiment, and the question of interest was : when 
 an observer is left thus free, both as to direction and as to category, 
 what is the direction or form which his judgment most naturally or 
 habitually takes? Does he show any inclination to judge the char- 
 acter of the second stimulus rather than that of the first, or is the 
 direction determined perhaps by some more or less constant tendency 
to attend to the stimulus possessing either the positive or the negative
quality or degree of quality? If, to the naive observer, one direction
or one category is either more natural, more accustomed or more
 easily employed, and if individuals differ in these respects, when the 
 differences between stimuli are clear, the records of 90 judgments by 
 each individual, in the various modalities or types of comparison, 
 ought to disclose the tendencies. 
 
Record was made, in each case, of the order in which the two
 stimuli were presented, and the stimulus indicated which became the 
 subject of the proposition expressing the judgment. This record en- 
 ables a statement of the number of judgments directed toward the 
 first or the second, and toward the positive or negative stimulus. 
 The various arrangements were presented in a chance order, care 
 being taken only that the same number of each arrangement be pre- 
 sented, three of each in each group of six. 
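
The balancing rule just described can be illustrated with a short sketch. It assumes 15 groups of six comparisons each (the 90 judgments per observer reported in this chapter). The shuffling routine, the seed, and the names `balanced_order` and `full_schedule` are illustrative assumptions, not the procedure actually used to obtain the chance order.

```python
import random

GROUPS = 15           # modalities or situations, as listed in the tables of this chapter
PAIRS_PER_GROUP = 6   # comparisons within each group


def balanced_order(rng):
    """Six presentation orders for one group: three with the positive
    stimulus first and three with the negative first, in chance order."""
    half = PAIRS_PER_GROUP // 2
    orders = ["positive first"] * half + ["negative first"] * half
    rng.shuffle(orders)
    return orders


def full_schedule(seed=0):
    """One observer's schedule: a balanced, shuffled block per group."""
    rng = random.Random(seed)  # the seed is an arbitrary illustrative choice
    return [balanced_order(rng) for _ in range(GROUPS)]


schedule = full_schedule()
flat = [order for group in schedule for order in group]
print(len(flat), flat.count("positive first"), flat.count("negative first"))
```

Whatever the seed, each observer receives 90 comparisons, 45 with the positive stimulus first and 45 with the negative first, which is what allows the two directions of judgment to be separated in the tables.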
 
 In Tables XXXIV. and XXXV. the distribution of the 90 judgments, for each
 observer, regardless of modality or situation, is given. The records 
 for "positive stimulus first" are kept separated from those for "nega-
 tive first," but the total distribution is also given. It would suffice to
 give in the table only a statement as to whether the judgment was 
 directed in each case toward the first or toward the second stimulus, 
 
TABLE XXXIV

DISTRIBUTION OF JUDGMENTS. TEACHERS AND COLLEGE STUDENTS

           Positive Quality First     Negative Quality First       Total Distribution
Observer     1st    2d  Pos.  Neg.      1st    2d  Pos.  Neg.      1st    2d  Pos.  Neg.

Lst           32    13    32    13       17    28    28    17       49    41    60    30
Ger           36     9    36     9        9    36    36     9       45    45    72    18
Stf           42     3    42     3        2    43    43     2       44    46    85     5
Bro           16    29    16    29        5    40    40     5       21    69    56    34
Mes           38     7    38     7        3    42    42     3       41    49    80    10
Sch           41     4    41     4        4    41    41     4       45    45    82     8
Schl          39     6    39     6        6    39    39     6       45    45    78    12
Sal           42     3    42     3        3    42    42     3       45    45    84     6
Bok           42     3    42     3        4    41    41     4       46    44    83     7
Mor           36     9    36     9        0    45    45     0       36    54    81     9
Ell           33    12    33    12        9    36    36     9       42    48    69    21
New           39     6    39     6        3    42    42     3       42    48    81     9
Seb           22    23    22    23        3    42    42     3       25    65    64    26
Hrt           39     6    39     6        6    39    39     6       45    45    78    12
Sav           41     4    41     4        7    38    38     7       48    42    79    11
Fit           14    31    14    31        0    45    45     0       14    76    59    31
Van            8    37     8    37        1    44    44     1        9    81    52    38
Lat           18    27    18    27        0    45    45     0       18    72    63    27
Pow           28    17    28    17       14    31    31    14       42    48    59    31
Wri           36     9    36     9       21    24    24    21       57    33    60    30
Bur           39     6    39     6       19    26    26    19       58    32    65    25

Total        681   264   681   264      136   809   809   136      817 1,073 1,490   400
 
TABLE XXXV

DISTRIBUTION OF JUDGMENTS. CHILDREN

           Positive Quality First     Negative Quality First       Total Distribution
Observers    1st    2d  Pos.  Neg.      1st    2d  Pos.  Neg.      1st    2d  Pos.  Neg.

Ave           39     6    39     6        6    39    39     6       45    45    78    12
Bio           34    11    34    11        4    41    41     4       38    52    75    15
Dec           41     4    41     4        8    37    37     8       49    41    78    12
Bil           41     4    41     4        5    40    40     5       46    44    81     9
Gil           36     9    36     9        3    42    42     3       39    51    78    12
Col           39     6    39     6        3    42    42     3       42    48    81     9
Gil           45     0    45     0        0    45    45     0       45    45    90     0
How           40     5    40     5       11    34    34    11       51    39    74    16
Smi            6    39     6    39        3    42    42     3        9    81    48    42
Sau           43     2    43     2        1    44    44     1       44    46    87     3

Total        364    86   364    86       44   406   406    44      408   492   770   130
 
 
 and from these results the distribution with respect to positive and 
 negative qualities might be calculated. But since in one case the 
 positive judgments would coincide with those directed toward the 
 first, and in the other case with those directed toward the second 
 stimulus, the source of the totals in such a table would not be at once 
 clear. Consequently, for the sake of clearness, the two types of dis- 
 tribution are given, in parallel vertical columns. The numbers in 
 the two columns will be the same, the difference being in their
 arrangement.
 
TABLE XXXVI

DISTRIBUTION OF JUDGMENTS IN THE VARIOUS MODALITIES OF SENSATION.
TEACHERS AND COLLEGE STUDENTS

Modality or Situation         On 1st   On 2d   On Positive   On Negative

Lifted weights                    54      72           113            13
Length of lines                   57      69           108            18
Size of squares                   52      74            99            27
Volumes                           56      70            97            29
Pitch of tones                    44      82            91            35
Shades of gray                    34      92            89            37
Amounts of money                  56      70            97            29
Degree of pain                    53      73           112            14
Perfumes, affective tone          61      65           120             6
Dates                             60      66            85            41
Pressures                         55      71           110            16
Intensity of sounds               61      65           120             6
Surfaces, texture                 62      64            87            39
Speed of metronome                51      75            70            56
Bad odors                         61      65            92            34

Total judgments                  817   1,073         1,490           400
 
TABLE XXXVII

DISTRIBUTION OF JUDGMENTS IN THE VARIOUS MODALITIES. CHILDREN

Modality or Situation         On 1st   On 2d   On Positive   On Negative

Lifted weights                    24      36            54             6
Length of lines                   27      33            55             5
Size of squares                   26      34            56             4
Volumes                           27      33            53             7
Pitch of tones                    26      34            42            18
Shades of gray                    28      32            50            10
Amounts of money                  26      34            56             4
Degree of pain                    31      29            49            11
Perfumes, affective tone          30      30            58             2
Dates                             27      33            35            25
Pressures                         27      33            57             3
Intensity of sounds               29      31            55             5
Surfaces, texture                 26      34            50            10
Speed of metronome                24      36            46            14
Bad odors                         30      30            54             6

Total judgments                  408     492           770           130
 
 
 If there is no inclination to prefer the first or the second, the 
 positive or the negative stimulus, there will be a chance distribution 
 of the judgments with respect to the stimulus which becomes the sub- 
 ject of the proposition expressing the judgment. If there is a con- 
 stant tendency to direct the judgment toward either the first or 
 toward the second stimulus presented, there will be of necessity an 
 equal number of positive and negative judgments, since both quali- 
 ties occurred the same number of times in the second and first orders 
 of presentation. If however there is instead a constant tendency to 
 direct the judgment toward either the positive or the negative 
 stimulus, these judgments will be for the same reason distributed 
 between the first and second positions. What is really found is 
 summed up in the following table. 
 
TABLE XXXVIII

SUMMARY OF DISTRIBUTION

            Positive Quality 1st       Negative Quality 1st          Grand Totals
Observers    1st    2d  Pos.  Neg.      1st    2d  Pos.  Neg.      1st    2d  Pos.  Neg.

Adults       681   264   681   264      136   809   809   136      817 1,073 1,490   400
Children     364    86   364    86       44   406   406    44      408   492   770   130

Totals     1,045   350 1,045   350      180 1,215 1,215   180    1,225 1,565 2,260   530
 
 The grand totals show that there is no striking preference for 
 either the first or the second position. Such difference as is present
 is about 6 per cent more than the chance relation, in favor of the second
 stimulus presented. This balance is due chiefly to the cases in which
 the positive stimulus comes second, in which case there are only 180 
 judgments on the first as compared with 1,215 on the second. When 
 the positive is presented first there are on the contrary 1,045 judg- 
 ments directed toward the first stimulus as compared with only 350 
 toward the second. The direction of the judgment is not determined 
 to any considerable degree by the mere fact of temporal position. 
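
The figure of about 6 per cent can be checked directly from the grand totals of Table XXXVIII. The following is a minimal sketch of that arithmetic; the function name `share_on_second` is an illustrative assumption.

```python
def share_on_second(on_first, on_second):
    """Percentage of judgments directed toward the second stimulus."""
    return 100 * on_second / (on_first + on_second)


# Counts (judgments on 1st, judgments on 2d) from Table XXXVIII, all observers.
grand_total = share_on_second(1225, 1565)
positive_first = share_on_second(1045, 350)
positive_second = share_on_second(180, 1215)

print(f"all presentations:  {grand_total:.1f}% on the second stimulus")
print(f"excess over chance: {grand_total - 50:.1f} points")
print(f"positive first:     {positive_first:.1f}% on the second stimulus")
print(f"positive second:    {positive_second:.1f}% on the second stimulus")
```

The overall share stands a little above 56 per cent, while the two orders of presentation taken separately diverge widely, which is the contrast the paragraph above draws.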
 
 But examination of the tendency toward positive and negative 
 quality shows that here there are very decided preferences and
 inclinations. There is a total of 2,260 positive judgments, as compared
 with only about 25 per cent as many negative judgments (530).
 The tendency toward the positive holds no matter in what order the 
 stimuli are presented. However, along with the pronounced inclina- 
 tion toward the positive quality, there is, as pointed out above, a 
 slight preference for the second position as such. Consequently when 
 the positive is second in order of presentation, the ratio of positively 
 directed judgments to those negatively directed is very large (6.8 to 
 1). When the positive is presented first the ratio is smaller, but is 
 still pronounced (3 to 1).
 
 
 This inclination toward the positive quality is more striking in the 
 case of the children than it is with the adults, the final ratio for the 
 former being about 6 to 1, and for the latter 3.7 to 1. The children, 
 that is to say, show less inclination toward the second stimulus as 
 such and more inclination toward the positive quality as such than 
 is the case with the adults. 
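
The ratios quoted in the last two paragraphs follow from the counts in Table XXXVIII. A brief sketch of the computation is given here; the labels and the helper `positive_to_negative` are illustrative assumptions.

```python
# (positive judgments, negative judgments), taken from Table XXXVIII.
COUNTS = {
    "positive presented first": (1045, 350),
    "positive presented second": (1215, 180),
    "adults, all orders": (1490, 400),
    "children, all orders": (770, 130),
}


def positive_to_negative(pos, neg):
    """Ratio of positively directed to negatively directed judgments."""
    return pos / neg


ratios = {label: positive_to_negative(*pair) for label, pair in COUNTS.items()}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f} to 1")
```

The four ratios printed correspond to the values reported in the text as 3 to 1, 6.8 to 1, 3.7 to 1 and about 6 to 1 respectively.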
 
 The members of the group of adults show practical uniformity in 
 this inclination. The final results for the 21 individuals show not a 
 single exception to the general rule. Only when the positive comes 
 first and the slight inclination toward the second stimulus favors 
 the negative quality are any exceptions shown. Then the judgments 
 of four adults show the reverse relation and one individual shows an 
 impartial distribution. 
 
 A similar uniformity characterizes the group of children. In the 
 final totals there is no exception to the general rule. When the posi- 
 tive is presented first a single individual with a strong inclination 
 towards the second stimulus, affords the only exception in the table. 
 
 Tables XXXVI. and XXXVII. show the distribution of the judg- 
 ments of both groups with respect to the modality or situation within 
 which the stimuli fall. With respect to the slight preference for the 
 second stimulus, all of the 15 groups of stimuli agree. With the 
 adults this tendency is most pronounced with the shades of gray and 
 the pitch of tones, and least pronounced with surfaces, sound inten- 
 sities, perfumes and disagreeable odors. With the children it is most 
 pronounced with the weights and speeds, while odors and perfumes 
 show no difference, and pains are slightly reversed. 
 
 With respect to positive or negative direction again, all modalities 
 and situations agree. With adults the inclination toward the positive 
 is most striking with perfumes, sound intensities, pains, weights, and 
 pressures, the ratio here being about 10 to 1. It is least evident with 
 speeds, dates, and surfaces, although even here the ratio is as high 
 as 2 to 1. In the case of the children the positive-negative ratio is 
 highest with perfumes and pressures, and lowest with dates, pitches, 
 and speeds. 
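
The ordering of the modalities for the adults can be recovered from Table XXXVI. The sketch below ranks them by the ratio of positive to negative judgments and pools the counts of the highest five; the shortened modality names and the helper `pooled_ratio` are illustrative assumptions.

```python
# Adult counts from Table XXXVI: (judgments on positive, judgments on negative).
ADULTS = {
    "Lifted weights": (113, 13),
    "Length of lines": (108, 18),
    "Size of squares": (99, 27),
    "Volumes": (97, 29),
    "Pitch of tones": (91, 35),
    "Shades of gray": (89, 37),
    "Amounts of money": (97, 29),
    "Degree of pain": (112, 14),
    "Perfumes": (120, 6),
    "Dates": (85, 41),
    "Pressures": (110, 16),
    "Intensity of sounds": (120, 6),
    "Surfaces": (87, 39),
    "Speed of metronome": (70, 56),
    "Bad odors": (92, 34),
}


def pooled_ratio(names):
    """Positive-to-negative ratio with the counts of several modalities pooled."""
    pos = sum(ADULTS[name][0] for name in names)
    neg = sum(ADULTS[name][1] for name in names)
    return pos / neg


ranked = sorted(ADULTS, key=lambda name: ADULTS[name][0] / ADULTS[name][1], reverse=True)
top_five, bottom_three = ranked[:5], ranked[-3:]

print("highest:", top_five, f"pooled {pooled_ratio(top_five):.1f} to 1")
print("lowest: ", bottom_three)
```

The five highest and three lowest modalities so obtained are those named in the paragraph above. The separate ratios differ a good deal within each set, so the pooled figure should be read as a rough summary only.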
 
 In Table XXXIX. the various modalities have been grouped into 
 five sections according to the degree of positive tendency shown. 
 Thus section 1 contains the three modalities or situations which show
 the most pronounced inclination toward the positive quality, section 
 5 containing the three which show the least tendency. The figure 
 after each modality shows the section into which that group falls. 
 That the order of the various modalities for the two groups of 
 observers is almost identical is shown by the fact that the modalities 
 fall into much the same section of the total series of 15, for both
 groups. Those which stand high with the adults stand high with the 
 children also, and the positions in the scale practically coincide, so 
 long as the same tendency is under consideration. But modalities 
 standing high for inclination toward the second stimulus tend, of 
 course, to fall low for inclination toward the positive quality. 
 
TABLE XXXIX

               Inclination Toward       Inclination Toward
               the Second Stimulus      the Positive Quality
Modality       Adults    Children       Adults    Children

Weights           2          1             1          3
Lengths           4          4             2          2
Squares           2          2             3          1
Volumes           3          3             3          3
Pitch             1          1             4          5
Grays             1          4             4          4
Money             3          2             3          2
Pain              2          5             2          4
Perfumes          4          5             1          1
Dates             4          4             5          5
Pressures         3          3             2          1
Sounds            5          4             1          2
Surfaces          5          2             5          4
Speeds            1          1             5          5
Bad odors         5          5             4          3
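
The sectioning described for this table can be sketched for one column, the adults' inclination toward the positive quality. The example assumes that the modalities were ranked by the number of positive judgments in Table XXXVI and cut into consecutive sets of three; the function `sections` and the treatment of tied counts (kept in table order) are assumptions, and the other columns, which contain ties at the section boundaries, are not reproduced here.

```python
# Adult judgments on the positive stimulus, from Table XXXVI.
ADULT_POSITIVE = {
    "Weights": 113, "Lengths": 108, "Squares": 99, "Volumes": 97, "Pitch": 91,
    "Grays": 89, "Money": 97, "Pain": 112, "Perfumes": 120, "Dates": 85,
    "Pressures": 110, "Sounds": 120, "Surfaces": 87, "Speeds": 70, "Bad odors": 92,
}


def sections(scores, size=3):
    """Rank from highest to lowest score and number consecutive sets of `size`.

    Python's sort is stable, so tied scores keep the order of the table."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {name: index // size + 1 for index, name in enumerate(ranked)}


adult_sections = sections(ADULT_POSITIVE)
for name in ADULT_POSITIVE:
    print(f"{name:<10} {adult_sections[name]}")
```

Under these assumptions the section numbers agree with the third column of Table XXXIX.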
 
 Several interesting points are to be noticed here with reference to 
 what the positive quality is felt to be in the different situations. 
 With the grays it is darkness, not brightness, that is the positive 
 quality. With dates it is recency, and still more curiously, even with 
 the stale cheese odors, which most observers felt to be unpleasant 
 in character, the positive quality, as indicated by the direction of the 
 judgments, is agreeableness just as was the case with the pleasant 
 perfumes. 
 
 Such facts as these suggest that what we have called the "positive
 quality" of a modality or of a judgment situation is not a permanent 
 or characteristic property of that modality or situation throughout 
 its whole range, but depends perhaps on the absolute impression 
 received from the selections presented. This would mean, then, that 
 if the grays, for example, which were presented in the experiment, 
 had been lighter grays than those actually used, the observers might 
 perhaps have received an absolute impression of brightness rather 
 than of darkness, and that this absolute impression would modify the 
 natural inclination of the judgments. 
 
 This form of absolute impression would, however, be somewhat 
 different from the absolute impression which plays a role in the com- 
 parison of stimuli in a given experimental series. Experiments are
 under way which are designed to determine whether the selection of 
 stimuli from the extremes or middle of the scales of magnitude, inten- 
 sity, brightness, affective quality, etc., reveals any change in the 
 preferred direction or inclination of judgment, at what points the 
 changes come, if present, and what individual differences are shown 
 by various observers. These results will not be presented in the 
 present connection. The purpose of the experiments here reported 
 was simply to determine whether or not, under the conditions of a 
 given judgment situation, definite, characteristic, and uniform tend- 
 encies of judgment expression and direction of attention are present. 
 That such is the case, and what the character of these tendencies is, 
 have been clearly indicated. The chief results may be summarized 
 as follows: 
 
 Summary 
 
 1. The most striking inclination shown is a strong tendency to 
 direct the judgment toward the stimulus described as "positive" in 
 quality. This tendency is present with both children and adults, 
 with all modalities and situations, regardless of the order in which 
 the stimuli are presented. The tendency is markedly stronger with 
 children than with adults. There are no exceptions to the general 
 rule, among the 31 observers studied. Among the various modalities 
 and judgment situations differences are shown, which are common to 
 both groups of observers. 
 
 2. There is a slight tendency to favor the second stimulus pre- 
 sented. This inclination is not nearly so strong as the positive tend- 
 ency, is weaker with children than with adults, and is consistently 
 stronger in some modalities than others. It is strongest in those 
 modalities in which the positive inclination is weakest.
 
 CHAPTER VII 
 
 JUDGMENTS OF SIMILARITY AND DIFFERENCE 1
 
 WHEN an observer is presented with two stimuli and instructed 
 to compare them with respect to some general property such as 
 weight, size, pitch, affective quality, intensity, etc., it is apparent that 
 he has fairly decided preferences or inclinations with respect to the 
 form in which his judgment is expressed. Thus comparisons of 
 weight may proceed in terms of either heaviness or lightness, com- 
 parisons of pitch in terms of either highness or lowness, comparisons 
 of affective quality in terms of either agreeableness or disagreeable- 
 ness. But experiments show (see Chapter VI.) that judgments in 
 terms of lightness, lowness, shortness, smallness, faintness, etc., are 
 very infrequent so long as the observer is left to his own inclination. 
 These categories, which may be designated as "negative," since they 
 imply the absence of some positive factor in the stimulus or situation, 
 seem to be, if not more artificial, at least more unaccustomed than the 
 contrasting and grammatically opposite "positive" categories. 
 
 Conceivably these natural tendencies or inclinations or judgment 
 habits may exert an appreciable influence on the apperception of 
 the two stimuli, and hence on the outcome of the judgment in cases in 
 which the differences, though objective, are small. This point has 
 not remained untouched in the technique of the psychophysicists. 
 As we have seen in Chapter V., Brown emphasizes the fact that, in 
 the comparison of lifted weights, the judgment of difference depends 
 upon the form of expression. It will be recalled that Müller and
 Schumann, and Müller and Martin made certain recommendations
 as to procedure in psychophysical experiments, as a result of related 
 observations. 
 
 When Brown's report appeared the writer was in the midst of 
 an investigation of judgments of a "subjective" type, such as are 
 involved in the comparison, estimation, and measurement of such 
 complex material as handwriting, comic situations, arguments, ap- 
 peals to instincts and interests, photographs, etc. One of the prob- 
 lems outlined in that investigation (the results of which comprise, in 
 part, the present monograph) is that of investigating the influence 
 of the category or form of expression on the outcome of judgments of 
 similarity and difference, and of other pairs of logical or grammatical
 opposites, of analyzing the psychological relation between the two
 types of judgment, and of discovering the relative ease, consistency,
 and certainty of the various categories when the judgments are
 directed toward the same material, both in the case of the same
 observer and with groups of observers. The present chapter concerns
 itself with the first mentioned pair of categories, similarity and
 difference.

 1 Reprinted from The Psychological Review, September, 1913.
 
 The problem, in the writer's mind, grows at once out of the con- 
 tradictory character of the few relevant references available in the 
 literature of judgment. The following references to experimental 
 and general studies will illustrate the point, and raise more or less 
 definitely the question at issue. 
 
 June E. Downey, "Preliminary Study of Family Resemblance in Handwriting."
 
 Bulletin No. 1, Dept. of Psychology, Univ. of Wyoming. 
 "In general a judgment of unlikeness is made with greater ease than one 
 of likeness" (p. 49). "Toward the close of a series the judgments became 
 judgments of dissimilarity. The records show that such a judgment is fre- 
 quently made more easily than is a judgment of likeness. . . . There were sub- 
 jects . . . who were more constant in their judgments of dissimilarity than in 
 those of similarity, and who varied less from the average in the case of the latter. 
 Some subjects . . . first selected the specimens most unlike the standard and 
 then proceeded to find the similar hands by elimination of the unlike" (p. 20). 
 "The judgment of unlikeness is, on the whole, an easier one to make than the
 judgment of likeness. There is considerable agreement among subjects as to the 
 handwriting most unlike a given specimen" (p. 24), etc. 
 
 These statements are based on the variabilities of five successive
 trials by the same individuals, the instructions being "to arrange 
 the writing specimens in the order of their likeness to a given 
 standard" (p. 15). But if one is judging in terms of likeness one can 
 not fairly speak of judgments of unlikeness resulting from such an 
 experiment. It is assumed here that the category in which the judg- 
 ment is expressed has no influence on the outcome of that judgment. 
 But I shall show later that a judgment of unlikeness is not merely 
 the reverse of a judgment of likeness, but a new kind of judgment. 
 The "least similar" is not therefore the "most unlike." 
 
 George V. N. Dearborn, "Notes on the Discernment of Likeness and Unlike- 
 ness." Journal of Philosophy, etc., February 3, 1910, p. 57. 
 Reports a research which "aimed to help the analysis of the mental process
 by which we become aware of similarity and dissimilarity . . . judgments as 
 to the likeness and unlikeness experienced in the case of a series of visual forms. 
 . . . The method of experimentation in detail was simply as follows: The 
 hundred blot cards (bearing blots of ink) being placed in order ten-square on 
 the table before the seated subject and the norm in its frame conveniently be- 
 fore his eyes and above the blots, he proceeded to select within fifteen minutes 
 the ten blot-cards out of a hundred most similar in form or shape to the norm,
 and to place them one side arranged carefully and deliberately in the order of 
 their judged similarity to the norm. Meanwhile the subject reported how he 
 apperceived the norm and what he considered its most essential form-character- 
 istics and peculiarities. These subjective notes were recorded and the numbers 
 of the ten blots judged most like the norm, and in their chosen order. The 
 time required for a selection satisfactory to the subject was also recorded, and 
 at the end of the selection the reason why each of the ten had been preferred, 
 concisely as possible. The process in the case of judgments as to unlikeness was 
 precisely the same, with the appropriate change in intention to keep dissimilarity 
 instead of similarity in mind" (pp. 57-58). 
 
 Dearborn continues: "Ideal criteria (as distinguished from affective) gave 
 more accurate results in the dissimilarity choices than in the similarity choices. 
 This is as we should expect on logical principles. The awareness of unlikeness 
 is an easier, if not a simpler, process apparently than that of likeness, for the 
 change of consciousness is greater and so easier to appreciate. At any rate the
 sets of blots chosen as unlike the norm were much more certainly unlike it than 
 were the 'similar' blots chosen like it" (p. 61). 
 
 There are two things to be pointed out in this connection. The 
 first is the fact that in Dearborn's experiment the judgments of
 likeness and of unlikeness were directed toward totally or partially
 different stimuli, and hence the ease of the judgment as mere judgment is
 in no way indicated by his results. It may well have been that the 
 dissimilar blots differed from the standard in more points than that 
 number in which the similar blots resembled the same standard. In 
 the absence of quantitative measurements of amounts of likeness or 
 unlikeness, the relative ease of the two types of judgment can be 
 made out only when the same material is employed in the two cases. 
 The second point is that the assumption that the awareness of unlike- 
 ness is a simpler and easier process than the awareness of likeness 
 seems to the writer to be completely gratuitous, until the difference 
 has been experimentally demonstrated. The results of the present 
 experiments indicate that the contrary is the case. 
 
 As opposed to the point of view suggested in the two articles just 
 referred to, we find in other places frequent assertion of the more 
 fundamental character of the judgment of resemblance, and the 
 derived character and secondary importance of the judgment of 
 difference. Thus Miss Macdonald, in her review of Preyer's "Infant
 Mind," says that likeness is more easily discerned than difference.
 
 Titchener, "A Text Book of Psychology," p. 26, says: "We notice these 
 differences (in human bodies) because we are obliged, in everyday life, to dis- 
 tinguish the persons with whom we come in contact. But the resemblances are 
 more fundamental than the differences. If we have recourse to exact measure- 
 ments we find that there is in every case a certain standard or type to which the 
 individual more or less closely conforms and about which all the individuals are 
 more or less closely grouped. And even without measurement we have evidence 
 to the same effect; strangers see family likenesses which the members of the
 family can not themselves detect, and the units in a crowd of aliens, Chinese or
 negroes, look bewilderingly alike." That there may be a difference in the
 psychological character of the two judgments is suggested by the same writer's
 statement that "reports of equality or identity are less frequently based on
 imageless comparison than reports of difference" (p. 534).
 
 Jevons, "Principles of Science," pp. 43 and 44, insists that similarity 
 and difference are only two forms of expression of one and the same judgment. 
 "In every act of intelligence we are engaged with a certain identity or difference
 between things or sensations compared together." "We can not, in fact, assert
 the existence of a difference without at the same time implying the existence of
 an agreement." "Agreement and difference are ever the two sides of the same
 act of intellect, and it becomes equally possible to express the same judgment in
 the one or the other aspect." "It is a matter of indifference in a logical point
 of view, whether a positive or a negative term be used to denote a given quality
 and the class of things possessing it." "But there are very strong reasons why
 we should employ all propositions in their affirmative form." "All inference 
 proceeds by the substitution of equivalents and a proposition expressed in the 
 form of an identity is ready to yield all its consequences in the most direct man- 
 ner. . . . Difference is incapable of becoming the ground of inference; it is only 
 the implied agreement with other differing objects which admits of deductive 
 reasoning, and it will always be found more advantageous to employ propositions 
 in the form which exhibits clearly the implied agreements." 
 
 Bergson, "Creative Evolution," p. 214, remarks: "Independently of all 
 consciousness the living body itself is so constructed that it can extract from the 
 successive situations in which it finds itself the similarities which interest it, and 
 so respond to the stimuli by appropriate reactions." Also (pp. 44-46): "We
 must have managed to extract resemblances from nature which enable us to antic- 
 ipate the future." 
 
 The last three references seem to agree on the proposition that 
 psychologically, in real life, it is similarity that most interests us. If 
 we perceive difference it is only for the sake of a search for similarity:
 conformity to type, interest, image, desire, etc. In handling coins
 the differences usually lapse in favor of the similarities, except in the 
 case of the expert. To perceive differences requires special, some- 
 times professional training, and this is not necessarily because the 
 differences are smaller than the agreements. They may be just as 
 obvious, once they become interesting. We are seeking for agree- 
 ments. In hunting, the resemblance of the stubble to the form of a 
 rabbit is more striking than its many points of difference. So in 
 diagnosing disease we are strongly interested in certain diagnostic 
 features and accustomed to look for them, since they are significant 
 in the midst of infinite diversity of other factors. Just as we are 
 prone to "see only those instances which are favorable to the theory 
 or belief which we already possess" (Creighton, "Logic," p. 250),
 so we tend to warp every perception toward the idea or image which 
 we happen to have at the time. And just as in observing a race of 
 men, the members of a profession, or a species of animal or plant life,
 we tend always to form a conception of a type or mode from which 
 the separate members of the group shall vary the least, so in so arti- 
 ficial a task as the process of judging the separate magnitudes of an 
 experimental series we tend to conceive a central value from which 
 the total deviations of the different magnitudes shall be the least. 
 The clearly demonstrated "central tendency of judgment," the so- 
 called "indifference point phenomenon" may be due largely to the 
 fact that resemblances are more striking than differences, and hence 
 all magnitudes approximating the type are assimilated towards it 
 (see Chapter IV. ; also, "The Inaccuracy of Movement," Ch. III., on 
 "The Indifference Point"). 
 
 The Present Experiment 
 
 The purpose of the experiment was to investigate the influence of 
 the category or form of expression on the outcome of judgments of 
 similarity and difference, to analyze the psychological relation be- 
 tween the two types of judgments, and to discover the relative ease, 
 consistency, and certainty of the two judgments when directed toward 
 the same material, both in the case of the same observer and with
 groups of observers. 
 
 The material to be judged consisted of 35 specimens of hand- 
 writing, each specimen written by a different individual, the indi- 
 viduals chosen at random. Each individual wrote, on a standard 
 sized card, the words, 
 
 Department of Psychology 
 
 Barnard College 
 Columbia University. 
 
 One individual wrote two copies, one of which served as the stand- 
 ard by which the other 35 specimens were judged. The same cards 
 and the same standard were used throughout the experiment, which 
 covered a period of 14 months. 
 
 The chief observers, nine in number, were divided into three 
 groups, designated by the words "similarity 1st," "difference 1st," 
 and "mixed." Each member of the first group proceeded as follows. 
 He was given the pack of 35 specimens, accompanied by the standard 
 card. He was asked to arrange the cards in an order of resemblance 
 to the standard, placing the most similar specimen at the top, the 
 next most similar in the second place, and the least similar at the 
 bottom, with the remaining cards in their appropriate intermediate 
 positions. After completing his arrangement, for which he was 
 allowed all the time desired, he was handed a sheet of paper and
 requested to give an introspective account of the criteria used in
 passing his judgments. A week later he was again given the cards and
 asked to again arrange them in an order of similarity to the standard. 
 After this second arrangement he was given his introspection sheet 
 and asked to note down any modifications of criteria observed in this 
 second trial. 
 
 After another week the same observer was given the cards and 
 asked to arrange the specimens of handwriting in an order of differ- 
 ence from or unlikeness to the standard, putting at the top of his list 
 the card most different, at the bottom the card least different, etc. A 
 fresh introspection sheet was prepared after this arrangement, and 
 criteria noted without reference to the previous records. After a 
 third week a second arrangement on the basis of unlikeness to the 
 standard was made, and further notes made on the introspection 
 sheet. 
 
 The "difference 1st" group performed the experiment in the 
 same way, except that their first two arrangements were in terms of 
 difference and the last two in terms of similarity. In the case of the 
 "mixed" group an arrangement on the basis of similarity was fol- 
 lowed by an arrangement for difference, or vice versa, before the 
 second trial for the same category of judgment. 
 
 Only one of the observers (H. L. H.) knew the purpose of the 
 experiment at the beginning. One observer (Str.) suspected the pur- 
 pose before his arrangements had all been made. Observer H. L. H. 
 repeated the four arrangements 14 months after the first trials had
 been made. The intervals of one week seemed to be sufficiently long 
 to eliminate any very decided memory effect except in the cases of 
 the one card written in the same hand as the standard, and one 
 other card which was strikingly different from that standard in 
 almost every respect. 
 
 The place of each card in the various orders was recorded for 
 each observer. The data secured from such a procedure can be 
 examined from many points of view. In the case of each observer 
 the two orders for similarity can be correlated, and the consistency 
 of such a judgment indicated by the coefficient of correlation. The 
 same thing may be done with the two orders for difference. The 
 orders for difference may be inverted and the reciprocal order thus 
 obtained correlated with the original orders for similarity. In the 
 same ways may be treated the final orders for both similarity and 
 difference secured by averaging the arrangements of the nine ob- 
 servers. The three groups of observers may be compared with each 
 other in all these respects. In the case of the final orders for both
 categories, the variability of the individual judgments may be com- 
 puted for each card, and the categories and groups of observers again 
 compared with respect to this variability of judgment. The arrange- 
 ments of the various observers may be compared with the final orders 
 secured from the group averages, and in this way the agreement of 
 each individual with the group average (judicial capacity) deter- 
 mined. Comparing these measurements with the correlation between 
 the various trials of the same observer affords a measure of the rela- 
 tion between personal consistency and general judicial capacity. 
 Various other interesting and perhaps significant comparisons may 
 be made, some of which will be later pointed out. All of these points 
 of view will throw light on the psychological relation between the two 
 categories of judgment, which it is the main purpose of the investiga- 
 tion to study. 
 
 The results of many of these comparisons and correlations are 
 given in the following tables. In computing coefficients of correla- 
 tion the formula 
 
 $$ r = 1 - \frac{6 \sum D^2}{n(n^2 - 1)} $$
 
 has been used. The introspections of the observers, in so far as they 
 bear on the point of the experiment, are also given. 
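The computation of such a coefficient from two arrangements can be sketched as follows. The sketch assumes that the formula intended is the rank-difference formula, r = 1 - 6ΣD²/[n(n² - 1)], where D is the difference between the two positions given to a card and n is the number of cards. The function names and the sample positions are illustrative only; they are not the data of the experiment.

```python
def rank_correlation(order_a, order_b):
    """Rank-difference coefficient between two arrangements.

    order_a[i] and order_b[i] are the positions (1..n) given to card i.
    Tied positions are assumed not to occur.
    """
    if len(order_a) != len(order_b):
        raise ValueError("both arrangements must cover the same cards")
    n = len(order_a)
    if n < 2:
        raise ValueError("at least two cards are needed")
    sum_d2 = sum((a - b) ** 2 for a, b in zip(order_a, order_b))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def invert_order(positions):
    """Reciprocal order: position p in a series of n becomes n + 1 - p."""
    n = len(positions)
    return [n + 1 - p for p in positions]


# Hypothetical positions of five cards (not the author's data).
similarity = [1, 2, 3, 4, 5]   # position 1 = most like the standard
difference = [5, 4, 2, 3, 1]   # position 1 = most unlike the standard

# The difference order is inverted before it is compared with similarity.
print(round(rank_correlation(similarity, invert_order(difference)), 3))
```

With these hypothetical positions the inverted difference order is 1, 2, 4, 3, 5, and the coefficient is .9.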
 
 Table XL. gives the coefficients of correlation between the various 
 arrangements of each of the nine observers, along with the average 
 coefficients for the group. In this table S1 indicates the first trial
 for similarity and S2 the second trial. D1 and D2 indicate the two
 trials for difference. Whenever similarity is correlated with differ- 
 ence the reciprocal of the difference order (the inverted order) is used. 
 
 TABLE XL
 CORRELATIONS BETWEEN THE VARIOUS ARRANGEMENTS BY THE SAME INDIVIDUALS

 S, Similarity. D, Difference. The orders for difference were inverted whenever
 similarity was correlated with difference. The figures represent positive
 coefficients of correlation, by formula given in text.

 Subject          Group    S1 with S2  D1 with D2  Average  S1 with D1  S2 with D2  Average

 L.S.H.           S 1st       .833        .813       .823      .639        .723       .681
 DeN.             S 1st       .781        .572       .677      .619        .655       .637
 Str.             S 1st       .700        .811       .756      .506        .767       .636
 Rich.            D 1st       .856        .676       .766      .654        .740       .697
 Bar.             D 1st       .748        .586       .667      .613        .653       .633
 G.E.H.           D 1st       .916        .727       .822      .630        .754       .692
 Hart             Mixed       .756        .775       .765      .572        .784       .678
 Kup.             Mixed       .771        .894       .832      .760        .911       .835
 H.L.H.           Mixed       .744        .677       .710      .439        .495       .467

 Average                      .789        .726       .757      .604        .720       .662
 Mean variation               .052        .087       .055      .065        .079       .061
 
 Several points are at once disclosed by Table XL. 
 
 1. The correlations of the two arrangements according to similarity
 (S1 with S2) are greater than the correlations of the two arrangements
 for difference (D1 with D2). With six of the nine observers
 this is clearly the case. With three it is not true. Two of these
 three are in the mixed group, and in one of these cases there is no real
 difference between the two coefficients. The third exception to the rule
 is in the case of observer Str., who suspected the purpose of the
 experiment and whose introspective account states that he was disturbed
 by having read up on the subject. However, observer H. L. H.
 was aware of the purpose of the experiment from the beginning, he 
 being in fact the writer, and his coefficients show the normal relation. 
 Apparently the mixed order of arrangements introduces factors or 
 tendencies not present with the other two groups (see also introspec- 
 tions of observer Kup. under "difference"). Whether similarity or 
 difference is judged first, five of the six observers in these two groups 
 show considerably higher personal consistency when judging similar- 
 ity. Averaging the nine observers yields a coefficient of .789 for 
 similarity as against .726 for difference. 
 
 2. If there is no psychological difference between judgments of 
 similarity and judgments of difference, if, as Jevons states, "Agree- 
 ment and difference are ever the two sides of the same act of intellect, 
 and it becomes equally possible to express the same judgment in the 
 one or the other aspect," the inverted order for difference should 
 show the same correlation with a direct order for similarity as do two 
 arrangements for similarity or two arrangements for difference. The 
 fact that the coefficients for similarity are higher than those for 
 difference suggests that the two categories of judgment are not 
 psychologically the same. But the case is still more apparent when 
 these reciprocal correlations are compared with the direct ones. 
 Observe the correlations of S1 with the inversion of D1. With every
 observer these coefficients are smaller than those for two arrangements
 for similarity (S1 and S2). The average coefficient is almost
 20 per cent. lower. And with seven of the nine observers these
 coefficients are also lower than the coefficients for two arrangements
 for difference, the average coefficient being 12 per cent. lower.
 
 3. With every observer the coefficient for S2 with D2 is higher
 than for S1 with D1, the average difference being 12 per cent. That
 is to say, with practise and repetition the two judgments come to 
 resemble each other, and the inverted order for difference to agree 
 more closely with the direct order for similarity. This, we may 
 assume, accounts for the uncertainty shown by the members of the
 mixed group, with whom the two categories clashed more quickly than 
 with the other observers, who had made two arrangements under one 
 category before the other category was suggested. But even in these 
 correlations of S2 with D2, six observers show less agreement than
 with the two arrangements for similarity. The average is some 7
 per cent. lower than the average for S1 and S2, and about the same as
 the average for the two orders for difference. Averaging the direct
 correlations and comparing this coefficient with the average for the
 inverted correlations shows a superiority of 13 per cent. in favor of
 the former, and among the nine observers the only exception to this 
 rule is Kup. in the mixed group, whose two averages are identical. 
 
 It seems to be clear then, that the two categories are not merely 
 "the two sides of the same act of intellect"; that different psycho- 
 logical processes are involved, processes so different that they modify 
 the outcome of the judgment; and further, that judgments of similar- 
 ity are made, if not more easily, at least with higher consistency than 
 are judgments of difference. 
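The averages and the counts of observers cited in the three points above may be checked against the coefficients of Table XL. The sketch below transcribes the four columns of individual coefficients from that table; the variable and function names are illustrative, and "mean variation" is taken here to be the average absolute deviation from the column mean, which is an assumption about how that row was obtained.

```python
# Coefficients transcribed from Table XL, in the order
# (S1 with S2, D1 with D2, S1 with D1, S2 with D2).
TABLE_XL = {
    "L.S.H.": (0.833, 0.813, 0.639, 0.723),
    "DeN.":   (0.781, 0.572, 0.619, 0.655),
    "Str.":   (0.700, 0.811, 0.506, 0.767),
    "Rich.":  (0.856, 0.676, 0.654, 0.740),
    "Bar.":   (0.748, 0.586, 0.613, 0.653),
    "G.E.H.": (0.916, 0.727, 0.630, 0.754),
    "Hart":   (0.756, 0.775, 0.572, 0.784),
    "Kup.":   (0.771, 0.894, 0.760, 0.911),
    "H.L.H.": (0.744, 0.677, 0.439, 0.495),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def mean_variation(values):
    """Average absolute deviation from the mean (assumed meaning of M.V.)."""
    values = list(values)
    m = mean(values)
    return mean(abs(v - m) for v in values)


rows = list(TABLE_XL.values())
columns = list(zip(*rows))

averages = [round(mean(c), 3) for c in columns]
variations = [round(mean_variation(c), 3) for c in columns]

# Counts of observers behind the statements in points 1 to 3.
similarity_more_consistent = sum(s > d for s, d, _, _ in rows)
s1d1_below_s1s2 = sum(x < s for s, _, x, _ in rows)
s1d1_below_d1d2 = sum(x < d for _, d, x, _ in rows)
s2d2_above_s1d1 = sum(y > x for _, _, x, y in rows)

print(averages, variations)
print(similarity_more_consistent, s1d1_below_s1s2,
      s1d1_below_d1d2, s2d2_above_s1d1)
```

The column averages come out at .789, .726, .604 and .720, as printed, and the four counts are 6, 9, 7 and 9 of the nine observers. The mean variations computed in this way agree with the printed row to within .001.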
 
 Table XLI. gives the variability of the group averages for each 
 of the four arrangements. The average deviations of the individual
 judgments from the average position of each card have been calculated.
 It seems unnecessary to give this figure for each of the 35
 cards; hence the total series of 35 has been divided into 7 sections of
 5 positions each, and the average of the M.V.'s of each of these sections
 of 5 positions is given in the table. It should be noted that
 corresponding sections do not always contain the same cards, although 
 this is in general true of the two orders for resemblance and the two 
 orders for difference. 
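The procedure just described may be sketched in a few lines. The sketch assumes that the M.V. of a card is the average absolute deviation of the individual positions from the card's average position, and that the sections are formed by ordering the cards by average position. The function names and the sample positions (six cards, three observers, sections of two) are hypothetical and serve only to show the arithmetic.

```python
def card_variability(arrangements):
    """Return (average position, M.V.) for each card.

    arrangements holds one list per observer; arrangements[k][i] is the
    position that observer k gave to card i.
    """
    n_cards = len(arrangements[0])
    stats = []
    for i in range(n_cards):
        positions = [observer[i] for observer in arrangements]
        avg = sum(positions) / len(positions)
        mv = sum(abs(p - avg) for p in positions) / len(positions)
        stats.append((avg, mv))
    return stats


def section_variability(arrangements, section_size):
    """Average the per-card M.V.'s over successive sections of the group order."""
    stats = card_variability(arrangements)
    # Cards are ordered by average position; tied cards keep their
    # original order (an assumption, since the text does not treat ties).
    ordered = sorted(stats, key=lambda s: s[0])
    sections = []
    for start in range(0, len(ordered), section_size):
        chunk = ordered[start:start + section_size]
        sections.append(sum(mv for _, mv in chunk) / len(chunk))
    return sections


# Hypothetical positions from three observers for six cards.
observers = [
    [1, 2, 3, 4, 5, 6],
    [1, 3, 2, 5, 4, 6],
    [2, 1, 4, 3, 6, 5],
]
print([round(v, 2) for v in section_variability(observers, 2)])
```

Because the sections are formed from each arrangement's own group order, the same section need not hold the same cards in two different arrangements, which is the caution stated in the text.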
 
 TABLE XLI

 THE VARIABILITY OF THE GROUP AVERAGES FOR THE VARIOUS ARRANGEMENTS
 The figures are the average M.V.'s of successive groups of five cards.

                    Similarity   Similarity   Difference   Difference
 Positions          1st Trial    2d Trial     1st Trial    2d Trial

  1 to  5 inc.        5.18         4.46         4.70         5.40
  6 to 10 inc.        5.44         6.78         6.42         7.90
 11 to 15 inc.        6.76         6.88         7.76         7.77
 16 to 20 inc.        6.34         7.72         8.16         7.58
 21 to 25 inc.        6.34         6.96         6.78         5.82
 26 to 30 inc.        7.58         6.14         5.76         7.42
 31 to 35 inc.        5.26         4.76         4.56         4.54

 Average              6.13         6.24         6.30         6.63
 M.V.             [illegible]       .82         1.11         1.19
 
 In this table then we are dealing no longer with personal con- 
 sistency but with the variability of a group of nine observers. Two 
 facts of interest are disclosed by this table. The first is that, although
 the final averages of the variabilities under the four trials differ very 
 little, such differences as are present point to lower variability for 
 similarity judgments than for judgments of difference. Both the 
 averages for similarity are lower than either of the averages for 
 difference. There seems to be a slight tendency for the second trials 
 to be more variable than the first, although the difference is small and 
 not reliable. But such as it is, this difference is greater in the case of 
 the difference series than in the case of the similarity series. 
 
 The second fact disclosed by the table is that with the arrange- 
 ments for similarity the cards at the top of the series show smaller 
 variability than those of the corresponding section at the bottom, 
 thus the first five tend to be less variable than the last five, the second 
 less than the sixth, the third than the fifth. But with the arrange- 
 ments for difference the reverse tends to be the case, that is, the 
 sections below the center of the series are less variable than the corre- 
 sponding sections above the center. What this means then is this: 
 that whether judging in terms of similarity or in terms of difference, 
 it is on the cards which are most like the standard that the judgments 
 of the various members of the group of observers agree most closely. 
 
 Summing up the results of this table we may say that the observ- 
 ers agree with each other more closely when judging similarity than 
 when judging difference, and that in either case they agree more 
 closely on the cards which are more like the standard than on those 
 which are more unlike it. 
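Both statements may be checked from the figures of Table XLI. The sketch below transcribes the seven section values of each column and compares the mirrored sections (first with seventh, second with sixth, third with fifth); the names used are illustrative only.

```python
# Section M.V.'s transcribed from Table XLI, top of the series first.
# Columns: similarity 1st trial, similarity 2d trial,
#          difference 1st trial, difference 2d trial.
TABLE_XLI = [
    (5.18, 4.46, 4.70, 5.40),
    (5.44, 6.78, 6.42, 7.90),
    (6.76, 6.88, 7.76, 7.77),
    (6.34, 7.72, 8.16, 7.58),
    (6.34, 6.96, 6.78, 5.82),
    (7.58, 6.14, 5.76, 7.42),
    (5.26, 4.76, 4.56, 4.54),
]

columns = list(zip(*TABLE_XLI))
column_averages = [round(sum(c) / len(c), 2) for c in columns]


def top_less_variable(column):
    """Count mirrored pairs (1st/7th, 2d/6th, 3d/5th) in which the
    section nearer the top of the series has the smaller M.V."""
    pairs = zip(column[:3], reversed(column[-3:]))
    return sum(top < bottom for top, bottom in pairs)


similarity_counts = [top_less_variable(c) for c in columns[:2]]
difference_counts = [top_less_variable(c) for c in columns[2:]]

print(column_averages)
print(similarity_counts, difference_counts)
```

The computed averages are 6.13, 6.24, 6.31 and 6.63, which agree with the printed averages to within .01, and both similarity averages fall below both difference averages. In the similarity columns the upper section is the less variable in two of the three pairs for each trial; in the difference columns the lower section is the less variable in all three pairs for each trial.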
 
 The results of the two tables just discussed are further confirmed
 by those shown in Table XLII. One observer made arrangements of 
 the cards for both similarity and difference fourteen months after the 
 original experiment, not having examined the cards in the meantime. 
 These arrangements have been correlated with the similar arrange- 
 ments of the original experiment. The correlation between the 
 original and the later orders for similarity was .69. That for the 
 original and the later order for difference was .62. But the correla- 
 tion between the original order for similarity (difference) and the 
 inversion of the later order for difference (similarity) was only .36 
 (.62). That is to say, with an interval of over a year, personal consistency
 for similarity is somewhat higher than that for difference,
 and the difference between the one category and the inversion of the 
 other is present and is especially striking in the case of the first two 
 arrangements of each period. 
 
 The final group average orders for the four arrangements have 
 been correlated, and Table XLII. presents these coefficients also. 
 They are all four extremely high, and the differences between them 
 are so small as to afford no suggestions.
 
 TABLE XLII

 MISCELLANEOUS CORRELATIONS OF ARRANGEMENTS

 Correlations of final group average orders:

   1st order for similarity, with second trial ............................ .93
   1st order for difference, with second trial ............................ .95
   1st order for similarity with reciprocal of 1st order for difference ... .93
   2d order for similarity with reciprocal of 2d order for difference ..... .91

 Subject H.L.H., correlations of trials 14 months apart:

   1st resemblance with resemblance 14 months later ....................... .69
   1st difference with difference 14 months later ......................... .62
   1st order for resemblance with reciprocal of order for difference
     secured 14 months later .............................................. .36
   1st order for difference with reciprocal of order for resemblance
     secured 14 months later .............................................. .62
 
 In the following pages are given the introspections secured from 
 the nine chief observers whose results have been recorded, and also 
 introspections from several others who were asked to make but one 
 arrangement, some for similarity and others for difference. A dis- 
 cussion of the significance of these introspections will follow them. 
 
 Introspections 
 Resemblance:
 
 DeN. The principal thing upon which my judgment was based was the 
 general slant of the writing, that is the sample was in a hand slanting from 
 right to left and the ones slanting in the same general direction looked more like
 it than the vertical or backward. Another thing was the formation of the capi- 
 tals, especially of the letters P and C. Another factor was the space between 
 the letters, whether the word was all connected or whether it was broken. 
 
 Kup. At first the actual combination of various types of hand writing, e. g., 
 slant, round, backhand, as evidenced in the type given as a model appealed to me 
 and I was inclined to sort the cards according to this "combination type." 
 Soon, however, the elements of character, of the personality in back of that 
 type copy claimed my attention and this criterion established itself in my mind 
 as a standard by which to judge the others. I characterized the type copy as 
 having elements of rapidity, definiteness, free movement and no-waste-of-time. 
 It seemed that of a decided, quick thinking person. According to such charac- 
 teristics I tried to arrange the cards given. 
 
 Hrt. The first resemblance I thought of was that of slope, then the ques- 
 tion as to whether the joinings between the letters were sharp or curved. Then I 
 compared the relative height and depth of the letters, above and below the lines. 
 Then I noticed endings of words, whether they ended abruptly or with a flour- 
 ish. Methods of crossing t's and dotting i's were noticed and also methods of 
 finishing y's and g's. The apparent ease of the writing always struck me, 
 whether it seemed to swing along easily or to be stiff and cramped. The size of 
 the letters received little attention on the whole. 
 
 Rich. My introspections are just about the same as when I arranged the 
 cards for difference instead of resemblance, except that instead of looking to see 
 how the cards differed in general appearance, placing, slant, color, etc., I looked 
 for similarity in these respects.
 
 Bar. I was influenced primarily by regularity or irregularity of lines in 
 the writing. If the whole seemed to be made up of lines going in all directions 
 I was inclined to classify it as like the standard. If the whole presented an 
 orderly appearance I did not consider it like the standard. I was influenced also 
 by the width and prominence of the pen line, choosing first those that were
 darker and heavier, like the standard. Sometimes I found myself comparing 
 only the one word "psychology" on the various cards, then when I tried to see
 them all at once the factor of regularity or irregularity was the strongest. 
 Slant had some influence, but the judgment was much a matter of general im- 
 pression, without any special factor so prominent. The ideas were mainly im- 
 pressionistic, I was guided more by a feeling of like or unlike than I was by any 
 specific comparisons. 
 
 Str. I first grouped the cards according to the position on them of the 
 three lines of writing, then according to uniformity, regardless of the style or 
 legibility, and finally, when the cards were very poor, according to legibility. 
 
 L.S.H. I based my judgments of similarity to the standard on the shape of 
 the letters and the slant of the writing. 
 
 H.L.H. Began in terms of slant and judged on basis of slant, roundness of 
 letters and general appearance of the card, until about two thirds of the way 
 down. Then the slants were all reversed, the judgments seemed more difficult
 and the criterion was shifted to letter formation, angles, tails of y's, capitals, 
 becoming more important. On turning back to the start, after the first arrange- 
 ment, these later factors asserted themselves, and I rearranged the first few 
 cards, paying more attention to the smaller details than I had done before. 
 
 Gas. The general character of the writing, as a whole, was the main basis 
 for the arrangement. By that I mean the general size, boldness or fussiness and 
 regularity. Next in importance was slant, and then the formation of the various 
 letters. 
 
 Wund. I judged first by the general character of the writing, then by the 
 slant of the letters, the distance the letters were apart, and their general round- 
 ness. As I reviewed my first arrangement I made several changes according to 
 the resemblance of the final letters of the different words, noting whether they 
 turned up or down. I also watched for the ways in which the t's were crossed. 
 
 Lyo. Personally I think I more or less unconsciously considered several 
 factors, such as shade of ink, position on card, legibility, script, and size. I
 said "this or that card is like the standard" without forming the reason in 
 words. 
 
 And. First on the type of handwriting, an extremely masculine type, 
 then on the slant of the letters and lastly on their form. 
 
 Hod. I based my judgment chiefly on the general appearance and direction 
 of the writing, whether it was slanting, upright or backhand. I took into con- 
 sideration also the size of the writing, the spacing of the letters and the form of 
 the letters themselves. 
 
 Wd. In the first place I tried to pick out handwriting with the same gen- 
 eral slant and carelessness and arrangement. Then I noticed the capitals and 
 then of the endings of the words, the spacing and the size of the letters, al- 
 though these latter I did not use very much. The general features seemed more 
 important to me than the smaller details. 
 
 Difference: 
 
 DeN. I paid more attention to the formation of independent letters than 
 when I arranged the cards for resemblance. Used slant until about one third of
 the way through then had to rely on minor details, and the task became harder. 
 
 Kup. This arrangement was constantly harder than the previous one, be- 
 cause of my inclination to arrange as I had done last time when the order was 
 that of resemblance. When instinctively I felt the great difference of a card I 
 very often remembered that I had not placed it so low in the order for resem- 
 blance. I labored between two impulses, one to be true to my previous judg- 
 ments and the other to act honestly according to my present light. I think I 
 succeeded in following the latter. I noticed as I had not done before, to so great 
 an extent, the great resemblance of groups of cards. Very often they seemed 
 to have been written by the same person, but with the intention to disguise his 
 handwriting. In such cases I noticed the details of the penmanship and made 
 my decision rest with such little points as the separation of letters in a word, 
 the crossing of a t or the last stroke of the y. . . . Throughout the relation of 
 resemblance was in the background of consciousness. I felt that it was involun- 
 tarily more a criterion than the standard of "difference." The problem seemed 
 far more puzzling this time than last. 
 
 Hrt. In ranking according to dissimilarity I did not think first of slope, as 
 in the arrangement for resemblance, but rather of differences in endings of 
 letters like g, y, etc., and in beginnings of words after capitals. 
 
 Rich. I first looked at the general type of writing, i. e., the slant, the size 
 of the letters and the blackness of the ink. After this more general survey I 
 thought sometimes of the similarity of the formation of the letters and the capi- 
 tals, but this was necessary only when the general survey did not show striking 
 enough differences. 
 
 Bar. First the general appearance of the writing in its suggestion of the 
 character of the writer. The pattern seemed to express a type of individuality 
 entirely different from that expressed in the card which I placed on top. This is 
 a question of general impression. For cards more nearly alike I think the 
 strongest point was in the regularity or irregularity of the letters. Some seemed 
 to be regular according to some definite system, others, like the sample, seemed 
 to be more or less hit-or-miss style. Another feature was the width of the pen 
 line. Next came the question of slant, although this was not a very strong factor. 
 The formation of the individual letters was also of small import, but the final 
 letters of each word influenced me somewhat, also the capitals. The question of 
 motor imagery seemed to be a determining factor, I seemed unconsciously to 
 wonder how differently one should go about it to write the various cards, and to 
 think of the hand movements necessary to the writing. This was a very strong 
 factor in judging those that were particularly dissimilar. 
 
 Str. Judged by general conception of smoothness rather than by actual 
 comparison of standard. This may have been due to the fact that I had just 
 read Dearborn's article on "The Discernment of Likeness and Unlikeness."
 Found the judgment harder than that of similarity and laid more stress on de- 
 tails which went to make up general smoothness. Distasteful job, goes counter to 
 normal mode of doing things. Tended for a while to think of similarity. Do 
 not feel sure of my judgments. 
 
 L.S.H. Felt less decided than when making judgments of resemblance. 
 Judgments vaguer. Felt as though about to come down stairs backwards, and 
 thus a little uncertain of progress. Judgments based on slope, shape and size 
 of the letters with some tendency to consider the "maturity" of the writing.
 
 H.L.H. Began in terms of general slope and "rapidity." Felt rather in 
 the air and soon found the criterion inadequate. Then adopted size for a while,
 then formation of separate letters, tendency to flourish, and way of ending y's, 
 g's, and d's. In the last part the tendency to think in terms of resemblance was
 strong, because the cards resembled each other in slant of the letters. Had to use
 finer and finer details. 
 
 Wood. I judged first on the form of the letters and the way in which they 
 were made, then on the general direction, vertical, slant or backhand. Then the 
 position of the words on the card, and finally such details as the crossing of the 
 t's, the ending of the y's and the way the e's were made. 
 
 Gold. My judgments were chiefly based on differences in slant, size, and 
 heaviness. My first judgments were made by examining the writing, as a whole, 
 comparing one card with another. Later I studied the individual words and 
 letters, comparing their shape, roundness or sharpness, whether connected or not, 
 method of crossing t's, etc. 
 
 Bead. In deciding the differences in handwriting the first consideration 
 was the general appearance. So long as the cards of decided vertical writing held 
 out I went by that. I then noticed the differences in the formation of the letters 
 and particularly the first and last letters of a line. Of course, to some extent, 
 the general effect was still of influence. 
 
 Grand. I first observed the general character of the writing. The standard 
 seemed to me to be freely flowing, accustomed and not particularly careful. I 
 began selecting those cards which were most carefully and apparently most 
 slowly written, and those which seemed to have been written with some difficulty. 
 As the most striking cards were eliminated the process became more difficult and 
 I paid more attention to the formation of individual letters. 
 
 Plum. The factors considered were general neatness, angles and slant, size 
 of the writing, arrangement of the lines on the cards, and the form of special 
 letters, such as the d and the C. 
 
 Two things are indicated with considerable clearness by these in- 
 trospective records. The first is the greater ease and naturalness 
 which is felt to characterize the judgments of similarity. This is best 
 revealed in the introspections made during arrangements for difference.
 Thus Kup. reports: "This arrangement (difference) was constantly
 harder than the previous one (similarity). . . . The problem
 seemed more puzzling this time." Str. records: "Found the judgment
 harder than that of similarity. . . . Distasteful job, goes counter to
 the normal mode of doing things. Tended for a while to think of
 similarity. Do not feel sure of my judgments." Similarly L. S. H.
 remarks: "Felt less decided than when making judgments of resemblance.
 Judgments vaguer. Felt as though about to come down stairs
 backwards, and thus a little uncertain of progress." H. L. H. reports:
 "Felt rather in the air, . . . found the criteria inadequate . . .
 tendency to think in terms of resemblance was strong."
 
 The second fact is suggested by such statements as often occur 
 when judging difference, "I paid more attention to the formation of 
 independent letters than when I arranged the cards for resemblance" 
 (DeN.). Or, "I noticed the details of penmanship and made my decision
 rest with such little points as the separation of letters . . ., the
 crossing of a t or the last stroke of a y" (Kup.). Also "I did not think
 first of slope, as in the arrangement for resemblance, but rather of
 differences in endings of letters like g, y, etc., and in beginnings of
 words after capitals" (Hrt.). "Began in terms of general slope and
 rapidity . . . and soon found the criteria inadequate" (H. L. H.). "I
 judged first on the form of the letters and the way in which they were
 made" (Wood). The judgment of difference, that is to say, is largely
 or often based on the comparison of fine points and minor details.
 
 The introspections for similarity, on the other hand, abound to a 
 much greater degree in references to "slope," "general slant," "character,"
 "personality," "regularity," "uniformity regardless of the
 style or legibility," "general impression," "carelessness," etc., all
 of these being factors of a large, general, loosely defined and
 "impressionistic" character. These differences in criteria tend to assert
 themselves without regard to the order in which the arrangements were
 made. 
 
 A possible objection at this point might be that the differences in 
 the two arrangements were perhaps due to the fact that the two ar- 
 rangements began with different cards (the similar end of the series 
 in one case and the unlike end in the other), rather than to a real 
 influence of the form of the judgment. A test of this would be af- 
 forded by observers who should arrange the cards in terms of similar- 
 ity (beginning with the most similar) and also in terms of difference 
 (beginning with the least different instead of with the most different). 
 When such an experiment was tried with three observers, all three 
 showed clearly that, in the attempt to reason out what might be 
 meant by "least different," the two categories were at once brought 
 explicitly together in the consciousness of the observer. Since log- 
 ically the "most similar" is the "least different," the arrangement 
 then proceeded in terms of similarity, even when the instructions 
 were in terms of difference. 
 
 The apparent objection is not a real one. The observer has all 
 the cards before him. Whatever cards are judged to be "least sim- 
 ilar," he may leave till the latter part of the series, if he chooses, 
 when judging similarity. When judging difference, whatever cards 
 he judges to be most different may be at once selected. The whole 
 matter is in the observer's own hands. And the significant thing is 
 that the cards which are left to the end of series, when judging simi- 
 larity, are not precisely the ones selected for the earlier part of the 
 series when judging difference. 
 
 Furthermore, if the result were only a consequence of inverting 
 the series, the two orders for difference should correlate as closely
 as, and show no greater variability than, the two orders for similar- 
 ity. Neither of these conditions is realized. The difference is then 
 not merely the result of inverted arrangements. 
 
 Summary 
 
 1. The personal consistency correlation of two arrangements on 
 the basis of similarity is greater than that of two arrangements for 
 difference, unless, by performance in the "mixed order," or by some 
 other circumstance, both categories are brought explicitly together 
 in the consciousness of the observer. 
 
 2. Both the correlation of two orders for similarity and of two 
 orders for difference are higher than the correlation of an order for 
 similarity with the reciprocal of an order for difference. 
 
 3. With repetition, adaptation and familiarity with the material 
 the two categories tend to approximate each other and the direct order 
 to agree more closely with the indirect order. 
 
 4. The variability among a group of observers is less for similarity 
 than for difference. 
 
 5. Whether the judgment is expressed in terms of similarity or in 
 terms of difference it is on the cards which are most like the standard 
 that the group agrees most closely. 
 
 6. When arrangements are made 14 months apart, the same relations
 are disclosed: personal consistency for judgments of similarity
 is greater than that for judgments of difference, and the discrepancy 
 between the direct order and the indirect order secured by inverting 
 the arrangement under the opposite category is noticeable. 
 
 7. Introspection suggests the greater "ease" and "naturalness" 
 and "confidence" of the judgments of similarity. 
 
 8. Introspection also shows a different distribution of criteria in 
 the two categories. Judgments of similarity tend to be based on 
 grosser and more general criteria, such as character, slope, ease, rapid- 
 ity, etc.; the judgment tends to be "impressionistic." In judging 
 difference more attention is paid to the finer details of form, size, ar- 
 rangement, and separation of letters. 
 
 9. Judgments of similarity and of difference are not merely two 
 forms of expression of one and the same intellectual act. Judg- 
 ments within each type or category involve each its own peculiar 
 psychological processes and criteria. The "most similar" is not, by 
 virtue of that fact, the "least different," nor is the "least similar" 
 identical with the "most different." Of the two categories, similarity
 seems to be the more fundamental, natural, easy, and self-consistent,
 whether a single individual or a group of observers is concerned.
 
 10. In these respects judgments of similarity and of difference 
 behave in the same way as do judgments of other logically opposite 
 qualities (such as preference and dislike, intelligence and stupidity) 
 which involve, in the beginning of such an experiment, psychological 
 processes and criteria which are not identical, but which move to a 
 common plane as the experiment proceeds or is repeated (see 
 Chapter VIII.).
 
 CHAPTER VIII 
 
 As we have seen in the preceding chapter, judgments of similarity 
 and of difference are not merely the two sides of one and the same act 
 of intellect, but involve each its own peculiar psychological processes 
 and criteria, and the category or the form in which the judgment is 
 expressed, the attribute toward which it is directed, makes a consider- 
 able and measurable difference in the outcome of that judgment. 
 The present study reports an investigation, from a similar point of 
 view, of certain other judgments commonly passed in daily life. 
 
 Is a judgment of stupidity the exact reverse of a judgment of
 intelligence? Is a judgment of preference the exact reverse of a
 judgment of dislike? In other words, do we use the same standard in
 judging characteristics designated by logical opposites, ranking all 
 specimens according to the degrees by which they deviate positively 
 or negatively from that standard? When we arrange specimens of 
 handwriting in an order of merit with respect to resemblance to a 
 given standard hand we use somewhat different criteria from those 
 employed when the specimens are arranged according to their dif- 
 ference from the standard. May it be also true that judgments of 
 intelligence or of preference are based on different sets of criteria 
 from those of judgments of stupidity or aversion? Do we like a
 person for certain qualities and dislike those who possess the exact
 antithesis of these qualities, or are our dislikes and preferences based on
 different sets of qualities? To discover which of these possibilities 
 has the greater degree of probability is the main purpose of this 
 study. 
 
 The material consisted of 25 photographs of actresses. The 
 photographs were similar in shape, size, finish, and mount, differing 
 only with respect to the individual photographed and the pose as- 
 sumed. In selecting the photographs care was taken to avoid those 
 of well-known actresses, in order that past judgments might not 
 influence the results of the experiment. These pictures were ranked 
 in an order of merit, by 10 observers, with respect to preference, dis- 
 like, intelligence, and stupidity. As the purpose was to discover the 
 
 1 By Margaret Hart Strong and H. L. Hollingworth. Reprinted from Jour.
 Phil., Psych., and Sci. Methods, September 12, 1912.
 
 effect of the direction or category of judgment, special emphasis was 
 laid on each category in the written instructions with which each of 
 the observers was provided. These instructions were as follows:
 
 Preference 
 
 Arrange the photographs in an order of merit, placing at the top the face 
 you like the most, placing second the face you like next best, and so on, until 
 the face you like the least is at the bottom of the series. 
 
 Dislike 
 
 Arrange the photographs in an order of demerit, placing at the top the 
 face you dislike the most, placing second the one you dislike next intensely, and 
 so on, until the one you dislike the least is at the bottom. 
 
 Intelligence 
 
 Arrange the photographs in an order of merit with respect to the intelligence 
 of the face, putting at the top the most intelligent, next to it the next in intelli- 
 gence, and so on, with the least intelligent face at the bottom of the series. 
 
 Stupidity 
 
 Arrange the photographs in an order with respect to the stupidity of the 
 face, putting the most stupid at the top, next to it the next stupid, and so on, 
 until the least stupid looking face is at the bottom of the series. 
 
 Five of the observers made the arrangements in the following
 order:
 
 1st week, ranked for preference and intelligence. 
 2d week, ranked for preference and intelligence. 
 3d week, ranked for dislike and stupidity. 
 4th week, ranked for dislike and stupidity. 
 
 The remaining five ranked for dislike and stupidity in the first two 
 weeks, and for preference and intelligence in the last two weeks. 
 This precaution was taken in order to minimize the influence of 
 practise on the results of the group averages. In every case at least 
 a week intervened between one judgment and the next. There was 
 no clear evidence of decided memory effect except in the case of the 
 extremes of the series. After the fourth arrangement the observers 
 were asked to write out a statement of the criteria used in judging 
 each trait. The observers were all students of Barnard College, 
 juniors or seniors taking their second or third year's work in psy- 
 chology. 
 
 In making the correlations to be discussed later, the formula

 $$ r = 1 - \frac{6 \sum D^{2}}{n(n^{2}-1)} $$

 was used. The correlations were worked out between each observer's
 two trials (I. and II.), and between each observer's average
 judgment (a) with the group judgment (A), for each of the four
 traits. These results are given in Table XLIII.
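
 The computation can be sketched briefly. The following is a minimal
 illustration only, assuming that the formula used is the usual
 rank-difference formula, 1 - 6(sum of D squared)/n(n squared - 1), where D
 is the difference between the two positions given to a photograph and n
 is the number of photographs. The function name and the five-item
 orders are hypothetical and are not the observers' data.

```python
def rank_difference_correlation(order_a, order_b):
    """Rank-difference coefficient: 1 - 6*sum(D^2) / (n*(n^2 - 1)).

    order_a[i] and order_b[i] are the positions (1 = top of the series)
    given to item i in two arrangements. Tied positions are not handled.
    """
    if len(order_a) != len(order_b):
        raise ValueError("both arrangements must rank the same items")
    n = len(order_a)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(order_a, order_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical positions given to five photographs (illustration only).
trial_1 = [1, 2, 3, 4, 5]
trial_2 = [2, 1, 3, 5, 4]   # two neighbouring pairs exchanged
reversed_order = [5, 4, 3, 2, 1]

print(rank_difference_correlation(trial_1, trial_1))         # identical orders
print(rank_difference_correlation(trial_1, trial_2))         # slight disagreement
print(rank_difference_correlation(trial_1, reversed_order))  # complete reversal
```

 In this sketch identical orders give +1, a complete reversal gives -1,
 and the slightly disturbed order falls between the two.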
 
 TABLE XLIII

 THESE COEFFICIENTS OF CORRELATION ARE ALL POSITIVE

 Observer            Ell.  Car.  Ste.  Hal.  DeN.  Str.  Bro.  Bar.  Val.  Cas.  Av.   M.V.

 Correlations of I. and II.:
   Preference        55    73    87    91    68    74    88    92    84    96    80.8  10.6
   Dislike           57    89    86    98    87    73    84    70    86    60    79.0  11.0
   Intelligence      71    84    90    92    78    74    86    77    91    83    82.6  6.0
   Stupidity         77    85    89    87    83    72    73    65    82    86    79.9  6.5

 Correlations of a with A:
   Preference        51    57    58    23    56    55    44    45    54    58    50.1  7.7
   Dislike           50    59    64    31    43    27    57    48    63    48    49.0  9.6
   Intelligence      32    29    32    48    43    41    32    59    26    30    37.2  8.4
   Stupidity         54    57    55    52    62    46    62    36    42    36    50.2  8.2
 
 Table XLIV. gives the correlations between each order and the
 reciprocal of its supposed opposite (by the reciprocal is meant the
 inverted order, so that what was originally the bottom of the series
 becomes the top). If categories logically opposite are also
 psychologically the two sides of the same act of intellect, then the
 correlation between preference and the reciprocal of dislike should be
 equal to the average of the personal consistency coefficients for
 preference and for dislike. That is to say, the inverted order for
 dislike should coincide with the direct order for preference, and should
 correlate as closely with this direct order as would two trials for
 preference with each other. The same relation should be expected to hold
 between intelligence and stupidity. On the other hand, if the
 processes differ from each other psychologically, it would seem that the
 correlation between preference and the reciprocal of dislike (both
 standards or categories being involved) should be less than the
 correlations of two trials for preference or of two trials for dislike.
 The same, again, should hold for intelligence and stupidity.

 TABLE XLIV

 Observer                                          Ell.  Car.  Ste.  Hal.  DeN.  Str.  Bro.  Bar.  Val.  Cas.  Average

 Correlations of:
 1. Pref. and the recip. of disl.                  60    89    93    94    90    57    86    78    89    83    81.9
 2. Av. of pref. I. and II., and disl. I. and II.  56    81    86.5  94.5  77.5  73.5  86    81    85    78    79.9
 3. Int. and the recip. of stup.                   85    79    93    90    94    74    73    87    86    96    85.7
 4. Av. of int. I. and II., and stup. I. and II.   74    84.5  89.5  89.5  80.5  73    78.5  71    86.5  84.5  81.2
 
 At first glance, as the results are presented in this table, the 
 situation does not seem to be similar to that found in the study of 
 judgments of similarity and difference. In 6 of the 10 cases the 
 correlation between preference and the reciprocal of dislike is greater 
 than the average correlations of similar arrangements, and in two 
 of the remaining cases there is no difference between the two. The
 average shows a small difference (2 per cent.) in favor of the former.
 
 In the case of intelligence and stupidity, 7 of the 10 observers 
 have higher correlation between the judgment of intelligence and 
 the reciprocal of stupidity than the average correlation of similar 
 arrangements, and the average shows superiority in this direction 
 of 4.5 per cent. 
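
 The counts just given can be checked directly against the rows of
 Table XLIV. The sketch below is only an illustration of that
 arithmetic; the variable and function names are hypothetical, and the
 coefficients are entered in hundredths as printed in the table.

```python
# Coefficients from Table XLIV, in hundredths, observers in the order
# Ell., Car., Ste., Hal., DeN., Str., Bro., Bar., Val., Cas.
pref_vs_recip_disl = [60, 89, 93, 94, 90, 57, 86, 78, 89, 83]
pref_disl_direct = [56, 81, 86.5, 94.5, 77.5, 73.5, 86, 81, 85, 78]
int_vs_recip_stup = [85, 79, 93, 90, 94, 74, 73, 87, 86, 96]
int_stup_direct = [74, 84.5, 89.5, 89.5, 80.5, 73, 78.5, 71, 86.5, 84.5]


def compare(indirect, direct):
    """Return (observers higher, observers equal, mean difference),
    where the difference is indirect minus direct."""
    higher = sum(i > d for i, d in zip(indirect, direct))
    equal = sum(i == d for i, d in zip(indirect, direct))
    mean_difference = sum(i - d for i, d in zip(indirect, direct)) / len(direct)
    return higher, equal, mean_difference


print(compare(pref_vs_recip_disl, pref_disl_direct))
print(compare(int_vs_recip_stup, int_stup_direct))
```

 On these figures the indirect coefficient is the higher for 6
 observers in the case of preference and dislike and for 7 in the case
 of intelligence and stupidity, with mean differences of about 2 and
 about 4.5 respectively.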
 
 It is apparent then that if these character judgments really have 
 the same psychological differences as those found between judgments 
 of similarity and difference, some factor is present in this experiment 
 which obscures the difference. 
 
 Table XLV. indicates that this factor is practise, adaptation, or 
 familiarity with the material, and that before these factors operate 
 genuine psychological differences are disclosed. In this table the 
 trials are not averaged as in Table XLIV., but the first order for
 preference is correlated with the reciprocal of the first order for dislike,
 and the second order for preference with the reciprocal of the second 
 order for dislike. In a similar way are handled the arrangements 
 according to intelligence and stupidity. Each of these indirect cor- 
 relations is then compared with the average of the direct correla- 
 tions, that is, with the average of preference with preference, and 
 dislike with dislike. This also is done in the case of intelligence and 
 stupidity. 
 
 In both cases the results are clear. The correlation of the first 
 of the positive quality with the reciprocal of the first of the nega- 
 tive quality is less than the average correlation of positive and nega- 
 tive qualities with themselves. In the case of preference and dislike 
 there is no exception to this rule, and the average difference amounts
 to over 13 per cent. In the case of intelligence and stupidity 3 of
 the observers are exceptions, but the other 7 show the difference
 clearly; a difference which averages, for the 10 observers, over 5 per
 cent. Averaging the two types of judgment, in the lower part of 
 the table, there is no exception to the rule, and the average superior- 
 ity amounts to over 9 per cent. 
 
 TABLE XLV

 Observer                                        Ell.  Car.  Ste.  Hal.  DeN.  Str.  Bro.  Bar.  Val.  Cas.  Average

 Av. pref. (I. and II.) and disl. (I. and II.)   56    81    87    95    78    74    86    81    85    78    79.9
 Pref. I. and recip. of disl. I.                 22    81    83    91    66    43    77    56    80    67    66.6
 Pref. II. and recip. of disl. II.               59    80    90    95    92    55    79    86    82    90    80.8

 Av. int. (I. and II.) and stup. (I. and II.)    74    85    90    90    81    73    79    71    87    85    81.2
 Int. I. and recip. of stup. I.                  72    78    88    88    87    53    52    73    77    92    76.0
 Int. II. and recip. of stup. II.                83    78    88    90    91    69    86    84    83    87    83.9

 Av. pos. and neg. (I. and II.)                  65    82    88    92    79    73    82    76    86    81    80.5
 Pos. I. and recip. of neg. I.                   47    80    86    90    77    48    65    65    79    80    71.3
 Pos. II. and recip. of neg. II.                 71    79    89    93    92    62    83    85    83    89    82.3

 The influence of practise, adaptation, and familiarity with the
 material is shown by comparing the third row of coefficients in each
 group of Table XLV. with the second row of the same section. In
 these third rows the correlation of the second direct arrangements
 with the second of the reciprocal arrangements is seen to move up,
 in each case, and very clearly in the average, to the correlation of
 two direct arrangements for a given trait. In fact the coefficients
 are usually a little higher. Very evidently, then, in the beginning
 of the experiment, before the two categories have been brought to- 
 gether in the consciousness of the observer in any explicit way, the 
 judgment of a negative quality is not the exact antithesis of that of a 
 positive quality. A judgment of dislike, that is to say, is not merely 
 the reverse aspect of a judgment of preference, but a new kind of 
 judgment, with perhaps different criteria, and certainly with a dif- 
 ferent outcome. The same must be said of judgments of intelli- 
 gence and stupidity. The form of expression, the direction or cate- 
 gory of the judgment, has a measurable influence on the outcome of 
 that judgment. But as the experiment proceeds and the two cate- 
 gories are both explicitly brought to the consciousness of the ob- 
 server, and after practise, adaptation and familiarity with the ma- 
 terial have played their part, the difference between the two cate- 
 gories tends to fall away, and the form or direction of the judgment 
 no longer influences its outcome. 
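
 The movement of the indirect coefficients toward the direct ones can
 be retraced from Table XLV. The sketch below is offered only as an
 illustration of the arithmetic; the names are hypothetical, the
 coefficients are in hundredths as printed, and the direct averages
 (79.9 and 81.2) are taken from the "Average" column of the table
 rather than recomputed from the rounded rows.

```python
# Rows of Table XLV, observers in the order
# Ell., Car., Ste., Hal., DeN., Str., Bro., Bar., Val., Cas.
table_xlv = {
    "preference/dislike": {
        "direct_average": 79.9,  # printed average of the direct correlations
        "first": [22, 81, 83, 91, 66, 43, 77, 56, 80, 67],
        "second": [59, 80, 90, 95, 92, 55, 79, 86, 82, 90],
    },
    "intelligence/stupidity": {
        "direct_average": 81.2,
        "first": [72, 78, 88, 88, 87, 53, 52, 73, 77, 92],
        "second": [83, 78, 88, 90, 91, 69, 86, 84, 83, 87],
    },
}


def mean(values):
    return sum(values) / len(values)


def practice_summary(rows):
    """Gap between direct and indirect correlation at each trial, and the
    number of observers whose indirect coefficient rises on the second trial."""
    return {
        "gap_first": rows["direct_average"] - mean(rows["first"]),
        "gap_second": rows["direct_average"] - mean(rows["second"]),
        "observers_rising": sum(
            s > f for f, s in zip(rows["first"], rows["second"])
        ),
    }


for name, rows in table_xlv.items():
    print(name, practice_summary(rows))
```

 A positive gap means the indirect correlation falls short of the
 direct one. On these figures the gap is about 13 and about 5 on the
 first trial and becomes slightly negative on the second, which is the
 movement to a common plane described above.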
 
 This tendency is the same as that remarked in the study of the 
 judgments of similarity and difference in the case of handwriting, 
 where it is found that with practise and repetition the two judg- 
 ments come to resemble each other, and the inverted order for dif- 
 ference to agree more closely with the direct order for similarity. 
 
 This tendency is further shown by the figures in Table XLVI., in
 which the correlation of the first two trials of a given observer is
 compared with the correlation of his last two trials, regardless of the 
 category of judgment concerned. With a single exception the latter 
 coefficient is always higher than the former, the average of the ten 
 observers showing a superiority of 7 per cent. 
 
 TABLE XLVI

 Observer            Ell.  Car.  Ste.  Hal.  DeN.  Str.  Bro.  Bar.  Val.  Cas.  Average

 First two trials    63    79    89    92    73    73    79    68    84    73    77.0
 Last two trials     67    87    88    93    85    74    87    85    88    90    84.2
 
 TABLE XLVII

 PERSONAL CONSISTENCY COMPARED WITH GENERAL JUDICIAL CAPACITY

 Observer                              Ell.  Car.  Ste.  Hal.  DeN.  Str.  Bro.  Bar.  Val.  Cas.  Average

 Average correlations of I. with II.   65    83    88    92    79    73    83    76    86    81    80.6
 Average correlations of a with A      47    51    52    39    51    42    49    47    46    43    46.6
 
 TABLE XLVIII

 Ratio of Best to Poorest     Preference  Intelligence  Dislike  Stupidity  Average

 Correlation of I. and II.    96:55       92:71         98:57    89:65      1.51:1.00
 Correlation of a with A      58:23       59:26         64:27    62:36      2.15:1.00

 Average                                                                    1.83:1.00
 
 TABLE XLIX

 Correlations of     Av.    M.V.                             Av.    M.V.

 I. and II.:
   Preference        80.8   10.6    Subjective judgments     78.9   10.8
   Intelligence      82.6   6.0     Objective judgments      81.3   6.2
   Dislike           79.0   11.0    Positive judgments       81.7   8.3
   Stupidity         79.9   6.5     Negative judgments       79.4   8.8

 a with A:
   Preference        50.1   7.7     Subjective judgments     49.5   8.6
   Intelligence      37.2   8.4     Objective judgments      43.7   8.3
   Dislike           49.0   9.6     Positive judgments       43.7   7.9
   Stupidity         50.2   8.2     Negative judgments       49.6   8.9
 
 The introspection was of little value, consisting for the most part 
 of mere generalization. But where specific criteria were given the 
 presence of the two standards was apparent. For example, Observer
 Hal.: "I like eyes looking straight at me. I don't like head
 or eyes to have unnatural pose, because it looks affected. I can't
 abide frowsy hair. I like smiling eyes and mouth and a high
 forehead." Here the first two criteria do seem to be opposed: eyes
 looking straight at one are not usually eyes in an unnatural pose.
 But other criteria show the two standards. The observer "can't 
 abide" frowsy hair, but she does not specifically admire smooth 
 coiffures. She likes high foreheads, but expresses no positive dis- 
 like for low ones.
 
 Some incidental points brought out in the results are worth 
 noting. In Table XLVII. the personal consistency of each observer 
 is compared with her correlation with the group average. The coeffi- 
 cient (.06) shows that there is absolutely no correlation between the 
 two. This seems to indicate an absence of general judicial capacity. 
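
 The text does not state how the coefficient of .06 was obtained. As a
 rough check only, the sketch below applies a rank-difference
 computation to the two rows of Table XLVII., giving tied values the
 mean of the ranks they occupy. Both the choice of the rank-difference
 formula and the treatment of ties are assumptions, and the function
 names are hypothetical.

```python
# Rows of Table XLVII, in hundredths, observers in the order
# Ell., Car., Ste., Hal., DeN., Str., Bro., Bar., Val., Cas.
consistency = [65, 83, 88, 92, 79, 73, 83, 76, 86, 81]   # I. with II.
agreement = [47, 51, 52, 39, 51, 42, 49, 47, 46, 43]     # a with A


def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share their mean rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def rank_difference_correlation(xs, ys):
    """1 - 6*sum(D^2) / (n*(n^2 - 1)) computed on average ranks."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


print(round(rank_difference_correlation(consistency, agreement), 2))
```

 Under these assumptions the result is about .07, close to the
 reported value and consistent with the conclusion that the two
 measures are practically unrelated.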
 
 In Table XLVIII. the ratio of best to poorest is given, and the 
 familiar ratio of about 2:1 found (see Chapter X.). 
 
 Table XLIX. seems to show that the more subjective judgments 
 of preference and dislike are more variable and uncertain than the 
 more objective ones of intelligence and stupidity. The coefficients 
 are slightly lower on the average and the mean variations are larger. 
 This is true whether personal consistency or judicial capacity is con- 
 cerned. The coefficients for the negative judgments of dislike and 
 stupidity also show a higher variability than do those of the positive 
 judgments of preference and intelligence. 
 
 Summary 
 
 1. Judgments which are grammatically opposite (as preference 
 and dislike, intelligence and stupidity) involve, in the beginning of 
 the experiment, psychological processes and criteria which are not 
 identical. The form, direction, or category of the judgment exerts
 a measurable influence on its outcome.
 
 2. As the experiment proceeds the processes and criteria move 
 to a common plane and the two types of judgment resemble each 
 other more closely. This movement to a common plane is apparently 
 the result of repetition, adaptation, and familiarity with the ma- 
 terial, and of the fact that the two categories, hitherto implicitly 
 distinct from each other, are now brought explicitly together in the 
 consciousness of the observer. 
 
 3. The result of practise and familiarity with the material is to 
 increase the personal consistency of the observer's judgments. 
 
 4. Introspection suggests different criteria for judgments which 
 are grammatically or logically only two sides of the same intellec- 
 tual act. 
 
 5. There is seen to be no correlation between personal consist- 
 ency and agreement with the group average. 
 
 6. The ratio of best to poorest, in both these respects, is the fa- 
 miliar one of about 2 : 1. 
 
 7. Subjective judgments (of preference and dislike) are more 
 variable and uncertain than the more objective judgments (of in- 
 telligence and stupidity). 
 
 8. The coefficients of "negative" judgments (dislike and stupid- 
 ity) are more variable than those of the "positive" judgments 
 (preference and intelligence).
 
 CHAPTER IX 
 
 THE PERCEPTUAL BASIS FOR JUDGMENTS OF EXTENT 1 
 
 In 1887, in the course of experiments on the extent of movement,
 Loeb 2 was led to the supposition that the judgment of extent is
 based on the perception of the duration of the movement. Since
 then Kramer and Moskiewicz, 3 in 1901, and Jaensch, 4 in 1905, have
 felt that their experimental results led to the same conclusion. 
 Woodworth, 5 in 1903, discredits the hypothesis. His chief objections 
 are: (1) Duration may be varied without entirely destroying the 
 approximate equality of the extents; (2) extent can be judged better 
 than time; (3) compensatory constant errors with higher speed are 
 insufficient; (4) if we judged by duration alone, speed distinctions 
 would be reduced to a matter of visual space or perception of force. 
 
 In June, 1909, the writer published, along with other matter, 6
 the result of a long series of experiments on the relation between the 
 judgments of extent and duration in the case of rectilinear arm 
 movements. His conclusion there was that "the experimental facts 
 point to separate processes of judgment for the two magnitudes, ex- 
 tent and duration. The four methods of separate accuracy tests, 
 confusion, correlation, and correction failed to justify the assump- 
 tion that the perception of any one characteristic of a movement is 
 more primitive or fundamental than that of any other. The judg- 
 ment of extent seems to be based on a system of signs which have 
 been learned to mean extent directly. The same seems to be true of 
 both duration and velocity." 7
 
 In the July (1909) number of the American Journal of Psychology,
 Leuba 8 reported experiments, on the results of which he arrives
 at conclusions quite opposed to those quoted in the preceding
 paragraph. "The comparison of the length of arm movements is made
 through the comparison of the duration of one or several of the
 sensations arising from the movement and of a particular value of the
 joint sensation called here the rate value."

 1 Reprinted from The Journal of Philosophy, Psychology, and Scientific
 Methods, November 11, 1909.
 2 Pflüger's Archiv, 41, p. 124, 1887.
 3 Zeitschrift für Psychologie, 25, pp. 101-125, 1901.
 4 Ibid., 41, pp. 257-279, 1905.
 5 "Le Mouvement," Chap. IV.
 6 "The Inaccuracy of Movement," ARCHIVES OF PSYCHOLOGY, No. 13, 1909.
 7 Ibid., pp. 85-86.
 8 American Journal of Psychology, July 1909, p. 374.
 
 In the face of such conflicting opinion the writer desires to
 present in abbreviated form the results of his experiments and to give
 certain additional reasons in support of his earlier conclusions. 9
 From 600 to 800 experiments were performed on each of four
 subjects, by the method of average error, on extents ranging from 150
 to 650 mm. and on corresponding durations ranging from 1 to 3.5
 seconds. By using a piece of apparatus already described
 elsewhere, 10 all the movements, while they remained active, were free
 from the illusion of impact which has vitiated so much of the work
 on movement. The apparatus gave simultaneous graphic records of
 the extent, duration, speed, and energy of every movement
 performed. For further details of the experiment and for a more
 complete presentation of most of the data used in the present article the
 reader must be referred to the writer's earlier monograph.

 TABLE SHOWING RELATION BETWEEN ERRORS OF EXTENT AND
 ERRORS OF DURATION

 (C.E., V.E., and Right Guesses in per cent.)

 Deliberate

                       EXTENT                                   DURATION
 Obs.       Trials  C.E.     V.E.     Right    r      Trials  C.E.     V.E.     Right    r
                                      Guesses                                   Guesses
 W.         450     6  2.0   13 0.6   59       .22    375     5  1.3   11 0.7   46       .31
 H.         450     19 1.7   12 0.6   54       .56    375     16 2.0   12 0.9   52       .54
 Bt.        287     24 3.8   18 1.5   64       .79    264     20 3.5   16 1.2   61       .67
 L.         375     7  0.8   7  0.6   60       .54
 Averages           14 2.1   12.5 0.8 59       .53            13.7 2.3 13 0.9   53       .51

 Incidental

                       EXTENT                                   DURATION
 Obs.       Trials  C.E.     V.E.     Right           Trials  C.E.     V.E.     Right
                                      Guesses                                   Guesses
 W.         375     8  1.7   13 0.8   49              450     10 1.8   20 0.9   53
 H.         375     9  1.3   12 0.6   56              450     8  0.9   12 0.6   58
 Bt.        264     15 2.2   15 1.2   65              287     17 2.8   20 1.3   63
 L.                                                   375     5  1.5   13 0.9   56
 Averages           10.7 1.7 13.3 0.9 57                      10 1.7   16.3 0.9 56

 The preceding table gives the C.E. and V.E. for the extents and their
 corresponding durations, when the observer tries to reproduce (1)
 the extent and (2) the duration of his first movement. In still
 other columns may be found the per cent. of right guesses when the
 observer guessed, after each trial, as to the probable direction of his
 error, and the coefficient of correlation between agreement of extents
 and agreement of durations calculated by the method of unlike signs.
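
 The formula behind the method of unlike signs is not given in the
 text. A minimal sketch follows, assuming the form commonly described
 under that name, in which the coefficient is the cosine of pi times
 the proportion of paired deviations having unlike signs. The function
 name and the eight pairs of errors are hypothetical and are not taken
 from the experiments.

```python
import math


def unlike_signs_correlation(errors_x, errors_y):
    """Estimate correlation as cos(pi * U), where U is the proportion of
    pairs whose two deviations have unlike signs. Pairs containing a zero
    deviation are left out, since they have no sign (an assumption)."""
    pairs = [(x, y) for x, y in zip(errors_x, errors_y) if x != 0 and y != 0]
    if not pairs:
        raise ValueError("no pairs with non-zero deviations")
    unlike = sum((x > 0) != (y > 0) for x, y in pairs)
    return math.cos(math.pi * unlike / len(pairs))


# Hypothetical errors of extent (mm.) and of duration (sec.) for 8 trials.
extent_errors = [3, -2, 5, -1, 4, -6, 2, -3]
duration_errors = [0.2, -0.1, -0.3, -0.2, 0.1, 0.4, 0.3, -0.1]

print(unlike_signs_correlation(extent_errors, duration_errors))
```

 Here two of the eight pairs have unlike signs, so the coefficient is
 the cosine of one fourth of pi. When all signs agree the coefficient
 is +1, and when half of them disagree it is zero.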
 
 9 Leuba's article was probably in the hands of the printer when "The
 Inaccuracy of Movement" appeared.

 10 "Inaccuracy of Movement," Chap. I.
 
 On the basis of these figures the writer draws the following conclu- 
 sions. 
 
 1. The durations of extents intended to be equal have greater 
 V.E. (16.3 per cent.) than the extents themselves (12.5 per cent.). 
 There must be, then, some basis for the judgment of extent other 
 than the perception of duration. 
 
 2. The C.E. seems to be bound up with the process of attention, 
 that of the magnitude deliberately reproduced [extent (14 per cent.) or
 time (13.7 per cent.)] being greater than that of the magnitude 
 incidentally reproduced [time (10 per cent.) or extent (10.7 per 
 cent.)]. This evident separation between the magnitude attended 
 to and that incidentally executed argues for separate processes of 
 judgment for the two magnitudes, extent and duration. 
 
 3. If the perception of duration were the basis of the judgment 
 of extent, incidentally reproduced durations should show as close 
 correspondence as durations deliberately reproduced. This is not 
 the case. 
 
 4. Extents agree as closely when the observers are reproducing 
 duration (V.E. 13.3 per cent.) as when they are attending to the 
 extent (V.E. 12.5 per cent.), but durations incidentally executed 
 do not correspond as closely (V.E. 16.3 per cent.) as in deliberate 
 experiments on reproduction of duration (V.E. 13 per cent.). That 
 is to say, if either judgment is to be considered the more primitive 
 and fundamental it should be the judgment of extent rather than that 
 of duration. 
 
 5. The coefficients of correlation between deliberate extents and 
 incidental durations (+.53) on the one hand, and between deliberate
 durations and incidental extents (+.51) on the other, are positive.
 But all that this shows is the presence of positive correlation
 between extent and duration, no matter which factor is being at- 
 tended to. There is as much evidence for the dependence of dura- 
 tion judgments on the perception of extent as for the converse. 
 
 6. If the observer is required to guess as to the probable direction 
 of his error in the case of each attempt to reproduce either extent or 
 duration, (a) the guesses in both cases correspond more closely to 
 the actual errors of the extents (59 per cent., 57 per cent.) than to 
 the differences between the durations (57.5 per cent., 53 per cent.);
 (b) the proportion of right guesses in experiments on extent (59
 per cent.) is greater than that in experiments on duration (53 per 
 cent.). These facts are unfavorable to the hypothesis that it is the 
 perception of duration on which the judgment of extent is based. 
 
 Leuba's chief argument is based on the proposition that the durations
 of movements judged shorter, equal, or longer than a standard
 fall out shorter, equal, or longer as compared with the duration of the 
 standard. Unfortunately, neither the variability nor the reliability 
 of the average is given, nor is the number of cases, from which a 
 reader might compute the reliability himself. But even if the corre- 
 spondence were found to be complete such statistical correspondence 
 would throw no light whatever on the nature of the process of
 discrimination involved in the comparison of the two lengths. If
 accurate measurements had been kept of the depth of the wrinkles in the
 loose glove which covered the arm of the observer there would have
 been found the same positive correlation: when the extents were
 judged shorter the wrinkles would have been found to be relatively
 shallow, and they would have been equal or deeper according as the
 judgment happened to be "same" or "longer."
 
 It is a case in which denying one member of the disjunction dis- 
 proves a conclusion which is not proved by the affirmation of the 
 other member. In other words, even though the relations of the 
 durations do coincide with the form of the judgment, this duration 
 agreement may still be simply an incidental fact, on a par with the 
 depth of the wrinkles in the observer's sleeve. With the rather con- 
 stant speed characteristic of all observers in such experiments a 
 greater extent must occupy a longer duration, an equal extent an 
 equal duration, etc. To show that the durations do not agree as 
 closely as the extents, as the writer has already done, invalidates 
 the one conclusion, while to prove that they agreed equally well 
 would have no bearing whatever on the question of the perceptual 
 basis of the judgment of comparison. 
 
 The movements reported in Leuba's article were made in different
 parts of the arm's total swing, under different degrees of contraction,
 tension, joint position, etc. The only common factor was the time 
 element. Now even to prove that under these unusual conditions the 
 duration of movements is used as the basis for the comparison of 
 their extent does not prove that this is what happens in other cases. 
 But to show that even here the durations disagree more than the ex- 
 tents disproves the hypothesis completely. 
 
 With Leuba's assertion of the existence of a special set of signs
 which serve as criteria for judgments of speed, the writer heartily 
 agrees, but he is convinced that along with this assertion should also 
 go the recognition of the independent character of judgments of 
 extent and duration.
 
 CHAPTER X 
 
 SOME CHARACTERISTICS OF JUDGMENTS OF EVALUATION
 
 Among the most common judgments passed in daily life are those
 which express preferences or aversions, similarities or differences,
 convictions or doubts, successes or failures, and other "general
 impressions" or value "estimates." These expressions possess all the
 characteristics of judgments, but are often said to be "subjective," in 
 the sense that it is impossible or difficult to measure their truthful- 
 ness or accuracy by the application of a standardized test. In many 
 cases no "objective" (generally accepted or conventionalized) meas- 
 ure exists, and the only method of test is by observing the internal 
 consistency of an individual's judgments on different occasions, by 
 comparing the individual's judgments with the consensus of opinion 
 of a large experimental group of observers, or by some other statistical 
 criterion. In such cases there is, strictly speaking, no measurement 
 of truth or accuracy, but rather of the consistency, certainty, fre- 
 quency, or correlation of different judgments. 
 
 The dependence of these judgments of general impression on indi- 
 vidual differences gives them a particular psychological interest. 
 Esthetic and ethical judgments belong to this group, as do also many 
 verdicts in the fields of philosophy, politics, manners, justice, and 
 most of the decisions of business, pedagogy, and religion. In spite of 
 the practical importance of this type of judgments, experimental 
 psychology has until recently occupied itself with only the more 
 trivial of them. The evaluation of simple esthetic material, the 
 elements of design, color preferences, tonal harmony, and the various 
 attributes of elementary sensory experiences have been studied in 
 detail. But there have been few attempts to investigate experi- 
 mentally the characteristics, conditions, and behavior of judgments of 
 such qualities as eminence, interest, belief, persuasion, character, the 
 comic, literary merit, etc. 
 
 Studies conducted by the "methods of expression" may be dis- 
 regarded in this connection, since these methods are expressly directed 
 toward the facts and character of the organic reaction rather than 
 toward the characteristics of the accompanying process of judgment. 
 Of the "methods of impression" various forms have been developed, 
 such as the "method of paired comparisons," the "serial method," 
 "order of merit method," etc. In the hands of different investigators
 these various names have not always meant precisely the same pro- 
 cedure, but the general features of the methods are well recognized. 
 Perhaps the most conspicuous have been the methods of "paired 
 comparisons" and "order of merit." Of these two the latter is by 
 far the more promising and Miss Barrett (1) has recently demon- 
 strated its superiority from the points of view of simplicity, expe- 
 dition, and reliability and significance of results. The present paper 
 considers some of the characteristics of such judgments of evaluation 
 as those for which the "order of merit" method has been used in 
 the past. 1 
 
 The beginnings of the method may be seen in some of the simple 
 experiments of Fechner, Mantegazza, and Galton. The method was 
 first given definite formulation by Cattell in a study of brightness 
 intensities (2) and particularly in his statistical studies of eminent 
 men and women (3-7). The method has since been used and further 
 developed by many of Cattell's students, including Sumner (21),
 Norsworthy (17), Wells (24, 25), Thorndike (22, 23), Strong 
 (18, 19), Kuper (16), Barrett (1), and the writer (11-14). Downey 
 (8) and Yerkes (26) have also employed the method, and Thorndike 
 (23) has further proposed the transmutation of results secured by 
 this method into a surface of distribution for the purpose of deriving 
 quantitative statements of amounts of difference. 
 
 In most of these studies the method has been used chiefly as an 
 instrument in the investigation of some specific problem, such as 
 family resemblance, interests of children, value of advertisements, 
 measurement of school progress, distribution of eminence, etc. But 
 when the various studies are considered as a group there arise a 
 number of interesting problems concerning the judgments themselves. 
 Certain of these problems will here be taken up in turn, with a brief 
consideration of the data at present available for their solution and
 interpretation. In many cases the conclusions can be but tentative, 
 and in several cases the problems themselves may ultimately prove 
 to be but "straw problems," suggested by a chance coincidence 
 of accidental or insignificant results. In spite of these facts it 
 seems worth while to present the problems in a more or less defi- 
 nite way, in order that future results may be explicitly referred to 
 them. 
 
 Many of these problems were first suggested directly or indirectly 
 in the two very original papers of Wells. The general principle of 
 the method may be given in the words of this author. "Professor 
Cattell calls attention to the fact that, if one endeavors to arrange
and rearrange in serial order a number of given objects, the positions
successively given them will vary somewhat as they would vary
if the arrangements had been made one each by different observers.
If we undertook to arrange ten times a series of grays in order of
brightness, we should no more get the same order each time than we
should get identical orders from ten different subjects. Nor would
our own orders vary approximately the same amount from the average;
sometimes we should be better, sometimes worse, judges, just as
among our ten subjects some would be more discriminative, some
less. The judgments of the same individual at different times are
theoretically quite comparable to those of different individuals
regardless of the factor of times" (25 1).

1 For full bibliography of these studies see end of chapter.
 
 A fuller description of the method and illustrations of some of 
 its useful practical applications are to be found in the writer's 
 "Principles of Appeal and Response" (14). A further modifica- 
 tion, which may be designated the group method as contrasted with 
 the strict order method has been employed by the writer, and pos- 
 sesses several advantages which justify its further development. The 
 following account of this modification is taken from a previous 
 paper (11). 
 
 "Instead of arranging the material in strict order of merit the 
 observer placed them in ten piles, according to their 'degree of 
 funniness.' In the first pile were placed the superior jokes, in the 
 tenth the poorest ones, while the intermediate piles represented 
 gradation of merit from best to poorest. No instructions were given 
 as to the amount of difference represented by these successive piles, 
 nor as to the number of cards to be placed in each. 
 
 Ten observers took part in the experiment, all of whom were 
 women, students in the Barnard laboratory, with one and a half 
 year's work in psychology. When the average position of each card 
 for the ten observers was calculated, the 39 jokes could be arranged 
 in a strict order of merit according to their respective averages. The 
 advantages of this group method are several. 
 
 It is much quicker than the strict method, less fatiguing and 
 monotonous to the observer, yet correlates closely with results from 
 the same observers by the strict order method. Further, the method 
 gives opportunity to observe any changes in value of the group as a 
 whole. Thus by multiplying the number of cards in a given group 
 (say 7) by the position of that group (say number 9) and adding 
 these products for all ten groups a figure is obtained which gives 
 some measure of the total value of the series for a given individual 
or group. Now if the cards are arranged a second, third, fourth, etc.,
 time by the same observers, these sums will indicate the change in 
 total value of the series during the successive trials. This figure 
 is of course not in any sense an absolute measure. It is conditioned 
 by shifts in the individual's standard of value, by his personal 
 variability of judgment, by the variation in standard from indi- 
 vidual to individual, and by the fact that no card can be thrown 
 higher than the first nor lower than the last pile. Nevertheless it 
 affords an interesting and suggestive index of the total series behavior 
 which the strict order method can not yield. It will be shown later 
 that the M.V. (mean variation) in such experiments bears a con- 
 stant ratio to the number of places into which the objects are to be 
 sorted, so that the relative variability is the same here as in the strict 
 method. 
 
 There may be, in the group method, a certain tendency to arrange 
 stimuli according to qualitative or type resemblance, which might to 
 a degree disturb the judgment of merit, a tendency, that is, to put 
 all puns in the same pile, etc. But there is no evidence in the results 
 that such an inclination has in any way operated. Moreover the 
tendency is just as strong, in the strict order method, to put qualitatively
similar stimuli in the same region of the scale. Thus Wells
 found that in arrangements of picture postals according to prefer- 
 ence there was a tendency to place near each other cards bearing 
 similar scenes, color schemes, etc. It is conceivable that, even in 
 arranging individuals with respect to scientific eminence, contiguity 
 in space or similarity of field or method may operate as a more or less 
 significant associative factor in determining relative position. But 
 since these factors also help determine the individual's actual judg- 
 ment of merit, they need not be supposed to warp that judgment in 
 any undesirable way. 
 
 In the present experiment each of the ten observers arranged the 
 cards five successive times, the trials being a week apart. This plan 
 thus gave data for investigating the variability of the group, of the 
 individual, of the total value of the series, and of the behavior of 
 each card under the influence of repetition. Both Wells and Downey 
 have shown that a week is ample time for the elimination of any 
great disturbance through the memory factor in the successive trials."
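
The total-value figure of the group method, as described in the account just quoted, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `total_series_value` and both sets of pile counts are hypothetical, chosen so that 39 cards are distributed over ten piles.

```python
def total_series_value(pile_counts):
    """Sum of (cards in pile) x (pile position), with piles numbered from 1."""
    return sum(position * count
               for position, count in enumerate(pile_counts, start=1))

# Hypothetical sorting of 39 cards into ten piles (pile 1 = best, pile 10 = poorest).
first_trial = [2, 3, 4, 5, 6, 6, 5, 4, 3, 1]
# Hypothetical later trial in which cards have drifted toward the poorer piles.
later_trial = [1, 2, 3, 4, 5, 6, 6, 5, 4, 3]

value_first = total_series_value(first_trial)
value_later = total_series_value(later_trial)
```

Since the first pile holds the best cards, a larger sum in this sketch corresponds to a lower total value of the series; the figure is a relative index only, as the quoted account itself cautions.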
 
 Problems 
 
 First Problem. Variability of Different Parts of the Series. 
 (Repeated arrangements and arrangements by different individuals.) 
 If all the items are arranged at each trial the variability of each 
item from its average position may be determined. When this is
 done the variability is usually found to be smaller at the extremes of 
 the series than in the central section, in such material as has been 
 employed. The variabilities increase fairly regularly as the central 
 region of the series is approached. The following records (Table 
 L.) illustrate this tendency. The figures are taken from vari- 
 ous studies in which different material and observers were used, and 
 include series of various lengths. The results are not always given 
 for each item, but usually for sections of neighboring items, the sec- 
 tions being determined sometimes by tabular convenience, and in 
 other cases by the way in which the results were originally expressed. 
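
The computation described in this paragraph may be sketched as follows, taking the M.V. (mean variation) to be the average absolute deviation of an item's positions from its own average position. The data are hypothetical, and the helper names `mean_variation` and `section_averages` are introduced here only for illustration.

```python
from statistics import mean

def mean_variation(positions):
    """Average absolute deviation of the positions from their own mean."""
    centre = mean(positions)
    return mean(abs(p - centre) for p in positions)

def section_averages(values, size):
    """Average the values over consecutive sections of `size` items."""
    return [mean(values[i:i + size]) for i in range(0, len(values), size)]

# Hypothetical data: one row per arrangement (observer or trial),
# one column per item; each entry is the position given to that item.
arrangements = [
    [1, 2, 3, 4, 5, 6],
    [1, 3, 2, 5, 4, 6],
    [2, 1, 4, 3, 6, 5],
    [1, 2, 5, 3, 4, 6],
]

per_item = list(zip(*arrangements))
average_position = [mean(p) for p in per_item]
# Order the items from top to bottom of the series by average position.
order = sorted(range(len(per_item)), key=lambda i: average_position[i])
item_mv = [mean_variation(per_item[i]) for i in order]
# Average M.V. for sections of two neighboring items each.
section_mv = section_averages(item_mv, 2)
```

In this invented example the middle section happens to show the largest average M.V.; the figures carry no evidential weight and serve only to show the arithmetic.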
 
Wells remarks, on this finding in the case of repeated arrangements
by the same observer: "We find, as we should anticipate, that
 the M.V. increases toward the middle position and decreases toward 
 the ends. The amount of this increase varies considerably and con- 
 stitutes a not uninteresting point of individual difference. In subject 
 A the middle M.V.'s are nearly three times those at the start, in D 
 they are barely half again as much. Individual difference in reli- 
 ability of judgment seems therefore to be greater in the middle than 
 at the ends. This is what we should expect, for the judgments are 
 more difficult in the middle and we naturally vary more from each 
 other in our judgment of difficult things than in our judgment of 
 easy ones" (25 525). 
 
 But the problem can not be so easily disposed of. In the first 
 place the decrease of variability toward the ends is in part a purely 
methodological consequence: items at the extreme top and bottom
 of the series can be displaced in successive arrangements or by 
 different observers, in only one direction, viz., toward the middle. 
 Even those somewhat further in from the extreme ends can suffer 
 large displacements in one direction only, but at the middle of the 
 series there is double opportunity for large displacement. To be 
 sure the maximum possible displacement is greater in the case of the 
 extremes, since a given card may be displaced the full length of the 
 series, but this situation probably seldom occurs, would, in fact, 
 occur only in arrangements on the basis of chance. The individual 
 differences pointed out by Wells are then in all probability only 
differences in variability in general, rather than in specific "amount
 of increase" from one part of the series to the other. 
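
The purely methodological part of this effect can be illustrated by a small simulation. The sketch below rests on an assumption that is not in the text: every item is judged with the same amount of error (its true rank plus Gaussian noise of equal spread), and each simulated observer then arranges the items in order of perceived value. All names and parameter values are hypothetical.

```python
import random
from statistics import mean

def simulate_item_mv(n_items=20, n_observers=200, noise_sd=3.0, seed=0):
    """M.V. of each item's rank when all items are judged with equal error."""
    rng = random.Random(seed)
    ranks = [[] for _ in range(n_items)]
    for _ in range(n_observers):
        # Perceived value = true rank + noise of the same spread for every item.
        perceived = [true_rank + rng.gauss(0.0, noise_sd)
                     for true_rank in range(1, n_items + 1)]
        order = sorted(range(n_items), key=lambda i: perceived[i])
        for position, item in enumerate(order, start=1):
            ranks[item].append(position)
    result = []
    for item_ranks in ranks:
        centre = mean(item_ranks)
        result.append(mean(abs(r - centre) for r in item_ranks))
    return result

mv = simulate_item_mv()
end_mv = mean(mv[:2] + mv[-2:])   # two items at each extreme
middle_mv = mean(mv[8:12])        # four items at the center
```

Although no item is any harder to judge than another in this model, the extreme items come out with the smaller M.V., since they can be displaced in one direction only. A difference between ends and middle is therefore not by itself evidence of greater difficulty in the middle region.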
 
TABLE L
VARIABILITY IN DIFFERENT PARTS OF THE SERIES

Study    Material        Trait           Items  Obs.  Av. M.V. of Sections, from Top to Bottom
H.L.H.   Jokes           Funniness        39     10   1.89  2.04  1.85  2.20  2.07  2.58  2.14  1.81
H.L.H.   Appeals         Persuasiveness   50     50   9.76  11.44  9.80
H.L.H.   Portraits       Intelligence     20     10   1.41  2.85  3.86  3.68  3.60  3.01  2.90  3.03  2.06  2.16
H.L.H.   Portraits       Courage          20     10   2.80  3.27  3.38  5.08  5.50  3.34  3.34  2.67  3.29  3.12
Wells    Post Cards      Preference       50      5   8.7  8.3  11.6  10.5  12.2  12.9  10.0  10.8  11.8  8.5
Wells    Authors         Style            10     10   .25  .30  .36  .39  .40  .39  .34  .31  .33  .26
Strong   Advertisements  Persuasiveness   10     30   1.9  1.4  2.0  2.5  2.8  2.8  1.6  2.0  2.3  1.5
Downey   Handwriting     Resemblance      37     10   4.72  6.58  6.50  7.43  7.03  5.94  4.48  3.62

The problem as it now stands is to determine to what extent the
increase of variability toward the center is only a methodological result
of this end error, and how far it possesses any further significance.
One can not by any means assume a priori that in a given
series the middle region will be one of greater difficulty. In fact one
 might expect the difficulty to increase regularly toward one end of 
 the series, unless the material were deliberately chosen so as to afford 
 items on both sides of the zero-point of the quality being judged. In 
 the case of the post cards this may well have been the case, and the 
series may have included positively pleasing and positively displeasing
as well as indifferent items. In Wells's study of the series of
 weights with constant difference ratios between adjacent items, the 
 variabilities increased from the top to the bottom of the series. The 
 same thing was true of Cattell's lists of eminent men, though here 
 there was no lower limit to the series. 
 
 Test experiments might be made in which the presence of a zero- 
 region could be introspectively reported upon, with different mate- 
 rials and varying series lengths. Only by such experiments may the 
 role of the end error be separated from other suspected influences. 
 The figure of variability has been used as a measure of the amount 
 of difference between the items judged, and whenever this is done it 
 is important to be sure that other conditions are not influencing the 
 size of the coefficients. The table just given indicates that the ten- 
 dency toward increased variability in the central region is present 
 with varied kinds of material, regardless of the manner in which it is 
 chosen. It will be shown later that the average M.V. of these experi- 
 ments with judgments of "general impression" tends to be about one 
 fifth of the total number of places in the series. This would mean 
 that the end error might of itself affect the upper and lower quarters 
 of the total series, which perhaps sufficiently explains the tendency to 
 increase toward the center. 
 
 Second Problem. Certainty of Individual Likes and Dislikes. 
 Disregarding the middle of the series the variabilities of the two 
 extreme sections may be compared, since both these sections are 
 equally affected by the end error. Two cases must be distinguished 
here: (1) The consistency or certainty of repeated arrangements by a
single observer; (2) the agreement or disagreement of various individuals
of a group. On the first point the following data are available
(Table LI). In this table the first section is to be compared with
 the last, the second with the penultimate, and the third with the 
 antepenultimate section. It will be observed that the same individual 
 is, on the average, more certain (has smaller M.V.) in the case of the 
 lower sections of the series than in the case of the upper ones. With 
 respect to his data Wells remarks: "Another point of significance is 
 that the M.V.'s are always less at the disliked end than at the pre- 
 ferred end, although there is no intrinsic reason why they should be 
better grounded in memory. This might be in great part due to a
generally unesthetic series of cards, but it is perhaps generally true
that we are surer of our antipathies than of our preferences" (25 525).
But Downey finds the same relation shown in general by
judgments of resemblance, and remarks: "Toward the close of a
series the judgments became judgments of dissimilarity. The records
show that such a judgment is frequently made more easily than is
a judgment of likeness" (8 20). The writer, in the study of judgments
of the comic, finds the same tendency for the lower end of the
series to show smaller variability.

TABLE LI
CERTAINTY OF INDIVIDUAL LIKES AND DISLIKES

               H. L. H.         Wells             Downey
               Judgments of     Preference for    Resemblance of
               the Comic        Post Cards        Handwriting
Section        M.V.             M.V.              M.V.
First           .84             2.6               2.69
Second         1.39             4.7               3.05
Third          1.64             5.4               3.90
Antepenult     1.64             5.4               2.92
Penultimate    1.37             4.4               2.74
Last            .78             1.8               1.45
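
The comparison of mirrored sections prescribed for Table LI can be written out as a short sketch. The values are read from the table as printed; decimal points missing in the reproduction (.84, .78, 2.69) are restored here on the assumption that they were lost in printing, and the function name is hypothetical.

```python
# M.V. by section, in the order: first, second, third,
# antepenultimate, penultimate, last (as read from Table LI;
# missing decimal points are an assumed restoration).
table_li = {
    "H.L.H., the comic": [0.84, 1.39, 1.64, 1.64, 1.37, 0.78],
    "Wells, post cards": [2.6, 4.7, 5.4, 5.4, 4.4, 1.8],
    "Downey, handwriting": [2.69, 3.05, 3.90, 2.92, 2.74, 1.45],
}

def mirrored_differences(sections):
    """Upper-section M.V. minus the M.V. of the mirrored lower section."""
    half = len(sections) // 2
    return [round(sections[i] - sections[-1 - i], 2) for i in range(half)]

differences = {study: mirrored_differences(mv)
               for study, mv in table_li.items()}
```

A positive difference means the lower section is the less variable of the pair. On these figures no difference is negative in any of the three studies, though for the judgments of the comic the margins are very small.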
 
 Here again then is a problem. In these studies of repeated ar- 
 rangements the lower end of the series shows the smaller variability. 
This is hardly to be explained by Wells's suggestion of the greater
 certainty of our antipathies, unless one can be fairly supposed to 
 entertain feelings of aversion toward "unlikeness" when judging 
handwriting, and toward lack of humor in an intended comic situation.
It should be pointed out that the relation is by no means a unanimous
one with individual observers. Only half of Wells's observers
show it to any striking degree, though all but one of the five show it
when the highest five items are compared with the lowest five. In my
own results the relation of the averages is largely due to four of the
observers, the other six showing exactly the opposite result. One of
Downey's experiments failed to show the tendency with any certainty,
and the repeated arrangements of weights in Wells's study
 showed an increasing variability from top to bottom of the series. It 
 is quite probable that there is no genuine problem here at all and that 
 the results given are merely dependent on the character of the mate- 
 rial in the particular cases. It is perhaps easier to find material that 
 is distinctly not beautiful, not comic, or not similar, than to find 
 material of the extreme opposite qualities. 
 
 Third Problem. Group Variabilities in Likes and Dislikes. 
With respect to the likes and dislikes of the members of a group of
observers several studies are available. I will present first a discussion
of this point as it appeared in the previous paper on "Judgments
of the Comic."
 
 "Likes and Dislikes. If the cards be arranged in a final order of 
merit for each trial and the M.V.'s of the best cards compared with
 those of the poorest, that is, if the M.V.'s of the top and bottom of 
 the series be compared, the members of the group are found to agree 
 more closely at the top than at the bottom. Table LII. gives the M.V. 
 for the first and last ten places in each of the five trials. Inspection 
 shows two facts. First, that the M.V. for the top groups, taken 
either by 5's or 10's, is less than for the lower. Thus the average
 M.V. for places 1-10 is 2.03 compared with 2.22 for places 30-39. 
 The M.V. of places 1-5 is 1.97 compared with 2.09 for places 34-39. 
 
TABLE LII
Av. M.V.'s, 10 OBSERVERS, 5 TRIALS

Pos.   Trial 1   Trial 2   Trial 3   Trial 4   Trial 5
  1     1.48      1.20      0.90      1.66      1.12
  2     1.40      3.04      2.98      2.12      2.22
  3     2.84      1.56      1.72      2.44      1.80
  4     2.20      3.06      2.10      1.66      1.84
  5     1.80      2.32      1.86      1.62      2.40
  6     2.52      2.40      2.56      2.10      1.49
  7     1.88      2.08      2.70      2.40      1.84
  8     2.04      1.56      1.52      2.21      2.00
  9     2.08      1.68      1.60      2.83      2.20
 10     2.40      1.88      2.32      2.08      2.68

 30     2.60      2.76      1.43      2.80      2.40
 31     3.20      2.12      1.80      2.40      2.52
 32     2.08      3.04      3.18      2.80      1.96
 33     2.50      2.44      2.10      2.30      1.63
 34     2.08      2.12      2.24      1.60      2.17
 35     1.98      2.40      2.20      1.90      1.84
 36     2.94      2.20      1.68      2.40      2.38
 37     2.00      1.70      3.16      1.38      1.50
 38     2.36      1.88      1.80      2.50      1.56
 39     2.72      1.82      1.78      1.96      1.80
 
 Second, this difference becomes smaller with each repetition, the 
 differences between the M.V.'s of 1-5 and 34-39 being successively 
.46, .23, .21, .13, .05, and between the M.V.'s of 1-10 and 30-39,
 being .39, .24, .17, .10, .01. Generalizing we may say that in the 
beginning individuals agree more closely on the good than on the
 poor, but that with successive repetitions this difference disappears 
 (see Table LIII.). 
 
TABLE LIII
AVERAGES FROM TABLE LII

Trial           1       2       3       4       5     Average
Av. 1-5        1.94    2.23    1.91    1.90    1.87    1.97
Av. 34-39      2.40    2.00    2.12    2.03    1.82    2.09
Difference    + .46   - .23   + .21   + .13   - .05

Av. 1-10       2.06    2.08    1.97    2.11    1.96    2.03
Av. 30-39      2.45    2.32    2.14    2.21    1.97    2.22
Difference    + .39   + .24   + .17   + .10   + .01
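
The five-place averages of Table LIII can be recomputed from the entries of Table LII as printed here; the following is a sketch of that check, with a hypothetical helper `trial_averages`. Recomputed in this way, the averages for places 1-5 agree with the tabulated figures to within rounding, and the lower row is reproduced when the mean is taken over the five places 35-39. This is an observation about the figures as reproduced, not a statement about the author's working.

```python
from statistics import mean

# M.V. by final position (key) and trial (list index 0-4), from Table LII.
table_lii = {
    1: [1.48, 1.20, 0.90, 1.66, 1.12],
    2: [1.40, 3.04, 2.98, 2.12, 2.22],
    3: [2.84, 1.56, 1.72, 2.44, 1.80],
    4: [2.20, 3.06, 2.10, 1.66, 1.84],
    5: [1.80, 2.32, 1.86, 1.62, 2.40],
    35: [1.98, 2.40, 2.20, 1.90, 1.84],
    36: [2.94, 2.20, 1.68, 2.40, 2.38],
    37: [2.00, 1.70, 3.16, 1.38, 1.50],
    38: [2.36, 1.88, 1.80, 2.50, 1.56],
    39: [2.72, 1.82, 1.78, 1.96, 1.80],
}

def trial_averages(first, last):
    """Average M.V. over places first..last, separately for each trial."""
    rows = [table_lii[place] for place in range(first, last + 1)]
    return [round(mean(column), 3) for column in zip(*rows)]

top_five = trial_averages(1, 5)
bottom_five = trial_averages(35, 39)
# Positive: closer agreement at the top; negative: closer at the bottom.
difference = [round(b - t, 2) for t, b in zip(top_five, bottom_five)]
```

The recomputed differences for the second and fifth trials are negative, in agreement with the signs shown in the table.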
 
 This first relation seems to be a usual one in judgments of this 
 subjective character, of preference, beauty, persuasiveness, etc. 
Thus in Wells's study of picture postals, although the author does
 not call attention to the fact, the figures yield the following result. 
 For places 1-5 and 45-50, the M.V.'s are much alike, being respec- 
 tively 8.7 and 8.5. For places 1-10 the M.V. is 8.5 while for 40-50 
 it is 10.2. For 1-15 it is 9.5 as against 10.3 for places 35-50, etc. 
 
 Various investigators find that for repeated trials by the same 
 individual the reverse situation holds, the same individual being more 
 consistent at the bottom of the scale than at the top, and the sugges- 
 tion has been made that this may mean that we are more certain of 
 our dislikes than of our preferences. Giving the present relation a 
 somewhat analogous interpretation, it may mean that although a 
 single individual may be more certain of his antipathies, a group of 
 individuals will resemble each other more in their preferences than 
 in their aversions. 
 
 Or the relation may mean simply that we attend to things pos- 
 sessing positive quality, that here where the expression of the judg- 
 ment is in terms of preference we attend more strongly to the end 
 in which our preferences really lie. But that this is not true for 
 all individuals will be later pointed out. Dearborn finds judg- 
 ments of unlikeness easier to make than judgments of similarity, and 
 Downey finds some evidence for the same relation, although the 
 average of her results confirms the statement of Wells. But the 
 judgment of preference is qualitatively different from the judgment 
 of resemblance, the one being based on feeling-tone, the other on 
 more restricted perceptual factors. 
 
 Another possible interpretation of the data is that the differences 
 between the superior cards, at the top of the scale, are greater than 
 those of the mediocre at the bottom. This was clearly shown by 
Cattell to be the case in judgments of scientific achievement. Thus
"The figures show that the average differences 2 between the chemists
 who are in the first tenth are about eight times as great as between 
 the chemists toward the middle of the list and about twelve times as 
 great as between the chemists toward the bottom of the list." But 
 there are at least three reasons for believing that there is consider- 
 able change in attitude when the same observer turns from arranging 
 men according to merit to arranging simple stimuli according to 
 affective tone. The difference lies in the fact that part way down 
 the scale, in the latter case, the expression of judgment changes 
 from terms of decreasing preference into terms of increasing posi- 
 tive dislike, whereas probably few scientists who would get into a 
 total group would be rated as positively bad, the judgment being 
 expressed rather in terms of more or less merit. Arrangements of 
 scientific merit resemble the scale of sensation intensities, varying 
 always in terms of degree, while arrangements of preference re- 
 semble the gradation of feelings from the positive pole through a 
 region of indifference to a decided negative pole. 
 
 In the second place the suggestion that the smaller variability in 
 the upper ranges depends on objective differences in the stimuli is 
 contradicted by the fact that in the successive arrangements by the 
 same individual four of the ten observers were more consistent in 
 the lower range than in the upper, and this would hardly be expected 
 if the differences between the cards in this lower range were 
 actually smaller than in the upper. Furthermore if something like 
 Weber's law holds for judgments of affective tone as well as for 
 sensation intensity, differences in the upper range would have to be 
 greater in order to yield equal variability, and considerably greater 
 if the variability is still smaller. The whole question of this closer 
group agreement in the upper ranges seems to merit further investigation
and especially, the tendency of the differences to become uniformly
smaller in successive trials."
 
 The following results, from the preceding chapter on judgments 
 of similarity and difference in the case of handwriting, show the 
 same tendency. Both when judging similarity and when judging 
 difference the nine observers agree more closely on the upper sec- 
 tions of the series, the material being the same in both cases. 
 
The following table gives the average results of two studies by
Wells, the one of "literary qualities," the other of "similarity of
 two colors." The judgments of literary qualities show the common 
 tendency, but the judgments of color similarities show just the 
 reverse. 
 
 2 Measured inversely by the size of the probable errors and directly by the 
 difference in grade.
 
TABLE LIV
35 SPECIMENS OF HANDWRITING. 9 OBSERVERS
Trait: Resemblance to a Given Standard Specimen

Section (Av. M.V.)   Judging Similarity   Judging Difference
1st 5 items               4.82                 4.55
2d 5                      6.11                 6.59
3d 5                      6.84                 6.30
4th 5                     7.03                 7.87
5th 5                     6.65                 7.77
6th 5                     6.86                 7.16
7th 5                     5.01                 5.05
 
TABLE LV

                     10 Authors with Respect to     28 Pairs of Colors.
                     Given Literary Qualities.      Average M.V. of
                     Av. M.V. of 10 Observers       10 Observers
1st sec. of series          .25                          2.1
2d sec.                     .30                          2.6
Penultimate sec.            .33                          2.4
Last sec.                   .26                          0.7
 
Individual and class differences in such a tendency might well
be expected. In a later study by the writer, in which 50 appeals to
specific instincts and interests were rated according to their persuasiveness,
an apparently genuine case of such difference is
afforded (12). The following table (Table LVI) gives the average
M.V.'s of the highest, lowest, and middle sections of 10 appeals for
several groups of observers.

TABLE LVI

Average M.V.'s of       Best 10   Middle 10   Poorest 10
20 women, 1st trial      10.10      11.18       10.07
20 women, 2d trial        9.76      11.93        9.59
10 women                  9.37      10.58        8.77
Av. of women              9.74      11.23        9.47
20 men                    9.84      12.96       10.79
Grand average             9.76      11.44        9.80

The point of interest in these records
 is the question of closeness of agreement at the top of the list, 
 among the preferences, as compared with that at the bottom of the 
 series, among the dislikes. The evidence here is suggestive. Women 
 seem to agree more closely on their dislikes (M.V. 9.4) than on 
 their preferences (M.V. 9.7), but the difference is not large. It is 
 probably reliable and genuine, however, since the relation holds in 
all three experiments with women. The men, on the other hand,
 agree more closely on their preferences (M.V. 9.8 as against 10.8 for 
 dislikes) and the difference is considerable. The averages of men 
 and women show no difference whatever. There seems to be a sex 
 difference here, which, expressed in general terms, would be, that 
 men resemble each other more closely in their preferences while 
 women are more alike with respect to their aversions. This fact 
throws some light on the further finding that there is low correlation
between the magnitude of the M.V.'s for the particular cards
 when the variabilities of the women's judgments are compared with 
 those of the judgments passed by the men. 
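
The comparison drawn from Table LVI reduces to a subtraction for each group of observers, sketched below with the tabulated M.V.'s of the best ten and poorest ten appeals. The dictionary layout and variable names are merely illustrative.

```python
from statistics import mean

# Average M.V. of the best ten and the poorest ten appeals, from Table LVI.
groups = {
    "20 women, 1st trial": {"best": 10.10, "poorest": 10.07},
    "20 women, 2d trial": {"best": 9.76, "poorest": 9.59},
    "10 women": {"best": 9.37, "poorest": 8.77},
    "20 men": {"best": 9.84, "poorest": 10.79},
}

women = [row for name, row in groups.items() if "women" in name]
women_best = mean(row["best"] for row in women)
women_poorest = mean(row["poorest"] for row in women)

# Positive: closer agreement on dislikes; negative: closer on preferences.
best_minus_poorest = {name: round(row["best"] - row["poorest"], 2)
                      for name, row in groups.items()}
```

The difference is positive in each of the three experiments with women and negative for the men, which is the relation stated in the text; the smallest of the women's differences, that of the first trial, is only a few hundredths.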
 
 It is difficult to determine how far this question of group varia- 
 bility at the extremes is merely a function of the material and how 
 far it is due to more essential psychological factors. Such cases as 
 the sex difference just described are obviously not due to the 
 nature of the material, which was the same in both cases. There 
 is further evidence which tends to confirm the suggestion of this 
 sex difference as men and women are now constituted. Thus Strong 
 (18 79) finds that "When women are given an equal opportunity 
 with men to rate appeals (advertisements) they are able to classify 
 their dislikes as well as their preferences, which the men do not. 
. . . Women have more and greater dislikes than men and are
 surer of them." Similar evidence is found in Kuper's study of the 
 preferences of boys and girls from 6.5 to 16.5 years of age. "An- 
 other sex difference noted was the number of positive dislikes ex- 
 pressed by each sex. The girls gave 161 dislikes as against the 
 boys' 65. Boys seemed to entertain relative indifference toward the 
 appeals at the bottom of the list" (16). 
 
 These results, if further verified, would lead to the generaliza- 
 tion that men are homogeneous, that is, tend to resemble each other 
 more closely, in the case of their preferences, appeals which are 
 positive and strong ; women, on the contrary tending to be alike with 
 respect to their dislikes, appeals which are weak or negative. 
 Whether this difference bears in the direction of selection and differ- 
 ence in experience or training, or merely toward the temporary 
 motives which operate in reacting toward such experiments, the 
 results do not show. The fact that women have definite and mutual 
 aversions, with fewer common preferences, while men have fewer 
 determinate dislikes but definite and mutual preferences, is, if true, 
 an interesting statistical discovery, and one which may be found to 
 have numerous implications. Whether it be interpreted to mean a 
 fundamental and inherent sex difference or merely a difference 
which reflects our present social organization (which is doubtless an
 adequate explanation of all the facts) has nothing to do with the 
 present usefulness of the facts themselves. Moreover the suggested 
 further verification must be found before the existence of the differ- 
 ence can be asserted with even mild assurance. 
 
 Fourth Problem. Personal Consistency and Judicial Capacity. 
 This problem was first raised by Wells (25 529) who remarks, in 
 discussing the esthetic judgments of his subjects, "A somewhat sig- 
 nificant comparison is afforded between the variability of the (5) 
 subjects from the average of the ten, and their variation from their 
 own judgments (in repeated arrangements). Those who vary least 
 from their own judgments also vary least from the judgments of 
 others. . . . The observations are too few to do more than suggest a 
 general principle, but their interpretation is a rather interesting one. 
 The critic who best knows his own mind would seem the best criterion 
 of the judgments of others." In the case of the judgments of 
 amount of resemblance between colors "the peculiar correspondence 
 between the amount of variation from one's own judgment and from 
 the judgment of others appears" also. 
 
 In order to test further the truth of this generalization I have 
 made several experiments in which the variability of the individual 
 (personal consistency, as shown by the correlation of two trials by 
 the same individual on different occasions) is correlated with his 
 degree of agreement with the group average (judicial capacity or 
representative character). The resulting coefficient of correlation
 will thus indicate the degree to which high personal consistency im- 
 plies the representative character of the judgments. The various 
 coefficients from the different experiments are given in the following 
 table. 
 
TABLE LVII
PERSONAL CONSISTENCY AND JUDICIAL CAPACITY

Judgment Situation and Observers                        r
Appeals, relative persuasiveness, 20 women            +.29
Jokes, relative funniness, 10 women                   -.49
Faces, various characteristics, 10 women              +.06
Handwriting, resemblance, 9 observers                 +.47
Handwriting, difference, 9 observers                  -.07
Syllables, agreeableness, 10 women                    +.15
Portraits, various characteristics, 10 women          +.11
Wells, postal cards, 5 observers                      +.70
Wells, color differences, 7 observers                 +.30
Downey, handwriting resemblance, 1st specimen         +.70
Downey, handwriting resemblance, 2d specimen          -.40
Downey, handwriting resemblance, 3d specimen          +.40
Average                                               +.19
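
One way in which a coefficient of this kind could be obtained is sketched below. The sketch makes several assumptions not fixed by the text: correlation is taken by relative position (a rank correlation, with tied values given their average rank), the group average includes the observer's own arrangements, and the data for four observers and six items are invented. All function and variable names are hypothetical.

```python
from statistics import mean

def average_ranks(values):
    """Ranks from 1 upward; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def rank_correlation(x, y):
    """Product-moment correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical data: two arrangements of six items by each observer.
trials = {
    "A": ([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 6, 5]),
    "B": ([2, 1, 3, 5, 4, 6], [1, 3, 2, 4, 6, 5]),
    "C": ([3, 1, 2, 6, 4, 5], [1, 4, 2, 3, 6, 5]),
    "D": ([6, 2, 4, 1, 5, 3], [2, 5, 1, 6, 3, 4]),
}
n_items = 6
group_average = [mean(trials[o][t][i] for o in trials for t in (0, 1))
                 for i in range(n_items)]

# Personal consistency: agreement of an observer's two trials.
consistency = [rank_correlation(*trials[o]) for o in trials]
# Judicial capacity: agreement of the observer's mean positions with the group.
capacity = [rank_correlation([(a + b) / 2 for a, b in zip(*trials[o])],
                             group_average) for o in trials]
# Coefficient of the kind entered in the table, across observers.
r = rank_correlation(consistency, capacity)
```

With so few observers such a coefficient is very unstable, which accords with the caution expressed in the text about the small groups of Wells and Downey.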
 
 In my own experiments, with 10 to 20 observers, the correlations 
are practically zero (Av. .07). I have computed, from the data given
by Wells and Downey, similar coefficients from their small groups of
observers (usually 5), and these are also included in the table. Four
 of the five are positive and large, the other being negative, and the 
 average being .34. The average of the 12 different studies is .19. 
 The only large negative correlation among my own figures is in the 
 case of the judgments of comic situations. It may well be that this 
 single negative coefficient is due to the peculiar nature of the mate- 
 rial. The process of adaptation gives to the comic situation a chang- 
 ing rather than a static value. The judgments of the group of ob- 
 servers in this experiment indicate that some of the jokes change 
 greatly in value with successive repetitions. One class, the "objec- 
 tive comic" as I have called them (naive jokes and calamity jokes in 
which the predicament of the victim is self-induced) rise in the relative 
scale. Another class fall just as rapidly, the "subjective 
 comic" (sharp retort, pun, play on words, caricature, occupation 
 joke, etc.). A third class (mixed in character) approximate their 
 original position, in the later arrangements, and constitute about one 
 half of the total series. This gives a waxing, a waning, and a static 
 group. 
 
 This means that if a given individual's judgments are to be an 
 index of the opinion of the group his evaluation of the waxing and 
waning items must vary correspondingly, thus giving him a low personal 
consistency coefficient. In so far as the individual's consecutive 
 arrangements remain uniform, to just that extent does he fall short 
 of being representative of his group. It is clear from these facts that 
 in all such determinations the stability of the material must be in 
 some way ascertained before the results can be safely interpreted. 
 
 Fifth Problem. Personal Consistency in Different Situations. 
 It would be interesting to know whether an individual who has a 
 high personal consistency coefficient in one situation shows the same 
 characteristic when a totally different sort of material is judged. In 
Table LVIII such coefficients are given for 10 observers in two different 
situations, judgments of the comic and judgments of persuasiveness 
of appeals. The correlation by relative position between the two 
columns (1 and 2 of the table) is -.30. The cases are few and the 
 P.E. large, but in so far as the data are reliable they indicate no 
 likelihood that an individual who judges the one sort of material con- 
 sistently will judge with relatively equal consistency in the other sit- 
uation. The peculiar nature of the material in these two cases gives
 this conclusion merely suggestive value, and further experiments are 
 needed. 
 
 Sixth Problem. Judicial Capacity in Different Situations (Gen- 
 eral Judicial Capacity). The table just described contains also, for 
 these 10 observers, their degree of correlation with the average of 
 their group in the two experiments (columns 3 and 4 of the table). 
 The correlation between the two columns is .22. This figure again 
 is subject to a large P.E. In so far as it is reliable it indicates a cer- 
 tain degree of general judicial capacity, the individual who is the 
best representative of his group in the one case being somewhat more 
 likely than any other individual to be the best representative of his 
 group in the other situation. 
 
TABLE LVIII 
GENERAL JUDICIAL CAPACITY 

             Personal Consistency         Correlation with Average 
Observer     Appeals (r)   Comic (M.V.)   Appeals      Comic 

Ell             .55           .88           .24         .32 
Mah             .13          1.65           .36         .55 
Mor             .71          1.30           .13         .54 
Den             .78          1.86           .52         .66 
Ger             .81           .95           .66         .70 
Mas             .87          1.43           .36         .60 
Pra             .74          1.35           .62         .28 
Bis             .73           .87           .43         .30 
Sch                           .87           .43 
Hrt             .80           .92           .55         .48 

                   r = -.30                    r = +.22 
 
 In another experiment, the results of which are not given in the 
 table, a given group of individuals judged, on the one occasion the 
 legibility of handwriting, and on another occasion their degree of 
 belief in each of a series of propositions. The correlation between 
representative character in the two cases is just zero (.01), showing 
consequently the non-existence of general judicial capacity in 
 this experiment. 
 
 Wells found, in his statistical study of literary merit, that the 
 observer who was the best judge (most nearly representative of the 
 group) in the case of "general merit" was not at all necessarily the 
 best judge of the author's possession of the various specific qualities. 
 In a group of 20 observers "the worst judge of general literary merit, 
 according to his divergences, is the third best judge of charm, the 
 best judge of clearness, and the thirteenth best of euphony. The best 
judge of general merit is the fifth best of charm, the fourteenth of
 clearness, and the seventeenth of euphony. . . . We can hardly 
 draw inferences as to the general capacity for sound judgment as 
 measured by the soundness of judgment for any particular class of 
 objects . . . the fact that one has a good judgment for psychologists 
 tells us very little about the value of his opinion in other fields. . . . 
 To demonstrate the very existence of an abstract power of judgment 
 is ultimately synonymous with the problem of free will" (24 30). 
 
 Cattell found, in the case of the judgments, by ten psychologists, of 
 the eminence of fifty living psychologists, that "the second best 
 judge of the first ten psychologists is the worst of the second, the 
 fifth of the third, the eighth of the fourth, and the sixth of the fifth" 
 (24 30). On the whole then, there is no evidence, in the available 
 material, of the existence of such a thing as general judicial capacity. 
 
 Seventh Problem. Relation of Variability to Series Length. 
 Another striking relation brought out by the comparison of various 
 order of merit arrangements of stimuli on the basis of such affective 
 factors as preference, beauty, persuasiveness, funniness, etc., is the 
 constancy of the ratio of the average M.V. for the series as a whole to 
 the number of possible positions in the range. If by M.V. we desig- 
 nate this average variability and by P the total number of positions 
 in the scale then M.V./P is, with various kinds of material, with 
 different groups of observers, and with a widely ranging value for P, 
 usually .20, and with high reliability. The following table exhibits 
 this relation in such material as the writer has at hand. 
 
TABLE LIX 

    Material                             Trait            Observers    P    M.V.   M.V./P 

 1. 4 advertisements                     Persuasiveness   10 men       4     .8     .200 
 2. 5 advertisements                     Persuasiveness   10 men       5     .98    .196 
 3. 39 jokes                             Funniness        10 women    10    2.2     .220 
 4. 10 advertisements (av. of 4 sets)    Persuasiveness   10 women    10    2.3     .230 
 5. 10 advertisements (av. of 3 sets)    Persuasiveness   20 mixed    10    2.5     .250 
 6. 20 advertisements (av. of 2 sets)    Persuasiveness   50 mixed    20    4.3     .215 
 7. 20 photographs                       Various traits   10 women    20    3.6     .180 
 8. 39 jokes                             Funniness        10 women    39    8.03    .205 
 9. 50 appeals                           Strength         20 women    50   10.5     .201 
10. 50 picture postals (Wells)           Beauty           10 mixed    50   10.7     .201 
 
 That is to say, the M.V. is always about one fifth of the total num- 
 ber of possible places, or the P.E. (probable error) assuming a 
 normal distribution, about .168 or about one sixth of the range. The 
 evidence seems to the writer too strong to permit of explanation in 
 terms of mere coincidence. Of course if the material had been the 
same throughout, the only variable being the number of places into
 which it was sorted, this is just what we might expect, for the rela- 
 tive P.E. would remain constant, the absolute P.E. depending on the 
 fineness of the grades of distinction. But we have here ten distinct 
 sets of material, judged in terms of a considerable range of traits, by 
 widely differing groups of observers, both as to sex, training, interest, 
 and number. The only constant factor is that the judgment is 
 always based on the affective reaction to the stimulus. And we find 
 that in every case the probable error is approximately one sixth of 
 the range. (It would probably be slightly larger if it were not for 
 the fact that the end error tends to reduce the variability of the 
extreme upper and lower positions.) Assuming that the M.V.'s were 
equal in all parts of the range (and they do not vary greatly), and 
allowing a P.E. in both directions from both the upper and lower 
extremes, the total range would then be divided into four sections, 
each separated from its neighbor by the respective P.E.'s, somewhat 
as follows. 

[Diagram: the range divided into four sections about the central points A, B, C, and D, with 1 P.E. marked above and below each point.] 

This would mean that, so far as the average judgment of 
the group of observers is concerned, there are only four distinct 
grades of difference or merit in the material, only four shades of 
distinction on which the group would, in the long run, agree, these 
grades corresponding to the sections lying about A, B, C, and D as 
central tendencies. 
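
The arithmetic of this passage may be sketched as follows. The conversion from M.V. to P.E. uses the standard normal-curve relations (P.E. = 0.6745 sigma, mean variation = sqrt(2/pi) sigma). The count of distinct grades rests on an assumed reading of the diagram, namely that the central points lie 2 P.E. apart with the first and last at the ends of the range. The function names are hypothetical.

```python
import math
from statistics import NormalDist

# Normal-curve relations: P.E. = 0.6745 sigma, mean variation = sqrt(2/pi) sigma.
PE_PER_SIGMA = NormalDist().inv_cdf(0.75)
MV_PER_SIGMA = math.sqrt(2 / math.pi)
PE_PER_MV = PE_PER_SIGMA / MV_PER_SIGMA  # about 0.845

def relative_pe(mv_over_p):
    """Probable error as a fraction of the range, given the ratio M.V./P."""
    return PE_PER_MV * mv_over_p

def distinct_grades(mv_over_p):
    """Assumed reading of the diagram: grades spaced 2 P.E. apart across the range."""
    return 1 + 1 / (2 * relative_pe(mv_over_p))

# M.V./P column of Table LIX as printed.
table_lix_ratios = [.200, .196, .220, .230, .250, .215, .180, .205, .201, .201]

print(round(relative_pe(0.20), 3))      # P.E. as a fraction of the range when M.V./P = .20
print(round(distinct_grades(0.20), 2))  # number of grades on the assumed reading
for ratio in table_lix_ratios:
    print(ratio, round(distinct_grades(ratio), 2))
```

On this reading a ratio of .20 yields very nearly four grades, and a smaller ratio yields more.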
 
 This situation is curiously analogous to that disclosed in judg- 
ments of the same observer, where practise shows that about four or
 five distinctions of certainty, clearness, etc., are all that can be com- 
 fortably and accurately made. The same thing that holds for the 
 variability of the individual holds for the variability of the group. 
 And the fact that the law holds for such different kinds of material 
 and traits argues an interesting resemblance between the judgments 
 involved in such affective discriminations. 
 
 The size of this ratio M.V./P would become smaller as the mate- 
 rial came to be selected so as to disclose more pronounced or more 
 objectively measurable differences. Thus in judgments of resem- 
 blance of penmanship, which are supposedly more directly perceptual 
 and objectively verifiable in kind, Downey finds M.V.'s which, if 
 arranged as below, according to the range of possible positions, would 
 yield an M.V./P value of about .163, or a probable error of about 
 .130, meaning that while there are only about four clearly marked 
 grades of beauty, funniness, persuasiveness, etc., there are about 
 five clearly marked degrees of resemblance. 
 
TABLE LX 
VARIABILITY OF JUDGMENTS OF SIMILARITY (DOWNEY) 

 P     M.V.    M.V./P 

20     3.31     .165 
34     5.33     .157 
37     6.22     .168 

Average M.V./P = .163 
 
 It is probable that this ratio (M.V./P) can be used as a reliable 
index of the objective character of judgments and with greater accuracy 
than the crude M.V. employed by Wells. Using this ratio the 
 objectivity of his three classes of judgments would be, in increasing 
 order, preference .201, weights .141, colors .086, showing that the 
 judgments of weight order were more subjective than those of color 
 order, thus reversing the order assigned. 
 
 Eighth Problem. Quantitative Criteria of the Subjective. The 
 next problem grows directly out of the preceding one, and has to do 
 with the proposed "quantitative criterion of the subjective." Wells 
writes: "So far as any distinction on a statistical basis is possible we 
 might consider as subjective those types in which the various judg- 
 ments of the individual formed a species of their own, varying from 
 each other considerably less than from an equal number of judgments 
 made by different individuals; and consider as objective those in 
 which an individual would vary from his own independent judg- 
ments about as much as the variation of an equal number of
 judgments by different individuals. . . . The two categories would 
 almost certainly be continuous" (25 512). 
 
 A determination of these criteria for materials affording three 
classes of judgments was the primary purpose of Wells's study. His 
conclusion may be given in his own words: "It has appeared that in 
 the first class (the highly subjective feeling of preference for different 
 sorts of pictures) the judgments of each individual cluster about a 
 mean which is true for that individual only, and which varies from 
 that of any other individual more than twice as much as its own judg- 
 ments vary from it; that in the second class, with the colors, the 
 variability of the successive judgments and that of those by different 
 individuals markedly approached each other but still preserved a 
 significant difference; while in the third class, with the weights, we 
 found that there might be even an excess of the individual variability 
 over the 'social.' This comparison seems to afford, to a certain 
 extent, a quantitative criterion of the subjective" (25 547). 
 
 Further determinations of a somewhat similar sort may be derived 
 from many of my own studies. Instead of using a figure of varia- 
 bility I have employed the coefficients of correlation. The signifi- 
 cance should be the same and fewer trials are required to determine 
 the results. 
 
TABLE LXI 
COEFFICIENTS OF SUBJECTIVITY 

                                            Average Personal   Average Agreement 
                                            Consistency,       with the            Subjectivity 
Material          Trait             Obs.    2 Trials           Group Av.           Ratio 

Faces (photos)    Frankness          10        .625               .632                .99 
Faces (photos)    Intelligence       10        .627               .583               1.07 
Faces (photos)    Beauty             10        .724               .641               1.13 
Handwriting       Resemblance         9        .789               .644               1.22 
Syllables         Agreeableness      10        .687               .532               1.29 
Syllables         Ease               10        .667               .492               1.36 
Jokes             Funniness          10        .550               .390               1.41 
Appeals           Persuasiveness     20        .677               .432               1.57 
Faces (photos)    Attractiveness     10        .806               .466               1.73 
 
Table LXI gives a series of these determinations. The various 
materials and traits are arranged in an order of increasing subjectivity 
as measured by the "subjectivity ratio" (ratio of index of 
 personal consistency to index of group agreement). Judgments of 
 the frankness and intelligence of faces (photographs) are completely 
 objective, that is, a given individual correlates as closely with the 
 average judgment of the group as he does with his own judgment on 
another occasion. But as one goes on down through the table the
 personal consistency coefficients remain fairly constant while the 
 coefficients of group agreement decrease. This gives a larger and 
 larger "subjectivity ratio," until, in judgments of the attractiveness 
 of faces, the personal consistency coefficients are nearly twice as 
 large as those of group agreement. 
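
As a check on the definition, the ratio can be recomputed from the two coefficient columns of Table LXI. The sketch below simply divides personal consistency by group agreement and compares the result with the printed ratio. The variable names are hypothetical, and the tolerance of about .01 is an allowance for rounding in the printed figures.

```python
# Rows of Table LXI: (material and trait, personal consistency, group agreement, printed ratio)
table_lxi = [
    ("Faces, frankness",        .625, .632, 0.99),
    ("Faces, intelligence",     .627, .583, 1.07),
    ("Faces, beauty",           .724, .641, 1.13),
    ("Handwriting, resemblance", .789, .644, 1.22),
    ("Syllables, agreeableness", .687, .532, 1.29),
    ("Syllables, ease",         .667, .492, 1.36),
    ("Jokes, funniness",        .550, .390, 1.41),
    ("Appeals, persuasiveness", .677, .432, 1.57),
    ("Faces, attractiveness",   .806, .466, 1.73),
]

def subjectivity_ratio(personal_consistency, group_agreement):
    """Ratio of the index of personal consistency to the index of group agreement."""
    return personal_consistency / group_agreement

computed = [subjectivity_ratio(pc, ga) for _, pc, ga, _ in table_lxi]

for (label, _, _, printed), ratio in zip(table_lxi, computed):
    print(f"{label:28s} computed {ratio:.3f}  printed {printed:.2f}")
```

A ratio near 1 marks material on which an observer agrees with the group about as well as with himself. The larger the ratio, the more the judgment is peculiar to the individual.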
 
 The use of the coefficients of correlation as criteria of subjectivity 
 in the case of judgments expressed by serial arrangement is much 
 more satisfactory than the relation of the two figures of variability. 
 Fewer trials are required for the determination, and the measures 
 are not complicated by the end error, and other factors which tend 
to disguise the real size of the M.V.'s. 
 
 It is probable, however, that the distinction between subjective and 
 objective judgments is at best but an artificial one. The chief differ- 
 ence between the two classes seems to consist in the amount or clear- 
 ness of the differences present between the various items of the mate- 
 rial judged. Judgments of preference will, in the case of a given 
 individual, be expressed as consistently as judgments of weight, dura- 
 tion or intensity, providing the differences are equally perceptible; 
 and judgments of intensity, etc., will vary as much as those of pref- 
 erence if the differences afforded by the material are sufficiently 
 slight. The fact that a so-called objective scale may be applied to the 
 material in the one case and not in the other, is, in the first place, 
 only an extrinsic fact, and in no way conditions the psychological act 
 of judgment. In the second place the objective scale derives its own 
 validity in the long run only from the consensus of opinion and from 
 its pragmatic value. So far as this is concerned a consensus of 
 opinion may be secured for even the most variable and personal sort 
 of material, as witness Thorndike's scales for measuring the excel- 
 lence of penmanship, literary composition, drawing, etc. The only 
 difference between the two cases would be in the universality of the 
 verdict, and this again in no way conditions the psychological act. It 
 is apparent that the coefficients are merely indices of certain charac- 
 teristics of the material, rather than of any features of the judg- 
 ments, as judgments. A certain sort of material may not be constant 
from time to time or from observer to observer (jokes or comic pictures, 
for example). Here the judgment attitude may be conceived 
 as constant, but the material changed. Or one sort of material may 
 provide larger differences between items most alike, and either 
situation would be revealed by the "coefficients of subjectivity." 3 It 
is to be expected that various sets of material, of the same content but 
with differing degrees of difference between successive items would 
show the same differences in "subjectivity" as those found with 
different kinds of material. Subjectivity means, then, either of two 
things, or both: (1) The amount of difference, (2) the universality 
of the verdict. These also differentiate judgment and perception. 

3 It is of course also true that, in judging such a general trait as 
"attractiveness" different observers may proceed on the basis of different 
qualitative standards and this fact would also be reflected in several of 
the coefficients, though not in all of them. 
 
 Ninth Problem. Agreement Between Diverse Groups. The final 
 problem to be presented here concerns the agreement between the 
 average judgments of two groups of observers, when only small 
 groups are used. It is of course obvious that if the two groups are 
 sufficiently large and represent similar or random selections of 
 humanity, the two final orders will be identical, no matter how "sub- 
 jective" the material may be. But if the groups are small, or if they 
 represent different samplings of human nature, differences might be 
 expected which would be of interest to individual, social, and applied 
 psychology. 
 
 I have brought together in the following table such material as 
 I have been able to secure from my own studies and from the pub- 
 lished reports of others. The range of material represented is small, 
 and this problem would seem to constitute an interesting theme for 
 further work in statistical psychology. 
 
Material, Trait, and Observers                                   r      P.E. 

H. L. H. Appeals, relative persuasiveness. 
    20 women with 10 other women                               .610     .06 
    20 women with 20 men                                       .624     .06 
    10 women with 20 men                                       .598     .06 
    Average of all three coefficients                          .611 

E. K. Strong. Advertisements. Persuasiveness. 
    15 men with 10 women                                       .53      .07 
    25 subjects and group of advertising experts               .51      .10 
    25 subjects and manufacturers of the commodity             .52      .10 
    Advertising experts and manufacturers                      .64      .08 
    Average of all four coefficients                           .55 

Kuper. Cosmos prints. Preference. 
    100 boys with 100 girls (ages 6.5 to 16.5)                 .24      .06 

E. K. Strong. Advertisements. Persuasiveness. 
    50 college men and 97 farmers and mechanics               -.53      .07 
    22 college women and 30 college women                      .93      .02 

In the case of this sort of material the average correlation of two 
groups representing approximately the same sampling of the population 
is about .60. The average personal consistency coefficient is 
about .70, while the correlation of two trials by the same group on 
 two different occasions is about .90. The coefficient of personal con- 
 sistency thus stands about midway between that of the consistency of 
 a group and the agreement of two diverse groups. 
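
A coefficient of agreement between two groups might be obtained along the following lines. This is a sketch under stated assumptions: each group's final order is taken from the average position its observers give each item, and agreement is measured by the usual rank-difference formula, 1 - 6*sum(d^2)/(n(n^2-1)), which may differ from the method used in the studies cited. The data and function names are hypothetical.

```python
from statistics import mean

def average_order(group):
    """group: one arrangement per observer, giving the position assigned to each item.
    Returns each item's rank in the group's average order (ties broken by item index)."""
    n_items = len(group[0])
    avg_pos = [mean(arr[i] for arr in group) for i in range(n_items)]
    order = sorted(range(n_items), key=lambda i: avg_pos[i])
    rank = [0] * n_items
    for position, item in enumerate(order, start=1):
        rank[item] = position
    return rank

def rank_correlation(a, b):
    """Rank-difference correlation of two orders without ties."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical arrangements of five items by two small groups of three observers.
group_a = [[1, 2, 3, 4, 5], [2, 1, 3, 4, 5], [1, 3, 2, 5, 4]]
group_b = [[2, 1, 4, 3, 5], [2, 1, 5, 3, 4], [1, 2, 4, 3, 5]]

agreement = rank_correlation(average_order(group_a), average_order(group_b))
print(agreement)
```

The same two functions applied to two trials of one group would give the consistency of a group with itself.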
 
 The last two figures from Strong's data, and the one from Kuper's 
 study show the great degree to which the group agreements are 
 conditioned by the composition of the groups. The college students 
 and the manual laborers yield a large negative coefficient, while the 
 two groups of college students give almost perfect positive correla- 
 tion. The boys and girls correlate, in judging the interest of pic- 
 tures, by only .24. When college students or adult men and women 
 judge the degree of their interest in appeals not remotely different in 
 character from those used with the children, men and women show as 
 high correlation as do two groups from the same sex. It would seem 
 that in this index of group correlation we have then another useful 
 index of the subjectivity of the material. If the material were 
 weights or brightness intensities there would be no reason for expect- 
 ing these various groups to show any significant differences in the 
 degree of mutual correlation. 
 
 We are thus provided with at least five different indices of sub- 
 jectivity, personal consistency, approximation to group average, the 
 ratio of these two indices, the ratio of variability to series length 
 (M.V./P), and the agreement of diverse groups. It would be inter- 
 esting to work out the interrelations of these various indices in differ- 
 ent judgment situations. 
 
 BIBLIOGRAPHY OF THE ORDER OF MERIT METHOD 
 
1. Barrett, The Order of Merit Method and the Method of Paired Comparisons, Jour. Phil., July 3, 1913, 382-4. 

2. Cattell, The Time of Perception as a Measure of Difference in Intensity, Phil. Stud., 1903. 

3. Cattell, A Statistical Study of Eminent Men, Pop. Sci. Mo., 53, 357, 1903. 

4. Cattell, Statistics of American Psychologists, Am. J. Psychol., 1903, XIV, 310. 

5. Cattell, Statistical Study of American Men of Science, Science, N. S., XXIV. 

6. Cattell, A Further Statistical Study of American Men of Science, Science, N. S., XXXII. 

7. Cattell, Appendix, American Men of Science, 2d ed., 1910. 

8. Downey, Study of Family Resemblance in Handwriting, Bulletin No. 1, Dept. of Psychology, Univ. of Wyoming, 1910. 

9. Fernald, G. E., The Defective Delinquent Class, Differentiating Tests, Amer. Jour. of Insanity, 69, 125-142, 1912. 

10. Hillegas, Milo B., A Scale for the Measurement of Ability in English Compositions, Teachers College Studies. 

11. Hollingworth, Judgments of the Comic, Psych. Rev., 1911, 18, 132. 

12. Hollingworth, Judgments of Persuasiveness, Psych. Rev., 1911, 18, 234. 

13. Hollingworth, Influence of Form and Category, Jour. Phil., 1912, 9, 513. 

14. Hollingworth, Principles of Appeal and Response, Appletons, 1913. 

15. Hollingworth, Experimental Studies in Judgment, ARCH. OF PSYCH., No. 29. 

16. Kuper, Group Differences in the Interests of Children, Jour. Phil., 1912, 9, 376. 

17. Norsworthy, Validity of Judgments of Character, Essays in Honor of William James, 1908. 

18. Strong, The Relative Merits of Advertisements, ARCH. OF PSYCH., 1911, 17. 

19. Strong, Application of the Order of Merit Method to Advertising, Jour. Phil., October 26, 1911, 600-606. 

20. Strong, Psychological Methods as Applied in Advertising, Jour. Ed. Psychol., Sept., 1913, 393. 

21. Sumner, A Statistical Study of Belief, Psych. Rev., 5, 616. 

22. Thorndike, Handwriting, Teachers College Record. 

23. Thorndike, Mental and Social Measurements, 2d ed., 1913. 

24. Wells, A Statistical Study of Literary Merit, ARCH. OF PSYCH., 1907, 7. 

25. Wells, On the Variability of Individual Judgments, Essays in Honor of William James, 1908, 511. 

26. Yerkes, Introduction to Psychology, Holt, 1911, Ch. XIV.
 