A STATISTICAL STUDY OF LITERARY MERIT
With Remarks on Some New Phases of the Method

FREDERIC LYMAN WELLS, PH.D.
Pathological Psychologist in the McLean Hospital, Waverley, Mass.
Formerly Lecturer in Barnard College, Columbia University

ARCHIVES OF PSYCHOLOGY
Edited by R. S. Woodworth
No. 7, August, 1907
Columbia University Contributions to Philosophy and Psychology, Vol. XVI, No. 3

NEW YORK
THE SCIENCE PRESS

CONTENTS
                                                                    PAGE.
I. The Order, Positions and Probable Errors of Ten Leading
   American Authors                                                     5
II. Quality Analysis                                                   20
III. On the Validity of Individual Judgment as Measured by
   Departure from an Average                                           25

I. THE ORDER, POSITIONS, AND PROBABLE ERRORS OF TEN LEADING AMERICAN AUTHORS

The practical value of the statistical method in the measurement of a mental trait rests upon the hypothesis that such value of this trait as is worth measuring in any individual is significant for a certain group of persons as it impresses itself upon that group, and only in so far significant as it thus impresses itself. This is what the method measures. Unrecognized merit may exist, but it is also likely to be inefficient merit, which is not merit at all in any legitimate sense of the term. We must finally assume efficiency to be in proportion to its influence. This would work injustice only where such influences were unaccounted for, or were credited to the wrong source, and in such determinations as these this factor is usually, if not indeed always, negligible. The measure of influence is the ultimate criterion of efficiency.

While the data of the method are based upon introspection, yet they are dealt with in such a wholly objective way as at least to measure, if not indeed largely to remove, the invalidities usually traceable to this source.
Just as the biologist cannot make a certain measurement on all individuals of a given species, so here we cannot determine the effect of our objects on all the community. We need not, however, select so much at random as is usually advisable for the biologist, but we can select those individuals whose judgments are the least likely to vary, that is, those best informed on the subject, just as the biologist would select as assistants those individuals who gave him the smallest variations in measuring the same object. We might also regard the judgment of each grader as a new measurement made with the same instrument. In the absence of constant error, we suppose those measurements the most accurate which vary from each other least. We should find that persons who had never heard of our 10 American authors would grade them almost by pure chance and that persons of limited knowledge in this respect would vary a great deal, but when we come to those who have made a special study of this group there is but little variation, and it is their judgment that we therefore regard as the most valid. As we ascend the scale, constant deviations, mainly of a chronological and geographical nature, are introduced, and this precludes determinations of absolute validity. It is not these that would be of most use, however, but the knowledge of how the series of graded objects has influenced a certain particular group. From this point of view the method is as much a measure of the judges as of the judged.

In these experiments we get a direct measure of the relative extent to which the authors have impressed themselves upon the group which we are studying. In so far as this is a representative group, we get a measure of the extent to which they have influenced the community represented, and a determination based, from this view-point, upon entirely objective facts.
The writer's first experiment by this method along the lines of literary criticism dealt with short compositions by a single author, the arrangements being made by 40 women undergraduates. Ten stories by Edgar Allan Poe were graded in order of preference, the order, positions, and p.e.'s, together with the graphic representation according to the scheme devised by Cattell,* being given below.

Order.                                  Pos.    P.E.
The Fall of the House of Usher          3.6     .26
The Murders in the Rue Morgue           4.0     .35
Ligeia                                  4.1     .22
The Purloined Letter                    4.6     .53
William Wilson                          5.1     .24
The Telltale Heart                      5.8     .30
The Cask of Amontillado                 6.0     .38
Metzengerstein                          6.6     .26
Loss of Breath                          7.1     .30
Le Duc de L'Omelette                    7.7     .32
Average difference in position          .46     Av. .31

* Science, N. S., 24, 658, 699, 732, 1906.

On account of the limited training of the graders the m.v.'s are considerable compared to those to be subsequently discussed. The differences in position are also much smaller. Working by the method of % of like signs it was not possible to discover any correlations in preference, positive or negative, that might not as well be ascribed to pure chance. This seems rather surprising, as one would naturally have expected relative preferences to be the same within types of stories, that is, one who disliked Loss of Breath should also dislike Le Duc de L'Omelette. But such slight relationships as did appear seemed to be rather between stories relatively unrelated by ordinary critical standards, as positive between Loss of Breath and William Wilson, negative between The Purloined Letter and The Cask of Amontillado, etc.

These results appeared to indicate that the standards of literary criticism erected by accepted critical scholarship would bear experimental examination. Aside from the intrinsic interest of determining relative positions in the group tested, it seemed desirable to analyze so far as possible the precise standards upon which such judgments were based.
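The two summary figures in the Poe table can be illustrated with a short sketch. It assumes the conventional definition of the probable error of a mean, 0.6745 times the standard deviation divided by the square root of the number of judgments; the text does not state which formula was actually used, so this is an assumption, as are the function names. The average difference in position is taken here as the mean step between neighbouring positions, which reduces to (last minus first) divided by (number of items minus one).

```python
import math
from statistics import mean, stdev

def position_and_pe(ranks):
    """Average position of one item and an assumed probable error of that average.

    ranks: the places (1 = most preferred) given to the item by each grader.
    The 0.6745 * s / sqrt(n) form is the usual textbook definition; it is an
    assumption here, not a formula quoted from the monograph.
    """
    n = len(ranks)
    return mean(ranks), 0.6745 * stdev(ranks) / math.sqrt(n)

def average_difference(positions):
    """Mean step between neighbouring average positions, lowest to highest."""
    ordered = sorted(positions)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)

# Average positions of the ten Poe stories as printed in the table above.
poe_positions = [3.6, 4.0, 4.1, 4.6, 5.1, 5.8, 6.0, 6.6, 7.1, 7.7]
print(round(average_difference(poe_positions), 2))  # 0.46, as printed
```

Applied to the printed positions, the second function reproduces the .46 given in the table, which suggests, though it does not prove, that this is how the figure was obtained.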
Accordingly the experiment whose results form the raison d'etre of the present study was devised. It is not, however, to be anticipated that the introduction of a scientific method into this field should contribute markedly to the principles of accepted critical procedure; the main function of literary criticism having hitherto been to serve rather as a convenient vehicle for individual expression than for the empirical determination of actual literary relationships.

Ten American imaginative writers were selected for study, these being presented in alphabetical order, Bryant, Cooper, Emerson, Hawthorne, Holmes, Irving, Longfellow, Lowell, Poe, Thoreau. They are presumably all in the first 15 of their class. These were graded first in respect to general literary merit. They were then graded in respect to their possession of ten literary qualities. These, also in alphabetical order, and with the abbreviations by which they will subsequently be designated, were Charm (Ch), Clearness (Cl), Euphony (Eu), Finish (Fi), Force (Fo), Imagination (Im), Originality (Or), Proportion (Pr), Sympathy (Sy), Wholesomeness (Wh). These lists were not determined by any standard method but by a literary critic in ordinary consultation with the writer. The terms are in the main technical terms of literary criticism and there seems to have been no great difficulty about their interpretation. The grading was done at a meeting of the English Graduate Club at Columbia University, the work occupying from 35 minutes to 1 hour. One of the graders was the critic above mentioned, the remainder belonging, with 2 or 3 exceptions, to the graduate student group. There was a remarkably small amount of invalid data, principally confined to such lapses as grading the same author 3rd and then again 7th. The present results are derived from 20 records.
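The kind of lapse mentioned above, the same author being entered in two places, is easy to detect mechanically. The following is a minimal sketch, assuming each record is stored as a list of the ten authors from first place to last; the record layout and the function name are illustrative choices, not anything described in the study.

```python
AUTHORS = ["Bryant", "Cooper", "Emerson", "Hawthorne", "Holmes",
           "Irving", "Longfellow", "Lowell", "Poe", "Thoreau"]

def is_valid_record(order):
    """True when the record places every author exactly once.

    order: list of author names from first place to last (assumed layout).
    """
    return sorted(order) == sorted(AUTHORS)

good = ["Hawthorne", "Poe", "Emerson", "Lowell", "Longfellow",
        "Irving", "Bryant", "Thoreau", "Holmes", "Cooper"]
# Lowell entered 3rd and again 7th, so Emerson and Bryant are crowded out.
bad = ["Hawthorne", "Poe", "Lowell", "Longfellow", "Irving",
       "Thoreau", "Lowell", "Holmes", "Cooper", "Emerson"]

print(is_valid_record(good), is_valid_record(bad))
```

A record failing this check would be set aside before averages and probable errors are computed.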
Of course in so large a number of separate distributions as that under consideration (110), the probable incidence of certain forms by pure chance is not inconsiderable. While in general they approximate the normal distribution as closely as could be expected in the limited number of judgments, yet it may be worth while to call attention, with special reference to species, to some of the more marked deviations from the normal, where the factor of chance, which, of course, is itself always measurable, does not seem to play a prominent part.

This is perhaps the phase of the results most interesting to students of literature. For example, the fact that VII (Bryant) has a distribution of such marked bimodality as to be practically without the range of chance deviation from the normal, is perhaps not without critical interest. It has been suggested that these two groups might have a certain geographical distribution, the relatively higher grades coming from New England and neighboring states. It is now impracticable to verify this supposition, but there is nothing inherently improbable about it, and such theories are, of course, experimentally verifiable. I am rather distrustful, however, of the value of explanation for its own sake and representing a personal opinion. We shall perhaps do well to remember that we know just as good reasons for many things that are not so as for things that are, and when the history of our present thought is written it will probably be found that we have explained to our complete satisfaction quite as many of the former as the latter.

There is little ground for supposing different species in the remainder of the general merit grades except perhaps in the case of VIII (Thoreau), whose grades fall with almost equal frequency among the last 5 positions.
The three most markedly bimodal distributions in the quality grades are those of II (Poe) for Charm, and I (Hawthorne) for Clearness and Sympathy. In 17 cases the same author receives grades in first and last place, though in only 2 cases is there a grade in every place, namely, in IV (Lowell) for Sympathy and X (Cooper) for Clearness. The most variable distribution is that of III (Emerson) for Proportion with a p.e. of .61, and the least variable are those of II for Imagination and Originality with p.e.'s of .11 each. There are naturally many distributions that on their face are bimodal, but the probability of their occurrence by pure chance is too great to warrant their acceptance as evidences of species in the judgments. On the whole, the opinions seem to concentrate about a common centre rather than to form groups.

If the distributions were governed by pure chance, they would always approximate to 2 grades in each place. As the frequencies are not governed by pure chance, but presumably by the probability distribution about a mode, we can roughly determine to what extent the variability we obtain is a true variability for this class of judgments. For example, in the 40 judgments of Poe's stories, it was found that the results from 20 random selections differed but little from the results of the 40. There would thus be reason to believe that the variability found in the 40 judgments was representative of the amount of variability that we might expect to find in dealing with judgments of this sort. It has been suggested that in this method at least, the reliability increases much more slowly than as the square root of the number of cases, and may be more accurately represented by the mean variation itself.

If the factor of memory might only be overcome, it would be well worth while to compare with the variability of many individuals the variability of a single individual from the average of his own judgments.
This was done by Cattell for a considerable number of psychologists. We should then have a measure of constancy in judgment that would have a not uninteresting psychological bearing. A single judgment is subject not only to error from the average judgment of other individuals, but from the average judgment of the individual himself. Large and small m.v.'s may be the product of variations along either of these lines. We are all probably very much surer of our relative preferences for lobster Newberg and fried oysters than of our preferences for Emerson and Hawthorne; yet these very differences in taste might produce as large an m.v. in one case as in the other.

For some purposes of analysis the median has seemed a better measure than the average. It was somewhat discredited in the results of Cattell, but is of more value here on account of the larger number of measures. The average is here also relatively less valid because the number of possible positions is limited to ten, whereas it was there in the negative direction practically unlimited. In the present results there is almost no distribution in which the author does not receive a grade in either first or last place, and when the grades are banked up against first or last place, the average is obviously too low or too high, probably more so than the median. However, it is of no particular consequence which we use so far as order is concerned, for the two orders are almost identical, the divergences that occur being well within the limits of chance variation.

The accompanying tables give the main results of the experiment in the median and average order and position of the authors in general merit and the qualities.

In general merit the writers fall into three groups, separated by considerable distances, three at the top, three in the middle, and four at the bottom. Between the three at the top there is little difference to speak of, between I and II practically none at all.
The median of II is considerably higher than that of I, and it is very possible that his true position is higher than I. Such constant error as might result from prejudice would perhaps operate more against II. Each has six grades in first place, and none in last. It is quite anomalous that the differences should be greater in the middle group than at the ends; although the p.e.'s are not of the smallest they fail to overlap at all; the chances are over 16 to 1 that the order given is correct. The narrow mathematical limits of variability might account in a measure for the small p.e.'s at the ends, and perhaps also for the small differences in position, which are equally striking; but only in a small measure, for this condition does not obtain in the quality grades, nor in other relative position work that has been done with even smaller series than 10. Between the positions of VI and VII is another long step, 1.4 between positions, .8 between limits of p.e.'s, and VII again fails to overlap the p.e. of VIII. From here until X's position at 8.4 the steps are about equal.

[Tables: MEDIANS (G.M. and the ten qualities for authors I to X, with the median of medians); MEDIAN ORDERS (positions displaced from the average order given in italics); AVERAGES (the same columns, with the average of averages and the average difference in position, A.D.P.). The entries of these three tables are illegible in the scan, except for the general merit column of the averages. Authors: I Hawthorne, II Poe, III Emerson, IV Lowell, V Longfellow, VI Irving, VII Bryant, VIII Thoreau, IX Holmes, X Cooper. Average positions in general merit, I to X: 2.5, 2.6, 2.9, 4.4, 5.1, 5.7, 7.1, 7.9, 8.1, 8.4.]

PROBABLE ERRORS.

        G.M.  Ch.   Cl.   Eu.   Fi.   Fo.   Im.   Or.   Pr.   Sy.   Wh.   Av.
I.      .21   .37   .56   .29   .33   .29   .17   .23   .31   .33   .37   .32
II.     .25   .47   .41   .19   .21   .41   .11   .11   .37   .31   .15   .27
III.    .37   .31   .48   .33   .38   .17   .48   .43   .61   .51   .52   .42
IV.     .35   .31   .56   .27   .41   .37   .31   .31   .52   .39   .27   .37
V.      .25   .45   .29   .35   .27   .31   .43   .35   .31   .38   .35   .35
VI.     .31   .35   .29   .43   .31   .25   .35   .33   .33   .31   .41   .33
VII.    .35   .29   .31   .35   .35   .31   .33   .29   .33   .47   .43   .34
VIII.   .37   .38   .38   .47   .31   .38   .33   .45   .33   .35   .48   .38
IX.     .21   .45   .38   .27   .27   .31   .27   .33   .29   .37   .37   .33
X.      .33   .23   .47   .17   .23   .48   .45   .39   .25   .19   .41   .32
Av.     .30   .36   .41   .31   .31   .33   .32   .32   .36   .36   .38   .343

It is thus seen that we have no man who is so distinctly at the head of American writers as one is found among contemporary Astronomers, Psychologists and Pathologists. It is perhaps a fair inference that enlargement of a group may decrease differences at the top by bringing more of the leaders into conflict.
There is no doubt that a certain department of American letters could have been found in which III would have reigned supreme, and the differences between I and II could have been much increased, in either direction, by narrowing the field of literary work to be considered. It is beyond dispute that there would be more disagreement about the order and less about the identity of the five greatest poets of the world than the five greatest poets of France. Such a condition is probably to be expected in all walks of life. There is a limit to the realization of human powers fixed by opportunity and other environmental factors. "Es wird dafür gesorgt," says the German proverb, "dass die Bäume nicht in den Himmel wachsen." [Care is taken that the trees do not grow into the sky.] If we artificially limited to 140 ft. the height of a tree ordinarily growing to 150 ft., we should find more trees at 140 than at 135. It acts in the same way as any other limitation of a normal distribution, crowding the extreme cases together. This is probably a reasonable alternative to the supposition of genius as a separate group.

Though the peculiar conditions noted above do not generally obtain in the qualities, these present certain other points of interest. In Charm there is a group slightly above the middle position, the increases and decreases from which show nothing anomalous. The [...]

[Table: the authors arranged in order of average position in general merit and in each quality, with positions and p.e.'s. Only the panels below are legible in the scan; the positions for Ch. and Cl. and the whole of the Fi., Fo., Im. and Or. panels are not.]

        G.M.                     Eu.
Ord.    Pos.   P.E.      Ord.    Pos.   P.E.
I.      2.5    .21       II.     1.7    .19
II.     2.6    .25       I.      3.4    .29
III.    2.9    .37       V.      3.7    .35
IV.     4.4    .35       IV.     4.1    .27
V.      5.1    .25       VI.     4.8    .43
VI.     5.7    .31       VII.    5.6    .35
VII.    7.1    .35       VIII.   7.2    .47
VIII.   7.9    .37       III.    7.6    .33
IX.     8.1    .21       IX.     7.8    .27
X.      8.4    .33       X.      9.1    .17

Ch.  Order: VI, I, II, IV, IX, V, VIII, III, VII, X.  P.E.: .35, .37, .47, .31, .45, .45, .38, .31, .29, .23.
Cl.  Order: VI, V, X, I, IX, VII, IV, II, VIII, III.  P.E.: .29, .29, .47, .56, .38, .31, .56, .41, .38, .48.

[...] perhaps with Proportion.
It may also contribute to the high position of Finish, but it is difficult to see how it could have been avoided.

The results of the calculations by the various methods to be described are given in the accompanying table.

        A. Ind. Dis.          B. Med. Dis.    C. Rel. to Med.   D. Med. %       E. Size of
                                              of Med. and       like signs.     p.e.'s.
                                              Med. of G.M.
        Ord.    Pos.   P.E.   Ord.    Pos.    Ord.    Pos.      Ord.    Pos.    Ord.    Pos.
Fi.     I.      12.2    .7    II.      6      I.       7        I.       8      I.      .307
Eu.     II.     12.7    .7    I.       7      II.      7        II.      8      II.     .312
Or.     III.    13.4    .6    IV.     10      III.     7        V.       8      IV.     .320
Im.     IV.     14.?    .9    V.      11      IV.      6        VI.      8      III.    .322
Pr.     V.      14.6    .8    III.    13      V.       6        IX.      7      VI.     .328
Fo.     VI.     14.7    .8    VI.     14      VIII.    6        III.     6      VII.    .363
Ch.     VII.    15.5    .8    VII.    17      VI.      4        IV.      6      V.      .365
Sy.     VIII.   19.6    .8    IX.     18      VII.     4        VII.     6      VIII.   .369
Wh.     IX.     20.9    .8    VIII.   19      IX.      2        VIII.    5      IX.     .376
Cl.     X.      22.7   1.0    X.      20      X.       1        X.       4      X.      .413

[The quality names at the left apply to column A. In columns B to E the numeral under Ord. appears to designate the quality by its rank in column A; thus under B, I (Fi.) stands second with 7 displacements and IX (Wh.) eighth with 18.]

Thus the average number of displacements per individual is 12.7 in Eu., 15.5 in Ch., etc., but the number of displacements for the median grades of the group is 7 for Fi., 18 for Wh., etc. The general average of the individual displacements is 16.1 with an m.v. of 5.1, the distribution of the entire 200 series of displacements being as follows:

Displacements:   4   6   8  10  12  14  16  18  20  22  24  26  28  30  38
Cases:           7   7  17  21  19  30  20  25  14  14   7   9   5   4   1

The distribution is again skewed to the small end, like that of the p.e.'s, and probably for the same reason, i.e., limit of individual accordance.

A rough determination of the standards by which our 20 graders judged as a group may be rapidly arrived at by simply making a table in which a + sign is attached to every case in which the quality grade of an author is on the same side of the median of the grades in that quality as the author's grade in general merit is of the median of general merit. A − sign means that the quality grade and the grade in general merit are on different sides of their respective medians. Thus I in general merit is also high in Charm, and for this quality receives a + sign.
But he is low in Wholesomeness, and in this receives a − sign. Then the quality in which the greatest number of + signs is found is that quality in which an author oftenest stands in a position analogous to his place in general merit. As will be seen, high and low positions in general merit have usually gone with high and low positions in Euphony, Finish, and Imagination, but only once has this been the case in Clearness (Table, col. C).

Correlations by % of like signs were applied, but the results were very inferior to those obtained by the other methods, as shown in column D. It shows just enough agreement to demonstrate its inexactness. While well adapted for certain sorts of work and the only method for cursory observation of individual relationships, it does not seem to operate satisfactorily in the correlation of orders.

It would be difficult, however, to find a correlation method more admirably adapted to all relative position work than the measure of displacements devised by Professor Woodworth. In any order of 10 positions, such as we have here, to produce an exactly reverse order (i.e., correlation −100% Pearson) would require 45 displacements. X being above 9 that he should be below gives 9 displacements, IX above 8 that he should be below gives 8, etc., total 45. Orders that had no reference to the standard would center about 22 and 23 displacements, while the fewer the displacements the higher the positive correlation. For comparative purposes the displacements may be expressed in percentile relation.

There have been determined by this method the number of displacements from the order of general merit given by the order in each of the qualities (see Table, col. B). This is a rapid means of reaching a generally reliable conclusion, and is much more exact than that afforded by the relation of the individual positions to the general median. It is as yet impracticable, however, to assign a workable p.e.
in such determinations and for this purpose I undertook the calculation of the displacements of each quality as given by each individual grader from the order of general merit as given by that individual. The order of correspondence thus obtained has been taken as the standard (col. A), as it seems to possess a measurable and not inconsiderable degree of validity. According to the graphic representation the positions and p.e.'s are as follows:

[Figure: graphic representation of the positions and p.e.'s of the ten qualities in order of correspondence with general merit.]

The p.e.'s of the average displacements are larger, yet the differences are usually distinct within two places. The steps are about equal for the first seven qualities, and then we find a considerable gap to the last three, whose p.e.'s are larger, as those at the top are smaller. Some traces of this gap are discernible in the results by the cruder methods. Indeed not the least reason for confidence in these orders is the correspondence they maintain. The B and C orders are practically the same while the very coarsely determined D order keeps well on the positive side. The sum of these orders is practically that given by the standard.

The above orders are all measures of the same general thing, between which, provided they were valid in principle, a certain correspondence would be mathematically necessary. A still closer correspondence, however, is found with an order mathematically by no means so well associated with the degree of correspondence, namely, the size of the p.e.'s discussed on p. 16, and whose table is reproduced in col. E. It will be noted that the order of relative importance of the qualities corresponds to the order in size of the average p.e. with but three displacements. A certain amount of this must indeed be ascribed to happy chance, for the differences in the p.e.'s are often infinitesimal, and were there actually perfect correspondence the present methods would be far too coarse to detect it surely.
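The measure of displacements described above can be read as a count of the pairs of items that stand in opposite relative order in two arrangements. The sketch below follows that reading, which matches the stated figures (45 for an exactly reversed order of 10, and about 22 or 23 for unrelated orders). The function names are illustrative, and the percentage form is only one plausible interpretation of "percentile relation".

```python
from itertools import combinations

def displacements(order_a, order_b):
    """Number of pairs placed in opposite relative order by the two arrangements."""
    place_in_b = {item: i for i, item in enumerate(order_b)}
    # combinations() yields each pair (x, y) with x standing before y in order_a.
    return sum(1 for x, y in combinations(order_a, 2)
               if place_in_b[x] > place_in_b[y])

def percent_of_maximum(count, n_items):
    """Displacements as a percentage of the n(n-1)/2 needed for a full reversal."""
    return 100.0 * count / (n_items * (n_items - 1) // 2)

standard = list(range(1, 11))
print(displacements(standard, standard[::-1]))   # 45: exactly reverse order
print(round(percent_of_maximum(28, 10), 1))      # 62.2: 28 displacements out of 45
```

On this reading, a count near half the maximum (22.5 for ten items) indicates no relation between the orders, and counts below or above it indicate positive or negative correspondence.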
So far as the results go, the qualities that we tend to judge an author by are also those that we tend to grade with the greater accuracy. It is perhaps not unnatural that the traits about which we have the most assurance should also be those that we regard as the most important. The close correspondence of the two may itself be in the nature of an argument for their validity.

The method measures directly an author's possession of a quality with reference to other authors. Indirectly an idea may be obtained of the prominence or absence of a quality relative to the other qualities of his own work. Aside from such errors as would be due to differences in the ranges, etc., he is likely to have more of a quality in which his position is higher than of one in which his position is lower. Thus I, who has a median of 2.1 in Imagination, but one of 6.9 in Wholesomeness, is probably more imaginative than he is wholesome. A table may be constructed in which a plus sign is given to those quality grades which are at the same time both above the author's median of medians and the general median of the grades in that quality, this last always falling somewhere in the neighborhood of 5.5. Minus is assigned to those grades which fall at the same time below the author's median of medians and the general median of the quality, and a zero sign goes to those which fall between the two. Other things being equal, a + sign then goes to the qualities that are relatively prominent, a − sign to those that are absent, and zero to those which are inconspicuous one way or the other. Such a table contains 35 + signs, 27 − signs, and 38 zero signs. The figure, however, has little significance save when it refers to a prominent quality in a low author or a lacking quality in a high one. The following are in order the two highest and the two lowest quality grades received by each author; i.e., the two qualities for which his work is presumably the most and the least distinguished.

        Most.     Least.              Most.     Least.
I.      Fi  Im    Cl  Wh      VI.     Ch  Cl    Or  Fo
II.     Im  Eu    Sy  Wh      VII.    Wh  Eu    Or  Ch
III.    Fo  Wh    Pr  Cl      VIII.   Or  Fo    Fi  Pr
IV.     Eu  Pr    Im  Or      IX.     Ch  Cl    Fo  Im
V.      Sy  Wh    Fo  Or      X.      Im  Cl    Eu  Fi

III. ON THE VALIDITY OF INDIVIDUAL JUDGMENT AS MEASURED BY DEPARTURE FROM AN AVERAGE

If we took a series of graduated weights, and asked a number of persons to serially arrange them in order of their apparent heaviness, we should find, if the differences between the weights were sufficiently small, that no one could save by chance arrange them in correct order, but that there would always be more or less displacement. The person whose arrangement showed the least displacement would approximate closest to the true order, and we should therefore consider him to have the most accurate judgment for weight. Now assuming that the distribution of all the errors made followed that of the probability curve, we should find that the errors compensated and that the average order in which the weights were placed would also be very close to the correct order, closer probably than that of the best individual, though the average number of displacements might be considerable.

In estimating the accuracy of our subjects' judgments of weight, it would make little or no difference whether we took as the true order the actual order of heaviness as measured on the scales, or took the average order as the standard. Theoretically, each would give us the same result. But there are many important qualities, and indeed those most adaptable to measurement by relative position, whose differences we cannot determine in this objective way. The question then arises, are we also here justified in taking the truth of the average order as objective, and measuring the value of a judgment according to its deviation from it?
For clearly unless our average approximates to some objective validity, the absolute value of a single judgment is not measured by the amount of its deviation from it. To recur to our weights, suppose we heated and cooled the weights to varying degrees before presenting them to all save one of our subjects, and to him presented them at equal temperatures. The subjects would all feel the colder weights as heavier, and the average order would not be the objectively true one, and the order of the subject perceiving the weights under equal conditions might well be the farthest from the average. Our two groups would give us different results because they were judging from different standards.

It is just this condition that must be guarded against in those measurements where an average order is all that we have to guide us. We have, a priori, no objective measure of the varying standards by which the individuals judge. Still less do we know the relative values of the standards themselves. In the case of the weights we know the differing nature of the standards, and can allow for them; but if we did not know them the judgment of the single subject would still be the most useful for us. Practise will overcome many illusory standards of judgment to which normal persons are subject, and I should hardly have the right to assert my judgment of direction to be superior to that of Professor Judd because I was nearer the average than he in amount of subjection to the Zollner illusion.

In the measurement of mental traits by relative position we have thus two factors that tend to cause individual deviation from the average, namely the absolute inaccuracy of the judgment, the direction of whose errors will be variable, and a differing standard from other members of the group, the direction of whose errors will be constant, at least throughout the individual.
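The weight illustration can be tried out in a small simulation. This is only a sketch under stated assumptions: ten weights of 1 to 10 units, a felt heaviness equal to the true weight plus a normally distributed variable error, and an optional constant bias per weight standing in for the effect of temperature. None of these figures or names come from the text.

```python
import random
from itertools import combinations

def displacements(order_a, order_b):
    """Number of pairs placed in opposite relative order by the two arrangements."""
    place_in_b = {item: i for i, item in enumerate(order_b)}
    return sum(1 for x, y in combinations(order_a, 2)
               if place_in_b[x] > place_in_b[y])

def judged_order(weights, noise, bias, rng):
    """Arrange weights by felt heaviness: true weight + constant bias + variable error."""
    felt = {w: w + bias.get(w, 0.0) + rng.gauss(0.0, noise) for w in weights}
    return sorted(weights, key=lambda w: felt[w])

def average_order(orders):
    """Order of the items by their mean place over all judges."""
    items = sorted(orders[0])
    mean_place = {w: sum(o.index(w) for o in orders) / len(orders) for w in items}
    return sorted(items, key=lambda w: mean_place[w])

def simulate(n_judges=20, noise=1.5, bias=None, seed=1):
    """Return (displacements of each judge, displacements of the average order)."""
    rng = random.Random(seed)
    true_order = list(range(1, 11))
    orders = [judged_order(true_order, noise, bias or {}, rng)
              for _ in range(n_judges)]
    per_judge = [displacements(true_order, o) for o in orders]
    return per_judge, displacements(true_order, average_order(orders))

per_judge, pooled = simulate()
print(min(per_judge), sum(per_judge) / len(per_judge), pooled)
```

With variable error only, the pooled figure can be compared with the best and the average individual. Passing a `bias` (for example `{1: 9.5}`, making the lightest weight feel heaviest to every judge) shows that a constant error shared by the whole group is carried into the average order unchanged, however many judges are pooled.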
We must know the exact nature of the deviations due to these two causes before we can estimate the values of the judgments. We must also know the value of the standards, for it is possible that the opinion of a very accurate judge by one set of standards might be of smaller value than that of a less accurate judge by another. We must show cause why a person who judges literary work by its clearness must have ipso facto a poorer judgment than one who judges it by its imagination.

It is possible that in the estimation of scientific merit, where this method found its first application, there would be more unanimity in the standards of judgment, yet there are some divergences from this cause, since there was an observed tendency for graders to give disproportionately high position to men engaged in the same special work with them and to their own immediate colleagues. The method has here been applied only to the first fifty psychologists, but it gave fairly definite results, and these might be still more definite in others of the sciences. Save for observer A the order is rather variable, and it might be questioned whether a man's estimate of the fifth group should be allowed the same weight with his estimate of the first. This is also a matter subject to a good deal of variation, for the second best judge of the first ten psychologists is the worst of the second, the fifth of the third, the eighth of the fourth, and the sixth of the fifth.

However, where the variations in the standards compensate, as they ought to do in scientific merit, the method is immeasurably more valid than where they not only patently fail to do so but give a false standard, as in literary merit.
The conditions are exactly the same as with the varying sizes and temperatures of the weights. Our group of weight-graders constantly gives a small or cold object an undue weight; the group of scientific graders constantly assigns high position to their immediate colleagues and co-workers; the group of literary graders constantly allows a presumably undue weight to Euphony and Finish. The variation in the accordance of the judges is a little over 2:1, as was the case in Cattell's psychologists; the accordance of the judgments also tends to follow the normal distribution, though there seems to be a slight skew in favor of the more accordant judgments.

It should not be impossible to get a quantitative demonstration of these differing standards. When we have a series of objects graded in respect to a general quality, and then in regard to the main elements of that quality, the relative influence of the elements on the general judgment appears in their degree of correspondence to the general quality. Now while the graders showed a certain unanimity in assigning to various elements of literary merit a certain order of influence, it does not follow that the mature judgment of eminent literary critics would give the same order, or that the graders themselves would give it twenty years hence. Still less does it follow that this standard is the best one for us to abide by, or that it is one which the graders themselves would not be among the first to consciously repudiate. If we had the qualities directly graded in order of value to literary merit, we should hardly expect to find Euphony and Finish first, Clearness and Wholesomeness last. Nor do we. Such a judgment was obtained from a group of 24 graduates in psychology and education, of about the same intellectual level as those who furnished the literary grades.
I see no reason a priori, and there is certainly none evident in the results, why the conscious judgment of this group should not have the same ethical value as that of the literary graders, or why the terms should not have been equally well understood. The group contained a certain proportion of women, about one-third, but this factor did not appear to influence the character of the judgments. The formula by which the qualities were graded was "according to their importance to the fulfilment of the highest function of literature." No definitions of any of the qualities were given, nor does it appear that it would have been advantageous to have given them. This order of importance, with positions and p.e.'s, is shown in the accompanying table (cols. T.C.). This table, compared with that on p. 22, gives an idea of what we think we judge literary merit by as contrasted with what we actually judge it by. The number of displacements between the two orders is 28, slightly more than we should expect by pure chance. Such correspondence as there is between our naive and conscious standards is thus slightly in the direction of perversity. It is probably something more than an amusing coincidence that that quality which we are so sure we ought to judge an author by most of all is the one which really plays the least part in our estimate of him, and that the two qualities which ought to have the least share in determining an author's position are those which always show the most remarkable correspondence with it.

                    Av.                p.e.
                T.C.     E.G.      T.C.     E.G.
    Cl.         2.7      4.1       .28      .43
    Fo.         3.7      4.7       .26      .41
    Or.         3.7      3.[?]     .37      .31
    Im.         4.3      2.4       .35      .34
    Wh.         5.5      6.9       .49      .6[?]
    Pr.         6.1      6.3       .25      .55
    Ch.         6.1      5.8       .33      .55
    Sy.         6.3      5.7       .31      .46
    Fi.         8.4      7.3       .32      .21
    Eu.         [8].5    9.1       .18      .26
    A.D.P.      .[?]5    .75
    Av.                            .3[?]    .39

The distributions of these grades are unimodal for the most part, and only in Wholesomeness do we find distinct species of high and low grades. It has much the largest p.e.
and is the only quality receiving a grade in every place. The species were examined for sex correlations, but none were apparent.

Before the method for the determination of individual standards had been applied, the literary graders had been made aware, through one of the cruder methods, of the general relations of the qualities. It was therefore impossible to obtain from them any order not subject to large constant error. Nevertheless, it seemed worth while to obtain a few records from this group. Records were obtained from 14 individuals, of whom 12 had taken part in the previous test. The results are given in the last quoted table, cols. E.G. The order and positions here assigned also differ from the objectively determined order by slightly more than the chance number of displacements, but while the number of displacements is almost identical with that of the order given by the other group, there are 11 displacements between the two groups themselves, and in a few cases these discrepancies are outside the limits of the p.e. This may well be due to the constant error mentioned above, and I do not consider that there is sufficient warrant for supposing separate species.

An interesting aspect of these results is afforded from the view-point of individual comparisons. The number of displacements that occur between the order of the authors in general merit and their order as assigned in the various qualities by a single individual, gives an idea of that individual's actual standards of judgment. The qualities that vary least from the general merit order are his most important standards. In the grading of the qualities themselves we have the conscious standards by which the individual thinks he judges. The orders assigned to the qualities naively and consciously are strikingly divergent. The average number of displacements is about 20, a little less than the chance number; it occurs as high as 34, and as low as 8.
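The displacement counts quoted in this section can be illustrated with a short sketch. It assumes that a "displacement" means a pair of items standing in opposite relative order in the two lists; this section does not restate the definition, so that reading is an assumption, as are the function names. The two orders are read off the Av. columns of the table of qualities (cols. T.C. and E.G.), keeping the table's listing order where two T.C. positions are tied; since the printed table is partly illegible, that reading is also an assumption.

```python
from itertools import combinations


def displacements(order_a, order_b):
    """Count pairs of items that stand in opposite relative order in the two lists.

    This pairwise definition is an assumption of the sketch.
    """
    place_b = {item: i for i, item in enumerate(order_b)}
    return sum(1 for x, y in combinations(order_a, 2) if place_b[x] > place_b[y])


def chance_displacements(n):
    """Expected count when one order is a random shuffle of the other."""
    return n * (n - 1) / 4


def max_displacements(n):
    """Count reached when one order is the exact reverse of the other."""
    return n * (n - 1) // 2


# Orders of the ten qualities as read from the Av. columns of the table
# (T.C. ties kept in the table's listing order; the reading is an assumption).
TC_ORDER = ["Cl", "Fo", "Or", "Im", "Wh", "Pr", "Ch", "Sy", "Fi", "Eu"]
EG_ORDER = ["Im", "Or", "Cl", "Fo", "Sy", "Ch", "Pr", "Wh", "Fi", "Eu"]

between_groups = displacements(TC_ORDER, EG_ORDER)
print(between_groups)                 # 11
print(chance_displacements(10))       # 22.5
print(max_displacements(10))          # 45
```

Under this definition ten qualities give a chance figure of 22.5 and a maximum of 45, so an individual count of 34 lies well toward reversal and a count of 8 toward agreement. The count of 11 between the two group orders agrees with the figure given in the text.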
Where the number is as high as 34, the individual's conscious standards are almost the reverse of his naive standards. We might call such a figure a "coefficient of consistency." The relative smallness of the p.e.'s of the averages assigned by the Teachers' College Group is due wholly to the larger number of graders; the p.e. of the individual judgment, as measured by the m.v., is practically the same in each group. It is interesting to observe that the special training of the literary graders has neither varied the standards to any noteworthy degree, nor given them greater assurance.

There are many complications into which it is not possible to enter deeply. Thus a certain irreducible minimum of Clearness might be most desirable, but once this irreducible minimum were assumed, an analogous degree of Charm might be more important. It must also be remembered that the standards quoted in the table on p. 22 are standards for the criticism of imaginative writers, while the qualities are here graded according to their importance to the fulfilment of the highest function of literature. If we had graded a group of historians, we should probably have found less real judging by Euphony and Finish, and more by Clearness and Force. The standards of judgment for imaginative writing may not be the highest literary standards; perhaps there are other departments of literature which are held to higher standards. But this interpretation is of very doubtful value, since literature, technically considered, is imaginative by definition.

Now the best judge is not the man who judges most true to ordinary standards, but the man who judges most true to the best standards. To discuss what these best standards might be would lead at once into devious ethical pathways; let us call them for the moment the most useful ones.
It is probably fair to assume that the maturer, more experienced and distinguished of a group of graders, selected by universal experience for the very abilities which they are here exercising, should, at least in this particular respect, have a better judgment than the remainder of the graders. By this same token, they should also have different standards of judgment, and this would tend to draw them away from the average, but should not, therefore, be held to discount the value of their opinions. After all, the function of a method of this sort is not to tell us what we could not possibly find out in any other way, but rather to determine quickly what in less organized experience might require many years. Its data must not run too contrary with those of our every-day experience; even the method of measurement by relative position would itself hardly survive the shock of Aristotle's appearing in the lower half of the world's philosophers. The data of relative critical ability obtained by this method show little accordance with the results of our partially organized experience. It is also true that there is apparent in the results no correlation between accordance of judgment to the average and approximation of individual standards to it; however, when the new factors that would here come into play are considered, it will easily be seen that the present data are much too coarse for such refinements. But the order of critical ability given by the method of direct accordance is quite too far from that of the best experience. Nor does the best judgment for literary merit correspond at all to the best judgment for the various qualities. The worst judge of general literary merit, according to his divergences, is the 3rd best judge of Charm, the best judge of Clearness, and the 13th best of Euphony. The best judge of general merit is the 5th best of Charm, the 14th of Clearness, and the 17th of Euphony.
All that is really given in the individual deviations from the average judgment is the individual who tells us most about the group, or the most accurate judge for a certain set of standards, which, at least in the case of these literary judgments, every one will probably admit to have a rather low ethical value.

We can hardly draw inferences as to the general capacity for sound judgment as measured by the soundness of judgment for any particular class of objects. We must have the information as well as the ability to weight it. It might be that the best judge of the psychologists was he who had the best proportioned knowledge of the work done in the various fields. Judgment may be wholly a matter of information if we make this term synonymous with experience. Obviously then, the fact that one has a good judgment for psychologists tells us very little about the value of his opinion in other fields. To demonstrate the very existence of an abstract power of judgment is ultimately synonymous with the problem of free will. Fortunately it is not in this abstract power of judgment that we need be in the least interested, but rather in the quality of one's judgment for a particular class of objects. We wish to know whether a person is a good judge of distance, of faces, of a mining prospect. To determine this we must pay careful attention to the weighting of the standards of judgment.