THE MEASUREMENT OF INTELLIGENCE

By EDWARD L. THORNDIKE, E. O. BREGMAN, M. V. COBB, ELLA WOODYARD, and the Staff of the Division of Psychology of the Institute of Educational Research of Teachers College, Columbia University

The investigations and results reported in this volume were made possible by a grant from the Carnegie Corporation

Bureau of Publications
TEACHERS COLLEGE, COLUMBIA UNIVERSITY
NEW YORK

PREFACE

This volume represents the fruits of three years of investigation (from July 1, 1922, to July 1, 1925) by the Division of Psychology of the Institute of Educational Research. It attempts to answer the essential questions concerning the nature and meaning of the measurement of a mental fact in the sample case of intelligence, or rather of a defined segment thereof. Its conclusions, in so far as they are warranted, should become the basis of sound practice in the construction and calibration of scales for use in mental measurement. According to them, the present theory and practice of measurement of mental abilities are justified to a remarkable degree in certain respects, but in others should be almost recreated. Some of the most important of these conclusions were reached only in the last six months of the inquiry and are consequently presented with less adequate evidential support than is desirable.
The concept of area of intellect in particular needs more experimentation to make it clear, and still more to demonstrate its soundness and worth.

We had intended to add a long chapter reviewing the literature on the topics dealt with in this volume, but it seemed more important to exemplify and apply the results of our conclusions in a concrete series of tasks selected and scaled according to the principles described; and there was not time to do both. We hope to be able to publish such a review later, and in particular to do justice to the notable contribution of Kelley (’23a), which deserves most careful study by everyone who is concerned with the general logic of mental measurements.

We had intended also to include full treatment of the method of obtaining a group of approximately known forms of distribution in respect of a mental trait measured in truly equal units, by taking the members of an array in that trait who have identical scores in a second trait correlated with the trait in question. This method was abandoned in favor of a better one, but nearly a third of our time and effort was spent in exploring its possibilities. The results should be made known, both because of their intrinsic interest, and because otherwise someone will surely be tempted to do again what has already been done by us. The material is, however, highly technical and elaborate; and it seemed best not to include it in this volume.

The general responsibility for the work rests upon the senior author, who planned and directed the various inquiries, organized the results, and wrote this book, with the exception of Appendix III.
It would, however, have been utterly impossible for him to have carried the work through without the financial assistance of the Carnegie Corporation and the Trustees of Teachers College, and without the loyal cooperation of the staff of the Division of Psychology of the Institute of Educational Research, and many scientific workers in all parts of the country. Dr. Bregman collected and organized most of the facts which are used in Chapter VII and Appendix III, and some of those used in Chapter VIII. Miss Cobb devised many of the tasks of levels A, B, C, D, E, and F, and, with the aid of Dr. Murdoch, Dr. Tilton and Miss Robinson, measured 180 imbeciles of mental age 3 to 5 and 100 of mental age 6. Dr. Woodyard has arranged and supervised most of the testing and scoring in grades 4 to 9, and has shared in the evaluation of the difficulty of the thousands of tasks which have been used in our experiments. Dr. Murdoch made all the tests with the fifty feeble-minded at Polk. Mrs. Miner has computed most of the correlations. Miss Robinson, Dr. Hunsicker, Dr. Tilton, and Mr. Upshall have given expert and painstaking service in testing and scoring. Dr. Toops and Mrs. Ruger worked up the data which provided the first set of tasks graded in difficulty from which the final scale eventually developed. Miss Hanson, Mrs. Work and Miss Wilcox have had a large share in the arrangement and tabulation of the results.

We are indebted, for most courteous and efficient cooperation, to all the psychologists on the staff of Teachers College, to fifty members of the American Psychological Association who made various ratings for us, to Dr. Raymond Franzen and Dr. Grace A. Taylor who supplied valuable records, to Miss Elizabeth K. Farrell, Inspector of Ungraded Classes, New York City, Mr. George Melcher, Assistant Superintendent of Schools, Kansas City, Missouri, Dr. E. H. Nifernecker, Director of the Bureau of Educational Research of New York City, Dr. Howard W.
Potter, Clinical Director of Letchworth Village, Dr. Louise M. Poull, Psychologist at the Randall’s Island Institution, Mr. Lionel J. Simmons, Superintendent of the Hebrew Orphan Asylum of New York City, and to the many principals and teachers who have facilitated our experimentation.

CONTENTS

CHAPTER I. The Present Status.
  Ambiguity in content
  Arbitrariness of units
  Ambiguity in significance
  Measurements of intelligence are measures of intellectual products
  Measurements of intelligence imply valuation
  Truth
  Development with age
  Ability to learn
  Other attempted simplifications of the process of valuation
  Relational thinking
  The content or data of tests of intellect
  The form of tests of intellect
  Scoring the products of intellect
  Further facts concerning difficulty
  Width or extent or range
  Speed
  The relative importance of altitude, extent, and speed of intellect

CHAPTER II. The Measurement of Difficulty.
  The present status
  Measurement of differences in difficulty by way of knowledge of the form of distribution of the variations of an individual in level of intellect
  The relation of the variability of an individual to his amount of ability
  Measurement by way of the form of distribution of intellect in some defined group
  Measurement by way of the form of distribution of an array in a correlation table
  The defects of the measurements so far described

CHAPTER III. The Measurement of the Intellectual Difficulty of Tasks and of Level of Intellect: More Rigorous and Exact Methods.
  Intellectual difficulty
  Intellect CAVD
  The relation of intellect CAVD to the abilities measured by ordinary intelligence examinations
  The homogeneity of difficulty CAVD
  The inference from the form of distribution of a grade population in Standard Intelligence Examinations scores to the form of its distribution in level or altitude of intellect CAVD

CHAPTER IV. The Measurement of the Intellectual Difficulty of a Single Brief Task.
  The problem in the case of single tasks, each of which measures intellect plus a mere sampling error
  The problem in the case of such single tasks as are used in CAVD or in Standard Intelligence Examinations
  The solution by the use of extensive composite tasks
  The correlations of single tasks with measures of intellect
  Summary

CHAPTER V. The Measurement of the Intellectual Difficulty of Tasks by a Consensus of Expert Opinion.
  The Experiments
  The Ratings
  The Validity of the Consensus
  Summary

CHAPTER VI. Levels of Intellect.
  Composite tasks
  The construction of composite tasks
  10-composites in word knowledge or “V”
  The construction of 10-composite tasks in sentence completion, arithmetical problems, and the understanding of sentences and paragraphs
  The difficulty of the 10-composites
  The combination of 10-composites into 40-composites

CHAPTER VII. The Transformation of the Scores of Standard Intelligence Examinations into Terms of Scales with Equal Units.
  The method of transformation, illustrated by the Thorndike Examination and Army Alpha
  The National Intelligence Examination A
  The Otis Advanced Examination
  The Haggerty Examination, Delta 2
  The Terman Group Test
  The Myers Mental Measure
  The Pintner Non-Language Test
  The I.E.R. Test of Selective and Relational Thinking, Generalization and Organization
  The Brown University Psychological Examination
  The Army Examination a

CHAPTER VIII. The Form of Distribution of Intellect in Man.
  General considerations
  The evidence
  The form of distribution at ages up to fifteen
  The form of distribution in adults

CHAPTER IX. A Scale for Measuring Altitude of Intellect.
  The difficulty of composites I, J, K, L, M, N, O, P and Q
  Estimating σ_i from σ_t1
  Expressing the σ_i of each group in terms of a common unit
  Expressing the measures of difficulty as distances from a common point of reference
  The difficulty of composites A, B, C, D, E, F, G, and H
  Estimating σ_i from σ_t1
  Expressing the σ_i of each group in terms of σ [subscript illegible]
  Expressing the measures of difficulty as distances from a common point of reference
  The Scale

CHAPTER X. The Absolute Zero of Intellectual Difficulty.
  Locating zero difficulty by experiment
  A program of tasks to use in measuring tasks of very little intellectual difficulty
  Locating zero difficulty by a consensus

CHAPTER XI. The Measurement of the Altitude of an Individual Intellect.
  The form of the curve of percent correct in relation to difficulty
  Estimating the CAVD altitude of an individual

CHAPTER XII. The Measurement of Width and Area of Intellect.
  Width of intellect in the case of truly intellectual tasks
  Width of intellect in the sense of the number of single short tasks mastered, any one of these tasks being only a very partial representation of intellect
  Area of intellect
  Proportional Counts

CHAPTER XIII. The Relations of Altitude to Width, Area, and Speed.
  The relation between altitude and W(10C+10A+10V+10D), i.e., number of 40-composite CAVD tasks succeeded with at a given level of difficulty
  The relation between altitude and W(1C or 1A or 1V or 1D), i.e., the number of single tasks succeeded with at a given level
  The relation between altitude and area of intellect
  The relation of altitude or level of intellect to speed

CHAPTER XIV. The Meaning of Scores Obtained in Standard Intelligence Examinations.
  The meaning of the Binet Mental Ages
  The meaning of scores obtained in Standard group examinations
  The meaning of scores obtained in tests of the ability to learn and to improve
  Mean Square Error of a CAVD Altitude in Units of the CAVD Scale (1.00 Equalling σ [subscript illegible])

CHAPTER XV. The Nature of Intellect.
  A working definition of intellect
  The hypothesis that quality of intellect depends upon quantity of connections
  Experimental verification of the quantity hypothesis
  Summary

CHAPTER XVI. The Measurement of Original Intellectual Capacity and of Acquired Intellectual Ability.
  The present status of opinion
  General principles
  The use of novel tasks
  The use of familiar tasks
  The use of a series graded for susceptibility to environmental influences
  The test and results of Burt
  The use of altitude and width of intellect
  Other methods of separating original capacity from acquired ability
  Summary

CHAPTER XVII. Changes in the Altitude and Area of Intellect With Age.
  Altitude
  Area
  General considerations

CHAPTER XVIII. Summary of Results and Applications to the Measurement of Human Abilities in General.
  Summary of results
  Applications to the measurement of human abilities in general

APPENDIX I. The Form of Distribution of an Individual’s Variations in Intellect
APPENDIX II. The Relation of an Individual’s Variability to His Ability in Tests of Intelligence
APPENDIX III. On the Form of the Distribution of Intellect in the Sixth Grade, the Twelfth Grade, and Among College Freshmen
APPENDIX IV. The Homogeneity of Intellect CAVD at All Levels of Difficulty
APPENDIX V. The Adequacy of Tasks of Any One Level of Difficulty as a Measure of All of Intellect CAVD
APPENDIX VI. The Estimated Form of Distribution of Various Groups

LIST OF TABLES

TABLE 1. Variations of the Scores of Thirteen (or Fewer) 30-Minute Trials with Part I of the Thorndike Intelligence Examination for High School Graduates from the Median Score for the Individual in All Thirteen Trials. 20 Gifted Pupils, a, c, d, e, ... u. 13 Different Days
TABLE 2. The Relation of the Variability of an Individual to His Amount of Ability in Fifteen Tests or Amalgamations of Tests, Using Eight Levels of Ability
TABLE 3. The Variability of Four Individuals in Intellect According to a Certain Additive Combination of Factors All Positive
TABLE 4. Four Forms of Distribution
TABLE 5. Approximate Percentages Which the Differences in Difficulty between Task T[illegible], Task T[illegible], Task T[illegible], etc., Are of the Difference between T[illegible] and T[illegible], According to the Form of Distribution of the Group
TABLE 6. The Correlation of Success in Task 281 with Average Score in a Total Series of Intellectual Tasks
TABLE 7. The Correlation between CAVD Summation Score and Stanford Binet Mental Age in the Case of 178 Imbeciles Sixteen Years Old or Older, of Mental Age 28 Months to 59 Months
TABLE 8. The Correlation between Score in the Thorndike Examination for High School Graduates (Average of Two Forms) and an Incomplete Sampling of Intellect CAVD
TABLE 9. The Correlations between Scores in Stock Intelligence Examinations and Level Scores in Arithmetical Problems and Sentence Completions. 180 Pupils in Grades 7 to 12. Data from Clark (’24)
TABLE 10. The Effect of Decreasing the Error of Estimating the Difficulty of the Median Task of a Composite of Ten by the Use of the Percent of a Group Scoring “5 or More Right out of Ten” in Place of the Median of the Ten Percents of the Group Scoring “Right” in the Tasks Taken One at a Time. Vocabulary Tasks in the Case of 250 Pupils of Grade 8½
TABLE 11. The Difficulty and Intellectualness of 30 Single Tasks in Understanding Sentences, Measured by the Percent of 668 11th Grade Pupils Succeeding with Each, and by the Correlations of Success in Each with the Average Score in Two Forms of the I.E.R. Sel. Rel., Gen. Org. Examination
TABLE 12. Percentages Succeeding and Correlations with a Criterion in the Case of 24 Reading Tasks and 52 Vocabulary Tasks: Grade 11: n=668 for the Reading Tasks and 454 for the Vocabulary Tasks
TABLE 13. Percents Succeeding and Correlations with a Criterion of Intellect, in the Case of 240 College Graduates
TABLE 14. The Correlations (Bi-Serial r) of Each of 99 Reading and Vocabulary Tasks with Intellect (I.E.R. Sel. Rel., Gen. Org.), Grouped According to the Percent Succeeding with the Task
TABLE 15. Overlappings and Bi-Serial r’s for 35 Elements
TABLE 16. The Correspondence between Success in a Single Small Task and Intellect, as Measured by the Overlapping of the Score in Intellect of Those Failing with the Task Past the Median Score in Intellect of Those Succeeding with the Task. Compiled from the Original Data of Vincent
TABLE 17. The Correspondence between the Sum of the Ratings of Ten Judges and the Sum of the Ratings of the Other Ten
TABLE 18. The Probable Divergence of a Difficulty Rating by 20 Experts from the Average of an Infinite Number of Ratings of the Task Each by 20 Experts
TABLE 19. Measures of the Difficulty of 10-Composite Tasks
TABLE 20. Differences in Difficulty of Various Composite Tasks and of the Median Sums of 20 Expert Ratings of the Single Tasks of These Composites Which Were Rated. Each Difference Is Expressed as a Percent of the Difference between the A and the D Composite of Its Kind
TABLE 21. Form of Distribution Used in the Calculations of Tables 19 and 20. Relative Frequencies at Equal Successive Intervals
TABLE 22. Measures of Difficulty If the Form of Distribution Assumed Is Form A or a Rectangle. Distance from C.T. in Terms of σ or Q/25
TABLE 23. Differences in Difficulty of Various Composite Tasks and of the Median Sums of 20 Expert Ratings of the Single Tasks of These Composites Which Were Rated. Each Difference Is Expressed as a Percent of the Difference between the A and the D Composite of Its Kind
TABLE 24. Difficulty of Twelve Composites by the Results with 240 College Graduates, and 189 Candidates for College Entrance, in Distances + and − from the Median for the 240, in Terms of the Sigma of the Composite Concerned. Also the Median Ratings by the Consensus of Such Tasks in Each Composite As Were Rated
TABLE 25. Measures of Difficulty
TABLE 26. Measures of Difficulty
TABLE 27. The Differences in Difficulty of CAVD 40-Composite Tasks by Experiment and by the Consensus of 20 Experts
TABLE 28. Percents Correct for Each Single Word of Seven 10-Word Composite Tasks in Each of Various Groups of Individuals
TABLE 29. Percents Correct in the Single Tasks of Word Knowledge: 10-Composite Tasks 8, 9, 10 and 11
TABLE 30. Percents Correct for Each Single Word of the Seven 10-Word Composite Tasks 1a, 2a, 3a, 4a, 5a, 6a and 7a
TABLE 31. The Percentages Obtaining Five or More Right Out of Ten in the Vocabulary Composites 1, 1a, 2, 2a, 3, 3a, etc.
TABLE 32. Percents Correct in the Single Tasks of Word Knowledge: Composite Tasks A, B, C and D. 180 Adult Imbeciles
TABLE 33. Percents Correct in the Single Tasks of Word Knowledge E, F, G, and H
TABLE 34. Percents Succeeding with Each Single Task of Various 10-Composites in Two Groups of Adult Imbeciles
TABLE 35. Percents Succeeding with Each Single Task of Various 10-Composites in Four Groups: 100 Adults of Mental Age 6, 50 Feeble-Minded of Class 3 in an Institution, Pupils in Special Classes in a Large City, and Pupils in Grade 4 (second half)
TABLE 36. The Permilles Succeeding with Each Single Task of Various 10-Composites of Sentence Completions
TABLE 37. The Permilles Succeeding with Each Single Task of Various 10-Composites of Arithmetical Problems
TABLE 38. The Permilles Succeeding with Each Single Task of Various 10-Composites of Directions and Reading
TABLE 39. The Difficulty of 10-Composites Co. A, B, C and D; Ar. A, B, C and D; V. A, B, C and D; D. A, B, C and D, in the Case of 180 Adult Imbeciles
TABLE 40. The Difficulty of 10-Composites Co. E, F, G and H; Ar. E, F, G and H; V. E, F, G and H; D. E, F, G and H, in Various Groups. (σ Distances Are Omitted from This Table.)
TABLE 41. The Difficulty of 10-Composites Co. F, G, H, I, J and K; Ar. [illegible]; V. [illegible]; D. [illegible], in Various Groups [illegible]
TABLE 42. The Difficulty of 10-Composites Measured by the Percents of 147 Pupils in Grade 5½ Succeeding with Five or More of the Ten Single Tasks, and by Distances + or − from the Median Difficulty for Grade 5½, in Units of the Mean Square Variation of Grade 5½ in Level of Whatever Ability the 10-Composite Measures in Each Case. Similar Facts for 205 Pupils and 200 Pupils in Grade 5½. The 147 Pupils Are Those Who Were Included in Both the 205 and the 200
TABLE 43. The Difficulty of Various 10-Composites in the Case of 44 Adults: Recruits in the United States Army
TABLE 44. Difficulty of 10-Composites Measured by the Permilles of Successes and by Distances + or − from the Median Difficulty for Grade 8½, in Units of the Mean Square Variation of Grade 8½ in the Ability Measured by the Composite
TABLE 45. Difficulty of 10-Composites Measured by the Percents of Two Groups (246 Pupils in Grade 9 and 264 Pupils in Grade 9) Succeeding with Five or More of the Ten Single Tasks, and by Distances + or − from the Median Difficulty for the Group in Units of the Mean Square Variation of the Group in the Altitude of Whatever Ability the 10-Composite Measures
TABLE 46. Difficulty of 10-Composites Measured by the σ Distances + or − from the Median Difficulty of a Given Grade in Units of the Mean Square Variation of the Population of That Grade in the Ability Measured by the 10-Composites
TABLE 47. The Difficulty of Various 10-Composites Measured by the Percent Succeeding and by the Distance from the Median in Terms of the Mean Square Variation of the Group: 422 Normal School Seniors. The form of distribution is assumed to be “normal.” The division into two groups of 150 and 185 is approximately at random. The group of 87 represented a somewhat superior selection and took certain additional tests.
TABLE 48. Difficulty of 10-Composites Measured by the Percents of (a) 240 College Graduates and (b) 100 Students of Education (College or Normal School Graduates) Succeeding, and by Distances + or − from the Median Difficulty for the College Graduates in Question, in Units of the Mean Square Variation of the College Graduates in Question in Altitude of Whatever Ability the 10-Composite Measures in Each Case
TABLE 49. Difficulty of 10-Composites Measured by the Percents of 53 Adult Students Succeeding, and by Distances + or − from the Median Difficulty for the Group, in Units of the Mean Square Deviation of the Group in the Ability Measured by the Composite
TABLE 50. Difficulty of 10-Composites Measured by the Percents of 63 University Students Succeeding
TABLE 51. Summary of the Facts Concerning the Difficulty of the Four 10-Composites Constituting Each 40-Composite
TABLE 52. Thorndike Examination for High School Graduates. Part I, Forms D and N: Grade 12: n=[illegible]
TABLE 53. Thorndike Examination for High School Graduates. Part I, Forms D and N. Scores from 54 to 129 Corrected to Be in Truly Equal Units
TABLE 54. Army Alpha: Grade 9: n=1721
TABLE 55. Army Alpha: Grade 12: n=1387
TABLE 56. Army Alpha: College Freshmen (Ohio): n=2545
TABLE 57. Army Alpha: Grades 9, 12 and 13 (College Freshmen). Values of Successive 5-Point Intervals of the Original Scores, in Equal Units
TABLE 58. Army Alpha: Grade 5: n=2630
TABLE 59. Army Alpha: Grade 6: n=28[illegible]
TABLE 60. Army Alpha: Grade 7: n=[illegible]
TABLE 61. Army Alpha: Grades 5, 6 and 7. Values of Successive 5-Point Intervals of the Original Scores, in Equal Units
TABLE 62. Army Alpha: Grades 12 and 13. Supplemental Values of Successive 5-Point Intervals of the Original Scores, in Equal Units
TABLE 63. Final Estimate of Relative Values of Army Alpha Scores in Equal Units
TABLE 64. Equivalents for Army Alpha Scores from 20 to 170 in a Scale with Equal Units, 1 of This Scale Equalling 89/90 of the Difference between 60 and 150 of the Original Alpha Scores, or Approximately 1/100 of the Difference between 50 and 150 of the Original Alpha Scores
TABLE 64a. Provisional Equivalents for Army Alpha Scores from 170 to 209; Scale as in Table 64
TABLE 65. Army Alpha Distribution of Scores of 216 College Graduates
TABLE 66. National A: Grades 6 (n=1668) and 9 (n=494)
TABLE 67. National A: Grades 7 (n=1679) and 8 (n=482)
TABLE 68. National A: Summary of Determinations of Values in Equal Units
TABLE 69. National A: Grades 4 (n=1677) and 5 (n=2487)
TABLE 70. Equivalents for National A Scores from 20 to 170, in a Scale with Equal Units. 1=Approximately 1/50 of the Difference between 100 and 150 of the Original Scale
TABLE 71. Otis Advanced: Distributions
TABLE 72. Otis Advanced: Equivalents for Each 10-Point Interval of the Original Scale in Equal Units
TABLE 73. Equivalents for Otis Advanced Scores from 30 to 200 in a Scale with Equal Units. 1=1/120 of the Difference between 50 and 170 of the Original Scores
TABLE 73a. Provisional Values for Otis Advanced Scores from 10 to 29
TABLE 74. Haggerty: Delta 2: Distributions
TABLE 75. Haggerty: Delta 2: Values in Equal Units
TABLE 76. Equivalents for Haggerty Delta 2 Scores from 50 to 160, in a Scale with Equal Units
TABLE 77. Terman Group Tests
TABLE 78. (Terman Group Tests.) Sample of the Six Sets of Values in Equal Units Whence the General Transmutation Table Is Derived
TABLE 79. Terman Group Test of Mental Ability: Values in Equal Units of Each Point on the Original Scale from 35 to 193
TABLE 80. Myers Mental Measure. Grade 6 (n=724) and Grade 9 (n=311). Values of Intervals in Terms of Equal Units, Expressed as Multiples of 1/45 x (Difference between 36 and 81)
TABLE 81. Equivalents of Scores from 21 to 86 for Myers Mental Measure: In Equal Units
TABLE 82. Pintner Non-Language Mental Test. Original Scores and Values of Intervals in Equal Units
TABLE 83. Equivalents for Pintner Non-Language Scores from 100 to 380, in a Scale with Equal Units. O and C refer to the original scores and the scores transmuted into a scale with equal units
TABLE 84. I.E.R. Tests of Selective and Relational Thinking, Generalization and Organization: Distributions in Grade 9 and Grade 12
TABLE 85. I.E.R. Tests of Selective and Relational Thinking, Generalization and Organization. Values of Intervals of Original Scale in Equal Units
TABLE 86. Transformation Table. I.E.R. Tests of Selective and Relational Thinking, Generalization and Organization
TABLE 87. Brown University Psychological Examination. Grades 12 and 13. N=3333+2118
TABLE 88. Equivalents of Scores from 20 to 80 for the Brown University Psychological Examination, in Equal Units
TABLE 89. Army Examination a: Distributions of Pupils in Grades [illegible]
TABLE 90. Army Examination a: Equivalents for Each 10-Point Interval of the Original Scale in Equal Units. Results from Grades 6, 7 and 8
TABLE 91. Army Examination a: Equivalents for Certain 10-Point Intervals of the Original Scale in Equal Units. Results from Grades 4 and 5
TABLE 92. Army Examination a: Equivalents in Equal Units. Results from Grades 6, 7 and 8; 4 and 5; 13; and Composite from All
TABLE 93. Transmutation Table for Army Examination a
TABLE 94. Equivalents for Army Examination a: Scores from 10 to 360 in a Scale with Equal Units. 1=1/80 of the difference between 130 and 210 of the original scale
TABLE 95. National Intelligence Examination: Distribution of Scores for White Pupils, Age 11
TABLE 96. National Intelligence Examination: Distribution of Scores for White Pupils, Age 12
TABLE 97. National Intelligence Examination A: Distribution of Scores for White Pupils, Age 13
TABLE 98. National A: Data for Surface of Frequency in Equal Units
TABLE 99. Otis Advanced: Distribution of Scores: Ages 11 and 12
TABLE 100. Otis Advanced: Distribution of Scores: Ages 13 and 14
TABLE 101. Otis Advanced: Data by Which the Surfaces of Frequency Are Constructed
TABLE 102. Haggerty: Delta 2: Distribution of Scores
TABLE 103. Haggerty: Delta 2: Data for Surface of Frequency in Equal Units
TABLE 104. The Effect of Correlation between Status and Gain When Gain Increases in a Geometric Ratio
TABLE 105. The Effect of Correlation between Status and Gain When G=AS+B
TABLE 106. Percents of Various Groups Succeeding with 20 or More Single Tasks of CAVD 40-Composites I to Q
TABLE 107. The Difficulty of Composites I to Q in Various Groups Expressed As a Deviation from the Difficulty for the Median of That Group, in Terms of the σ of That Group in the Ability Measured by Success with the Composite in Question. − Is Easier, + Is Harder
TABLE 108. r_t1t2 As Estimated from Correlations between Number of Single Tasks Correct in One Half of a 40-Composite and Number of Single Tasks Correct in the Other Half; and Also As Estimated from Correlations between Number Correct in a 40-Composite and Number Correct in a Neighboring 40-Composite
TABLE 109. Values of r_t1t2 Derived from Table 108, and the Values of √r_t1t2 Used to Obtain Table 110 from Table 107
TABLE 110. The Intellectual Difficulty of Composites I to Q in Groups 5½, 9I, 9II, 13 and 17 Expressed in Terms of σ [subscripts illegible], As Derived by the Use of Table 109
TABLE 111. Values of r_t1i Estimated from Correlations between Number of Single Tasks Correct in a 40-Composite and Number Correct in a Long CAVD Series
TABLE 112. The Intellectual Difficulty of Composites I to Q in Terms of σ [subscripts illegible], etc., As Derived by the Use of Table 111
TABLE 113. The Intellectual Difficulty of Composites I to Q. Averages of the Determinations of Table 110 and Table 112
TABLE 114. Data for Computing Relative Variabilities of Different Grades in Intellect; and for Computing Distances between Medians of Different Grades in Intellect
TABLE 115. The Relative Variability of Different Grade Populations
TABLE 116. The Intellectual Difficulty of Composite Tasks I to Q in Terms of σ [subscript illegible]
TABLE 117. Difference between Grades in Scores Attained in Various Intelligence Examinations
TABLE 118. The Intellectual Difficulty of Tasks I to Q Expressed in Each Case As a Difference from the Median Difficulty for Group 9, in Units of σ [subscript illegible]
TABLE 119. Percents Succeeding with Various Composites in Groups im 3, im 6, f, sp, 4, 5 and ad
TABLE 120. The Difficulty of Composites A to K, in Various Groups, Expressed As a Deviation from the Difficulty for the Median of That Group, in Terms of the σ of That Group in the Ability Measured by Success with the Composite in Question
TABLE 121. Raw Intercorrelations of Composites C, D, E, F and G in the Case of 100 Individuals Chronologically Sixteen or Over, and Mentally [illegible]
TABLE 122. The Difficulty of Composites A to K in Terms of σ [subscripts illegible], etc.
TABLE 123. The Difficulty of Composites A to K in Terms of σ [subscript illegible]
TABLE 124. The Intellectual Difficulty of Tasks A to K Expressed As a Difference from the Median Difficulty for Group [illegible], in Units of σ [subscript illegible]
TABLE 125. The Estimated Differences in Difficulty of Intellectual Tasks from Zero Difficulty to the Difficulty of Composite C
TABLE 126. The Number of Psychologists (out of 40) Judging a Certain Task To Be More Difficult Intellectually than a Certain Other, and the Number Judging the Two To Be Equally Difficult. The table reads: “Task 3 was judged harder than Task 37 by 13 and equal to it in difficulty by 7, Task 36 was judged harder than Task 38 by 2 and equal to it in difficulty by 3, Task 36 was judged harder than Task 27 by 10 and equal to it in difficulty by 2,” and so on
TABLE 127. Median of the Scores (Number Right out of 40) for Each of Twenty-Nine Groups with Each of Four or More Neighboring Composites
TABLE 128—Altitudes Corresponding to Any Number Correct from … out of 40 for Tasks I to Q
TABLE 129—Approximate Provisional Altitudes Corresponding to Any Number Correct from 5 to 35 out of 40 for Tasks A to H
TABLE 130—Correlations, Raw and Corrected for Attenuation, between Rate and Level. (After Hunsicker, ’25, Table V)
TABLE 131—Intercorrelations (Corrected for Attenuation) of Sentence Completion (Co), Vocabulary (V), Arithmetical Control (Ac), Arithmetical Association (Aa), Analogies (An), and Information (Inf) in 250 Pupils of Grade 8½. (Compiled from Tables of Wilton, ’25, in press.)
TABLE 132—The Intercorrelation of Four Tests of the Higher and Two Tests of Associative Thinking. 100 University Students. P by Pearson formula, S by Sheppard formula
TABLE 133—The Intercorrelations of Three Tests of the Higher and Two Tests of the Lower or Associative Thinking. 126 Pupils in Grade 5½. The correlations are all raw correlations by the Sheppard formula
TABLE 134—Observed and Partial Correlations between Age, Intelligence, School Attainments, and the Results of the Binet-Simon Tests (Table XX of Burt)
TABLE 135—The Intercorrelations of One Binet Test (B), One Burt Reasoning Test (I), One Measure of School Work (S), and Age (A) by Certain Assumptions Concerning the Intercorrelations If an Infinite Number of Such Tests Had Been Used
TABLE 136—Observed and Partial Correlations between the Binet-Simon Tests and Attainments in the Several School Subjects (Table XXI of Burt)
TABLE 137—The Average Scores in the Original Units and in Equal Units in Various Intelligence Examinations at Various Ages; and the Differences between Successive Years
TABLE 138—Variations of the Scores of Thirteen (or Fewer) 30-Minute Trials with Part I of the Thorndike Intelligence Examination for High School Graduates from the Median Score for the Individual in All Thirteen Trials. 20 Gifted Pupils … Different Days
TABLE 139—The Relation between an Individual’s Ability and His Variability. The variability is that of one trial (30 minutes) of the Thorndike Test, Part I, from the average of an infinite number of such trials. The two trials were taken on the same day
TABLE 140—Thorndike Intelligence Examination for High School Graduates, ’19-’23 Series. Part I, Trial 2, Arrayed under Trial 1. Test of Feb. ’22. 30 = 30 to 34, 35 = 35 to 39, etc.
TABLE 141—Thorndike Intelligence Examination for High School Graduates, ’19-’23 Series. Part I, Trial 2, Arrayed under Thorndike Examination, Total Score. Women Students in High School, Normal School, College and University. 30 = 30-34; 35 = 35-39, etc.
TABLE 142A—Thorndike Intelligence Examination for High School Graduates. Part I. Average Difference between Two Trials (Single Session) in Relation to the Average Score (= Sum of Trial 1 and Trial 2, Divided by 2)
TABLE 142B—Same as Table 142A, except that the Difference Is between Trials on Different Days and that the Average Score Is from Four Trials. Normal School Students
TABLE 143—Thorndike Intelligence Examination for High School Graduates. Part I. Variability of Score in One Trial Arrayed under Score in Another Trial or under Total Score in the Entire Examination. 10 = 10 to 14; 15 = 15 to 19, etc.
TABLE 144—The Summaries of Table 142B and Table 143, with Coarser Grouping
TABLE 145—The Relation of the Variability of an Individual to His Amount of Ability in Fifteen Tests or Amalgamations of Tests.
The upper number is the measure of variability; the lower number (in italics) is the weight attached to it
TABLE 146—The Effect of the Selection of Tasks
TABLE 147—The Relation of the Variability of an Individual to His Amount of Ability in Fifteen Tests or Amalgamations of Tests, Using Eight Levels of Ability
TABLE 148—The Closeness of Fit of Six Test Scores, Taken Singly
TABLE 149—The Closeness of Fit of Six Test Scores, Taken Two or More at a Time and Averaged
TABLE 150—Goodness of Fit of Observed Distributions—Groups 1 to 6—to Normal Curve
TABLE 151—Goodness of Fit of Composite Distributions to Normal Curve
TABLE 152—Self- and Inter-Correlations of Four 40-Composites of CAVD, Each Divided into Two Random Halves (I and II). 98 Imbeciles. (P means Pearson Coefficient, S means Sheppard Coefficient.)
TABLE 153—The Inter-Correlations of Four CAVD Composite Tasks Like A, B, C, and D in Constitution and Difficulty, but Each Consisting of an Infinitely Large Number of Single Tasks. The Inter-Correlations of Table 152 Corrected for Attenuation
TABLE 154—Self- and Inter-Correlations of Four 40-Composites of CAVD, Each Divided into Two Random Halves (I and II). 121 High School Graduates. (P means Pearson, Sh means Sheppard.)
TABLE 155—The Inter-Correlations of Four CAVD Composite Tasks Like NY6a4%%2, OZ,75, PZ,86, and Q . 97, in Construction and Difficulty, but Each Consisting of an Infinitely Large Number of Single Tasks.
121 High School Graduates
TABLE 156—The “Raw” Inter-Correlations of Five CAVD 40-Composites in 246 Pupils of Grade 9
TABLE 157—The Inter-Correlations of Five CAVD Composite Tasks, Like Those of Table 156 in Constitution and Difficulty, but Each Consisting of an Infinitely Large Number of Single Tasks
TABLE 158—The “Raw” Inter-Correlations of Four CAVD Composites in 192 Pupils of Grade 9
TABLE 159—The Inter-Correlations of Four CAVD Composites Like Those of Table 158 in Constitution and Difficulty, but Each Consisting of an Infinitely Large Number of Single Tasks
TABLE 160—Summary of the Inter-Correlations Corrected for Attenuation
TABLE 161—The Correlations between the Number of Single Tasks Responded to Correctly in Various 40-Composites and the Number of Tasks Responded to Correctly in a Long Series of CAVD Tasks Ranging from Tasks Very Easy for the Group in Question to Tasks Very Hard for the Group in Question: Groups 9I and 9II
TABLE 162—The Correlations between the Number of Single Tasks Responded to Correctly in Various 40-Composites and the Number of Tasks Responded to Correctly in a Long Series of CAVD Tasks Ranging from Tasks Very Easy for the Group in Question to Tasks Very Hard for the Group in Question: Groups 13 and 17
TABLE 163—The Assumed Magnitude of the Error Whereby a Stanford Binet Mental Age Differs from the Mental Age Which Would Be Found by a Perfect Measurement of Altitude of Intellect
TABLE 164—Form of Distribution Assumed in Obtaining Measures of the Difficulty of Various Composites for the Groups of 50 Feeble-Minded
TABLE 165—The Probable Form of Distribution of Altitude of Intellect in Group 17 (Law Students)
TABLE 166—Data for Estimating the Form of Distribution of Altitude of Intellect in the Group Ad (44 Recruits)

CHAPTER I

THE PRESENT STATUS¹

Existing instruments for measuring intellect²
developed from three roots, the interview, the school examination, and the ‘tests’ of sensory acuity, memory, attention, and the like, devised during the early history of psychology. The Stanford Binet, for example, is an improved, systematized and standardized interview. The Army Alpha is in part an improved school examination and in part an improved battery of tests like those used before 1900 by Galton, Ebbinghaus, Cattell, Jastrow, and others.

Existing instruments represent enormous improvements over what was available twenty years ago, but three fundamental defects remain. Just what they measure is not known; how far it is proper to add, subtract, multiply, divide, and compute ratios with the measures obtained is not known; just what the measures obtained signify concerning intellect is not known. We may refer to these defects in order as ambiguity in content, arbitrariness in units, and ambiguity in significance.

¹ This chapter is reprinted with some alterations from the Psychological Review, Vol. 31, pp. 219 to 252.
² We shall use ‘intellect’ and ‘intelligence’ as synonyms throughout this book.

AMBIGUITY IN CONTENT

If we examine any of the best existing instruments, say the Stanford Binet, the Army Alpha or the National Intelligence Test, we find a series of varied tasks. Some concern words, some concern numbers, some concern space relations, some concern pictures, some concern facts of home life. Some seem merely informational; some are puzzle-like. Some concern mental activities which will be entirely familiar to almost all of the individuals to be tested; some concern novelties. Some are irrespective of speed; in some speed is a large element in success.
In particular, as we shall see later, the score attained is a composite in variable proportions whereby A is rated as more intelligent than B, first, if he can do certain hard tasks with which B fails; second, if he can do a greater number than B can of tasks of equal difficulty; and third, if he can do more rapidly than B tasks at which both succeed. The only sure statement of what abilities the Army Alpha measures is to show the test itself and its scoring plan.

To this it may be retorted that this variety is not really an ambiguity, that one of these tests is a representative sampling of tasks for intellect, and that the scoring plan is one which weights each response according to its importance as a symptom of intellect. Unfortunately this is not true. We may cherish the hope that these tests approximate to such representativeness of sampling and suitability of weights. In fact, however, nobody has ever made an inventory of tasks, determined the correlation of each with intellect, selected an adequate battery of them, and found the proper weight to attach to each of these. Such a procedure was carried out in part by the Committee responsible for the construction of the National Intelligence Test, but limitations of time and funds restricted it to a very small fraction of what would be adequate. If anybody did this wisely, a large fraction of his labor would be precisely to find out what abilities our best present instruments did measure, and how these abilities were related to intellect; or to find out what abilities constituted intellect, and how these abilities were measured by our present instruments.³

One of the main lines of work in the improvement of instruments for measuring intellect is then to find out what abilities our best present instruments do measure.
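The composite character of a score, noted at the opening of this section, can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Performance` fields, the two invented persons, and both sets of weights are assumptions chosen for illustration, not the scoring plan of Army Alpha or of any other instrument. It shows only that when level of task, number of tasks, and speed are mixed in variable proportions, the order of two persons may depend on the proportions.

```python
from dataclasses import dataclass


@dataclass
class Performance:
    hardest_level_passed: int   # difficulty of the hardest task done correctly
    tasks_correct: int          # number of tasks of equal difficulty done correctly
    tasks_per_minute: float     # rate on tasks at which the person succeeds


def composite(p, w_level, w_number, w_speed):
    """Weighted sum of the three elements; the weights are illustrative assumptions."""
    return (w_level * p.hardest_level_passed
            + w_number * p.tasks_correct
            + w_speed * p.tasks_per_minute)


# Two invented persons: A reaches harder tasks, B does more tasks and works faster.
a = Performance(hardest_level_passed=9, tasks_correct=20, tasks_per_minute=2.0)
b = Performance(hardest_level_passed=7, tasks_correct=26, tasks_per_minute=3.0)

level_heavy = (5.0, 1.0, 1.0)   # hypothetical plan stressing difficulty of task
speed_heavy = (1.0, 1.0, 5.0)   # hypothetical plan stressing speed

a_first_under_level_plan = composite(a, *level_heavy) > composite(b, *level_heavy)
b_first_under_speed_plan = composite(b, *speed_heavy) > composite(a, *speed_heavy)
```

Under the first plan A stands above B, and under the second B stands above A, although neither performance has changed.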
³ The balance of his labor might be expended upon experimentation with tasks that seemed promising as symptoms, even though we did not know what abilities they required.

ARBITRARINESS OF UNITS

The score obtained by using the instrument to measure an intellect is in present practice either a number representing a summation of credits and penalties or, more rarely, a number representing the grade of difficulty of the tasks which the person can respond to with some assigned percentage of correct responses. Thus in Army Alpha he may score by summation from 0 to 212; in the first suggestion of Binet he could score 5, or 6, or 7, or 8, or 9, according as he was able to do correctly all but one of the tasks set as 5-year tasks, 6-year tasks, 7-year tasks and so on.⁴ In neither case (even supposing the measurement to be a perfect representation of the person’s abilities) can the numbers be taken at their face value. If A scores 50 on Alpha, B, 75, and C, 100, we do not know that the difference between A and B in the abilities tested by Alpha is the same as the difference between B and C, nor that C has twice as much of these abilities as A. If D scores mental age 4, E mental age 6, and F mental age 8 by the Binet, we do not know that, in the abilities tested by the Binet, F excels E as much as E excels D, or that F has one and one third times as much of these abilities as E has. The numbers, 1, 2, 3, 4, etc., designating the scores made by individuals, do not represent a series of amounts of intellect progressing by equal steps. The difference in intellect between Army Alpha 10 and Army Alpha 20 may indeed conceivably be as great as the difference between Alpha 100 and Alpha 150. From Stanford Binet 40 months to 60 months may be as great a difference in intellect as from 140 months to 180 months. The value of what is called a difference of 1 on the scale is not known, and its value may fluctuate greatly as we move along the scale.
⁴ This suggestion was, however, abandoned in favor of a procedure which mixes two sorts of measure. The procedure is, “Take for point of departure the age at which all tests are passed; and beyond this age count as many fifths of a year as there are tests passed.” [‘The Development of Intelligence,’ Eng. trans. of Kite, 1916, p. 278.]

We have then no right to add, subtract, multiply, or divide with these scores of A, B, C, D, E, and F in the way that we do with their heights or weights. Suppose that A scores 100; B, 110; C, 90; and D, 120. We cannot say that the average intellect of A and B equals the average intellect of C and D. If E changes from 60 to 70, while F changes from 70 to 80, we cannot say that they have made equal gross gains.

The numbers designating the scores made by individuals are usually not even approximately related to any true zero point. Consequently, even if the scores 1, 2, 3, 4, did represent an equal-interval series of amounts or degrees of the ability in question, they would properly be treated as a + 1, a + 2, a + 3, a + 4, where a is an unknown amount. The ‘times as’ or ratio judgment is thus not surely applicable and the relations of the scores to anything else are thus undetermined. For example, we cannot say whether the intellect of the average twelve-year-old is one and a quarter times that of the average six-year-old or twice it, or ten times it.

The second main problem in improving measurements of intellect is thus to attach fuller and more definite meanings to these credit summations and difficulty levels, and if possible to find their equivalents on absolute scales on which zero will represent just not any of the ability in question, and 1, 2, 3, 4 and so on will represent amounts increasing by a constant difference.
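The two cautions just stated can be checked with a small worked example. It is a sketch under stated assumptions only: the offsets tried for the unknown amount a, and the square-root rescaling that stands in for the unknown relation between score and amount of intellect, are hypothetical choices made for illustration. The scores 100, 110, 90 and 120 are those used in the text.

```python
import math


def ratio_with_offset(x, y, a):
    """Ratio of two amounts when each score lies an unknown amount a above true zero."""
    return (a + y) / (a + x)


# Scores 2 and 4 suggest 'twice as much' only if the true zero lies at a = 0.
ratios = {a: ratio_with_offset(2, 4, a) for a in (0, 10, 100)}


def mean(values):
    return sum(values) / len(values)


# Scores from the text: A = 100, B = 110, C = 90, D = 120.
raw_mean_ab = mean([100, 110])
raw_mean_cd = mean([90, 120])


def rescale(score):
    """Hypothetical order-preserving relation between score and real amount."""
    return math.sqrt(score)


rescaled_mean_ab = mean([rescale(100), rescale(110)])
rescaled_mean_cd = mean([rescale(90), rescale(120)])
```

The raw averages agree (105 and 105), but after the rescaling they do not; and the apparent ratio of 2 shrinks toward 1 as the assumed distance from true zero grows. Any other order-preserving rescaling would serve the argument equally well.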
⁵ Attempts have been made to define ‘zero’ or ‘just not any’ ability and to assign scores in relation to zero in the case of knowledge of English words, ability to understand sentences, handwriting, drawing, and English composition.

We have to estimate equivalents of this sort somehow before we can make much use of ratings by either credit summations or difficulty levels; before, for example, we can conveniently compare individuals or groups, or the changes made by individuals or by groups, or the effects of different environments. The commonest method at present is to take as the equivalent for any score by any instrument, the age whose average achievement is that score, and to assume that the increments in average ability are equal for equal differences in age up to some limit such as 192 months, and are zero thereafter. This of course is purely hypothetical in general and is almost certainly in error for the ages near the point where the age change suddenly turns from its full amount to zero. The curve of ability in relation to age is almost always smooth, as in the continuous line of Fig. 1a, and without a sharp turn such as that in the dash-line of Fig. 1b. The competent thinkers who use the method know this and are cautious in inferences based upon its application to the higher ages; but they use it rather freely for the lower ages, because some method must be used, because it is easy to understand and apply, and because we do not know what method is really right.

Fig. 1a. The probable form of the curve of intellect in relation to age.
Fig. 1b. The form of the curve of intellect in relation to age, if annual gains are equal up to some stated age, and are zero thereafter.
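The contrast between the two curves of Fig. 1 can be put in numbers with a minimal sketch. The limit of 192 months is taken from the text; the smooth curve is purely an assumption (a saturating exponential with an invented ceiling and rate), since the true form of the curve is, as said above, not known.

```python
import math

LIMIT_MONTHS = 192  # the limit named in the text


def broken_line(age_months, gain_per_month=1.0):
    """Equal gains for equal differences in age up to the limit, zero thereafter (Fig. 1b)."""
    return gain_per_month * min(age_months, LIMIT_MONTHS)


def smooth_curve(age_months, ceiling=192.0, rate=0.012):
    """Hypothetical smooth curve with gradually diminishing gains (in the spirit of Fig. 1a)."""
    return ceiling * (1.0 - math.exp(-rate * age_months))


def yearly_gain(curve, age_months):
    """Gain in ability over the twelve months following the given age."""
    return curve(age_months + 12) - curve(age_months)


# Gains in the year before the limit and in the year after it.
broken_before = yearly_gain(broken_line, 180)
broken_after = yearly_gain(broken_line, 192)
smooth_before = yearly_gain(smooth_curve, 180)
smooth_after = yearly_gain(smooth_curve, 192)
```

On the broken line the yearly gain drops from its full amount to zero at the limit; on the assumed smooth curve it merely shrinks a little from one year to the next. It is in this neighborhood that age equivalents computed by the common method are least trustworthy.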
It may be objected that equality of units is an unnecessary refinement for present practical purposes, since the mental age defines the status of an individual sufficiently: ‘as able as the average ten-year-old,’ ‘as able as the average twelve-year-old.’ These, it may be said, are better measurements for practical purposes than some absolute scale in terms of equal ‘mentaces’ or ‘intels.’ The convenience, intelligibility and realism of the mental age scale up to about 12 or 13 years are indeed great advantages, but after 13 or 14 it is neither convenient nor intelligible nor realistic. It is not convenient because the computation of intelligence quotients becomes very troublesome for the higher ages. It is not readily intelligible because mental ages 14, 15, 16, etc., are not ‘as good as the average’ 14-year-old, 15-year-old, etc. The average 25-year-old, for example, is about the mental age of 14 by one of the best instruments. It is not realistic because we have no clear or vivid sense of what the average person is intellectually at fifteen, or at sixteen, and do not even know whether he improves in the next two or three years. A mental age of 15 or 16 or 17 is in fact as arbitrary a quantity as an Alpha ability of 123.

A rarer but more promising procedure than that of transforming test scores into ‘ages’ is to transform them into units of ability on the assumption that the distribution of ability in all adults 21-30, or in all twelve-year-olds, or in all pupils in grade six of a certain city, or in some other specified group, is approximately that given by

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2\sigma^{2}}}$$

For example, the Alpha scores from 0 to 212 were not used in the army at their face value, which would give a distribution of the form shown in Fig. 2, but were transmuted into seven letter measures by the following scheme, which assumed an approximately ‘normal’ distribution for a random sampling of 128,747 of the literate white draft:

Fig. 2.
The form of distribution of the literate white draft if Army Alpha scores are taken at their face value.

135-212 = A
105-134 = B
75-104 = C+
45-74 = C
25-44 = C−
15-24 = D
0-14 = D−

The score used in the Thorndike-McCall test of paragraph reading is not the number of correct answers, but a transmutation on the assumption that the real ability concerned is distributed ‘normally’ amongst twelve-year-olds in American cities.

We know very little concerning the permissibility of the assumption of the so-called normal distribution for adults or for an age, or for a school grade. The search for evidence pro and con is one important feature of the attempt to obtain units of mental ability which shall be at least approximately equal.

AMBIGUITY IN SIGNIFICANCE

The test score measures directly only the measurer’s impression from the subject’s performance, or the summation, in a more or less capricious fashion, of credits and penalties for the subject’s responses to the different elements of the tests, or a combination of these. What this score signifies about the subject’s intellect depends upon the intuition of the measurer, or upon the correlation between the summation and intellect, or upon both. When we assert that a child is found by measurement with the Stanford Binet to have the intellect of a child of 10½ years, all that is really asserted is that the child does as well in that particular standardized interview as did the average of the children of 10½ years of age tested by Terman in making his standards. We do not know what the average intellect of these children was, nor how closely the Stanford Binet score represents or parallels or signifies it.
When we assert that a man is found by measurement with the Army Alpha to have the intellect of an average recruit in the draft, all that is really asserted is that he does as well in that particular battery of tests, scored and summated in a particular way, as the average recruit did. Just what the intellects of recruits were and how closely their Alpha scores paralleled their intellects, we do not know. The measurement is one thing, the inference to intellect is a different thing.

This is of course true of many measurements. The amount of silver deposited in one second by an electric current is not the amount of current. The dividend rate on stock during any one year is not the worth of the stock. The amount of silver is, under proper conditions, of perfect significance as an indicator of the amount of current, since the correlation between it and a perfect criterion of amount of current is perfect. The dividend rate is of very imperfect significance, since the correlation between it and a perfect criterion of the worth of the stock is far from perfect. We do not know how closely the rating or score in the Stanford Binet or the Army Alpha or any other instrument correlates with a perfect criterion of intellect, because we do not know what such a criterion is, much less its correlations with these tests. One great task of the measurement of intellect is to obtain such a criterion, or a closer approximation to it than we now have, and to use it to improve the selection and weighting of the elements of our testing instruments.

The present status of such instruments as the Binet or Army or National tests is roughly as follows: We have chosen tests where the judgment of sensible people in general is that correct response or speed of correct response is characteristic of intellect. Such is the case with directions tests, arithmetical problems, common sense questions (as in Alpha 3), and the like.
We have chosen tests using the judgment of psychologists in the same way. Such is the case with the completion tests devised by Ebbinghaus, the mixed relations or analogies test devised by Woodworth, and the like. We have tried these or other tests with children secluded in institutions because of imputed intellectual inferiority and with children of like age who are in ordinary schools (as by Norsworthy); with adult males of good health and morals who were found in a Salvation Army home, glad to work for a dollar a day, and with adults of the professional classes (as by Simpson); with children in general of different ages (as by Binet and Terman); with various groups of children ranked for imputed intelligence by teachers, fellow pupils, school advancement, and other symptoms (as by Spearman, Burt, Terman, Whipple, Yerkes, and others); with children of alleged superior intelligence in comparison with others (as by Whipple and Terman); with soldiers in the National Guard and regular army in connection with ratings for intelligence given by their officers (as by the Psychology Committee of the National Research Council); with students whose success in high school and college studies was also measured (as by Colvin, Wood, and many others); with individuals who were tested with a very long series of tests (as by Terman and Chamberlain, Stenquist, and others); and in other ways.

As a general result we know that certain systematized interviews and batteries of tests measure somewhat the same trait, since they correlate somewhat one with another; and that this trait has to some extent the same constitution as the trait which sensible people, psychologists, and teachers rate as intellect.
The failure of perfect correlation between the amount of intellect a person has, as revealed by the criterion, and the amount indicated by the instrument is due, as has been said, partly to the imperfection of the criterion, but partly also to the imperfection of the instruments. They (at least all but one of them) are demonstrably imperfect, since no two of them correspond perfectly in their findings for the same intellects. Since it is extremely unlikely that, out of a dozen instruments devised with about equal care by a dozen individuals or committees at about the same date, one should be very much superior to all the others, we may assume, until there appears proof to the contrary, that all are imperfect.

The imperfection may be of two sorts. First, the responses measured by the instrument may not be representative of the whole intellect and nothing but intellect; the score obtained may not give enough weight to certain factors or elements of intellect and may give weight to others which really deserve less or even zero weight. The instrument is then like a wattmeter which gives only half weight to the voltage of the current or adds two watts for every time that the current is turned on or turned off. Second, the same person may receive a different score when remeasured by the instrument. In so far as such differences are due to the ‘accidental’ ups and downs in the person’s achievements, they are taken care of by measuring him at enough different times; but in so far as they are due to acquaintance with the instrument itself or with instruments like it, they are a very serious imperfection. For example, a given score with Army Alpha represents a very different status according as it is from a first, a second, or a third trial. The case here is as if a thermometer tended after subjection to a temperature of 200° once to register 220° when 200° was next encountered.
The provision of means for distinguishing between that part of the score due to certain general characteristics of the person measured and that part of the score which is due to certain special training that he has had with the tasks of the tests, or with tasks like them, is thus an important part of the work of making the measurements more fully and exactly significant of intellect.

In general, all our measurements assume that the individual in question tries as hard as he can to make as high a score as possible. None of them can guarantee that the scores would correspond at all with a perfect criterion if the individuals measured tried to appear as dull as they could. The correlation would indeed then probably be inverse, the more intelligent persons being more successful in their efforts to appear dull! It is theoretically possible to arrange a system of incentives such that each person measured by an instrument would put forth approximately his maximum effort, and in scientific testing of the instruments this can often be done. In general practice, however, we rarely know the relation of any person’s effort to his possible maximum effort. Since, however, the disturbances due to differences in effort on the part of those tested require, in study and treatment, procedures which have little or nothing to do with the procedures by which the instruments are made to give better measurements of those who do try their best, we shall disregard the former and shall limit our inquiry to the latter sort of procedures.

MEASUREMENTS OF INTELLIGENCE ARE MEASURES OF INTELLECTUAL PRODUCTS

All scientific measurements of intelligence that we have at present are measures of some product produced by the person or animal in question, or of the way in which some product is produced.
A is rated as more intelligent than B because he produces a better product, essay written, answer found, choice made, completion supplied or the like, or produces an equally good product in a better way, more quickly or by inference rather than by rote memory, or by more ingenious use of the material at hand.

We can conceive of states of affairs such that a man’s intellect could be measured without consideration of the products he produces or the ways in which he produces them. Intellect might be exactly proportionate to the activity of the thyroid gland, or to the proportion of the brain weight to body weight, or to the number of associative neurones in the frontal lobes, or to the complexity of the fibrillary action of certain neurones, or to the intensity of a certain chemical process, and hence be measurable by observations of the thyroid’s action, or estimates of the brain’s volume, or by a count or measurement of neurones, or by a chemical analysis.

Psychologists would of course assume that differences in intelligence are due to differences histological or physiological, or both, and would expect these physical bases of intelligence to be measurable. At present, however, we know so little of the neural correlates of intellect that if twenty college freshmen were immolated to this inquiry, ten being the most intellectual of a hundred, and ten being the least intellectual of the hundred, and their brains were studied in every way by our best neurologists, these could probably not locate sixteen out of the twenty correctly as at top or bottom. Moreover, what we do know of neural correlates is of little avail during life, the living neurones being extremely inaccessible to present methods of observation.

Even if one aimed at discovering the physiological basis of intellect and measuring it in physiological units, one would have to begin by measuring the intellectual products produced by it.
For our only means of discovering physiological bases is search for the physiological factors which correspond to intellectual production.

MEASUREMENTS OF INTELLIGENCE IMPLY VALUATION

Our present measurements of intelligence rest on human judgments of value, judgments that product A is ‘better’ or ‘truer’ or ‘more correct’ than product B, that method C is ‘preferable’ to method D, or that C is ‘right’ while D is ‘wrong,’ and the like.

In some cases this is so clear that everyone must admit it. Thus in three of our best tests of intelligence, giving the opposites of words, completing sentences by supplying omitted words, and answering questions about a paragraph read, we make elaborate keys assigning credits to the different responses.⁶ These keys are obviously made by human judgments of the value of each response.

The credits given may represent valuations by the truthfulness or wisdom of the answers or sentences, by their grammatical form, by their rhetorical excellence, by their originality, by the rate of producing them, or by a subtle sense of their significance as evidence of intelligence.

⁶ For example, the task being to complete, ‘God made ... and ... let him pass for a man,’ we find among the responses of high-school graduates:
him, therefore; him, so; him, then; him, will; him, they; him, he; him, I; him, let;
man, always; man, then; man, God; man, has; man, he; man, therefore; man, please;
Adam, then; Adam, Eve; Adam, he;
animal, wouldn’t; Eve, God; us, we; heaven, earth;
and must assign some value to each, or make a dividing line between full value and no value somewhere.

In some cases the value is assigned so easily (as a simple deduction from, or following of, a general rule) that we may thoughtlessly assume that the response indicates intelligence regardless of any process of valuation.
For example, we may consider that in a test in arithmetical computation or problem solving, the right answers are signs of intelligence, regardless of what anybody thinks. A little thought will convince us, however, that in such tests the human judgment acts as truly as in a completion or paragraph-reading test. The main difference is that, having once for all decided that right answers are better than wrong answers, we do not raise the issue about any particular answer. We simply assume or make a general rule of valuation. The valuation becomes obvious if we collect all the responses made to an arithmetical task and ask whether all the different ‘rights’ are equally good or right, and whether all the different ‘wrongs’ are equally undesirable.⁷

⁷ In the special case where we arrange for Yes and No answers valuation is doubly active. We arrange so that a Yes or a No will be ‘good’ as a response. Then, since some of the correct ‘Yeses’ or ‘Noes’ may be due to chance, and since chance answers are deemed of no value, we plan our scoring so as to give the chance ‘Yeses’ and ‘Noes’ zero value.

One criterion of value, truth, is so widely used in framing, keying, and scoring tests of intelligence that it deserves comment, especially since there may be in the case of truth an objective criterion, power in prediction, by which our judgments of value are or should be determined. Two other criteria of value also need comment because they have been suggested explicitly or implicitly as direct criteria for intelligence. They are development with age and ability to learn.

TRUTH

Probably over half of our present tests of intelligence are tests where the response is given credit as a symptom of intelligence in proportion to its truthfulness. Such is the case, for example, with eight out of ten tests of the Otis Advanced; and with Army Alpha, 2, part of 3, 4, 6, 7, and 8.
It is more or less the case with Stanford Binet III, 5; IV, [the remaining Stanford Binet item references are illegible in the source]; and with National Intelligence A, 1, 3, 4, and B, 1, 2, 3, and 5.

One could make an attractive theory of intelligence and its measurement somewhat as follows: Intellect is concerned with facts, being the ability to see and learn the truth, to get true knowledge and use it to the best advantage. Truth is insight into the real world; the evidence that knowledge is true is its predictive power. Measures of intelligence are then ultimately measures of a man’s mastery of prediction, that 2 and 2 will be 4, or that it will be profitable to buy such and such a stock, or that a planet will be found having such and such a path. More immediately, they are measures of certain abilities which contribute to, or accompany, or indicate the existence of, the ability to get and use the truth.

By this theory we should rest our valuations of truth all on the ultimate test of power of prediction. One truth would be better than another in proportion as it predicted more facts, or more important facts, or predicted the same facts more accurately, or helped more in the acquisition of other truths. Our valuations of abilities as evidences of intellect would rest on their significance as symptoms of ability to get and use truth.

It seems sure, however, that people in general, psychologists, and framers of intelligence tests, alike mean by intellect something more than ability in truth-getting to improve prediction. They mean what Pericles and Washington and Gladstone had as well as what Aristotle and Pasteur and Darwin had. In the oral interview of the business man or physician to test intelligence, in such tests as Ebbinghaus’ completions, and in such a battery of tests as Army Beta, there is little obvious reference to prediction or truth-getting.
In the first case, the aim is rather to see how the person fits his thoughts and acts to little problems or emergencies; in the second, it was rather to give him a chance to use all the so-called higher mental powers; in the third, many tasks were selected in which people who were regarded as intelligent could do better than people regarded as dull, and those of them which most conveniently distinguished the alleged bright from the alleged dull were kept as the final choice. If these instruments do really measure ability at truth-getting, it is only indirectly and more by accident than by design.

It may be that truth-getting is what we unwittingly do measure by our intelligence tests, or what we ought to try to measure, but very few of those who devise or apply the tests think so. And it is surely wise to find out what we do measure before deciding that it is or ought to be truth-getting.

DEVELOPMENT WITH AGE

Binet had it in mind to discover those intellectual abilities which six-year-olds had that five-year-olds did not have, those which seven-year-olds had that five-year-olds and six-year-olds did not have, and so on. It might seem that, except for the one judgment that abilities were ‘better’ or represented ‘greater intelligence’ the later they came in this series of normal chronological process, the Binet measurement would be free from valuation.

However, valuation came in from the start because Binet tried only abilities which he valued as intellectual. He did not take all the psychological features of five-, six-, and seven-year-olds and choose as his series of tests those which separated the ages most distinctly. In revising Binet’s series Terman and others have paid less and less attention to lateness of development and more and more to significance as valued symptoms of intelligence in their choice of tasks. This is well.
For if Binet or they had collected a series of tasks such as showed the least overlapping of one chronological age on the next, the resulting series would be inferior as a measure of intellect to the series as it stands. For example, quality of handwriting, rate of tapping, and ability in checking A’s on a mixed sheet of capitals would probably show less overlapping with age than vocabulary, rate of reading, and ability in completing sentences. But they would be far less effective in diagnosing amount of intelligence.

Development with age would be a poor and partial criterion for intellect of any sort or degree, and for the higher ranges of intellect, say those above the 70-percentile intellect of the average of the white draft, or above the average ninth-grade pupil, it would be well-nigh worthless. It has never been so used. The Terman mental ages above 14, for example, are not functions of development with age, but of differences between individuals, regardless of age.

ABILITY TO LEARN

An obvious hypothesis, often advanced, is that intellect is the ability to learn, and that our estimates of it are or should be estimates of ability to learn. To be able to learn harder things or to be able to learn the same thing more quickly would then be the single basis of valuation. Success in solving arithmetical problems, or defining words, or completing sentences would then be good, simply and solely because it signified that the person had greater ability to learn.

If greater ability to learn means in part ability to learn harder things, we have excluded the vague general valuation of certain products and ways of producing only to include it again.
For we shall find ourselves selecting or defining A as harder to learn than B on the ground that only the more intellectual persons can do it, or on the ground that it requires a higher type of intellect, and shall find ourselves using those vague general valuations to pick the persons or describe the type of intellect required.

If greater ability to learn means only the ability to learn more things or to learn the same things more quickly, we have a view that has certain advantages of clearness and approximate fitness to many facts. Even less than in the case of truth-getting, however, do our present actual instruments for measuring intelligence measure directly a person’s ability to learn more things than another person can, or to learn the same things more quickly. The substitution test included in Army Beta, in the National Intelligence Examination and in some others, is about the only test of speed of learning that is used; and it is more than a learning test.

Much evidence will therefore be required before we can wisely replace our present multifarious empirical valuations by the formula that intellect is the ability to learn more things or to learn the same things more quickly.

The reduction of all valuations of response to valuation as symptoms of ability to learn more and more quickly thus seems too narrow a view. It has other defects. Were it true, we ought, other things being equal, to get better correlations with a criterion of intellect from tests in learning something new and from tests deliberately framed to measure how much one has learned in life so far, than from the existing batteries of miscellaneous tasks. This does not seem to be the case.
Quantitative data concerning individual differences in learning under experimental conditions are rather scanty, and their correlations with a criterion of intellect are scantier still; but what facts we have been able to gather do not show that, per hour of time spent, tests in learning predict the criterion as well as do the tests now in use. Tests framed to measure how much one has learned in life so far, such as vocabulary tests, information tests, or such Binet elements as ‘Knows whether he is a boy or a girl,’ and the like, are valuable, but not, so far as we can determine, more valuable than a composite containing also tests primarily of selective, relational, generalizing, and organizing abilities.

OTHER ATTEMPTED SIMPLIFICATIONS OF THE PROCESS OF VALUATION

Response to Novelty

In one way allied to the doctrine just described and in one way sharply contrasted with it, is the doctrine that a person’s intellect is measured by his ability to respond well to new situations, to do ‘originals.’ The importance of some such ability as this will, of course, be admitted. However, in view of the great difficulty of deciding just what situations are ‘new’ for any given individual; in view of the fact that ‘to respond well’ is likely to bring in many or all of our vague general valuations again; in view of the fact that distinctions among novel situations as ‘harder’ (that is, making greater demands on intellect) will have the same effect; and in view of the fact that our most approved present instruments include many tasks which seem as fittingly called responses to the familiar as to the new; in view of all this it seems best at present not to try to narrow our valuations to fit this theory.

Relational Thinking

Spearman has argued that intellect equals the apprehension of experience, the eduction of relations and the eduction of correlates.
The two processes are defined as follows: “The mentally presenting of any two or more characters (simple or complex) tends to evoke immediately a knowing of relation between them.” [23, p. 63.] “The presenting of any character together with any relation tends to evoke immediately a knowing of the correlative character.” [23, p. 91.]

There is no doubt that the appreciation and management of relations is a very important feature of intellect, by any reasonable definition thereof. Yet it seems hazardous and undesirable to assume that the perception and use of relations is all of intellect. In practice, tests in paragraph reading, in information, and in range of vocabulary, seem to signify intellect almost as well as the opposites and mixed relations tests. In theory, analysis (thinking things into their elements), selection (choosing the suitable elements or aspects or relations), and organizing (managing many associative trends so that each is given due weight in view of the purpose of one’s thought), seem to be as deserving of consideration as the perception and use of relations. Moreover, I fear that, in all four cases, we need other valuations to decide which are the better relations or more abstract relations, or the more essential elements, or the more sagacious selection, or the more consistent organization, or the more desirable balance of weights, and the like.

However this may be, our present tests of intelligence are not merely instruments to measure how little stimulus is required to produce a perception of a relation, or how many relations will be perceived from a given constant stimulus, or how quickly. And we may best study them as they are before dismissing the valuations on which they are based, in favor of any simpler and more objective system.
We shall then accept for the present the status of measurements of intellect as measures of different products produced by human beings or of different ways taken by them to produce the same product, each of these products and ways having value attached to it as an indication of intellect by a somewhat vague body of opinion whether popular or scientific.

THE CONTENT OR DATA OF TESTS OF INTELLECT

Presumably a man can use intellect and display the amount of it which he possesses in operations with any sort of material object, any living plant or animal, including himself, any quality or relation that exists in reality or in imagination, any idea or emotion or act. Our tests might draw upon anything for their material.

They have, in fact, greatly favored words, numbers, space-forms, and pictures, neglecting three-dimensional objects and situations containing other human beings. How far this has been due to convenience, and how far intellect is really best measured by its operations with words, numbers, space-forms, and pictures, is a matter that obviously deserves investigation. Our choices of test material have certainly been somewhat determined by convenience.

They have also favored ideas, general notions, abstractions, symbols and relations, to the relative neglect of percepts and particulars. This has been in the main deliberate, our general scheme of valuation attaching on the whole more intellectual worth to operations with generals and facts in relations than to particulars and facts in isolation.

The nature and extent of the specialization of intellect, according to the content or material operated on, has been and still is a matter of dispute; and the difference of opinion carries over into the practice of measurement. Some psychologists would be fairly well satisfied to measure intellect by a series of mazes alone; or by a series of sentence completions alone.
Others, the great majority, attach much more confidence to a battery of tests including surely both words and numbers, probably also some space-forms and perhaps some more concrete pictorial material.

THE FORM OF TESTS OF INTELLECT

Whether we consider the external appearance of the tasks or the internal nature of the processes in the person doing them, there is a great variety in respect to form, that is, to the operations performed with the words, numbers, pictures, and other content. Externally, there appear questions to be answered, sentences or pictures to be completed, errors to be found and corrected, definitions to be given and to be chosen, items to be matched, directions to be followed, disarranged parts to be put together, disarranged events to be put in proper sequence, keys or codes to be learned, true statements to be distinguished from false, items to be checked as fit by various criteria, items to be crossed out as unfit, and so on.

Internally, the individual finds himself striving to attend to certain matters, to fix others in memory, to recall what he knows about others, to select from many things or ideas the one which best satisfies certain requirements, to define the relation between two terms, to discover an element common to three or four given facts, to hold in mind many different facts and use them to some specified purpose, and to inhibit customary habits in view of some rule. He also finds himself in some cases (such as many elements of information tests, vocabulary tests, and arithmetical computations) utilizing a wide range of knowledge and skills.
Any system of units of measurement that is to be adequate must then apparently be flexible enough to apply to a wide variety of operations such as we may call attention, retention, recall, recognition, selective and relational thinking, abstraction, generalization, organization, inductive and deductive reasoning, together with learning and knowledge in general.

SCORING THE PRODUCTS OF INTELLECT

In the great majority of instruments for measuring intellect the score or rating is determined in part by the degree of difficulty of the tasks the individual can do successfully. Thus ‘There are three main differences between a president and a king; what are they?’ (Stanford Binet XIV, 3) is harder than ‘Are you a little boy or a little girl?’ (Stanford Binet III, 4). To complete 3 6 8 16 18 36 . . . (Alpha 6, 20) is harder than to complete 10 15 20 25 30 35 . . . (Alpha 6, 2). Psychologists and scientific and sensible people in general readily rank tasks as easy or hard for intellect and would accept the principle that, ‘Other things being equal, the harder the tasks a person can master, the greater is his intelligence.’ The concept of hardness or difficulty in intellectual tasks, as now used, is somewhat vague and variable. Its outstanding characteristic is that among a large group of persons varying in intelligence, the harder the task, the fewer will be the persons who can do it, and the more intelligent they will be.
Sometimes, however, tasks are called hard which really are only recondite, familiar to few; and sometimes tasks are called hard which really are only long.

We shall presently define this concept of the intellectual difficulty of a task, so as to make it more useful in science, but for the present we may leave it vague, the principle stated above being true for any reasonable definitions of ‘difficulty,’ and ‘intelligence.’

In many of the instruments for measuring intellect there are tasks which are of equal difficulty (or at least tasks so nearly equal that which of them is hardest is not certain). In the Binet series the tasks for any one year of age were supposed to be equally hard. In Alpha 7 only by statistical inquiry could one decide which of these is hardest, which next hardest and so on.

 6  love—hatred :: friend—            lover  mother  need  enemy
 7  wrist—bracelet :: neck—           collar  leg  foot  giraffe
 8  sailor—navy :: soldier—           gun  private  army  fight
 9  carpenter—house :: shoemaker—     hatmaker  wax  shoe  leather
10  shoestring—shoe :: button—        coat  catch  bell  hook
11  quinine—bitter :: sugar—          cane  sweet  salt  beets
12  tiger—wild :: cat—                dog  mouse  tame  pig
13  legs—man :: wheels—               spokes  carriage  go  tire
14  north—south :: east—              north  west  south  east
15  feather—float :: rock—            ages  hill  sink  break
16  grass—cattle :: bread—            man  butter  water  bones
17  fin—fish :: wing—                 feather  air  bird  sail
18  paper—wall :: carpet—             tack  grass  sweep  floor
19  food—man :: fuel—                 engine  burn  coal  wood
20  sled—runner :: buggy—             horse  carriage  harness  wheel
21  poison—death :: food—             eat  bird  life  bad
22  Japanese—Japan :: Chinese—        Russia  China  Japanese  pigtail
23  angels—heaven :: men—             earth  woman  boys  paradise
24  Washington—Adams :: first—        contrast  best  second  last
25  prince—princess :: king—          palace  queen  president  kingdom

Now if a test includes a dozen tasks absolutely equal in difficulty for people in general, any one person who gets some right will by no means always get them all right, and any one person who gets some wrong will by no means always get them all wrong. So a person’s score is partly determined by the number of tasks of equal difficulty that he does. We must then consider as a possible principle ‘Other things being equal, the greater number of tasks of equal difficulty that a person masters, the greater is his intelligence.’ This principle would not be accepted so readily as the principle about greater difficulty, and perhaps would not be accepted at all unanimously. ‘Knowing more things than someone else, and being able to do more things than someone else’ is not so clearly and surely having more intelligence as ‘being able to do harder things than someone else can do.’

The two things have been somewhat confused in general discussions and in the construction of measuring instruments because, by and large, a person increases the number of things he can do in large part by adding on harder ones, and also because the person who can do the harder can on the average learn those which the duller person can learn more quickly than he, and so learns more of them.
Consequently what we may call the level or height or altitude of intellect and what we may call its extent or range or area at the same level are correlated and either one is an indicator of the other. It will be best, however, to keep them separate in our thinking.

In many of the instruments for measuring intellect a person’s score is determined partly by the speed with which he can do the tasks. Even in batteries of tests where all candidates attempt all the tasks, speed may count, since the persons who do the easier tasks more quickly may have time to review some of the tasks and perfect their work. If speed deserves any weight in determining the measures of intellect it is by virtue of the principle that, ‘Other things being equal, the more quickly a person produces the correct response, the greater is his intelligence.’ Giving much weight to speed arouses decided objections in the laity and among some psychologists, and the principle just stated certainly would not be accepted as axiomatic. By and large, however, if A can do harder things than B can, A will do those things which B can do more quickly than B can. A certain moderate weight attached to speed will not then much decrease a test’s significance; and, per hour of time spent on testing and scoring, an even greater significance may perhaps be obtained by giving a liberal weight for speed than by giving none.

For the practical purposes of estimating intellect, a battery of tests in which level, extent, and speed combine in unknown amounts to produce the score may be very useful. For rigorous measurements, however, it seems desirable to treat these three factors separately, and to know the exact amount of weight given to each when we combine them.
We shall try to make the concepts of intellectual product, difficulty of producing an intellectual product, range of products produced, and speed of producing a product, more definite and precise, but without so altering them as to lose the elements which have given them practical value in the best current practice in measurement, or to weaken in any way their usefulness in measuring intellects as we actually find them by the tests which we have so far developed. We shall start with certain first approximations.

For a first approximation, let intellect be defined as that quality of mind (or brain or behavior if one prefers) in respect to which Aristotle, Plato, Thucydides, and the like, differed most from Athenian idiots of their day, or in respect to which the lawyers, physicians, scientists, scholars, and editors of reputed greatest ability at constant age, say a dozen of each, differ most from idiots of that age in our asylums.

Let an intellectual product, i.e., a product or response requiring, or depending on, intellect for its production, be defined as a product or response which, given the same external situation, the intellects in the half toward Aristotle are more likely to make than the intellects in the half toward the idiot. For example, if, when all Athenians of age forty were confronted by the question ‘Is a straight line the shortest distance between two points?’ the growth of the white blood corpuscles was equal for the Aristotelian and the idiotic halves, whereas the answer Yes was more prevalent in the Aristotelian half, we should rate the latter as a product depending on intellect, and the former as a product not depending on intellect.
Let the intellectual difficulty of producing a given intellectual product in response to a given external situation be defined as follows: Enough time being allowed for production so that an increase in time would not increase the number producing it, the difficulty for Athenians of forty is approximately greater the smaller the number of them who produce it, provided that the ranking of those who do produce it differs from the ranking of those who do not by greater nearness to the Aristotelian end. We could be much more rigid here by supposing a population to vary from the idiots to the Aristotles in amount of intellect only, being identical in all else. Then, if all conceivable productions of intellectual products in response to given external situations were ranked for difficulty, the order would be very closely that of rarity and of the nearness to Aristotle of those who achieved it. We could omit the ‘approximately,’ and the ‘provided that.’ Our definition has deliberately been left loose, since we do not know exactly what it is in which Aristotle differs most from the idiot, much less can we know in the case of any group of actual individuals that they are identical in all else than it.

The range of products produced at any one level, i.e., of products which are equally hard to produce, is defined simply by their number. What we may call the relative range at any level may be defined as the percent or fraction of the products at that level which can be produced by the intellect in question.

The speed of producing any given product is defined, of course, by the time required.
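
The definitions just given can be illustrated by a minimal sketch in Python. It is offered only as one possible reading, not as the procedure used in this volume: the function names `difficulty_order` and `relative_range`, the numerical `standing` that places each person toward the Aristotelian end, and the use of a simple average to test the proviso are all assumptions made for the illustration, and the data are invented.

```python
from statistics import mean


def difficulty_order(results, standing):
    """Rank tasks from easiest to hardest by rarity of success.

    results:  {task: {person: True/False}}, ample time assumed for every task
    standing: {person: number}, larger = nearer the 'Aristotelian' end
              (a hypothetical stand-in for the ranking of intellects)
    A task is kept only if those who succeed stand, on the average, higher
    than those who fail (the 'provided that' clause of the definition).
    """
    ranked = []
    for task, outcome in results.items():
        passers = [standing[p] for p, ok in outcome.items() if ok]
        failers = [standing[p] for p, ok in outcome.items() if not ok]
        if passers and failers and mean(passers) <= mean(failers):
            continue  # proviso not met: rarity here is not a sign of difficulty
        ranked.append((len(passers), task))
    # More persons succeeding means an easier task, so sort by count, descending.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [task for _, task in ranked]


def relative_range(tasks_done, tasks_at_level):
    """Fraction of the tasks at one level which a given intellect can do."""
    done_here = set(tasks_done) & set(tasks_at_level)
    return len(done_here) / len(tasks_at_level)


# Invented illustrative data: four persons, three tasks.
standing = {"p1": 1, "p2": 2, "p3": 3, "p4": 4}
results = {
    "easy": {"p1": False, "p2": True, "p3": True, "p4": True},
    "hard": {"p1": False, "p2": False, "p3": False, "p4": True},
    "odd": {"p1": True, "p2": True, "p3": False, "p4": False},
}
order = difficulty_order(results, standing)
share = relative_range({"t1", "t3"}, ["t1", "t2", "t3", "t4"])
print(order, share)
```

The task called "odd" is dropped because it is done by the lower-standing persons only, which is the kind of case the proviso is meant to exclude. The result is an order of difficulty and nothing more; it assigns no amounts of difficulty.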
It will be convenient to use the word task to mean the production of a given product in response to a given external situation, and to speak of the difficulty of tasks, the number of tasks of a given difficulty that can be done, and the speed of doing a given task.

We now have intellect defined by a ranking of men whose differences therein are roughly appreciated as we appreciate the differences of the world’s varied objects in volume (only much more roughly). We have intellectual tasks and products defined in a catholic way that would, for instance, probably include every task in all the stock instruments in use by psychologists to-day. We have difficulty defined objectively so that a series of tasks could be approximately ranked as to their respective amounts of difficulty for any specified group.

If we list all tasks, find the difficulty of each, apply an intellect to them, observe which it can do, and how long it requires to do each, we have measured how hard tasks it can do, how many it can do at each level, and how quickly it can do them. If we use in place of a complete list of tasks a fair sampling from them, we have attained the same end, subject to the error of our sampling.

The new problems of theory and technique in the measurement of intellect, that is, the problems not soluble by the general methods of measurement in any science, concern the measurement of difficulty of task. Extent and speed are measurable in two of the most perfect units there are, number and time. In the case of difficulty, however, we have so far provided only for an inventory of intellectual tasks and their arrangement in an order of difficulty.

Their differences in amount of difficulty and the differences between the amount of difficulty of any one of them and some zero point of difficulty (some task which is just below a task of infinitesimal difficulty), are not determined. To find ways of determining these will be our main work.
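
As a rough illustration of the three-fold inventory described above (how hard, how many at each level, how quickly), the following Python sketch tabulates one intellect's record. It is an assumption-laden example only: the name `profile`, the integer level labels, and the sample times are hypothetical, and the levels are treated as ranks whose intervals are unknown, in keeping with the remark that amounts of difficulty are not yet determined.

```python
def profile(level_of, seconds_for_done):
    """Summarize one intellect's record as level, extent, and speed.

    level_of:         {task: ordinal level}, 1 = easiest; ranks only, since
                      the distances between levels are not yet measured
    seconds_for_done: {task: seconds} for the tasks done correctly; tasks
                      absent from this mapping count as failures
    """
    done = list(seconds_for_done)
    levels = sorted(set(level_of.values()))
    # Extent: the number of tasks done at each level (a count, a true unit).
    extent = {lv: sum(1 for t in done if level_of[t] == lv) for lv in levels}
    # Level: the hardest rank at which anything was done (None if nothing was).
    hardest = max((level_of[t] for t in done), default=None)
    # Speed: the time required for each task done (time, a true unit).
    return {
        "hardest_level": hardest,
        "extent_by_level": extent,
        "seconds": dict(seconds_for_done),
    }


# Invented illustrative record: five tasks at three levels, three of them done.
level_of = {"t1": 1, "t2": 1, "t3": 2, "t4": 2, "t5": 3}
seconds_for_done = {"t1": 12.0, "t2": 20.0, "t3": 45.0}
result = profile(level_of, seconds_for_done)
print(result)
```

Extent and speed come out in units of number and time, while the level is reported only as a rank, which is exactly the gap the text goes on to address.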
Before attempting it, however, we may best consider certain further facts about difficulty, extent and speed in the production of intellectual products, and certain consequences of our analysis of a measurement of intellect into this three-fold determination.

FURTHER FACTS CONCERNING DIFFICULTY

We have defined intellectual difficulty in relation to a defined group of individuals. How far the rank order for difficulty obtained in the case of one human group will hold for others, or for a group of dogs or of chickens, is a matter better ascertained by experiment than prejudged. Difficulty in our treatment is always difficulty for some specified group of intellects, such as our Athenians aged forty. We can, if we wish, specify the group as all human beings of all ages, or all animals, and so get measurements of something which we might call difficulty in general. The value of such a measure will, however, depend largely on the closeness of correspondence between the rank orders for the same series of tasks at different ages, in different civilizations, and so forth. If these are very low, the measurement of such difficulty in general may be of very little use.

Many cases of grouping, as by age, by amounts of general education, by amounts of special education, or by city and country environment, are of great importance. Two may be considered briefly now as samples, namely, grouping of those of equal chronological age by amounts of intellect, and grouping of those of equal intellect by chronological age. If certain tasks are of difficulty $k$, $k + a$, $k + a + b$, $k + a + b + c$, etc., for 12-year-olds of low or small intellects, say the bottom tenth of twelve-year-olds, how far will they retain the same relations in respect of difficulty in the case of the top tenth? If certain tasks are of difficulty $k_1$, $k_1 + a$, $k_1 + a + b$, etc.,
for the eight-year-olds of a certain degree of intellect, how far will they retain the same difficulty relations for sixteen-year-olds of the same degree of intellect?

We have eliminated speed entirely from influence upon the measurement of difficulty, by our condition that such a time allowance be given for the task that no further increase in time would alter the production. In practice, this would only be approximated. Obviously we must not make the time so long that during it the intellect in question changes appreciably by growth or training. We should not leave individuals to strive for ten hours to complete: ‘The body . . . gives light . . . the . . . is the sun,’ because once in ten thousand times, some child who failed during nine hours succeeded in the tenth. This would be a valuable experiment, but we have far more valuable ways of using ten hours of his time. What we are really concerned about is to avoid rating one task as harder than another merely because it is longer, so that the poorer intellects do it less quickly than the others, and so, within a too short time limit, show a spuriously greater percentage of failures.

We have made the requirement that the intellectual ranking of those who do produce the response shall be higher than that of those who fail. Usually this requirement is unnecessary. It can, that is, usually be assumed that the good or correct response will be obtained by the better intellects more often than by the poorer. It is inserted to provide against cases where the better intellects are subject to some constant error so that they give fewer correct responses than the dull do, or where other factors than intellect distort the percent of rights from what it would be if everything but intellect were equalized.
For example, it is conceivable that, if (a) and (b) below were given to a random sampling of intellects,

Underline the right answers:
(a) $4^{-1}$ equals: 1/4, 3, 5, [fourth option illegible in the source]
(b) $4^{1/2}$ equals: 2, [second option illegible in the source], 8, [fourth option illegible in the source]

ratings for difficulty by the percents correct would be very much in error. The percent for (a) would probably be lower than for (b) because, lacking knowledge of exponents, the more intelligent one was, the more likely one would be to report 3 for (a) (valid if $4^{-1}$ means $4 - 1$), and to report 2 for (b) (valid if $4^{1/2}$ means 4 halves or $4 \times \tfrac{1}{2}$).

We have treated the task as being to produce a certain product. It is scored, consequently, as done or not done, success or failure, right or wrong. Now when any task for intellect is set there are often many different responses varying in ‘goodness’ or correctness. In such cases, our method requires that in determining the difficulty of the task, a dividing line be set somewhere.[8] Our method will not, however, prevent us from later using different credit values in a scoring plan for such a task and taking full advantage of whatever added value these more detailed credit values may have in estimating an individual’s intellect.

It may be noted further that a task may consist of various combinations and complications of other tasks. Thus the task may be to get the right answer to 8 + 3, or to get the right answer to 11 + 7, or to get the right answers to both 8 + 3 and 11 + 7, or to get the right answers to 8 + 3 and 11 + 7 and also 18 + 4, or to get the right answer to: Find the sum [column of figures garbled in the source], which ordinarily involves the above, plus knowledge of certain words and procedures, and control over certain habits, such as holding numbers in mind, and adding a seen to a thought-of number.

We are now in a position to state one theorem of the measurement of intellect.
Let difficulty be defined as above, then:

Theorem I: Other things being equal, if intellect A can do correctly all the tasks that intellect B can do save one, and in place of that one can do one that is harder than it, intellect A has the higher level.

[8] What seems to be one task to the person tested may be used as two or more tasks by scoring it first with the dividing line at one place, and second with the dividing line at another.

One is tempted to go further and assume that, other things being equal, if A and B can do correctly the same number of tasks, A has the higher level, if the average difficulty of the tasks he can do is greater than the average difficulty of the tasks B can do. This cannot, as yet, be wisely assumed, first because we do not know that we have any right to average measures of difficulty,[9] and secondly because, even if we could, it is not safe to assume that as much intellect is required to do 10 tests each of difficulty 20 as to do one task of difficulty 200.

On the other hand one is tempted to suggest the measurement of an intellect by the hardest things it can do, assuming that since it can do these, it could do all easier, as we assume that one who can jump over a bar 6 feet high could surely jump over bars at 5 ft. 10 in., 5 ft. 8 in., and so on. The possible variety and specialization of intellectual tasks makes this uncertain.

WIDTH OR EXTENT OR RANGE

Our definition of greater difficulty enables us also to define equal difficulty and so to make a fairly rigorous definition of width or extent or range by making it separately at each level of difficulty. For any specific group G and any specific time t those tasks are equally difficult which are done correctly by equal percentages of intellects.

Consider then all the tasks which are of a certain difficulty D. Some intellects will fail with all of them.
Among the intellects which succeed with some of them we may make comparisons according to the number succeeded with. Such a statement as ‘N tasks, of equal difficulty D, being given, with t time allowed per task, A did 0.1N while B did 0.2N and C did 0.3N,’ is clear and useful. We can say that B did twice as many as A, that C exceeded B in the number done as much as B exceeded A, and that the average for A and C was the same quantity as the score for B. Where the problem concerns the extent of an ability, as in the number of certain facts that are known in history or science or the number of certain procedures that are mastered in arithmetic or carpentry, it is often, perhaps usually, desirable to free the measurement from differences in difficulty by making the tasks equal in difficulty and measuring extent at that level.[10]

[9] We have provided for determinations of which one of two or more tasks is the more difficult, but not, as yet, for determinations of how much more difficult it is.

In the measurement of intellect, measurements of extent at each level are obviously instructive for many purposes. The inventory of what intellect A can do is improved by being classified into Tasks 1, 2, and 4 at level D1; 16, 19, 27, and 28 at level D2; 37, 43, 48, 49, and 56 at level D3; and so on.

We can set down as Theorem II: Other things being equal, if intellect A can do correctly all the tasks that intellect B can do, and can also do one more task at the level of any of the others, intellect A has a greater range than intellect B has. We could also safely say that A is a better or more useful intellect; whether we can rightly say that it is greater than B is more doubtful. The latter seems to imply that superiority in extent can be made commensurate with, weighed in the balance against, superiority in level. For the present let us leave the question open.
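The classified inventory and the comparison stated in Theorem II can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the level labels `D1`, `D2`, `D3`, the second inventory `inventory_b`, and the helper names are hypothetical and are not part of the original text; intellect A's task numbers follow the inventory quoted above.

```python
# Inventories map a difficulty level to the set of task numbers done correctly.
# inventory_a follows the inventory quoted in the text; inventory_b is an
# assumed comparison case that lacks one task (No. 28) at the second level.
inventory_a = {"D1": {1, 2, 4}, "D2": {16, 19, 27, 28}, "D3": {37, 43, 48, 49, 56}}
inventory_b = {"D1": {1, 2, 4}, "D2": {16, 19, 27}, "D3": {37, 43, 48, 49, 56}}


def extent_by_level(inventory):
    """Number of tasks done correctly at each level (extent at that level)."""
    return {level: len(tasks) for level, tasks in inventory.items()}


def has_greater_range(a, b):
    """Theorem II check: a does every task b does, and at least one more
    task at a level where b already does some task."""
    levels = set(a) | set(b)
    covers_b = all(b.get(level, set()) <= a.get(level, set()) for level in levels)
    extra_at_shared_level = any(
        a.get(level, set()) - tasks for level, tasks in b.items() if tasks
    )
    return covers_b and extra_at_shared_level


print(extent_by_level(inventory_a))
print(has_greater_range(inventory_a, inventory_b))
```

The check deliberately says nothing about level or speed; it compares extent only, which is all that Theorem II asserts.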
[10] It should be noted that a number of tasks of equal difficulty may be given in a test instrument, not with any intention of measuring extent of intellect at that level for its own sake, but simply in order to obtain a more accurate measure of the level itself. For example, suppose that in instrument x we have tasks at ten levels D1, D2, D3, . . . D10, one at each. Suppose that in instrument y we have ten at each level. Then by whatever convention we determine how hard a task a person can do, we shall determine it much more exactly by instrument y than by instrument x.

SPEED

There is of course no essential difficulty in measuring the time required for intellect A to produce a certain product. Number and time figure in mental measurements as they do in physical measurements. The units of number and time are indeed so much more convenient and intelligible than units of difficulty that there is a strong natural tendency in those who devise instruments for measuring intellect to let their measurement depend largely upon the number of correct responses and the speed of producing them.

In the instruments that are actually used, it is customary to have the time a mixture of (1) the time spent in doing some tasks correctly, (2) the time spent in doing other tasks incorrectly, and (3) the time spent in inspecting other tasks and deciding not to attempt them. This confusion may be permissible, or even advantageous, in the practical work of obtaining a rough measure of intellect at a small expense of time and labor and skill, but for theory at present and for possible improvement of practice in the future we need to separate the speed of successes from the speed of failures.

To the number of tasks correctly done at each level we may add a record of the time for each or of the average time for all at that level.
Since to save time in intellectual production is a ‘good,’ we may frame Theorem III as follows: Other things being equal, if intellect A can do at each level the same number of tasks as intellect B, but in a less time, intellect A is better. To avoid any appearance of assuming that speed is commensurate with level or with extent, we may replace ‘better’ by ‘quicker.’

THE RELATIVE IMPORTANCE OF ALTITUDE, EXTENT, AND QUICKNESS OF INTELLECT

Each of these three factors is essential. If it required an infinite time per task, an intellect would produce no product at any level no matter how high its potentialities as to altitude and extent might be. If it had zero extent at all levels, it would not matter how high its potentialities as to altitude or how quickly it could do nothing.

In the ordinary sense of the word, however, altitude or level is by far the most important. The chief evidence for this is that it alone is indispensable, irreplaceable by anything save itself. If the best available intellect can do only things of level D[?], we cannot get things of a higher level D[?] done at all. [The level subscripts in this paragraph are illegible in the source.] If the best available intellect can do only 72 things at level D[?] and we need to get 144 things at that level done, we need only to get other intellects at work, say one that can do 45 of the balance and another that can do the remaining 27.
If the best available intellect can do only 10 tasks per minute at level D[?] and we need to get 20 done per minute, we can hire five common people who can do two a minute to help. Indeed, we shall be wise to hire ten common people to do two a minute each, and leave the best available intellect to put its time on tasks far above level D[?]. [The level subscript is illegible in the source.]

[Fig. 3. The measure of a superior intellect.]

Common sense recognizes the greater importance of altitude. It rates a Pasteur far above the most widely competent general practitioner. It does not ask how quickly Milton could give opposites, or turn out doggerel rhymes. Probably Pasteur was very much above the average in extent of intellect; probably Milton could have written as good poetry as A can write and much faster than B can. But common sense considers extent and quickness as unimportant in comparison with reaching a level far above the average.

From the economic and philanthropic points of view, altitude is enormously more important.
If an intellect could be hired from Mars of so high level that it could learn how to prevent war as easily as Jenner learned how to prevent smallpox, a million dollars a day would be a cheap wage for the earth to pay him.

Our analysis of the measurement of intelligence may be represented by space and number as follows:

Let one sixteenth of a square inch represent one intellectual task. Let those equal in difficulty be placed in the same row across the page; let the order of the rows from the bottom to the top of the page be the increasing order of difficulty; let the square be shaded if the individual in question cannot do it; if he can do it, let it bear a number representing the time he requires to do it. For illustration, we have assumed that there are 320 tasks and that they are of 20 levels of difficulty, 16 at each level.

Figures 3 and 4 then represent the measurements or inventories of two specimens of intellect. Such measurements or inventories may be abbreviated by using a random sampling of tasks at each level, or by using only every other level or every third or every fourth level, or in other ways.

Only one thing is needed to make such measurements submissible to the arithmetic and calculus of science in general. That is the expression of the altitude of each level (now merely a rank) as an amount of difference from the altitude of the others and from some group of tasks which require intellect, but so little of it that they border on a true zero of difficulty which may be set as their lower limit. This is the fundamental problem of mental measurements.

[Fig. 4. The measure of an inferior intellect.]

CHAPTER II

THE MEASUREMENT OF DIFFICULTY

THE PRESENT STATUS

If the members of a group are tested with each of a number of intellectual tasks, these can be put into an approximate rank order for intellectual difficulty for that group by the percentages of successes. For example, tasks A to G below are arranged by steps in an approximate order of intellectual difficulty for the group, “persons at the time of graduation from Grade 8 in City A in 1924.” The same sort of validity that attaches to the statement that G exceeds A in intellectual difficulty for that group, attaches also to the statement that G so exceeds F, that F so exceeds E, and so on.

    Write words on the dotted lines so as to make the whole sentence true and sensible. Write one word on each inch of dots.

    A. [illegible] ...... lift a heavy box.
    B. The rose is a favorite ...... because of ...... fragrance and ...... [partly illegible]
    C. A body of ...... entirely surrounded by ...... is called an ......
    D. [illegible] effort and a long ...... but the result is sure.
    E. You may safely conclude that you ...... in yourself the means of ...... at the truth.
    F. [illegible] hard things [illegible] because [illegible] hard.
    G. Judicial decisions are of ...... or less authority [illegible] to circumstances.

The same sort of evidence and argument which decides that G has more intellectual difficulty than A, was used to place B, C, D, E, and F in order between A and G. The evidence is the number and nature, in respect of intellect, of those who succeed with each task. The harder the task, the fewer the persons who succeed at it, and the more intellect they have.
The argument implicitly involved is (1) that whether a person succeeds or fails in such tasks is determined largely by the amount of intellect which he possesses, and not greatly by anything other than intellect, and (2) that in the hardest tasks which a person masters, he uses in general nearly all the intellect which he has.

The argument is sound enough to justify such a rank order as the A, B, C ... G order shown above, or the order of a series made of Stanford Binet tests for Mental Age 10, Mental Age 12, Mental Age 14, Mental Age Adult, and Mental Age Superior Adult, but we shall find trouble if we try to make a very close ordering, or to use the percentages of successes for other than approximate rankings.

The exact determination of a rank order of test elements for intellectual difficulty requires that the individuals in the group be tested with each of the tasks under similar conditions, including interest and effort, which is a matter of general scientific care that needs no further discussion here. It requires also that each of the tasks in the series shall be ‘intellectual’; and this requirement will eventually need very elaborate discussion. We shall, indeed, find that it is desirable to define an intellectual task as one in which the person tested uses all the intellect he then has, and in which he differs from other persons in nothing save the amount of intellect used.

If, however, we applied any such rigorous definition now, we should be unable to deal with any elements of any tests ever used in measuring intellect, since not a single one of them is a task which depends on intellect in its entirety, and differentiates individuals with no disturbance by anything other than intellect. A test element which did so would correlate 1.00 with a perfect criterion.
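The approximate ranking of tasks by percentages of successes within one group may be sketched as follows. The data are invented purely for illustration (three hypothetical tasks labelled A, D and G, each tried by ten persons), and the function name `rank_by_percent_success` is likewise an assumption, not something taken from the text.

```python
def rank_by_percent_success(results):
    """Rank tasks from easiest to hardest for one group.

    results maps a task label to a list of outcomes, one per person tested
    (True = success, False = failure). Returns the rank order and the
    percentage of successes for each task.
    """
    percents = {
        task: 100.0 * sum(outcomes) / len(outcomes)
        for task, outcomes in results.items()
    }
    # A higher percentage of successes means a lower rank in difficulty.
    order = sorted(percents, key=percents.get, reverse=True)
    return order, percents


# Hypothetical outcomes for ten persons on three tasks.
results = {
    "G": [True] * 2 + [False] * 8,
    "A": [True] * 9 + [False] * 1,
    "D": [True] * 5 + [False] * 5,
}
order, percents = rank_by_percent_success(results)
print(order, percents)
```

Such a ranking holds only for the group tested, and, as the text cautions, the percentages support approximate rankings rather than amounts of difference in difficulty.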
In order to maintain continuity with previous work, we shall first treat each test element as if correct response to it was caused by intellect intact and uncontaminated by aught else.

Two methods have been used to measure the difficulty of intellectual tasks. The first has used the judgments of teachers, psychologists, and other judges of presumable competence, relying on some assumptions such as that if K percent of competent judges rate A as harder than B, and B as harder than C, the differences in difficulty, A-B and B-C, are equal. We shall report our investigations of this method later; for the present, it may be disregarded.

The second has used the percentages of some defined group of individuals who succeed with each task, relying on some assumptions about the form of distribution (in that group) of the ability involved in doing such tasks.

For example, suppose that we knew that ten thousand individuals were, in respect to levels of intellect, distributed as follows:

    At level x                         10
    At level x + k                    100
    At level x + 2k                  1000
    At level x + 3k                  1190
    At level x + 4k                  1000
    At level x + 5k                  1000
    At level x + 6k                  2000
    At level x + 7k                  2000
    At level x + 8k                  1000
    At level x + 9k                   500
    At level x + 10k or higher        200

and suppose that the ten thousand were tested with various intellectual tasks with the following result:

    10000 had Task No.  6 right
     9990 had Task No. 18 right
     9990 had Task No. 19 right
     9890 had Task No. 15 right
     9890 had Task No. 20 right
     6700 had Task No. 22 right
     3700 had Task No. 28 right
     1700 had Task No. 29 right
      700 had Task No. 30 right
      200 had Task No. 31 right
      200 had Task No. 33 right

Using these results at their face value, we should conclude that within this group No. 6 can be done by intellects of level x, and perhaps of lower levels; Nos. 18 and 19 require level x + k; Nos. 15 and 20 require level x + 2k; No. 22 requires level x + 5k; No. 28 requires level x + 7k; No. 29 requires level x + 8k; No. 30 requires level x + 9k; Nos. 31 and 33 require level x + 10k or higher. Nos. 31 and 33 are thus 1k harder than 30, which in turn is 1k harder than 29, which in turn is 1k harder than 28, which in turn is 2k harder than 22, which is 3k harder than 15 or 20, and so on.

THE MEASUREMENT OF DIFFERENCES IN DIFFICULTY BY WAY OF KNOWLEDGE OF THE FORM OF DISTRIBUTION OF THE VARIATIONS OF AN INDIVIDUAL IN LEVEL OF INTELLECT [1]

An individual does not display the same level of intellect at all times and seasons. He varies around his average status. If we know the real form of distribution of his variations in level, we can use it to compare the differences of tasks in real difficulty, just as we use knowledge of the form of distribution of a group.

[1] We began our search for means of measuring differences in difficulty by inquiring whether the real form of distribution of the real abilities of the individuals represented in a single array in a correlation table might not be determined with greater certainty than the form of distribution of the group as a whole. This is indeed often the case; and the use of a group sorted into arrays has much to recommend it. The consideration of the factors which do influence the form of distribution of the real ability of the individuals in an array led us to a broader view of the means of scaling difficulty of task and level of intellect. The form of distribution of the real abilities in an array is determined by three causes: (1) the form of distribution of an individual’s variations around his own average; (2) the relation of an individual’s variability to his amount of ability; and (3) the form of distribution of the entire group from which the array is sorted out by its correlation. It will be shown that if we can determine the facts for any one of these, we can transmute certain differences in rank into differences in amount. The transmutations by (1) and (2) are almost, if not quite, independent of those by (3) in respect of facts and assumptions, and so provide a check of great value. The use of an array instead of a total group utilizes all three methods together in a way that has many advantages. We shall not, however, make use of this method in the main body of our work.

Suppose, for example, that an adult is measured with a hundred examinations, each consisting of 100 tasks, ten at each of ten levels, l1, l2, l3, l4, etc., of increasing difficulty. Suppose that the highest level at which he attains 50 percent correct [2] is level l[?] in 27 cases, level l[?] in 50 cases, and level l[?] in 23 cases. [The subscripts of these three successive levels are illegible in the source.] Then, if we know the form of distribution of his variability, we can compare the differences between these successive levels in terms of amount. We could thus put into comparison the differences between any two levels between his upper and lower limits.

This method will be very useful as a check upon estimates of difficulty via the form of distribution of a perfectly measured group, because it is so independent thereof. The forces which make one individual vary from one time to another are probably almost, if not quite, different from the forces which make one individual vary from other individuals when all are perfectly measured.

We have therefore studied the variability of an individual in repeated tests of intelligence and other mental abilities at some length, trying to discover the form which its distribution would have if the ability in question were measured by a scale of truly equal units, instead of the arbitrary scales which we have. The results, which appear in Appendix I, are substantially unanimous in fitting the hypothesis that, omitting such extreme conditions (for example, being asleep or being seriously ill) as prevent an individual from being tested at all, depressing conditions are neither more frequent nor greater in their effects than elevating conditions, the real variability being symmetrical.
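The hundred-examination illustration can be worked through numerically if a definite form of distribution is supposed. The sketch below assumes, purely for illustration, that the individual's momentary level is normally distributed around his average; the text itself claims only symmetry at this point. It also assumes that "the highest level attained" marks the momentary level as lying at or above that level and below the next. The function name and the ordering of the counts (lowest level first) are hypothetical.

```python
from statistics import NormalDist


def thresholds_in_sd_units(counts):
    """Locate the boundaries between successive levels on a scale whose unit
    is the S.D. of the individual's own variability.

    counts[i] is the number of occasions on which level i (lowest first) was
    the highest level attained. A boundary is placed at the normal deviate
    that cuts off the proportion of occasions falling below it.
    """
    total = sum(counts)
    cumulative = 0
    boundaries = []
    for count in counts[:-1]:
        cumulative += count
        boundaries.append(NormalDist().inv_cdf(cumulative / total))
    return boundaries


# 27, 50 and 23 occasions at three successive levels, as in the text.
z_middle, z_upper = thresholds_in_sd_units([27, 50, 23])
gap = z_upper - z_middle
print(round(z_middle, 2), round(z_upper, 2), round(gap, 2))
```

Under these assumptions only the two upper levels can be placed, since no occasion fell below the lowest of the three; placing further levels would require occasions distributed over more of them.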
For example, 60 pupils in Grades 4, 5, and 6 were tested with Stanford Binet, National A, National B, Otis Advanced,[3] Myers Mental Measure, Haggerty Delta 2, Illinois, and certain parts of Dearborn. Each score was first turned into a deviation from the median for the group in that test, in terms of the variability of the group in the test in question. Then it was expressed as a deviation from the average of the eight such deviation-scores for the individual in question. These last deviations represent the variability of an individual around his average ability in intelligence tests. Their distribution is as follows:

    Deviations        Frequencies
    -165 to -194            0
    -135 to -164            2
    -105 to -134            1
     -75 to -104            9
     -45 to  -74           40
     -15 to  -44          124
     -14 to  +14          137
     +15 to  +44           86
     +45 to  +74           41
     +75 to +104           10
    +105 to +134            3
    +135 to +164            2
    +165 to +194            1

[2] Or 60%, or 90%, or whatever percent we are using as a measure in our experiments.

[3] Some had the Otis Primary instead. For these, estimated scores in the Otis Advanced were computed.

Sixty-five pupils in Grades 8 to 12 were tested with Alpha Form 5, Alpha Form 8, Terman Group Test Form A, Terman Group Test Form B, and half of Part I of the Thorndike Examination for High-School Graduates. These five scores for each pupil were treated just as the eight scores described in the previous paragraph, except that the final deviations are deviations from the individual’s median instead of from his average.
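The two-step treatment of the scores described above may be sketched as follows. The sketch assumes the standard deviation as the measure of "the variability of the group," which the text does not specify; the pupils, scores, and function name are hypothetical. Passing `individual_center=median` gives the variant used for the second group of pupils.

```python
from statistics import mean, median, pstdev


def individual_deviations(scores, individual_center=mean):
    """scores maps each pupil to a list of raw scores, one per test, with the
    tests in the same order for every pupil.

    Step 1: each score becomes a deviation from the group median in that
            test, in units of the group's variability (assumed here: S.D.).
    Step 2: each deviation-score is expressed as a deviation from the pupil's
            own central value (average by default) over all tests.
    """
    pupils = list(scores)
    n_tests = len(scores[pupils[0]])
    group_medians = [median(scores[p][t] for p in pupils) for t in range(n_tests)]
    group_sds = [pstdev([scores[p][t] for p in pupils]) for t in range(n_tests)]

    deviations = {}
    for p in pupils:
        standardized = [
            (scores[p][t] - group_medians[t]) / group_sds[t] for t in range(n_tests)
        ]
        own_center = individual_center(standardized)
        deviations[p] = [value - own_center for value in standardized]
    return deviations


# Hypothetical raw scores for three pupils on two tests.
scores = {"p1": [10, 50], "p2": [20, 70], "p3": [30, 60]}
devs = individual_deviations(scores)
print(devs)
```

With deviations from the individual's average, each pupil's deviations sum to zero; with deviations from his median, one deviation per pupil is zero whenever the number of tests is odd, which is why the second distribution includes the zero deviations of the medians themselves.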
The resulting distribution was as follows (including the 65 zero deviations of the medians themselves):

    Deviations        Frequencies
    -110 to -129            1
     -90 to -109            2
     -70 to  -89           10
     -50 to  -69           10
     -30 to  -49           34
     -10 to  -29           47
      -9 to   +9          117
     +10 to  +29           55
     +30 to  +49           27
     +50 to  +69           1? [partly illegible in the source]
     +70 to  +89           [illegible]
     +90 to +109           [illegible]
    +110 to +129           [illegible]

We can then use the method to compare amounts of difficulty up and down from an individual’s average level. Our results also seem to fit the hypothesis that the real variability of an individual, under the conditions stated above, fits a probability surface limited at about + and - 3 S.D. better than it fits any other one surface. They are not, however, such as justify a rigorous quantitative treatment of fit.

THE RELATION OF THE VARIABILITY OF AN INDIVIDUAL TO HIS AMOUNT OF ABILITY

The facts of the previous section enable us to compare and equate differences in difficulty for an individual of average level l[?] [the subscript of this reference level is illegible in the source], using of course many such individuals to gain exactitude, or general reference, or both. These differences will be restricted in range to levels not very far below or above l[?]. If we wish to compare and equate them with differences outside that range, we must use individuals of average level higher than l[?] and individuals of average level lower than l[?]. To use them we must know how much less or greater their real variability is than that of our starting group with average at l[?].

We have, therefore, made very extensive investigations of the relation of an individual’s amount of variability to his amount of ability. These are reported in Appendix II. Only their general nature and results will be stated in this section.
Here as elsewhere we distinguish sharply between (a) the apparent, or face-value, relation observed between the variability of an individual’s separate scores and his average score, and (b) the real relation that would be observed if these scores were transmuted into measures such that 1, 2, 3, 4, 5, 6, 7, etc., represented a real arithmetical progression of amounts of the ability. For example, we find that twenty individuals each of whom took (after two preliminary trials, to eliminate the practice effect) from eleven to thirteen forms of Part I of the Thorndike Intelligence Examination for High School Graduates, showed the results of Table 1. If the scores are taken at their face value, it appears that the variability of an individual whose median score is about 105 (from 100 to 113) is very nearly the same as the variability of an individual whose median score is about 128 (125 to 132). If, however, the units of the scoring scale from 90 to 120 really represent smaller increments of ability than the units from 120 to 145, the real variability of an individual of ability 105 is less than the real variability of an individual of ability 128, and conversely, if the units of the scoring scale from 90 to 120 really represent larger increments of ability than the units from 120 to 145.

We thus record the face-value-score results for many different sorts of tests of intelligence,[4] noting in each case any facts about the construction of the tests which concern the probability that its units progressively swell or shrink in ‘real’ value over any considerable fraction of the range we are concerned with. We note especially the results in those cases where there is no reason to expect swelling more than shrinking.
The average relations between variability and ability found in these cases may be taken to represent approximately the real relation, until some one produces evidence that, in all or nearly all tests for the ability in question, there are forces leading psychologists, quite without intention, to devise scoring plans which make for progressive swelling or shrinking of units.

[4] We have secured extensive data concerning Army Alpha, Examination A, Army Beta, Stanford Mental Age, the National Intelligence Test, the Otis Advanced Test, the Haggerty Delta 2, the Myers Mental Measure, the Kelley-Trabue, the Stanford Binet, the Terman Group Test, the I.E.R. Test of Selective and Relational Thinking, the I.E.R. Test of Generalization and Organization, the Thorndike Non-Verbal Test, the Thorndike Examination for High School Graduates, series of 1919 to 1930, and the Toops Clerical Test. See Appendix II.

The general drift of the facts is shown in Table 2, which gives the variability (in face-value-score units) of an individual from day to day in intellect as a percent of the variability of a person whose amount of intellect is that represented by an Army Alpha first-trial score of about 100.

[TABLE 1. Variations of the scores in thirteen (or fewer) 30-minute trials with Part I of the Thorndike Intelligence Examination for High School Graduates from the median score for the individual, the trials being taken on different days. The body of the table is not recoverable from the source.]

It appears from Table 2, and still more clearly from the consideration of the detailed facts in Appendix II which Table 2 barely summarizes, that if we had scales for intellect whose units were really equal, the variability of an individual from day to day would be the same, regardless of whether the average amount of intellect possessed by him was that of a ‘low grade ten-year-old’ or of a ‘superior adult,’ that of an Army Alpha score of 25 or that of a score of 175.

This result is so important, if true, that we have sought for facts and probabilities in real or apparent opposition to it. First, there are the obvious opposing facts of range of variability in intellectual or similar production. Keats may have written “On Reading Chapman’s ‘Homer’” in one hour, and have written nothing in some other hour when he tried as hard, whereas an average twelve-year-old varies at the most from nothing up to a composition scoring 50 on the Hillegas scale. A gifted stock-exchange trader who, in transactions of 10,000 shares a day, averages $100 profit, may vary from a profit of $25 to one of $2,500, whereas a less gifted trader who averages $10 a day on 100 shares in the same market, it is said, varies over a much narrower range.

Such apparently opposing facts as these are, however, not so simple as they seem. If we had a full record of all of Keats’ hours of equal effort, the production called zero might turn out to be far above zero. The ideas he had then might rank in poetic value far above those of the best hours of the average man. The less gifted trader may vary over just as wide a range. For example, a still less gifted trader, losing $100 on the average, may lose in two days the $25 and the $2,500 that the gifted trader gains.
[TABLE 2. The relation of the variability of an individual to his amount of ability in fifteen tests or amalgamations of tests, using eight levels of ability. The body of the table is not recoverable from the source; the legible final row reads: Average (half weight to entries whose sum of weights is below [illegible]): 96, 104, 100, 108, 103.]

Furthermore, we have to consider the alleged common observation that as one increases his expertness in acting, music, dancing, or athletic feats, he seems to reduce his variability. Thus a sprinter who can on the average run 95 yards in 10 seconds almost never runs less than 90 yards or more than 98 yards in that time, whereas the heavy-footed youth whose average is 50 yards in 10 seconds may reach 60 and fall below 40.
On the whole, what facts there are concerning the relation of variability to amount of ability in mental tr aits in the world at large, seem to favor a slope down about as much as a slope up. Certain hypotheses concerning the constitution of in- ereases in amount in the case of mental traits are in oppo- sition to our result. One such hypothesis is that there are certain factors producing intellect all of which act positively and by addition. For simplicity’s sake, we w ill assume that each factor contributes as much to intellect as any other. Then the average amount of intellect that an individual dis- plays depends on the number of these factors that he pos- sesses, and his variability from time to time depends on how many of them are then active. Assume that each of them has the same probability of acting as any other and that the num- ber of them that will be active at any one time is a result of the probabilities of their several combinations. In such a ease the variability increases as the square root of the aver- age amount of the ability. For example, if there are five, ten, fifteen, and twenty factors in individuals A, B, C, and D, respectively, the status of A, B, C, and D at a thousand points taken at random in their lives, will be as shown in Table 3, and the variability of D will be twice that of A. If intellect is caused in any such way as this, the number of factors is probably very large so that we may better change our illustration from 5 and 20, to say 1,000,000 and 1,500,- 000. The variabilities are then (mean square deviations) \/ 250,000 and ./375,000.2 The ability which is 500,000 greater, or 14 times as great, has a variability which is only 1.2 times as great.® | ; 5 These are for points of time. 
The variabilities in tests covering thirty minutes or so are the variabilities of averages of many such points, and may be represented as $\sqrt{250{,}000}/\sqrt{n}$ and $\sqrt{375{,}000}/\sqrt{n}$, where $n$ is relatively large, say 1,000; this, however, does not change the relation of variability to amount of ability.

Such hypotheses as this can be nearly reconciled with our results if the difference between the intellect of the level of Alpha 25 and Alpha 175 is due to an increase in the number of factors which is large absolutely, but small in comparison with the number involved in Alpha 25. Thus if the difference is 500,000, but an ability of Alpha 25 involves 5,000,000, then the variabilities around levels of 25 and 125 will be as $\sqrt{1{,}250{,}000}$ and $\sqrt{1{,}350{,}000}$, or as 1118 and 1162, the second being only 4 percent greater. The reasonableness of this depends upon the location of the absolute zero of intellect. If that is ten times as far below Alpha 25 as Alpha 25 is below Alpha 175, it is perfectly reasonable.

TABLE 3
THE VARIABILITY OF FOUR INDIVIDUALS IN INTELLECT ACCORDING TO A CERTAIN ADDITIVE COMBINATION OF FACTORS ALL POSITIVE

Amount of       Frequency at 1000 Random Periods
Intellect         A       B       C       D
    0            31       1
    1           156      10
    2           313      44       3
    3           313     117      14       1
    4           156     205      42       5
    5            31     246      92      15
    6                   205     153      37
    7                   117     197      74
    8                    44     197     120
    9                    10     153     160
   10                     1      92     176
   11                            42     160
   12                            14     120
   13                             3      74
   14                                    37
   15                                    15
   16                                     5
   17                                     1
Average         2.5     5.0     7.5    10.0

Another way out of the difficulty is to deny the validity of the theory that intellect is constituted by the addition of positive factors only. If the factors in the above illustrations were inhibitive against some maximum amount of intellect that would otherwise act, so that the more of them that acted the less intellect there would be, the relation between amount and variability would be reversed, the variability of a man's intellect being as the square root of the amount by which the man was below the maximum intellect! There may be, and probably is, some combination of additive and inhibitive factors making the average intellects of men vary up and down from an amount typical of the human species; and this may result in equal variability for A, who is much below the average, and B, who is much above it. For example, suppose there are 6 factors, a, b, c, d, e, and f, each contributing −1, and 6 factors, A, B, C, D, E, and F, each contributing +1; and that every intellect is constituted by 6 factors chosen from the 12; and that the momentary conditions of each intellect represent the chance combinations of its six factors. Then we have intellects whose averages range from −3 to +3, according to whether they are constituted by six minus causes, or by 5 minus and 1 plus, or by 4 minus and 2 plus, or by 3 minus and 3 plus, and so on. All will have the same variability, however, the frequencies being in the proportions 1, 6, 15, 20, 15, 6, 1, with a mean square deviation of 1.2247.

A consideration of the relative probabilities of various types of constitution of intellect out of positive and negative factors would be interesting, but is too speculative to be profitable for our present purpose.
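The two illustrations above, the purely additive factors of Table 3 and the six factors drawn from six negative and six positive ones, can be checked numerically. The following is a minimal sketch, assuming that each factor is active or inactive with probability one-half independently of the others; the function names are ours and are not taken from the text.

```python
from math import comb, sqrt

def mean_and_sd(values, freqs):
    """Mean and mean square deviation of a frequency distribution."""
    total = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / total
    var = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / total
    return mean, sqrt(var)

def additive(k, periods=1000):
    """k positive factors: expected frequencies of 0..k active factors
    in `periods` random moments, with their mean and S.D."""
    freqs = [periods * comb(k, j) / 2 ** k for j in range(k + 1)]
    return freqs, mean_and_sd(range(k + 1), freqs)

def mixed(n_minus, n_plus):
    """n_minus factors worth -1 each and n_plus factors worth +1 each."""
    weights = {}
    for i in range(n_minus + 1):          # i negative factors active
        for j in range(n_plus + 1):       # j positive factors active
            w = comb(n_minus, i) * comb(n_plus, j)
            weights[j - i] = weights.get(j - i, 0) + w
    amounts = sorted(weights)
    freqs = [weights[a] for a in amounts]
    return freqs, mean_and_sd(amounts, freqs)

# Individuals A and D of Table 3 (5 and 20 factors).
_, (mean_a, sd_a) = additive(5)
_, (mean_d, sd_d) = additive(20)

# The larger illustration: 1,000,000 against 1,500,000 factors.
ratio_large = sqrt(1_500_000 / 4) / sqrt(1_000_000 / 4)

# Six minus causes, and three minus with three plus.
freqs_low, (mean_low, sd_low) = mixed(6, 0)
freqs_mid, (mean_mid, sd_mid) = mixed(3, 3)
```

Under these assumptions the variability of D comes out twice that of A, while every six-factor constitution, whatever its average, shows the same proportions 1, 6, 15, 20, 15, 6, 1.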
The attainment of greater intellect by the lack or suppression of negative factors as well as by the possession and use of positive factors is at least a possibility, and will seem highly probable to many.

On the whole, then, we do not need to be especially skeptical of the experimental findings that the variability in tests of a half hour from time to time is approximately equal over the range from, say, the ten-percentile adult intellect to the ninety-five percentile adult intellect.

MEASUREMENT BY WAY OF THE FORM OF DISTRIBUTION OF INTELLECT IN SOME DEFINED GROUP

If $T_1$, $T_2$, $T_3$, $T_4$, etc., are intellectual tasks with which $\frac{K}{n}$, $\frac{K+a}{n}$, $\frac{K+a+b}{n}$, $\frac{K+a+b+c}{n}$, etc., of the individuals of a group succeed respectively ($K$, $a$, $b$, $c$, etc., all being positive, $K$ being greater than 0 and the largest percentage being under 100), we can measure the differences in difficulty for intellect between $T_1$, $T_2$, $T_3$, $T_4$, etc., in terms of amount, if we know the form of distribution of intellect in the group.⁶ If, for example, $n$ is 100, $K$ is 5, and $a$, $b$, $c$, $d$, and $e$ are each 10, the differences in difficulty will be in the proportions shown in Table 5, according as the form of distribution of the group is a rectangle, a surface like A, a surface like C, or a surface like C reversed, shown in Table 4 and Figure 5.

Fig. 5. Four surfaces of frequency: A rectangle, Form A, Form C, and Form C reversed.

⁶ Our measures will approximate perfection in proportion as $T_1$, $T_2$, $T_3$, $T_4$, etc., depend upon all of intellect and nothing but intellect. As has been noted, we are assuming this for the present, reserving for full treatment later the influence of failures of certain tasks to utilize intellect fully, and the influence of admixture of other factors than intellect.

We have estimated the form of distribution for certain groups, using the following procedure.
Choose some group which is caused by forces that can be studied and which, so far as can be ascertained, represents a clustering around one amount of intellect with divergences therefrom due to a large number of causes each small in amount of influence. Choose many instruments for measuring intellect (such as the Otis Advanced, Army Alpha, National and Terman tests), each of which (1) is known to correlate fairly well with any reasonable criterion of intellect; (2) is different from the others; (3) was constructed without any dependence of the selection of elements or of the scoring system upon the assumption that the distribution of intellect in the group in question approximates Form A. Find by each test by actual experiment the form of distribution for the group, using the scoring system for each test at its face value. Find the form of distribution which best fits all these varying forms. Observe the effect (upon the form of distribution) of reducing the chance error in the scores by obtaining the form of distribution for the group when two or more trials with the same instrument are combined for each individual.

TABLE 4
FOUR FORMS OF DISTRIBUTION

Quantity                     Frequency
             Rectangle   Form A   Form C   Form C (reversed)
K               8⅓          ½        0           ½
K + 1           8⅓         1½        1           1
K + 2           8⅓         4½       11½          1½
K + 3           8⅓         9        22½          3½
K + 4           8⅓        15        23           6½
K + 5           8⅓        19½       17½         11½
K + 6           8⅓        19½       11½         17½
K + 7           8⅓        15         6½         23
K + 8           8⅓         9         3½         22½
K + 9           8⅓         4½        1½         11½
K + 10          8⅓         1½        1           1
K + 11          8⅓          ½         ½          0

TABLE 5
APPROXIMATE PERCENTAGES WHICH THE DIFFERENCES IN DIFFICULTY BETWEEN TASK T1, TASK T2, TASK T3, ETC., ARE OF THE DIFFERENCE BETWEEN T5 AND T6, ACCORDING TO THE FORM OF DISTRIBUTION OF THE GROUP

              For          For       For       For Form C,
              Rectangle    Form A    Form C    reversed
T1 to T2        100          244       141        355
T2 to T3        100          148       105        186
T3 to T4        100          112        91        136
T4 to T5        100          104        96        109
T5 to T6        100          100       100        100
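The dependence of the Table 5 proportions upon the assumed form of distribution can be illustrated as follows. This is a rough sketch only: it substitutes an exact normal curve for the tabulated Form A, so it will approximate but not reproduce the Form A column, and it does not attempt Form C at all. The function names and the unit-width rectangle are our own assumptions.

```python
from statistics import NormalDist

def locations(percents_succeeding, form):
    """Point on the ability scale that each task's percent succeeding
    cuts off, counting from the upper end of the group."""
    fractions = [p / 100 for p in percents_succeeding]
    if form == "normal":
        unit = NormalDist()                  # mean 0, S.D. 1
        return [unit.inv_cdf(1 - f) for f in fractions]
    if form == "rectangle":
        return [1 - f for f in fractions]    # uniform over a unit range
    raise ValueError(f"unknown form: {form}")

def relative_steps(percents_succeeding, form):
    """Each successive difference in difficulty as a percentage of the last."""
    pts = locations(percents_succeeding, form)
    steps = [hard - easy for hard, easy in zip(pts, pts[1:])]
    return [100 * s / steps[-1] for s in steps]

# n = 100, K = 5, and a, b, c, d, e each 10, as in the text.
percents = [5, 15, 25, 35, 45, 55]
normal_steps = relative_steps(percents, "normal")
rectangle_steps = relative_steps(percents, "rectangle")
```

With the rectangle every step is equal, whereas with the normal curve the step between the two hardest tasks is well over twice the step between the two easiest, which is the contrast the table is meant to show.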
If the best fit distribution is of Form A, and if the reduction of the chance error does not produce divergence from this form, we may conclude that Form A represents closely the form of distribution of the real ability in the group, as measured by a scale of equal units of difference in that ability. The general argument is that nothing in the instruments themselves or their scoring favors this form of distribution for this group, and that it cannot be due to the chance error, since reducing that leaves it unimpaired.

The details of the argument and the evidence are presented in Appendix III. They demonstrate that for Grades from 6 to 12, and probably for freshmen in colleges of equal standards of admission, the form of distribution of the population of a grade, when perfectly measured in respect of the ability required for success with standard types of intelligence tests, in truly equal units, will be unimodal, symmetrical, and very closely of Form A, the 'normal' probability surface, the equation of whose bounding curve is the exponential curve

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^{2}}{2\sigma^{2}}}$$

where $\sigma$ is the mean square deviation.

The critical reader should examine Appendix III with especial care. The method of measuring the intellectual difficulty of tasks which we adopted for our actual scale construction is based on it. It also provides support for certain features of previous work which up till now have been taken on faith. Appendices I and II are perhaps of greater theoretical importance, but Appendix III is fundamental for present and future practice in mental measurement.

We can then measure the difficulty of any intellectual task for pupils in any one of these grades by the percent of the group succeeding with it, as shown in the illustration that follows: 3190 pupils in grade 9 were tested with four tasks in completing sentences. The percentages succeeding were respectively 60, 30.5, 46.1, and 37.1.
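The conversion of a percent succeeding into a distance from the group average, which the text carries out next for these four tasks, can be sketched as follows. The sketch assumes an exactly normal Form A and labels the four tasks 20 to 23 in the order given, following the numbering the text itself uses for them; the function name is ours.

```python
from statistics import NormalDist

def difficulty_in_sigma(percent_succeeding):
    """Distance of the task above (+) or below (-) the group average,
    in units of the group's mean square deviation, assuming Form A
    to be an exact normal curve."""
    return NormalDist().inv_cdf(1 - percent_succeeding / 100)

# Ninth-grade percentages from the text, keyed by task number.
percent_right = {20: 60.0, 21: 30.5, 22: 46.1, 23: 37.1}
difficulty = {task: difficulty_in_sigma(p) for task, p in percent_right.items()}

# Differences in difficulty between tasks, in the same unit.
diff_21_20 = difficulty[21] - difficulty[20]
diff_21_22 = difficulty[21] - difficulty[22]
```

A task passed by more than half the group falls below the average and so receives a negative value; the differences between tasks are obtained by simple subtraction.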
We assume that these are intellectual tasks, that is, that success with each depends upon intellect. The form of distribution of the intellects of the group being Form A, a percentage correct of 60 corresponds to a division of the group at $-.2533\sigma$, that is, at .2533 of the mean square deviation of the group (in the ability measured in truly equal units by that task) below the average or median of the group (in the ability measured by that task). $+.5101\sigma$, $+.0979\sigma$, and $+.3292\sigma$ have similar meanings for the difficulties of tasks 21, 22, and 23. The differences in difficulty between the tasks are 21 − 20 = .7634, 21 − 22 = .4122, and so on, in truly equal units, unity being taken arbitrarily as the mean square deviation of the group in intellect.

MEASUREMENT BY WAY OF THE FORM OF DISTRIBUTION OF AN ARRAY IN A CORRELATION TABLE

The fourth method of attacking our problem uses, as the group whose form of distribution is to be determined, the population comprising one array in a correlation table of the sort shown in Table 6, where the individuals are arrayed under their scores in some examination symptomatic of intellect. Each array consists of two compartments representing the two scores (Failure and Success) attainable in the intellectual task whose difficulty we wish to measure. For example, we might have data concerning success with the task in question from 1000 persons each scoring 30 in Army Alpha, from 1000 persons each scoring 35 in Army Alpha, from 1000 persons each scoring 40 in Army Alpha, and so on. Or we might have data concerning success with the task in question from 1000 persons scoring Mental Age 8.0 in the Stanford Binet, from 1000 scoring Mental Age 8.5 in the Stanford Binet, and so on.
If both the total score and success in the task depend upon intellect, and nothing but intellect, the latter being one of the varying manifestations of intellect of which the former represents the average condition, the form of distribution of the intellects measured in an array in such a correlation table, measured in truly equal units, will be symmetrical and approximately 'normal.' For they are a random sampling from the combined distribution of certain individuals closely alike in average intellect, when all the variations of each individual from time to time are taken; and we have shown that each of these individuals' distributions is symmetrical and approximately 'normal.'

TABLE 6
THE CORRELATION OF SUCCESS IN TASK 281 WITH AVERAGE SCORE IN A TOTAL SERIES OF INTELLECTUAL TASKS.⁷

Score in the total series (column headings only partly legible in the source; they run upward to 31)
Score in Task 281
Wrong:  14  44  72  94  151  213  259  274  281  187  75  55  6  5
Right:  2  4  14  19  38  60  111  203  265  302  223  260  90  39  12

The use of such an array⁸ is in fact a convenient means of applying our knowledge of the form of distribution of the variations of an individual in intellect. It is impracticable to obtain a hundred trials of an individual with an intellectual task; and even if we did, the results would be hard to interpret because of possible effects of practice. It is possible to find a hundred individuals who are substantially identical in their average performance at intellectual tasks, and test them all once with any given task.

⁷ The entries of Table 6 are genuine, but the total series is not a series representing all of intellect, nor is the score in it an average of many trials. Such data are not available. The 'Score in the total series' in Table 6 is in fact the score in one trial of one half-hour test of certain features of intellect.

⁸ We shall later see uses of other sorts of arrays.
The measurements of the difficulty of one intellectual task in terms of the distance + or − from the average of one such array, expressed as a multiple of the variability of that array, can be made approximately commensurate with measurements of the difficulty of another intellectual task in terms of the distance + or − from the average of the corresponding array, expressed as a multiple of its variability. For we have shown that the variability of an individual (and so of such an array) in intellect is approximately the same regardless of his average amount of intellect. Consequently the two multiples are of approximately the same unit, and the distance between the two averages of overlapping arrays can be measured in terms of this same unit. If two arrays do not overlap, we can bridge the gap by inserting data from intermediate arrays which do form a series of overlapping arrays.

THE DEFECTS OF THE MEASUREMENTS SO FAR DESCRIBED

We have determined the approximate form of distribution of a grade population, from Grade 6 to Grade 12, in respect of level of intellect at one time, if that were measured in truly equal units. We have done the same for a population (an array) characterized by identity in average of intellect measured by a random selection of times. By an extension and refinement of the methods which we have used, this could be done with greater precision.

If all that we require for the measurement of the intellectual difficulty of tasks is to secure a group of known form of distribution in intellect when measured in truly equal units, whose members we may test with the tasks in question, the problem is solved. Unfortunately more is required. The chief defect in our procedures is that the difficulty which we measure by the percentages of our group which succeed is not pure intellectual difficulty.
Any such task as solving an arithmetical problem or completing a sentence or obeying a command is deficient by not involving all of intellect, and often also by involving other factors than intellect. From the percentage of a group of known distribution in respect of intellect which succeed with it, we can derive a close measure of its difficulty, but not of its intellectual difficulty. Although this has not been understood in the past, it can easily be realized by considering cases like the following: A group of known distribution in respect of intellectual level, measured in truly equal units, is tested with (a) leaping over a certain hurdle, (b) distinguishing a certain pitch from one higher, (c) spelling a certain word, (d) giving the opposite of a certain word, and (e) giving the opposite of a certain other word. The percent of success is equal for a, b, c, d, and e, being, let us say, 40, so that each of the five tasks is + .2533 S.D. The five are not equal in intellectual difficulty, however. Common sense tells us this; and the verdict of common sense is a crude intimation of the scientific fact that for (a) the + .2533 S.D. means .2533 times the S.D. of the group in ability to leap that hurdle above the mean of the group in ability to leap that hurdle, whereas for (d), the + .2533 S.D. means .2533 times the S.D. of the group in ability to think of the first opposite above the mean of the group's ability to think of that opposite. Ability to think of the second opposite may conceivably differ from ability to think of the first opposite by involving much more of intellect, or much less of non-intellect, or both, in the same way that the ability to think of the first opposite differs from the ability to leap a hurdle. If we take the tasks chosen as intellectual tasks and put in any of the stock intelligence examinations, they will so differ. This has been abundantly proven by investigations which will be reported in Chapter IV.
Moreover, no one of them will measure all of intellect and nothing but intellect. In fact, no one short task does or can involve all of intellect and nothing but intellect. Any one short task measures only a fraction of intellect and is influenced by other forces than intellect. That is, any one short task measures intellect plus an error. The nature and amount of this error must be considered in connection with any procedure for estimating the intellectual difficulty of a task from the percentage of individuals who succeed with it.⁹

There are other hidden assumptions and weak or even missing links in the argument by which we proceed from knowledge of who and how many can do a task, to a measure of its intellectual difficulty. In the next chapter we shall expose these, subject the entire argument to a much more rigorous treatment, and seek to remedy the defect noted above and such others as are found.

⁹ The exposure of this defect should not diminish our use of the general procedure of inferring degree of difficulty from percentage of failures in a distribution of known form. On the contrary, now that we are aware of the defect, we can make much better use of the procedure than when we were ignorant of it. As we shall elsewhere show in detail, if we replace a single task by a composite of forty tasks, and use twenty or more right as our mark of 'success,' we can use the procedure with better results than have ever been obtained hitherto.

CHAPTER III

THE MEASUREMENTS OF THE INTELLECTUAL DIFFICULTY OF TASKS AND OF LEVEL OF INTELLECT: MORE RIGOROUS AND EXACT METHODS

In the two previous chapters we have operated with provisional and somewhat vague definitions and inexact assumptions, largely in order to maintain continuity with what has been done to date in the measurement of intellect. It is now necessary to treat the whole matter of intellectual difficulty and level of intellect more rigorously.
We have assumed (1) that there is such a quality or characteristic of man as altitude or level of intellect; (2) whose amount or degree is measured by the height at which it can attain success with a series of intellectual tasks ranked for difficulty; (3) that the same individual differs in the amount or degree of it which he has available from time to time; and (4) that different individuals differ in the amounts or degrees of it which they have available on the average. (5) We have defined intellectual tasks only loosely and vaguely as those in which men esteemed very intelligent differ most from men esteemed very unintelligent. (6) We have defined intellectual difficulty only loosely and vaguely as that characteristic of a task, an increase in which reduces the number of intellects who can succeed with it, eliminating those esteemed unintelligent more rapidly than those esteemed intelligent.

Since we are treating intellect as the ability to perform intellectual tasks, our primary need is a clearer and more exact notion of intellectual tasks. We can reach this in either of two ways. The first is by assuming that certain abilities, such as to understand directions, or to know facts, or to use relations of likeness, part and whole, actor and acted upon, genus and species, and the like, or to use facts together, and certain tasks which represent them, are as a whole intellectual. We must then describe these tasks, and the credit or weight to be attached to each, precisely, and put them in a total series in such form that an individual intellect can attempt them all.

The second is by assuming that the ranking of individuals in an order from idiots to Aristotles for amount of intellect¹ by some defined consensus of opinion is valid. We must then describe this consensus and the method of its operation.
If we take the former way, we may attach the term [...]

SUB-SERIES B

SENTENCE COMPLETION, ORAL B
1. We put stamps on a ______.
2. We cut meat with a ______.
3. When we are sick, we call the ______.
4. [...] at night.
5. [...] with a pencil.
6. [...] the ______.
7. One and one make ______.
8. A dog has four ______.
9. Apples are ______.
10. Chairs are made of ______.

ARITHMETIC, ORAL B
11. Counts 2 pennies. (Credit if successful in 3 of 3 trials.)
12. Counts 4 pennies. (Credit if successful in 2 of 3 trials.)
13. "One and one make ______." Add "what?" if necessary.
14. "Which is the biggest pile?" showing 13 and 2 pennies. (Credit if successful in 3 of 3 trials.)
15. Recognizes 2 fingers. (Credit if successful in 4 of 5 trials.)
16. "Which is the longest of these three lines?" (Credit if successful in 3 of 3 trials.)
17. "Which is the biggest, a baby or a man?" (Credit if successful in 2 of 3 trials.)
18. Adds unseen, 1 plus 2. (Credit if successful in 2 of 3 trials.)
19. Subtracts unseen, 3 minus 2. (Credit if successful in 2 of 3 trials.)
20. Subtracts unseen, 3 minus 1. (Credit if successful in 2 of 3 trials.)

VOCABULARY B
(The method is as in A above.) The words are:
21. soup
22. bag
23. window
24. wings
25. envelope
26. comb
27. locomotive
28. door
29. cradle
30. sun

DIRECTIONS, ORAL B
Set 1 (with paper and pencil). (Unless otherwise specified, the tasks of Directions Oral B are those of set 1.)
31. "Make a line." (If S fails, show again, but do not credit.)
32. "Make a cross." (If S fails, show again, but do not credit.)
33. "Turn the paper over and make a ring on the other side."
34. "Turn the paper back again, and make a line on the other side."
35. "Make two rings down here," pointing.
36. "See the lines? Make one more line." (Credit if one or two lines are drawn anywhere.)
37. "Make two crosses, like these two. Make one here and one here," pointing.
38. "Make the other arm on this man," pointing.
39. "Make the other leg on this man," pointing.
40. "Make 2 lines, like these two," pointing.

Fig. 7 shows the pictures used in connection with tasks 37, 38, and 39, reduced to half size. For task 36, three parallel lines two inches long and half an inch apart one from another, drawn parallel to the side of the sheet, are shown in the lower left-hand corner of a letter-size sheet. For task 40, two such parallel lines are shown, at the top of a sheet otherwise blank.

Fig. 6. Six rows of pictures such as were used in the Picture Vocabulary tests: reduced to three-fourths of the original dimensions.

Fig. 7. The pictures used with Directions Oral B, 37, 38 and 39: reduced to one-half the original dimensions.

SUB-SERIES C

SENTENCE COMPLETION, ORAL C
1. Clouds are in the ______.
2. We send children to school, because they [...]
3. [...] in the stove.
4. [...] is barking at the cat.
5. We wash clothes with ______ and water.
6. Grass is ______.
7. ______ is sweet.
8. We see with our ______.
9. Roses and daisies are ______.
10. [...] eats the mouse.

ARITHMETIC, ORAL C
11. Counts 5 pennies. (Credit if successful in 3 of 3 trials.)
12. Counts 10 pennies. (Credit if successful in 2 of 3 trials.)
13. "Show me 3 pennies." (Credit if successful in 2 of 3 trials.)
14. "Which is the biggest pile?" showing 10 and 5 pennies. (Credit if successful in 3 of 3 trials.)
15. "Two and one make ______." (Add "what?" if necessary.)
16. Recognizes 3 fingers. (Credit if successful in 4 of 5 trials.)
17. "Which is the biggest, a chair or a cup?" (Credit if successful in 2 of 3 trials.)
18. Subtracts unseen, 5 minus 4. (Credit if successful in 2 of 3 trials.)
19. Subtracts unseen, 3 minus 3. (Credit if successful in 2 of 3 trials.)
20. Subtracts unseen, 2 minus 2. (Credit if successful in 2 of 3 trials.)

VOCABULARY C
The method is as before. The words used are:
21. camera
22. stationery
23. hole
24. corn
25. puppy
26. pistol
27. vase
28. stamps
29. tiger
30. kennel

Three of the rows of pictures used in this type of test are shown in Figures 8, 9 and 10. It will be observed that the task sometimes involves a considerable degree of ability in interpreting the pictures.

Fig. 8. Picture used with "lamp."
Fig. 9. Picture used with "pond."
Fig. 10. Picture used with "cork."

DIRECTIONS, ORAL C
In the actual tasks the drawings have twice the dimensions of those shown here:
31. "See the square?" (A square 1 [fraction illegible] inches on a side is shown at the top of a sheet 11 by 8½.) "Make a ring in the square."
32. "Now make another ring in the square."
33. "See the ring? Make a cross in the ring." (A circle 2 inches in diameter is shown near the middle of the sheet.)
34. "See the cup. Draw a line around the cup." (Fig. 11 is shown at the bottom of the sheet.)
35. "Make a ring and a cross up here," pointing.
36. "Make a cross where the line is." (A line 2½ inches long is shown, parallel with the bottom of the sheet.)
37. "Draw a line to finish the square." (A half-inch square with the left-hand side omitted is shown.)
38. "Make a cross in here," pointing to a triangle which is printed with a square on one side of it and a circle on the other. (The dimensions of the square, of the base of the triangle, and of the diameter of the circle are each between 1 and 2 inches; the fractions are illegible in the source.)
39. "Make a cross X in the square." (Fig. 12 is shown.)
40. "Make two squares out of these." (Two squares, of a fraction of an inch that is illegible in the source, are shown, one with the right-hand side lacking, the other with the lower side lacking.)

[Figs. 11, 12, 13, and 14: drawings used with the Directions tasks.]

SUB-SERIES D

SENTENCE COMPLETION, ORAL D
There being only eight tasks, each is counted as 1¼.
1. Boys ______ baseball. ("Playing" and "play ball" are called wrong.)
2. The stars and the ______ will shine tonight.
3. Two and one make ______.
4. A boy has ______ and legs.
5. The bird sings; the [...]
6. [...] than boys.
7. [...] pulls the cart.
8. Horses are big and ______.

ARITHMETIC, ORAL D
11. Counts 15 pennies. (Credit if successful in 3 of 3 trials.)
12. Recognizes 4 fingers. (Credit if successful in 3 of 5 trials.)
13. "Show me 4 pennies." (Credit if successful in 3 of 3 trials.)
14. "How many fingers have you on one hand?"
15. Recognizes 3 fingers. (Credit if successful in 5 of 5 trials.)
16. Recognizes 5 fingers. (Credit if successful in 5 of 5 trials.)
17. "Which is biggest, 3 or 1?" (Credit if successful in 2 of 3 trials.)
18. Adds unseen, 2 plus 2. (Credit if successful in 2 of 3 trials.)
19. Adds unseen, 3 plus 2. (Credit if successful in 2 of 3 trials.)
20. Subtracts unseen, 5 minus 3. (Credit if successful in 2 of 3 trials.)

VOCABULARY D
The method is as heretofore. The words are:
21. tools
22. fuel
23. screw
24. angel
25. cartridge
26. trumpet
27. cube
28. cork
29. blade
30. arrow

DIRECTIONS, ORAL D
The illustrations shown here all have dimensions half those used in the actual tasks. Each row is also in the actual tasks separated from the one above and from the one below it by from 1 to 3 inches.
31. "Make a cross inside the little square." (Fig. 13 is shown.)
32. "Draw a line to make this a cross," pointing. (A thick line, of a length illegible in the source, parallel to the side of the sheet is shown.)
33. "Make a ring on the cup." (Fig. 11 is shown.)
34. "See the ring. Make 2 crosses in the ring." (A circle 2 inches in diameter is shown.)
35. "Make a cross on top of the boy's head." (Fig. 13 is shown, on a new sheet.)
36. "Draw a line around the big hand." (Fig. 14 is shown.)
37. "Make a cross on the horse." (Fig. 12 is shown, on a new sheet.)
38. "Make a cross outside the big square." (A second copy of Fig. 13 is shown.)
39. "Make this a circle," pointing. (An incomplete circle, of a diameter illegible in the source, lacking the right-hand quarter, is shown.)
40. "Make a line outside the ring." (A circle 2 inches in diameter is shown.)

The sub-series N, O, P, and Q which follow presuppose ability to read in the individuals measured by them.

SUB-SERIES N

SENTENCE COMPLETION
Write words on the dotted lines so as to make the whole sentence true and sensible. Write one word on each inch of dots.
1. [...] a time was progress [...] during the last half of the nineteenth [...]

[...]

[...] 0 and < 100) we may express the difficulty of the task as $M_{t_1} + A\sigma_{t_1}$, where $M_{t_1}$ is the central tendency of the group in the ability measured by $t_1$, $\sigma_{t_1}$ is the variability of the group in the ability measured by $t_1$, and $A$ is a factor dependent for its sign and absolute value on $k$. We have seen that we cannot, without further knowledge to that effect, assume that $M_{t_1}$ is equal to the central tendency of the group in intellect or anything else save the ability measured by $t_1$; or that $\sigma_{t_1}$ is equal to the variability of the group in general intellect or anything else save the ability measured by $t_1$.

THE PROBLEM IN THE CASE OF SINGLE TASKS, EACH OF WHICH MEASURES INTELLECT PLUS A MERE SAMPLING ERROR

We have now to consider the possibility of such further knowledge.
Consider it first for cases where t, is a repre- sentative sample of intellectual tasks, and the measurement afforded by t, is a compound of perfectly measured intel- lect and error of sampling, and the errer is of the same 109Spe eel el hre 110 THE MEASUREMENT OF INTELLIGENCE magnitude for any one t, as for any other, so that the aver- age of a sufficient number of t,’s would be a perfect measure of intellect, so that the correlation of any one t, with any other will be a constant. For such cases the knowledge needed is available and M, and q, the central tendency and variability of the group in intellect, can be computed from M:, and or;, when the amount of the error is known. Since M, is the average of the N individuals of the group, each measured in (t, +t.+t,;+1t,--- t,)/n, M, will, if N is large, approximate closely to Mi, Mto, Mes, ete., and any one of these will approximate closely to any other of them. The effect of the error whereby the estimate of intellect by any t differs from that by the average of all the t’s is as often plus as minus, and is negligible for our purposes so far as concerns the central tendency of a large group. M, may be taken as equal to Mx. Because of the sampling error, o; will always be smaller than ot3. 6+ will equal Vo,” + o.”, where o, is the variabil- ity of the N individuals each measured by (t,; + t. + ts + t, - t,)/n, and o, is the variability of the sampling error, dependent upon the variations of t,, t, ts, ete., im any indi- vidual from the average of t,, ts, ts, ete., for that individual. o. may be computed in various ways from various measures of the unlikeness of t,, to, tz, etc., in the same individual, such as the correlation of t, with t, in the group, or the cor- relation of t, or t. or ts; with (t, +t.+ +t, --:~- t,)/n, or the differences between t, and t, in individuals, or the variabil- ity of an individual in intellect as estimated first by t,, then by t., then by tz, and so on. - 0; é. 
$\sigma_i = \sigma_t\sqrt{r_{t_a t_b}}$, or $\sigma_t = \dfrac{\sigma_i}{\sqrt{r_{t_a t_b}}}$ [Kelley, ’23, formula 166, p. 213], and since $r_{ti} = \sqrt{r_{t_a t_b}}$ [Kelley, ’23, formula 160, p. 206], $\sigma_t = \dfrac{\sigma_i}{r_{ti}}$,¹ where

$\sigma_i$ = the variability of the group in intellect, which is here identical with the variability of the group in $(t_1 + t_2 + t_3 + \cdots + t_n)/n$.
$r_{ti}$ = the correlation between the estimate of intellect by any one t in question and the estimate of intellect by $(t_1 + t_2 + t_3 + \cdots + t_n)/n$ in the group in question.
$\sigma_t$ = the variability of the group in the ability measured by the one t in question.
$r_{t_a t_b}$ = the correlation between the estimate of intellect by any one t and that by any other t.

¹ This second formula is presented because it lends itself better to much of the material at our disposal. It is derived by Kelley directly from Spearman’s formulas for the correlations of sums or averages. The reader of less mathematical ability may easily derive it from the more familiar formula for the correction for attenuation, as follows: Consider the ordinary Spearman attenuation formula for our case, $r_{Ii} = \dfrac{r_{ti}}{\sqrt{r_{t_1 t_2}\, r_{ii}}}$. Let I and i be perfect measures of intellect. Then $r_{Ii} = 1.00$ and $r_{ii} = 1.00$ by hypothesis. So $\sqrt{r_{t_1 t_2}} = r_{ti}$. Substituting in $\sigma_t = \dfrac{\sigma_i}{\sqrt{r_{t_1 t_2}}}$ we have $\sigma_t = \dfrac{\sigma_i}{\sqrt{r_{t_1 t_2}}} = \dfrac{\sigma_i}{r_{ti}}$ in this case.

For example, assume that the completion task 22 is one taken at random from a number of completions, each of which measures intellect plus a similar sampling error, the average of all of them measuring it exactly. We found the difficulty of task 22 to be .098 times the $\sigma_{22}$ of the ninth grade group harder than the $M_{22}$ of that group. In accord with our assumptions, we may replace $M_{22}$ by $M_i$. The correlation between score in task 22 and intellect may be taken as approximately .40 for the group in question, since the obtained correlation with a fairly close representation of intellect is .374. In place of $.098\,\sigma_{22}$ we then put $.098\,\dfrac{\sigma_i}{.40}$. The purely intellectual difficulty of task 22, freed from the effect of the sampling error, is now measured as .245 ($\sigma_i$ for 9th grade) and can be compared with that of any other task representing intellect plus the effect of sampling error for which we have the percent of correct responses in this group.

Thus task 20, which showed 60% of correct responses in this group, giving $-.2533\,\sigma_{20}$, has a correlation with the same fairly close representation of intellect just mentioned, of .184. Its correlation with intellect may be taken as approximately .20. Assuming that it is a random sample from a set of tasks whose average measures intellect perfectly,² and each of which suffers an error of equal magnitude, we transmute “$.2533\,\sigma_{20}$ easier than $M_{20}$ in Grade 9” into “$.2533\,\dfrac{\sigma_{i9}}{.20}$ easier than $M_i$ in Grade 9,” or $-1.27\,\sigma_{i9}$. Task 20 is then 1.51 easier than task 22, the unit of measure being the mean square deviation of intellect in Grade 9.

If the error whereby the ability measured by a task differs from intellect is a random sampling error, so that perfectly measured intellect can be got by merely increasing the number of tasks strictly comparable to it drawn in the sample, we can then correct for it, the correction being a further application of the facts shown by Spearman [’04, ’07, ’10, and ’13], Boas [’06], Thorndike [’13], and Kelley [’19, ’21 and ’23]. If the single tasks whose intellectual difficulty we wish to determine measured intellect perfectly, except for such a random sampling error, we could and should compute $r_{ti}$ (or $r_{t\,\mathrm{CAVD}}$) for each of them in each group used, and apply the corrections.

The effect of the correction may be illustrated by cases where we have reduced the sampling error empirically by using ten tasks in place of one.
Thus for 250 pupils in Grade 8½, the median of the ten percents correct for the ten single word tasks and the percent scoring five or more correct responses out of the ten were as shown in Table 10 for each of the fourteen 10-word composites in the I. E. R. A-2 and B-2. Table 10 also reports the $\sigma_1$ values and the $\sigma_{10}$ values which correspond to these percents. The percent is more remote from 50 when we shift from one right out of one to five or more right out of ten; and the value in terms of $\sigma_{10}$ is more remote from the median of the group.

² This average will have to be computed from a larger number of t’s than would be needed in the case of the tasks from which task 22 was drawn as a random sample, since the error is here larger, making the correlation with i smaller.

TABLE 10
THE EFFECT OF DECREASING THE ERROR OF ESTIMATING THE DIFFICULTY OF THE MEDIAN TASK OF A COMPOSITE OF TEN BY THE USE OF THE PERCENT OF A GROUP SCORING “5 OR MORE RIGHT OUT OF TEN” IN PLACE OF THE MEDIAN OF THE TEN PERCENTS OF THE GROUP SCORING “RIGHT” IN THE TASKS TAKEN ONE AT A TIME. VOCABULARY TASKS IN THE CASE OF 250 PUPILS OF GRADE 8½.

                                                Distance from the Median
                 Median of      Percent         Ability of the Group
                 the ten        scoring         In terms        In terms
Composite        percents       5 or more       of $\sigma_1$   of $\sigma_{10}$
    1              93.2           98.0           -1.49           -2.05
    1a             93.4           97.6           -1.51           -1.98
    2              82.6           92.0           -0.94           -1.41
    2a             87.4           97.2           -1.15           -1.91
    3              62.8           74.0           -0.32           -0.64
    3a             72.3           86.4           -0.60           -1.10
    4              55.2           64.4           -0.13           -0.37
    4a             55.6           61.6           -0.14           -0.30
    5              43.0           44.4           +0.18           +0.15
    5a             43.2           44.8           +0.17           +0.13
    6              23.4           15.2           +0.73           +1.03
    6a             25.8           19.6           +0.65           +0.86
    7              15.8            4.4           +1.00           +1.71
    7a             10.8             .8           +1.24           +2.41

The effect of the correction may be realized more exactly by applying the formula to a few representative cases.
Thus, tasks A, B and C, each being done correctly by the same percent (80) of a group (of normal form of distribution), but correlating with intellect to the extent of .20, .35 and .50, respectively, in that group, will be of intellectual difficulty $-4.208\,\sigma_i$, $-2.405\,\sigma_i$ and $-1.683\,\sigma_i$, respectively. Tasks D, E and F, although done correctly by very different percents of the group, are of equal intellectual difficulty, their differences in difficulty being counterbalanced by reverse differences in intellectualness.

                           Correlation      Difficulty           Intellectual difficulty
Task     % successful      with intellect   (in $\sigma_t$)      (in $\sigma_i$)
 A          80                 .20            -.8416                -4.208
 B          80                 .35            -.8416                -2.405
 C          80                 .50            -.8416                -1.683
 D          60.1               .20            -.2559                -1.280
 E          67.3               .35            -.4482                -1.280
 F          73.9               .50            -.6403                -1.280

THE PROBLEM IN THE CASE OF SUCH SINGLE TASKS AS ARE USED IN CAVD OR IN STANDARD INTELLIGENCE EXAMINATIONS

Unfortunately we cannot be sure that a single task will measure intellect save for such a sampling error. This may be best realized by taking our Intellect CAVD as intellect for the moment, and considering a task made up of 20 completions, 20 arithmetical problems, 20 words and 20 directions, all of equal difficulty. The ability measured by such an 80-element task, if the elements are well selected, is approximately perfectly representative of ability CAVD.

Now if we take one of the eighty tasks at random, we do not have something which measures what the eighty together do plus an ordinary error of sampling. One word-knowledge test does not differ from one arithmetical problem test in the same way that one arithmetical problem test differs from another. The total is too varied a synthesis and the single task is too small a sample for the latter to represent the former plus an ordinary sampling error.
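The arithmetic behind the small table of tasks A to F can be sketched in a few lines. This is only an illustration under the assumptions the text itself states (a normal form of distribution, and the correction $\sigma_t = \sigma_i / r_{ti}$); the function names are hypothetical and are not taken from the original.

```python
from statistics import NormalDist


def task_difficulty(percent_correct):
    """Difficulty in units of sigma_t, measured from the group's central tendency.

    Negative values mean the task is easier than the central tendency.
    Assumes the ability measured by the task is normally distributed.
    """
    return -NormalDist().inv_cdf(percent_correct / 100.0)


def intellectual_difficulty(percent_correct, r_ti):
    """Difficulty in units of sigma_i, using sigma_t = sigma_i / r_ti."""
    return task_difficulty(percent_correct) / r_ti


# Percent successful and correlation with intellect, as given in the table.
tasks = {
    "A": (80.0, 0.20),
    "B": (80.0, 0.35),
    "C": (80.0, 0.50),
    "D": (60.1, 0.20),
    "E": (67.3, 0.35),
    "F": (73.9, 0.50),
}

for name, (pct, r) in tasks.items():
    print(name, round(task_difficulty(pct), 4), round(intellectual_difficulty(pct, r), 3))
```

The last three rows come out within rounding of -1.280, which is the sense in which differences in difficulty are counterbalanced by reverse differences in intellectualness.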
In the eighty are four different sorts of tasks; in the twenty completions there may be four or five which require knowledge of specialized facts; amongst these four or five, there may be one which is very much easier for intellects which have lived in the country than for intellects, otherwise similar, which have lived in the city; and another of which the reverse is true.

The case is not so much like measuring a man’s height a dozen times and taking one of the dozen to represent their average, as like measuring his head, his neck, his trunk, his legs to the knees, his shins, and his feet and adding the results to get his height. Our measures of intellect are inventories; we combine C, A, V and D as we might combine a man’s real estate, ships, stocks, bonds, accounts receivable, merchandise, materials, and cash on hand. What he happened to own in the way of real estate in Boston would not in a useful sense represent his total wealth plus a sampling error.

Assume for the purpose of illustration that: (1) intellect is composed of C and A in equal parts, and is perfectly measured at the level in question by a task composed of 20 completions and 20 arithmetical problems, the two twenties having equal weight; (2) a task comprising the 20 completions will correlate perfectly with a task comprising 100 completions from which the 20 are a random sample; (3) a task comprising the 20 arithmetical problems will correlate perfectly with a task comprising 100 problems from which the 20 are a random sample; (4) the 20 completions or the 100 completions will correlate 0 with the 20 arithmetical problems or the 100 arithmetical problems.

If now N individuals composing a group distributed “normally” are measured in respect of their success with a task composed of 40 completions, and if a given percent succeed with the task (that is, have 20 or more of the 40 right), the difficulty of that 40-completion task is $M_{40C}$
$+\,X\sigma_{40C}$. The correlation between the score in 40C and the score in intellect, or C + A measured by 20C + 20A, is .707. The correlation between the score in 40C and the score in another 40C is 1.00. By our assumptions

$\sigma_{20C+20A} = \sqrt{\sigma_{20C}^2 + \sigma_{20A}^2}$, since $r_{20C\,20A} = 0$.
$\sigma_{20C+20A} = \sqrt{2\sigma_{20C}^2}$, since $\sigma_{20C} = \sigma_{20A}$.
$\sigma_{20C+20A} = \dfrac{\sqrt{2}}{2}\,\sigma_{40C}$, since $\sigma_{40C} = 2\sigma_{20C}$, since $r_{20C\,20C}$ is 1.00.
$\sigma_{20C+20A} = .707\,\sigma_{40C}$.

But by the formula $\left[\sigma_t = \dfrac{\sigma_i}{\sqrt{r_{t_1 t_2}}}\right]$ we should have $\sigma_{40C} = \dfrac{\sigma_{20C+20A}}{\sqrt{r_{40C\,40C}}}$, giving $\sigma_{40C} = \dfrac{\sigma_{20C+20A}}{\sqrt{1}}$, or $\sigma_{20C+20A} = \sigma_{40C}$.

We see the reason for the discrepancy if we consider the attenuation formula

$r_{C_\infty \text{ with } (C+A)_\infty} = \dfrac{r_{C_{40} \text{ with } C_{20}+A_{20}}}{\sqrt{r_{C_{40}\,C_{40}}\; r_{(C_{20}+A_{20})\,(C_{20}+A_{20})}}}$.

With our present assumptions, $r_{C_\infty \text{ with } (C+A)_\infty}$ is not 1.00; because, no matter how many C’s we take, we do not get all of intellect and nothing but intellect. It is in fact .707. So we do not have $\sqrt{r_{C_{40} \text{ with } C_{40}}} = r_{C_{40} \text{ with } C_{20}+A_{20}}$, but $\sqrt{r_{C_{40} \text{ with } C_{40}}} = \dfrac{r_{C_{40} \text{ with } C_{20}+A_{20}}}{.707}$. If we substitute $\dfrac{r_{C_{40} \text{ with } C_{20}+A_{20}}}{.707}$ for $\sqrt{r_{t_1 t_2}}$ in the formula $\sigma_t = \dfrac{\sigma_i}{\sqrt{r_{t_1 t_2}}}$, we have again the erroneous result $\sigma_{20C+20A} = \sigma_{40C}$.

Now the correlation between $C_\infty$ and $(C + A + V + D)_\infty$ or any other form of perfectly measured intellect is not perfect; and the correlation between $C_\infty$ and either $A_\infty$ or $V_\infty$ or $D_\infty$ or Picture Completions$_\infty$, or Geometrical Relations$_\infty$, is not in fact perfect. In general, if we sample by taking one small task, it has to be so limited that if we take a thousand tasks closely like it, the score therein need not correlate perfectly with the score in intellect, or with the score in a thousand tasks closely like any other one task with which we might begin.
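The discrepancy in the 40C illustration can be checked numerically from the four stated assumptions alone. The sketch below is not from the original; it assumes, for convenience, that each block of 20 tasks has unit variance, and the variable names are hypothetical.

```python
import math

# Assumption for illustration: each 20-task block has unit variance.
var_20c = 1.0
var_20a = 1.0
cov_20c_20a = 0.0  # assumption (4): completions correlate 0 with arithmetic

# 40C is two perfectly correlated 20C blocks, so its score is twice a 20C score.
sd_40c = 2.0 * math.sqrt(var_20c)

# Intellect is taken as perfectly measured by 20C + 20A.
sd_intellect = math.sqrt(var_20c + var_20a + 2.0 * cov_20c_20a)

# cov(40C, 20C + 20A) = 2 * var(20C) + 2 * cov(20C, 20A)
cov_40c_intellect = 2.0 * var_20c + 2.0 * cov_20c_20a
r_40c_intellect = cov_40c_intellect / (sd_40c * sd_intellect)

# True ratio of the variabilities, from the assumptions directly.
true_ratio = sd_intellect / sd_40c

# What sigma_i = sigma_t * sqrt(r_tt) would give with a self-correlation of 1.00.
self_correlation_40c = 1.0
naive_sd_intellect = sd_40c * math.sqrt(self_correlation_40c)

print(round(r_40c_intellect, 3), round(true_ratio, 3), naive_sd_intellect / sd_40c)
```

The self-correlation of 40C is 1.00 while its correlation with intellect is only .707, so a correction based on the self-correlation overstates $\sigma_i$ by the factor 1/.707.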
In particular, no single completion, or word to be defined, or problem in arithmetic, or sentence to be comprehended can safely be regarded as differing from intellect only by a sampling error such as may be adequately corrected for by $\sigma_t = \dfrac{\sigma_i}{\sqrt{r_{t_1 t_2}}}$ or $\sigma_t = \dfrac{\sigma_i}{r_{ti}}$.

A single task, $t_1$, measures not a large part of intellect plus a small error due to the action of a large number of factors of about equal magnitude, but a small part of intellect plus a large error. The latter is due to the action of factors some of which, like residence in the city, access to books, formal training with arithmetical problems, special acquaintance with the particular word or sentence or problem, may be of very great magnitude in comparison with others.

More generally, $\sigma_i = \sigma_t r_{ti}$ or $\sigma_t = \dfrac{\sigma_i}{r_{ti}}$ is, as Kelley’s discussion [p. 213] makes clear, true for a case where i is simply the average of many t’s, each of which has closely the same $\sigma$ as any other and closely the same $r_{ti}$ as any other. It is not true when we fail to get i by a collection of tasks however extensive. And no matter how many completions we take, we shall never get an i made up of completions and arithmetical problems unless the correlation between sentence completion and solving arithmetical problems is perfect.

The quantitative importance of having a varied as well as a large sample may be illustrated by measurements of the correlation between i, as represented by the summation score in CAVD (40C + 40A + 40V + 40D), and composites of 10 made up all of C or A or V or D on the one hand, and on the other, composites of 10 made up of 2C + 3A + 2V + 3D or of 3C + 2A + 3V + 2D. In the case of 240 college graduates, the average of the former sort was .59 with a P.E. of ± .028; the average of the latter sort was .72 with a P.E.
of ± .022.

It should also be noted that, even if the correlation between the score in an infinite number of completions and the score in an infinite number of arithmetical problems were perfect, so that we got all of intellect and nothing but intellect as well by a sampling of one type of task as by a mixed sampling (the reduction of $r_{ti}$ below 1.00 being due purely to sampling error), still the practical difficulties in the way of applying the correction would make it far wiser first to construct composite tasks. It is very laborious to compute $r_{ti}$ for each element. It will be low (roughly from .20 to .60 for a group of individuals in the same school grade), and the probable error of a low bi-serial r is such that an enormous number of individuals must be tested to obtain $r_{ti}$ with a precision such that the probable error is less than .01 (from 5,000 to more than 10,000 for r = .40, according as the split of successes and failures is near .50, .50, or remote therefrom).

THE SOLUTION BY THE USE OF EXTENSIVE COMPOSITE TASKS

The only safe and wise course is, then, to make sure that the tasks whose difficulty we are to measure are alike in the amount of intellect which each involves, and in the amount of non-intellect by which each is contaminated, by using composite tasks each containing many single tasks, representing with proper weight the various aspects or constituents of intellect. The nearer we come to having each of them measure all of intellect and nothing but intellect, the safer our course will be.

With composites which differ from i only by the sampling error the correction formulas are appropriate. In proportion as the composite is made to include a large sampling, the labor of computing $r_{ti}$ or $r_{t_1 t_2}$ to a given degree of precision is reduced and the reliability of the correction is increased.
With forty-element CAVD composites, for example, it is safe to infer $\sigma_i$ from $\sigma_{t_1}$, either by $\sigma_i = \sqrt{r_{t_1 t_2}}\,\sigma_{t_1}$ or by $\sigma_i = r_{t_1 i}\,\sigma_{t_1}$.

In constructing composite tasks whose difficulty will be truly intellectual difficulty, freed from the sampling error by having many tasks, and freed from the constant error by having a proper representation of all the elements of intellect, it may be desirable, other things being equal, to include in any one composite only tasks which would show approximately the same intellectual difficulty if, by a miracle, all of intellect, and nothing but intellect, could in each case be utilized for success. The measure of the difficulty of a composite of n tasks would be more reliable if this could be the case. The construction of composites of specified amounts of difficulty would be less a matter of trial and correction.

So, for this purpose, we may need to measure approximately something which, for lack of a better name, we may call the “intellectual difficulty” of single tasks, and to know how close the approximations are. The facts which we shall present in this connection are also of importance in estimating the errors in scales³ which have been constructed on the assumption that $\sigma_{t_1}$, $\sigma_{t_2}$, $\sigma_{t_3}$, $\sigma_{t_4}$, etc., are equal. They are also of importance in connection with the general technique of selecting single tasks to make a composite, even if we make no attempt to select them to be of the same intellectual difficulty, rather than of the same difficulty.

These facts are the percents of some group succeeding with the several tasks ($t_1$, $t_2$, $t_3$, etc.) whence we may compute measures of their difficulty ($M_{t_1} + C_1\sigma_{t_1}$, $M_{t_2} + C_2\sigma_{t_2}$, $M_{t_3} + C_3\sigma_{t_3}$, etc.); and the correlations ($r_{t_1 i}$, $r_{t_2 i}$, $r_{t_3 i}$, etc.) between each of many single tasks and intellect (CAVD or some other defined intellect), whence we may compute the extent to which $t_1$, $t_2$, $t_3$, etc.
represent intellect, and so estimate their “intellectual difficulty.”

We have seen that with a large group, $M_{t_1}$, $M_{t_2}$, $M_{t_3}$, etc., will be closely equal. In proportion as $r_{t_1 i}$, $r_{t_2 i}$, $r_{t_3 i}$, etc., are approximately equal, $\sigma_{t_1}$, $\sigma_{t_2}$, $\sigma_{t_3}$, etc., will be approximately equal, and $\sigma_i$ will be approximately the same fraction of each of them, which equal respectively $\sqrt{\sigma_i^2 + E_1^2}$, $\sqrt{\sigma_i^2 + E_2^2}$, $\sqrt{\sigma_i^2 + E_3^2}$, where $E_1$ or $E_2$ or $E_3$ is the “error” by which the estimate of intellect by the single task diverges from the estimate of intellect from a properly weighted sum of all tasks. $E_1$, $E_2$, $E_3$, etc., will be approximately equal, if they produce approximately equal reductions from perfection in the correlations $r_{t_1 i}$, $r_{t_2 i}$, $r_{t_3 i}$, etc. If, then, we select single tasks which are done by equal percents of a large group, and also are approximately equally closely correlated with intellect, we shall have equality in the sort of intellectual difficulty which we are discussing.

³ Such as the Buckingham Spelling Scale, Trabue Completion Scales, Van Wagenen History Scales.

For example, Table 11 shows, in the case of 30 reading tasks, the percents succeeding and the correlations (bi-serial r) with the combined score in two forms of a standard intelligence examination given a year apart (the I. E. R. Tests of Selective and Relational Thinking, Generalization and Organization⁴). The facts are given for 668 pupils in Grade 11. Using the facts of Table 11 as our guide, tasks 10, 15, and 24 may be expected to be of approximately equal “intellectual difficulty.” They are approximately equally difficult because the percents succeeding are respectively 66, 65, and 67. They are approximately equally intellectual because the $r_{ti}$’s are, respectively, .40, .41, and .38. We can also balance low degrees of %s (percent successful) against high degrees of $r_{ti}$ so as to get tasks that would be of equal intellectual difficulty in so far as the formula is applicable.
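The balancing of percent successful against $r_{ti}$ can be illustrated by inverting the correction. The following sketch is an illustration only; it assumes a normal distribution and the relation $\sigma_t = \sigma_i / r_{ti}$, and the function name is hypothetical.

```python
from statistics import NormalDist


def percent_for_intellectual_difficulty(target_in_sigma_i, r_ti):
    """Percent successful that yields a given intellectual difficulty.

    target_in_sigma_i is the distance from the group's central tendency in
    units of sigma_i (negative = easier). Under sigma_t = sigma_i / r_ti the
    same distance in units of sigma_t is target_in_sigma_i * r_ti.
    """
    distance_in_sigma_t = target_in_sigma_i * r_ti
    return 100.0 * NormalDist().cdf(-distance_in_sigma_t)


# Three tasks with different correlations, all at -1.280 sigma_i.
for r in (0.20, 0.35, 0.50):
    print(r, round(percent_for_intellectual_difficulty(-1.280, r), 1))
```

A task with a low correlation must be done correctly by a percent nearer 50, and a task with a high correlation by a percent farther from 50, for the two to be of equal intellectual difficulty under this formula.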
Even if it is not desirable to spend time in choosing tasks which are alike in the + or - values of $\sigma_i$ as inferred from $\sigma_i = r_{ti}\sigma_t$, it will be very useful to know how much difference will be shown in the $r_{ti}$’s of single tasks in completing sentences, solving arithmetical problems, knowing word-meanings, following directions or answering questions about a paragraph, giving opposites, possessing and using information, completing pictures, supplying or selecting the proper related term as in the analogies test, and other stock forms of tasks used in instruments for measurement of intellect. For, other things being equal, the higher $r_{ti}$ is, the more suitable the task is for inclusion in a composite to measure i.

⁴ The self-correlation of this combined score is approximately $\dfrac{1.70}{1.85}$ or .92, in this group.

TABLE 11
THE DIFFICULTY AND INTELLECTUALNESS OF 30 SINGLE TASKS IN UNDERSTANDING SENTENCES, MEASURED BY THE PERCENT OF 668 11TH GRADE PUPILS SUCCEEDING WITH EACH, AND BY THE CORRELATIONS OF SUCCESS IN EACH WITH THE AVERAGE SCORE IN TWO FORMS OF THE I.E.R. SEL. REL. GEN. ORG. EXAMINATION.

                                       Unreliability
Task     % Succeeding    $r_{ti}$      of $r_{ti}$
  1          93            .43            …
  2          94            .30            …
  3          91            .29            …
  4          84             …            .04
  5          76            .36            …
  6          83            .45            …
  7          75            .43           .04
  8          82             …             …
  9          82            .54           .04
 10          66            .40            …
 11          81            .52           .04
 12          69            .48           .04
 13           …            .45           .04
 14           …            .28            …
 15          65            .41           .04
 16          75            .48           .04
 17          66            .45            …
[Tasks 18 to 30: entries illegible.]

[Table fragment: a frequency table by sum of ratings, with class intervals of 80 from 800 to 1839; entries illegible.]

THE RATINGS

The ratings were combined by simple addition, the result being a series of arbitrary numbers from 32 to over 3,600 which represent accurately enough for all our purposes an order of difficulty by the consensus.
Its statistical reliability is fairly high. The sum of the ratings by ten of the judges (Br., Mi., Ro. (2), Thom., Thor. (2), Vi. and Wo. (2)) corresponds closely with the sum of the ratings by the other ten. The facts appear in Table 17. The reliability is about the same for any one sort of task, such as sentence completion, or arithmetical problem or word knowledge, as for the entire series. That is, the judges agreed about as closely when they compared two tasks of different sorts as when they compared two tasks of the same sort.

The correlations between the two sums of ten are as follows:

Completion tasks     .973
Arithmetic tasks     .988
Vocabulary tasks     .954
Directions tasks     .996
Information tasks    .979
Opposites tasks      .978

The average of the six is .978. The correlation when all are mixed together is .984.

This material is unsuitable for the computation of coefficients of correlation, the distributions being of very irregular form. The correlations given above are used only as rough indicators of the closeness of agreement between the two groups of ten judges.

The mean square error and the median or “probable” error of the sum of the twenty ratings for any task are as shown in Table 18.¹ The error varies, increasing in general with the difficulty of the tasks, but also decreasing at the two extremes of the set of tasks used. On the average, it is about one thirtieth of the difference between task I and task II, shown at the bottom of Table 18, for the mean square error, and about one forty-fifth thereof for the probable error.

TABLE 18
THE PROBABLE DIVERGENCE OF A DIFFICULTY RATING BY 20 EXPERTS FROM THE AVERAGE OF AN INFINITE NUMBER OF DIFFICULTY RATINGS OF THE TASK (EACH RATING BEING THE AVERAGE OF THE RATINGS OF 20 EXPERTS).

                         The unit being the      The unit being one       The unit being one
                         same as that of the     hundredth of the         hundredth of the
                         difficulty ratings      difference² between      difference between
                         by the 20 experts       Level A and Level O      difficulty rating for
                                                                          Task I and Task II
Tasks rated              S.D.      P.E.          S.D.      P.E.           S.D.      P.E.
under 400 (approx.)       44        29           1.5       1.0            1.3        .8
400 to 799                81        55           2.8       1.9            2.3       1.6
800 to 1199               97        66           3.3       2.2            2.8       1.9
1200 to 1599              99        67           3.4       2.3            2.9       1.9
1600 to 1999             110        74           3.7       2.5            3.2       2.1
2000 to 2399             129        87           4.4       3.0            3.7       2.5
2400 to 2799             117        79           4.0        …              …         …
2800 to 3199             108        73           3.7       2.5            3.1       2.1
3200 to 3599              86        58           2.9       2.0            2.4       1.7

I
1. Hold up your hand.
2. Show me your nose. Put your finger on your nose.
3. Show me your mouth. Put your finger on your mouth.

¹ These measures of unreliability are computed from the mean square deviations of the differences between the sum of the ratings of the first ten and the sum of the ratings of the second ten judges. The mean square error for a sum of ten $= \dfrac{1}{\sqrt{2}}\,\sigma_{\text{diff.}}$. The mean square error for the average of two sums of ten equals $\dfrac{1}{\sqrt{2}} \times \dfrac{1}{\sqrt{2}} \times \sigma_{\text{diff.}}$. Since, however, we are using the sum of twenty in place of the average of two sums of ten, our numbers are all twice as large as they would be for the average of two sums of ten. That is, the mean square error for a sum of twenty equals $2 \times \dfrac{1}{\sqrt{2}} \times \dfrac{1}{\sqrt{2}} \times \sigma_{\text{diff.}}$, or simply $\sigma_{\text{diff.}}$.

² Level A is the ability of adults of mental age a little under 36 months, and so with I.Q.’s of about 20. Level O is approximately the ability of the average graduate of American colleges of high requirements.

II
1. Read this and then write the answers. Read it again if you need to.

COLERIDGE
I see thee pine like her in golden story
Who, in her prison, woke and saw, one day,
The gates thrown open—saw the sunbeams play
With only a web ’tween her and summer’s glory;
Who, when the web—so frail, so transitory,
It broke before her breath—had fallen away,
Saw other webs and others rise for aye,
Which kept her prisoned till her hair was hoary.
Those songs half-sung that yet were all divine—
That woke Romance, the queen, to reign afresh—
Had been but preludes from that lyre of thine,
Could thy rare spirit’s wings have pierced the mesh
Spun by the wizard who compels the flesh,
But lets the poet see how heav’n can shine.

Copy the first word of the line which implies there had not been a continuous stream of like songs.

2. Supply the missing words to make this a true and sensible sentence.
Speech, gesture and ........ of human action are in ........ contraction.

3. Arrange these numbers and signs to form a true equation.
[numbers and signs illegible]

So much of these unreliabilities as is due to the small number of judges can be reduced to any desired extent by increasing the number of judges. The crude summations of ranks can also be replaced by more precise and refined uses of the differences between the rankings for any two tasks. The general value of the method can, however, be studied well enough for our purposes with the sums of the twenty ranks as they stand.

The meaning of these sums of the twenty ranks in terms of the percentage of the judges who judge the direction of the difference correctly may be realized from the following facts:

Taking 618 pairs of tasks at random from those pairs which differ in the “sum of the twenty” by approximately 100 (95 to 105), we find that, in 263, eleven and a half or fewer of the twenty judges judged correctly; in 114, twelve judged correctly; and in 241, twelve and a half or more judged correctly. A difference of 100 in the “sum of the twenty ranks” thus corresponds to a percentage of judges a little under 60.

Taking 853 pairs of tasks at random from those pairs which differ in the “sum of the twenty” by approximately 200 (195 to 205), we find that, in 404, thirteen or fewer of the judges judged correctly; in 49, thirteen and a half judged correctly; in 400, fourteen or more judged correctly.
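The unreliability of a sum of twenty ratings, and its reduction as judges are added, can be sketched as follows. This is an illustration only: the ratings are invented for the example, the function names are hypothetical, and the scaling with the number of judges assumes that the judges' errors are independent, which the text does not state.

```python
import math
from statistics import pstdev


def error_of_sum_of_twenty(sums_first_ten, sums_second_ten):
    """Mean square error and probable error of the sum of twenty ratings.

    Uses the split of the twenty judges into two groups of ten: the mean
    square error of the sum of twenty is taken as the standard deviation of
    the differences between the two sums of ten, and the probable error as
    .6745 times that value.
    """
    diffs = [a - b for a, b in zip(sums_first_ten, sums_second_ten)]
    sd_diff = pstdev(diffs)
    return sd_diff, 0.6745 * sd_diff


def relative_error_scale(k_judges, base_judges=20):
    """Factor by which the error of the AVERAGE rating changes with k judges.

    Assumption (not from the source): judges' errors are independent, so the
    error of an average varies as 1 / sqrt(number of judges).
    """
    return math.sqrt(base_judges / k_judges)


# Invented sums of ratings for four tasks, for illustration only.
first_ten = [100, 200, 300, 400]
second_ten = [110, 190, 310, 390]
mean_square_error, probable_error = error_of_sum_of_twenty(first_ten, second_ten)
print(mean_square_error, round(probable_error, 3), relative_error_scale(80))
```

Under the independence assumption, quadrupling the number of judges from 20 to 80 would halve the error of the average rating.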
[Arithmetic items preceding item 20: illegible. Legible fragments: 4 7 8 20 = + ×; 7 × 4 = 20 + 8.]

20. Counting that 25 dozen sheets of paper are worth ten cents, how many sheets of paper are worth a fifth of a cent?

Directions and samples the same as on page 161.

action        1 play .... 2 deed .... 3 mention .... 4 opinion .... 5 crime
avarice       1 ordinary .... 2 various .... 3 empress .... 4 frailty .... 5 greed
bearing       1 a large ring .... 2 behavior .... 3 cub .... 4 commendation .... 5 destination
allusion      1 aria .... 2 illusion .... 3 eulogy .... 4 dream .... 5 reference
dynasty       1 davenport .... 2 very unpleasant .... 3 framework .... 4 ruling family .... 5 engine
habitat       1 dweller .... 2 bodice .... 3 prodigality .... 4 habit .... 5 home
adversity     1 ill fortune .... 2 dialogue .... 3 advertisement .... 4 dislike .... 5 distemper
caprice       1 value .... 2 a star .... 3 grimace .... 4 whim .... 5 inducement
ignominious   1 seductive .... 2 not guilty .... 3 incontestable .... 4 ignorant .... 5 shameful
chastity      1 dissension .... 2 pursuit .... 3 eminence .... 4 purity .... 5 punishment

In each set of sentences, check the two which mean most nearly the same as the sentence printed in heavy type.

31. What a man has, so much is he sure of.
........ There’s many a slip ’twixt the cup and the lip.
........ He who hesitates is lost.
........ Look before you leap.
........ A bird in the hand is worth two in the bush.

32. Tho the knowledge they (the ancients) have left us be worth our study, yet they exhausted not all its treasures; they left a great deal for the industry and sagacity of after ages.—(Locke.)
........ Worth is wholly dependent on long use.
........ Build the present on a knowledge of the past.
........ Do not neglect the present in admiration of the past.
........ There is nothing new under the sun.

33. Cowards die many times before their death.—(Shakespeare.)
........ Fortune favors the brave.
........ Discretion is the better part of valor.
........ The valiant never taste of death but once.
........ They suffer more who fear than they who die.

34. Some books are to be tasted, others to be swallowed, and some few to be chewed and digested.—(Bacon.)
........ Reading is profitable to every one.
........ One should read only parts of some books, while others should be carefully studied.
........ Only a few books repay one for painstaking effort.
........ People’s tastes differ in books.

35. Write it on your heart that every day is the best day of the year.—(Emerson.)
........ There is no time like the present.
........ Never do today what you can put off until tomorrow.
........ Anticipation is better than realization.
........ A common delusion is that the present hour is not the critical, decisive hour.

36. Our virtues disappear when put in competition with our interests.—(La Rochefoucauld.)
........ A dog with a bone knows no friend.
........ My teeth are nearer than my kindred.
........ Virtue is its own reward.
........ A good friend is my nearest relation.

37. If men wish to be held in esteem, they must associate with those only who are estimable.—(La Bruyere.)
........ What a man does shows what he is.
........ You cannot always judge a man by his surroundings.
........ He who comes from the kitchen smells of its smoke.
........ If you always live with those who are lame, you will yourself learn to limp.

38. We too often forget that not only is there a soul of goodness in things evil, but very generally also a soul of truth in things erroneous.—(Spencer.)
........ Falsity frequently has a nucleus of reality.
........ Beliefs that are shown to be untrue may, nevertheless, be based on some element of truth.
........ Benevolence sometimes has evil consequences.
........ Evil is commonly due to error.

39. They build too low who build beneath the stars.
........ Not failure, but low aim is crime.
........ Hitch your wagon to a star.
........ He that strives to touch a star often stumbles at a straw.
........ Wouldst thou reach stars because they shine on thee?
The paragraph for task 40 is “Every Home Needs a Garden,” on page 165.

40. Copy the four words which most fully state the purpose of the X. Y. Z. magazine.

THE CONSTRUCTION OF COMPOSITE TASKS

With the knowledge gained in the course of our investigations, we could now construct composite tasks for use in measuring altitude or intellect which would be much superior to these. But these will serve reasonably well.

If we had begun our work with the knowledge which we now have, we should also have proceeded somewhat differently in their construction. The procedures which we did use will consequently be reported here only very briefly. We shall preface them by a description of a more efficient and economical method of construction of such composite tasks, which we recommend for the future.

It is as follows: Select the special abilities which together constitute the sort of intellect (call it intellect abc . . . n) for which composite tasks are to be constructed. Select a sufficient number of single tasks to provide one hundred for each special ability that is included at each twentieth of the total range of intellect abc . . . n from the lowest thousandth of human adults to the highest thousandth (or the proper segment of such a collection, if the tasks are to cover only a part of this range). In this selection you trust your own knowledge and judgment. Have twenty or more competent judges rank these tasks for intellectual difficulty for the group whose intellect abc . . . n you plan to measure by the tasks. Let them use as fine a scale as is convenient up to two hundred compartments, and require the use of approximately the same number of compartments by each judge (say, 150 to 200, or 75 to 100, or 60 to 75, or 45 to 60, or 32 to 45, or 25 to 32, or 18 to 25). Express the results of this consensus by simple summing.
Arrange the single tasks in order of difficulty as estimated by the consensus, and in series each representing the same special ability (unless some better way is found to insure that persons to be tested understand the general nature of the tasks, and do not fail because of misunderstanding directions). Test with a cross-section of these tasks from fifteen hundred to twenty-five hundred individuals, taking about two hundred from each of ten groups selected to represent different altitudes of intellect abc . . . n, such as college graduates, pupils in grade 12, pupils in grade 9, . . . adults of mental age 4. Let the tasks used always begin at a point where 95% of the group of two hundred can succeed with at least four out of five of the tasks. Be sure that each individual has sufficient time. It will be found most convenient to have each individual in the group attempt all of the tasks used with that group.

Enter the score as c, x, or - (correct, wrong, or omitted) for each individual in each group for each task. Find the percent of successes for each task in each group. Make up composites containing 2 tasks of a, 2 tasks of b, 2 tasks of c . . . 2 tasks of n, putting in one such composite tasks most nearly alike in difficulty. Call such a composite a 2n-composite. Find the percent of successes for each 2n-composite in each group which was tested by all its tasks. Plot the successes and failures³ in each 2n-composite in at least one group⁴ against the total score (number of tasks correct), and compute the overlapping of the failures past the median of the successes in that group. Compute the biserial r. Combine the 2n-composites into 4n or 6n or 8n or 10n or 12n composites, using 2n-composites which are neighbors in difficulty, and making each composite large enough so that its $r_{ti}$ will be at least .90 for a grade population or other group of approximately the variability of a grade population.
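The scoring of a single 2n-composite can be sketched as follows. The sketch assumes the marks are stored as the strings "c", "x", and "-", takes a success to be n or more right out of 2n (as the accompanying footnote defines it), and reads "the overlapping of the failures past the median of the successes" as the percent of failing cases whose total score exceeds the median total score of the succeeding cases. That reading, the function names, and the example data are assumptions made for illustration.

```python
from statistics import median
from typing import Sequence


def is_success(marks: Sequence[str]) -> bool:
    """A case succeeds when at least half of the 2n tasks are marked 'c'."""
    right = sum(1 for mark in marks if mark == "c")
    return 2 * right >= len(marks)


def percent_successes(cases: Sequence[Sequence[str]]) -> float:
    """Percent of the cases in a group that succeed with the composite."""
    return 100.0 * sum(is_success(case) for case in cases) / len(cases)


def overlap_of_failures(cases: Sequence[Sequence[str]],
                        total_scores: Sequence[float]) -> float:
    """Percent of failures whose total score exceeds the median total
    score of the successes (one reading of 'overlapping')."""
    succ = [s for c, s in zip(cases, total_scores) if is_success(c)]
    fail = [s for c, s in zip(cases, total_scores) if not is_success(c)]
    if not succ or not fail:
        raise ValueError("need at least one success and one failure")
    cut = median(succ)
    return 100.0 * sum(1 for s in fail if s > cut) / len(fail)


# Hypothetical group of four cases on a composite of four tasks (n = 2),
# with each case's total score on the whole test.
cases = [
    ["c", "c", "x", "-"],  # 2 right: success
    ["c", "x", "x", "-"],  # 1 right: failure
    ["c", "c", "c", "c"],  # 4 right: success
    ["x", "x", "-", "-"],  # 0 right: failure
]
totals = [30, 34, 35, 12]
pct = percent_successes(cases)
ovl = overlap_of_failures(cases, totals)
print(pct, ovl)
```

The biserial r called for in the text is not computed here; the sketch covers only the counting steps that precede it.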
How large composites will be needed can be judged from the size of $r_{ti}$ for the 2n-composites, the self-correlations of the 2n-composites, and the self-correlation of the measure of i. This last⁵ should be approximately 1.00.

The resulting composites should be nearly or quite as satisfactory for measuring intellect abc . . . n as the 40-composites described in this chapter are for measuring Intellect CAVD. The $r_{ti}$ for any one of them should be very close to 1.00 for all adults, or for any group of the same chronological age. All the tasks in any one of them will be enough alike in difficulty to seem neither much too easy nor much too hard to those for whom the composite as a whole is suitable.

The same procedure may be followed in constructing levels for any ability which has what we have termed "altitude," that is, which has to master tasks varying in difficulty. The difficulty may be in words that are harder to spell, that is, require a higher altitude of spelling ability for success; or in temptations to dishonesty that are harder to resist, that is, require a higher altitude of honesty to pass; or in hundreds of other sorts of tasks. But wherever the concepts of difficulty and altitude are applicable, this method of constructing measuring instruments is applicable.

³ A success in a 2n-composite is a case which has n or more right. A failure is a case which has fewer than n right.

⁴ Use the group which most nearly approximates 50% of successes with the 2n-composites.

⁵ Let $r_{ti}$ = the average r from the 2n-composites,
$r_{tt}$ = the average self-correlation for a 2n-composite,
$r_{ii}$ = the self-correlation of the measure of i,
$r_{xx}$ = the average self-correlation of a composite necessary to produce an $r_{ti}$ of .90.

Then

$$.90 = \frac{r_{ti}}{\sqrt{r_{tt}\, r_{ii}}}\,\sqrt{r_{xx}\, r_{ii}}$$

and n, the number of 2n-composites necessary to produce a self-correlation of $r_{xx}$, can be found from

$$r_{xx} = \frac{n\, r_{tt}}{1 + (n-1)\, r_{tt}}$$
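The calculation in footnote 5 can be worked through numerically. The sketch below assumes the footnote intends the usual correction for attenuation together with the Spearman-Brown lengthening formula, so that the required self-correlation is $r_{xx} = (.90)^2\, r_{tt} / r_{ti}^2$ (the self-correlation of the measure of i cancels) and the number of 2n-composites is $n = r_{xx}(1 - r_{tt}) / [r_{tt}(1 - r_{xx})]$. The function names and the example values .60 and .40 are hypothetical.

```python
import math


def required_self_correlation(r_ti: float, r_tt: float,
                              target: float = 0.90) -> float:
    """Self-correlation r_xx a lengthened composite needs so that its
    correlation with the measure of i reaches `target`.

    Assumes target = r_ti * sqrt(r_xx / r_tt), i.e. the attenuation
    relation with the self-correlation of i cancelling out.
    """
    r_xx = (target ** 2) * r_tt / (r_ti ** 2)
    if r_xx >= 1.0:
        raise ValueError("target cannot be reached by lengthening alone")
    return r_xx


def composites_needed(r_tt: float, r_xx: float) -> float:
    """Solve the Spearman-Brown formula r_xx = n*r_tt / (1 + (n-1)*r_tt)
    for n, the number of 2n-composites to combine."""
    if not (0.0 < r_tt < 1.0 and 0.0 < r_xx < 1.0):
        raise ValueError("self-correlations must lie strictly between 0 and 1")
    return r_xx * (1.0 - r_tt) / (r_tt * (1.0 - r_xx))


# Hypothetical averages for a set of 2n-composites.
r_xx = required_self_correlation(r_ti=0.60, r_tt=0.40)
n = composites_needed(r_tt=0.40, r_xx=r_xx)
n_whole = math.ceil(round(n, 6))  # round first to absorb floating-point noise
print(round(r_xx, 3), round(n, 2), n_whole)
```

With these illustrative values the required self-correlation comes out near .90 and n near 13.5, so fourteen 2n-composites would be combined. Smaller values of $r_{ti}$ quickly make the target unreachable, which the first function reports as an error.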
At the outset of our studies, we lacked the knowledge of how often and how far a consensus of expert judges could be trusted in its estimates of intellectual difficulty, and the knowledge of how many single elements are needed to give a reliable measure of intellectual difficulty, and the knowledge of the essential impossibility of measuring the intellectual difficulty of any single small task. So we did not proceed in the way outlined above, but began with single small tasks, estimated their difficulty by the percent of various groups which succeeded with each, and combined these into composites by special abilities, that is, into sets of ten or twenty completions of approximately equal difficulty, sets of ten or twenty arithmetical problems of approximately equal difficulty, and so on. The 40-element composites were made by putting together a 10-completion composite, a 10-arithmetic composite, a 10-word-knowledge composite, and a 10-sentence-comprehension composite, which were, as composite tasks, as nearly equal in difficulty as could be found in our material.

This method does have the advantage that we have means of conveniently measuring the difficulty of tasks in these four abilities separately, and have made many such measurements of value (these are reported in Chapter VIII). The disadvantages are that our composite tasks do not represent as narrow segments or slices of difficulty as they might have done, are not spaced apart as evenly as they might have been, and required much more labor in their construction than would have been the case by the other method.

We shall describe briefly the derivation of the word-knowledge composites of ten single tasks as a sample to show the nature and validity of the selection and the extent of the experimentation involved.
In the case of the others we shall simply present the evidence that the elements of each composite of ten (occasionally fewer) do belong fairly in that rather than in an easier or harder composite. We shall then even more briefly relate samples of the evidence by which these composites of ten were put into composites of forty. Finally we shall state the facts concerning the value of the composites of forty as intellectual tasks the difficulty of which we shall later measure.

10-COMPOSITES IN WORD KNOWLEDGE OR V

Consider the tasks shown below. Each 'Level' or 10-Composite is, by our definition of difficulty, harder than the preceding for such a group as persons twelve to twenty years old or older who have lived in the United States five years or more, since a smaller percentage of them will get five or more of the ten elements right. The difficulty is 'intellectual' to the extent that within any sub-group of equal age the greater intellects will show higher percents correct than the smaller intellects in the case of any word.

It may seem far-fetched and forced and an unhappy consequence of our definitions to argue thus that it requires more intellect to know such words as cloistered, madrigal and ignominious, than to know such words as confess, advertise and combat. A dull person, it may be said, could learn the former as well as the latter; and it is a matter of range rather than level that he does not. There is much force in this criticism, and we chose the case of Word Knowledge as one illustration of the measurement of difficulty, in order to state the answer to the criticism.

Word Knowledge is representative of many tasks of an informational character where many of the harder tasks might have been in the repertory of the dull so far as the essential difficulty of mastering them is concerned, but simply are not as a matter of observed fact.
They are not there because the greater intellect can learn more per unit of time and has learned more at equal age; range is positively correlated with level. Also there is, for any locality and epoch, a certain rough order of acquisition, whereby people usually do not progress to learn certain things until they have learned certain other things. The former are then 'harder' by our definition although, if customs had been reversed, they might have been easier.

Look at the first word in line 1. Find the other word in the line which means the same or most nearly the same. Write its number on the line at the right side of the page. Do the same in lines 2, 3, 4, etc. Lines A, B, C, and D show the way to do it. Do all the lines you can. Write only one number for each line.

A. beast   1 afraid   2 words   3 large   4 animal   5 bird
B. baby   1 cradle   2 mother   3 little child   4 youth   5 girl
C. raise   1 lift up   2 drag   3 sun   4 bread   5 deluge
D. blind   1 man   2 cannot see   3 game   4 unhappy   5 eyes

LEVEL V1

Begin:
1. await   1 pace   2 slow   3 wait for   4 tired   5 quit
2. beautify   1 make beautiful   2 intrude   3 exaggerate   4 insure   5 blessed
3. bug   1 insect   2 a vehicle   3 fiber   4 abuse   5 din
4. arrange   1 put in order   2 hasten   3 distance   4 frighten   5 charge
5. different   1 not the same   2 quarrelsome   3 better   4 complete   5 not here
6. cotton   1 cloth   2 small bed   3 hut   4 flour   5 herd
7. blacken   1 a fern   2 interpose   3 impel   4 make black   5 slack
8. ablaze   1 ostensible   2 on fire   3 slightly   4 loaf about   5 urbane
9. avenue   1 justice   2 arrival   3 street   4 jury   5 library
10. bench   1 tool   2 pull ashore   3 opinion   4 seat   5 pond

LEVEL V2

11. confess   1 agree   2 mend   3 deny   4 admit   5 mingle
12. backward   1 downwards   2 after   3 toward the rear   4 defense   5 arrears
13. advertise   1 detain   2 explore   3 give notice of   4 adverse   5 newspaper
14. combat   1 fight   2 dismay   3 club   4 expedition   5 comb
15. blond   1 polite   2 dishonest   3 dauntless   4 coy   5 fair
16. broaden   1 efface   2 make level   3 elapse   4 embroider   5 widen
17. chubby   1 indolent   2 obstinate   3 irritable   4 plump   5 muscular
18. concern   1 see clearly   2 engage   3 furnish   4 disturb   5 have to do with
19. cargo   1 load   2 small boat   3 hem   4 draught   5 vehicle
20. clutch   1 exploit   2 nest   3 flit   4 grasp   5 cane

LEVEL V3

21. awe   1 lamb   2 fear   3 tool   4 mound   5 opera
22. aged   1 years   2 active   3 old   4 merciful   5 punctual
23. arrive   1 answer   2 rival   3 enter   4 force   5 come
24. blunt   1 dull   2 drowsy   3 deaf   4 doubtful   5 ugly
25. accustom   1 disappoint   2 customary   3 encounter   4 get used   5 business
26. bade   1 gaze   2 a tool   3 fetched   4 wait   5 ordered
27. bog   1 ebb   2 disorder   3 swamp   4 field   5 difficulty
28. cascade   1 hat   2 waterfall   3 firmament   4 disaster   5 box
29. bray   1 cry of an ass   2 bowl   3 cry of an ox   4 frustrate   5 raven's cry
30. disembark   1 unearth   2 go ashore   3 dislodge   4 disparage   5 strip

LEVEL V4

31. conspire   1 plot   2 breathe   3 rely   4 die   5 outrun
32. check   1 error   2 stop   3 flash   4 rude   5 haste
33. cherish   1 dedicate   2 happy   3 covet   4 hold dear   5 marry
34. chirrup   1 aspen   2 joyful   3 capsize   4 chirp   5 incite
35. accessible   1 indefatigable   2 successful   3 limpid   4 easy to reach   5 liable
36. dingy   1 afraid   2 hostelry   3 small bell   4 midget   5 dirty
37. edible   1 auspicious   2 eligible   3 fit to eat   4 sagacious   5 able to speak
38. confound   1 discovered   2 fulfill   3 establish   4 mix up   5 expire
39. concur   1 agree   2 race   3 mongrel   4 pounce   5 ramble
40. contact   1 tactful   2 hate   3 injunction   4 touch   5 oversight

LEVEL V5

41. downcast   1 thrown down   2 neutral   3 judicious   4 sad   5 broken
42. pact   1 puissance   2 remonstrance   3 agreement   4 skillet   5 pressure
43. audible   1 festive   2 easy   3 audit   4 heard   5 downy
44. solicitor   1 lawyer   2 chieftain   3 watchman   4 maggot   5 constable
45. beguile   1 entreat   2 delight   3 dispense   4 deceive   5 foster
46. dominate   1 abide   2 goad   3 threaten   4 control   5 dissuade
47. average   1 level   2 count   3 evident   4 ordinary   5 distinct
48. behave   1 act   2 own   3 keep still   4 enable   5 entitle
49. comely   1 ignoble   2 handsome   3 disagreeable   4 enter   5 in time
50. cycle   1 scythe   2 cyclone   3 circle   4 ode   5 junction

LEVEL V6

51. action   1 play   2 deed   3 mention   4 opinion   5 crime
52. avarice   1 ordinary   2 various   3 empress   4 frailty   5 greed
53. bearing   1 a large ring   2 behavior   3 cub   4 commendation   5 destination
54. allusion   1 aria   2 illusion   3 eulogy   4 dream   5 reference
55. dynasty   1 davenport   2 very unpleasant   3 framework   4 ruling family   5 engine
56. habitat   1 dweller   2 bodice   3 prodigality   4 habit   5 home
57. adversity   1 ill fortune   2 dialogue   3 advertisement   4 dislike   5 distemper
58. caprice   1 value   2 a star   3 grimace   4 whim   5 inducement
59. ignominious   1 seductive   2 not guilty   3 incontestable   4 ignorant   5 shameful
60. chastity   1 dissension   2 pursuit   3 eminence   4 purity   5 punishment

LEVEL V7

61. gainsay   1 persuade   2 beshrew   3 deny   4 profit   5 imprint
62. eclogue   1 obituary   2 a poem   3 carousal   4 epigram   5 portrait
63. cloistered   1 miniature   2 bunched   3 arched   4 malady   5 secluded
64. reciprocal   1 saturnine   2 mutual   3 receptive   4 morose   5 careless
65. accolade   1 salutation   2 anchovy   3 procession   4 bivouac   5 acolyte
66. benighted   1 fraudulent   2 weary   3 insuperable   4 ignorant   5 venal
67. madrigal   1 song   2 mountebank   3 lunatic   4 ribald   5 sycophant
68. pinnace   1 a boat   2 doublet   3 pinnacle   4 hold fast   5 forfeiture
69. broach   1 dodge   2 clasp   3 open   4 top   5 edify
70. nectarine   1 bouillon   2 a fruit   3 a jewel   4 a drink   5 diurnal

Intellectual tasks range in this respect between two extremes. At one extreme the tasks are, in and of themselves, almost or quite impossible for the dull person regardless of which things the world tries to teach him. At the other the tasks are such as he can master nearly or quite as easily as he can master any intellectual tasks, the question being rather how many a dull person can master at a given age or with a given set of opportunities. For example, two of our very hard word tasks are:

reciprocal   1 saturnine   2 mutual   3 receptive   4 morose   5 careless
nectarine   1 bouillon   2 a fruit   3 a jewel   4 a drink   5 diurnal

A person twenty years old with a mental age of four not only would not know the meaning of reciprocal, but also probably never could be taught it. The idea involves thinking of things by aspects and in relationships in a way that is probably beyond his degree of intellect. He would not, save in rare instances, know nectarine; but with proper training he could know nectarine instead of some word, say apple, which he does know.

Theoretically it is best to measure level or "altitude" of intellect by tasks that lie toward the former extreme; and for practical purposes also, we may, in general, expect better results per hour of time spent from using such.
They are likely to involve more of intellect, and to be less adulterated by other influences than intellect, and to be more representative of level and less of width or range.⁶

However, the standard tests used for measuring intelligence contain tasks that range far toward the other extreme, and it is obviously desirable to measure the difficulty of these tasks and ascertain how much of it is due to intellect pure and simple, and how much of it is due to other factors.

Word Knowledge is a specially suitable case for study, because it has been approved by Terman as one of the very best single measures of intellect, and is involved to some degree in many of our better tests, such as oral and printed directions, paragraph reading or comprehension, sentence completion, opposites, and other tests of relations presented in words.

We began with four hundred words chosen originally to make an instrument for measuring word knowledge without regard to the merits or demerits of any one of them as a measure of intellect.

The selection amongst these was made solely on grounds of the percentages right in certain groups, the end sought being to have for any one level word-tasks which were approximately equally hard in the sense of being done correctly by approximately equal percents of the group; and to have, at the next higher level, words which were done by fewer of the group.

⁶ These matters will be treated in connection with new experimental data, to be presented in Chapter XV. We shall there see that the theoretical and practical advantages are much less than has been supposed.

The procedure was as follows: 400 words, ranging from very common words to words outside the first twenty thousand as listed in the Thorndike Teachers Word Book, were used in the case of 278 pupils in grade nine.
On the basis of the percents correct, 110 of the tasks were chosen,

10 done correctly by 276 or 277, or 99.3 to 99.6% of the pupils
10 done correctly by 271 to 273, or 97.1 to 97.8% of the pupils
15 done correctly by 257 to 261, or 92.4 to 93.9% of the pupils
15 done correctly by 228 to 236, or 82.1 to 84.9% of the pupils
15 done correctly by 185 to 194, or 66.6 to 69.8% of the pupils
15 done correctly by 134 to 143, or 48.2 to 51.5% of the pupils
15 done correctly by 79 to 90, or 28.4 to 32.4% of the pupils
15 done correctly by 37 to 51, or 13.3 to 18.3% of the pupils

These 110 tasks were experimented with in the case of 430 pupils in grades 11 or 12, 500 pupils in grades 9 or 10, 250 pupils in grade 8½, and 514 pupils in grade 6, and smaller groups of college students. From them were chosen the seven 'Levels' of ten tasks each shown above. Levels 1 and 2 were constructed chiefly on the basis of the results with the 514 pupils of grade 6. Levels 3, 4, and 5 were constructed chiefly on the basis of the results with pupils of grades 9 to 12. Levels 6 and 7 were constructed chiefly on the basis of the results with pupils in grades 11, 12, 13, and 17. The tasks within any one level vary in difficulty somewhat widely and it is possible that results from as many thousands as we have hundreds might show some tasks in adjacent levels which actually should be transposed.

Greater equality within and distinctness between levels could have been attained by reducing the number from ten to eight or fewer, but this did not, on the whole, seem desirable. The order of difficulty of these tasks varies so much from group to group, and so enormously from one individual to another that, at levels where a person gets from 20% to 80% right, the percent which an individual has correct from one of our sets of ten is probably a more reliable measure of the percent which he would have correct from a hundred tasks each of exactly the same difficulty as the median task of the ten than is the percent which he would have correct of the middle eight of the ten. Difficulty is taken in the above to be difficulty for the sort of persons who get about half right at the level in question.

The essential facts concerning the percentages correct for each of the 110 tasks are shown in Table 28.

TABLE 28
PERCENTS CORRECT FOR EACH SINGLE WORD OF SEVEN 10-WORD COMPOSITE TASKS IN EACH OF VARIOUS GROUPS OF INDIVIDUALS

Grade 6a 8% 9 SO ao 1g 12 City Nie Yer" ONS Ye Max: K K K, K, Number of Individuals 514 250 278 500 430 200 200 i await, ah ee O49 90.0 99.6 92.8 97.9 95.5 96.5 Se beautify. 9453 93.2 99.3 94.6 94,2 97.5 94.0 Gi DUS) eee ee 94.6 94.4 99.6 97.8 99.5 99.5 99.0 (ge .arranoe es 9610 95.6 99.3 97.2 99.3 98.5 98.5 OF different, 94.5 93.0 99.3 97.4 100.0 99.5 96.5 OMe COLLOT Ee 0314 93.2 99.6 96.4 98.8 98.0 96.5 12; blacken ...... mendes 94.4 97.5 98.0 99.3 98.5 98.0 Ve ablaze) ee 89:9 95.6 97.8 94.6 99.3 99.0 97.0 1So avenue... 94:6 93.2 97.8 98.0 99.5 98.5 98.5 aif bench =... ee 9210 90.8 93.5 92.2 96.3 92.0 95.5 Moe CONLOSS) esc 62.4 86.0 93.9 92.2 96.7 99.0 98.0 Zo) backward ==. 70)9 88.4 92.4 87.6 90.5 95.0 94.5 26" advertise 69.0 82.0 93.1 79.6 88.8 89.0 89.0 2SEcombat) a2 59.6 88.4 92.4 89.2 97.4 99.0 99.0 308 blond) a2. Sees 62.4 63.2 92.8 87.2 96.0 97.5 98.0 Silly broadent- 62.9 83.2 93.1 94.6 99.1 97.5 98.0 S28 (chubby, = 64.6 78.8 93. 92.4 95.8 97.5 98.5 Som CONCOMn: 2 eee es 65.1 74.0 93.5 87.6 94.7 97.0 95.5 34 cargo 67.1 89.2 93.9 84.0 89.1 92.5 95.5 Son clutch. ee. 6019 80.4 92.4 89.8 94.0 97.5 97.5 S68 Gwen es. ee 29.4 62.8 82.4 69.0 83.5 89.0 86.0 STP saced! 2 mots 69.6 33.5 73.8 85.8 88.5 90.0 SOF Jarrive: poe 4550) 68.8 83.6 63.8 68.8 73.5 74.5 4.0% sblunte 2 eee: 41.0 66.8 84.5 85.8 92.3 96.5 94.0 41 accustom ........ ee 45.6 62.4 82.4 52.0 68.6 70.0 68.0 A= Dade a. ble 45.1 co 84.9 72.6 82.8 84.5 83.5 43 56.8 84.2 66.6 79.3 87.5 88.0 44 cascade: =... 
2 =ts 3919 56.8 82.1 65.6 75.3 87.5 92.5 * Omitted because of a misprint in test.186 on Ww Jad fed et et a Eoocookf OO MMA COM DH HD AAmAMmM ONAN N COoORNDODMANDOCAAMAWHNHONAKEH PF © THE MEASUREMENT OF INTELLIGENCE Grade City Number of Individuals bray disembark conspire COC Ke ere cherish <2-6s chirrup) ACCESSIDIE -....-.-sece1e-- CLT Oy pre reeeers edibles 2 confound concur econtacy: downcast pact audible solicitor beguile dominate average . behave comely cycle action avarice bearing . allusion .. dynasty . habitat adversity caprice ignominious chastity gainsay eclogue cloistered reciprocal accolade benighted madrigal ...... DUNN ACG eee cess proaches a TLOCLATIN) siscsececccrnsrece ate 40.8 21.1 21.3 19.2 5.0 45.0 16.4 "9 9Fi0 11412 Mix. 48. 48. 31. 29 29 32.4 31.7 32.1 32.4 28.8 29.2 17.6 16.9 32.1 30.9 13.3 13.3 14.7 17.6 13.3 18.3 18.3 pe GOss SA’ OD Ww Ww bo Ol K 500 79.4 65.4 61.0 50.6 48.4 66.0 54.8 71.0 57.2 43.2 61.6 57.2 42.4 29.2 38.8 39.8 44.4 42.2 52.1 39.8 39.0 37.6 23.4 31.0 29.0 22.8 23.8 26.0 22.6 21.2 17.6 25.2 18.8 23.8 10.8 11.0 11.8 7.5 8.2 8.4 14.6 6.8 K 12 K, 93.0 89.5 94.5 80.5 72.0 73.0 94.0 93.5 91.5 52.5 83.5 85.0 64.0 17.5 83.0 71.0 47.5 79.0 72.5 70.0 62.5 64.5 46.5 60.5 54.0 43.0 70.5 54.0 67.5 55.0 41.5 64.0 30.0 33.0 31.0 26.0 15.0 16.0 21.0 14.5 39.0 13.5 * Omitted because of a misprint in test. 13.5 34.5 12.0LEVELS OF INTELLECT 187 Ninety tasks were chosen to represent harder words than level 7, and were used with one hundred college gradu- ates. From these ninety, four composites of ten each were chosen to be most alike in difficulty within a ten and most widely apart between tens. These four sets of ten were used with 240 college graduates who were also tested with levels 6 and 7. The results are shown in Table 29. We thus obtain level 8 of about the same difficulty as 7, and levels 9, 10, and 11 progressively harder. 
These levels from 1 to 11 are competent to measure word knowledge from below the level of the average ten-year-old to far above the level of the average college graduate.

Composites 1a, 2a, 3a, 4a, 5a, 6a, and 7a, of approximately the same difficulty as 1, 2, 3, 4, 5, 6, and 7, were constructed by testing many pupils in grades 6, 8½, 9, 10, 11, 12, and 100 college graduates with composites 1 to 7 and also with 240 new tasks, obtaining the percents succeeding with each of the 310 and selecting sets of ten from the 240 to match sets 1, 2, 3, 4, 5, 6, and 7, respectively. The facts are shown in Tables 30 and 31.

At the low end of the ability, the four sets A, B, C, and D shown below were constructed by selection from about twice as many on the basis of trials with 180 individuals 16 years old or older of mental age from 2 to 4. The facts are shown in Table 32.

Composites of ten intermediate between D and I were constructed on the basis of the ratings of about 160 single tasks by the consensus of twenty experts, and trials of these with a hundred adults of mental age 6.0 to 7.0, with 50 feeble-minded individuals in the same educational "class" in an institution for the feeble-minded, with 101 pupils fifteen years old or over in special classes in a large city, and with 162 pupils in grade 4B (second half). The facts concerning these word-knowledge tasks appear in Table 33.

These composites intermediate in difficulty between V D and V I are imperfect in three respects. The difficulty of each single task element is not determined from enough cases. The oral picture selection tests are not equated accurately enough with the oral word-selection tests. The difficulty of written word-selection tests has not been equated accurately against the difficulty of the same sort of test given orally.
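The matching of composites 1a to 7a against composites 1 to 7, described earlier in this passage, can be sketched in a few lines. The text does not say how the sets of ten were picked beyond matching on the percents succeeding, so the rule used here (take the new tasks whose percents correct lie nearest the level's target percent) is an assumption, as are the function name `match_set`, the target value, and the example percents.

```python
from typing import Dict, List


def match_set(target: float, candidates: Dict[str, float],
              size: int = 10) -> List[str]:
    """Pick the `size` candidate tasks whose percents correct lie nearest
    to `target`, the percent correct typical of the level to be matched."""
    if size > len(candidates):
        raise ValueError("not enough candidate tasks")
    ranked = sorted(candidates,
                    key=lambda task: abs(candidates[task] - target))
    return ranked[:size]


# Hypothetical percents correct for five new tasks in one group.
new_tasks = {"w1": 93.0, "w2": 61.5, "w3": 90.5, "w4": 97.0, "w5": 88.0}
chosen = match_set(target=92.0, candidates=new_tasks, size=3)
print(chosen)
```

A fuller version would match on the percents in several groups at once and would remove a task from the pool once it had been assigned to a level; the sketch shows only the single-group case.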
In general, we have devoted most of our work in the preparation of composite tasks to making effective instruments to measure altitude of Intellect CAVD from an altitude corresponding roughly to a mental age of ten up to very high levels. Our work with composites at lower levels has been aimed first at demonstrating that Intellect CAVD can be measured at the altitude of low imbecility, and that we can, subject to certain limitations, locate an absolute zero point for intellect and so, by later studies which will bridge the interval between imbecility and our level I, attach approximate absolute values to all the levels. We have not been able to give adequate attention to the construction of CAVD composites to bridge this interval and our composites between D and I are not so well made as the easier and harder ones.

LEVEL 1A

Begin:
1. boyhood   1 childhood   2 mischief   3 hardihood   4 cap   5 cherub
2. churchman   1 janitor   2 member of a church   3 elector   4 disciple   5 steeplejack
3. boyish   1 naughty   2 male   3 impudent   4 like a boy   5 informal
4. cocoa   1 chocolate   2 a drug   3 chrysalis   4 biscuit   5 trivial
5. bottomless   1 artless   2 deeper   3 unreasonable   4 ultimate   5 without bottom
6. assistant   1 orator   2 perseverant   3 progressive   4 at hand   5 helper
7. chauffeur   1 carter   2 stove   3 hot water   4 coachman   5 automobile driver
8. dine   1 sprawl   2 visit   3 make a noise   4 have dinner   5 bespeak
9. blouse   1 whisk   2 storm   3 below   4 pouch   5 waist
10. cafe   1 chaperon   2 theater   3 restaurant   4 flask   5 festivity

LEVEL 2a 11. dandruff 12. abashed 13. bethink dream.....2 molest....3 forget.....4 ascertain....5 call to mind 1 ruffle....2 seamp.....3 bald....4 dastard....5 disease of the scalp 1 1 14. 
comical 1 funny.....2 coming....3 placid......4 typical.....5 alert 1 1 1 ashamed....2 overpowered.....3 overlooked.....4 bruised....5 lowered excuse.....2 verdict...3 tribulation.....4 conclusion....5 disease held fast.....2 part of a wheel....3 stung....4 part....5 nestled among.....2 drenched.....3 middle.....4 lost.....5 partly 15. apology 16. clung 17. amidstbaste eauseless aster ballot rinse barge acquit cambric brawn appreciation alliance deceiver calculate childlike betwixt arafty outstrip available certify annihilate contentedly carcass console amen brawl debase adventurous adequate amiable ally benefactor bethought aperture ascribe default apparition appliance churlish sexton buckler animosity conflagration confidential e a ee H Se ee Be Re eH SRO CE EE oe ar | LEVELS OF INTELLECT 189 sew.....2 list......3 calico.....4 wallow.....5 dump eventual.....2 without reason.....3 ineffective.....4 highway.....5 faultless flower.....2 bitter......3 matin.....4 star....5 guilder LEVEL 3A song.....2 vote.....3 ammunition......4 dance.....5 award seald.....2 wash.....3 smear....4 wrench.....5 grin seaport.....2 knock.....3 tonnage.....4 expansive.....5 boat do.....2 free of blame....3 leave.....4 aquatie.....5 pipe brittle.....2 linen.....3 moccasin.....4 leather. strength.....2 brood.....3 brine.....4 burnt forbearance.....2 accomplishment.....3 5 sermon 2 enchantment.....3 slander detective 3 spy.....4 cavalier 3 plaster LEVEL 44 3 foolish 5 erochet 5 bolster speech.....4 sympathetic recognition...... 
league 4 hypocrisy.....5 assembly 2 illusion 5 cheat marvel.....2 administer 4 reckon......5 convene innocent......2 saucy 4 piteous.....5 affectionate confused.....2 braided.....3 between.....4 bewitched.....5 pinched meager.....2 difficult....3 adjacent....4 sly....5 artistic subside.....2 outer edge.....3 outskirt....4 satiate....5 out-run hidden.....2 at hand.....3 economical....4 lamentable.....5 useful exhort.....2 ascertain.....3 boast....4 fuse....5 assure dead.....2 crucify.....3 enamor....4 nihilist....5 destroy fully......2 heretofore....3 without a stop....4 cheerfully.....5 massy mold.....2 body 4 rind alone......2 3 cargo 3 visit 5 hold of a ship 4 thin sole LEVEL 5a 3 proverb....4 farewell 3 hoot.....4 quarrel 3 chastise 3 bold conscientious qualify 5 soothe so be it pouch.....2 roast degrade.....2 base clamorous.. 2 hymn 5 communion 5 le at length 4 blaspheme 4 travel 5 unfounded 2 casual capricious......2 tractable league......2 patron 5 advancing 4 added 4 pleasing 4 factor 4 sexton....5 advantage 3 enough 2 trusty.....3 passionate associate 2 churchman 5 water supply 5 odious 3 council 5 navigator 3 tourist perhaps.....2 credulous......3 forget....4 bewildered.....5 considered through....2 precipice.....3 opening....4 raiment....5 opportunity LEVEL 6A attribute.....2 pertain.....3 clerk.....4 write....5 upbraid defeat.....2 blame.....3 failure....4 libel...5 displace ghost.....2 insurrection.....3 apparent.....4 farce....5 apparel request.....2 adjustment.....3 conformity.....4 device.....5 pliant craven.....2 rude.....3 reckless.....4 contemptible.....5 envious cube.....2 janitor....3 compass.....4 archbishop....5 six singers keel....2. 
servant.....3 stag.....4 shield.....5 seraper hatred.....2 animation carnival....2 3 disobedience celebration respectable......2 4 diversity......5 friendship 3 decoration with flags....4 contagion 3 sensitive....4 secret....5 confident 5 fire secure bo DO bo PO dO LO : oo Qo i) Oo to or D ~“ sent wei 30 aero: nen en dD A a0 nO 37 ed 8 oOo, 40 00 51 52 53 54. cr on d8 xo sense 58 59 2200190 THE MEASUREMENT OF INTELLIGENCE LEVEL 7A searchev.....2 forger....3 chaplain.....4 clerk.....5 sceptic cup.....2 binnacle.....3 beak.....4 slanderer.....5 bottle populate.....2 free....3 prominent.....4 rival....5 come pier......2 coach.....3 postern.....4 gable...5 headdress jubilant.....2 bitter....3 maritime....4 ungracious....5 purple 1 serivener 2. beaker 63. emanate 64. landau . amaranthine 6. athwart 67. conscientious 68. ingenuous 9. betimes 0. lambrequin O1 alongside......2 above.....3 alert.....4 across......5 thwarted guilty......2 cautious.....3 efficient....4 good....5 knowing ungenerous......2 unselfish.....3 dull...4 frank....5 unthinking hereby......2 sometimes.....3 meantime....4 early.....5 now and then knapsack.....2 drapery.....3 raw wool....4 matting....5 chandelier BeBe ee ee TABLE 29 PERMILLES CORRECT IN THE SINGLE TASKS oF WorpD KNOWLEDGE 10—Composite Tasks 8, 9, 10 and 11 T.C. Grad. L. Grad. T.C. Grad. L. Grad. n=100 n=240 n=100 n=240 8 10 1. monomania 550 392 1. shrievalty 250 283 2. saturnalian .......... 520 375 2. sessile 210 179 3: pristine <.......... “510 42] 3. teleological 210 221 4. quaternion . 540 346 4. peccancy 210 358 Oo. predatory ............ 620 571 5. cacophony 240 413 6. persiflage ............. 500 521 6. pediment 250 254 7. encomium .............. 480 600 7. licentiate “i 190 154 Sreapattoir.-- == 480 613 8. ambulatory = ea 317 9. meticulous ..... ey oL0 658 9. murrain ot 200 133 10° largess 2: 500 429 10. cantilens: =. 230 288 9 11 radial =a 400 408 1. saltatory ........... 7 L990 121 2. sequestrate .......... 350 529 2. 
amerce am VEO 154 S.tactility: —-. ~ 360 204 3. Gistrains =<... 130 458 ARADO CCH ee 320 363 4, besomy eos “090 154 D. nugatory on 320 525 a. Thodolite...- 090 138 6. sedulous ............... 350 363 GCSrInG eee = 30 112 iemumbel’ oon s. 2. (850 129 7. hermeneutic ....... 100 021 8. asseveration ....... 340 254 8. devolution ......... 070 046 Peapjure «= B40 342 9. palindromic ........ 100 112 10° auricular ....... 320 321 10. carmagnole ....... 120 120 SanaaEEe eeLEVELS OF INTELLECT 191 TABLE 30. PERMILLES CORRECT FOR EACH SINGLE WORD OF THE SEVEN 10-WorpD CoMposItTE TASKS 1A, 2A, 3A, 44, 54, 6A, 7A IN EACH OF VARIOUS GROUPS OF INDIVIDUALS. 6b 6e 81% 9k 10k 11k 12k n=139 n=105 250 306 311 224 195 ee DOYDOOd: seen 777 990 904 917 927 933 933 AeeCOULCHMANY sence. 820 971 928 933 960 964 970 SE eDOVISH 22 ees 77 942 932 891 940 946 964 Ae COCOR, eric 805 990 848 911 921 937 949 5 bottomless aes 683 933 935 968 982 982 979 Nee OSSIStant, ee 640 895 952 952 960 982 985 OF Cha uthour® ncesisscssescescd 604 942 976 968 976 991 979 10 dine ime ees ee 626 933 952 952 972 996 990 13 blouse ees ae 604 914 956 968 966 996 990 15 cafe bene 546 933 932 981 976 991 990 Die Gandrun, 2. 590 790 896 965 969 991 990 16 abashed . see 661 628 752 757 828 812 872 17 bethink Siac 460 752 892 863 886 875 923 ree COMICAL ec. cerca 554 809 952 964 985 996 995 23 apology 446 834 964 912 921 937 954 24 clung 496 866 928 967 966 991 970 31 amidst Re ee 446 781 856 843 892 914 923 32 baste Leeds 446 743 640 824 857 914 913 34 causeless Pe 410 790 820 819 914 954 970 39 aster ee 417 657 532 889 950 946 659 Bow Dallot <2. S 424 514 756 tie 824 825 816 OOM TINS) ects cc 388 581 676 637 683 749 852 AI DAYPe) jee 395 638 836 752 737 888 831 OM ACQUIG Gee en 453 457 724 523 647 852 841 47 cambric oe 460 343 504 706 747 861 887 58 brawn sau cae 316 A486 736 569 700 834 846 59 appreciation ........... 374 571 708 676 786 847 821 Gilmvalliance) 22200005. 
244 447 728 667 728 830 826 64 deceiver i ok 252 609 732 775 721 812 826 86— calenlate: 2-4-2 093 371 720 598 728 843 836 36 childlike Sots eet 496 324 356 500 528 602 718 26) betwixtt soe. ee 230 343 464 572 631 772 785 Dee Cratty qe oo 273 457 752 542 583 669 657 GOR Outstripi. 244 324 660 539 670 727 713 fe available: 4.3. 173 257 504 494 715 852 852192 THE MEASUREMENT OF INTELLIGENCE TABLE 30—Continued. 6b 6¢e 8% 9k 10k b 11k 12k n-139 n=105 520 306 311 224 195 Prices 108 447 532 548 570° 683° San 78 penitilate 201 171 507 667 825 841 80 contentedly 302 486 588 549 686 754 785 94 carcass Beis 144 352 580 549 615 731 881 113 console 065 228 556 509 663 785 821 50 amen 237 466 564 425 480 629 559 54 brawl 266 305 568 350 441 598 636 79 debase : 273 267 432 294 438 665 682 84 adventurous 122 257 432 399 486 500 584 89 adequate 187 114 336 363 425 598 657 93 amiable 209 238 448 363 460 624 667 100 ally 201 219 416 366 44] 611 672 103 benefactor 173 314 384 355 409 558 652 108 bethought 137 152 520 359 502 549 616 109 aperture 093 133 416 342 460 566 616 63 ascribe E 345 324 256 275 316 31 416 69 default es 108 219 256 271 316 352 390 85 apparition 151 124 360 164 219 415 605 88 appliance Ss 165 162 224 157 267 406 498 LOW churlish) 2.3 230 162 292 229 283 312 359 107 sexton 216 162 300 228 332 379 462 112 buckler é 165 228 220 211 267 526 374 125 animosity ........ 065 124 364 176 264 388 482 137 conflagration ............ 022 048 260 160 293 459 451 138 confidential . Z 124 057 216 121 216 357 457 C53 scrivener ....... 058 146 158 185 C73 beaker . 106 158 231 431 C76 emanate ....... 067 091 098 154 OW9 landan = 8. 080 101 133 190 C83 amaranthine ........ 102 126 150 159 OiSSathwart ees 067 101 197 113 C89 conscientious . 061 126 115 195 @:90 ingenuous ................... 128 154 171 195 ©'93 betimes ................ 
054 032 051 082
C 95 lambrequin 096 066 098 149
* Omitted because of misprint in test.

THE CONSTRUCTION OF 10-COMPOSITE TASKS IN SENTENCE COMPLETION, ARITHMETICAL PROBLEMS, AND THE UNDERSTANDING OF SENTENCES AND PARAGRAPHS

The 10-composites for C, A, and D were constructed by the process of trying many single tasks with various groups and selecting tasks of similar difficulty, which has been described and illustrated in the case of V. Only the main results will be presented here. They are in the form of tables giving the percent of successes for each single task in each group. The constitution of the group sometimes varies within a table, because sometimes in a certain group some tasks would be assigned to only a part of the group. Where this is the case, the fact is noted by printing the new n in the body of the table. The n at the top of a column applies to all entries in that column unless a second n appears in the column. If a second n appears in the column, it applies to all entries below it unless a third appears; and so on. Percents are strictly comparable only where they are for the same n.

TABLE 31.
PERMILLES OBTAINING FIVE OR MORE RIGHT OUT OF TEN IN THE VOCABULARY COMPOSITES 1, 1A, 2, 2A, 3, 3A, ETC.

Grade     5½      8½      9       10      11      12
          N.Y.    N.Y.    K       K       K       K
n =       148     250     1089    723     769     643
V 1       993     980     993     996     997     994
V 1a      993     976     993     993     999     997
V 2       905     920     989     989     996     994
V 2a      959     972     995     991     999     1000
V 3       615     740     913     924     967     975
V 3a      601     864     914     929     977     987
V 4               645     749     801     936     946
V 4a              618     764     839     931     962
V 5               440     428     560     748     824
V 5a              448     473     604     801     846
V 6               152     129     290     473     560
V 6a              236     183     299     480     593
V 7               044     017     030     061     107
V 7a              060     017     031     104     137

The sentence-completion 10-composites are A, B, C, D, [letters illegible], M, N, O, P, and Q.
The main facts con- cerning these are shown in Tables 34, 35, and 36. We also TABLE °32. PERCENTS CORRECT IN THE SINGLE TASKS OF WorD KNOWLEDGE: COMPOSITE AND D. 180 ApuLT IMBECILES Ny Tasks A, B, C n= 100 n= 80 n=100 n=80 AS 76 81 Ore 26 36 2 71 79 2 25 15 3 74 75 3 24 31 4 76 80 + 24 49 5 76 89 5 23 19 6 73 85 6 22 221% 7 72 73 7 21 30 8 72 79 8 20 24 9 67 77% 9 21 25 10 67 76 10 21 30 133 al 48 55 iD) al 1 21 2 49 52% 2 15 9 3 51 61 3 11 9 + 46 62% 4 14 16 5 44 42% 5 15 12% 6 47 54 6 12% 7 40 36 7 14 17% 8 43 50 8 9 7% 9 4] 571% 9 6 6 10 39 56 10 + 12% have certain provisional completion 10-composites I-J and R which will be useful until better ones are constructed. Some of these composites and also some of the arith- metic and directions composites to be presently described could probably be improved by transfers of some elements. We have not made these transfers, because the gain would not be great and the labor of recomputing the composite-LEVELS OF INTELLECT 195 TABLE 33. PERCENTS CORRECT IN THE SINGLE TASKS oF WorD KNOWLEDGE E, F, G, AND H. n=100 n=50 n= 101 n=162 Spec. 4B VE 1 gasoline (picture selection) 59 100 _—— _—- 2 erayon as is 51 64 3 tresses os te 55 20 4 refrigerator ‘‘ ~- OAS a | ~ cy oO 68 27 93 100.0 88 87.7 95.1 95 80 27 89 97.5 84 96.9 oI te 82.7 67 50 © 84.0 80.2 90.7 82.7 75.3 86 92 85 85 68 94 70 82 19 29 12 17 ~ONRO = 97.5 98.8 98.1 100.0 98 91 92 96 86 78 82 84 35 22 22 23 ~OAO r= 65 80.2 87.7 92.6 82.1 64 82 80 76 37 27 34 25 ~l198 TABLE 35 (continued). M. A. 6 M. A. 6 M. A.6 4B Spec. FM. 4B F.M. Spec. 100 162 101 100 9 “ 50 101 16 n=100 MEASUREMENT OF OY tH © 69 48 DG 5 92.0 6: 7 87.7 38 47 ‘ t 10 76.5 62 57.4 13 11 21 = co é 94.4 90 76 60 44 2 é © 66.0 © re ~ 94.4 85 93 67.9 90.7 0.0 63.0 59.3 75 52 54 50 ec a © 10 10 67.9 36 WOr~ewADAS re INTELLIGENCE oo S © 45 42 34 79.6 DH 6 56 11 A H 54 2° © © w “wo 80.2 64 34 38 92.0 58. 
49 86.4 uw 60 54 46 48 40.1 47 38 42 St xt © 93.2 54 48 40 48.1 51 81.5 45 59.9 30 10 10LEVELS OF INTELLECT TABLE 36. THE PERMILLES SUCCEEDING WITH EACH SINGLE TASK OF VARIOUS 10-COMPOSITES OF SENTENCE COMPLETIONS. 6 (1) 6172). 6'(3) 6 (4) 199 5% 8% 17 or + n= 205 250 61 100 107 140 60 Crt 1 654 787 490 738 650 933 2 654 672 610 682 593 950 3 693 443 530 748 564 900 4 746 812 607 460 757 578 1000 5 634 804 639 380 682 564 950 6 634 824 623 460 626 585 800 n=162 n =80 iio n=104 7 649 840 667 362 787 510 933 8 639 796 543 500 773 779 983 9 580 800 617 413 667 519 891 10 “ol 832 630 587 667 693 983 CJ 1 541 580 630 437 720 712 941 n= 4 Ome N52, ni—)140F in —59 2 605 696 531 692 690 893 N= 375) ni Liem Sor ness 3 498 780 501 429 613 564 783 4 532 776 481 350 600 539 983 5 493 704 925 6 341 736 443 260 495 436 958 a 463 664 377 290 542 364 717 8 358 708 278 150 547 443 891 9 294 681 279 280 402 450 900 10 206 673 352 79 414 314 925 CK n=116 n= 59 1 376 364 379 890 n= 1845 1:—49 N00) tl a6 2 225 432 386 122 547 238 825 3 270 644 288 41 679 262 867 4 196 336 255 184 453 333 917 n=49 Ni ono 5 377 564 61 442 155 861 6 279 568 7 353 580 123 250 293 202 808 8 225 336 185 137 280 192 683 9 152 476 246 110 299 150 983 10 191 476 180 90 280 186 908200 THE MEASUREMENT OF INTELLIGENCE TABLE 36—Continued. CL This is the least satisfactory 10-composite. It was used because, as a com- posite, it filled a certain place. The 10 single tasks in order showed percents correct of 60, 59, 60, 54, 4214, 41, 15%, 11, 5% and 1% in a group of 200 pupils in grade 9. CM The 10 single tasks in order showed percents correct in a group of 200 pupils in grade 9 of 22, 20, 36, 38, 30, 25, 28, 20, 2614, and 26. zs — ———— Grade 5% 8% NS(1) NS(2) NS NS 17or+ 17or+ 17or+ S.Sch. 10,11, 12 ons 50. 100. 100. 135 “87. (GOP 28. 
salva 835 82 CN 1 020 200 470 530 482 678 830 821 882 857 350 2 083 160 690 530 467 609 817 821 824 771 386 3 005 080 530 420 297 483 830 964 882 886 446 4 167 240 600 570 526 713 733 786 647 914 349 5 010 108 410 470 341 540 667 964 647 829 277 6 020 164 450 500 356 506 770 893 706 714 578 7 010 92 540 550 400 575 746 893 647 857 602 8 025 100 420 400 259 506 627 964 824 600 747 9 034 192 380 580 326 609 686 893 824 771 482 10 108 148 680 710 511 759 885 893 882 943 482 CO 1 090 160 126 253 600 2 290 220 200 264 577 3 430 380 363 391 551 4 1108 1508 097 172) 7438 5 280 210 297 368 442 6 260 330 259 448 596 7 250 230 297 253 619 8 410 490 445 586 636 9 350 430 304 425 593 10 210 190 208 345 610 cP 1 040 020 067 000 2 150 170 193 207 3 050 040 030 034 4 100 060 059 161 5 100 050 082 149LEVELS OF INTELLECT 201 TABLE 36—Continued. Grade 5% 8% NS(1) NS(2) NS NS 17or+ 17or+ 170r+ S.Sch. 10,11,12 n= 9505 520 100 100) 135) 87 160 28 17 35 82 6 120 090 119 138 7 260 240 297 310 8 100 100 082 149 9 030 090 052 080 10 110 LOO} ETE 057, CQ 1 080 080 082 115 700° 353) 171 2 000 050 015 000 429 235 114 3 000 020 008 023 O71 765 229 4 030 030 015 023 500 765 257 5 020 040 030 023 500 824 257 6 000 040 000 000 ACO OSS a 7 7 020 020 015 O11 536 «©6529 =. 229 8 020 030 000 000 607 471 229 3 000 010 000 000 AL YAN alsa 10 000 010 000 000 393 706 086 CR 1 000 000 000 000 321 2 000 000 000 000 464 3 000 010 000 000 429 4 000 000 000 000 250 5 000 000 000 000 500 6 000 000 000 000 250 7 000 000 000 000 393 8 000 000 000 000 250 9 000 000 000 000 286 10 000 000 000 000 536 when they have been secured, it is especially hard to obtain time enough to exhaust abilities in arithmetical problems. A patient adult may work for half an hour at a single prob- lem. We fear that the tasks which happen to come late in the series as first printed showed fewer successes in our returns than they would have shown if they had been at- tempted first. 
In general, we feel less security that the per- cents of successes correspond closely with degrees of dif- ficulty in the case of the arithmetical problems than in the202 THE MEASUREMENT OF INTELLIGENCE TABLE 37. THE PERMILLES SUCCEEDING WITH Eacu SINGLE TASK OF VARIOUS 10-CoMPOSITES or ARITHMETICAL PROBLEMS. Al Grade Sp Sp 514%, 8% 9I 911 ni 50 52 189 126 246 264 pads ae ee a 1 260 346 751 2 340 192 682 3 340 288 661 4 340 404 762 5 380 308 857 6 620 673 831 7 300 365 897 8 440 423 788 9 2980 250 815 10 140 135 656 AJ 1 B02 nneenn 646 792 2 933) == 626 739 3 296 818 672 708 4 545 778 846 799 5 917 627 512 538 6 370 690 667 633 7 217 643 715 610 8 344 603 768 527 9 217 540 732 564 10 328 611 821 652 A K 1 317 532 2 143 643 3 iiyfsy GYAl 4 307 603 5 206 524 6 058 429 7 228 540 8 196 587 9 139 579 10 105 341LEVELS OF INTELLECT 203 TABLE 37—(Continued). Grade 5% 8% 91 9II NS(1)NS(2) NS NS 17 ai 189 250 246 264 100 100 135 87 240 AL 1 296 544 2 296 484 3 190 448 4 206 424 5 185 396 6 153 416 7 238 516 8 148 436 9 127 376 10 201 484 AM 1 059 174 680 570 437 563 2 093 246 730 650 481 598 3 102 220 700 650 452 644 4 089 133 720 650 496 621 5 065 140 680 630 504 736 oO oO oO ow 171 580 430 341 609 129 650 600 407 701 057 095 600 510 348 667 ~] oS | “I AN 1 630) 480) 252.) 391e 762 2 400 390 185 322 725 3 420 290 185 379 762 4 310 260 222 230 662 5 A470) 370) 916>) *Sosmese 6 530 420) 9378), DL S20 7 470 450 326 437 683 8 410 480° 326 471 817 9 340 300 178 3810 742 10 840 260 200 379 712204 THE MEASUREMENT OF INTELLIGENCE TABLE 37—(Continued). Grade NS(1) NS(2) NS NS 17 n= 100 100 135 87 240 120 140 104 161 792 120: 130 “08t -2308ai7o 130 090 104 149 642 030 040 022 034 467 000 040 22 057 421 or © DN RF 030 040 022 115 679 020 030 067 092 671 010 040 044 057 642 020 040 052 057 700 10 020 050 059 092 600 © con 546 617 400 504 650 oe © be 579 629 562 612 10 675 oon an AQ 519 343 423 502 218 or w bd Fe ease of C or V or D. 
However, the errors in this respect are probably very small in comparison with the difference in difficulty from Arithmetic A to Arithmetic P. The main facts for the arithmetical 10-composites A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, and Q are given in Tables 34, 35, and 37. We have also certain provisionalTHE PERCENTS SUCCEEDING WITH EAcH SINGLE TASK OF VARIOUS 10-COMPOSITES OF LEVELS OF INTELLECT TABLE 38. DIRECTIONS AND READINGS. 205 ude 4 5 Ad 91 oTr ~=§°10) 11) 11) LEG) 11 E) 12) 12:2); 12.3) rAd feei62 311 44, 246 236 100) 100 100° 100 63 100 100 84 44 D% pone wit 875.0 95.1 96 100 96 97 98.4 26.5 46.0 63.6 91.9 95 99 92 97 96.8 5.6 20.3 56.8 81.3 96 98 96 99 100.0 Bao) 50:2 77.3 91.5 92 oe 98 94 98.4 De mee ole «6865.9 88.2 85 84 94 84 88.9 56.2 62.4 65.9 80.9 91 94 89 94 90.5 62 113 38.6 82.9 89 95 90 88 87.3 aa) 3.9 6289.1 72.0 92 92 92 89 96.8 40.1 47.6 63.6 82.9 91 95 94 91 98.4 26.5 32.5 52.3 65.0 82 85 93 86 98.4 24.1 35.7 34.1 76.4 92 90 90 80 96.8 ara 60-1 65.9 91.9 95 96 93 86 95.2 Gar 17.7 50:0 79.4 83 98 93 90 94.0 46.9 55.3 47.7 58.9 81 94 91 87 90.5 D2 17.3 20.3 50.0 62.2 52.6 75 82 81 79 «79.4 5.6 8.7 38:6 47:6 45.8 69 85 83 82 81.0 O90 27. 52.3 57.7 53.8 68 88 77 Uo oe: 13.6 O16 4727 (41:9 39:0) 70 79 7 82 79.4 9.6 15.4 45.5 42.7 79 78 85 (3) (88:9 moe 23.) 47.7 42.7 vy 83 77 75 74.6 4.3 16.1 52.3 78.5 74 84 76 75 «684.1 19.8 30.2 43.2 65.4 78 78 83 76 =©81.0 9.3 22.2 40.9 69.5 7 83 72 78 76.2 16.7 164 40.9 44.3 66 77 77 76 76.2 D 2% 8.6 13.8 52.4 48.3 74 72 78 (3 (GE: 72 OS ale ef aye: 32.5 38.6 64 77 72 7 77.8 66 69 79.8 15.4 19.6 58.1 58.1 #63 76 76 68 73.0 85 69 76.2 3.7 3.9 39.4 24.2 66 71 74 72 69.8 79 76 85.7 93 15.8 59.3 cal 72 68 78 82.5 76 68 75.0206 THE MEASUREMENT OF INTELLIGENCE TABLE 38 (continued). 
Grade 4 5 Ad 9I 9QIL 10 11(1) 11(2) 11(3) 11(4) 12(1) 12(2) 12(8) mises) ll 4644 946 236 100 100 100 100 «63 91007 GlOOMaeS 6 53.8 67 77 74 64 67.7 81 79 83.3 7 20.8 56 71 71 72 69.2 68 69 60.7 8 42.0 54 62 73 64 69.2 68 67 70.2 9 33:3 61 66 64 68 69.2 68 68 70.2 10 34.1 48 63 66 61 60.0 71 67 61.9 D3 1 Astle 8777) 67, 66 65 57 55.6 70 73 61.9 2 S187) 9337 164 73 71 65 77.8 69 66 66.7 3 40.7 22.5 64 67 66 54 58.7 60 61 63.1 4 46.7 42.0 62 64 68 63 63.5 76 78 65.5 5 48.8 38.6 65 62 65 62 69.8 74 4 OT 6 43.5 42.8 60 64 69 66 65.1 72 80 71.4 7 49.6 39.4 65 64 69 66 61.9 67 74 73.8 8 92.4 14.0 69 64 64 67 74.6 63 71° 68:1 9 39.8 56 69 70 65 77.8 79 TOTS 10 air 17:0 59 71 72 67% 762 74 64 72.7 D 4 1 38.6 30.5 44 41 54 55 «57.1 52 53 47.6 2 ate 288) | 58 50 52 51 55.6 48 68 47.6 3 AONB, O84) 745 46 49 By Gy/al 58 A3bSs3 4 OF 19:9) 72:7 49 60 56 58:7 65 59 65.5 5 42.3 25.0 47 49 44 54 52.4 49 48 47.6 6 36.2 25.4 43 55 51 538 «52.4 56 56 54.8 7 Ape O71 6 46 54 51 54 44.4 66 64 65.5 8 40.2 25.9 43 55 48 54 44.4 61 52 60.7 9 19:5) 15:3 40 45 52 52 50.8 60 45 61.9 10 32:0 16 41 49 43 42 42.9 41 47 38.1 D 414 1 Al (49 "40. 46 60'3). "56, . aie oo 2 51 58 45 AS) 47,7 54 45 51.2 3 63° § 50) 52) ae) 4717 67 59 58.3 4 47 68 57 51 58.5 65 60 67.9 5 42 59 62 61 56.9 63 62 54.8LEVELS OF INTELLECT 207 TABLE 38 (continued). 9 10 11(1) 11(2) 11(3) 11(4) 12(1) 12(2) 12(3) 286 100 100) 100.100 63 100) 1007 Ss 42 67 57 64 52.3 60 76627, 30 52 39 37 646.2 55 49 46.4 33 62 48 37) 43.1 59 51 55.9 47 49 52 61 63.1 52 44 48.8 47 50 59 53 61.5 49 58 59.5 44 46 33 40 49.2 54 46 55.9 32 39 39 37 =. 
25.4 34 32 32.1
43 47 35 47 46.0 59 52 64.3
31 43 39 38 36.5 43 41 46.4
27 30 35 32 33.3 53 35 42.9
27 37 31 34 34.9 38 22 44.0
28 40 37 25 49.2 45 41 40.5
25 43 30 46 46.0 38 41 38.1
26 42 41 37 36.9 47 37 44.0
41 30 38 41 33.8 33 33 40.5
D 6
16 17 24 16 14.3 25 22 28.6
17 21 17 18 23.8 21 16 21.4
30 26 21 21 32.3 27 24 23.9
08 16 17 09 15.4 17 08 13.1
19 21 20 22 16.9 17 24 21.4
20 24 17 05 26.2 27 18 28.6
D 7
02 00 03 02 0.15 00 03 02.4
02 12 06 10 40.0 15 21 [illegible]
02 03 03 05 06.1 11 01 04.8
00 03 03 [illegible] [illegible] 16 07 04.8
09 08 05 04 16.9 13 10 09.5
05 05 08 08 12.3 16 09 09.5

TABLE 38 (continued).

          NS(1)   NS(2)   NS      NS
n =       100     100     135     87
D 5½
 1        31      37      27.4    46.0
 2        27      34      26.7    29.9
 3        52      44      40.7    39.1
 4        54      49      34.8    43.7
 5        20      40      27.4    23.0
 6        37      46      38.5    28.7
 7        42      42      34.8    31.0
 8        41      45      34.1    36.8
 9        47      40      39.3    26.4
10        31      26      25.9    19.5
D 6½
 1        29      31      23.7    23.0
 2        10       9       5.9     5.1
 3        24      21      18.5    17.2
 4         8       8       3.0     5.7
 5         7      10      10.4     6.9
 6        32      27      23.7    11.5

arithmetical 10-composites, I-J, J1, K1, and L1, which will be useful.

In the case of comprehension of directions and paragraphs, we have composites of from six to ten elements, A, B, C, D, E, F, G, 1, 1½, 2, 2½, 3, 3½, 4, 4½, 5, 5½, 6, 6½, and 7, and have provisional composites H and ½. The facts concerning the difficulty of the constituent elements of these composites are given in Tables 34, 35, and 38.

THE DIFFICULTY OF THE 10-COMPOSITES

After the composites of ten had been obtained for sentence completions, arithmetical tasks, and understanding sentences, by such experimentation and selection as has been described for the word-knowledge tasks, the difficulty of each composite in comparison with one or more others was measured in several groups of individuals. As many different composites were used in each group of individuals as was feasible.
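The tables that follow report difficulty in two forms, the percent (or permille) of a group succeeding and the σ distance from the group's median difficulty, and the two are linked by the assumed form of distribution. A minimal sketch of that conversion, assuming the normal curve that the text itself adopts as an arbitrary convention; the function name `sigma_distance` is introduced here for illustration and is not the authors':

```python
from statistics import NormalDist


def sigma_distance(permille_succeeding: float) -> float:
    """Sigma distance of a composite from the median difficulty of a group.

    ASSUMPTION: the ability is normally distributed in the group.
    Negative means easier than the median difficulty, positive harder.
    """
    p = permille_succeeding / 1000.0
    if not 0.0 < p < 1.0:
        # 0 or 1000 permille would place the composite infinitely far off.
        raise ValueError("permille must lie strictly between 0 and 1000")
    # A composite passed by more than half the group lies below (easier
    # than) the median difficulty, hence the sign reversal.
    return -NormalDist().inv_cdf(p)


# Permilles of Grade 8½ succeeding, as printed in Table 44.
for permille in (972, 472, 72):
    print(f"{permille:4d} permille -> {sigma_distance(permille):+.2f}")
```

Under the normal assumption these three permilles should come out close to the -1.91, +.07, and +1.46 printed beside them in Table 44.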
It is hard to secure the cooperation of large groups in taking such long examinations as are necessary to put a large number of these composites in comparison for the same group; but in one way or another, we have accumulated a very large body of facts (shown in Tables 39 to 50).

We use these 10-composites to make 40-composites each containing 10 C, 10 A, 10 V, and 10 D.⁷ They will also be available for special scales for sentence completion, arithmetical reasoning, vocabulary knowledge, and comprehension of sentences and paragraphs. The arithmetical series, for example, is unquestionably a better instrument for measuring arithmetical ability of the problem-solving sort than has hitherto been available.

In Tables 39 to 50 there are sometimes two forms of entry: "%s" (percent successes) means the percent of the group in question having 50% or more of the single tasks right; "σ distance" means the difference in difficulty between the 10-composite in question and a 10-composite of the same kind which exactly half of the group would succeed with in the sense of having 50% or more of the single tasks correct. σ distance is in terms of the mean square deviation of the group in the ability measured by the 10-composite in question. The σ distances even for the same group are not then strictly comparable, since the mean square deviation of the group in the ability measured by, say, C I, may not be identical with its mean square deviation in the ability measured by C J, or by A I, etc. Two 10-composites of identical σ distances will not, however, be far apart in difficulty.

Minus (-) means easier than the median difficulty defined by the 10-composite which exactly half of the group succeed with; plus (+) means harder than it. The form of distribution is arbitrarily assumed to be "normal" in the case of all the abilities in all the groups. This is often erroneous and always doubtful, but will do no harm if its arbitrariness is kept in mind. The σ distances are rough approximate measures convenient for comparison. The actual fact is always the %s.

⁷ In some cases the number is less than 10. Each single task is then given a weight so that a perfect score would count 10.

TABLE 39.
THE DIFFICULTY OF 10-COMPOSITES C: A, B, C, AND D; A: A, B, C, AND D; V: A, B, C, AND D; AND D: A, B, C, AND D, IN THE CASE OF 180 ADULT IMBECILES.

%s o distance Jos o distance nm 7100) +80 100 80 m 100° 80 100 80 OA 84 8236-119 = 13 Vv Al 80) 8 --1.03 -1.07 IBs= 60) «06 - 51 - .20 B 49 57144 + .03- - 72s © 35 2746 + .40 + .68 G14" 19 +1.09 + .93 D 3 0 + 1.59 high D 1 5 +1.78 +1.47 Aw AY. 69. 80 - 65 --1.03 D, A. 90;, 2.86 --145 -1.27 IB 45 AS + 15 + .03 B 4 671% + 45) 309 Cerio 2 +1.05 + .87 G 19 2734 + 9373 ce D 5 5 +147 +1.47 D 2 14 +116 +41.09

TABLE 40.
THE DIFFICULTY OF 10-COMPOSITES C: E, F, G AND I, A: E, F, G AND H, V: E, F, G AND H, AND D: E, F, G AND H, IN VARIOUS GROUPS. (σ DISTANCES ARE OMITTED FROM THIS TABLE.)

                  %s                                   %s
n =       100     50      101                  100     50      101
          M.A. 6  Class 3 Special Class        M.A. 6  Class 3 Special Class
C  E      56      94      93           V  E    [?]     [?]     94
   F      25      94      88              F    23      96      96
   G       7      42      74              G    15      84      85
   I      [?]     20      18              H     4      64      62
A  E      [?]     92      100          D  E    45      96      98
   F      20      90      96              F    23      92      97
   G       5      74      93              G     6      58      75
   H      [?]     88      73              H    [?]     [?]     57

The Sentence Completion and Arithmetical tasks were done in June, 1924; the Vocabulary and Directions-Reading tasks were done in September, 1924. We treat them together, since the differences due to an interval of less than three months are small.

The following composites had fewer than 10 single tasks: Directions-Reading ½, which had 4; 2½, which had 5; and 3, which had 6. In these cases 2 right, 3 right, and 3 right, respectively, are used instead of 5 right.

TABLE 41.
THE DIFFICULTY OF 10-COMPOSITES C: F, G, I, J, AND K, A: F, G, H, I, J, AND K, V: F, G, H, 2, 3, AND 4, AND D: F, G, H, ½, 1, 2, AND 2½, IN VARIOUS GROUPS IN %S.
4B cA oB 4B 5A 5B ni 162 125 186 162 125 186 Cr 100.0 Ver 98.8 G 83.3 G 98.1 if 35.2 47.2 67.6 H 87.0 93.6 100.0 J 22.2 25.6 44.6 2 50.6 57.6 75.3 K 1.2 9.6 6.4 3 25.9 35.2 53.8 4 5.6 5.6 9:7 AF 100.0 DF 93.8 G 98.8 G 82.1 H 96.3 95.2 98.4 H 75.9 80.8 91.9 I 72.8 77.6 86.6 % 36.4 52.8 68.3 J 6.8 8.0 25.3 2 3.1 5.6 10.2 K 0 3.2 15.6 2% 1.9 0.8 5.4 Of the 53 adult students, only 45 were measured with the completion tasks, and only 28 of these attempted the Q and R composites. We have estimated as well as we can how the individuals in question would have succeeded if they had attempted all. THE COMBINATION OF 10-cOMPOSITES INTO 40-COMPOSITES These 10-composites were combined into 40-composites by putting together a C, an A, a V, and a D which, from the data at hand, seemed of nearly equal difficulty as com- posites. Some of the 10-composites were constructed espe- cially to fit others in this way. Into the history of the pro- cedure by which the final arrangement of the 40-compositesTABLE 42. DIFFICULTY OF 10-CoMPOSITES MEASURED BY THE PERCENTS OF 147 PUPILS IN GRADE 5144 SUCCEEDING WITH FIVE OR MORE OF THE 212 JRADE 514, IN UNITS OF THE MEAN OM THE MEDIAN DIFFICULTY FOR L OF WHATEVER ABILITY THE 10-COMPOSITE "R SINGLE TASKS, AND BY DISTANCES + OR —- SQUARE VARIATION OF GRADE 544 IN I TEN MEASURES IN EACH CASE. 147 PuPILS ARE THOSE oy u 4EVE WHO WERE 9 4 7 LEE / /2 51 RADE a 7 SIMILAR Facts ror 205 Pupims AND 200 PUPILS IN ( INCLUDED IN BOTH THE 205 AND THE 200. THE MEASUREMENT OF INTELLIGENCE o distance 10-Composite %s o distance %s 10-Composite 00 9 Me 00 13 “ 147 205 147 wo NI 08 9 hei ot 9.5 ¢ 100 .86 + 0,14 so =P) 80.6 83.1 9.5 Cc 100 9 la 44.4 43.2 — 1.25 ~ 1.691 — 1.49* 39 ¢ o* a 3. 
81 + 20.0 20.9 / 2 3 2.3 4+ 2.20 1.0 + 0.0 1.4 0.0 4.() .20 4 60.5 55.4** 58.0 100.0 100.0 100.0 D G(-4) 3) ‘ « H(- — 1.49 — 1.40 a oO Oo 91.9 AS LK L(- 48 .65 68.3 sH SoS i ~rEaqae res ort or fal a ID HD 1O NI’ 10 CO =H Qo CO ate oO rt wel Sy Sa Meare N Oo = + + + So io) sine + + TOPOS 1D © CO WH mA re 0 © sH a on - 30.4 2 + 1.04 + 1.55 14.9 2% * Average of the results for 2 and 2a. ** Average of the results for 3 and 3a.LEVELS OF INTELLECT Dil TABLE 43. THE DIFFICULTY OF VARIOUS 10-CoMPOSITES IN THE CASE or 44 ADULTS: RECRUITS IN THE UNITED STATES ARMY. Jos Jos %s %s C F 100.0 AF 100.0 V F_ 100.0 DF 100.0 G 86.4 G 100.0 G 100.0 G 95.5 H 100.0 H 100.0 H 88.6 lee I 95.5 1 100.0 6 79.6 ee 1025 I-J 70.5 2 90.9 1 61.4 KS 50:0 3 81.8 2 47.7 4 61.4 2% 38.6 5 45.5 3 38.6 6 11.4 4 18.2 7 2.3 5 09.1 TABLE 44. DIFFICULTY OF 10-COMPOSITES MEASURED BY THE PERMILLES OF SUCCESSES AND BY DISTANCES + OR—FROM THE MEDIAN DIFFICULTY FOR GRADE 8144, IN UNITS OF THE MEAN SQUARE DEVIATION OF GRADE 814 IN THE ABILITY MEASURED BY THE COMPOSITE. 10-Composites Permille s o distance Crt 972 -1.91 J 876 -1.15% K 472 + .07 N 72 + 1.46 A KI 564 - .16 L 440 + .15 \Y al 980 — 2.05 la 976 — 1.98 2 920 —1.41 2a 972 -—1.91 3 740 — .64 3a 864 —1.10 4 644 me OL 4a 618 — .30 5 440 + .15 5a 448 + .13 6 152 + 1.03 6a 236 + .72 7 44 +1.71 7a 60 + 1.55 Inf) 3) (1) 984 — 2.14 4 (2) 716 eT 5 (3) 500 00 6 (4) 356 TRS 7 (5) 324 + .46 8 (6) 200 + .84214 THE MEASUREMENT OF INTELLIGENCE TABLE 46. DIFFICULTY OF 10-COMPOSITES MEASURED BY THE PERCENTS OF Two GROUPS (246 Pupius IN GRADE 9 AND 264 PUPILS IN GRADE 9) SUCCEEDING WITH Five oF More oF THE TEN SINGLE TASKS, AND BY DISTANCES + OR — FROM THE MEDIAN DIFFICULTY FOR THE GrouP IN UNITS OF THE MEAN SQUARE DEVIATION OF THE GROUP IN THE ALTITUDE OF WHATEVER ABILITY THE 10-COMPOSITE MEASURES. 
91 9II 10-Composite Permille s o distance Permille s o distance Cr 967 — 1.84 LJ 951 — 1.65 J 967 — 1.84 936 — 1.52 K 805 — .86 689 — .49 L1 350 + .39 295 + .54 M1 191 + .88 178 + 92 N 30 +1.88 O 0 ATI 1000 f I-J 980 — 2.05 J 886 —1.21 773 — .75 Jl 943 — 1.58 784 — .79 Kl 545 - ll 333 + .43 K 671 — 44 500 .00 L 439 + 15 258 + .65 Ll 629 — .33 M 167 + .97 N 72 + 1.46 \W al 996 — 2.65 996 — 2.65 la 2 967 — 1.84 977 — 2.00 2a 3 866 —1.11 826 — 94 3a 4 703 — .53 678 — 45 4a 5 492 + .02 405 + .24 5a 6 150 +1.04 144 + 1.06 6a 49 + 1.65 7 20 + 2.05 23 + 2.00 7a 11 + 2.29 1In the case of the L and M completions with those pupils of Group 9I who did not have time to do everything to their satisfaction, an estimated score was derived on the basis of what they did as far as they went and of what they did with the completions of K.LEVELS OF INTELLECT 2s TABLE 45—Continued. 9T 9II 10-Composite Permille s o distance Permille s o distance D% 992 — 2.41 1 951 — 1.65 2 683 — .48 214 600 — .25* 3 362 + .35 205 + .82 3% 293 + 54 4 289 + .56 133 +1.11 4% 5 126 + 1.15 27 + 1.93 6 23 + 2.00 7 0 TABLE 46. DIFFICULTY OF 10-CoMPOSITES MEASURED BY THE o DISTANCES + OR—FROM THE MEDIAN DIFFICULTY OF A GIVEN GRADE IN UNITS OF THE MEAN SQuARE DEVIATION OF THE POPULATION OF THAT GRADE IN THE ABILITY MEASURED BY THE 10-COMPOSITES. 
Grade 9 Grade 10 Grade 11 Grade 12 n= 1089 mii n= 769 n= 643 \y al — 2.457 — 2.652 — 2.748 — 2.512 2 — 2.290 — 2.290 — 2.652 — 2.512 3 — 1.360 — 1.433 — 1.838 — 1.960 Bs — .671 — .845 — 1.522 — 1.607 5 + .182 — .151 — .668 — .931 6 +1.131 + .553 + .068 + .151 7 + 2.120 +1.881 + 1.546 + 1.243 n=1041 n= 700 n= Wo2 n= 637 V ila — 2.457 — 2.457 — 3.090 — 2.748 2a — 2.576 — 2.366 — 3.090 3a — 1.366 — 1.468 — 1.995 — 2.226 4a — .719 — .990 — 1.483 —1.774 5a + .068 — .264 — .845 — 1.019 6a + .904 + .527 + .050 — .235 Ta + 2.120 + 1.866 + 1.259 + 1.094 mn =1985 n= 1053 n= 742 Di — 2.409 — 2.878 — 2.652 2 — 1.398 — 1.695 — 1.812 3 — .690 — .999 — 1.170 4 — .055 — .335 — .o16 5 + .542 + .306 — .065 6 + 1.243 + .966 + 824 7 + 2.170 + 1.896 + 1.774 *71.5% had 2 right out of 5. 48.4% had 3 right out of 5. 16216 THE MEASUREMENT OF INTELLIGENCE TABLE 47. THE DIFFICULTY OF VARIOUS 10-COMPOSITES MEASURED BY THE PERCENT SUC- CEEDING AND BY THE DISTANCE FROM THE MEDIAN IN TERMS OF THE MEAN SQUARE DEVIATION OF THE Group. 422 NorMaL ScHOOL SENIORS. THE FORM OF DISTRIBUTION IS ASSUMED TO BE ‘ 00 ~ wD 1.14 13.37 15.73 13.22 13.45 159 160 ** 169 LB ORGS 8.70 .62 9.97THE TRANSFORMATION OF STANDARD SCORES 243 In Table 68 the results of the 6, 9 and of the 7, 8 deter- minations are put side by side and averaged, and measures of the unreliability of the averages are attached. We have made a provisional extension of the transmu- tations for National A down to original scores of 20 by using the assumption that Grades 5 and 4 will show ap- proximately normal distribution of perfectly measured in- TABLE 69. 
Nationa, A: GRADE 4 (n=1677) AND GRADE 5 (n= 2487) Original Teterval Disha hationa Values in Equal Units Aver- Average x £ 5 4 5 age 924 10— 19 10 3 20— 29 33 7 11.51 9.30 10.40 9.61 30-— 39 69 23 9.92 10.19 10.06 9.30 40-— 49 113 63 8.24 10.44 9.34 8.63 50— 59 214 118 10.07 9.42 9.75 9.01 60-— 69 285 275 9.87 12.00 10.94 10.02 70-— 79 311 347 9.99 10.02 10.01 9.25 80— 89 257 417 9.36 10.13 9.75 9.01 90— 99 200 412 10.26 10.01 10.14 9.37 100-109 106 350 9.49 10.27 9.88 9.138 110-119 55 226 10.85 9.55 10.20 9.42 120-129 7 147 7.69 10.83 9.26 8.56 2 66 10.59 10.44 10.52 9.71 22 11 1 iL tellect. Table 69 shows the original distributions, the values of the intervals in equal units by the assumption, the quo- tients when these are divided by 1/50 of the difference between 70 and 120 of the original scale, the averages for the two grades, and these averages after multiplication by a factor which equates the 70 to 120 difference with the 70 to 120 difference of the scale derived by the use of Grades 6 and 9.TABLE 70. EQUIVALENTS FoR NATIoNAL A ScorEs FROM 20 TO 170, IN A ScaLE WITH Equau UNITS 1= APPROXIMATELY 1/50 OF THE DIFFERENCE BE- TWEEN 100 AND 150 oF THE ORIGINAL SCALE. THE MEASUREMENT OF INTELLIGENCE Orig. Cor. Orig. Cor. Cor. 2 57.8 80 82.4 110 110 140 139 1 ie OS.6 1 83.2 Ha 1 140.1 2 2 . 
59.4 2 84.1 yA dial 1 142 3 3 ©660.2 3 =©85.0 3 113 3 142.3 4 4 61.0 4 85.8 4 114 4 143.4 5 5) GES 5 86.6 5 115 5 144.5 6 6 62.6 6 87.4 6 116 6 145.6 7 7 ~#@63.4 (ee 1SSee Ce AL, 7 146.7 8 8 64.2 Se 8910 8 118 8 147.8 9 9 65.0 9° 89:8 9) ets 9 148.9 0 60 65.7 90 90:7 120 120 150 150.0 1 1 06:0 O26 1 12029 1 151.3 2 2 67.3 2 92.6 2 121.8 2 152.6 3 Se OoeL S930 3 122.7 3 154.0 4 4 68.9 4 94.6 4 123.6 4 155.3 5 5) eer 5) UBS 5 124.5 5 156.6 6 6 70.5 y Os 6 125.4 6 158.0 7 Go tals (leo 7 126.7 f Loos 8 Se 2a) Sie 98:5 8 127.2 8 161.6 9 Ie 03:0 99.5 oF tsa 9 162.0 0 70 =73.8 100.5 130 129.0 160 163.4 1 Wo Yes) 101.4 1 130 1 164.8 2 Zi loro 102.4 2 131 3 166.2 3 Or Ord 103.3 3 132 3 167.6 4 Ae iilee 104.3 4 133 4 169.0 5 On 8.0 105.2 5 134 5 170.5 6 6 78.9 106.2 6 1385 6 172.0 7 i a193S 107.1 7 136 Ce LS. 8 8 80.6 108.1 8 137 8 175.0 9 9 Sito 109.0 oF 38 9 176.5 170 178.0 We combine these results with that from Grades 6, 7, 8, and 9, allowing equal weight to each of the two, and so have, as provisional values for these low intervals, the following: 20-29 30-39 40-49 50-59 60-69THE TRANSFORMATION OF STANDARD SCORES 245 Using these values up to 70 and that of Table 68 from 70 on, and making the original scale and the scale in equal units coincide at 120, we have Table 70 as our transmuting table. THE OTIS ADVANCED EXAMINATION In the case of the Otis Advanced Examination we have the distributions shown in Table 71. We obtain the o values in equal units for each interval for each group, as shown in the case of Army Alpha. In Groups I, II, and III we then divide each of these by the difference between 70 and 140 (in equal units) ; average I and II with respective weights of 2 and 1; combine this average with III, giving equal weight to Grade 6 and to Grade 9. 
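The procedure in this passage has two parts: obtaining the value of each score interval in equal units from a frequency distribution, and combining the values from several groups by weighted averages. The sketch below illustrates both under stated assumptions. It uses a normal curve in place of the authors' Form A (an assumption of this sketch, not a statement of their method), the Grade 4 Otis Advanced counts quoted later in this section, and the 70 to 79 row of Table 72. The function names are hypothetical.

```python
from statistics import NormalDist


def interval_widths(counts_from_bottom, total):
    """Width, in sigma units, of each score interval after the lowest.

    counts_from_bottom: frequencies of consecutive score intervals,
    starting at the bottom of the scale (they must not exhaust the group).
    total: size of the whole group.
    ASSUMPTION: the ability is normally distributed in the group.
    """
    nd = NormalDist()
    cumulative = 0
    boundaries = []
    for count in counts_from_bottom:
        cumulative += count
        # z value at the upper boundary of this interval
        boundaries.append(nd.inv_cdf(cumulative / total))
    return [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]


def weighted_mean(values, weights):
    """Plain weighted average of the given values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Grade 4, Otis Advanced: cases in 0-9, 10-19, 20-29, 30-39 out of 1,500.
w_10_19, w_20_29, w_30_39 = interval_widths([33, 121, 246, 248], 1500)
ratio_10_19 = w_10_19 / w_30_39  # interval 10-19 in terms of 30-39
ratio_20_29 = w_20_29 / w_30_39  # interval 20-29 in terms of 30-39

# Interval 70-79 in Table 72: Groups I and II (Grade 6), Group III (Grade 9).
group_i, group_ii, group_iii = 10.89, 10.80, 10.13
grade_6 = weighted_mean([group_i, group_ii], [2, 1])
grades_6_and_9 = weighted_mean([grade_6, group_iii], [1, 1])

print(f"ratios: {ratio_10_19:.3f}, {ratio_20_29:.3f}")
print(f"averages: {grade_6:.3f}, {grades_6_and_9:.3f}")
```

Under the normal assumption the two ratios should come out close to the 1.65 and 1.43 reported for Grade 4, and the two averages close to the 10.86 and 10.50 printed in Table 72. The weights of 2 and 1 roughly follow the sizes of the two Grade 6 groups (4,298 and 1,654).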
In Group IV we divide each of the σ values by the difference between 100 and 170 (in equal units) and then multiply each by a factor which makes the 100 to 170 difference for Group IV equal to that for the I, II, III weighted average. The I, II, III weighted average and the IV result are then averaged with weights of 3 and 1, respectively. The essential steps in these computations are shown in Table 72, the last column of which shows the combined estimate of the relative values of the 10-point intervals from 30 to 200 in terms of equal units. For convenience in interpretation these values are divided by 1.06, which makes the unit of the equal-unit scale 1/100 of the difference between 70 and 170 of the original scale. By interpolation and smoothing, letting the two scales coincide at 100, we obtain the equivalents of Table 73.

It may be noted that what scant data we have above 200 indicate that the rise from 12.52 to 17.77 (or 11.81 and 16.76, after division by 1.06) is not a matter of the sampling error. The data give 21.60 (or 20.38) as the value for 200 to 209. The interval from 20 to 29 has a value of 18.30 (17.26 after division by 1.06) by the sixth-grade groups, but this is too unreliable for use without confirmation. We have considered the facts for a fifth-grade population of 3,058 individuals and a fourth-grade population of 1,500 pupils. We do not, of course, know that in these grades the distributions of truly measured intellect are of Form A; but their low end will not diverge enough from the corresponding sections of Form A to invalidate the comparisons which we shall make.

TABLE 71.
OTIS ADVANCED: DISTRIBUTIONS.

                 I          II         III        IV
                 Grade 6    Grade 6    Grade 9    Grade 12
Interval         n = 4298   n = 1654   n = 3627   n = 1226
  0 to  19       3          1
 20 to  29       19         12
 30 to  39       74         23
 40 to  49       168        56
 50 to  59       334        107        9
 60 to  69       504        183        40
 70 to  79       659        243        79
 80 to  89       738        268        174        6
 90 to  99       587        244        262        23
100 to 109       499        209        443        38
110 to 119       346        135        520        58
120 to 129       193        93         541        95
130 to 139       97         45         547        153
140 to 149       51         25         409        187
150 to 159       20         8          317        191
160 to 169       5          1          190        187
170 to 179                             62         139
180 to 189       1          1          24         85
190 to 199                             10         50
200 to 209                                        13
210 to 219                                        1

Assuming the low end to be of the geometrical form of the corresponding section of Form A, and expressing the true values of the interval 10 to 19, and of the interval 20 to 29, in terms of the interval 30 to 39, we find the following. The relevant facts are that in Grade 4 we have 33 cases (out of a total of 1,500) from 0 to 9, 121 cases from 10 to 19, 246 cases from 20 to 29, and 248 cases from 30 to 39; in Grade 5 we have 6, 41, 112, and 257 (out of the total of 3,058) in these same intervals.

Interval    Grade 4    Grade 5    Average
10-19       1.65       1.37       1.51
20-29       1.43       1.00       1.22

We allow equal weights in averaging, because the larger population of Grade 5 is offset by the larger proportion of Grade 4 in the intervals studied.

TABLE 72.
OTIS ADVANCED: EQUIVALENTS FOR EACH 10-POINT INTERVAL OF THE ORIGINAL SCALE IN EQUAL UNITS.
                A         B         C         D         E         F         G         H
             Grade 6   Grade 6  (2A+B)/3  Grade 9   (C+D)/2  Grade 12  (3E+F)/4   G/1.06
Interval     n=4298    n=1654             n=3627             n=1226
 30 to  39    14.29     10.04     12.86               12.86               12.86     12.1
 40 to  49    11.89     10.80     11.53               11.53               11.53     10.9
 50 to  59    11.74     10.54     11.34               11.34               11.34     10.7
 60 to  69    10.91     11.18     11.00               11.00               11.00     10.4
 70 to  79    10.89     10.80     10.86     10.13     10.50               10.50      9.9
 80 to  89    10.99     10.46     10.81     10.69     10.75               10.75     10.1
 90 to  99     9.47      9.84      9.60      9.35      9.48                9.48      8.9
100 to 109    10.00     10.31     10.10     10.74     10.42     10.52     10.45      9.9
110 to 119    10.23      9.18      9.88      9.90      9.89   [illeg.]    10.09      9.5
120 to 129     9.63     10.45      9.87      9.56      9.72      9.74      9.72      9.2
130 to 139    11.54      9.38     10.82     10.41     10.62   [illeg.]  [illeg.]  [illeg.]
[Rows 140 to 199: entries illegible, apart from 9.6 in Column H for 140 to 149.]

Multiplying the 12.86 of Column G and the 12.1 (more exactly 12.13) of Column H of Table 72 by 1.51 and 1.22, we have these values for the intervals:

10-19    19.42 (or 18.32 when divided by 1.06).
20-29    15.69 (or 14.80 when divided by 1.06).

We may use these as provisional values subject to further investigation. They are used in the extension of Table 73 by Table 73a.

TABLE 73. EQUIVALENTS FOR OTIS ADVANCED SCORES FROM 30 TO 200 IN A SCALE WITH EQUAL UNITS. 1 = 1/100 OF THE DIFFERENCE BETWEEN 70 AND 170 OF THE ORIGINAL SCORES.

  O      C        O      C        O      C        O      C        O      C
 30    26.4      70    70.6     110   109.5     150   147.8     190   192.3
 31  [illeg.]    71    71.6     111  [illeg.]   151   148.9     191   193.6
 32    28.9      72    72.6     112  [illeg.]   152   149.9     192   195.1
 33    30.2      73    73.6     113   112.4     153   151.0     193   196.6
 34    31.4      74    74.6     114   113.3     154   152.0     194   198.1
 35    32.6      75    75.6     115   114.3     155   152.9     195   199.7
 36    33.8      76    76.6     116   115.2     156   154.0     196   201.3
 37    35.0      77    77.6     117   116.2     157   155.0     197   203.0
 38    36.2      78    78.6     118  [illeg.]   158   156.1     198   204.7
 39    37.4      79    79.6     119   118.1     159   157.2     199   206.5
 40    38.6      80    80.6     120   119.0     160   158.3     200   208.3
 41    39.7      81    81.6     121   120.0     161   159.4
 42    40.8      82    82.5     122   120.9     162   160.5
 43    41.9      83    83.5     123   121.9     163   161.6
 44    43.0      84    84.5     124   122.8     164   162.7
 45    44.1      85    85.4     125   123.8     165   163.8
 46  [illeg.]    86    86.4     126   124.7     166   164.9
[Entries for 47 to 69, 87 to 109, 127 to 149, and 167 to 189 are illegible.]

TABLE 73a. PROVISIONAL VALUES FOR OTIS ADVANCED SCORES FROM 10 TO 29.

  O      C        O      C        O      C        O      C
 10    -6.7      15     2.6      20    11.6      25    19.1
 11    -4.8      16     4.4      21    13.1      26    20.6
 12    -3.0      17     6.2      22    14.6      27    22.1
 13    -1.1      18     8.0      23    16.1      28    23.6
 14      .8      19     9.8      24    17.6      29    25.1

THE HAGGERTY EXAMINATION, DELTA 2

In the case of the Haggerty Delta 2 we have the distributions shown in Table 74. After estimating the frequencies in intervals of 10 from the irregular arrangement of III and IV, the values of each interval of each group in equal units are computed. These values for I, II, and III are put in terms of the difference between original 70 and original 130, to make them comparable. The two Grade 9 determinations are then combined with weights of 1 and 3, respectively. With these averages are combined the determinations from Grade 6, with weights of 2 for the former and 1 for the latter. The determinations from Grade 12 are made comparable with this composite determination by multiplying them by a factor such as makes the 100 to 150 difference the same for the Grade 12 group as for the composite. They are then combined with the composite determination, the weights being 1 for the Grade 12 items and 3 for the composite. The essentials of these computations appear in Table 75, the last column of which gives the final estimate. The units of the Haggerty Delta 2 score become progressively "harder" (that is, larger) when put in equal units, from some point in the 70's up to 160.

TABLE 74. HAGGERTY DELTA 2 DISTRIBUTIONS

                  I         II                          III         IV
               Grade 6   Grade 9                     Grade 9    Grade 12
Interval       n = 916   n = 473      Interval       n = 1995   n = 668
 10 to  19        1
 20 to  29                             25 to  42         1
 30 to  39        4
 40 to  49       12                    43 to  54         3
 50 to  59       39         1          55 to  65        10
 60 to  69       87         5          66 to  76        29
 70 to  79      127         6
 80 to  89      161        36          77 to  86        73          1
 90 to  99      164        54          87 to  99       225         13
100 to 109      154        89         100 to 114       555         45
110 to 119       86       109         115 to 119       212         36
120 to 129       61        79         120 to 129       415        121
130 to 139       17        73         130 to 139       283        162
140 to 149        2        42         140 to 149       155        170
150 to 159        1        12         150 to 159        31        102
160 to 169                  1         160 to 169         3         16
170 to 179                            170 to 179

TABLE 75. HAGGERTY DELTA 2: VALUES IN EQUAL UNITS.

                A         B         C          D          E         F          G
                                            (B+3C)/4   (A+2D)/3             (3E+F)/4
Interval      n=916     n=473     n=1995                          n=668
 50 to  59    10.85                                     10.85                10.85
 60 to  69    10.55                                     10.55                10.55
 70            9.31      5.20      9.29       8.27       8.62                 8.62
 80            9.19    [illeg.]    8.38       9.62       9.48                 9.48
 90            9.03      9.62      9.08       9.21       9.15                 9.15
100           10.30     10.42     11.06      10.90      10.70     10.05      10.54
110            8.78     11.48     10.67      10.87      10.17      9.50      10.01
120           13.41      9.82    [illeg.]    11.10      11.87     12.61      12.06
130                     16.61     11.87      13.06      13.06     12.86      13.01
140                     10.37     16.16      14.71      14.71     15.46      19.86
150                                                               20.88      20.88

Interpolating, smoothing, expressing the values in terms of 1/60 of the difference between original 60 and original 120, and letting the two series coincide at 90, we have the equivalents of Table 76.
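The weighting scheme described for Table 75 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the Grade 12 values are taken as already rescaled (the rescaling step is not shown), and a blank cell is assumed to be skipped so that the remaining determination carries the whole weight, which is what the rows for 70 and 130 of Table 75 appear to do.

```python
from typing import Optional, Sequence, Tuple


def weighted_mean(values: Sequence[Optional[float]],
                  weights: Sequence[float]) -> Optional[float]:
    """Weighted mean of the determinations that exist (None marks a blank cell)."""
    present = [(v, w) for v, w in zip(values, weights) if v is not None]
    if not present:
        return None
    return sum(v * w for v, w in present) / sum(w for _, w in present)


def combine_row(a, b, c, f) -> Tuple[Optional[float], Optional[float], Optional[float]]:
    """Columns D, E and G of Table 75 from columns A, B, C and F."""
    d = weighted_mean([b, c], [1, 3])  # two Grade 9 groups, weights 1 and 3
    e = weighted_mean([a, d], [1, 2])  # Grade 6 weight 1, Grade 9 average weight 2
    g = weighted_mean([e, f], [3, 1])  # composite weight 3, Grade 12 weight 1
    return d, e, g


# Columns A, B, C, F for three rows of Table 75 (None = no determination).
rows = {
    "70-79": (9.31, 5.20, 9.29, None),
    "100-109": (10.30, 10.42, 11.06, 10.05),
    "130-139": (None, 16.61, 11.87, 12.86),
}
results = {name: combine_row(*cells) for name, cells in rows.items()}
for name, (d, e, g) in results.items():
    print(f"{name}: D={d:.2f} E={e:.2f} G={g:.2f}")
```

For these three rows the computed final column agrees with the printed 8.62, 10.54 and 13.01 to within rounding.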
TABLE 76. EQUIVALENTS FOR HAGGERTY DELTA 2 SCORES FROM 50 TO 160, IN A SCALE WITH EQUAL UNITS.

  O      C        O      C        O      C
 50    50.6      87    87.1     124   124.8
 51    51.7      88    88.1     125   126.0
 52    52.8      89    89.0     126   127.2
 53    53.9      90    90.0     127   128.4
 54    55.0      91    90.9     128   129.6
 55    56.1      92    91.8     129   130.8
 56    57.2      93    92.7     130   132.0
 57    58.3      94    93.7     131   133.3
 58    59.4      95    94.6     132   134.6
 59    60.5      96    95.6     133   135.9
[The remaining entries are illegible.]

THE TERMAN GROUP TEST

In the case of the Terman Group Test of Mental Ability, we have the scores of 5,582 pupils in Grade 7, 9,087 in Grade 8, 10,881 in Grade 9, 6,730 in Grade 10, 4,206 in Grade 11, and 4,886 in Grade 12. [Terman, ’23, p. 9.] These are reported in the form of the point on the scale

[Tables for the Terman Group Test: scale points corresponding to permille entries, by grade. The entries are not recoverable.]

TABLE 87. BROWN UNIVERSITY PSYCHOLOGICAL EXAMINATION. GRADES 12, N = 3333 AND 13, N = 2118.

              Value of interval          Values in terms of 1/35 of the
              in equal units             difference between 35 and 70
Interval      Grade 12    Grade 13       Grade 12    Grade 13    Average
              n = 3333    n = 2118
20 to 24       .5304                       6.25                    6.25
25 to 29       .5800                       6.84                    6.84
30 to 34       .4764                       5.62                    5.62
35 to 39       .4500       .4260           5.30        5.13        5.22
40 to 44       .3780       .3741           4.46        4.50        4.48
45 to 49       .4172       .3442           4.92        4.14        4.53
50 to 54       .3976       .3851           4.69        4.63        4.66
55 to 59       .4179       .4500           4.93        5.42        5.18
60 to 64       .4263       .4495           5.02        5.41        5.22
65 to 69       .4805       .4798           5.67        5.77        5.72
70 to 74                   .4908                       5.91        5.91
75 to 79                   .6048                       7.28        7.28

TABLE 88. EQUIVALENTS OF SCORES FROM 20 TO 80 FOR THE BROWN UNIVERSITY PSYCHOLOGICAL EXAMINATION, IN EQUAL UNITS.
  O      C        O      C        O      C
 20    17.0      40  [illeg.]    60  [illeg.]
 21  [illeg.]    41    41.9      61    60.8
 22    19.5      42  [illeg.]    62    61.9
 23    20.7      43    43.7      63    63.0
 24    22.0      44    44.6      64    64.0
 25  [illeg.]    45    45.5      65    65.1
 26    24.6      46    46.4      66    66.2
 27    26.0      47  [illeg.]    67    67.4
 28    27.4      48    48.2      68    68.5
 29    28.8      49  [illeg.]    69    69.6
 30    30.2      50    50.0      70    70.8
 31    31.3      51  [illeg.]    71    72.0
 32  [illeg.]    52  [illeg.]    72  [illeg.]
 33    33.6      53    52.9      73    74.3
 34  [illeg.]    54    53.8      74    75.5
 35    35.8      55  [illeg.]    75    76.7
 36    36.9      56    55.7      76  [illeg.]
 37    38.0      57    56.7      77    79.6
 38    39.0      58    57.8      78  [illeg.]
 39    40.0      59    58.8      79    82.5

THE ARMY EXAMINATION A

For special reasons, we have investigated the values of scores in the Army Examination a, although it is not in present use. We have distributions from Grades 6, 7, 8, and from college freshmen, nearly seven hundred in each [Memoirs, p. 537], shown in Table 89. We shall also use to some extent the distributions of 570 pupils in Grade 5 and 463 pupils in Grade 4, which are also shown in Table 89.

TABLE 89. ARMY EXAMINATION A: DISTRIBUTION OF PUPILS IN GRADES 4, 5, 6, 7, 8, AND 13.

Grade        4          5          6          7          8          13
           n = 463    n = 570    n = 672    n = 685    n = 630    n = 701
[Frequencies by 10-point interval from 0-9 to 360-369. The column alignment of the entries is not recoverable.]

The values of each 10-point interval from 60 to 270 in equal units are computed for Grades 6, 7, and 8, by the methods previously used, and made strictly comparable by being divided by the difference between 90 and 230 of the original scale. They are then averaged.

TABLE 90. ARMY EXAMINATION A: EQUIVALENTS FOR EACH 10-POINT INTERVAL OF THE ORIGINAL SCALE IN EQUAL UNITS. RESULTS FROM GRADES 6, 7 AND 8.

Original              Value in Equal Units
Interval       Gr. 6      Gr. 7      Gr. 8      Average
 60 to  69     18.7       18.95                  18.8
 70 to  79     15.65       6.9                   11.3
 80 to  89     12.3        9.15      15.6        12.4
 90 to  99     11.6       11.0        4.0         8.9
100 to 109      9.6        9.7        9.5         9.6
110 to 119     10.1      [illeg.]   [illeg.]     12.3
120 to 129     11.55      11.4       14.1        12.4
130 to 139     12.65       9.45       9.2        10.4
140 to 149     13.65       9.6        9.8        11.0
150 to 159     11.3        9.4       10.05       10.3
160 to 169      6.4        7.65       9.9         8.0
170 to 179      8.65      12.2        9.9        10.3
180 to 189     10.6       10.05      10.5        10.4
190 to 199      5.2       10.35       9.8         8.4
200 to 209      7.5       13.2        7.9         9.5
210 to 219      9.4        6.7        9.6         8.6
220 to 229     11.25       9.6        8.9         9.9
230 to 239      0.00      20.7       10.05       10.3
240 to 249      9.28       7.3       11.8         9.5
250 to 259     13.48       5.5        8.45        9.1
260 to 269                           11.3        11.3
270 to 279                           12.9        12.9

The true values of the intervals from 10 to 100 are computed for Grades 4 and 5, assuming that the low ends of these distributions are distributed like the low end of Form A. They are made strictly comparable and averaged. The averages are then multiplied by a factor such that the difference 60 to 100 is represented by the same amount in the series of true values obtained from Grades 6, 7, and 8 and in the series obtained from Grades 4 and 5.

TABLE 91. ARMY EXAMINATION A: EQUIVALENTS FOR CERTAIN 10-POINT INTERVALS OF THE ORIGINAL SCALE IN EQUAL UNITS. RESULTS FROM GRADES 4 AND 5.

                     Values in Equal Units
Interval       Gr. 4      Gr. 5      Av.        Av. × 1.28
10 to 19       16.4                  16.4        21.0
20 to 29     [illeg.]   [illeg.]   [illeg.]      16.8
30 to 39       10.1       10.3       10.2        13.1
40 to 49       11.9       10.5       11.2        14.3
50 to 59       10.1     [illeg.]     11.65       14.3
60 to 69        9.7        9.5        9.6        12.3
70 to 79       10.8       11.9       11.35       14.5
80 to 89       10.0        8.6        9.3        11.9
90 to 99        9.7       10.4       10.05       12.9

The 6, 7, 8 series is then extended by the values from 10 to 100 obtained from Grades 4 and 5, allowing equal weight to the two sets of values from 60 to 99.
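The splicing of one series onto another can be sketched as follows. It is a minimal illustration, not the authors' worksheet: the function names are hypothetical, the inputs are a handful of averages read from Tables 90 and 91, and the matching factor is assumed to be the ratio of the two totals over the overlapping span 60 to 99.

```python
def matching_factor(reference, other, span):
    """Factor that makes `other` cover `span` with the same total as `reference`."""
    return sum(reference[k] for k in span) / sum(other[k] for k in span)


def extend_series(reference, other, span):
    """Rescale `other`, average it with `reference` where both exist, append the rest."""
    factor = matching_factor(reference, other, span)
    combined = dict(reference)
    for interval, value in other.items():
        scaled = value * factor
        if interval in reference:
            # Equal weight to the two determinations in the overlap.
            combined[interval] = (reference[interval] + scaled) / 2
        else:
            combined[interval] = scaled
    return factor, dict(sorted(combined.items()))


# Keys are the lower bounds of the 10-point intervals.
grades_6_7_8 = {60: 18.8, 70: 11.3, 80: 12.4, 90: 8.9, 100: 9.6, 110: 12.3}
grades_4_5 = {30: 10.2, 60: 9.6, 70: 11.35, 80: 9.3, 90: 10.05}

factor, composite = extend_series(grades_6_7_8, grades_4_5, span=[60, 70, 80, 90])
print(f"factor = {factor:.2f}")
for interval, value in composite.items():
    print(interval, round(value, 1))
```

With these inputs the factor comes out close to the 1.28 of Table 91, and the composite values for 60 to 99 agree with Table 92 to within about a tenth; the small differences arise because the printed table rounds at each step.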
The values of the intervals from 200 to 360 are computed from the facts for college freshmen (Grade 13). They are then multiplied by a factor such that the difference 200 to 260 is represented by the same amount in the two series of true values (from Grades 6, 7, 8, and from Grade 13). The 6, 7, 8 series is then extended by the values for 260 to 360 obtained from Grade 13.

The essentials of these procedures and their results appear in Tables 90, 91, and 92.

A transmutation table in steps of 10 is then made, letting the two scales coincide at 170. This appears as Table 93.

TABLE 92. ARMY EXAMINATION A: EQUIVALENTS IN EQUAL UNITS. RESULTS FROM GRADES 6, 7, AND 8; 4 AND 5; 13; AND COMPOSITE FROM ALL.

                          Value in Equal Units
Original      By Grades     By Grades     By
Interval      6, 7 and 8    4 and 5       Grade 13     Composite
  0-  9
 10- 19                      21.0                        21.0
 20- 29                      16.8                        16.8
 30- 39                      13.1                        13.1
 40- 49                      14.3                        14.3
 50- 59                      14.3                        14.3
 60- 69         18.8         12.3                        15.6
 70- 79         11.3         14.5                        12.9
 80- 89         12.4         11.9                        12.2
 90- 99          8.9         12.9                        10.9
100-109          9.6                                      9.6
110-119         12.3                                     12.3
120-129         12.4                                     12.4
130-139         10.4                                     10.4
140-149         11.0                                     11.0
150-159         10.3                                     10.3
160-169          8.0                                      8.0
170-179         10.3                                     10.3
180-189         10.4                                     10.4
190-199          8.4                                      8.4
200-209          9.5                                      9.5
210-219          8.6                                      8.6
220-229          9.9                                      9.9
230-239         10.3                                     10.3
240-249          9.5                                      9.5
250-259          9.1                                      9.1
260-269         11.3                         8.7         10.0
270-279         12.9                         9.2         11.1
280-289                                     10.3         10.3
290-299                                      9.7          9.7
300-309                                      8.1          8.1
310-319                                     10.3         10.3
320-329                                     10.8         10.8
330-339                                      7.7          7.7
340-349                                      8.6          8.6
350-359

TABLE 93. TRANSMUTATION TABLE FOR ARMY EXAM. A.

Original      Scale in
Scale         Equal Units
  10           -34.1
  20           -13.1
  30             3.7
  40            16.8
  50            31.1
  60            44.4
  70            60.0
  80            72.9
  90            85.1
 100            96.0
 110           105.6
 120           117.9
 130           130.3
 140           140.7
 150           151.7
 160           162.0
 170           170.0
 180           180.3
 190           190.7
 200           199.1
 210           208.6
 220           217.2
 230           228.1
 240           238.4
 250           247.9
 260           257.0
 270           267.0
 280           278.1
 290           288.4
 300           298.1
 310           306.2
 320           316.5
 330           327.3
 340           335.0
 350           343.6
 360           353.3

TABLE 94. EQUIVALENTS FOR ARMY EXAMINATION A SCORES FROM 10 TO 360 IN A SCALE WITH EQUAL UNITS. 1 = 1/80 OF THE DIFFERENCE BETWEEN 130 AND 210 OF THE ORIGINAL SCALE. FROM 130 TO 210 THE VALUES OF THE TWO SCALES ARE IDENTICAL.

[The entries of this table are not recoverable.]

From 130 to 360, the original scale value may be used with little error, but from 130 down the true values of the original scale units increase so that these 120 points of the original scale are equal to about 164 elsewhere. Table 94 presents a detailed transmutation table made with some smoothing.

CHAPTER VIII

THE FORM OF DISTRIBUTION OF INTELLECT IN MAN

The orthodox doctrine is that the form of distribution of intellect in human beings of the same sex and age is Form A, shown in Fig. 15, representing a fact whose variations up and down from its average condition are caused by a large number of uncorrelated factors each of which exercises about the same amount of influence on intellect as any other, and being a surface enclosed by a curve approximating the normal probability curve $y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/(2\sigma^{2})}$, where $\sigma$ is the mean square variation.

Fig. 15. Form A, The Normal Probability Surface.

This doctrine was urged by Francis Galton, on the basis partly of analogy with the facts in the case of certain bodily dimensions, and partly of his own shrewd observations of human abilities. Since his day it has gained very wide acceptance. This is partly because the measurements of intellect and of other mental abilities in children of the same age (their units being taken at their face value) have uniformly shown continuity clustering around one mode, with diminishing frequencies in proportion to remoteness from that mode, and with no notable departure from symmetry toward any one special form of asymmetry. It is partly because some assumption had to be made in one investigation after another for purposes of quantitative treatment, and this assumption was about as safe as any other one assumption, and much easier to operate with. Hence we gradually slid into the habit of using the doctrine. This fashion became so strong that in recent years psychologists have assumed symmetry, even though the units taken at their face value produced a markedly skewed distribution.

GENERAL CONSIDERATIONS

Many of those who have made extensive use of this assumption have been aware of its highly hypothetical nature. The argument from analogy is weak because so many bodily variables are clearly skewed in distribution. Such are weight, longevity, girth of chest, strength of arm pull. The argument from mental measurements is weak, not only because of the general ambiguity of the units, but still more because the "error" has been a large proportion of the variation in many of the investigations. The "error" being symmetrical and "normal" tends to add a spurious symmetry and normality to the variability.
Moreover, sometimes the selection is such that normality in the group measured may well be an argument in favor of skewness for man in general. So, for example, with sixteen-year-olds in high school, or twenty-five-year-olds in universities.

In general the form of distribution of any variable trait is due to the number of causes that influence variations in its amount, their magnitudes and their interrelations.¹ Since we do not know what the causes of the variations in intellect in human beings of the same age are, we cannot as yet count them or measure their magnitudes or determine their correlations. We should then be very skeptical of a priori assumptions of Form A as the form of distribution of intellect in human beings of the same age. They are very much stronger in the case of children in the same school grade.

¹ There is a certain regrettable vagueness, not to say ignorance, concerning the causation of variations, as when psychologists consider the amount of intellect to be a consequence of the presence or absence of a single Mendelian determiner, and yet to be distributed unimodally in Form A. Either of these beliefs really denies the other.

In the case of children in the same school grade the causes are our own acts; and we do know that school authorities have a rough standard of the educational ability which belongs in a certain grade, say Grade 7, that intellect correlates closely with educational ability, that departures from this standard (that is, mistakes in grading) are rare in some proportion to their magnitude, that they are due to many causes (the different teachers' judgments with all the experiences upon which they are based, and the ideals and prejudices which they exemplify, and the other causes of error to which they are subject), and that many of these causes are only loosely inter-correlated. These are all features of a status productive of symmetry and normality.
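The claim that many small, loosely related causes tend toward symmetry can be illustrated with a short simulation. Everything specific in it is an assumption made for the sketch: each cause is drawn from an exponential distribution (itself strongly skewed), the causes are taken as fully independent rather than loosely correlated, and the counts of causes and of individuals are arbitrary.

```python
import random
import statistics


def skewness(values):
    """Third standardized moment: zero for a symmetrical distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_trait(n_causes, n_people=10_000, seed=0):
    """Each individual's deviation is the sum of n_causes independent skewed causes."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n_causes))
            for _ in range(n_people)]


skew_single = skewness(simulate_trait(1))
skew_many = skewness(simulate_trait(60))
print(f"one cause:    skewness = {skew_single:.2f}")
print(f"sixty causes: skewness = {skew_many:.2f}")
```

A single skewed cause leaves the distribution markedly skewed, whereas the sum of sixty such causes is nearly symmetrical. The sketch says nothing about which case holds for intellect; it only shows why the number and independence of the causes matter.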
In the case of children of the same age (or age and sex and race) the causes are acts of nature, many of them happening millenniums ago; and we do not even know whether the hereditary factors of variability in intellect are six big ones or sixty small ones. We do not know whether the words heard and acts seen in the first three years of life are of almost zero consequence, as used to be thought when favored children were turned over to healthy peasants during this period; or are of enormous consequence, as is asserted by Freud and (but for different reasons) by Watson.² So we may best consider the facts of the distribution of intellect in man with little or no predisposition.

² There is one special set of major causes of variation about whose action we do know something. Certain diseases and certain accidents, either before or during or after birth, act to prevent or reduce the development of intellect. In some cases one of these causes may act to prevent intellect from reaching more than a certain very lowly status regardless of what might have happened had its action been withheld. The result may be that whatever the distribution apart from these causes, there is combined with it a very small distribution with a mode at a very low degree of intellect, as shown in Fig. 16.

TABLE 95. NATIONAL INTELLIGENCE EXAMINATION: DISTRIBUTION OF SCORES FOR WHITE PUPILS, AGE 11.

[Frequencies in Grades 3 to 8 by half-grade (3A to 8A), totals, and permille entries, in score intervals of 10. The entries are not recoverable.]
THE EVIDENCE

The results stated in Chapter VII permit us to free the evidence of the past from the ambiguity and misleading of units whose real value was unknown. If we had the time and facilities, we could free them also from the constant tendency toward symmetry and normality due to the error of measurement, but that work must be delayed. What we can do now is to show the form of distribution of children of the same year-age in respect of intellect in so far as it is measured by the Haggerty or by the Otis or by the National A, and in so far as the children examined are a random sampling of the children at that age. We do not separate the sexes, because the sex differences are small and the separation would leave us with too small populations. We do not separate races, because that cannot be done in the records available. Negro schools are very rarely, if ever, included in the records; but negro children and children of mixed parentage doubtless are sometimes reported without distinction, and so included in the distributions.

Fig. 16.

The ages used are 11, 12, 13, and 14, at which years certain very dull children have been excluded from school at home or in institutions. Some 14-year-olds have left school. The measurements were taken in schools, so that there is too small a representation of truants and of sickly children. City children are more fully represented than country children. The schools are predominantly public schools; so that Catholic children are insufficiently represented. The age is doubtless sometimes in error, and is a year wide. The latter fact should spread and flatten all the distributions a little.

TABLE 96. NATIONAL INTELLIGENCE EXAMINATION: DISTRIBUTION OF SCORES FOR WHITE PUPILS, AGE 12.

[Frequencies in Grades 3 to 8 by half-grade (3A to 8B), totals, and permille entries, in score intervals of 10. The entries are not recoverable.]

Fig. 17. The form of distribution of the scores of 11-year-old children in National A, transmuted into a scale with equal units.

Fig. 18. The same as Fig. 17, but for 12-year-old children.

It seems unwise to tamper with the records in an effort to allow for these various factors and make the distributions more exactly representative of "all white children of the United States of age x." The process of allowance would probably make improvements, but they would be small and uncertain and very tedious to make and to understand. So we shall take the facts just as Haggerty, Otis, and the National Committee give them; and do nothing to them save transmute each scale interval into units which are truly equal, construct the resulting distributions, and measure certain of their properties. In our inferences from the results we shall, of course, try to bear all the conditioning factors in mind.

Consider first Tables 95 to 98, which give the facts in the case of the National A. Tables 95, 96, and 97 give the original data. Table 98 gives the data for constructing the surfaces of frequency in the shape of the true values for each interval, which are taken as lengths along the abscissa line, and the quotients of the permille numbers each divided by its corresponding abscissa length. These quotients give the relative magnitudes of the ordinates or heights of the rectangles erected over the corresponding abscissa lengths. Figs. 17, 18, and 19 show the resulting surfaces of frequency with equal units. Fig. 20 shows a rough composite picture of the form of distribution of National A ability in children of the same year-age. It contains the three separate distributions centered on their medians.

Fig. 19. The same as Fig. 17, but for 13-year-old children.

Fig. 20. An approximate composite of Figs. 17, 18, and 19.

TABLE 97. NATIONAL INTELLIGENCE EXAMINATION A: DISTRIBUTION OF SCORES FOR WHITE PUPILS, AGE 13.

[Frequencies in Grades 3 to 8 by half-grade, totals, and permille entries, in score intervals of 10. The entries are not recoverable.]

TABLE 98. NATIONAL A. DATA FOR SURFACE OF FREQUENCY IN EQUAL UNITS.

              Abscissa           Ordinate Heights to Make the Areas Equal to
Original      Length in          the Corresponding Permille Entries of
Interval      Equal Units        Tables 95, 96 and 97
                                 Age 11      Age 12      Age 13
 10-19        10.00 (Est.)         5           2           2.5
 20-29         9.61               12.5         6.2         1.6
 30            9.30               14.0        10.8         7.5
 40            8.63               31.3        25.5        17.4
 50            7.92               59.3        42.9        35.4
 60            8.12               99.8        64.0        50.5
 70            8.55              117.0        93.6        59.6
 80            8.32              137.0        93.8        81.7
 90            9.85              137.0       112.0        93.4
100            9.44              131.0       125.0       118.0
110           10.03              112.0       131.0       128.0
120            8.95              107.0       115.0       145.0
130            9.92               72.6        85.7       106.8
140           11.14               32.3        71.0        86.3
150           13.37               14.2        44.1        54.5
160           14.59                4.8        15.8        28.8
170           15.00 (Est.)                     5.3         4.7
180           16.00 (Est.)                                 0.7

Tables 99 and 100 show the original data in the case of the Otis Advanced Examination. Table 101 shows the abscissa lengths in equal units and the ordinate heights obtained by dividing each original permille number by the corresponding abscissa length in equal units. Figs. 21, 22, 23, and 24 show the surfaces drawn according to Table 101.

Fig. 21.
The form of distribution of the scores of 11-year-old children in Otis Advanced, transmuted into a scale with equal units.

Fig. 22. The same as Fig. 21, but for 12-year-olds.

Fig. 23. The same as Fig. 21, but for 13-year-olds.

Fig. 24. The same as Fig. 21, but for 14-year-olds.

Fig. 25. An approximate composite of Figs. 21, 22, 23, 24.

Fig. 25 is a composite repeating Figs. 21 to 24, with the four medians coinciding.

Table 102 shows the original data for the Haggerty Delta 2; Table 103 shows the lengths and heights when equal units are used. Figs. 26, 27, 28, and 29 are the resulting surfaces of frequency; Fig. 30 is their composite. Fig. 31 is a composite of the three composites, Figs. 20, 25 and 30.

Fig. 26. The form of distribution of the scores of 11-year-old children in the Haggerty Delta Two, transmuted into a scale with equal units.

Fig. 27. The same as Fig. 26, but for 12-year-olds.

Fig. 28. The same as Fig. 26, but for 13-year-olds.

Fig. 29. The same as Fig. 26, but for 14-year-olds.

Fig. 30. An approximate composite of Figs. 26, 27, 28, and 29.

Fig. 31. A composite of three composites.

TABLE 99. OTIS ADVANCED EXAMINATION. DISTRIBUTION OF SCORES: AGES 11 AND 12.

[Age 11: frequencies in Grades 4 to 9, total, and permille. Age 12: frequencies in Grades 4 to 10, total, and permille. The entries are not recoverable.]

TABLE 100. OTIS ADVANCED EXAMINATION. DISTRIBUTION OF SCORES: AGES 13 AND 14.

[Age 13: frequencies in Grades 4 to 11, total, and permille. Age 14: frequencies in Grades 5 to 11, total, and permille. The entries are not recoverable.]

TABLE 101. OTIS ADVANCED EXAMINATION: DATA BY WHICH THE SURFACES OF FREQUENCY ARE CONSTRUCTED.

Intervals:    Values of Original      Ordinate Heights for the Corresponding
Original      Intervals in            Permille Entries, by Age
Scores        Equal Units             11          12          13          14
  0- 9        20.0 (Est.)             1.0         0.5
 10-19        18.3                  [illeg.]      2.2         0
 20-29        14.8                   10.8         4.7         6.6         2.5
 30           12.2                   26.2        27.9         9.8        13.4
 40           10.9                   64.2        45.0        28.4        16.0
 50           10.7                  116.9        67.3        55.2        41.7
 60           10.4                  110.5        96.2        74.1        56.5
 70           10.0                  150.0       112.0        96.0        52.2
 80            9.7                  142.3       170.0       105.0        77.2
 90            9.7                  112         137.0       112.0        98.9
100            9.5                   98.9       103.0       126.0       110.9
110            9.5                   75.8        85.3        99.0       125.0
120            9.6                 [illeg.]      44.8        98.0       123.9
130            9.6                   19.8        40.7        95.9        99.7
140            9.6                    8.3        28.2        49.0        81.5
150           10.5                    9.5        21.0        29.6        58.0
160           11.0                    1.8      [illeg.]    [illeg.]      29.7
170           11.2                                3.6         4.5        14.6
180           11.8                                 .9      [illeg.]       4.7
190           16.0                                             .6         2.1

The ability which is measured by the score of any one of the commonly used intelligence examinations is thus shown to be distributed in children of the same age (from 11 to 14) in rather close approximation to Form A. There are no demonstrable departures from unimodality or from symmetry; the decrease in frequency as we pass from the mode is slow, then more rapid, and then slow again.
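The construction of a surface of frequency from permille entries and abscissa lengths can be sketched as below. The function name is hypothetical, and the factor of ten is an inference rather than something the text states: it is the multiplier under which the age-11 Haggerty Delta 2 permille entries of Table 102, divided by the abscissa lengths of Table 103, reproduce the printed ordinate heights.

```python
def ordinate_heights(permille, lengths, scale=10.0):
    """Height of each rectangle so that height * length is proportional to permille."""
    return [scale * p / w for p, w in zip(permille, lengths)]


# Haggerty Delta 2, age 11, intervals 50-59, 60-69, 70-79 and 80-89.
permille = [98, 122, 130, 130]        # permille entries (Table 102)
lengths = [10.85, 10.55, 8.62, 9.48]  # abscissa lengths in equal units (Table 103)

heights = ordinate_heights(permille, lengths)
print([round(h) for h in heights])    # [90, 116, 151, 137], as printed in Table 103

# The area of each rectangle recovers the permille entry (apart from the scale).
areas = [h * w / 10.0 for h, w in zip(heights, lengths)]
print([round(a) for a in areas])
```

Because the intervals are unequal once expressed in equal units, it is the area of each rectangle, not its height, that represents the frequency; dividing by the abscissa length is what keeps a wide interval from looking more populous than it is.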
TABLE 102. HAGGERTY DELTA 2: DISTRIBUTION OF SCORES. DATA FROM MADSEN [’22].

               Age 11           Age 12           Age 13           Age 14
            Grades 3 to 8    Grades 3 to 10   Grades 3 to 10   Grades 3 to 11
Interval     n     Permille   n     Permille   n     Permille   n     Permille
  0-  9      5        6       4        5                        1        2
 10- 19      6        7       8       10       4        5       3        5
 20- 29      8       10       6        7       6        8       2        3.5
 30- 39     39       48      22       28      11       14       9        6
 40- 49     78       96      35       44      16       21      11       19
 50- 59     80       98      62       79      26       34      13       23
 60- 69     99      122      74       94      48       62      32       57
 70- 79    106      130      92      117      65       85      37       66
 80- 89    106      130      93      118      88      114      66      117
 90- 99    100      123     107      136      99      129      76      135
100-109     89      109      95      121     115      149      83      147
110-119     44       54      69       88     112      145      77      136
120-129     36       44      65       83      76       99      76      135
130-139     17       21      33       42      68       88      47       83
140-149      1        1      18       23      26       34      28       50
150-159      1        1       3        4       7        9       4        7
160-169                                        2        3
170-179                       1        1       1        1
Total      815              787              770              565

THE FORM OF DISTRIBUTION AT AGES UP TO FIFTEEN

It is reasonable to infer that the form of distribution which is found for these examination scores, when transformed into a scale with equal units, will be found with very little change for any valid measures of the altitude of Intellect CAVD, or of Intellect GOPI (letting G refer to geometrical tasks, O to opposites, P to picture completions, I to information), or of any representative sampling of intellectual tasks. It is reasonable to carry the inference on to any valid measures of the histological and physiological basis of altitude of intellect. It is probably safe also to extend the inference back to ages eleven to one, since there is no evidence that mortality from one to twelve is selective in respect of altitude of intellect to any considerable extent, or that the environment acts during those years to reduce and counteract tendencies to multimodality, skewness, and other departures from Form A.
With a little less assurance, we may extend it back to the germ cells and assert that, to a close approximation, the original capacities of white children in the United States to manifest given altitudes of intellect are distributed in a surface that is approximately unimodal, symmetrical, and of Form A.

THE FORM OF DISTRIBUTION IN ADULTS

Extending the inference to later ages is a very different matter. If the distribution is "normal" at 14, it may still become skewed at 24. This would happen if the gains made by those of different degrees of intellect at 14 differed in certain ways and by certain amounts. For example, suppose that the altitude of intellect of fourteen-year-olds is distributed as shown in column I of Table 104, and that from fourteen to twenty-four those individuals of abilities 1, 2, 3, 4, and 5 gain 0, while those of abilities 6 to 19 gain as shown below:

Ability    Gain
   6       0 to .15
   7       .15 to .35
   8       .35 to .59
   9       .59 to 1.3
  10       1.3 to 2.0
  11       2 to 5
  12       5 to 10
  13       10 to 16
  14       16 to 40
  15       40 to 80
  16       80 to 150
  17       150 to 300
  18       300 to 600
  19       600 to 1000

The distribution at age twenty-four would then have its low extreme at 1 as before, its mode and median at about 11, and an enormous skew running up to about 1,000.

To take a much less extreme state of affairs which might be real, suppose the condition at fourteen to be as in columns II and III of Table 104, and the gains to be as shown in column IV. Then the condition at twenty-four will be as shown in column V with a clear skew.

We also have evidence that a positive relation of gain to ability exists in the case of the ages above fourteen, though we do not know its exact nature or amount. Imbeciles notoriously gain very little.
Thorndike has shown ['23] that the sort of pupil who attends high school gains up to eighteen at least, in the ability measured by stock intelligence tests, and that the white pupils gain much more than the colored pupils.

TABLE 103.
HAGGERTY DELTA 2. DATA FOR SURFACE OF FREQUENCY IN EQUAL UNITS.

Original   Abscissa Length     Ordinate Heights to Make the Areas Equal to
Scores     in Equal Units      the Corresponding Permille Entries of Tables
                               Age 11    Age 12    Age 13    Age 14
  0-  9    12.00 (Est.)           5         4         2
 10- 19    12.00 (Est.)           6         8         4         4
 20- 29    10.70                  9         7         7         3
 30         9.50                 51        29        15         6
 40         9.33                103        47        23        20
 50        10.85                 90        73        31        21
 60        10.55                116        89        59        54
 70         8.62                151       136        99        77
 80         9.48                137       124       120       123
 90         9.15                134       149       141       148
100        10.54                103       115       141       139
110        10.01                 54        88       145       136
120        12.06                 36        69        82       112
130        13.01                 16        32        68        64
140        19.86                  0.5      12        17        25
150        20.88                  0.5       2         4         3

The differential gain could be caused by several different factors. Inner mental growth is less in amount in the dull at all ages; it may, and probably does, slow up and approach zero earlier in the dull. Insofar as ability with intellectual tasks is due to environment and training, the expectation will be that each added acquisition will be a stimulus to others and an aid in acquiring them. So learning to read commonly leads to the acquisition of a wider vocabulary and a better score in opposite tests and completion tests than would have been attained by oral intercourse alone. The more intellectual the individual is, also, the more will he give his free time to intellectual pursuits. Finally, vocational selection is such that the more intellectual individuals continue in school and engage in clerical and professional work that involves intellectual activities, while the dull leave school for labor which requires little thought, and sometimes does not even permit it.

TABLE 104.
THE EFFECT OF CORRELATION BETWEEN STATUS AND GAIN WHEN GAIN INCREASES IN A GEOMETRIC RATIO.

I        II            III             IV          V
Status   Frequency     Frequency       Gain        Frequency
         at 14         at 14           14 to 24    at 24
         by 1's        grouped by 3's              grouped
  0
  1         .02           .2            .100          0.2
  2         .19                         .125
  3        1.14                         .156
  4        4.85         21.5            .195         21.5
  5       15.50                         .244
  6       38.76                         .305
  7       77.52        242              .381        242
  8      125.97                         .477
  9      167.96                         .596
 10      184.76        521              .745        510
 11      167.96                         .931
 12      125.97                        1.16
 13       77.52        242             1.45         214
 14       38.76                        1.82
 15       15.50                        2.27
 16        4.85         21.5           2.84          53
 17        1.14                        3.55
 18         .19                        4.44
 19         .02           .2           5.55           6
 20
 21
 22                                                   0.2
 23
 24
 25                                                   0.02

There may of course be a marked increase of gain for those of higher abilities without producing skewness. If $g = as + b$, no skewness will be produced, no matter how steep the relation line may be. The variability will increase, but the form will still be Form A, as shown in Table 105, where the abilities 1, 2, 3, 4, 5, 6, etc., have gains of 2, 4, 6, 8, 10, 12, etc.

TABLE 105.
THE EFFECT OF CORRELATION BETWEEN STATUS AND GAIN WHEN $g = as + b$.

[Columns: Status; Frequency at 14; Gain 14 to 24; Frequency at 24. The entries are not recoverable in this copy.]

The causes which influence differences in gains in intellect up to about fourteen do seem to produce them in a rough proportion to the differences in ability, so that the form does remain that of Form A.

The only data that we have found for measuring the form of distribution of anything approximating a random sampling for any age above fifteen are the well-known Army records with Alpha, Beta, and Examination a. We have no satisfactory means of determining the value of Beta scores in a scale of equal units. So we limit our inquiry to Alpha and Examination a.

Fig. 32. The form of distribution of the scores of recruits in Army Alpha transmuted into a scale with equal units.
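The contrast between Tables 104 and 105 can be checked numerically. What follows is a minimal sketch, not the authors' computation. It assumes a symmetrical starting distribution given by the binomial coefficients C(20, s), which reproduce the Table 104 frequencies at 14 when divided by 1,000; a geometric gain of 0.1 × 1.25^(s-1), which matches the legible entries of column IV; and a linear gain g = 2s of the kind described for Table 105, applied here to the same starting distribution. The function name `skewness` is hypothetical.

```python
from math import comb

def skewness(values, weights):
    """Moment coefficient of skewness (m3 / m2**1.5) of a weighted distribution."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    m2 = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    m3 = sum(w * (v - mean) ** 3 for v, w in zip(values, weights)) / total
    return m3 / m2 ** 1.5

statuses = list(range(1, 20))              # statuses 1..19 at age 14
freqs = [comb(20, s) for s in statuses]    # assumed symmetrical, bell-shaped frequencies

# Assumed gain rules (illustrative, see the paragraph above).
geometric_gain = [0.1 * 1.25 ** (s - 1) for s in statuses]
linear_gain = [2 * s for s in statuses]    # g = a*s + b with a = 2, b = 0

at_14 = skewness(statuses, freqs)
at_24_geometric = skewness([s + g for s, g in zip(statuses, geometric_gain)], freqs)
at_24_linear = skewness([s + g for s, g in zip(statuses, linear_gain)], freqs)

print(f"skewness at 14:                 {at_14:+.3f}")
print(f"skewness at 24, geometric gain: {at_24_geometric:+.3f}")
print(f"skewness at 24, linear gain:    {at_24_linear:+.3f}")
```

Under these assumptions the linear gain only stretches the scale, so the symmetry is kept, while the geometric gain pulls the upper end out and gives a positive coefficient.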
Using the equal-unit values for Alpha and a derived in Chapter VII, and proceeding as in the case of the National, Otis, and Haggerty scores for children, we obtain the results shown in Fig. 32 for 51,620 native-born whites of the draft. [Data from the National Academy of Sciences Memoirs, '21, p. 764.]

The equal-unit values of the interval from 0 to 20 in Army Alpha and from 0 to 30 in Examination a are estimates from exceedingly scant data.

The lower end of Fig. 32 would be extended if the illiterates who were exempt from Alpha had been included. It would have been extended still further if the men rejected for dullness by the examining boards had been included. The upper end would be extended if the officers had been included.

Using the equal-unit values for Examination a in the same manner, we obtain the results shown in Fig. 33 for 63,647 enlisted men in four camps. [Data from the Memoirs, '21, p. 492.] The same considerations concerning the inclusion of men rejected for dullness and of officers apply as applied in the case of Alpha. In these four camps, 13.9% had been excluded from examination as illiterate.

Fig. 33. The form of distribution of the scores of recruits in Army Examination a, transmuted into a scale with equal units.

It is difficult to reach any secure conclusion from the facts of Fig. 32 and Fig. 33, except that there is no evidence of negative skewness. From Alpha alone in the general draft it would appear that even after generous allowance for the dullness of the illiterates and others who were excluded from examination, the distribution was skewed positively, i.e., toward the high end.
With Examination a in the four camps, however, the skewness of the original scores disappears when the values in equal units are used.

We cannot even estimate with surety what the distribution of 51,620 of the native-born whites would have been if they had been measured with Examination a, or what the distribution of the 63,647 enlisted men in the four camps would have been if they had been measured with Alpha. That is, we cannot decide how far the difference between Fig. 32 and Fig. 33 is caused by the tests used and how far it is caused by the groups tested.

On the whole,³ we may provisionally regard the sort of intellect measured by Alpha and a as distributed in the adult native-born white population of the United States with some positive skewness. We may provisionally assign, as the cause of the change from the symmetry and normality found in children, a differential gain from the age of fourteen to twenty and beyond, whereby some individuals increase these abilities very greatly, whereas others increase them little or not at all. This should be only provisional. The whole matter of adult intellect should sometime be studied with the care which it deserves.

For the ages from 14 up to 17 or 18, we may assume symmetry and normality without much probability of more than a small error. Or, we may have a slightly greater prospect of correctness if we allow for a very little positive skewness, increasing year by year.

³ Certain facts of the distribution of men in occupations, of the distribution of wages, of the distribution of schooling, and the like rather favor the supposition that adult intellect is distributed with positive skewness.

CHAPTER IX

A SCALE FOR MEASURING ALTITUDE OF INTELLECT

It was not a part of our original plan to make an actual scale for measuring intellectual difficulty, but only to find methods whereby this could be done.
We have proved that the form of distribution of altitude of intellect in grade populations from Grade 6 to the first year of college can be known, so that the degree of intellectual difficulty of a composite task which is truly intellectual can be measured by the percentage of successes in such a grade population. We have also shown that the form of distribution of intellect of an age population 10 to 14 is approximately of Form A, that of the normal probability surface, so that the same procedure can be followed in one of these age groups. It is highly probable that it can be followed in lower age groups.

Although we did not plan for scaling the difficulty of actual tasks and are not able to do it precisely with the time and facilities at our disposal, it seems best to make a beginning, if only to illustrate the workings of the principles and techniques involved in an actual case.

The work on this scale may best be considered in two parts, that which evaluates the differences in difficulty of Composites I to Q, and that which evaluates the differences in difficulty of Composites A to I. The latter was done primarily to put the values for I, J, K ... Q in relation to the absolute zero, for which purpose chance errors in the determinations of B-A, C-B, D-C ... I-H are of minor importance, since they tend to equalize one another. These lower intervals are less precisely determined than those from I to Q; and we report them and their derivations separately in the latter part of the chapter.

THE DIFFICULTY OF COMPOSITES I, J, K, L, M, N, O, P, AND Q

We present first the measurement of differences in difficulty between tasks I and J, between J and K, between K and L, and so on with M, N, O, P, and Q. The facts at our disposal for the measurement of differences in difficulty amongst these composite tasks are the percentages correct in various groups as shown in Table 106.
Group 5½ refers to the 147 pupils measured at the end of Grade 5 and at the beginning of Grade 6 with composites I, J, and K. Group 9I refers to the 246 pupils of Grade 9 who are measured with composites I, J, K, L, and M. Group 9II refers to the 192 pupils of Grade 9 who are measured with composites K, L, M, and N. Group 13 refers to the 189 candidates for entrance to college who were measured with composites N, O, P, and Q. Group 17 refers to the 240 college graduates who were measured with N, O, P, and Q.

TABLE 106.
PERCENTS OF VARIOUS GROUPS SUCCEEDING WITH 20 OR MORE SINGLE TASKS OF CAVD 40-COMPOSITES I TO Q.

Composite    Percents Succeeding
             5½       9I       9II      13       17
I            91.2     99.6
J            29.1     89.4
K            11.5     61.4     47.0
L                     32.9     16.3
M                      5.3      7.2
N                               1.1     81.5     95.4
O                                       48.1     77.1
P                                       27.5     56.7
Q                                        3.7     22.9

If we know the form of distribution of a group and the percent of the group succeeding with a task, it requires only straightforward mensuration to find the point on the base line corresponding to that percent, and the distance of that point plus or minus from the median (or mode, or other point of reference defined by the distribution of the group) in terms of the mean square variation (or other defined measure of variability) of the group in whatever ability is measured by that task.

The form of distribution is taken as normal for each of the grade groups, 5½, 9I, 9II, and 13, in consequence of the facts outlined in Chapter II and presented in detail in Appendix III. The form of distribution of Group 17, which was composed of first-year law-school students, all college graduates, was determined by a special investigation which is reported in Appendix VI. The same has been done for the form of distribution of certain groups used later in this chapter, namely, for the 180 adult imbeciles of mental age from 2½ to 5 years, for the 100 adult feeble-minded of mental age near 6½, for the group of 50 feeble-minded at or near mental age 8, for the group of 101 dull pupils 13 years old or over, in special classes in New York City, for the population of Grade 4 (second half year) and for the population of Grade 5. The evidence and argument in all these cases appear in Appendix VI.

TABLE 107.
THE DIFFICULTY OF COMPOSITES I TO Q IN VARIOUS GROUPS EXPRESSED IN EACH CASE AS A DEVIATION FROM THE DIFFICULTY FOR THE MEDIAN OF THAT GROUP, IN TERMS OF THE $\sigma$ OF THAT GROUP IN THE ABILITY MEASURED BY SUCCESS WITH THE COMPOSITE IN QUESTION. - IS EASIER, + IS HARDER.

Composite    Difficulty
             5½        9I        9II       13        17
I            -1.35     -2.65
J            + .55     -1.25
K            +1.20     - .29     + .08
L                      + .44     + .98
M                      +1.62     +1.46
N                                +2.29     - .897    -1.862
O                                          + .048    - .714
P                                          + .598    - .153
Q                                          +1.787    + .738

Table 107 gives the difficulty of various 40-composites in various groups, expressed in each case as a distance from the difficulty which 50% of that group can succeed with and in terms of $\sigma_{t_1}$ (the mean square deviation of whatever ability is measured by that composite in that group). The next procedure in constructing a scale of difficulty is to make all these different measurements of difficulty commensurate and put them all into relation to the same point of reference. This is a complicated procedure involving the following steps:

Each measurement in $\sigma_{t_1}$ for a given group is to be turned into a measurement in $\sigma_i$ for that group, $\sigma_i$ being the mean square variation of the group in altitude of intellect perfectly measured in truly equal units.

Each measurement in the $\sigma_i$ of a certain group must be made commensurable with measurements in the $\sigma_i$ of any other group, by finding the comparative magnitudes of $\sigma_i$ of the 240 graduates, $\sigma_i$ of the 189 college entrants, $\sigma_i$ of the 246 pupils in Grade 9I, $\sigma_i$ of the 192 pupils in Grade 9II, and so on. All of the different $\sigma_i$ values may then be multiplied or divided by numbers so that all will be expressed in the same units.
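The passage from a percent succeeding (Table 106) to a deviation in $\sigma$ units (Table 107) can be sketched as follows for the groups whose distribution is taken as normal. This is an illustrative reconstruction, not the authors' worked procedure: the function name is hypothetical, and Python's standard-library `NormalDist` stands in for the probability tables they would have used. It does not apply to Group 17, whose form of distribution was determined separately.

```python
from statistics import NormalDist

def difficulty_in_sigma(percent_succeeding):
    """Deviation of a task's difficulty from the group median, in sigma units,
    assuming the measured ability is normally distributed in the group.
    Negative means easier than the median difficulty, positive means harder."""
    return -NormalDist().inv_cdf(percent_succeeding / 100.0)

# Percents succeeding copied from Table 106 (groups 5½ and 13 only).
table_106 = {
    ("I", "5½"): 91.2,
    ("J", "5½"): 29.1,
    ("K", "5½"): 11.5,
    ("N", "13"): 81.5,
    ("Q", "13"): 3.7,
}

for (composite, group), pct in table_106.items():
    print(composite, group, f"{difficulty_in_sigma(pct):+.3f}")
```

The printed values can be compared with the corresponding entries of Table 107.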
We shall use the mean square deviation of pupils in Grade 9 as our unit for this purpose.

The measurements, now in units of $\sigma_{i9}$, must be expressed, not as distances plus or minus from the CAVD difficulty for the median now of one group now of another, but all from some common point of reference such as the median for Grade 9.

ESTIMATING $\sigma_i$ FROM $\sigma_{t_1}$

We turn the measurements in $\sigma_{t_1}$ into terms of $\sigma_i$ by using $\sigma_i = \sigma_{t_1}\sqrt{r_{t_1 t_2}}$, or by using $\sigma_i = \sigma_{t_1} r_{t_1 i}$.

The self-correlation $r_{t_1 t_2}$ is, of course, for a 40-composite with another 40-composite of equal difficulty, not for an infinitely extensive set of tasks of a certain difficulty with another equally extensive set. Also $r_{t_1 t_2}$ is the correlation for the specific group of restricted range which is being used, not the correlation for a group of wide range, such as all persons of age 20.

For precise determinations of $\sqrt{r_{t_1 t_2}}$ or of $r_{t_1 i}$ we need measurements with more extensive groups and alternate forms of our 40-composite tasks. With the material at our disposal we can hope only for approximate results.

We measure or infer $r_{t_1 t_2}$ separately for each composite with each group. We may, however, wisely modify the estimate for each composite with each group in view of the facts concerning $r_{t_1 t_2}$ for the same composite in other groups, or for other neighboring composites in the same group. Consider, for example, the 40-composites K, L, and M. The correlations of each of these with a 40-composite of different content but similar difficulty, estimated by $\frac{2 r_{20,20}}{1 + r_{20,20}}$, are as shown below according to the group and kind of coefficient computed.

                          K        L        M
Group 246 (Sheppard)     .74      .86      .69
Group 246 (Pearson)      .65      [illegible]
Group 192 (Sheppard)     .80      [illegible]
Group 192 (Pearson)      .73      [illegible]

The correlations of each of them with a 40-composite of different content but similar difficulty may also be estimated by adding .03¹ to their average correlations with their nearest neighbor composites (or, with some justification, by adding .02 or .01 or even 0). Using .03, we have the results shown below.

                          K        L        M
Group 246 (Sheppard)     .70      .68      .66
Group 246 (Pearson)      .69      [illegible]
Group 192 (Sheppard)     [illegible]
Group 192 (Pearson)      [illegible]

Combining the two sorts of estimates, we have

                K        L        M
Group 246      .70      .77½     .68
Group 192      .67      .77½     .75

Moreover, we may consider that chance played some part in making the self-correlation of L higher than the other two; and so lower its $r_{t_1 t_2}$ and raise theirs somewhat to balance. Similarly we may consider that chance played some part in making these $r_{t_1 t_2}$'s higher in the 192 group than in the 246 group, and allow somewhat for that. Thus we may replace the last set of figures by

                K             L             M
Group 246      [illegible]   .76           .70
Group 192      .67           [illegible]   [illegible]

in which slight smoothing by these allowances is made.

In Table 108 is collected all the information concerning the $r_{t_1 t_2}$'s for each 40-composite in each group. I and II refer to the two methods of determining $r_{t_1 t_2}$. In I we use the correlation between the two halves of a 40-composite obtained by taking 5C + 5A + 5V + 5D at random, the second half being composed of the remaining 5C + 5A + 5V + 5D, and estimate $r_{t_1 t_2}$ by $\frac{2 r_{20,20}}{1 + r_{20,20}}$. In II we use the obtained correlation between the 40-composite in question and its nearest neighbor composites,² adding .03.

¹ See Appendix IV for the derivation and justification of this allowance for remoteness.
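The two methods of estimating the self-correlation of a 40-composite can be written out as a short sketch. The function names and the numerical inputs are hypothetical, chosen only to show the arithmetic; they are not values from Table 108. Method I is the step-up from the correlation between the two 20-task halves, $2r_{20,20}/(1 + r_{20,20})$; Method II adds the allowance of .03 for remoteness to the mean correlation with the nearest neighbor composites.

```python
def self_correlation_method_1(r_half):
    """Method I: step up the correlation between the two 20-task halves
    to the self-correlation of the whole 40-composite, 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

def self_correlation_method_2(r_neighbors, allowance=0.03):
    """Method II: mean correlation with the nearest neighbor composites,
    plus an allowance for their remoteness in difficulty."""
    return sum(r_neighbors) / len(r_neighbors) + allowance

# Hypothetical inputs, for illustration only.
print(round(self_correlation_method_1(0.60), 3))
print(round(self_correlation_method_2([0.66, 0.70]), 3))
```

Passing `allowance=0.02`, `0.01`, or `0` gives the alternative allowances mentioned in the text.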
The correlations for composites N, O, P, and Q in group 17 under "By other data" were obtained as follows: A composite almost identical with N was correlated with another of very closely equal difficulty, giving r = .72. A composite almost identical with O was correlated with a composite of very closely equal difficulty, giving r = .75. The composite almost identical with N was also correlated with a composite almost identical with O, giving r = .73. The composite almost identical with O was correlated with Q, giving r = .73. From these correlations, allowing + .03 for one step of remoteness, we have the self-correlation of N as .72 or .76, averaging .74; that of O as .75 or .76, averaging .75½; and that of Q as .76.

² The results by method I are in general higher. The differences (Method I - Method II) are: .09, .12, 0, .17½, .00½, .20, .20, .00½, .05½, -.07, .05, .05, -.06, averaging .063.

TABLE 108.
$r_{t_1 t_2}$ AS ESTIMATED FROM CORRELATIONS BETWEEN NUMBER OF SINGLE TASKS CORRECT IN ONE-HALF OF A 40-COMPOSITE AND NUMBER OF SINGLE TASKS CORRECT IN THE OTHER HALF, AND ALSO AS ESTIMATED FROM CORRELATIONS BETWEEN NUMBER CORRECT IN A 40-COMPOSITE AND NUMBER CORRECT IN A NEIGHBORING 40-COMPOSITE.

[Columns, each for groups 5½, 9I, 9II, 13, and 17: Method I, $\frac{2 r_{20,20}}{1 + r_{20,20}}$; Method II, .03 + $r_{40}$ with nearest 40-composite; By other data; Average. The entries are not recoverable in this copy.]

9I and 9II differ almost nil in the general magnitude of $r_{t_1 t_2}$ for the three 40-composites used with both of these groups, the average difference (9I - 9II) being -.013 with a mean square error of ± .033, three times as great as the difference. So we shall probably be nearer the truth by using .68½ and .68½ in place of the .70 and .67, and .70 and .73 in place of the .68 and .75.

In general $r_{t_1 t_2}$ is .04½ higher in 17 than in 13, and the use of this fact to smooth out the irregularities in the values for N, O, P, and Q will probably be an improvement. Thus, columns 3 and 4 below are probably truer than columns 1 and 2. The totals for each group and for each composite are unaltered by the amendments.

        From the table       Amended
          1        2         3        4
N        .69½     .80       .72½     .77
O        .78½     .81       .77      .82½
P        .78½     .79½      .77      .81
Q        .69      .74       .69      .74

We make the amendments noted in the last two paragraphs and so use the $r_{t_1 t_2}$'s listed in Table 109 in estimating the difficulty in terms of $\sigma_i$ for each composite in each group. The results are shown in Table 110.

TABLE 109.
VALUES OF $r_{t_1 t_2}$ DERIVED FROM TABLE 108, AND THE VALUES OF $\sqrt{r_{t_1 t_2}}$ USED TO OBTAIN TABLE 110 FROM TABLE 107.

        $r_{t_1 t_2}$                               $\sqrt{r_{t_1 t_2}}$
        5½      9I      9II     13      17          5½      9I      9II     13      17
I       .78     .66½                                .883    .815
J       .78     .72½                                .883    .851
K               .68½    .68½                                .828    .828
L               .77½    .77½                                .880    .880
M               .70     .73                                 .837    .854
N                       .72     .72½    .77                         .848½   .851½   .877½
O                               .77     .82½                                .877½   .908
P                               .77     .81                                 .877½   .900
Q                               .69     .74                                 .831    .860

TABLE 110.
THE INTELLECTUAL DIFFICULTY OF COMPOSITES I TO Q IN GROUPS 5½, 9I, 9II, 13 AND 17. EXPRESSED IN TERMS OF $\sigma_{i5½}$, $\sigma_{i9I}$, $\sigma_{i9II}$, $\sigma_{i13}$ OR $\sigma_{i17}$; AS DEVISED BY THE USE OF TABLE 109.

Composite    Difficulty
             In $\sigma_{i5½}$   In $\sigma_{i9I}$   In $\sigma_{i9II}$   In $\sigma_{i13}$   In $\sigma_{i17}$
I            -1.53     -3.25
J            + .62     -1.47
K            +1.36     - .35     + .10
L                      + .50     +1.11
M                      +1.94     +1.71
N                                +2.70     -1.054    -2.120
O                                          + .055    - .786
P                                          + .681    - .170
Q                                          +2.150    + .858

Estimating the $\sigma_i$'s by $\sigma_i = \sigma_{t_1} r_{t_1 i}$, we obtain $r_{t_1 i}$ by $r_{t_1 i} = \frac{r_{t_1 s}}{\sqrt{r_{s_1 s_2}}}$, in which $r_{t_1 s}$ is the obtained correlation between the 40-composite in question and the summation score in a long CAVD series, and $r_{s_1 s_2}$ is the self-correlation of this summation score. In certain cases we have to estimate $r_{s_1 s_2}$, but the error of the estimate is small,³ and its effect is reduced since only the square root of $r_{s_1 s_2}$ is used. The values of $r_{t_1 s}$ and $r_{s_1 s_2}$ used are those used for another purpose in Appendix V.
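The two conversions can be illustrated on one entry. Because $\sigma_i$ is the smaller unit, a distance of $d$ in units of $\sigma_{t_1}$ becomes $d/\sqrt{r_{t_1 t_2}}$ or $d/r_{t_1 i}$ in units of $\sigma_i$. The sketch below uses hypothetical function names and takes its inputs for Composite I in Group 5½ from Tables 107, 109, and 111; it is an illustration of the arithmetic, not a statement of how the authors carried it out.

```python
from math import sqrt

def in_sigma_i_by_self_correlation(d_t, r_t1t2):
    """Distance in units of sigma_i, using sigma_i = sigma_t1 * sqrt(r_t1t2)."""
    return d_t / sqrt(r_t1t2)

def in_sigma_i_by_criterion(d_t, r_t1i):
    """Distance in units of sigma_i, using sigma_i = sigma_t1 * r_t1i."""
    return d_t / r_t1i

# Composite I in Group 5½: difficulty -1.35 in sigma_t1 (Table 107),
# r_t1t2 = .78 (Table 109), r_t1i = .933 (Table 111).
first = in_sigma_i_by_self_correlation(-1.35, 0.78)
second = in_sigma_i_by_criterion(-1.35, 0.933)
average = (first + second) / 2

print(f"{first:.2f} {second:.2f} {average:.2f}")
```

The three printed figures can be compared with the entries for Composite I, Group 5½, in Tables 110, 112, and 113 respectively.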
The results of the computations are shown in Table 111.

³ For a grade population the empirical values of $r_{s_1 s_2}$ vary from .91 to .95. In group 5½ and group 17, where we estimate, the summation score is from a very long series, so we use .95.

TABLE 111.
VALUES OF $r_{t_1 i}$ ESTIMATED FROM CORRELATIONS BETWEEN NUMBER OF SINGLE TASKS CORRECT IN A 40-COMPOSITE AND NUMBER CORRECT IN A LONG CAVD SERIES.

Composite    $r_{t_1 i}$
             5½       9I       9II      13       17
I            .933     .759
J            .882     .907
K                     .854     .819
L                     .944     .896
M                     .849     .922
N                              .819     .824     .872
O                                       .917     .944
P                                       .948     .913
Q                                       .790     .882

Using the estimates of $r_{t_1 i}$ of Table 111, we obtain the estimates of the difficulty of each composite for each group in terms of the $\sigma_i$ of that group which are presented in Table 112. These differ on the average from those of Table 110, as shown below, the median difference regardless of signs being .03 and the average difference .05.

$\sigma_i$ (by $\sqrt{r_{t_1 t_2}}$) - $\sigma_i$ (by $r_{t_1 i}$)    Number
-.15 to -.06                                   4
-.05 to +.04                                  12
+.05 to +.14                                   3
+.15 to +.24                                   1

Except in the case of composite I in group 9I, it does not matter much whether we use the estimates of Table 110 or those of Table 112 or averages of the two. We have averaged each pair of determinations with the results shown in Table 113, which are used as the $\sigma_i$ values in all that follows.

EXPRESSING THE $\sigma_i$ OF EACH GROUP IN TERMS OF A COMMON UNIT

We make the $\sigma_i$'s of two groups, A and B, commensurate by finding the difference in difficulty between two tasks in terms of $\sigma_{iA}$ and in terms of $\sigma_{iB}$, provided the two groups overlap sufficiently. Thus, we find, in the case of the group (17) of 240 college graduates and the group (13) of high school graduates, that:

Composite O - Composite N = 1.12$\sigma_{i13}$ and 1.36$\sigma_{i17}$.
Composite P - Composite O = .61$\sigma_{i13}$ and .60$\sigma_{i17}$.
Composite Q - Composite P = 1.55$\sigma_{i13}$ and 1.02$\sigma_{i17}$.
$\sigma_{i13}$ = 1.21$\sigma_{i17}$ or .99$\sigma_{i17}$ or .66$\sigma_{i17}$, according to the successive pair of composites used. If we take the most remote composites which include all the data, Q and N, we have 3.28$\sigma_{i13}$ = 2.98$\sigma_{i17}$, whereby $\sigma_{i13}$ = .91$\sigma_{i17}$.

TABLE 112.
THE INTELLECTUAL DIFFICULTY OF COMPOSITES I TO Q IN TERMS OF $\sigma_{i5½}$, $\sigma_{i9I}$, ETC.; AS DEVISED BY THE USE OF TABLE 111.

Composite    Difficulty
             In $\sigma_{i5½}$   In $\sigma_{i9I}$   In $\sigma_{i9II}$   In $\sigma_{i13}$   In $\sigma_{i17}$
I            -1.45     -3.49
J            + .59     -1.38
K            +1.29     - .34     + .10
L                      + .47     +1.09½
M                      +1.91     +1.58
N                                +2.80     -1.09     -2.13½
O                                          + .05     - .76
P                                          + .63     - .17
Q                                          +2.26     + .84

In the same way, we find, in the case of the Group 9I of 246 pupils in Grade 9 and the Group 9II of 192 pupils in Grade 9, that:

Composite L - Composite K = .84$\sigma_{i9I}$ and 1.00$\sigma_{i9II}$.
Composite M - Composite L = 1.44$\sigma_{i9I}$ and .55$\sigma_{i9II}$.

$\sigma_{i9I}$ = 1.19$\sigma_{i9II}$ or .38$\sigma_{i9II}$, according to the pair of composites used. If we take the most remote pair which include all the data, M and K, we have 2.28$\sigma_{i9I}$ = 1.55$\sigma_{i9II}$, whereby $\sigma_{i9I}$ = .68$\sigma_{i9II}$.

In the same way, we find, with Group 5½ and Group 9I, that:

Composite J - Composite I = 2.10$\sigma_{i5½}$ and 1.94$\sigma_{i9I}$.
Composite K - Composite J = .72$\sigma_{i5½}$ and 1.09$\sigma_{i9I}$.

$\sigma_{i5½}$ = .92$\sigma_{i9I}$ or 1.51$\sigma_{i9I}$, according to the pair of composites used. If we use K and I, which include all the data, we have 2.82$\sigma_{i5½}$ = 3.02$\sigma_{i9I}$, whereby $\sigma_{i5½}$ = 1.07$\sigma_{i9I}$.

For precise work in scale construction, the groups should be large and close enough together to have a considerable overlapping. The measurement of the $\sigma_i$ of any one group in terms of the $\sigma_i$ of any other group may then be determined with as small an error as is desired.

Our groups are obviously not large enough, since there are so great differences between the estimates of the comparative variabilities according to the composites which we use.

TABLE 113.
THE INTELLECTUAL DIFFICULTY OF COMPOSITES I TO Q. AVERAGES OF THE DETERMINATIONS OF TABLE 110 AND TABLE 112.

Composite    Difficulty
             In $\sigma_{i5½}$   In $\sigma_{i9I}$   In $\sigma_{i9II}$   In $\sigma_{i13}$   In $\sigma_{i17}$
I            -1.49     -3.37
J            + .61     -1.43
K            +1.33     - .35     + .10
L                      + .49     +1.10
M                      +1.93     +1.65
N                                +2.75     -1.07     -2.13
O                                          + .05     - .77
P                                          + .66     - .17
Q                                          +2.21     + .85

There is particular risk in using the estimates of comparative variabilities in different groups which depend upon a composite that is very easy or one that is very hard for the group. In the case of the very easy composites carelessness may play a part that affects the results. In the case of the composites which are very difficult for a group, lack of effort and persistence and interest may be a disturbing factor; and it is possible that, in spite of care taken to give what seemed to be abundant time, certain individuals may not have exhausted their abilities for lack of sufficient time. The eccentricity of the results with Composite M in Group 9I may be due to this fact. In general, Group 9I was superior to Group 9II, and the reversal to notable inferiority with Composite M may be explainable by the fact that this was the hardest composite taken. It was not truly the last in point of time, since all the C's were done in one division of the examination, all the A's in another division of it, all the V's in another division of it, and all the D's in another division of it.

In cases where the material is not notably richer than this of ours and in cases where the groups are spaced so far apart that there is little or no overlapping, valuable aid may be derived from a general consideration of the comparative variability of groups similar in school grade or other indication of intellect to the particular groups which are used in scaling the difficulty of the composite tasks.
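The step of making two groups' $\sigma_i$'s commensurate reduces to a ratio of the same difficulty difference as expressed in each group's unit. A minimal sketch follows, with a hypothetical function name, using the most remote pairs of composites quoted in this section.

```python
def sigma_ratio(diff_in_a, diff_in_b):
    """If one difficulty difference equals diff_in_a * sigma_A and also
    diff_in_b * sigma_B, return k such that sigma_A = k * sigma_B."""
    return diff_in_b / diff_in_a

# (difference in sigma_i of first group, difference in sigma_i of second group)
most_remote_pairs = {
    "13 vs 17, Q - N": (3.28, 2.98),
    "9I vs 9II, M - K": (2.28, 1.55),
    "5½ vs 9I, K - I": (2.82, 3.02),
}

for label, (in_a, in_b) in most_remote_pairs.items():
    print(label, round(sigma_ratio(in_a, in_b), 2))
```

Applying the same function to adjacent pairs of composites instead of the most remote pair shows how widely the estimates scatter, which is the weakness the text goes on to address.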
Moreover, facts concerning the comparative variability of grade populations are valuable as a check on even the best determinations made by using two or more composites with two or more groups. Consequently we have made a rather exhaustive study of the variability of grade populations from 6 through 13, using all the data that we could discover which had sufficiently large populations to make the deter- minations of variability reasonably precise. In order to discover the relative variability of different grade populations from 6 through 13, if each individual were measured in truly equal units, we may proceed in either one of two ways: We may argue after the fashion of the argument in Ap- pendix III that inequalities in the face-value units will neu- tralize each other so that the general average result from many tests, each with its own sort of inequality, will be near the truth. In this case, we simply take the sigmas by the original scoring for these different grades and get their general drift. Dr. Bregman has done this for all the ma- terial available with populations large enough to give re- hable sampling of the grades. The results are shown in Table 115 and in more detail in Table 114.A SCALE FOR MEASURING ALTITUDE OF INTELLECT 307 The second method is to transmute the face-value mea- sures for such tests as Army Alpha, National A, Otis Ad- vanced, ete., into terms of equal units before computing the sigmas. The results of the investigations reported in Chapter VII enable us to do this, since in that chapter we determined the value in equal units of each interval of the TABLE 114. DATA FOR COMPUTING RELATIVE VARIABILITIES OF DIFFERENT GRADES IN INTEL- LECT; AND FOR COMPUTING DISTANCES BETWEEN MEDIANS OF DIFFERENT GRADES IN INTELLECT. 
Original median refers to the median by the standard method of scoring; corrected median refers to the median by a scale in equal units; original o refers to the mean square deviation by the standard method of scoring; cor- rected o refers to the mean square deviation by a scale in equal units. Median o Grade Number Original Corrected Original Corrected Army Alpha* 6 281 54.9 55.6 18.4 19.1 9 1721 97.94 97.94 24.0 24.2 10 1223 24.0 11 977 23.8 12 1387 125.39 125.39 24.24 24.8 12 766 128.04 128.04 24.13 24.9 Coll. 1 2545 128.50 128.50 28.20 29.2 Se 400 157.8 158.5 19.99 23.3 Army Examination A* 6 742 139.8 139.8 36.9 38.94 id 685 158.6 158.6 39.2 40.90 8 630 186.1 186.1 43.04 43.39 9 311 204.36 204.36 45.89 45.53 12 53 276 274 36 36 Coll. 1 701 267.33 265.33 40.63 39.25 National A** 6 1668 111.9 111.9 22.8 21.8 9 494 141.75 140.85 16.8 16.5 * All computations exact. ** Sigmas in equal units are computed by 1% the distance required to exclude 15.87% at each extreme.308 THE MEASUREMENT OF INTELLIGENCE TABLE 114 (Continued). Grade Number Original Corrected Original Corrected Otis* 6 5952 86.8 87.2 24.3 7 3896 96.98 97.1 24.4 8 4598 111.93 111.4 25.08 9 3627 125.04 123.8 24.62 12 122 151.83 149.7 24.06 Haggerty* 6 916 91.4 91.3 20.4 20.7 7 737 105.07 105.2 20.2 8 689 113.9 113.7 19.46 9 473 BL 113.5 17.5 19.54 9 1995 116.5 116.4 18.2 23.25 12 668 135.83 139.3 15.31 22.4 I.E. k. Horm A** 6 379 83.9 81.0 32.41 9 3231 173.4 173.4 42.9 10 1935 191.1 191.1 40.3 11 1533 202.6 202.6 42.4 12 972 219.81 219.8 44.99 12 1666 227.79 227.5 45.85 I. E. R. Form B** 10 1656 209.0 43.55 11 1453 219.7 44.0 12 1207 229.9 44.7 Terman Group Test*** 9 1438 102.16 102.16 32.0 12 4886 144.55 142.55 32.61 * All computations exact. ** The sigmas in equal units will vary inappreciably from the sigmas by the original scale and are not computed. 
*** The effect of inequalities in the units will be almost identical for Grade 9 and for Grade 12; hence the relative values of the sigmas will not be influ- enced thereby. Consequently the sigmas for values in equal units have not been computed.A SCALE FOR MEASURING ALTITUDE OF INTELLECT 309 original scale for Army Alpha, National, Otis, Haggerty, Army a, Terman Group Test, and several others. We have made these computations with results as shown in Table 114. TABLE 114 (Continued). Median o Grade Number Original Corrected Original Corrected Brown University* 12 3333 45.69 46.2 11.59 Cols 1 2118 56.62 56.3 vere Myers Mental Measure** 6 724 46.3 46.3 13.1 12.6 7 696 49.61 49.8 14.65 8 950 54.15 04.55 13.72 9 311 57.1 57.5 13.05 13.75 Pintner Non-Language** 6 1237 316.7 313.7 86.7 86.5 7 755 339.0 339.0 73.18 8 530 379.6 381.6 73.24 9 258 400.6 403.0 75.0 78.5 Pressey Cross-Out*** 6 1057 51.18 10.30 7 998 56.10 10.30 8 725 63.12 10.0 9 303 72.5 10.0 Trabue Completion*** 6 1454 21.8 5.5 7 1456 25.39 5.67 8 1740 27.61 6.29 9 273 30.05 5.9 * The inequalities of units in the scale are such as balance one another and leave the relative values of the sigmas by the original units undisturbed. Con- sequently new values are not computed. ** The sigmas according to a scale with equal units are computed by finding % the distance required to exclude 15.87% at each extreme. *** Scores in equal units have not been determined.THE MEASUREMENT OF INTELLIGENCE 310 The data which we have used to measure comparative variabilities are the same as those which will be used later to measure the differences between the medians of various grade groups in intellect. We present them in Table 114 classified according to the examination used. In connection with each examination we record the results for Grades 6, 9, 12, and 13 (or first year of college) and occasionally for other grades or groups. We report the number of individuals; the median score, taking the TABLE 114 (Concluded). 
Columns: Grade; Number; Median (Original, Corrected); σ (Original, Corrected).

Illinois Examination*
6  588  75.52  17.01
9  380  101.4  18.5

Thorndike Intelligence Examination, Part I*
12  1527  91.4  18.1
Coll. 1  166  101.7  17.6
Coll. 1  466  108.4  17.0
Coll. 1  319  107.1  18.5
Weighted average, Coll. 1 (weights 1, 2, and 2)  106.5  17.7

* Scores in equal units have not been determined.

units at their face value; the median score in a scale with equal units; the mean square deviation, taking the units at their face value; the mean square deviation, using a scale with equal units. In the latter case the sigmas have been computed exactly, where it was possible, but in many cases we have had to resort to approximations. In cases where the scale with equal units was so closely similar to the original scale that little, if any, difference would be made in the mean square deviation, we have used the original figures. Notes are appended to Table 114 descriptive of what was done in this regard in each case.

From the facts of Table 114 are computed the ratios of Table 115.

TABLE 115. THE RELATIVE VARIABILITY OF DIFFERENT GRADE POPULATIONS.

                     Using the Original Scale Units                   Using Scales with Equal Units
Examination          σ_i6/σ_i9  σ_i12/σ_i9  σ_i13/σ_i9  σ_i13/σ_i12   σ_i6/σ_i9  σ_i12/σ_i9  σ_i13/σ_i9  σ_i13/σ_i12
Army Alpha             .77        1.01        1.00        1.00          .79        1.03        1.09        1.06
Army a                 .80         .78½        .88½       1.13          .85½        .79         .86        1.09
National A            1.36                                             1.32
Otis Adv.              .99         .98                                  .99         .98
Haggerty              1.13         .85                                  .93        1.00
I. E. R. Sel. Gen.     .76        1.06                                  .76        1.06
Terman Group                      1.02                                             1.02
Brown Univ.                                                .96                                              .96
Myers Mental          1.00                                              .92
Pintner               1.16                                             1.10
Thorndike, Part I                                          .97
Trabue Comp.           .93
Illinois               .92
Pressey               1.03
Median                 .99         .99½        .94         .98½         .92½       1.01         .97½       1.06
Average                .99         .95         .94        1.01½         .96         .98         .97½       1.04

From the facts of Table 115 we may conclude that the forces of selection and gradation which determine the variability of grade populations result in a slight increase from Grade 6 to 9, which we may estimate as 4 percent (giving twice as much weight to the results from equal-unit scaling as to those from the original scores). From 9 to 12 there is little or no change. The medians for the σ_i12/σ_i9 ratios average 1.003. The .78½ and .79 of Army a which make the averages lower (.95 and .98) are from a very small group of 53, which should be given very little weight. This group was used because it enriched somewhat our very scanty material on the σ_i13/σ_i12 comparison. We may then estimate the variabilities of Grades 6, 9, and 12 as 96, 100, and 100. Comparing Grade 13 with both Grade 9 and Grade 12 (σ_i13/σ_i12 or σ_i13/σ_i9), we find for the original-scale units a median of .98½ and an average of .99; for the scales in equal units, there is a median of 1.06 and an average of 1.01. The best estimate, in view of the fact that the .88½ and .86 by Army a deserve less weight than the other determinations, seems to be about 102. We then have 96, 100, 100, and 102 as the relative variabilities of Grades 6, 9, 12, and 13.

These general facts may be used to correct the eccentric and unreliable determinations from the composites themselves (see page 304 f.). The use of the entire stretch of overlapping gave σ_i5½ as 1.07σ_i9I for our particular group, but in general σ_i5½ may be expected to be about .96σ_i9. We know of no facts which make it probable that our groups 5½ and 9I differ from Grades 5½ and 9 in general in such a way as to make a variation of σ_i5½/σ_i9 up from .96 any more probable than a variation down. The scientific procedure would be to apply the same examinations to these two particular groups, and compute the variabilities in units of known value, but this was not practicable. The best thing to be done in the circumstances is to attach some reasonable weights to the two lines of evidence, and so obtain a working estimate.
Giving the general facts about Grades 6 and 9 a weight of 4, and the particular facts from the composites used in both groups a weight of 1, the ratio σ_i5½/σ_i9I is .98.

The next matter to be cleared up is the comparative variability of 9I and 9II. σ_i9II/σ_i9I was 1.19 by L−K and .38 by M−L. We shall disregard these determinations entirely and treat the variability of 9I as equal to that of 9II, for the following reason. These two groups were constituted by a division of all the pupils in Grade 9 in a certain school at random, so far as is known. There is nothing in their summation scores to show that one is more variable than the other. The L−K and M−L determinations are enormously at variance, and so deserve very little weight.

Between 9 and 13 there is no overlapping, so that the general facts of grade variability are the only means of estimate. As has been stated, our group 13 is a group of candidates for college entrance, not of actual freshmen. They were, however, candidates already selected by certain tests and were of intellect comparable to the freshman groups reported in Table 114, differing probably toward less variability rather than toward more, if they differed at all in this respect. 1.02 or 1.00 is then suitable as the σ_i13/σ_i9 ratio, so far as is known.

The last comparison to be considered is of group 13 and group 17. The determinations from the composites taken in common were: σ_i13/σ_i17 = 1.21 or .99 or .66, with a median of .99 and an average of .95. The use of the widest stretch between composites gave .91. The .66 and .91 and .95 are probably too low, inasmuch as all depend on the +2.21 for composite Q in the 13 group. This is the most unreliable of the eight determinations, and is probably too high. The difference between the general level of ability of group 17 and that of group 13 is 1.06 by composite N, .82 by O, .83 by P, and 1.56 by Q.
The median .99 is the most probable estimate from the composites used in both groups. The general drift of the facts for Grades 6, 9, 12, and 13 gives the expectation that the variability in Grade 17 will be somewhat but not much higher than that in Grades 12 or 13, perhaps 1.04 or 1.05 times the variability of Grade 9, giving a ratio for σ_i13/σ_i17 of about .95. The records for Examination a with 136 college students of Grades 14, 15, and 16, and with 27 graduate students show, however, decreases in variability much below that of Grade 9, making σ_i13/σ_i17 well above 1.05. So the general considerations can hardly be used to favor change from 1.00 in either direction. On the whole .99 for σ_i13/σ_i17 or 1.01 for σ_i17/σ_i13 is fairly well justified by both methods.

The values recommended for turning the various σ_i's into σ_i9's are then:

σ_i5½ = .98σ_i9I or σ_i9II
σ_i9I = σ_i9II
σ_i13 = 1.02σ_i9I or σ_i9II
σ_i17 = 1.03σ_i9I or σ_i9II

Nothing in the particular comparisons from the composites themselves is inconsistent with these estimates. What has been done is to use general considerations to locate ratios within the limits of those which were reasonable in view of the particular comparisons. Using them the measures of Table 113 become those of Table 116.

EXPRESSING THE MEASURES OF DIFFICULTY AS DISTANCES FROM A COMMON POINT OF REFERENCE

The differences in difficulty of composites I, J, and K plus and minus from the median of group 5½ may be expressed as differences from the median of group 9I, by finding the differences between the difficulty for the median of group 5½ and the difficulty for the median of group 9I. This may be found by using the composite tasks which were used with both groups. Thus composite I is, by Table 116, 1.52σ_i9 easier than the task which just 50% of group 5½ can master and 3.37σ_i9 easier than the task which just 50% of group 9I can master.
By this determination, the difficulty of the median task for 5½ is 1.85σ_i9 less than the difficulty of the median task for 9I. Using the facts of Table 116 for composites J and K in the same manner gives 2.05σ_i9 and 1.71σ_i9. The average is 1.87σ_i9; the median is 1.85σ_i9.

In the same manner, K is .35σ_i9 easier than the task which just 50% of group 9I can master and .10σ_i9 harder than the task which just 50% of group 9II can master. By this determination, the difficulty of the median task for 9I is .45σ_i9 greater than the difficulty of the median task for 9II. Using the facts for composites L and M gives .61σ_i9 greater and .28σ_i9 less. The average of the three determinations is .26σ_i9; the median is .45σ_i9.

Composite N is 2.75σ_i9 harder than the task at which 50% of 9II succeed, and 1.05σ_i9 easier than the task at which 50% of group 13 succeed. So the difficulty of the median task for group 13 is 3.80σ_i9 greater than that of the median task for 9II.

Using N, O, P, and Q in similar manner, the difficulty of the median task for Group 17 is found to be 1.02σ_i9, or .80σ_i9, or .81σ_i9, or 1.34½σ_i9 greater than the difficulty of the median task for Group 13. The average is .99½σ_i9; the median is .92σ_i9.
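The comparisons just described all follow one rule: a composite given to two groups has a single difficulty, so the difference between its two tabled deviations measures the distance between the two groups' median tasks. A minimal sketch of that arithmetic follows, assuming the Table 116 entries as they are quoted in the text (with K for group 9I taken as −.35). The dictionary layout and the function name are illustrative only and form no part of the original procedure.

```python
from statistics import mean, median

# Deviation of each composite from each group's median task, in units of
# sigma_i9, as quoted from Table 116 (K for group 9I taken as -.35).
# The group label "5.5" stands for group 5 1/2; all names are illustrative.
table_116 = {
    "5.5": {"I": -1.52, "J": 0.62, "K": 1.36},
    "9I": {"I": -3.37, "J": -1.43, "K": -0.35, "L": 0.49, "M": 1.93},
    "9II": {"K": 0.10, "L": 1.10, "M": 1.65, "N": 2.75},
    "13": {"N": -1.05, "O": 0.05, "P": 0.65, "Q": 2.17},
    "17": {"N": -2.07, "O": -0.75, "P": -0.165, "Q": 0.825},
}


def median_task_gap(lower, higher):
    """Return (gaps, average, median) of how much harder the median task
    of `higher` is than that of `lower`, one gap per shared composite."""
    shared = sorted(set(table_116[lower]) & set(table_116[higher]))
    gaps = [table_116[lower][c] - table_116[higher][c] for c in shared]
    return gaps, mean(gaps), median(gaps)


for lower, higher in [("5.5", "9I"), ("9II", "9I"), ("9II", "13"), ("13", "17")]:
    gaps, avg, med = median_task_gap(lower, higher)
    print(f"{higher} above {lower}: average {avg:.3f}, median {med:.3f}")
```

With these inputs the averages and medians agree with the figures in the text to two decimals; a negative gap, as for composite M between 9I and 9II, simply means that this composite places the groups in the opposite order.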
Relating the difficulty of the median task for each group to the difficulty of the median task for a group half-way between 9I and 9II, we have:

                                              Computed      Computed
                                              by average    by medians
The median for 5½  − Median 9I + 9II           −1.74         −1.62½
The median for 9I  − Median 9I + 9II           + .13         + .22½
The median for 9II − Median 9I + 9II           − .13         − .22½
The median for 13  − Median 9I + 9II           +3.80         +3.80
The median for 17  − Median 9I + 9II           +4.80         +4.72

The reasonableness of these estimates may be checked by the facts for the difference between the median scores in Grade 5½ and Grade 9 and Grade 13 in intelligence examinations in general, expressed in terms of the variability of Grade 9, or in some other unit of measure.

We have collected the available facts concerning the median scores of Grade 6, Grade 9, Grade 12, and the first year of college, in Army Alpha, Army Examination a, National A, Otis Advanced, Haggerty, I. E. R. Sel. Rel. Gen. Org., Terman Group, the Brown University Examination, the Myers Mental Measure, the Pintner Non-Language Test, the Trabue Completion, the Illinois Examination, and the Pressey Cross-Out Test. They are reported in Table 114 (on pages 307 to 310, inclusive). For all save the last three, we have computed what the differences between the medians in question are by a scale of equal units. The results, both by the original scale and by the scale with equal units, are shown in Table 117.

TABLE 116. THE INTELLECTUAL DIFFICULTY OF COMPOSITE TASKS I TO Q IN TERMS OF σ_i9.

                            Difficulty
Composite    By 5½     By 9I     By 9II    By 13     By 17
I            −1.52     −3.37
J            + .62     −1.43
K            +1.36     − .35     + .10
L                      + .49     +1.10
M                      +1.93     +1.65
N                                +2.75     −1.05     −2.07
O                                          + .05     − .75
P                                          + .65     − .16½
Q                                          +2.17     + .82½

The variabilities used in computing Table 117 are, of course, the variabilities of the respective groups in the ability measured by the particular instrument used, such as Army Alpha or National A.
(01S Oatpna OT Onational 3 Mg Re M6 - Ma iphag ree Ma tpha6 O9 OAlpha9 Mnat.9 a Myat.6 . ° ee OF the like; and will be smaller than Nat.9 mM y aes m y m ae m _ 1 CAVD9 CAVD6 or peek eel Cx SINCE Oaipna OF Onat.9 will be Ocavp Oj larger than oi.)A°SCALE FOR MEASURING ALTITUDE OF INTELLECT 317 Oaipna Should be treated just as we treated o,,’s. We have to estimate oi from Gaipnag OF Gnat.9 OF Ootiso. This has to be done rather crudely since neither the self-correlations of most of these tests, nor their correlations with any such criterion as the score of one of our long CAVD series, have been worked out. TABLE 117. DIFFERENCE BETWEEN GRADES IN Scores ATTAINED IN VARIOUS INTELLIGENCE EXAMINATIONS. The self-correlation of the I. EK. R. for Using the original scores Using scores in Equal Units Mm, — Mz Mi3—-My. My—-M, Miz — My M3 ~My Wh; — My2 Wisse Et: Ros Dic; & one Army Alpha 175 1.28 2.50 Army a 1.42 1.38% National 1.76 Otis Adv. 1.49 1.05 Haggerty 1.09 1,21 T. E.R. 2.15 1.20 Terman 1.26 Brown .87 Myers 81 Pintner 1.14 Thorndike 831% Trabue 1.40 Illinois 1.40 Pressey 2.13 Median 1.40 1.451% 1.21 1.94 Average 1.64 1.45 1.20 peodian + i SNGEagS 1.52 83% 1.45 1.21 1.94 .87 “= two different forms of the examination taken a year apart is .82 for 1,000 boys of Grades 9, 10, and 11, and is .86 for 489 sixteen-year-old boys in these grades [Bailor, 724, p. 8]. We have computed the self-correlation of the Terman Group Test for 209 cases of high school pupils in Grades 9, 10, and 11, finding it to be .92. The correlation of the Hag-318 THE MEASUREMENT OF INTELLIGENCE gerty test against a combined score in Army Alpha, Thurs- tone, Otis, Pressey, and other tests is .89 for a group of 60 college seniors. This would make the self-correlation about 80. The self-correlation in ‘‘an entire school,’’ the two trials being on the same day, is .90 [Haggerty, ’23, p. 54]. From the data given in the Memoirs [’21, pp. 
315-17], we estimate the self-correlation of Army a as about .80 for a grade population. The Otis Self-Administering correlates .88 with the Terman Group Test in a group covering several grades [Clark, ’25].

Allowing for the restriction of the range in our groups as compared with those reported above, we may expect the self-correlations of these various examinations within one grade to vary around a central tendency of about .80 for r_t1t2. Dividing by √.80, we have, for the data from equal-unit scores:

m_9 − m_6 = 1.62σ_i9.
m_12 − m_9 = 1.35σ_i9.
m_13 − m_9 = 2.17σ_i9.
m_13 − m_12 = .973σ_i9.

The same divisor with the data from original scores gives:

m_9 − m_6 = 1.70σ_i9.
m_13 − m_12 = .934σ_i9.

Allowing a weight of 4 to the determinations from scores in equal units and a weight of 1 to the determinations from the original scores, we have:

m_9 − m_6 = 1.64σ_i9.
m_12 − m_9 = 1.35σ_i9.
m_13 − m_9 = 2.17σ_i9.
m_13 − m_12 = .96½σ_i9.

We have two independent estimates of m_13 − m_9: 2.17σ_i9 by the direct comparison and 2.31½σ_i9 by the comparison via Grade 12. Allowing equal weight to each gives 2.24σ_i9 as the combined estimate.

The 1.64σ_i9 for m_9 − m_6 agrees very well with the observed results of the Av., −1.74σ_i9, and the Median, −1.62σ_i9, for m_5½ − m_9; and we may reasonably accept −1.74σ_i9 or −1.62σ_i9 or the average −1.68σ_i9. We shall take the last, and use −1.7σ_i9 as the m_5½ − m_9 difference.

The observed comparison of 9I and 9II may be taken as it stands, there being no relevance of the general facts to it. So 9I is .13σ_i9 or .22½σ_i9 above m_9 and 9II is .13σ_i9 or .22½σ_i9 below it. We use +.2σ_i9 and −.2σ_i9.

The 2.24σ_i9 is much below the observed result of 3.60σ_i9 for our Group 13 − Group 9; and, since this 3.60σ_i9 depends upon the single determination by Composite N, it is wise to consider possible amendments of it in view of the general facts. The following additional facts will help in the decision.
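The steps in this estimate (dividing by √.80, weighting the equal-unit and original-score results 4 to 1, and averaging the direct and indirect routes from Grade 9 to Grade 13) may be retraced with the short sketch below. It assumes that the grade differences in test-score units are the summary figures of Table 117 as nearly as they can be read (1.45, 1.21, 1.94, and .87 for equal units; 1.52 and .83½ for original scores). Those inputs, and every name in the code, are assumptions made for illustration.

```python
import math

R_TT = 0.80  # typical within-grade self-correlation assumed in the text

# Differences between grade medians in units of the test's own Grade 9 sigma.
# Assumed readings of the summary row of Table 117; labels are illustrative.
equal_units = {"9-6": 1.45, "12-9": 1.21, "13-9": 1.94, "13-12": 0.87}
original_units = {"9-6": 1.52, "13-12": 0.835}


def to_sigma_i(diff_in_sigma_t, r_tt=R_TT):
    """Re-express a difference given in sigma_t units in sigma_i units.

    sigma_i is taken as sigma_t * sqrt(r_tt), so the same difference is a
    larger multiple of the smaller unit.
    """
    return diff_in_sigma_t / math.sqrt(r_tt)


def combine(key, w_equal=4, w_original=1):
    """Weighted mean of the equal-unit and original-score determinations."""
    eq = to_sigma_i(equal_units[key])
    if key not in original_units:
        return eq
    orig = to_sigma_i(original_units[key])
    return (w_equal * eq + w_original * orig) / (w_equal + w_original)


estimates = {key: combine(key) for key in equal_units}
via_grade_12 = estimates["12-9"] + estimates["13-12"]
m13_minus_m9 = (estimates["13-9"] + via_grade_12) / 2

for key, value in estimates.items():
    print(f"m_{key}: {value:.3f} sigma_i9")
print(f"m_13 - m_9 via Grade 12: {via_grade_12:.3f}; combined: {m13_minus_m9:.3f}")
```

Under these assumed inputs the sketch gives about 1.64 for m_9 − m_6 and about 2.24 for the combined m_13 − m_9, in agreement with the values stated.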
The individuals of Groups 13 and 17 were tested with half of the Composite M and with D4½, which is only a little harder than D4. We can infer approximately what the percent of successes with Composite M would have been, if it had all been given, by allowance for the missing C (Completion M) and for the replacement of D4 by D4½.

In the case of the 189 individuals of Group 13 there were four who might perhaps have failed to have 20 or more right out of 40 in Composite M if they had been tested with it. By our estimates two probably would have so failed. This gives 1.06% or 2.3σ_t13 below the median difficulty for Group 13. This, in terms of σ_i13, would be 2.75; in terms of σ_i9 it would be 2.70. This would make the 9 median 4.5σ_i9 below the 13 median.

Among the 240 of Group 17 there was no individual who would not have had 20 or more right if he had been tested with all the 40 tasks. There were some who probably would have had only 22, 23, 24, or 25 right. By our estimates 2 would have scored 22, 3 would have scored 24, and others 23 and 25. The level of Composite M is then probably more than 3σ_t17 below the median of Group 17, but not much more than that. A reasonable placement would be −3.2σ_t17. This in terms of σ_i17 would be 3.64; in terms of σ_i9 it would be 3.53. This would make the 9 median 5.32σ_i9 below the 17 median or about 4.3σ_i9 below the 13 median.

In view of these additional facts it seems best to consider that our Group 13 differs more from our Group 9 than the college freshmen classes of our general survey differed from the ninth grades of that survey, and that the 3.60σ_i9 is approximately correct. This means that we are treating Groups 13 and 9 as if only about one in ten of the latter were equal or superior to the lowest tenth of the former in altitude of Intellect CAVD; and this would not, in our opinion, seem too small an overlapping to anyone who knew the two groups.

The difference (1.00σ_i9 av.
or .92σ_i9 median) between Group 13 and Group 17 is determined from four different composites and with a mean square error of only .11. There is no reason to alter this in one direction rather than in another.

So we put all the measures of difficulty of Table 116 into differences from the difficulty of the task at which 50 percent of our Group 9 would succeed by the following:

−1.7σ_i9 for Group 5½
+ .2σ_i9 for Group 9I
− .2σ_i9 for Group 9II
+3.6σ_i9 for Group 13
+4.6σ_i9 for Group 17

The results are shown in Table 118.

The average values, allowing equal weight to each determination, are:

I = −3.2σ_i9        M = +1.8σ_i9
J = −1.2σ_i9        N = +2.6σ_i9
K = − .2σ_i9        O = +3.8σ_i9
L = + .8σ_i9        P = +4.4σ_i9
                    Q = +5.6σ_i9

The differences, all in terms of σ_i9, are:

J−I = 2.0 (see footnote 4)      N−M = .8
K−J = 1.0                       O−N = 1.2
L−K = 1.0                       P−O = .6
M−L = 1.0                       Q−P = 1.2

The measurement of the unreliabilities of these determinations is beyond our facilities both of time and skill. They are doubtless large, perhaps as large as .15. They are, however, not as large by far (relative to the differences to be measured) as are those of the best forms of the Binet.

TABLE 118. THE INTELLECTUAL DIFFICULTY OF TASKS I TO Q EXPRESSED IN EACH CASE AS A DIFFERENCE FROM THE MEDIAN DIFFICULTY FOR GROUP 9, IN UNITS OF σ_i9.
                            Difficulty
Task         By 5½     By 9I     By 9II    By 13     By 17
I            −3.22     −3.17
J            −1.08     −1.23
K            − .34     − .15     − .10
L                      + .69     + .90
M                      +2.13     +1.45
N                                +2.55     +2.55     +2.53
O                                          +3.65     +3.85
P                                          +4.25     +4.43½
Q                                          +5.77     +5.42½

THE DIFFICULTY OF COMPOSITES A, B, C, D, E, F, G, AND H (see footnote 5)

As was stated at the beginning of the chapter, the measurements of these lower levels of difficulty are less secure than those of tasks I to Q, since investigations of the form of distribution of the various groups used and of their differences in central tendency and variability comparable to the investigations in the case of Grades 6 to 13 have not been made. The results of such investigations as we have made are reported in Appendix VI.

4 These estimates will be amended by the results from other large groups to become 1.9 for J−I and 1.1 for K−J.
5 Composite H contained only 30 single tasks, having no sentence completions.

The basal facts for measuring the differences in difficulty between A and B, B and C, C and D, and so on, are the results of experiments with 180 adult imbeciles of Stanford Mental Age from about 2½ years to 5 years, 100 adults of mental age 6 (a few over 84 months), 50 feeble-minded comprising all the children graded as Class III in one institution for the feeble-minded (footnote 6), 101 pupils in ungraded classes in a large city (footnote 7), 163 pupils in Grade 4 (second half), 311 pupils in Grade 5, and 44 adults, recruits in the United States Army. These groups will be referred to in order as: im. 3, im. 6, f., sp., 4, 5, and ad. (The use of im. and f. involves no theory of classification, but is solely for convenience.)

In groups im. 3, im. 6, f., and sp., the tasks were given orally. In groups f. and sp. (and in some cases in group im. 6), the individual tested was allowed to look at the booklet as the questions were asked, and read it if he could. In groups 4 and 5, the tasks were all presented in print.
The comparative difficulty for any given group of oral and printed presentation has not been determined. In the computations of differences between groups in variability and central tendency which follow, the assumption is made that the pupils in Grades 4B and 5 would do better, but vary about as much, if they were tested in the manner used with the lower-level groups, as they did when tested with the printed booklets. The amount of allowance made will be described when the differences of groups below group 4 from groups 4 and above are computed.

The percent succeeding for each of the 40-composite tasks is reported for such of the groups as were measured by that task, in Table 119. Table 119 thus corresponds to Table 106.

6 This Class III corresponds roughly to grade 3 of an ordinary school. The chronological ages ranged from 9 to 21, only 6 being below 12 and only 2 over 18.
7 The distribution of ages reported was: 15 from 13-0 to 13-11, 37 from 14-0 to 14-11, 39 from 15-0 to 15-11, and 10 from 16-0 to 16-11.

Using for the respective groups the forms of distribution derived and described in Appendix VI, the difficulty of each 40-composite is found in terms of its difference from the difficulty of that 40-composite which exactly half of the group in question would have succeeded with, in terms of the mean square deviation of the group in question in the ability measured by that 40-composite. These measures appear in Table 120, which corresponds to Table 107.

TABLE 119. PERCENTS SUCCEEDING WITH VARIOUS COMPOSITES IN GROUPS IM3, IM6, F, SP, 4, 5, AND AD.
        Groups in Institutions             Special    Regular School        Adult
        for the Feeble-Minded              Classes    Classes               Recruits
        im3        im6        f            sp         4B         5          ad
        MA 2½      MA 6       MA 7−
        to 5       to 7       to 10+
        n = 180    n = 100    n = 50       n = 101    n = 163    n = 311    n = 44
A       88.3
B       48.3
C       12.8       98.0
D       00.6       73.0
E                  45.0       96.0         98.0
F                  14.0       94.0         96.0       100.0      100.0
G                  03.0       66.0         88.1       98.8       100.0
H                             68.0         67.3       91.4       97.7       97.7
I                             06.0         34.7       35.6       63.3       70.5
J                                                     03.1       13.2       56.8
K                                                     00.0       00.3       47.7

ESTIMATING σ_i FROM σ_t

By means of determinations of r_t1t2 for the various 40-composites in the various groups, the measures in units of σ_A,im3, σ_C,im6, and the like, are transmuted into units of σ_i,im3, σ_i,im6, and the like. The essential facts of these determinations are shown below. The results appear in Table 122, which corresponds to Table 113.

In general, we have measured r_t1t2 both by the Spearman formula using two twenties, and by the correlations of neighboring forties. To economize time, only one method is used in the case of group im. 3 and group im. 6 and group ad.; and only 98 of the 180 individuals are used in group im. 3.

The self-correlation of one random half of a 40-composite with the other half for 98 of the imbeciles of mental age 2½ to 5 years was found to be .864 for A, .773 for B, .86 for C, and .76 for D. The self-correlation of one 40-composite with another at the same level may then be estimated (by r_40,40 = 2r_20,20 / (1 + r_20,20)) as .927 for A, .874 for B, .924 for C, and .864 for D.

TABLE 120. THE DIFFICULTY OF COMPOSITES A TO K, IN VARIOUS GROUPS EXPRESSED AS A DEVIATION FROM THE DIFFICULTY FOR THE MEDIAN OF THAT GROUP, IN TERMS OF THE σ OF THAT GROUP IN THE ABILITY MEASURED BY SUCCESS WITH THE COMPOSITE IN QUESTION.

Group   im3        im6        f            sp         4          5          ad
n       180        100        50           101        163        311        44
A       −1.68
B       + .05
C       +1.13      −1.90
D       +1.83      − .45
E                  + .29      −1.33        −2.61
F                  +1.25      −1.25        −2.31      <−3.10
G                  +2.08      − .33        −1.54      −2.26      <−3.10
H                             − .41        − .44      −1.37      −2.00      −2.00
I                             +1.17        + .36      + .37      − .34      −1.00
J                                                     +1.87      +1.12      − .30
K                                                     >+3.10     +2.75      + .06

Dividing the entries under im 3 in Table 120 by √.927, √.874, √.924, and √.864, respectively, we obtain values in terms of σ_i,im3 from the values for σ_A,im3, σ_B,im3, etc. They are: −1.74, +.05, +1.18, and +1.97, as shown in Table 122.

The intercorrelations of the 40-composites C, D, E, F, and G in the case of the 100 adults of mental age 6 were as shown in Table 121. The correlations with neighboring composites were .685 for C, .703 for D, .725 for E, .769 for F, and .809 for G. We add .03 to obtain estimated r_t1t2's. Dividing the entries in the im 6 column of Table 120 by √.715, √.733, √.755, √.799, and √.839, respectively, we obtain values in terms of σ_i,im6 from the values for σ_C,im6, σ_D,im6, σ_E,im6, etc. They are: −2.25, −.53, +.33, +1.40, and +2.27, as shown in Table 122.

TABLE 121. RAW INTERCORRELATIONS OF COMPOSITES C, D, E, F, AND G IN THE CASE OF 100 INDIVIDUALS CHRONOLOGICALLY SIXTEEN OR OVER, AND MENTALLY SIX.

        D          E          F          G
C       .685       .685       .?88       .426
D                  .721       .638       .426
E                             .729       .?09
F                                        .809

The self-correlation of one random half of a 40-composite with the other half for group f (the 50 feeble-minded in Class III) was found to be .638 for E, .809 for F, .638 for G, .876 for H, and .588 for I. The self-correlation of one 40-composite with another at the same level of difficulty is thus (by 2r_20,20 / (1 + r_20,20)) .779 for E, .894 for F, .779 for G, .934 for H, and .741 for I.

The intercorrelations of the 40-composites E, F, G, H, and I for group f were: E with F = .59, F with G = .81, G with H = .88, and H with I = .81. The correlations with neighboring composites are thus .59 for E, .70 for F, .84½ for G, .84½ for H, and .81 for I. Adding .03 as an allowance for remoteness gives .62, .73, .87½, .87½, and .84. Allowing equal weight to these two determinations, the values of r_t1t2 are, respectively, .70, .81, .83, .90, and .79.
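The neighboring-composite method used for group im 6 can be put in a few lines: each composite's self-correlation is estimated as the mean of its correlations with its adjacent composites, raised by .03 for remoteness, and the Table 120 entry is then divided by the square root of that estimate. The sketch below assumes the Table 121 neighbor correlations and the im 6 entries of Table 120 as quoted above (the entry for E being read as +.29). The variable and function names are illustrative only.

```python
import math

# Correlations between adjacent 40-composites for group im 6 (Table 121).
neighbour_r = {
    ("C", "D"): 0.685,
    ("D", "E"): 0.721,
    ("E", "F"): 0.729,
    ("F", "G"): 0.809,
}
# Table 120 entries for group im 6, each in units of that composite's own
# sigma (E read as +.29, an assumption about a damaged figure).
table_120_im6 = {"C": -1.90, "D": -0.45, "E": 0.29, "F": 1.25, "G": 2.08}
REMOTENESS = 0.03  # allowance added because neighbours differ in difficulty


def estimated_self_correlation(composite):
    """Mean correlation with the adjacent composite(s), plus the allowance."""
    rs = [r for pair, r in neighbour_r.items() if composite in pair]
    return sum(rs) / len(rs) + REMOTENESS


in_sigma_i = {
    c: d / math.sqrt(estimated_self_correlation(c))
    for c, d in table_120_im6.items()
}

for c in table_120_im6:
    print(f"{c}: r = {estimated_self_correlation(c):.3f}, "
          f"difficulty = {in_sigma_i[c]:+.2f} sigma_i")
```

The end composites C and G have only one neighbor each, which is why their estimates rest on a single correlation.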
Dividing the entries in column f in Table 120 by √.70, √.81, √.83, √.90, and √.79, respectively, we obtain values in terms of σ_i,f from the values of σ_E,f, σ_F,f, σ_G,f, σ_H,f, and σ_I,f. They are −1.59, −1.39, −.36, −.43, and +1.32, as shown in Table 122.

In group sp (the 101 pupils in special classes) the intercorrelations of neighboring composites were: F with G = .62, G with H = .77½, H with I = .86. Adding .03 allowance for remoteness, r_t1t2 is .65 for F, .73 for G, .82 for H, and .89 for I.

The self-correlation of one random half of a 40-composite with the other half in group sp is .73 for F, .54 for G, .64 for H, and .82 for I. The correlation of a 40-composite with another of equal difficulty, that is, r_t1t2, may by these facts be estimated (by 2r_20,20 / (1 + r_20,20)) as .844 for F, .701 for G, .780 for H, and .901 for I.

Giving equal weight to these two determinations, we have, as values of r_t1t2, .75, .71½, .80, and .89½ for F, G, H, and I in group sp. Dividing the entries in the sp column of Table 120 by √.75, √.715, √.80, and √.895, respectively, we have values in terms of σ_i,sp from the values of σ_F,sp, σ_G,sp, etc. They are −2.67, −1.82, −.49, and +.38, as entered in Table 122.

In group 4 (the 163 cases of Grade 4B) the intercorrelations were: G with H = .83½; H with I = .86; I with J = .63; J with K = .47. Adding .03 as allowance for remoteness, r_t1t2 is .86½ for G, .88 for H, .77½ for I, and .58 for J.

In group 4 the self-correlations of one half with the other half of each 40-composite were .69 for G, .79 for H, .83 for I, and .65 for J. The correlation of a 40-composite with another of equal difficulty is thus .817 for G, .883 for H, .907 for I, and .788 for J.

Giving equal weight to the two determinations of r_t1t2, we have .84, .88, .84, and .684 for G, H, I, and J, respectively.
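The full chain of computation for group sp (stepping up the half-with-half correlation by the Spearman formula, averaging it with the neighbor-based estimate, and dividing the Table 120 entry by the square root of the result) is sketched below under the figures quoted above. The neighbor-based values are taken as printed rather than recomputed, and all names in the code are illustrative.

```python
import math


def spearman_brown(r_half):
    """Self-correlation of a whole 40-composite estimated from the
    correlation between its two random halves of 20 tasks each."""
    return 2 * r_half / (1 + r_half)


# Group sp, composites F to I, as quoted in the text.
half_r = {"F": 0.73, "G": 0.54, "H": 0.64, "I": 0.82}
neighbour_based = {"F": 0.65, "G": 0.73, "H": 0.82, "I": 0.89}  # as printed
table_120_sp = {"F": -2.31, "G": -1.54, "H": -0.44, "I": 0.36}

# Equal weight to the two determinations of r_t1t2.
r_tt = {c: (spearman_brown(half_r[c]) + neighbour_based[c]) / 2 for c in half_r}

# A deviation in sigma_t units is divided by sqrt(r_t1t2) to give sigma_i units.
in_sigma_i = {c: table_120_sp[c] / math.sqrt(r_tt[c]) for c in table_120_sp}

for c in half_r:
    print(f"{c}: stepped-up r = {spearman_brown(half_r[c]):.3f}, "
          f"combined r = {r_tt[c]:.3f}, difficulty = {in_sigma_i[c]:+.2f} sigma_i")
```

The same three steps apply to groups f, 4, and 5, with only the input correlations changed.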
Dividing the entries in the 4 column of Table 120 by √.84, √.88, √.84, and √.684, respectively, we have values in terms of σ_i4 from the values of σ_G4, σ_H4, σ_I4, etc. They are −2.47, −1.46, +.40, and +2.26, as entered in Table 122.

The intercorrelations in the case of the 311 pupils of Grade 5 were: H with I = .77, I with J = .85, J with K = .61. The correlations with neighboring composites, elevated .03 to allow for remoteness, are thus: .80 for H, .84 for I, .76 for J, and .63 for K.

The self-correlations in this group, using 20 elements with 20, are: .68, .77, .70, and .51 for H, I, J, and K in order. The correlation of one 40-composite with another of equal difficulty would then be .81 for H, .87 for I, .82½ for J, and .67½ for K.

Allowing equal weight to the two determinations of r_t1t2, we have .80½ for H, .85½ for I, .79 for J, and .65 for K. Dividing the entries in Column 5 of Table 120 by √.805, √.855, √.79, and √.65, respectively, we have values in terms of σ_i5 from the values for σ_H5, σ_I5, σ_J5, σ_K5. They are −2.23, −.37, +1.26, and +3.41, as shown in Table 122.

TABLE 122. THE DIFFICULTY OF COMPOSITES A TO K IN TERMS OF σ_i,im3, σ_i,im6, σ_i,f, ETC.

        im3        im6        f            sp         4          5          ad
A       −1.74
B       + .05
C       +1.18      −2.25
D       +1.97      − .53
E                  + .33      −1.59
F                  +1.40      −1.39        −2.67
G                  +2.27      − .36        −1.82      −2.47
H                             − .43        − .49      −1.46      −2.23      −2.34
I                             +1.32        + .38      + .40      − .37      −1.09
J                                                     +2.26      +1.26      − .35
K                                                                +3.41      + .06

In the case of the 44 adults, the intercorrelations of the 40-composites were: G with H = .75, H with I = .65, I with J = .96, and J with K = .91. Allowing +.03 for remoteness, we have .78, .73, .83½, .95½, and .94 as the probable correlation of G, H, I, J, K, each with another 40-composite of equal difficulty.
Dividing the entries in Column ad of Table 120 by √.73, √.835, √.955, and √.94, respectively, we have values in terms of σ_i,ad from the values of σ_H,ad, σ_I,ad, σ_J,ad, and σ_K,ad. They are −2.34, −1.09, −.35, and +.06, as shown in Table 122.

EXPRESSING THE σ_i OF EACH GROUP IN TERMS OF σ_i9

In accordance with the earlier findings, σ_i9I and σ_i9II are treated as equal.

The σ_i of group 5 (311 pupils in Grade 5) is made comparable with σ_i9 by finding the difference in difficulty between two tasks in terms of σ_i5 and in terms of σ_i9I, which is equal to σ_i9. Thus

K−J = 2.15σ_i5 and 1.09σ_i9I, whereby σ_i5 = .51σ_i9I.
J−I = 1.63σ_i5 and 1.94σ_i9I, whereby σ_i5 = 1.19σ_i9.

It is also possible to proceed indirectly by way of σ_i5½, which was found to equal .98σ_i9. Thus

K−J = 2.15σ_i5 and .72σ_i5½, or .705σ_i9, whereby σ_i5 = .33σ_i9.
J−I = 1.63σ_i5 and 2.10σ_i5½, or 2.06σ_i9, whereby σ_i5 = 1.26σ_i9.

It is also true in general that the variability of Grade 5 in intellect will not be much different from that of Grade 5½. If an estimate had to be made from general considerations, σ_i5 would be expected to be at least .95σ_i9. We assign equal weight to .85 (the median of the .51, 1.19, .33, and 1.26) and to .95; and use .90σ_i9 as the value of σ_i5. The −2.23, −.37, +1.26, and +3.41 of Table 122 in terms of σ_i5 thus become the −2.01, −.33, +1.13, and +3.07 of Table 123 in terms of σ_i9.

Next, the σ_i4 is put in terms of σ_i9 both directly and via σ_i5.

J−I = 1.86σ_i4 and 1.94σ_i9, whereby σ_i4 = 1.04σ_i9.
J−I = 1.86σ_i4 and 1.63σ_i5, whereby σ_i4 = .88σ_i5 or .79σ_i9.
I−H = 1.86σ_i4 and 1.86σ_i5, whereby σ_i4 = 1.00σ_i5 or .90σ_i9.

From these facts, σ_i4 is taken to be approximately equal to .94σ_i5 or .85σ_i9. The −2.47, −1.46, +.40, and +2.26 of Table 122 in terms of σ_i4 thus become −2.10, −1.24, +.34, and +1.92 of Table 123 in terms of σ_i9.

Next the σ_i,sp (the 101 special class pupils) is put in terms of σ_i9 via σ_i5 and via σ_i4.
H-G = 1.336;, and 1.016,4, whereby Oisp== -(60i4 OF .65D0jo. I-H= _ .87o;., and 1.866,4, whereby 6.) = 2.146;,4 or 1.9360. Nothing is known precisely of the general tendency of pupils over 13 in such special-class populations to vary, though the expectation would be that the variation would be fairly wide, from pupils who really belonged in an insti- tution for the feeble-minded to pupils who really belonged in a regular Grade 4. Giving equal weight to the three de- terminations, 6;.»> 1.47019. Giving equal weight to the I-H and the H-G pairs, 6;,, = 1.266,5. We use the latter. In a similar manner 6;;, 6; img, ANd 6; ime ATE put in terms of Gig. The essential facts are: I-H = 1.75o;, or 1.866;, or 1.866,, or 87 6isp) Whereby Gir = 1.060;,; Or 9564, OF Oi: = 1.066,4 or 90649, OF Oic— -D0Gjsp OF -6304p. H-F = .96o6;, or 2.186,,,, whereby o;,—= 2.2¢0isp OF 2.8660. We take the median of these four observations, .9246;5. G—F = .876; ime and 1.036,, and 890isp) Whereby Oi ime — 1.186;; or 1.096i5 or Oi; ime — Ors or e22Gio F-Ei = 1.076, ime and .206,,, whereby o; ime = -1870;; or 17645. Since group im6 contains only individuals of Stanford Mental Age 6,° it may be assumed to be much less variable than group f or group im3, or any other group used here. The average of the three determinations (1.096;9, 1.226;5, and .170;9) which is .836,9, is used, giving in terms of ojo — 1.87, — .44, + .27, +1.16, and + 1.88 as the entries in Table 123. 8 Plus two individuals of mental age 7.330 THE MEASUREMENT OF INTELLIGENCE DEO 1961 as Ol, 1.020; ine) whereby, 0; im3 oe 2.1860; im6 or 1.81lo6io- Using 6; ims = 1.8lois, the entries for Table 123 are — 3.15, + .09, + 2.14, and + 3.97. The facts for the adult group are: K-J = .41oy2q or 1.096;9 whereby Giaa = 2-67 io. J-Il = .74oj2a oF 1.94019 whereby oi2a = 2-626i0- I-H = 1.256;24 OF 1.860;; or 1.860;4 OF .870;

[Table: rows 5 to 35, eight columns of values; the column headings and most entries are illegible. Legible rows: 27: 24.2, 27.4, 29.5, 30.9, 32.0, 32.5, 33.2, 34.0; 29: 24.6, 27.8, 29.9, 31.3, 32.4, 32.9, 33.6, 34.4.]

CHAPTER XII

THE MEASUREMENT OF WIDTH AND AREA OF INTELLECT

The width or range of intellect at any altitude or level of difficulty is measured by the number of tasks mastered at that altitude. Thus, suppose that Intellect X is measured with ten 40-composite tasks (N1, N2, N3, etc.), each equal to Composite N in difficulty; and has the following score:

Number of single tasks right in N1 = 20.
Number of single tasks right in N2 = 19.
Number of single tasks right in N3 = 21.
Number of single tasks right in N4 = 20.
Number of single tasks right in N5 = 20.
Number of single tasks right in N6 = 18.
Number of single tasks right in N7 = 19.
Number of single tasks right in N8 = 21.
Number of single tasks right in N9 = 22.
Number of single tasks right in N10 = 20.

Success at one of these 40-composite tasks means attaining 20 or more single tasks correct. The width of Intellect X at Altitude N is 7 out of 10 for Tasks N1 to N10. It may also under certain conditions be considered as 200 out of 400 for the single tasks composing N1 to N10, or as a certain number out of 40 for the same single tasks grouped in 10-composites, or as a certain number out of 100 for the same tasks grouped in 4-composites.

WIDTH OF INTELLECT IN THE CASE OF TRULY INTELLECTUAL TASKS

Consider first the first and most correct meaning, that is, the number of composite-tasks correct, here 7 out of 10 for Tasks N1 to N10. If the ten are a representative sampling of tasks of intellectual difficulty N, Intellect X may be expected to have approximately 70 successes out of 100, or 700 out of 1,000, or in general approximately 70 percent of successes with tasks at the intellectual altitude of N. If there are 200 such tasks, his probable width is 140; if there are 60,000 such, his probable width is 42,000. If, when measured in respect of ten 40-composite tasks representative of intellectual difficulty M, his scores are 25, 25, 22, 24, 26, 25, 23, 24, 26, and 21, he may be expected to have 100 percent of successes with tasks of intellectual difficulty M. If there are 150 such tasks, his probable width is 150. If there are 40,000, his probable width is 40,000.

This illustration directs our attention to two meanings of width, namely, width of intellect in the sample examined and width of intellect in the entire series which the sample represents; and also to the fact that the sample examined may have a larger representation of tasks at one altitude than of tasks at another. Suppose, for example, that the sample contains

40 single tasks between difficulty 30.0 and 30.99,
40 single tasks between difficulty 35.0 and 35.99, and
40 single tasks between difficulty 40.0 and 40.99,

and that there really are one million CAVD tasks between 30.0 and 30.99, two million between 35.0 and 35.99, and three million between 40.0 and 40.99. Then the sample has twice as large a representation of level 30.0 to 30.99 as it has of level 35.0 to 35.99, and three times as large a representation of level 30.0 to 30.99 as it has of level 40.0 to 40.99. If an individual can do 5 out of 10 of the sample at level 30.0 to 30.99, he can probably do 500,000 tasks at that level. But if he can do half of the tasks of the sample at level 40.0 to 40.99, he can probably do 1,500,000 tasks at that level.

If each of the tasks, the number of which measures width, is perfectly intellectual, depending for success upon all of intellect and nothing but intellect, the change from one hundred percent of successes to zero percent of successes, as the intellect in question is tested at higher and higher altitudes, will be instantaneous. When a small amount of inadequacy and error is present, as in our 40-composites for Intellect CAVD, the change will still be very sudden. The conditions in representative intellects, each measured by a score or more of tasks like our 40-composites at each altitude in Intellect CAVD, will be roughly as shown in Fig. 55.

Fig. 55. The probable percentages of successes of three intellects, I, II and III, in a series of 360 tasks, 20 of difficulty A, 20 of difficulty B, and so on, each task having $r_{t_1 t_2}$ = approximately .9. The drawings are not from precise computations, being for illustration only, not for mensuration of the effect.

The evidence for this is the correlations between one 40-composite and another at or near the same level, and the infrequency of reversals from failure to success in our series of tasks.
For example, in the 240 individuals of Group 17, of those failing with P (103 in all), only 4 or 3.9 percent succeeded with Q, which is 1.1 harder. In the 246 individuals of Group 9I, of those failing with K (93 in all), only 9 or 9.7 percent succeeded with L, which is 1.0 harder.

The measurement of CAVD width at any altitude, in the rigorous sense of number of intellectual tasks mastered at that altitude, is thus given for most altitudes by the measurement of altitude itself. Nearly up to that altitude the percent is one hundred; above it the percent very soon drops to zero. Within the short distance of uncertainty the widths may be determined by experiment or estimated fairly closely from the altitude.

This will hold true of any sort of intellect defined and treated in the same manner as Intellect CAVD. In proportion as each task depends for success upon all of intellect and nothing but intellect, a smaller and smaller increase in difficulty will cause a shift from success to failure, the altitude where it does so varying with the intellect that is being measured.

WIDTH OF INTELLECT IN THE SENSE OF THE NUMBER OF SINGLE SHORT TASKS MASTERED, ANY ONE OF THESE TASKS BEING ONLY A VERY PARTIAL REPRESENTATION OF INTELLECT

For many purposes it is desirable to know how many single tasks from a set which are nearly or quite alike in difficulty and which are nearly or quite as intellectual as any short single tasks can be, a given intellect can succeed with. If, for example, two intellects A and B have identical CAVD altitudes exactly at Level N, and if A has average scores at Levels K, L, M, N, O, P, and Q of 39, 36, 29, 20, 17, 11, and 6, whereas B has scores of 30, 28, 27, 20, 18, 6, and 0, there is a difference between A and B which may need expression.
Between 40 and 20 right, and between 19 and zero right in the case of such 40-composites as the CAVD series, there are ranges of difference which may be of great importance for theory or for practice or for both.

The measurement is, of course, a simple count of successes in the sample used in the examination, and an estimated count for the entire series which is represented by the sample. If the single tasks in K represent a selection of 40 out of 10,000, while those in L represent a selection of 40 out of 15,000, and those in M represent a selection of 40 out of 25,000, A's scores of 39, 36, and 29 in the examination mean probabilities of success with $\frac{39}{40} \times 10{,}000$, $\frac{36}{40} \times 15{,}000$, and $\frac{29}{40} \times 25{,}000$, or with 9,750, 13,500, and 18,125 single tasks of the sort chosen as components of Composites K, L, and M, respectively.

A series of names is needed to designate different sorts of width, from the width of an intellect in perfectly intellectual tasks, down through its width in various composite tasks less and less representative of all of intellect and nothing but intellect, to its width in such tasks as giving the opposite of one word, or understanding one sentence, or tracing a way through one maze, or repeating one series of five digits backward. We suggest the use of a series of W's, each followed by a notation describing the tasks, and being in each case the percent of successes. Thus, W(10C + 10A + 10V + 10D) would refer to the percent of successes with 40-composite tasks made up equally of C, A, V, and D; W(1C or 1A or 1V or 1D) would refer to the percent of successes with a series of tasks made up of single C's, A's, V's, and D's. W(10M) would refer to the percent of successes with a series of composite tasks each made up of ten mazes.
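The projection from sample scores to the entire series, as worked above for Composites K, L, and M, can be sketched in a few lines. This is only an illustrative sketch: the function name `estimated_successes` and the variable names are hypothetical, and the series sizes are the supposed figures of the illustration, not counts actually made.

```python
from fractions import Fraction


def estimated_successes(score, sample_size, series_size):
    """Project a sample score onto the whole series the sample represents."""
    return Fraction(score, sample_size) * series_size


SAMPLE_SIZE = 40  # single tasks sampled per composite
series_sizes = {"K": 10_000, "L": 15_000, "M": 25_000}  # supposed totals
scores_a = {"K": 39, "L": 36, "M": 29}  # Intellect A's scores

# Estimated number of single tasks mastered in the entire series.
estimates = {
    level: estimated_successes(scores_a[level], SAMPLE_SIZE, size)
    for level, size in series_sizes.items()
}

# The same scores expressed as W, the percent of successes in the sample.
widths_percent = {
    level: float(Fraction(score, SAMPLE_SIZE) * 100)
    for level, score in scores_a.items()
}

for level in series_sizes:
    print(level, int(estimates[level]), widths_percent[level])
```

Exact fractions are used so that the estimates come out as whole numbers of tasks without rounding error.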
The altitude at which W is measured will require very careful description in every case.

AREA OF INTELLECT

Area or volume seems the best term to use to mean the total number of tasks of some specified sort at which an intellect succeeds; and area seems preferable. Area, like width, will have two distinct meanings, namely, the number of successes in the sample set of tasks examined, and the number of estimated successes in the entire inventory of tasks which have been or can be made, and of which the examination-tasks are a representative sample.

Area of intellect, like width, is, in the strictest usage, the number of truly intellectual tasks, each of which measures all of intellect and nothing but intellect. In this sense the area found will be a function of the altitude; Intellect X, of Altitude N, will succeed with all tasks up to that altitude, and with none beyond it.

As in the case of width, it will be desirable to use area of intellect in a loose sense to mean the total number of tasks mastered which are proper components of composites which, as totals, are intellectual, all the way down from composites which are nearly perfectly intellectual to short single tasks like the single C's, A's, V's, and D's. A notation like A(10C + 10A + 10V + 10D), A(1C or 1A or 1V or 1D)N, A(10M), and the like may usefully be adopted to describe the kind of "area" that is being measured.

We shall consider as a typical case the measurement of A(1C or 1A or 1V or 1D). Everything is simple so far as concerns finding this area for the sample examined. But the effort to estimate the area as a fraction of all the different sentence-completions that might be desired, all the different arithmetical problems which could be collected or invented, all the word-knowledge tasks (Shall other than English words be used?)
possible, and all the sentences or paragraphs or books that might be heard or read, and so to estimate effective A(1C or 1A or 1V or 1D) brings us up squarely against great difficulties due to lack of knowledge of the relative frequency of different C's, A's, V's, and D's at different levels of difficulty.

If we know the width of an intellect at each level in an adequate sample of tasks, we can measure its total "area," provided we know the number of tasks at each level. Thus, if the C, A, V, and D single tasks of Intellect CAVD at levels zero to forty¹ number, in order, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200, 300, 300, 300, 300, 300, 400, 400, 400, 400, 400, 500, 500, 500, 500, 500, 700, 700, 700, 700, 700, 1000, 1000, 1000, 1000, 1000, 2000, 2000, 2000, 2000, and 2000, and if Intellect X, when measured with a representative sampling of 40 at each level, scores 40 at each level up through level 30, and 38, 32, 24, 20, 10, 4, 0, 0, 0, 0 in order thereafter, we find his A(1C or 1A or 1V or 1D) as 14,200 out of a possible 26,000. If there had been 650 tasks at each level, the same record in the examination would have meant 21,580 out of a possible 26,000.

Such a computation of the area of an intellect would not be a mere theoretical curiosity or statistical tour de force, but would be a systematic and accurate way of measuring something of very great importance. Common-sense thought and action about intellect often deal with something which this concept of area makes definite and objective.
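The area computation just described is a weighted sum: at each level, the fraction of the sample answered correctly is multiplied by the number of tasks supposed to exist at that level. The sketch below is illustrative only. The function name `area` is hypothetical, and it assumes that the forty listed counts correspond one-to-one with forty levels, and that "up through level 30" means the first thirty of them carry a full score of 40.

```python
def area(tasks_per_level, scores, sample_size=40):
    """Sum over levels of (fraction right in the sample) x (tasks at that level)."""
    if len(tasks_per_level) != len(scores):
        raise ValueError("need exactly one score per level")
    return sum(n * s / sample_size for n, s in zip(tasks_per_level, scores))


# Supposed numbers of single tasks at the forty levels, five levels per step.
counts = (
    [100] * 5 + [200] * 5 + [300] * 5 + [400] * 5
    + [500] * 5 + [700] * 5 + [1000] * 5 + [2000] * 5
)

# Assumed reading of the record: 40 right at the first thirty levels,
# then the ten listed scores.
scores = [40] * 30 + [38, 32, 24, 20, 10, 4, 0, 0, 0, 0]

total_tasks = sum(counts)                 # the "possible" total
area_graded = area(counts, scores)        # unequal numbers of tasks per level
area_uniform = area([650] * 40, scores)   # 650 tasks at every level

print(total_tasks, area_graded, area_uniform)
```

Under this reading the uniform case reproduces the 21,580 of the text and the possible total is 26,000. The graded case comes to 14,300 rather than the printed 14,200; the difference may lie in the printed figure or in a different reading of the level boundaries, and the sketch does not settle which.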
Just as terms like acuity, originality, and intellectual genius refer to intellect with especial emphasis on its altitude, so terms like breadth, scope, and intellectual power refer to intellect with especial emphasis on its "area." We should not expect common sense to make clean-cut distinctions or to avoid confusions, for the very good reason that altitude and area are closely correlated, so that for most practical purposes, we can describe a man's intelligence adequately by simply rating him for intelligence as a unit. But the concept of a man's general average probability of correct response to intellectual or semi-intellectual tasks has been real and useful; and it will be more so now that it can be made definite and measurable.

¹ Level 0 includes all C, A, V or D tasks from 0 difficulty up to a difficulty of 1.00, 1 includes all from 1.00 up to 2.00, 2 includes all from 2.00 up to 3.00, and so on.

It is possible to discover approximately the number of single tasks at each level of Intellect CAVD or any other defined intellect, though such estimates are beset by many difficulties. The enumeration of the C or A or V or D tasks harder than the average of those in Composite N and easier than the average of those in Composite O is, indeed, probably comparable in complexity to the enumeration of all the species of animals.

The chief and most obvious difficulty is that of deciding how much one task must differ from another in order that they shall be counted as two rather than one. Consider, for example, these fourteen tasks to be given orally:

1. John is 5 years old now. How old will he be in 3 years?
2. Tom is 5 years old now. How old will he be in 3 years?
3. John is 5 years old now. Tom is 3 years older than John. How old is Tom?
4. John is 5 years old. Will is 3 years older than John. How old is Will?
5. John has 5 cents now. How much will he have if his father gives him 3 cents?
6. John has 5 cents now. How much will he have if his mother gives him 3 cents?
7. How many dollars are five dollars and 3 dollars?
1a. John is 6 years old now. How old will he be in 3 years?
2a. Tom is 6 years old now. How old will he be in 3 years?
3a. John is 6 years old now. Tom is 3 years older than John. How old is Tom?
4a. John is 6 years old. Will is 3 years older than John. How old is Will?
5a. John has 6 cents now. How much will he have if his father gives him 3 cents?
6a. John has 6 cents now. How much will he have if his mother gives him 3 cents?
7a. How many dollars are 6 dollars and 3 dollars?

How many different tasks are there? All competent students of intellect will deny that there are fourteen.

By any reasonable view, we should not count 2 as a different arithmetical task from 1. Whether the problem is put about John or Tom or Will or Mary, does not, we think, make any difference to it as an arithmetical or intellectual task. Our thinking is probably sound, and we shall later state the facts and principle which justify it. But note that if we think in a stiff pseudo-logical way that the name of the boy makes no difference, we shall err. Let Tom, well known to be of age ten, be sitting in full sight and the task is now not quite the same, requiring for success that the intellect shall not be misled by the temptation to think of the present Tom. Or let the problem be stated as "Sneezer Snoop Squibb is 5 years old now. How old will Sneezer Snoop Squibb be in 3 years?" and the task is not quite the same, requiring that the intellect be not distracted by the seductive name into inattention to the numbers.
If a psychologist should list all the arithmetical tasks that ever have been set, and add to them all that a decade of ingenious thought could devise, and then try to cull out the duplicates, he would find some that would be indubitably so, and some that would be as unlike as arithmetical tasks can be; but with many he could only say that the two tasks were somewhat different. So he would have to set up some standard of the amount of difference which would qualify two tasks to be counted as two, or some scheme for fractional counts.

The facts which he should use for these purposes are the facts of the differences of the tasks as tasks for intellects. By this is meant not merely that the facts are facts in the minds or neurones of individuals, but also that they are facts in the action of intellects to which the tasks are presented for solution. Two sentences differing in print only by an apostrophe or comma may differ enormously in the intellectual actions which they evoke in an intellect set upon solving them, and two questions which have not a word in common may arouse very similar behavior, as is the case with "Solve 2x² + x = 21" and "What does y equal if 2y² + y = 21?" for competent students of algebra. And either may arouse very different behavior according as the person reacts to it as a task to be solved or as, say, a mere question to contemplate.

So the investigator seeking to measure the differences amongst tasks (apart from differences in difficulty) must be expert in the psychology of thinking, and must be skilful in examining and cross-examining individuals who do the tasks in question and report what they did. He will often have to make subtle distinctions in cases where two tasks arouse different action in two intellects, and when it is doubtful how much of the difference lies in the tasks and how much in the intellects.

The objective method of correlation will be helpful.
Obviously, if an intellect can do task 1 and cannot do task 2, then the two tasks are different for that intellect; two tasks are not perfectly alike as tasks for intellect, unless every intellect that can do one can do the other. Other things being equal, the more individuals there are within any given group who can do the one task and not the other, the greater the difference between the two (for that group) will be. More generally, the differences with which we are concerned here are measured, other things being equal, by the lack of perfect correlation between the ability to succeed with one and the ability to succeed with the other, in some defined group of intellects, difficulty being kept constant. If two tasks are identical as tasks for intellects, $r_{t_1 t_2}$ will be 1.00. If they are of equal difficulty, the more unlike they are the lower $r_{t_1 t_2}$ will be until it reaches a minimum which represents the amount of likeness which two tasks must have to be included in the series of arithmetical tasks which is to be enumerated.

This argument from correlation will not hold good if a task is a composite where success is defined as obtaining a certain percent of successes with the elementary tasks, or attaining a certain score by some system of credits. Two such composites may show very high correlations in respect of success as just defined, and yet have hardly a single detail of one like any detail of the other. The correlations are between scores, each of which measures chiefly ability in what is summed up in or common to all the single tasks of the composite, not what is characteristic of any one of them as a totality. The composites are closely alike in the sense that what is summed up in or common to all the single tasks of A, is closely like what is summed up in or common to all the single tasks of B.
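The objective method described above can be sketched for two single tasks of equal difficulty. The sketch counts the individuals who can do one task and not the other, and computes the correlation between success on the two. It is an illustration only: the records are invented, the function names are hypothetical, and the phi coefficient (the Pearson correlation of two success-failure records) is used here as one convenient measure, which the text does not itself prescribe.

```python
from math import sqrt


def discordant(task1, task2):
    """Number of individuals who succeed with one task and fail with the other."""
    return sum(1 for a, b in zip(task1, task2) if a != b)


def phi(task1, task2):
    """Pearson correlation between two success (1) / failure (0) records."""
    n11 = sum(1 for a, b in zip(task1, task2) if a and b)
    n10 = sum(1 for a, b in zip(task1, task2) if a and not b)
    n01 = sum(1 for a, b in zip(task1, task2) if not a and b)
    n00 = sum(1 for a, b in zip(task1, task2) if not a and not b)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("correlation undefined when a task has no variation")
    return (n11 * n00 - n10 * n01) / denom


# Invented records for eight individuals; both tasks are passed by four of
# the eight, so difficulty is held constant in this group.
task1 = [1, 1, 1, 1, 0, 0, 0, 0]
task2 = [1, 1, 1, 0, 1, 0, 0, 0]

print(discordant(task1, task2), phi(task1, task2))  # 2 0.5
print(phi(task1, task1))                            # 1.0 for identical tasks
```

As the text cautions, a measure of this kind applies to single tasks; it is not a guide to likeness of detail when the "tasks" are composites scored by percent correct.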
PROPORTIONAL COUNTS

For some purposes, the relative numbers of tasks at the different levels of difficulty will serve in place of the absolute numbers. Thus, if we wish to know what percent of A's area B's area is, we will do as well by knowing that the numbers of tasks are in the proportions K, 3K, 9K, 27K, 81K, as by knowing their absolute amounts.

It may well be that such proportional counts may be made with greater accuracy, as well as with greater ease and speed, than absolute counts. Certain factors of error may act alike at all levels and so do no harm to the proportional counts. Certain arbitrary schemes of fractional allowance for overlapping tasks may also act alike at all levels and so do no harm to proportional counts. Even such proportional counts, however, will require much sagacity and industry to achieve even approximate truth for even a small fraction of intellect. A reasonably satisfactory proportional count of the number of tasks at each level of even so small a representation of intellect as CAVD is, will indeed require an enormous amount of observation and experiment.

New tasks, like new species of animals, are coming into existence while we count them; tasks a and c seem enough different to count as two, and tasks b and d seem enough different to count as two, but when a, b, c, and d are considered together, they do not seem to deserve a credit of four; it seems as if some sorts of tasks at some levels of difficulty were innumerable; when task X is simply a task where both a and b must be performed successfully to bring success in X, shall we count a and b and X as 3 or as 2? These difficulties, together with those which have been mentioned and others which might be, may make perfect or even approximately correct counts impossible.

The best way to find out what is possible is to begin work at actual counts. We have begun, but have not progressed far enough to report results, save one.
This one, which the reader's sagacity may have anticipated, is that the number of different tasks per unit of altitude of intellect is not equal, but increases as we go up from zero altitude.

That this is true for sentence completions can be easily realized if one will try to make as many different C tasks as he can between the average difficulty of those in A and the average difficulty of those in B (23 to 26½), and to do the same for the stretch of difficulty from N to Q (40 to 43). It will be found very hard and perhaps impossible to devise five hundred of the former, whereas there seems almost no limit to the possible number of the latter. Apparently the harder the task, the greater the number possible, though it is not easy to devise extremely hard completions which are linguistic rather than informational in their difficulty.

In the case of the arithmetical tasks the number of different tasks surely increases from the very easy levels up to a certain point, after which there is some doubt. The doubt seems, however, to be due mainly to our averseness to fabricating problems which are so elaborate and intricate as to be extremely unreal, rather than to the paucity of such. In the case of the disarranged-equation task, it is obvious that the number of different ones possible increases rapidly with increased difficulty and has no limit.

In the case of vocabulary, the fact is unquestioned if other languages than English are included, and probably holds true for English alone.

In the case of the understanding of sentences and paragraphs, the increase is obscured by the facts that people usually try to make their statements as easily intelligible as may be, and that the number of persons who are concerned with very subtle and intricate ideas is few. Also, the number of different statements and questions of even a moderate degree of difficulty which can be fabricated is so enormous that comparisons are very difficult.
Also, only persons of very high directions ability can frame statements which are sensible and correct but very hard to understand, and still free from any great informational difficulty. Sentence comprehension cannot, however, well be kept distinct from informational abilities; and if informational difficulties are allowed to enter freely, the number of sentences very hard to understand is practically infinite. Even if one abstains from these rather rigorously, the number of very hard D's that can be made is enormously greater than one would expect from the number found in reading. Merely by combining and permuting causal, conditional, and concessive clauses and pronoun references, one can produce an enormous number of different tasks like, "A change in ab would cause a similar change in og if ek did not produce its usual effect upon il, although ek did act upon um, and ba would cause an increase in ab, provided bi did not occur in unison with bo. What will happen to og if ba and bi and ek happen shortly subsequent to bo, provided the ek-il action is neutralized by bo, and um does not occur?"

We have not even begun a count for the entire series of tasks which might reasonably be made constituents in composites designed to measure intellect in general. Consequently, we are not able to make more than a very rough estimate of how much number increases with altitude, or of the way in which the increase comes. We think the increase for Intellect CAVD is so great as to make the number of different tasks at level 40 to 40.99 at least a hundred times the number at level 20 to 20.99. We also think that it comes smoothly and with acceleration, at least up to a certain level, after the pattern of Fig. 56 or Fig. 57 or Fig. 58. Intellect CAVD can hardly be said to have an appreciable area below level 20, since it probably requires an altitude of 20 to complete ten out of any twenty sentences, no matter how easy, or to solve ten out of any twenty arithmetical tasks.

Figs. 56, 57 and 58. Samples of possible patterns of the increase in the number of different intellectual tasks with increase in intellectual difficulty.

The number of tasks for intellect in general will be found, we think, to increase to a similar degree and in a similar manner, with at least fifty times as many tasks at 40 as at 20, and at least several hundred times as many at 40 as at 10. An intellect of altitude 40 may then have an area, not twice that of an intellect of altitude 20, but ten or twenty or perhaps two hundred times it. The common-sense view that the greatest intellect of a thousand men is many times as great as the worst intellect of the thousand may be entirely correct, if we mean by "great" something corresponding to area.

Moreover, if we think of intellect as a hierarchy of unit connections or bonds between ideas or between the neural correspondents of ideas, the number of different connections required to enable a person to respond correctly to 20 out of 40 of the elements of task N at level 40 may be not twice the number required to enable one to respond correctly to 20 out of 40 of a task 3 below A, but ten or twenty or two hundred times it.

Intellectual altitude, by our definition, shows a small relative rise from the imbecile to the average and then to the gifted adult, by the argument followed in Chapter IX, so small as to arouse astonishment and incredulity concerning the usefulness of the definition and the validity of the argument, at first thought. If, however, the altitudes of the imbecile, average and gifted were in the proportions of 5, 15, and 20, or 1, 11, and 16, instead of about 25, 35, and 40, we might find the relative areas of intellect in the three groups much more preposterous in the reverse way.
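How a modest ratio of altitudes can go with a very large ratio of areas may be seen in a small sketch. Everything in it is an assumption made for illustration: the number of tasks is supposed to grow geometrically from level to level (only one of many patterns consistent with Figs. 56 to 58), the growth is fixed so that level 40 holds one hundred times as many tasks as level 20, and area is taken in the strict sense of all tasks below the altitude reached. The function names are hypothetical.

```python
def task_counts(levels, ratio_over_20_levels=100.0, base=1.0):
    """Assumed geometric growth: the count multiplies by the given ratio every 20 levels."""
    step = ratio_over_20_levels ** (1 / 20)
    return [base * step ** level for level in range(levels)]


def strict_area(counts, altitude):
    """Strict-sense area: every task at levels below the altitude is mastered."""
    return sum(counts[:altitude])


counts = task_counts(41)  # levels 0 through 40

ratio_counts = counts[40] / counts[20]
ratio_areas = strict_area(counts, 40) / strict_area(counts, 20)

print(round(ratio_counts, 6), round(ratio_areas, 6))
```

Under these assumptions an altitude twice as high carries an area roughly one hundred times as great. A different assumed pattern of growth would give a different multiple, so the figure illustrates the order of magnitude discussed in the text and is not an estimate.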
The scale of altitude must not be criticized for the lack of attributes which are appropriate only for a scale of area, unless it can be shown that width is approximately the same at all altitudes. It is not.

CHAPTER XIII

THE RELATIONS OF ALTITUDE TO WIDTH, AREA, AND SPEED

The number of CAVD tasks at any given level of difficulty is unknown. Consequently all the relations with width which are considered in this chapter are relations with percents. No comparison or conclusion will appear which involves the absolute number of tasks in two levels.

THE RELATION BETWEEN ALTITUDE AND W(10C + 10A + 10V + 10D), I.E., NUMBER OF 40-COMPOSITE CAVD TASKS SUCCEEDED WITH AT A GIVEN LEVEL OF DIFFICULTY

N individuals are measured, each with, say, a score of CAVD composite tasks, each composite being of the same difficulty as any other, and each consisting of so many single tasks that the correlation between the number right in any one composite and the number right in any other is perfect. Then any one of the N individuals who succeeds with any one of these composites (in the sense of having 50 percent or more of the single tasks correct) will succeed with any other of them; and the W of any individual will be one hundred percent or zero percent. Suppose that the same N individuals are measured perfectly in respect of altitude of Intellect CAVD. The correlation between altitude CAVD and W(10C + 10A + 10V + 10D) will be perfect, every one of the individuals who succeeds with these composites having a higher altitude than any one of those who fail with them.
If each task at a certain level of difficulty is extensive enough to represent and measure all of CAVD difficulty and no other difficulty (all of CAVD intellect as it operates with tasks at that level of difficulty and nothing but it), then everyone who succeeds with these will have a CAVD altitude as high as, or higher than, the altitude which they represent and no one who fails with them will have a CAVD altitude as high as the altitude which they represent. That is, if each task measures all the CAVD intellect which can operate at that level and nothing but it, the percent of tasks mastered at that level will be zero or one hundred and will correlate perfectly with altitude CAVD.

Stated in another way, any individual who succeeds with any task of difficulty d which measures CAVD perfectly as it operates at that level of difficulty, will succeed with all tasks of less difficulty than d, if these also measure CAVD perfectly as it operates at their respective levels of difficulty; and any individual who fails with any task of difficulty d will fail with all tasks of greater difficulty than d, if these also measure CAVD perfectly as it operates at their respective levels.

These are not axioms necessitated logically by the definition of Intellect CAVD and of difficulty CAVD; but conclusions reached by observations of facts. The facts could be otherwise. Some men might conceivably succeed with tasks like O, P, and Q and fail with tasks like M, N, and O.

We do not give an absolute empirical proof of these conclusions, because we have not any tasks which measured all of the CAVD intellect which operates at any given level of difficulty. All the evidence, however, goes to prove their truth.
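The pattern asserted above, that failure with an easier task goes with failure at every harder one, can be checked on any set of records by counting reversals: individuals who fail the easier task and yet succeed with the harder. The sketch below is illustrative only; the records are invented and the function name `reversal_rate` is hypothetical.

```python
def reversal_rate(records, easier, harder):
    """Among those failing the easier task, count those succeeding with the harder.

    Returns (reversals, number failing the easier task, proportion),
    or None when nobody failed the easier task.
    """
    failed_easier = [r for r in records if not r[easier]]
    if not failed_easier:
        return None
    reversals = sum(1 for r in failed_easier if r[harder])
    return reversals, len(failed_easier), reversals / len(failed_easier)


# Invented success/failure records for six individuals on two composites,
# Q being the harder of the two.
records = [
    {"P": True, "Q": True},
    {"P": True, "Q": False},
    {"P": False, "Q": False},
    {"P": False, "Q": False},
    {"P": False, "Q": True},   # a reversal
    {"P": False, "Q": False},
]

print(reversal_rate(records, "P", "Q"))  # (1, 4, 0.25)
```

With tasks that measured CAVD perfectly the proportion would be zero; with actual 40-composites a small proportion of reversals is to be expected from the inadequacy and error of the tasks.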
Evidence may be found in the correlation between the altitude measure and the score of success or failure in 20-composites (5C + 5A + 5V + 5D) corrected for attenuation, so as to give the correlation between a precise measure of altitude and the number of successes in an examination with a very large number of such 20-composites. For example, the average correlation (bi-serial r) of the measure of altitude with success in a CAVD 20-composite in the case of 98 adult imbeciles was .984 for A, .916 for B, .875 for C, and .757 for D, averaging .883. The self-correlation of the altitude measure is .94, the inter-correlations of the three determinations whose average it is being .92, .77, and .83. The self-correlation (tetrachoric r) of a CAVD 20-composite in this group is .96 for A, .76 for B, .79 for C, and .99 for D, averaging .874. The correlation between a precise measure of altitude and success in 50 percent or more of a number of CAVD 20-composites of equal difficulty may then be expected to be $\frac{.984}{\sqrt{.96 \times .94}}$ for A, $\frac{.916}{\sqrt{.76 \times .94}}$ for B, $\frac{.875}{\sqrt{.79 \times .94}}$ for C, $\frac{.757}{\sqrt{.99 \times .94}}$ for D, or, on the average, $\frac{.883}{\sqrt{.874 \times .94}}$.

Also the correlations between altitude and W(1C or 1A or 1V or 1D) are very near unity, as will be demonstrated in the next section. The correlations between altitude and W(10C + 10A + 10V + 10D) a fortiori will be near unity.

In view of such evidence the conclusions stated in the first two pages of this chapter may be accepted as true. There is no reason to expect that the case will be different with any fairly catholic form of intellect (such as Picture-Completions + Opposites + Geometrical Relations + Reasoning Problems of the type devised by Burt + Information; or Analogies + Number-Completions + Arithmetical Computation + a Common Element test of the type devised by Otis) from what it is with CAVD.
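The correction for attenuation used in the evidence above divides an obtained correlation by the square root of the product of the two self-correlations. The sketch below applies it to the figures quoted for the 98 adult imbeciles. It also shows, as an assumption, that the self-correlation of .94 for the altitude measure is what the Spearman step-up formula gives for the average of three determinations with the quoted intercorrelations; the text at this point does not say how the .94 was obtained. The function names are hypothetical.

```python
from math import sqrt


def step_up(mean_r, k):
    """Spearman step-up: self-correlation of the average of k measures
    whose mean intercorrelation is mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)


def disattenuate(r_xy, r_xx, r_yy):
    """Correction for attenuation: obtained r divided by sqrt of the reliabilities."""
    return r_xy / sqrt(r_xx * r_yy)


# Assumed derivation of the altitude self-correlation from .92, .77, and .83.
altitude_reliability = step_up((0.92 + 0.77 + 0.83) / 3, 3)  # about .94

observed = {"A": 0.984, "B": 0.916, "C": 0.875, "D": 0.757}
composite_reliability = {"A": 0.96, "B": 0.76, "C": 0.79, "D": 0.99}

corrected = {
    name: disattenuate(observed[name], composite_reliability[name], 0.94)
    for name in observed
}
corrected_average = disattenuate(0.883, 0.874, 0.94)

for name, value in corrected.items():
    print(name, round(value, 3))
print("average", round(corrected_average, 3))
```

Corrected values may exceed unity, as they do here for A, B, and C, because the obtained correlations and the self-correlations are each subject to sampling error; such values are read as indicating a correlation near unity, not one above it.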
THE RELATION BETWEEN ALTITUDE AND W(1C OR 1A OR 1V OR 1D), I.E., THE NUMBER OF SINGLE TASKS SUCCEEDED WITH AT A GIVEN LEVEL

This correlation is very close. There are a certain number of individuals who are, relatively to others, much better (or worse) in arithmetical tasks than they are in the linguistic tasks, and whose records prevent perfect correlation. Also, there are probably other minor specializations within Intellect CAVD. But on the whole, individuals would be found to follow rather closely the general pattern of CAVD intellect shown in Fig. 59 if each of them had been tested with several hundred tasks (one-fourth being C; one-fourth, A; one-fourth, V; and one-fourth, D) at any level of difficulty from 0 to 44. In general, that is, if intellect A has a higher altitude than intellect B, intellect A will also show a greater W(1C or 1A or 1V or 1D) than B at all levels between those where both A and B have one hundred percent right and those where both A and B have zero percent right; and the amount of superiority of A to B in W will be closely similar to the amount of superiority in altitude.

To prove this, we have to estimate the relation as it will be found with a very large number of single tasks at the level of difficulty in question, from data where this number is only 40 or less. The evidence is as follows:

In the case of 237 individuals of group 17, the correlations between altitude CAVD and percent succeeding in tasks N, O, P, and Q were as follows (P means the Pearson r; Sh means the Sheppard r):

            P           Sh
N           [illegible] [illegible]
O           .93         .94
P           [illegible] [illegible]
Q           [illegible] .86
Average     .88         .87

The self-correlations for % s in N, O, P, and Q in this group may be taken as approximately .76, using the data given in Appendix V, which show that the correlations of neighboring 40-composites average .73 in this group. .03 is added for the effect of the slight remoteness.
The self-correlation of the measure of altitude in this group is computed as .90 from the intercorrelations of the three independent measures of altitude of which it is the average. They are .80, .76, and .71, averaging .757. By the well-known formula of Spearman,

$$r_{3 \text{ with } 3} \text{ will equal } \frac{3(.757)}{1 + 2(.757)}.$$

By this determination, a precise measure of altitude will correlate with a precise measure of W(C or A or V or D) to the extent of

$$\frac{.875}{\sqrt{.76 \times .90}}, \text{ or } 1.06.$$

As a check on this determination, we have computed the obtained correlation between the measure of altitude and the sum of the numbers of rights in N, O, P, and Q. It is .99. The correlation between a precise measure of altitude and a precise measure of W(C or A or V or D) should be higher than this obtained correlation.

In the case of 189 individuals of group 13, the correlations between altitude CAVD and % s in tasks N, O, P, and Q were as follows:

            P           Sh
N           .875        .84
O           .925        .904
P           .96         .89
Q           [illegible] .83
Average     .874        .866

The average self-correlation for % s in N, O, P, and Q in this group may be taken as .74, from the data given in Appendix IV. The self-correlation of the measure of altitude in this group is found by the Spearman formula to be .89. The intercorrelations of the three independent measures of altitude of which it is the average are .71, .64, and .81. The correlation between altitude and W(C or A or V or D), both being measured accurately, will thus be

$$\frac{.87}{\sqrt{.74 \times .89}}, \text{ or } 1.07.$$

As a check, we have a correlation of .95 between the obtained measure of altitude and the sum of the numbers correct in N, O, P, and Q, and a part of M.

In the case of 246 individuals of group 9I, altitude CAVD correlates with % s in composites I, J, K, L, and M as follows: .58 for I, .82 for J, .92 for K, .823 for L, and .643 for M (all by the Sh formula). The self-correlations of % s in I, J, K, L, and M in this group are respectively .73, .80, .74, .86, and .69. The self-correlation of the measure of altitude in this group is .79, the intercorrelations of the three measures of which it is the average being .56, .58, and .52. It is perhaps unwise to average correlations such as these which show wide and regular differences. So we correct each for attenuation separately and have, as the five resulting determinations of the correlation between altitude and W(C or A or V or D), .76, 1.03, 1.20, 1.00, and .87. The average of these is .97; the median is 1.00. As a check we have the correlation between the altitude measure and the sum of the numbers correct in I, J, K, L, and M. It is [illegible].

In the case of 192 individuals of group 9II, altitude CAVD correlates with % s in composites K, L, M, and N as follows: .73 for K, .90 for L, .91 for M, and .66 for N. The average is .80. The self-correlations of K, L, M, and N are respectively .764, .874, .754, and .75, averaging .80. The self-correlation of the measure of altitude in this group may be taken as .83, the intercorrelations of the three measures of which it is the average being .50, .635, and .73. So a precise measure of altitude will correlate with a precise measure of W(C or A or V or D) to the extent of .99 (.91 by K, 1.07 by L, 1.15 by M, and .84 by N). As a check we have a correlation of .96 between the measure of altitude and the sum of the numbers right in K, L, M, and N.

In the case of 63 university students the correlations between altitude CAVD and % s in tasks N, O, P, and Q were as follows:

            Sh
N           [illegible]
O           .92
P           .90
Q           .70

The intercorrelations of N, O, P, and Q are: N with O, .58; O with P, .70; and P with Q, .73. The self-correlations of N, O, P, and Q may be estimated as .61, .67, .744, and .76 by adding .03 to the correlation between neighboring composites.
The self-correlation of the measure of altitude in this group may be taken as .83, the three measures of which it is the average having intercorrelations of .80, .54, and .00. The correlation between a precise measure of altitude and a precise measure of width is then computed as 1.08 for N, 1.23 for O, 1.144 for P, and .88 for Q, averaging 1.08. As a check on this result, we have the correlation of .98 between the measure of altitude and the sum of the numbers right in N, O, P, and Q.

We have thus five determinations of what the correlation between altitude CAVD and W would be if both were measured precisely, namely,

for group 17            1.06
for group 13            1.07
for group 9I             .97
for group 9II            .99
for Univ. students      1.06

with an average of 1.03 ± a mean square error of .019.

There is an element of insecurity in these corrections for attenuation, especially in so far as the self-correlations for W(C or A or V or D) are estimated by adding .03 to the obtained correlations for neighboring composites. However, the empirical correlations between the obtained altitude measure and the obtained sum of the W's (.995 299; .96 .91, and .98) show that the corrected correlations should be near unity.

The same close correlations obtain in groups at low altitudes. In the case of the 100 individuals of group im6, the correlations of the measure of altitude with % s in C, D, E, F, and G, respectively, were .79, .86, .89, .86, and .54, averaging .79. The self-correlations of the measures of % s were, respectively, .80, .86, .84, .88, and .81, averaging .83. The self-correlation of the measure of altitude in this group is .67 by the Spearman correction, the average intercorrelation of the three determinations of which it is the average being only .407. The correlation between a precise measure of altitude and a precise measure of W(C or A or V or D) is then 1.06 by this determination.
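The two steps of this determination can be traced numerically. The sketch below first steps up the average intercorrelation of the three altitude determinations by the Spearman formula used throughout this section, then applies the correction for attenuation to the group im6 figures quoted above. The function and variable names are illustrative assumptions.

```python
from math import sqrt


def stepped_up_self_correlation(mean_intercorrelation, n_measures=3):
    """Self-correlation of an average of n measures, by the Spearman formula."""
    return (n_measures * mean_intercorrelation) / (
        1 + (n_measures - 1) * mean_intercorrelation
    )


# Group im6: average intercorrelation of the three altitude determinations.
im6_altitude_self_r = stepped_up_self_correlation(0.407)

# Obtained correlation (.79) and self-correlation of % s (.83), as quoted.
im6_corrected_r = 0.79 / sqrt(0.83 * im6_altitude_self_r)

# Group 17, quoted earlier in this section: .757 stepped up.
group17_altitude_self_r = stepped_up_self_correlation(0.757)

print(round(im6_altitude_self_r, 2), round(im6_corrected_r, 2))
print(round(group17_altitude_self_r, 2))
```

The low self-correlation of the altitude measure in group im6 is what lifts an obtained correlation of .79 to a corrected value slightly above unity.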
The correlation of the obtained measure of altitude with the sum of the numbers right in C, D, E, F, and G was .93.

In the case of the 50 f, the correlation between the obtained altitude measure and the sum of the number right in E, F, G, H, and I was .98.

In the case of 162 individuals of group 4, the correlations between altitude CAVD and % s in tasks F, G, H, I, J, and K were .48, .83, .93, .95, .75, and .53, respectively. The intercorrelations of % s in F, % s in G, and so on, are: F with G = .67; G with H = .81; H with I = .854; I with J = .63; J with K = .51. The self-correlations may therefore be taken as .70 for F, .77 for G, .86 for H, .77 for I, .60 for J, and .54 for K. The self-correlation of the measure of altitude is .81, the average intercorrelation of the three measures of which it is the average being .59. The most probable correlation between a precise measure of altitude and a precise measure of width is then .64 for F, 1.05 for G, 1.114 for H, 1.20 for I, 1.08 for J, and .80 for K, with an average of .98. As a check on this determination, we have computed the correlation between the measure of altitude and the sum of the numbers of rights in F, G, H, I, J, and K. It is .96.

A rough calculation of the correlations for the 180 cases of group im3 shows that with them the raw correlations of the altitude measure with W(1C or 1A or 1V or 1D) in composites A, B, C, and D will be around .90 and that the corrected coefficients will be near unity.

The closeness of these correlations indicates that each individual would, if adequately measured by a large number of single tasks at each level of difficulty, show a pattern closely of the type of Fig. 59. Individuals might be of widely different patterns, such as those shown in Fig. 60, Fig. 61, and Fig. 62, so that individuals of the same altitude would differ widely in width at any level. But, in fact, such large divergences in pattern are very scarce in Intellect CAVD.

Fig. 59. The pattern of decrease in percent of single tasks correct with increase in difficulty, which corresponds to close correlations between altitude and W(1C or 1A or 1V or 1D).

Figs. 60, 61 and 62. Patterns of decrease in percent of single tasks correct with increase in difficulty such as individuals would show if the correlations between altitude and W(1C or 1A or 1V or 1D) were much below 1.00.

How small and scarce they will be in other forms of intellect, that is, how close a resemblance between altitude and width will be found for any other form of intellect, will depend upon the constitution of the form in question. In CAPI, with picture completions and information about music and art replacing vocabulary and directions tasks, the correlations will probably be lower. However, so long as the constituents of our composite tasks all concern the ability to deal with ideas and symbols for ideas, the amount of specialization will be small in comparison with the total variation in ability, so that the correlations will be high.

THE RELATION BETWEEN ALTITUDE AND AREA OF INTELLECT

The facts brought forward in the first and second sections of this chapter prove that the A(10C + 10A + 10V + 10D) of any intellect and the altitude of that intellect are determined almost or quite entirely by the same cause or causes. The facts of the third and fourth sections prove that to a very considerable extent this is true for the A(1C or 1A or 1V or 1D) of any intellect and its altitude. A verification of this by the direct measurement of A(1C or 1A or 1V or 1D) is not yet possible because the number of tasks at each level of difficulty is unknown.
Indirectly, it may be partially verified as follows: If n single tasks are taken from each level from zero to forty-five, one-fourth being C, one-fourth A, one-fourth V, and one-fourth D, and individuals are measured in respect of these, n being sufficiently large, the A's so obtained will have the same rank as A's obtained by an examination where the intellects are tested with all tasks at all levels. The area for the selection of n at 0, + n at 1, + n at 2, + n at 3, and so on, may be taken to be approximately the area found by assuming that each intellect will succeed with all or nearly all of the single elements at levels below the highest level where it obtains 100 percent right and will fail with all or nearly all of the single elements at levels above the lowest level where it obtains zero percent right (or only that percent which mere chance guessing could give). By permitting some estimating of scores, this procedure may be carried out.

The results appear in Fig. 63.

Fig. 63. The relation between CAVD altitude and area in a sampling of tasks comprising N tasks for each unit of altitude.

The cases entered in Fig. 63 are all taken at random so far as the relation in question is concerned. Those used were all that had 37 or more right in the easiest altitude with which they were tested, or a random selection from all such. The groups used were im3, f, 4, 9I, 17 and the group of 63 university students. The area number was computed as follows:

I. Assume that, at each unit of altitude up to the easiest altitude at which the person was measured, he had 40 (i.e., all) right.

II. Count the number he had right over the range at which he was measured; and estimate from this how many he would have had right had he been tested with 40 single tasks at each unit of altitude over this range.
III. Estimate the number which he would have had right at all altitudes above the highest at which he was tested, using arbitrarily the number which he had right at the highest altitude at which he was tested.

The area number is the sum of the three numbers obtained by I, II, and III. The area number thus ranges possibly from 957 for an im3 who had 37 right in Composite A and none right in any higher composite, to 1,800 for a person who had 40 right in N and also in O, P, and Q. The lowest actual area number among the cases used was 1,063; the highest was 1,760.

The very close interdependence of area and altitude shown by Fig. 63 would be little if at all reduced if more extensive and precise measures were available.¹

¹ It would be reduced inasmuch as some of the errors now involved act in the same direction on the altitude measurement and on the area measurement. It would be increased inasmuch as the purely chance fraction of the error acts to reduce the correlation.

There is thus a high degree of genuine unity to Intellect CAVD, not assumed but discovered. We began with a measurement in the form of an inventory, differing from a bare enumeration of success or failure with actual tasks only in that the tasks were graded in difficulty. We end with measurements of altitude, width and area which intercorrelate so closely that they may reasonably be treated as results of a closely knit set of causes. Whatever makes one intellect able to do much harder CAVD tasks than another intellect also makes it able to do many more tasks than that other can do. After the necessary data have been collected, width at any altitude, and so total area, will be predictable in the case of Intellect CAVD (and presumably in the case of other forms of intellect) rather precisely from altitude alone.

THE RELATION OF ALTITUDE OR LEVEL OF INTELLECT TO SPEED

It is important to know the relation between level and speed for two reasons.
If the relation is very close, the speed of performing tasks which all can perform would be an admirable practical measure of intellect. The record would be in time, an unimpeachable and most convenient unit. If, on the other hand, the correlation is very low, the practice of giving credit for speed in group examinations should probably be amended.

Dr. Hunsicker ['25] has made extensive individual measurements upon 82 adults and 81 school children, taking the time for easy problems in arithmetic and for easy completions, such as appear in our composites E, F, and G; and then testing the person with harder and harder tasks until the level was reached where he could not obtain fifty percent right. The correlations which she obtained between altitude and rate (the reciprocal of the time required for tasks done with no, or very, very few errors) are shown in Table 130.

They are much too low to make it advisable to use the speed at easy tasks as a measure of the altitude or width or area of intellect, except possibly in the case where the time available for the examination is very short. They are indeed so low that it seems unwise to attach much weight to speed in intelligence examinations in general.² A graded or ladder test of thirty minutes containing 5 levels each consisting of ten words and five arithmetical problems³ using small numbers, will in all probability show a closer correlation with any reasonable criterion of intellect than will a thirty minutes' speed test.

² Except, of course, in the case of tests (such as the substitution test) where speed measures the speed of learning.

³ Or containing ten opposites and ten questions of arithmetical information, or containing five directions and five arithmetical problems.

TABLE 130. CORRELATIONS, RAW AND CORRECTED FOR ATTENUATION, BETWEEN RATE AND LEVEL. (AFTER HUNSICKER, '25, TABLE V.)

Individual    No. in     Arithmetic               Completion               Average
testing       group      Raw r      Corrected r   Raw r      Corrected r
[illeg.]      28         .29        .35           [illeg.]   [illeg.]      .46
[illeg.]      54         .46        .55           .19        .23           [illeg.]
[illeg.]      [illeg.]   .49        [illeg.]      .49        .64           .61
[illeg.]      49         .29        .30           .41        .50           .43
Average                             .46                      .48           .47

We have extended Dr. Hunsicker's work by a measurement of the speed of doing a collection of CAVD tasks chosen from levels I and below in the case of 63 university students for whom a measure of CAVD altitude was obtained by the use of composites N, O, P, Q, and a still harder composite. There were some errors in the easy tasks, so we have computed $r_{12.3}$, the partial correlation between speed and altitude, for those making equal numbers of errors in the rate test: $r_{12} = +.403$, $r_{13} = -.084$, $r_{23} = -.484$; hence $r_{12.3} = +.416$.

The self-correlation of the measure of altitude is .83 for this group; the self-correlation of the measure of speed is not known but is almost certainly between .7 and .9. If the .403 were corrected for attenuation, the result for CAVD would thus be fairly close to Dr. Hunsicker's results for A and C.

CHAPTER XIV

THE MEANING OF SCORES OBTAINED IN STANDARD INTELLIGENCE EXAMINATIONS

THE MEANING OF THE BINET MENTAL AGE

A Binet Mental Age is a rough measure of relative altitude A D Inf Ot, using Ot to mean "other tasks found or alleged to deserve inclusion in a battery to measure intelligence"; or, more exactly, of the relative A(1a or 1d or 1inf or 1ot) of a sampling of a certain number of tasks at each of certain levels. This A will correlate closely with altitude. Up to about M. A. 14, Binet scores are defined by the probable median or average chronological age of those who would obtain such a score, in the group by which the examination was standardized. Above M. A. 14, the scores are arbitrary.

Until the numbers of tasks at each level of difficulty are known, and perhaps even after they are known, a Binet Mental Age may best be treated as a measure of altitude, that is, of how hard tasks the person can succeed with.
If this is done, nothing will be lost from sound present uses and certain misapprehensions will be avoided. For example, everyone will understand that a very small increment of mental age at the high ages may mean a very large increment in area of intellect or percentage of success with the total mass of intellectual tasks which life may offer, and that a very large increment of mental age at the low ages may mean a relatively small increase in the total number of tasks achievable or in the total number of connections formed.

The great merit of the Binet Test is that it is a graded scale for intellectual difficulty, and it is only weakened by being interpreted loosely as a measure of some mysterious essence called intelligence which grows in man. The weakening is not disastrous simply because, as was shown in the previous chapter, altitude and width (and consequently area) of intellect are so closely correlated.

Miss Rowell is measuring the values of Stanford Binet M. A. 10, M. A. 11, M. A. 12, etc., in terms of the absolute units of the CAVD scale in so far as one can be said to measure the equivalence of two series of magnitudes which may not be measures of exactly the same fact in nature, and of which one (the Binet) may not measure varying amounts of the same fact. We have found that adults of Stanford Binet Mental Age 48 months, or 4 years, will show an altitude of about 26 in Intellect CAVD; and that adults of Stanford Binet Mental Age 78 months, or 6½ years, will show an altitude of about 30 in Intellect CAVD. When, by these measurements or by others, the differences in the M. A. scores are put in equal units and referred to the absolute zero of intellectual difficulty as located by us, or as more accurately located by future workers, the Binet scale and measurements will have a much greater value than they now have.
What has been said of the Binet applies equally to the Herring Examination, which is an alternative Binet.

THE MEANING OF SCORES OBTAINED IN STANDARD GROUP EXAMINATIONS

The significance of scores in group tests such as the Army Alpha, National, Otis, may best first be considered with disregard of the factor of speed; that is, on the assumption that the scores of individuals represent what they can do with time enough allowed to exhaust their abilities.

The score does not measure either altitude or width or area of intellect. It does not measure altitude, because the number of tasks between levels equally far apart is not necessarily the same. It does not measure width, because the score is not divided up into a number of sub-scores, each representing the number of successes at a certain level. It does not measure area, because it measures neither altitude nor width, and because the percent which the tasks are of those that might be had at any level of difficulty is not known.

Although one of these group tests does not in a rigorous sense measure any one of the three, the score in it is about as closely symptomatic of altitude as the score in any test requiring so short a time could be. It is also closely symptomatic of the average width of intellect at and near the levels of difficulty represented by its tasks. One of these group examinations is in fact very much like what we have when we put together five or six of our CAVD 40-composites that are in a sequence for difficulty. The difference between a set of these CAVD composites from about G to N and Army Alpha or the National or the Otis (no time limit being set) is that in the case of the CAVD composites, we know how many single tasks there are at each level of difficulty, and we know how far apart the levels of difficulty are, and we can not only make a summation of credits, but also can make an altitude score, and a width score at each level.
In Army Alpha or the National or the Otis, the total summation score is not susceptible of such an analysis.

Except for the speed element, then, one of these stock intelligence examinations may be regarded as a series of composites unequal in the number of their elements, and undefined as to the distances between levels. The addition of the speed element complicates matters and theoretically makes the significance of the score incapable of interpretation except in terms of what people of a certain sort do in that kind of a test when it is scored in that way.

Practically, however, the speed element does not make the scores in these examinations, as they are administered in the case of most of the individuals who are measured by them, very much different in significance from the scores which would be obtained with no time limits set. A few persons are nervously upset by the instructions to work as fast as they can; a few cautious, critical workers do not have time enough to do as many of the hard tasks as they are really able to do; a few persons are scored unduly high because they utilize the time especially shrewdly, while a few others are scored unduly low because they dally too long over tasks at which they fail, or leave tasks undone which the use of a little more time would have enabled them to finish. But, in general, the scores in these speed tests correlate very closely with the scores obtained when a longer time allowance is given, partly because the correlation between speed and altitude is positive, but more because the standard time allowance is long enough to enable most of the candidates to do most of the tasks which they could under any circumstances do.

The experiments of the Army psychologists on the result of doubling the time allowance for the Alpha and Beta examinations are well known [Memoirs, '21, pages 415-420].
The general result was that there was a slight improvement in the correlation with officers' ratings for intelligence, and a close correlation between the score in single time and the score in double time (r = .967), which is probably as high as the self-correlation of the determinations would permit.

Dr. J. R. Clark has investigated the influence of altitude and of speed upon the abilities measured by the Stanford Binet, the Otis Self-Administering Test, and the Terman Group Test, in the case of school pupils from Grades [illegible]. His results are not entirely clear, because his measures of speed are afflicted by rather large variable errors, and are perhaps also disturbed by the presence of an altitude factor; but on the whole they indicate that scores in these stock examinations are determined much more by altitude than by speed, and perhaps are determined almost entirely by altitude and width. The average of the six speed correlations (speed in arithmetic and speed in completions with Binet, Otis, and Terman), each being corrected for attenuation, is .54. The average of the corresponding altitude correlations is .70. The average of the four correlations between speed and altitude is not given, nor all the data whence to obtain it. Ar speed with Ar altitude (corrected r) is .76; Co speed with Co altitude is .40; the average is thus .58. The other two r's are not given. They would presumably be lower. If their average is estimated, we can compute the partial correlations of speed with Binet, Otis, and Terman for persons of equal altitude in Ar or Co and of altitude with Binet, Otis, and Terman for persons of equal speed in Ar or Co. Estimating this average as .48, the partial correlations are .28 for speed and .58 for altitude.
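The two partial correlations just quoted can be retraced with the ordinary first-order partial correlation formula. The sketch below is one reading that reproduces the printed values; it assumes that the speed-altitude correlation entering the formula is the mean of the four coefficients (.76, .40, and the two estimated at .48 each), which the text does not state explicitly. The function and variable names are illustrative.

```python
from math import sqrt


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x with y for persons equal in z (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


r_speed_test = 0.54      # average corrected speed correlation, as quoted
r_altitude_test = 0.70   # average corrected altitude correlation, as quoted

# Assumption: mean of the two given r's and the two estimated at .48 each.
r_speed_altitude = (0.76 + 0.40 + 0.48 + 0.48) / 4

speed_for_equal_altitude = partial_correlation(
    r_speed_test, r_speed_altitude, r_altitude_test
)
altitude_for_equal_speed = partial_correlation(
    r_altitude_test, r_speed_altitude, r_speed_test
)

print(round(speed_for_equal_altitude, 2), round(altitude_for_equal_speed, 2))
```

Under that assumption the arithmetic agrees with the .28 and .58 reported in the text.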
A more instructive set of measurements is of the relations between speed in general and altitude in general to scores in Binet in general, Otis in general, and Terman in general.¹ These Dr. Clark has made. He finds that differences amongst individuals in the score in one of these examinations are almost perfectly correlated with differences in what is common to their two altitudes, and much less closely correlated with differences in what is common to their two speeds. We quote his results.

¹ "Binet in general" means the average score in an infinite number of tests patterned after the Stanford Binet.

$$r_{\text{general level and Binet}} = \frac{\sqrt{(r_{\text{ar. level and Binet}})(r_{\text{co. level and Binet}})}}{\sqrt{(r_{\text{ar. level and co. level}})(r_{\text{Binet and Binet}})}}, \text{ with denominator } \sqrt{(.55)(.90^{*})}$$

Similarly

$$r_{\text{general level and Otis}} = \frac{\sqrt{(r_{\text{ar. level and Otis}})(r_{\text{co. level and Otis}})}}{\sqrt{(r_{\text{ar. level and co. level}})(r_{\text{Otis and Otis}})}}, \text{ with denominator } \sqrt{(.55)(.90^{*})}$$

and

$$r_{\text{general level and Terman}} = \frac{\sqrt{(r_{\text{ar. level and Terman}})(r_{\text{co. level and Terman}})}}{\sqrt{(r_{\text{ar. level and co. level}})(r_{\text{Terman and Terman}})}} = \frac{\sqrt{.80 \times .66}}{\sqrt{(.55)(.90^{*})}}$$

* Estimated.

In the same way, the relationship between "general speed" scores and intelligence test scores is found to be:

$$r_{\text{general speed and Binet}}, \text{ with second numerator factor } .49 \text{ and denominator } \sqrt{(.50)(.90^{*})}$$