Variable Factors in the Binet Tests

A DISSERTATION presented to the Faculty of Princeton University in Candidacy for the Degree of Doctor of Philosophy

BY CARL C. BRIGHAM

Princeton
Princeton University Press
1917

Table of Contents

I. Introduction
II. Subjects and Methods
III. The Personal Equation
IV. Grade Correlations
V. Sex Differences
VI. Summary

I. INTRODUCTION

During the past decade, the Binet-Simon measuring scale for intelligence has received considerable attention, and a large amount of literature has appeared on the subject. No attempt has been made in the following pages to review all the literature on this scale or other systems of intelligence testing. Kite (38) gives an excellent account of the history and nature of the scale. Kohs (41) has assembled a very complete bibliography on the subject up to June 1914. Schmitt (57) gives an historical account of the development of the various attempts to correlate psychological findings with general intelligence, particularly in this country and England. Bobertag (10) and Schmitt both give detailed descriptions and analyses of the individual tests. Stern (62) has devoted a monograph to the collection, exposition and critical analysis of the large amount of data bearing on the problem of intelligence testing, and in another work (61) has assembled the literature of cognate fields.

The literature bearing on the Binet scale up to 1912 is largely descriptive of the scale itself, the standard methods of procedure, etc.
The more recent literature has been critical and reveals a tendency at the present time for investigators to depart from the methods of the extensive application of the scale as a whole to the more intensive study of the individual tests.

All systems of intelligence tests may be classified as qualitative or quantitative. The qualitative system consists of an aggregation of tests designed to detect the capacities or incapacities of the subject in order to afford the experimenter an opportunity to make a diagnosis concerning the subject's mentality. This method throws the responsibility for the final diagnosis on the experimenter. The system of tests proposed by Healy and Fernald (34) is of this type. Quantitative systems of tests necessitate a final score of some sort, whether that score be in the form of a mental age, a mental quotient, a certain number of points, a coefficient of intellectual ability, a percentile rank or what not. The essential characteristics of the quantitative systems are the interpretation of the total scores in terms of the age of the subject, and the placing of the responsibility for the final diagnosis on the tests rather than the experimenter.

Binet and Simon's 1905 scale (5 and 6) was of the qualitative type. A series of 30 tests of approximately increasing difficulty was published with directions for their application. The authors reported in a general way that from their experience in examining a few selected normal children of different ages, and other subnormal children in the schools and at the Salpêtrière, approximate levels of performance could be found characteristic of the development of normal children of 3, 7, 9 and 11 years chronologically, the performance of idiots, imbeciles and morons corresponding roughly with that of normal children of 3, 7 and 9.
Although the reference to chronological ages introduced the quantitative element, at no place were the authors insistent on this point, merely stating that they had found the series of tests exceedingly valuable in diagnosing and classifying defectives, and in their opinion others would also find it valuable.

The 1908 scale (7) was quantitative in character owing to the introduction of the concept of "mental age". It included a list of 56 tests grouped according to ages from 3 to 13, each group containing from four to eight tests. Most of the tests of the 1905 series were included, the additions including in a large measure tests of a scholastic nature. The authors gave directions for applying the series and for computing the resultant "mental age". A child testing three years below his chronological age was to be considered defective.

Although the scheme of the 1908 series was entirely quantitative, the authors did not discard the qualitative idea, and they cautioned against the application of the scale in the manner of a measure of height or weight. The border line between the idiot and the imbecile was fixed by the ability to use and comprehend spoken language. The imbecile was differentiated from the moron by the use of written language, illiteracy being differentiated from imbecility by certain tests. The authors stated that the moron could be defined only in terms of the environment in which he lived, and they considered six tests important in differentiating the moron from the normal individual of the Paris population. Any system of tests which throws more weight on some tests than on others in making a differential diagnosis is fundamentally qualitative in kind, for the responsibility is placed not on the score but on the judgment of the experimenter.

The idea of a quantitative measuring scale of intelligence however met with instant favor.
The interest that actuated the psychologists of the "early nineties" to correlate the measurements of reaction time, motor ability, sensory discrimination, etc. with intelligence was revived. The scale was translated into several languages and applied to individuals of many classes and types.

In 1911, the authors published a revised scale (8) in which many of the tests of scholastic ability were discarded, and the remaining tests shifted about so that there were five tests for every year except one from III to X with similar groups for "twelve year", "fifteen year" and "adult" mentality. In the same year, Binet published an article (4), his last word on the subject, in which he discussed many of the criticisms which the scale had received, and again sounded the note of warning against the mechanical interpretation of results. However, as one traces Binet's thought on the subject through his writings, he may see the idea of a qualitative system of tests gradually dropping into the background, and more and more weight placed on the "scientific" (quantitative) measure of intelligence.

That Binet did not depart entirely from the qualitative standpoint is shown by his discussion of the test of comprehending difficult questions. "Sometimes after an examination one hesitates on a diagnosis. The child has failed in one or two tests, but this does not seem to be convincing. Failure to give the day and date and the months of the year are excusable errors, which may be caused by distraction or by lack of education. But the questions for comprehension dissipate all doubts. We recall several instances when teachers brought us children, desiring to know whether or not they were abnormal; occasionally, in this way they set a trap for us, but we did not object, it was fair play. Our questions for comprehension decided us every time.
We remember one child who was very slow in answering as though dull, his face was expressionless and unprepossessing; he knew neither the day nor the date, nor what day comes after Sunday, and he was 10½ years old; his reading was syllabic. But when we asked question 5: Why do we judge a person by his acts rather than by his words? he gave the following answer: Because words are not very sure and acts are more sure. This was enough; our opinion was formed, that child was not so bad as he seemed." (Town's (72) translation, page 48.)

The popular interest that was manifest before the advent of the 1911 scale was tremendously reinforced in this country by Goddard's (30) publication of the results of the application of the scale to "two thousand" non-selected school children in Vineland, N. J. Popular interest increased rapidly, and the scale continued to have wider and wider application in the hands of less and less experienced investigators. The concept of "mental age" was exceedingly easy of comprehension, no apparatus was needed, and the scale has now become the common property of all. This development or overdevelopment has taken place in spite of the warnings of the authors themselves and the psychological fraternity in general. The very fact of overdevelopment however is striking evidence that persons interested in the social sciences need a quantitative scale for measuring intelligence.

The question whether the Binet scale is an accurate measure of intelligence can be decided only by the study of the individual tests and the factors underlying them. A study of this sort will show the errors that underlie the total score or "mental age", and at the same time will show the direction in which the correction of the scale should take place. The proper understanding of the individual tests involves the theory on which the measuring scale was constructed.
The method which Binet and Simon used in constructing their measuring scale of intelligence was entirely empirical. A large number of tests were given to children of a certain social status. Certain tests could be shown to be correlated with age, and in the authors' opinion were correlated with intelligence. The fact that at a certain age a test could be passed by a certain proportion of the subjects was taken to mean that the test in question was characteristic of that age. Tests that were characteristic of the same age level were then combined into one age group. In this way a scale was built up with a number of tests for each age group. By a certain arbitrary system of scoring the reactions of a subject to all or part of the scale of tests, the "mental age" of the subject was obtained. The comparison of the "mental age" with the chronological age of the subject would show him to be advanced, at age or retarded, and the amount of acceleration or retardation would afford a quantitative index of his intelligence.

A person could construct a scale on the same basis and arrive at an age score using entirely different tests. A scale could be constructed containing tests of height, weight, vital capacity, strength of grip, circumference of the head, etc. and the results interpreted in terms of age. In this case however the age obtained would be more physical than mental. A scale of tests could also be constructed which involved the subject's knowledge of geography, spelling, history, grammar, etc. but in this case the resulting age would be determined very largely by the amount of training the subject had received.
The assumptions that a child at a certain age should weigh 25 pounds, at another age 50 pounds, etc., that a child can repeat 3 digits at one age, 5 digits at another and 7 digits at another, and that a certain percentage of children at one age can enumerate the months, and a higher percentage at another age, differ only in the possible determiners to which the growth may be referred. In the first case the growth is referred to certain physiological processes which are supposedly independent of intelligence and training. Binet believed that the principal determiner of growth in the last two cases was intelligence, but the possibility remains that they might be more or less independent of intelligence, and more or less dependent on training and other variable factors.

The principle on which the scale was constructed involves three assumptions, (1) that the individual tests are correlated with age, (2) that the individual tests are correlated with intelligence, and (3) that intelligence is correlated with age, three distinct assumptions any one of which does not necessarily involve the others. The purpose of this investigation is to study the correlation of the individual tests with age, to determine the variable factors that might operate on the tests to produce an apparent correlation with age that was not a real correlation, or that might alter the real correlation in some way.

There is a possibility that an error might occur in the statistical treatment of the results, so that figures which would apparently indicate a correlation with age of a certain degree might actually represent a correlation of another degree. Another variable factor is the personal equation of the experimenter, who might alter the procedure in giving a certain test so that the correlation of that test with age might be different from the correlation obtained by another experimenter.
If the subjects of various ages had received different school training, this difference might introduce another factor which would vary independently of the age of the subjects. If the tests used depended on any inherited or acquired differences between the sexes, then the correlation of the tests with age might be different for the two sexes. If any or all of the variable factors mentioned prove to be present in the correlation of the tests with age, then certain allowances will have to be made for these factors in making a diagnosis of the subject's intellectual ability on the basis of his total score or "mental age", and the scale becomes qualitative rather than quantitative.

At the Fourth International Conference for School Hygiene held in Buffalo in the summer of 1913, several persons of unquestioned authority in the field of mental tests held an informal conference on the Binet-Simon scale, reporting the results in 1914 in the form of recommendations and suggestions (15). The question, "How much is the outcome of the testing influenced by the personal equation, both of the examiner and examinee?" was answered, "Undoubtedly there is some influence and it may be a serious source of error." Another question, "How much do previous environment and school training effect the outcome of the tests?" was left unanswered by the opinion, "The experimental evidence thus far available is conflicting. Further investigation is needed." The question, "Should the scale be divided, in the upper years at least, to furnish separate standards or separate tests for the two sexes?" was answered, "We do not know, and recommend this a subject for investigation." The following study is in part an attempt to answer these questions.

The method used in this study is that of studying the individual tests, disregarding entirely the total score or "mental age".
There are at present so many revisions and editions of the Binet scale, that the term "mental age" has no meaning outside of the particular scale in question. The tests that are used in the various standardizations are however approximately the same, so that conclusions concerning the factors underlying the individual tests have a wider significance than those drawn from the "mental ages". Furthermore variable factors in the individual tests may balance each other in the total score so that their influence might be obscured.

The subjects and methods will be described first, and in connection with the methods of treating the results a statistical error will be pointed out. The problems of the personal equation, grade correlations and sex differences will then be taken up in detail.

II. SUBJECTS AND METHODS

Subjects

The data which are here analysed to determine the influence of the personal equation, of grade training and of sex differences, are derived from all the boys and girls below the seventh grade in the Princeton, N. J., Model School. This group includes 422 subjects of the following age distribution:

Chronological age     4    5    6    7    8    9   10   11   12   13   14   15   16
Number of subjects    4   17   62   52   56   42   53   49   36   32   11    6    2

Each of the first six school grades was divided into a plus and minus grade, the latter division being under a different teacher, and containing those who were either backward, or, on account of illness, change of school, or for reasons not necessarily related to their mental development, were not sufficiently advanced to perform the work of their grade. The school also contained a special class for defective and exceptionally backward children. The subjects were distributed in the school grades as follows:

School grade        Spec.  Kind.    I-    I+   II-   II+  III-  III+   IV-   IV+    V-    V+   VI-   VI+
Number of subjects     18     32    38    51    12    40    12    45    15    35    15    49    11    49

39 or 9.2% of the subjects were children of non-English speaking parents, this group including 6.6% of the children in the Kindergarten and first six regular grades, and 15.7% of those in the special class and minus grades.

The selection of subjects is only fairly typical of the general run, for Princeton has no manufactories. The children examined came, for the most part, from the homes of laborers, domestics, artisans, farmers, tradesmen, clergymen and college professors. The selection is atypical in that none of the children came from homes of the manufacturing class, while an unusually large proportion came from the homes of those engaged in domestic, personal, and professional service.

Tests

The scale used was Goddard's (28) 1911 revision of the Binet-Simon scale. The methods used in giving the tests were, as far as possible, the same as those outlined by Goddard in the original revision, incorporating the rules and suggestions for standardized scoring published by that writer (29) in 1913. The methods used will not be discussed in detail, for the data are not used in obtaining age norms and standards for children generally. For the analysis of the data in terms of grade and sex it is not necessary that the procedure should be absolutely standardized, but that the experimenters who gave the tests should have used the same procedure. Differences in the technique of the experimenters will be discussed in the chapter on the personal equation.

One variation from the usual procedure was adopted. In no case did the experimenter know the chronological age of the child being tested. The influence of any prejudice or bias on the part of the experimenter is therefore eliminated from the problem of the correlation of the tests with age.
The three experimenters who gathered the material in the spring of 1913 examined the sixth grade first and the remaining grades in decreasing order. During the school year 1913-1914, the fourth experimenter examined all children at that time in the kindergarten and first grades, and others who were not examined in the spring of 1913. The tests in the "three year", "four year", "five year", "fifteen year" and "adult" groups were given so infrequently that the data from them are not treated. The tests used are as follows. The figure at the right shows the total number of times each test was given.

AGE VI
1. Distinguishing between morning and afternoon      108
2. Defining in terms of use                          333
3. Executing three commissions                       100
4. Showing right hand and left ear                   107
5. Choosing the prettier of given faces              117

AGE VII
1. Counting 13 pennies                               217
2. Describing pictures                               219
3. Indicating omissions in pictures                  217
4. Copying the diamond (in pencil)                   225
5. Naming four colors                                218

AGE VIII
1. Comparing remembered objects (butterfly and fly)  271
2. Counting backwards from 20 to                     251
3. Enumerating the days of the week                  277
4. Counting stamps                                   258
5. Repeating 5 digits                                413

AGE IX
1. Making change                                     271
2. Defining in terms superior to use                 333
3. Giving the day and date                           307
4. Enumerating the months                            284
5. Arranging five weights                            334

AGE X
1. Recognizing pieces of money                       282
2. Copying designs from memory                       252
3. Repeating 6 digits                                413
4. Comprehending easy and difficult questions        250
5. Using three words in sentence (two ideas)         279

AGE XI
1. Detecting absurdities in statements               226
2. Using three words in sentence (one idea)          279
3. Giving 60 words in three minutes                  233
4. Giving rhymes with day, mill and spring           213
5. Reconstructing dissected sentences                190

AGE XII
1. Repeating 7 digits                                413
2. Defining abstract terms                           144
3. Repeating a sentence of 28 syllables              169
4. Resisting suggestion (length of lines)            203
5. Solving problems from various facts               123

The tests in the "six year" group, with the exception of defining in terms of use, and the tests in the "twelve year" group, with the exception of repeating 7 digits, were given so infrequently or so irregularly that the data from them could not be treated. The apparatus used in the test of arranging five weights was not constant throughout the experiment, the standard cubes and weighted pill boxes being used at different times by different experimenters. On this account, the data from this test are not included in the subsequent discussion.

Methods of Treating Results

The chronological age of each subject was taken as that at the last birthday, one tenth of a year being allowed for each 36 days beyond the birthday. The subject that was 10 years and 35 days would be rated 10.0 years, while ten years and 36 days would be 10.1 years. A subject one day short of 11 would be rated 10.9, etc. The teachers of each grade submitted the dates of birth of all pupils after the grade had been tested. These data were later checked up from the entrance cards. Since the purpose of this study is to analyze the factors involved in the individual tests, no "mental ages" or total scores were figured. The classifications of the subjects are all made independently of the tests.

Two measures of central tendency will be used in the subsequent discussion, the average and the median. The measure of variability from the average, that will be used, is the mean variation (or average deviation), the average of the differences, regardless of signs, between the separate measures in the series and the average of the whole series. The measure of variability from the median that will be used is the semi-interquartile range (Q), or half the difference between the measure with three times as many measures above as below it and the measure with one third as many measures above as below it, i.e.
half the difference between the 25 percentile and the 75 percentile. Any coefficients of correlation used will be stated in terms of the formula applied. The reader is referred to Thorndike (70) for the discussion and explanation of the statistical measures used.

The measures of ability in most of the tests are in the "all or none" form: the tests are either passed or failed. The only measure that can be obtained from data of this sort is the percentage that an ability is present in a defined group. This method of treating the results has as many "pit-falls" as the tests themselves. Before undertaking the analysis of the Princeton data to determine the effect of the personal equation of the experimenter, and the age, grade, and sex of the subject upon the results of the individual tests, it is necessary to consider an error which underlies incomplete data, or those data derived from experimenting in which every test is not given to every subject.

No uniform instructions were given to the experimenters concerning the order in which the tests should be given, nor the number of tests that should be tried. The experimenters attempted to determine the mental age of the child according to the scale. In doing this they would start with some test which they considered would be interesting to the child, and, at the same time, well within his reach. The tests given first were usually those of describing pictures and arranging five weights. The experimenter would then gradually explore the subject's range of ability, varying the order of the tests so as to maintain the subject's interest, and to ward off fatigue. In this way the experimenter would eventually establish the basal age of the subject (that age in which he passed all five of the tests), and by the end of the examination would have tried all the tests above the basal age which, in his judgment, there was any possibility of the subject's passing.
This method of experimenting will be called incomplete. The other method of experimenting, in which a certain number of tests are adopted and all of the tests are tried on each subject, will be called complete. Each experimenter in the Princeton investigation averaged 19 or 20 tests to a subject. In the Trenton investigation all the tests were given to all the subjects.

The incomplete method is more desirable from the standpoint of the subject who is not unnecessarily fatigued, and from the standpoint of the experimenter, as well, who saves in the expenditure of time and energy. However, the data derived from the incomplete method are subject to an error, which, unless it is properly considered, will completely vitiate the results.

When the experimenter does not try a test above the basal age because he believes that the subject will not pass it, he implies that the subject will fail it. This amounts to a failure, for the subject receives no credit. However, a failure of this sort, due to the experimenter's assumption, is not the same as an actual failure in which the test is tried, for there is always the possibility that the assumption was unjustified. In like manner when the experimenter does not try tests below the basal age, he actually gives credit for passing the test without the actual trial.

In some cases the assumption on the part of the experimenter is quite justified. Obviously if a subject can make change, he can count up to thirteen; if he can repeat seven digits, he can repeat five and six digits; if he knows the names of the months, he will know the days of the week; and, conversely, if he cannot repeat the days of the week, he cannot repeat the months. Other assumptions are less justifiable.
Since very intelligent persons, lacking in particular sorts of abilities, might fail in tests such as drawing the design from memory or arranging five weights, there is no reason for supposing that a subject making basal "eleven" or "twelve" will pass these tests. At the same time there is no reason for assuming that a subject failing to establish basal "seven" for instance, will fail to pass a test such as the line suggestion test in "twelve". The assumptions of the experimenters, then, are more or less justifiable and it is impossible to estimate the amount of the justification, since this is dependent on the nature of the individual tests.

The manner in which this error works out in the statistical treatment of the results may be shown by examining any test which has been tried through a number of chronological ages. Table 1 shows the results of the 60 word test obtained from subjects 7 to 13 years of age.

TABLE NO. 1
Analysis of the Results from the Test in Naming 60 Words in 3 Minutes.

Chronological ages                                 7     8     9    10    11    12    13
No. of times given                                11    18    25    42    44    31    28
No. of times passed                                4    10    10    24    34    19    21
Actual percentage passed                         36%   56%   40%   57%   77%   61%   75%
Total number of subjects                          60    52    42    54    48    36    28
Percentage of subjects to whom test was given    18%   35%   60%   78%   92%   86%  100%
Theoretical percentage passed                     7%   19%   24%   44%   71%   53%   75%

An example will make the above table clear. The 60 word test was given to 11 subjects, age seven, 4 of whom passed. In all there were 60 subjects at this age, so that the 11 subjects to whom the test was given constitute but 18% (and probably the brightest 18%) of this whole number. The percentage passed would have been 7% had the test been given to all 60 subjects, and had all the subjects failed who the experimenters assumed would fail if they gave the test.
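The arithmetic behind Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions and forms no part of the original study: the names `AgeGroup`, `TABLE_1`, `pct` and `summarize` are hypothetical, the counts are copied from Table 1, and percentages are rounded to the nearest whole number as the table appears to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeGroup:
    age: int     # chronological age in years
    given: int   # subjects to whom the test was actually given
    passed: int  # subjects who passed the test
    total: int   # all subjects of that age in the sample


# Counts copied from Table 1 (60 word test); names are illustrative only.
TABLE_1 = [
    AgeGroup(7, 11, 4, 60),
    AgeGroup(8, 18, 10, 52),
    AgeGroup(9, 25, 10, 42),
    AgeGroup(10, 42, 24, 54),
    AgeGroup(11, 44, 34, 48),
    AgeGroup(12, 31, 19, 36),
    AgeGroup(13, 28, 21, 28),
]


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


def summarize(group: AgeGroup) -> dict:
    """Return the three percentages reported for one age group."""
    return {
        "age": group.age,
        # share of the age group that actually received the test
        "given_pct": pct(group.given, group.total),
        # passes as a share of those tested (the higher figure in the text)
        "actual_pct": pct(group.passed, group.given),
        # passes as a share of the whole group, counting every untested
        # subject as a failure (the lower figure in the text)
        "theoretical_pct": pct(group.passed, group.total),
    }


if __name__ == "__main__":
    for g in TABLE_1:
        row = summarize(g)
        print(
            f"age {row['age']:>2}: given {row['given_pct']:>3}%, "
            f"limits {row['theoretical_pct']}% to {row['actual_pct']}%"
        )
```

Reporting the three figures together, rather than either percentage passed alone, keeps the degree of completeness visible beside the result it qualifies.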
The true per cent, which represents the ability of non-selected seven year boys and girls in passing the 60 word test therefore lies somewhere between 7% and 36%, probably nearer 7%. An accurate estimate of the real per cent, which will represent this ability is, however, impossible. In like manner, the ability of the 8 year subjects is represented by a percentage somewhere between 19% and 56%. As the proportion between the number of subjects in the group and the number actually tested increases, the disparity between the actual and theoretical percentage passed becomes less, or, in other words, the results which express the ability of a group become more reliable as the number of individuals actually tested as a sample of this group becomes larger. The higher the percentage given, the more reliable the percentage passed, when the reliability is measured by the difference between the actual percentage passed and the theoretical percentage passed.

The source of error mentioned causes great difficulty in comparing the results of different investigators. For example, it is desired to compare the results of Terman and Childs (66) and Dougherty (23) with those of this investigation on the 60 word test. Table 2, derived from their published results, shows the percentage that the test was given of the number of times it was possible to be given, (%G), the actual percentage passed, (A%P), and the theoretical percentage passed, (T%P), or that percentage passed that would have resulted had all of the subjects failed, who it is necessary to suppose would have failed, had the test been given all the possible number of times.

TABLE NO. 2
Analysis of the Results of Three Investigators on the 60 Word Test.

      This investigation   Terman and Childs    Dougherty
Age       %G   A%P   T%P       %G   A%P   T%P       %G   A%P   T%P
  7       18    36     7       14    50     7
  8       35    56    19       47    35    16       15
  9       60    40    24       86    57    49       35    60    21
 10       78    57    44      100    67             78    53    41
 11       92    77    71       98    83    82       89    79    70
 12       86    61    53       97    82    80       91    95    87
 13      100    75            100    94             94    88    83

It is very difficult, if not impossible, to make a comparison of these results shown in Table 2 for the years 7, 8 and 9. The ability of Terman's 7 year group is represented by a figure somewhere between 7% and 50%, while that of the 8 year group falls somewhere between 16% and 35%. Dougherty's 9 year group falls between 21% and 60%. In the older years where the results have greater reliability, it is probable that the discrepancies between the investigators could be accounted for on the basis of the inferiority of the selection of the older subjects in this investigation, the other investigations including children from the seventh and eighth grades.

In order to make a comparison between investigators, it is necessary to express the results in terms of a percentage or a proportion. The expression of the ability of a group by a percentage or a proportion is inaccurate if the data are incomplete, and in order to judge the accuracy of the data, it is necessary to know the degree of completeness. Unfortunately, the results of most of the investigations on the individual tests are not published in a form that enables one to estimate the accuracy of the data. The writers who have published their data in a form that will admit of this treatment, have not treated the sexes separately. On this account, the writer will not attempt a systematic comparison of the results of this investigation with those of other experimenters.

Before analysing the Princeton data the following problem should be answered: What proportion of a given group must actually be tested for an ability in order that the results may be considered as typical of the ability of the whole group?
The proper proportion to select as typical of any one group depends upon the characteristics of the group itself. If the members of a group are similar, a smaller proportion would stand for the ability of the group than would be necessary for a group composed of unlike individuals. A smaller number of individuals would be necessary to stand for the ability of all the 12 year boys in the sixth grade, for example, than for all the 12 year boys coming from a great many grades. This proposition operates directly counter to actual practice, for the members of a group of similar individuals will be given similar tests, while unlike individuals will receive different tests, inasmuch as the experimenter adapts his procedure to the need of the individual being examined. The proposition actually means, then, that selected results from incomplete testing are more reliable than non-selected results, if each group has the same range of testing.

The proportion of a group that must be tested to stand for the whole group will also vary from test to test. In some tests of particular abilities, no proportion will accurately stand for the whole group; the entire group must be tested. In other tests that are easy for the group, the results of a very small proportion would not be altered by examining the remainder of the group.

The problem of deciding what proportion of a given group must actually be tested for an ability in order that their results may be considered as typical of the ability of the whole group has, therefore, no answer in the work. The writer will decide arbitrarily what the proportion will be. The actual magnitude of the proportion between the number actually tested and the number in the whole group (the percentage given) will always be published as an index of the reliability of the percentage that the group passes the test in question.
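The sense in which the percentage given indexes reliability can be made explicit with a short derivation. The symbols $n$, $g$, $p$, $G$, $A$ and $T$ are introduced here for illustration only and do not appear in the original text; the worked figures use the counts of Table 1.

```latex
% n = subjects in the group, g = subjects given the test, p = subjects passing.
% G = proportion given, A = actual proportion passed, T = theoretical proportion passed.
\begin{align*}
G &= \frac{g}{n}, \qquad A = \frac{p}{g}, \qquad T = \frac{p}{n} = A\,G, \\
A - T &= A\,(1 - G). \\[4pt]
\text{Age 7 } (g = 11,\ p = 4,\ n = 60):\quad
A - T &= \tfrac{4}{11} - \tfrac{4}{60} \approx 0.30, \\
\text{Age 11 } (g = 44,\ p = 34,\ n = 48):\quad
A - T &= \tfrac{34}{44} - \tfrac{34}{48} \approx 0.06.
\end{align*}
```

On this reading the gap between the two figures vanishes when $G = 1$ and widens as fewer subjects are tested. The interval from $T$ to $A$ contains the true value only on the assumption that the untested subjects would have done no better than those tested, which is the assumption the text itself treats as more or less justified according to the test.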
It is not possible to obtain reliable results showing the growth of an ability with age, if the data on which the results are based are of the incomplete sort. A test for any age will be given to a superior selection of subjects below that age, and an inferior selection of subjects above that age, so that the growth curve will appear flatter than it actually is. For this reason, the Princeton data may not be used for the purpose of standardizing age norms.

Binet (4) recognized the fallacy of calculating proportions from the actual number of times a test was given and passed when the test had not been given all the possible number of times. In calculating the proportions from Levistre and Morlé's data, Binet used what the present writer would call the "theoretical proportion passed".

It has been shown that the reliability of the theoretical percentage passed rests on the accuracy of the experimenters' assumptions, and that according to the nature of the tests and the character of the groups to which they are given these assumptions vary from complete certainty to absolute uncertainty. Inasmuch as these assumptions are not equally certain, the conclusions drawn from them are not equally certain, and the logic of scientific method demands that an investigator establish the degree of certainty of his conclusions. In this case the measure of the degree of certainty is the magnitude of the percentage given. The use of the theoretical percentage passed without reference to the percentage given ignores the dictum that an investigator establish the degree of certainty of his conclusions, and sets up all conclusions as equally valid, a procedure which in actual practice results in making all conclusions equally invalid when the fact of degrees of certainty is admitted. The investigator who draws conclusions from incomplete data should always state the percentage given and the actual percentage passed.
This much at least is experiment. The only legitimate use of the theoretical percentage passed is when it is compared with the actual percentage passed as a probable limiting value. The theoretical percentage passed alone has no claim to reliability.

III. THE PERSONAL EQUATION

Before attempting to correlate the individual tests with age, grade and sex, it is necessary to demonstrate the presence or absence of the effect of the personal equation. By the term "personal equation" is meant the complex of variable factors which are independent of the mental make-up of the subject and the environmental conditions at the time of the examination. The term includes such widely different factors as the experimenter's ability to obtain the cooperation of the subject, his procedure in giving the tests, his criteria in deciding whether a subject's response should pass or fail, and the tests used, insofar as the selection of tests and the construction of the apparatus were occasionally left to his discretion, apart from the uniform procedure.

The only method of detecting the influence of the personal equation in most of the tests is that in which the responses of similar groups of subjects to different experimenters are compared. On account of the wide variations in the character of the subjects examined, it is not possible to compare similar groups. On some tests, however, it is possible to determine the effect of the personal equation independently of the method of group comparison. The results of the tests that may be studied independently will be discussed at some length, in order to demonstrate the fact that certain tests are susceptible to this influence.

The examinations of the Princeton subjects were made by four experimenters, called for convenience A, B, C and D.
None of the experimenters was highly trained in giving the tests, although they had all been trained in the methods of psychological experimentation, one experimenter being an assistant professor of psychology, and the other three graduate students of psychology of at least one year's standing. B, C and D performed their experiments at the same time, in the spring of 1913, while A experimented one year later. B, C and D studied the scale together so that it was possible to secure a correspondence in method. At the close of practically every day's testing, B, C and D would confer on the questions brought out by the day's work, and as far as possible would adopt uniform methods of procedure and scoring. A was subsequently trained in these same methods.

In spite of the attempt to adopt uniform methods, there were a few tests which always caused difficulty, and concerning which the experimenters could reach no definite agreement. One of the tests that caused the greatest difficulty was that of defining in terms of use and in terms superior to use. The hierarchy of responses to this test could be fairly arranged as follows. To the question "What is a chair?" the following typical responses would be obtained:
1, "A chair is a chair."
2, "This is a chair."
3, "A chair is to sit on."
4, "A chair is what you sit on."
5, "A chair is a thing you sit on."
6, "A chair is a piece of furniture you sit on."
7, "A chair has four legs, a back, etc."
8, "A chair is a piece of furniture with four legs, a back, etc."
Any of the objects for which a definition is asked (fork, table, chair, horse, mother) may be defined by repetition, by demonstration, by indicating the use to which it is put, by showing the class to which it belongs, by describing its parts, or by the combination of any or all of these methods. The only problem is to decide, arbitrarily, how definitely the class must be indicated (i.e.
by "what", "a thing" or "a piece of furniture") in order to have the definition considered as one of classification. The rule adopted in this study was to consider "thing" as indicating the class. Nos. 1 and 2, definitions by repetition and demonstration, received no credit in "six years". Nos. 3 and 4 were given credit in "six years" as definitions by use, and nos. 5, 6, 7 and 8 were given credit in "nine years" as definitions in terms superior to use.

In studying the ranks given to the responses of the subjects in this test, it was found that the experimenters did not record the responses all of the time. A gave the test 94 times, and recorded the responses 66% of this number. B gave the test 98 times, recording the subject's answer 67% of the time. C gave the test 65 times, and recorded the answer in 95% of the cases, while D gave the test 76 times and recorded the response only once.

By ranking the recorded responses of A, B and C according to the rules shown above, it is possible to obtain an estimate of the relative severity of their criteria in marking these responses plus or minus. 19% of A's definitions were corrected, the correction in all cases being from minus to plus. 11% of B's definitions were corrected, all of the corrections being from plus to minus. 17% of C's definitions were corrected, three fourths of them being changed from plus to minus, and one fourth from minus to plus. C's standards changed during the course of the experiment, so that at first, with older subjects, he was too lenient, while later, with younger subjects, he was slightly too severe. The tendencies of A and B remain constant throughout the experiment, A marking too severely and B slightly too leniently. The differences between the experimenters hold constant for both sexes. The experimenters agreed on all definitions by use, the cases of disagreement coming on the definitions superior to use.
One test in which variations between the experimenters might be expected is that of copying the diamond. In this test, although the apparatus and procedure were the same, the experimenters had very little to guide them in forming their judgments of passed and failed. The instructions given ("The result is considered satisfactory if it would be recognized as intended for a diamond shaped figure"), and the examples published, furnish very vague criteria.

In order to determine the effect of the personal equation of the experimenters in giving credit on this test, all of the reproductions of the diamond obtained in the Princeton and Trenton experimenting (311 in number) were first transcribed and then ranked. On the sheet containing the copy only the subject's number was placed, so that the person ranking the reproductions was in ignorance of the experimenter by whom it was obtained, the mark that the experimenter had given it, and the age, grade, sex, etc. of the subject. The 311 diamonds were then classified into six groups by one observer. The classification, at best, was vague and indefinite, but it represented the unbiased judgment of a single person. Inasmuch as the reproductions were classified and re-classified a great many times, small errors in the classification would be counterbalanced.

The first group contained fairly accurate reproductions of the original, diamonds of approximately the same size as the copy, having the sides and opposite angles nearly equal, and with a proper proportion between length and width. The second group contained figures inferior to those of the first group in size or symmetry, but representing a fairly high grade of ability. The reproductions that were less symmetrical than those of the second group were classified in the third and fourth groups.
Figures showing some inequality between length and width were classified in the third group, while those of approximately unit proportion, square shaped figures, were classified in the fourth group. The reproductions placed in the fifth group were figures less symmetrical than those of the fourth group, and figures which had curved sides and rounded corners. The sixth group contained all figures which it would have been difficult to have recognized as intended for a diamond: figures having three, five or more sides, circles, ellipses, unfinished lines and eccentric figures.

The above classification did not offer an opportunity for a sharp grading between one group and another, but in general, the reproductions placed in the various groups from the first to the sixth represented a decrease in the ability to copy the diamond. The justification of the method was not in the accuracy of the classification, but in the fact that the material was all classified by one observer (B), in such a way that he was in ignorance of the original rank that had been given the reproduction, of the experimenter who graded it, and of the character of the subject. 16% of the reproductions were classified in the first group, 21% in the second group, 20% in the third group, 17% in the fourth group, 9% in the fifth group, and 17% in the sixth group. (The irregularity of the distribution is due to the presence of the diamonds drawn by the Trenton subnormal group.)

After classifying all of the reproductions, the ranks given to them by the different experimenters were then compared with the group in which they were classified. That the sliding scale classification used represented real differences between the reproductions is shown by the relative certainty of the experimenters' judgments.
None of the reproductions classified in the first and second groups were ranked as failed by the four experimenters, while only one reproduction in the third group was ranked minus. 18% of the fourth group, 45% of the fifth group and 77% of the sixth group were ranked as failures. All of the sixth group diamonds that were ranked plus (23%) were so ranked by one experimenter, A.

To obtain a general estimate of the relative severity of the experimenters' criteria in making their judgments of passed or failed, the diamonds obtained by each experimenter from boys and girls were classified according to rank, plus or minus, and according to their group in the classification. From this it was possible to obtain an estimate of the passing mark of each experimenter. For example, the boys of experimenter B passed the test 72% of the time according to his ranking. Had B given credit for the first five groups and failed only the reproductions in the sixth, i.e., had his passing mark been the fifth group, 88% would have passed. Had his passing mark been the fourth group, 81% would have passed. If it had been the third group, 72% would have passed, while only 56% would have passed had it been the second group. Since 72% of B's subjects actually passed the test, his passing mark was the third group; in the long run, he would pass all diamonds in the first three groups and fail all in the last three. The differences between the experimenters on this basis are quite marked. The passing mark for C and D was the fourth group, while A's passing mark was the fifth group. B was the most severe, and A was the most lenient, with C and D between the two. The results were the same for both sexes.

Another test in which the influence of the personal equation might be looked for is that of copying designs from memory. The experimenter must here use his own judgment in marking the designs passed or failed.
Very little guidance is given by Binet's rule, which reads, "The test is considered passed when one of the designs is reproduced exactly, and half of the other is correctly drawn", or by the interpretation of this "half right" as applying "when two component parts are transposed or one component part omitted".

In order to test the experimenters' judgments in ranking this test, a scoring system was devised, which may be explained by reference to Figure 1, which gives the original copy and various duplicated portions. In scoring the reproductions of the pyramid section, 5 points were given when the reproduction of the asymmetry of the figures was nearly exact, as in no. 1, 4 points for a less perfect reproduction as in no. 2, and 3 points for a reproduction in which the rectangle fell in the center of the figure, as in no. 3. 1 point was deducted from this score for each failure to connect the corners of the rectangles as in no. 4 (which is modified from no. 3 and would therefore receive only 1 point), and no credit was allowed for "boxes" (no. 5), and other eccentric figures.

In scoring the more complicated design, 4 points were allowed for each of the "posts", ABCDE and JKLMN, or no. 6. 2 points were deducted for turning them in the wrong direction as in no. 7 (which is "post" ABCDE turned in the wrong direction), 2 points for failure to make the line AB penetrate DE as in no. 8, so that a combination of these errors, as in no. 9, would receive no credit, along with other eccentric reproductions as in nos. 10 A, B, C, D, E and F. 1 point was given for each of the lines EF and IJ, and 5 points for the "hump", FGHI. A continuous line from E to J as in no. 11 would therefore receive no credit, while a division of the lines, without the portion FGHI, as in nos. 12 or 13, would receive 2 points. An accurate reproduction of the portion EFGHIJ, as in no.
14, would receive full credit for all parts, 7 points, no credit being allowed for eccentric reproductions of the "hump" as in nos. 15 A, B, C and D.

The maximum credit for the test is 20 points, divided between the two figures on the proportion of 5 to 15, a fair proportion (in the writer's opinion) according to the relative difficulty of the parts. A design with "one component part omitted" would be scored 13 points according to this system, and one with "two component parts transposed", 16 points, provided that the reproductions of the pyramid section were perfect in each case.

[Fig. 1. Method of Scoring Test of Copying Designs from Memory.]

All the reproductions of the designs obtained from the Princeton and Trenton experimenting were then scored according to this system. The score of each subject of each experimenter in the Princeton series was then compared with the experimenter's ranking, which was recorded on the same sheet, and which was not seen at the time the designs were graded by the point system. From the number of times the test was given, and the number of times it was marked passed by the experimenter, the percentage passed was obtained for each experimenter for both sexes. The scores from all the designs from 0 to 20 were then classified according to the judgment passed or failed as given by each experimenter on subjects of both sexes.

It was found that there were certain ranges where the experimenters' judgments coincided accurately, i.e. in the very low scores and in the very high scores. A certain range existed, approximately from 10 to 15 points, in which the same results would sometimes be ranked as passed and failed by the same experimenter at different times. It was possible, however, to obtain a general estimate of the experimenters' criteria by a method similar to that used in the study of the diamond test.
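The method referred to, as applied to the diamond test, amounts to asking which hypothetical passing mark would reproduce the percentage the experimenter actually passed. The sketch below uses the figures reported for B's boys on the diamond test; the function name and the "closest percentage" rule are assumptions made for illustration, since the text describes the comparison only informally.

```python
def estimate_passing_mark(pct_if_mark, actual_pct):
    """Return the candidate passing mark whose hypothetical percentage
    passed lies closest to the percentage actually passed.

    pct_if_mark maps each candidate mark to the per cent of subjects who
    would have passed had that mark been applied consistently.
    """
    return min(pct_if_mark, key=lambda mark: abs(pct_if_mark[mark] - actual_pct))


# B's boys, diamond test: per cent passing if the lowest credited group
# had been the fifth, fourth, third or second group respectively.
b_boys_diamond = {5: 88, 4: 81, 3: 72, 2: 56}

mark = estimate_passing_mark(b_boys_diamond, actual_pct=72)
print(mark)  # 3, i.e. the third group
```

When the actual percentage falls between two candidates rather than on one, this rule simply picks the nearer; reporting the bracketing pair of marks would be the more cautious choice.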
For example, B gave the test to boys 48 times, passing 40% of them. Had his passing mark been 18 (i.e. had he passed all subjects whose designs scored 18 points or better), 21% would have passed. Had his passing mark been 15 points, 35% would have passed. Had it been 13, 42% would have passed, etc. B's passing mark would therefore fall between 13 and 15 points. In this way, by calculating the percentage passed at each score for each experimenter for both sexes, it was possible to obtain the passing mark of each group.

The passing marks coincided very closely except in one case. With one exception the passing marks were around 12, 13, 14 or 15 points, for the boys and girls of all experimenters, i.e. the experimenters would, in the long run, rank all below this level minus and all above this level plus. The degree of correspondence was quite remarkable considering the fact that the experimenters had very little on which to base their judgments.

The one exception is both striking and suggestive. C's passing mark for boys was 15 points, for girls 8 points. In order to receive a plus from C, boys would have to draw a much more accurate design than girls, or, in other words, a very faulty reproduction drawn by a girl would receive credit, while the same reproduction if drawn by a boy would invariably be failed. This deviation rests on a small number of cases. A gave the test to 24 boys and 21 girls, B to 48 boys and 33 girls, C to 28 boys and 22 girls, and D to 36 boys and 31 girls. A's results, although resting on a number of cases as small as C's, show no such deviation as those of the latter. On account of the small number of cases, this finding cannot be considered definite. It does, however, suggest the possibility of a difference in the experimenters' reaction to the sexes.
An experimenter may show greater leniency to one sex than to the other, so that a supposed sex difference may be the result of an experimenter's reaction to the sex, rather than the sex's reaction to a test.

The test of using three words in a sentence ("Philadelphia, money and river") was given 279 times, and the sentences given by the subjects were recorded over half the time. Experimenter A gave the test 53 times, recording the result 36% of the time. B gave the test 95 times, recording the answer in 92% of the cases. C gave the test 56 times, recording the answer 23% of that number, and D gave the test 75 times, recording the response in 43% of the cases.

To obtain a check on the accuracy of the experimenters' scoring of this test, all of the recorded sentences were transcribed so that they could be studied and ranked without reference to the subject or the experimenter. The 162 recorded sentences were then marked plus or minus by one observer (B). This ranking was checked several times and then compared with the original ranking.

There was no disagreement between the judgments of the four experimenters and the one impartial observer in marking responses for the "ten year" credit. In marking for the "eleven year" credit, there were 8 disagreements out of the 162 judgments, the 8 variations being evenly distributed among the experimenters. It may be concluded, then, that the influence of the personal equation is absent in this test, although there is ample opportunity for variation.

The detailed study of the foregoing tests has shown that the personal equation of the experimenters has a marked effect on the results of some of the tests. In the subsequent correlation of the tests with grade and sex the corrected score of these tests will be used. Only those definitions will be used which were recorded by the experimenters, and the ranking of the one observer will be followed.
All reproductions of the diamond in the fifth and sixth group will be scored as failed, the others as passed. A reproduction of the designs scoring 15 or more points will be ranked as passed. The corrected results of the sentence test will be used.

To show that the effect of the personal equation of the experimenter is present or absent in the tests on which there is no actual record of the subject's response is a more difficult problem. The most reliable method of showing the influence of this factor is that in which the reactions of similar groups of subjects, examined by different experimenters, are studied. The greater the similarity of the groups the more reliable the results. If two experimenters each examined 50 boys of 12 years of age from the sixth grade, their results should compare closely, and any difference could immediately be referred to a difference in the personal equation. However, if one examined boys from this grade and the other girls, the variations might be explained on the basis of sex differences. In the same way the results may vary with the age of the subject, and with his grade and nationality.

It is not possible in this study to obtain groups of a sufficient degree of similarity, in spite of the small number of children of non-English speaking parents, and the fact that the sexes may be treated separately. The subjects vary in age from 4 to 16, and in grade from the kindergarten to the sixth grade. A examined a very much younger run of subjects than B, C and D. The data of the four experimenters were treated by three methods: by comparing the per cent that all boys and girls of each experimenter passed each test, by comparing the per cent that selected subjects of each experimenter passed each test, and by comparing the per cent that all subjects from 5 to 9 and from 10 to 13 passed each test. The sexes were separately treated in each method.
None of the methods proved satisfactory, and it was found to be impossible to obtain an accurate quantitative estimate of the effect of the personal equation on each test. In certain of the tests, however, there were known differences of procedure which might have influenced the results, while the variations in the results of certain other tests were so striking that definite conclusions could be drawn.

One possible source of variation was the use of alternative questions in several of the tests. When an entire school system is examined, and the children learn that they will all be tested, the possibility is always present that they will inform each other of the nature of the tests, and the answers to some of the questions. The alternative questions were used to counteract the influence of this factor.

In the test of detecting absurdities in statements, ten or eleven statements were used, the experimenter choosing the five that he would give the subject. The statements varied greatly in difficulty and the experimenters did not use the same selection throughout the experiment. This test was given by B to 26 girls whose average age was 10.6 years, while D gave the test to 25 girls whose average age was 10.9 years. 65% of the girls examined by B passed the test, while only 36% of D's group passed. The variation between the experimenters might be due to the selection of absurdities of unequal difficulty, or to different criteria in grading the responses. The sources of variation are too large to admit of obtaining any reliable results from this test in correlating it with grade and sex.

75% of the girls to whom B gave the test of reconstructing dissected sentences passed, while only 28% of C's girls passed. The average age of the 26 girls to whom B gave the test was 10.8 years, and the average age of C's subjects 10.5 years.
Part of the difference between these two experimenters is due to the fact that more of B's subjects came from the fifth and sixth school grades. Some variation might have been due to different apparatus, B using cards with the sentences printed on two lines, while C had the sentences typewritten on one line. The sentences used by B were more legible, and, being broken into two lines, it was easier to grasp the individual parts as discrete units. Each experimenter used six sentences of varying difficulty so that some variation might be expected from the selection of the three sentences for the test. Whatever the cause of the discrepancy between the results of the two experimenters, it is obviously impossible to obtain any reliable conclusions concerning the correlation of this ability with age, grade or sex, on account of the presence of so many variable factors.

Three problems were used in the test of making change, 20c - 4c, 25c - 6c and 25c - 9c, the process of subtraction involved in each being of unequal difficulty. Certain variations occurred in the tests of comparing remembered objects and comprehending easy and difficult problem questions. Alternative questions were used in both of these tests, and variations might occur due to the relative severity of the experimenters' judgments in marking the responses passed or failed. None of the tests in which alternative questions were used will be treated in the subsequent discussion of the results.

At the close of the experiment, it appeared that a difference of procedure had existed between A and B in the test of indicating omissions in pictures. A and B both showed the three faces first, and the figure with the arms missing last, according to the standard procedure, but A, if his subjects failed to detect the parts omitted from the faces, would give them another trial after they had detected the missing arms.
A gave this test to 51 boys and 33 girls, B to 30 boys and 30 girls, his subjects averaging about a year and a half above those of A. The test was passed by 76% of A's boys and 97% of A's girls, but by only 60% of B's boys and 6[?]% of B's girls, showing that the difference of procedure had a most striking effect on the results.

It is interesting to note what the effect of a difference of this magnitude would mean if the material from this test were used as a basis of assigning it to the proper "age group" in the scale. If a test is to be considered normal for a given age if it is passed by 75% of the non-selected school children of that age, the test of indicating omissions in pictures would be a "six year" test for A, and an "eight year" test for B. The data from this test will not be treated in the subsequent discussion.

In the analysis of the results of the definitions test, it was found that certain differences existed between A, B and C in scoring the responses of the subjects as superior to use. No estimates could be made concerning D, for he did not record the actual responses. B, C and D gave this test to approximately the same range of subjects, averaging about 9 years. The corrected results of B and C show, in all, 28% of their subjects giving definitions superior to use, while 65% of D's subjects pass this test. Obviously D was very much more lenient than B and C.

The influence of the personal equation may or may not be present in the remaining tests. In the opinion of the writer it is not present to any marked degree. The data of the four experimenters were treated in several ways, and in none of these was it possible to demonstrate this influence. The writer's opinion, however, is more or less certain according to the test.
The tests of repeating digits might show a slight difference between C's results and those of the other experimenters, a difference which could be explained by reference to the rate at which the digits were spoken. The results of experimenter D are slightly lower than those of the other experimenters in the tests of naming 60 words in three minutes and naming rhymes. Whether these differences are real or not, the writer does not know. The data from these tests are included in the subsequent study.

In the subsequent treatment of the results in terms of grade and sex, the material from the following tests will be treated.
VI-2 and IX-2, Defining in terms of use and in terms superior to use.
VII-1, Counting 13 pennies.
VII-2, Describing pictures.
VII-4, Copying diamond.
VII-5, Naming four colors.
VIII-2, Counting backward from 20 to 0.
VIII-3, Enumerating the days of the week.
VIII-4, Counting stamps (three singles and two doubles).
VIII-5, X-3 and XII-1, Repeating 5, 6 and 7 digits.
IX-3, Naming the day and date.
IX-4, Enumerating the months.
X-1, Naming the pieces of money.
X-2, Drawing designs from memory.
X-5 and XI-2, Constructing a sentence, containing one or two ideas, from three given words.
XI-3, Giving 60 words in three minutes.
XI-4, Giving rhymes with "day", "mill" and "spring".

The treatment of the results of the definitions test will be confined to the recorded and corrected definitions of A, B and C. The results from the diamond test are based on the scoring system outlined, the passing mark being the fourth group unless otherwise indicated. The arbitrary point system of scoring the design test is used in the subsequent calculations, the passing mark, unless otherwise noted, being 15 points. The corrected scoring of the sentence tests will be used.

The foregoing study of the effect of the personal equation shows conclusively that in certain tests this influence is present to a very marked degree.
The errors involved may be traced to three sources: to the apparatus used, to the technique of the experimenters in giving the tests, and to the experimenter's observation in marking the test passed or failed.

The error due to apparatus may result from a variation in the material itself, or from the calibration of different sorts of material as equal in difficulty, e.g., alternative questions. The variation in the material used by B and C in the test of reconstructing dissected sentences illustrates the error due to defect in the material. The writer has seen apparatus for the line suggestion test in use, in which the last three pairs of lines were actually unequal, the difference between the pairs being above the threshold of discrimination. The subject with good discrimination will invariably fail this test when this faulty apparatus is used.

The error due to the use of alternative questions is more common and therefore has more practical significance than defects in the material itself. There is a strong temptation for an experimenter, who believes a certain question to be unfair, to substitute another which seems to him to be of the same difficulty. In the study of the Trenton results, which will follow, it will be shown that the different questions included under the same test are not of the same difficulty. The question, "What would you do if you were delayed in going to school?" was passed by practically none of the normal children of 12, 13 and 14. If this question is changed to Goddard's (28) interpretation, "What ought one to do if he is afraid he'll be late for school?", the test is easily within reach of the 12 year children. The difficulty in the first test is caused by the word "delayed". Changing the structure of the test changes its nature completely.
In this connection it is to be regretted that Town (72), in the appendix of her translation of Binet's 1911 scale, has changed the wording of some of the tests from that in the actual body of the translation. For example, the question "What would you do before taking part in an important affair?" (page 47) is changed to "Before taking part in something very important, what would you do?" (page 78), and "Why is a bad action done when one is angry, more excusable than the same action when one is not angry?" (page 47) becomes "Why do we more easily pardon a bad act done in anger than a bad act done without anger?" (page 79). The meaning is the same but the wording different; and in many cases success or failure in a test depends on the interpretation of a single word. If an experimenter using Town's translation were allowed to select his questions from the actual translation or the appendix indiscriminately, variations would, in all probability, result. The general proposition that there is no such thing as an alternative question, i.e., a question involving the same mental processes and having the same difficulty as another, could very easily be maintained. To avoid this error experimenters should adhere strictly to one wording and should never be allowed to substitute one question for another.

An example of the influence due to differences of the technique of the experimenters in giving the tests is afforded by the test of detecting omissions in pictures. This test is a "six year" test for A and an "eight year" test for B. Differences in procedure make it very difficult if not impossible to compare the results of one investigator with those of another. To eliminate this error, very careful and minute instructions should be published for the giving of each test. No edition of the Binet-Simon scale is entirely satisfactory in this particular.
Examples of errors due to the observation of the experimenters are afforded by the tests of copying a diamond and defining in terms superior to use. Errors due to observation may be avoided or minimized by increasing the number of grades of response with which the particular response in question may be compared. This principle is followed by Yerkes (82) in the arrangement of the Point Scale. In the diamond test, for example, Yerkes allows three grades of response while Binet allows but two, plus or minus. The accuracy of any measure increases with the number of gradations on the measuring scale, and the significance of the error of observation is diminished by decreasing the chances of wide displacement. In the tests in which a definite question is put to the subject, uniformity of scoring may be obtained by an accurate and painstaking cataloguing, and a subsequent classification and weighting of all the responses of a large number of subjects to each question. If the responses to a free association test may be classified into a relatively small number of groups, then the responses to a restricted association test could be classified into a much smaller number of groups. A sufficiently large number of responses will include practically all possible responses. In this way the chances of the error due to observation are diminished, while the adoption of a point system of scoring will minimize the effect of any errors that might be made.

The differences between the experimenters in this study are large enough to demonstrate the influence of the personal equation. Scientific procedure demands that the investigator who studies the results of the individual tests for the purpose of analysing the factors involved or for obtaining age norms should demonstrate that the effect of the personal equation is not present in the results treated. The burden of proof should be on the person who maintains that the influence is not present.
Negative results concerning the influence of the personal equation that are based on the method of comparing the total scores or "mental ages" of different experimenters should not be taken as conclusive, inasmuch as the experimenters may deviate in one direction in one test, and in the opposite direction in another, so that in a total score these deviations might equalize. In a study of this sort made on the basis of "mental ages," which has previously been reported, the writer (14) found no deviations between B, C and D, while deviations between these three experimenters do appear in the more detailed study of the individual tests. Studies of the individual tests can have no claim to reliability unless the personal equation has been eliminated.

The importance of the personal equation as a source of error in making diagnoses on the basis of the "mental age" of the subject is universally recognized by psychologists and almost universally ignored by medical men, field workers, school teachers and others who have had no experience in making mental measurements. Among psychologists there are two opinions concerning the solution of the difficulty arising from this source: the first, that of making certain allowances for the inexpert examiners or establishing limits within which their opinions are valid; the second, that of removing the scale from their hands entirely.

Doll (22), in discussing criticisms of the Binet scale on the ground that diagnoses of normality and feeble-mindedness are made by inexpert examiners, urges "that those who are capable of doing good Binet testing of the mechanical sort without being clinical psychologists should report the findings of their examinations of children or groups in tables of related chronological and mental ages and not in terms of normality or abnormality.
In their reports they can say with a high degree of certainty that those children who show an intellectual retardation of more than 3 years are feeble-minded, but they should not say that those who test less than 3 years retarded are backward or normal. In the lesser degrees of retardation only the expert is capable of evaluating the details of a Binet test with any finality as to either diagnosis or prognosis." (page 607). Doll also points out that Binet examiners who have worked in institutions give very reliable diagnoses, for they intuitively sense distinctions which inexpert laymen do not see. When the responsibility for the diagnosis is placed on the examiner in this way, the scale is treated as a qualitative instrument. This standpoint is quite different from that in which certain allowances are made for all inexpert examiners and the quantitative character of the scale preserved.

Goddard (31), in a study of the personal equation based on re-testings of normal and feeble-minded individuals, fixes the quantitative limits somewhat higher. "In all cases where a child tests four or more years behind his age, there is little danger of error in considering him feeble-minded, even though the test was made by a person who was not highly expert, provided such a person is able to use the test with reasonable intelligence. With the borderline cases, those who are two or three years backward, the best expert should be employed in the testing." (pages 76-77).

As early as 1910, before the scale had received very extensive application, Huey (35) took the stand that inexpert examiners should not use the scale. In discussing this point he said, "I would urge that these Binet tests must be used with judgment and trained intelligence, or they will certainly bring themselves and their authors into undeserved disrepute.
-- Results can be considered valid only when the tests are made by an experienced psychologist who has familiarized himself with Binet's directions, or by other competent persons who apply the tests under the direction and supervision of such a psychologist." (page 444).

Three years later, in referring to the reports that the medical inspectors in Pittsburgh were to take over the Binet testing in the schools, Whipple (78) says, "And we can only express our hopes that these reports are unfounded, or that at least those in authority may be led to understand that for a person, whoever he may be, without extensive psychological training to attempt to diagnose the precise mental status of a school child is about as absurd as for a mere psychologist to attempt to diagnose incipient tuberculosis or any other obscure pathological condition." (page 302). The same position is taken by Whipple (77) in another editorial. "We have no quarrel with the use of the scale in the public school: properly used, it is of direct and practical value; but improperly used, it will become a farce which can but bring discredit upon psychology and retard the movement for its application to educational practise." (Page 119). In defense of this position, Whipple calls attention to an error inherent in the procedure of all inexpert examiners. "There is nothing about the conduct of the Binet-Simon tests that is intrinsically difficult, yet there is a source of error inherent in the use of any psychological procedure, which, as experience shows, is surmountable only by drill in psychological experimentation. I refer to the difficulty of following directions. No one who has drilled students in the laboratory has failed to be struck with the impossibility of laying down fool-proof directions for the conduct by an amateur of a psychological test." (Page 119).

Kuhlmann (43) agrees with Whipple in this position. "The untrained examiner meets difficulties because he lacks the following: (a) Familiarity with the directions for giving the tests, (b) Familiarity with the rules for interpreting the responses of the children, (c) Ability to adapt the procedure in testing in special instances for which directions can not be given, (d) Ability to interpret responses in special instances for which rules can not be given, (e) Ability to adapt himself in attitude to the mental levels of children of different ages so as to obtain the best efforts from the child in each case, (f) General appreciation of the absolute necessity of adhering strictly to all rules of testing, and of careful, painstaking work. These deficiencies are of quite different degrees of importance. The last is, on the whole, the most serious and most frequent, and can be remedied only by extended laboratory training." (Pp. 255 and 256). In regard to the quantitative allowance that must be made for inexpert examiners, Kuhlmann's article affords the following, "The amount of error made by an examiner because of his lack of training seldom equals two years in the mental age; in the majority of cases it is less than one year." (Page 256).

IV. GRADE CORRELATIONS

The correlation between intelligence, as measured by the Binet scale, and school performance, as measured by age and grade standing, has been worked out by various investigators. In all cases intelligence was measured by the "mental age" or total score of the Binet tests, and pedagogical age by assuming that all children begin school at a certain age and should therefore be in certain grades at certain ages. Stern (62) has reviewed the work of Goddard (30), Binet (4), and Bobertag (10) in this field, with the general conclusion that the correlation is only moderately high. The number of children showing mental advance is in excess of those showing pedagogical advance, but very rarely do children showing pedagogical retardation show mental advance.
The correlation is one-sided in that "inference from school performance to mental ability is safer than from mental ability to school performance." (Page 61). Stern accounts for the discrepancies on the ground that "performance in the school depends not only upon intelligence, but also upon certain other and quite different factors." (Page 63). These factors are strength of memory, which plays a large part in school performance but correlates only to a moderate degree with intelligence, and other factors that have nothing to do with intellect but belong largely in the domain of the will: "the degree and duration of attention, industry and conscientiousness, sense of duty and capacity to fit into the social group." (Page 63). Stern concludes that "the lack of agreement between tests of intelligence and school performance is really calculated to increase our confidence in the psychological test-methods," (Page 64) that absolute correlation is not to be desired since that would mean that the tests were testing school performance only, and that the measure of intellectual ability was the school performance itself, the tests being superfluous.

More recently, Schmitt (57) has reviewed the work of Goddard, Terman and Childs (66) and Dougherty (23) in correlating intelligence, as measured by the Binet scale, and school performance, and reaches conclusions quite opposite to those of Stern. The following quotations from Schmitt's monograph explain her viewpoint. "Further doubt is cast upon the accuracy of the tests by the fact that judgments arrived at through their application do not coincide with that of the school concerning the same subjects." (Page 57). Concerning this lack of correlation Schmitt writes, "The Binet tests, therefore, while professing to test native ability are concerned very little with the education which all normal children have the native ability to acquire, and which is of much importance in civilized life." (Page 60).
To the investigations cited Schmitt has added one of her own, in which the lack of correspondence between the Binet "mental age" and school grade is shown.

The writer is of the opinion that the method of correlating school performance with "mental age" fails to demonstrate either the adequacy of the Binet tests according to Stern, or the complete inadequacy of the tests according to Schmitt. For the demonstration of this point Schmitt's investigation may be discussed, inasmuch as it shows the most striking deviations between the measures of the two performances. Schmitt applied Binet's 1911 scale (Town's translations with modifications) to 150 children of superior social status. The following quotations indicate the status of the subjects. "The children who served as subjects for the tests comprised the Kindergarten and first six grades of a private school in Chicago." "They were the children of the professional class mainly. A few were children of successful business men who sought the best obtainable type of education for their children." (Page 2). The tests were applied at the close of an examination with the Healy-Fernald tests under rather unfavorable conditions, as indicated by the following quotations: "In the conduct of the two sets of tests the Binet-Simon tests were reserved for the last. By the time they were reached the child had been doing tests for an hour or more. In some cases there was too much restlessness and fatigue to carry the child as far as the majority of his comrades in his grade were able to go and the tests were then discontinued." (Pages 68 and 69). The tests in the various age groups given to each grade were as follows: Kindergarten, tests for V, VI, VII, VIII and IX years; Grade I, tests for V, VI, VII, VIII, IX, X and XII years; Grade II, tests for VI, VII, VIII, IX, X, and XII years;
Grades III and IV, tests for VIII, IX, X and XII years; Grade V, tests for IX, X, XII and XV years; Grade VI, tests for XII and XV years. The "Adult" tests were also given to Grade VI as a class-room test.

Schmitt compared three measures: chronological age, school grade age and "mental age". The "mental age", in case a subject passed all tests in one group and failed one or more in a lower group, could be reckoned from two basal ages, these alternative ratings being included by Schmitt. The summary of the results is as follows. Comparing the Binet age to the chronological age, 14 (or 20)% are retarded, 26 (or 24)% are normal, and 58 (or 54)% are advanced. Comparing the school grade to the chronological age (using 5 to 6.5 years as the normal age for the Kindergarten, 6.5 to 7.5 for Grade I, etc.), 38% are retarded, 56% are normal and 4% are advanced. Comparing the Binet age to the school grade age, 2 (or 4)% are retarded, 25 (or 35)% are normal and 72 (or 60)% are advanced. The essential discrepancies are indicated by Schmitt by the following: "Where the school grading shows 4% advanced over the normal for the chronological age, the Binet grading shows 58% over the chronological age and 72% over the age normal to the school grade." (Page 80.)

The discrepancies thus indicated, although much larger than those of other investigators, agree with the general trend of results in that more children are shown to be advanced according to the Binet mental age than according to the school grade age. The results disagree with those of other investigators in finding a higher per cent. advanced by Binet age compared to chronological age.

The inadequacy of the methods employed in the investigations of Schmitt and others is seen when the measures are separately studied.
The use of the normal grade age as a measure of scholastic ability is false inasmuch as it rests on the assumption that all children enter school at a certain age, which is not the case. The measure of scholastic ability is the measure of the child's reaction to the subject matter of the grades, and that measure may be expressed only in the fact of promotion, non-promotion or (very rarely) double promotion; in other words, it may be expressed only in the relation of grade to the length of time in school. Furthermore, the two measures of scholastic ability, the age in grade method and the grade progress method, are measures of an historically past performance, not of present possibilities, and the true measure of an ability must indicate potential ability. As measures of scholastic ability in terms of actual reaction, these measures present a distribution of general ability that is skewed toward the lower end, or in the direction of no ability. If a child enters school late, he presents a picture of retardation according to the age and grade method, while through any number of causes independent of intellectual ability, a child may present a retardation of at least a year according to either method. The possibilities for advancement are not as great, however, for advancement means forcing a child through a mass of subject matter, a process which the school is generally unwilling to undertake and the parent is generally unwilling to sanction. The school therefore presents a picture of ability in which promotion is normal, and non-promotion far more frequent than advance. If general ability is to be considered as distributed over any sort of a frequency surface, that surface will not take the form presented by the school measure in which the modal ability is almost completely the upper limit.

The measure of "mental age" has been shown to be one which varies from one chronological age to another in the form of its distribution.
Normal children of 6 or 7 test over age, while those of 11 and 12 test under age. This abnormal distribution is due to two facts. In the first place, the tests in the younger years are too easy and those in the higher years are too difficult. In the second place, the younger children have a wider range of tests beyond their average ability, so that exceptional subjects may display exceptional ability in a manner that is impossible if ability is measured by school progress, while older children have only a few tests within their range, the picture of advancement being excluded as in the measure of school ability. If the mental ages of a run of subjects of different chronological ages are combined, the frequency surface is normal, the error of the extremities balancing. The investigators who have compared "mental age" with grade age have compared two distributions, one of which is markedly skewed, the other normal, but false. The resulting finding of mental advance in excess of pedagogical advance has significance only insofar as it shows that a measure of general ability that will admit of exceptionally high performance is a better measure than one that precludes the possibility of such performance. The only significant finding is that pupils who show marked retardation in school rarely if ever show mental advance.

Applying the foregoing discussion to Schmitt's results in particular, all that has been said concerning the inadequacy of the age in grade method applies to her results. The age for entering school being 5, none of the subjects in the Kindergarten could be advanced, while those who entered late would be retarded. It is difficult to see how these young children would be able to make up their work in such a way as to show advance during the first two or three school years. The normal age for the sixth grade is from 11.5 to 12.5 years.
Inasmuch as no grades were tested above VI, none of the 37 subjects from 11.5 to 14.5 could show an advance, and all of the 19 subjects from 12.5 to 14.5 would necessarily show retardation. Schmitt's results differ from those of other investigators in finding more subjects advanced according to Binet age in relation to chronological age. This deviation is probably due to the fact that she examined a superior selection of subjects, and to the fact that the XV year and "Adult" tests were used, so that the older subjects, who in general fall below their chronological age, had an opportunity to better their scores. The discrepancy shown by Schmitt between school standing and the Binet tests does not demonstrate the inadequacy of the tests.

The final demonstration of a correlation between the Binet scale and school grade rests not in comparing the total score or "mental age" with school grade, for that is susceptible to the errors of over-estimation and under-estimation according to varying chronological age, but in comparing the results of subjects in each grade on the individual tests. The tests may vary in their correlation with grade. Inasmuch as there is a general growth in age with grade, and a corresponding growth of intelligence with age, a test, in order to be an adequate test of intelligence, must show a correlation with grade. If the correlation is too high, however, the value of the individual test is in question, for it would then be testing, not intelligence, but grade training. This criterion was actually used, though not stated, by Binet in his discussion of the results of Decroly and Degand (19), and in his revision of the 1908 scale, in which many of the tests that he considered to relate to school training were eliminated.

Studies of the individual tests in the light of school grade are not available.
Decroly and Degand published in 1910 the results of an investigation on 45 children in a Brussels school, similar in character to that studied by Schmitt in Chicago. Binet discussed these results and those of other minor investigations in the Paris schools in considering the effect of environment on the results of the tests. Although he referred to school training as a factor, and classified the tests in which Decroly and Degand's subjects were superior, he gave no quantitative demonstration of the effect of this factor. The results of Decroly and Degand are based on too few subjects to admit of quantitative treatment. Chotzen (18) studied the tests by comparing the performance of feeble-minded individuals of the same mental age but of different chronological age. Although this method shows the effect of environment and maturity on feeble-minded individuals, it does not bear directly on the factor of school training. The foregoing investigations will be discussed in this chapter only in their relation to the results of the particular tests. Schmitt, in her monograph, published tables showing the reaction of each subject in each grade to each test, the tables being discussed in the text. Although it was not Schmitt's purpose to determine the correlations between the various tests and grade, her data are available for a study of this sort, and the writer has taken the liberty of figuring them in this light, indicating at the same time Schmitt's interpretation of the grade factor, contained in the accompanying text. These data will be compared with the results of the Princeton investigation.

422 subjects of this investigation were distributed in the kindergarten, first six regular grades, minus grades and the special class of the Princeton Model School. 301 of the subjects (161 boys and 140 girls) were in the kindergarten and first six regular grades.
The data obtained from the examination of these 301 subjects were classified according to the grade in which the subjects were found, and the percentage of the subjects of each grade who passed each test was calculated. Only those tests were studied which showed themselves to be free from the influence of the personal equation of the four experimenters. The elimination of the unrecorded results of the definitions test left a number of cases too small to be studied. To avoid the influence of the error due to incomplete data, the writer has calculated the percentage from only those tests that were given from 75% to 100% of the possible number of times.

The data from the tests of repeating 5, 6 and 7 digits have been combined into one weighted measure. The procedure of the experimenters in giving these tests was to start within the subject's range and continue till he failed. If 5 digits were successfully repeated, 6 were given, and if these were passed, 7 were given. The results have been combined into one measure for the sake of simplicity, 1 point being allowed for the successful repetition of 5 digits, 2 points for 6 digits and 3 points for 7 digits, the weighting being roughly in accordance with the weighting in Goddard's scale, the tests being in the age groups VIII, X and XII respectively. The measure of the ability of a group to repeat digits is the per cent. that the number of points scored is of the number of points possible (i.e., 6 times the number of subjects in the group).

The number of subjects in each grade (boys and girls shown separately), the average age of the subjects in each grade, together with the mean variation from the average are shown in Table 3.

TABLE 3
Number of Boys and Girls in Each Grade, and the Average Age of All Subjects in Each Grade.

Grade           Number    Number     Total No.     Average       Mean Variation
                of Boys   of Girls   of Subjects   Age (years)   (years)
Kindergarten       20        12          32            5.64          0.46
Grade I            27        24          51            7.05          0.50
Grade II           16        24          40            8.16          0.65
Grade III          21        24          45            9.31          0.75
Grade IV           20        15          35           10.46          0.91
Grade V            24        25          49           11.71          0.99
Grade VI           33        16          49           12.81          1.06

The above table shows an increase of a year or more (actually from 1.10 years to 1.41 years) in the average age of the subjects from each grade to the next. From this it is reasonable to expect that there is a general growth in intelligence correlating with this increase in age, or, in other words, to expect a correlation between the results of the individual tests and the grade in which the performance occurred. If the correlation is too high, it will indicate a dependence of that particular test on the subject matter of the grade. In Table 4 are shown the percentages of the subjects in each grade who passed each test. The notes referred to in the margin contain the proportions passed for all other subjects for whom the percentages are not given, the percentages being given only for those groups to whom the tests were given from 75% to 100% of the possible number of times.

TABLE 4
Percentage that Subjects in Each Grade Passed Each Test. 301 Subjects.

                                        Grades
Test                          K     I    II   III    IV     V    VI   Note
VII-1, 13 pennies            72    96   100    ..    ..    ..    ..      1
VII-2, Pictures              69    96    94    ..    ..    ..    ..      2
VII-4, Diamond               46    75    88    ..    ..    ..    ..      3
VII-5, Colors                72    90    97    ..    ..    ..    ..      1
VIII-2, 20 to 0              ..     9    53    80    ..    ..    ..      4
VIII-4, Stamps               ..    13    50    78    ..    ..    ..      5
All digits (combined)         6    21    42    51    55    78    75
VIII-3, Days of week         16    45    90   100    ..    ..    ..      6
IX-3, Date                   ..     5    35    96   100    ..    ..      7
IX-4, Months                 ..    ..    28    84    90    ..    ..      8
X-1, Money                   ..    ..    20    36    57    82    ..      9
X-2, Designs                 ..    ..    ..    21    37    42    66     10
X-5, Sentence (2 ideas)      ..    ..    ..    67    89    88    98     11
XI-2, Sentence (1 idea)      ..    ..    ..    22    46    51    74     12
XI-3, 60 words               ..    ..    ..    ..    62    61    87     13
XI-4, Rhymes                 ..    ..    ..    ..    67    63    76     14

Note 1. Counting 13 pennies and naming colors given 20 times above II. Not failed.
Note 2. Describing pictures given 21 times above II. Not failed.
Note 3. Copying diamond given 25 times above II. Not failed.
Note 4. Counting from 20 to 0 given 18 times in K. Not passed. Given 31 times above III. Failed once.
Note 5. Counting stamps given 15 times in K. Not passed. Given 35 times above III. Failed 3 times.
Note 6. Naming days of week. Given 32 times above III. Not failed.
Note 7. Giving day and date given 5 times in K. Not passed. Given 56 times above IV. Not failed.
Note 8. Naming months. Given 26 times below II. Passed twice. Given 44 times above IV. Failed twice.
Note 9. Naming money. Given 26 times below II. Passed 3 times. Given 28 times in VI. Failed twice.
Note 10. Copying designs given 33 times below III. Passed 5 times.
Note 11. Sentence (2 ideas) given 32 times below III. Passed 12 times.
Note 12. Sentence (1 idea) given 32 times below III. Passed 4 times.
Note 13. Giving 60 words given 53 times below IV. Passed 19 times.
Note 14. Giving rhymes given 42 times below IV. Passed 26 times.

A study of Table 4 shows that the tests in general correlate with grade. The combined score of the test of repeating digits, for example, shows a growth from 6% to 78%, more rapid in the first three grades than in the last four. The tests vary in the number of grades taken to reach their maximum. The test of naming the day and date, for example, is failed by all subjects in the kindergarten, 95% of Grade I and 65% of Grade II, while only 4% of the subjects in Grade III and none of those in the higher grades fail it. A sudden increase occurs between Grades II and III, showing possibly the influence of grade training.

The tests vary considerably in the degree of their correlation. An easily obtained measure of the degree of correlation is that of comparing the magnitude of the increases from grade to grade.
For example, there is an increase of 61% (96% - 35%) from Grade II to Grade III in the ability to pass the test of giving the day and date, and an increase of 16% (36% - 20%) between the same grades in the test of naming the pieces of money. The former test correlates higher with the influence of grade in this particular case than the latter.

In this manner the percentage difference between the performance of the subjects in each grade and that of the subjects in the preceding grade was obtained. All the increases or decreases in ability from one grade to another were thus obtained, these values serving as measures of the amount of correlation between the tests and the grades. 42 differences between the performance of the subjects in any grade and those of the next succeeding grade were thus obtained. In 4 cases there were actual decreases of 1, 2, 3 and 4% which were not significant. The differences ranged from -4% to +61%, the median being +19.5% (Q=16.25%). Some of the differences between the grades might be due to the chance superiority of a particular grade. To overcome this chance variation, and to furnish another index of the growth of the various abilities, the differences were calculated by steps of two grades, i.e., subtracting the performance of the kindergarten from the second grade, the first from the third, etc. In this way, 26 differences were obtained varying from +9% to +91%, the median being +29% (Q=18%).

Some of the differences noted are undoubtedly high enough to warrant the assumption of the effect of grade training on the tests. Just what tests show this effect is probably a matter of opinion. Allowance must be made for the growth of an ability independent of training. The highest 25% of the increases from one grade to another were selected as being worthy of special consideration at least. A larger increase must be allowed between two grades.
Those differences were considered worthy of special consideration that exceeded twice the value of the median of the one-grade differences, or 39%. This manner of selecting the largest differences is quite arbitrary, but is justified by the outcome, for the tests that show the most significant increases according to this method show those increases in more than one step, so that the evidence is concentrated against a very few tests. In this way the significant values outweigh the less significant values and fair allowance is made for growth from one grade to another.

The following list includes the tests showing the greatest increases by one-grade and two-grade steps, together with the magnitude of the increases and the grades between which they occur.

One-grade steps. 25% of largest increases.
+61% Date, II to III
+56% Months, II to III
+45% Days, I to II
+44% 20 to 0, I to II
+37% Stamps, I to II
+30% Date, I to II
+29% Diamond, K to I
+29% Days, K to I
+28% Stamps, II to III
+27% 20 to 0, II to III
+27% Pictures, K to I

Two-grade steps. Increases greater than 39%.
+91% Date, I to III
+74% Days, K to II
+71% 20 to 0, I to III
+65% Stamps, I to III
+65% Date, II to IV
+62% Months, II to IV
+55% Days, I to III
+46% Money, III to IV
+42% Diamond, K to II

The above lists of increases are confined to but 8 tests. In all, there were 16 tests studied. According to the method of selecting the significant increases, 20 such values actually appeared. In this manner the evidence combines against a very few tests. Some tests appear in both lists and more than once in the same list. The most striking growth with grade is shown in the tests of giving the day and date, naming the months, naming the days of the week, counting from 20 to 0 and counting stamps. The tests of copying the diamond, describing pictures and naming money may or may not show this influence.
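The selection by one-grade and two-grade steps can be sketched in a few lines of Python. This is an illustrative sketch only, not the writer's own computation: the function names are hypothetical, the pass percentages for the day-and-date test are inferred from the failure rates quoted earlier (100, 95, 65, 4 and 0 per cent. failing in K through IV), and Q is assumed to be the quartile deviation, that is, half the distance between the quartiles.

```python
from statistics import quantiles


def step_differences(pass_pct, step=1):
    """Per cent. passed in each grade minus that in the grade `step` below it."""
    return [pass_pct[i + step] - pass_pct[i] for i in range(len(pass_pct) - step)]


def quartile_deviation(values):
    """Q, assumed here to be half the distance between the first and third quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2


# Day-and-date test, per cent. passing in K, I, II, III, IV
# (assumption: inferred from the failure rates quoted in the text).
date_test = [0, 5, 35, 96, 100]

one_grade = step_differences(date_test, step=1)  # K-I, I-II, II-III, III-IV
two_grade = step_differences(date_test, step=2)  # K-II, I-III, II-IV

# The text takes twice the median of the one-grade differences (19.5%)
# as the limit for the two-grade differences.
ONE_GRADE_MEDIAN = 19.5
two_grade_cutoff = 2 * ONE_GRADE_MEDIAN

selected = [d for d in two_grade if d > two_grade_cutoff]

print("one-grade differences:", one_grade)
print("two-grade differences:", two_grade)
print("two-grade cutoff:", two_grade_cutoff)
print("selected:", selected)
```

Under these assumptions the two differences that survive the cutoff are the Date entries of the two-grade list (I to III and II to IV); the K to II difference falls below the limit.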
The evidence is strongest in the case of the diamond test since that appears in both lists.

The foregoing method of selecting those tests which correlate with grade to such an extent as to indicate the influence of grade training is not conclusive, owing to the fact that there is also an increase in age from grade to grade. If a test showed a very rapid growth with age, and those ages fell for the most part in certain grades, then those grades would show an increase which might be wrongly assumed to be due to training. The test of counting from 20 to 0 is a case in point. Yerkes (82) in Table 32, page 125, gives the percentage values for each test in the Point Scale, for English speaking boys and girls of each age. The test, of the twenty one tests included, that shows the most marked increase with age is that of counting backward, the values being as follows: age 4 = 0%; age 5 = 3.5%; age 6 = 23.7%; age 7 = 45.7%; age 8 = 72.2%; age 9 = 96%; the values for ages above 9 being 97% or higher.

The age in grade distribution of the 301 subjects in this investigation is given in Table 5.

TABLE 5
Distribution of Subjects in Each Grade according to Chronological Age.

Grades: K, I, II, III, IV, V, VI; Total. The entries of each row are given in the order printed, from the lowest grade to the highest.

Age 4:   4; total 4
Age 5:   17; total 17
Age 6:   11, 28, 2; total 41
Age 7:   18, 17, 2, 1; total 38
Age 8:   4, 15, 18, 1; total 38
Age 9:   5, 13, 11; total 29
Age 10:  1, 10, 14, 18, 1; total 44
Age 11:  1, 2, 3, 16, 16; total 38
Age 12:  5, 8, 11; total 24
Age 13:  4, 12; total 16
Age 14:  2, 5; total 7
Age 15:  1, 3; total 4
Age 16:  1; total 1
Total (K to VI):  32, 51, 40, 45, 35, 49, 49; total 301

The rapid growth of the ability in counting from 20 to 0, according to the method of comparing the subjects in each grade, was from 9% in Grade I to 80% in Grade III. From Table 5 it may be seen that practically all (89%) of the chronological ages in Grades I, II and III were distributed in the ages 6, 7, 8 and 9, a chronological range coinciding with that in which Yerkes' results show the ability to develop. The growth of this ability might be due then either to age or to grade.
For this reason, to arrive at any final conclusion, it is necessary to compare the subjects of the same age but in different grades. The treatment of the Princeton results according to this method follows, but the analysis of the data in this manner can have no great reliability owing to the small number of subjects in each group. The number of subjects in each group (boys and girls shown separately), the average age and mean variation from this average are shown in Table 6.

TABLE 6
Number of Boys and Girls of Similar Ages in Different Grades, and the Average Age of the Subjects of Similar Ages in Each Grade.

Grade          Age   Number    Number     Total no.     Average   Mean
                     of Boys   of Girls   of Subjects   Age       Variation
Kindergarten    5      11         6          17          5.48      0.20
Kindergarten    6       8         3          11          6.26      0.21
Grade I         6      14        14          28          6.59      0.17
Grade I         7       9         9          18          7.36      0.22
Grade II        7       7        10          17          7.56      0.24
Grade II        8       6         9          15          8.39      0.24
Grade III       8       8        10          18          8.60      0.22
Grade III       9       5         8          13          9.43      0.16
Grade IV        9       5         6          11          9.65      0.13
Grade IV       10      10         4          14         10.39      0.30
Grade V        10       7        11          18         10.54      0.25
Grade V        11      10         6          16         11.54      0.22
Grade VI       11      10         6          16         11.53      0.26
Grade VI       12       6         5          11         12.52      0.14

All chronological ages were computed in tenths of a year, so that a variation in age from 0.1 yr. to 0.9 yr. is possible within each age group.

TABLE 7
[Per cent. of the subjects in each grade-age group passing each test. The body of the table is not recoverable.]

Note 1. Tests of counting 13 pennies, describing pictures and naming colors each given 12 times above II-8. No failures.
Note 2. Copying diamond given 15 times above II-8. No failures.
Note 3. Counting from 20 to 0 given 16 times below I-6. Not passed. Given 31 times above III-8. Failed 4 times.
Note 4. Counting stamps given 14 times below I-6. Not passed. Given 32 times above III-8. Failed 4 times.
Note 5. Giving days of week given 32 times above III-8. No failures.
Note 6. Giving date given 39 times below II-7. Passed twice. Given 36 times above IV-10. No failures.
Note 7. Naming months given 24 times below II-7. Passed twice. Given 57 times above IV-9. Failed 4 times.
Note 8. Naming pieces of money given 35 times below II-8. Passed 4 times. Given 14 times above V-11. Failed twice.
Note 9. Copying designs given 26 times below III-8. Passed 5 times. Given 15 times above V-11. Failed 6 times.
Note 10. Three words in sentence, 2 ideas, given 24 times below III-8. Passed 9 times.
Note 11. Sentence, 1 idea, given same as 2. Passed 3 times.
Note 12. 60 words in 3 minutes given 41 times below IV-9. Passed 10 times.
Note 13. Giving rhymes given 37 times below IV-10. Passed 25 times.

That the subjects of the "same" age but in different grades are not exactly the same is shown in Table 6. The subjects of each age in the higher grades average from 0.01 yr. to 0.33 yr. different, with an average superiority of 0.19 yr. This difference, however, is about one fourth that between the subjects of different ages in the same grades, and may be called the same for practical purposes. For convenience, the groups will be referred to as K-5, II-7, etc., the first member referring to the grade, the second to the age. K-5 would mean the group of 5 year children in the kindergarten, II-7 the 7 year subjects in Grade II, etc. The actual per cent. that the subjects in each group passed each test was calculated and is shown in Table 7.
Unless otherwise noted, the percentages are based on tests given 75% to 100% of the possible number of times. Some of the groups from which results were obtained are too small to have great reliability, but the method is at least suggestive. The results of 14 groups are given. It is possible then to compare the results of subjects of 6 ages (6, 7, 8, 9, 10 and 11) that are in different grades, and also to compare subjects in all seven grades that are of different ages, and in this way to determine whether the dominating factor in the growth of any ability is that of grade or age. The reliability of the method rests only on its connection with that of the first method employed.

In answer to the question of whether the growth of ability in the test of counting from 20 to 0 is due to age or grade, a question which was unanswered by the first method, we may turn to the results shown in Table 7, in which the subjects of each age in each grade are shown. The test of counting from 20 to 0 was not passed by any of the 5 and 6 year subjects in the kindergarten. Comparing first the subjects of different ages in the same grade, the 7 year subjects in Grade I are 16% lower than the 6 year subjects in that grade, and the 8 year subjects in Grade II are 20% lower than the 7 year subjects in the same grade, the older subjects making a lower record in each case. Comparing the performance of the subjects of the same age but in different grades, the 7 year subjects in Grade II are 63% ahead of the subjects of the same age in Grade I, while the 8 year subjects in Grade III are 40%¹ ahead of the subjects of the same age in Grade II. Allowing for the retrogression of the older subjects in each group, i.e. assuming that they should have done equally as well as the younger subjects in the same grade, the groups in Grades II and III are still 47% and 20% ahead of the subjects in the grades lower. The growth of ability in this test would therefore appear to be due to grade training.
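The allowance made in the preceding paragraph amounts to a simple subtraction, which may be set out as follows. The sketch is only an illustration: the dictionary and function names are hypothetical, and the figures are the differences quoted in the text for the test of counting from 20 to 0, not values read from Table 7.

```python
# Differences quoted in the text for counting from 20 to 0, in per cent.
# Older group minus younger group within the same grade:
age_change_within_grade = {"I": -16, "II": -20}
# Higher grade minus lower grade, for subjects of the same age:
grade_gain_same_age = {("I", "II"): 63, ("II", "III"): 40}


def gain_after_allowance(lower, higher):
    """Grade gain measured against the younger group of the lower grade.

    The older subjects of the lower grade are assumed to have done as well
    as the younger ones, so their retrogression is taken off the gain.
    """
    return grade_gain_same_age[(lower, higher)] + age_change_within_grade[lower]


for lower, higher in grade_gain_same_age:
    print(f"Grade {lower} to Grade {higher}: {gain_after_allowance(lower, higher)}%")
```

The two results agree with the 47% and 20% stated in the text, which is all the sketch is meant to show.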
The growth of ability in this test would therefore appear to be due to grade training. A rapid growth of ability in the test of counting stamps oc- curred between Grades I and III (37% I-II-f28% 11-111=65% I-III), according to the first method, so that the same question arises as in the test of counting from 20 to o. The test was not passed below group 1-6. No growth with age is shown between iThis test was given to but 66% of the subjects in III-8, the experimenters assuming that the other 34% would pass. The score given, 85%, therefore represents the ability of the lowest selection of III-8 subjects, or the most conservative estimate of the ability of the whole group. The same applies to the other tests in III-8 given 66% and 72% of the time. In this way the hypothesis that the tests are not influenced by grade training is given the benefit of the doubt. VARIABLE FACTORS IN THE BINET TESTS 53 1-6 and I-7, but a growth of 31% appears between II-7 and II-8. A growth with grade of 17% is shown from I-7 to II-7 and of 25% from II-8 to III-8. This test shows therefore the operation of the two factors of age and grade training. The improvement in abihty in the tests of counting 13 pen- nies, describing pictures and naming colors, that was indicated between the kindergarten and Grade I by the first method, would refer to age rather than grade, for a greater increase in each test is indicated between K-5 and K-6 than between K-6 and 1-6. Above 1-6 these abilities are completely developed. It could be maintained that these tests are so completely within the ability of the groups that the effect of train- ing would not be indicated. The test that is best adapted to show the influence of any factor on a group is one that is well within the ability of the group — the influence of the factor will be obscured if the measure is either too easy or too difficult. 
The test of copying the diamond is a case in point and one well worth study, for it has been attributed to the effect of training by various authors. All the reproductions of the diamond had been scored according to the arbitrary system outlined in the previous discussion of the personal equation. A control on the factor of difficulty was obtained by raising or lowering the passing mark in this test. The percentage passed was calculated for each group for each of the 5 possible passing marks. The relations indicated in Table 7, where the passing mark is Group IV, were not changed by this process of raising or lowering the passing mark. In all cases the influence of age was shown between groups I-6 and I-7, and the influence of grade shown between groups K-6 and I-6. The test was given to but 59% of the K-5 group, the experimenter assuming that the other 41% would fail, so that the percentages calculated represent the performance of the best selection of K-5 subjects, or, in other words, the benefit of the doubt is given to the hypothesis that the test is influenced by grade training. If the other members of K-5 had failed according to the experimenter's assumption (and this assumption was quite justified, for some had failed to draw the square), 29% of the group would have passed instead of 50%. The influence of age indicated in this test is as great if not greater than that due to training.

The test of repeating digits, scored by the weighting system previously described, exhibits a slow but uniform progress throughout, the older subjects in each group making records that are about the same or slightly lower than those of the younger subjects in the same grade, an increase showing fairly regularly from grade to grade.
The most marked increase in this ability appears between K-6 and I-6, and between I-7 and II-7, possibly indicating that the lack of familiarity with the use of digits in the lowest grades interferes with this test as a measure of auditory memory.

The test of naming the days of the week shows the most marked improvement with age (40%) from K-5 to K-6, practically no improvement (10%) from K-6 to I-6, no improvement from I-6 to I-7, a very marked increase with grade from I-7 to II-7, a drop from II-7 to II-8, group III-8 marking the complete development of the ability. The growth in this test would appear to be due to the combined effect of age and grade. The tests of giving the day and date and naming the months are passed only twice in the kindergarten and first grade, by about a quarter of the subjects in II-7 and II-8 without age increase, while the subjects in III-8 show a most marked increase due to grade. Above III-8 these tests are seldom failed. The test of naming the pieces of money shows a slow growth from 8 to 11, the largest increases appearing from III-9 to IV-9 and from IV-10 to V-10, improvement with grade in each case. Copying the designs from memory shows a growth of 26% from 8 to 11, the development occurring in two age steps, from IV-9 to IV-10 and from V-10 to V-11.

The growth with age cannot be determined in the tests of constructing sentences from three given words, because they were given to too few cases below the third grade. The results do not show whether III-8 is exceptionally high or III-9 exceptionally low. Both tests show decreases in ability from III-8 to III-9 and from V-10 to V-11. The ability in the easier test is well within the range of the third and higher grades, showing, therefore, no improvement. The improvement in the second test develops from 33% to 80% in three steps, correlating with Grades IV, V and VI in each case.
The most vital question, that of determining whether or not the language training in the third grade helps to make the construction of a sentence possible, cannot be determined owing to the lack of material in the second grade. The experimenters' assumptions in not trying the test would indicate this fact, but this is not experiment. The same lack of material makes conclusions in regard to the rhyming test impossible. The performance of IV-10 is exceeded only by VI-11. The test of naming 60 words in three minutes shows two decided increases with age and one decided drop with grade.

The foregoing analysis is based on a number of subjects in each group too small to have any great significance. The general fact of the correlation of the tests with grade remains, and the question of what tests correlate too highly with training can be answered only by considering both methods of study, and by considering only the largest deviations. The two most striking instances are found in the tests of naming the months and giving the date. These tests undoubtedly relate almost entirely to training. Less striking but equally definite is the relation of the test of counting from 20 to 0 to training. The tests of naming the days of the week and counting stamps show the influence of age to an extent almost as marked as that of grade, so that while the development in these tests is rapid, the grade factor probably exerts only part of the influence. Conclusions concerning the other tests are largely a matter of opinion, and the opinion of the writer has been indicated in the detailed discussion.

A study of the tests in relation to grade by the first method employed may be made from Schmitt's results. The author gives, in Tables I, II, III, IV, V, VI and VII on pages 70, 71, 73, 74, 75, 76 and 77 of her monograph, the results of each subject in each grade on each test. From these tables the present writer has calculated the percentage passed in each test.
A study of this sort rests for its reliability on the accuracy of the published tables, and the facts indicated by the tables do not always coincide with Schmitt's discussion.² The writer has followed the tables rather than the discussion in calculating the results. In the VIII-2 test, where an alternative rank is given for counting from 10 to 0 instead of 20 to 0, the writer has considered success in counting from 10 to 0 as a failure in counting from 20 to 0. In the line suggestion test Schmitt recognizes two types of failure, the typical failure according to Binet of accepting the suggestion of the first three lines, and the failure due to the fact that the subject actually judges the lines unequal after studying them. The second type of response Schmitt marks as passed, using a special symbol. The writer has calculated these percentages separately, entering the first or Binet type of response under "Line suggestion A" in the table, and the second type under "B." The V year and Adult tests were omitted. All of the other tests were included that had been given over 70% of the possible number of times. Unless otherwise noted, each test was given 100% of the possible number of times. Table 8 shows the per cent. that Schmitt's subjects in each grade passed each test in Binet's 1911 scale (Town's translation with modifications). The table is given with the reservation that the tables from which the percentages were calculated might contain misprints, and that the writer's interpretation of the tables might be at fault.

² In the discussion (page 69) Schmitt gives 15 subjects in the kindergarten failing test VII-4. Table I shows 13. On the same page she gives 24 subjects failing VIII-4. Table I shows 22 failing. In discussing the results of Grade I (page 72) Schmitt states that there is "more than 50% of failure with the discrimination of weight", while Table II shows 35% failure. Again, the tests referred to specific school instruction by Schmitt are VII-4, VIII-4, and IX-1, 2, 3 and 4. On page 72, in discussing the results of Grade I, she says "the tests below ten years which depend upon specific instruction are usually not passed except the VII-4 test." The percentages passed are as follows: VII-4 = 85%; VIII-4 = 45%; IX-1 = 35%; IX-2 = 75%; IX-3 = 90%; IX-4 = 30%. "Usually not passed" includes, therefore, tests passed 75% and 90% of the time.

Inasmuch as there are many differences in procedure in giving the tests, and in the character of the schools tested, the results of the two investigations are not comparable in respect to the percentage passed in one grade in one study with those in the same grade in the other study. The method used in determining the correlation of the tests with grade is the same as that used in the first method of treating the Princeton data, that of comparing the differences between grades by one-grade and two-grade steps, of selecting an arbitrary standard for detecting exceptional growth, and of comparing the resulting lists.

TABLE 8
Per Cent. of Subjects in Each Grade that Passed Each Test. 150 Subjects.

Tests, in the order of the rows:
VI-1, Distinguishing morning, afternoon
VI-2, Defining in terms of use
VI-3, Copying diamond
VI-4, Counting 13 pennies
VI-5, Choosing prettier of faces
VII-1, Showing right hand
VII-2, Describing pictures
VII-3, Executing 3 commissions
VII-4, Counting stamps
VII-5, Naming colors
VIII-1, Comparing remembered objects
VIII-2, Counting backwards from 20 to 0
VIII-3, Indicating omissions in pictures
VIII-4, Giving day and date
VIII-5, Repeating 5 digits
IX-1, Making change
IX-2, Defining in terms superior to use
IX-3, Naming pieces of money
IX-4, Naming the months
IX-5, Comprehending easy questions
X-1, Arranging 5 weights
X-2, Copying designs
X-3, Detecting absurdities
X-4, Comprehending difficult questions
X-5, Constructing sentence. Two ideas
XII-1, Resisting suggestion, A. (Binet scoring)
XII-1, Resisting suggestion, B. Judgment error counted plus
XII-2, Constructing sentence. One idea
XII-3, Giving 60 words in three minutes
XII-4, Defining abstract terms
XII-5, Reconstructing dissected sentences
XV-1, Repeating 7 digits
XV-2, Rhyming words with "obey"
XV-3, Repeating a sentence of 26 syllables
XV-4, Interpreting pictures
XV-5, Solving problems from various facts

Grades: K I II III IV V VI
Number of subjects: 25 20 17 21 22 22 23

[The percentages follow in the order in which they were extracted; the alignment of rows and columns is not recoverable.]
96 100* 92 94* 76 94* 92 100* 92 100* 92 80 100 72 65 81 92 95 100 48 85 100 96 100 100 92 100 100 100 100 40 85 94 95 100 100 95 94 100 100 12 45 94 100 100 64 85 94 100 100 6* 35 71 95 86 100 39* 75 65 100 95 100 28* 90 94 100 100 100 6* 30 71 95 95 95 61* 100 100 95 100 100 65 41 57 50 64 10 35 57 45 32 60 88 100 100 100 85 100 100 100 100 65 76 100 100 100 64* 76 52 100 41 86 14* 100* 100 57* 71 95 95 100* 100 43* 82 62 100 95* 96 7* 29 52 71 95* 100 0* 6 10 23 81* 62* 86* 10* 14* 62* 78 78 70 17 70 70

Note. All tests except those marked (*) were given all the possible number of times. The VI year tests were given 90% of the time in Grade I, the IX year tests 72% of the time in the kindergarten, the XII year tests 70% of the time in Grade I, and the XII and XV year tests 95% of the time in Grade V.

The differences between the performance of each grade and the next succeeding grade were calculated. These differences, 100 in number, ranged from -24% to +62%, the median being +5% (Q=10.75%). The run of differences differs from that found in the Princeton study in two respects, in having a lower median and variability, and in containing more minus deviations. The lower median and variability are due to the fact that the tests were given over a wider range, the Princeton tests being given only on the "up slope" of the growth curve, or not being given when the tests were any distance above or below the probable range of ability of the group. The Princeton results showed only 4 minus deviations of 4, 3, 2 and 1% respectively, while Schmitt's results show 15 such deviations, 6 of them being 10% or over.
These deviations are probably due to the smaller number of subjects, and if due to chance, should be counteracted by the precautionary measure of combining the indices of correlation into two-grade steps. 71 two-grade differences were obtained ranging from -25% to +82%, the median being +10% (Q=16.5%). 4 measures were still in the minus direction; one of these, -25% (Design, III to V), is probably significant, the other values of -6%, -5% and -4% having no significance. Inasmuch as the variability of the series is lower, those differences were considered to be worthy of special study that had the value of 2Q+M, or were in excess of the interquartile range plus the median. The lists of tests that appear as showing marked growth with grade according to the two methods are as follows:

One-grade differences higher than 2Q+M
+62%, IX-3, Money, K to I
+58%, XII-5, Dissected, IV to V
+49%, VIII-4, Date, I to II
+45%, VIII-2, 20 to 0, K to I
+41%, IX-4, Months, I to II
+39%, IX-5, Comprehension, K to I
+39%, XII-3, 60 words, I to II
+38%, XII-3, 60 words, III to IV
+37%, VII-4, Stamps, K to I
+36%, IX-2, Definitions, K to I
+36%, IX-1, Change, I to II
+35%, IX-2, Definitions, II to III
+33%, VIII-4, Date, K to I
+29%, IX-1, Change, K to I
+28%, X-3, Absurdities, I to II

Two-grade differences higher than 2Q+M
+82%, VIII-4, Date, K to II
+71%, XII-5, Dissected, III to V
+66%, IX-3, Money, K to II
+65%, IX-4, Months, K to II
+65%, IX-4, Months, I to III
+65%, IX-1, Change, K to II
+60%, IX-1, Change, I to III
+55%, XII-5, Dissected, IV to VI
+55%, VIII-4, Date, I to III
+54%, VIII-2, 20 to 0, K to II
+52%, VII-4, Stamps, K to II
+47%, X-2, Design, I to III
+45%, XII-4, Abstract Def., I to III
+44%, XII-4, Abstract Def., II to IV
+43%, XII-4, Abstract Def., III to V

A study of the above lists shows, as in the similar study of the Princeton data, that although the method of selecting the exceptional tests is an arbitrary one, the method is justified in practice, for only a few tests (13) appear in the lists as significant. In all, there were 34 tests³ studied, and 30 differences were considered large enough to be significant. These 30 differences were confined to 13 tests.

³ No differences were calculated from the line suggestion test owing to the possibility of misinterpreting the symbols. Schmitt notes the difference in the character of the responses from the suggestion error to the judgment error in passing from Grade II to III. The scoring of the suggestion error in the tables shows an inverse correlation with Grades II, III, IV and V, and a sudden change again from 14% in Grade V to 100% in Grade VI, so that there is probably a mistake. The scoring of the responses to this test according to the strict Binet ruling would make the "mental ages" lower, for many cases would then have basal X.

The tests of naming 60 words and defining in terms of use drop out of the first list owing to the elimination of the errors of negative correlation. The design test is both positive and negative, the ability increasing from Grades I to III and decreasing after III. The test of defining abstract terms appears according to the second method because the ability increases with grade from 7% in I to 95% in V by increases of approximately 25% in each grade. No conclusions may be drawn concerning the easy comprehension test and the absurdities test.
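The standard of 2Q+M applied to Schmitt's data may be illustrated with a short Python sketch. It is offered under stated assumptions and is not the writer's procedure as carried out: the function names are hypothetical, and the percentages for the day-and-date test (VIII-4) in grades K to III are an assumed reading of Table 8, chosen because they agree with the differences for that test quoted in the text.

```python
def cutoff(median, q):
    """Median plus the interquartile range, the latter taken as 2Q."""
    return median + 2 * q


# Medians and Q values as reported in the text for Schmitt's data.
one_grade_cutoff = cutoff(median=5, q=10.75)
two_grade_cutoff = cutoff(median=10, q=16.5)

# VIII-4, giving day and date: per cent. passing in K, I, II, III
# (assumption: read from Table 8, consistent with the quoted differences).
date_test = {"K": 12, "I": 45, "II": 94, "III": 100}


def large_steps(pass_pct, step, limit):
    """Grade-to-grade increases, `step` grades apart, that exceed `limit`."""
    grades = list(pass_pct)
    found = {}
    for i in range(len(grades) - step):
        low, high = grades[i], grades[i + step]
        increase = pass_pct[high] - pass_pct[low]
        if increase > limit:
            found[(low, high)] = increase
    return found


print("one-grade cutoff:", one_grade_cutoff)
print("two-grade cutoff:", two_grade_cutoff)
print("one-grade steps:", large_steps(date_test, 1, one_grade_cutoff))
print("two-grade steps:", large_steps(date_test, 2, two_grade_cutoff))
```

With these figures the limits come to 26.5% and 43%, and the Date test is retained for K to I and I to II among the one-grade steps and for K to II and I to III among the two-grade steps, while the small increase from II to III is passed over.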
The 20 remaining differences are confined to 7 tests, those of naming the day and date, naming the months, counting from 20 to 0, counting stamps, naming money, reconstructing dissected sentences, and making change. The first four were included in the five found to show the most marked influence of grade in the Princeton study. The test of naming the pieces of money did not show a marked relation to grade in the latter study, but this difference might be one of school curriculum. The test of naming the days of the week is not included in Binet's 1911 scale. In the Princeton study alternatives were used in the making change question, so that no data from this test were included in the quantitative study. These data show the ability in this test developing in the second and third grades, the test being passed only twice in the kindergarten and first grades, and generally passed above the third. The data in the test of reconstructing dissected sentences show very few passing the test below Grade V, with approximately three fourths passing in V and VI. In so far as the Trenton experimenting was applied to a few subjects in the regular grades below the seventh, this test was rarely passed in the third and fourth grade, passed about 5% in V, and almost universally passed in VI, VII and VIII. The number of subjects in each grade is small in the Trenton experiment, but each test was separately scored, i.e. each part of the dissected sentence test, each part of the absurdity test, etc. Each of the three parts of the dissected sentence test showed the same growth between the same grades, and this growth was more marked than that in any other test. The evidence concerning these two tests, therefore, supports the evidence from Schmitt's results.
The quantitative analysis of the Princeton data and Schmitt's data would indicate that the tests of counting stamps, counting from 20 to 0, naming the days of the week, giving the day and date, naming the months, naming the pieces of money, making change and reconstructing dissected sentences were influenced to a considerable extent by grade training. The performance in certain of these tests (days, date and months) may be the result of specific school training in the tests themselves, while others (perhaps the tests of counting stamps, counting from 20 to 0, and reconstructing dissected sentences) may involve a transfer effect in the application of the content of the grade in a new way.

The fact that the tests correlate very highly with grade training does not show that the tests are worthless, but it does show that they should, perhaps, be placed in another scale, or should at least be placed on a different footing than those that test capacity irrespective of attainments. One of the best tests* of intelligence is the determination of what an individual can do with the training he has received, but tests of this sort rest on the assumption that the individual's opportunities have been determined. The importance of tests of information in cases of alienation presenting a picture of deterioration is recognized. The important change to be made is not the elimination of such tests from intelligence scales, but their standardization on a different basis. The diagnostic value of such tests rests not in the mechanical memorizing of a time series such as that of the months, but in the ability to apply such a series. In pointing out this fact Katzenellenbogen (37) suggests that the months test be given in some such manner as "If somebody asks you in November to return three months later, what month would it be?"
Decroly and Degand also suggest that the mechanical tests of counting and naming the days of the week and months be modified in some such manner.

* The writer recalls two cases in which the failure in tests which involved the application of training was very significant. The first was that of a woman of about 30, a parole patient in a hospital for the insane, who had never shown any marked symptoms other than a history of intellectual inferiority. This patient passed practically all of the Binet tests in the IX, X and XII year groups, but failed completely in the test of making change. This observation was later checked up. Another case of a woman of 22, in the same hospital, presented a border-line psychoneurotic picture perhaps, but no marked symptoms other than a history of intellectual inferiority. She passed in a great many of the difficult tests in the upper years but had great difficulty in telling time. Both cases had lived under very good home conditions and had mingled with people of ability. A great many tests of capacity were given, but the most illuminating evidence of their mental status came from the two tests mentioned.

Comparing the conclusions of this study with other investigations, the agreement is fairly close. Schmitt's results do not support her suggestion that the definitions test relates to specific school instruction. The other tests which she refers to this factor (stamps, date, 20 to 0, change, months and money) show the influence to a marked extent. Binet in classifying some of the tests referred the tests of copying a sentence, reading for memories, writing from dictation, copying a diamond, counting backwards and making change to scholastic training. The first three tests were not included in this investigation. The diamond test showed the influence of age to be as great if not greater than that of school training.
The last two tests showed a marked influence of training. Binet referred the tests of counting 13 pennies, naming four colors, naming the days of the week and enumerating the months to home training. The last two showed a marked influence of school training. The results of the present investigation agree with those of Chotzen in finding no effect or very little effect of training in the tests of copying the diamond, repeating digits, describing pictures, counting 13 pennies, naming colors, comparing remembered objects, defining in terms of use and superior to use, and in finding marked influence of this factor in the test of naming the days of the week.

The methods used in analysing the results, especially the second method, reveal several suggestive relations between the tests and the school grades. There is a general correlation between the tests and the grades, a correlation that is very necessary to establish, for there is also a general correlation between intelligence and grade. In analysing the results of the individual tests by comparing the results of subjects of the same age in different grades, and of subjects of different ages in the same grade (Table 7), it was seen that, as a general rule, the growth in any particular ability occurred in passing from grade to grade, not in passing from age to age within one grade. In fact, in only half of the cases in which the subjects of two ages in one grade may be compared do the older subjects make records that are higher than those of the younger ones, and only 10% of these gains are over 20%. If the groups are considered to be equal in all cases in which their records are within 10% of each other, equality occurs in exactly 50% of the cases. In 20% of the cases the older groups were lower, while in only 30% of the cases are the older subjects actually higher than the younger subjects of the same grade.
Some of the cases of retrogression could well be accidental, but they occur too frequently to be due entirely to chance. Applying the same general method to the cases in which groups of the same age but in different grades were compared, 5% of the groups in a higher grade showed lower scores, the results corresponded in 43% of the cases, while 52% showed definite improvement. This might indicate that there is a higher correlation between the tests and grade than between the tests and age.

The fact that the comparison of children of different ages in the same grade showed the older children making lower records in 20% of the cases, equal records in 50% of the cases and higher records in only 30%, would confirm the general diagnostic value of the tests if Bonser's interpretation of this phenomenon is correct. Bonser (12) applied various sorts of reasoning tests to children in the fourth, fifth and sixth school grades. In summarizing the results of the tests in the different grades, he says, "In the contrast with grade progress and progress with age, in the generally superior showing made by the younger groups of children of any grade when contrasted with the older pupils of the grade, and in the fairly substantial percentage of pupils from lower grades found in the highest quartile of ability for all, it is shown that native capacity is measured to a high degree by the tests."

In conclusion, the results shown in this chapter would indicate a correlation between the individual tests studied and the school grades, this correlation being high enough in some cases to show the actual effect of training. In answer to the general objection that since one demonstration of the accuracy of the tests rests on their correlation with school grades, the school grades are the real measure of intelligence and the mental tests superfluous, it is only necessary to point out that intelligence tests, besides affording the opportunity for accurate standardization, also detect the subject's potential abilities independent of his past performance. The school measure indicates mental defect in cases of gross retardation, but it does not indicate exceptional ability.

Schmitt's contention that the school represents a standard environmental situation, and that a measure of a subject's ability should include a measure of the adequacy of his reaction to this situation, is well founded. It is not, however, a criticism of the Binet scale, for the scale aims to test native capacity. At the Buffalo conference (15) on the Binet scale, the following question was raised: "What is it, after all, that the scale aims to test?" The question was answered by "We believe that current misconceptions as to the aim of the scale should be removed. It is not intended to test the emotional or volitional nature, but primarily intelligence (judgment)." To this list might be added the assertion that the scale was not intended to test a child's reaction to the school situation, or to furnish an outline for taking a record of his life history.

Rogers and McIntyre (54) would also have mental tests include tests dependent on both school and home training. This general trend of present day discussion is a reversion to Binet's 1908 type of scale, a tendency to which Binet was in opposition. The probable solution rests in eliminating from the scale the tests involving training, and in constructing a standardized scale of another sort for the estimation of the individual's reaction to the school situation in terms of the length of time that he has met that situation. That such a scale is not a matter of speculation is shown by the number of scales now on the market for measuring handwriting, spelling, composition, arithmetical ability, etc. Tests of native capacity and tests dependent on school and environmental training cannot be standardized on the same basis, for they are essentially different measures.
Measures of the first sort may perhaps be correlated with age, while measures of the other sort can be correlated only with opportunity.

V. SEX DIFFERENCES

The investigators who have studied the influence of sex differences on the Binet-Simon tests have used two methods, that of comparing the "mental ages" or total scores of subjects of each sex, and that of comparing the per cent that the subjects of each sex pass each test. The first method throws no light on the individual tests, inasmuch as one sex may be superior in one test and inferior in another so that the total score will balance the influence of this factor. Inasmuch as the scale is founded on the principle that sex differences do not exist, it is important to study the individual tests, and to determine the accuracy of this assumption.

The Princeton data are available for a study of this sort. 352 subjects (187 boys and 165 girls) between the ages 6 and 12 were examined. The method of study adopted was that of comparing the results of non-selected boys and girls of each age, and, as a check on this method, of comparing the results of selected boys and girls of four ages.

Inasmuch as the subjects of each chronological age are distributed over a range of one year (the 6 year subjects, for example, being distributed from 6.0 to 6.9), the actual average age of the subjects of each age was computed to make sure that no differences might appear due to the chance selection of subjects at either extreme. These averages are shown in Table 9.

TABLE 9
Actual Average Chronological Age of Boys and Girls in Each Age Group.

                    BOYS                          GIRLS
          Number of   Average Age       Number of   Average Age
          Subjects    (M.V.)            Subjects    (M.V.)
Age 6        37       6.58 (0.20)          23       6.51 (0.20)
Age 7        29       7.50 (0.29)          31       7.39 (0.26)
Age 8        24       8.48 (0.29)          28       8.48 (0.22)
Age 9        20       9.46 (0.27)          22       9.54 (0.26)
Age 10       31      10.46 (0.25)          23      10.37 (0.30)
Age 11       28      11.59 (0.22)          20      11.52 (0.27)
Age 12       18      12.43 (0.30)          18      12.57 (0.24)

A perusal of this table shows that the subjects agree closely both in their average and in their variability. The 12 year boys are actually 0.14 yr. younger than the girls of the same age group. The 7 year boys are 0.11 yr. older than the 7 year girls. All other differences are less than 0.10 yr. The correspondence is close enough for all practical purposes, but these differences must be taken into consideration before drawing final conclusions.

The 352 non-selected subjects from 6 to 12 were distributed throughout the kindergarten, special class, and first six minus and plus grades as shown in Table 10.

TABLE 10
Age in Grade Distribution of 187 Boys and 165 Girls, 6 to 12 Years of Age.

[The columns of the printed table run Boys, Girls under each age from 6 to 12, followed by the row total. Blank cells are not marked in this copy, so the entries of each row are listed in reading order, followed by the row total.]

Special Class:  2, 1, 3, 3, 3, 1; total 13
Kindergarten:   8, 3; total 11
Grade I-:       13, 4, 8, 9, 1, 1, 1; total 37
Grade I:        14, 14, 9, 9, 3, 1, 1; total 51
Grade II-:      2, 2, 2, 4, 2; total 12
Grade II:       2, 7, 10, 6, 9, 2, 3, 1; total 40
Grade III-:     3, 4, 2, 1, 1, 1; total 12
Grade III:      2, 8, 10, 5, 8, 5, 5, 1, 1; total 45
Grade IV-:      1, 5, 1, 2, 2, 3, 1; total 15
Grade IV:       1, 1, 5, 6, 10, 4, 1, 2, 3, 2; total 35
Grade V-:       1, 1, 1, 3, 3; total 9
Grade V:        7, 11, 10, 6, 2, 6; total 42
Grade VI-:      1, 1; total 2
Grade VI:       1, 10, 6, 6, 5; total 28

Totals (B, G at ages 6 to 12): 37, 23; 29, 31; 24, 28; 20, 22; 31, 23; 28, 20; 18, 18; total 352

It is generally conceded that a difference exists in the reactions of the sexes to the school curriculum, the girls in the long run making better progress in school work than the boys. A study of Table 10 shows that in general the girls have a slightly higher distribution than the boys, these relations being more clearly indicated in Table 11, in which the average grade of the subjects of each age and sex is shown. In computing the average grade, the kindergarten was counted 0; Grade I-, 0.5; Grade I+, 1.0; Grade II-, 1.5; etc.
Each subject in the special class was assigned a grade 0.5 lower than the lowest subject of his age (0 being the smallest value given), on the theory that each subject in the special class was less satisfactory than any of his comrades in the regular class. The fact that there were no girls in the special class would cause an unduly exaggerated difference between the average grades of the boys and girls. For this reason, the average grades of the boys, including and excluding the special class cases, were separately figured, these values being separately shown in Table 11 under Boys A (the average grade including the special class cases), and Boys B (the average grade excluding the special class cases). Had the special class subjects been in the regular grades, they would have lowered the average of each group, so that the two values may be taken only as limits, the values under "Boys A" being the lower limit, and those under "Boys B," the upper limit.

TABLE 11
Actual Average Grade of Boys and Girls in Each Age Group.

               BOYS A                  BOYS B                  GIRLS
          No.  Average Grade      No.  Average Grade      No.  Average Grade
               (M.V.)                  (M.V.)                  (M.V.)
Age 6     37   0.55 (0.34)        35   0.59 (0.33)        23   0.87 (0.35)
Age 7     29   1.24 (0.64)        28   1.29 (0.63)        31   1.31 (0.65)
Age 8     24   1.94 (0.91)        21   2.21 (0.65)        28   2.25 (0.59)
Age 9     20   2.48 (1.04)        17   2.91 (0.69)        22   2.98 (0.62)
Age 10    31   3.92 (0.71)        31   3.92 (0.71)        23   4.19 (0.80)
Age 11    28   4.66 (1.20)        25   5.04 (0.77)        20   4.83 (0.88)
Age 12    18   4.72 (0.91)        17   4.82 (0.88)        18   5.03 (0.59)

Table 11 shows that the scholastic ability of the girls as indicated by the average grade is uniformly higher than that indicated by the lower limit of the boys, and is below the upper limit of the boys in only one case (at 11 years). A slight sex difference in school work in favor of the girls may therefore be assumed at the outset.
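The grade coding and the two limits for the boys can be illustrated with a short computation. The sketch below is not the author's worksheet: the function name is hypothetical, and the age 6 counts are an assumed reading of the first two columns of Table 10 (the two 6 year girls beyond Grade I are taken to be in Grade II). Under that reading it reproduces the age 6 row of Table 11.

```python
# Grade coding described in the text: kindergarten = 0, Grade I- = 0.5,
# Grade I = 1.0, Grade II- = 1.5, Grade II = 2.0, and so on.
GRADE_VALUE = {"K": 0.0, "I-": 0.5, "I": 1.0, "II-": 1.5, "II": 2.0}


def average_grade(counts, special_class=0):
    """Mean coded grade of one age group.

    Special-class pupils are scored 0.5 below the lowest regular-grade
    pupil of the group, but never below 0 (the rule stated in the text).
    """
    lowest = min(GRADE_VALUE[g] for g, n in counts.items() if n > 0)
    special_value = max(0.0, lowest - 0.5)
    total = sum(GRADE_VALUE[g] * n for g, n in counts.items())
    total += special_value * special_class
    return total / (sum(counts.values()) + special_class)


# Assumed reading of the age 6 columns of Table 10.
boys_6 = {"K": 8, "I-": 13, "I": 14}
girls_6 = {"K": 3, "I-": 4, "I": 14, "II": 2}

boys_a = average_grade(boys_6, special_class=2)  # lower limit, "Boys A"
boys_b = average_grade(boys_6)                   # upper limit, "Boys B"
girls = average_grade(girls_6)

print(round(boys_a, 2), round(boys_b, 2), round(girls, 2))  # 0.55 0.59 0.87
```

Because the two special class boys are scored 0 here, including them only enlarges the divisor, which is why Boys A falls below Boys B.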
It is significant that the upper limit of the 11 year boys is higher than that of the 12 year boys, and that the lower limits show a difference of but 0.06. This would indicate a poor selection of 12 year boys, or a superior selection of 11 year boys. Both measures of the scholastic ability of the boys show a generally higher variability than that of the girls.

From Table 9 it may be seen that the growth in the actual average age of each sex is not uniform from year to year, the minimum increase for boys being 0.84 yr. (from 11 to 12), and for girls 0.83 yr. (from 9 to 10), while the maximum increase for boys is 1.13 yr. (from 10 to 11), and for girls 1.15 yr. (from 10 to 11). A more marked lack of regularity in the growth of scholastic ability from year to year as measured by the average grade is shown in Table 11, no increase being shown by the 12 year boys over the 11 year boys, while the 10 year boys show an increase of 1.44 to 1.01 grades over the 9 year boys. In the same way the 10 year girls show an increase over the 9 year girls that is nearly three times that of the 7 year girls over the 6 year girls, while the increase of the 7 year girls over the 6 year girls is twice that of the 12 year girls over the 11 year girls.

These relations indicate that the selection of subjects is not uniform at each age. The subjects of any one age may be either a superior or inferior selection of all children of that age, and there is no reason for supposing that this random sample of superior or inferior subjects of any age will correspond to a similar sampling of the subjects of the opposite sex of the same age. The process of calculating the percentage that the boys and girls of each age pass each test is extremely simple, but the conclusion that the differences found between the percentage passed by the sexes at each age may be attributed to sex differences is not justified unless all the variable factors are known.
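The year-to-year increases quoted from Tables 9 and 11 can be recomputed directly from the tabled averages. The following is a minimal sketch; the helper names are illustrative and are not the author's.

```python
AGES = [6, 7, 8, 9, 10, 11, 12]

# Average chronological ages from Table 9.
MEAN_AGE = {
    "boys": [6.58, 7.50, 8.48, 9.46, 10.46, 11.59, 12.43],
    "girls": [6.51, 7.39, 8.48, 9.54, 10.37, 11.52, 12.57],
}

# Average grades from Table 11 (Boys A = lower limit, Boys B = upper limit).
MEAN_GRADE = {
    "boys_a": [0.55, 1.24, 1.94, 2.48, 3.92, 4.66, 4.72],
    "boys_b": [0.59, 1.29, 2.21, 2.91, 3.92, 5.04, 4.82],
    "girls": [0.87, 1.31, 2.25, 2.98, 4.19, 4.83, 5.03],
}


def increments(values):
    """Increase from each age group to the next, rounded to two decimals."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


def extremes(values):
    """Smallest and largest year-to-year increase, each with its age step."""
    labels = [f"{a} to {b}" for a, b in zip(AGES, AGES[1:])]
    pairs = list(zip(labels, increments(values)))
    return min(pairs, key=lambda p: p[1]), max(pairs, key=lambda p: p[1])


print(extremes(MEAN_AGE["boys"]))     # (('11 to 12', 0.84), ('10 to 11', 1.13))
print(extremes(MEAN_AGE["girls"]))    # (('9 to 10', 0.83), ('10 to 11', 1.15))
print(increments(MEAN_GRADE["girls"]))  # [0.44, 0.94, 0.73, 1.21, 0.64, 0.2]
```

The girls' grade increments show the ratios mentioned in the text: 1.21 (9 to 10) is nearly three times 0.44 (6 to 7), which in turn is about twice 0.20 (11 to 12).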
A previous chapter showed variations in the tests due to the influence of the personal equation of the experimenters. To avoid this variable influence, only those tests were studied that were shown to be free from the influence of this factor. Inasmuch as each experimenter examined approximately the same number of boys and girls of each age, any influence of this factor would be equalized, provided, of course, that there were no differences in the reaction of the experimenters to the two sexes. In the detailed study of the design test, it was found that experimenter C was more lenient in marking girls than boys. The possibility of a similar interpretation in a few other tests was suggested, but not demonstrated. In analysing the results for sex differences, however, the possibility of such an interpretation must be kept in mind.

Another possible source of error is that due to incomplete data. The experimenters, in giving the tests, would give only those within the approximate range of the subject, so that each test would be given to a superior selection of children below the normal range of the test, and to an inferior selection of subjects above this range, a process tending to make the apparent growth of an ability less than the probable real growth. In comparing the results of the sexes, however, it is not necessary to have accurate results on the growth of an ability, but results which have the same determining factors. If the experimenters gave the test to approximately the same proportions of boys and girls at each age, a comparison of the percentage passed is legitimate, even if only a small proportion of the whole group were actually tested, for the proportion would include the same selection of subjects. The number of boys and girls at each age, and the percentage that each test was given to these subjects are shown in Table 12.
The test of counting 13 pennies, for example, was given 37 times to 6 year boys, or 100% of the possible number of times, while the test of counting from 20 to 0 was given 27 times to the same group, or 73% of the possible number of times. Column A shows the total number of times each test was given to all of the boys and girls. Column B gives the average age of all the boys and girls to whom each test was given. The average given in this case is not the actual average derived from the actual chronological age of each subject figured in tenths, but the weighted¹ average, the whole numbers 6, 7, 8, 9, 10, 11, and 12 being used.

¹ For example, in the test of counting 13 pennies, the average age of the boys to whom the test was given is [(37 × 6) + (28 × 7) + (16 × 8) + (8 × 9) + (7 × 10) + (3 × 11) + (1 × 12)] / 100 = 7.33 years.

Table 12 shows a very close correspondence between the percentage that each test was given to boys and girls of each age, so that the error due to incomplete data, though present, is present to the same extent in the results of both sexes, and may be disregarded. A fairly close correspondence in the average age of all the boys and girls to whom each test was given is also indicated in Table 12. In the test of counting stamps there is an actual correspondence. The greatest difference is that of 0.6 yr. in the test of giving the date. The differences, on the whole, are small, but must be taken into consideration when comparing the percentages that all boys and girls pass each test.

TABLE 12
Percentage that Each Test Was Given to Boys and Girls of Each Age, the Total Number of Times Each Test Was Given to Each Sex and the Average Age of All Subjects of Each Sex to Whom Each Test Was Given.

Column A: total number of times given. Column B: average age of subjects (weighted).

                                       Chronological age
                                  6    7    8    9   10   11   12      A       B
Number of subjects      Boys     37   29   24   20   31   28   18
                        Girls    23   31   28   22   23   20   18

Counting 13 pennies     Boys    100   97   67   40   23   11    6    100    7.33
                        Girls   100   94   68   41   26   10   11     90    7.56
Describing pictures     Boys    100   90   67   45   26   11    6    100    7.38
                        Girls   100   94   68   41   30   20   11     93    7.66
Copying diamond         Boys    100   93   63   60   32   14   17    108    7.30
                        Girls   100   94   61   64   35   20   11     97    7.74
Naming colors           Boys    100   93   67   45   23   11    6    100    7.35
                        Girls   100   94   68   41   30   15   11     92    7.62
Counting from 20 to 0   Boys     73   97   83   80   61   21   44    124    8.18
                        Girls    74   71   79   77   52   35   28    102    8.25
Counting stamps         Boys     65   97   88   80   61   21   44    122    8.23
                        Girls    83   87   79   82   52   35   39    112    8.23
Repeating all digits    Boys     95  100  100  100  100  100  100    185    8.75
                        Girls    96   97   96  100  100  100  100    162    8.78
Naming days of week     Boys     92  100   83   80   61   21   44    132    8.04
                        Girls    96   90   82   82   52   33   28    115    8.10
Giving day and date     Boys     43   76   88   95   84   64   89    138    8.10
                        Girls    78   77   93  100   78   73   72    136    8.70
Naming the months       Boys     41   79   79   95   81   54   78    130    8.90
                        Girls    39   65   93  100   70   63   61    117    8.84
Naming money            Boys     27   62   67   90   97   64   78    124    9.21
                        Girls    43   39   86   93  100   80   67    118    9.11
Copying designs         Boys     16   31   67   85   94   79   78    113    9.56
                        Girls    26   19   37   86   96   80   67     97    9.46
3 words in sentence     Boys      8   31   63   90  100   93  100    120    9.79
                        Girls    26   26   68   86  100   93   94    111    9.36
60 words in 3 minutes   Boys     11   21   38   70   81   93   89    100    9.92
                        Girls    30   10   32   30   74   90   78     79    9.75
Giving rhymes           Boys      8   21   25   50   74   89   94     90   10.08
                        Girls    13   13   36   43   74   90   83     77    9.92
Defining "fork" etc.    Boys     38   62   50   55   48   25   17     80    8.35
                        Girls    61   63   61   68   39   20   11     81    8.09

Two methods are available for studying the influence of sex differences on the individual tests. The first is that of comparing the results of boys and girls of each age on each test.
This method is affected by the chance selection of superior or inferior subjects, and the results can have no meaning unless the relations of the groups of each age of the same sex are understood. For example, the fact that the 12 year boys are 36% lower than the 12 year girls in the test of naming the months has no significance as an isolated finding, for its significance is modified by the additional fact that this group of 12 year boys is 10% lower than the 9 year boys, 12% lower than the 10 year boys, and 9% lower than the 11 year boys on the same test.

The other method is that of comparing the per cent that all subjects of each sex pass each test. This method avoids the factor of variations in the results due to a chance superiority of one age group over the other of the opposite sex, but, at the same time, it tends to obscure the magnitude of the differences that might occur. The most reliable differential measure between two groups is one that is well within the range of ability of the groups. The difference will be obscured if the measure is too easy or too difficult. A comparison of the results of all subjects would, in this way, tend to minimize² the magnitude of the real difference between the groups. Furthermore, there is a possibility that one sex might acquire an ability first, but eventually be surpassed by the other. The per cent that all subjects passed would show no deviation, because the two tendencies would balance.

² For example, if there were 20 subjects of each age and of each sex from 6 to 12, and a certain test were passed by 75% of the 6 year girls, and by all of the 7, 8, 9, 10, 11 and 12 year girls, by 50% of the 6 year boys, 75% of the 7 year boys and all of the remaining groups, the total percentage passed for all girls would be 96%, and for all boys, 89%. The differential character of the test is indicated by the value 7%, while its actual differential character, just within the range of ability of the groups, is 25%.
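Both footnoted calculations in this passage, the weighted average age and the pooled percentage, can be checked numerically. The sketch below uses only the figures given in the two footnotes; the function names are illustrative.

```python
def weighted_average_age(counts_by_age):
    """Average age using whole-number ages weighted by number of subjects."""
    total = sum(counts_by_age.values())
    return sum(age * n for age, n in counts_by_age.items()) / total


def pooled_percent(percent_by_age, group_size):
    """Percentage passing over all ages, with equal-sized age groups."""
    passed = sum(p / 100 * group_size for p in percent_by_age.values())
    return 100 * passed / (group_size * len(percent_by_age))


# Footnote 1: boys given the test of counting 13 pennies, by age.
pennies_boys = {6: 37, 7: 28, 8: 16, 9: 8, 10: 7, 11: 3, 12: 1}
print(round(weighted_average_age(pennies_boys), 2))  # 7.33

# Footnote 2: hypothetical test, 20 subjects of each sex at each age.
girls_pct = {6: 75, 7: 100, 8: 100, 9: 100, 10: 100, 11: 100, 12: 100}
boys_pct = {6: 50, 7: 75, 8: 100, 9: 100, 10: 100, 11: 100, 12: 100}

girls_all = pooled_percent(girls_pct, 20)
boys_all = pooled_percent(boys_pct, 20)
print(round(girls_all), round(boys_all))  # 96 89

# Pooled difference versus the difference at the ages where the test
# actually discriminates (ages 6 and 7 in this example).
print(round(girls_all - boys_all))                     # 7
print(girls_pct[6] - boys_pct[6], girls_pct[7] - boys_pct[7])  # 25 25
```

The five ages at which both sexes pass completely contribute nothing to the difference but enlarge the divisor, which is how a 25% difference at ages 6 and 7 shrinks to 7% in the pooled figure.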
Neither method, then, is entirely satisfactory, the first because it would tend to exaggerate chance differences, the second because it would tend to obscure real differences. The method used in this study is that of comparing the results of non-selected and selected subjects of each age and sex, studying first the general growth of each ability from age to age within each sex, and using the per cent that all subjects pass each test to determine the correlation between the results of non-selected and selected subjects.

Table 13 shows the percentage or proportion³ that the boys and girls of each age pass each test, the percentage that all boys and girls pass each test, the actual percentage that the boys are superior to (+) or inferior to (-) the girls of each age, the difference between the average age of all boys and girls to whom each test was given, and the difference between the percentage that all boys and girls pass each test.

³ The proportion given is the number of times a test was given over the number of times a test was passed. No percentages were calculated for tests given less than 12 times, and no percentages are given for the definitions tests on account of the small number of times they are given to all subjects.

[TABLE 13. The body of this table is illegible in this copy. Its contents are as described in the preceding paragraph.]

The differences between the performance of the boys and girls at each age have no meaning unless the general growth of the abilities in each sex is first understood. Studying first the results of the 187 non-selected boys shown in the first seven columns of Table 13, it may be seen that the growth of ability in each test is rather irregular. The test of naming the months, for example, shows a slight decrease from 9 to 12. The differences between the percentage performances of the subjects of each age and those of the preceding age were calculated. The 12 year group, compared to the 11 year group, is +11% on the test of giving the date, -9% on the test of naming the months, etc. 61 differences were thus obtained, varying in magnitude from -15% to +36%, the median being +8% (Q=9.75%). 13 of the deviations (21%) were minus values. The largest negative deviations occurred in the tests of naming colors (-15%, 7 to 8), naming money (-15%, 11 to 12), and constructing a sentence containing two ideas (-13%, 8 to 9). The remaining 10 minus deviations were less than 10%.

An index of the growth from year to year was obtained by calculating the average percentage increase from one age group to another. For example, the 7 year boys were 26% higher than the 6 year boys in the test of naming colors, 5% higher in naming the date, etc. The average of the 10 possible comparisons between 6 and 7 year boys shows that the latter averaged 16.1% higher than the former. The average increases in percentage passed from year to year are as follows: 6 to 7 = 16.1%; 7 to 8 = 13.5%; 8 to 9 = 8.7%; 9 to 10 = 11.2%; 10 to 11 = 6.0%; and 11 to 12 = 0.2%. These figures show strikingly the irregularity of the growth from age to age.
Comparing these average percentage increases in tests with the averages shown in Tables 9 and 11, there is no observable relation between this increase and the increase in average age from age to age, or the increase in average grade from age to age. The smallest increase in the tests (0.2%, 11 to 12) coincides with the smallest increase in average age from year to year (0.84 yr.), and the smallest increase in average grade from year to year. The other relations are varied.

The fact of the variability in the results of the non-selected boys stands out. The irregularity of the growth of the various abilities, and the fact that in 21% of the cases the boys of one age are actually lower than those of the previous age, point to the conclusion that certain allowances will have to be made for chance variations. It is not possible to account for the variations in growth by reference to the relative increase in average age or average grade from year to year.

The results of the 165 non-selected girls, shown in italics in the first seven columns of Table 13, were studied in the same manner as the results of the boys. 60 differences between the percentage performance of the girls of each age and those of the preceding age were obtained. These differences ranged from -33% to +50%, the median being 7% (Q=8%). 10 of the deviations (17%) were minus values. The largest deviations were shown in the tests of naming 60 words (-33%, 11 to 12), counting stamps (-20%, 9 to 10), and drawing designs (-14%, 8 to 9). The remaining 7 minus deviations were below 10%.

The average increases in the percentage passed from year to year are as follows: 6 to 7 = 3.9%; 7 to 8 = 15%; 8 to 9 = 8.8%; 9 to 10 = 10.1%; 10 to 11 = 8.7%; 11 to 12 = 1.8%. Both boys and girls show the smallest average increase in the percentage passed in the step from 11 to 12, and the magnitudes of the increases agree fairly well except for the step from 6 to 7.
The increase of the 7 year girls over the 6 year girls is 3.9%, the next to the smallest increase of one age group over any preceding group. The 7 year boys, however, show an average increase of 16.1% over the 6 year boys, the largest increase of any group of boys over any preceding group. It will be difficult, then, to draw conclusions concerning sex differences from a comparison of the 6 year boys and girls, for the 6 year girls are either a superior selection or the 6 year boys are an inferior selection if the character of these groups be judged by the comparison with the 7 year subjects. The same comparison, on the other hand, might indicate that the 7 year girls were an inferior selection and the 7 year boys a superior selection from the general run. It is only possible to point out the irregularity, however; it is not possible to show the cause of the irregularity.

A comparison of the average increase in the percentage passed by girls from age to age with the increase in the average ages shown in Table 9 shows no demonstrable relation to exist. Comparing this growth in the ability on the tests with the growth in average grade, shown in Table 11, shows a very positive relation to exist between these factors. Where the increase in average grade is smallest (i.e. from 6 to 7 and from 11 to 12), the increase in the tests is smallest (3.9% and 1.8%), while the greatest increases in grade (from 9 to 10 and from 7 to 8) coincide with the greatest increases in the test abilities (10.1% and 15.0%). This relation was not indicated in the results of the boys.

Why a correlation between the increase in the tests and the increase in grade was found in the results of the girls but not of the boys is a matter of speculation. It has been shown that the boys have a higher variability in grade than girls.
This tendency of the boys to be distributed in a wider range of grades might nullify the grade correlation slightly, but probably not to any considerable extent. The fact that the causes of this variation are not determined serves to illustrate the dangers of comparing the results of two groups when the factors operating on the groups are not known.

The foregoing study of the growth of the various abilities from age to age in each sex, and the analysis of the causes influencing this growth, demonstrates the great variability of the results. This fact of variability must be considered before drawing conclusions concerning sex differences by the method of comparing the results of boys and girls of each age.

The percentage differences between the performance of non-selected boys and girls of each age are shown in Table 13. In actual magnitude, these differences vary from 0% to 36%, the median being 9% (Q=5.5%). 75% of the differences are 17% or under, and only 16% are over 20%. In regard to sign, the differences vary from -36% to +26%, the median being -3.5% (Q=8.75%), showing a slight general superiority of the girls. If the number of possibilities of variation in comparing the results of small groups of non-selected subjects is taken into consideration (the presence of mental defectives, of subjects having language difficulties, of subjects in different grades influenced by different training, the possibility of a superior selection of subjects at one age than at another, and the probability that similar chance samplings would not fall at the same age), the fact of correspondence indicated in Table 13 has more meaning than the fact of divergence.

The variability indicated in the study of the growth of abilities with age was so great that it makes interpretation of the results in terms of sex differences very difficult, and warranted conclusions impossible.
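The relation noted above between growth on the tests and growth in average grade can be put in a single figure. The sketch below uses Spearman's rank correlation, which is a substitute technique chosen for illustration and is not the method used in this study; the grade averages are those of Table 11 and the test increases are those quoted in the text. Boys B is omitted because two of its grade increments are tied, which the simple formula used here does not handle.

```python
def gains(values):
    """Increase from each age group to the next, rounded to two decimals."""
    return [round(b - a, 2) for a, b in zip(values, values[1:])]


def ranks(values):
    """Rank 1 for the smallest value; assumes there are no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for untied data."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Average grade by age, 6 to 12 (Table 11).
GIRLS_GRADE = [0.87, 1.31, 2.25, 2.98, 4.19, 4.83, 5.03]
BOYS_A_GRADE = [0.55, 1.24, 1.94, 2.48, 3.92, 4.66, 4.72]

# Average increase in percentage passed, 6-7 through 11-12 (from the text).
GIRLS_TEST_GAIN = [3.9, 15.0, 8.8, 10.1, 8.7, 1.8]
BOYS_TEST_GAIN = [16.1, 13.5, 8.7, 11.2, 6.0, 0.2]

print(round(spearman(gains(GIRLS_GRADE), GIRLS_TEST_GAIN), 2))   # 0.94
print(round(spearman(gains(BOYS_A_GRADE), BOYS_TEST_GAIN), 2))   # 0.31
```

With only six steps per sex these coefficients are rough, but they point the same way as the text: the girls' test gains follow their grade gains closely, while the boys' do so only weakly.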
It is legitimate to expect that the older subjects of either sex should make higher scores than the younger subjects of the same sex, but this was not found to be the universal rule. The boys' results showed minus deviations in 21% of the cases and the girls' results showed minus deviations in 17% of the cases. In one case the 12 year girls were 33% lower than the 11 year girls. If this value (33%) be taken as the error due to chance variation, then only one value, that of -36% (naming the months, age 12), may be taken as significant, and it has been seen that in this test the 12 year boys are 10% lower than the 9 year boys. The conclusion would follow, then, that there were no sex differences. This alternative, however, seems to place too much weight on one variation, so that the truth probably lies in the assertion that the sex differences that actually exist are slight.

A study of the reactions of selected groups of boys and girls should throw light on the results from non-selected subjects, and make conclusions more certain. Subjects were selected by a process of elimination and selection. All of the subjects that were in the special class and minus grades were eliminated, along with all children of non-English speaking parents. From the following group of English speaking subjects in the regular grades all subjects were eliminated who had entered grade at an age very much above or below that of the general run of entrants.* The remaining subjects ranged in age from 4.3 years to 14.4 years, but were found to group rather closely around certain ages. It was possible to find four groups of boys and girls of approximately the same chronological ages. The character of these subjects is indicated in Table 14.
The four groups of subjects, chronologically from 6.0 to 6.9, 7.6 to 8.9, 9.7 to 10.9 and 11.7 to 13.3 (which will be referred to as 6, 8, 10 and 12), were distributed in approximately the same grades, and had approximately the same average age and average grade.

* The ages on entering each grade of the subjects retained were as follows: Kindergarten = 4, 5 and 6; Grade I = 5, 6 and 7; Grade II = 6, 7 and 8; Grade III = 8, 9 and 10; Grade IV = 9, 10 and 11; Grade V = 10, 11 and 12; Grade VI = 11, 12 and 13.

TABLE 14
Age in Grade Distribution, Average Grade and Average Age of 167 Selected Subjects. 86 Boys and 81 Girls.

                       Age in Grade Distribution
Age Group     Sex      K    I   II  III   IV    V   VI   Total   Average Grade (M.V.)   Average Age (M.V.)
6.0 to 6.9    Boys     5   13    -    -    -    -    -     18        0.72 (0.40)            6.52 (0.22)
              Girls    3   13    2    -    -    -    -     18        0.89 (0.39)            6.53 (0.22)
7.6 to 8.9    Boys     -    7   13    3    -    -    -     23        1.83 (0.51)            8.09 (0.38)
              Girls    -    2   13    5    -    -    -     20        2.15 (0.43)            8.32 (0.38)
9.7 to 10.9   Boys     -    -    -    6   12    2    -     20        3.80 (0.48)           10.37 (0.36)
              Girls    -    -    -    9    7    5    -     21        3.81 (0.69)           10.14 (0.32)
11.7 to 13.3  Boys     -    -    -    -    2    8   15     25        5.52 (0.58)           12.35 (0.55)
              Girls    -    -    -    -    3    8   11     22        5.36 (0.64)           12.41 (0.46)

The results of these groups are shown in Table 15, which is arranged to show all the facts for selected subjects that were given for non-selected subjects in Tables 12 and 13. The first four columns show the percentage that each test was given to each group. The next four columns show the percentage or the proportion that the subjects in each group passed each test. Column A shows the total number of times each test was given to all boys and girls, Column B the weighted average age (the average ages given in Table 14 being used), and Column C the percentage that all subjects passed each test. The next four columns show the percentage that the boys are above (+) or below (-) the girls. Column D (derived from Column B) gives the difference between the average ages of all subjects to whom each test was given.
Column E (derived from Column C) gives the differences between the percentages passed by all boys and girls on each test.

The growth of the various abilities with age in the selected groups of subjects is more uniform than that shown by the non-selected subjects. Only three cases appear in which the younger subjects make higher scores than those of older subjects, these exceptions occurring in the tests of describing pictures (-3%, girls 6 to 8), naming colors (-7%, girls 6 to 8), and naming months (-9%, boys 10 to 12). In the comparison of the sexes 41 differences are obtained, varying from -28% to +26%, the median being 0% (Q = [illegible]). In actual magnitude the differences vary from 0% to 28%, the median being 10% (Q=4.75%), the median being 1% higher than that of non-selected data, and the variability 0.75% less. 75% of the differences were less than 14%.

[TABLE 15. Results of the selected subjects on each test, with Columns A to E as described in the text. The body of the table is illegible in this copy.]

The change of the median of the series of differences from -3.5% (non-selected) to 0% (selected) shows that the elimination of over age and special grade pupils has helped the boys more than the girls, and has altered the general relations between the sexes. This fact is also indicated by the average difference in the percentages that all subjects pass each test, the average for non-selected subjects being -1.4% and for selected subjects +1.6%. The non-selected boys from 6 to 12 were given, in all, 2436 tests, these tests being passed 60.8% of the time. The non-selected girls were given 2195 tests, passing 61.6%, the advantage being 0.8% in their favor. The selected boys were given 1125 tests, passing 64.3%, an advantage of 0.1% over the girls, who passed 64.2% of 1034 tests.
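The pooled figures just quoted can be set side by side in a few lines. The sketch below uses only the numbers given in the text; the dictionary layout and the function name are introduced here for illustration. The differences are in percentage points, boys minus girls.

```python
# Figures quoted in the text: (number of tests given, percentage passed).
pooled = {
    "non-selected": {"boys": (2436, 60.8), "girls": (2195, 61.6)},
    "selected": {"boys": (1125, 64.3), "girls": (1034, 64.2)},
}


def boys_advantage(group):
    """Percentage passed by the boys minus that passed by the girls."""
    return round(group["boys"][1] - group["girls"][1], 1)


advantages = {name: boys_advantage(group) for name, group in pooled.items()}

# Net change in the boys' standing produced by the selection of subjects.
shift = round(advantages["selected"] - advantages["non-selected"], 1)

print(advantages, shift)
```

The sign of the advantage reverses between the two methods, although the whole movement is under one percentage point.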
The foregoing changes indicate clearly that the selection of subjects has changed the general relations between the sexes, helping the boys more than the girls.

The relations between the results of selected and non-selected subjects may be studied by a comparison of the differences between the percentages passed by all subjects. If the differences between the scores of the boys and girls are due to but one factor, that of sex differences, then the correlation between the two methods of study should be very nearly absolute. The correlation (Pearson product-moments formula) between the differences in the percentage passed by all boys and girls according to the two methods is 0.726 (pe=0.075). This correlation between the two methods is high, but it would probably be high in any case, inasmuch as the 167 selected subjects are included in the 352 non-selected subjects. The results of the two methods show certain large discrepancies. The changes of the greatest magnitude are those shown by the 60 words test (+4% by the first method to +18% by the second), the tests of defining in terms superior to use (+7% to +21%), of naming the days of the week (-16% to -2%), giving rhymes (-10% to +1%), naming colors (-14% to -4%), copying the diamond (+1% to -8%), and counting from 20 to 0 (-8% to -16%).

The comparison of the median differences shows that the selected method tends to improve the results of the boys more than those of the girls. Not all of the changes in the results of the two methods are in favor of the boys, however, the total scores on the diamond and 20 to 0 tests showing changes in favor of the girls. If the cause of the variations shown by the first method is the presence of a few children of non-English speaking parents, and of special class and minus grade children, then the elimination of this source of error should change the results in only one direction.
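The probable error attached to the correlation of 0.726 can be checked against the classical formula for the probable error of a Pearson coefficient, 0.6745(1 - r²)/√n. The text does not say which formula was used, nor how many paired differences entered the correlation; the sketch below assumes this formula and assumes n = 18, a value chosen only because it yields a probable error of the reported size.

```python
import math


def probable_error_of_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


r_reported = 0.726  # correlation given in the text
n_assumed = 18      # ASSUMPTION: the number of paired differences is not stated

pe = probable_error_of_r(r_reported, n_assumed)

# A coefficient several times its probable error was conventionally
# treated as reliable; here the ratio is a little under ten.
ratio = r_reported / pe

print(round(pe, 3), round(ratio, 1))
```

On these assumptions the coefficient is many times its probable error, which agrees with the text's reading that the correlation is high, while leaving open the caveat that the selected subjects are a subset of the non-selected.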
The analysis of the results of selected subjects, therefore, does not lessen the difficulty of the interpretation of the results in the light of sex differences.

The rate of growth of the various abilities with age is irregular. The analysis of the irregularities points to the fact that the boys or girls of any age may be a chance selection of superior or inferior subjects at that age. The method of comparing selected subjects would tend to eliminate the inferior selection of subjects, but would not eliminate the possibility of a superior selection. The comparison of the results of the sexes shows differences at certain ages and on certain tests that are as high as 20%. The problem involved is that of deciding whether these large differences are due to chance or to differences in the reactions of the sexes. Certain tests show large deviations first in favor of one sex and then in favor of the other. If a difference of a percentage of any magnitude on any test is to be attributed to a sex difference, then the same line of reasoning will show that in certain tests the abilities change from one sex to the other. The analysis of the tests that show this crossing of ability should throw light on the other tests.

Three tests show substantial differences in favor of both sexes according to both methods. In the test of copying the diamond, the non-selected girls lead at the start, age 6, and the boys are ahead at 7, 8 and 9, the same relations being shown by selected subjects of 6 and 8. In the test of copying the designs from memory, the non-selected girls are 24% below the boys at age 9 and 21% above the boys at age 12, the same relations being shown by the selected subjects of 10 and 12. In the test of naming 60 words in three minutes, the non-selected girls are 19% above the boys at 9, and 19% below at 12. The selected boys of 10 and 12 are in advance of the girls in this test.
These three tests are crucial in the consideration of the problem of whether differences shown between the boys and girls are due to actual sex differences or due to accidental causes. Each of these tests may be studied by a method more accurate than that of comparing the percentage passed at each age. The reproductions of the diamond were arbitrarily sorted in six groups according to their merits by a method described in the discussion of the personal equation. The first group contained the best reproductions, the sixth, the poorest. The reproductions of the designs were graded from 0 to 20 by an arbitrary point system described under the discussion of the personal equation. A measure of the ability in the 60 word test is the actual number of words given in three minutes, a measure recorded by the experimenters in each case. Table 16 shows the average score made by the non-selected and selected boys and girls of each age in these three tests.

TABLE 16
Average Score (Mean Variation) of Subjects of Each Age on Three Tests.

Copying the Diamond: average group of the reproductions.
Drawing the Designs: average number of points scored.
Naming 60 Words: average number of words given in three minutes.

                      Copying the Diamond          Drawing the Designs          Naming 60 Words
Subjects       Age    Boys          Girls          Boys          Girls          Boys            Girls
Non-selected     6    4.27 (1.28)   3.57 (1.24)    -             -              -               -
                 7    2.85 (1.04)   3.17 (1.37)    -             -              -               -
                 8    2.20 (1.15)   3.24 (1.57)    8.06 (6.19)   9.00 (5.25)    -               -
                 9    2.33 (0.89)   3.00 (1.29)    10.29 (5.30)  5.32 (4.61)    52.93 (11.20)   59.91 (10.10)
                10    -             -              9.17 (5.33)   9.18 (6.73)    68.12 (13.12)   61.76 (11.25)
                11    -             -              8.64 (6.73)   10.94 (7.06)   73.65 (13.35)   71.28 (14.25)
                12    -             -              8.64 (6.02)   11.08 (6.08)   68.75 (12.28)   58.14 (12.57)
Selected         6    4.27 (1.20)   3.33 (1.26)    -             -              -               -
                 8    2.32 (1.00)   3.00 (1.17)    -             -              -               -
                10    -             -              9.55 (5.60)   7.29 (6.42)    67.31 (12.74)   62.13 (11.39)
                12    -             -              12.53 (5.38)  13.56 (5.55)   75.33 (10.92)   66.84 (13.87)

The relations indicated by the percentage passed are also indicated by the more reliable method of comparing the average scores.
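The comparison of average scores can be made explicit for one of the three tests. The sketch below takes the diamond averages from Table 16 (group 1 holds the best reproductions, so a lower average is the better score) and computes the lead of the girls at each age, together with the improvement of each sex from 6 to 9. The variable and function names are introduced here for illustration only.

```python
# Copying the diamond, from Table 16: average group (1 = best, 6 = poorest).
# Each entry is (boys' average, girls' average).
diamond = {
    ("non-selected", 6): (4.27, 3.57),
    ("non-selected", 7): (2.85, 3.17),
    ("non-selected", 8): (2.20, 3.24),
    ("non-selected", 9): (2.33, 3.00),
    ("selected", 6): (4.27, 3.33),
    ("selected", 8): (2.32, 3.00),
}


def girls_lead(boys, girls):
    """Positive when the girls have the lower (better) average group."""
    return round(boys - girls, 2)


leads = {key: girls_lead(*pair) for key, pair in diamond.items()}

# Improvement from age 6 to age 9 among non-selected subjects (drop in average group).
boys_gain = round(diamond[("non-selected", 6)][0] - diamond[("non-selected", 9)][0], 2)
girls_gain = round(diamond[("non-selected", 6)][1] - diamond[("non-selected", 9)][1], 2)

for key, lead in leads.items():
    print(key, lead)
print("gain 6 to 9:", boys_gain, girls_gain)
```

The lead passes from the girls at 6 to the boys at 7, 8 and 9, and the boys gain nearly two groups where the girls gain about half a group.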
In the test of copying the diamond, the 6 year non-selected girls average 0.70 group better than the boys, while the selected girls are 0.94 ahead. The comparison of the 7, 8 and 9 year subjects shows the boys ahead in all cases, the 8 year non-selected boys averaging over one group higher. The non-selected boys show an improvement of two groups from 6 to 9, while the girls show an improvement of only half a group. One sex shows a decided growth of ability, the other practically none. If the differences indicated are to be taken as real, it will be necessary to assume that the girls pick up the ability to draw a diamond more easily than the boys, but that this ability once obtained remains constant; that is, that the effect of maturity operates on one sex but not on the other. The number of cases on which this assumption is based (174 subjects from 6 to 9) is so small, and the chances of variation in the selection of subjects of different intellectual status in each age group are so large, that the assumption is not substantiated.

The relations indicated in the test of copying the designs are more variable than those of the diamond test. The 9 year non-selected boys show an improvement over the 8 year boys, but from 9 to 12 there is a gradual decrease in the ability, so that the 11 and 12 year boys are only slightly ahead of the 8 year boys. The relations shown by the non-selected girls are exactly the reverse of those of the boys. The 9 year girls are very much lower than the 8 year girls, and a gradual increase appears from 9 to 12 instead of a decrease. The comparison of these opposite relations gives a maximum difference in favor of the boys at 9 and the girls at 12.
If the relations indicated in this test are to be considered definite, the assumption is involved that the influence of increasing age on one sex is exactly opposite to that on the other sex, an assumption that is not substantiated in view of the small number of cases (183 subjects from 8 to 12) and the possibility of selecting subjects of chance superiority in the small groups at each age.

The relations indicated in the test of naming 60 words are more constant than those shown in the diamond or design tests. Both sexes show a growth of ability from 9 to 11 and a decrease from 11 to 12. The growth is irregular, however, the girls showing less growth from 9 to 10, and a greater drop from 11 to 12, so that a comparison of the sexes shows a deviation in favor of the girls at 9 and of the boys at 12. The assumption of any large sex differences in this test involves the assumption that 12 year girls have less ability in this test than 9 year girls, and that the influence of maturity operates differently on the two sexes, an assumption that is not substantiated in view of the many variable factors.

The conclusion that a definite crossing of ability between the sexes occurs in the tests of copying the diamond, copying designs and naming 60 words is not substantiated. It is not justifiable to attribute a difference of 20% between the sexes to a real sex difference on one test and not on another. If the differences shown between the results of the sexes in the tests of constructing a sentence containing one idea, of naming the months, naming the days of the week, counting stamps and naming colors are to be attributed to sex differences, then the variations in ability shown in the diamond, design and 60 word tests must be assumed to be definite.
These assumptions were not found to be substantiated, however, so that it is not possible to draw any conclusions concerning sex differences from a study of the percentage that selected or unselected subjects of each age pass each test.

The variable influences due to the selection of subjects of different status at each age are eliminated or counterbalanced to some extent by combining the subjects of all ages. The differences between the percentages that all boys and girls pass each test are to some extent influenced by the ages of the subjects to whom each test was given. The correlation (Pearson product-moments formula) of the differences between the percentages that all non-selected boys and girls passed each test with the difference between the average ages of all the non-selected boys and girls to whom each test was given is 0.394 (pe=0.134). The correlation between the same arrays from selected subjects (i.e. between Columns D and E of Table 15) is 0.388 (pe=0.135). These correlations between the tests and age are high enough to indicate that the factor of age is present to some extent. The close correspondence in the correlations from the two methods indicates that the age factor is present to the same extent in both methods. The tests vary in the degree with which they correlate with age, so that it is not possible to estimate the amount of the influence of this factor. Furthermore, it has been seen that the results from the two methods are not in strict accordance, that the elimination of inferior subjects caused changes in the results in both directions. For these reasons, it is not possible to draw any conclusions concerning sex differences from a comparison of the percentages passed by all subjects.

Certain negative conclusions are, however, possible. The number of subjects at each age in both methods is comparatively small.
The chances of variation due to factors other than sex differences have been shown to be very large. The fact of correspondence between the results of the two sexes is therefore of more importance than the fact of divergence. 75% of the differences between the non-selected boys and girls are 17% or under, while the same proportion of the differences between selected boys and girls falls under 14%. If it is assumed that the subjects of any age should not test lower than those of any preceding age, and allowance is made for differences between the sexes that are exaggerated on account of the chance falling off of ability with older subjects, only 9% of the differences between the non-selected boys and girls are over 20% (derived from Table 13). The evidence from the foregoing methods of study points to the conclusion that the sex differences, if present, are under 20% or 25% as a maximum, and that deviations of this magnitude are marked exceptions to the general run of differences.

The conclusion that the differences that might possibly be attributed to the sex factor are slight has no meaning unless the word "slight" is defined independently of the writer's personal opinion. The differences shown between the results of the sexes are smaller than those that were attributed to the factor of the personal equation in the study of the results of the four experimenters. It was concluded that certain tests were influenced by grade training. These tests showed from 40% to 60% improvement from one grade to another, so that the greatest influence that may be attributed to the sex factor is only approximately one half that due to grade training. The following study of the diagnostic value of the tests will show that the deviations that might be attributed to the sex factor are insignificant when compared to the differences between the reactions of normal and retarded children to the individual tests.
Most of the investigators who have studied the factor of sex differences in the Binet tests have studied them from the standpoint of the "mental ages" or total scores made by the subjects of both sexes. A few investigators have studied sex differences in the light of the individual tests. Descoeudres (20) reports the results of the application of the Binet tests to 24 subjects, one good and one poor pupil of each sex from each of six school grades, drawing conclusions from this investigation concerning the diagnostic value of the individual tests and the sex differences involved. Obviously the number of subjects is too small to allow any conclusions to be drawn. Chotzen (18) compared the percentage that all feeble-minded boys and girls passed each of 15 tests, finding differences varying in magnitude from 1% to 20%. The largest deviations were those of 20% in favor of the boys in the test of copying the diamond, 13% in favor of the girls in the test of executing three commissions, 12% in favor of the boys in naming the pieces of money, 11% in favor of the girls in the test of repeating a sentence of 16 syllables, and 10% in favor of the girls in detecting omissions in pictures. All other differences were less than 10%.

Bloch and Preiss (9) examined 155 normal Volksschule children (79 boys and 76 girls) varying in age from 7 to 13. Bobertag's translation was used. These investigators found very striking differences in the reaction of the sexes to the individual tests, the differences running as high as 52%, most of them in favor of the boys. The differences between the performances of the boys and girls of each age were calculated without reference to the many sources of variation. The factor of the personal equation is not treated, and this factor alone might cause these variations. If a more careful analysis of the results had been made, it is very probable that the conclusions would have been modified to some extent.
The fact that the 11 year boys are 37% higher than the 11 year girls on the test of criticising absurdities is most certainly modified by the fact that the 11 year subjects are 30% lower than the 10 year subjects in the test of repeating 7 digits. The small number of subjects (in five cases less than 10) would tend to emphasize chance variations. The fact that the number of subjects is too small to warrant definite conclusions is pointed out by the authors. Stern (62), in commenting upon these results, points out the significance of the fact that the inferiority of the girls extends to so many different kinds of tests. The results of Bloch and Preiss are in almost complete contradiction to the results of the present investigation. They find large differences, and find practically all of these differences in favor of the boys. This investigation shows a general run of differences very much smaller, and a slight general superiority of the non-selected girls. The mere fact of contradiction in the results of the two investigations would indicate that the differences were not produced by the common factor of sex. Rogers and McIntyre (54) give no figures, but report that they have studied their results in the light of sex differences, and have found no correlation between their results and those of Bloch and Preiss.

The results of the investigators who have compared the "mental ages" or total scores of children of different sexes are somewhat at variance. Goddard (30) reports that there are more backward boys than girls. Stern notes that Goddard's results do not bear out his statement, for the percentage of boys and girls testing two or more years retarded is the same (18.5%). The accuracy of Goddard's statement depends on the criterion¹ used for measuring backwardness. Although Goddard's statement concerning the backwardness of the boys may be interpreted differently, his figures leave no doubt concerning the fact that there are more girls than boys at and above age, and therefore indicate a general superiority of the girls.

¹ If the criterion is four or more years retarded, there are more backward boys than girls (boys = 3.7%, girls = 3.1%). If the criterion is three or more years backward, there are more girls than boys (boys = [illegible]%, girls = 9.1%). If the criterion is two or more years backward, the proportions are the same, as Stern notes. If the criterion is one year or more retarded, there are more backward boys than girls (boys = 41.4%, girls = 35.6%). There are more girls than boys testing at and above age according to Goddard's results. 34.7% of the boys and 36.6% of the girls test at age, while 23.8% of the boys and 27.7% of the girls test one year or more above age.

Bobertag (10) computed the average "mental age" of 90 boys and 90 girls regularly distributed from 7 to 12. The subjects were selected according to school grades, so that the average grade of each group differed by exactly one grade. His results show the boys ahead 0.06 yr. at 7, 0.14 yr. at 8 and 9, 0.20 yr. at 10, 0.19 yr. at 11 and 0.14 yr. at 12. These findings cannot be considered entirely out of harmony with those of Goddard, for, as this investigation shows, there may be a change in the relation of non-selected boys and girls and selected boys and girls.

Yerkes and his co-workers (82), scoring some of the Binet tests according to the point system, show that the girls of English speaking parents are superior to the boys of the same parentage between 5 and 7, that they fall below with minor variations till 11, that they again surpass the boys at 12 and 13, and that they fall below at 14 and 15.
The differences between the sexes are smaller and of less practical importance than the differences due to the language factor, but the authors suspect "that at certain ages serious injustice will be done to individuals by evaluating their scores in the light of norms which do not take account of sex differences" (page 73). In contradiction to these results are those of Terman and his co-workers (67), who, scoring the Stanford revision of the Binet scale according to "intelligence quotients," find differences of but 2% to 4% in these quotients in favor of the girls, and who conclude from the basis of their studies of sex differences that the conclusions of Yerkes are unjustified. These two investigations used tests different in character and differently weighted, so that the results would not necessarily have to correspond. The one common feature of most of the researches on sex differences in the Binet-Simon tests is that the differences are small.

Burt and Moore (17) summarize the work of various investigators in the general field of sex differences, and report an investigation of their own on 67 boys and 63 girls, 12½ to 13½ years of age. They discuss their results and those of the other authors in the order of the complexity of the mental processes involved. They find a high correlation between the size of the sex difference and the simplicity of the capacities compared: the higher the process, and the more complex the capacity, the smaller the sex difference.

The general trend of the investigations on sex differences indicates that no very large differences are to be expected in the application of intelligence tests, and that the differences to be expected will vary according to the nature of the tests. The results of this investigation are in agreement with the general trend of the investigations in showing only slight differences that might be attributed to the sex factor.
The results do not show on what tests, if any, these differences occur. Conclusions concerning the amount of influence of this factor must be drawn from more exhaustive investigations on the individual tests. The research of Bateman (3), for instance, is conclusive in the test of naming colors. Bateman shows that there is a difference of 14% in favor of the girls in this test, showing furthermore that the factor of school training causes an improvement of but 18%. The results would indicate that the test should be placed in the fifth or sixth year, but the sex difference of 14% would probably not warrant the placing of the test in a different age group for boys and girls.

The investigations of Bolton (11) and Wooley (79) would show that small differences in favor of the girls are to be expected in the tests of repeating digits, and possibly in all memory tests. The investigations of Gilbert (27), Thompson (68), Burt and Moore, and Peterson and Doll (51) would indicate that a slight difference in favor of the boys should appear in the test of arranging five weights. Ruger's (55) finding of striking differences in favor of men in a series of puzzle tests, and Wooley and Fisher's finding of large differences in favor of the boys in the Healy puzzle-box test, would show that rather large differences might appear in the general class of "puzzle" tests.

Even though the sex differences in intelligence tests may be shown to be small, scientific procedure should demand that the investigator who standardizes any test or system of tests should treat his results in such a way as to demonstrate that the factor is present or not present. The burden of proof should still be on the person who maintains that sex differences are not involved.
The knowledge of sex differences is especially important in diagnosing border-line cases of mental defect, where the diagnosis must often be made on the qualitatively different character of the responses to individual tests.

VI. SUMMARY

One of the fundamental assumptions in the construction of the Binet-Simon scale is the correlation of the individual tests with age. The correlation of the tests with age is affected by the error due to incomplete data, by the influence of the personal equation of the experimenter, and by the training the subject has received in school.

The influence of the personal equation of the experimenter was found to be more marked in some tests than in others, the influence being most marked in the tests of copying the diamond, indicating omissions in pictures, defining in terms superior to use, drawing designs from memory, detecting absurdities in statements and reconstructing dissected sentences. The variations between the experimenters could be traced to three sources: 1) to the use of apparatus, variations in which were due to a) the construction of the test material, and b) the use of alternative questions; 2) to the technique of the experimenters in giving the tests; and 3) to observation errors made by the experimenters in marking a response passed or failed. It is possible to eliminate all three sources of error.

The effect of school training was more marked on some tests than on others, the effect being most marked in the tests of counting stamps, counting backward from 20 to 0, enumerating the days of the week and the months, giving the day and the date, naming the pieces of money, making change, and reconstructing dissected sentences. Tests that involve school training should be standardized on a different basis than those relatively independent of this factor.

Although the comparison of "mental ages" and pedagogical ages gives no information concerning the general correlation between the Binet tests and the school grades, the study of the individual tests establishes the fact of a general correlation. The correlation of the individual tests with grade is higher than the correlation of the tests with age, this fact being indirect evidence of the value of the tests as measures of intelligence.

Sex differences were found to be slight as compared with the influence due to the personal equation or grade training.

Since variations occur in the results due to the influence of the personal equation and grade training, certain allowances must be made for these factors in making diagnoses on the basis of the tests. The scale is therefore a qualitative rather than a quantitative instrument.

The investigator who wishes to use his results for standardizing age norms should use only those data based on the complete method of experimenting, and should treat his results in such a way as to demonstrate the presence or absence of the variable factors of the personal equation, grade training and sex differences.

BIBLIOGRAPHY

1. Abelson, A. R. The Measurement of Mental Ability of "Backward" Children. Brit. J. of Psychol., 1911, 4, 268-314.
2. Ayres, L. P. The Binet-Simon Measuring Scale of Intelligence: Some Criticisms and Suggestions. Psychol. Clinic, 1911, 5, 187-196.
3. Bateman, W. G. The Naming of Colors by Children. Ped. Sem., 1915, 22, 469-486.
4. Binet, A. Nouvelles recherches sur la mesure du niveau intellectuel chez les enfants d'ecole. Annee psychol., 1911, 17, 145-201.
5. Binet, A. and Simon, T. Methodes nouvelles pour le diagnostic du niveau intellectuel des anormaux. Annee psychol., 1905, 11, 191-244.
6. Binet, A. and Simon, T. Application des methodes nouvelles au diagnostic du niveau intellectuel chez des enfants normaux et anormaux d'hospice et d'ecole primaire. Annee psychol., 1905, 11, 245-336.
7. Binet, A. and Simon, T. Le developpement de l'intelligence chez les enfants.
Annee psychol., 1908, 14, 1-94.
8. Binet, A. and Simon, T. La mesure du developpement de l'intelligence chez les jeunes enfants. Bull. de la soc. libre pour l'etude psychol. de l'enfant, 1911, 11, 187-256.
9. Block, E. and Preiss, A. Ueber Intelligenzprüfungen an normalen Volksschulkindern nach Bobertag (Methode von Binet und Simon). Zsch. f. angew. Psychol., 1912, 6, 539-547.
10. Bobertag, O. Ueber Intelligenzprüfungen (nach der Methode von Binet und Simon). I. Methodik und Ergebnisse der einzelnen Tests. Zsch. f. angew. Psychol., 1911, 5, 105-203. II. Gesamtergebnisse der Methode. Zsch. f. angew. Psychol., 1912, 6, 495-537.
11. Bolton, T. L. The Growth of Memory in School Children. Amer. J. of Psychol., 1892, 4, 362-380.
12. Bonser, F. G. The Reasoning Ability of Children of the Fourth, Fifth and Sixth School Grades. New York: Columbia Univ., 1910, pp. 133.
13. Bridgman, O. Mental Deficiency and Delinquency. J. of Amer. Med. Assoc., 1913, 61, 471-472.
14. Brigham, C. C. An Experimental Critique of the Binet-Simon Scale. J. of Educ. Psychol., 1914, 5, 439-448.
15. Buffalo conference. J. C. Bell, C. S. Berry, W. S. Cornell, E. A. Doll, J. E. W. Wallin, G. M. Whipple. Informal Conference on the Binet-Simon Scale: Some Suggestions and Recommendations. J. of Educ. Psychol., 1914, 5, 95-100.
16. Burt, C. Experimental Tests of General Intelligence. Brit. J. of Psychol., 1910, 3, 94-177.
17. Burt, C. and Moore, R. G. The Mental Differences between the Sexes. J. of Exp. Ped., 1912, 1, 273-284, 355-388.
18. Chotzen, F. Die Intelligenzprüfungsmethode von Binet-Simon bei schwachsinnigen Kindern. Zsch. f. angew. Psychol., 1912, 6, 411-494.
19. Decroly, O. and Degand, J. La mesure de l'intelligence chez des enfants normaux d'apres les tests de M. Binet et Simon: nouvelle contribution critique. Arch. de psychol., 1910, 9, 81-108.
20. Descoeudres, A. Les tests de Binet et Simon et leur valeur scolaire. Arch. de psychol., 1911, 11, 331-350.
21. Descoeudres, A.
Exploration de quelques tests d'intelligence chez des enfants anormaux et arrieres. Annee psychol., 1911, 11, 351-375.
22. Doll, E. A. Inexpert Binet Examiners and their Limitations. J. of Educ. Psychol., 1913, 4, 607-609.
23. Dougherty, M. L. Report on the Binet-Simon Tests given to 483 Children in the Public Schools of Kansas City, Kansas. J. of Educ. Psychol., 1913, 4, 338-352.
24. Dresslar, F. B. Studies in the Psychology of Touch. Amer. J. of Psychol., 1894, 6, 313-368.
25. Ebbinghaus, H. Ueber eine neue Methode zur Prüfung geistigen Fähigkeiten und ihre Anwendung bei Schulkindern. Zsch. f. Psychol., 1897, 13, 401-459.
26. Fernald, W. E. The Diagnosis of the Higher Grades of Mental Defect. Amer. J. of Insan., 1914, 70, 741-752.
27. Gilbert, J. A. Researches on the Mental and Physical Development of School Children. Stud. fr. Yale Psychol. Lab., 1894, 2, 40-100.
28. Goddard, H. H. The Binet-Simon Measuring Scale for Intelligence. (Revised edition) Vineland, N. J.: The Training School, 1911, pp. 16.
29. Goddard, H. H. Standard Method of giving the Binet Test. Training School, 1913, 10, 23-32.
30. Goddard, H. H. Two Thousand Normal Children Measured by the Binet Measuring Scale of Intelligence. Ped. Sem., 1911, 18, 232-259.
31. Goddard, H. H. Three Annual Testings of 400 Feeble-Minded Children and 500 Normal Children. Psychol. Bull., 1913, 10, 75-77.
32. Haines, T. H. Diagnostic Value of some Performance Tests. Psychol. Rev., 1915, 22, 299-305.
33. Healy, W. The Individual Delinquent. Boston: Little Brown & Co., pp. 830.
34. Healy, W. and Fernald, G. M. Tests for Practical Mental Classification. Psychol. Monog., 1911, 13 (No. 54) pp. 53.
35. Huey, E. B. The Binet Scale for Measuring Intelligence and Retardation. J. of Educ. Psychol., 1910, 1, 435-444.
36. Huey, E. B. A Point Scale of Tests for Intelligence. Baltimore: Warwick & York (folder) 4 pp.
37. Katzenellenbogen, E. W.
A Critical Essay on Mental Tests in their Relation to Epilepsy. Epilepsia, 1913, 4, 130-173.
38. Kite, E. S. The Binet-Simon Measuring Scale of Intelligence. Philadelphia: Committee on Provision for the Feeble-Minded, Bull. no. 1, pp. 29.
39. Kite, E. S. The Development of Intelligence in Children. (Contains translations of nos. 5, 6, and 7). Vineland, N. J.: The Training School (Publications of the Department of Research, No. 11), 1916, pp. 328.
40. Kite, E. S. The Intelligence of the Feeble-Minded. (Translation of three articles by Binet and Simon on Feeble-mindedness) Vineland, N. J.: The Training School (Publications of the Department of Research, No. 12), 1916, pp. 328.
41. Kohs, S. C. The Binet-Simon Measuring Scale of Intelligence: an Annotated Bibliography. J. of Educ. Psychol., 1914, 5, 215-224, 279-290, 335-346.
42. Kohs, S. C. The Practicability of the Binet Scale and the Question of the Borderline Case. Training School, 1916, 12, 211-224.
43. Kuhlman, F. Some Results of Examining a Thousand Public School Children with a Revision of the Binet-Simon Tests of Intelligence by Untrained Examiners. J. of Psycho-Asthenics, 1914, 18, 233-269.
44. Martin, A. L. A Contribution to the Standardization of the De Sanctis Tests. Training School, 1916, 13, 93-110.
45. Meumann, E. Vorlesungen zur Einführung in die experimentelle Pädagogik und ihre psychologischen Grundlagen. Leipzig: W. Engelmann, 1913, Vol. II, pp. 800.
46. Meumann, E. Ueber eine neue Methode der Intelligenzprüfung und über den Wert der Kombinationsmethoden. Zsch. f. päd. Psychol. und exp. Päd., 1912, 13, 145-163.
47. Morrow, L. and Bridgman, O. Delinquent Girls Tested by the Binet Scale. Training School, 1912, 9, 33-36.
48. Norsworthy, N. The Psychology of Mentally Deficient Children. New York: (Columbia Univ. thesis) 1906, pp. 111.
49. Otis, A. S. Some Logical Aspects of the Binet Scale. Psychol. Rev., 1916, 23, 129-152, 165-179.
50. Otis, M.
The Binet Tests Applied to Delinquent Girls. Psychol. Clinic, 1913, 7, 127-134.
51. Peterson, A. M. and Doll, E. A. Sensory Discrimination in Normal and Feeble-Minded Children. Training School, 1914, 11, 110-118, 135-144.
52. Pillsbury, W. B. The Psychology of Reasoning. New York: D. Appleton & Co., 1910, pp. 304.
53. Pyle, W. H. A Psychological Study of Bright and Dull Pupils. J. of Educ. Psychol., 1915, 6, 151-156.
54. Rogers, A. L. and McIntyre, J. L. The Measurement of Intelligence in Children by the Binet-Simon Scale. Brit. J. of Psychol., 1915, 7, 265-299.
55. Ruger, H. A. Sex Differences in the Solution of Mechanical Puzzles. (In report of New York branch of American Psychological Assoc.) J. of Phil., Psychol., etc., 1914, 11, 412-413.
56. Schmitt, C. The Binet-Simon Tests of Mental Ability. Ped. Sem., 1912, 19, 186-200.
57. Schmitt, C. Standardization of Tests for Defective Children. Psychol. Monog., 1915, 19 (No. 83) pp. 181.
58. Simpson, B. R. Correlations of Mental Ability. New York: Columbia Univ., 1912, pp. 122.
59. Smith, F. O. The Effect of Training in Pitch Discrimination. Univ. Iowa Stud. in Psychol., Vol. VI. Psychol. Monog., 1914, 16 (No. 69) 67-103.
60. Stenquist, J. L., Thorndike, E. L. and Trabue, M. R. The Intellectual Status of Children who are Public Charges. Arch. of Psychol., 1915, 33, pp. 52.
61. Stern, W. Die differentielle Psychologie in ihren methodischen Grundlagen. Leipzig: Barth, 1911, pp. 503.
62. Stern, W. The Psychological Methods of Testing Intelligence. (Whipple, G. M., trans. fr. German) Educ. Psychol. Monog., No. 13, Baltimore: Warwick & York, 1914, pp. 160.
63. Symposium on Mental Tests. (Conducted by C. E. Seashore under "Communications and Discussions") J. of Educ. Psychol., 1916, 7. (R. M. Yerkes, 163-164).
64. Terman, L. M. Genius and Stupidity. Ped. Sem., 1906,
65. Terman, L. M. The Measurement of Intelligence. Boston: Houghton Mifflin Co., 1916, pp. 362.
66. Terman, L. M.
and Childs, H. G. A Tentative Revision and Extension of the Binet-Simon Measuring Scale of Intelligence. J. of Educ. Psychol., 1912, 3, 61-74, 133-143, 198-208, 277-289.
67. Terman, L. M., Lyman, G., Ordahl, G., Ordahl, L., Galbreath, N. and Talbot, W. The Stanford Revision of the Binet-Simon Scale, and some Results from its Application to One Thousand Non-Selected Children. J. of Educ. Psychol., 1915, 6, 551-562.
68. Thompson, H. B. Psychological Norms in Men and Women. Chicago: Univ. of Chicago Press, 1903, pp. 188.
69. Thorndike, E. L. The Significance of the Binet Mental Ages. Psychol. Clinic, 1914, 8, 185-189.
70. Thorndike, E. L. An Introduction to the Theory of Mental and Social Measurements. New York: Teachers' College, 1913, pp. 277.
71. Thorndike, E. L., Lay, W. and Dean, P. R. The Relation of Accuracy in Sensory Discrimination to General Intelligence. Amer. J. of Psychol., 1909, 20, 364-369.
72. Town, C. H. A Method of Measuring the Development of the Intelligence of Young Children. (Authorized translation of no. 8) Lincoln, Ill.: Courier-Herald Co., 1913, pp. 82.
73. Wallin, J. E. W. Experimental Studies of Mental Defectives. Educ. Psychol. Monog. No. 7. Baltimore: Warwick & York, 1912, pp. 155.
74. Witmer, L. On the Relation of Intelligence to Efficiency. Psychol. Clinic, 1915, 9, 61-86.
75. Whipple, G. M. Manual of Mental and Physical Tests. Baltimore: Warwick & York, 1910, pp. 534.
76. Whipple, G. M. Manual of Mental and Physical Tests. Baltimore: Warwick & York, 1914, pp. 690, 2 vol.
77. W[hipple], G. M. The Amateur and the Binet-Simon Tests. J. of Educ. Psychol., 1912, 3, 118-119.
78. W[hipple], G. M. Amateurism in Binet Testing once more. J. of Educ. Psychol., 1913, 4, 301-302.
79. Wooley, H. T. A New Scale of Mental and Physical Measurements for Adolescents and some of its Uses. J. of Educ. Psychol., 1915, 6, 521-550.
80. Wooley, H. T. and Fisher, C. R. Mental and Physical Measurements of Working Children. Psychol. Monog.
1914, 18 (No. 77) pp. 247.
81. Wyatt, S. The Quantitative Investigation of Higher Mental Processes. Brit. J. of Psychol., 1914, 6, 109-133.
82. Yerkes, R. M., Bridges, J. W. and Hardwick, R. S. A Point Scale of Measuring Mental Ability. Baltimore: Warwick & York, 1915, pp. 213.

studied. The use of the normal grade age as a measure of scholastic ability is false inasmuch as it rests on the assumption that all children enter school at a certain age, which is not the case. The measure of scholastic ability is the measure of the child's reaction to the subject matter of the grades, and that measure may be expressed only in the fact of promotion, non-promotion or (very rarely) double promotion; in other words, it may be expressed only in the relation of grade to the length of time in school. Furthermore, the two measures of scholastic ability, the age in grade method and the grade progress method, are measures of an historically past performance, not of present possibilities, and the true measure of an ability must indicate potential ability. As measures of scholastic ability in terms of actual reaction, these measures present a distribution of general ability that is skewed toward the lower end, or in the direction of no ability. If a child enters school late, he presents a picture of retardation according to the age and grade method, while throug

57 82 Note 9
X-2, Designs: 21 37 42 66 Note 10
X-5, Sentence (2 ideas): 67 89 88 98 Note 11
XI-2, Sentence (1 idea): 22 46 51 74 Note 12
XI-3, 60 words: 63 63 87 Note 13
XI-4, Rhymes: 67 63 76 Note 14

Note 1. Counting 13 pennies and naming colors given 20 times above II. Not failed.
Note 2. Describing pictures given 21 times above II. Not failed.
Note 3. Copying diamond given 25 times above II. Not failed.
Note 4. Counting from 20 to 0 given 18 times in K. Not passed. Given 31 times above III. Failed once.
Note 5. Counting stamps given 15 times in K. Not passed. Given 35 times above III. Failed 3 times.
Note 6. Naming days of week. Given 32 times above III. Not failed.
Note 7. Giving day and date given 5 times in K. Not passed. Given 56 times above IV. Not failed.
Note 8. Naming months. Given 26 times below II. Passed twice. Given 44 times above IV. Failed twice.
Note 9. Naming money. Given 26 times below II. Passed 3 times. Given 28 times in VI. Failed twice.
Note 10. Copying designs given 33 times below III. Passed 5 times.
Note 11. Sentence (2 ideas) given 32 times below III. Passed 12 times.
Note 12. Sentence (1 idea) given 32 times below III. Passed 4 times.
Note 13. Giving 60 words given 53 times below IV. Passed 19 times.
Note 14. Giving rhymes given 42 times below IV. Passed 26 times.

the number of grades taken to reach their maximum. The test of naming the day and date, for example, is failed by all subjects in the kindergarten, 95% of Grade I and 65% of Grade II, while only 4% of the subjects in Grade III and none of those in the higher grades fail it. A sudden increase occurs between Grades II and III, showing possibly the influence of grade training.

The tests vary considerably in the degree of their correlation. An easily obtained measure of the degree of correlation is that of comparing the magnitude of the increases from grade to grade. For example, there is an increase of 61% (96% - 35%) from Grade II to Grade III in the ability to pass the test of giving the day and date, and an increase of 16% (36% - 20%) between the same grades in the test of naming the pieces of money.
The former test correlates higher with the influence of grade in this particular case than the latter.

In this manner the percentage difference between the performance of the subjects in each grade and that of the subjects in the preceding grade was obtained. All the increases or decreases in ability from one grade to another were thus obtained, these values serving as measures of the amount of correlation between the tests and the grades. 42 differences between the performance of the subjects in any grade and those of the next succeeding grade were thus obtained. In 4 cases there were actual decreases of 1, 2, 3 and 4% which were not significant. The difference ranged from -4% to +61%, the median being +19.5% (Q = 16.25%). Some of the differences between the grades might be due to the chance superiority of a particular grade. To overcome this chance variation, and to furnish another index of the growth of the various abilities, the differences were calculated by steps of two grades, i.e., subtracting the performance of the kindergarten from the second grade, the first from the third, etc. In this way, 26 differences were obtained varying from +9% to +91%, the median being +29% (Q = 18%).

Some of the differences noted are undoubtedly high enough to warrant the assumption of the effect of grade training on the tests. Just what tests show this effect is probably a matter of opinion. Allowance must be made for the growth of an ability independent of training. 25% of the highest increases from one grade to another were selected as being worthy of special consideration at least. A larger increase must be allowed between two grades. Those differences were considered worthy of special consideration that exceeded twice the value of the median of the one-grade differences, or 39%.
This manner of selecting the largest differences is quite arbitrary, but is justified by the outcome, for the tests that show the most significant increases according to this method show those increases in more than one step, so that the evidence is concentrated against a very few tests. In this way the significant values outweigh the less significant values and fair allowance is made for growth from one grade to another.

The following list includes the tests showing the greatest increases by one-grade and two-grade steps, together with the magnitude of the increases and the grades between which they occur.

One-grade steps (25% of largest increases):
+61% Date, II to III
+56% Months, II to III
+45% Days, I to II
+44% 20 to 0, I to II
+37% Stamps, I to II
+30% Date, I to II
+29% Diamond, K to I
+29% Days, K to I
+28% Stamps, II to III
+27% 20 to 0, II to III
+27% Pictures, K to I

Two-grade steps (increases greater than 39%):
+91% Date, I to III
+74% Days, K to II
+71% 20 to 0, I to III
+65% Stamps, I to III
+65% Date, II to IV
+62% Months, II to IV
+55% Days, I to III
+46% Money, III to IV
+42% Diamond, K to II

The above lists of increases are confined to but 8 tests. In all, there were 16 tests studied. According to the method of selecting the significant increases, 20 such values actually appeared. In this manner the evidence combines against a very few tests. Some tests appear in both lists and more than once in the same list. The most striking growth with grade is shown in the tests of giving the day and date, naming the months, naming the days of the week, counting from 20 to 0 and counting stamps. The tests of copying the diamond, describing pictures and naming money may or may not show this influence. The evidence is strongest in the case of the diamond test since that appears in both lists.
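As a rough illustration of the selection procedure just described, the sketch below takes the per cent. passing the day-and-date test in each grade (derived here from the failure figures quoted earlier: 100%, 95%, 65% and 4% failing in K to III, and none above), forms the one-grade and two-grade differences, and applies the 39% cut-off to the two-grade steps. The function and variable names are illustrative only and are not part of the original study.

```python
# Per cent. passing the day-and-date test, by grade.
# Derived from the quoted failure rates (100, 95, 65, 4, then 0); an
# illustrative reconstruction, not a table taken from the study.
GRADES = ["K", "I", "II", "III", "IV", "V", "VI"]
DATE_PASS = [0, 5, 35, 96, 100, 100, 100]

TWO_GRADE_CUTOFF = 39  # twice the median one-grade difference (19.5%)


def step_differences(passed, step):
    """Return {(lower grade, higher grade): increase in per cent.}."""
    return {
        (GRADES[i], GRADES[i + step]): passed[i + step] - passed[i]
        for i in range(len(passed) - step)
    }


one_grade = step_differences(DATE_PASS, 1)
two_grade = step_differences(DATE_PASS, 2)
significant = {k: v for k, v in two_grade.items() if v > TWO_GRADE_CUTOFF}

print(one_grade[("II", "III")])  # 61
print(significant)  # {('I', 'III'): 91, ('II', 'IV'): 65}
```

The two values that survive the cut-off agree with the Date entries in the two-grade list, and the one-grade steps add up to the two-grade steps (30% + 61% = 91%).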
The foregoing method of selecting those tests which correlate with grade to such an extent as to indicate the influence of grade training is not conclusive, owing to the fact that there is also an increase in age from grade to grade. If a test showed a very rapid growth with age, and those ages fell for the most part in certain grades, then those grades would show an increase which might be wrongly assumed to be due to training. The test of counting from 20 to 0 is a case in point. Yerkes (82) in Table 32, page 125, gives the percentage values for each test in the Point Scale, for English speaking boys and girls of each age. The test, of the twenty one tests included, that shows the most marked increase with age is that of counting backward, the values being as follows: age 4 = 0%; age 5 = 3.5%; age 6 = 23.7%; age 7 = 45.7%; age 8 = 72.2%; age 9 = 96%; the values for ages above 9 being 97% or higher.

The age in grade distribution of the 301 subjects in this investigation is given in Table 5.

TABLE 5
Distribution of Subjects in Each Grade according to Chronological Age.

Grade columns: K, I, II, III, IV, V, VI, Total. Each row gives the occupied cells in column order, then the row total.

Age 4: 4; total 4
Age 5: 17; total 17
Age 6: 11, 28, 2; total 41
Age 7: 18, 17, 2, 1; total 38
Age 8: 4, 15, 18, 1; total 38
Age 9: 5, 13, 11; total 29
Age 10: 1, 10, 14, 18, 1; total 44
Age 11: 1, 2, 3, 16, 16; total 38
Age 12: 5, 8, 11; total 24
Age 13: 4, 12; total 16
Age 14: 2, 5; total 7
Age 15: 1, 3; total 4
Age 16: 1; total 1
Total by grade: 32, 51, 40, 45, 35, 49, 49; total 301

The rapid growth of the ability in counting from 20 to 0, according to the method of comparing the subjects in each grade, was from 9% in Grade I to 80% in Grade III. From Table 5 it may be seen that practically all (89%) of the chronological ages in Grades I, II and III were distributed in the ages 6, 7, 8 and 9, a chronological range coinciding with that in which Yerkes' results show the ability to develop. The growth of this ability might be due then either to age or to grade. For this reason, to arrive at any final conclusion, it is necessary to compare the subjects of the same age but in different grades.
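The 89% figure can be checked against Table 5 with a few lines of arithmetic. The sketch below uses the number of subjects of each age found in Grades I, II and III combined; these row sums are read from the scanned table (with grade placement inferred from the grade totals and from Table 6) and should be treated as an assumption of this example rather than as independently verified data.

```python
# Subjects in Grades I-III combined, by chronological age (row sums over
# the Grade I, II and III columns of Table 5; an assumed reading of the scan).
IN_GRADES_I_TO_III = {6: 30, 7: 37, 8: 37, 9: 18, 10: 11, 11: 3}

total = sum(IN_GRADES_I_TO_III.values())  # should equal 51 + 40 + 45
aged_6_to_9 = sum(n for age, n in IN_GRADES_I_TO_III.items() if 6 <= age <= 9)
share = 100 * aged_6_to_9 / total

print(total, aged_6_to_9, round(share, 1))  # 136 122 89.7
```

On this reading the share comes to a little under 90%, in line with the 89% quoted in the text.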
The treatment of the Princeton results according to this method follows, but the analysis of the data in this manner can have no great reliability owing to the small number of subjects in each group. The number of subjects in each group (boys and girls shown separately), the average age and mean variation from this average are shown in Table 6.

TABLE 6
Number of Boys and Girls of Similar Ages in Different Grades, and the Average Age of the Subjects of Similar Ages in Each Grade.

Grade          Age   Boys   Girls   Total   Average Age    Mean Variation
Kindergarten    5     11      6      17      5.48           0.20
Kindergarten    6      8      3      11      6.[illegible]  0.21
Grade I         6     14     14      28      6.59           0.17
Grade I         7      9      9      18      7.36           0.22
Grade II        7      7     10      17      7.56           0.24
Grade II        8      6      9      15      8.39           0.24
Grade III       8      8     10      18      8.60           0.22
Grade III       9      5      8      13      9.43           0.16
Grade IV        9      5      6      11      9.65           0.13
Grade IV       10     10      4      14     10.39           0.30
Grade V        10      7     11      18     10.54           0.25
Grade V        11     10      6      16     11.54           0.22
Grade VI       11     10      6      16     11.53           0.26
Grade VI       12      6      5      11     12.52           0.14

All chronological ages were computed in tenths of a year, so that a variation in age from 0.1 yr. to 0.9 yr. is possible within each age group. That the subjects of the "same" age but in different grades are not exactly the same is shown in Table 6. The subjects of each age in the higher grades average from 0.01 yr. to 0.33 yr. different, with an average superiority of 0.19 yr. This difference, however, is about one fourth that between the subjects of different ages in the same grades, and may be called the same for practical purposes. For convenience, the groups will be referred to as K-5, II-7 etc., the first member referring to the grade, the second to the age. K-5 would mean the group of 5 year children in the kindergarten, II-7 the 7 year subjects in Grade II, etc.

[TABLE 7. Per cent. of the subjects in each age and grade group passing each test. The body of the table is not recoverable from the source; its notes follow.]

Note 1. Tests of counting 13 pennies, describing pictures and naming colors each given 12 times above II-8. No failures.
Note 2. Copying diamond given 15 times above II-8. No failures.
Note 3. Counting from 20 to 0 given 16 times below I-6. Not passed. Given 31 times above III-8. Failed 4 times.
Note 4. Counting stamps given 14 times below I-6. Not passed. Given 32 times above III-8. Failed 4 times.
Note 5. Giving days of week given [?]2 times above III-8. No failures.
Note 6. Giving date given 39 times below II-7. Passed twice. Given 36 times above IV-10. No failures.
Note 7. Naming months given 24 times below II-7. Passed twice. Given 37 times above IV-9. Failed 4 times.
Note 8. Naming pieces of money given 35 times below II-8. Passed 4 times. Given 14 times above V-11. Failed twice.
Note 9. Copying designs given 26 times below III-8. Passed 5 times. Given 15 times above V-11. Failed 6 times.
Note 10. Three words in sentence, 2 ideas, given 24 times below III-8. Passed 9 times.
Note 11. Sentence, 1 idea, given same as 2. Passed 3 times.
Note 12. 60 words in 3 minutes given 41 times below IV-9. Passed 10 times.
Note 13. Giving rhymes given 37 times below IV-10. Passed 25 times.

The actual per cent. that the subjects in each group passed each test was calculated and is shown in Table 7. Unless otherwise noted, the percentages are based on tests given 75% to 100% of the possible number of times. Some of the groups from which results were obtained are too small to have great reliability, but the method is at least suggestive. The results of 14 groups are given.
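The comparison of "same age" groups in adjacent grades can be reproduced from the average ages in Table 6. In this sketch the kindergarten 6-year value is left out because it is illegible in the source; the closing lines only note what difference would be consistent with the figures quoted in the text, which is an assumption and not a recovered reading. All names are illustrative.

```python
# Average ages (years) from Table 6, keyed by (grade, age group).
# K-6 is omitted: its average age is illegible in the scanned table.
AVG_AGE = {
    ("I", 6): 6.59, ("I", 7): 7.36, ("II", 7): 7.56, ("II", 8): 8.39,
    ("III", 8): 8.60, ("III", 9): 9.43, ("IV", 9): 9.65, ("IV", 10): 10.39,
    ("V", 10): 10.54, ("V", 11): 11.54, ("VI", 11): 11.53,
}
# (lower grade, higher grade, shared age group)
PAIRS = [("I", "II", 7), ("II", "III", 8), ("III", "IV", 9),
         ("IV", "V", 10), ("V", "VI", 11)]

# Higher-grade average minus lower-grade average for each shared age.
diffs = {age: round(AVG_AGE[(hi, age)] - AVG_AGE[(lo, age)], 2)
         for lo, hi, age in PAIRS}
print(diffs)

# Assumption: if the age-6 pair supplies the quoted maximum of 0.33 yr.,
# the K-6 average would be about 6.59 - 0.33 = 6.26, and the mean absolute
# difference over all six pairs rounds to the quoted 0.19 yr.
assumed_k6_diff = 0.33
mean_abs = (sum(abs(d) for d in diffs.values()) + assumed_k6_diff) / 6
print(round(mean_abs, 2))
```

The five legible pairs differ by between 0.01 and 0.22 yr., so the quoted maximum of 0.33 yr. can only belong to the age-6 pair if the text's range is taken at face value.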
It is possible then to compare the results of subjects of 6 ages (6, 7, 8, 9, 10 and 11) that are in different grades, and also to compare subjects in all seven grades that are of different ages, and in this way to determine whether the dominating factor in the growth of any ability is that of grade or age. The reliability of the method rests only on its connection with that of the first method employed.

In answer to the question of whether the growth of ability in the test of counting from 20 to 0 is due to age or grade, a question which was unanswered by the first method, we may turn to the results shown in Table 7, in which the subjects of each age in each grade are shown. The test of counting from 20 to 0 was not passed by any of the 5 and 6 year subjects in the kindergarten. Comparing first the subjects of different ages in the same grade, the 7 year subjects in Grade I are 16% lower than the 6 year subjects in that grade, and the 8 year subjects in Grade II are 20% lower than the 7 year subjects in the same grade, the older subjects making a lower record in each case. Comparing the performance of the subjects of the same age but in different grades, the 7 year subjects in Grade II are 63% ahead of the subjects of the same age in Grade I, while the 8 year subjects in Grade III are 40%¹ ahead of the subjects of the same age in Grade II. Allowing for the retrogression of the older subjects in each group, i.e. assuming that they should have done equally as well as the younger subjects in the same grade, the groups in Grades II and III are still 47% and 20% ahead of the subjects in the grades lower. The growth of ability in this test would therefore appear to be due to grade training.

A rapid growth of ability in the test of counting stamps occurred between Grades I and III (37% I-II + 28% II-III = 65% I-III), according to the first method, so that the same question arises as in the test of counting from 20 to 0. The test was not passed below group I-6.
No growth with age is shown between I-6 and I-7, but a growth of 31% appears between II-7 and II-8. A growth with grade of 17% is shown from I-7 to II-7 and of 25% from II-8 to III-8. This test shows therefore the operation of the two factors of age and grade training.

¹ This test was given to but 66% of the subjects in III-8, the experimenters assuming that the other 34% would pass. The score given, 85%, therefore represents the ability of the lowest selection of III-8 subjects, or the most conservative estimate of the ability of the whole group. The same applies to the other tests in III-8 given 66% and 72% of the time. In this way the hypothesis that the tests are not influenced by grade training is given the benefit of the doubt.

The improvement in ability in the tests of counting 13 pennies, describing pictures and naming colors, that was indicated between the kindergarten and Grade I by the first method, would refer to age rather than grade, for a greater increase in each test is indicated between K-5 and K-6 than between K-6 and I-6. Above I-6 these abilities are completely developed. It could be maintained that these tests are so completely within the ability of the groups that the effect of training would not be indicated. The test that is best adapted to show the influence of any factor on a group is one that is well within the ability of the group; the influence of the factor will be obscured if the measure is either too easy or too difficult.

The test of copying the diamond is a case in point and one well worth study, for it has been attributed to the effect of training by various authors. All the reproductions of the diamond had been scored according to the arbitrary system outlined in the previous discussion of the personal equation. A control on the factor of difficulty was obtained by raising or lowering the passing mark in this test.
The percentage passed was calculated for each group for each of the 5 possible passing marks. The relations indicated in Table 7, where the passing mark is Group IV, were not changed by this process of raising or lowering the passing mark. In all cases the influence of age was shown between groups I-6 and I-7, and the influence of grade shown between groups K-6 and I-6. The test was given to but 59% of the K-5 group, the experimenter assuming that the other 41% would fail, so that the percentages calculated represent the performance of the best selection of K-5 subjects, or, in other words, the benefit of the doubt is given to the hypothesis that the test is influenced by grade training. If the other members of K-5 had failed according to the experimenter's assumption (and this assumption was quite justified, for some had failed to draw the square), 29% of the group would have passed instead of 50%. The influence of age indicated in this test is as great if not greater than that due to training.

The test of repeating digits, scored by the weighting system previously described, exhibits a slow but uniform progress throughout, the older subjects in each group making records that are about the same or slightly lower than those of the younger subjects in the same grade, an increase showing fairly regularly from grade to grade. The most marked increase in this ability appears between K-6 and I-6, and between I-7 and II-7, possibly indicating that the lack of familiarity with the use of digits in the lowest grades interferes with this test as a measure of auditory memory.

The test of naming the days of the week shows the most marked improvement with age (40%) from K-5 to K-6, practically no improvement (10%) from K-6 to I-6, no improvement from I-6 to I-7, a very marked increase with grade from I-7 to II-7, a drop from II-7 to II-8, group III-8 marking the complete development of the ability.
The test would appear to be due to the combined effect of age and grade. The tests of giving the day and date and naming the months are passed only twice in the kindergarten and first grade, by about a quarter of the subjects in II-7 and II-8 without age increase, while the subjects in III-8 show a most marked increase due to grade. Above III-8 these tests are seldom failed. The test of naming the pieces of money shows a slow growth from 8 to 11, the largest increases appearing from III-9 to IV-9 and from IV-10 to V-10, improvement with grade in each case. Copying the designs from memory shows a growth of 26% from 8 to 11, the development occurring in two age steps, from IV-9 to IV-10 and from V-10 to V-11.

The growth with age cannot be determined in the tests of constructing sentences from three given words, because they were given to too few cases below the third grade. The results do not show whether III-8 is exceptionally high or III-9 exceptionally low. Both tests show decreases in ability from III-8 to III-9 and from V-10 to V-11. The ability in the easier test is well within the range of the third and higher grades, showing, therefore, no improvement. The improvement in the second test develops from 33% to 80% in three steps, correlating with Grades IV, V and VI in each case. The most vital question, that of determining whether or not the language training in the third grade helps to make the construction of a sentence possible, cannot be determined owing to the lack of material in the second grade. The experimenters' assumptions in not trying the test would indicate this fact, but this is not experiment.

The same lack of material makes conclusions in regard to the rhyming test impossible. The performance of IV-10 is exceeded only by VI-11. The test of naming 60 words in three minutes shows two decided increases with age and one decided drop with grade.
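The age-versus-grade reasoning applied earlier to the test of counting from 20 to 0 reduces to simple arithmetic on the differences quoted in the text. The sketch below nets the apparent grade gain against the loss shown by the older subjects of the lower grade; the dictionary layout and the labels are assumptions made for illustration and do not come from the study.

```python
# Differences in per cent. passing "counting from 20 to 0", as quoted in
# the text. Keys are illustrative labels, not the study's own notation.
grade_gain = {"I->II (age 7)": 63, "II->III (age 8)": 40}
# Loss of the older subjects relative to the younger ones in the lower grade.
age_loss_in_lower_grade = {"I->II (age 7)": 16, "II->III (age 8)": 20}


def net_grade_effect(gain, loss):
    """Grade gain left after assuming the older subjects should have
    done as well as the younger subjects of the same grade."""
    return {step: gain[step] - loss[step] for step in gain}


net = net_grade_effect(grade_gain, age_loss_in_lower_grade)
print(net)  # {'I->II (age 7)': 47, 'II->III (age 8)': 20}
```

The two net values are the 47% and 20% given in the discussion of that test.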
The foregoing analysis is based on a number of subjects in each group too small to have any great significance. The general fact of the correlation of the tests with grade remains, and the question of which tests correlate too highly with training can be answered only by considering both methods of study, and by considering only the largest deviations. The two most striking instances are found in the tests of naming the months and giving the date. These tests undoubtedly relate almost entirely to training. Less striking but equally definite is the relation of the test of counting from 20 to 0 to training. The tests of naming the days of the week and counting stamps show the influence of age to an extent almost as marked as that of grade, so that while the development in these tests is rapid, the grade factor probably exerts only part of the influence. Conclusions concerning the other tests are largely a matter of opinion, and the opinion of the writer has been indicated in the detailed discussion.

A study of the tests in relation to grade by the first method employed may be made from Schmitt's results. The author gives, in Tables I, II, III, IV, V, VI and VII on pages 70, 71, 73, 74, 75, 76 and 77 of her monograph, the results of each subject in each grade on each test. From these tables the present writer has calculated the percentage passed in each test. A study of this sort rests for its reliability on the accuracy of the published tables, and the facts indicated by the tables do not always coincide with Schmitt's discussion.² The writer has followed the tables rather than the discussion in calculating the results. In the VIII-2 test, where an alternative rank is given for counting from 10 to 0 instead of 20 to 0, the writer has considered success in counting from 10 to 0 as a failure in counting from 20 to 0.
In the line suggestion test Schmitt recognizes two types of failure: the typical failure according to Binet of accepting the suggestion of the first three lines, and the failure due to the fact that the subject actually judges the lines unequal after studying them. The second type of response Schmitt marks as passed, using a special symbol. The writer has calculated these percentages separately, entering the first or Binet type of response under "Line suggestion A" in the table, and the second type under "B." The V year and Adult tests were omitted. All of the other tests were included that had been given over 70% of the possible number of times. Unless otherwise noted, each test was given 100% of the possible number of times. Table 8 shows the per cent. that Schmitt's subjects in each grade passed each test in Binet's 1911 scale (Town's translation with modifications). The table is given with the reservation that the tables from which the percentages were calculated might contain misprints, and that the writer's interpretation of the tables might be at fault.

² In the discussion (page 69) Schmitt gives 15 subjects in the kindergarten failing test VII-4. Table I shows 13. On the same page she gives 24 subjects failing VIII-4. Table I shows 22 failing. In discussing the results of Grade I (page 72) Schmitt states that there is "more than 50% of failure with the discrimination of weight", while Table II shows 35% failure. Again, the tests referred to specific school instruction by Schmitt are VII-4, VIII-4, and IX-1, 2, 3 and 4. On page 72, in discussing the results of Grade I, she says "the tests below ten years which depend upon specific instruction are usually not passed except the VII-4 test." The percentages passed are as follows: VII-4 = 85%; VIII-4 = 45%; IX-1 = 35%; IX-2 = 75%; IX-3 = 90%; IX-4 = 30%. "Usually not passed" includes, therefore, tests passed 75% and 90% of the time.

TABLE 8
Per Cent. That the Subjects of Each Grade Passed Each Test. 150 Subjects.

Test                                               K     I     II    III   IV    V     VI
Number of subjects                                 25    20    17    21    22    22    23
VI-1   Distinguishing morning, afternoon           96    100*  ..    ..    ..    ..    ..
VI-2   Defining in terms of use                    92    94*   ..    ..    ..    ..    ..
VI-3   Copying diamond                             76    94*   ..    ..    ..    ..    ..
VI-4   Counting 13 pennies                         92    100*  ..    ..    ..    ..    ..
VI-5   Choosing prettier of faces                  92    100*  ..    ..    ..    ..    ..
VII-1  Showing right hand                          92    80    100   ..    ..    ..    ..
VII-2  Describing pictures                         72    65    81    ..    ..    ..    ..
VII-3  Executing 3 commissions                     92    95    100   ..    ..    ..    ..
VII-4  Counting stamps                             48    85    100   ..    ..    ..    ..
VII-5  Naming colors                               96    100   100   ..    ..    ..    ..
VIII-1 Comparing remembered objects                92    100   100   100   100   ..    ..
VIII-2 Counting backwards from 20 to 0             40    85    94    95    100   ..    ..
VIII-3 Indicating omissions in pictures            100   95    94    100   100   ..    ..
VIII-4 Giving day and date                         12    45    94    100   100   ..    ..
VIII-5 Repeating 5 digits                          64    85    94    100   100   ..    ..
IX-1   Making change                               6*    35    71    95    86    100   ..
IX-2   Defining in terms superior to use           39*   75    65    100   95    100   ..
IX-3   Naming pieces of money                      28*   90    94    100   100   100   ..
IX-4   Naming the months                           6*    30    71    95    95    95    ..
IX-5   Comprehending easy questions                61*   100   100   95    100   100   ..
X-1    Arranging 5 weights                         ..    65    41    57    50    64    ..
X-2    Copying designs                             ..    10    35    57    45    32    ..
X-3    Detecting absurdities                       ..    60    88    100   100   100   ..
X-4    Comprehending difficult questions           ..    85    100   100   100   100   ..
X-5    Constructing sentence. Two ideas            ..    65    76    100   100   100   ..
XII-1  Resisting suggestion, A. (Binet scoring)    ..    64*   76    52    41    14*   100
       B. Judgment error counted plus              ..    ..    ..    100   86    100*  ..
XII-2  Constructing sentence. One idea             ..    57*   71    95    95    100*  100
XII-3  Giving 60 words in three minutes            ..    43*   82    62    100   95*   96
XII-4  Defining abstract terms                     ..    7*    29    52    73    95*   100
XII-5  Reconstructing dissected sentences          ..    0*    6     10    23    81*   78
XV-1   Repeating 7 digits                          ..    ..    ..    ..    ..    62*   78
XV-2   Rhyming words with "obey"                   ..    ..    ..    ..    ..    86*   70
XV-3   Repeating a sentence of 26 syllables        ..    ..    ..    ..    ..    10*   17
XV-4   Interpreting pictures                       ..    ..    ..    ..    ..    14*   70
XV-5   Solving problems from various facts         ..    ..    ..    ..    ..    62*   70

Note. All tests except those marked (*) were given all the possible number of times. The VI year tests were given 90% of the time in Grade I, the IX year tests 72% of the time in the kindergarten, the XII year tests 70% of the time in Grade I, and the XII and XV year tests 95% of the time in Grade V.

Inasmuch as there are many differences in procedure in giving the tests, and in the character of the schools tested, the results of the two investigations are not comparable in respect to the percentage passed in one grade in one study with those in the same grade in the other study. The method used in determining the correlation of the tests with grade is the same as that used in the first method of treating the Princeton data: that of comparing the differences between grades by one-grade and two-grade steps, of selecting an arbitrary standard for detecting exceptional growth, and of comparing the resulting lists.

The differences between the performance of each grade and the next succeeding grade were calculated. These differences, 100 in number, ranged from -24% to +62%, the median being +5% (Q = 10.75%). The run of differences differs from that found in the Princeton study in two respects, in having a lower median and variability, and in containing more minus deviations. The lower median and variability are due to the fact that the tests were given over a wider range, the Princeton tests being given only on the "up slope" of the growth curve, or not being given when the tests were any distance above or below the probable range of ability of the group.
The Princeton results showed only 4 minus deviations of 4, 3, 2, and 1% respectively, while Schmitt's results show 15 such deviations, 6 of them being 10% or over. These deviations are probably due to the smaller number of subjects, and if due to chance, should be counteracted by the precautionary measure of combining the indices of correlation into two-grade steps. 71 two-grade differences were obtained, ranging from -25% to +82%, the median being +10% (Q = 16.5%). 4 measures were still in the minus direction; one of these, -25% (Design, III to V), is probably significant, the other values of -6%, -5% and -4% having no significance. Inasmuch as the variability of the series is lower, those differences were considered to be worthy of special study that had the value of 2Q+M, or were in excess of the interquartile range plus the median. The lists of tests that appear as showing marked growth with grade according to the two methods are as follows:

One grade differences higher than 2Q+M
+62%, IX-3, Money, K to I
+58%, XII-5, Dissected, IV to V
+49%, VIII-4, Date, I to II
+45%, VIII-2, 20 to 0, K to I
+41%, IX-4, Months, I to II
+39%, IX-5, Comprehension, K to I
+39%, XII-3, 60 words, I to II
+38%, XII-3, 60 words, III to IV
+37%, VII-4, Stamps, K to I
+36%, IX-2, Definitions, K to I
+36%, IX-1, Change, I to II
+35%, IX-2, Definitions, II to III
+32%, VIII-4, Date, K to I
+29%, IX-1, Change, K to I
+28%, X-3, Absurdities, I to II

Two grade differences higher than 2Q+M
+82%, VIII-4, Date, K to II
+71%, XII-5, Dissected, III to V
+66%, IX-3, Money, K to II
+65%, IX-4, Months, K to II
+65%, IX-4, Months, I to III
+65%, IX-1, Change, K to II
+60%, IX-1, Change, I to III
+55%, XII-5, Dissected, IV to VI
+55%, VIII-4, Date, I to III
+54%, VIII-2, 20 to 0, K to II
+52%, VII-4, Stamps, K to II
+47%, X-2, Design, I to III
+45%, XII-4, Abstract Def., I to III
+44%, XII-4, Abstract Def., II to IV
+43%, XII-4, Abstract Def., III to V

A study of the above lists shows, as in the similar study of the Princeton data, that although the method of selecting the exceptional tests is an arbitrary one, the method is justified in practice, for only a few tests (13) appear in the lists as significant. In all, there were 34 tests³ studied, and 30 differences were considered large enough to be significant. These 30 differences were confined to 13 tests. The tests of naming 60 words and defining in terms of use drop out of the first list owing to the elimination of the errors of negative correlation. The design test is both positive and negative, the ability increasing from Grades I to III and decreasing after III. The test of defining abstract terms appears according to the second method because the ability increases with grade from 7% in I to 95% in V by increases of approximately 25% in each grade. No conclusions may be drawn concerning the easy comprehension test and the absurdities test. The 20 remaining differences are confined to 7 tests, those of naming the day and date, naming the months, counting from 20 to 0, counting stamps, naming money, reconstructing dissected sentences, and making change. The first four were included in the five found to show the most marked influence of grade in the Princeton study. The test of naming the pieces of money did not show a marked relation to grade in the latter study, but this difference might be one of school curriculum. The test of naming the days of the week is not included in Binet's 1911 scale.

³ No differences were calculated from the line suggestion test owing to the possibility of misinterpreting the symbols. Schmitt notes the difference in the character of the responses from the suggestion error to the judgment error in passing from Grade II to III. The scoring of the suggestion error in the tables shows an inverse correlation with Grades II, III, IV and V, and a sudden change again from 14% in Grade V to 100% in Grade VI, so that there is probably a mistake. The scoring of the responses to this test according to the strict Binet ruling would make the "mental ages" lower, for many cases would then have basal X.

In the Princeton study alternatives were used in the making change question, so that no data from this test were included in the quantitative study. These data show the ability in this test developing in the second and third grades, the test being passed only twice in the kindergarten and first grades, and generally passed above the third. The data in the test of reconstructing dissected sentences show very few passing the test below Grade V, with approximately three fourths passing in V and VI. In so far as the Trenton experimenting was applied to a few subjects in the regular grades below the seventh, this test was rarely passed in the third and fourth grades, passed about 5% in V, and almost universally passed in VI, VII and VIII. The number of subjects in each grade is small in the Trenton experiment, but each test was separately scored, i.e. each part of the dissected sentence test, each part of the absurdity test, etc. Each of the three parts of the dissected sentence test showed the same growth between the same grades, and this growth was more marked than that in any other test. The evidence concerning these two tests, therefore, supports the evidence from Schmitt's results.
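The selection rule applied to Schmitt's data, that a difference is worth special study when it reaches 2Q+M, can be checked against the medians and Q values reported for the one-grade and two-grade series. The Python sketch below is offered only as an illustration and is not part of the original study; the function name `cutoff` is hypothetical, and the differences are those of the two lists as read from the text.

```python
def cutoff(median, q):
    """Return the 2Q + M standard used to flag exceptional growth."""
    return 2 * q + median

# Medians and Q values reported in the text (per cent).
one_grade_cutoff = cutoff(median=5.0, q=10.75)
two_grade_cutoff = cutoff(median=10.0, q=16.5)

# Differences listed in the text as exceeding the standard (per cent).
one_grade_listed = [62, 58, 49, 45, 41, 39, 39, 38, 37, 36, 36, 35, 32, 29, 28]
two_grade_listed = [82, 71, 66, 65, 65, 65, 60, 55, 55, 54, 52, 47, 45, 44, 43]

# Every listed difference should reach its cutoff; the smallest
# two-grade entry (+43%) sits exactly on the cutoff, hence ">=".
one_grade_ok = all(d >= one_grade_cutoff for d in one_grade_listed)
two_grade_ok = all(d >= two_grade_cutoff for d in two_grade_listed)

print(one_grade_cutoff, two_grade_cutoff)
print(one_grade_ok, two_grade_ok)
print(len(one_grade_listed) + len(two_grade_listed))
```

Under these figures the standards come to 26.5% for one-grade steps and 43% for two-grade steps, and the two lists together account for the 30 differences mentioned in the text.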
The quantitative analysis of the Princeton data and Schmitt's data would indicate that the tests of counting stamps, counting from 20 to 0, naming the days of the week, giving the day and date, naming the months, naming the pieces of money, making change and reconstructing dissected sentences were influenced to a considerable extent by grade training. The performance in certain of these tests (days, date and months) may be the result of specific school training in the tests themselves, while others (perhaps the tests of counting stamps, counting from 20 to 0, and reconstructing dissected sentences) may involve a transfer effect in the application of the content of the grade in a new way.

The fact that the tests correlate very highly with grade training does not show that the tests are worthless, but it does show that they should, perhaps, be placed in another scale, or should at least be placed on a different footing from those that test capacity irrespective of attainments. One of the best tests* of intelligence is the determination of what an individual can do with the training he has received, but tests of this sort rest on the assumption that the individual's opportunities have been determined. The importance of tests of information in cases of alienation presenting a picture of deterioration is recognized. The important change to be made is not the elimination of such tests from intelligence scales, but their standardization on a different basis. The diagnostic value of such tests rests not in the mechanical memorizing of a time series such as that of the months, but in the ability to apply such a series. In pointing out this fact Katzenellenbogen (37) suggests that the months test be given in some such manner as "If somebody asks you in November to return three months later, what month would it be?"
Decroly and Degand also suggest that the mechanical tests of counting and naming the days of the week and months be modified in some such manner.

* The writer recalls two cases in which the failure in tests which involved the application of training was very significant. The first was that of a woman of about 30, a parole patient in a hospital for the insane, who had never shown any marked symptoms other than a history of intellectual inferiority. This patient passed practically all of the Binet tests in the IX, X and XII year groups, but failed completely in the test of making change. This observation was later checked up. Another case, of a woman of 22 in the same hospital, presented a border-line psychoneurotic picture perhaps, but no marked symptoms other than a history of intellectual inferiority. She passed a great many of the difficult tests in the upper years but had great difficulty in telling time. Both cases had lived under very good home conditions and had mingled with people of ability. A great many tests of capacity were given, but the most illuminating evidence of their mental status came from the two tests mentioned.

Comparing the conclusions of this study with other investigations, the agreement is fairly close. Schmitt's results do not support her suggestion that the definitions test relates to specific school instruction. The other tests which she refers to this factor (stamps, date, 20 to 0, change, months and money) show the influence to a marked extent. Binet, in classifying some of the tests, referred the tests of copying a sentence, reading for memories, writing from dictation, copying a diamond, counting backwards and making change to scholastic training. The first three tests were not included in this investigation. The diamond test showed the influence of age to be as great as, if not greater than, that of school training. The last two tests showed a marked influence of training.
Binet referred the tests of counting 13 pennies, naming four colors, naming the days of the week and enumerating the months to home training. The last two showed a marked influence of school training. The results of the present investigation agree with those of Chotzen in finding no effect or very little effect of training in the tests of copying the diamond, repeating digits, describing pictures, counting 13 pennies, naming colors, comparing remembered objects, defining in terms of use and superior to use, and in finding marked influence of this factor in the test of naming the days of the week.

The methods used in analysing the results, especially the second method, reveal several suggestive relations between the tests and the school grades. There is a general correlation between the tests and the grades, a correlation that is very necessary to establish, for there is also a general correlation between intelligence and grade. In analysing the results of the individual tests by comparing the results of subjects of the same age in different grades, and of subjects of different ages in the same grade (Table 7), it was seen that, as a general rule, the growth in any particular ability occurred in passing from grade to grade, not in passing from age to age within one grade. In fact, in only half of the cases in which the subjects of two ages in one grade may be compared do the older subjects make records that are higher than those of the younger ones, and only 10% of these gains are over 20%. If the groups were considered to be equal in all cases in which their records were within 10% of each other, equality occurs in exactly 50% of the cases. Of the remainder, 20% of the groups were lower, while in only 30% of the cases are the older subjects actually higher than the younger subjects of the same grade. Some of the cases of retrogression could well be accidental, but they occur too frequently to be due entirely to chance.
Applying the same general method to the cases in which groups of the same age but in different grades were compared, 5% of the groups in a higher grade showed lower scores, the results correspond in 43% of the cases, while 52% showed definite improvement. This might indicate that there is a higher correlation between the tests and grade than between the tests and age.

The fact that the comparison of children of different ages in the same grade showed the older children making lower records in 20% of the cases, equal records in 50% of the cases and higher records in only 30%, would confirm the general diagnostic value of the tests if Bonser's interpretation of this phenomenon is correct. Bonser (12) applied various sorts of reasoning tests to children in the fourth, fifth and sixth school grades. In summarizing the results of the tests in the different grades, he says, "In the contrast with grade progress and progress with age, in the generally superior showing made by the younger groups of children of any grade when contrasted with the older pupils of the grade, and in the fairly substantial percentage of pupils from lower grades found in the highest quartile of ability for all, it is shown that native capacity is measured to a high degree by the tests."

In conclusion, the results shown in this chapter would indicate a correlation between the individual tests studied and the school grades, this correlation being high enough in some cases to show the actual effect of training. In answer to the general objection that since one demonstration of the accuracy of the tests rests on their correlation with school grades, the school grades are the real measure of intelligence and the mental tests superfluous, it is only necessary to point out that intelligence tests, besides affording the opportunity for accurate standardization, also detect the subject's potential abilities independent of his past performance.
The school measure indicates mental defect in cases of gross retardation, but it does not indicate exceptional ability.

Schmitt's contention that the school represents a standard environmental situation, and that a measure of a subject's ability should include a measure of the adequacy of his reaction to this situation, is well founded. It is not, however, a criticism of the Binet scale, for the scale aims to test native capacity. At the Buffalo conference (15) on the Binet scale, the following question was raised: "What is it, after all, that the scale aims to test?" The question was answered by "We believe that current misconceptions as to the aim of the scale should be removed. It is not intended to test the emotional or volitional nature, but primarily intelligence (judgment)." To this list might be added the assertion that the scale was not intended to test a child's reaction to the school situation, or to furnish an outline for taking a record of his life history.

Rogers and McIntyre (54) would also have mental tests include tests dependent on both school and home training. This general trend of present day discussion is a reversion to Binet's 1908 type of scale, a tendency to which Binet was in opposition. The probable solution rests in eliminating from the scale the tests involving training, and in constructing a standardized scale of another sort for the estimation of the individual's reaction to the school situation in terms of the length of time that he has met that situation. That such a scale is not a matter of speculation is shown by the number of scales now on the market for measuring handwriting, spelling, composition, arithmetical ability, etc. Tests of native capacity and tests dependent on school and environmental training cannot be standardized on the same basis, for they are essentially different measures.
Measures of the first sort may perhaps be correlated with age, while measures of the other sort can be correlated only with opportunity.

V. SEX DIFFERENCES

The investigators who have studied the influence of sex differences on the Binet-Simon tests have used two methods, that of comparing the "mental ages" or total scores of subjects of each sex, and that of comparing the per cent. that the subjects of each sex pass each test. The first method throws no light on the individual tests, inasmuch as one sex may be superior in one test and inferior in another, so that the total score will balance the influence of this factor. Inasmuch as the scale is founded on the principle that sex differences do not exist, it is important to study the individual tests, and to determine the accuracy of this assumption.

The Princeton data are available for a study of this sort. 352 subjects (187 boys and 165 girls) between the ages 6 and 12 were examined. The method of study adopted was that of comparing the results of non-selected boys and girls of each age, and, as a check on this method, of comparing the results of selected boys and girls of four ages.

Inasmuch as the subjects of each chronological age are distributed over a range of one year (the 6 year subjects, for example, being distributed from 6.0 to 6.9), the actual average age of the subjects of each age was computed to make sure that no differences might appear due to the chance selection of subjects at either extreme. These averages are shown in Table 9.

TABLE 9
Actual Average Chronological Age of Boys and Girls in Each Age Group.

                    BOYS                          GIRLS
          Number of    Average Age      Number of    Average Age
          Subjects     (M.V.)           Subjects     (M.V.)
Age 6     37           6.58 (0.20)      23           6.51 (0.20)
Age 7     29           7.50 (0.29)      31           7.39 (0.26)
Age 8     24           8.48 (0.29)      28           8.48 (0.22)
Age 9     20           9.46 (0.27)      22           9.54 (0.26)
Age 10    31           10.46 (0.25)     23           10.37 (0.30)
Age 11    28           11.59 (0.22)     20           11.52 (0.27)
Age 12    18           12.43 (0.30)     18           12.57 (0.24)

[...] for girls 0.83 yr.
(from 9 to 10), while the maximum increase for boys is 1.13 yr. (from 10 to 11), and for girls 1.15 yr. (from 10 to 11). A more marked lack of regularity in the growth of scholastic ability from year to year as measured by the average grade is shown in Table 11, no increase being shown by the 12 year boys over the 11 year boys, while the 10 year boys show an increase of 1.44 to 1.01 grades over the 9 year boys. In the same way the 10 year girls show an increase over the 9 year girls that is nearly three times that of the 7 year girls over the 6 year girls, while the increase of the 7 year girls over the 6 year girls is twice that of the 12 year girls over the 11 year girls. These relations indicate that the selection of subjects is not uniform at each age. The subjects of any one age may be either a superior or inferior selection of all children of that age, and there is no reason for supposing that this random sample of superior or inferior subjects of any age will correspond to a similar sampling of the subjects of the opposite sex of the same age.

The process of calculating the percentage that the boys and girls of each age pass each test is extremely simple, but the conclusion that the differences found between the percentage passed by the sexes at each age may be attributed to sex differences is not justified unless all the variable factors are known. A previous chapter showed variations in the tests due to the influence of the personal equation of the experimenters. To avoid this variable influence, only those tests were studied that showed that they were free from the influence of this factor. Inasmuch as each experimenter examined approximately the same number of boys and girls of each age, any influence of this factor would be equalized, provided, of course, that there were no differences in the reaction of the experimenters to the two sexes.
In the detailed study of the design test, it was found that experimenter C was more lenient in marking girls than boys. The possibility of a similar interpretation in a few other tests was suggested, but not demonstrated. In analysing the results for sex differences, however, the possibility of such an interpretation must be kept in mind.

Another possible source of error is that due to incomplete data. The experimenters, in giving the tests, would give only those within the approximate range of the subject, so that each test would be given to a superior selection of children below the normal range of the test, and to an inferior selection of subjects above this range, a process tending to make the apparent growth of an ability less than the probable real growth. In comparing the results of the sexes, however, it is not necessary to have accurate results on the growth of an ability, but results which have the same determining factors. If the experimenters gave the test to approximately the same proportions of boys and girls at each age, a comparison of the percentage passed is legitimate, even if only a small proportion of the whole group were actually tested, for the proportion would include the same selection of subjects.

The number of boys and girls at each age, and the percentage that each test was given to these subjects, are shown in Table 12. The test of counting 13 pennies, for example, was given 37 times to 6 year boys, or 100% of the possible number of times, while the test of counting from 20 to 0 was given 27 times to the same group, or 73% of the possible number of times. Column A shows the total number of times each test was given to all of the boys and girls. Column B gives the average age of all the boys and girls to whom each test was given.
The average given in this case is not the actual average derived from the actual chronological age of each subject figured in tenths, but the weighted* average, the whole numbers 6, 7, 8, 9, 10, 11, and 12 being used.

Table 12 shows a very close correspondence between the percentage that each test was given to boys and girls of each age, so that the error due to incomplete data, though present, is present to the same extent in the results of both sexes, and may be disregarded. A fairly close correspondence in the average age of all the boys and girls to whom each test was given is also indicated in Table 12. In the test of counting stamps there is an [...]

* For example, in the test of counting 13 pennies, the average age of the boys to whom the test was given is

\[ \frac{(37 \times 6) + (28 \times 7) + (16 \times 8) + (8 \times 9) + (7 \times 10) + (3 \times 11) + (1 \times 12)}{100} = 7.33 \text{ years} \]

Neither method, then, is entirely satisfactory, the first because it would tend to exaggerate chance differences, the second because it would tend to obscure real differences. The method used in this study is that of comparing the results of non-selected and selected subjects of each age and sex, studying first the general growth of each ability from age to age within each sex, and using the per cent. that all subjects pass each test to determine the correlation between the results of non-selected and selected subjects. Table 13 shows the percentage or proportion† that the boys and girls of each age pass each test, the percentage that all boys and girls pass each test, the actual percentage that the boys are superior to (+) or inferior to (-) the girls of each age, the difference between the average age of all boys and girls to whom each test was given, and the difference between the percentage that all boys and girls pass each test.
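The weighted average described in the footnote on the test of counting 13 pennies can be reproduced directly from the counts given there. The following Python sketch is illustrative only; the dictionary name `boys_tested` and the function name `weighted_average_age` are hypothetical, while the counts per age are those printed in the footnote.

```python
# Number of boys of each whole-number age to whom the test of
# counting 13 pennies was given (counts taken from the footnote).
boys_tested = {6: 37, 7: 28, 8: 16, 9: 8, 10: 7, 11: 3, 12: 1}

def weighted_average_age(counts):
    """Average age weighted by the number of subjects at each age."""
    total_subjects = sum(counts.values())
    total_years = sum(age * n for age, n in counts.items())
    return total_years / total_subjects

average = weighted_average_age(boys_tested)
print(round(average, 2))
```

The weights sum to 100 subjects and the weighted total to 733 years, which gives the 7.33 years stated in the footnote. Because whole-number ages are used, this figure is not the same as the actual average computed from ages in tenths of a year.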
The dift'erences between the performance of the boys and girls at each age have no meaning unless the general growth of the abilities in each sex is first understood. Studying first the re- sults of the 187 non-selected boys shown in the first seven columns of Table 13, it may be seen that the growth of ability in each test is rather irregular. The test of naming the months, for example, shows a slight decrease from 9 to 12. The differ- ences between the percentage performances of the subjects of each age and those of the preceding age were calculated. The 12 year group, compared to the 11 year group, is -\-ii% on the test of giving the date, — 9% on the test of naming the months etc. 61 differences were thus obtained, varying in magnitude from — 15% to +36%, the median being +8% (Q^975%)- 13 of the deviations (21 % ) were minus values. The largest nega- tive deviations occurred in the tests of naming colors ( — 15%, 7 to 8), naming money ( — 15%, n to 12), and constructing a sentence containing two ideas ( — 13%, 8 to 9). The remaining 10 minus deviations were less than 10%. ^ The proportion given is the number of times a test was given over the number of times a test was passed. No percentages were calculated for tests given less than 12 times, and no percentages are given for the defini- tions tests on account of the small number of times they are given to all subjects. W (J WW <^ '-^ P- W o pq pL, Ov LT, t-1 -r CO vi^ o vc i-t fc T ro 00 '^ T t^ UI 3DII9J3^IQ 1 + + T 1 + 1 T 1 T (NC^-J-WOOOOvSO + + fO + + fo t^ T + + saSe aSeaaAB ^ ^ ^ III aouaaajgiiQ! 1" 1 1 1 1 1 1 -I + + + + + + + + + n; w O vO 00 i-< vo Cv U-) + 1 T T T OJ 1-1 + + 7 V, (U M; r^ w ON CO 00 w i-i vO M t« 04 1 T T + 1 T + 1 1 1) o TJ- w O to 00 CXD *-" OJ ON IT) tN» (N rt PI « hH t-H S 2f 1 + + 1 1 1 1 1 1 + + + 1 s =« ^ ^ On! 
CO O CO c^ ^ t^ ^ -rt TJ- hH Ov oj J3 ( h-* t-H 01 HH + ( 1 + 1 1 + + + + 1 CO >0 O 0< 0>i'*l>.O00MD (^ r^ rx VO (A " ►I 0< M w 04 H* H^ o c i ■ 1 + + 1 + + 1 I 1 1 + + + 1 Tt 1^ r^ ro r^ t^ c^ vo r^ N ^^ (Ll ! 1— t !C i 1 + + + I + 1 T 1 + + P 1 '-' IN CO 01 I- 01 r 1 1 1 1 + 1 1 1 spafqns ii« Q 0\ COOO O\0O fOlS.N O INV3 C00^«^'>^'*>J^i-< '^-^OOOOo O ts. -^O 00 ON O O\0o t^ txOO O Tt- >o tn '5h Tj- "^VO Oo vO SD UTO "It "^ N ^c o -to t-^^jO^ooc^ i^i-H a 10 00 <\1 HI \ W w W^\\ w r^ t^W 5 o \o iri t^ Oj 10 t^oo vo -"^vo 'OVO 00 ^^^^^v,^V^ 1 i-cc\j(-.f\)n(\)Mf\400 >r>oo t^ oo >i^ ►-< i-, 1-1 CO fVj 00 (\| co'VjCN'^'N'^wfo f^NO ro'^cnO inl^ONO ^^ N OVO 'ss f^l "^00 ts.tx.'^O 'r^ '^M N M i \\W\W:v^»,\.WO tv.\,-^00 O t^ 0\ t^vQ cc '^ tx°o m «*i r^°o vo kj \.^^^^^-^ ; t*2 <\i n ^ ■^ ■<*• CO *» •-< t^ 'sKx •"^ 1 o t^N3 00 t^Ost^t^t^OsTi 0\°0 O 'mi-: C 01 o vo *^ i-i '. Ov UO 0^ CO .^ -^ (viO voir). t^ O 0\ O* N Ov CJNOo O O irjOp VO t^OO O ^^ K -"^Oq c^ «V) lO i-, t>. fV) p< H \o \^~^~^^v^ 0\ t^^\. iri VI Ix K IT) •<»-00 O 00 O t^XD ro'^'^'^OO N <^Jf': V^ Tj- tx ■<»■ o\\ «D\^^ w H vp >n ^ 00 O CTiO Ov Ov N D ^»\^^^^ M Ci M u-j w »r> C3 M H HI H o ■*o ^■^r^N « ou^H 0^ imn'^m-) fv^oo ^ m =o CM^OO O UT^VO i^ c^ ■<*• t^Oo ^ >0 -^^J rr> ^ ^ ^CC •O 01 tNl -^ \"\--^ '-I *-<"-->^ c 1 H H OnOnvO 0\\ 0< K o H 01 t^ I-I 1-1 1 1-1 1-1 u i 1 CO tN. Q fV) fO'O \0 f»iuT\ O\OOv <000 ^j «^ ON Os 5 00 ON t^ 0\ Os\^^00 00 w «\1 ^i ■^ O ^O N. r^Oo OO-tON"T~i'N'-Hir;t>fOt^l^'V)0l'~.OO'-i P K, '~< CO 0<^ l-l Or) \0 ooooo<^int^t^C! f\jo^>-<>-if\joo'^-i-i'~i ^^^ ^N^ ^^^ ^\ ^\ ^^^ ^^--^--^- \^^ 1-^ '-^ ^--"\ '- ►>! cs c d-o I": VO f^,-0 -T t^ 00 '■'i\^^ -t '* 1-1 1^ cr cr. v. r. cr. .y. x v. c/: c/; c/; c/i c/i c/; t/: c/: c/j c/i tr. C/-. U-. CA J-. V. CA t/5 tift CTj CTi IT. m rA (A lA (A W ^T^ ^.T" ^*I" ^'"u ^Hu '^*u *'^^ '^>'T2 ^1^ ^'T^ ^T! 
An index of the growth from year to year was obtained by calculating the average percentage increase from one age group to another. For example, the 7 year boys were 26% higher than the 6 year boys in the test of naming colors, 5% higher in naming the date, etc. The average of the 10 possible comparisons between 6 and 7 year boys shows that the latter averaged 16.1% higher than the former. The average increases in percentage passed from year to year are as follows: 6 to 7 = 16.1%; 7 to 8 = 13.5%; 8 to 9 = 8.7%; 9 to 10 = 11.2%; 10 to 11 = 6.0%; and 11 to 12 = 0.2%. These figures show strikingly the irregularity of the growth from age to age. A comparison of these average percentage increases in the tests with the averages shown in Tables 9 and 11 reveals no observable relation between this increase and the increase in average age from age to age, or the increase in average grade from age to age. The smallest increase in the tests (0.2%, 11 to 12) coincides with the smallest increase in average age from year to year (0.84 yr.), and the smallest increase in average grade from year to year. The other relations are varied.

The fact of the variability in the results of the non-selected boys stands out. The irregularity of the growth of the various abilities, and the fact that in 21% of the cases the boys of one age are actually lower than those of the previous age, point to the conclusion that certain allowances will have to be made for chance variations. It is not possible to account for the variations in growth by reference to the relative increase in average age or average grade from year to year.
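The index of growth described above is a plain average of per-test differences between two adjacent age groups. A minimal sketch follows, assuming each group's results are held as a mapping from test name to percentage passed. The function name and the percentage levels are hypothetical; only the two differences of 26% and 5% come from the text, and the eight other comparisons between the 6 and 7 year boys are not reproduced here.

```python
def average_increase(older_percent, younger_percent):
    """Mean of (older - younger) over the tests for which both groups have a percentage."""
    shared = older_percent.keys() & younger_percent.keys()
    if not shared:
        raise ValueError("the two groups have no test in common")
    return sum(older_percent[t] - younger_percent[t] for t in shared) / len(shared)


# Hypothetical levels chosen so that the differences match the two quoted in the
# text (26% on naming colors, 5% on giving the date); the levels themselves are
# NOT from Table 13.
six_year_boys = {"naming colors": 40, "giving the date": 10}
seven_year_boys = {"naming colors": 66, "giving the date": 15}

index_6_to_7 = average_increase(seven_year_boys, six_year_boys)
print(index_6_to_7)
```

With all 10 comparable tests entered, the same function would yield the kind of figure reported for the step from 6 to 7.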
The results of the 165 non-selected girls, shown in italics in the first seven columns of Table 13, were studied in the same manner as the results of the boys. Sixty differences between the percentage performance of the girls of each age and those of the preceding age were obtained. These differences ranged from -33% to +50%, the median being 7% (Q=8%). Ten of the deviations (17%) were minus values. The largest negative deviations were shown in the tests of naming 60 words (-33%, 11 to 12), counting stamps (-20%, 9 to 10), and drawing designs (-14%, 8 to 9). The remaining 7 minus deviations were below 10%.

The average increases in the percentage passed from year to year are as follows: 6 to 7 = 3.9%; 7 to 8 = 15%; 8 to 9 = 8.8%; 9 to 10 = 10.1%; 10 to 11 = 8.7%; 11 to 12 = 1.8%. Both boys and girls show the smallest average increase in the percentage passed in the step from 11 to 12, and the magnitudes of the increases agree fairly well except for the step from 6 to 7. The increase of the 7 year girls over the 6 year girls is 3.9%, the next to the smallest increase of one age group over any preceding group. The 7 year boys, however, show an average increase of 16.1% over the 6 year boys, the largest increase of any group of boys over any preceding group. It will be difficult, then, to draw conclusions concerning sex differences from a comparison of the 6 year boys and girls, for the 6 year girls are either a superior selection or the 6 year boys are an inferior selection if the character of these groups be judged by the comparison with the 7 year subjects. The same comparison, on the other hand, might indicate that the 7 year girls were an inferior selection and the 7 year boys a superior selection from the general run. It is only possible to point out the irregularity, however; it is not possible to show the cause of the irregularity.
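The step-by-step agreement between the two sexes can be checked directly from the average increases quoted in the text. The sketch below uses only those twelve reported figures; the variable names are illustrative, and rounding to one decimal place is an assumption made to match the precision of the reported values.

```python
steps = ["6 to 7", "7 to 8", "8 to 9", "9 to 10", "10 to 11", "11 to 12"]

# Average increases in percentage passed, as reported in the text.
boys = [16.1, 13.5, 8.7, 11.2, 6.0, 0.2]
girls = [3.9, 15.0, 8.8, 10.1, 8.7, 1.8]

# Boys' increase minus girls' increase at each step, rounded to one decimal.
gaps = {step: round(b - g, 1) for step, b, g in zip(steps, boys, girls)}

# The step at which the two sexes disagree most, and each sex's smallest step.
widest_step = max(gaps, key=lambda s: abs(gaps[s]))
smallest_boys = steps[boys.index(min(boys))]
smallest_girls = steps[girls.index(min(girls))]

for step in steps:
    print(f"{step}: boys minus girls = {gaps[step]:+.1f}")
print("widest disagreement:", widest_step)
print("smallest step (boys, girls):", smallest_boys, smallest_girls)
```

On these figures the only gap larger than three percentage points is the step from 6 to 7, which is the irregularity the paragraph above singles out.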
A comparison of the average increase in the percentage passed by girls from age to age with the increase in the average ages shown in Table 9 shows no demonstrable relation. A comparison of this growth in ability on the tests with the growth in average grade, shown in Table 11, shows a very positive relation between these factors. Where the increase in average grade is smallest (i.e. from 6 to 7 and from 11 to 12), the increase in the tests is smallest (3.9% and 1.8%), while the greatest increases in grade (from 9 to 10 and from 7 to 8) coincide with the greatest increases in the test abilities (10.1% and 15.0%). This relation was not indicated in the results of the boys. Why a correlation between the increase in the tests and the increase in grade was found in the results of the girls but not of the boys is a matter of speculation. It has been shown that the boys have a higher variability in grade than girls. This tendency of the boys to be distributed in a wider range of grades might nullify the grade correlation slightly, but probably not to any considerable extent. The fact that the causes of this variation are not determined serves to illustrate the dangers of comparing the results of two groups when the factors operating on the groups are not known.

The foregoing study of the growth of the various abilities from age to age in each sex, and the analysis of the causes influencing this growth, demonstrates the great variability of the results. This fact of variability must be considered before drawing conclusions concerning sex differences by the method of comparing the results of boys and girls of each age.

The percentage differences between the performance of non-selected boys and girls of each age are shown in Table 13. In actual magnitude, these differences vary from 0% to 36%, the median being 9% (Q=5.5%). Of the differences, 75% are 17% or under, and only 16% are over 20%.
In regard to sign, the differences vary from -36% to +26%, the median being -3.5% (Q=8.75%), showing a slight general superiority of the girls. If the number of possibilities of variation in comparing the results of small groups of non-selected subjects is taken into consideration (the presence of mental defectives, of subjects having language difficulties, of subjects in different grades influenced by different training, the possibility of a superior selection of subjects at one age group as compared with another, and the probability that similar chance samplings would not fall at the same age), the fact of correspondence indicated in Table 13 has more meaning than the fact of divergence.

The variability indicated in the study of the growth of abilities with age was so great that it makes interpretation of the results in terms of sex differences very difficult, and warranted conclusions impossible. It is legitimate to expect that the older subjects of either sex should make higher scores than the younger subjects of the same sex, but this was not found to be the universal rule. The boys' results showed minus deviations in 21% of the cases and the girls' results showed minus deviations in 17% of the cases. In one case the 12 year girls were 33% lower than the 11 year girls. If this value (33%) be taken as the error due to chance variation, then only one value, that of -36% (naming the months, age 12), may be taken as significant, and it has been seen that in this test the 12 year boys are 10% lower than the 9 year boys. The conclusion would follow, then, that there were no sex differences. This alternative, however, seems to place too much weight on one variation, so that the truth probably lies in the assertion that the sex differences that actually exist are slight.
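The two summaries used above (differences taken by magnitude and by sign) and the rule of treating the largest age reversal as the chance error can be sketched as follows. The list `sex_diffs` is hypothetical: only its extremes, -36 and +26, and the chance error of 33 are taken from the text, and the function names are illustrative rather than anything the author used.

```python
from statistics import median


def summarize(signed_diffs):
    """Median of the signed differences and median of their magnitudes."""
    magnitudes = [abs(d) for d in signed_diffs]
    return {
        "median_signed": median(signed_diffs),
        "median_magnitude": median(magnitudes),
        "range_signed": (min(signed_diffs), max(signed_diffs)),
    }


def exceeding_chance(signed_diffs, chance_error):
    """Differences whose magnitude is greater than the assumed chance error."""
    return [d for d in signed_diffs if abs(d) > chance_error]


# Hypothetical boys-minus-girls percentage differences (NOT from Table 13);
# a negative value means the girls passed more often.
sex_diffs = [-36, -20, -12, -9, -5, -2, 0, 4, 9, 14, 26]

summary = summarize(sex_diffs)
significant = exceeding_chance(sex_diffs, chance_error=33)

print(summary)
print("exceeding a chance error of 33%:", significant)
```

A signed median near zero alongside a larger median magnitude is the pattern the text reads as correspondence between the sexes with scattered divergences.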
A study of the reactions of selected groups of boys and girls should throw light on the results from non-selected subjects, and make conclusions more certain. Subjects were selected by a process of elimination and selection. All of the subjects that were in the special class and minus grades were eliminated, along with all children of non-English speaking parents. From the resulting group of English speaking subjects in the regular grades all subjects were eliminated who had entered a grade at an age very much above or below that of the general run of entrants.* The remaining subjects ranged in age from 4.3 years to 14.4 years, but were found to group rather closely around certain ages. It was possible to find four groups of boys and girls of approximately the same chronological ages. The character of these subjects is indicated in Table 14. The four groups of subjects, chronologically from 6.0 to 6.9, 7.6 to 8.9, 9.7 to 10.9 and 11.7 to 13.3 (which will be referred to as 6, 8, 10 and 12), were distributed in approximately the same grades, and had approximately the same average age and average grade.

* The ages on entering each grade of the subjects retained were as follows: Kindergarten = 4, 5 and 6; Grade I = 5, 6 and 7; Grade II = 6, 7 and 8; Grade III = 8, 9 and 10; Grade IV = 9, 10 and 11; Grade V = 10, 11 and 12; Grade VI = 11, 12 and 13.

TABLE 14
Age in Grade Distribution, Average Grade and Average Age of 167 Selected Subjects. 86 Boys and 81 Girls.

                        Age in Grade Distribution
Age Group      Sex      K    I   II  III   IV    V   VI   Total   Average Grade (M.V.)   Average Age (M.V.)
6.0 to 6.9     Boys     5   13    -    -    -    -    -     18    0.72 (0.40)             6.52 (0.22)
6.0 to 6.9     Girls    3   13    2    -    -    -    -     18    0.89 (0.39)             6.53 (0.22)
7.6 to 8.9     Boys     -    7   13    3    -    -    -     23    1.83 (0.51)             8.09 (0.38)
7.6 to 8.9     Girls    -    2   13    5    -    -    -     20    2.15 (0.43)             8.32 (0.38)
9.7 to 10.9    Boys     -    -    -    6   12    2    -     20    3.80 (0.48)            10.37 (0.36)
9.7 to 10.9    Girls    -    -    -    9    7    5    -     21    3.81 (0.69)            10.14 (0.32)
11.7 to 13.3   Boys     -    -    -    -    2    8   15     25    5.52 (0.58)            12.35 (0.55)
11.7 to 13.3   Girls    -    -    -    -    3    8   11     22    5.36 (0.64)            12.41 (0.46)

The results of these groups are shown in Table 15, which is arranged to show all the facts for selected subjects that were given for non-selected subjects in Tables 12 and 13. The first four columns show the percentage that each test was given to each group. The next four columns show the percentage or the proportion that the subjects in each group passed each test. Column A shows the total number of times each test was given to all boys and girls, Column B the weighted average age (the average ages given in Table 14 being used), and Column C the percentage that all subjects passed each test. The next four columns show the percentage that the boys are above (+) or below (-) the girls. Column D (derived from Column B) gives the difference between the average ages of all subjects to whom each test was given. Column E (derived from Column C) gives the differences between the percentages passed by all boys and girls on each test.

The growth of the various abilities with age in the selected groups of subjects is more uniform than that shown by the non-selected subjects. Only three cases appear in which the younger subjects make higher scores than the older subjects, these exceptions occurring in the tests of describing pictures (-3%, girls 6 to 8), naming colors (-7%, girls 6 to 8), and naming months (-9%, boys, 10 to 12).

In the comparison of the sexes 41 differences are obtained, varying in magnitude from -28% to +26%, the median being 0% (Q=9.[illegible]%). In actual magnitude the differences vary from 0% to 28%, the median being 10% (Q=4.75%); the median is thus 1% higher than that of the non-selected data, and the variability 0.75% less. Of the differences, 75% were less than 14%.
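The weighted average age of Column B, and the age difference of Column D derived from it, can be illustrated from the group sizes and average ages of Table 14. The sketch assumes that a test was given to every subject in the listed age groups, so that the weights are simply the group totals; the text does not state how the weights were taken, and the function and variable names are illustrative.

```python
# (number of subjects, average age) for each selected group, from Table 14.
TABLE_14 = {
    "boys": {6: (18, 6.52), 8: (23, 8.09), 10: (20, 10.37), 12: (25, 12.35)},
    "girls": {6: (18, 6.53), 8: (20, 8.32), 10: (21, 10.14), 12: (22, 12.41)},
}


def weighted_average_age(sex, groups_tested):
    """Mean of the group average ages, weighted by group size.

    Assumes every subject in each listed group took the test.
    """
    pairs = [TABLE_14[sex][group] for group in groups_tested]
    total = sum(n for n, _ in pairs)
    return sum(n * age for n, age in pairs) / total


all_groups = [6, 8, 10, 12]
boys_age = weighted_average_age("boys", all_groups)
girls_age = weighted_average_age("girls", all_groups)
age_gap = boys_age - girls_age  # analogue of Column D for a test given to all

print(round(boys_age, 2), round(girls_age, 2), round(age_gap, 2))
```

For a test given only to the older groups, passing `[10, 12]` as `groups_tested` restricts the average accordingly.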
[TABLE 15. Results of the selected groups of boys and girls, arranged as described in the text. The body of the table is illegible in this copy.]

[...] Experimental Studies of Mental Defectives. Educ. Psychol. Monog. No. 7. Baltimore: Warwick & York, 1912, pp. 155.
74. Witmer, L. On the Relation of Intelligence to Efficiency. Psychol. Clinic, 1915, 9, 61-86.
75. Whipple, G. M. Manual of Mental and Physical Tests. Baltimore: Warwick & York, 1910, pp. 534.
76. Whipple, G. M. Manual of Mental and Physical Tests. Baltimore: Warwick & York, 1914, pp. 690, 2 vol.
77. W[hipple], G. M. The Amateur and the Binet-Simon Tests. J. of Educ. Psychol., 1912, 3, 118-119.
78. W[hipple], G. M. Amateurism in Binet Testing once more. J. of Educ. Psychol., 1913, 4, 301-302.
79. Wooley, H. T. A New Scale of Mental and Physical Measurements for Adolescents and some of its Uses. J. of Educ. Psychol., 1915, 6, 521-550.
80. Wooley, H. T. and Fisher, C. R. Mental and Physical Measurements of Working Children. Psychol. Monog., 1914, 18 (No. 77), pp. 247.
81. Wyatt, S. The Quantitative Investigation of Higher Mental Processes. Brit. J. of Psychol., 1914, 6, 109-133.
82. Yerkes, R. M., Bridges, J. W. and Hardwick, R. S. A Point Scale of Measuring Mental Ability. Baltimore: Warwick & York, 1915, pp. 213.