Test No.        No. 1    …     …     …    No. 6        No. 7        No. 8
                                          Atts.  Rts.  Atts.  Rts.  Atts.  Rts.
Grade III         26     19    16    58   2.7    2.1    5.0   2.7   2.0    1.1
Grade IV          34     25    23    72   3.7    3.0    7.0   3.3   2.6    1.7
Grade V           42     31    30    86   4.8    4.0    9.0   4.9   3.1    2.2
Grade VI          50     38    37    99   5.8    5.0   11.0   6.6   3.7    2.8
Grade VII         58     44    44   110   6.8    6.0   13.0   8.3   4.2    3.4
Grade VIII        63     49    49   117   7.8    7.0   14.0  10.0   4.8    4.0
Grade IX          65     50    50   120   8.6    7.8   15.0  11.0   5.0    4.3
Time allowances,
  minutes          1      1     1     1   6      6     12    12    6      6

Arithmetic Scales 21

Thus in the Addition Test (Test No. 1), the average score in Grade V is 42, the number of correct additions made in one minute. The other tests are read in the same way.

II. WOODY ARITHMETIC SCALE¹

Whereas in each of the separate Courtis Tests the problems are of approximately the same difficulty throughout, the Woody Scales employ a different method of measuring efficiency. The scales are designed to measure work in the four fundamental operations of (a) addition, (b) subtraction, (c) multiplication, and (d) division, respectively. Each scale consists of a great variety of problems falling within the field of the operation it is designed to test. These problems begin with the easiest that can be found and gradually increase in difficulty, until the last ones in each scale are so difficult that only a relatively small percentage of eighth-grade pupils can solve them correctly. Taking the addition scale as an example, the problems rise in difficulty from the first, which requires next to no ability in addition, up to the last, which, though still an addition problem, is complex enough to test children of the eighth grade.
The relative difficulties of the problems within each scale were determined by administering them to large groups of children in several school systems; the difficulty of a problem was calculated from the percentage of correct answers, by a method similar to that used in the Buckingham Spelling Scale. Two distinct series of scales have been devised for each of the operations named above. It will be sufficient here to describe the shorter of these, Series B, and to illustrate the general principles that underlie this method of measurement. For the other series, with a full account of its instructions, method of administration, scoring, etc., the reader is referred to the original study.

¹ The scales are reproduced by the courtesy of Dr. Clifford Woody.

22 Scientific Measurement

Series B: Addition Scale

Name    When is your next birthday?    How old will you be?
Are you a boy or girl?    In what grade are you?

(1) 2 + 3          (2) 2 + 4 + 3          (3) 17 + 2          (5) 72 + 26
(7) 3 + 1 =        (10) 21 + 33 + 35      (13) 23 + 25 + 16   (14) 25 + 42 =
(16) 9 + 24 + 12 + 15 + 19
(19) $.75 + 1.25 + .49
(20) $12.50 + 16.75 + 15.75
(21) $8.00 + 5.75 + 2.33 + 4.16 + .94 + 6.32
(22) 547 + 197 + 685 + 678 + 456 + 393 + 525 + 240 + 152
(23) [illegible]
(24) 4.0125 + 1.5907 + 4.10 + 8.673
(30) [illegible]
(33) [illegible]
(36) 2 yr. 5 mo. + 3 yr. 6 mo. + 4 yr. 9 mo. + 5 yr. 2 mo. + 6 yr. 7 mo.
(38) 25.09 + [illegible]

Series B: Subtraction Scale

Name    When is your next birthday?    How old will you be?
Are you a boy or girl?    In what grade are you?

(1) 8 − 5          (3) 2 − 1          (6) 11 − 7          (7) 13 − 8
(9) 78 − 37        (13) 16 − 9        (14) 50 − 25        (17) 393 − 178
(19) 567482 − 106493       (20) [illegible]     (24) [illegible]     (25) [illegible]
(27) 5 yds. 1 ft. 4 in. − 2 yds. 2 ft. 8 in.
(31) 7.3 − 3.00081 =       (35) [illegible]

Series B: Multiplication Scale

Name    When is your next birthday?    How old will you be?
Are you a boy or girl?    In what grade are you?

(1) 3 × 7 =        (3) 2 × 8 =        (4) 4 × 8 =         (5) 23 × 3
(8) 50 × 3         (9) 254 × 6        (11) 1036 × 8       (12) 5096 × 6
(13) 8754 × 8      (16) 7898 × 9      (18) 24 × 234       (20) 287 × .05
(24) 16 × 2 5/8    (26) 9742 × 59     (27) 6.25 × 3.2     (29) [illegible]
(33) [illegible]   (35) 987 3/4 × 25  (37) [illegible]    (38) .0963 1/8 × .084

Series B: Division Scale

Name    When is your next birthday?    How old will you be?
Are you a boy or girl?    In what grade are you?

(1) 3)6            (2) 9)27           (7) 4 ÷ 2 =         (8) [illegible]
(11) [illegible]   (14) 8)5856        (15) 1/4 of 128 =   (17) 50 ÷ 7 =
(19) 248 ÷ 7 =     (23) 23)469        (27) 7/8 of 624 =   (28) .003).0936
(30) 3/4 ÷ 5 =     (34) 62.50 ÷ 1 1/4 =    (36) 9)69 lbs. 9 oz.

Series B was especially constructed for measuring arithmetical ability when the time available for measurement is limited. The breaks in the numbering of the problems do not mean that part of the scale has been omitted; the scale is complete as it stands, and the numbering is a matter of convenience for purposes external to the use of the scale. The Addition and Subtraction Scales can be used in Grades II to VIII inclusive; the Multiplication and Division Scales, in Grades III to VIII inclusive. It is recommended that all the tests of Series B be given together.

DIRECTIONS FOR ADMINISTRATION

It is essential that the same standard method be employed in giving these tests; care should be taken that the same directions are given in the same way to all groups taking them. The following general directions should be carefully followed.

Distribute the papers face down and do not allow the pupils to turn them over until they are told to do so. When all are ready with pencils in hand, say: "Turn your papers over and answer the questions at the top of the page." When all these preliminary questions have been answered, repeat the following formula of specific directions.
If you are giving the Addition test, say: "Every problem on the sheet which I have given you is an addition problem, an 'and problem.' Work as many of these problems as you can, and be sure that you get them right. Do all your work on this sheet of paper and don't ask anybody any questions. Begin."

Allow ten minutes for each test in Series B. Because the test is partly one of speed, it is essential that all the pupils start and stop work together. Most of the children will have finished all the problems within the range of their ability before the time is up; those who have not must not be allowed any further time.

The only variation in procedure in giving the other tests is the substitution, in the formula of specific directions, of the expressions "subtraction or 'take away' problems," "multiplication or 'times' problems," and "division or 'into' problems" for the expression "addition or 'and' problems." Since teachers in the lower grades sometimes speak of "and," "take away," "times," and "into" problems, these forms should also be used in administering the test, so as to make clear to the children what is expected of them.

DIRECTIONS FOR SCORING THE TESTS

In scoring each test the standard of marking should be absolute accuracy, and the final answer should be in its lowest terms. If the results of class measurement are to be compared with the results and values established by the author, only those answers should be accepted as correct which are identical with those given in the following table, since these are the solutions on which the original scoring was based.

Answers to Problems, Series B

Addition
Problem   Answer
  1       5
  2       9
  3       19
  5       98
  7       4
 10       89
 13       64
 14       67
 16       79
 19       $2.49
 20       $45.00
 21       $27.50
 22       3,873
 23       [illegible]
 24       18.3762
 30       [illegible]
 33       10.55
 36       22 yrs. 5 mo., or 22 5/12 yrs.
 38       268.1324

Subtraction
Problem   Answer
  1       3
  3       1
  6       4
  7       5
  9       41
 13       7
 14       25
 17       215
 19       460,989
 20       [illegible]
 24       [illegible]
 25       [illegible]
 27       2 yds. 1 ft. 8 in., not [illegible]
 31       4.29919
 35       [illegible]

Multiplication
Problem   Answer
  1       21
  3       16
  4       32
  5       69
  8       150
  9       1,524
 11       8,288
 12       30,576
 13       70,032
 16       71,082
 18       5,616
 20       14.35
 24       42
 26       574,778
 27       20.000
 29       [illegible]
 33       [illegible]
 35       24,693 3/4
 37       [illegible]
 38       .0080902 1/2, or .00809025

Division
Problem   Answer
  1       2
  2       3
  7       2
  8       [illegible]
 11       [illegible]
 14       732
 15       32
 17       7 1/7, not 7 + 1
 19       35 3/7, not 35 + 3
 23       20 9/23; 20.3, not 20 + 9
 27       546
 28       31.2
 30       3/20, or .15
 34       50
 36       7 lbs. 11 2/3 oz.; [illegible]

METHOD OF DETERMINING THE CLASS ACHIEVEMENT

The method of determining class achievement with Series B is simpler than that employed with Series A; it is largely for this reason that Series B was chosen for description here. It should be noted that in each of the scales a definite attempt has been made to place the problems so that they increase by uniform steps of difficulty from the first to the last. Thus, in the Addition Scale, problem 3 is as much more difficult than problem 2 as problem 5 is more difficult than problem 3, and so on. In the Courtis Tests, by contrast, the problems involving a given operation are all of approximately the same difficulty and require precisely the same knowledge and method for their solution. In other words, the Courtis Tests measure speed in the various operations of arithmetic rather than the extent of knowledge of the operation involved. In the Woody Scale, because the problems increase in difficulty, the score measures the extent of the pupil's knowledge of the process involved rather than mere speed of performance.
A race offers an analogy. One could set up a series of hurdles all of the same height and count the number cleared in a certain time; such is the Courtis method. Or the hurdles could be made gradually higher and higher, each runner's success being measured by the hurdles he can clear without a fall; such is the method of the Woody Scales.

Teachers sometimes object that the problems are too hard for the children. In this connection it cannot be pointed out too clearly that when scales of this type are used in the schools, the children are not expected to do all the problems, just as, when we measure the height of a child with an eight-foot rule, we do not expect the child to measure up to eight feet.

The achievement of a class is measured by calculating the median number of problems solved correctly. The median is the number that marks the point at which just as many pupils solve a greater number correctly as solve a smaller number correctly. To find the median point of achievement of the class, it is necessary to make a distribution table showing the number of pupils who were unable to solve a single problem correctly, the number who solved one, two, three, etc., up to the final number. Take the following as an example:

[Table: Number of Times a Given Number of Addition Problems Was Solved Correctly (problems solved correctly vs. number of pupils); figures illegible]

That is, one pupil failed to solve a single problem; with that exception, every child solved at least one problem correctly. Two children solved two problems correctly, three children solved three problems correctly, four solved five correctly, and so on. Since there are, let us say, 52 individuals in a given class, "the median point evidently falls between the achievements of the 26th and 27th pupils.
Let us begin with the individual who was unable to solve a single problem correctly and count the two individuals who solved two problems, the three who solved three problems, and so on until we come to the step that includes the 26th individual. Now, if we are to indicate the exact point in the achievement of the pupils where just as many pupils solve a greater number of problems as solve a smaller number, it is necessary to count 5 of the 6 individuals who solved 10 problems correctly. Thus, on the assumption that the individuals are distributed over any step at equal distances from one another, the median point is 5/6 of the distance through this step. Hence the median achievement of this class, i.e., the median number of problems solved, is 10.8 problems correctly solved."

TENTATIVE STANDARDS OF ACHIEVEMENT

The following standards of achievement were determined from tests given to several thousand children, from the second to the eighth grades, in various school systems. With further experimentation they may need to be slightly altered.

Tentative Standards of Achievement for Series B

Grade    Addition   Subtraction   Multiplication   Division
II         4.5          3              -               -
III        9            6              3.5             3
IV        11            8              7               5
V         14           10             11               7
VI        16           12             15              10
VII       18           13             17              13
VIII      18.5         14.5           18              14

The standards are the median numbers of problems solved correctly in each grade. Thus in addition the median achievement was 4.5 problems in the second grade, 9.0 problems in the third grade, and so on.

All that is necessary, therefore, to test a class is to procure the standard blanks, follow the detailed instructions for administration and scoring, and then determine the median score by the method shown. This median can then be compared with the tentative standards given by the author.
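The scoring and median procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not Woody's own procedure in code: the names `score_paper`, `class_median`, and `difference_from_standard` are hypothetical, the answer key holds only a handful of entries from the table, and a pupil who solved k problems is treated as occupying the step from k to k + 1, which reproduces the text's 10 + 5/6 = 10.8 when 21 pupils fall below the step of 10 and 6 fall on it. The sample class is made up except for the counts the text gives (one pupil at 0, two at 2, three at 3, four at 5, six at 10, 52 in all).

```python
from collections import Counter

# Tentative Series B standards (median problems solved correctly).
# Multiplication and division begin at Grade III.
STANDARDS = {
    "addition": {2: 4.5, 3: 9, 4: 11, 5: 14, 6: 16, 7: 18, 8: 18.5},
    "subtraction": {2: 3, 3: 6, 4: 8, 5: 10, 6: 12, 7: 13, 8: 14.5},
    "multiplication": {3: 3.5, 4: 7, 5: 11, 6: 15, 7: 17, 8: 18},
    "division": {3: 3, 4: 5, 5: 7, 6: 10, 7: 13, 8: 14},
}

# A few entries from the addition answer key; a full key lists every problem.
# Each problem maps to the set of answer strings accepted as identical.
ADDITION_KEY = {
    1: {"5"},
    2: {"9"},
    21: {"$27.50"},
    36: {"22 yrs. 5 mo.", "22 5/12 yrs."},
}


def score_paper(answers, key):
    """Count answers that match an accepted answer exactly."""
    return sum(1 for problem, answer in answers.items()
               if answer.strip() in key.get(problem, set()))


def class_median(scores):
    """Median problems solved, interpolating evenly within one step.

    A pupil with score k is taken to occupy the step from k to k + 1.
    """
    if not scores:
        raise ValueError("no scores")
    half = len(scores) / 2
    counts = Counter(scores)
    below = 0  # pupils counted on lower steps so far
    for step in sorted(counts):
        freq = counts[step]
        if below + freq >= half:
            return step + (half - below) / freq
        below += freq


def difference_from_standard(median, operation, grade):
    """Class median minus the tentative standard (positive = above)."""
    return median - STANDARDS[operation][grade]


if __name__ == "__main__":
    print(score_paper({1: "5", 2: "8", 36: "22 yrs. 5 mo."}, ADDITION_KEY))
    # Hypothetical class of 52: 21 pupils below 10, six pupils at 10.
    distribution = {0: 1, 2: 2, 3: 3, 5: 4, 7: 5, 8: 6, 10: 6,
                    12: 10, 15: 10, 18: 5}
    scores = [s for s, n in distribution.items() for _ in range(n)]
    median = class_median(scores)
    print(round(median, 1))
    print(round(difference_from_standard(median, "addition", 5), 1))
```

Run as a script, this prints 2, then 10.8, then -3.2: the hypothetical class falls about three problems short of the Grade V addition standard of 14.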
It should be noted, of course, that these tentative standards would cease to have significance if, before the test, the children had been drilled on examples framed with the particular scale problems in mind.

EXERCISES

1. How does the general character of the work of your class, as revealed by the administration of Test No. 7, Series A, compare with that of other classes of the same grade in your building or city?
2. How does the work of your pupils in the various sub-branches, as revealed by the tests, compare with the standard scores for your grade? How does it compare with the work in these sub-branches in other classes and schools where you may be able to test?
3. Suppose frequent administration of the tests failed to reveal a reasonable amount of improvement in the various sub-branches. What would this seem to indicate?
4. Could the tests be utilized to remedy this condition?
5. What two important facts about the ability of the pupils in your class have the tests revealed?
6. Suppose the tests showed the ability of a pupil to differ greatly in the various sub-branches. What action should the teacher take?
7. For what purposes may the Woody Scale be used to greater advantage than the Courtis Scale?
8. In your experience with the tests, have they tended to show any relation between ability in one branch and ability in another?
9. What precautions should be taken in administering the tests?
10. In what ways should the continued use of these tests increase the efficiency of a teacher of arithmetic?

CHAPTER III

HANDWRITING SCALES

I. THORNDIKE SCALE   II. AYRES SCALE   III. COURTIS TESTS

Probably there is no subject in which opinions of efficiency are more vaguely expressed than handwriting.
Such terms as "good," "fair," and "poor" merely express the individual teacher's judgment, as determined by certain factors, such as legibility, grace, and character, or by certain styles, such as vertical or slanting, to which that teacher is partial. No two teachers mean the same quality by the same term. Consequently such judgments, because they are not expressed in terms of a universal standard that conveys the same meaning to everybody, are of little value when comparisons are necessary. In recent years attempts have been made to eliminate this unscientific type of judgment, which is the natural result of the lack of a standard, by constructing scales for measuring the quality of handwriting. Thorndike and Ayres have each devised such a scale, while Courtis has outlined a method for obtaining samples of children's handwriting made under uniform conditions. Each of these methods will be described briefly in turn.

[Thorndike Handwriting Scale: specimen plates]

I. THORNDIKE HANDWRITING SCALES

Thorndike was the first to construct an objective scale for handwriting. It appeared in March, 1910, and was developed as follows. One thousand samples of handwriting, ranging from the worst to the best to be found in the sixth, seventh, and eighth grades, were given in turn to forty competent judges. Each judge was asked to rank the samples according to their "general merit," a combination of grace and legibility, by placing each specimen in one of eleven arbitrary groups arranged in order of increasing merit.
Previous experiments had shown that these samples, instead of falling into a thousand different classes, naturally fell into about eleven groups, all the members of a group being of about equal merit. The same thing is true of handwriting as of an attempt to divide into a thousand classes a thousand people whose heights vary from five to six feet: many would be so nearly of the same height as to make such a classification impracticable, if not impossible. Similarly, an exact classification of handwriting would be impossible where the differences between samples were not pronounced.

After each judge had placed each sample in one of these eleven groups three or four times, the average of his rankings was taken as his final grading of that specimen. For example, if a judge ranked a certain specimen in class 10 on the first occasion, in class 11 on the second, in class 12 on the third, and in class 10 on the fourth, on the whole he placed it between classes 10 and 11, or, to be exact, at 10.75. The returns of all the judges were then massed, and the average of all the rankings given to each sample was determined. In this way the place assigned to each specimen by the combined opinion of all the judges was fixed. When the averaged judgments were collected, it was found (as might be expected where so many samples were concerned) that some samples fell exactly or approximately in each of the eleven groups, that is, were graded 1, 2, 3, 4, ... 11, while many others received rankings between groups, such as 1.4, 1.6, 2.1, 2.8, etc.
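The averaging just described can be sketched in Python. This is an illustrative sketch only: `judge_grade` and `combined_grade` are hypothetical names, and it assumes that the "massed" average for a specimen gives each judge's averaged grade equal weight, however many times that judge placed the specimen. The second judge's placements are invented for the example.

```python
from statistics import mean


def judge_grade(placements):
    """One judge's final grade for a specimen: the mean of his placements."""
    return mean(placements)


def combined_grade(placements_by_judge):
    """Specimen's place by combined opinion: mean of the judges' grades.

    Assumption: each judge's averaged grade counts once.
    """
    return mean(judge_grade(p) for p in placements_by_judge)


if __name__ == "__main__":
    # The four placements from the example in the text.
    print(judge_grade([10, 11, 12, 10]))
    # Hypothetical second judge who placed the same specimen in class 11 three times.
    print(combined_grade([[10, 11, 12, 10], [11, 11, 11]]))
```

The first line prints 10.75, the point between classes 10 and 11 in the text's example. The second prints 10.875, the mean of 10.75 and 11.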
Now, when it is recalled that in the opinion of the judges each of these groups is separated from the next by an equal step of merit, it is easy to see how a handwriting scale can be obtained: one need only select samples graded exactly or approximately 1, 2, 3, 4, ... 11. The samples in group 2 are then as much superior to those in group 1 as those in group 3 are to those in group 2, and so on. In this way the Thorndike Scale was obtained, a scale whose steps forty competent judges have considered to be equal. Later the scale was extended to fifteen classes of handwriting, ranging in quality from writing that can barely be called handwriting to writing suitable for decorative purposes.

Each class of this scale contains from one to three different styles of writing. It would undoubtedly be far more satisfactory if each class contained samples of all the various types of writing found in the schools. This defect, however, can easily be remedied when a larger number and greater variety of samples become available. It is also to be regretted that the scale, which measures about twenty-two by twenty-four inches, is not issued in a more convenient form. In spite of these slight defects, which time will remedy, the scale is certainly far superior to the judgment of any one individual.

The method of using it is very simple. A sample of handwriting is measured by placing it alongside the scale and estimating to which of the fifteen groups, represented by the fifteen samples, it belongs. If it is thought to lie between two groups, a fraction may be added to or subtracted from the grade of the scale sample it most nearly resembles, according to whether it is judged better or worse. Thus, if it falls between classes 12 and 13, it might be graded at any point between them, such as 12.4 or 12.8.
For especially accurate work, it is well to have several persons rank the samples and to take the average of their rankings as the final measurement. Care should be taken to decide a specimen's grade by its likeness in quality, not its likeness in style, to some sample in the scale. After the person grading has become familiar with the scale, comparisons will be easier if the scale is folded so that the samples form the pages of a book. The judge should then pass rapidly from the lowest to the highest sample, rating the specimen by his impression of it as a whole, since such an impression is the resultant of all the qualities the writing possesses. Long, painstaking comparisons prevent accuracy instead of securing it. When a specimen must be compared with samples unlike it in slant and character, placing it somewhere between two groups will often solve the difficulty.

II. AYRES HANDWRITING SCALE¹

The Thorndike Scale is based on the general merit of handwriting; the Ayres Scale, on the other hand, is based on legibility. Function thus replaces appearance as the criterion. Ayres adopts this standard for two reasons. First, the purpose of writing is to be read; hence "readability," or legibility, is the prime requisite. Second, the legibility of any sample of handwriting is easy to measure by determining the time it takes to read it. In this way an exact evaluation of the relative legibility of any specimens may be obtained in terms of a unit of time. The criterion of general merit, though based on the opinion of competent judges, does not allow this accuracy.

¹ For reproduction of Ayres Scale, see pages 50 to 57.

The method by which this scale was produced differs radically from that used by Thorndike.
Previous experiments had shown that the best way to find the relative legibility of different samples of handwriting was to find the rate, in words per minute, at which each sample could be read. To obtain a random selection rather than the writing of any particular city or section, 1578 samples were secured of the handwriting of children in the upper elementary grades of 40 school systems in 38 different states. These samples did not consist of words arranged to convey a meaning, but of words taken out of context. The object was to make the reader decipher each word separately and to make it impossible for him to memorize the text. Through the cooperation of superintendents and teachers, samples from either the best or the worst class in any city were avoided, and it was arranged that the pupils made no effort to write with exceptional care or rapidity.

These 1578 samples were then turned over to ten competent paid assistants, each of whom read every sample and recorded with a stop watch the exact time it took to read it. After each sample had been read by the ten readers, the average time taken to read it was computed. The rate in words per minute at which the sample had been read was then found by dividing the number of words in it by the average time, in minutes, taken to read it. This process was repeated for each of the 1578 samples. After it had been determined to what extent the readers had increased their reading speed through practice, the first 75 papers were reread and new times recorded to correct this error.

The next step was the classification of the samples. After various attempts, five classes (vertical, medium slant, extreme slant, backhand, and mixed) were finally carefully defined on the basis of the arbitrary judgment of a number of competent judges.
Each sample was then classified on the basis of the slant of its letters and assigned to one of these five classes. Because there were few backhand and mixed samples, these were left out of the final scale.

The scale itself was constructed in the following manner. All the samples, each marked to show both the rate at which it had been read and the style of writing (vertical, slant, etc.) to which it belonged, were arranged in one long series, beginning with the sample read at the lowest rate and extending to the one read at the highest. As might be expected, there were many samples of medium grade, read at a medium rate; only a few very good ones, read at a rapid rate; and only a few poor ones, read at a very slow rate. Then, beginning with the poorest sample, the one that took the longest time to read, a count was made just halfway through the series. The specimen thus obtained was the central point: one half of the samples were read more slowly, and one half more rapidly. This sample had been marked 175.7, indicating that it was read at the rate of 175.7 words per minute. Because of its central position, considering the entire series as 100, this sample was called 50. In a similar manner, samples were picked out one tenth, two tenths, three tenths, four tenths, six tenths, seven tenths, eight tenths, and nine tenths of the way through the series, and these were designated 10, 20, 30, 40, 60, 70, 80, and 90, respectively. These values were chosen because teachers are familiar with them in grading.

The rates of reading marked on the samples of qualities 20 through 90 were found to be 130.2, 149.4, 163.5, 175.7, 186.1, 195.8, 202.9, and 209.6 words per minute, respectively. Thus this scale does not proceed by equal steps as far as the time consumed in reading is concerned.
Instead, the gain in rate becomes progressively smaller as one moves from the worst sample to the best. How reasonable this is may easily be seen. A very poor handwriting takes a long time to decipher. One that is only slightly better may be read almost twice as fast. A still better one may be read somewhat faster, but not twice as fast as the previous one, and so on, the gain in rate growing smaller and smaller as the handwriting improves. Thus, as far as readability is concerned, the difference between a sample marked 30 and one marked 40 (149.4 and 163.5 words per minute) is greater than the difference between one marked 60 and one marked 70 (186.1 and 195.8 words per minute). What is actually meant, then, when the steps of this scale are called equal, is that each step has been chosen so that it is as much better than the one before as that one is better than the one preceding it. Qualities 60 and 40 are equally distant above and below quality 50, in the sense that there is the same proportion of samples between 50 and 60 as between 40 and 50, and so on down the scale.

The scale itself is on a sheet of paper measuring nine by thirty-six inches. It contains eight groups (the lowest and the highest groups being omitted in the final scale), each including three types of handwriting: vertical, slant, and extreme slant. Ayres' studies have shown that 95% of the ordinary writing of school children falls within these three styles. To facilitate comparison, both the paper and the ink used in the scale are of the colors used in the public schools. The scale is used in exactly the same way as the Thorndike Scale.

The following scales are reproduced by the courtesy of Dr. Leonard P. Ayres.

[Ayres Handwriting Scale: specimen plates]
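The three computations behind the Ayres Scale, the reading rate of a sample, the picking of samples at each tenth of the ordered series, and the shrinking gains between qualities, can be sketched in Python. This is a sketch under stated assumptions, not Ayres' own procedure: the function names are hypothetical, reading times are assumed to be in seconds, and `quality_points` uses a simple nearest-rank choice (index q × n ÷ 100 in the series ordered from slowest to fastest). Only the eight published rates are taken from the text.

```python
from statistics import mean

# Published rates (words per minute) for Ayres qualities 20 through 90.
AYRES_RATES = {20: 130.2, 30: 149.4, 40: 163.5, 50: 175.7,
               60: 186.1, 70: 195.8, 80: 202.9, 90: 209.6}


def words_per_minute(reading_times_sec, word_count):
    """Rate for one sample: words divided by the readers' mean time in minutes."""
    return word_count * 60 / mean(reading_times_sec)


def quality_points(rates, qualities=range(10, 100, 10)):
    """Rate of the sample q hundredths of the way through the ordered series.

    Samples are ordered from slowest-read (poorest) to fastest-read.
    """
    ordered = sorted(rates)
    n = len(ordered)
    return {q: ordered[min(n - 1, q * n // 100)] for q in qualities}


def successive_gains(rates_by_quality):
    """Increase in rate from each quality to the next, to 0.1 word."""
    qualities = sorted(rates_by_quality)
    return [round(rates_by_quality[hi] - rates_by_quality[lo], 1)
            for lo, hi in zip(qualities, qualities[1:])]


if __name__ == "__main__":
    # Hypothetical 60-word sample read by three readers in 18, 20, and 22 s.
    print(words_per_minute([18, 20, 22], 60))
    print(successive_gains(AYRES_RATES))
```

For the published rates the gains come out as 19.2, 14.1, 12.2, 10.4, 9.7, 7.1, and 6.7 words per minute. Each gain is smaller than the one before it, which is the pattern the text describes.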
To summarize, then, a teacher of the fourth, fifth, sixth, seventh, or eighth grade may test the ability of pupils (1) in the understanding of single words, by using Thorndike's Scale A; (2) in the comprehension of material read, by using Thorndike's Scale Alpha, the Starch series of tests, or the Courtis tests; and (3) in the rate of reading, by using either the Starch or the Courtis tests, preferably the Starch. Thorndike's Scales may be obtained by sending to Teachers College, Columbia University, New York, and the Starch Reading Tests by sending to the author at the University of Wisconsin. The sheets on which the scales and tests appear contain full directions for their use.

In using Scale A the teacher should allow thirty minutes for the test in the fourth grade, twenty-five minutes in the fifth and sixth grades, and twenty minutes in the seventh and eighth grades. In administering Scale Alpha the teacher should allow from twenty to thirty minutes. In Scale A the pupil's score is the highest numbered line that he marks correctly without more than a single error. Scale Alpha is scored in a similar manner: the pupil's score is the highest numbered step, or set, in which he has answered at least three of the four questions correctly.

In using the Starch Tests the teacher should send for the test blank that bears the same number as her grade: for example, No. 4 for the fourth grade, No. 5 for the fifth grade, etc. The speed of reading is obtained by determining the number of words read in thirty seconds. The pupil's comprehension score is determined by counting the number of words in his written reproduction that correctly express the thought of the selection read. Added and repeated words, as well as words that represent the ideas of the selection incorrectly, are not counted.
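The Scale Alpha scoring rule above and Gray's failure rule for oral reading, given in the next paragraph, are both simple enough to state as functions. This is a minimal sketch with hypothetical names. It assumes step scores arrive as a mapping from step number to the number of questions (out of four) answered correctly, and that a pupil who passes no step scores 0, a convention the text does not specify.

```python
def alpha_score(right_by_step, needed=3):
    """Scale Alpha: highest numbered step with at least `needed` right.

    Assumption: a pupil who passes no step scores 0.
    """
    passed = [step for step, right in right_by_step.items() if right >= needed]
    return max(passed, default=0)


def gray_failed(errors, seconds):
    """Gray's oral reading scale: has the child failed on this paragraph?

    Failure is 5 or more errors at any speed, or 4 or more errors
    together with a reading time of 30 seconds or more.
    """
    return errors >= 5 or (errors >= 4 and seconds >= 30)


if __name__ == "__main__":
    # Hypothetical pupil: questions right (out of 4) on steps 1 to 6.
    print(alpha_score({1: 4, 2: 4, 3: 3, 4: 2, 5: 3, 6: 1}))
    print(gray_failed(4, 35), gray_failed(4, 20), gray_failed(5, 10))
```

Following the rule literally, the hypothetical pupil scores 5, since step 5 is the highest step with three questions right, even though step 4 was missed.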
Folders or manuals covering every phase of the testing, together with answer cards, must be procured with the test sheets if the Courtis Tests are to be used. These may be obtained by sending to the Department of Co-operative Research, 82 Eliot Street, Detroit, Michigan.

To measure oral reading, Gray's Scale may be used. In reading the paragraphs in the scale, which gradually increase in difficulty, the gross errors, minor errors, omissions, substitutions, and insertions made in each paragraph are recorded. If a child makes 4 or more errors in a paragraph and takes 30 seconds or more to read it, or if he makes 5 or more errors however quickly he reads, he may be considered to have failed in that paragraph. This scale may be obtained by sending to Teachers College, Columbia University, New York City.

EXERCISES

1. Describe in detail the methods you would employ for measuring the reading ability, oral and silent, of thirty children of Grade V, using (a) the Thorndike and Gray Scales and (b) the Starch Scale.
2. How would you compare your class with one of the same grade in another school, using the Starch Scale? What conditions would you have to meet to make the comparison of the results valid?
3. How do the results obtained from the Thorndike Scale compare with those which the Starch Scale gives?
4. Does there seem to be any relation between speed of reading and comprehension of material read?
5. What distinctions between oral and silent reading have the tests revealed?
6. Have the tests revealed any marked difference in the reading ability of boys and girls? Of children of different nationalities? Of children who have used different reading textbooks?
7. In what way may a teacher modify Scale A so as to use it to test knowledge in various subjects in the curriculum from the elementary grades through college?
8. When should a teacher stop drill in oral reading and devote all the time to drill in comprehension?
9. Have the tests revealed wide variations in the reading ability of the pupils in your class, or a condition of more or less uniformity?
10. What are the shortcomings of the scales described in this chapter? How could these be remedied?

CHAPTER V

SPELLING SCALES

I. BUCKINGHAM SCALE
II. STARCH SCALE
III. AYRES SCALE

I. BUCKINGHAM SPELLING SCALE

This investigation, following the lead taken by the experimental investigation of the quality of handwriting and of composition, had as its object the development of a scale for the measurement of spelling ability: a scale which would no longer depend upon chance selection of words and upon subjective judgments of teachers, but which would be of general application and purely objective. The results were first published in 1913.

It is an obvious fact that words differ greatly in ease of spelling. Thus, we can select words ranging from the very simplest, such as the, as, when, up to words of extreme difficulty which can be spelled only after long acquaintance. Theoretically, therefore, it is possible to arrange a series of words along a scale in such a way that they become more and more difficult. Furthermore, it might be possible to arrange these words at equal intervals along the scale, these intervals being determined by the difficulty of each word. If in addition to this we fix a zero point (by taking the simplest words and agreeing that failure to spell these words indicates absence of spelling ability), a scale may be constructed which will measure the spelling ability of any individual, and will measure the difficulty of any word which has to be spelled. Not only can we measure the spelling ability of individuals in this way, but also that of classes, schools, and school systems. Such measurements will be independent of individual opinion.
Spelling ability will be determined, not by an arbitrary list of words picked at random by individuals who have no knowledge of their relative degrees of difficulty, but by means of the words on the scale, which have been standardized as regards their difficulty by the simple device of finding out what percentage of eighth grade children spelled them correctly.

The school has always attached great importance to spelling ability; whether or not this ability is overestimated need not be discussed here. Suffice it to say that if the school takes as its aim the teaching of spelling, it is essential that some method be devised to measure the extent to which the aim is accomplished. Dr. Rice, as early as 1897, tested the pupils in all grades from the fourth to the eighth inclusive in twenty-one school systems, using a list of words which has since become known as the Rice Sentence Test. This list is given below.

RICE SENTENCE LIST

1. running  2. slipped  3. listened  4. queer  5. speech  6. believe  7. weather  8. changeable  9. whistling  10. frightened
11. always  12. changing  13. chain  14. loose  15. baking  16. piece  17. receive  18. laughter  19. distance  20. choose
21. strange  22. picture  23. because  24. thought  25. purpose  26. learn  27. lose  28. almanac  29. neighbor  30. writing
31. language  32. careful  33. enough  34. necessary  35. waiting  36. disappoint  37. often  38. covered  39. mixture  40. getting
41. better  42. feather  43. light  44. deceive  45. driving  46. surface  47. rough  48. smooth  49. hopping  50. certainly
51. grateful  52. elegant  53. present  54. patience  55. succeed  56. severe  57. accident  58. sometimes  59. sensible  60. business
61. answer  62. sweeping  63. properly  64. improvement  65. fatiguing  66. anxious  67. appreciate  68. assure  69. imagine  70. peculiar
71. character  72. guarantee  73. approval  74. intelligent  75. experience  76. delicious  77. realize  78. importance  79. occasion  80. exceptions
81. thoroughly  82. conscientious  83. therefore  84. ascending  85. praise  86. wholesome

The method of scoring was of the simple type usually found in schools, i.e., a mark was given for each word correctly spelled, or a unit subtracted for each word misspelled. That is, all words were taken as equal measures of spelling ability. It should be noted that the foregoing list contains, among other words, disappoint, necessary, changeable, better, because, and picture. An examination of these six words shows at once that they are by no means of equal difficulty. This was conclusively proved by Thorndike, who made an actual test of these words on a group of fifth grade children. In the group that he measured, while 37% failed to spell necessary, the failures to spell better, because, and picture were 3%, 1%, and 0%, respectively. This clearly shows that it is erroneous to measure the score of the individual by giving equal value to each of these words. The pupil who scores, let us say, 95% has spelled not only all the easy words in the list but also a considerable number of the hard ones, whereas the pupil who gets 50% has failed in the hard words and has obtained his mark merely by spelling the easy words. That is, as the score increases, the units really get greater and greater, for to spell the five hardest words represents a very different task from spelling the five easiest words, and yet both have the same effect on the score. In other words, studies of this type must always lack precision because of the inequality of the units which are employed. They are useful for giving a rough estimate of the abilities of various groups, but when it comes to asking such questions as "How does the spelling ability of one class differ from that of another?", the figures which represent the results give no quantitative information, and are actually misleading. As the science of school measurement advances, such a state of affairs can hardly be tolerated.
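The inequality of units can be made concrete with Thorndike's four fifth-grade failure rates quoted above. In the minimal Python sketch below, two hypothetical pupils each spell three of the four words and therefore receive the same equal-unit score, although only one of them spelled necessary. The "difficulty total" (the sum of the failure rates of the words spelled correctly) is a hypothetical illustration chosen only to expose the difference. It is not Buckingham's scaling method, and the function names are our own.

```python
# Equal-unit scoring counts every word as one mark, so two pupils who each
# spell three of these four words receive the same score even though one
# of them spelled the hard word and the other did not.
# Failure rates are Thorndike's fifth-grade figures quoted in the text.
# The "difficulty total" is a hypothetical illustration only; it is NOT
# Buckingham's scaling method.

FAILURE_RATE = {"necessary": 0.37, "better": 0.03, "because": 0.01, "picture": 0.00}


def equal_unit_score(spelled: set[str]) -> float:
    """Per cent correct, counting every word as an equal unit."""
    return 100 * len(spelled & set(FAILURE_RATE)) / len(FAILURE_RATE)


def difficulty_total(spelled: set[str]) -> float:
    """Sum of the failure rates of the words spelled correctly (hypothetical)."""
    return sum(FAILURE_RATE[word] for word in spelled if word in FAILURE_RATE)


if __name__ == "__main__":
    pupil_a = {"better", "because", "picture"}    # missed the hard word
    pupil_b = {"necessary", "better", "because"}  # missed an easy word
    for name, spelled in [("A", pupil_a), ("B", pupil_b)]:
        print(name, equal_unit_score(spelled), round(difficulty_total(spelled), 2))
    # A 75.0 0.04
    # B 75.0 0.41
```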
Exact quantitative measurements of spelling ability are required; such quantitative results can never be obtained so long as the fundamental error is made that one word is equal to another word in difficulty, unless this is proved to be the case by actual measurements of large groups. To correct this error was the purpose of Buckingham's study of spelling ability.

The study was confined to grades from the third to the eighth, inclusive, of elementary schools located in or near the city of New York. The schools drew such different classes of children that any conclusions derived as a result of the study can be taken as representative. In all, about 9000 pupils were tested, a number from which general results might be expected; a greater number of pupils would not have increased the accuracy of the results sufficiently to compensate for the additional labor. In the first test a list of 270 words was used. This will be called the "original list." This list was selected from a larger list of 5000 words taken from two or more of five special books used by the author in his own school. These 270 words had to satisfy two requirements: (1) all of them had to be words in the speaking vocabulary of a third grade child, and (2) a considerable portion of the words had to be of sufficient difficulty to test the spelling ability of an eighth grade child. These words were then placed in a continuous passage, and the whole was dictated to Grades III to VIII in one school and to Grades IV to VII in another school. The dictation was very slow, so that the time factor did not enter. In marking the papers only the 270 words were regarded, those that served to link the whole into a continuous passage being neglected. All the papers were marked by the same person, and two measurements were recorded: (1) the number of times each word was correctly spelled in each grade, and (2) the percentage of the entire number of words each pupil spelled correctly in each grade.
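Both measurements can be taken from the same marked papers. The following is a minimal Python sketch of that bookkeeping, assuming each paper has already been reduced to its grade and the set of test words the pupil spelled correctly, so that the linking words of the passage never enter the count. The function `tally` and the sample papers are hypothetical.

```python
from collections import defaultdict

# Sketch of the two measurements recorded from the marked papers:
# (1) how often each test word was spelled correctly in each grade, and
# (2) the per cent of the test words each pupil spelled correctly.
# Assumption: each paper is reduced to (grade, set of test words spelled
# correctly); the linking words of the passage are never recorded.


def tally(papers, test_words):
    correct = defaultdict(lambda: defaultdict(int))  # grade -> word -> count
    pupils = defaultdict(int)                         # grade -> number of papers
    pupil_scores = []
    for grade, spelled in papers:
        pupils[grade] += 1
        scored = spelled & test_words
        for word in scored:
            correct[grade][word] += 1
        pupil_scores.append((grade, 100 * len(scored) / len(test_words)))
    # Measurement (1), expressed as per cent correct within each grade.
    word_pct = {
        grade: {word: 100 * correct[grade][word] / n for word in test_words}
        for grade, n in pupils.items()
    }
    return word_pct, pupil_scores


if __name__ == "__main__":
    words = {"across", "day"}
    papers = [(3, {"day", "across"}), (3, {"day"}), (4, {"day", "across"})]
    word_pct, scores = tally(papers, words)
    print(word_pct[3]["across"], scores)  # 50.0 [(3, 100.0), (3, 50.0), (4, 100.0)]
```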
We shall confine ourselves to the first consideration, i.e., to the number of times each word was correctly spelled.

TABLE I
Figures Indicate Per Cent Correct

Table reads: across was spelled correctly in the third grade of School II by 17% of the pupils; in the fourth grade of School I by 60% of the pupils, and of School II by 40% of the pupils, etc.

Grade          3d  4th  4th  5th  5th  6th  6th  7th  7th  8th
School         II    I   II    I   II    I   II    I   II   II
across         17   60   40   76   58   90   79   98   87   93
addition        2   38   26   60   28   76   45   94   76   83
almost         16   62   41   73   65   88   75   80   81   87
alphabet       25   13    1   63   12   40   46   82   43   68
arithmetic     27   89   53  100   72   96   92  100   97   98
bridge         29   59   42   87   52   98   85  100   94   97
button         14   50   35   70   49   77   63   84   62   83
choose          6   25   10   37   31   62   37   67   55   65
day            97  100   98   96  100  100   99  100  100  100
guess           6   29   17   67   30   77   50   82   66   85
handful        36   47   33   46   19   76   33   75   63   57
pshaw           1    4    6   29    6   46    5   31   31   18
tomato         34   83   49   67   43   74   48   79   32   38
too             -   10    3   17    4   26    7   63   22   27
whose          17   49   15   40   29   47   10   57   59   66

Table I represents the typical results obtained from the various grades in the particular schools. Thus, for example, across was spelled correctly in the third grade of School II by 17% of the pupils, and in the seventh grade of School I by 98%. On the basis of these scores a group of 100 words, here called the "selected list," was chosen from the original list of 270 words.

The basis upon which the "selected list" was chosen is as follows. Referring to Table I, it will be seen that the word across was spelled by 17% of the third grade children, which means that it was not too hard to serve as a test of their ability. By the time the seventh and eighth grades were reached, it still served as a test of ability, for it failed to be spelled in the seventh and eighth grades by 13% and 7%, respectively. For this reason the word across was selected.
Almost and button were chosen for the same reason. On the other hand, addition, which was spelled by only 2% of the third grade children, was discarded as too difficult, for 2% could spell it rightly by mere chance, which means that the word really serves as no test for that grade.

Continuous Passage: 100 Selected Words

Whose answer is ninety? If the janitor sweeps, he will raise a dust. You ought not to steal even a penny. Wait until the hour for recess to touch the button. Smoke was coming out of their chimney. Every afternoon the butcher gave the hungry dog a piece of meat. One evening a carriage was stopping in front of my kitchen. I wear a number thirteen collar. Guess what made me sneeze. Send me a pair of leather shoes. I do not know, but I am almost sure they are mine. My uncle bought my cousin a pretty watch for forty dollars. The soldier dropped his sword. Jack had a whistle and also twelve nails. The ocean does not often freeze. You should speak to people whom you meet. It takes only a minute to pass through the gate and across the road. Did you ever hear a fairy laugh? The American Indian had a saucer without a cup. Neither a pear nor a peach was at the grocery store to-day. Cut up a whole onion with a handful of beans. My piano lesson was easy. The animal ran into the road and straight against a tree. Give me another sentence which has the word "title" in it. I believe true friends like to be together instead of apart.

These 100 selected words (printed in italics) were again put into sentences as shown above and were dictated later to five schools. Great care was taken to insure uniformity in the administration of the tests. Later 18 additional words were added, making a total of 118 words dictated. The extent to which each of these 118 words was spelled correctly in each grade in each school was determined.
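The reasoning used to build the selected list from Table I can be sketched as a simple filter. The text gives only examples (addition, spelled by 2% of third graders, was rejected; across, at 17% in Grade III and 93% in Grade VIII, was kept), so the two cut-offs below are hypothetical values chosen for illustration. The names `FLOOR`, `CEILING`, and `select_words` are also our own. The figures are the Grade III and Grade VIII columns of Table I for the four words the text discusses.

```python
# Sketch of the reasoning used to build the "selected list" from Table I.
# A word is kept if enough third graders spell it that success is not mere
# chance, and if enough upper-grade pupils still miss it that it tests them.
# The two cut-offs are hypothetical; the text gives only examples
# (addition at 2% rejected; across at 17% in Grade III and 93% in Grade VIII kept).
FLOOR = 5     # minimum per cent correct in Grade III (assumed)
CEILING = 95  # maximum per cent correct in Grade VIII (assumed)

# (Grade III, Grade VIII) per cent correct, taken from Table I
TABLE_I = {
    "across": (17, 93),
    "addition": (2, 83),
    "almost": (16, 87),
    "button": (14, 83),
}


def select_words(table, floor=FLOOR, ceiling=CEILING):
    """Keep words that are neither too hard for Grade III nor too easy for Grade VIII."""
    return [word for word, (low, high) in table.items() if low >= floor and high <= ceiling]


if __name__ == "__main__":
    print(select_words(TABLE_I))  # ['across', 'almost', 'button']
```

Different cut-offs would admit or reject other words, so the sketch reproduces the decisions the text reports but is not a statement of Buckingham's actual thresholds.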
Using the data so collected, it was possible to select words which show a regular increase in difficulty as we pass down from grade to grade. From these words two lists were then selected, each containing 25 words; these are referred to as the "first preferred list" and "second preferred list," as tabulated below.

PREFERRED LIST

First
1. even  2. lesson  3. only  4. smoke  5. front  6. sure  7. pear  8. bought  9. another  10. forty
11. pretty  12. wear  13. button  14. minute  15. cousin  16. nails  17. janitor  18. saucer  19. stopping  20. sword
21. freeze  22. touch  23. whistle  24. carriage  25. nor

Second
26. already  27. beginning  28. chicken  29. choose  30. circus  31. grease  32. pigeons  33. quarrel  34. saucy  35. tailor
36. telegram  37. telephone  38. tobacco  39. too  40. towel  41. Tuesday  42. tying  43. whole  44. against  45. answer
46. butcher  47. guess  48. instead  49. raise  50. beautiful

Considering these 50 words alone, Table II shows the percentage of children from the third to the eighth grade who were able to spell each of the 50 words. Thus, even was spelled correctly by 59% of children in the third grade, 93% in the sixth, and 97% in the eighth grade.

TABLE II
(Showing Standard Scores in Spelling; figures are per cent correct)

Words            3d Yr. 4th Yr. 5th Yr. 6th Yr. 7th Yr. 8th Yr.
1. even              59      79      89      93      93      97
2. lesson            37      72      83      91      94      96
3. only              65      75      89      95      97      99
4. smoke             46      69      85      94      96      99
5. front             51      72      80      90      94      97
6. sure              47      55      69      78      89      94
7. pear              31      42      58      72      81      94
8. bought            40      65      79      91      94      97
9. another           36      43      78      86      94      96
10. forty            49      62      65      72      83      87
11. pretty           45      67      76      90      90      94
12. wear             35      49      61      74      84      93
13. button           32      52      61      73      74      87
14. minute           26      38      62      77      86      92
15. cousin           19      47      69      89      89      95
16. nails            43      58      71      87      92      96
17. janitor          19      42      58      81      81      90
18. saucer           11      29      42      58      79      81
19. stopping         27      39      55      71      76      84
20. sword            13      46      57      78      86      93
21. freeze           29      46      68      83      86      94
22. touch            45      52      60      81      84      93
23. whistle          22      55      56      64      75      85
24. carriage         13      40      50      67      81      85
25. nor              63      61      65      68      77      94
26. already          16      42      43      62      44      77
27. beginning         9      25      37      46      66      75
28. chicken          49      70      83      90      96      99
29. choose           22      34      48      60      65      82
30. circus           20      39      50      72      75      95
31. grease           11      18      37      35      42      57
32. pigeons           7      29      41      57      70      82
33. quarrel          15      39      53      75      86      94
34. saucy            14      35      40      52      71      78
35. tailor           38      55      70      75      81      84
36. telegram         15      31      39      63      73      84
37. telephone         8      35      48      67      83      87
38. tobacco          12      39      60      75      88      96
39. too              14      28      27      24      30      43
40. towel            24      44      64      73      78      94
41. Tuesday          46      70      67      80      87      91
42. tying            44      58      70      68      76      87
43. whole            17      43      64      78      84      90
44. against          19      30      54      75      84      94
45. answer           27      47      67      86      90      97
46. butcher          33      59      69      85      90      97
47. guess            20      32      49      67      77      85
48. instead          32      48      62      86      87      91
49. raise            21      54      67      84      93      94
50. beautiful        10      52      70      85      94      96

In this way, Buckingham has provided a basis of comparison which may be used by any teacher as a method of testing the relative ability of different classes.¹

DIRECTIONS FOR ADMINISTERING

The following instructions, which are essentially the same as those followed by Buckingham, may be given as regards the conduct of the test:

(1) Give all the words in sentences during one session, i.e., either in the morning or in the afternoon of the same day, except in classes below the fifth grade, where the material should be given in two periods separated by at least half an hour.
(2) Each sentence should be dictated, either as a whole or in part, as many times as may seem necessary to secure its complete understanding. This experiment is purely a test in spelling; it is not expected that the pupils should be subjected to the added difficulty of recalling the words dictated.
(3) Offer no explanation of separate words or sentences. If the meaning is not clear, repeat the sentence as a whole or in part.
(4) Do not ask the children to underline words, or otherwise call attention to the significant words of the sentences.
(5) After the children have written the sentences, read them again, and allow the pupils to insert words or make other corrections before finally collecting the papers.

These papers may now be collected for the whole class, and the percentage of pupils getting any particular word correct determined and compared with the table which has already been given. Of course no particular significance is attached to any single word; there is no one word which will test the spelling ability of a group. When, however, 50 previously standardized words are taken, the manner in which these are spelled by any group of pupils will serve to give a quantitative idea of their spelling ability. Thus, if a teacher who is dealing with Grade V finds that her average percentage for the 50 words falls notably below the average given in the table for Grade V, there is every reason to suppose that there is something abnormal about the standing of that class, due to causes which might profitably be investigated.

Suppose, for example, that we are dealing with a fifth grade which contains 50 children, and we find that the word another is spelled correctly by 31 of the children. Reducing this to the percentage basis, the score of the class for this word is 62%. On reference to Buckingham's table, we see that the average score of this grade for the word another is 78%, which means that the particular grade in question, as far as this word is concerned, was not equal to the average. The same procedure may be repeated with any of the other words in the list, and the average of all the percentages obtained. This figure may then be compared with the average of the Grade V percentages given in the table for the particular words employed.

¹ The tables in this section are reproduced by the courtesy of Dr. B. R. Buckingham.
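The comparison just described is simple arithmetic and can be sketched in Python. Only the count for another (31 of 50 children) comes from the text. The other four counts are invented for illustration, and the Grade V norms are the 5th Yr. figures from Table II. The function names are hypothetical.

```python
# Sketch of the comparison described above: convert a class's counts of
# correct spellings to percentages and compare their average with the
# average of the Grade V norms in Table II for the same words.
# Only the figure for "another" (31 of 50) comes from the text; the other
# counts are invented for illustration.

GRADE_V_NORMS = {"another": 78, "lesson": 83, "button": 61, "whistle": 56, "circus": 50}


def class_percentages(correct_counts: dict[str, int], class_size: int) -> dict[str, float]:
    """Per cent of the class spelling each word correctly."""
    return {word: 100 * count / class_size for word, count in correct_counts.items()}


def compare_with_norms(correct_counts, class_size, norms):
    """Return (class average %, norm average %) over the words tested."""
    pct = class_percentages(correct_counts, class_size)
    class_avg = sum(pct.values()) / len(pct)
    norm_avg = sum(norms[word] for word in pct) / len(pct)
    return class_avg, norm_avg


if __name__ == "__main__":
    counts = {"another": 31, "lesson": 40, "button": 30, "whistle": 25, "circus": 24}
    print(class_percentages(counts, 50)["another"])  # 62.0
    class_avg, norm_avg = compare_with_norms(counts, 50, GRADE_V_NORMS)
    print(f"class {class_avg:.1f}%, norm {norm_avg:.1f}%")  # class 60.0%, norm 65.6%
```

Five words keep the example short. A real comparison would draw on more of the 50 standardized words.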
It is necessary to use from 10 to 20 words in testing a grade, in order to avoid the danger of picking out one or two words upon which special drill might have been given. When 10 or 20 words are chosen at random from the list, this difficulty is obviated.

It may appear that some justification is required for this laborious study. The ordinary individual would be apt to take the attitude that the teacher's judgment would be just about as sound as the estimates arrived at by the foregoing process. As a matter of fact, the 50 words were ranked by 300 judges, most of them teachers. Naturally there was a general agreement between the teachers' judgments and the relative order of the words found as the result of experimental study. But with certain words there was very great disagreement. Thus, the word nor, when ranked by the teachers, was given fifth place as regards ease of spelling, whereas the actual records show that the children found it the sixteenth word as regards ease of spelling. Again, the word button was ranked ninth by the teachers, and thirty-first by the records which came from the pupils. This shows the unsatisfactoriness of relying on teachers' judgments. As long as those who are teaching do not know the relative difficulty of the words taught, how can they be expected to give the correct weight, either in time or emphasis, in their teaching?

Buckingham, in the latter half of his study, proceeds to construct a scale for the measurement of spelling efficiency: a scale which contains at one end words which, if they cannot be spelled, would indicate zero ability, and at the other end words which are very difficult for the average child in the grades to spell. By simple statistical methods and suitable assumptions he determined the interval between the words on the scale, the length of the interval being measured by the increase in difficulty as shown by the percentage of times each word was correctly spelled.
It would be impossible in the limits of this book to explain the method of derivation of the scale. Its interest is largely theoretical, and in its present form it could not be used with profit by the average teacher. It should, however, be borne in mind that such a measuring rod has been constructed even for a difficult function such as spelling.

II. STARCH SPELLING SCALE

A second method of measuring spelling ability has been devised by Starch, who worked quite independently of Buckingham. While this method lacks the statistical precision of Buckingham's study, in that it assumes (as far as the score is concerned) each word to be of equal difficulty, it is very straightforward and has many points to recommend its use in the classroom. The first object of the experiment was to obtain six lists of equal difficulty, each containing 100 words, representative of the entire non-scientific English vocabulary. This was accomplished by taking at random the first defined word of more than two letters on every even-numbered page in Webster's New International Dictionary. This made a total of 1,186 words. Every technical, psychological, and obsolete word was then discarded, leaving 600 words. These were then arranged alphabetically in the order of size, beginning with three-letter words, then four-letter words, etc. This list was then divided into six lists of 100 words each by choosing for the first list the first, seventh, thirteenth, etc., word of the original list of 600 words. The second list was obtained in a similar manner by taking the second, eighth, fourteenth, etc., word; and so on till the sixth list, which was formed by taking the sixth, twelfth, eighteenth, etc., word. The lists which resulted from this process are as follows:

LIST I
1. add  2. but  3. get  4. low  5. rat  6. sun  7. alum  8. blow  9. cart  10. cone
11. easy  12. fell  13. foul  14. gold  15. head  16. kiss  17. long  18. mock  19. neck  20. rest
21. spur  22. then  23. vile  24. afoot  25. black  26. brush  27. close  28. dodge  29. faint  30. force
31. grape  32. honor  33. mince  34. paint  35. prism  36. rogue  37. shape  38. steal  39. swain  40. title
41. wheat  42. accrue  43. bottom  44. chapel  45. dragon  46. filter  47. hearse  48. laden  49. milden  50. pilfer
51. rabbit  52. school  53. shroud  54. starch  55. vanity  56. bizarre  57. compose  58. dismiss  59. faction  60. hemlock
61. leopard  62. omnibus  63. procure  64. rinsing  65. splashy  66. torpedo  67. worship  68. bescreen  69. commence  70. estimate
71. flourish  72. luckless  73. national  74. pinnacle  75. reducent  76. standing  77. venturer  78. ascension  79. dishallow  80. imposture
81. invective  82. rebellion  83. scrimping  84. unalloyed  85. volunteer  86. cardinally  87. connective  88. effrontery  89. indistinct  90. nunciature
91. sphericity  92. attenuation  93. fulminating  94. lamentation  95. secretarial  96. apparitional  97. intermissive  98. subjectively  99. inspirational  100. ineffectuality

LIST II
1. air  2. cat  3. hop  4. man  5. row  6. tap  7. awry  8. blue  9. cast  10. corn
11. envy  12. feud  13. game  14. grow  15. home  16. knee  17. look  18. mold  19. part  20. ruin
21. take  22. tree  23. well  24. allay  25. blaze  26. buggy  27. clown  28. doubt  29. false  30. forth
31. grass  32. house  33. money  34. paper  35. quill  36. rough  37. shout  38. stick  39. swear  40. trump
41. whirl  42. action  43. bridle  44. charge  45. driver  46. finger  47. heaven  48. legend  49. motley  50. portal
51. recipe  52. scrape  53. simple  54. strain  55. weaken  56. breaker  57. congeal  58. disturb  59. foreign  60. hoggery
61. meaning  62. onerate  63. provoke  64. salient  65. station  66. trample  67. abstract  68. bulletin  69. covenant  70. eugenics
71. friskful  72. luminous  73. opulence  74. planchet  75. reformer  76. thorough  77. watering  78. belonging  79. displayed  80. indention
81. mercenary  82. redevelop  83. senescent  84. uncharged  85. whichever  86. centennial  87. constitute  88. exaltation  89. invocative  90. personable
91. strawberry  92. concentrate  93. imaginative  94. mathematics  95. selfishness  96. collectivity  97. marriageable  98. agriculturist  99. quarantinable  100. relinquishment

LIST III
1. art  2. dry  3. ice  4. mix  5. run  6. top  7. back  8. bond  9. chip  10. crib
11. ever  12. fire  13. gilt  14. hack  15. hunt  16. lace  17. main  18. more  19. pelt  20. sand
21. tang  22. turn  23. wine  24. amuse  25. blind  26. catch  27. count  28. dress  29. fancy  30. freak
31. gross  32. inlet  33. muddy  34. peace  35. razor  36. saint  37. smell  38. stock  39. swoop  40. twine
41. white  42. barrel  43. buckle  44. cotton  45. engine  46. flimsy  47. helmet  48. lesser  49. ocular  50. potato
51. relate  52. season  53. single  54. supply  55. weight  56. captain  57. contour  58. earnest  59. fowling  60. inflate
61. measure  62. palaver  63. raising  64. seizing  65. sulphur  66. trestle  67. adhesive  68. buttress  69. dominate  70. exchange
71. governor  72. manifest  73. osculate  74. pleasure  75. revising  76. traverse  77. westward  78. capitally  79. extremism  80. indicated
81. monoplane  82. repertory  83. stimulate  84. unlocated  85. accidental  86. citizenize  87. contribute  88. expertness  89. locomotive  90. prevailing
91. symmetrize  92. consolatory  93. incremental  94. penetrative  95. superintend  96. conterminous  97. naturalistic  98. artificiality  99. re-examination  100. sentimentalism

LIST IV
1. bee  2. elk  3. key  4. new  5. saw  6. war  7. base  8. book  9. clue  10. down
11. fall  12. flat  13. girt  14. hand  15. iron  16. lime  17. make  18. move  19. plug  20. shop
21. tear  22. tusk  23. wire  24. apple  25. blood  26. chain  27. craft  28. drawn  29. field  30. frost
31. guard  32. jelly  33. ocean  34. pitch  35. remit  36. scale  37. speak  38. stone  39. thick  40. under
41. widen  42. bearer  43. canine  44. create  45. eraser  46. garret  47. hollow  48. little  49. office  50. prince
51. retain  52. settle  53. sluice  54. swerve  55. withal  56. chicken  57. counter  58. emperor  59. freight  60. journal
61. neglect  62. passion  63. reserve  64. serpent  65. surface  66. trouble  67. affected  68. calendar  69. enabling  70. external
71. greeting  72. mosquito  73. outfling  74. positive  75. romantic  76. undulate  77. adverbial  78. carpentry  79. franchise  80. infatuate
81. promenade  82. rigmarole  83. stripling  84. vegetable  85. assignment  86. comparison  87. coordinate  88. expressage  89. mayonnaise  90. recompense
91. untraveled  92. consumptive  93. infuriation  94. photosphere  95. terrestrial  96. horsemanship  97. regenerative  98. circumscribed  99. sculpturesque  100. verisimilitude

LIST V
1. bow  2. fly  3. law  4. old  5. see  6. ache  7. bead  8. call  9. cold  10. draw
11. fast  12. foil  13. glue  14. hard  15. jack  16. line  17. mark  18. musk  19. prig  20. slat
21. test  22. vend  23. wood  24. armor  25. boast  26. chase  27. cross  28. enjoy  29. fixed  30. glean
31. guild  32. joint  33. order  34. point  35. revel  36. scorn  37. spire  38. strut  39. three  40. voice
41. wince  42. beaver  43. cannon  44. crispy  45. escape  46. gladly  47. hustle  48. mallet  49. oriole  50. pulley
51. rubric  52. shears  53. solace  54. trifle  55. yellow  56. circuit  57. crooked  58. enstamp  59. general  60. lateral
61. nourish  62. placard  63. resolve  64. signify  65. tabloid  66. unitive  67. approved  68. cerebral  69. entirely  70. farewell
71. incident  72. mountain  73. parallel  74. prelimit  75. spectral  76. urbanize  77. aggrieved  78. clarifier  79. hydraulic  80. inheritor
81. purgation  82. sacrifice  83. surviving  84. vestibule  85. authorship  86. concoction  87. derigation  88. federative  89. memorandum  90. regularity
91. abnormality  92. disseminate  93. insensitive  94. predominate  95. unprevented  96. inarticulate  97. stupendously  98. communicating  99. anthropometric  100. emancipationist

LIST VI
1. box  2. gap  3. lay  4. pod  5. sex  6. alms  7. bird  8. camp  9. comb  10. dusk
11. fear  12. foot  13. goat  14. hawk  15. keep  16. life  17. mass  18. navy  19. raft  20. some
21. that  22. vice  23. work  24. aside  25. brawn  26. chime  27. crown  28. equip  29. flock  30. grand
31. hedge  32. knock  33. ought  34. poppy  35. river  36. shaft  37. stall  38. sugar  39. throw  40. watch
41. young  42. begird  43. causal  44. discus  45. ferret  46. gutter  47. killed  48. middle  49. paddle  50. puzzle
51. sample  52. shield  53. spring  54. tubule  55. bicycle  56. commode  57. discard  58. excuser  59. gravity  60. leaping
61. obloquy  62. pontiff  63. retreat  64. society  65. tigress  66. vitiate  67. auditory  68. churlish  69. erosible  70. fetching
71. juncture  72. narcotic  73. parasite  74. probator  75. squeaker  76. vagabond  77. amphibian  78. clearness  79. impatient  80. intestine
81. quadruple  82. sauciness  83. ticketing  84. virulence  85. bafflement  86. condescend  87. disconcert  88. illiterate  89. metropolis  90. repression
91. animalcular  92. divestiture  93. intrinsical  94. prerogative  95. upholsterer  96. interference  97. subantarctic  98. convocational  99. imperturbation  100. irresponsibility

These scales are reproduced by the courtesy of Dr. Daniel Starch.

The advantages of this method of selection are: (1) It gives a random sampling of the entire non-technical English vocabulary, for easy words and very hard words occur in the same proportion in the lists as in the English language. (2) The list contains words sufficiently easy to test the poorest speller. (3) The essential requirement of every scientific experiment is fulfilled, since another 600 words of the same average difficulty can be chosen by employing the same method of selection; e.g., the tenth defined word on each page could be used in place of the first.

DIRECTIONS FOR ADMINISTERING TESTS

First have the pupils write the name, grade, school, city, and date at the top of the sheet. Pronounce the words clearly, but do not sound them phonetically or inflect them so as to aid the pupils. Give the meaning of words that sound like words with a different meaning and spelling.
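The list-construction procedure described for the Starch scale (order the 600 retained words by size, alphabetically within each size, then deal them out in turn into six lists) can be sketched in a few lines of Python. This is a minimal sketch under two assumptions: "size" means the number of characters in the word, and the earlier step of discarding technical and obsolete words, which was a matter of judgment, is not modeled. The function names and the twelve-word sample are hypothetical.

```python
# Sketch of how the six Starch lists were formed from the 600 retained words:
# order the words by length (shortest first), alphabetically within each
# length, then deal them out in turn so that list k receives words
# k, k + 6, k + 12, ... of the ordered list.
# Assumption: "size" means the number of characters in the word.


def order_by_size(words: list[str]) -> list[str]:
    """Shortest words first; alphabetical order within each length."""
    return sorted(words, key=lambda w: (len(w), w.lower()))


def deal_into_lists(ordered: list[str], n_lists: int = 6) -> list[list[str]]:
    """List k takes every n-th word, starting from position k."""
    return [ordered[k::n_lists] for k in range(n_lists)]


if __name__ == "__main__":
    sample = ["cone", "add", "blue", "hop", "alum", "cat",
              "but", "awry", "get", "cart", "air", "blow"]
    for number, words in enumerate(deal_into_lists(order_by_size(sample)), start=1):
        print(number, words)
    # 1 ['add', 'alum']
    # 2 ['air', 'awry']
    # ...
```

Dealing in turn from a list ordered by length is what gives each of the six lists the same spread from short to long words, which is the basis for the claim that the lists are of approximately equal difficulty. Starting from a different word on each page, as advantage (3) suggests, would simply feed a different 600 words into the same two steps.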
The pupils are to write the words and to number them in the order in which they are given. Allow sufficient time for the writing.

Each grade is to be tested twice, on two successive days. Use any one of the six lists on the first day and a different list on the second day. (When an entire school is being tested it may be desirable, though not necessary, to use the same list, say List I, in all grades on the first day, and any other list on the second day.) In the first grade use the first 40 words of the list, in the second grade the first 65 words, in the third grade the first 80 words, in the fourth grade the first 90 words, and in all other grades the entire list.

It has been demonstrated by administering the lists in schools that each of them is of approximately the same difficulty. It is perhaps desirable, however, when measuring the efficiency of an individual group, to give two
^od lixab Snog As aid Jlood ba£d I9*niw iou'r leriiom •mag JBOd 320Ji ^aw S9ldJ 99Tl J»8 baef )8»1 oJhsI oincd bloi J2B9 93fiq daum tod no3 ooia Wo lad ql:»d bfl9 gnol blirio bx&d DbI 970I 931 ODBl J99* 09C& ^Blq 19V09 ltt9W o?.uod •98 9xd Jij,.ij T89^ 93B ^BWB 01 MEASURING SCALE FOR ABILITY IN SPELLING SECOND GRADE A B c D E F G H 1 J K L M N o P Q R s T u V w X Y z • 99 98 96 94 92 88 84 79 73 66 58 Cj-i -.SECOND .THIRD GHAHE -FOURTH GRADE «.FIFTH *GRAD£ J.6IXTH ^BRADE ^SEVENTH THIRD^. GRADE 100 99 98 96 94 92 88 84 79 73 66 58 50 FOURTH. 6RA0E 100 99 98 96 94 92 88 84 79 73 66 58 50 FIFTH. GRADE 100 99 98 96 94 92 88 84 79 73 66 58 50 SIXTH. SHADE 100 99 98 96 94 92 88 84 79 73 66 58 50 SEVENTH. GRADE 100 99 98 96 94 92 88 84 79 73 66 58 50 EIGHTH. GRADE ' 100 99 98 96 94 92 88 84 79 73 66 58 50 ■rt the be Of by day nine Mean. became catch trust except eight speod sometimes forenoon often guess meant principal organisation Immediate decision Judgment will eat face brother black enjoy stopped testimony emergency convenient recommend ■t all happy eogage combination argument whether appreciate receipt allege this had lot ride rather theater volume distinguish arrangement sincerely pielimlnary all box tree do thing teach comfort complaint terrible neighbor athletic disappoint tick alstef began able elect peZd" century colonies evidence extreme especially got cast gift S3 ^Ct total official experience practical bed but vee gone jail beautiful addition entertain victim relief proceed committee white south party suit file built Shed flight estimate. 
probably secretary association cordially Into —J soft track provide property supply assist accident character him foot inside sight publication Invitation foreign height | February little brtag blow they stood district connect] i) d machine difference look ring block fell fix firm examination impossible responsible did live tell spring any fight particular concern beginning old like trill five stay goes death objection Importance affair associate application difficulty bad lata ball plant grand ahould walk hold pleasure automobile red let law outajde o% drill command various book big aak song dark grant tire fortune debate local finally mother just population marriage entitle way StuDo tsr check Edge' TJ," publish further political load rJL first Income prove prepare national cold take rett bought paid beg represent material hat page east summer prefer condition business bat call SOD itself contain government child cod help railroad engine illustrate opinion Ice fall bard without unable something sudden visit progress entire different believe ought ) then feel afternoon ticket write object maim 1 fully ■ea went Priday half account instead department agreement r back On father throw Wednesday unfortunate age wife anything real thus personal family famous already certain really majority l paper gold state table woman everything attention celebration ffl high Mrs. education investigate fours each floe talk fair chief remember director therefore necessary dollar either purpose too divide has May 2S right human effort pleasant line date brots slide Important diamond if left lady farther election Monday •hip March contract feel duty derk Include convention her yet better deal though running Pad least company o'clock manner trft & brought support position field baby well herself less article letter price God ledge take oft effect Mr. 
why November primary for alter MB took subject April appear Liberty result distribute thing country length Saturday Ail the words in each column are of approximately equal spelling what girl try inform enough destroy appoint information difficulty. The steps in spelling difficulty from each column to the next are approximately equal steps. The numbers at the top indicate that bis led It* toll delay behind another list both heart mouth himself fact September newspaper daughter consider complete about what pei cent of correct spellings may be expected among the lay people children reply themselves children of the different grades. For example, if 20 words from around build attend column H are given as a spelling test it may be expected that the average score for an entire second grade spelling them will be about sold aide kind «»p held understand thought between dUcs SEJ" popular Christmas Russell Sage Foundation, New York City 79 per cent For a third grade it should be about 92 per cent, for a told life dear own January during several justice Division of Education fourth grade about 98 per cent, and for a fifth grade about 100 per beat member through desire gentleman cent. The limits of the groups are as follows: SO means from 46 through 54 per cent) 58 means from 55 through 62 per cent; 66 means from 63 through 69 per cent; 73 means from 70 through 76 percent; 79 means from 77 through 81 per cent; 84 means from 82 through 86 per cent; 88 means from 87 through 90 per cent; 92 means from 91 through 93 per cent; 94 means 94 and 95 per cent; 96 means 96 and 97 per cent; while 98,99 and 100 per cent are sepa- rate groups. By means of these groupings a child's spelling ability may be located in terms of grades. 
Thus if a child were given a 20 word spelling test from the words of column and spelled 15 words, or 75 far alike add said tonight tenth sir these dub leave ground such while (hose Miss died copy been yesterday among question police madam address August Tuesday enclose suppose wonderful direction although attempt statement Leonard P. Ayres, Director The data of this scale are computed from an aggregate of 1 ,400,000 spellings by 70,000 children in 84 cities throughout the country. The words are 1,000 in number and the list is the product of combining different studies with the object of identifying the 1,000 common- est words in English writing. Copies of this scale may be obtained :or five cents apiece. Copies of the monograph describing the inves- wind ah felt full fail however picture December tax gutting written tigations which produced it may be obtained for 30 cents each, including the scale. Address the Russell Sage Foundation. Divi- sion of Education, 130 East 22d Street, New York City. per cent of them, correctly it would be proper to say that he showed nil set lhall number arrango fourth grade spelling ability, If he spelled correctly 17 words, or lost light comma third ready October 85 per cent, he would show fifth grade ability, and so on. anyway filth top? night Eotat paaa within glad shut trtth •**J body o Spelling Scales 127 of the tests. The average of the score made in the two tests will represent pretty accurately the spelling ability. STANDARDS OF EFFICIENCY IN SPELLING These spelling tests have been standardized by admin- istering them to 2500 pupils in 12 schools of 5 cities, located in Wisconsin, Minnesota, and New York. The average results obtained are shown in the table below, in which the scores are given in round figures. Standard Scores for Spelling Grades . . 1 2 3 4 5 6 7 8 10 30 40 51 61 71 78 85 This table shows that on the average in Grade III in the schools measured, 40% of each list was spelled cor- rectly. 
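The Starch procedure (two different lists on successive days, averaged, then compared with the grade standards) can be sketched in a few lines of Python. This is a minimal illustration, not part of Starch's published material: the function names are hypothetical, the standard scores and word counts are the ones given above, and the class scores in the usage example are invented.

```python
# Standard scores (per cent of each list spelled correctly), Grades I-VIII, from the table above.
STARCH_STANDARDS = {1: 10, 2: 30, 3: 40, 4: 51, 5: 61, 6: 71, 7: 78, 8: 85}

# Words of a list to dictate in the lower grades; all other grades take the whole list.
WORDS_TO_DICTATE = {1: 40, 2: 65, 3: 80, 4: 90}
FULL_LIST = 100


def words_to_dictate(grade: int) -> int:
    """Number of words of a list to give a grade, following the directions."""
    return WORDS_TO_DICTATE.get(grade, FULL_LIST)


def spelling_score(first_day: float, second_day: float) -> float:
    """Average of the scores made on two different lists on successive days."""
    return (first_day + second_day) / 2


def grade_equivalent(score: float) -> int:
    """Highest grade whose standard score is reached (0 if below Grade I)."""
    return max((g for g, s in STARCH_STANDARDS.items() if score >= s), default=0)


def difference_from_standard(score: float, grade: int) -> float:
    """Points above (+) or below (-) the standard score for the given grade."""
    return score - STARCH_STANDARDS[grade]


if __name__ == "__main__":
    # Hypothetical fourth grade scoring 45 on one list and 51 on another.
    score = spelling_score(45, 51)
    print(score, grade_equivalent(score), difference_from_standard(score, 4))
```

For this invented fourth grade the sketch prints `48.0 3 -3.0`: the class reaches the Grade III standard of 40 but falls 3 points short of the Grade IV standard of 51.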
The point of most importance for the individual teacher is to know how the pupils of a particular grade compare in spelling efficiency with pupils of the same grade in other schools. By using this very simple device, the ordinary teacher can obtain a purely objective measure of spelling ability. No longer need we speak of "good spellers," "bad spellers" and "medium spellers"; we can assign a numerical value to the spelling ability of each individual.
III. THE AYRES SPELLING TEST (1000 WORDS) 1
Ayres has also presented a further method of measuring spelling ability, based on the one thousand most common words in the English language. These words were chosen by combining the results of four previous investigations, each of which had as its object the selection of the words most commonly used in a different sort of writing. The first study was founded on passages from the Bible and other well-known writings, including in all about 100,000 words. The second study of the frequency of different words was based on an analysis of the words used in 250 different articles taken from issues of four Sunday newspapers published in Buffalo. These articles, counting repetitions, contained 43,989 words; without repetitions, 6000 words. The third study consisted of the tabulation of 23,629 words from 2000 short letters written by 2000 people. The last study comprised a tabulation of some 200,000 words taken from the family correspondence of thirteen adults. The list of 1000 words finally selected was determined by combining the results of all these studies. Thus, the 1000 words chosen were those which occurred most frequently in passages selected from a wide variety of sources; namely, the Bible, the writings of famous authors, newspaper articles, and private correspondence.
1 The Ayres Spelling Scale (see insert) is reproduced by the courtesy of Dr. Leonard P. Ayres.
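The text does not say exactly how the four frequency studies were combined. One plausible rule, assumed here purely for illustration, is to turn each study's word counts into relative frequencies (so that a 200,000-word study does not swamp a 23,629-word one), add them, and keep the most frequent words. The function name and the toy counts below are hypothetical, not Ayres's data.

```python
from collections import Counter


def combine_frequency_studies(studies: list[Counter], top_n: int) -> list[str]:
    """Rank words by their summed relative frequency across several studies.

    Assumed combination rule: each study's counts are divided by that study's
    total, so every study carries equal weight regardless of its size.
    """
    combined: Counter = Counter()
    for counts in studies:
        total = sum(counts.values())
        for word, n in counts.items():
            combined[word] += n / total
    return [word for word, _ in combined.most_common(top_n)]


if __name__ == "__main__":
    # Toy counts standing in for two of the studies.
    literature = Counter({"the": 50, "and": 30, "lord": 20})
    letters = Counter({"the": 40, "i": 40, "you": 20})
    print(combine_frequency_studies([literature, letters], top_n=3))
    # -> ['the', 'i', 'and']
```

Weighting each study equally is only one design choice; weighting by raw counts would instead favor whichever study tabulated the most words.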
The method employed in standardizing the difficulty of each of the 1000 words was essentially the same as that used by Buckingham, but on a more extensive scale. The 1000 words were first made into 50 lists of 20 words each, and these lists were then administered, in the middle of the school year, to various grades in the schools of 84 cities scattered throughout the United States. The data secured from these tests made an aggregate of 1,400,000 spellings by 70,000 children. It was on the basis of these data that the Ayres Scale was constructed.
The scale presented explains itself. All the words in any particular column are of approximately the same spelling difficulty, the difficulty of each word having been determined by the percentage of times the word was spelled correctly in the tests mentioned above.
DIRECTIONS FOR ADMINISTERING
The details for administering the tests will be clear from the following example. Suppose we wished to measure the spelling ability of any fifth grade. Taking any one of the columns given in the scale, say Column O, we would first of all select any twenty words from it. Then we would dictate these words in a list to the class, giving ample time for each word and, where the meaning of a word is doubtful, explaining it by putting the word in a sentence. Lastly, we would collect the papers and count the number of words spelled correctly. If there were 30 children in the class, 600 spellings would have been performed. Suppose that out of these 600 spellings 480 were correct. Then 80% of the words would have been spelled correctly. A reference to the scale, Column O, shows that the fifth grade average at midyear is 84%, and the fourth grade average 73%. Therefore the class measured would be a little below the average fifth grade standing. Suppose a particular child in the grade spells 18 of the 20 words correctly. This means a score of 90%, or slightly below the average for the sixth grade, which is 92%.
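The worked example above can be checked with a short Python sketch. It assumes the per-cent groups printed with the scale (50 covers 46 through 54 per cent, and so on) and uses only the three Column O grade averages quoted in this section; the function names are hypothetical, and reading a score's "ability" as the highest grade average that the grouped score reaches is an interpretation made for illustration.

```python
from typing import Optional

# Per-cent groups printed with the Ayres scale: (group value, lowest %, highest %).
AYRES_GROUPS = [
    (50, 46, 54), (58, 55, 62), (66, 63, 69), (73, 70, 76),
    (79, 77, 81), (84, 82, 86), (88, 87, 90), (92, 91, 93),
    (94, 94, 95), (96, 96, 97), (98, 98, 98), (99, 99, 99),
    (100, 100, 100),
]

# Midyear grade averages for Column O as quoted in the text (other grades omitted).
COLUMN_O = {4: 73, 5: 84, 6: 92}


def percent_correct(correct: int, attempted: int) -> int:
    """Per cent of spellings correct, rounded to a whole number (halves round to even)."""
    return round(100 * correct / attempted)


def ayres_group(percent: int) -> Optional[int]:
    """Scale group containing the per cent, or None below 46 per cent."""
    for value, low, high in AYRES_GROUPS:
        if low <= percent <= high:
            return value
    return None


def grade_reached(percent: int, column: dict[int, int]) -> Optional[int]:
    """Highest grade whose column average the grouped score equals or exceeds."""
    group = ayres_group(percent)
    if group is None:
        return None
    reached = [grade for grade, average in column.items() if group >= average]
    return max(reached) if reached else None


if __name__ == "__main__":
    class_percent = percent_correct(480, 600)  # 30 pupils x 20 words
    child_percent = percent_correct(18, 20)
    print(class_percent, grade_reached(class_percent, COLUMN_O))  # 80 4
    print(child_percent, grade_reached(child_percent, COLUMN_O))  # 90 5
```

The class's 80 per cent falls in group 79, which reaches the fourth grade average of 73 but not the fifth grade average of 84; the child's 90 per cent falls in group 88, which reaches the fifth grade but not the sixth grade average of 92.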
The only care that must be taken in administering the test is not to select a list of words so short that there is a risk of not obtaining representative results. For this reason, in testing the ability of a particular pupil it is well not to use fewer than 20 words; but if a group is being tested, so as to obtain merely the group average, a smaller number of words may be used.
It should be noted that the standards published with the Ayres Scale apply only where these words have been given to pupils who have had no special drill on them. Since the words in the scale are so common that they form an excellent foundation for spelling, it is reasonable to suppose that special attention will be given them. Such drill will make the pupil too familiar with them for his score to be judged by the standard scores obtained by Ayres. This means that it will probably be necessary for each school to establish its own standards.
EXERCISES
1. Select 15 words from the Buckingham Scale and use these for measuring the spelling ability of a particular class. Outline the steps you would take, and the way in which you would administer the test, score the papers, and tabulate the results.
2. What are the advantages derived from knowing the relative difficulties of different words? How should this alter the method of teaching?
3. Using the Starch Scale, how would you establish norms for the grades of your own school? Is it fair to expect a foreign district school and an English-speaking district school to produce the same percentages?
4. Suppose a teacher took any list of 100 words, administered these to a grade, and discovered that on the average 75% of the spellings were correct. What would this tell, or fail to tell, the teacher?
5. If it was found that the average scores of a Grade V, tested in January on the Starch Scale in successive years, were 59, 60, 61, 62, 60, and the average then fell suddenly to 53, where would you look for the cause?
6. How, by means of these scales, would it be possible to compare two different methods of teaching spelling?
7. If it is found that some children are very much better than the average for their grades, how should this affect the amount of time they devote to spelling? What should be done for those who are much poorer than the average?
8. Use (a) the Buckingham Scale, (b) the Ayres Scale, (c) the Starch Scale, to test the same class on successive days. Do the results agree, in that they show the class to have the same ability, measured by the grade norms?
9. Why would it not be fair to apply any of these tests if the children had been drilled on the lists used in them? Which is the safest scale to use if we wish to eliminate this error?
10. Administer List 1 and List 2 of the Starch Scale to the same class, on successive days, and compare the average scores in each. Should they be the same? Why?
CHAPTER VI
COMPOSITION SCALES
I. HILLEGAS SCALE
II. HARVARD-NEWTON SCALES
The task of evaluating efficiency in composition is obviously a complex one because not only are there several distinct types of composition, such as narration, description, etc., but merit in each of these types is the resultant of many independent factors. Attempts to estimate this efficiency (the qualities desirable in English composition) have resulted in three separate methods of measuring.
The first method is that of the Hillegas Scale of mixed types of composition. This scale consists of a number of samples of English composition representing various types and ranging from very good to very poor in quality, each grade in the scale being represented by only one composition. For example, the sample composition representing one grade may be of the narration type, while that representing another grade may be of the description type.
Since the composition to be measured is compared directly with the compositions in the scale, as in the Thorndike Handwriting Scale, the accurate comparison of one style of composition with an entirely different style, which is often necessary, is exceedingly difficult. It was to do away with this objection that the second method of measurement, namely the Harvard-Newton series of four scales, was formed. These scales measure efficiency in description, narration, exposition and argumentation, respectively.
Thirdly, there is the method originated by Rice and used with apparent success by Bliss and Courtis. Here no attempt is made to construct an actual scale; progress in composition writing in an individual, class, or school is determined simply by noting the improvement shown in successive reproductions of similar selections at intervals throughout the school year. No attempt is made to express the value of the composition in per cents or otherwise. It is simply read, and placed in the class "Excellent," "Good," "Poor," etc., on the basis of the general impression produced by reading it. These initial attempts so lack the precision for which the whole movement for standardization of school products stands that they need no further description.
I. HILLEGAS COMPOSITION SCALE
The "Hillegas Scale for the Measurement of Quality in English Composition by Young People" consists of ten sample compositions which have been arranged in order of increasing merit, merit meaning that quality which competent persons consider as such. These samples have been assigned the following values: 0, 18, 26, 37, 47, 58, 67, 77, 83, and 93, respectively. These values are not based on the ordinary percentage system used in grading and should not be confused with such per cents.
Instead, each one of the values represents the number of units of quality possessed by the composition to which it is attached. Thus, the composition rated 93 is approximately twice as good as the one rated 47, while the one rated 18 is approximately half as good as the one rated 37.
Dear Sir: I write to say that it aint a square deal Schools is I say they is I went to a school, red and gree green and brown aint it hito bit I say he don't know his business not today nor yesterday and you know it and I want Jennie to get me out.
18
the book I refer to reach is Ichabod Crane, it is an grate book and I like to rede it. Ichabod Crame was a man and a man wrote a book and it is called Ichabod Crane i like it because the man called it ichabod crane when I read it for it is such a great book.
26
Advantage evils are things of tyranny and there are many advantage evils. One thing is that when they opress the people they suffer awful I think it is a terriable thing when they say that you can be hanged down or trodden down without mercy and the tyranny does what they want there was tyrans in the revolutionary war and so the throwed off the yok.
37
Sulla as a Tyrant
When Sulla came back from his conquest Marius had put himself consul so sulla with the army he had with him in his conquest seized the government for Marius and put himself in consul and had a list of his enemys printy and the men whoes names were on this list we beheaded.
47
De Quincy
First: De Quincys mother was a beautiful woman and through her De Quincy inhereted much of his genius. His running away from school enfluenced him much as he roamed through the woods, valleys and his mind became very meditative. The greatest ennuence of De Quincy's life was the opium habit. If it was not for this habit it is doubtful whether we would now be reading his writings. His companions during his college course and even before that time were great enfluences.
The surroundings of De Quincy were enfluences. Not only De Quincy's habit of opium but other habits which were peculiar to his life. His marriage to the woman which he did not especially care for. The many well educated and noteworthy friends of De Quincy.
58
Fluellen
The passages given show the following characteristic of Fluellen: his inclination to brag, his professed knowledge of History, his complaining character, his great patriotism, pride of his leader, admired honesty, revengeful, love of fun and punishment of those who deserve it.
67
Ichabod Crane
Ichabod Crane was a schoolmaster in a place called Sleepy Hollow. He was tall and slim with broad shoulders, long arms that dangled far below his coat sleeves. His feet looked as if they might easily have been used for shovels. His nose was long and his entire frame was most looely hung to-gether.
77
Going Down with Victory
As we road down Lombard Street, we saw flags waving from nearly every window. I surely felt proud that day to be the driver of the gaily decorated coach. Again and again we were cheered as we drove slowly to the postmasters, to await the coming of his majestie's mail. There wasn't one of the gaily bedecked coaches that could have compared with ours, in my estimation. So with waving flags and fluttering hearts we waited for the coming of the mail and the expected tidings of victory. When at last it did arrive the postmaster began to quickly sort the bundles, we waited anxiously. Immediately upon receiving our bundles, I lashed the horses and they responded with a jump. Out into the country we drove at reckless speed — everywhere spreading like wildfire the news, "Victory!" The exileration that we all felt was shared with the horses. Up and down grade and over bridges, we drove at breakneck speed and spreading the news at every hamlet with that one cry "Victory!"
When at last we were back home again, it was with the hope that we should have another ride some day with "Victory."
83
Venus of Melos
In looking at this statute we think, not of wisdom, or power, or force, but just of beauty. She stands resting the weight of her body on one foot, and advancing the other (left) with knee bent. The posture causes the figure to sway slightly to one side, describing a fine curved line. The lower limbs are draped but the upper part of the body is uncovered. (The unfortunate loss of the statute's arms prevents a positive knowledge of its original attitude). The eyes are partly closed, having something of a dreamy langour. The nose is perfectly cut, the mouth and chin are moulded in adorable curves. Yet to say that every feature is of faultless perfection is but cold praise. No analysis can convey the sense of her peerless beauty.
93
A Foreigner's Tribute to Joan of Arc
Joan of Arc, worn out by the suffering that was thrust upon her, nethertheless appeared with a brave mien before the Bishop of Beauvais. She knew, had always known that she must die when her mission was fulfilled and death held no terrors for her. To all the bishop's questions she answered firmly and without hesitation. The bishop failed to confuse her for heresy, bidding her recant if she would live. She refused and was lead to prison, from there to death. While the flames were writhing around her she bade the old bishop who stood by her to move away or he would be injured. Her last thought was of others and De Quincy says, that recant was no more in her mind than on her lips. She died as she lived, with a prayer on her lips, and listening to the voices that had whispered to her so often. The heroism of Joan of Arc was wonderful. We do not know what form her great patriotism took or how far it really led her. She spoke of hearing voices and seeing visions.
We only know that she resolved to save her country, knowing though she did so, it would cost her her life. Yet she never hesitated. She was uneducated save for the lessons taught her by nature. Yet she led armies and crowned the dauphin, king of France. She was only a girl, yet she could silence a great bishop by words that came from her heart and from her faith. She was only a woman, yet she could die as bravely as any martyr who had gone before.
This scale is reproduced by the courtesy of Dr. M. B. Hillegas.
The scale was derived in the following manner. The first step taken was the collection from various sources of about 7000 English compositions ranging from the very poorest to the best work done in the elementary and high schools. After these compositions had each been given a number from 1 to 7000, they were roughly graded by Hillegas and an assistant into ten classes, and from these ten classes 75 samples were selected. In order to have samples at both extremes of the scale, some artificial ones were supplied. Those placed at the zero end of the scale were conscious efforts by adults to write very poor English, while those placed at the one hundred end were obtained from youthful writings of certain literary geniuses and from the work of some college freshmen. As augmented, the set consisted of 83 samples varying from the poorest to the best by small degrees of quality.
So that the character of the handwriting might not influence the judges, all the samples were typewritten and mimeographed. Separate sets of these samples were then sent to about 100 individuals, who were asked to arrange the samples in the order of their merit as specimens of English composition, calling the poorest specimen No. 1, the next No. 2, and so on. Owing to the small number of judgments it was not possible to establish the position of any one sample with reasonable accuracy, but those samples that were of about equal merit were indicated. This resulted in the selection of a smaller set which still contained all the important steps in quality from the worst to the best. This smaller set, comprising 27 samples, was selected by taking successively each of the samples in the larger group that about 75% of the judges had agreed was better than the last one selected. This percentage of judgments was taken for statistical reasons which will be explained later. Where large differences in merit existed between two successive samples, new samples, judged by a number of individuals as ranging in merit between them, were introduced.
Then, as with the first set of samples, more than 100 sets of these 27 samples were mailed to competent critics of English literature, such as teachers, authors, and literary workers, with the request to rank them in order of literary merit. When 75 replies had been received, the results were tabulated as in the case of the first set. Meanwhile, the judgments of 41 individuals especially competent to judge merit in English composition writing were secured to use as a check on the others. The examination of the results from this second set showed the necessity of adding two more samples to the set. This was done, making 29 samples in all.
After one or the other of the two sets, which had 21 samples in common, had been judged by about 200 individuals, it was decided to make the scale. The first thing necessary was to locate a zero point. This point was to be represented by a sample which possessed absolutely no merit as an English composition. It was chosen on the basis of the judgments of 28 qualified individuals. When these judgments were tabulated, it was found that just one-half of them placed such a point below sample 580 and one-half above it, and so sample 580 was taken as the zero point on the scale.
The ten samples chosen for the scale were selected on the principle of equally often noticed differences, which is as follows: differences that are equally often noticed are equal (unless always or never noticed). Thus, if in a set of samples a, b, c, d, etc., it was found that a was judged better than b just as often as b was judged better than c, and so on, samples a, b, c, d, etc., would constitute a scale of equal steps. To put the case more concretely, if in an essay contest essay A was judged better than essay B in 75% of the judgments, essay B was judged better than essay C in the same percentage of judgments, and so on, it is readily seen that the differences in quality between essays A, B, C, etc., are equal, because the same number of individuals noticed each difference. Similarly, as a result of all the comparisons made of the sample compositions, the result was approximately as follows: Sample 18 was judged better than sample in 75% of the judgments. Sample 26 was judged better than sample 18 in 75% of the judgments. Sample 37 was judged better than sample 26 in 75% of the judgments, and so on for samples 47, 58 and 93. Thus in samples 18, 26, 37, etc., we have the successive steps of a scale, steps that are equal inasmuch as they represent differences that are equally often noticed.
Why the opinion of 75% of the judges was taken as the unit of value, instead of some other per cent, may be better understood if the following case is considered. If, in comparing the ability of two statesmen, say Gladstone and Bismarck, 50% of the judges claim Gladstone to have possessed the greater ability, while 50% claim the same for Bismarck, it may safely be assumed that they possessed about equal ability. If, however, 60% of the judges believe Gladstone to have been the more efficient, the chances are that Gladstone was slightly more capable than Bismarck.
As the percentage of judgments favoring Gladstone increases, the chances become greater that Gladstone had the superior ability, and when 100% of the judges believe him to have surpassed Bismarck it may safely be assumed that such was actually the case. Similarly, in the present case, if 75% of the judges say that a given sample is better than another given sample, we may be reasonably sure that such is the case.
The value of any English composition may be obtained by placing it alongside the samples in the scale and deciding which it is most nearly like in quality. By having other judges measure it, each in ignorance of the judgments of the others, or, if this is not practicable, by rating the composition two or three times, a very accurate measure of it may be secured. For example, if the composition seems to be very similar in quality to sample 77, then it is marked 77. If it seems to lie between samples 77 and 83, it should be given a value between 77 and 83, such as 79 or 81, according to which sample the specimen more nearly resembles.
II. HARVARD-NEWTON SCALES
An experiment with the Hillegas Scale in the public schools of Newton, Massachusetts, led the school authorities of that city to believe that it possessed several inherent defects. They maintained that since the scale provides one, and only one, type of composition for each of its grades, the type of one grade differing entirely from that of the next (that is, grade A in the scale is represented by one type of composition, grade B by another, and so on), it was difficult or impossible to compare work in one type of composition, narration for example, with work in another type, like description. Moreover, they claimed that the sample compositions were not typical of efficient school work.
An attempt to remedy these defects resulted in the Harvard-Newton series of scales, the general nature of which will be described before the construction of the scale is discussed in detail. This objective measure is principally the outcome of the cooperation of Ballou and the teachers of the Boston and Newton public school systems. It consists of four separate scales to measure the four different forms of composition in the eighth grade; namely, description, narration, argumentation, and exposition.
Each scale in the series is composed of six compositions actually written by eighth grade pupils; thus each scale possesses the same qualities that it is designed to measure. These sample compositions range by approximately equal steps from the best to the poorest work which is likely to be done in the eighth grade, and each of them has been assigned a letter and a percentage valuation in conformity with the current practice in grading. "A" represents the conventional value of 95%; "B" that of 85%; "C" of 75%; and so on. In this way sample "A" is fairly representative of all compositions whose value would seem to lie somewhere between 90% and 100%; sample "B", of all those whose value would seem to lie between 80% and 90%; and so on. Each sample composition is accompanied by a short description of its merits and defects, and it is compared with the next higher and next lower compositions in the scale. These descriptions and comparisons were written by the teachers who helped to make the scale and expected to use it. Without some such guiding material, it is doubtful whether those who use the scale would see the same merits and defects in a composition as those who made the scale, and unless they did, little advantage would be derived from its use. The general nature of the four scales may readily be seen from one of them, the description scale, which follows.
THE COMPLETED DESCRIPTION SCALE
No. 1. "A" GRADE COMPOSITION. VALUE, 94.6%
A Storm in a Fishing Village
It was a cold damp day in November. The sky was a heavy leaden color. In the east a black line stretched across it foretelling the coming of a storm. The houses across the way were dismal shadows, — flat, cold, heart-[5]less. A piercing chill penetrated to the bone. The rattle of a grocer's cart or the clatter of a horse's hoofs, seemed cold. The pedestrians were all clothed in black, or else the feeble light made them seem so, and they were cold — everything was cold, cold, cold. An awful lonliness [10] pervaded all.
The black line in the east had grown into a cloud and was coming nearer, nearer, over the sea. Suddenly a gust of wind shook the very foundations of the houses, — another, and then a continuous blowing. The howling was [15] horrible. Great sheets of foam were blown into the streets, — here and there a piece of wreckage hurled itself against a cottage. Fishermen's wives hurried down the narrow streets to the shore, straining their eyes for any sign of a wreck. Old seamen looked at the roaring sea [20] and shook their heads.
By this time the black cloud had engulfed the sky. The day was like night, although it was not yet noon. Boys ran about with torches which were immediately extinguished, and the roaring called to mind the last day at [25] Pompeii.
Rain had begun to descend. At first only drops fell on the hardened faces of old mariners, and on the pale countenances of wives, mingling with the drops already there. But soon great sheets fell, forcing the people in-[30]doors, to the poor shelter afforded by the groaning houses.
For about an hour the storm continued thus, then by degrees the wind lessened, though the rain still fell, and the ocean thundered. But soon the rain also slowly stopped and the roaring ceased. The black cloud rolled slowly away, leaving the tardy sun to shine on the drenched [35] town and the great piles of wreckage on the shore.
Merits

This theme ranks high because the writer has a clear picture of the scene and has used words and phrases that bring the details of this picture clearly before the reader. There are good color images in such expressions as leaden, a black line, great sheets of foam, the day was like night, and the sun shining on the drenched town. Sound effects are strikingly brought out by such phrases as the rattle of a grocer's cart, the howling, the wreckage hurled against the cottage, the roaring sea, and the thundering ocean. The sensation of dreariness and chill is conveyed by the repetition of the word cold. The confusion caused by the storm is reflected in the anxious look of the wives of the fishermen. A further human touch is added in the mention of such details as the extinguished torches carried by the boys and the drops of rain falling upon the hardened faces of the old mariners. All these enumerations fittingly combine to produce a tone of coldness, desolation, and anxiety. The details are told in their natural sequence. This chronological arrangement has helped the writer to keep safely to his main point and effectively connect the details with each other.

Defects

The repetition of the word cold, while effective in bringing out the sensation, is somewhat artificial. Loneliness (line 9) is misspelled; a semicolon should supplant the comma in line 8. Omit the comma in line 6.

Comparison

The theme is superior to No. 2 in its richness of imagery, its wealth of details, its depth of feeling, its maturity of style (seen in the sentence-structure and the vocabulary), and in its mastery of mechanical forms.

No. 2. "B" GRADE COMPOSITION. VALUE, 83.5%

Grandmother

In front of the open fireplace in a large armchair there sits our old Granny. She is old and feeble. Her hair is snow-white and over her head a little white cap is carefully tied.
Her face is full of wrinkles and her keen blue 5 eyes sparkle through a pair of glasses which she has on her nose. She has a shawl thrown over her shoulders and she also wears a thick black skirt. On her feet can be seen a pair of soft slippers which she prizes very much because they 10 were given her for a Christmas present. As you know Grannies always like to be busy our Granny is busy knitting gloves. Her hands go to and fro. She will keep on working until her knitting is done. Now that it is done she carefully folds her work and packs it 15 into her workbasket. Then she trots upstairs to bed and oh, how lonesome it is when our dear Granny is gone from the room.

Merits

The merits of this composition are: (1) the clear and pleasing impression obtained; (2) the happy choice of details and the logical sequence of their arrangement; (3) the sympathetic treatment of the subject — for example, bits of sentiment seen in the grandmother's attachment to the slippers, and the loneliness felt when she goes to her room; (4) the interesting introductory sentence; and (5) the mechanical accuracy.

Defects

The defects are: (1) the rather monotonous sentence structure, and (2) the childish vocabulary.

Comparison

To justify its place in the scale, note that in No. 1 (1) a much more difficult subject is successfully treated; (2) there is greater power of imagination; and (3) there is greater variety of sentence structure and a richer vocabulary.

No. 3. "C" GRADE COMPOSITION. VALUE, 76.1%

A Mansion

As you look across the road you will first see a long private avenue or walk. It is in the summer, and on each side of this long walk are some beautiful, stately elms. They are hundreds of years old and they have done their duty for as many, 5 years, shading the walk from the noon sun. Cross the road and you will see if you look up the avenue, a beautiful mansion. It is a colonial house and four large pillars are upholding the roof.
A piazza runs along three sides of the house. 10 Near the house is a tennis court where for years the occupants of the mansion have passed many an hour. Let us enter the mansion. It is a beautiful cool place, although dark. As we enter we see large psalms on each side of the entrance. On the floors are old oriental rugs 15 which have been handed down for generations. In the parlor is a harp, and on the walls are the portraits of the ancestors. In all, it is a beautiful place.

Merits

The writer of this theme has presented a clear though conventional picture. Although he changes his point of view several times, he has attempted to put his readers into the best positions to see the mansion. The choice of words is fair. Such details as the stately elms, the oriental rugs, the harp, and the portraits are well selected. Only one mistake in spelling occurs (line 14).

Defects

There are, however, too many paragraphs for such a short theme. Constant repetition of the pronoun you, and of the words beautiful and mansion, gives an impression of monotony and of limited vocabulary. The pupil has evidently a definite place in mind, but has not suggested the spirit of the scene, as has the writer of No. 2.

Comparison

The composition deserves its place in the scale above No. 4 because of better sentence structure and more orderly arrangement. It is inferior to No. 2 on account of its somewhat prosaic tone and its constantly changing point of view.

No. 4. "D" GRADE COMPOSITION. VALUE, 66.6%

The Lake at Sunrise

In the Mountains of Pennsylvania there is a lake. On one side of the lake is a boat landing, at which a dozen or more boats are tied up. On this boat landing one may stand and look up the lake, at sunrise, and see 5 the sun peering up over the top of the mountains and shinning on the water. Then a King Fisher flies down the lake making his cheerful noise, instantly, all the other birds begin to chirp as if their life depended on it.
Looking across the lake one would see numerous wells 10 and coves backed up by woods from which comes the chirp of the birds. Hearing the explosions of cylinders we look to see where in comes from and find a pumphouse that keeps the lake supplied with water. Looking down the lake over the dam to the ice house 15 with the roof sparkling with. On the roof of the house a hawk is sitting adding his clear whistle to noise of other birds. Looking around to the woods, at our back, with an old oil well in front of them. The birds flying from the woods 20 in flocks, and far away from the hills comes the sound of the of Italians singing.

Merits

The writer has seen and heard concrete details and has re-created his images clearly. He has tried, too, to make his point of view obvious to the reader. His vocabulary is adequate.

Defects

As a description the composition fails because there is no unified picture of the lake. The selected details, clear in themselves, tend to distract rather than center the interest. There are numerous mechanical errors: there should be no commas after lake or sunrise (line 4); shining (line 6) is misspelled; there should be a period after noise (line 7), and no comma after instantly (line 7), which should commence with a capital; in (line 12) is not correct; the groups of words in lines 14, 15, and lines 17, 18 do not make sentences; the word the is omitted before noise (line 16) and the word are before flying (line 18).

Comparison

The theme merits its rank in the scale by superiority in spelling, paragraphing, and maturity of thought. It does not, on the other hand, show equal mastery in the fine details, the discriminating vocabulary, and in the ability to stick to the point. The sentence-sense is faulty.

No. 5. "E" GRADE COMPOSITION. VALUE, 55.4%

A Light House

A description of a light house is quite interesting. First a light house is generally situated on a mass of rocks in the ocean or on some great lake.
And then to get into a light house is a question. Some times you have to climb to the top on a steal ladder, and again you only 5 have to go half way up and you find sort of a steal porch, which is very strong with a door in the side of the light house. On the very top of the light there is generally two or three life boats in case of accidents. In side there is an enormous light which flashes every two minutes and 10 sometimes more often it depends holy on the weather. The man himself has very favorable sleeping quarter and food it is a very lonely life except when you have a man with you. Sometimes they play cards all day long until it is time to fix the lights and then they are very busy. 15

Merits

The merits of this theme are: (1) the evident spirit of faithful accuracy; and (2) a successful use of certain simple words, such as mass of rocks, enormous light, and lonely life.

Defects

Many obvious defects warrant its low position in the scale. The pupil was asked to write a description. After announcing his purpose to do this, he writes an exposition, or explanation of lighthouses in general. The first sentence of the theme is worthless, contributing nothing toward the development of the subject. It should be omitted. The paragraph is full of misspelled words and grammatical slips: steal, in side, holy, some times, sleeping quarter. The most striking weakness of the work is the loose and rambling form of the sentences, indicating indefinite thought. "Run-on" sentences are found in lines 9-13. No attempt has been made to establish a point of view. On this account, and because of a lack of vivid words, the passage is dead and colorless.

Comparison

The composition is placed above No. 6 because it contains fewer mechanical errors.

No. 6. "F" GRADE COMPOSITION. VALUE, 44.9%

A Scene on the Prairies

Along a large plain in the west with mountains on all sides. The sun was just sinking behind the mountains.
Some trappers were on the plain just about to get their supper. They had one tend because there was just three 5 of them. Beside their tent tripled a little spring. After the three trappers had eating there supper they sat down by the fire because it had growing dark. All of a sudden a bunch of Indain's came riding up. When they came near they fired of their guns and disappered in the dark- 10 ness and the trappers turned into camp leaving one a the trappers on gaurd.

Merits

The commendable features of this composition are directness, simplicity, and a logical arrangement of details. The writer passes from the general to the specific in a natural manner. In spite of a change in the point of view in the last two sentences, the paragraph, as a whole, makes a clear picture.

Defects

Blunders in grammar and in spelling, lack of sentence-sense, and short, childish sentences make the rating of the composition necessarily very low. Such errors as tend for tent, tripled for trickled, eating for eaten, growing for grown, and the misspelling of Indians indicate either hasty, careless work, or slovenly habits of enunciation.

Comparison

Compared with the descriptions of the storm and of grandmother, the short sentences here show immaturity and weakness rather than skill or force. With a large amount of correcting of mechanical details, but with very little revising as a whole, this composition would be superior to No. 5.

The scales and tables in this section are reproduced by the courtesy of Dr. F. W. Ballou.

EFFECT OF USING THE SCALE

An initial experiment in the use of the description scale was made in Arlington and Boston. Eighth grade teachers and elementary school principals in these two cities graded a set of twenty-five eighth grade compositions secured for this purpose, both without the use of the scale and with it.
With the use of the scale the results showed a reduction in the extreme variation of judgments; that is, no two teachers were quite so widely divergent as before. The average variation was also less. But in this matter neither the average nor the extreme variation is the most important consideration. Far more important is the effect which the use of the scale has on the grading of each individual teacher. To ascertain this is obviously a complicated matter, and it requires more time than has been thus far at our disposal. This phase of the problem will be the subject of further investigation.

The compositions used in the scale were selected from a large number written by the eighth grade pupils of Newton as a part of their regular school work. Each pupil was given his choice among several topics of description, narration, exposition, and argumentation, suggested by himself or the teacher, and was required to write a composition of about a page in length. Time for preparation and correction was allowed. Thus, these compositions represented the best unaided writing of the individual children in the eighth grade of that particular city. Then a selection from all these compositions was made by the individual eighth grade teachers. This selection included at least 25% of all the compositions written in a particular class and was made with the view of securing compositions representing all degrees of ability in that class. The compositions were then numerically graded by the eighth grade teacher and the principal, independently. To be sure of securing compositions deserving the highest grade of merit, namely, "A" or 95%, each school, in addition, sent in from one to three of its "best" compositions in all four types of writing, as judged by the teacher and principal.
Twenty-five samples of each one of the four types of composition — description, narration, exposition, and argumentation — seemed a sufficient number from which to select the six compositions to be used in the final construction of each one of the four scales. Twenty-five samples, then, of each type were selected on the basis of the preliminary grading given the compositions by the teachers and principals and on the judgment of Ballou, director of the experiment. To eliminate any possible influence of handwriting these samples were typewritten and mimeographed. Then one set, consisting of 25 samples of each of the four types of composition, was sent to each of the eighth grade teachers and principals, 25 in all, with instructions (1) to grade each of the compositions independently and (2) to rank each in the order of its merit.

Because of the probability that 95% rather than 100% would represent the highest degree of efficiency in composition writing in the eighth grade, and because it was desirable that each reader should start from the same point in marking the compositions, the teachers were asked to give 95% to the best compositions. Although no lower limit was fixed, 40% was intended to be that limit, for compositions worth less than that were not to be furnished by the schools for the experiment.

As already stated, each composition was graded by 25 teachers, and, when the marks came in, five things were noted with regard to each of them:

(1) Its average mark (found by dividing the sum of all the marks by 25).
(2) Its median mark (found by arranging all the marks given it in order from the highest to the lowest and taking the middle one). (This is easier to find than the average and for many purposes it is better.)
(3) The highest mark given it.
(4) The lowest mark given it.
(5) The difference between these two, which is the maximum variation in the marking of these particular compositions.
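The five figures noted for each composition are simple to reproduce. The following Python sketch computes them for one composition's list of marks; the function name `summarize_marks` and the five sample marks are hypothetical, chosen only to illustrate the arithmetic, and are not data from the experiment. Python's `statistics.median` takes the middle mark when the number of marks is odd (as with 25 readers) and averages the two middle marks when it is even.

```python
from statistics import mean, median


def summarize_marks(marks):
    """Return the five figures noted for one composition.

    `marks` is the list of percentage marks the readers gave it.
    """
    if not marks:
        raise ValueError("at least one mark is required")
    highest = max(marks)
    lowest = min(marks)
    return {
        "average": mean(marks),                 # (1) sum of the marks / number of marks
        "median": median(marks),                # (2) middle mark when the marks are ranked
        "highest": highest,                     # (3) highest mark given
        "lowest": lowest,                       # (4) lowest mark given
        "maximum_variation": highest - lowest,  # (5) difference between (3) and (4)
    }


# Hypothetical marks from five readers, for illustration only.
example = summarize_marks([95, 90, 85, 85, 70])
print(example)
```

With 25 real readers the same function applies unchanged; only the list of marks grows.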
Marks Given to the Twenty-five Compositions

Composition   Highest   Lowest   Maximum     Mean or Average   Median
Number        Grade     Grade    Variation   Grade             Grade¹
 1            95        68       27          91.9              83.0
 2            90        64       26          80.0              80.0
 3            50        30       20          42.7              41.0
 4            94        63       31          84.3              85.5
 5            78        50       28          61.1              60.0
 6            88        50       38          69.4              69.5
 7            80        40       40          63.5              65.0
 8            95        52       43          82.3              85.0
 9            75        40       35          56.1              58.5
10            95        90        5          94.5              95.0
11            65        40       25          49.5              49.5
12            75        42       33          59.9              60.0
13            95        71       24          83.7              85.0
14            76        40       36          55.4              53.5
15            95        80       15          89.6              90.0
16            92        68       24          78.2              78.5
17            93        63       30          81.0              81.5
18            90        60       30          79.9              75.0
19            92        60       32          79.6              80.0
20            92        70       22          82.7              85.0
21            89        54       35          76.1              77.0
22            86        47       39          66.6              66.5
23            74        40       34          55.4              57.5
24            73        30       43          48.9              48.0
25            62        20       42          44.9              45.0

As a check on the results of the gradings, the returns from the rankings were also tabulated and the same items noted as in the case of the grades.

¹ "Median grade" is the grade in the series of grades above which and below which there is an equal number of grades.

After the various items in both grading and ranking had been recorded for each composition, it was necessary, using these data as a basis, to choose the compositions best fitted to have a place in the scale. It is obvious that compositions about which there was most agreement in judgment on the part of the teachers, both as to rank and grade — that is, compositions with low maximum variations — were most desirable; furthermore, since it was the intention of the authors of the scale that the six compositions selected should represent 95%, 85%, 75%, 65%, 55%, and 45%, respectively, in choosing the compositions for the scale they accordingly selected those whose average and median marks came nearest those requirements. In short, in constructing the scale there were no fixed requirements set. The compositions selected were those about which there was the least disagreement as to merit and whose marks approximated those desired in the scale.
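The selection principle just described can be sketched in a few lines. The rule below is a simplified assumption, not the procedure actually followed: it keeps only compositions whose average and median marks both fall within a chosen tolerance (a hypothetical 3 points) of the target value, and among those picks the one with the smallest maximum variation. The figures are those of the table of marks given to the twenty-five compositions; the rankings, the teachers' criticisms, and the director's judgment, which also entered into the real choice, are ignored, so the sketch need not pick the same compositions as the published scale. The names `MARKS_TABLE` and `choose_sample` are hypothetical.

```python
# Rows of the table: (composition number, highest, lowest, average, median).
MARKS_TABLE = [
    (1, 95, 68, 91.9, 83.0), (2, 90, 64, 80.0, 80.0), (3, 50, 30, 42.7, 41.0),
    (4, 94, 63, 84.3, 85.5), (5, 78, 50, 61.1, 60.0), (6, 88, 50, 69.4, 69.5),
    (7, 80, 40, 63.5, 65.0), (8, 95, 52, 82.3, 85.0), (9, 75, 40, 56.1, 58.5),
    (10, 95, 90, 94.5, 95.0), (11, 65, 40, 49.5, 49.5), (12, 75, 42, 59.9, 60.0),
    (13, 95, 71, 83.7, 85.0), (14, 76, 40, 55.4, 53.5), (15, 95, 80, 89.6, 90.0),
    (16, 92, 68, 78.2, 78.5), (17, 93, 63, 81.0, 81.5), (18, 90, 60, 79.9, 75.0),
    (19, 92, 60, 79.6, 80.0), (20, 92, 70, 82.7, 85.0), (21, 89, 54, 76.1, 77.0),
    (22, 86, 47, 66.6, 66.5), (23, 74, 40, 55.4, 57.5), (24, 73, 30, 48.9, 48.0),
    (25, 62, 20, 44.9, 45.0),
]


def choose_sample(rows, target, tolerance=3.0):
    """Pick a composition for one step of the scale (a simplified, assumed rule).

    Keep compositions whose average AND median marks both lie within
    `tolerance` points of `target`; among them return the number of the one
    with the smallest maximum variation (highest minus lowest mark).
    Return None if no composition qualifies.
    """
    candidates = [
        row for row in rows
        if abs(row[3] - target) <= tolerance and abs(row[4] - target) <= tolerance
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda row: row[1] - row[2])
    return best[0]


for target in (95, 85, 75, 65, 55, 45):
    print(target, choose_sample(MARKS_TABLE, target))
```

Widening or narrowing `tolerance` trades closeness to the desired percentage against agreement among the judges, which is the same balance the authors describe striking without fixed requirements.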
After the six compositions had been selected on this basis, the teachers were asked to point out in a brief paragraph the merits and defects of each of the compositions. These paragraphs were carefully studied and compared by a committee who, acting under expert advice, put the various criticisms into the form shown in the scale already presented.

The method of using this scale is very simple. The composition to be measured is compared directly with those in the appropriate scale — description, narration, etc. — and its value determined in terms of the marks assigned to the sample composition which it most nearly approaches in quality. Thus a descriptive composition is placed alongside the compositions in the description scale, a narrative composition alongside the compositions in the narration scale, etc. If the composition to be measured seems to possess the same qualities as a given composition in the scale — say the composition representing grade "B" in the description scale — then it is assigned the same value as that composition, namely grade "B" or 83.5%. If its value seems to lie somewhere between two grades on the scale as represented by two compositions, say "A" (94.6%) and "B" (83.5%), the examiner can determine its value as precisely as he pleases according to its apparent distance below the one and above the other.

In spite of the difficulty of comparing a sample of composition writing of one type with a sample of another type, as is necessary in using the Hillegas Scale, in actual practice the Hillegas Scale has on the whole been used to greater advantage than the Harvard-Newton Scale. This has been due chiefly to the fact that the field in which the former may be used — the elementary grades and high school — is not as limited as that of the latter, which is confined to the eighth grade. However, for eighth grade measurements the Harvard-Newton Scale may obviously be used to better advantage.
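The text leaves the placing of an in-between composition to the examiner's judgment. One simple way to turn that judgment into a number, assuming the examiner can say roughly what fraction of the way the composition lies from the higher sample toward the lower, is straight-line interpolation between the two sample values. The function name `interpolate_value` and the fractional judgment are hypothetical; neither scale prescribes this formula.

```python
def interpolate_value(upper_value, lower_value, fraction_below_upper):
    """Value for a composition judged to lie between two scale samples.

    fraction_below_upper = 0.0 means "just like the higher sample",
    1.0 means "just like the lower sample"; intermediate fractions express
    the examiner's sense of its distance below the one and above the other.
    """
    if not 0.0 <= fraction_below_upper <= 1.0:
        raise ValueError("fraction_below_upper must be between 0 and 1")
    return upper_value - fraction_below_upper * (upper_value - lower_value)


# Harvard-Newton description scale: between "A" (94.6) and "B" (83.5),
# judged about one third of the way down from "A".
print(round(interpolate_value(94.6, 83.5, 1 / 3), 1))

# Hillegas Scale: between samples 83 and 77, judged two thirds of the way down.
print(round(interpolate_value(83, 77, 2 / 3), 1))
```

The second call reproduces the kind of value the text suggests for a Hillegas specimen lying between samples 77 and 83; the arithmetic only makes the examiner's estimate explicit and adds no precision of its own.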
The teacher may obtain the Hillegas Scale by sending to Teachers College, Columbia University, New York. To recapitulate, all that need be done in using it is to slide the composition to be measured along the scale — as in the case of the handwriting scales — beginning with the sample marked 0, until a sample is reached on the scale to which the specimen to be measured most closely corresponds in quality. As has been said, the specimen may be of an entirely different type from the sample it is matched with. The composition to be measured is then given the same value as the one on the scale to which it is most similar in quality. That is, if it appears to be very like the composition marked 77, it is given the value 77. If it seems to be better than composition 77 but not so good as the next composition in the scale, number 83, it is given a value somewhere between 77 and 83, such as 79 or 81.

Teachers of the eighth grade may obtain the Harvard-Newton Scale by sending to The Harvard University Press, Boston, Mass. In using it, a descriptive composition is measured by comparing it with the sample compositions on the description scale, a narrative composition by comparing it with samples in the narration scale, etc.

Whichever scale is used, in obtaining the compositions to be measured, the teacher must see first of all that the same amount of time for writing is allowed to all the pupils and, secondly, that the same subject is given to all to write upon. Even in thus making the conditions under which the compositions are obtained as objectively uniform as possible, it is apparent that certain subjective influences, such as interest, which cannot be eliminated, are bound to affect the result. Furthermore, within the same class there will be the widest difference in the amount of material written.
While it is evident that in disregarding these two factors the scales are not complete as adequate measures of composition writing, still they are of great value; for by their use the composition work of any grade, school, or system of schools in any part of the country may be compared with that of any other, and the results of different methods of instruction or of other conditions ascertained and utilized. Moreover, there is so intimate a relation between the successful use of oral and written language and intelligence that an objective standard which accurately measures ability in the use of language also measures, to a certain extent, the possession of mental ability in general. In the writing of English composition, whatever its type, children are compelled, or should be compelled, above everything else to make themselves clear, and, by the use of a uniform standard of judgment, the growth of reason itself, from grade to grade, may be followed and subnormal or supernormal children detected. Then, too, the difference shown by the same child in the various types of composition may give a fair idea of his individuality. Increased knowledge of the various types of pupils with which the school has to deal will naturally lead to greater variety in teaching and correspondingly better results with the children. Any such educational progress, however, will come not as an expression of mere opinion, but as the result of scientifically determined educational facts obtained by the use of objective standards. The more scientific, yet comprehensible, our methods of investigation are, the more valuable their results will be.

EXERCISES

1. In what way may these scales be utilized to secure a very accurate judgment of the merit of a given composition?
2. What relation seems to exist between ability in composition writing and ability in other subjects in the curriculum? Between ability in composition writing and general intelligence?
3.
Procure twenty compositions from various grades and get five teachers to mark them on a percentage basis. What do the results show regarding the reliability of such measures?
4. How would the ratings given by five teachers to twenty compositions of varying merit test the reliability of the Hillegas Scale?
5. Suppose the Composition Scale revealed a great difference in the same child in the various types of composition writing: of what value would this be to the teacher?
6. Obtain forty specimens of English composition from the various grades. Grade these on the Hillegas Scale. Allow one month to elapse and grade again. What do the results show?
7. In what type of composition writing do you think a child should be most proficient?
8. Suppose a teacher discovered by the use of the scales that the pupils on the whole showed far greater efficiency in one type of composition writing than in another: what should be the conclusion?
9. How would you modify the standard for composition writing for your particular grade? Why?
10. What modifications would you make in it if your pupils came from a foreign neighborhood?

CHAPTER VII

COMPLETION TEST LANGUAGE SCALES — TRABUE

Suppose we consider an incomplete sentence such as the following: "The . . . rises . . . the morning and . . . at night," where three words are omitted, the place of each word being filled by a dotted line. It is a simple matter for anyone who is acquainted with the English language to insert a word in each of these three blank spaces which will cause the sentence to make sense. In the above example, these words are "sun," "in," and "sets," making the sentence read: "The sun rises in the morning and sets at night."
The completion of sentences of this kind, while not actually testing ability in English composition, demands an ability very closely related to what is usually called "language ability"; at any rate, it involves a power to read and think about printed words which has great educational significance.

From the nature of this test it is obvious that we may have sentences for completion of all degrees of difficulty. While a sentence such as "The sky . . . blue" requires next to no ability in the English language, a sentence such as the following: "To . . . friends is always . . . the . . . it takes," is of sufficient difficulty to test the ability of a college student. If, therefore, we could select a series of incomplete sentences increasing in difficulty from the first to the last, with this as a scale we should be in a position to measure the language ability of any individual or group. This could be accomplished by allowing a certain specified time in which to complete as many of the sentences as possible. To construct such a scale for the measurement of language ability of this type was the object of the study made by Trabue.

A large number of incomplete sentences were constructed. After a preliminary trial fifty-six of these sentences were selected and their relative difficulty determined by administering them, under standard conditions, to several thousand children and young people in various school systems. The detailed scheme by which each sentence was marked will be described later, but the general method was to give a score of 2 for a perfect completion, a score of 1 for an almost but not quite perfect completion, and a score of 0 for a failure to attempt or for an imperfect completion. By determining the different scores made on the sentences in the various grades, it was possible to calculate the relative difficulty of each of these sentences.
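The difficulty calculation is not given here in detail; the text says only that it resembled the method used for the Buckingham Spelling Scale. A sketch of one method of that general kind, assuming that ability in the group tested is normally distributed, converts the percentage of pupils who completed a sentence acceptably into a distance from the group's median ability, measured in probable errors (PE, about 0.6745 standard deviations). This is an illustration under that assumption, not Trabue's actual computation; the names `difficulty_in_pe` and `PE_IN_SD` are hypothetical. It does share the property stated in the text that sentences completed by equal percentages receive equal difficulty.

```python
from statistics import NormalDist

PE_IN_SD = 0.6745  # one probable error is about 0.6745 standard deviations


def difficulty_in_pe(percent_correct):
    """Difficulty of a sentence relative to the group's median ability, in PE units.

    Assumes ability is normally distributed in the group tested. A sentence
    completed by exactly 50% of the group sits at the median (0.0); sentences
    completed by fewer get positive (harder) values, by more get negative values.
    """
    p = percent_correct / 100
    if not 0 < p < 1:
        raise ValueError("percent_correct must be strictly between 0 and 100")
    # The ability needed to complete the sentence is taken to be the point on
    # the normal curve that only the fraction `p` of the group exceeds.
    return NormalDist().inv_cdf(1 - p) / PE_IN_SD


for percent in (90, 75, 50, 25, 10):
    print(percent, round(difficulty_in_pe(percent), 2))
```

Under this assumption a sentence completed by 25% of a group lies about one PE above one completed by 50%, and one completed by 75% about one PE below it.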
Thus, two sentences were considered of equal difficulty when they were completed by the same percentage of individuals tested. The greater the difference of percentage attained in completing two sentences, the greater was the difference in the difficulties of the sentences. It is impossible to enter into the details of these calculations here, but the method employed was essentially the same as that described in the construction of the Buckingham Spelling Scale.

Knowing the difficulty of these original sentences, Trabue constructed eight short scales. The following are some of the reasons for the use of several short scales: (1) a short scale takes less time to administer and score; (2) a measure of ability is more reliable when taken on two separate occasions than when taken at one time; (3) a number of scales of equal difficulty admit of a class being tested from time to time, the use of different scales being necessary to eliminate the factor of memory. Two scales, called by the author B and C, are here shown; in the study six similar scales are also given.

Language Scale B

Write only one word on each blank. Time limit, seven minutes.

Name

1. We like good boys girls.
6. The is barking at the cat.
8. The stars and the will shine tonight.
22. Time often more valuable money.
23. The poor baby as if it were sick.
31. She if she will.
35. Brothers and sisters always to help other and should quarrel.
38. weather usually a good effect one's spirits.
48. It is very annoying to tooth-ache, often comes at the most time imaginable.
54. To friends is always the it takes.

Language Scale C

Write only one word on each blank. Time limit, seven minutes.

Name

2. The sky blue.
5. Men older than boys.
12. Good boys kind their sisters.
19. The girl fell and her head.
24. The rises the morning and at night.
30. The boy who hard do well.
37. Men more to do heavy work women.
44.
The sun is so that one can not directly causing great discomfort to the eyes.
53. The knowledge of use fire is of important things known by but unknown animals.
56. One ought to great care to the right of , for one who bad habits it to get away from them.

The scales in this section are reproduced by the courtesy of Dr. M. R. Trabue.

Each of these scales consists of ten steps or sentences, the intervals between the various sentences being approximately equal; that is, sentence 6 is as much more difficult than sentence 1 as sentence 8 is more difficult than sentence 6, and so on. It should, however, be noted that Scale C is, on the whole, a little harder than Scale B: sentence 2 in Scale C is a little more difficult than sentence 1 in Scale B, and sentence 5 in Scale C is a little more difficult than sentence 6 in Scale B. The same is true throughout the series.

Directions for Administering the Test

The scales which have been described may be purchased in any quantity from the Bureau of Publications, Teachers College, New York. It should be noted that these standard blanks must be used if the results obtained are to be used for comparative purposes. When the test is given to a third or lower grade, it is necessary to give a little preliminary training, using a practice sheet, which can be secured with the regular tests. In the fourth grade and above, the following oral explanation should be made before distributing any papers:

This sheet contains some incomplete sentences, which form a scale. This scale is to measure how carefully and rapidly you can think, and especially how good you are in your language work. You are to write one word on each blank, in each case selecting the word which makes the most sensible statement. You may have just seven minutes in which to sign your name at the top of the page and write the words that are missing. The papers will be passed to you with the face downward.
Do not turn them over until we are all ready. After the signal is given to start, remember that you are to write just one word on each blank and that your score depends on the number of perfect sentences you have at the end of seven minutes.

If there are no questions, the papers may then be distributed, taking care that no child looks at the printed side until there is a paper upon the desk of each child and the following additional instructions have been given:

After you have been working seven minutes, I shall say, "The time is up. All stop writing!" You will all please stop at once and lay aside your pens (or pencils). Now if you are all ready, you may turn your papers, sign your names and fill the blanks.

Take note of the exact time at which the signal to start was given, allow exactly seven minutes, and give the command to stop writing. Collect all papers at once. It is very important that exactly seven minutes be allowed. A stop watch is the most satisfactory means of keeping the time on a test of this sort.

Grade each paper according to the general scheme described below, and make a record of the total number of points made by each pupil, in order that the amount of progress of each individual may be determined when this scale is used a second time, or when another scale is employed. Then arrange the scores in ascending order and find the median score; namely, the point above and below which there are an equal number of scores. This median value may then be compared with the medians obtained by other classes.

General Scheme of Scoring

The following general scheme has been the basis upon which the more detailed judgments have been made:

Score 2. A score of 2 points is to be given each sentence completed perfectly. Errors in spelling, capitalization, and punctuation should not be allowed to affect the score.
Score 1. A score of 1 is to be given each sentence completed with only a slight imperfection. A poorly chosen word or a common grammatical error, which makes the sentence less than perfect and yet leaves it with reasonably good sense, should serve to reduce the score from 2 to 1.

Score 0. A score of 0 is to be given if the sentence as completed has its sense or construction badly distorted. A sentence must have reasonably good meaning and express a sentiment which might honestly be held by an intelligent person in order to receive a higher credit than zero.

It is apparent that the above method of scoring leaves more than is desirable to the judgment of the person who is rating the sentence. This subjective element in the marking is much reduced, however, by a careful consideration of the examples given by the author of what, in his opinion, constitutes a sentence worth the score 2, 1, and 0, respectively. For illustration, take the sentence:

30. The boy who ___ hard ___ do well.

Score 2:  works, tries, studies, thinks .... will
Score 1:  tries .... can, may, does, shall, should, could, must
          worked, tried .... did, will, can
          plays, hits, work .... will
Score 0:  tries .... sometimes, surely, often
          did .... work
          did, does .... work
          work .... did

In each row, the words before the dots fill the first blank and those after the dots fill the second.

All the other sentences are treated similarly. The reader is referred to the original study for these completions, as they are too bulky to warrant introduction here. It will be noticed that the score is given for the whole sentence; that is, in those cases where more than one blank appears, the mark is given not for each single completion but for the whole sentence.

To summarize: all that is necessary to test a class in the type of language ability measured by these scales is to procure the standard blanks from the publishers, to follow carefully the directions for the administration of the test, and to score the tests according to the scheme outlined.
Then determine the score below which and above which there are an equal number of pupils' records, and compare this median value with previous records, if such have been obtained; if not, these first results will establish tentative standards.

EXERCISES

1. How could you rank five completion tests of your own construction according to their difficulty? What is the test of a suitable sentence for a particular group?
2. What is the advantage of having the difficulty of the sentences in the scale rise by equal increments? What would happen if three of the sentences were of the same difficulty?
3. How would you use completion tests for determining whether the pupils had read a certain assignment in history or geography?
4. To what extent do these completion tests measure a valuable language ability? How does this type of ability compare with ability in English composition in your class?
5. How could the idea of the completion sentence be employed to measure ability in a foreign language?
6. Can you reasonably expect the same standard of work in this test from schools in a foreign district and in an English-speaking district? How could a school of the first type establish its own standards?
7. How would you determine the standard of your own class in this test, as compared with other classes of the same grade?
8. State how you would compare the standing of your own school with that of another school. What conditions would have to be fulfilled to make this comparison justifiable?
9. Are completion sentences merely a test, or could they be used with advantage as an exercise to stimulate thought in language lessons?
10. Suppose a grade fell notably below its average of the last few years: what steps would you take to meet this decline?

CHAPTER VIII
DRAWING SCALE

THORNDIKE DRAWING SCALE

The measurement of improvement and efficiency in a subject such as drawing is beset with great difficulties.
It is reasonable to suppose that in art the judgment of excellence depends on the individual teacher to a greater extent than in most of the other subjects in the school course. In spite of this supposition, Thorndike in 1913 presented a scale which, though merely tentative, yet limits to a great degree the possible differences of individual opinion in estimating drawing ability. Its method of derivation is very similar to that employed in the scale for the measurement of English composition. From 45 carefully selected drawings in Kerschensteiner's "Die Entwickelung der zeichnerischen Begabung," a more limited selection of 14 drawings was made. These, together with a drawing from another source, constituted the 15 samples. These samples were then submitted to artists, teachers, and students of education and psychology, with the request that they be ranked in order of merit: that is, that No. 1 be assigned to the drawing which, in the opinion of the judges, is the best; No. 2 to the drawing that is the next best, and so on; No. 15 being assigned to the very worst drawing. It was stated quite clearly that no allowance should be made for the apparent age or training of those who had made the drawings, but that the drawings should all be judged by the standard of their intrinsic merit. In all, 376 ratings or rankings of the 15 drawings were obtained, 60 of which were from

Figure: A Scale for the Merit of Drawings by Pupils 8 to 15 Years Old. The numbers give the merit of the drawing as judged by 400 artists, teachers of drawing and men expert in education in general.
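The text says only that the drawing scale was derived much as the composition scale was, and gives no formulas. As a rough illustration, here is a minimal sketch, assuming the approach commonly used for judgment scales of this period: for each pair of drawings, find the proportion of judges who ranked one above the other, and convert that proportion into a distance on the normal curve, expressed in units of the probable error (P.E., about 0.6745 of a standard deviation). The function names, the clipping of unanimous verdicts, and the sample rankings are hypothetical, not Thorndike's own data or procedure.

```python
from itertools import combinations
from statistics import NormalDist

UNIT_NORMAL = NormalDist()          # mean 0, standard deviation 1
PE = UNIT_NORMAL.inv_cdf(0.75)      # probable error, about 0.6745 sigma


def proportion_better(rankings, a, b):
    """Fraction of judges who placed drawing a above drawing b.

    Each ranking maps a drawing label to its rank (1 = best)."""
    better = sum(1 for ranks in rankings if ranks[a] < ranks[b])
    return better / len(rankings)


def pairwise_distances(rankings, clip=0.99):
    """Distance, in P.E. units, by which one drawing exceeds another.

    A positive value for (a, b) means a was judged better than b.
    Proportions are clipped away from 0 and 1, because a unanimous
    verdict would correspond to an infinite normal deviate."""
    labels = sorted(rankings[0])
    distances = {}
    for a, b in combinations(labels, 2):
        p = proportion_better(rankings, a, b)
        p = min(max(p, 1 - clip), clip)
        distances[(a, b)] = UNIT_NORMAL.inv_cdf(p) / PE
    return distances


# Hypothetical rankings of three drawings by four judges.
judges = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 1, "B": 3, "C": 2},
    {"A": 2, "B": 1, "C": 3},
    {"A": 1, "B": 2, "C": 3},
]

for pair, d in pairwise_distances(judges).items():
    print(pair, round(d, 2))
```

When three judges in four prefer one drawing, the distance comes out at exactly 1 P.E., which is why the probable error was a convenient unit; with the clipping shown, a unanimous verdict counts as roughly 3.4 P.E. Chaining such distances from the poorest drawing upward is one way scale positions could then be assigned.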
In this way it will be a simple matter to determine the exact point at which the pupil failed to advance at the normal rate in any particular line of study. The teacher will be able to see whether failure was confined to a particular subject or whether it also took place in other subjects in the curriculum. If it is found that the child has failed in but one subject and not in the others, then we must assume either that the child was abnormal in that subject or that the method of instruction in that particular branch was not up to the usual standard. If, in addition to this, it is found that the majority of pupils under a particular teacher have failed to advance in this subject alone and not in others, then there is every reason to suppose that it was not the fault of the class but rather the fault of the teacher, in failing to give attention to the subject or in using some method which could not produce the average rate of progress.

Again, in the case of a particular pupil, it may be found that the failure to progress was not confined to one subject, but that it extended to all subjects. In this case further inquiries must be made. It may have been a matter of arrested mental development, or it may have been due to physical causes or to social conditions which did not admit of the child's spending sufficient time in school.

Such a chart as we have shown can easily be passed on from school to school as the child goes from one neighborhood to another, or it can be passed from school system to school system or from country to country, for the very essence of these universal scales is that they are independent of place and time. School systems, under these conditions, will keep track of every pupil from the time he enters to the time he leaves.
In other words, the administrator will cease to deal with mere groups of children and will deal with the individual child.

Application of the Objective Scales to the Question of Promotion. Such methods of measurement will bring to the question of promotion a definiteness which is sorely needed. It is too well known that in many school systems a high percentage of promotions does not mean a high standard of work, but rather a lowering of that standard to enable the requisite number of children to pass. As a result, pupils are often found in the higher grades who are totally unable to profit by the relatively advanced instruction given. As long as the present loose methods of measuring school achievement are in vogue, such a state of affairs is inevitable. Under the new system a radical change will be possible, for, with certain exceptions, the presence of a child in a particular grade must mean that he has passed certain points on the scales which measure the various school abilities. If these points have not been reached, the pupil will not be promoted, for he will be unable to profit by the instruction given. A teacher will be able to measure the abilities of pupils when they are received in September; if promotion has taken place in spite of bad previous records, he will at least know of this and, by pointing to their records, will be able to free himself from criticism on account of their ultimate failure. The position of the efficient and conscientious teacher will be established, not on the insecure basis of the opinion of an often prejudiced supervisor, but on the basis of the actual work of the pupils judged by impartial standards.

Application of Objective Scales to Vocational Guidance. Such a chart of improvement will be of great service when the pupil on leaving school requires vocational guidance.
The employer will state the requirements of his work in the different school subjects, while the vocational guidance expert, by consulting the chart, can determine the extent to which the pupil measures up to these requirements.

The Objective Scale as Limiting the Amount of Improvement Necessary. Again, it is true that in many subjects only a certain degree of efficiency is demanded by the world. For example, there is no object in being able to write better than is required for reasonable grace and legibility. The handwriting of some children shows a wasted youth! If time is spent beyond a certain point, it is relatively wasted. Yet what guarantee have we that when children reach this point they will no longer be given writing lessons? Under the present subjective system of measurement such a guarantee is impossible and, if given, is meaningless. When the objective scale is used for measuring handwriting, the matter is perfectly simple, for the child knows that when he reaches a certain point on the scale, provided he keeps up to that point, all formal writing lessons will cease.

Application of the Objective Scales in Rural Schools. These scales will find ready application in the rural schools, where the teacher is unable to form correct estimates of the work because small classes do not afford a basis for judgment. With the new methods which these scales introduce, the isolated child in the rural school can be compared with, and in a sense can compete with, children of like age in the city system. In fact, at present one of the authors is comparing, by means of these universal standards, the work of 100 rural school children of a given age with a random sample of 100 city school children of the same age. In a sense the results of this experiment will be as definite as measurements made of the pupils' height and weight by means of the foot-rule and the weighing machine.
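The text does not say how the rural and city results are to be summarized. A minimal sketch, assuming each child has a single score on one of the scales: compare the two groups by their medians (the statistic this book uses for class comparisons) and place an individual rural child within the city distribution by percentile rank. The function names and all scores below are hypothetical.

```python
from statistics import median


def percentile_rank(score, reference):
    """Percentage of reference scores below `score`, counting ties as half."""
    below = sum(1 for s in reference if s < score)
    ties = sum(1 for s in reference if s == score)
    return 100 * (below + 0.5 * ties) / len(reference)


def compare_medians(rural, city):
    """Median scale score of each group and the rural-minus-city difference."""
    m_rural, m_city = median(rural), median(city)
    return {"rural": m_rural, "city": m_city, "difference": m_rural - m_city}


# Hypothetical scale scores for children of the same age.
rural_scores = [38, 41, 45, 47, 52]
city_scores = [40, 44, 46, 50, 55, 58]

print(compare_medians(rural_scores, city_scores))
# Where does one rural child with a score of 47 stand among the city children?
print(percentile_rank(47, city_scores))
```

The median comparison speaks to the rural group as a whole, while the percentile rank answers the question the paragraph raises for the isolated child: how he stands among city children of his age.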
The Scales as Revealing the Success and Failure of School Methods. The purpose of these scales, and indeed of the whole subject of educational measurement, is not, like the ordinary examination, merely to test the efficiency of the individual teacher or pupil, but rather to test the efficiency of the teaching process itself. The individuals are examined in many cases, not because of our interest in them as individuals, but because their work will reveal whether the method which is being used in their instruction is sound. Many of the failures in our schools are due, not to unavoidable inefficiency on the part of the teachers, but rather to their lack of knowledge that their efforts are failing to produce the desired results. Were the teachers themselves aware that they were failing, they would certainly attempt to alter their methods. It is lack of definite knowledge of what the pupils are accomplishing, and not incompetence or indifference, which prevents a better adaptation of method to the product desired.

For this reason teachers should be willing and eager to submit their work to an impersonal standard, not so that it may be praised or condemned, but so that they themselves may know whether their methods are producing as good results as may reasonably be expected. Teachers should have a more exact knowledge than they have had in the past of the processes which are going on in their pupils, for it is the changes which occur during the school period that must be measured. Over these changes we have more or less direct control; the test of life is too remote. The application of these objective scales enables the teacher to know what is happening, not in terms of the empty formulae which unfortunately have become associated with the word "method," but rather in terms of what the pupils can actually do as a result of the instruction given them.
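Measuring the changes that occur during the school period, rather than a single final score, amounts to testing the same pupils twice and looking at their gains. A minimal sketch, assuming two testings with comparable scales given some months apart, and using the class median of the gains as the summary; the function name, pupil names, and scores are hypothetical.

```python
from statistics import median


def gains(first, second):
    """Each pupil's change in score between two testings.

    first and second map pupil names to scale scores; only pupils
    present at both testings are included."""
    return {name: second[name] - first[name] for name in first if name in second}


# Hypothetical scores on two comparable scales given months apart.
september = {"Ann": 8, "Ben": 11, "Carl": 14, "Dora": 9}
june = {"Ann": 12, "Ben": 13, "Carl": 15, "Dora": 15, "Eve": 10}

g = gains(september, june)
print(g)
print("median gain:", median(g.values()))
```

A pupil who entered after the first testing (Eve here) has no gain to report, which is one reason continuous records from entry onward matter.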
Scientific measurement in education will narrow the limits of the wasteful trial-and-error method which is always incident to the teaching process, however conscientious the teachers may be. It will also do another great service, for it is undeniable that, by means of these scales, the complacency of a small section of teachers can be disturbed by actually showing them their failure in black and white. The greatest check on inefficiency in any system is the knowledge that the work of each teacher and the work of each school can be compared with the work of other teachers and the work of other schools. A school which is confronted with indisputable evidence of its shortcomings is in a position to investigate causes and, if necessary, to trace them to individuals; such procedure is always the forerunner of progress.

EXERCISES

1. What would be the chief difficulties in constructing a scale for the measurement of knowledge of American history in the eighth grade?
2. How would you prove to an outsider that there are great individual differences in ability, even in the same class? How should a knowledge of these individual differences affect (a) the amount of matter taught; (b) the method of instruction?
3. What are the chief advantages of continuous school records? Draw up a table, and outline the methods which could be used for recording a child's progress in the fundamental studies, from the time he enters to the time he leaves school.
4. Upon what factors should promotion depend? Have we any right to promote a pupil if he is not up to certain minimum standards? How do the standard scales help to determine these minimum standards? How does too lax a promotion system disorganize the work in the higher grades?
5. Why is the present system of marking in your school an insufficient guide to the quality of the work which is being done?
6. Should all children give the same time to all studies?
In what way will the use of these standard tests enable us to allow the individual child to distribute his time in a more advantageous manner?
7. How is a rural school teacher handicapped in judging the work of her pupils? Show how the scales help in this respect.
8. A superintendent of a city school system cannot decide between two proposed methods of teaching handwriting. Describe a plan whereby, in a few years, he could decide which method was the better. How have such questions been decided in the past?
9. Why is it better to measure the success of a year's work by the improvement of the pupils during that period than by the final scores in a test at the end of the year? If you were the principal of a school, outline the methods you would employ to measure such improvement.

CHAPTER X
DANGERS INCIDENTAL TO THE USE OF THESE SCALES

At a time when all available pressure should be brought to bear on school systems to introduce objective measurement into the ordinary routine of the school, it seems hardly the occasion to criticise the scales. However, a word of caution may not be out of place as to the dangers which may arise from their application, since their improper use may prejudice those who make the first attempts at this type of measurement.

Difficulty of Comparing Methods of Teaching. It has already been stated that one of the great functions of the scales is to compare the various methods of instruction employed in the teaching of a subject. Great care, however, will have to be taken to prevent mistakes in comparing the relative values of such methods when used in different schools or systems. To know that the work in a particular subject is better in one school than in another is not sufficient to justify the judgment that the method used in the one school is superior to that in the other.
In such a comparison several secondary causes must also be considered before any statement is made concerning the relative efficiency of the methods: (1) the time allowed in the different schools; (2) the personality of the teacher; (3) the type of neighborhood, as determining the type of pupil. It will be only by the most careful experimentation, where attention is paid to these points and to many others of less importance, that anything like a scientific application of the scales to the question of the values of methods will be obtained. The whole subject is full of danger, and many fallacies will have to be avoided. At the present time scientific attention is being directed rather to the construction and use of scales for particular groups than to the comparison of the values of procedures; but such comparison will be possible later, when every school system employs a competent statistician and experimenter capable of conducting genuinely scientific comparative experiments.

In short, we must not strive to compare groups that are not alike, or hold up standards without due consideration of social conditions. Mere statistics can never dictate final standards of achievement; a standard set up may be too high for one school and not high enough for another. Each school, after working with these scales for some time, can establish standards of its own; but there is always the danger that a standard may be set up which falls short of what should be done. In fact, the unwise use of standards in this respect may confirm the school in lax processes.

Failure of Scales, from the Fact That They Measure Complex Abilities, to Reveal the Point of Weakness in Method. While these scales will do much to quicken methods used in the schools, it may be well to mention another point which is apt to be overlooked by some who employ such measurements.
Thus, a scale may show that the method which has been used is imperfect, in that it has failed to produce the desired product; but it does not directly analyze the particular fault. The scales do not tell you what to do; rather, they tell you where you are. A teacher may be conscious that he has failed, but unable, in spite of great efforts, to find the exact factor responsible for this failure. In much the same way a physician, after examination, may announce that the organic processes are wrong, and at the same time be totally unable to assign the cause. Although the present scales, because they measure such complex activities, do not reveal the exact point at which a teacher may have failed, yet we see in the Courtis Test the beginnings of an attempt to measure the details of what many have considered to be a single process, namely, "arithmetic ability." When more analytical scales have been worked out in other subjects, it will be possible to go into detail and tell the teacher at just what point or points he failed, these small failures accounting for the failure in the wider test.

The idea might also be applied to the testing of English composition. As things are now, it is possible merely to tell a teacher that the class has failed to produce as good English composition, as measured on the Hillegas Scale, as might be expected. We are not in a position to say what details are responsible for the failure. But suppose at a later time scales should be used to test (1) punctuation, (2) extent of vocabulary, (3) choice of vocabulary, (4) power of summarization, etc.; then what we now perforce attribute to general weakness we shall be able to assign to weakness in one or more of these factors, which can be corrected by special practice. In this way we shall narrow the limits within which the teaching process can fail without the teacher even knowing that it is failing.
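The diagnostic use of analytical scales anticipated here can be illustrated with a small sketch. Assume a class has taken several separate sub-tests (for example, one per arithmetic operation, in the spirit of the separate Courtis Tests) and that a standard median is available for each. The sketch flags the sub-tests on which the class median falls noticeably below standard. The function name, the 90 per cent tolerance, and all numbers are hypothetical; they are not the published Courtis standards.

```python
from statistics import median


def weak_points(class_scores, standards, tolerance=0.9):
    """Return the sub-tests on which the class median falls short.

    class_scores maps each sub-test to the pupils' scores on it;
    standards maps the same sub-tests to the expected median for the grade.
    A sub-test is flagged when its median is below tolerance * standard."""
    flagged = []
    for test, scores in class_scores.items():
        m = median(scores)
        if m < tolerance * standards[test]:
            flagged.append((test, m, standards[test]))
    return flagged


# Hypothetical standards and scores (examples right per minute).
standards = {"addition": 10, "subtraction": 10, "multiplication": 8, "division": 6}
class_scores = {
    "addition": [9, 10, 11, 12, 10],
    "subtraction": [10, 11, 9, 12, 10],
    "multiplication": [5, 6, 7, 6, 8],
    "division": [6, 5, 7, 6, 6],
}

for test, m, expected in weak_points(class_scores, standards):
    print(f"{test}: class median {m}, standard {expected}")
```

As the chapter warns, a flag of this kind still tells the teacher only where the class stands on each part, not why; the cause of a weak sub-test has yet to be found.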
What the Scales Do Not Measure. Another objection which may be urged against the scales is that they fail to take into account such factors as interest in the process of learning, the eagerness with which pupils will continue a particular study after pressure is removed, etc. The scale also takes no direct account of the method by which the product is obtained; it does not tell the experimenter whether the results were secured by easy work or by undue pressure on the part of the teacher. The reply is that it is only the objectors who have ever assumed that the scales do measure these things. To illustrate: in an automobile reliability test, the measurement of speed tells us nothing concerning the internal mechanism of the engine; other tests must be used to measure this factor. But if a machine keeps up a high speed for a long period, then as a rule the internal factors cannot be much out of gear. In precisely the same way, if a class steadily keeps up its improvement on a particular scale, then it is reasonable to assume that the internal factors are not seriously wrong. In the end, bad psychological methods such as undue driving (which is little to be feared in modern education) will yield poor objective results. The scales, however, must not be attacked because they fail in many cases to measure what no competent individual has ever claimed they do measure.

The use of scales also brings with it the danger that the teacher may sacrifice everything in the classroom to the production of work which can be measured objectively, and, as already pointed out, the scales may fail to give sound relative values to the different elements involved in that work. To make this point clearer, let us consider for a moment a scale for the measurement of the child's ability to add simple numbers, such as was described in the Courtis test.
If the norms insist upon speed, then the teacher will work for speed; if the norm is one for accuracy, then the teacher will work for accuracy; and the scale itself does not decide to which of these two factors the greater attention should be given. Even when the scale is placed in the hands of the teacher, these questions of relative value must still be decided. In this particular respect, however, the scales themselves will work out their own salvation, for, by a consensus of expert opinion, it will be possible to decide for any particular grade the amount of speed that should be required as well as the degree of accuracy.

Another point against which school systems must carefully guard themselves, when these scales and standards are introduced, is a tendency to overlook those factors which do not admit of measurement by such objective scales. This danger will gradually be eliminated as time goes on and as further scales for the measurement of school products are worked out. In the meantime, merely because only certain abilities at present admit of measurement, the school must not overlook subjects and factors which as yet do not admit of such quantitative estimation. In particular it must not fail to take into consideration such factors as the personal character of the teacher, the moral atmosphere of the school, and other spiritual values which, like life, beauty, and happiness, are, to say the least, difficult fields for quantitative analysis. Such spiritual values in schools are of the greatest importance; to overlook or underestimate this fact would indicate a profound lack of sense of relative values. Even statisticians remember these things. But the fact that we cannot estimate spiritual values is no reason why we should not measure values in those realms which admit of measurement.
No science would have evolved if it had not in its beginning confined itself to a limited field and left large parts of the subject for the future. Furthermore, there is very strong a priori evidence to suggest that there is a close correlation between spiritual values and the values which these scales measure. If in the things we can measure it can be shown that the work is inadequate, there is every reason to believe that in the region of spiritual values there are shortcomings which escape our measuring rod. Certainly low objective values are no great argument for high spiritual values!

The Future of Educational Measurement. Many of these tests need criticism and revision, and such questions as their fairness and practicability can be answered only by the teachers who use them. For this reason the authors have refrained from any detailed consideration of the shortcomings of the individual scales. But the time spent upon their application will accomplish a twofold purpose: it will improve the scales themselves, and it will give to every teacher who employs them a quantitative point of view which is sadly lacking in the schools, for many questions of school procedure do not admit of being answered by a mere affirmative or negative; the answer is found in quantitative measurement. The Director of Reference and Research of the Department of Education of the City of New York says: "There could be no better exercise for a teachers' seminar than a series of discussions on some selected tests that would invite the independent judgment and criticism of intelligent teachers."

It is dangerous to forecast, especially when a subject is in its infancy, but there is every reason to believe that the application of the scientific method and the logic of statistics to educational problems will slowly revolutionize the method of education, even on its philosophical side.
Moreover, in certain branches it will raise the study of education to the level of an exact science, thereby winning the respect of the scientific world for a subject whose low standards of proof and loose methods in the past have been responsible for the stigma which attaches to the study of education as an academic subject in the school and college curriculum.

EXERCISES

1. When we are told that a child is "poor" in arithmetic, what is implied by this statement? How may we use the scales described to discover the point at which, and the extent to which, the individual is below standard?
2. How may the norms established for the scales actually confirm a school in lax teaching methods? How could this evil be prevented?
3. What other scales would be useful in the classroom?
4. How would you start to construct a rough objective scale for measuring (a) moral judgment, (b) aesthetic appreciation, and (c) humor?
5. How will the norms established by the use of these scales help greatly in settling the question of time distribution in the schedule?
6. Why is a single survey of a school of limited value? What are the advantages of measuring the quality of the work every half year?
7. How would you show a class the rate at which it was improving, from month to month, in order to accelerate its progress in (a) spelling, (b) writing, (c) reading?
8. How would you proceed to compare two different methods of teaching spelling by means of the objective scales? Enumerate the dangers and show how you would avoid them.
9. It is sometimes said, "These scales do not measure the most important work of the school; therefore they are of little avail." How would you meet this criticism?
10. How would you conduct, in a small city system, a general survey of the quality of the work done in the common subjects of the curriculum?

APPENDIX

SOURCES OF THE SCALES

The sources from which a full account of each of the scales can be obtained are given below.
Courtis, S. A. A Manual of Instructions for Giving and Scoring the Courtis Standard Tests. (75 cents.) 82 Eliot Street, Detroit. This manual also includes the Courtis Handwriting and Reading Scales. The standard blanks for any of the above tests, together with full directions for administration and scoring of the test, may be obtained from Mr. S. A. Courtis at the above address.

Thorndike, E. L. Handwriting. Teachers College Record, 11: No. 2, 1910. (30 cents.) Publication Bureau, Teachers College, New York City. Separate copies of the scale can also be secured (5 cents).

Ayres, L. P. A Scale for Measuring the Quality of Handwriting of School Children. (5 cents.) Russell Sage Foundation, New York City.

Thorndike, E. L. The Measurement of Ability in Reading. Teachers College Record, 15: No. 4, 1914. (30 cents.) Publication Bureau, Teachers College, New York City. The standard blanks used in the Thorndike Tests may be procured in any quantity from the above address.

Starch, D. The Measurement of Efficiency in Reading. Journal of Educational Psychology, January, 1915. (30 cents.) Warwick and York, Inc., Baltimore. The standard blanks for the administration of the test may be obtained, in any quantity, from the author, Dr. Daniel Starch, University of Wisconsin.

Buckingham, B. R. Spelling Ability: Its Measurement and Distribution. (95 cents.) Publication Bureau, Teachers College, New York City.

Starch, D. The Measurement of Efficiency in Spelling. Journal of Educational Psychology, March, 1915. (30 cents.) Warwick and York, Inc., Baltimore.

Ayres, L. P. A Measuring Scale for Ability in Spelling. (30 cents.) Russell Sage Foundation, New York City.

Hillegas, M. B. A Scale for the Measurement of Quality in English Composition by Young People. Teachers College Record, 13: No. 4, 1912. (30 cents.) Publication Bureau, Teachers College, New York City.

Ballou, F. W. Scales for the Measurement of English Composition. (40 cents.) The University Press, Harvard University, Cambridge, Mass.

Trabue, M. R. Completion Test Language Scales. ($1.15.) Publication Bureau, Teachers College, New York City. The scales described, together with the Practice Sheet, may be purchased in any quantity from the above address.

Thorndike, E. L. The Measurement of Achievement in Drawing. Teachers College Record, 14: No. 5, 1913. (30 cents.) Publication Bureau, Teachers College, New York City.

Woody, C. Measurements of Some Achievements in Arithmetic. (95 cents.) Publication Bureau, Teachers College, New York City. The standard blanks for the administration of these tests may be procured in any quantity from the above address.

BOOKS FOR FURTHER REFERENCE

General

Starch, D. Educational Measurements. The Macmillan Company. ($1.25.)

Teachers Year Book of Educational References. Publications No. 6 and No. 14. Department of Education, City of New York.

Both the above books give very adequate bibliographies.

Application of Scientific Measurement to a School Survey

Judd, C. H. Measuring the Work of the Public Schools. (50 cents.) Survey Committee of the Cleveland Foundation, Cleveland, Ohio.