TESTING THE KNOWLEDGE
OF RIGHT AND WRONG

HUGH HARTSHORNE

MARK A. MAY
AND OTHERS

Price Seventy-five Cents

THE RELIGIOUS EDUCATION ASSOCIATION
Monograph No. 1, July, 1927

TESTING THE KNOWLEDGE
OF RIGHT AND WRONG

SIX ARTICLES

HUGH HARTSHORNE
MARK A. MAY

AND OTHERS


1. General Description of the Tests.

2. Administration of the Tests and Preliminary Statistical Results.

3. The Code Value of Moral Knowledge Scores.

4. Some Probable Sources of Moral Knowledge in Children.

5. The Relation of Standards to Behavior in Individuals.

6. Group Standards and Group Conduct.

Reprinted from Religious Education, Issues of February, April, August, October
and December, 1926, and May, 1927.

TESTING THE KNOWLEDGE OF RIGHT
AND WRONG

HUGH HARTSHORNE AND MARK A. MAY*

FIRST ARTICLE
GENERAL DESCRIPTION OF THE TESTS

The Character Education Inquiry is devoting itself to the problem of
how to measure character. For convenience the field of character study in
which tests are called for has been divided as follows:

1. Mental content and skills, the so-called intellectual factors.

2. Desires, attitudes, motives, etc., the dynamic factors.

3. Social behavior, the performance factors.

4. Self-control, the relation of all these factors to one another and to
social-self-organization.

The first three items are of course abstractions from the unitary process
of social experience mentioned in 4. This is the concrete reality we hope
to get at, but for practical purposes it has seemed best to approach it in a
somewhat piecemeal fashion, much as a doctor examines the composition of
the blood, the reflexes, skin color, and so forth, to aid him in making a
diagnosis of the condition of the individual as a whole, even while recogniz-
ing that blood-count, taken by itself, is relatively an insignificant item.

The series of articles of which this is the first will report the efforts
so far made to test item one by means of paper and pencil tests requiring
word responses.

The investigators’ interest in what words may reveal of moral knowl-
edge is not based on the assumption that knowledge and behavior are highly
correlated. One of our problems is to discover what the relation is between
behavior and the knowledge of right and wrong. Furthermore, we do not
assume that word behavior and a true knowledge of right and wrong are
necessarily correlated. It may be that overt action is a far better indication
of what a man really knows about right and wrong than his verbal responses
are. If this be the case, there remains the very significant problem of the
relation between what he says and what he knows on the one side, and the
relation between what he says and what he does or would do, on the other.
Words have a social significance that cannot be ignored. The heart of the
problem of character lies in the adjustment of persons to one another, and
this adjustment is never complete until it has become articulate. Even the
extreme behaviorists write books.

It should also be remembered that the fundamental folkways are rather
completely reflected in sayings, rules, slogans, definitions, and what not, and
are here far more accessible than if studied only as mores. One can find
out by word responses whether an individual is aware of certain customs
even though his possession of a custom in the form of a habit may not be
thus revealed.

* Dr. May and Dr. Hartshorne are the investigators for the Character Education
Inquiry which is being conducted by the Institute of Educational Research
at Teachers College, Columbia University, in cooperation with the Institute of
Social and Religious Research.

In this series of articles the writers have described in some detail one section of
the work of the Inquiry. They have been asked to be as specific as possible in order
that persons not familiar with the procedures used in test building and the
application of tests to particular problems may be fully informed concerning the
dangers, difficulties, pitfalls and values of statistical methods as applied to the
study of one phase of moral behavior.

In the study of moral knowledge through word responses we have
tried to keep distinct the power of making discriminations and the subject
matter, or experiences in which discriminations are made. The one is a
factor in pure intelligence. The other is a matter of experience. We have
no reason to suppose that the capacity to make ethical discriminations is not
adequately measured by standard intelligence tests. Nor do we have reason
to suppose that the ability to make such discriminations is measured by such
tests. That is to say, experience with ethical situations and in making ethical
judgments is required in addition to native intelligence.

This may be illustrated by reference to the relations between the ability
to do arithmetic problems and general intelligence. From the results of an
intelligence test which contains no arithmetic problems it is possible to
predict the probable success a person would have in learning arithmetic. But
a highly intelligent person who has had no training in fundamental processes
in arithmetic would make a poor showing on an arithmetic test.

Conversely, a person possessing the ability to make fine discriminations
of any sort, ethical included, must possess the necessary intelligence.
A high score on an arithmetic test, that is, is a fair indication of the pres-
ence of high intelligence. It is equally true that a high score on an ethical
discrimination test is an indication of high intelligence. A low score, on
the other hand, is not necessarily an indication of low intelligence, but may
be merely the result of a limited experience in the handling of ethical situ-
ations.

What has just been said of the power of discrimination is equally true
of any other typical mental process, such as the power of retention and recall
of appropriate experience, of the organization and generalization of experi-
ence and the application of generalizations to the understanding of new
experiences and the solution of new problems, the foresight of the conse-
quences of behavior, the control of an adequate vocabulary, the recognition
of what is at stake in any situation.

In planning a set of moral knowledge tests, therefore, it was necessary
to keep in mind these two preliminary standards: First, the tests must cover
as wide a range of moral experience as possible; second, the tests must
require the exercise of as many appropriate mental processes as possible.

Sources of the Material

In order to facilitate the application of these standards, we found it
convenient to make a preliminary classification of the kinds of experience
that ought theoretically to be included in a complete set of moral knowledge
tests. Had there been time, we should have made this classification on the
basis of an extended study of the actual behavior of children of all ages and
types in all sorts of actual situations. Such a study of children’s moral
behavior is very much needed. In lieu of such a study, we did the best we
could with the knowledge of life and of children we happened to possess.
The following constituted our work sheet:

Brief Outline of Certain Mental Contents and Skills Involved in Ethical Behavior

A. Certain tools needed for the intelligent consideration of problems of social
adjustment.
1. Adequate social-ethical vocabulary.

2. Adequate control of language: the ability to say the right thing and
   to understand the more subtle nuances of delicate social adjustment.

3. Assimilation of the fundamental ideas or generalizations in terms of
   which life is coming increasingly to be understood, such as

      The idea of Sex
      The idea of God
      The idea of Right and Wrong
      The idea of Natural Law
      The idea of Growth
      The idea of Evolution
      The idea of Cooperation
      The idea of Personality
      The idea of Custom
      The idea of Design
      The idea of Legislation
      The idea of Education
      The idea of Work
      The idea of Fun
      The idea of The Machine
      The idea of Self-forgetting Service

B. Particular knowledges and skills needed for making social adjustments.

1. Knowledge of natural law, physical and biological, and the limitations
   and possibilities of experience.

2. Knowledge of body and mind in general and of oneself in particular:
   to understand the causes and consequences of certain kinds of
   behavior in oneself and others, the nature of temptation, reasons
   for social and legal requirements and desiderata; to control self
   and growth.

3. Knowledge of race experience in solving problems of social
   adjustment, as recorded in history, folklore, fiction, biography,
   poetry. Particularly, knowledge of motives and purposes and their
   consequences.

4. Knowledge of how people behave toward one another in all sorts of
   situations: home, school, church, public meetings, committee
   meetings, discussion groups, play groups, emergencies, studying,
   visiting, etc., and the significance of this behavior for the life
   of the groups concerned.

5. Knowledge of moral principles held by different groups, and their
   implications and applications in concrete situations.

6. Knowledge of constitutional rights and obligations, legislative
   enactments and sanctions affecting oneself and one’s groups.

7. Knowledge of institutions and other cooperative bodies and
   movements affecting oneself or needed as instruments of social
   adjustment, such as the church, the school, the home, the state, the
   town or city or community or block or neighborhood and its
   government, community agencies of welfare and safety, such as
   the police department, fire department, health department, national
   associations such as the Child Labor Committee and Red Cross,
   the movie, the playground, the library, the museum, local
   industries, the jail, the hospital, the court, the clinic.

   What they do, their history, their value, their address, how
   to cooperate.

8. Knowledge of how the work of the world is carried on in mining,
   agriculture, industry, commerce, finance, transportation,
   communication, the trades and professions; mechanical and social
   aspects.

9. Knowledge of contemporary peoples, races, nations, their contacts,
   conflicting interests, efforts toward peaceful settlement of
   disputes and world organization, effects of war and armament,
   historical and current utopias.

10. Knowledge of the trend of evolution, theories of the universe and
    the place of man in the universe.

11. Knowledge of how men have experienced God in connection with
    nature and in the control and development of self and society.
    Prayer and reflection, retrospect, valuation, foresight, repentance,
    forgiveness, aspiration, unification.

12. Knowledge of causes and consequences of social behavior, the habit
    of foresight and valuation, the recognition of personal and social
    responsibility, the habit of moral thoughtfulness.

13. Knowledge of how to think with the materials of social action, the
    habit of inhibition, abstraction from prejudice, gathering and
    weighing of evidence, use of past experience, willingness to
    experiment, discipline of group thinking, open-minded consideration
    of differences, respect for self and others, freedom from social
    suggestion, social perception and imagination.

14. Knowledge of the sources of information needed, and the habit of
    making constant reference to them.

With such a framework in mind the tests described below were con-
structed. No attempt was made to match a test against any particular one
of the above classes of material. Each test contains a variety of situations.
But the second standard, that requiring the use of as many kinds of mental
processes as possible, was applied chiefly in the form of the tests. It was
hoped that in the responses requested in the directions for the different tests
there would be found a fair sampling of the fundamental types of process.

The Tests as Given and Scored Experimentally

Of the tests devised, thirteen were given in sufficient numbers to warrant
statistical treatment. Each of these will be described briefly and what
each is supposed to measure or symptomatize will be pointed out. To avoid
duplication of material, the problem of criterion and method of scoring will
be discussed at the same time.

In the case of arithmetic tests the criterion as to what is the right answer
is established by universal practice. Spelling tests have a less universal
agreement back of them, but at least there are dictionaries. When it comes
to handwriting and composition the criterion has to be established experi-
mentally by ascertaining the judgment of experts and forming a scale for
the evaluation of samples of handwriting or composition produced by the
subject. In the case of ethical experience we are in a still different field,
in which custom and opinion are mixed together to form a great variety
of practice and judgment, with no universal agreement as to what constitutes
the right or wrong answer. Indeed it would be difficult to select a
group of “experts” to decide by discussion and vote what the “right” answer
to a question in ethics or the “right” solution of a moral problem is.
Even with such a board of judges, there is strong probability that on many
debatable issues there would be only a majority or perhaps 75 per cent
agreement.

The idea of a “perfect” score on moral knowledge tests, therefore, will
probably have to be replaced by the notion of a scale of moral values for
each individual. Certain likenesses among these scales, once they were dis-
covered, would doubtless appear, so that they could be classified and named
without any derogatory implications such as are implied in the notion of a
“low” score. A method of scoring that will reveal the individual’s trend of
thinking is therefore of more significance than one which will show merely
his position on a necessarily arbitrary scale determined by a group of judges.

Such a qualitative, descriptive, objective scale waits upon the admin-
istration of a large number of tests of the sort to be discussed below. Mean-
while they must be scored to be handled in large enough quantities for the
discovery of such scales as may prove reliable and valid. It was necessary,
therefore, to resort to the notion of a standard answer for each question in
comparison with which the particular answer given by the subject could be
automatically judged right or wrong, or partly right and partly wrong. These
standards will be taken up in connection with the description of each test.
A. Word Tests
1. Opposites—a multiple choice test of the sort frequently used in
intelligence or achievement tests, with the words chosen from the field of
social relations. The following is a sample:

In the bracket at the right of each line place the number of the word which
is most nearly opposite in meaning to the word printed in capitals at the left.

1. GIVE. 1—present, 2—accept, 3—take, 4—wish, 5—absent.......... (   )
2. FRIEND. 1—soldier, 2—true, 3—false, 4—enemy, 5—fight......... (   )
3. HELP. 1—hinder, 2—assist, 3—someone, 4—need, 5—charity....... (   )
4. BORROW. 1—steal, 2—return, 3—book, 4—loan, 5—debt............ (   )
5. KIND. 1—sweet, 2—cruel, 3—sort, 4—sympathy, 5—always......... (   )

It was expected that this test would give some notion of a child’s social
ethical vocabulary as well as his handling of ethical concepts. A better
vocabulary test was later devised.

There was no peculiar problem of criterion here, as it was easy to secure
agreement on the meaning of the words.

2. Similarities—a cross-out test, of which the following is a sample:

In each line below, four of the five words belong in a class or mean about
the same thing. One of the five belongs in a different class. Find this odd word
in each line and cross it out.
1—debase, 2—ignore, 3—humble, 4—disgrace, 5—lower.
1—quit, 2—surrender, 3—enemy, 4—relinquish, 5—forsake.
1—abhor, 2—detest, 3—loathe, 4—despise, 5—reduce.
1—abjure, 2—insult, 3—revile, 4—disparage, 5—curse.
1—love, 2—revere, 3—like, 4—adore, 5—fond.

Not only is the mental process of recognizing such likenesses and differ-
ences not usually found under a mental age of twelve, but the words and
relations selected for the test proved difficult even for children over twelve.
As this test was well represented in the vocabulary test later devised, it was
also dropped. As the criterion involved only word knowledge, it offered no
particular difficulty.

3. Word Consequences—also a multiple choice test.

The directions required that the subject indicate (1) all likely conse-
quences that might follow from the action represented by the word in cap-
itals; (2) the most likely consequence; (3) the best consequence; and (4)
the worst consequence. The following are sample test words with their
multiple choice responses from which the subject is to make the selections
just described:

1. CHEATING. 1—courage, 2—forgery, 3—outcast, 4—wealth, 5—poverty
2. BETTING. 1—gambling, 2—poverty, 3—optimism, 4—wealth, 5—war
3. FIGHTING. 1—weakness, 2—love, 3—injury, 4—honor, 5—death
4. COURAGE. 1—disgrace, 2—honor, 3—humility, 4—strength, 5—foolhardiness
5. LOYALTY. 1—bigotry, 2—treason, 3—friendship, 4—trust, 5—timidity

This is a word test which is intended to do more than test vocabulary.
It is an association test in which the required associations are those based on
experiences of value. It is an abbreviated evaluation test. The individual
must first pick out probable consequences flowing from a form of behavior
or an attitude, and then distinguish the best from the worst of these conse-
quences. Something of his conception of the “best” is thus revealed.

The only criterion used in scoring this test was agreement between the
two investigators. The criterion involved not only judgment as to the use

of words, but also as to the consequential relationship of certain experiences.
For this reason it was expected that the combined judgments of a group of
mature and thoughtful people would be secured in regard to each response
before the revised test was scored.

B. Sentence Tests

4. Cause and Effect—a true-false test with 100 items such as the following:

Some of the statements made below are true and some are false. Read each
statement carefully and underline the word TRUE if it seems to you to be true.
Underline the word FALSE if it seems to you to be false.

1. Good marks are chiefly a matter of luck ....................... True  False
2. Ministers’ sons and deacons’ daughters usually go wrong ....... True  False
3. If one eats stolen apples he will have a stomach ache ......... True  False
4. Success always comes from hard work ........................... True  False
5. From the standpoint of the individual workers the wage system
   is a form of slavery .......................................... True  False
6. God punishes bad people by making them sick ................... True  False
7. Eavesdroppers never hear anything good about themselves ....... True  False
8. The youngster who can cheat and not get caught at it shows more
   good sense than one who does not cheat ........................ True  False

This test is open to the objections that need to be raised about any true-
false form of testing. We were aware of these limitations but found the
procedure useful, particularly when the test, as here, was only one of a
battery of tests and the gross score only was used in measuring the individual.

The intention of this test is the reverse of that of the consequences test
outlined above and the foresights test described below. The attempt is made
here to get at the individual’s ability to trace consequences back to their
causes. It is felt that such ability is an important factor in locating one’s
own and others’ moral responsibility for what happens, that is, in placing
oneself and others in a true causal sequence with events that superficially
may appear quite removed. Ability to place oneself in such a determinative
sequence of events is one aspect of self-conscious activity that needs to be
understood and measured.

In working out a criterion for this test as in the case of several others
we were fortunate in having available a class of sixty graduate students in
education who were taking a course called the Psychology of Character
Study. It was the sort of group of which one might expect not only con-
scientious work but also mature and liberal ethical judgment.

This group took the Cause and Effect test, and furnished us with a
criterion of a 75% (or better) agreement on seventy-seven of the hundred
items. The remaining twenty-three were reviewed by the investigators and
were either dropped or scored with the majority vote of the class, except
in a few cases where it seemed to us that either ignorance or conventional
opinion prevailed, in which cases the class decision was reversed. For
example, 55% of the class thought that success always comes from hard
work.
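The criterion-building step just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the Inquiry's own procedure: the helper names `classify_item` and `build_key`, the item labels, and the votes on the stolen-apples item are hypothetical. Only the 75 per cent threshold, the class of sixty, and the 55 per cent vote on the hard-work item (33 of 60) come from the text; the investigators' reversal of that item is modeled as an override.

```python
from collections import Counter

# Threshold taken from the text: "a 75% (or better) agreement".
AGREEMENT_THRESHOLD = 0.75


def classify_item(answers):
    """Return (modal_answer, share, settled) for one true-false item.

    `answers` holds the judges' replies ("True" or "False"). An item is
    settled when the most frequent reply reaches the threshold.
    """
    counts = Counter(answers)
    modal, n = counts.most_common(1)[0]
    share = n / len(answers)
    return modal, share, share >= AGREEMENT_THRESHOLD


def build_key(votes, overrides=None):
    """Build a scoring key from the judges' votes (hypothetical helper).

    votes:     {item: [replies]}
    overrides: {item: reply}, or {item: None} to drop the item; these stand
               in for the investigators' review decisions.
    Returns (key, review), where review lists the unsettled items.
    """
    overrides = overrides or {}
    key, review = {}, []
    for item, answers in votes.items():
        modal, _share, settled = classify_item(answers)
        if not settled:
            review.append(item)
        if item in overrides:
            if overrides[item] is not None:
                key[item] = overrides[item]
            # an override of None drops the item from the key
        else:
            key[item] = modal  # follow the judges (75% or simple majority)
    return key, review


votes = {
    "success_from_hard_work": ["True"] * 33 + ["False"] * 27,  # 55% "True"
    "stolen_apples_stomach_ache": ["False"] * 57 + ["True"] * 3,  # invented
}
key, review = build_key(votes, overrides={"success_from_hard_work": "False"})
print(key)     # {'success_from_hard_work': 'False', 'stolen_apples_stomach_ache': 'False'}
print(review)  # ['success_from_hard_work']
```

The hard-work item falls short of the threshold, so it is listed for review, and the reversal of the class vote appears as an explicit override rather than being buried in the key.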

Theoretically, the elements of this test deal only with objective fact, but
it is in this sort of material that prejudice and highly conventional opinion
often reign. The individual’s score, if the criterion is correct, reveals his
approximation to knowledge as against ignorance, prejudice or convention.
Of course, the fact that more than 75% of these graduate students say that
it is not true that unemployment is the fault of the laborer does not make
this statement untrue. But it does lend backing to what would otherwise be
the unsupported personal judgment of the investigators. This standard is
very imperfect, but it is probably as objective as that which determines the
bulk of the present-day school curriculum.

5. Duties—a modified true-false test with three point response.

A hundred items of the following nature were used, the subject being
asked to indicate whether the act stated is his duty, is not his duty, or is
sometimes his duty and sometimes not:

1. To help a slow or dull child with his lessons ............... Yes  ?  No
2. To read the newspapers every day ............................ Yes  ?  No
3. To call your teacher’s attention to the fact if you received a
   higher grade than you deserved .............................. Yes  ?  No
4. To keep a diary ............................................. Yes  ?  No
5. To sneeze when you feel like it ............................. Yes  ?  No
6. To jeer at a child who has just been punished ............... Yes  ?  No
7. To smile when things go wrong ............................... Yes  ?  No
8. To report another pupil if you see him cheating ............. Yes  ?  No

This test furnishes a sort of rough index to knowledge of folkways the
significance of which to the child is indicated by whether he considers the
act his duty or not.

It is very difficult to secure a criterion for a test of this sort. The items
do not represent a grown person’s activities and it is not particularly prac-
ticable for an untrained adult to attempt to answer such questions from the
standpoint of a ten-year-old. The graduate class referred to showed far less
agreement than on the Cause and Effect test. It may prove wise later to
use as a standard the majority or 75% agreement of the pupils of a given
age who have on the other tests a score approximating mature ethical
judgment.

With some exceptions, illustrations of which are given below, the judg-
ments of the class were utilized as follows: Two answers were allowed for
each item, the one which followed the predominant vote of the class having
a value of two, and the other, following the next most frequent reply, having
a value of one. On each item, therefore, a child would score two, one or
zero.
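The two-point, one-point, zero rule can be written out as below. This is a sketch, assuming the class results are available as percentages per reply; `duty_key`, `score_child`, and the item labels are hypothetical names. The class percentages and the investigators' values for the ticket-selling and prayer items are the ones reported in the text; the child's answers are invented.

```python
def duty_key(percentages):
    """Map each reply to a score value from the class percentages.

    The most frequent reply is worth 2, the next most frequent 1, and the
    remaining reply 0. Ties keep the order in which replies are listed.
    """
    ranked = sorted(percentages, key=percentages.get, reverse=True)
    return {reply: value for reply, value in zip(ranked, (2, 1, 0))}


def score_child(answers, keys):
    """Sum the score values of a child's replies over all keyed items."""
    return sum(keys[item][reply] for item, reply in answers.items())


tickets_class = {"Yes": 56, "?": 37, "No": 6}
print(duty_key(tickets_class))  # {'Yes': 2, '?': 1, 'No': 0}

keys = {
    # investigators' values, which replaced the class-derived key
    "sell_tickets": {"Yes": 1, "?": 2, "No": 0},
    "pray_daily": {"Yes": 0, "?": 1, "No": 2},
}
print(score_child({"sell_tickets": "?", "pray_daily": "Yes"}, keys))  # 2
```

The class-derived key for the ticket item gives "Yes" two points; the reversal described next moves the two points to "?", which is why the final values are stored per item rather than recomputed from the vote.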

The class judgments were reversed by the investigators in the case of
some twenty items, such as the following, in which the class percentages are
given on the first line, and the final score value, as set by us, on the second:

                                                   Yes     ?    No
To pray at least once a day ....................    64    19    17
                                                     0     1     2
To go to Sunday school every Sunday ............    48    40    12
                                                     0     1     2
To take a temperance pledge ....................    81    17     2
                                                     0     0     2
To sell tickets to your school entertainments ..    56    37     6
                                                     1     2     0
To correct another pupil when you hear him
   using bad grammar ...........................    [illegible]
To keep every secret that you promise to keep ..    [illegible]
To keep quiet when older persons are talking ...    [illegible]

6. Comprehensions—A multiple choice test suggested by the Binet com-
prehensions which employs similar situations. The Terman revision of the
Binet distinguishes among such questions three orders of difficulty instead of
lumping them together as Binet did and as we were compelled to do in our
preliminary testing. The directions in this test called for the “what you
would do or say” response first. Then after the test had been taken the
pupils were asked to go back and indicate what would be the best thing to
do or say. As the children almost invariably checked the same items, the
second request was later dropped. It might have been better to ask some
such question as: “What would you advise a boy or girl of your own age
to do?” or “Which act would be most likely to promote your own welfare
in the long run?” or “Which act would be most fair, just and friendly for
everyone concerned?”
The following are samples of the situations and responses:

1. If someone asks to borrow your pencil:
(a) Tell him it’s broken.
(b) Tell him that you just lost it.
(c) Tell him that you don’t want to loan it.
(d) Let him take it.
2. If someone steals your lunch:
(a) Steal another lunch to even it up.
(b) Report it to the teacher.
(c) Cry about it.
(d) Say nothing about it.
3. If you see a classmate cheating on an examination:
(a) Say nothing to anyone.
(b) Explain to him that it is wrong and warn him.
(c) Report it to the teacher.
(d) Say nothing, but try to cheat yourself.

This test is similar to the Duties test in its intent, but with a different
technique. Counting the multiple responses there were 132 possible ways
of responding, each one a distinct item, and not merely the opposite of
another, as in the case of Duties. The presumption back of these two tests
is not that one may not do the correct thing without knowing he ought to,
nor that he will do it when he knows he ought to, but that knowledge of
what is expected or of what is wisest is in the field of morals, just as in
plumbing or cooking, an essential part of intelligent control of a situation,
even when one chooses to do precisely what is not expected or what is not
wise. Our moral issues lie largely in this field of conflict, on the one hand,
between what we transiently wish and what we know is good, and, on the
other, between what is generally regarded as good and what we ideally vision
as better. In any case, the tester must know the individual’s equipment of
standards before he can understand the moral significance of his behavior.

Seventy-five per cent of the class agreed in twenty-three out of the
thirty items. In twenty-two of these cases their judgment was followed. In
one case it was reversed. In five of the remaining cases, majority opinion
was followed. One of the others was dropped and one was reversed. Of
the two reversals one was subsequently omitted.

7. Provocations—A few illustrations will introduce the test:

Here are some little stories of what some children did. You are to decide
whether they did right or wrong. If what they did was not quite right, perhaps
it was at least excusable in view of the circumstances. Look at the sample first.

SAMPLE: Jane’s family were too poor to buy fruit for her sick brother.

So every now and then Jane took an apple or an orange from a fruit stand and
brought it home to him.

Now if you think she was absolutely wrong in taking the fruit,
put a circle around the Wr, like this ...................... R    Ex  (Wr)
But if she did exactly right, encircle the R, like this .... (R)   Ex   Wr

If you think she was wrong but excusable in view of her
desire to bring it to her sick brother, encircle the Ex, like
this ....................................................... R   (Ex)  Wr

Begin here and do the rest in the same way:
1. Helen noticed that nearly everyone in the class was cheating
   on a test, so she cheated too. .......................... R    Ex   Wr
2. Harry was a Christian boy. One day a Jewish boy called
   Harry a “dirty Christian.” Harry knocked him down. ...... R    Ex   Wr
3. Charles did not want to play marbles for keeps but the boys
   called him a “sissy” so he went ahead and played for
   keeps anyway. ........................................... R    Ex   Wr
4. On the way to Sunday school Jack matched pennies with the
   other boys in order to get some money for the
   Sunday-school collection. ............................... R    Ex   Wr

The test is called “Provocations” because the situations named are
provocative of responses that are in conflict with ideal modes of response. In
the case of ethically immature persons the situations stimulate wishes, preju-
dices, emotions, and so forth, which lead to the sort of action stated in the
little story. Sometimes convention supports the stated action and sometimes
it does not. In the following case convention and wish seem to agree in
contradistinction to more ideal conceptions of the appropriate response:

Henry saw a big bully strike a little boy, so Henry walked up and gave the
bully a real hard blow and knocked him down.

Judgment is passed on the particular responses listed in the test, and
thus the examiner gains an insight into the level of moral judgment attained
by the subject.

As can be imagined, a standard for such a test as this is almost
impossible of achievement. It was first decided to take a conventional
standard as the criterion. Two suggested themselves, the one a rather mature
one as found in the answers given by the graduate class, the other, the less
socialized standard found by examining the actual answers given by all the
children who took the test. So many of the conventional replies, however,
offended our own sense of right and wrong that it was finally decided to
attempt an approximation to a standard that would conform to the great
historical moral ideals, and to measure all divergences from this viewpoint
rather than from some point further down the scale. The conventional
standard is thus identified by a score rather than by a qualitative exposition,
and so also is the standard of the major group to which the child belongs.
The median of his group may be lower or higher than the conventional
standard and his own score may deviate from the median of his group toward the
conventional, or toward the ideal, or toward a vague and undetermined zero
of moral knowledge.

The decisions of the graduate class turned out to be so highly conven-
tional that they were practically ignored as a criterion. They were too much
like what sixth grade children give as their responses. For example, in the
last illustration given, of the boy who knocked down the bully, 45% of the
class thought it was unqualifiedly right to knock the bully down, 42% thought
it wrong but excusable, and only 13% called it unqualifiedly wrong. In one
sixth grade previously given this test 85% marked it right, 6% excusable
and 9% wrong. Our own standard gives a value of one to excusable and
of two to wrong.

Or take the following illustration:

The neighbors had been kept awake at night by two cats fighting. So Fred
set his bulldog on them.

The following percentages of the graduate class and the sixth grade
were given to the different answers:

                      R     Ex     Wr
Graduate class      19%    53%    28%
Sixth grade         29%    45%    26%
Our valuation         0      1      2

Instances like this made us feel that if the test was to have real
differentiating value, the only possible standard to be used was one which
would grade all from the top down, on which the score would represent
approximation toward consistency in forming judgments in the light of
ethical ideals rather than in terms of convention or prejudice.
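The valuation in the table (R = 0, Ex = 1, Wr = 2) can be applied as below. Treating a child's score as the plain sum over items, and summarizing a group by the mean value of its replies, are assumptions made for illustration, not procedures stated in the text; `provocations_score` and `mean_value` are hypothetical names. The percentages are those reported above for the cat story and the bully story.

```python
# Valuation from the text: right = 0, excusable = 1, wrong = 2.
VALUATION = {"R": 0, "Ex": 1, "Wr": 2}


def provocations_score(answers):
    """Sum the valuation over a child's answers (assumed scoring rule)."""
    return sum(VALUATION[a] for a in answers)


def mean_value(percentages):
    """Mean valuation for a group, given the percentage choosing each answer."""
    return sum(VALUATION[a] * p for a, p in percentages.items()) / 100


cats_graduate = {"R": 19, "Ex": 53, "Wr": 28}
cats_sixth = {"R": 29, "Ex": 45, "Wr": 26}
bully_graduate = {"R": 45, "Ex": 42, "Wr": 13}
bully_sixth = {"R": 85, "Ex": 6, "Wr": 9}

print(mean_value(cats_graduate), mean_value(cats_sixth))    # 1.09 0.97
print(mean_value(bully_graduate), mean_value(bully_sixth))  # 0.68 0.24
print(provocations_score(["R", "Ex", "Wr", "Wr"]))           # 5
```

On this summary the graduate class and the sixth grade differ by little more than a tenth of a point on the cat story, and on the bully story both groups fall below the "excusable" value of one.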

8. Foresights—One important distinction between this and the Word
Consequences test is in the fact that here no suggestions are given as to the
possible consequences. The subject is left entirely to himself in thinking
of what might happen from the events recorded. He is requested to write
down as many things as he can think of, both good things and bad, and a
sample is given as an illustration. Here are some of the incidents selected
from the forty-eight actually used:

1. Whenever anyone picked on John he would go tell his teacher.
(Space is given for a large number of possible consequences.)
2. John accidentally broke a street lamp with a snow ball.
3. Ruth’s folks had a crowded apartment so they kept a lot of boxes and things
on the fire escape.
4. Jim was anxious to make good marks at school so he usually studied instead
of going out to play with the other fellows.

The foresight of consequences involves the ability to see for oneself
what is likely to happen. Foresight is, of course, a conspicuous factor in
intelligence. But foresight in any particular field is a function of experience
as well as of intelligence. The foresight of social consequences is one of
the chief characteristics of the good man, and even the relatively unintelligent
can learn from experience to see ahead to the effect of their own and others’
deeds with sufficient clearness to act kindly if not altogether wisely.

The forty-eight items of this test were put into six separate forms. The
eight items of each form consisted of two sets of four each, with each set
covering about the same range of situations and allowing for about the same
range of possible consequences.

The method of scoring this test has not yet been worked out.

9. Recognitions—A multiple choice test. The following is a sample:

After each statement are five letters: C. L. S. X. J. If the deed is a case
of Cheating, draw a circle around the C; if it is Lying, around the L; if it is
Stealing, around the S. If it is something wrong, but not either cheating, lying,
or stealing, put a circle around the X. If it is not wrong at all, put a circle
around the J. If the thing is both cheating and lying or stealing and lying, or
all three, encircle all the letters you need to in order to express your opinion.
(A sample is given which is here omitted.)

1. [illegible] younger [illegible] ........................... C. L. S. X. J.
2. Using street car transfers that are out of date ........... C. L. S. X. J.
3. Riding on the back of a truck without the driver’s knowing it .. C. L. S. X. J.
4. Apologizing for a misdeed when you are not really sorry ... C. L. S. X. J.
5. Forgetting to brush your teeth for a [illegible] .......... C. L. S. X. J.
6. Talking loudly in the hallways when classes are in session .. C. L. S. X. J.
7. [illegible] in a public park ............................... C. L. S. X. J.
8. When you don’t want to go somewhere, making up an excuse so
as not to hurt anyone’s feelings ............................ C. L. S. X. J.

Intelligent control of behavior involves correct classification of situations
and responses. If adjustment is to occur, new situations must be assimilated,
in part at least, to familiar ones which are already connected with the most
useful or appropriate responses. Stock speculation is not called stealing by
those who deal in it, nor is keeping a package mailed to one by mistake. The one
is just “business”; the other is “a piece of good fortune.” A child has to
learn how each of a thousand forms of behavior is named, or he cannot
be expected to make satisfactory adjustments. He may not act differently
for knowing how his acts are named, of course, but he cannot be subject
to intelligent social control or social motives unless he can name his own
acts. The Recognitions test aims to discover how far this naming process
has gone.

As first given, the Recognitions were in four forms, each containing
twenty-five elements: five cases of stealing, five of lying, five of
cheating, five wrong but none of these three, and five jokers, or things neither
right nor wrong. There was of course some overlapping, making double
values necessary in scoring.

We found that in many instances the snap judgment of the graduate
class failed to show adequate analysis of the situations into their socially
significant aspects. Here again it would be advisable to get together a group
of experienced people who would if necessary take the time to discuss their
disagreements and arrive at as many common judgments as possible.

The method of scoring this test is shown below:

Copying a composition out of a book but changing some of the words.

                                                    C      L      S
Percent of class giving this only one name          72      2      ?
Percent of class giving it more than one name       24     15     20
Our scoring values                                   2      0      1

Thus, 72% of the class marked this solely as cheating. Two per cent
marked it solely as lying, 24% marked it cheating and also something else,
15% marked it lying and also something else, 20% marked it stealing and
also something else. So we gave cheating a value of two and stealing a value
of one. A person marking it either C or C and S got a score of two. One
marking it only S got a score of one. Any other mark was zero. In most
cases only one response was permissible. In a few cases both responses
were valued two.
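The rule just described can be written as a small lookup from the set of letters a pupil circled to the credit earned. Below is a minimal Python sketch for the composition-copying item. The dictionary layout and the name `score_item` are our own, and we assume the order in which a pupil circles the letters does not matter.

```python
# Credit for each acceptable set of circled letters on the item
# "Copying a composition out of a book but changing some of the words":
# C alone or C with S earns 2, S alone earns 1, any other marking earns 0.
KEY = {
    frozenset("C"): 2,
    frozenset("CS"): 2,
    frozenset("S"): 1,
}


def score_item(circled: str, key: dict[frozenset, int]) -> int:
    """Return the credit for the letters a pupil circled, e.g. "CS"."""
    return key.get(frozenset(circled.upper()), 0)


for marks in ["C", "SC", "S", "L", "CL"]:
    print(marks, score_item(marks, KEY))
```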

10. Principles—a true-false test, containing twenty-seven items.
Samples:

1. To master oneself is a greater thing than to win a battle ........ True False

2. Clean speech is a sign of being a “goody-goody” ................. True False

3. Obedience is of greater importance than honor ................... True False

4. No one should be forgiven a wrong deed until he has asked for
forgiveness .......................................................... True False

5. If anyone hurts you it is up to you to hurt him .................. True False

Self-criticism involves knowledge of principles. It is fairly easy to
cover the whole range of moral behavior in a few principles—even in one
or two if they are general enough. It is difficult, however, to discover when
a principle is “known.” It may be recognized verbally or in general outline
and yet be left entirely unconnected with any mental or physical action. Yet
it cannot be grasped in communicable form without words. Hence we felt
it to be necessary to test the verbal mastery of great moral principles.
Although the true-false technique does not make an adequate analysis of the
subject’s knowledge possible, it probably does place him properly in his
group.

There proved to be very little disagreement among mature judges as to
how to mark these elements. Three were not scored on account of am-
biguities revealed by the way they were marked by the graduate class.

11. Applications—a multiple choice test, utilizing the elements already
found in the Principles test and the Provocations test.

“Knowing” a principle involves being able to unearth it and apply it to
a given situation to which it affords the key, and also the ability to see in
advance the concrete situations to which it is likely to apply. This test
attempts to find out something about the former of these abilities. A
situation involving a dilemma is presented. Then five principles are given,
from which the subject is to select the two which might affect his judgment
as to the rightness or wrongness of the alternative ways of meeting the
situation. The principles are so chosen that one suggests one alternative
as the right one, and a second the other. The person whose mind is already
made up as to the correct response is less likely to find both the principles
which apply than is the open-minded person who is willing to consider
various solutions before settling on one as final.

After selecting his principles, the subject then returns to the alternatives
and indicates which he regards as the right one. This answer may be com-
pared with the one he gave in the Provocations test where he had no oppor-
tunity to weigh the principles.

The following are some samples of the situations offered in the test:

1. Mary saw Helen cheating on an examination. She had to decide whether
she would

( ) (a) Report it to the teacher.
( ) (b) Not report it to the teacher.

Here are five rules, of which two apply to this problem. Check these two
in the spaces at the left of the numbers.

( ) (1) Treat others as you would like them to treat you.
( ) (2) Be true to what is for the good of all, even when your own interests
or those of your friends are involved.
( ) (3) When you have wronged someone, ask to be forgiven.
( ) (4) Be cheerful and uncomplaining when disappointed or hurt or in trouble.
( ) (5) Do not think of yourself as more important than you are.

After checking the two rules that apply to Mary, put a check before either
(a) or (b), according as you think it would have been right for her to tell or not
to tell.

2. When John started home from school he found that someone had taken his
rubbers, and so he thought that as it was raining he might take another
pair he saw, which just fitted him. He had to decide whether he would

( ) (a) Take the rubbers.
( ) (b) Not take the rubbers.

( ) (1) Return good for evil.
( ) (2) Love your enemies and be kind to them.
( ) (3) Obey the laws of health.
( ) (4) Treat others as you would like them to treat you.
( ) (5) Keep your word of honor sacred.
The Applications test was not taken by the graduate class, but was
standardized by the investigators only. In scoring the (a) and (b) alter-
natives the same method was used as in the case of the provocations, viz.,
reference to what the investigators felt to be the response called for on the
part of a morally mature person.
12. Social-ethical Vocabulary—a word knowledge test in the form used
by the Thorndike tests of word knowledge.

We early felt the need of a vocabulary test which would measure the
individual’s ability to express himself concerning social relations. We had
previously used a general word knowledge test as one of a battery for
measuring intelligence. This did not purport, however, to measure the
vocabulary of any special field of experience. While it cannot be assumed
in advance that knowledge of the vocabulary of a special field is highly
correlated with either the mental or physical skills connected with that field,
there is nevertheless enough likelihood of some close relation to justify an
effort to get at the facts. Certainly without suitable words one cannot
communicate with others about anything as complex as ethical relations.

Further, there is a large element of word knowledge involved in the
above moral knowledge tests. Scores on these tests might conceivably be
a matter of vocabulary, or vocabulary and moral knowledge might prove to
be so closely related as to make it possible to substitute a vocabulary test for
all the others.

Our first efforts were not elaborately planned and included only the
short Opposites and Similarities tests above described. The work involved
in building a satisfactory tool for measuring vocabulary was too great for us
to undertake in addition to the other testing that had to be done. Fortu-
nately a graduate student* at Teachers College was in a position to engage
in this piece of research under our direction. She secured the material for
the test from Thorndike’s Word Book, his Word Knowledge Tests, from
our moral knowledge tests and similar sources, compiling a carefully selected
total of a thousand words, from which, after testing about a thousand chil-
dren, she constructed two equivalent tests of 150 words each. These were
used in our testing program.

No special problem of criterion was involved here at this stage of the
work.

A few of the words and their arrangement in the test are added here for
illustrative purposes. The directions require that the number of the word
that means the same or most nearly the same as the first word in the line
shall be entered in the space at the right.

1. BRAVERY. 1—folly, 2—courage, 3—livery, 4—impertinence, 5—humanity ...... 1
2. SCOFF. 1—cold, 2—angry, 3—make fun of, 4—extol, 5—expound ................ 2
3. MALICE. 1—spite, 2—poison, 3—glass, 4—character, 5—hammer ............... 3
4. SLUGGARD. 1—snail, 2—lazy person, 3—lax, 4—shot, 5—regard ............... 4
5. REPROACH. 1—come near, 2—insect, 3—scold, 4—steal game, 5—nerve ......... 5
6. JUDICIOUS. 1—punch, 2—spoken, 3—jury, 4—wise, 5—learned ................. 6
7. SUMPTUOUS. 1—conceited, 2—expensive, 3—repast, 4—meager, 5—fairy-like ... 7
8. INTROSPECTIVE. 1—look over, 2—inspection, 3—self-examining, 4—inward,
5—sight .................................................................... 8
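Scoring such a sheet amounts to comparing each entered number with a key. The article prints no key and does not state the scoring rule, so the sketch below is hypothetical: the key is our own reading of the eight sample lines (for example, 2 for BRAVERY, “courage”), and one point per correct line is an assumption. The function name `vocabulary_score` is ours.

```python
# Hypothetical key: our reading of the correct choice on each sample line.
# (The source does not print a key.)
KEY = {1: 2, 2: 3, 3: 1, 4: 2, 5: 3, 6: 4, 7: 2, 8: 3}


def vocabulary_score(answers: dict[int, int]) -> int:
    """Count lines whose entered number matches the key (assumed 1 point each).

    Lines left blank are simply absent from `answers` and earn nothing.
    """
    return sum(1 for line, choice in answers.items() if KEY.get(line) == choice)


pupil = {1: 2, 2: 3, 3: 2, 4: 2, 5: 1, 6: 4, 7: 2}  # line 8 left blank
print(vocabulary_score(pupil))
```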

C. The Good Manners Test

Another need that we felt quite early in our work was for some measure
of home culture. We did not have the time or the money for careful case
studies, and were obliged to rely on what we could find out from pupils and
teachers. It occurred to us that manners might afford a key to refinement of
a sort that would be symptomatic of careful family training. Here again we
had the co-operation of a graduate student.† The test we used was very
largely her product. A sample of the test follows:

The statements below are true or false. If true, draw a line under the word
True in front of the statement. If false, draw a line under the word False.
(True False) If soup or any liquid is too hot, blow on it slightly to cool it.
(True False) In helping yourself to sugar always use your own spoon.

(True False) It is in good form to show general courtesies of “Please” and
“Thank you” to waitresses and maids.
(True False) It is more important to be neat at school than at home.

*Miss Gladys Schwesinger. Her study has been published under the title, The
Social-Ethical Significance of Vocabulary, 1926, Teachers College, Columbia University.

†Miss Cora Orr. The problem is now being studied more extensively by
another student. See also a quite different technique by V. M. Sims, The Measurement
of Socio-Economic Status, and his Score Card, Public School Publishing Co.

(True False) If a boy meets his mother or sister on the street, he is not
expected to tip his hat.
(True False) A boy should not detain a girl to talk on the sidewalk.
(True False) When yawning, make no attempt to suppress it by covering the
mouth.
In the following, put a cross before the answer which you consider the best.
When not in use the teaspoon should be
1. Left in the teacup.
2. Placed on the table.
3. Placed on the saucer.
Approval of a program may be shown by
1. Stamping feet.
2. Clapping.
3. Whistling.
Answer the following questions. If the answer is “Yes” underline the word
Yes. If “No” underline the word No.
Should a man tip his hat to a strange lady when picking up an article
which she has dropped? ............................................ Yes No
If the door is closed is it necessary to knock before entering a friend’s
room? ............................................................. Yes No
Is it considered ill mannered to turn and look at a person who has passed
[illegible]? ....................................................... Yes No
Jane introduces her room mates to her mother as follows: “Mother, may
I introduce Miss Brown and Miss Thompson?” Is this correct? ........ Yes No

This test was standardized by reference to the judgments of mature,
cultured persons.

SECOND ARTICLE

ADMINISTRATION OF THE TESTS AND PRELIMINARY
STATISTICAL RESULTS

The reader is asked to remember that we are describing the process of
building a set of moral knowledge tests. The first step in this process was
outlined in the preceding article. This consisted in determining the field of
knowledge to be covered by the proposed tests and in reducing to test form
as much of this material as possible.

In this part of the work we had the advantage of two decades of school
testing and its experimentation with various testing techniques, so that there
were ready to hand many suggestions for the most useful arrangement of
material.

In addition, we have had at our disposal the rapidly developing theory
of measurements so far as this theory is applicable to the field of character
testing. Our successive steps, therefore, are by no means arbitrary or
gratuitous, but are based on established principles and correspond closely to
the procedure through which all standardized tests must pass before they
become usable instruments of measurement.

One of the first principles of test making is that the preliminary
experimental forms shall contain considerably more material than is likely to be
needed in the final form of the test. There are three reasons for this. First,
it has been found by experience that many test items turn out to be too easy,
or too hard, or ambiguous, and have to be discarded. Second, there must
be enough items so that each test can be split into two equivalent halves and
the scores on the two halves correlated as a measure of the reliability of the
material. Third, when enough items are used, the test can subsequently be
divided into two or more equivalent forms.

As one of the preliminary tests had four forms and another had six,
we were faced with the necessity of administering twenty-one different tests,
all of which were far too long for final practicability. In fact, to have taken
them all would have occupied more than ten hours of a child’s time.
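The figure of twenty-one follows from Table I, where Foresight appears in six forms (A to F) and Recognitions in four (A to D). A quick Python check, assuming each row of Table I counts as one test:

```python
# Forms per preliminary test, read from the rows of Table I.
forms = {
    "Opposites": 1, "Similarities": 1, "Word Consequences": 1,
    "Cause and Effect": 1, "Duties": 1, "Comprehensions": 1,
    "Provocations": 1, "Foresight": 6, "Recognitions": 4,
    "Principles": 1, "Applications": 1, "Vocabulary": 1,
    "Good Manners": 1,
}

# Thirteen tests, but twenty-one separate forms to administer.
print(len(forms), sum(forms.values()))
```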

A second principle is that the preliminary testing must be done with the
same population groups with which the test will be used when finished.
These tests were therefore given to a wide range of social levels in grades
five to eight, inclusive. Only city children were included, however.

In addition to knowing the reliability of each test, that is, the likelihood
that it will always work in the same way, one who is building a battery of
tests to be given all at once, all of which purport to measure aspects of the
same general trait, needs to know the intercorrelations of the tests with one
another and the correlation of each with the total of all the rest.
If the correlation of a test with any other test is too high, that is, if they
measure just the same thing, one of the two may as well be dropped. The
correlation of each with the total score made on all the other tests needs to
be high, as this is a measure of the effectiveness of each test in measuring
what the whole battery measures.
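The two quantities named here, the correlation of every pair of tests and the correlation of each test with the total of all the others, can be sketched in a few lines of Python. This is a minimal illustration that assumes every pupil took every test (which the text goes on to explain was impossible here); the function names and the sample scores are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def battery_correlations(scores: dict[str, list[float]]):
    """Return (r for every pair of tests, r of each test with the rest)."""
    names = list(scores)
    pairwise = {
        (a, b): pearson(scores[a], scores[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }
    with_rest = {}
    for name in names:
        others = [scores[o] for o in names if o != name]
        # Each pupil's total on all the other tests; the test itself is excluded.
        rest_total = [sum(pupil) for pupil in zip(*others)]
        with_rest[name] = pearson(scores[name], rest_total)
    return pairwise, with_rest


# Hypothetical scores of five pupils on three tests.
scores = {
    "Duties": [120, 135, 128, 150, 110],
    "Principles": [8, 12, 10, 20, 4],
    "Recognitions": [115, 140, 120, 160, 100],
}
pairwise, with_rest = battery_correlations(scores)
for pair, r in pairwise.items():
    print(pair, round(r, 3))
for name, r in with_rest.items():
    print(name, "with rest:", round(r, 3))
```

A pair with r near 1.0 flags a redundant test; a low r with the rest flags a test that measures something other than what the battery as a whole measures.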

To secure the correlation of any test with any other it is obvious that
the two tests must be taken by the same children. We have already seen
that for each child to take all the tests would have required over ten hours.
Consequently, the tests had to be divided up among different children, while
still being distributed over a wide range of population of grades five to eight.

We actually used sixty-eight groups of children for an average of two and
a half hours each. Day schools were used for this purpose, as it would
have been quite out of the question to do such extensive testing in Sunday
schools and clubs.

These sixty-eight groups were distributed as follows: 22 in a New
York suburb which contained a wide range of socio-economic levels; 19 in a
New York City public school unusually cosmopolitan in its population; 17 in
a mid-western city of some 200,000 population; 10 groups in three private
schools for boys. Some of the latter boys were in grades nine, ten and
eleven. The classes varied in size from fourteen (one group) to sixty-five
(one group), with an average of about thirty-five.

In all this the schools concerned gave us the most cordial and helpful
co-operation, without which it would have been impossible to get as much
accomplished as we did.

In Table I is given the number of pupils of each grade who took each
of the tests described in the previous article. The classes called “Opportunity
1” and “Opportunity 2” are respectively a special group of dull children of
grades five and six, and a special group of exceptionally bright children of
grade six.

TABLE I
DISTRIBUTION OF TESTS BY GRADES

Tests                      Total   Entries by grade (5, 6, 7, 8, 9-11, Opp. 1, Opp. 2)
1. Opposites                 273   123  150
2. Similarities              337   144  193
3. Word Consequences         381   188  162   31
4. Cause and Effect          760   320  226   76  150   78
5. Duties                      ?   268  246  163   53   21
6. Comprehensions            690   114   65  284  227
7. Provocations              657    63  175  168  108   67   21   55
8. Foresight A               268   120   45   38   65
   Foresight B               251   125   75   51
   Foresight C               266    88   56  122
   Foresight D               234    86  112   36
   Foresight E               290    43   48    ?
   Foresight F               304    82   56  111   55
9. Recognitions A            719   263  144   75  201   36
   Recognitions B            719   230  182  134  137   36
   Recognitions C            697   260  132   61  208   26
   Recognitions D            674   261  121  114  142   36
10. Principles                 ?   130  151  104  109   21
11. Applications             468   137   96  180   55
12. Vocabulary              1599   392  322  331  354  200
13. Good Manners             235    82   34   43   55   21

With the exception of the vocabulary test the total number of copies of
any one test did not exceed one thousand, but the total number of tests used
was 11,410. As some of the tests were four, five and six pages, the total
number of pages was 36,372. Expert examiners were used for all the
testing.
Scoring the Papers

Inasmuch as we needed not only each child’s score on each test, taken
as a whole, but also a record of how he answered each question, it was
necessary to make a complete transcription of each child’s treatment of each
item on each test. For this purpose the customary large quadrant sheets
were used, one for each group for each test. The test items were numbered
across the top and the children’s names or numbers were written at the left.

The record for a class would look something like the following, in the case
of a true-false test. The + indicates that the word true was underlined and
the — indicates that the word false was underlined.

PRINCIPLES TEST

School.      Grade.      Teacher.      Examiner.      Date.

No. of
Child    1 2 3 4 5 6 7 8 9 10 ... etc.                  R.   W.   S.
001      - - - + + + + - - - + + + - - + + + + + +
002      [illegible]
003      [illegible]
004      [illegible]

On a strip of paper having the same size squares the correct answers,
according to the scoring methods described in the previous article, were indi-
cated as follows:

Principles Key

-~ —~-—~—-++++-—-++++-
1 2 3 4 5 6 7 8 9 10 11 12 13

In the case of the true-false tests the score was taken to be the total
number right minus the total number wrong. So when the key was applied
to the child’s record as found on the sheet, the number right was counted
and entered at the right and the number wrong, likewise. Then the differ-
ence was taken and entered as the score.
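The right-minus-wrong rule can be sketched in Python by laying a pupil's row of marks against the key, as the paper strip was laid against the sheet. The function name and the sample key are hypothetical, and we assume an omitted item (shown as a blank) counts neither right nor wrong.

```python
def score_true_false(record: str, key: str) -> tuple[int, int, int]:
    """Return (R, W, S) for one pupil's row of marks against the key.

    Marks are "+" (True underlined), "-" (False underlined), or " " for an
    omitted item, which is assumed to count neither right nor wrong.
    """
    if len(record) != len(key):
        raise ValueError("record and key must cover the same items")
    answered = [(mark, ans) for mark, ans in zip(record, key) if mark != " "]
    right = sum(1 for mark, ans in answered if mark == ans)
    wrong = len(answered) - right
    return right, wrong, right - wrong


key = "--+-+"    # hypothetical five-item key
pupil = "-++ +"  # item 4 left blank
print(score_true_false(pupil, key))
```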

The other tests were treated in similar fashion, so that there was thus
available on large sheets all the basic facts concerning the tests.

Statistical Treatment of Scores
1. General Summary

The elementary facts concerning each test are given in Table II, viz.,
the mean score, the standard deviation of the scores, the range of scores,
the mean chronological age, and the mean mental age, with respect to each
of eleven tests. The total number taking each test is given in Table I. The
Foresights test is omitted from all the following tables, as it was not scored
at the time they were made. The Good Manners test is treated separately
later in the article.

TABLE II
GENERAL SUMMARY

                                                       Approx.      Mean
                                                       Mean Age   Mental Age
                         Mean     S.D.   Range        (in mos.)   (in mos.)
 1. Opposites            28.0     7.5    3-48            161         166
 2. Similarities         12.6     3.8    2-30            167         166
 3. Word Consequences    83.0    23.5    14-157          166         166
12. Vocabulary           74.0    30.0    6-140           140         163
 4. Cause and Effect     33.0      ?     -12 to +70      144         168
 5. Duties              134.5    13.0    75-155          140         161
 6. Comprehensions       18.5     3.2    4-26            156         167
 7. Provocations         43.5     6.8    ?               150         164
 9. Recognitions        130.0    24.0    35-175          150         161
10. Principles           11.0     6.0    -5 to +24       146         163
11. Applications         24.0     5.?    2-36            144         171

As has been noted, no one child took all the tests. Each child took at
least two and in some cases four or five. The tests were distributed in such
a way as to have each test taken in combination with every other test as far
as possible. Some tests, such as Similarities, Opposites, and Word
Consequences, were felt to be too difficult for the younger children and were
given to a limited range. The relative equivalence of the groups tested,
however, is shown by Table II. The lowest mean age is for the Duties test, and
the highest for the Similarities.
2. Intercorrelation Figures

The intercorrelations of the tests among themselves are given in Table
III. Where the range of ages was restricted, the coefficients have been
corrected for such restriction. None of the correlations of Table III has
been corrected for attenuation.

TABLE III
INTERCORRELATIONS*

                        2     3    12     4     5     6     7     9    10    11
 1. Opposites           x     ?  .418     ?     ?     ?  .497     x     ?     ?
 2. Similarities           .612     ?     x  .236     x  .472     x     x     ?
 3. Word Consequences            .065     x     x  .137     x  .440     x     x
12. Vocabulary                         .458  .383     ?  .276  .389  .330     x
 4. Cause and Effect                            ?  .000  .237  .500  .555  .326
 5. Duties                                          ?     ?     ?  .575  .030
 6. Comprehensions                                        ?  .400     ?  .363
 7. Provocations                                                ?  .248  .463
 9. Recognitions                                                      ?     ?
10. Principles                                                              ?
11. Applications

The gaps in the table are noted by the letter x. At these points the
test concerned was not matched by any other or the numbers taking both
were too small to be considered.

The relatively high correlation of the word tests, numbers 1, 2, 3, and
12, among themselves is clearly seen by this table, and suggests at once the
propriety of omitting those that are like the vocabulary test in their intention,
viz., the Opposites and Similarities. On the whole, these intercorrelations
are satisfactory, being dangerously high in only a few cases. No one test
is consistently high in relation to all the others.

*The reader unused to statistical terms needs only to remember that correlation
means likeness. If the coefficient is plus 1.0 between two tests, then the two tests are
alike in their capacity to measure whatever is being measured. Or if the same test is
used twice, and the correlation between the results is 1.0, then it may be concluded that
the measuring device, like a yard stick, does not change from time to time, but always
measures the same thing in the same way. On the other hand, if the correlation were
.00 or nearly .00 between two tests, as is the case of the correlations between several
of the tests of Table III, then the two tests either do not measure the same thing or
measure it very poorly. There is no relation between the score on one test and the
score on the other. The pupil standing first in one test might be anywhere at all on
the other. And in like manner, if the correlation between two trials of the same test
or two halves of the same test approximates zero, then it must be concluded that the
test is like a variable yard stick, and is useless as a measuring device.

A negative correlation, if high, shows that those who tend to stand first on one
test tend to stand last on the other.

Correlations between zero and 1.0 are indications of approximations to identity
between the two measuring instruments, or between two groups measured by the same
instrument, or between two or more performances of the same test on the same group
or in measuring the same thing. Generally speaking, a correlation of over .50 is
called high. A correlation of .90 or better is very high, and .99 means that the two
distributions are practically identical.

3. Reliability Figures

Since the tests were not in any case given to the same children twice,
it was necessary to split each test into two parts by taking the odd items
for one part and the even items for the other part, and to score each of these
parts separately. These scores were then correlated, just as though they
were the scores of two independent tests. These correlations are given in
the first column of Table IV.
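The odd-even split can be sketched in Python as follows. This is a minimal illustration; the function names and the item credits are hypothetical, and the Pearson coefficient is computed directly so that the block stands alone.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_r(item_scores: list[list[float]]) -> float:
    """Correlate pupils' totals on the odd items with their totals on the even items.

    item_scores[p][i] is pupil p's credit on item i + 1, so the slice [0::2]
    picks items 1, 3, 5, ... and [1::2] picks items 2, 4, 6, ...
    """
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    return pearson(odd_totals, even_totals)


# Hypothetical item credits (1 = right, 0 = wrong) for five pupils on six items.
items = [
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1],
]
print(round(split_half_r(items), 3))
```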

TABLE IV
RELIABILITY FIGURES

                           (1)           (2)        (3)       (4)        (5)
                        r between    Coefficient   Time     No. of    Relative
                        two halves       of        (Min.)   Elements  reliability
                         of test     reliability
 1. Opposites              .707         .828         15        65        .96
 4. Cause and Effect       .637         .778         25        90        .88
 5. Duties                 .713         .832         15       100        .95
 6. Comprehensions         .675         .805         30        30        .90
 7. Provocations           .579         .733         20        36        .90
 9. Recognitions           .664         .798         30       100        .89
10. Principles             .526         .688          5        24        .92
11. Applications           .682         .810         30        22        .91
12. Vocabulary             .960         .980         30       150        .99

The coefficients in Column 1 are, of course, lower than would be obtained
by repeating the whole test, since each half contains only half the items.
The figures in Column 2 are predictions of what the self-correlation would be
if we had correlated two forms of each test, each as long as the whole test
used.*
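The Spearman-Brown formula named in the footnote predicts the reliability of a test lengthened k times from the reliability r of the shorter version: k r / (1 + (k - 1) r). With k = 2 it turns a half-test correlation into the Column 2 coefficient; for Opposites, .707 becomes .828. A minimal Python sketch follows; the function name is ours. Applying it with k = 60/15 to the Duties coefficient reproduces the .95 shown for Duties in Column 5, but we have not confirmed that every Column 5 figure was computed exactly this way: for Opposites the same calculation gives about .95 rather than the printed .96.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability of a test lengthened k-fold, given reliability r."""
    return k * r / (1 + (k - 1) * r)


# Doubling: from the half-test r to the whole test (Opposites, Table IV).
print(round(spearman_brown(0.707, 2), 3))
# One hour of Duties: its 15-minute coefficient lengthened 60 / 15 = 4 times.
print(round(spearman_brown(0.832, 60 / 15), 2))
```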

This reliability is as high as ordinary intelligence and school achievement
testing would give for tests of the same length. The coefficients are high
enough so that if the tests were combined in a single battery taking, say, an
hour’s testing time, they would yield a reliability coefficient of .90 or better.

As the tests used were of widely varying lengths, merely comparing the
coefficients of reliability does not show the relative reliability of the type of
material used and the testing technique. By predicting what the reliability
coefficient would be if the tests were all of the same length, their comparative
reliability as procedures can be seen. Column 5† shows what the
reliability figures would be if each test took one hour. Rearranged in the
order of the reliability of the type of material and the technique used, the nine
tests appear as follows:

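Vocabulary ........................ .99
Opposites ......................... .96
Duties ............................ .95
Principles ........................ .92
Applications ...................... .91
Comprehensions .................... .90
Provocations ...................... .90
Recognitions ...................... .89
Cause and Effect .................. .88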

We can conclude without further ado that paper and pencil tests of the
sort used here measure consistently whatever they do measure.

* Computed from Column 1 by the Spearman-Brown formula.
† Computed by the Spearman-Brown formula.

4. Validity Figures

The question of what they do measure is not so easy to answer. There
are no previously validated tests with the results of which the scores on
these tests can be compared. We have no outside independent measures or
estimates of the moral knowledge of the pupils tested. Indeed, it is difficult
to see how such knowledge could be secured without some manner of testing,
unless one would wish to regard behavior as a measure of knowledge, which
was suggested in the first article as a defensible procedure.

Lacking such a criterion, the next best thing is to regard the sum of
all or a part of these tests as the best existing measure of the moral
knowledge of these pupils, and to correlate each test against the rest. Four of
the tests are word tests. The other seven contain 412 moral situations to
which the child makes some kind of reaction. Table V gives the correlations
of the sentence and vocabulary tests with this criterion.

TABLE V
CORRELATIONS WITH THE CRITERION (TOTAL OF 1-7)

                        (1)          (2)        (3)         (4)        (5)
                                                                     Increase
                     Unweighted   Weighted   Corrected   Predicted   Required
1. Cause and Effect     .440        .402        ?          .882        2.0
2. Duties               .472        .486       .544        .912        2.0
3. Comprehensions       .317        .288       .372        .897        2.11
4. Provocations         .342        .328       .421        .856        2.17
5. Recognitions         .493        .491       .581        .893        2.37
6. Principles           .502        .544       .636        .830        2.21
7. Applications         .358        .412       .418        .900        2.10
8. Vocabulary           .623          ?          ?           ?           ?
9. Intelligence         .686          ?          ?           ?           ?

Column 1 of Table V gives the correlations* between the scores on each
test and an unweighted sum of the scores on all seven of the sentence tests.

A fairer picture of the relative power of each test to measure whatever
the whole battery measures is given in Column 2, which shows the correlation
of each test with the sum of the seven when each is weighted by its
length.†
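The footnote to this paragraph describes the weighting as multiplying each test score by the ratio Total S.D. / Test S.D. Below is a minimal Python sketch, assuming “Total S.D.” means the standard deviation of the unweighted total and using population standard deviations; the function name and the sample scores are hypothetical. The effect is that every test enters the sum with the same spread, so a long test no longer dominates the total simply because its scores vary more.

```python
from statistics import pstdev


def weighted_total(scores: dict[str, list[float]]) -> list[float]:
    """Sum each pupil's scores after scaling every test by Total S.D. / Test S.D."""
    tests = list(scores.values())
    unweighted = [sum(pupil) for pupil in zip(*tests)]
    total_sd = pstdev(unweighted)
    factors = [total_sd / pstdev(test) for test in tests]
    return [sum(f * x for f, x in zip(factors, pupil)) for pupil in zip(*tests)]


# Hypothetical scores of four pupils on a short and a long test.
scores = {"Principles": [4, 10, 16, 22], "Duties": [110, 120, 150, 140]}
print(weighted_total(scores))
```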

These correlations, when corrected for attenuation, appear as in Column 3, on the assumption that the reliability of the sum of the tests is not over .90. Column 4 gives the maximum possible correlation that could be expected from each test if a true measure of whatever it measures were available, and Column 5 gives the number of times the test would have to be lengthened, or the number of times it would have to be given, to yield the predicted maximum correlations.
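Column 3 rests on the standard correction for attenuation, for which the footnotes refer the reader to texts on measurement. A minimal sketch of the classical formula is below. The reliabilities actually used for the individual tests are not given in this section, so the sketch is not meant to reproduce Column 3, and the numbers in the usage line are illustrative only.

```python
from math import sqrt

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float = 1.0) -> float:
    """Estimated correlation between the true scores of x and y.

    r_xy is the observed correlation; r_xx and r_yy are the reliabilities
    of the two measures. Leaving r_yy at 1.0 corrects for the
    unreliability of x alone.
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / sqrt(r_xx * r_yy)

# Illustrative values only: observed r of .45 with a measure of reliability .81
print(round(correct_for_attenuation(0.45, 0.81), 3))
```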

This does not mean, of course, that by using the sum of the scores as
a criterion, these high correlations could be procured by merely doubling
the tests, for the sum of the scores is probably not a perfect measure of what-
ever each test attempts to measure. But it is clear that it is not worth while
to lengthen some of the tests materially, both because of the high reliability
of the tests as they stand and because they would not give any more valid
results by being more than doubled.

We are now ready to make the new tests and to combine them into usable batteries on the basis of the facts discovered about the way they work. The question still presses, however, as to whether it is worth while to go to the expense of reconstructing the tests. What assurance is there that when they are reconstructed they will be of much significance? The only way to find out is to compare the results of the use of these tests with other facts known about the pupils tested, such as intelligence, age, conduct and home background, and see whether the moral knowledge tests add anything to what we already know or can find out in more economical ways. These relationships will therefore now occupy our attention.

* These correlations are subject to error in that the score on each test is included in the total with which it is being correlated. Owing to the unsatisfactory character of the criterion, however, it was not felt to be worth while to do the work necessary to avoid this error, which is slight in any case.

The reader is referred to standard texts on measurement for justification of this procedure, and for the correction for attenuation, the assumptions underlying which are met about as well in the case of the data under consideration as in the usual run of similar data.

† More accurately, by its standard deviation; that is, each score is multiplied by the ratio Total S.D. : Test S.D.

Relations Between Moral Knowledge and Intelligence, Age, Vocabulary,
Home Background and Conduct

1. Moral Knowledge and Intelligence

Table VI shows the relation between each test and the intelligence of
those taking it.

TABLE VI
RELATION OF INTELLIGENCE TO C. E. I. TESTS

                            r       Median Int. Score*   No. of Samples
Opposites              (illegible)         104                 269
Similarities               .664            104                 317
Word Consequences          .519            110                 346
Cause and Effect           .647            104                 672
Duties                     .402             95                 567
Comprehensions             .371            114                 363
Provocations               .145            106                 258
Recognitions               .498            105                 522
Principles                 .444            103                 258
Applications               .562            114                 278
Vocabulary                 .882            106                 234

These correlations are relatively high, particularly in the case of the
Vocabulary test and the two other word tests. Is it to be concluded that
these tests are measuring intelligence rather than moral knowledge?

Considering only the tests in Table V, it is to be noted, first, that the
correlations with intelligence are paralleled somewhat by the correlations
with the criterion of the seven combined. These parallel correlations are
as follows:

TABLE VII
INTELLIGENCE AND MORAL KNOWLEDGE

                        r with sum     r with          Partial r with sum of
                        of 1 to 7      intelligence    1 to 7, intell. constant†
1. Cause and Effect        .402           .647              .000
2. Duties                  .486           .402              .296
3. Comprehensions          .288           .371              .090
4. Provocations            .328           .145              .338
5. Recognitions            .491           .498              .240
6. Principles              .544           .444              .304
7. Applications            .412           .562              .003
8. Vocabulary              .626           .882              .000
9. Sum of 1 to 7                          .686

 

* Obtained by other tests on these pupils administered by the Inquiry and involving
highly standardized intelligence test material developed by the Institute of Educational
Research. The score is a point score.

† The reader is referred to standard texts for the interpretation of the partial correlation technique. Roughly, the interpretation of Table VII is this: The children tested were of many degrees of intelligence. They also made many different scores on the moral knowledge tests. Column two shows how closely their position on the intelligence test corresponded with their position on the moral knowledge tests. Now if they had all been of the same intelligence, then the relation between their position on each test as compared with the sum of their scores on all the tests combined would be as shown in Column three.

 

In the second place, the correlation between the sum of the first seven
and intelligence is .686. This is almost as large as the correlation between
many intelligence tests. Apparently intelligence is a large factor in making
a score on one of these tests.

Whether it is the only factor, we could say with more certainty if there were some adequate outside measure of moral knowledge.

Assuming once more that the sum of the seven tests affords something of a criterion, we can partial out the factor of intelligence from the correlations we have already found with it. These partial correlations, showing the relation between each test and the sum of the seven when intelligence is kept constant, are shown in the third column of Table VII.

Except for three of the tests, viz., Provocations, Principles and Duties,
these partials suggest one or another of the following conclusions: First,
intelligence is the factor which determines the score on the remaining tests;
or, second, intelligence is as much a factor in the sum as in any single test,
except the three named; or, third, the criterion is not a measure of moral
knowledge, and consequently, when intelligence is partialled out the corre-
lations disappear; or fourth, there are other factors entering in, which are
not yet accounted for.

With regard to the first two of these possibilities it should be remem-
bered that whatever dependence on intelligence there is in the separate tests
is not removed by merely adding the tests together. The low correlations
with intelligence constant, therefore, merely point to the third and fourth
possibilities, and further light will be thrown on these in what follows. Four
factors that seem to enter into the causal relation under consideration will
be discussed: age, vocabulary, home background and conduct.

2. Moral Knowledge and Age
Table VIII shows the relation between each test and the age of those taking it.
TABLE VIII
AGE IN RELATION TO THE TESTS

                            r      No. Samples   Mean Age   Age Range
1. Opposites              -.094        270          161     11.0-16.5
2. Similarities           -.068        304          167     11.5-18.0
3. Word Consequences       .204        336          165     10.5-18.0
4. Cause and Effect        .473        700          144      8-19
5. Duties                  .138        624          140      8-15
6. Comprehensions          .416        372          156      8.5-18.5
7. Provocations           -.097        259          150      8.5-17.0
8. Recognitions            .172        462          150      8.5-18
9. Principles              .026        260          146      8.0-16.0
10. Applications           .183        292          144      8.0-16.0
11. Vocabulary            -.091        240          140      8.0-16.0

The low correlations in this table are quite startling. Only two tests,
Cause and Effect and Comprehensions, show significant correlation with age.
That is, the older children do not do any better on the other tests than
the younger children do. They may answer more questions, but their scores
in relation to the standards used in scoring the papers are no higher. This
may be regarded as a criticism of the standards or it may be valuable evi-
dence regarding the way moral knowledge develops. In any case, whatever
the tests measure, and they evidently measure something very well, this thing
does not increase with the age of the pupils in grades five to eight.

3. Moral Knowledge, Age and Intelligence

Assuming that chronological age is a rough measure of experience, it
is worth while to note what the effect on the correlations with age and
intelligence is when each is in turn kept constant. Table IX gives the facts
for four tests and for one school situation. In this situation and for the
ages concerned, the correlation between age and intelligence was .396.

TABLE IX
CORRELATIONS OF AGE AND INTELLIGENCE WITH MORAL KNOWLEDGE

                                                  Age,            Intelligence,
                     Age     Intelligence    Int. constant     Age constant
Provocations        -.097        .145            -.170              .200
Applications         .183        .562            -.053              .543
Comprehensions       .416        .371             .316              .247
Causes               .473        .647             .310              .569
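The partials in this table, as in Tables VII, X and XII, are first-order partial correlations. Below is a short Python sketch of the standard formula. It uses the Causes row of Table IX and the age-intelligence correlation of .396 given above. The results come out at about .310 and .568, against the printed .310 and .569; the small gap is presumably rounding in the published coefficients.

```python
from math import sqrt

def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation of x and y with z held constant (first-order partial)."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

R_AGE_INT = 0.396                        # age vs. intelligence in this school situation
causes_age, causes_int = 0.473, 0.647    # Causes row of Table IX

# test with age, intelligence constant: x = test, y = age, z = intelligence
age_int_constant = partial_r(causes_age, causes_int, R_AGE_INT)
# test with intelligence, age constant: x = test, y = intelligence, z = age
int_age_constant = partial_r(causes_int, causes_age, R_AGE_INT)
print(f"{age_int_constant:.3f} {int_age_constant:.3f}")
```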

Careful study of this table brings out certain interesting comparisons. The answers to the Provocations situations seem not to be functions of either intelligence (native ability) or age (experience). Reference to the test may throw light on this. The items are like this:

Henry saw a big bully strike a little boy, so Henry walked up and gave the bully a real hard blow and knocked him down. Was this right, entirely wrong, or, if wrong, then excusable?

It is probable that judgments here are largely matters of attitude, rather
than of cold analysis, and that these attitudes are picked up early in life
and do not change much, at least not while the individual is in the grades.
We have found by experiment that in the case of adults, answers to problems
on this test are most tenaciously clung to once they are given, whereas
answers to items on the Cause and Effect test are readily changed when
other possible answers are suggested.

In the case of Applications, on the other hand, we seem to have some-
thing more nearly like pure intellect at work, for when intelligence is par-
tialed out the correlation with age is zero, whereas, when age is partialed
out the correlation with intelligence remains high. Experience affects the
matter not at all.

In the case of Comprehensions, both age and intelligence hold their own
in the partials, suggesting that knowledge of what to do in a situation involv-
ing ethical alternatives is a function of both intelligence and experience. In
the Causes test, intelligence proves stronger than experience, though age
retains a significant partial correlation. Although this correlation of .569
with I.Q. is larger for Causes than for Applications, comparison with the
third column will show that the Applications correlation is more exclu-
sively I.Q. than is that of Causes. That is, the Applications is more nearly
an intelligence test pure and simple. But the reader is asked to suspend
judgment on this point until other factors have been presented and discussed.

It certainly would appear possible to develop a series of tests each of
which would measure a type of judgment different from that of every other,
thus discriminating the influences of temperament, attitudes, knowledge and
skill in handling moral problems. Also the recency of experience as it is
registered in fixed attitudes or flexible opinions might be revealed by the
nature of the test, as seems to be the case in Provocations as contrasted
with Causes.


4. Moral Knowledge, Vocabulary and Intelligence

The moral knowledge tests all involve the use of words, and to score high in them a child must have an adequate vocabulary. This statement is borne out by reference to Table II, which gives the correlations between each test and the social-ethical vocabulary test. These run from .276 to .587 for the sentence tests. The correlation between the sum of the first seven and vocabulary is .623. But the correlation between the first seven and intelligence is .686, and the correlation between vocabulary and intelligence is .882. Since these correlations are all positive and high, it is impossible to tell which is the predominating factor, as the partial correlation of any two with the third constant is approximately zero. All we can say is that moral knowledge, vocabulary and intelligence are closely interrelated, recognizing that there are also other factors entering into the situation which may be more significant than either vocabulary or intelligence in determining scores on the moral knowledge tests. Certainly we are not justified in attempting to substitute either a social-ethical vocabulary test or an intelligence test for a moral knowledge test.

There are two arguments against such a proposal. In the first place,
the correlations concerned (.623 and .686) are not large enough to predict
from either an intelligence score or a vocabulary score what the score on
the moral knowledge test would be.

The second argument involves the question of whether a special ability
test can be substituted for a general ability test. If moral knowledge is an
achievement analogous to arithmetical ability, or, better, some other field of
knowledge, then it is a function of both intelligence and experience. Any
good measure of a specific school ability will correlate highly with intelli-
gence. Yet no test of a single school achievement can be used as a sub-
stitute for an intelligence test, nor can an intelligence test, unless it con-
tain material drawn specifically from a given field, supplant a test of achieve-
ment in that field. It may be noted in this connection that one of the tests
which has the least relation to intelligence as shown in Table IX, neverthe-
less has one of the strongest correlations with the criterion.

5. Moral Knowledge and Home Background

The Good Manners test is not a validated measure of home background.
Good manners are generally regarded as the product of home background,
however, so that the relation of this test to the moral knowledge tests is
interesting. The figures are given in Table X.

TABLE X
GOOD MANNERS AND MORAL KNOWLEDGE

                          r          M        N
1. Causes               .174       62.6      43
2. Duties               .302       55.       49
3. Comprehensions       .443       58.0      86
4. Provocations         .396       54.0      50
5. Recognitions         .478       53.0      42
6. Principles           .438       55.0      62
7. Applications         .274       68.0      43
8. Vocabulary           .720       58.8     199
9. Intelligence         .583       58.8     201
10. Age             (illegible)    58.6     209
11. Sum of 1 to 7       .560

The correlations between good manners and vocabulary, age and intelligence all run higher than most of the corresponding correlations between the separate moral knowledge tests and these three factors. Partialling out intelligence, we find a remaining coefficient of .271 between the Good Manners test and the sum of seven tests used as a criterion of moral knowledge. These coefficients indicate that the same factors that lead to a knowledge of right and wrong lead also to a knowledge of etiquette.

6. Moral Knowledge and Conduct

A. School conduct ratings.

In one school reasonably good conduct ratings were secured which were
compared with scores on the Moral Knowledge test material given in that
school, with the results shown in Table XI.

TABLE XI
CONDUCT RATINGS AND MORAL KNOWLEDGE

                            B, C, D         A
                          Med. Score    Med. Score
1. Causes                     27            26
2. Duties                    140           145
3. Recognitions              135           138
4. Comprehensions             19            19
5. Provocations               45            43
6. Principles                 10            14
7. Applications               23            24

The first column gives the median score of those receiving a deport-
ment grade of B, C or D, and the second column gives the median scores
of those whose conduct grades were A. The differences are not significant.
It is to be remembered, however, that these conduct grades are teachers’
marks and not objective measures of conduct.

B. Moral Knowledge and Cheating.

Only one school is included in the following comparisons, which must
be regarded therefore as suggestive rather than final. In the case of the
behavior called cheating, the inquiry had available objective measures rather
than judgments, procured by a technique which yields a reliability well
over .80.

TABLE XII
RELATION OF CHEATING TO MORAL KNOWLEDGE

                                 r           r          Partial r
                               Home        School      School Cheating,
                               Cheating    Cheating    Int. Constant
1. Cause and Effect             +.031       -.054          .000
2. Duties                       -.178       -.296         -.164
3. Comprehensions               -.018       -.301         -.182
4. Provocations                 -.129       -.241         -.202
5. Recognitions                 -.091    (illegible)       .000
6. Principles                   -.088       -.247         -.089
7. Applications                 -.066       -.402         -.239
Sum of 1 to 7                   -.121       -.385
Partial r, cheating and int.,
  moral knowledge constant      +.094       +.037
Cheating and intelligence       -.201       -.392

The sum of the scores used in the table is unweighted. Just why there should be a higher correlation between moral knowledge and cheating in school than at home is not apparent.

It happens that the Cause-effect test correlates low with cheating and high with the other tests. If we omit this test from the sum, the correlation between cheating and the sum of the remaining six becomes -.537. This is higher than many would expect, and the first thought is that it is due to the common factor of intelligence, as would be suggested by the correlation of -.392 shown in the table. The partial correlation, however, between cheating and the sum of the six moral knowledge tests (Cause-effect omitted) when intelligence is kept constant is -.402 for school cheating. (r intelligence and sum of six, .778.)

The last column gives the partials of each moral knowledge test and
cheating, with intelligence constant. Here the Applications test shows up
best, although, as previously noted, it has a very large element of intelli-
gence in it. That there is hardly any relation between it and age when
intelligence is constant is in keeping with the fact that cheating and age
correlate zero also.

Another related fact is shown in the partial correlation between intelli-
gence and cheating when moral knowledge is kept constant, which becomes
either zero or positive. This suggests that it is their superior moral knowledge rather than their intelligence which makes the brighter children cheat less than those less gifted, and that, granted the same amount of moral knowledge, the more intelligent would cheat even more than the dull pupils.

These preliminary studies in the relation of the scores on the moral
knowledge tests to other factors besides intelligence, viz., age, vocabulary,
home background and conduct, give sufficient evidence that something more
is being measured than merely intelligence to justify further experimenta-
tion. Consequently our next task is to sift the old forms for the most
valuable material and construct new forms which will give maximum results
with a minimum expenditure of time.

Building New Forms

It was stated early in the paper that the tests as they stand would
require several hours of testing time. There is evidently plenty of material,
and our first task is to cut it up into usable portions, eliminating what is
of no use, and arranging the remainder in test forms adapted to ordinary
test conditions. The process of elimination, dissection and reassembly in-
volved the following steps:

1. Items with ambiguous answers or with localized answers were
thrown out.

2. Items on which ninety percent. of the children agreed with the
standard were thrown out as failing to distribute the subjects. These were
too easy to use as measures.

3. Tests correlating highly with intelligence and having no independent
value were thrown out. The Opposites and Similarities tests came under this
head. They correlate most highly with intelligence with the exception of the
vocabulary test, and so far as vocabulary is concerned, the test of this name
serves the purpose sufficiently. Unlike the Applications test, these two do
not show up particularly well in their interrelations with other factors. The
vocabulary test is retained because of its high reliability and because of the
opportunity it gives of measuring the extent to which moral knowledge
scores are matters of vocabulary.

4. Each test was split into two forms, each having the same number of items, each item of one form being of the same order of difficulty as the corresponding item of the equivalent form.

5. Including the Foresight test, of which the eight items eliciting the widest range and largest number of responses were retained, there were now ten tests of two forms each, requiring ninety minutes of testing time. So the ten tests were put into two scales of five tests each.
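Steps 2 and 4 are mechanical enough to sketch. The Python below assumes each item's difficulty is the percent of pupils agreeing with the standard, as described further on. It drops items above a 90 percent cutoff; the third article words the rule as excluding items with more than 90% agreement. It then builds two forms by pairing items adjacent in difficulty. The pairing rule, the alternation of the harder member between forms, and the item names are illustrative assumptions, not the Inquiry's documented procedure.

```python
def drop_easy_items(difficulty: dict[str, float], cutoff: float = 90.0) -> dict[str, float]:
    """Step 2: keep only items on which no more than `cutoff` percent agreed."""
    return {item: pct for item, pct in difficulty.items() if pct <= cutoff}

def split_into_forms(difficulty: dict[str, float]) -> tuple[list[str], list[str]]:
    """Step 4 (assumed pairing rule): one member of each difficulty pair per form.

    Items are ordered from hardest (lowest percent agreeing) to easiest and
    neighbours are paired. The harder member of successive pairs goes to
    alternate forms so neither form is systematically harder. An odd item
    left at the end is set aside so both forms keep the same length.
    """
    ordered = sorted(difficulty, key=difficulty.get)
    form_1: list[str] = []
    form_2: list[str] = []
    for pair_no, i in enumerate(range(0, len(ordered) - 1, 2)):
        harder, easier = ordered[i], ordered[i + 1]
        if pair_no % 2 == 0:
            form_1.append(harder)
            form_2.append(easier)
        else:
            form_1.append(easier)
            form_2.append(harder)
    return form_1, form_2

# Hypothetical items with the percent agreeing with the standard
items = {"a": 95, "b": 20, "c": 52, "d": 74, "e": 30, "f": 60}
form_1, form_2 = split_into_forms(drop_easy_items(items))
print(form_1, form_2)
```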

The first three steps are clear, but the last two need explanation. The
reader is requested to refer to the illustration given at the beginning of the
paper showing how the answers to each item were recorded. To find the way
each item was treated, it was only necessary to add up for each column the
number of each kind of answer given. As the records were tabulated by
groups, this gave for each group tested the total number of each kind of
answer given to each item. The record for one group for one test would be:

PRINCIPLES TEST
[Record for one group: the number of each kind of answer given to items 1 to 10, etc. The figures are illegible in this copy.]

The group results were combined into grade results, and the grade
results into totals, and the totals translated into percentages, so that the final
sheets looked like this:

PRINCIPLES TEST

Item              1          2          3          4          5       etc.
               +     -    +     -    +     -    +     -    +     -
Percentages   26    74   80    20   48    52   32    68   86    14

 

These results were then compared with the standard and the percent.
agreeing with the standard was indicated, and made the measure of the diffi-
culty of each item. These difficulty values were then placed on the original
test opposite the items, and the percentages giving each kind of answer were
indicated above the multiple choice response offered in the test. For each
test, therefore, there was a basic work sheet such as the following:

PRINCIPLES TEST

% correct                                                       % answers
                                                                 26    74
   74    1. If anyone hurts you it is up to you to hurt him.     True  False
                                                                 80    20
   20    2. No one should be forgiven for a wrong deed until
            he has asked for forgiveness.                        True  False
                                                                 48    52
   52    3. It is best to have nothing to do with an unpopular
            boy or girl.                                         True  False
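The tabulation described above can be sketched as follows. Group tallies are summed into grade and total tallies and turned into percentages. The percent agreeing with the standard is then taken as the item's difficulty. The tallies in the usage lines are hypothetical. They were chosen so that item 1 comes out at the 26/74 split shown on the work sheet, with False as the standard answer, since 74 is printed as its percent correct.

```python
def combine_tallies(records: list[dict[str, int]]) -> dict[str, int]:
    """Add the answer counts of several groups (e.g. classes into a grade)."""
    combined: dict[str, int] = {}
    for record in records:
        for answer, n in record.items():
            combined[answer] = combined.get(answer, 0) + n
    return combined

def percentages(tally: dict[str, int]) -> dict[str, float]:
    """Convert the answer counts for one item into percentages of all answers."""
    total = sum(tally.values())
    return {answer: 100 * n / total for answer, n in tally.items()}

def percent_correct(tally: dict[str, int], standard: str) -> float:
    """Difficulty measure: percent of answers agreeing with the standard."""
    return percentages(tally).get(standard, 0.0)

# Hypothetical tallies for item 1 from two groups
item_1 = combine_tallies([{"True": 10, "False": 30}, {"True": 16, "False": 44}])
print(percentages(item_1), percent_correct(item_1, "False"))
```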

The selection of equivalent items and the arrangement of these items in
their order of difficulty became then a mechanical matter.

In deciding which of the ten tests should go into each of the two scales, reference was made to the correlation tables. The tests were divided so that in each scale the average intercorrelation between the tests was at a minimum and at the same time the correlation between the two scales was at a maximum. The scales, with the number of elements and the time required for each test, are shown in Table XIII.

TABLE XIII

Scale A                     T    E      Scale B                     T    E
1. Cause and Effect         9   37      1. Foresight               12    4
2. Duties                   5   30      2. Recognitions            10   43
3. Comprehensions          10   10      3. Principles               2   10
4. Provocations            10   17      4. Applications            10   10
5. Word Consequences       10   16      5. Ethical Vocabulary      10   50
   Total                   44              Total                   44

The column marked T is the time required for the test, in minutes, and the column marked E is the number of elements. These figures are the same for each form.

From our preliminary data we estimate that Scale A will correlate about .90 with Scale B and that each scale will have a reliability of over .90. Each scale will correlate about .60 with intelligence, and the two scales combined will correlate around .70 with intelligence.
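The rule for dividing the ten tests between the two scales was to keep intercorrelations within each scale low and the correlation between the scales high. It can be sketched as a search over all even splits. The Python below is one hedged reading of that rule. It treats the tests as standardized, so that the correlation between two scale totals follows from the intercorrelation matrix alone. It then picks the split with the highest between-scale correlation and reports the mean within-scale intercorrelations for inspection. The Inquiry's actual weighing of the two aims is not stated, and the 4-test matrix in the usage lines is made up.

```python
from itertools import combinations
from math import sqrt

Matrix = list[list[float]]

def scale_correlation(r: Matrix, a: tuple[int, ...], b: tuple[int, ...]) -> float:
    """Correlation between the unweighted sums of standardized tests a and b."""
    between = sum(r[i][j] for i in a for j in b)
    var_a = sum(r[i][j] for i in a for j in a)   # diagonal 1s included
    var_b = sum(r[i][j] for i in b for j in b)
    return between / sqrt(var_a * var_b)

def mean_within(r: Matrix, group: tuple[int, ...]) -> float:
    """Average intercorrelation among the tests of one scale (0.0 if single test)."""
    pairs = list(combinations(group, 2))
    return sum(r[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0

def best_split(r: Matrix):
    """Try every division into two equal scales; keep the highest r between them."""
    n = len(r)
    best = None
    for a in combinations(range(n), n // 2):
        if 0 not in a:            # count each split once: test 0 always in scale A
            continue
        b = tuple(t for t in range(n) if t not in a)
        r_ab = scale_correlation(r, a, b)
        if best is None or r_ab > best[0]:
            best = (r_ab, a, b, mean_within(r, a), mean_within(r, b))
    return best

# Made-up intercorrelations: tests 0-1 and tests 2-3 are near-duplicates
r = [[1.0, 0.8, 0.2, 0.2],
     [0.8, 1.0, 0.2, 0.2],
     [0.2, 0.2, 1.0, 0.8],
     [0.2, 0.2, 0.8, 1.0]]
print(best_split(r))
```

With near-duplicate tests, the search puts one of each pair in each scale, which is the behaviour the text aims for: low intercorrelation within a scale, high correlation between scales.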

This general scheme seems to us to be the best combination of statistical reliability and practical administration. Either scale could easily be given in one hour, which is a period and a half in day school. A boys' club or Sunday-school class, or any such organization, might give one scale at one period and, if desired, the other scale at a second period, or it might give only one. Each scale has two equivalent forms, so that a test might be given twice with a time interval between for purposes of comparison.

THIRD ARTICLE
THE CODE VALUE OF MORAL KNOWLEDGE SCORES*

In the first two articles of this series the process of building a test of
moral knowledge was described. This process included the selection of ma-
terial, its organization into test forms, the administration of the tests, methods
of scoring, the determination of the reliability of the material, and preliminary
studies in its value as a research tool. The second article closed with a ref-
erence to two scales of two forms each which were projected as the most
useful arrangement of the material thus evaluated.

These scales were printed as planned. The material of each type tried
out and found usable was arranged in an order of difficulty and divided into
two parts of equal range of difficulty and almost equivalent item for item.
These two parts of each test were placed one in each form, which made the
two forms of each scale of identical difficulty and equivalent material.

The ten tests selected were also divided into two groups of five each so
that the two scales would take about the same length of time and correlate
in about the same way with the sum.

Scale A included a Word Consequences Test somewhat modified from
the original form and therefore unevaluated. Scale B contained the Fore-
sight Test also not as yet statistically treated. The complete arrangement
was given in the previous article.

The work remaining to be done on this material as thus constituted re-
lated to the following problems:

1. What are the statistical values of the two relatively new tests intro-
duced—Foresight and Word Consequences?

2. How will the reliability of the material be affected by its present ar-
rangement in a battery, involving not only the reduction in length of test
(a calculable effect) but also a certain sequence of tests?

3. To what extent are answers to the questions determined by the temporary emotional set of the occasion on which the tests are administered?

4. To what extent are answers to the questions merely reflections of the
opinions which the children think are approved by the authorities under whose
auspices the tests are given?

5. What are the major sources of the knowledge or quasi-knowledge
the children exhibit on the tests?

6. What codes characterize children of different groups—age, sex, race,
community, culture, etc.? How do these compare with codes of adults of
the same or other communities and groups—teachers, parents, Sunday-school
and club leaders, etc.?

7. What norms in terms of scores can be built up for practical compari-
sons of individuals and groups?

8. What further light can these tests throw on human behavior as measured by other techniques employed by the Character Education Inquiry?

9. What is the best arrangement of the present material in test form?

10. What new tests are still needed?

I. Problems 1-3

Present knowledge permits us no comment on the Foresight test, as contemplated work on this material had to give place to other interests. The results obtained on the Word Consequences test confirmed our first impressions that this made too little contribution to the battery to justify retaining it. It is too difficult, for one thing, and also too much like an intelligence test. Neither of these tests, therefore, is included in the moral knowledge scores to be reported. So much for Problem 1.

* In the study of the Shift Technique the investigators had the cooperation of Mr. Leonard Stidley, a graduate student at Teachers College.

With regard to Problems 2 and 3 some information has been incidentally
gathered which will need to be supplemented before a final battery of tests
is offered for general use. We find that when the shortened tests are given
on different days in the two forms into which the original material was split,
the actual correlations run slightly lower than the predicted correlations. The
following table gives the details.

 

 

TABLE I

                      ------- This year -------   ------- Last year -------   This year
Test                    r       N     Elements      r       N     Elements    corrected
Scale A
Cause-Effect          .441     150       36        .637    378     45-45        .496
Duties                .400     150       29        .218    154     50-50        .571
Comprehensions        .400     150       10        .675    355     15-15        .500
Provocations          .583     150       17        .579    256     18-18        .583
Scale A               .751                         .881                         .821
Scale B
Recognitions          .653     185       43        .664    100     50-50        .653
Principles            .348     185       10        .526    331     12-12        .348
Applications          .522     185       10        .682    329     11-11        .522
Vocabulary            .896     185       50        .960            75-75        .928
Scale B               .862                         .895                         .864
Scales A and B        .900                         .945                         .915

The sum of the first four tests of Scale A (Cause-effect, Duties, Comprehensions and Provocations) shows a reliability of .751 as against a predicted reliability of .881. The last four tests of Scale B (Recognitions, Principles, Applications and Vocabulary) show a correlation of .862 as against a predicted correlation of .895. Had these tests been as long as those used last year for estimating the expected r's, the r's this year would have been respectively .821 for Scale A and .864 for Scale B. The sum of the two scales together last year yielded a predicted r of .945. The actual r this year is .90, with .915 as the r that would have been obtained had the tests not been reduced in length.
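The "corrected" figures depend on the Spearman-Brown prophecy formula, which predicts a test's reliability when it is lengthened k times. The minimal Python sketch below assumes k is last year's elements per form divided by this year's. On that assumption it reproduces the Cause-Effect, Comprehensions and Vocabulary entries of Table I to within .001. The remaining rows are not reproduced by this simple rule; several of them print this year's r unchanged.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability of a test lengthened k times (k may be below 1)."""
    return k * r / (1 + (k - 1) * r)

# (r this year, elements per form this year, elements per form last year), Table I
table_i_rows = {
    "Cause-Effect": (0.441, 36, 45),
    "Comprehensions": (0.400, 10, 15),
    "Vocabulary": (0.896, 50, 75),
}

for name, (r, now, then) in table_i_rows.items():
    # k = ratio of last year's length to this year's length (assumed)
    print(f"{name:15s} {spearman_brown(r, then / now):.3f}")
```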
The lower reliabilities may be explained in part by the fact that in building the tests out of the material used last year we deliberately excluded all items on which more than 90% of the answers agreed with the criterion. If this operated to cut down the correlations between the two forms, we would expect the greatest differences in the case of tests where the cuts were greatest. With the exception of the Principles test, this seems to be what happened.

Another factor tending to reduce the correlations was the difference between the two situations. The drop in the case of the relatively stable Word Knowledge test lends support to the suggestion that great pains need to be taken to keep the situation as constant as possible when tests of this sort are given on different occasions. It happens that the above figures were obtained on the populations on which the shift technique was employed. Doubtless when the examiner returned on a second day to give the other set [...] effect from the manipulation permitted on the [...], which tended to inter[fere ...] of the means and SD's [...] equivalent tests exhibited on the separate occasions. Other things being equal, the means and SD's should have been the same, as the material which went into the tests was statistically equivalent to start with. As a matter of fact, both the means and the SD's shifted. Using the index of variability, SD/M, as a measure of this change, we find the following facts, where V1 = the variability of Form 1 and V2 = the variability of Form 2.

TABLE II

                        V1        V2      V1 - V2
Causes                1.930     1.50       .430
Duties                 .164      .140      .024
Comprehensions         .383      .246      .137
Provocations           .260      .151      .049
Recognitions           .242      .222      .020
Principles            1.000     1.000      .000
Applications           .405      .284      .121
Vocabulary             .516      .563     -.047
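Here is a minimal sketch of the index of variability used in Table II, the SD divided by the mean, together with the V1 - V2 comparison between forms. Whether the Inquiry used the population or the sample SD is not stated. The sketch assumes the population SD, and the score lists are hypothetical.

```python
from statistics import mean, pstdev

def index_of_variability(scores: list[float]) -> float:
    """SD divided by the mean (population SD assumed)."""
    return pstdev(scores) / mean(scores)

def variability_shift(form_1: list[float], form_2: list[float]) -> float:
    """V1 - V2: positive when Form 1 scores were relatively more spread out."""
    return index_of_variability(form_1) - index_of_variability(form_2)

# Hypothetical scores of the same pupils on the two equivalent forms
form_1 = [2, 4, 4, 4, 5, 5, 7, 9]
form_2 = [4, 4, 5, 5, 5, 5, 6, 6]
print(round(index_of_variability(form_1), 3), round(variability_shift(form_1, form_2), 3))
```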

From this it can only be concluded that something was affecting the
behavior of the children which caused them to take the second test in a some-
what different way from that of the first day. This together with the cuts
mentioned might easily account for the reliabilities being lower than predicted.

Evidently four tests are not a sufficient basis for measuring codes with
this material. Not less than eight are needed, and the situations must be
carefully guarded against varying attitudes which might influence the behavior
of the pupils.

II. Problems 4-6

Underlying problems four to six is a fundamental question concerning
the nature of moral knowledge. There is a sense in which the very term
contradicts itself, for morality, so far as it possesses an intellectual aspect,
is more a matter of judgment than of knowledge. There are no “right” data
or “wrong” data to be known. There is only the right use of data, or the
wrong use of data. One may know the data involved and this knowledge
may be tested. But knowing the “right” means knowing what is called right
—what others think is right. These data are the opinions of others or one’s
own opinions. They are not comparable to the “facts” that are the objects
of knowledge in the usual meaning of the term.

In the case of facts, one’s opinion is of small moment. The veridity of
the fact is not dependent on any single man’s opinion regarding the impor-
tance of the fact. Ford cars are made in Detroit whether the child that
takes the test containing the statement agrees or not. But in the case of ques-
tions involving moral principles, such as whether one should steal when
starving, the relevant knowledge has to do with laws, consequences, concepts
of society, the ethics of property, and we are at once in the field of opinion.
It is well to “know” what the current opinions are, and to know also what
one’s own opinion is, if one has any independent opinion, but as opinions vary
from time to time and from group to group, and as opinion as to what is the
prevailing or the best or standard or conventional opinion is rarely based on
scientific study, the answers to moral knowledge tests cannot be treated in
the same way as answers to general information tests, where the data to be
reported on may be verified.

As social science develops and more becomes known concerning the nature of human relations and the laws of social behavior, a larger amount of material essential for the intelligent guidance of conduct will become organized into a body of knowledge comparable to historic fact or scientific fact, and the individual’s mastery of it can then be tested. But at present such insight as we have into the laws of conduct is largely esoteric, the prophetic assertion of moral leaders, to be taken on faith rather than to be regarded as scientifically established. Fortunately such insights have been available for the guidance of men, for only very recently has the field of morality been opened up to scientific study. Now that people are increasingly willing, however, to make their behavior the subject matter of scientific investigation, it is essential that the ethical dicta of the great moral leaders as well as the conventional codes of those less inspired be regarded not as final immutable laws but as hypotheses worthy of careful study and application.

Our Cause-effect test and the Foresight test, which has yet to be perfected, are in this field of verifiable fact, and the Vocabulary test may be similarly classified. The rest, however, belong to the field of codes. But whose codes?

Some codes, such as “conventional morality,” are rather vague in outline
and specification. Others, such as the moral theology of the Catholic Church,
are entirely precise and clean-cut. In general, where codes are the result of the accumulation of varied experiences, they are vague, and when they are deliberately formulated, as in the Boy Scouts, they are definite. Any child is a member of the community, with its conventional codes of which he has been made aware through parental and school discipline. He is also a member of the fraternity of youth, with youth’s own heritage of attitudes and practices. In addition he is a member of some section of the community, such as the exclusive residence section of the wealthy, or the more humble and congested district where the pressure of economic necessity lays its heavy hand on every family. Besides all this, he may be a member of a church which teaches definite formulations of right and wrong, and of a club which has also its own formal code. In consequence of these varied and overlapping social contacts, the child’s code is naturally a complex affair. Who can say what part of it, if any, is truly his own? On any one occasion, which code is functioning? The acquisition of a stable personal code is probably a matter of slow growth, and doubtless many never achieve one. “When in Rome do as the Romans do” is a self-preservative caution which necessitates the adoption of the approved code, at least in words, and, as far as necessary, in acts.

How, then, are the answers to the moral knowledge tests to be inter-
preted? What code is represented? To what extent are the children’s an-
swers the Roman answers, and to what extent are they the Barbarian answers
of the casual visitor to the City? We have made an effort to throw a little
light on this problem.

The Shift Technique

In a suburban community where all the children in grades five to eight
were given the Moral Knowledge tests, certain groups were handled in such
a way as to elicit maximum “reflected” answers on the tests. That is, acting
on the assumption that the answers were conscious efforts to say what would
be approved, we defined what would be approved so that there would be no
ambiguity about it in the minds of the pupils. Four techniques were used:

1. The SAA Technique. This involved the use of an answer sheet on
which the standard answers were printed. The children first took one form
of the Moral Knowledge tests. Then the answer sheets were passed out and the children were given a chance to change their answers if they wanted to,
so they would correspond with the answers approved by the adults. The rele-
vant parts of the directions were as follows: Before the tests were passed
the examiner said:

“These are called ‘tests’ because they are to be used to find out what anyone knows
about right or wrong. But what is right and what is wrong? We realize that some
things that parents and teachers and older people call wrong children may not think
wrong at all. And grown people sometimes ask children to do things which the children
think are wrong.

“Therefore we want to find out what children really think. We want you to help
us by giving us what you think are right answers.

“It may help you to know what some of the best informed grown people to whom
these questions have been given think about the answers. You will probably disagree
with them on many points. If you do, we want you to feel perfectly free to give your
very own opinion. No one is going to think the worse of you for that.

“So after we take the test, I will pass out a sheet showing how some teachers and
parents and other people would answer the questions.”

Red pencils were used in taking the test. Then the pencils were collected
and blue ones passed. These directions followed:

“Now you can compare your answers with those of the answer sheets which show
the opinion of some of the best informed people, such as parents, teachers and others.

“Now look at the first test. Compare your answer with that given on the answer
sheet. If your answer is the same as the answer sheet it means that you agree with these
people. If it is different it means that you don’t agree with them. Read your questions
and see whether you have changed your mind. It will make no difference to your stand-
ing whether your answer is like the key or not. But if you feel on second thought that
the answer sheet is right you may change your answer. If you do change your answer
cross out the red mark and make a new mark showing what you now think. . . . Etc.”

The changes made are distinguished by the difference in the pencil used, and are presumably in direct proportion to the pupils’ desire to give an approved answer.

2. The SBA Technique. This also used answer sheets, only this time
they were passed out with the test so that the pupils had them before them
as they worked. The directions began as for SAA, the last two paragraphs
reading:

“It may help you to know what some of the best informed grown people to whom
these questions have been given think about the answers. You will probably disagree
with them on many points. If you do, we want you to feel perfectly free to give your
very own opinion. No one is going to think the worse of you for that.

“So when I pass out these papers I will pass out a sheet showing how some teachers
and parents and other people would answer the questions.”

After the tests were passed the examiner said:

“Now remember that what we want is your very own opinion. It will make no
difference to your grade whether you agree with these answers or not. If you do agree
of course your paper will be marked like the answer sheet. If you have some other an-
swer, never mind the answer sheet at all, but give your own answers. Etc., etc.”

In this technique the extent to which the answer sheets were used as a
substitute for original opinions is indicated by the correlation between the
scores as thus obtained and those secured from the use of the other form of
the same grade which was subsequently given without the answer sheets, or
by a comparison of the respective means.

3. The SS Technique. Here the standard answers were entered in red
on the tests themselves, and these were passed to the pupils with the request
that they correct them if they were wrong. The answers were stated to be
those of the “best people.” The relevant directions were:

“On this test you are going to change places with your teacher. Usually grown
people give children tests and then mark them as they see fit. This time some grown
people have taken a test and I am asking you to correct their papers. The way these
grown people marked the papers is shown here,”—point.


Pass the Form 1 marked papers. “Fill in the blanks on the front page. Now open
to test 1. The people who marked these papers this way were teachers, parents and
others who are supposed to know what is right and wrong. But I want you to point
out their mistakes, or at least where you think they are wrong. If you think they have
answered a question correctly, mark it C. If you think they are wrong, make another
mark in pencil where the right mark should be. If you don’t know, make no mark at
all. Thus on Test 1, if you think the first statement is false, as grown people do, mark
it C. But if you think that the statement is true, draw a line under the word True. Is
that clear? Begin . . . Etc.”

4. The SD Technique. In this case no answer sheets were used, but
the children were given the tests in the ordinary way (except that red pencils
were used) and then they were asked to indicate what they thought most
people would give as the answers. Blue pencils were used on this part of
the test. The directions read:

“Now I am going to ask you to do something hard. I want you to go back to your
papers and look at the first test. You have been giving your own opinion on these an-
swers. Now what do you think most people think about these things? Very likely you
feel that your opinion is in some respects different from that of other people. It may
be better. It may be worse. But how would most folks answer these questions? Now
take your blue pencil and show any differences between your answers and what you
think would be the answers of the average person. Mark any changes with your pencil.
Do this for all the tests. Take your time.”

The double pencil technique shows differences in opinion, if there be any. The number of differences is theoretically in proportion to the originality of the pupils’ first answers; conversely, the number of likenesses shows the extent to which their first answers were thought by them to be conventional or expected answers.

These four techniques make use of two standards—the “ideal” score
basis previously discussed, and represented to the pupils as the opinion of
persons to whom they presumably looked up, and the conventional or general
adult opinion, represented only in the imagination of the children. Three de-
vices were used to confront the children with the explicit “ideal” standard,
and one with the conventional standard. In two techniques the sensitiveness
of social approval should operate to increase the amount of change, and in two
techniques it should operate to decrease the amount of change.

Space does not permit giving the entire results of these techniques, but
they may be summarized as follows:

The SAA Technique. As the pupil had opportunity to change only the answers which were not like the answer sheet, and as his additions from the answer sheet beyond the point in the test which he had reached by his own efforts are not indications of changes, the percentage of change is found by dividing the number of changes he made in his wrong answers by the number of answers he got wrong. This method brought out a 31% change, nearly half of which was made by 25% of the pupils. The pupils refused to change 69% of their answers which were not like the key. Dependence was expressed by an average of 14.2 changes per scale, or 2.8 per test.
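The SAA percentage of change can be written as a small scoring routine. This is a sketch under assumptions: the record layout (red answers, blue answers, answer-sheet key, with None for items the pupil had not reached) is hypothetical, and the group figure is taken here as total changes over total wrong answers, since the article does not say whether individual percentages were averaged instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SAARecord:
    """One pupil's answers on one scale (hypothetical layout)."""
    red: list[Optional[str]]   # first answers in red pencil; None = item not reached
    blue: list[Optional[str]]  # final answers after the answer sheet was seen
    key: list[str]             # answers printed on the answer sheet


def saa_change(record: SAARecord) -> tuple[int, int]:
    """Return (changes made in wrong answers, number of answers wrong).

    Items the pupil had not reached by his own effort are skipped, so answers
    copied from the sheet beyond that point are not counted as changes.
    """
    changed = wrong = 0
    for red, blue, key in zip(record.red, record.blue, record.key):
        if red is None:
            continue
        if red != key:
            wrong += 1
            if blue != red:  # red mark crossed out and a new blue mark made
                changed += 1
    return changed, wrong


def group_percent_change(records: list[SAARecord]) -> float:
    """Pooled percentage of wrong answers that were changed (assumed pooling)."""
    pairs = [saa_change(r) for r in records]
    changed = sum(c for c, _ in pairs)
    wrong = sum(w for _, w in pairs)
    return 100.0 * changed / wrong if wrong else 0.0


# Invented answers for one pupil on a four-item test.
pupil = SAARecord(red=["T", "F", "T", None],
                  blue=["T", "T", "T", "T"],
                  key=["T", "T", "F", "T"])
print(saa_change(pupil))              # (1, 2): one of two wrong answers changed
print(group_percent_change([pupil]))  # 50.0
```

The fourth item is filled in blue but not counted, because the pupil had not reached it in red.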

Over against the fact that 89% stated that it was their duty to read the Bible every day (apparently a conventional response) must be set the fact that very few of these children changed their answers when confronted with the answer sheet, which showed that the “standard” did not regard it as their duty to do so.

The SS Technique. As the pupils were asked to “correct” these papers and give their own opinion where it differed from the opinions of the adults, the number of changes made is an indication of the possession of a fixed opinion associated with some other code than this adult code.

Inasmuch as each pupil theoretically agreed with the answers given to a certain extent, these agreements must be allowed for. An equivalent control test was used to determine the area of doubt within which change was significant. This area consisted of the number of questions tried minus the number right on the control test. This technique brought out a 41% change. That is, within the area of doubt, 59% of the answers were kept like the adult standard and 41% were changed to conform to the pupils’ own independent opinion. This independence was expressed by an average of 21.2 changes or corrections per scale, or 4.3 per test.

The amount of overlapping of the adult code as represented in the an-
swer sheets and the code reported by the children is shown by the proportion
of answers not changed within the area of true effort to the total number of
questions tried on the control test. This proportion was 84%.
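The two SS figures, the percentage of change within the area of doubt and the percentage of overlap, can be sketched as arithmetic on three counts. The reading of the article’s definitions encoded below is an assumption, as are the function name and the counts in the usage line, which are invented for illustration.

```python
def ss_summary(tried_control: int, right_control: int, corrections: int) -> dict[str, float]:
    """Percent change and percent overlap for the SS technique.

    Assumed reading of the article:
      area of doubt   = questions tried minus questions right on the control form
      percent change  = corrections / area of doubt
      percent overlap = answers left unchanged / questions tried on the control form
    """
    doubt = tried_control - right_control
    if tried_control <= 0 or doubt <= 0:
        raise ValueError("need questions tried and a positive area of doubt")
    return {
        "area_of_doubt": float(doubt),
        "percent_change": 100.0 * corrections / doubt,
        "percent_overlap": 100.0 * (tried_control - corrections) / tried_control,
    }


# Invented counts: 100 questions tried, 60 right on the control form,
# 16 corrections made on the adults' papers.
print(ss_summary(100, 60, 16))
# {'area_of_doubt': 40.0, 'percent_change': 40.0, 'percent_overlap': 84.0}
```

The control form matters because a pupil who already agrees with the adult answer has nothing to correct; only the area of doubt gives him room to disagree.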

The SD Technique, where no standard was stated but the children were
asked to show how they differed from conventional standards, however they
might conceive them, brought out a 47% change or 47% sense of difference
or independence. This independence was expressed by an average of fifty-
three changes per scale or 10.5 per test.

The SBA Technique. Where the answer sheet, or standard, was given out along with the test, there was a forced likeness of only about five points on the mean total score for each scale over a mean total bona fide score of 68.6 for Scale A and 74.4 for Scale B, or a percentage of susceptibility of 31 percent, which is identical with that shown by the SAA technique.

When the changes made by each pupil are regarded as a score, they
yield a correlation of -.047 with moral knowledge scores. We may say, on
the one hand, that the moral knowledge scores, therefore, which differ widely
from child to child, are not merely efforts to repeat the school standards, but
represent something more fundamental. Or, on the other hand, we may say
that the tendency to make a good appearance does not correlate with moral
knowledge.
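Treating each pupil’s number of changes as a score and correlating it with his moral knowledge score calls for a product-moment correlation, presumably the same coefficient behind the reliability figures quoted in these articles. The sketch writes the formula out; the score lists are invented for illustration. Python 3.10 and later also provide `statistics.correlation`, which computes the same Pearson coefficient.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists of at least two scores")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a list with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy)


# Invented scores for five pupils: answers changed, and moral knowledge score.
changes = [3, 7, 2, 5, 4]
knowledge = [70, 66, 74, 69, 71]
print(round(pearson_r(changes, knowledge), 3))
```

A coefficient near zero, like the one reported, would mean that knowing how many answers a pupil changed tells little about his moral knowledge score.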

The SA Test

Another approach was made by the use of a paper and pencil test. We
assumed the existence of a tendency to make answers correspond to what was
regarded as accepted opinion. We assumed, further, a general tendency to
tell the truth apart from any specific gain to be derived from falsification.
This gave us a motive and a resistance. We then created a situation in which
what the child regarded as the approved answer could not be given truth-
fully. The extent to which a child would falsify to gain approval measured
his social sensitiveness.

Obviously there will be many children whom such a test will not measure,
inasmuch as the disposition to tell the truth at all costs is too great to be
overcome. But it is fair to presume that such a child would also tell the
truth on any sort of paper and pencil tests, so that its purpose in getting at
the tendency to express an opinion without regard to its acceptability to the
particular group responsible for the tests is realized even in such cases.

The test is of the true-false type and is in two forms of thirty-six ques-
tions each. These questions are largely based on the Duties test, in Scale A
of the Moral Knowledge series. They are such questions as these:

1. Did you ever accept the credit or honor for anything when you knew the credit or honor belonged to someone else? ........ Yes No
2. Did you ever act greedily by taking more than your share of anything? ........ Yes No
3. Did you ever blame another for something you had done when you knew all the time it was your fault? ........ Yes No
4. Do you usually report the number of a car you see speeding? ........ Yes No
5. Do you always preserve order when the teacher is out of the room? ........ Yes No
6. Do you report other pupils whom you see cheating? ........ Yes No
7. Did you ever pretend to understand a thing when you really did not [illegible]? ........ Yes No
8. Have you ever disobeyed any law of your country or rule of your [illegible]? ........ Yes No
9. Do you speak to all the people you are acquainted with, even the ones you do not like? ........ Yes No
10. Do you usually call the attention of people to the fact that you have on new shoes or a new suit or dress? ........ Yes No

Any child could answer some truthfully as scored above, but the individ-
ual who could answer all thirty-six truthfully would be a pious fraud. Fur-
thermore, the items are such as adults are apt to represent as “duties.” Chil-
dren are frequently told to do or not do the sort of thing asked about. The
pressure to claim the virtue specified is therefore considerable.

The scores on this test are “amount” scores, without any predetermined
point separating those possessing the tendency from those not possessing it,
and can be used as they stand for correlation purposes.

The two forms, given to 133 children, yield a reliability coefficient of .836.

The moral knowledge scores correlate with the SA scores .121 in one school and -.029 in another.

Here again there seems to be evidence that the necessity of making a
good appearance is not so strongly felt in the case of the moral knowledge
tests as in the case of the inquisition about behavior. But we need to be cau-
tious in drawing this conclusion, inasmuch as it presupposes a knowledge on
the child’s part of what would be approved. He had no answer sheets to
show him. If he did not know what the approved answer was he of course
could not give it anyway. The SA test was given a conventional score, as
just now described, to bring out the child’s tendency to appear conventional.
But the moral knowledge tests are scored by an ideal not a conventional
standard.

Now the SA test, as has been stated, was built around the Duties test
of Scale A, having eighteen questions in common with it. So we correlated
the score on the Duties test with the score on the SA test and found correla-
tions of .022 and .059 in two groups of 146 and 208 respectively, correspond-
ing to what was found in correlating the total moral knowledge scores with
the SA scores.

But a comparison of the way in which each element was handled brings
out this interesting fact, that in one group fourteen out of the eighteen and
in another, twelve out of the eighteen items were answered in the same way
on the Duties test and on the SA test. That is, the children represented them-
selves as doing what they had previously given as their duty in about two
thirds of the situations presented. When the Duties test is scored by the
same standard as that used in scoring the SA test, the correlation between
the two becomes .77.
[...] be expected? We have the same situations [...] What is your duty? The other, What is [...] The concept of duty is of course a social function, a function of groups, a matter of code. Either the child pictures himself as living up fairly well to his duty, or else presents a picture of both code and practice
that is supposed to portray the “correct” behavior in the situation concerned,
without regard to whether his own behavior or his own code or the code of
some other group than the school group corresponds to the picture. Of
course his codes may and probably do overlap very considerably as the sum-
mary of the results of the shift technique tends to indicate. It does not
follow that because he may have lied to make a good appearance as to conduct
on the SA test he also lied in stating his opinion as to his duty. Since he was
not asked to say whether this notion of duty was what he learned in school
or at home or in clubs or what he felt to be the code of children in their own
world, we cannot speak with confidence as to his sincerity in giving his an-
swers. The consistency of results, as found by correlating comparable forms
of the same test, indicates that what is stated as moral knowledge has a cer-
tain coherence and stability, and whether lived up to or not, points the way
to action that is regarded as “proper”.
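The item-by-item comparison of the Duties test with the SA test described above can be sketched as follows. The coding is an assumption: each answer on the items common to both tests is recoded as 1 if it names (on the Duties test) or claims (on the SA test) the conduct adults approve, and 0 otherwise, and a group is taken to have answered an item “in the same way” on both tests when its majority answer agrees. The article does not state its exact procedure, and the data in the usage lines are invented.

```python
from collections import Counter


def majority(codes: list[int]) -> int:
    """Most common 0/1 code in a list; a tie is resolved as 1 (an arbitrary choice)."""
    counts = Counter(codes)
    return 1 if counts[1] >= counts[0] else 0


def items_answered_alike(duties: list[list[int]], sa: list[list[int]]) -> int:
    """Count common items on which the group's majority answer agrees on both tests.

    duties[p][i] and sa[p][i] hold pupil p's recoded answer (1 = approved
    conduct named or claimed, 0 = not) to the i-th item common to both tests.
    """
    if not duties or len(duties) != len(sa):
        raise ValueError("need the same, non-empty set of pupils on both tests")
    alike = 0
    for i in range(len(duties[0])):
        d = majority([pupil[i] for pupil in duties])
        s = majority([pupil[i] for pupil in sa])
        alike += int(d == s)
    return alike


# Invented data: three pupils, three common items.
duties = [[1, 1, 0], [1, 0, 0], [1, 1, 1]]
sa = [[1, 0, 0], [1, 0, 1], [0, 0, 1]]
print(items_answered_alike(duties, sa))  # 1
```

Recoding both tests to the same 0/1 standard is also what allows the two scores to be correlated directly, as in the .77 reported when the Duties test was scored like the SA test.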

Moral Knowledge and Intelligence

In our second article correlations with intelligence were given which are
corroborated by this year’s testing. An unselected population yields a coeffi-
cient of at least .50 between mental age and the ability to give mature answers
to the questions in the tests. Evidently there is a large factor of intelligence
present. What this would be if the tests were scored from the standpoint of
convention has not yet been determined.

The conclusion seems warranted that the moral knowledge tests reflect
codes, and that there is considerable overlapping among these codes.

FOURTH ARTICLE

SOME PROBABLE SOURCES OF MORAL KNOWLEDGE
IN CHILDREN*

In our third article were listed ten questions relating to the further
development of the moral knowledge tests and their possible uses.
Questions five and six were as follows:

What are the major sources of the knowledge or quasi-knowledge
the children exhibit on the tests?

What codes characterize children of different groups—age, sex,
race, community, culture, etc.? How do these compare with codes of
adults of the same or other communities and groups—teachers, parents,
Sunday-school and club leaders, etc.?

These questions Messrs. Sonquist and Kerr have attempted to an-
swer. That the answers are not final they would be the first to assert.
But that their methods are most suggestive for future research in this
field no one will doubt.

The writers have been painstaking in the care with which they have
used their statistical techniques, securing advice and criticism at every
turn. But the results are none the less fundamentally their own. The
Inquiry welcomes this informal addition to its findings.

It has seemed best to introduce here and there certain contro-
versial matters in order to promote discussion of the paper. The ensuing
footnotes recall the first edition of Wells’ Outline of History in which
criticisms and replies were both included in the notes to the text. We
trust that this practice of printing attack and defense will prove sug-
gestive for similar articles.

The Character Education Inquiry.

The multiplicity of books, articles and interviews by students of
child life bears witness to the fact that there are widely divergent ideas
as to what causes are behind the so-called new standards which present-day
youth is setting up. Indeed these attempts by many men of many
minds to evaluate child standards of action rather indicate also that as
yet we have no accurate knowledge of what are the most direct sources
from which children derive their notions of right and wrong which,
solidifying into codes, later become their adult standards of action.
Various institutions are building programs for what they feel to be
worthy attempts to educate children in morals and ethics; prizes are
offered and won for “moral codes” for children, as if the intellectual
acceptance of an adult-fabricated code were an index of a standard;
schools are adopting courses in ethics and morals, and Sunday schools
and clubs are continuing, with added emphasis on conduct, the pre-
senting of truths which have a moral implication, and all of this without
scientific data as to what is making actual contribution to children’s
knowledge of right and wrong.

* Dr. May and Dr. Hartshorne conducted a research course in the Measurement of Character during 1925-26, and this paper was written by Sonquist and Kerr in this connection.

In pursuing the investigation implied by the subject of this paper, the writers have not gone on the assumption that knowledge and behavior are highly correlated. In the first article on “Testing the Knowledge of Right and Wrong,” by Hartshorne and May, in the February, 1926 number of Religious Education, we find this paragraph:

“One of our problems is to discover what the relation is between
behavior and the knowledge of right and wrong. Furthermore, we do
not assume that word behavior and a true knowledge of right and
wrong are necessarily correlated. It may be that overt action is a far
better indication of what a man really knows about right and wrong
than his verbal responses are.”

Nevertheless, the students who have undertaken the research which
will be described below feel that we will be rendering a certain service
if we can to a degree ascertain what are some of the probable sources
from which a child does get his knowledge of right and wrong actions.

By consensus of opinion, the groups which have a major influence
upon the life of a child are ordinarily four or five. He lives in a home;
he spends a large part of his time in school; he has friends; he is prob-
ably in some institution for religious instruction at least once a week;
and he may belong to an organized club having an adult leader. We
recognize that there are other factors contributing to his fund of ethical
concepts, such as commercialized amusements, books read independently
of any of the five above mentioned influences and, in the middle and
later adolescent years, employers of youth.

Any attempt to undertake research must of course be limited in
such a way as to secure reliable and accurate results. This research is
limited to a group of children, from the fifth through the ninth grades
in seven different day schools, as studied and tested in four different
situations for the five major influences, viz., homes, schools, Sunday
schools, organized clubs and friends.

Before more accurately describing the fields in which the investi-
gation was conducted, it may be well to state the problems to be dealt
with:

1. Does a child’s knowledge of right and wrong tend to be more
like his Sunday-school teacher’s, his parents’, his adult club leader’s, or
his child friends’?

2. Is there a moral knowledge age, similar to mental age? In
other words, will a fifth-grade child have as high a code as a seventh-grade
child, or a ninth-grade one?

3. Do boys rate the same as girls in so far as the knowledge of
right and wrong is concerned?

4. Does a child have a uniform code of morals or does his code
vary according to the group in which he finds himself at the moment?
That is, is there a typical day-school code, Sunday-school code, etc.?

There are many other questions which will naturally arise through
the discussion of the investigation and the description of results. Some
of these will be indicated as we proceed, although solutions of most of
them will have to be delayed until a further study of our data can be
made.

Description of the Fields

This investigation was conducted in six suburban towns ranging from three hundred to three thousand in population and in one small city of thirty-odd thousand population. One school in each of the seven communities was used as a primary testing unit. The foreign population of all seven is about ten per cent of the total. These children are largely American-born of foreign-born parents. Negro children number about ten per cent of the total in the six smaller communities and about fifteen per cent of the total in the larger town. In the smaller places the children practically all come from average homes, but in the junior high school of the larger town there is a very great range both in intelligence and in economic background. Of the total of 1,159 children tested in the school situations, 690 came from the fifth through the eighth grades of the six smaller schools and 469 came from the seventh, eighth and ninth grades of a junior high school in the larger community.

Technique

The “Moral Knowledge Tests” devised by Drs. May and Hartshorne of the Character Education Inquiry and described in the February and April (1926) numbers of Religious Education were used as a basis of our study. The reader is referred to these articles for a complete description of the building of the tests. For our purpose we decided to use Forms 1 and 2 of Scale A only, which are subdivided as follows:

Cause and Effect ........... 36 Situations
[illegible] ................ 29 Situations
Comprehension .............. 10 Situations
[illegible] ................ 17 Situations
Consequences ............... 16 Situations

In order to have tests which would be commensurate with these,
but which would be devoid of too great carry-over from one form to
another, it was necessary to build on the two forms of Scale A, two
additional tests. The new Form 3 was built on Form 1 and the new
Form 4 was similarly built on Form 2. This was done very largely by
reversing statements, using negatives where the original had positively
worded situations, and vice versa. In every case the moral problem faced
was left identical. The new forms were given a different appearance,
Form 3 being mimeographed and Form 4 being multigraph-printed. In
so far as was possible a different set-up of each test was also used.
That is, “YES—NO” was substituted for “TRUE—FALSE,” etc. The
five tests of the battery were arranged in different order.

The reliability of the two new forms was determined by giving the
parallel forms of the test to the same children in the seventh, eighth
and ninth grades of a junior high school which was not to be used in
the investigation. In each case all three grades were represented in
each part of the test in order to have a vertical cross section of the
school life. Under the direction of the research department of the
school system of which this school is a part, finely trained teachers
administered the tests. A total of 164 children took both Forms 1 and 3,
and 151 children took both Forms 2 and 4.

These tests, when scored in as nearly comparable ways as possible,
showed the following means:

TABLE I

Form 1 Mean 67.14 Form 3 Mean 67.12
Form 2 Mean 69.11 Form 4 Mean 69.32

This striking agreement between the means of Forms 1 and 3, and of Forms 2 and 4, indicates an almost equal degree of difficulty of the original forms and those devised for the purposes of this study. The slightly higher means for Forms 2 and 4 might seem to show that they were less difficult than Forms 1 and 3, but this difference is well within the probable error of these means, which leads us to believe that all four tests are practically of the same degree of difficulty.
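Whether a difference between two means lies within its probable error can be checked with the probable-error formulas standard in the statistics of the period, where the probable error is .6745 times the standard error. The article reports the means but not the standard deviations, so this sketch takes standard deviations as inputs, and the numbers in the usage lines are placeholders rather than the study’s data.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def pe_mean(sd: float, n: int) -> float:
    """Probable error of a mean from n scores with standard deviation sd."""
    return PE_FACTOR * sd / math.sqrt(n)


def pe_difference(pe1: float, pe2: float, r: float = 0.0) -> float:
    """Probable error of the difference between two means.

    r is the correlation between the two means: 0 for independent groups,
    positive when the same pupils took both forms.
    """
    return math.sqrt(pe1 ** 2 + pe2 ** 2 - 2 * r * pe1 * pe2)


# Placeholder values, not taken from the study.
pe_a = pe_mean(sd=10.0, n=100)
pe_b = pe_mean(sd=10.0, n=100)
diff = 1.5
print(diff / pe_difference(pe_a, pe_b))  # the difference expressed in probable errors
```

The `r` term matters here because the reliability groups took two forms each; ignoring a positive correlation overstates the probable error of the difference.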

A study of the co-efficients of correlation reveals the following relation-
ships:

TABLE II
‘r’ of Forms 1 and 3 equals .651 ± .03
‘r’ of Forms 2 and 4 equals .607 ± .05
‘r’ of Forms 1 and 2 equals .731 ± .02*

With these results we feel that the forms are sufficiently comparable to
warrant their use in this study, although we realize that for many types of
research a reliability of less than .90 would be of little service.
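The ± values attached to the coefficients are probable errors of r. The usual formula of the period, PE = .6745(1 - r²)/√N, is sketched below. The article does not state which formula was used, but for Forms 1 and 3 (r = .651 with 164 pupils) this one gives about .03, the value printed.

```python
import math


def pe_r(r: float, n: int) -> float:
    """Probable error of a correlation coefficient r computed from n cases."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)


# Forms 1 and 3: r = .651 with 164 pupils, as reported above.
print(round(pe_r(0.651, 164), 3))  # 0.03
```

Because the probable error shrinks as r approaches 1, a coefficient near .65 is known with only moderate precision from a group of this size.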

Administration

In every case the day school was used as the primary testing unit. In
order to secure the necessary information concerning other groups to which
the children belonged, we used a “Survey Blank” asking for the following
facts:

Name School Grade.
Address Name of home room teacher.
Age Names of three favorite teachers.

Names of at least three best friends, either boys or girls, or both, in this school.
Name of Sunday school attended.
Name of Sunday-school teacher.
If belonging to a club, Scout Troop, Camp Fire Girls, or any group which has an
older person as leader or adviser, and meets more than two times a month:
Name of club.
Leader’s name.
Where it meets.

Names, ages and school grades of brothers and sisters.

Father’s name, or name and address of step-father or guardian.

When these blanks were distributed in the schools it was made very
plain to the children that we were asking for some information which they
might consider rather personal. They were given to understand that no
teacher or other person connected with the schools would ever see them.
They were told that each blank would have a number and that in using the
information the investigators would refer always to numbers and never
to names. They were asked specifically to fill out the blanks as completely as
possible, and then turn them face downward on their desks. This was done,
after which the blanks were collected by a pupil and taken by him directly
to the investigator. School teachers and principals gave excellent co-oper-
ation in maintaining absolutely fair-play in this regard.

After the blanks were returned they were alphabetized by grades, and each grade was assigned a block of serial numbers, e.g., 7th grade, 1 to 200; 8th grade, 201 to 324; etc. Each blank was carefully gone over, and the data it contained were tabulated by serial number under the proper heads on separate charts. This information enabled us to approach Sunday schools and organized clubs, and told us how many tests we would need to send into the homes. Filling out the blank took an average of ten minutes; the writers believe it is better administered separately from the tests.

*Taken from the third article of the series referred to, which discusses this problem of reliability.
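A minimal sketch of the numbering step, assuming each returned blank can be reduced to a (grade, name) pair; the function name and input format are hypothetical. Pupils are alphabetized within their grade and numbered consecutively, so each grade receives one unbroken block of serial numbers, as in the 7th grade 1 to 200, 8th grade 201 to 324 example.

```python
def assign_serials(pupils, grade_order):
    """Number pupils consecutively, grade by grade.

    pupils: list of (grade, name) pairs (hypothetical input format).
    grade_order: grades in the order their serial blocks are issued.
    Within each grade, names are alphabetized before numbering.
    Returns a list of (serial, grade, name) triples.
    """
    numbered = []
    next_serial = 1
    for grade in grade_order:
        for name in sorted(name for g, name in pupils if g == grade):
            numbered.append((next_serial, grade, name))
            next_serial += 1
    return numbered


pupils = [(8, "Brown"), (7, "Smith"), (7, "Adams"), (8, "Allen")]
serials = assign_serials(pupils, grade_order=[7, 8])
print(serials)
```

Returning a list rather than a dictionary keyed by name keeps two pupils with the same name from overwriting each other.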

Examiners. In all cases the tests were administered by trained and ex-
perienced examiners, either from outside the school system itself (advanced
normal school students in educational psychology were used in many places),
or by teachers designated by the research department of the school system.
Specially trained testers were used in all Sunday schools and club situations
and in many cases the tests were administered by the writers themselves.

Testing in the Schools. Scale A, Form 1 was used in the school situation. All five parts of the test were given, with forty-five minutes allowed as the maximum time; on the average not more than forty minutes was required. 1159 children took the tests, together with thirty-one teachers and one vice-principal. Nine teachers did not care to take the tests.

Testing in Sunday Schools. Scale A, Form 4 was used for Sunday
schools. A necessary interval of two or three weeks elapsed after the tests
were used in the day schools before they could be given in Sunday schools.
Children were listed as being from 110 different Sunday schools. Tests were given in only 21 of these. In the others there were only two or three children who had also taken the test in day school, and because of irregular attendance, trying to reach such small numbers would probably have wasted much time and effort. In a few cases official objection prevented giving the tests. 650 tests were administered in Sunday schools; of these, 276 were from children who had also taken the test in day school. 51 Sunday-school teachers took the test at the same time. Of the 1159 children who took the tests in day school, about 17% did not name any Sunday school.

Administration of the tests in Sunday schools was rather more difficult
than in day school. It was necessary to provide pencils and in many cases
writing boards because there were no tables or other writing spaces avail-
able. The lesson time was too short for adequate testing, although in most
cases the class period was extended to allow completion of the tests. The
general atmosphere of the Sunday schools was not adapted to fair testing.

Testing in Clubs. Scale A, Form 3 was used for club groups. Of the 1159 children tested in day school, approximately 500 said they were members of some club, and 70 clubs were listed. Tests were administered in 20 clubs to a total of 205 children, of whom 104 had taken the test in day school. 666 children said they were not in any club having an adult leader and a meeting place out of school, but it must be remembered that these include children of the fifth and sixth grades. In the junior high school, of 469 children taking the test, 172 said they were not members of any club. 59 children in this school were members of school clubs meeting in the school. These children were not included in the club testing program because it was felt that this would duplicate the school situation. 17 club leaders took the test.

The club testing presented better administration conditions than the
Sunday-school testing, although it was not on a par with the day-school
situation. There is a wide age range in most clubs, little close grading and
irregular attendance.

Testing in the Homes. Scale A, Form 4, with a specially printed set of
directions on the front sheet, was used for the home testing of parents and
children. In each school classroom, explanation was made about the use of
the tests in the homes, and caution was given against receiving or giving any
help. The tests going into homes were each given the serial number which
appeared on the school test of the child from that home. This made it unneces-
sary for parents to write their names on the blanks. An explanatory letter
giving assurance of the absolutely impersonal use of the tests and requesting
parental co-operation with the investigation, was sent with each set of tests
into each home. A return envelope, addressed to the Character Education
Inquiry, was included, with the request that the finished tests be enclosed and
sealed in that envelope and either mailed to the headquarters of the Inquiry
in New York, or returned to the school within four days.* 620 children took
the test in the homes. In 476 families either one or both parents took the
test and in 276 cases both parents as well as at least one child were rep-
resented.

Scoring and Weighting. In scoring these tests, the fifth section, “Word Consequences,” was not used, because as yet no satisfactory means of scoring it has been devised. Had this section been used in measuring the reliability of the two new forms, a higher coefficient of reliability would have been obtained.

Final scoring of the tests was done by the clerical staff of the Character Education Inquiry according to the technique outlined in the second article in Religious Education. The scores were then weighted by a method devised for this purpose by the Inquiry, in order to equalize the values of the different parts of the test. Isolation and classification of the various groups of data and the necessary statistical treatment preliminary to interpretation were done by the writers. With this introduction, we are now ready to take up our original questions in the light of the data secured.

1. Does the child’s knowledge of right and wrong tend to be more like his Sunday-school teacher’s, his parents’, his adult club leader’s, or his child friends’?

Results of our investigation indicate the different degrees of likeness (r, the coefficient of correlation) between the child and the members of the five major influence groups of which he may be a part, as shown by the following table:

TABLE III

                                                            Correlation and
Child relationship with:       No. of Cases   Mean Scores   Probable Error
1. Parents                          416          69.22          .545
2. Friends                         1020          64.79          .35
3. Club Leaders                     204          70.…           .137
4. Public School Teachers           695          80.423         .028
5. Sunday School Teachers           205          69.64          .002

 

*Of a total of 620 tests returned from either child or parents, only fifteen were mailed. A study of the question of child and parent collusion is included later in this article under the fourth question, “Does a child have a uniform code of morals?”

Child-Parent Relationship

The figures given in Table III for parents are for situations where either
one or both parents took the test. Where both parents’ scores were available
(276 cases) the average or mean of both parents’ scores was correlated with
the child’s score. Where there was more than one child in the family, each
child’s score was correlated with this composite parents’ score. In addition
we felt it necessary to show the relationship of fathers and mothers to each
other, mothers with children and fathers with children, so we segregated all
cases where both parents took the tests.
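The pairing rule just described can be sketched as follows, assuming each family is recorded as a mother's score, a father's score and a list of children's scores (a hypothetical format). Each child is paired with the mean of both parents' scores, so a family with several children contributes several pairs, and the Pearson product-moment r is then computed over all pairs. The article does not spell out its computing procedure; this is the textbook formula, under which r runs from +1.00 to -1.00. The scores in the example are invented.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def child_parent_pairs(families):
    """Pair each child with the mean of both parents' scores.

    families: list of (mother_score, father_score, [child scores]).
    A family with several children contributes one pair per child,
    all sharing the same composite parents' score.
    """
    pairs = []
    for mother, father, children in families:
        composite = (mother + father) / 2
        for child in children:
            pairs.append((child, composite))
    return pairs


# Hypothetical scores, for illustration only.
families = [(70, 66, [68, 72]), (60, 64, [59]), (75, 79, [74, 80])]
pairs = child_parent_pairs(families)
children, parents = zip(*pairs)
print(pairs)
print(round(pearson_r(children, parents), 3))
```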

 

TABLE IV
MEANS AND STANDARD DEVIATIONS OF 276 CHILD-PARENT CASES

                 Mean       S. D.
Mothers         69.359      8.079
Fathers         68.477      8.46
Children        69.263      8.07

The differences in the means and the deviations are so slight that they
can be accounted for by chance. This certainly shows remarkable agreement
of all persons concerned in the home situation.

Study of the correlations obtained in this home situation reveals some significant results. The relationships among the members of the family are indicated by the following correlations:*

 

TABLE V
Mother-Father        r equals .65 ± .024
Mother-Children      r equals .49 ± .031
Father-Children      r equals .40 ± .034

The Mother-Father correlation indicates a high degree of accord with respect
to their knowledge of right and wrong. This accord between parents ac-
counts at least in part for the fairly high relationship between parents and
children.

The following partial correlations show interesting relationships:

TABLE VI

Mother and child with father constant, r equals .33
Father and child with mother constant, r equals .12
Father and mother with child constant, r equals .57

It is evident that the mother relationship is considerably closer than the
father relationship as regards the moral knowledge of their children. This is
only to be expected when the time which each parent spends with the children
is considered.
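The partial correlations in Table VI can be recovered from the three zero-order coefficients in Table V with the standard first-order partial-correlation formula. The article does not state how the partials were computed, but the sketch below, which assumes that formula, reproduces .33, .12 and .57 when rounded to two places.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Zero-order correlations from Table V.
r_mother_father = 0.65
r_mother_child = 0.49
r_father_child = 0.40

mother_child = partial_r(r_mother_child, r_mother_father, r_father_child)   # father constant
father_child = partial_r(r_father_child, r_mother_father, r_mother_child)   # mother constant
mother_father = partial_r(r_mother_father, r_mother_child, r_father_child)  # child constant

print(round(mother_child, 2), round(father_child, 2), round(mother_father, 2))
```

Holding the father constant removes the part of the mother-child relation that runs through the accord between the parents; what remains (.33) is still well above the corresponding father-child remainder (.12).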

Other influences are discussed later, but none compares with the home. In the light of this study it seems increasingly evident that if we wish to raise the standards of moral knowledge of children, the most logical place to center our efforts is the home.


*If the members of each pair correlated (father and mother, mother and children, father and children) had thought exactly alike on each question, the r would have been +1.00. If they had taken opposite sides on each question, the r would have been -1.00. If they had agreed on half the questions and disagreed on half, the r would have been 0.00. As it is, the r’s show considerable agreement.

Another point arises at this time which will have some bearing on a later discussion. Attention is called to the slight difference between the r of .65 between parents and the partial r of .57 with the child factor constant. This would indicate a very small influence of the children on the moral knowledge of their parents.*

Child-Friend Relationships

A study of this material should be prefaced by an explanation of the
way in which it was obtained. On the “Survey Blank” each child was asked
to name three or four of his best friends, in the school, either boys or girls
or both. Of the total of 1159 children taking the tests in schools, 79 or
slightly less than 7% did not name any one as a friend. On the other hand,
145 or slightly over 12% were not named as a good friend by any other
child. Considering the range of three school years in one case and four in
the others and also that this period is marked ordinarily by the tendency
to form friendships, these figures seem to be indicative of considerable in-
dividualism or lack of community experience among the school children. A
further study of the isolated group as to intelligence, moral knowledge, etc.,
might reveal significant information regarding their characteristics.

It is to be observed that the great majority of friends were named within their own grade. Such a grouping is probably primarily an age grouping, but that does not necessarily mean that the friends named do not constitute a genuinely friendly group. It is possibly significant that no child was named
as a friend by his comrades more than twelve times and most were named
six or seven times. On the other hand, most children did not indicate more
than two or three friends as “best” friends, whereas they were told they
might name four. It certainly seems as if within the age-grade grouping
children were deliberately limiting the circle of those whom they would
designate as friends. However, the study so far is too superficial to permit
of any extensive deduction as to whether or not there is a gang or group
influence at work.

Another interesting fact is that boys named girls as friends and vice
versa, quite freely. These lines are probably not as closely drawn as many
would have us believe.

Tabulation of the correlation between a child and his friends was done
by correlating the score of each child with the mean score of those he named
as friends. We did not consider the relationship between the child and
those who named him as friend.
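The child-friend correlation can be sketched under similar assumptions: each child's score is paired with the mean score of the friends he named, and friends who named him are ignored, as the text specifies. The dictionary layout and the rule of skipping children who named no tested friend are assumptions made for illustration, and the scores are invented.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def child_friend_pairs(scores, named_friends):
    """Pair each child's score with the mean score of the friends he named.

    scores: {child_id: score}; named_friends: {child_id: [ids he named]}.
    Only the friends a child named are used, not those who named him.
    Children who named no tested friend are left out.
    """
    child_scores, friend_means = [], []
    for child, friends in named_friends.items():
        friend_scores = [scores[f] for f in friends if f in scores]
        if child in scores and friend_scores:
            child_scores.append(scores[child])
            friend_means.append(mean(friend_scores))
    return child_scores, friend_means


# Hypothetical data: child 5 named no one.
scores = {1: 60, 2: 64, 3: 70, 4: 66, 5: 58}
named_friends = {1: [5, 2], 2: [1, 4], 3: [4], 4: [3, 2], 5: []}
xs, ys = child_friend_pairs(scores, named_friends)
print(xs, ys)
print(round(pearson_r(xs, ys), 3))
```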

*In criticism it may be noted that the partial r’s obtained between parents and children are lower than the r’s between intelligence and moral knowledge. This does not mean that the r’s obtained are not real, but that they are weak: extraordinarily weak as a basis for any prediction, save to say, as the text does, that the mother-child relation seems to be greater than the father-child relation, which latter is negligible in the partial and may easily be accounted for by the father-mother relation.

In reply it may be said that the r’s between intelligence and moral knowledge
probably do not invalidate r’s between parents and children. The means indicate that
there is very little difference. Probably the difference found is due to the differences
in intelligence so that the true correlation in moral knowledge would be higher. If
intelligence plays such a large part here it must also do so in the case of the teachers
and leaders. Why then do we get such correlations as .35, .137, and .002? After all,
we are not saying that the home is THE influence. We are only saying that of the
home, school, club, friends and church school, the home is the greatest influence. In
comparison to these other groups, the case for the parents seems quite clear.

TABLE VII

RELATIONSHIP BETWEEN CHILD AND FRIEND BY GRADES

Grade         No. Cases   Correlation   Means of Children   Means of Friends
9th              181          .246
8th              237          .208
7th              328          .228
6th              157          .054
5th              168          .148
All grades      1071          .35             64.494              64.797

This table seems to reveal a relationship between the knowledge of right and wrong of a child and that of his or her friends. It may be argued that, inasmuch as the children tended to name friends in their own grade more frequently than others, similarity of ages might be responsible for the positive correlation of .35. This might seem to be further borne out by Table VIII, which shows the differences in the means of the various grades. Table IX shows an interesting difference between the scores of girls and boys, which might well be a larger factor than age if boys had named only boys and girls only girls as their friends. A study was made of the seventh graders in comparison only with their older friends. This gave a correlation of .24. Comparison of this with the r of .228 between the seventh graders and all their friends indicates that age does not increase the correlation coefficient by more than the probable error.

Granting some influence of such factors as age and sex, it still seems
evident that there is a positive relation between a child’s knowledge of right
and wrong and that of his or her friends. It also appears that this likeness
grows as the child grows older, which seems to indicate that friendship counts
for more in influencing one’s moral knowledge as time goes on, at least up
through the ninth grade. This accords with our own experience and our observations of adolescents generally, in whom the mores of a group are observed to dominate the mores of the individual at certain times.

This study of friends should be carried much further before any broad generalizations are made. The evidence, however, seems sufficient to warrant the suggestion that more natural groupings would lead to more effective results in the field of moral education.

Child-School Teacher Relationship
In arriving at a fair measure of the relationship between children and
their day-school teachers a somewhat different technique from the ordinary
had to be employed. First we obtained a norm for the different grades by
pooling all the scores obtained in our study with those found in two other
large schools studied by the Inquiry. These norms are as follows:
TABLE VIII

MORAL KNOWLEDGE GRADE NORMS

Grade          Number     Norm
5th              438      55.98
6th              420      61.14
7th              535      64.17
8th              455      64.05
9th              333      68.45
All grades      2181      62.57

The ninth grade norm is probably not so valuable as the others since it was obtained from but two schools.

Next we correlated the deviation of each child’s score from the norm for his grade with his teacher’s score. Any deviation from this broadly based norm is a much better indication of the influence of the particular teacher than the actual score would be.
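A minimal sketch of the deviation technique, assuming the rows of Table VIII run from the fifth grade (55.98) to the ninth (68.45). Subtracting the grade norm removes the rise in scores from grade to grade, so that a teacher of an older grade is not credited with higher scores that simply come with the grade. The pupil and teacher scores are invented, and the function names are hypothetical.

```python
import math
from statistics import mean

# Grade norms from Table VIII, assuming its rows run from fifth to ninth grade.
GRADE_NORMS = {5: 55.98, 6: 61.14, 7: 64.17, 8: 64.05, 9: 68.45}


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def deviations_and_teacher_scores(pupils):
    """Split (grade, pupil_score, teacher_score) triples into two lists:
    each pupil's deviation from his grade norm, and his teacher's score."""
    deviations = [score - GRADE_NORMS[grade] for grade, score, _ in pupils]
    teachers = [teacher for _, _, teacher in pupils]
    return deviations, teachers


# Hypothetical pupils, for illustration only.
pupils = [(7, 70.17, 82), (7, 58.17, 79), (8, 66.05, 85), (9, 64.45, 77), (5, 60.98, 80)]
devs, teachers = deviations_and_teacher_scores(pupils)
print([round(d, 2) for d in devs])
print(round(pearson_r(devs, teachers), 3))
```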

Reference to Table III will show that there is a negligible influence of
day-school teachers on the moral knowledge of their children, in spite of the
very high mean for teachers of 80.423. The mean is over eleven points above
the mean of the parents’ scores, but a comparison of the correlations will
indicate that the influence of parents is much greater than that of teachers.
Evidently other factors than the amount of moral knowledge play an impor-
tant part in the imparting of ideas of right and wrong to children.

It may be objected that persons who are in positions of authority over
children such as are grade-room teachers are by virtue of that fact hardly
likely to have much influence on the moral knowledge of their children. To
which it may be replied that as part of our study we asked children to name
their favorite teachers. 557 pupils named 32 teachers as favorites. The
same method was used in working out this correlation as was used in the
other pupil-teacher correlations with a resulting coefficient of .055 which,
though higher than the other, is negligible. Inspection of the scores reveals
the fact that many pupils who rated very poor on their own tests named
teachers with very high scores as favorites.

Child-Club Leader Relationships

The deviations of the children’s school scores from the norms were correlated with the scores of their club leaders. By this method any influence which the club leader might have over and above the school situation would become apparent. The coefficient was found to be .137, seeming to indicate little, if any, relationship.

In view of the much higher correlation with friends, the question may be raised as to the value of having a club leader imposed upon the group. Ordinarily it is presumed that more or less direct moral instruction is being given in such groups. Are we to suppose that this goes for naught, or does the coefficient obtained indicate a laxness in the club atmosphere?*

*Criticism.

Or does the low r suggest some discrepancy between what the club leader teaches and what he answered on the test? The problem of interpreting likeness to or difference from the leader is again raised. The implication of lack of influence because the ideas of teachers and pupils are not alike may be questioned. The ideas the pupils have may be the result of what the children get in school, even though the teachers may think differently.

Reply.

Of course we are only giving the relationship between one teacher and her pupils, while there are many other factors in the school situation. We have studied two of these, however, in the friends and in the favorite teachers. Many of the teachers studied have taught in these schools for several years. To be exact we should have all the teachers whom the children have had, but this is impossible. With all these allowances we cannot make out a case for any appreciable influence on the part of the teachers. I have spoken of these results to a number of friends who are interested in this study and they are not a bit surprised. In nearly every case the results seem to corroborate their own experiences.

Child-Sunday School Teacher Relationships

Deviations from day-school norms were again used in working out this relationship, in order to give every fair chance to show whether or not Sunday-school teachers have a real influence on their children in the matter of knowledge of right and wrong. 205 pupils with 51 teachers took the tests in 21 Sunday schools. The coefficient was found to be .002. It is interesting to note that the mean of these teachers is nearly the same as that of the parents, being 69.64 for Sunday-school teachers and 69.22 for parents.

The Sunday school is supposed to be the bulwark of moral instruction
for children; why then do we find no relationship between the knowledge of
right and wrong which Sunday school teachers have and which their pupils
have? It is probably true that the Sunday school is much less a natural
grouping than either clubs or schools. And it is certainly not in any degree
the natural grouping which is found in the home. Does this account in part for the low relationship?*

Summary

Results seem to point directly to the home as the outstanding source of the knowledge of right and wrong, with friends second. There is a possibility of a slight influence by club leaders, but we have here no evidence to show that either day-school or Sunday-school teachers are contributing to the moral knowledge of children either directly or indirectly.†

2. Is there a moral knowledge age, similar to mental age? In other words, will a fifth grade child have as high a code as a seventh grade child, or a ninth grade one?

3. Do boys rate the same as girls in moral knowledge?

Reference is made to Table VIII, which shows the norms rising through five grades, with 2181 pupils included in the measuring process. Attention is called to the following table, compiled from our own investigation:

 

 

TABLE IX
COMPARATIVE MEAN SCORES OF BOYS AND GIRLS

              ----------- Mean -----------     ----------- S. D. -----------
Grade          Boys      Girls      Both         Boys      Girls      Both
9th          66.419     69.299    67.463         7.83      7.26       7.982
8th          66.86      69.5      68.21          8.55      7.44       8.13
7th          64.725     67.677    66.46          9         7.83       8.37
6th          59.05      63.418    62.063         8.55      8.04       8.58
5th          53.03      57.38     55.88          9.6       8.01       9.57
All grades   62.489     67.257    65.625        10.02      8.638      9.471

It is to be noted that in the means for both boys and girls there is a range of almost thirteen points upward through the grades. How much of this is due to the intelligence factor we are unable to say, as we have no intelligence scores. We do know that these tests correlate about .50 with intelligence. The reader is referred to the concluding paragraph of the third article of this series. It is interesting to note that the girls are consistently higher than the boys. At the same time there is a greater range and a larger measure of variability among boys than among girls. This tends to decrease with age, which is probably accounted for by the greater homogeneity of the older groups.

*Criticism: The value of Sunday schools and clubs would be shown better by comparison of the scores of those who are regular or old attendants with those who are irregular or new.

†Criticism: This does not mean, of course, that they are not so contributing. A universal negative is not established in one study.

When we consider the factors of intelligence and also the variation in
means in various situations (see discussion of the fourth question, below)
such as the home and school, it is rather questionable whether we would find a
moral knowledge age if all these factors were partialled out. The large
difference in means between the sexes would also argue against such a pos-
sibility.

4. Does a child have a uniform code of morals?

Where we are dealing with the same child under different situations, the factors of intelligence, age and sex are constant, so the differences we find must be due to the situation and not to the child. From the scores of 621 children who took the tests in both home and school we find the following results:

The child-home mean was 67.886 while the child-school mean was 64.391, giving us a difference of 3.495 in favor of the home. This we find to be 7.07 times the S. D. of the difference, and therefore highly significant.*
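The significance test cited here divides the difference between the means by the standard deviation of that difference. Because the same 621 children supplied both scores, the standard error of the difference should allow for the correlation between home and school scores; the sketch below assumes the usual formula for correlated means, and the function name is illustrative. The article does not report the two standard deviations, so the first part only works backward from the reported ratio, and the standard deviations of 9.0 in the second part are hypothetical.

```python
import math


def sd_of_difference(sd1, sd2, n, r):
    """Standard error of the difference between two correlated means.

    Both means come from the same n children, whose two sets of scores
    correlate r, so the covariance term 2*r*se1*se2 is subtracted.
    """
    se1 = sd1 / math.sqrt(n)
    se2 = sd2 / math.sqrt(n)
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)


home_mean, school_mean = 67.886, 64.391
difference = home_mean - school_mean        # 3.495 in favor of the home
implied_sd_diff = difference / 7.07         # S. D. implied by the reported ratio
print(round(difference, 3), round(implied_sd_diff, 3))

# Hypothetical standard deviations (not given in the article), combined with
# the reported n = 621 and r = .459 between home and school scores.
sd_diff = sd_of_difference(sd1=9.0, sd2=9.0, n=621, r=0.459)
print(round(difference / sd_diff, 2))
```

A positive r shrinks the standard error of the difference, so ignoring it would understate the significance of a difference between the same children's two scores.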

The reliability between the two forms of the tests used in school and home was .751, while we secured a correlation between home scores and school scores of only .459, which is another indication of a real difference between these two situations. The first explanation that suggests itself is collusion between parents and children, raising the scores in many cases. The following technique was employed to discover whether this is so or not: (1) Where the child’s score is much higher than that of either parent, we assume that there has been no collusion. This is a fair assumption on the basis of the small influence of children on parents noted previously. (2) We have also assumed that where the parents’ scores are considerably higher than their children’s scores there has been no collusion. (3) By this method we have narrowed down our possibility of collusion to some 200 cases in which the child’s score was within five points either above or below that of the parents’ score. (4) A random sampling of one out of every five of these cases of most probable collusion was selected, which gave us forty-two cases for item-to-item comparison. Since three of the four tests were multiple-choice tests, we reasoned that collusion would be most evident in identical errors. Careful item-by-item comparison of each of the ninety-two situations of the test revealed 700 identical errors to 902 non-identical errors. The laws of chance alone would yield 800 identical and 800 non-identical errors, which leads us to believe that there was practically no evidence of collusion even in these most likely cases. In addition, only one of the forty-two sample cases showed all the errors identical.
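The collusion check in steps (3) and (4) can be sketched as follows. The five-point band and the one-in-five sampling come from the text; the record layout, the answer strings, and the exact definition of an identical error (both members of the pair giving the same wrong answer, counted only on items where at least one of them erred) are assumptions, since the article does not define the term. The text describes its selection as a random sampling; this sketch takes every fifth case, which is a random sample only if the list is in no meaningful order.

```python
def compare_errors(key, child, parent):
    """Count identical and non-identical errors for one child-parent pair.

    An item counts only if at least one of the two missed it. It is an
    identical error when both gave the same (necessarily wrong) answer,
    otherwise a non-identical error.
    """
    identical = non_identical = 0
    for correct, c, p in zip(key, child, parent):
        if c == correct and p == correct:
            continue  # neither made an error on this item
        if c == p:
            identical += 1
        else:
            non_identical += 1
    return identical, non_identical


def sample_likely_collusion(cases, band=5, every=5):
    """Keep pairs whose scores lie within `band` points, then every fifth one."""
    close = [c for c in cases if abs(c["child_score"] - c["parent_score"]) <= band]
    return close[::every]


# Hypothetical answer strings for one pair.
key, child, parent = "ABCDA", "ABDAC", "ABDCA"
print(compare_errors(key, child, parent))

# Totals reported in the article for the forty-two sampled pairs.
identical, non_identical = 700, 902
print(round(identical / (identical + non_identical), 3))  # share of identical errors
```

On the article's own totals, identical errors make up about 44 percent of all errors, below the even split the authors take as the chance expectation.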

With the above evidence before us we have come to the conclusion that
the difference between the home and the school is due primarily to the
situation. In other words, a child naturally responds differently in his
answers to what is right and wrong at home from the way he does at school.

*Cf. the procedure in Garrett, H. E., Statistics in Psychology and Education, pp. 128 ff., or any standard text in statistical methods.

There seem to be different codes for the different situations, such as a home code, a school code, a Sunday school code, a club code. Every teacher and every parent has experienced how differently children act in different situations, so our data seem to bear out what common sense has told us many times.

Table X gives further evidence of such codes:

TABLE X

No. Cases   Institution      Mean      vs.   Institution      Mean      Correlation
621         Home             67.886          School           64.391    .4599
276         Sunday school    66.91           School           66.957    .454
152         Home             68.89           Club             62.816    .433
183         Home             68.45           Sunday school    65.532    .398
48          Club             65.15           Sunday school    64.58     .351
104         Club             62.387          School           65.86     .349

From this table it is evident that the scores on tests taken in different
situations do not correlate as highly as the reliability coefficients of these
tests would lead one to anticipate if there were no factors in the situations
tending to call out differentiated responses. The evidence seems to indicate
that there is not the large amount or degree of transfer from one situation
to another which we have generally expected.

Moral knowledge does not seem to be a fixed general factor which ap-
pears identically in all the various situations in which a person finds him-
self. It seems rather to be more specific, in pre- and early adolescent chil-
dren anyway. The child is influenced more by the group code than by an
individual code.* This fact is tremendously significant for the religious
and moral educator.

We cannot be content with giving moral or religious instruction in the church (even though it should be made effective) in the hope that this will mold the child’s character so as to carry over into all of his other life situations. Rather must we get into every situation to build up specific moral concepts. This involves the organization of all of society on a moral and religious educational basis, which when consummated would approximate what Jesus called “The Kingdom of God.”

In Conclusion

In summing up the results of the study, the evidence seems to justify us in suggesting certain more or less tentative conclusions:

1. Though not extremely high, the home reveals by far the highest relationship between children’s knowledge of right and wrong and that of the major influence groups, viz., parents, friends, club leaders, public school teachers and Sunday school teachers, the degree of relationship ranging from an r of .545 between children and parents to an r of .002 between children and Sunday-school teachers. The means of the scores of the various groups seem to have little to do with the relationship. Public school teachers have a mean score of 80.42 as compared with the pupils’ mean score of 62.57, with an r between them of only .03; while friends have a mean of 64.79 as compared with 64.49, and an r between friends of .35.

*Criticism: What is the evidence for this conclusion? The child’s code might be highly individual and yet there might be factors in the various situations which would lead him to vary it enough in different directions to account for the correlations secured.

Reply: It seems to me that we have no evidence that the child’s code is highly individual. If so, why does it not carry over into the different situations?

The two more natural groups of home and friends are the most sig-
nificant though neither is high enough to warrant being called the predomi-
nant influence.

Within the home situation the mother’s influence is considerably greater
than that of the father while the children seem to influence the parent-child
relationship very little.

The evidence from this study seems to suggest that in the field of moral
knowledge greater results will be obtained by emphasis on education in
the home and amongst friends than in the other groups. Undoubtedly other
factors exist that influence children in this regard which need to be dis-
covered before we can determine what the most significant influences really
are. The lack of relationship between leader and led in the formal groups
where moral teaching is attempted directly, especially in the club and Sun-
day school, indicates that the leader’s ideas at least are not getting across
to the children.

2. There seems to be little evidence to lead us to believe that there is
a Moral Knowledge Age corresponding to the Mental Age of children.
The differences noted may well be due to the mental ages of the children.
The uniform difference between girls and boys in favor of the former is rather
interesting. It may or may not be due to the generally closer confinement
of girls to the home, especially to the mother whose influence we have seen
is greater than that of the father.

If the inference is correct that the daughters have a higher score than the sons because they spend more time at home and are in more intimate contact with its adult members, then the greater influence of the home as compared with other agencies may be accounted for by the fact that the home maintains more extensive and intimate contact with both boys and girls than do schools, clubs and Sunday schools.

3. The wide differences in means and the relatively low correlations
between the scores of the same children in the different situations indicate
quite clearly that a child does not have a uniform generalized code of morals
but varies according to the situations in which he finds himself. In other
words, he has a Home code, a School code, a Sunday-school code, etc., or else
adapts a code fundamentally his own to meet the more insistent demands of
the occasion. Knowledge of right and wrong is a specific matter to be applied
to specific situations which the child encounters in his daily living. Perchance
this lack of a fixed general code is due to the secularized life with which we
surround our children. We may have to get more of a moral unity in the
individual child. Suffice it to say, the task of the moral and religious educator
is concerned with the complete life of the child and not with a portion set
aside for so-called religious instruction.

* * *

The writers have considered the problem from many different ap-
proaches and have confronted many facts which could not of necessity be
included in an article of this length. The study is rather more suggestive
of future possibilities of research than burdened with accomplished find-
ings. All pertinent criticisms and suggestions which will give more light on
a most complex and important problem in the field of religious education
will be most welcome.

FIFTH ARTICLE

THE RELATION OF STANDARDS TO BEHAVIOR IN
INDIVIDUALS

Results previously reported have led us to feel that the scores on
the so-called Moral Knowledge Tests represent for the most part the
genuine opinion of the persons taking the test. In the case of children
these genuine opinions seem to agree largely with the adult standards of
the communities in which the children live and in particular are influ-
enced by the standards of the parents rather than by the standards of
teachers and leaders outside the home.

One of the most important of the problems listed in our third article
concerns the relation between the scores on these tests of moral opinion
and the scores on the tests used by the Inquiry for measuring behavior
tendencies of ethical significance. We are prepared now to report the
facts of this relation in some detail with regard to various forms of
deceptive behavior, and more briefly with regard to self-denying help-
ful behavior.

In our second article we reported a correlation of -.385 between the
sum of seven moral knowledge tests and a type of deception which con-
sisted of copying answers from an answer sheet while taking a test.
This figure was based on data obtained from a midwestern city school
system and from one school in New York. In our discussion we gave
the evidence for concluding that this correlation was not altogether due
to the common factor of intelligence, and that it seemed rather high in
view of the fact that the conduct measured was highly specific and the
moral knowledge measured was quite general.* Since that article was
written many other groups have been measured with the revised moral
knowledge tests and with a variety of deception tests, affording us a
better foundation for the study of the relation between standards and
conduct.

For convenience of reference we will use the following letters
as symbols for the different behaviors studied:

A. Copying from an answer sheet or dictionary or getting help from someone.
B. Adding to one’s work after time is called.
C. Opening the eyes to guide one’s pencil when eyes are supposed to be shut.
D. Faking the solution of a puzzle test.
E. Faking a score in a physical ability contest, and so cheating one’s schoolmates.
F. Cheating in parlor games.
G. Stealing money from a puzzle used in a test or from a game at a party.
H. Total number of instances of deception in Behaviors A to G.
I. Helpful behavior.

It is not necessary to give in detail the techniques for measuring
these types of conduct. Suffice it to say that each was tested objectively
by performance tests the making of which has constituted one large sec-
tion of the work of the Inquiry. The moral knowledge tests used were
in the form discussed in the third article of this series.

Table I summarizes the facts in terms of total moral knowledge
scores and the scores on the various types of conduct tests.

School A consists of cases from a suburban school system. School
B consists of the residents of an institution for homeless children. School
C is a private school in New York.

*The sum of the seven moral knowledge tests, with intelligence constant, yielded
a partial r of -.157 with cheating. But the sum of six (omitting the Cause-Effect test)
yielded a partial r with cheating of -.402 (intelligence constant).

The column headed N gives the number of cases. The mean moral
knowledge score is given in column two, and the standard deviation of
these scores in column three. The raw r reported is between the type
of cheating referred to in each section of the table (A, B, C, etc.) and the
total moral knowledge score. These r’s are corrected in the next column
for errors inherent in the conduct test material and, in some cases (the
starred figures), for restricted range. The next column gives the r be-
tween mental age and cheating, which is needed for the partial. The
column headed “Partial r, M.A. constant” gives the correlation be-
tween moral knowledge and each type of conduct when intelligence is
kept constant.* It represents what we would get if the children were all
of the same mental age. Here we see that the correlations in the preceding
columns are due largely to the mental age factor, which correlates posi-
tively with moral knowledge and negatively with deception, for when
mental age is kept constant the r’s drop to nearly zero in all cases.†
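The last column of Table I can be recomputed from the three coefficients in each Average row. The sketch below assumes the standard first-order partial correlation formula, since the article does not print its computing formula. The inputs are the Behavior A averages and the .566 correlation between moral knowledge and mental age given in the footnote to the table.

```python
import math


def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation of x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# x = total moral knowledge, y = cheating, z = mental age (Table I, Behavior A averages)
R_MK_MA = 0.566       # r between moral knowledge and mental age, used for every partial
r_mk_cheat = -0.346   # averaged corrected r, moral knowledge with copying from a key
r_ma_cheat = -0.400   # r between mental age and copying from a key

print(round(partial_r(r_mk_cheat, R_MK_MA, r_ma_cheat), 3))  # -0.158
```

With the Behavior E averages (-.032 and +.052) the same function gives about -.075, as printed.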

If we take the figures of Table I at their face value we shall have
to conclude that general moral knowledge as measured by the tests
described, and the specific behaviors classified as deception are only
slightly related, there being a barely detectable tendency for higher
moral knowledge scores to be associated with higher honesty (lower
dishonesty) scores. As this conclusion tends to contradict our common
sense judgment in the matter, it would be well to examine our data
more closely.

In the first place the moral knowledge score used was an actual
total score with the elements so weighted as to equalize the tests in
length, correlation with intelligence, correlation with conduct, and cor-
relation with the sum of all.

The partials reported in the second article were based on predicted
totals weighted only for length of test. What effect the method of
weighting had upon the correlations with intelligence and conduct ob-
tained for this article is not known and it has not seemed worthwhile to
find out inasmuch as the present total moral knowledge score seems to
be a more adequate measure of the true state of affairs than the results
secured previously with our preliminary tests. That is, we are probably
nearer the truth in this article than in our second article.

TABLE I
MORAL KNOWLEDGE AND CONDUCT

                           Mean    S.D.             Corrected  r with   Partial r,
                    N      M.K.    M.K.    Raw r    r          M.A.     M.A. constant

Behavior A. Copying from a Key
School A          290      121     20.7    -.252    -.305
School B          200      120     16.0    -.313    [?]
School C          194      128     16.2    -.280    [?]
Average                                             -.346      -.400    -.158

Behavior A. Misuse of Dictionary at Home
School A          270      117     18.5    -.223    -.236
School C          200      130     13.6    -.178    -.286
Average                                             -.261      [?]      -.061

Behavior B. Adding on Answers
School A          632      109     23.3    -.076    -.085
School B          200      120     16.4    +.029    +.032
School C          167      126     15.9    +.106    +.112
Average                                             +.020      +.07     -.023

Behavior C. Peeping
School B          210      120     16.0    -.244    -.297*     -.226    -.210

Behavior D. Faking Puzzles
School B          150      116     16.0    +.0164   +.020      +.062    -.023

Behavior E. Faking Contests
School A           88      118     20.0    -.039    -.045
School B          207      120     16.5    -.014    -.020
Average                                             -.032      +.052    -.075

Behavior F. Cheating at Parties
School B          200      120     16.0    -.142    -.173*

Behavior G. Theft of Money
School B          216      120     16.0    -.210    -.256*

Behavior H. Total C’s
School A          285      [?]     20      -.103
School B          210      120     16      -.263
School C          191      126     16      -.002
Average                                    -.123

Behavior I. Helpfulness
School A          484      117     24.2    -.002
School B          198      120     16.8    +.283
School C          166      129     16.0    +.218

*The same r for moral knowledge and intelligence, .566, is used for each partial
and represents the actual correlation between mental age and the total moral knowledge
scores of an unselected group.

†Behavior A is the same as the one referred to in the second article. The co-
efficients are closely similar to the -.385 quoted above. See the first footnote of this article.

In the second place the moral knowledge tests were scored from
the standpoint of the highest social standards of adults, as was indicated
in the first article. A high score represents knowledge of this ideal
adult code. It does not necessarily mean that the child’s own code is
like the adults’. And since in practicing deception the child has no idea
that any adult is aware of what is going on he naturally feels no need
of making his behavior conform to the adult code. If this is the psychology
of the situation,† then we could not expect very high negative
correlations between scores on the moral knowledge tests and the de-
ception scores, but we would expect to find evidence of a closer relation
between the child’s own code and his behavior than between his knowl-
edge of the adult code and his behavior.

We attempted, therefore, a method of scoring which would show
the child’s likeness to other children rather than his likeness to the adult
code. We used as a key the conventional or majority answers of the
children’s own papers. An item was called right when it corresponded
to the answers of more than half the children. This did not give us a
true children’s code, of course, but the resulting scores measure approx-
imation to the conventional code more nearly than did the scores used before.

*Corrected for restricted range.

†If this interpretation is correct, then the low correlations between moral knowl-
edge scores and helpful behavior indicate that the child’s knowledge of the ideal adult
code is relatively independent of his own code, for there is no doubt that he is aware
of what the approved helpful behavior is and that his altruistic acts are known to
others.

We first correlated the new conventional scores and the idealistic
scores, with the following results for Scale A:

             r
Test 1     .364
Test 2     .090
Test 3     .735
Test 4     .328
[?]        .594

These r’s suggest that the new conventional scores might give dif-
ferent results from the former scores when correlated with deception.
For one group we had over twenty cheating and stealing tests. Com-
bining the results of all these into one deception score and correlating
with the conventional moral knowledge scores we get the following for
Scale A:

 

             r
Test 1     .015
Test 2     .077
Test 3    -.058
Test 4     .115
[?]        .049

Exactly the same tests scored by the former ideal key yield these r’s:

Test 1    -.131
Test 2    -.039
Test 3    -.041
Test 4    -.039

The correlations with the conventional scores are low, but, interestingly
enough, most of them change from negative to positive, corroborating our
feeling that the conventional scoring procedure was bringing us nearer
to the child’s own code.

The use of a conventional key raises the mean and shortens the range
of effective differentiation on every test. It is obvious that an element
which all answer in the same way will not differentiate. Our low cor-
relations are partly due to this limitation in our second scoring method.
Apparently we shall have to study the tests item by item to find the
crucial elements.

First we will report on items referring specifically to cheating.
Tables II to IV show how the cheaters and non-cheaters answered these
items as well as a few others involving other types of behavior.

In the case of each table the elements referred to are printed in
full. The first column for each element gives the number of times the
individuals cheated on a test of the Behavior A variety, in which cheat-
ing consisted in copying answers from a key or dictionary. The next
columns give the number of individuals who chose the answer given at
the top of the column. The columns headed N show the number of
cases used for this particular study. In Table II, for example, we find
that on item No. 7 we have 341 cases, of whom 126 did not cheat at all,
135 cheated once, 64 twice, etc. Of the 126 honest ones, one gave the
answer labelled a), seven b), twenty-five c), twelve d), and eighty-one
e). The row marked Ave. gives the average cheating score for each
column, that is, for the pupils who gave the several possible answers.
Those who answered a) have a mean cheating score of 1.5 times per
pupil, whereas those who answered c) have a mean cheating score of
.8 times per pupil. The bottom row gives, for each answer, the percentage
of the pupils giving that answer who cheated.
Thus of those who said a), 91% cheated; of those who said b), 61%
cheated, etc.
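The Ave. and % C rows are simple column summaries. Here is a minimal sketch using the Element 7 counts of Table II. The entries for three and four cheats are partly inferred from the column totals of the scan, so treat them as a reconstruction. The sketch reproduces the figures quoted above: 1.5 and 91 per cent for a), 61 per cent for b), and .8 for c).

```python
# Number of children giving each answer to Element 7, indexed by times cheated (0 to 4).
element7 = {
    "a": [1, 4, 6, 0, 0],
    "b": [7, 6, 4, 1, 0],
    "c": [25, 14, 7, 3, 0],
    "d": [12, 5, 5, 3, 0],
    "e": [81, 106, 42, 8, 1],
}


def column_summary(counts):
    """Return (mean times cheated, percentage who cheated at least once) for one column."""
    total = sum(counts)
    mean_cheats = sum(times * n for times, n in enumerate(counts)) / total
    pct_cheaters = 100 * (total - counts[0]) / total
    return mean_cheats, pct_cheaters


for answer, counts in element7.items():
    mean_cheats, pct = column_summary(counts)
    print(f"{answer}) N={sum(counts):3d}  Ave.={mean_cheats:.1f}  %C={pct:.0f}")
```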

The elements reported on in Table II are taken from the original
form of the Comprehensions Test and are as follows:

7. If another pupil wants to copy your work and hand it in:

a. Let him do it and say nothing about it.
b. Let him do it but tell the teacher.
c. Don’t let him do it and say nothing.
d. Don’t let him do it and tell the teacher he wanted to.
e. Don’t let him and tell him that you don’t approve of cheating.

12. If you make a mistake and put a nickel for a penny in the slot:

a. Put in four slugs to even it up.
b. Call up the company and tell them about it.
c. Smash the thing and get your nickel.
d. Report it to the police.
e. Do nothing.

19. If the storekeeper gives you back too much change:

a. Tell him he made a mistake.
b. Go out and say nothing about it; spend the money for candy.
c. Keep the money and give it to the Christmas Fund.
d. Tell your mother about how clever you were.

24. If you find that someone has passed you a coin that is not real money but
looks like it at first glance:

a. Pass it on to someone else.
b. Throw it away; destroy it.
c. Try to find the person and give it back to him.
d. Keep it as a souvenir.
e. Ask an older person what to do with it.

26. If someone asks your opinion about a person whom you don’t know very well:

a. Say he is all right.
b. Say you don’t know him.
c. Say you think he is not a very good person.
d. Say, “Why do you ask me such a question?”

27. If your teacher asks you a question and you don’t know the answer:

a. Say, “I don’t know.”
b. Try to make her think you know but can’t express it.
c. Guess at the answer.
d. Say you think it is a foolish question.

TABLE II

CORRESPONDENCE OF COMPREHENSION AND CONDUCT

Element 7
C’s       a     b     c     d     e      N
0         1     7    25    12    81    126
1         4     6    14     5   106    135
2         6     4     7     5    42     64
3         0     1     3     3     8     15
4         0     0     0     0     1      1
T        11    18    49    25   238    341
Ave.    1.5   1.0    .8   1.0    .9     .9
% C      91    61    49    49    66     63

[Elements 12, 19, 24, 26 and 27: the rows by number of cheats and the Ave. rows are illegible in the source; totals and percentages follow.]

              a     b     c     d     e      N
Element 12
T            26   134   [?]     4   [?]    340
% C         [?]    61   100    75    61    [?]
Element 19
T           322   [?]    10     5     -    340
% C          63    67    60    60     -    [?]
Element 24
T             7   124    87   [?]    68    341
% C         [?]    64    67    62    59    [?]
Element 26
T            44   183     8   105     -    340
% C          66    57    63    71     -    [?]
Element 27
T           320   [?]    13     0     -    340
% C          64    57    31     -     -    [?]

The elements reported on in Table III are taken from the original
form of the Provocations Test. The subject is to indicate whether the
act described is right (R), excusable (Ex), or wrong (Wr).

1. Helen noticed that nearly everyone in the class was cheating on a test,
so she cheated too. ........................................ R Ex Wr
10. There was a contest among the classes for high grades. John cheated
on the test in order to help his class win. ................ R Ex Wr
15. The neighbors had been kept awake at night by two cats fighting. So
Fred set his bull dog on them. ............................. R Ex Wr
21. When Dick pointed his father’s revolver at Joe in fun, Joe said,
“Don’t you know better than that, you ____ fool?” ......... R Ex Wr
23. Helen knew that cucumber salad would make her sick, but she ate some
so as not to offend the hostess. ........................... R Ex Wr
TABLE III
CORRESPONDENCE OF IMAGINED PROVOCATION AND CONDUCT

          Element 1               Element 10              Element 15
C’s      R    Ex    Wr     N      R    Ex    Wr     N      R    Ex    Wr     N
0        5     8    87   100      1    15    83    99     34    30    30    94
1        1    10   118   129      4    12   113   129     56    30    45   131
2        3     4    83    90      4    13    74    91     39    30    22    91
3        1     3    29    33      4     4    25    33     10     9    11    30
4        0     1     5     6      1     1     4     6      3     2     1     6
T       10    26   322   358     14    45   299   358    142   101   109   352
Ave.   1.0   1.2   1.2   1.2    2.0   1.2   1.2   1.2    1.2   1.2   1.2   1.2
% C     50    69   [?]    72     93    67   [?]   [?]     76    70    72   [?]

          Element 21              Element 23
C’s      R    Ex    Wr     N      R    Ex    Wr     N
0       33    23    34    90     29    39    24    92
1       47    36    51   134     33    51    47   131
2       43    17    30    90     21    40    30    91
3       12     9     9    30      3    13    14    30
4        2     1     2     5      2     0     3     5
T      137    86   126   349     88   143   118   349
Ave.   1.3   1.2   [?]   1.2    1.0   1.2   1.4   1.2
% C     76    73    73    74     67    73    80    74

The elements reported in Table IV are taken from the original
Duties Test and are as follows:

8. To bet on your home team ............................... True ? False
36. To stick with your gang even when they are wrong ....... True ? False
69. To accept every decision of the umpire without question . True ? False
74. To pretend you understand a thing when you don’t ........ True ? False

TABLE IV
CORRESPONDENCE OF SENSE OF DUTY AND CONDUCT

            Element 8                 Element 36
C’s      +     -     ?     N       +     -     ?     N
0       77    64    42   183      39   110    31   180
1       93    59    21   173      55   105    14   174
2       43    29    14    86      26    51     6    83
3       20   [?]   [?]    39     [?]   [?]   [?]   [?]
4      [?]   [?]   [?]     8     [?]   [?]   [?]   [?]
T      238   164    87   489     134   291    59   484
Ave.   [?]    .9    .9   1.0     [?]   [?]   [?]   [?]
% C     65    61    52    63      71    62    47    63

            Element 69                Element 74
C’s      +     -     ?     N       +     -     ?     N
0      [?]    21   [?]   [?]       8   166   [?]   [?]
1      113    24    25   162       7   150    10   167
2      [?]   [?]   [?]   [?]     [?]    61     6    76
3      [?]   [?]   [?]   [?]       5    28     6    39
4        4     1     3     8     [?]   [?]   [?]   [?]
T      312    67    76   455      34   409    30   473
Ave.   [?]   [?]   [?]   [?]     [?]   [?]   [?]   [?]
% C     61    69    59    62     [?]    59   [?]   [?]

Looking back over Tables II to IV we find the following conspicu-
ous differences.

Comprehensions, Element 7. 91% of those who say it is all right
to let another pupil copy your work and hand it in as his own actually
cheated themselves.

Comprehensions, Element 12. 100% of those preferring to smash
the slot machine to recover their lost nickel actually cheated on a test.

Provocations, Element 10. 93% of those who thought it right for
John to cheat in order to help his class win actually cheated themselves.

It is noteworthy that these high agreements among the cheaters are
in regard to cheating in two cases, to property in the third and not in
any instance to other types of behavior. This is somewhat surprising,
since one would not expect a cheater to wear his heart on his sleeve.
The way he gives himself away in these particular instances may afford
suggestions as to how to build a test that will contain a large number
of elements having this attraction for the cheater.

Meanwhile, it may be found that other elements already used may
distinguish between the honest and the dishonest subjects. A complete
analysis of six hundred elements for all the cases available was hardly
justified in view of the improbability of success. So we selected the
twenty-five most deceptive individuals from a group that had over
twenty tests of deception and twenty-five cases from another group who
did not cheat on any one of ten tests. The first group cheated on the
average three out of every four chances. All these children had Scale
A, Form 2 of the Moral Knowledge Tests. We ran through the first
four tests—Causes, Duties, Comprehensions and Provocations—and
tabulated the way the honest and dishonest groups answered each element.
The items of Table V showed significant* differences between
the two groups. The score reported is the score chosen by the honest
group, the other group choosing some other answer. Those marked “2”
are weighted double because of the extreme difference between the
groups. Items scored as shown in this table give the honest a high
score and the dishonest a low score.
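Scoring with a key of this kind can be sketched as follows. The sketch assumes each key entry lists the answers the honest group chose, with a weight of 2 for the items marked “2”. The five entries come from the Provocations and Comprehensions columns of Table V; the two papers are hypothetical.

```python
# Subset of a Table V style key: (test, item) -> (honest answers, weight).
KEY = {
    ("Provocations", 6): ({"wr", "ex"}, 1),
    ("Provocations", 8): ({"ex"}, 1),
    ("Provocations", 12): ({"ex"}, 2),   # marked "2": weighted double
    ("Comprehensions", 3): ({"b"}, 1),
    ("Comprehensions", 9): ({"a"}, 1),
}


def honest_score(paper):
    """Sum the weights of keyed items answered as the honest group answered them."""
    return sum(
        weight
        for item, (accepted, weight) in KEY.items()
        if paper.get(item) in accepted   # unanswered items count nothing
    )


honest_like = {("Provocations", 6): "wr", ("Provocations", 8): "ex",
               ("Provocations", 12): "ex", ("Comprehensions", 3): "b",
               ("Comprehensions", 9): "a"}
dishonest_like = {("Provocations", 6): "r", ("Provocations", 8): "wr",
                  ("Provocations", 12): "r", ("Comprehensions", 3): "b",
                  ("Comprehensions", 9): "c"}
print(honest_score(honest_like), honest_score(dishonest_like))  # 6 1
```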

TABLE V
HONEST RESPONSES ON DISTINGUISHING ELEMENTS, SCALE A

Causes              Duties                  Provocations
Item  Score         Item  Score             Item  Score
  9   -               3   - or ?              6   wr or ex
 13   [?]            13   + “2”               7   wr or ex
 17   -             [?]   - or ?              8   ex
 18   -              17   [?]                12   ex “2”
 20   -              25   - or ?             13   wr or ex
 21   - “2”                                  14   wr or ex
 23   [?]           Comprehensions           15   wr or ex
 26   [?]             3   b
 27   -               9   a
 29   - “2”
 34   -
 35   -

Using Table V as a key, we scored the papers of the two groups,
counting only the items listed. Table VI shows the results:

TABLE VI
DISTRIBUTION OF MORAL KNOWLEDGE SCORES (SCALE A) OF HONEST AND
DISHONEST GROUPS

Score        Honest        Dishonest
[The distributions are illegible in the source.]

*Not statistically determined. The largest differences were used.

This seemed to warrant further study, so we did the same thing
for Scale B, Form 2, using another group of most honest cases, but the
same group of dishonest cases. The honest scoring of the most dif-
ferentiating elements was as follows:

TABLE VII

HONEST RESPONSES ON DISTINGUISHING TEST ELEMENTS, SCALE B

Applications (item: score)   5: 4 or 5;  7: 5;  8: 4 or 5;  9: 3 or 4
Recognitions (item: score)   5: C;  12: C;  13: C or ?;  18: C
Principles (item: score)     6: +;  7: -
Vocabulary (item: score)     12: 3;  15: 1;  18: 1;  19: 3;  20: 3;  26: 3;  34: 1;  36: 1
[The remaining entries in each column are illegible in the source.]

When scored as in Table VII the two groups of papers yield the
following distributions:

TABLE VIII

DISTRIBUTION OF MORAL KNOWLEDGE SCORES (SCALE B) OF HONEST AND
DISHONEST GROUPS

Score        Honest        Dishonest
[The distributions are illegible in the source; the score column runs from 0 to 34 in steps of 2.]

Scale B did not succeed as well as Scale A in distinguishing the
two groups, but the difference is still marked. Even so, we may be
“stacking the deck,” so to speak, by this method of selection. The same
questions might not distinguish between other groups. The apparent
differences in the separate items may be chance differences in each case,
so that by combining a lot of such chance differences we may have built
up a large total difference peculiar to the groups selected. When the
items are not selected because of their capacity to distinguish groups,
but are chosen at random, the chance differences between groups tend
to be neutralized. This can be tested by taking a fresh population of
honest and dishonest cases and using the same items as before.
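The “stacking the deck” danger can be illustrated with a small simulation. The simulation assumes items whose answers are pure chance and therefore unrelated to honesty. Items are chosen because they happen to separate one honest and one dishonest group of twenty-five, and the same items are then scored on a fresh pair of groups. The pool of 200 items is hypothetical, the 26 kept mirror the size of the Scale A key, and all numbers are simulated, not the Inquiry’s data.

```python
import random

N_ITEMS = 200      # hypothetical pool of test elements
N_SELECTED = 26    # number of elements kept, as in the Scale A key
GROUP_SIZE = 25


def simulate_group(rng):
    """Each child answers each item 'right' with probability .5, independent of honesty."""
    return [[rng.random() < 0.5 for _ in range(N_ITEMS)] for _ in range(GROUP_SIZE)]


def item_rates(group):
    """Proportion of the group answering each item 'right'."""
    return [sum(child[i] for child in group) / len(group) for i in range(N_ITEMS)]


def keyed_difference(honest, dishonest, items):
    """Difference between group means of the number of keyed items answered 'right'."""
    h_rates, d_rates = item_rates(honest), item_rates(dishonest)
    return sum(h_rates[i] - d_rates[i] for i in items)


rng = random.Random(1926)
honest, dishonest = simulate_group(rng), simulate_group(rng)
h_rates, d_rates = item_rates(honest), item_rates(dishonest)

# Keep the items that best separate THIS honest group from THIS dishonest group.
selected = sorted(range(N_ITEMS), key=lambda i: h_rates[i] - d_rates[i],
                  reverse=True)[:N_SELECTED]

original = keyed_difference(honest, dishonest, selected)
fresh = keyed_difference(simulate_group(rng), simulate_group(rng), selected)
print(f"original groups: {original:.2f}   fresh groups: {fresh:.2f}")
```

On the original groups the selected items show a large difference; on the fresh groups the difference shrinks toward zero, because the selection captured chance rather than honesty.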

We did this by selecting the most honest twenty-five and the most
dishonest twenty-five from a population of 500. The difference in this
case is much less significant, being only 2.7 times its standard error,
whereas it should be three times its standard error to be beyond the
range of chance. The difference between the cheating means of these groups
on Behavior B alone was twelve times its S.E.
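The “three times its standard error” rule is a critical ratio: a difference divided by its standard error. The following sketch assumes independent groups, so that the standard error of the difference is the square root of the sum of the squared standard errors of the two means. The means are the 24.8 and 18.8 reported later in this article, but the standard deviations are hypothetical, since the article prints only the ratios. The printed ratio for that pair, 4.4, implies a standard error of about 6.0 / 4.4 ≈ 1.36.

```python
import math


def se_mean(sd, n):
    """Standard error of a mean."""
    return sd / math.sqrt(n)


def critical_ratio(mean1, sd1, n1, mean2, sd2, n2):
    """Difference between two independent group means divided by its standard error."""
    se_diff = math.sqrt(se_mean(sd1, n1) ** 2 + se_mean(sd2, n2) ** 2)
    return (mean1 - mean2) / se_diff


# Hypothetical S.D. of 5.0 in each group of twenty-five.
ratio = critical_ratio(24.8, 5.0, 25, 18.8, 5.0, 25)
print(f"{ratio:.1f}", "beyond chance" if abs(ratio) >= 3 else "within chance")
```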

This method, then, seems unsuitable for discovering the relation
of moral knowledge to conduct. But having gone so far, we thought
we might as well see what else the differences among these several
populations might reveal. Apparently the moral knowledge scores are
due to other factors than those which determine the behavior scores.

First it should be noted that the honest group in Table VI is from
a private school of unusually fine moral tone. The deception group in
the same table is from an institution for children from broken homes.

The second group of honest cases used for comparison with these
institutional cases consisted of about half the same children as before
and half other children from the same school. The groups from the
population of 500 used as a check and referred to in the second para-
graph preceding, are from a suburban community and both the honest
and dishonest groups are from the same schools, so that the general back-
ground is relatively homogeneous. Let us call these groups HP1, HP2,
DI, HS, DS, respectively: HP1 and HP2 the most honest private school
children, DI the twenty-five most deceptive institutional children, HS
and DS the twenty-five most honest and twenty-five most dishonest
suburban children. Table IX displays some interesting comparisons
among these groups.

TABLE IX

DIFFERENCES (LEFT) BETWEEN HONEST AND DISHONEST GROUP MEANS AND
THESE DIFFERENCES DIVIDED BY THEIR STANDARD ERRORS (RIGHT)

                            HP1      HP2       DI       HS       DS
HP1  Moral Knowledge                 [?]      10.3      4.4      7.7
     Deception                       [?]      12.3      4.9     15.1
HP2  Moral Knowledge        [?]                8.2      [?]      [?]
     Deception            +18.6               10.4      [?]      [?]
DI   Moral Knowledge      +12.8    +11.3                [?]      2.5
     Deception           +145.9   +127.3                [?]      2.9
HS   Moral Knowledge       +5.9      [?]      [?]                2.7
     Deception            +19        [?]      [?]               12
DS   Moral Knowledge       +9.6      [?]      [?]      [?]
     Deception           +108        [?]     -38.1    +88.0

Remembering that any difference three or more times its S.E. (right
side of table) is beyond the range of chance, let us examine this table.
The biggest differences are between the groups on which the technique
was built, HP1, HP2 and DI, the test questions being selected because
they differentiated these groups, the private school most honest and the
institutional most dishonest. The next largest differences occur in the
two instances in which one of these original groups is compared with a
fresh group, viz., HP1 and DS (private honest and suburban dishonest)
and DI and HS (institutional dishonest and suburban honest). When
entirely fresh populations are used for the honest and dishonest groups
(HS and DS), the moral knowledge difference is not quite beyond the
limits of chance although the deception difference is considerable. Com-
parison of the suburban and institutional dishonest groups, DI and DS,
shows that there is a slight difference in favor of the suburban group
on both moral knowledge and deception tests. Comparison of the pri-
vate school honest and suburban honest groups (HP1 and HS) shows
a curious and significant difference in both moral knowledge and decep-
tion. These moral knowledge scores, it must be remembered, are based
on only twenty-six elements. The private school mean (honest groups)
is 24.8 as against 18.8 for the suburban honest groups, a difference 4.4
times its standard error. This is a more significant difference than the
difference between the moral knowledge scores of the two suburban
groups. When we get away from the original two groups by means
of which the elements were chosen, their power to distinguish dis-
appears. This is particularly conspicuous when it is noted that these
two suburban groups differ in deception by twelve times the S.E. of the
difference.

We must conclude, therefore, that while the responses on the
selected elements are much the same for two dishonest groups, they differ so
much between two honest groups as to eliminate their discriminative capacity. But
the comparability of the differences between the HP1 and HS groups in moral
knowledge (4.4) and deception (4.9) and between the DS and DI groups in
moral knowledge (2.5) and deception (2.9), as well as the relations between
HP1 and DS, and HS and DI (see Table IX), suggest, if they do not demon-
strate, a relation of some kind between the moral knowledge responses
and conduct. But the great difference in answers between the two
honest groups, HP1 and HS, suggests also that the relation is slight
and that other factors, such as the general cultural differences often
found between distinct social groups such as public and private schools,
and institutions, are more significant in determining correlations be-
tween knowledge and conduct than are any logical relations in the minds
of individuals.

The facts just discussed are graphically portrayed in the accom-
panying chart, from which it will be seen that the various groups occupy the
same relative position in both moral knowledge and deception.

[Chart: means in moral knowledge (left) and deception (right) for groups HP1, HS, DS and DI.]

If, as has just been suggested, the group as a unit should exhibit
higher correlations between such factors as knowledge and conduct than
does the individual as a unit, many interesting problems of interpreta-
tion would be raised. It has seemed worthwhile, therefore, to make an
intensive study of the relation between moral knowledge and conduct
of social groups each of which is relatively homogeneous. The con-
clusions of this study will be reported in the next article.

SIXTH ARTICLE
GROUP STANDARDS AND GROUP CONDUCT

The previous paper in this series reported two conclusions and two
provocative suggestions covering the extent to which standards and conduct
are psychologically related in the behavior of individuals. The scores on
our moral knowledge tests, purporting to measure general level of compre-
hension of ideal conduct, proved to have very little in common with either
deceptive or altruistic behavior. The way in which certain test items were
answered by honest as contrasted with dishonest children seemed to offer
a fruitful lead regarding the way to build a test of moral opinion which
might show a better correlation with conduct. We were not able, how-
ever, to select from our own tests a group of items which would consistently
discriminate between honest and dishonest children. Finally, we drew atten-
tion to the fact that close correspondences existed between the most honest
sections and the most dishonest sections of certain school populations with
respect to their mean differences in both moral knowledge and deception.

From Table IX of the last article it appears that Honest Group HP1
differs from Honest Group HS by the same amount in both moral knowl-
edge and deception; Dishonest Group DI differs from Honest Group HP1
84 per cent as much in knowledge as in conduct and from Honest Group
HP2 78 per cent as much in knowledge as in conduct. Dishonest Group
DS differs similarly from Honest Group HP1 in about the same ratio as
Dishonest Group DS differs from Honest Group HS.
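The “per cent as much” comparisons appear to be ratios of the critical ratios in Table IX of the previous article, with the knowledge ratio divided by the conduct ratio. That reading is an assumption; it is checked here only for the DI and HP1 pair.

```python
# Critical ratios (difference / S.E.) from Table IX of the fifth article, DI compared with HP1.
MK_RATIO_DI_HP1 = 10.3         # moral knowledge
DECEPTION_RATIO_DI_HP1 = 12.3  # deception

share = MK_RATIO_DI_HP1 / DECEPTION_RATIO_DI_HP1
print(f"{share:.0%}")  # 84%
```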

The means of four of these groups were charted on the last page of the
previous article so as to indicate the correlation.

All this suggests a group similarity in behavior on moral knowledge
tests and deception tests which we have thought worth investigating.

In reporting the similarity of groups in moral knowledge and conduct
we are not engaging in controversy over the psychological nature of a group.
We shall show, however, that when one relatively homogeneous group is
compared with another, differences in both knowledge and conduct are found
which cannot be accounted for by chance or by differences in intelligence and
which also correlate more highly than do knowledge and conduct in indi-
viduals. These facts bear out the suggestion that there is a community of
code and conduct in homogeneous groups which is not a function of indi-
vidual integration.

In this paper two types of dishonest tests and a record of helpful acts
are used for the conduct scores, and eight different moral knowledge tests,
wherever these could be matched, case for case. The classroom group is
always the unit used. Table I shows the correlations between the available
moral knowledge test scores and a type of dishonesty called Behavior C,
which consists in making illegitimate use of an answer sheet while taking
a test or grading one’s own paper.

There were three such tests involving arithmetic problems, completion
problems, and information problems. These three are combined in a single
classroom or school deception score in Table I. The scores all represent
amounts of deception. Classrooms doubtless differ in code in this matter
as well as in conduct, but these codes are not qualitatively revealed in the
moral knowledge scores, which indicate, rather, a kind of level of com-
prehension as to what is expected of children. If a genuine code were avail-
able the correlations would presumably run much higher.
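A toy example shows why classroom means can correlate more highly than individual scores. It assumes each child’s score is a classroom effect plus individual variation that averages out within the room and is unrelated between the two measures. The numbers are invented, not the Inquiry’s data.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Three hypothetical classrooms of four children each.
class_effects = [-1, 0, 1]
mk_noise = [2, -2, 2, -2]    # individual variation in moral knowledge
dec_noise = [2, 2, -2, -2]   # individual variation in deception, unrelated to mk_noise

knowledge, deception, rooms = [], [], []
for room, effect in enumerate(class_effects):
    for m, d in zip(mk_noise, dec_noise):
        knowledge.append(effect + m)    # better classrooms know more...
        deception.append(-effect + d)   # ...and deceive less
        rooms.append(room)


def room_means(values):
    """Classroom means of a list of individual scores."""
    return [sum(v for v, r in zip(values, rooms) if r == room) / rooms.count(room)
            for room in range(len(class_effects))]


print(f"individual r = {pearson(knowledge, deception):.3f}")                         # -0.143
print(f"group r      = {pearson(room_means(knowledge), room_means(deception)):.3f}")  # -1.000
```

The classroom effect is shared by knowledge and conduct, while the individual variation is not; averaging within rooms removes the individual variation and leaves the shared effect.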

TABLE I
INDIVIDUAL AND GROUP CORRELATIONS BETWEEN MORAL KNOWLEDGE AND
DECEPTIVE BEHAVIOR C
(Total School Score)

                      Ind. r’s        Group r’s                       Partials
                      1       2       3        4      5      6        7
M.K. Tests            Raw     Corr.   r        P.E.   N      Groups   (int. constant)
A1 Causes             −.04    −.05    +.28     .12    435    13
A2 Duties             −.25    −.32    −.35     .09    450    15
A3 Comprehensions     −.18    −.24    −.80*    .04    457    16       −.73
A4 Provocations       −.15    −.20    −.53*    .09    307    13       −.20
B2 Recognitions       ?       ?       −.64*    .06    ?      ?        ?
B3 Principles         −.26    −.36    −.49     .09    302    8
B4 Applications       −.40    −.52    −.49     .09    243    9
B5 Vocabulary         −.15    −.18    −.51*    .08    540    18       −.05

? Illegible in the source copy.

The columns of Tables I, II and III have the following meanings: At the
left are the separate moral knowledge tests, referred to by name, scale and number.
Col. 1 gives the r’s between individual moral knowledge and deception scores. Col. 2
gives these r’s corrected for chance errors (that is, for attenuation). Col. 3 gives the
r’s between the classroom means in moral knowledge and deception. Col. 4 gives the
P.E.’s of Col. 3. Col. 5 is the number of cases in each population. Col. 6 gives the
number of classroom groups. Col. 7 shows the partial r’s between moral knowledge
and deception group means with intelligence held constant.
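Column 2 applies a correction for attenuation. Assuming the authors used Spearman’s standard formula (the reliabilities themselves are not reported in this article), the correction divides the raw r by the square root of the product of the two tests’ reliabilities. The sketch below is a minimal Python illustration; the reliability values in it are hypothetical.

```python
def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's correction: estimated r between the 'true' x and y scores.

    r_xy is the observed correlation; r_xx and r_yy are the reliabilities
    of the two measures, each assumed to lie in (0, 1].
    """
    if not (0 < r_xx <= 1 and 0 < r_yy <= 1):
        raise ValueError("reliabilities must lie in (0, 1]")
    return r_xy / (r_xx * r_yy) ** 0.5


# Hypothetical reliabilities of .80 for both tests (not taken from the article).
corrected = correct_for_attenuation(-0.25, 0.80, 0.80)
print(f"{corrected:.4f}")

# Working backwards from a printed pair: for A2 Duties in Table I,
# raw / corrected = -.25 / -.32, implying sqrt(r_xx * r_yy) of about .78
# (subject to rounding in the printed figures).
implied_factor = -0.25 / -0.32
print(f"{implied_factor:.2f}")
```

Because the correction divides by a number no larger than 1, a corrected r is always at least as large in absolute value as the raw r. This is why the text treats Column 1, not Column 2, as the fairer comparison with the group r’s.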

Table II presents the same facts for Behavior A, a type of dishonesty
which consists in adding on more scores in a speed test when one is
supposed to be correcting his paper. There were six such opportunities in
the test.

TABLE II
INDIVIDUAL AND GROUP CORRELATIONS BETWEEN MORAL KNOWLEDGE†
AND DECEPTIVE BEHAVIOR A

        Ind. r’s        Group r’s
        1       2       3        4      5      6
        Raw     Corr.   r        P.E.   N      Groups
A1      −.14    −.22    +.265    .12    780    30
A2      −.18    −.30    −.367    .12    710    28
A3      −.08    −.13    −.087    .13    780    30
A4      −.09    −.13    −.435    .09    780    30
B2      +.03    +.04    −.177    ?      458    17
B3      −.06    −.11    +.382    .12    458    ?
B4      −.09    −.14    −.443*   .10    419    14
B5      −.06    −.07    −.338    .12    528    19

? Illegible in the source copy.

The columns of Table II have the same meanings as those of Table I.

Table III gives the correlations for general helpful behavior, called
Behavior H. The helpfulness scores are ratios based on teachers’ estimates
of the amount of co-operation each child gave to each of several class and
school service projects, and the number of such projects.

TABLE III
INDIVIDUAL AND GROUP CORRELATIONS BETWEEN MORAL KNOWLEDGE†
AND BEHAVIOR H

        Ind. r’s   Group r’s                         Partials
        1          3        4      5      6          7
        Raw r      r        P.E.   N      Groups     (int. constant)
A1      +.24       .714*    .06    387    13         +.65
A2      +.26       .685*    .06    359    12         +.63
A3      +.12       .362     .10    386    3
A4      +.16       .404     .10    400    13
B2      +.17       .363     .10    221    9
B3      +.24       .730*    .05    222    9          +.75
B4      +.18       .758*    .05    152    6          +.73
B5      +.45       .650     .07    258    10

The columns of Table III have the same meaning as those of Tables I and II.
The r’s of Col. 1 could not be corrected for attenuation since the reliability of the
helpfulness scores is not known.

†The moral knowledge scores in Tables II and III are from a revised form of
those previously used which in each case is less than half the length of the original.

The first thing to be noticed in these tables is the fact that the group
r’s of Column 3 are, with one exception, higher than the individual r’s of
Column 1, and almost always higher than these r’s even when they are
corrected for attenuation in Column 2. Column 1 gives the fairer comparison
since in groups made up by a random selection of cases $r_{M_x M_y} = r_{xy}$,
as will be pointed out in a moment. In many cases the group r’s exceed the
individual r’s in ratios ranging from 4 to 1 up to 7 to 1. Those that are
significantly greater than the individual r’s are starred (*).

Table I shows that in the case of Behavior C at least four of the moral
knowledge tests correlate significantly higher in the case of the group means
than in the case of the individual scores. Behavior A, however, shows
only one significant difference, although in each case the group r’s
are larger than the individual r’s. Four of the moral knowledge tests show
a significant difference between the individual and group r’s for helpful
behavior (Table III), and most of these r’s run higher than those for deception.
The four that are starred for helpfulness are precisely the four that are not
starred for the deception scores of Behavior C in Table I.

These figures now set our problems for us: Classroom groups exhibit
a genuine association of scores on certain moral knowledge tests and certain
conduct tests which is not accounted for by the association of these same
facts in the individuals who make up these groups. Individuals who rate
high in moral knowledge do not necessarily rate high in conduct. In fact
the relation between the two is nearly negligible. But groups that rate high
in moral knowledge do also rate high in conduct, under certain conditions.
That a relation of this sort between individual r’s and group r’s is not a
chance result has been shown by Pearson, who demonstrated that if a
series of groups are random samples of the entire population, the r’s between
the means of the groups will be the same as the r’s based on individual
scores.*
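Pearson’s result can be checked by simulation, following the card-shuffling description in the footnote to this passage. The Python sketch below is an illustration of ours, not the authors’ procedure: it draws a hypothetical population with an individual r of about .30, shuffles it, deals it into random piles, and correlates the pile means.

```python
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def group_means_r(pairs, group_size):
    """Correlate the X means with the Y means of consecutive piles of `pairs`."""
    mean_x, mean_y = [], []
    for start in range(0, len(pairs), group_size):
        pile = pairs[start:start + group_size]
        mean_x.append(sum(x for x, _ in pile) / len(pile))
        mean_y.append(sum(y for _, y in pile) / len(pile))
    return pearson_r(mean_x, mean_y)


rng = random.Random(1926)
RHO = 0.3  # hypothetical individual-level correlation
pairs = []
for _ in range(20_000):
    x = rng.gauss(0, 1)
    y = RHO * x + (1 - RHO ** 2) ** 0.5 * rng.gauss(0, 1)
    pairs.append((x, y))

individual_r = pearson_r([x for x, _ in pairs], [y for _, y in pairs])
rng.shuffle(pairs)                         # "shuffle the cards"
random_group_r = group_means_r(pairs, 50)  # deal them into 400 piles of 50
print(f"individual r = {individual_r:.3f}")
print(f"r of random-pile means = {random_group_r:.3f}")
```

The two figures should agree within chance variation. With only a dozen or so piles, as in a single school, that chance variation is much larger than it is with the 400 piles used here.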

In our case, the groups are obviously not selected at random so far
as age is concerned since they are ordinary grade groups, the members of
which have been together for the most part for some time. It may be that
the mere mechanical age and intelligence differentiation of such grade groups
would account for the likeness found in knowledge and conduct.

This explanation depends upon the existence of correlations between
age or intelligence, on the one hand, and both moral knowledge and the
kinds of conduct studied, on the other.
Chronological age, we know, does not correlate with either Behavior C or H.
It does slightly in the case of Behavior A, but this factor has already been
eliminated from the scores reported for this behavior. Differences in age,
therefore, cannot account for these correlations.

Differences between groups in intelligence, then, must be considered
as a possible explanation of our superior group r’s. Fortunately, intelligence
scores were secured in the course of our study which enable us to test this
hypothesis in two different ways. The first and most obvious procedure
is to partial out the variability in intelligence. This we have done for
Behaviors C and H in the starred cases where the differences between the
group and individual r’s are statistically significant, and the results are to be
found in Column 7 of Tables I and III. These partials are, of course,
highly unreliable, but they are large enough in several cases to indicate that
intelligence is not the only factor at work to produce group similarity of
knowledge and conduct. Strictly speaking these partials should be compared
with corresponding partials for Column 1. We have not computed these, as
the only effect would be, in most cases, to increase the difference between
the individual and group r’s and so still further undermine the suggestion
that the group r’s are to be accounted for by differences in the mean
intelligence of the classrooms.

*See Kelley, Truman L. Statistical Method, page 178, Formula 118. More
explicitly, if each pupil’s moral knowledge and deception scores were written on a card,
and all the cards were shuffled and then sorted by chance into piles, the correlation
between the mean moral knowledge scores and mean cheating scores of these
piles would be the same as the r between the individual scores if they were thrown
into one plot (within the limits of chance variation).
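Column 7 of Tables I and III holds first-order partial correlations. The standard formula for the correlation of x and y with z held constant is sketched below as a short Python function. The inputs in the usage line are hypothetical, since the article does not print the correlations of intelligence with the classroom means.

```python
def partial_r(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation r_xy.z: x and y with z held constant.

    Here x might be a classroom's mean moral knowledge score, y its mean
    deception score, and z its mean intelligence score.
    """
    denom = ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical inputs: a group r of -.80, with intelligence correlating
# +.50 with moral knowledge and -.30 with deception across classrooms.
print(round(partial_r(-0.80, 0.50, -0.30), 3))  # -0.787
```

When intelligence is uncorrelated with either variable, the partial equals the original r. The further the partial falls below the group r, the more of the group association intelligence accounts for.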

The relatively low group r’s and high P.E.’s in the case of Behavior A
make the partial correlation technique unavailable here. Hence we have
adopted a different method of testing the intelligence hypothesis in this case.
Our criterion depends on the following statistical relations among
random samples: If several batches of about thirty each are drawn at random
from a large population, the mean and the standard deviation of each
batch will be the same as the mean and the standard deviation of the whole
population, within the limits of determinable errors due to chance variations
among the samples.* The means of the random samples will form a normal
distribution, the mean of which will be the same as the mean of the larger
population and the standard deviation of which will equal the average of the
standard errors of the sample means, each of which is $\sigma/\sqrt{N}$, where
σ is the S.D. of the sample and N the number of cases in it. If the
samples are not random (not mere chance accumulations of individuals),
the average of the standard errors of the sample means will be less than the
S.D. of the group means. The reason for this is that when a selective force
is operating to make the members of a group resemble one another more
than they would by chance, the range and therefore the S.D. of the scores
in the trait concerned is less than for a random sample or for the total
population of which the sample is a selection. Hence the average of a
series of such non-random S.D.’s is less than the S.D. of the whole population.
If the selective force operates unevenly from group to group, the range
and therefore the S.D. of the group means will be greater than in the case
of groups chosen at random. Consequently the average of the standard
errors of the group means ($\sigma/\sqrt{N}$) is bound to be less than the S.D. of
these group means.

Applying this criterion to our data, we shall show that even when
class groups are random samples with respect to intelligence, or do not differ
from one another significantly in this particular, they nevertheless do differ
significantly from one another in both moral knowledge and conduct. Under
these circumstances, such superiority of group over individual r’s between
knowledge and conduct as is secured may with some confidence be attributed
to some common factor other than intelligence.

In applying this criterion we used seven classroom groups whose mean
intelligence scores were close together and who had Scale A of the Moral
Knowledge tests, nine such groups who had Scale B, and ten groups of
homogeneous intelligence who were tested with Behavior A. The results are
summarized in Table IV.

*See Yule, G. U. An Introduction to the Theory of Statistics, page 344.
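The criterion can be illustrated with simulated data. The Python sketch below uses hypothetical numbers throughout: it compares 200 groups of 25 drawn at random with 200 groups whose means are shifted by a group-level influence. It models only the uneven, group-to-group part of the selective force and leaves the within-group S.D.’s unchanged. For the random groups, the ratio of the S.D. of the group means to the average S.E. of those means should come out near 1. For the "selected" groups it should come out well above 1.

```python
import random
import statistics


def criterion_ratio(groups: list[list[float]]) -> float:
    """S.D. of the group means divided by the average S.E. of those means."""
    means = [statistics.fmean(g) for g in groups]
    # S.E. of each group mean = S.D. of that group / sqrt(N of that group)
    standard_errors = [statistics.stdev(g) / len(g) ** 0.5 for g in groups]
    return statistics.pstdev(means) / statistics.fmean(standard_errors)


rng = random.Random(7)
N_GROUPS, GROUP_SIZE, POP_MEAN, POP_SD = 200, 25, 50.0, 10.0
GROUP_EFFECT_SD = 4.0  # hypothetical strength of the group-level influence

random_groups = [
    [rng.gauss(POP_MEAN, POP_SD) for _ in range(GROUP_SIZE)]
    for _ in range(N_GROUPS)
]

selected_groups = []
for _ in range(N_GROUPS):
    shift = rng.gauss(0.0, GROUP_EFFECT_SD)  # one shift shared by the whole group
    selected_groups.append(
        [rng.gauss(POP_MEAN + shift, POP_SD) for _ in range(GROUP_SIZE)]
    )

ratio_random = criterion_ratio(random_groups)
ratio_selected = criterion_ratio(selected_groups)
print(f"random groups:   ratio = {ratio_random:.2f}")
print(f"selected groups: ratio = {ratio_selected:.2f}")
```

With only seven to ten groups, as in the actual data, the ratio for random groups fluctuates considerably around 1. A value well above 1 is therefore more telling than one only slightly above it.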
TABLE IV
CRITERION FOR RANDOM SAMPLING IN REGARD TO INTELLIGENCE, MORAL
KNOWLEDGE AND BEHAVIOR A

                    1         2       3          4              5
                    No. of    Ave.    S.D. of    Ave. S.E. of   Ratio of
                    groups    N       Means      Means          3 to 4
Scale A
  Intelligence      7         23      3.4        3.5            .97
  Causes            7         23      3.7        1.04           3.56
  Duties            7         23      1.6        (1.0)          1.60
  Comprehensions    7         23      .72        .40            1.80
  Provocations      7         23      1.0        .78            1.28
Scale B
  Intelligence      9         24      3.2        3.5            .91
  Recognitions      9         24      3.0        2.5            1.20
  Principles        9         24      1.4        .45            3.1
  Vocabulary        9         24      4.4        1.8            2.44
Behavior A
  Intelligence      10        25      3.2        3.5            .91
  Deception         10        25      12.96      5.24           2.47

(1.0) Illegible in the source copy; inferred from Cols. 3 and 5.

Thus we see from Table IV that in each set of groups the average of
the S.E. of the intelligence means is slightly greater than the S.D. of the
means, indicating that the groups selected are of the same level of intelligence,
or, in other words, random with respect to intelligence. In each of the moral
knowledge tests, however, and also in the deception test the ratios of Column
5 show the S.D.’s of the means to be greater, often very much greater,
than the S.E.’s of the means, demonstrating that these groups are not ran-
dom samples but show the presence of a selective force, operating inde-
pendently of intelligence, to produce variation in the means.
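As an arithmetic check, Column 5 of Table IV can be recomputed from Columns 3 and 4. The Python sketch below simply divides the printed figures. The value of 1.0 for the S.E. of the Duties means is inferred from the printed ratio (1.6 / 1.60), since it is not legible in our copy. A ratio above 1 is read here, as in the text, as more variation among the means than chance would give; the sketch does not test whether that excess is statistically significant.

```python
# (set of groups, trait, S.D. of means, average S.E. of means), as in Table IV.
# The Duties S.E. of 1.0 is inferred from its printed ratio, not read directly.
TABLE_IV = [
    ("Scale A", "Intelligence", 3.4, 3.5),
    ("Scale A", "Causes", 3.7, 1.04),
    ("Scale A", "Duties", 1.6, 1.0),
    ("Scale A", "Comprehensions", 0.72, 0.40),
    ("Scale A", "Provocations", 1.0, 0.78),
    ("Scale B", "Intelligence", 3.2, 3.5),
    ("Scale B", "Recognitions", 3.0, 2.5),
    ("Scale B", "Principles", 1.4, 0.45),
    ("Scale B", "Vocabulary", 4.4, 1.8),
    ("Behavior A", "Intelligence", 3.2, 3.5),
    ("Behavior A", "Deception", 12.96, 5.24),
]

ratios = {}
for group_set, trait, sd_of_means, avg_se in TABLE_IV:
    ratio = sd_of_means / avg_se  # Column 3 divided by Column 4
    ratios[(group_set, trait)] = ratio
    reading = "more variable than chance" if ratio > 1 else "consistent with random sampling"
    print(f"{group_set:<11} {trait:<15} {ratio:5.2f}  {reading}")
```

The recomputed ratios agree with the printed Column 5 to within rounding. Only the three intelligence rows fall below 1, which is the pattern the text describes.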

We have approached the suggestion that there is genuine group unity
of standard and conduct by several steps which may be summarized as
follows:

1. The correlation of groups, treated as units, with respect to level of
moral knowledge and conduct, is not altogether due to the correlation of
these two factors in the individuals composing the groups.

2. This correlation is not the product of a large number of uncorre-
lated factors (chance).

3. This correlation is not due to differences between the groups in age
or in intelligence.

4. The variability of the groups among themselves is not such as could
be produced by chance or by differences in age or intelligence.

5. Since the group r’s are larger than the individual r’s, they cannot
be accounted for by a causal relation between moral knowledge and
conduct, for such a relation could operate only through the minds of the
individuals concerned.

6. Hence the superiority of the group r’s must be due to the reaction
of individuals to some influence which tends both toward higher code and
more social conduct (and vice versa) without these being integrated in the
minds of the individuals.

Such a common influence might be exerted either by the group as a
whole through a growing tradition or by the teacher or by the school system,
or by all three. No matter how much it affects either conduct or code for
the better, if the correlations indicate the absence of individual integration,
this improvement can hardly be regarded as growth in character.

Lest this evidence from group correlations be regarded as insubstantial
we will illustrate how it is possible to get a high correlation between group
means when the r between individual scores is zero. It all depends on how
the groups are constituted or selected.

[Scattergram No. 1 (r = 0): a hypothetical population of 144 cases, each shown as a dot in a 7 × 7 correlation chart of X and Y scores; the frequencies along each axis are 6, 18, 30, 36, 30, 18, 6.]

[Scattergram No. 2 (r = 0; r between group means = +1.00): the same 144 cases, each dot replaced by the letter (a to h) of the group to which the individual is assigned.]

Consider the accompanying Scattergram No. 1, of 144 cases in which
r is 0.00.

Now we can select from these cases eight groups of eighteen cases each
in such a way as to yield a correlation of either plus or minus 1.00 between
the means of these groups, or of any amount in between, according to the
way in which the groups are selected out of the total population of 144.
Scattergram No. 2 shows a selection of eight such groups whose means will
correlate +1.00. Every dot on Scattergram No. 1 is an individual. We
now put eighteen of these individuals in group a, eighteen in group b,
eighteen in group c, etc., selecting them from the total population with
great care so that the mean score of the a’s with respect to the scores plotted
on the X axis will equal the mean of the scores plotted on the Y axis, and
so also for groups b, c, d, etc. Scattergram No. 2 substitutes for the dots
of Scattergram No. 1 the letters of the groups to which we have assigned
the individuals and Table V gives the distributions of the scores of the
individuals thus grouped for both the X and Y variables. It is obvious
that the mean of each group for one variable is identical with the mean
for the other variable and that the r of these means will therefore be +1.00.

TABLE V
DISTRIBUTION OF SCORES OF INDIVIDUALS IN GROUPS a, b, c, ETC., ON X AND Y
AXES OF SCATTERGRAM NO. 2

[Table body illegible in the source copy: for each of groups a to h, the frequencies of scores on the X axis and on the Y axis, with the group means in the last row.]

It will be noted that the eighteen individuals composing each group
(there are 18 a’s, 18 b’s, etc.) are so selected as to place an equal number
on each side of the principal diagonal, and in complementary cells. If this
process were reversed and they were balanced across the opposite diagonal
the r would be −1.00.

[Scattergram No. 3 (r = +1.00): the group means of Table V, X plotted against Y, both axes running from 50 to 85.]

When the means of the group scores for each axis (Table V) are
plotted in Scattergram No. 3 their close correlation is seen at once.

Scattergram No. 2 shows what rigid selection will do. This is a purely
theoretical arrangement and would not occur in an ordinary population.
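The balancing principle can be reproduced in a few lines of Python. The population here is a hypothetical stand-in, not the authors’ Scattergram No. 1: it has four cases in every cell of a 6 × 6 grid, so X and Y are independent and the individual r is exactly 0. Grouping cases by x + y puts every case (x, y) in the same group as its mirror image (y, x) across the principal diagonal, so each group’s X mean equals its Y mean. Grouping by x − y instead pairs (x, y) with (7 − y, 7 − x) across the opposite diagonal. Unlike the authors’ eight groups of eighteen, these groups are of unequal size.

```python
from collections import defaultdict


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def group_means_r(cases, key):
    """Group cases by key(case) and correlate the groups' X and Y means."""
    groups = defaultdict(list)
    for case in cases:
        groups[key(case)].append(case)
    mean_x = [sum(x for x, _ in g) / len(g) for g in groups.values()]
    mean_y = [sum(y for _, y in g) / len(g) for g in groups.values()]
    return pearson_r(mean_x, mean_y)


# Hypothetical population: 4 cases in each cell of a 6 x 6 grid (144 cases).
population = [(x, y) for x in range(1, 7) for y in range(1, 7) for _ in range(4)]

r_individual = pearson_r([x for x, _ in population], [y for _, y in population])
# Balanced across the principal diagonal: group means correlate +1.
r_plus = group_means_r(population, key=lambda c: c[0] + c[1])
# Balanced across the opposite diagonal: group means correlate -1.
r_minus = group_means_r(population, key=lambda c: c[0] - c[1])
print(r_individual, r_plus, r_minus)
```

The mirror-image balancing alone forces the group means into line. Nothing about the relation of X to Y within individuals is involved, which is the point of the illustration.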

Scattergram No. 4 shows a hypothetical case representing a simplification
of the facts actually found. Here the r of the whole population is
−.316 and $r_{M_x M_y} = -.579$. The distribution of scores for each group is
given in Table VI, and the group means are graphed in Scattergram No. 5,
which shows the variability among the group means on both axes. Some
groups are high on the X axis (moral knowledge) and low on the Y axis
(deception) (g and h of Scattergram 4), others are low in moral knowledge
and high in deception (a and b of Scattergram 4), and others are scattered
through the center of the graph.

TABLE VI
DISTRIBUTION OF SCORES OF INDIVIDUALS IN GROUPS a, b, c, ETC., ON X AND Y
AXES OF SCATTERGRAM NO. 4

[Table body illegible in the source copy: for each of groups a to h, the frequencies of scores on the X axis and on the Y axis.]

Group means (X, Y): a 49, 98; b 63, 98; c ?, 85; d 50, 68; e 78, 78;
f 64, 62; g 78, 49; h 102, 46.

? Illegible in the source copy.

[Scattergram No. 4 (r = −.316; r between group means = −.579): the hypothetical population of 144 cases, each plotted as the letter (a to h) of its group; X axis moral knowledge, Y axis deception.]

[Scattergram No. 5 (r = −.579): the group means of Table VI, X plotted against Y; X axis running from 45 to 100, Y axis from 45 to 95.]

These imaginary cases are given thus fully to illustrate forcefully the
fact that when the r between individual scores is zero or near zero (as is
the case in Tables I-III), a correlation as high as .70 or .80 between the
means of samples of the whole population requires very rigorous selection,
such as could be produced only by some factor tending strongly to vary
the group means on both axes in the same direction from the mean of the
total population.

Whatever this selective factor or influence may be it (or they) must
operate on both variables. When the individual r is zero or thereabouts,
if the selection of groups took place on one variable only, the r between
the group means would remain zero. We showed earlier in the paper
that the classroom groups are not random samples (are “selected”* by some
influence which makes them vary among themselves more than they would
by chance) in respect to both moral knowledge and conduct. The r’s
indicate that this variation is in the same or opposite direction on the two axes.
Furthermore, in the case of group r’s of .65 or more the groups must either
lie in clusters or constellations in the correlation plot or else must be bal-
anced across one of the main diagonals (as in Scattergram No. 2). The
probability is that in any actual case they will be somewhat clustered and
also fairly balanced as in No. 4.

*The individuals are not “selected” by us into groups because they are alike, as one
would make some sort of arbitrary selection of those over five feet, those between
four feet six inches and five feet, etc., but are actual groups whose differences
among one another are due partly to the differences among the homes from which
the children of different schools come and partly to influences operating within each
group in the course of its common experiences.
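The claim that selection on one variable alone leaves the group r at zero can be checked by simulation. In this hypothetical Python sketch, X and Y are independent. Sorting the cases on X and dealing them into consecutive groups of thirty selects on X only. Sorting on X + Y stands in for an influence acting on both variables at once.

```python
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def chunk_means(cases, size):
    """Split already-ordered cases into consecutive groups; return X and Y means."""
    mean_x, mean_y = [], []
    for start in range(0, len(cases), size):
        chunk = cases[start:start + size]
        mean_x.append(sum(x for x, _ in chunk) / len(chunk))
        mean_y.append(sum(y for _, y in chunk) / len(chunk))
    return mean_x, mean_y


rng = random.Random(42)
cases = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(3000)]  # independent X, Y
r_individual = pearson_r([x for x, _ in cases], [y for _, y in cases])

on_x_only = sorted(cases, key=lambda c: c[0])      # selection on X alone
on_both = sorted(cases, key=lambda c: c[0] + c[1])  # selection acting on X and Y

r_x_only = pearson_r(*chunk_means(on_x_only, 30))
r_both = pearson_r(*chunk_means(on_both, 30))
print(f"individual r = {r_individual:.2f}")
print(f"groups selected on X only: r of means = {r_x_only:.2f}")
print(f"groups selected on X and Y: r of means = {r_both:.2f}")
```

Selecting on X alone spreads the X means widely but leaves the Y means at their chance values, so their correlation stays near zero. Selecting on X + Y moves both means together and yields a high positive group r. A factor that pushed X and Y in opposite directions (sorting on X − Y) would yield a high negative one.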
