A Logic-Based Approach to the Measurement of Deductive Abilities in Deaf and Hearing Persons

United States Civil Service Commission
Bureau of Policies and Standards
Technical Study 78-3

Magda Munoz-Colberg
Mary Anne Nester

Test Services Section
Personnel Research and Development Center
United States Civil Service Commission
Washington, D. C. 20415

October 1978

Abstract

The logic of deduction is outlined as background for the objectives fulfilled in this project: 1. development of an instrument to measure deductive abilities in the form of conditional and alternative schemata, which have been generally neglected by psychometricians; 2. development of syllogistic tests traditionally used in psychometrics (categorical syllogisms) according to the logic of propositional equivalence and propositional relations; and 3. determination of which form of syllogistic test, symbolic or linguistic, would be appropriate to measure deductive abilities in deaf persons. Four tests were constructed and administered to deaf and hearing adults. The psychometric properties of the logically-constructed tests (test means, reliabilities, and item analysis results) were highly desirable. Deaf persons performed as well as the hearing on the symbolic test but less well on the linguistic test. The performance of hearing persons is discussed relative to the logical design and linguistic medium of the questions.

When the issue of testing deductive abilities is approached from the standpoint of deductive logic, tests of syllogistic reasoning appear to constitute a pure approach to the measurement of these abilities. In other words, in terms of reasoning process, the testing situation reflects with precision the situation for which it selects only when there is an identity between both situations.

A corollary of this is that the testing situation should exemplify the schemata of deductive logic. Regrettably, in the psychometric tradition, tests of syllogistic reasoning have not extended beyond the categorical syllogism. Thus, for example, deductive schemata expressing conditionals have been generally ignored thus far in the attempt to measure deductive abilities.

The seriousness of this omission in terms of predictive value hardly needs to be emphasized. Conditional deductive reasoning is present in every "real" situation, on the job as well as in the academic milieu. In fact it is present in the everyday existential situation and if one were testing for existence as such, conditionals would be a contextual sine qua non. A similar case can be made for alternative deductive reasoning and even for disjunctive deductive reasoning.

The research and development effort presented in this study had a triple objective: First, to develop a deductive test which would include conditional and alternative deductive schemata; second, to construct tests of syllogistic reasoning traditionally used in psychometrics (i.e., categorical syllogisms) exclusively according to logical formulae; and third, to determine which form, symbolic or linguistic, of a test constructed according to these criteria would be more appropriate to attempt to measure deductive abilities in deaf persons. Research on testing other abilities suggests that access to syllogistic reasoning in deaf persons would be impeded by the utilization of the linguistic medium in the measuring instrument.

LOGICAL ANALYSIS

Deductive schemata are inherently tautological. In Kemeny's words:

    It [deduction] finds out certain facts which are contained in our statements, and adds nothing new (except in so far as this fact may be psychologically new to us; that is, we did not realize that we were in possession of this fact). (pp. 112-113)

In other words, a conclusion derived through a deductive schema is merely an explicit elucidation of the actual content of the premises. As such--but assuming the truth of the premises--deduction represents an unassailable epistemic tool, for the conclusion follows apodictically from the premises.

A deductive schema may be stated as:

    $(x)(Px \supset Qx)$
    $Pa$
    $\therefore Qa$        (Carnap, 1974, p. 17)

The first premise asserts that for all x, if x has the property P, then x also has the property Q. The term $(x)$ at the beginning of the formula represents a universal quantifier and indicates that reference is being made to all cases of x (all deductive schemata include a universal premise). The symbol $\supset$ is a connective which indicates implication. The second premise asserts that a particular object a has the property P. From these premises the conclusion that object a has the property Q follows with logical necessity.

Broadly defined, therefore, deduction constitutes a demonstrative form of argument in which the conclusion necessarily has the same truth-value as the premises. Black (1970) thus rightly insists that the validity of the conclusion is axiomatic:

    There is no interval between considering and understanding the premises and wondering whether you have to accept the conclusion. In affirming the premises, you thereby affirm the conclusion. (pp. 20-21) [Similarly] ... the counterinstances it envisages (p and If p then q, but possibly not-q) cannot be expressed without absurdity ... (p. 22)

Basic deductive schemata include the valid forms of the categorical syllogism and the valid forms of conditional and alternative deductions.

Categorical Propositions

The categorical syllogism consists of three categorical propositions: two premises and a conclusion.
A categorical proposition expresses a connection between two classes or concepts, with predication expressed through the copula "to be" (or through another verb reducible to the verb "to be"). These classes or concepts are known as the terms of the proposition. There are four kinds of categorical proposition: Universal Affirmative (expressed by the symbol A), Universal Negative (expressed by the symbol E), Particular Affirmative (expressed by the symbol I) and Particular Negative (expressed by the symbol O).

The traditional square of opposition, in which the logical relations existing among these four propositions are clearly expressed, is reproduced in the figure below. Definitions of the logical relations expressed therein are given immediately below the figure.

[Figure: Traditional square of opposition. Top corners: A (All S is P) and E (No S is P), joined as Contraries. Bottom corners: I (Some S is P) and O (Some S is not P), joined as Subcontraries. Vertical sides A-I and E-O: Superaltern/Subaltern. Diagonals A-O and E-I: Contradictories.]

Contradictories: Propositions cannot both be true and cannot both be false; i.e., from the truth of one we can validly infer the falsity of the other and from the falsity of one we can validly infer the truth of the other.

Contraries:¹ Propositions cannot both be true, i.e., from the truth of one we can infer the falsity of the other but the falsity of one leaves the truth-value of the other undetermined.

Superaltern/Subaltern:¹ The truth of the subaltern is included in the truth of the superaltern but the falsity of the superaltern leaves the truth-value of the subaltern undetermined. Conversely, the truth of the subaltern leaves the truth-value of the superaltern undetermined but its falsity determines the falsity of the superaltern.

Subcontraries:¹ Propositions cannot both be false at the same time, i.e., from the falsity of one we can validly infer the truth of the other but the truth of one leaves the truth-value of the other undetermined.
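The four relations defined above can be checked mechanically. The following is a minimal sketch, not part of the original study: it enumerates every way the two regions "S and P" and "S and not-P" can be inhabited, and classifies a pair of propositions by which truth-value combinations occur. The function names and the `require_s` flag (which switches the presupposition that there are S on or off) are illustrative assumptions.

```python
from itertools import product

# Truth conditions for the four categorical forms, stated in terms of
# whether the regions "S and P" (sp) and "S and not-P" (s_only) are inhabited.
FORMS = {
    "A": lambda sp, s_only: not s_only,  # All S is P
    "E": lambda sp, s_only: not sp,      # No S is P
    "I": lambda sp, s_only: sp,          # Some S is P
    "O": lambda sp, s_only: s_only,      # Some S is not P
}


def worlds(require_s=True):
    """Yield every (sp, s_only) combination; optionally require that there are S."""
    for sp, s_only in product([False, True], repeat=2):
        if require_s and not (sp or s_only):
            continue  # skip the world in which S is empty
        yield sp, s_only


def relation(x, y, require_s=True):
    """Classify the relation between forms x and y on the square of opposition."""
    pairs = {(FORMS[x](*w), FORMS[y](*w)) for w in worlds(require_s)}
    both_true = (True, True) in pairs
    both_false = (False, False) in pairs
    if not both_true and not both_false:
        return "contradictories"
    if not both_true:
        return "contraries"
    if not both_false:
        return "subcontraries"
    if (True, False) not in pairs:
        return "superaltern/subaltern"  # truth of x guarantees truth of y
    return "independent"


for x, y in [("A", "O"), ("E", "I"), ("A", "E"), ("I", "O"), ("A", "I"), ("E", "O")]:
    print(x, y, relation(x, y))
```

Calling `relation("A", "E", require_s=False)` returns "independent" in this sketch, which illustrates why contrariety, subalternation, and subcontrariety depend on the existential presupposition while contradiction does not.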
Categorical Syllogisms

The categorical syllogism, as stated before, consists of three categorical propositions: two premises and a conclusion. These contain only three terms; of these, one is a comparative term (the middle term) which therefore appears in both premises and makes possible the derivation of the conclusion about the two terms being compared. The two terms being compared are called major and minor terms--the major term is the predicate of the conclusion and the minor term the subject of the conclusion.

The categorical syllogism has four figures, these being determined by the position of the middle term in the premises. Each figure has valid and invalid moods. The mood of a syllogism consists of the quantity (universal or particular) and quality (affirmative or negative) of the premises and the conclusion. The valid moods for each figure are presented in Table 1. Their validity is established by the axioms and theorems of the categorical syllogism. It is, however, not necessary to analyze these axioms and theorems directly in order to follow through the deductive process of the syllogism. The application of Venn's diagrams constitutes an equally valid and easier test of the validity of a syllogism.² In Quine's (1972) words:

    As a practical method of appraising syllogisms, rules are less convenient than the method of diagrams. Indeed ... we can apply the diagram test to a given argument out of hand, without pausing to consider where the argument may fit in the taxonomy of syllogisms. The diagram test is equally available for many arguments which do not fit any of the arbitrarily delimited set of forms known as syllogisms. (p. 91)

Table 1
Valid Moods and Figures of the Categorical Syllogism

Figure              Form                           Valid moods
1. First figure     M-P, S-M, $\therefore$ S-P     AAA, (AAI), EAE, (EAO), AII, EIO
2. Second figure    P-M, S-M, $\therefore$ S-P     EAE, (EAO), AEE, (AEO), EIO, AOO
3. Third figure     M-P, M-S, $\therefore$ S-P     AAI³, IAI, AII, EAO³, OAO, EIO
4. Fourth figure    P-M, M-S, $\therefore$ S-P     AAI⁴, AEE, (AEO), IAI, EAO³, EIO

NB: M = middle term, S = minor term, P = major term; A = universal affirmative, E = universal negative, I = particular affirmative, O = particular negative. Moods in parentheses represent subaltern moods. These require implicit recognition of the added premise there are S.

¹ These relations clearly presuppose acceptance of the statement there are S. A discussion of this point is found in, e.g., Quine (1972), pp. 84-85.

² A clear discussion of the application of Venn's diagrams can be found in Quine (1972), pp. 83-92.

³ These moods require implicit recognition of the added premise there are M.

⁴ This mood requires implicit recognition of the added premise there are P.

Conditional and Alternative Syllogisms

Conditional and alternative deductive schemata (also called conditional syllogisms and alternative syllogisms, respectively) consist of either a conditional or an alternative proposition as a major premise, a categorical proposition as the minor premise, and a categorical proposition as the conclusion.

The major premise in the conditional syllogism consists of an antecedent and a consequent (If p, then q). From the affirmation of the antecedent in the minor premise follows the affirmation of the consequent in the conclusion, or, in the negative form, from the negation of the consequent in the minor premise follows the negation of the antecedent in the conclusion. The affirmative conditional and the negative conditional are often referred to in logic as Modus Ponens and Modus Tollens. Modus Ponens is essentially the same form of argument presented earlier as a generalized example of a deductive schema. The syllogistic schemata are as follows:

    $p \supset q$             $p \supset q$
    $p$                       $\sim q$
    $\therefore q$            $\therefore \sim p$

where the symbol $\supset$ indicates a conditional statement and the symbol $\sim$ indicates negation.
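The validity of the two schemata above can be confirmed by an exhaustive truth-table search. The sketch below is illustrative only: it treats the conditional as the material conditional, and the helper names `implies` and `is_valid` are assumptions introduced here. A third, invalid pattern (affirming the consequent) does not appear in the text and is added purely for contrast.

```python
from itertools import product


def implies(a, b):
    """Material conditional: false only when a is true and b is false."""
    return (not a) or b


def is_valid(premises, conclusion, n_vars=2):
    """An argument is valid when no truth assignment makes every premise
    true while making the conclusion false."""
    for values in product([False, True], repeat=n_vars):
        if all(prem(*values) for prem in premises) and not conclusion(*values):
            return False
    return True


# Modus Ponens: p > q, p, therefore q
modus_ponens = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: p],
    lambda p, q: q,
)

# Modus Tollens: p > q, not-q, therefore not-p
modus_tollens = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: not q],
    lambda p, q: not p,
)

# Contrast case (not from the text): p > q, q, therefore p
affirming_consequent = is_valid(
    [lambda p, q: implies(p, q), lambda p, q: q],
    lambda p, q: p,
)

print(modus_ponens, modus_tollens, affirming_consequent)
```

With only two propositional variables the search covers four assignments, so the check is exhaustive rather than sampled.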
The same two syllogistic statements can be expressed theorematically⁵ (e.g., Frank and Smith, 1970) as:

    Theorem: $[(p \Rightarrow q) \land p] \Rightarrow q$
    Proof:   $[(p \Rightarrow q) \land p \land \sim q] \Leftrightarrow \sim(p \Rightarrow q)$

    Theorem: $[(p \Rightarrow q) \land \sim q] \Rightarrow \sim p$
    Proof:   $[(p \Rightarrow q) \land \sim q \land p] \Leftrightarrow \sim(p \Rightarrow q)$

where the symbol $\Rightarrow$ indicates a conditional statement, the symbol $\land$ indicates conjunction, and the symbol $\Leftrightarrow$ indicates equivalence. The classical conditional sign is $\supset$, but in some current writing the conditional signs $\Rightarrow$ and $\rightarrow$ are used at least as frequently as the classical sign.

⁵ Black's viewpoint presented earlier on the axiomatic quality of these arguments should be recalled here. Essentially these arguments are axiomatic rather than theorematic and the proof presented for each theorem is really an explicit statement of the logical absurdity involved in affirming the truth of the premises and denying the necessity of the conclusion.

In propositional form Modus Ponens and Modus Tollens are expressed respectively as:

    If A is B, then C is D
    A is B
    Therefore, C is D

    If A is B, then C is D
    C is not D
    Therefore, A is not B

A more complex type of conditional syllogism is the pure conditional, which involves a conditional statement in the minor premise and the conclusion as well as in the major premise. The logical conditional form remains the same.

    $q \supset r$
    $p \supset q$
    $\therefore p \supset r$

In theorematic terms:

    Theorem: $[(q \Rightarrow r) \land (p \Rightarrow q)] \Rightarrow (p \Rightarrow r)$
    Proof:   $[(q \Rightarrow r) \land (p \Rightarrow q) \land \sim(p \Rightarrow r)] \Leftrightarrow \sim[(q \Rightarrow r) \land (p \Rightarrow q)]$

    Theorem: $[(r \Rightarrow \sim q) \land (p \Rightarrow q)] \Rightarrow (r \Rightarrow \sim p)$
    Proof:   $[(r \Rightarrow \sim q) \land (p \Rightarrow q) \land (r \Rightarrow p)] \Leftrightarrow \sim[(r \Rightarrow \sim q) \land (p \Rightarrow q)]$

In propositional form the pure conditional affirmative and the pure conditional negative are expressed respectively as:

    If C is D, then E is F
    If A is B, then C is D
    Therefore, If A is B, then E is F

    If A is B, then C is not D
    If E is F, then C is D
    Therefore, If A is B, then E is not F

The major premise of the alternative syllogism consists of a statement of two alternants (Either p or q). The minor premise negates one of the alternants and the conclusion consequently affirms the other alternant. The syllogistic schema is expressed as:

    $p \lor q$
    $\sim p$
    $\therefore q$

where the symbol $\lor$ indicates alternation. In theorematic terms:

    Theorem: $[(p \lor q) \land \sim p] \Rightarrow q$
    Proof:   $[(p \lor q) \land \sim p \land \sim q] \Leftrightarrow \sim(p \lor q)$

In propositional form the alternative syllogism is expressed as:

    Either A is B or C is D
    A is not B
    Therefore, C is D

An alternative syllogism may also be expressed in disjunctive terms. In this case the major premise explicitly states the mutual exclusion of two disjuncts, the minor affirms one of the disjuncts, and the conclusion negates the other disjunct. The syllogistic schema is therefore expressed as:

    $\sim(p \land q)$
    $p$
    $\therefore \sim q$

In theorematic terms:

    Theorem: $[\sim(p \land q) \land p] \Rightarrow \sim q$
    Proof:   $[\sim(p \land q) \land p \land q] \Leftrightarrow \sim[\sim(p \land q)] \Leftrightarrow (p \land q)$

In propositional form:

    It is not the case that both A is B and C is D
    A is B
    Therefore, C is not D

The deductive schemata presented in the foregoing discussion, as well as other more complex forms of deductive reasoning are, to reiterate the definitional points made in the opening remarks of this section, axiomatic in quality. Deductive conclusions are apodictic.

THE MEASUREMENT OF DEDUCTIVE ABILITIES

The primary purpose of the testing of reasoning abilities is to connect, to the point of essential convergence, two situations: the testing situation and the "real" situation for which the test is intended to serve as a predictive instrument. In other words, in terms of reasoning process, the testing situation must reflect the "real" situation with precision. It follows that the issue of whether ability is acquired or innate⁶ does not substantially affect the concept of testing. In fact, whether or not the ability exists as such, predictive access to the specific performance of a person in a "real" situation is relevant.
This predictive access is possible only when, as stated above, the measuring instrument adjusts with precision to the demands of the "real" situation. These comments may appear redundant in a discussion of testing, but they certainly are not: there are degrees of connectedness, in terms of reasoning process, between performance in a "real" situation and performance on the test. What we are advocating here is that unless the degree of connectedness or convergence is essential, the test will bifurcate from the "real" situation. This is the very basis of a logic-based approach to the testing of deductive abilities.

The deductive process conforms to the laws of logic. Hence, if deductive reasoning forms part of the definition of a job or academic situation, the measuring instrument for deductive abilities must include and conform to deductive schemata, i.e., in the "real" situation as well as on the test, correct deductions are crucial.

⁶ The term innate in any case may be interpreted in the Platonic sense (e.g., in the Meno) in which the innateness of the trait does not negate the relevance of the learning process. In fact, the learning process is the sine qua non for the actualization of the trait.

On the basis of this criterion, four types of syllogistic test were constructed, three categorical and one conditional and alternative. The construction of the first three tests was preceded by the construction of a pilot test which served to elucidate certain principles of logical design, described below.

The pilot test of categorical syllogisms was constructed according to a partial logical plan which item analysis proved to be inadequate. The partiality of this plan excluded some incorrect alternatives, so-called "distracters," from the logical design.
In other words, some distracters were not constructed strictly according to logical formulae but were instead constructed according to invalid quantification and affirmation or negation of the conclusion without utilization of the square of opposition (which was discussed in the Logical Analysis section). Some of these distracters demonstrated very little distracting capacity (i.e., they were not selected by many of the test takers), thus making many of the items and the test as a whole inadequate in terms of discriminating power. By contrast, as expected, distracters constructed strictly according to logical formulae yielded very considerable discriminating power. Accordingly, the final version of the test was constructed exclusively according to logical formulae and as a result the test as a whole, as will be reported later, revealed very clear discrimination between the upper and lower-scoring groups.

The logical formulae utilized in the construction of distracters varied for each deductive schema. In general terms, however, these were based on logical formulae for propositional equivalence. The construction criteria also included invalid alterations in the quantity and quality of the premises and conclusion according to the square of opposition.

Propositional equivalents of a categorical proposition, A, E, I, or O, are obtained through several logical processes. The fundamental processes are conversion, obversion, contraposition, and inversion. Within the context of categorical propositions these can be briefly defined as follows:

Conversion: S and P are reversed, e.g., No S is P = No P is S. The A proposition converts only by reduction in quantity to particularity because its P is originally an undistributed term.⁷ The O proposition has no converse because the distribution of S would be universalized in the converse.

Obversion: The original P is negated and the quality of the original proposition is changed, e.g., All S are P = No S is non-P.
Contraposition: S becomes the original P in negative form and P becomes the original S in negative form, e.g., No S is P = Some non-P are not non-S. (The I proposition has no contrapositive, as can be easily discerned diagrammatically.)

Inversion: S becomes the original S in negative form, P remains the original P and the quality of the original proposition is changed, e.g., All S are P = Some non-S are not P. (The I and O propositions have no inverse, as can be easily discerned diagrammatically.)

⁷ A distributed term is one which expresses the universal set and an undistributed term is one which expresses a particular subset.

Accordingly, to give one example, an item constructed according to Mood EAO⁸ in the Fourth Figure of the categorical syllogism, and having four distracters, might exhibit the following patterns of equivalence and of alteration in quantity and quality of premises and conclusion according to the square of opposition:

    No e is t.
    All t are c.
    Therefore,
    A) Some c are not e.
    B) Some c are e. (subcontrary of conclusion)
    C) No c is e. (invalid universal conclusion)
    D) Some e are not non-c. (obverted converse of subcontrary of conclusion)
    E) No c is t. (converse of contrary of minor)

⁸ This mood in this figure requires, as may be recalled, implicit recognition of the added existential premise there are M.

Possible alternate distracters would be the valid converse of the major (or of the minor), the obverted converse of the major (or of the minor), the contrapositive of the major (or of the minor), and the inverse of the minor (or of the major).

The three types of test constructed within the context of categoricals, excluding the pilot test, did not differ in logical design. That is to say, all three had identical schemata and identical logical formulae for distracters. The three tests differed only in linguistic form.
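The sample EAO item in the Fourth Figure given above can be checked by brute force, in the spirit of the diagram test. The sketch below is illustrative and is not the procedure used to build the tests: it enumerates every way the eight regions formed by the terms e, t, and c can be inhabited, and reports which answer options hold in every case where the premises hold. The helper names and the `existential_import` flag (the added premise that there are t) are assumptions introduced here.

```python
from itertools import product

# Each region is a triple of membership flags for the terms (e, t, c).
REGIONS = list(product([False, True], repeat=3))
E_, T_, C_ = 0, 1, 2


def some(world, x, y, y_negated=False):
    """Some x are y (or, with y_negated=True, Some x are not y)."""
    return any(r[x] and (r[y] != y_negated) for r in world)


def no(world, x, y):
    """No x is y."""
    return not some(world, x, y)


def all_(world, x, y):
    """All x are y."""
    return not some(world, x, y, y_negated=True)


def entailed(conclusion, existential_import=True):
    """True when the conclusion holds in every world satisfying the premises
    'No e is t' and 'All t are c' (plus 'there are t' if requested)."""
    for flags in product([False, True], repeat=len(REGIONS)):
        world = [r for r, inhabited in zip(REGIONS, flags) if inhabited]
        premises = no(world, E_, T_) and all_(world, T_, C_)
        if existential_import:
            premises = premises and any(r[T_] for r in world)
        if premises and not conclusion(world):
            return False
    return True


options = {
    "A": lambda w: some(w, C_, E_, y_negated=True),  # Some c are not e
    "B": lambda w: some(w, C_, E_),                  # Some c are e
    "C": lambda w: no(w, C_, E_),                    # No c is e
    "D": lambda w: some(w, E_, C_),                  # Some e are not non-c, i.e., some e are c
    "E": lambda w: no(w, C_, T_),                    # No c is t
}

results = {label: entailed(check) for label, check in options.items()}
print(results)
```

Under these assumptions only option A is entailed, and calling `entailed(options["A"], existential_import=False)` shows that A no longer follows once the existential premise is dropped.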
In the first test, Test A-1, complex linguistic form was utilized; propositional construction included relatively intricate subordinate clauses and modifiers. In the second test, Test A-2, simplified language was utilized; propositional construction included only nouns (S, P, and M), variables of quantification, and the affirmative or negative copula. In the third test, Test A-3, only symbols were utilized for the three terms S, P, and M (as in the example given above). The results of experimental administrations of these tests to both hearing and deaf competitors will be presented later.

It should be reiterated that the pilot series did not have a strictly logical design and that as such it differs deductively from the three tests. As far as linguistic form is concerned, simplified language was utilized. Thus, the pilot test and Test A-2 converge in linguistic form. The performance of deaf and hearing groups on this deductively easier and linguistically simplified pilot test will be presented later.

A fourth test, Test B, administered experimentally to a group of hearing candidates, was constructed according to conditional, alternative, and disjunctive schemata. This test was constructed in complex linguistic form, with relatively intricate subordinate clauses and modifiers. The logical design for distracters, as in the three tests of categorical syllogisms, was based on logical formulae for propositional equivalence.
Accordingly, if we recall the symbols used in the discussion of conditionals in the Logical Analysis section, an item constructed according to the pure conditional negative, and having four distracters, might exhibit the following patterns:

    $[(r \Rightarrow \sim q) \land (p \Rightarrow q)]$
    Therefore:
    A) $(r \Rightarrow \sim p)$
    B) $(\sim q \Rightarrow \sim p)$        contrapositive of minor
    C) $\sim(\sim r \Rightarrow \sim q)$    inverse of major
    D) $(q \Rightarrow p)$                  invalid converse of minor
    E) $(\sim p \Rightarrow r)$             invalid converse of conclusion

The construction of the four tests (A-1, A-2, A-3, and B) was part of the research and development effort undertaken in order to fulfill the first two objectives enumerated in the beginning of this study, namely, first: to construct tests of categorical syllogisms exclusively according to logical formulae, and second: to include conditional and alternative schemata in the psychometric approach to deductive abilities.

The construction of tests A-2 and A-3, as well as the pilot test itself, was intended additionally to fulfill the third objective mentioned in the beginning of this study, namely, to attempt to determine which form of such tests, symbolic or linguistic, constitutes a more appropriate measure of deductive abilities in deaf persons. This question arises because of the well-documented deficit in verbal language skills which tends to characterize people with severe, prelingual deafness (Levine, 1960; Myklebust, 1964). For example, in a nationwide study of the performance of deaf students on the Stanford Achievement Test (DiFrancesca, 1972) the average performance of 19-year-olds on the Paragraph Meaning subtest was at the fourth grade level, while the performance of the subgroup of 19-year-olds who took the Advanced Battery of the test was at the seventh grade level.
In the context of civil service testing, Stunkel (1957) found that deaf college juniors and seniors performed significantly lower than hearing persons on all the verbal subtests (vocabulary, grammar, reading comprehension, arithmetic reasoning) of the Federal Service Entrance Examination. On the nonverbal subtests, however, the deaf students equaled (on symbol classification) or exceeded (on letter series) the performance of the hearing.

Accordingly, it was hypothesized in this research and development effort that a deaf person would have no semantic impairment by virtue of being deaf, i.e., that semantic discernment in the case of a deaf person would simply not be, by nature or habit, essentially attached to linguistic form. Thus, it was hypothesized that college-educated deaf persons would perform at the level of college-educated hearing persons in a test of reasoning abilities if and only if the test in question included symbols other than linguistic symbols for conceptual expressions.

EXPERIMENTAL ADMINISTRATIONS

Method

Table 2 summarizes both the plan that was followed in presenting categorical syllogisms to deaf and hearing persons, including the number of subjects, and the results of these experimental administrations. In addition, the test of conditional, alternative, and disjunctive syllogisms (Test B) was given to a group of 1,126 hearing persons. All these tests were given as part of the Federal civil service battery known as the Professional and Administrative Career Examination (PACE); the individuals who took the tests were actual job applicants. Since jobs filled by PACE require a college degree or three years of professional or administrative work experience, it probably can be assumed that most people who took the tests met this requirement (although there is no way to verify this assumption). The deaf applicants were all college seniors or college graduates.
Practical considerations made it impossible to administer the tests to random samples of the same group of subjects. In fact, each test was given in a different month in the period from September 1975 to February 1978. In the case of hearing applicants each test was given to a random sample of all the people who took the PACE nationwide in that particular month. In the case of deaf applicants, each test was given to all the deaf Table 2 Means and Standard Deviations for Hearing and Deaf Applicants on Tests of Categorical Syllogisms Hearing Deaf N Mean S.D. N Mean S.D. Pilot test simple language 2,386 21.49 6.00 24 17.46 5.49 Final tests complex language (A-1) 1,523 14.20 6.51 simple language (A-2) 2,688 17.70 6.70 symbols (A-3) 1,724 14.79 7.59 38 16.26 5.82 8 applicants who took PACE on three occasions The instructions for the conditional, when it was administered at a liberal arts alternative, and disjunctive test (Test B)college for the deaf in Washington, D.C. given only to hearing applicants, exThe pilot test was given in November 1975 plained what compound syllogisms are and and the final test-symbols (A-3) was given included a pure conditional negativein March 1977 and February 1978. (Because syllogism as an example.of the small number of deaf applicantsavailable, it was not possible to administer The third major difference in the the other two tests of categorical syllotreatment of hearing and deaf applicants gisms to them.) was in the time allowed to answer the 30questions of the tests. Hearing competiThe instructions to deaf applicants tors were allowed 35 minutes for the differed in several ways from those given experimental test part, as they are forto hearing applicants. First, hearing the other parts of PACE. Deaf applicantsapplicants were not routinely informed that were allowed 60 minutes, in accordance there were experimental questions on PACE. 
with a practice of allowing more timeTherefore, it is likely that most applicants to deaf applicants for questions which were unaware while they were taking PACE have verbal content. Since the PACE isthat the syllogisms were experimental quesmeant to be a power test, in whichtions which would not affect their score on almost all applicants are expected tothe test as a whole. However, because the attempt every question (Nester, Note 1),use of verbal civil service tests with the giving the deaf extra time should notdeaf has been a sensitive issue in the give them an advantage over the hearing.past, the deaf applicants were informed in In the experimental tests on which theadvance that the syllogisms were experihearing and the deaf are compared, theremental questions which.would not affect is clear evidence that the tests weretheir test score. They were asked, power tests for the hearing applicantsnevertheless, to give these questions as with the 35-minute time limit. For much attention as they did to the rest of example, their last-item omit rate forthe test. It is the impression of the the pilot test-simple language was 5%,people who administered the test, veri-and for the final test-symbols (A-3) itfied by such test performance characterwas 8%. For the deaf, the respectiveistics as a very low omit rate, that last-item omit rates were 4% and 5%. Invirtually all the deaf applicants com-addition, even though they were underplied with this request. relatively little time pressure, twothirds of the deaf applicants finishedThe second difference between the the questions in 35 minutes.treatment of the deaf and hearing applicants was in the instructions for answerResultsing the test questions. For both groupsthe categorical syllogism was defined, Statistical results relative to thebut the language used for deaf applicants three objectives of this study will bewas simpler in vocabulary and sentence presented in this section.structure than that used for hearingapplicants. 
The instructions for the hearing included a syllogism of the third figure, mood AAI as an example, while those for the deaf included four sample questions. These latter questions were meant as a check to be sure that the deaf applicants understood the instructions. It is possible that these sample questions produced a slight practice effect for deaf competitors, but this effect would not invalidate the results of the study. This issue will be discussed at a later point.

The first objective, development of a deductive test which included conditional, alternative, and disjunctive schemata, was met successfully from the point of view of the statistical characteristics of the test. The 30 items of the test had a very desirable range of difficulty levels and of point-biserial correlations. The p-values (proportion of applicants answering the item correctly) ranged from .21 to .82, and the test mean was 16.59, which corresponds to a mean p-value of .55. The item-test point-biserial correlations ranged from .26 to .59 and 70% of these were above .40. The KR-20 reliability of the test was .87.

The second objective, development of tests of categorical syllogisms according to logical formulae, was also achieved, when success is assessed in terms of test statistics. All four tests of categorical syllogisms had extremely desirable distributions of item-test point-biserial correlations. Of the 160 items used in the four tests, 90% had point-biserial correlations of .40 or higher. The KR-20 reliability of these four tests ranged from .87 to .91. It is chiefly in the difficulty level of the items that a difference was found between the pilot test and the three final tests. Table 2 shows that the mean for the pilot test was 21.49 while the mean for the final test in simple language (A-2) was 17.70. The means for the other two final tests, those in complex language (A-1) and symbols (A-3), were 14.20 and 14.79, respectively. Because of the very large samples that are used in test research, which are necessary in order to obtain stable estimates of item parameters, all of these differences are statistically significant with the probable exception of the A-1--A-3 comparison. The difference between the means of the two simple language tests (pilot test and final test A-2) was highly significant, z = 21.29, p < .0001. These two tests were chosen for comparison because they differ in logical plan but are similar in verbal content. They are also closest in difficulty; the tests which differ from the pilot test in language as well as in logical plan are even more difficult and hence differ from the pilot test more than A-2 does.

To assess the differences among the means of the three final tests a one-way analysis of variance followed by a Newman-Keuls test was performed. The overall F-ratio was significant, F (2,5932) = 158.72, p < .001, and the Newman-Keuls test revealed that all pairs of means differed significantly from each other (p < .05). However, the conclusion that the means for the complex language (A-1) and symbol (A-3) tests differed significantly probably is incorrect because the two samples appear to have differed very slightly in deductive ability, as measured by another test part. A correction for this difference would lower the mean for the symbols group (A-3) by a fraction of a question, perhaps as much as .50, and would eliminate the significant Newman-Keuls comparison with the complex language (A-1) group. (An analysis of covariance could not be used to compare the three final tests because a significant inhomogeneity of regression was found for the three syllogism tests and the other measure of deductive ability.)

The greater difficulty of the final tests relative to the pilot test is mainly due to the use of distracters constructed according to logical formulae. Table 3 lists the logical formulae of distracters that were used more than once in the final test. Six other formulae were used just once in each test. As indicated before, distracters constructed strictly according to logical formulae demonstrated very high distracting power and discriminating power both in the test of categorical syllogisms and in the test of conditional, alternative, and disjunctive deduction.

Table 3
Logical Formulae for Distracters in Total Logical Design with Indication of Better Distracters and Discriminators

LOGICAL FORMULA    Better Distracters    Better Discriminators

Invalid Universal Conclusions
    A proposition  *
    E proposition  *

Propositional Equivalents
  Valid
    Converse of A premise  * *
    Converse of E premise  * *
    Converse of I premise  * *
    Obverted converse of A premise
    Obverted converse of I premise  * *
  Invalid
    Invalid converse of A premise  *
    Invalid converse of O premise  * *

Utilization of Square of Opposition
    Subcontrary of I conclusion  * *
    Subcontrary of O conclusion
    Converse of subcontrary of I conclusion
    Converse of subcontrary of O conclusion
    Converse of contradictory of E conclusion
    Subaltern of converse of E premise  *

The basis for the quantification of the distracting power and discriminating power of individual distracters is the item analysis data. The Civil Service Commission's item analysis tells what percentage of the higher-scoring competitors (the top 50%) and what percentage of the lower-scoring competitors (the bottom 50%) chose each of the five possible answers. A distracter with high distracting power would be one that is chosen by a large percentage of all competitors (though preferably by more of the lower-scoring group). A distracter with high discriminating power would be one which is chosen much more by the lower-scoring than by the higher-scoring group.

In order to assess the distracting power of the distracters in the final tests, the percentage of applicants (both high- and low-scoring) who chose a distracter of a particular formula was averaged over all the instances of that formula's use on one particular test. For example, the subaltern of the converse of the E premise was used as a distracter twice in the test plan. In the simple language test it was chosen by 3% of the high group and 15% of the low group on one occasion--or by 9% of the total group--and by 10% of the high group and 20% of the low group on the other occasion--by 15% of the total group. Therefore, its average distracting power was 12%, the average of 9% and 15%. To assess the discriminating power of the distracters, the average was found of the difference between the percentages of high- and low-scorers who chose the distracter. For the example given above, the difference between the high group and the low group was 12% on the first occasion and 10% on the second occasion, for an average of 11%.

After the distracting and discriminating powers were calculated for each logical formula for each version of the final test, the distracters were ranked within the test according to each of these properties. In Table 3, an asterisk is used to indicate which of the logical formulae were in the top half of the rank ordering in two or more of the three final tests.

All of the distracters constructed according to logical formulae, with the exception of the invalid universal conclusions, demonstrated, on the average, levels of distracting and discriminating ability which would qualify them for inclusion on the test. There were a few instances where a very small percentage of applicants were attracted by a distracter. In those cases, the standard practice would be to revise the distracter before reusing it.
As can be readily appreciated by reading Table 3, high distracting ability was demonstrated by distracters representing an invalid universal conclusion, A or E, derived from a set of universal premises which did not warrant such a conclusion, e.g., moods AAI and EAO in the Third Figure and moods AAI and EAO in the Fourth Figure. However, the distracting ability of these distracters affected the upper- and lower-scoring groups almost identically. In other words, they demonstrated no discriminating ability.

By contrast, distracters representing valid or invalid propositional equivalents of either premise demonstrated both high distracting power and high discriminating power across all figures and moods. This discriminating power can be said more specifically to have differentiated between the group which followed through the deductive process and that which did not, since these distracters repeated the information given in the premises and did not therefore complete any deductive schema.

Lastly, the utilization of the square of opposition yielded, with two exceptions, the least desirable distracters in terms of both discriminating power and distracting power, although in most cases they proved to be acceptable distracters. Furthermore, only in one case, the subcontrary of an I conclusion, did this type of distracter demonstrate both high discriminating power and high distracting power.

In fulfillment of the last objective, the determination of whether a linguistic or a symbolic test of syllogisms was appropriate for testing deduction in the deaf, the performance of the deaf and hearing groups on the pilot test-simple language and on the final test-symbols (A-3) was compared. Table 2 shows that the mean score of the deaf group on the pilot test was 17.46, while that of the hearing was 21.49. To evaluate this result a t-test was performed to see if the deaf mean differed significantly from the hearing mean, which was taken as an estimate of the population mean.
The deaf mean was found to be significantly lower than that of the hearing population, t (23) = 3.60, p < .01. On the final test-symbols (A-3) the hearing and deaf means were 14.79 and 16.26, respectively. The mean of the deaf group did not differ significantly from the mean of the hearing group, which again was taken as an estimate of the population mean, t (37) = 1.56, p > .05. (Of the 38 deaf applicants, 16 took the test in March 1977 and 22 in February 1978. These two groups were combined not only because their mean scores did not differ significantly but also because they represented all the deaf applicants available for the research in this two year period. In a sense, they represent a very large percentage of the population of deaf PACE applicants. The combination of the two groups is a better representation of the population of deaf PACE applicants than either group alone is.)

These two findings taken in conjunction suggest that the deaf are equal to the hearing in their ability to carry out deductions (as demonstrated by the results with symbolic syllogisms) but that the verbal content of the pilot test syllogisms impeded the full demonstration of this deductive ability. This point will be discussed further in the next section.

DISCUSSION

As the data presented in the previous section indicate, the three objectives of this study were accomplished successfully.

In fulfillment of the first objective, a test consisting of conditional, alternative, and disjunctive syllogisms was constructed. The test proved to have very desirable statistical properties, including high reliability and excellent item statistics. The significance of these results must be emphasized. Conditional reasoning, as well as alternative/disjunctive reasoning, is present in every real situation and if predictive access is sought to deductive performance on a real situation, the presence of conditionals in the measuring instrument becomes a sine qua non.
In fulfillment of the second objective, a test plan for categorical syllogisms was constructed according to logical formulae and the test plan was elaborated in three linguistic forms--complex language, simple language, and symbols. All three forms had excellent item statistics and high reliability. In addition, the tests had a more desirable level of difficulty than the pilot test, which was on the whole too easy for use in the current testing context. Analysis of the item distracters for the three forms showed that virtually all of the logical formulae used as distracters demonstrated an acceptable to excellent ability to discriminate between the lower-scoring group and the higher-scoring group and an acceptable to excellent ability "to distract," i.e., to increase the difficulty of the item in order to select individuals at a higher level of deductive ability. The construction of the test plan exclusively according to logical formulae thus yielded a test which performs its predictive function with a high degree of precision.

Since, as discussed in the Results section, propositional equivalents of the premises demonstrated the highest ability to differentiate between those who were able to follow through the deductive process and those who were not, the authors consider it desirable and therefore plan to conduct research in the immediate future to elucidate further 1. whether or not other, more intricate propositional equivalents, such as inverses or contrapositives, would work as well as converses and obverses and 2. whether or not the performance of each specific equivalent is contingent on the syllogistic figure and mood in which it appears. In this research, for example, it was evident that converses of E premises performed a highly successful distracting and discriminating function across all figures and moods. Converses of A premises were less consistent in their function although they were very successful in most figures and moods.
This was expected to an extent, since reasoning with negatives may be of itself more intricate than reasoning with affirmative terms and an affirmative copula. Thus when the figure and mood is simpler (for example, the moods of the first figure because of the location of M) the converse of an A, in contradistinction to the converse of an E, could generally be expected to lose some of its distracting and discriminating power.

With respect to the third objective, the study showed that symbolic syllogisms are an appropriate tool for testing deductive abilities in the deaf, while syllogisms in simple language are not. The criterion used for determining that an item type is appropriate is that the deaf should perform as well as the hearing on the item type. There are two reasons for the acceptance of this criterion. First, data from hearing applicants are used to establish the norms for scoring deaf applicants on the actual civil service tests. If the deaf as a group scored lower than the hearing, then the test results of the deaf group, which translate into job opportunity in the competitive selection system, would accordingly be lower. Second, in addition to this practical aspect of item-type selection, there is an underlying assumption that the deaf and hearing groups are inherently equal in the reasoning abilities we wish to test and that differences in their performance reflect a difference in their ability to reason in the linguistic medium of the test. Since deaf applicants scored significantly lower than hearing applicants on the pilot test-simple language syllogisms and as well as the hearing applicants on the final test-symbolic syllogisms (A-3), we have concluded that the latter are appropriate questions to test deductive abilities in deaf persons.

A few additional points with respect to these data deserve discussion.
First, there is the issue of whether or not the deaf may have been assisted by a practice effect as the result of having four sample questions while the hearing had only one example in the instructions. If there had been a practice effect it would be present in the results from the two tests administered to the deaf (verbal and symbolic syllogisms) and it might be expected to have the same magnitude in both cases. In the case of the verbal syllogisms, subtraction of any practice effect from the performance of the deaf group would only enlarge the already significant difference between that group and the hearing. In the case of the symbolic syllogisms, subtraction of any practice effect, up to a magnitude of approximately 3.4 questions, would not create a significant difference between the two groups, since the deaf mean is higher than the hearing mean on those questions. It is unlikely that the four sample questions produced a practice effect large enough to cause such a reversal of the conclusions of this study (Wing, Note 2).

Second, the performance of the deaf applicants on the verbal syllogisms, while not as good as the performance of hearing applicants, clearly demonstrates that the deaf are capable of making many correct deductions when problems are presented in simple language. That is not to say, however, that the deaf applicants necessarily used verbal language in solving the questions. It is quite possible that they translated the questions into sign language or used some other means of mental representation of the content of the questions. The concepts used in the questions were mainly of the sort that could be easily represented in signs.

If one wishes to apply these data to the controversy concerning linguistic deficiency and thinking (Furth, 1966, 1971) one might say that the data on symbolic syllogisms support the conclusion that reasoning can develop fully in the absence of complete verbal competence.
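The figure of approximately 3.4 questions in the practice-effect argument above is stated without derivation. The sketch below shows one way it can be approximately reproduced from the reported statistics; it is a reconstruction under stated assumptions, not necessarily the authors' own calculation. It assumes that the standard error of the deaf mean is the one implied by the reported t (37) = 1.56, and that the criterion is the two-tailed .05 critical value of t for 37 degrees of freedom, taken here from standard tables as about 2.026. The function names are hypothetical.

```python
# Values reported in the Results section for the final test-symbols (A-3).
deaf_mean = 16.26
hearing_mean = 14.79
t_reported = 1.56   # t (37) for the deaf mean against the hearing mean

# Assumption: two-tailed .05 critical value of t with 37 degrees of freedom,
# as given in standard t tables (about 2.026).
T_CRIT_37 = 2.026

def implied_standard_error(mean_a, mean_b, t):
    """Standard error implied by a reported t and the two means."""
    return abs(mean_a - mean_b) / t

def max_practice_effect(deaf, hearing, t, t_crit):
    """Largest amount that could be subtracted from the deaf mean before it
    would fall significantly below the hearing mean."""
    se = implied_standard_error(deaf, hearing, t)
    return (deaf - hearing) + t_crit * se

threshold = max_practice_effect(deaf_mean, hearing_mean, t_reported, T_CRIT_37)
print(round(threshold, 1))
```

Under these assumptions the threshold comes out close to the 3.4 questions cited in the text: the deaf mean would first have to drop by the 1.47-question advantage it holds, and then by a further margin of about two standard errors.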
Finally, these data contain some evidence that the reasoning process in the hearing is affected by the linguistic medium in which the problem is presented. In particular, hearing applicants performed better on the simple language version of the final test than they did on the complex language and symbolic versions. A difference in some aspects of the performance of hearing persons on symbolic and verbal syllogisms has been reported earlier (Wason and Johnson-Laird, 1972). In the present study, however, there is the possibility that errors in the execution of the test plan may have exaggerated the difference between the simple language and the other two versions of the test. For instance, a total of six differences were found, either in premises or in distracters, between the simple language and the symbolic tests. If all of the difference in difficulty level of these six questions between the two versions of the test were attributed to the errors, it would be found that the mean of the simple language test was increased by .94 because of such errors. Subtracting that amount from the mean would give a value of 16.8 for the mean of the final test-simple language (A-2). This mean would still be significantly different from the means of the other two tests, but the magnitude of the difference would be reduced.

In addition to the practical results which this research yielded for the Federal testing program, the project uncovered several areas which merit further analysis in the immediate future. As mentioned before, research is planned on 1. whether or not more intricate propositional equivalents (inverses and contrapositives) will demonstrate as much distracting and discriminating power as those used in the final test of categorical syllogisms, and 2. whether or not the performance of propositional equivalents as distracters is contingent on the syllogistic figure and mood in which they appear.
Additionally, research is planned on the testing of deductive abilities in the deaf in the form of conditional and alternative deductive schemata presented in symbolic form.

Reference Notes

1. Nester, M. A. Time limits for competitive administration of the Professional and Administrative Career Examination (TM 77-15). Washington, D.C.: U.S. Civil Service Commission, Personnel Research and Development Center, July 1977.
2. Wing, H. Personal communication, May 9, 1978.

References

Black, M. Margins of precision. Ithaca, N.Y.: Cornell University Press, 1970.
Carnap, R. An introduction to the philosophy of science. New York: Basic Books, 1974.
DiFrancesca, S. Academic achievement test results of a national testing program for hearing impaired students, United States: Spring 1971 (Series D, No. 1). Washington, D.C.: Gallaudet College, Office of Demographic Studies, 1972.
Frank, T. S. and Smith, J. F. Modern calculus. Glenview, Illinois: Scott, Foresman and Company, 1970.
Furth, H. G. Thinking without language: Psychological implications of deafness. New York: The Free Press, 1966.
Furth, H. G. Linguistic deficiency and thinking: Research with deaf subjects 1964-1969. Psychological Bulletin, 1971, 76, 58-72.
Kemeny, J. G. A philosopher looks at science. Princeton, N.J.: Van Nostrand, 1959.
Levine, E. S. The psychology of deafness: Techniques of appraisal for rehabilitation. New York: Columbia University Press, 1960.
Myklebust, H. R. The psychology of deafness: Sensory deprivation, learning, and adjustment (2nd ed.). New York: Grune and Stratton, 1964.
Plato. The dialogues (B. Jowett, trans.). Oxford: Clarendon Press, 1934.
Quine, W. V. Methods of logic. New York: Holt, Rinehart and Winston, 1972.
Stunkel, E. R. The performance of deaf and hearing college students on verbal and nonverbal intelligence tests. American Annals of the Deaf, 1957, 102, 342-355.
Wason, P. C. and Johnson-Laird, P. N. Psychology of reasoning: Structure and content. London: B. T. Batsford Ltd., 1972.
U.S. GOVERNMENT PRINTING OFFICE: 1978-620-003/3314