Item Analysis of C, D and E Series from Raven’s Standard Progressive Matrices with Item Response Theory Two-Parameter Logistic Model

Nikolay Georgiev

Abstract
The present report focuses on the research methodology and descriptive potential of Item Response Theory. Its purpose is to present an item analysis of the C, D and E Series from Raven’s Standard Progressive Matrices, which were given to 506 Bulgarian high school students. The basic concepts and underlying assumptions of IRT are briefly reviewed. The latent variable is defined after a short review of some intelligence theories and a detailed examination of the items used. After the verification of the IRT assumptions, the Two-parameter logistic model is selected for the analysis. The estimated item parameters are interpreted in accordance with suggested guidelines. Item Characteristic Curves and Item Information Functions are plotted and their features are discussed.
Keywords: item response theory; two-parameter logistic model; standard progressive matrices; intelligence.



Introduction
The history of psychological measurement includes a great variety of conceptions about the most effective methods for assessment and evaluation, and the characteristics to be assessed and evaluated are even more numerous. One of these characteristics, intelligence, has been of the greatest interest to researchers for a very long time. Enormous efforts have been devoted to experiments with hundreds or thousands of examinees, and many different assessment methods have been tested in order to obtain the best procedure. Nevertheless, disagreement about the nature and the structure of intelligence has prevailed in scientific knowledge for decades. The efforts, however, were not unprofitable. Out of the ambition for better methods of measuring intelligence, a whole new measurement theory started to develop: Item Response Theory. The explanatory potential of its methods has spread over all behavioral characteristics that can be measured.

Item Response Theory – the modern theory for psychological measurement
A brief history

The conceptual foundation of Item Response Theory was laid down by Thurstone (1925) in his paper “A Method of Scaling Psychological and Educational Tests”, which provides a technique for placing the items of the Binet and Simon test for children on an age-graded scale. Thurstone’s colleagues and students continued to refine the theoretical bases of IRT. Lawley (1943) described maximum-likelihood estimation procedures for the item parameters, and Lord (1952) differentiated the latent variable from the observed test score (according to Reeve, 2004, pp. 5-6).
The rapid progress of Item Response Theory began in the 1960s and followed two separate lines of development. The first line is traced to Lord and Novick and their classical textbook “Statistical Theories of Mental Test Scores” (Lord & Novick, 1968), in which original and precise measurement methods are presented. The textbook has become a manual for many people with interests in psychological measurement. The second line of development is traced to the Danish mathematician Georg Rasch (1960), who worked out a whole family of models while developing measures of reading and tests for use in the Danish military. Rasch’s work inspired two other psychometricians who extended his models. In Europe, Gerhard Fischer (1974) of the University of Vienna extended the Rasch model for binary data. During a visit to the United States, Rasch inspired Benjamin Wright, an American psychometrician, to teach objective measurement principles and to extend his models (according to Embretson & Reise, 2000, pp. 10-11).
One of the contemporary authors with the biggest contribution to the understanding of Item Response Theory methods is Frank Baker. According to him (Baker, 2001), over the past century three people contributed most to the development of Item Response Theory: Lawley of the University of Edinburgh, who showed that many of the constructs of classical test theory could be expressed in terms of parameters of the item characteristic curves; Lord, whose work at the Educational Testing Service developed the theory’s methods and many computer programs for their application; and Wright of the University of Chicago, who recognized the importance of the work of Georg Rasch and brought it to the attention of practitioners. “Without the work of these three individuals, the level of development of item response theory would not be where it is today.” (Baker, 2001, p. ii)
Essential concepts
The fundamental peculiarity of the variables in the behavioral and social sciences is that they are directly unobservable and, consequently, directly immeasurable; in other words, they are latent. The primary objective of most psychological and social experiments is to assess the quantity of certain latent variables, i.e., to assign values to these variables on the basis of the results obtained by the examinees on the test or tests. In order to measure a latent variable, a proper scale is needed in the first place. Different people can have very different values of the same latent variable. In each sample of examinees drawn from a population, the latent variable has a certain distribution which serves as a basis for the scale. In correspondence with the latent variable distribution, and depending on their observable test results, examinees are scored differently and occupy different places on the latent variable scale.
The phrase ‘observable test result’ is first and most often associated with the total test score as the sum of all responses of the examinee. Classical Test Theory states that the total test score is the most eligible estimator of the latent variable. However, some examinees have equal total scores but different responses to the test items. Without fully rejecting the claims of Classical Test Theory, Item Response Theory focuses on the responses to the different items separately and on the particular response patterns.
The purpose of Item Response Theory methods is to explain and clarify the relationship between the latent variable and the item response. Examinees possess different latent variable values. Items, on the other hand, have different possible responses, each with a corresponding probability. Therefore, the relationship mentioned is a relationship between the latent variable value and the probability of a certain response. To be more precise, it is a functional dependence with the latent variable value as the independent variable and the probability of the certain response as the dependent variable. For every test item this function is different and is determined by the qualities of the item as a measure of the latent variable, quantitatively expressed as item parameters. The function itself is called the Item Characteristic Curve and is the basic construct and building block of all other constructs of Item Response Theory. Although every response category can be described by a characteristic curve, the probability of correct response is most frequently used to characterize an item. An example of an Item Characteristic Curve is given in fig. 1. The latent variable is usually denoted θ, and the probability of correct response P(θ).

Fig. 1. Item Characteristic Curve example and Item Information Function example.

Item Characteristic Curve estimation through item parameter assessment makes the latent variable calculations possible. The estimated latent variable values are normally distributed with mean θ and corresponding variance σ² and standard deviation σ (Cramer, 1946). For each value in the latent variable interval there is a corresponding amount of information that the model used for the analysis, with its item parameter estimates, gives about the latent variable. The information shows how precisely the item measures the latent variable: the smaller the standard error, the bigger the information the item gives about the latent variable. The variability of the information across the levels of the latent variable forms the Item Information Function. An example of an Item Information Function is shown in figure 1.
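The inverse relation between information and measurement precision mentioned above can be written explicitly. The following standard identity is not stated in the original text but follows from the asymptotic normality just cited; it is added here only as a clarifying note:

```latex
% The standard error of the latent variable estimate at a given level of
% theta is the reciprocal square root of the information at that point,
% so more information means a smaller standard error.
\mathrm{SE}(\hat{\theta}) \;=\; \frac{1}{\sqrt{I(\theta)}}
```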
Assumptions
The Item Characteristic Curve can only be plotted if the responses of the examinees to the items are consistent with certain assumptions. For the unidimensional IRT models these assumptions are unidimensionality, local independence and (normal) distribution.
The assumption of unidimentionality of the latent variable states that each item measures a single continuous latent variable. “This assumption cannot be strictly met because several cognitive, personality, and test-taking factors always affect test performance, at least to some extent (…). What is required for the unidimensionality assumption to be met adequately by a set of test data is the presence of a “dominant” component or factor that influences test performance. This dominant component or factor is referred to as the ability measured by the test.” (Hambleton, 1991, p. 9-10).
According to the assumption of local independence of the items, there is no direct relationship between the items, and the only relationship among the item responses results from the conditional relationship of the items with the latent variable. “Local independence means that when the abilities influencing test performance are held constant, examinees’ responses to any pair of items are statistically independent…” (Hambleton, 1991, p. 10).
The normal distribution of the latent variable in the population is assumed when the test score distribution approaches a normal distribution. This assumption simplifies the presentation of the latent variable values as standard values with mean 0 and standard deviation 1.
Logistic models for dichotomous items
The starting points for the development of the various IRT models are the models for dichotomous items. Most effective among the diversity of those models are the logistic models, which use logistic functions to describe the dependence between the latent variable and the probability of correct response. Depending on the parameters used, there are three logistic models for dichotomous items, called respectively the one-, two-, and three-parameter logistic models.
The One-parameter logistic model includes only the item difficulty parameter, denoted by b. The bigger the value of the difficulty parameter, the smaller the overall observed probability of correct response. The equation for the model is:
(1.1) \( P(\theta) = \dfrac{1}{1 + e^{-D(\theta - b)}} \),

where e is the constant 2.718…, and D ≈ 1.702 is the constant of the transformation between the logistic and the normal probability function.
The units for measuring the difficulty of an item and the latent variable of an examinee are the same, so the difficulty parameters and the latent variable values are directly comparable. The difficulty parameter is the point on the latent variable scale at which the probability of correct response is 0.5. This point is also the inflection point of the Item Characteristic Curve.
The Item Information Function for this model is:
(1.2) \( I(\theta) = D^2 P(\theta) Q(\theta) \),

where Q(θ) = 1 − P(θ) is the probability of incorrect response.
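As an illustration, here is a minimal sketch of formulas (1.1) and (1.2) in Python; the function names are illustrative, not taken from any particular IRT package:

```python
import math

D = 1.702  # scaling constant between the logistic and normal metrics

def p_1pl(theta: float, b: float) -> float:
    """Probability of correct response under the 1PL model, formula (1.1)."""
    return 1.0 / (1.0 + math.exp(-D * (theta - b)))

def info_1pl(theta: float, b: float) -> float:
    """Item information under the 1PL model, formula (1.2)."""
    p = p_1pl(theta, b)
    return D ** 2 * p * (1.0 - p)

# At theta == b the probability of correct response is exactly 0.5,
# and the information reaches its maximum of D^2 / 4:
print(p_1pl(0.0, b=0.0))     # 0.5
print(info_1pl(0.0, b=0.0))  # ~0.724
```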
Besides the item difficulty parameter, the Two-parameter logistic model includes the item discrimination parameter, denoted by a. Bigger values of the item discrimination parameter mean that the item discriminates better between examinees with different values of the latent variable. The equation for the model is:
(2.1) \( P(\theta) = \dfrac{1}{1 + e^{-Da(\theta - b)}} \).
The value of the discrimination parameter reflects the steepness of the Item Characteristic Curve. Bigger values correspond to a steeper curve, which is best seen at the inflection point. Mathematically, the One-parameter logistic model can be considered a special case of the Two-parameter model with a discrimination parameter equal to 1.
The Item Information Function for this model is:
(2.2) \( I(\theta) = D^2 a^2 P(\theta) Q(\theta) \).
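Extending the previous sketch to formulas (2.1) and (2.2), again with illustrative names, shows how the discrimination parameter scales the information:

```python
import math

D = 1.702

def p_2pl(theta: float, a: float, b: float) -> float:
    """Probability of correct response under the 2PL model, formula (2.1)."""
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def info_2pl(theta: float, a: float, b: float) -> float:
    """Item information under the 2PL model, formula (2.2)."""
    p = p_2pl(theta, a, b)
    return D ** 2 * a ** 2 * p * (1.0 - p)

# A larger discrimination parameter gives a steeper curve and a higher
# information peak at theta == b:
print(info_2pl(0.0, a=0.5, b=0.0))  # ~0.181
print(info_2pl(0.0, a=2.0, b=0.0))  # ~2.897
```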
The extra parameter included in the Three-parameter logistic model is the guessing parameter, denoted by c. It represents the component of the probability of correct response that is due to guessing, or, in other words, the probability of getting the item correct by guessing alone.
The equation for the model is:
(3.1) \( P(\theta) = c + \dfrac{1 - c}{1 + e^{-Da(\theta - b)}} \).
The ordinate of the lower asymptote of the Item Characteristic Curve is equal to the value of the guessing parameter, not to 0 as in the models presented previously. The meaning of the difficulty parameter is also different: the item difficulty is the point on the latent variable scale where the probability of correct response is not 0.5 but (1 + c)/2.
The Item Information Function for this model is:
(3.2) \( I(\theta) = D^2 a^2 \, \dfrac{Q(\theta)}{P(\theta)} \cdot \dfrac{(P(\theta) - c)^2}{(1 - c)^2} \).
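A corresponding sketch of formulas (3.1) and (3.2), with illustrative names, makes the role of the guessing parameter visible:

```python
import math

D = 1.702

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of correct response under the 3PL model, formula (3.1)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def info_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Item information under the 3PL model, formula (3.2)."""
    p = p_3pl(theta, a, b, c)
    q = 1.0 - p
    return D ** 2 * a ** 2 * (q / p) * ((p - c) ** 2 / (1.0 - c) ** 2)

# The lower asymptote of the curve equals c, and at theta == b the
# probability of correct response is (1 + c) / 2:
print(p_3pl(-10.0, a=1.0, b=0.0, c=0.2))  # ~0.2
print(p_3pl(0.0, a=1.0, b=0.0, c=0.2))    # 0.6 == (1 + 0.2) / 2
```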

A successful theory and an adequate measurement instrument of intelligence
Contradictions and limitations of the factor approach
Having good analytical methods is important but still not enough for a complete understanding of the latent variable of interest, especially for understanding a variable as complex as intelligence. The theory chosen and its postulates are essential for this purpose. For many years after the beginning of serious studies on intelligence, two groups of theories dominated the scientific community: theories of a unitary and theories of a multiple structure of intelligence.
The theories from the first group state that one factor dominates all cognitive abilities. In 1904, after many experiments and much data analysis on intelligence, Spearman concluded “that all branches of intellectual activity have in common one fundamental function (or group of functions), whereas the remaining or specific elements of the activity seem in every case to be wholly different from that in all the others” (Spearman, 1904, p. 284). Later, Cattell expanded that doctrine, suggesting that specific knowledge is the result of an investment of general cognitive ability into the formation of more specific abilities or knowledge, and developing the concepts of fluid and crystallized intelligence (Cattell, 1971).
The other theories state that intelligence consists of several independent but equipollent abilities. In 1938 Thurstone presented factor analysis results according to which there are seven primary mental abilities: verbal meaning, numbering, word fluency, rote memory, perceptual speed, spatial visualization and reasoning (Thurstone, 1938). An extreme conception of the multiple structure of intelligence was proposed by Guilford, who maintained that intelligence is constructed of 120 independent abilities resulting from combinations of five types of operations, four types of contents, and six types of products (Guilford, 1959). One of the most contemporary models supporting the conception of a multiple structure of intelligence was developed by Gardner. He claims that intelligence is composed of at least seven frames of mind: linguistic, logical-mathematical, spatial, musical, body-kinesthetic, interpersonal and intrapersonal (Gardner, 1983).
In spite of their partial successes, these theories have some major disadvantages in explaining how the proposed intelligence structure works. Multiple intelligence theories can hardly explain the often observed high correlations between the different factors considered to be relatively independent. Analogously, it is difficult for unitary intelligence theories to explain individual differences in different test results by taking into account only one dominating factor. On the other hand, intelligence always operates with respect to some stimuli, and it is difficult for both groups of theories to completely explain the mechanism of this interaction. This deficiency is overcome by the information-processing paradigm, which considers intelligence more as a dynamic process than as a static structure.
Sternberg’s triarchic theory of intelligence
According to Sternberg, intelligence has to be viewed in terms of the mental processes which contribute to task performance. The focus of his work is not on the result of the performance but on the mental processes that contribute to it. Therefore intelligence should be examined in relation to: the structures and processes of intelligent functioning; the application of these structures in the external world; and the role of experience in forming intelligence and its application (Sternberg, 1985).
In Sternberg’s theory, intelligence comprises three kinds of information processing components: metacomponents, performance components, and knowledge-acquisition components. These components work together to facilitate cognitive development. “The most important components are the metacomponents, which are used to (a) recognize the existence of a problem, (b) define the nature of the problem, (c) allocate resources to solving the problem, (d) mentally represent the problem, (e) formulate a strategy for solving the problem, (f) monitor solution of the problem while problem solving is ongoing, and (g) evaluate the solution to the problem after problem solving is done.” (Sternberg, 2003). These components are related to planning and decision making, not directly to actions, but they direct which actions should be used in solving a particular task. The performance components are the processes used to execute the problem solving strategy, i.e., they are the actions themselves. The performance components also include the weighing of the consequences of the actions in comparison with other options. The knowledge-acquisition components are the processes used to acquire new information which is then used in solving a potential problem. These processes are very abstract and not necessarily related to the current problem solving task.
The extensive research and the deep analysis of the components and mechanisms of intelligence and their relations make the triarchic theory a remarkable scientific achievement, capable of putting an end to the contradictions of the factor models. It is evident that intelligence has a modular structure; it only remains for good measurement instruments to be adjusted and used for its assessment.
Matrix items in measuring intelligence. Raven’s Progressive Matrices Test
There are many intelligence tests that consist of different kinds of items: number sequences, anagrams, identification of different objects, etc. For most types of items it is not difficult to identify the skills needed to get the item correct. When the items used are strongly connected to only one skill, or to many completely different skills, the results can support the hypothesis of a multiple-factor intelligence structure or the hypothesis of a unitary intelligence, respectively. However, there are types of items with very structured stimulus material which require more than one dominating skill. Matrix items are such items.
Most frequently, a matrix item consists of nine fields situated in a 3-by-3 square (fig. 2). A figure is placed in each field except the bottom-right one. There is a logical relation between the figures, which can be seen most often, but not only, horizontally and vertically. The goal is to find this relation and to choose, among the given variants, a suitable figure to be put in the last, empty field.
One of the best intelligence tests with matrix items is the Standard Progressive Matrices, developed by John C. Raven and his colleagues. The test consists of 5 sets with 12 items in each set, 60 items in total. Sets A and B are for children under 14 years of age, and sets C, D, and E are for persons over 14 years old. The author’s goal was to make every item more difficult than the previous one. For the purposes of the present work, sets C, D, and E of Raven’s Standard Progressive Matrices are used. Each item of those sets has 8 response variants, one of which is correct. Table 1 shows the logical relations between the figures in all items.

Figure2.jpg
Fig. 2. Matrix item example.

As shown in table 1, the different items have different solving strategies that combine several skills. In order to give the correct response to an item, the examinee should work out the logical relation between the figures and then find the figure which best complies with that relation. A significant part of the examinee’s abilities is therefore involved in solving the item. The complicated solving strategy of each item unambiguously demonstrates that items of this type examine not only one or more basic skills, but also the way one acts deliberately and purposely to combine those skills. The results for those items should hence be interpreted in terms of Sternberg’s triarchic theory of intelligence. When solving an item, the examinee first chooses the strategy to apply (metacomponents). It can involve rotation, moving, calculating, etc. (performance components). After solving the item, the examinee memorizes the strategy used, independently of the stimulus material, so it can be used for another item (knowledge-acquisition components). Therefore, if the empirical data satisfy the assumptions of Item Response Theory, the latent variable should be defined as intelligence, and each response given by an examinee should be treated as a result of the interaction of its three hierarchically arranged components.

Table 1. Logical relations in Standard Progressive Matrices Items.

Examination, Analyses and Results
Sampling and Procedure
A total of 506 students from two Bulgarian high schools volunteered to participate in the examination. The students are between 14 and 18 years of age (8th to 12th grade); 203 (40.12%) of them are male and 303 (59.88%) are female. A crosstab of the sample by sex and age is shown in table 2. The examinees specified their sex and age on the back of the answer sheet. They were asked to specify their age as an integer number indicating their completed age. The age average is 15.585 and the age standard deviation is 0.807.

Table 2. Crosstab of the sample by sex and age.

The test is given to the examinees with a standard instruction and an answer sheet, on the front side of which they specify their answers. The examinees have 40 minutes to complete the test. The responses are scored strictly as either correct or incorrect; when an item is omitted, the corresponding response is considered incorrect. The sum of the correct responses for each examinee is referred to as the test score. Hence, the minimum test score is 0 and the maximum test score is 36.

IRT Assumptions Verification
Unidimensionality. To examine the unidimensionality of the item set, its factor structure and internal consistency shall be studied. Principal component factor analysis without rotation is used for factor extraction. The results are shown in table 3. The 11 extracted factors with eigenvalues over 1 are sorted in descending order by their eigenvalues and, respectively, by the variance explained. The eigenvalue polygon is presented in figure 3.

Table 3. Factor analysis results.

Figure 3. Plot of eigenvalues.

The explanatory potential of each factor shall be assessed in comparison with the explanatory potential of the other factors. The eleven factors with eigenvalues over 1.00 explain 55.91% of the variance, and the first factor alone explains 19.62% of the variance. Hence, there is one dominating factor in the factor structure of the item set. According to Reckase (1979), the necessary condition for assuming unidimensionality is that the dominating factor explain around 20% or more of the variance. The first factor satisfies that condition. According to the same author, the sufficient condition for assuming unidimensionality is that the ratio between the eigenvalue of the first, dominating factor and the eigenvalue of the second factor be large enough, for example over 2. This ratio for the first two factors is 2.839, so that condition is met as well. The satisfaction of the two conditions means that the assumption of unidimensionality of the item set is reasonable and justifiable.
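The two conditions are easy to check programmatically. Below is a hypothetical sketch of such a screen; `responses` stands for a 0/1 examinee-by-item matrix like the 506 × 36 matrix in this study, and the function name is illustrative:

```python
import numpy as np

def unidimensionality_screen(responses: np.ndarray) -> None:
    """Rough unidimensionality check in the spirit of Reckase (1979)."""
    corr = np.corrcoef(responses, rowvar=False)             # item x item correlations
    eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]   # descending order
    explained = eigenvalues / eigenvalues.sum()             # proportions of variance
    print(f"first factor explains {explained[0]:.2%} of the variance")
    print(f"first/second eigenvalue ratio: {eigenvalues[0] / eigenvalues[1]:.3f}")
    # Necessary condition: the dominant factor explains around 20% or more.
    # Sufficient condition: the first/second eigenvalue ratio exceeds ~2.
```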
Local independence. When examining the dependence between items, the stimulus material is of key importance. The figures in the items are different; however, the logical relation between the figures in one item could be similar to the logical relation between the figures in other items. The suggestion that the response to one item could influence the responses to other items is therefore somewhat sensible. Nevertheless, as previously noted, the acquisition of experience is related to the knowledge-acquisition components and is consequently part of the latent variable. Respectively, the influence of this experience on the overall performance is in actual fact part of the influence of the latent variable. Therefore the items can be considered locally independent.
Normal distribution. The mean of the test scores is m=24.9328 and the standard deviation of the test scores is s=5.6259. The histogram of the test scores is shown in figure 4. The normal distribution of the latent variable is accepted as a reasonable assumption.

Figure 4. Histogram of the test scores.

Data Organization
To be able to perform the IRT model analyses, the data should be strictly organized. First, the examinees who have responded correctly to all items, or incorrectly to all items, shall be removed from the data set. The items to which only correct or only incorrect responses have been given should also be removed from the data set. There are no such items or examinees in the present data set.
Second, the examinees should be grouped in accordance with their test scores. Since there are at most 35 different test scores, there can be at most 35 groups of examinees. There is a standard value corresponding to each test score, and this standard value serves as the initial, intuitive value of the latent variable for the examinees with that test score. The intervals of the test score are thus transformed into latent variable intervals.
Third, for every item and for every interval of the latent variable, the total number of examinees and the number of examinees who responded correctly shall be counted. The ratio of the correct responses to all responses is the observed probability of correct response. After calculating those probabilities for every latent variable interval, the item parameters can be estimated. A graphical representation of the calculations for several items is shown in figure 5, where each probability is shown as a point. After the model is built, the Item Characteristic Curve shall be the curve that best fits the points.
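A hypothetical sketch of this data organization step follows; `responses` again denotes a cleaned 0/1 examinee-by-item matrix, and all names are illustrative:

```python
import numpy as np

def grouped_proportions(responses: np.ndarray) -> dict:
    """Group examinees by test score and compute, per item, the observed
    proportion of correct responses within each group."""
    scores = responses.sum(axis=1)                    # test score per examinee
    thetas = (scores - scores.mean()) / scores.std()  # standard values
    groups = {}
    for score in np.unique(scores):
        mask = scores == score
        theta = float(thetas[mask][0])                # same theta within a group
        # (group size, observed probability of correct response per item)
        groups[theta] = (int(mask.sum()), responses[mask].mean(axis=0))
    return groups
```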

Figure 5. Graphical representation of the probability of correct response for some items.

IRT Model Selection
The decision about which Item Response Theory model should be used is even more difficult to make than the assumption verification. The best way to make it is to analyze the data with all relevant models and to compare the chosen goodness-of-fit criteria across those models. However, even for the simplest models the item parameter estimation involves many complicated calculations. For that reason, the decision about the model to be used is made on the basis of the specific features of the items and the descriptive statistics of the raw data.
From a mathematical point of view, the Two-parameter logistic model is a particular case of the Three-parameter logistic model with c=0, and the One-parameter logistic model is a particular case of the Three-parameter logistic model with c=0 and a=1. The Three-parameter model should be used if it is assumed that the probability of guessing is big enough to significantly influence the results. The fewer the response variants, the bigger the probability of guessing, so it is reasonable to use the Three-parameter logistic model for items with fewer than 4-5 response variants. The items from the test, however, have 8 response variants each, and the Three-parameter logistic model is not likely to yield much additional information. The basic feature of the One-parameter logistic model is that all items are assumed to discriminate equally among the examinees, so it is reasonable to use that model when the items have approximately equal variances. Table 4 shows some descriptive statistics for the items; no statistical analysis is needed to conclude that the item variances differ substantially. Hence, the best choice is the Two-parameter logistic model, which is also the most frequently used model for parameter estimation with dichotomous items.

Table 4. Mean and standard deviation of the items.

Estimation Method and Parameter Values
The item parameters difficulty (b) and discrimination (a) were estimated with the Maximum Likelihood Estimation method (Baker & Kim, 2004, pp. 38-43). The standard values of the examinees were used as initial latent variable values. The initial values of the parameters of every item before the first iteration were b=0 and a=1. The iteration process for every item ended when the absolute value of the difference between the parameter estimated in the previous iteration and the parameter estimated in the present cycle was less than 0.001 for both parameters at the same time. The parameters estimated in the previous iteration were then considered to be the item parameters. The least squares criterion was used as a goodness-of-fit criterion.
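For illustration, here is a minimal sketch of this kind of per-item maximum likelihood estimation, using Fisher scoring on grouped data. It follows the general approach described by Baker & Kim (2004) but is not their exact algorithm; all names, the synthetic data, and the demonstration values are illustrative assumptions:

```python
import numpy as np

D = 1.702  # logistic-to-normal scaling constant, as in the model equations

def fit_2pl_item(theta, f, r, tol=1e-3, max_iter=50):
    """ML estimation of one item's 2PL parameters from grouped data:
    theta -- latent variable value of each group,
    f     -- number of examinees in each group,
    r     -- number of correct responses in each group."""
    a, b = 1.0, 0.0                              # initial values, as in the text
    for iteration in range(1, max_iter + 1):
        p = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))
        w = f * p * (1.0 - p)                    # binomial information weights
        # Gradient of the log-likelihood with respect to (a, b):
        grad = np.array([D * np.sum((theta - b) * (r - f * p)),
                         -D * a * np.sum(r - f * p)])
        # Expected (Fisher) information matrix for (a, b):
        info = D ** 2 * np.array(
            [[np.sum((theta - b) ** 2 * w), -a * np.sum((theta - b) * w)],
             [-a * np.sum((theta - b) * w), a ** 2 * np.sum(w)]])
        step = np.linalg.solve(info, grad)
        a, b = a + step[0], b + step[1]
        if np.all(np.abs(step) < tol):           # both changes below 0.001
            break
    se = np.sqrt(np.diag(np.linalg.inv(info)))   # standard errors of (a, b)
    return a, b, se, iteration

# Idealized demonstration: counts generated from a = 1.2, b = -0.5.
theta = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
f = np.array([50.0, 100.0, 150.0, 100.0, 50.0])
r = np.round(f / (1.0 + np.exp(-D * 1.2 * (theta + 0.5))))
print(fit_2pl_item(theta, f, r))  # recovers approximately a = 1.2, b = -0.5
```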
After applying the estimation method, for each item the difficulty parameter (b) and the standard error of its estimation (sb), the discrimination parameter (a) and the standard error of its estimation (sa), the least squares criterion value, and the number of iterations needed to estimate the item parameters are calculated. The results are shown in table 5. The average number of iterations needed for the estimation of the item parameters is 5.19.

Table 5. Parameter values, standard errors, average least square and iterations needed for the estimation.

The Item Characteristic Curve for each item is plotted in accordance with formula (2.1) by replacing the parameters with the estimated parameter values. Figure 6 presents the Item Characteristic Curves of the items shown in figure 5, together with the charts of their discrete probabilities of correct response. Figure 7 presents the Item Information Functions of the same items.

Figure 6. Item Characteristic Curves of the items from figure 5.

Figure 7. Item Information Functions of the items from figure 5.

Discussion
Item Characteristic Curves and Item Information Functions
Interpretation

The Item Characteristic Curves and Item Information Functions give a lot of information about the items and are especially useful for item comparisons, as they show up item differences. The Item Information Function formula includes expressions and parameters which are also included in the Item Characteristic Curve formula, and the graphs of the two functions are strongly related.
The point of inflection of the Item Characteristic Curve is where the latent variable is equal to the item difficulty. This is the point where the item discriminates best. It is also the point where the Item Information Function has its maximum, i.e., the item gives the most information for examinees with latent variable values near the item difficulty value. The Item Characteristic Curve is steepest at the point of inflection. The greater the item discrimination, the steeper the Item Characteristic Curve and the higher and more peaked the Item Information Function. This means that items with larger discrimination values give more information than items with smaller discrimination values. These relations can easily be illustrated with examples.
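For the Two-parameter model these relations follow in one line from formula (2.2); the short derivation below is added here only as a clarifying note:

```latex
% At the inflection point theta = b the 2PL gives P(b) = Q(b) = 1/2,
% so by formula (2.2) the information attains its maximum there:
I(b) \;=\; D^{2} a^{2} \, P(b) \, Q(b) \;=\; \frac{D^{2} a^{2}}{4}
```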
Item D08 is relatively easy, so the point of inflection of its Item Characteristic Curve, and the curve itself, is situated in the left part of the chart; the maximum of its Item Information Function is situated in the left part of the chart as well. Item D12 is relatively difficult, so the point of inflection of its Item Characteristic Curve, and the curve itself, is situated in the right part of the chart, as is the maximum of its Item Information Function. Since item D12 is more difficult than item D08, its point of inflection lies to the right of the point of inflection of item D08.
Item E07 has a relatively high discrimination value, so its Item Characteristic Curve is relatively steeper and its Item Information Function is also relatively steeper. Item E09 has relatively low discrimination, so its Item Characteristic Curve is relatively flatter and its Item Information Function is also relatively flatter. Item E07 has higher discrimination than item E09, which means that its Item Characteristic Curve and its Item Information Function are steeper at the point of inflection, i.e., it gives more information at its point of inflection.

Item Parameter Values Interpretation
As previously noted, the difficulty parameter and the latent variable have the same units of measurement, and those units are standard values. The items can therefore be grouped according to their difficulty parameter values in the same way that examinees can be grouped according to their standard values. Table 6 provides an example of such grouping criteria and the items that fall into each group.

Table 6. Guidelines for interpreting the difficulty parameter values.

The discrimination parameter can take not only positive but also negative values. Items with negative discrimination are rarely used, and only for specific purposes, because a negative value means that examinees with larger latent variable values give more incorrect responses while examinees with smaller latent variable values give more correct responses. Exemplary intervals for the positive discrimination parameter values are given by Baker (2001). Table 7 provides those intervals and the distribution of the items.

Table 7. Guidelines for interpreting the discrimination parameter values.
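Since Table 7 is reproduced here only as an image, the sketch below uses the interval boundaries commonly quoted from Baker (2001); they should be checked against the table, and the function name is illustrative:

```python
def discrimination_label(a: float) -> str:
    """Verbal label for a positive discrimination value, after Baker (2001).
    Boundaries are the commonly quoted ones; verify against Table 7."""
    if a <= 0.34:
        return "very low"
    if a <= 0.64:
        return "low"
    if a <= 1.34:
        return "moderate"
    if a <= 1.69:
        return "high"
    return "very high"

print(discrimination_label(1.3311))  # item E06 from the text -> "moderate"
```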

The given guidelines simplify rough interpretation and comparisons. For example, item D08 is an easy item with moderate discrimination, and item D12 is a difficult item with moderate discrimination. Item E09 is an item with moderate difficulty and moderate discrimination, and item E07 is an item with moderate difficulty and very high discrimination.
For a more precise and simultaneous interpretation of the item parameters, the items shall be presented in a coordinate system whose axes are the two independent parameter scales (fig. 8). The item difficulty values are on the x-axis and the item discrimination values are on the y-axis. Such a presentation is very useful not only for item comparisons but also for identifying items with similar parameters, which could be exchanged when creating similar tests or parallel forms. For instance, item E06 (b=-0.4413, a=1.3311) and item C11 (b=-0.4713, a=1.3169) have similar properties and are exchangeable. The same can be said about item D04 (b=-2.0860, a=1.3624) and item D07 (b=-2.1275, a=1.0726).
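Such near-duplicates can be found automatically by ranking item pairs by their distance in the (b, a) plane. The sketch below treats the two scales as commensurate purely for illustration and uses only the four parameter pairs quoted above:

```python
import numpy as np

# Item parameters (b, a) quoted in the text:
items = {"E06": (-0.4413, 1.3311), "C11": (-0.4713, 1.3169),
         "D04": (-2.0860, 1.3624), "D07": (-2.1275, 1.0726)}

def closest_pairs(items: dict) -> list:
    """Rank item pairs by Euclidean distance in the (b, a) plane;
    nearby items are candidates for exchange between parallel forms."""
    names = list(items)
    pairs = []
    for i, m in enumerate(names):
        for n in names[i + 1:]:
            dist = np.hypot(items[m][0] - items[n][0],
                            items[m][1] - items[n][1])
            pairs.append((round(float(dist), 4), m, n))
    return sorted(pairs)

print(closest_pairs(items)[0])  # (0.0332, 'E06', 'C11') -- the closest pair
```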

Fig. 8. Distribution of all items in accordance to their parameters.

Conclusion
The whole Raven Progressive Matrices test has 60 items; the C, D and E series, which are suited for adults, have 36 items, and at least 30 minutes are needed to complete these series. On the other hand, different researchers have different purposes and needs. Those who work with developmental disorders would prefer easy instruments that differentiate patients with a disorder from the others. Those who recruit people for important positions would prefer difficult instruments that differentiate the candidates with the brightest abilities from the others.
In today’s quickly developing world, even 30 minutes is a lot to spend on large or non-goal-specific instruments. In exchange for the long data collection process and the difficult analyses and interpretation come the opportunities for practical application of the results gained through Item Response Theory methods. By virtue of those methods, questions like “Could a latent variable be examined with 10 times fewer items?” or “Will a test with 5 times fewer items be as good as the original one?” found their positive answer long ago.

References
Baker, F. B., & Kim, S.-H. (2004). Item Response Theory: Parameter Estimation Techniques (2nd ed.). New York: Marcel Dekker.
Baker, F. (2001). The Basics of Item Response Theory. Second edition. ERIC Clearinghouse on Assessment and Evaluation.
Cattell, R. B. (1971). Abilities: Their structure, growth, and action. Boston: Houghton Mifflin.
Cramer, H. (1946). Mathematical methods of statistics. Princeton, NJ: Princeton University Press.
Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. London: Lawrence Erlbaum Associates.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences (2nd ed.). New York: Basic Books.
Guilford, J. P. (1959). Three faces of intellect. American Psychologist, 14, 469 – 479.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Newbury Park, CA: Sage Publications.
Reckase, M. (1979). Unifactor latent trait models applied to multifactor tests: results and implications. Journal of Educational Statistics, 4, 207-230.
Reeve, B. (2004). An Introduction to Modern Measurement Theory. US National Cancer Institute. http://appliedresearch.cancer.gov.areas/
Spearman, C. (1904). “General intelligence” objectively determined and measured. The American Journal of Psychology, 15, 201–293.
Sternberg, R. J. (1985). General intellectual ability. In R. J. Sternberg (Ed.), Human Abilities: An Information-Processing Approach (pp. 5-30). New York: W. H. Freeman and Company.
Sternberg, R. J. (2003). A broad view of intelligence: The theory of successful intelligence. Consulting Psychology Journal: Practice and Research (Summer 2003).
Thurstone, L. L. (1938). Primary mental abilities. Psychometric Monographs, 1.

Biographical statement
Nikolay Georgiev holds a Bachelor’s degree in Psychology from Sofia University “Sv. Kliment Ohridski”, where he graduated in July 2007. Before that he studied Mathematics and Informatics at the National High School of Mathematics and Science, Sofia, Bulgaria. Currently, Nikolay is studying Statistics and Financial Econometrics at Sofia University to obtain a Master’s degree.
E-mail: nickolay.georgiev@gmail.com