

Finite Mixture Analysis of Beauty-Contest
Data from Multiple Samples ∗

Antoni Bosch-Domènech
José G. Montalvo

Rosemarie Nagel
and

Albert Satorra
Universitat Pompeu Fabra, Barcelona

January 27, 2004

∗Research supported by grants BEC2000-0983 and SEC2002-03403 from the Spanish
Ministry of Science and Technology.



Abstract

This paper develops a finite mixture distribution analysis of Beauty-

Contest data obtained from diverse groups of experiments. ML estimation

using the EM approach provides estimates for the means and variances of

the component distributions, which are common to all the groups, and es-

timates of the mixing proportions, which are specific to each group. This

estimation is performed without imposing constraints on the parameters of

the composing distributions. The statistical analysis indicates that many

individuals follow a common pattern of reasoning described as iterated best

reply (degenerate), and shows that the proportions of people thinking at dif-

ferent levels of depth vary across groups.

Keywords: Beauty-Contest experiments, reasoning hierarchy, finite mixture

distribution, EM algorithm.

Journal of Economic Literature Classification: C24, C44, C91.



1 Introduction

In recent years there has been increasing interest in experimentally
evaluating individuals’ choices, decision processes and belief formation. From

an econometric perspective, the potential multiplicity of decisions and be-

liefs favors clustering procedures to separate the different outcomes of each

decision process. These procedures differ in the estimation techniques used

and the amount of structure imposed on the econometric model.

In this paper we seek to interpret the choice data reported in A. Bosch-

Domènech, J. G. Montalvo, R. Nagel and A. Satorra (2002), by constructing a

finite mixture model. These data were obtained in seventeen different exper-

iments involving the Beauty-Contest (BC) game. In a basic BC game, each

player simultaneously chooses a decimal number in an interval. The winner

is the person whose number is closest to p times the mean of all chosen num-

bers, where p < 1 is a predetermined and known number. The winner gains

a fixed prize. In this game there exists only one (Nash) equilibrium in which

all players choose the lowest possible number. In the seventeen experiments

reported, p = 2/3 and the interval, in sixteen out of the seventeen, is [0, 100].

In one experiment the choice set is [1, 100].

Several types of reasoning processes have been proposed to explain the

individuals’ decisions in the BC game (see references in Section 5). One such

reasoning process, denoted as IBRd, for Iterated Best Reply with degenerate

beliefs (i.e., the belief that the choices of all others are at, or around, one

precise value),¹ classifies subjects according to the depth, or number of levels,

¹See, e.g., Bosch-Domènech et al. (2002) or Stahl (1996).


of their reasoning. It assumes that, at each level, every player has the belief

that she is exactly one level of reasoning deeper than all the rest. A Level-0

player chooses randomly in the given interval [0, 100], with the mean being

50. Therefore, a Level-1 player gives a best reply to the belief that everybody
else is a Level-0 player and thus chooses $50p$. A Level-2 player chooses $50p^2$,
a Level-k player chooses $50p^k$, and so on. A player who takes infinite steps of

reasoning, and believes that all players take infinite steps, chooses zero, the

equilibrium. This hypothesis of iterated best reply, together with p = 2/3,

and an interval [0, 100], predicts that choices (in addition to random and

haphazard choices, corresponding to Level-0 players) will be on the values

33.33, 22.22, 14.81, 9.88, . . . and, in the limit, 0.
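The predicted values listed above follow directly from the rule $50p^k$. As a minimal sketch (the function name `ibrd_choice` and its arguments are illustrative, not part of the original analysis), they can be reproduced as follows:

```python
def ibrd_choice(level, p=2 / 3, level0_mean=50.0):
    """Choice of a Level-k player under IBRd: level0_mean * p**level."""
    return level0_mean * p ** level

# Levels 1 to 4 with p = 2/3 and a Level-0 mean of 50
predicted = [round(ibrd_choice(k), 2) for k in range(1, 5)]
print(predicted)  # [33.33, 22.22, 14.81, 9.88]
```

As the level grows, the predicted choice shrinks geometrically towards the equilibrium value of 0.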

The seventeen experiments whose data we analyze took place in different
settings and are classified into six groups, as described in Table 1.²

Note that the experiments are performed in very different environments,

involving different subject pools, sample sizes, payoffs, and settings: the data

have been collected in classrooms, conferences, by e-mail, through news-

groups or among newspaper readers, as well as in laboratories with under-

graduate students. The non-laboratory sessions typically allow more time

to participants and use economists, game theorists, or the general public as

subjects. We are, therefore, dealing with a rich and heterogeneous data set.

This paper presents a statistical analysis of these BC data allowing for

two types of heterogeneity: one that is unobserved, namely the reasoning

²More details of these seventeen experiments and the IBRd hypothesis can be found in
Bosch-Domènech et al. (2002).

Table 1: The data of the 6 different groups of experiments

Group            # of experiments   Description of subjects                             Sample size n_g
1 (Lab)          5                  Undergraduate students in labs (Bonn & Caltech)     86
2 (Class)        2                  Undergraduate students, UPF                         138
3 (Take-Home)    2                  Undergraduate students in Take-Home tasks, UPF      119
4 (Theorists)    4                  Game Theory students and experts in Game Theory,    92
                                    in conferences and e-mail
5 (Internet)     1                  Newsgroup in Internet                               150
6 (Newspapers)   3                  Readers of FT, E and S                              7900
                                      Financial Times                                   1476
                                      Expansión                                         3696
                                      Spektrum der Wissenschaft                         2728

level of each individual in the sample; and another one that is manifest, the

group membership. We specify a finite mixture distribution model, with all

parameters of the composing distributions unconstrained (to be estimated)

but equal across groups, and with mixture proportions that are group specific.

This approach contrasts with the previous literature, where data sets were

more homogeneous, and the models more restrictive.

The paper is organized as follows. The next section describes the data

and the characteristics of each group of experiments. Section 3 proposes a

finite mixture distribution model to interpret the unobserved heterogeneity

associated with the reasoning processes of agents playing the BC game. Sec-

tion 4 contains estimation results that give empirical support to the IBRd

hypothesis. Section 5 compares our results with those using alternative sta-

tistical procedures applied to BC data. Section 6 concludes.



2 Data description

Inspecting the histogram for the whole distribution, when all the groups are

pooled together (see Figure 1), we observe that the peaks closely correspond

to the numbers that individuals would have chosen if they had reasoned

according to the IBRd hypothesis, at reasoning levels one, two, three and

infinity. If we take the histograms for the six groups of data separately

(Figure 2), the peaks at level one, two and infinity are still discernible, but

their frequency varies considerably across groups of experiments.

The first group, Lab-experiments with undergraduates, is clearly distin-

guished from the rest, because the Nash equilibrium was rarely selected.

When subjects have some training in game theory, the proportion of sub-

jects choosing the equilibrium seems to increase. The highest frequencies

are attained when experimenting with theorists, in which case, the greater

confidence that others will reach similar conclusions may be reinforcing the

effect of training. In Newspapers, the frequency of equilibrium choices falls

somewhere in between,³ as should be expected from the heterogeneous level

of training of their readers.

Yet, for some subgroups of data in particular, the regularity of choices

can be striking. Take the responses from readers of Financial Times (FT)

and Spektrum (S). Despite catering to different types of readers (S to scien-

tists and FT to businessmen) and the severe non-normality of the data, a

comparison of the results of the experiment performed with S and FT readers

yields a very similar distribution, as can be observed in the quantile-quantile

³In Expansión the choices were in [1, 100]. If we include choices at 1 as equilibrium
choices, then the frequency would increase.


[Figure 1 plot: "Frequency Distribution"; x-axis: choices, with reference points Inf, C, B, A and 100; y-axis: density, from 0.00 to 0.10.]

Figure 1: Histogram for the whole sample. The points A, B, C and Inf
correspond to the choices of subjects with first, second, third and infinite levels
of reasoning.


[Figure 2 plot: six histogram panels (Lab, Class, Take-home, Theorists, Internet, Newspaper); x-axis: choices, with reference points Inf, C, B, A and 100; y-axis: proportion.]

Figure 2: Histograms for the six different groups. As in Figure 1, the
values A, B, C and Inf correspond to first, second, third and infinite levels of
reasoning.


[Figure 3 plot: "qqplot Spektrum vs Financial Times"; x-axis: Financial Times, 0 to 50; y-axis: Spektrum, 0 to 50.]

Figure 3: Quantiles of Spektrum vs Financial Times for choices smaller than
50.

plot of Figure 3.⁴ The Kruskal-Wallis chi-squared test statistic for the null
hypothesis that the two distributions are the same is equal to 0.002 (p-value
equal to 0.964), i.e., the two distributions cannot be distinguished.
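As a quick arithmetic check, and assuming the usual chi-squared reference distribution with one degree of freedom for a two-sample Kruskal-Wallis test, the reported p-value can be recovered from the reported statistic. The helper name `chi2_sf_1df` is illustrative; it relies on the identity that the upper tail of a chi-squared variable with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# Statistic reported in the text for Spektrum vs Financial Times
p_value = chi2_sf_1df(0.002)
print(round(p_value, 3))  # 0.964
```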

⁴In this type of graph, equality of distributions corresponds to points lying on the
diagonal.


3 The finite mixture model and estimation

procedure

From our previous discussion it appears that the basic problem in fitting a

statistical model to the BC data we consider is the existence of unobserved

heterogeneity (the different levels of reasoning), in addition to a multiple

group structure. Statisticians and, more recently, economists have developed
models of finite mixture distributions to deal with this type of problem.⁵
This section proposes an interpretation of the BC data as a mixture
of distributions and provides a statistical strategy to estimate such a model.

3.1 A multi-sample finite mixture model

Let us denote the multiple-sample data in Table 1 by $\{y_{ig};\ i = 1, \ldots, n_g\}_{g=1}^{6}$,
where $y_{ig}$ is the number chosen by individual $i$ in group $g$ of experiments,
and $n_g$ is the sample size of group $g$.

For each of the six different groups, we specify the following $(K+1)$-mixture
probability density function for $y$,

$$f_y(y, \psi) = \pi_0 f_0(y) + \pi_1 f_1(y, \theta_1) + \ldots + \pi_K f_K(y, \theta_K),$$

where $f_0, f_1, \ldots, f_K$ are the components of the mixture distribution, $\theta_k$
denotes the mean and variance parameter vector of component $k$, and

• $f_0(y) = 1/100$, i.e., the density of the uniform distribution in [0, 100].

• $f_k$, $k = 1, \ldots, K-1$, are normal distributions (truncated below 0 and
above 100) with means $\mu_k$ and variances $\sigma_k^2$.

⁵Titterington et al. (1992) covers many issues related to the statistical properties of
finite mixture distribution models.


• $f_K$ is a normal distribution, of mean $\mu_K$ and variance $\sigma_K^2$, left-censored
at the value 1. The censoring of this distribution models the non-null
probability mass at the left limit of the choice interval, at values 0
and 1. Recall that in one experiment the limit value was at 1, not 0.
For the sake of parsimony we consider a single distribution left-censored
at 1 (which, obviously, automatically collects the censoring at 0).

• the $\pi_i$'s are mixing proportions, with $\pi_i > 0$ and $\sum_{i=0}^{K} \pi_i = 1$. These
mixing proportions are the weights of the different components of the
mixture.

We define the parameter vector $\psi = (\pi, \theta)'$, where $\pi = (\pi_0, \pi_1, \ldots, \pi_K)$ is
the vector of mixing proportions and $\theta = (\mu_1, \ldots, \mu_K, \sigma_1^2, \ldots, \sigma_K^2)$ is the
vector of parameters of the normal distributions underlying the mixture model.

The model we adopt for estimation sets π to be group-specific, but imposes
the equality of θ across groups. It is reasonable to assume that there is a
common pattern of reasoning across groups of individuals playing the BC game;
therefore we let means and variances be equal across groups. However, the
proportion of players at each level of reasoning may differ across
experiments. This strategy also makes it possible to obtain sensible estimates
for complex mixture distributions even in the groups with small sample sizes.
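The three kinds of components described in this section can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the censored component is given a point mass for any choice at or below 1, and the parameter values in the usage lines are made up for illustration.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def uniform_pdf(y, lo=0.0, hi=100.0):
    """f_0: uniform density on [lo, hi]."""
    return 1.0 / (hi - lo) if lo <= y <= hi else 0.0

def truncated_normal_pdf(y, mu, sigma, lo=0.0, hi=100.0):
    """f_k, k = 1..K-1: normal density renormalized to [lo, hi]."""
    if not lo <= y <= hi:
        return 0.0
    mass = norm_cdf((hi - mu) / sigma) - norm_cdf((lo - mu) / sigma)
    return norm_pdf((y - mu) / sigma) / (sigma * mass)

def censored_normal_lik(y, mu, sigma, limit=1.0):
    """f_K: probability mass at or below the limit, normal density above it."""
    if y <= limit:
        return norm_cdf((limit - mu) / sigma)
    return norm_pdf((y - mu) / sigma) / sigma

def mixture_lik(y, pis, mus, sigmas):
    """Likelihood contribution of one choice y.

    pis has K+1 entries (index 0 is the uniform); mus and sigmas have K
    entries, the last one belonging to the left-censored component.
    """
    comps = [uniform_pdf(y)]
    comps += [truncated_normal_pdf(y, m, s) for m, s in zip(mus[:-1], sigmas[:-1])]
    comps.append(censored_normal_lik(y, mus[-1], sigmas[-1]))
    return sum(p * c for p, c in zip(pis, comps))

# Illustrative (made-up) parameters: uniform + one truncated + one censored normal
example_pis, example_mus, example_sigmas = [0.2, 0.5, 0.3], [33.0, 0.5], [2.0, 2.0]
print(mixture_lik(33.0, example_pis, example_mus, example_sigmas))
print(mixture_lik(60.0, example_pis, example_mus, example_sigmas))
```

Treating the censored component as a mix of point mass and density follows the standard censored-regression form of the likelihood; other conventions are possible.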

3.2 ML estimation and the EM algorithm

From the finite mixture model described above, the log-likelihood function

of θ is

$$l(\theta) = \sum_{i} \log\left( \sum_{k=0}^{K} \pi_k f_k(y_i; \theta) \right),$$


where $i$ varies across all sample units. Since this log-likelihood function
involves the log of a sum of terms that are (highly non-linear) functions of
parameters and data, its maximization using standard optimization routines
is not feasible in general; for this maximization, we resort to the EM
algorithm (Dempster, Laird and Rubin 1977).⁶ We consider the data augmented
with variables $d_i = (d_{i0}, \ldots, d_{iK})'$, where the $d_{ik}$ are dummy variables
identifying the component membership (i.e., for each $i$, $d_{ik} = 0$, except for
one particular $k$, for which $d_{ik} = 1$). Obviously, the $d_i$'s are non-observable.
Assuming that $d_i$ has a multinomial distribution with parameters $(\pi_0, \ldots, \pi_K)'$,
the log-likelihood of the complete data is:

$$l_C(\theta) = \sum_{i=1}^{n} \sum_{k=0}^{K} d_{ik} \left( \log \pi_k + \log f_k(y_i; \theta_k) \right).$$

The EM approach computes ML estimates using the following algorithm.

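1. For given values of $\hat\pi_{ik}$ and $\hat\pi_k$, maximize with respect to $\theta$ the function

$$\sum_{i=1}^{n} \sum_{k=0}^{K} \hat\pi_{ik} \left( \log \hat\pi_k + \log f_k(y_i; \theta_k) \right).$$

2. For given $\theta$, update the $\hat\pi_{ik}$ (estimated conditional probabilities of case
$i$ belonging to component $k$) and the $\hat\pi_k$ (marginal probabilities) using the formulas

$$\hat\pi_{ik} = \frac{\pi_k f_k(y_i; \theta_k)}{\sum_{k'=0}^{K} \pi_{k'} f_{k'}(y_i; \theta_{k'})} \quad \text{and} \quad \hat\pi_k = \frac{1}{n} \sum_{i=1}^{n} \hat\pi_{ik}. \tag{1}$$

Starting from initial estimates $\hat\pi_{ik}$ and $\hat\pi_k$, the EM algorithm consists in
iterating 1) and 2) until convergence.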

⁶Recently Arcidiacono and Jones (2003) have proposed an extension of the EM
algorithm where the parameters of the finite mixture distribution can be estimated sequentially
during each maximization step. In our case, however, we did not find it necessary to resort
to this alternative. See also McLachlan and Peel (2000) for different extensions and
applications of the EM algorithm.


The optimization in 1) implies the maximization of a (K+1) group model

with weighted data. That is, we maximize

$$\sum_{i=1}^{n} \sum_{k=0}^{K} \hat\pi_{ik} \log f_k(y_i; \theta_k) = \sum_{k=0}^{K} \left( \sum_{i=1}^{n} \hat\pi_{ik} \log f_k(y_i; \theta_k) \right).$$

Note, however, that our model imposes equality across groups (the six
groups of experiments) of the parameters that define the normal distributions
of the mixture, while it allows for group-specific mixing proportions, $\pi_{kg}$,
$g = 1, \ldots, 6$. This implies the substitution of (1) by

$$\hat\pi_{ikg} = \frac{\pi_{kg} f_k(y_i; \theta_k)}{\sum_{k'=0}^{K} \pi_{k'g} f_{k'}(y_i; \theta_{k'})} \quad \text{and} \quad \hat\pi_{kg} = \frac{1}{n_g} \sum_{i=1}^{n_g} \hat\pi_{ikg}, \quad g = 1, \ldots, 6. \tag{2}$$

In terms of Bayes theorem, π̂ikg is the posterior probability of case i of group

g to be in component k, k = 0, 1, . . . ,K. The posterior probabilities can be

used to assign each observation to a component, by applying the simple rule

that element i is assigned to component k if π̂ik > π̂ik′ for all k′ ≠ k. Note

that in our approach, the posterior probabilities of belonging to component

k change with the group g.
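The iteration with common component parameters and group-specific mixing proportions can be sketched as follows. This is a simplified illustration, not the estimation code used for the results: the normal components are left untruncated and uncensored so that the maximization step has a closed form (weighted mean and standard deviation), and the names `em_multigroup` and `assign` are hypothetical.

```python
import math

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_multigroup(groups, mus, sigmas, n_iter=200):
    """EM with common component parameters and group-specific proportions.

    groups: list of lists of choices, one inner list per group.
    Component 0 is uniform on [0, 100]; components 1..K are plain normals
    (illustrative simplification: no truncation, no censoring).
    """
    K = len(mus)
    mus, sigmas = list(mus), list(sigmas)
    pis = [[1.0 / (K + 1)] * (K + 1) for _ in groups]
    post = []
    for _ in range(n_iter):
        # Posterior probabilities, using each group's own mixing proportions.
        post = []
        for g, ys in enumerate(groups):
            rows = []
            for y in ys:
                dens = [0.01] + [normal_pdf(y, mus[k], sigmas[k]) for k in range(K)]
                w = [pis[g][k] * dens[k] for k in range(K + 1)]
                total = sum(w)
                rows.append([wk / total for wk in w])
            post.append(rows)
        # Group-specific mixing proportions: average posterior within the group.
        for g, rows in enumerate(post):
            pis[g] = [sum(r[k] for r in rows) / len(rows) for k in range(K + 1)]
        # Weighted mean and SD of each normal, pooled over all groups.
        pairs = [(y, r) for ys, rows in zip(groups, post) for y, r in zip(ys, rows)]
        for k in range(K):
            wsum = sum(r[k + 1] for _, r in pairs)
            if wsum < 1e-12:
                continue  # component has no support; leave it unchanged
            mus[k] = sum(r[k + 1] * y for y, r in pairs) / wsum
            var = sum(r[k + 1] * (y - mus[k]) ** 2 for y, r in pairs) / wsum
            sigmas[k] = max(math.sqrt(var), 1e-3)
    return pis, mus, sigmas, post

def assign(posterior_row):
    """Index of the component with the largest posterior probability."""
    return max(range(len(posterior_row)), key=posterior_row.__getitem__)
```

With truncated and censored components, as in the model of Section 3.1, the weighted maximization has no closed form and would require a numerical optimizer in place of the weighted mean and SD update.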

Information statistics can be computed using the general expression

C = −2log L + qM,

where L is the likelihood of the data, M is some constant and q is the number
of parameters to be estimated. The preferred model is the one with
the smallest information criterion C, so the term qM is a penalty for
over-parametrization of the model. In the present paper we set M = 2, which
implies the use of Akaike’s information criterion (AIC) as the guide for
choosing our preferred mixture model (see, e.g., Bozdogan (1987)).
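The trade-off between fit and the penalty term can be illustrated with a short sketch. The function name and the two log-likelihood values below are hypothetical and chosen only to show how a better-fitting model can still be rejected once its extra parameters are penalized.

```python
def information_criterion(log_lik, q, M=2.0):
    """C = -2 log L + q M; with M = 2 this is the AIC."""
    return -2.0 * log_lik + q * M

# Hypothetical comparison: the larger model fits slightly better,
# but not by enough to offset the penalty for three extra parameters.
small_model = information_criterion(log_lik=-100.0, q=5)
large_model = information_criterion(log_lik=-98.5, q=8)
preferred = "small" if small_model < large_model else "large"
print(small_model, large_model, preferred)  # 210.0 213.0 small
```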



4 Results of the Analysis

Using AIC to assess the fit of the model, we find that the preferred model
includes five (truncated) normal distributions, in addition to the uniform
and the censored normal components. The actual values of the AIC for
the mixture models with four, five and six (truncated) normal distributions
(plus the uniform and one left-censored distribution) are equal, respectively,
to 6.7619, 6.7602 and 6.9022 (multiplied by $10^4$), supporting the choice of
five (truncated) normal distributions. When in this model we suppress the
uniform component, the AIC jumps from 6.7602 to 6.7902 (both values
multiplied by $10^4$), which represents a substantial deterioration in the fit and
indicates the need for the uniform component.

Using initial parameter estimates based on sample statistics (sample
quantiles and variances), the EM algorithm achieves convergence after 775
iterations. The evolution of (minus) the log-likelihood function during the
iteration process is shown in Figure 4.

Table 2 shows the estimates of the means and variances of the compos-

ing distributions, as well as the estimates of the mixing proportions across

groups. Of the five components that correspond to the truncated normal dis-

tributions, three are uncannily centered at the values predicted by the IBRd

hypothesis (estimated: 33.35; 22.89; 14.98; theoretical: 33.33; 22.22; 14.81).

Note also that deviations around these means are moderate.

A fourth normal component is a very flat distribution, centered at 35.9
with a large SD of 9.37. This we interpret as indicating that the uniform
distribution fails to capture all the Level-0 players. While the uniform
distribution appears to take care of some random or haphazard choices
between 0 and 100, the need for this normal component suggests that many of
these choices are biased towards the lower half of the interval.⁷ We conclude
that Level-0 decisions are better described by both the uniform and this flat
normal distribution. This interpretation would suggest that the number of
Level-0 players is larger than previously thought.⁸

[Figure 4 plot: "evolution of −loglik"; x-axis: EM iterations, from 0 to 800; y-axis: values from 7.9 to 8.5.]

Figure 4: Evolution of the (minus) log-likelihood during iterations of the EM
algorithm

The fifth normal is centered at 7.35, below the theoretical prediction for

Level-4 players. The interpretation for this normal distribution is not as

straightforward as for other distributions. It could be the distribution of

Level-4 choices, with a mean smaller than the theoretic value of 9.88. How-

ever, analyzing about 1000 comments submitted by participants in different

BC experiments (see Bosch-Domènech et al. (2002)), we found that less than

1% reasoned at Level-4. Instead, participants reasoned either at most until

Level-3, or jumped all the way to Level-∞. Among the choices belonging to
subjects reaching Level-∞, only about 20% corresponded to the equilibrium
and 60% were in the interval between the equilibrium and 10. This leads us

to interpret this fifth distribution as capturing the choices of Level-∞ players
rebounding from the equilibrium.

Finally, the estimated mean and standard deviation of the censored dis-

tribution are respectively 0.59 and 1.91. This distribution also accounts for

choices of Level-∞ players. The proportions of censored observations in the
different groups, both for the fitted and empirical distributions, are shown

⁷Actually, in game-theoretical parlance, choices above 66.66 are dominated.
⁸Using BC data on a sample of undergraduate students, Nagel (1995) and Ho et al.
(1998) calculate, respectively, a 13.1% and a 28.3% proportion of Level-0 players. Using
our sample of undergraduates we obtain that the relative size of Level-0 players is 57.05%.


Table 2: Parameter estimates of the multiple-sample mixture model

Components           f0      f1      f2      f3      f4      f5      f6
µk                   *       35.91   33.35   22.89   14.98   7.35    0.59
σk                   *       9.37    0.34    2.75    3.19    3.07    1.91
Reasoning levels     L-0     L-0     L-1     L-2     L-3     L-inf   L-inf

Proportions πkg (in %)
Lab                  25.88   31.17   6.93    21.70   7.30    5.75    1.27
Classroom            17.56   18.11   14.79   18.57   12.47   9.83    8.67
Take-home            15.52   18.11   7.88    20.39   23.45   8.13    6.53
Theorist             12.93   11.66   3.20    9.49    10.89   19.51   32.31
Internet             13.74   16.36   9.25    15.01   13.77   7.60    24.26
Newspaper            15.31   15.96   8.32    15.35   14.71   14.57   15.78
Column mean          16.82   18.56   8.39    16.75   13.76   10.90   14.80
* uniform distribution

Table 3: The % of censoring in each group for the infinity level component

Groups        Lab     Classroom   Take-home   Theorist   Internet   Newspaper
Fitted %      0.75    5.07        3.82        18.90      14.19      9.23
Observed %    1.16    6.52        6.72        25.34      22.00      9.28

in Table 3. We observe that the proportion of censoring (i.e. the proportion

of choices at the limit of the interval of choices) varies across groups, with

the proportions being largest and smallest for the Theorist and Lab groups,

respectively.
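The fitted percentages in Table 3 appear to be consistent with the estimates in Table 2: multiplying each group's mixing proportion for the censored component f6 by the probability that a normal variable with mean 0.59 and SD 1.91 falls at or below the censoring point 1 gives values close to the "Fitted %" row. The sketch below is a check under that assumption, not a description of how the authors computed the table; the variable names are illustrative.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimates for the left-censored component f6, taken from Table 2
mu6, sigma6, limit = 0.59, 1.91, 1.0
pi6 = {"Lab": 1.27, "Classroom": 8.67, "Take-home": 6.53,
       "Theorist": 32.31, "Internet": 24.26, "Newspaper": 15.78}

# Probability that the latent normal falls at or below the censoring point
censor_prob = norm_cdf((limit - mu6) / sigma6)

# Implied share of censored choices in each group, in %
fitted = {group: round(share * censor_prob, 2) for group, share in pi6.items()}
for group, value in fitted.items():
    print(group, value)
```

Small differences of about 0.01 relative to Table 3 are to be expected, since the inputs are themselves rounded to two decimals.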

The components of the mixture distribution are depicted in Figure 5,

where we show the probability density function of the various composing

distributions, with the estimated mean values of the normal distributions

displayed in the x-axis of the graph.

Table 2 also shows the estimates of the mixing proportions for each group.

According to our interpretation, the first two columns of results in Table 2,

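[Figure 5 plot: probability density functions of the seven components, labeled Zero-Unif, Zero-Normal, First, Second, Third, Inf+ and Inf; x-axis ticks at 1, 7, 15, 23, 33, 36 and 60; y-axis from 0 to 1.2.]

Figure 5: Components of the mixture distribution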


taken together, would indicate the frequency of random, haphazard and
unexplained choices. This proportion of Level-0 players ranges from about 25%
among theorists to close to 60% among undergraduate students.
The number of Level-1 subjects tends to stay just below 10% in all groups,
while Level-2 and Level-3 vary from 15% to 20% in most groups. Finally,
Level-∞ participants appear in the largest proportion among theorists, as
much as 51%; they make up a fairly important share of newspaper readers,
up to 30%, and only a small proportion of students in the lab, about 7%.
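The shares quoted above can be recovered by adding the relevant columns of Table 2: f0 and f1 for Level-0, and f5 and f6 for Level-∞. A minimal sketch (the names `table2` and `level_shares` are illustrative):

```python
# Mixing proportions (in %) from Table 2, columns f0..f6
table2 = {
    "Lab":       [25.88, 31.17, 6.93, 21.70, 7.30, 5.75, 1.27],
    "Classroom": [17.56, 18.11, 14.79, 18.57, 12.47, 9.83, 8.67],
    "Take-home": [15.52, 18.11, 7.88, 20.39, 23.45, 8.13, 6.53],
    "Theorist":  [12.93, 11.66, 3.20, 9.49, 10.89, 19.51, 32.31],
    "Internet":  [13.74, 16.36, 9.25, 15.01, 13.77, 7.60, 24.26],
    "Newspaper": [15.31, 15.96, 8.32, 15.35, 14.71, 14.57, 15.78],
}

def level_shares(row):
    """Aggregate the seven components into the five reasoning levels."""
    f0, f1, f2, f3, f4, f5, f6 = row
    return {"L-0": round(f0 + f1, 2), "L-1": f2, "L-2": f3,
            "L-3": f4, "L-inf": round(f5 + f6, 2)}

shares = {group: level_shares(row) for group, row in table2.items()}
for group, levels in shares.items():
    print(group, levels)
```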

Combining the mixing proportions for each group, as they appear in Table

2, with the components of the mixture common to all the groups, as depicted

in Figure 5, we obtain the fitted mixture distributions that are specific to

each group, as shown in Figure 6. These fitted distributions correspond to

the group-specific empirical distributions of Figure 2 and help to perceive

the variation across groups of the proportions of individuals at the different

levels of reasoning. It is remarkable that a unique set of components of the

mixture allows us to fit the data from different groups by simply changing

the mixing proportions across these groups.

An interesting feature is the increasing variance from Level-1 to Level-∞.
People who reach Level-1 choose very tightly around 33. Those reaching

Level-2 choose around 22, but not so tightly. The variance of the choice at

Level-3 is even larger and it is largest in the choices of Level-∞ individuals,
when we take the compound variance of the two distributions f5 and f6 of

Table 2.⁹

⁹This is in contrast with Ho et al. (1998) and Stahl (1996), where variances were
constrained to follow a decreasing pattern.

[Figure 6 plot: six panels (Lab, Classroom, Take-home, Theorists, Internet, Newspaper); x-axis from 0 to 60; y-axis from 0 to 0.2.]

Figure 6: Fitted mixture distribution for each group


A plausible interpretation of this result is that as subjects take further

steps of reasoning they become more and more aware of the complexity of the

game, and assume that the rest of participants may make more and more

dispersed choices. In any case, subjects at Level-k must believe that the

dispersion of others’ choices is centered around the choice of Level-(k − 1)
players. Otherwise we would not see the sharp peaks we observe in the

empirical data. Curiously, the increasing dispersion indicates that subjects

at Level-k mistakenly believe that the dispersion of choices around the
Level-(k − 1) choice is larger than it in fact is.
To conclude, it appears that the estimated location of the composing

distributions of the mixture gives empirical support to the IBRd hypothesis.

The analysis also shows that the proportions of subjects with different levels

of reasoning vary across groups.

5 Comparison with the literature

The literature on the estimation of data generated by BC experiments is

quite diverse in its use of alternative statistical procedures. In her seminal

paper on the BC, Nagel (1995) separates agents into bins centered around the
theoretical values of the iterated best replies, $50p^k$, where k represents the
iteration level and p the predetermined number that, when multiplied by the
mean of all chosen numbers, yields the winning number. Stahl (1996) uses a
boundedly rational learning rule assuming that, in the first period, the choice
in each level k is distributed according to a truncated normal distribution
with means specified (not estimated) at $50p^k$, and all variances following a



decreasing rule. Ho, Weigelt and Camerer (1998) specify a model in which

the mean and variance of Level-k choices are functions of the mean and

variance of choices at the previous level, so that the only parameters of the

model are the mean and variance of Level-0 choices. This highly restricted

model is then estimated by maximum likelihood.

These papers share many common features. The empirical models have

as fundamental elements the decision rules used by subjects, the calculation

errors or noise, and the beliefs about other players’ strategies or types. Al-

though some models take explicit account of errors in the individuals’ choices

(see El-Gamal and Grether (1995), or Haruvy, Stahl and Wilson (2000)),

with BC data, the hypothesis of best response to type Level-(k − 1) players
on the part of Level-k subjects provides a hierarchical model that becomes

the basic tool to describe the set of decision rules.

Recently Camerer, Ho and Chong (2003) proposed a non-degenerate
distribution of beliefs about other players’ choices. They assume that subjects

believe that no other player uses as many levels of reasoning as themselves and

assume also that players guess the relative proportion of other players at the

different (lower) levels of reasoning. Since the number of levels of reasoning

is an integer, Camerer et al. (2003) argue that the Poisson distribution is a
reasonable parametric distribution of other players’ reasoning levels. While

this model fits well samples of data from different games, it cannot account

for the multi-peaked distribution of choices typical of BC games.

In our empirical model we also assume that individuals share a common

pattern of reasoning independently of the particular set-up of the BC ex-

periment. Our choice of distribution functions is guided by the nature of



the data: truncated distributions between 0 and 100, since the choice set is

constrained by these numbers, and a censored distribution to deal with the

fact that there is non-null mass probability at values 0 or 1. The uniform

distribution seems appropriate to take care of random choices.

All parameters of these distributions are estimated, and the number of

distributions is not determined in advance. This approach is in contrast
with the previous analyses just mentioned, where means and variances of a

predetermined number of distributions are constrained to follow a particular

sequence.

6 Conclusions

This paper provides a mixture distribution analysis of data obtained from

experiments on the BC game, with diverse samples of subjects. The analysis

is based on a model of censored and truncated normal distributions plus a

uniform distribution, but does not impose any further structure on the model

specification. The means and variances of the composing distributions of the

mixture are left free, to be estimated, and so are the proportions of subjects
at different levels of reasoning. Even the number of distributions involved is
not predetermined. This is in contrast with previous statistical analyses of
BC data.

A feature of our analysis is the assumption that individuals playing the

BC game share a common pattern of reasoning, independently of the specific

set-up of the experiment. However, we allow for variations across groups of

experiments in the proportion of players using different depths of reasoning.

In statistical terms this implies a unique specific composition of mixtures



across groups of experiments, with the mixing proportions of the components

varying across groups. It is remarkable how much variation can be accounted

for by a change in the mixing proportions. This set-up also permits the fitting

of a complex mixture model to groups with relatively small sample sizes.

We apply this mixture distribution model to data gathered from experiments
with newspaper readers, involving thousands of subjects in different

countries, as well as from experiments run in labs with subject pools of un-

dergraduate students, graduate students and economists. We estimate the

mean and variance of each composing distribution, as well as the mixing pro-

portions for each group of experiments. In view of the estimated locations

of the composing distributions, our results support the hypothesis that in-

dividuals reason according to Iterated Best Reply (IBRd). Our results also

show substantial variation across groups of the proportion of subjects using

different levels of reasoning.

References

Arcidiacono, P. and Jones, J. B. (2003), ’Finite Mixture Distributions,
Sequential Likelihood and the EM Algorithm’, Econometrica, 71, 3, 933-946.

Bosch-Domènech, A., Montalvo, J. G., Nagel, R. and Satorra, A. (2002),
’One, Two, Three, Infinity, ...: Newspaper and Lab Beauty-Contest
Experiments’, American Economic Review, 92, 5, 1687-1701.

Bozdogan, H. (1987), ’Model Selection and Akaike’s Information Criterion
(AIC): The General Theory and its Analytical Extensions’,
Psychometrika, 52, 345-370.

Camerer, C., Ho, T. and Chong, J. (2003), ’A Cognitive Hierarchy Theory of
One-Shot Games and Experimental Analysis’, Quarterly Journal of
Economics, forthcoming.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977), ’Maximum Likelihood
from Incomplete Data via the EM Algorithm (with Discussion)’, Journal
of the Royal Statistical Society B, 39, 1-38.

Ho, T., Camerer, C. and Weigelt, K. (1998), ’Iterated Dominance and
Iterated Best-Response in Experimental "P-Beauty-contests"’, American
Economic Review, 88, 4, 947-969.

McLachlan, G. and Peel, D. (2000), Finite Mixture Models, John Wiley &
Sons, New York.

Nagel, R. (1995), ’Unraveling in Guessing Games: An Experimental Study’,
American Economic Review, 85, 5, 1313-1326.

Stahl, D. O. (1996), ’Rule Learning in a Guessing Game’, Games and
Economic Behavior, 16, 2, 303-330.

Titterington, D., Smith, A. and Makov, U. (1992), Statistical Analysis of
Finite Mixture Distributions, Wiley, New York.
