Evolving to Generalize: Trading Precision for Speed

Cailin O'Connor

ABSTRACT

Biologists and philosophers of biology have argued that learning rules that do not lead organisms to play evolutionarily stable strategies (ESSes) in games will not be stable and thus will not be evolutionarily successful (Harley [1981]; Maynard-Smith [1982]). This claim, however, stands at odds with the fact that learning generalization—a behaviour that cannot lead to ESSes when modelled in games—is observed throughout the animal kingdom (Mednick and Freedman [1960]). In this article, I use learning generalization to illustrate how previous analyses of the evolution of learning have gone wrong. It has been widely argued that the function of learning generalization is to allow for swift learning about novel stimuli. I show that in evolutionary game theoretic models, learning generalization—despite leading to sub-optimal behaviour—can indeed speed learning. I further observe that previous analyses of the evolution of learning ignored the short-term success of learning rules. If one drops this assumption, I argue, it can be shown that learning generalization will be expected to evolve in these models. I also use this analysis to show how ESS methodology can be misleading, and to reject previous justifications of ESS play derived from analyses of learning.

1 Introduction
2 The Evolution of Learning
3 The Approximation Game
4 Learning Rules
  4.1 Herrnstein reinforcement learning and generalized reinforcement learning
  4.2 Long-term success
5 Short-Term Success and Simulation
6 Evolving to Generalize
7 Conclusion

1 Introduction

Stimulus generalization, or learning generalization, is a learning behaviour wherein an actor conditioned to one stimulus responds in the same way to perceptually similar stimuli.[1] This type of learning is extremely well documented.[2] It occurs across a wide variety of test subjects—mammals, birds, reptiles, amphibians, insects—across contexts, and across sensory modalities (Mednick and Freedman [1960]; Ghirlanda and Enquist [2003]). In evolutionary game theoretic models, however, learning generalization does not lead to the play of what are called 'evolutionarily stable strategies' (ESSes). One point that theorists have generally agreed on is that learning rules that do not lead organisms to play ESSes in games will not be stable and thus not evolutionarily successful (Harley [1981]; Maynard-Smith [1982]). Why this incongruity?

[1] This behaviour was documented in the famous 'Little Albert' experiment. Watson and Rayner ([1920]) conditioned a nine-month-old infant to fear a white rat by frightening the child with loud noises whenever he touched the animal. The child subsequently showed similar fear reactions to a number of fuzzy stimuli, including a rabbit and a fur coat.

[2] Thankfully not with regard to infant fear response.
In this article, I will use the case of learning generalization to investigate how previous analyses of the evolution of learning have gone wrong. I point out that such analyses have largely ignored the short-term behaviour of learning rules. Learning generalization is standardly thought to be adaptive because it allows actors to quickly learn to respond to novel stimuli (Ghirlanda and Enquist [2003]). In other words, it is especially useful in the short term. I present evolutionary game theoretic models of learning generalization and show that, indeed, generalizing can be beneficial in these models in that it helps speed learning. Furthermore, if one considers evolutionary models of learning where the short-term behaviour of learning rules is important, it becomes clear that generalization can evolve. This supports the argument that previous analyses ignoring short-term learning were misguided. These results further inform game theory. Previous theorists used analyses of learning to argue that ESS behaviour should be seen in the real world. The work presented here indicates that such claims are overly hasty. Furthermore, this analysis lends credence to the idea that ESS methodology is often misleading.

The article will proceed as follows: In Section 2, I will discuss previous work on the evolution of learning. In Section 3, I will outline the 'approximation game', which appropriately models the class of scenario in which generalization is seen. In Section 4, I describe several learning rules where actors generalize to varying degrees. I go on to show that in the long run, rules that do not generalize outperform those that do in the approximation game. In Section 5, I present simulation results showing that despite the long-term success of non-generalized learning, under certain parameter settings higher levels of generalization can do significantly better in the short term. In Section 6, I show that in evolutionary game theoretic models where short-term learning is important, learning generalization can evolve. I conclude by discussing how this analysis informs game theory and evolutionary game theory.

2 The Evolution of Learning

Harley ([1981]) and Maynard-Smith ([1982]) use evolutionary game theoretic models to show that only certain sorts of learning rules should be expected to evolve. Without going into too much detail, these authors argue that only learning rules that lead to play of ESSes in games should be expected to persist in an evolutionary setting. An ESS is a strategy in a game that is robust against invasion by other strategies because it garners high payoffs for those using it.[3] The arguments Maynard-Smith and Harley give are intuitively straightforward. Suppose that some learning rule does not lead to play of ESSes in games. A rule that does lead to play of an ESS will provide a higher payoff for those employing it.
Then a learning rule leading to ESSes will be more evolutionarily successful than one that does not, and will be able to invade a population of those using a non-ESS learning rule. This argument leads to a puzzle, however. Generalized learning cannot lead to play of ESSes in games (as I will show in Section 4). How does the observed ubiquity of learning generalization in the natural world square with these results?

[3] To be specific, an evolutionarily stable strategy $x_i$ is one such that, where $u(x_i, x_j)$ is the payoff of strategy $x_i$ played against $x_j$, for all $x_j \neq x_i$ either (1) $u(x_i, x_i) > u(x_j, x_i)$, or (2) $u(x_i, x_i) = u(x_j, x_i)$ and $u(x_i, x_j) > u(x_j, x_j)$.

The work of Maynard-Smith and Harley, of course, is not the end of the discussion of the evolution of learning. It has been pointed out by Smead ([2012]) that learning rules that take populations to ESSes have no advantage over static behavioural rules where the actor simply adopts ESS play rather than bothering to learn it.[4] Furthermore, most models of the evolution of learning assume that learning will bear a greater cost than non-learning strategies (for cognitive architecture, time required to learn, and so on). This means that non-learning strategies that adopt ESS play will actually receive higher payoffs than rules that learn such play and so should be able to invade these learning rules. This point seems to create a worry about learning generally. If learning rules that do not lead to ESSes are unstable, and static behavioural rules can invade learning rules that do lead to ESSes, there are no stable learning rules at all (never mind ones that generalize).

[4] Maynard-Smith ([1982]) was aware of this. Smead and Zollman ([unpublished]) find something similar. Smead ([2015]) also argues that learning rules that lead to equilibria in many cases should not be expected to evolve.

The usual response by biologists and philosophers of biology to worries of this sort is to argue that learning rules are primarily useful in situations where the environment exhibits some level of variability.[5] In such environments, the argument goes, non-learning strategies get poor payoffs because the actors cannot respond to changing payoff structures by changing action. Actors that play an ESS in one situation, but cannot deal with changes to the environment, now do poorly against learners that reach this same ESS in the original situation and can re-adapt when necessary.

Something is amiss here, though. The arguments forwarded by Maynard-Smith and Harley explicitly depend on the following assumption: when modelling the evolution of learning one can ignore what happens in the short term. In other words, when associating fitnesses with learning rules, these authors do not consider payoff while the actors are learning. Instead, they look only at the payoffs of the long-term, stable strategies developed by learners. To date, most game theoretic work on the evolution of learning has shared this assumption.[6] But if learning is most effective in a variable environment, to the extent that it should not be expected to evolve otherwise, this assumption is suspect. In a variable environment, an actor will be changing strategies and so may spend a significant amount of time playing strategies that are not stable, long-term outcomes of the learning process.
If so, short-term behaviour should be important to the evolution of learning.[7] In particular, if payoff in the short term matters, there should be selection pressure for learning rules that work quickly.

[5] See, for example, (Plotkin and Odling-Smee [1979]; Johnston [1982]; Maynard-Smith [1982]; Stephens [1991]; Godfrey-Smith [2002]; Dunlap and Stephens [2009]; Shettleworth [2009]).

[6] There are some exceptions. Zollman and Smead ([2010]), for example, use interim strategies developed by learning rules to determine the fitnesses of actors in an evolutionary model.

[7] Smead ([2012]) points out something similar. Empirical observations about, for example, death rates in young birds also confirm the importance of learning speed in animals (Shettleworth [2009]).

Biologists and psychologists have argued that the function of learning generalization is to allow organisms to quickly learn to respond to novel scenarios (Ghirlanda and Enquist [2003]). Furthermore, as mentioned, it should not evolve according to Maynard-Smith and Harley. As such, this learning behaviour is an excellent case to explore whether the intuitive argument I just gave—that short-term learning matters in an evolutionary context—is correct. In the rest of the article, I will present evolutionary game theoretic models of learning generalization. As I will show, when the short-term behaviour of learners is incorporated into evolutionary models, generalization will evolve for just the reasons that biologists and psychologists suggest. If short-term behaviour is ignored, on the other hand, generalization will not evolve. These results indicate that the intuitive argument is right, and that ignoring short-term behaviour of learning rules can lead evolutionary analyses significantly astray.

3 The Approximation Game

Learning generalization occurs when an organism applies behaviour that was successful in one scenario to a perceptually similar scenario. What this means is that an appropriate model to explore the evolution of this phenomenon will need to include similar scenarios for the actor to potentially generalize over. In order to do this, I introduce the approximation game.[8]

The approximation game involves one actor and occurs in two stages. In the first stage, a state of the world is chosen probabilistically by nature or some exogenous force. In the second stage, the actor observes this state of nature and chooses an act. The state/act combination then determines what sort of payoff the actor receives. In order to model the type of scenario in which generalization evolves, the possible states of the world are assumed to bear similarity relationships to one another. This is done by treating these states as existing in a metric space where distance represents similarity. For example, an approximation game might have three states (1, 2, and 3) existing on a line. If state 1 is closer to state 2 than to state 3, it is assumed that state 1 is more similar to state 2.[9]

For each state of the world in the approximation game, it is assumed that there is some ideal act that, should the actor choose it, will give a perfect payoff.[10] It is also assumed that acts will receive similar payoffs in similar states. In the previous example, in state 1 the actor would achieve a perfect payoff by choosing act 1. But she would also obtain a good payoff for choosing act 2.
Her payoff for choosing act 3 would be less good. One simple way to model this is to determine payoff using a function that takes as input the distance between the state and the act.[11] For the purposes of this article, unless otherwise specified it will be assumed that the actor's payoffs are strictly decreasing with distance between state and act.

Figure 1 shows the simplest approximation game of interest—the one described above. The central node of the figure represents the starting point of the game, where nature chooses a state (S1, S2, or S3). The probabilities that each state is chosen by nature are fixed at p, q, and 1 - p - q. The three decision nodes, labelled 'A' for actor, represent the possible choices of act in each state (A1, A2, or A3). Payoffs for each state/act combination are shown at the final nodes. It is assumed that 0 < ε < δ < 1 (payoff decreases strictly in distance between state and act, but is always positive). It is also assumed that p and q are strictly positive and p + q < 1 (that every state is played with positive probability).

Figure 1. A 3-state/3-act approximation game with payoffs 1, δ, and ε for distances of 0, 1, and 2 between state and act. The game begins with the central node labelled 'N' for nature and continues to the three decision nodes labelled 'A' for actor.

Figure 2 shows some possible state spaces for approximation games. Diagram (a) represents the state space of a game like the one just outlined, that is, modelled on a line, but with four states. Diagram (b) shows a game with a two-dimensional state space.[12] Approximation games with state spaces of any dimensionality are possible, though this article will only consider the simplest ones—those where states are modelled on a line.

Figure 2. Two examples of state spaces for an approximation game. Diagram (a) shows a game with four states modelled on a line. Diagram (b) shows a game with eight states modelled in a plane.

[8] This model should more properly be called the 'approximation problem' because it is a one-player decision problem rather than a multi-player game. Decision problems, however, are formally identical to one-player games. For this reason, the relevant results on the evolution of games directly bear on decision problems, and results from the problems investigated here can be used to inform evolutionary game theory. For simplicity's sake, then, I use the language of game theory, and not decision theory, to describe the model used.

[9] Note that this is similar to the sim-max game, introduced by Jäger ([2007]) to model signalling in situations where states of the world bear similarity relations to one another.

[10] For simplicity's sake, acts will always be labelled by the state they are most appropriate for, that is, act 1 will be the ideal act for state 1 and so forth.

[11] This is a useful way to understand payoff in these games. It is more precise to say that a payoff is defined for each state–act pair, and this payoff is chosen using such a function.

[12] Note that games with state spaces of higher dimensionality can be used to model cases where an actor is responding to states with multiple properties varying along different dimensions.
For the purposes of this article, these spaces are best understood as representing perceptual similarity spaces.[13] In other words, the states of the world in the game correspond to perceptual states. This is a useful interpretation of the model as learning generalization happens over perceptually similar states. It also avoids sticky issues around how or whether external states are similar to each other.

Most of the approximation games considered in this article will have a few properties that bear mentioning. First, they will have considerably larger state spaces than the game described above. The reason for this is that in real-world learning scenarios, the number of possible states of the world is often extremely large. This is certainly true under the interpretation of the game here—that the actor is responding to perceptual states. Consider, for example, the number of discriminable colours picked out by the human visual system, or the number of distinguishable smells. Furthermore, as I shall show later in the article, considering games with large state spaces is relevant for understanding why generalized learning might evolve. Second, in the games considered, payoff loss over distance will usually be modelled with a Gaussian function. This function is used because it is always positive and strictly decreasing in distance. These attributes make it particularly tractable from a modelling perspective. While this choice may seem arbitrary, the analytic results presented are robust under choice of function as long as it is strictly decreasing.[14] I will call the Gaussian just described the 'payoff Gaussian' as it determines the degree to which an approximate match of state and act will lead to payoff for the actor.

As noted, for every state of an approximation game, there is one ideal act. A strategy for a game defines an act in every possible state.[15] What this means is that there is a single, optimal strategy for every approximation game in which the actor always picks the correct act for the state. The existence of a single optimal strategy is significant from an evolutionary standpoint. Under the replicator dynamics, the most common model of evolutionary change in evolutionary game theory, a population playing the approximation game will evolve to take this strategy in every case. For this reason, the approximation game would not usually be of much interest to evolutionary game theorists—it is immediately obvious what behaviour will be adopted by a population evolving to play it. However, as I will argue in the next section, an organism learning to respond to this game, and employing generalized learning, will not develop the optimal strategy.

[13] See (Gärdenfors [2000]) for more on such spaces. See (Krantz et al. [1971]) for how such spaces can be built using experimental data.

[14] O'Connor ([2014a]) also found that results in simulations of related signalling games were robust under choice of function for payoff loss modelled as linear, quadratic, or decreasing in steps.

[15] Again, while the term that technically should be used here is 'choice', because this is a one-player problem, I use 'strategy' to avoid confusion. Once again, nothing hangs on this distinction.
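To make the setup concrete, the following is a minimal Python sketch of a linear approximation game of the kind just described. It assumes, for simplicity, a uniform distribution over states and uses illustrative parameter values (a payoff Gaussian of height 2 and width 10); the function and variable names are illustrative rather than taken from the article.

```python
import math
import random

def gaussian(distance, height, width):
    """A Gaussian in distance: positive everywhere and strictly decreasing in |distance|."""
    return height * math.exp(-distance ** 2 / (2 * width ** 2))

def payoff(state, act, height=2.0, width=10.0):
    """Payoff for an act in a state of a linear approximation game.

    Acts are labelled by the state they are ideal for, so the payoff depends
    only on the distance |state - act| (the 'payoff Gaussian').
    """
    return gaussian(abs(state - act), height, width)

def play_round(n_states, strategy):
    """One round: nature picks a state (uniformly, for simplicity), the actor picks an act."""
    state = random.randrange(n_states)
    act = strategy(state)
    return state, act, payoff(state, act)

if __name__ == "__main__":
    n_states = 100
    optimal = lambda s: s  # the unique optimal strategy: the ideal act for every state
    average = sum(play_round(n_states, optimal)[2] for _ in range(1000)) / 1000
    print(average)  # approximately 2.0, the height of the payoff Gaussian
```

Because acts are labelled by the states they are ideal for, the unique optimal strategy is simply the identity map over states, which is why a population evolving directly on this game is of little interest.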
4 Learning Rules

4.1 Herrnstein reinforcement learning and generalized reinforcement learning

In evolutionary game theory, learning dynamics, unlike evolutionary dynamics, are taken to model the emergence of learned individual behaviours over the course of an organism's lifetime, rather than the emergence of evolved population behaviours over the course of evolutionary time. Herrnstein reinforcement learning, first proposed by Roth and Erev ([1995]), is so named in reference to R. J. Herrnstein's psychological work on learning, which motivates the model (Herrnstein [1970]).[16] This learning rule has been widely used in evolutionary game theory because (1) it is psychologically natural, that is, based on observed learning behaviour, and (2) it makes minimal assumptions about the cognitive abilities of the actors. This means that behaviours that emerge under this rule can be assumed to be available to cognitively simple animals. In this case, because generalized learning is seen in a wide variety of animals, including those with minimal cognitive abilities (Mednick and Freedman [1960]), Herrnstein learning is an appropriate starting place to model it.

[16] This learning rule is also sometimes called 'Roth–Erev' or 'Vanilla' reinforcement learning.

The basic assumption that underlies reinforcement learning rules is that actors will be more likely to repeat successful behaviour. In other words, they reinforce this behaviour. In a simulation of these rules, actors engage in a game many times, at each step reinforcing successful behaviour and thus improving their strategies. Herrnstein learning can be described using the following analogy: In the context of the approximation game, imagine that for each state of the world, the actor has an urn into which is placed one coloured ball for each possible act available. In the first round of learning, nature selects a state of the world and the actor draws a ball from the urn for that state. The colour of the ball determines which act the actor will take. If the act is successful, the actor returns the drawn ball to her urn and then reinforces her tendency to take that act in that state by adding a ball (or two, or half a ball, and so on) of the same colour to that urn. The reinforcement is proportional to the success of the act, that is, the higher the success the greater the reinforcement. For our purposes, the amount of reinforcement will always be equal to the payoff achieved by the actor in each step of the simulation.

At the beginning of a simulation using Herrnstein learning, an actor uses all her acts with equal probability, as she has one of each type of ball in each urn. As play progresses and successful acts are reinforced, the actor becomes increasingly likely to choose these acts. In the limit, the actor's strategy may, under the right circumstances, converge to a successful one. In other words, the actor will use this strategy with probability approaching one.[17]

Generalized reinforcement learning (GRL) builds on the Herrnstein reinforcement learning model.[18] Under GRL rules, successful acts are reinforced, but they are also generalized, that is, reinforced for other, similar states of the world.
In other words, and to continue the urn analogy, when an actor draws a coloured ball from her urn for a state and takes a successful act, she adds balls of the same colour to that urn, but also adds balls of that colour to the urns for similar states. For these rules, the degree to which generalization occurs must be specified. How many other states are reinforced? How much reinforcement occurs in those states? For the purposes of this article, generalization will be determined using a Gaussian function. To be clear, a model of an approximation game evolved using GRL employs two Gaussian functions. The payoff Gaussian, introduced above, determines the level of payoff based on how accurate the act chosen is for the state. The second Gaussian determines to what degree this payoff is generalized, taking as input the distance between the state of the world and the state to be reinforced. I will call this second Gaussian the 'reinforcement Gaussian'.[19] Figure 3 represents the way these two functions determine reinforcement in an approximation game evolved using a GRL rule.

Figure 3. A representation of how the payoff and reinforcement Gaussians determine reinforcement in an approximation game evolved using a GRL rule.

[17] For more on this and other learning dynamics see (Huttegger and Zollman [2011]). For extensive work on Herrnstein reinforcement learning and variations of it in signalling games (which are in some ways similar to the approximation game), see recent work by Barrett ([2007], [2009]) and Barrett and Zollman ([2008]).

[18] This learning rule was first outlined by O'Connor ([2014a]). Roth and Erev ([1995]) look at a learning rule that incorporates a slight amount of generalization in a similar way to GRL. They interpret this aspect of the learning rule as persistent error.

[19] Ghirlanda and Enquist ([2003]) argue that generalization is best modelled in many cases by a Gaussian function, suggesting that the choice of a Gaussian as the reinforcement function here is a natural one. Furthermore, Shepard ([1987]) argues that the specifics of how an actor learns to generalize may not be particularly important in determining subsequent behaviour.

A model of an approximation game evolved using these learning rules will have five relevant parameters. The first is the size of the state space of the game. The second and third are the height and standard deviation of the payoff Gaussian. These control the level of payoff for perfect coordination in the approximation game (the height) and the degree to which an actor receives payoff for imperfect action in the game (the standard deviation). The fourth parameter is the standard deviation of the reinforcement Gaussian.[20] Variations of this parameter correspond to GRL rules with different degrees of generalization. In models where Herrnstein reinforcement learning is used, this parameter will not apply. It can be noted, though, that Herrnstein learning is a limiting case of GRL as the width of the reinforcement Gaussian approaches zero. The fifth relevant parameter will be the length of trial for simulations of these models. This parameter will control the number of times the actor plays the approximation game and updates her strategies.

[20] The height of the reinforcement Gaussian is determined by the level of payoff.
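The urn scheme and the two Gaussians can be sketched as follows. This is a minimal illustration of the rules as described above, under a few simplifying assumptions: states are drawn uniformly, every urn starts with one ball per act, and passing reinforce_width=None recovers Herrnstein learning as the limiting case of zero generalization. The class and function names are hypothetical, not the article's.

```python
import math
import random

def gaussian(distance, height, width):
    """Gaussian used for both the payoff and the generalized reinforcement."""
    return height * math.exp(-distance ** 2 / (2 * width ** 2))

class GeneralizedReinforcementLearner:
    """Urn-style learner over a linear state space.

    With reinforce_width=None only the visited state's urn is updated, which
    is Herrnstein (Roth-Erev) reinforcement learning. Otherwise a success is
    also reinforced for other states, scaled by a reinforcement Gaussian whose
    height equals the payoff earned.
    """

    def __init__(self, n_states, n_acts, reinforce_width=None):
        self.n_states, self.n_acts = n_states, n_acts
        self.reinforce_width = reinforce_width
        self.weights = [[1.0] * n_acts for _ in range(n_states)]  # one ball per act per urn

    def choose(self, state):
        """Draw an act with probability proportional to its weight in the state's urn."""
        return random.choices(range(self.n_acts), weights=self.weights[state])[0]

    def update(self, state, act, reward):
        """Reinforce the visited state and, for GRL, similar states as well."""
        if self.reinforce_width is None:  # Herrnstein learning
            self.weights[state][act] += reward
        else:  # generalized reinforcement learning
            for s in range(self.n_states):
                self.weights[s][act] += gaussian(abs(s - state), reward, self.reinforce_width)

def run_trial(learner, payoff_fn, n_rounds):
    """Play the approximation game repeatedly, reinforcing after each round."""
    for _ in range(n_rounds):
        state = random.randrange(learner.n_states)
        act = learner.choose(state)
        learner.update(state, act, payoff_fn(state, act))
    return learner

if __name__ == "__main__":
    pay = lambda s, a: gaussian(abs(s - a), 2.0, 10.0)  # payoff Gaussian: height 2, width 10
    learner = run_trial(GeneralizedReinforcementLearner(100, 100, reinforce_width=10), pay, 1000)
    print(max(range(100), key=lambda a: learner.weights[50][a]))  # most-reinforced act for state 50
```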
4.2 Long-term success

One way to explore the evolution of generalized learning is to compare learning rules with different levels of generalization, like GRL and Herrnstein reinforcement learning, to see if high levels of generalization can outperform lower levels in these models. One method for doing this is to consider convergence outcomes of the models just described. When this is done, however, it becomes clear that in the long term, Herrnstein reinforcement learning can always outperform GRL in the approximation game.

Laslier et al. ([2001]) show that a single actor employing Herrnstein learning in a stationary environment, that is, where payoffs remain constant, in the long run will always learn to play the act that receives the highest expected payoff.[21] This result can be applied to each state in the approximation game. To do so requires that each state be a stationary environment, which is the case given that the payoffs in the approximation game do not change. It also requires that each state be selected infinitely often as the length of learning goes to infinity, which is also the case as each state in the approximation game has a strictly positive probability. Thus these results indicate that in the long run, for each state in the approximation game, the act of an agent employing Herrnstein reinforcement learning will converge to the optimal one. For the entire game, then, the strategy of the actor will converge to the optimal strategy. In the long run, the actor will take the perfect act in every state in the approximation game if using Herrnstein learning. This result holds for an approximation game of any finite size.

[21] In other words, as the learning time goes to infinity, the probability with which the actor chooses non-optimal acts goes to 0.

What happens to the strategy of an actor using a GRL rule in the approximation game in the long run? Unlike Herrnstein learning, GRL rules will not converge to the optimal strategy and, in fact, the level of generalization will determine a bound of accuracy that a player will not be able to surpass. This bound of accuracy will in turn determine a bound on the payoff success an actor can achieve. The intuitive reason for this is that if an actor were able to converge to the perfect act in one state, she would simultaneously prevent convergence in neighbouring states by generalizing the same act to them.

One can show this by solving for the consistent, limiting probabilities of acts for a model of the approximation game evolved using a GRL rule. This is done by finding the distribution of reinforcements in a game where the probability of an act being selected in one round of simulation is equal to the probability of it being selected in the next round. Consider a toy model of the approximation game with two states and two acts. Suppose that in each state the payoff for the perfect act is 2 and for the other act is 1. Assume that states of the world are equiprobable.[22] This game is pictured in Figure 4, which should be read like Figure 1. Also consider a simple form of GRL where successful acts are reinforced in the state of the world by the amount of the payoff and in the other state by that amount multiplied by α, where 0 ≤ α ≤ 1.
In this simple model, α determines the level of generalization. A high α means that success will lead to strong generalization in the other state of the world; a low α will mean that generalization is weak. If α is equal to 0.1, the consistent, limiting probabilities of this game are such that the actor selects the more successful act in each state with probability 5/6 and the other act with probability 1/6. It is possible (though increasingly difficult) to calculate such limiting probabilities for larger games and more complex generalization rules. One can further explore this phenomenon through simulation.

Figure 4. A 2-state/2-act approximation game with payoffs 2 and 1 for distances of 0 and 1 between state and act. The game begins with the central node labelled 'N' for nature and continues to the two decision nodes labelled 'A' for actor.

[22] This degenerate approximation game is not generally an interesting one as it is formally the same as a game with no similarity structure over the payoffs. It is useful, however, as a simple case to consider GRL.

It is easy to show what happens in this toy model at the two bounds of α. If one sets α = 0, the learning rule is the same as Herrnstein learning and so converges to perfect behaviour. If one sets α = 1, the actor fully generalizes. In other words, if she reinforces act 1 in state 1 by 0.43, she will also reinforce act 1 in state 2 by 0.43, and so on. This complete generalization of success means that reinforcement levels for the actor will always be identical in the two states of the world. Because the actor will not be able to learn to condition her acts on which state has been selected, every attainable strategy (those where the probability for each act is the same in both states) will get an expected payoff of 1.5, the same as choosing by chance.

For intermediate levels of α, simulations of the toy model show that the actor eventually reaches a level of accuracy, and thus success, that is bounded by the level of generalization. The lower the generalization, the greater the success. In Figure 5, success rates are shown for a simulation of this game for α ranging from 0 to 0.3 and α equal to 1. In each case, success is calculated by dividing the expected payoff for the actor given her learned strategy by the perfect possible expected payoff (which, in this case, is two):

\[ \text{Success} = \frac{\text{expected payoff given learned strategy}}{\text{perfect possible expected payoff}}. \]

Figure 5. Success levels for a 2-state/2-act approximation game with various levels of generalization (α). The y-axis tracks success and the x-axis represents the length of the trial, where each value x corresponds to 10^x runs.

Each line represents the success rate of a simulation over time for a different level of generalization. Darker lines represent lower levels of generalization. Rates were averaged over fifty runs of simulation. As should be clear from Figure 5, for each level of generalization, the success of the simulation reaches some upper bound and stays there. Note that time is presented logarithmically.
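The toy model is easy to simulate directly. The sketch below follows the description above, with payoffs of 2 and 1, equiprobable states, and a success in one state generalized to the other state scaled by α, and then estimates the success measure just defined for a few values of α. Exact numbers will depend on trial length and random seed; the point is only that higher α caps success at a lower level, down to 0.75 (chance) when α = 1. The names used are illustrative.

```python
import random

PAYOFF = [[2.0, 1.0],   # payoffs in state 0 for acts 0, 1
          [1.0, 2.0]]   # payoffs in state 1 for acts 0, 1

def run_toy_model(alpha, n_rounds, seed=0):
    """Simple GRL on the 2-state/2-act game: a success is reinforced in the
    visited state by the payoff and in the other state by alpha times the
    payoff (alpha = 0 is Herrnstein learning, alpha = 1 full generalization)."""
    rng = random.Random(seed)
    weights = [[1.0, 1.0], [1.0, 1.0]]  # one ball of each colour per urn
    for _ in range(n_rounds):
        state = rng.randrange(2)
        act = rng.choices([0, 1], weights=weights[state])[0]
        reward = PAYOFF[state][act]
        weights[state][act] += reward                # reinforce the visited state
        weights[1 - state][act] += alpha * reward    # generalize to the other state
    return weights

def success(weights):
    """Expected payoff of the learned strategy divided by the optimum (2)."""
    expected = 0.0
    for state in range(2):
        total = sum(weights[state])
        expected += 0.5 * sum(weights[state][act] / total * PAYOFF[state][act]
                              for act in range(2))
    return expected / 2.0

if __name__ == "__main__":
    for alpha in (0.0, 0.1, 0.3, 1.0):
        print(alpha, round(success(run_toy_model(alpha, 100_000)), 3))
```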
The reason for this bound on success has already been laid out. When the actor generalizes, success in one state means that an act will be taken with greater probability in other states where it is less successful. The results from these toy models can be extended to larger approximation games, since in every larger game, reinforcement in neighbouring states will prevent convergence in the same way as it does in a two-state model.[23] Thus these results indicate that in the approximation game, over the long run, low levels of generalization will outperform high levels of generalization from a payoff perspective, and in particular Herrnstein learning will outperform any GRL rule. The single optimal strategy provides the highest possible level of payoff in the game, and so learning to use any other strategy will be strictly worse. Importantly, the optimal strategy in an approximation game is always the unique ESS. Therefore, GRL is unable to learn ESSes in this game, while Herrnstein learning is guaranteed to do so.

[23] To see why this is the case, consider two states of any larger approximation game. Use the reinforcement Gaussian for this larger game to define α as above (the proportion of reinforcement on a neighbouring state). It has been shown that this smaller system cannot reach an optimal strategy and so the larger system it is a part of cannot either.

Furthermore, although this analysis only addresses approximation games, it may be extended to some other games, including ones with multiple players. O'Connor ([2014a]) obtained similar simulation results in sim-max games, which are a variation on the Lewis signalling game where the state space has the same similarity structure as the approximation game. Unlike approximation games and sim-max games, most games do not have several possible states, and so it is not possible to evolve them using GRL. For those games that do, though, if an actor generalizes over states she will only be able to achieve optimal behaviour if the acts so generalized are ideal for all the states they are generalized to. Otherwise, generalized learning will lead to reinforcement of sub-ideal acts and thus to sub-optimal behaviour, preventing play of ESSes.

5 Short-Term Success and Simulation

As I will outline in this section, there is a tension that can arise between the two desiderata a learning rule should meet—working quickly and developing behaviour that obtains the highest possible payoff.[24] While low-generalization learning outperforms high-generalization learning eventually, the very property that prevents high-generalization rules from approaching optimal behaviour is the one that allows them to outperform low-generalization rules in the short term. I will illustrate this argument using simulation results showing that in trials of the approximation game, high levels of generalization can outperform low levels under certain parameter settings.

[24] This has been widely observed in other fields. It has been argued in psychology that 'fast and frugal' decision heuristics, which allow actors to make decent decisions quickly and easily, are adaptive, despite the possibility that they lead to irrational or sub-optimal behaviour (Gigerenzer and Selten [2001]; Gigerenzer and Gaissmaier [2011]). Generalized learning can be thought of as a learning rule that leads to making decent, if sometimes inaccurate, decisions quickly. In machine learning, much work has been done on learning models that generalize from limited input to make predictions in novel scenarios. Similar trade-offs between speed and accuracy are found in these models (Hastie et al. [2005]).
In particular, high generalization does best when states of the world are numerous, when trials are short, and when the payoff Gaussian (modelling how accurate an actor must be to get a good payoff) is wide. This result confirms intuitive arguments about the benefits of learning generalization.

All the results presented in this section were generated using models where payoff and reinforcement were calculated with Gaussian functions. Each trial of a parameter setting was run fifty times and reported results are averages of these. The parameters that varied were the size of the state space, the length of the trial, the standard deviation of the reinforcement Gaussian, and the standard deviation of the payoff Gaussian.[25] The state spaces considered were of size 100, 200, 300, 400, and 500. The lengths of trial were 1000, 10,000, 100,000, and 1 million runs. The reinforcement Gaussian standard deviations were 5, 10, 15, 20, and none (Herrnstein learning). And lastly, the payoff Gaussian standard deviations were 1, 5, 10, 15, and 20.

[25] The height of the payoff Gaussian was always 2.

Figure 6 shows the success rates (calculated as they were in the previous section) for one set of these trials—those where the payoff Gaussian had a standard deviation of 10. The x-axis of the figure represents the length of trial (ranging from 1000 runs to 1 million). The z-axis tracks the size of the state space (from 100 to 500), and the y-axis tracks average success of the trials. Each surface shown represents results for one reinforcement width parameter setting. In other words, each surface corresponds to one learning rule, and these rules vary with respect to generalization. The black surface represents the highest level of generalization (a reinforcement Gaussian with a standard deviation of 20) and successively lighter surfaces represent lower and lower levels of generalization.

Figure 6. Average success levels for various parameter settings for an approximation game with a payoff Gaussian of standard deviation 10 evolved using GRL and Herrnstein reinforcement learning. Results are averaged over fifty runs of each setting.

As is evident in the figure, each level of generalization considered outperforms the others for some region of parameter space. The rule with the highest level of generalization (black) outperforms the others in the area of parameter space where trials are short and the number of states of the world is large. Herrnstein learning (the lightest surface) performs best in the longest trials and when states of the world are fewer. These results should not be surprising. In a short trial with many states of the world, there is not enough time for the actor to learn ideal actions in each state, so a learning rule that allows success to be generalized does better. When an actor has a long time to learn, more precise strategies can be developed using low generalization rules and so these do better. Similar results were obtained for the other payoff Gaussian values, with the slight difference that in games where approximate action was successful (wide payoff Gaussians), higher generalization could perform better. In extreme cases of games with very narrow payoff Gaussians, approximate actions do not receive a good payoff. Generalization thus does not help the actor in this case, because only precise strategies will be successful.
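For concreteness, a single trial of the kind just described can be written as one self-contained function. The sketch below combines the payoff Gaussian, the GRL update, and the success measure; it assumes a uniform distribution over states, the demonstration uses only a small corner of the grid described above, and all names are illustrative rather than the article's. A faithful replication would average every cell of the full grid over fifty runs.

```python
import math
import random

def gaussian(d, height, width):
    return height * math.exp(-d ** 2 / (2 * width ** 2))

def simulate_success(n_states, n_rounds, reinforce_width, payoff_width,
                     payoff_height=2.0, seed=0):
    """Run one trial of GRL (or Herrnstein learning if reinforce_width is None)
    on a linear approximation game and return the success of the learned
    strategy: its expected payoff divided by the best possible expected payoff."""
    rng = random.Random(seed)
    weights = [[1.0] * n_states for _ in range(n_states)]  # acts labelled by states
    for _ in range(n_rounds):
        state = rng.randrange(n_states)
        act = rng.choices(range(n_states), weights=weights[state])[0]
        reward = gaussian(abs(state - act), payoff_height, payoff_width)
        if reinforce_width is None:
            weights[state][act] += reward
        else:
            for s in range(n_states):
                weights[s][act] += gaussian(abs(s - state), reward, reinforce_width)
    expected = 0.0
    for state in range(n_states):
        total = sum(weights[state])
        expected += sum(w / total * gaussian(abs(state - a), payoff_height, payoff_width)
                        for a, w in enumerate(weights[state])) / n_states
    return expected / payoff_height

if __name__ == "__main__":
    # One corner of the grid: 100 states, 1000 rounds, payoff width 10.
    # The full sweep also varies state-space size (100-500), trial length
    # (10^3-10^6), and reinforcement width (5-20), averaged over fifty runs.
    for width in (None, 10):
        print(width, round(simulate_success(100, 1000, width, 10), 3))
```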
Real world learners do not use learning strategies that exactly mimic those used in the models here. In order to strengthen these results, I investigated their robustness across learning rules. Under reinforcement learning with punishment, actors reinforce successful acts for the state of the world (and for similar states under the generalized version), and simultaneously punish, or decrease the reinforcement level, for that act in other states.[26] The results of simulations for these rules were highly similar to those presented in the last section. I also explored a learning rule outlined by Barrett ([2014]), which I call 'Barrett learning'. This rule is in some ways similar to the adjustable reference point with truncation learning introduced by Bereby-Meyer and Erev ([1998]). Actors using this rule discount past experience compared to more recent experience. Results were, again, very similar to those presented in this section.

It should be noted that the results presented in this section are not particularly surprising given previous results from machine learning, and previous observations from psychology and biology about the benefits of generalization. As we will see in the next section, however, generating similar results in an evolutionary game theoretic model is useful in that it allows us to discuss the motivating problem presented in Section 2: why do previous evolutionary game theoretic analyses of learning predict that rules like GRL should be unable to evolve if generalization is so ubiquitous?

6 Evolving to Generalize

At this point it has been established that high generalization learning can perform well in the approximation game when time is limited and states are numerous, despite the fact that only non-generalized learning leads to optimal behaviour. How, it will now be asked, do these results inform the evolution of learning generalization?

The larger question at hand, remember, is whether or not it is problematic to assume that the short-term behaviour of learning rules does not matter in evolutionary analyses. In order to assess this using the case of learning generalization, let us consider an evolutionary model where the environment for the actor changes regularly, meaning that speed of learning may be evolutionarily relevant. The replicator dynamics are the most commonly used model of the evolutionary process in evolutionary game theory and will be employed here. These dynamics assume that actors using strategies that receive higher payoffs will replicate more successfully than actors using strategies that receive lower payoffs.[27] In populations modelled under these dynamics, high payoff strategies tend to proliferate.
In the approximation game in particular, because there is only one player, the learning rule that will evolve under the replicator dynamics is simply the one that gets the best payoff.

[26] There is experimental evidence supporting the use of rules where actors punish or forget strategies, that is, they sometimes decrement their reinforcements. See (Bereby-Meyer and Erev [1998]), for example.

[27] The replicator equation determines how proportions of strategies in a population change under the replicator dynamics. This equation states that
\[ \dot{x}_i = x_i \left( f_i(x) - \sum_{j=1}^{n} f_j(x)\, x_j \right), \]
where $x_i$ is the proportion of a population playing strategy $i$, $f_i(x)$ is the fitness of type $i$ in the population state $x$, and $\sum_{j=1}^{n} f_j(x)\, x_j$ is the average population fitness in this state.

Consider a model where a population of actors learns to play an approximation game using either Herrnstein learning or various GRL rules. One can think of the actors' strategies as now consisting in which learning rule to adopt. The payoffs associated with each learning rule will be the expected payoffs for the behavioural strategies that these various learning rules develop in simulation. Now suppose that at regular intervals, the population encounters a new approximation game (one where the actors encounter new states and must associate them with new actions). If these intervals of learning are short enough, under the replicator dynamics this population will evolve to use a GRL rule rather than Herrnstein learning. This is the case because, as shown in the previous section, generalizing rules will lead to higher payoffs for the actors over a short timescale. And, as pointed out, for an approximation game the replicator dynamics will always select whichever behaviour receives the best payoff. To give an example, suppose that actors in the population play approximation games with 100 states, and that they switch games every 1000 rounds. If the initial population contains the learning rules considered in the last section (Herrnstein learning and GRL with reinforcement Gaussians of widths 5, 10, 15, and 20), GRL with a reinforcement Gaussian of width 10 will evolve. In other words, when the environment varies, generalization can evolve.

One might worry that in the model just described actors begin their learning processes anew when the environment changes rather than having to forget currently developed actions. To alleviate this worry, I also considered models of populations in changing environments where actors must forget previously learned strategies when the world changes. I found that under a wide range of parameter settings, generalization evolved.[28]

[28] These results are not presented here as the description of these models is lengthy and the results are unsurprising.
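A simple discrete-time version of the replicator dynamics makes this selection argument concrete. In the sketch below, the competing 'strategies' are learning rules, and each rule's fitness stands in for the average payoff it earns over one short learning interval before the environment switches. The particular fitness numbers are illustrative placeholders chosen so that GRL with width 10 earns the most, mirroring the example above; they are not the article's simulation results. With fixed fitnesses, the population approaches fixation on whichever rule earns the most in the short interval.

```python
def replicator_step(proportions, fitnesses, dt=0.1):
    """One discrete Euler step of the replicator dynamics:
    x_i <- x_i + dt * x_i * (f_i - average fitness), then renormalize."""
    avg = sum(x * f for x, f in zip(proportions, fitnesses))
    new = [x + dt * x * (f - avg) for x, f in zip(proportions, fitnesses)]
    total = sum(new)
    return [x / total for x in new]

if __name__ == "__main__":
    # Strategies are learning rules; each fitness stands in for the average
    # payoff the rule earns during one 1000-round interval of a 100-state game.
    # These numbers are made-up placeholders for illustration only.
    rules = ["Herrnstein", "GRL width 5", "GRL width 10", "GRL width 15", "GRL width 20"]
    short_run_payoffs = [0.30, 0.55, 0.65, 0.60, 0.50]
    pop = [1 / len(rules)] * len(rules)
    for _ in range(2000):
        pop = replicator_step(pop, short_run_payoffs)
    for rule, share in zip(rules, pop):
        print(f"{rule}: {share:.3f}")  # the highest-payoff rule comes to dominate
```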
Furthermore, there is a feature of learning situations that I have not discussed yet that makes generalization relatively more important and more successful in real world scenarios with numerous states. In the approximation game, every possible state of the world has its own ideal act. In reality, though, for highly similar states it will often be appropriate for an organism to take the same act, in which case generalization will be more effective than the models here predict (Shettleworth [2009]). To further elucidate this claim, consider a scenario where a bird is learning to interact with blackberries. Imagine a model of this scenario. The state space of this model would have hundreds (thousands?) of states varying along multiple dimensions of perceptual space—smell, size, colour, shape, and so on—but the birds would only have two available acts—eat and not-eat. Generalization, in this case, will only lead to sub-optimal behaviour for states right at the boundary between edible and inedible berries. For all the other possible states, generalizing will be completely successful. In models of this scenario, Herrnstein learning will still lead to ESS play while GRL will not, but the benefits of Herrnstein learning are only relevant for a small proportion of states, while GRL provides more significant benefits for most of the state space. In other words, the window of time during which GRL is a more successful learning rule is longer, making it more problematic to ignore short-term learning behaviour in evaluating the evolution of generalization.

In the evolutionary models presented above, the learning rule that evolves strictly outperforms the other learning rules from a payoff perspective, and so satisfies the definition of an ESS (if one treats a choice of learning rule as a choice of strategy). It would be strange to say that GRL rules are evolutionarily stable, though. In principle, given this set-up, any learning rule (like GRL) that has not gotten to the optimal outcome in the short time period could be outperformed, and so invaded, by a learning rule that does better in that same time period.[29] This, however, does not really matter. The point is not that a particular rule for generalization will be stable, but rather that this type of stability analysis ignores some of the most evolutionarily relevant features of learning rules, in this case a need for speed. Maynard-Smith ([1982]) and Harley ([1981]) are not wrong in thinking that there should be selection pressure for rules that learn ESSes, just wrong in thinking that this is the only, or the most important, type of selection pressure bearing on learners.

[29] In fact, the real world behaviour of learning discrimination points towards a possibility for such an improved rule. Previous investigations into animal learning indicate that when it is relevant from a payoff perspective for organisms to discriminate between states, they learn to do so (Mackintosh [1974]). In fact, generalization and discrimination can be seen as two sides of a coin: the former allows animals to extend successful behaviours to possibly relevant scenarios, the latter allows animals to trim these behaviours back if they are not applicable (Shettleworth [2009]). This combination of behaviours could be modelled with a learning rule that combines the best aspects of GRL and Herrnstein reinforcement learning. Learners begin by generalizing, but eventually stop generalizing and develop more precise strategies. In fact, the models developed here help illuminate why learning discrimination is important: it helps organisms avoid sub-optimal behaviours developed when generalizing and can allow actors to move closer to ESSes.

7 Conclusion

To conclude, I will discuss how the results of this article inform game theory and evolutionary game theory; but first, a word should be said about the proposed interpretation of the state spaces of approximation games. I pointed out in Section 3 that these state spaces should be thought of as perceptual rather than external because generalization happens over perceptually similar states.
Given that similarity is built into the approximation game through the payoff structure, however, this interpretation assumes that perceptually similar states will always get similar payoffs when responded to with similar actions. At first consideration, this assumption may seem problematic. It should be noted, though, that perceptual similarity structures themselves evolve. O'Connor ([2014b]) argues that in models of the evolution of perceptual categorization, real-world states that actors can respond to in similar ways evolve to be perceptually similar. If this is right, it may be reasonable to assume that perceptual similarity (usually) tracks payoff similarity. This line of thinking points to a way in which the exploration of generalization in this article is incomplete, though. Generalization happens over perceptual states, and will only be successful if the similarity structure of these perceptual states is arranged so that perceptually similar things can be reacted to similarly. In this way, the evolution of generalization arguably cannot be fully understood without also understanding the evolution of perceptual similarity.

I will now return to how this exploration of the evolution of learning generalization informs evolutionary game theory. First, and most importantly, the assumption that the short-term performance of learning rules can be ignored in evolutionary analyses is a bad one. This assumption is inconsistent with other assumptions made about the evolution of learning, in particular, that learning should be expected to evolve in variable environments. It is an assumption that matters because, as shown here, when the short-term success of learning rules is taken into account, evolutionary outcomes are significantly impacted. And, as this article shows, if this assumption is maintained, evolutionary game theoretic models are unable to account for the evolution of generalization. When the assumption is dropped, on the other hand, evolutionary game theoretic models can account for this highly successful real world learning behaviour. As such, this case illustrates how the long-term learning assumption is not just intuitively suspect, but can actually lead an evolutionary analysis significantly astray.

Past investigations into the evolution of learning rules have been used to justify assumptions about equilibrium play in game theory (see, for example, Maynard-Smith [1982]). The results here indicate that a better understanding of the evolution of learning does not support this justification. Although there should be selection pressure for learning rules to reach ESSes, there should also be selection pressure for rules that learn quickly. When these desiderata are at odds, as is the case with learning generalization, non-equilibrium behaviour should be expected in the real world. Even if real world actors eventually learn to discriminate between relevantly different states, and so mitigate the sub-optimal effects of generalization, while learning progresses (which should be a non-trivial proportion of the time if actors face heterogeneous environments), non-equilibrium and thus non-optimal behaviour should be expected.
In recent years, the tradition of depending on ESS methodology in evolutionary analyses has come under fire. The results presented here are one more example of a case where a dynamical investigation reveals important insights into evolutionary processes that ESS analysis misses. As discussed, simply identifying which learning rules are evolutionarily stable in the sense that they lead to ESSes misses important differences between the processes that actors employing these rules undergo, and thus misses evolutionarily relevant information. This analysis thus gives further reason to be very careful when applying ESS methodology to complicated evolutionary scenarios.

Acknowledgements

Many thanks to Simon Huttegger, Michael McBride, Louis Narens, Kyle Stanford, Brian Skyrms, Elliott Wagner, and James Weatherall for comments on this work. Thanks to helpful audiences at ISHPSSB 2013, the Winter Q-Bio conference 2014, and the ABMP conference 2014, as well as at the Center for Philosophy of Science at the University of Pittsburgh. Special thanks to Rory Smead for his help at all stages of this project.

Department of Logic and Philosophy of Science
University of California
Irvine, CA 92697, USA
cailino@uci.edu

References

Barrett, J. A. [2007]: 'Dynamic Partitioning and the Conventionality of Kinds', Philosophy of Science, 74, pp. 527–46.

Barrett, J. A. [2009]: 'The Evolution of Coding in Signaling Games', Theory and Decision, 67, pp. 223–37.

Barrett, J. A. [2014]: 'Description and the Problem of Priors', Erkenntnis, 79, pp. 1343–53.

Barrett, J. A. and Zollman, K. [2008]: 'The Role of Forgetting in the Evolution and Learning of Language', Journal of Experimental and Theoretical Artificial Intelligence, 21, pp. 293–309.

Bereby-Meyer, Y. and Erev, I. [1998]: 'On Learning to Become a Successful Loser: A Comparison of Alternative Abstractions of Learning Processes in the Loss Domain', Journal of Mathematical Psychology, 42, pp. 266–86.

Dunlap, A. S. and Stephens, D. W. [2009]: 'Components of Change in the Evolution of Learning and Unlearned Preference', Proceedings of the Royal Society B, 276, pp. 3201–8.

Gärdenfors, P. [2000]: Conceptual Spaces: The Geometry of Thought, Cambridge, MA: MIT Press.

Ghirlanda, S. and Enquist, M. [2003]: 'A Century of Generalization', Animal Behaviour, 66, pp. 15–36.

Gigerenzer, G. and Gaissmaier, W. [2011]: 'Heuristic Decision Making', Annual Review of Psychology, 62, pp. 451–82.

Gigerenzer, G. and Selten, R. [2001]: Bounded Rationality: The Adaptive Toolbox, Cambridge, MA: MIT Press.

Godfrey-Smith, P. [2002]: 'Environmental Complexity and the Evolution of Cognition', in R. Sternberg and J. Kaufman (eds), The Evolution of Intelligence, Mahwah, NJ: Lawrence Erlbaum, pp. 233–49.

Harley, C. B. [1981]: 'Learning the Evolutionarily Stable Strategy', Journal of Theoretical Biology, 89, pp. 611–33.

Hastie, T., Tibshirani, R. and Friedman, J. [2005]: 'The Elements of Statistical Learning: Data Mining, Inference, and Prediction', The Mathematical Intelligencer, 27, pp. 83–5.

Herrnstein, R. [1970]: 'On the Law of Effect', Journal of the Experimental Analysis of Behavior, 13, pp. 243–66.
Huttegger, S. M. and Zollman, K. J. S. [2011]: 'Signaling Games: Dynamics of Evolution and Learning', in Language, Games, and Evolution, Berlin, Heidelberg: Springer-Verlag, pp. 160–76.

Jäger, G. [2007]: 'The Evolution of Convex Categories', Linguistics and Philosophy, 30, pp. 551–64.

Johnston, T. D. [1982]: 'The Selective Costs and Benefits of Learning: An Evolutionary Analysis', in J. S. Rosenblatt (ed.), Advances in the Study of Behavior, Volume 12, New York: Academic Press.

Krantz, D. H., Luce, R. D., Suppes, P. and Tversky, A. [1971]: Foundations of Measurement, Volume 1: Additive and Polynomial Representations, Mineola, NY: Dover Publications.

Laslier, J. F., Topol, R. and Walliser, B. [2001]: 'A Behavioral Learning Process in Games', Games and Economic Behavior, 37, pp. 340–66.

Mackintosh, N. J. [1974]: The Psychology of Animal Learning, Oxford: Academic Press.

Maynard-Smith, J. [1982]: Evolution and the Theory of Games, Cambridge: Cambridge University Press.

Mednick, S. A. and Freedman, J. L. [1960]: 'Stimulus Generalization', Psychological Bulletin, 57, pp. 169–200.

O'Connor, C. [2014a]: 'The Evolution of Vagueness', Erkenntnis, 79, pp. 707–27.

O'Connor, C. [2014b]: 'Evolving Perceptual Categories', Philosophy of Science, 81, pp. 840–51.

Plotkin, H. C. and Odling-Smee, F. J. [1979]: 'Learning, Change, and Evolution: An Enquiry into the Teleonomy of Learning', in J. S. Rosenblatt (ed.), Advances in the Study of Behavior, Volume 10, New York: Academic Press.

Roth, A. E. and Erev, I. [1995]: 'Learning in Extensive-Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term', Games and Economic Behavior, 8, pp. 164–212.

Shepard, R. N. [1987]: 'Toward a Universal Law of Generalization for Psychological Space', Science, 237, pp. 1317–23.

Shettleworth, S. J. [2009]: Cognition, Evolution, and Behavior, Oxford: Oxford University Press.

Smead, R. [2012]: 'Game Theoretic Equilibria and the Evolution of Learning', Journal of Experimental and Theoretical Artificial Intelligence, 24, pp. 301–13.

Smead, R. [2015]: 'The Role of Social Interaction in the Evolution of Learning', British Journal for the Philosophy of Science, 66, pp. 161–80.

Smead, R. and Zollman, K. [unpublished]: 'The Stability of Strategic Plasticity'.

Stephens, D. W. [1991]: 'Change, Regularity, and Value in the Evolution of Animal Learning', Behavioral Ecology, 2, pp. 77–89.

Watson, J. and Rayner, R. [1920]: 'Conditioned Emotional Reactions', Journal of Experimental Psychology, 3, pp. 1–14.

Zollman, K. and Smead, R. [2010]: 'Plasticity and Language: An Example of the Baldwin Effect?', Philosophical Studies, 147, pp. 7–21.