Exploiting Functional Relationships in Musical Composition

Amy K. Hoover and Kenneth O. Stanley
Evolutionary Complexity Research Group

School of Electrical Engineering and Computer Science
University of Central Florida
Orlando, FL 32816-2362 USA

{ahoover, kstanley}@eecs.ucf.edu
http://eplex.cs.ucf.edu/neatmusic

To appear in:
Connection Science Special Issue on Music, Brain, & Cognition,
Abingdon, UK: Taylor & Francis, 21:2, 227-251, June 2009.

Abstract

The ability of gifted composers such as Mozart to create complex mul-
tipart musical compositions with relative ease suggests a highly efficient
mechanism for generating multiple parts simultaneously. Computational
models of human music composition can potentially shed light on how
such rapid creativity is possible. This paper proposes such a model based
on the idea that the multiple threads of a song are temporal patterns that
are functionally related, which means that one instrument’s sequence is
a function of another’s. This idea is implemented in a program called
NEAT Drummer that interactively evolves a type of artificial neural net-
work (ANN) called a Compositional Pattern Producing Network (CPPN),
which represents the functional relationship between the instruments and
drums. The main result is that richly textured drum tracks that tightly
follow the structure of the original song are easily generated because of
their functional relationship to it.

Keywords: compositional pattern producing networks; CPPNs; computer-
generated music; interactive evolutionary computation; IEC; NeuroEvo-
lution of Augmenting Topologies

1 Introduction

A most intriguing capability of human composers is that they can often quickly
conceive multiple instrumental parts simultaneously during the creative process.
For example, Mozart could hear complex multipart pieces form “in his head,”
suggesting a powerful creative mechanism for generating accompaniment (Ruppel,
1998; Deutsch, 1965; Hymer, 1990). Relatedly, rock guitarists in jam sessions
and jazz musicians can improvise together while perfectly respecting the
interdependencies of their separate parts (Barrett, 1998; Berliner, 1994; Katz
and Longden, 1983; Oliver, 2006; Schuller, 1968; Weick, 1998).

Thus, although intuition may suggest that complex interdependent constructions
should require care and labor to devise, in music such constructions in fact
appear almost effortless. It therefore seems likely that no explicit serial
reasoning is involved in the creative construction of accompanying instrumental
tracks. What kind of mechanism, then, is responsible for such a capability?

This paper suggests a possible high-level answer to this question. The key
idea is that different instrumental parts are functionally related, which means
that one can be expressed as a function of another. Furthermore, although we
may perceive the interplay between two or more simultaneous instruments as
rich and complex, in fact the function that relates one to the other can be quite
simple. Thus, in this view, once a single track such as a melody is created,
it can serve as a scaffold, i.e. an existing support structure, upon which other
tracks are generated. In this way, while composers may seem to improvise entire
harmonies and drum tracks one note at a time, fundamentally they need only
construct a simple function for each part that transforms the scaffold.

In fact, because the scaffold already in effect embodies the intrinsic con-
tours and complexities of the song, any transformation of the scaffold inherits
these features and thereby embodies the same thematic elements automatically.
Thus the space of possible transforming functions is highly forgiving, in part
explaining why improvising accompaniment can appear effortless. As long as
the accompaniment is expressed as a function of the scaffold, it is difficult to go
significantly wrong.

While this idea of functions relating one pattern to another is difficult
(though not necessarily impossible) to confirm at a neurological level, it does
suggest a promising model for computer-generated music. This paper describes
the implementation of such a model and presents its results. The goal is to
generate drum tracks to accompany existing songs. Because rhythm is simpler than
melody or harmony, rhythm generation is an appealing stepping stone to
full-blown harmonization. It also highlights the advantages of the functional
perspective in clear and simple terms that do not require musical expertise to
appreciate.

Formally, an appealing drum pattern for a particular piece can be described as
a function of time, f(t). However, a good pattern for a particular drum may be
highly complex with respect to t, making it prohibitively difficult to discover.
Yet given an existing part p(t) (i.e. the scaffold) that varies over time, it is
likely significantly easier to discover the pattern g(p(t)) than f(t), even
though they produce the same pattern. In effect, p makes discovering the
accompanying pattern easy because it provides the scaffold, thereby allowing the
composer to focus only on devising the much simpler function g.
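As a minimal illustration of why g(p(t)) can be easier to find than f(t), the
Python sketch below uses a hypothetical scaffold p(t) whose notes arrive every
four ticks and then, after tick 16, every two ticks. The transform g is only a
few thresholds, yet the resulting drum pattern changes at the transition without
g ever referring to time. The names `scaffold` and `g` are illustrative
assumptions, not part of NEAT Drummer.

```python
def scaffold(t: int) -> float:
    """Hypothetical scaffold p(t): a decaying note spike every 4 ticks for the
    first 16 ticks, then every 2 ticks (a transition in the song)."""
    period = 4 if t < 16 else 2
    phase = t % period
    return 1.0 - phase / period  # 1.0 at each onset, decaying linearly


def g(x: float) -> float:
    """A deliberately simple transform of the scaffold value: strike hard on
    note onsets, softly at the midpoint of a note, and rest otherwise."""
    if x >= 0.99:
        return 1.0
    if 0.4 <= x <= 0.6:
        return 0.3
    return 0.0


# The drum track is g(p(t)); g never looks at t directly.
drum = [g(scaffold(t)) for t in range(24)]
print(drum[:4], drum[16:20])
```

Writing the same pattern directly as f(t) would require encoding the change at
tick 16 explicitly; here it is inherited from the scaffold.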

This idea is implemented in this paper in a program called NEAT Drummer,
which automatically generates drum tracks for existing songs. It accepts existing
human compositions as input to a type of artificial neural network (ANN) called
a Compositional Pattern Producing Network (CPPN; Stanley 2007) and outputs
drum patterns to accompany the instruments. The inputs to NEAT Drummer
are specific parts of a Musical Instrument Digital Interface (MIDI) file (e.g. the
lead guitar, bass guitar, and vocals) and the outputs are drum tracks that are
played along with the original MIDI file. That way, outputs are a function of
the original MIDI file inputs, forcing synchronization with the MIDI.

To take into account the user’s own inclinations, NEAT Drummer allows
the user to interactively evolve rhythms from an initial population of drum
tracks with the NeuroEvolution of Augmenting Topologies (NEAT) algorithm
(Stanley and Miikkulainen, 2002, 2004), which can evolve increasingly complex
CPPN-encoded patterns.

The main results are drum tracks for existing songs that tightly follow the
contours and idiosyncrasies of individual pieces, yet elaborate and elucidate
them in creative and unexpected ways. Even when major transitions occur,
because the drum tracks are a function of the music, the drums perfectly follow
the transitions.

This functional model of musical composition is then further extended to
allow human users to add their own functional influences to create variational
motifs outside the confines of the provided song. For example, users can provide
a monotonically increasing function (i.e. time), which suggests change over
time even if the underlying scaffold is repetitive. The result is that the drum
track can be made to vary exactly as the user requests, while still seamlessly
interweaving with the song. These user-provided functions are called conductors
in a loose analogy with orchestral conductors, who describe functional contours
with their hands that the orchestra follows. The conductors further highlight
the simplicity and relative ease of creating subtle overlapping textures through
simple functional relationships.

To highlight the importance of scaffolding and conductors, several variants
of NEAT Drummer without such facilities are compared with NEAT Drummer
with its full functionality intact. The capabilities of the variants are
significantly impoverished, demonstrating the critical role that scaffolding
plays in generating accompaniment.

While the main contribution is a powerful new approach to computer-aided
musical creativity, the high-level implication for improvisational accompaniment
provides an intriguing clue to how such mechanisms may work in the brain.

The next section provides background for the approach introduced in this
paper. This approach is then described in Section 3 and the experimental design
is explained in Section 4. Results are disclosed in Sections 5 and 6, and discussed
in Section 7.

2 Background

This section first explains interactive evolutionary computation (IEC), which
is part of the NEAT Drummer approach, and its application to computer-
generated music. The section concludes with a review of the NEAT method.

2.1 Interactive Evolutionary Computation

NEAT Drummer refines its original drum patterns through a process called
interactive evolutionary computation (IEC), which means a human, rather than
a predefined fitness function, selects the parents of the next generation (Takagi,
2001). IEC implementations typically generate a random initial population.
The user then selects the most fit individuals from that population to reproduce,
resulting in increasingly complex individuals.
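The generational loop described above can be sketched in Python as follows.
This is a simplified, hypothetical illustration rather than NEAT Drummer's code:
genomes are plain lists of numbers, mutation is a small uniform perturbation,
and the human's choice is represented by a `pick` callback that returns the
index of the favorite individual.

```python
import random
from typing import Callable, List

Genome = List[float]


def mutate(genome: Genome, rng: random.Random, power: float = 0.1) -> Genome:
    # Perturb every gene by up to +/- power (an illustrative mutation operator).
    return [g + rng.uniform(-power, power) for g in genome]


def iec(pick: Callable[[List[Genome]], int], pop_size: int = 6,
        genome_len: int = 4, generations: int = 5, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    # Random initial population, as in typical IEC implementations.
    population = [[rng.uniform(-1, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        parent = population[pick(population)]  # the human chooses here
        # Next generation: the chosen parent plus mutated offspring of it.
        population = [parent] + [mutate(parent, rng)
                                 for _ in range(pop_size - 1)]
    return population[pick(population)]
```

For a quick non-interactive trial, `pick` can be a stand-in such as
`lambda pop: max(range(len(pop)), key=lambda i: sum(pop[i]))`; in real IEC it
would wait for the user's selection. Keeping the chosen parent in the next
generation (elitism) is an assumption here, made so that a favorite is never
lost.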

IEC addresses the problem that objective evaluation is difficult in aesthetic
or subjective domains such as art and music. By shifting the burden of evalua-
tion to the human, the need to formalize subjective quality is avoided. Richard
Dawkins first popularized IEC with Biomorphs, a visual representation of arti-
ficial organisms designed to illustrate evolutionary principles (Dawkins, 1986).
Biomorphs inspired a proliferation of programs tackling design problems from
tool creation (Sato and Hagiwara, 1999) and suspension bridges (Furuta et al.,
1995) to education, teaching, and story composition (Kuriyama and Terano,
1997).

Because it can harness subjective preferences, a major application of IEC
is art. The power of this approach is evident in visual domains like the L-
system-encoded Mutator (Lindenmayer, 1968; Todd and Latham, 1992), Karl
Sims’ genetic art (Sims, 1991), and Picbreeder (Secretan et al., 2008), a website
where users evolve, save, and publish images. Picbreeder evolves its images with
NEAT (Section 2.3), the same evolutionary algorithm used by NEAT Drummer.

IEC has also branched into musical evolution, such as the Biomorphs-inspired
Sonomorphs (Nelson, 1993). The next section reviews several such approaches to
computer-generated music, which are often based on IEC.

2.2 Evolutionary Computer Generated Music

The idea that computers might be able to compose music has inspired a diversity
of approaches. While this section focuses mainly on evolutionary approaches, a
broad review of the area can be found in de Mantaras and Arcos (2002).
Computer-generated music often uses IEC to leverage the subjective capabilities
of average human subjects while avoiding the need for musical expertise. For
example, among the first IEC music applications is Sonomorphs (Nelson, 1993),
which encodes rhythms as bit strings in which a note is either on or off. Direct
representation of this type, wherein each note is represented by a single gene,
does not attempt to model how humans encode music neurologically. However,
the creative evolutionary process is a metaphor for human composition through
variation and refinement.

Listeners often feel that computer-generated music sounds artificial and
uninspired. Music generators tend to either evolve a solution to fit a partic-
ular a priori style or improvise pieces that often lack a global structure that
holds together the entire song (McCormack, 2005; Husbands et al., 2007).

It is common for music generators, such as Sonomorphs, CONGA (Tokui
and Iba, 2000), GP-Music System (Johanson and Poli, 1998), and the GA-based
IEC composition system of Onisawa et al. (2000) to focus on composing short
phrases rather than entire songs (Nelson, 1993; Biles, 2007). These short
phrases, which are selected and evolved by the user, may be
extended through looping or manual juxtaposition, but the overall structure of
the song is not itself generated.

A notable example of computational improvisation is GenJam (Biles, 1999),
which composes music in the style of jazz in cooperation with a human musi-
cian. GenJam listens to human improvisations and interprets and genetically
modifies the notes. GenJam can also evolve a soloist that is independent of any
particular jazz composition by first training from human input. It integrates
its improvisations seamlessly into a musical stream by prescripting when in the
song improvisation may occur. In this way, it preserves the overall musical
structure provided by the human, although it does not innovate at the level of
global structure on its own.

Early connectionist approaches, which also emphasize short phrases, repre-
sent change over time through recurrence (Todd and Loy, 1991). Recurrence
means that a network can represent a time series through a pattern of cycling
activation. Todd and Loy (1991) first applied recurrent ANNs to music
generation by training them to reproduce patterns with Real-Time Recurrent
Learning (RTRL). Chen and Miikkulainen (2001) later combined this recurrent
learning approach with evolution, based on the idea that a simple recurrent
network (SRN) can capture a general style of music and then vary it through
evolution. This approach succeeded in producing melodies in the style of Béla
Bartók on a local level; however, even with recurrence it is difficult to
capture global structure.

NEAT Drummer approaches the problem of global structure by generating
its rhythms from already-existing instrumental parts that span entire songs,
thereby diminishing the need to represent patterns over time through recurrence.
The next section describes the NEAT method that implements evolution in
NEAT Drummer.

2.3 NeuroEvolution of Augmenting Topologies (NEAT) and CPPNs

NEAT Drummer follows the idea in prior connectionist approaches that an
ANN can effectively represent music. Therefore, a method is needed to allow
the user to evolve ANNs. The NEAT method, described in this section, is
chosen for this purpose in NEAT Drummer because it allows ANNs to increase
in complexity over generations. In particular, NEAT Drummer evolves a neural-
based encoding of drum patterns.

The NEAT method was originally developed to solve difficult control and
sequential decision tasks, in which the evolved ANNs control agents that select
actions based on their sensory inputs. While previous methods that evolved ANNs,
i.e. neuroevolution methods, evolved either fixed-topology networks (Gomez and
Miikkulainen, 1999; Saravanan and Fogel, 1995) or arbitrary random-topology
networks (Yao, 1999), NEAT is notable for beginning evolution with a population
of small, simple networks and complexifying the network topology over
generations into diverse species, leading to increasingly sophisticated
behavior. This section briefly reviews the NEAT method; Stanley and
Miikkulainen (2002, 2004) provide complete introductions.

NEAT is based on three key principles. First, to allow ANN structures to in-
crease in complexity over generations, a method is needed to keep track of which
gene is which; otherwise, it is not clear in later generations which individual is
compatible with which, or how their genes should be combined to produce off-
spring. NEAT solves this problem by assigning a unique historical marking to
every new piece of network structure that appears through a structural
mutation. The historical marking is a number assigned to each gene corresponding
to its order of appearance over the course of evolution. The numbers are
inherited unchanged during crossover, and allow NEAT to perform crossover
without the need for expensive topological analysis.
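To make the role of historical markings concrete, the sketch below aligns two
parents' connection genes by innovation number, following the crossover rule
described by Stanley and Miikkulainen (2002): matching genes are inherited at
random from either parent, while disjoint and excess genes come from the fitter
parent. The dictionary-based genome is a hypothetical simplification; real NEAT
genes also record their endpoints and an enabled flag.

```python
import random
from typing import Dict

# Hypothetical minimal genome: innovation number -> connection weight.
Genome = Dict[int, float]


def crossover(fitter: Genome, other: Genome, rng: random.Random) -> Genome:
    child: Genome = {}
    for innov, weight in fitter.items():
        if innov in other:
            # Matching gene (same historical marking): take either parent's copy.
            child[innov] = weight if rng.random() < 0.5 else other[innov]
        else:
            # Disjoint or excess gene: inherit it from the fitter parent.
            child[innov] = weight
    return child
```

No topological comparison is needed: lining up genes is a dictionary lookup on
the historical marking.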

Second, NEAT traditionally speciates the population based on topological
similarity so that individuals compete primarily within their own niches instead
of with the population at large, which protects topological innovations. The
historical markings allow structures to be compared for this purpose. However,
because the user performs selection in interactive evolution instead of the evo-
lutionary algorithm itself, speciation is not applicable in NEAT Drummer and
therefore not utilized. Note that in Section 6, variants of regular non-interactive
NEAT are compared to NEAT Drummer, and these variants therefore do im-
plement speciation.

Third, unlike other systems that evolve network topologies and weights (Yao,
1999), NEAT begins with a population of simple networks with no hidden nodes.
New structure is introduced incrementally as structural mutations occur, and
only those structures survive that are found to be useful through fitness evalua-
tions. This way, NEAT searches through a minimal number of weight dimensions
and finds the appropriate complexity level for the problem. NEAT Drummer
lets the user evolve patterns of increasing complexity through this approach.

Finally, in NEAT Drummer, NEAT evolves a kind of ANN called a Compo-
sitional Pattern Producing Network (CPPN), which is designed to compactly
represent patterns with regularities, such as pictures and songs (Stanley, 2006,
2007). What distinguishes CPPNs from traditional ANNs is that in addition to the usual
sigmoid functions, CPPN hidden nodes can include several classes of functions,
including periodic functions (like sine) for repetition and symmetric functions
(like Gaussian) for symmetry. An individual network can contain a heteroge-
neous set of functions in its nodes, which are evolved along with the weights.
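A hand-built example helps show how heterogeneous activation functions compose.
In the hypothetical Python sketch below, a network is an ordered list of nodes,
each with an activation name and weighted incoming connections. A sine hidden
node feeds a Gaussian output, so the output repeats (from the sine), and because
the Gaussian is symmetric, positive and negative half-cycles of the sine produce
the same value. The node format and the weights are assumptions chosen for
illustration, not an evolved CPPN.

```python
import math
from typing import Callable, Dict, List, Tuple

ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "sine": math.sin,                        # periodic: supports repetition
    "gaussian": lambda x: math.exp(-x * x),  # symmetric: supports symmetry
    "linear": lambda x: x,
}

# Hypothetical node format: (name, activation, [(source, weight), ...]),
# listed in feedforward order so every source is computed before it is used.
Network = List[Tuple[str, str, List[Tuple[str, float]]]]


def activate(net: Network, inputs: Dict[str, float]) -> Dict[str, float]:
    values = dict(inputs)
    for name, act, incoming in net:
        total = sum(values[src] * w for src, w in incoming)
        values[name] = ACTIVATIONS[act](total)
    return values


net: Network = [
    ("h", "sine", [("t", math.pi / 2)]),
    ("out", "gaussian", [("h", 1.0)]),
]
outs = [activate(net, {"t": float(t)})["out"] for t in range(4)]
```

Evaluated at t = 0, 1, 2, 3, the output alternates between 1 and e^(-1),
because the Gaussian maps sin(±π/2) to the same value.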

To demonstrate the capabilities of such networks, Stanley (2006, 2007) showed
how simple canonical functions can be composed to create an overall network
that produces patterns with complex regularities and symmetries. Each component
function creates a novel geometric coordinate frame within which other functions
can reside. The idea in NEAT Drummer is that this representation allows drum
tracks with regular patterns to be discovered quickly and easily.

Figure 1: CPPN Inputs and Outputs. The user selects both a set of inputs from
among the channels in the MIDI file and a set of outputs corresponding to
specific drums.

The next section explains how CPPNs are evolved in NEAT Drummer to
produce rhythms.

3 The NEAT Drummer Approach

The main idea in NEAT Drummer is that the temporal patterns of the instru-
mental parts of a song can be inherited by the drums by making the drums
a function of the other instruments. This section begins by explaining how
CPPNs encode rhythm and then details how they are evolved interactively.

3.1 CPPN Rhythm Generation

NEAT Drummer begins by generating an initial set of original drum tracks for
a provided song. To initiate this first generation, the user must first specify the
inputs and outputs of the CPPN (figure 1) through a graphical user interface
(GUI) provided by the program (figure 2).

The inputs are individual instrumental tracks from the chosen song and the
outputs are a set of drums that together produce the entire drum accompani-
ment.

From the inputs the CPPN derives its original patterns, which are therefore
functions of the original song (i.e. the scaffold) and its structure. In other words,
NEAT Drummer generates a rhythm that is a function of these inputs. Thus, it
is important to choose instruments that play salient motifs in the song so that
the drum pattern can be derived from the richest structures available. Further
texture can be achieved by inputting more than one MIDI channel, e.g. bass
and guitar.

Thus the user selects any combination of channels representing individual
instrumental parts from a MIDI file to be input into the CPPN. In this way,
NEAT Drummer can generate rhythms from any MIDI file.

Figure 2: NEAT Drummer Screenshot. NEAT Drummer presents an IEC interface where
visual representations of drum patterns help the user to decide whether to
listen to each candidate and then select their favorites. This approach, i.e.
choosing inputs and outputs and selecting favorites, is designed to allow users
to evolve compelling drum tracks without the need for musical expertise.

The user also chooses the percussion instruments that will play the rhythm.
Each such instrument is represented by a single output on the CPPN. For
example, one output may be a bass drum, one a snare, and a third a hi-hat. Any
number of drums, and hence any number of outputs, are permissible.

To produce the initial patterns, a set of random initial CPPNs with a min-
imal initial topology (following the NEAT approach) and the chosen inputs
and outputs are generated. The number of inputs in these initial topologies
corresponds to the number of instrument channels in the scaffold (e.g. guitar,
bass, etc.) plus a bias node. The relationship between the initial topology and
the original song is thus established through these inputs, which feed informa-
tion from the scaffold directly into the network. The number of outputs equals
the number of drums in the drum ensemble. The initial minimal topology has
randomly weighted connections and always contains exactly one hidden node. This
single hidden node ensures that initial patterns sound more interesting than
those of simple perceptrons, but are still relatively simple. Note that the
internal topology is thus unrelated to the scaffold except insofar as it is
affected by the number of inputs. Consequently, the apparent “knowledge” of the
provided song in the pattern output by the network is entirely a result of
computing a function of the scaffold.

NEAT Drummer then inputs the selected channels into the CPPN over the
course of the song in sequential order and records the consequent outputs, which
represent drums being struck. Specifically, from time t = 0 to t = l, where l is
the length of the song, the inputs are provided and outputs of the CPPN are
sampled at discrete subintervals (i.e. ticks) up to l.
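The sampling procedure can be sketched as a loop over ticks that builds one
input vector per tick (the channel values plus a bias of 1.0) and records one
output per drum. In this Python sketch, `cppn` is any callable standing in for
an evolved network, and storing the inputs as precomputed per-tick channel
values is an assumption about the data layout.

```python
from typing import Callable, List, Sequence


def render(cppn: Callable[[Sequence[float]], Sequence[float]],
           channels: List[List[float]], num_drums: int) -> List[List[float]]:
    """Activate the network once per tick, t = 0 .. l-1, and collect one output
    track per drum. channels[c][t] is channel c's input value at tick t."""
    length = len(channels[0])
    tracks = [[0.0] * length for _ in range(num_drums)]
    for t in range(length):
        x = [ch[t] for ch in channels] + [1.0]  # input vector plus bias
        y = cppn(x)
        for d in range(num_drums):
            tracks[d][t] = y[d]
    return tracks
```

With a toy `cppn = lambda x: [x[0], 1 - x[0]]` and a single channel
`[1.0, 0.5, 0.0]`, the two drum tracks are `[1.0, 0.5, 0.0]` and
`[0.0, 0.5, 1.0]`.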

Individual notes input into the CPPN from selected MIDI channels are rep-
resented over time as spikes that begin high and decrease (i.e. decay) linearly
(figure 3). The period of decay is equivalent to the duration of the note. That
way, the CPPN “hears” the timing information from the supplied channels while
in effect ignoring pitch, which is unnecessary to appreciate rhythm. By allowing
the spikes to decay over their duration, each note becomes a kind of temporal
coordinate frame. That is, the CPPN in effect knows at any time where it is
within the duration of a note by observing the stage of its decay. That in-
formation allows it to create drum patterns that vary over the course of each
note.
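The decaying-spike encoding can be sketched as follows, assuming each note is
given as a (start tick, duration in ticks) pair and four ticks per quarter note
as in figure 3. Each spike starts at 1.0 at the onset and falls linearly over
the note's duration. The text does not say how overlapping notes on one channel
are combined; taking the maximum is an assumption made here.

```python
from typing import List, Tuple


def encode_channel(notes: List[Tuple[int, int]], length: int) -> List[float]:
    """Sample a channel's decaying note spikes at each tick."""
    signal = [0.0] * length
    for start, duration in notes:
        for t in range(start, min(start + duration, length)):
            # Linear decay: 1.0 at the onset, falling toward 0 by the note's end.
            level = 1.0 - (t - start) / duration
            # Assumption: overlapping notes keep the louder spike.
            signal[t] = max(signal[t], level)
    return signal
```

A quarter note followed by a half note, `encode_channel([(0, 4), (4, 8)], 12)`,
yields `[1.0, 0.75, 0.5, 0.25, 1.0, 0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125]`:
the longer note decays more slowly, which is the timing cue the CPPN receives.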

Interestingly, it can also be useful to input temporal patterns that are not
part of the song itself. Such patterns can provide additional structure to the
drums by situating them within coordinate frames that describe how the user
wants the song to vary at a meta-level. For example, inputting a simple linear
function of time that indicates the position-in-song at each tick (figure 4a) in
addition to the instrument channels means that the output is a function of both
the song itself and the position-in-song. That way, the CPPN can produce a
drum track that shifts gradually from one motif to another over the course of
the song.

Similarly, by inputting position-in-measure (figure 4b) or position-in-beat
(figure 4c), the user can bias the output towards progressions across each mea-
sure or beat.
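The three conductors in figure 4 are simple functions of the tick index. The
sketch below generates them as rising ramps that reset at each boundary,
assuming four ticks per beat and 4/4 time (four beats per measure); the time
signature and the normalization to [0, 1) are assumptions. Ramps that start low
and rise also match the convention, described later in this section, that timing
conductors attack rather than decay.

```python
from typing import Dict, List

TICKS_PER_BEAT = 4     # four ticks per beat, as used for sampling in the text
BEATS_PER_MEASURE = 4  # assumption: 4/4 time


def conductors(length: int) -> Dict[str, List[float]]:
    """Position-in-song, position-in-measure and position-in-beat per tick."""
    tpm = TICKS_PER_BEAT * BEATS_PER_MEASURE
    return {
        "song": [t / length for t in range(length)],
        "measure": [(t % tpm) / tpm for t in range(length)],
        "beat": [(t % TICKS_PER_BEAT) / TICKS_PER_BEAT for t in range(length)],
    }
```

For a 32-tick excerpt (two measures), the beat conductor cycles 0.0, 0.25, 0.5,
0.75, the measure conductor resets at tick 16, and the song conductor reaches
0.5 halfway through.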

Figure 3: Channel Input Encoding. Regardless of the instrument, each note
in a sequence in any channel input to the CPPN is encoded as a spike that decays
over the duration of the note. The pattern depicted in this figure shows how quarter
notes decay faster than half notes, thereby conveying timing information to the
CPPN, which samples this pattern at discrete ticks. The variable-intensity
row of boxes under the spikes depicts the intensity of the spike sampled at
discrete time steps (i.e. four per quarter note). The intensity at each timestep
is represented by the darkness in its respective column, which indicates how the
input channel “sounds” to the CPPN at that moment.

In this paper, these additional inputs are called conductors, by analogy with
the silent patterns expressed to an orchestra by its conductor. Additional
inputs that represent desired hidden contours beyond the pattern of the in-
struments themselves give the user an unprecedented level of control over the
nuances of the global output pattern.

In fact, any arbitrary sequence can be input as a conductor, which in effect
simply means a set of note spikes that are never actually heard. Thus the
pattern in figure 3, while introduced as an instrumental pattern, could also be
a complex conductor pattern that suggests a particular underlying motif that
the drums should elaborate. Note that in NEAT Drummer, by convention,
conductor inputs that represent time are spikes that start low and attack, which
conveys the idea of a timing signal, as opposed to notes from scaffolding inputs,
which are decaying spikes.

Whereas the CPPN inputs convey timing, the level of each CPPN output is
interpreted as the volume (i.e. strength) of each drum strike. That way, NEAT
Drummer can produce highly nuanced effects through varying softness. Audible
outputs on two consecutive ticks are interpreted as two separate drum strikes
(as opposed to one long strike). To produce a pause between strikes, the CPPN
must output an inaudible value for some number of intervening ticks. Because
the CPPN has one output for each drum, the end result of activating the network
over t ticks is a drum sequence for each drum in the ensemble.
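The output interpretation can be sketched as a conversion from one output track
to a list of drum strikes. In this Python sketch, outputs are clamped to [0, 1],
values below a hypothetical `SILENCE_THRESHOLD` are treated as inaudible, and
audible values are scaled to MIDI velocities from 0 to 127; the threshold value
and the velocity mapping are assumptions, since the text does not specify them.

```python
from typing import List, Tuple

SILENCE_THRESHOLD = 0.1  # hypothetical cutoff below which an output is inaudible


def to_strikes(track: List[float]) -> List[Tuple[int, int]]:
    """Return (tick, velocity) pairs, one per audible tick."""
    strikes = []
    for t, level in enumerate(track):
        level = min(max(level, 0.0), 1.0)  # clamp the output to [0, 1]
        if level >= SILENCE_THRESHOLD:
            # Every audible tick is its own strike, even right after another one.
            strikes.append((t, round(level * 127)))
    return strikes
```

Because every audible tick yields its own strike,
`to_strikes([1.0, 1.0, 0.0, 0.25, 0.05])` returns
`[(0, 127), (1, 127), (3, 32)]`: two back-to-back strikes, a pause, and one soft
strike.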

An interesting aspect of this representation is that it does not make explicit
use of recurrent connections. While recurrent networks are often noted for their
ability to encode temporal patterns (Dolson, 1989; Todd and Latham, 1999; Chen
and Miikkulainen, 2001), it is easier to simply express music as a function of
an existing temporal pattern (i.e. the melody and harmony) and thereby affix one
pattern to another without needing to learn the temporal synchronization
itself. Recurrence is well suited to temporal problems in which the inputs are
not known a priori, but an existing song is deterministic: because the inputs
are always the same, the outputs can simply be expressed as a function of the
inputs. Thus the CPPN can potentially represent that function without
recurrence.

Figure 4: Potential NEAT Drummer Conductor Inputs. Panels: (a) Position-in-Song,
(b) Position-in-Measure, (c) Position-in-Beat. Each panel depicts four measures
of a conductor, which is a temporal coordinate frame optionally provided by the
user to add structure to the song. The simplest conductor (a) represents the
current position in the song, suggesting a smooth transition across the entire
song. Position-in-measure (b) allows the CPPN to know at every moment where it
is within the current measure, which allows it to improvise patterns within
measures and “understand” the measure structure of the song. Similarly, the time
within each four-tick beat can be input as well (c). Conductors offer the user a
subtle yet powerful means to influence the overall structure of the rhythm
without the need for note-by-note specification.

NEAT Drummer generates each of the individuals in the initial population
with the same set of inputs and outputs. However, the initial CPPN weights and
activation functions for each member of the population are decided randomly. In
particular, every input is connected to every output with a randomized weight
within [−2, 2]. The activation function of each node is assigned randomly from
among the following options: sigmoid, binary threshold, Gaussian, linear,
multiplication, and sine. To encourage interesting patterns in the initial
generation, a single hidden node with a random activation function is also
connected into the network by splitting a randomly chosen connection. Each song
is divided into ticks (four per beat). At each tick, the vector of note spike
values at that discrete moment of time for all the instruments is input. The
CPPN is fully activated and the value of each drum output is recorded so that
all the generated drum tracks can be visualized or played immediately to
facilitate interactive evolution, as explained in the next section.
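The initialization just described can be sketched as below: every input (one
per channel, plus a bias) is connected to every drum output with a weight drawn
uniformly from [−2, 2], node activations are drawn from the six listed options,
and one randomly chosen connection is split by a hidden node. The split follows
the usual NEAT add-node convention (the old connection is disabled, the incoming
link gets weight 1.0, and the outgoing link keeps the old weight); that
convention, the node names, and the `Genome` layout are assumptions, as the text
does not give these details.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Tuple

ACTIVATIONS = ["sigmoid", "threshold", "gaussian", "linear", "multiply", "sine"]


@dataclass
class Genome:
    activations: Dict[str, str] = field(default_factory=dict)
    # (source, target) -> (weight, enabled)
    connections: Dict[Tuple[str, str], Tuple[float, bool]] = field(
        default_factory=dict)


def initial_genome(num_channels: int, num_drums: int,
                   rng: random.Random) -> Genome:
    g = Genome()
    inputs = [f"in{i}" for i in range(num_channels)] + ["bias"]
    outputs = [f"drum{j}" for j in range(num_drums)]
    for node in outputs:
        g.activations[node] = rng.choice(ACTIVATIONS)
    # Fully connect inputs (plus bias) to outputs with weights in [-2, 2].
    for src in inputs:
        for dst in outputs:
            g.connections[(src, dst)] = (rng.uniform(-2.0, 2.0), True)
    # Split one randomly chosen connection with a single hidden node.
    src, dst = rng.choice(sorted(g.connections))
    weight, _ = g.connections[(src, dst)]
    g.connections[(src, dst)] = (weight, False)
    g.activations["hidden0"] = rng.choice(ACTIVATIONS)
    g.connections[(src, "hidden0")] = (1.0, True)
    g.connections[("hidden0", dst)] = (weight, True)
    return g
```

With two channels and three drums, `initial_genome(2, 3, random.Random(0))`
produces nine input-to-output connections, one of them disabled by the split,
plus the two links through `hidden0`.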

3.2 Drum Pattern Interactive Evolution

As shown in figure 2, NEAT Drummer displays the set of candidate patterns
visually after they are generated.

It is important to note that, unlike in many evolutionary experiments, patterns
in the initial generation already sound appropriate. This high initial quality
underscores the contribution of the scaffold (i.e. the existing tracks) to the
structure of the generated patterns. Thus many appealing patterns already exist
in the first generation, demonstrating how quickly appropriate accompaniment
can be generated as a function of the source channels. In this way, a major
contribution of this research is in showing how rich context can be leveraged by
a connectionist system to successfully constrain output to appropriate patterns.

The aim of evolution is thus to elaborate on such patterns. The user can listen
to any displayed pattern, either the drum track alone or the drum track together
with its associated song. The visual presentation allows the user to quickly
identify unappealing patterns (e.g. patterns in which the bass drum is struck
over and over again without pause) without wasting time listening to them.

The user then either rates the individual patterns, from which ratings NEAT
Drummer chooses parents, or selects a single parent of the next generation.
Further rounds of selecting and breeding continue until the user is satisfied.
In this way, drum tracks evolve interactively. Because of complexification in
NEAT, they can become increasingly elaborate as evolution progresses.

To encourage rapid elaboration over generations, the probability of adding
a connection or node in NEAT was 90%. While this high probability would
be deleterious in typical NEAT experiments (Stanley and Miikkulainen, 2002,
2004), because drum tracks tend to follow song structure, this domain supports
adding structure quickly. The mutation power, i.e. the maximum magnitude
of weight change, was 0.1 and the probability of mutating the weight of an
individual connection was 90%.
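The mutation settings above can be expressed as a few constants. In this
sketch, treating "adding a connection or node" as a single 90% draw and
perturbing each selected weight uniformly within ±0.1 are interpretations of
the text rather than confirmed details of NEAT Drummer, and the function names
are hypothetical.

```python
import random
from typing import Dict, Tuple

ADD_STRUCTURE_PROB = 0.9    # probability of adding a connection or node
WEIGHT_MUTATION_PROB = 0.9  # per-connection probability of perturbing a weight
MUTATION_POWER = 0.1        # maximum magnitude of a single weight change


def mutate_weights(weights: Dict[Tuple[str, str], float],
                   rng: random.Random) -> Dict[Tuple[str, str], float]:
    out = {}
    for key, w in weights.items():
        if rng.random() < WEIGHT_MUTATION_PROB:
            w += rng.uniform(-MUTATION_POWER, MUTATION_POWER)
        out[key] = w
    return out


def wants_structural_mutation(rng: random.Random) -> bool:
    # Assumption: one combined draw for "add a connection or a node".
    return rng.random() < ADD_STRUCTURE_PROB
```

The unusually high structural rate means almost every offspring gains a node or
connection, which is what drives the rapid elaboration described above.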

Finally, it is also important to note that in principle, the idea of representing
musical structure in a connectionist system through scaffolding and conductors
could be combined with a different evolutionary algorithm, or even a different
training mechanism. Thus, while NEAT is a robust algorithm with which to
demonstrate the power of scaffolding, the scaffolding approach is likely
compatible with other connectionist training approaches as well.

3.3 Musical Instrument Digital Interface

NEAT Drummer reads its input channels from Musical Instrument Digital Interface
(MIDI) files. The Standard MIDI File (SMF) format is the most common MIDI file
type. An SMF includes any number of tracks, each of which contains a sequence of
instrumental events that occur in up to 16 channels. Each channel contains
events that tell a particular instrument when and how loudly to play. According
to the General MIDI specification, most instrument sounds can occur in any of
the 16 channels; the exception is percussion, for which channel 10 is reserved.
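As a small illustration of the channel layout, the sketch below groups note-on
events by channel from raw MIDI status bytes. In MIDI, a note-on status byte has
the form 0x9n, where n is the zero-based channel, so channel 10 appears as n = 9
(status 0x99); a note-on with velocity 0 is conventionally treated as a
note-off. The event tuples are a hypothetical simplification of what a MIDI file
parser would produce.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PERCUSSION_CHANNEL = 10  # 1-based; encoded as 9 in the status byte


def note_ons_by_channel(events: List[Tuple[int, int, int]]) -> Dict[int, List[int]]:
    """events: (status_byte, note, velocity) channel voice messages.
    Returns a map from 1-based channel to the note numbers struck on it."""
    by_channel: Dict[int, List[int]] = defaultdict(list)
    for status, note, velocity in events:
        # Note-on is 0x9n; velocity 0 is treated as a note-off.
        if (status & 0xF0) == 0x90 and velocity > 0:
            by_channel[(status & 0x0F) + 1].append(note)
    return dict(by_channel)
```

Excluding `PERCUSSION_CHANNEL` from the result leaves the melodic channels that
could serve as scaffold inputs.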

NEAT Drummer can input any combination of the 16 channels into the CPPN. That
is, given a MIDI song, NEAT Drummer generates a drum pattern as a function of
any subset of the preexisting bass, guitar, vocals, etc. Because the resulting
drum patterns are explicitly functions of the inputs, if part of a MIDI file is
input to the CPPN, the percussion follows the structure of that part.

4 Experimental Design

This paper includes two sets of experiments. The first set focuses on the
capability of the scaffolding approach to generate drum tracks. The second set
compares the scaffolding approach with several alternative approaches, both
interactive and supervised, to provide an objective validation of the
methodology.

Also, because music appreciation is largely subjective and auditory, the re-
sults of NEAT Drummer should be judged in part on that basis. Therefore,
MIDI files for every experiment reported in this paper are available online at
http://eplex.cs.ucf.edu/neatmusic/. We invite readers to listen to the
recordings and judge the natural quality of the percussion tracks discussed in
Sections 5 and 6.

4.1 Testing Scaffolding

The first set of experiments aims to determine whether drum tracks generated
for particular songs are appropriate and nontrivial. The hope is that they
respect the structure and transitions of the song yet do not mimic its instruments
superficially. Such sophisticated correspondence can confirm the capacity of
functional relationships to generate plausible, human-like accompaniment.

Specifically, the first two experiments in this set investigate what happens
when salient instrument channels are input alone to the CPPN, which generates
drum tracks for the folk songs Johnny Cope and Oh! Susanna. A follow-on
experiment explores the consequences of inputting both instrument channels and
conductors for the folk song Oh! Dem Golden Slippers. The question is whether
the conductors add a dimension of variation that is seamlessly combined with the
structure of the original song in the resultant drum track. A complex conductor
is then input by itself into a CPPN to isolate its effects and easily discern the
functional relationship between the conductor and its outputs.

4.2 Comparisons

The second set of experiments is designed to scrutinize the power of scaffolding
via input from the original song by attempting to achieve comparable output
drum tracks without providing the original song as input. The aim is to illustrate
the contribution of such scaffolding by investigating how other approaches fare
without it. To control specifically for the contribution of the scaffold, each such
attempt is still a variant of NEAT. That way, differences in performance are
attributable to representation and scaffolding.

In this spirit, first, ten 30-generation attempts are made to interactively
evolve accompanying drums to Johnny Cope with NEAT without the instrument
channels from the song input into the network. Instead, in the first five
attempts, the network is recurrent and inputs only a bias. These attempts
compare the capabilities of a recurrent network without any scaffolding to those of
the scaffolded networks. In the last five attempts, the network is feedforward
and provided only position-in-song as input. Typical best results are presented.

Second, three target-based experiments form a more objective comparison.
In these target-based runs, the aim is to reproduce a specific drum track that
was previously evolved with NEAT Drummer (i.e. with the scaffold provided)
as an accompaniment to Johnny Cope. This drum track is set as the target for
the target-based experiments, which do not have access to the scaffold.

Target-based runs rely on the same NEAT algorithm as NEAT Drummer;
however, the computer performs selection instead of a human user. Selection
is performed as in regular NEAT, wherein each individual in the population is
assigned a fitness based on the sum-squared error between the target pattern
and the attempted output:

$$f = \sqrt{\frac{\sum_{t=0}^{l} M^2 - \sum_{t=0}^{l} (x_t - y_t)^2}{l M^2}}, \qquad (1)$$

where M is the maximum possible error at any tick t, l is the number of ticks,
x_t is the target note value at tick t, and y_t is the output value of the network at
tick t. Note that if there are multiple output tracks, this expression is applied to
each and the fitness is the average. This fitness function is designed to approach
1.0 as the output more closely matches the target.
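
Equation (1) can be sketched in Python as follows. The sketch treats l as the number of ticks actually summed over, so a perfect match scores exactly 1.0 and an error of M at every tick scores 0.0. As described above, it averages the per-track scores over all output tracks. The function names and the default M = 1.0 (volumes in [0, 1]) are assumptions.

```python
import math


def track_fitness(target, output, max_error=1.0):
    """Equation (1) for one track: 1.0 for a perfect match, 0.0 at maximal error."""
    if len(target) != len(output):
        raise ValueError("target and output must cover the same ticks")
    sse = sum((x - y) ** 2 for x, y in zip(target, output))
    worst = len(target) * max_error ** 2  # l * M^2, the largest possible error
    # Clamp at zero in case an output strays more than M from its target.
    return math.sqrt(max(0.0, worst - sse) / worst)


def fitness(targets, outputs, max_error=1.0):
    """Average the per-track fitness over all drum output tracks."""
    scores = [track_fitness(t, o, max_error) for t, o in zip(targets, outputs)]
    return sum(scores) / len(scores)
```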

The main question is how hard it will be for NEAT to evolve the very same
rhythm that it evolved with the scaffold. Three alternative representations are
tested in this way:

• recurrent neural networks with only a bias input,
• feedforward networks with only a position-in-song conductor input, and
• feedforward networks with both a position-in-song conductor and a position-in-measure conductor input.

In these target-based experiments, NEAT is run with typical successful
parameter settings for regular non-interactive evolution (Stanley and Miikku-
lainen, 2002, 2004). In particular, the population size was 100 and probability
of adding a connection or node in NEAT was 3% and 5%, respectively. The
mutation power, i.e. the maximum magnitude of weight change, was 0.1 and
the probability of mutating an individual connection was 80%. The compatibil-
ity coefficients for determining to which species individuals belong (Stanley and
Miikkulainen, 2002) were c1 = 1.0, c2 = 1.0, and c3 = 0.4. The compatibility
threshold Ct was adjusted dynamically in increments of 0.5 to maintain a stable
equilibrium of eight species.
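
NEAT groups individuals into species using the compatibility distance of Stanley and Miikkulainen (2002):

δ = c1·E/N + c2·D/N + c3·W̄

Here E and D count excess and disjoint genes, W̄ is the mean weight difference of matching genes, and N is the number of genes in the larger genome. The sketch below computes δ with the coefficients listed above. It pairs this with one plausible rule that nudges Ct by 0.5 toward eight species. Two simplifications are assumptions: genomes are represented as dictionaries from innovation number to connection weight, and the adjustment rule is not NEAT Drummer's documented code.

```python
def compatibility(genes_a, genes_b, c1=1.0, c2=1.0, c3=0.4):
    """NEAT compatibility distance between two genomes.

    Each genome is assumed to map innovation number -> connection weight.
    """
    if not genes_a and not genes_b:
        return 0.0
    # Genes beyond the other genome's highest innovation number are excess.
    cutoff = min(max(genes_a, default=-1), max(genes_b, default=-1))
    matching = genes_a.keys() & genes_b.keys()
    non_matching = genes_a.keys() ^ genes_b.keys()
    excess = sum(1 for i in non_matching if i > cutoff)
    disjoint = len(non_matching) - excess
    n = max(len(genes_a), len(genes_b))
    w_bar = (sum(abs(genes_a[i] - genes_b[i]) for i in matching) / len(matching)
             if matching else 0.0)
    return c1 * excess / n + c2 * disjoint / n + c3 * w_bar


def adjust_threshold(threshold, num_species, target=8, step=0.5):
    """Nudge the compatibility threshold toward a target species count (assumed rule)."""
    if num_species < target:
        threshold -= step  # too few species: make membership stricter
    elif num_species > target:
        threshold += step  # too many species: make membership more lenient
    return threshold
```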

If any of these variants can evolve the target drum track, it will show that
the scaffold is not necessary to provide context. On the other hand, if none of
the representations can evolve the target, it will show the critical
contribution of the scaffold.

In summary, experimental results are divided into two parts: First, the
power of scaffolding is tested through interactive evolution; second, scaffolding
is compared to several variants of NEAT Drummer without scaffolding. The
next section details the results of evolving interactively with the scaffold.

5 Scaffolding Results

While NEAT Drummer can theoretically input a drum channel from a MIDI
file and thereby generate variations of the percussion, this section focuses on
drum tracks generated from inputting non-percussion instruments, like guitars
and bass. Thus the MIDI songs input in this section do not include drums in
their original form.

Results in this section are reported through figures that are designed to
demonstrate the relationship between the CPPN inputs and outputs as the
song progresses over time. In the figures that follow, the inputs are arranged in
rows at the bottom of each depiction and the outputs are the rows above. Time
moves from left to right and each discrete column represents a tick of the clock.
No instrument can play at a rate faster than the clock ticks. There are four ticks
per beat in all songs tested. A slightly thicker dividing line between columns
denotes a measure break. While all drum tracks include bass, snare, and hi-hat
outputs, the number and types of drum outputs are unlimited in principle as long
as the right sounds are available.

Recall that inputs are spikes; in the figures, their decays are depicted as
decreasing darkness. In contrast, outputs represent volumes, wherein darker
shading indicates higher volume. The main difference between inputs and out-
puts is that a single note in the input may straddle several columns during its
decay. Outputs, on the other hand, are played as separate notes for every solid
column. For an output drum to last for more than a single tick before the next
drum attack, it must be followed by white (empty) columns.
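
The difference between decaying input spikes and per-tick output volumes can be made concrete with a hypothetical helper that converts one output row into drum strikes. Every shaded column becomes its own strike, so a drum rings longer only when empty columns follow it. The silence threshold, the scaling of volume to a MIDI velocity from 1 to 127, and the function name are assumptions. The default of four ticks per beat matches the songs tested.

```python
def output_to_hits(volumes, silence=0.0, ticks_per_beat=4):
    """Convert one drum output row into separate strikes.

    volumes: one output value in [0, 1] per tick. Every tick whose value is
    above the (assumed) `silence` threshold becomes its own strike; silent
    ticks let the previous strike ring. Returns (tick, beat, velocity)
    tuples with MIDI velocities clamped to 1..127.
    """
    hits = []
    for tick, v in enumerate(volumes):
        if v > silence:
            velocity = max(1, min(127, round(v * 127)))
            hits.append((tick, tick / ticks_per_beat, velocity))
    return hits
```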

5.1 Inputting Instrument Channels Alone

Figure 5 shows individuals from generations one and 11 generated for the folk
song Johnny Cope. The relationship between the bass, hi-hat, and snare
and the three input channels is complex because each drum is related to all
three inputs. Note, however, that the instrumental patterns in measures three
and four are highly related though not identical. Slight differences exist between
the piano pattern in measure three and measure four; this difference is reflected
in the snare in both generations one and 11, which both slightly differ between
the early parts of measures three and four. Thus, the drum pattern’s subtle
variation is correlated with the music because of their coupling, which evokes a
subjective sense of appropriate style.

At measure 23, the song changes sharply by eliminating the piano part.
Consequently, the CPPN outputs also diverge from their previous consistent
motifs. This strongly coupled divergence that is carried both in the tune and in
the drums creates a sense of purposeful coordination, again yielding a natural,
sophisticated feel. In this way, the functional relationship represented by the
CPPN causes the drums to follow the contours of the music seamlessly.

Generation 11, which evolved 12 additional connections and six additional
nodes, reacts particularly strongly to the elimination of the piano by significantly
altering its overall pattern. In generation one, the shift is less dramatic, showing
how the user interactively pushed evolution towards a sharper transition over
those ten generations. Generation 11 also elaborates on the snare, making it
harder-hitting in the later measures than in earlier ones.

Results from generation 25 of Oh! Susanna are shown in figure 6. NEAT
Drummer produces similarly natural and style-appropriate rhythms for this song
as well, suggesting its generality. Because style is inherent in the original song’s
channels, it transfers seamlessly to the drum track without any need for explicit
stylistic rules. The result is an entertaining sound that could be attached to the
original instrumental tracks without raising suspicion.

It is interesting to listen to the songs with their generated drum tracks,
which makes it possible to judge their subjective quality (a critical aspect of
musical appreciation). In the authors’ experience (which the reader can also
judge), the generated tracks sound natural and lack the usual “mechanical”
quality of computer-generated music. Rather than repeating stock patterns,
core motifs subtly vary and are interspersed with occasional unique flourishes.
The personality of these variations is a byproduct of the personality that is
implicit in the song itself, simply functionally transformed into a different local
motif. This result further demonstrates that it is possible to inherit the natural
character of one pattern by deriving another from it. Of course, the evolved
song also in part reflects the tastes of the human user.

Figure 5: Johnny Cope Results. Results are depicted from two different
generations at two different parts of the song. The inputs from the original song,
which are always the same, are shown at bottom. Note the relationship between
the inputs and the outputs, and between the first generation and the eleventh,
which elaborates on the former. The motif in measures three and four is typical
of the first part of the song until measure 23, when it switches to a different
motif in both generations. Thus, the figure gives a sense of the two predominant
drum riffs exhibited in both generations. The main conclusion is that the output
is a function of the input that inherits its underlying style and character. (These
tracks are available at http://eplex.cs.ucf.edu/neatmusic/)

Figure 6: Oh! Susanna Outputs. This pattern from measures three through
six of Oh! Susanna is from generation 25 of evolution. The network evolved 15
hidden nodes and 41 connections. Near the end of measure four is a particularly
improvisational riff in the snare that transitions to measure five. This riff
is caused by variation in the piano and other inputs at the same time. As with
Johnny Cope, the drum pattern sounds natural and styled correctly for this
upbeat song. (This track is available at http://eplex.cs.ucf.edu/neatmusic/)

5.2 Inputting Instrument Channels and Conductors

Figure 7 highlights the effect of a conductor input on drum tracks produced
for the song Oh! Dem Golden Slippers, which has a very similar beginning and
end; all the measures in these parts are similar. Thus the question is whether a
conductor can introduce a sense of progression into the drums even though the
song itself undergoes little discernible progression between the start and finish.

Figure 7(a) shows example drum output for this song without any additional
conductor. Thus, with only the song’s channels as inputs, the resultant drum
pattern is also highly repetitive; the pattern in measures two and three is largely
preserved much later in measures 38 and 39 (figure 7a). Yet when a
position-in-song conductor (figure 4a) is added as an input, the difference in drum
patterns between measures two and three and measures 38 and 39 is dramatic,
showing the powerful effect of the simple conductor (figure 7b). Nevertheless, even
though the drum pattern exhibits a sharply different motif at these two similar
parts of the song, it sounds appropriate and sophisticated in both parts because
it is a function of both the conductor and the instrument channels. Thus it is a
seamless variation on both influences simultaneously.

(a) Without Conductor

(b) With Position-in-song Conductor

(c) With Position-in-song and Position-in-measure Conductors

Figure 7: Oh! Dem Golden Slippers with and without Conductors.
Output drum patterns are shown for the song when no conductor is input (a)
and when a position-in-song conductor is input (b, shown at bottom). The
difference in resultant drum patterns shows that the conductor imposes a
temporal progression on the drum track that does not derive from the structure
of the song itself, demonstrating the power of conductors to subtly shape the
structure of music. Finally, when conductors indicating both position-in-song
and position-in-measure are input simultaneously (c), progression is enhanced
both throughout the song and within each measure. (These tracks are
available at http://eplex.cs.ucf.edu/neatmusic/)

Figure 8: Complex Conductor. The conductor, which follows the pattern
quarter quarter half, is shown at bottom. Two three-part drum tracks that are
functions of this conductor are shown above it. While the two drum tracks
differ, they are both constrained by the underlying motif of the conductor.
(These tracks are available at http://eplex.cs.ucf.edu/neatmusic/)

It is also possible to combine multiple conductors to affect the structure of
the output in more than one way. Figure 7(c) shows the impact of inputting both
the time in the song (figure 4a) and the time in the measure (figure 4b) together.
The result is that not only do the later drum patterns differ from the earlier ones,
but the interior of each measure varies in part independently of the instrumental
scaffold. This effect is subtle because there are five instrumental tracks already
influencing the pattern in each measure. However, closely comparing the drum
measure patterns in figure 7(c) with those in figures 7(a) and 7(b) does reveal a
discernible difference.
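
Here is a minimal sketch of the two conductors, assuming that both are simple linear ramps. Position-in-song rises once from 0 to 1 over the whole piece. Position-in-measure rises from 0 toward 1 and restarts at every measure break. The exact shapes in figure 4 may differ, and the function names are hypothetical.

```python
def position_in_song(num_ticks):
    """Assumed linear ramp from 0.0 at the first tick to 1.0 at the last."""
    if num_ticks <= 1:
        return [0.0] * num_ticks
    return [t / (num_ticks - 1) for t in range(num_ticks)]


def position_in_measure(num_ticks, ticks_per_measure):
    """Assumed linear ramp that restarts at 0.0 at every measure break."""
    return [(t % ticks_per_measure) / ticks_per_measure for t in range(num_ticks)]
```

With four ticks per beat and an assumed four beats per measure, `position_in_measure(n, 16)` would give the second conductor. At each tick, both rows would be fed to the CPPN alongside the instrument channels.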

Finally, figure 8 isolates the effect of a single complex conductor. The aim
is to show explicitly how the output of the CPPN is influenced by the incoming
conductor, which expresses the same quarter quarter half pattern as in figure
3. Thus, the outputs of two CPPNs that both take the same conductor are
displayed for comparison. The main result is that the patterns of the two three-
part tracks are both closely tied to the pattern of the conductor, wherein two
short events are always followed by a long one. Yet, within that framework, the
patterns nevertheless differ significantly, illustrating the idea that a conductor is
an implicit guide above which the pattern is realized, even if there is no explicit
song at all.
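
A conductor like the quarter quarter half pattern can be sketched by repeating a list of note durations measured in ticks. With four ticks per beat, a quarter note spans 4 ticks and a half note spans 8, so one cycle fills a 16-tick measure, assuming four beats per measure. Each note is rendered as a spike that decays linearly over its own duration. This is an illustrative choice in the spirit of the decaying input spikes, not necessarily the conductor's exact shape in figure 3. The function name is hypothetical.

```python
def rhythmic_conductor(durations, num_ticks):
    """Repeat a rhythm given as note durations in ticks.

    Assumed shape: each note is a spike of 1.0 at its onset that decays
    linearly toward zero over the note's own duration.
    """
    if not durations or any(d <= 0 for d in durations):
        raise ValueError("durations must be positive tick counts")
    values = []
    while len(values) < num_ticks:
        for d in durations:
            values.extend(1.0 - k / d for k in range(d))
    return values[:num_ticks]


QUARTER, HALF = 4, 8  # ticks, assuming four ticks per beat
```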

The next section exhibits the evolved CPPNs that produce the drum tracks
in this section.

5.3 Evolved CPPNs

Figure 9 shows the CPPNs that were evolved for each of the evolved drum
tracks in Sections 5.1 and 5.2. These networks range in complexity from 1 to 15
hidden nodes. Interestingly, the subjective quality of the accompaniment does
not seem to correlate with the complexity of the network. This perception makes
sense because the functional relationship to the original instrument channels
guarantees a tight coordination between drums and instruments. Thus creating
a plausible coordination does not require significant complexity. Furthermore,
if the underlying instrument channels themselves embody complex motifs and
progressions, then the drums inherit the same complexity even if the CPPN
that relates them is not itself complex.

What CPPN complexity affords, rather, is a more complex relationship that
is realized through more elaborate covariation. This subjective effect is subtle
yet perceptible, indicating that more sophisticated compositions may convey
to the human ear the complexity of the function relating their parts.

Yet the most important conclusion is that complexity is not essential to the
CPPN that relates one part to another because the complexity need only exist
originally in the preexisting parts. To a large extent, that original complexity
is transferred through the CPPN to any affiliated drum pattern.
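
A tiny per-tick evaluator illustrates how a simple CPPN can inherit complexity from its inputs. Because each drum output is a fixed function of the inputs at that tick, repeated input measures yield repeated drums, and changes in the input yield changes in the drums. The evaluator assumes textbook forms for the activation functions listed in figure 9: sigmoid, Gaussian, linear, and a multiplication node that multiplies its weighted inputs. The node names and weights in any example network are invented for illustration, and NEAT Drummer's exact activation definitions may differ.

```python
import math

# Assumed textbook forms of the activation functions listed in figure 9.
ACTIVATIONS = {
    "S": lambda x: 1.0 / (1.0 + math.exp(-x)),  # sigmoid
    "G": lambda x: math.exp(-x * x),            # Gaussian
    "L": lambda x: x,                           # linear
}


def evaluate(nodes, inputs):
    """Evaluate a feedforward CPPN at one tick.

    nodes: list of (name, activation, [(source, weight), ...]) in
    topological order; activation "M" multiplies its weighted inputs
    instead of summing them.
    inputs: dict of input name -> value at this tick.
    """
    values = dict(inputs)
    for name, act, links in nodes:
        terms = [values[src] * w for src, w in links]
        if act == "M":
            values[name] = math.prod(terms)
        else:
            values[name] = ACTIVATIONS[act](sum(terms))
    return values
```

Evaluating the same network once per tick, for example `[evaluate(net, {"piano": p, "bass": b, "bias": 1.0})["snare"] for p, b in zip(piano, bass)]` with hypothetical `piano` and `bass` rows, produces a drum row that follows the input rows tick by tick.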

The next section presents the results from the comparative experiments.

6 Comparison Results

While the results in Section 5 establish the quality of the tracks produced
through scaffolding and the power of conductors to shape the output pattern,
the question remains what is lost if the scaffold is not provided, as in prior au-
tomated music generation techniques. Can similar accompaniment be produced
without a scaffold? This section validates the role of the scaffold by answering
this question.

Typical results reported in this section can be heard at
http://eplex.cs.ucf.edu/neatmusic/.

6.1 Interactive Evolution Without Scaffolding

As described in Section 4.2, in the first set of comparisons, recurrent ANNs
with only a bias input and feedforward ANNs with position-in-song as input
were evolved interactively with no other inputs to accompany Johnny Cope.
Five 30-generation runs of both configurations were completed.

Figure 10 shows typical best results from these runs. The best results from
each set reveal a distinct difference between feedforward functions of
position-in-song and evolved recurrent networks. The feedforward patterns never
develop beyond monotonous unbroken gradients that gradually vary from loud to
soft, and sometimes back again (figure 10a). These gradients often span the
length of the song, or a large extent of it, and do not respect the measure
structure. Thus, overall, position-in-song alone is not enough to allow
interactive evolution to compete with networks evolved with the scaffold. This
result makes sense because the only input is the position in the song, so the
only way to develop a significant number of changes is to add many hidden nodes,
which would take far longer than 30 generations. Also, because these CPPNs have
no knowledge of when measures begin or end, the changes that do occur are
difficult to evolve to align with the measure structure.

(a) Johnny Cope Gen. 1
(b) Johnny Cope Gen. 11
(c) Oh! Susanna
(d) Oh! Dem Golden without Conductors
(e) Oh! Dem Golden with the Time Conductor
(f) Oh! Dem Golden with Measure and Time
(g) Quarter Quarter Half Conductor
(h) Quarter Quarter Half Conductor

Figure 9: CPPN Drum Track Generators. The evolved CPPNs that produce
every drum track shown in Sections 5.1 and 5.2 are depicted. While the
complexities vary, the quality of the output is similar because each produces a
function of a preexisting song, thereby inheriting its qualities. Activation
functions are denoted by S for sigmoid, M for multiplication, G for Gaussian, and
L for linear.

(a) Typical Best Feedforward

(b) Typical Best Recurrent

Figure 10: Typical Best Results from Interactive Evolution Without
Scaffolding. Best results from both types of representations tested are
depicted. The feedforward network inputs only position-in-song and produces
unremarkable gradient patterns that follow neither the structure of music nor
that of Johnny Cope. The recurrent network produces repeating motifs, but
they are not synchronized with the measure and they do not vary with the
song. In this way, removing the scaffold removes a major advantage of NEAT
Drummer.

In contrast, patterns interactively evolved with recurrent networks do display
more complexity and more frequent changes over time (figure 10b). Because
feedback can lead to complex oscillations without the need for many hidden
nodes, recurrent networks are better suited to producing complex variation early
in evolution. However, the drum patterns are difficult to evolve to align with
the contours of the music itself because the recurrent network is unaware of the
music without the scaffold. Furthermore, these networks suffer from the same
problem with measure structure as the feedforward networks: Even though
some motifs repeat, they repeat at haphazard times relative to each measure,
producing a disorganized aesthetic. For example, the bass drum in figure 10(b)
is hit several times at the start of the first measure, but by the fourth measure,
this motif has moved to the middle of the measure.

The overall result is that while the recurrent networks produce more com-
plexity, the patterns evolved by both networks are not synchronized with Johnny
Cope and therefore sound disjointed, highlighting the critical role of the scaffold
in tethering the accompaniment to the song itself.

6.2 Targeted Evolution Without Scaffolding

In this experiment, the same two types of networks as in the previous section,
i.e. one recurrent and one feedforward with position-in-song input, were evolved
to match a target (figure 11c), which is a rhythm for Johnny Cope output by
NEAT Drummer with the scaffold in the 11th generation. Clearly, the scaffold
provides an advantage, but the question addressed by this experiment is how
hard it is without the scaffold to approximate the same output even when the
precise target rhythm is known a priori. Because it is also known that the target
rhythm took exactly 11 generations to evolve when the scaffold was provided,
the number of generations it takes these variant networks to produce the same
output can be compared. Each variant attempted to match the target in 20
separate runs.

Figure 11 shows typical best results from these runs after 1,600 NEAT
generations with a population size of 100. It turns out that the feedforward network
suffered similar problems breaking away from simple gradients as in interactive
evolution. While such a network theoretically could approximate the target pat-
tern, it always became trapped on a local optimum because it is easy to reach a
fairly high fitness simply by approximating the general loudness of drums over
large contiguous periods of time. In other words, instead of attempting to dis-
cover every individual beat, it discovers their average energy and paints that
energy level across large swaths of time. Thus this type of network is demon-
strably ill-suited to producing such temporal patterns, either through interactive
evolution or target-based evolution.

However, interestingly, unlike in the 30-generation interactive experiment,
after 1600 generations the recurrent network typically produces a repeating
pattern that does synchronize with the measure structure. Thus one conclusion
is that recurrent networks can learn musical structure given sufficient time.
However, unlike the target pattern, which displays distinct variations (e.g. in the
latter half of figure 11c), the patterns produced by the best recurrent networks
tend to repeat the same measure pattern throughout the song after the first
generation (figure 11b). Also, this repeating motif is only vaguely reminiscent
of the target, which is likely because the recurrent network has trouble producing
the subtle repetition with variation that the target inherited from its scaffold
when it was evolved.

Because the feedforward results were disappointing, a third set of 20 runs
was attempted with a feedforward network that receives both position-in-song
and position-in-measure conductors as input. The idea is to relieve the network
of the need to discover the measure structure of music on its own, exploiting
the power of conductors. In fact, as figure 12 shows, providing position-in-measure
typically improves the complexity of the output dramatically and allows it to
break out of the local optima that trap such networks without position-in-measure.
Indeed, the feedforward output resembles the output of the recurrent
network and respects measure structure, demonstrating the contribution of the
additional conductor. Yet because even such conductors lack the song-specific
information contained in the scaffold, the output pattern, like that of the
recurrent network, is only marginally reminiscent of the actual target pattern,
and is also more repetitive.

(a) Typical Feedforward Champion

(b) Typical Recurrent Champion

(c) Target Pattern

Figure 11: Typical Best Results from Target-based Evolution Without
Scaffolding. Feedforward (a) and recurrent (b) results are depicted and the
target pattern is shown in (c). The aim was to match the target. As this
figure shows, neither variant successfully matched the target (which was evolved
in NEAT Drummer in 11 generations) after 1,600 generations, although the
recurrent variant evolved more complex patterns. This result confirms again
the importance of the scaffold.

Figure 12: Typical Result from Target-based Evolution with Position-in-song
and Position-in-measure Inputs. The improvement in pattern
complexity, and the adherence to measure structure, are apparent in comparison
to figure 11a. By providing position-in-measure as input, evolution can easily
produce patterns that follow the timing of measures, demonstrating the power of
conductor inputs. However, without the scaffolding inputs from Johnny Cope,
the drum pattern still does not match the target despite its regular structure.

Thus, after 1,600 generations, none of the variant networks are able to suc-
cessfully match a target pattern that was discovered in only 11 generations. Fig-
ure 13 summarizes this result by depicting fitness over time in the three variants,
each averaged over 20 runs. Whereas a fitness of 1.0 denotes a perfect match,
none of the variants reach a fitness of even 0.7. Interestingly, despite the appar-
ent aesthetic inferiority of the feedforward network with only position-in-song,
on average its fitness approaches that of the other two variants, demonstrating
why local optima characterized by smooth volume gradients attract it.

Although their final fitness levels are not far apart, the differences between
some of the variants are significant. In particular, recurrent networks produce
significantly higher fitness than position-in-song alone throughout the run
(p < 0.05), and feedforward with the position-in-measure input outperforms
feedforward without it (p < 0.05). However, interestingly, by the end of each
run, recurrent networks on average are not significantly better than feedforward
with position-in-measure, showing that feedforward networks can match
the performance of recurrent networks on this task when provided information
on the structure of music. Most importantly, however, none of the variants can
produce a pattern close to the target within 1,600 generations.

Figure 13: Fitness Over Time of the Three Variants in Target-based
Evolution. The increase in fitness over 1,600 generations of evolving the three
variant representations is shown, averaged over 20 runs for each. A perfect
fitness of 1.0 would mean the target is matched perfectly. None of the variants
exceed 0.7 fitness because the search is too unconstrained without the scaffold.

The main conclusion from both interactive and target-based comparisons is
thus that the scaffold provides critical infrastructure. In effect, it constrains
the search to patterns that relate to the original song. If accompaniment is to
be evolved for an existing song, inputs from that song should be provided as
a scaffold. Without such context, the accompaniment becomes decoupled from
the song regardless of the representation.

A further result is that conductors make it easier to discover patterns that
respect musical structure. While the recurrent network does eventually discover
a measure-synchronized motif in the target-based runs, it takes hundreds of
generations to achieve such synchrony. On the other hand, when position-in-measure
is provided as a conductor, measure structure is respected from the start.

Overall, this set of experiments confirms the contribution of the scaffold and
suggests that it should be a standard facet of any network-based attempt to
generate musical accompaniment.

7 Discussion

While the sheer size of a composition containing multiple interrelated tracks
suggests its complexity, the results with NEAT Drummer show that by encoding
some parts as functions of others, much of the apparent complexity is removed.
While each drum track contains hundreds of drum strikes over several minutes,
the networks that represent them contain on average only 22 connections. The
most salient feature of drum tracks generated in this way is that they sound
natural, hinting that the functional relationship may reflect a realistic aspect of
human musical creativity.

7.1 Implications of Scaffolding

The idea of scaffolding, i.e. deriving several parts from a preexisting pattern,
means that the most profound effort in musical creativity can be largely con-
centrated on a relatively small part of the overall composition, which can then
provide a scaffold for other parts. The interrelationships among the different in-
struments of a song can be expressed as functions of one or more scaffold tracks,
which is the approach taken in this paper. While effective, it is interesting that
the direction of the relationship could also go the other way: Melody and
harmony can be expressed as a function of rhythm. In fact, harmony can also be a
function of melody and vice versa. Thus there is no apparent essential starting
point, though some may be more natural than others depending on the context.
Nevertheless, it is clear that something must form the scaffold from which all
other accompaniment is derived. Thus, as long as some seed is introduced, the
accompaniment can follow smoothly.

The main contribution of this paper is thus to advance automated music gen-
eration by introducing an effective method for representing some parts of a song
with respect to others in a connectionist framework. This approach significantly
simplifies the problem of constraining accompaniment appropriately. Neverthe-
less, of course, different styles of accompaniment may be more or less difficult
to discover, even with the scaffold. For example, can convincing jazz walking
bass be generated even in the context of other jazz instrumental tracks? Cer-
tainly it is possible that the interactive evolution process can discover a function
that expresses a particular style, yet the likelihood of such a discovery depends
on to what extent the style is already embodied by the existing scaffold. The
extent to which the scaffold contains essential stylistic cues, combined with the
complexity of the function that would create the right style in the absence of
such cues, determines the difficulty of the discovery. Thus this work does not
diminish the considerable human effort required to acquire specialized styles of
accompaniment.

Another intriguing possibility is that specific conductors can be developed
that constrain output to a desired style. Thus, while discovering a style based
on the scaffold alone may be difficult in some cases, providing both the scaffold
and carefully chosen conductors can potentially simplify the search, just as
conductors in this paper convey a priori measure and beat structure.

Interestingly, once a CPPN is found that yields a particular accompaniment
style, it can potentially be reused with other tracks. Thus, once discovered,
stylistic accompaniment may be transferable, which is an important topic for
future research.

Overall, then, while some styles of accompaniment are probably harder to
discover than others, the scaffold reduces the space that must be searched.

Nevertheless, although automated music generation may benefit from this
principle, it still does not solve the fundamental problem of generating the
scaffold itself. What kind of process can create the initial pattern? Interestingly,
it is possible that even an individual instrumental part can be generated from
an even more abstract underlying scaffold, i.e. one that is never actually heard,
like the conductors in this paper. These abstract patterns represent musical
structures below the level of the explicit notes and pauses. Rather, they are the
shape of the fabric upon which such notes are woven. An interesting hypothesis
is that human composers and songwriters construct such hidden “conductors” at a
cognitive level before the salient musical pattern emerges as notes and
rhythms. Although it may be difficult to discern, several simple underlying
functional motifs that cannot be expressed in musical notation may underlie
apparently richly textured musical masterpieces. Perhaps this hidden factor plays a role in
our appreciation of music; when the many threads of a symphony stand out in
their synchronized majesty, perhaps the brain is appreciating the simple hidden
scaffold that unifies them in purpose.

Another complementary possibility is that even a long serial progression of
notes is actually a series of motifs that are functions of each other. This view
suggests that a scaffold or conductor underlying a long melody may itself be
short and compact. These considerations lead to the philosophical question of
how little is necessary to encode a “human essence” that functionally related
parts can inherit. Perhaps only a very simple and short hidden function is all
that is essential to the subsequent unfolding of a symphony.

The size of the smallest necessary seed is practically important because the
smaller it is, the easier it will be in the future to create entire compositions
automatically. For example, if an entire complicated melody can be generated
from a simple initial function, and the rhythm and harmony can be generated
from the melody, then a future system might require a human to merely suggest
the barest motif, such as a gradual attack followed by several step-wise decays,
and from that point elaborate an entire composition through scaffolding.
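A seed like the one just described can be written down directly. The sketch below is hypothetical throughout:

- `seed` encodes a gradual attack followed by three step-wise decays.
- `melody` elaborates the seed by repeating the motif and transposing each repetition up two semitones.
- `onsets` derives a rhythm from the melody by striking wherever the pitch changes.

It illustrates only the layering of parts as functions of one another, not a method from this paper.

```python
def seed(phase):
    """Hypothetical seed motif over one cycle, phase in [0, 1)."""
    if phase < 0.25:
        return phase / 0.25                 # gradual attack from 0 toward 1
    step = int((phase - 0.25) / 0.25)      # 0, 1 or 2
    return 1.0 - 0.25 * (step + 1)         # step-wise decays: 0.75, 0.5, 0.25


def melody(num_ticks, motif_len=8, base_pitch=60):
    """Elaborate the seed into MIDI pitches, transposing each repetition up 2 semitones."""
    return [
        base_pitch + round(12 * seed((t % motif_len) / motif_len)) + 2 * (t // motif_len)
        for t in range(num_ticks)
    ]


def onsets(pitches):
    """Derive a rhythm from the melody: strike on the first tick and on every pitch change."""
    return [t for t, p in enumerate(pitches) if t == 0 or p != pitches[t - 1]]


tune = melody(16)
rhythm = onsets(tune)
```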

Another important ingredient of NEAT Drummer that is not automated is
the user’s input. The product of a NEAT Drummer session thus in part reflects
the creativity of the user, and not just the search algorithm. It is an interesting
question whether the user can be entirely eliminated, allowing the computer
to compose completely on its own. While NEAT Drummer does not eliminate the
need for human input, it does eliminate the need for human expertise by
shifting the creative focus from composition to opinion (i.e. what sounds
best). In this way, a significant obstacle to widespread, high-quality musical
creativity is removed.
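The shift from composition to opinion can be sketched as a generic interactive evolution loop in which the human only chooses among candidates. This is a simplification, not NEAT Drummer's algorithm:

- Candidates are plain weight vectors.
- Mutation only perturbs weights, whereas NEAT also adds new nodes and connections.
- The listener is modeled by a hypothetical `choose` callback that returns the index of the preferred candidate.

```python
import random


def mutate(weights, rng, sigma=0.3):
    """Perturb every weight with Gaussian noise (NEAT would also add nodes and links)."""
    return [w + rng.gauss(0.0, sigma) for w in weights]


def interactive_evolution(choose, num_weights=4, pop_size=6, generations=5, seed=0):
    """Hypothetical IEC loop in which a chooser, not a fitness function, drives selection.

    `choose(population)` returns the index of the preferred candidate; in a real
    session it would play each candidate and wait for the user's pick.
    """
    rng = random.Random(seed)
    parent = [rng.uniform(-1.0, 1.0) for _ in range(num_weights)]
    for _ in range(generations):
        # The parent survives unchanged, so a liked candidate is never lost.
        population = [parent] + [mutate(parent, rng) for _ in range(pop_size - 1)]
        parent = population[choose(population)]
    return parent


def prefers_larger_first_weight(population):
    """Scripted stand-in for a listener so the loop can run unattended."""
    return max(range(len(population)), key=lambda i: population[i][0])


best = interactive_evolution(prefers_larger_first_weight)
```

Keeping the parent in each generation means the user's choice can never get worse from one generation to the next. This matters when every evaluation costs a human listening session.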

Thus this work opens a promising new avenue to computer-generated music, one
that raises interesting questions about how humans encode and generate music.

7.2 The Role of Recurrence in Connectionist Music

The experiments comparing recurrent networks to scaffolded CPPNs in Section 6
yield interesting insight into the capabilities and limitations of recurrent
neural networks applied to generating musical accompaniment. In particular,
recurrent ANNs do produce complex motifs, but it is difficult to synchronize
them to musical structure. A CPPN with a position-in-measure conductor
immediately produces patterns that respect measures, whereas recurrent networks
take hundreds of generations to achieve the same, which is too long for a human
performing interactive evolution.
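A small sketch shows why a position-in-measure conductor helps. Any feedforward function whose only time input is position-in-measure must repeat exactly once per measure, so measure alignment comes for free rather than having to be evolved. The network below uses arbitrary hand-set weights, which are assumptions rather than evolved values. Its two hidden nodes use sine and Gaussian activations of the kind CPPNs commonly employ.

```python
import math

TICKS_PER_MEASURE = 16  # assumed: four beats of four ticks


def position_in_measure(tick):
    return (tick % TICKS_PER_MEASURE) / TICKS_PER_MEASURE


def feedforward_pattern(num_ticks, w_sin=3.0, w_gauss=-1.0, bias=0.5):
    """Binary drum pattern from a fixed feedforward net of position-in-measure only."""
    pattern = []
    for t in range(num_ticks):
        p = position_in_measure(t)
        h_sin = math.sin(4 * math.pi * p)              # two cycles per measure
        h_gauss = math.exp(-((p - 0.5) ** 2) / 0.02)   # bump centered mid-measure
        pattern.append(1 if w_sin * h_sin + w_gauss * h_gauss + bias > 0 else 0)
    return pattern


pattern = feedforward_pattern(4 * TICKS_PER_MEASURE)
```

A recurrent network has no such guarantee. Whether its output repeats with the measure depends on its internal dynamics, which must themselves be discovered by evolution.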

This result raises the question of whether it is biologically realistic for
connectionist models of music generation to rely upon recurrence as the main
mechanism of musical encoding. It is also plausible that the brain stores music
in part as simple functions that are layered and composed one upon another.
Notably, evolving a recurrent network (figure 11b) and evolving a feedforward
network that is a function of both position-in-song and position-in-measure
(figure 12) produced results of similar quality. Thus it remains an open
question whether the best infrastructure for generating temporal patterns is
recurrence or two simple timing signals upon which feedforward functions can be
built. It may also be a combination of the two.
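The second alternative, two timing signals feeding a feedforward function, can be sketched as follows. The function `drum_velocity` and its hand-picked form are hypothetical. Position-in-measure places the hits on the beat, while position-in-song slowly raises their velocity, which gives variation across the song without any recurrent state.

```python
import math

TICKS_PER_BEAT = 4  # assumed resolution
BEATS_PER_MEASURE = 4
TICKS_PER_MEASURE = TICKS_PER_BEAT * BEATS_PER_MEASURE


def timing_signals(tick, song_ticks):
    """Return (position-in-song, position-in-measure), each in [0, 1)."""
    return tick / song_ticks, (tick % TICKS_PER_MEASURE) / TICKS_PER_MEASURE


def drum_velocity(song_pos, measure_pos):
    """Feedforward function of both signals; no state is carried between ticks."""
    on_beat = math.cos(2 * math.pi * BEATS_PER_MEASURE * measure_pos) > 0.5
    swell = 0.4 + 0.6 * song_pos  # hits grow louder as the song progresses
    return swell if on_beat else 0.0


song_ticks = 4 * TICKS_PER_MEASURE
track = [drum_velocity(*timing_signals(t, song_ticks)) for t in range(song_ticks)]
```

Each tick is computed independently, so the track could be rendered in any order. A recurrent network lacks this property, because its output at one tick depends on all earlier ticks.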

8 Conclusion

This paper argued that human musicians can improvise and compose vast and
complex accompaniments almost instantaneously because they are in effect
generating a simple function that relates one instrument’s part to another’s.
This idea was implemented in a program called NEAT Drummer that generates novel
drum tracks for existing MIDI songs. Furthermore, the paper introduced the idea
of a conductor, i.e. a simple hidden function that affects the overall pattern
of music. The results demonstrated the viability of this model of musical
creativity: the generated drum tracks tightly follow the contours of real songs
yet remain nontrivial accompaniments. The conductors seamlessly introduced
variational motifs over and above those already in the existing song
structure, creating a new way for humans with little musical expertise to
control the overall structure of a song. The main conclusion is that a
significant portion of musical creativity may be explained by the functional
relationships between the different parts of a song.

Acknowledgments

Special thanks to Michael Rosario for his prior work at the University of Central
Florida creating the software infrastructure for NEAT Drummer. Special thanks
also to Barry Taylor for granting special permission to utilize his own MIDI
productions of folk music in this work. Barry Taylor originally sequenced Johnny
Cope, Oh! Susanna, and Oh! Dem Golden Slippers (all without percussion),
which are the three songs for which drum tracks were generated in Sections 5
and 6. This research was supported in part by NSF grants IIS-REU 0647120
and IIS-REU 0647018.

References

Barrett, F. J. (1998). Coda: Creativity and improvisation in jazz and orga-
nizations: Implications for organizational learning. Organization Science,
9(5):605–622. Special Issue: Jazz Improvisation and Organizing.

Berliner, P. F. (1994). Thinking in Jazz: The Infinite Art of Improvisation.
(CSE) Chicago Studies in Ethnomusicology. The University of Chicago
Press, Chicago.

Biles, J. A. (1999). Life with GenJam: Interacting with a musical IGA. In IEEE
International Conference on Systems, Man, and Cybernetics, pages 652–656.

Biles, J. A. (2007). Evolutionary computation for musical tasks. In Miranda,
E. R. and Biles, J. A., editors, Evolutionary Computer Music, chapter 2,
pages 28–51. Springer-Verlag New York, Inc., Secaucus, NJ, USA.

Chen, C.-C. and Miikkulainen, R. (2001). Creating melodies with evolving
recurrent neural networks. In Proceedings of the INNS-IEEE International
Joint Conference on Neural Networks, pages 2241–2246.

Dawkins, R. (1986). The Blind Watchmaker. Longman, Essex, U.K.

de Mantaras, R. L. and Arcos, J. L. (2002). AI and music: From composition to
expressive performance. AI Magazine, 23(3):43–57.

Deutsch, O. E. (1965). Mozart: A Documentary Biography. Stanford University
Press, Stanford, California. 59.

Dolson, M. (1989). Machine Tongues XII: Neural networks. Computer Music
Journal, 13(3):28–40.

Furuta, H., Maeda, K., and Watanabe, E. (1995). Application of genetic
algorithm to aesthetic design of bridge structures. Microcomput. Civil Eng.,
10(6):415–421.

Gomez, F. and Miikkulainen, R. (1999). Solving non-Markovian control
tasks with neuroevolution. In IJCAI-99, pages 1356–1361. Morgan Kaufmann.

Husbands, P., Copley, P., Eldridge, A., and Mandelis, J. (2007). An introduction
to evolutionary computing for musicians. In Miranda, E. R. and Biles, J. A.,
editors, Evolutionary Computer Music, chapter 1, pages 1–27. Springer-
Verlag New York, Inc., Secaucus, NJ, USA.

Hymer, S. (1990). On inspiration. In Stern, E. M., editor, Psychotherapy and
the Widowed Patient. The Haworth Press, Inc., New Rochelle, New York.

Johanson, B. and Poli, R. (1998). GP-Music: An interactive genetic programming
system for music generation with automated fitness raters. In Proceedings of
the Third Annual Conference: Genetic Programming, pages 181–186.

Katz, P. and Longden, S. (1983). The jam session: A study of spontaneous group
process. In Middleman, R., editor, Activities and Action in Groupwork,
chapter 3, pages 37–52. The Haworth Press, Inc., Binghamton, NY.

Kuriyama, K. and Terano, T. (1997). Interactive story composition support
by genetic algorithms. In World Conf. Artificial Intelligence in Education,
pages 615–617, Kobe, Japan.

Lindenmayer, A. (1968). Mathematical models for cellular interaction in de-
velopment parts I and II. Journal of Theoretical Biology, 18:280–299 and
300–315.

McCormack, J. (2005). Open problems in evolutionary music and art. In Pro-
ceedings of Applications of Evolutionary Computing, (EvoMUSART 2005),
volume 3449 of Lecture Notes in Computer Science, pages 428–436, Berlin,
Germany. Springer Verlag.

Nelson, G. L. (1993). Sonomorphs: An application of genetic algorithms to
growth and development of musical organisms. In 4th Biennial Art and
Technology Symp., pages 155–169.

Oliver, M. (2006). On inspiration. Contemporary Music Review, 25(5/6):457–459.

Onisawa, T., Takizawa, W., and Unehara, M. (2000). Composition of melody
reflecting user’s feeling. IEEE Int. Conf. Industrial Electronics, Control and
Instrumentation, pages 2738–2743.

Ruppel, R. R. (1998). Gottfried Keller and His Critics: A Case Study in Schol-
arly Criticism. Camden House, Columbia, SC.

Saravanan, N. and Fogel, D. B. (1995). Evolving neural control systems. IEEE
Expert, pages 23–27.

Sato, T. and Hagiwara, M. (1999). Tool creating support system using
evolutionary techniques. Faji Shisutemu Shinpojiumu Koen Ronbunshu,
15:363–366.

Schuller, G. (1968). Early Jazz. Oxford University Press, New York.

Secretan, J., Beato, N., D’Ambrosio, D. B., Rodriguez, A., Campbell, A., and
Stanley, K. O. (2008). Picbreeder: Evolving pictures collaboratively online.
In CHI ’08: Proceeding of the twenty-sixth annual SIGCHI conference on
Human factors in computing systems, pages 1759–1768, New York, NY,
USA. ACM.

Sims, K. (1991). Artificial evolution for computer graphics. In Proceedings of the
18th Annual Conference on Computer Graphics and Interactive Techniques
(SIGGRAPH ’91), pages 319–328, New York, NY. ACM Press.

Stanley, K. O. (2006). Exploiting regularity without development. In Proceed-
ings of the AAAI Fall Symposium on Developmental Systems, Menlo Park,
CA. AAAI Press.

Stanley, K. O. (2007). Compositional pattern producing networks: A novel
abstraction of development. Genetic Programming and Evolvable Machines
Special Issue on Developmental Systems, 8(2):131–162.

Stanley, K. O. and Miikkulainen, R. (2002). Evolving neural networks through
augmenting topologies. Evolutionary Computation, 10:99–127.

Stanley, K. O. and Miikkulainen, R. (2004). Competitive coevolution through
evolutionary complexification. JAIR, 21:63–100.

Takagi, H. (2001). Interactive evolutionary computation: Fusion of the capac-
ities of EC optimization and human evaluation. Proceedings of the IEEE,
89(9):1275–1296.

Todd, P. M. and Loy, D. G. (1991). Music and Connectionism. MIT Press,
Cambridge, MA.

Todd, S. and Latham, W. (1992). Evolutionary Art and Computers. Academic
Press, London.

Todd, S. and Latham, W. (1999). The Mutation and Growth of Art by Comput-
ers, chapter 9, pages 221–250. Morgan Kaufmann, San Mateo, California.

Tokui, N. and Iba, H. (2000). Music composition with interactive evolutionary
computation. In Third International Conference on Generative Art, pages
215–226, Milan, Italy.

Weick, K. E. (1998). Introductory essay: Improvisation as a mindset for organi-
zational analysis. Organization Science, 9(5):543–555. Special Issue: Jazz
Improvisation and Organizing.

Yao, X. (1999). Evolving artificial neural networks. Proceedings of the IEEE,
87(9):1423–1447.
