Presence 9/1


Michael Cohen
mcohen@u-aizu.ac.jp
http://www.u-aizu.ac.jp/~mcohen
Spatial Media Group
Human Interface Lab.
University of Aizu 965-8580
Japan

Exclude and Include for Audio
Sources and Sinks:
Analogs of Mute & Solo Are Deafen & Attend

Abstract

Non-immersive perspectives in virtual environments enable flexible paradigms of per-
ception, especially in the context of frames of reference for conferencing and musical
audition. Traditional mixing idioms for enabling and disabling various audio sources
employ mute and solo functions, which, along with cue, selectively disable or focus on
respective channels. Exocentric interfaces, which explicitly model not only sources but
also sinks, motivate the generalization of mute and solo (or cue) to exclude and
include, manifested for sinks as deafen and attend (confide and harken). Such functions,
which narrow stimuli by explicitly blocking out and/or concentrating on selected
entities, can be applied not only to other users’ sinks for privacy, but also to one’s own
sinks for selective attendance or presence. Multiple sinks are useful in groupware,
where a common environment implies social inhibitions to rearranging shared sources
like musical voices or conferees, as well as individual sessions in which spatial arrange-
ment of sources, like the configuration of a concert orchestra, has mnemonic value. A
taxonomy of modal narrowcasting functions is proposed, and an audibility protocol is
described, comprising revoke, renounce, grant, and claim methods,
invocable by these narrowcasting commands to control superposition of soundscapes.

1 Introduction

An exocentric model in which a user is represented by an icon (avatar,
synthespian, vactor, and so on) in the context of a virtual space (as suggested by
Table 1) is useful in spatial sound systems; virtual environments with audio can
be thought of as graphical mixing consoles. As outlined by Table 2, since the
word speaker is ambiguously overloaded, meaning both loudspeaker and talker,
this paper uses source to mean both, a logical sound emitter. Similarly and sym-
metrically, sink is used to describe a virtual listener, a logical sound receiver.
Icons embodying sources and sinks may wander around virtual spaces, like min-
glers at a cocktail party, or upon the stage during a concert, hovering over the
shoulder of favorite musicians. For example, if a sink rotates (exocentrically vi-
sually), the apparent sonic location of the source revolves (egocentrically acous-
tically) accordingly.

Most discussions of presence in virtual environments are about its quality—
degrees of resolution and interactivity (Held & Durlach, 1992; Sheridan, 1992;
Sheridan, 1997). This paper assumes elaboration of its quantity (Cohen, 1995,

Presence, Vol. 9, No. 1, February 2000, 84–96

© 2000 by the Massachusetts Institute of Technology

84 P R E S E N C E : V O L U M E 9 , N U M B E R 1


1998; Cohen & Koizumi, 1998). One’s perceptual focus
need not be unique or singular. Split or shared percep-
tion can be thought of as violating the ‘‘one [sensory]
sink to a customer’’ allocation that is inherent to immer-
sive systems; in an exocentric paradigm, each user may
have an arbitrary number of dedicated virtual sensor
instances, and the mapping between sinks and hu-
mans may be one to many, many to one, or many to
many.

The case of many sinks designated by a single user,
explained in more detail later, describes situations in
which one has various simultaneous telepresences (like
talking on a phone while monitoring an intercom while
listening to music). Illustrating a one-to-many mapping
of sinks to users (as in broadcast media like TV or radio
which effectively employ a single delegate of a collective
audience), Cohen and Koizumi (1991) allowed two us-
ers to synchronously adjust the position of multiple
sources and a single shared sink in a virtual concert, as if
they were simultaneously conductor and (singleton) au-
dience. More prosaically, a normal conference call just
sums the signals from everyone participating, so that
they can be said to share a sink. An example of a many-
to-many sink:user mapping is a virtual concert in which
the audience shares a distribution of sinks: each user may
attend the same soundscape, but multiple sinks can be
used to refine the granularity of audition. Such presentation
styles blur the distinctions between composer,
conductor/performer, and audience, as hypertext blurs
the distinctions between author, publisher, and reader.
The extension into audio of spatial sound and the flex-
ible perspective models of virtual reality—catalyzed by
the convergence of telecommunication (including tele-
phony), computing, and electronics (including audio,
television, and video)—motivate extensions to tradi-
tional idioms for sound mixing in musical and confer-
encing applications (Cohen, 1997).

2 Deafen & Attend (Confide and Harken)

Traditional mixing idioms for selectively activating
multiple sources employ mute and solo functions, which,
along with cue, disable or focus on respective channels.

Sometimes, just the initial letters s and m are used (with
no lascivious association intended), the s standing
for select as well as solo. That mute blocks the output
of a source goes without saying. Exocentric interfaces,
which explicitly model not only sources but also loca-
tion, orientation, directivity, and multiplicity of sinks,
motivate the generalization of mute/solo and cue to ex-
clude and include, manifested for sinks as deafen/
confide and harken, a narrowing of stimuli by explicitly
blocking out and/or concentrating on selected entities
(Cohen, 1999). Deafen disables sinks; confide and
harken focus on them by disabling others. These
extensions can be described in the context of applica-
tion to three situations, presented in the following sec-
tions.

2.1 Deafen/Confide Invoked on Other
Users’ Sinks for Privacy

A simple conferencing configuration typically con-
sists of various icons representing distributed users, mov-
ing around shared spaces. These icons each represent a
source (the voice of the associated user) as well as a sink
(that user’s ears). Source attributes mute and solo or cue,
settable by each user for each source, are used to focus
on some channels exclusively, or selectively still them.
Solo picks out selected channels for aural scrutiny; mute
blocks the selection out of the mix. If privacy is desired,
confidentialities can be shared in a separate, acoustically
isolated (if still virtual), space.

Anisotropic (direction-dependent) sound radiation

Table 1. User and Delegate: Projected Presence

Human pilot         Representative (projected presence)
------------------  -----------------------------------
carbon community    avatar
RL (real life)      electronic puppet
meatspace           synthespian (synthetic thespian)
motion capture      vactor (virtual actor)

An avatar is the reification of an icon in a virtual environment.



patterns (Cohen & Koizumi, 1998; CRE, 1994), like
that shown in Figure 1, can be used to define projection
of sources and thereby control audibility. Such ‘‘nearest-
neighbors’’ or proximity-based techniques of spatial par-
titioning (Viegas & Donath, 1999) are useful, as in nor-
mal conversations, for situations in which one doesn’t
mind others noticing (as third-party witnesses to a first-
to-second person address), but multilobed radiation
patterns become impractical for situations in which
confidants are spatially distributed, and impossible
when sound must ‘‘skip over’’ unaddressed inter-
vening sinks, like knights in chess. Just as a mega-
phone directionally projects sound or an ear trumpet
collects it, a multimegaphone or ears trumpet, like that
imagined by Figure 2, represents a generalization, ca-
pable of projecting sound fields to multiple arbitrary
locations.
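
A radiation pattern that combines distance and directional effects, of the kind plotted in Figure 1, can be sketched as a gain function of range and bearing. The following is only an illustration, not the formula used to generate the figure: the inverse-distance law, the first-order (cardioid-family) lobe, and the names radiation_gain, focus, and ref_dist are all assumptions introduced here.

```python
import math


def radiation_gain(src_xy, src_heading, sink_xy, focus=0.5, ref_dist=1.0):
    """Hypothetical gain of a source as received at a sink position.

    focus = 0.0 gives an omnidirectional source, focus = 0.5 a cardioid.
    src_heading is the direction the source faces, in radians.
    """
    dx = sink_xy[0] - src_xy[0]
    dy = sink_xy[1] - src_xy[1]
    # Clamp the distance so the gain stays bounded very close to the source.
    dist = max(math.hypot(dx, dy), ref_dist)
    # Angle of the sink away from the axis along which the source projects.
    bearing = math.atan2(dy, dx) - src_heading
    directivity = (1.0 - focus) + focus * math.cos(bearing)
    return max(directivity, 0.0) * ref_dist / dist
```

With these assumed defaults, a sink two units directly in front of the source receives half gain, while a sink directly behind a cardioid source receives none, which is the sense in which a single lobe can address one neighbor but not several scattered confidants.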

Alternatively and more practically, deafen/confide
functions, sink analogs of mute/solo, can be used to nar-
row multicasts to representatives of other humans in a

Table 2. Roles of sOUrce (OUTput) and sINk (INput)



groupware environment.1 Rather than exclude/include
attributes that naïvely assume symmetry in a
‘‘gossip circle’’ by linking mute to deafen and solo to attend,
such differentiated functions allow tighter control.
One might want to monitor an ongoing conference
while focusing remarks, or share with a gallery whose
cacophony one has no use for.

2.2 Deafen/Harken Invoked on One’s
Own Sinks

2.2.1 Deafen/Harken (Invoked on One’s Own
Sinks) Across Several Spaces for Selective
Attendance. Designation of multiple sinks across sev-
eral spaces effectively increases one’s attendance. A user
may simply fork themself, leaving one clone hither while
installing another yon, compositing soundscapes via the
superposition of multiple sinks’ presence (Begault,
1994, pp. 213–216). Such a multisink presence, enabling
multiple receivers in different locations, explicitly
overlays multiple audio displays, allowing a conferee to
leave a pair of ears in one conversation, while sending
other pairs to side caucuses.

Audio entities, unlike visual, do not in general oc-
clude, although masking can be thought of as audio oc-
clusion (Bregman, 1990; Blauert, 1997; Cohen & Wen-

1. The ‘‘Cone of Silence,’’ used by Agent 86 and the Chief in Mel
Brooks’ TV show Get Smart, was intended to acoustically seclude two
spies, so that they could exchange secret information without anyone
eavesdropping.

Figure 1. Contour plot showing projection of radiation pattern
combining distance and directional effects. (Generated in Mathematica.)

Figure 2. Speaker or microphone as Hydra—multimegaphone
(sources) or ears trumpet (sinks).

Figure 3. Superposition of soundscapes (reproduced from Begault
with permission).



zel, 1995). Combination of soundscapes can be done
directly, monaurally or stereophonically, as in a mixer, as
shown by Figure 3. In particular, stereo sources—real
(or mic’d on a dummy head) or artificial (binaurally spa-
tialized)—may be simply added (Cohen et al., 1993).
The overlaid existence so enabled suggests the name
given to this effect—sonic cubism, presenting multiple
simultaneous acoustic perspectives collapsed into a single
soundscape (comparable to the way visual cubism col-
lapses several viewpoints of a 3D scene onto a 2D sur-
face). Being anywhere is better than being everywhere,
since it is selective. A multisink presence is distilled ubiq-
uity, regarding multiple objects at once. Distinguishing
between other users’ sinks and one’s own (as the two
sets might be distinct, identical, or partly overlapping)
motivates choice of a special word to describe focusing
on a subset of (possibly many of) one’s own locations. In
the case of a user represented by multiple sinks, harken
recalls a transitive form of hark, a reflexive confide, de-
noting a sense of listening attentively or closely via one’s
designated sinks’ ears.
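
The direct combination of soundscapes described above amounts to sample-wise addition of the stereo signals delivered by each sink. A minimal sketch follows, assuming each soundscape is a list of (left, right) sample pairs of equal length; the function name superpose and this data layout are illustrative choices, not part of the systems described here.

```python
def superpose(soundscapes):
    """Sum equal-length stereo soundscapes sample by sample.

    Each soundscape is a list of (left, right) sample pairs, one list
    per sink whose display is to be overlaid.
    """
    if not soundscapes:
        return []
    length = len(soundscapes[0])
    if any(len(scape) != length for scape in soundscapes):
        raise ValueError("soundscapes must have the same number of samples")
    mixed = []
    for i in range(length):
        left = sum(scape[i][0] for scape in soundscapes)
        right = sum(scape[i][1] for scape in soundscapes)
        mixed.append((left, right))
    return mixed
```

A practical mixer would also guard against clipping, for example by scaling the sum; that step is omitted to keep the superposition itself visible.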

2.2.2 Deafen/Harken (Invoked on One’s Own
Sinks) in a Single Space for Selective Multipresence.
The designation of multiple sinks can be used to sharpen
the granularity of control within a single space, as sepa-
rate sinks can monitor individual sources via selective
amplification, even if those sources are not reposition-
able; just as in ordinary settings, social conventions
might inhibit dragging someone else around a shared
space. One could pay close attention to particular instru-
ments in a concert without rearranging the ensemble,
which would disturb the soundscape perceived by icons
representing other users in the common model. A useful
analogy is a ‘‘Rashomon simulcast,’’ after the eponymous
Akira Kurosawa film based on the stories of Ryunosuke
Akutagawa (Akutagawa, 1952), which contrasted
multiple perspectives of a single incident. Table 3
presents a taxonomy of points of view, sweeping the con-
tinuum from egocentric through exocentric user experi-
ences (Laurel, 1986).

Imagine, for example, that a concert attendee wanted
to pay special attention to a drum and rhythm guitar,
while preserving the configuration of the instruments.

Besides tradition and mnemonics, one reason for not
just rearranging the instruments around a singleton sink
is to maintain consistency with other listeners, distrib-
uted in time and space (both physical and virtual). One
could replicate, and be literally beside oneself. In
Figure 4, one avatar is located inside the drum, while
another doppelgänger is near the rhythm guitar.

The apparent paradoxes of one’s being in multiple
places simultaneously (Firesign Theatre, 1968) can be
resolved by partitioning the sources across the sinks. If
the sinks are distributed across separate virtual rooms,
each source is spatialized with respect to the sink in the
same room. In the case of autothronging—or multiple
sinks designated by a single user in the same space—an
autofocus mode can be employed by anticipating level-
difference localization, the tendency to perceive multiple
identical sources in different locations as a single fused
source. This is related to the precedence effect, or ‘‘rule
of the first wavefront’’ (Wallach et al., 1949; Haas,
1972; Blauert, 1996; Gilkey & Anderson, 1997). Rather
than adding or averaging the contribution of each
source to one’s multiple sinks, each source can be spa-
tialized with respect to only the best (loudest, as a func-
tion of distance and mutual gain, including focus and
orientation) sink, as shown in Figure 5.
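
The autofocus rule can be sketched as an assignment of each source to whichever of a user's sinks would hear it loudest. In the sketch below, loudness is reduced to inverse distance; the paper's criterion also involves mutual gain, focus, and orientation, which are left out. The names Icon, gain, and autofocus are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Icon:
    name: str
    x: float
    y: float


def gain(source, sink):
    # Assumption: loudness depends on distance only (clamped near zero range).
    dist = math.hypot(source.x - sink.x, source.y - sink.y)
    return 1.0 / max(dist, 1.0)


def autofocus(sources, sinks):
    """Map each source name to the name of the sink that hears it loudest."""
    if not sinks:
        return {}
    return {
        src.name: max(sinks, key=lambda snk: gain(src, snk)).name
        for src in sources
    }
```

Calling autofocus again with a deafened sink removed from the list reassigns its sources to the remaining sinks, which is one way to read the adoption of orphaned sources noted in the caption of Figure 5.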

3 Applications

These themes of multiple, selective presence have
been explored by two proofs-of-concept.

3.1 Helical Keyboard

The Helical Keyboard (Herder & Cohen, 1996) is
a virtual, piano-style keyboard wrapped through a left-
handed helix (Shepard, 1984), so that chroma (note
within an octave) maps to azimuth and pitch height
maps to elevation. The model was generated algorithmi-
cally with Mathematica (Wolfram, 1996), exported into
VRML (Hartman & Wernecke, 1996), and imported into
Open Inventor (Wernecke, 1994), where it is animated
by a MIDI streamer.
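
The mapping from pitch to position on the helix can be illustrated with a short function. This is a sketch under stated assumptions, not the Mathematica model itself: it takes MIDI note numbers as input, treats elevation as height along the helix axis, and encodes left-handedness as a negative sense of rotation; the name helix_position and the parameters radius and rise_per_octave are hypothetical.

```python
import math


def helix_position(midi_note, radius=1.0, rise_per_octave=1.0):
    """Return an (x, y, z) position for a key on an assumed helical layout."""
    chroma = midi_note % 12  # note within the octave
    # Assumed convention: a negative angle gives a left-handed winding.
    azimuth = -2.0 * math.pi * chroma / 12.0
    # Pitch height rises continuously, one full turn per octave.
    height = rise_per_octave * midi_note / 12.0
    return (radius * math.cos(azimuth), radius * math.sin(azimuth), height)
```

Notes an octave apart share an azimuth and differ only in height, so, for example, helix_position(60) and helix_position(72) lie directly above one another.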

Designed to allow separate audition, for instance, of



harmony and melody, the model is meant to be experi-
enced in a hemispherical speaker array (Amano et al.,
1998). A single sink inside the helix near its base might
easily determine the azimuth of the harmony, but the
melodic notes would all seem to come from the upper
pole. As shown by Figure 6 (see p. 94), multiple sinks
can normalize the octave, and can be selectively disabled
by an active toggle.

3.2 MAW: Multidimensional Audio
Windows

MAW is an application for manipulating sound
sources and sinks in 2D virtual rooms, capable of driving
a heterogeneous backend (Cohen & Ludwig, 1991a,
1991b; Cohen, 1993). The graphical representation of
MAW’s virtual rooms is an orthographic plan view. Fig-
ure 7 shows a snapshot of such a representation as part
of a typical session (mixing the top-down metaphor used

in Figure 4 with frontal snapshots), using multiple sinks
as foci of a generalized fisheye (Furnas, 1986) audio
‘‘lens.’’

MAW supports multiple, independent, simultaneous con-
ferences and concerts; a source is inaudible to a sink in a dif-
ferent virtual room. The cut/paste idiom is used as a worm-
hole (teleporter), so a subcaucus may be spawned simply by
cutting a coterie out of one room and pasting it (‘‘beaming
down’’) into another. Users wanting to monitor simulta-
neous conferences need only fork themselves with, for ex-
ample, copy/paste, installing (multiply designated) replicant
sinks in each room of interest. A mixels panel,2 shown in

2. Mixels, acronymic for sound mixing elements, define the granularity of
control and degree of directional or spatial polyphony. The name is in analogy to
dexels (depth elements), hogels (holographic elements), pixels (picture
elements), taxels (tactile array elements), texels (texture elements), and
voxels (volumetric elements, a.k.a. boxels), since mixels are like a raster
across which a soundscape is projected.

Figure 4. Virtual concert: multiple sinks (generalized multifocus audio fish-eye).



Figure 8, can be used to activate or deactivate sources and
sinks with solo, mute, attend (confide or harken), and
deafen.
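
The room semantics described above (a source is audible only to sinks in the same room, cut/paste moves an icon, copy/paste forks it) can be sketched with a small bookkeeping class. This is not MAW's implementation; the class Session and its method names are hypothetical, and each icon is simply mapped to the set of rooms in which it has an instance.

```python
class Session:
    """Tracks which virtual rooms each icon currently occupies."""

    def __init__(self):
        self.rooms_of = {}  # icon name -> set of room names

    def place(self, icon, room):
        self.rooms_of.setdefault(icon, set()).add(room)

    def cut_paste(self, icon, src_room, dst_room):
        # Move: the icon leaves src_room (KeyError if it was not there).
        self.rooms_of[icon].remove(src_room)
        self.rooms_of[icon].add(dst_room)

    def copy_paste(self, icon, dst_room):
        # Fork: the icon keeps its existing instances and gains another.
        self.rooms_of[icon].add(dst_room)

    def audible(self, source, sink):
        # Audible only if source and sink share at least one room.
        shared = self.rooms_of.get(source, set()) & self.rooms_of.get(sink, set())
        return bool(shared)
```

A user who forks a sink into a second room with copy_paste then hears sources in both rooms, while a coterie moved with cut_paste falls silent in the room it left.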

4 Formalization

The suite of inclusion and exclusion narrowcast
commands for sources and sinks is analogous to
burning and dodging (shading) in photographic processing.
The analogy between source and sink operations is close,
and the semantics are identical: an icon is enabled by default
unless it is explicitly excluded (with mute or deafen),
or peers are explicitly included (with solo or cue, or with
attend: confide or harken) when the respective icon is not.
Exclusion takes precedence over inclusion: invoking exclude
and include operations simultaneously on an object
results in its being disabled. In predicate calculus
notation,

\[
\mathrm{active}(x) = \neg\,\mathrm{exclude}(x) \wedge \bigl( (\exists y\; \mathrm{include}(y)) \Rightarrow \mathrm{include}(x) \bigr).
\tag{1}
\]

So, for mute and solo, the relation is

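\[
\mathrm{active}(\mathrm{source}_x) = \neg\,\mathrm{mute}(\mathrm{source}_x) \wedge \bigl( (\exists y\; \mathrm{solo}(\mathrm{source}_y)) \Rightarrow \mathrm{solo}(\mathrm{source}_x) \bigr),
\tag{2a}
\]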

mute explicitly turning off a source, and solo disabling
the collocated (same room/window) complement of the
selection (in the spirit of ‘‘anything not mandatory is
forbidden’’). For deafen/attend, the relation is

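\[
\mathrm{active}(\mathrm{sink}_x) = \neg\,\mathrm{deafen}(\mathrm{sink}_x) \wedge \bigl( (\exists y\; \mathrm{attend}(\mathrm{sink}_y)) \Rightarrow \mathrm{attend}(\mathrm{sink}_x) \bigr).
\tag{2b}
\]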
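
The predicate of Equation (1) translates directly into code. The sketch below assumes a single pair of flags per icon (exclude standing for mute or deafen, include for solo/cue or attend) and restricts the quantifier to collocated peers, following the remark that solo disables the complement in the same room; the names Icon and active are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Icon:
    name: str
    room: str
    exclude: bool = False  # mute for a source, deafen for a sink
    include: bool = False  # solo/cue for a source, attend for a sink


def active(icon, peers):
    """Equation (1): not excluded, and included whenever any peer is."""
    collocated = [p for p in peers if p.room == icon.room]
    any_included = any(p.include for p in collocated)
    # (exists y include(y)) => include(x), written as a disjunction.
    return (not icon.exclude) and (icon.include or not any_included)
```

With no flags set every icon is active; soloing one source silences its collocated peers but not sources in other rooms; and an icon that is both excluded and included is inactive, since the first conjunct fails.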

5 Figurative Representation

Distinguishing between operations involving one’s
own and others’ representatives, Table 4 proposes a tax-
onomy of narrowcasting functions by juxtaposing dis-
abling and enabling operations, generally as well as for visual
and audio modalities. The famous ‘‘hear/speak/see no evil’’
monkeys, pictured in Figure 9, are examples of reflexive
‘‘[sink] deafen/[source] mute/[sink] avert,’’ which audio
relations fill the top-right sextant of Table 4.3 Traditional
mixing console functions mute/solo and cue, operating as
they do on sources that are metaphorically remote, corre-
spond to transitive ‘‘[source] mute’’ (illustrated by Figure
10a), which, along with transitive ‘‘[sink] deafen’’ (illustrated
by Figure 10b), fills the bottom-right sextant.

A figurative avatar in virtual space is humanoid, and
especially includes a head, which embodies not only a
center of consciousness, but also the ears, mouth, and
eyes. Exclude and include source and sink properties
can be visually represented by iconic attributes which can
distinguish between operations reflexive (invoked by a
user associated with a respective icon) and transitive (in-
voked by another user in the shared environment) (Co-
hen & Herder, 1998). Distributed users might typically
share spatial aspects of a groupware environment, but
attributes like mutedness or deafenedness are determined
and displayed on a per-user basis (Chen et al., 1999).

3. ‘‘Blind,’’ the dual of the visual ‘‘see no evil’’ avert operation, cor-
responds to a video conference ‘‘sneeze button,’’ which blocks or
freezes transmission.

Figure 5. Unicast source → sink transmissions. If an attending sink is
deafened (or peers confided in), remaining sinks adopt orphaned
sources.



Figure 7. MAW conference.

Figure 8. Mixels panel: sinks and sources across multiple spaces. The solo and confide columns employ the familiar ‘‘radio buttons’’ idiom, in
which the selection is presumed to be a singleton. Asserting an attribute for one object (by checking it in the respective column at the designated
row) resets it for any others, unless the selection set is explicitly extended (by holding down a shift key while asserting the property).



For example, a source representing a human telecon-
feree denotes mutedness with an iconic hand clapped
over its mouth, oriented differently (thumb up or thumb
down) depending on whether the source was muted by
its owner (or one of its owners) or another unassociated
user. (In the former case, all the users in the space would
observe the mute, but, in the latter, only the user dis-
abling the remote source would see the mute.) An audio
muffler might be wrapped around an iconic head to de-
note its deafness, but to distinguish between self-im-
posed deafness (invoked by one whose attention is di-
rected elsewhere) and distally imposed (invoked by
another desiring selective privacy), hands clasped
over the ears can be oriented differently depending on
the agent of deafness. These cases are illustrated by
Table 2.

Such disabling qualities are not mutually exclusive, and,
indeed, the orthogonal iconic attributes can be superim-
posed, albeit confusingly, as in Figure 11, an ‘‘omnigrope’’
extreme case. Simultaneously applied filters can be repre-
sented by interpenetrated virtual models, nonverbal commu-
nication being used to symbolize access permissions.

6 Future Research

6.1 Continuum of Audibility

Besides the dichotomous on/off of the include/
exclude functions, we plan to fuzzify the audibility
continuum by programming functions that focus source-
to-sink transmissions without blocking them from others
in the same space, a ‘‘sharing’’ meant to denote a non-
private aside. (For instance, some home listening con-
soles have a so-called ‘‘mute’’ function that reduces the
volume by approximately 20 dB instead of cutting it al-
together.) A ‘‘casual confide’’ function could eventually
be combined with an obtrusive mode (Mershon, 1997;
Martens, 1997), invoking source-side, near-field transfer
functions for whispering and sotto voce effects. A notion
of ‘‘virtual social distance’’ (Michelitsch et al., 1998) can
be used to scale the quality of audio narrowcasts, includ-
ing representation as a synthetic murmur.
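
A graded alternative to the on/off exclude can be sketched as an attenuation expressed in decibels, like the roughly 20 dB ‘‘mute’’ of the home listening consoles mentioned above. The function names db_to_gain and soft_exclude_gain and the default of -20 dB are illustrative assumptions, not part of the planned system.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def soft_exclude_gain(excluded, attenuation_db=-20.0):
    """Amplitude factor for a channel that is attenuated rather than cut."""
    return db_to_gain(attenuation_db) if excluded else 1.0
```

An attenuation of -20 dB corresponds to one tenth of the original amplitude, so a softly excluded source remains faintly audible instead of vanishing from the mix.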

6.2 Audibility Protocol

Modeling sources and sinks as software objects, an
audibility protocol describes transitions between states in
which the respective methods are appropriate, as shown
by Figure 12.

Table 3. Points of View

Point of view              Person  Intimacy               Object       Distance  Mode        Perspective
-------------------------  ------  ---------------------  -----------  --------  ----------  -----------
exocentric                 3rd     public                 other        distal    transitive  objective
vicariousness, empathy     2nd     social, multipersonal  familiar     medial    imperative
telepresence, autoempathy                                 remote self
immersive                  1st     personal               self         proximal  reflexive   subjective
egocentric

Figure 9. Monkeys at Toshogu Shrine (in Nikko, Japan): Kikazaru
(‘‘hear no evil’’), Iwazaru (‘‘speak no evil’’), and Mizaru (‘‘see no evil’’).



Because of the asymmetry of both mute/solo and
deafen/attend (that is, audibility is assumed for collo-
cated icons), audibility of a source with respect to a sink
should be treated as a revocable privilege and a forsakable
right. For example, audibility would be granted by
a source upon one’s entering a space, acknowledgeable
by the respective sink by claiming that attribute. A
source wishing to exclude certain sinks from audibility
would invoke, directly via deafen or indirectly via attend, a
revoke method, duly acknowledgeable by each disabled
sink’s renounce. Further policy extensions
will relax the symmetry of such a protocol, including
the ability to force audibility by overriding a source’s mute or
sink’s deafen (which a parent might invoke when telechiding
a distracted child: ‘‘How dare you attenuate my voice?!’’).
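
The grant, claim, revoke, and renounce methods suggest a small state machine for each source-sink pair. The sketch below is one possible reading of the prose, not a transcription of Figure 12: the state names, the transition table, and the class AudibilityLink are assumptions, including the choice to let a sink renounce audibility on its own initiative (the ‘‘forsakable right’’).

```python
# Assumed transitions: (current state, method) -> next state.
TRANSITIONS = {
    ("inaudible", "grant"): "granted",    # source offers audibility
    ("granted", "claim"): "audible",      # sink acknowledges the grant
    ("audible", "revoke"): "revoked",     # source withdraws (deafen/attend)
    ("revoked", "renounce"): "inaudible",  # sink acknowledges the revocation
    ("audible", "renounce"): "inaudible",  # sink forsakes audibility itself
}


class AudibilityLink:
    """Audibility of one source with respect to one sink."""

    def __init__(self):
        self.state = "inaudible"

    def _fire(self, method):
        key = (self.state, method)
        if key not in TRANSITIONS:
            raise ValueError(f"{method} is not valid in state {self.state}")
        self.state = TRANSITIONS[key]

    def grant(self):
        self._fire("grant")

    def claim(self):
        self._fire("claim")

    def revoke(self):
        self._fire("revoke")

    def renounce(self):
        self._fire("renounce")
```

Under this reading, entering a space triggers grant followed by claim, and a deafen invoked by the source side triggers revoke followed by the sink's renounce; methods called out of order are rejected rather than silently ignored.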

Groupware implementation of deafen/attend should
eventually be done nondistally, as privacy concerns be-
come relevant. Rather than distributing all the source
streams, as our prototypes do, and expecting the soft-
ware to ignore private transmissions for others, a prop-
erly secure implementation would restrict distribution
earlier. Full articulation of groupware extensions
for privacy requires multicasting of listening per-
missions.

Such protocols might ultimately be deployed directly
on the internet, dynamically loaded into multicasting
routers. Active networks allow programmable network
infrastructure. An application-specific audibility protocol
like that described here could dynamically reconfigure
the routers’ policy, for security, privacy, and to reduce
network traffic.

7 Conclusion

The protocols and methods defined and suggested
by this research enable narrow- and multicasting idioms
for selective privacy and attendance, scalability and LoD
(level of detail), and side- and back-channels. Usually
one thinks of one’s perspective as residing in a single
place—namely, behind one’s eyes, between one’s ears,

Figure 10. Distal exclude.

Table 4. Exclude/Include Taxonomy: Enable/Disable for One’s Own and Others’ Representatives



and so forth—but telepresence enables such points of
attendance to be distributed and nonsingular, by repli-
cating subject instead of object. In Figure 13, separate
soundscapes corresponding to music (top left), tele-
phony (top right), mobile vehicular communication
(bottom left), and a workstation (bottom right) are
combined into a single percept, one’s perceptual spaces

Figure 6. Multipresence via multiple sinks.

Figure 11. Figurative avatar omnigroping interdigitation. A source
representing a human teleconferee denotes mutedness with an iconic
hand clapped over its mouth, oriented differently (thumb up or thumb
down) depending on whether the source was muted by its owner (or
one of its owners) or another user. To distinguish between deafness
self-imposed (invoked by a user whose attention is directed elsewhere)
versus distally imposed (invoked by a user desiring selective privacy),
hands clasped over the ears orient differently depending on the agent of
deafness. Being both virtual and conceptually orthogonal, these various
hands interpenetrate.

Figure 12. Audibility protocol.



being naturally coextensive, the center of one’s con-
sciousness being singular.

Acknowledgments

Jens Herder is the software architect for the Helical Keyboard
project and the developer of the Sound Spatialization Frame-
work, in which the avatar representations are deployed. Tom-
oyuki Kannoo implemented the ‘‘omnigroping’’ avatar. Hiroki
Sato prepared the monkey illustrations. This research has been
supported by a grant from the Fukushima Prefectural Founda-
tion for the Advancement of Science and Education.

References

Akutagawa, R. (1952). Rashomon and Other Stories. Charles E.
Tuttle Company, Inc. ISBN 0-8048-1457-0.

Amano, K., Matsushita, F., Yanagawa, H., Cohen, M., Herder,
J., Martens, W., Koba, Y., & Tohyama, M. (1998). A virtual
reality sound system using room-related transfer functions
delivered through a multispeaker array: The PSFC at the
University of Aizu Multimedia Center. TVRSJ: Trans. of the
Virtual Reality Society of Japan, 3(1), 1–12. ISSN 1344-
011x.

Begault, D. R. (1994). 3-D Sound for Virtual Reality and
Multimedia. Academic Press. ISBN 0-12-084735-3.

Blauert, J. (1996). Spatial Hearing: The Psychophysics of Hu-
man Sound Localization (2nd ed.). MIT Press. ISBN 0-262-
02413-6.

———. (1997). Acoustical simulation and auralization for VR
and other applications. Proc. ASVA: Int. Symposium on
Simulation, Visualization and Auralization for Acoustic Re-
search and Education (pp. 261–268). Tokyo.

Bregman, A. S. (1990). Auditory Scene Analysis: The Percep-
tual Organization of Sound. MIT Press. ISBN 0-262-
02297-4.

Chen, C., Thomas, L., Cole, J., & Chennawasin, C. (1999).
Representing the semantics of virtual spaces. IEEE MultiMe-
dia, April–June, 54–63.

Cohen, M. (1993). Integrating graphical and audio windows.
Presence: Teleoperators and Virtual Environments, 1(4), 468–
481. ISSN 1054-7460.

———. (1995). Besides immersion: Overlaid points of view
and frames of reference; using audio windows to analyze
audio scenes. In S. Tachi (Ed.), Proc. ICAT/VRST: Int.
Conf. Artificial Reality and Tele-Existence/Conf. on Virtual
Reality Software and Technology (pp. 29–38). Makuhari,
Chiba, Japan.

———. (1997). Exclude and include for audio sources and
sinks: Analogs of mute/solo & cue are deafen/confide &
harken. Proc. ICAD: Int. Conf. Auditory Display (19–28),
Palo Alto, CA.

———. (1998). Quantity of presence: Beyond person, num-
ber, and pronouns. In T. L. Kunii, & A. Luciani (Eds.), Cy-
berworlds (pp. 289–308). Springer-Verlag. ISBN 4-431-
70207-5.

———. (1999). Chat space models. Proc. Joint Meeting of the
137th Regular Meeting of the Acoustical Society of America
and the 2nd Convention of the European Acoustics Association:

Figure 13. Soundscape superposition: overlaid multiple soundscapes.



Forum Acusticum, 2 (pp. 1099). Berlin. Signal Processing
for Teleconferencing and Smart Microphones, 2pSPa7.

Cohen, M., Aoki, S., & Koizumi, N. (1993). Augmented au-
dio reality: Telepresence/VR hybrid acoustic environments.
Proc. Ro-Man: 2nd IEEE Int. Workshop on Robot and Human
Communication (pp. 361–364). Tokyo. ISBN 0-7803-
1407-7.

Cohen, M., & Herder, J. (1998). Symbolic representations of
exclude and include for audio sources and sinks. In M. Gö-
bel, J. Landauer, U. Lang, & M. Wapler (Eds.), Proc. VE:
Virtual Environments (pp. 235–242). Stuttgart: IEEE,
Springer-Verlag Wien. ISSN 0946-2767; ISBN 3-211-
83233-5.

Cohen, M., & Koizumi, N. (1991). Audio window. Den Gaku.
Tokyo Contemporary Music Festival: Music for Computer.

———. (1998). Virtual gain for audio windows. Presence:
Teleoperators and Virtual Environments, 7(1), 53–66. ISSN
1054-7460.

Cohen, M., & Ludwig, L. F. (1991a). Multidimensional audio
window management. IJMMS: the Journal of Person-Com-
puter Interaction, 34(3), 319–336. Special Issue on Com-
puter Supported Cooperative Work and Groupware. ISSN
0020-7373.

———. (1991b). Multidimensional audio window manage-
ment. In S. Greenberg (Ed.), Computer Supported Coopera-
tive Work and Groupware (pp. 193–210). London: Aca-
demic Press. ISBN 0-12-299220-2.

Cohen, M., & Wenzel, E. M. (1995). The design of multidi-
mensional sound interfaces. In W. Barfield & T. A. Furness
III (Eds.), Virtual Environments and Advanced Interface
Design, Chapter 8 (pp. 291–346). Oxford University Press.
ISBN 0-19-507555-2.

CRE (1994). CRE_TRON Library Reference Manual. Crystal
River Engineering, Inc., Revision B.

Firesign Theatre (1968). How can you be in two places at once
when you’re not anywhere at all. LP Columbia 9884; CD Mo-
bile Fidelity 834.

Furnas, G. W. (1986). Generalized fisheye views. Proc. CHI:
ACM Conf. on Computer-Human Interaction (pp. 16–23).
Boston.

Gilkey, R. H., & Anderson, T. R. (Eds.). (1997). Binaural and
Spatial Hearing in Real and Virtual Environments. Mahwah,
NJ: Lawrence Erlbaum Associates. ISBN 0-8058-1654-2.

Haas, H. (1972). The influence of a single echo on the audibil-
ity of speech. J. Aud. Eng. Soc., 20, 146–159.

Hartman, J., & Wernecke, J. (1996). The VRML 2.0 Hand-
book. Reading, MA: Addison-Wesley Developers Press. ISBN
0-201-47944-3.

Held, R. M., & Durlach, N. I. (1992). Telepresence. Presence:
Teleoperators and Virtual Environments, 1(1), 109–
112. ISSN 1054-7460.

Herder, J., & Cohen, M. (1996). Project report: Design of a
helical keyboard. Proc. ICAD: Int. Conf. Auditory Display
(pp. 139–142). www.santafe.edu/~icad/ICAD96/
proc96/herder.htm. Palo Alto, CA.

Laurel, B. (1986). Interface as mimesis. In D. A. Norman &
S. W. Draper (Eds.), User Centered System Design. Hillsdale,
NJ: Lawrence Erlbaum Associates.

Martens, W. L. (1997). Acoustics and perception of sound
sources at close range. In preparation.
http://www.u-aizu.ac.jp/~wlm/research/close_range.

Mershon, D. H. (1997). Phenomenal geometry and the mea-
surement of perceived auditory distance. In Gilkey & Ander-
son (1997), chapter 13 (pp. 257–274).

Michelitsch, G., Welling, G., & Ott, M. (1998). The role of
virtual distance in the design of communication services. In
T. Kamae (Ed.), IWNA: Proc. Int. Workshop on Networked
Appliances, Kyoto 55-2.

Shepard, R. N. (1984). Structural representations of musical
pitch. In D. Deutsch (Ed.), The Psychology of Music (pp.
343–390). Academic Press. ISBN 0-12-213560-1 or ISBN
0-12-213562-8.

Sheridan, T. B. (1992). Musings on telepresence and virtual
presence. Presence: Teleoperators and Virtual Environments,
1(1), 120–125. ISSN 1054-7460.

———. (1997). Further musings on the psychophysics of pres-
ence. Presence: Teleoperators and Virtual Environments, 5(2),
241–246. ISSN 1054-7460.

Viegas, F. B., & Donath, J. S. (1999). Chat circles. Proc. CHI:
ACM Conf. on Computer-Human Interaction, Pittsburgh.

Wallach, H., Newman, E. B., & Rosenzweig, M. R. (1949).
The precedence effect in sound localization. American Jour-
nal of Psychology, 57, 315–336.

Wernecke, J. (1994). The Inventor Mentor. Addison-Wesley
Developers Press. ISBN 0-201-62495-8.

Wolfram, S. (1996). The Mathematica Book (3rd ed.). Wolfram
Media/Cambridge University Press. ISBN 0-9650532-0-2.
