8
The Design of Multidimensional
Sound Interfaces
Michael Cohen & Elizabeth M. Wenzel
Presented by:
Andrew Snyder & Thor Castillo
February 3, 2000
HFE760 - Dr. Gallimore
Table of Contents
• Introduction – How we localize sound
• Chapter 8
• Research
• Conclusion
Introduction
• Ear Structure
• Binaural Beats - Demo
• Why are they important?
• Localization Cues
Introduction
• Ear Structure
Introduction
• Binaural Beats – Demo
• Why are they Important?
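The beat demo above can be sketched in a few lines: when each ear receives a pure tone at a slightly different frequency, the listener perceives a single tone pulsing at the difference frequency. A minimal NumPy sketch (all frequencies and durations are illustrative choices, not values from the demo):

```python
import numpy as np

# Binaural-beat stimulus sketch: a 440 Hz tone in the left ear and a
# 446 Hz tone in the right ear are perceived as one tone pulsing at
# the 6 Hz difference frequency. Values here are illustrative only.
def binaural_beat(f_left=440.0, f_right=446.0, duration=1.0, rate=44100):
    t = np.arange(int(duration * rate)) / rate
    left = np.sin(2 * np.pi * f_left * t)
    right = np.sin(2 * np.pi * f_right * t)
    return np.stack([left, right], axis=1)   # one column per ear

stereo = binaural_beat()
print(stereo.shape)   # (44100, 2)
```

Played over headphones (never loudspeakers, where the channels mix in the air), the two channels interact only inside the auditory system, which is what makes the beat "binaural".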
Introduction
• Localization Cues
– Humans use auditory localization cues to help locate the position in
space of a sound source. There are eight sources of localization
cues:
• interaural time difference
• head shadow
• pinna response
• shoulder echo
• head motion
• early echo response
• reverberation
• vision
Introduction
• Localization Cues
– Interaural time difference describes the time delay between sounds
arriving at the left and right ears.
– This is a primary localization cue for interpreting the lateral position of
a sound source.
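The interaural time difference can be approximated numerically with Woodworth's classic spherical-head formula. This is a hedged sketch, not the chapter's own model; the head radius and speed of sound are assumed textbook values:

```python
import math

# Woodworth's spherical-head approximation of the interaural time
# difference (ITD) for a distant source. Head radius and speed of
# sound are assumed typical values, not measurements from the text.
def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound_mps=343.0):
    """ITD in seconds for a source at the given azimuth (0 = straight ahead)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound_mps) * (math.sin(theta) + theta)

# A source directly to one side (90 degrees) gives the maximum delay,
# roughly two-thirds of a millisecond for an average adult head.
print(round(itd_seconds(90.0) * 1000, 2))   # ~0.66 ms
```

Delays this small are inaudible as separate events, yet the auditory system resolves them reliably enough to lateralize the source.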
Introduction
• Localization Cues
– Head shadow describes a sound having to pass through or around the
head in order to reach an ear.
– The filtering effects of head shadowing make it harder to perceive
the distance and direction of a sound source.
Introduction
• Localization Cues
– Pinna response describes the effect that the external ear, or pinna, has
on sound.
– Higher frequencies are filtered by the pinna in such a way as to affect
the perceived lateral position, or azimuth, and elevation of a sound
source.
Introduction
• Localization Cues
– Shoulder echo - Frequencies in the range of 1-3kHz are reflected
from the upper torso of the human body.
Introduction
• Localization Cues
– Head motion - Moving the head to help determine the location of a
sound source is a key, and quite natural, part of human hearing.
Introduction
• Localization Cues
– Early echo response and reverberation - Sounds in the real world are
the combination of the original sound source plus its reflections
from surfaces in the world (floors, walls, tables, etc.).
– Early echo response occurs in the first 50-100 ms of a sound's life.
Introduction
• Localization Cues
– Vision helps us quickly locate a sound source and confirm the
direction that we perceive.
Chapter 8 Contents
• Introduction
• Characterization and Control of Acoustic
Objects
• Research Applications
• Interface Control via Audio Windows
• Interface Issues: Case Studies
Introduction
• I/O generations and dimensions
• Exploring the audio design space
Introduction
• I/O generations and dimensions
– First Generation - Early computer terminals allowed only textual i/o –
Character-based user interface (CUI)
– Second Generation - As terminal technology improved, user could
manipulate graphical objects – Graphical User Interface (GUI)
– Third Generation – 3D graphical devices.
– 3D audio: The sound has a spatial attribute, originating, virtually or
exactly, from an arbitrary point with respect to the listener – This
chapter focuses on the third generation of the aural sector.
Introduction
• Exploring the audio design space
– Most people think that it would be easier to be hearing- than
sight-impaired, even though the incidence of disability-related cultural
isolation is higher among the deaf than the blind.
– The development of user interfaces has historically been focused
more on visual modes than aural.
– Sound is frequently included and utilized to the limits of its availability
and affordability in PCs. However, computer aided exploitation of
audio bandwidth is only now beginning to rival that of graphics.
– Because of the cognitive overload that results from overburdening
other systems (perhaps especially the visual), there are strong
motivations for exploiting sound to its full potential.
Introduction
• Exploring the audio design space
– This chapter reviews the evolving state of the art of non-speech audio
interfaces, driving both spatial and non-spatial attributes.
– This chapter will focus primarily on the integration of these new
technologies – crafting effective matches between projected user
desires and emerging technological capabilities.
Characterization and Control
of Acoustic Objects
Part of listening to a mixture of conversations or music is being able to hear
the individual voices or musical instruments. This synthesis/decomposition
duality is the opposite effect of masking: instead of sounds hiding each
other, they are complementary and individually perceivable.
Audio imaging – the creation of sonic illusions by manipulation of stereo
channels.
Stereo system – sound comes from only left and right transducers, whether
headphones or loudspeakers.
Spatial sound involves technology that allows sound to emanate from any
direction. (left-right, up-down, back-forth, and everything in between)
Characterization and Control
of Acoustic Objects
The cocktail party effect…we can filter sound according to
• position
• speaker voice
• subject matter
• tone/timbre
• melodic line and rhythm
Characterization and Control
of Acoustic Objects
• Spatial dimensions of sound
• Implementing spatial sound
• Non-spatial dimensions and auditory
symbology
Characterization and Control of
Acoustic Objects
• Spatial dimensions of sound
– The goal of spatial sound synthesis is to project audio media into
space by manipulating sound sources so that they assume virtual
positions, mapping the source channel into three-dimensional space.
These virtual positions enable auditory localization.
– Duplex Theory (Lord Rayleigh, 1907) – human sound localization is
based on two primary cues to location, interaural differences in time
of arrival and interaural differences in intensity.
Characterization and Control of
Acoustic Objects
• Spatial dimensions of sound
– There are several problems with the duplex theory:
• It cannot account for the ability of subjects to localize many types of
sounds coming from many different regions (e.g., sounds along the median
plane)
• When duplex cues are used to generate sound over headphones, the sound is
perceived inside the head
• Most of the deficiencies of the duplex theory are linked to the
interaction of sound waves with the pinnae (outer ears)
Characterization and Control of
Acoustic Objects
• Spatial dimensions of sound
– Peaks and valleys in the auditory spectrum can serve as localization
cues for the elevation of a sound source, though other cues are also
needed to locate its vertical position. Elevation perception remains
important to researchers because it has never been fully understood.
Characterization and Control of
Acoustic Objects
• Spatial dimensions of sound
– Localization errors in current sound-generating technologies are very
common; some of the problems that persist are:
• Locating sound on the vertical plane
• Some systems can cause a front ↔ back reversal
• Some systems can cause an up ↔ down reversal
• Judging distance from the sound source! – We’re generally terrible at
this anyway!
– Sound localization can be dramatically improved with a dynamic
stimulus (can reduce amount of reversals)
• Allowing head motion
• Moving the location of the sound
• Researchers suggest that this can help externalize sound!!!
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– Physically locating loudspeakers in the place where each source is
located, relative to the listener. (The most straightforward approach)
• Not portable – Cumbersome
– Other approaches use analytic mathematical models of the pinnae
and other body structures in order to directly calculate acoustic
responses.
– A third approach to accurate real-time spatialization concentrates on
digital signal processing (DSP) techniques for synthesizing cues from
direct measurements of head-related transfer functions. (The author
focuses on this type of approach)
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– DSP – The goal is to make sound spatializers that give the
impression that the sound is coming from different sources and
different locations.
– Why? - A display that focuses on this technology can exploit the
human ability to quickly and subconsciously locate sound sources.
– Convolution – Hardware- and/or software-based engines perform the
convolution that filters the sound in some DSPs.
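The convolution step described above can be sketched directly: a mono source is filtered with a left-ear and a right-ear head-related impulse response (HRIR) to produce a binaural signal. The HRIRs below are made-up placeholder filters standing in for measured data:

```python
import numpy as np

# Sketch of binaural spatialization by convolution: filter one mono
# channel with left- and right-ear head-related impulse responses
# (HRIRs). The "HRIRs" here are random placeholders, not real
# measurements, so this shows only the signal flow, not the effect.
def spatialize(mono, hrir_left, hrir_right):
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)   # shape (samples, 2)

rng = np.random.default_rng(0)
source = rng.standard_normal(1024)           # mono input signal
hrir_l = rng.standard_normal(128) * 0.1      # placeholder impulse responses
hrir_r = rng.standard_normal(128) * 0.1
out = spatialize(source, hrir_l, hrir_r)
print(out.shape)   # (1151, 2)
```

Real-time engines such as the Convolvotron do exactly this, but with measured HRIRs selected and interpolated as the virtual source or the listener's head moves.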
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– Crystal River Engineering Convolvotron
– Gehring Research Focal Point
– AKG CAP (Creative Audio Processor)
– Head Acoustics
– Roland Sound Space (RSS) Processor
– Mixels
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– Crystal River Engineering Convolvotron
– What is it? – It is a convolution engine that spatializes sound by
filtering audio channels with transfer functions that simulate positional
effects.
• Alphatron & Acoustetron II
– The technology is good except for time delays due to computation of
30-40 ms (which can be picked up by the ear if used with visual
inputs)
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– Gehring Research Focal Point
– What is it? – Focal Point™ comprises two binaural localization
technologies, Focal Point Type 1 and 2.
• Focal Point 1 – the original Focal Point technology, utilizing time-domain
convolution with head related transfer function based impulse responses
for anechoic simulation.
• Focal Point 2 – a Focal Point implementation in which sounds are
preprocessed offline, creating interleaved sound files which can then be
positioned in 3D in real-time upon playback.
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– AKG CAP (Creative Audio Processor)
– What is it? – A kind of binaural mixing console. The system is used to
create audio recordings with integrated Head Related Transfer
Functions and other 3D audio filters.
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– Head Acoustics
– What is it? – A research company in Germany that has developed a
spatial audio system with an eight-channel binaural mixing console
using anechoic simulations as well as a new version of an artificial
head
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– Roland Sound Space (RSS) Processor
– What is it? – Roland has developed a system which attempts to
provide real-time spatialization capabilities for both headphones and
stereo loudspeaker presentation. The basic RSS system allows
independent placement of up to four sources using time-domain
convolution.
– What makes this system special is that it incorporates a technique
known as transaural processing, or crosstalk cancellation between the
stereo speakers. This technique seems to allow an adequate spatial
impression to be achieved.
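Crosstalk cancellation can be sketched as a matrix inversion. In an idealized single-frequency view (a large simplification of real transaural processing, which inverts full transfer functions per frequency), each ear receives a mix of both loudspeakers, so feeding the speakers the inverse of that 2x2 mixing matrix delivers the intended binaural signals. The gains below are invented for illustration:

```python
import numpy as np

# Idealized crosstalk-cancellation sketch: each ear hears both
# loudspeakers, so invert the 2x2 speaker-to-ear gain matrix to
# recover independent ear signals. Gains are made-up single numbers,
# standing in for full frequency-dependent transfer functions.
H = np.array([[1.0, 0.4],     # [ipsilateral, contralateral] to left ear
              [0.4, 1.0]])    # [contralateral, ipsilateral] to right ear
C = np.linalg.inv(H)          # crosstalk-cancellation filter matrix
binaural = np.array([0.8, 0.2])   # desired signals at the two ears
speakers = C @ binaural           # what to actually feed the loudspeakers
print(np.allclose(H @ speakers, binaural))   # True
```

The fragility of this inversion (it assumes a fixed head position between the speakers) is one reason loudspeaker-based spatialization gives only an "adequate" spatial impression compared with headphones.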
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– Mixels
– The number of channels in a system corresponds to the degree of
spatial polyphony – the number of simultaneous spatialized sound
sources the system can generate. On the assumption that systems will
increase their capabilities enormously via number of channels, we
measure their number of channels in mixels.
– By way of analogy to pixels and voxels, the atomic level of sound is
sometimes called a mixel, acronymic for "sound mixing element".
Characterization and Control of
Acoustic Objects
• Implementing spatial sound
– Mixels
– But, rather than diving deeply into more spatial audio systems, the
rest of the chapter will concentrate on the nature of control interfaces
that will need to be developed to take full advantage of these new
capabilities.
Characterization and Control of
Acoustic Objects
• Non-spatial dimensions and auditory symbology
– Auditory icons - acoustic representations of naturally occurring events
that caricature the action being represented
– Earcons – elaborated auditory symbols which compose motifs into
artificial non-speech language, phrases distinguished by rhythmic and
tonal patterns
Characterization and Control of
Acoustic Objects
• Non-spatial dimensions and auditory symbology
– Filtears – a class of cues that are independent of distance and
direction. They attempt to expand the spectrum of how we use sound,
creating sounds with attributes attached to them. Think of it as sonic
typography: placing sound in space can be likened to putting written
information on a page. Filtears are dependent on source and sink.
– Example: Imagine you're telenegotiating with many people. You can
select attributes of a person's voice (distance from you, direction,
indoors/outdoors, whispers behind your ear, etc.).
Research Applications
• Virtual acoustic displays featuring spatial sound can be
thought of as enabling two performance advantages:
– Situation Awareness – Omnidirectional monitoring via direct
representation of spatial information reinforces or replaces
information in other modalities, enhancing one’s sense of presence or
realism.
– Multiple Channel Segregation – can improve intelligibility,
discrimination, selective attention among audio sources.
Research Applications
• Sonification
• Teleconferencing
• Music
• Virtual Reality and Architectural Acoustics
• Telerobotics and Augmented Audio Reality
Research Applications
• Sonification
• Sonification can be thought of as auditory visualization and can be used
as a tool for analysis, for example, presenting multivariate data as
auditory patterns. Because visual and auditory channels can be
independent from each other, data can be mapped differently to each
mode of perception, and auditory mappings can be used to discover
relationships that are hidden in the visual display.
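A common minimal form of the auditory mapping described above is pitch mapping: scale a data series onto a frequency range so trends become audible as rising or falling tones. This is a generic illustration, not a technique specific to the chapter; the pitch range is an arbitrary choice:

```python
# Simple sonification sketch: linearly map a data series onto a pitch
# range (here 220-880 Hz, an arbitrary two-octave span) so that data
# trends become audible as rising or falling tones.
def data_to_pitches(values, low_hz=220.0, high_hz=880.0):
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0   # avoid division by zero for flat data
    return [low_hz + (v - lo) / span * (high_hz - low_hz) for v in values]

pitches = data_to_pitches([1.0, 2.0, 4.0, 3.0])
print([round(p, 1) for p in pitches])   # [220.0, 440.0, 880.0, 660.0]
```

Richer sonifications map additional variables onto independent dimensions such as loudness, timbre, rhythm, or spatial position, which is where spatial sound hardware becomes relevant.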
Interface Control via Audio
Windows
• Audio Windows is an auditory-object manager.
• The general idea is to permit multiple simultaneous audio
sources, such as teleconference channels, to coexist in a modifiable
display without clutter or user stress.
Interface Design Issues: Case
Studies
• Veos and Mercury (written with Brian Karr)
• Handy Sound
• Maw
Interface Design Issues:
Case Studies
• Veos and Mercury (written with Brian Karr)
– Veos - Virtual Environment Operating System
– Sound Render Implementation - A software package that interfaces
with a VR system (like Veos).
– The Audio Browser - A hierarchical sound file navigation and audition
tool.
Interface Design Issues:
Case Studies
• Handy Sound
– Handy Sound explores gestural control of an audio window system.
– Manipulating source position in Handy Sound
– Manipulating source quality in Handy Sound
– Manipulating sound volume in Handy Sound
– Summary - Handy sound demonstrates the general possibilities of
gesture recognition and spatial sound in a multichannel conferencing
environment.
Interface Design Issues:
Case Studies
• Maw
– Developed as an interactive frontend for teleconferencing, Maw
allows the user to arrange sources and sinks in a horizontal plane.
– Manipulating source and sink positions in Maw
– Organizing acoustic objects in Maw
– Manipulating sound volume in Maw
– Summary
Conclusion
Real world examples
Sound authoring tools for
future multimedia systems
Bezzi, Marco; De Poli, Giovanni; Rocchesso, Davide
Univ di Padova, Padova, Italy
Summary
•
A framework for authoring non-speech sound objects in the context of multimedia
systems is proposed. The goal is to design specific sounds and their dynamic
behavior in such a way that they convey dynamic and multidimensional
information. Sounds are designed using a three-layer abstraction model:
physically-based description of sound identity, signal-based description of sound
quality, perception- and geometry-based description of sound projection in space.
The model is validated with the aid of an experimental tool where manipulation of
sound objects can be performed in three ways: handling a set of parameter
control sliders, editing the evolution in time of compound parameter settings, via
client applications sending their requests to the sounding engine. [Author
abstract; 26 Refs; In English]
Conference Information: Proceedings of the 1999 6th International Conference
on Multimedia Computing and Systems - IEEE ICMCS'99; Jun 7-Jun 11 1999;
Florence, Italy; Sponsored by IEEE CS; IEEE Circuit and Systems Society
Interactive 3D sound hyperstories
for blind children
Lumbreras, Mauricio; Sanchez, Jaime
Univ of Chile, Santiago, Chile
Summary
•
Interactive software is currently used for learning and entertainment purposes.
This type of software is not very common among blind children because most
computer games and electronic toys do not have appropriate interfaces to be
accessible without visual cues. This study introduces the idea of interactive
hyperstories carried out in a 3D acoustic virtual world for blind children. We have
conceptualized a model to design hyperstories. Through AudioDoom we have an
application that enables testing cognitive tasks with blind children. The main
research question underlying this work explores how audio-based entertainment
and spatial sound navigable experiences can create cognitive spatial structures
in the minds of blind children. AudioDoom presents first person experiences
through exploration of interactive virtual worlds by using only 3D aural
representations of the space. [Author abstract; 21 Refs; In English]
Conference Information: Proceedings of the CHI 99 Conference: CHI is the
Limit - Human Factors in Computing Systems; May 15-May 20 1999; Pittsburgh,
PA, USA; Sponsored by ACM SIGCHI
Any questions???
References
• Modeling Realistic 3-D Sound Turbulence
– http://www.ee.ualberta.ca/~khalili/3Dnew.html
• 3D Sound Aids for Fighter Pilots
– http://www.dsto.defence.gov.au/corporate/history/jubilee/sixtyyears18.html
• 3D Sound Synthesis
– http://www-engr.sjsu.edu/~duda/Duda.Reports.html#R1
• Binaural Beat Demo
– http://www.monroeinstitute.org/programs/bbapplet.html
• Begault, Durand R. "Challenges to the Successful Implementation of 3-D Sound", NASA-Ames Research Center, Moffett Field, CA, 1990.
• Begault, Durand R. "An Introduction to 3-D Sound for Virtual Reality", NASA-Ames Research Center, Moffett Field, CA, 1992.
References
• Burgess, David A. "Techniques for Low Cost Spatial Audio", UIST 1992.
• Foster, Wenzel, and Taylor. "Real-Time Synthesis of Complex Acoustic Environments", Crystal River Engineering, Groveland, CA.
• Smith, Stuart. "Auditory Representation of Scientific Data", Focus on Scientific Visualization, H. Hagen, H. Muller, G.M. Nielson, eds. Springer-Verlag, 1993.
• Stuart, Rory. "Virtual Auditory Worlds: An Overview", VR Becomes a Business, Proceedings of Virtual Reality 92, San Jose, CA, 1992.
• Takala, Tapio and James Hahn. "Sound Rendering", Computer Graphics, 26(2), July 1992.
One last thing
• For those who want to have a little fun, try
this:
http://www.cs.indiana.edu/picons/javoice/index.html
http://ourworld.compuserve.com/homepages/Peter_Meijer/javoice.htm