Abstract
In this paper we propose a conversational metaphor that
provides easy access to an information base in the context
of a 3D aural environment. This approach tries to exploit
the sense of hearing to the utmost. We show that it
allows us to build or adapt current hypermedia
interfaces so that they can be used by blind people.
We analyze how to represent the static architecture of a
virtual environment in which the user travels, comparing it
with existing initiatives for enabling the visually impaired to
have access to computer systems. We discuss how a (blind)
user navigates through the environment, how he can manage
and control the flow of information and how he gets
oriented in this aural framework.
Keywords
Hypermedia, auditory I/O, aids for the impaired, metaphors,
virtual reality.
INTRODUCTION
It is widely expected that hypermedia applications, in
particular those delivered on CD-ROM, will be prevalent in
certain application domains such as education.
Unfortunately, interface metaphors that emphasize
graphics, images, etc. do not take blind people into account.
There are several initiatives that enable the visually
impaired to have generic access to computing systems, for
example the Mercator project [1], or the European GUIB
project [2]. However, in this paper we present a rather
different and more specific approach:
* how blind people can benefit from new 3D sound
technology, and
* how we can produce suitable metaphors to browse
information.
New technology that derives in part from Virtual
Reality enables us to produce spatialized sound
over headphones, called 3D sound. The sound is processed
by convolving the desired sound with FIR
(Finite Impulse Response) filters, called HRTFs (Head
Related Transfer Functions), which are calculated from
special acoustic recordings taken from human ears [3]. The
final result, heard over headphones, is a sound that appears
to come from a given position in space.
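As an illustration of this processing step, the following minimal sketch filters a mono signal with one FIR filter per ear to obtain the two headphone channels. The impulse responses are toy values chosen only to show an interaural delay and level difference; they are not measured HRTFs, and the function names are hypothetical.

```python
def fir_convolve(signal, impulse_response):
    """Direct-form FIR filtering: y[n] = sum over k of h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Return (left, right) headphone channels for one source position."""
    return fir_convolve(mono, hrir_left), fir_convolve(mono, hrir_right)


# Toy impulse responses (assumption, NOT measured data): a source on the
# listener's right reaches the right ear first and at full level, and the
# left ear two samples later at half level.
HRIR_RIGHT_EAR = [1.0, 0.0, 0.0]
HRIR_LEFT_EAR = [0.0, 0.0, 0.5]

mono = [1.0, 0.5, -0.25]
left, right = spatialize(mono, HRIR_LEFT_EAR, HRIR_RIGHT_EAR)
```

A real system would use one measured filter pair per source direction and much longer impulse responses; the structure of the computation stays the same.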
These capabilities tempted us to build a
virtual aural environment as an interface
to an information system for blind people. This
acoustic environment enables us to make use of one of the
major input modalities that the visually impaired have. But
pleasant sounds placed in space do not by themselves yield
a usable and friendly information system. How can we
produce useful material? There are several papers that talk
about 3D sound, but few deal with models or metaphors to
exploit these capabilities.
THE METAPHOR
In our approach, we use a special version of the well-known
hypertext model [4]. It is based on a directed graph of nodes
and links. Nodes represent certain documents. Links reflect
semantic relationships among documents. Normally the
links are displayed, either explicitly, as a menu of choices
on the screen, or implicitly, as an action that is taken when
the user does something to an identifiable target object in
the interface. The hypertext model offers many advantages,
such as management of linked references, content not bound to
a fixed structure, dynamic interaction, etc.
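The hypertext model described above can be sketched as a small directed graph. The class names, method names and example documents below are hypothetical and serve only to make the structure concrete.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    text: str
    links: list = field(default_factory=list)  # names of target nodes


class Hypertext:
    """A directed graph: nodes are documents, links are relationships."""

    def __init__(self):
        self.nodes = {}

    def add_node(self, name, text):
        self.nodes[name] = Node(name, text)

    def add_link(self, source, target):
        """Add a directed link; both endpoints must already exist."""
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("unknown node")
        self.nodes[source].links.append(target)

    def choices(self, name):
        """Links leaving a node, i.e. the explicit menu of choices."""
        return list(self.nodes[name].links)


# Hypothetical example content.
web = Hypertext()
web.add_node("symphony", "A symphony is a large orchestral work.")
web.add_node("orchestra", "An orchestra is a large ensemble.")
web.add_node("composer", "A composer writes music.")
web.add_link("symphony", "orchestra")
web.add_link("symphony", "composer")
```

Because links are directed, `web.choices("symphony")` lists two targets while `web.choices("orchestra")` is empty.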
We propose a special hypermedia model in which each
node has a type that reflects the kind of information
it contains. Each type of node is mapped to a
speaker at a fixed position in space. The role of the
speakers is to hold a conversation, and the user can
interact with these participants, browsing and managing the
flow of the presented talk. The most interesting possibility
for the user is link selection. If at a certain point in the
talk there is a link to another concept, the speaker in charge
of that concept makes a short comment. If the user is
interested in it, he indicates the direction of the
desired speaker with a joystick and presses one of the
buttons. The user can do this because he has two cues: the
voice and, most importantly, the position in space of each
speaker. The voices were sampled from different real
persons and processed off-line in order to generate
3D voice. By means of the type assigned to the speaker,
each one gives a particular point of view on the information
and in some way assigns an anthropomorphic characteristic to
the information content, extending the idea described in [5].
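The selection of a speaker by direction can be sketched as follows. The speaker types, their azimuths and the joystick axis convention are assumptions made for the example; the sketch simply picks the speaker whose position is angularly closest to the indicated direction, and it would be called when the user presses a button.

```python
import math

# Hypothetical node types mapped to speaker azimuths in degrees
# (0 = straight ahead, 90 = right, -90 = left), in the horizontal plane.
SPEAKERS = {"historian": -60.0, "critic": 0.0, "musician": 60.0}


def joystick_azimuth(x, y):
    """Convert a joystick deflection (x to the right, y forward) to degrees."""
    return math.degrees(math.atan2(x, y))


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def select_speaker(x, y, speakers=SPEAKERS):
    """Return the speaker closest to the direction the joystick indicates."""
    pointed = joystick_azimuth(x, y)
    return min(speakers, key=lambda name: angular_distance(pointed, speakers[name]))
```

For example, `select_speaker(1.0, 1.0)` points 45 degrees to the right and therefore returns the speaker placed at 60 degrees.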
Without knowing it, the user is working with a hypermedia
system, since the hypertext structure has been mapped to the
conversation. Moreover, each link can reflect several
conversational characteristics, such as requests,
acknowledgements, counter-offers, etc. Thus, the presented
information is fine-grained, nonlinear and highly
interconnected, and its structure and types depend on the
application domain, because each domain can present different
classes of information or point-of-view assignments.
THE ENVIRONMENT
This environment is controlled by activating 3D auditory
icons, which are presented in space in the horizontal
plane of the head and selected with a joystick with buttons.
In this way we offer a kind of direct manipulation of
sound. Thanks to 3D sound, the user can
select one of several simultaneous 3D auditory icons. This
ability is known as the "cocktail party effect".
Moreover, static surroundings are simulated by using an
auditory version of the rooms metaphor, enabling us to
model a static environment architecture. Thus the user can
walk along a corridor with rooms on one side. In each room
a conversation is being held, related to the overall subject of
the system. The organization of rooms along the corridor is
not arbitrary. Their order provides a kind of index.
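The idea that the order of rooms acts as an index can be sketched as a mapping from the user's position in the corridor to the nearest room. The room topics, the door spacing and the function name are assumptions introduced for the example.

```python
# Hypothetical room topics, in index order along the corridor.
ROOMS = ["Overview", "History", "Instruments", "Composers"]
ROOM_SPACING = 3.0  # assumed distance between consecutive doors


def nearest_room(position, rooms=ROOMS, spacing=ROOM_SPACING):
    """Return the room whose door is closest to the corridor position."""
    index = round(position / spacing)
    index = max(0, min(len(rooms) - 1, index))  # stay inside the corridor
    return rooms[index]
```

Walking forward thus steps through the topics in a fixed, predictable order, much like reading down an index.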
In order to carry out task control, there is a special speaker,
called the assistant, who always stands in a fixed
position relative to the user's position. The assistant's
functions cover tasks such as backtracking and orientation of
the user by means of context-dependent advice. In this way
task control and information content are presented
homogeneously: the user interacts with different persons.
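The assistant's backtracking function can be sketched as a history stack of visited nodes. The class and method names are hypothetical, and the context-dependent advice is reduced here to naming the current node.

```python
class Assistant:
    """Keeps a visit history so that the user can ask to go back."""

    def __init__(self, start):
        self.history = [start]

    def visit(self, node):
        """Record that the user has followed a link to `node`."""
        self.history.append(node)

    def where_am_i(self):
        """Orientation advice, reduced here to naming the current node."""
        return "You are listening to: " + self.history[-1]

    def backtrack(self):
        """Return to the previous node; stay put if already at the start."""
        if len(self.history) > 1:
            self.history.pop()
        return self.history[-1]


# Hypothetical session.
assistant = Assistant("corridor")
assistant.visit("History room")
assistant.visit("Composers room")
```

Calling `assistant.backtrack()` in this session returns the user to the History room, and a further call returns him to the corridor.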
When the user so desires, the aural simulation of the
environment is reinforced by means of verbal descriptions
given by the assistant. There is evidence that this
modality creates an isomorphism between the mental model
and the simulated space [6].
IMPLEMENTATION
The prototypes were developed with Borland Pascal running
under Microsoft Windows. We have built two of them.
In the first one, the whole simulation was computed off-line
using a set of HRTFs, with limited real-time
manipulation. In the second one, another PC with a
Gravis Ultrasound card is in charge of simulating the
environment and real-time effects (such as footsteps, doors
closing, etc.). This low-cost audio board is capable of
generating an acceptable real-time 3D sound simulation.
The two machines are linked by means of NetBIOS over an
Ethernet network. With this prototype, short real-time
auditory icons can be played simultaneously with the
previously rendered 3D voice. We are now developing a
head-tracking version that makes use of a VictorMaxx
HMD (Head Mounted Display).
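Simultaneous playback of a short auditory icon with pre-rendered voice amounts to summing the two signals. The sketch below shows this for one channel in software. It is only a conceptual illustration under assumed sample conventions, not the prototype's method, which relied on the audio board of the second machine.

```python
def mix(voice, icon, offset):
    """Add a short icon into a copy of the voice track, starting at `offset`.

    Samples are assumed to be floats in [-1.0, 1.0]; sums are clipped.
    """
    out = list(voice)
    for i, sample in enumerate(icon):
        j = offset + i
        if 0 <= j < len(out):
            out[j] = max(-1.0, min(1.0, out[j] + sample))
    return out


voice = [0.5, 0.5, 0.5, 0.5]   # hypothetical pre-rendered voice samples
icon = [0.25, 0.75]            # hypothetical short auditory icon
mixed = mix(voice, icon, offset=1)
```

For stereo 3D sound the same operation would be applied to the left and right channels separately.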
CONCLUSIONS AND FUTURE WORK
The rooms-conversational metaphor provides the user with
the functionality of the system, and the 3D sound exploits a
very important auditory dimension: the position in space.
Thus we have tried to address the difficulty of
navigating in an aural interface [7]. Some topics, such as 3D
auditory icons or direct manipulation of sound, present new
challenges on which we are currently working.
One of the most attractive possibilities is the production of
self-contained material packaged on a CD-ROM. With
certain trade-offs related to real-time object manipulation,
the final product will contain a large set of 3D auditory
icons to produce the environment simulation as well as the
voices of the different speakers. It is well known that the
price of special hardware to perform 3D sound is going
down; nevertheless the suggested approach requires only a
cheap platform.
ACKNOWLEDGMENT
We would like to thank Dr. Fred Wightman of the Waisman
Center, University of Wisconsin, for having provided us
with an HRTF set.
REFERENCES
1. Mynatt E., Edwards W., The Mercator Environment. A
Non Visual Interface to X Window and Unix
Workstation. GVU Tech Report GIT-GVU-92-05,
February 1992.
2. Weber, G., Kochanek D., Stephanidis C., Homatas G.,
Access by blind people to interaction objects in MS
Windows, in Proc. ECART 2 European Conference on
the Advancement of Rehabilitation Technology
(Stockholm, May 1993), pp. 2.2.
3. Wenzel E.M., Localization in Virtual Acoustic Displays.
Presence, vol. 1, number 1 (1992), pp. 80-107.
4. Conklin J., Hypertext: An Introduction and Survey,
IEEE Computer, September 1987, pp. 17-41.
5. Muller M., Farrel R., Cebulka K., Smith J., Issues in the
Usability of Time-Varying Multimedia. Multimedia
Design, ACM Press, 1992, pp. 7-38.
6. Denis M., Visual Images as Models of Described
Environments, in Proceedings of the INSERM-SETAA
conference Non-Visual HCI (Paris, March 1993),
pp. 3-12.
7. Arons B., Hyperspeech: Navigating in speech-only
hypermedia. In Proceedings of Hypertext '91,
pp. 133-146. ACM, 1991.