CHI '95 Proceedings, Short Papers

A Metaphor for the Visually Impaired: Browsing Information in a 3D Auditory Environment

Mauricio Lumbreras, Gustavo Rossi

LIFIA - Laboratorio de Investigación y Formación en Informática de Avanzada
Dto. Informática - Universidad Nacional de La Plata
Calle 50 y 115 - 1er Piso - 1900 - La Plata
ARGENTINA
e-mail: [mauricio,grossi]@info.unlp.edu.ar

© ACM

Abstract

In this paper we propose a conversational metaphor that provides easy access to an information base in the context of a 3D aural environment. This approach tries to exploit the sense of hearing to the utmost. We show that it allows us to build new hypermedia interfaces, or to adapt current ones, so that they can be used by blind people.

We analyze how to represent the static architecture of a virtual environment in which the user travels, comparing it with existing initiatives for enabling the visually impaired to have access to computer systems. We discuss how a (blind) user navigates through the environment, how he can manage and control the flow of information and how he gets oriented in this aural framework.

Keywords

Hypermedia, auditory I/O, aids for the impaired, metaphors, virtual reality.

Introduction

It is widely expected that hypermedia applications, in particular those delivered on CD-ROM, will be prevalent in certain application domains such as education. Unfortunately, interface metaphors that emphasize graphics, images, etc. do not take blind people into account. There are several initiatives that enable the visually impaired to have generic access to computing systems, for example the Mercator project [1] or the European GUIB project [2]. In this paper, however, we present a rather different and more specific approach:
* how blind people can benefit from new 3D sound technology, and
* how we can produce suitable metaphors to browse information.

New technology, derived in part from Virtual Reality, enables us to produce a spatialized kind of sound over headphones, called 3D sound. The sound is processed by convolving the desired signal with certain FIR (Finite Impulse Response) filters, called HRTFs (Head Related Transfer Functions), which are calculated from special acoustic recordings taken at human ears [3]. The final result, heard through headphones, is a sound perceived at a certain position in space.
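
As a minimal sketch of the filtering step just described, the following Python code convolves a mono signal with a pair of FIR impulse responses, one per ear. The function names and the filter coefficients are hypothetical toy values chosen for illustration; they are not measured HRTFs and do not reproduce the filters used in the prototype.

```python
def fir_convolve(signal, impulse_response):
    """Direct-form FIR filtering: y[n] = sum over k of h[k] * x[n - k]."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def spatialize(mono, hrir_left, hrir_right):
    """Return (left, right) headphone channels for one source position."""
    return fir_convolve(mono, hrir_left), fir_convolve(mono, hrir_right)


# Toy impulse responses (assumed values, NOT measured HRTFs): a source on the
# listener's left reaches the left ear earlier and louder than the right ear.
HRIR_LEFT = [1.0, 0.5]
HRIR_RIGHT = [0.0, 0.0, 0.5, 0.25]

left, right = spatialize([1.0, 0.0, -1.0], HRIR_LEFT, HRIR_RIGHT)
```

In this toy case the right channel is a delayed and attenuated copy of the left one, which is the kind of interaural difference that a real HRTF pair encodes in far more detail.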

These capabilities prompted us to build a virtual aural environment that serves as the interface to an information system for blind people. Such an acoustic environment makes use of one of the major input modalities available to the visually impaired. But pleasant sounds placed in space do not, by themselves, make an information system usable and friendly. How can we produce useful material? Several papers discuss 3D sound, but few deal with models or metaphors that exploit these capabilities.

THE METAPHOR

In our approach, we use a special version of the well-known hypertext model [4], which is based on a directed graph of nodes and links. Nodes represent documents, and links reflect semantic relationships among documents. Normally the links are displayed either explicitly, as a menu of choices on the screen, or implicitly, as an action that is taken when the user does something to an identifiable target object in the interface. The hypertext model offers many advantages, such as management of linked references, content that is not bound to a fixed structure, and dynamic interaction.

We propose a special hypermedia model in which each node has a type that reflects the kind of information it contains. Each type of node is mapped to a speaker at a fixed position in space. The role of the speakers is to engage in a conversation, and the user interacts with these participants by browsing and managing the flow of the talk. The most interesting possibility for the user is link selection. If a certain part of the talk contains a link to another concept, the speaker in charge of that concept makes a short comment. If the user is interested, he indicates the direction of the desired speaker with a joystick and presses one of its buttons. The user can do this because he has two cues: the voice and, most importantly, the position in space of each speaker.

The voices have been sampled from different real persons and processed off-line to generate 3D voices. Through the type assigned to it, each speaker gives a particular point of view on the information and in some way lends an anthropomorphic character to the information content, extending the idea described in [5]. Without knowing it, the user is working with a hypermedia system, since the hypertext structure has been mapped onto the conversation. Moreover, each link can reflect a conversational characteristic, such as a request, an acknowledgment, a counter-offer, etc. Thus, the presented information is fine-grained, nonlinear and highly interconnected, and its structure and types depend on the application domain, because each domain can present different classes of information or point-of-view assignments.
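
The mapping from typed nodes to positioned speakers, and link selection by pointing, can be sketched as follows. This is only an illustration under stated assumptions: the node types, the azimuth layout, the topic names and all function names are hypothetical and are not taken from the prototype.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    node_type: str                              # kind of information held
    links: dict = field(default_factory=dict)   # target type -> target node name


# Assumed layout: one speaker per node type, at a fixed azimuth in degrees
# (0 = straight ahead, positive = to the user's right).
SPEAKER_AZIMUTH = {"definition": -60.0, "example": 0.0, "history": 60.0}


def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def select_speaker(joystick_azimuth):
    """Return the node type whose speaker is closest to the joystick direction."""
    return min(
        SPEAKER_AZIMUTH,
        key=lambda t: angular_distance(SPEAKER_AZIMUTH[t], joystick_azimuth),
    )


def follow_link(nodes, current, joystick_azimuth):
    """Follow the link offered by the selected speaker; stay put if none exists."""
    target_type = select_speaker(joystick_azimuth)
    return nodes[current].links.get(target_type, current)


# Hypothetical two-node information base.
nodes = {
    "sound": Node("sound", "definition", {"example": "echo"}),
    "echo": Node("echo", "example"),
}
```

With this layout, pointing the joystick roughly straight ahead while in the "sound" node selects the "example" speaker and moves the user to "echo"; pointing toward a speaker that offers no link leaves the user where he is.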

THE ENVIRONMENT

The environment is controlled by activating 3D auditory icons, which are presented in space in the horizontal plane of the head and selected with a joystick and its buttons. In this way we offer a kind of direct manipulation of sound. Thanks to 3D sound, the user can select one of several simultaneous 3D auditory icons. This ability is known as the "cocktail party effect".

Moreover, static surroundings are simulated by using an auditory version of the rooms metaphor, which enables us to model a static environment architecture. The user can thus walk along a corridor with rooms on one side. In each room a conversation is being held, related to the overall subject of the system. The organization of the rooms along the corridor is not arbitrary: their order provides a kind of index.
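
One simple way to picture the corridor is as an ordered list of rooms in which the user's position is an index. The sketch below assumes this representation; the class, its methods and the room topics are hypothetical and serve only to show how the order of the rooms can act as an index.

```python
class Corridor:
    """Rooms on one side of a corridor, kept in index order."""

    def __init__(self, rooms):
        self.rooms = list(rooms)
        self.position = 0          # index of the door the user stands beside

    def walk(self, steps):
        """Move along the corridor, clamped to its two ends."""
        last = len(self.rooms) - 1
        self.position = max(0, min(last, self.position + steps))
        return self.rooms[self.position]

    def describe(self):
        """Text that could be spoken to tell the user where he is."""
        door = self.position + 1
        return f"Door {door} of {len(self.rooms)}: {self.rooms[self.position]}"


# Hypothetical topics; their order along the corridor is the index.
corridor = Corridor(["Overview", "Instruments", "Composers"])
```

Because the position is clamped, walking past the last door simply leaves the user at the end of the corridor.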

To carry out control tasks, there is a special speaker, called the assistant, who always stands in a fixed position relative to the user. The assistant's functions cover tasks such as backtracking and orienting the user by means of context-dependent advice. In this way control tasks and information content are presented homogeneously: the user interacts with different persons.
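
Backtracking of the kind the assistant offers is commonly implemented with a history stack. The following sketch assumes that design; the class name, its methods and the wording of the advice are hypothetical and do not describe the prototype's actual code.

```python
class Assistant:
    """Keeps the user's trail so that visits can be undone one at a time."""

    def __init__(self, start):
        self.current = start
        self.history = []          # places visited before the current one

    def visit(self, place):
        """Record the current place and move to a new one."""
        self.history.append(self.current)
        self.current = place

    def backtrack(self):
        """Return to the previous place; do nothing at the starting point."""
        if self.history:
            self.current = self.history.pop()
        return self.current

    def advice(self):
        """Context-dependent orientation message (assumed wording)."""
        if not self.history:
            return f"You are at {self.current}, where you started."
        return f"You are at {self.current}. You came from {self.history[-1]}."


assistant = Assistant("the corridor")
assistant.visit("room 1")
assistant.visit("room 2")
```

Calling `assistant.backtrack()` twice from this state returns the user first to "room 1" and then to "the corridor".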

When the user so desires, the aural spatial simulation of the environment is reinforced by verbal descriptions presented by the assistant. There is evidence that this modality creates an isomorphism between the mental model and the simulated space [6].

IMPLEMENTATION

The prototype was developed with Borland Pascal running under Microsoft Windows. We have built two prototypes. In the first one, the whole simulation was produced off-line using a set of HRTFs, with limited real-time manipulation. In the second one, a second PC with a Gravis Ultrasound card is in charge of simulating the environment and real-time effects (such as footsteps, doors closing, etc.). This low-cost audio board is capable of generating an acceptable real-time 3D sound simulation. The two machines are linked by means of NetBIOS over an Ethernet network. With this prototype, short real-time auditory icons can be played simultaneously with the previously rendered 3D voice. We are now developing a head-tracking version that uses a VictorMaxx HMD (Head Mounted Display).

CONCLUSIONS AND FUTURE WORK

The rooms-conversational metaphor provides the user with the functionality of the system, and 3D sound exploits a very important auditory dimension: position in space. In this way we have tried to address the difficulty that arises when navigating in an aural interface [7]. Topics such as 3D auditory icons and direct manipulation of sound pose new challenges on which we are currently working.

One of the most attractive possibilities is the production of self-contained material packaged on a CD-ROM. With certain trade-offs related to real-time object manipulation, the final product will contain a large set of 3D auditory icons to produce the environment simulation, as well as the voices of the different speakers. It is well known that the price of special hardware for 3D sound is going down; nevertheless, the suggested approach requires only a cheap platform.

ACKNOWLEDGMENT

We would like to thank Dr. Fred Wightman of the Waisman Center, University of Wisconsin, for providing us with an HRTF set.

References

1. Mynatt E., Edwards W., The Mercator Environment. A Non Visual Interface to X Window and Unix Workstation. GVU Tech Report GIT-GVU-92-05, February 1992.
2. Weber G., Kochanek D., Stephanidis C., Homatas G., Access by blind people to interaction objects in MS Windows, in Proc. ECART 2 European Conference on the Advancement of Rehabilitation Technology (Stockholm, May 1993), pp. 2.2.
3. Wenzel E.M., Localization in Virtual Acoustic Displays. Presence, vol. 1, number 1 (1992), pp. 80-107.
4. Conklin J., Hypertext: An Introduction and Survey, IEEE Computer, September 1987, pp. 17-41.
5. Muller M., Farrel R., Cebulka K., Smith J., Issues in the Usability of Time-Varying Multimedia. Multimedia Design, ACM Press, 1992, pp. 7-38.
6. Denis M., Visual Images as Models of Described Environments, in Proceedings of the INSERM-SETAA conference Non-Visual HCI (Paris, March 1993), pp. 3-12.
7. Arons B., Hyperspeech: Navigating in speech-only hypermedia. In Proceedings of Hypertext '91, pp. 133-146. ACM, 1991.