CHI '95 Proceedings
Doctoral Consortium

Transforming Graphical Interfaces into Auditory Interfaces

Elizabeth D. Mynatt

Graphics, Visualization & Usability Center
Georgia Institute of Technology
Atlanta, GA 30332-0280
Tel: +1 (404) 894-3658
elizabeth.mynatt@cc.gatech.edu

© ACM

Keywords:

human-computer interaction, auditory interfaces, enabling technology, Mercator

SUMMARY

Imagine that you are driving to work and running late for an important meeting. Realizing that you need to print out some reports before the meeting, you call into work and start your standard word processing application. By listening to the auditory cues, you navigate to the file menu, which contains the print option. You print your reports so that they will be waiting for you when you arrive. While you're connected, you start your standard email application and listen to any new email messages. You delete some messages and file others into predefined folders, planning to reply to the rest of the messages later that morning.

Several aspects of this scenario are unique and have yet to be fully realized by current research and commercial systems. First, the application interfaces were presented completely in the auditory modality. You had no visual interface to these applications, yet you were able to navigate the applications and identify their interface objects, such as menus, by the auditory cues. Second, you were using the standard word-processing and email applications that you typically use every day via their graphical interfaces. The applications had not been modified so that they could be used remotely; instead, their graphical interfaces had been transformed into auditory interfaces. Since both interfaces (graphical and auditory) were based on a common model, you had little difficulty switching between them.

The graphical user interface (GUI) is, at this time, a common vehicle for facilitating human-computer communication. However, there are many times when a graphical user interface is inappropriate or unusable. For example, many tasks that involve monitoring or navigating the physical environment require that the user's visual attention be somewhere other than the computer screen. GUIs are also inappropriate when the computer user is blind or visually impaired [1].

This research investigates transforming graphical user interfaces into auditory user interfaces. The instantiation of this work is the design and evaluation of auditory interfaces, for a system called Mercator, that provide transparent access to X Windows applications for computer users who are blind. A critical design criterion for access systems (typically called screen-readers) is that they support collaboration with users of graphical interfaces while providing an intuitive auditory interface. The need for collaboration requires that the graphical and auditory interfaces share the same underlying conceptual model. An equally important design constraint is that this transformation be transparent to the target application: it must be accomplished without modifying the application or providing application-specific information [3].

This work contributes to the field of human-computer interaction by exploring the transformation of an application interface from the graphical modality to the auditory modality while maintaining the underlying conceptual model of the application interface. In order to achieve this transformation, two primary design issues have been addressed. The first issue is modeling the graphical interface so that it captures the salient characteristics of the application interface while discarding information relevant only in the graphical presentation. The second issue is realizing the model of the graphical interface as an auditory interface. This paper provides a summary of the research conducted to address these two issues.

Transforming Application Interfaces

The goal in transforming an application interface from one modality to another is to retain the user's mental model of the application interface while optimizing the presentation of the interface for a specific modality. Identifying the constructs of the application interface which make up a significant portion of the user's mental model is central to this effort. The stages in this process are: At the end of this process, the graphical interface is modeled as an annotated object hierarchy. While this model is simpler than many models used in user interface management systems (UIMS), it appears to be sufficient for creating a corresponding auditory interface.
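As an illustration of what an annotated object hierarchy might look like, the following is a minimal sketch and not Mercator's actual data model. The class name `InterfaceObject`, the helper `model_widget`, the role strings, and the list of attributes treated as purely graphical are all assumptions made for this example. The sketch keeps attributes that belong to the conceptual model and discards those relevant only to the graphical presentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InterfaceObject:
    """One node in an annotated object hierarchy (hypothetical schema)."""
    role: str                                   # e.g. "window", "menu", "button"
    label: str = ""
    annotations: Dict[str, object] = field(default_factory=dict)
    children: List["InterfaceObject"] = field(default_factory=list)

    def add(self, child: "InterfaceObject") -> "InterfaceObject":
        """Attach a child object and return it so calls can be chained."""
        self.children.append(child)
        return child


# Assumed set of attributes that matter only in the graphical presentation.
GRAPHICAL_ONLY = {"x", "y", "width", "height", "font", "color"}


def model_widget(role: str, label: str = "", **attrs) -> InterfaceObject:
    """Build a node, keeping only modality-independent attributes."""
    kept = {k: v for k, v in attrs.items() if k not in GRAPHICAL_ONLY}
    return InterfaceObject(role, label, kept)


def outline(node: InterfaceObject, depth: int = 0) -> List[str]:
    """Return an indented text outline of the hierarchy, one line per node."""
    lines = ["  " * depth + f"{node.role} '{node.label}'"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Usage: a window containing a menu that contains a disabled button.
root = model_widget("window", "Editor", width=640, height=480)
menu = root.add(model_widget("menu", "File", x=10, y=0, enabled=True))
menu.add(model_widget("button", "Print", color="grey", enabled=False))
print("\n".join(outline(root)))
```

In this sketch the pixel geometry and colour are dropped, while the enabled state survives as an annotation because it is meaningful in any modality.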

Interaction in the Auditory World

Designing auditory interfaces that serve as replacements for graphical interfaces is an arduous task. One design challenge is that the auditory cues must convey the contents of the interface, in addition to their more traditional role of conveying feedback about user actions and application events. Given the object-based model of the interface, the first step is to convey information about the individual objects. To support quick recognition and learnability of the auditory cues, auditory icons [2] are used to represent the interface objects. For example, an editable text field sounds like an old-fashioned typewriter, while a message bar (read-only text) sounds like a printer. The base auditory icons are then manipulated to convey attributes of the interface objects. For example, a greyed-out toggle button sounds like a muffled chain-pull light switch. Containers are represented by the sound of an opening door, while the pitch of the sound indicates the relative size of the container.
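The mapping from interface objects to manipulated auditory icons can be sketched as a small lookup plus attribute-driven modifiers. This is only an illustrative sketch: the names `BASE_ICONS`, `AuditoryCue`, and `cue_for` are hypothetical, the sound identifiers stand in for recordings, and the choice that larger containers get a lower pitch (capped at one octave) is an assumption, since the text states only that pitch indicates relative size.

```python
from dataclasses import dataclass

# Base auditory icons taken from the examples in the text; the string
# identifiers are placeholders for actual sound recordings.
BASE_ICONS = {
    "text_field": "typewriter",
    "message_bar": "printer",
    "toggle_button": "pull_chain_switch",
    "container": "opening_door",
}


@dataclass(frozen=True)
class AuditoryCue:
    sound: str
    muffled: bool = False       # True when the object is greyed out
    pitch_shift: float = 0.0    # semitones relative to the base recording


def cue_for(role: str, enabled: bool = True, child_count: int = 0) -> AuditoryCue:
    """Pick the base icon for a role and modify it to convey attributes."""
    sound = BASE_ICONS[role]
    pitch = 0.0
    if role == "container":
        # Assumption: larger containers sound lower, capped at one octave.
        pitch = -float(min(child_count, 12))
    return AuditoryCue(sound, muffled=not enabled, pitch_shift=pitch)


# Usage: a greyed-out toggle button and two containers of different sizes.
disabled_toggle = cue_for("toggle_button", enabled=False)
small_box = cue_for("container", child_count=3)
large_box = cue_for("container", child_count=10)
```

Keeping the base icon separate from its modifiers means a listener has to learn only one sound per object type, with attributes heard as variations on it.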

Given these techniques, the task of identifying which auditory cues to use remains. Two steps in selecting auditory icons are evaluating the identifiability of auditory cues and evaluating the possible conceptual mappings between the cues and concepts in interfaces. Two experiments that explore these issues have been completed. First, the identifiability of 64 short "everyday" sounds was evaluated. The experimental results highlighted sounds that were identifiable and indicated a distinction between using sounds to represent actions and using them to represent objects. In the second experiment, subjects matched sounds to common concepts in graphical interfaces. The mappings chosen indicated four types of selection, ranging from matching the sound to the semantics of the interface concept to matching the sound to the physical appearance of the interface concept. Physical parameters of the sounds, such as length and complexity, also affected how the sounds were mapped to interface concepts.

A final design issue directly ties the structure of the mental model to the interaction in the auditory interface. Methods for navigating the structure of the interface must be based on the general model of the application interface and tailored to the auditory presentation. In this work, the structure of the interface has been realized as a tree in which the interior nodes represent parent or container constructs and the leaf nodes are the individual objects. Simple navigation is accomplished by walking the tree structure via the arrow keys on the numeric keypad, while additional controls allow users to jump to different portions of the tree. Using these techniques, the physical act of navigating the interface reinforces the overall model of the interface structure.
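Tree-walking navigation of this kind can be sketched with a cursor that moves between a node, its parent, its first child, and its siblings. The sketch below is hypothetical: the `Node` and `Cursor` classes and the key names ("up", "down", "left", "right", "home") are assumptions for illustration, not Mercator's actual key bindings.

```python
class Node:
    """A node in the interface tree; containers have children, leaves do not."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


class Cursor:
    """Tracks the user's current position while walking the tree."""

    def __init__(self, root):
        self.root = root
        self.node = root

    def move(self, key):
        """Apply one key press and return the name of the resulting node.

        Assumed bindings: up = parent, down = first child,
        left/right = previous/next sibling, home = jump to the root.
        Moves that are not possible leave the cursor where it is.
        """
        n = self.node
        if key == "up" and n.parent is not None:
            self.node = n.parent
        elif key == "down" and n.children:
            self.node = n.children[0]
        elif key in ("left", "right") and n.parent is not None:
            siblings = n.parent.children
            i = siblings.index(n) + (1 if key == "right" else -1)
            if 0 <= i < len(siblings):
                self.node = siblings[i]
        elif key == "home":
            self.node = self.root
        return self.node.name


# Usage: an editor window with two menus.
tree = Node("Editor", [
    Node("File", [Node("Open"), Node("Print")]),
    Node("Edit", [Node("Copy")]),
])
cursor = Cursor(tree)
path = [cursor.move(k) for k in ("down", "right", "left", "down", "right", "up", "home")]
```

Because every key press corresponds to one edge of the tree, the sequence of moves a user makes mirrors the containment structure of the interface.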

References

1. Boyd, L. H., Boyd, W. L. and Vanderheiden, G. C. The graphical user interface: Crisis, danger and opportunity. Journal of Visual Impairment and Blindness, pages 496-502, December 1990.

2. Gaver, W. W. The SonicFinder: An interface that uses auditory icons. Human-Computer Interaction, 4:67-94, 1989.

3. Mynatt, E. D. and Edwards, W. K. Mapping GUIs to Auditory Interfaces. In UIST '92: The Fifth Annual Symposium on User Interface Software and Technology Conference Proceedings, November 1992.