CHI '95 Proceedings
Interactive Experience

Audio GUIs: Interacting with Graphical Applications in an Auditory World

Elizabeth D. Mynatt and W. Keith Edwards

Graphics, Visualization and Usability Center
Georgia Institute of Technology
Atlanta GA 30332-0280
[beth,keith]@cc.gatech.edu
(404) 894-[3658,6266]

© ACM

Abstract

One of the foremost design rules for human-computer interfaces is "Know Thy User." For designers, this rule is difficult to follow when the users are very different from us. The purpose of this interactive experience is to let people experience what interacting with graphical interfaces might be like for a blind computer user. In this exhibit, we demonstrate Mercator, a system which transforms X Windows applications into auditory interfaces. The exhibit allows individuals to interact with common graphical applications via an auditory interface. Additional applications of this work for mobile, limited-display devices are also described.

Keywords:

audio, human-computer interaction, auditory interfaces, interface models, rehabilitation engineering, users with special needs, disability

Introduction:

One important breakthrough in human-computer interfaces is the development of graphical user interfaces. These interfaces provide graphical representations for system objects such as disks and files, interface objects such as buttons and scrollbars, and computing concepts such as multi-tasking. However, there are many times when a graphical user interface is inappropriate or unusable. One example is when the task requires that the user's visual attention be directed somewhere other than the computer screen. Another example is when the computer user is blind or visually impaired [1].

Imagine interacting with your computer desktop through a primarily auditory exchange. Perhaps you are driving to work and calling into your computer with your car phone. You might want to access your normal work environment from a small, mobile device such as a PDA, or you might be blind and need to work alongside your sighted colleague. These three scenarios require that the interfaces to your standard graphical applications be transformed into efficient and intuitive auditory interfaces. The resulting auditory interface should leverage your knowledge of the graphical interface. Furthermore, for this strategy to be of general use, this transformation must be done without specific knowledge of the individual application.

The typical approach to providing access to a graphical interface is as follows: While an unmodified graphical application is running, an outside agent collects information about the application interface by watching objects drawn to the screen and by monitoring the application behavior. This outside agent (or screen reader) then translates the graphical interface into an auditory and/or tactile interface. Not only does the screen reader translate the graphical presentation into a nonvisual presentation, but it often provides different user input mechanisms which are more appropriate for the new interface.

Our work in this area began with a simple question: How can we provide access to X Window applications for blind computer users? Historically, blind computer users have had little trouble accessing standard ASCII terminals. The line-oriented textual output displayed on the screen is stored in the computer's framebuffer. An access program simply copies the contents of the framebuffer to a speech synthesizer, a Braille terminal or a Braille printer. In contrast, the contents of the framebuffer for a graphical interface are simple pixel values. To provide access to GUIs, it is necessary to intercept application output before it reaches the screen. This intercepted application output becomes the basis for an off-screen model of the application interface. The information in the off-screen model is then used to create alternative, accessible interfaces.

A primary goal of our work, called the Mercator Project, is to provide transparent access to X Windows applications for computer users who are blind or severely visually impaired. To achieve this goal, we addressed two major problems. First, to provide transparent access to applications, we created a framework which allows us to monitor, model and translate graphical interfaces of X Windows applications without modifying the applications. Second, given these application models, we developed a methodology for translating graphical interfaces into nonvisual interfaces. This methodology mimics the advantages of GUIs in a nonvisual presentation.

GUI Models

The de facto standard graphical user interface for Unix environments is the X Window System. X Windows is based on a client-server architecture where X applications communicate with a display server over a network protocol. The Mercator architecture captures information about application GUIs by utilizing hooks in the underlying libraries of the X Window System. These hooks, which were in part designed by the authors, send notifications of changes in the application interface such as when a window is created, when a button is highlighted or when a window is dismissed [4].
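As a rough illustration of how hook notifications might feed an off-screen model, the sketch below keeps a dictionary of widgets that is updated on three kinds of events mentioned above: creation, highlighting and dismissal. The class names, method names and fields are hypothetical and chosen for this example only; they are not Mercator's actual data structures or the X library hook interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Widget:
    """One node of a hypothetical off-screen model."""
    widget_id: int
    kind: str                      # e.g. "window" or "button"
    label: str = ""
    highlighted: bool = False
    parent: Optional[int] = None
    children: List[int] = field(default_factory=list)


class OffScreenModel:
    """Mirrors the application interface from hook notifications (assumed API)."""

    def __init__(self) -> None:
        self.widgets: Dict[int, Widget] = {}

    def on_created(self, widget_id, kind, label="", parent=None):
        # Record the new widget and link it under its parent, if any.
        self.widgets[widget_id] = Widget(widget_id, kind, label, parent=parent)
        if parent is not None:
            self.widgets[parent].children.append(widget_id)

    def on_highlighted(self, widget_id, state=True):
        # Track a visual state change so it can be conveyed non-visually.
        self.widgets[widget_id].highlighted = state

    def on_dismissed(self, widget_id):
        # Remove the widget and everything beneath it from the model.
        widget = self.widgets[widget_id]
        for child in list(widget.children):
            self.on_dismissed(child)
        if widget.parent is not None and widget.parent in self.widgets:
            self.widgets[widget.parent].children.remove(widget_id)
        del self.widgets[widget_id]
```

In this sketch the model never inspects pixels; it only reacts to notifications, which is the property that lets the application itself remain unmodified.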

The information gathered with the hooks forms the basis for an off-screen model of the application interface. A set of translation rules then processes the off-screen model, identifying higher-level objects in the graphical interface and then specifying the presentation of the objects in the auditory interface. The hypothesis behind this scheme is that some portions of the graphical interface directly contribute to the user's mental model of the application interface while other portions of the interface are simply artifacts of the visual presentation. An implicit question in this transformation is how space is used as an organizing medium in the graphical presentation. For example, objects such as menus are likely to be critical to the mental model of the application interface, but the spatial presentation of a group of menu buttons may be irrelevant. Does a column or row-based organization convey any information to the user? The spatial arrangement within the grouping may be important, though. For example, are the buttons evenly spaced, or are some segregated from the others using distance as the only visual cue?
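To make the idea of a translation rule concrete, here is a minimal sketch that walks a nested description of an interface, promotes a container holding only buttons to a higher-level "menu" object, and discards geometry as a visual artifact. The dictionary layout and the single rule are assumptions made for this example; the actual Mercator rule set is not described here.

```python
def translate(node):
    """Return a presentation-neutral copy of a model node.

    Hypothetical rules applied:
      1. A container whose children are all buttons is treated as a menu.
      2. Geometry is dropped, on the assumption that it is a visual artifact.
    """
    children = [translate(child) for child in node.get("children", [])]
    kind = node["kind"]
    if kind == "container" and children and all(
        child["kind"] == "button" for child in children
    ):
        kind = "menu"
    return {"kind": kind, "label": node.get("label", ""), "children": children}


# Example input: a column of two buttons inside one container.
raw_model = {
    "kind": "container",
    "label": "File",
    "geometry": (0, 0, 80, 40),
    "children": [
        {"kind": "button", "label": "Open", "geometry": (0, 0, 80, 20)},
        {"kind": "button", "label": "Quit", "geometry": (0, 20, 80, 20)},
    ],
}
menu = translate(raw_model)
```

A rule that cared about spacing between buttons would need to keep part of the geometry, which is exactly the open question raised above about when spatial layout carries meaning.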

Audio GUIs

The primary interface design question addressed in this work is: given a model for a graphical application interface, what corresponding auditory interface do we present? Mercator interfaces are made up of auditory interface components which are related to graphical interface components such as menus, buttons, dialog boxes and so on. In addition to synthesized speech, auditory icons [2] are used to identify the auditory interface components and auditory filters are used to convey attributes of those components. For example, a text-entry field is represented by the sound of an old-fashioned typewriter and a low-pass (muffling) filter conveys that the field is unavailable, that is, grayed out in a graphical interface. The label for the field is also read by the speech synthesizer.
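The text-entry example can be sketched as a small mapping from a component to the three parts of its auditory presentation: an auditory icon, a list of filters, and the text to speak. Only the typewriter icon and the low-pass filter come from the description above; the names `AudioCue`, `present` and the string identifiers are hypothetical, and no actual sound is produced.

```python
from dataclasses import dataclass
from typing import List, Optional

# Only the text-field entry is taken from the paper; other component
# kinds would need their own icons, which are not specified here.
AUDITORY_ICONS = {"text_field": "typewriter"}


@dataclass
class AudioCue:
    icon: Optional[str]   # auditory icon identifying the component type
    filters: List[str]    # filters conveying attributes of the component
    speech: str           # label read by the speech synthesizer


def present(kind: str, label: str, available: bool = True) -> AudioCue:
    """Describe how one component would be rendered in audio."""
    # An unavailable (grayed-out) component is muffled with a low-pass filter.
    filters = [] if available else ["low_pass"]
    return AudioCue(AUDITORY_ICONS.get(kind), filters, label)


cue = present("text_field", "Name", available=False)
```

Keeping icon, filters and speech as separate fields reflects the division of labor described above: the icon identifies the component, the filter conveys its state, and speech carries the label.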

Mercator provides a separate navigation method based on the hierarchy of the interface to replace the visual, spatially oriented mouse navigation used in GUIs. The relationships between objects in the interface are modeled as a tree structure. Users navigate the interface simply by walking through the tree structure [3].
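Tree-based navigation of this kind can be illustrated with a cursor that moves to a parent, a first child, or an adjacent sibling. This is a minimal sketch under assumed names (`Node`, `Navigator`); the actual Mercator key bindings and movement commands are not given here, and each method simply returns the label that would be presented to the user.

```python
class Node:
    """A node in the interface hierarchy (hypothetical structure)."""

    def __init__(self, label, children=None):
        self.label = label
        self.parent = None
        self.children = children or []
        for child in self.children:
            child.parent = self


class Navigator:
    """Keeps a cursor on one node and moves it through the tree."""

    def __init__(self, root):
        self.current = root

    def down(self):
        # Move to the first child, if there is one.
        if self.current.children:
            self.current = self.current.children[0]
        return self.current.label

    def up(self):
        # Move to the parent, staying put at the root.
        if self.current.parent is not None:
            self.current = self.current.parent
        return self.current.label

    def sibling(self, step):
        # Move to the next (+1) or previous (-1) sibling, staying put at the ends.
        parent = self.current.parent
        if parent is not None:
            index = parent.children.index(self.current) + step
            if 0 <= index < len(parent.children):
                self.current = parent.children[index]
        return self.current.label
```

Because every move is relative to the current node, the user never needs to know where an object sits on the screen, only where it sits in the hierarchy.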

Project Information

The Mercator project is a joint effort by the Georgia Tech Multimedia Computing Group (a part of the Graphics, Visualization, and Usability Center) and the Center for Rehabilitation Technology. This work has been sponsored by the NASA Marshall Space Flight Center (Research Grant NAG8-194) and Sun Microsystems Laboratories, Inc.

We are coordinating our current design efforts with the Disability Action Committee on X (DACX) which is directed by Trace Research and Development Center. This committee is made up of Unix workstation vendors (Sun, DEC, IBM), researchers, commercial access vendors, the X Consortium, and other interested parties. The goal of the committee is to design and implement standard access solutions to X Windows for people with various motor and sensory impairments.

References

1. Boyd, L.H., Boyd, W.L. and Vanderheiden, G.C. The graphical user interface: Crisis, danger and opportunity. Journal of Visual Impairment and Blindness, pages 496-502, December 1990.

2. Gaver, W.W. The SonicFinder: An interface that uses auditory icons. Human Computer Interaction, 4:67-94, 1989.

3. Mynatt, E.D. "Auditory Presentation of Graphical User Interfaces," in Kramer, G. (ed) Auditory Display: Sonification, Audification and Auditory Interfaces, Santa Fe. Addison-Wesley: Reading, MA, 1994.

4. Edwards, W.K. and Mynatt, E. "An Architecture for Transforming Graphical Interfaces," in the Proceedings of Seventh Annual ACM Symposium on User Interface Software and Technology (UIST), 1994.