Abstract
One of the foremost design rules for human-computer interfaces is "Know Thy User." For designers, this
rule is difficult to follow when the users are very different from ourselves. The purpose of this interactive exhibit
is to let people experience what interacting with graphical interfaces might be like for a blind
computer user. In this exhibit, we demonstrate Mercator, a system which transforms X Windows
applications into auditory interfaces. The exhibit allows individuals to interact with common graphical
applications via an auditory interface. Additional applications of this work for mobile, limited-display
devices are also described.
Keywords:
audio, human-computer interaction, auditory interfaces, interface models,
rehabilitation engineering, users with special needs, disability
Introduction
One important breakthrough in human-computer interfaces is the development of graphical user interfaces.
These interfaces provide graphical representations for system objects such as disks and files, interface
objects such as buttons and scrollbars, and computing concepts such as multi-tasking. However, there are
many times when a graphical user interface is inappropriate or unusable. One example is when the task
requires the user's visual attention to be somewhere other than the computer screen. Another example is when
the computer user is blind or visually-impaired [1].
Imagine interacting with your computer desktop through a primarily auditory exchange. Perhaps you are
driving to work and calling into your computer with your car phone. You might want to access your normal
work environment from a small, mobile device such as a PDA, or you might be blind and need to work
alongside your sighted colleague. These three scenarios require that the interfaces to your standard graphical
applications be transformed into efficient and intuitive auditory interfaces. The resulting auditory interface
should leverage your knowledge of the graphical interface. Furthermore, for this strategy to be of general
use, this transformation must be done without specific knowledge of the individual application.
The typical approach to providing access to a graphical interface is as follows: while an unmodified
graphical application is running, an outside agent collects information about the application interface by
watching objects drawn to the screen and by monitoring the application behavior. This outside agent (or
screen reader) then translates the graphical interface into an auditory and/or tactile interface. Not only does
the screen reader translate the graphical presentation into a nonvisual presentation, but the screen reader
often provides different user input mechanisms that are better suited to the new interface.
Our work in this area began with a simple question: How can we provide access to X Window applications
for blind computer users? Historically, blind computer users have had little trouble accessing standard ASCII
terminals. The line-oriented textual output displayed on the screen is stored in the computer's framebuffer.
An access program simply copies the contents of the framebuffer to a speech synthesizer, a Braille terminal
or a Braille printer. In contrast, the framebuffer for a graphical interface holds only pixel values, with no
record of the text or objects they depict. To provide access to GUIs, it is therefore necessary to intercept application output before it reaches the
screen. This intercepted application output becomes the basis for an off-screen model of the application
interface. The information in the off-screen model is then used to create alternative, accessible interfaces.
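The following is a minimal Python sketch of the off-screen model idea, assuming a hypothetical intercepted request type `TextRequest` (window, position, string); none of these names come from Mercator or Xlib. Each request is recorded in the model before it is forwarded to the display, so the text can later be read back in reading order even though the framebuffer itself would hold only pixels.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class TextRequest:
    """Hypothetical intercepted request to draw a string in a window."""
    window: int
    x: int
    y: int
    text: str


class OffScreenModel:
    """Keeps the text each window has drawn so it can be read back later."""

    def __init__(self) -> None:
        # window id -> {(y, x): text}; keying by (y, x) gives reading order when sorted
        self._text: Dict[int, Dict[Tuple[int, int], str]] = defaultdict(dict)

    def intercept(self, req: TextRequest,
                  forward: Optional[Callable[[TextRequest], None]] = None) -> None:
        # Record the request in the model first, then pass it on to the real display.
        self._text[req.window][(req.y, req.x)] = req.text
        if forward is not None:
            forward(req)

    def read_window(self, window: int) -> str:
        # Reading order: top to bottom, then left to right.
        cells = self._text.get(window, {})
        return " ".join(cells[pos] for pos in sorted(cells))
```

For a dialog whose prompt is drawn after its buttons, `read_window` still yields the prompt first, because the model orders text by position rather than by drawing order.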
A primary goal of our work, called the Mercator Project, is to provide transparent access to X Windows
applications for computer users who are blind or severely visually-impaired. In order to achieve this goal,
we addressed two major problems. First, in order to provide transparent access to applications, we created a
framework which could allow us to monitor, model and translate graphical interfaces of X Windows
applications without modifying the applications. Second, given these application models, we developed a
methodology for translating graphical interfaces into non-visual interfaces. This methodology mimics the
advantages of GUIs in a nonvisual presentation.
GUI Models
The de facto standard graphical user interface for Unix environments is the X Window System. X Windows
is based on a client-server architecture where X applications communicate with a display server over a
network protocol. The Mercator architecture captures information about application GUIs by utilizing
hooks in the underlying libraries of the X Window System. These hooks, which were in part
designed by the authors, send notifications of changes in the application interface such as when a window is
created, when a button is highlighted or when a window is dismissed [4].
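A sketch of how such notifications could keep an off-screen model current. The callbacks `on_create`, `on_highlight` and `on_destroy` are hypothetical stand-ins for the three kinds of notification mentioned above (window created, button highlighted, window dismissed); they are not the actual hook API of the X libraries.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Widget:
    wid: int
    kind: str                      # e.g. "shell", "menubar", "button"
    label: str = ""
    parent: Optional[int] = None
    children: List[int] = field(default_factory=list)
    highlighted: bool = False


class InterfaceModel:
    """Off-screen model updated by hypothetical hook notifications."""

    def __init__(self) -> None:
        self.widgets: Dict[int, Widget] = {}

    def on_create(self, wid: int, kind: str, label: str = "",
                  parent: Optional[int] = None) -> None:
        self.widgets[wid] = Widget(wid, kind, label, parent)
        if parent is not None:
            self.widgets[parent].children.append(wid)

    def on_highlight(self, wid: int, on: bool = True) -> None:
        self.widgets[wid].highlighted = on

    def on_destroy(self, wid: int) -> None:
        # Dismissing a widget removes its whole subtree from the model.
        w = self.widgets.pop(wid)
        for child in list(w.children):
            self.on_destroy(child)
        # Detach from a surviving parent (the parent is gone during recursive removal).
        if w.parent is not None and w.parent in self.widgets:
            self.widgets[w.parent].children.remove(wid)
```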
The information gathered with the hooks forms the basis for an off-screen model of the application
interface. A set of translation rules then processes the off-screen model, identifying higher-level objects in
the graphical interface and then specifying the presentation of the objects in the auditory interface. The
hypothesis behind this scheme is that some portions of the graphical interface directly contribute to the
user's mental model of the application interface while other portions of the interface are simply artifacts of
the visual presentation. An implicit question in this transformation is how space is used as an organizing
medium in the graphical presentation. For example, objects such as menus are likely to be critical to the
mental model of the application interface, but the spatial presentation of a group of menu buttons may be
irrelevant: does a column- or row-based organization convey any information to the user? The spatial
arrangement within the grouping may be important, however. For example, are the buttons evenly spaced, or
are some segregated from the others using distance as the only visual cue?
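The question of whether distance alone sets some buttons apart can be made concrete. This sketch is not Mercator's translation rule set; it simply treats a gap between adjacent buttons as a separator when the gap exceeds an assumed multiple (`factor`) of the median gap.

```python
from statistics import median
from typing import List, Sequence


def spatial_groups(positions: Sequence[float], factor: float = 1.5) -> List[List[float]]:
    """Split 1-D button positions into groups separated by unusually large gaps.

    `factor` is an illustrative assumption, not a value taken from Mercator.
    """
    pts = sorted(positions)
    if len(pts) < 3:
        # Too few gaps to tell "evenly spaced" from "segregated".
        return [pts] if pts else []
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    threshold = factor * median(gaps)
    groups = [[pts[0]]]
    for gap, p in zip(gaps, pts[1:]):
        if gap > threshold:
            groups.append([p])      # a large gap starts a new group
        else:
            groups[-1].append(p)
    return groups
```

With buttons at 0, 30, 60 and 90 the function reports a single evenly spaced group; adding a button at 200 splits that button off into a group of its own, which an auditory presentation could then announce separately.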
Audio GUIs
The primary interface design question addressed in this work is, given a model for a graphical application
interface, what corresponding auditory interface should we present? Mercator interfaces are made up of auditory
interface components which are related to graphical interface components such as menus, buttons, dialog
boxes and so on. In addition to synthesized speech, auditory icons [2] are used to identify the auditory
interface components and auditory filters are used to convey attributes of those components. For example, a
text-entry field is represented by the sound of an old-fashioned typewriter and a low-pass (muffling) filter
conveys that the field is unavailable, that is, grayed out in a graphical interface. The label for the field is
also read by the speech synthesizer.
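A sketch of this presentation step under stated assumptions: the icon file names in `AUDITORY_ICONS` are hypothetical, and the muffling effect is approximated with a simple one-pole low-pass filter, one of many possible filter designs and not necessarily the one Mercator uses.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping from component kind to an auditory icon (sound file name).
AUDITORY_ICONS: Dict[str, str] = {
    "text_field": "typewriter.wav",
    "button": "button_press.wav",
    "menu": "menu.wav",
}


def low_pass(samples: Sequence[float], alpha: float = 0.2) -> List[float]:
    """One-pole low-pass ("muffling") filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    out: List[float] = []
    y = 0.0
    for x in samples:
        y += alpha * (x - y)
        out.append(y)
    return out


def present(kind: str, label: str, sensitive: bool) -> Tuple[str, bool, str]:
    """Return (auditory icon, apply muffling filter?, text for the speech synthesizer)."""
    icon = AUDITORY_ICONS.get(kind, "generic.wav")
    # An unavailable ("grayed out") component is played through the low-pass filter.
    return icon, not sensitive, label
```

The attribute is carried by the filter rather than by a second sound, so the listener still recognizes the component from its icon while hearing that it is unavailable.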
Mercator provides a separate navigation method based on the hierarchy of the interface to replace the
visual, spatially oriented mouse navigation used in GUIs. The relationships between objects in the interface
are modeled as a tree structure. Users can simply navigate the user interface by walking through the tree
structure [3].
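Tree-based navigation can be sketched as a focus cursor that moves to a parent, a first child, or an adjacent sibling, returning the name of each node it lands on so that it can be announced. The `Node` and `Navigator` classes and the four movement commands are illustrative assumptions, not Mercator's actual data structures or key bindings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so list.index finds the exact node
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


class Navigator:
    """Moves a focus cursor through the interface tree instead of using a mouse."""

    def __init__(self, root: Node) -> None:
        self.focus = root

    def _move(self, target: Optional[Node]) -> str:
        # If there is nowhere to go, focus stays put and the current node is re-announced.
        if target is not None:
            self.focus = target
        return self.focus.name

    def _sibling(self, step: int) -> Optional[Node]:
        p = self.focus.parent
        if p is None:
            return None
        i = p.children.index(self.focus) + step
        return p.children[i] if 0 <= i < len(p.children) else None

    def down(self) -> str:
        return self._move(self.focus.children[0] if self.focus.children else None)

    def up(self) -> str:
        return self._move(self.focus.parent)

    def next_sibling(self) -> str:
        return self._move(self._sibling(+1))

    def prev_sibling(self) -> str:
        return self._move(self._sibling(-1))
```

Because movement follows the hierarchy rather than screen coordinates, the same four commands work for any application whose interface can be modeled as a tree.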
Project Information
The Mercator project is a joint effort by the Georgia Tech Multimedia Computing Group (a part of the
Graphics, Visualization, and Usability Center) and the Center for Rehabilitation Technology. This work has
been sponsored by the NASA Marshall Space Flight Center (Research Grant NAG8-194) and Sun
Microsystems Laboratories, Inc.
We are coordinating our current design efforts with the Disability Action Committee on X (DACX) which
is directed by the Trace Research and Development Center. This committee is made up of Unix workstation
vendors (Sun, DEC, IBM), researchers, commercial access vendors, the X Consortium, and other interested
parties. The goal of the committee is to design and implement standard access solutions to X Windows for
people with various motor and sensory impairments.
References
1. Boyd, L.H., Boyd, W.L. and Vanderheiden, G.C. The graphical user interface: Crisis, danger and
opportunity. Journal of Visual Impairment and Blindness, pages 496-502, December 1990.
2. Gaver, W.W. The SonicFinder: An interface that uses auditory icons. Human-Computer
Interaction, 4:67-94, 1989.
3. Mynatt, E.D. Auditory presentation of graphical user interfaces. In Kramer, G. (ed.),
Auditory Display: Sonification, Audification and Auditory Interfaces, Santa Fe. Addison-Wesley,
Reading, MA, 1994.
4. Edwards, W.K. and Mynatt, E.D. An architecture for transforming graphical interfaces. In
Proceedings of the Seventh Annual ACM Symposium on User Interface Software and Technology (UIST),
1994.