Keywords:
human-computer interaction, auditory interfaces, enabling technology, Mercator
SUMMARY
Imagine that you are driving to work and running late for an important meeting. Realizing that you need to
print out some reports before the meeting, you call into work and start your standard word processing
application. By listening to the auditory cues, you navigate to the file menu which has the print option. You
print your reports so that they will be waiting for you when you arrive. While you're connected, you start
your standard email application and listen to any new email messages. You delete some messages and file
others into some predefined folders, planning to reply to the rest of the messages later that morning.
Several aspects of this scenario are unique and have yet to be fully realized by
current research and commercial systems. First, the application interfaces were presented completely in the
auditory modality. You had no visual interface to these applications. But you were able to navigate the
applications and identify the interface objects in the application, such as menus, by the auditory cues.
Second, you were using your standard word-processing and email applications, which you typically use
every day via their graphical interfaces. The applications had not been modified so that they could be used
remotely, but their graphical interface had been transformed into an auditory interface. Since both interfaces
(graphical and auditory) were based on a common model, you had little difficulty switching between
them.
The graphical user interface (GUI) is, at this time, a common vehicle for facilitating human-computer
communication. However, there are many times when a graphical user interface is inappropriate or
unusable. For example, many tasks involving monitoring or navigating the physical environment require that
the user's visual attention be somewhere other than the computer screen. GUIs are also inappropriate when
the computer user is blind or visually-impaired [1].
This research investigates transforming graphical user interfaces into auditory user interfaces. The
instantiation of this work is the design and evaluation of auditory interfaces, for a system called Mercator,
that provide transparent access to X Windows applications for computer users who are blind. A critical
design criterion for access systems (typically called screen-readers) is that they support
collaboration with users of graphical interfaces while providing an intuitive auditory interface. The need for
collaboration requires that the graphical and auditory interfaces share the same underlying conceptual
model. An equally important design constraint is that this transformation be transparent to the target
application; that it be accomplished without modifying the application, or providing application specific
information [3].
This work contributes to the field of human-computer interaction by exploring the transformation of an
application interface from the graphical modality to the auditory modality while maintaining the underlying
conceptual model of the application interface. In order to achieve this transformation, two primary design
issues have been addressed. The first issue is modeling the graphical interface so that it captures the salient
characteristics of the application interface while discarding information relevant only in the graphical
presentation. The second issue is realizing the model of the graphical interface as an auditory interface. This
paper provides a summary of the research conducted to address these two issues.
Transforming Application Interfaces
The goal in transforming an application interface from one modality to another is to retain the user's mental
model of the application interface while optimizing the presentation of the interface for a specific modality.
Identifying the constructs of the application interface which make up a significant portion of the user's
mental model is central to this effort. The stages in this process are:
- Choosing an appropriate level of abstraction of the graphical interface as a starting point for the interface
model.
Graphical user interfaces can be decomposed at many levels. At the highest level, the semantic
level, different controls for reading and manipulating information are available. These operations are
couched in familiar graphical objects, such as menus, at the syntactic level. Finally, at the lexical level,
these objects are spatially presented on a two-dimensional display. Although most screen-reader
applications have focused on transforming the interface at the lexical level, this research has hypothesized
that transforming the interface at the semantic level best captures the underlying model of the user interface.
This model is then constrained with the syntactic vernacular so that the names of user-level components are
the same in the graphical and auditory interfaces.
- Identifying the major components of the model and the relationships between those components.
Graphical user interfaces are commonly constructed and presented as a collection of objects such
as windows, menus and scrollbars. These objects convey the operations available at the semantic level of
the interface while the names of these objects form the syntactic vernacular. The graphical interface is
constructed by grouping these objects into larger clusters, such as grouping controls in a dialog box.
Therefore, a reasonable starting point for constructing the interface model is identifying the user-level
objects and their hierarchical relationships. Fortunately, this information can often be retrieved by monitoring the
execution of a graphical application.
- Analyzing the visual presentation of the graphical interface to identify semantic information not represented
in the initial hierarchical object model.
For example, the spatial distribution of objects in a dialog box may be important in understanding
the relationships of those objects. Even if the contents of the dialog box are at the same level of the object
hierarchy, their spatial arrangement from left to right, top to bottom implies an optimal order for working
with the objects. In other cases, artifacts of the graphical presentation, such as space-saving techniques, can
be ignored.
At the end of this process, the graphical interface is modeled as an annotated object hierarchy. While this
model is simpler than many models used in user interface management systems (UIMS), it appears to be
sufficient for creating a corresponding auditory interface.
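
As an illustration only, the annotated object hierarchy described above could be sketched as follows. This is a minimal sketch, not Mercator's implementation: the class name InterfaceObject, its fields, the "order" annotation key, and the sample dialog are all hypothetical. It shows objects grouped hierarchically and annotated with a top-to-bottom, left-to-right working order derived from their spatial arrangement, after which the coordinates themselves are no longer needed.

```python
from dataclasses import dataclass, field


@dataclass
class InterfaceObject:
    """One user-level object (window, menu, button, ...) in the model."""
    role: str                     # syntactic name shared with the GUI, e.g. "menu"
    label: str = ""
    position: tuple = (0, 0)      # hypothetical (x, y); used only to derive order
    annotations: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child object and return it for convenience."""
        self.children.append(child)
        return child


def annotate_reading_order(node):
    """Annotate each child with its top-to-bottom, left-to-right rank."""
    ordered = sorted(node.children, key=lambda c: (c.position[1], c.position[0]))
    for index, child in enumerate(ordered):
        child.annotations["order"] = index
        annotate_reading_order(child)


# A made-up dialog whose objects sit at the same level of the hierarchy.
dialog = InterfaceObject("dialog", "Print")
dialog.add(InterfaceObject("button", "Cancel", position=(120, 80)))
dialog.add(InterfaceObject("text field", "Copies", position=(10, 20)))
dialog.add(InterfaceObject("button", "OK", position=(10, 80)))
annotate_reading_order(dialog)

reading_order = [
    c.label for c in sorted(dialog.children, key=lambda c: c.annotations["order"])
]
print(reading_order)
```

Storing the order as an annotation, rather than keeping screen coordinates, reflects the idea that only the semantic content of the layout needs to survive the transformation.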
Interaction in the Auditory World
Designing auditory interfaces that serve as replacements for graphical interfaces is an arduous task. One
design challenge is that the auditory cues must convey the contents of the interface, in addition to their more
traditional role of conveying feedback about user actions and application events. Given the object-based
model of the interface, the first step is to convey information about the individual objects. To support quick
recognition and learnability of the auditory cues, auditory icons [2] are used to represent the interface
objects. For example, an editable text field sounds like an old fashioned typewriter while a message bar
(read-only text) sounds like a printer. The base auditory icons are then manipulated to convey attributes of
the interface objects. For example, a greyed-out toggle button sounds like a muffled chain-pull light switch.
Containers are represented by the sound of an opening door while the pitch of the sound indicates the
relative size of the container.
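
The mapping from interface objects to auditory icons might be sketched as below. This is an illustrative sketch under stated assumptions, not the system's actual code: the names BASE_ICONS, AuditoryCue, and cue_for are hypothetical, sounds are represented by descriptive strings rather than audio data, and the direction of the pitch mapping (larger containers sound lower) is an assumption, since the text says only that pitch indicates relative size.

```python
from dataclasses import dataclass

# Base auditory icons named in the text; the strings stand in for sound files.
BASE_ICONS = {
    "text field": "typewriter",
    "message bar": "printer",
    "toggle button": "chain-pull light switch",
    "container": "opening door",
}


@dataclass
class AuditoryCue:
    sound: str
    muffled: bool = False      # applied to greyed-out (unavailable) objects
    pitch_scale: float = 1.0   # 1.0 means the base sound is played unchanged


def cue_for(role, enabled=True, relative_size=None):
    """Pick the base icon for a role, then adjust it to convey attributes.

    relative_size is a hypothetical value in [0, 1] for containers.
    """
    cue = AuditoryCue(sound=BASE_ICONS[role])
    cue.muffled = not enabled
    if role == "container" and relative_size is not None:
        # Assumption: larger containers map to a lower pitch (range 1.5 to 0.5).
        cue.pitch_scale = 1.5 - relative_size
    return cue


greyed_toggle = cue_for("toggle button", enabled=False)
small_box = cue_for("container", relative_size=0.0)
large_box = cue_for("container", relative_size=1.0)
print(greyed_toggle, small_box.pitch_scale, large_box.pitch_scale)
```

Keeping one base sound per object type and expressing attributes as modifications of that sound means a listener has to learn only a small vocabulary of icons.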
Given these techniques, the task of identifying which auditory cues to use remains. Two steps in selecting
auditory icons are evaluating the identifiability of auditory cues and evaluating the possible conceptual
mappings between the cues and concepts in interfaces. Two experiments that explore these issues have been
completed. First, the identifiability of 64 short "everyday" sounds was evaluated. The experimental results
highlighted sounds which were identifiable and indicated a distinction between using sounds to represent
actions or objects. In the second experiment, subjects matched sounds and common concepts in graphical
interfaces. The mappings chosen indicated four types of selection ranging from matching the sound to the
semantics of the interface concept, to matching the sound to the physical appearance of the interface
concept. Physical parameters of the sounds such as length and complexity also affected how the sounds
were mapped to interface concepts.
A final design issue directly ties the structure of the mental model to the interaction in the auditory
interface. Methods for navigating the structure of the interface must be based on the general model of the
application interface and tailored to the auditory presentation. In this work, the structure of the interface has
been realized as a tree in which the interior nodes represent parent or container constructs and the leaf nodes
are the individual objects. Simple navigation is accomplished by walking the tree structure via the arrow
keys on the numeric keypad while additional controls allow users to jump to different portions of the tree.
Using these techniques, the physical act of navigating the interface reinforces the overall model of the
interface structure.
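
Tree-based navigation of this kind could be sketched as follows. The specific key bindings are assumptions made for illustration (up moves to the parent, down to the first child, left and right to adjacent siblings); the text states only that the arrow keys on the numeric keypad walk the tree. The class names Node and Cursor and the sample interface are likewise hypothetical.

```python
class Node:
    """A node in the interface tree: a container or an individual object."""

    def __init__(self, label, children=()):
        self.label = label
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


class Cursor:
    """Tracks the user's current position in the interface tree."""

    def __init__(self, root):
        self.current = root

    def move(self, key):
        """Apply one arrow-key press; stay in place if the move is impossible."""
        node = self.current
        target = None
        if key == "up":
            target = node.parent
        elif key == "down" and node.children:
            target = node.children[0]
        elif key in ("left", "right") and node.parent is not None:
            siblings = node.parent.children
            index = siblings.index(node) + (1 if key == "right" else -1)
            if 0 <= index < len(siblings):
                target = siblings[index]
        if target is not None:
            self.current = target
        return self.current.label


# A made-up interface: a window containing two menus.
file_menu = Node("File menu", [Node("Open"), Node("Print"), Node("Quit")])
window = Node("Word processor", [file_menu, Node("Edit menu")])

cursor = Cursor(window)
path = [cursor.move(k) for k in ("down", "down", "right", "right", "right", "up")]
print(path)
```

In this sketch a move that would leave the tree simply leaves the cursor where it is; an auditory interface would presumably play a cue at that point, but that behavior is not specified in the text.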
References
1. Boyd, L. H., Boyd, W. L. and Vanderheiden, G. C. The graphical user interface: Crisis, danger and
opportunity. Journal of Visual Impairment and Blindness, pages 496-502, December 1990.
2. Gaver, W. W. The SonicFinder: An interface that uses auditory icons. Human-Computer
Interaction, 4:67-94, 1989.
3. Mynatt, E. D. and Edwards, W. K. Mapping GUIs to Auditory Interfaces. In UIST ‘92: The
Fifth Annual Symposium on User Interface Software and Technology Conference Proceedings,
November 1992.