CHI 97 Electronic Publications: Tutorials
Spoken Dialogue Interfaces
Susann LuperFoy
The MITRE Corporation
1820 Dolley Madison Blvd.
McLean, VA 22102 USA
+1 703 883 6091 (voice)
+1 703 883 1279 (fax)
luperfoy@mitre.org
Georgetown University
Department of Linguistics
Washington, DC 20057 USA
ABSTRACT
This introductory tutorial surveys recent advances and current
efforts in the integration of speech processing with other components
of spoken-dialogue systems. It examines important results in designing,
constructing, and evaluating complete conversational systems that
integrate speech recognition and synthesis with other enabling
technologies. Among the disciplines contributing material for
the course are, therefore, speech recognition and synthesis, but
also natural language processing, user-interface design, machine
translation, planning and plan recognition, gesture analysis,
computational discourse, and usability evaluation. The full-day
course comprises four sessions: an introduction
to the state of the art, a review of existing spoken interface systems,
the integration of speech processing with other interaction modalities,
and a closing session on evaluation methods, tools for developing
spoken dialogue systems, and other issues affecting the spoken
interface community.
Keywords
Speech, dialogue, conversational interfaces, natural language
© 1997 Copyright on this material is held by the authors.
INTRODUCTION
While interest in spoken language interfaces is at an all-time
high, many people question the practicality of voice as a mode
of interaction with machines. This tutorial examines the question
of appropriate uses of the speech modality in interfaces as part
of a broad introduction to the current state of the art and directions
for future research and development. We will review important
results in designing, constructing, and evaluating complete spoken-dialogue
systems that integrate speech recognition and synthesis with other
components of the user interface.
Format and Schedule
This tutorial will be conducted in lecture format with ample time
for questions and discussion. Material presented will be divided
into four segments corresponding to the two morning and two afternoon
sessions. The first segment will be devoted to introductory material
addressing the range of dialogue interface types and a review
of implemented systems that use spoken dialogue technology. The
introductory segment will also include a discussion of appropriate
uses of speech as an interaction modality, that is, when typed
or spoken natural language improves the interface, when it is a
distraction from more suitable direct-manipulation interaction, and
when we can grant the user the flexibility to choose the modality.
The second morning session will cover the component technologies
of spoken dialogue interface systems: speech recognition, syntactic
analysis, semantic analysis, discourse processing, dialogue management,
output generation, and speech synthesis. Each of these technologies
will be described in terms of its contribution to a spoken dialogue
interface application, and a brief tutorial on the current
state of the art will be given for each.
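
As a rough illustration of how the seven component technologies listed
above might fit together, the following minimal sketch chains them as
sequential stages over a single user turn. The tutorial does not
specify an architecture, so the strictly sequential ordering, the Turn
record, the run_turn function, and every stage body are assumptions
made for illustration only. Each stage is a trivial stand-in (for
example, "recognition" merely splits a text string), not a working
recognizer, parser, or synthesizer.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    """Hypothetical record of one user turn as it moves through the stages."""
    audio: str                                   # text stand-in for an audio signal
    words: List[str] = field(default_factory=list)
    parse: tuple = ()
    meaning: dict = field(default_factory=dict)
    action: str = ""
    reply_text: str = ""
    reply_audio: str = ""
    trace: List[str] = field(default_factory=list)


def speech_recognition(turn: Turn, history: List[Turn]) -> Turn:
    turn.words = turn.audio.lower().split()      # placeholder for a real recognizer
    return turn


def syntactic_analysis(turn: Turn, history: List[Turn]) -> Turn:
    turn.parse = ("UTTERANCE", tuple(turn.words))  # flat placeholder parse
    return turn


def semantic_analysis(turn: Turn, history: List[Turn]) -> Turn:
    intent = "greet" if "hello" in turn.words else "unknown"
    turn.meaning = {"intent": intent}
    return turn


def discourse_processing(turn: Turn, history: List[Turn]) -> Turn:
    # Placeholder: relate this turn to the earlier turns in the dialogue.
    turn.meaning["turn_index"] = len(history)
    return turn


def dialogue_management(turn: Turn, history: List[Turn]) -> Turn:
    turn.action = "greet_back" if turn.meaning["intent"] == "greet" else "ask_repeat"
    return turn


def output_generation(turn: Turn, history: List[Turn]) -> Turn:
    replies = {"greet_back": "Hello.", "ask_repeat": "Could you repeat that?"}
    turn.reply_text = replies[turn.action]
    return turn


def speech_synthesis(turn: Turn, history: List[Turn]) -> Turn:
    turn.reply_audio = f"<audio:{turn.reply_text}>"  # placeholder for a waveform
    return turn


Stage = Callable[[Turn, List[Turn]], Turn]

# Assumed ordering: the order in which the components are listed in the text.
PIPELINE: List[Stage] = [
    speech_recognition,
    syntactic_analysis,
    semantic_analysis,
    discourse_processing,
    dialogue_management,
    output_generation,
    speech_synthesis,
]


def run_turn(audio: str, history: List[Turn]) -> Turn:
    """Pass one user utterance through every stage and record it in the history."""
    turn = Turn(audio=audio)
    for stage in PIPELINE:
        turn = stage(turn, history)
        turn.trace.append(stage.__name__)
    history.append(turn)
    return turn


if __name__ == "__main__":
    history: List[Turn] = []
    print(run_turn("Hello there", history).reply_text)
    print(run_turn("flights to Boston", history).reply_text)
```

Keeping each component behind the same function signature is one way
to let a stage be replaced or evaluated on its own. Real systems may
instead share information between components rather than pass it
strictly forward.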
The first afternoon session is for review and discussion of several
innovative spoken dialogue interface systems. Videotaped demonstrations
of existing systems in each of several categories will be shown,
among them computer-mediated human-to-human dialogue, simulated
human-human spoken dialogue, integration of speech with gestures
and facial expressions in output generation, voice control of
visualization interfaces, voice-only systems, and systems that
combine speech and direct manipulation input channels.
The closing session will review evaluation issues such as the
DARPA-funded ATIS (Air Travel Information System) community-wide
evaluation effort. Remaining time will be devoted to a review
of some innovative tools for rapid construction of spoken dialogue
interfaces and discussion of student questions.
Target Audience
The target audience consists of consumers of research results
in spoken dialogue: government and commercial managers of technology,
students and faculty in both engineering and theoretical departments
who are developing hypotheses for longer term research, and language-system
designers and implementers who apply today's prevailing theories
to the construction of usable systems.
Students who complete this course will be familiar with the current
state of the art in research and commercial applications, will
know where to look for further references, and will
have a basis for judging the potential or actual contribution of
speech processing to a given interface system. The course will
expose students to a range of current application projects, tools
for developing spoken dialogue systems, and methods for evaluation.
REFERENCES
1. LuperFoy, S. (editor) "Automated Spoken Dialogue Systems,"
MIT Press, (forthcoming).
2. Roe, D. B. and J.G. Wilpon (editors) "Voice Communication
between Humans and Machines," National Academy Press, Washington
D.C., 1994.
3. Smith, R.W. and D.R. Hipp (editors) "Spoken Natural Language
Dialogue Systems: A Practical Approach," Oxford University
Press, 1994.
4. Waibel, A. and K.F. Lee (editors) "Readings in Speech
Recognition," Morgan Kaufmann, San Mateo, CA, 1990.