CHI 97 Electronic Publications: Tutorials

Spoken Dialogue Interfaces

Susann LuperFoy
The MITRE Corporation
1820 Dolley Madison Blvd.
McLean, VA 22102 USA
+1 703 883 6091 (voice)
+1 703 883 1279 (fax)
luperfoy@mitre.org

Georgetown University
Department of Linguistics
Washington, DC 20057 USA

ABSTRACT

This introductory tutorial surveys recent advances and current efforts in the integration of speech processing with other components of spoken-dialogue systems. It examines important results in designing, constructing, and evaluating complete conversational systems that integrate speech recognition and synthesis with other enabling technologies. The disciplines contributing material for the course therefore include not only speech recognition and synthesis but also natural language processing, user-interface design, machine translation, planning and plan recognition, gesture analysis, computational discourse, and usability evaluation. The full-day course comprises four sessions: an introduction to the state of the art, a review of existing spoken interface systems, the integration of speech processing with other interaction modalities, and a closing session on evaluation methods, tools for developing spoken dialogue systems, and other issues affecting the spoken interface community.

Keywords

Speech, dialogue, conversational interfaces, natural language

© 1997 Copyright on this material is held by the authors.



INTRODUCTION

While interest in spoken language interfaces is at an all-time high, many people question the practicality of voice as a mode of interaction with machines. This tutorial examines the appropriate uses of the speech modality in interfaces as part of a broad introduction to the current state of the art and directions for future research and development. We will review important results in designing, constructing, and evaluating complete spoken-dialogue systems that integrate speech recognition and synthesis with other components of the user interface.

Format and Schedule

This tutorial will be conducted in lecture format with ample time for questions and discussion. Material presented will be divided into four segments corresponding to the two morning and two afternoon sessions. The first segment will be devoted to introductory material addressing the range of dialogue interface types and a review of implemented systems that use spoken dialogue technology. The introductory segment will also include a discussion of appropriate uses of speech as an interaction modality, i.e., when typed or spoken natural language improves the interface, when it is a distraction from more suitable direct-manipulation interaction, and when we can grant the user the flexibility to choose the modality.

The second morning session will cover the component technologies of spoken dialogue interface systems: speech recognition, syntactic analysis, semantic analysis, discourse processing, dialogue management, output generation, and speech synthesis. Each of these technologies will be described in terms of its contribution to a spoken dialogue interface application and for each a brief tutorial on the current state of the art will be given.
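As a rough illustration of how the component technologies listed above might be chained within a single dialogue turn, the following is a minimal sketch in Python. It is not taken from the tutorial materials: the function names simply mirror the component list, every stage body is a toy placeholder, and the `State` dictionary and `run_turn` helper are assumptions introduced only for this example.

```python
from typing import Callable, Dict, List

# Hypothetical shared state passed from stage to stage (an assumption of
# this sketch, not an interface described in the tutorial).
State = Dict[str, object]
Stage = Callable[[State], State]

def speech_recognition(state: State) -> State:
    # Toy stand-in: the "audio" is already a transcript string.
    return {**state, "words": str(state["audio"]).lower().split()}

def syntactic_analysis(state: State) -> State:
    # Placeholder parse: first word is the verb, the rest are arguments.
    words = state["words"]
    verb = words[0] if words else ""
    return {**state, "parse": {"verb": verb, "args": words[1:]}}

def semantic_analysis(state: State) -> State:
    parse = state["parse"]
    meaning = {"act": parse["verb"], "topic": " ".join(parse["args"])}
    return {**state, "meaning": meaning}

def discourse_processing(state: State) -> State:
    # Append this utterance's meaning to the dialogue history.
    history = list(state.get("history", [])) + [state["meaning"]]
    return {**state, "history": history}

def dialogue_management(state: State) -> State:
    # Placeholder policy: always confirm the topic back to the user.
    act = {"type": "confirm", "topic": state["meaning"]["topic"]}
    return {**state, "response_act": act}

def output_generation(state: State) -> State:
    topic = state["response_act"]["topic"]
    return {**state, "text_out": f"You asked about {topic}."}

def speech_synthesis(state: State) -> State:
    # Stand-in for a synthesizer: tag the text instead of producing audio.
    return {**state, "audio_out": f"<speech>{state['text_out']}</speech>"}

# Stages in the order the components are listed in the text.
PIPELINE: List[Stage] = [
    speech_recognition,
    syntactic_analysis,
    semantic_analysis,
    discourse_processing,
    dialogue_management,
    output_generation,
    speech_synthesis,
]

def run_turn(audio: str, history=None) -> State:
    """Run one user utterance through every stage in order."""
    state: State = {"audio": audio, "history": history or []}
    for stage in PIPELINE:
        state = stage(state)
    return state

if __name__ == "__main__":
    result = run_turn("Show flights to Boston")
    print(result["audio_out"])
```

A strictly linear chain is the simplest possible arrangement and is chosen here only for readability; a real system may let later stages feed information back to earlier ones.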

The first afternoon session is for review and discussion of several innovative spoken dialogue interface systems. Videotaped demonstrations of existing systems in each of several categories will be shown, among them computer-mediated human-to-human dialogue, simulated human-human spoken dialogue, integration of speech with gestures and facial expressions in output generation, voice control of visualization interfaces, voice-only systems, and systems that combine speech and direct manipulation input channels.

The closing session will review evaluation issues, drawing on efforts such as the DARPA-funded ATIS (Air Travel Information System) community-wide evaluation. Remaining time will be devoted to a review of some innovative tools for rapid construction of spoken dialogue interfaces and discussion of student questions.

Target Audience

The target audience consists of consumers of research results in spoken dialogue: government and commercial managers of technology, students and faculty in both engineering and theoretical departments who are developing hypotheses for longer term research, and language-system designers and implementers who apply today's prevailing theories to the construction of usable systems.

Students who complete this course will be familiar with the current state of the art in research and commercial applications, will know where to look for further references, and will have ideas for judging the potential or actual contribution of speech processing to a given interface system. The course will expose students to a range of current application projects, tools for developing spoken dialogue systems, and methods for evaluation.

REFERENCES

1. LuperFoy, S. (editor) "Automated Spoken Dialogue Systems," MIT Press, (forthcoming).

2. Roe, D. B. and J.G. Wilpon (editors) "Voice Communication between Humans and Machines," National Academy Press, Washington D.C., 1994.

3. Smith, R.W. and D.R. Hipp (editors) "Spoken Natural Language Dialogue Systems: A Practical Approach," Oxford University Press, 1994.

4. Waibel, A. and K.F. Lee (editors) "Readings in Speech Recognition," Morgan Kaufmann, San Mateo, CA, 1990.

