



Ivan Bretan (1), Anna-Lena Ereback (1), Catriona MacDermid (2), and Annika Waern (1)
1 Swedish Institute of Computer Science, Box 1263, S-164 28 KISTA, Sweden
The DISA dialogue design methodology comprises at least
two separate stages of simulation. The first stage is based on
soliciting unrestricted, task-oriented, human-human
dialogues. For the second round of simulations, a
rudimentary dialogue model is used, describing how the
target service should behave in different stages of the
dialogue.
WAND defines the communicative space of the simulated
system: what the user can do and talk about. Since the
type of simulation we have strived for assumes that the
service has limited capabilities of understanding, the
dialogue model embodied in WAND handles only requests
that can be mapped to corresponding functionality in the
service in question. With respect to general speech
understanding competence, however, this stage of the
simulation was liberal, assuming rapid speaker-
independent continuous speech recognition with wide
grammatical coverage. The rationale was our belief that a
dialogue model derived under these circumstances would
best support the whole spectrum from linguistically
impoverished one-word command interaction to
full-fledged continuous speech understanding. Designing
dialogues that adapt to the sophistication of the
available technology as well as to the user's need for
system control is an explicit goal of the project.
As far as the actual generation of speech output is
concerned, the options for making a wizard seem machine-
like have consisted of text-to-speech conversion on the
one hand, and voice distortion on the other. However, state-
of-the-art speech synthesis is generally perceived as less
intelligible than human speech, and distorted human speech
is by definition less easy to understand than normal speech.
Neither is likely to resemble the voice output of real
automated telephone services. It turned out that digitized,
undistorted, spoken messages of the same type as those
used in existing automated telephone services (such as
voice mail systems) were quite sufficient to give the
impression that the subjects were communicating with a
machine (19 out of 20 subjects in our most elaborate study
believed this). Two factors contributed to this: (1) the
messages were spoken in the same way as in these services
(friendly but formal); and (2) combining canned spoken
messages results in a relatively strange but consistent
prosody.
2 Telia Research AB, S-136 80 HANINGE, Sweden
RESULTS
Methodological results
By basing the simulations in the second stage on a
rudimentary dialogue model, we could approximate the
behaviour of an automated service. A support tool, the
Wizard's Answering Device (WAND), represented this basic
organization of the different parts of the dialogues as a
set of panels, each having several groups of messages
arranged according to subtask, giving the wizard guidance
on which answers were appropriate in a given situation.
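As an illustration only, the organization described above (panels
containing groups of messages, arranged by subtask) could be
represented as follows. This is a minimal sketch and not the WAND
implementation; the class names, the panel name, the subtask labels
and the message texts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MessageGroup:
    """Canned messages belonging to one subtask (hypothetical structure)."""
    subtask: str
    messages: List[str] = field(default_factory=list)


@dataclass
class Panel:
    """One panel of the wizard's tool, holding several message groups."""
    name: str
    groups: List[MessageGroup] = field(default_factory=list)

    def messages_for(self, subtask: str) -> List[str]:
        """Return the messages the wizard may play for a given subtask."""
        for group in self.groups:
            if group.subtask == subtask:
                return list(group.messages)
        return []  # no guidance available for this subtask


# Invented example content for a voice-mailbox panel.
listen_panel = Panel(
    name="listen to messages",
    groups=[
        MessageGroup("play", ["You have two new messages.",
                              "Playing the first message."]),
        MessageGroup("delete", ["The message has been deleted."]),
    ],
)
```

Looking up a subtask that the panel does not cover returns an empty
list, which corresponds to the wizard having no appropriate canned
answer in that situation.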
General observations
A number of observations were made in connection with the
principal study so far (a simulation of a speech-controlled
voice-mailbox) that we believe generalize to other speech-
controlled services. Most importantly, we verified what has
been reported elsewhere [6,7], that convergence and
colouring phenomena are prevalent. In subsequent
interviews, subjects even explicitly expressed the need they
had felt to imitate system language in order to compensate
for the lack of a clear model of the competence of the
dialogue counterpart. A feedback model reflecting the
system's competence also proved important, i.e. one that
makes explicit what the system is unable to hear
(corresponding to a low acoustic score in the speech
recognizer), what it does not understand (vocabulary
unrelated to the service) and what it cannot do (when the
functionality of the service is a limitation).
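The three kinds of feedback distinguished above can be sketched as a
simple classification. This is only one possible reading: the score
threshold, the word lists and the function name are assumptions made
for the example, and no recognizer is actually involved.

```python
from enum import Enum
from typing import List, Set


class Feedback(Enum):
    NOT_HEARD = "not heard"            # low acoustic score
    NOT_UNDERSTOOD = "not understood"  # vocabulary unrelated to the service
    NOT_POSSIBLE = "not possible"      # service lacks the functionality
    OK = "ok"


def classify(acoustic_score: float,
             words: List[str],
             vocabulary: Set[str],
             supported: Set[str],
             threshold: float = 0.5) -> Feedback:
    """Decide which kind of feedback an utterance calls for.

    `vocabulary` holds words related to the service; `supported` holds
    the words that map to functionality the service offers.
    All values are hypothetical.
    """
    if acoustic_score < threshold:
        return Feedback.NOT_HEARD
    known = [w for w in words if w in vocabulary]
    if not known:
        return Feedback.NOT_UNDERSTOOD
    if not any(w in supported for w in known):
        return Feedback.NOT_POSSIBLE
    return Feedback.OK


# Invented word lists for a voice-mailbox service.
VOCABULARY = {"play", "delete", "forward", "message"}
SUPPORTED = {"play", "delete"}
```

With these lists, "forward message" is recognized as belonging to the
service but classified as not possible, whereas an utterance such as
"weather" is classified as not understood.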
Service-specific data
For every service subject to this type of simulation, a large
amount of data can be gathered in the form of transcribed
dialogues.
Following careful analysis, it is possible to revise and refine
the dialogue model to reflect in more detail an organization
of the dialogue concerning tasks and subtasks that makes
sense to users. An important part of this refined dialogue
model is information about the vocabulary that users will
want to use in order to carry out different tasks. Some
data (task organization and corresponding vocabulary) may
already have been obtained during the free dialogue
collection, but at this later stage the data will be much
more structured, through the use of the dialogue model.
FUTURE WORK
The third stage of the simulations, on which we are
currently working, aims at integrating more detailed
dialogue models into the simulations, using data from the
second stage. In this scenario, the models will contain
information not only about which messages are associated
with which subtasks, but also about which types of user
request trigger which system messages, what the flow of
dialogue looks like, the dynamics of system control and
initiative, and what meta dialogues (such as help, requests
for clarification, etc.) can be initiated. It will be
possible to load such a model into WAND, in effect
replacing parts of the cognitive processing of the wizard.
The purpose of this is to approach the restrictions of
actually implementable speech understanding as closely as
possible. In addition to using these dialogue models in
"generation mode" during the simulations, it will also be
possible to use them as part of the language analysis
machinery in the developed service.
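One way to picture such a model is as a table that maps a dialogue
state and a type of user request to a system message and a following
state, with meta dialogues available in every state. The sketch below
assumes this table-based reading; the states, request types and
message texts are invented for illustration and are not taken from
the study.

```python
from typing import Dict, Tuple

# Hypothetical model: (state, request type) -> (system message, next state).
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("main", "listen"): ("You have one new message.", "listening"),
    ("listening", "delete"): ("The message has been deleted.", "main"),
    ("listening", "done"): ("Returning to the main menu.", "main"),
}

# Hypothetical meta dialogues: available everywhere, state is unchanged.
META: Dict[str, str] = {
    "help": "You can listen to or delete your messages.",
}

FALLBACK = "That is not possible at this point."


def respond(state: str, request_type: str) -> Tuple[str, str]:
    """Return the system message and the next dialogue state."""
    if request_type in META:
        return META[request_type], state
    # Requests with no entry for the current state leave the state as is.
    return TRANSITIONS.get((state, request_type), (FALLBACK, state))
```

Keeping the model as data rather than as procedure is what would let
the same table be consulted both when selecting messages and when
deciding which request types are expected in a given state.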
ACKNOWLEDGEMENTS
We thank Björn Bergström of Telia Research, who
developed the first version of WAND.
REFERENCES
1 Amalberti, R., Carbonell, N., and Falzon, P. (1993) "User
Representations of Computer Systems in Human-
Computer Speech Interaction", Int. J. Man-Machine
Studies, vol. 38, 547-566.
2 Dahlbäck, N., Jönsson, A., and Ahrenberg, L. (1993)
"Wizard of Oz Studies - Why and How", Proceedings of
the Workshop on Intelligent User Interfaces, Orlando,
Florida.
3 Dybkjaer, L. and Dybkjaer, H. (1993) "Wizard-of-Oz
Experiments in the Development of the Dialogue Model
for P1", Report 3a, Spoken Language Dialogue Systems,
STC Aalborg University, CCI Roskilde University, CST
University of Copenhagen, Denmark.
4 Fraser, N. and Gilbert, N. (1991) "Simulating Speech
Systems", Computer Speech and Language, vol. 5, 81-89.
5 Hauptmann, A. and Rudnicky, A. (1988) "Talking to
Computers: An Empirical Investigation", Int. J. Man-
Machine Studies, vol. 28, 583-604.
6 Karlgren, J. (1992) "The Interaction of Discourse
Modality and User Expectations in Human-Computer
Dialog", Licentiate Thesis at the Dept. of Computer and
Systems Sciences, University of Stockholm, Sweden.
7 Leiser, R. G. (1989) "Exploiting Convergence to Improve
Natural Language Understanding", Interacting with
Computers, vol. 1, 284-298.