Abstract
An important issue in speech recognition is how much a speech recognizer's
accuracy rate must improve before people can detect the improvement. We are
exploring the just-noticeable difference (JND) for speech
recognition accuracy. Participants dictate pairs of 200-word passages and then
report which passage is recognized more accurately. The difference between the
accuracy rates of the two passages is progressively reduced until the participant is
unable to reliably report a difference (the method of limits). We used a
"Wizard of Oz" methodology to simulate speech recognizers with varied accuracy
rates. We are also investigating how error correction affects participants'
perception of accuracy and whether the perception of accuracy follows Weber's
Law.
Keywords
Speech recognition, recognition accuracy, JND.
Introduction
At present, it is easy to find statistically significant differences in
accuracy between speech recognizers. However, what
is a practically significant difference? How much of a difference does there
have to be between recognizers for a person to perceive it? Given the limits
of human perception, we are investigating how much of a difference developers
have to make to a speech recognition product in order for people to notice the
improvement.
Methodology
To evaluate the JND of speech recognition accuracy, we used a "Wizard of Oz"
methodology to simulate different accuracy rates, because it was not possible
to produce these controlled rates with current speech recognition engines.
Several 200-word passages with varying numbers of errors were created. The
correct passage was displayed on the screen. As the participant spoke into a
disconnected microphone, the "recognized" words would appear underneath the
correct passage, with random errors simulating less-than-perfect recognition.
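The error-insertion step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool used in the study: the function name `simulate_recognition`, the list of substitute words, and the choice to model every error as a single-word substitution are all hypothetical.

```python
import random


def simulate_recognition(words, error_rate, substitutes, seed=None):
    """Return a copy of `words` with a fixed share of words replaced.

    Assumption: each recognition error is a one-word substitution drawn
    from `substitutes`; the source only says the errors were random.
    """
    rng = random.Random(seed)
    n_errors = round(error_rate * len(words))
    # Choose distinct positions so the passage has exactly n_errors errors.
    error_positions = rng.sample(range(len(words)), n_errors)
    recognized = list(words)
    for i in error_positions:
        # Never "substitute" a word with itself, or the error would vanish.
        choices = [w for w in substitutes if w != words[i]]
        recognized[i] = rng.choice(choices)
    return recognized


# Hypothetical 200-word passage and substitute pool.
passage = [f"word{i}" for i in range(200)]
recognized = simulate_recognition(passage, 0.06, ["their", "wreck", "beach"], seed=1)
errors = sum(a != b for a, b in zip(passage, recognized))
print(errors)
```

Fixing the number of errors, rather than flipping a coin per word, keeps each passage at exactly the intended error rate.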
Participants compared pairs of passages with different error rates. They were
then asked, "Which passage was recognized most accurately?" One passage in each
pair had an error rate of 50%, 12%, 6%, 3%, or 1.5%. The second passage
initially differed from the first by enough that the difference could be easily
detected. The difference was then progressively reduced using the method of
limits to determine the JND.
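A descending method-of-limits series of this kind can be sketched as follows. The step size, the number of trials per level, and the criterion for "reliable" detection are not given in the text, so the parameters `step`, `trials_per_level`, and `min_correct`, along with the function name `descending_limit`, are hypothetical choices for illustration.

```python
def descending_limit(start_diff, step, responds_correctly,
                     trials_per_level=4, min_correct=3):
    """Shrink the difference until the participant stops detecting it.

    `responds_correctly(diff)` runs one comparison trial and returns True
    when the participant picks the more accurate passage. Returns the
    smallest difference that was still reliably detected, or None if even
    the starting difference was missed.
    """
    diff = start_diff
    last_detected = None
    while diff > 0:
        correct = sum(1 for _ in range(trials_per_level)
                      if responds_correctly(diff))
        if correct < min_correct:
            break  # detection is no longer reliable at this level
        last_detected = diff
        diff -= step
    return last_detected


# Hypothetical participant who notices any gap of 5 percentage points or more.
jnd = descending_limit(20, 2.5, lambda diff: diff >= 5)
print(jnd)
```

A fuller procedure would typically also run ascending series and average the two estimates; this sketch covers only the descending direction described above.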
Results and Discussion
Initial results suggest that the JND for accuracy is between 5% and 10% for
most people. When participants have to correct errors, the JND appears to be
even smaller.
We are working toward a guideline for developers of the form, "If you
currently have X% recognition, you will have to improve the recognition by at
least Y% for the majority of people to notice an improvement."
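If the perception of accuracy does turn out to follow Weber's Law, the guideline would take a simple proportional form. As a sketch under that assumption, let $I$ be the current level of the stimulus (whether recognition accuracy or error rate is the appropriate measure is left open here), $\Delta I$ the smallest change people notice, and $k$ a constant Weber fraction. These symbols are illustrative and do not come from the study.

```latex
\frac{\Delta I}{I} = k
\quad\Longrightarrow\quad
Y = k \, X
```

Here $X$ and $Y$ are the placeholders from the guideline. If Weber's Law does not hold, $Y$ would have to be tabulated separately for each value of $X$.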