



Artificial Intelligence Laboratory
Electrical Engineering & Computer Science Department
University of Michigan
1101 Beal Avenue
Ann Arbor, Michigan 48109-2110
(313) 763-6739
kieras@eecs.umich.edu
Artificial Intelligence Laboratory
Electrical Engineering & Computer Science Department
University of Michigan
1101 Beal Avenue
Ann Arbor, Michigan 48109-2110
(313) 763-6448
swood@eecs.umich.edu
Department of Psychology
University of Michigan
330 Packard Road
Ann Arbor, Michigan 48104
(313) 763-1477
David_E._Meyer@um.cc.umich.edu
Since the CPM-GOMS methodology and its most noteworthy application [5] are the precursors to the present
work, some
background is important to make the contribution of the present work clear (see also [7] for a general
discussion of GOMS
methodologies). CPM-GOMS is based on the Model Human Processor (MHP) [4], which is a proposal for
how human
information processing is performed by a set of perceptual and motor processors surrounding a cognitive
processor; these
processors operate in parallel with each other. During performance of a task, the human engages in
perceptual, cognitive,
and motor activities; but since these activities can overlap each other in time, the total time to execute the
task is far less than
the total of the times for the individual activities. Predicting the time required to execute the task thus
requires determining
which individual perceptual, cognitive, and motor activities are overlapped.
In the CPM-GOMS methodology, the analyst constructs a schedule chart (PERT chart) to represent the
temporal
dependencies between the various sequential and parallel activities. Once this network of activities is
constructed, the
predicted execution time between the very first and the very last activity is the total of the times on the
critical path through
the network, which is the longest duration pathway along the dependencies between the task start and
completion. The
critical path can then be examined to determine which activities actually determine the time required to
complete the task.
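The critical-path calculation described above can be illustrated with a minimal sketch. It is not the CPM-GOMS tooling itself; the function name, the activity names, and all durations are hypothetical values chosen only for illustration.

```python
from collections import defaultdict, deque

def critical_path(durations, depends_on):
    """Return (total_ms, path) for the longest-duration path through a schedule chart.

    durations:  dict mapping activity name -> duration in ms
    depends_on: dict mapping activity name -> list of activities that must finish first
    """
    successors = defaultdict(list)
    indegree = {a: 0 for a in durations}
    for activity, preds in depends_on.items():
        for p in preds:
            successors[p].append(activity)
            indegree[activity] += 1

    finish = {}      # earliest finish time of each activity
    best_pred = {}   # predecessor that determines each activity's start time
    ready = deque(a for a, d in indegree.items() if d == 0)
    while ready:
        a = ready.popleft()
        preds = depends_on.get(a, [])
        start = max((finish[p] for p in preds), default=0)
        best_pred[a] = max(preds, key=lambda p: finish[p], default=None)
        finish[a] = start + durations[a]
        for s in successors[a]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    # Walk back from the latest-finishing activity to recover the critical path.
    node = max(finish, key=finish.get)
    total = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = best_pred[node]
    return total, path[::-1]

# Hypothetical activities and durations (ms), not taken from the Ernestine models.
durations = {"perceive_beep": 100, "decide": 50, "move_eye": 180,
             "say_greeting": 600, "press_key": 280}
depends_on = {"decide": ["perceive_beep"], "move_eye": ["decide"],
              "say_greeting": ["decide"], "press_key": ["move_eye"]}
total, path = critical_path(durations, depends_on)
print(total, path)
```

In this toy chart the individual activity times total 1210 ms, but the critical path (perceiving the beep, deciding, and speaking the greeting) takes only 750 ms, because the eye movement and keystroke overlap with the speech.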
However, the practical problem with CPM-GOMS methodology is that constructing the schedule charts
required to analyze
an interface design is quite labor-intensive. The analysis is performed on a set of benchmark task scenarios,
or task
instances. For each task instance and interface design, the interface analyst must choose the particular
hypothetical pattern of
perceptual, cognitive, and motor activities, and construct the schedule chart that shows which MHP
processors are active in
what order, and which processor actions depend on which other actions. Of course, the analyst may be able
to reuse large
portions of the schedule charts if the alternative designs or tasks involve only small variations that can be
represented just by
rearrangements of portions of the schedule charts (as in the Ernestine models, [5]). But due to the work
involved, the CPM-
GOMS method is recommended for predicting execution time only when there is a small number of
benchmark tasks to be
analyzed (see [7]).
This paper presents a new family of engineering models based on the EPIC (Executive Process-Interactive
Control) human
information processing architecture developed by Kieras and Meyer [9, 12], and the earlier Cognitive
Complexity Theory
(CCT) production-system analysis of human-computer interaction [3, 10]. EPIC is similar to the Model
Human Processor
(MHP) [4], but EPIC incorporates many recent theoretical and empirical results about human performance
in the form of a
computer simulation modeling software framework. Using EPIC, a generative model can be constructed
that represents the
general procedures required to perform a complex multimodal task as a set of production rules. (The term
generative is used
analogously to its sense in formal linguistics. The syntax of a language can be represented compactly by a
generative
grammar, a set of rules for generating all of the grammatical sentences in the language.) When the model is
supplied with
the external stimuli corresponding to a specific task instance, it will then execute the procedures in whatever
specific way the
task instance requires, thus simulating a human performing the task, and generating the predicted actions
and their time
course.
If a generative model based on EPIC can be applied to predicting execution time in a high-performance
task, it should be
considerably more efficient than the CPM-GOMS approach. Preliminary work with an EPIC model of the
telephone
operator tasks [13] was encouraging, showing fairly good accuracy in predicting task and event times for a
small set of task
instances. However, this preliminary model was constructed in a "scientific" mode, in which the model was
developed
iteratively to provide a good fit to a single protocol, and was then validated against two other protocols.
But for an
engineering model to be most useful, it should be usefully accurate in an a priori mode, requiring little or no
"tuning" based
on empirical task observation.
Thus the work reported here investigated the extent to which usefully accurate predictions could be made
with predictive
EPIC models that are based on a priori task analysis and principles of construction.
EPIC was designed to explicitly couple the basic information processing and perceptual-motor mechanisms
represented in
the MHP with a cognitive analysis of procedural skill, namely that represented by production-system
models such as CCT
[3], ACT [1], and SOAR [11]. Thus, EPIC has a production-rule cognitive processor surrounded by
perceptual-motor
peripherals; applying EPIC to a task situation requires specifying both the production-rule programming for
the cognitive
processor, and also the relevant perceptual and motor processing parameters. EPIC computational task
models are generative
in that the production rules supply general procedural knowledge of the task, and thus when EPIC interacts
with a simulated
task environment, the model generates the specific sequence of serial and parallel activities required to
perform specific
tasks. The model is driven by a task instance description that consists only of the sequence and timing of
events external to
the user, such as which characters appear at what location on the screen at what time, possibly in response
to actions
performed by the user. Thus the task analysis reflected in the model is general to a class of tasks, rather
than reflecting
specific task scenarios.
Figure 1 shows the overall structure of processors and memories in the EPIC architecture. At this level,
EPIC is rather
conventional, and closely resembles the MHP. However, there are some important new concepts in the
EPIC architecture;
this brief presentation will highlight some key properties of EPIC that both distinguish it from the MHP and
are important for
the work reported here. More details can be found in [9, 12]. It is important to note that EPIC was used "as
is" for the
modeling work reported here; the details and parameters of the architecture had been developed in other
task domains and
modeling projects.
As shown in Figure 1, there is a conventional flow of information from sense organs, through perceptual
processors, to a
cognitive processor (consisting of a production rule interpreter and a working memory), and finally to motor
processors that
control effector organs.
FIGURE 1.
Overall structure of the EPIC architecture showing information flow paths as solid lines,
mechanical control or
connections as dotted lines. The processors run independently and in parallel; task performance is
simulated by having the
EPIC model interact with a simulated task environment.
There are separate perceptual processors with distinct processing time characteristics, and separate motor
processors for
vocal, manual, and oculomotor (eye) movements. There are feedback pathways from the motor processors,
as well as tactile
feedback from the effectors, which are important in coordinating multiple tasks. The declarative/procedural
knowledge
distinction of the "ACT-class" cognitive architectures [1] is represented in the form of separate permanent
memories for
production rules and declarative information. Working memory (WM) contains all of the temporary
information tested for
and manipulated by the production rules, including control information such as task goals and sequencing
information, and
also conventional working memory items, such as representations of sensory inputs.
A single stimulus input to a perceptual processor can produce multiple outputs to be deposited in WM at
different times. The
first output is a representation that a perceptual event has been detected, followed later by a representation
that describes the
recognized event. The perceptual processors in EPIC are "pipelines," in that an input produces an output at
a certain later
time, independently of what particular time it arrives.
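The pipeline property of the perceptual processors can be sketched as follows. This is only an illustration of the timing behavior described above: the function name and the detection and recognition delays are assumptions, not EPIC parameter values.

```python
def perceive(stimuli, detect_ms, recognize_ms):
    """Return the time-ordered working-memory deposits for a list of stimuli.

    stimuli: list of (onset_ms, name) pairs
    Each stimulus yields a 'detected' item and, later, a 'recognized' item.
    The delays are fixed, so the outputs do not depend on when the input arrives
    or on what else is in the pipeline.
    """
    deposits = []
    for onset, name in stimuli:
        deposits.append((onset + detect_ms, "detected", name))
        deposits.append((onset + recognize_ms, "recognized", name))
    return sorted(deposits)

# Hypothetical delays: 50 ms to detect, 200 ms to recognize.
for deposit in perceive([(0, "beep"), (30, "digit-5")], detect_ms=50, recognize_ms=200):
    print(deposit)
```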
The cognitive processor is programmed in terms of production rules, and so in order to model a task, we
must supply a set of
production rules that specify what actions in what situations must be performed to do the task. We are
using the
Parsimonious Production System (PPS) interpreter, which is especially suited to task modeling work, as in
the CCT models
[3]. One important feature of PPS is that control information such as the current goals is simply another
type of WM item,
and so can be manipulated by rule actions. The cognitive processor accepts input only at the beginning of
each cycle, and
produces output at the end of the cycle, whose mean duration we estimate at 50 ms. A critical difference
with the MHP and
many other production system architectures is that on each cognitive processor cycle, any number of rules
can fire and
execute their actions; this parallelism is a fundamental feature of PPS. Thus, unlike the MHP, the EPIC
cognitive processor
is not constrained to be doing only one thing at a time. Rather, multiple processing threads can be
represented simply as sets
of rules that happen to run simultaneously.
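The cycle-based, parallel rule firing described above can be sketched in a few lines. This is a simplified illustration, not the PPS interpreter: the rule format (dictionaries of condition, add, and delete sets) and the rule names are assumptions made for the example; only the 50 ms cycle time comes from the text.

```python
CYCLE_MS = 50  # mean cognitive processor cycle duration given in the text

def run_cycles(rules, working_memory, n_cycles):
    """Run a toy production system in which every matching rule fires on each cycle.

    rules: list of dicts with keys "name", "if" (set of WM items that must be present),
           and optionally "add" and "delete" (sets of WM items).
    Returns the final working memory and a trace of (cycle_end_ms, fired_rule_names).
    """
    wm = set(working_memory)
    trace = []
    for cycle in range(n_cycles):
        snapshot = frozenset(wm)  # input is accepted only at the start of the cycle
        fired = [r for r in rules if r["if"] <= snapshot]
        # Output is produced at the end of the cycle: deletions first, then additions.
        for r in fired:
            wm -= r.get("delete", set())
        for r in fired:
            wm |= r.get("add", set())
        trace.append(((cycle + 1) * CYCLE_MS, [r["name"] for r in fired]))
    return wm, trace

# Two hypothetical threads: one greets the customer, the other looks and then keys.
rules = [
    {"name": "greet", "if": {"goal:greet"}, "add": {"said:greeting"}, "delete": {"goal:greet"}},
    {"name": "look", "if": {"goal:look"}, "add": {"eye:moved"}, "delete": {"goal:look"}},
    {"name": "key", "if": {"eye:moved"}, "add": {"key:pressed"}, "delete": {"eye:moved"}},
]
wm, trace = run_cycles(rules, {"goal:greet", "goal:look"}, n_cycles=2)
print(trace)
```

Here the "greet" and "look" rules fire together on the first cycle, so the two threads advance simultaneously rather than competing for a single rule firing.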
The EPIC motor processors are much more elaborate than those in the MHP. Certain results motivate our
assumptions that
the motor processors operate independently, but the hands are bottlenecked through a single manual
processor, and so
normally can be operated only either one at a time, or synchronized with each other. Current research on
movement control
suggests that movements are specified in terms of features, and the time to produce a movement depends on
its feature
structure as well as its mechanical properties. We have represented this property in highly simplified
models for the motor
processors. The input to the motor processors consists of a symbolic name for the desired movement, or
movement feature.
The processor recodes the symbol into a set of movement features, and then initiates the movement. An
important empirical
result is that effectors can be preprogrammed if the movement can be anticipated. In our model, this takes
the form of
instructing the motor processor to generate the features, and then at a later time instructing the movement to
be initiated. As
a result of the pre-generation of the features, the resulting movement will be made sooner. Finally, we
assume that a motor
processor can prepare only one movement at a time, but this preparation can be done in parallel with the
physical execution
of a previously commanded movement.
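The last assumption, that preparation of one movement can overlap the execution of the previous one, can be illustrated with a small timing sketch. It is a deliberately simplified stand-in for the EPIC motor processors: the function name and the preparation and execution times are hypothetical.

```python
def movement_end_times(movements, overlap_preparation):
    """Return the completion time (ms) of each movement in a sequence.

    movements: list of (prep_ms, exec_ms) pairs for one motor processor.
    overlap_preparation: if True, the next movement is prepared as soon as the
        previous one has been initiated; if False, preparation waits until the
        previous movement has been completed.
    Only one movement is prepared at a time and only one is executed at a time.
    """
    prev_exec_start = 0
    prev_exec_end = 0
    ends = []
    for prep_ms, exec_ms in movements:
        prep_start = prev_exec_start if overlap_preparation else prev_exec_end
        prep_done = prep_start + prep_ms
        exec_start = max(prep_done, prev_exec_end)  # the effector must be free
        prev_exec_start = exec_start
        prev_exec_end = exec_start + exec_ms
        ends.append(prev_exec_end)
    return ends

# Three hypothetical keystrokes, each with 100 ms preparation and 150 ms execution.
keys = [(100, 150)] * 3
print(movement_end_times(keys, overlap_preparation=False))  # [250, 500, 750]
print(movement_end_times(keys, overlap_preparation=True))   # [250, 400, 550]
```

With these illustrative values, overlapping preparation with execution removes the preparation time of every movement after the first from the total.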
Briefly, the tasks analyzed in this report involve a human operator who sits at a computer-based workstation
and assists
customers to complete telephone calls. The specific class of tasks analyzed was one in which the
customer dials "0"
followed by the destination telephone number, but then needs to supply orally a billing number to the
operator (hereafter
termed the user). The task begins when the workstation beeps to announce the arrival of a call, and then the
workstation
displays a variety of items on the screen about the call characteristics. The user must greet the customer
with one of two
greetings depending on whether the customer is calling from a pay phone or a private phone.
The major activity in the task is to use the screen information and the customer speech to determine which
keys to press to
specify the billing class of the call and then enter the billing number into the workstation, which then checks
the number for
validity. After getting the billing information from the customer, the user says "thank you." When the
workstation validates
the number, the user presses the POSITION RELEASE key to allow the call to proceed and signal readiness
to handle the
next call.
Many of the task activities can be overlapped; for example, the user typically starts pressing keys while the
customer is still
speaking, and can overlap much of his or her own speech with such activity and while waiting for the
workstation to respond.
An immediate insight from the EPIC architecture is that there are multiple possibilities for performing task
activities in
parallel. Accordingly, a series of models was constructed that represented discrete points on a continuum
starting with a
purely hierarchical and sequential description of the task, through models that took advantage of the parallel
processing
possibilities of the cognitive architecture, to models that represented highly optimized utilizations of the
architecture. Thus
the sequence of models represents a hypothetical increase in processing efficiency and sophistication, which
presumably
would be related to the degree of practice in the task. Since the users producing the data were highly
experienced, it was
expected that one of the more optimized models would provide the best account of their performance.
The first model, termed the Hierarchical Motor-Sequential model, was based on a straightforward GOMS
model for the task
that followed the NGOMSL notation [8] for describing task procedures as a hierarchical set of methods
consisting of
sequentially executed actions. Figure 2 shows the hierarchy of Goals and Methods for the GOMS model for
the task. The
production rules implemented this GOMS model in a style similar to the CCT templates described in [3]. In
particular, each
GOMS method entailed executing a pair of "housekeeping" productions corresponding to the entry and
return from a
submethod (see [3]), and a separate production rule for each basic perceptual or motor operator step in the
method. The
production rule for each step always waited for any motor action to be completed before it would fire to
instruct the next
motor action, or to invoke a submethod. Likewise, if an action was taken to acquire perceptual information,
such as an eye
movement, the next rule to fire always waited until the perceptual information was available. This model
had a total of 50
production rules; one rule for each step in each method plus the additional "housekeeping" rules for each
method.
Although it had strictly sequential methods, the Hierarchical Motor-Sequential model overlaps some of the
task activities;
for example, typing the billing number can begin while the customer is still speaking digits. Using a
"pipeline" approach
similar to John's [6] model of transcription typing, as each digit arrives in Working Memory, the cognitive
processor sends
the corresponding key press command to the manual motor processor as soon as it finishes the previous
keystroke.
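The digit-typing pipeline can be sketched as a simple timing calculation. This is an illustration only: the function name and the digit arrival times are hypothetical, and the 280 ms keystroke duration is borrowed from the Keystroke-Level Model value used later in this paper rather than from the EPIC motor processor.

```python
def keystroke_end_times(digit_arrival_ms, keystroke_ms):
    """Return the time (ms) at which each digit's keystroke is completed.

    Each keystroke starts as soon as its digit is available in working memory
    and the previous keystroke has finished, whichever is later.
    """
    ends = []
    hand_free = 0
    for arrival in digit_arrival_ms:
        start = max(arrival, hand_free)
        hand_free = start + keystroke_ms
        ends.append(hand_free)
    return ends

# Hypothetical arrival times of four spoken digits in working memory.
print(keystroke_end_times([0, 400, 800, 850], keystroke_ms=280))
# -> [280, 680, 1080, 1360]
```

In this example the first three keystrokes are paced by the arrival of the digits, and only the fourth has to wait for the hand to become free.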
The second model, the Hierarchical Motor-Parallel model, assumed that the user could take advantage of
the motor
processor's ability to prepare the next movement while a movement is currently underway, and so physical
execution of the
next movement can be initiated as soon as the current movement is complete. The production rules from
the previous model
were simply modified so that they no longer waited for actions to be completed. Rather, each rule that
instructed a motor
processor merely waited for the relevant motor processor to be ready to prepare a new movement, and if the
next rule did not
use that processor, it did not have to wait for it to finish. Thus, activities involving different processors
could be performed
in parallel, and preparations for the next movement could be made in parallel with the execution of a
movement. As a
result, many purely cognitive activities, such as the rules performing method housekeeping, could then
execute while
perceptual-motor actions were taking place.
The third model, the Hierarchical Prepared Motor-Parallel model, assumed that the user would anticipate
the eye or hand
movements by instructing the motor processor to prepare movements in advance, as soon as it was ready to
accept movement
instructions, and as early as logically possible. This advance preparation results in substantial time savings
(typically 100-
250 ms) when the movement is actually to be made. Note that EPIC's motor processors do not impose a
time penalty for a
movement preparation that is subsequently not used or is overwritten by a different movement instruction.
Thus it is possible
to speed up performance if the likely next keystroke can be predicted.
This model was constructed by adding additional production rules to the Motor-Parallel model to send the
preparation
instructions to the motor processors at the right time. Such preparation was possible only for movements
that could be
assumed to be constant at that point in the task; for example, typing a digit of the billing number could not
be prepared in
advance, since the billing number would vary from task to task. In contrast, pressing the billing category
key could be
prepared far in advance, given that the task structure makes it reasonable to assume that this key is probably
the next one to
be hit.
A fourth model, the Hierarchical Premove/Prepared Motor-Parallel model, went further, by actually making
the movements
in advance if logically possible, and then preparing for any subsequent motion. Thus certain keystrokes
could be anticipated
by moving the hand to the location of the key in advance, and then programming the actual keystroke
movement. Thus both
the physical movement and the motor programming were done as much in advance as possible, further
speeding task
execution.
FIGURE 2.
The hierarchy of goals and methods in the GOMS model for the telephone operator task.
Connections labeled as selection rules indicate possible additional subgoals.
The original Hierarchical Motor-Parallel model was then modified in a different direction, one involving
flattening the
methods. According to principles proposed in learning theories such as ACT [2] and SOAR [11], the
method housekeeping
and other such rules would be replaced as a result of practice by a more efficient set of rules that effectively
turn "subroutine"
methods into "in-line" methods. For example, a rule that invoked the submethod for entering a billing
number would be
replaced by a rule that simply performed the first substantive step for entering the billing number, and which
then chained to
the next step. The resulting rule set could be represented as a tree, in which each class of task would be
performed by a
sequence of rule firings along a single linear path through the tree, and each rule performs some substantive
task action or
decision, with no housekeeping rules. However, as in the Hierarchical Motor-Parallel models, the
perceptual-motor activities
can overlap substantially.
The Flattened Method models are perhaps closest to the CPM-GOMS models for the telephone operator
tasks [5], in that the
methods consist simply of sequences of operators, with no hierarchical submethod structure (see [7, 8] for
more discussion of
this distinction).
The rule set for the fifth model, the Flattened Motor-Parallel model, was constructed by modifying the
Hierarchical Motor-
Parallel model to concatenate the steps of separate methods, with selection rules being replaced by simple
conditional tests
on each branch. A sixth model, the Premove/Prepared Flattened Motor-Parallel model, incorporated the
same advance
movement and preparation as the Premove/Prepared Hierarchical Motor-Parallel model. Because the
minimum number of
activities are on the critical path, this model produces the fastest execution times.
The basic question is how well the a priori constructed models predict actual task performance data. Using
videotaped task
performances collected, but not analyzed, during the Gray, John, and Atwood [5] Project Ernestine, we
selected task
instances covered by the models, and in which the operator made no substantial overt errors in performance,
and the
customer provided the relevant task information smoothly, without discussion with the operator. A set of
four task instances
for each of two users was selected. The video and audio recordings of the selected task instances were
digitized at full
frame rate, and the times of individual events (display changes, words of speech, and keystrokes) were
determined to the
nearest video frame (1/30 sec).
Each of the eight task instances was simulated with the EPIC models by programming the environment
simulation module
with the times of the externally-determined events (e.g., response time of the workstation, timing of each
word of the
customer's speech), and then running the EPIC system with the production rules for each model. All
perceptual-motor
parameters were kept fixed at values previously determined in earlier work [12, 13]. Thus the execution
time predictions
produced by the different models differed only as a result of how the production rules controlled the EPIC
architecture.
To provide a basis for judging the relative contribution of the EPIC models, the total task execution time was
predicted for
each task instance using the Keystroke-Level Model, which usually produces usefully accurate results in
ordinary computer
interface applications [4, 7]. The predicted task execution time was simply the total of the observed
relevant workstation
response times, the customer and user speaking times, and the total time for keystrokes (280 ms each) and
homing operators
(400 ms each).
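The Keystroke-Level Model baseline described above amounts to a simple sum, sketched below. The operator times (280 ms per keystroke, 400 ms per homing operator) are those given in the text; the function name and the example task instance are hypothetical.

```python
KEYSTROKE_MS = 280  # per keystroke, as stated in the text
HOMING_MS = 400     # per homing operator, as stated in the text

def klm_total_ms(workstation_ms, speech_ms, n_keystrokes, n_homings):
    """Predict total task time as a strictly serial sum of all component times.

    workstation_ms: observed workstation response times (ms)
    speech_ms: observed customer and user speaking times (ms)
    """
    return (sum(workstation_ms) + sum(speech_ms)
            + n_keystrokes * KEYSTROKE_MS + n_homings * HOMING_MS)

# Hypothetical task instance: two system responses, three utterances,
# twelve keystrokes, and two homing movements.
print(klm_total_ms([500, 300], [1200, 2500, 400], n_keystrokes=12, n_homings=2))
# -> 9060
```

Because every component is simply added, this baseline cannot represent any overlap between speech, system delays, and keying.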
Total Task Execution Times. Predicting the total task execution time is the key contribution of engineering
models for this
type of task. The normal definition of this time is the duration between the initial call arrival signal tone
and the last
keystroke, the POSITION RELEASE key. According to the workstation training materials, a certain screen
event is the
proper signal for hitting this key, and this was assumed in the GOMS analysis underlying the models.
However, according to
our informants, the POSITION RELEASE key is not very constrained by the task structure; in fact, it can be
pressed at a
variety of times, even well in advance of the screen event, and indeed the timing of this keystroke was quite
unstable in the
observed data. Accordingly, the total task execution time was calculated as the time to press the
penultimate key, the
START key, which is struck immediately after the last digit of the billing number is entered. The predicted
task execution
times for each model were compared to these observed task execution times.
All of the models, even the Keystroke-Level Model, accounted for a statistically significant 83% or more of
the variance in
the task execution times. This is due to the fact that the major determinant of the task execution time is the
length of the
billing number supplied by the customer, and all the models predict that the execution time will be longer as
the length of the
billing number increases. However, the goal of good engineering models is to supply predicted values of
usability metrics
that are not merely correlated with the empirically measured values, but are actually similar in numerical
value. Figure 3
shows the average absolute error in prediction, expressed as a percentage of average observed value. The
dotted line shows
10% error, a common rule of thumb for a useful level of prediction accuracy. The average absolute error of
prediction
ranges from 7% for the relatively simple Hierarchical Motor-Parallel model, to 14% for the worst-fitting
EPIC model, to
28% for the Keystroke-Level Model.
FIGURE 3.
Average absolute error of prediction of total task execution time for each model.
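The accuracy measure plotted in Figure 3, the average absolute error of prediction expressed as a percentage of the average observed value, can be computed as in the following sketch. The function name and the example values are hypothetical and are not data from this study.

```python
def avg_abs_error_pct(predicted, observed):
    """Average absolute prediction error as a percentage of the mean observed value."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and of equal length")
    mean_abs_error = sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * mean_abs_error / mean_observed

# Hypothetical task times in seconds.
print(avg_abs_error_pct(predicted=[11.0, 18.0], observed=[10.0, 20.0]))  # 10.0
```

Unlike the proportion of variance accounted for, this measure penalizes a model whose predictions are well correlated with the observations but consistently too high or too low.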
All of the EPIC models appear to be usefully accurate in predicting total task execution time because they
all represent to
some extent how the task activities can be overlapped with each other (e.g., the billing number can be keyed
in while the
customer is still speaking the digits), so they all do a reasonable job of predicting overall task execution
time. In contrast, the
Keystroke-Level Model is much less accurate because it does not overlap any activities.
The surprise is that the highly optimized models did not fit the data as well as the simple Hierarchical
Motor-Parallel model,
which is only moderately efficient. This suggests that while users take advantage of the parallel preparation
and execution
capabilities of their motor processors to speed up their performance, they make little use of pre-positioning
the eyes and
hands in advance. Also, there is a hint that the flattened method models are "too fast" compared to the
hierarchical models,
but the difference is not large. Unfortunately, in these tasks, the Hierarchical model rules for submethod
"calls" and "returns"
tend to be overlapped with perceptual motor processing or external events, and so do not contribute to task
performance
time. A different type of task may be required to clearly distinguish these two families of models.
Individual Event Times. The scientific accuracy of the models can be tested more thoroughly by examining
the predicted and
observed timing of individual events such as keystrokes. These times were predicted very poorly by some
models, and only
moderately well by the best-fitting model, the Hierarchical Motor-Parallel model. Most of these events
consist of typing the
digits of the billing numbers. Detailed examination shows that in the observed task instances, the customer
speaks the digits
at a rate typically slower than the model (and apparently the actual users) can make the corresponding
keystrokes, but the
exact timing in the model is very sensitive to the delays in the situation, both those resulting from the
workstation design and
those due to speech recognition. Further work to characterize the details of the individual event timing is in
progress.
One important implication of the detailed results is that the rate at which the customer speaks the
digits, not the
rate at which the user can type, is probably the major bottleneck in the task execution time. A second
implication is that
perhaps the reason why the users appear to be following a task strategy that is only moderately efficient is
that the task is so
limited by the customer's speaking rate that there is no need for the greater efficiency of the more highly
optimized models.
Some EPIC models for a high-performance task were constructed using a priori task analysis, construction
principles, and
parameter values, and these models were able to predict total task execution times with an accuracy high
enough to be useful
as engineering models for interface design. The detailed properties of the models suggest that the required
level of
optimization on the part of the user in these tasks may not be very high, although the users are highly
practiced and execution
speed is important. These results show the potential for EPIC to provide a framework for engineering
models in complex,
high-performance domains in which the user's performance time depends on the overlapped activity of
separate processing
capabilities.
The effort required to construct EPIC models seems to be considerably less than that for CPM-GOMS. In
both approaches,
the analyst must make many decisions about the details of task execution, such as when eye movements are
necessary, but
for EPIC models, these decisions are made only once for the general task procedures, rather than possibly
multiple times in
each specific benchmark task instance. Constructing the present models was relatively easy; the initial
GOMS model was
routine [7] once the information on the actual task procedures became available. Building the production-
rule models was a
matter of applying templates, either existing [3] or readily standardizable. Finally, the EPIC architecture
itself was fixed and
required no development for this analysis. In return for the rather modest construction effort, the resulting
EPIC model can
generate predicted execution times for all possible task instances within the scope of the GOMS model.
Thus EPIC models
would appear to be very efficient engineering models for high performance tasks.
At this point, EPIC is definitely a research system, and certainly is not ready for routine use by most
interface designers.
However, note that in some situations, such as the Ernestine project [5], the economics of the interface
evaluation problem
can make even a novel and demanding analysis approach a practical and useful solution. In addition,
following the
precedent of the CCT and NGOMSL engineering models (see [7]), as the EPIC architecture stabilizes and
experience is
gained in applying it to interface analysis problems, it should be possible to develop a simplified method of
analysis that will
enable designers to conveniently apply engineering models based on EPIC.
This work was supported by the Office of Naval Research Cognitive Sciences Program under grant
N00014-92-J-1173, and
NYNEX Science and Technology, Inc.
Abstract
Engineering models of human performance permit some aspects of usability of interface designs to be
predicted from an
analysis of the task, and thus can replace to some extent expensive user testing data. Human performance in
telephone operator tasks was successfully predicted using engineering models constructed in the EPIC
(Executive Process-Interactive
Control) architecture for human information-processing, which is especially suited for modeling
multimodal, complex tasks.
Several models were constructed on an a priori basis to represent different hypotheses about how users
coordinate their
activities to produce rapid task performance. All of the models predicted the total task time with useful
accuracy, and clarified some important properties of the task.
Introduction
Engineering models for human performance permit some aspects of user interface designs to be evaluated
analytically for
usability, without consuming resources for empirical user testing, by making usability predictions based on
an analysis of the
user's task in conjunction with principles and parameters of human performance [4, 7]. This paper reports
results on a new
class of engineering models for a type of high-performance task, namely the telephone operator tasks
studied by Gray, John,
and Atwood [5]. By "high performance" we mean that the task is time-stressed; the total execution time
must be minimized,
and the user of the workstation (the telephone operator) is well-practiced. These tasks are scientifically
interesting because
they are multimodal, involving speech reception and production as well as the usual visual display and
keystrokes, and also
because they are active system tasks [7] in that the user must respond to events produced by the external
environment, unlike
passive system text editing, which is basically paced by the user. As pointed out by John and Kieras [7],
engineering models
for active system tasks are currently under-developed. Finally, predicting performance in such tasks can be
economically
important; a detailed information-processing analysis of telephone operator tasks, the Gray, John, and
Atwood CPM-GOMS
models [5], was of considerable economic value in this domain where a second's reduction in average task
completion time
represents considerable financial savings.
Background on CPM-GOMS
Generative Models of Interface Procedures
THE EPIC ARCHITECTURE
MODELING THE TELEPHONE OPERATOR TASK
Task Summary
A Set of A-Priori Models
To simulate the user's performance, EPIC was "programmed" with a set of production rules capable of
performing all
possible instances of a class of telephone operator tasks. Under direction of the cognitive processor rules,
the perceptual and
motor processors move the eyes around, perceive stimuli on the operator's workstation screen, and reach for
and strike keys.
The time these activities require is determined by the perceptual and motor processors, but the production
rules can arrange
to overlap some of the activities in order to complete the entire task as rapidly as possible.
COMPARISON OF THE MODELS TO DATA
Observed and Predicted Times
Results
CONCLUSIONS
ACKNOWLEDGEMENT