



Departments of Computer Science and Psychology
and the HCI Institute
Carnegie Mellon University
Pittsburgh, PA 15213
(412) 268-7182
bej@cs.cmu.edu
Hilary Packer
Master of Software Engineering Program
Carnegie Mellon University
Pittsburgh, PA 15213
(412) 268-3759
hpacker@cs.cmu.edu
In his book Case study research: Design and methods,
Robert Yin [19] states that a case-study approach
has an advantage over surveys, experiments, and other
research strategies "...when a 'how' or 'why' question is
being asked about a contemporary set of events over which
the investigator has little or no control." ([19], p. 9) This
seems to describe the situation facing the field of HCI when
evaluating methods. We are asking how a
given technique can be used to predict usability problems,
why it works in some situations and not in
others, and we have virtually no control over how an
analyst learns or uses a technique.
The essence of the case-study approach is to collect many
different types of data and use them "in a triangulating
fashion" ([19], p. 13) to converge on an explanation of what
happened. When multiple sources of information
converge, our confidence grows that we have understood
the series of decisions occurring in a case: how these
decisions were made, how they were implemented, and
what result was achieved. This deep understanding should
allow us to know whether these processes and results are
likely to recur with other developers or in the next
design project.
This paper presents the first in a series of five case studies
of novice analysts learning and applying different
usability assessment techniques to a single interface.
(Footnote 1)
These studies analyze several sources of evidence: problem
description forms filled out as the analysts worked,
discussions with the analysts, final reports by the analysts,
and, to obtain detailed process data, diaries kept by analysts
while they learned and applied these techniques. This case-
study approach allows us to look at what the analysts did,
the confusion and insights they had about the techniques, as
well as more traditional performance measures like the
number and type of usability problems identified.
The ACSE Multi-media Authoring System
The Advanced Computing for Science Education (ACSE)
project has built a multi-media science learning
environment to teach the skills of scientific reasoning [12].
This software system provides an author with a structured
document framework and a set of tools for constructing
science lessons containing text, still graphics, movies, and
simulations. The system provides the science student with
tools for navigating through the lesson, viewing the
movies, and manipulating and running the simulations.
Based on several years' experience with this system, the
ACSE project recently designed a second version of
its authoring tool, called the VolumeViewBuilder
(henceforth, Builder). The redesign was a group effort
recorded in a user interface specification written by a
software developer and a technical writer, both of whom
had recently sat in on an HCI design class [5]. The
implementation of the Builder was begun at the same time
as our case studies and continues at the time of this writing.
Figure 1. Example of an illustration in the Volume
View Interface Design [5, p. 16]. In that document,
this is a full-page illustration of the screen of the
Builder application.
The ACSE project gave us their user interface specification
for analysis. The specification included an introduction to
the system and the goals of the document (4 pages), the
interface for science students (the VolumeViewExplorer, 9
pages) and the interface for the Builder (22 pages). The
specification included 37 figures of the screens (1 in the
introduction, 11 in the Explorer, 24 in the Builder), which
ranged from small pictures of specialized cursor icons to
full-page figures of the entire screen (e.g., Figure 1).
The Example Multi-Media Document
The target multi-media document was a biology lesson about
Drosophila development adapted from an advanced
undergraduate biology course using the ACSE system for
laboratory sessions. This volume is 55 pages long, and
in addition to text contains 23 high-resolution images and
figures, 3 movies, 7 simulations, 10 fragments of simulation
code, and 10 review questions. The original volume was
produced with the first version of the Builder and did not
include a table of contents, a glossary, or hyperlinks,
because those features were not included in that version of
the Builder. The first author modified this lesson to include
these features and produced a hard-copy target
document. This document did not run on a computer, but
the links were explicitly indicated in supplementary
lists, i.e., all the table of contents entries and their page
numbers were in a list, all the glossary terms were in a list,
and all the hyperlink "hot phrases" and where they would point
were in a list. This document and the lists were given to
the analysts as a typical document created with the new
Builder.
The Analysts' Discussions
The discussions with the analysts were structured around
several open-ended questions: What did you do in your
analysis in the last couple of days? Did you have any
difficulty with the technique? Did you have any insights
into the technique? Did you discover anything else notable
about the technique? Thus, these discussions centered
around the process, not the content of the analyses. That is,
they discussed things like problems getting or
understanding papers, problems making the techniques
applicable to the Builder, types of information their
techniques needed or provided, but not specific usability
problems with the Builder (e.g., that the method of creating
hyperlinks was awkward or that the menu items were in the
wrong place). The first author took notes during the
discussions, which contributed to the case in this paper.
These discussions were also audiotaped, but the tapes have
not yet been analyzed and do not contribute to this case.
CW is a usability analysis technique similar in rationale
and execution to requirements or code walkthroughs. CW
focuses on the ease of learning a new interface. This
method has been evolving since its introduction in 1990 [9]
and its current form [18] was eventually used by A1.
Inputs to a CW are a description of the interface, a task
scenario, assumptions about the knowledge a user will
bring to the task, and the specific actions a user must
perform to accomplish the task with the interface. The
group, or individual, performing the CW then examines
each step in the correct action sequence, asking the
following four questions: "(1) Will the user try to achieve
the right effect? (2) Will the user notice that the correct
action is available? (3) Will the user associate the correct
action with the effect that the user is trying to achieve? (4)
If the correct action is performed, will the user see that
progress is being made toward solution of the task?" ([18],
p. 106) If a credible success story cannot be told for each
question at each step, then a usability problem has
been identified and CW suggests ways of fixing the
problem.
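The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not a tool used in the study: the `Step` record, the `walkthrough` function, and the sample stories are hypothetical, and the only material taken from the source is the wording of the four questions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# The four CW questions as quoted above from [18], p. 106.
CW_QUESTIONS = (
    "Will the user try to achieve the right effect?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the effect "
    "that the user is trying to achieve?",
    "If the correct action is performed, will the user see that "
    "progress is being made toward solution of the task?",
)


@dataclass
class Step:
    """One action in the correct action sequence (hypothetical record layout)."""
    action: str
    # One entry per CW question: a success story, or None if none is credible.
    stories: List[Optional[str]]


def walkthrough(steps: List[Step]) -> List[Tuple[int, int, str]]:
    """Return (step number, question number, action) for each failed question."""
    problems = []
    for step_no, step in enumerate(steps, start=1):
        if len(step.stories) != len(CW_QUESTIONS):
            raise ValueError("each step needs one answer per CW question")
        for question_no, story in enumerate(step.stories, start=1):
            if not story:  # no credible success story: a usability problem
                problems.append((step_no, question_no, step.action))
    return problems


# Illustrative data only; the stories are invented placeholders.
steps = [
    Step("Click on volume icon", ["goal given by task", "icon is visible",
                                  "icon names the volume", "window opens"]),
    Step('Choose "Show Volume"', ["goal given by task", None,
                                  "label matches goal", "volume appears"]),
]
print(walkthrough(steps))
```

Each tuple returned corresponds to one place where the analyst would fill out a problem description report instead of writing a success story.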
Figure 2. Timeline of the activities of analyst A1 as he learned and applied the Cognitive
Walkthrough inspection method to the Builder multimedia authoring tool, as recorded in his diary forms.
The time course of the analysis
A timeline of the activities of A1 appears in Figure 2.
After choosing CW, A1 read several other papers [1, 9, 13,
15, 16] before finding [18] (which at the time was only a
University of Colorado technical report, but is
now publicly available). In his final report, A1 stated that
[18] is the only reference really needed to learn and use the
technique and strongly recommended it to other analysts
because of its clear, step-by-step description and concise
examples. When he found this paper he read it with special
emphasis on learning how to do the technique (total 3
hrs).
A1 then spent a total of 4.5 hours setting up task scenarios
with correct action sequences in preparation for doing the
walkthrough. To do this, A1 examined the functionality of
the Builder and the frequency of elements (e.g., PICT
frames, movies, glossary terms) occurring in the
Drosophila target document. In doing so, A1 discovered
that the Drosophila document did not include several of the
features included in the Builder interface document, e.g., it
did not include graphical representations of simulation code
like radio buttons for a fixed set of mutually exclusive
parameters or slide-bars for changing numerical parameters
in a simulation. Noting this deficit in the target document
in the diary for later discussion, A1 chose two task
scenarios, the first being a short one (creating pp. 1-3 of
the target document) to test his understanding of the
walkthrough technique, and the second being much longer
(creating pp. 24-35; 104 user actions).
A1 then determined the correct sequence of 36 user actions
for the short task (1.5 hours) and performed the
walkthrough (2 hours). He found six usability problems and
six associated design suggestions during this self-imposed
practice walkthrough. In addition to usability problems,
this short analysis pointed out many gaps in the Builder
interface specification. After this, A1 read [17], which
he found interesting, but not useful for applying CW.
A1 then began to apply the CW method in earnest. Since
he did not have ready access to designers of the system, he
performed the walkthrough himself, rather than in a group.
He first prepared the correct action sequence for this longer
task (104 actions). He began the walkthrough with
frequent reference to [18], but after 2.5 hours of tightly
intertwined reading/analysis and a second reading of the
Builder specification, he did not need to refer to the papers
anymore. He then spent 16 hours doing the walkthrough; that
is, for each user action he asked the four questions
recommended by CW and either wrote a success story or
filled out a problem description report for each question.
During this phase he reported 36 usability problems and 35
design suggestions. His final report cites seven more
general psychology and HCI papers, which his diary
does not record as being read during this time period
(presumably read during his cognitive psychology course).
An example of the content of A1's walkthrough
Figure 3 shows a portion of the action sequence A1
prepared for himself for the second task scenario. He listed
both the action a user would have to take to accomplish the
task and the system's response for each step.
Action Sequence
Task B: Create 24-35 in Drosophila document
Figure 3. Sequence of correct actions for a portion
of the long task scenario used by A1.
Figure 4 shows A1's walkthrough of the actions specified in
Figure 3. For each action, he enumerated the CW
questions and either checked off a question if he
considered it a success and recorded the reason for
his judgement, or justified why he thought the question
indicated a failure of the interface to support learnability.
If there was a failure at an action, A1 prepared a PDR (Figure
5) and put the number of the PDR in his CW. In addition,
A1 often recorded a suggestion for redesigning the system
to prevent the failure.
The PDRs asked for a judgment of how frequently the
problem would occur to a user of the Builder. A1 judged
that none of the problems he reported would occur only
once to a user, 22% would occur rarely, 37%
occasionally, 29% often, and 10%
constantly (2% were left blank). None of these judgments
came directly from using the technique; 39% came from
personal judgment and 56% came from "other" sources (5%
were left blank). The other source of frequency
information was usually the frequency analysis of the target
document that A1 had performed when selecting the task
scenarios, whereas a few were attributed simply to
"common sense."
The PDRs also asked for a judgment of the severity of the
usability problem on a scale of 1 to 5, where 1=trivial,
Cognitive Walkthrough
Task B: Create 24-35 in Drosophila document
Figure 4. A1's answers to the four CW questions for
the actions shown in Figure 3. The arrow pointing
to "26" indicates that this failure is reported in PDR
#26 shown in Figure 5.
In his final report, A1 said that the three most important
problems to fix were the icons in the tool palette, the items
in the menus, and the relationship of the glossary and table-
of-contents panes to the main-media pane of the Builder
window. The first two of these problems had many PDRs
associated with them (6 and 5, respectively, about 13% of
the total number of PDRs for each one). This is not
surprising, as two of CW's four questions focus on the
availability and meaningfulness of cues in the system, both
of which point to issues with icons and menus. The last
problem appears in only one PDR. However, A1 justifies
the importance of this problem with his frequency analysis
of the Drosophila document: glossary terms were the most
frequently occurring feature of that document.
Difficulties with CW and insights into its use
In all, A1 recorded 87 notes in his diary which he labeled
either a difficulty with the CW technique or an insight into
its use. Figure 2 shows that the difficulties and insights
occur not only when initially reading about the technique or
when learning how to use it, but throughout the analysis.
These notes fall into several major content categories.
Also, some concerns disappear as A1 learns more or
becomes more familiar with CW, whereas others persist for
the duration of the analysis.
Background and training required of the analyst (3
notes). When reading about the CW technique, A1
was initially concerned about the amount of training
necessary for the analyst because of statements made in [9]
and [10]. However, this concern disappeared from his
diary notes after the 16th hour. By the final report, A1 was
"very comfortable with learning and applying the
technique" and asserted that "little or no experience in
either user interface evaluation or cognitive psychology
is required of the user [of the CW technique]."
Applicability of CW to the Builder application (5
notes). A1 wondered whether CW would scale up
to the complex Builder interface. However, reading [16]
seemed to allay these concerns. In the final report, A1 says
he "...wondered whether the assumptions of the
walkthrough...would apply to the evaluation of the Volume
View. But the assumptions about the user population
(novice users with Mac experience) turned out to be
consistent with these assumptions and the more recent
versions of the CW are not restricted to walk-up-and-use
interfaces."
Underlying theory (13 notes). A1
expressed many initial concerns about the underlying theory
of CW. He was concerned about the validity of CE+ [13]
and what part the CE+ theory played in the actual
walkthrough. He was concerned about his lack of
knowledge about the underlying theory, and how that would
affect his ability to conduct an effective
walkthrough. However, these concerns disappeared while
reading [18] because A1 reasoned that CW's assumption
that the user's actions would largely be guided by the
interface applied to the Builder. In his final report, A1 feels
confident in his ability to conduct a walkthrough, and states
that "Little or no experience in...cognitive psychology is
required... The main strength of the CW is that the
technique is very simple and easy to learn."
Task scenarios (13 notes). A1 had two types
of concerns about the task scenarios CW requires. The first
concern was with his ability to develop "good" task scenarios.
He felt that in order to come up with realistic task scenarios,
he would have to talk to the users of the Builder, but this
application was too new to have many users or a history of
use, so he had to settle for performing a frequency analysis
on the Drosophila example document. He continued to
have insights about the choice of task scenarios throughout
the analysis process, right up until the very end when
he decided that task scenarios for a design tool like the
Builder should include modification tasks and recovery
from error as well as the create-from-scratch tasks he
actually analyzed (this last insight came from the group
discussions, brought to the table by the analyst using
GOMS).
A1's second concern was about not having the designers
accessible to dictate the action sequences for the tasks as the
CW papers suggest. This remained a concern throughout
the analysis because he believed he needed to know more
about the Builder than the specification contained, as
evidenced by the fact that his diary contains 23 notes about
gaps in the specification. However, A1 turned this problem
into a virtue by the final report; he stated, "determining the
sequence of correct actions for longer task scenarios can
take quite a lot of time; and this can be quite costly if being
done by a designer. Therefore, I suggest that the sequence
of actions is determined by the evaluator himself. Another
reason for this modification is that it puts the evaluator
naturally in a situation of learning by exploration. After all,
this is the main focus of the Cognitive Walkthrough." A1
recognizes, however, that if the evaluator determines the
sequence of actions, he or she must be given an opportunity
to clarify questions with the designer.
The process of doing a CW (9 notes). A1
had several concerns about the process of performing a CW
that arose from reading the early research papers about the
technique [1, 9, 13, 15, 16]. He was concerned about filling
out structured forms, how to handle design suggestions,
and how to record usability problems that were not directly
connected to the task scenarios. However, all of
these concerns disappeared when A1 read [18].
During the practice walkthrough, A1 found the process of
stepping through recurring sub-procedures to be very
tedious. To relieve this burden, A1 introduced macros for
tasks that occur frequently in the same context in the task
scenarios. That is, he would perform the walkthrough for
the first occurrence of a sub-procedure at the lowest level
of granularity, but for subsequent occurrences, he used a
macro to symbolize the detailed steps.
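A1's macro idea can be illustrated with a short Python sketch. It is a hypothetical rendering, not A1's actual notation: the function names, the macro name `insert_glossary_term`, and the sample steps are all invented for illustration. The first occurrence of a sub-procedure is analyzed step by step, and later occurrences are replaced by a single placeholder.

```python
from typing import Dict, List


def expand(sequence: List[str], macros: Dict[str, List[str]]) -> List[str]:
    """Full action sequence a user performs (macros expanded one level)."""
    expanded = []
    for item in sequence:
        expanded.extend(macros.get(item, [item]))
    return expanded


def steps_to_analyze(sequence: List[str], macros: Dict[str, List[str]]) -> List[str]:
    """Steps the analyst walks through: each macro body in detail only once."""
    seen = set()
    result = []
    for item in sequence:
        if item not in macros:
            result.append(item)
        elif item not in seen:
            seen.add(item)
            result.extend(macros[item])   # first occurrence: lowest granularity
        else:
            result.append(f"<{item}>")    # later occurrences: macro placeholder
    return result


# Invented example data.
macros = {"insert_glossary_term": ["select phrase", "choose menu item",
                                   "type definition"]}
task = ["open page", "insert_glossary_term", "type text", "insert_glossary_term"]

print(len(expand(task, macros)), "user actions")
print(len(steps_to_analyze(task, macros)), "steps to analyze")
```

The gap between the two counts is the tedium the macro saves the analyst, and also the repetition that a user would still have to perform.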
One process concern recorded by A1 was justified when
the PDRs were examined closely. Early on, A1 was
concerned about how to keep track of many usability
problems, particularly when the walkthrough is done over
an extended period of time. Of the 52 problems reported
by A1, we (the authors) agreed that 4 of these problems
were duplicates. This amounts to 8% of the problems, but
this problem might escalate rapidly with larger systems.
The basic capabilities of the CW method (7
notes). A1 voiced an early concern, raised in [18],
that CW could not address global design issues. Evidently
nothing in his experience with CW helped to quell that
concern, for in his final report, A1 reiterates, "through
the focus on the narrow path determined by the sequence of
correct actions, global design issues are not explicitly
addressed." Furthermore, A1 says the focus on correct
sequences of actions prevents CW from addressing the
ease of recovery from error. His suggestion for explicitly
including error-recovery task scenarios is an effort to
overcome this problem.
Finally, A1 was concerned that the technique does not
provide guidance for rating the frequency and severity of
usability problems. Indeed, even if this concern had not
been articulated, the evidence from the PDRs is overwhelming.
On the 42 PDRs submitted, none of the
estimates of either frequency or severity were attributed to
the CW technique.
Another message to designers is that CW itself will not give
you much guidance about how to pick task scenarios.
(This same message results from the GOMS and Claims
Analysis cases as well.) Scenario generation still seems to
be a black art. However, this case suggests that including
modification and error-recovery tasks in the scenarios may
be important and easily overlooked. Similarly, CW will
not give any guidance as to the frequency or severity of
usability problems (a deficit shared by all five evaluation
methods studied). These gaps in HCI techniques call on the
developers of evaluation techniques to fill a crying
need.
A more specific question to the developers of CW concerns
the modification suggested by A1 that macros be used to
reduce tedium. Before recommending this procedure to
designers, the impact of macros should be assessed. True,
they may decrease analysis time. However, they may also
focus attention away from tedious tasks that users
themselves would find objectionable, just as the analysts
do. The developers of CW, or others who want to extend
its use, may view the assessment of macros as another
opportunity for contribution.
In addition to the lessons learned from a single case study
such as this, the potential for many more lessons is
inherent in multiple case studies. The data from the other
four evaluation techniques (Claims Analysis, User Action
Notation, GOMS, and Heuristic Evaluation) will contribute
to our knowledge of learning and using different methods
for HCI evaluation, especially as we begin to compare the
experiences using the converging evidence of the case-
study approach. However, even more can be learned with
replication and expansion of the conditions under which
these cases were performed. Therefore, the first author
invites any practitioner or researcher who wishes to
participate to contact her about collecting process data for
case-studies. The PDRs and diary forms are available for
distribution, as are the Introduction to HCI Methods lecture
materials, the Builder interface design document, and the
Drosophila example document.
Abstract
We present a detailed case study, drawn from many
information sources, of a computer scientist learning and
using Cognitive Walkthroughs to assess a multi-media
authoring tool. This study results in several clear
messages both to system designers and to developers of
evaluation techniques: this technique is currently
learnable and usable, but there are several areas where
further method-development would greatly contribute to a
designer's use of the technique. In addition, the emergent
picture of the process this evaluator went through to
produce his analysis sets realistic expectations for other
novice evaluators who contemplate learning and using
Cognitive Walkthroughs.
Keywords:
usability engineering, inspection methods,
Cognitive Walkthrough.
Introduction
In the last few years, there has been growing interest in
studying different usability evaluation techniques
to understand their effectiveness, applicability, learnability,
and usability. Developers want to know which
techniques will be suited to their system, budget, and time
frame. Universities want to know what to teach in the
growing number of HCI courses. Researchers developing
techniques need feedback to allow them to improve
and expand the techniques in useful directions.
Some work has been done that compares
different techniques (e.g., [3], [4], [7], [8]). This work has
taken the form of experiments, formally comparing
performance outcomes of different techniques. The
dependent variables are typically quantitative: the number
(and type) of usability problems identified, how closely a
given technique predicts a user's behavior, the time it takes
to perform the evaluation, the labor costs involved.
Perhaps the biggest difficulty with these studies is that they
provide no data about what people actually do when
they are using these techniques. Without process data, it is
difficult to understand how the technique itself leads the
analyst to identify usability problems, as opposed to the
analyst simply being clever. Without process data,
a developer does not know what to expect when setting out
to use a new technique. Finally, without process data, it is
difficult to provide meaningful feedback to the method-developers
so they can improve their techniques. This
situation is akin to a usability study of a system that only
provides numerical data about performance time or total
number of errors; what we need is the equivalent of a
think-aloud protocol for assessing usability techniques.
THE ANALYSIS SITUATION
It is not uncommon for a development team to ask one of
its members (or an outside person) with an interest in HCI
to make recommendations about a user interface design (in
fact, it was just such a request that led the first author to
switch fields from mechanical engineering to HCI 15 years
ago). When this happens the analyst must figure out what
assessment technique he or she wants to use, find books or
papers describing how to use it, learn the technique from
these materials, and apply it to the system design. In this
study, we set up a similar situation in the following way.
Figuring out what technique to use
Five volunteer analysts were given a 30-minute lecture on
HCI assessment techniques. This lecture was the
methods part of the Introduction and Overview of
Human-Computer Interaction tutorial given at the
CHI conference for the last three years [2]. The analysts
heard this lecture from the same lecturer (the first author)
and received the same tutorial notes and bibliography as
tutorial participants, which is the way scores of
professionals get their first introduction to usability
assessment methods each year. One week later, each
analyst chose the method they wished to use.
The system and specifications
The analysts were given two documents with which to do
their analyses: the user interface specification of the ACSE
multimedia authoring system and a target multimedia
document (described below). The analysts read these
documents at home for one week and then were given two
1-hour sessions with the head developer of the system to
clarify any points of misunderstanding. The results of these
sessions were written up and sent to each analyst via e-
mail. The analysts had access to the developer via e-mail if
any other difficulties arose understanding the documents,
and all such e-mail conversations were sent to all
analysts.
THE ANALYSIS PROCESS
The analysts worked primarily on their own for an elapsed
time of ten weeks. They used two forms for recording their
work on an ongoing basis: a structured diary and a problem
description form (described below). The group met once or
twice a week to discuss the analysis process. Each analyst
produced both a verbal and written final report of their
analyses. A questionnaire assessed the analysts' educational
and professional background. All of these data were
analyzed to produce the case reported in this paper.
The Diary Form
The diary form was adapted from [14]. The analyst used it
to record his or her activity each half hour while working on
the analysis. The analyst wrote a short description of the
activity and then placed it in one of six different categories:
literature search, reading for "what-it-is", reading for "how-
to-do-it", reading/analysis (when reading and analysis are
so intertwined as to be inseparable), analysis (when the
analyst knows the technique well enough to analyze
without reference to the literature), and unrelated (e.g.,
copying papers). The form also had columns for recording
a difficulty with the technique, an insight into the
technique, a problem with the design, a solution to a design
problem, and "other." At any time, the analyst could make
a note on the form and write an extended explanation of
what he or she was thinking in a separate, free-form diary.
The Problem Description Form
The problem description report (PDR) was adapted from
[7]. The PDR provided an area for describing the problem,
an estimate of the severity and frequency of the problem,
and an assessment of whether these judgments came from
the technique itself, as a side effect of the technique, or
from some form of personal judgment. Each PDR had a
reference number that also appeared in a diary's column
for recording a design problem.
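To make the structure of a PDR concrete, the following Python sketch models one report as a record and shows how percentage breakdowns of the kind reported in this paper could be tallied. It is an assumption-laden illustration: the field names and the `percentages` helper are hypothetical and are not taken from the actual form, and the sample values are invented.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class PDR:
    """One problem description report (hypothetical field names)."""
    ref: int                          # reference number, also noted in the diary
    description: str                  # free-text description of the problem
    frequency: Optional[str]          # e.g. "rarely", "often"; None if left blank
    severity: Optional[int]           # rating of the problem; None if left blank
    frequency_source: Optional[str]   # technique, side effect, personal judgment, other
    severity_source: Optional[str]


def percentages(values: Iterable[Optional[object]]) -> Dict[object, int]:
    """Percentage of reports giving each answer, with blanks counted separately."""
    counts = Counter("blank" if v is None else v for v in values)
    total = sum(counts.values())
    return {answer: round(100 * n / total) for answer, n in counts.items()}


# Invented sample data, for illustration only.
reports = [
    PDR(1, "unclear icon", "often", 4, "other", "personal judgment"),
    PDR(2, "hidden menu item", "often", 3, "personal judgment", "personal judgment"),
    PDR(3, "missing feedback", None, 4, None, "personal judgment"),
    PDR(4, "awkward hyperlink creation", "rarely", None, "other", None),
]
print(percentages(r.frequency for r in reports))
```

Counting blanks as their own answer keeps the percentages summing to roughly 100, which matches how the breakdowns in this paper report forms that were left blank.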
The Final Report
Each analyst presented a videotaped verbal and a written
report that included a brief summary of the technique, an
annotated bibliography of the papers used to learn and
apply the technique, any modifications made to the
technique to allow it to be used for the Builder
specification, any areas of exceptional doubts or confidence
about using the technique, suggestions for improving
the technique, and the three most important problems that
the designers should fix in the Builder. Again, the written
artifacts contribute to the case in this paper, but the tapes
have not yet been analyzed.
THE COGNITIVE WALKTHROUGH ANALYSIS
The analyst, A1
A1 was a researcher in the School of Computer Science. He
had taken over a dozen courses in CS, considered himself
fluent in two programming languages, and had worked
professionally as a programmer before taking part in this
evaluation. He had taken one cognitive psychology course,
but none in HCI. A1 received graduate-course credit for
participating in this analysis.
The choice of a technique
A1 spent 12 hours finding and reading papers about several
HCI techniques and reading the Builder specification before
deciding which analysis technique to choose.
He considered PUMs [6] but rejected it because it required
getting the PUM simulation code. He considered heuristic
evaluation as defined by Jeffries and colleagues [7] but
decided he did not have sufficient UI design experience.
He considered using standards or guidelines but rejected it
because it meant "a lot of reading though thick books." He
read Chapters 5 (Usability Heuristics) and 8 (Interface
Standards) in Nielsen's Usability Engineering
[11]. Despite some doubts about whether he had the
prerequisite training, he chose Cognitive Walkthrough
(CW) based on the initial HCI methods overview lecture
(above) and the summary of the CHI'92 workshop on
usability inspection methods [10].
USER ACTION                                 SYSTEM FEEDBACK
Reopen Drosophila volume and go to p. 23
1) Click on Drosophila icon                 brings up simulation
2) Choose "Show Volume"                     brings up Drosophila volume
   from the "Windows" menu                  with page 1 on top
The usability problems identified
A1 submitted 42 PDRs, all of which were written while
doing tightly-coupled reading/analysis or analysis. These
PDRs included 48 unique usability problems and 4
duplicate problems (there were 9 PDRs with more than
one problem on them) as judged by consensus between the
authors. These problems covered all aspects of the Builder
application (bookmarks, frames, cross-references,
glossary, table of contents, cursor, help system, pages,
undo, volume, and the application as a whole) but did not
include problems with the Drosophila target document. Of these
PDRs, A1 said he found 61% of them directly from using
the technique, 12% as a side effect of using the technique,
15% simply from reading the Builder design specification,
and 10% from other sources (2% were left blank). For the
"other" reports, the source of the problems was the group
discussions.
Problem Description Form
Reference number in diary: 26
Brief description of the problem: Starting a volume in builder
mode: Necessity to choose "Windows - ShowVolume" to
get into builder mode with the current volume, contradicts
Mac experience that you get into the correct application
just by clicking on a file created by this application
How did you find this problem? Using my technique
How frequently will users encounter this problem? Often
How did you estimate frequency? "Other" - [this will occur]
each time the builder wants to edit an existing volume
How serious is this problem? Serious
How did you estimate severity? Personal Judgement
Other comments? Design suggestion: Redesign procedure
to open an existing volume. 1) click on icon representing
the volume, 2) a dialog box comes up asking whether this
volume shall be opened in builder or explorer mode,
3) application loads volume and displays first page of it
without any further action.
3=moderately serious and 5=must be changed for software
to be usable! On this scale, A1 judged none of the problems
he found to be trivial, 12% to be 2, 29% to be
moderately serious (3), 49% to be 4, and 5% to be problems
that must be changed for the software to be usable at all (5);
5% were left blank. Again, none of these judgments were
attributed to the CW technique, whereas 95% were personal
judgment (5% were left blank).
The quality of A1's walkthroughs
All of A1's products were examined in detail by John
Rieman, a developer of the CW method. Rieman's opinion
of the work was that "A1's final paper showed an excellent
understanding of the...technique...[His] walkthroughs were
fairly good...but they had some flaws." (personal
communication, 16 Dec 1994) Rieman thought that A1
often failed to recognize steps where the user may not have
the right goal (question 1). In addition, Rieman thought A1
was too strict in his failure criterion for question 3.
For instance, A1 listed a failure when the user's goal was to
create a glossary entry, but the button was labelled "Gloss."
Rieman would have called that a successful instance of
label-following rather than a failure. The first flaw would
miss usability problems in the interface and the second flaw
would produce false alarms. In future empirical usability
studies of the interface, we will be able to assess Rieman's
predictions of the effectiveness of A1's walkthroughs.
DISCUSSION
What this case study says about the Cognitive
Walkthrough evaluation technique
The overriding sense of the experiences of A1 is that the
Cognitive Walkthrough evaluation technique is learnable
and usable for a computer designer with little psychological
or HCI training. There are also some specific lessons in the
details of this case study for different interest groups.
For computer designers, the strongest message is that
reading the early research papers about CW can be
confusing and raise many concerns about the theory and
practice of CW. (It is unclear from this case whether this is
true for everyone, or only for analysts without formal
psychology training.) However, the practitioner's guide
[18] cleared up most of those issues for A1, suggesting
that designers should read only that chapter. This case
study alone cannot make such a recommendation with
confidence because it is impossible to determine whether
A1 benefitted from [18] because it is sufficient to learn the
technique or only because he had read the other papers.
To investigate this further, the first author assigned only
[18] to her undergraduate HCI class and lectured only from
the content of that paper. The walkthroughs conducted by
the six undergraduate teams were examined in detail by
Rieman and considered "all fairly good and some were
excellent." (personal communication, 16 Dec 1994) Where
there were problems they often involved question 1 (right
goal?) and question 3 (good label?) as was the case with A1.
This experience strengthens our recommendation to avoid
the early papers and read only the practitioner's guide if you
want to use CW. However, pay particular attention to the
explanations and examples of questions 1 and 3.
A comment on the case-study approach
This case-study approach used many different types of
information to converge on a picture of the process A1 went
through to produce his analysis. This story itself can help
designers learn and use CW by providing realistic
expectations about the process, the difficulties and insights,
and the length of time it takes to perform an analysis. Such
expectations are missing from a textbook description of a
technique. However, without them, new analysts can
become unnecessarily discouraged, or never even consider
embarking on such an analysis.
FUTURE WORK AND AN INVITATION
An important element missing from this analysis is a
measure of how effective A1 was in predicting the usability
problems that occur in the ACSE Builder application. The
traditional benchmark against which to compare usability
assessment is empirical usability testing. Unfortunately,
the Builder has not yet been implemented to specification,
so empirical evaluation must remain as future work.
Another interesting benchmark would be to compare the
design suggestions made through the CW technique to the
actual implementation of the Builder when it is finished. It
is possible that some portion of the usability problems
associated with the design document would be fixed in the
normal software development process as the programmers
think hard about the details of implementation.
Acknowledgments
This work was supported by the Advanced Research
Projects Agency, DoD, and monitored by the Office of
Naval Research under contract N00014-93-1-0934. The
views and conclusions contained in this document are those
of the authors and should not be interpreted as representing
the official policies, either expressed or implied, of ARPA,
ONR, or the U.S. Government. We thank Robin Jeffries
for providing her original PDR and helpful suggestions,
"A1" for his thoughtful and thorough CWs, and
John Rieman for his evaluations of the CWs discussed
herein.
References
1. Bell, B., Rieman, J., and Lewis, C. (1990) Usability
Testing of a Graphical Programming System: Things We
Missed in a Programming Walkthrough. In CHI'90
Proceedings, Seattle, WA, ACM, NY.
2. Butler, K., Jacob, R. J. K., & John, B. E. (1993)
Introduction and Overview of Human-Computer
Interaction. Tutorial materials, presented at INTERCHI,
1993 (Amsterdam, April 24 - April 29), ACM, New York.
3. Cuomo, D. L. & Bowen, C. D. (1994)
Understanding usability issues addressed by three
user-system interface evaluation techniques. Interacting with
Computers, 6, 1, 86-108.
4. Desurvire, H. W. (1994) Faster! Cheaper!! Are
usability inspection methods as efficient as empirical
testing? In Jakob Nielsen and Robert L. Mack (eds.)
Usability Inspection Methods. New York: John Wiley.
5. Gallagher, S. & Meter, G. (1993) Volume View
Interface Design. Unpublished Report of the ACSE
Project, School of Computer Science, Carnegie Mellon
University, May 4, 1993.
6. Howes, A., and Young, R.M. (1991) Predicting the
Learnability of Task-Action Mappings. In
CHI'91 Proceedings, New Orleans, LA, ACM, NY.
7. Jeffries, R., Miller, J.R., Wharton, C., and Uyeda,
K.M. (1991) User interface evaluation in the real world:
A comparison of four techniques. In CHI'91 Proceedings,
New Orleans, LA, ACM, NY.
8. Karat, C.-M. (1994) A comparison of user interface
evaluation methods. In J. Nielsen and R. L. Mack (eds.)
Usability Inspection Methods. New York: John Wiley.
9. Lewis, C., Polson, P., Wharton, C., Rieman, J. (1990)
Testing a walkthrough methodology for theory-based
design of walk-up-and-use interfaces. In CHI'90
Proceedings, Seattle, WA, ACM, NY.
10. Mack, R., Nielsen, J. (1992) Usability Inspection
Methods: Summary Report of a Workshop held at CHI'92.
IBM Technical Report #IBMC-18273. IBM T.J. Watson
Research Center, Yorktown Heights, NY.
11. Nielsen, J., (1993) Usability Engineering, San Diego:
Academic Press Inc.
12. Pane, J.F., & Miller, P.L. (1993). The ACSE
multimedia science learning environment. Proceedings of
the 1993 International Conference on Computers in
Education, Taipei, Taiwan.
13. Polson, P. & Lewis, C. (1990) Theory-based design for
easily learned interfaces. Human Computer Interaction, 5,
191-220.
14. Rieman, J. (1993) The diary study: A
workplace-oriented research tool to guide laboratory efforts. In
Proceedings of INTERCHI, 1993, Amsterdam, ACM, NY.
15. Rowley, D. E., & Rhoades, D. G. (1992) The
Cognitive Jogthrough: A fast-paced user interface
evaluation procedure. In CHI'92 Proceedings, Monterey,
CA, ACM, NY.
16. Wharton, C., Bradford, J., Jeffries, R., Franzke, M.
(1992) Applying Cognitive Walkthrough to more complex
user interfaces: Experiences, issues and recommendations.
In CHI'92 Proceedings, Monterey, CA, ACM, NY.
17. Wharton, C., Lewis, C. (1994) The role of
psychological theory in usability inspection methods. In J.
Nielsen and R. L. Mack (eds.) Usability Inspection
Methods. New York: John Wiley.
18. Wharton, C., Rieman, J., Lewis, C., & Polson, P.
(1994) The Cognitive Walkthrough Method: A
practitioner's guide. In J. Nielsen and R. L. Mack (eds.)
Usability Inspection Methods. New York: John Wiley.
19. Yin, R. K. (1994) Case study research: Design and
methods (2nd ed., Applied Social Research Methods Series
Vol. 5). Thousand Oaks, CA: Sage Publications.
FOOTNOTE
(1)
In all, data on five cases were collected with analysts
using Cognitive Walkthrough, Claims Analysis, User
Action Notation, GOMS, and Heuristic Evaluation.
However, due to space limitations, this paper will report
the details of only the first case.