
Learning and Using the Cognitive Walkthrough Method: A Case Study Approach

Bonnie E. John

Departments of Computer Science and Psychology and the HCI Institute
Carnegie Mellon University
Pittsburgh, PA 15213
(412) 268-7182
bej@cs.cmu.edu

Hilary Packer
Master of Software Engineering Program
Carnegie Mellon University
Pittsburgh, PA 15213
(412) 268-3759
hpacker@cs.cmu.edu

© ACM

Abstract

We present a detailed case study, drawn from many information sources, of a computer scientist learning and using Cognitive Walkthroughs to assess a multi-media authoring tool. This study results in several clear messages to both system designers and developers of evaluation techniques: the technique is currently learnable and usable, but there are several areas where further method development would greatly contribute to a designer's use of the technique. In addition, the emergent picture of the process this evaluator went through to produce his analysis sets realistic expectations for other novice evaluators who contemplate learning and using Cognitive Walkthroughs.

Keywords:

usability engineering, inspection methods, Cognitive Walkthrough.

Introduction

In the last few years, there has been growing interest in studying different usability evaluation techniques to understand their effectiveness, applicability, learnability, and usability. Developers want to know which techniques will be suited to their system, budget, and time frame. Universities want to know what to teach in the growing number of HCI courses. Researchers developing techniques need feedback to allow them to improve and expand the techniques in useful directions. Some work has been done that compares different techniques (e.g., [3], [4], [7], [8]). This work has taken the form of experiments, formally comparing performance outcomes of different techniques. The dependent variables are typically quantitative: the number (and type) of usability problems identified, how closely a given technique predicts a user's behavior, the time it takes to perform the evaluation, the labor costs involved. Perhaps the biggest difficulty with these studies is that they provide no data about what people actually do when they are using these techniques. Without process data, it is difficult to understand how the technique itself leads the analyst to identify usability problems, as opposed to the analyst simply being clever. Without process data, a developer does not know what to expect when setting out to use a new technique. Finally, without process data, it is difficult to provide meaningful feedback to the method developers so they can improve their techniques. This situation is akin to a usability study of a system that only provides numerical data about performance time or total number of errors; what we need is the equivalent of a think-aloud protocol for assessing usability techniques.

In his book Case study research: Design and methods, Robert Yin [19] states that a case-study approach has an advantage over surveys, experiments, and other research strategies "...when a 'how' or 'why' question is being asked about a contemporary set of events over which the investigator has little or no control." ([19], p. 9) This seems to describe the situation facing the field of HCI when evaluating methods. We are asking how a given technique can be used to predict usability problems, why it works in some situations and not in others, and we have virtually no control over how an analyst learns or uses a technique.

The essence of the case-study approach is to collect many different types of data and use them "in a triangulating fashion" ([19] p. 13) to converge on an explanation of what happened. When multiple sources of information converge, it boosts our confidence that we have understood the series of decisions occurring in a case: how these decisions were made, how they were implemented, and what result was achieved. This deep understanding should allow us to know whether these processes and results are likely to recur with other developers or in the next design project.

This paper presents the first in a series of five case studies of novice analysts learning and applying different usability assessment techniques to a single interface. (Footnote 1) These studies analyze several sources of evidence: problem description forms filled out as the analysts worked, discussions with the analysts, final reports by the analysts, and, to obtain detailed process data, diaries kept by the analysts while they learned and applied these techniques. This case-study approach allows us to look at what the analysts did, the confusion and insights they had about the techniques, as well as more traditional performance measures like the number and type of usability problems identified.

THE ANALYSIS SITUATION

It is not uncommon for a development team to ask one of its members (or an outside person) with an interest in HCI to make recommendations about a user interface design (in fact, it was just such a request that led the first author to switch fields from mechanical engineering to HCI 15 years ago). When this happens the analyst must figure out what assessment technique he or she wants to use, find books or papers describing how to use it, learn the technique from these materials, and apply it to the system design. In this study, we set up a similar situation in the following way.

Figuring out what technique to use

Five volunteer analysts were given a 30-minute lecture on HCI assessment techniques. This lecture was the methods part of the Introduction and Overview of Human-Computer Interaction tutorial given at the CHI conference for the last three years [2]. The analysts heard this lecture from the same lecturer (the first author) and received the same tutorial notes and bibliography as tutorial participants, which is the way scores of professionals get their first introduction to usability assessment methods each year. One week later, each analyst chose the method they wished to use.

The system and specifications

The analysts were given two documents with which to do their analyses: the user interface specification of the ACSE multimedia authoring system and a target multimedia document (described below). The analysts read these documents at home for one week and then were given two 1-hour sessions with the head developer of the system to clarify any points of misunderstanding. The results of these sessions were written up and sent to each analyst via e-mail. The analysts had access to the developer via e-mail if any other difficulties arose in understanding the documents, and all such e-mail conversations were sent to all analysts.

The ACSE Multi-media Authoring System

The Advanced Computing for Science Education (ACSE) project has built a multi-media science learning environment to teach the skills of scientific reasoning [12]. This software system provides an author with a structured document framework and a set of tools for constructing science lessons containing text, still graphics, movies, and simulations. The system provides the science student with tools for navigating through the lesson, viewing the movies, and manipulating and running the simulations.

Based on several years' experience with this system, the ACSE project recently designed a second version of its authoring tool, called the Volume View Builder (henceforth, Builder). The redesign was a group effort recorded in a user interface specification written by a software developer and a technical writer, both of whom had recently sat in on an HCI design class [5]. The implementation of the Builder was begun at the same time as our case studies and continues at the time of this writing.

Figure 1. Example of an illustration in the Volume View Interface Design [5, p. 16].

(In that document, this is a full-page illustration of the screen of the Builder application.)

The ACSE project gave us their user interface specification for analysis. The specification included an introduction to the system and the goals of the document (4 pages), the interface for science students (the Volume View Explorer, 9 pages), and the interface for the Builder (22 pages). The specification included 37 figures of the screens (1 in the introduction, 11 in the Explorer, 24 in the Builder), which ranged from small pictures of specialized cursor icons to full-page figures of the entire screen (e.g., Figure 1).

The Example Multi-Media Document

The target multi-media document was a biology lesson about Drosophila development, adapted from an advanced undergraduate biology course using the ACSE system for laboratory sessions. This volume is 55 pages long and, in addition to text, contains 23 high-resolution images and figures, 3 movies, 7 simulations, 10 fragments of simulation code, and 10 review questions. The original volume was produced with the first version of the Builder and did not include a table of contents, a glossary, or hyper-links, because those features were not included in that version of the Builder. The first author modified this lesson to include these features and produced a hard-copy target document. This document did not run on a computer, but the links were explicitly indicated in supplementary lists, i.e., all the table of contents entries and their page numbers were in a list, all the glossary terms were in a list, and all the hyperlink "hot phrases" and where they would point were in a list. This document and the lists were given to the analysts as a typical document created with the new Builder.

THE ANALYSIS PROCESS

The analysts worked primarily on their own for an elapsed time of ten weeks. They used two forms for recording their work on an ongoing basis: a structured diary and a problem description form (described below). The group met once or twice a week to discuss the analysis process. Each analyst produced both a verbal and a written final report of their analyses. A questionnaire assessed the analysts' educational and professional background. All of these data were analyzed to produce the case reported in this paper.

The Diary Form

The diary form was adapted from [14]. The analyst used it to record his or her activity each half hour while working on the analysis. The analyst wrote a short description of the activity and then placed it in one of six different categories: literature search, reading for "what-it-is", reading for "how-to-do-it", reading/analysis (when reading and analysis are so intertwined as to be inseparable), analysis (when the analyst knows the technique well enough to analyze without reference to the literature), and unrelated (e.g., copying papers). The form also had columns for recording a difficulty with the technique, an insight into the technique, a problem with the design, a solution to a design problem, and "other." At any time, the analyst could make a note on the form and write an extended explanation of what he or she was thinking in a separate, free-form diary.

The Problem Description Form

The problem description report (PDR) was adapted from [7]. The PDR provided an area for describing the problem, an estimate of the severity and frequency of the problem, and an assessment of whether these judgments came from the technique itself, as a side effect of the technique, or from some form of personal judgment. Each PDR had a reference number that also appeared in a diary's column for recording a design problem.
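As a concrete but purely illustrative sketch of the information a PDR captures, the form can be thought of as a simple record. The field names below are our paraphrase of the form just described, not the literal wording of the form adapted from [7], and the example values are loosely based on PDR #26 discussed later in this paper.

    from dataclasses import dataclass
    from typing import Optional

    # Illustrative sketch of one problem description report (PDR).
    # Field names paraphrase the form described in the text; they are
    # not the literal wording of the original form.
    @dataclass
    class ProblemDescriptionReport:
        reference_number: int            # also recorded in the diary's design-problem column
        description: str                 # free-text description of the usability problem
        frequency: str                   # e.g. "once", "rarely", "occasionally", "often", "constantly"
        severity: int                    # 1 = trivial ... 5 = must be changed to be usable
        found_by: str                    # "technique", "side effect", "personal judgment", "other"
        design_suggestion: Optional[str] = None

    # Example loosely based on PDR #26 (severity 4 is our reading of "Serious").
    pdr26 = ProblemDescriptionReport(
        reference_number=26,
        description='Must choose "Windows - Show Volume" to open a volume in builder mode',
        frequency="often",
        severity=4,
        found_by="technique",
        design_suggestion="Open an existing volume by clicking its icon, then choose the mode in a dialog",
    )
    print(pdr26.reference_number, pdr26.severity)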

The Analysts' Discussions

The discussions with the analysts were structured around several open-ended questions: What did you do in your analysis in the last couple of days? Did you have any difficulty with the technique? Did you have any insights into the technique? Did you discover anything else notable about the technique? Thus, these discussions centered around the process, not the content, of the analyses. That is, they discussed things like problems getting or understanding papers, problems making the techniques applicable to the Builder, and types of information their techniques needed or provided, but not specific usability problems with the Builder (e.g., that the method of creating hyperlinks was awkward or that the menu items were in the wrong place). The first author took notes during the discussions, which contributed to the case in this paper. These discussions were also audiotaped, but the tapes have not yet been analyzed and do not contribute to this case.

The Final Report

Each analyst presented a videotaped verbal report and a written report that included a brief summary of the technique, an annotated bibliography of the papers used to learn and apply the technique, any modifications made to the technique to allow it to be used for the Builder specification, any areas of exceptional doubt or confidence about using the technique, suggestions for improving the technique, and the three most important problems that the designers should fix in the Builder. Again, the written artifacts contribute to the case in this paper, but the tapes have not yet been analyzed.

THE COGNITIVE WALKTHROUGH ANALYSIS

The analyst, A1

A1 was a researcher in the School of Computer Science. He had taken over a dozen courses in CS, considered himself fluent in two programming languages, and had worked professionally as a programmer before taking part in this evaluation. He had taken one cognitive psychology course, but none in HCI. A1 received graduate-course credit for participating in this analysis.

The choice of a technique

A1 spent 12 hours finding and reading papers about several HCI techniques and reading the Builder specification before deciding which analysis technique to choose. He considered PUMs [6] but rejected it because it required getting the PUM simulation code. He considered heuristic evaluation as defined by Jeffries and colleagues [7] but decided he did not have sufficient UI design experience. He considered using standards or guidelines but rejected that approach because it meant "a lot of reading through thick books." He read Chapters 5 (Usability Heuristics) and 8 (Interface Standards) in Nielsen's Usability Engineering [11]. Despite some doubts about whether he had the prerequisite training, he chose Cognitive Walkthrough (CW) based on the initial HCI methods overview lecture (above) and the summary of the CHI'92 workshop on usability inspection methods [10].

CW is a usability analysis technique similar in rationale and execution to requirements or code walkthroughs. CW focuses on the ease of learning a new interface. The method has been evolving since its introduction in 1990 [9], and its current form [18] was eventually used by A1. Inputs to a CW are a description of the interface, a task scenario, assumptions about the knowledge a user will bring to the task, and the specific actions a user must perform to accomplish the task with the interface. The group, or individual, performing the CW then examines each step in the correct action sequence, asking the following four questions: "(1) Will the user try to achieve the right effect? (2) Will the user notice that the correct action is available? (3) Will the user associate the correct action with the effect that the user is trying to achieve? (4) If the correct action is performed, will the user see that progress is being made toward solution of the task?" ([18] p. 106) If a credible success story cannot be told for each question at each step, then a usability problem has been identified, and CW suggests ways of fixing the problem.
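The per-step bookkeeping of the method can be summarized in a short sketch (ours, not from the CW literature; the function and variable names are illustrative only): for each step in the correct action sequence, each of the four questions is asked, and any question that cannot be answered with a credible success story is recorded as a usability problem.

    # Illustrative sketch of the per-step bookkeeping in a Cognitive Walkthrough.
    # The four questions paraphrase the quotation from [18] above; everything else
    # (function and variable names) is our own illustration, not a standard API.

    CW_QUESTIONS = [
        "Will the user try to achieve the right effect?",
        "Will the user notice that the correct action is available?",
        "Will the user associate the correct action with the effect "
        "that the user is trying to achieve?",
        "If the correct action is performed, will the user see that "
        "progress is being made toward solution of the task?",
    ]

    def walk_through(action_sequence, judge):
        """For each (action, feedback) step, ask the four questions.

        `judge(step_number, action, feedback, question)` must return either a
        success story (a string) or None; a None answer becomes a recorded
        usability problem, as in A1's worksheets.
        """
        problems = []
        for step, (action, feedback) in enumerate(action_sequence, start=1):
            for question in CW_QUESTIONS:
                story = judge(step, action, feedback, question)
                if story is None:
                    problems.append((step, action, question))
        return problems

    # Toy example: assume no credible success story for question 2 at step 2.
    sequence = [
        ("Click on Drosophila icon", "brings up simulation"),
        ('Choose "Show Volume" from the "Windows" menu',
         "brings up Drosophila volume with page 1 on top"),
    ]

    def toy_judge(step, action, feedback, question):
        if step == 2 and question == CW_QUESTIONS[1]:
            return None  # the correct action is hard to notice (hypothetical judgment)
        return "plausible success story"

    for step, action, question in walk_through(sequence, toy_judge):
        print(f"Step {step}: {action!r} fails on: {question}")

In practice the "judge" is the analyst (or walkthrough group), and each failure would be written up as a PDR rather than a tuple; the sketch only shows how the four questions are applied exhaustively to every action.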

Figure 2. Timeline of the activities of analyst A1 as he learned and applied the Cognitive Walkthrough inspection method to the Builder multimedia authoring tool, as recorded in his diary forms.

The time course of the analysis

A timeline of the activities of A1 appears in Figure 2. After choosing CW, A1 read several other papers [1, 9, 13, 15, 16] before finding [18] (which at the time was only a University of Colorado technical report, but is now publicly available). In his final report, A1 stated that [18] is the only reference really needed to learn and use the technique and strongly recommended it to other analysts because of its clear, step-by-step description and concise examples. When he found this paper he read it with special emphasis on learning how to do the technique (total 3 hrs).

A1 then spent a total of 4.5 hours setting up task scenarios with correct action sequences in preparation for doing the walkthrough. To do this, A1 examined the functionality of the Builder and the frequency of elements (e.g., PICT frames, movies, glossary terms) occurring in the Drosophila target document. In doing so, A1 discovered that the Drosophila document did not include several of the features included in the Builder interface document; e.g., it did not include graphical representations of simulation code like radio buttons for a fixed set of mutually exclusive parameters or slide-bars for changing numerical parameters in a simulation. Noting this deficit in the target document in the diary for later discussion, A1 chose two task scenarios, the first being a short one (creating pp. 1-3 of the target document) to test his understanding of the walkthrough technique, and the second being much longer (creating pp. 24-35; 104 user actions).
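The kind of frequency analysis A1 performed can be approximated with a simple tally of element types in the target document. The counts below are those reported for the Drosophila volume earlier in this paper; the flat-list representation of the document, and the code itself, are only an illustration of the idea.

    from collections import Counter

    # Illustrative tally of element types in the Drosophila target document.
    # Counts are taken from the description of the volume above; representing
    # the document as a flat list of element types is our simplification.
    drosophila_elements = (
        ["image"] * 23 + ["movie"] * 3 + ["simulation"] * 7 +
        ["code fragment"] * 10 + ["review question"] * 10
    )

    counts = Counter(drosophila_elements)
    for element, n in counts.most_common():
        print(f"{element}: {n}")
    # Elements that occur most often are good candidates for task scenarios
    # that exercise the Builder's frequently used features.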

A1 then determined the correct sequence of 36 user actions for the short task (1.5 hours) and performed the walkthrough (2 hours). He found six usability problems and six associated design suggestions during this self-imposed practice walkthrough. In addition to usability problems, this short analysis pointed out many gaps in the Builder interface specification. After this, A1 read [17], which he found interesting, but not useful for applying CW.

A1 then began to apply the CW method in earnest. Since he did not have ready access to the designers of the system, he performed the walkthrough himself, rather than in a group. He first prepared the correct action sequence for this longer task (104 actions). He began the walkthrough with frequent reference to [18], but after 2.5 hours of tightly intertwined reading/analysis and a second reading of the Builder specification, he did not need to refer to the papers anymore. He then spent 16 hrs doing the walkthrough; that is, for each user action he asked the four questions recommended by CW and either wrote a success story or filled out a problem description report for each question. During this phase he reported 36 usability problems and 35 design suggestions. His final report cites seven more general psychology and HCI papers, which his diary does not record as being read during this time period (presumably read during his cognitive psychology course).

An example of the content of A1's walkthrough

Figure 3 shows a portion of the action sequence A1 prepared for himself for the second task scenario. He listed both the action a user would have to take to accomplish the task and the system's response for each step.

Action Sequence, Task B: Create pp. 24-35 in Drosophila document

USER ACTION | SYSTEM FEEDBACK

Reopen Drosophila volume and go to p. 23
1) Click on Drosophila icon | brings up simulation
2) Choose "Show Volume" from the "Windows" menu | brings up Drosophila volume with page 1 on top

Figure 3. Sequence of correct actions for a portion of the long task scenario used by A1.


Figure 4 shows A1's walkthrough of the actions specified in Figure 3. For each action, he enumerated the CW questions and either checked off a question if he considered it a success and recorded the reason for his judgment, or justified why he thought the question indicated a failure of the interface to support learnability. If there was a failure at an action, A1 prepared a PDR (Figure 5) and put the number of the PDR in his CW. In addition, A1 often recorded a suggestion for redesigning the system to prevent the failure.

The usability problems identified

A1 submitted 42 PDRs, all of which were written while doing tightly coupled reading/analysis or analysis. These PDRs included 48 unique usability problems and 4 duplicate problems (there were 9 PDRs with more than one problem on them), as judged by consensus between the authors. These problems covered all aspects of the Builder application: bookmarks, frames, cross-references, glossary, table of contents, cursor, help system, pages, undo, volume, and the entire application as a whole; but not problems with the Drosophila target document. Of these PDRs, A1 said he found 61% directly from using the technique, 12% as a side effect of using the technique, 15% simply from reading the Builder design specification, and 10% from other sources (2% were left blank). For the "other" reports, the source of the problems was the group discussions.

The PDRs asked for a judgment of how frequently the problem would occur to a user of the Builder. A1 judged that none of the problems he reported would occur only once to a user, 22% would occur rarely, 37% occasionally, 29% often, and 10% constantly (2% were left blank). None of these judgments came directly from using the technique; 39% came from personal judgment and 56% came from "other" sources (5% were left blank). The other source of frequency information was usually the frequency analysis of the target document that A1 had performed when selecting the task scenarios, whereas a few were attributed simply to "common sense."

Figure 4. A1's answers to the four CW questions for the actions shown in Figure 3 (Cognitive Walkthrough, Task B: Create pp. 24-35 in Drosophila document). The arrow pointing to "26" indicates that this failure is reported in PDR #26, shown in Figure 5.

Figure 5. PDR #26, reporting the learnability problem identified in the CW of step 2 (Figure 4).

Problem Description Form
Reference number in diary: 26
Brief description of the problem: Starting a volume in builder mode: the necessity to choose "Windows - Show Volume" to get into builder mode with the current volume contradicts Mac experience that you get into the correct application just by clicking on a file created by this application.
How did you find this problem? Using my technique
How frequently will users encounter this problem? Often
How did you estimate frequency? "Other" - [this will occur] each time the builder wants to edit an existing volume
How serious is this problem? Serious
How did you estimate severity? Personal judgement
Other comments? Design suggestion: Redesign the procedure to open an existing volume. 1) Click on the icon representing the volume; 2) a dialog box comes up asking whether this volume shall be opened in builder or explorer mode; 3) the application loads the volume and displays its first page without any further action.

The PDRs also asked for a judgment of the severity of the usability problem on a scale of 1 to 5, where 1=trivial, 3=moderately serious, and 5=must be changed for the software to be usable. On this scale, A1 judged none of the problems he found to be trivial, 12% to be 2, 29% to be moderately serious (3), 49% to be 4, and 5% to be problems that must be changed for the software to be usable at all (5% were left blank). Again, none of these judgments were attributed to the CW technique, whereas 95% were personal judgment (5% were left blank).

In his final report, A1 said that the three most important problems to fix were the icons in the tool palette, the items in the menus, and the relationship of the glossary and table-of-contents panes to the main-media pane of the Builder window. The first two of these problems had many PDRs associated with them (6 and 5, respectively, about 13% of the total number of PDRs for each one). This is not surprising, as two of CW's four questions focus on the availability and meaningfulness of cues in the system, both of which point to issues with icons and menus. The last problem appears in only one PDR. However, A1 justifies the importance of this problem with his frequency analysis of the Drosophila document: glossary terms were the most frequently occurring feature of that document.

Difficulties with CW and insights into its use

In all, A1 recorded 87 notes in his diary which he labeled either a difficulty with the CW technique or an insight into its use. Figure 2 shows that the difficulties and insights occur not only when initially reading about the technique or when learning how to use it, but throughout the analysis. These notes fall into several major content categories. Also, some concerns disappear as A1 learns more or becomes more familiar with CW, whereas others persist for the duration of the analysis.

Background and training required of the analyst (3 notes). When reading about the CW technique, A1 was initially concerned about the amount of training necessary for the analyst because of statements made in [9] and [10]. However, this concern disappeared from his diary notes after the 16th hour. By the final report, A1 was "very comfortable with learning and applying the technique" and asserted that "little or no experience in either user interface evaluation or cognitive psychology is required of the user [of the CW technique]."

Applicability of CW to the Builder application (5 notes). A1 wondered whether CW would scale up to the complex Builder interface. However, reading [16] seemed to allay these concerns. In the final report, A1 says he "...wondered whether the assumptions of the walkthrough...would apply to the evaluation of the Volume View. But the assumptions about the user population (novice users with Mac experience) turned out to be consistent with these assumptions and the more recent versions of the CW are not restricted to walk-up-and-use interfaces."

Underlying theory (13 notes). A1 expressed many initial concerns about the underlying theory of CW. He was concerned about the validity of CE+ [13] and what part the CE+ theory played in the actual walkthrough. He was concerned about his lack of knowledge of the underlying theory, and how that would impact his ability to conduct an effective walkthrough. However, these concerns disappeared while reading [18], because A1 reasoned that CW's assumption that the user's actions would largely be guided by the interface applied to the Builder. In his final report, A1 feels confident in his ability to conduct a walkthrough, and states that "Little or no experience in...cognitive psychology is required... The main strength of the CW is that the technique is very simple and easy to learn."

Task scenarios (13 notes). A1 had two types of concerns about the task scenarios CW requires. The first concern was with his ability to develop "good" task scenarios. He felt that in order to come up with realistic task scenarios, he would have to talk to the users of the Builder, but this application was too new to have many users or a history of use, so he had to settle for performing a frequency analysis on the Drosophila example document. He continued to have insights about the choice of task scenarios throughout the analysis process, right up until the very end, when he decided that task scenarios for a design tool like the Builder should include modification tasks and recovery from error as well as the create-from-scratch tasks he actually analyzed (this last insight came from the group discussions, brought to the table by the analyst using GOMS).

A1's second concern was about not having the designers accessible to dictate the action sequences for the tasks, as the CW papers suggest. This remained a concern throughout the analysis because he believed he needed to know more about the Builder than the specification contained, as evidenced by the fact that his diary contains 23 notes about gaps in the specification. However, A1 turned this problem into a virtue by the final report; he stated, "determining the sequence of correct actions for longer task scenarios can take quite a lot of time; and this can be quite costly if being done by a designer. Therefore, I suggest that the sequence of actions is determined by the evaluator himself. Another reason for this modification is that it puts the evaluator naturally in a situation of learning by exploration. After all, this is the main focus of the Cognitive Walkthrough." A1 recognizes, however, that if the evaluator determines the sequence of actions, he or she must be given an opportunity to clarify questions with the designer.

The process of doing a CW (9 notes). A1 had several concerns about the process of performing a CW that arose from reading the early research papers about the technique [1, 9, 13, 15, 16]. He was concerned about filling out structured forms, how to handle design suggestions, and how to record usability problems that were not directly connected to the task scenarios. However, all of these concerns disappeared when A1 read [18].

During the practice walkthrough, A1 found the process of stepping through recurring sub-procedures to be very tedious. To relieve this burden, A1 introduced macros for tasks that occur frequently in the same context in the task scenarios. That is, he would perform the walkthrough for the first occurrence of a sub-procedure at the lowest level of granularity, but for subsequent occurrences, he used a macro to symbolize the detailed steps.
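A1's macro shortcut can be pictured with the following sketch (ours; the sub-procedure and its detailed steps are hypothetical examples, not taken from A1's worksheets): a recurring sub-procedure is expanded to its detailed steps only on its first occurrence, and is represented by a single symbolic step thereafter.

    # Illustrative sketch of A1's "macro" shortcut for recurring sub-procedures.
    # The macro name and its detailed steps are hypothetical examples.
    MACROS = {
        "ADD_GLOSSARY_TERM": [
            "select the phrase in the text pane",
            'click the "Gloss" button',
            "type the definition in the dialog",
            "click OK",
        ],
    }

    def expand_task(task):
        """Expand macro steps to full detail only on their first occurrence."""
        already_walked = set()
        expanded = []
        for step in task:
            if step in MACROS and step not in already_walked:
                expanded.extend(MACROS[step])      # first occurrence: full detail
                already_walked.add(step)
            else:
                expanded.append(step)              # later occurrences: one symbolic step
        return expanded

    task = ["open the volume", "ADD_GLOSSARY_TERM", "go to page 25", "ADD_GLOSSARY_TERM"]
    print(expand_task(task))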

One process concern recorded by A1 was justified when the PDRs were examined closely. Early on, A1 was concerned about how to keep track of many usability problems, particularly when the walkthrough is done over an extended period of time. Of the 52 problems reported by A1, we (the authors) agreed that 4 were duplicates. This amounts to 8% of the problems, but this problem might escalate rapidly with larger systems.

The basic capabilities of the CW method (7 notes). A1 voiced an early concern, raised in [18], that CW could not address global design issues. Evidently nothing in his experience with CW helped to quell that concern, for in his final report, A1 reiterates, "through the focus on the narrow path determined by the sequence of correct actions, global design issues are not explicitly addressed." Furthermore, A1 says the focus on correct sequences of actions prevents CW from addressing the ease of recovery from error. His suggestion for explicitly including error-recovery task scenarios is an effort to overcome this problem.

Finally, A1 was concerned that the technique does not provide guidance for rating the frequency and severity of usability problems. Indeed, even if this concern had not been articulated, the evidence from the PDRs is overwhelming: on the 42 PDRs submitted, none of the estimates of either frequency or severity were attributed to the CW technique.

The quality of A1's walkthroughs

All of A1's products were examined in detail by John Rieman, a developer of the CW method. Rieman's opinion of the work was that "A1's final paper showed an excellent understanding of the...technique...[His] walkthroughs were fairly good...but they had some flaws." (personal communication, 16 Dec 1994) Rieman thought that A1 often failed to recognize steps where the user may not have the right goal (question 1). In addition, Rieman thought A1 was too strict in his failure criterion for question 3. For instance, A1 listed a failure when the user's goal was to create a glossary entry but the button was labelled "Gloss." Rieman would have called that a successful instance of label-following rather than a failure. The first flaw would miss usability problems in the interface and the second flaw would produce false alarms. In future empirical usability studies of the interface, we will be able to assess Rieman's predictions of the effectiveness of A1's walkthroughs.

DISCUSSION

What this case study says about the Cognitive Walkthrough evaluation technique

The over-riding sense of the experiences of A1 is that the Cognitive Walkthrough evaluation technique is learnable and usable for a computer designer with little psychological or HCI training. There are also some specific lessons in the details of this case study for different interest groups. For computer designers, the strongest message is that reading the early research papers about CW can be confusing and raise many concerns about the theory and practice of CW. (It is unclear from this case whether this is true for everyone, or only for analysts without formal psychology training.) However, the practitioner's guide [18] cleared up most of those issues for A1, suggesting that designers should read only that chapter. This case study alone cannot make such a recommendation with confidence because it is impossible to determine whether A1 benefitted from [18] because it is sufficient to learn the technique or only because he had read the other papers. To investigate this further, the first author assigned only [18] to her undergraduate HCI class and lectured only from the content of that paper. The walkthroughs conducted by the six undergraduate teams were examined in detail by Rieman and considered "all fairly good and some were excellent." (personal communication, 16 Dec 1994) Where there were problems, they often involved question 1 (right goal?) and question 3 (good label?), as was the case with A1. This experience strengthens our recommendation to avoid the early papers and read only the practitioners' guide if you want to use CW. However, pay particular attention to the explanations and examples of questions 1 and 3.

Another message to designers is that CW itself will not give you much guidance about how to pick task scenarios. (This same message results from the GOMS and Claims Analysis cases as well.) Scenario generation still seems to be a black art. However, this case suggests that including modification and error-recovery tasks in the scenarios may be important and easily overlooked. Similarly, CW will not give any guidance as to the frequency or severity of usability problems (a deficit shared by all five evaluation methods studied). These gaps in HCI techniques call on the developers of evaluation techniques to fill a crying need.

A more specific question to the developers of CW concerns the modification suggested by A1 that macros be used to reduce tedium. Before recommending this procedure to designers, the impact of macros should be assessed. True, they may decrease analysis time. However, they may also focus attention away from tedious tasks that users themselves would find objectionable, just as the analysts do. The developers of CW, or others who want to extend its use, may view the assessment of macros as another opportunity for contribution.

A comment on the case-study approach

This case-study approach used many different types of information to converge on a picture of the process A1 went through to produce his analysis. This story itself can help designers learn and use CW by providing realistic expectations about the process, the difficulties and insights, and the length of time it takes to perform an analysis. Such expectations are missing from a textbook description of a technique. However, without them, new analysts can become unnecessarily discouraged, or never even consider embarking on such an analysis.

FUTURE WORK AND AN INVITATION

An important element missing from this analysis is a measure of how effective A1 was in predicting the usability problems that occur in the ACSE Builder application. The traditional benchmark against which to compare usability assessment is empirical usability testing. Unfortunately, the Builder has not yet been implemented to specification, so empirical evaluation must remain as future work. Another interesting benchmark would be to compare the design suggestions made through the CW technique to the actual implementation of the Builder when it is finished. It is possible that some portion of the usability problems associated with the design document would be fixed in the normal software development process as the programmers think hard about the details of implementation.

In addition to the lessons learned from a single case study such as this, the potential for many more lessons is inherent in multiple case studies. The data from the other four evaluation techniques (Claims Analysis, User Action Notation, GOMS, and Heuristic Evaluation) will contribute to our knowledge of learning and using different methods for HCI evaluation, especially as we begin to compare the experiences using the converging evidence of the case-study approach. However, even more can be learned with replication and expansion of the conditions under which these cases were performed. Therefore, the first author invites any practitioner or researcher who wishes to participate to contact her about collecting process data for case studies. The PDRs and diary forms are available for distribution, as are the Introduction to HCI Methods lecture materials, the Builder interface design document, and the Drosophila example document.

Acknowledgments

This work was supported by the Advanced Research Projects Agency, DoD, and monitored by the Office of Naval Research under contract N00014-93-1-0934. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of ARPA, ONR, or the U. S. Government. We thank Robin Jeffries for providing her original PDR and helpful suggestions, "A1" for his thoughtful and thorough CWs, and John Rieman for his evaluations of the CWs discussed herein.

References

1. Bell, B., Rieman, J., and Lewis, C. (1990) Usability Testing of a Graphical Programming System: Things We Missed in a Programming Walkthrough. In CHI'90 Proceedings, Seattle, WA, ACM, NY.
2. Butler, K., Jacob, R. J. K., & John, B. E. (1993) Introduction and Overview of Human-Computer Interaction. Tutorial materials, presented at INTERCHI, 1993 (Amsterdam, April 24-April 29), ACM, New York.
3. Cuomo, D. L. & Bowen, C. D. (1994) Understanding usability issues addressed by three user-system interface evaluation techniques. Interacting with Computers, 6, 1, 86-108.
4. Desurvire, H. W. (1994) Faster! Cheaper!! Are usability inspection methods as efficient as empirical testing? In J. Nielsen and R. L. Mack (eds.) Usability Inspection Methods. New York: John Wiley.
5. Gallagher, S. & Meter, G. (1993) Volume View Interface Design. Unpublished report of the ACSE Project, School of Computer Science, Carnegie Mellon University, May 4, 1993.
6. Howes, A., and Young, R. M. (1991) Predicting the Learnability of Task-Action Mappings. In CHI'91 Proceedings, New Orleans, LA, ACM, NY.
7. Jeffries, R., Miller, J. R., Wharton, C., and Uyeda, K. M. (1991) User interface evaluation in the real world: A comparison of four techniques. In CHI'91 Proceedings, New Orleans, LA, ACM, NY.
8. Karat, C.-M. (1994) A comparison of user interface evaluation methods. In J. Nielsen and R. L. Mack (eds.) Usability Inspection Methods. New York: John Wiley.
9. Lewis, C., Polson, P., Wharton, C., and Rieman, J. (1990) Testing a walkthrough methodology for theory-based design of walk-up-and-use interfaces. In CHI'90 Proceedings, Seattle, WA, ACM, NY.
10. Mack, R., and Nielsen, J. (1992) Usability Inspection Methods: Summary Report of a Workshop held at CHI'92. IBM Technical Report #IBMC-18273. IBM T. J. Watson Research Center, Yorktown Heights, NY.
11. Nielsen, J. (1993) Usability Engineering. San Diego: Academic Press.
12. Pane, J. F., & Miller, P. L. (1993) The ACSE multimedia science learning environment. In Proceedings of the 1993 International Conference on Computers in Education, Taipei, Taiwan.
13. Polson, P. & Lewis, C. (1990) Theory-based design for easily learned interfaces. Human-Computer Interaction, 5, 191-220.
14. Rieman, J. (1993) The diary study: A workplace-oriented research tool to guide laboratory efforts. In Proceedings of INTERCHI '93, Amsterdam, ACM, NY.
15. Rowley, D. E., & Rhoades, D. G. (1992) The Cognitive Jogthrough: A fast-paced user interface evaluation procedure. In CHI'92 Proceedings, Monterey, CA, ACM, NY.
16. Wharton, C., Bradford, J., Jeffries, R., and Franzke, M. (1992) Applying Cognitive Walkthrough to more complex user interfaces: Experiences, issues and recommendations. In CHI'92 Proceedings, Monterey, CA, ACM, NY.
17. Wharton, C., & Lewis, C. (1994) The role of psychological theory in usability inspection methods. In J. Nielsen and R. L. Mack (eds.) Usability Inspection Methods. New York: John Wiley.
18. Wharton, C., Rieman, J., Lewis, C., & Polson, P. (1994) The Cognitive Walkthrough Method: A practitioner's guide. In J. Nielsen and R. L. Mack (eds.) Usability Inspection Methods. New York: John Wiley.
19. Yin, R. K. (1994) Case study research: Design and methods (2nd ed., Applied Social Research Methods Series, Vol. 5). Thousand Oaks, CA: Sage Publications.

FOOTNOTE

(1) In all, data on five cases were collected with analysts using Cognitive Walkthrough, Claims Analysis, User Action Notation, GOMS, and Heuristic Evaluation. However, due to space limitations, this paper will report the details of only the first case.