



Dana L. Uehling
Karl Wolf
Before a usability testing session, an "expert" user is recorded performing a
task. The recording becomes a performance baseline. Later, during the actual
usability test, a "novice" user is recorded performing the same task. The tool
then compares the two users' action recordings and displays the comparison
results graphically. The hypothesis is that by graphically comparing the
actions of an expert user and an average novice user, a usability analyst can
quickly identify where usability problems (e.g., confusion over menu choices)
arise in the user interface.
As part of the usability testing method we have been using at NASA's Goddard
Space Flight Center, we record one or more developers performing the usability
tasks and use their performance as a baseline for defining the desired
performance of a novice user [3]. Since the developers of the system typically
know the optimum way to perform a task, a large deviation from this optimum
performance may signify a usability problem. A tool that automates this
performance comparison could be useful and cost-effective.
A paper by Asahi and Iseki [1] describes a tool that analyzes logs of user
"events" combined with state information about the application and produces
various graphical representations of the analysis. The authors analyzed a
finite-state-machine application for a fax machine user interface.
The number of states was small and well defined, and each logged event
contained this state information.
This paper raised several questions for us: Given that our applications
typically are not written as finite state machines, can a tool like this be
developed to help with our usability evaluations? How can we apply similar
recording and analysis techniques to our applications? What is the best way to
graphically show the matching of user actions?
The quest for automation forced us to answer many fundamental questions: What
is a measurable action? What information is logged in the event record and how
do we best use it? How do we compare actions? What criteria do we use? What
is the best way to display the comparison results? How do we involve the
analyst? How can the analyst give the tool feedback so that it makes better
comparisons?
After the usability testing is complete, the usability test analyst runs the
usage_analyze utility. This utility lets the analyst specify two
record files to compare, an "expert" file and a "novice" file. The files are
passed to a third utility, usage_compare, a Perl script that
performs the comparison. Several comparison techniques are available for
matching the novice action nodes to the expert action nodes: forward from last
matched node, forward from first unmatched node, nearest neighbor to last
matched node, and best fit. For example, the forward-from-last-matched-node
method compares the next novice node with the expert node immediately after
the last matched expert node. If that expert node does not match, it tries the
next expert node in the series, and it continues searching through the expert
nodes until it finds a match or exceeds the analyst-specified event match
window.
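The forward-from-last-matched-node strategy can be sketched in a few lines of Python. This is not the usage_compare Perl script; it is a minimal illustration that assumes each action node is reduced to its TAE event name, that two nodes match when their names are equal, and that the event match window counts how many expert nodes past the last match are examined. The function name, the `window` parameter, and the example event names are hypothetical.

```python
from typing import List, Optional


def match_forward_from_last(expert: List[str], novice: List[str],
                            window: int) -> List[Optional[int]]:
    """For each novice node, return the index of its matched expert node, or None.

    Assumptions (not taken from usage_compare): a node is its TAE event name,
    two nodes match when the names are equal, and `window` limits how many
    expert nodes past the last match are examined.
    """
    matches: List[Optional[int]] = []
    last = -1  # index of the last matched expert node; -1 means none yet
    for action in novice:
        found = None
        # Scan forward from the node after the last match, within the window.
        for j in range(last + 1, min(last + 1 + window, len(expert))):
            if expert[j] == action:
                found = j
                break
        matches.append(found)
        if found is not None:
            last = found  # later searches start after this expert node
    return matches


expert = ["open_file", "select_menu", "choose_plot", "set_axis", "ok"]
novice = ["open_file", "help", "choose_plot", "select_menu", "set_axis", "ok"]
print(match_forward_from_last(expert, novice, window=3))
# [0, None, 2, None, 3, 4]
```

The novice's stray `help` action stays unmatched, and so does the late `select_menu`: once `choose_plot` has matched expert node 2, this strategy never looks back at expert node 1.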
After the comparison is performed, usage_analyze displays the results
in a graph of action nodes (Figure 1).
[FIGURE 1: UsAGE graph of novice nodes vs. expert nodes.]
The "expert" series of actions is displayed linearly across
the top of the graph. The novice series of actions is displayed relative to
the expert's actions, with arrows denoting the order in which the
actions were performed. Out-of-sequence actions are represented by connecting
arcs with arrows pointing in the forward or reverse direction as necessary.
Unmatched actions taken by the novice appear as nodes placed vertically below
previously matched expert nodes. Currently, the nodes are labeled with the TAE
event name.
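The layout rules above can be illustrated as a classification step that runs before drawing. The sketch below is hypothetical: it assumes the matcher's output is a list giving, for each novice node, the index of its matched expert node or None, and it interprets "previously matched expert nodes" as the most recently matched one. It decides only where each novice node goes, not how it is rendered.

```python
from typing import List, Optional, Tuple


def layout_novice(matches: List[Optional[int]]) -> List[Tuple[str, Optional[int]]]:
    """Classify each novice node for drawing relative to the expert row.

    `matches[i]` is the expert index matched to novice node i, or None.
    Returns (kind, expert_index) pairs, where kind is one of:
      "in_sequence" - matches the expert node right after the previous match
      "forward_arc" - skips ahead past one or more expert nodes
      "reverse_arc" - goes back to an expert node at or before the previous match
      "below"       - unmatched; hung below the most recently matched expert
                      node (None if nothing has matched yet)
    """
    placements: List[Tuple[str, Optional[int]]] = []
    prev = -1  # most recently matched expert index; -1 means none yet
    for j in matches:
        if j is None:
            placements.append(("below", prev if prev >= 0 else None))
            continue
        if j == prev + 1:
            kind = "in_sequence"
        elif j > prev + 1:
            kind = "forward_arc"
        else:
            kind = "reverse_arc"
        placements.append((kind, j))
        prev = j
    return placements


print(layout_novice([0, None, 2, 1, 3]))
# [('in_sequence', 0), ('below', 0), ('forward_arc', 2), ('reverse_arc', 1), ('forward_arc', 3)]
```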
In addition to the graph, usage_analyze displays several metrics about the
comparison results, including the percentage of expert nodes matched to novice
nodes, the ratio of novice to expert nodes, and the percentage of unmatched
novice nodes.
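The three metrics are simple ratios over the matching result. A minimal sketch, assuming a hypothetical representation with one expert index (or None) per novice node; the metric names are illustrative, and counting each expert node at most once is an assumption made here about how "matched" is defined.

```python
from typing import Dict, List, Optional


def comparison_metrics(n_expert: int, matches: List[Optional[int]]) -> Dict[str, float]:
    """Summary metrics for one expert/novice comparison.

    `matches[i]` is the expert index matched to novice node i, or None.
    Expert nodes matched more than once are counted once.
    """
    n_novice = len(matches)
    if n_expert == 0 or n_novice == 0:
        raise ValueError("both recordings must contain at least one node")
    matched_expert = {j for j in matches if j is not None}
    unmatched_novice = sum(1 for j in matches if j is None)
    return {
        "pct_expert_matched": 100.0 * len(matched_expert) / n_expert,
        "novice_to_expert_ratio": n_novice / n_expert,
        "pct_novice_unmatched": 100.0 * unmatched_novice / n_novice,
    }


for name, value in comparison_metrics(5, [0, None, 2, None, 3, 4]).items():
    print(f"{name}: {value:.1f}")
# pct_expert_matched: 80.0
# novice_to_expert_ratio: 1.2
# pct_novice_unmatched: 33.3
```

A ratio well above 1 means the novice took noticeably more actions than the expert baseline, which is the kind of deviation the method treats as a possible sign of a usability problem.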
During early prototype testing of UsAGE, we found that it was hard for the
analyst to conceptually match the nodes on the graph with the actual operations
of the application. As a result, we added the capability to "play back" selected
portions of the graphed actions using the real application user interface.
This allows further investigation of the novice user's actions. When reporting
the problems found, the analyst can use the playback feature to illustrate the
problem areas to the application developers.
Another feature we would like to add is the ability to alter the matching of
the action nodes. If a task involves similar actions, UsAGE may
incorrectly match certain novice nodes to certain expert nodes. The usability
analysts, based on observations during the testing and their knowledge of the
tasks, may know that the novice node actually corresponds to a different expert
node. The analyst would then want to specify that particular match and have
UsAGE reevaluate the other parts of the action sequence.
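The paper does not specify how UsAGE would reevaluate the sequence after an analyst fixes a match. One simple approach, assumed here, treats each analyst-specified pair as a fixed anchor and reruns a forward-from-last-matched-node search independently on the stretches between anchors, so that automatic matching cannot undo the correction. The functions, the `pins` list of (novice index, expert index) pairs, and the `window` parameter are all hypothetical.

```python
from typing import List, Optional, Tuple

Match = Optional[int]


def forward_match(expert: List[str], novice: List[str], window: int) -> List[Match]:
    """Forward-from-last-matched-node search limited to `window` expert nodes."""
    matches: List[Match] = []
    last = -1
    for action in novice:
        found = None
        for j in range(last + 1, min(last + 1 + window, len(expert))):
            if expert[j] == action:
                found = j
                break
        matches.append(found)
        if found is not None:
            last = found
    return matches


def rematch_with_pins(expert: List[str], novice: List[str],
                      pins: List[Tuple[int, int]], window: int) -> List[Match]:
    """Keep analyst-pinned matches fixed and rematch each segment between them."""
    pins = sorted(pins)
    prev_n, prev_e = -1, -1
    for n_pin, e_pin in pins:
        if not (prev_n < n_pin < len(novice) and prev_e < e_pin < len(expert)):
            raise ValueError("pins must be in range and increasing in both sequences")
        prev_n, prev_e = n_pin, e_pin

    result: List[Match] = [None] * len(novice)
    n_start, e_start = 0, 0
    # A sentinel "pin" just past the end of both sequences closes the last segment.
    for n_pin, e_pin in pins + [(len(novice), len(expert))]:
        segment = forward_match(expert[e_start:e_pin], novice[n_start:n_pin], window)
        for k, j in enumerate(segment):
            if j is not None:
                result[n_start + k] = e_start + j  # shift back to full-sequence indices
        if n_pin < len(novice):
            result[n_pin] = e_pin  # the analyst's match is kept as given
        n_start, e_start = n_pin + 1, e_pin + 1
    return result


expert = ["select", "apply", "select", "apply", "close"]
novice = ["help", "select", "apply", "close"]
print(rematch_with_pins(expert, novice, [], window=5))        # [None, 0, 1, 4]
print(rematch_with_pins(expert, novice, [(1, 2)], window=5))  # [None, 2, 3, 4]
```

Without pins, the novice's select/apply pair is matched to the expert's first pair. Pinning novice node 1 to expert node 2 moves it to the expert's second select/apply pair, and the following apply is rematched to expert node 3 as a consequence.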
Other ways to graph the nodes, and other metrics derived from the node data,
also need to be explored. For example, displaying several novices against one
expert on a single graph would show whether each novice deviates from the
expert's path at the same point in the series of actions, which could suggest a
common usability problem.
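The idea of spotting where several novices leave the expert's path can be made concrete with a small metric. This sketch is hypothetical: it defines a novice's deviation point by a position-by-position comparison with the expert sequence, which is much cruder than UsAGE's node matching, and then counts how many novices deviate at each expert step.

```python
from collections import Counter
from typing import List, Optional


def deviation_point(expert: List[str], novice: List[str]) -> Optional[int]:
    """Index of the first expert step the novice did not reproduce, or None.

    Compares the sequences position by position (a hypothetical rule, much
    simpler than node matching). Extra novice actions after the last expert
    step are reported as len(expert).
    """
    for i, expected in enumerate(expert):
        if i >= len(novice) or novice[i] != expected:
            return i
    return None if len(novice) == len(expert) else len(expert)


def common_deviations(expert: List[str], novices: List[List[str]]) -> Counter:
    """Count how many novices first deviate at each expert step."""
    points = (deviation_point(expert, novice) for novice in novices)
    return Counter(p for p in points if p is not None)


expert = ["open", "menu", "plot", "ok"]
novices = [
    ["open", "help", "menu", "plot", "ok"],
    ["open", "plot", "ok"],
    ["open", "menu", "plot", "ok"],
    ["open", "zoom", "ok"],
]
print(common_deviations(expert, novices))
# Counter({1: 3})
```

Three of the four novices leave the expert's path at step 1 (the `menu` action), the kind of clustering that could point to a common usability problem.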
Finally, we need to apply this tool as part of a usability testing process and
evaluate both its effectiveness at locating usability problems and its own
usability. This evaluation is key to knowing whether the hypothesis is valid.
2. Hoiem, D. E. and Sullivan, K. D. Designing and using integrated data
collection and analysis tools: Challenges and considerations. Behaviour &
Information Technology, Vol. 13, Nos. 1 and 2 (Jan.-April 1994), 160-170.
3. Szczur, M. Usability testing - on a budget: A NASA usability test case
study. Behaviour & Information Technology, Vol. 13, Nos. 1 and 2
(Jan.-April 1994), 106-118.
4. Szczur, M. and Sheppard, S. TAE Plus: Transportable Applications
Environment Plus: A user interface development environment. ACM Transactions
on Information Systems, Vol. 11, No. 1 (January 1993).
Abstract
This paper describes a prototype usability test tool designed to automate the
detection of serious usability problems. The tool records the actions that a
user makes while performing a predefined application task. Currently the tool
supports only user interfaces created with TAE Plus.
Keywords: Usability testing, user interface design, TAE Plus.
Introduction
Evaluating a user interface for usability has many well-known benefits. One
method of evaluation is usability testing, in which a typical user is observed
performing predefined tasks with a system. Various types of information may be
recorded, including the time it takes to perform the task, the number and types
of errors made, and the user's rating of the system. Video recordings of the
user sessions are often made as well. These data are then analyzed to identify
problem areas in the user interface. This analysis is largely manual and can be
quite time-consuming.
DEVELOPMENT of UsAGE
Early in the development cycle we spent some time researching efforts by
others to automate the analysis of usability testing data. While there have
been efforts to automate the collection of data during a usability test [2],
much of the analysis of this data is still a manual process. Our goal is to
automate this process as much as possible.
DESCRIPTION of UsAGE
We based our tool, UsAGE, on TAE Plus, a user interface
development and management system [4], because TAE Plus already had the
ability to record user actions. A utility called usage_collect was developed
to record and store the actions of the users during a usability test session.
The utility also enables the usability test administrator to record
time-stamped comments during the test session.
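The paper does not describe the format of the records usage_collect writes. Purely as an illustration, a session record can be modeled as one time-ordered stream holding both user actions and the administrator's time-stamped comments. The SessionLog class, its methods, and the tab-separated line format below are hypothetical and are not TAE Plus interfaces.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class SessionLog:
    """Hypothetical in-memory record of one usability test session.

    Each entry is (timestamp, kind, text), where kind is "event" for a user
    action (text = TAE event name) or "comment" for an administrator note.
    """
    clock: Callable[[], float] = time.time  # injectable for deterministic examples
    entries: List[Tuple[float, str, str]] = field(default_factory=list)

    def record_event(self, event_name: str) -> None:
        self.entries.append((self.clock(), "event", event_name))

    def record_comment(self, text: str) -> None:
        self.entries.append((self.clock(), "comment", text))

    def actions(self) -> List[str]:
        """Event names only, in recorded order."""
        return [text for _, kind, text in self.entries if kind == "event"]

    def to_lines(self) -> List[str]:
        """Serialize entries as tab-separated lines (an illustrative format)."""
        return [f"{t:.3f}\t{kind}\t{text}" for t, kind, text in self.entries]


ticks = iter([0.0, 1.5, 4.0])
log = SessionLog(clock=lambda: next(ticks))
log.record_event("open_file")
log.record_comment("user hesitates over the File menu")
log.record_event("select_menu")
print(log.actions())  # ['open_file', 'select_menu']
print(log.to_lines())
```

Keeping comments in the same time-ordered stream as the events lets an analyst see what the administrator noted around a given action, while `actions()` extracts just the event sequence that a comparison would work on.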
FUTURE DIRECTIONS
Most of the effort so far has gone into exploring matching techniques and
adding basic functionality. We are still exploring how to graphically
represent the time spent between actions, since a large pause between actions
may represent time the user spent deciding what action to try next and may
therefore indicate a usability problem.
Acknowledgments
The authors would like to acknowledge NASA's Marti Szczur and Sylvia Sheppard,
who provided much support for this work.
References
1. Asahi, T. and Iseki, O. UI-tester: A tool for measuring usability. HCI
International '93 Abridged Proceedings (1993), 216.