Don't Use a Product's Developers for Icon Testing

Julie B. Holloway*, John H. Bailey**

*University of California, Santa Cruz
Santa Cruz, CA
jewell@cats.ucsc.edu

**IBM, Santa Teresa Laboratory
San Jose, CA
baileyj@vnet.ibm.com


ABSTRACT

This study compared the results of 10 software developers and 10 university students for icon recognition and preferences. There were 54 icons and 15 concepts, and each concept had two to four representative icons. First, participants attempted to match each icon with one of the 15 product concepts. Next, the participants were asked to pick the best icon from the ones specifically designed to represent each concept. The students correctly recognized more icons (M = 34.7) than the developers (M = 27.8), t(18) = 2.1, p < .05. The icons recognized most often by students and developers were different for two concepts, and the icons preferred most often by students and developers were different for four concepts. We believe that the data support the hypothesis that using product developers rather than representative users can result in incorrect decisions in icon usage.

KEYWORDS:

Icon, developer, student, user, recognition, usability, preference.

INTRODUCTION

This study was originally designed to decide which icons should be used in a computer operations management product. A human factors engineer and graphics designer used ideas generated during a brainstorming session with developers to design icon alternatives for 15 different concepts. Some of the concepts were common (e.g. system down, authorization, modem), while others were more specialized (e.g. PPP link, FTP1, FTP2).

Using a product's development team to evaluate product usability is discouraged because their mental model is likely to differ from that of the users (Booth, 1989). However, limited time and resources during development of this product necessitated using the developers for icon testing.

The data from the developers appeared to be valid, and they were used to make decisions about which icons to use in the product. However, the authors were left with a nagging suspicion that the results of the test might have been different with representative users. To confirm this suspicion, two months later, we tested another set of users whose characteristics were similar to those of the target users. These 'surrogate' users were computer literate university students and recent university graduates.

In this paper, we describe the procedures used to collect the data and discuss the results of comparing developers' data and students' data. We hypothesized that there would be significant differences between recognition and preferences of the two groups.

METHODS

Participants

Participants for this study were 10 developers of an icon-based software product and 10 computer literate university students and recent university graduates. All participants volunteered for the experiment. One of the developers tested was also a participant in the initial icon brainstorming session.

Materials

The icon test consisted of 54 icons representing 15 interface concepts. There were two to four icons for each of the 15 concepts.

A Visual Basic program was developed to test icon recognition and preferences. The program presented a matching task for each of the 54 icons, with the icon appearing in the center of the left side of an interface window and the 15 choices listed on the right side of the window. The preference task consisted of a window displaying an interface concept on the upper half and the two to four icon choices for the concept on the lower half. Fifteen preference choices were made by each subject. Each student participant was given a page with descriptions for each interface concept. The developers had the software specifications, which contained the same interface concept descriptions.

Procedure

Each participant was given a disk which contained the Visual Basic icon testing program. The instructions displayed in the first window directed the participants to match each icon with a concept they felt it represented best. They chose a "Next" button to progress through each icon. After the participants matched each icon with a concept, the preferences portion of the task was displayed. The participants were shown a concept with the two to four icons which were developed to represent the concept. They were instructed to select the one they preferred to represent the concept.
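The scoring of the matching task can be illustrated with a minimal sketch. It assumes that an icon counts as correctly recognized when the participant matches it with the concept it was designed to represent. The original program was written in Visual Basic; the Python below is only an illustration, and the icon identifiers, the answer key, the responses, and the function name `count_correct` are all hypothetical.

```python
def count_correct(responses, designed_for):
    """Count icons whose chosen concept equals the concept they were designed for."""
    return sum(
        1
        for icon, chosen in responses.items()
        if designed_for.get(icon) == chosen
    )


# Hypothetical answer key: icon identifier -> concept it was designed to represent.
designed_for = {
    "icon_01": "modem",
    "icon_02": "modem",
    "icon_03": "system down",
    "icon_04": "authorization",
}

# Hypothetical responses from one participant: icon identifier -> concept chosen.
responses = {
    "icon_01": "modem",
    "icon_02": "PPP link",
    "icon_03": "system down",
    "icon_04": "authorization",
}

score = count_correct(responses, designed_for)
print(score)  # 3 of the 4 hypothetical icons were matched to their designed concept
```

In the actual test each participant's score would range from 0 to 54, one point per icon.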

RESULTS

A t-test was performed on the mean number of correctly recognized icons. The students correctly recognized more icons (M = 34.7) than the developers (M = 27.8), t(18) = 2.1, p < .05 (see Figure 1).

[Figure 1. Mean number of correctly recognized icons for students and developers.]
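For readers who want to see how such a comparison is computed, the following is a minimal sketch of an independent-samples t-test with pooled variance, which yields the 18 degrees of freedom reported above for two groups of 10. The paper does not report per-participant scores or standard deviations, so the scores below are invented: they were chosen only so that the group means equal the reported 34.7 and 27.8, and their spread is arbitrary. The resulting t value therefore should not be expected to equal the reported 2.1. The function name `pooled_t` is also an assumption.

```python
import math
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for an independent-samples t-test assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Invented scores (icons correct out of 54); NOT the study's data.
students = [30, 41, 35, 28, 39, 33, 36, 31, 38, 36]
developers = [25, 31, 22, 29, 27, 33, 24, 30, 28, 29]

t_stat, df = pooled_t(students, developers)
print(f"t({df}) = {t_stat:.2f}")
```

With 18 degrees of freedom, the two-tailed critical value at the .05 level is approximately 2.10.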

An informal frequency analysis for recognition and preferences was done for each concept using only the icons specifically designed to represent that concept. The icons recognized most often by students and developers were different for two concepts (authorization and FTP2), and the icons preferred most often by students and developers were different for four concepts (modem, service, system down, and text attachment).
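The frequency comparison described above can be sketched as follows: for each concept, find the icon chosen most often within each group, then list the concepts where the two groups disagree. This is only an illustration of the idea, not the procedure actually used. The icon names and counts are hypothetical, and the alphabetical tie-breaking rule is an assumption, since the paper does not say how ties were handled.

```python
def top_icon(counts):
    """Return the icon with the highest count; ties go to the alphabetically first name."""
    return max(sorted(counts), key=counts.get)


def differing_concepts(group_a, group_b):
    """List the concepts whose most frequently chosen icon differs between two groups."""
    return [
        concept
        for concept in group_a
        if top_icon(group_a[concept]) != top_icon(group_b[concept])
    ]


# Hypothetical preference counts per concept (10 participants per group).
student_prefs = {
    "modem": {"modem_a": 6, "modem_b": 4},
    "reject": {"reject_a": 2, "reject_b": 5, "reject_c": 3},
}
developer_prefs = {
    "modem": {"modem_a": 3, "modem_b": 7},
    "reject": {"reject_a": 1, "reject_b": 8, "reject_c": 1},
}

print(differing_concepts(student_prefs, developer_prefs))  # ['modem']
```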

As a measure of the overall goodness of the icons, we compared the recognition percentages to the International Organization for Standardization (ISO) icon comprehension criterion, which is 67%, and the American National Standards Institute (ANSI) criterion, which is 85%. Twenty-one of the 54 icons passed the ISO criterion and 10 of the 54 passed the ANSI criterion.
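To make these criteria concrete, the sketch below converts each percentage into the minimum number of correct responses an icon would need. It assumes that the recognition percentage for an icon was computed over all 20 participants pooled together, which the paper does not state explicitly. The function names are hypothetical.

```python
ISO_CRITERION_PCT = 67
ANSI_CRITERION_PCT = 85


def min_correct(criterion_pct, n_participants):
    """Smallest count of correct responses whose percentage meets the criterion."""
    return -(-criterion_pct * n_participants // 100)  # integer ceiling division


def passes(n_correct, n_participants, criterion_pct):
    """True if n_correct out of n_participants meets or exceeds the criterion."""
    return n_correct * 100 >= criterion_pct * n_participants


N_PARTICIPANTS = 20  # assumption: both groups pooled

print(min_correct(ISO_CRITERION_PCT, N_PARTICIPANTS))   # 14 of 20
print(min_correct(ANSI_CRITERION_PCT, N_PARTICIPANTS))  # 17 of 20

# Share of the 54 icons reported as passing each criterion.
print(f"{21 / 54:.1%}")  # 38.9%
print(f"{10 / 54:.1%}")  # 18.5%
```

Under this assumption, fewer than two in five of the tested icons met even the more lenient ISO criterion.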

DISCUSSION

The difference in mean icon recognition rates between developers and students is not in the direction that one might expect. One possible explanation of this unexpected outcome is the slightly different testing procedure. Developers had the concept definitions available in the software specifications and may not have referenced them, while all of the students used the list of concept definitions.

This explanation works best for differences in identification of icons representing concepts with complex subtleties. However, only two of the concepts where more students correctly recognized the icons had complex subtleties (FTP1, FTP2). The remaining three concepts where more students correctly recognized the icons were common (system down, reject, authorization). A more plausible explanation might be that the developers had different mental models of the tested concepts than the students, and that their models had a worse fit to the icons than the students' models.

We believe that the data support the hypothesis that there would be significant differences between the recognition and preferences of the two groups.

ACKNOWLEDGMENTS

The authors would like to thank Tandem Computers, Inc. for their sponsorship of part of this study and the use of their data and programs.

REFERENCES

  1. Booth, P.A. An Introduction to Human-Computer Interaction. Lawrence Erlbaum Associates, London, UK, 1989.
  2. James, K., Lynk, L., Molinar, D. & Caird, J.K. The comprehension and use of word processing icons, in Proceedings of the Human Factors and Ergonomics Society 39th Annual Meeting, (San Diego, October, 1995), p. 941.