Julie B. Holloway*, John H. Bailey**
Using a product's development team to evaluate product usability is discouraged because their mental model is likely to differ from that of the users (Booth, 1989). However, limited time and resources during the development of this product necessitated using the developers for icon testing.
The data from the developers appeared to be valid, and they were used to decide which icons to include in the product. However, the authors were left with a nagging suspicion that the results might have been different with representative users. To test this suspicion, two months later we tested another group of users whose characteristics were similar to those of the target users. These 'surrogate' users were computer-literate university students and recent university graduates.
In this paper, we describe the procedures used to collect the data and discuss the results of comparing the developers' data with the students' data. We hypothesized that there would be significant differences between the two groups in icon recognition and preferences.
A Visual Basic program was developed to test icon recognition and preferences. The recognition portion consisted of a matching task for each of the 54 icons, with the icon appearing in the center of the left side of an interface window and the 15 concept choices listed on the right side of the window. The preference task consisted of a window displaying an interface concept in the upper half and the two to four candidate icons for that concept in the lower half. Each subject made 15 preference choices. Each student participant was given a page with a description of each interface concept; the developers had the software specifications, which contained the same concept descriptions.
Procedure
Each participant was given a disk containing the Visual Basic icon testing program. The instructions in the first window directed participants to match each icon with the concept they felt it best represented. They clicked a "Next" button to advance to each icon. After the participants had matched every icon with a concept, the preference portion of the task was displayed. Participants were shown a concept along with the two to four icons that had been developed to represent it, and were instructed to select the one they preferred.
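The two-phase session described above (match every icon to a concept, then choose a preferred icon for each concept) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the original Visual Basic program: the class names `MatchingTrial`, `PreferenceTrial`, and `SessionResult`, the function `run_session`, and the `choose` callback (a hypothetical stand-in for a participant picking an option and clicking "Next") are all invented for this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class MatchingTrial:
    icon_id: str              # icon shown on the left side of the window
    choices: Sequence[str]    # concept names listed on the right side


@dataclass
class PreferenceTrial:
    concept: str                    # concept shown in the upper half
    candidate_icons: Sequence[str]  # two to four icons shown in the lower half


@dataclass
class SessionResult:
    matches: Dict[str, str] = field(default_factory=dict)      # icon -> chosen concept
    preferences: Dict[str, str] = field(default_factory=dict)  # concept -> chosen icon


def run_session(
    matching: List[MatchingTrial],
    preference: List[PreferenceTrial],
    choose: Callable[[str, Sequence[str]], str],
) -> SessionResult:
    """Run the matching phase for every icon, then the preference phase."""
    result = SessionResult()
    # Phase 1: match each icon with the concept it seems to represent best.
    for trial in matching:
        answer = choose(trial.icon_id, trial.choices)
        if answer not in trial.choices:
            raise ValueError(f"{answer!r} is not a listed choice")
        result.matches[trial.icon_id] = answer
    # Phase 2: pick the preferred icon for each concept.
    for trial in preference:
        if not 2 <= len(trial.candidate_icons) <= 4:
            raise ValueError("each concept should offer two to four icons")
        answer = choose(trial.concept, trial.candidate_icons)
        if answer not in trial.candidate_icons:
            raise ValueError(f"{answer!r} is not a candidate icon")
        result.preferences[trial.concept] = answer
    return result


if __name__ == "__main__":
    concepts = ["modem", "reject"]
    matching = [MatchingTrial("icon_a", concepts), MatchingTrial("icon_b", concepts)]
    preference = [PreferenceTrial("modem", ["icon_a", "icon_c"])]
    # A stand-in participant who always picks the first option.
    res = run_session(matching, preference, lambda prompt, options: options[0])
    print(res.matches)      # {'icon_a': 'modem', 'icon_b': 'modem'}
    print(res.preferences)  # {'modem': 'icon_a'}
```

Keeping the matching phase strictly before the preference phase mirrors the order in the procedure, so a participant's recognition answers are recorded before they see which icons were designed for each concept.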

An informal frequency analysis of recognition and preferences was performed for each concept, using only the icons specifically designed to represent that concept. The icons recognized most often by students and by developers differed for two concepts (authorization and FTP2), and the icons preferred most often differed for four concepts (modem, service, system down, and text attachment).
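The frequency comparison can be sketched as: for each concept, count how often each of that concept's own icons was correctly matched (or preferred), take the most frequent icon per group, and list the concepts where the two groups' top icons differ. This is a minimal Python sketch, assuming each participant's responses are stored as a dictionary (icon to chosen concept for recognition, concept to chosen icon for preference); the function names `most_recognized`, `most_preferred`, and `differing_concepts` are hypothetical, and any data passed to them here would be invented, not the study's.

```python
from collections import Counter, defaultdict
from typing import Dict, FrozenSet, List


def _top(counter: Counter) -> FrozenSet[str]:
    """All keys sharing the highest count (ties are kept rather than broken arbitrarily)."""
    best = max(counter.values())
    return frozenset(key for key, count in counter.items() if count == best)


def most_recognized(
    intended: Dict[str, str],          # icon -> concept it was designed for
    responses: List[Dict[str, str]],   # one dict per participant: icon -> chosen concept
) -> Dict[str, FrozenSet[str]]:
    """For each concept, the designed icon(s) most often matched to that concept."""
    hits: Dict[str, Counter] = defaultdict(Counter)
    for icon, concept in intended.items():
        hits[concept][icon] += 0  # register every designed icon, even with zero hits
    for participant in responses:
        for icon, chosen in participant.items():
            if intended.get(icon) == chosen:  # correct recognition of a designed icon
                hits[chosen][icon] += 1
    return {concept: _top(counter) for concept, counter in hits.items()}


def most_preferred(responses: List[Dict[str, str]]) -> Dict[str, FrozenSet[str]]:
    """For each concept, the candidate icon(s) chosen most often."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for participant in responses:  # one dict per participant: concept -> chosen icon
        for concept, icon in participant.items():
            votes[concept][icon] += 1
    return {concept: _top(counter) for concept, counter in votes.items()}


def differing_concepts(
    a: Dict[str, FrozenSet[str]], b: Dict[str, FrozenSet[str]]
) -> List[str]:
    """Concepts present in both groups whose top icon(s) differ."""
    return sorted(c for c in a.keys() & b.keys() if a[c] != b[c])
```

Tied counts are kept as sets, so a concept where one group splits evenly between two icons is reported as differing; the original informal analysis may have treated ties differently.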
As a measure of the overall goodness of the icons, we compared the recognition percentages to the International Organization for Standardization (ISO) icon comprehension criterion of 67% and the American National Standards Institute (ANSI) criterion of 85%. Twenty-one of the 54 icons met the ISO criterion, and 10 of the 54 met the ANSI criterion.
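The comparison with the two criteria reduces to computing each icon's recognition rate (correct matches divided by participants) and counting how many icons reach each threshold; for the reported counts, 21 of 54 icons is about 39% of the set and 10 of 54 is about 19%. The sketch below is a minimal Python illustration. It assumes an icon meets a criterion when its rate is greater than or equal to the threshold, which the text does not state explicitly, and the names `meets_criterion` and `count_meeting` are hypothetical.

```python
from typing import Dict

# Thresholds as stated in the text, expressed in whole percent.
ISO_CRITERION_PCT = 67
ANSI_CRITERION_PCT = 85


def meets_criterion(hits: int, n_participants: int, criterion_pct: int) -> bool:
    """True if hits / n_participants >= criterion_pct / 100 (the >= rule is an assumption).

    Cross-multiplying keeps the comparison in integers, so boundary cases such as
    67 of 100 are not affected by floating-point rounding.
    """
    if n_participants <= 0:
        raise ValueError("n_participants must be positive")
    return hits * 100 >= criterion_pct * n_participants


def count_meeting(correct: Dict[str, int], n_participants: int, criterion_pct: int) -> int:
    """Number of icons whose recognition rate meets the given criterion."""
    return sum(
        1 for hits in correct.values() if meets_criterion(hits, n_participants, criterion_pct)
    )


if __name__ == "__main__":
    # Invented toy counts for three icons and ten participants (not the study's data).
    correct = {"icon_a": 9, "icon_b": 7, "icon_c": 5}
    print(count_meeting(correct, 10, ISO_CRITERION_PCT))   # 2
    print(count_meeting(correct, 10, ANSI_CRITERION_PCT))  # 1
```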
This explanation works best for differences in the identification of icons representing concepts with complex subtleties. However, only two of the concepts for which more students correctly recognized the icons had complex subtleties (FTP1 and FTP2). The remaining three concepts for which more students correctly recognized the icons were common ones (system down, reject, and authorization). A more plausible explanation might be that the developers had different mental models of the tested concepts than the students did, and that the developers' models fit the icons less well than the students' models.
The hypothesis that there would be significant differences between recognition and preferences of the two groups.