William Joseph King*, Jun Ohya**
*Human Interface Technology Laboratory
Fluke Hall; FJ-15
University of Washington
Seattle, Washington 98195, USA
jking@hitl.washington.edu
**Communication Systems Research Laboratories
Advanced Telecommunications Research Institute
Seika-cho, Soraku-gun
Kyoto, 619-02, Japan
ohya@atr-sw.atr.co.jp
Agents have become a predominant area of research
and development in human interfaces. A major issue in the development
of these agents is how to represent them and their activities
to the user. Anthropomorphic forms have been suggested, since
they provide a great degree of subtlety and afford social interaction.
However, these forms may be problematic, since they may be inherently
interpreted as having a high degree of agency and intelligence.
An experiment is presented which supports these contentions.
Keywords: agents, anthropomorphism, facial expression,
user interface design
In recent years, research into autonomous agents has increased dramatically. These agents are meant to carry out tasks for the user and serve as another layer of mediation within the system [1]. This layer of mediation employs artificial intelligence techniques to make decisions and to take certain actions within the system. This proactive agent acts autonomously or with permission from the user; therefore, it relieves the user of the burden of having to carry out tedious or repetitive actions [1,2].
A great challenge in designing the human interface to a system using autonomous agents is representing the state of the agents. One can assume that the user probably needs to know, at some high level of granularity, a number of specifics about the agent's operations. These specifics may include the stage the agent has reached within a task, the certainty the agent has in its own decisions within the task, and its progress toward finishing the task. One can also assume that, at the highest level, the user simply needs to know how much trust to invest in the agent.
Trust, in this situation, is the relative ability level which the human user attributes to a certain software entity. Learning theory would indicate that this level of trust may be developed over time through the user's observation of the success and failure of a given agent. However, this may take quite a long time since many agents depend upon establishing a history of the user's actions and preferences to achieve their optimum performance [2]. In the short term, especially during the initial exposures of the user to the agent, the interface will have to provide some other information to aid the user in appraising the agent.
Anthropomorphic representations for agents
have been suggested [1, 3, 4, 5]. The reasoning behind this suggestion
is that the anthropomorphic representation allows for a rich set
of easily identifiable behaviors and for social interaction. However,
these representations do not tend to afford a quick assessment
of the agent's capabilities, especially during initial
exposure to the agent. If anything, these representations may
make the agent seem more intelligent, capable of a higher level
of agency, and more trustworthy than it actually is. Empirical
data are necessary to make any conclusive statements.
Eighteen adult subjects participated in the
experiment. The subject pool was chosen to represent a range of
expertise in the use of computers; however, none of the subjects
had prior exposure to this experimental system. Half of the subjects
were from "Western" philosophical traditions and half
of the subjects were from "Eastern" philosophical traditions.
Half of the subjects were male.
Twenty stimuli were chosen for the experiment.
These stimuli ranged from simple geometric shapes (e.g., sphere)
to three dimensional, fully articulated human forms. The set included
Chernoff faces, with both positive and negative facial expressions,
and stylized facial caricatures. It included very high resolution
human forms in two and three dimensions and in wire frame. Some
of the faces exhibited random eye blinking. The set also included
a variety of non-facial forms of equal geometric complexity to
balance the experiment. Each of the 20 stimuli was presented in both
a static and a dynamic condition. In the dynamic presentation, the stimulus
moved into the subject's field of view from above until
it reached the center of the display.
The measure used in this experiment was a questionnaire containing 40 sets of three questions, one set for each of the 20 stimuli in each of the two presentation conditions. The first question was a multi-dimensional scale developed to assess the type of phenomenon presented. The scale was a triangle with its vertices labeled "object," "agent," and "event." Subjects were asked to place an "X" in the triangle at the point that best described the phenomenological properties of the stimulus presented.
The second and third questions were forced
choice. The second question asked the subject to again appraise
the stimulus; this time the subject had to make a forced choice
among the three categories. In the third question, subjects were asked
to rate the "intelligence" or "potential intelligence"
of the stimulus shown. This rating was given on a standard ten
point scale, in which "1" represented no intelligence
and "10" represented high intelligence. All of the
questions and instructions were given in either English or Japanese.
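The paper does not say how marks on the triangular scale were coded. One plausible approach, sketched here purely as an assumption, is to convert the position of the "X" into barycentric coordinates, giving one weight per vertex with the three weights summing to 1. The vertex positions, the names `OBJECT`, `AGENT`, and `EVENT`, and the function `barycentric` are all hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]

# Assumed layout: an equilateral triangle with unit-length sides.
# The real questionnaire geometry is not given in the paper.
OBJECT: Point = (0.0, 0.0)
AGENT: Point = (1.0, 0.0)
EVENT: Point = (0.5, 3 ** 0.5 / 2)


def barycentric(mark: Point,
                a: Point = OBJECT,
                b: Point = AGENT,
                c: Point = EVENT) -> Tuple[float, float, float]:
    """Return the (w_a, w_b, w_c) weights of a mark; the weights sum to 1."""
    (x, y), (xa, ya), (xb, yb), (xc, yc) = mark, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("triangle vertices are collinear")
    w_a = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    w_b = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    w_c = 1.0 - w_a - w_b
    return w_a, w_b, w_c
```

Under this coding, a mark placed exactly on the "agent" vertex gets an agent weight of 1, while a mark at the center of the triangle gets a weight of one third for each category.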
Each subject was given written instructions before the trial began. The instructions included definitions of each type of phenomenon. They also included sample answers to the questions. The subjects were allowed to ask questions about the instructions and experiment at this time.
The subjects were seated approximately four feet in front of a large, wall-sized lenticular display. A reflective marker was placed on each subject's chin. The system was then calibrated so that the subject's head could be tracked by remote infrared cameras. The lenticular display presented the stimuli in three dimensional monochrome; the stimuli were stabilized so that the subject could move his or her head and upper body to see different perspectives.
Once the subject was comfortable with the
experimental configuration and the system was calibrated, the
lights were dimmed, and the experiment began. The trial proceeded
at the subject's own pace. In each presentation, the subject
was exposed to a single stimulus for 15 seconds. At
the conclusion of the presentation, the subject completed the
three questions associated with it and signaled for the next presentation.
This sequence continued until all presentations were made in random
order.
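The randomized sequence described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes the 20 stimuli are crossed with the static and dynamic conditions to give 40 presentations, which are then shuffled. The function name `presentation_order` and the `seed` parameter are hypothetical.

```python
import random
from itertools import product
from typing import List, Optional, Sequence, Tuple


def presentation_order(n_stimuli: int = 20,
                       conditions: Sequence[str] = ("static", "dynamic"),
                       seed: Optional[int] = None) -> List[Tuple[int, str]]:
    """Return every (stimulus, condition) pair exactly once, in random order."""
    rng = random.Random(seed)  # a fixed seed makes the order reproducible
    trials = list(product(range(1, n_stimuli + 1), conditions))
    rng.shuffle(trials)
    return trials
```

Calling `presentation_order(seed=...)` with a different seed for each subject would give each subject an independent ordering of the same 40 presentations.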
Preliminary results indicate that fully articulated human forms were appraised as agents significantly more often than other anthropomorphic forms and all of the other stimuli presented, F(3,14)=18.43, p<.05. Subjects rated the intelligence of the human forms to be significantly higher than that of either the caricatures, t(17)=7.05, p<.001, and t(17)=6.01, p<.001, or the Chernoff faces, t(17)=8.44, p<.001, and t(17)=8.83, p<.001.
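The paper does not name the test behind the t values. With 18 subjects, the 17 degrees of freedom are consistent with a within-subject (paired) comparison, so the sketch below assumes a paired t-test on per-subject ratings. The function `paired_t` is hypothetical and is not taken from the authors' analysis.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

Given one mean rating per subject for each of two stimulus classes, 18 pairs yield 17 degrees of freedom, matching the form of the reported statistics.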
Within the human forms, the three dimensional
representations were significantly more likely to be appraised
as agents rather than objects. Appraised level of agency and intelligence
was found to be similar across the more symbolic caricatures and
Chernoff faces. However, all of the anthropomorphic representations,
including the more symbolic ones, were judged to be more intelligent
and capable of higher agency than the other stimuli presented
(e.g., geometric shapes, objects, etc.). The blinking human forms
were appraised to have significantly higher agency and to be more
intelligent than any of the other representations, including the
identical non-blinking forms.
The designer wants the software entity to be appraised with a certain level of agency and intelligence. While human facial displays and anthropomorphic representations may seem appropriate, the results indicate that these representations are judged to inherently have a high degree of agency and intelligence. Subtle behavioral displays (e.g., eye blinking) can have a great effect on the user's appraisal of these capabilities.
The designer of agent-based interfaces must
be careful to choose appropriate representations and behaviors.
These choices will have a dramatic effect on the user's
judgment of the agent and on the trust which the user places in
the agent. Unless the agent is capable of human-like actions and
decisions, the designer may be wise to avoid human-like representations.
The authors wish to thank Fumio Kishino and
Nobuyoshi Terashima for providing funding and a laboratory where
this research could be conducted.