Xerox Palo Alto Research Center

3333 Coyote Hill Road

Palo Alto, CA 94304

pirolli@parc.xerox.com

card@parc.xerox.com

To motivate previous designs of interactive information systems [6] , we appealed to mechanisms of cognitive science [5], and to general principles of information science [17] . We have argued that in an information-rich world, the real design problem to be solved is not so much how to collect more information, but rather, how to optimize the user's time, and we have deployed these principles in an attempt to increase relevant information gained per unit time expended. But for task analysis, design exploitation, and evaluation of information systems, a more developed theory is needed.

In this paper, we lay out the framework for an approach we call information foraging theory. This approach considers the adaptiveness of human-system designs in the context of the information ecologies in which tasks are performed. Typically, this involves understanding the variations in activity afforded by some space of human-system design parameters, and understanding how these variations trade-off the value of information gained against the costs of performing the activity. While complementary with information processing approaches to computer interaction theory, such as those in the GOMS family [5, 14], information foraging theory emphasizes a larger time-scale of behavior, the cost structure of external information- bearing environments, and human adaptation. Consider the time-scales of activity outlined by Newell [13] .

The sorts of information-seeking and sensemaking activities of interest to us span from the middle of the cognitive band of activity (~100 ms - 10 s), across the rational band (minutes to hours), and perhaps into the social band (days to months). Typically, information-processing models of cognition have addressed behavior at the cognitive band, and elementary cognitive mechanisms and processes (e.g., such as those summarized in the Model Human Processor, [5] ) play a large part in shaping observed behavior at that grain size. As the time scale of activity increases, "there will be a shift towards characterizing a system...without regard to the way in which the internal processing accomplishes the linking of action to goals" (Newell [13], p 150). Such explanations assume that behavior is governed by rational principles and largely shaped by the constraints and affordances of the task environment. Rather than assuming classical normative rationality, one may assume that the rationale for behavior is its adaptive fit to its external ecology [4] .

This is the essence of an ecological stance (Neisser, as cited in Bechtel [4]) towards cognition. Whereas information- processing models, such as GOMS, provide mechanistic accounts of how cognition operates, ecological models address why it operates that way, given the ecological context in which it occurs. This kind of integrated explanatory framework has been promoted by Marr [12] and, more recently, Anderson [1, 2] in cognitive science.

Optimal foraging theory is a theory that has been developed within biology for understanding the opportunities and forces of adaptation. We believe elements of this theory can help in understanding existing human adaptations for gaining and making sense out of information. It can also help in task analysis for understanding how to create new interactive information system designs.

Optimality models in general include the following three major components.

- Decision assumptions that identify which of the problems faced by an agent are to be analyzed. This might involve decisions about whether to pursue a given type of information item upon encountering it, the time to spend processing a collection of information items, the choice of moves to make in navigation, the choice of strategy under uncertainty, or degree of resource sharing.
- Currency assumptions, which identify how choices are to be evaluated. The general assumption in ecological analyses is that some feature x will exist over other features, if x satisfies some existence criteria. Existence criteria have two parts (a) a currency, and (b) a choice principle. Typically, optimal foraging models in anthropology and biology assume energy as a currency. Information foraging theory will assume information value as currency. Choice principles include maximization, minimization, and stability.
- Constraint assumptions, which limit and define the relationships among decision and currency variables. These will include constraints that arise out of the task structure, interface technology, and the abilities and knowledge of a user population.

We assume that information foraging is usually a task that is embedded in the context of some other task and its value and cost structure is consequently defined in relation to the embedding task and often changes dynamically over time [3, 18]. The value of information [16] and the relevance of specific sources [18] are not intrinsic properties of information-bearing representations (e.g., documents) but can only be assessed in relation to the embedding task environment. Usually, the embedding task is some ill-structured problem for which additional knowledge is needed in order to better define goals, available courses of action, heuristics, and so on [15, 20]. Such tasks might include such things as choosing a good graduate school, developing a financial plan for retirement, developing a successful business strategy, or writing an acceptable scientific paper. The structure of processing and the ultimate solution are, in large part, a reflection of the particular knowledge used to structure the problem space. Consequently, the value of the external information may often reside in the improvements to the outcomes of the embedding task.

The use of optimality models should not be taken as a hypothesis that human behavior is classically rational, with perfect information and infinite computational resources. A more successful hypothesis about humans is that they exhibit bounded rationality or make choices based on satisficing [19]. However, satisficing can often be characterized as localized optimization (e.g., hill-climbing) with resource bounds and imperfect information as included constraints [23]. Optimality models do not imply that animals or information foragers will necessarily develop so as to embrace the simple optimum. Rather, they describe the possibilities of a niche, a possible advantageous adaptation if not blocked by other forces (for example, the consequences of another adaptation). For us, these models help fill in what Anderson [1] calls the Rational Level theory of information access.

In the context of a an analysis of the Scatter/Gather document browser [9] we introduce two simple models, the information patch model, and the information diet model, borrowed rather directly from optimal foraging theory. These "conventional" models derive from Holling's disc equation [22], which states that rate of currency intake, R, is the ratio of net amount of currency gained (energy in the case of biological systems; information value in our case), U, divided by the total amount of time spent searching, Ts, and exploiting, Th,

**TABLE OF EQUATIONS. Equation 1 **

** TABLE OF EQUATIONS. Equation 2 **

The information patch model and the information diet model
are formulated as variants of Equation 2. We discuss the
analytic optimal solutions to these models in the context of
the illustrations.

We also develop a more comprehensive dynamic model that incorporates the information patch model and information diet model as subcomponents. Using dynamic programming we illustrate how one may determine the optimal human- system strategies using dynamic programming techniques.

We discuss an information patch model in the context of the Scatter/Gather text database browser.

Conceptually, a collection may be clustered, for instance, into B = 10 groups of related documents. Each cluster is represented by a separate area as in Figure 1. For each cluster, the user is presented with typical words that occur in the text contents of documents in a cluster, as well as the titles of the three most prototypical documents. The user may gather some subset of those clusters, by pointing and selecting buttons above each cluster, and then ask the system to scatter that subcollection into another B subgroups, by selecting a Scatter/Gather button at the top of the display in Figure 1. The clustering is based on a form of inter- document similarity computation based on representations of text contents. Scatter/Gather browsing and clustering employs methods that can occur in constant interaction-time [8].

**
FIGURE 1. The Scatter/Gather interface for navigation
through large document collections.**

This simple loop of activity can be characterized by a
cumulative gain function gi(t) that indicates how much
information value is acquired over time t in cluster-patches
of type i. In our empirical studies, we used specific task
instructions that indicated that the information value was
simply the number of relevant documents collected. The
proportion of relevant documents in a cluster is the
precision, P, of that cluster
If the document titles in a cluster were ordered by their
probability of relevance to a query, then the gain function
would shift qualitatively to a diminishing returns curve such
as gB(t) in Figure 2. That is, there would be higher number
of relevant documents encountered while scanning the initial
parts of the display of document titles, but this encounter rate
would diminish as one continues scanning. Simple patch-
foraging models [22] identify the overall maximum rate of information and the
optimal time at which to stop foraging in a patch. In these
traditional patch models, each patch type i with gain function
gi(ti)
The optimization problem is to determine the optimal vector
of patch residence times that maximizes the
currency intake rate R.

** TABLE OF EQUATIONS. Equation 3 **

where ts is the time it takes to scan and judge the relevance
of a title. If the time to handle a relevant item were th = 0,
then we would have

However, when handling costs are non-negligible,
** TABLE OF EQUATIONS. Equation 6 **

In this case, gi(t) is just a linear function of t. This is
illustrated qualitatively by function gA(t) in Figure 2. Next
we illustrate how a foraging model reveals the improvements
that would be made by introducing an interface method that
displayed cluster titles ordered by the similarity of document
contents to a user's query rather than in the unordered
presentation captured by Equation 6. This method is
currently unavailable in Scatter/Gather, but certainly possible
and advisable given our analysis.
[Footnote2]
This foraging model
illustrates explicitly how the overall rate of information gain
would change, as well as how the users' foraging time in
each cluster would be reduced.

,

is additionally characterized by the rate at which such
patch-gain functions will be encountered during prolonged
foraging,

.

For instance, over a sufficiently long run of
work with the Scatter/Gather interface one might be able to
classify encountered clusters by their expected gain
functions, gi

,

and to additionally associate with each cluster
type i the rate,

,

at which one expects to encounter clusters
with gain functions gi

.

** TABLE OF EQUATIONS. Equation 7 **

*Charnov's marginal value theorem *[7] states that the long-
term rate of currency intake is maximized by choosing patch
residence times so that the marginal value (instantaneous
rate) of the currency gain at the time of leaving each patch
equals the long-term average rate across all patches in the
ecology.
[Footnote3]
More technically, the full vector of rate
maximizing ti's, (t1, t2, ....,tN),

must fulfill the condition
specified by

**TABLE OF EQUATIONS. Equation 8 **

Charnov [7] shows that this theorem implies that residence
times decrease as overall rate of gains from the information
ecology improve and the optimal patch residence time occurs
when the instantaneous gain rate equals the overall rate of
gain from all patches that are foraged.

Discussion of the analytic formulation and solution of prey
selection models may be found in Stephen and Krebs [22],
but here we focus on three general results: (a) ranking by
*profitability, * (b) the zero-one rule, and (c) the independence
of inclusion from encounter rate. Assume that the items
being selected into the diet can be categorized as to their
profitability, which is the expected net gain divided by the
amount of time it takes to handle the item. In our case, the
cluster profitabilities will be proportional to lp. To select
the optimal diet, the items should be ranked by their
profitability. Starting with the most profitable items, include
lower-ranked items in the diet only if their profitability is
greater than or equal to the average rate of gain of all the
higher ranked items. Including items with profitability
lower than the average rate of gain of the all the higher
ranked items can only decrease the average. This is an
instance of the *principle of lost opportunity: * by handling
lower-ranked items in the diet one loses the opportunity to
go after higher-ranked items. The zero-one result simply
states that the optimal diet will be one in which items of a
given profitability level are chosen all-or-none. The
independence of inclusion from encounter rate result states
that the inclusion of lower-ranked items in a diet is solely
dependent on their profitability and not upon the rate at
which they are encountered. To use an everyday analogy: if
reading junk mail is categorized as a too-low profitability
item (because there are better things to pursue), then the
decision to ignore junk mail when it is encountered should
be made regardless of the amount or rate of junk mail
received.

**
TABLE 1. Empirical costs of Scatter/Gather use.**

(Paragraph text replaced with scanned image to render mathematical symbols.)

That is, assuming that the optimal choices (or

where "clustering history" represents the path by which the user has navigated through the cluster hierarchy by iteratively gathering then scattering clusters.

A reasonably average initial state x0 would be

** TABLE OF EQUATIONS. Equation 11 **

** TABLE OF EQUATIONS. Equation 12 **

At a given time, t, with T - t time remaining until the
deadline is hit, the number of relevant documents collected
from the displayed cluster titles will be either (a) all the
relevant cluster titles, if there is sufficient remaining time to
go through the whole cluster, or (b) as many relevant
documents as can be collected in the remaining time, as
characterized by Equation 6:

** TABLE OF EQUATIONS. Equation 13 **

The associated time-cost function is

** TABLE OF EQUATIONS. Equation 14 **

Unless the time cost of the operator takes the human-system
interaction past the deadline time, T, interaction returns to
the cluster that is superordinate in the cluster hierarchy, and
the number of foraged documents, remaining relevant
documents, and foraged subclusters are updated
appropriately, as is the history.

Operators for choosing the highest-precision i = 1, 2, ...9
clusters ca be defined by operators di,

** TABLE OF EQUATIONS. Equation 15 **

**
FIGURE 3. Expected gains and optimal operator choices
in Scatter/Gather navigation.**

If a cluster is divided into B subclusters (e.g., B = 10, in our
case), then the number of documents collected together by
gathering i subclusters together is, in the average case,

** TABLE OF EQUATIONS. Equation 16 **

The number of relevant documents that will be found in the
newly gathered cluster will be

** TABLE OF EQUATIONS. Equation 17 **

The cost of the gathering operation is

**TABLE OF EQUATIONS. Equation 18. **

Scatter-gathering clusters is optimal at the beginning of the task when time available and cluster precision is low and displaying-selecting is optimal later or at higher precisions. The optimal number selected clusters tends to increase with overall state precision, and precision increases with time.

This book analyzes feeding behavior in the way an engineer might study a new piece of machinery. An engineer might ask, among other questions, about the machine's purpose: is it for measuring time, wind speed, or income tax? This is a worthwhile question for the engineer, because machines are built with a purpose in mind, and any description of a machine should refer to its purpose. . . .Biologists also ask questions about purpose, about what things are for in contrast to the engineer, the biologist thinks of design or purpose as the product of natural selection, rather than as the product of a conscious creator. (Stephens & Krebs, [22], p. ix]

In this paper, we have tried to make a start at reversing the above analogy - exploiting theory developed in the service of behavioral ecology to analyze information ecologies and the design of interactive information systems. It is hoped that this line of research will give us a new set of tools to aid in the design of these systems.

2. The current Scatter/Gather does have a similarity search operator that works over the entire collection but cannot be restricted to selected clusters. Return to text

3. We are assuming that it is long-term average net gain that is being maximized. See Hames [10] for other optimization policies. Return to text