*
Sougata Mukherjea and James D. Foley
*

- Graphics, Visualization & Usability Center
- College of Computing
- Georgia Institute of Technology
- E-mail: sougata@cc.gatech.edu, foley@cc.gatech.edu

One common strategy to solve this problem is to use an overview diagram showing the
overall graph structure. However, the problem with these are that for any large
information space like the WWW, these diagrams are too confusing for the user. Therefore,
instead of showing the whole space, we need to show how the node can be reached from
important nodes (known as *landmarks* in the hypermedia literature). This is
similar to the common geographical navigation strategy of finding where we are in the
context of important landmarks.

This paper discusses an useful but simple method of showing the context of nodes of the World-Wide Web with respect to landmark nodes. We have implemented our method in the Navigational View Builder [3], a tool for forming effective visualizations of hypermedia systems. Examples are shown of how our method found out the context of some of the WWW pages about the research activities at the Graphics Visualization & Usability Center (GVU) at Georgia Tech (URL: http://www.gatech.edu/gvu/gvutop.html). Note that the node and link structure of the WWW were extracted by parsing the html documents using the strategy described in [4].

Thus, the importance of a node can be calculated to be the weighted sum of the second-order connectedness (SOC), the back second-order connectedness (BSOC), the indegree (I) and the outdegree (O). After the importance of the nodes are calculated, the landmark can be defined to be those nodes whose importance value is greater than a threshold. We used a threshold value of ten percent of the total number of nodes in the information space. Thus, the procedure for discovering landmarks can be summarized as follows:

- Calculate

*importance = (I + O) * wt1 + (SOC + BSOC) * wt2*

where*wt1 + wt2 = 1.0*. We found the best result using*wt1 = 0.4*and*wt2 = 0.6*. - Iff
*importance*> 10% of total number of nodes, the given node is a landmark.

- For each node
*n*that has a link to the node of interest*i*, if*n*is a landmark node and importance of*n*is greater than*i*, we make*n*the node of interest and recursively call our procedure for*n*. Figure 1 shows the context of the WWW page of the first author. Two landmark pages*People.Students.html*and*Multimedia.html*had links to this page. Thus they became nodes of interest. The procedure was recursively called for these pages. The recursion stopped when we reached*gvutop.html*since its importance is greater than all other nodes.

FIGURE 1:**Context of Sougata Mukherjea. Indicates that he is a student in GVU and part of the Multimedia group.** - For some node, say
*i*, it may happen that none of the nodes that have links to*i*are landmarks. For these nodes we find*n*, the node which has the maximum importance among the nodes that have links to*i*and moreover, the importance of*n*is greater than the importance of*i*.*n*becomes the new node of interest and the procedure is recursively called for*n*. For example, for the node*visdebug.gif*, none of the nodes that have links to it were landmarks. Therefore, we selected the node*visdebug.html*, the most important node that had link to it and called the procedure recursively for that node. The context for*visdebug.gif*was found to be following the path from*gvutop.html*to*visdebug.gif*via*research.html*,*SoftViz.html*and*visdebug.html*.

- It may happen that for a node
*i*none of the nodes that have links to it are landmarks and none have importance greater than*i*. For these nodes, we show the context by finding the shortest distance to this node from the most important node. Thus, Figure 2 shows the context of*section3.2.html*. No landmark node links to it and it's importance is greater than all nodes that link to it. Therefore, we show the shortest path from the most important node,*gvutop.html*to this node.

FIGURE 2:**Context of section3.2.html. Indicates that it is a page in the Multimedia research area.**

However, a major limitation of our system is that it uses just structural analysis for determining the importance of the nodes. This leads to unexpected results sometimes. For example, some new PhD students who have not yet decided on their research area, work in many areas. Since they have links to all these areas, their importance is high by our calculation. However, this does not seem correct. Thus, some contextual analysis is also needed. An useful way to do this is to make the importance of the node dependent on the number of accesses to the node. This can can be easily done by incorporating a web access log analysis tool [4] into our system. Finding other contextual methods of determining the importance of a node is an open research issue.

**1**-
M. Andreessen.
NCSA Mosaic Technical Summary.
Technical report, National Center for Supercomputing Applications,
1993.
**2**-
R. Botafogo, E. Rivlin, and B. Shneiderman.
Structural Analysis of Hypertexts: Identifying Hierarchies and
Useful Metrics.
*ACM Transactions on Office Information Systems*, 10(2):142-180, 1992. **3**-
S. Mukherjea, J. Foley, and S. Hudson.
Visualizing Complex Hypermedia Networks through Multiple
Hierarchical Views.
To appear in
*Proceedings of ACM SIGCHI '95*, May 1995. **4**-
J. Pitkow and K. Bharat.
WEBVIZ: A Tool for World-Wide Web Access Log Visualization.
In
*Proceedings of the First International World-Wide Web Conference*, Geneva, Switzerland, May 1994. **5**-
F. Valdez and M. Chignell.
Browsing Models for Hypermedia Databases.
In
*Proceedings of the Human Factors Society, 32nd Annual Meeting*, Santa Monica, Ca, 1988.