CHI '95 Proceedings
Short Papers

VGrep: A Graphical Tool for the Exploration of Textual Documents

Jeffrey D. McWhirter

Department of Computer Science
University of Colorado, Boulder, CO, 80309
303-492-7906, jeffm@cs.colorado.edu

© ACM

Abstract

Discovering the content and structure of textual files through keyword-based search is a common task for computer users. However, the results of such a search are often difficult to understand and to use. This paper describes VGrep, a tool that facilitates keyword-based search through large textual documents. VGrep lets the user formulate queries and presents their results in an abstract graphical representation.

Keywords

Word search, visualization.

Introduction

One common task of programmers and other computer users is searching the contents of a set of files for particular keywords or strings. The result of such a search is typically a linear textual listing of the files and the lines that contain the search expression. For a small number of matches this output is sufficient for most tasks, but as the number of matches grows, the information becomes harder to understand. Furthermore, navigating through the set of files can be a convoluted task requiring a series of steps (e.g., identifying the line, bringing up a text viewer for the file, and going to the line number displayed in the search output). VGrep is a tool that supports keyword-based search and navigation of the search results. In this paper we first present an example of a conventional text-oriented search and then describe how VGrep facilitates this search process.

Example

We now discuss a typical use of grep, the Unix search command. Suppose a programmer needs to search a set of program source files for the keyword ibbox. The search command might look like this:
    
grep -i -n ibbox *.C

This searches every file whose name ends in .C for the string "ibbox", ignoring case (-i), and prefixes each matching line with its line number (-n). A subset of the results of this search is:

...
Gfx.C:1004:ibbox=gRect0;
Gfx.C:1012:if(!parent)SignalChange(ibbox);
...
ShapeGfx.C:804:r.extent=ibbox.extent-2*p;
ShapeGfx.C:807:r.origin=ibbox.origin+p;
...
Of course, this is only a small part of the output: a search can easily match a large number of lines in many files. The output then quickly becomes unmanageable, forcing the user to reformulate the search criteria or to limit the number of files searched. A linear listing of matched file names, line numbers and lines can quickly overwhelm the user.
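The behavior of `grep -i -n` described above can be approximated in a few lines of Python. This is a minimal sketch, not grep itself: the helper names `search_lines`, `grep_files` and `as_grep_output` are hypothetical, Python's `re` syntax stands in for grep's basic regular expressions (the two agree for a literal such as ibbox), and every output line is prefixed with the file name, as grep does when several files are searched.

```python
import re
from typing import Iterable, List, Tuple

Hit = Tuple[str, int, str]  # (file name, 1-based line number, line text)

def search_lines(name: str, lines: Iterable[str], pattern: str) -> List[Hit]:
    """Return the lines matching `pattern`, ignoring case (like grep -i -n)."""
    regex = re.compile(pattern, re.IGNORECASE)
    return [(name, n, line.rstrip("\r\n"))
            for n, line in enumerate(lines, start=1)
            if regex.search(line)]

def grep_files(paths: Iterable[str], pattern: str) -> List[Hit]:
    """Search each file in turn, collecting hits in file order."""
    hits: List[Hit] = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            hits.extend(search_lines(path, f, pattern))
    return hits

def as_grep_output(hits: Iterable[Hit]) -> List[str]:
    """Format hits as FILE:LINE:TEXT, grep's layout when several files are searched."""
    return [f"{name}:{n}:{text}" for name, n, text in hits]

if __name__ == "__main__":
    import glob
    # Rough equivalent of: grep -i -n ibbox *.C
    for out in as_grep_output(grep_files(sorted(glob.glob("*.C")), "ibbox")):
        print(out)
```

Returning (file, line, text) triples rather than printing them directly keeps the results available to a tool that wants to display them some other way.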

VGrep

VGrep is a tool that allows a user to define and execute a word search on a set of files. The result of the search is displayed graphically, with each file shown abstractly as a bitmap: each row of the bitmap corresponds to a line in the file and each bit corresponds to a character. Lines that match the search are highlighted. This type of abstract graphical display was developed and explored by Eick et al. to visualize, among other things, program line statistics [1] and log files of telecommunication switch build processes [2].
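The mapping from a file to its bitmap can be sketched as follows. This is an illustration under stated assumptions, not VGrep's implementation: the cell codes, the helper names `file_bitmap` and `render`, the tab width of 8, and the choice to leave whitespace blank (so that indentation and line length stay visible) are all hypothetical.

```python
from typing import List, Sequence, Set

# Hypothetical cell codes: blank, ordinary character, character on a matched line
BLANK, TEXT, HIT = 0, 1, 2

def file_bitmap(lines: Sequence[str], matched: Set[int], tab_width: int = 8) -> List[List[int]]:
    """Build an abstract picture of one file: one row per line, one cell per character.

    `matched` holds the 1-based numbers of the lines the search matched.
    Whitespace stays BLANK; rows are padded to a common width so the
    result is rectangular.
    """
    expanded = [line.rstrip("\r\n").expandtabs(tab_width) for line in lines]
    width = max((len(line) for line in expanded), default=0)
    bitmap = []
    for n, line in enumerate(expanded, start=1):
        ink = HIT if n in matched else TEXT
        row = [BLANK if ch.isspace() else ink for ch in line]
        row.extend([BLANK] * (width - len(row)))  # pad to the widest line
        bitmap.append(row)
    return bitmap

def render(bitmap: List[List[int]]) -> List[str]:
    """Text preview of a bitmap: ' ' blank, '.' text, '#' matched."""
    return ["".join(" .#"[cell] for cell in row) for row in bitmap]
```

For example, `render(file_bitmap(["int a;", "  ibbox=0;"], {2}))` draws the first line as dots and the indented second line as a run of `#` characters, which is the kind of at-a-glance highlighting the bitmap view provides.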

FIGURE 1: VGrep Word Search

Figure 1 shows a screen snapshot of VGrep. The dialog box in the upper part of the figure lets the user formulate and execute a query, and the result of the query is shown in the main window. In the figure, the query is the one described above. Each matching file is displayed in the main window as a bitmap in which the matched lines are highlighted. Long files are displayed as multiple bitmaps, and the user can set the maximum number of file lines per bitmap. The user can also define scaling factors to shrink or expand the bitmaps along the x or y axis. A new search can be executed on the same set of files; its results are shown in a different color on the existing bitmap images. This makes it possible to discover relationships within files between the results of different search criteria.
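The display controls just described (a limit on file lines per bitmap, x and y scaling factors, and a different color for each re-executed search) can be sketched on a bitmap stored as a rectangular list of integer rows. All names and cell codes here are hypothetical. When shrinking, each output cell keeps the largest code among the cells it covers; this max-pooling choice is an assumption made so that a highlighted line is never lost, since the paper does not say how VGrep resamples.

```python
from typing import List, Sequence, Set

Bitmap = List[List[int]]
BLANK, TEXT = 0, 1  # hypothetical codes; search k (0-based) is drawn with code 2 + k

def color_searches(bitmap: Bitmap, searches: Sequence[Set[int]]) -> Bitmap:
    """Recolor the non-blank cells of matched lines, one color per search.
    A line hit by several searches keeps the color of the most recent one."""
    out = [row[:] for row in bitmap]
    for k, matched in enumerate(searches):
        for n in matched:  # n is a 1-based line number
            if 1 <= n <= len(out):
                out[n - 1] = [BLANK if v == BLANK else 2 + k for v in out[n - 1]]
    return out

def split(bitmap: Bitmap, max_lines: int) -> List[Bitmap]:
    """Break a long file into several bitmaps of at most max_lines rows each."""
    return [bitmap[i:i + max_lines] for i in range(0, len(bitmap), max_lines)]

def _blocks(n: int, factor: float) -> List[range]:
    """Source indices covered by each output index after scaling n by factor."""
    if n == 0:
        return []
    m = max(1, round(n * factor))
    return [range(i * n // m, max(i * n // m + 1, (i + 1) * n // m)) for i in range(m)]

def scale(bitmap: Bitmap, sx: float, sy: float) -> Bitmap:
    """Shrink (< 1) or expand (> 1) along x and y; each output cell keeps the
    largest code it covers, so a highlighted line survives shrinking."""
    if not bitmap or not bitmap[0]:
        return [row[:] for row in bitmap]
    cols = _blocks(len(bitmap[0]), sx)
    return [[max(bitmap[r][c] for r in rb for c in cb) for cb in cols]
            for rb in _blocks(len(bitmap), sy)]
```

A file's bitmap would then be colored once per search, split by the user's line limit, and each piece scaled by the user's x and y factors before drawing.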

The abstract graphical representation lets the user see the full set of matched files together with an overview of how the matches are distributed within them. Clusters and patterns of matches become visible, which may give the user a higher-level understanding of the structure and contents of the files.

VGrep also lets the user navigate among the matched files by clicking on regions of a file's bitmap image, which controls a textual view of that file. The right side of the main window in Figure 1 shows textual views of two of the matched files. When the user clicks on a bitmap, the textual view scrolls to the corresponding line and highlights it. The user can also open an external text editor on a file from its bitmap.
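Navigation amounts to inverting the drawing: a click position has to be turned back into a file line before the text view can scroll to it. The sketch below assumes a simple layout in which each bitmap holds `max_lines` file lines and the vertical scale factor `sy` gives pixels per file line; `line_at` and `text_view` are hypothetical names, and the five-line window with the target near its middle is an arbitrary choice.

```python
from typing import List

def line_at(bitmap_index: int, y: int, sy: float, max_lines: int, total_lines: int) -> int:
    """Map a click on pixel row `y` of the `bitmap_index`-th bitmap of a file
    (both 0-based) to a 1-based file line.

    Assumes pixel row y was drawn from line offset floor(y / sy) within
    that bitmap; the result is clamped to the file's real length.
    """
    offset = min(int(y / sy), max_lines - 1)
    line = bitmap_index * max_lines + offset + 1
    return max(1, min(line, total_lines))

def text_view(lines: List[str], target: int, height: int = 5) -> List[str]:
    """Visible lines after scrolling so `target` (1-based) sits near the middle
    of a `height`-line window; the target line is marked with '>'."""
    top = max(0, min(target - 1 - height // 2, len(lines) - height))
    return [(">" if i + 1 == target else " ") + f"{i + 1:5}: {lines[i]}"
            for i in range(top, min(top + height, len(lines)))]
```

With 500 lines per bitmap and a y factor of 0.5 (two file lines per pixel row), a click on row 10 of the second bitmap maps to line 521.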

FIGURE 2: VGrep Source Code Display

Displaying Source Code

VGrep can also display source code abstractly, as seen in Figure 2. (Note: the file shown in Figure 2 is approximately 1600 lines long; its bitmap has been scaled down and the color coding is not shown.) The user provides a file name, VGrep parses the file, and each line of the resulting bitmap is color coded according to the line's type. The user can specify the colors used for comments, preprocessor directives, global code and procedures. A text view of the file can also be brought up as described above.
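The paper does not describe VGrep's parser. As a rough illustration only, the line classification can be approximated for C and C++ with a single pass that tracks block comments, string literals and brace depth. The function `classify_lines` and its rules are hypothetical: a top-level `{` counts as a procedure body when the top-level code before it contains `(`, so member functions defined inside a class body, and function headers placed on a line of their own above the brace, are labelled global code.

```python
from typing import List

def classify_lines(lines: List[str]) -> List[str]:
    """Label each line 'comment', 'preprocessor', 'procedure' or 'global'
    using a crude C/C++ heuristic (see the rules described above)."""
    kinds: List[str] = []
    depth = 0           # current brace nesting depth
    proc_depth = -1     # depth at which the open procedure body started, -1 if none
    in_comment = False  # inside a /* ... */ comment
    pending = ""        # top-level code since the last ';', '{' or '}'
    for raw in lines:
        text = raw.strip()
        if in_comment or text.startswith(("//", "/*")):
            kind = "comment"
        elif text.startswith("#"):
            kinds.append("preprocessor")
            continue  # directives are not scanned for braces
        else:
            kind = "procedure" if proc_depth >= 0 else "global"
        i, quote = 0, ""
        while i < len(text):
            ch, two = text[i], text[i:i + 2]
            if in_comment:
                if two == "*/":
                    in_comment, i = False, i + 1
            elif quote:
                if ch == "\\":
                    i += 1  # skip the escaped character
                elif ch == quote:
                    quote = ""
            elif two == "//":
                break  # rest of the line is a comment
            elif two == "/*":
                in_comment, i = True, i + 1
            elif ch in "\"'":
                quote = ch
            elif ch == "{":
                if depth == 0 and "(" in pending:
                    proc_depth = depth
                    if kind == "global":
                        kind = "procedure"  # header line that opens the body
                depth += 1
                pending = ""
            elif ch == "}":
                depth = max(0, depth - 1)
                if depth == proc_depth:
                    proc_depth = -1  # procedure body closed
                pending = ""
            elif ch == ";":
                pending = ""
            elif depth == 0:
                pending += ch
            i += 1
        kinds.append(kind)
    return kinds
```

Each label could then be looked up in a user-editable table of colors when the bitmap rows are drawn.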

Conclusion

Abstract graphical representations of files and of word searches on those files make it easier to explore a large set of files and the results of a search. The graphical representation gives a broader understanding of the information being presented and serves as a roadmap for navigating the corresponding textual representation.

References

1. Stephen G. Eick, J. L. Steffen and E. E. Sumner. SeeSoft: A Tool for Visualizing Line Oriented Software Statistics. In IEEE Transactions on Software Engineering, volume 18, pages 957-968, November 1992.
2. Stephen G. Eick, Michael C. Nelson and Jeffrey D. Schmidt. Graphical Analysis of Computer Log Files. In Communications of the ACM, volume 37, pages 50-56, December 1994.
