Gary Marchionini and Carol Hert
Usability testing may aim at practical goals such as improving the content, function, and interface of the site or improving the sponsoring entity itself, or testing may be research-driven, aiming at understanding user behavior, interactivity, or the design process. Web sites may be aimed at one highly specific community of practice or the general population at large, may deal with complex, technical information or general interest information, and may be mainly informative or highly interactive and collaborative. Clearly, interface designers and site managers need a variety of methods and techniques available to them for conducting usability studies so that they can apply those that are most beneficial and practical to the case at hand. The purpose of this briefing is to focus on evaluation and usability in web sites maintained by large institutions.
WWW sites present unique challenges that traditional software such as word processing packages or CD-ROM games does not. First, web sites are constantly changing--new documents may be added and old pages removed at any time. Second, the overall performance of a site is never fixed--internally it depends on which files are transferred and on user volume; from the users' point of view it also depends on the momentary state of the Internet between their client program and the server. These two differences suggest that users attach themselves to an organic and highly active environment rather than one with predictable performance. The nature of the web complicates naturalistic usability testing since overall system performance may not be controllable.
Designers and researchers have begun to cope with the new challenges of web site usability testing by applying various usability testing methods to good advantage. Nielsen (CHI 96 workshop; Nielsen & Mack, 1994) and others have illustrated the benefits of discount testing methods, in which a small number of users or UI experts provide feedback and new iterations of the site are produced quickly. Others have used scenario-based testing or cognitive walkthroughs (Polson et al., 1992; Rieman et al., 1995) to systematically walk through a site from multiple user viewpoints. We have used a two-tiered design team to provide feedback for our designs for the Library of Congress National Digital Library Program. In this approach the aggregate team discusses general design specifications, a small team develops prototypes and presents them to the large group for critique and comment, and the process is iterated over time (see http://cs.umd.edu/projects/hcil/Research/1995/ndl.html for sample iterations; also Plaisant et al., CHI '97 design briefing).
Large institutional web sites offer unique challenges to designers in all phases of development, including usability testing. The first challenge is that no single person creates such a site--these sites emerge across different departments and eventually are merged under one or a few home page(s), but no single individual has full authority over a site or understands everything in the site. In large institutions such as the Library of Congress or the Bureau of Labor Statistics, or most large university campuses, there may be a person who has final authority, but there is a collection of committees within and across departments or divisions that provide input, promote guidelines, and generally influence the site. The evolution of such sites is as much a social and political process as it is a systematic design process. Usability test plans must acknowledge this challenge and take the variety of interests into account--the test plan must try to understand the elephant through the impressions of many different visually impaired observers.
The second challenge in large institutional web sites is an inertia effect. Web sites that get tens or hundreds of thousands of hits per day build a constituency that has invested time in learning navigational and general usage routines. Any change will invariably bring comments, requests, and complaints that must be processed in some way, which incurs costs. Thus, an institutional site, once established, cannot be substantially changed too often without causing some user inconvenience and institutional costs in managing the change. Usability tests must therefore be fairly expansive and comprehensive and provide strong evidence for any recommendations for change.
At the Bureau of Labor Statistics, we are taking a multifaceted evaluation approach to make recommendations for improvement of three web sites: the Bureau of Labor Statistics web site (http://www.bls.gov); the Current Population Survey web site (http://www.bls.census.gov/cps/cpsmain.htm, a joint effort between BLS and the Census Bureau); and the One Stop Shopping for Federal Statistics web site (not yet for public use). One goal of this effort (Fred Conrad, Cathy Dippo, Clyde Tucker, and John Bosley at BLS; Gary Marchionini, Anita Komlodi, Stephan Greene at UMCP; Carol Hert, Kim Gregson and Geoffrey McKim at Indiana U) is to improve the usability of the BLS and CPS sites and influence the public release of the OSS site. A second goal is to develop a taxonomy of user needs and information-seeking strategies for government statistical data. Thus, there is both a practical and a research emphasis to the overall evaluation project. A multifaceted approach to evaluation has been successful in the longitudinal evaluation of the Perseus hypermedia corpus (Marchionini & Crane, 1994) and is based on gathering data from a variety of viewpoints and integrating the data into a final assessment. We are using six approaches to data collection: a) discussions with staff, document analysis, and site mapping; b) interviews with help desk and other staff at BLS and Census; c) content analysis of email messages from users; d) focus group discussions with groups of people who either use statistics or help others use statistics (e.g., librarians); e) usability testing (using scenarios) followed by debriefing; and f) server transaction log analyses.
Because the sites are so large, no one person truly understands everything contained in these sites. We conducted team meetings in which staff with different responsibilities for creating and maintaining the sites explained their views of the site. We also examined various internal reports and documents related to the site (e.g., an extensive quality council assessment had been done). We also studied the structure and organization of the sites by creating site maps that provided at least three levels of detail for each site. These tasks provided context for the data gathering effort.
Interviews have been conducted thus far with eight staff from different divisions and with different responsibilities at BLS and Census. These people receive user requests by email and phone, and all have other primary responsibilities beyond answering user questions. Questions may come directly from users or may be forwarded by other staff. Two of these professionals specialize in web site related questions and have high traffic rates (hundreds per month); the others specialize in specific topics (e.g., income and poverty data, CPI, PPI or other index computations) and answer fewer (dozens per month), more specialized questions. To conduct these interviews, a structured protocol was developed that organized question prompts into four categories. Content/Context included questions that explored what the interviewee's job entailed and the types of statistical data they use. Users/Tasks included questions about types of users and typical tasks they bring to the agency and solicited ideas about new user communities that might be served by the site. Strategies included questions about how users seek information and ways the interviewee went about helping users meet their needs. Other included general prompts about impressions of the best and worst things about the web site and suggestions for improving the site. The interviews were conducted in person or on the phone, typically included two project team members plus the interviewee, and took between 40 and 70 minutes. In-person interviews were audiotaped and phone interviews were not. In all interviews, extensive notes were taken and then summarized and sent to the interviewee for verification and comment.
One month's email messages for the BLS and CPS sites were obtained (approximately 500 messages were forwarded from the help desk). The project team discussed a coding scheme that characterized the type of question and information-seeking strategy. Based on a first scheme, 90 messages were coded by two coders as a pilot test (77% intercoder reliability). Based on this pilot, the coding scheme was revised and another sample of messages coded. The final coding scheme uses a content dimension and a strategy/question type dimension. The content dimension facets (what the user is asking about) include: system, data, methods, metadata, tools, publications, costs, and other. The strategy/question type dimension facets (how the question is posed) include: what, how, when, where, who, do you have (existence of), is it an error, why, and other. Based on the revised coding scheme, samples of the messages are being coded and the results summarized for the two dimensions.
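The text does not say which statistic underlies the reported intercoder reliability. As a minimal sketch, assuming it is simple percent agreement (an assumption, not something the briefing states), agreement between two coders on the content dimension could be computed as follows. The function name and the five coded messages are hypothetical and are not project data.

```python
def percent_agreement(codes_a, codes_b):
    """Share of messages to which two coders assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must code the same messages")
    if not codes_a:
        raise ValueError("no coded messages")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return matches / len(codes_a)


# Hypothetical content-dimension codes for five messages (invented for illustration).
coder_1 = ["data", "system", "methods", "data", "costs"]
coder_2 = ["data", "system", "metadata", "data", "costs"]

agreement = percent_agreement(coder_1, coder_2)
print(f"{agreement:.0%}")
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be a stricter alternative.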
Focus group interviews will be conducted with small groups of people who typically would refer users to these statistical sites and with a group of potential end users. During these discussions, the goal will be to get a richer view of how and when people use statistical information, and the strategies they employ to access it. This technique is particularly designed for our evaluation of the One Stop Shopping site which is not yet available to the public, hence people will not be able to discuss how they use the site. Instead we will use the results of the groups to derive design recommendations.
Coupled with this effort is parallel usability testing, in which groups of 6-7 people will use the OSS site to work through several predetermined scenarios and one of their own choice, followed by a debriefing. Project staff will make observations of the participants as they interact individually with the web site. The large-grained analysis implied by this type of usability testing, which will provide more global design recommendations than a finer-grained analysis, seems warranted because the site is still under development and thus very fluid.
The BLS site serves a huge volume of users. There are more than half a million non-cgi-bin requests per month, and more than 80,000 unique hosts use the site each month. The server is set to capture minimal information: IP address/name, date/time, request (e.g., GET /ocohome.htm), protocol (e.g., HTTP/1.0), status code, and bytes transferred, yet one month's server logs contain more than 200Mb of data. The approach we are taking to transaction log analysis has three main components. First, user sessions are determined by parsing the logs for unique IP addresses/names and removing records containing GET .gifs and those with GET cgi-bin/imagemap. The .gif records are not informative, and a decision was made not to try to analyze image map clicks because the image map coordinates are not consistent across different clients. The parsing is being done with C programs that write unique session files as output. The second component is to determine a coding scheme for the site. Examination of one month's logs shows that there are more than 2500 unique URLs requested (many of which yield a 404 error). There are 253 unique pages (not including pages generated on the fly by cgi scripts) for the top three levels of the BLS site alone. Because we are interested in patterns of use, not only the typical summary statistics, we had to make decisions about which pages were most essential to include in the sequential analyses. There are very practical constraints in this regard; for example, manipulating an order 1 transition matrix for 250 pages yields a 62,500-cell matrix. We have chosen 60 key pages for the purposes of analysis. These 60 pages are mapped to single-character codes and the individual session files are then coded. We aim to make the coding table-driven so that different coding schemes can easily be substituted for the same session files. These coded files are then merged into a file suitable for use with the Sequence data analysis program.
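The first two components can be illustrated with a short sketch. It is not the project's C code. It assumes the server writes Common Log Format lines, treats all requests from one host as one session, and skips pages that are absent from the coding table; all three are assumptions made for illustration. The coding table, the page /datahome.htm, the host names, and the sample log lines are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Common Log Format: host ident user [time] "METHOD path PROTO" status bytes
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: (?P<proto>[^"]+))?" '
    r'(?P<status>\d{3}) (?P<size>\S+)$'
)

# Hypothetical coding table: key page -> single-character code.
PAGE_CODES = {
    "/": "H",
    "/ocohome.htm": "O",
    "/datahome.htm": "D",
}


def is_noise(path):
    """True for requests left out of sessions: .gif images and image-map clicks."""
    lowered = path.lower()
    return lowered.split("?", 1)[0].endswith(".gif") or "cgi-bin/imagemap" in lowered


def coded_sessions(lines, page_codes=PAGE_CODES):
    """Group requests by host and return {host: string of page codes}.

    Assumptions: one host equals one session, and pages missing from the
    coding table are skipped rather than coded.
    """
    sessions = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # ignore malformed lines
        path = match.group("path")
        if is_noise(path):
            continue
        code = page_codes.get(path)
        if code is not None:
            sessions[match.group("host")].append(code)
    return {host: "".join(codes) for host, codes in sessions.items()}


# Invented sample lines for illustration only.
SAMPLE_LOG = [
    'lab1.example.edu - - [01/Feb/1997:10:00:00 -0500] "GET / HTTP/1.0" 200 2048',
    'lab1.example.edu - - [01/Feb/1997:10:00:01 -0500] "GET /logo.gif HTTP/1.0" 200 512',
    'lab1.example.edu - - [01/Feb/1997:10:00:09 -0500] "GET /ocohome.htm HTTP/1.0" 200 4096',
    'host2.example.com - - [01/Feb/1997:10:01:00 -0500] "GET /cgi-bin/imagemap/top?10,20 HTTP/1.0" 302 -',
    'host2.example.com - - [01/Feb/1997:10:01:05 -0500] "GET /datahome.htm HTTP/1.0" 200 1024',
]

print(coded_sessions(SAMPLE_LOG))
```

Because the coding table is passed in as an argument, a different scheme can be applied to the same log lines by calling `coded_sessions(lines, other_table)`, which mirrors the table-driven goal described above.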
In this third component, we plan to investigate the most popular paths, critical pages (e.g., starting/stopping), and specific subpaths (e.g., help sequences, specific paths to pages such as the Occupational Outlook Handbook). We recognize the inherent limitations of this approach to transaction logging (it would be better to instrument everyone's clients!). Specifically, the logs do not include moves users make with cached pages, and the session parser cannot distinguish two or more concurrent sessions from the same domain name--a real concern for campus laboratories.
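The kinds of questions listed for the third component can be sketched on coded session strings. The example below is illustrative only and does not reproduce the Sequence program; the function names and the four sessions are hypothetical, with H standing for a home page, D for a data page, P for a help page, and O for the Occupational Outlook Handbook.

```python
from collections import Counter


def transition_counts(sessions):
    """First-order counts: how often page code b directly follows page code a."""
    counts = Counter()
    for session in sessions:
        counts.update(zip(session, session[1:]))
    return counts


def subpath_counts(sessions, length):
    """Count every contiguous subpath of the given length across all sessions."""
    counts = Counter()
    for session in sessions:
        for i in range(len(session) - length + 1):
            counts[session[i:i + length]] += 1
    return counts


def entry_and_exit_pages(sessions):
    """Count the first and the last page code of each non-empty session."""
    starts = Counter(s[0] for s in sessions if s)
    stops = Counter(s[-1] for s in sessions if s)
    return starts, stops


# Hypothetical coded sessions (invented for illustration).
sessions = ["HDO", "HO", "HPHO", "DO"]

transitions = transition_counts(sessions)
pairs = subpath_counts(sessions, 2)
starts, stops = entry_and_exit_pages(sessions)

print("H -> O transitions:", transitions[("H", "O")])
print("Sessions starting at H:", starts["H"])
print("Sessions ending at O:", stops["O"])
print("Occurrences of subpath DO:", pairs["DO"])
```

Storing only the transitions that actually occur keeps the structure small; a dense matrix over 60 coded pages would have 3,600 cells, compared with 62,500 for 250 pages.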
Based on these data we will make recommendations about redesigning the web sites and develop a user needs and strategies taxonomy that may be useful in other large institutional sites. There is already a set of recommendations that came out of the interviews, the email analysis, and the development of the coding scheme for transaction log analysis. We believe this multifaceted approach will influence the redesign of the web sites because the data are substantial and triangulated across different methods, and key staff from many of the different agencies, divisions, and departments provided input. This is particularly important in large institutions where social and political concerns are as important as logic and inspired design.
References
Marchionini, G., & Crane, G. (1994). Evaluating hypermedia and learning: Methods and results from the Perseus Project. ACM Transactions on Information Systems, 12(1), 5-34.
Nielsen, J. & Mack, R. L. (Eds.). (1994). Usability inspection methods. New York: John Wiley & Sons.
Plaisant, C., Marchionini, G., Bruns, T., Komlodi, A., & Campbell, L. (in press). Bringing treasures to the surface: Iterative design for the Library of Congress National Digital Library Program. CHI '97 design briefing.
Polson, P., Lewis, C., Rieman, J., & Wharton, C. (1992). Cognitive walkthroughs: A method for theory-based evaluation of user interfaces. International Journal of Man-Machine Studies, 36, 741-773.
Gary Marchionini is a professor in the College of Library and Information Services at the University of Maryland where he teaches courses in computer applications, human-computer communication, and research methods. He also has an appointment in the University of Maryland Center for Automation Research's Human-Computer Interaction Laboratory and is Director of the Digital Library Research Group. His Ph.D. is from Wayne State University in mathematics education with an emphasis on educational computing. His research interests are in information seeking in electronic environments and human-computer interaction. He has had grants and contracts from the National Science Foundation, Council on Library Resources, the National Library of Medicine, the Library of Congress, U.S. Department of Education, and NASA, among others. He has published over fifty articles, chapters and reports in a variety of books and journals. He serves on the editorial board of Journal of the American Society for Information Science, Information Processing & Management, Library Quarterly, Library and Information Science Research, Journal of Network and Computer Applications, and the Journal of Educational Multimedia and Hypermedia; he is co-editor of the new Journal of Digital Information. He is author of a book titled Information Seeking in Electronic Environments published by Cambridge University Press. He is the Director of Evaluation for the Perseus Project (a large-scale hypermedia corpus), principal investigator for a project to develop an interface for the Library of Congress National Digital Library, and co-principal investigator for a U.S. Department of Education Challenge Grant to develop a digital video learning community in Baltimore. He was the Conference Chairman for the 1996 ACM International Conference on Research and Development in Digital Libraries.
Carol A. Hert is an Assistant Professor at Indiana University's School of Library and Information Science. Her research interests are in the areas of user behavior on information systems (particularly information retrieval systems including the Web) and on evaluation of systems. She teaches in the areas of information technology standardization, information technology management, information seeking behavior, and research methods.