Quentin Stafford-Fraser *
Peter Robinson *

Rank Xerox Research Centre (EuroPARC), Ravenscroft House, 61 Regent Street, Cambridge CB2 1AB, United Kingdom. Tel: +44 1223 341521. E-mail: fraser@europarc.xerox.com

* University of Cambridge Computer Laboratory, New Museums Site, Pembroke Street, Cambridge CB2 3QG, United Kingdom. Tel: +44 1223 334637. E-mail: pr@cl.cam.ac.uk
Abstract
The goal of `Computer Augmented Environments' is to bring computational power to everyday objects with which users are already familiar, so that the user interface to this computational power becomes almost invisible. Video is a very important tool in creating Augmented Environments and recent camera-manufacturing techniques make it an economically viable proposition in the general marketplace. BrightBoard is an example system which uses a video camera and audio feedback to enhance the facilities of an ordinary whiteboard, allowing a user to control a computer through simple marks made on the board. We describe its operation in some detail, and discuss how it tackles some of the problems common to these `Video-Augmented Environments'.
We might hope that the computer of the future would be more like an assistant than like a typewriter, that it would perform mundane tasks on your behalf when it sees they are necessary rather than always operating directly under your control. The more a secretary knows about your way of life, your preferred modes of work, how cluttered your desk is and when you are in a meeting, the more useful he or she can be to you. The same applies to a computer, but the average PC has no knowledge of the world outside its box and has to be spoon-fed with information through its limited range of input devices.
This is part of the philosophy behind Computer-Augmented Environments; the desire to bring the computer `out of its box' and give it more awareness of the world around, so that it augments and enhances daily life rather than attempting to replace it. Can we, for example, make the computer understand what we do in our offices rather than putting an impoverished `office metaphor' on the machine's screen? Can we centre computational power around everyday objects with which users are already so familiar that they don't think of them as a human-computer interface? During work on the DigitalDesk [10, 16], for example, we experimented with ways of enabling the computer to recognise an ordinary pencil eraser, and using that as the means of deleting parts of an electronic image projected onto the desk. The motivation was simple: people already know how to use erasers, and will continue to use them for hand-drawn pictures. By augmenting the eraser's capabilities we simply expand its scope of use. We neither detract from its abilities to erase pencil, nor require the user to learn a new tool.
The solution is to give the computer a small number of senses which have a broad scope of application and which operate remotely, i.e. without direct contact with the objects being sensed. The obvious candidates are vision and hearing, through the use of video cameras and microphones. Similarly, if the computer is to communicate with humans, then the analogous output systems are audio feedback and perhaps some projection facilities. Not only do cameras and microphones have a broad range of application, they can often be applied to more than one task at the same time. A single camera might monitor a whole room, and detect when the room is occupied, when a meeting is in progress, and when an overhead projector is being used, as well as recording the meeting.
This paper describes the use of video in the creation of a Computer-Augmented Environment called BrightBoard, which uses a video camera and audio feedback to augment the facilities of an ordinary whiteboard. Until fairly recently the deployment of such systems in the average workplace would be limited by the cost both of cameras and of the computing power required to process video signals. However, manufacturing developments are making video cameras an economically viable alternative to more conventional sensors, and the typical office workstation is now fast enough for the simple image processing required for many of these `Video-Augmented Environments' (VAEs).
BrightBoard aims to capitalise on the whiteboard as a natural means of expression by making use of it as a user interface. It is not the first system to explore the whiteboard's potential. Projects such as Tivoli [11] have pursued the same goal, and have created note-taking, shared drawing and even remote conferencing systems by emulating and enhancing a whiteboard using a computer. There have been many variations on the whiteboard theme as well. VideoWhiteboard [15] used a translucent drawing screen on which the silhouette of the other party could be seen. Clearboard [8] was a similar system which used the metaphor of a glass whiteboard where the parties in a two-way video conference were on opposite sides of the glass, allowing both face-to-face discussion and shared use of a drawing space.
Typically, these systems use a large-screen display and electronic pens whose position in the plane of the screen can be sensed and which may have pressure-sensitive tips to control the flow of `ink', thus giving a more realistic feel to the whiteboard metaphor. The Xerox Liveboard [4] is an example of a complete, self-contained, commercially available unit built on this basis; it has a workstation, Tivoli software, display and pens in one ready-to-install package. The software allows several users (potentially at different locations) to draw on a shared multi-page whiteboard where each page may be larger than the screen and scrollable. The software is controlled by a combination of gestures, buttons and pop-up menus. A more recent commercial offering is SoftBoard. This is a special whiteboard which uses pens and erasers similar to conventional ones, but fitted with reflective sleeves that allow their position to be detected by an infra-red laser scanning device fitted to the board. The movements of the pens and erasers are relayed by the board to a computer, which can then construct an electronic version of the image on the board.
Such systems, while useful, have their failings. Their price is typically quoted in thousands of dollars, they are often delivered by a forklift truck, and are generally installed in board-rooms whose primary users have neither the time nor the inclination to master the software. They also fail to achieve the ease of use of a conventional whiteboard, even for those experienced with them. To quote one user, "The computer never quite gets out of the way".
Examples of whiteboard use observed by the author, which tend not to translate well into the electronic domain, include the following:
Figure: A `Print' command
Figure: Selecting an area of the board
Figure: A `Fax to Peter' command
Eric is leading a meeting in a room equipped with a BrightBoard camera. He arrives early to discover that the room is a little chilly, so he writes the temperature he requires on the whiteboard and the air-conditioning reacts appropriately. When the participants arrive, he makes a mark on the board to start the video-recording of the meeting, and then uses the board as normal. During the meeting the participants request a copy of a diagram on the board, so Eric marks that area of the board and prints off six copies. As the meeting draws to a close, Eric scribbles a quick note requesting coffee for six, marks out the area as before, and mails it to his secretary. Finally, he switches off the video-recorder and the air-conditioning by erasing his original marks.
All this control has been achieved without Eric leaving the focal point of the meeting - the whiteboard - and without direct interaction with any machines or controls. It is easy to configure:
BrightBoard uses a video camera pointed at a standard whiteboard,
thus eliminating the need for expensive installations and electronic pens. It
uses audio or other feedback, and so has no need of a large display. The
program, once configured, puts minimal load on the workstation, and requires no
manual intervention. The computer can therefore be shut away in a cupboard, or
put in a separate room - particularly useful for meeting rooms where the hum of
fans and the flicker of screens can be a distraction.
We shall look at each of these stages in turn.
The triggering module can wait either for movement, or for stability. By concatenating these two modes we say, in effect: "Ignore an unchanging whiteboard. Wait until you see movement in front of it, and when this movement has finished, then proceed with your analysis". A full-resolution image can then be captured, to be used in the following stages.
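The chaining of the two trigger modes can be illustrated with a minimal sketch. It assumes small greyscale frames held as lists of rows and detects movement by simple frame differencing; the function names and the thresholds `pixel_delta`, `move_fraction` and `still_frames` are hypothetical and are not taken from BrightBoard itself.

```python
from typing import Iterable, List, Optional

Frame = List[List[int]]  # greyscale rows, values 0-255


def changed_fraction(a: Frame, b: Frame, pixel_delta: int = 20) -> float:
    """Fraction of pixels whose brightness differs by more than pixel_delta."""
    total = 0
    changed = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def wait_for_trigger(frames: Iterable[Frame],
                     move_fraction: float = 0.02,
                     still_frames: int = 3) -> Optional[int]:
    """Return the index of the frame at which analysis should start.

    Phase 1 ignores an unchanging board and waits for movement.
    Phase 2 waits until still_frames consecutive frame pairs show no movement.
    Returns None if the stream ends before both phases complete.
    """
    previous = None
    seen_movement = False
    quiet = 0
    for index, frame in enumerate(frames):
        if previous is not None:
            moving = changed_fraction(previous, frame) > move_fraction
            if moving:
                seen_movement = True
                quiet = 0
            elif seen_movement:
                quiet += 1
                if quiet >= still_frames:
                    return index
        previous = frame
    return None
```

A board that never changes never triggers, which matches the "ignore an unchanging whiteboard" behaviour described above. The returned index marks the point at which a full-resolution image would be captured.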
Unfortunately, these methods are rather too slow for an interactive system, so we use an adaptive thresholding algorithm developed by Wellner for the DigitalDesk [16]. This involves scanning the rows of pixels one at a time, alternating the direction of travel, and maintaining a running average of the pixel values. Any pixel significantly darker than the running average at that point is treated as black, while the others are taken to be white. This simple algorithm works remarkably well in cases where the image is known to have small areas of dark on a light background, such as we find in the typical printed page and on a whiteboard, and it only involves a single examination of each pixel.
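A sketch of a running-average threshold of this kind is given below. The way the average is maintained (an exponentially decaying sum) and the parameters `window` and `percent` are assumptions made for illustration; the text only states that rows are scanned in alternating directions and that pixels significantly darker than the running average are treated as black.

```python
from typing import List

Grey = List[List[int]]  # greyscale rows, values 0-255


def adaptive_threshold(image: Grey, window: int = 8, percent: int = 15) -> Grey:
    """Return a binary image: 1 for black, 0 for white.

    Each pixel is visited exactly once. Rows are scanned alternately
    left-to-right and right-to-left so that the running average carries
    over smoothly from the end of one row to the start of the next.
    """
    height = len(image)
    if height == 0:
        return []
    width = len(image[0])
    out = [[0] * width for _ in range(height)]
    # Approximate sum of the last `window` pixels, seeded from the first pixel.
    running = float(image[0][0]) * window
    for y in range(height):
        columns = range(width) if y % 2 == 0 else range(width - 1, -1, -1)
        for x in columns:
            pixel = image[y][x]
            running = running - running / window + pixel
            average = running / window
            # "Significantly darker" is taken here to mean `percent` below average.
            if pixel < average * (100 - percent) / 100:
                out[y][x] = 1
    return out
```

Because the comparison is always against the local average, a slowly varying illumination gradient across the board shifts the threshold with it, which a single global threshold cannot do.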
Once a black pixel is found, a flood-fill algorithm is used to find all the black pixels directly connected to that one. As the fill proceeds, statistics about the pixels in the blob are gathered which will be used later in the recognition process; for example, the bounding box of the blob, the distribution of the pixels in each axis, the number of pixels which have white above them and the number with white to the right of them. For the sake of speed, an `upper limit' on the number of pixels is passed to the filling routine. Beyond this limit the blob is unlikely to be anything we wish to recognise, so the flood fill continues, marking the pixels as `seen', but the statistics are not gathered. When the fill completes, if the number of pixels is less than this upper limit but more than a specified lower limit, the blob's statistics are added to a list of items for further analysis.
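The blob-finding step might be sketched as follows. This is an illustration only: it assumes 4-connectivity, treats the image border as white, and gathers just a subset of the statistics mentioned above (bounding box, pixel count, and the counts of pixels with white above and to the right). The function names and dictionary keys are hypothetical.

```python
from typing import Dict, List

Binary = List[List[int]]  # 1 = black, 0 = white


def fill_blob(binary: Binary, seen: List[List[bool]],
              start_x: int, start_y: int, upper_limit: int) -> Dict[str, int]:
    """Flood-fill one blob, gathering statistics until upper_limit is passed."""
    height, width = len(binary), len(binary[0])
    stats = {"count": 0, "west": start_x, "east": start_x,
             "north": start_y, "south": start_y,
             "white_above": 0, "white_right": 0}
    stack = [(start_x, start_y)]
    seen[start_y][start_x] = True
    while stack:
        x, y = stack.pop()
        stats["count"] += 1
        # Past the limit we keep marking pixels as seen but stop gathering.
        if stats["count"] <= upper_limit:
            stats["west"] = min(stats["west"], x)
            stats["east"] = max(stats["east"], x)
            stats["north"] = min(stats["north"], y)
            stats["south"] = max(stats["south"], y)
            if y == 0 or binary[y - 1][x] == 0:
                stats["white_above"] += 1
            if x == width - 1 or binary[y][x + 1] == 0:
                stats["white_right"] += 1
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and binary[ny][nx] == 1 and not seen[ny][nx]):
                seen[ny][nx] = True
                stack.append((nx, ny))
    return stats


def find_blobs(binary: Binary, lower_limit: int,
               upper_limit: int) -> List[Dict[str, int]]:
    """Return statistics for blobs with lower_limit < pixel count < upper_limit."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] == 1 and not seen[y][x]:
                stats = fill_blob(binary, seen, x, y, upper_limit)
                if lower_limit < stats["count"] < upper_limit:
                    blobs.append(stats)
    return blobs
```

An explicit stack is used rather than recursion so that large blobs, such as a long line drawn across the board, cannot exhaust the call stack.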
Figure 4: A sample image captured...
Figure 5: ...and processed by BrightBoard
If the distance between a candidate blob and a prototype, measured in standard deviations, is greater than $D\sqrt{n}$, where n is the number of dimensions, it means that the candidate blob is on average more than D standard deviations from the prototype in each dimension. By adjusting the value of D the selectiveness of our recogniser can be controlled.
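A rejection test of this kind could look like the sketch below. It assumes that the distance is the root-sum-square of per-dimension deviations, each expressed in units of that dimension's standard deviation, so that a distance of D times the square root of n corresponds to D standard deviations per dimension in the root-mean-square sense. The feature values, function names and the treatment of features as independent are all assumptions for illustration.

```python
import math
from typing import Sequence


def normalised_distance(candidate: Sequence[float],
                        means: Sequence[float],
                        std_devs: Sequence[float]) -> float:
    """Distance from a prototype, with each dimension scaled by its std dev."""
    return math.sqrt(sum(((c - m) / s) ** 2
                         for c, m, s in zip(candidate, means, std_devs)))


def accept(candidate: Sequence[float], means: Sequence[float],
           std_devs: Sequence[float], d: float) -> bool:
    """Accept the candidate unless it lies beyond d * sqrt(n) of the prototype."""
    n = len(candidate)
    return normalised_distance(candidate, means, std_devs) <= d * math.sqrt(n)
```

Lowering `d` makes the recogniser stricter and raising it makes it more generous, which is the trade-off controlled by D in the text.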
![[Figure 6]](qsf_fg13.gif)
The limitations of a very crude recogniser can be overcome to a substantial degree by the choice of command patterns. In practice we find that the current recogniser is capable of distinguishing well between the symbols in its known alphabet, but it has a tendency to be over-generous, and recognise as valid symbols some things which are not. The chances of these `false' symbols occurring in such relationships as to constitute a valid command are, however, very small.
Fortunately, there are languages available which specialise in the description and analysis of relationships: those designed for Logic Programming, of which Prolog is the most common. If the information from the recogniser is passed to a Prolog engine as a set of facts, we can then write Prolog rules to analyse the contents of the board. For each blob found, BrightBoard assigns a unique number x and adds to a Prolog database an assertion of the form:

    bounds( itemx, w, e, n, s )

which specifies that blob x has a bounding box delimited by the north-west corner (w, n) and the south-east corner (e, s). Simple rules can then be written to determine, for example, whether blob A is inside blob B, or to the right of it, or of a similar size, from these entries. In addition, if the blob has been recognised, a second assertion will be made, of the form:

    issym( itemx, y )

which indicates that item x has been recognised as being symbol y. A `Print' command might then be defined as follows:

    doprint :-
        issym(X, p),
        issym(Y, checkbox),
        inside(X, Y),
        \+ (inside(Z, Y), Z \= X).

This can be roughly translated as "there is a print command if we can find blobs X and Y such that X is a `P' and Y is a `checkbox' and X is inside Y, and nothing else is inside Y".
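For readers unfamiliar with Prolog, the same check can be written procedurally. The Python sketch below is only an illustration of what the rule tests; it assumes image coordinates in which n is smaller than s, and it defines `inside` as strict containment of bounding boxes, which is one plausible reading of a relation the text does not spell out.

```python
from typing import Dict, Tuple

Box = Tuple[int, int, int, int]  # (w, e, n, s), as in bounds(item, w, e, n, s)


def inside(a: str, b: str, bounds: Dict[str, Box]) -> bool:
    """True if the bounding box of item a lies strictly within that of item b."""
    aw, ae, an, a_s = bounds[a]
    bw, be, bn, b_s = bounds[b]
    return bw < aw and ae < be and bn < an and a_s < b_s


def doprint(bounds: Dict[str, Box], symbols: Dict[str, str]) -> bool:
    """True if some 'p' sits alone inside some 'checkbox'."""
    for x, x_symbol in symbols.items():
        if x_symbol != "p":
            continue
        for y, y_symbol in symbols.items():
            if y_symbol != "checkbox" or not inside(x, y, bounds):
                continue
            # Mirrors the negated goal: no Z other than X may be inside Y.
            others = [z for z in bounds if z != x and inside(z, y, bounds)]
            if not others:
                return True
    return False
```

Note that unrecognised blobs still have `bounds` entries, so a stray mark inside the checkbox blocks the command, just as it does in the Prolog rule.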
Both current and previous states can be passed to Prolog, so that the rules can also specify that printing should only occur if either X or Y or both were not present in the previous analysis. This prevents a command from being accidentally repeated.
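The idea of firing only on newly drawn commands can be sketched as below. How blobs from successive analyses are matched is not described here, so this example assumes they are matched by symbol and by approximate bounding-box position; the `tolerance` value and all names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (w, e, n, s)
Item = Tuple[Box, str]           # bounding box and recognised symbol


def same_place(a: Box, b: Box, tolerance: int = 3) -> bool:
    """True if two boxes agree to within `tolerance` pixels on every edge."""
    return all(abs(p - q) <= tolerance for p, q in zip(a, b))


def was_present(item: Item, previous: List[Item], tolerance: int = 3) -> bool:
    """True if an item with the same symbol and position was seen last time."""
    box, symbol = item
    return any(prev_symbol == symbol and same_place(box, prev_box, tolerance)
               for prev_box, prev_symbol in previous)


def should_fire(x: Item, y: Item, previous: List[Item]) -> bool:
    """Fire unless both parts of the command were already on the board."""
    return not (was_present(x, previous) and was_present(y, previous))
```

With this check a command left on the board is acted on once, when it first appears, and is ignored in later analyses until one of its parts is erased and redrawn.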
On a SPARCstation 2, BrightBoard took 4.5 seconds to capture, threshold, display, analyse and recognise the `Fax to Bob' command in the 740 x 570 image shown in Figure 4, from the time at which movement was no longer detected. There is much room for optimisation; speed was not a primary issue during BrightBoard's initial development.
    <predicate> <filetype> <command>

where <predicate> is, for example, `doprint', <command> is any valid UNIX shell command, and <filetype> is either `none' or the name of an image format. If it is not `none' then a temporary file of the specified format is created and its filename can be passed to the UNIX command by using `%s' in the command file. A print command might then be:

    doprint pgm pgmtops %s | lpr

though more complicated actions would generally be implemented by a specially written script or another program. The commands currently employed also provide audio feedback to the user through the use of pre-recorded or synthesised speech. The user is informed not only when a print command has been seen, but also when the printing has been completed.
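A command-file entry of this shape can be parsed and expanded in a few lines. The sketch below is an illustration rather than BrightBoard's own parser; the function names are hypothetical, and quoting the filename before substitution is an added precaution not mentioned in the text.

```python
import shlex
from typing import Optional, Tuple

Entry = Tuple[str, str, str]  # (predicate, filetype, command)


def parse_entry(line: str) -> Entry:
    """Split a line into predicate, filetype and the rest as the shell command."""
    predicate, filetype, command = line.split(None, 2)
    return predicate, filetype, command.strip()


def build_command(entry: Entry, image_path: Optional[str] = None) -> str:
    """Substitute the temporary image filename for %s when a filetype is given."""
    _predicate, filetype, command = entry
    if filetype != "none" and image_path is not None:
        command = command.replace("%s", shlex.quote(image_path))
    return command
```

Splitting on only the first two runs of whitespace keeps the shell command intact, including its pipes and any further spaces.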
One predicate is given special treatment in the current version of BrightBoard. It is named `inc_area' and checks for the presence of symbols which mark the bounds of the area to be included in the image file. This allows small areas of the board to be printed, saved, sent as messages etc.
The next version of BrightBoard (currently under development) uses an extended protocol for the interaction between Prolog and the UNIX commands. The number of parameters of the predicate may be specified in the command file, and, if greater than zero, the values of the variables returned by the evaluation are passed to the UNIX command as environment variables. The UNIX command is executed once for each match found in a given image, with a special variable set on the last match. This allows the function of `inc_area' and similar predicates to be implemented by external programs, giving rise to much greater flexibility. As an example, a print command can consist of a P followed by a digit, where the digit represents the number of copies to be printed. The doprint predicate can then have a parameter in which it returns the digit found, and this information is passed to the executed command.
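One way to picture this protocol is as a list of environments, one per match. In the sketch below the variable names `BB_ARG1`, `BB_ARG2`, ... and `BB_LAST` are invented for illustration, since the actual names used by BrightBoard are not given here.

```python
import os
from typing import Dict, List, Optional, Sequence


def environments_for_matches(matches: Sequence[Sequence[object]],
                             base_env: Optional[Dict[str, str]] = None
                             ) -> List[Dict[str, str]]:
    """Build one environment per match, flagging the final one.

    Each match is the tuple of values bound to the predicate's parameters.
    BB_ARGn and BB_LAST are hypothetical names, not BrightBoard's own.
    """
    base = dict(os.environ if base_env is None else base_env)
    environments = []
    for index, values in enumerate(matches):
        env = dict(base)
        for position, value in enumerate(values, start=1):
            env[f"BB_ARG{position}"] = str(value)
        if index == len(matches) - 1:
            env["BB_LAST"] = "1"
        environments.append(env)
    return environments
```

Each environment could then be handed to the configured command, for example with `subprocess.run(command, shell=True, env=env)`, so that a script printing copies can read the digit from its environment and perform any clean-up when it sees the last-match flag.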
Until now, all use of BrightBoard has been by the author and two colleagues for a substantial number of demonstrations, but under fairly controlled lab conditions. The system has almost reached the stage where user testing is possible in a real office environment, and this is the obvious next step in its development.
The first is that there is minimal configuration required to set it up. All that is needed is a camera with a fairly unobstructed and `straight-on' view of the board, zoomed to a reasonable scale. It would be straightforward, therefore, to make a portable version of BrightBoard. A briefcase containing a laptop computer and a camera with a small tripod could be carried into any meeting room, the camera pointed at a board and the program started, and the whiteboard in the meeting room is immediately endowed with faxing, printing, and recording capabilities.
Secondly, the system is not limited to whiteboards - any white surface will suffice. Thus noticeboards, refrigerator doors, flipcharts, notebooks and papers on a desk can all be used as a simple user interface. The current version of BrightBoard has been switched from monitoring a whiteboard to providing a `desktop photocopying' device without even stopping and restarting the program. A camera clipped to a bookshelf above the desk and plugged into a PC (which will often be on the desk anyway) enables any document on the desk to be copied, saved or faxed without the user moving from the desk or touching a machine. If the user does not wish to write on the documents themselves, then Post-it notes or cardboard cut-outs with the appropriate symbols drawn on them can be used. Parts of the paper documents can be selected using the area delimiting symbols, and pasted into electronic documents.
Resolution is a slight problem here, as a typical frame-grabber capturing half a page will only provide about 100 dots per inch, the same resolution as a poor-quality fax. It does, however, capture a grey-scale image, and the anti-aliasing effects make the resolution appear much higher than would be the case with a purely black-and-white image. In situations where a binary image is definitely needed, the greyscale information can be used to enhance the resolution artificially. We have found the appearance of thresholded images to be greatly improved by inserting extra pixels between the original grey pixels and giving them values based on linear interpolation between the neighbouring originals. A double-resolution image is formed which is then passed to the thresholding module. Since this increases the time required for thresholding by a factor of four, the process is only used in the output of images, and not in BrightBoard's internal processing.
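The interpolation step can be sketched as follows. This minimal example assumes that each inserted pixel is the mean of its two neighbours along a row or column, which gives an output of (2w - 1) by (2h - 1) pixels; the exact scheme used in BrightBoard may differ.

```python
from typing import List

Grey = List[List[float]]  # greyscale rows


def double_resolution(image: Grey) -> Grey:
    """Insert a linearly interpolated pixel between each pair of neighbours."""
    # First pass: widen each row by inserting horizontal midpoints.
    wide = []
    for row in image:
        new_row = []
        for x, value in enumerate(row):
            new_row.append(value)
            if x + 1 < len(row):
                new_row.append((value + row[x + 1]) / 2)
        wide.append(new_row)
    # Second pass: insert a row of vertical midpoints between each pair of rows.
    tall = []
    for y, row in enumerate(wide):
        tall.append(row)
        if y + 1 < len(wide):
            tall.append([(a + b) / 2 for a, b in zip(row, wide[y + 1])])
    return tall
```

The result holds roughly four times as many pixels as the input, which is consistent with the fourfold increase in thresholding time noted above.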
A richer interaction model would be possible by monitoring the position and gestures of a user's hands. The system, before acting on a command, for example, could request confirmation from the user which might be given with a `thumbs-up' gesture. Conventional hand-tracking systems have generally required the user to wear special gloves and position-sensing devices [1], but systems have been built using video alone. Jakub Segen describes a system which allows the user to control a mouse-style pointer on the screen, or to `fly' through a Virtual Reality-like environment, using hand movements and gestures watched by a camera [12]. Myron Krueger's VideoPlace combines projected graphics with the silhouette of the user to create an artistic medium which responds to full-body gestures [9]. Such a system would be easier to include in a portable BrightBoard, as described above, than one involving extra hardware.
An interesting challenge would be the creation of a friendlier user interface to the Prolog rules. One of the aims of BrightBoard is that it should be accepted in a normal office environment, but the inhabitants of such an environment will not generally be Prolog programmers. A programming language allows us great flexibility, however, which can be difficult to duplicate in other ways. Consider the following specification:
`A P in a box, possibly followed by another symbol representing a digit which is also inside the box, constitutes a print command, where the number of copies is given by the digit, or is one if no digit exists. There must be no other symbol inside the box.'
It is difficult to imagine an easy way of representing this graphically. Indeed, even the concept represented by the words `followed by' must be explicitly defined. A textual front-end to the Prolog could possibly be created which would more closely resemble natural language, or a programming language with which users were more likely to be familiar. This is only an issue if the users are expected to customise the set of commands provided by BrightBoard.
BrightBoard is an example of this genre. It illustrates some of the problems that will be issues for many video-augmented environments, and shows what can be accomplished by combining relatively unsophisticated image processing and pattern recognition techniques with logic-based analysis of their results.
All product names are acknowledged as the trademarks of their respective owners.