



Colin Ware
Faculty of Computer Science
University of New Brunswick
P.O. Box 4400, Fredericton
New Brunswick, Canada E3B 5A3
Email: cware@UNB.ca
Based on a review of the facts about human stereo vision, a case is made
that the stereo processing mechanism is highly flexible. Stereopsis seems to
provide only local additional depth information, rather than defining the
overall 3D geometry of a perceived scene. New phenomenological and
experimental evidence is presented to support this view. The first
demonstration shows that kinetic depth information dominates stereopsis in a
depth cue conflict. Experiment 1 shows that dynamic changes in effective eye
separation are not noticed if they occur over a period of a few seconds.
Experiment 2 shows that subjects who are given control over their effective eye
separation can comfortably work with larger than normal eye separations when
viewing a low relief scene. Finally, an algorithm is presented for the
generation of dynamic stereo images designed to reduce the normal eye strain
that occurs due to the mis-coupling of focus and vergence cues.
The stereoscopic depth cue consists of relative differences or
disparities between parts of the images available to the two eyes. In
normal circumstances this information is effective only for objects less than
25 meters away, and it is optimal for objects that are much closer.
In computer graphics, some objects have no inherent spatial size, others may be
representations of mountains or microscopic entities. One way of obtaining an
appropriate stereo view is to scale the scene and bring it to an appropriate
viewing distance. Another is to change the effective eye separations
dynamically. An interesting research question is whether it is possible to
create a system in which the stereo disparity is changed dynamically so as to
create near optimal disparities for perceiving depth information no matter
whether the graphical object is at a great distance or close. The question
addressed here is the extent to which changing disparities in real-time is
perceptually disturbing. Two new experiments and two demonstrations are
reported that show that human perception of depth through stereopsis is highly
flexible: large distortions of the correct perspective geometry are possible,
and these distortions may be changed dynamically without undue perceptual ill
effects.
An algorithm is presented which is designed to take advantage of this
perceptual flexibility to allow the real-time adjustment of stereo disparity
values as the user moves through the image space. The goal is to create a
system in which the disparity values are always comfortable.
We begin with some terminology and basic facts relating to stereo vision. Figure 1
illustrates the simplest possible stereo display. The eyes are fixated on the
vertical line a. A second line b is closer to a in the
right eye's image than in the left eye's image. The brain resolves this
discrepancy by perceiving the lines as being at different depths as shown.
Retinal disparity is the difference between the angular separation of a
and b in the two eyes. Vergence is
the degree to which the two eyes converge to fixate a target (this is also
called phoria).
If the disparity between the two images becomes too great then diplopia
occurs. This is the appearance of the doubling of part of the image. Another
way of putting this is that the images are no longer fused.
The region within which two images can be fused is called Panum's fusion
area. However, the size of Panum's fusion area
is highly dependent on a number of visual display parameters such as the
exposure duration to the images and the size of the targets. It is also true
that depth judgments can be made despite diplopia, in other words, outside of
the fusion area, although these are less accurate. For an excellent
introductory review of stereo vision from a human factors perspective see [7].
In stereo photogrammetry and in certain kinds of range finders it is common to
create stereo images which have an effective eye separation much larger than
any actual eye separation [5]. The reason for this is obvious; human eyes are
only placed approximately 6.3 cm apart, which means that stereo information is
only a useful depth cue up to 30 meters or so. However, if we can effectively
change the eye separation then far more distant objects can be resolved by
stereopsis. In viewing a mountain 10 km distant a virtual eye separation of 1
km might be appropriate. If viewing an object at 1 cm (as in a stereo
microscope) a virtual eye separation of 1 mm will be more suitable.
FIGURE 1.
An illustration of some of the basic geometry relating to
stereoscopic viewing.
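As a rough numerical illustration of these definitions, the following sketch computes vergence angles and relative disparities for points lying straight ahead on the median line between the eyes. It is not taken from the study itself: the function names are hypothetical, and the symmetric fixation geometry is a simplifying assumption.

```python
import math

def vergence_angle(eye_separation, distance):
    """Angle (radians) between the two lines of sight when both eyes
    fixate a point straight ahead at the given distance."""
    return 2.0 * math.atan(eye_separation / (2.0 * distance))

def disparity(eye_separation, fixation_distance, target_distance):
    """Relative disparity (radians) of a target with respect to the
    fixated point; positive when the target is nearer than fixation."""
    return (vergence_angle(eye_separation, target_distance)
            - vergence_angle(eye_separation, fixation_distance))

def arcmin(radians):
    """Convert radians to minutes of arc."""
    return math.degrees(radians) * 60.0

# Normal 6.3 cm eye separation: the same 10% depth step, near and far.
near = arcmin(disparity(0.063, 1.0, 0.9))
far = arcmin(disparity(0.063, 30.0, 27.0))

# Mountain at 10 km viewed with a 1 km virtual eye separation.
mountain = arcmin(disparity(1000.0, 10000.0, 9000.0))
```

Under these assumptions a 10% depth step at 1 m yields a far larger disparity than the same proportional step at 30 m, while the 1 km virtual eye separation restores a large disparity for the distant mountain.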
A more subtle depth cue conflict can arise if we try to change the stereo
separation dynamically while moving through a scene. One of the most important
depth cues comes from the dynamic flow of information across the retina, and
evidence suggests that this is more important to 3D space perception than
stereopsis [1,2]. When we are driving along a highway, we have a very strong
sense of space yet almost all objects are likely to be outside the range at
which stereo disparity is an effective depth cue. Figure 2 illustrates the kind
of visual flow field that results from forward motion. This depth cue is also
called motion parallax or, in some cases, the kinetic depth effect.
FIGURE 2.
The kind of visual flow field that results from forward motion through
a 3D environment.
Dynamically changing disparities should cause changes in the relative depths
in a scene. However, if other depth cues are stronger this effect may not be
apparent. There is some evidence that the other dynamic depth cues, such as
motion parallax, so dominate space perception that altering the effective
disparities will be invisible. The perceptual question is, can we fly around a
scene dynamically changing the effective eye separation without the users
perceiving a rubbery distortion of the scene? Distortion should occur if the
brain is a perfect geometry processor. Also, if rubbery distortion does appear,
is the effect disturbing or is it an acceptable price to pay for optimal
stereopsis?
A piece of indirect evidence for the relative weakness of the stereo depth cue
comes from a paper by Wallack [10]. In his study Wallack increased the
effective eye separation in a telestereoscope which more than doubled the
effective eye separation of the subjects. His subjects viewed a rotating wire
object. The point that is relevant here is that before the actual experiment
Wallack had to discard half his subjects because they failed to perceive any
size distortion of the object as it rotated, whereas the disparity-vergence
information should have made the object appear to stretch greatly in depth as
it rotated. Clearly, for those subjects that were discarded the kinetic depth
effect (perhaps combined with object rigidity assumptions) completely dominated
the percept. What is more, after a short period of exposure the shape
distortion of the object appeared much reduced even for those subjects who
passed the initial test.
On a screen all objects lie in the same focal plane no matter what their apparent
depth. However, the eye may be fooled into thinking that they are at
different depths by means of a stereo display that provides accurate disparity
and vergence information. The problem is that in screen based stereo displays
vergence information is provided correctly but focus information is not.
There is some evidence that the failure to correctly change focus information
causes a form of eye strain [5]. A recent Japanese study showed that after
watching 3D images for a while the eyes lose their ability to refocus quickly
[6]. This problem is present in all current generations of stereoscopic head
mounted displays and with monitor based stereo displays.
In another study it has been shown that the coupling of accommodation and
vergence can be changed [4] and that this change can persist for some time.
There appears to be considerable flexibility in the visual system regarding the
coupling of focus and vergence. Anytime that a person dons a pair of reading
glasses her visual system is forced to make an adjustment to a fixed change in
focal length of her eye. This forces a change in the focus vergence
relationship. With bifocals this re-adjustment must be continuously
effected.
In view of the above observations how may we reduce the problems associated
with the decoupling of focus and vergence in stereo display? One solution
that seems obvious is to try to make images lie in the vicinity of the monitor
screen, to reduce the parallax. This will minimize the focus vergence
discrepancy. Valyrus [8] found experimentally that the accommodation vergence
discrepancy should not be more than 1.6 degrees. He proposed a guideline based
on parallax, which he defined as the spatial discrepancy on the screen between
homologous image points from the two eyes. His guideline states that
P <= 0.03D
where P is the parallax and D is the viewing distance, otherwise diplopia will
occur. Veron et al. [9] used this formula to derive the guideline that screen
based stereo displays should be placed 2.3 metres from the viewer to give an
image that it should always be possible to fuse. They assumed that the virtual
object would always be placed behind the screen.
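The guideline can be checked numerically. The sketch below is only an illustration under stated assumptions: the point lies on the median line between the eyes, distances are measured from the viewer, and the function names are hypothetical rather than taken from the cited work.

```python
def screen_parallax(eye_separation, screen_distance, point_distance):
    """On-screen separation of the two projections of a point lying
    straight ahead at point_distance from the viewer.
    Positive: point behind the screen; negative: in front of it."""
    return eye_separation * (1.0 - screen_distance / point_distance)

def within_guideline(parallax, screen_distance, limit=0.03):
    """True if the parallax magnitude satisfies P <= limit * D."""
    return abs(parallax) <= limit * screen_distance

def min_screen_distance(eye_separation, limit=0.03):
    """Smallest viewing distance at which every point behind the screen,
    even one infinitely far away, satisfies the guideline. The parallax
    of such points approaches the eye separation itself."""
    return eye_separation / limit

# Illustrative values in metres (6.3 cm eye separation).
d_min = min_screen_distance(0.063)
p_far = screen_parallax(0.063, d_min, 1.0e9)
ok_far = within_guideline(p_far, d_min)
```

With a 6.3 cm eye separation this bound comes out near 2.1 m. Under this simple geometry, the 2.3 m figure quoted above would follow from a slightly larger assumed separation, which the text does not specify.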
Based on a different analysis of the problem Williams and Parrish [12]
concluded that a practical viewing volume falls between -25% and + 60% of the
viewer to screen distance. They proposed a method whereby objects at
different depths can optimally use the available disparity range and show how
objects at two or more different distances can be brought into the useful
viewing volume. Their scheme parcels out the available disparity so that
certain depth ranges are enhanced stereoscopically, while others are reduced in
terms of the stereo depth. For example, in a scene with two objects, the
distance between the front and back of each object is allocated a large
disparity range, while the empty space between them is made devoid of
disparity. What is interesting here is that this approach assumes that
disparity is more important for seeing the local 3D shape of the individual
objects rather than the 3D relationship between the two objects. Disparity
becomes only a local depth cue and not a global depth cue. Whether or not this
is appropriate, it assumes that the brain can tolerate inconsistencies between
disparity information and other depth cues.
A reasonable working hypothesis is that most of our understanding of 3D space
comes from depth cues such as occlusion, motion parallax and linear
perspective. Stereo disparity provides additional, rather local information
about relative depths. Therefore it may be reasonable to devise algorithms to
dynamically adjust disparity information so that it is optimal for a particular
situation, because the resulting depth cue conflicts are unlikely to be
noticed.
The following series of demonstrations and experimental studies were all
devised to test the validity of this hypothesis and explore the usefulness of
dynamic stereo adjustments.
1) By changing the eye separation parameter in a computer graphics stereo
display the scene can be flattened or depth enhanced. Given the correct
viewing position it is possible to construct a stereo view of a 3D scene such
that the images presented to the two eyes are correct for an object in the
vicinity of the monitor [2]. We assume that an eye separation parameter is set
to 1.0 for "correct viewing" and 0.0 when both eyes get the same image, as in
single viewpoint graphics. Clearly, it is also possible to set this parameter
to intervening values, or even outside of the range 0-1. If it is negative
then depth relationships are inverted (which is not likely to be useful); if
greater than 1.0 then stereo depth is enhanced. Thus this parameter has the
effect of flattening or depth enhancing the image in terms of disparity cues.
2) By scaling an object and bringing it closer or moving it further away, the
separation of the eyes relative to the object's size can be changed. Thus
if a 5 km mountain viewed at 10 km distance is shrunk to 0.5 meter and moved
to a viewing distance of 1 meter the eyes will be effectively 10,000 times
further apart relative to the mountain's original size.
3) By use of mirrors or prisms it is possible to actually change the optical
separation of the pupils of the eyes. We will not be concerned with this
method here.
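The first two methods can be sketched in a few lines. This is a minimal illustration, not the implementation used in the studies: the function names are hypothetical, and the virtual cameras are assumed to be offset symmetrically along the horizontal axis.

```python
def camera_offsets(eye_separation, k):
    """Horizontal offsets of the left and right virtual cameras for an
    eye separation parameter k: 1.0 is "correct viewing", 0.0 gives both
    eyes the same image, values above 1.0 enhance stereo depth, and
    negative values invert depth relationships."""
    half = 0.5 * k * eye_separation
    return (-half, half)

def relative_separation_gain(original_size, displayed_size):
    """Factor by which the eyes are effectively further apart, relative
    to the object, after it has been scaled down for display."""
    return original_size / displayed_size

# Method 1: flattened, correct and enhanced settings for 6.3 cm eyes.
flat = camera_offsets(6.3, 0.0)
correct = camera_offsets(6.3, 1.0)
enhanced = camera_offsets(6.3, 2.0)

# Method 2: a 5 km mountain shrunk to 0.5 m (sizes in metres).
gain = relative_separation_gain(5000.0, 0.5)
```

The mountain example reproduces the factor of 10,000 given in the text, since the object size and the viewing distance are reduced by the same ratio.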
All of the studies, except for the last, used an Indigo Extreme with
the Cyberscope(TM). The Cyberscope consists of a hood that can be
placed over a small monitor allowing for the stereo viewing of properly
constructed images. The Cyberscope uses front surface mirrors to displace and
rotate the images presented to the two eyes, as shown in Figure 3.
FIGURE 3.
The Cyberscope optically rotates the images from the two
halves of the screen, 90 deg clockwise and counter clockwise respectively, and
superimposes them. This is done using front surface mirrors to provide perfect
optical clarity.
DEMONSTRATION 1: Does motion parallax dominate stereo disparity when they are in conflict?
This study was directed at the relationship between optical flow information
spatial cues, and stereo disparity cues. A special display was created in
which the flow of visual information was consistent either with a continuously
approaching surface, or with an inflating surface. This scene was constructed
to be self similar at all scales of resolution, and it was designed to be
constantly expanding about a center at the plane of the screen (illustrated in
Figure 4 and in Color Plate 1, Ware). Constantly inflating objects are not
common but an expanding flow field is present whenever we move forward through
the environment. Thus, based on experience observers might be expected to
perceive movement of the scene towards them.
However, the scene was viewed in stereo and the stereo depth cues should have
been enough to tell the observers that the scene was in fact expanding away
from them.
FIGURE 4.
A schematic cross section of the recursively defined scene. The
truncated pyramid in the center of each group of three, has a group of three
truncated pyramids on top of it. The scene is self similar through scale
transformations about the dot. The scene was viewed in stereo from above while
continuously expanding.
The issue was, would subjects perceive the scene as inflating - which was the
only geometrically consistent way of perceiving the pattern - or would they
perceive something constantly coming towards them, as would be more consistent
with everyday experience.
Observers were asked to look at this display and comment on what they saw. The
general consensus was that it shows a scene "coming up
towards me". This impression lessens somewhat with time, and sometimes the rate
of advance appears to slow and speed up, depending on the stage in the
animation cycle. However, none of the observers reported expansion, and none
of them reported seeing the scene moving away from them, as was in fact
happening. This suggests a very powerful dominance of the optical flow
information over stereo information.
In the introduction to this paper a case was made that the perceptual motor
system is capable of recalibrating the disparity depth cue mechanism in the
presence of other depth cues, such as motion parallax. Another way of
describing it is that the disparity mechanism is insensitive to low frequency
change. This study asks how fast this
recalibration can take place by measuring the frequency of disparity changes that
are just detectable.
In order to investigate this problem a scene was constructed in which a moving
carpet dotted with truncated pyramids moved perpetually towards the observer.
An image configured for the Cyberscope is illustrated in Color Plate 2 (Ware).
The scene was viewed in stereo and the effective eye separation was changed
sinusoidally with an accelerating frequency. To describe this transformation
it is useful to examine the extremes. With zero eye separation we have the
same images presented to the two eyes but kinetic information consistent with a
3D scene (like looking at a moving television picture). A separation of 6.3 cm
is normal and results in a correct stereo display for which disparity cues and
the motion parallax are consistent with a 3D scene (making the normal
perceptual assumptions about rigidity). Sinusoidal changes in eye separation
should result in a sensation of oscillating depth if the brain were to rely
primarily on disparity information but this would be in conflict with the
rigidity assumption given the linear perspective and motion flow information.
On each trial the change in separation was started slowly and gradually sped up
until it became noticeable. This speedup was such that after 50 seconds the
eye separation was being changed at 1 Hz. At 100 seconds the frequency would
be 2 Hz. There was also a random offset in time to the start of oscillation
so that subjects could not anticipate this in their responses. The actual eye
separation did not oscillate through the full range but varied with different
amplitudes under different conditions. The amplitudes of oscillation were 10%,
20% and 30%. The base eye separations were 6.3 cm, 4.2 cm and 2.1 cm. Thus
there were nine viewing conditions given by the product of these sets of
settings.
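The stimulus described above can be sketched as a sinusoid whose frequency rises linearly with time. The code below is an illustrative reconstruction rather than the original experimental software. It assumes that each amplitude is a fraction of the base eye separation, omits the random start offset, and uses hypothetical names throughout.

```python
import math
from itertools import product

# Frequency ramp implied by the text: 1 Hz at 50 s, 2 Hz at 100 s.
RAMP_HZ_PER_S = 1.0 / 50.0

def oscillation_frequency(t):
    """Instantaneous oscillation frequency (Hz) t seconds after onset."""
    return RAMP_HZ_PER_S * t

def eye_separation(t, base_cm, amplitude):
    """Effective eye separation (cm) at time t. The phase is the time
    integral of the linearly rising frequency: 2*pi * (ramp * t^2 / 2)."""
    phase = math.pi * RAMP_HZ_PER_S * t * t
    return base_cm * (1.0 + amplitude * math.sin(phase))

# The nine conditions: three base separations by three amplitudes.
BASES_CM = (6.3, 4.2, 2.1)
AMPLITUDES = (0.10, 0.20, 0.30)
conditions = list(product(BASES_CM, AMPLITUDES))
```

Because the frequency starts at zero and grows steadily, the time at which a subject responds maps directly onto a detection frequency through `oscillation_frequency`.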
One of the problems with this study was the difficulty in describing to
subjects what they were supposed to look for. Subjects do not report
sinusoidal depth changes. Instead a kind of paradoxical sideways movement is
perceived; it is paradoxical because it is going in both directions at once.
Subjects had to be trained to be able to see this phenomenon. Once this was
achieved they were instructed to push a mouse button as soon as the
oscillation became noticeable. This had the effect of recording the result and
initiating the next trial.
Nine subjects who were all undergraduate or graduate students were used as
observers.
The results showing the mean frequency at which subjects detected the
paradoxical motion are plotted in Figure 5. They show that the oscillation
frequency that is detectable varies inversely with the amplitude of
oscillation, and inversely with the effective eye separation. Both of these
are to be expected since both eye separation and amplitude increase the amount
of disparity change over time. The worst case was for the maximum amplitude
and the maximum eye separation, in which case the average detection frequency was
0.3 Hz. This is a remarkably rapid rate of adaptation to changing disparity
ratios.
FIGURE 5.
Results from Experiment 1 showing how the frequency at which
oscillating eye separations are detected varies with different amplitudes and
base values.
In practical terms what the results mean is that with a moving pattern eye
separation can be changed dynamically as long as it is done gradually, taking
several seconds to smoothly change. In this case viewers are unlikely to
notice that anything unusual has happened.
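One way to apply this in practice is to ease the eye separation from its current value to a new target over a few seconds. The sketch below is an assumption-laden illustration: the smoothstep easing curve and the 5 second default duration are choices made here for the example, not values reported in the experiment, and the function names are hypothetical.

```python
def smoothstep(u):
    """Cubic ease-in/ease-out curve on [0, 1] with zero slope at both
    ends; inputs outside the interval are clamped."""
    u = min(1.0, max(0.0, u))
    return u * u * (3.0 - 2.0 * u)

def eased_separation(t, start_cm, target_cm, duration_s=5.0):
    """Eye separation (cm) t seconds into a gradual transition from
    start_cm to target_cm lasting duration_s seconds (assumed value)."""
    return start_cm + (target_cm - start_cm) * smoothstep(t / duration_s)

# Sample a transition from 6.3 cm to 24 cm once per second.
samples = [eased_separation(float(t), 6.3, 24.0) for t in range(0, 7)]
```

Calling `eased_separation` once per frame with the elapsed time yields a change that begins and ends gently and then holds at the target value.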
The second experiment was initially designed to address the issue of whether or
not there is an obviously correct eye separation setting that is consistent
with the geometry of a scene. However, in our pilot study we found that
subjects had very little idea of what the "correct setting" was. We therefore
changed the task and asked the users to create "the maximum comfortable
setting" in terms of eye separation. Subjects were given control over the
effective eye separation and instructed to increase the eye separation until
diplopia occurred and then move it back to a comfortable value. The moving
carpet display was used again for this study.
FIGURE 6.
The moving carpet of truncated pyramids was presented in stereo and
at different angles to the vertical plane of the monitor screen.
Subjects were given controls that allowed them to adjust the eye separation by
depressing one of two keys, one of which increased the eye separation, the
other of which decreased the eye separation. Subjects did this with the
computer graphics model of the moving carpet set at 8 different angles with
respect to the monitor (as shown in Figure 6) and they repeated the procedure
twice to provide two settings at each angle.
FIGURE 7.
The results from 10 subjects participating in Experiment 2.
The results are shown in Figure 7. They indicate that subjects could
comfortably tolerate much greater disparities with scenes having little depth
in them, and only small disparities with scenes that contained a lot of depth. This
suggests that automatically changing the effective eye separation according to information
about depth in a scene is probably a good idea even if it means breaking the
rules of consistent geometry to do so. It is also clear that there are large
individual differences with respect to the amount of disparity that can be
tolerated, suggesting that users of stereo displays should be able to customize
a disparity parameter for their own comfort.
The following algorithm was created to allow for the viewing of any scene with
automatic adjustment so the stereo values would be in a reasonable range and
could change dynamically. This algorithm has three steps.
Step 1: measure the distance to the closest portion of the displayed image. (This
can be done by sampling the Z buffer.)
Step 2: scale the scene about a mid point between the observer's two eyes in
such a way that the closest point lies just behind the screen.
Step 3: render this modified scene in stereo using the normal methods for
constructing off axis perspective views [3].
The transformation is illustrated in Figure 8.
FIGURE 8.
Schematic illustration of the effects of the stereo adjustment
algorithm.
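Steps 1 and 2 of the algorithm can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the depth samples are assumed to be linear distances from the viewer already recovered from the Z buffer, the eye midpoint is placed at the origin looking along +z, and the 5% margin and all function names are hypothetical. Step 3, the off axis stereo rendering, is not shown.

```python
def stereo_scale_factor(depth_samples, screen_distance, margin=0.05):
    """Step 1 and part of step 2: find the nearest sampled depth and
    return the scale factor that places it just behind the screen.
    `margin` (an assumed value) sets how far behind, as a fraction of
    the viewer to screen distance."""
    nearest = min(depth_samples)
    if nearest <= 0.0:
        raise ValueError("depth samples must be positive distances")
    return screen_distance * (1.0 + margin) / nearest

def scale_about_point(points, center, s):
    """Rest of step 2: scale 3D points uniformly about `center`, here the
    midpoint between the observer's two eyes."""
    cx, cy, cz = center
    return [(cx + s * (x - cx), cy + s * (y - cy), cz + s * (z - cz))
            for (x, y, z) in points]

# Illustrative scene (metres): nearest point 40 m away, screen at 0.8 m.
scene = [(0.0, 0.0, 40.0), (10.0, 5.0, 100.0)]
s = stereo_scale_factor([p[2] for p in scene], screen_distance=0.8)
scaled = scale_about_point(scene, (0.0, 0.0, 0.0), s)
```

Because the scaling is centered on the midpoint between the eyes, the direction to every point from that midpoint is unchanged, so the view from there keeps its perspective while the disparities seen by the two offset eyes are brought into a comfortable range.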
This algorithm achieves the following things.
DEMONSTRATION 2: Stereo adjustment algorithm with a large screen
Representatives of the UNB Ocean Mapping Group showed several thousand people
the dynamic stereo display using a large format, 60 in diagonal screen,
Electrohome projector at the CeBIT trade show in Germany. They all viewed a
digital elevation map showing a line of undersea volcanoes in the South
Pacific. This is shown as a monocular image in Color Plate 3, Ware. The eye
separation was set to be considerably larger than usual, about 24 cm, in order
to get appropriate disparities given the approximately 3 meter viewing
distance. The interface allowed people to "fly" over the terrain using a
six-degree of freedom input device [11]. In general viewers were very
impressed by the large format, high resolution stereo display. None reported
that the scene appeared to be expanding and contracting as they moved through
the artificial landscape.
All of the evidence presented here is consistent with the hypothesis that the
disparity depth cue is a highly flexible depth enhancement, rather than the
primary determinant of 3D space perception. What this means is that in the
absence of evidence to the contrary, hyper stereo adjustments are a useful tool
in information display. We apparently do not need to be careful about matching
the stereo geometry with the actual eye geometry. Rather what is important is
to create stereo displays which maximize disparity gradients while maintaining
them at a level below that at which diplopia sets in. Given this
interpretation it seems to be worth artificially changing scenes so that the
stereo information about relative depths is optimized, even though this stereo
information may be in conflict with the other depth cues available, such as
linear perspective and motion flow. The two advantages to such manipulations
will be that disparities can be optimized for depth discrimination in a given
scene and vergence-focus conflicts can be reduced, which has the effect of
reducing long term eye-strain.
Abstract
Keywords:
Stereo displays, Virtual reality, 3D displays.
Introduction
Stereo Vision
Eye Separation
Cue Conflict
Occlusion is one of the major depth cues. It is a perceptual rule that says
that closer objects always occlude (i.e. cover up) further objects. The
problem is that when disparity information causes an object to appear in front of
a screen display the edge of the screen may appear to occlude that object, and
since occlusion is the stronger depth cue, the conflict is resolved
perceptually in favor of occlusion, destroying the illusion of depth.
The Vergence focus problem
When we fixate objects at different depths, two things happen: the degree of
convergence of the eyes changes (called vergence) and the focal length of the
lens in the eye changes to create a sharp image on the retina. The vergence
and the focus mechanism are known to be coupled in the human visual system.
In fact if one eye is covered the vergence of that covered eye changes as the
uncovered eye focuses on objects at different distances.
Summary Of Major Points So Far
Changing Effective Eye Separations
It is possible to change the effective eye separation by a number of means.
Equipment Used
EXPERIMENT 1: How fast can we change disparity cues without the effect being noticeable?
Method
Results
EXPERIMENT 2: How do observers adjust their eye separation?
ALGORITHM FOR DYNAMIC STEREO ADJUSTMENT
CONCLUSION