NYU CSCI-GA 2271 - Binocular Vision - D3041487

Binocular Vision
Davi Geiger
Courant Institute of Mathematical Sciences
New York University
251 Mercer Street, New York, NY 10012, U.S.A.

INTRODUCTION

Binocular vision, or stereo vision, is the process of recovering scene surfaces, or relative depth information, from a pair of left and right images (see Figure 1). There are various methods of extracting relative depth from images, among them (i) the relative size of known objects, (ii) occlusion cues, such as the presence of T-junctions, (iii) motion information, (iv) focusing and defocusing, and (v) relative brightness. There are also active methods, such as radar or laser range finding, which extract depth information from a scene by emitting beams of radio waves or laser light. Stereo vision has an advantage over the other methods: it is both passive and accurate. Among the passive methods, it is the one capable of generating the most accurate relative depth information. In the animal kingdom, stereo vision is used for grabbing and catching objects at nearby distances. Monkeys and humans, which have arms, are examples of animals that use stereo vision for accurate depth information at arm's-length distances. Note that stereo vision typically provides accurate relative depth information only within a range of distances; beyond that range it is not effective. The human stereo vision system has left and right eyes, but not top and bottom eyes. The reason is apparently linked to gravity: humans evolved to stand vertically and to move horizontally because of gravity. Given the position of the head and its movement, the natural place to put the two eyes to obtain depth information is on the left and right.
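The remark that stereo gives accurate relative depth only within a range of distances can be made concrete with the standard triangulation relation for two parallel cameras, which this section does not derive. Assuming rectified pinhole cameras with focal length $f$ (in pixels) and baseline $B$ between the two centers, a point at depth $Z$ appears with horizontal disparity $d = fB/Z$, so $Z = fB/d$. A one-pixel matching error then changes the depth estimate by roughly $Z^2/(fB)$, which grows quadratically with distance. The sketch below uses hypothetical camera parameters (`focal_px`, `baseline_m`) chosen only for illustration.

```python
def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for rectified, parallel pinhole cameras (assumed model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px


def depth_error_per_pixel(depth_m: float, focal_px: float, baseline_m: float) -> float:
    """First-order depth change caused by a one-pixel disparity error: |dZ/dd| = Z^2 / (f * B)."""
    return depth_m ** 2 / (focal_px * baseline_m)


if __name__ == "__main__":
    # Hypothetical parameters: an eye-like baseline of 6.5 cm and a 1000 px focal length.
    focal_px, baseline_m = 1000.0, 0.065
    for depth_m in (0.5, 2.0, 10.0, 50.0):
        d = focal_px * baseline_m / depth_m  # disparity a point at this depth would produce
        err = depth_error_per_pixel(depth_m, focal_px, baseline_m)
        print(f"Z = {depth_m:5.1f} m  disparity = {d:7.2f} px  error per px ~ {err:8.3f} m")
```

Under these assumed numbers the one-pixel error is a few millimetres at arm's length (0.5 m) but tens of metres at 50 m, where the disparity itself shrinks to about one pixel.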
The human visual system is a reference for this work because (i) it is the whole motivation for building such a system in the first place, (ii) it is the best stereo system we can examine, much more robust, flexible, and accurate than any machine system built today, and (iii) it is interesting to understand who we are. We now elaborate on how a pair of left and right cameras can obtain depth information.

[Figure 1: panels Left Image, Right Image, Left Image]
Figure 1. An example: stereo (pair) images of the Pentagon taken from an airplane. The first two images form the pair; the third image is a copy of the first and is displayed for cross fusers.

Each surface patch in the 3D world is projected to a set of pixels in the left image and to another set of pixels in the right image. These two sets of pixels need not be the same size; their sizes vary according to the angle the surface patch makes with each image plane. A central problem in binocular vision is to find the surface patch correspondence between the left and right images, i.e., to find which (set of) pixels in the left image matches which (set of) pixels in the right image. Various problems must be resolved to find such a correspondence.

[Figure 2: panels Left Image, Right Image, Left Image]
Figure 2. Julesz's Random Dot Stereogram. The left image, a black and white image, is generated by a random program that assigns black or white to each pixel according to a random number. The right image is constructed from the left image in the following way: an imaginary square inside the left image is displaced a few pixels to the left, and the emptied space is filled by the random generator. When the stereo pair is shown, observers can identify/match the imaginary square in both images and consequently "see" a square in front of the background.
It shows that stereo matching occurs without recognition.

The introduction of random dot stereograms (see Figure 2) gave definite evidence that the binocular process of matching pixels does not need to rely on a recognition process. In a random dot stereogram there are no features in either the left or the right image that match uniquely, i.e., every black (or white) pixel in one image looks identical to any black (or white) pixel in the other image. Yet the human vision system is able to find a unique matching of pixels, yielding a unique depth recovery. Not even illusory contours are identified before the stereo process: Figure 3 gives evidence that the human visual system does not process illusory contours/surfaces before processing binocular vision. Accordingly, binocular vision is described here as a process that does not require any recognition or contour detection a priori.

[Figure 3: panels (a) left image, (b) right image, (c) left image, (d) left image, (e) right image, (f) left image]
Figure 3. (a) to (c) Stereogram of the Kanizsa Square. (b) and (c) are for cross fusion. (d) and (e) Stereogram of "Stars", where the illusory surface seen in (d) is not matched to the illusory surface seen in (e); rather, a new illusory surface emerges as a result of stereo fusion. This shows that illusory surfaces do not precede stereo vision.

Let us start by analyzing the geometry of a stereo head and show how various simplifications to the matching problem can be obtained. We then introduce an optimization approach to resolve the remaining ambiguities in the pixel matching. Finally, we offer an interpretation of the whole binocular process as one that produces the simplest scene interpretation given the input data (the left and right images).

CAMERAS GEOMETRY AND EPIPOLAR LINES

Understanding the geometric constraints imposed by the configuration of two cameras allows for a dramatic reduction of the search for the optimal matching.
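The Figure 2 construction, and the way a geometric constraint shrinks the search, can both be illustrated with a small sketch. It builds a Julesz-style random dot stereogram with NumPy and then, assuming the two images are already aligned so that corresponding pixels lie on the same row (the kind of constraint this section develops), estimates each pixel's horizontal shift by comparing small windows along that row only, a 1D search instead of a 2D one. The sum-of-squared-differences window matching is a generic choice for illustration, not the optimization approach the text introduces later; the function names and parameters (`size`, `square`, `shift`, `max_disp`, `half_win`) are hypothetical.

```python
import numpy as np


def random_dot_stereogram(size=64, square=24, shift=4, seed=0):
    """Build a Julesz-style pair: a central square of the left image appears
    `shift` pixels further left in the right image (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=(size, size))  # each pixel black (0) or white (1) at random
    right = left.copy()
    r0 = c0 = (size - square) // 2                 # top-left corner of the imaginary square
    patch = left[r0:r0 + square, c0:c0 + square]
    # Displace the square `shift` pixels to the left in the right image.
    right[r0:r0 + square, c0 - shift:c0 - shift + square] = patch
    # Refill the strip the square uncovered with fresh random dots.
    right[r0:r0 + square, c0 + square - shift:c0 + square] = rng.integers(0, 2, size=(square, shift))
    return left, right, (r0, c0)


def scanline_disparity(left, right, max_disp=8, half_win=3):
    """For each left pixel, try shifts 0..max_disp along the SAME row of the right
    image and keep the one with the smallest sum of squared differences (SSD)."""
    rows, cols = left.shape
    L, R = left.astype(float), right.astype(float)
    disp = np.zeros((rows, cols), dtype=int)  # border pixels stay 0 (not estimated)
    for y in range(half_win, rows - half_win):
        for x in range(half_win + max_disp, cols - half_win):
            win_l = L[y - half_win:y + half_win + 1, x - half_win:x + half_win + 1]
            costs = [np.sum((win_l - R[y - half_win:y + half_win + 1,
                                       x - d - half_win:x - d + half_win + 1]) ** 2)
                     for d in range(max_disp + 1)]
            disp[y, x] = int(np.argmin(costs))  # ties go to the smallest shift
    return disp


if __name__ == "__main__":
    left, right, (r0, c0) = random_dot_stereogram()
    disp = scanline_disparity(left, right)
    print("inside square:", disp[r0 + 12, c0 + 12], " background:", disp[5, 40])
```

With the default parameters the search should report a shift of 4 inside the square and 0 on the background, even though no single pixel is distinctive. Pixels near the square's edges and in the refilled strip, which is visible in only one image, can be mismatched; these are the kind of remaining ambiguities the text proposes to resolve with an optimization approach.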
Projective Camera and Pixel representation

Let us start by considering one camera under projective geometry and its properties (see Figure 4). Let $P = (X, Y, Z)$ be a point in the 3D world represented in a "world" coordinate system. Let $O$ be the center of projection of a camera, where a camera reference frame is placed. The camera coordinate system has its $z$ axis perpendicular to the camera frame (the plane where the image is produced), and the distance between the center $O$ and the camera frame is the focal length $f$. In this coordinate system the point $P = (X, Y, Z)$ is described by the vector $P^O = (X^O, Y^O, Z^O)$
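The single-camera setup above can be sketched in a few lines. The projection formulas are not stated in the visible part of the text, so the sketch assumes the standard ones that follow from similar triangles when the image plane sits at distance $f$ along the camera's $z$ axis: $x = f X^O / Z^O$ and $y = f Y^O / Z^O$. It also assumes that the change from world to camera coordinates is a rotation `R` followed by a translation `t` (hypothetical names), $P^O = R\,P + t$. Conversion from image-plane coordinates to pixel indices is left out.

```python
import numpy as np


def world_to_camera(P_world, R, t):
    """P^O = R @ P + t: assumed rigid transform from world to camera coordinates."""
    return R @ np.asarray(P_world, dtype=float) + t


def project(P_cam, f):
    """Perspective projection onto the image plane at distance f along the camera z axis."""
    X, Y, Z = P_cam
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z^O > 0)")
    return np.array([f * X / Z, f * Y / Z])


if __name__ == "__main__":
    R = np.eye(3)                  # camera axes aligned with world axes (assumption)
    t = np.array([0.0, 0.0, 2.0])  # world origin 2 units in front of the camera (assumption)
    f = 0.05                       # focal length, in the same units as X, Y, Z
    P_cam = world_to_camera([0.1, -0.2, 3.0], R, t)  # [0.1, -0.2, 5.0]
    print(project(P_cam, f))                          # approximately [0.001, -0.002]
```

Doubling $Z^O$ halves both image coordinates; this dependence of image position on depth is what a second camera can later exploit.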
