CMU CS 10701 - Inferring Depth from Single Images in Natural Scenes

Inferring Depth from Single Images in Natural Scenes

Byron Boots
Department of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
beb@cs.cmu.edu

Abstract

The inverse optics problem is one of the oldest and most well-known problems in visual perception: inferring the underlying sources of visual images has no analytic solution. Recent work on brightness, color, and form has suggested that visual percepts represent the probable sources of visual stimuli, not the stimuli as such, suggesting an empirical theory of visual perception. Here I explore this idea by framing the perception of depth as a machine learning problem. I apply two algorithms with varying levels of model complexity and compare their ability to infer depth both with each other and with the best previous solutions.

1 Introduction

It has long been recognized that the sources of visual stimuli cannot be uniquely specified by the energy that reaches sensory receptors: the same pattern of light projected onto the retina may arise from different combinations of illumination, reflectance, and transmittance, and from objects of different sizes at different distances and in different orientations (Figure 1). Nevertheless, visual agents must respond to real-world events. The inevitably uncertain sources of visual stimuli thus present a quandary: although the physical properties of a stimulus cannot uniquely specify its provenance, success depends on behavioral responses that are appropriate to the stimulus source. This dilemma is referred to as the inverse optics problem.

For more than a century, investigators have surmised that the basis of successful biological vision in the face of the inverse optics problem is the inclusion of prior experience in visual processing, presumably derived from both evolution and individual development. This empirical influence on visual perception, first suggested by George Berkeley in 1709 [1], has been variously considered in terms of Helmholtz's unconscious inferences [2], the organizational principles advocated by gestalt psychology [3], and the framework of ecological optics developed by Gibson [4]. More recently, these broad interpretations have been bolstered by a wealth of evidence suggesting that many visual percepts can be predicted according to the real-world sources to which an animal has always been exposed [5, 6, 7]. In fact, many of the anomalous percepts that humans see in response to simple visual stimuli may be rationalized in this way [6, 7].

In the present work, I have explored the notion of an empirical approach to visual perception by framing the inverse optics problem as a machine learning problem. Specifically, I have asked how the depth to surfaces in natural scenes may be inferred from monocular images. I look at previous approaches to the problem and suggest novel alternatives. My results demonstrate the feasibility of solving the inverse problem from two perspectives: a naive linear regression perspective and a more complex graphical modeling perspective.

Figure 1: The inverse optics problem with respect to geometry. Objects of different sizes, at different distances, and in different orientations may project the same image on the retina.
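To make the naive linear regression perspective mentioned above concrete, the following is a minimal sketch of predicting per-patch depth from a feature vector by ordinary least squares. It is an illustrative sketch rather than the model evaluated in this paper: the random feature matrix, the log-depth targets, and the patch and feature dimensions are placeholder assumptions standing in for features extracted from images paired with ground-truth depth maps.

```python
import numpy as np

# Minimal sketch of the "naive linear regression" perspective: predict
# per-patch (log-)depth from a feature vector by ordinary least squares.
# X and y are random placeholders standing in for patch features and
# ground-truth depths from a range-image dataset (an assumption, not
# the paper's actual data).
rng = np.random.default_rng(0)
n_patches, n_features = 1000, 50
X = rng.normal(size=(n_patches, n_features))          # patch feature vectors
y = np.log(rng.uniform(1.0, 80.0, size=n_patches))    # log-depth targets (m)

# Append a bias column and solve the least-squares problem.
Xb = np.hstack([X, np.ones((n_patches, 1))])
w, *_ = np.linalg.lstsq(Xb, y, rcond=None)

# Predict depths for new patches and map back out of log space.
X_new = rng.normal(size=(5, n_features))
predicted_depth = np.exp(np.hstack([X_new, np.ones((5, 1))]) @ w)
print(predicted_depth)
```

Regressing on log-depth rather than raw depth is a common choice because errors tend to scale with distance; it is assumed here for illustration, not taken from the paper.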
2 Related Work

2.1 Traditional approaches to computer vision

Geometrical aspects of the inverse optics problem are frequently encountered in computer vision in the form of recovering three-dimensional structure from two-dimensional images. Most work in this area has focused on stereopsis [8], structure from motion [9], or depth from defocus [10], all of which rely on a differential comparison between multiple images. Animals, however, are able to judge spatial geometry from a monocular image, and it is thought that this ability lies at the heart of the perception of geometrical space [11]. Despite this, the only well-known method of inferring depth from a single image is the shape-from-shading algorithm [12], a technique that devises models of image formation based on the physics of light interaction and then inverts those models to solve for depth. These inverted models are highly underconstrained, requiring many simplifying assumptions (e.g., Lambertian surface reflectance) that seldom hold in images of natural scenes [14]. Recently, researchers have begun to recover geometrical structure from two-dimensional images empirically, using learning-based techniques.

2.2 Learning-based methods in computer vision

Despite the large quantity of evidence suggesting the importance of empirical data in vision, there have been surprisingly few attempts to leverage machine learning techniques to infer scene geometry from monocular images; I am aware of only two. Andrew Ng's group at Stanford University is using discriminatively trained Markov random fields to infer depth from monocular images collected from a mobile platform [13]. This approach is quite successful and has the advantage of directly learning depth maps from the statistics of images and their underlying sources. Potentially, such an approach could be tied to vision studies that have similarly used images and depth maps to explain perceptual phenomena [5, 14]. Alyosha Efros's group at Carnegie Mellon University is using a completely different technique, in which subjects hand-label the possible orientations of surfaces in images [15]. Their algorithm learns geometric classes defined by simple orientations, such as sky, ground, and vertical surfaces in a scene. The labels are then used to cut and fold the image, providing a simple pop-up model of a visual scene. This method performs surprisingly well for a wide range of images and is visually appealing, but it is highly inaccurate and not directly related to the true statistics of underlying depths in visual images.

2.3 Filtering in visual inference

In previous attempts at monocular inference, it has been suggested that a variety of cues are essential for judging depth. In particular, convolutional filters such as Laws' masks for texture energy and oriented edge detectors have been used to develop complex feature vectors describing local information in image patches [13, 15]. Additionally, the local patches themselves are augmented with information from multi-scale decompositions of the image in order to provide additional scene context [13]. This latter point is extremely important, as local information is insufficient to determine depth [6, 7]. Despite their extensive use, linear filters are problematic: features derived in this way introduce a priori assumptions about the importance of particular patterns and spatial frequencies on
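To illustrate the kind of filter-based features described in Section 2.3, the sketch below computes Laws'-mask texture-energy responses for a local patch and appends responses from a coarser, decimated view of the surrounding region to provide scene context. It is a sketch under stated assumptions: the four 1-D Laws kernels, the 2x decimation used for the coarser scale, and the absolute-sum energy measure are illustrative choices, not the exact feature set used in the cited work [13, 15].

```python
import numpy as np
from scipy.signal import convolve2d

# Standard 1-D Laws kernels; 2-D masks are their outer products.
LAWS_1D = {
    "L5": np.array([1, 4, 6, 4, 1], float),     # level (local average)
    "E5": np.array([-1, -2, 0, 2, 1], float),   # edge
    "S5": np.array([-1, 0, 2, 0, -1], float),   # spot
    "R5": np.array([1, -4, 6, -4, 1], float),   # ripple
}
MASKS = [np.outer(a, b) for a in LAWS_1D.values() for b in LAWS_1D.values()]

def texture_energy(patch):
    """Sum of absolute filter responses over the patch, one value per mask."""
    return np.array([
        np.abs(convolve2d(patch, m, mode="same", boundary="symm")).sum()
        for m in MASKS
    ])

def patch_features(image, row, col, size=16):
    """Feature vector for one patch: fine-scale texture energies plus the
    energies of a 2x-decimated view of the surrounding region for context."""
    half = size // 2
    fine = image[row - half:row + half, col - half:col + half]
    coarse = image[row - 2 * half:row + 2 * half:2,
                   col - 2 * half:col + 2 * half:2]
    return np.concatenate([texture_energy(fine), texture_energy(coarse)])

# Example on a random "image"; a real pipeline would use grayscale photographs.
img = np.random.default_rng(0).random((128, 128))
print(patch_features(img, 64, 64).shape)   # (32,) = 16 masks x 2 scales
```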

