CMU CS 10701 - Inferring Depth from Single Images in Natural Scenes

Inferring Depth from Single Images in Natural Scenes

Byron Boots
Department of Computer Science
Carnegie Mellon University
Pittsburgh, PA
[email protected]

Abstract

The inverse optics problem is one of the oldest and most well-known problems in visual perception. Inferring the underlying sources of visual images has no analytic solution. Recent work on brightness, color, and form has suggested that visual percepts represent the probable sources of visual stimuli and not the stimuli as such, suggesting an empirical theory of visual perception. Here I explore this idea by framing the perception of depth as a machine learning problem. I apply two algorithms with varying levels of model complexity and compare their ability to infer depth, both with each other and with the best previous solutions.

1 Introduction

It has long been recognized that sources of visual stimuli cannot be uniquely specified by the energy that reaches sensory receptors; the same pattern of light projected onto the retina may arise from different combinations of illumination, reflectance, and transmittance, and from objects of different sizes, at different distances, and in different orientations (Figure 1). Nevertheless, visual agents must respond to real-world events. The inevitably uncertain sources of visual stimuli thus present a quandary: although the physical properties of a stimulus cannot uniquely specify its provenance, success depends on behavioral responses that are appropriate to the stimulus source. This dilemma is referred to as the inverse optics problem.

For more than a century now, investigators have surmised that the basis of successful biological vision in the face of the inverse optics problem is the inclusion of prior experience in visual processing, presumably derived from both evolution and individual development. This empirical influence on visual perception, first suggested by George Berkeley in 1709 [1], has been variously considered in terms of Helmholtz's "unconscious inferences" [2], the "organizational principles" advocated by Gestalt psychology [3], and the framework of "ecological optics" developed by Gibson [4]. More recently, these broad interpretations have been bolstered by a wealth of evidence suggesting that many visual percepts can be predicted according to the real-world sources to which an animal has always been exposed [5,6,7]. In fact, many of the anomalous percepts that humans see in response to simple visual stimuli may be rationalized in this way [6,7].

In the present work, I have explored the notion of an empirical approach to visual perception by framing the inverse optics problem as a machine learning problem. Specifically, I have asked how depth to surfaces in natural scenes may be inferred from monocular images. I look at previous approaches to the problem and suggest novel alternatives. My results demonstrate the feasibility of solving the inverse problem from two perspectives: a naive linear regression perspective and a more complex graphical modeling perspective.

Figure 1: The inverse optics problem with respect to geometry. Objects of different sizes, at different distances, and in different orientations may project the same image on the retina.

2 Related Work

2.1 Traditional approaches to computer vision

Geometrical aspects of the inverse optics problem are frequently encountered in computer vision in the form of recovering three-dimensional structure from two-dimensional images. Most work in this area has focused on stereopsis [8], structure from motion [9], or depth from defocus [10], all of which rely on a differential comparison between multiple images. Animals, however, are able to judge spatial geometry from a monocular image, and it is thought that this ability lies at the heart of the perception of geometrical space [11].
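The multi-image methods above share a simple geometric core: for a rectified pinhole stereo pair, depth follows directly from disparity as Z = f·B/d. The sketch below illustrates this (the focal length and baseline values are illustrative assumptions, not parameters from any system cited here):

```python
import numpy as np

def depth_from_disparity(disparity, focal_px=700.0, baseline_m=0.12):
    """Depth from stereo disparity: Z = f * B / d.

    focal_px   -- focal length in pixels (assumed value for illustration)
    baseline_m -- distance between the two cameras in meters (assumed)
    disparity  -- per-pixel horizontal shift between the two views, in pixels
    """
    disparity = np.asarray(disparity, dtype=float)
    depth = np.full_like(disparity, np.inf)  # zero disparity -> infinitely far
    valid = disparity > 0
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Larger disparity means the point shifted more between views, i.e. is closer.
disp = np.array([[10.0, 20.0],
                 [40.0, 0.0]])
print(depth_from_disparity(disp))
```

The key point for this paper is what the formula requires: a disparity map, which can only be measured by comparing two images. A monocular observer has no such signal, which is why the learning-based methods below must instead rely on pictorial cues.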
Despite this fact, the only well-known method of inferring depth from a single image is the "shape from shading" algorithm [12], a technique that devises models of image formation based on the physics of light interaction and then inverts the models to solve for depth. These inverted models are highly underconstrained, requiring many simplifying assumptions (e.g., Lambertian surface reflectance) that seldom hold in images of natural scenes [14]. Recently, researchers have begun to recover geometrical structure from two-dimensional images empirically, using learning-based techniques.

2.2 Learning-based methods in computer vision

Despite the large quantity of evidence suggesting the importance of empirical data in vision, there have been surprisingly few attempts to leverage machine learning techniques to infer scene geometry from monocular images. I am aware of only two different methods.

Andrew Ng's group at Stanford University is using discriminatively trained Markov random fields to infer depth from monocular images collected from a mobile platform [13]. This approach is quite successful and has the advantage of directly learning depth maps based on statistics of images and their underlying sources. Potentially, such an approach could be tied to vision studies that have similarly used images and depth maps to explain perceptual phenomena [5,14].

Alyosha Efros' group at Carnegie Mellon University is using a completely different technique in which subjects hand-label the possible orientations of surfaces in images [15]. Their algorithm learns geometric classes defined by simple orientations, such as sky, ground, and vertical surfaces in a scene. The labels are then used to "cut and fold" the image, providing a simple "pop-up" model of a visual scene.
This method performs surprisingly well for a wide range of images and is visually appealing, but it is highly inaccurate and not directly related to the true statistics of underlying depths in visual images.

2.3 Filtering in visual inference

In previous attempts at monocular inference, it has been suggested that a variety of cues are essential for judging depth. In particular, convolutional filters such as Laws' masks for texture energy and oriented edge detectors have been used to develop complex feature vectors describing local information in image patches [13,15]. Additionally, the local patches themselves are augmented with information from multi-scale decompositions of the image in order to provide additional scene context [13]. This latter point is extremely important, as local information is insufficient to determine depth [6,7].

Despite their extensive use, linear filters are problematic. Features derived in this
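The Laws'-mask texture-energy features described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the exact pipeline of [13,15]: it uses four of Laws' standard 1-D vectors, forms the sixteen 5x5 masks by outer product, and pools the absolute filter response over the whole patch (real systems typically pool over local windows and add multi-scale context):

```python
import numpy as np

# Laws' standard 1-D vectors; outer products of pairs give 5x5 texture masks.
LAWS = {
    "L5": np.array([1, 4, 6, 4, 1], float),    # level (local average)
    "E5": np.array([-1, -2, 0, 2, 1], float),  # edge
    "S5": np.array([-1, 0, 2, 0, -1], float),  # spot
    "R5": np.array([1, -4, 6, -4, 1], float),  # ripple
}

def conv2d_valid(img, kernel):
    # Valid-mode 2-D cross-correlation via sliding windows (no padding).
    windows = np.lib.stride_tricks.sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def laws_energy_features(patch):
    # One texture-energy number per mask pair: mean absolute filter response.
    feats = []
    for a in LAWS.values():
        for b in LAWS.values():
            mask = np.outer(a, b)  # 5x5 Laws mask, e.g. L5'E5
            feats.append(np.abs(conv2d_valid(patch, mask)).mean())
    return np.array(feats)  # 16-dimensional feature vector per patch

rng = np.random.default_rng(0)
patch = rng.random((32, 32))
print(laws_energy_features(patch).shape)  # one energy per mask pair
```

Each patch is thus summarized by a fixed-length vector of filter energies, which is the form of local evidence the regression and graphical-model approaches in this paper consume.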

