U of M PSY 5036W - Learning object categories for efficient bottom-up recognition


Learning object categories for efficient bottom-up recognition
Daniel Kersten
Psychology Department, University of Minnesota
kersten.org
Supported by ONR N 00014-07-1-0937
Wednesday, December 1, 2010

Challenge of complexity in natural image input
• Enormous range of variability in the images for a given object category, e.g. "foxes"
• Enormous objective uncertainty regarding the image features present for any given exemplar

How to learn to be maximally effective across a broad range of tasks?
• Need a generative "world model" that can account for previously unexperienced combinations of objects, background, lighting, pose, ...
• Need efficient selection of critical diagnostic features to index object classes that will generalize across all within-class instances
• Learning object categories
• The challenge of learning from a small number of examples

Mechanisms for flexible recognition
• Generative mechanisms: "analysis by synthesis"
Yuille, A., & Kersten, D. (2006). Vision as Bayesian inference: analysis by synthesis? Trends Cogn Sci, 10(7), 301-308.

For recognition, analysis by synthesis is useful for:
• Segmentation in cluttered scenes
• Transformations that are computationally difficult to do bottom-up, e.g.
‣ orientation in 3D depth
‣ articulations, e.g. scissors
‣ occlusion
• Competing/interacting object-property/scene hypotheses

Computational example
Tu, Z., Chen, X., Yuille, A., & Zhu, S. (2005). Image Parsing: Unifying Segmentation, Detection and Recognition. IJCV, 63(2).
Input; three models (text, faces, texture); bottom-up result.

Figure 5: Top left: Input image. Top right: Bottom-up proposals for text and faces are shown by boxes. A face is "hallucinated" in a tree. Bottom centre: Overall segmentation (bottom left), detection of letters and faces. Bottom right: Synthesised image.

There is evidence that reliable diagnostic information for certain categories is available from very simple image measurements [35, 32], and that humans make certain categorical decisions sufficiently fast to preclude a verification loop [40] (but see [41] and [42]).

"Where do the generative models come from?" Ideally the generative models, the discriminative models, and the stochastic grammar would all be learnt from natural images. This is not difficult in principle because, as discussed in Griffiths and Yuille, learning the model from data is simply another example of statistical inference. The Helmholtz machine [43] gives an illustration of how a generative model, and an inference algorithm, can be learnt. This approach, however, has been applied only to simple visual stimuli. Similarly, Friston [16] suggests learning models using the Expectation-Maximization algorithm. Although this is a useful metaphor, the challenge is to see whether this idea can be translated to algorithms that can deal with the complexities of natural images.

Learning generative and discriminative models is an extremely difficult problem in practice due to the large dimensionality of natural images. There has recently, however, been dramatic progress on the similar, but arguably simpler, problem of learning a stochastic grammar for natural languages (see article by Chater and Manning). At present, different components of the image parsing model are learnt individually. For example, the discriminative models for text and faces are trained using labelled examples of "face", "text", and "non-face", "non-text". Similarly the
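The "analysis by synthesis" idea — score competing scene hypotheses by synthesizing the image each would produce and comparing with the input — can be sketched as a toy generate-and-compare loop. This is a minimal illustration only, not the Tu et al. image-parsing system: the 3×3-patch "scene", the `synthesize` renderer, and the Gaussian pixel-noise model are all invented for the example.

```python
import random

def synthesize(hypothesis, size=8):
    # Toy "graphics engine": render a hypothesized 3x3 bright patch at (r, c).
    # A real generative model would render objects, pose, lighting, background.
    r0, c0 = hypothesis
    return [[1.0 if r0 <= r < r0 + 3 and c0 <= c < c0 + 3 else 0.0
             for c in range(size)] for r in range(size)]

def log_likelihood(observed, rendered, noise_sd=0.2):
    # Gaussian pixel-noise model, up to a constant: log p(image | hypothesis)
    sq_err = sum((o - s) ** 2
                 for row_o, row_s in zip(observed, rendered)
                 for o, s in zip(row_o, row_s))
    return -sq_err / (2 * noise_sd ** 2)

def analyze_by_synthesis(observed, candidates, log_prior):
    # For each scene hypothesis h, synthesize its image and compare with the
    # input; return the MAP hypothesis, score = log p(image | h) + log p(h).
    return max(candidates,
               key=lambda h: log_likelihood(observed, synthesize(h)) + log_prior(h))

random.seed(0)
true_hypothesis = (2, 3)
observed = [[v + random.gauss(0, 0.2) for v in row]
            for row in synthesize(true_hypothesis)]
candidates = [(r, c) for r in range(6) for c in range(6)]
best = analyze_by_synthesis(observed, candidates, log_prior=lambda h: 0.0)
print(best)  # (2, 3): the hypothesis whose synthesis best explains the input
```

In the full image-parsing framework, bottom-up discriminative proposals (the boxes in Figure 5) would supply the candidate list rather than an exhaustive grid, and the synthesis step is what lets a "hallucinated" face proposal be rejected when its rendering fails to explain the pixels.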

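The excerpt's point that "learning the model from data is simply another example of statistical inference", and its mention of the Expectation-Maximization algorithm, can be made concrete with a standard toy case. The sketch below fits a two-component 1-D Gaussian mixture by EM; the data, initialization, and iteration count are invented for illustration, and this is far simpler than learning a generative model of natural images.

```python
import random
from math import exp, pi, sqrt

def normal_pdf(x, mu, sd):
    return exp(-(x - mu) ** 2 / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

# Synthetic 1-D data from two well-separated clusters.
random.seed(1)
data = ([random.gauss(-2.0, 0.5) for _ in range(200)] +
        [random.gauss(3.0, 0.5) for _ in range(200)])

# Deliberately poor initial guesses for the mixture parameters.
mus, sds, weights = [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5]

for _ in range(50):
    # E-step: posterior responsibility of each component for each point.
    resp = []
    for x in data:
        p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sds)]
        total = sum(p)
        resp.append([pk / total for pk in p])
    # M-step: re-estimate parameters from the responsibilities.
    for k in range(2):
        nk = sum(r[k] for r in resp)
        mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
        sds[k] = (sum(r[k] * (x - mus[k]) ** 2
                      for r, x in zip(resp, data)) / nk) ** 0.5
        weights[k] = nk / len(data)

print(sorted(mus))  # means converge near the true cluster centres -2.0 and 3.0
```

Each EM iteration is an inference step (E) followed by a re-fitting step (M), which is why the excerpt treats model learning as inference; the open challenge it names is scaling this from a handful of scalar parameters to the dimensionality of natural images.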
