Computational Vision
U. Minn. Psy 5036
Daniel Kersten
Lecture 25: Perceptual integration, Cooperative Computation

Initialize

‡ Spell check off

Off[General::spell1];

Outline

Last time: Object recognition

Today: Putting ideas together -- integrating perceptual information

Modular vs. cooperative computation

For the most part, we've treated visual estimation as if it is done in distinct "modules", such as surface-color-from-radiance (Land, 1959), shape-from-shading (Horn, 1975), optic flow (Hildreth, 1983), or structure-from-motion (Ullman, 1979).

In contrast to the modularity theories of vision, it is phenomenally apparent that visual information is integrated to provide a strikingly singular description of the visual environment. By looking at how human perception integrates scene attributes, we may get some idea of how vision modules in the brain interact, and what they represent.

Some basic graph types in vision (review from Lecture 6)

See: Kersten, D., & Yuille, A. (2003) and Kersten, Mamassian & Yuille (2004).

‡ Basic Bayes

p(S | I) = p(I | S) p(S) / p(I)

In the general formulation, Y is a random variable over the hypothesis space and X is the data. For visual inference, Y = S (the scene), X = I (the image data), and I = f(S). There are three terms to keep straight:

p(S | I) is the posterior probability of the scene given the image -- i.e., what you get when you condition the joint distribution on the image data. The posterior is often what we'd like to base our decisions on because, as discussed below, picking the hypothesis S that maximizes the posterior (maximum a posteriori, or MAP, estimation) minimizes the average probability of error.

p(S) is the prior probability of the scene.

p(I | S) is the likelihood of the scene. Note that this is a probability of I, not of S.

We've seen that the idea of prior assumptions that constrain otherwise underconstrained vision problems is a theme that pervades much of visual perception. Where do the priors come from? Some may be built in early on or hardwired from birth; others are learned, even in adulthood. See Adams, W. J., Graf, E. W., & Ernst, M. O. (2004). Experience can change the 'light-from-above' prior. Nat Neurosci, 7(10), 1057-1058, for a recent example of learning the light-from-above prior for shape perception.

‡ Low-level vision

We've seen a number of applications of Basic Bayes, including the algorithms for shape-from-shading and optic flow.

In 1985, Poggio, Torre and Koch showed that solutions to many computational problems of low-level vision could be formulated as maximum a posteriori estimates of scene attributes if the generative model could be described as a matrix multiplication, where the image I is a matrix mapping of a scene vector S:

I = A S

A solution then corresponds to minimizing a cost function E that simultaneously penalizes the error in reconstructing the image from the current hypothesis S and the violation of a prior "smoothness" constraint on S:

E(S) = ||A S - I||^2 + λ ||B S||^2

λ is an (often free) parameter that determines the balance between the two terms. If there is reason to trust the data, λ is small; but if the data are unreliable, more emphasis should be placed on the prior, so λ should be bigger. For example, S could correspond to representations of shape, stereo, edges, or the motion field, with smoothness modeled in terms of nth-order derivatives, approximated by finite differences in the matrix B.

The Bayesian interpretation comes from multivariate gaussian assumptions on the generative model: with likelihood p(I | S) ∝ exp(-||A S - I||^2 / (2 σ_N^2)) and prior p(S) ∝ exp(-||B S||^2 / (2 σ_S^2)), maximizing the posterior is the same as minimizing E with λ = σ_N^2 / σ_S^2.
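To make this concrete, here is a minimal Mathematica sketch of the regularization recipe for a 1-D scene. The particulars are illustrative, not from the lecture: A is taken to be the identity (direct but noisy observation of S), B is a first-difference smoothness matrix, and the names sTrue, img, and lambda are made up. Because E(S) is quadratic, the MAP estimate can be found in closed form by solving (A'A + λ B'B) S = A' I.

(* Illustrative sketch of Poggio-Torre-Koch regularization for a 1-D scene. *)
(* Assumed, not from the lecture: A = identity, B = first differences,      *)
(* and all names and values below are made up.                              *)

n = 64;
sTrue = Table[Sin[2 Pi t/n], {t, n}];        (* underlying scene vector S *)
SeedRandom[1];
img = sTrue + RandomVariate[NormalDistribution[0, .3], n];  (* I = A.S + noise *)

a = IdentityMatrix[n];                                   (* rendering matrix A *)
b = Table[Which[j == i, -1., j == i + 1, 1., True, 0.],  (* smoothness matrix B *)
     {i, n - 1}, {j, n}];

(* E(S) = ||A.S - I||^2 + lambda ||B.S||^2 is quadratic in S, so the MAP *)
(* estimate solves (A'A + lambda B'B) S = A'I exactly:                   *)
lambda = 5.;
sMAP = LinearSolve[Transpose[a].a + lambda Transpose[b].b, Transpose[a].img];

ListLinePlot[{img, sMAP, sTrue}, PlotLegends -> {"I (data)", "MAP S", "true S"}]

Raising lambda visibly smooths the estimate, which is exactly the data-versus-prior tradeoff described above.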
‡ Discounting

Consider a Bayes net in which two scene variables s1 and s2 jointly generate the image I, so that the joint distribution factors as:

p(s1, s2, I) = p(I | s1, s2) p(s1) p(s2)

Optimal inference about s1 alone requires that we calculate the marginal posterior, integrating out ("discounting") the confounding variable s2 (a toy numerical version appears at the end of these notes):

p(s1 | I) = ∫ p(s1, s2 | I) ds2 ∝ ∫ p(I | s1, s2) p(s1) p(s2) ds2

Liu, Knill & Kersten (1995) describe an example with I -> 2D x-y image measurements, s1 -> 3D object shape, and s2 -> view. Bloj et al. (1999) have an example estimating s1 -> surface chroma (saturation), with s2 -> illuminant direction.

‡ Ideal observer for the "snapshot" model of visual recognition: discounting views

Tjan et al. describe an application to object recognition in noise, using eight views of each of four objects. (See Tjan, B., Braje, W., Legge, G. E., & Kersten, D. (1995). Human efficiency for recognizing 3-D objects in luminance noise. Vision Research, 35, 3053-3069.)

Let X be the vector describing the image data, and let Oi represent object i, where i = 1, ..., N. Suppose that each Oi is represented in memory by M "snapshots", call them views (or templates) Vij, where j = 1, ..., M. Then

p(Oi | X) = Σ_{j=1..M} p(Vij | X) = Σ_{j=1..M} p(X | Vij) p(Vij) / p(X)

Given image data, the ideal observer chooses the i that maximizes the posterior p(Oi | X). Because p(X) does not depend on i, this is equivalent to choosing the i that maximizes:

L(i) = Σ_{j=1..M} p(X | Vij) p(Vij)

If we assume i.i.d. additive gaussian noise (as we did for the signal-known-exactly detection ideal), then

p(X | Vij) = (1 / (σ √(2π))^p) exp( -||X - Vij||^2 / (2σ^2) )

where p is the number of pixels in the image.

Tjan et al. showed that size, spatial uncertainty, and detection efficiency played large roles in accounting for human object-recognition efficiency. Interestingly, the highest recognition efficiencies (~7.8%) were found for small silhouettes of the objects (the small silhouettes subtended 0.7 deg, vs. 2.4 deg for the large silhouettes).
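A minimal Mathematica sketch of this snapshot observer, using random placeholder templates rather than the actual Tjan et al. stimuli: the normalizing constant (1/(σ √(2π)))^p multiplies every term identically, so it (and a uniform view prior p(Vij)) can be dropped from the decision.

(* Sketch of the "snapshot" ideal observer. The templates v are random  *)
(* placeholders, not the stimuli from the paper.                        *)

nObj = 4; nViews = 8; nPix = 16; sigma = .5;
SeedRandom[2];
v = RandomReal[{0, 1}, {nObj, nViews, nPix}];     (* view templates Vij *)

(* a noisy image generated from object 3, view 5 *)
x = v[[3, 5]] + RandomVariate[NormalDistribution[0, sigma], nPix];

(* log p(X | Vij) for i.i.d. gaussian pixel noise, dropping the *)
(* constant factor, which is the same for every template        *)
logLike[i_, j_] := -Total[(x - v[[i, j]])^2]/(2 sigma^2);

(* L(i) = Sum_j p(X | Vij) p(Vij), with the uniform view prior dropped *)
l = Table[Sum[Exp[logLike[i, j]], {j, nViews}], {i, nObj}];

First[Ordering[l, -1]]    (* the ideal observer's choice of object i *)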

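Finally, here is the toy discrete version of the discounting computation promised above. The generative model I = s1 + s2 + gaussian noise, the priors, and the observed value iObs are all made up for illustration; the point is just that the integral over the nuisance variable s2 becomes a sum.

(* Toy discrete "discounting": marginalize the nuisance variable s2. *)
(* All numbers below are made up for illustration.                   *)

s1Vals = {1., 2., 3.};          (* hypotheses for the attribute of interest *)
s2Vals = {0., 1.};              (* nuisance variable, e.g. illuminant state *)
prior1 = {0.3, 0.4, 0.3};       (* p(s1) *)
prior2 = {0.5, 0.5};            (* p(s2) *)
sigma = 0.5; iObs = 2.4;        (* observed image measurement *)

(* assumed generative model: I = s1 + s2 + gaussian noise *)
like[s1_, s2_] := PDF[NormalDistribution[s1 + s2, sigma], iObs];

(* p(s1 | I) proportional to Sum_j p(I | s1, s2j) p(s1) p(s2j) -- *)
(* the integral over s2 becomes a sum in this discrete toy        *)
post1 = Table[prior1[[i]]*
     Sum[like[s1Vals[[i]], s2Vals[[j]]] prior2[[j]], {j, Length[s2Vals]}],
    {i, Length[s1Vals]}];
post1/Total[post1]    (* normalized marginal posterior over s1 *)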
