U of M PSY 5038 - Visual integration, Cooperative Computation


Introduction to Neural Networks
Daniel Kersten
Visual integration, Cooperative Computation

Initialize

‡Spell check off

Off[General::spell1];

Outline

Last time

Motion measurement -- local, ambiguous
Motion integration -- combine local ambiguous measurements to determine global direction

Today

‡Use the Bayesian framework to better understand how perceptual information gets put together. Look at examples of actual calculations.

‡Modular vs. cooperative computation

Visual estimation is studied as if it is done in distinct "modules", such as reflectance estimation (lightness), shape-from-shading, optic flow, or structure-from-motion. In contrast to the modularity theories of vision, it is phenomenally apparent that visual information is integrated to provide a strikingly singular description of the visual environment. By looking at how human perception integrates scene attributes, we may get some idea of how vision modules in the brain interact, and what they represent.

Overview of Bayesian integration of visual information

See: Kersten, D., & Yuille, A. (2003) and Kersten, Mamassian & Yuille (2004).

‡Basic Bayes: Integration of visual image measurements (features) with prior knowledge

p(S | I) = p(I | S) p(S) / p(I) ∝ p(I | S) p(S)

S is a random variable over the hypothesis space, and I is the data. For visual inference, S is a description of the scene or object, I is the image data (the measurable features), and I = f(S).

p(S|I) is the posterior probability of the scene given the image -- i.e. what you get when you condition the joint distribution on the image data. The posterior is often what we'd like to base our decisions on, because picking the hypothesis S which maximizes the posterior (i.e. maximum a posteriori or MAP estimation) minimizes the average probability of error.

p(S) is the prior probability of the scene.

p(I|S) is the likelihood of the scene.

We've seen that the idea of prior assumptions that constrain otherwise underconstrained vision problems is a theme that pervades much of visual perception. Where do the priors come from? Some are "built in" -- learned early on or hardwired from birth -- and others are learned in adulthood. See Adams, W. J., Graf, E. W., & Ernst, M. O. (2004). Experience can change the 'light-from-above' prior. Nat Neurosci, 7(10), 1057-1058, for a recent example of learning the light-from-above prior for shape perception.

‡General Bayesian theory of low-level visual integration for separate "modules"

We've seen a number of applications of Basic Bayes, including the algorithms for shape from shading and optic flow. In 1985, Poggio, Torre and Koch showed that solutions to many of the computational problems of early vision could be formulated as maximum a posteriori estimates of scene attributes, provided the generative model could be described as a matrix multiplication, where the image I is a matrix mapping of a scene vector S:

I = A S

A solution then corresponds to minimizing a cost function E that simultaneously tries to minimize the cost of reconstructing the image from the current hypothesis S and a prior "smoothness" constraint on S:

E = ||A S - I||^2 + λ ||B S||^2

λ is an (often free) parameter that determines the balance between the two terms. If there is reason to trust the data, then λ is small; but if the data is unreliable, then more emphasis should be placed on the prior, so λ should be bigger. For example, S could correspond to representations of shape, stereo, edges, or the motion field, with smoothness modeled in terms of nth-order derivatives, approximated by finite differences in the matrix B.

The Bayesian interpretation comes from multivariate gaussian assumptions on the generative model:

p(I | S) ∝ exp(-||A S - I||^2 / (2 σN^2))     (1)
p(S) ∝ exp(-||B S||^2 / (2 σP^2))     (2)

so that p(S | I) ∝ exp(-(||A S - I||^2 + λ ||B S||^2) / (2 σN^2)), with λ = σN^2 / σP^2. (From Poggio, Torre & Koch, 1985.)

A key point is that the maximum a posteriori solution based on equations 1 and 2 above is linear. Thus given the "right" representation, a broad range of estimation problems can be modeled as simple linear networks. However, we noted early on that there are also severe limitations to linear estimation. We'll look at some of the challenges below.
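To make the linearity concrete: setting the gradient of E to zero gives the closed-form MAP estimate S = (Aᵀ A + λ Bᵀ B)⁻¹ Aᵀ I. Here is a minimal Mathematica sketch of that calculation. It is not from the original notebook: the identity imaging matrix A, the first-difference smoothness matrix B, the sine-wave scene, and all variable names are illustrative assumptions.

(* Closed-form MAP estimate for the linear-gaussian generative model
   Idata = A.S + noise, with a smoothness prior on B.S.
   Minimizing E = ||A.S - Idata||^2 + lambda ||B.S||^2 gives
   S = Inverse[Transpose[A].A + lambda Transpose[B].B].Transpose[A].Idata *)
n = 32;
A = IdentityMatrix[n];  (* toy imaging model: direct but noisy observation *)
B = Table[Which[j == i, -1., j == i + 1, 1., True, 0.], {i, n}, {j, n}];
   (* first-order finite-difference operator, penalizing non-smooth S *)
lambda = 5.;
Strue = Table[N[Sin[4 Pi i/n]], {i, n}];  (* underlying "scene" vector *)
Idata = Strue + RandomVariate[NormalDistribution[0, .3], n];  (* noisy image *)
Smap = Inverse[Transpose[A].A + lambda Transpose[B].B].(Transpose[A].Idata);
ListLinePlot[{Idata, Smap, Strue}]  (* data, MAP estimate, true scene *)

Decreasing lambda makes the estimate follow the data; increasing it pulls the estimate toward the smoothness prior, which is exactly the trade-off described above.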
‡Discounting: Emphasizing one modeled cause of image data over another

This graph (a generative model with s1 and s2 as independent causes of the image I) describes the case where the joint distribution can be factored as:

p(s1, s2, I) = p(I | s1, s2) p(s1) p(s2)

Optimal inference for this task requires that we calculate the marginal posterior:

p(s1 | I) ∝ ∫ p(s1, s2 | I) ds2

Liu, Knill & Kersten (1995) describe an example with I -> 2D x-y image measurements, s1 -> 3D object x-y-z coordinates, and s2 -> view. Bloj et al. (1999) give a Bayesian example in color vision, estimating s1 -> surface chroma (saturation) while discounting s2 -> illuminant direction.

‡Ideal observer for the "snap shot" model of visual recognition: Discounting views

Tjan et al. describe an application to object recognition in noise. Their idea was to measure how efficiently human observers recognize objects given view variation and different types of image information. (See Tjan et al. (1995). Human efficiency for recognizing 3-D objects in luminance noise. Vision Research, 35, 3053-3069.)

Eight views of four objects are shown above. Four different types of image information are shown below: shaded, large silhouette, line drawing, and small silhouette. It had previously been shown that humans can name objects just as rapidly from line-drawing versions as from fully shaded versions of the same objects (Biederman & Ju, 1988). But it isn't clear whether this is due to high efficiency for line drawings, i.e. whether the visual system is in some sense "well-tuned" for line drawings. Let's look specifically at how an ideal recognizer can be calculated that discounts views.

Let X be the vector describing the image data. Let Oi represent object i, where i = 1 to N. Suppose that Oi is represented in memory by M "snap shots" of each object, called views (or templates) Vij, where j = 1 to M. Given image data X, the posterior probability of Oi is computed by integrating out view, i.e. summing or "marginalizing" with respect to the M viewpoints:

p(Oi | X) = Σ (j = 1 to M) p(Oi, Vij | X)
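With gaussian pixel noise and flat priors over objects and views, p(X | Vij) ∝ exp(-||X - Vij||^2 / (2 σ^2)), and the marginalization above reduces to summing these likelihoods over the M templates of each object. Below is a minimal Mathematica sketch of this ideal observer; the random stand-in templates, the noise level sigma, and the sizes npix, nObj, nView are made-up assumptions for illustration, not the Tjan et al. stimuli.

(* Snapshot ideal observer: classify noisy image data X as one of nObj objects,
   marginalizing over the nView stored templates Vij of each object.
   With gaussian pixel noise and flat priors,
   p(Oi | X) ~ Sum over j of Exp[-||X - Vij||^2/(2 sigma^2)] *)
npix = 16; nObj = 4; nView = 8; sigma = .5;
views = Table[RandomReal[{0, 1}, npix], {nObj}, {nView}];  (* stand-in templates *)
X = views[[2, 5]] + RandomVariate[NormalDistribution[0, sigma], npix];
   (* simulate a noisy rendering of view 5 of object 2 *)
post = Table[
   Sum[Exp[-Total[(X - views[[i, j]])^2]/(2 sigma^2)], {j, nView}], {i, nObj}];
post = post/Total[post];  (* normalized posterior p(Oi | X) *)
Ordering[post, -1]  (* MAP object; should usually return {2} *)

Human efficiency for each type of image information is then measured by comparing the signal energy this ideal observer needs with the energy human observers need to reach the same recognition accuracy.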

