U of M PSY 5038 - Unifying neural network computations using Bayesian decision theory

Introduction to Neural Networks
Unifying neural network computations using Bayesian decision theory

Initialize:

<< "MultivariateStatistics`"
Off[General::spell1];
SetOptions[ListDensityPlot, ImageSize -> Small];
SetOptions[DensityPlot, ImageSize -> Small];
SetOptions[ContourPlot, ImageSize -> Small];

Introduction

Last time

Review: Graphical Models of dependence

Natural patterns are complex, and in general it is difficult and often impractical to build a detailed quantitative generative model. But natural inputs, such as sounds and images, do have regularities, and we can get insight into the problem by considering how various factors might produce them. One way to begin simplifying the problem is to note that not all variables have a direct influence on each other. So draw a graph in which lines only connect variables that influence each other. We use directed graphs to represent conditional probabilities.

Basic rules: Condition on what is known, and integrate out what you don't care about

Condition on what is known:

Given a state of the world S and inputs I, the "universe" of possibilities is:

(1)  p(S, I)

If we know I (i.e. the visual system has measured some image feature I), the joint can be turned into a conditional (posterior):

(2)  p(S | I) = p(S, I) / p(I)

Integrate out what we don't care about

We don't care to estimate the noise (or other generic, nuisance, or secondary variables):

(3)  p(S_signal | I) = Σ_{S_noise} p(S_signal, S_noise | I),   or, if continuous,   ∫ p(S_signal, S_noise | I) dS_noise

This is called "integrating out" or "marginalization".

Graphical models and general inference

Three types of nodes in a graphical model: known, unknown to be estimated, unknown to be integrated out (marginalized)

We have three basic states for nodes in a graphical model:
- known
- unknown, to be estimated
- unknown, to be integrated out (marginalized)
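The two rules above can be sketched numerically. This is a minimal illustration in Python rather than the notebook's Mathematica, using a small hypothetical 2x2x2 joint table over (S_signal, S_noise, I); the entries are made up for the example, not values from the lecture.

```python
import numpy as np

# Hypothetical joint p(S_signal, S_noise, I).
# Axes: 0 = S_signal, 1 = S_noise, 2 = I. Entries sum to 1.
joint = np.array([[[0.10, 0.05], [0.08, 0.02]],
                  [[0.20, 0.15], [0.25, 0.15]]])

# Rule 1 -- condition on what is known, say we measured I = 1:
#   p(S_signal, S_noise | I=1) = p(S_signal, S_noise, I=1) / p(I=1)
p_I1 = joint[:, :, 1].sum()            # p(I=1): marginalize the joint over S
posterior = joint[:, :, 1] / p_I1      # p(S_signal, S_noise | I=1)

# Rule 2 -- integrate out what we don't care about (the noise):
#   p(S_signal | I=1) = sum over S_noise of p(S_signal, S_noise | I=1)
p_signal_given_I1 = posterior.sum(axis=1)

print(p_I1, p_signal_given_I1)
```

Conditioning is a slice-and-renormalize; marginalization is a sum along the axis of the variable we do not care about.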
Many problems in perception and cognition can be approached by first analyzing the causes that produce the sensory input. A causal state of the world S gets mapped to some input data I, perhaps through some intermediate parameters L, i.e. S -> L -> I. One can then ask what kind of information needs to be extracted about the causes in order to achieve some behavioral goal. So for example, face identity S determines facial shape L; L, together with other factors like illumination, in turn determines the image input data I itself. Consider three very broad types of task:

Data inference: synthesis

Data synthesis (generative or forward model): we want to model I through p(I|S). In our example, we want to specify "Bill", and then p(I|S = "Bill") can be implemented as an algorithm that spits out images of Bill. If there is an intermediate variable L, it gets integrated out.

Hypothesis ("inverse") inference or estimation

Hypothesis inference: we want to model samples for S: p(S|I). Given an image, we want to output likely object identities, so that we can minimize risk, or do MAP classification for accurate identification. Again, if there is an intermediate variable L, it gets integrated out.

Learning (parameter inference)

Learning can also be viewed as estimation: we want to model L: p(L|I, S), to learn how the intermediate variables are distributed. Given lots of samples of outputs and their inputs, we want to learn the mapping parameters between them. (Alternatively, do a mental switch and consider a neural network in which an input S gets mapped to an output I through intermediate variables L. We can think of L as representing synaptic weights to be learned.)

Two basic examples in standard statistics are:
- Regression: estimating parameters that provide a good fit to data, e.g. the slope and intercept of a straight line through points {xi, yi}.
- Density estimation: regression on a probability density function, with the added condition that the area under the fitted curve must sum to one.

Recall: Fruit classification example

The graph specifies how to decompose the joint probability:

p[F, C, Is, Ic] = p[Ic | C] p[C | F] p[Is | F] p[F]

Three MAP tasks

- Pick the most probable fruit AND color -- answer: "red tomato"
- Pick the most probable color -- answer: "red"
- Pick the most probable fruit -- answer: "apple"

Why didn't "red tomato", the most probable fruit/color combination, predict that the most probable fruit is apple?

Some basic graph types in vision

Basic Bayes

p(S | I) = p(I | S) p(S) / p(I)

where S is the scene, I is the image data, and I = f(S).

We'd like to have p(S|I), the posterior probability of the scene given the image -- i.e. what you get when you condition the joint by the image data. The posterior is often what we'd like to base our decisions on because, as we discuss below, picking the hypothesis S that maximizes the posterior (i.e. maximum a posteriori or MAP estimation) minimizes the average probability of error.

p(S) is the prior probability of the scene.

p(I|S) is the likelihood of the scene. Note this is a probability of I, but not of S.

See: Sinha, P., & Adelson, E. (1993). Recovering
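The fruit/color puzzle above can be made concrete with a small numerical sketch in Python. The joint probabilities below are made up for illustration (they are not the lecture's actual tables), chosen so that the most probable (fruit, color) pair and the most probable fruit marginal disagree.

```python
# Hypothetical joint p(fruit, color); the four entries sum to 1.
p = {("apple", "red"): 0.30, ("apple", "green"): 0.30,
     ("tomato", "red"): 0.35, ("tomato", "green"): 0.05}

# MAP over the pair: argmax of the joint.
best_pair = max(p, key=p.get)

# MAP over fruit alone: first marginalize out color, then take the argmax.
p_fruit = {}
for (fruit, color), prob in p.items():
    p_fruit[fruit] = p_fruit.get(fruit, 0.0) + prob
best_fruit = max(p_fruit, key=p_fruit.get)

# (tomato, red) wins jointly (0.35 > 0.30), but apple's probability is spread
# over two colors whose sum exceeds tomato's total (0.60 > 0.40).
print(best_pair, best_fruit)
```

This is the resolution of the puzzle: the argmax of a joint and the argmax of its marginal need not agree, because marginalizing pools probability across the integrated-out variable.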

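The Basic Bayes computation can also be sketched for a discrete scene variable. The priors and likelihoods below are illustrative numbers, not values from the lecture; the point is only the mechanics of p(S|I) = p(I|S) p(S) / p(I) and the MAP choice.

```python
# Hypothetical prior p(S) and likelihood p(I|S) for one observed image I.
priors = {"scene_A": 0.7, "scene_B": 0.3}
likelihood = {"scene_A": 0.2, "scene_B": 0.9}

# p(I) = sum over S of p(I|S) p(S)  -- the normalizing constant.
p_I = sum(likelihood[s] * priors[s] for s in priors)

# Posterior: p(S|I) = p(I|S) p(S) / p(I).
posterior = {s: likelihood[s] * priors[s] / p_I for s in priors}

# MAP estimate: the scene hypothesis with the highest posterior.
map_scene = max(posterior, key=posterior.get)
print(posterior, map_scene)
```

Note that with these numbers the prior favors scene_A, but the data are much more likely under scene_B, so the posterior (and hence the MAP choice) flips to scene_B.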
