U of M PSY 5036W - Probability Overview

Probability Overview

Initialize standard library files:

Off[General::spell1];

The next package is needed for the add-on multivariate Gaussian.

Goals

- Review the basics of probability distributions and statistics
- More on generative modeling: drawing samples
- Graphical models for inference
- Optimal inference and task dependence

Probability overview

Random variables, discrete probabilities, probability densities, cumulative distributions

Discrete: a random variable X can take on a finite set of discrete values, X = {x(1), ..., x(N)}, whose probabilities sum to one:

\sum_{i=1}^{N} p_i = \sum_{i=1}^{N} p(X = x(i)) = 1

Densities: X takes on continuous values, x, in some range.

Density: p(x)

Analogous to material mass, we can think of the probability over some small domain of the random variable as "probability mass":

\mathrm{prob}(x < X < x + dx) = \int_{x}^{x+dx} p(X) \, dX

\mathrm{prob}(x < X < x + dx) \approx p(x) \, dx

With the mass analogy, however, an object (the event space) always "weighs" 1:

\int_{-\infty}^{\infty} p(x) \, dx = 1

Cumulative distribution:

\mathrm{prob}(X < x) = \int_{-\infty}^{x} p(X) \, dX

Densities of discrete random variables

The Dirac delta function, \delta[\cdot], allows us to use the mathematics of continuous distributions for discrete ones, by defining the density as:

p(x) = \sum_{i=1}^{N} p_i \, \delta(x - x(i)), \quad \text{where } \delta(x - x(i)) = \begin{cases} \infty & \text{for } x = x(i) \\ 0 & \text{for } x \neq x(i) \end{cases}

Think of the delta function, \delta[\cdot], as \epsilon wide and 1/\epsilon tall, and then let \epsilon \to 0, so that:

\int_{-\infty}^{\infty} \delta(y) \, dy = 1

The density p(x) is a series of spikes. It is infinitely high only at those points for which x = x(i), and zero elsewhere. But "infinity" is scaled so that the local mass, or area, around each point x(i) is p_i.

Joint probabilities

\mathrm{Prob}(X \text{ AND } Y) = p(X, Y)

Joint density: p(x, y)

Three basic rules of probability

Suppose we know everything there is to know about a set of variables (A, B, C, D, E). What does this mean in terms of probability? It means that we know the joint distribution, p(A, B, C, D, E).
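The discrete definitions above (probabilities p_i that sum to one) connect directly to one of the course goals, drawing samples. The notebook's own code is Mathematica; here is a minimal Python sketch with hypothetical values and probabilities:

```python
import random
from collections import Counter

# Hypothetical discrete distribution: values x(i) with probabilities p_i.
values = [1, 2, 3]
probs = [0.2, 0.3, 0.5]

# The probabilities must sum to one: sum_i p(X = x(i)) = 1.
assert abs(sum(probs) - 1.0) < 1e-12

# Generative modeling: draw samples from the distribution.
random.seed(0)
samples = random.choices(values, weights=probs, k=100_000)

# Empirical frequencies approach the true probabilities p_i.
counts = Counter(samples)
for v, p in zip(values, probs):
    print(v, counts[v] / len(samples), p)
```

With enough samples the empirical frequency of each x(i) converges to p_i, which is the frequentist reading of the normalization condition above.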
In other words, for any particular combination of values (A = a, B = b, C = c, D = d, E = e), we can calculate, look up in a table, or determine some way or another the number p(A = a, B = b, C = c, D = d, E = e). Deterministic relationships are special cases.

Rule 1: Conditional probabilities from joints: the product rule

The probability of an event changes when new information is gained.

Prob(X given Y) = p(X | Y)

p(X \mid Y) = \frac{p(X, Y)}{p(Y)}

p(X, Y) = p(X \mid Y) \, p(Y)

The form of the product rule is the same for densities as for probabilities.

Rule 2: Lower-dimensional probabilities from joints: the sum rule (marginalization)

p(X) = \sum_{i=1}^{N} p(X, Y(i))

p(x) = \int_{-\infty}^{\infty} p(x, y) \, dy

Rule 3: Bayes' rule

From the product rule, and since p(X, Y) = p(Y, X), we have:

p(Y \mid X) = \frac{p(X \mid Y) \, p(Y)}{p(X)}

and, using the sum rule,

p(Y \mid X) = \frac{p(X \mid Y) \, p(Y)}{\sum_{Y} p(X, Y)}

Bayes terminology in inference

Suppose we have some partial data (we see half of someone's face), and we want to recall or complete the whole. Or suppose that we hear a voice and, from that, visualize the face. These are both problems of statistical inference. We've already studied how to complete a partial pattern using energy minimization, and how energy minimization can be viewed as probability maximization.

We typically think of the Y term as a random variable over the hypothesis space (a face), and X as data or a stimulus (a partial face, or a sound). So for recalling a pattern Y from an input stimulus X, we'd like to have a function that tells us:

p(Y | X), called the posterior probability of the hypothesis (face) given the stimulus (partial face or sound): what you get when you condition the joint by the stimulus data. The posterior is often what we'd like to base our decisions on, because it can be proved that picking the hypothesis Y which maximizes the posterior (maximum a posteriori, or MAP, estimation) minimizes the average probability of error.

p(Y) is the prior probability of the hypothesis. Some hypotheses are more likely than others.
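The three rules can be checked numerically on a small joint table. This is a Python sketch, not the notebook's Mathematica, and the 2x2 joint distribution is made up for illustration:

```python
# Hypothetical joint distribution p(X, Y) over two binary variables.
joint = {("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
         ("x1", "y0"): 0.2, ("x1", "y1"): 0.4}

# Sum rule: p(X) = sum_Y p(X, Y), and likewise for p(Y).
def marginal_x(x):
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

# Product rule: p(Y | X) = p(X, Y) / p(X).
def cond_y_given_x(y, x):
    return joint[(x, y)] / marginal_x(x)

# Bayes' rule: p(Y | X) = p(X | Y) p(Y) / p(X),
# with the likelihood p(X | Y) itself obtained from the product rule.
def bayes_y_given_x(y, x):
    likelihood = joint[(x, y)] / marginal_y(y)   # p(X | Y)
    return likelihood * marginal_y(y) / marginal_x(x)

# Both routes to the conditional agree, as Rule 3 promises.
print(cond_y_given_x("y1", "x0"), bayes_y_given_x("y1", "x0"))
```

Everything here is computed from the joint alone, which is the point of the opening remark: knowing the joint means you can derive every marginal and conditional you might need.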
Given a context, such as your room, some faces are a priori more likely than others. For me, an image patch stimulating my retina in my kitchen is much more likely to be my wife's face than my brother's (he lives in another state). This shows that priors are contingent, i.e. conditional on context: p(Y | context).

p(X | Y) is the likelihood of the hypothesis. Note that this is a probability of X, but not of Y. (The sum over X is one, but the sum over Y isn't necessarily one.)

Bayes terminology in visual perception

p(S \mid I) = \frac{p(I \mid S) \, p(S)}{p(I)}

Usually, we will be thinking of the Y term as a random variable over the hypothesis space, and X as data. So for visual inference, Y = S (the scene), X = I (the image data), and I = f(S). We'd like to have:

p(S | I), the posterior probability of the scene given the image: what you get when you condition the joint by the image data. The posterior is often what we'd like to base our decisions on because, as we discuss below, picking the hypothesis S which maximizes the posterior (i.e. MAP estimation) minimizes the average probability of error.

p(S) is the prior probability of the scene.

p(I | S) is the likelihood of the scene. Note that this is a probability of I, but not of S.

Independence

Knowledge of one event doesn't change the probability of another event:

p(X) = p(X \mid Y)

p(X, Y) = p(X) \, p(Y)

Density mapping theorem

Suppose we have a change of variables that maps a discrete set of x's uniquely to y's: X -> Y.
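The posterior/prior/likelihood terminology and MAP estimation can be made concrete with a toy two-hypothesis version of the kitchen example, sketched in Python; the numbers are invented for illustration:

```python
# Hypothetical hypotheses Y (whose face is it?) with a context-dependent
# prior p(Y | kitchen), and a likelihood p(X | Y) for one observed stimulus X.
prior = {"wife": 0.9, "brother": 0.1}
likelihood = {"wife": 0.4, "brother": 0.7}

# Posterior p(Y | X) = p(X | Y) p(Y) / p(X), where the evidence p(X)
# comes from the sum rule: p(X) = sum_Y p(X | Y) p(Y).
evidence = sum(likelihood[y] * prior[y] for y in prior)
posterior = {y: likelihood[y] * prior[y] / evidence for y in prior}

# MAP estimation: pick the hypothesis that maximizes the posterior.
map_hypothesis = max(posterior, key=posterior.get)
print(posterior, map_hypothesis)
```

Note that even though the stimulus is more likely under the "brother" hypothesis (higher likelihood), the strong contextual prior makes "wife" the MAP choice, which is exactly the contingent-prior point made above.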
The mapping just corresponds to a change of labels, so the probabilities are unchanged: p(X) = p(Y).

Continuous random variables

The form of the probability density function does change, because we require the probability "mass" to be unchanged: p(x) dx = p(y) dy.

Suppose y = f(x). Then:

p_Y(y) \, dy = p_X(x) \, dx

In higher dimensions, the transformation is done by multiplying the density by the Jacobian, the determinant of the matrix of partial derivatives of the change of coordinates.

p_Y(y) = \int \delta(y - f(x)) \, p_X(x) \, dx

over each monotonic part of f.

Convolution theorem for adding rvs

Let x be distributed as g(x), and y as h(y). Then the probability density for z = x + y is f(z):

(1)    f(z) = \int g(s) \, h(z - s) \, ds

Statistics

Expectation and variance

Analogous to center of mass, the definition of the expectation, or average:

\mathrm{Average}[X] = \bar{X} = E[X] = \sum_i x(i) \, p(x(i)) \approx \frac{1}{N} \sum_{i=1}^{N} x_i

\mu = E[X] = \int x \, p(x) \, dx

Some rules:

E[X + Y] = E[X] + E[Y]
E[aX] = a \, E[X]
E[X + a] = a + E[X]

Definition of variance:

\sigma^2 = \mathrm{Var}[X] = E[(X - \mu)^2] = \sum_{j=1}^{N} p(x(j)) \, (x(j) - \mu)^2 = \sum_{j=1}^{N} p_j (x_j - \mu)^2

\mathrm{Var}[X] = \int (x - \mu)^2 \, p(x) \, dx
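The convolution theorem and the expectation rules can be checked together by Monte Carlo. This is a Python sketch (the notebook itself uses Mathematica), with both variables taken uniform on [0, 1) purely as an illustrative assumption; for independent rvs, z = x + y satisfies E[Z] = E[X] + E[Y], and Var[Z] = Var[X] + Var[Y]:

```python
import random
import statistics

random.seed(1)
N = 100_000

# x ~ g and y ~ h: two independent random variables, here both uniform on [0, 1).
xs = [random.random() for _ in range(N)]
ys = [random.random() for _ in range(N)]

# z = x + y has density f(z) = (g * h)(z); for two uniforms this is the
# triangular density on [0, 2], with E[Z] = 1 and Var[Z] = 1/12 + 1/12 = 1/6.
zs = [x + y for x, y in zip(xs, ys)]

# Expectation rule: E[X + Y] = E[X] + E[Y].
print(statistics.mean(zs), statistics.mean(xs) + statistics.mean(ys))

# Variance adds for independent rvs: Var[X + Y] = Var[X] + Var[Y].
print(statistics.variance(zs), statistics.variance(xs) + statistics.variance(ys))
```

The sample mean of z lands near 1 and its sample variance near 1/6, matching the convolution of the two uniform densities.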

