UB CSE 555 - Bayes Decision Theory

Contents:
• Minimum-Error-Rate Classification
• Minimum Error Rate Classifier Derivation
• Classifiers, Discriminant Functions and Decision Surfaces
• Forms of Discriminant Functions
• Decision Region
• The Two-Category Case
• The Normal Distribution
• Relationship between Entropy and Normal Density
• Normal Distribution, Mean 0, Standard Deviation 1
• The Normal Density in Pattern Recognition
• Multivariate Density
• Mean and Covariance Matrix
• Multivariate Normal Density
• Linear Combinations of Normally Distributed Variables
• Mahalanobis Distance

Minimum-Error-Rate Classification
• Actions are decisions on classes.
• If action α_i is taken and the true state of nature is ω_j, then the decision is correct if i = j and in error if i ≠ j.
• Seek a decision rule that minimizes the probability of error, which is the error rate.

Minimum Error Rate Classifier Derivation
• Zero-one loss function:

$$\lambda(\alpha_i, \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \dots, c$$

• Therefore, the conditional risk is:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$

The risk corresponding to this loss function is the average probability of error.
• Since R(α_i | x) = 1 − P(ω_i | x), minimizing the risk requires maximizing the posterior P(ω_i | x). For minimum error rate: decide ω_i if P(ω_i | x) > P(ω_j | x) for all j ≠ i.

Likelihood Ratio Classification
• For a general two-category loss function, let

$$\theta_\lambda = \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$$

Then decide ω_1 if the likelihood ratio satisfies p(x | ω_1) / p(x | ω_2) > θ_λ.
• If λ is the zero-one loss function, $\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, then $\theta_a = \frac{P(\omega_2)}{P(\omega_1)}$; if $\lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}$, then $\theta_b = \frac{2P(\omega_2)}{P(\omega_1)}$.
[Figure: class-conditional pdfs p(x | ω_1) and p(x | ω_2). If we use a zero-one loss function, the decision boundaries are determined by the threshold θ_a. If the loss function penalizes miscategorizing ω_2 as ω_1 more than the converse, we get the larger threshold θ_b, and hence the region R_1 becomes smaller.]

Classifiers, Discriminant Functions and Decision Surfaces
• There are many methods of representing pattern classifiers; one is a set of discriminant functions g_i(x), i = 1, …, c.
• The classifier assigns a feature vector x to class ω_i if g_i(x) > g_j(x) for all j ≠ i.
• A classifier is a machine that computes c discriminant functions.
[Figure: functional structure of a general statistical pattern classifier with d inputs and c discriminant functions g_i(x).]

Forms of Discriminant Functions
• Let g_i(x) = −R(α_i | x): the maximum discriminant corresponds to the minimum risk.
• For the minimum error rate, we take g_i(x) = P(ω_i | x): the maximum discriminant corresponds to the maximum posterior. Equivalent forms (a sketch in code follows below):

$$g_i(x) = p(x \mid \omega_i)\, P(\omega_i) \qquad\text{or}\qquad g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$
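To make the decision rule concrete, here is a minimal sketch in Python. The class-conditional densities, their parameters, and the priors are invented for illustration (they are not from the slides); the code evaluates g_i(x) = ln p(x | ω_i) + ln P(ω_i) for each class and decides the class with the largest discriminant, which is the minimum-error-rate rule under zero-one loss.

```python
import numpy as np

# Hypothetical two-category problem: univariate Gaussian class-conditional
# densities p(x | omega_i) with priors P(omega_i). All numbers are made up.
PRIORS = {1: 0.6, 2: 0.4}                 # P(omega_1), P(omega_2)
PARAMS = {1: (0.0, 1.0), 2: (2.0, 1.5)}   # (mean, std) of p(x | omega_i)

def log_gaussian(x, mu, sigma):
    """ln of the univariate normal density with mean mu and std sigma."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

def discriminant(x, i):
    """g_i(x) = ln p(x | omega_i) + ln P(omega_i)."""
    mu, sigma = PARAMS[i]
    return log_gaussian(x, mu, sigma) + np.log(PRIORS[i])

def decide(x):
    """Minimum-error-rate rule: choose the class with the largest g_i(x)."""
    return max(PARAMS, key=lambda i: discriminant(x, i))

print(decide(0.5))   # near the omega_1 mean -> decides class 1
print(decide(2.5))   # near the omega_2 mean -> decides class 2
```

Because ln is monotonically increasing, comparing these log discriminants is equivalent to comparing the posteriors P(ω_i | x) themselves; for two categories it reduces to the likelihood-ratio test against θ_λ.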
Decision Region
• The feature space is divided into c decision regions: if g_i(x) > g_j(x) for all j ≠ i, then x is in R_i.
[Figure: a 2-D, two-category classifier with Gaussian pdfs. The decision boundary consists of two hyperbolas, so decision region R_2 is not simply connected. Ellipses mark where the density is 1/e times that of the peak of each distribution.]

The Two-Category Case
• A classifier with two discriminant functions g_1 and g_2 is a dichotomizer.
• Let g(x) ≡ g_1(x) − g_2(x). Decide ω_1 if g(x) > 0; otherwise decide ω_2.
• The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x) = \ln \frac{p(x \mid \omega_1)}{p(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$

The Normal Distribution
• A bell-shaped distribution defined by the probability density function

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

• If the random variable X follows a normal distribution, then:
• The probability that X will fall into the interval (a, b) is given by $\int_a^b p(x)\,dx$.
• The expected, or mean, value of X is $E[X] = \int_{-\infty}^{\infty} x\,p(x)\,dx = \mu$.
• The variance of X is $\mathrm{Var}(X) = E[(X-\mu)^2] = \int_{-\infty}^{\infty} (x-\mu)^2\,p(x)\,dx = \sigma^2$.
• The standard deviation of X, $\sigma_x$, is $\sigma$.

Relationship between Entropy and Normal Density
• Entropy of a distribution:

$$H(p(x)) = -\int_{-\infty}^{\infty} p(x) \ln p(x)\, dx$$

• Entropy is measured in nats; if log_2 is used, the unit is bits.
• Entropy measures the uncertainty in the values of points selected randomly from a distribution.
• The normal distribution has maximum entropy over all distributions having a given mean and variance.

Normal Distribution, Mean 0, Standard Deviation 1
• With 80% confidence the random variable will lie in the two-sided interval [−1.28, 1.28].

The Normal Density in Pattern Recognition
• Univariate density:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

where µ is the mean (or expected value) of x and σ² is the expected squared deviation, or variance.
• Analytically tractable and continuous.
• Many processes are asymptotically Gaussian. By the Central Limit Theorem, the aggregate effect of a sum of a large number of small, independent random disturbances leads to a Gaussian distribution.
• Handwritten characters and speech sounds can be modeled as an ideal prototype corrupted by a random process.
• The univariate normal distribution has roughly 95% of its area in the range |x − µ| < 2σ. The peak of the distribution has value p(µ) = 1/(√(2π) σ).

Multivariate Density
• The multivariate normal density in d dimensions, abbreviated p(x) ~ N(µ, Σ), is:

$$p(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(x-\mu)^t\, \Sigma^{-1}\, (x-\mu)\right]$$

where:
• x = (x_1, x_2, …, x_d)^t (t stands for the transpose vector form)
• µ = (µ_1, µ_2, …, µ_d)^t is the mean vector
• Σ is the d×d covariance matrix
• |Σ| and Σ^{−1} are its determinant and inverse, respectively

Mean and Covariance Matrix
• Formal definitions:

$$\mu = E[x] = \int x\, p(x)\, dx$$

$$\Sigma = E[(x-\mu)(x-\mu)^t] = \int (x-\mu)(x-\mu)^t\, p(x)\, dx$$

• The components of the mean vector are the means of the individual variables.
• Covariance: the diagonal elements are the variances of the variables; the off-diagonal elements are the covariances of pairs of variables.
• Statistical independence means the off-diagonal elements are zero.

Multivariate Normal Density
• Specified by d + d(d+1)/2 parameters: the mean and the independent elements of the covariance matrix.
• Loci of points of constant density are hyperellipsoids (a numerical sketch follows below).
[Figure: samples drawn from a 2-D Gaussian lie in a cloud centered at the mean µ; ellipses show lines of equal probability density of the Gaussian.]
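As a numerical companion to the formulas above, here is a small sketch (the mean vector and covariance matrix are chosen arbitrarily for the demo): it evaluates the d-dimensional normal density through the quadratic form (x − µ)^t Σ^{−1} (x − µ), the squared Mahalanobis distance listed in the contents, and checks the formal definitions of µ and Σ against sample estimates.

```python
import numpy as np

def mvn_density(x, mu, sigma):
    """Evaluate the d-dimensional normal density N(mu, Sigma) at x."""
    d = mu.shape[0]
    diff = x - mu
    # (x - mu)^t Sigma^{-1} (x - mu): the squared Mahalanobis distance.
    maha_sq = diff @ np.linalg.inv(sigma) @ diff
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * maha_sq) / norm_const

# Illustrative 2-D parameters (made up for demonstration).
mu = np.array([1.0, 2.0])
sigma = np.array([[2.0, 0.5],     # diagonal: variances
                  [0.5, 1.0]])    # off-diagonal: covariances

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, sigma, size=10_000)

# Sample estimates approximate the formal definitions E[x] and
# E[(x - mu)(x - mu)^t].
print(samples.mean(axis=0))           # close to mu
print(np.cov(samples, rowvar=False))  # close to Sigma
print(mvn_density(mu, mu, sigma))     # peak density, at x = mu
```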
Linear Combinations of Normally Distributed Variables Are Normally Distributed
• Whitening transform:

$$A_w = \Phi\, \Lambda^{-1/2}$$

where Φ is the matrix whose columns are the orthonormal eigenvectors of Σ, and Λ is the diagonal matrix of the corresponding eigenvalues. The transformation A_w, applied to the coordinates, ensures that the transformed distribution has a covariance matrix equal to the identity matrix.
• Action of a linear …
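A quick numerical check of the whitening transform may help; the covariance matrix below is arbitrary, chosen only for the demo. The sketch builds A_w = ΦΛ^{−1/2} from the eigendecomposition of Σ and verifies that applying it to centered samples yields an approximately identity covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated 2-D Gaussian with arbitrary (demo-only) parameters.
mu = np.array([0.0, 0.0])
sigma = np.array([[3.0, 1.2],
                  [1.2, 1.0]])
x = rng.multivariate_normal(mu, sigma, size=50_000)

# Eigendecomposition Sigma = Phi Lambda Phi^t; eigh returns the
# eigenvalues and the orthonormal eigenvectors (columns of phi).
eigvals, phi = np.linalg.eigh(sigma)

# Whitening transform A_w = Phi Lambda^{-1/2}.
a_w = phi @ np.diag(eigvals ** -0.5)

# y = A_w^t (x - mu), written row-wise; Cov(y) = A_w^t Sigma A_w = I.
y = (x - mu) @ a_w

print(np.cov(y, rowvar=False).round(2))  # approximately the identity matrix
```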

