UCSD ECE 271A - The Gaussian Classifier


The Gaussian classifier
Nuno Vasconcelos, ECE Department, UCSD

Bayesian decision theory

Recall that we have:
• Y - the state of the world
• X - the observations
• g(x) - the decision function
• L[g(x), y] - the loss of predicting y with g(x)

The Bayes decision rule is the rule that minimizes the risk

    Risk = E_{X,Y}\left[ L(g(X), Y) \right]

Given x, it consists of picking the prediction of minimum conditional risk

    g^*(x) = \arg\min_{g(x)} \sum_{i=1}^{M} P_{Y|X}(i|x) \, L[g(x), i]

MAP rule

For the "0-1" loss

    L[g(x), y] = \begin{cases} 1, & g(x) \neq y \\ 0, & g(x) = y \end{cases}

the optimal decision rule is the maximum a posteriori probability (MAP) rule

    g^*(x) = \arg\max_i P_{Y|X}(i|x)

The associated risk is the probability of error of this rule (the Bayes error); there is no other decision function with lower error.

MAP rule

By application of simple mathematical laws (Bayes rule, monotonicity of the log), we have shown that the following three decision rules are optimal and equivalent:

• 1) i^*(x) = \arg\max_i P_{Y|X}(i|x)
• 2) i^*(x) = \arg\max_i \left[ P_{X|Y}(x|i) \, P_Y(i) \right]
• 3) i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]
• 1) is usually hard to use; 3) is frequently easier than 2)

Example

The Bayes decision rule is usually highly intuitive. We have used an example from communications:
• a bit is transmitted by a source, corrupted by noise, and received by a decoder

    Y → channel → X

• Q: what should the optimal decoder do to recover Y?

Example

This was modeled as a classification problem with Gaussian classes

    G(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

    P_{X|Y}(x|0) = G(x, \mu_0, \sigma), \qquad P_{X|Y}(x|1) = G(x, \mu_1, \sigma)

or, graphically, two Gaussian bumps of equal width centered at \mu_0 and \mu_1.

BDR

For this problem the optimal decision boundary is a threshold:
• pick "0" if

    x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}

and pick "1" otherwise. (In the figure, "pick 0" is the region to the left of the threshold and "pick 1" the region to the right.)

BDR

Back to our signal decoding problem:
• in this case the threshold is T = 0.5
• decision rule:

    Y = \begin{cases} 0, & \text{if } x < 0.5 \\ 1, & \text{if } x > 0.5 \end{cases}

• this is intuitive: we place the threshold midway between the two means and adapt it according to the class priors

BDR

What is the role of the prior for the class probabilities?

    x < \frac{\mu_0 + \mu_1}{2} + \frac{\sigma^2}{\mu_1 - \mu_0} \log \frac{P_Y(0)}{P_Y(1)}

• the prior moves the threshold up or down, depending on the probabilities of 0s and 1s

How relevant is the prior?
• it is weighed by the inverse of the normalized distance between the means, (\mu_1 - \mu_0)/\sigma^2
• if the classes are very far apart, the prior makes no difference
• if the classes are exactly equal (same mean), the prior gets infinite weight
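To make the 1-D rule concrete, here is a minimal Python sketch of the threshold computation. The function names and the numeric values are illustrative assumptions, not part of the lecture; it assumes the equal-variance model above with \mu_1 > \mu_0:

```python
import math

def bdr_threshold(mu0, mu1, sigma, p0, p1):
    """Threshold T of the 1-D Bayes decision rule for two equal-variance
    Gaussian classes: pick "0" if x < T, else pick "1" (assumes mu1 > mu0)."""
    return (mu0 + mu1) / 2 + (sigma**2 / (mu1 - mu0)) * math.log(p0 / p1)

def decode(x, mu0=0.0, mu1=1.0, sigma=0.2, p0=0.5, p1=0.5):
    """Hypothetical optimal decoder for the bit-transmission example."""
    return 0 if x < bdr_threshold(mu0, mu1, sigma, p0, p1) else 1

# With equal priors the threshold sits midway between the means:
print(bdr_threshold(0.0, 1.0, 0.2, 0.5, 0.5))  # 0.5
# A higher prior on "0" pushes the threshold up (more x values map to 0):
print(bdr_threshold(0.0, 1.0, 0.2, 0.9, 0.1))  # ~0.588
```

Note how the prior's shift of the threshold is scaled by \sigma^2/(\mu_1 - \mu_0), matching the discussion above: well-separated classes make the prior nearly irrelevant.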
The Gaussian classifier

This is one example of a Gaussian classifier:
• in practice we rarely have only one variable
• typically X = (X_1, ..., X_n) is a vector of observations

The BDR for this case is equivalent, but more interesting. The central difference is that the class-conditional distributions are multivariate Gaussian

    P_{X|Y}(x|i) = \frac{1}{\sqrt{(2\pi)^d |\Sigma_i|}} \exp\left\{ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right\}

The Gaussian classifier

In this case the BDR

    i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]

becomes

    i^*(x) = \arg\max_i \left[ -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2} \log\left( (2\pi)^d |\Sigma_i| \right) + \log P_Y(i) \right]

The Gaussian classifier

This can be written as

    i^*(x) = \arg\min_i \left[ d_i(x, \mu_i) + \alpha_i \right]

with

    d_i(x, y) = (x - y)^T \Sigma_i^{-1} (x - y)
    \alpha_i = \log\left( (2\pi)^d |\Sigma_i| \right) - 2 \log P_Y(i)

The optimal rule is to assign x to the closest class, where "closest" is measured with the Mahalanobis distance d_i(x, y), to which the constant \alpha_i is added to account for the class prior. (In the figure, the discriminant is the boundary where P_{Y|X}(1|x) = 0.5.)

The Gaussian classifier

First special case of interest:
• all classes have the same covariance, \Sigma_i = \Sigma, \forall i

The BDR becomes

    i^*(x) = \arg\min_i \left[ d(x, \mu_i) + \alpha_i \right]

with

    d(x, y) = (x - y)^T \Sigma^{-1} (x - y)   (the same metric for all classes)
    \alpha_i = \log\left( (2\pi)^d |\Sigma| \right) - 2 \log P_Y(i)

The term \log((2\pi)^d |\Sigma|) is constant (not a function of i) and can be dropped.

The Gaussian classifier

In detail:

    i^*(x) = \arg\min_i \left[ (x - \mu_i)^T \Sigma^{-1} (x - \mu_i) - 2 \log P_Y(i) \right]
           = \arg\min_i \left[ x^T \Sigma^{-1} x - 2 \mu_i^T \Sigma^{-1} x + \mu_i^T \Sigma^{-1} \mu_i - 2 \log P_Y(i) \right]

Since x^T \Sigma^{-1} x does not depend on i, it can be dropped:

    i^*(x) = \arg\max_i \Big[ \underbrace{\mu_i^T \Sigma^{-1}}_{w_i^T} \, x \; \underbrace{- \tfrac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i)}_{w_{i0}} \Big]

The Gaussian classifier

In summary,

    i^*(x) = \arg\max_i g_i(x)

with

    g_i(x) = w_i^T x + w_{i0}
    w_i = \Sigma^{-1} \mu_i
    w_{i0} = -\frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i)

• the BDR is a linear function, or a linear discriminant

Geometric interpretation

Classes i and j share a boundary if
• there is a set of x such that

    g_i(x) = g_j(x)

• or

    (w_i - w_j)^T x + (w_{i0} - w_{j0}) = 0

i.e.

    (\mu_i - \mu_j)^T \Sigma^{-1} x - \frac{1}{2} \mu_i^T \Sigma^{-1} \mu_i + \log P_Y(i) + \frac{1}{2} \mu_j^T \Sigma^{-1} \mu_j - \log P_Y(j) = 0

Geometric interpretation

Note that this can be written as

    (\mu_i - \mu_j)^T \Sigma^{-1} x - \frac{1}{2} \left( \mu_i^T \Sigma^{-1} \mu_i - \mu_j^T \Sigma^{-1} \mu_j \right) + \log \frac{P_Y(i)}{P_Y(j)} = 0

Next, we use

    \mu_i^T \Sigma^{-1} \mu_i - \mu_j^T \Sigma^{-1} \mu_j
        = \mu_i^T \Sigma^{-1} \mu_i - \mu_i^T \Sigma^{-1} \mu_j + \mu_i^T \Sigma^{-1} \mu_j - \mu_j^T \Sigma^{-1} \mu_j
        = \mu_i^T \Sigma^{-1} (\mu_i - \mu_j) + (\mu_i - \mu_j)^T \Sigma^{-1} \mu_j
        = (\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i + \mu_j)

where the last two steps use the symmetry of \Sigma^{-1}.

Geometric interpretation

Using this in the boundary equation leads to

    \underbrace{(\mu_i - \mu_j)^T \Sigma^{-1}}_{w^T} \, x \; \underbrace{- \frac{1}{2} (\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i + \mu_j) + \log \frac{P_Y(i)}{P_Y(j)}}_{b} = 0

i.e. w^T x + b = 0, with

    w = \Sigma^{-1} (\mu_i - \mu_j)
    b = -\frac{1}{2} (\mu_i - \mu_j)^T \Sigma^{-1} (\mu_i + \mu_j) + \log \frac{P_Y(i)}{P_Y(j)}

This is the equation of a hyperplane with parameters w and b.

Geometric interpretation

which can also be written ...
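As a concrete illustration of the discriminant form i^*(x) = \arg\min_i [d_i(x, \mu_i) + \alpha_i] derived above (the general case, with one covariance per class), here is a minimal numpy sketch; the function name and the class parameters at the bottom are made-up illustrative values, not part of the lecture:

```python
import numpy as np

def gaussian_bdr(x, mus, Sigmas, priors):
    """Bayes decision rule for Gaussian classes: assign x to the class
    minimizing the Mahalanobis distance d_i(x, mu_i) plus the constant
    alpha_i = log((2*pi)^d |Sigma_i|) - 2 log P_Y(i)."""
    x = np.asarray(x, dtype=float)
    d = x.shape[0]
    costs = []
    for mu, Sigma, p in zip(mus, Sigmas, priors):
        diff = x - mu
        d_i = diff @ np.linalg.solve(Sigma, diff)  # Mahalanobis distance
        alpha_i = np.log((2 * np.pi) ** d * np.linalg.det(Sigma)) - 2 * np.log(p)
        costs.append(d_i + alpha_i)
    return int(np.argmin(costs))

# Two hypothetical classes in d = 2 (illustrative numbers only):
mus = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
Sigmas = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]
print(gaussian_bdr([0.3, -0.1], mus, Sigmas, priors))  # -> 0 (closer to mu_0)
```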


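Similarly, for the shared-covariance case, the linear discriminant parameters (w_i, w_{i0}) and the pairwise boundary hyperplane (w, b) follow directly from the formulas above. A sketch under the same illustrative assumptions (hypothetical function names, made-up numbers):

```python
import numpy as np

def linear_discriminant(mu, Sigma, prior):
    """Parameters of g_i(x) = w_i^T x + w_i0 for shared covariance Sigma:
    w_i = Sigma^{-1} mu_i,  w_i0 = -1/2 mu_i^T Sigma^{-1} mu_i + log P_Y(i)."""
    w = np.linalg.solve(Sigma, mu)
    w0 = -0.5 * mu @ w + np.log(prior)
    return w, w0

def boundary_hyperplane(mu_i, mu_j, Sigma, p_i, p_j):
    """Hyperplane w^T x + b = 0 separating classes i and j:
    w = Sigma^{-1}(mu_i - mu_j),
    b = -1/2 (mu_i - mu_j)^T Sigma^{-1} (mu_i + mu_j) + log(P_Y(i)/P_Y(j))."""
    w = np.linalg.solve(Sigma, mu_i - mu_j)
    b = -0.5 * w @ (mu_i + mu_j) + np.log(p_i / p_j)
    return w, b

# Illustrative check: with equal priors the boundary passes through the
# midpoint of the means, i.e. w^T ((mu_i + mu_j)/2) + b = 0.
mu_i, mu_j = np.array([0.0, 0.0]), np.array([2.0, 2.0])
Sigma = np.eye(2)
w, b = boundary_hyperplane(mu_i, mu_j, Sigma, 0.5, 0.5)
print(w @ ((mu_i + mu_j) / 2) + b)  # 0.0
```

With equal priors the plane passes through the midpoint between the two means and is normal to \Sigma^{-1}(\mu_i - \mu_j); unequal priors shift it toward the less likely class, consistent with the 1-D threshold behavior seen earlier.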