UCSD ECE 271A - Bayesian Decision Theory

Bayesian decision theory
Nuno Vasconcelos
ECE Department, UCSD

2. Bayesian decision theory
recall that we have
• Y – state of the world
• X – observations
• g(x) – decision function
• L[g(x), y] – loss of predicting y with g(x)
the Bayes decision rule is the rule that minimizes the risk

    Risk = E_{X,Y}\{ L[g(X), Y] \}

given x, it consists of picking the prediction of minimum conditional risk:

    g^*(x) = \arg\min_{g(x)} \sum_{i=1}^{M} P_{Y|X}(i|x) \, L[g(x), i]

3. Bayesian decision theory
the associated risk

    R^* = E_X\left[ \sum_{i=1}^{M} P_{Y|X}(i|x) \, L[g^*(x), i] \right]

• is the Bayes risk, and cannot be beaten
for a binary classification problem, g^*(x) \in \{0, 1\}, and the optimal decision is to "pick 0" if

    \frac{P_{X|Y}(x|0)}{P_{X|Y}(x|1)} > T^* = \frac{L[0,1] \, P_Y(1)}{L[1,0] \, P_Y(0)}

• i.e. we pick 0 when the probability of x given Y = 0, divided by that given Y = 1, exceeds a threshold
• the optimal threshold T^* depends on the costs of the two types of error and on the probabilities of the two classes

4. MAP rule
let's consider the "0-1" loss

    L[g(x), y] = \begin{cases} 1, & g(x) \neq y \\ 0, & g(x) = y \end{cases}

• in this case the optimal decision function is

    g^*(x) = \arg\min_{g(x)} \sum_{i=1}^{M} P_{Y|X}(i|x) \, L[g(x), i]
           = \arg\min_{g(x)} \sum_{i \neq g(x)} P_{Y|X}(i|x)
           = \arg\min_{g(x)} \left[ 1 - P_{Y|X}(g(x)|x) \right]
           = \arg\max_{g(x)} P_{Y|X}(g(x)|x)
           = \arg\max_{i} P_{Y|X}(i|x)

5. MAP rule
for the "0-1" loss the optimal decision rule is the maximum a posteriori probability (MAP) rule

    g^*(x) = \arg\max_i P_{Y|X}(i|x)

what is the associated risk?

    R^* = \int P_X(x) \sum_{i=1}^{M} P_{Y|X}(i|x) \, L[g^*(x), i] \, dx
        = \int P_X(x) \sum_{i \neq g^*(x)} P_{Y|X}(i|x) \, dx
        = \int P_X(x) \, P_{Y|X}(y \neq g^*(x) \mid x) \, dx
        = \int P_{X,Y}(y \neq g^*(x), x) \, dx

6. MAP rule
but

    R^* = \int P_{X,Y}(y \neq g^*(x), x) \, dx

• is really just the probability of error of the decision rule g^*(x)
• note that the same result would hold for any g(x), i.e. R would be the probability of error of g(x)
• this implies the following: for the "0-1" loss
• the Bayes decision rule is the MAP rule g^*(x) = \arg\max_i P_{Y|X}(i|x)
• the risk is the probability of error of this rule (the Bayes error)
• there is no other decision function with lower error

7. MAP rule
the MAP rule can usually be written in a simple form, given a probabilistic model for X and Y
consider the two-class problem, i.e. Y = 0 or Y = 1
• the BDR is

    i^*(x) = \arg\max_i P_{Y|X}(i|x) = \begin{cases} 0, & \text{if } P_{Y|X}(0|x) \geq P_{Y|X}(1|x) \\ 1, & \text{if } P_{Y|X}(0|x) < P_{Y|X}(1|x) \end{cases}

• i.e. pick "0" when P_{Y|X}(0|x) \geq P_{Y|X}(1|x), and "1" otherwise
• using Bayes rule,

    P_{Y|X}(0|x) \geq P_{Y|X}(1|x) \iff \frac{P_{X|Y}(x|0) \, P_Y(0)}{P_X(x)} \geq \frac{P_{X|Y}(x|1) \, P_Y(1)}{P_X(x)}

8. MAP rule
• noting that P_X(x) is a non-negative quantity, this is the same as picking "0" when

    P_{X|Y}(x|0) \, P_Y(0) \geq P_{X|Y}(x|1) \, P_Y(1)

• by the same reasoning, this is easily generalized to

    i^*(x) = \arg\max_i P_{X|Y}(x|i) \, P_Y(i)

• note that:
• many class-conditional distributions are exponential (e.g. the Gaussian)
• this product can be tricky to compute (e.g. the tail probabilities are quite small)
• we can take advantage of the fact that we only care about the order of the terms on the right-hand side
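
The last two bullets are easy to see numerically. Below is a small illustrative sketch (not from the slides; the class means, equal priors, and the dimension d = 1000 are all assumed for the demo) that evaluates the product P_{X|Y}(x|i) P_Y(i) for a vector of i.i.d. Gaussian measurements: in the probability domain the product of many small densities underflows to zero, while the equivalent sum of log-densities, the "log trick" formalized on the next slide, stays finite and preserves the ordering of the classes.

```python
import numpy as np

# Assumed demo setup: two Gaussian classes with means 0 and 1, equal priors,
# and a vector x of d i.i.d. measurements, so the class-conditional
# likelihood is a product of d per-measurement densities.
rng = np.random.default_rng(0)
d = 1000
x = rng.normal(loc=0.0, scale=1.0, size=d)   # measurements drawn from class 0

def gaussian_pdf(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def gaussian_logpdf(x, mu, sigma):
    return -(x - mu) ** 2 / (2 * sigma ** 2) - 0.5 * np.log(2 * np.pi * sigma ** 2)

priors, means, sigma = np.array([0.5, 0.5]), np.array([0.0, 1.0]), 1.0

# Probability domain: the product of 1000 small densities underflows to 0.0
# for both classes, so the classes become numerically indistinguishable.
prod = [gaussian_pdf(x, mu, sigma).prod() * p for mu, p in zip(means, priors)]
print(prod)

# Log domain: the sums stay finite, and because log is monotonically
# increasing the argmax (the decision) is unchanged.
logp = [gaussian_logpdf(x, mu, sigma).sum() + np.log(p) for mu, p in zip(means, priors)]
print(logp, "-> decide class", int(np.argmax(logp)))
```

Running this should print two zero products but two finite log-posteriors, with class 0 (the true class) on top; that contrast is the whole motivation for working in the log domain.
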
9. The log trick
this is the log trick
• which is simply to take logs
• note that the log is a monotonically increasing function:

    a > b \iff \log a > \log b

• from which

    i^*(x) = \arg\max_i P_{X|Y}(x|i) \, P_Y(i)
           = \arg\max_i \log \left[ P_{X|Y}(x|i) \, P_Y(i) \right]
           = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]

• the order is preserved

10. MAP rule
in summary, for the zero/one loss, the following three decision rules are optimal and equivalent
• 1)  i^*(x) = \arg\max_i P_{Y|X}(i|x)
• 2)  i^*(x) = \arg\max_i \left[ P_{X|Y}(x|i) \, P_Y(i) \right]
• 3)  i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]
• 1) is usually hard to use; 3) is frequently easier than 2)

11. Example
the Bayes decision rule is usually highly intuitive
example: communications
• a bit Y is transmitted by a source, corrupted by noise in the channel, and received as X by a decoder

[figure: Y → channel → X]

• Q: what should the optimal decoder do to recover Y?

12. Example
intuitively, it appears that the decoder should just threshold X
• pick a threshold T
• decision rule:

    Y = \begin{cases} 0, & x < T \\ 1, & x > T \end{cases}

• what is the threshold value?
• let's solve the problem with the BDR

13. Example
we need
• class probabilities: in the absence of any other information, let's say P_Y(0) = P_Y(1) = 1/2
• class-conditional densities:
• noise results from thermal processes, electrons moving around and bumping into each other
• a lot of independent events that add up
• by the central limit theorem, it appears reasonable to assume that the noise is Gaussian
we denote a Gaussian random variable of mean µ and variance σ² by X ~ N(µ, σ²)

14. Example
the Gaussian probability density function is

    P_X(x) = G(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}

since the noise is Gaussian, and assuming it is simply added to the signal, we have

    X = Y + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2)

• in both cases, X corresponds to a constant (Y) plus zero-mean Gaussian noise
• this simply adds Y to the mean of the Gaussian

15. Example
in summary,

    P_{X|Y}(x|0) = G(x, 0, \sigma), \quad P_{X|Y}(x|1) = G(x, 1, \sigma), \quad P_Y(0) = P_Y(1) = 1/2

[figure: the two class-conditional densities, Gaussians centered at 0 and 1]

16. Example
to compute the BDR, we recall that

    i^*(x) = \arg\max_i \left[ \log P_{X|Y}(x|i) + \log P_Y(i) \right]

and note that
• terms which are constant as a function of i can be dropped, since we are just looking for the i that maximizes the function
• since this is the case for the class probabilities (P_Y(0) = P_Y(1) = 1/2), we have

    i^*(x) = \arg\max_i \log P_{X|Y}(x|i)

17. BDR
this is intuitive
• we pick the class that "best explains" (gives higher probability to) the observation
• in this case, we can solve visually: pick 0 where G(x, 0, σ) is the larger density, pick 1 where G(x, 1, σ) is

[figure: the two densities, with "pick 0" to the left of their crossing point and "pick 1" to the right]

• but the mathematical solution is equally simple

18. BDR
let's consider the more general case

    P_{X|Y}(x|0) = G(x, \mu_0, \sigma), \qquad P_{X|Y}(x|1) = G(x, \mu_1, \sigma)

• for which

    i^*(x) = \arg\max_i \log P_{X|Y}(x|i)
           = \arg\max_i \log \left\{ \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu_i)^2}{2\sigma^2}} \right\}
           = \arg\max_i \left\{ -\frac{(x-\mu_i)^2}{2\sigma^2} - \frac{1}{2} \log(2\pi\sigma^2) \right\}
           = \arg\min_i \, (x - \mu_i)^2

19. BDR
• or, expanding the square and dropping the term that does not depend on i,

    i^*(x) = \arg\min_i \, (x - \mu_i)^2 = \arg\min_i \left( x^2 - 2x\mu_i + \mu_i^2 \right) = \arg\min_i \left( -2x\mu_i + \mu_i^2 \right)

• the optimal decision is therefore: pick 0 if

    -2x\mu_0 + \mu_0^2 < -2x\mu_1 + \mu_1^2 \iff 2x(\mu_1 - \mu_0) < \mu_1^2 - \mu_0^2

• or (assuming µ1 > µ0), pick 0 if

    x < \frac{\mu_0 + \mu_1}{2}

20. BDR
for a problem with Gaussian classes, equal variances, and equal class probabilities
• the optimal decision boundary is a threshold at the mid-point between the two means

[figure: Gaussians centered at µ0 and µ1 with the threshold at their midpoint; "pick 0" below it, "pick 1" above]

21. BDR
back to our signal decoding problem
• in this case µ0 = 0 and µ1 = 1, so T = 0.5
• decision rule:

    Y = \begin{cases} 0, & x < 0.5 \\ 1, & x > 0.5 \end{cases}

• this is, once again, intuitive
• we place the threshold midway between the two signal levels

22. BDR
what is the point of going through all the math?
• now we know that the intuitive threshold is actually optimal, and in which sense it is optimal (minimum probability of error)
• the Bayesian solution keeps us …
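
As a sanity check on the whole example, here is a minimal simulation sketch (the noise level σ = 0.4 and the bit count are assumed for the demo; everything else follows the slides): decode 10⁶ noisy bits with the midpoint threshold T = 0.5 and compare the empirical error rate to the Gaussian tail probability P(ε > 0.5), which for this setup is the Bayes error.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 1_000_000, 0.4                   # assumed demo parameters

y = rng.integers(0, 2, size=n)              # transmitted bits, P_Y(0) = P_Y(1) = 1/2
x = y + rng.normal(0.0, sigma, size=n)      # channel: X = Y + eps, eps ~ N(0, sigma^2)

y_hat = (x > 0.5).astype(int)               # BDR: threshold at the midpoint T = 0.5
empirical_error = np.mean(y_hat != y)

# Bayes error: an error occurs when the noise pushes X past the midpoint,
# so P(error) = P(eps > 0.5) = 0.5 * erfc(0.5 / (sigma * sqrt(2))).
bayes_error = 0.5 * math.erfc(0.5 / (sigma * math.sqrt(2)))

print(f"empirical: {empirical_error:.5f}   theoretical: {bayes_error:.5f}")
```

The two numbers should agree to roughly three decimal places, which is what one expects from 10⁶ trials.
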


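A natural follow-up, again as an illustrative sketch rather than part of the lecture: when the priors are unequal, the log-priors in rule 3) no longer cancel, and repeating the algebra of slides 18-19 shifts the optimal threshold to T = (µ0 + µ1)/2 + σ² log(P_Y(0)/P_Y(1)). The simulation below (assumed P_Y(0) = 0.8 and σ = 0.4) compares this shifted threshold against the naive midpoint.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma, p0 = 1_000_000, 0.4, 0.8          # assumed demo parameters

y = (rng.random(n) >= p0).astype(int)       # P_Y(0) = 0.8, P_Y(1) = 0.2
x = y + rng.normal(0.0, sigma, size=n)      # same Gaussian channel as before

# BDR threshold with unequal priors: the log-prior term sigma^2 * log(p0/p1)
# pushes the boundary toward the rarer class (here, toward class 1).
T_bdr = 0.5 + sigma ** 2 * np.log(p0 / (1 - p0))

for name, T in [("midpoint T = 0.5", 0.5), ("BDR threshold", T_bdr)]:
    error = np.mean((x > T).astype(int) != y)
    print(f"{name:16s} T = {T:.3f}   error = {error:.5f}")
```

The BDR threshold lands near 0.72 and yields a visibly lower error rate than the midpoint, consistent with slide 6: no other decision function has lower error than the Bayes rule.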