UCSD ECE 271A - Bayesian Decision Theory

This preview shows pages 1-2, 14-15, and 29-30 of the 30-page document.



Bayesian decision theory
Nuno Vasconcelos
ECE Department, UCSD

Notation
the notation in DHS is quite sloppy
• e.g. "show that
    P(error) = ∫ P(error|z) P(z) dz"
• really not clear what this means
we will use the following notation
• subscripts are random variables (uppercase)
• arguments are the values of the random variables (lowercase)
• e.g. P_{X|Y}(x_0|y_0) is equivalent to P(X = x_0 | Y = y_0)

Bayesian decision theory
framework for computing optimal decisions on problems involving uncertainty (probabilities)
basic concepts:
• world:
  • has states or classes, drawn from a state or class random variable Y
  • fish classification, Y ∈ {bass, salmon}
  • student grading, Y ∈ {A, B, C, D, F}
  • medical diagnosis, Y ∈ {disease A, disease B, …, disease M}
• observer:
  • measures observations (features), drawn from a random process X
  • fish classification, X = (scale length, scale width) ∈ R^2
  • student grading, X = (HW1, …, HWn) ∈ R^n
  • medical diagnosis, X = (symptom 1, …, symptom n) ∈ R^n

Bayesian decision theory
• decision function:
  • the observer uses the observations to make decisions about the state of the world y
  • if x ∈ Ω and y ∈ Ψ, the decision function is the mapping
      g: Ω → Ψ   such that   y_o = g(x)
    and y_o is a prediction of the state y
• loss function:
  • is the cost L(y_o, y) of deciding for y_o when the true state is y
  • usually this is zero if there is no error and positive otherwise
• goal: to determine the optimal decision function for the loss L(·,·)

Classification
we will focus on classification problems
• the observer tries to infer the state of the world
    g(x) = i,   i ∈ {1, …, M}
• we will also mostly consider the "0-1" loss function
    L[g(x), y] = 1 if g(x) ≠ y,   0 if g(x) = y
but the regression case
• the observer tries to predict a continuous y, i.e. g(x) ∈ R
• is basically the same, for a suitable loss function, e.g. squared error
    L[g(x), y] = (y − g(x))^2

Tools for solving BDT problems
in order to find the optimal decision function we need a probabilistic description of the problem
• in the most general form this is the joint distribution P_{X,Y}(x, i)
• but we frequently decompose it into a combination of two terms
    P_{X,Y}(x, i) = P_{X|Y}(x|i) P_Y(i)
• these are the "class-conditional distribution" P_{X|Y}(x|i) and the "class probability" P_Y(i)
• class probability
  • prior probability of state i, before the observer actually measures anything
  • reflects a "prior belief" that, if all else is equal, the world will be in state i with probability P_Y(i)

Tools for solving BDT problems
class-conditional distribution:
• is the model for the observations given the class or state of the world
consider the grading example
• I know, from experience, that a% of the students will get A's, b% B's, c% C's, and so forth
• hence, for any student, P(A) = a/100, P(B) = b/100, etc.
• these are the state probabilities, before I get to see any of the student's work
• the class-conditional densities are the models for the grades themselves
• let's assume that the grades are always Gaussian, i.e. they are completely characterized by a mean and a variance

Tools for solving BDT problems
• knowledge of the class changes the mean grade, e.g. I expect
  • A students to have an average HW grade of 90%
  • B students 75%
  • C students 60%, etc.
• this means that
    P_{X|Y}(x|i) = G(x, µ_i, σ)
  i.e. the distribution of class i is a Gaussian of mean µ_i and standard deviation σ
note that the decomposition
    P_{X,Y}(x, i) = P_{X|Y}(x|i) P_Y(i)
is a special case of a very powerful tool in Bayesian inference

The chain rule of probability
is an important consequence of the definition of conditional probability
• note that, by recursive application of
    P_{X,Y}(x, y) = P_{X|Y}(x|y) P_Y(y)
• we can write
    P_{X1,…,Xn}(x_1, …, x_n) = P_{X1|X2,…,Xn}(x_1|x_2, …, x_n)
                               × P_{X2|X3,…,Xn}(x_2|x_3, …, x_n)
                               × …
                               × P_{Xn−1|Xn}(x_{n−1}|x_n) × P_{Xn}(x_n)
this is called the chain rule of probability
it allows us to modularize inference problems

The chain rule of probability
e.g. in the medical diagnosis scenario
• what is the probability that you will be sick and have 104° of fever?
    P_{Y,X1}(sick, 104) = P_{Y|X1}(sick|104) P_{X1}(104)
• breaks down a hard question (probability of sick and 104°) into two easier questions
• P_{Y|X1}(sick|104): everyone knows that this is close to one
    P_{Y|X1}(sick|104) ≈ 1

The chain rule of probability
e.g. what is the probability that you will be sick and have 104° of fever?
    P_{Y,X1}(sick, 104) = P_{Y|X1}(sick|104) P_{X1}(104)
• P_{X1}(104): still hard, but easier than P(sick, 104) since we now have only one random variable (temperature)
• does not depend on sickness, it is just the question "what is the probability that someone will have 104°?"
• gather a number of people, measure their temperatures, and make a histogram that everyone can use after that

Tools for solving BDT problems
frequently we have problems with multiple random variables
• e.g. at the doctor's, you are mostly a collection of random variables
  • x1: temperature
  • x2: blood pressure
  • x3: weight
  • x4: cough
we can summarize this as
• a vector X = (X1, …, Xn) of n random variables
• P_X(x_1, …, x_n) is the joint probability distribution
but frequently we only care about a subset of X

Marginalization
what if I only want to know if the patient has a cold or not, i.e. P(cold)?
• e.g. having a cold does not depend on blood pressure and weight
• all that matters are fever and cough
• that is, we need to know P_{X1,X4}(a, b)
we marginalize with respect to a subset of variables (in this case X1 and X4)
• this is done by summing (or integrating) the others out
    P_{X1,X4}(x_1, x_4) = ∫∫ P_{X1,X2,X3,X4}(x_1, x_2, x_3, x_4) dx_2 dx_3
    P_{X1,X4}(x_1, x_4) = Σ_{x_2,x_3} P_{X1,X2,X3,X4}(x_1, x_2, x_3, x_4)

Marginalization
extremely important equation:
• seems trivial, but for large models it is a major computational asset for probabilistic inference
• for any question, there are lots of variables which are irrelevant
• direct evaluation is frequently intractable
• typically, we combine it with the chain rule to explore independence
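The grading example above can be sketched numerically. The slides leave the priors symbolic (a%, b%, c%) and do not fix σ, so the prior values and σ below are made-up illustrative numbers; the class means 90%, 75%, 60% are the ones from the slides. The sketch builds the decomposition P_{X,Y}(x, i) = P_{X|Y}(x|i) P_Y(i) and then marginalizes the class out to get P_X(x):

```python
import math

# Assumed priors P_Y(i) (the slides' a/100, b/100, c/100 left symbolic)
# and the class-conditional Gaussian means from the slides; sigma is assumed.
priors = {"A": 0.2, "B": 0.5, "C": 0.3}
means = {"A": 90.0, "B": 75.0, "C": 60.0}
sigma = 5.0  # assumed common standard deviation

def gaussian(x, mu, sigma):
    """G(x, mu, sigma): the class-conditional density P_{X|Y}(x|i)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def joint(x, i):
    """P_{X,Y}(x, i) = P_{X|Y}(x|i) * P_Y(i) -- the decomposition in the slides."""
    return gaussian(x, means[i], sigma) * priors[i]

def evidence(x):
    """Marginalize the class out: P_X(x) = sum_i P_{X,Y}(x, i)."""
    return sum(joint(x, i) for i in priors)

# For an observed HW grade of 80%, compare the joint probability of each class.
x = 80.0
per_class = {i: joint(x, i) for i in priors}
print(per_class, evidence(x))
```

With these (assumed) numbers, class B dominates at x = 80: its mean (75) is closest to the observation and its prior is the largest, which is exactly the interplay of prior and class-conditional terms the decomposition makes explicit.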
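The chain rule and the marginalization sum can also be checked numerically on a small made-up joint distribution over four binary "symptom" variables, in the spirit of the patient example; the weights below are arbitrary and only serve to build a valid normalized P:

```python
from itertools import product

# A made-up joint P_{X1,X2,X3,X4} over four binary variables
# (tuple index 0..3 corresponds to x1..x4), normalized to sum to 1.
weights = {xs: 1 + xs[0] + 2 * xs[1] + 3 * xs[2] + 4 * xs[3]
           for xs in product((0, 1), repeat=4)}
Z = sum(weights.values())
P = {xs: w / Z for xs, w in weights.items()}

def marg(keep, point):
    """Marginalization: sum the joint over every variable not in `keep`.
    `keep` holds variable indices, `point` the values those variables take."""
    return sum(p for xs, p in P.items()
               if all(xs[i] == v for i, v in zip(keep, point)))

def chain(xs):
    """Chain rule: P(x1,x2,x3,x4) = P(x1|x2,x3,x4) P(x2|x3,x4) P(x3|x4) P(x4),
    with each conditional computed from marginals of P."""
    p = marg((3,), (xs[3],))                                  # P(x4)
    p *= marg((2, 3), xs[2:]) / marg((3,), (xs[3],))          # P(x3|x4)
    p *= marg((1, 2, 3), xs[1:]) / marg((2, 3), xs[2:])       # P(x2|x3,x4)
    p *= P[xs] / marg((1, 2, 3), xs[1:])                      # P(x1|x2,x3,x4)
    return p

# The chain-rule factorization reconstructs the joint at every point,
# and the marginal over (X1, X4) -- the "fever and cough" subset -- sums to 1.
for xs in P:
    assert abs(chain(xs) - P[xs]) < 1e-12
```

The telescoping of the conditionals is what makes the factorization exact; this is the modularity the slides point to, since each factor can be modeled or estimated separately.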

