Decision Theory

Sargur Srihari
[email protected]


Decision Theory
• Using probability theory to make optimal decisions
• Input vector x, target vector t
  – Regression: t is continuous
  – Classification: t consists of class labels
• The joint distribution p(x,t) gives a complete summary of the associated uncertainty
• The inference problem is to obtain p(x,t) from data
• The decision problem is to make a specific prediction for the value of t, and to take specific actions based on t

Medical Diagnosis Problem
• X-ray image of a patient: does the patient have cancer or not?
• Input vector x is the set of pixel intensities
• Output variable t represents the presence or absence of cancer: C1 is cancer, C2 is absence of cancer
• The general inference problem is to determine p(x,Ck), which gives the most complete description of the situation
• In the end we need to decide whether or not to give treatment; decision theory helps us do this

Bayes Decision
• How do probabilities play a role in making a decision?
• Given input x and classes Ck, use Bayes theorem:
    p(Ck|x) = p(x|Ck) p(Ck) / p(x)
• All quantities in Bayes theorem can be obtained from p(x,Ck), either by marginalizing or by conditioning with respect to the appropriate variable

Minimizing Expected Error
• Probability of a mistake (2-class case, with decision regions R1 and R2):
    p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1)
               = ∫R1 p(x,C2) dx + ∫R2 p(x,C1) dx
• Minimum-error decision rule:
  – For a given x, choose the class for which the integrand is smaller
  – Since p(x,Ck) = p(Ck|x) p(x), this means choosing the class whose posterior probability is highest
  – Called the Bayes classifier
• [Figure: decision regions for a single input variable x; if the priors are equal, the decision is based on the class-conditional densities p(x|Ck)]

Minimizing Expected Loss
• Mistakes can have unequal importance, as in medical diagnosis: deciding "no cancer" for a cancer patient is far costlier than the reverse
• The loss (or cost) function is given by a loss matrix Lkj: the loss incurred when the true class is Ck and the decision made is Cj
  [Table: loss matrix for the cancer decision, with rows the true class and columns the decision made]
• Utility is the negative of loss
• Minimize the average loss:
    E[L] = Σk Σj ∫Rj Lkj p(x,Ck) dx
• Minimum-loss decision rule:
  – Choose the class Cj for which Σk Lkj p(Ck|x) is minimum
  – Trivial once we know the posterior probabilities

Reject Option
• When the largest posterior probability is significantly less than unity, i.e., when the joint probabilities p(x,Ck) have comparable values, the correct class is uncertain
• Avoid making decisions on such difficult cases: reject x when maxk p(Ck|x) falls below a threshold θ

Inference and Decision
• The classification problem is broken into two separate stages:
  – Inference: training data is used to learn a model for the posterior p(Ck|x)
  – Decision: use the posterior probabilities to make optimal class assignments
• Alternatively, we can learn a function that maps inputs directly into labels
• Three distinct approaches to decision problems:
  1. Generative models
  2. Discriminative models
  3. Discriminant functions

1. Generative Models
• First solve the inference problem of determining the class-conditional densities p(x|Ck) for each class separately, together with the priors p(Ck)
• Then use Bayes theorem to determine the posterior probabilities
• Then use decision theory to determine class membership
  (a code sketch of this route follows the Discriminant Functions slide below)

2. Discriminative Models
• First solve the inference problem of determining the posterior class probabilities p(Ck|x) directly
• Then use decision theory to determine class membership

3. Discriminant Functions
• Find a function f(x) that maps each input x directly to a class label
  – In a two-class problem, f(·) is binary valued: f = 0 represents class C1 and f = 1 represents class C2
• Probabilities play no role
  – There is no access to the posterior probabilities p(Ck|x)
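To make the generative route concrete, here is a minimal sketch on a hypothetical one-dimensional, two-class problem. The toy data, the Gaussian class-conditionals, and the rejection threshold θ are all illustrative assumptions, not anything taken from the slides:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical training data for two classes C1 and C2
x1 = rng.normal(loc=-1.0, scale=1.0, size=200)   # class C1 samples
x2 = rng.normal(loc=2.0, scale=1.5, size=600)    # class C2 samples

# Inference stage: estimate class-conditionals p(x|Ck) and priors p(Ck)
mu    = np.array([x1.mean(), x2.mean()])
sigma = np.array([x1.std(ddof=1), x2.std(ddof=1)])
prior = np.array([x1.size, x2.size]) / (x1.size + x2.size)

def posterior(x):
    """Posterior p(Ck|x) obtained from Bayes theorem."""
    joint = prior * norm.pdf(x, loc=mu, scale=sigma)  # p(x,Ck) = p(x|Ck) p(Ck)
    return joint / joint.sum()                        # normalize by p(x)

# Decision stage: Bayes classifier with a reject option
theta = 0.9                                          # rejection threshold
for x_new in (-3.0, 0.5, 3.0):
    post = posterior(x_new)
    if post.max() < theta:
        print(f"x={x_new}: p(Ck|x)={post.round(3)} -> reject (too uncertain)")
    else:
        print(f"x={x_new}: p(Ck|x)={post.round(3)} -> decide C{post.argmax() + 1}")
```

A discriminant function would instead map x straight to a label; the reject option above is possible only because the generative (or discriminative) route makes the posteriors available.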
Need for Posterior Probabilities
• Minimizing risk
  – The loss matrix may be revised periodically, as in a financial application; with posteriors available, only the decision rule needs to change
• Reject option
  – Minimize the misclassification rate, or the expected loss, for a given fraction of rejected points
• Compensating for class priors
  – When there are far more samples from one class than another, we train on a balanced data set (otherwise we may get 99.9% accuracy by always classifying into the majority class)
  – Take the posterior probabilities from the balanced data set, divide by the class fractions in that data set, and multiply by the class fractions in the population to which the model is applied
  – This cannot be done if posterior probabilities are unavailable
• Combining models
  – X-ray images (xI) and blood tests (xB)
  – When posterior probabilities are available, they can be combined using the rules of probability
  – Assume feature independence: p(xI, xB|Ck) = p(xI|Ck) p(xB|Ck) [naive Bayes assumption]
  – Then p(Ck|xI, xB) ∝ p(xI, xB|Ck) p(Ck) ∝ p(xI|Ck) p(xB|Ck) p(Ck) ∝ p(Ck|xI) p(Ck|xB) / p(Ck)
  – We need p(Ck), which can be determined from the fraction of data points in each class; the resulting probabilities are then normalized to sum to one (a numerical sketch appears at the end of these notes)

Loss Functions for Regression
• Curve fitting can also use a loss function
• The regression decision is to choose a specific estimate y(x) of t for a given x, incurring a loss L(t, y(x))
• Squared loss function: L(t, y(x)) = {y(x) − t}²
• Minimize the expected loss:
    E[L] = ∫∫ {y(x) − t}² p(x,t) dx dt
• Taking the derivative with respect to y(x) and setting it equal to zero yields the solution y(x) = Et[t|x]
• The regression function y(x) that minimizes the expected squared loss is the mean of the conditional distribution p(t|x)

Inference and Decision for Regression
• Three distinct approaches (in order of decreasing complexity), analogous to those for classification:
  1. Determine the joint density p(x,t); then normalize to find the conditional density p(t|x); finally marginalize to find the conditional mean Et[t|x]
  2. Solve the inference problem of determining the conditional density p(t|x); then marginalize to find the conditional mean
  3. Find the regression function y(x) directly from the training data

Minkowski Loss Function
• Squared loss is not the only possible choice for regression
• An important example concerns multimodal p(t|x)
• Minkowski loss: Lq = |y(x) − t|^q
• The minimum of E[Lq] is given by
  – the conditional mean for q = 2,
  – the conditional median for q = 1, and
  – the conditional mode for q → 0
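The mean/median claim is easy to check numerically. Below is a minimal sketch, assuming a hypothetical bimodal sample for t and a grid search over constant estimates y; both are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bimodal sample standing in for draws from p(t|x) at a fixed x
t = np.concatenate([rng.normal(0.0, 0.5, 500),
                    rng.normal(4.0, 0.5, 1500)])

ys = np.linspace(t.min(), t.max(), 2001)          # candidate estimates y
for q in (2, 1):
    # empirical Minkowski loss (1/N) Σn |y − tn|^q for every candidate y
    loss = np.mean(np.abs(ys[:, None] - t) ** q, axis=1)
    print(f"q={q}: argmin over y = {ys[loss.argmin()]:.3f}")

print(f"sample mean   = {t.mean():.3f}    (minimizer for q=2)")
print(f"sample median = {np.median(t):.3f}    (minimizer for q=1)")
```

With a bimodal sample like this, the q=2 minimizer (the mean) can fall between the modes where hardly any data lie, while the q=1 minimizer (the median) sits inside the heavier mode, which is why the choice of q matters when p(t|x) is multimodal.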
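Finally, the model-combination rule from the Need for Posterior Probabilities slide can be checked on made-up numbers. Everything below (the prior and the two per-model posteriors) is hypothetical:

```python
import numpy as np

prior      = np.array([0.01, 0.99])   # p(Ck): cancer, no cancer
post_xray  = np.array([0.30, 0.70])   # p(Ck|xI) from the X-ray model
post_blood = np.array([0.40, 0.60])   # p(Ck|xB) from the blood-test model

# p(Ck|xI,xB) ∝ p(Ck|xI) p(Ck|xB) / p(Ck), then normalize to sum to one
combined = post_xray * post_blood / prior
combined /= combined.sum()
print(combined)   # ≈ [0.97, 0.03]
```

Two individually equivocal results reinforce each other once the shared prior is divided out: each model has already raised a 1% prior to 30-40%, and the combination compounds that evidence.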