Decision Theory

Sargur Srihari
[email protected]


Decision Theory
• Using probability theory to make optimal decisions
• Input vector x, target vector t
  – Regression: t is continuous
  – Classification: t consists of class labels
• The joint distribution p(x,t) gives a complete summary of the associated uncertainty
• The inference problem is to obtain p(x,t) from data
• The decision problem is to make a specific prediction for the value of t, and to take specific actions based on t

Medical Diagnosis Problem
• X-ray image of a patient: does the patient have cancer or not?
• Input vector x is the set of pixel intensities
• Output variable t represents the presence or absence of cancer: C1 is cancer, C2 is absence of cancer
• The general inference problem is to determine p(x,Ck), which gives the most complete description of the situation
• In the end we need to decide whether or not to give treatment; decision theory helps us do this

Bayes Decision
• How do probabilities play a role in making a decision?
• Given input x and classes Ck, use Bayes theorem:
    p(Ck|x) = p(x|Ck) p(Ck) / p(x)
• All quantities in Bayes theorem can be obtained from p(x,Ck), either by marginalizing or by conditioning with respect to the appropriate variable

Minimizing Expected Error
• Probability of a mistake (2-class case, with decision regions R1 and R2):
    p(mistake) = p(x ∈ R1, C2) + p(x ∈ R2, C1)
               = ∫R1 p(x,C2) dx + ∫R2 p(x,C1) dx
• Minimum-error decision rule:
  – For a given x, choose the class for which the integrand is smaller
  – Since p(x,Ck) = p(Ck|x) p(x), this means choosing the class whose posterior probability is highest
  – Called the Bayes classifier
• [Figure: decision regions for a single input variable x; if the priors are equal, the decision is based on the class-conditional densities p(x|Ck)]

Minimizing Expected Loss
• Mistakes can have unequal importance, as in medical diagnosis: deciding "no cancer" for a cancer patient is far costlier than the reverse
• The loss (or cost) function is given by a loss matrix Lkj: the loss incurred when the true class is Ck and the decision made is Cj
  [Table: loss matrix for the cancer decision, with rows the true class and columns the decision made]
• Utility is the negative of loss
• Minimize the average loss:
    E[L] = Σk Σj ∫Rj Lkj p(x,Ck) dx
• Minimum-loss decision rule:
  – Choose the class Cj for which Σk Lkj p(Ck|x) is minimum
  – Trivial once we know the posterior probabilities

Reject Option
• When the largest posterior probability is significantly less than unity, i.e., when the joint probabilities p(x,Ck) have comparable values, the correct class is uncertain
• Avoid making decisions on such difficult cases: reject x when maxk p(Ck|x) falls below a threshold θ

Inference and Decision
• The classification problem is broken into two separate stages:
  – Inference: training data is used to learn a model for the posterior p(Ck|x)
  – Decision: use the posterior probabilities to make optimal class assignments
• Alternatively, we can learn a function that maps inputs directly into labels
• Three distinct approaches to decision problems:
  1. Generative models
  2. Discriminative models
  3. Discriminant functions

1. Generative Models
• First solve the inference problem of determining the class-conditional densities p(x|Ck) for each class separately, together with the priors p(Ck)
• Then use Bayes theorem to determine the posterior probabilities
• Then use decision theory to determine class membership
  (a code sketch of this route follows the Discriminant Functions slide below)

2. Discriminative Models
• First solve the inference problem of determining the posterior class probabilities p(Ck|x) directly
• Then use decision theory to determine class membership

3. Discriminant Functions
• Find a function f(x) that maps each input x directly to a class label
  – In a two-class problem, f(·) is binary valued: f = 0 represents class C1 and f = 1 represents class C2
• Probabilities play no role
  – There is no access to the posterior probabilities p(Ck|x)
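To make the generative route concrete, here is a minimal sketch on a hypothetical one-dimensional, two-class problem. The toy data, the Gaussian class-conditionals, and the rejection threshold θ are all illustrative assumptions, not anything taken from the slides:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical training data for two classes C1 and C2
x1 = rng.normal(loc=-1.0, scale=1.0, size=200)   # class C1 samples
x2 = rng.normal(loc=2.0, scale=1.5, size=600)    # class C2 samples

# Inference stage: estimate class-conditionals p(x|Ck) and priors p(Ck)
mu    = np.array([x1.mean(), x2.mean()])
sigma = np.array([x1.std(ddof=1), x2.std(ddof=1)])
prior = np.array([x1.size, x2.size]) / (x1.size + x2.size)

def posterior(x):
    """Posterior p(Ck|x) obtained from Bayes theorem."""
    joint = prior * norm.pdf(x, loc=mu, scale=sigma)  # p(x,Ck) = p(x|Ck) p(Ck)
    return joint / joint.sum()                        # normalize by p(x)

# Decision stage: Bayes classifier with a reject option
theta = 0.9                                          # rejection threshold
for x_new in (-3.0, 0.5, 3.0):
    post = posterior(x_new)
    if post.max() < theta:
        print(f"x={x_new}: p(Ck|x)={post.round(3)} -> reject (too uncertain)")
    else:
        print(f"x={x_new}: p(Ck|x)={post.round(3)} -> decide C{post.argmax() + 1}")
```

A discriminant function would instead map x straight to a label; the reject option above is possible only because the generative (or discriminative) route makes the posteriors available.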
Need for Posterior Probabilities
• Minimizing risk
  – The loss matrix may be revised periodically, as in a financial application; with posteriors available, only the decision rule needs to change
• Reject option
  – Minimize the misclassification rate, or the expected loss, for a given fraction of rejected points
• Compensating for class priors
  – When there are far more samples from one class than another, we train on a balanced data set (otherwise we may get 99.9% accuracy by always classifying into the majority class)
  – Take the posterior probabilities from the balanced data set, divide by the class fractions in that data set, and multiply by the class fractions in the population to which the model is applied
  – This cannot be done if posterior probabilities are unavailable
• Combining models
  – X-ray images (xI) and blood tests (xB)
  – When posterior probabilities are available, they can be combined using the rules of probability
  – Assume feature independence: p(xI, xB|Ck) = p(xI|Ck) p(xB|Ck) [naive Bayes assumption]
  – Then p(Ck|xI, xB) ∝ p(xI, xB|Ck) p(Ck) ∝ p(xI|Ck) p(xB|Ck) p(Ck) ∝ p(Ck|xI) p(Ck|xB) / p(Ck)
  – We need p(Ck), which can be determined from the fraction of data points in each class; the resulting probabilities are then normalized to sum to one (a numerical sketch appears at the end of these notes)

Loss Functions for Regression
• Curve fitting can also use a loss function
• The regression decision is to choose a specific estimate y(x) of t for a given x, incurring a loss L(t, y(x))
• Squared loss function: L(t, y(x)) = {y(x) − t}²
• Minimize the expected loss:
    E[L] = ∫∫ {y(x) − t}² p(x,t) dx dt
• Taking the derivative with respect to y(x) and setting it equal to zero yields the solution y(x) = Et[t|x]
• The regression function y(x) that minimizes the expected squared loss is the mean of the conditional distribution p(t|x)

Inference and Decision for Regression
• Three distinct approaches (in order of decreasing complexity), analogous to those for classification:
  1. Determine the joint density p(x,t); then normalize to find the conditional density p(t|x); finally marginalize to find the conditional mean Et[t|x]
  2. Solve the inference problem of determining the conditional density p(t|x); then marginalize to find the conditional mean
  3. Find the regression function y(x) directly from the training data

Minkowski Loss Function
• Squared loss is not the only possible choice for regression
• An important example concerns multimodal p(t|x)
• Minkowski loss: Lq = |y(x) − t|^q
• The minimum of E[Lq] is given by
  – the conditional mean for q = 2,
  – the conditional median for q = 1, and
  – the conditional mode for q → 0
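The mean/median claim is easy to check numerically. Below is a minimal sketch, assuming a hypothetical bimodal sample for t and a grid search over constant estimates y; both are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical bimodal sample standing in for draws from p(t|x) at a fixed x
t = np.concatenate([rng.normal(0.0, 0.5, 500),
                    rng.normal(4.0, 0.5, 1500)])

ys = np.linspace(t.min(), t.max(), 2001)          # candidate estimates y
for q in (2, 1):
    # empirical Minkowski loss (1/N) Σn |y − tn|^q for every candidate y
    loss = np.mean(np.abs(ys[:, None] - t) ** q, axis=1)
    print(f"q={q}: argmin over y = {ys[loss.argmin()]:.3f}")

print(f"sample mean   = {t.mean():.3f}    (minimizer for q=2)")
print(f"sample median = {np.median(t):.3f}    (minimizer for q=1)")
```

With a bimodal sample like this, the q=2 minimizer (the mean) can fall between the modes where hardly any data lie, while the q=1 minimizer (the median) sits inside the heavier mode, which is why the choice of q matters when p(t|x) is multimodal.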
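Finally, the model-combination rule from the Need for Posterior Probabilities slide can be checked on made-up numbers. Everything below (the prior and the two per-model posteriors) is hypothetical:

```python
import numpy as np

prior      = np.array([0.01, 0.99])   # p(Ck): cancer, no cancer
post_xray  = np.array([0.30, 0.70])   # p(Ck|xI) from the X-ray model
post_blood = np.array([0.40, 0.60])   # p(Ck|xB) from the blood-test model

# p(Ck|xI,xB) ∝ p(Ck|xI) p(Ck|xB) / p(Ck), then normalize to sum to one
combined = post_xray * post_blood / prior
combined /= combined.sum()
print(combined)   # ≈ [0.97, 0.03]
```

Two individually equivocal results reinforce each other once the shared prior is divided out: each model has already raised a 1% prior to 30-40%, and the combination compounds that evidence.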