Rutgers University CS 536 - INTRODUCTION TO Machine Learning

INTRODUCTION TO Machine Learning
Ethem Alpaydın, © The MIT Press, 2004
Lecture slides for CHAPTER 3: Bayesian Decision Theory
Edited for CS 536, Fall 2005, Rutgers University, by Ahmed Elgammal

Probability and Inference
• The result of tossing a coin is in {Heads, Tails}; random variable X ∈ {1, 0}.
• Bernoulli: P(X = x) = p_o^x (1 − p_o)^{1−x}, x ∈ {0, 1}.
• Sample: X = {x^t}_{t=1}^N.
• Estimation: p_o = #{Heads} / #{Tosses} = (Σ_t x^t) / N.
• Prediction of the next toss: Heads if p_o > 1/2, Tails otherwise.

Classification
• Credit scoring: inputs are income and savings; output is low-risk vs. high-risk.
• Input: x = [x_1, x_2]^T; output: C ∈ {0, 1}.
• Prediction: choose C = 1 if P(C = 1 | x_1, x_2) > 0.5, and C = 0 otherwise;
  or equivalently, choose C = 1 if P(C = 1 | x_1, x_2) > P(C = 0 | x_1, x_2), and C = 0 otherwise.

Bayes' Rule
• P(C | x) = p(x | C) P(C) / p(x), i.e. posterior = likelihood × prior / evidence.
• Prior P(C): the knowledge we have about the value of C before seeing the observation x; P(C = 0) + P(C = 1) = 1.
• Likelihood p(x | C): the conditional probability that an event belonging to class C has the associated observation value x.
• Evidence p(x): the marginal probability that observation x is seen regardless of C:
  p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0).
• Combining the prior with what the data tell us, we can calculate the posterior probability P(C | x) after having seen the observation x.
• The posteriors sum to 1: P(C = 0 | x) + P(C = 1 | x) = 1.
• Bayesian learning: how to estimate P(C) and p(x | C) from training data.

Bayes' Rule: K > 2 Classes
• P(C_i | x) = p(x | C_i) P(C_i) / p(x) = p(x | C_i) P(C_i) / Σ_{k=1}^K p(x | C_k) P(C_k),
  with P(C_i) ≥ 0 and Σ_{i=1}^K P(C_i) = 1.
• MAP (maximum a posteriori) class: pick the class that maximizes the posterior probability:
  choose C_i if P(C_i | x) = max_k P(C_k | x), i.e. C_MAP ≡ argmax_{C_j} P(C_j | x) = argmax_{C_j} p(x | C_j) P(C_j).
• ML (maximum likelihood) class: the special case in which all class priors P(C_j) are assumed equal:
  C_ML ≡ argmax_{C_j} p(x | C_j).

Losses and Risks
• Decisions may not be equally good or costly.
• Actions: α_i. Loss of action α_i when the true state is C_k: λ_{ik}.
• Expected risk (Duda and Hart, 1973): R(α_i | x) = Σ_{k=1}^K λ_{ik} P(C_k | x).
• Choose α_i if R(α_i | x) = min_k R(α_k | x).

Losses and Risks: 0/1 Loss
• λ_{ik} = 0 if i = k, and 1 if i ≠ k.
• Then R(α_i | x) = Σ_{k≠i} P(C_k | x) = 1 − P(C_i | x).
• For minimum risk, choose the most probable class.

Losses and Risks: Reject
• Add a reject action α_{K+1}: λ_{ik} = 0 if i = k, λ if i = K + 1, and 1 otherwise, with 0 < λ < 1.
• R(α_{K+1} | x) = Σ_{k=1}^K λ P(C_k | x) = λ, and R(α_i | x) = Σ_{k≠i} P(C_k | x) = 1 − P(C_i | x).
• Choose C_i if P(C_i | x) > P(C_k | x) for all k ≠ i and P(C_i | x) > 1 − λ; reject otherwise.
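To make the decision rules above concrete, here is a minimal numeric sketch (in Python, which the slides themselves do not use) of Bayes' rule together with the MAP, ML, and reject decisions for a single observation. All numbers, the priors, the likelihood values p(x | C_i), and the reject loss λ, are hypothetical and chosen only to illustrate the arithmetic.

```python
import numpy as np

# Two classes C_0 and C_1 (e.g. low-risk vs. high-risk in the credit-scoring
# example).  The priors, likelihood values and reject loss are hypothetical.
priors = np.array([0.7, 0.3])           # P(C_0), P(C_1); must sum to 1
likelihoods = np.array([0.03, 0.05])    # p(x|C_0), p(x|C_1) for one observed x
lam = 0.3                               # reject loss lambda, 0 < lambda < 1

# Evidence: p(x) = sum_k p(x|C_k) P(C_k)
evidence = float(np.sum(likelihoods * priors))

# Posteriors: P(C_i|x) = p(x|C_i) P(C_i) / p(x); they sum to 1
posteriors = likelihoods * priors / evidence

# MAP class (= minimum-risk class under 0/1 loss): largest posterior
map_class = int(np.argmax(posteriors))
# ML class ignores the priors: argmax_i p(x|C_i)
ml_class = int(np.argmax(likelihoods))

# Reject option: accept the MAP class only if its posterior exceeds 1 - lambda
decision = map_class if posteriors[map_class] > 1 - lam else "reject"

print("posteriors:", posteriors)                           # approx [0.583 0.417]
print("MAP:", map_class, "ML:", ml_class, "->", decision)  # MAP: 0 ML: 1 -> reject
```

With these made-up numbers the MAP class is C_0 (the prior favours it) while the ML class is C_1, and since the largest posterior (about 0.58) does not exceed 1 − λ = 0.7, the minimum-risk action is to reject.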
Discriminant Functions
• g_i(x), i = 1, ..., K; choose C_i if g_i(x) = max_k g_k(x).
• K decision regions R_1, ..., R_K, with R_i = {x | g_i(x) = max_k g_k(x)}.
• Possible choices: g_i(x) = −R(α_i | x), g_i(x) = P(C_i | x), or g_i(x) = p(x | C_i) P(C_i).

K = 2 Classes
• Dichotomizer (K = 2) vs. polychotomizer (K > 2).
• Single discriminant g(x) = g_1(x) − g_2(x): choose C_1 if g(x) > 0, C_2 otherwise.
• Log odds: log [P(C_1 | x) / P(C_2 | x)].

Utility Theory
• Probability of state k given evidence x: P(S_k | x); utility of action α_i when the state is k: U_{ik}.
• Expected utility: EU(α_i | x) = Σ_k U_{ik} P(S_k | x).
• Choose α_i if EU(α_i | x) = max_j EU(α_j | x).

Value of Information
• Expected utility using x only: EU(x) = max_i Σ_k U_{ik} P(S_k | x).
• Expected utility using x and a new feature z: EU(x, z) = max_i Σ_k U_{ik} P(S_k | x, z).
• The new feature z is useful if EU(x, z) > EU(x).

Bayesian Networks
• Also called graphical models, probabilistic networks, or belief networks.
• Nodes are hypotheses (random variables), and the probability attached to a node corresponds to our belief in the truth of the hypothesis.
• Arcs are direct influences between hypotheses.
• The structure is a directed acyclic graph (DAG); the parameters are the conditional probabilities on the arcs (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996).

Causes and Bayes' Rule
• Diagnostic inference: knowing that the grass is wet (W), what is the probability that rain (R) is the cause?
  P(R | W) = P(W | R) P(R) / P(W)
           = P(W | R) P(R) / [P(W | R) P(R) + P(W | ~R) P(~R)]
           = (0.9 × 0.4) / (0.9 × 0.4 + 0.2 × 0.6) = 0.75.
• Notice: knowing that the grass is wet increases the probability of rain from 0.4 to 0.75.

Causal vs. Diagnostic Inference
• Causal inference: if the sprinkler (S) is on, what is the probability that the grass is wet?
  P(W | S) = P(W | R, S) P(R | S) + P(W | ~R, S) P(~R | S)
           = P(W | R, S) P(R) + P(W | ~R, S) P(~R)
           = 0.95 × 0.4 + 0.9 × 0.6 = 0.92.
• Diagnostic inference: if the grass is wet, what is the probability that the sprinkler is on?
  P(S | W) = 0.35 > 0.2 = P(S).
• Explaining away: P(S | R, W) = 0.21; once rain is known to have occurred, the probability that the sprinkler is on drops from 0.35 to 0.21.
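The wet-grass numbers on the last two slides can be verified with a few lines of code. Below is a minimal sketch: P(R) = 0.4, P(S) = 0.2, P(W | R, S) = 0.95 and P(W | ~R, S) = 0.9 are taken from the slides, while the two table entries not visible in this preview, P(W | R, ~S) = 0.9 and P(W | ~R, ~S) = 0.1, are assumed values chosen so that the quoted results P(W | S) = 0.92, P(S | W) = 0.35 and P(S | R, W) = 0.21 come out.

```python
# Two-parent Bayesian network: Rain (R) and Sprinkler (S) both influence Wet grass (W).
P_R = 0.4                  # P(rain), from the slides
P_S = 0.2                  # P(sprinkler), from the slides
P_W = {                    # P(wet | rain, sprinkler)
    (True, True): 0.95,    # from the slides
    (True, False): 0.90,   # assumed (not visible in this preview)
    (False, True): 0.90,   # from the slides
    (False, False): 0.10,  # assumed (not visible in this preview)
}

def p_parents(r, s):
    """Joint probability P(R=r, S=s); R and S are independent parents of W."""
    return (P_R if r else 1 - P_R) * (P_S if s else 1 - P_S)

# Causal inference: P(W|S) = sum_r P(W|r,S) P(r)
p_w_given_s = sum(P_W[(r, True)] * (P_R if r else 1 - P_R) for r in (True, False))

# Diagnostic inference: P(S|W) = P(S,W) / P(W)
p_w = sum(P_W[(r, s)] * p_parents(r, s) for r in (True, False) for s in (True, False))
p_s_and_w = sum(P_W[(r, True)] * p_parents(r, True) for r in (True, False))
p_s_given_w = p_s_and_w / p_w

# Explaining away: P(S|R,W) -- condition on rain as well
p_s_given_rw = (P_W[(True, True)] * P_S) / (
    P_W[(True, True)] * P_S + P_W[(True, False)] * (1 - P_S))

print(f"P(W|S)   = {p_w_given_s:.2f}")   # 0.92
print(f"P(S|W)   = {p_s_given_w:.2f}")   # 0.35
print(f"P(S|R,W) = {p_s_given_rw:.2f}")  # 0.21
```

Running the sketch prints 0.92, 0.35 and 0.21, matching the slides: once rain is known to have occurred, the sprinkler becomes a less likely explanation for the wet grass.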

