Rutgers University CS 536 - INTRODUCTION TO Machine Learning

INTRODUCTION TO Machine Learning
Ethem Alpaydın, © The MIT Press, 2004
Lecture slides for CHAPTER 3: Bayesian Decision Theory
Edited for CS 536, Fall 2005, Rutgers University, by Ahmed Elgammal

Probability and Inference
• The result of tossing a coin is in {Heads, Tails}; random variable X ∈ {1, 0}.
• Bernoulli: P(X = x) = p_o^x (1 − p_o)^{1−x}, x ∈ {0, 1}.
• Sample: X = {x^t}_{t=1}^N.
• Estimation: p_o = #{Heads} / #{Tosses} = (Σ_t x^t) / N.
• Prediction of the next toss: Heads if p_o > 1/2, Tails otherwise.

Classification
• Credit scoring: inputs are income and savings; output is low-risk vs. high-risk.
• Input: x = [x_1, x_2]^T; output: C ∈ {0, 1}.
• Prediction: choose C = 1 if P(C = 1 | x_1, x_2) > 0.5, and C = 0 otherwise;
  or equivalently, choose C = 1 if P(C = 1 | x_1, x_2) > P(C = 0 | x_1, x_2), and C = 0 otherwise.

Bayes' Rule
• P(C | x) = p(x | C) P(C) / p(x), i.e. posterior = likelihood × prior / evidence.
• Prior P(C): the knowledge we have about the value of C before seeing the observation x; P(C = 0) + P(C = 1) = 1.
• Likelihood p(x | C): the conditional probability that an event belonging to class C has the associated observation value x.
• Evidence p(x): the marginal probability that observation x is seen regardless of C:
  p(x) = p(x | C = 1) P(C = 1) + p(x | C = 0) P(C = 0).
• Combining the prior with what the data tell us, we can calculate the posterior probability P(C | x) after having seen the observation x.
• The posteriors sum to 1: P(C = 0 | x) + P(C = 1 | x) = 1.
• Bayesian learning: how to estimate P(C) and p(x | C) from training data.

Bayes' Rule: K > 2 Classes
• P(C_i | x) = p(x | C_i) P(C_i) / p(x) = p(x | C_i) P(C_i) / Σ_{k=1}^K p(x | C_k) P(C_k),
  with P(C_i) ≥ 0 and Σ_{i=1}^K P(C_i) = 1.
• MAP (maximum a posteriori) class: pick the class that maximizes the posterior probability:
  choose C_i if P(C_i | x) = max_k P(C_k | x), i.e. C_MAP ≡ argmax_{C_j} P(C_j | x) = argmax_{C_j} p(x | C_j) P(C_j).
• ML (maximum likelihood) class: the special case in which all class priors P(C_j) are assumed equal:
  C_ML ≡ argmax_{C_j} p(x | C_j).

Losses and Risks
• Decisions may not be equally good or costly.
• Actions: α_i. Loss of action α_i when the true state is C_k: λ_{ik}.
• Expected risk (Duda and Hart, 1973): R(α_i | x) = Σ_{k=1}^K λ_{ik} P(C_k | x).
• Choose α_i if R(α_i | x) = min_k R(α_k | x).

Losses and Risks: 0/1 Loss
• λ_{ik} = 0 if i = k, and 1 if i ≠ k.
• Then R(α_i | x) = Σ_{k≠i} P(C_k | x) = 1 − P(C_i | x).
• For minimum risk, choose the most probable class.

Losses and Risks: Reject
• Add a reject action α_{K+1}: λ_{ik} = 0 if i = k, λ if i = K + 1, and 1 otherwise, with 0 < λ < 1.
• R(α_{K+1} | x) = Σ_{k=1}^K λ P(C_k | x) = λ, and R(α_i | x) = Σ_{k≠i} P(C_k | x) = 1 − P(C_i | x).
• Choose C_i if P(C_i | x) > P(C_k | x) for all k ≠ i and P(C_i | x) > 1 − λ; reject otherwise.
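To make the decision rules above concrete, here is a minimal numeric sketch (in Python, which the slides themselves do not use) of Bayes' rule together with the MAP, ML, and reject decisions for a single observation. All numbers, the priors, the likelihood values p(x | C_i), and the reject loss λ, are hypothetical and chosen only to illustrate the arithmetic.

```python
import numpy as np

# Two classes C_0 and C_1 (e.g. low-risk vs. high-risk in the credit-scoring
# example).  The priors, likelihood values and reject loss are hypothetical.
priors = np.array([0.7, 0.3])           # P(C_0), P(C_1); must sum to 1
likelihoods = np.array([0.03, 0.05])    # p(x|C_0), p(x|C_1) for one observed x
lam = 0.3                               # reject loss lambda, 0 < lambda < 1

# Evidence: p(x) = sum_k p(x|C_k) P(C_k)
evidence = float(np.sum(likelihoods * priors))

# Posteriors: P(C_i|x) = p(x|C_i) P(C_i) / p(x); they sum to 1
posteriors = likelihoods * priors / evidence

# MAP class (= minimum-risk class under 0/1 loss): largest posterior
map_class = int(np.argmax(posteriors))
# ML class ignores the priors: argmax_i p(x|C_i)
ml_class = int(np.argmax(likelihoods))

# Reject option: accept the MAP class only if its posterior exceeds 1 - lambda
decision = map_class if posteriors[map_class] > 1 - lam else "reject"

print("posteriors:", posteriors)                           # approx [0.583 0.417]
print("MAP:", map_class, "ML:", ml_class, "->", decision)  # MAP: 0 ML: 1 -> reject
```

With these made-up numbers the MAP class is C_0 (the prior favours it) while the ML class is C_1, and since the largest posterior (about 0.58) does not exceed 1 − λ = 0.7, the minimum-risk action is to reject.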
Discriminant Functions
• g_i(x), i = 1, ..., K; choose C_i if g_i(x) = max_k g_k(x).
• K decision regions R_1, ..., R_K, with R_i = {x | g_i(x) = max_k g_k(x)}.
• Possible choices: g_i(x) = −R(α_i | x), g_i(x) = P(C_i | x), or g_i(x) = p(x | C_i) P(C_i).

K = 2 Classes
• Dichotomizer (K = 2) vs. polychotomizer (K > 2).
• Single discriminant g(x) = g_1(x) − g_2(x): choose C_1 if g(x) > 0, C_2 otherwise.
• Log odds: log [P(C_1 | x) / P(C_2 | x)].

Utility Theory
• Probability of state k given evidence x: P(S_k | x); utility of action α_i when the state is k: U_{ik}.
• Expected utility: EU(α_i | x) = Σ_k U_{ik} P(S_k | x).
• Choose α_i if EU(α_i | x) = max_j EU(α_j | x).

Value of Information
• Expected utility using x only: EU(x) = max_i Σ_k U_{ik} P(S_k | x).
• Expected utility using x and a new feature z: EU(x, z) = max_i Σ_k U_{ik} P(S_k | x, z).
• The new feature z is useful if EU(x, z) > EU(x).

Bayesian Networks
• Also called graphical models, probabilistic networks, or belief networks.
• Nodes are hypotheses (random variables), and the probability attached to a node corresponds to our belief in the truth of the hypothesis.
• Arcs are direct influences between hypotheses.
• The structure is a directed acyclic graph (DAG); the parameters are the conditional probabilities on the arcs (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996).

Causes and Bayes' Rule
• Diagnostic inference: knowing that the grass is wet (W), what is the probability that rain (R) is the cause?
  P(R | W) = P(W | R) P(R) / P(W)
           = P(W | R) P(R) / [P(W | R) P(R) + P(W | ~R) P(~R)]
           = (0.9 × 0.4) / (0.9 × 0.4 + 0.2 × 0.6) = 0.75.
• Notice: knowing that the grass is wet increases the probability of rain from 0.4 to 0.75.

Causal vs. Diagnostic Inference
• Causal inference: if the sprinkler (S) is on, what is the probability that the grass is wet?
  P(W | S) = P(W | R, S) P(R | S) + P(W | ~R, S) P(~R | S)
           = P(W | R, S) P(R) + P(W | ~R, S) P(~R)
           = 0.95 × 0.4 + 0.9 × 0.6 = 0.92.
• Diagnostic inference: if the grass is wet, what is the probability that the sprinkler is on?
  P(S | W) = 0.35 > 0.2 = P(S).
• Explaining away: P(S | R, W) = 0.21; once rain is known to have occurred, the probability that the sprinkler is on drops from 0.35 to 0.21.
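The wet-grass numbers on the last two slides can be verified with a few lines of code. Below is a minimal sketch: P(R) = 0.4, P(S) = 0.2, P(W | R, S) = 0.95 and P(W | ~R, S) = 0.9 are taken from the slides, while the two table entries not visible in this preview, P(W | R, ~S) = 0.9 and P(W | ~R, ~S) = 0.1, are assumed values chosen so that the quoted results P(W | S) = 0.92, P(S | W) = 0.35 and P(S | R, W) = 0.21 come out.

```python
# Two-parent Bayesian network: Rain (R) and Sprinkler (S) both influence Wet grass (W).
P_R = 0.4                  # P(rain), from the slides
P_S = 0.2                  # P(sprinkler), from the slides
P_W = {                    # P(wet | rain, sprinkler)
    (True, True): 0.95,    # from the slides
    (True, False): 0.90,   # assumed (not visible in this preview)
    (False, True): 0.90,   # from the slides
    (False, False): 0.10,  # assumed (not visible in this preview)
}

def p_parents(r, s):
    """Joint probability P(R=r, S=s); R and S are independent parents of W."""
    return (P_R if r else 1 - P_R) * (P_S if s else 1 - P_S)

# Causal inference: P(W|S) = sum_r P(W|r,S) P(r)
p_w_given_s = sum(P_W[(r, True)] * (P_R if r else 1 - P_R) for r in (True, False))

# Diagnostic inference: P(S|W) = P(S,W) / P(W)
p_w = sum(P_W[(r, s)] * p_parents(r, s) for r in (True, False) for s in (True, False))
p_s_and_w = sum(P_W[(r, True)] * p_parents(r, True) for r in (True, False))
p_s_given_w = p_s_and_w / p_w

# Explaining away: P(S|R,W) -- condition on rain as well
p_s_given_rw = (P_W[(True, True)] * P_S) / (
    P_W[(True, True)] * P_S + P_W[(True, False)] * (1 - P_S))

print(f"P(W|S)   = {p_w_given_s:.2f}")   # 0.92
print(f"P(S|W)   = {p_s_given_w:.2f}")   # 0.35
print(f"P(S|R,W) = {p_s_given_rw:.2f}")  # 0.21
```

Running the sketch prints 0.92, 0.35 and 0.21, matching the slides: once rain is known to have occurred, the sprinkler becomes a less likely explanation for the wet grass.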

