Recitation 3: Naive Bayes and Logistic Regression... and a surprise (decision trees!)
Ekaterina Spriggs, 10-701/15-781 Fall 2009
Friday, September 25, 2009

Classifiers

Discriminative: model P(Y | X_1, ..., X_n) directly.
Generative: model P(X_1, ..., X_n | Y) and P(Y), then use Bayes rule:

  P(Y | X) = P(X | Y) P(Y) / P(X)

Generative vs discriminative models

[Figure: generative vs discriminative models; see class slides.]

NB decision rule

We want to learn f : X -> Y and predict the most probable value of f(x) = y.
The training data are N examples with n features X_1, ..., X_n each, observed
over time and labeled, e.g. video features labeled with basketball moves
("crossover dribble", "shoot", ...):

  x_11 ... x_1n    y_1
  x_21 ... x_2n    y_2
  ...
  x_N1 ... x_Nn    y_N

At test time we compute f(x_1, ..., x_n) for each new example.

Decision rule

Most probable value of f(x) = y:

Bayes optimal (MAP):

  Y^{MAP} = \arg\max_{y_j \in Y} P(Y = y_j | X_1 = x_1, ..., X_n = x_n)

MLE:

  Y^{MLE} = \arg\max_{y_j \in Y} P(X_1 = x_1, ..., X_n = x_n | Y = y_j)

Naive Bayes:

  Y^{NB} = \arg\max_{y_j \in Y} \prod_i P(X_i = x_i | Y = y_j) P(Y = y_j)

NB decision rule

P(X_i = x_i | Y = y_j) = ?  P(Y = y_j) = ?  (you know this from class)

Matlab structures...

% example for NB - how to count

% number of data points
num_data_points = 9;
% number of possible values for the features
x_num_outcomes = 4;
% feature dimensionality
x_dim = 3;
% number of possible outcomes for Y
y_num_outcomes = 2;

Y = [0 0 0 1 1 1 0 0 0]';
X = [ 4 3 4 3 4 3 4 3 4;
      1 4 3 1 2 1 2 3 1;
      3 1 2 4 1 4 1 3 2 ]';

fprintf('Data:\n');
[X, Y]

% class conditional probability table
% (each_feature, y_outcomes, x_outcomes)
table = zeros(x_dim, y_num_outcomes, x_num_outcomes);

% P(X_i == k | Y == j) = ?
% what is this: X(Y == 1, 3)

% some code....
% ....
i = 1; % dimension of X
j = 1; % value of Y
k = 1; % value of X
% the smoothing terms ("something") are left for you -- you know this from class
table(i, j, k) = (sum( X(Y==j, i) == k ) + something ) / (sum( Y == j ) + something' );
% more code....

% once you have your conditional probability table,
% how do you make a decision?

NB decision rule

  Y^{NB} = \arg\max_{y_j \in Y} \prod_i P(X_i = x_i | Y = y_j) P(Y = y_j)

In the homework: Y = 0 or Y = 1. Comparing:

  \prod_i P(X_i = x_i | Y = 1) P(Y = 1)  >=  \prod_i P(X_i = x_i | Y = 0) P(Y = 0)

Problem? Hint: the "argmax" of a monotonic function of the decision rule gives
the same answer, and log is monotonic.

  0.2 * 0.3 = alright
  0.2 * 0.3 * 0.8 * 0.1 * 0.02 * 0.7 * ... = trouble (the product underflows)
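So instead of multiplying the probabilities, sum their logs. Below is a minimal
sketch of that decision (not the homework solution): it assumes `table` has been
filled in as above, plus a hypothetical prior vector `prior` with
prior(j) = P(Y = j-1), and a test point `x` of length x_dim with values in
1..x_num_outcomes.

% Sketch: log-space NB decision for one test point x.
% Assumes table(i, j, k) ~ P(X_i = k | Y = j-1) and prior(j) ~ P(Y = j-1).
log_score = zeros(y_num_outcomes, 1);
for j = 1:y_num_outcomes
    log_score(j) = log(prior(j));
    for i = 1:x_dim
        % add log P(X_i = x(i) | Y = j-1) instead of multiplying probabilities
        log_score(j) = log_score(j) + log(table(i, j, x(i)));
    end
end
[best, j_best] = max(log_score);
y_predict = j_best - 1;   % map the Matlab index back to the label 0/1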
NB: Gaussian inputs vs discrete inputs

Gaussian inputs: in class. Discrete inputs: in the homework.
* See suggested reading for Gaussian NB.

NB: performance

Classification performance is not the same as error rate. With prediction
f(X_1, X_2) and truth Y, performance on a dataset is the fraction of correct
predictions:

  \frac{\sum I(f(X_1, X_2) = Y)}{|Y|}

NB: error rate

The error rate weights each mistake f(X_1, X_2) != Y (prediction vs truth) by
the probability of that example under P(X_1, X_2, Y):

  \sum_{X_1, X_2, Y} I(f(X_1, X_2) \neq Y) \, P(X_1, X_2, Y)

Logistic regression

  P(Y = 1 | X, w) = g(w_0 + \sum_i w_i x_i),  where  g(z) = \frac{1}{1 + e^{-z}}

With the convention x_0 = 1, the bias term folds into the sum:

  g(w_0 + \sum_{i=1}^n w_i x_i) = g(\sum_{i=0}^n w_i x_i)

LR: learning the weights

MLE: maximize the (conditional) log-likelihood

  \ln P(D_Y | D_X, w) = \sum_j \ln P(y^j | x^j, w)

Gradient ascent: wikipedia, or class on Monday. The log-likelihood is concave,
so gradient ascent reaches the global maximum.

LR: gradient ascent

  w \leftarrow w + \eta \frac{\partial}{\partial w} \sum_j \ln P(y^j | x^j, w)

For all features i:

  w_i^{t+1} \leftarrow w_i^t + \eta \sum_j x_i^j [ y^j - P(Y^j = 1 | x^j, w^t) ]

Repeat until the estimate doesn't change "much".
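The update above is easy to write in the same Matlab setting as the NB snippet.
Below is a minimal batch gradient-ascent sketch, not code from class: it assumes
X is num_data_points x x_dim with one example per row, Y is a 0/1 column vector,
and the step size eta, iteration cap, and stopping tolerance are arbitrary
placeholder choices.

% Sketch: batch gradient ascent for logistic regression.
Xa  = [ones(size(X,1), 1), X];   % prepend x_0 = 1 to absorb w_0
w   = zeros(size(Xa, 2), 1);     % weights; w(1) is the bias w_0
eta = 0.1;                       % step size (arbitrary)
for t = 1:1000
    p = 1 ./ (1 + exp(-Xa * w));   % P(Y = 1 | x^j, w) for every example
    grad = Xa' * (Y - p);          % sum_j x_i^j [ y^j - P(Y^j = 1 | x^j, w) ]
    w_new = w + eta * grad;
    if norm(w_new - w) < 1e-6      % stop when the estimate barely changes
        w = w_new;
        break;
    end
    w = w_new;
end

With w learned, P(Y = 1 | x, w) for a new example x (row vector) is
1 ./ (1 + exp(-[1, x] * w)).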
LR: classification rule

Classification rule: see class notes.

Information theory

Quick intro; this will be covered in class.

Entropy:

  H(Y) = -\sum_{i=1}^k P(Y = y_i) \log_2 P(Y = y_i)

On average, the smallest number of bits needed to transmit values drawn from
Y's distribution.

Information gain:

  IG(Y | X) = H(Y) - H(Y | X)

How much better can we do if we share knowledge about X?

Conditional entropy:

  H(Y | X) = -\sum_{j=1}^v P(X = x_j) \sum_{i=1}^k P(Y = y_i | X = x_j) \log_2 P(Y = y_i | X = x_j)

Decision trees

ID3: check wikipedia.

Credits

Andrew Moore's lectures: http://www.autonlab.org/tutorials/
Carlos Guestrin's lectures: http://select.cs.cmu.edu/class/10701-F09/schedule.html
Tommi Jaakkola's lectures
Videos: CMU graphics lab

Homework questions?