CMU CS 10601 - Logistic regression

Logistic Regression
Machine Learning 10-601
Tom M. Mitchell, Machine Learning Department, Carnegie Mellon University
February 11, 2008

Required reading: Mitchell draft chapter on "Naïve Bayes and Logistic Regression" (available on the class website).

Naïve Bayes in a Nutshell

Bayes rule:
  P(Y=y_k \mid X_1,\dots,X_n) = \frac{P(Y=y_k)\,P(X_1,\dots,X_n \mid Y=y_k)}{\sum_j P(Y=y_j)\,P(X_1,\dots,X_n \mid Y=y_j)}

Assuming conditional independence among the X_i:
  P(Y=y_k \mid X_1,\dots,X_n) = \frac{P(Y=y_k)\,\prod_i P(X_i \mid Y=y_k)}{\sum_j P(Y=y_j)\,\prod_i P(X_i \mid Y=y_j)}

So the classification rule for X^{new} = \langle X_1,\dots,X_n \rangle is:
  Y^{new} \leftarrow \arg\max_{y_k} P(Y=y_k)\,\prod_i P(X_i^{new} \mid Y=y_k)

Logistic Regression

Idea:
• Naïve Bayes implies some functional form for P(Y|X).
• Why not derive that form for P(Y|X) and then learn its parameters directly?
• Consider learning f: X → Y, where
  • X is a vector of real-valued features \langle X_1, \dots, X_n \rangle
  • Y is boolean
  • all X_i are conditionally independent given Y
  • P(X_i \mid Y=y_k) is modeled as Gaussian N(\mu_{ik}, \sigma_i)
  • P(Y) is modeled as Bernoulli(\pi)
• What does that imply about the form of P(Y|X)?

Deriving the form of P(Y|X) for continuous X_i

By Bayes rule,
  P(Y=1 \mid X) = \frac{P(Y=1)\,P(X \mid Y=1)}{P(Y=1)\,P(X \mid Y=1) + P(Y=0)\,P(X \mid Y=0)} = \frac{1}{1 + \exp(-a)},
  where a = \ln\frac{\pi}{1-\pi} + \sum_i \ln\frac{P(X_i \mid Y=1)}{P(X_i \mid Y=0)}.

Logistic function

Because each P(X_i \mid Y=y_k) is Gaussian with a class-independent variance \sigma_i, each log ratio above is linear in X_i, so a = w_0 + \sum_i w_i X_i and P(Y|X) takes the form of the logistic (sigmoid) function:
  P(Y=1 \mid X) = \frac{\exp(w_0 + \sum_i w_i X_i)}{1 + \exp(w_0 + \sum_i w_i X_i)}, \qquad
  P(Y=0 \mid X) = \frac{1}{1 + \exp(w_0 + \sum_i w_i X_i)}

Very convenient! This form implies
  \frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = \exp\Big(w_0 + \sum_i w_i X_i\Big),
which implies
  \ln\frac{P(Y=1 \mid X)}{P(Y=0 \mid X)} = w_0 + \sum_i w_i X_i,
which implies a linear classification rule: assign Y=1 if w_0 + \sum_i w_i X_i > 0, and Y=0 otherwise.

Logistic regression more generally

In the more general case, where y ∈ {y_1, ..., y_R}, learn R-1 sets of weights:
  for k < R:  P(Y=y_k \mid X) = \frac{\exp(w_{k0} + \sum_i w_{ki} X_i)}{1 + \sum_{j=1}^{R-1} \exp(w_{j0} + \sum_i w_{ji} X_i)}
  for k = R:  P(Y=y_R \mid X) = \frac{1}{1 + \sum_{j=1}^{R-1} \exp(w_{j0} + \sum_i w_{ji} X_i)}

Training Logistic Regression: MCLE

• Choose parameters W = \langle w_0, \dots, w_n \rangle to maximize the conditional likelihood of the training data.
• Training data D = \{\langle X^1, Y^1 \rangle, \dots, \langle X^L, Y^L \rangle\}
• Data likelihood = \prod_l P(X^l, Y^l \mid W)
• Data conditional likelihood = \prod_l P(Y^l \mid X^l, W), so we choose
  W_{MCLE} \leftarrow \arg\max_W \prod_l P(Y^l \mid X^l, W)

Expressing the Conditional Log Likelihood

  l(W) = \sum_l Y^l \ln P(Y^l=1 \mid X^l, W) + (1 - Y^l) \ln P(Y^l=0 \mid X^l, W)
       = \sum_l Y^l \Big(w_0 + \sum_i w_i X_i^l\Big) - \ln\Big(1 + \exp\big(w_0 + \sum_i w_i X_i^l\big)\Big)

Maximizing the Conditional Log Likelihood

Good news: l(W) is a concave function of W.
Bad news: there is no closed-form solution that maximizes l(W).

Maximizing the Conditional Log Likelihood: Gradient Ascent

Gradient ascent algorithm: iterate until the change is smaller than ε. Repeat: for all i,
  w_i \leftarrow w_i + \eta \sum_l X_i^l \Big(Y^l - \hat{P}(Y^l=1 \mid X^l, W)\Big)

That's all for M(C)LE. How about MAP?

• One common approach is to define a prior on W: a Normal distribution with zero mean and identity covariance.
• This helps avoid very large weights and overfitting.
• MAP estimate: W \leftarrow \arg\max_W \ln\Big[P(W)\,\prod_l P(Y^l \mid X^l, W)\Big]

MLE vs MAP

• Maximum conditional likelihood estimate: W \leftarrow \arg\max_W \ln \prod_l P(Y^l \mid X^l, W)
• Maximum a posteriori estimate: W \leftarrow \arg\max_W \ln\Big[P(W)\,\prod_l P(Y^l \mid X^l, W)\Big]
  With a zero-mean Gaussian prior of precision λ, this adds a penalty -\frac{\lambda}{2}\sum_i w_i^2 to l(W), and the gradient ascent update becomes
  w_i \leftarrow w_i + \eta\Big(-\lambda w_i + \sum_l X_i^l \big(Y^l - \hat{P}(Y^l=1 \mid X^l, W)\big)\Big)
(A small code sketch of both updates follows.)
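To make the updates above concrete, here is a minimal NumPy sketch of batch gradient ascent on the conditional log likelihood, with an optional Gaussian-prior (MAP) penalty. The function and parameter names (train_logistic, eta, lam, n_iters) are illustrative and not from the slides.

```python
import numpy as np

def train_logistic(X, Y, eta=0.01, lam=0.0, n_iters=1000):
    """Batch gradient ascent for logistic regression (MCLE, or MAP when lam > 0).

    Sketch under the lecture's setup: X is an (L, n) array of real-valued
    features, Y an (L,) array of 0/1 labels, eta a constant step size,
    and lam the precision of the zero-mean Gaussian prior on the weights.
    """
    L, n = X.shape
    Xb = np.hstack([np.ones((L, 1)), X])    # prepend a column of 1s so w[0] plays the role of w_0
    w = np.zeros(n + 1)
    for _ in range(n_iters):
        z = Xb @ w                          # w_0 + sum_i w_i X_i for every example
        p1 = 1.0 / (1.0 + np.exp(-z))       # P_hat(Y=1 | X, W), the logistic function
        grad = Xb.T @ (Y - p1) - lam * w    # gradient of l(W), plus the prior term when lam > 0
        w += eta * grad                     # ascend the (penalized) conditional log likelihood
    return w

def predict(w, X):
    """Linear classification rule: predict Y=1 iff w_0 + sum_i w_i X_i > 0."""
    return (w[0] + X @ w[1:] > 0).astype(int)
```

The slides' stopping criterion is "iterate until the change < ε"; the sketch instead runs a fixed number of iterations, and in practice one would also scale the features and tune η, since plain gradient ascent can converge slowly.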
Generative vs. Discriminative Classifiers

Training a classifier involves estimating f: X → Y, or P(Y|X).

Generative classifiers (e.g., Naïve Bayes):
• Assume some functional form for P(X|Y) and P(Y).
• Estimate the parameters of P(X|Y) and P(Y) from training data.
• Use Bayes rule to calculate P(Y \mid X = x^i).

Discriminative classifiers (e.g., Logistic Regression):
• Assume some functional form for P(Y|X).
• Estimate the parameters of P(Y|X) directly from training data.

Naïve Bayes vs. Logistic Regression

Consider Y boolean and X_i continuous, X = \langle X_1, \dots, X_n \rangle.

Number of parameters to estimate:
• NB: 4n + 1 (a mean \mu_{ik} and a variance \sigma_{ik} for each of the n features and each of the two classes, plus the prior \pi).
• LR: n + 1 (the weights w_0, \dots, w_n).

Estimation method:
• NB parameter estimates are uncoupled: each is a simple count or average computed independently of the others.
• LR parameter estimates are coupled: all weights are fit jointly by maximizing the conditional likelihood.
(A sketch of the uncoupled NB estimates appears at the end of these notes.)

Gaussian Naïve Bayes vs. Logistic Regression: asymptotic comparison (as # training examples → infinity) [Ng & Jordan, 2002]
• When the model assumptions are correct, GNB and LR produce identical classifiers (in fact, the Bayes optimal classifier).
• When the model assumptions are incorrect, LR is less biased (it does not assume conditional independence) and is therefore expected to outperform GNB.

Gaussian Naïve Bayes vs. Logistic Regression: non-asymptotic analysis (see [Ng & Jordan, 2002])
• Convergence rate of the parameter estimates: how many training examples are needed to ensure good estimates?
  • GNB: order log n examples (where n = # of attributes in X).
  • LR: order n examples.
• GNB therefore converges more quickly to its (perhaps less accurate) asymptotic estimates.

Rate of convergence: logistic regression [Ng & Jordan, 2002]
Let h_{Dis,m} be logistic regression trained on m examples in n dimensions. Then, with high probability, the error of h_{Dis,m} exceeds the error of its asymptotic classifier h_{Dis,\infty} by at most a term that shrinks on the order of \sqrt{n/m} (up to logarithmic factors). Implication: if we want that excess error to be below some constant \epsilon_0, it suffices to pick m of order n examples, i.e., logistic regression converges to its asymptotic classifier in order n examples. (The result follows from Vapnik's structural risk bound, plus the fact that the VC dimension of n-dimensional linear separators is n.)

Rate of convergence: naïve Bayes parameters [Ng & Jordan, 2002]
An analogous argument shows that the naïve Bayes parameter estimates converge to their asymptotic values with only order log n training examples.

Some experiments from UCI data sets
[Figure: plots comparing GNB and logistic regression error as a function of training-set size on UCI data sets, from Ng & Jordan, 2002; the plots are not reproduced in this text extract.]

What you should know:
• Logistic regression
  – Its functional form follows from the Naïve Bayes assumptions:
    • for Gaussian Naïve Bayes assuming variance \sigma_{i,k} = \sigma_i,
    • for discrete-valued Naïve Bayes too.
  – But the training procedure picks the parameters without using the conditional independence assumption.
  – MLE training: pick W to maximize P(Y \mid X, W).
  – MAP training: pick W to maximize P(W \mid X, Y)
    • 'regularization'
    • helps reduce overfitting
• Gradient ascent/descent
  – A general approach when closed-form solutions are unavailable.
• Generative vs. discriminative classifiers
  – Bias vs. variance tradeoff.
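To make the "uncoupled vs. coupled" contrast concrete, here is a minimal NumPy sketch of Gaussian Naïve Bayes training and prediction under the lecture's setup (Y boolean, X_i continuous); the names fit_gnb and predict_gnb are illustrative and not from the slides. Each of the 4n + 1 parameters (per-class means and variances for every feature, plus the class prior π) is a simple average computed independently of the others, in contrast to the jointly optimized weights in the logistic regression sketch above.

```python
import numpy as np

def fit_gnb(X, Y):
    """Gaussian Naive Bayes: estimate pi, mu_{ik}, sigma^2_{ik} (4n + 1 parameters).

    Each estimate is an uncoupled sample average over the training data;
    no joint optimization is needed.
    """
    params = {"pi": Y.mean()}                        # Bernoulli prior P(Y=1)
    for k in (0, 1):
        Xk = X[Y == k]
        params[f"mu_{k}"] = Xk.mean(axis=0)          # n class-conditional means
        params[f"var_{k}"] = Xk.var(axis=0) + 1e-9   # n class-conditional variances (small floor for stability)
    return params

def predict_gnb(params, X):
    """Classification rule: argmax_k P(Y=y_k) * prod_i N(X_i; mu_{ik}, sigma^2_{ik})."""
    log_post = []
    for k, prior in ((0, 1.0 - params["pi"]), (1, params["pi"])):
        mu, var = params[f"mu_{k}"], params[f"var_{k}"]
        log_lik = -0.5 * (np.log(2.0 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
        log_post.append(np.log(prior) + log_lik)
    return (log_post[1] > log_post[0]).astype(int)
```

Because fitting reduces to simple averages, GNB reaches its asymptotic parameter estimates quickly (order log n examples), whereas the coupled logistic regression fit needs order n examples, as discussed above.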