Logistic Regression and Newton's Method

36-350, Data Mining
18 November 2009

Readings in textbook: Sections 10.7 (logistic regression), sections 8.1 and 8.3 (optimization), and 11.3 (generalized linear models).

Contents

1 Logistic Regression
  1.1 Likelihood Function for Logistic Regression
  1.2 Logistic Regression with More Than Two Classes
2 Newton's Method for Numerical Optimization
  2.1 Newton's Method in More than One Dimension
3 Generalized Linear Models and Generalized Additive Models
  3.1 Generalized Additive Models
  3.2 An Example (Including Model Checking)

Last time, we looked at the situation where we have a vector of input features[1] $X$, and we want to predict a binary class $Y$. In that lecture, the two classes were $Y = +1$ and $Y = -1$; in this one, it will simplify the book-keeping to make the classes $Y = 1$ and $Y = 0$.

[1] If we have some discrete features, we can handle them through indicator variables, as in linear regression.

1 Logistic Regression

A linear classifier as such doesn't give us probabilities for the classes in any particular case. But we've seen that we often want such probabilities: to handle different error costs between classes, or to give us some indication of confidence for bet-hedging, or (perhaps most important) when perfect classification isn't possible. People sometimes try to get conditional probabilities for classes by learning a linear classifier, and then saying the class probabilities at a point depend on its margin from the boundary, but this is a dubious hack, and should be avoided unless you want to descend to the level of bad psychologists and economists. If you want to estimate probabilities, fit a stochastic model.

Several steps above that level, one can think like an old-school statistician, and ask "how can I use linear regression on this problem?"

1. The most obvious idea is to let $\Pr(Y = 1 \mid X = x)$, $p(x)$ for short, be a linear function of $x$. Every increment of a component of $x$ would add or subtract so much to the probability. The conceptual problem here is that $p$ must be
between 0 and 1, and linear functions are unbounded.

2. The next most obvious idea is to let $\log p(x)$ be a linear function of $x$, so that changing an input variable multiplies the probability by a fixed amount. The problem is that logarithms are unbounded in only one direction, and linear functions are not.

3. Finally, the easiest modification of $\log p$ which has an unbounded range is the logistic (or logit) transformation, $\log \frac{p}{1-p}$. We can make this a linear function of $x$ without fear of nonsensical results. (Of course the results could still happen to be wrong, but they're not guaranteed to be wrong.)

This last alternative is logistic regression.

Formally, the logistic regression model is that

\[ \log \frac{p(x)}{1 - p(x)} = b + x \cdot w \qquad (1) \]

Solving for $p$, this gives

\[ p = \frac{e^{b + x \cdot w}}{1 + e^{b + x \cdot w}} = \frac{1}{1 + e^{-(b + x \cdot w)}} \qquad (2) \]

Notice that the overall specification is a lot easier to grasp in terms of the transformed probability than in terms of the untransformed probability.[2]

[2] Unless you've taken statistical mechanics, in which case you recognize that this is the Boltzmann distribution for a system with two states, which differ in energy by $b + x \cdot w$.

Recall that to minimize the mis-classification rate, we should predict $Y = 1$ when $p \geq 0.5$ and $Y = 0$ when $p < 0.5$. By Eq. 1, $p \geq 0.5$ exactly when the log odds are non-negative, so this means guessing 1 whenever $b + x \cdot w$ is non-negative, and 0 otherwise. So logistic regression gives us a linear classifier, like we saw last time. Recall further that the distance from the decision boundary is $\frac{b}{\|w\|} + x \cdot \frac{w}{\|w\|}$. So logistic regression not only says where the boundary between the classes is, but also says (via Eq. 2) that the class probabilities depend on distance from the boundary, in a particular way, and that they go towards the extremes (0 and 1) more rapidly when $\|w\|$ is larger. It's these statements about probabilities which make logistic regression more than just a linear classifier. It makes stronger, more detailed predictions, and can be fit in a different way; but those strong predictions could be wrong.

Figure 1: Effects of scaling logistic regression parameters (four panels, each plotting $x_2$ against $x_1$). Values of $x_1$ and $x_2$ are the same in all plots ($\sim \mathrm{Unif}(-1, 1)$ for both coordinates), but labels were generated randomly from logistic regressions with $b = -0.1$, $w = (-0.2, 0.2)$ (top left); from $b = -0.5$, $w = (-1, 1)$ (top right); from $b = -2.5$, $w = (-5, 5)$ (bottom left); and from a perfect linear classifier with the same boundary. The large black dot is the origin.

Using logistic regression to predict class probabilities is a modeling choice, just like it's a modeling choice to predict quantitative variables with linear regression. In neither case is the appropriateness of the model guaranteed by the gods, nature, mathematical necessity, etc. We begin by positing the model, to get something to work with, and we end (if we know what we're doing) by checking whether it really does match the data, or whether it has systematic flaws.

Logistic regression is one of the most commonly used tools for applied statistics and data mining. There are basically four reasons for this.

1. Tradition.

2. In addition to the heuristic approach above, the quantity $\log \frac{p}{1-p}$ plays an important role in the analysis of contingency tables (the "log odds"). Classification is a bit like having a contingency table with two columns (classes) and infinitely many rows (values of $x$). With a finite contingency table, we can estimate the log odds for each row empirically, by just taking counts in the table. With infinitely many rows, we need some sort of interpolation scheme; logistic regression is linear interpolation for the log odds.

3. It's closely related to "exponential family" distributions, where the probability of some vector $v$ is proportional to $\exp\left( w_0 + \sum_{j=1}^{m} f_j(v) w_j \right)$. If one of the components of $v$ is binary, and the functions $f_j$ are all the identity function,
then we get a logistic regression. Exponential families arise in many contexts …
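
As a concrete illustration of Eqs. 1 and 2 and of the classification rule from Section 1, here is a minimal Python sketch. It is not code from the notes: the function names (`log_odds`, `prob`, `predict`, `signed_distance`) and the parameter values are hypothetical, chosen only to show that rescaling $b$ and $w$ together leaves the boundary and the predicted class unchanged while pushing the probability further from 0.5.

```python
import math

def log_odds(x, b, w):
    """Linear predictor b + x . w, the right-hand side of Eq. 1."""
    return b + sum(xi * wi for xi, wi in zip(x, w))

def prob(x, b, w):
    """p(x) from Eq. 2, in the form 1 / (1 + exp(-(b + x . w))).

    Note: math.exp raises OverflowError for very large arguments, so this
    naive form is only suitable for moderate log odds.
    """
    return 1.0 / (1.0 + math.exp(-log_odds(x, b, w)))

def predict(x, b, w):
    """Guess 1 when b + x . w >= 0 (equivalently p >= 0.5), else 0."""
    return 1 if log_odds(x, b, w) >= 0 else 0

def signed_distance(x, b, w):
    """Signed distance from the decision boundary: (b + x . w) / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return log_odds(x, b, w) / norm

# Illustrative values only (assumptions, not taken from the notes).
# The second parameter pair is the first multiplied by 10, so both
# describe the same boundary but the second has a larger ||w||.
x = (0.5, -0.25)
small = (0.1, (0.2, 0.2))
large = (1.0, (2.0, 2.0))

for b, w in (small, large):
    print("class:", predict(x, b, w),
          "distance:", signed_distance(x, b, w),
          "p:", prob(x, b, w))
```

Both parameter settings put `x` on the same side of the boundary and at the same distance from it, but the larger-norm setting reports a probability closer to 1, which is the effect Figure 1 describes.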


CMU STA 36350 - Lecture
