Stanford LING 289 - Logistic Regression

Logistic regression with R

Christopher Manning
4 November 2007

1  Theory

We can transform the output of a linear regression to be suitable for probabilities by using a logit link function on the lhs as follows:

    \mathrm{logit}(p) = \log o = \log \frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k    (1)

The odds[1] can vary on a scale of (0, \infty), so the log odds can vary on the scale of (-\infty, \infty), which is precisely what we get from the rhs of the linear model. For a real-valued explanatory variable x_i, the intuition here is that a unit additive change in the value of the variable should change the odds by a constant multiplicative amount. Exponentiating, this is equivalent to:

    e^{\mathrm{logit}(p)} = e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k}    (2)

    o = \frac{p}{1 - p} = e^{\beta_0} e^{\beta_1 x_1} e^{\beta_2 x_2} \cdots e^{\beta_k x_k}    (3)

The inverse of the logit function is the logistic function: if \mathrm{logit}(p) = z, then

    p = \frac{e^z}{1 + e^z}

The logistic function will map any value of the right-hand side, z, to a proportion value between 0 and 1, as shown in figure 1.

[Figure 1: The logistic function, mapping z in (-\infty, \infty) to p in (0, 1).]

[1] Note that we can convert freely between a probability p and odds o for an event versus its complement: o = p / (1 - p); p = o / (o + 1).
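These mappings are easy to check numerically. Here is a small sketch in R (mine, not part of the handout) of the logit and logistic functions, together with the probability/odds conversions of footnote [1]:

    # Sketch (not from the handout): the logit link, its inverse (the
    # logistic function), and the probability/odds conversions.
    logit    <- function(p) log(p / (1 - p))        # probability -> log odds
    logistic <- function(z) exp(z) / (1 + exp(z))   # log odds -> probability

    p <- 0.75
    o <- p / (1 - p)       # odds: 0.75 / 0.25 = 3
    logit(p)               # log odds: log(3) = 1.0986...
    logistic(logit(p))     # the round trip recovers p = 0.75
    o / (o + 1)            # odds back to probability: 0.75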
Note a common case with categorical data: if our explanatory variables x_i are all binary, then for the ones that are false (0), we get e^0 = 1 and the term disappears. Similarly, if x_i = 1, then e^{\beta_i x_i} = e^{\beta_i}. So we are left with terms for only the x_i that are true (1). For instance, if x_3 = x_4 = x_7 = 1 only, we have:

    \mathrm{logit}(p) = \beta_0 + \beta_3 + \beta_4 + \beta_7    (4)

    o = e^{\beta_0} e^{\beta_3} e^{\beta_4} e^{\beta_7}    (5)

The intuition here is that if I know that a certain fact is true of a data point, then that will produce a constant change in the odds of the outcome ("If he's European, that doubles the odds that he smokes").
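To make the multiplicative reading concrete, here is a sketch with invented numbers (the coefficients are hypothetical, not from any fitted model): a binary predictor whose coefficient is log 2 exactly doubles the odds when it flips from 0 to 1.

    # Sketch with invented coefficients: a binary predictor with
    # beta = log(2) multiplies the odds by exactly 2 when it is true.
    b0 <- -1.5        # hypothetical intercept: log odds when all x_i = 0
    b1 <- log(2)      # hypothetical coefficient for a binary predictor

    odds0 <- exp(b0)        # odds when x1 = 0
    odds1 <- exp(b0 + b1)   # odds when x1 = 1
    odds1 / odds0           # exp(b1) = 2: the odds double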


Let L = L(D; B) be the likelihood of the data D given the model, where B = {\beta_0, ..., \beta_k} are the parameters of the model. The parameters are estimated by the principle of maximum likelihood. (Technical point: there is no error term in a logistic regression, unlike in linear regressions.)

2  Basic R logistic regression models

We will illustrate with the Cedegren dataset on the website:

    cedegren <- read.table("cedegren.txt", header = T)

You need to create a two-column matrix of success/failure counts for your response variable. You cannot just use percentages. (You can give proportions, but then you must weight them by a count of successes plus failures; a sketch of that alternative follows the model fit below.)

    attach(cedegren)
    ced.del <- cbind(sDel, sNoDel)

Make the logistic regression model. The shorter second form is equivalent to the first, but don't omit specifying the family:

    ced.logr <- glm(ced.del ~ cat + follows + factor(class), family = binomial("logit"))
    ced.logr <- glm(ced.del ~ cat + follows + factor(class), family = binomial)
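For the record, here is a sketch of the proportion-plus-weights alternative mentioned above; assuming the same sDel and sNoDel columns, it should reproduce the count-matrix fit.

    # Sketch: the equivalent proportion + weights form of the same model.
    n    <- sDel + sNoDel   # total observations per cell
    prop <- sDel / n        # proportion of deletions per cell
    ced.logr.w <- glm(prop ~ cat + follows + factor(class),
                      family = binomial, weights = n)
    # Same coefficient estimates as the cbind(sDel, sNoDel) fit above.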

The output, in more and less detail:

    > ced.logr
    Call:  glm(formula = ced.del ~ cat + follows + factor(class), family = binomial("logit"))

    Coefficients:
       (Intercept)            catd            catm            catn            catv        followsP
           -1.3183         -0.1693          0.1786          0.6667         -0.7675          0.9525
          followsV  factor(class)2  factor(class)3  factor(class)4
            0.5341          1.2704          1.0480          1.3742

    Degrees of Freedom: 51 Total (i.e. Null);  42 Residual
    Null Deviance:      958.7
    Residual Deviance: 198.6        AIC: 446.1

    > summary(ced.logr)
    Call:
    glm(formula = ced.del ~ cat + follows + factor(class), family = binomial("logit"))

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -3.24384  -1.34325   0.04954   1.01488   6.40094

    Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
    (Intercept)     -1.31827    0.12221 -10.787  < 2e-16
    catd            -0.16931    0.10032  -1.688 0.091459
    catm             0.17858    0.08952   1.995 0.046053
    catn             0.66672    0.09651   6.908 4.91e-12
    catv            -0.76754    0.21844  -3.514 0.000442
    followsP         0.95255    0.07400  12.872  < 2e-16
    followsV         0.53408    0.05660   9.436  < 2e-16
    factor(class)2   1.27045    0.10320  12.310  < 2e-16
    factor(class)3   1.04805    0.10355  10.122  < 2e-16
    factor(class)4   1.37425    0.10155  13.532  < 2e-16

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 958.66  on 51  degrees of freedom
    Residual deviance: 198.63  on 42  degrees of freedom
    AIC: 446.10

    Number of Fisher Scoring iterations: 4
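Since the estimates are on the log odds scale of equation (1), exponentiating them reads each one, per equation (3), as a multiplicative effect on the odds:

    # Coefficients are log odds; their exponentials multiply the odds.
    exp(coef(ced.logr))
    # e.g. exp(1.27045) is about 3.6, so class 2 speakers have roughly 3.6
    # times the odds of deletion of the default class 1, other things equal.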

Residual deviance is the difference in G^2 = -2 \log L between a maximal model that has a separate parameter for each cell in the model and the built model. Changes in the deviance (the difference in the quantity -2 \log L for two models, which can be nested in a reduction) will be approximately \chi^2-distributed with dof equal to the change in the number of estimated parameters. Thus the difference in deviances can be tested against the \chi^2 distribution for significance. The same concerns about this approximation being valid only for reasonably sized expected counts (as with contingency tables and multinomials in Suppes (1970)) still apply here, but we, like most people, ignore this caution and use the statistic as a rough indicator when exploring to find good models.

We're usually mainly interested in the relative goodness of models, but, nevertheless, the high residual deviance shows that the model cannot be accepted to have been likely to generate the data (pchisq(198.63, 42) ≈ 1). However, it certainly fits the data better than the null model (in which a fixed mean probability of deletion is used for all cells): pchisq(958.66 - 198.63, 9) ≈ 1.
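One caution on reading those pchisq() values: pchisq() returns the lower tail, so a value of 1 means the test statistic sits far out in the right tail. The p-values themselves come from the upper tail:

    # pchisq() gives the lower tail; the p-value of a deviance difference
    # is the upper tail.
    pchisq(958.66 - 198.63, df = 9, lower.tail = FALSE)  # ~0: far better than null
    pchisq(198.63, df = 42, lower.tail = FALSE)          # ~0: real lack of fit
                                                         # vs. the maximal model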
What can we see from the parameters of this model? catd and catm have different effects, but both are not very clearly significantly different from the effect of cata, the default value. All following environments seem distinctive. For class, all of class 2-4 seem to have somewhat similar effects, and we might model class as a two-way distinction. It seems like we cannot profitably drop a whole factor, but we can test that with the anova function (to give an analysis of deviance table) or the drop1 function (to try dropping each factor):

    > anova(ced.logr, test = "Chisq")
    Analysis of Deviance Table

    Model: binomial, link: logit
    Response: ced.del
    Terms added sequentially (first to last)

                   Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
    NULL                              51     958.66
    cat             4   314.88        47     643.79  6.690e-67
    follows         2   228.86        45     414.93  2.011e-50
    factor(class)   3   216.30        42     198.63  1.266e-46

    > drop1(ced.logr, test = "Chisq")
    Single term deletions

    Model:
    ced.del ~ cat + follows + factor(class)
                   Df Deviance    AIC    LRT   Pr(Chi)
    <none>              198.63 446.10
    cat             4   368.76 608.23 170.13 < 2.2e-16
    follows         2   424.53 668.00 225.91 < 2.2e-16
    factor(class)   3   414.93 656.39 216.30 < 2.2e-16

The …
