STAT 333 Discussion 13 May 1, 2013Logistic regression for binary response variables• Let pi= Pr(Yi= 1|Xi= xi),logpi1 − pi= b0+ b1xi• model fit: out=glm(y~x, family=binomial), summary, anova, step• predict: predict(out, newdata=data.frame(x=1), type="response")• no adjusted R2; use AIC.Example-runoff data:Data were collected over a 4-year period from a Madison home. The objective is to predict the probability that arain storm produces surface runoff using the information of the total amount of precipitation, the storm intensityand duration.> runoff=read.table("runoff.txt",header=T)> runoff[1:5,]duration precip intens runoff_event1 111 0.47 0.96 02 473 0.34 0.18 03 108 0.16 0.24 04 290 0.33 0.24 05 34 0.12 0.30 0> dim(runoff)[1] 231 4> out1=glm(runoff_event~duration+precip+intens, family=binomial, data=runoff)> summary(out1)Call:glm(formula = runoff_event ~ duration + precip + intens, family = binomial,data = runoff)Deviance Residuals:Min 1Q Median 3Q Max-1.7329 -0.3444 -0.2152 -0.1646 2.5936Coefficients:Estimate Std. Error z value Pr(>|z|)(Intercept) -5.0170706 0.7356896 -6.820 9.13e-12duration 0.0006898 0.0023334 0.296 0.767519precip 2.5412219 1.1359910 2.237 0.025286intens 1.9567206 0.5555283 3.522 0.000428(Dispersion parameter for binomial family taken to be 1)Null deviance: 227.82 on 230 degrees of freedomResidual deviance: 116.02 on 227 degrees of freedom1AIC: 124.02Number of Fisher Scoring iterations: 6> # we delete the unsignificant predictor "duration" and refit the model again.> # carefully interpret the meaning of estimated coefficient> # what is the null deviance and residuals deviance> out2=glm(runoff_event~precip+intens, family=binomial, data=runoff)> summary(out2)Call:glm(formula = runoff_event ~ precip + intens, family = binomial,data = runoff)Deviance Residuals:Min 1Q Median 3Q Max-1.6970 -0.3523 -0.2146 -0.1631 2.6065Coefficients:Estimate Std. Error z value Pr(>|z|)(Intercept) -4.9017 0.6157 -7.961 1.70e-15precip 2.8148 0.6750 4.170 3.05e-05intens 1.8377 0.3753 4.896 9.78e-07(Dispersion parameter for binomial family taken to be 1)Null deviance: 227.82 on 230 degrees of freedomResidual deviance: 116.11 on 228 degrees of freedomAIC: 122.11Number of Fisher Scoring iterations: 6> # find the relationship between the following two values> predict(out2, newdata=data.frame(duration=100,precip=0.5,intens=1))1-1.656628> predict(out2, newdata=data.frame(duration=100,precip=0.5,intens=1), type="response")10.1602151> # explain two kinds of residuals: deviance residuals and pearson residuals> library(boot)> par(mfrow=c(2,2))> plot(out2, 1)> plot(log(fitted.values(out2)/(1-fitted.values(out2))),glm.diag(out2)$rd)> plot(glm.diag(out2)$h,glm.diag(out2)$rp)> plot(out2, 5)2−5 0 5 10 15−2 −1 0 1 2 3Predicted valuesResiduals●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●Residuals vs Fitted2107752●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●−5 0 5 10 15−1 0 1 2log(fitted.values(out2)/(1 − fitted.values(out2)))glm.diag(out2)$rd●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●0.00 0.02 0.04 0.06 0.08 0.10 0.12−2 0 2 4glm.diag(out2)$hglm.diag(out2)$rp0.00 0.02 0.04 0.06 0.08 0.10 0.12−2 0 2 4 6LeverageStd. Pearson resid.●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●Cook's distance0.51Residuals vs
View Full Document