
Stat 504, Lecture 10
Even More on Logistic Regression

Last time, we began to discuss various kinds of hypothesis tests in logistic regression. Let's discuss how to do them in SAS PROC LOGISTIC.

Testing H0: βj = 0 versus H1: βj ≠ 0. The Wald chi-square statistic z² = (β̂j / SE(β̂j))² for each of these tests is displayed along with the estimated coefficients in the "Analysis of Maximum Likelihood Estimates" section. A value of z² bigger than 3.84 indicates that we can reject the null hypothesis βj = 0 at the .05 level.

Testing the joint significance of all predictors. In the model

   \log\frac{\pi}{1-\pi} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p,

this is the test of H0: β1 = β2 = ··· = βp = 0 versus the alternative that at least one of the coefficients β1, ..., βp is not zero. In other words, this is testing the null hypothesis that an intercept-only model is correct, versus the alternative that the current model is correct. In the SAS output, three different chi-square statistics for this test are displayed in the section "Testing Global Null Hypothesis: Beta=0," corresponding to the likelihood-ratio, score, and Wald tests. This test has p degrees of freedom. Large chi-square statistics lead to small p-values and provide evidence against the intercept-only model in favor of the current model. If these three tests agree, that is evidence that the large-sample approximations are working well and the results are trustworthy. If the results from the three tests disagree, most statisticians would tend to trust the likelihood-ratio test more than the other two.

Testing that an arbitrary group of coefficients is zero. To test the null hypothesis that a group of k coefficients is zero, we need to fit two models:

• the reduced model, which omits the k predictors in question, and
• the current model, which includes them.

The null hypothesis is that the reduced model is true; the alternative is that the current model is true. To perform the test, we must look at the "Model Fit Statistics" section and examine the value of "-2 Log L" for "Intercept and Covariates." The likelihood-ratio statistic is

   \Delta G^2 = (-2 \log L \text{ from reduced model}) - (-2 \log L \text{ from current model})

and the degrees of freedom is k (the number of coefficients in question). The p-value is P(χ²_k ≥ ΔG²). Larger values of ΔG² lead to smaller p-values, which provide evidence against the reduced model in favor of the current model.

Another way to calculate the test statistic is

   \Delta G^2 = (G^2 \text{ from reduced model}) - (G^2 \text{ from current model}),

where the G²'s are the overall goodness-of-fit statistics, which we will mention next.
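Before turning to goodness of fit, here is a small sketch of how the ΔG² p-value can be computed in a SAS data step. This example is not from the original notes: the -2 Log L values (120.50 and 112.30), the number of tested coefficients (k = 3), and the dataset name lrtest are all hypothetical, chosen only for illustration; PROBCHI is the standard SAS chi-square CDF function.

   * Hypothetical -2 Log L values: 120.50 for the reduced model and
     112.30 for the current model, with k = 3 coefficients under test;
   data lrtest;
      delta_g2 = 120.50 - 112.30;            * likelihood-ratio statistic;
      df       = 3;                          * number of coefficients tested;
      p_value  = 1 - probchi(delta_g2, df);  * upper-tail chi-square probability;
   run;

   proc print data=lrtest;
   run;

With these made-up numbers, ΔG² = 8.20 on 3 degrees of freedom, giving a p-value of roughly 0.04, which would be evidence against the reduced model at the .05 level.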
Overall goodness-of-fit test. In grouped-data examples where the ni's are sufficiently large, we can also test the overall fit of our model versus a saturated alternative. That is, we test the null hypothesis that the current model is true, versus the alternative of a model that estimates a success probability πi independently for each row i = 1, 2, ..., N of the dataset.

As we discussed last time, this goodness-of-fit test is based on the Pearson statistic

   X^2 = \sum_{i=1}^{N} r_i^2, \qquad \text{where} \qquad
   r_i = \frac{y_i - \hat\mu_i}{\sqrt{\hat V(\hat\mu_i)}}
       = \frac{y_i - n_i\hat\pi_i}{\sqrt{n_i\hat\pi_i(1 - \hat\pi_i)}}

is the Pearson residual for case i. Or we can use the deviance statistic

   G^2 = 2 \sum_{i=1}^{N} \left[ y_i \log\frac{y_i}{\hat\mu_i}
         + (n_i - y_i) \log\frac{n_i - y_i}{n_i - \hat\mu_i} \right].

The degrees of freedom for this test is N minus the number of parameters being estimated in the current model (including an intercept, if present). The p-value is the area to the right of X² or G² under the chi-square density curve.

Notice that, for this test, large values of X² or G² lead to small p-values, which provide evidence that the current model does not fit. In all of the previous tests, small p-values led us to prefer the current model. But in this test, small p-values lead us to reject the current model.

To get PROC LOGISTIC to print out the test statistics X² and G², we need to include the option SCALE=NONE in the model statement, like this:

   model y/n = logconc / scale=none;

Applying this to the dose-response example from the last lecture, we get a new section in the output:

   Deviance and Pearson Goodness-of-Fit Statistics

   Criterion    DF     Value    Value/DF    Pr > ChiSq
   Deviance      3    0.5347      0.1782        0.9112
   Pearson       3    0.3610      0.1203        0.9482

The large p-values indicate that the model seems to fit well. However, in this example, the ni's are not large enough for the χ² approximation to be trustworthy. Recall from the last lecture that, in order for the approximation to work well, we need at least 80% of the expected counts in the N × 2 table to be at least 5.0, and none should be less than 1.0. (This is the N × 2 table with rows i = 1, ..., N corresponding to cases in the dataset, and columns corresponding to successes yi and failures ni − yi.)

SAS does not automatically calculate the estimated expected numbers of successes and failures, but they are easy to get. If we include the statement

   output out=results predicted=phat reschi=pearson resdev=deviance;

in the PROC LOGISTIC call, then SAS creates a new dataset called "results" that includes all of the variables in the original dataset, the predicted probabilities π̂i, the Pearson residuals, and the deviance residuals. Then we can add some code to calculate and print out the estimated expected numbers of successes μ̂i = ni π̂i and failures ni − μ̂i = ni(1 − π̂i).

A revised SAS program that does all this is shown below:

   options nocenter nodate nonumber linesize=72;

   data dose;
      input logconc n y;
      cards;
   -5 6 0
   -4 6 1
   -3 6 4
   -2 6 6
   -1 6 6
   ;

   proc logist data=dose;
      model y/n = logconc / scale=none;
      output out=results predicted=phat reschi=pearson resdev=deviance;
   run;

   data diagnostics;
      set results;
      shat = n*phat;
      fhat = n*(1-phat);
   run;

   proc print data=diagnostics;
      var logconc y n phat shat fhat pearson deviance;
   run;

Running this program gives a new output section:

   Obs  logconc  y  n     phat     shat     fhat    pearson  deviance
     1       -5  0  6  0.00809  0.04854  5.95146   -0.22121  -0.31221
     2       -4  1  6  0.12677  0.76060  5.23940    0.29375   0.28214
     3       -3  4  6  0.72098  4.32586  1.67414   -0.29660  -0.29130
     4       -2  6  6  0.97872  5.87232  0.12768    0.36119   0.50805
     5       -1  6  6  0.99878  5.99268  0.00732    0.08561   0.12104

Most of the "shat" and "fhat" values are less than 5.0, so the χ² approximation is not trustworthy.

Even if the goodness-of-fit tests based on X² or G² are not trustworthy, testing the relative fit of "nested" models based on differences in X² or G² can still be effective.
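For instance, here is a sketch (not part of the original notes) of such a nested comparison for the dose-response data, with the intercept-only model playing the role of the reduced model:

   * Reduced model: intercept only;
   proc logistic data=dose;
      model y/n = / scale=none;
   run;

   * Current model: includes logconc;
   proc logistic data=dose;
      model y/n = logconc / scale=none;
   run;

ΔG² is the difference of the two reported deviances (equivalently, of the two "-2 Log L" values for "Intercept and Covariates"), on 1 degree of freedom since a single coefficient is being tested. Because the reduced model here is the intercept-only model, this reproduces the likelihood-ratio entry of the "Testing Global Null Hypothesis: Beta=0" table, and both "-2 Log L" values can in fact be read from the "Model Fit Statistics" section of the second run alone, under "Intercept Only" and "Intercept and Covariates."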

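Finally, as a quick hand check of the residual formulas against the printed output (a verification added here, not part of the original notes): for Obs 1 we have y_1 = 0, n_1 = 6, π̂_1 = 0.00809, and μ̂_1 = 0.04854, so

   r_1 = \frac{y_1 - n_1\hat\pi_1}{\sqrt{n_1\hat\pi_1(1-\hat\pi_1)}}
       = \frac{0 - 0.04854}{\sqrt{6(0.00809)(0.99191)}} \approx -0.221,

   d_1 = -\sqrt{2\,(n_1 - y_1)\log\frac{n_1 - y_1}{n_1 - \hat\mu_1}}
       = -\sqrt{12\,\log\frac{6}{5.95146}} \approx -0.312,

which match the "pearson" and "deviance" columns. (In the deviance residual, the y_1 log(y_1/μ̂_1) term vanishes because y_1 = 0, and the residual takes the sign of y_1 − μ̂_1, which is negative here.)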