PSU STAT 504 - Logistic Regression - D3051727


School name Penn State University

Course Stat 504- Analysis of Discrete Data

Pages 17


Stat 504, Lecture 11

Still More About Logistic Regression

Last time, we were re-examining the 3 × 2 × 2 table that cross-classifies 800 boys according to socioeconomic status (S), whether a boy scout (B), and juvenile delinquency (D):

Socioeconomic   Boy      Delinquent
status          scout    Yes     No
Low             Yes       11     43
                No        42    169
Medium          Yes       14    104
                No        20    132
High            Yes        8    196
                No         2     59

We analyzed the BD marginal table and showed how this 2 × 2 analysis can be carried out as a logistic regression with a dummy variable. Now let's do a similar analysis for the 3 × 2 table that classifies subjects by S and D:

Socioeconomic   Delinquent
status          Yes     No
Low              53    212
Medium           34    236
High             10    255

Two odds ratios of interest are

\[ \frac{53 \times 236}{34 \times 212} = 1.735, \qquad \frac{53 \times 255}{10 \times 212} = 6.375. \]

We estimate that the odds of delinquency for the S = low group are 1.735 times as high as for the S = medium group, and 6.375 times as high as for the S = high group. The estimated log odds ratios are

\[ \log 1.735 = 0.5512, \qquad \log 6.375 = 1.852, \]

and their standard errors are

\[ \sqrt{\frac{1}{53} + \frac{1}{212} + \frac{1}{34} + \frac{1}{236}} = 0.2392, \qquad \sqrt{\frac{1}{53} + \frac{1}{212} + \frac{1}{10} + \frac{1}{255}} = 0.3571. \]

Now let's replicate this analysis using logistic regression. First, we re-express the data in terms of \(y_i\) = number of delinquents and \(n_i\) = number of boys for the three S-groups:

           y_i    n_i
Low         53    265
Medium      34    270
High        10    265

Then we define a pair of dummy indicators,

\[ X_1 = \begin{cases} 1 & \text{if S = medium}, \\ 0 & \text{otherwise}, \end{cases} \qquad X_2 = \begin{cases} 1 & \text{if S = high}, \\ 0 & \text{otherwise}. \end{cases} \]

Let \(\pi\) = probability of delinquency.
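Before the model is fitted, the odds ratios, log odds ratios, and standard errors quoted above can be checked with a few lines of arithmetic. The sketch below is written in Python rather than SAS, and the helper name `log_odds_ratio` is our own, not part of the original notes. It assumes the usual large-sample formula, in which the standard error of a 2 × 2 log odds ratio is the square root of the sum of the reciprocal cell counts.

```python
import math

# Delinquency counts (yes, no) from the 3 x 2 table, by socioeconomic status.
counts = {"low": (53, 212), "medium": (34, 236), "high": (10, 255)}

def log_odds_ratio(group, reference):
    """Return (odds ratio, log odds ratio, SE) for group relative to reference."""
    a, b = counts[group]        # yes / no in the group of interest
    c, d = counts[reference]    # yes / no in the comparison group
    odds_ratio = (a * d) / (c * b)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, math.log(odds_ratio), se

# Odds of delinquency for S = low relative to S = medium and S = high.
or_med, log_or_med, se_med = log_odds_ratio("low", "medium")
or_high, log_or_high, se_high = log_odds_ratio("low", "high")

print(f"low vs medium: OR={or_med:.3f}, log OR={log_or_med:.4f}, SE={se_med:.4f}")
print(f"low vs high:   OR={or_high:.3f}, log OR={log_or_high:.4f}, SE={se_high:.4f}")
```

The printed values should agree with the hand calculations above to the number of digits shown there.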
Then the model

\[ \log\frac{\pi}{1 - \pi} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 \]

says that the log-odds of delinquency are \(\beta_0\) for S=low, \(\beta_0 + \beta_1\) for S=medium, and \(\beta_0 + \beta_2\) for S=high. Therefore,

\[ \beta_1 = \text{log-odds for S=medium} - \text{log-odds for S=low}, \]
\[ \beta_2 = \text{log-odds for S=high} - \text{log-odds for S=low}, \]

and we expect to get \(\hat\beta_1 = -0.5512\) and \(\hat\beta_2 = -1.852\). The estimated intercept should be

\[ \hat\beta_0 = \log(53/212) = -1.386. \]

A SAS program for fitting this model is shown below.

    options nocenter nodate nonumber linesize=72;
    data new;
    input S $ y n;
    cards;
    low 53 265
    medium 34 270
    high 10 265
    ;
    proc logistic data=new;
    class S / order=data param=ref ref=first;
    model y/n = S / scale=none;
    run;

In the class statement, the option order=data tells SAS to sort the categories of S by the order in which they appear in the dataset rather than alphabetical order. The option param=ref tells SAS to create a set of two dummy variables to distinguish among the three categories. The option ref=first makes S=low the reference group (i.e. the group for which both dummy variables are zero). Some relevant portions of the output are shown below.

    Deviance and Pearson Goodness-of-Fit Statistics

    Criterion    DF     Value    Value/DF    Pr > ChiSq
    Deviance      0    0.0000        .           .
    Pearson       0    0.0000        .           .

    Number of events/trials observations: 3

    Model Fit Statistics

                               Intercept
                 Intercept        and
    Criterion      Only       Covariates
    AIC           593.053       560.801
    SC            597.738       574.855
    -2 Log L      591.053       554.801

    Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio       36.2523     2        <.0001
    Score                  32.8263     2        <.0001
    Wald                   27.7335     2        <.0001

    Type III Analysis of Effects

                      Wald
    Effect    DF    Chi-Square    Pr > ChiSq
    S          2       27.7335        <.0001

    Analysis of Maximum Likelihood Estimates

                                  Standard        Wald
    Parameter     DF   Estimate      Error  Chi-Square   Pr > ChiSq
    Intercept      1    -1.3863     0.1536     81.4848       <.0001
    S medium       1    -0.5512     0.2392      5.3080       0.0212
    S high         1    -1.8524     0.3571     26.9110       <.0001

    Odds Ratio Estimates

                          Point          95% Wald
    Effect               Estimate    Confidence Limits
    S medium vs low        0.576      0.361     0.921
    S high vs low          0.157      0.078     0.316

In this case, the "intercept only" model says that delinquency is unrelated to socioeconomic status, so the test of the global null hypothesis \(\beta_1 = \beta_2 = 0\) is equivalent to the usual test for independence in the 3 × 2 table. The estimated coefficients and SE's are as we predicted, and the estimated odds ratios are

\[ \exp(-0.5512) = 0.576 = 1/1.735, \qquad \exp(-1.852) = 0.157 = 1/6.375. \]

Collapsing and goodness of fit.
In this example, we could have also arranged the input data like this:

    S         B           y_i    n_i
    low       scout        11     54
    low       nonscout     42    211
    medium    scout        14    118
    medium    nonscout     20    152
    high      scout         8    204
    high      nonscout      2     61

A SAS program for fitting the same model is shown below.

    options nocenter nodate nonumber linesize=72;
    data new;
    input S $ B $ y n;
    cards;
    low scout 11 54
    low nonscout 42 211
    medium scout 14 118
    medium nonscout 20 152
    high scout 8 204
    high nonscout 2 61
    ;
    proc logistic data=new;
    class S / order=data param=ref ref=first;
    model y/n = S / scale=none;
    run;

The parameter estimates from this new program are exactly the same as before:

    Analysis of Maximum Likelihood Estimates

                                  Standard        Wald
    Parameter     DF   Estimate      Error  Chi-Square   Pr > ChiSq
    Intercept      1    -1.3863     0.1536     81.4848       <.0001
    S medium       1    -0.5512     0.2392      5.3080       0.0212
    S high         1    -1.8524     0.3571     26.9110       <.0001

But the overall fit statistics are different. Before, we had \(X^2 = 0\) and \(G^2 = 0\) because the model was saturated (there were three parameters and N = 3 lines of data). But now, the fit statistics are:

    Deviance and Pearson Goodness-of-Fit Statistics

    Criterion    DF     Value    Value/DF    Pr > ChiSq
    Deviance      3    0.1623      0.0541        0.9834
    Pearson       3    0.1602      0.0534        0.9837

    Number of events/trials observations: 6

The model appears to fit very well, but it is no longer saturated. What happened? Recall that \(X^2\) and \(G^2\) are testing the null hypothesis that the current model is correct, versus the alternative of a saturated model. When we disaggregated the data by levels of B, using six input lines rather than three, the current model did not change but the saturated model did; the saturated model was enlarged to six parameters.

Another way to interpret the overall \(X^2\) and \(G^2\) goodness-of-fit tests is that they are testing the significance of all omitted covariates. If we collapse the data over B and use only three lines of data, then SAS is unaware of the existence of B.
But if we disaggregate the data by levels of B and do not include it in the model, then SAS has the opportunity to test the fit of the current model, in which the probability of delinquency varies by S alone, against the saturated alternative in which the probability of delinquency varies by each combination of the levels of S and B. When the data are disaggregated, the goodness-of-fit tests are actually testing the hypothesis that D is unrelated to B once S has been taken into account, i.e., that D and B are conditionally independent given S.

Here's another way to think about it. The current model has three parameters:

• an intercept, and
• two dummies for S.

But the
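The deviance and Pearson statistics reported for the six-line data can be reproduced by hand, which makes the comparison with the enlarged saturated model concrete. The sketch below is in Python rather than SAS, and all variable names are our own. It relies on the fact that, when S is the only factor in the model, the fitted probability for each row is the pooled delinquency proportion for that row's S level, so no iterative fitting is needed.

```python
import math

# Six-line data: (S, B, y = number of delinquents, n = number of boys).
rows = [
    ("low", "scout", 11, 54),
    ("low", "nonscout", 42, 211),
    ("medium", "scout", 14, 118),
    ("medium", "nonscout", 20, 152),
    ("high", "scout", 8, 204),
    ("high", "nonscout", 2, 61),
]

# Pool over B to get the fitted probability for each level of S.
totals = {}
for s, _, y, n in rows:
    total_y, total_n = totals.get(s, (0, 0))
    totals[s] = (total_y + y, total_n + n)
p_hat = {s: total_y / total_n for s, (total_y, total_n) in totals.items()}

deviance = 0.0  # G^2
pearson = 0.0   # X^2
for s, _, y, n in rows:
    p = p_hat[s]
    fitted_yes = n * p
    fitted_no = n * (1 - p)
    # Each row contributes a binomial term comparing observed and fitted counts.
    deviance += 2 * (y * math.log(y / fitted_yes)
                     + (n - y) * math.log((n - y) / fitted_no))
    pearson += (y - fitted_yes) ** 2 / (n * p * (1 - p))

# Saturated model for six rows has 6 parameters; the current model has 3.
df = len(rows) - len(p_hat)

print(f"G^2 = {deviance:.4f}, X^2 = {pearson:.4f}, df = {df}")
```

The two statistics should match the Deviance and Pearson rows of the SAS output for the six-line data, up to rounding. Rerunning the same loop on the three collapsed rows gives zero for both, because the fitted counts then equal the observed counts.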
