PSU STAT 504 - Logistic Regression

Stat 504, Lecture 11

Still More About Logistic Regression

Last time, we were re-examining the 3 × 2 × 2 table that cross-classifies 800 boys according to socioeconomic status (S), whether a boy scout (B), and juvenile delinquency (D):

    Socioeconomic   Boy       Delinquent
    status          scout     Yes     No
    Low             Yes        11     43
                    No         42    169
    Medium          Yes        14    104
                    No         20    132
    High            Yes         8    196
                    No          2     59

We analyzed the BD marginal table and showed how this 2 × 2 analysis can be carried out as a logistic regression with a dummy variable. Now let’s do a similar analysis for the 3 × 2 table that classifies subjects by S and D:

    Socioeconomic   Delinquent
    status          Yes     No
    Low              53    212
    Medium           34    236
    High             10    255

Two odds ratios of interest are

$$\frac{53 \times 236}{34 \times 212} = 1.735, \qquad \frac{53 \times 255}{10 \times 212} = 6.375.$$

We estimate that the odds of delinquency for the S = low group are 1.735 times as high as for the S = medium group, and 6.375 times as high as for the S = high group. The estimated log odds ratios are

$$\log 1.735 = 0.5512, \qquad \log 6.375 = 1.852,$$

and their standard errors are

$$\sqrt{\frac{1}{53} + \frac{1}{212} + \frac{1}{34} + \frac{1}{236}} = 0.2392, \qquad \sqrt{\frac{1}{53} + \frac{1}{212} + \frac{1}{10} + \frac{1}{255}} = 0.3571.$$

Now let’s replicate this analysis using logistic regression. First, we re-express the data in terms of $y_i$ = number of delinquents and $n_i$ = number of boys for the three S-groups:

              y_i    n_i
    Low        53    265
    Medium     34    270
    High       10    265

Then we define a pair of dummy indicators,

$$X_1 = \begin{cases} 1 & \text{if } S = \text{medium}, \\ 0 & \text{otherwise}, \end{cases} \qquad X_2 = \begin{cases} 1 & \text{if } S = \text{high}, \\ 0 & \text{otherwise}. \end{cases}$$

Let $\pi$ = probability of delinquency.
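The odds ratios, log odds ratios, and standard errors quoted above can be checked by hand. The following is a minimal sketch in Python (not part of the original SAS-based notes); the names `counts` and `odds_ratio_summary` are illustrative choices, and the standard error is the usual large-sample formula for a log odds ratio in a 2 × 2 table.

```python
import math

# 3 x 2 table of S by D from the text: (delinquent yes, delinquent no)
counts = {"low": (53, 212), "medium": (34, 236), "high": (10, 255)}


def odds_ratio_summary(ref, other):
    """Return (odds ratio, log odds ratio, SE of the log odds ratio)
    comparing the odds of delinquency in group `ref` to group `other`."""
    a, b = counts[ref]      # yes / no counts in the first group
    c, d = counts[other]    # yes / no counts in the second group
    odds_ratio = (a * d) / (c * b)
    log_or = math.log(odds_ratio)
    # Large-sample SE: square root of the sum of reciprocal cell counts
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, log_or, se


low_vs_medium = odds_ratio_summary("low", "medium")
low_vs_high = odds_ratio_summary("low", "high")

for label, (o, l, s) in [("low vs medium", low_vs_medium),
                         ("low vs high", low_vs_high)]:
    print(f"{label}: OR = {o:.3f}, log OR = {l:.4f}, SE = {s:.4f}")
```

The printed values should agree, up to rounding, with the figures given in the text.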
Then the model

$$\log \frac{\pi}{1 - \pi} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

says that the log-odds of delinquency are $\beta_0$ for S=low, $\beta_0 + \beta_1$ for S=medium, and $\beta_0 + \beta_2$ for S=high.

Therefore,

$$\begin{aligned} \beta_1 &= \text{log-odds for } S = \text{medium} \; - \; \text{log-odds for } S = \text{low}, \\ \beta_2 &= \text{log-odds for } S = \text{high} \; - \; \text{log-odds for } S = \text{low}, \end{aligned}$$

and we expect to get $\hat\beta_1 = -0.5512$ and $\hat\beta_2 = -1.852$. The estimated intercept should be

$$\hat\beta_0 = \log(53/212) = -1.386.$$

A SAS program for fitting this model is shown below.

    options nocenter nodate nonumber linesize=72;
    /* one line per S group: y = delinquents, n = boys */
    data new;
      input S $ y n;
      cards;
    low 53 265
    medium 34 270
    high 10 265
    ;
    /* events/trials logistic regression with S as a class variable */
    proc logistic data=new;
      class S / order=data param=ref ref=first;
      model y/n = S / scale=none;
    run;

In the class statement, the option order=data tells SAS to sort the categories of S by the order in which they appear in the dataset rather than alphabetical order. The option param=ref tells SAS to create a set of two dummy variables to distinguish among the three categories. The option ref=first makes S=low the reference group (i.e. the group for which both dummy variables are zero). Some relevant portions of the output are shown below.

    Deviance and Pearson Goodness-of-Fit Statistics

    Criterion    DF     Value    Value/DF    Pr > ChiSq
    Deviance      0    0.0000        .           .
    Pearson       0    0.0000        .           .

    Number of events/trials observations: 3

    Model Fit Statistics

                                 Intercept
                  Intercept         and
    Criterion       Only        Covariates
    AIC            593.053       560.801
    SC             597.738       574.855
    -2 Log L       591.053       554.801

    Testing Global Null Hypothesis: BETA=0

    Test                Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio       36.2523     2        <.0001
    Score                  32.8263     2        <.0001
    Wald                   27.7335     2        <.0001

    Type III Analysis of Effects

                      Wald
    Effect    DF    Chi-Square    Pr > ChiSq
    S          2       27.7335        <.0001

    Analysis of Maximum Likelihood Estimates

                                 Standard        Wald
    Parameter    DF   Estimate      Error    Chi-Square    Pr > ChiSq
    Intercept     1    -1.3863     0.1536       81.4848        <.0001
    S medium      1    -0.5512     0.2392        5.3080        0.0212
    S high        1    -1.8524     0.3571       26.9110        <.0001

    Odds Ratio Estimates

                          Point          95% Wald
    Effect               Estimate    Confidence Limits
    S medium vs low         0.576     0.361     0.921
    S high vs low           0.157     0.078     0.316

In this case, the “intercept only” model says that delinquency is unrelated to socioeconomic status, so the test of the global null hypothesis $\beta_1 = \beta_2 = 0$ is equivalent to the usual test for independence in the 3 × 2 table. The estimated coefficients and SE’s are as we predicted, and the estimated odds ratios are

$$\exp(-0.5512) = 0.576 = 1/1.735, \qquad \exp(-1.852) = 0.157 = 1/6.375.$$

Collapsing and goodness of fit.
In this example, we could have also arranged the input data like this:

    S         B           y_i    n_i
    low       scout        11     54
    low       nonscout     42    211
    medium    scout        14    118
    medium    nonscout     20    152
    high      scout         8    204
    high      nonscout      2     61

A SAS program for fitting the same model is shown below.

    options nocenter nodate nonumber linesize=72;
    /* one line per S-by-B combination: y = delinquents, n = boys */
    data new;
      input S $ B $ y n;
      cards;
    low scout 11 54
    low nonscout 42 211
    medium scout 14 118
    medium nonscout 20 152
    high scout 8 204
    high nonscout 2 61
    ;
    /* B is read in but deliberately left out of the model */
    proc logistic data=new;
      class S / order=data param=ref ref=first;
      model y/n = S / scale=none;
    run;

The parameter estimates from this new program are exactly the same as before:

    Analysis of Maximum Likelihood Estimates

                                 Standard        Wald
    Parameter    DF   Estimate      Error    Chi-Square    Pr > ChiSq
    Intercept     1    -1.3863     0.1536       81.4848        <.0001
    S medium      1    -0.5512     0.2392        5.3080        0.0212
    S high        1    -1.8524     0.3571       26.9110        <.0001

But the overall fit statistics are different. Before, we had $X^2 = 0$ and $G^2 = 0$ because the model was saturated (there were three parameters and N = 3 lines of data). But now, the fit statistics are:

    Deviance and Pearson Goodness-of-Fit Statistics

    Criterion    DF     Value    Value/DF    Pr > ChiSq
    Deviance      3    0.1623      0.0541        0.9834
    Pearson       3    0.1602      0.0534        0.9837

    Number of events/trials observations: 6

The model appears to fit very well, but it is no longer saturated. What happened? Recall that $X^2$ and $G^2$ are testing the null hypothesis that the current model is correct, versus the alternative of a saturated model. When we disaggregated the data by levels of B, using six input lines rather than three, the current model did not change but the saturated model did; the saturated model was enlarged to six parameters.

Another way to interpret the overall $X^2$ and $G^2$ goodness-of-fit tests is that they are testing the significance of all omitted covariates. If we collapse the data over B and use only three lines of data, then SAS is unaware of the existence of B.
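The effect of collapsing on the fit statistics can be reproduced without SAS. The sketch below, in Python rather than SAS, relies on one fact about this particular model: because S enters as a full set of dummies, the fitted probability in each S group is simply the pooled proportion of delinquents in that group. The function name `fit_statistics` and the label `"all"` for the collapsed B column are illustrative, and the code assumes no cell count is zero.

```python
import math

# Disaggregated data from the text: (S, B, y = delinquents, n = boys)
disaggregated = [
    ("low", "scout", 11, 54), ("low", "nonscout", 42, 211),
    ("medium", "scout", 14, 118), ("medium", "nonscout", 20, 152),
    ("high", "scout", 8, 204), ("high", "nonscout", 2, 61),
]
# The same data collapsed over B (three lines)
collapsed = [("low", "all", 53, 265), ("medium", "all", 34, 270),
             ("high", "all", 10, 265)]


def fit_statistics(rows):
    """Pearson X^2 and deviance G^2 for the model in which the
    probability of delinquency depends on S alone."""
    y_tot, n_tot = {}, {}
    for s, _, y, n in rows:
        y_tot[s] = y_tot.get(s, 0) + y
        n_tot[s] = n_tot.get(s, 0) + n
    x2 = g2 = 0.0
    for s, _, y, n in rows:
        p = y_tot[s] / n_tot[s]             # fitted probability for group s
        mu_yes, mu_no = n * p, n * (1 - p)  # fitted counts in this line
        x2 += (y - mu_yes) ** 2 / mu_yes + ((n - y) - mu_no) ** 2 / mu_no
        g2 += 2 * (y * math.log(y / mu_yes)
                   + (n - y) * math.log((n - y) / mu_no))
    return x2, g2


x2_six, g2_six = fit_statistics(disaggregated)    # 6 lines - 3 parameters = 3 df
x2_three, g2_three = fit_statistics(collapsed)    # 3 lines - 3 parameters = 0 df
print(f"six lines:   X2 = {x2_six:.4f}, G2 = {g2_six:.4f}")
print(f"three lines: X2 = {x2_three:.4f}, G2 = {g2_three:.4f}")
```

With six lines the statistics should match the Pearson and Deviance values in the SAS output above up to rounding, and with three lines both are zero up to floating-point error, since the fitted counts then equal the observed counts.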
But if we disaggregate the data by levels of B and do not include it in the model, then SAS has the opportunity to test the fit of the current model, in which the probability of delinquency varies by S alone, against the saturated alternative in which the probability of delinquency varies by each combination of the levels of S and B. When the data are disaggregated, the goodness-of-fit tests are actually testing the hypothesis that D is unrelated to B once S has been taken into account, i.e., that D and B are conditionally independent given S.

Here’s another way to think about it. The current model has three parameters:

• an intercept, and
• two dummies for S.

But the

