PSU STAT 544 - Polytomous Regression - D2081012

School name Penn State University

Course Stat 544- Categorical Data

Pages 19

Stat 544, Lecture 20

More on Polytomous Regression Models

Last time, we fit a model to the now-famous alligator food-choice dataset.

                                Primary Food Choice
    Lake      Sex  Size     Fish   Inv.  Rept.   Bird  Other
    Hancock   M    small       7      1      0      0      5
                   large       4      0      0      1      2
              F    small      16      3      2      2      3
                   large       3      0      1      2      3
    Oklawaha  M    small       2      2      0      0      1
                   large      13      7      6      0      0
              F    small       3      9      1      0      2
                   large       0      1      0      1      0
    Trafford  M    small       3      7      1      0      1
                   large       8      6      6      3      5
              F    small       2      4      1      1      4
                   large       0      1      0      0      0
    George    M    small      13     10      0      2      2
                   large       9      0      0      1      2
              F    small       3      9      1      0      1
                   large       8      1      0      0      1

We let

    $\pi_1$ = prob. of fish,
    $\pi_2$ = prob. of invertebrates,
    $\pi_3$ = prob. of reptiles,
    $\pi_4$ = prob. of birds,
    $\pi_5$ = prob. of other,

and made "fish" be the baseline category. The logit equations were

    $$\log\left(\frac{\pi_j}{\pi_1}\right) = \beta_0 + \beta_1 X_1 + \cdots$$

for $j = 2, 3, 4, 5$, with a separate set of coefficients for each $j$. The X's included

• three dummy indicators for lake,
• a dummy for sex, and
• a dummy for size.

Therefore, each logit equation had six coefficients to be estimated, so the number of free parameters in this model was 4 × 6 = 24.

We found that

• lake was highly significant (Wald chi-square = 36.2, df = 12),
• size was highly significant (Wald chi-square = 15.9, df = 4),
• sex was not significant (Wald chi-square = 2.2, df = 4).

Wald statistics might not be as accurate as deviance tests. Let's adopt an analysis-of-deviance approach to compare various models.

First, let's find the deviance $G^2$ for the null (intercept-only) model, a model with just four parameters (one intercept for each of the four logits). Because there are N = 4 × 2 × 2 = 16 unique covariate patterns, the saturated model will have 16 × (5 − 1) = 64 free parameters, so the $G^2$ statistic for the null model should have 64 − 4 = 60 degrees of freedom.
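The parameter counting above can be double-checked with a few lines of arithmetic. The sketch below is in Python rather than SAS, purely as a bookkeeping aid; the function and constant names are illustrative and not part of the lecture. It assumes every logit equation carries the same number of coefficients.

```python
# Parameter bookkeeping for baseline-category logit models on the alligator data.
# All names here are illustrative; the counts come from the text above.
N_LAKES, N_SEXES, N_SIZES, N_FOODS = 4, 2, 2, 5

N_LOGITS = N_FOODS - 1                      # fish is the baseline, so 4 logits
N_PATTERNS = N_LAKES * N_SEXES * N_SIZES    # 16 unique covariate patterns
SATURATED_PARAMS = N_PATTERNS * N_LOGITS    # 64 free parameters


def free_parameters(coefs_per_logit):
    """Free parameters when each logit equation has the same number of coefficients."""
    return coefs_per_logit * N_LOGITS


def residual_df(coefs_per_logit):
    """Degrees of freedom of G^2 measured against the saturated model."""
    return SATURATED_PARAMS - free_parameters(coefs_per_logit)


# Coefficients per logit: an intercept, 3 lake dummies, 1 sex dummy, 1 size dummy.
null_model = 1
main_effects = 1 + (N_LAKES - 1) + (N_SEXES - 1) + (N_SIZES - 1)

print(free_parameters(main_effects))   # 24
print(residual_df(null_model))         # 60
```

The same function gives the residual degrees of freedom for any other main-effects model once its coefficients per logit are counted.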
Let's fit the null model in PROC LOGISTIC, like this:

    options nocenter nodate nonumber linesize=72;
    data gator;
       input lake $ sex $ size $ food $ count;
       cards;
    Hancock male small fish 7
    --lines omitted--
    George female large other 1
    ;
    proc logist data=gator;
       freq count;
       class lake size sex / order=data param=ref ref=first;
       model food(ref='fish') = / link=glogit
          aggregate scale=none;
       run;

The fit statistics are:

                  Model Convergence Status
       Convergence criterion (GCONV=1E-8) satisfied.

                   -2 Log L = 604.3629

       Deviance and Pearson Goodness-of-Fit Statistics
    Criterion    DF      Value     Value/DF    Pr > ChiSq
    Deviance      0     0.0000        .            .
    Pearson       0     0.0000        .            .

    Number of unique profiles: 1

What happened? By default, the aggregate option calculates goodness-of-fit statistics for a table that aggregates over the unique patterns for the covariates appearing in the model. In this case, there are no covariates in the model, so there is only one "unique profile" and the intercept-only model is considered to be saturated.

We want SAS to compute the fit statistics relative to a saturated model that estimates the response probabilities independently for each combination of lake, sex and size. To do that, we change the model statement like this:

    proc logist data=gator;
       freq count;
       class lake size sex / order=data param=ref ref=first;
       model food(ref='fish') = / link=glogit
          aggregate=(lake size sex) scale=none;
       run;

Now the results are:

       Deviance and Pearson Goodness-of-Fit Statistics
    Criterion    DF       Value     Value/DF    Pr > ChiSq
    Deviance     60    116.7611      1.9460       <.0001
    Pearson      60    106.4922      1.7749       0.0002

    Number of unique profiles: 16

Repeating the model-fitting for various sets of predictors, we obtain the following analysis-of-deviance table:

    Model                            G^2     df
    Saturated                       0.00      0
    Lake + Size + Lake×Size **     35.40     32
    Lake + Size + Sex              50.26     40
    Lake + Size                    52.48     44
    Lake                           73.57     48
    Size                          101.61     56
    Sex                           114.66     56
    Null                          116.76     60

    ** Note: did not converge

We ran into trouble when we included the lake×size interaction.
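Rows of the analysis-of-deviance table can be compared pairwise: for two nested models, the difference in $G^2$ is referred to a chi-square distribution whose degrees of freedom equal the difference in df. The sketch below is Python rather than SAS, with illustrative names, and it assumes the usual large-sample chi-square approximation holds. To stay within the standard library it uses a closed-form tail probability that is valid only for even degrees of freedom, which covers every comparison in this table. The interaction row is left out because that fit did not converge.

```python
import math

# (G^2, df) pairs copied from the analysis-of-deviance table above.
# The Lake + Size + Lake x Size row is omitted: that fit did not converge.
DEVIANCE_TABLE = {
    "Lake + Size + Sex": (50.26, 40),
    "Lake + Size": (52.48, 44),
    "Lake": (73.57, 48),
    "Size": (101.61, 56),
    "Sex": (114.66, 56),
    "Null": (116.76, 60),
}


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with positive, even df.

    Uses exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!, which holds only for even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x < 0:
        raise ValueError("the test statistic must be non-negative")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i          # term is now (x/2)^i / i!
        total += term
    return math.exp(-half) * total


def compare(reduced, full):
    """Return (delta G^2, delta df, p-value) for a reduced model nested in a full one."""
    g2_reduced, df_reduced = DEVIANCE_TABLE[reduced]
    g2_full, df_full = DEVIANCE_TABLE[full]
    delta_g2 = g2_reduced - g2_full
    delta_df = df_reduced - df_full
    return delta_g2, delta_df, chi2_sf_even_df(delta_g2, delta_df)


for reduced, full in [("Lake + Size", "Lake + Size + Sex"),
                      ("Lake", "Lake + Size"),
                      ("Size", "Lake + Size")]:
    dg2, ddf, p = compare(reduced, full)
    print(f"{full} vs {reduced}: delta G2 = {dg2:.2f}, df = {ddf}, p = {p:.4g}")
```

Under these assumptions, dropping sex from Lake + Size + Sex gives a large p-value, while dropping size or lake from Lake + Size gives very small ones, which agrees in direction with the Wald results quoted earlier.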
Here are some relevant portions of the output from the model that includes the lake×size interaction:

                  Model Convergence Status
       Quasi-complete separation of data points detected.

    WARNING: The maximum likelihood estimate may not exist.
    WARNING: The LOGISTIC procedure continues in spite of the above
             warning. Results shown are based on the last maximum
             likelihood iteration. Validity of the model fit is
             questionable.

       Deviance and Pearson Goodness-of-Fit Statistics
    Criterion    DF      Value     Value/DF    Pr > ChiSq
    Deviance     32    35.3989      1.1062       0.3109
    Pearson      32    38.2807      1.1963       0.2058

    Number of unique profiles: 16

                Model Fit Statistics
                                    Intercept
                     Intercept         and
    Criterion          Only        Covariates
    AIC              612.363         587.001
    SC               625.919         695.451
    -2 Log L         604.363         523.001

          Testing Global Null Hypothesis: BETA=0
    Test                 Chi-Square    DF    Pr > ChiSq
    Likelihood Ratio        81.3622    28        <.0001
    Score                   73.0595    28        <.0001
    Wald                    44.1606    28        0.0268

              Type III Analysis of Effects
                             Wald
    Effect        DF    Chi-Square    Pr > ChiSq
    lake          12       18.6397        0.0976
    size           4        2.8868        0.5769
    lake*size     12        6.2811        0.9013

    WARNING: The validity of the model fit is questionable.

                 Analysis of Maximum Likelihood Estimates
                                                       Standard        Wald
    Parameter                 food      DF   Estimate     Error   Chi-Square
    Intercept                 bird       1    -2.4423    0.7372      10.9757
    Intercept                 invert     1    -1.7492    0.5417      10.4256
    Intercept                 other      1    -1.0561    0.4105       6.6195
    Intercept                 reptile    1    -2.4423    0.7372      10.9757
    lake      Oklawaha        bird       1   -10.2353     253.2       0.0016
    lake      Oklawaha        invert     1     2.5377    0.7645      11.0196
    lake      Oklawaha        other      1     0.5452    0.8377       0.4236
    lake      Oklawaha        reptile    1     0.8329    1.3204       0.3979
    lake      Trafford        bird       1     0.8329    1.3204       0.3979
    lake      Trafford        invert     1     2.5377    0.7645      11.0196
    lake      Trafford        other      1     1.0561    0.7540       1.9618
    lake      Trafford        reptile    1     1.5261    1.1151       1.8728
    lake      George          bird       1     0.3629    1.0517       0.1191
    lake      George          invert     1     1.9211    0.6392       9.0317
    lake      George          other      1    -0.6179    0.7512       0.6766
    lake      George          reptile    1    -0.3302    1.2673       0.0679
    size      large           bird       1     1.5950    1.0098       2.4951
    size      large           invert     1   -10.2786     154.6       0.0044
    size      large           other      1     0.7196    0.7151       1.0126
    size      large           reptile    1     0.4964    1.2986       0.1461
    lake*size Oklawaha large  bird       1     8.5176     253.2       0.0011
    lake*size Oklawaha large  invert     1     9.0046     154.6       0.0034
    lake*size Oklawaha large  other      1   -12.6194     137.4       0.0084
    lake*size Oklawaha large  reptile    1     0.3398    1.7692       0.0369
    lake*size Trafford large  bird       1    -0.9664    1.6365       0.3488
    lake*size Trafford large  invert     1     9.3566     154.6       0.0037
    lake*size Trafford large  other      1    -1.1896    1.1119       1.1446
    lake*size Trafford large  reptile    1     0.1322    1.6365       0.0065
    lake*size George large    bird       1    -2.3488    1.6251       2.0890
    lake*size George large    invert     1     7.2735     154.6       0.0022
    lake*size George large    other      1    -0.7802    1.1399       0.4685
    lake*size George large    reptile    1   -11.1988     204.6       0.0030

"Quasi-separation" means that the model effectively includes dummy indicators for groups with observed frequencies of zero, so that the ML estimates for certain coefficients are running off to ±∞. Notice in the table of ML estimates that
