PSU STAT 544 - Overdispersion and Diagnostics - D1985267


School name Penn State University

Course Stat 544- Categorical Data

Pages 22


Stat 544, Lecture 11

Overdispersion and Diagnostics

Logistic regression and three-way tables. This table, which we saw in Lecture 7, classifies n = 800 boys according to socioeconomic status (S), whether the boy is a scout (B), and juvenile delinquency (D):

Socioeconomic   Boy      Delinquent
status          scout    Yes    No
Low             Yes       11    43
                No        42   169
Medium          Yes       14   104
                No        20   132
High            Yes        8   196
                No         2    59

As we discussed in Lecture 7, there are many different models that we could fit to this table. But if we think of D as a response and B and S as potential predictors, we can focus on a subset of models that make sense.

Let $\pi$ be the probability of delinquency. The simplest model worth considering is the null or intercept-only model,

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0, \tag{1}$$

in which D is unrelated to B or S. If we were to fit this model in PROC LOGISTIC using the disaggregated data (all six lines), we would find that the $X^2$ and $G^2$ statistics are identical to those we obtained in Lecture 7 from testing the null hypothesis "S and B are independent of D." That is, testing the overall fit of (1) is equivalent to testing the fit of the model (D, SB), because (1) says that D is unrelated to S and B but makes no assumptions about whether S and B are related.

After (1), we may want to fit a model that has a main effect for B,

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1, \tag{2}$$

where $X_1$ is a dummy indicator equal to 1 for scouts and 0 for non-scouts. If the data are provided in six lines, the goodness-of-fit test for model (2) will be equivalent to the test for (DB, SB), which says that D and S are conditionally independent given B. This makes sense, because (2) says that S has no effect on D once B has been taken into account.

Now consider the model that has main effects for S,

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_2 X_2 + \beta_3 X_3, \tag{3}$$

where $X_2$ and $X_3$ are dummy indicators to distinguish among the three categories of S. This model says that B has no effect on D once S has been taken into account.
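As a rough check on models (1) to (3), the sketch below computes the deviance $G^2$ for each from the six rows of the table. It is an illustration in Python rather than the PROC LOGISTIC run the notes refer to, and it relies on one simplification: when a logit model has a single categorical predictor (or none), the ML fitted probability in each level is just the pooled sample proportion, so no iterative fitting is needed. The names `ROWS` and `deviance` are made up for this example.

```python
import math

# Counts from the table above: (S, B, delinquent, not delinquent)
ROWS = [
    ("Low", "Yes", 11, 43),
    ("Low", "No", 42, 169),
    ("Medium", "Yes", 14, 104),
    ("Medium", "No", 20, 132),
    ("High", "Yes", 8, 196),
    ("High", "No", 2, 59),
]


def deviance(key):
    """G^2 for the logit model whose only predictor is the factor key(row).

    Assumption: with one categorical predictor, the fitted probability in
    each level equals the pooled proportion of delinquents in that level.
    """
    yes, total = {}, {}
    for row in ROWS:
        k = key(row)
        yes[k] = yes.get(k, 0) + row[2]
        total[k] = total.get(k, 0) + row[2] + row[3]
    g2 = 0.0
    for row in ROWS:
        p = yes[key(row)] / total[key(row)]
        n = row[2] + row[3]
        # Sum 2 * observed * log(observed / fitted) over both outcome cells
        for obs, fit in ((row[2], n * p), (row[3], n * (1 - p))):
            if obs > 0:
                g2 += 2 * obs * math.log(obs / fit)
    return g2


g2_null = deviance(lambda r: None)  # model (1): intercept only
g2_b = deviance(lambda r: r[1])     # model (2): main effect for B
g2_s = deviance(lambda r: r[0])     # model (3): main effects for S
print(f"Null {g2_null:.2f}  B {g2_b:.2f}  S {g2_s:.2f}")
```

The three values can be compared against the degrees of freedom 5, 4, and 3 respectively (six binomial observations minus the number of fitted coefficients). This shortcut does not extend to the model with main effects for both B and S, which needs iterative fitting.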
The goodness-of-fit tests for (3) are equivalent to testing the null hypothesis that (DS, BS) fits, i.e. that D and B are conditionally independent given S.

The logit model that has main effects for B and S,

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3, \tag{4}$$

corresponds to the model of homogeneous association which we discussed in Lecture 7. We could not fit that model at that time, because the ML estimates have no closed-form solution. But with logistic regression software, fitting this model is no more difficult than for any other model. This model says that the effect of B on D, when expressed in terms of odds ratios, is identical across the levels of S. Equivalently, it says that the odds ratios describing the relationship between S and D are identical across the levels of B. If this model does not fit, we have evidence that the effect of B on D varies across the levels of S, or that the effect of S on D varies across the levels of B.

Finally, the saturated model can be written as

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3, \tag{5}$$

which has main effects for B and S and their interactions. This model has $X^2 = G^2 = 0$.

Let's collect the fit statistics from these models into an analysis-of-deviance table.

Model        G^2     df   p
Saturated     0.00   0    -
S + B         0.15   2    .928
S             0.16   3    .984
B            28.80   4    .000
Null         36.41   5    .000

From this table, we may conclude that:

• The Null model clearly does not fit.
• Adding B to the Null model drops the deviance by 36.41 − 28.80 = 7.61, which is highly significant because $P(\chi^2_1 \ge 7.61) = .006$. So the B model fits significantly better than the Null model. But the B model still doesn't fit.
• Adding S to the Null model drops the deviance by 36.41 − 0.16 = 36.25, and $P(\chi^2_2 \ge 36.25) \approx 0$. So the S model fits significantly better than the Null model. And the S model fits the data very well.
• Adding B to the S model, i.e. comparing S + B to S alone, drops the deviance by only .01. So the fit of S + B is not significantly better than S. Both of these models fit the data well.
  (If S fits, then S + B must also fit, because S is a special case of S + B.) Given that both of these models fit, we prefer the S model because it's simpler.

Based on this table, the best model is S, because it's the simplest model that fits the data well.

Overdispersion. For any model, the overall $X^2$ or $G^2$ can be viewed as testing the joint significance of all covariates that are not in a given model. Large values for $X^2$ and $G^2$ suggest that our model is too small, i.e. that the current set of covariates doesn't adequately explain the observed variation in the sample proportions $p_i = y_i/n_i$.

Sometimes lack of fit can be remedied by adding more covariates to the model. But sometimes it can't. Perhaps no more covariates are available. Perhaps adding more covariates (e.g. high-order interactions) would make the model very complicated and difficult to interpret.

As we briefly discussed in Lecture 9, another possible reason for lack of fit is overdispersion. Overdispersion means that the variance in the response $y_i$ is larger than the $n_i \pi_i (1 - \pi_i)$ indicated by the binomial model.

There is no such thing as overdispersion in ordinary linear regression. In a linear regression model

$$y_i \sim N(x_i^T \beta, \sigma^2),$$

the variance $\sigma^2$ is estimated independently of the mean function $x_i^T \beta$. With discrete response variables, however, the possibility for overdispersion exists because the commonly used distributions specify particular relationships between the variance and the mean. For example, if $y_i \sim \mathrm{Bin}(n_i, \pi_i)$, the mean is $\mu_i = n_i \pi_i$ and the variance is $\mu_i (n_i - \mu_i)/n_i$. If $y_i$ is Poisson with mean $\mu_i$, the variance is $\mu_i$.

With real data, we may find that the variance of the response $y_i$ is greater than it should be under the given model. (Underdispersion is also theoretically possible, but rare in practice.) McCullagh and Nelder (1989) say that overdispersion tends to be the rule rather than the exception. If our model is correct, the Pearson residuals should behave like standardized residuals, i.e. like standard normal variates. But if their variance is substantially larger than one, and this extra variation is "spread across" the observational units rather than concentrated in a small number of outliers, then we have evidence of overdispersion.

Remedies for overdispersion. Overdispersion can be handled in two different ways. One way is to specify a richer parametric model, where the distribution of the response
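
The Pearson-residual check described in the overdispersion discussion can be sketched on the boy scout data under model S. The residual formula follows from the binomial mean $n_i\pi_i$ and variance $n_i\pi_i(1-\pi_i)$ given in the notes. Dividing the Pearson $X^2$ by its degrees of freedom to summarize the residual variance is a common convention and an assumption here, not something stated in the text; the names `GROUPS` and `dispersion` are likewise illustrative.

```python
import math

# (n_i, y_i) for the six rows of the scout table, grouped by S level.
# Under model S the fitted probability is the pooled proportion per level.
GROUPS = {
    "Low": [(54, 11), (211, 42)],
    "Medium": [(118, 14), (152, 20)],
    "High": [(204, 8), (61, 2)],
}

residuals = []
for cells in GROUPS.values():
    pi_hat = sum(y for _, y in cells) / sum(n for n, _ in cells)
    for n, y in cells:
        # Pearson residual: (observed - fitted mean) / binomial std. dev.
        residuals.append((y - n * pi_hat) / math.sqrt(n * pi_hat * (1 - pi_hat)))

x2 = sum(r * r for r in residuals)   # Pearson X^2
df = len(residuals) - len(GROUPS)    # 6 observations - 3 coefficients
dispersion = x2 / df                 # assumed summary: X^2 / df
print(f"X^2 = {x2:.2f}, df = {df}, X^2/df = {dispersion:.3f}")
```

A ratio well above one, with the large residuals spread over many rows, would point toward overdispersion. For this table the ratio comes out well below one, consistent with the good fit of model S reported in the notes.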
