PSU STAT 544 - Overdispersion and Diagnostics

Stat 544, Lecture 11

Overdispersion and Diagnostics

Logistic regression and three-way tables. This table, which we saw in Lecture 7, classifies n = 800 boys according to socioeconomic status (S), whether he is a boy scout (B), and juvenile delinquency (D):

Socioeconomic   Boy       Delinquent
status          scout     Yes     No
Low             Yes        11     43
                No         42    169
Medium          Yes        14    104
                No         20    132
High            Yes         8    196
                No          2     59

As we discussed in Lecture 7, there are many different models that we could fit to this table. But if we think of D as a response and B and S as potential predictors, we can focus on a subset of models that make sense.

Let $\pi$ be the probability of delinquency. The simplest model worth considering is the null or intercept-only model,

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0, \tag{1}$$

in which D is unrelated to B or S. If we were to fit this model in PROC LOGISTIC using the disaggregated data (all six lines), we would find that the $X^2$ and $G^2$ statistics are identical to those we obtained in Lecture 7 from testing the null hypothesis “S and B are independent of D.” That is, testing the overall fit of (1) is equivalent to testing the fit of the model (D, SB), because (1) says that D is unrelated to S and B but makes no assumptions about whether S and B are related.

After (1), we may want to fit a model that has a main effect for B,

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1, \tag{2}$$

where $X_1$ is a dummy indicator equal to 1 for scouts and 0 for non-scouts. If the data are provided in six lines, the goodness-of-fit test for model (2) will be equivalent to the test for (DB, SB), which says that D and S are conditionally independent given B. This makes sense, because (2) says that S has no effect on D once B has been taken into account.

Now consider the model that has main effects for S,

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_2 X_2 + \beta_3 X_3, \tag{3}$$

where $X_2$ and $X_3$ are dummy indicators to distinguish among the three categories of S. This model says that B has no effect on D once S has been taken into account.
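Models (1) to (3) each have a closed-form ML fit: the fitted probability for a row is the pooled sample proportion among all rows sharing that row's level of the predictor (or among all rows, for the null model). The following is a minimal Python sketch of the corresponding $G^2$ computations from the six-line table. The helper names `pooled_fit` and `deviance` are hypothetical, chosen for illustration, and the sketch assumes every cell count is nonzero, which holds for this table.

```python
from math import log

# Six-line table from the lecture: (S level, scout?, delinquent y, row total n)
ROWS = [
    ("Low", "Yes", 11, 54),
    ("Low", "No", 42, 211),
    ("Medium", "Yes", 14, 118),
    ("Medium", "No", 20, 152),
    ("High", "Yes", 8, 204),
    ("High", "No", 2, 61),
]


def pooled_fit(rows, key):
    """Fitted probability per row: pooled proportion within the group key(row)."""
    y_tot, n_tot = {}, {}
    for row in rows:
        g = key(row)
        y_tot[g] = y_tot.get(g, 0) + row[2]
        n_tot[g] = n_tot.get(g, 0) + row[3]
    return [y_tot[key(row)] / n_tot[key(row)] for row in rows]


def deviance(rows, fitted):
    """G^2 = 2 * sum[ y*log(y/mu) + (n-y)*log((n-y)/(n-mu)) ], with mu = n*pi_hat.

    Assumes 0 < y < n in every row (no zero cells).
    """
    total = 0.0
    for (_, _, y, n), p in zip(rows, fitted):
        mu = n * p
        total += y * log(y / mu) + (n - y) * log((n - y) / (n - mu))
    return 2.0 * total


g2_null = deviance(ROWS, pooled_fit(ROWS, lambda r: "all"))  # model (1)
g2_b = deviance(ROWS, pooled_fit(ROWS, lambda r: r[1]))      # model (2)
g2_s = deviance(ROWS, pooled_fit(ROWS, lambda r: r[0]))      # model (3)

# Degrees of freedom: 6 rows minus the number of regression coefficients.
for name, g2, df in [("Null", g2_null, 5), ("B", g2_b, 4), ("S", g2_s, 3)]:
    print(f"{name:5s} G2 = {g2:6.2f}  df = {df}")
```

The shortcut works only because each of these models is saturated in its own predictor. A model with main effects for both B and S has no closed-form fit and needs iterative logistic regression software.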
The goodness-of-fit tests for (3) are equivalent to testing the null hypothesis that (DS, BS) fits, i.e. that D and B are conditionally independent given S.

The logit model that has main effects for B and S,

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3, \tag{4}$$

corresponds to the model of homogeneous association which we discussed in Lecture 7. We could not fit that model at that time, because the ML estimates have no closed-form solution. But with logistic regression software, fitting this model is no more difficult than for any other model. This model says that the effect of B on D, when expressed in terms of odds ratios, is identical across the levels of S. Equivalently, it says that the odds ratios describing the relationship between S and D are identical across the levels of B. If this model does not fit, we have evidence that the effect of B on D varies across the levels of S, or that the effect of S on D varies across the levels of B.

Finally, the saturated model can be written as

$$\log\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \beta_5 X_1 X_3, \tag{5}$$

which has main effects for B and S and their interactions. This model has $X^2 = G^2 = 0$.

Let’s collect the fit statistics from these models into an analysis-of-deviance table.

Model        G^2     df    p
Saturated     0.00    0    -
S + B         0.15    2    .928
S             0.16    3    .984
B            28.80    4    .000
Null         36.41    5    .000

From this table, we may conclude that:

• The Null model clearly does not fit.
• Adding B to the Null model drops the deviance by 36.41 − 28.80 = 7.61, which is highly significant because $P(\chi^2_1 \ge 7.61) = .006$. So the B model fits significantly better than the Null model. But the B model still doesn’t fit.
• Adding S to the Null model drops the deviance by 36.41 − 0.16 = 36.25, and $P(\chi^2_2 \ge 36.25) \approx 0$. So the S model fits significantly better than the Null model. And the S model fits the data very well.
• Adding B to the S model, i.e. comparing S + B to S alone, drops the deviance by only .01.
  So the fit of S + B is not significantly better than S. Both of these models fit the data well. (If S fits, then S + B must also fit, because S is a special case of S + B.) Given that both of these models fit, we prefer the S model because it’s simpler.

Based on this table, the best model is S, because it’s the simplest model that fits the data well.

Overdispersion. For any model, the overall $X^2$ or $G^2$ can be viewed as testing the joint significance of all covariates that are not in a given model. Large values for $X^2$ and $G^2$ suggest that our model is too small, i.e. that the current set of covariates doesn’t adequately explain the observed variation in the sample proportions $p_i = y_i/n_i$.

Sometimes lack of fit can be remedied by adding more covariates to the model. But sometimes it can’t. Perhaps no more covariates are available. Perhaps adding more covariates (e.g. high-order interactions) would make the model very complicated and difficult to interpret.

As we briefly discussed in Lecture 9, another possible reason for lack of fit is overdispersion. Overdispersion means that the variance in the response $y_i$ is larger than the $n_i\pi_i(1-\pi_i)$ indicated by the binomial model.

There is no such thing as overdispersion in ordinary linear regression. In a linear regression model

$$y_i \sim N(x_i^T\beta, \sigma^2),$$

the variance $\sigma^2$ is estimated independently of the mean function $x_i^T\beta$. With discrete response variables, however, the possibility for overdispersion exists because the commonly used distributions specify particular relationships between the variance and the mean. For example, if $y_i \sim \mathrm{Bin}(n_i, \pi_i)$, the mean is $\mu_i = n_i\pi_i$ and the variance is $\mu_i(n_i - \mu_i)/n_i$. If $y_i$ is Poisson with mean $\mu_i$, the variance is $\mu_i$.

With real data, we may find that the variance of the response $y_i$ is greater than it should be under the given model. (Underdispersion is also theoretically possible, but rare in practice.) McCullagh and Nelder (1989) say that overdispersion tends to be the rule rather than the exception.
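As a worked check (added here for illustration, not taken from the slides), the two expressions for the binomial variance quoted above agree once $\pi_i = \mu_i/n_i$ is substituted:

$$\mathrm{Var}(y_i) = n_i\pi_i(1-\pi_i) = \mu_i\left(1 - \frac{\mu_i}{n_i}\right) = \frac{\mu_i(n_i-\mu_i)}{n_i}.$$

Once $n_i$ and the mean $\mu_i$ are fixed, the binomial variance is fully determined. Unlike $\sigma^2$ in the normal model, there is no free parameter left to absorb extra variation.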
If our model is correct, the Pearson residuals should behave like standardized residuals, i.e. like standard normal variates. But if their variance is substantially larger than one, and this extra variation is “spread across” the observational units rather than concentrated in a small number of outliers, then we have evidence of overdispersion.

Remedies for overdispersion. Overdispersion can be handled in two different ways. One way is to specify a richer parametric model, where the distribution of the response
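The Pearson-residual check described above can be sketched in Python for the S-only model (3), whose fitted probabilities are the pooled proportions within each level of S. Summarizing the residuals by $X^2/(N - p)$, where $N$ is the number of rows and $p$ the number of coefficients, is a common convention assumed here for illustration. It is not taken from the visible text of the lecture.

```python
from math import sqrt

# (S level, delinquent y, row total n) for the six rows of the boy scout table
ROWS = [
    ("Low", 11, 54), ("Low", 42, 211),
    ("Medium", 14, 118), ("Medium", 20, 152),
    ("High", 8, 204), ("High", 2, 61),
]
N_PARAMS = 3  # model (3): intercept plus two dummy indicators for S

# Pooled counts within each level of S give the fitted probabilities.
y_tot, n_tot = {}, {}
for s, y, n in ROWS:
    y_tot[s] = y_tot.get(s, 0) + y
    n_tot[s] = n_tot.get(s, 0) + n

# Pearson residual: (observed - fitted mean) / binomial standard deviation
pearson = []
for s, y, n in ROWS:
    p = y_tot[s] / n_tot[s]
    pearson.append((y - n * p) / sqrt(n * p * (1 - p)))

x2 = sum(r * r for r in pearson)           # Pearson X^2 statistic
dispersion = x2 / (len(ROWS) - N_PARAMS)   # assumed summary: X^2 / (N - p)

print([round(r, 2) for r in pearson])
print(f"X2 = {x2:.2f}, X2/df = {dispersion:.3f}")
```

A ratio well above 1, with no single row dominating the sum of squares, would point toward overdispersion. For this table the ratio is far below 1, consistent with the S model fitting well.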

