PSU STAT 504 - Generalized Linear Models


Stat 504, Lecture 15

Generalized Linear Models

Last time, we introduced the elements of the GLIM:

• The response y, with likelihood

    f(y; \theta) = \exp\left\{ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\},   (1)

  where \theta is the canonical parameter;

• the linear predictor

    \eta = x^T \beta,

  where x is a vector of covariates and \beta is to be estimated; and

• the link function, which connects \eta to \mu,

    g(\mu) = \eta.

The data actually consist of responses y_i and covariate vectors x_i for units i = 1, ..., N, but in this notation the subscript i has been suppressed.

An interesting feature of GLIMs is that, although the likelihood function is assumed to be of the form (1), nothing has been assumed about the support of y (i.e., the set of possible values that the random variable y can take). It does not matter whether the support is the whole real line, some portion of the real line, the set of non-negative integers, or whatever. The same theory applies whether y is discrete or continuous.

Another interesting feature of GLIMs is that the mean and variance of y can be deduced from the likelihood function (1). Using some fundamental mathematical properties of the derivatives of the loglikelihood function, it can be shown that

    \mu = b'(\theta)

and that

    Var(y) = a(\phi) \, b''(\theta).

Because \mu depends on \theta but not \phi, we may re-express b''(\theta) in terms of \mu and write the variance as

    Var(y) = a(\phi) V(\mu),

where V(\cdot) is called the variance function. This function captures the relationship, if any, between the mean and variance of y.

In many cases, a(\phi) will have the form

    a(\phi) = \phi / w,

where \phi is the dispersion parameter and w is a known weight.

Example: normal response. Under the normal model y ~ N(\mu, \sigma^2), the log-density is

    \log f = -\frac{1}{2\sigma^2}(y - \mu)^2 - \frac{1}{2}\log(2\pi\sigma^2)
           = \frac{y\mu - \mu^2/2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2).

Therefore, the canonical parameter is \theta = \mu, and the remaining elements are b(\theta) = \theta^2/2, \phi = \sigma^2, a(\phi) = \phi, and

    c(y, \phi) = -\frac{y^2}{2\phi} - \frac{1}{2}\log(2\pi\phi).

In a heteroscedastic model y ~ N(\mu, \sigma^2/w), where w is a known weight, we would have \phi = \sigma^2 and a(\phi) = \phi/w.

Example: binomial response.
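As a quick numerical illustration (a sketch, not part of the original notes), the identities \mu = b'(\theta) and Var(y) = a(\phi) b''(\theta) can be checked for the normal family, where b(\theta) = \theta^2/2 gives b'(\theta) = \theta = \mu and b''(\theta) = 1, so Var(y) = \phi = \sigma^2:

```python
import numpy as np

# b-function for the normal family
def b(theta):
    return theta**2 / 2.0

def num_deriv(f, x, h=1e-5):
    # central-difference approximation to f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def num_second_deriv(f, x, h=1e-4):
    # central-difference approximation to f''(x)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

theta, phi = 1.7, 2.3                       # theta = mu, phi = sigma^2
mu = num_deriv(b, theta)                    # b'(theta): should equal theta
var = phi * num_second_deriv(b, theta)      # a(phi) * b''(theta), with b'' = 1

print(mu, var)  # approximately 1.7 and 2.3
```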
If y ~ n^{-1} Bin(n, \pi), then the log probability mass function is

    \log f = \log n! - \log(ny)! - \log(n(1-y))! + ny \log \pi + n(1-y) \log(1-\pi)
           = \frac{y \log\frac{\pi}{1-\pi} + \log(1-\pi)}{1/n} + c,

where c does not involve \pi. Therefore, the canonical parameter is

    \theta = \log\frac{\pi}{1-\pi},

the b-function is

    b(\theta) = -\log(1-\pi) = \log(1 + e^\theta),

the dispersion parameter is \phi = 1, and a(\phi) = \phi/n.

Maximum-likelihood estimation. Recall that a heteroscedastic normal model is fit by weighted least squares (WLS),

    \hat{\beta} = (X^T W X)^{-1} X^T W y,

where y is the response and W is the diagonal matrix of weights. (This is equivalent to OLS regression of W^{1/2} y on W^{1/2} X.)

It turns out that the ML estimate of \beta in a GLIM may be computed by a Fisher scoring procedure that is equivalent to iteratively reweighted least squares (IRWLS). One iteration of this algorithm is

    \beta^{(t+1)} = (X^T W X)^{-1} X^T W z,

where z = (z_1, ..., z_N)^T is the vector whose ith element is

    z_i = \eta_i + \frac{\partial \eta_i}{\partial \mu_i} (y_i - \mu_i),

with \eta_i = x_i^T \beta^{(t)}, \mu_i = g^{-1}(\eta_i), and W the diagonal matrix of weights

    w_i = \left[ Var(y_i) \left( \frac{\partial \eta_i}{\partial \mu_i} \right)^2 \right]^{-1}.

In the GLIM literature, z_i is often called the adjusted dependent variate or the working variate. Fisher scoring can therefore be regarded as IRWLS carried out on a transformed version of the response variable. At each cycle, we

• use the current estimate of \beta to calculate a new working variate z and a new set of weights W, and then

• regress z on X using weights W to get the updated \beta.

Viewing Fisher scoring as IRWLS has an additional advantage: it provides an excellent basis for deriving model-checking diagnostics. The diagnostics commonly used in regression (plotting residuals versus fitted values, leverage and influence measures, etc.) have obvious analogues in GLIMs when we view the fitting procedure as IRWLS.

Covariance matrix. The final value of (X^T W X)^{-1} upon convergence is the estimated covariance matrix of \hat{\beta}.

What about the dispersion parameter?
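As a concrete illustration of the IRWLS cycle (a minimal sketch with invented data, not code from the lecture), consider logistic regression with Bernoulli responses. There g(\mu) = \log(\mu/(1-\mu)), so \partial\eta/\partial\mu = 1/(\mu(1-\mu)) and Var(y_i) = \mu_i(1-\mu_i), which makes the weights w_i = \mu_i(1-\mu_i):

```python
import numpy as np

def irwls_logistic(X, y, n_iter=25, tol=1e-10):
    """Fit a logistic regression by IRWLS (Fisher scoring).

    Each cycle forms the working variate
    z_i = eta_i + (y_i - mu_i) / (mu_i (1 - mu_i))
    and weights w_i = mu_i (1 - mu_i), then solves the weighted
    least-squares equations (X'WX) beta = X'Wz.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))   # inverse logit link
        w = mu * (1.0 - mu)               # weights; also Var(y_i) here
        z = eta + (y - mu) / w            # working variate
        XtW = X.T * w                     # X'W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    cov = np.linalg.inv(XtW @ X)          # estimated covariance of beta-hat
    return beta, cov

# Tiny simulated example (invented numbers)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
true_beta = np.array([-0.5, 1.0])
y = (rng.uniform(size=200) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)
beta_hat, cov = irwls_logistic(X, y)
print(beta_hat)
```

At convergence the score X^T(y - \mu) is essentially zero, which is the ML condition under the canonical link.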
Recall that the variance of y_i is usually of the form V(\mu_i)\phi/w_i, where V is the variance function, \phi is the dispersion parameter, and w_i is a known weight. In this case, \phi cancels out of the IRWLS procedure, and \hat{\beta} itself is the same under any assumed value of \phi. But the estimated covariance matrix of \hat{\beta} does depend on \phi.

Residuals. The Pearson residual is defined as

    r = \frac{y - \hat{\mu}}{\sqrt{\widehat{Var}(y)}},

where \hat{\mu} is the ML estimate of \mu, and \widehat{Var}(y) = a(\phi) V(\hat{\mu}) is the estimated variance of y.

If we write the deviance as D = \sum_{i=1}^N d_i, where d_i is the contribution of the ith unit, then the deviance residual is

    r = sign(y - \hat{\mu}) \sqrt{d}.

For example, in a binomial model y_i ~ Bin(n_i, \pi_i), the Pearson residual is

    r_i = \frac{y_i - n_i\hat{\pi}_i}{\sqrt{n_i\hat{\pi}_i(1 - \hat{\pi}_i)}},

and the deviance residual is r_i = sign(y_i - n_i\hat{\pi}_i) \sqrt{d_i}, where

    d_i = 2\left[ y_i \log\frac{y_i}{n_i\hat{\pi}_i} + (n_i - y_i) \log\frac{n_i - y_i}{n_i - n_i\hat{\pi}_i} \right].

(For computational purposes, interpret 0 \log 0 as 0.)

Deviance and Pearson residuals behave something like the standardized residuals in linear regression. McCullagh and Nelder suggest that the distributional properties of deviance residuals may be a little better than those of Pearson residuals, but either one is acceptable.

Plotting residuals versus fitted values. Plot the residuals on the vertical axis versus the linear predictor \eta on the horizontal axis. As in linear regression, we hope to see something like a "horizontal band" with mean ≈ 0 and constant variance as we move from left to right.

• Curvature in the plot may be due to a wrong link function or the omission of a nonlinear (e.g., quadratic) term for an important covariate.

• Non-constancy of range suggests that the variance function may be incorrect.

For binary responses, this plot is not very informative; all the points will lie on two curves, one for y = 0 and the other for y = 1.
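The binomial formulas above translate directly into code. The sketch below (illustrative only; the function name and the numbers are invented) computes Pearson and deviance residuals, interpreting 0 log 0 as 0 as the notes direct:

```python
import numpy as np

def binomial_residuals(y, n, pihat):
    """Pearson and deviance residuals for y_i ~ Bin(n_i, pi_i)."""
    y, n, pihat = map(np.asarray, (y, n, pihat))
    mu = n * pihat
    pearson = (y - mu) / np.sqrt(n * pihat * (1 - pihat))

    def term(count, expected):
        # count * log(count / expected), with 0 log 0 interpreted as 0
        out = np.zeros(len(count), dtype=float)
        nz = count > 0
        out[nz] = count[nz] * np.log(count[nz] / expected[nz])
        return out

    d = 2 * (term(y, mu) + term(n - y, n - mu))   # deviance contributions d_i
    deviance = np.sign(y - mu) * np.sqrt(d)
    return pearson, deviance

# Invented numbers for illustration
y = np.array([3, 0, 8, 5])
n = np.array([10, 10, 10, 10])
pihat = np.array([0.4, 0.1, 0.7, 0.5])
pres, dres = binomial_residuals(y, n, pihat)
print(pres)
print(dres)
```

Note that the two kinds of residuals always share the sign of y_i - n_i\hat{\pi}_i, as the definitions require.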
However, the plot may still help us to find outliers (residuals greater than about 2 or 3 in the positive or negative direction).

Plotting residuals versus individual covariates. In the same way, we can also plot the residuals versus a single covariate. (If the model has only one predictor, this is equivalent to the last plot.) Again, we hope to see something like a horizontal band. Curvature in this plot suggests that the x-variable in question ought to enter the model in a nonlinear fashion.
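The "horizontal band" check can also be done numerically rather than visually (a rough sketch on simulated stand-in data; the quantile-binning scheme is an assumption of this example, not a method from the notes): bin the residuals by the linear predictor and inspect the mean residual in each bin, where systematic departure from 0 would suggest curvature.

```python
import numpy as np

rng = np.random.default_rng(1)
eta = rng.normal(size=500)                 # stand-in for the fitted linear predictor
resid = rng.normal(size=500)               # stand-in for Pearson residuals

# Split eta into 5 quantile bins and compute the mean residual per bin.
# A systematic trend across bins would point to a wrong link function
# or a missing nonlinear term; here the data are pure noise, so the
# bin means should all be near 0.
edges = np.quantile(eta, np.linspace(0, 1, 6))
bins = np.digitize(eta, edges[1:-1])       # bin labels 0..4
bin_means = np.array([resid[bins == k].mean() for k in range(5)])
print(bin_means)
```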

