PSU STAT 544 - Fitting Generalized Linear Models

Unformatted text preview:

Stat 544, Lecture 13 1'&$%Fitting GeneralizedLinear ModelsLast time, we introduced the elements of the GLIM:• The response y, with distributionf(y; θ)=expjyθ − b(θ)a(φ)+ c(y, φ)ff,where θ is the canonical parameter. from whichwe showed that E(y)=μ = b(θ) andVar(y)=a(φ)b(θ).• The linear predictorη = xTβ,where x is a vector of covariates and β is to beestimated.• The link function, which connects η to μ,g(μ)=η.In this notation, the subscript i has been suppressed.Stat 544, Lecture 13 2'&$%In many cases, a(φ) will have the forma(φ)=φ/w,where φ is the dispersion parameter and w is a knownweight.Example: normal response. Under the normalmodel y ∼ N(μ, σ2), the log-density islog f = −12σ2(y − μ)2−12log`2πσ2´=yμ − μ2/2σ2−y22σ2−12log`2πσ2´.Therefore, the canonical parameter is θ = μ, and theremaining elements are b(θ)=μ2/2 and φ = σ2,a(φ)=φ.In a heteroscedastic model y ∼ N(μ, σ2/w ), where wis a known weight, we would have φ = σ2anda(φ)=φ/w.Example: binomial response. If y ∼ n−1Bin(n, π),then the log-probability mass function islog f = log n! − log(ny)! − log(n(1 − y))!+ ny log π + n(1 − y) log(1 − π)Stat 544, Lecture 13 3'&$%=y log“π1−π”+ log(1 − π)1/n+ c,where c doesn’t involve π. Therefore, the canonicalparameter isθ = log„π1 − π«,the b-function isb(θ)=−log(1 − π) = log“1+eθ”,the dispersion parameter is φ = 1, and a(φ)=φ/n.Canonical link. We showed that the canonicalparameter for y ∼ N(μ, σ2)isθ = μ, and thecanonical parameter for ny ∼ Bin(n, π)isθ = logit(π).Notice that the most commonly used link function fora normal model is η = μ, and the most commonlyused link function for the binomial model usη = logit(π). This is no accident. When η = θ,wesaythat the model has a canonical link. Canonical linkshave some nice properties, which we will discuss.It is often convenient to use a canonical link. Butconvenience does not imply that the data actuallyconform to it. It will be important to check theStat 544, Lecture 13 4'&$%appropriateness of the link through diagnostics,whether or not the link is canonical.Fitting generalized linear models via Fisherscoring. ML estimation for β may be carried out viaFisher scoring,β(t+1)= β(t)+h−El(β(t))i−1l(β(t)),where l is the loglikelihood function for the entiresample y1,...,yN.Temporarily changing the notation, we will now let l,land ldenote the contribution of a singleobservation yi= y to the loglikelihood and itsderivatives. We do this for simplicity, understandingthat the corresponding functions for the entire sampleare obtained by summing the contributions over thesample units i =1,...,N.Ignoring constants, the loglikelihood isl(θ; y)=yθ − b(θ)a(φ).The contribution of y = yito the jth element of theStat 544, Lecture 13 5'&$%score vector is∂l∂βj=„∂l∂θ«„∂θ∂μ«„∂μ∂η«„∂η∂βj«.The first factor is∂l∂θ=y − b(θ)a(φ)=y − μa(φ).Because μ = b(θ), the second factor is∂θ∂μ=1b(θ)=1V (μ)=a(φ)Var(y),where V (μ)=b(θ) is the variance function that wediscussed last time.The third factor, ∂μ/∂η, will depend on the linkfunction. The fourth factor is ∂η/βj= xij, where xijis the jth element of the covariate vector xi= x forthe ith observation. Putting it all together, we have∂l∂βj=y − μVar(y)„∂μ∂η«xij.If we are using the canonical link η = θ, then∂μ/∂η = ∂μ/∂θ = b(θ), so the score becomes∂l∂βj=y − μVar(y)b(θ) xij=y − μa(φ)xij.To find the expected second derivatives, we can useStat 544, Lecture 13 6'&$%the property−E„∂2l∂βj∂βk«= E»„∂l∂βj«„∂l∂βk«–= E„y − μVar(y)«2„∂μ∂η«2xijxik=1Var(y)„∂μ∂η«2xijxik.With the canonical link, this becomesE„∂2l∂βj∂βk«= −b(θ)a(φ)xijxik.But under the canonical link, the actual secondderivative is∂2l∂βj∂βk=∂∂βk„∂l∂θ«„∂θ∂βj«=„∂l∂θ«„∂2θ∂βj∂βk«+„∂θ∂βj«„∂2l∂θ2«„∂θ∂βk«.=0+„∂2l∂θ2«xijxik.Also, we saw last time that∂2l∂θ2= −b(θ)a(φ),so with the canonical link, the actual secondStat 544, Lecture 13 7'&$%derivatives equal the observed second derivatives, andFisher scoring is the same thing as Newton-Raphson.Under the canonical link, the second derivatives areconstant with respect to the data y.Putting it together. For an arbitrary link, we havejust shown that∂l∂βj=y − μVar(y)„∂μ∂η«xij,−E„∂2l∂βj∂βk«=1Var(y)„∂μ∂η«2xijxik.It follows that the score vector for the entire data sety1,...,yNcan be written as∂l∂β= XTA(y − μ),where X =(x1,...,xN)T,A = Diag»[Var(yi)]−1„∂μi∂ηi«–,= Diag»Var(yi)„∂ηi∂μi«–−1,and y and μ now denote the entire vectorsy =(y1,...,yN)T,Stat 544, Lecture 13 8'&$%μ =(μ1,...,μN)T.The expected Hessian matrix becomes−E„∂2l∂β∂βT«= XTWX,whereW = Diag"[Var(yi)]−1„∂μi∂ηi«2#= Diag"Var(yi)„∂ηi∂μi«2#−1.An iteration of Fisher scoring is thenβ(t+1)= β(t)+“XTWX”−1XTA (y − μ) ,where W , A and μ are calculated from β(t).Iteratively reweighted least squares. Recall thata heteroscedastic normal model is fit by weightedleast squares (WLS),ˆβ =“XTWX”−1XTWy,where y is the response and W is the diagonal matrixof weights, which is equivalent to OLS regression ofStat 544, Lecture 13 9'&$%W1/2y on W1/2X.We can arrange the step of Fisher scoring to make itresemble WLS. First, rewrite it asβ(t+1)=“XTWX”−1hXTWXβ(t)+ XTA (y − μ)i.Then note that Xβ =(η1,...,ηN)T= η. Also, notethat A and W are related byA = W„∂η∂μ«,where ∂η/∂μ = Diag(∂ηi/∂μi). Therefore, we canwrite it asβ(t+1)=“XTWX”−1XTWz,wherez = η +„∂η∂μ«(y − μ)=(z1,...,zN)T,where zi= ηi+(∂ηi/∂μi)(yi− μi). In the GLIMliterature, ziis often called the adjusted dependentvariate or the working variate. Fisher scoring cantherefore be regarded as iteratively reweighted leastsquares (IRWLS) carried out on a transformed versionStat 544, Lecture 13 10'&$%of the response variable.At each cycle, we• use the current estimate for β to calculate a newworking variate z and a new set of weights W ,and then• regress z on X using weights W to get theupdated β.Viewing Fisher scoring as IRWLS makes it easy toprogram this


View Full Document

PSU STAT 544 - Fitting Generalized Linear Models

Download Fitting Generalized Linear Models
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Fitting Generalized Linear Models and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Fitting Generalized Linear Models 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?