Stat 544, Lecture 9: More About Logistic Regression

Last time, we introduced the logistic regression model. This model assumes $y_i \sim \mathrm{Bin}(n_i, \pi_i)$, $i = 1, \ldots, N$, where $n_i$ is known and $\pi_i$ depends on covariates $x_i = (x_{i1}, \ldots, x_{ip})^T$. The assumed relationship is
\[
\mathrm{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i^T \beta,
\]
where $\beta = (\beta_1, \ldots, \beta_p)^T$ is to be estimated. The ML estimate $\hat\beta$ can be found by the Newton-Raphson procedure,
\[
\beta^{(t+1)} = \beta^{(t)} + \left(X^T W^{(t)} X\right)^{-1} X^T \left(y - \mu^{(t)}\right),
\]
where
\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}, \qquad
X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_N^T \end{bmatrix}, \qquad
\mu = \begin{bmatrix} n_1 \pi_1 \\ n_2 \pi_2 \\ \vdots \\ n_N \pi_N \end{bmatrix},
\]
and $W = \mathrm{Diag}\left(n_i \pi_i (1 - \pi_i)\right)$. Upon convergence, $(X^T W X)^{-1}$ is an estimated covariance matrix for $\hat\beta$.

Failure to converge. With the design matrix
\[
X = \begin{bmatrix} 1 & -5 \\ 1 & -4 \\ 1 & -3 \\ 1 & -2 \\ 1 & -1 \end{bmatrix},
\]
Newton-Raphson converged quickly. But see what happens when we take
\[
X = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 1 & 1 \\ 1 & 1 \end{bmatrix}.
\]

> x <- cbind(int=1,dummy=c(0,0,0,1,1))
> result <- nr.logit(x,y,n)
1...2...3...4...5...6...7...8...9...10...11...12...13...14...15...16...17...18...19...20...
> logit.print(result)
Algorithm failed to converge by 20 iterations
            coef           SE coef/SE  pval
int   -0.9555114    0.5262348   -1.82 0.069
dummy 22.1584062 7037.4000146    0.00 0.997
Loglikelihood = -10.6351604403886

NR failed to converge. The coefficient for the dummy variable is huge (it's the log of an odds ratio) and the standard error is enormous. If we increased the number of iterations, the coefficient would continue to move toward $\infty$, eventually producing an overflow.

The problem is that the dummy variable singles out a group of observations for which the observed success rate is 1. The logistic regression coefficient for this dummy variable is the log-odds ratio for the $2 \times 2$ table shown below.

                        Dead   Alive
  log10(conc.) >  -3      12       0
  log10(conc.) <= -3       5      13

Whenever we have a dummy variable or set of covariates that singles out a group of observations that are "all successes" or "all failures," we will encounter this problem. The result is that NR will fail to converge within a reasonable number of iterations, one or more estimated coefficients will be large, and their standard errors will be even larger.

Inferences about coefficients. With large samples, a reasonable approximation is
\[
\hat\beta \sim N\left(\beta, \hat V(\hat\beta)\right),
\]
where $\hat V(\hat\beta) = (X^T W X)^{-1}$ comes from the final step of NR. Therefore, we can test $H_0\colon \beta_j = 0$ by comparing
\[
t = \frac{\hat\beta_j}{SE(\hat\beta_j)}
\]
to $N(0, 1)$, where $SE(\hat\beta_j)$ is the square root of the $(j, j)$th element of $\hat V(\hat\beta)$. This is known as a Wald test. An approximate 95% confidence interval for $\beta_j$ is $\hat\beta_j \pm 1.96\, SE(\hat\beta_j)$.

An alternative way to test $H_0\colon \beta_j = 0$ is to remove the $j$th column from the design matrix $X$, re-fit the model, and compare the drop in $2\, l(\beta)$ to $\chi^2_1$. This is a likelihood ratio (LR) test.

Wald and LR tests are asymptotically equivalent, so in large samples they will yield similar results. With smaller samples, the results from the two tests may be somewhat different. When differences arise, many statisticians would put more trust in LR than in Wald.
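The nr.logit function used in the transcripts above is not shown here, so the following is a minimal sketch of what such a Newton-Raphson fitter might look like, applied to the separated design matrix; the name nr_logit and the counts y and n are hypothetical stand-ins for illustration, not the instructor's code or data.

# Sketch of a Newton-Raphson fitter for grouped logistic regression:
#   beta(t+1) = beta(t) + (X'WX)^{-1} X'(y - mu),
# with mu = n*pi and W = Diag(n*pi*(1-pi)).
# An assumed reconstruction, not the lecture's actual nr.logit.
nr_logit <- function(X, y, n, maxit = 20, tol = 1e-8) {
  beta <- rep(0, ncol(X))
  for (t in 1:maxit) {
    pi.hat <- as.vector(1 / (1 + exp(-X %*% beta)))
    W <- diag(n * pi.hat * (1 - pi.hat))
    step <- solve(t(X) %*% W %*% X, t(X) %*% (y - n * pi.hat))
    beta <- beta + as.vector(step)
    if (max(abs(step)) < tol) break
  }
  list(beta = beta, cov = solve(t(X) %*% W %*% X), iter = t)
}

# Hypothetical counts exhibiting separation: the dummy group is all deaths.
y <- c(0, 1, 4, 6, 6)                          # deaths per group (illustrative)
n <- rep(6, 5)                                 # group sizes (illustrative)
X <- cbind(int = 1, dummy = c(0, 0, 0, 1, 1))
fit <- nr_logit(X, y, n)
fit$iter              # hits maxit = 20: no convergence
fit$beta              # the dummy coefficient drifts toward +Inf
sqrt(diag(fit$cov))   # its standard error grows even faster

With these counts, the intercept converges while the dummy coefficient and its standard error grow without bound, the same qualitative behavior as in the printout above.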
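The two tests are easy to compare side by side with R's built-in glm(), which fits the same model. A small sketch, using the same illustrative dose-response counts as above (hypothetical stand-ins, not the lecture's data):

# Wald and LR tests of H0: beta_1 = 0, using the built-in glm() fitter.
z <- -5:-1                    # log10(concentration), as in the first design matrix
y <- c(0, 1, 4, 6, 6)         # deaths per group (illustrative)
n <- rep(6, 5)                # group sizes (illustrative)

fit1 <- glm(cbind(y, n - y) ~ z, family = binomial)   # full model
fit0 <- glm(cbind(y, n - y) ~ 1, family = binomial)   # model with z removed

summary(fit1)$coefficients    # Wald: Estimate / Std. Error, compared to N(0,1)

lr <- as.numeric(2 * (logLik(fit1) - logLik(fit0)))   # drop in 2*loglikelihood
pchisq(lr, df = 1, lower.tail = FALSE)                # compare to chi^2 with 1 df

anova(fit0, fit1, test = "Chisq")                     # same LR test in one call

In large samples the two p-values agree closely; with small grouped data like this, some disagreement is normal, and the LR result is usually the one to trust.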
Inferences about functions of coefficients. For a linear function $\theta = a^T \beta$, a reasonable approximation is
\[
a^T \hat\beta \sim N\left(a^T \beta,\; a^T \hat V(\hat\beta)\, a\right).
\]
For a nonlinear function $\theta = g(\beta)$, the delta-method approximation is
\[
g(\hat\beta) \sim N\left(g(\beta),\; g'(\hat\beta)^T\, \hat V(\hat\beta)\, g'(\hat\beta)\right),
\]
where $g'(\beta) = \partial g / \partial \beta$ is the vector of partial derivatives. For example, consider our dose-response model
\[
\mathrm{logit}(\pi_i) = \beta_0 + \beta_1 z_i.
\]
In dose-response settings, one parameter of interest is LD50, the value of $z_i$ that gives a 50% death rate. Because $\mathrm{logit}(0.5) = 0$,
\[
\mathrm{LD50} = -\beta_0 / \beta_1.
\]
The derivatives are
\[
\partial\, \mathrm{LD50} / \partial \beta_0 = -1/\beta_1, \qquad
\partial\, \mathrm{LD50} / \partial \beta_1 = \beta_0 / \beta_1^2,
\]
so the approximate variance for $\widehat{\mathrm{LD50}} = -\hat\beta_0 / \hat\beta_1$ is
\[
\begin{bmatrix} -1/\hat\beta_1 & \hat\beta_0/\hat\beta_1^2 \end{bmatrix}
\left(X^T \hat W X\right)^{-1}
\begin{bmatrix} -1/\hat\beta_1 \\ \hat\beta_0/\hat\beta_1^2 \end{bmatrix},
\]
where $\hat W$ is the final value of $W = \mathrm{Diag}\left(n_i \pi_i (1 - \pi_i)\right)$ upon convergence of Newton-Raphson. (A worked sketch appears at the end of this section.)

Fitted values. One important nonlinear function is
\[
\pi_i = \pi_i(\beta) = \mathrm{expit}(x_i^T \beta) = \frac{e^{x_i^T \beta}}{1 + e^{x_i^T \beta}}.
\]
You can easily verify that
\[
\frac{\partial \pi_i}{\partial \beta_j} = \pi_i (1 - \pi_i)\, x_{ij},
\]
and thus
\[
\frac{\partial \pi_i}{\partial \beta} = \pi_i (1 - \pi_i)\, x_i.
\]
Therefore, an estimated variance for the fitted probability $\hat\pi_i = \pi_i(\hat\beta)$ is
\[
\hat V(\hat\pi_i) = \hat\pi_i^2 (1 - \hat\pi_i)^2\, x_i^T \left(X^T \hat W X\right)^{-1} x_i.
\]
In general, this should be smaller than the estimated variance of $p_i = y_i / n_i$, $\hat V(p_i) = p_i (1 - p_i) / n_i$, because it borrows information across the entire dataset $y_1, \ldots, y_N$.

Now consider the entire vector of probabilities $\pi = (\pi_1, \ldots, \pi_N)^T$. The $N \times p$ Jacobian (first-derivative) matrix for $\pi = \pi(\beta)$ is
\[
\frac{\partial \pi}{\partial \beta} = U X,
\]
where $U = \mathrm{Diag}\left(\pi_i (1 - \pi_i)\right)$. By the multivariate delta method, an approximate covariance matrix for $\hat\pi$ is
\[
\hat V(\hat\pi) = U X \left(X^T \hat W X\right)^{-1} X^T U.
\]
Although this matrix is $N \times N$, its rank is $p$, the same as that of $(X^T \hat W X)^{-1}$.

Let's apply this to our dose-response dataset:

> result <- nr.logit(x,y,n)
1...2...3...4...5...6...7...8...
> betahat <- result$beta
> pihat <- exp(x%*%betahat)/(1+exp(x%*%betahat))
> pihat <- as.vector(pihat)   # so that diag() will work correctly
> U <- diag(pihat*(1-pihat))
> W <- diag(n*pihat*(1-pihat))
> U%*%x%*%solve(t(x)%*%W%*%x)%*%t(x)%*%U
              [,1]          [,2]          [,3]          [,4]          [,5]
[1,]  2.567787e-04  0.0017104253 -0.0002206373 -0.0003674949 -4.166649e-05
[2,]  1.710425e-03  0.0132130140  0.0051442689 -0.0014207956 -1.974174e-04
[3,] -2.206373e-04  0.0051442689  0.0242286177  0.0040489346  3.270299e-04
[4,] -3.674949e-04 -0.0014207956  0.0040489346  0.0011056943  1.048585e-04
[5,] -4.166649e-05 -0.0001974174  0.0003270299  0.0001048585  1.028922e-05

Now let's find the covariance matrix for the raw sample proportions $p_i = y_i / n_i$:

> p <- y/n
> diag(p*(1-p)/n)
     [,1]       [,2]       [,3] [,4] [,5]
[1,]    0 0.00000000 0.00000000    0    0
[2,]    0 0.02314815 0.00000000    0    0
[3,]    0 0.00000000 0.03703704    0    0
[4,]    0 0.00000000 0.00000000    0    0
[5,]    0 0.00000000 0.00000000    0    0

The estimated variances for $p_1 = 0$, $p_4 = 1$, and $p_5 = 1$ are zero, which is nonsensical. The estimated variances for $p_2$ and $p_3$ are substantially larger than for $\hat\pi_2$ and $\hat\pi_3$.

This makes sense. The estimates $\hat\pi_1, \ldots, \hat\pi_5$ are based on a two-parameter model for the entire dataset $y_1, \ldots, y_5$. The raw proportions $p_1, \ldots, p_5$ are ML estimates from a saturated, five-parameter model that says each $\pi_i$ may lie anywhere in $(0, 1)$. Therefore, the $\hat\pi_i$'s will be more efficient estimates than the $p_i$'s, provided that the model is true. If the model is not true, the $\hat\pi_i$'s could be substantially biased. Therefore, it is important to check whether the model fits the data well.
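Before turning to goodness of fit, here is the worked sketch promised above, using R's built-in glm() in place of nr.logit; vcov(fit) plays the role of $(X^T \hat W X)^{-1}$. The doses z and counts y, n are the same illustrative stand-ins used earlier, not the lecture's actual data.

# Delta-method SE for LD50 = -beta0/beta1, and the covariance of the
# fitted probabilities, U X (X'WX)^{-1} X'U. Illustrative data only.
z <- -5:-1
y <- c(0, 1, 4, 6, 6)
n <- rep(6, 5)
fit <- glm(cbind(y, n - y) ~ z, family = binomial)

b <- coef(fit)                          # (beta0hat, beta1hat)
V <- vcov(fit)                          # (X' W X)^{-1} at convergence
ld50 <- unname(-b[1] / b[2])
grad <- c(-1 / b[2], b[1] / b[2]^2)     # gradient of -beta0/beta1
se.ld50 <- sqrt(drop(t(grad) %*% V %*% grad))
c(LD50 = ld50, SE = se.ld50)

# Covariance matrix of the fitted probabilities:
X <- cbind(1, z)
pihat <- fitted(fit)                    # expit(X betahat), as a vector
U <- diag(pihat * (1 - pihat))
V.pihat <- U %*% X %*% V %*% t(X) %*% U
sqrt(diag(V.pihat))                     # SEs of the model-based pihat
sqrt((y/n) * (1 - y/n) / n)             # compare: SEs from the saturated model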
Goodness-of-fit testing. In ordinary linear regression, we assess the fit of the model
