
Stanford STATS 191 - Weighted Least Squares

Lecture 14: Weighted Least Squares
Nancy R. Zhang
Statistics 191, Stanford University
March 5, 2008

Review - Binary responses data example

A clinic sent fliers to its clients to encourage everyone, but especially older persons, to get a flu shot for protection against an expected flu epidemic.
- 150 clients randomly sampled.
- Y: did they get a flu shot?
- Predictor variables: age, health awareness.

Review - Binary responses model

Model: Y \in \{0, 1\},

    P(Y = 1 \mid X_1, \ldots, X_p) = g^{-1}(\beta_1 X_1 + \beta_2 X_2 + \ldots + \beta_p X_p),

where

    g(\pi) = \log \frac{\pi}{1 - \pi}.

The inverse g^{-1} is

    g^{-1}(z) = \frac{e^z}{1 + e^z}.

We have no choice but to accept non-constant variance:

    \mathrm{Var}(Y) = \pi(X)[1 - \pi(X)].

Review - Model interpretation

An intuitive quantity for assessing probabilities is the odds:

    \mathrm{odds} = \frac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)}.

In the logistic regression model,

    \log(\mathrm{odds}) = \beta X.

The parameter \beta is the contribution of a unit increase in X to the increase (or decrease) in the log-odds. For example, if X were binary as well,

    \log \frac{\mathrm{odds}(X = 1)}{\mathrm{odds}(X = 0)} = \beta.

Review - Model fitting

Fitting can be done by Newton-Raphson:
1. Let u = (\partial l(\beta) / \partial \beta_i)_{i = 1, \ldots, p} be the gradient vector.
2. Let H be the Hessian matrix, h_{ij} = \partial^2 l(\beta) / \partial \beta_i \partial \beta_j.
3. Start with an initial \beta^{(0)}, then iterate \beta^{(t+1)} = \beta^{(t)} - (H^{(t)})^{-1} u^{(t)}.

The idea is, at each iteration t, to approximate l(\beta) locally by a quadratic:

    l(\beta) \approx l(\beta^{(t)}) + u^{(t)\prime}(\beta - \beta^{(t)}) + \frac{1}{2}(\beta - \beta^{(t)})' H^{(t)} (\beta - \beta^{(t)}),

and solve

    \partial l(\beta) / \partial \beta \approx u^{(t)} + H^{(t)}(\beta - \beta^{(t)}) = 0.

For the logistic regression model, this gives

    \beta^{(t+1)} = \beta^{(t)} + \{X' \mathrm{diag}[\pi_i^{(t)}(1 - \pi_i^{(t)})] X\}^{-1} X'(y - \pi^{(t)}).

This is equivalent to doing a weighted linear regression at each step.
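As a concrete illustration, the Newton-Raphson update for logistic regression can be sketched in a few lines of NumPy. This is a minimal sketch, not the lecture's code: the function name `fit_logistic`, the zero initialization, and the fixed iteration count are my choices, and real implementations would add a convergence check.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson for logistic regression:
    beta <- beta + (X' diag[pi(1-pi)] X)^{-1} X'(y - pi)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))   # pi_i = g^{-1}(x_i' beta)
        w = pi * (1.0 - pi)                    # weights pi_i (1 - pi_i)
        H = X.T @ (w[:, None] * X)             # X' diag[w] X
        beta = beta + np.linalg.solve(H, X.T @ (y - pi))
    return beta
```

Because each step solves a weighted least-squares problem with weights \pi_i(1 - \pi_i), this scheme is also known as iteratively reweighted least squares (IRLS).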
Inference for \beta

In the Gaussian case:

    \hat\beta = (X'X)^{-1} X'Y,    Y \sim N(X\beta, \sigma^2 I).

Since Gaussian vectors remain Gaussian under linear transforms,

    \hat\beta \sim N(\beta, (X'X)^{-1} \sigma^2).

For logistic regression, \hat\beta is no longer linear in Y. However, asymptotically (i.e. for large n) it is Gaussian. Its covariance can be estimated by

    \widehat{\mathrm{cov}}(\hat\beta) = (X' \mathrm{diag}[\hat\pi_i(1 - \hat\pi_i)] X)^{-1}.

From the square roots of the diagonal elements of this matrix you get \widehat{\mathrm{s.e.}}(\hat\beta).

Wald tests for \beta

1. Confidence interval for \beta:

       [\hat\beta - z_{\alpha/2} \widehat{\mathrm{s.e.}}(\hat\beta),\; \hat\beta + z_{\alpha/2} \widehat{\mathrm{s.e.}}(\hat\beta)]

2. Two-sided test of H_0: \beta = 0; reject if

       |\hat\beta / \widehat{\mathrm{s.e.}}(\hat\beta)| > z_{\alpha/2}

3. Test of the constraint H_0: C_{j \times p} \beta_{p \times 1} = h_{j \times 1}; reject if

       (C\hat\beta - h)' (C \widehat{\mathrm{cov}}(\hat\beta) C')^{-1} (C\hat\beta - h)

   is larger than \chi^2_{j, 1-\alpha}.

Assessment of model fit

In linear regression, we used the F-test:

    F = \frac{[SSE(RM) - SSE(FM)] / \Delta df}{SSE(FM) / [n - df(FM)]},    F \sim F_{\Delta df,\, n - df(FM)}.

The analogue of SSE for non-linear models is the deviance:

    \mathrm{Deviance}(\hat\beta) = 2[l(\tilde\beta, Y) - l(\hat\beta, Y)],

where l(\cdot, Y) is the log-likelihood and \tilde\beta is the fit from the saturated model (n parameters, one per observation).

Assessment of model fit (continued)

For the Gaussian model,

    \mathrm{Deviance}(\hat\beta) = \sum_{i=1}^n (y_i - \hat y_i)^2.

For the logistic model,

    \mathrm{Deviance}(\hat\beta) = 2 \sum_{i=1}^n \left[ Y_i \log \frac{Y_i}{\hat\pi_i} + (1 - Y_i) \log \frac{1 - Y_i}{1 - \hat\pi_i} \right]
                                 = 2 \sum \mathrm{observed} \times \log \frac{\mathrm{observed}}{\mathrm{fitted}}.

Convention: 0 \times \log 0 = 0.

Nested chi-squared tests of model fit

As with SSE, the greater the deviance, the poorer the fit. If the reduced model (RM) were true, then

    \mathrm{Deviance}(RM) - \mathrm{Deviance}(FM) \to \chi^2_{df(FM) - df(RM)}.

Thus, reject RM at asymptotic level \alpha if

    \mathrm{Deviance}(RM) - \mathrm{Deviance}(FM) > \chi^2_{df(FM) - df(RM),\, 1-\alpha}.
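The Wald quantities and the logistic deviance can both be computed directly. A minimal sketch, assuming a design matrix `X` and fitted coefficients `beta_hat` are already available (the function names are mine, and z = 1.96 gives approximate 95% intervals); the `np.where` guards implement the 0 × log 0 = 0 convention:

```python
import numpy as np

def wald_summary(X, beta_hat, z=1.96):
    """Wald standard errors and approximate CIs:
    cov(beta_hat) estimated by (X' diag[pi(1-pi)] X)^{-1}."""
    pi = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    w = pi * (1.0 - pi)
    cov = np.linalg.inv(X.T @ (w[:, None] * X))
    se = np.sqrt(np.diag(cov))
    ci = np.column_stack([beta_hat - z * se, beta_hat + z * se])
    return se, ci

def logistic_deviance(y, pi_hat):
    """2 * sum[ y log(y/pi) + (1-y) log((1-y)/(1-pi)) ], with 0 * log 0 = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        t1 = np.where(y > 0, y * np.log(y / pi_hat), 0.0)
        t2 = np.where(y < 1, (1.0 - y) * np.log((1.0 - y) / (1.0 - pi_hat)), 0.0)
    return 2.0 * float(np.sum(t1 + t2))
```

For binary y the saturated model fits each observation exactly, so its log-likelihood is 0 and the deviance reduces to minus twice the fitted log-likelihood.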
Model diagnosis

In linear regression, the standardized residuals were used to diagnose model fit:

    r_i = y_i - \hat y_i,    r_i^* = \frac{r_i}{\hat\sigma_{r_i}} = \frac{r_i}{\sqrt{1 - p_{ii}}}.

The analogous quantity here is the Pearson residual,

    r_i = \frac{y_i - \hat\pi_i}{\sqrt{\hat\pi_i(1 - \hat\pi_i)}},    r_i^* = \frac{r_i}{\sqrt{1 - \hat h_{ii}}},

where the \hat h_{ii} are the diagonals of

    \mathrm{Hat} = W^{1/2} X (X'WX)^{-1} X' W^{1/2},    W = \mathrm{diag}[\pi_i(1 - \pi_i)].

Model diagnosis (continued)

Another quantity you can use is the deviance residual. Recall

    \mathrm{Deviance}(\hat\beta) = 2 \sum_{i=1}^n \left[ Y_i \log \frac{Y_i}{\hat\pi_i} + (1 - Y_i) \log \frac{1 - Y_i}{1 - \hat\pi_i} \right] = 2 \sum \mathrm{observed} \times \log \frac{\mathrm{observed}}{\mathrm{fitted}}.

Let d_i be the contribution of data point i to this measure of misfit:

    d_i = Y_i \log \frac{Y_i}{\hat\pi_i} + (1 - Y_i) \log \frac{1 - Y_i}{1 - \hat\pi_i}.

The deviance residual is

    \mathrm{sign}(y_i - \hat\pi_i) \sqrt{d_i}.

Model Selection

1. Nested models: the deviance chi-squared test. Reject RM at asymptotic level \alpha if

       \mathrm{Deviance}(RM) - \mathrm{Deviance}(FM) > \chi^2_{df(FM) - df(RM),\, 1-\alpha},

   where \mathrm{Deviance} = 2[l(\tilde\beta, Y) - l(\hat\beta, Y)].

2. Non-nested models:
   1. AIC: -2 \mathrm{loglik} + 2p.
   2. C_p, via a local asymptotic approximation using Pearson's residuals:

          r_i = \frac{y_i - \hat\pi_i}{\sqrt{\hat\pi_i(1 - \hat\pi_i)}},    C_p = \sum_i r_i^2 + 2p \hat\sigma^2.

   3. BIC: -2 \mathrm{loglik} + p \log n.

Multinomial Data

Suppose the response Y belongs to one of K categories.
1. Designate one category as the "base" category.
2. For k = 1, \ldots, K - 1,

       P(Y = k \mid X) = \frac{e^{X\beta_k}}{1 + \sum_{l=1}^{K-1} e^{X\beta_l}},

   where \beta_k = (\beta_{k1}, \ldots, \beta_{kp})'. For the base category,

       P(Y = K \mid X) = \frac{1}{1 + \sum_{l=1}^{K-1} e^{X\beta_l}}.

3. There are p \times (K - 1) parameters.
4. \beta_{ki}, for the k-th category and i-th predictor, is interpreted as the increase in log-odds relative to the base category per unit increase in predictor i.

Count data

1. Men and women were asked whether they believed in the afterlife (1991 General Social Survey).
2. Results:

               Yes    No or Undecided    Total
    Male       435    147                582
    Female     375    134                509
    Total      810    281                1091

3. Question: is belief in the afterlife independent of gender?

Contingency Tables

1. Model: Y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}).
2. H_0: independence,
   i.e. \lambda_{ij} = \lambda \alpha_i \beta_j.
3. H_A: \lambda_{ij} arbitrary.
4. Pearson's \chi^2 test:

       \chi^2 = \sum_{i,j} \frac{(Y_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2_1    (under H_0).

5. Why 1 df? The independence model has 5 parameters (\lambda, two \alpha's, two \beta's) and 2 constraints, so 3 df. The unrestricted model has 4 parameters, and 4 - 3 = 1.

Under independence:

    \log E(Y_{ij}) = \log \lambda_{ij} = \log \lambda + \log \alpha_i + \log \beta_j.

What about the variance? Because the data are Poisson,

    \mathrm{Var}(Y_{ij}) = E(Y_{ij}) = \lambda_{ij},

so the variance scales with the mean. Taking logs stabilizes the variance. But unlike before, we are explicitly modeling the data as Poisson.
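Pearson's chi-squared test can be run directly on the 2x2 afterlife table from the slides. A minimal sketch (variable names are mine); the expected counts under independence are E_ij = (row total)(column total)/n:

```python
import numpy as np

# Afterlife table from the slides: rows = Male, Female; cols = Yes, No-or-Undecided.
counts = np.array([[435.0, 147.0],
                   [375.0, 134.0]])

row = counts.sum(axis=1, keepdims=True)              # 582, 509
col = counts.sum(axis=0, keepdims=True)              # 810, 281
n = counts.sum()                                     # 1091
expected = row @ col / n                             # E_ij = (row total)(col total) / n
chi2 = ((counts - expected) ** 2 / expected).sum()   # ~ chi^2_1 under H_0
```

The statistic comes out to about 0.16, far below \chi^2_{1, 0.95} \approx 3.84, so independence of gender and belief in the afterlife is not rejected at the 5% level.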

