Statistics 203: Introduction to Regression and Analysis of Variance
Course review
Jonathan Taylor

Today
■ Review / overview of what we learned.

General themes in regression models
■ Specifying regression models.
◆ What is the joint (conditional) distribution of all outcomes given all covariates?
◆ Are outcomes independent (conditional on covariates)? If not, what is an appropriate model?
■ Fitting the models.
◆ Once a model is specified, how are we going to estimate the parameters?
◆ Is there an algorithm or some existing software to fit the model?
■ Comparing regression models.
◆ Inference for coefficients in the model: are some zero (i.e. is a smaller model better)?
◆ What if there are two competing models for the data? Why would one be preferable to the other?
◆ What if there are many models for the data? How do we compare them?

Simple linear regression model
■ Only one covariate.
■ Y_i = β_0 + β_1 X_i + ε_i,  1 ≤ i ≤ n.
■ Errors ε_i are independent N(0, σ²).

Multiple linear regression model
■ Y_i = β_0 + β_1 X_{i1} + ⋯ + β_{p−1} X_{i,p−1} + ε_i,  1 ≤ i ≤ n.
■ Errors ε_i are independent N(0, σ²).
■ β_j's: (partial) regression coefficients.
■ Special cases: polynomial / spline regression models, where extra columns are functions of one covariate.

ANOVA & categorical variables
■ Generalization of two-sample tests.
■ One-way (fixed):
  Y_{ij} = µ + α_i + ε_{ij},  1 ≤ i ≤ r, 1 ≤ j ≤ n.
  The α's are constants to be estimated. Errors ε_{ij} are independent N(0, σ²).
■ Two-way (fixed):
  Y_{ijk} = µ + α_i + β_j + (αβ)_{ij} + ε_{ijk},  1 ≤ i ≤ r, 1 ≤ j ≤ m, 1 ≤ k ≤ n.
  The α's, β's, and (αβ)'s are constants to be estimated. Errors ε_{ijk} are independent N(0, σ²).
■ Experimental design: when balanced layouts are impossible, which is the best design?

Generalized linear models
■ Non-Gaussian errors.
■ Binary outcomes: logistic regression (or probit).
■ Count outcomes: Poisson regression.
■ A "link" and "variance" function determine a GLM.
■ Link:
  g(E(Y_i)) = g(µ_i) = η_i = x_iᵀβ = X_{i0} β_0 + ⋯ + X_{i,p−1} β_{p−1}.
■ Variance function:
  Var(Y_i) = V(µ_i).

Nonlinear regression models
■ Regression function depends on the parameters in a nonlinear fashion.
■ Y_i = f(X_{i1}, …, X_{ip}; θ_1, …, θ_q) + ε_i,  1 ≤ i ≤ n.
■ Errors ε_i are independent N(0, σ²).

Robust regression
■ Suppose that we have additive noise, but not Gaussian. Likelihood of the form
  L(β | Y, X_0, …, X_{p−1}) ∝ exp(−ρ((Y − Σ_{j=0}^{p−1} β_j X_j) / s)).
■ Leads to robust regression: minimize
  Σ_{i=1}^{n} ρ((Y_i − Σ_{j=0}^{p−1} X_{ij} β_j) / s).
■ Can downweight large residuals when the errors have heavier tails than normal random variables.

Random & mixed effects ANOVA
■ When the levels of the categorical variables in an ANOVA are a sample from a population, effects should be treated as random.
■ One-way (random):
  Y_{ij} = µ + α_i + ε_{ij},  1 ≤ i ≤ r, 1 ≤ j ≤ n,
  where the α_i ∼ N(0, σ²_α) are random, independent of the errors ε_{ij}, which are independent N(0, σ²).
■ Introduces correlation in the Y's:
  Cov(Y_{ij}, Y_{i′j′}) = δ_{ii′} σ²_α + δ_{ii′} δ_{jj′} σ².

Mixed linear models
■ Essentially a model of covariance between observations based on "subject" effects.
■ General form:
  Y_{n×1} = X_{n×p} β_{p×1} + Z_{n×q} γ_{q×1} + ε_{n×1},
  where
  ◆ ε ∼ N(0, σ²I);
  ◆ γ ∼ N(0, D) for some covariance D.
■ In this model,
  Y ∼ N(Xβ, ZDZᵀ + σ²I).
■ Covariance is modelled through the "random effect" design matrix Z and the covariance D.

Time series regression models
■ Another model of covariance between observations, based on dependence in time.
■ Y_{n×1} = X_{n×p} β_{p×1} + ε_{n×1}.
■ In these models, ε ∼ N(0, Σ), where the covariance Σ depends on what kind of time series model is used (i.e. which ARMA(p, q) model?).
■ For example, if ε is AR(1) with parameter ρ, then
  Σ_{ij} = σ² ρ^{|i−j|}.
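To make the AR(1) covariance concrete, here is a minimal numpy sketch, not from the course materials: simulated data, a hypothetical intercept-plus-trend design, and the GLS estimator β̂ = (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹Y reviewed on the least squares slide below, compared against plain OLS.

```python
import numpy as np

# Time series regression sketch: errors follow an AR(1) process, so
# Sigma_ij = sigma^2 * rho^{|i - j|}; GLS accounts for this correlation.
rng = np.random.default_rng(1)
n, rho, sigma = 100, 0.6, 1.0
idx = np.arange(n)
Sigma = sigma**2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Hypothetical design for illustration: intercept plus a linear trend.
X = np.column_stack([np.ones(n), idx / n])
beta_true = np.array([1.0, 3.0])
eps = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)  # correlated errors
Y = X @ beta_true + eps

# GLS estimate: (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("GLS:", beta_gls)
print("OLS:", beta_ols)
```

Both estimators are unbiased here; the point of GLS is its smaller variance when Σ is correctly specified.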
Functional linear model
■ We talked about a functional two-sample t-test.
■ General form:
  Y_{i,t} = β_{0,t} + Σ_{j=1}^{p−1} X_{ij} β_{j,t} + ε_{i,t},
  where the noise ε_{i,t} is a random function, independent across "observations" (curves) Y_{i,·}.
■ Parameter estimates are curves: leads to nice inference problems for smooth random curves.

Least squares
■ Multiple linear regression – OLS:
  β̂ = (XᵀX)⁻¹ Xᵀ Y.
■ Non-constant variance but independent – WLS:
  β̂ = (XᵀWX)⁻¹ Xᵀ W Y,  W diagonal with W_{ii} = 1/Var(Y_i).
■ General correlation – GLS:
  β̂ = (XᵀΣ⁻¹X)⁻¹ Xᵀ Σ⁻¹ Y,  Cov(Y) = σ²Σ.

Maximum likelihood
■ In the Gaussian setting, with Σ known, least squares is the MLE.
■ In other cases, we needed iterative techniques to solve for the MLE:
◆ nonlinear regression: iterative projections onto the tangent space;
◆ robust regression: IRLS with weights determined by ψ = ρ′;
◆ generalized linear models: IRLS, Fisher scoring;
◆ time series regression models: a two-stage procedure approximates the MLE (can iterate further, though);
◆ mixed models: similar techniques (though we skipped the details).

Diagnostics: influence and outliers
■ Diagnostic plots:
◆ Added variable plots.
◆ QQ plot.
◆ Residuals vs. fitted.
◆ Standardized residuals vs. fitted.
■ Measures of influence:
◆ Cook's distance.
◆ DFFITS.
◆ DFBETAS.
■ Outlier test with Bonferroni correction.
■ Techniques are most developed for the multiple linear regression model, but some can be generalized (using "whitened" residuals).

Penalized regression
■ We looked at ridge regression, too.
■ A generic example of the "bias-variance" tradeoff in statistics.
■ Minimize
  SSE_λ(β) = Σ_{i=1}^{n} (Y_i − Σ_{j=1}^{p−1} X_{ij} β_j)² + λ Σ_{j=1}^{p−1} β_j².
■ Other penalties are possible: the basic idea is that the penalty is a measure of the "complexity" of the model.
■ Smoothing spline: ridge regression for scatterplot smoothers.

Hypothesis tests: multiple linear regression
■ Multiple linear regression: model R has j fewer coefficients than model F – equivalently, there are j linear constraints on the β's.
■ F = [(SSE(R) − SSE(F)) / j] / [SSE(F) / (n − p)] ∼ F_{j,n−p}  (if H_0 is true).
■ Reject H_0: "R is true" at level α if F > F_{1−α,j,n−p}.

Hypothesis tests: general case
■ Other models: DEV(M) = −2 log L(M) replaces SSE(M).
■ Difference D = DEV(R) − DEV(F) ∼ χ²_j (asymptotically).
■ The denominator in the F statistic is usually either known or based on something like Pearson's X²:
  φ̂ = (1/(n − p)) Σ_{i=1}^{n} r_i(F)².
  In general, residuals are "whitened":
  r(M) = Σ(M)^{−1/2} (Y − X β̂(M)).
■ Reject H_0: "R is true" at level α if D > χ²_{1−α,j}.

Model selection: AIC, BIC, stepwise
■ Best subsets regression (leaps): adjusted R², C_p.
■ Akaike Information Criterion (AIC):
  AIC(M) = −2 log L(M) + 2 · p(M),
  where p(M) is the number of parameters in model M.
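The IRLS / Fisher scoring iteration for GLMs fits in a few lines. The sketch below, a numpy-only illustration on simulated data rather than anything from the course, fits a logistic regression: the weights come from the variance function V(µ) = µ(1 − µ) and the working response from the logit link g(µ) = log(µ/(1 − µ)).

```python
import numpy as np

# IRLS (Fisher scoring) for logistic regression on simulated data.
rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([0.5, -1.0, 2.0])
mu_true = 1 / (1 + np.exp(-X @ beta_true))
Y = rng.binomial(1, mu_true)

beta = np.zeros(p)
for _ in range(25):
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    W = mu * (1 - mu)              # variance function V(mu) at the current fit
    z = eta + (Y - mu) / W         # working response from the linearized link
    # Weighted least squares step: solve (X^T W X) beta = X^T W z
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print("IRLS estimate:", beta)
```

Each step is exactly a WLS fit, which is why the same machinery also covers robust regression with weights from ψ = ρ′.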
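For the influence measures, statsmodels exposes the standard quantities directly. A short sketch with simulated data follows; the Bonferroni cutoff shown is one common convention for the outlier test, stated here as an assumption rather than the course's prescription.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import t as t_dist

rng = np.random.default_rng(2)
n = 50
X = sm.add_constant(rng.standard_normal((n, 2)))
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)
res = sm.OLS(Y, X).fit()

infl = res.get_influence()
cooks_d = infl.cooks_distance[0]            # Cook's distance per observation
dffits = infl.dffits[0]                     # DFFITS
dfbetas = infl.dfbetas                      # DFBETAS, one column per coefficient
student = infl.resid_studentized_external   # externally studentized residuals

# Bonferroni outlier test: compare |t_i| with t_{1 - alpha/(2n), n - p - 1}.
alpha, p = 0.05, X.shape[1]
cutoff = t_dist.ppf(1 - alpha / (2 * n), n - p - 1)
print("flagged outliers:", np.where(np.abs(student) > cutoff)[0])
```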
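Ridge regression has the closed form β̂_λ = (XᵀX + λI)⁻¹XᵀY. The numpy sketch below, on simulated data with the intercept omitted for simplicity (in practice the intercept is usually left unpenalized and the columns standardized), shows the coefficient norm shrinking as λ grows, the variance side of the bias-variance tradeoff.

```python
import numpy as np

# Ridge regression: minimize ||Y - X beta||^2 + lambda * ||beta||^2.
rng = np.random.default_rng(5)
n, p = 60, 10
X = rng.standard_normal((n, p))
Y = X[:, 0] * 3.0 + rng.standard_normal(n)

for lam in [0.0, 1.0, 10.0, 100.0]:
    # Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T Y
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
    print(f"lambda = {lam:6.1f}  ||beta|| = {np.linalg.norm(beta):.3f}")
```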
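The nested-model F statistic is easy to compute directly. Below is a numpy/scipy sketch on simulated data: the reduced model R drops j = 2 columns from the full model F, and the statistic is compared with the critical value F_{1−α,j,n−p}.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
n, p, j = 100, 5, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
# True model uses only the first p - j columns, so H_0 (R is true) holds.
Y = X[:, :p - j] @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

def sse(X, Y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return resid @ resid

sse_full = sse(X, Y)
sse_red = sse(X[:, :p - j], Y)
F = ((sse_red - sse_full) / j) / (sse_full / (n - p))
print("F =", F, " critical value:", f_dist.ppf(0.95, j, n - p))
```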
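Finally, a brief statsmodels sketch of model comparison by AIC (BIC is reported alongside), again on simulated data with hypothetical candidate designs; smaller values are preferred.

```python
import numpy as np
import statsmodels.api as sm

# Compare candidate models by AIC = -2 log L + 2 * (number of parameters).
rng = np.random.default_rng(4)
n = 200
x1, x2, x3 = rng.standard_normal((3, n))
Y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.standard_normal(n)  # x3 is irrelevant

candidates = {
    "x1":       np.column_stack([np.ones(n), x1]),
    "x1+x2":    np.column_stack([np.ones(n), x1, x2]),
    "x1+x2+x3": np.column_stack([np.ones(n), x1, x2, x3]),
}
for name, X in candidates.items():
    res = sm.OLS(Y, X).fit()
    print(f"{name:10s} AIC = {res.aic:8.2f}  BIC = {res.bic:8.2f}")
```

With this setup the "x1+x2" model should typically win: adding x3 buys no fit but pays the 2-per-parameter penalty.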