Statistics 203: Introduction to Regression and Analysis of Variance
Course review
Jonathan Taylor

Today
■ Review / overview of what we learned.

General themes in regression models
■ Specifying regression models.
◆ What is the joint (conditional) distribution of all outcomes given all covariates?
◆ Are outcomes independent (conditional on covariates)? If not, what is an appropriate model?
■ Fitting the models.
◆ Once a model is specified, how are we going to estimate the parameters?
◆ Is there an algorithm or some existing software to fit the model?
■ Comparing regression models.
◆ Inference for coefficients in the model: are some zero (i.e. is a smaller model better)?
◆ What if there are two competing models for the data? Why would one be preferable to the other?
◆ What if there are many models for the data? How do we compare them?

Simple linear regression model
■ Only one covariate.
■ Y_i = β_0 + β_1 X_i + ε_i,  1 ≤ i ≤ n.
■ Errors ε_i are independent N(0, σ²).

Multiple linear regression model
■ Y_i = β_0 + β_1 X_{i1} + ⋯ + β_{p−1} X_{i,p−1} + ε_i,  1 ≤ i ≤ n.
■ Errors ε_i are independent N(0, σ²).
■ β_j's: (partial) regression coefficients.
■ Special cases: polynomial / spline regression models, where extra columns are functions of one covariate.

ANOVA & categorical variables
■ Generalization of two-sample tests.
■ One-way (fixed):
  Y_{ij} = µ + α_i + ε_{ij},  1 ≤ i ≤ r, 1 ≤ j ≤ n.
  The α's are constants to be estimated. Errors ε_{ij} are independent N(0, σ²).
■ Two-way (fixed):
  Y_{ijk} = µ + α_i + β_j + (αβ)_{ij} + ε_{ijk},  1 ≤ i ≤ r, 1 ≤ j ≤ m, 1 ≤ k ≤ n.
  The α's, β's, and (αβ)'s are constants to be estimated. Errors ε_{ijk} are independent N(0, σ²).
■ Experimental design: when balanced layouts are impossible, which is the best design?

Generalized linear models
■ Non-Gaussian errors.
■ Binary outcomes: logistic regression (or probit).
■ Count outcomes: Poisson regression.
■ A "link" and "variance" function determine a GLM.
■ Link:
  g(E(Y_i)) = g(µ_i) = η_i = x_iᵀβ = X_{i0} β_0 + ⋯ + X_{i,p−1} β_{p−1}.
■ Variance function:
  Var(Y_i) = V(µ_i).

Nonlinear regression models
■ Regression function depends on the parameters in a nonlinear fashion.
■ Y_i = f(X_{i1}, …, X_{ip}; θ_1, …, θ_q) + ε_i,  1 ≤ i ≤ n.
■ Errors ε_i are independent N(0, σ²).

Robust regression
■ Suppose that we have additive noise, but not Gaussian. Likelihood of the form
  L(β | Y, X_0, …, X_{p−1}) ∝ exp(−ρ((Y − Σ_{j=0}^{p−1} β_j X_j) / s)).
■ Leads to robust regression: minimize
  Σ_{i=1}^{n} ρ((Y_i − Σ_{j=0}^{p−1} X_{ij} β_j) / s).
■ Can downweight large residuals when the errors have heavier tails than normal random variables.

Random & mixed effects ANOVA
■ When the levels of the categorical variables in an ANOVA are a sample from a population, effects should be treated as random.
■ One-way (random):
  Y_{ij} = µ + α_i + ε_{ij},  1 ≤ i ≤ r, 1 ≤ j ≤ n,
  where the α_i ∼ N(0, σ²_α) are random, independent of the errors ε_{ij}, which are independent N(0, σ²).
■ Introduces correlation in the Y's:
  Cov(Y_{ij}, Y_{i′j′}) = δ_{ii′} σ²_α + δ_{ii′} δ_{jj′} σ².

Mixed linear models
■ Essentially a model of covariance between observations based on "subject" effects.
■ General form:
  Y_{n×1} = X_{n×p} β_{p×1} + Z_{n×q} γ_{q×1} + ε_{n×1},
  where
  ◆ ε ∼ N(0, σ²I);
  ◆ γ ∼ N(0, D) for some covariance D.
■ In this model,
  Y ∼ N(Xβ, ZDZᵀ + σ²I).
■ Covariance is modelled through the "random effect" design matrix Z and the covariance D.

Time series regression models
■ Another model of covariance between observations, based on dependence in time.
■ Y_{n×1} = X_{n×p} β_{p×1} + ε_{n×1}.
■ In these models, ε ∼ N(0, Σ), where the covariance Σ depends on what kind of time series model is used (i.e. which ARMA(p, q) model?).
■ For example, if ε is AR(1) with parameter ρ, then
  Σ_{ij} = σ² ρ^{|i−j|}.
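To make the AR(1) covariance concrete, here is a minimal numpy sketch, not from the course materials: simulated data, a hypothetical intercept-plus-trend design, and the GLS estimator β̂ = (XᵀΣ⁻¹X)⁻¹XᵀΣ⁻¹Y reviewed on the least squares slide below, compared against plain OLS.

```python
import numpy as np

# Time series regression sketch: errors follow an AR(1) process, so
# Sigma_ij = sigma^2 * rho^{|i - j|}; GLS accounts for this correlation.
rng = np.random.default_rng(1)
n, rho, sigma = 100, 0.6, 1.0
idx = np.arange(n)
Sigma = sigma**2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Hypothetical design for illustration: intercept plus a linear trend.
X = np.column_stack([np.ones(n), idx / n])
beta_true = np.array([1.0, 3.0])
eps = np.linalg.cholesky(Sigma) @ rng.standard_normal(n)  # correlated errors
Y = X @ beta_true + eps

# GLS estimate: (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("GLS:", beta_gls)
print("OLS:", beta_ols)
```

Both estimators are unbiased here; the point of GLS is its smaller variance when Σ is correctly specified.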
Functional linear model
■ We talked about a functional two-sample t-test.
■ General form:
  Y_{i,t} = β_{0,t} + Σ_{j=1}^{p−1} X_{ij} β_{j,t} + ε_{i,t},
  where the noise ε_{i,t} is a random function, independent across "observations" (curves) Y_{i,·}.
■ Parameter estimates are curves: leads to nice inference problems for smooth random curves.

Least squares
■ Multiple linear regression – OLS:
  β̂ = (XᵀX)⁻¹ Xᵀ Y.
■ Non-constant variance but independent – WLS:
  β̂ = (XᵀWX)⁻¹ Xᵀ W Y,  W diagonal with W_{ii} = 1/Var(Y_i).
■ General correlation – GLS:
  β̂ = (XᵀΣ⁻¹X)⁻¹ Xᵀ Σ⁻¹ Y,  Cov(Y) = σ²Σ.

Maximum likelihood
■ In the Gaussian setting, with Σ known, least squares is the MLE.
■ In other cases, we needed iterative techniques to solve for the MLE:
◆ nonlinear regression: iterative projections onto the tangent space;
◆ robust regression: IRLS with weights determined by ψ = ρ′;
◆ generalized linear models: IRLS, Fisher scoring;
◆ time series regression models: a two-stage procedure approximates the MLE (can iterate further, though);
◆ mixed models: similar techniques (though we skipped the details).

Diagnostics: influence and outliers
■ Diagnostic plots:
◆ Added variable plots.
◆ QQ plot.
◆ Residuals vs. fitted.
◆ Standardized residuals vs. fitted.
■ Measures of influence:
◆ Cook's distance.
◆ DFFITS.
◆ DFBETAS.
■ Outlier test with Bonferroni correction.
■ Techniques are most developed for the multiple linear regression model, but some can be generalized (using "whitened" residuals).

Penalized regression
■ We looked at ridge regression, too.
■ A generic example of the "bias-variance" tradeoff in statistics.
■ Minimize
  SSE_λ(β) = Σ_{i=1}^{n} (Y_i − Σ_{j=1}^{p−1} X_{ij} β_j)² + λ Σ_{j=1}^{p−1} β_j².
■ Other penalties are possible: the basic idea is that the penalty is a measure of the "complexity" of the model.
■ Smoothing spline: ridge regression for scatterplot smoothers.

Hypothesis tests: multiple linear regression
■ Multiple linear regression: model R has j fewer coefficients than model F – equivalently, there are j linear constraints on the β's.
■ F = [(SSE(R) − SSE(F)) / j] / [SSE(F) / (n − p)] ∼ F_{j,n−p}  (if H_0 is true).
■ Reject H_0: "R is true" at level α if F > F_{1−α,j,n−p}.

Hypothesis tests: general case
■ Other models: DEV(M) = −2 log L(M) replaces SSE(M).
■ Difference D = DEV(R) − DEV(F) ∼ χ²_j (asymptotically).
■ The denominator in the F statistic is usually either known or based on something like Pearson's X²:
  φ̂ = (1/(n − p)) Σ_{i=1}^{n} r_i(F)².
  In general, residuals are "whitened":
  r(M) = Σ(M)^{−1/2} (Y − X β̂(M)).
■ Reject H_0: "R is true" at level α if D > χ²_{1−α,j}.

Model selection: AIC, BIC, stepwise
■ Best subsets regression (leaps): adjusted R², C_p.
■ Akaike Information Criterion (AIC):
  AIC(M) = −2 log L(M) + 2 · p(M),
  where p(M) is the number of parameters in model M.
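The IRLS / Fisher scoring iteration for GLMs fits in a few lines. The sketch below, a numpy-only illustration on simulated data rather than anything from the course, fits a logistic regression: the weights come from the variance function V(µ) = µ(1 − µ) and the working response from the logit link g(µ) = log(µ/(1 − µ)).

```python
import numpy as np

# IRLS (Fisher scoring) for logistic regression on simulated data.
rng = np.random.default_rng(0)
n, p = 200, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
beta_true = np.array([0.5, -1.0, 2.0])
mu_true = 1 / (1 + np.exp(-X @ beta_true))
Y = rng.binomial(1, mu_true)

beta = np.zeros(p)
for _ in range(25):
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    W = mu * (1 - mu)              # variance function V(mu) at the current fit
    z = eta + (Y - mu) / W         # working response from the linearized link
    # Weighted least squares step: solve (X^T W X) beta = X^T W z
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new

print("IRLS estimate:", beta)
```

Each step is exactly a WLS fit, which is why the same machinery also covers robust regression with weights from ψ = ρ′.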
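For the influence measures, statsmodels exposes the standard quantities directly. A short sketch with simulated data follows; the Bonferroni cutoff shown is one common convention for the outlier test, stated here as an assumption rather than the course's prescription.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import t as t_dist

rng = np.random.default_rng(2)
n = 50
X = sm.add_constant(rng.standard_normal((n, 2)))
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)
res = sm.OLS(Y, X).fit()

infl = res.get_influence()
cooks_d = infl.cooks_distance[0]            # Cook's distance per observation
dffits = infl.dffits[0]                     # DFFITS
dfbetas = infl.dfbetas                      # DFBETAS, one column per coefficient
student = infl.resid_studentized_external   # externally studentized residuals

# Bonferroni outlier test: compare |t_i| with t_{1 - alpha/(2n), n - p - 1}.
alpha, p = 0.05, X.shape[1]
cutoff = t_dist.ppf(1 - alpha / (2 * n), n - p - 1)
print("flagged outliers:", np.where(np.abs(student) > cutoff)[0])
```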
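Ridge regression has the closed form β̂_λ = (XᵀX + λI)⁻¹XᵀY. The numpy sketch below, on simulated data with the intercept omitted for simplicity (in practice the intercept is usually left unpenalized and the columns standardized), shows the coefficient norm shrinking as λ grows, the variance side of the bias-variance tradeoff.

```python
import numpy as np

# Ridge regression: minimize ||Y - X beta||^2 + lambda * ||beta||^2.
rng = np.random.default_rng(5)
n, p = 60, 10
X = rng.standard_normal((n, p))
Y = X[:, 0] * 3.0 + rng.standard_normal(n)

for lam in [0.0, 1.0, 10.0, 100.0]:
    # Closed-form ridge solution: (X^T X + lambda I)^{-1} X^T Y
    beta = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)
    print(f"lambda = {lam:6.1f}  ||beta|| = {np.linalg.norm(beta):.3f}")
```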
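The nested-model F statistic is easy to compute directly. Below is a numpy/scipy sketch on simulated data: the reduced model R drops j = 2 columns from the full model F, and the statistic is compared with the critical value F_{1−α,j,n−p}.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
n, p, j = 100, 5, 2
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])
# True model uses only the first p - j columns, so H_0 (R is true) holds.
Y = X[:, :p - j] @ np.array([1.0, 2.0, -1.0]) + rng.standard_normal(n)

def sse(X, Y):
    """Residual sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return resid @ resid

sse_full = sse(X, Y)
sse_red = sse(X[:, :p - j], Y)
F = ((sse_red - sse_full) / j) / (sse_full / (n - p))
print("F =", F, " critical value:", f_dist.ppf(0.95, j, n - p))
```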
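Finally, a brief statsmodels sketch of model comparison by AIC (BIC is reported alongside), again on simulated data with hypothetical candidate designs; smaller values are preferred.

```python
import numpy as np
import statsmodels.api as sm

# Compare candidate models by AIC = -2 log L + 2 * (number of parameters).
rng = np.random.default_rng(4)
n = 200
x1, x2, x3 = rng.standard_normal((3, n))
Y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.standard_normal(n)  # x3 is irrelevant

candidates = {
    "x1":       np.column_stack([np.ones(n), x1]),
    "x1+x2":    np.column_stack([np.ones(n), x1, x2]),
    "x1+x2+x3": np.column_stack([np.ones(n), x1, x2, x3]),
}
for name, X in candidates.items():
    res = sm.OLS(Y, X).fit()
    print(f"{name:10s} AIC = {res.aic:8.2f}  BIC = {res.bic:8.2f}")
```

With this setup the "x1+x2" model should typically win: adding x3 buys no fit but pays the 2-per-parameter penalty.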