UW-Madison STAT 572 - Model Selection and Multicollinearity

Model Selection and Multicollinearity
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin - Madison
Statistics 572, Spring 2007 (Multiple Linear Regression Case Study)
February 20, 2007

SAT Scores
- This data analysis illustrates model selection and multicollinearity.
- The data set, from 1982, covers all fifty states.
- Variables:
  - sat: state average SAT score (verbal plus quantitative)
  - takers: percentage of eligible students that take the exam
  - income: median family income of test takers (in $100)
  - years: average total high school courses in English, science, history, and mathematics
  - public: percentage of test takers attending public high school
  - expend: average state dollars spent per high school student (in $100)
  - rank: median percentile rank of test takers

The Big Picture
- When there are many possible explanatory variables, several models are often nearly equally good at explaining variation in the response variable.
- R^2 and adjusted R^2 measure closeness of fit but are poor criteria for variable selection.
- AIC and BIC are sometimes used as objective criteria for model selection.
- Stepwise regression searches for best models but does not always find them.
- Models selected by AIC or BIC are often overfit.
- Tests carried out after model selection are typically not valid.
- Parameter interpretation is complex.

Geometric Viewpoint of Regression
- Consider a data set with n individuals, each with a response variable y and k explanatory variables $x_1, \ldots, x_k$, plus an intercept (a column of 1s). This is an n by (k+2) matrix.
- Each row is a point in (k+1)-dimensional space (if we do not plot the intercept).
- We can also think of each column as a vector (a ray from the origin) in n-dimensional space.
- The explanatory variables plus the intercept define a (k+1)-dimensional hyperplane in this space. This is called the column space of X.

Geometric Viewpoint of Regression (continued)
- The response decomposes as $y = \hat{y} + r$, where $r$ is the residual vector.
- In least squares regression, the fitted value $\hat{y}$ is the orthogonal projection of $y$ into the column space of X.
- The residual vector $r$ is orthogonal (perpendicular) to the column space of X.
- Two vectors are orthogonal if their dot product equals zero. The dot product of $w = (w_1, \ldots, w_n)$ and $z = (z_1, \ldots, z_n)$ is $\sum_{i=1}^{n} w_i z_i$.
- $r$ is orthogonal to every explanatory variable, including the intercept. This explains why the sum of the residuals is zero when there is an intercept.
- Understanding least squares regression as projection into a smaller space is helpful for developing intuition about linear models, degrees of freedom, and variable selection.

Model Evaluation: R^2
- The R^2 statistic is a generalization of the square of the correlation coefficient.
- R^2 can be interpreted as the proportion of the variance in y explained by the regression:
  $$R^2 = \frac{SS_{Reg}}{SS_{Tot}} = 1 - \frac{SS_{Err}}{SS_{Tot}}$$
- Every time a new explanatory variable is added to a model, the R^2 increases.

Model Evaluation: Adjusted R^2
- Adjusted R^2 is an attempt to account for additional variables:
  $$\mathrm{adj}\ R^2 = 1 - \frac{MS_{Err}}{MS_{Tot}} = 1 - \frac{SS_{Err}/(n-k-1)}{SS_{Tot}/(n-1)} = 1 - \left(\frac{n-1}{n-k-1}\right)\frac{SS_{Err}}{SS_{Tot}} = 1 - \left(\frac{n-1}{n-k-1}\right)(1 - R^2)$$
- The model with the largest adjusted R^2 is the model with the smallest $\hat{\sigma}^2$ (equivalently, the smallest $MS_{Err}$).
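A minimal R sketch, not from the slides, illustrates the points above on simulated data; the sample size and the variable names x1, x2, x3 are invented for illustration only.

    ## Simulated data: x3 is pure noise, unrelated to y.
    set.seed(1)
    n  <- 50
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
    y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)

    fit  <- lm(y ~ x1 + x2)
    fit3 <- lm(y ~ x1 + x2 + x3)

    ## The residual vector is orthogonal to every column of X, so the
    ## residuals sum to zero when the model has an intercept.
    sum(resid(fit))                            # essentially 0
    crossprod(model.matrix(fit), resid(fit))   # every entry essentially 0

    ## R^2 cannot decrease when a variable is added, even a useless one;
    ## adjusted R^2 can.
    summary(fit)$r.squared;      summary(fit3)$r.squared
    summary(fit)$adj.r.squared;  summary(fit3)$adj.r.squared

    ## Adjusted R^2 recomputed directly from the formula above
    ## (k = 3 explanatory variables in fit3).
    k <- 3
    1 - (1 - summary(fit3)$r.squared) * (n - 1) / (n - k - 1)

Comparing fit and fit3 shows why raw R^2 is a poor criterion for variable selection: adding the noise variable can only push R^2 up, while adjusted R^2 may stay flat or drop.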
Variable Selection: Maximum Likelihood
- The probability of the observable data is represented by a mathematical expression relating parameters and data values. For fixed parameter values, the total probability is one.
- The likelihood is this same expression for the probability of the observed data, but considered as a function of the parameters with the data held fixed.
- The principle of maximum likelihood is to estimate parameters by making the likelihood (the probability of the observed data) as large as possible.
- In regression, the least squares estimates $\hat{\beta}_i$ are also maximum likelihood estimates.
- The likelihood is typically defined only up to a constant.

Variable Selection: AIC
- Akaike's Information Criterion (AIC) is based on maximum likelihood and a penalty for each parameter.
- The general form is
  $$\mathrm{AIC} = -2 \log L + 2p$$
  where L is the likelihood and p is the number of parameters.
- In multiple regression this becomes
  $$\mathrm{AIC} = n \log\left(\frac{RSS}{n}\right) + 2p + C$$
  where RSS is the residual sum of squares and C is a constant.
- In R, the functions AIC and extractAIC define the constant differently. We only care about differences in AIC, so this does not matter so long as we consistently use one function or the other.
- The best model by this criterion minimizes AIC.

Variable Selection: BIC
- Schwarz's Bayesian Information Criterion (BIC) is similar to AIC but penalizes additional parameters more heavily.
- The general form is
  $$\mathrm{BIC} = -2 \log L + (\log n)\, p$$
  where n is the number of observations, L is the likelihood, and p is the number of parameters.
- In multiple regression this becomes
  $$\mathrm{BIC} = n \log\left(\frac{RSS}{n}\right) + (\log n)\, p + C$$
  where RSS is the residual sum of squares and C is a constant.
- In R, the functions AIC and extractAIC also compute BIC when given the extra argument k = log(n), where n is the number of observations.
- The best model by this criterion minimizes BIC.

Computing: Stepwise Regression
- If there are p explanatory variables, we can in principle compute AIC or BIC for every possible combination of variables; there are 2^p such models.
- Instead, we typically begin with one model and attempt to add or remove the single variable that decreases AIC the most, continuing until no single-variable change makes an improvement.
- This process need not find the globally best model.
- It is wise to begin searches from models with both few and many variables to see if they finish in the same place.

Computing: R Code
- The R function step searches for best models according to AIC (or BIC).
- The first argument is a fitted lm model object; this is the starting point of the search.
- An optional second argument provides a formula for the largest possible model to consider.
- Example: form = formula(sat ~ takers + income ...
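To make that description concrete, here is a hedged sketch of how step, extractAIC, and AIC are typically called. It is not the slides' code: the data frame name sat.data, the full formula built from the variables on the SAT Scores slide, and the search settings are assumptions for illustration.

    ## Assumed setup: the SAT data are in a data frame named sat.data with the
    ## variables sat, takers, income, years, public, expend, rank.
    full.form <- formula(sat ~ takers + income + years + public + expend + rank)

    fit.null <- lm(sat ~ 1, data = sat.data)    # intercept-only starting model
    fit.full <- lm(full.form, data = sat.data)  # full starting model

    ## Stepwise search by AIC, starting from the small model ...
    step(fit.null, scope = full.form, direction = "both")
    ## ... and starting from the large model.
    step(fit.full, direction = "both")

    ## The same search by BIC: pass k = log(n) as the per-parameter penalty.
    n <- nrow(sat.data)
    step(fit.null, scope = full.form, direction = "both", k = log(n))

    ## extractAIC() reports the criterion step() uses for a single fit; AIC()
    ## uses a different constant, so compare differences within one function only.
    extractAIC(fit.full)              # AIC, up to a constant
    extractAIC(fit.full, k = log(n))  # BIC, up to a constant
    AIC(fit.full)

Running the search from both the intercept-only and the full model, as the stepwise regression slide recommends, is a quick check on whether the two searches finish at the same model.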

