UW-Madison STAT 572 - Model Selection and Multicollinearity

Model Selection and Multicollinearity
Bret Larget
Departments of Botany and of Statistics, University of Wisconsin-Madison
Statistics 572 (Spring 2007), Multiple Linear Regression
February 20, 2007

Case Study: SAT Scores

The SAT scores data analysis illustrates model selection and multicollinearity. The data set is from 1982 and covers all fifty states. Variables:

- sat: state average SAT score (verbal plus quantitative)
- takers: percentage of eligible students that take the exam
- income: median family income of test takers (in $100s)
- years: average total high school courses in English, science, history, and mathematics
- public: percentage of test takers attending public high school
- expend: average state dollars spent per high school student (in $100s)
- rank: median percentile rank of test takers

The Big Picture

- When there are many possible explanatory variables, several models are often nearly equally good at explaining variation in the response variable.
- R^2 and adjusted R^2 measure closeness of fit, but they are poor criteria for variable selection.
- AIC and BIC are sometimes used as objective criteria for model selection.
- Stepwise regression searches for the best models, but does not always find them.
- Models selected by AIC or BIC are often overfit.
- Tests carried out after model selection are typically not valid.
- Parameter interpretation is complex.

Geometric Viewpoint of Regression

- Consider a data set with n individuals, each with a response variable y and k explanatory variables x1, ..., xk, plus an intercept column of 1s. This is an n x (k + 2) matrix.
- Each row is a point in (k + 1)-dimensional space if we do not plot the intercept.
- We can also think of each column as a vector (a ray from the origin) in n-dimensional space.
- The explanatory variables plus the intercept define a (k + 1)-dimensional hyperplane in this space. This is called the column space of X.

Geometry (cont.)

- The response vector decomposes as y = yhat + r, where r is the residual vector.
- In least squares regression, the fitted value yhat is the orthogonal projection of y into the column space of X.
- The residual vector r is orthogonal (perpendicular) to the column space of X.
- Two vectors are orthogonal if their dot product equals zero. The dot product of w = (w1, ..., wn) and z = (z1, ..., zn) is sum_{i=1}^{n} wi * zi.
- r is orthogonal to every explanatory variable, including the intercept, which explains why the sum of the residuals is zero when there is an intercept.
- Understanding least squares regression as projection into a smaller space is helpful for developing intuition about linear models, degrees of freedom, and variable selection.

Model Evaluation: R^2

- The R^2 statistic is a generalization of the square of the correlation coefficient.
- R^2 can be interpreted as the proportion of the variance in y explained by the regression:
    R^2 = SSReg / SSTot = 1 - SSErr / SSTot
- Every time a new explanatory variable is added to a model, R^2 increases.

Adjusted R^2

- Adjusted R^2 is an attempt to account for additional variables:
    adj R^2 = 1 - MSErr / MSTot
            = 1 - (SSErr / (n - k - 1)) / (SSTot / (n - 1))
            = 1 - ((n - 1) / (n - k - 1)) * (SSErr / SSTot)
            = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
- The model with the best adjusted R^2 has the smallest estimated error variance (MSErr).
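The following minimal R sketch (not part of the original slides) illustrates the projection view of least squares and the R^2 formulas above. Because the 1982 SAT data set itself is not included here, the built-in mtcars data and the chosen predictors stand in purely for illustration.

# A sketch, assuming mtcars as stand-in data; the model is arbitrary.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

X <- model.matrix(fit)   # n x (k + 1) design matrix: intercept plus k predictors
y <- mtcars$mpg
r <- resid(fit)

# The residual vector is orthogonal to every column of X (dot products are
# numerically zero); in particular sum(r) is zero because of the intercept.
round(drop(crossprod(X, r)), 10)
sum(r)

# R^2 and adjusted R^2 computed from sums of squares, matching summary(fit)
n <- nrow(X); k <- ncol(X) - 1
SSErr <- sum(r^2)
SSTot <- sum((y - mean(y))^2)
R2    <- 1 - SSErr / SSTot
adjR2 <- 1 - (1 - R2) * (n - 1) / (n - k - 1)
c(R2 = R2, adjR2 = adjR2)
summary(fit)$r.squared
summary(fit)$adj.r.squared

The crossprod() line returns values that are zero up to rounding error, and the hand-computed R^2 and adjusted R^2 agree with the values reported by summary(fit).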
Maximum Likelihood

- The probability of observable data is represented by a mathematical expression relating parameters and data values. For fixed parameter values, the total probability is one.
- The likelihood is the same expression for the probability of the observed data, but considered as a function of the parameters with the data held fixed.
- The principle of maximum likelihood is to estimate parameters by making the likelihood (the probability of the observed data) as large as possible.
- In regression, the least squares estimates of the coefficients are also the maximum likelihood estimates.
- The likelihood is typically only defined up to a constant.

Variable Selection: AIC

- Akaike's Information Criterion (AIC) is based on maximum likelihood and a penalty for each parameter. The general form is
    AIC = -2 log L + 2p,
  where L is the likelihood and p is the number of parameters.
- In multiple regression this becomes
    AIC = n log(RSS / n) + 2p + C,
  where RSS is the residual sum of squares and C is a constant.
- In R, the functions AIC and extractAIC define the constant differently. We only care about differences in AIC, so this does not matter as long as we consistently use one or the other.
- The best model by this criterion minimizes AIC.

Variable Selection: BIC

- Schwarz's Bayesian Information Criterion (BIC) is similar to AIC, but it penalizes additional parameters more. The general form is
    BIC = -2 log L + (log n) p,
  where n is the number of observations, L is the likelihood, and p is the number of parameters.
- In multiple regression this becomes
    BIC = n log(RSS / n) + (log n) p + C,
  where RSS is the residual sum of squares and C is a constant.
- In R, the functions AIC and extractAIC also compute BIC when given the extra argument k = log(n), where n is the number of observations.
- The best model by this criterion minimizes BIC.

Stepwise Regression

- If there are p explanatory variables, we can in principle compute AIC or BIC for every possible combination of variables. There are 2^p such models.
- Instead, we typically begin with a model and attempt to add or remove the variable that decreases AIC the most, continuing until no single-variable change makes an improvement.
- This process need not find the globally best model.
- It is wise to begin searches from models with both few and many variables to see if they finish in the same place.

Computing: R Code

- The R function step searches for the best models according to AIC or BIC.
- The first argument is a fitted lm model object; this is the starting point of the search.
- An optional second argument provides a formula for the largest possible model to consider.
- Example (cut off at the end of this preview page):
    form = formula(sat ~ takers + income
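To make the AIC and BIC formulas above concrete, here is a small R sketch (again not from the slides, with mtcars standing in for the SAT data and two arbitrary candidate models). It computes n log(RSS/n) + k p directly and compares it with extractAIC() and AIC(): the constants differ, but differences between models agree, which is all that matters for model selection.

# A sketch, assuming mtcars as stand-in data; the two models are arbitrary.
fit1 <- lm(mpg ~ wt + hp, data = mtcars)
fit2 <- lm(mpg ~ wt + hp + qsec, data = mtcars)
n <- nrow(mtcars)

# n * log(RSS / n) + k * p, with k = 2 for AIC and k = log(n) for BIC
ic_by_hand <- function(fit, k = 2) {
  rss <- sum(resid(fit)^2)
  p   <- length(coef(fit))   # number of estimated mean parameters
  n * log(rss / n) + k * p
}

# For lm fits, extractAIC() uses this same form (here the constant C is zero)
ic_by_hand(fit1); extractAIC(fit1)[2]
ic_by_hand(fit2); extractAIC(fit2)[2]

# AIC() includes extra constants, but differences between models still agree
AIC(fit2) - AIC(fit1)
ic_by_hand(fit2) - ic_by_hand(fit1)

# BIC versions: penalty of log(n) per parameter
ic_by_hand(fit1, k = log(n)); extractAIC(fit1, k = log(n))[2]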

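And here is a sketch of the stepwise search described on the Computing slide, started from both a small and a large model as the slides recommend. The scope formula, the predictors, and the mtcars stand-in are assumptions for illustration only; with the SAT data one would start from an intercept-only fit of sat and from a fit containing all of the explanatory variables.

# A sketch, assuming mtcars as stand-in data; scope and predictors are arbitrary.
small <- lm(mpg ~ 1, data = mtcars)                    # intercept-only starting model
big   <- lm(mpg ~ wt + hp + qsec + drat + gear, data = mtcars)

scope <- formula(mpg ~ wt + hp + qsec + drat + gear)   # largest model to consider

# Stepwise search by AIC from each starting point (trace = 0 suppresses the log)
up   <- step(small, scope = scope, direction = "both", trace = 0)
down <- step(big, direction = "both", trace = 0)

# Compare where the two searches finish; they need not agree
formula(up)
formula(down)

# For a BIC-based search, pass k = log(n)
n <- nrow(mtcars)
up_bic <- step(small, scope = scope, direction = "both", k = log(n), trace = 0)
formula(up_bic)

If the two searches finish at different models, the greedy search has found different local optima, which is exactly the situation the Stepwise Regression slide warns about.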
