Econ 3120 1st Edition, Lecture 12

Outline of Current Lecture
I. Goodness of Fit
II. Unbiasedness

Current Lecture
III. Motivation

1 Motivation

Multiple regression allows us to account for more than one factor in explaining our dependent variable y. Consider the familiar example of the relationship between schooling and wages, and suppose we also have data on the SAT score of the individual while she was in high school. We might be interested in estimating a relationship of the form

\log(wage) = \beta_0 + \beta_1 educ + \beta_2 SAT + u

Or, to take a simple model from macroeconomics, suppose we want to estimate the determinants of a country's growth rate. We may model the growth rate of a country from 1980 to 2000 as a function of per capita income in 1980 and income inequality as measured by the Gini coefficient:

growthrate = \beta_0 + \beta_1 inc80 + \beta_2 Gini + u

A multivariate model with two independent variables x_1 and x_2 takes the form

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

In this case \beta_1 represents the change in y for a one-unit change in x_1, holding all other factors (x_2 and u) fixed. This is the partial derivative of y with respect to x_1, holding x_2 and u fixed.

Our x's don't have to be separate variables; they can actually be functions of the same variable. For example, suppose we are studying the relationship between household consumption and income, and we model the relationship as follows:

cons = \beta_0 + \beta_1 inc + \beta_2 inc^2 + u

In this case the effect of income on consumption depends on both \beta_1 and \beta_2:

\frac{\Delta cons}{\Delta inc} (holding u constant) = \beta_1 + 2\beta_2 inc

The general form of the multivariate model with k independent variables is

y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + u_i     (1)

Note that I use the notation x_{ji} for observation i and variable x_j, while Wooldridge uses x_{ij}.

Analogous to the bivariate model, the key assumption is the independence of the error term and the regressors (independent variables):

E(u | x_1, x_2, \dots, x_k) = 0

This implies that u must be independent of, and uncorrelated with, all of the explanatory variables x_j. If u is correlated with any of these variables, the assumption does not hold and our estimates will be biased (more on this later on).

2 Estimating Multivariate Regression Parameters

Estimation of the \beta's in a multivariate model follows a similar procedure to bivariate estimation. We start with the independence assumption

E(u | x_1, x_2, \dots, x_k) = 0 \implies Cov(x_j, u) = 0

and impose the sample analog on our estimates using equation (1). This implies

\frac{1}{n} \sum_i x_{1i} (y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \dots - \hat\beta_k x_{ki}) = 0
\frac{1}{n} \sum_i x_{2i} (y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \dots - \hat\beta_k x_{ki}) = 0
\vdots
\frac{1}{n} \sum_i x_{ki} (y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \dots - \hat\beta_k x_{ki}) = 0

Note that these equations are the same as the first-order conditions from the minimization of the sum of squared residuals:

\min_{\hat\beta_0, \dots, \hat\beta_k} \sum_i \hat{u}_i^2 = \min_{\hat\beta_0, \dots, \hat\beta_k} \sum_i (y_i - \hat\beta_0 - \hat\beta_1 x_{1i} - \dots - \hat\beta_k x_{ki})^2

The actual computation of these estimates is very involved and is best left to Stata.

3 Fitted Values, Residuals, and Goodness of Fit

3.1 Fitted Values and Residuals

OLS fitted values and residuals are constructed just as they were in the bivariate case:

\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \dots + \hat\beta_k x_{ki}
\hat{u}_i = y_i - \hat{y}_i

The algebraic properties of residuals and fitted values that we defined in the bivariate case also hold in the multivariate case:

1. The sum of the OLS residuals is zero: \sum_i \hat{u}_i = 0.
2. The sample covariance between each x_j and the estimated residuals is zero: \frac{1}{n-1} \sum_i (x_{ji} - \bar{x}_j)(\hat{u}_i - \bar{\hat{u}}) = 0, which implies \sum_i x_{ji} \hat{u}_i = 0.
3. The sample covariance between the fitted values and the estimated residuals is zero: \frac{1}{n-1} \sum_i (\hat{y}_i - \bar{\hat{y}})(\hat{u}_i - \bar{\hat{u}}) = 0, which implies \sum_i \hat{y}_i \hat{u}_i = 0.

3.2 Goodness of Fit

We construct R^2, our goodness-of-fit measure, just as before:

total sum of squares: SST = \sum_i (y_i - \bar{y})^2
explained sum of squares: SSE = \sum_i (\hat{y}_i - \bar{y})^2
residual sum of squares: SSR = \sum_i \hat{u}_i^2

R^2 = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}

The R^2 represents the proportion of the variation in y that is explained by the variation in the x's. One important feature of R^2 in the multivariate case is that it never decreases, and typically increases, when additional variables are added to the regression. This follows because SST stays the same, while including additional variables can only decrease the SSR, even if just by a little. Thus we cannot simply use an increase in R^2 as evidence that an additional regressor belongs in the model.[1]

[1] Note that there are adjustments made to R^2 to account for this, such as the adjusted R^2 reported by Stata. Normally, however, people just report the standard R^2, so it is important to keep the above caveat in mind.
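To make the mechanics above concrete, here is a minimal sketch in Python (not part of the original lecture; the simulated data and variable names are purely illustrative) that estimates a two-regressor model by OLS, then checks the algebraic properties of the residuals and both expressions for R^2.

import numpy as np

# Minimal illustration (not from the lecture): simulate data for
# y = b0 + b1*x1 + b2*x2 + u and estimate the betas by OLS.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # x1 and x2 are correlated
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u

X = np.column_stack([np.ones(n), x1, x2])   # regressor matrix with a constant
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat                        # fitted values
u_hat = y - y_hat                           # residuals

# Algebraic properties: residuals sum to zero and are uncorrelated with each x_j
print(u_hat.sum())                          # approximately 0
print(x1 @ u_hat, x2 @ u_hat)               # approximately 0, 0

# Goodness of fit: R^2 = SSE/SST = 1 - SSR/SST
sst = ((y - y.mean()) ** 2).sum()
sse = ((y_hat - y.mean()) ** 2).sum()
ssr = (u_hat ** 2).sum()
print(sse / sst, 1 - ssr / sst)             # the two expressions agree

Running the same specification in Stata with "regress y x1 x2" would report the same coefficients and R^2.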
4 Regression Anatomy

Here is another way to think about multiple regression coefficients. As an aside, note that we can write bivariate regression parameters as functions of variances and covariances. Consider the model

y = \beta_0 + \beta_1 x + u

Taking the covariance of both sides with x and dividing by Var(x) yields

Cov(y, x) = Cov(\beta_0 + \beta_1 x + u, x) = Cov(\beta_1 x, x) = \beta_1 Var(x) \implies \beta_1 = \frac{Cov(y, x)}{Var(x)}

The estimate \hat\beta_1 is just the sample analog of the right-hand side:

\hat\beta_1 = \frac{\widehat{Cov}(y, x)}{\widehat{Var}(x)}

Now back to multiple regression. Consider a model with two regressors:

y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u

Now take the following auxiliary regressions of the x's on each other:

x_1 = \delta_{10} + \delta_2 x_2 + \tilde{x}_1     (2)
x_2 = \delta_{20} + \delta_1 x_1 + \tilde{x}_2

In this case \tilde{x}_1 and \tilde{x}_2 are the error terms. It follows that \beta_1 and \beta_2 are actually the result of bivariate regressions of y on \tilde{x}_1 and \tilde{x}_2:

\beta_1 = \frac{Cov(y, \tilde{x}_1)}{Var(\tilde{x}_1)}
\beta_2 = \frac{Cov(y, \tilde{x}_2)}{Var(\tilde{x}_2)}

The proof is straightforward. Let's take a step back and think about what this means. \tilde{x}_1 represents the variation in x_1 that is left over after accounting for x_2. Some people call this "partialling out" x_2 from x_1. So \beta_1 is the result of the regression of y on x_1 after x_1 has been purged of variation relating to x_2. This is another way to think about what it means to say that we are "controlling for" x_2 in the regression.

We can extend this to models with many regressors by partialling out all of the other x's from a given x_j and then running the bivariate regression of y on \tilde{x}_j. Suppose we have the general model

y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u

To find \beta_j, we do the following:

1. Regress x_j on all other x's. Gather the residuals \tilde{x}_j.
2. Run the resulting bivariate regression of y on \tilde{x}_j.
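Below is a minimal sketch of the partialling-out steps (again in Python, with simulated data and illustrative names, not from the lecture): it runs the full regression, regresses x_1 on x_2, keeps the residuals, and confirms that Cov(y, \tilde{x}_1)/Var(\tilde{x}_1) reproduces the multivariate coefficient on x_1.

import numpy as np

# Minimal illustration of "regression anatomy" / partialling out:
# the coefficient on x1 from the full regression equals the slope from a
# bivariate regression of y on x1_tilde, where x1_tilde is x1 purged of
# the variation related to x2.
rng = np.random.default_rng(1)
n = 1000
x2 = rng.normal(size=n)
x1 = 0.8 * x2 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full regression of y on (1, x1, x2)
X = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 1: regress x1 on x2 (with a constant) and keep the residuals x1_tilde
Z = np.column_stack([np.ones(n), x2])
delta_hat, *_ = np.linalg.lstsq(Z, x1, rcond=None)
x1_tilde = x1 - Z @ delta_hat

# Step 2: bivariate slope of y on x1_tilde = Cov(y, x1_tilde) / Var(x1_tilde)
beta1_anatomy = np.cov(y, x1_tilde)[0, 1] / np.var(x1_tilde, ddof=1)

print(beta_full[1], beta1_anatomy)          # the two estimates coincide

This partialling-out result is the logic behind what is often called the Frisch-Waugh-Lovell theorem, though the lecture does not use that name.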