# CORNELL ECON 3120 - Proxy Variables

Type: Lecture Note
Pages: 2

Econ 3120 1st Edition Lecture 19

Outline of Current Lecture
I. Generalized Least Squares and Feasible Generalized Least Squares

Current Lecture
II. Proxy Variables

Proxy Variables

As we have seen, excluding key explanatory variables from a regression can bias the coefficient estimates. Consider the regression

$$\log(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{abil} + u \tag{1}$$

If we run a regression that omits ability, the composite error term $(\beta_3 \mathit{abil} + u)$ could be correlated with $\mathit{educ}$, leading to biased estimates of $\beta_1$. What we have done at various points in this class is use a proxy variable for $\mathit{abil}$. In this case, let's use IQ and estimate

$$\log(\mathit{wage}) = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3 \mathit{IQ} + e \tag{2}$$

How do we relate equations (1) and (2)? Consider the "auxiliary" equation that relates IQ and ability:

$$\mathit{abil} = \delta_0 + \delta_1 \mathit{IQ} + \nu \tag{3}$$

Substituting (3) into (1) yields

$$\begin{aligned}
\log(\mathit{wage}) &= \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3(\delta_0 + \delta_1 \mathit{IQ} + \nu) + u \\
&= (\beta_0 + \beta_3\delta_0) + \beta_1 \mathit{educ} + \beta_2 \mathit{exper} + \beta_3\delta_1 \mathit{IQ} + \beta_3\nu + u
\end{aligned}$$

Comparing this with (2), the intercept being estimated is $\beta_0 + \beta_3\delta_0$ and the coefficient on IQ is $\beta_3\delta_1$, while the coefficients on $\mathit{educ}$ and $\mathit{exper}$ are unchanged. From this equation, we can see that the key assumption is $E(\beta_3\nu + u \mid \mathit{educ}, \mathit{exper}, \mathit{IQ}) = 0$. In particular, this means that the unobserved component of ability, $\nu$, cannot be related to education or experience. In other words, after IQ is controlled for, the remaining variation in ability is uncorrelated with the $x$'s.

A natural question is therefore when we can expect $\nu$ to be uncorrelated with education or experience. This is difficult, if not impossible, to prove. The argument against IQ as a valid proxy variable is that IQ isn't the only component of ability that leads to more schooling. For example, a drive to succeed may not be related to IQ but could lead to higher education and earnings.

1.1 Using a Lagged Dependent Variable as a Proxy Variable

If we have a panel of data that spans multiple periods, we can sometimes use the lagged (or last-period) value of the dependent variable as a proxy variable.
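Returning briefly to the IQ example, the proxy argument in equations (1) to (3) can be checked with a small simulation. The sketch below is only an illustration: the data-generating process, every parameter value, and the helper `ols` are assumptions made up for this example, not part of the lecture. Education is constructed to depend on ability only through IQ, so $\nu$ is unrelated to the regressors and the proxy assumption holds by design.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) via least squares."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data-generating process; all numbers are illustrative.
iq = rng.normal(100, 15, n)
nu = rng.normal(0, 1, n)                 # part of ability not captured by IQ
abil = -10 + 0.1 * iq + nu               # auxiliary equation (3): delta0 = -10, delta1 = 0.1
educ = 12 + 0.05 * (iq - 100) + rng.normal(0, 2, n)  # related to ability only via IQ
exper = rng.normal(10, 3, n)
u = rng.normal(0, 0.3, n)

beta0, beta1, beta2, beta3 = 1.0, 0.08, 0.02, 0.10
log_wage = beta0 + beta1 * educ + beta2 * exper + beta3 * abil + u  # equation (1)

b_omit = ols(log_wage, educ, exper)       # ability left in the error term
b_proxy = ols(log_wage, educ, exper, iq)  # equation (2): IQ used as a proxy

print("true beta1:            ", beta1)
print("educ coef, abil omitted:", b_omit[1])
print("educ coef, IQ as proxy: ", b_proxy[1])
print("IQ coef (beta3*delta1): ", b_proxy[3])
```

Under these assumed values, the regression that omits ability should overstate the return to education (about 0.10 instead of 0.08), while the proxy regression should recover $\beta_1$ and report an IQ coefficient near $\beta_3\delta_1 = 0.01$. If `educ` were instead made to depend on `nu`, the proxy regression would be biased as well, which is the concern raised about IQ above.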
Suppose we are interested in measuring the impact of a remedial education program on child learning, and we have child test scores before and after the program. We run the model

$$\mathit{score}_{i,t} = \beta_0 + \beta_1 \mathit{program}_i + \beta_2 \mathit{score}_{i,t-1} + u$$

where $\mathit{program}_i$ indicates whether the child was in the remedial education program. Here, the lagged test score is meant to serve as a proxy variable for schooling inputs up to time $t$, so that the effect of the program is not attributed to those inputs. As before, the lagged score must absorb all unobserved factors related to program placement that also affect test scores at time $t$. If, for example, students are placed in the program based on their potential to improve, this may not be entirely reflected in $\mathit{score}_{i,t-1}$.

These notes represent a detailed interpretation of the professor's lecture. GradeBuddy is best used as a supplement to your own notes, not as a substitute.

2 Measurement Error

2.1 Dependent Variable

Suppose we are interested in running the following regression model:

$$y^* = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \tag{4}$$

where $y^*$ represents the true value of the underlying variable and $y$ represents what you observe. Measurement error is given by

$$e_0 = y - y^*$$

where $e_0$ represents mean-zero "noise", uncorrelated with $y^*$ or $u$. Thus, you run the regression

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u + e_0 \tag{5}$$

When will estimates using (5) instead of (4) give consistent estimators? This happens as long as $e_0$ is not correlated with the $x$'s, that is, if $E(e_0 \mid x_1, \dots, x_k) = 0$. Although the estimators under (5) will still be consistent, the standard errors will increase. This occurs because the composite error term is now $u + e_0$. Assuming that $u$ and $e_0$ are uncorrelated, $\mathrm{Var}(u + e_0) = \sigma_u^2 + \sigma^2$ ...
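The claim in this section, that noise in the dependent variable leaves the OLS slope consistent while inflating its standard error, can be illustrated with a short simulation. This is a minimal sketch under assumed values: the single-regressor model, the variances, and the helper `ols_with_se` are all hypothetical choices for illustration, and the standard errors use the usual homoskedastic formula.

```python
import numpy as np

def ols_with_se(y, x):
    """OLS of y on a constant and x; returns (coefficients, standard errors)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])  # error variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)           # homoskedastic covariance matrix
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical values: true slope 2, Var(u) = 1, Var(e0) = 4.
x = rng.normal(0, 1, n)
u = rng.normal(0, 1, n)
e0 = rng.normal(0, 2, n)       # measurement error, independent of x and u

y_star = 1.0 + 2.0 * x + u     # equation (4): true dependent variable
y_obs = y_star + e0            # what is actually observed

coef_true, se_true = ols_with_se(y_star, x)
coef_obs, se_obs = ols_with_se(y_obs, x)    # equation (5)

print("slope using y*:", coef_true[1], "se:", se_true[1])
print("slope using y: ", coef_obs[1], "se:", se_obs[1])
```

Both slopes should be close to the assumed true value of 2, but the standard error from the regression on the mismeasured `y_obs` should be noticeably larger, because the variance of the composite error $u + e_0$ exceeds the variance of $u$ alone.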
