Proxy Variables

Lecture Note
Cornell University
Econ 3120 - Applied Econometrics

Lecture 19
Outline of Current Lecture
I. Generalized Least Squares and Feasible Generalized Least Squares
Current Lecture
II. Proxy Variables

Proxy Variables

As we have seen, excluding key explanatory variables from a regression can bias the coefficient estimates. Consider the regression:

$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 abil + u \tag{1}$$

If we run a regression that omits ability, then the composite error term $(\beta_3 abil + u)$ could be correlated with $educ$, leading to biased estimates of $\beta_1$. What we have done at various points in this class is use a proxy variable for $abil$. In this case, let's use $IQ$ and estimate

$$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 IQ + e \tag{2}$$

How do we relate equations (1) and (2)? Consider the "auxiliary" equation that relates IQ and ability:

$$abil = \delta_0 + \delta_1 IQ + \nu \tag{3}$$

Substituting (3) into (1) yields:

$$\begin{aligned}
\log(wage) &= \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3(\delta_0 + \delta_1 IQ + \nu) + u \\
&= (\beta_0 + \beta_3\delta_0) + \beta_1 educ + \beta_2 exper + \beta_3\delta_1 IQ + (\beta_3\nu + u)
\end{aligned}$$

From this equation, we can see that the key assumption is that $E(\beta_3\nu + u \mid educ, exper, IQ) = 0$. In particular, this means that the unobserved component of ability, $\nu$, cannot be related to education or experience. In other words, after IQ is controlled for, the remaining variation in ability is uncorrelated with the $x$'s. Note also that the coefficient on $IQ$ actually estimates $\beta_3\delta_1$, and the intercept absorbs $\beta_3\delta_0$. This is fine for our purposes, since the goal is to estimate $\beta_1$, not the effect of ability itself.

A natural question is therefore when we can expect $\nu$ to be uncorrelated with education or experience. This is difficult, if not impossible, to prove. The argument against IQ as a valid proxy variable is that IQ isn't the only component of ability that leads to more schooling. For example, a drive to succeed may not be related to IQ but could lead to both more education and higher earnings.

1.1 Using a Lagged Dependent Variable as a Proxy Variable

If we have a panel of data that spans multiple periods, we can sometimes use the lagged (last-period) value of the dependent variable as a proxy variable. Suppose we are interested in measuring the impact of a remedial education program on child learning, and we have child test scores before and after the program.
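
A minimal simulation of this setting, built on a made-up data-generating process, can show why a prior score is useful as a proxy and where it can fail. Everything in it is an illustrative assumption: the hypothetical helper names `simulate` and `ols`, the coefficients (a true program effect of 0.5 and score persistence of 0.8), and the placement rule, under which children with low prior scores are more likely to enter the remedial program. In a second scenario, placement also depends on an unobserved "potential to improve" that raises later scores. The sketch compares a naive regression of the current score on program participation alone with a regression that also controls for the lagged score, which is the specification written out next.

```python
import numpy as np


def ols(y, *regressors):
    """OLS with an intercept; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate(n=200_000, effect=0.5, placement_on_potential=False, seed=0):
    """Hypothetical data-generating process for a remedial program.

    All parameters are illustrative assumptions, not estimates.
    """
    rng = np.random.default_rng(seed)
    score_prev = rng.normal(size=n)  # score_{i,t-1}, standard normal
    potential = rng.normal(size=n)   # unobserved potential to improve
    # Remedial placement: lower prior scores make placement more likely;
    # optionally, higher (unobserved) potential does too.
    index = -score_prev + (potential if placement_on_potential else 0.0)
    program = (index + rng.normal(size=n) > 0).astype(float)
    # Current score: persistence + true program effect + potential + noise.
    score_now = (0.8 * score_prev + effect * program + 0.3 * potential
                 + rng.normal(scale=0.5, size=n))
    return score_now, program, score_prev


for flag in (False, True):
    y, prog, lag = simulate(placement_on_potential=flag)
    naive = ols(y, prog)[1]          # program only
    with_lag = ols(y, prog, lag)[1]  # program + lagged score as proxy
    print(f"placement on potential={flag}: "
          f"naive={naive:.3f}, with lagged score={with_lag:.3f}")
```

In the first scenario, controlling for the lagged score should bring the estimate close to the assumed effect of 0.5, while the naive comparison is pulled downward because low scorers were selected into the program. In the second, the lagged score cannot absorb the unobserved potential, so the estimate stays biased upward even with the control.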
We run the model

$$score_{i,t} = \beta_0 + \beta_1 program_i + \beta_2 score_{i,t-1} + u$$

where $program_i$ indicates whether child $i$ was in the remedial education program. Here, the lagged test score is meant to serve as a proxy variable for schooling inputs up to time $t$, so that the effect of the program is not attributed to those inputs. As before, the lagged score must absorb all unobserved factors related to program placement that also affect test scores at time $t$. If, for example, students are placed in the program based on their potential to improve, then this potential may not be fully reflected in $score_{i,t-1}$, and the estimate of $\beta_1$ would be biased.

2 Measurement Error

2.1 Dependent Variable

Suppose we are interested in running the following regression model:

$$y^* = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \tag{4}$$

where $y^*$ represents the true value of the underlying variable, and $y$ represents what you observe. The measurement error is given by

$$e_0 = y - y^*$$

where $e_0$ represents mean-zero "noise," uncorrelated with $y^*$ or $u$. Thus, you run the ...
