# CORNELL ECON 3120 - Proxy Variables (2 pages)


## Proxy Variables


- Lecture number: 19
- Pages: 2
- Type: Lecture Note
- School: Cornell University
- Course: Econ 3120 - Applied Econometrics
- Edition: 1


Econ 3120 1st Edition Lecture 19

**Outline of Current Lecture**

I. Generalized Least Squares and Feasible Generalized Least Squares

**Current Lecture**

II. Proxy Variables

### 1 Proxy Variables

As we have seen, excluding key explanatory variables from a regression can bias the coefficients. Consider the regression

$$\log(\text{wage}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{abil} + u \tag{1}$$

If we run a regression that omits ability, then the composite error term $\beta_3\,\text{abil} + u$ could be correlated with $\text{educ}$, leading to biased estimates of $\beta_1$. What we have done at various points in this class is use a proxy variable for $\text{abil}$. In this case, let's use IQ and estimate

$$\log(\text{wage}) = \alpha_0 + \alpha_1\,\text{educ} + \alpha_2\,\text{exper} + \alpha_3\,\text{IQ} + e \tag{2}$$

How do we relate equations (1) and (2)? Consider the auxiliary equation that relates IQ and ability:

$$\text{abil} = \delta_0 + \delta_1\,\text{IQ} + \nu \tag{3}$$

Substituting (3) into (1) yields

$$
\begin{aligned}
\log(\text{wage}) &= \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3(\delta_0 + \delta_1\,\text{IQ} + \nu) + u \\
&= (\beta_0 + \beta_3\delta_0) + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\delta_1\,\text{IQ} + (\beta_3\nu + u).
\end{aligned}
$$

Matching terms with (2) gives $\alpha_0 = \beta_0 + \beta_3\delta_0$, $\alpha_1 = \beta_1$, $\alpha_2 = \beta_2$, $\alpha_3 = \beta_3\delta_1$, and $e = \beta_3\nu + u$. From this equation we can see that the key assumption is

$$E(\beta_3\nu + u \mid \text{educ}, \text{exper}, \text{IQ}) = 0.$$

In particular, this means that the unobserved component of ability, $\nu$, cannot be related to education or experience. In other words, after IQ is controlled for, the remaining variation in ability is uncorrelated with the $x$'s. Under this assumption, OLS on (2) consistently estimates $\beta_1$ and $\beta_2$, while the IQ coefficient estimates $\beta_3\delta_1$ rather than $\beta_3$ itself.

A natural question, therefore, is when we can expect $\nu$ to be uncorrelated with education or experience. This is difficult, if not impossible, to prove. The argument against IQ as a valid proxy variable is that IQ isn't the only component of ability that leads to more schooling. For example, a drive to succeed may not be related to IQ but could lead to both more education and higher earnings.

#### 1.1 Using a Lagged Dependent Variable as a Proxy Variable

If we have a panel of data that spans multiple periods, we can sometimes use the lagged (last-period) value of the dependent variable as a proxy variable. Suppose we are interested in measuring the impact of a remedial education program on child learning, and we have child test scores from before and after the program. We run the model

$$\text{score}_{i,t} = \beta_0 + \beta_1\,\text{program}_i + \beta_2\,\text{score}_{i,t-1} + u$$

where $\text{program}_i$ indicates whether child $i$ was in the remedial education program. Here the lagged test score is meant to serve as a proxy for schooling inputs up to time $t$, so that the effect of those inputs is not attributed to the program. As before, the lagged score must absorb all unobserved factors that are related to program placement and also affect test scores in time $t$. If, for example, students are placed in the program based on their potential to improve, this potential may not be entirely reflected in $\text{score}_{i,t-1}$.

### 2 Measurement Error

#### 2.1 Dependent Variable

Suppose we are interested in running the following regression model:

$$y^* = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \tag{4}$$

where $y^*$ represents the true value of the underlying variable and $y$ represents what you observe. The measurement error is given by

$$e_0 = y - y^*$$

where $e_0$ represents mean-zero noise uncorrelated with $y^*$ or $u$. Thus you run the regression

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u + e_0 \tag{5}$$

When will estimates using (5) instead of (4) give consistent estimators? This happens as long as $e_0$ is not correlated with the $x$'s, that is, if $E(e_0 \mid x_1, \dots, x_k) = 0$. Although the estimators under (5) will still be consistent, their standard errors will increase. This occurs because the composite error term is now $u + e_0$. Assuming that $u$ and $e_0$ are uncorrelated,

$$\operatorname{Var}(u + e_0) = \sigma_u^2 + \sigma_{e_0}^2,$$

which exceeds $\sigma_u^2$ whenever $e_0$ has positive variance.

These notes represent a detailed interpretation of the professor's lecture. GradeBuddy is best used as a supplement to your own notes, not as a substitute.
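
The proxy-variable argument for IQ and ability can be checked with a small simulation. The sketch below is a minimal Python/NumPy illustration under an entirely hypothetical data-generating process: the parameter values, the helper names `ols` and `simulate_proxy`, and the `drive_effect` knob (a stand-in for the "drive to succeed" story) are assumptions made for illustration, not part of the lecture. With `drive_effect=0`, the part of ability that IQ misses ($\nu$) is unrelated to education, so the key assumption holds; a positive value lets $\nu$ raise schooling, so the assumption fails.

```python
import numpy as np


def ols(y, *regressors):
    """OLS with an intercept; returns [intercept, slope_1, ..., slope_k]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_proxy(n=200_000, drive_effect=0.0, seed=0):
    """Return (educ coef omitting abil, educ coef with IQ proxy, true beta1).

    Hypothetical data-generating process (all numbers are illustrative):
      abil      = -5 + 0.05 * IQ + nu                              (equation 3)
      educ      = 12 + 0.1 * (IQ - 100) + drive_effect * nu + noise
      log(wage) = 1.0 + 0.08 * educ + 0.02 * exper + 0.10 * abil + u  (equation 1)
    drive_effect = 0 keeps nu unrelated to educ, so the proxy assumption holds.
    """
    rng = np.random.default_rng(seed)
    b0, b1, b2, b3 = 1.0, 0.08, 0.02, 0.10
    iq = rng.normal(100, 15, n)
    nu = rng.normal(0, 1, n)                 # part of ability that IQ misses
    abil = -5 + 0.05 * iq + nu
    educ = 12 + 0.1 * (iq - 100) + drive_effect * nu + rng.normal(0, 2, n)
    exper = rng.uniform(0, 20, n)
    u = rng.normal(0, 0.3, n)
    log_wage = b0 + b1 * educ + b2 * exper + b3 * abil + u

    b1_omit = ols(log_wage, educ, exper)[1]        # abil pushed into the error term
    b1_proxy = ols(log_wage, educ, exper, iq)[1]   # equation (2): IQ stands in for abil
    return b1_omit, b1_proxy, b1


if __name__ == "__main__":
    print(simulate_proxy())
    print(simulate_proxy(drive_effect=1.0))
```

Under these assumed parameters, omitting ability should push the `educ` estimate toward roughly 0.098 (the true 0.08 plus $\beta_3 \cdot \text{Cov}(\text{abil}, \text{educ})/\text{Var}(\text{educ}) = 0.10 \times 0.18$), while adding IQ should bring it back near 0.08. With `drive_effect=1.0`, the proxy regression should stay biased upward, near 0.10, because $\nu$ now sits in the error and is correlated with education.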

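
To see why the lagged test score helps, and when it does not, here is a minimal simulation sketch in Python/NumPy. Everything in it is a hypothetical assumption: the program effect of 4 points, the placement rule (children with lower prior scores are more likely to be placed), the unobserved `gain` term standing for a child's potential to improve, and the helper names `ols` and `simulate_program`. With `placement_on_potential=0`, placement depends only on the prior score and independent noise, so controlling for `score_prev` should recover the effect; with a positive value, placement also depends on `gain`, which the lagged score does not capture.

```python
import numpy as np


def ols(y, *regressors):
    """OLS with an intercept; returns [intercept, slope_1, ..., slope_k]."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def simulate_program(n=200_000, placement_on_potential=0.0, seed=1):
    """Return (naive program effect, effect controlling for lagged score, true effect).

    Hypothetical data-generating process (illustrative numbers only):
      score_prev ~ N(50, 10), summarizing inputs before the program
      program    = 1 if weakness + placement_on_potential * gain / 3 + noise > 0.5
      score_now  = 5 + 4 * program + 0.9 * score_prev + gain + u
    gain is the child's unobserved potential to improve (mean zero).
    """
    rng = np.random.default_rng(seed)
    true_effect = 4.0
    score_prev = rng.normal(50, 10, n)
    gain = rng.normal(0, 3, n)
    weakness = -(score_prev - 50) / 10       # lower prior score -> more likely placed
    latent = weakness + placement_on_potential * gain / 3 + rng.normal(0, 1, n)
    program = (latent > 0.5).astype(float)
    score_now = (5 + true_effect * program + 0.9 * score_prev
                 + gain + rng.normal(0, 5, n))

    naive = ols(score_now, program)[1]                  # ignores prior achievement
    with_lag = ols(score_now, program, score_prev)[1]   # lagged score as the proxy
    return naive, with_lag, true_effect


if __name__ == "__main__":
    print(simulate_program())
    print(simulate_program(placement_on_potential=1.0))
```

Under these assumptions, the naive comparison lands far below the true effect (it even turns negative, because placed children started with lower scores), the lagged-score regression lands close to 4, and `simulate_program(placement_on_potential=1.0)` overstates the effect because potential to improve drives both placement and later scores.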
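
The claim that measurement error in the dependent variable leaves OLS consistent but inflates the standard errors can be illustrated with a simulation. In this minimal sketch (Python/NumPy), the single regressor, the parameter values ($\beta_1 = 0.5$, $\sigma_u = 1$, standard deviation of $e_0$ equal to 2), and the function names `ols_with_se` and `simulate_dep_measurement_error` are all assumptions for illustration. The same `x` is regressed once on the true outcome `y_star` and once on the mismeasured `y = y_star + e0`, and the conventional OLS standard errors are compared.

```python
import numpy as np


def ols_with_se(y, x):
    """Regress y on x with an intercept; return (slope, conventional SE of slope)."""
    X = np.column_stack([np.ones(len(y)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)             # estimated error variance
    cov = sigma2 * np.linalg.inv(X.T @ X)        # homoskedastic OLS covariance
    return coef[1], np.sqrt(cov[1, 1])


def simulate_dep_measurement_error(n=50_000, sd_e0=2.0, seed=2):
    """Return ((slope, se) using y_star, (slope, se) using y, true beta1)."""
    rng = np.random.default_rng(seed)
    beta0, beta1 = 1.0, 0.5
    x = rng.normal(0, 1, n)
    u = rng.normal(0, 1, n)
    y_star = beta0 + beta1 * x + u               # equation (4): true outcome
    e0 = rng.normal(0, sd_e0, n)                 # mean-zero noise, independent of x and u
    y = y_star + e0                              # observed outcome, so e0 = y - y_star
    return ols_with_se(y_star, x), ols_with_se(y, x), beta1


if __name__ == "__main__":
    print(simulate_dep_measurement_error())
```

Both slopes should land near the assumed 0.5, while the standard error using the noisy outcome should be larger by a factor of about $\sqrt{(\sigma_u^2 + \sigma_{e_0}^2)/\sigma_u^2} = \sqrt{5} \approx 2.24$ under these assumed values, which is the variance inflation $\operatorname{Var}(u + e_0) = \sigma_u^2 + \sigma_{e_0}^2$ at work.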