
# CORNELL ECON 3120 - Omitted Variable Bias with Many Regressors

Type: Lecture Note
Pages: 2

Econ 3120, 1st Edition, Lecture 15

Outline of Current Lecture

I. Omitted Variable Bias with Many Regressors

Current Lecture

II. Dummy Variables

## Dummy Variables

Dummy variables (also known as binary variables, indicator variables, or dichotomous variables) are simply variables that take on a value of 0 or 1. They indicate a single status of the observation. Some examples:

- female (= 1 for female, = 0 for male)
- non-white (= 1 if race is non-white, = 0 if white)
- urban (= 1 if the person lives in an urban area, = 0 if the person lives in a rural area)

Note that we could also define our dummy variables to indicate male, white, or rural, but it turns out not to matter (more on this below).

Dummy variables change the intercept of the regression equation. For example, suppose we want to examine the relationship between test scores and class sizes in primary schools. We think that the gender of the child also has an effect on test scores, so we include it in the model. We therefore model the relationship as

$$\text{score} = \beta_0 + \beta_1\,\text{female} + \beta_2\,\text{clsize} + u \qquad (1)$$

How do we interpret $\beta_1$? It represents a shift in the intercept associated with the gender of the child. To see this, take the conditional expectation for males and for females:

$$E(\text{score} \mid \text{female} = 0, \text{clsize}) = \beta_0 + \beta_2\,\text{clsize}$$

$$E(\text{score} \mid \text{female} = 1, \text{clsize}) = \beta_0 + \beta_1 + \beta_2\,\text{clsize}$$

The difference between these two equations is simply a shift in the intercept from $\beta_0$ to $\beta_0 + \beta_1$.

[Figure: score plotted against class size. The female and male lines are parallel with common slope $\beta_2$; the male line has intercept $\beta_0$, and the two lines are separated vertically by $\beta_1$.]

This interpretation easily generalizes to situations with more independent variables. The coefficients on the continuous variables (i.e., "slope coefficients") remain the same for different values of the dummy variable, but the dummy variable shifts the intercept.

What would happen if you included the dummy variable male in the equation, where male = 1 if the child is male and 0 if the child is female?
You would then be running the regression:

$$\text{score} = \beta_0 + \beta_1\,\text{female} + \beta_2\,\text{clsize} + \beta_3\,\text{male} + u$$

It is not possible to run this regression, because male is an exact linear function of female (male = 1 − female). This perfect collinearity violates Assumption MLR.3. If you tried to do this in Stata, the program would drop one of these dummy variables for you. Thus, you could include either male or female, but not both.

It turns out not to matter which one you include. Suppose you ran the regression

$$\text{score} = \alpha_0 + \alpha_1\,\text{male} + \alpha_2\,\text{clsize} + u \qquad (2)$$

Substituting male = 1 − female gives

$$\text{score} = (\alpha_0 + \alpha_1) - \alpha_1\,\text{female} + \alpha_2\,\text{clsize} + u$$

so (2) becomes (1) when you set $\alpha_0 = \beta_0 + \beta_1$, $\alpha_1 = -\beta_1$, and $\alpha_2 = \beta_2$.

Note that we can also use dummy variables when we have more than two categories. Suppose that we have 3 categories for race: white, black, and other. We run the regression including two of these variables:

$$\text{score} = \beta_0 + \beta_2\,\text{clsize} + \beta_3\,\text{white} + \beta_4\,\text{black} + u$$

where, again, we have to exclude other, since other = 1 − white − black.

## Interactions between dummy variables

We can interact dummy variables to create individual intercepts for each sub-category within the two dummy variables. Suppose we interact female with a variable public indicating whether the student is in a public school. The new variable is constructed as

$$\text{female} \cdot \text{public} = \begin{cases} 1 & \text{if female} = 1 \text{ and public} = 1 \\ 0 & \text{otherwise} \end{cases}$$

We can therefore run the regression:

$$\text{score} = \beta_0 + \beta_1\,\text{female} + \beta_2\,\text{clsize} + \beta_3\,\text{public} + \beta_4\,\text{female} \cdot \text{public} + u \qquad (3)$$

This regression implies a separate intercept for each gender × school type category. To see this, take conditional expectations to yield the following intercepts:

| | private | public |
|---|---|---|
| male | $\beta_0$ | $\beta_0 + \beta_3$ |
| female | $\beta_0 + \beta_1$ | $\beta_0 + \beta_1 + \beta_3 + \beta_4$ |

Suppose we hadn't included the interaction and instead ran the regression:

$$\text{score} = \beta_0 + \beta_1\,\text{female} + \beta_2\,\text{clsize} + \beta_3\,\text{public} + u \qquad (4)$$

In equation (4), we are not letting the intercept vary by each individual gender × school type category.
We are assuming that the difference in mean test scores between females and males is the same in both public and private schools. In equation (3), on the other hand, the effect of being female in a private school is $\beta_1$, while the effect of being female in a public school is $\beta_1 + \beta_4$. Thus, $\beta_4$ allows the effect of gender to vary by school type. The parameter $\beta_4$ represents the difference in test scores between females and males in public schools, relative to that difference in private schools (holding class size constant). [1]

[1] This is sometimes called a difference-in-difference estimator, because it can be written as follows:

$$\beta_4 = \big[E(\text{score} \mid \text{female} = 1, \text{public} = 1, \text{clsize}) - E(\text{score} \mid \text{female} = 0, \text{public} = 1, \text{clsize})\big] - \big[E(\text{score} \mid \text{female} = 1, \text{public} = 0, \text{clsize}) - E(\text{score} \mid \text{female} = 0, \text{public} = 0, \text{clsize})\big]$$

## Interactions between dummy variables and continuous variables

We can interact dummy variables and other variables to change a slope coefficient in our regression. Suppose we would like to test whether the effect of class size on test scores differs by gender. We generate a new variable that equals female · clsize and regress:

$$\text{score} = \beta_0 + \beta_1\,\text{female} + \beta_2\,\text{clsize} + \beta_3\,\text{female} \cdot \text{clsize} + u$$

Differentiating with respect to class size for each value of female and holding $u$ constant yields the following slope coefficients:

$$\left.\frac{\partial\,\text{score}}{\partial\,\text{clsize}}\right|_{\text{female}=0} = \beta_2$$

$$\left.\frac{\partial\,\text{score}}{\partial\,\text{clsize}}\right|_{\text{female}=1} = \beta_2 + \beta_3$$

Thus, $\beta_3$ represents the difference in the effect of class size on test scores for females relative to males.

[Figure: score plotted against class size. The male line has intercept $\beta_0$ and slope $\beta_2$; the female line has slope $\beta_2 + \beta_3$, and the two intercepts differ by $\beta_1$.]
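
The slope result for the female · clsize interaction can be checked numerically. What follows is a minimal Python sketch, not part of the lecture. It simulates data from the interaction model with made-up parameter values ($\beta_0 = 70$, $\beta_1 = 2$, $\beta_2 = -0.5$, $\beta_3 = 0.2$) and uses the fact that a regression on female, clsize, and female · clsize gives the same point estimates as fitting a separate line for each gender. The helper names `simple_ols`, `simulate`, and `interaction_estimates` are hypothetical and introduced only for illustration.

```python
import random


def simple_ols(x, y):
    """Return (intercept, slope) from a one-regressor least-squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def simulate(n, b0, b1, b2, b3, sigma=1.0, seed=0):
    """Draw n (female, clsize, score) rows from the interaction model.

    All parameter values and the class-size range are invented for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        female = rng.randint(0, 1)        # dummy: 1 = female, 0 = male
        clsize = rng.uniform(15, 35)      # assumed range of class sizes
        u = rng.gauss(0, sigma)           # error term
        score = b0 + b1 * female + b2 * clsize + b3 * female * clsize + u
        rows.append((female, clsize, score))
    return rows


def interaction_estimates(rows):
    """Estimate (b0, b1, b2, b3) by fitting each gender separately."""
    fits = {}
    for g in (0, 1):
        x = [c for f, c, s in rows if f == g]
        y = [s for f, c, s in rows if f == g]
        fits[g] = simple_ols(x, y)
    (a_male, s_male), (a_female, s_female) = fits[0], fits[1]
    # Male line: intercept b0, slope b2.
    # Female line: intercept b0 + b1, slope b2 + b3.
    return a_male, a_female - a_male, s_male, s_female - s_male


rows = simulate(4000, b0=70.0, b1=2.0, b2=-0.5, b3=0.2)
print(interaction_estimates(rows))
```

With the noise switched off (`sigma=0.0`), the estimates reproduce the chosen parameters up to rounding error. With noise, the male slope estimates $\beta_2$ and the gap between the female and male slopes estimates $\beta_3$.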

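Returning to the interaction between two dummy variables in equation (3), the table of intercepts and the difference-in-difference reading of $\beta_4$ can be illustrated with cell means. This is a hedged sketch under a simplifying assumption that is not in the notes: class size is dropped, so the model contains only female, public, and female · public. In that case the model is saturated, OLS reproduces the four cell means, and each coefficient is a difference of those means. The scores below are invented, and `saturated_coefficients` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical test scores keyed by (female, public); values are made up.
scores = {
    (0, 0): [70.0, 74.0],  # male, private
    (1, 0): [75.0, 77.0],  # female, private
    (0, 1): [66.0, 68.0],  # male, public
    (1, 1): [74.0, 76.0],  # female, public
}


def saturated_coefficients(cells):
    """Return (b0, b1, b3, b4) for score = b0 + b1*female + b3*public + b4*female*public.

    Assumes no other regressors, so the coefficients are differences of cell means.
    """
    m = {key: mean(values) for key, values in cells.items()}
    b0 = m[(0, 0)]                                          # male, private
    b1 = m[(1, 0)] - m[(0, 0)]                              # gender gap in private schools
    b3 = m[(0, 1)] - m[(0, 0)]                              # public vs. private, males
    b4 = (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])  # difference in differences
    return b0, b1, b3, b4


b0, b1, b3, b4 = saturated_coefficients(scores)
print(b0, b1, b3, b4)
```

In this made-up data the gender gap is 4 points in private schools and 8 points in public schools, so the interaction coefficient is the difference between the two gaps, and $\beta_0 + \beta_1 + \beta_3 + \beta_4$ recovers the female, public cell mean.
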