
## Lecture 6. Regression Analysis: Basic Theory II

Prof. Edson Severnini (CMU) Applied Econometrics I

### The Conditional Expectation Function and Regression

- Recall that when we have a linear CEF like this one

$$E[\ln Y_i \mid P_i, GROUP_i, SAT_i, \ln PI_i] = \alpha + \beta P_i + \sum_{j=1}^{150} \gamma_j GROUP_{ji} + \delta_1 SAT_i + \delta_2 \ln PI_i$$

  regression is a strategy that, as Angrist and Pischke say,
  - “matches students by value of $GROUP_i$, $SAT_i$, and $\ln PI_i$”
  - “compares the average earnings of matched students who went to private ($P_i = 1$) and public ($P_i = 0$) schools for each possible combination of the conditioning variables”
  - “produces a single average by averaging all of these cell-specific contrasts”
- This kind of regression can be useful for estimating causal impacts
- We typically estimate the model using OLS

### Ordinary Least Squares (OLS) Regression

- In Lecture 5 we talked about the mechanics of OLS with one regressor

$$Y_i = \alpha + \beta X_i + e_i$$

  - $\alpha$ is the intercept
  - $\beta$ is a slope
  - $e_i$ is the error term or residual
- OLS means we are forming estimators $\hat{\alpha}$ and $\hat{\beta}$ that minimize the residual sum of squares

$$RSS = \sum_{i=1}^{n} e_i^2$$

### Ordinary Least Squares

- OLS estimators, found by minimizing the RSS

$$\hat{\beta} = \frac{Cov(X_i, Y_i)}{V(X_i)}, \qquad \hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}$$

- Clearly, if the two variables are uncorrelated, we expect the regression slope ($\beta$) to be zero
- The fitted value of a regression is

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$$

  so the value of $Y_i$ is the sum of the fitted value and the residual

$$Y_i = \hat{Y}_i + e_i$$

- Useful properties of the residuals $e_i$

$$E[e_i] = 0, \qquad E[X_i e_i] = 0$$

### Fitted Values and Residuals

- Interpreting the properties of the residuals

$$E[e_i] = 0, \qquad E[X_i e_i] = 0$$

- The first property is not surprising
- The second property is important
  - It tells us that the regression residuals are uncorrelated with the regressors that made them (the covariance is 0)
- Thus the residual $e_i$ is that part of the dependent variable $Y_i$ that is uncorrelated with the regressors
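The one-regressor formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and a simulated data set; the function name `ols_one_regressor`, the seed, and the coefficients 1.0 and 2.0 are illustrative choices, not part of the lecture.

```python
import numpy as np

def ols_one_regressor(x, y):
    """Return (alpha_hat, beta_hat, residuals) for Y_i = alpha + beta X_i + e_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    # beta_hat = Cov(X, Y) / V(X); the 1/n factors cancel in the ratio
    beta_hat = np.mean(x_dev * (y - y.mean())) / np.mean(x_dev ** 2)
    # alpha_hat = Ybar - beta_hat * Xbar
    alpha_hat = y.mean() - beta_hat * x.mean()
    residuals = y - (alpha_hat + beta_hat * x)
    return alpha_hat, beta_hat, residuals

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)

alpha_hat, beta_hat, e = ols_one_regressor(x, y)

# Sample analogues of E[e_i] = 0 and E[X_i e_i] = 0
print("mean of residuals is zero:", np.isclose(e.mean(), 0.0))
print("mean of X_i * e_i is zero:", np.isclose(np.mean(x * e), 0.0))
```

Both residual properties hold by construction in any sample once an intercept is included, so they follow from how OLS is computed and say nothing about whether the model is causal.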
### Regressions with a Dummy Regressor

- We have already seen that sometimes it will be useful to have a dummy variable as a regressor
- Let $D_i$ be a dummy variable, with

$$E[Y_i \mid D_i = 0] = \alpha, \qquad E[Y_i \mid D_i = 1] = \alpha + \beta$$

  Then $\beta$ tells us the difference in $Y_i$ associated with the dummy switching from 0 to 1

$$\beta = E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]$$

- Now if we estimate a regression model

$$Y_i = \alpha + \beta D_i + e_i$$

  $\hat{\beta}$ estimates this difference in the conditional expected value of the outcome variable

### Multiple Regression

- Most interesting regressions have multiple regressors
- Let’s think about the case with two regressors

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$$

- Now with a little work (calculating OLS with two regressors) we can prove that

$$\hat{\beta}_1 = \frac{Cov(Y_i, \tilde{X}_{1i})}{V(\tilde{X}_{1i})}$$

  where $\tilde{X}_{1i}$ is the residual from the regression of $X_{1i}$ on $X_{2i}$

$$X_{1i} = \pi_0 + \pi_1 X_{2i} + \tilde{X}_{1i}$$

- A key observation: the residual $\tilde{X}_{1i}$ is that part of $X_{1i}$ that is not correlated with $X_{2i}$

### Multiple Regression

- Let’s take an example

$$Y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$$

  with $Y_i$ being equal to earnings, $X_{1i}$ a dummy for private college, and $X_{2i}$ a measure of parental income
- Now

$$\hat{\beta}_1 = \frac{Cov(Y_i, \tilde{X}_{1i})}{V(\tilde{X}_{1i})}$$

  where $\tilde{X}_{1i}$ is the residual from the regression of private college attendance $X_{1i}$ on parental income $X_{2i}$

$$X_{1i} = \pi_0 + \pi_1 X_{2i} + \tilde{X}_{1i}$$

- Here the residual $\tilde{X}_{1i}$ is that part of the private-college attendance decision ($X_{1i}$) that is not correlated with parental income ($X_{2i}$)

### Multiple Regression

- We can generalize this idea for a model with $K$ regressors

$$Y_i = \alpha + \beta_1 X_{1i} + \dots + \beta_K X_{Ki} + e_i$$

- Now for the $k$th regressor

$$\hat{\beta}_k = \frac{Cov(Y_i, \tilde{X}_{ki})}{V(\tilde{X}_{ki})}$$

  where $\tilde{X}_{ki}$ is the residual from the regression of $X_{ki}$ on all other $K - 1$ regressors
- The key idea here is that $\tilde{X}_{ki}$ is that part of $X_{ki}$ that is not correlated with the other regressors
- Put another way, we are learning about the distinctive role that $X_{ki}$ has in the CEF of $Y_i$
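The residual-regression formula for $\hat{\beta}_k$ can be verified against a direct multiple regression. This is a minimal sketch, assuming NumPy and simulated data loosely styled on the private-college example; the helper names `residualize` and `anatomy_slope`, the seed, and all coefficients are hypothetical.

```python
import numpy as np

def residualize(target, controls):
    """Residual from regressing `target` on a constant and `controls`."""
    Z = np.column_stack([np.ones(len(target)), controls])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ coef

def anatomy_slope(y, x_k, other_regressors):
    """beta_k_hat = Cov(Y, X_k_tilde) / V(X_k_tilde)."""
    x_tilde = residualize(x_k, other_regressors)
    # x_tilde has mean zero, so mean(x_tilde**2) is its variance
    return np.mean((y - y.mean()) * x_tilde) / np.mean(x_tilde ** 2)

# Simulated data (hypothetical): x2 plays the role of parental income,
# x1 is a dummy that is correlated with x2.
rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = (0.8 * x2 + rng.normal(size=n) > 0).astype(float)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

# Direct OLS of y on a constant, x1 and x2
X = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

beta1_anatomy = anatomy_slope(y, x1, x2)
print("direct OLS and residual-regression slopes agree:",
      np.isclose(beta_full[1], beta1_anatomy))
```

Passing a two-dimensional array of several controls as `other_regressors` covers the $K$-regressor case in the same way.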
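### Multiple Regression

- The interpretation we have been discussing is especially revealing if we are interested in a dummy variable coefficient
- For example, in the private college example

$$\ln Y_i = \alpha + \beta P_i + \sum_{j=1}^{150} \gamma_j GROUP_{ji} + e_i$$

- Now if we regress $P_i$ on all other regressors we will just be estimating the group mean $\bar{P}_j$, so if person $i$ is in group $j$

$$\tilde{P}_{ij} = P_{ij} - \bar{P}_j$$

- This means that

$$\hat{\beta} = \frac{Cov(\ln Y_i, \tilde{P}_{ij})}{V(\tilde{P}_{ij})} = \frac{Cov(\ln Y_i, P_{ij} - \bar{P}_j)}{V(P_{ij} - \bar{P}_j)}$$

- In this case OLS is an estimator that depends purely on the within-group covariance of log income and private school attendance

### Multiple Regression

- Let’s try an example with $n = 8$

| log income ($\ln Y_i$) | $P_{ij}$ | $GROUP_{0i}$ | $GROUP_{1i}$ |
|---|---|---|---|
| $\ln Y_1$ | 1 | 1 | 0 |
| $\ln Y_2$ | 1 | 1 | 0 |
| $\ln Y_3$ | 0 | 1 | 0 |
| $\ln Y_4$ | 0 | 1 | 0 |
| $\ln Y_5$ | 1 | 0 | 1 |
| $\ln Y_6$ | 0 | 0 | 1 |
| $\ln Y_7$ | 0 | 0 | 1 |
| $\ln Y_8$ | 0 | 0 | 1 |

- Let’s start by finding $Cov(\ln Y_i, P_{ij} - \bar{P}_j) = E(\ln Y_i \cdot (P_{ij} - \bar{P}_j))$. The group means are $\bar{P}_0 = \tfrac{1}{2}$ and $\bar{P}_1 = \tfrac{1}{4}$. Written as a sum over the eight observations (the common factor $1/n$ is dropped because it cancels in the ratio $Cov/V$):

$$\begin{aligned}
&\ln Y_1\left(1 - \tfrac{1}{2}\right) + \ln Y_2\left(1 - \tfrac{1}{2}\right) + \ln Y_3\left(0 - \tfrac{1}{2}\right) + \ln Y_4\left(0 - \tfrac{1}{2}\right) \\
&+ \ln Y_5\left(1 - \tfrac{1}{4}\right) + \ln Y_6\left(0 - \tfrac{1}{4}\right) + \ln Y_7\left(0 - \tfrac{1}{4}\right) + \ln Y_8\left(0 - \tfrac{1}{4}\right)
\end{aligned}$$

### Multiple Regression

- So $Cov(\ln Y_i, P_{ij} - \bar{P}_j)$ is

$$\begin{aligned}
&\tfrac{1}{2}(\ln Y_1 + \ln Y_2) - \tfrac{1}{2}(\ln Y_3 + \ln Y_4) + \tfrac{3}{4}\ln Y_5 - \tfrac{1}{4}(\ln Y_6 + \ln Y_7 + \ln Y_8) \\
&= \left[\frac{\ln Y_1 + \ln Y_2}{2} - \frac{\ln Y_3 + \ln Y_4}{2}\right] + \frac{3}{4}\left[\ln Y_5 - \frac{\ln Y_6 + \ln Y_7 + \ln Y_8}{3}\right]
\end{aligned}$$

### Multiple Regression

- Now, $V(P_{ij} - \bar{P}_j) = E((P_{ij} - \bar{P}_j)^2)$, computed as the same kind of sum, equals $7/4$ because

$$\left(1 - \tfrac{1}{2}\right)^2 + \left(1 - \tfrac{1}{2}\right)^2 + \left(0 - \tfrac{1}{2}\right)^2 + \left(0 - \tfrac{1}{2}\right)^2 + \left(1 - \tfrac{1}{4}\right)^2 + \left(0 - \tfrac{1}{4}\right)^2 + \left(0 - \tfrac{1}{4}\right)^2 + \left(0 - \tfrac{1}{4}\right)^2 = \frac{7}{4}$$

- Then it is easy to show that $Cov(\ln Y_i, P_{ij} - \bar{P}_j) / V(P_{ij} - \bar{P}_j)$ is

$$\frac{4}{7}\left[\frac{\ln Y_1 + \ln Y_2}{2} - \frac{\ln Y_3 + \ln Y_4}{2}\right] + \frac{3}{7}\left[\ln Y_5 - \frac{\ln Y_6 + \ln Y_7 + \ln Y_8}{3}\right]$$

### Multiple Regression

- This is fantastic!
- We have just demonstrated that the OLS estimator

$$\hat{\beta} = \frac{Cov(\ln Y_i, P_{ij} - \bar{P}_j)}{V(P_{ij} - \bar{P}_j)}$$

  is a weighted average of within-group differences in $\ln Y_i$

$$\frac{4}{7}\left[\frac{\ln Y_1 + \ln Y_2}{2} - \frac{\ln Y_3 + \ln Y_4}{2}\right] + \frac{3}{7}\left[\ln Y_5 - \frac{\ln Y_6 + \ln Y_7 + \ln Y_8}{3}\right]$$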
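The $n = 8$ example can be reproduced numerically. The sketch below uses the $P_{ij}$ and group assignments from the table; the eight log-income values are made-up numbers chosen only for illustration, and NumPy is an assumed tool. It computes $\hat{\beta}$ three ways: from the demeaned-dummy ratio, from the $4/7$ and $3/7$ weighted average, and from OLS with both group dummies.

```python
import numpy as np

# Hypothetical log incomes for the eight people (illustrative values only)
ln_y = np.array([10.9, 11.2, 10.4, 10.7, 11.5, 10.8, 10.6, 11.0])
# Private-school dummy and group membership, as in the table above
P = np.array([1, 1, 0, 0, 1, 0, 0, 0], dtype=float)
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Group means of P (1/2 in group 0, 1/4 in group 1) and the demeaned dummy
group_means = np.array([P[group == g].mean() for g in (0, 1)])
P_tilde = P - group_means[group]

# (1) Ratio of sums; the 1/n factors cancel, as in the slides
beta_cov = np.sum(ln_y * P_tilde) / np.sum(P_tilde ** 2)

# (2) Weighted average of within-group private/public differences
diff_group0 = ln_y[:2].mean() - ln_y[2:4].mean()
diff_group1 = ln_y[4] - ln_y[5:].mean()
beta_weighted = (4 / 7) * diff_group0 + (3 / 7) * diff_group1

# (3) OLS of ln_y on P and the two group dummies (they absorb the constant)
G0 = (group == 0).astype(float)
G1 = (group == 1).astype(float)
X = np.column_stack([P, G0, G1])
coef, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
beta_ols = coef[0]

print("sum of squared demeaned P:", np.sum(P_tilde ** 2))
print("all three estimates agree:",
      np.isclose(beta_cov, beta_weighted) and np.isclose(beta_cov, beta_ols))
```

The weights $4/7$ and $3/7$ are each group's share of the total within-group variation in $P_{ij}$ ($1$ and $3/4$ out of $7/4$), so the group with a more even private/public split counts for more.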
### Omitted Variable Bias

- In regression analysis we essentially group people so as to make all else equal
  - We attempt to recreate an experimental setting by controlling for all relevant factors
- Failure to do so results in omitted variable bias
- Example: suppose we can successfully control for differences in students with a dummy variable for applicant group $A_i$
- The appropriate equation is a long regression

$$\ln Y_i = \alpha^l + \beta^l P_i + \gamma^l A_i + e_i^l$$

  but instead we estimate a
