Slide 1: Lecture Ten

Slide 2: Lecture
•Part I: Regression
–properties of OLS estimators
–assumptions of OLS
–pathologies of OLS
–diagnostics for OLS
•Part II: Experimental Method

Slide 3: Properties of OLS Estimators
•Unbiased: E(b̂) = b
•Note: y(i) = a + b*x(i) + e(i)
•Summing over observations i and dividing by n: ȳ = a + b*x̄ + ē
•So, subtracting: y(i) − ȳ = b*[x(i) − x̄] + [e(i) − ē]
•Recall, the estimator for the slope is:
  b̂ = Σ(i=1..n) [y(i) − ȳ][x(i) − x̄] / Σ(i=1..n) [x(i) − x̄]²

Slide 4
•Substituting the expression for y(i) − ȳ into the estimator:
  b̂ = Σ {b*[x(i) − x̄] + [e(i) − ē]}*[x(i) − x̄] / Σ [x(i) − x̄]²
    = b + Σ [x(i) − x̄][e(i) − ē] / Σ [x(i) − x̄]²
•Taking expectations, since x is independent of e: E(b̂) = b, so the estimator is unbiased.
•Note:
  b̂ − E(b̂) = b̂ − b = Σ [x(i) − x̄][e(i) − ē] / Σ [x(i) − x̄]²
  VAR(b̂) = E[b̂ − E(b̂)]² = E{ Σ [x(i) − x̄][e(i) − ē] / Σ [x(i) − x̄]² }²

Slide 5
•So VAR(b̂) = σ² / Σ(i=1..n) [x(i) − x̄]²
•The dispersion in the estimate of the slope depends directly on the unexplained variance and inversely on the dispersion in x.
•The unexplained mean square from the estimate is used for σ², the variance of e.

Slide 6: Other Properties of Estimators
•Efficiency: makes optimum use of the sample information to obtain estimators with minimum dispersion.
•Consistency: as the sample size increases, the estimator approaches the population parameter.

Slide 7: Outline: Regression
•The Assumptions of Least Squares
•The Pathologies of Least Squares
•Diagnostics for Least Squares

Slide 8: Assumptions
•The expected value of the error is zero: E[e] = 0.
•The error is independent of the explanatory variable: E{e*[x − E(x)]} = 0.
•The errors are independent of one another: E[e(i)e(j)] = 0 for i not equal to j.
•The variance is homoskedastic: E[e(i)]² = E[e(j)]².
•The error is normal with mean zero and variance σ².

Slide 9: 18.4 Error Variable: Required Conditions
•The error ε is a critical part of the regression model.
•Four requirements involving the distribution of ε must be satisfied:
–The probability distribution of ε is normal.
–The mean of ε is zero: E(ε) = 0.
–The standard deviation of ε is σ_ε for all values of x.
–The sets of errors associated with different values of y are all independent.

Slide 10: The Normality of ε
From the first three assumptions we have: y is normally distributed with mean E(y) = β0 + β1*x and a constant standard deviation σ_ε.
[Figure: normal curves centered at E(y|x1), E(y|x2), E(y|x3) along the line β0 + β1*x. The standard deviation remains constant, but the mean value changes with x.]

Slide 11: Pathologies
•Cross-section data: the error variance is heteroskedastic. For example, it could vary with firm size.
 Consequence: not all of the available information is used efficiently, and better estimates of the standard errors of the regression parameters are possible.
•Time-series data: the errors are serially correlated, i.e. autocorrelated. Consequence: inefficiency.

Slide 12: Lab 6: Autocorrelation?

Slide 13: Lab Six: Durbin-Watson Statistic

Slide 15
 Genr: error = resid
 Genr: errorlag1 = resid(-1)
 error(t) = a + b*error(t-1) + e(t)

Slide 16: Pathologies (Cont.)
•The explanatory variable is not independent of the error. Consequence: inconsistency, i.e. larger sample sizes do not lead to lower standard errors for the parameters, and the parameter estimates (slope, etc.) are biased.
•The error is not distributed normally. For example, there may be fat tails. Consequence: using the normal may understate the true 95% confidence intervals.

Slide 17: Pathologies (Cont.)
•Multicollinearity: the independent variables may be highly correlated. As a consequence, they do not truly represent separate causal factors, but instead a common causal factor.

Slide 18
 View/Open selected/One window/One group
 In group window: View/Correlations
 View/Open selected/One window/One group
 In group window: View/Multiple Graphs/Scatter/Matrix of all pairs

Slide 20
 price = a + b*bedrooms + c*house_size01 + d*lot_sixe01 + e

Slide 23
 price = a*dummy2 + b*dummy34 + c*dummy5 + d*house_size01 + e

Slide 24: 18.9 Regression Diagnostics - I
•The three conditions required for the validity of the regression analysis are:
–the error variable is normally distributed;
–the error variance is constant for all values of x;
–the errors are independent of each other.
•How can we diagnose violations of these conditions?

Slide 25: Residual Analysis
•Examining the residuals (or standardized residuals) helps detect violations of the required conditions.
•Example 18.2 – continued:
–Nonnormality:
 •Use Excel to obtain the standardized residual histogram.
 •Examine the histogram and look for a bell-shaped diagram with a mean close to zero.

Slide 26: Diagnostics (Cont.)
•Multicollinearity may be suspected if the t-statistics for the coefficients of the explanatory variables are not significant but the coefficient of determination is high. The correlation between the explanatory variables can then be examined directly.
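The multicollinearity symptom described above (individually insignificant t-statistics alongside a high coefficient of determination) can be illustrated with a small simulation. This is a sketch, not part of the lecture's labs: the data are synthetic, and numpy (rather than EViews or Excel) is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two nearly collinear regressors: x2 is x1 plus a little noise.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(scale=1.0, size=n)

# OLS with an intercept column.
X = np.column_stack([np.ones(n), x1, x2])
beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])      # unexplained mean square
cov = s2 * np.linalg.inv(X.T @ X)          # estimated VAR(beta_hat)
t_stats = beta / np.sqrt(np.diag(cov))

r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print("corr(x1, x2):", np.corrcoef(x1, x2)[0, 1])  # near 1
print("R^2:         ", r2)                          # high
print("t-statistics:", t_stats)                     # slope t's often individually small
```

The near-unit correlation inflates the standard errors of both slope estimates, so each coefficient looks insignificant even though the regression as a whole fits well, which is exactly the pattern the slide warns about.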
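The unbiasedness result and the variance formula VAR(b̂) = σ² / Σ[x(i) − x̄]² from the opening slides can be checked with a short Monte Carlo sketch. The parameter values and sample sizes below are arbitrary, and numpy is assumed:

```python
import numpy as np

rng = np.random.default_rng(42)
a, b, sigma = 2.0, 0.5, 1.0           # true intercept, slope, error s.d.
x = np.linspace(0.0, 10.0, 30)        # fixed design points
sxx = np.sum((x - x.mean()) ** 2)

def slope_hat(y):
    # b_hat = sum[(y - ybar)(x - xbar)] / sum[(x - xbar)^2]
    return np.sum((y - y.mean()) * (x - x.mean())) / sxx

# Redraw the errors many times and re-estimate the slope each time.
draws = np.array([slope_hat(a + b * x + rng.normal(scale=sigma, size=x.size))
                  for _ in range(20000)])

print("mean of b_hat:  ", draws.mean())       # close to b = 0.5 (unbiased)
print("var of b_hat:   ", draws.var())        # close to sigma^2 / Sxx
print("sigma^2 / Sxx:  ", sigma ** 2 / sxx)
```

Across repeated samples the average of b̂ matches the true slope, and its sampling variance matches σ²/Σ[x(i) − x̄]²: more spread in x (a larger Sxx) gives a tighter slope estimate, as the derivation states.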