Slide 1: Lecture Ten

Slide 2: Where Do We Go From Here?
• Regression: Properties, Assumptions, Violations, Diagnostics, Modeling
• Probability
• Count data: ANOVA, Contingency Tables

Slide 3: Lecture
• Part I: Regression
– properties of OLS estimators
– assumptions of OLS
– pathologies of OLS
– diagnostics for OLS
• Part II: Experimental Method

Slide 4: Properties of OLS Estimators
• Unbiased: E(b̂) = b.
• Note: y(i) = a + b*x(i) + e(i).
• And summing over observations i and dividing by n: ȳ = a + b*x̄ + ē. So, subtracting, y(i) − ȳ = b*[x(i) − x̄] + [e(i) − ē].
• Recall, the estimator for the slope is:
b̂ = Σᵢ [y(i) − ȳ][x(i) − x̄] / Σᵢ [x(i) − x̄]², with the sums running over i = 1, …, n.

Slide 5:
• And substituting this expression for y(i) − ȳ into the estimator:
b̂ = Σᵢ [x(i) − x̄]{b[x(i) − x̄] + [e(i) − ē]} / Σᵢ [x(i) − x̄]² = b + Σᵢ [x(i) − x̄][e(i) − ē] / Σᵢ [x(i) − x̄]².
• And taking expectations: E(b̂) = b, since E[e(i)] = 0 and x is independent of the error.
• Note:
b̂ − E(b̂) = b̂ − b = Σᵢ [x(i) − x̄][e(i) − ē] / Σᵢ [x(i) − x̄]², so
VAR(b̂) = E[b̂ − E(b̂)]² = E{Σᵢ [x(i) − x̄][e(i) − ē] / Σᵢ [x(i) − x̄]²}².

Slide 6:
• So, since Σᵢ [x(i) − x̄] = 0 (the ē term drops out) and the errors are independent with variance σ²:
VAR(b̂) = σ² / Σᵢ [x(i) − x̄]².
• The dispersion in the estimate of the slope depends on the unexplained variance, σ², and inversely on the dispersion in x.
• The unexplained mean square from the regression is used as the estimate of the variance of e.
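The lecture's labs use Excel and EViews; as a cross-check on the algebra above, here is a minimal simulation sketch in Python (an illustration, not part of the original lab): it draws repeated samples from y(i) = a + b*x(i) + e(i) with x held fixed and compares the sample mean and variance of b̂ against b and σ² / Σᵢ [x(i) − x̄]².

```python
import numpy as np

# Simulation sketch (assumed setup, not from the lecture): check that the
# OLS slope is unbiased and that VAR(b_hat) = sigma^2 / sum[(x - x_bar)^2].
rng = np.random.default_rng(0)
a, b, sigma = 2.0, 0.5, 1.0
x = rng.uniform(0.0, 10.0, size=50)      # regressor held fixed across replications
ssx = np.sum((x - x.mean()) ** 2)        # sum of squared deviations of x

slopes = []
for _ in range(10_000):
    e = rng.normal(0.0, sigma, size=x.size)
    y = a + b * x + e
    # OLS slope: sum[(y - ybar)(x - xbar)] / sum[(x - xbar)^2]
    slopes.append(np.sum((y - y.mean()) * (x - x.mean())) / ssx)

print("mean of b_hat:", np.mean(slopes))   # should be close to b = 0.5
print("var of b_hat: ", np.var(slopes))    # should be close to the theory value
print("theory:       ", sigma**2 / ssx)
```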
Slide 7: Other Properties of Estimators
• Efficiency: makes optimum use of the sample information to obtain estimators with minimum dispersion.
• Consistency: as the sample size increases, the estimator approaches the population parameter.

Slide 8: Outline: Regression
• The Assumptions of Least Squares
• The Pathologies of Least Squares
• Diagnostics for Least Squares

Slide 9: Assumptions
• The expected value of the error is zero: E[e] = 0.
• The error is independent of the explanatory variable: E{e[x − E(x)]} = 0.
• The errors are independent of one another: E[e(i)e(j)] = 0 for i ≠ j.
• The variance is homoskedastic: E[e(i)]² = E[e(j)]².
• The error is normal with mean zero and variance σ².

Slide 10: 18.4 Error Variable: Required Conditions
• The error e is a critical part of the regression model.
• Four requirements involving the distribution of e must be satisfied:
– The probability distribution of e is normal.
– The mean of e is zero: E(e) = 0.
– The standard deviation of e is a constant, σ, for all values of x.
– The errors associated with different values of y are all independent.

Slide 11: The Normality of e
• From the first three assumptions we have: y is normally distributed with mean E(y) = β0 + β1x and a constant standard deviation.
• [Figure: normal curves for y centered at E(y|x1) = β0 + β1x1, E(y|x2) = β0 + β1x2, and E(y|x3) = β0 + β1x3. The standard deviation remains constant, but the mean value changes with x.]

Slide 12: Pathologies
• Cross-section data: the error variance is heteroskedastic. Example: it could vary with firm size. Consequence: all the available information is not used efficiently, and better estimates of the standard errors of the regression parameters are possible.
• Time-series data: the errors are serially correlated, i.e., autocorrelated. Consequence: inefficiency.

Slide 13: Lab 6: Autocorrelation?

Slide 14: Lab Six: Durbin-Watson Statistic

Slide 15: [EViews regression output; no recoverable text.]

Slide 16: Test for autocorrelation in EViews by regressing the residual on its own lag (see the sketch below):
Genr: error = resid
Genr: errorlag1 = resid(-1)
error(t) = a + b*error(t−1) + e(t)
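A sketch of the same diagnostic outside EViews, in Python on simulated data (the lab's actual data set is not reproduced here): it fits OLS, computes the Durbin-Watson statistic, and runs the lagged-residual regression from slide 16.

```python
import numpy as np

# Sketch on simulated data (assumed setup; the lab works in EViews):
# fit OLS, compute Durbin-Watson, and regress the residual on its lag,
# mirroring error(t) = a + b*error(t-1) + e(t) from slide 16.
rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0.0, 10.0, size=n)
e = np.zeros(n)
for t in range(1, n):                    # AR(1) errors: positive serial correlation
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = 2.0 + 0.5 * x + e

b_hat = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
a_hat = y.mean() - b_hat * x.mean()
resid = y - (a_hat + b_hat * x)          # the "Genr: error = resid" step

# Durbin-Watson: near 2 means no autocorrelation; well below 2 suggests
# positive serial correlation.
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Lagged-residual regression: error(t) on error(t-1)
r, rlag = resid[1:], resid[:-1]
b_lag = np.sum((r - r.mean()) * (rlag - rlag.mean())) / np.sum((rlag - rlag.mean()) ** 2)

print("Durbin-Watson:", dw)                # expect well below 2 here
print("slope on lagged residual:", b_lag)  # expect near 0.6
```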
Slide 17: Pathologies (Cont.)
• The explanatory variable is not independent of the error. Consequence: inconsistency; i.e., larger sample sizes do not lead to lower standard errors for the parameters, and the parameter estimates (slope, etc.) are biased.
• The error is not distributed normally. Example: there may be fat tails. Consequence: using the normal may understate the true 95% confidence intervals.

Slide 18: Pathologies (Cont.)
• Multicollinearity: the independent variables may be highly correlated. As a consequence, they do not truly represent separate causal factors, but instead a common causal factor.

Slide 19: EViews steps for examining the explanatory variables as a group:
• View/Open selected/One window/One group; in the Group window: View/Correlations.
• View/Open selected/One window/One group; in the Group window: View/Multiple Graphs/Scatter/Matrix of all pairs.

Slide 20: [EViews output; no recoverable text.]

Slide 21: Price = a + b*bedrooms + c*house_size01 + d*lot_size01 + e

Slides 22–23: [EViews output; no recoverable text.]

Slide 24: Price = a*dummy2 + b*dummy34 + c*dummy5 + d*house_size01 + e

Slide 25: 18.9 Regression Diagnostics - I
• The three conditions required for the validity of the regression analysis are:
– the error variable is normally distributed;
– the error variance is constant for all values of x;
– the errors are independent of each other.
• How can we diagnose violations of these conditions?

Slide 26: Residual Analysis
• Examining the residuals (or standardized residuals) helps detect violations of the required conditions.
• Example 18.2 (continued): nonnormality.
– Use Excel to obtain the standardized residual histogram.
– Examine the histogram and look for a bell-shaped diagram with a mean close to zero.
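An equivalent sketch of this check in Python rather than the slide's Excel recipe (the data are simulated for illustration, not Example 18.2's data):

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch of the residual-normality check on simulated data (assumed setup).
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 10.0, size=200)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)

b = np.sum((y - y.mean()) * (x - x.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
resid = y - (a + b * x)
std_resid = resid / resid.std(ddof=2)    # standardize (n - 2 degrees of freedom)

plt.hist(std_resid, bins=20)             # look for a bell shape centered near zero
plt.title("Standardized residual histogram")
plt.show()
```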
Slide 27: Diagnostics (Cont.)
• Multicollinearity may be suspected if the t-statistics for the coefficients of the explanatory variables are not significant but the coefficient of determination is high. The correlations between the explanatory variables can then be examined.
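A compact Python sketch of this correlation check (the variable names echo the house-price regressions on slides 21 and 24 but are assumptions here, with simulated data):

```python
import numpy as np

# Sketch with simulated, deliberately collinear regressors (assumed data).
rng = np.random.default_rng(2)
n = 200
house_size01 = rng.uniform(50.0, 300.0, size=n)
bedrooms = np.round(house_size01 / 60.0) + rng.integers(0, 2, size=n)  # tied to size
lot_size01 = 2.0 * house_size01 + rng.normal(0.0, 10.0, size=n)        # nearly collinear

X = np.column_stack([bedrooms, house_size01, lot_size01])
# High pairwise correlations among the explanatory variables are the warning
# sign that their t-statistics will be weak even when R-squared is high.
print(np.corrcoef(X, rowvar=False).round(2))
```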