
Unit 4 Test Study Guide

Bivariate Regression

Hypothesis: the independent variable causes change in the dependent variable.

$Y = \alpha + \beta X$

Y = dependent variable
X = independent variable
β = coefficient (slope)
α = constant (Y-intercept)

A one-unit change in X results in a β change in the expected value of Y. If β is large (a steep line), the independent variable has a large effect on the dependent variable; if β is small, it has less of an effect.

Ordinary Least Squares (OLS) Regression

OLS is a type of multiple regression model. Only use it if (1) the dependent variable is continuous and unbounded, or (2) the dependent variable is normally distributed.

$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k$

T-statistic: the statistical significance. N = sample size.

Test statistic: $t = \frac{\beta - 0}{s.e.(\beta)}$, which essentially means t = coefficient / standard error of the coefficient.

As the t-statistic goes up (in absolute value), the p-value goes down, and there is less support for the null hypothesis.

Degrees of freedom (D.F.) for multiple regression models: N − (# of independent variables) − 1
Degrees of freedom for a bivariate regression: N − 2

P-value (P): ranges between 0 and 1. Informally it is treated as the probability that there is no relationship; strictly, it is the probability of getting a t-statistic at least this extreme if the null hypothesis were true. Lower p-values mean more confidence; we want p less than or equal to .05. Our research hypothesis is supported with low p-values; with high p-values we fail to reject the null hypothesis.

REMEMBER:
p = .05 goes with a critical value of 1.96
p = .10 goes with a critical value of 1.64
T-values have to pass the critical value at the p-values above in order to reject the null.

Null hypothesis: β = 0, so t = 0. Remember, the goal is to reject the null hypothesis.

Assumptions of OLS Model (and the descriptions you can give about the results of the OLS model)

1. Linearity: a straight line adequately represents the relationship in the population. Fitting a linear model to a nonlinear relationship results in biased estimates.
2. Independent observations: the values of the dependent variable are independent of each other. Without independence the estimates are unbiased, but the standard errors are typically biased downward, which means we are more likely to mistakenly reject the null hypothesis. Example: polling individuals separately instead of in a group.
3. Normality: the dependent variable is normally distributed, or at least plausibly normally distributed. This means the errors are normally distributed, which also implies that the dependent variable is normally distributed.

Unusual Cases and the Regression Line (things you should take notice of)

Linearity: OLS assumes the relationship between the independent variable and the dependent variable is linear. Ask yourself: is the relationship between the independent and dependent variable a straight line?

Outliers: when a case has an unusual Y value given its X value. Ask yourself: does a case have an unusual value of its dependent variable given the value of its independent variable?

Leverage: when a case has an unusual X value. Leverage is not always bad. Ask yourself: does the case have an unusual value for its independent variable, i.e., is it far from the mean of the independent variable?

Influence: a case that is both an outlier and has leverage is said to influence the regression line; it affects both the constant and the slope. Ask yourself: does an outlier case have high leverage?

Prediction from Models

We can use the model to predict the dependent variable for every case in our data set. Those predictions are not going to be perfect.

Residuals: the difference between the actual value of the dependent variable and the predicted value of the dependent variable.

$u_i = Y_i - \hat{Y}_i$

Goodness of Fit

Goodness of fit tells us how well the model predicts the dependent variable.

1. Root Mean Square Error (Root MSE), the standard error of the regression model: THE AVERAGE ERROR FROM OUR MODEL. It measures the average accuracy of the model in the metric of the dependent variable, i.e., the typical deviation from the regression line. It is essentially the average size of the model's residuals. The smaller the residuals, the better the goodness of fit.

$\text{Root MSE} = \sqrt{\frac{\sum u_i^2}{N - k}}$

where k = (# of independent variables) + 1 and $u_i$ are the residuals. The numerator can also be written as $\sum (Y_i - \hat{Y}_i)^2$.

2. $R^2$: the proportion of the variance in the dependent variable that the model explains, so it ranges from 0 to 1. The closer our $R^2$ is to 1, the more of the variation our model explains and the better our model is at predicting the dependent variable.

$R^2 = \frac{\text{Regression sum of squares}}{\text{Total sum of squares}}$ (deviations from the mean predicted by our model, over the total deviations from the mean)

Regression sum of squares = Total sum of squares − Residual sum of squares (total deviations from the mean minus the deviations that our model does not explain)

To simplify: $R^2 = \frac{\text{Total sum of squares} - \text{Residual sum of squares}}{\text{Total sum of squares}}$

Total sum of squares: $\sum (Y_i - \bar{Y})^2$. We calculate it like we did before: subtract the mean of the dependent variable from each observed value of the dependent variable, square the result, and add them all up.

The size of $R^2$ is most important when we are trying to build the most predictive model. If we are simply hypothesis testing, the value of $R^2$ is less important.

Control Variables

You need to include control variables when random assignment is not present in an experiment. Without random assignment, we need to statistically control for potentially confounding variables. Random assignment is randomly putting cases into the control or experimental group.

Multiple Regression Model

Use it when we have multiple hypotheses about which independent variables are causing our dependent variable. Regression models only show correlation; we must still infer causation, and to infer causation we must rule out alternative explanations.

Remember: spuriousness is when another factor may be the actual cause of the relationship and you may be overlooking it.

X ← Z → Y

We go from this: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k$
to this: $Y = \alpha + \beta_1 X + \beta_2 Z$

Y = dependent variable
X = independent variable
Z = controlling for spuriousness
X and Z are both independent variables.

Including Z in the model allows us to examine the effect of X holding Z constant, so we know how X influences Y without worrying about spuriousness.

β1 = the effect of X on Y, controlling for Z
β2 = the effect of Z on Y, controlling for X

With more independent variables, each β is the effect of that particular independent variable holding all other independent variables constant.

EXAMPLE: What caused someone to vote for President Obama? Income, partisanship, ideology.

$\text{Vote} = \alpha + \beta_1 \text{Income} + \beta_2 \text{Partisanship} + \beta_3 \text{Ideology}$

Now imagine you want to
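
The bivariate quantities in this guide (slope, constant, residuals, Root MSE, R2, and the t-statistic) can be worked through by hand in a few lines of Python. This is only a rough sketch and not part of the original guide: the function name `fit_bivariate` and the five data points are made up for illustration, and the standard error of the slope is computed with the usual textbook formula (Root MSE divided by the square root of the summed squared deviations of X), which the guide itself does not spell out.

```python
import math


def fit_bivariate(x, y):
    """Fit Y = constant + slope * X by OLS (hypothetical helper for illustration)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squared deviations and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    constant = mean_y - slope * mean_x

    # Residuals: actual value minus predicted value for every case.
    predicted = [constant + slope * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]

    residual_ss = sum(u ** 2 for u in residuals)
    total_ss = sum((yi - mean_y) ** 2 for yi in y)

    k = 2  # (# of independent variables) + 1, for a bivariate model
    root_mse = math.sqrt(residual_ss / (n - k))
    r_squared = (total_ss - residual_ss) / total_ss

    # Assumed textbook formula for the standard error of the slope.
    se_slope = root_mse / math.sqrt(sxx)
    t_stat = (slope - 0) / se_slope

    return {
        "slope": slope,
        "constant": constant,
        "root_mse": root_mse,
        "r_squared": r_squared,
        "t_stat": t_stat,
        "degrees_of_freedom": n - k,
    }


if __name__ == "__main__":
    # Made-up data, only to exercise the formulas.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    for name, value in fit_bivariate(x, y).items():
        print(name, round(value, 4))
```

Here k is 2 because a bivariate model has one independent variable plus the constant, which matches the N − 2 degrees of freedom for a bivariate regression.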

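To see what controlling for Z does, the sketch below compares the slope on X from a bivariate model with the slope on X once Z is included. Everything here is an assumption made for illustration: the helper names are hypothetical, and the data are constructed so that Y depends only on Z (Y = 1 + 2Z) while X merely moves together with Z. The two-variable slopes come from solving the standard OLS normal equations on mean-centered data.

```python
def deviations(values):
    """Return each value minus the mean of the list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross(a, b):
    """Sum of products of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def slope_bivariate(x, y):
    """Slope of Y on X alone (no control variable)."""
    dx, dy = deviations(x), deviations(y)
    return cross(dx, dy) / cross(dx, dx)


def slopes_controlling(x, z, y):
    """Slopes on X and Z in Y = constant + b_x * X + b_z * Z."""
    dx, dz, dy = deviations(x), deviations(z), deviations(y)
    sxx, szz, sxz = cross(dx, dx), cross(dz, dz), cross(dx, dz)
    sxy, szy = cross(dx, dy), cross(dz, dy)
    denom = sxx * szz - sxz ** 2  # zero only if X and Z are perfectly collinear
    b_x = (sxy * szz - szy * sxz) / denom
    b_z = (szy * sxx - sxy * sxz) / denom
    return b_x, b_z


if __name__ == "__main__":
    # Made-up data: Z drives Y; X is correlated with Z but has no effect on Y.
    z = [1, 2, 3, 4, 5, 6]
    x = [2, 1, 4, 3, 6, 5]
    y = [1 + 2 * zi for zi in z]

    print("slope on X, no control:", slope_bivariate(x, y))
    b_x, b_z = slopes_controlling(x, z, y)
    print("slope on X, controlling for Z:", b_x)
    print("slope on Z, controlling for X:", b_z)
```

In this constructed case the bivariate slope on X is clearly positive, yet it drops to zero once Z is held constant, which is the pattern a spurious relationship produces.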