
Unit 4 Test Study Guide

Bivariate Regression: Hypothesis: the independent variable causes change in the dependent variable.

$Y = \alpha + \beta X$

- $Y$ = Dependent Variable
- $X$ = Independent Variable
- $\beta$ = Coefficient = Slope
- $\alpha$ = Constant = Y-Intercept

**A one-unit change in $X$ results in a $\beta$ change in (the expected value of) $Y$.**
- If $\beta$ is large (a steep line), the independent variable has a large effect on the dependent variable; if $\beta$ is small, it has less of an effect on the dependent variable.

Ordinary Least Squares Regression (OLS): a type of Multiple Regression Model; only use it if: 1. our dependent variable is continuous and unbounded, or 2. our dependent variable is normally distributed.

$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k$

T-statistic: the statistical significance
- T-test/statistic: $t = \dfrac{\hat{\beta} - 0}{s_{\hat{\beta}}}$, which essentially means $t = \dfrac{\text{Coefficient } (\beta)}{\text{Standard Error of Coefficient } (\beta)}$
- As the t-statistic goes up (in absolute value), the p-value goes down, and the evidence against the null hypothesis gets stronger.
- $N$: sample size
- Degrees of Freedom (D.F.) for Multiple Regression Models = $N - (\text{\# of independent variables}) - 1$
- Degrees of Freedom for a Bivariate Regression = $N - 2$
- P-value: the probability of seeing a relationship at least as strong as the one in our data if there were really no relationship (that is, if the null hypothesis were true). It ranges between 0 and 1; lower p-values mean there is more confidence (we want less than or equal to .05). Our research hypothesis is supported with low p-values; with high p-values we cannot reject the null hypothesis.

REMEMBER the critical t-values (two-tailed, large samples): p = .05 corresponds to 1.96, and p = .10 corresponds to 1.64.
***T-values have to pass the critical values at the p-values above in order to reject the null.***
Null Hypothesis: $\beta = 0$; $t = 0$.
**Remember the goal is to reject the null hypothesis.**

Assumptions of OLS Model (the descriptions you can give about the results of the OLS Model):
1. Linearity: A straight line adequately represents the relationship in the population; fitting a linear model to a nonlinear relationship results in biased estimates.
2. Independent Observations: The values of the dependent variable are independent of each other; without independence the estimates are unbiased, but the standard errors are typically biased downward (this means we are more likely to mistakenly reject the null hypothesis). Example: polling individuals separately instead of in a group.
3. Dependent variable is normally distributed (or at least plausibly normally distributed): Normality means the errors are normally distributed; this also implies that the dependent variable is normally distributed.

Unusual Cases and the Regression Line (things you should take notice of):
- Linearity: OLS assumes the relationship between the independent variable and the dependent variable is linear. Ask yourself: is the relationship between the independent and dependent variable a straight line?
- Outliers: when a case has an unusual Y value given its X value. Ask yourself: does a case have an unusual value of its dependent variable given the value of its independent variable?
- Leverage: when a case has an unusual X value (leverage is not always bad). Ask yourself: does the case have an unusual value for its independent variable, i.e. is it far from the mean of the independent variable?
- Influence: a case that is both an outlier and has leverage is said to "influence" the regression line (it affects both the constant and the slope). Ask yourself: does an outlier case have high leverage?

Prediction from Models
- We can use the model to predict the dependent variable for every case in our data set (those predictions are not going to be perfect).
- Residuals: the difference between the actual value of the dependent variable and the predicted value of the dependent variable: $\hat{u}_i = Y_i - \hat{Y}_i$

"Goodness of fit" measures: they tell us how well the model predicts the dependent variable.
1. Root Mean Square Error (the standard error of the regression model): THE AVERAGE ERROR FROM OUR MODEL; it provides a measure of the average accuracy of the model in the metric of the dependent variable. The root mean squared error is a measure of the typical deviations from the regression line. It is essentially the average size of the model's residuals. **The smaller the residuals, the better the "goodness of fit".**
   $\text{Root MSE} = \sqrt{\dfrac{\sum \hat{u}_i^2}{N - k}}$, where $k$ = # of independent variables + 1 (so $N - k$ equals the degrees of freedom above), and $\hat{u}_i^2$ are the squared residuals, with $\hat{u}_i = Y_i - \hat{Y}_i$.
2. $R^2$: the measure that tells us the proportion of the variance in the dependent variable that our model explains; therefore it ranges from 0 to 1. The closer our $R^2$ is to 1, the more of the variation our model explains, and the better our model is at predicting the dependent variable.
   $R^2 = \dfrac{\text{Regression sum of squares}}{\text{Total sum of squares}}$ (deviations from the mean predicted by our model over the total deviations from the mean)
   Regression sum of squares = total sum of squares - residual sum of squares (total deviations from the mean minus deviations that our model does not explain)
   To simplify: $R^2 = \dfrac{\text{Total sum of squares} - \text{Residual sum of squares}}{\text{Total sum of squares}}$
   Formula: $R^2 = \dfrac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$
- We calculate the total sum of squares like we did before: subtract the mean of the dependent variable from each observed value of the dependent variable, square each difference, and add them all up.
- The size of $R^2$ is most important when we are trying to build the model that is most predictive; if we are simply hypothesis testing, the value of $R^2$ is less important.

When you need to include control variables: when random assignment is not present in an experiment. Without random assignment, we need to statistically control for potentially confounding variables. Random assignment is randomly putting cases into the control or experimental group.
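The residual, Root MSE, $R^2$, and t-statistic formulas from this guide can be checked by hand on a small data set. The sketch below is only an illustration under stated assumptions: the data are made up, the function names `fit_bivariate` and `goodness_of_fit` are hypothetical, and the standard error of the slope uses the usual textbook bivariate formula (Root MSE divided by the square root of $\sum (X_i - \bar{X})^2$), which the guide itself does not spell out.

```python
import math


def fit_bivariate(x, y):
    """Return (alpha, beta) for Y = alpha + beta * X, estimated by OLS."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = s_xy / s_xx                 # slope
    alpha = mean_y - beta * mean_x     # constant (Y-intercept)
    return alpha, beta


def goodness_of_fit(x, y):
    """Compute residuals, Root MSE, R-squared and the slope's t-statistic."""
    n = len(x)
    alpha, beta = fit_bivariate(x, y)
    predicted = [alpha + beta * xi for xi in x]
    residuals = [yi - pi for yi, pi in zip(y, predicted)]   # u_i = Y_i - Yhat_i

    mean_y = sum(y) / n
    total_ss = sum((yi - mean_y) ** 2 for yi in y)
    residual_ss = sum(u ** 2 for u in residuals)

    df = n - 2                                   # bivariate: N - 2
    root_mse = math.sqrt(residual_ss / df)
    r_squared = (total_ss - residual_ss) / total_ss

    # Assumption: textbook standard error of the slope in a bivariate model.
    mean_x = sum(x) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    se_beta = root_mse / math.sqrt(s_xx)
    t_stat = (beta - 0) / se_beta                # test against the null beta = 0

    return {
        "alpha": alpha, "beta": beta, "df": df,
        "root_mse": root_mse, "r_squared": r_squared, "t_stat": t_stat,
    }


# Made-up example data (not from the study guide).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
result = goodness_of_fit(x, y)
for name, value in result.items():
    print(name, round(value, 4))
```

With only 3 degrees of freedom, the large-sample critical values 1.96 and 1.64 are too lenient; a small sample like this one would need the critical value from a t-table for its own degrees of freedom.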
Multiple Regression Model: used when we have multiple hypotheses about which independent variables are causing our dependent variable (regression models only show correlation, so we must still infer causation, and to infer causation we must rule out alternative explanations). Remember spuriousness is when another factor may be the actual cause of the relationship and you may be overlooking it (X <- Z -> Y).

We go from this: $Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k$ to this: $Y = \alpha + \beta_1 X_1 + \beta_2 Z$

- $Y$ = Dependent Variable
- $X$ = Independent Variable
- $Z$ = Controlling for Spuriousness
- $X$ and $Z$ are both independent variables
- Including $Z$ in the model allows us to examine the effect of $X$ holding $Z$ constant (therefore, we'll know how $X$ influences $Y$ without worrying about spuriousness).

$Y = \alpha + \beta_1 X_1 + \beta_2 Z$
- $\beta_1$ = The
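
As a rough illustration of controlling for spuriousness (X <- Z -> Y), the sketch below compares the slope on X from a bivariate model with the slope on X once Z is included. Everything here is an assumption for illustration: the data are invented so that Z drives both X and Y while X has no effect of its own, and the helper names `centered_sum`, `bivariate_slope`, and `slopes_with_control` are hypothetical. The two-variable slopes come from the standard closed-form OLS solution for a model with two independent variables and a constant.

```python
def centered_sum(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))


def bivariate_slope(x, y):
    """Slope from Y = alpha + beta * X (no control variable)."""
    return centered_sum(x, y) / centered_sum(x, x)


def slopes_with_control(x, z, y):
    """Return (beta1, beta2) for Y = alpha + beta1 * X + beta2 * Z by OLS."""
    s_xx = centered_sum(x, x)
    s_zz = centered_sum(z, z)
    s_xz = centered_sum(x, z)
    s_xy = centered_sum(x, y)
    s_zy = centered_sum(z, y)
    det = s_xx * s_zz - s_xz ** 2      # zero if X and Z are perfectly collinear
    beta1 = (s_zz * s_xy - s_xz * s_zy) / det
    beta2 = (s_xx * s_zy - s_xz * s_xy) / det
    return beta1, beta2


# Invented data: Z causes both X and Y; X has no direct effect on Y.
z = [1, 2, 3, 4, 5, 6]
x = [2, 1, 4, 3, 6, 5]        # Z plus or minus 1
y = [3, 5, 7, 9, 11, 13]      # exactly 1 + 2 * Z

naive_slope = bivariate_slope(x, y)
beta1, beta2 = slopes_with_control(x, z, y)

print("slope on X without Z:", round(naive_slope, 3))
print("slope on X holding Z constant:", round(beta1, 3))
print("slope on Z:", round(beta2, 3))
```

In this constructed case the bivariate slope on X is clearly positive, but it drops to zero once Z is in the model, which is the pattern a spurious relationship would produce.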
