Exam 4: Bivariate Regression

Remember the sample mean

Zero-Sum Property: $\sum_i (x_i - \bar{x}) = 0$

Least Squares Property: $\sum_i (x_i - \bar{x})^2 < \sum_i (x_i - c)^2$, where c is any constant other than the mean.

Where do we go from here?

Theories, Hypotheses, Models

How do we think the world works? Typically additive:

$Y = \beta_1 X_1 + \beta_2 X_2$

- Some independent variables $X_1$ and $X_2$ cause changes in some dependent variable Y.
- Each $\beta$ tells us how big an effect that X has on Y.

Bivariate Regression

$Y = \alpha + \beta X$

- Y: dependent variable
- X: independent variable
- $\alpha$: constant
- $\beta$: coefficient. A one-unit change in X results in a $\beta$ change in the expected value of Y.

Our hypothesis: an independent variable X causes change in the dependent variable Y. Regression tells you the slope of a relationship.

A blast from the past

Think back to high school algebra. Bivariate regression is just like lines and Cartesian coordinates.

$Y = \alpha + \beta X$

- $\alpha$ is also known as the Y-intercept.
- $\beta$ is also known as the slope.

Bivariate regression example (with fake data)

SAT scores are supposed to estimate a student's first college year GPA.

$\text{GPA} = \alpha + \beta \cdot \text{SAT}$

- GPA = Y = dependent variable
- SAT = X = independent variable

Now look at the slides. Slide 9 shows a scatter plot of all the points. If you moved from an SAT score of 1100 to 1200, you would multiply the slope by 100 (since 1200 - 1100 = 100), which gives the predicted increase in GPA for that change. Here $\hat{\alpha} = 0.57$ and $\hat{\beta} = 0.003$, so the predicted increase is 0.003 × 100 = 0.3.

Constants and Slopes

- $\hat{\alpha}$ is our ESTIMATE of $\alpha$.
- $\hat{\beta}$ is our estimate of $\beta$.
- $\hat{Y}$ is our predicted value of Y.

$\hat{Y} = \hat{\alpha} + \hat{\beta} X = 0.57 + 0.003 X$

- $\hat{\alpha} = 0.57$: if someone had an SAT score of zero, this would be their GPA (remember, the lowest SAT score you can get is 200).
- $\hat{\beta} = 0.003$: for every point better on the SAT, this is the expected increase in GPA.

O.L.S.

This method of minimizing the sum of squared errors is called Ordinary Least Squares. We may use O.L.S. if:

1. Our dependent variable is CONTINUOUS and UNBOUNDED (no lowest or highest level).
2. Our dependent variable is normally distributed.

BOTH OF THESE ARE ASSUMPTIONS.

Estimates in Bivariate Regression

$\hat{\beta} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2}$

$\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$

Things to note:

- The formulas both rely on the means of the variables.
- These are ESTIMATES of the true, unknown $\alpha$ and $\beta$ in the population.
- Sensitive to
outliers.

Uncertainty, $\hat{\alpha}$ and $\hat{\beta}$

As usual: since these are estimated based on a sample, we are uncertain about the actual values of $\alpha$ and $\beta$. That uncertainty is reduced as our sample size increases. We call our measure of uncertainty the STANDARD ERROR. There is a standard error for $\hat{\alpha}$ and a standard error for $\hat{\beta}$. The smaller the standard errors, the more confident we are that our estimates are close to the true values.

Uncertainty around $\hat{\alpha}$

The uncertainty around $\hat{\alpha}$ is often important. However, for our purposes, we are LESS concerned about $\hat{\alpha}$. WHY?

1. $\hat{\alpha}$ is usually not directly related to our hypothesis test, and it sometimes has a meaningless value.
2. $\hat{\beta}$ is used in PREDICTING values of Y.
3. BUT FIRST: determine if the INDEPENDENT variable X affects the DEPENDENT variable Y. This is NOT the same as reason 2: if $\hat{\beta}$ is predicting the values of Y, that already means there is a relationship. And that is related to the uncertainty around $\hat{\beta}$.

What are we doing?

Remember our example: $\hat{Y} = 0.57 + 0.003 X$

Estimates in Bivariate Regression

Variable    Coefficient    Standard error
Constant    0.5668         0.7786
SAT         0.0031         0.0007

We can now determine whether the relationship between SAT and GPA is statistically significant.

Blasts from the recent past

This hypothesis test is like our previous hypothesis tests. Compare our observed $\hat{\beta}$ to the value of $\beta$ if there were no relationship. This will be a t-test in which we compare two values and divide the difference by the standard error. In real life it will be even easier than that, because of computers.

Statistical Significance

Once again we are looking to reject the null.

- Non-directional hypothesis: the null hypothesis is $\beta = 0$.
- Directional hypothesis: the null hypothesis is $\beta = 0$ or that the relationship is in the other direction.

If $\beta = 0$: no matter how high or low the SAT score is, the GPA will remain the same for all data. Here $\hat{\beta} = 0.003$, so each one-point increase in SAT score raises the predicted GPA by 0.003.

The Null Hypothesis

The null hypothesis is $\beta = 0$. But why? After all, we estimated $\hat{\beta} = 0.003$ GPA points per SAT point gained.

Testing for Statistical Significance

T-test:

$t = \frac{\hat{\beta} - 0}{S_{\hat{\beta}}}$

That is, t = coefficient / standard error.
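The estimate formulas and the t-test above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the data values are hypothetical (they are not the slide data), the function name `ols_bivariate` is made up for this example, and the standard error of the slope uses the usual textbook formula (residual variance divided by the sum of squared deviations of X), which the notes themselves do not spell out.

```python
import math

def ols_bivariate(x, y):
    """Fit y = alpha + beta * x by ordinary least squares.

    Returns (alpha_hat, beta_hat, se_beta, t_stat, dof).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sum of squared deviations of X, and sum of cross-products of X and Y.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Slope and intercept estimates; both rely on the means.
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar

    # Degrees of freedom: n minus the two estimated parameters.
    dof = n - 2

    # Assumed textbook formula for the standard error of the slope.
    residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e ** 2 for e in residuals) / dof
    se_beta = math.sqrt(sigma2 / sxx)

    # t statistic against the null hypothesis beta = 0.
    t_stat = (beta_hat - 0) / se_beta
    return alpha_hat, beta_hat, se_beta, t_stat, dof

# Hypothetical SAT and GPA values, for illustration only.
sat = [900, 1000, 1100, 1200, 1300, 1400]
gpa = [2.6, 3.1, 3.0, 3.5, 3.6, 4.0]

alpha_hat, beta_hat, se_beta, t_stat, dof = ols_bivariate(sat, gpa)
print(f"alpha_hat={alpha_hat:.4f} beta_hat={beta_hat:.5f}")
print(f"se_beta={se_beta:.5f} t={t_stat:.2f} dof={dof}")
```

The resulting t statistic would then be compared with the critical value for the reported degrees of freedom.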
Degrees of freedom

D.F. = n - (# of parameters)

Bivariate regression has two parameters:

1. constant
2. coefficient

For bivariate regression, D.F. = n - 2.

Statistical Significance

Variable    Coefficient    Standard error
Constant    0.5668         0.7786
SAT         0.0031         0.0007

t = 0.00306 / 0.00067, reported as 4.552 (the displayed inputs give about 4.57, so the reported value was presumably computed from unrounded estimates).

If you go back to slide 10, you see there are 10 data points. So 10 - 2 = 8: 8 degrees of freedom.

Critical values, D.O.F. = 8, two-tailed test:

.1 level     1.860
.05 level    2.306
.01 level    3.355

We can reject the null hypothesis at the .1 level, the .05 level, and the .01 level in a two-tailed test. So we can say there IS a statistically significant relationship between SAT score and GPA.

Another example

Our theory: when the president is more popular, more people say that they are members of his party.

Data: quarterly Gallup survey data on average presidential approval and percent of the respondents who say they are members of the president's party. Years: 1960-1996.

$Y = \alpha + \beta X$

$\text{party} = \alpha + \beta \cdot \text{approval}$

O.L.S. Model Results

Variable    Coefficient    Standard error
Constant    36.58          4.688
Approval    0.255          0.085

Does presidential approval have a statistically significant effect on partisanship?

t = 0.255 / 0.085, reported as 2.99 (the displayed inputs give 3.0, so the reported value was presumably computed from unrounded estimates).

D.O.F.: number of quarters = (1996 - 1960) / .25 = 36 / .25 = 144, so D.O.F. = 144 - 2 = 142.

Critical values, D.O.F. = 142:

.1 level     1.656
.05 level    1.98

Since 2.99 exceeds 1.98, the effect is statistically significant at the .05 level.

Predicted values

We can use predicted values to help understand the size of the effects.

$\hat{Y} = \hat{\alpha} + \hat{\beta} X$

Pick two values of interest.

George H. W. Bush
- Highest approval, 80: $\hat{Y} = 36.58 + 0.255 \times 80 = 56.98$
- Lowest approval, 35: $\hat{Y} = 36.58 + 0.255 \times 35 = 45.5$

So one of the largest popularity swings changes partisanship by about 11.5 percentage points, according to our model.

More predicted values

Ronald Reagan
- Highest approval, 64: $\hat{Y} = 36.58 + 0.255 \times 64 = 52.90$
- Lowest approval, 38: $\hat{Y} = 36.58 + 0.255 \times 38 = 46.3$

Remember there is UNCERTAINTY around these predictions.

Predicted Values

A predicted value is the expected value of the dependent variable given the specified value of the independent variable. REMEMBER there is error.

Bivariate regression

What did this model do?

1. It drew a straight line through the points.
2. None of the actual observed points fall on that line.

So the model does NOT do a good job of predicting values of the dependent
variable. But it can tell us if the independent variable is affecting the dependent variable.

Assumptions of O.L.S.

Some things to note: hats and bars on top of variables are very important.

$R^2 = r^2$

- $Y_i$: individual observation, the actual value of one observation.
- $\hat{Y}_i$: prediction of one observation.

You're …
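Returning to the presidential approval example, the predicted-value arithmetic can be reproduced with a short sketch. The coefficients 36.58 and 0.255 and the approval values come from the notes; the function name `predict_partisanship` and the dictionary layout are hypothetical, chosen only for this illustration. The sketch gives point predictions only and ignores the uncertainty around them.

```python
# Coefficients taken from the O.L.S. model results in the notes.
ALPHA_HAT = 36.58  # constant
BETA_HAT = 0.255   # coefficient on presidential approval

def predict_partisanship(approval):
    """Predicted percent identifying with the president's party."""
    return ALPHA_HAT + BETA_HAT * approval

# (highest approval, lowest approval) for each president, from the notes.
approval_ranges = {
    "George H. W. Bush": (80, 35),
    "Ronald Reagan": (64, 38),
}

for name, (high, low) in approval_ranges.items():
    y_high = predict_partisanship(high)
    y_low = predict_partisanship(low)
    # The swing equals BETA_HAT times the change in approval.
    swing = y_high - y_low
    print(f"{name}: high={y_high:.2f} low={y_low:.2f} swing={swing:.2f}")
```

Because the model is a straight line, the swing depends only on the slope and the size of the approval change, not on the constant.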