UNM STAT 145 - Chapter 3 CORRELATION AND REGRESSION


TOPICS
• Linear Regression Defined
• Regression Equation
• The Slope or b
• The Y-Intercept or a
• What Value of the Y-Variable Should be Predicted When r = 0?
• The Regression Line
• The Point of Averages
• Residuals
• Extrapolation, Restricted Range, and Lurking Variables

Tutorials
• Obtaining a linear regression analysis in Excel 2007

CORRELATION AND REGRESSION

➊ The stronger the correlation, the more accurately one variable can be predicted from another variable
➋ By using the linear regression equation, we can predict scores for one variable (the Y-variable) from scores on a second variable (the X-variable)
• The linear regression equation assumes the statistical relationship between two variables follows a straight line known as the regression line

LINEAR REGRESSION

➊ The regression equation consists of four parts:
• The predicted value for the Y-variable or y'
• The slope of the regression line or b
• The known value of the X-variable or x
• The value for the y-intercept or a

$y' = bx + a$

➊ The slope of the regression line or b:
• Has the same sign (+ or -) as the correlation coefficient r
• Is a function of the strength of the correlation and the ratio of standard deviations for X and Y variables

$b = r \cdot \frac{SD_y}{SD_x}$

➊ The value for the y-intercept or a:
• Is the point where the regression line crosses the y-axis
• Is the predicted value of y when the x-variable equals zero
• This value may sometimes be a strange value, but remember it's a predicted value

➊ The y-intercept equals:
• The slope of the regression equation (b) times the overall mean for the x-variable ($\bar{X}$), subtracted from
• The overall mean for the y-variable ($\bar{Y}$)

$a = \bar{Y} - b\bar{X}$

➊ If the correlation is zero, that means the value for the slope is zero and the regression line is flat (i.e., horizontal)
➋ If b = 0, then the y-intercept formula simplifies to: $a = \bar{Y}$
• Which means the regression equation simplifies to: $y' = \bar{Y}$

Why?
➊ If there is no correlation between two variables, the best prediction for either variable is its mean
➋ On average, the mean is closer to all values in a distribution compared to any other score
• In other words, if the mean is used to predict each score in a data set, the average squared error in prediction will be smaller compared to using any other single value

➊ What values make the regression line?
• The values predicted by the regression equation, $y' = bx + a$, create the regression line
• These predicted points all fall on the regression line

➊ The regression line represents a central point inside the points of a scatterplot
• The points in a scatterplot can be thought of as regressing to this central point
➋ The regression line is the best fitting line and is also known as the line of least-squares
• Imagine the different angles at which you could plot a straight line through a scatterplot
• The line that would result in the smallest sum of squared vertical distances from all points would be the regression line

[Figure: Regression Equation] The blue line is the regression line. The points that make this line are the predicted values from the regression equation.

➊ Every linear regression line passes through the point of averages
• The point of averages is located by the intersection of the overall mean for the x-variable and the overall mean of the y-variable
➋ Points predicted closer to the point of averages are, on average, more accurate than points predicted further away from this point

[Figure: Regression Equation] The black dot represents the point of averages, where the overall means for the x-variable (Father's Height, 69 inches) and y-variable (Son's Height, 71.5 inches) intersect.
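As a rough illustration of the slope and y-intercept formulas, the sketch below computes b and a by hand in Python. The father and son heights are made-up numbers chosen for easy arithmetic (they are not the data behind the slides), and the helper names `mean`, `sd`, `correlation`, `fit_line`, and `predict` are hypothetical.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sd(values):
    # Population SD (divide by n). The ratio SD_y / SD_x is the same
    # whether n or n - 1 is used, as long as both use the same divisor.
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

def correlation(xs, ys):
    # Correlation coefficient r: average product of deviations,
    # divided by the product of the two standard deviations.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sd(xs) * sd(ys))

def fit_line(xs, ys):
    # Slope: b = r * (SD_y / SD_x); intercept: a = mean(Y) - b * mean(X)
    r = correlation(xs, ys)
    b = r * sd(ys) / sd(xs)
    a = mean(ys) - b * mean(xs)
    return b, a

def predict(x, b, a):
    # Regression equation: y' = b * x + a
    return b * x + a

# Hypothetical heights in inches (illustration only).
fathers = [65, 67, 69, 71, 73]
sons = [68, 70, 71, 72, 74]

b, a = fit_line(fathers, sons)
print("slope b:", round(b, 3))
print("intercept a:", round(a, 3))

# Plugging the mean of X into the equation returns the mean of Y.
print("prediction at mean father height:", round(predict(mean(fathers), b, a), 3))
print("mean son height:", round(mean(sons), 3))

# When r = 0 the slope is 0 and every prediction is the mean of Y.
b0, a0 = fit_line([1, 2, 3, 4], [2, 1, 1, 2])
print("r = 0 case: b =", b0, "a =", a0)
```

With these made-up numbers the prediction at the mean father height equals the mean son height, which is the point of averages.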
The point of averages is always found on a linear regression line.

➊ The regression line can be plotted using Excel; however, you can also plot this line using two points:
• The point of averages and
• The y-intercept
➋ You can also plot the regression line by plugging values of the x-variable into the regression equation and solving for the predicted value of the y-variable
• Remember: the regression line is made up of all the predicted values of the y-variable or y'

➊ The term residuals refers to the amount of error in prediction
• In other words, the regression equation produces a predicted value for the y-variable
• The difference between the real value of Y and the predicted value of Y is known as error or the residual
• Excel can calculate the residuals for each predicted score; however, if we were to obtain the residuals by hand, the formula used is:
• Formula for Residuals: $y - y'$

[Figure: Regression Equation] The distance between each real point and the regression line is a residual or error in prediction.
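The residual formula can be tried by hand with a short Python sketch. The heights below are made-up illustration values, not the data from the slides, and `least_squares` is a hypothetical helper that fits the line from sums of deviations (which gives the same slope as r times SD_y / SD_x).

```python
def least_squares(xs, ys):
    # Fit y' = b * x + a using deviations from the means.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a

# Hypothetical heights in inches (illustration only).
fathers = [65, 67, 69, 71, 73]
sons = [68, 70, 71, 72, 74]

b, a = least_squares(fathers, sons)

# Predicted value y' for each father, then residual = y - y'.
predicted = [b * x + a for x in fathers]
residuals = [y - y_hat for y, y_hat in zip(sons, predicted)]

for x, y, y_hat, res in zip(fathers, sons, predicted, residuals):
    print(f"x={x}  y={y}  y'={y_hat:.1f}  residual={res:+.1f}")
```

A positive residual means the real value sits above the regression line, and a negative residual means it sits below.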
The sum of the residuals is always equal to zero.

➊ Residuals can help identify outliers
• When a residual is very large, it may indicate an outlier
• Outliers can have the effect of increasing or decreasing the slope of the regression line
• This means that outliers can also increase or decrease the correlation between two variables
• Depending on the size of the outlier, a researcher may want to run the regression analysis with and without the outlier to see how much the score may affect the results

➊ The regression equation attempts to predict the mean of the y-variable at each value of the x-variable. Why?
• Suppose you have three fathers who are each 74 inches tall (or 6'2")
• Each of these fathers has a son who is a different height
• The value of the x-variable entered into the regression equation will be the same for each of these three fathers
• What value for sons' heights should the equation try to predict?

[Figure: Regression Equation] The regression equation will try to predict the average height of the sons (y-variable) at each height of the fathers (x-variable). What height should be predicted for the three sons who each have a father that is 74" tall?

➊ What is meant by extrapolation?
• Predicting values beyond the range of the data used to develop the regression equation
➋ What is meant by limited range?
• When the regression equation is based on a very narrow range of data compared to the true
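The advice to run the analysis with and without an outlier, and the caution about extrapolation, can be sketched in Python as follows. The data are made up for illustration (the last pair is a deliberately planted outlier), and `least_squares` and `outside_observed_range` are hypothetical helper names, not part of Excel or any library.

```python
def least_squares(xs, ys):
    # Fit y' = b * x + a using deviations from the means.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return b, a

def outside_observed_range(x, xs):
    # True when predicting at x would be extrapolation, i.e. x lies
    # beyond the range of x-values used to build the equation.
    return x < min(xs) or x > max(xs)

# Hypothetical heights in inches; the last pair is a planted outlier.
fathers = [65, 67, 69, 71, 73, 75]
sons = [68, 70, 71, 72, 74, 62]

# Fit once with every point, then look for the largest residual.
b_with, a_with = least_squares(fathers, sons)
residuals = [y - (b_with * x + a_with) for x, y in zip(fathers, sons)]
suspect = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

# Refit without the suspected outlier and compare the slopes.
xs_trim = [x for i, x in enumerate(fathers) if i != suspect]
ys_trim = [y for i, y in enumerate(sons) if i != suspect]
b_without, a_without = least_squares(xs_trim, ys_trim)

print("slope with outlier:   ", round(b_with, 3))
print("slope without outlier:", round(b_without, 3))
print("is x = 80 extrapolation?", outside_observed_range(80, fathers))
```

In this made-up data a single point is enough to turn a positive slope negative, which is why comparing both fits is worthwhile before interpreting the result.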

