# SCC GBS 221 - Chapter 12 Simple Linear Regression (11 pages)


## Chapter 12 Simple Linear Regression



Pages:
11
School:
Scottsdale Community College


### Introduction

- Exam Score vs Hours Studied Scenario
- Regression Analysis
  - used to quantify the relation between 2 or more variables so you can predict the value of one variable based on the value of another
  - develop an equation to predict the value of a dependent variable based on the value of one or more independent variables
- Correlation Analysis
  - measures the strength of linear relation between a pair of variables
  - if you plan to predict Y from X, they ought to be related

### Simple vs Multiple Regression

- Simple Regression Analysis
  - use a single independent variable to predict the dependent variable
  - estimated Score = 40.0816 + 1.4966 Hours, $r^2 = .7432$
- Multiple Regression Analysis
  - use multiple independent variables to predict the dependent variable
  - the set of independent variables should be independent of one another, and each should be highly related to the dependent variable
  - estimated Score: intercept 33.914, GPA coefficient 3.472, Absences coefficient 1.698, Hours coefficient 1.395 (the signs of the coefficients are not recoverable from the preview text), $r^2 = .7654$

### Characterizing Relationships

- Direct Relation
  - line of best fit has positive slope
- Inverse Relation
  - line of best fit has negative slope
- Deterministic (Functional) Relation
  - 100% pure relation between the pair of variables
  - there is no scatter with respect to the line of best fit, so the value of Y can be determined exactly (without error) based on the value of X
- Stochastic (Statistical, Random) Relation
  - a less than perfect relation between the pair of variables
  - since variables other than X impact Y, there is scatter with respect to the line of best fit, and there will be error when using x to predict y
- How would you characterize the apparent relation between Exam Score and Hours Studied?

### Simple Linear Regression Model

- Population Linear Regression Equation: $y = \beta_0 + \beta_1 x + \varepsilon$
  - $\varepsilon$ represents the combined effects of other variables and is assumed to have mean of 0 and variance of $\sigma^2$
- Sample Linear Regression Equation: $\hat{y} = b_0 + b_1 x$

### Least Squares Method (Line Of Best Fit)

- The sample regression line won't perfectly fit the sample points; there will be errors in fit. Why?
- error in fit = residual = $y - \hat{y}$
- Provides the best fitting line in the sense that it has the minimum amount of squared deviation between each observed value and the corresponding point on the regression line
- Minimizes the sum of squared residuals in order to
  - prevent + and − errors from cancelling
  - draw added attention to any large errors
    - prefers to make several small errors in order to avoid large errors
- Properties of the Least Squares regression equation
  1. $b_0$ and $b_1$ are unbiased estimators of $\beta_0$ and $\beta_1$
  2. line passes through the point $(\bar{x}, \bar{y})$
  3. the sum of the residuals is zero: $\sum (y - \hat{y}) = 0$
  4. the sum of the squared residuals is minimized: $\sum (y - \hat{y})^2 = \text{minimum}$
- slope: $b_1 = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
- y intercept: $b_0 = \bar{y} - b_1 \bar{x}$
- Exam Score vs Hrs Studied
  - the sample regression equation is
  - compute the predicted values
  - compute the residuals and squared residuals

### Conditional Distribution Of y

- Figure 12.8 on page 511
- Why is y variable at any given value x?
- Distribution of y is assumed Normal with mean $\mu_{y \mid x}$
- The regression equation is the line which connects the mean value of y at each value of x

### Correlation Analysis Concepts

- Measures the strength of linear relation between two variables
- If you intend to use X to predict Y, how strongly related are they?
- The slope of the sample regression equation was 1.4965, so these variables seem to move together
- The mean exam score was 76 and variation among student scores was $s = 11.2504$
  - some of the variation in scores can be explained by taking into account hours studied

### Strength of Relationship

Figure: a series of scatterplot panels, each labeled with its r and $r^2$ values (the signs of r are not recoverable from the preview text).

| r | $r^2$ |
|---|---|
| .98 | .96 |
| .78 | .61 |
| .34 | .12 |
| .12 | .01 |
| .01 | .00 |
| .99 | .98 |
| .64 | .41 |
| .33 | .12 |
| .11 | .01 |

### Correlation Analysis

Figure: scatterplot of exam score against hours studied with the mean line $\bar{y} = 76$, decomposing one observation ($y = 92$, $\hat{y} = 88$):

- total variation in scores: $y - \bar{y} = 92 - 76$
- explainable by hours studied: $\hat{y} - \bar{y} = 88 - 76$
- unexplainable by hours studied: $y - \hat{y} = 92 - 88$

Summed over all observations:

- TOTAL VARIATION IN SCORES: $SST = \sum (y - \bar{y})^2$
- EXPLAINABLE BY HOURS STUDIED: $SSR = \sum (\hat{y} - \bar{y})^2$
- UNEXPLAINABLE BY HOURS STUDIED: $SSE = \sum (y - \hat{y})^2$
- $SST = SSR + SSE$
- Exam Score vs Hours Studied

### Coefficient Of Determination

- Measures the proportion of variation in variable y that is explained by variable x
- Indicates how well the sample regression line fits the sample data
- $\rho^2$ estimated by $r^2$
- $0 \le r^2 \le 1$
- $r^2 = \dfrac{\text{explained variation}}{\text{total variation}} = \dfrac{SSR}{SST} = \dfrac{\sum (\hat{y} - \bar{y})^2}{\sum (y - \bar{y})^2}$
- Exam Score vs Hrs Studied

### Coefficient Of Correlation

- $\rho$ estimated by r
- $-1 \le r \le 1$
- $r = (\text{sign of } b_1)\sqrt{r^2}$

| Value of r (magnitude) | Strength of correlation |
|---|---|
| .9 to 1 | very high |
| .7 to .9 | high |
| .5 to .7 | moderate |
| .3 to .5 | weak |
| 0 to .3 | little if any |

- Interpretation: "There is a [strength] [direct or inverse] correlation between variable X and variable Y"
- Exam Score vs Hrs Studied
- When working with multiple variables, it is common to obtain the correlation between each pair of variables
  - a triangular correlation matrix
- Can investigate whether or not the potential independent variables are truly independent of one another

| | Score | Hours | GPA |
|---|---|---|---|
| Hours | 0.862 | | |
| GPA | 0.489 | 0.566 | |
| Absences | 0.343 | 0.234 | 0.028 |

(Values as they appear in the preview text; any minus signs did not survive extraction.)

### Limitations Of Regression Analysis

- Regression/Correlation cannot prove cause and effect relationships
  - Brightman article
- Don't use the regression model to predict beyond the range of observed X values

### Mean Square Error / Standard Error of Estimate

- Mean Square Error
  - Measures amount of scatter around the regression line
  - Serves as an estimate of $\sigma^2$
  - $MSE = \dfrac{SSE}{n - 2} = \dfrac{\sum (y - \hat{y})^2}{n - 2}$
- Standard Error of Estimate
  - Square root of MSE
  - Serves as an estimate of $\sigma$
  - $s_{est} = \sqrt{\dfrac{SSE}{n - 2}} = \sqrt{\dfrac{\sum (y - \hat{y})^2}{n - 2}}$
- Used for inference regarding the regression line
  - hypothesis tests
  - interval estimates
- Exam Score vs Hrs Studied

### t Test for Significance of the Slope

- $b_1$ estimates $\beta_1$
- $H_0: \beta_1 = 0$ (no relation between the two variables)
- $H_A: \beta_1 \ne 0$ (is a relation between the two variables)
- test statistic: $t = \dfrac{b_1}{s_{b_1}}$, whose sampling distribution follows $t_{n-2}$
- Standard Error of the Slope
  - measures ROSE when using $b_1$ to estimate $\beta_1$
  - $s_{b_1} = \sqrt{\dfrac{MSE}{\sum (x - \bar{x})^2}} = \dfrac{s_{est}}{\sqrt{\sum (x - \bar{x})^2}}$
- Exam Score vs Hrs Studied

### Interval Estimation …
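
As a recap of the computations covered in this chapter, the following is a minimal Python sketch of the least squares slope and intercept, the SST/SSR/SSE decomposition, $r^2$ and r, the standard error of estimate, and the t statistic for the slope. The function name `simple_linear_regression` and the `hours`/`scores` values are assumptions made up for illustration; they are not the course's Exam Score vs Hours Studied data set, so the output will not match the figures quoted in the slides.

```python
import math


def simple_linear_regression(x, y):
    """Fit y-hat = b0 + b1*x by least squares and return summary statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired samples with n >= 3")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total variation
    if sxx == 0 or sst == 0:
        raise ValueError("x and y must each vary")

    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # y intercept; line passes through (x_bar, y_bar)

    y_hat = [b0 + b1 * xi for xi in x]
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]

    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained variation
    sse = sum(e ** 2 for e in residuals)          # unexplained variation

    r2 = ssr / sst
    r = math.copysign(math.sqrt(r2), b1)  # r takes the sign of the slope

    mse = sse / (n - 2)
    s_est = math.sqrt(mse)           # standard error of estimate
    s_b1 = s_est / math.sqrt(sxx)    # standard error of the slope
    # t is unbounded when the fit is perfect (s_b1 == 0).
    t_stat = b1 / s_b1 if s_b1 > 0 else math.copysign(math.inf, b1)

    return {
        "b0": b0, "b1": b1, "residuals": residuals,
        "sst": sst, "ssr": ssr, "sse": sse,
        "r2": r2, "r": r, "mse": mse,
        "s_est": s_est, "s_b1": s_b1, "t": t_stat, "df": n - 2,
    }


# Hypothetical data (NOT the course data set).
hours = [2, 5, 8, 10, 12, 15, 18, 20]
scores = [48, 52, 60, 58, 66, 70, 74, 80]

fit = simple_linear_regression(hours, scores)
print(f"estimated Score = {fit['b0']:.4f} + {fit['b1']:.4f} Hours")
print(f"r^2 = {fit['r2']:.4f}, r = {fit['r']:.4f}")
print(f"s_est = {fit['s_est']:.4f}, t = {fit['t']:.4f} with {fit['df']} df")
```

The t statistic would then be compared against a critical value from the t distribution with n − 2 degrees of freedom; that lookup is left out here to keep the sketch dependency-free.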
