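EXST7034 - Regression Techniques

Coefficient of Determination - $R^2$

The SSTotal (corrected) is the amount of variation in Y that is unexplained when there is no regression line.

The SSRegression is that part of the SSTotal which is explained by the regression line. $R^2$ is the proportion of the SSTotal (corrected) accounted for by the regression line (SSReg).

$$R^2 = \frac{SS_{Regression}}{SS_{Total}} = \frac{SS_{Total} - SS_{Error}}{SS_{Total}} = 1 - \frac{SS_{Error}}{SS_{Total}}$$

Some Properties of $R^2$

1) $0 \le R^2 \le 1$, which is often multiplied by 100 and expressed as a %

2) $R^2 = 1.0$ iff $\hat{Y}_i = Y_i$ for all i (perfect prediction, SSE = 0)

   [Figure: two plots of $Y_i$ against $X_i$: perfect prediction $\Rightarrow R^2 = 1$; perfect random scatter $\Rightarrow R^2 = 0$]

3) $R^2 = r^2_{XY}$ for simple linear regression

   For a simple linear regression, the "correlation" is between either $X_i$ and $Y_i$ or $Y_i$ and $\hat{Y}_i$. These are the same since $\hat{Y}_i$ is a linear function of $X_i$.

   In the general case (multiple regression) there are various X's, so the correlation is between $Y_i$ and $\hat{Y}_i$ only.

4) $R^2 = r^2_{Y\hat{Y}}$ for all models with intercepts

5) $R^2 < 1.0$ when there are different repeated values of $Y_i$ at some value of $X_i$ (no matter how well the model fits)

   [Figure: plot of $Y_i$ against $X_i$]

Proofs:

1 through 3 are trivial

4) By definition,

$$r^2_{Y\hat{Y}} = \frac{\left[\sum (Y_i - \bar{Y})(\hat{Y}_i - \bar{\hat{Y}})\right]^2}{\sum (Y_i - \bar{Y})^2 \, \sum (\hat{Y}_i - \bar{\hat{Y}})^2}$$

   and since $\bar{\hat{Y}} = \bar{Y}$,

$$r^2_{Y\hat{Y}} = \frac{\left[\sum Y_i \hat{Y}_i - n\bar{Y}^2\right]^2}{\sum (Y_i - \bar{Y})^2 \, \sum (\hat{Y}_i - \bar{Y})^2}$$

   Since $\sum \hat{Y}_i Y_i = \sum \hat{Y}_i (\hat{Y}_i + e_i) = \sum \hat{Y}_i^2 + \sum \hat{Y}_i e_i = \sum \hat{Y}_i^2$ (the OLS residuals are orthogonal to the fitted values, so $\sum \hat{Y}_i e_i = 0$), and $\sum \hat{Y}_i^2 - n\bar{Y}^2 = \sum (\hat{Y}_i - \bar{Y})^2$,

$$r^2_{Y\hat{Y}} = \frac{\left[\sum (\hat{Y}_i - \bar{Y})^2\right]^2}{\sum (Y_i - \bar{Y})^2 \, \sum (\hat{Y}_i - \bar{Y})^2} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = R^2$$

5) we will come back to this proof later

6) Model 1: $Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i$

   SSResidual $= \sum (Y_i - \hat{Y}_i)^2 = SS_1$, with $\hat{Y}_i = b_0 + b_1 X_{1i}$

   Model 2: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

   SSResidual $= \sum (Y_i - \hat{Y}_i)^2 = SS_2$, with $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$

   where $b_0$, $b_1$ and $b_2$ are the OLS estimators.

   Then it is clear that $SS_2 \le SS_1$, and therefore

$$\frac{SS_2}{SS_{YY}} \le \frac{SS_1}{SS_{YY}}$$

   Since $R^2 = 1 - SS_{Residual}/SS_{YY}$, the $R^2$ of Model 2 is at least that of Model 1. Therefore, $R^2$ does not DECREASE when additional variables are added to a model.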
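Properties 3, 4 and 6 can be checked numerically. The following is a minimal sketch in plain Python, assuming a small made-up data set (not taken from the course handout); the helper names `solve`, `fit_ols`, `r_squared` and `corr` are hypothetical and introduced only for illustration.

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(columns, y):
    """Fit OLS with an intercept via the normal equations; return the fitted values."""
    design = [[1.0] + [c[i] for c in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    b = solve(xtx, xty)
    return [sum(bj * xj for bj, xj in zip(b, row)) for row in design]

def r_squared(y, y_hat):
    """R^2 = 1 - SSError / SSTotal (corrected)."""
    y_bar = sum(y) / len(y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_error / ss_total

def corr(u, v):
    """Pearson correlation coefficient r between two sequences."""
    u_bar, v_bar = sum(u) / len(u), sum(v) / len(v)
    num = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    den = sqrt(sum((a - u_bar) ** 2 for a in u) * sum((b - v_bar) ** 2 for b in v))
    return num / den

# Made-up illustrative data (an assumption, not from the handout).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]

y_hat_one = fit_ols([x1], y)        # Model 1: intercept + X1
y_hat_two = fit_ols([x1, x2], y)    # Model 2: intercept + X1 + X2

r2_one = r_squared(y, y_hat_one)
r2_two = r_squared(y, y_hat_two)
r_xy = corr(x1, y)
r_y_yhat2 = corr(y, y_hat_two)

print("Property 3:", r2_one, "vs r_XY^2 =", r_xy ** 2)
print("Property 4:", r2_two, "vs r_YYhat^2 =", r_y_yhat2 ** 2)
print("Property 6: R^2 with X1 only =", r2_one, "; with X1 and X2 =", r2_two)
```

The two-variable fit contains the one-variable fit as a special case ($b_2 = 0$), so its residual sum of squares cannot be larger, which is what the last comparison illustrates.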
With each added variable, $R^2$ generally INCREASES, though it may stay the same.

Correlation coefficient "r"

This is a measure of the linear association between two variables:

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (Y_i - \bar{Y})^2 \, \sum (X_i - \bar{X})^2}}$$

It is also given by the square root of the coefficient of determination, $r = \sqrt{R^2}$, with the sign added to match the slope.

Either can be used, though $R^2$ seems to have a clearer interpretation. However, r is often used, possibly because it will be closer to 1 for any $R^2$ value except 0 and 1. E.g., if $R^2 = 0.25$ then $r = \sqrt{0.25} = 0.50$, which appears "better".

For a simple linear regression, the "correlation" calculated is between either $X_i$ and $Y_i$ or $Y_i$ and $\hat{Y}_i$. These are the same since $\hat{Y}_i$ is a linear function of $X_i$.

In the general case (multiple regression) there are various X's, so the correlation is between $Y_i$ and $\hat{Y}_i$ only.

Illustration of $R^2$ using EXAMPLE 1 handout

ANOVA TABLE

SOURCE               d.f.      SS       MS        F
Regression             1     160.0    160.0    72.727
Residual or Error      8      17.6      2.2
Total                  9     177.6

$b_1 = 4.0$    $b_0 = 10.2$    $S = 1.48324$

Tabular value: $F_{0.05,\,1,\,8} = 5.32$, so $F > F_{0.05,\,1,\,8}$ and we REJECT $H_0$.

$$R^2 = \frac{160.0}{177.6} = 0.9009 \text{ or } 90.09\%$$

so we can state that this model accounts for 90.09% of the total variation (after adjusting for the mean).

What is a "GOOD" $R^2$ value?

It depends on your expectations. If you regress something that you KNOW is a strong relationship (e.g., a fish's body length on its weight, or the length of people's right arms versus their left arms) you may expect an $R^2$ of 0.93 or 0.95, and you may consider a value of 0.80 or 0.85 to be "POOR".

If you have a model which you do not expect to be good (e.g., can I predict the density of fish in an area from the width of the stream at that point?), you may be with an $R^2$ of 0.30 or
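The Example 1 quantities can be recomputed directly from the sums of squares and degrees of freedom in the ANOVA table. This is a small Python sketch for checking the arithmetic only; the variable names are illustrative, and the slope is used just to give r its sign.

```python
from math import sqrt

# Values copied from the Example 1 ANOVA table.
ss_regression, df_regression = 160.0, 1
ss_error, df_error = 17.6, 8
ss_total = ss_regression + ss_error        # 177.6, the Total row
b1 = 4.0                                   # slope; only its sign is used below

ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error             # MS for Residual or Error (2.2 in the table)
f_stat = ms_regression / ms_error          # F in the table (72.727)
s = sqrt(ms_error)                         # S in the handout (1.48324)

r2 = ss_regression / ss_total              # SSRegression / SSTotal
r2_alt = 1.0 - ss_error / ss_total         # equivalent form: 1 - SSError / SSTotal
r = sqrt(r2) if b1 >= 0 else -sqrt(r2)     # sign of r matches the slope

print(f"F = {f_stat:.3f}, S = {s:.5f}")
print(f"R^2 = {r2:.4f} ({100 * r2:.2f}%), r = {r:.4f}")
```

Both forms of $R^2$ agree, and r comes out larger than $R^2$, in line with the remark that r will look "better" whenever $R^2$ lies strictly between 0 and 1.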