LSU EXST 7015 - Lecture Notes

Statistical Techniques II, Page 48

This can also be used to test each parameter estimate against zero.

Regression in GLM

PROC GLM and PROC MIXED do regression, but they do not have all of the regression diagnostics available in PROC REG. However, they have a few advantages:
- They facilitate the inclusion of class variables (something we will be interested in later), and
- They provide tests of both Type I and Type II SS (as well as Types III and IV).

The formatting is different, but most of the same information is available. Tests of both Type I and Type III SS are given by default. Note that the Type II and Type III SS are the same as in PROC REG (recall the extra SS), but here tests are provided. These F-test values are calculated by dividing each SS (sequential or partial) by the MSE. Also note that the t-tests of the parameter estimates are the same as the tests of the partial SS. More material and a summarization of multiple regression will be given with the second example.

Multicollinearity

An important consideration in multiple regression is the effect of correlation among the independent variables. A problem exists when two independent variables are very highly correlated; this problem is called multicollinearity. At one extreme of this phenomenon is the case where two independent variables are perfectly correlated. This results in "singularity", and an X'X matrix that cannot be inverted. To illustrate the problem, take the following data set.

Y  X1  X2
1   1   2
2   2   3
3   3   4

If entered in PROC REG, SAS will report problems and will fit only the first variable, since the second one is perfectly correlated with it. Suppose we did want to fit parameters for both X1 and X2; what bi values could we get? The table below shows some possible values for b1 and b2.

Acceptable values of b0, b1 and b2:
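The singularity can be verified numerically. Below is a minimal numpy sketch (an illustration of what SAS detects, not SAS itself) using the three observations above; because X2 = X1 + 1, the columns of the design matrix are linearly dependent and X'X is rank deficient:

```python
import numpy as np

# The three observations from the notes; X2 = X1 + 1 exactly.
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([2.0, 3.0, 4.0])

# Design matrix with an intercept column.
X = np.column_stack([np.ones(3), x1, x2])

# X'X is 3x3, but its rank is only 2 and its determinant is (numerically)
# zero, so it cannot be inverted and the normal equations b = (X'X)^-1 X'y
# have no unique solution.
XtX = X.T @ X
print(np.linalg.matrix_rank(XtX))  # 2, not 3
```

This rank deficiency is exactly why PROC REG drops the second variable rather than attempting the inversion.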
b0        b1        b2
0         1         0
-1        0         1
99        100       -99
999       1000      -999
-101      -100      101
-1001     -1000     1001
-1000001  -1000000  1000001

There are an infinite number of solutions when singularity exists, and that is why no program can, or should, fit the parameter estimates.

But suppose that I added the value 0.0000000001 to one of the Xi observations. Now the two independent variables are not PERFECTLY correlated! SAS will report no error and will give a solution. How good is that solution? Remember how the bi values could go way up or way down as long as they were balanced by the other?

b0     b1     b2
0      1      0
-1     0      1
99     100    -99
999    1000   -999
-101   -100   101
-1001  -1000  1001

Typically, when very high correlations exist (but NOT perfect correlations), small changes in the data result in large fluctuations of the regression coefficients. Basically, under these conditions the regression coefficient estimates are useless. Also, the variance estimates are inflated.

So how do we detect these problems? First, look at the simple correlations among the Xi variables produced by PROC REG in the summary statistics section. For the Phosphorus example:

CORR  X1      X2      X3      Y
X1    1.0000  0.4616  0.1520  0.6934
X2    0.4616  1.0000  0.3175  0.3545
X3    0.1520  0.3175  1.0000  0.3617

Large correlations (usually > 0.9) can indicate potential multicollinearity problems. However, these statistics alone are not enough to detect multicollinearity. It is possible that there is no large pairwise correlation, but that some combination of the Xi variables correlates with some other combination. So we need another statistic to address this. The Variance Inflation Factor (VIF) is the statistic most commonly used to detect this problem. For the Phosphorus example:

Variable  Tolerance    Variance Inflation
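The rows of the table can be checked directly: every listed triple reproduces Y exactly, which is what "an infinite number of solutions" means in practice. A small numpy sketch (an illustration added here, not part of the SAS output):

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0])
X = np.column_stack([np.ones(3), [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])

# Each triple below is a row from the table above; all fit the data exactly.
candidates = [(0, 1, 0), (-1, 0, 1), (99, 100, -99), (999, 1000, -999),
              (-101, -100, 101), (-1001, -1000, 1001)]
for b0, b1, b2 in candidates:
    pred = X @ np.array([b0, b1, b2], dtype=float)
    assert np.allclose(pred, y)  # perfect fit for every triple

# In general, (b0, b1, b2) = (-t, 1 - t, t) is an exact solution for ANY t,
# because X2 = X1 + 1: the fit only constrains b1 + b2 = 1 and b0 + b2 = 0.
t = 12345.0
assert np.allclose(X @ np.array([-t, 1 - t, t]), y)
```

The last two lines show the whole solution family at once: one free parameter t, hence infinitely many coefficient vectors with identical predictions.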
INTERCEP  .            0.00000000
X1        0.78692352   1.27077152
X2        0.72432171   1.38060199
X3        0.89915421   1.11215627

VIF values over 5 or 10, or a mean of the VIF values much over 2, indicate potential problems with multicollinearity. Tolerance is just the inverse of the VIF, so as the VIF goes up, Tolerance goes down. Both can be used to detect multicollinearity. We will ignore Tolerance.

Another criterion for the evaluation of multicollinearity is the set of "Collinearity Diagnostics". The value examined is the "condition number" or "condition index". This criterion does not provide a P-value. The last condition number is examined, and values of 30 to 40 are considered to indicate probable multicollinearity.

Collinearity Diagnostics

                    Condition   ------------ Proportion of Variation ------------
Number  Eigenvalue  Index       Intercept   LtofStay    Age         CulRatio  XRay
1       7.92221     1.00000     0.00008203  0.00026608  0.00008384  0.00241   0.00055647
2       0.70667     3.34824     0.00062369  0.00113     0.00066774  0.02105   0.00504
3       0.23449     5.81248     0.00150     0.00158     0.00222     0.67694   0.00076682
4       0.04727     12.94588    0.00082735  0.04556     0.00039099  0.00635   0.00030739
5       0.03554     14.93049    0.00123     0.00104     0.00130     0.09639   0.43090
6       0.02878     16.59008    0.01898     0.01720     0.02926     0.06172   0.48518
7       0.01657     21.86586    0.03592     0.60385     0.02075     0.05279   0.05646
8       0.00546     38.09407    0.03300     0.27836     0.00195     0.01730   0.00245
9       0.00301     51.29072    0.90784     0.05102     0.94338     0.06505   0.01834

Multiple Regression

So, multiple regression differs from SLR in that it has several independent variables. We need new statistics to examine the parameter estimates for these variables, and to determine if there are problems among the variables. I will collectively refer to these as the "variable diagnostics"; these will be covered in the next section. Recall that multicollinearity can cause the regression coefficients to fluctuate greatly.
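The VIF column can be recovered from the simple correlations printed earlier: VIF_j = 1/(1 - R²_j), where R²_j comes from regressing X_j on the other X's, and this equals the j-th diagonal element of the inverse of the correlation matrix of the X's. A numpy sketch (a cross-check of the SAS output, using the Phosphorus correlations from the notes):

```python
import numpy as np

# Correlation matrix of X1, X2, X3 from the Phosphorus example.
R = np.array([[1.0000, 0.4616, 0.1520],
              [0.4616, 1.0000, 0.3175],
              [0.1520, 0.3175, 1.0000]])

# VIF_j = 1 / (1 - R^2_j) = j-th diagonal element of R^-1.
vif = np.diag(np.linalg.inv(R))

# Tolerance is simply the reciprocal of the VIF.
tolerance = 1.0 / vif

print(np.round(vif, 4))  # matches the SAS column (1.2708, 1.3806, 1.1122) to rounding
```

The small discrepancies in the last decimal places come from the correlations being printed to only four digits; the agreement with SAS's 1.27077, 1.38060, 1.11216 is otherwise exact.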
Examining the Sequential Parameter Estimates for large fluctuations as variables enter is another indicator of multicollinearity.

Sequential Parameter Estimates

INTERCEP      X1  X2  X3
81.277777778  0   0   0
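To see why sequential estimates fluctuate under multicollinearity, here is a small synthetic numpy sketch (hypothetical data, not the Phosphorus example): X2 is nearly identical to X1, and when X2 enters the model the individual slopes become unstable even though their sum stays close to the one-variable slope.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)     # nearly collinear with X1
y = 2.0 * x1 + rng.normal(scale=0.5, size=n)  # true slope on X1 is 2

def ols(*cols):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(n), *cols])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

b_x1 = ols(x1)        # X1 alone: slope is near the true value of 2
b_both = ols(x1, x2)  # X1 and X2 together: individual slopes can swing
                      # wildly, but b1 + b2 stays close to the X1-only slope
print(b_x1[1], b_both[1], b_both[2], b_both[1] + b_both[2])
```

Comparing the X1-only slope with the X1 slope after X2 enters reproduces, in miniature, the kind of jump one watches for in the Sequential Parameter Estimates table.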

