UT Knoxville STAT 201 - 3) diagnostics

Course: Multiple Regression    Topic: Regression Diagnostics

REGRESSION DIAGNOSTICS: KNOW THY DATA

Thus far we have discussed the basics of bivariate and multiple regression and can easily compute a linear model that relates a dependent variable to one or more predictor variables. We should not, however, accept the results of a regression analysis without thoroughly examining and understanding our data, to ensure that we have neither violated the assumptions of the regression analysis nor fallen prey to problems that bias its results. The following example, developed by Anscombe (1973, American Statistician, 27, pp. 17-21; as described at Gerard Dallal's website: http://www.tufts.edu/~gdallal/anscombe.htm), demonstrates why a regression equation should not be accepted at face value.

Anscombe generated data for six variables (X, Y1, Y2, Y3, X4, and Y4) to demonstrate that very different patterns of data can produce identical regression models. The following table contains the data for the six variables.

  X     Y1     Y2     Y3    X4     Y4
 10   8.04   9.14   7.46    8   6.58
  8   6.95   8.14   6.77    8   5.76
 13   7.58   8.74  12.74    8   7.71
  9   8.81   8.77   7.11    8   8.84
 11   8.33   9.26   7.81    8   8.47
 14   9.96   8.10   8.84    8   7.04
  6   7.24   6.13   6.08    8   5.25
  4   4.26   3.10   5.39   19  12.50
 12  10.84   9.13   8.15    8   5.56
  7   4.82   7.26   6.42    8   7.91
  5   5.68   4.74   5.73    8   6.89

Regressing Y1, Y2, and Y3, respectively, on X, and Y4 on X4, produces four identical regression equations: Y = 3.0 + .5X.
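Anscombe's claim can be checked directly. The following sketch (my own illustration, not part of the original course materials) fits each of the four pairs with a small plain-Python least-squares routine; every fit returns approximately the same intercept (3.0) and slope (0.5).

```python
def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)                       # sum of squares of X
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))    # cross-products
    b = sxy / sxx
    a = my - b * mx
    return a, b

# Anscombe's quartet, from the table above
X  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
Y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
Y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

for xs, ys in [(X, Y1), (X, Y2), (X, Y3), (X4, Y4)]:
    a, b = ols(xs, ys)
    print(round(a, 2), round(b, 2))   # each pair gives 3.0 and 0.5
```

Despite the strikingly different data patterns shown next, the four fitted equations are numerically indistinguishable to two decimal places.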
The following plots reveal the pattern of the actual data values for each bivariate association around the regression line.

[Figure: four scatterplots with the common fitted line Y = 3.0 + .5X, showing Y1 on X, Y2 on X, Y3 on X, and Y4 on X4]

The plots of Y1 on X, Y2 on X, Y3 on X, and Y4 on X4 reveal strikingly different patterns despite the common regression equation. Notice that the pattern of data for Y2 on X is curvilinear in nature. The pattern of data for Y4 on X4 reveals that the regression line is being pulled upward by one extreme point; the regression line would be flat if not for this extreme point. Hopefully this example convinces you of the importance of examining the data before blindly accepting the results of a regression analysis.

The validity of the regression model can be seriously compromised by violations of the model's assumptions and by other problems. Today we will discuss the consequences of violating the assumptions, other problematic issues, and diagnostics that aid in detecting assumption violations and various problems.

BASIC ASSUMPTIONS OF LEAST SQUARES REGRESSION

When the regression equation is used to estimate population-level relations from sample data, it is important that we satisfy a set of assumptions. These assumptions are necessary for our inference from the sample to the population to be valid. Underlying our regression estimates (i.e., the y-intercept and betas) are hypothetical sampling distributions of all possible estimates (i.e., estimated beta values) from samples of a given size. It is this sampling distribution that enables us to draw inferences from our sample (which provides one possible estimate from the sampling distribution) to the population. Satisfying the assumptions ensures that we know the characteristics of the sampling distribution of our estimates.
When the assumptions are satisfied, the sampling distribution for a given beta is normally distributed, its mean is equal to the population parameter, and the variance of all possible sample estimates of beta (i.e., of the sampling distribution) is as small as possible. So, when the assumptions are satisfied, estimates of the regression parameters are unbiased in the sense that, across all possible samples, the average sample value equals the population value, and there is little variation among the sample values (relative to other regression techniques). In other words, our sample estimate is unbiased (i.e., accurate) and its standard error is small; our sample provides a decent estimate of the population parameters.

The basic assumptions of least squares regression are phrased in terms of the residuals (i.e., errors) of the regression model. Those basic assumptions are that the errors (1) are independent, (2) are normally distributed with a mean of zero, and (3) have a constant variance. Let's examine the consequences of violating each assumption and methods of detecting the violations.

Assumption of Independence

The assumption of independence indicates that the errors for the observations are unrelated. Violating the assumption of independence results in a biased underestimation of the standard error of the regression coefficient (e.g., SE_B), which in turn increases the Type I error rate of our significance tests (recall that the t-test of a beta is t = B/SE). Violations are incurred via sampling techniques. In cross-sectional studies (i.e., studies in which data are collected at one point in time), sampling persons who share similar genetic or social-structural backgrounds usually violates the independence assumption. For example, if we have husbands and wives complete measures of relationship satisfaction, their responses are typically not independent because they share the relationship in common.
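To make the t-test mentioned above concrete, the sketch below (my own illustration, using the standard bivariate OLS formulas rather than any code from the course) computes the slope, its standard error SE_B = sqrt(MSE/Sxx), and t = B/SE_B by hand for Anscombe's Y1-on-X data. An underestimated SE_B would inflate this t and, with it, the Type I error rate.

```python
import math

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]                                # Anscombe's X
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]   # Y1

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx   # slope B
a = my - b * mx                                                # intercept

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))    # residual sum of squares
mse = sse / (n - 2)                                            # residual variance estimate
se_b = math.sqrt(mse / sxx)                                    # standard error of the slope
t = b / se_b                                                   # about 4.24 with df = n - 2 = 9
print(round(t, 2))
```

Anything that shrinks se_b without a corresponding real gain in information, such as treating dependent observations as independent, pushes t upward and makes spurious "significant" results more likely.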
The solution is to treat the relationship as the unit of analysis or to use analysis strategies that explicitly model the dependence in the data (e.g., multi-level modeling).

Stem & Leaf of Studentized Residual

    Stem Leaf                #
      1  8                   1
      1  11                  2
      0
      0  02                  2
     -0  100                 3
     -0  7                   1
     -1
     -1  6                   1
     -2  1                   1
         ----+----+----+----+
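The studentized residuals summarized in a stem-and-leaf display like the one above can be computed directly. The sketch below is my own illustration, not the course's SAS code: it applies the standard internally studentized residual formula r_i = e_i / (s * sqrt(1 - h_ii)), with leverage h_ii = 1/n + (x_i - x_bar)^2 / Sxx, to Anscombe's Y3-on-X data, where the single outlier (x = 13, y = 12.74) yields a studentized residual near 3.

```python
import math

x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]                                # Anscombe's X
y = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]   # Y3

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]            # raw residuals e_i
s = math.sqrt(sum(e ** 2 for e in resid) / (n - 2))            # root mean squared error
lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]               # hat values h_ii
rstud = [e / (s * math.sqrt(1 - h)) for e, h in zip(resid, lev)]

print(round(max(abs(r) for r in rstud), 1))                    # the outlier stands out
```

A studentized residual of roughly 3 in a sample of 11 is exactly the kind of value a stem-and-leaf display of the residuals is meant to make visible at a glance.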

