REGRESSION DIAGNOSTICS: KNOW THY DATA

Know Thy Data
- Never blindly accept the results of a regression analysis
- Violations of assumptions and other problematic issues can produce misleading or inaccurate results
- Examine the results in regard to the nature of your data

An Example from Anscombe (1973)

     X     Y1     Y2     Y3    X4     Y4
    10   8.04   9.14   7.46     8   6.58
     8   6.95   8.14   6.77     8   5.76
    13   7.58   8.74  12.74     8   7.71
     9   8.81   8.77   7.11     8   8.84
    11   8.33   9.26   7.81     8   8.47
    14   9.96   8.10   8.84     8   7.04
     6   7.24   6.13   6.08     8   5.25
     4   4.26   3.10   5.39    19  12.50
    12  10.84   9.13   8.15     8   5.56
     7   4.82   7.26   6.42     8   7.91
     5   5.68   4.74   5.73     8   6.89

- Different patterns of data can produce the same regression
- Regressing Y1, Y2, or Y3 on X, or Y4 on X4, produces the same model: Y = 0.5X + 3.0

[Figures: "Data and Regression for Y1 = X" (Y1 = 0.5X + 3.0); "Data and Regression for Y2 = X" (Y2 = 0.5X + 3.0); "Data and Regression for Y3 = X" (Y3 = 0.5X + 3.0); "Data and Regression for Y4 = X4" (Y4 = 0.5X4 + 3.0)]

Organization of Lecture
- Diagnosing & consequences of assumption violations
- Diagnosing & consequences of problematic issues

Assumptions of Ordinary Least Squares Regression
The errors are:
(1) Independent
(2) Normally distributed with a mean of zero
(3) Of constant variance

If Assumptions Are Satisfied
- Sample-based regression coefficients (betas & Y-intercept) have an underlying sampling distribution
  - e.g., the value of b1 is one possible sample estimate from a distribution of all possible sample estimates of the true population value B1
- The sampling distribution enables statistical inference
- When the assumptions are satisfied, the sampling distribution for a given coefficient is:
  - normally distributed
  - centered on the population parameter (the mean of the sampling distribution equals the parameter)
  - minimal in variability

In Other Words
- Our sample
coefficients will provide decent estimates of the population parameters because:
  - the sample estimate (e.g., b1) is unbiased: on average (across all possible samples) the sample estimate will equal the population parameter
  - the standard error of the regression estimate is minimized relative to other forms of regression

Assumption of Independence
- The errors for the observations are unrelated
- Violating this assumption UNDERESTIMATES the SEb, which inflates the Type I error rate (i.e., t = b/SEb)
- Violations arise from sampling methods:
  - observations that share genetic or contextual influences
  - data from longitudinal (i.e., repeated-measures) designs
- Solution: use techniques that model the dependence, e.g., multilevel modeling or time-series analysis

Assumption of Normality
- The errors are normally distributed
- Violating normality biases the accuracy of the p-values of statistical tests: non-normal errors do not fit the t-distribution
- Large sample sizes are robust to violation
- Normality can be diagnosed with graphical methods, e.g., a stem-and-leaf plot of the residuals (of which there are actually three types)

3 Types of Residuals
- Raw residual: error = Y - Ŷ
  - keyword is "r" in SAS
- Standardized residual: the error divided by the standard error of the estimate, which adjusts the residual for the expected amount of error
  - keyword is "student" in SAS
- Studentized residual: the residual adjusted by a standard error of estimate that is computed with the given observation deleted
  - keyword is "rstudent" in SAS

Stem & Leaf Plot for Normality
- A histogram of the residuals turned on its side
- Each residual is divided into two pieces:
  (1) the value to the left of the decimal forms the stem (or Y-axis)
  (2) the value to the right of the decimal forms the leaf
  - e.g., a residual of 1.23 provides a stem of 1 and a leaf of 2
- Stem values are listed twice: once to account for leaves of 0-4 and again for leaves of 5-9
- E.g., residuals -1.68, -1.12, 0.01, 0.11, 0.52, 0.63, 0.77, 1.23, 1.35, 1.54 give:

    Stem  Leaf
      1   5
      1   23
      0   567
      0   01
     -1   1
     -1   6

Stem & Leaf Plot for Normality
- The distribution of leaves
should be normal (i.e., bell-shaped)
- Deviations from the bell shape suggest non-normality

"Normal" Stem & Leaf (of studentized residuals)

    Stem  Leaf
      1   8
      1   11
      0
      0   02
     -0   100
     -0   7
     -1
     -1   6
     -2   1

"Negatively Skewed" Stem & Leaf (of studentized residuals)

    Stem  Leaf
      1   677889
      1   1123
      0
      0   02
     -0   1
     -0   7
     -1
     -1   6
     -2   1

Obtaining the Stem-and-Leaf in SAS
Two steps:
(1) output the residuals (raw, standardized, or studentized) from PROC REG
(2) input the residuals into PROC UNIVARIATE and use the PLOT option

    proc reg;
      model y1 = x;
      output out=residy1
             r=rsy1            /* raw residual          */
             student=stndry1   /* standardized residual */
             rstudent=sry1;    /* studentized residual  */
    run;

    data temp;
      set residy1;
    run;

    proc univariate plot;
      var rsy1 stndry1 sry1;
    run;

Correcting Violations of Normality
- Transform the DV with a square or square root
  - The square root of the DV for a positively skewed distribution "pulls in" the right tail: the few large values in the positive tail are pulled in
  - Squaring the DV for a negatively skewed distribution "stretches out" the right tail: large values bunched in the positive tail are stretched out
  - If the DV has negative values, a constant should be added to each score before squaring to maintain the order of the data
    - e.g., (-4)² = 16 becomes larger than (-2)² = 4, reversing the original order
- Problem: the results of the regression are in regard to the transformed DV. Is the transformed DV (e.g., self-esteem²) meaningful?

Assumption of Constant Variance
- Homoscedasticity: the variance of the errors is constant across values of the predictor variable(s)
- Heteroscedasticity: the variance of the errors changes across values of the predictor variable(s)

[Figures: "Homoscedasticity" (left) and "Heteroscedasticity" (right)]
- In the figure on the left, the distribution of error around the regression line has the same variance at each value of X
- In the figure on the right, the variance of the distribution of error around the regression line increases across values of X

Consequence of Heteroscedasticity
- Does not affect the unbiasedness of the regression coefficients
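The stem-and-leaf rule described above (stem = the value left of the decimal, leaf = the first digit right of the decimal, each stem listed twice for leaves 0-4 and 5-9) can also be sketched outside SAS. The following Python function is only an illustration of that rule, not part of the lecture's SAS workflow; the function name and output formatting are my own:

```python
def stem_and_leaf(residuals):
    """Build stem-and-leaf rows: stem = integer part of the residual,
    leaf = first decimal digit; each stem may appear twice
    (one row for leaves 0-4, one for leaves 5-9)."""
    rows = {}   # (stem_label, half) -> list of leaf digits
    order = {}  # (stem_label, half) -> value used to sort rows top-down
    for r in residuals:
        negative = r < 0
        whole = int(abs(r))                         # stem magnitude
        leaf = int(round(abs(r) * 100)) // 10 % 10  # first decimal digit
        half = 1 if leaf >= 5 else 0                # 0 = leaves 0-4, 1 = leaves 5-9
        stem = ("-" if negative else "") + str(whole)  # preserves "-0" stems
        key = (stem, half)
        rows.setdefault(key, []).append(leaf)
        rep = whole + 0.5 * half + 0.25             # representative row value
        order[key] = -rep if negative else rep      # largest values printed first
    lines = []
    for key in sorted(rows, key=lambda k: order[k], reverse=True):
        stem, _ = key
        leaves = "".join(str(d) for d in sorted(rows[key]))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# The ten residuals from the slide's worked example.
residuals = [-1.68, -1.12, 0.01, 0.11, 0.52, 0.63, 0.77, 1.23, 1.35, 1.54]
for line in stem_and_leaf(residuals):
    print(line)
```

Run on the slide's ten residuals, this reproduces the plot shown above (stem 1 with leaf 5, stem 1 with leaves 2 and 3, and so on down to stem -1 with leaf 6). In practice PROC UNIVARIATE builds this plot for you; the sketch only makes the splitting rule concrete.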
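As a closing check, the Anscombe (1973) claim above, that all four datasets yield the same model Y = 0.5X + 3.0, can be verified directly from the tabulated data. This is a sketch in Python rather than the lecture's SAS, using the standard closed-form least-squares formulas b1 = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b0 = ȳ − b1·x̄:

```python
def least_squares(x, y):
    """Simple linear regression: return (slope, intercept) from the
    usual closed-form least-squares formulas."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return b1, my - b1 * mx

# Anscombe's quartet, exactly as tabulated in the lecture.
x  = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Each regression gives (to two decimals) slope 0.5 and intercept 3.0.
for xs, ys in [(x, y1), (x, y2), (x, y3), (x4, y4)]:
    b1, b0 = least_squares(xs, ys)
    print(round(b1, 2), round(b0, 2))  # prints: 0.5 3.0
```

Identical coefficients, yet the four scatterplots look nothing alike, which is exactly why the lecture insists on examining the data and residuals rather than the fitted equation alone.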