UNL STAT 870 - Lecture notes - D518446

School name University of Nebraska-Lincoln

Course Stat 870- Multiple Regression Analysis

© 2012 Christopher R. Bilder

3.4 Overview of tests involving residuals, 3.5 Correlation test of normality, and 3.7 F test for lack of fit

Formal hypothesis tests can also help decide whether the regression model satisfies its assumptions. Read these sections on your own; we will discuss only a couple of the tests.

3.6 Tests for constancy of error variance

Modified Levene’s (Brown-Forsythe) Test

The hypotheses can be written informally as

$H_0$: Constant variance
$H_a$: Non-constant variance

This test can be used when the error variance consistently increases or decreases as a function of X. The test still works even if the normality assumption for the $\varepsilon_i$ is violated.

1. Find the residuals from the sample regression model.
2. Divide the data into two groups using the median of X. Denote the corresponding residuals as $e_{i1}$ and $e_{i2}$.
3. Find the median residual value, $\tilde{e}_1$ and $\tilde{e}_2$, for both groups.
4. Find the absolute deviation of the residuals about their median: $d_{i1} = |e_{i1} - \tilde{e}_1|$ and $d_{i2} = |e_{i2} - \tilde{e}_2|$.
5. Use the standard two-sample t-test for population means (pooled variance) to conduct the hypothesis test:

$$t_L = \frac{\bar{d}_1 - \bar{d}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where

$$s^2 = \frac{\sum (d_{i1} - \bar{d}_1)^2 + \sum (d_{i2} - \bar{d}_2)^2}{n - 2}$$

$n_1$ = number of observations in group 1
$n_2$ = number of observations in group 2
$\bar{d}_j$ = sample mean of the $d_{ij}$ for group j

This test statistic has approximately a $t(n-2)$ distribution under the hypothesis of constant variance.

Note that the test statistic measures the average deviation from the residual median for each group. To understand the statistic, think about what would happen if the $d_{i1}$’s were much larger than the $d_{i2}$’s.

Example: College and HS GPA (HS_college_GPA_ch3.R)

> gpa<-read.table(file = "C:\\chris\\UNL\\STAT870\\Chapter1\\gpa.txt", header=TRUE, sep = "")
> mod.fit<-lm(formula = College.GPA ~ HS.GPA, data = gpa)

> library(car) #The Levene's Test function is in the package for Fox and Weisberg's book (although the function is not mentioned in the book!)
> group<-ifelse(test = gpa$HS.GPA < median(gpa$HS.GPA), yes = 1, no = 2)
> leveneTest(y = mod.fit$residuals, group = group)
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  1  0.0766 0.7851
      18
Warning message:
In leveneTest.default(y = mod.fit$residuals, group = group) :
  group coerced to factor.

#Could use leveneTest(y = mod.fit$residuals, group = factor(group)) to avoid the warning message

Fox uses a regular ANOVA model to produce an F-statistic. Remember that the square of a t random variable with 18 degrees of freedom is an F random variable with 1 numerator degree of freedom and 18 denominator degrees of freedom. Thus, the test statistic satisfies $|t_L| = \sqrt{0.0766} \approx 0.28$.

Below is some more code to verify that the leveneTest() function is doing what we want it to do.

> #Using the formulas instead for Levene's test
> e.tilde<-tapply(X = mod.fit$residuals, INDEX = group, FUN = median)
> data.frame(gpa, e = mod.fit$residuals, group, med.e = e.tilde[group]) #Show groups, e.tilde[group] matches up residuals with correct median
   HS.GPA College.GPA           e group       med.e
1    3.04         3.1  0.26546091     2  0.09011976
2    2.35         2.3 -0.05177482     1 -0.07728677
3    2.70         3.0  0.40334475     2  0.09011976
4    2.05         1.9 -0.24187731     1 -0.07728677
5    2.83         2.5 -0.18761083     2  0.09011976
6    4.32         3.7 -0.03010181     2  0.09011976
7    3.39         3.4  0.32058048     2  0.09011976
8    2.32         2.6  0.26921493     1 -0.07728677
9    2.69         2.8  0.21034134     2  0.09011976
10   0.83         1.6  0.31170591     1 -0.07728677
11   2.39         2.0 -0.37976115     2  0.09011976
12   3.65         2.9 -0.36133070     2  0.09011976
13   1.85         2.3  0.29805437     1 -0.07728677
14   3.83         3.2 -0.18726921     2  0.09011976
15   1.22         1.8  0.23883914     1 -0.07728677
16   1.48         1.4 -0.34307203     1 -0.07728677
17   2.28         2.0 -0.30279873     1 -0.07728677
18   4.00         3.8  0.29378887     2  0.09011976
19   2.28         2.2 -0.10279873     1 -0.07728677
20   1.88         1.6 -0.42293538     1 -0.07728677

> d<-abs(mod.fit$residuals - e.tilde[group])
> t.test(formula = d ~ group, mu = 0, var.equal = TRUE, alternative = "two.sided")

        Two Sample t-test

data:  d by group
t = -0.2768, df = 18, p-value = 0.7851
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.1375136  0.1055000
sample estimates:
mean in group 1 mean in group 2
      0.2479522       0.2639590

The test statistic is $t_L = -0.28$ and the p-value is 0.7851. There is not sufficient evidence against the constant variance assumption.

Breusch-Pagan Test

Assumes the error terms are:

- Normally distributed (different from Levene)
- Such that the variance of $\varepsilon_i$, denoted by $\sigma_i^2$, is related to X by

$$\log(\sigma_i^2) = \gamma_0 + \gamma_1 X_i$$

where “log” means natural log. This means that $\sigma_i^2$ increases or decreases with $X_i$ depending on the value of $\gamma_1$. Note that if $\gamma_1 = 0$, then we have constant variance!

Test $H_0: \gamma_1 = 0$ vs. $H_a: \gamma_1 \neq 0$.

Test statistic:

$$X^2_{BP} = \frac{SSR/2}{(SSE/n)^2}$$

where SSR is the regression sum of squares when regressing $e^2$ on X, and SSE is the usual error sum of squares from the original regression of Y on X. In large samples, this test statistic has approximately a $\chi^2$ distribution with 1 degree of freedom.

Steps:

1. For the regression model $E(Y_i) = \beta_0 + \beta_1 X_i$, find the residuals and SSE.
2. Using the squared residuals as the response variable and X as the independent variable, find SSR for this model.
3. Calculate the test statistic and p-value.

Example: College and HS GPA (HS_college_GPA_ch3.R)

> library(lmtest) #Location of the function
> bptest(formula = College.GPA ~ HS.GPA, data = gpa, studentize = FALSE) #KNN version of the test

        Breusch-Pagan test

data:  College.GPA ~ HS.GPA
BP = 0.1959, df = 1, p-value = 0.658

$X^2_{BP} = 0.1959$ and p-value = 0.65805

There is not sufficient evidence against the constant variance assumption.

Example: Trees data from MASS package (trees.R)

> #Levene's test
> library(car)
> group<-ifelse(test = trees$Height
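
Returning to the College and HS GPA example, the three Breusch-Pagan steps listed earlier can be sketched by hand in base R, in the same spirit as the check done for leveneTest(). This is only a sketch under stated assumptions: the data are re-entered from the data.frame() output printed in these notes rather than read from gpa.txt, and the object names mod.e2, SSE, SSR, BP, and p.value are illustrative choices, not names from the original program.

```r
# GPA data re-entered from the data.frame() output shown earlier in these notes
# (assumption: the printed values match the contents of gpa.txt)
gpa <- data.frame(
  HS.GPA = c(3.04, 2.35, 2.70, 2.05, 2.83, 4.32, 3.39, 2.32, 2.69, 0.83,
             2.39, 3.65, 1.85, 3.83, 1.22, 1.48, 2.28, 4.00, 2.28, 1.88),
  College.GPA = c(3.1, 2.3, 3.0, 1.9, 2.5, 3.7, 3.4, 2.6, 2.8, 1.6,
                  2.0, 2.9, 2.3, 3.2, 1.8, 1.4, 2.0, 3.8, 2.2, 1.6)
)

# Step 1: fit E(Y) = beta0 + beta1*X, then find the residuals and SSE
mod.fit <- lm(formula = College.GPA ~ HS.GPA, data = gpa)
e <- residuals(mod.fit)
n <- length(e)
SSE <- sum(e^2)

# Step 2: regress the squared residuals on X and find SSR for that model
gpa$e2 <- e^2
mod.e2 <- lm(formula = e2 ~ HS.GPA, data = gpa)
SSR <- sum((fitted(mod.e2) - mean(gpa$e2))^2)

# Step 3: test statistic and p-value from the chi-square(1) approximation
BP <- (SSR / 2) / (SSE / n)^2
p.value <- pchisq(q = BP, df = 1, lower.tail = FALSE)

data.frame(BP, p.value)
```

If the re-entered values match the original file, BP and p.value should agree with the bptest(..., studentize = FALSE) output shown for this example.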
