UW-Madison STAT 371 - Pooled Standard Error - D1694898

School name University of Wisconsin, Madison

Course Stat 371- Intro to Statistics

Two Independent Samples
Bret Larget
Department of Statistics
University of Wisconsin - Madison
October 18, 2004
Statistics 371, Fall 2004

Comparing Two Groups

• Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means and a hypothesis test.
• The basic structure of the confidence interval is the same as in the previous chapter: an estimate plus or minus a multiple of a standard error.
• Hypothesis testing will introduce several new concepts.

Setting

• Model two populations as buckets of numbered balls.
• The population means are $\mu_1$ and $\mu_2$, respectively.
• The population standard deviations are $\sigma_1$ and $\sigma_2$, respectively.
• We are interested in estimating $\mu_1 - \mu_2$ and in testing the hypothesis that $\mu_1 = \mu_2$.

[Figure: population 1 (mean $\mu_1$, sd $\sigma_1$) yields the sample $y_1^{(1)}, \ldots, y_{n_1}^{(1)}$ with sample mean $\bar{y}_1$ and sample sd $s_1$; population 2 (mean $\mu_2$, sd $\sigma_2$) yields the sample $y_1^{(2)}, \ldots, y_{n_2}^{(2)}$ with sample mean $\bar{y}_2$ and sample sd $s_2$.]

Standard Error of $\bar{y}_1 - \bar{y}_2$

The standard error of the difference in two sample means is an empirical measure of how far the difference in sample means will typically be from the difference in the respective population means.

$$ \mathrm{SE}(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

An alternative formula is

$$ \mathrm{SE}(\bar{y}_1 - \bar{y}_2) = \sqrt{(\mathrm{SE}(\bar{y}_1))^2 + (\mathrm{SE}(\bar{y}_2))^2} $$

This formula reminds us of how to find the length of the hypotenuse of a right triangle. (Variances add, but standard deviations don't.)

Pooled Standard Error

If we wish to assume that the two population standard deviations are equal, $\sigma_1 = \sigma_2$, then it makes sense to use data from both samples to estimate the common population standard deviation.

We estimate the common population variance with a weighted average of the sample variances, weighted by the degrees of freedom.

$$ s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

The pooled standard error is then as below.

$$ \mathrm{SE}_{\text{pooled}} = s_{\text{pooled}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

Sampling Distributions

The sampling distribution of the difference in sample means has these characteristics.

• Mean: $\mu_1 - \mu_2$
• SD: $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
• Shape: exactly normal if both populations are normal, approximately normal if the populations are not normal but both sample sizes are sufficiently large.

Theory for Confidence Interval

The recipe for constructing a confidence interval for a single population mean is based on facts about the sampling distribution of the statistic

$$ T = \frac{\bar{Y} - \mu}{\mathrm{SE}(\bar{Y})}. $$

Similarly, the theory for confidence intervals for $\mu_1 - \mu_2$ is based on the sampling distribution of the statistic

$$ T = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\mathrm{SE}(\bar{Y}_1 - \bar{Y}_2)} $$

where we standardize by subtracting the mean and dividing by the standard deviation of the sampling distribution.

If both populations are normal and if we know the population standard deviations, then

$$ \Pr\left\{ -1.96 \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \le 1.96 \right\} = 0.95 $$

where we can choose z other than 1.96 for different confidence levels. This statement is true because the expression in the middle has a standard normal distribution.

But in practice, we don't know the population standard deviations.
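
Before turning to estimated standard deviations, the known-$\sigma$ statement above can be checked by simulation. The following is a minimal sketch in R; the population means, standard deviations, sample sizes, seed, and number of replications are all hypothetical values chosen for illustration and do not come from the course materials.

```r
set.seed(371)  # arbitrary seed so the sketch is reproducible

# Hypothetical population parameters and sample sizes (assumptions, not course data)
mu1 <- 10; sigma1 <- 3; n1 <- 15
mu2 <- 7;  sigma2 <- 2; n2 <- 20

# SD of the sampling distribution of ybar1 - ybar2 when the sigmas are known
sd_diff <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)

# Draw many pairs of independent normal samples and record ybar1 - ybar2
reps <- 10000
diffs <- replicate(reps,
                   mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))

# Standardize with the true mean and SD, then see how often |z| <= 1.96
z <- (diffs - (mu1 - mu2)) / sd_diff
coverage <- mean(abs(z) <= qnorm(0.975))

c(formula_sd = sd_diff, simulated_sd = sd(diffs), coverage = coverage)
```

Under these assumptions the simulated SD should land close to the formula value and the coverage close to 0.95, up to simulation noise.
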
If we substitute in sample estimates instead, we get this.

$$ \Pr\left\{ -t \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \le t \right\} = 0.95 $$

We need to choose different end points to account for the additional randomness in the denominator.

It turns out that the sampling distribution of the statistic above is approximately a t distribution where the degrees of freedom should be estimated from the data as well.

Algebraic manipulation leads to the following expression.

$$ \Pr\left\{ (\bar{Y}_1 - \bar{Y}_2) - t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{Y}_1 - \bar{Y}_2) + t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right\} = 0.95 $$

We use a t multiplier so that the area between $-t$ and $t$ under a t distribution with the estimated degrees of freedom will be 0.95.

Confidence Interval for $\mu_1 - \mu_2$

The confidence interval for differences in population means has the same structure as that for a single population mean.

$$ (\text{Estimate}) \pm (t\ \text{Multiplier}) \times \mathrm{SE} $$

The only difference is that for this more complicated setting, we have more complicated formulas for the standard error and the degrees of freedom.

Here is the df formula.

$$ \mathrm{df} = \frac{(\mathrm{SE}_1^2 + \mathrm{SE}_2^2)^2}{\mathrm{SE}_1^4/(n_1 - 1) + \mathrm{SE}_2^4/(n_2 - 1)} $$

where $\mathrm{SE}_i = s_i/\sqrt{n_i}$ for $i = 1, 2$.

As a check, the value is often close to $n_1 + n_2 - 2$. (This will be exact if $s_1 = s_2$ and if $n_1 = n_2$.) The value from the messy formula will always be between the smaller of $n_1 - 1$ and $n_2 - 1$ and $n_1 + n_2 - 2$.

Example

Exercise 7.12

In this example, subjects with high blood pressure are randomly allocated to two treatments. The biofeedback group receives relaxation training aided by biofeedback and meditation over eight weeks. The control group does not.
Reduction in systolic blood pressure is tabulated here.

                Biofeedback   Control
    n                    99        93
    $\bar{y}$          13.8       4.0
    SE                 1.34      1.30

For 190 degrees of freedom (which come from both the simple and messy formulas) the table says to use 1.977 (rounding down to the table's row for 140 degrees of freedom) whereas with R you find 1.973.

A calculator or R can compute the margin of error.

    > se = sqrt(1.34^2 + 1.3^2)
    > tmult = qt(0.975, 190)
    > me = round(tmult * se, 1)
    > se
    [1] 1.866976
    > tmult
    [1] 1.972528
    > me
    [1] 3.7

We are 95% confident that the mean reduction in systolic blood pressure due to the biofeedback treatment in a population of similar individuals to those in this study would be between 6.1 and 13.5 mm more than the mean reduction in the same population undergoing the control treatment.

Example Using R

Exercise 7.21

This exercise examines the growth of bean plants under red and green light. A 95% confidence interval is part of the output below.

    > ex7.21 = read.table("lights.txt", header = T)
    > str(ex7.21)
    'data.frame': 42 obs. of 2 variables:
     $ height: num 8.4 8.4 10 8.8 7.1 9.4 8.8 4.3 9 8.4 ...
     $ color : Factor w/ 2 levels "green","red": 2 2 2 2 2 2 2 2 2 2 ...
    > attach(ex7.21)
    > t.test(height ~ color)

            Welch Two Sample t-test

    data: height by color
    t = 1.1432, df = 38.019, p-value = 0.2601
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.4479687 1.6103216
    sample estimates:
    mean in group green   mean in group red
               8.940000            8.358824

Example Assuming Equal Variances

For the same data, were we to assume that the
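
As a rough companion to the formulas and to Exercise 7.12 above, the sketch below computes both the unpooled interval (with the estimated degrees of freedom) and the pooled interval from summary statistics alone. The helper `two_sample_ci()` is a hypothetical name introduced here for illustration, not a base R function, and the sample standard deviations are back-calculated from the rounded standard errors in the table via $s = \mathrm{SE}\sqrt{n}$, so they are approximations rather than the original data.

```r
# two_sample_ci() is a hypothetical helper written for this sketch; it is not part of base R.
two_sample_ci <- function(n1, ybar1, s1, n2, ybar2, s2, conf = 0.95) {
  est <- ybar1 - ybar2
  se1 <- s1 / sqrt(n1)
  se2 <- s2 / sqrt(n2)

  # Unpooled standard error and the "messy" degrees-of-freedom formula
  se_unpooled <- sqrt(se1^2 + se2^2)
  df_unpooled <- (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))

  # Pooled standard error, which assumes sigma1 = sigma2
  var_pooled <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
  se_pooled  <- sqrt(var_pooled) * sqrt(1 / n1 + 1 / n2)
  df_pooled  <- n1 + n2 - 2

  # (Estimate) +/- (t multiplier) * SE
  ci <- function(se, df) est + c(-1, 1) * qt(1 - (1 - conf) / 2, df) * se

  list(
    unpooled = list(se = se_unpooled, df = df_unpooled,
                    ci = ci(se_unpooled, df_unpooled)),
    pooled   = list(se = se_pooled, df = df_pooled,
                    ci = ci(se_pooled, df_pooled))
  )
}

# Exercise 7.12 summaries; s1 and s2 are back-calculated from the tabulated SEs
res <- two_sample_ci(n1 = 99, ybar1 = 13.8, s1 = 1.34 * sqrt(99),
                     n2 = 93, ybar2 = 4.0,  s2 = 1.30 * sqrt(93))

round(res$unpooled$df)     # estimated degrees of freedom
round(res$unpooled$ci, 1)  # unpooled 95% interval
round(res$pooled$ci, 1)    # pooled 95% interval
```

With these inputs the unpooled degrees of freedom round to 190 and the unpooled interval rounds to 6.1 to 13.5, matching the margin-of-error calculation in the example. Because the two groups have similar sample sizes and similar standard deviations, the pooled interval is nearly the same; the two approaches can differ more when the sample sizes or spreads are unequal.
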
