Two Independent Samples

Bret Larget
Department of Statistics, University of Wisconsin-Madison
October 18, 2004
Statistics 371, Fall 2004

Two Independent Samples

Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means, and a hypothesis test. The basic structure of the confidence interval is the same as in the previous chapter: an estimate plus or minus a multiple of a standard error. Hypothesis testing will introduce several new concepts.

Setting

Model two populations as buckets of numbered balls. The population means are $\mu_1$ and $\mu_2$, respectively. The population standard deviations are $\sigma_1$ and $\sigma_2$, respectively. We are interested in estimating $\mu_1 - \mu_2$ and in testing the hypothesis that $\mu_1 = \mu_2$.

[Figure: two populations, one with mean $\mu_1$ and sd $\sigma_1$, the other with mean $\mu_2$ and sd $\sigma_2$. The sample $y_1, \ldots, y_{n_1}$ from the first population gives $\bar{y}_1$ and $s_1$; the sample $y_1, \ldots, y_{n_2}$ from the second gives $\bar{y}_2$ and $s_2$.]

Standard Error of $\bar{y}_1 - \bar{y}_2$

The standard error of the difference in two sample means is an empirical measure of how far the difference in sample means will typically be from the difference in the respective population means.

$$SE_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

An alternative formula is

$$SE_{\bar{y}_1 - \bar{y}_2} = \sqrt{(SE_{\bar{y}_1})^2 + (SE_{\bar{y}_2})^2}$$

This formula reminds us of how to find the length of the hypotenuse of a right triangle. Variances add, but standard deviations don't.

Pooled Standard Error

If we wish to assume that the two population standard deviations are equal, $\sigma_1 = \sigma_2$, then it makes sense to use data from both samples to estimate the common population standard deviation.

We estimate the common population variance with a weighted average of the sample variances, weighted by the degrees of freedom.

$$s^2_{\text{pooled}} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The pooled standard error is then as below.

$$SE_{\text{pooled}} = s_{\text{pooled}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Sampling Distributions

The sampling distribution of the difference in sample means has these characteristics.

Mean: $\mu_1 - \mu_2$

SD: $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$

Shape: Exactly normal if both populations are normal; approximately normal if populations are not normal but both sample sizes are sufficiently large.

Theory for Confidence Interval

The recipe for constructing a confidence interval for a single population mean is based on facts about the sampling distribution of the statistic

$$T = \frac{\bar{Y} - \mu}{SE_{\bar{Y}}}$$

Similarly, the theory for confidence intervals for $\mu_1 - \mu_2$ is based on the sampling distribution of the statistic

$$T = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{SE_{\bar{Y}_1 - \bar{Y}_2}}$$

where we standardize by subtracting the mean and dividing by the standard deviation of the sampling distribution.

If both populations are normal and if we know the population standard deviations, then

$$\Pr\left\{ -1.96 \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \le 1.96 \right\} = 0.95$$

where we can choose $z$ other than 1.96 for different confidence levels. This statement is true because the expression in the middle has a standard normal distribution.

But in practice, we don't know the population standard deviations. If we substitute in sample estimates instead, we get this.

$$\Pr\left\{ -t \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \le t \right\} = 0.95$$

We need to choose different end points to account for the additional randomness in the denominator.

It turns out that the sampling distribution of the statistic above is approximately a t distribution where the degrees of freedom should be estimated from the data as well.

Algebraic manipulation leads to the following expression.

$$\Pr\left\{ (\bar{Y}_1 - \bar{Y}_2) - t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{Y}_1 - \bar{Y}_2) + t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right\} = 0.95$$

We use a t multiplier so that the area between $-t$ and $t$ under a t distribution with the estimated degrees of freedom will be 0.95.

Confidence Interval for $\mu_1 - \mu_2$

The confidence interval for differences in population means has the same structure as that for a single population mean.

(Estimate) ± (t Multiplier) × SE

The only difference is that for this more complicated setting, we have more complicated formulas for the standard error and the degrees of freedom.

Here is the df formula.

$$df = \frac{(SE_1^2 + SE_2^2)^2}{\dfrac{SE_1^4}{n_1 - 1} + \dfrac{SE_2^4}{n_2 - 1}}$$

where $SE_i = s_i / \sqrt{n_i}$ for $i = 1, 2$.

As a check, the value is often close to $n_1 + n_2 - 2$. (This will be exact if $s_1 = s_2$ and if $n_1 = n_2$.) The value from the messy formula will always be between the smaller of $n_1 - 1$ and $n_2 - 1$ and $n_1 + n_2 - 2$.

Example Using R

Exercise 7.21. This exercise examines the growth of bean plants under red and green light. A 95% confidence interval is part of the output below.

    > ex7.21 = read.table("lights.txt", header = T)
    > str(ex7.21)
    'data.frame':   42 obs. of  2 variables:
     $ height: num  8.4 8.4 10 8.8 7.1 9.4 8.8 4.3 9 8.4 ...
     $ color : Factor w/ 2 levels "green","red": 2 2 2 2 2 2 2 2 2 2 ...
    > attach(ex7.21)
    > t.test(height ~ color)

            Welch Two Sample t-test

    data:  height by color
    t = 1.1432, df = 38.019, p-value = 0.2601
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.4479687  1.6103216
    sample estimates:
    mean in group green   mean in group red
               8.940000            8.358824

Example Assuming Equal Variances

For the same data, were we to assume that the population variances were equal, the degrees of freedom, the standard error, and the confidence interval are all slightly different.

    > t.test(height ~ color, var.equal = T)

            Two Sample t-test

    data:  height by color
    t = 1.1064, df = 40, p-value = 0.2752
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.4804523  1.6428053
    sample estimates:
    mean in group green   mean in group red
               8.940000            8.358824

Hypothesis Tests

Hypothesis tests are an alternative approach to statistical inference. Unlike confidence intervals, where the goal is estimation with assessment of likely precision of the estimate, the goal of hypothesis testing is to ascertain whether or not data is consistent with what we might expect to see assuming that a hypothesis is true.

The logic of hypothesis testing is a probabilistic form of proof by contradiction. In logic, if we can say that a proposition H leads to a contradiction, then we have proved H false and have proved not H to be true. In hypothesis testing, if observed data is highly unlikely under an assumed hypothesis H, then there is strong (but not definitive) evidence that the hypothesis is false.

Example

Exercise 7 …
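
As a check on the Welch output for the bean plant data (Exercise 7.21), the interval can be rebuilt from the printed numbers alone. R does not print the standard error, so the value below is backed out from the reported t statistic (which tests a difference of 0) and is only approximate, limited by the rounding of $t = 1.1432$.

$$
\begin{aligned}
\text{Estimate} &= 8.940000 - 8.358824 = 0.581176 \\
SE &\approx \frac{\text{Estimate}}{t} = \frac{0.581176}{1.1432} \approx 0.508 \\
\text{Half-width} &= \frac{1.6103216 - (-0.4479687)}{2} \approx 1.029 \\
\text{t Multiplier} &\approx \frac{1.029}{0.508} \approx 2.02
\end{aligned}
$$

The interval is centered at the estimate, since $(-0.4479687 + 1.6103216)/2 \approx 0.5812$, and the multiplier of about 2.02 is a little larger than 1.96, as expected when a t distribution with about 38 degrees of freedom replaces the standard normal.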
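
The standard error, degrees of freedom, and confidence interval formulas for the unequal-variance case can be sketched in R from summary statistics. This is a minimal illustration, not the internals of `t.test`: the function name `welch_ci`, its argument names, and every input value below are hypothetical and chosen only for the example.

```r
# Sketch: confidence interval for mu1 - mu2 from summary statistics,
# using SE = sqrt(SE1^2 + SE2^2) and the estimated degrees of freedom.
# welch_ci and its argument names are hypothetical, not part of base R.
welch_ci <- function(ybar1, s1, n1, ybar2, s2, n2, conf = 0.95) {
  se1 <- s1 / sqrt(n1)                 # SE of the first sample mean
  se2 <- s2 / sqrt(n2)                 # SE of the second sample mean
  se  <- sqrt(se1^2 + se2^2)           # variances add, SDs do not
  # Estimated degrees of freedom (the "messy" formula)
  df  <- (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))
  tmult <- qt(1 - (1 - conf) / 2, df)  # area between -t and t equals conf
  est <- ybar1 - ybar2
  list(estimate = est, se = se, df = df,
       lower = est - tmult * se, upper = est + tmult * se)
}

# Hypothetical inputs with equal SDs and equal sample sizes:
# the estimated df should then equal n1 + n2 - 2 exactly.
res_equal <- welch_ci(ybar1 = 10.2, s1 = 2.0, n1 = 12,
                      ybar2 = 9.1,  s2 = 2.0, n2 = 12)

# Hypothetical inputs with unequal SDs and sample sizes:
# the estimated df should fall between min(n1, n2) - 1 and n1 + n2 - 2.
res_unequal <- welch_ci(ybar1 = 10.2, s1 = 1.5, n1 = 25,
                        ybar2 = 9.1,  s2 = 2.5, n2 = 17)
print(res_equal$df)
print(res_unequal$df)
```

With raw data rather than summary statistics, `t.test(height ~ color)` reports the same quantities directly, so a function like this is mainly useful for checking hand calculations.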