Comparing Two Groups

Bret Larget
Department of Statistics
University of Wisconsin-Madison
October 17, 2003

Statistics 371, Fall 2003

Comparing Two Groups

Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means, and a hypothesis test. The basic structure of the confidence interval is the same as in the previous chapter: an estimate plus or minus a multiple of a standard error. Hypothesis testing will introduce several new concepts.

Setting: Two Independent Samples

Model the two populations as buckets of numbered balls. The population means are $\mu_1$ and $\mu_2$, respectively. The population standard deviations are $\sigma_1$ and $\sigma_2$, respectively. We are interested in estimating $\mu_1 - \mu_2$ and in testing the hypothesis that $\mu_1 = \mu_2$.

[Figure: two buckets of numbered balls. Population 1 (mean $\mu_1$, sd $\sigma_1$) yields a sample $y_1, \ldots, y_{n_1}$ with mean $\bar{y}_1$ and sd $s_1$. Population 2 (mean $\mu_2$, sd $\sigma_2$) yields a sample $y_1, \ldots, y_{n_2}$ with mean $\bar{y}_2$ and sd $s_2$.]

Standard Error of $\bar{y}_1 - \bar{y}_2$

The standard error of the difference in two sample means is an empirical measure of how far the difference in sample means will typically be from the difference in the respective population means:

$$\mathrm{SE}_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

An alternative formula is

$$\mathrm{SE}_{\bar{y}_1 - \bar{y}_2} = \sqrt{\left(\mathrm{SE}_{\bar{y}_1}\right)^2 + \left(\mathrm{SE}_{\bar{y}_2}\right)^2}$$

This formula reminds us of how to find the length of the hypotenuse of a right triangle. (Variances add, but standard deviations don't.)

Pooled Standard Error

If we wish to assume that the two population standard deviations are equal, $\sigma_1 = \sigma_2$, then it makes sense to use data from both samples to estimate the common population standard deviation. We estimate the common population variance with a weighted average of the sample variances, weighted by the degrees of freedom:

$$s_{\text{pooled}}^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The pooled standard error is then

$$\mathrm{SE}_{\text{pooled}} = s_{\text{pooled}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Sampling Distributions

The sampling distribution of the difference in sample means, $\bar{Y}_1 - \bar{Y}_2$, has these characteristics:

Mean: $\mu_1 - \mu_2$
SD: $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Shape: exactly normal if both populations are normal; approximately normal if the populations are not normal but both sample sizes are sufficiently large.

Theory for Confidence Interval

The recipe for constructing a confidence interval for a single population mean is based on facts about the sampling distribution of the statistic

$$T = \frac{\bar{Y} - \mu}{\mathrm{SE}(\bar{Y})}$$

Similarly, the theory for confidence intervals for $\mu_1 - \mu_2$ is based on the sampling distribution of the statistic

$$T = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\mathrm{SE}(\bar{Y}_1 - \bar{Y}_2)}$$

where we standardize by subtracting the mean and dividing by the standard deviation of the sampling distribution (estimated by the standard error).

If both populations are normal and if we know the population standard deviations, then

$$\Pr\left(-1.96 \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \le 1.96\right) = 0.95$$

where we can choose a value of $z$ other than 1.96 for different confidence levels. This statement is true because the expression in the middle has a standard normal distribution.

But in practice we don't know the population standard deviations. If we substitute in sample estimates instead, we get this:

$$\Pr\left(-t \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} \le t\right) = 0.95$$

We need to choose different end points to account for the additional randomness in the denominator.

Theory for Confidence Interval (continued)

It turns out that the sampling distribution of the statistic above is approximately a t distribution, where the degrees of freedom should be estimated from the data as well.

Algebraic manipulation leads to the following expression:

$$\Pr\left((\bar{Y}_1 - \bar{Y}_2) - t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{Y}_1 - \bar{Y}_2) + t\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right) = 0.95$$

We use a t multiplier so that the area between $-t$ and $t$ under a t distribution with the estimated degrees of freedom will be 0.95.

Confidence Interval for $\mu_1 - \mu_2$

The confidence interval for a difference in population means has the same structure as that for a single population mean:

(Estimate) ± (t Multiplier) × SE

The only difference is that this more complicated setting has more complicated formulas for the standard error and the degrees of freedom. Here is the df formula:

$$\mathrm{df} = \frac{\left(\mathrm{SE}_1^2 + \mathrm{SE}_2^2\right)^2}{\dfrac{\mathrm{SE}_1^4}{n_1 - 1} + \dfrac{\mathrm{SE}_2^4}{n_2 - 1}}$$

where $\mathrm{SE}_i = s_i / \sqrt{n_i}$ for $i = 1, 2$.

As a check, the value is often close to $n_1 + n_2 - 2$. (This will be exact if $s_1 = s_2$ and $n_1 = n_2$.) The value from the messy formula will always lie between the smaller of $n_1 - 1$ and $n_2 - 1$, and $n_1 + n_2 - 2$.

Example (Exercise 7.12)

In this example, subjects with high blood pressure are randomly allocated to two treatments. The biofeedback group receives relaxation training aided by biofeedback and meditation over eight weeks. The control group does not. The reduction in systolic blood pressure is tabulated here:

Group          n    ȳ (mm Hg)   SE (mm Hg)
Biofeedback   99      13.8        1.34
Control       93       4.0        1.30

For 190 degrees of freedom (which come from both the simple and the messy formulas), the t table says to use 1.977 (190 is rounded down to the 140 df row), whereas with R you find 1.973.

A calculator or R can compute the margin of error:

```r
> se <- sqrt(1.34^2 + 1.30^2)
> tmult <- qt(0.975, 190)
> me <- round(tmult * se, 1)
> se
[1] 1.866976
> tmult
[1] 1.972528
> me
[1] 3.7
```

The estimate is $13.8 - 4.0 = 9.8$, so the interval is $9.8 \pm 3.7$, or (6.1, 13.5). We are 95% confident that the mean reduction in systolic blood pressure due to the biofeedback treatment, in a population of individuals similar to those in this study, would be between 6.1 and 13.5 mm Hg more than the mean reduction in the same population undergoing the control treatment.

Example Assuming Equal Variances

For the same data, were we to assume that the population variances were equal, the degrees of freedom, the standard error, and the confidence interval would all be slightly different.

```r
> t.test(height ~ color, var.equal = T)

        Two Sample t-test

data:  height by color
t = 1.1064, df = 40, p-value = 0.2752
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.4804523  1.6428053
sample estimates:
mean in group green   mean in group red
           8.940000            8.358824
```

Logic of Hypothesis Tests

All of the hypothesis tests we will see this semester fall into this general framework:

1. State a null hypothesis and an alternative hypothesis.
2. Gather data and compute a test statistic.
3. Consider the sampling distribution of the test statistic, assuming that the null hypothesis is true.
4. Compute a p-value, a measure of how consistent the data are with the null hypothesis in consideration of a specific alternative hypothesis.

Example Using R

Hypothesis Tests: Exercise 7 …
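The "Sampling Distributions" slide lists three properties of $\bar{Y}_1 - \bar{Y}_2$. A small R simulation can show them in action. The population means, standard deviations, and sample sizes below are hypothetical values chosen only for illustration; they make the theoretical SD exactly $\sqrt{2}$. The simulated mean and SD of the differences should land close to $\mu_1 - \mu_2$ and $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$.

```r
# Simulate the sampling distribution of ybar1 - ybar2.
# All population parameters below are hypothetical, chosen for illustration.
set.seed(371)
mu1 <- 10; sigma1 <- 3; n1 <- 12
mu2 <- 7;  sigma2 <- 5; n2 <- 20
reps <- 20000

# Each replicate draws two fresh independent samples and records the
# difference in their sample means.
diffs <- replicate(reps, mean(rnorm(n1, mu1, sigma1)) - mean(rnorm(n2, mu2, sigma2)))

# Theoretical mean and SD of the sampling distribution.
theory_mean <- mu1 - mu2
theory_sd   <- sqrt(sigma1^2 / n1 + sigma2^2 / n2)   # variances add

cat("simulated mean:", round(mean(diffs), 3), " theory:", theory_mean, "\n")
cat("simulated SD:  ", round(sd(diffs), 3), " theory:", round(theory_sd, 3), "\n")
```

The SD of the difference (about 1.41 here) is smaller than $\sigma_1/\sqrt{n_1} + \sigma_2/\sqrt{n_2}$ (about 1.98). Variances add, but standard deviations don't.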
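The "Theory for Confidence Interval" slides say that replacing $\sigma_1, \sigma_2$ by $s_1, s_2$ requires wider end points than $\pm 1.96$. One rough check is to simulate many small samples and count how often each interval covers the true difference. This sketch assumes two hypothetical normal populations with unequal SDs and small samples ($n_1 = 5$, $n_2 = 8$). It compares the $\pm 1.96$ multiplier with a t multiplier whose df comes from the messy df formula.

```r
# Coverage of nominal 95% intervals built with estimated SDs:
# z multiplier (1.96) versus t multiplier with estimated df.
# Populations and sample sizes are hypothetical.
set.seed(2003)
mu1 <- 0; sigma1 <- 1; n1 <- 5
mu2 <- 0; sigma2 <- 2; n2 <- 8
reps <- 10000
truth <- mu1 - mu2

covered_z <- logical(reps)
covered_t <- logical(reps)
for (r in seq_len(reps)) {
  y1 <- rnorm(n1, mu1, sigma1)
  y2 <- rnorm(n2, mu2, sigma2)
  se1 <- sd(y1) / sqrt(n1)
  se2 <- sd(y2) / sqrt(n2)
  se  <- sqrt(se1^2 + se2^2)
  # Estimated degrees of freedom from the df formula on the slides.
  nu  <- (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))
  est <- mean(y1) - mean(y2)
  covered_z[r] <- abs(est - truth) <= 1.96 * se
  covered_t[r] <- abs(est - truth) <= qt(0.975, nu) * se
}
coverage <- c(z = mean(covered_z), t = mean(covered_t))
print(coverage)
```

With samples this small, the $\pm 1.96$ intervals should cover noticeably less than 95% of the time. The t intervals should come out close to the nominal level.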
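The "algebraic manipulation" step on the second "Theory for Confidence Interval" slide can be written out explicitly. Write $\mathrm{SE} = \sqrt{s_1^2/n_1 + s_2^2/n_2}$. Then multiply each part of the inequality by $\mathrm{SE} > 0$, subtract $\bar{Y}_1 - \bar{Y}_2$, and multiply by $-1$, which reverses the inequalities:

$$
\begin{aligned}
-t \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\mathrm{SE}} \le t
&\iff -t\,\mathrm{SE} \le (\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2) \le t\,\mathrm{SE} \\
&\iff -(\bar{Y}_1 - \bar{Y}_2) - t\,\mathrm{SE} \le -(\mu_1 - \mu_2) \le -(\bar{Y}_1 - \bar{Y}_2) + t\,\mathrm{SE} \\
&\iff (\bar{Y}_1 - \bar{Y}_2) - t\,\mathrm{SE} \le \mu_1 - \mu_2 \le (\bar{Y}_1 - \bar{Y}_2) + t\,\mathrm{SE}
\end{aligned}
$$

Each step is an equivalence, so the event on every line has the same probability, 0.95.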
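The standard error, the df formula, and the margin-of-error calculation for the biofeedback example (Exercise 7.12) can be collected into one R sketch. It works from summary statistics only: $n$, $\bar{y}$, and SE for each group. The helper names `welch_df()` and `two_sample_ci()` are hypothetical and introduced only for this illustration. Given raw data instead of summaries, R's `t.test()` with its default `var.equal = FALSE` uses the same df formula.

```r
# Hypothetical helper: estimated degrees of freedom from the two SEs.
welch_df <- function(se1, se2, n1, n2) {
  (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))
}

# Hypothetical helper: CI for mu1 - mu2 from summary statistics.
two_sample_ci <- function(ybar1, ybar2, se1, se2, n1, n2, level = 0.95) {
  se    <- sqrt(se1^2 + se2^2)            # "hypotenuse" form of the SE
  df    <- welch_df(se1, se2, n1, n2)     # messy df formula
  tmult <- qt(1 - (1 - level) / 2, df)    # area between -t and t equals level
  est   <- ybar1 - ybar2
  list(estimate = est, se = se, df = df, tmult = tmult,
       lower = est - tmult * se, upper = est + tmult * se)
}

# Biofeedback (group 1) versus control (group 2), tabulated summaries.
bp <- two_sample_ci(ybar1 = 13.8, ybar2 = 4.0, se1 = 1.34, se2 = 1.30,
                    n1 = 99, n2 = 93)
print(round(unlist(bp), 3))
```

Rounded to one decimal, the interval is $9.8 \pm 3.7$, or (6.1, 13.5). This matches the calculation on the example slide, and the df comes out at essentially 190.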
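The "Confidence Interval for $\mu_1 - \mu_2$" slide claims that the df from the messy formula always lies between $\min(n_1 - 1, n_2 - 1)$ and $n_1 + n_2 - 2$. This can be spot-checked numerically. The sketch draws random, hypothetical SDs and sample sizes and tests both bounds. It also evaluates one case with $s_1 = s_2$ and $n_1 = n_2$, where the formula should give exactly $n_1 + n_2 - 2$. The helper name `welch_df_from_sd()` is hypothetical.

```r
# Hypothetical helper: df formula written in terms of sample SDs.
welch_df_from_sd <- function(s1, s2, n1, n2) {
  se1 <- s1 / sqrt(n1)
  se2 <- s2 / sqrt(n2)
  (se1^2 + se2^2)^2 / (se1^4 / (n1 - 1) + se2^4 / (n2 - 1))
}

set.seed(1)
trials <- 10000
s1 <- runif(trials, 0.1, 10)                  # hypothetical sample SDs
s2 <- runif(trials, 0.1, 10)
n1 <- sample(2:60, trials, replace = TRUE)    # hypothetical sample sizes
n2 <- sample(2:60, trials, replace = TRUE)

dfs <- welch_df_from_sd(s1, s2, n1, n2)       # vectorized over all trials
lower_ok <- all(dfs >= pmin(n1 - 1, n2 - 1) - 1e-8)
upper_ok <- all(dfs <= n1 + n2 - 2 + 1e-8)

equal_case <- welch_df_from_sd(4, 4, 25, 25)  # s1 = s2 and n1 = n2
cat("all within bounds:", lower_ok && upper_ok,
    "; equal case df:", equal_case, "\n")
```

A check like this is not a proof. It only confirms the bounds on the cases sampled.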
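The "Pooled Standard Error" slide gives the pooled formulas, and the "Example Assuming Equal Variances" slide notes that pooling changes the df, the SE, and the interval slightly. The sketch below applies the pooled formulas to the biofeedback summaries to show this on data from these slides. The slides report only $n$, $\bar{y}$, and SE per group, so the sketch assumes the sample SDs can be recovered as $s_i = \mathrm{SE}_i \sqrt{n_i}$. The helper name `pooled_ci()` is hypothetical.

```r
# Hypothetical helper: pooled (equal-variance) CI from summary statistics.
# Assumes s_i = SE_i * sqrt(n_i) recovers each sample SD.
pooled_ci <- function(ybar1, ybar2, se1, se2, n1, n2, level = 0.95) {
  s1 <- se1 * sqrt(n1)
  s2 <- se2 * sqrt(n2)
  df <- n1 + n2 - 2
  # Weighted average of the sample variances, weighted by their df.
  s_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)
  se <- s_pooled * sqrt(1 / n1 + 1 / n2)
  tmult <- qt(1 - (1 - level) / 2, df)
  est <- ybar1 - ybar2
  c(estimate = est, s_pooled = s_pooled, se = se, df = df,
    lower = est - tmult * se, upper = est + tmult * se)
}

bp_pooled <- pooled_ci(13.8, 4.0, 1.34, 1.30, 99, 93)
print(round(bp_pooled, 3))
```

Here the pooled df is $n_1 + n_2 - 2 = 190$. The pooled SE comes out slightly larger than the unpooled 1.867, so the pooled interval is only slightly wider.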
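Steps 3 and 4 of the "Logic of Hypothesis Tests" framework can be traced through the `t.test()` output on the "Example Assuming Equal Variances" slide. Suppose the null hypothesis $\mu_1 = \mu_2$ is true, and the populations are normal with equal SDs. Then the pooled t statistic has a t distribution with 40 df, and the two-sided p-value is twice the tail area beyond $|t|$. This sketch uses only the printed, rounded values (no raw data), so agreement holds to about three or four decimals.

```r
# Recompute the p-value and 95% CI from the printed t.test() output.
t_stat     <- 1.1064                     # printed t statistic
df_pooled  <- 40                         # n1 + n2 - 2 for the pooled test
diff_means <- 8.940000 - 8.358824        # mean(green) - mean(red)

# Step 4: two-sided p-value, assuming H0: mu1 = mu2.
p_value <- 2 * pt(-abs(t_stat), df_pooled)

# The implied pooled SE, then estimate +/- t multiplier * SE.
se_implied <- diff_means / t_stat
ci <- diff_means + c(-1, 1) * qt(0.975, df_pooled) * se_implied

print(round(c(p_value = p_value, lower = ci[1], upper = ci[2]), 4))
```

The interval contains 0 and the p-value is large, so these data are quite consistent with equal population means.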