UNC-Chapel Hill STOR 151 - Comparing Two Means

Comparing Two Means

Example: 10 men in this class have a mean height of 71.44 inches and a standard deviation of 2.926 inches. 61 women in this class have a mean height of 65.5 inches and a standard deviation of 3.226 inches.

Is this a statistically significant difference?

Standard error for the women:
$$SE_1 = \frac{3.226}{\sqrt{61}} = 0.4130$$
Standard error for the men:
$$SE_2 = \frac{2.926}{\sqrt{10}} = 0.9253$$
Standard error for the difference:
$$SE = \sqrt{SE_1^2 + SE_2^2} = 1.013$$
t statistic:
$$t = \frac{\bar{x}_2 - \bar{x}_1}{SE} = \frac{71.44 - 65.5}{1.013} = 5.86$$

Is this statistically significant?

Before we go on...

In this instance, the answer is clearly yes. For a t value as large as 5, we don't need a detailed computation of the P-value to conclude that the result is statistically significant.

However, in other cases it could be more critical, so we show how to obtain the P-value.

Degrees of Freedom

The Welch-Satterthwaite formula:
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$
where $s_1$, $s_2$ are the individual standard deviations and $n_1$, $n_2$ are the sample sizes.

In this case $s_1 = 3.226$, $s_2 = 2.926$, $n_1 = 61$, $n_2 = 10$ and hence $df = 12.9$ (say 13 for looking up in the table).

However, there is also the simpler formula
$$df = \min(n_1, n_2) - 1.$$
In this case, that leads to $df = 9$.

In practice, it's good enough to use the simpler formula, certainly in this course.

Extract from the table of critical values of the t statistic with df = 9 or 13:

      Confidence level
df    80%    90%    95%    98%    99%    99.8%
 9    1.383  1.833  2.262  2.821  3.250  4.297
13    1.350  1.771  2.160  2.650  3.012  3.852

At P = .05, the critical value for significance is either 2.262 or 2.160, depending on which df you use.

Either way, t = 5.86 is clearly significant.

Suppose we want a 95% confidence interval for the difference in height between men and women.

Recall: the observed mean difference is 71.44 − 65.5 = 5.94 and the standard error is 1.013.

Based on df = 13, the critical value for a 95% confidence interval is 2.16.

Therefore, the desired 95% confidence interval is
$$5.94 \pm 2.16 \times 1.013 = (3.75,\ 8.13).$$

General Method: Difference of Means

1. Calculate $\bar{x}_1$, $\bar{x}_2$, $SE_1$, $SE_2$.
2. Combined $SE = \sqrt{SE_1^2 + SE_2^2}$.
3. Calculate df: either the Welch-Satterthwaite formula or the simpler $df = \min(n_1, n_2) - 1$.
4. For a hypothesis test, $t = \frac{\bar{x}_2 - \bar{x}_1}{SE}$; convert to a P-value using the table of t statistics.
5. For a confidence interval, calculate the critical value $t^*$ corresponding to the desired confidence level (e.g. in our example we had df = 13 and confidence level 95%, which led to $t^* = 2.16$). Then the confidence interval is $(\bar{x}_2 - \bar{x}_1) \pm t^* \times SE$.

Example (Question 9.24, page 451)

Following are the numbers of newspapers read by a sample of women and of men:

Women: 5, 3, 6, 3, 7, 1, 1, 3, 0, 4, 7, 2, 2, 7, 3, 0, 5, 0, 4, 4, 5, 14, 3, 1, 2, 1, 7, 2, 5, 3, 7
Men: 0, 3, 7, 4, 3, 2, 1, 12, 1, 6, 2, 2, 7, 7, 5, 3, 14, 3, 7, 6, 5, 5, 2, 3, 5, 5, 2, 3, 3

(a) Construct and interpret a plot comparing responses of males and females.
(b) Construct and interpret a 95% confidence interval comparing population means.
(c) Show the five steps of a significance test comparing population means.
(d) State and check the assumptions.

[Figure: Box plot for number of newspapers read by women and by men. Horizontal axis: Newspapers Read, 0 to 14; groups: MEN, WOMEN.]

(a) See boxplots; male and female distributions look very similar.
(b) With suffix 1 indicating women, 2 for men: $\bar{x}_1 = 3.774$, $\bar{x}_2 = 4.414$, $s_1 = 2.929$, $s_2 = 3.100$, $n_1 = 31$, $n_2 = 29$, $SE = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = 0.78$. We have df = 57.1 according to the Welch-Satterthwaite formula, df = 28 by the simpler formula.
Based on df = 28, the critical value of t for a 95% confidence interval is $t^* = 2.048$, so the 95% confidence interval for $\mu_2 - \mu_1$ is $(4.414 - 3.774) \pm 2.048 \times 0.78 = (-0.957,\ 2.237)$.
(c) The t statistic for a hypothesis test is $\frac{\bar{x}_2 - \bar{x}_1}{SE} = 0.82$; this is well below the critical value for a test at significance level .05 (the critical value is $t^* = 2.048$, as in part (b)), so we DO NOT REJECT the null hypothesis that the means for men and women are equal.
(d) The assumptions require independence of the two samples (probably more or less correct); randomness of the two samples (depends on how the samples were obtained); and approximately normal distributions for the samples themselves (probably true to a reasonable approximation).

Paired Comparison Tests

Consider the following dataset, based on the midterm and final exam scores of a recent course of mine (not STOR 151!):

Student      1    2   3   4   5   6   7   8   9  10
Midterm     86  100  90  90  94  73  76  76  95  87
Final       84   95  77  83  70  76  54  81  90  84
Difference   2    5  13   7  24  -3  22  -5   5   3

Mean midterm score = 86.7; mean final score = 79.4; difference = 7.3.

Is this significant evidence of a difference?

We could apply the same test as previously, which would lead to
$$\bar{x}_1 = 86.7,\quad \bar{x}_2 = 79.4,\quad s_1 = 9.06,\quad s_2 = 11.37,$$
$$SE = \sqrt{\frac{9.06^2}{10} + \frac{11.37^2}{10}} = 4.60,\quad t = \frac{86.7 - 79.4}{4.60} = 1.59,\quad df = 9.$$

The two-sided P-value is 0.15; this is greater than 0.05, therefore the result is not significant.

However, there is an error in this calculation....

The two samples are not independent since they represent scores from the same students.

Instead, we apply a matched pairs test:

1. Calculate the difference in scores for each student.
2. Carry out a single-sample test for the null hypothesis that the mean difference is 0.

Details

The 10 differences 2, 5, 13, 7, 24, -3, 22, -5, 5, 3 have a mean $\bar{x} = 7.3$ and a standard deviation $s = 9.67$.

The standard error is $\frac{9.67}{\sqrt{10}} = 3.06$.

The t statistic is $\frac{7.3}{3.06} = 2.39$.

The corresponding two-sided P-value is 0.04.

Therefore, the result is statistically significant.

Message of this example:

The standard two-sample test is valid only when the two samples are independent (along with other assumptions: quantitative variables, random sampling, approximately normal distributions).

When the observations are directly paired (e.g. two exam scores from the same student, two medical results from the same patient, etc.) it is possible to apply a paired comparison test instead.

The main difference is a different method of computing the standard error. This has implications for both hypothesis testing and confidence interval calculations.

Example: Problem 9.39, page 467/468.

In a study to determine whether exercise reduced blood pressure, a sample of three patients was tested, with the following results:

Subject  Before  After
1        150     130
2        165     140
3        135     120

(a) Explain why the three "before" observations and the three "after" observations are dependent samples.
(b) Find the sample mean of the before scores, sample mean of the after scores, and the sample mean of d = before − after.
(c) Find a 95% confidence interval for the mean difference, and interpret it.

(a) They are from the same
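
The two-sample calculations in the height example (standard errors, t statistic, both degrees-of-freedom formulas, and the 95% confidence interval) can be checked numerically. The following is only a sketch using the Python standard library; the function names `two_sample_summary` and `confidence_interval` are hypothetical, and the critical value 2.160 is simply copied from the df = 13 row of the table in these notes rather than computed.

```python
import math


def two_sample_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Summarise the comparison of group 2 against group 1.

    Returns (difference, combined SE, t statistic, Welch df, simple df),
    where the difference is mean2 - mean1.
    """
    var_of_mean1 = sd1 ** 2 / n1          # SE_1 squared
    var_of_mean2 = sd2 ** 2 / n2          # SE_2 squared
    se = math.sqrt(var_of_mean1 + var_of_mean2)
    diff = mean2 - mean1
    t = diff / se
    # Welch-Satterthwaite degrees of freedom
    df_welch = (var_of_mean1 + var_of_mean2) ** 2 / (
        var_of_mean1 ** 2 / (n1 - 1) + var_of_mean2 ** 2 / (n2 - 1)
    )
    # Simpler, more conservative formula
    df_simple = min(n1, n2) - 1
    return diff, se, t, df_welch, df_simple


def confidence_interval(diff, se, t_star):
    """Interval diff +/- t_star * se; t_star must be looked up in a t table."""
    margin = t_star * se
    return diff - margin, diff + margin


# Height example: group 1 = women (n = 61), group 2 = men (n = 10).
diff, se, t, df_welch, df_simple = two_sample_summary(
    65.5, 3.226, 61, 71.44, 2.926, 10
)
# 2.160 is the 95% critical value for df = 13, taken from the table above.
low, high = confidence_interval(diff, se, 2.160)

print(f"SE = {se:.3f}, t = {t:.2f}")
print(f"Welch df = {df_welch:.1f}, simple df = {df_simple}")
print(f"95% CI = ({low:.2f}, {high:.2f})")
```

Run on the height data, this should agree with the hand calculation: a combined standard error of about 1.013, Welch df of about 12.9, simple df of 9, and an interval of roughly (3.75, 8.13).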

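The contrast between the (incorrect) independent-samples calculation and the matched pairs test on the midterm and final scores can also be reproduced in a few lines. This is a minimal sketch, assuming only the Python standard library; the helper names `unpaired_t` and `paired_t` are hypothetical, and the critical value 2.262 is copied from the df = 9 row of the table in these notes.

```python
import math
import statistics

midterm = [86, 100, 90, 90, 94, 73, 76, 76, 95, 87]
final = [84, 95, 77, 83, 70, 76, 54, 81, 90, 84]


def unpaired_t(x, y):
    """t statistic that (wrongly, for these data) treats x and y as independent."""
    se = math.sqrt(
        statistics.variance(x) / len(x) + statistics.variance(y) / len(y)
    )
    return (statistics.mean(x) - statistics.mean(y)) / se


def paired_t(x, y):
    """Matched pairs t statistic: a one-sample test on the differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


t_unpaired = unpaired_t(midterm, final)
t_paired = paired_t(midterm, final)

# 95% critical value for df = 9, taken from the table in these notes.
T_STAR_DF9 = 2.262

print(f"unpaired t = {t_unpaired:.2f}, significant: {abs(t_unpaired) > T_STAR_DF9}")
print(f"paired t   = {t_paired:.2f}, significant: {abs(t_paired) > T_STAR_DF9}")
```

The mean difference is 7.3 in both calculations; only the standard error changes, which is why the paired statistic (about 2.39) exceeds the critical value while the unpaired one (about 1.59) does not.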