UW-Madison STAT 371 - STAT 371 Lecture Notes

Two Independent Samples

Bret Larget
Department of Statistics, University of Wisconsin-Madison
October 18, 2004
Statistics 371, Fall 2004

Comparing Two Groups

Chapter 7 describes two ways to compare two populations on the basis of independent samples: a confidence interval for the difference in population means, and a hypothesis test. The basic structure of the confidence interval is the same as in the previous chapter: an estimate plus or minus a multiple of a standard error. Hypothesis testing will introduce several new concepts.

Setting

Model two populations as buckets of numbered balls.
The population means are $\mu_1$ and $\mu_2$, respectively.
The population standard deviations are $\sigma_1$ and $\sigma_2$, respectively.
We are interested in estimating $\mu_1 - \mu_2$ and in testing the hypothesis that $\mu_1 = \mu_2$.

[Diagram: population 1 (mean $\mu_1$, sd $\sigma_1$) yields a sample $y_1, \ldots, y_{n_1}$ summarized by $\bar{y}_1$ and $s_1$; population 2 (mean $\mu_2$, sd $\sigma_2$) yields a sample $y_1, \ldots, y_{n_2}$ summarized by $\bar{y}_2$ and $s_2$.]

Standard Error of $\bar{y}_1 - \bar{y}_2$

The standard error of the difference in two sample means is an empirical measure of how far the difference in sample means will typically be from the difference in the respective population means.

$$SE_{\bar{y}_1 - \bar{y}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

An alternative formula is

$$SE_{\bar{y}_1 - \bar{y}_2} = \sqrt{SE_{\bar{y}_1}^2 + SE_{\bar{y}_2}^2}$$

This formula reminds us of how to find the length of the hypotenuse of a triangle. Variances add, but standard deviations don't.

Pooled Standard Error

If we wish to assume that the two population standard deviations are equal, $\sigma_1 = \sigma_2$, then it makes sense to use data from both samples to estimate the common population standard deviation. We estimate the common population variance with a weighted average of the sample variances, weighted by the degrees of freedom.

$$s_{\text{pooled}}^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$$

The pooled standard error is then as below.

$$SE_{\text{pooled}} = s_{\text{pooled}} \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Sampling Distributions

The sampling distribution of the difference in sample means has these characteristics.

Mean: $\mu_1 - \mu_2$
SD: $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Shape: Exactly normal if both populations are normal; approximately normal if the populations are not normal but both sample sizes are sufficiently large.

Theory for Confidence Interval

The recipe for constructing a confidence interval for a single population mean is based on facts about the sampling distribution of the statistic

$$T = \frac{\bar{Y} - \mu}{SE_{\bar{Y}}}$$

Similarly, the theory for confidence intervals for $\mu_1 - \mu_2$ is based on the sampling distribution of the statistic

$$T = \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{SE_{\bar{Y}_1 - \bar{Y}_2}}$$

where we standardize by subtracting the mean and dividing by the standard deviation of the sampling distribution.

If both populations are normal and if we know the population standard deviations, then

$$\Pr\left\{ -1.96 \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \le 1.96 \right\} = 0.95$$

where we can choose $z$ other than 1.96 for different confidence levels. This statement is true because the expression in the middle has a standard normal distribution.

But in practice, we don't know the population standard deviations. If we substitute in sample estimates instead, we get this.

$$\Pr\left\{ -t \le \frac{(\bar{Y}_1 - \bar{Y}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \le t \right\} \approx 0.95$$

We need to choose different end points to account for the additional randomness in the denominator.

It turns out that the sampling distribution of the statistic above is approximately a t distribution, where the degrees of freedom should be estimated from the data as well. Algebraic manipulation leads to the following expression.

$$\Pr\left\{ (\bar{Y}_1 - \bar{Y}_2) - t \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{Y}_1 - \bar{Y}_2) + t \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right\} \approx 0.95$$

We use a t multiplier so that the area between $-t$ and $t$ under a t distribution with the estimated degrees of freedom will be 0.95.

Confidence Interval for $\mu_1 - \mu_2$

The confidence interval for differences in population means has the same structure as that for a single population mean.

(Estimate) $\pm$ (t Multiplier) $\times$ SE

The only difference is that for this more complicated setting, we have more complicated formulas for the standard error and the degrees of freedom. Here is the df formula.

$$df = \frac{(SE_1^2 + SE_2^2)^2}{SE_1^4 / (n_1 - 1) + SE_2^4 / (n_2 - 1)}$$

where $SE_i = s_i / \sqrt{n_i}$ for $i = 1, 2$.

As a check, the value is often close to $n_1 + n_2 - 2$. This will be exact if $s_1 = s_2$ and if $n_1 = n_2$. The value from the messy formula will always be between the smaller of $n_1 - 1$ and $n_2 - 1$, and $n_1 + n_2 - 2$.

Example

Exercise 7.12. In this example, subjects with high blood pressure are randomly allocated to two treatments. The biofeedback group receives relaxation training aided by biofeedback and meditation over eight weeks. The control group does not. Reduction in systolic blood pressure is tabulated here.

                 n    $\bar{y}$    SE
    Biofeedback  99   13.8         1.34
    Control      93    4.0         1.30

For 190 degrees of freedom (which come from both the simple and messy formulas), the table says to use 1.977 (the table rounds down to 140 degrees of freedom), whereas with R you find 1.973.

A calculator or R can compute the margin of error.

    > se = sqrt(1.34^2 + 1.3^2)
    > tmult = qt(0.975, 190)
    > me = round(tmult * se, 1)
    > se
    [1] 1.866976
    > tmult
    [1] 1.972528
    > me
    [1] 3.7

We are 95% confident that the mean reduction in systolic blood pressure due to the biofeedback treatment in a population of similar individuals to those in this study would be between 6.1 and 13.5 mm more than the mean reduction in the same population undergoing the control treatment.

Example Using R

Exercise 7.21. This exercise examines the growth of bean plants under red and green light. A 95% confidence interval is part of the output below.

    > ex7.21 = read.table("lights.txt", header = T)
    > str(ex7.21)
    'data.frame':   42 obs. of  2 variables:
     $ height: num  8.4 8.4 10 8.8 7.1 9.4 8.8 4.3 9 8.4 ...
     $ color : Factor w/ 2 levels "green","red": 2 2 2 2 2 2 2 2 2 2 ...
    > attach(ex7.21)
    > t.test(height ~ color)

            Welch Two Sample t-test

    data:  height by color
    t = 1.1432, df = 38.019, p-value = 0.2601
    alternative hypothesis: true difference in means is not equal to 0
    95 percent confidence interval:
     -0.4479687  1.6103216
    sample estimates:
    mean in group green   mean in group red
               8.940000            8.358824

Example Assuming Equal Variances

For the same data, were we to assume that the population variances were equal, the degrees of freedom, the standard error, and the confidence interval are all slightly different …
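
The bean data file is not reproduced in these notes, so the following R sketch illustrates the same comparison with the summary statistics from Exercise 7.12 instead. It computes the unpooled standard error with the estimated degrees of freedom, then the pooled standard error with $n_1 + n_2 - 2$ degrees of freedom, and builds a 95% interval each way. The variable names and the helper function ci() are illustrative assumptions and do not come from the original notes; the sample variances are recovered from the tabulated standard errors as $s_i^2 = SE_i^2 \, n_i$.

```r
# Summary statistics from Exercise 7.12 (reduction in systolic blood pressure).
n    <- c(biofeedback = 99,   control = 93)
ybar <- c(biofeedback = 13.8, control = 4.0)
se   <- c(biofeedback = 1.34, control = 1.30)

# Point estimate of the difference in population means.
est <- ybar[["biofeedback"]] - ybar[["control"]]

# Unpooled approach: standard errors add in quadrature, and the
# degrees of freedom come from the "messy" formula.
se_diff  <- sqrt(sum(se^2))
df_welch <- sum(se^2)^2 / sum(se^4 / (n - 1))

# Pooled approach: recover the sample variances from the standard errors,
# then average them with weights equal to the degrees of freedom.
s2        <- se^2 * n
s2_pooled <- sum((n - 1) * s2) / (sum(n) - 2)
se_pooled <- sqrt(s2_pooled * sum(1 / n))
df_pooled <- sum(n) - 2

# Hypothetical helper: estimate plus or minus (t multiplier) x SE.
ci <- function(est, se, df, level = 0.95) {
  est + c(-1, 1) * qt(1 - (1 - level) / 2, df) * se
}

ci_welch  <- ci(est, se_diff,   df_welch)
ci_pooled <- ci(est, se_pooled, df_pooled)

print(round(c(se_diff = se_diff, df_welch = df_welch), 3))
print(round(c(se_pooled = se_pooled, df_pooled = df_pooled), 3))
print(round(rbind(welch = ci_welch, pooled = ci_pooled), 2))
```

With these inputs the two approaches nearly coincide, because the sample sizes and standard deviations are similar: the estimated degrees of freedom are very close to 190, and both intervals round to about 6.1 to 13.5. When the raw data are available, t.test() with var.equal = TRUE gives the pooled analysis directly.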

