22S:30/105 Statistical Methods and Computing

Differences between Population Proportions
Introduction to Contingency Tables

Lecture 20
April 8, 2011

Kate Cowles
374 SH, 335-0727
kcowles@stat.uiowa.edu


Comparing two proportions

Recall: In a two independent sample problem, we want to compare two populations, or the responses to two different treatments, using data from two independent samples.

When we are interested in comparing the proportions of "successes" in two groups, the notation is:

Population    Population proportion    Sample size    Sample proportion
1             $p_1$                    $n_1$          $\hat{p}_1$
2             $p_2$                    $n_2$          $\hat{p}_2$

We compare the populations by doing inference about the difference $p_1 - p_2$ between the population proportions.

The statistic that estimates this difference is $\hat{p}_1 - \hat{p}_2$, the difference between the two sample proportions.


Example: Do seatbelts protect children during car accidents?

study of deaths among children involved in car accidents during an 18-month period

two simple random samples:
  one sample from the population of children who were wearing seatbelts at the time of the car accident
  one sample from the population of children who were not wearing seatbelts at the time of the car accident

parameters of interest: proportions of children who die in car accidents from each of these populations

Population      Population proportion    Sample size    Sample proportion
seatbelts       $p_1$                    123            3/123 = 0.024
no seatbelts    $p_2$                    290            13/290 = 0.045

To determine whether the study provides significant evidence that seatbelts affect the proportion of kids who die if they are involved in a car accident, we test the hypotheses

$H_0: p_1 - p_2 = 0$ or $H_0: p_1 = p_2$
$H_a: p_1 - p_2 \neq 0$ or $H_a: p_1 \neq p_2$

To estimate how large the difference is, we compute a confidence interval for the difference $p_1 - p_2$.


The sampling distribution of $\hat{p}_1 - \hat{p}_2$

When both samples are large, the distribution of $\hat{p}_1 - \hat{p}_2$ is approximately normal.

The mean of this normal distribution is $p_1 - p_2$.

The standard deviation of the difference is

$$\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$$

Because we don't know $p_1$ and $p_2$, we must replace them with estimates. These estimates will be different for confidence intervals versus hypothesis tests.


Confidence intervals for comparing two proportions

To compute a c.i., we estimate the population proportions $p_1$ and $p_2$ by their corresponding sample proportions $\hat{p}_1$ and $\hat{p}_2$.

The resulting standard error of $\hat{p}_1 - \hat{p}_2$ is

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

The approximate level $C$ two-sided confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE$$

where $z^*$ is the upper $(1 - C)/2$ standard normal cutoff.


Rules of thumb for using this confidence interval

1. Both populations are at least 10 times as large as the samples.
2. The counts of successes and failures are 5 or more in each sample.


Car accident example

Population      Population proportion    Sample size    Sample proportion
Seatbelts       $p_1$                    123            3/123 = 0.024
No seatbelts    $p_2$                    290            13/290 = 0.045

$$\hat{p}_1 - \hat{p}_2 = -0.021$$

$$SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\frac{0.024 \times 0.976}{123} + \frac{0.045 \times 0.955}{290}} = 0.0184$$

The 95% two-sided confidence interval is

$$(\hat{p}_1 - \hat{p}_2) \pm z^* \, SE = (0.024 - 0.045) \pm 1.96 \times 0.0184 = -0.021 \pm 0.036 = (-0.057, 0.015)$$

We are 95% confident that this interval covers the true difference between the proportions of kids who die from car accidents in the population who were wearing seatbelts at the time of the accident vs. the population who were not.

The interval includes the value 0, so it is plausible based on this data that there is no difference.


The hypothesis test

For the formal hypothesis test, the hypotheses are

$H_0: p_1 - p_2 = 0$ or $H_0: p_1 = p_2$
$H_a: p_1 - p_2 \neq 0$ or $H_a: p_1 \neq p_2$

Suppose we had set $\alpha = .05$ when we were designing the study.

We must standardize $\hat{p}_1 - \hat{p}_2$ to get a z statistic. We do this under the assumption that $H_0$ is true, that is, that $p_1$ and $p_2$ have the same value $p$.

Instead of estimating $p_1$ and $p_2$ separately in the standard deviation of the difference, we pool the two samples and use the overall sample proportion to estimate the single population parameter $p$.

The pooled sample proportion is

$$\hat{p} = \frac{\text{total count of successes in both samples}}{n_1 + n_2}$$

The test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$


Car accident example

The pooled sample proportion is

$$\hat{p} = \frac{3 + 13}{123 + 290} = \frac{16}{413} = 0.039$$

The z statistic is

$$z = \frac{0.024 - 0.045 - 0}{\sqrt{0.039 \times 0.961 \times \left(\frac{1}{123} + \frac{1}{290}\right)}} = -1.01$$

To get the p-value for the two-sided test, we look for the area under a standard normal curve that is farther away from 0 than 1.01 in either direction.

Table A gives .156 as the area to the left of $-1.01$.

p-value $= 2 \times 0.156 = 0.312$

We cannot reject the null hypothesis. This particular set of sample data does not provide evidence that the proportion of children dying in car accidents differs between the population of those wearing seatbelts at the time of the accident and the population of those not.


Contingency Tables and the Chi-square test

An equivalent way of comparing two population proportions that generalizes to more than two populations.

Begin by presenting the data as a two-way table, with rows representing levels of one variable and columns representing levels of the other.

Seatbelt example

Seatbelts    Died    Did not die    Total
Yes          3       120            123
No           13      277            290
Total        16      397            413

To test the hypotheses

$H_0: p_1 - p_2 = 0$ or $H_0: p_1 = p_2$
$H_a: p_1 - p_2 \neq 0$ or $H_a: p_1 \neq p_2$

using the two-way table, we must compute the expected counts. These are the counts we would expect, except for random variation, if $H_0$ were true.

$$\text{expected count} = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

If $H_0$ were true, there would be just one $p$ shared by both populations. Our best estimate is again the pooled sample proportion $\hat{p} = 0.039$.


Seatbelt example

Observed counts (for reference)

Seatbelts    Died    Did not die    Total
Yes          3       120            123
No           13      277            290
Total        16      397            413

Expected counts

Seatbelts    Died    Did not die    Total
Yes          4.8     118.2          123
No           11.3    278.7          290
Total        16      397            413

These values use the rounded $\hat{p} = 0.039$ (for example, $290 \times 0.039 \approx 11.3$); carrying full precision in the formula gives 11.2 and 278.8 in the No row.


The Chi-Square Test

Recall that the expected counts were computed under the assumption that the null hypothesis was true.

We can test the null hypothesis by determining whether the differences between the observed and expected counts are too large to be likely to be due to chance.

Notation
$O_i$ is the observed count in cell …
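
The confidence interval and pooled z test from the car accident example can be checked numerically. The following is a minimal sketch in Python, not part of the lecture: the function names `two_proportion_ci` and `two_proportion_z_test` are hypothetical, and the standard normal cutoff and tail area come from the standard library's `statistics.NormalDist` rather than from Table A.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, level=0.95):
    """Large-sample confidence interval for p1 - p2 (unpooled standard error).

    Hypothetical helper: x1, x2 are success counts; n1, n2 are sample sizes.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    # z* is the upper (1 - C)/2 standard normal cutoff
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p1_hat - p2_hat
    return diff - z_star * se, diff + z_star * se


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of H0: p1 = p2 using the pooled sample proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat - 0) / se
    # area farther from 0 than |z| in either direction
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Seatbelt data: 3 deaths out of 123 (seatbelts), 13 out of 290 (no seatbelts)
low, high = two_proportion_ci(3, 123, 13, 290)
z, p_value = two_proportion_z_test(3, 123, 13, 290)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

Because the code carries the unrounded sample proportions, its results differ slightly from the hand calculation, which rounds each proportion to three decimals first: z comes out near -0.98 rather than -1.01. The conclusion is unchanged, since the interval still contains 0 and the p-value is well above .05.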
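
The expected-count rule (row total times column total, divided by table total) can be applied to every cell of the seatbelt table at once. This is an illustrative Python sketch under the assumption that the table is stored as a list of rows; the function name `expected_counts` is hypothetical.

```python
def expected_counts(observed):
    """Expected cell counts under H0 for a two-way table of observed counts.

    Each expected count is (row total * column total) / table total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    table_total = sum(row_totals)
    return [[r * c / table_total for c in col_totals] for r in row_totals]


# Rows: seatbelts Yes / No; columns: Died / Did not die
observed = [[3, 120], [13, 277]]
expected = expected_counts(observed)
for label, row in zip(["Yes", "No"], expected):
    print(label, [round(value, 1) for value in row])
```

The function works from the exact totals, so its values can differ in the last digit from a table built with a rounded pooled proportion.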