Chapter 26: Comparing Counts
Chapter26 Presentation 1213. Copyright 2009 Pearson Education, Inc.

Test of Independence

Contingency tables categorize counts on two categorical variables so that we can see whether the distribution of counts on one variable is contingent on the other.

A test of independence examines whether there is a significant association between a pair of categorical variables.

In Chapter 3 we saw an example of survival on board the Titanic. Was the survival proportion associated with the passengers' ticket class?

In a test of independence of two categorical variables, the generic hypotheses are:

    H0: Row and column classifications are independent.
    HA: Row and column classifications are not independent (i.e., they are associated with each other).

Assumptions and Conditions

Counted Data Condition: Check that the data are counts for the categories of a categorical variable.

Independence Assumption: The counts in the cells should be independent of each other.

    Randomization Condition: The counts should be a random sample from some population.
    10% Condition: The sample should consist of less than 10% of the population of interest.

Question: does the Titanic data meet the two conditions above?

Sample Size Assumption: We must have enough data for the methods to work.

    Expected Cell Frequency Condition: We should expect to see at least 5 individuals in each cell.

What Is Meant by Expected Cell Count?

Given the row totals and column totals in the contingency table, what cell counts would we expect to see inside the table if the row and column classifications were independent of each other?

What cell counts would represent perfect independence of row and column classifications while maintaining the row and column totals?

Calculating Expected Cell Counts

If rows and columns were really independent, you should be able to take the probability of being in a particular row, times the probability of being in a particular column, times the total number of observations, to get the count for the corresponding cell. This is the expected cell count. It is computed for every cell in the contingency table.

Mechanically, it is simpler (and equivalent) to calculate for each cell:

    $\text{Exp} = \dfrac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$

Calculations

How different are the actual observed cell counts from these expected cell counts?

It is natural to look at the difference between the observed and expected counts in each cell: $(\text{Obs} - \text{Exp})$.

These differences are residuals, so adding them up over all cells gives a sum of 0.

We'll handle the residuals as we did in regression, by squaring them: $(\text{Obs} - \text{Exp})^2$.

To get an idea of the relative sizes of the differences, we divide each squared quantity by the expected count for that cell:

    $\dfrac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$

We then add up all of these values. The result is the test statistic, called the Chi-square (or Chi-squared) statistic:

    $\chi^2 = \sum_{\text{all cells}} \dfrac{(\text{Obs} - \text{Exp})^2}{\text{Exp}}$

Chi-square models are a family of distributions indexed by degrees of freedom, much like the t-distribution. In this case:

    $df = (\text{Rows} - 1)(\text{Cols} - 1)$

Calculations and Hypotheses

Recall our null and alternative hypotheses:

    H0: Row and column classifications are independent.
    HA: Row and column classifications are not independent (i.e., they are associated with each other).

Use the Chi-square test statistic to find the P-value for this hypothesis test.

Large Chi-square values mean lots of deviation from the null hypothesis, so they give small P-values.

A good Chi-square calculator for finding P-values can be found at http://www.stat.tamu.edu/~west/applets/chisqdemo.html

Chi-Square P-Values

The Chi-square statistic is used only for testing hypotheses, not for constructing confidence intervals.

If the observed counts don't match the expected counts, the Chi-square test statistic will be large. It can't be "too small," and it is never negative.

If the calculated value of $\chi^2$ is large enough, we'll reject the null hypothesis.

The Chi-Square Statistic in Use

The following table shows the results of a survey of randomly selected Chevy, Ford, and Toyota owners about their maintenance experience during the first 5 years of owning their new car. Is maintenance experience associated with the type of car? Test with $\alpha = 0.05$.

                                      Type of Car
    Maintenance Experience            Chevy   Ford   Toyota
    Regular Maint. Only                  42     34       70
    Regular and Unexpected Maint.        24     18       12

    H0: Maintenance experience is independent of the type of car.
    HA: Maintenance experience is not independent of (i.e., is associated with) the type of car.

Find the Expected Cell Counts

Observed counts with row and column totals:

                                      Type of Car
    Maintenance Experience            Chevy   Ford   Toyota   Row Total
    Regular Maint. Only                  42     34       70         146
    Regular and Unexpected Maint.        24     18       12          54
    Column Total                         66     52       82         200

Number of observations: 200.

Expected cell count:

    $E_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{\text{number of obs}}$

For example, the expected count for Chevy owners with regular maintenance only is $(146 \times 66) / 200 = 48.18$.

Expected values:

                                      Type of Car
    Maintenance Experience            Chevy    Ford    Toyota
    Regular Maint. Only               48.18   37.96     59.86
    Regular and Unexpected Maint.     17.82   14.04     22.14

Logic Behind the Expected Cell Count Formula

What is the probability of being in the top row? $146 / 200 = 0.73$

What is the probability of being in the left column? $66 / 200 = 0.33$

So if row and column classifications are really independent, the probability of an observation being in the top-row, left-column cell is $0.73 \times 0.33 = 0.2409$.

So how many cars out of the 200 should have ended up in that cell? $0.2409 \times 200 = 48.18$, which matches the expected count computed for that cell.
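
The expected-count calculation for the car maintenance survey can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the names `observed`, `expected_counts`, `residuals`, and `condition_met` are made up for this sketch. It applies the (row total x column total) / grand total rule to every cell, confirms that the residuals (Obs - Exp) sum to zero, and checks the Expected Cell Frequency Condition.

```python
# Observed counts from the maintenance survey.
# Rows: maintenance experience; columns: Chevy, Ford, Toyota.
observed = [
    [42, 34, 70],  # regular maintenance only
    [24, 18, 12],  # regular and unexpected maintenance
]


def expected_counts(table):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)

# Residuals (Obs - Exp); summed over the whole table they cancel out.
residuals = [
    [obs - exp for obs, exp in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
residual_sum = sum(sum(row) for row in residuals)

# Expected Cell Frequency Condition: at least 5 expected in every cell.
condition_met = all(exp >= 5 for row in expected for exp in row)

for row in expected:
    print([round(exp, 2) for exp in row])
print("residuals sum to zero:", abs(residual_sum) < 1e-9)
print("every expected count is at least 5:", condition_met)
```

The printed rows should agree with the expected values worked out by hand for this table, which is a quick way to catch arithmetic slips.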
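
To carry the car maintenance example through to a decision, the following Python sketch computes the Chi-square statistic, the degrees of freedom, and a P-value. It is a hedged illustration rather than the method used in the slides, which point to an online calculator instead. The function name `chi_square_statistic` is hypothetical. The P-value line relies on an assumption worth stating loudly: for exactly 2 degrees of freedom, the upper-tail Chi-square probability has the closed form exp(-x/2), so no statistics library is needed here; for any other df a library routine or table would be required.

```python
import math

# Observed counts (rows: maintenance experience; columns: Chevy, Ford, Toyota).
observed = [
    [42, 34, 70],  # regular maintenance only
    [24, 18, 12],  # regular and unexpected maintenance
]


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - exp) ** 2 / exp  # (Obs - Exp)^2 / Exp

    df = (len(table) - 1) * (len(table[0]) - 1)  # (Rows - 1)(Cols - 1)
    return statistic, df


chi2, df = chi_square_statistic(observed)

# Assumption: the closed form below is valid only when df == 2.
if df != 2:
    raise ValueError("closed-form P-value in this sketch requires df == 2")
p_value = math.exp(-chi2 / 2)

alpha = 0.05
reject_null = p_value < alpha

print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.4f}")
print("reject H0 at alpha = 0.05:", reject_null)
```

With these counts the statistic comes out near 10.83 on 2 degrees of freedom, and the P-value falls below 0.05, so under this sketch the null hypothesis of independence would be rejected.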