UF STA 3024 - Association between Two Categorical Variables

Chapter 10
Association between Two Categorical Variables
Contingency Tables and χ² (Chi-Square) Tests

What we have seen so far:
o In Chapters 3 and 11 we searched for association between two quantitative variables.
o In Chapter 12 we added one or more categorical variables to the quantitative predictors.
o In Chapter 13 we had a quantitative response and one or more categorical predictor variables, each with 2 or more categories.
o This was an extension of what we saw in Chapter 9, where we had a quantitative response and a categorical predictor that had 2 categories (populations) [with dependent or independent samples].
o In Chapters 8 and 9 we looked at the difference between proportions (i.e., the response is a categorical variable with 2 categories) of two populations (categorical predictor with 2 categories) with random samples (dependent or independent) from these populations.

Chapter 10 Fall 2007, Page 1 of 19

We now extend this to the case of a categorical predictor with 2 or more categories and a categorical response with 2 or more categories, where data from a random sample are summarized in an r × c contingency table.

Example: Last semester, after the weekend when the Gator Basketball team won the game that put them in the Final Four (which ended at 11:30 p.m.), 101 students in a Statistics class were asked to report their gender and whether they had watched the whole game, part of it, or none of it. The following table summarizes the responses to the question "Watched?" by gender:
                  Gender
                  Male    Female    Total
Whole game        10      21         31
Part of game      12      24         36
None               4      30         34
Total             26      75        101

To compare how much of the game each gender watched, we need to find percentages in each category. First we have to decide which variable is the response and which one is the predictor, so that we know what to put in the denominator of these proportions.

In this example,
o The response is how much of the game each student watched, and
o The predictor is gender.
o To compare the two genders we divide the number in each "cell" of the above table by the total number of students of that gender, i.e., divide the number of observations in each cell by the total in its predictor (gender) category.
o This division gives how much of the game was watched by each gender, i.e., the conditional distribution of the response:

Conditional Distribution of Response

                  Gender
Watched?          Male              Female            Total
Whole game        38.5% (10/26)     28.0% (21/75)     30.7% (31/101)
Part of game      46.2% (12/26)     32.0% (24/75)     35.6% (36/101)
None              15.4% (4/26)      40.0% (30/75)     33.7% (34/101)
Total             100.0% (26/26)    100.0% (75/75)    100.0% (101/101)

o In the above table, we see that the male students watched more of the game than the female students.
o Can we extend this to the whole population of males and the whole population of females?

The above data are from a sample. In order to extend the findings to the whole populations of male and female UF students we need the following:
o The data should be a simple random sample (SRS) from the population of interest. (Do you think that is the case?)
o If we can assume so, then we need to carry out a test of significance, to see if the differences are strong enough to extend to the populations.
o We will carry out a test of independence (no association) of the two variables vs. not independent (association).

If the two variables (gender and game watching) are independent of each other,
o Then we would expect to see the same percentage distribution of the response for both genders.
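The conditional distribution of the response can be reproduced with a few lines of Python. This is only a minimal sketch: the names `observed` and `conditional_distribution` are illustrative and not part of the notes. It divides each cell by its gender (column) total and also reports the overall percentages, which are the distribution both genders would share if the two variables were independent.

```python
# Observed counts from the basketball example: rows are the "Watched?"
# categories, columns are (Male, Female).
observed = {
    "Whole game": (10, 21),
    "Part of game": (12, 24),
    "None": (4, 30),
}

# Column totals are the denominators because gender is the predictor.
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)


def conditional_distribution(counts, col_totals, grand_total):
    """Return (male %, female %, overall %) for each response category."""
    result = {}
    for label, row in counts.items():
        by_gender = [100 * n / t for n, t in zip(row, col_totals)]
        overall = 100 * sum(row) / grand_total
        result[label] = (*by_gender, overall)
    return result


percentages = conditional_distribution(observed, col_totals, grand_total)
for label, (male, female, overall) in percentages.items():
    print(f"{label:<13}{male:6.1f}%{female:8.1f}%{overall:8.1f}%")
```

Swapping the roles of the variables (dividing by row totals instead) would answer a different question, namely the gender mix within each viewing category, which is why the response and predictor have to be chosen first.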
Thus we will have the following table of expected frequencies in each cell, calculated by assuming that the two variables are independent of each other.

Expected frequencies (assuming independence)

                  Gender
Watched?          Male            Female           Total
Whole game        8 (26×0.307)    23 (75×0.307)    31/101 = 30.7%
Part of game      9 (26×0.356)    27 (75×0.356)    36/101 = 35.6%
None              9 (26×0.337)    25 (75×0.337)    34/101 = 33.7%
Total             26              75               101

Expected frequencies are calculated using

$$\text{Exp} = \frac{(\text{Column Total}) \times (\text{Row Total})}{\text{Grand Total}}$$

Testing for Independence in Contingency Tables

Assumptions:
o Simple random sample from the population of interest
o Expected counts ≥ 5 in each cell (observed counts ≥ 5 in each cell is good)

Hypotheses:
Ho: The two variables are independent
Ha: The two variables are NOT independent

Test Statistic:

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$

where

$$\text{Expected} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Grand Total}}$$

P-Value: from the χ² tables with
df = (Number of rows - 1) × (Number of columns - 1) = (r - 1) × (c - 1)

Decision Rule: Reject Ho if p-value ≤ α, as usual.

Conclusion: Explain your decision in simple English, to the layman.

Example (Continued)

                  Observed Frequencies        Expected Frequencies
Watched           Gender                      Gender
Game?             Male    Female    Total     Male    Female    Total
Whole             10      21         31        7.98   23.02      31
Part              12      24         36        9.27   26.73      36
None               4      30         34        8.75   25.25      34
Total             26      75        101       26      75        101

Expected frequencies:

$$\text{Exp} = \frac{(\text{Col Total}) \times (\text{Row Total})}{\text{Grand Total}}$$

Now we can use a worksheet to find the calculated value of the test statistic, χ²calc:

Obs     Exp       (Obs - Exp)    (Obs - Exp)^2    (Obs - Exp)^2 / Exp
10       7.98       2.02           4.0804           0.5113
12       9.27       2.73           7.4529           0.8040
 4       8.75      -4.75          22.5625           2.5786
21      23.02      -2.02           4.0804           0.1773
24      26.73      -2.73           7.4529           0.2788
30      25.25       4.75          22.5625           0.8936
101     101         0              Not needed       χ²calc = 5.2436
(= n, always)  (= n, always)  (always)

Degrees of freedom = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2

The p-value:

$$p\text{-value} = P\left(\chi^2_{(2)} \ge \chi^2_{\text{calc}}\right) = P\left(\chi^2_{(2)} \ge 5.2436\right)$$

In the χ² table (see the table on page A4 of your text) we look for 5.2436 on the line with df = 2. It is not there. But we see that

$$P\left(\chi^2_{(2)} \ge 5.99\right) = 0.050$$
$$P\left(\chi^2_{(2)} \ge 5.2436\right) = p\text{-value}$$
$$P\left(\chi^2_{(2)} \ge 4.61\right) = 0.100$$

Hence 0.05 < p-value < 0.10.

Decision: Reject Ho at the 10% level of significance, but not at the 1% or 5% levels.

Conclusion: At the 10% level, the observed data indicate that there is a significant association between gender and the basketball watching habits of UF students. HOWEVER, since we do not have a simple random sample (in fact we may have a highly biased sample), we should not extend this conclusion to all UF students.

Example: Are income and happiness associated?

Some very important questions you should answer before you dive in (so that you can identify
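Returning to the basketball example, the steps of the test of independence (expected counts, test statistic, degrees of freedom, p-value) can be sketched in plain Python. The function names `chi_square_independence` and `p_value_df2` are hypothetical and chosen only for illustration. The p-value helper relies on the fact that a χ² distribution with 2 degrees of freedom has upper-tail probability exp(-x/2), so it is valid only when df = 2; other tables would need a statistics library or a printed table. Because the sketch keeps the expected counts unrounded, its statistic may differ slightly from a hand worksheet that rounds them to two decimals.

```python
import math

# Observed counts from the basketball example: rows are Whole / Part / None,
# columns are (Male, Female).
observed = [
    [10, 21],
    [12, 24],
    [4, 30],
]


def chi_square_independence(table):
    """Return (statistic, df, expected counts) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


def p_value_df2(statistic):
    """Upper-tail chi-square probability; this closed form holds only for df = 2."""
    return math.exp(-statistic / 2)


statistic, df, expected = chi_square_independence(observed)
if df != 2:
    raise ValueError("p_value_df2 applies only to tables with df = 2")
p_value = p_value_df2(statistic)
print(f"chi-square = {statistic:.4f}, df = {df}, p-value = {p_value:.4f}")
```

For these data the sketch gives a statistic of roughly 5.25 and a p-value of roughly 0.07, which lies between 0.05 and 0.10, in agreement with the table lookup.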

