Inference for Two-Way Tables
Analysis of Two-Way Tables
Sections 9.1 and 9.2

Categorical data summaries

Objective: Extend two-category, two-sample analysis

                       Column variable
Row variable      Sample 1     Sample 2     Row totals
Successes         X1           X2           X1 + X2
Failures          n1 - X1      n2 - X2      (n1 + n2) - (X1 + X2)
Column totals     n1           n2           n1 + n2

“Tabular organization” easily generalizes to:
• Multi-category data
• Multi-sample framework

Categorical data relationships

     H0                                          Ha
     p1 = p2                           versus    p1 ≠ p2
⇔    same conditional distributions    versus    varying conditional distributions
⇔    no association                    versus    association

“Tabular thinking” easily generalizes to:
• Multi-category data
• Multi-sample framework

Setup and notation for two-way tables

“r × c table”:

                       Column variable
Row variable      Col 1    …    Col c     Row totals
Row 1
…                        cells
Row r
Column totals                             n

• Joint distribution*: appears across cells
• Marginal distributions*: appear in the “margins”
• Conditional distributions*: appear in individual rows or columns
* Often displayed as percentages

Example: Wine purchases

Does wine pick depend on background music?
In a study of shopping impulses, n1 = 84 wine purchases with no music played, n2 = 75 with French music played, and n3 = 84 with Italian music played were each classified by the type of wine purchased.

                       Music
Wine           None    French    Italian    Row totals
French         30      39        30         99
Italian        11      1         19         31
Other          43      35        35         113
Col totals     84      75        84         243

Example: Wine purchases (continued)

Use joint and marginal distributions to explore the variables individually and taken together:

                       Music
Wine           None    French    Italian    Row totals
French         12%     16%       12%        41%
Italian        5%      0%        8%         13%
Other          18%     14%       14%        47%
Col totals     35%     31%       35%        100%

Example: Wine purchases (continued)

Use conditional distributions to explore relationships:

                       Music
Wine           None    French    Italian
French         36%     52%       36%
Italian        13%     1%        23%
Other          51%     47%       42%
Col totals     100%    100%      100%

Example: Business survival

Does an exclusive-territory clause affect the survival of a franchise? An SRS of n = 170 franchises was cross-classified by success and by the presence of an exclusive-territory clause in the contract.

                 Exclusive territory
Success        Yes     No      Row totals
Yes            108     15      123
No             34      13      47
Col totals     142     28      170

Example: Business survival (continued)

Conditional distributions:

                 Exclusive territory
Success        Yes     No
Yes            76%     54%
No             24%     46%
Col totals     100%    100%

The exclusive-territory clause appears to help franchises survive.

Sampling framework

Two different sampling frameworks:
• Wine purchases: Multiple, independent SRSs, with data classified into (the same) multiple categories. Extends the two-sample setup.
• Business survival: One SRS, with multiple (categorical) measurements made on each individual. Extends the setup underlying matched-pairs experiments.
The same inference procedures are used in both cases.

Approach to inference

Objective: Test H0: no association versus Ha: association
• Compare observed and expected cell counts under H0
• Standardize squared differences and aggregate across categories
• “Chi-square” statistic

Example: Wine purchases (continued)

Observed counts:

                       Music
Wine           None    French    Italian    Row totals
French         30      39        30         99
Italian        11      1         19         31
Other          43      35        35         113
Col totals     84      75        84         243

Expected counts under H0: no association (note: same totals):

                       Music
Wine           None    French    Italian    Row totals
French         34.2    30.6      34.2       99
Italian        10.7    9.6       10.7       31
Other          39.1    34.9      39.1       113
Col totals     84      75        84         243

Example: Wine purchases (continued)

Standardized squared differences:

                       Music
Wine           None    French    Italian    Row totals
French         0.52    2.33      0.52       3.38
Italian        0.01    7.67      6.40       14.08
Other          0.40    0.00      0.42       0.82
Col totals     0.93    10.01     7.35       X² = 18.28

Properties of the chi-square statistic

Assume either sampling framework from before. The chi-square statistic

    X² = Σ (observed count - expected count)² / expected count    (sum over all cells)

has an approximate chi-square (χ²) distribution with (r - 1)(c - 1) degrees of freedom.
• In general, χ²(k) denotes a χ² distribution with k degrees of freedom
• Always right-skewed and takes only positive values
Note: Both symbols X and χ are “chi,” but the former is uppercase.

Calculating χ² probabilities and critical values

Suppose V is χ²(k). In Excel:
• For c > 0, chidist(c, k) = P(V ≥ c)
• For 0 < α < 1, chiinv(α, k) is the c for which P(V ≥ c) = α

Chi-square test for two-way tables

• Assumptions: A valid sampling framework for two-way tables
• Hypotheses: H0: no association versus Ha: association
• Test statistic: X² = Σ (observed count - expected count)² / expected count
• P-value: P(V ≥ X²), where V is χ²(k) with k = (r - 1)(c - 1)
• ROT: Valid if both […] and every exp. count ≥ 1 (or just every exp. count ≥ 5 for a 2×2 table)

Example: Wine purchases (continued)

Data: Three independent SRSs of categorical data
Hypotheses: H0: no association versus Ha: association
Test statistic: X² = 18.28
P-value: P(V ≥ 18.28) = 0.001, with k = (r - 1)(c - 1) = 4 d.f.
Decision: Reject H0 at significance level α = 0.05, and conclude a dependency of wine pick on background music
ROT: […], min. exp. count = 9.6 ≥ 1

Supplemental analysis

When H0 is rejected, the individual standardized squared differences may identify details of the association.

Example: Wine purchases (continued)

                       Music
Wine           None    French    Italian    Row totals
French         0.52    2.33      0.52       3.38
Italian        0.01    7.67      6.40       14.08
Other          0.40    0.00      0.42       0.82
Col totals     0.93    10.01     7.35       X² = 18.28

7.67 + 6.40 = 14.07: two cells (Italian wine under French music and under Italian music) account for 77% of X² = 18.28. These are the cells with big differences between observed and expected counts.

Rethinking the two-sample z test for proportions

To test H0: p1 = p2 versus Ha: p1 ≠ p2 in the two-sample setup:
• Two-sample z test for proportions
• Chi-square test for two-way tables
Each procedure leads to the same P-value!

Example: Gender and garment labels (continued)

Do the genders respond differently to “No Sweat” garment labels?

“No Sweat”          Gender
influence?     Female    Male    Row totals
Likely         63        27      90
Unlikely       233       224     457
Col totals     296       251

That is: n1 = 296, X1 = 63, n2 = 251, and X2 = 27.
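
The chi-square calculation and its agreement with the two-sample z test can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the slides: the function names (chi2_sf, chi_square_test, two_sample_z_test) are hypothetical, the expected counts are taken as (row total × column total) / n, which is the usual no-association formula but is not spelled out above, and the tail probability comes from a closed-form finite series in place of Excel's chidist.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(V >= x) for V ~ chi-square(df), df a positive integer.

    Assumption: uses the textbook finite-series closed forms rather than a
    statistics library or Excel's chidist.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: P(|Z| >= sqrt(x)) + 2*phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    normal_tail = math.erfc(root / math.sqrt(2.0))
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return normal_tail + 2.0 * phi * total


def chi_square_test(observed):
    """Return (X2, df, P-value, expected counts) for an r x c table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under H0 (no association): row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum((o - e) ** 2 / e
             for obs_row, exp_row in zip(observed, expected)
             for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, chi2_sf(x2, df), expected


def two_sample_z_test(x1, n1, x2, n2):
    """Pooled two-sample z test of H0: p1 = p2 against a two-sided alternative."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Wine purchases: rows = wine (French, Italian, Other), columns = music
wine = [[30, 39, 30], [11, 1, 19], [43, 35, 35]]
wine_x2, wine_df, wine_p, wine_expected = chi_square_test(wine)
print(f"wine:   X2 = {wine_x2:.2f}, df = {wine_df}, P = {wine_p:.4f}")

# Garment labels: rows = (Likely, Unlikely), columns = (Female, Male)
gender = [[63, 27], [233, 224]]
g_x2, g_df, g_p, _ = chi_square_test(gender)
z, z_p = two_sample_z_test(63, 296, 27, 251)
print(f"gender: X2 = {g_x2:.3f}, z^2 = {z * z:.3f}, P(chi-square) = {g_p:.5f}, P(z) = {z_p:.5f}")
```

Run on the wine table, the sketch should reproduce X² ≈ 18.28 on 4 degrees of freedom. On the garment-label table, X² equals the square of the pooled z statistic, which is why the two procedures give the same two-sided P-value.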