STAT 210
Lecture 17
Describing Relationships For Categorical Variables
October 4, 2017

Test 3
Friday, October 6
Covers chapter 5, pages 99 – 138
Combination of multiple choice questions and short answer questions and problems.
Formulas provided, please bring calculator and writing instrument.

Practice Problems
Pages 130 through 137
Relevant problems: V.12 through V.16
Recommended problems: V.12, V.15 and V.16

Additional Reading and Examples
Read pages 127 through 129

Top Hat 2

Motivating Example
Over the years much debate has occurred on whether such demographics as race, sex, religious preference, sexual preference, etc. should impact decisions on whether a student is admitted to a college or university. In cases where discrimination is charged, statistics can be used to analyze whether discrimination has occurred and if so the extent of the discrimination.

Quantitative Variables
Everything to this point has assumed two quantitative variables, an independent (or explanatory) variable X and a dependent (or response) variable Y. We have talked about how to describe the relationship between the two variables (direction, form and strength) and how the scatterplot, correlation coefficient and regression line can be used to help do this.

E. Categorical Data
Now suppose we have two qualitative or categorical variables: the variables vary in name, but not in magnitude, implying that they cannot be ranked.
All we can do is name the categories and count the number of observations falling in each category.
The question remains: is there a relationship between the two variables?

Categorical Data
With two variables, we can count the number of observations that fall in each pair of categories. The counts are displayed in a two-way table.

Categorical Data
                Freshman  Sophomore  Junior  Senior
Warning               48         36      15      23
Probation             29         42      12      14
Good standing         71         37      18      62

Marginal Distribution
There exists a marginal distribution for each variable.
A marginal distribution lists the categories of the variable together with the frequency (count) or relative frequency (percentage) of observations in each category.

Categorical Data
                Freshman  Sophomore  Junior  Senior  Total
Warning               48         36      15      23    122
Probation             29         42      12      14     97
Good standing         71         37      18      62    188
Total                148        115      45      99    407

Example 31
Variable 1: Smoking Status (Smoker, Nonsmoker)
Variable 2: Cough Status (Cougher, Noncougher)

Example 31
Two-Way Table
             Cough  No Cough  Total
Smoker          43        43     86
Nonsmoker       19        95    114

Example 31
Marginal Distribution for Smoking Status
             Frequency  Relative Frequency
Smoker              86  86/200 = 43%
Nonsmoker          114  114/200 = 57%

Example 31
Two-Way Table
             Cough  No Cough
Smoker          43        43
Nonsmoker       19        95
Total           62       138

Example 31
Marginal Distribution for Coughing Status
             Frequency  Relative Frequency
Cough               62  62/200 = 31%
No Cough           138  138/200 = 69%

Conditional Distribution
For a specific category of variable 1, calculate the conditional distribution (conditioned on the category of variable 1) of the other variable.
This can be done for each category of variable 1.
The conditional distributions can be in terms of frequencies (counts) or relative frequencies (percentages).

Conditional Distributions
Variable 1 = Academic Status (rows), Variable 2 = Academic Year (columns)
                Freshman  Sophomore  Junior  Senior
Warning              39%        30%     12%     19%
Probation            30%        43%     12%     15%
Good standing        38%        20%      9%     33%

Conditional Distribution
If the conditional distributions of variable 2 are nearly the same for each category of variable 1, then we say that there is not an association between the two variables.
If there are significant differences in the conditional distributions of variable 2 for the different categories of variable 1, then we say that there is an association between the two variables.

Conditional Distributions
Variable 1 = Academic Status (rows), Variable 2 = Academic Year (columns)
                Freshman  Sophomore  Junior  Senior
Warning              39%        30%     12%     19%
Probation            30%        43%     12%     15%
Good standing        38%        20%      9%     33%
Above the conditional distributions are not all the same: of those in good standing, there are fewer sophomores than for warning and probation, and there are more seniors than for warning and probation.

Example 32
Two-Way Table
             Cough  No Cough
Smoker          43        43
Nonsmoker       19        95

Example 32
Conditional distribution of coughing status for SMOKERS.
             Frequency  Relative Frequency
Cough               43  43/86 = 50%
No Cough            43  43/86 = 50%

Example 32
Conditional distribution of coughing status for NONSMOKERS.
             Frequency  Relative Frequency
Cough               19  19/114 = 17%
No Cough            95  95/114 = 83%

Example 32
             Conditional Dist.  Conditional Dist.
             for Smokers        for Nonsmokers
Cough        50%                17%
No Cough     50%                83%
Is there an association between the two variables ???

Top Hat

Example 32
YES
Since the conditional distributions are not the same, then there is a
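
The marginal and conditional distributions in Examples 31 and 32 can be checked with a few lines of code. The following is a minimal sketch in plain Python, not part of the lecture; the names `counts`, `row_marginal`, `column_marginal` and `conditional` are illustrative assumptions.

```python
# Two-way table from Examples 31 and 32: counts[smoking status][cough status].
counts = {
    "Smoker": {"Cough": 43, "No Cough": 43},
    "Nonsmoker": {"Cough": 19, "No Cough": 95},
}


def grand_total(table):
    """Total number of observations in the two-way table."""
    return sum(sum(row.values()) for row in table.values())


def row_marginal(table):
    """Marginal distribution of the row variable: (count, relative frequency)."""
    total = grand_total(table)
    return {
        name: (sum(row.values()), sum(row.values()) / total)
        for name, row in table.items()
    }


def column_marginal(table):
    """Marginal distribution of the column variable: (count, relative frequency)."""
    total = grand_total(table)
    column_counts = {}
    for row in table.values():
        for name, n in row.items():
            column_counts[name] = column_counts.get(name, 0) + n
    return {name: (n, n / total) for name, n in column_counts.items()}


def conditional(table, row_category):
    """Conditional distribution of the column variable, given one row category."""
    row = table[row_category]
    row_total = sum(row.values())
    return {name: n / row_total for name, n in row.items()}


# Marginal distributions (Example 31).
for name, (n, share) in row_marginal(counts).items():
    print(f"{name}: {n} ({share:.0%})")
for name, (n, share) in column_marginal(counts).items():
    print(f"{name}: {n} ({share:.0%})")

# Conditional distributions of coughing status (Example 32).
for status in counts:
    shares = conditional(counts, status)
    print(status, {name: f"{share:.0%}" for name, share in shares.items()})
```

Run as written, the script reproduces the percentages on the slides: 43% and 57% for smoking status, 31% and 69% for coughing status, 50%/50% for smokers and 17%/83% for nonsmokers. Note that each conditional distribution divides by its own row total (86 or 114), while the marginal distributions divide by the grand total (200).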