# VCU STAT 210 - Lecture17 (54 pages)

- Pages: 54
- School: Virginia Commonwealth University
- Course: Stat 210 - Basic Practice of Statistics

STAT 210 Lecture 17: Describing Relationships for Categorical Variables (October 4, 2017)

### Test 3

- Friday, October 6
- Covers chapter 5, pages 99-138
- Combination of multiple choice questions and short answer questions and problems
- Formulas provided; please bring a calculator and writing instrument

### Practice Problems

- Pages 130 through 137
- Relevant problems: V 12 through V 16
- Recommended problems: V 12, V 15 and V 16

### Additional Reading and Examples

- Read pages 127 through 129

Top Hat

### Motivating Example

Over the years, much debate has occurred on whether such demographics as race, sex, religious preference, sexual preference, etc. should impact decisions on whether a student is admitted to a college or university. In cases where discrimination is charged, statistics can be used to analyze whether discrimination has occurred and, if so, the extent of the discrimination.

### Quantitative Variables

Everything to this point has assumed two quantitative variables: an independent or explanatory variable X and a dependent or response variable Y. We have talked about how to describe the relationship between the two variables (direction, form and strength) and how the scatterplot, correlation coefficient and regression line can be used to help do this.

### Categorical Data

Now suppose we have two qualitative or categorical variables: the variables vary in name but not in magnitude, implying that they cannot be ranked. All we can do is name the categories and count the number of observations falling in each category. The question remains: is there a relationship between the two variables?

With two variables, we can count the number of observations that fall in each pair of categories. The counts are displayed in a two-way table.

| | Freshman | Sophomore | Junior | Senior |
|---|---|---|---|---|
| Warning | 48 | 36 | 15 | 23 |
| Probation | 29 | 42 | 12 | 14 |
| Good standing | 71 | 37 | 18 | 62 |

### Marginal Distribution

There exists a marginal distribution for each variable. A marginal distribution lists the categories of the variable together with the frequency (count) or relative frequency (percentage) of observations in each category.

| | Freshman | Sophomore | Junior | Senior | Total |
|---|---|---|---|---|---|
| Warning | 48 | 36 | 15 | 23 | 122 |
| Probation | 29 | 42 | 12 | 14 | 97 |
| Good standing | 71 | 37 | 18 | 62 | 188 |
| Total | 148 | 115 | 45 | 99 | 407 |

### Example 31

- Variable 1: Smoking Status (Smoker, Nonsmoker)
- Variable 2: Cough Status (Cougher, Noncougher)

Two-way table:

| | Cough | No Cough | Total |
|---|---|---|---|
| Smoker | 43 | 43 | 86 |
| Nonsmoker | 19 | 95 | 114 |
| Total | 62 | 138 | 200 |

Marginal distribution for smoking status:

| | Frequency | Relative Frequency |
|---|---|---|
| Smoker | 86 | 86/200 = 43% |
| Nonsmoker | 114 | 114/200 = 57% |

Marginal distribution for coughing status:

| | Frequency | Relative Frequency |
|---|---|---|
| Cough | 62 | 62/200 = 31% |
| No Cough | 138 | 138/200 = 69% |

### Conditional Distribution

For a specific category of variable 1, calculate the conditional distribution (conditioned on the category of variable 1) of the other variable. This can be done for each category of variable 1. The conditional distributions can be in terms of frequencies (counts) or relative frequencies (percentages).

Conditional distributions of academic year (variable 2) for each academic status (variable 1), as row percentages:

| | Freshman | Sophomore | Junior | Senior |
|---|---|---|---|---|
| Warning | 39 | 30 | 12 | 19 |
| Probation | 30 | 43 | 12 | 15 |
| Good standing | 38 | 20 | 9 | 33 |

If the conditional distributions of variable 2 are nearly the same for each category of variable 1, then we say that there is not an association between the two variables. If there are significant differences in the conditional distributions of variable 2 for the different categories of variable 1, then we say that there is an association between the two variables.

In the table above, the conditional distributions are not all the same: of those in good standing, there are fewer sophomores than for warning and probation, and there are more seniors than for warning and probation.

### Example 32

Two-way table:

| | Cough | No Cough |
|---|---|---|
| Smoker | 43 | 43 |
| Nonsmoker | 19 | 95 |

Conditional distribution of coughing status for SMOKERS:

| | Frequency | Relative Frequency |
|---|---|---|
| Cough | 43 | 43/86 = 50% |
| No Cough | 43 | 43/86 = 50% |

Conditional distribution of coughing status for NONSMOKERS:

| | Frequency | Relative Frequency |
|---|---|---|
| Cough | 19 | 19/114 = 17% |
| No Cough | 95 | 95/114 = 83% |

Side by side:

| | Conditional Dist. for Smokers | Conditional Dist. for Nonsmokers |
|---|---|---|
| Cough | 50% | 17% |
| No Cough | 50% | 83% |

Is there an association between the two variables?

Top Hat

YES. Since the conditional distributions are not the same, there is a significant association between smoking status and cough status.

Top Hat

### Simpson's Paradox

There are two categorical variables, and we observe a relationship between the two variables. Now we divide the data set up into subgroups, and when we do so the relationship that we observe reverses. This reversing of the relationship is referred to as Simpson's Paradox.

There exists a lurking variable that creates a reversal in the direction of a relationship between two variables when the lurking variable is ignored, as opposed to the relationship between the two variables when the lurking variable is considered. The lurking variable creates subgroups, and failure to take the lurking variable into consideration can lead to misleading conclusions regarding the association between the two variables.

### Example 33 and Motivating Example

Let's first justify the claim of the women's group.

- Of the 360 men who applied to college, 18 + 180 = 198 were accepted. This is a 198/360 = 55% acceptance rate.
- Of the 200 women who applied to college, 24 + 64 = 88 were accepted. This is an 88/200 = 44% acceptance rate.
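
The arithmetic in Examples 31 through 33 can be checked with a short Python sketch. This is an illustration rather than part of the lecture: the function and variable names are assumptions made up for this example, and only the counts come from the slides.

```python
# Two-way table from Examples 31 and 32 (counts taken from the lecture).
# Rows: smoking status; columns: cough status.
table = {
    "Smoker": {"Cough": 43, "No Cough": 43},
    "Nonsmoker": {"Cough": 19, "No Cough": 95},
}


def row_marginal(tbl):
    """Marginal counts of the row variable (sum across each row)."""
    return {row: sum(cells.values()) for row, cells in tbl.items()}


def col_marginal(tbl):
    """Marginal counts of the column variable (sum down each column)."""
    totals = {}
    for cells in tbl.values():
        for col, count in cells.items():
            totals[col] = totals.get(col, 0) + count
    return totals


def to_percent(counts):
    """Convert counts to relative frequencies in percent."""
    total = sum(counts.values())
    return {key: 100 * value / total for key, value in counts.items()}


def conditional(tbl, row):
    """Conditional distribution (percent) of the column variable given one row."""
    return to_percent(tbl[row])


# Marginal distributions (Example 31).
smoking_marginal = to_percent(row_marginal(table))
cough_marginal = to_percent(col_marginal(table))

# Conditional distributions of cough status given smoking status (Example 32).
cond_smokers = conditional(table, "Smoker")
cond_nonsmokers = conditional(table, "Nonsmoker")

# Aggregate acceptance rates quoted in Example 33 (accepted / applied).
men_rate = 100 * (18 + 180) / 360
women_rate = 100 * (24 + 64) / 200

for label, dist in [
    ("Marginal, smoking status", smoking_marginal),
    ("Marginal, cough status", cough_marginal),
    ("Cough status given Smoker", cond_smokers),
    ("Cough status given Nonsmoker", cond_nonsmokers),
]:
    print(label, {key: f"{value:.0f}%" for key, value in dist.items()})
print(f"Acceptance rate, men: {men_rate:.0f}%; women: {women_rate:.0f}%")
```

Comparing `cond_smokers` with `cond_nonsmokers` is the check the lecture uses for association: the two conditional distributions differ, so the variables are associated. The sketch stops at the aggregate acceptance rates for Example 33 because the subgroup applicant counts needed to show the reversal do not appear in this preview.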
