Statistics 303
Chapter 9
Two-Way Tables

Relationships Between Two Categorical Variables
•Relationships between two categorical variables
–Depending on the situation, one of the variables is the explanatory variable and the other is the response variable.
–In this case, we look at the percentages of one variable for each level of the other variable.
–Examples:
•Gender and Soda Preference
•Country of Origin and Marital Status
•Smoking Habits and Socioeconomic Status

Relationships Between Two Categorical Variables
•Relationships between two categorical variables
–A two-way table can summarize the data for relationships between two categorical variables.
•Example: Gender and Highest Degree Obtained

RS HIGHEST DEGREE * RESPONDENTS SEX Crosstabulation
Count
                      RESPONDENTS SEX
RS HIGHEST DEGREE     male     female    Total
lt high school        83       106       189
high school           405      542       947
junior college        56       76        132
bachelor              160      226       386
graduate degree       100      93        193
Total                 804      1043      1847

SPSS OUTPUT
•Example: Percents

RS HIGHEST DEGREE * RESPONDENTS SEX Crosstabulation
                                                  RESPONDENTS SEX
RS HIGHEST DEGREE                                 male      female    Total
lt high school    Count                           83        106       189
                  % within RS HIGHEST DEGREE      43.9%     56.1%     100.0%
                  % within RESPONDENTS SEX        10.3%     10.2%     10.2%
high school       Count                           405       542       947
                  % within RS HIGHEST DEGREE      42.8%     57.2%     100.0%
                  % within RESPONDENTS SEX        50.4%     52.0%     51.3%
junior college    Count                           56        76        132
                  % within RS HIGHEST DEGREE      42.4%     57.6%     100.0%
                  % within RESPONDENTS SEX        7.0%      7.3%      7.1%
bachelor          Count                           160       226       386
                  % within RS HIGHEST DEGREE      41.5%     58.5%     100.0%
                  % within RESPONDENTS SEX        19.9%     21.7%     20.9%
graduate degree   Count                           100       93        193
                  % within RS HIGHEST DEGREE      51.8%     48.2%     100.0%
                  % within RESPONDENTS SEX        12.4%     8.9%      10.4%
Total             Count                           804       1043      1847
                  % within RS HIGHEST DEGREE      43.5%     56.5%     100.0%
                  % within RESPONDENTS SEX        100.0%    100.0%    100.0%

Review of Two-Way Tables
•Two-way tables come about when we are interested in the relationship between two categorical variables.
–One of the variables is the row variable.
–The other is the column variable.
–The combination of a row variable and a column variable is a cell.

Review of Two-Way Tables
•Example:

GENDER * TOMATOES Crosstabulation
Count
              TOMATOES
GENDER        N       Y       Total
F             11      8       19
M             6       13      19
Total         17      21      38

(Slide labels: GENDER is the row variable and TOMATOES is the column variable. The right margin holds the row totals, the bottom margin holds the column totals, 38 is the overall total, and the four interior counts are the cells.)

Chi-Squared Test for Independence
•To test whether or not there is a relationship between the row variable and the column variable, we use the chi-square statistic (X²), which can be calculated by computer.
•The null hypothesis (H0) is that there is no relationship between the two variables, i.e. the variables are independent.
•The alternative hypothesis (HA) is that there is a relationship, i.e. the variables are not independent.
•For 2x2 tables, we require that all four expected cell counts be 5 or more.
•For tables larger than 2x2, we will use this approximation whenever the average of the expected counts is 5 or more and the smallest expected count is 1 or more.

Chi-Squared Test for Independence
•A comparison of the proportion of “successes” in two populations leads to a 2x2 table.
•We can compare two population proportions either by the chi-square test or by the two-sample z test from section 8.2.
•These tests always give exactly the same result.
•The chi-square statistic is equal to the square of the z statistic, and χ²(1) critical values are equal to the squares of the corresponding N(0,1) critical values.
•Advantage of the z test: We can test either one-sided or two-sided alternatives.
•The chi-square test always tests the two-sided alternative.
•Advantage of chi-square: We can compare more than two populations.
•The z test compares only two populations.

Chi-Squared Test for Independence
•The chi-square statistic compares the observed cell counts with the expected cell counts.
•The chi-square statistic is a measure of how much the observed cell counts in a two-way table diverge from the expected cell counts.
•If the expected counts and the observed counts are very different, a large value of X² will result. Large values of X² provide evidence against the null hypothesis.

expected cell count = (row total × column total) / n

X² = Σ (observed count – expected count)² / expected count

Chi-Square Test
•Like the t distributions, the χ² distributions are described by a single parameter, degrees of freedom (df).
•The degrees of freedom for the chi-square test are df = (r – 1)*(c – 1) = (#rows – 1)*(#columns – 1).
•For a 2x2 table, we have df = (2 – 1)(2 – 1) = 1.
•The p-value is P(χ² ≥ X²), determined by looking in Table F.
•Notice Table F gives probabilities to the right. Also, note χ² distributions take only nonnegative values and are skewed to the right.

Analysis in SPSS gives us:

Chi-Square Tests
                          Value     df    Asymp. Sig.    Exact Sig.    Exact Sig.
                                          (2-sided)      (2-sided)     (1-sided)
Pearson Chi-Square        2.661(b)  1     .103
Continuity Correction(a)  1.703     1     .192
Likelihood Ratio          2.695     1     .101
Fisher's Exact Test                                      .191          .096
N of Valid Cases          38
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 8.50.

We are interested in the Pearson Chi-Square row.
The p-value is 0.103. Because this is larger than 0.05, we fail to reject H0 and conclude that there is no significant relationship between gender and tomato enjoyment.

Link between Diabetes and Heart Disease?
•Background: Contradictory opinions:
•1. A diabetic’s risk of dying after a first heart attack is the same as that of someone without diabetes. There is no link between diabetes and heart disease.
vs.
•2. Diabetes takes a heavy toll on the body, and diabetes patients often suffer heart attacks and strokes or die from cardiovascular complications at a much younger age.
•So we use a hypothesis test based on the latest data to see which conclusion is right.
•There are a total of 5167 managed-care patients, of whom 1131 are non-diabetics and 4036 are diabetics. Among the non-diabetic patients, 42% of them had their
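The gender and tomatoes example above can be checked by hand. The following is a minimal Python sketch, not the SPSS procedure itself: the function name `chi_square_independence` is hypothetical, the p-value shortcut through `math.erfc` is valid only for df = 1, and treating the "N" column as the "success" in the z test is an arbitrary choice made for illustration.

```python
import math


def chi_square_independence(observed):
    """Return (X2, df, expected) for a two-way table of counts.

    `observed` is a list of rows, each a list of cell counts.
    Hypothetical helper written for this example only.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # expected cell count = (row total * column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # X2 = sum over cells of (observed - expected)^2 / expected
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df, expected


# GENDER * TOMATOES counts from the slide: rows F, M; columns N, Y.
observed = [[11, 8], [6, 13]]
x2, df, expected = chi_square_independence(observed)

# Right-tail probability P(chi-square >= X2).
# Assumption: this closed form holds only when df = 1.
p_value = math.erfc(math.sqrt(x2 / 2))

# Two-sample z test on the same data, comparing the proportion
# answering "N" among F with the proportion among M.
n1, n2 = 19, 19
p1, p2 = 11 / n1, 6 / n2
pooled = (11 + 6) / (n1 + n2)
z = (p1 - p2) / math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

print(f"X2 = {x2:.3f}, df = {df}, p-value = {p_value:.3f}")
print(f"minimum expected count = {min(min(row) for row in expected)}")
print(f"z = {z:.3f}, z squared = {z * z:.3f}")
```

Under these assumptions the sketch should reproduce the Pearson Chi-Square row of the SPSS output (2.661 with df = 1 and a p-value of .103) and the minimum expected count of 8.50. The squared z statistic should also equal X², which is the relationship between the two tests stated on the slides.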