Chapter 10: Analysis of Categorical Data
Fall 2011

Introduction

Case Study (Example)

In Costa Rica, the vampire bat Desmodus rotundus feeds on the blood of domestic cattle. If the bats respond to a hormonal signal, cows in estrous (in heat) may be bitten with a different probability than cows not in estrous.

| | Bitten by a bat | Not bitten by a bat | Total |
|---|---|---|---|
| In estrous | 15 | 7 | 22 |
| Not in estrous | 6 | 322 | 328 |
| Total | 21 | 329 | 350 |

The proportion of bitten cows among those in estrous is 15/22 = 0.682, while the proportion of bitten cows among those not in estrous is 6/328 = 0.018.

Case Study (Example)

In an experiment, fish are placed in a large tank for a period of time and some are eaten by large birds of prey. The fish are categorized by their level of parasitic infection: uninfected, lightly infected, or highly infected. It is to the parasite's advantage to be in a fish that is eaten, as this provides an opportunity to infect the bird in the parasite's next stage of life. The observed proportions of fish eaten are quite different among the categories.

| | Eaten | Not eaten | Total |
|---|---|---|---|
| Uninfected | 1 | 49 | 50 |
| Lightly infected | 10 | 35 | 45 |
| Highly infected | 37 | 9 | 46 |
| Total | 48 | 93 | 141 |

The proportions of eaten fish are, respectively, 1/50 = 0.02, 10/45 = 0.222, and 37/46 = 0.804.

Overview

We will study this data from two different points of view:

1. Two-sample problem ($p_1$ versus $p_2$)
   - $p_1 - p_2$
   - Relative risk
   - Odds ratio
2. Chi-square test
   - Goodness-of-fit test
   - Test for association between two categorical variables

Part I: Two-Sample Problem

10.7 Confidence Interval for $p_1 - p_2$

Setup

1. Population (picture on the board)
2. Sampled data

| | Condition | Not | Total |
|---|---|---|---|
| Sample 1 | $y_1$ | $n_1 - y_1$ | $n_1$ |
| Sample 2 | $y_2$ | $n_2 - y_2$ | $n_2$ |

Recall: One-Sample Confidence Interval

We pretend we have 4 more observations (i.e., the sample size is $n + 4$) and that out of those 4 extra observations there are 2 successes and 2 failures (i.e., the number of successes is $y + 2$):

$$\tilde{p} = \frac{y + 2}{n + 4} \qquad \text{and} \qquad SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}}$$

A 95% confidence interval for $p$ is $\tilde{p} \pm 1.96 \, SE_{\tilde{p}}$.

Confidence Interval for $p_1 - p_2$

Again we pretend that we have 4 more observations:

1. The 4 extra observations are split between the two samples ($n_1 + 2$ and $n_2 + 2$).
2. Out of the 4 extra observations there are 2 successes, split between the two samples ($y_1 + 1$ and $y_2 + 1$).

$$\tilde{p}_1 = \frac{y_1 + 1}{n_1 + 2} \qquad \tilde{p}_2 = \frac{y_2 + 1}{n_2 + 2}$$

A confidence interval for $p_1 - p_2$ is

$$(\tilde{p}_1 - \tilde{p}_2) \pm z_{\alpha/2} \, SE_{\tilde{p}_1 - \tilde{p}_2}$$

where

$$SE_{\tilde{p}_1 - \tilde{p}_2} = \sqrt{\frac{\tilde{p}_1(1 - \tilde{p}_1)}{n_1 + 2} + \frac{\tilde{p}_2(1 - \tilde{p}_2)}{n_2 + 2}}$$

Comment: This confidence interval formula is used for all confidence levels, not just 95%.

Example (on the board)

Find a 95% confidence interval for the difference in probabilities of being bitten by a vampire bat between cows in estrous and those not.

Example: Interpretation

In the study setting in Costa Rica, we are 95% confident that the probability that a cow in estrous is bitten by a vampire bat is larger than the probability of a cow not in estrous being bitten by an amount between 0.456 and 0.835.

10.9 Relative Risk and the Odds Ratio

Introduction

$p_1 - p_2$ provides information about the magnitude of the difference between $p_1$ and $p_2$. There are other ways to compare these values, e.g., the ratio.

Relative Risk

Relative risk is a ratio of two probabilities, both of the same event but under different conditions: $p_1 / p_2$.

For example, if the probability of a low birthweight baby given that the mother is a smoker is twice as high as if the mother is a nonsmoker, the relative risk of low birthweight for smokers relative to nonsmokers is 2.

To estimate the relative risk, simply take the ratio of the estimated proportions, $\hat{p}_1 / \hat{p}_2$.

Example: Bat Bites

| | Bitten by a bat | Not bitten by a bat | Total |
|---|---|---|---|
| In estrous | 15 | 7 | 22 |
| Not in estrous | 6 | 322 | 328 |
| Total | 21 | 329 | 350 |

$\hat{p}_1 = 15/22 = 0.682$ and $\hat{p}_2 = 6/328 = 0.018$.

The estimated relative risk is

$$\frac{\hat{p}_1}{\hat{p}_2} = \frac{0.682}{0.018} \approx 37.9$$

(37.3 if the proportions are not rounded first). Thus we estimate that the risk of being bitten is more than 37 times as great for cows in estrous as for cows not in estrous.

The Odds Ratio

Odds ratios are another way to compare probabilities. If the probability of an event $E$ is $\Pr(E)$, the odds of event $E$ are

$$\frac{\Pr(E)}{1 - \Pr(E)}$$

The odds ratio of two events, often denoted $\theta$, is the ratio of the odds. So the odds ratio for events with probabilities $p_1$ and $p_2$ is

$$\theta = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)} = \frac{p_1 (1 - p_2)}{(1 - p_1) \, p_2}$$

Comparing Relative Risk and Odds Ratios

Relative risk and odds ratios are not identical, but they are similar to one another. The exact relationship is this:

$$\text{odds ratio} = \frac{p_1 (1 - p_2)}{p_2 (1 - p_1)} = \text{relative risk} \times \frac{1 - p_2}{1 - p_1}$$

These will be very close when $p_1$ and $p_2$ are both small.

Example: Bat Bites

$\hat{p}_1 = 15/22 = 0.682$ and $\hat{p}_2 = 6/328 = 0.018$.

The estimated odds of being bitten are

$$\frac{\hat{p}_1}{1 - \hat{p}_1} = \frac{15}{7} \approx 2.143 \quad \text{among cows in estrous}$$

$$\frac{\hat{p}_2}{1 - \hat{p}_2} = \frac{6}{322} \approx 0.0186 \quad \text{among cows not in estrous}$$

The estimated odds ratio is

$$\hat{\theta} = \frac{15/7}{6/322} = \frac{15 \times 322}{7 \times 6} = 115$$

Thus we estimate that the odds of being bitten are 115 times as large for cows in estrous as for cows not in estrous.

Confidence Interval for the Odds Ratio

The sampling distribution of $\hat{\theta}$ is not normal, but the sampling distribution of $\log \hat{\theta}$ is approximately normal. So we first compute a confidence interval for $\log \theta$, and then transform back (exponentiate) to get a confidence interval for $\theta$.

Confidence Interval for $\log \theta$

$$\log \hat{\theta} \pm z_{\alpha/2} \, SE_{\log \hat{\theta}} \qquad \text{where} \qquad SE_{\log \hat{\theta}} = \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}}$$

The 2 × 2 table is given by

| $n_{11}$ | $n_{12}$ |
|---|---|
| $n_{21}$ | $n_{22}$ |

Steps for the Confidence Interval for the Odds Ratio

1. Calculate $\log \hat{\theta}$.
2. Construct a confidence interval for $\log \theta$ using the formula $\log \hat{\theta} \pm z_{\alpha/2} \, SE_{\log \hat{\theta}}$.
3. Exponentiate the endpoints to get a confidence interval for $\theta$.

Comments: $\log$ means log base $e$. Your textbook does not use $\ln$.

Example (on the board)

Find a 95% confidence interval for the odds ratio of being bitten by a vampire bat between cows in estrous and those not.

Example: Interpretation

Under the study conditions in Costa Rica, we are 95% confident that the odds that a cow in estrous is bitten by a vampire bat are between 34.392 and 384.536 times higher than for cows not in estrous.

Part II: Chi-square Tests

Introduction to Chi-square Tests

The $\chi^2$ Test

The data we observe is the category of each individual summarized by
…
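The board example for the Section 10.7 interval is not worked out in these notes, so here is a minimal Python sketch of it, assuming the adjusted formula stated there (one extra success and one extra failure added to each sample). The function name `diff_proportion_ci` and its parameters are illustrative, not from the course material.

```python
import math


def diff_proportion_ci(y1, n1, y2, n2, z=1.96):
    """Adjusted confidence interval for p1 - p2.

    Adds one success and one failure to each sample (4 extra
    observations in total), as described in Section 10.7.
    z is the normal critical value (1.96 for 95% confidence).
    """
    p1 = (y1 + 1) / (n1 + 2)
    p2 = (y2 + 1) / (n2 + 2)
    # Standard error of the difference of the adjusted proportions
    se = math.sqrt(p1 * (1 - p1) / (n1 + 2) + p2 * (1 - p2) / (n2 + 2))
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Bat-bite data: 15 of 22 cows in estrous bitten, 6 of 328 not in estrous bitten
low, high = diff_proportion_ci(15, 22, 6, 328)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

With the bat-bite counts, this reproduces the interval quoted in the interpretation, roughly 0.456 to 0.835. Other confidence levels only require a different value of `z`.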
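The relative risk and odds ratio calculations of Section 10.9 can be sketched the same way. This is a minimal illustration assuming the 2 × 2 layout with cell counts n11, n12 in the first row and n21, n22 in the second; the function names `relative_risk` and `odds_ratio_ci` are hypothetical helpers, not part of the course material.

```python
import math


def relative_risk(n11, n12, n21, n22):
    """Estimated relative risk p1/p2 from a 2 x 2 table of counts."""
    p1 = n11 / (n11 + n12)
    p2 = n21 / (n21 + n22)
    return p1 / p2


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Estimated odds ratio and its confidence interval.

    The interval is built on the log scale, where the sampling
    distribution is approximately normal, then exponentiated.
    """
    theta = (n11 * n22) / (n12 * n21)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
    log_theta = math.log(theta)  # natural log (base e)
    low = math.exp(log_theta - z * se_log)
    high = math.exp(log_theta + z * se_log)
    return theta, low, high


# Bat-bite table: rows = in estrous / not in estrous,
# columns = bitten / not bitten
rr = relative_risk(15, 7, 6, 322)
theta, low, high = odds_ratio_ci(15, 7, 6, 322)
print(f"relative risk: {rr:.2f}")
print(f"odds ratio: {theta:.1f}, 95% CI: ({low:.2f}, {high:.2f})")
```

For the bat-bite table the odds ratio is 115 and the interval is approximately 34.4 to 384.5, in line with the interpretation given in the notes. The relative risk computed from unrounded proportions is about 37.3.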