Duke STA 101 - Exploratory data analysis

9/16/09
Not in FPP

Exploratory data analysis with two qualitative variables

Main tools
- Contingency tables
- Conditional, marginal, and joint frequencies

Motivating example: surviving the Titanic
- Was there class discrimination in survival of the wreck of the Titanic?
- "It has been suggested before the Enquiry that the third-class passengers had been unfairly treated, that their access to the boat deck had been impeded; and that when they reached the deck the first and second-class passengers were given precedence in getting places in the boats." (Lord Mersey, 1912)

Titanic: class by survival

            1st Class  2nd Class  3rd Class  Crew  Total
  Alive        203        118        178      212    711
  Dead         122        167        528      696   1513
  Total        325        285        706      908   2224

Titanic: marginal frequencies
- P(Dead) = 1513/2224 = 0.68
- P(Alive) = 711/2224 = 0.32
- P(First class) = 325/2224 = 0.15
- P(Second class) = 285/2224 = 0.13
- P(Third class) = 706/2224 = 0.32
- P(Crew) = 908/2224 = 0.41

Titanic: conditional frequencies
- P(Alive | 1st) = 203/325 = 0.625
- P(Alive | 2nd) = 118/285 = 0.414
- P(Alive | 3rd) = 178/706 = 0.252
- P(Alive | Crew) = 212/908 = 0.233
- Based on these frequencies, does there appear to be class discrimination?

Titanic: class by person type

            1st Class  2nd Class  3rd Class  Crew  Total
  Children       6         24         79        0    109
  Women        144         93        165       23    425
  Men          175        168        462      885   1690
  Total        325        285        706      908   2224

Titanic: percentage of men in each class
- P(Man | 1st) = 175/325 = 0.54
- P(Man | 2nd) = 168/285 = 0.59
- P(Man | 3rd) = 462/706 = 0.65
- P(Man | Crew) = 885/908 = 0.97
- Third class and the crew have larger percentages of men than first and second class.

Surviving the Titanic
A reason for class differences in survival:
- Larger percentages of men died.
- Third class consisted mostly of men.
- Hence, a larger percentage of third-class passengers died.
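The marginal and conditional frequencies above are just row, column, and grand-total divisions of a contingency table. A minimal sketch in plain Python (no libraries), assuming the class-by-survival counts shown on the slides; the helper names `marginal` and `conditional` are illustrative, not from the notes, and the dead counts are derived as class total minus alive.

```python
# Counts from the slides. Dead counts are derived as class total minus alive.
ALIVE = {"1st": 203, "2nd": 118, "3rd": 178, "Crew": 212}
TOTAL = {"1st": 325, "2nd": 285, "3rd": 706, "Crew": 908}
MEN = {"1st": 175, "2nd": 168, "3rd": 462, "Crew": 885}


def marginal(counts, grand_total):
    """Marginal proportion of each category: category count / grand total."""
    return {k: v / grand_total for k, v in counts.items()}


def conditional(numerators, denominators):
    """Conditional proportion P(event | class) = count in class / class total."""
    return {k: numerators[k] / denominators[k] for k in numerators}


grand_total = sum(TOTAL.values())  # everyone aboard
dead = {k: TOTAL[k] - ALIVE[k] for k in TOTAL}
total_dead = sum(dead.values())

p_class = marginal(TOTAL, grand_total)
p_alive_given_class = conditional(ALIVE, TOTAL)
p_man_given_class = conditional(MEN, TOTAL)

print(f"P(Dead) = {total_dead}/{grand_total} = {total_dead / grand_total:.2f}")
for k in TOTAL:
    print(
        f"{k:>4}: P(class) = {p_class[k]:.2f}, "
        f"P(Alive | class) = {p_alive_given_class[k]:.3f}, "
        f"P(Man | class) = {p_man_given_class[k]:.2f}"
    )
```

Putting P(Alive | class) next to P(Man | class) on one line makes the confounding visible: the classes with the lowest survival are also the ones with the highest share of men.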
Be alert for the effects of other variables when considering relationships.

Relative risk and odds ratios

Motivating example: Physicians' Health Study (1989)
- Randomized experiment with 22,071 male physicians at least 40 years old
- Half the subjects were assigned to take aspirin every other day
- The other half were assigned to take a placebo, a dummy pill that looked and tasted like aspirin

Physicians' Health Study: number of people in each cell

             Heart attack  No heart attack  Total
  Aspirin        104           10933        11037
  Placebo        189           10845        11034

Relative risk

         y1   y2
  x1     a    b
  x2     c    d

- Risk of y1 for level x1 = $a/(a+b)$
- Risk of y1 for level x2 = $c/(c+d)$
- Relative risk: $RR = \frac{a/(a+b)}{c/(c+d)}$

Relative risk for the Physicians' Health Study
- Relative risk of a heart attack when taking aspirin versus when taking a placebo:
  $RR = \frac{104/(104+10933)}{189/(189+10845)} = 0.55$
- The risk of a heart attack for people taking aspirin is 0.55 times the risk for people taking a placebo, i.e., about 45% lower.

Odds ratios

         y1   y2
  x1     a    b
  x2     c    d

- Odds of y1 for level x1 = $a/b$
- Odds of y1 for level x2 = $c/d$
- Odds ratio: $OR = \frac{a/b}{c/d}$

Odds ratios for the Physicians' Health Study
- Relative risk of a heart attack when taking aspirin versus taking a placebo:
  $RR = \frac{104/(104+10933)}{189/(189+10845)} = 0.55$
- Odds of a heart attack when taking aspirin divided by the odds of a heart attack when taking a placebo (odds ratio):
  $OR = \frac{104/10933}{189/10845} = 0.546$

Interpreting odds ratios and relative risks
- When the variables X and Y are independent: odds ratio = 1 and relative risk = 1
- When subjects with level x1 are more likely to have y1 than subjects with level x2: odds ratio > 1 and relative risk > 1
- When subjects with level x1 are less likely to have y1 than subjects with level x2: odds ratio < 1 and relative risk < 1

Odds in the news
William Safire, February 7, 2002: odds against becoming the Democratic candidate for president in 2004
- Gore 2:1
- Lieberman 5:1
- Daschle 4:1
- Gephardt 15:1
- Biden 5:1
- Edwards 9:1
- Kerry 4:1
- Leahy 6:1
- Dodd 4:1
- Feingold 8:1

Relative risk vs. absolute risk
- Percentage of smokers who get lung cancer: 8% (a conservative guess)
- Relative risk of lung cancer for smokers: 800%
- Getting lung cancer is not commonplace, even for smokers. But smokers' chances of getting lung cancer are much, much higher than non-smokers' chances.

Simpson's paradox
- When a third variable seemingly reverses the association between two other variables
- Hot hand