CMU STA 36402-36608 - Homework

Homework 7: Diabetes
36-402, Advanced Data Analysis
Due at the start of class, 22 March 2011

A classic data set for classification problems, logistic regression and related methods comes from a study of the correlates of diabetes among the Pima Indians of Arizona, collected as part of a long-term study to understand why the Pima, like many other Native American groups, suffer from a much higher rate of diabetes than other populations in the US. (For background on the study, and the issue, see http://diabetes.niddk.nih.gov/dm/pubs/pima/.)

Our version of the data is the data set pima in the package faraway.[1] It contains information on 768 adult Pima women, some but not all of whom have diabetes. See help(pima) for a description of the variables. Note that the column named diabetes indicates how much of a history of diabetes there was in the woman’s family; it is the last column, test, which indicates whether or not the woman herself is diabetic.

1. (10 points) Make graphical and numerical summaries of the data. If there are any obvious irregularities in the data, describe them, say why you think they are irregularities, and remove them as appropriate.

2. (20 points) Fit a logistic regression model to predict diabetes, using all the other variables as inputs. What are the estimated coefficients?

3. (10 points) What is the probability of having diabetes for a woman who has been pregnant twice, has a glucose concentration of 99, a diastolic pressure of 64, 22 mm of triceps thickness, an insulin level of 76, a BMI of 26, a diabetes “pedigree function” of 0.25, and is 30 years old? Give a 95% confidence interval for this prediction, assuming the model is correctly specified.

4. (10 points) How do the odds of having diabetes change for a woman who moves from the third quartile of the BMI distribution to the first quartile, with all else held constant? Give a 95% confidence interval for the difference in odds, assuming the model is correctly specified.

5. (20 points) Do women with diabetes have higher diastolic blood pressure than women without diabetes? Is the blood pressure coefficient significant in your model? Explain why the answers to these two questions are actually compatible.

6. (10 points) Describe how you can check whether this model fits the data.

7. (20 points) Does the model fit the data?

8. (10 points, extra credit) Use bootstrapping to find confidence intervals for the coefficients from question (2), the predicted probability in question (3), and the difference in odds in question (4). Compare them to your earlier answers, and explain how this relates to your findings in

[1] This homework is in fact based on problem 3 in chapter 2 of Faraway’s textbook.
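
Questions (3) and (4) both come down to a Wald interval for a linear combination of the fitted coefficients: compute the interval on the log-odds scale, where the normal approximation for the coefficient estimates applies directly, and then transform its endpoints. The following is a minimal Python sketch under that assumption, not a reference solution. The names `beta` (estimated coefficients, intercept first) and `cov` (their estimated covariance matrix) are hypothetical stand-ins; with an R `glm` fit they would correspond to `coef(fit)` and `vcov(fit)`. The helper function names are likewise illustrative, and the numbers in the demo are toy values, not estimates from the pima data.

```python
import math
from statistics import NormalDist

# Two-sided 95% normal quantile, about 1.96.
Z95 = NormalDist().inv_cdf(0.975)


def inv_logit(eta):
    """Map a log-odds value eta to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_combination_se(x, cov):
    """Standard error of x'beta, i.e. sqrt(x' Sigma x) for covariance matrix Sigma."""
    n = len(x)
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    return math.sqrt(var)


def predicted_prob_ci(beta, cov, x, z=Z95):
    """Point estimate and Wald interval for P(Y = 1 | x).

    The interval is built on the logit scale, eta +/- z * se, and its
    endpoints are mapped through the inverse logit, so it stays in (0, 1).
    """
    eta = sum(b * xi for b, xi in zip(beta, x))
    se = linear_combination_se(x, cov)
    return inv_logit(eta), (inv_logit(eta - z * se), inv_logit(eta + z * se))


def odds_ratio_ci(beta_j, var_j, delta, z=Z95):
    """Multiplicative change in the odds when covariate j changes by delta,
    all other inputs held fixed: exp(delta * beta_j), with a Wald interval
    computed for delta * beta_j and then exponentiated."""
    log_or = delta * beta_j
    se = abs(delta) * math.sqrt(var_j)
    return math.exp(log_or), (math.exp(log_or - z * se), math.exp(log_or + z * se))


if __name__ == "__main__":
    # Toy inputs for illustration only; these are NOT estimates from the pima data.
    beta = [-1.0, 0.5]                # hypothetical intercept and slope
    cov = [[0.04, 0.0], [0.0, 0.01]]  # hypothetical covariance of the estimates
    print(predicted_prob_ci(beta, cov, [1.0, 2.0]))
    print(odds_ratio_ci(beta_j=0.1, var_j=0.0004, delta=-10.0))
```

For question (3), x would be (1, 2, 99, 64, 22, 76, 26, 0.25, 30), assuming the coefficients are ordered as the intercept followed by the inputs in the order the question lists them. For question (4), delta would be the first quartile of BMI minus the third (a negative number). Because the logistic model is linear in the log-odds, this change in the odds is a multiplicative factor that does not depend on the other inputs, which is why the sketch reports a ratio rather than an additive difference.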

