UT SDS 328M - SDS 328M Lab

Malak Alammar
SDS 328M Lab 11

1. State the two specific research questions that you are going to attempt to answer with your analysis. It should be clear from reading your research question which variables you are analyzing.

   a. When controlling for the smoking status of primary beneficiaries (smoker, former smoker, non-smoker), is there a significant difference in annual medical expenses billed to a health insurance company across three BMI categories (healthy, overweight, obese)?

   b. Is there an interaction between BMI category (healthy, overweight, obese) and smoking status (smoker, former smoker, non-smoker) in their effect on annual medical expenses?

2. Run a multi-factor ANOVA model with interaction to answer your two research questions. Include all steps and write a full conclusion in context. You do not need to confirm assumptions again, but you should log-transform the response variable so that the normality assumption is met. Report an appropriate effect size for your entire model with your interpretation.

   a. Assumptions: random sample; independent observations; each group is normally distributed; equal variance among all groups (homogeneity of variance, homoscedasticity).

   b. Hypotheses:

      i. Hypothesis Set 1:
         1. H0: Accounting for the smoking status of the primary beneficiary, there is no difference in total annual medical expenses billed by health insurance across the three BMI categories (healthy weight, overweight, obese).
         2. HA: Accounting for the smoking status of the primary beneficiary, there is a significant difference in total annual medical expenses billed by health insurance across the three BMI categories (healthy weight, overweight, obese).

      ii. Hypothesis Set 2:
         1. H0: Accounting for BMI, there is no difference in total annual medical expenses billed by health insurance across the smoking statuses of the primary beneficiary (smoker, former smoker, non-smoker).
         2. HA: Accounting for BMI, there is a difference in total annual medical expenses billed by health insurance across the smoking statuses of the primary beneficiary (smoker, former smoker, non-smoker).

   c. R Output:

      i. See the Anova table under "R Code" below.

   d. Conclusions:

      i. While controlling for the smoking status of the primary beneficiary, the log of total annual medical expenses billed by health insurance differs significantly by BMI category (F = 15.808, df = (2, 1290), p < .05).

      ii. While controlling for BMI, the log of total annual medical expenses billed by health insurance differs significantly by the smoking status of the primary beneficiary (F = 505.473, df = (2, 1290), p < .05).

      iii. There is also a significant interaction between BMI and the smoking status of the primary beneficiary on the log of total annual medical expenses billed by health insurance (F = 11.294, df = (4, 1290), p < .05).

   e. Effect Size:

      i. BMI and the smoking status of the primary beneficiary account for 51.92% of the variation in the log of total annual medical costs billed by health insurance.

3. Briefly describe at least one limitation to this study unrelated to assumptions and what could be done differently in a future study to address it.

   a. One limitation of our result is that we had to transform the cost data. Our first grouped boxplot showed that the original costs were not normally distributed within each categorical group, which violates the normality assumption. To remedy this, we took the log of each cost value. After the transformation, the hypothesis tests are less general, because they apply to log(costs) rather than to costs in dollars. Log(costs) is generally not as interpretable as true monetary value when comparing differences among the levels of our categorical variables.

4. Do the results of this analysis differ from what you found in Lab 10, and if so, why do you think that is? What future analyses should be considered?

   a. The null hypotheses were rejected, meaning there is a significant difference in the log of annual medical expenses among people in three BMI categories (healthy, overweight, obese) while controlling for the smoking status of the primary beneficiary. The results also showed a significant difference in the log of annual medical expenses among smoking statuses while controlling for BMI. Finally, the results showed a significant interaction between BMI and smoking status on the log of total annual medical expenses.

   b. In Lab 10, the null hypothesis was rejected, meaning there is a significant difference in the log of annual medical expenses among people in three BMI categories (healthy, overweight, obese). In this lab, we find that BMI category still has a significant effect on the log of total annual medical costs billed by health insurance, even though we are now controlling for the smoking status of the primary beneficiary.

   c. Future analyses could identify a different variable (or variables) or a different combination or interaction that might explain more of the variance in our response variable (costs). This would affect how people in our population can expect medical costs to change based on their conditions.

R Code:

    > mydata <- read.csv("Lab11_insurance.csv")
    > library(car)
    > options(contrasts = c(unordered = "contr.sum", ordered = "contr.poly"))
    > int_anova1 <- lm(log(costs) ~ bmi * smoker, data = mydata)
    > Anova(int_anova1, type = 3)
    Anova Table (Type III tests)

    Response: log(costs)
                Sum Sq   Df    F value    Pr(>F)
    (Intercept)  41880    1 111496.425 < 2.2e-16 ***
    bmi             12    2     15.808 1.651e-07 ***
    smoker         380    2    505.473 < 2.2e-16 ***
    bmi:smoker      17    4     11.294 5.209e-09 ***
    Residuals      485 1290
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    > summary(int_anova1)$adj.r.squared
    [1]
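
As a rough check on the reported F statistics, the sketch below recomputes them from the sums of squares and degrees of freedom printed in the Anova table. The sums of squares in the printout are rounded, so the recomputed values should be close to the reported ones but will not match exactly. The partial eta-squared column is an addition of this sketch (a per-term effect size); it is not something the lab reports. The names `anova_tab`, `ss_resid`, and `df_resid` are made up for illustration.

```r
# Values copied from the rounded Anova table in the R output.
anova_tab <- data.frame(
  term   = c("bmi", "smoker", "bmi:smoker"),
  sum_sq = c(12, 380, 17),
  df     = c(2, 2, 4)
)
ss_resid <- 485   # residual sum of squares
df_resid <- 1290  # residual degrees of freedom

# F = (SS_effect / df_effect) / (SS_residual / df_residual)
ms_resid <- ss_resid / df_resid
anova_tab$f_value <- (anova_tab$sum_sq / anova_tab$df) / ms_resid

# Upper-tail p-value from the F distribution.
anova_tab$p_value <- pf(anova_tab$f_value, anova_tab$df, df_resid,
                        lower.tail = FALSE)

# Assumption of this sketch: partial eta-squared as a per-term effect size,
# SS_effect / (SS_effect + SS_residual). Not reported in the lab itself.
anova_tab$partial_eta_sq <- anova_tab$sum_sq / (anova_tab$sum_sq + ss_resid)

print(anova_tab)
```

Because the inputs are rounded, small differences from the printed F values (15.808, 505.473, 11.294) are expected; the conclusions at the .05 level are unaffected.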

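The interpretability concern raised in item 3 can be softened by back-transforming: a difference in mean log(costs) between two groups corresponds to a ratio of geometric mean costs in dollars. The sketch below illustrates this with made-up numbers; `costs_a`, `costs_b`, and `geo_mean` are hypothetical and do not come from Lab11_insurance.csv.

```r
# Hypothetical annual costs in dollars for two made-up groups
# (illustrative only; not taken from the lab data).
costs_a <- c(2000, 4000, 8000)
costs_b <- c(4000, 8000, 16000)

# Difference in group means on the log scale.
diff_log <- mean(log(costs_b)) - mean(log(costs_a))

# Back-transform: exp() of a log-scale difference is a ratio.
ratio_geo <- exp(diff_log)

# Geometric mean helper, to show the ratio is between geometric means.
geo_mean <- function(x) exp(mean(log(x)))

print(ratio_geo)                               # about 2
print(geo_mean(costs_b) / geo_mean(costs_a))   # same ratio
```

Here group B's typical (geometric mean) cost is about twice group A's, which is a statement in dollar terms even though the comparison was made on log(costs).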
