# UCLA STATS 101A - stats_101a_hw2 (14 pages)

## stats_101a_hw2

- Pages: 14
- School: University of California, Los Angeles
- Course: Stats 101a - Introduction to Design and Analysis of Experiment

Stats 101a Hw 2

Linda Che, 404449070, Section 1A, 10/24/2017

### Problem 1

a) In the conclusion it was stated that 99.4% of the variability is due to calculating the sum of the residuals. This contradicts the plot of residuals, where some points lie far from the center, indicating that there are leverage points and outliers. The residual plot shows a pattern and is not completely randomized.

b) The traditional linear model does seem to fit well, except that the residual plot seems to have a pattern. You can improve it by getting rid of a leverage point and the outliers. Getting rid of the outliers would make x a better predictor and would improve the R^2. It would also randomize the residuals so that the residual plot doesn't violate the linear model.

### Problem 2

    library(alr3)
    Loading required package: car
    library(car)
    diamonds <- read.table("Downloads/diamonds.txt", header = TRUE)

**Part 1**

a)

    m2 <- lm(Size ~ Price, diamonds)
    summary(m2)

    Call:
    lm(formula = Size ~ Price, data = diamonds)

    Residuals:
          Min        1Q    Median        3Q       Max
    -0.019873 -0.005567  0.000120  0.005478  0.022062

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 0.0723394  0.0030740   23.53   <2e-16 ***
    Price       0.0002634  0.0000057   46.20   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.008414 on 47 degrees of freedom
    Multiple R-squared:  0.9785,  Adjusted R-squared:  0.978
    F-statistic:  2135 on 1 and 47 DF,  p-value: < 2.2e-16

    plot(diamonds$Price, diamonds$Size)
    abline(m2)
    par(mfrow = c(2, 2))
    plot(m2)

b) The linear model seems to fit the plot pretty well, and there seems to be a linear relationship between the size and price of diamonds. However, if you look closely at the residual plot, you can see that there is a pattern, which would be a violation of the traditional linear fit.

**Part 2**

a)

    summary(powerTransform(cbind(diamonds$Size, diamonds$Price) ~ 1, data = diamonds))

    bcPower Transformations to Multinormality
       Est Power Rounded Pwr Wald Lwr bnd Wald Upr Bnd
    Y1   -0.2393           0      -1.0400       0.5615
    Y2   -0.0172           0      -0.6114       0.5771

    Likelihood ratio tests about transformation parameters
                                  LRT df        pval
    LR test, lambda = (0 0)  1.432924  2 0.488477551
    LR test, lambda = (1 1) 13.325320  2 0.001277743

    TSize <- diamonds$Size^(-1/4)
    TPrice <- log(diamonds$Price)
    plot(TSize, TPrice)
    m3 <- lm(TPrice ~ TSize)
    summary(m3)

    Call:
    lm(formula = TPrice ~ TSize)

    Residuals:
          Min        1Q    Median        3Q       Max
    -0.223411 -0.045628  0.001625  0.038482  0.141232

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  12.2252     0.1546   79.09   <2e-16 ***
    TSize        -4.0501     0.1025  -39.53   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 0.06816 on 47 degrees of freedom
    Multiple R-squared:  0.9708,  Adjusted R-squared:  0.9702
    F-statistic:  1563 on 1 and 47 DF,  p-value: < 2.2e-16

    par(mfrow = c(2, 2))
    plot(m3)

b) The R^2 for this model is lower than for the previous model because of the transformation. However, the fit still looks very linear, with a better residual graph.

**Part 3**

Comparing the two plots, we can say the transformed plot is a better fit: it still maintains a linear relationship, but its residual plot has more randomized points and does not violate the linear model. However, the R^2 for the transformed model is lower, showing that the relationship in the plot is weakened by the transformation. There are fewer violations.

### Problem 3

**Part a**

    library(readr)
    echo2 <- read_csv("Downloads/echo2.csv")
    m1 <- lm(basebp ~ sbp, echo2)
    summary(m1)

    Call:
    lm(formula = basebp ~ sbp, data = echo2)

    Residuals:
        Min      1Q  Median      3Q     Max
    -53.078 -13.390   0.638  12.967  57.949

    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 106.75829    4.34105  24.593  < 2e-16 ***
    sbp           0.19724    0.02915   6.766 5.82e-11 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 18.8 on 339 degrees of freedom
    Multiple R-squared:  0.119,  Adjusted R-squared:  0.1164
    F-statistic: 45.78 on 1 and 339 DF,  p-value: 5.823e-11

    plot(echo2$sbp, echo2$basebp)
    abline(m1)
    sbp.resid <- resid(m1)
    plot(echo2$sbp, sbp.resid)  # manual residual plot
    abline(0, 0)
    par(mfrow = c(2, 2))
    plot(m1)

The first plot is the residual plot. It shows the difference between the actual y value and the predicted y value. In this case the residuals are randomly dispersed around the horizontal axis, which means that a linear model would be appropriate for this data.

**Part b**

    confint(m1, level = 0.95)
                     2.5 %      97.5 %
    (Intercept) 98.2195154 115.2970707
    sbp          0.1399007   0.2545818

    hist(echo2$sbp)
    hist(echo2$basebp)

Both histograms are symmetric, so it follows the Vitrivius theory.

**Part c**

    anova(m1)
    Analysis of Variance Table

    Response: basebp
               Df Sum Sq Mean Sq F value    Pr(>F)
    sbp         1  16183 16183.4   45.78 5.823e-11 ***
    Residuals 339 119838   353.5
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    16183 / 353.5  # F value calculation from the ANOVA summary
    [1] 45.77935

    R2 <- 0.1164
    (339 - 2) * R2 / (1 - R2)  # F value calculation
    [1] 44.3943

We reject the null because the p-value is less than 0.05. Based on the F test above, we reject the null that states that the two variances are equal.

**Part d**

    se <- sqrt(var(echo2$basebp) * (1 - cor(echo2$sbp, echo2$basebp)^2))  # SE B
    se
    [1] 18.77402

    sse <- sum(m1$residuals^2)
    sse
    [1] 119837.7

    ssr <- sum((m1$fitted.values - mean(echo2$basebp))^2)
    ssr
    [1] 16183.35

    sst <- sse + ssr
    sst
    [1] 136021

    rsquare <- abs(cor(echo2$basebp, echo2$sbp, use = "pairwise.complete.obs"))^2  # R Squared value
    rsquare
    [1] 0.1189768

    1 - rsquare
    [1] 0.8810232

**Part e**

R^2 is a measure of the linear relationship between our predictor variable, sbp, and our response, basebp.

**Part f**

    n <- 341
    1 - (1 - rsquare) * (n - 1) / (n - 2)  # the adjusted Rsquare
    [1] 0.1163779

The adjusted R-squared takes the degrees of freedom into consideration, and thus it is a better measure of the linear relationship.

### Problem 4

**Part a**

    library(descr)
    # table for gender and smoking habits
    CrossTable(echo2$gender, echo2$hxofCig)

       Cell Contents
    |-------------------------|
    |                       N |
    | Chi-square contribution |
    |           N / Row Total |
    |           N / Col Total |
    |         N / Table Total |
    |-------------------------|

                    echo2$hxofCig
    echo2$gender    heavy   moderate   non-smoker   Total
    female             43         39          140     222
                    0.036      5.155        2.738
                    0.194      0.176        0.631   0.651
                    0.632      0.453        0.749
                    0.126      0.114        0.411
    male               25         47           47     119
                    0.068      9.616        5.108
                    0.210      0.395        0.395   0.349
                    0.368      0.547        0.251
                    0.073      0.138        0.138
    Total              68         86          187     341
                    0.199      0.252        0.548

    # table for gender and heart attacks
    CrossTable(echo2$gender, echo2$hxofMI)

       Cell Contents
    |-------------------------|
    |                       N |
    | Chi-square contribution |
    |                     N …
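As a cross-check on Problem 3, parts c, d and f, the sums of squares, R^2, the F statistic and the adjusted R^2 can be tied together in one derivation. This is a sketch using only the values printed in the output (n = 341, SSR = 16183.35, SSE = 119837.7), not a statement of how the assignment expected the work to be shown. Under that reading, the 44.39 figure in part c appears to come from plugging in the adjusted R^2 (0.1164) and 339 - 2 = 337, whereas the unadjusted R^2 with n - 2 = 339 degrees of freedom reproduces the ANOVA value of 45.78.

```latex
\begin{align*}
\mathrm{SST} &= \mathrm{SSR} + \mathrm{SSE} = 16183.35 + 119837.7 \approx 136021, \\
R^2 &= \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{16183.35}{136021} \approx 0.1190, \\
F &= \frac{\mathrm{SSR}/1}{\mathrm{SSE}/(n-2)}
   = \frac{R^2}{1-R^2}\,(n-2)
   = \frac{0.1189768}{0.8810232}\times 339 \approx 45.78, \\
R^2_{\mathrm{adj}} &= 1 - (1-R^2)\,\frac{n-1}{n-2}
   = 1 - 0.8810232 \times \frac{340}{339} \approx 0.1164 .
\end{align*}
```

The third line shows why the F value from the ANOVA table and the F value computed from R^2 must agree for simple linear regression: both are the same ratio of explained to unexplained variation, scaled by the residual degrees of freedom.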
