Stat 401 X/FW HW 10 answers (sketch for problem 1)

1) There is no single correct way to do this problem. We gave 8 points for process and 5 points for predictions. You get 3 points for free because I removed a question.

Process points:
  Examining a scatterplot OR a residual*predicted plot to check linearity: 3 pts
  All-subsets variable selection: 2 pts
  Checking Cook's D: 1 pt
  Describing the process clearly: 2 pts
None of these are all or nothing. If you plotted something but didn't clearly look at it, that might be -1.

I told you to log transform all size variables.

logweight with log-transformed X:
  AIC and BIC both select: loglength logchest logneck
  Cook's D: one very influential point (D = 5.3) and one at about 0.9
  Residual plot: some unusually large studentized residuals (about -4 and +4)
  Predictions: 28.6, 125.2, 478.9, or 3.35, 4.83, 6.17 on the log scale
  Prediction points: 4 out of 5

If you remove the bear with D = 5.3, the second moves up to D = 1.1.

Refit without those two points:
  Cook's D: great
  Residual plot: looks great
  Predictions: 31.1, 124.8, 457.1, or 3.43, 4.83, 6.12 on the log scale
  Prediction points: 5 out of 5

If someone adds 95% prediction intervals for each prediction, give them a bonus point. Those intervals, for the model fit with 2 points removed, are:
  (25.0, 38.7), (102.7, 151.8), (371, 563)

If you only removed one point, things still look good and the predictions are very good.
  Predictions: 32.2, 124.2, 441.5, or 3.47, 4.82, 6.09 on the log scale
  Prediction points: 5 out of 5

The actual weights are 34, 125, and 446, or 3.53, 4.83, 6.10 on the log scale.

If they used a different model, give them 5 prediction points if all three predictions are within 10% of the actual weights, 4 points if all are within 20%, 3 points if all are within 30%, 2 points if all are within 40%, and 1 point if all are within 50%.
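The prediction-point rubric above can be applied mechanically. The following is a minimal Python sketch, not part of the original SAS solution. The helper name `prediction_points` is hypothetical. It assumes that "within 10%" means the absolute error is at most 10% of the actual weight, and that a prediction set missing the 50% band earns 0 points, which the rubric does not state.

```python
ACTUAL = [34, 125, 446]  # actual weights given in the answer key


def prediction_points(predicted, actual=ACTUAL):
    """Score a set of predictions with the 10%/20%/30%/40%/50% rubric.

    Assumption: error is measured relative to the actual weight, and the
    worst of the three predictions decides the band.
    """
    worst = max(abs(p - a) / a for p, a in zip(predicted, actual))
    for points, tolerance in ((5, 0.10), (4, 0.20), (3, 0.30), (2, 0.40), (1, 0.50)):
        if worst <= tolerance:
            return points
    return 0  # assumption: the rubric does not say what happens beyond 50%


# The three sets of predictions reported in the answer key
all_bears = [28.6, 125.2, 478.9]
two_removed = [31.1, 124.8, 457.1]
one_removed = [32.2, 124.2, 441.5]

for label, preds in (("all bears", all_bears),
                     ("two removed", two_removed),
                     ("one removed", one_removed)):
    print(label, prediction_points(preds))
```

Under these assumptions the sketch gives 4, 5, and 5 points for the three fits, matching the scores listed in the key. The full-data fit loses a point because its first prediction, 28.6, is about 16% below the actual weight of 34.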
My SAS code:

data bears;
  infile 'bear.txt' firstobs=2;
  input length sex weight chest headlen headwid month neck;
run;

data bear2;
  infile 'bear2.txt' firstobs=2;
  input length sex weight chest headlen headwid month neck;
run;

/* stack the two files and log transform all size variables */
data all;
  set bears bear2;
  loglength = log(length);
  logweight = log(weight);
  logchest = log(chest);
  loghead = log(headlen);
  logwidth = log(headwid);
  logneck = log(neck);
run;

/* all-subsets selection, reporting AIC and BIC (SBC) */
proc reg data=all;
  model logweight = loglength logchest loghead logwidth logneck month sex
        / selection=cp best=10 aic sbc;
run;

proc reg data=all;
  model logweight = loglength logchest logneck;
  output out=resids r=resid p=yhat cookd=d rstudent=rs;
run;

proc print data=resids;
  where d > 0.8;
run;

proc sgplot data=resids;
  scatter x=yhat y=rs;
run;

data all2;
  /* read resids, not all: resids is the data set that holds Cook's D (d) */
  set resids;
  if d < 0.8;
  /* I know that if I rerun without only bear 12, then bear 49 */
  /* (currently d=0.91) goes over 1; this eliminates both */

  /* another way to eliminate points */
  i = _n_;   /* i is the observation number */
  if i = 12 then delete;
  if i = 49 then delete;
run;

proc reg data=all2;
  model logweight = loglength logchest logneck;
  output out=resids r=resid2 p=yhat2 cookd=d2 rstudent=rs2 lcl=lpi ucl=upi;
run;
/* since the all2 data set already has the resid, yhat, d, */
/* and rs variables, use new names here */
/* If you try to use the old names, SAS will refuse to do that */
/* and use new names anyway */
/* there is a note in the log, but that is often overlooked */
/* then you just get frustrated when resid hasn't changed */

/* back-transform predictions and prediction limits from the log scale */
data resids2;
  set resids;
  predwt = exp(yhat2);
  lpi = exp(lpi);
  upi = exp(upi);
run;

proc print data=resids2;
  where weight = .;
  var predwt lpi upi;
  title 'Predictions';
run;

2) removed from this week’s assignment

3) iridium data, 1 point each
a) F = 1.03, p = 0.43. No evidence of interaction between strata and depth.
b) Hard to tell. If you look hard for a problem, it would be that the variance of the residuals is greatest for intermediate predicted values. Note: This is difficult to fix by transformation.
There are not many observations, especially at large predicted values, so I would probably call this plot adequate.
c) F = 12.6, p < 0.0001. There is strong evidence that at least one depth mean, averaged over strata, differs from the others.
d) (1,3), (2,3), (3,4), (3,5), or: Depth 3 is significantly different from each of the others (p varying from 0.019 to < 0.0001). There is no evidence of differences among any other pair of depths. Note: need to use the Tukey adjustment because we are interested in all pairs.

My SAS code:

data iridium;
  infile 'iridium.txt' firstobs=2;
  input iridium strata depth;
run;

proc glm data=iridium;
  class strata depth;
  model iridium = strata depth strata*depth;
  lsmeans depth / pdiff adjust=tukey;
  output out=resids r=resid p=yhat;
run;

proc sgplot data=resids;
  scatter x=yhat y=resid;
run;
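For part 3d, the need for a multiple-comparison adjustment is easier to see once the pairs are counted. The short Python sketch below is only an illustration, not the course's SAS. It assumes there are five depths labelled 1 to 5, which is inferred from the pairs reported in part d and not stated in the key.

```python
from itertools import combinations

depths = [1, 2, 3, 4, 5]  # assumption: five depths, inferred from part d

# Every pair of depths that "lsmeans depth / pdiff" compares
all_pairs = list(combinations(depths, 2))

# Pairs reported as significantly different in part d
significant = [(1, 3), (2, 3), (3, 4), (3, 5)]

# Pairs with no evidence of a difference
not_significant = [pair for pair in all_pairs if pair not in significant]

print("number of pairwise comparisons:", len(all_pairs))
print("significant pairs all involve depth 3:", all(3 in pair for pair in significant))
print("pairs with no evidence of a difference:", not_significant)
```

With five depths there are 10 pairwise comparisons, so unadjusted p-values would overstate the evidence; the four significant pairs are exactly the ones that include depth 3.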