ISU STAT 401 - hw10ans

Stat 401 X/FW HW 10 answers (sketch for problem 1)

1) There is no single correct way to do this problem. We gave 8 points for process and 5 points for predictions. You get 3 points for free because I removed a question.

Process points:
  Examining a scatterplot OR a residual vs. predicted plot to check linearity: 3 pts
  All-subsets variable selection: 2 pts
  Checking Cook's D: 1 pt
  Describing the process clearly: 2 pts
None of these are all or nothing. If you plotted something but didn't look at it clearly, that might be -1.

I told you to log transform all size variables.

logweight with log-transformed X:
  AIC and BIC both choose: loglength logchest logneck
  Cook's D: one very influential point (D=5.3) and one at ca. 0.9
  residual plot: some unusually large studentized residuals (ca. -4 and +4)
  predictions: 28.6, 125.2, 478.9, or 3.35, 4.83, 6.17 on the log scale
  prediction points: 4 out of 5

If you remove the bear with D=5.3, the second moves up to D=1.1. Refit without those two points:
  Cook's D: great
  residual plot: looks great
  predictions: 31.1, 124.8, 457.1, or 3.43, 4.83, and 6.12 on the log scale
  prediction points: 5 out of 5

If someone adds 95% prediction intervals for each prediction, give them a bonus point. Those intervals, for the model fit with 2 points removed, are:
  (25.0, 38.7), (102.7, 151.8), (371, 563)

If you only removed one point, things still look good and the predictions are very good.
  predictions: 32.2, 124.2, 441.5, or 3.47, 4.82, and 6.09 on the log scale
  prediction points: 5 out of 5

The actual weights are 34, 125, and 446, or 3.53, 4.83, 6.10 on the log scale.

If they used a different model, give them 5 prediction points if all three predictions are within 10% of the actual weights, 4 points if all are within 20%, 3 points if all are within 30%, 2 points if all are within 40%, and 1 point if all are within 50%.
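The prediction-point rule above can be checked mechanically. What follows is a minimal Python sketch, not part of the SAS solution. It assumes that "within 10%" means relative error measured against the actual weight, that the band boundaries are inclusive, and that a set of predictions outside the 50% band earns 0 points (the rule above does not say). The function name prediction_points is hypothetical.

```python
def prediction_points(predicted, actual):
    """Score a set of predictions under the percent-error rule.

    Assumptions (not stated in the answer key): error is relative to the
    actual weight, band edges are inclusive, and anything worse than 50%
    scores 0.
    """
    # The score is driven by the worst of the three predictions,
    # because the rule requires ALL predictions to be inside the band.
    worst = max(abs(p - a) / a for p, a in zip(predicted, actual))
    for points, tolerance in [(5, 0.10), (4, 0.20), (3, 0.30), (2, 0.40), (1, 0.50)]:
        if worst <= tolerance:
            return points
    return 0


actual_weights = [34, 125, 446]

# Back-transformed predictions quoted in the answer sketch.
fits = {
    "all bears": [28.6, 125.2, 478.9],
    "one bear removed": [32.2, 124.2, 441.5],
    "two bears removed": [31.1, 124.8, 457.1],
}

for label, predicted in fits.items():
    print(label, prediction_points(predicted, actual_weights))
```

Under these assumptions the three fits score 4, 5, and 5, which agrees with the prediction points listed for each fit. The fit with all bears loses a point because 28.6 is about 16% below the actual weight of 34.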
My SAS code:

  data bears;
    infile 'bear.txt' firstobs=2;
    input length sex weight chest headlen headwid month neck;
  run;

  data bear2;
    infile 'bear2.txt' firstobs=2;
    input length sex weight chest headlen headwid month neck;
  run;

  data all;
    set bears bear2;
    loglength = log(length);
    logweight = log(weight);
    logchest = log(chest);
    loghead = log(headlen);
    logwidth = log(headwid);
    logneck = log(neck);
  run;

  proc reg;
    model logweight = loglength logchest loghead logwidth logneck month sex /
      selection=cp best=10 aic sbc;
  run;

  proc reg data=all;
    model logweight = loglength logchest logneck;
    output out=resids r=resid p=yhat cookd=d rstudent=rs;
  run;

  proc print;
    where d > 0.8;
  run;

  proc sgplot;
    scatter x=yhat y=rs;
  run;

  data all2;
    set resids;   /* Cook's D (d) is stored in resids, not in all */
    if d < 0.8;
    /* I know that if I rerun without only bear 12, then bear 49 */
    /* (currently d=0.91) goes over 1; this eliminates both */

    /* another way to eliminate points */
    i = _n_;   /* i is the observation number */
    if i = 12 then delete;
    if i = 49 then delete;
  run;

  proc reg;
    model logweight = loglength logchest logneck;
    output out=resids r=resid2 p=yhat2 cookd=d2 rstudent=rs2 lcl=lpi ucl=upi;
  run;
  /* since the all2 data set already has resid, yhat, d, */
  /* and rs variables, use new names here */
  /* If you try to use the old names, SAS will refuse to do that */
  /* and use new names anyway */
  /* there is a note in the log, but that is often overlooked */
  /* then you just get frustrated when resid hasn't changed */

  data resids2;
    set resids;
    predwt = exp(yhat2);
    lpi = exp(lpi);
    upi = exp(upi);
  run;

  proc print;
    where weight = .;
    var predwt lpi upi;
    title 'Predictions';
  run;

2) removed from this week’s assignment

3) iridium data, 1 point each

a) F = 1.03, p = 0.43. No evidence of interaction between strata and depth.

b) Hard to tell. If you look hard for a problem, it would be that the variance of the residuals is greatest for intermediate predicted values. Note: This is difficult to fix by transformation.
There are not many observations, especially at large predicted values, so I would probably call this plot adequate.

c) F = 12.6, p < 0.0001. There is strong evidence that at least one depth mean, averaged over strata, differs from the others.

d) (1,3), (2,3), (3,4), (3,5) or Depth 3 is significantly different from each of the others (p varying from 0.019 to < 0.0001). There is no evidence of differences among any other pair of depths. Note: need to use Tukey adjustment because interested in all pairs.

My SAS code:

  data iridium;
    infile 'iridium.txt' firstobs=2;
    input iridium strata depth;
  run;

  proc glm;
    class strata depth;
    model iridium = strata depth strata*depth;
    lsmeans depth / pdiff adjust=tukey;
    output out=resids r=resid p=yhat;
  run;

  proc sgplot;
    scatter x=yhat y=resid;
  run;
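As a rough illustration of why 3d calls for the Tukey adjustment, here is a short Python sketch (again not SAS). It assumes five depths labelled 1 to 5, which is inferred from the pairs listed in 3d, and a per-comparison level of 0.05. The independence calculation is only a rough guide, because pairwise comparisons that share a depth mean are not independent. The Bonferroni bound holds without that assumption.

```python
from itertools import combinations

# Assumption: depths are labelled 1..5, inferred from the pairs in 3d.
depths = [1, 2, 3, 4, 5]
all_pairs = list(combinations(depths, 2))

# Pairs reported as significantly different after Tukey adjustment.
significant = {(1, 3), (2, 3), (3, 4), (3, 5)}
not_significant = [pair for pair in all_pairs if pair not in significant]

alpha = 0.05  # assumed unadjusted per-comparison level

# Chance of at least one false positive if the comparisons were independent
# (they are not, so treat this as a rough guide only).
fwer_if_independent = 1 - (1 - alpha) ** len(all_pairs)

# Bonferroni upper bound on the familywise error rate; needs no independence.
bonferroni_bound = min(1.0, len(all_pairs) * alpha)

print("number of pairwise comparisons:", len(all_pairs))
print("pairs with no evidence of a difference:", not_significant)
print("familywise error rate if independent:", round(fwer_if_independent, 3))
print("Bonferroni upper bound:", bonferroni_bound)
```

With 10 pairwise comparisons, unadjusted tests at the 0.05 level could give a familywise error rate of roughly 0.4 under the independence simplification, which is why an adjustment for all pairs is needed. All six pairs with no evidence of a difference exclude depth 3.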

