ISU STAT 401 - hw8ans - D3068833

School name Iowa State University

Course Stat 401- Stat Meth for Rsrch

Stat 401 F/XW HW 8 sketch answers:

1) Meat pH over 24 hours (2 pt each part)

a) There seems to be lack of fit. The residual plot shows a clear bend, indicating lack of fit.

[Residual plot: resid (vertical axis, -0.4 to 0.4) vs. yhat (horizontal axis, 5.1 to 6.9)]

b) There is strong evidence of lack of fit. F value = 10.08, p value = 0.0078

    proc glm;
      class time;
      model ph = logtime time /ss1;
      title 'ANOVA LOF test to log time';
    run;

    Source     DF    Type I SS     Mean Square    F Value    Pr > F
    logtime     1    3.51940421    3.51940421      357.60    <.0001
    time        4    0.39697079    0.09924270       10.08    0.0078

c) The last two observations have pH values of 5.3 and 5.47, so we do not need those to predict pH = 6.

SAS code:

    data meat;
      infile 'meat24.txt' firstobs=2;
      input time ph;
      logtime = log(time);
    run;

    proc glm data=meat;
      model ph=logtime;
      output out=resids r=resid p=yhat;
    run;

    /*a*/
    proc sgplot data=resids;
      scatter y=resid x=yhat;
    run;

    /*b lack of fit */
    proc glm;
      class time;
      model ph = logtime time /ss1;
      title 'ANOVA LOF test to log time';
    run;

2) Insect diversity in tropical forest patches

a) 2 pts, 1 for estimates, 1 for test.
Intercept: 36.2, slope for log(area): 12.4
Test of slope = 0: T = 4.48, p = 0.0005

b) 2 pts. This is an experimental study, so causal inference is allowed. For example, increasing log(area) by one unit will increase the number of butterfly species by 12.4. Better: doubling the area will increase the number of butterfly species by an average of 8.6.
Note: 8.6 = ln(2)*12.4. You can calculate this change by predicting the number of species for an area of 1 = 36.2 + 12.4*ln(1) = 36.2 and for an area of 2 = 36.2 + 12.4*ln(2) = 44.8. The difference is 8.6. The choice of areas of 1 and 2 is arbitrary. You get the same change for areas of 10 and 20, or areas of 50 and 100.

c) 1 pt. F for lack of fit = 0.98, p = 0.40. No evidence of lack of fit of the regression with X = log(area).

d) 2 pt.
This asks 2 questions, which I label da) and db).

da) The se of the estimated mean at X0, $\mathrm{se}(\text{mean}) = s\sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{(n-1)s_X^2}}$ (s is the root MSE, s_X the sample standard deviation of X), is smallest when X0 = mean(X). For these data, that occurs at X0 = mean(X) = 2.302, or an area of 10.0.
Note: as I suggested in my e-mail, look at the only piece that depends on X0 and see when that is the smallest, that is, when $(X_0 - \bar{X})^2 = 0$. You can do this algebraically, using the formula for the se of the mean. You really want the se of an observation, but that equals sqrt(s^2 + (se mean)^2). Since s^2 is constant, this is smallest when se mean is smallest. Or, you can evaluate the se obs for a range of X values. Printing out the se's shows that something near area = 10 has the smallest se obs.

db) se(pred) = sqrt(s^2 + se(mean)^2). se(mean) is the only piece that varies with the choice of X0, so se(pred) is minimized at the same place as se(mean), i.e. an area of 10.

e) 1 pt. No. The smallest possible se(pred) for n = 16 patches is $\sqrt{s^2 (1 + 1/n)} \approx 24.6$.

f) 1 pt. No. When n is increased to 160, the smallest possible value for se(pred) is around 23.78: $\sqrt{s^2 (1 + 1/n)} \approx 23.78$.
Note: Increasing the sample size (i.e. measuring more patches) won't help here. That will decrease se(mean), but do nothing (on average) to s^2. Notice that for these data, the root MSE = 23.78. Most of the se(pred) is due to the large s^2. To get a more precise estimate of the species richness, the investigators will have to restrict their study to more homogeneous patches (perhaps by controlling for patch age or not studying as large a geographic area), or they will need to find additional variables that reduce the variability in the predictions.
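The "evaluate the se obs for a range of X values" approach mentioned in da) could be carried out along the following lines. This is only a sketch, not the author's code: it assumes the diversity data set from the SAS code for this problem has already been created, and the data set names grid, both, and preds, the variable names se_mean and se_pred, and the grid of areas from 1 to 30 are all made up for illustration. Rows with a missing response are left out of the fit, but PROC GLM still writes predictions and standard errors for them to the output data set.

```sas
/* Hypothetical grid of areas; the range 1 to 30 is an assumption,
   not something taken from diversity.txt */
data grid;
  do area = 1 to 30;
    logarea = log(area);
    species = .;   /* missing response: row is predicted but not used in the fit */
    output;
  end;
run;

/* Stack the grid under the observed data */
data both;
  set diversity grid;
run;

/* stdp = se of the estimated mean, stdi = se of an individual prediction */
proc glm data=both;
  model species = logarea;
  output out=preds p=yhat stdp=se_mean stdi=se_pred;
run;

/* Print only the grid rows and look for the smallest se_pred */
proc print data=preds;
  where species = .;
  var area logarea yhat se_mean se_pred;
run;
```

In the printed listing, se_pred corresponds to what the answer calls se obs or se(pred), and se_mean to se(mean); both should bottom out at the same grid point.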
/* My SAS code: */

    data diversity;
      infile 'diversity.txt' firstobs=2;
      input area species;
      logarea = log(area);
      carea = area;
    run;

    proc glm;
      model species = logarea;
      title 'Diversity data';
    run;

    proc glm;
      class carea;
      model species = logarea carea /ss1;
      title 'ANOVA LOF for logarea regression';
    run;

    proc means;
      var logarea;
      title "To find mean(X), which is the X with the smallest se mean";
    run;

3) Pace of Life and Heart Disease

a) 2 pt. Scatterplot Matrix.
Note: General pattern is no clear relationship between any pair of variables.

b) No answer needed, since the answer here is the same as the answer for part 16.
Note: you might notice that the overall F-statistic has a p-value of 0.041, so we reject the null hypothesis that all coefficients are equal to zero. At least one coefficient is different from 0. However, the R2 of the model is only 22%, so this model has little explanatory power.

c) 1 pt. The residual plot is ok.
Note: You might notice a very slight tendency for increase in error variance for the first few observations, but this pattern is not continued for anything with a predicted value larger than 18. My (PMD) practice is to not over-interpret small hiccups in these plots.

d) 2 pt.
From SAS we get

                                 Standard
    Parameter    Estimate        Error         t Value    Pr > |t|
    Intercept     3.178695669    6.33694595      0.50      0.6194
    bank          0.405216955    0.19710205      2.06      0.0480
    walk          0.451601069    0.20087352      2.25      0.0316
    talk         -0.179609584    0.22221536     -0.81      0.4249

So the requested equation, with se's below the components, is:

    mean heart = 3.18   + 0.41 bank + 0.45 walk - 0.18 talk
                 (6.34)   (0.20)      (0.20)      (0.22)

SAS code:

    data pace;
      infile 'pace.txt' firstobs=2;
      input bank walk talk heart;
    run;

    /*a*/
    proc sgscatter;
      matrix bank walk talk heart;
    run;

    /* b and d */
    proc glm data=pace;
      model heart = bank walk talk;
      output out=resids r=resid p=yhat;
    run;

    /* c */
    proc sgplot;
      scatter x=yhat y=resid;
    run;

[Residual plot for part c: resid (vertical axis, -9 to 9) vs. yhat (horizontal axis, 15 to 25); plot title 'ANOVA LOF test to log time']

e) 1 pt. One (of many) possible statements is: An increase of 1 unit of walking rate, while holding the talking rate and bank processing rate constant, is associated with a 0.45 unit increase in heart attack death rate.
Note: This is an observational study, so writing a one-sentence conclusion is complicated by not making that conclusion sound like a causal claim. The key
