Chapter 12: Correlation and Regression
Fall 2011

Example: Energy expenditure in African mole rats
[Figure: scatterplot of daily energy expenditure (kJ/day) against body mass (g).]

Example: snow fall and time to clean the streets
[Figure: scatterplot of time to clean the streets (h) against snow fall (in).]

Correlation examples
[Figures: scatterplots of y against x. The panel labels show correlations of magnitude 0.17, 0.95, 0.32, 0.94, 1.0, 1, 0.97, 0.97, and 0.]

Inference for the slope $\beta_1$
$H_0$: Y is not linearly related to X, or $\beta_1 = 0$
versus
$H_A$: Y is linearly related to X, or $\beta_1 \neq 0$

Anova table and F test

  Source       df      SS                                MS       F
  Regression   1       $b_1^2 \sum (x_i - \bar{x})^2$    SS/df    MS(Reg) / MS(Res)
  Residual     n - 2   $\sum (y_i - \hat{y}_i)^2$        SS/df
  Total        n - 1   $\sum (y_i - \bar{y})^2$

The p-value comes from the F distribution; the df are 1 and n - 2.

Residual standard deviation:
$s_e = \sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}} = \sqrt{MS(Res)}$

Inference for the slope $\beta_1$
Or, t-test:
$SE_{b_1} = \dfrac{s_e}{\sqrt{\sum (x_i - \bar{x})^2}}$
$t = \dfrac{b_1}{SE_{b_1}}$ on df = n - 2, where n = number of pairs (of animals, etc.)
We can also get confidence intervals for the true slope $\beta_1$.
$b_1 = r \, \dfrac{s_y}{s_x}$
$H_0$ also means that the true correlation $\rho = 0$.

Inference for the intercept $\beta_0$
$SE_{b_0} = s_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum (x_i - \bar{x})^2}}$
Then we use a t-test with $t = \dfrac{b_0}{SE_{b_0}}$ on df = n - 2, where n = number of pairs.

Snowfall: $s_e = \sqrt{MS(Res)} = \sqrt{0.23} = 0.48$ hours, n = 7 days, $\bar{x} = 3.48$ in, $\sum (x_i - \bar{x})^2 = 22.3$, and $b_0 = 0.31$ hours.
$SE_{b_0} = 0.48 \sqrt{\dfrac{1}{7} + \dfrac{3.48^2}{22.3}} = 0.39$ hours
A 95% confidence interval for $\beta_0$ is $0.31 \pm 2.571 \times 0.39$, i.e. (-0.69, 1.31) hours.
0 lies in the interval. Good!

Residuals

  x      y      $\hat{y} = 0.31 + 1.38x$    $r = y - \hat{y}$
  3.2    4.9    4.73                         0.17
  1.4    2.4    2.24                         0.16
  2.6    4.4    3.90                         0.50
  6.9    9.6    9.83                        -0.23
  3.6    4.8    5.28                        -0.48
  1.7    2.1    2.66                        -0.56
  5.0    7.7    7.21                         0.49

Residual plots: snow fall data
[Figures: time to clean the streets (h) against snow fall (in); residuals against predicted values.]

Residual plots
We look at the residual plot to see if the assumptions of the linear model are met:
2. The relationship between X and Y is linear.
4. The residual standard deviation $\sigma_e$ does not depend on X (homogeneity of variance).
We want a nice cloud of points without any pattern.

Residual plots: mole rat data
[Figures: daily energy expenditure (kJ/day) against body mass (g); residuals against predicted values.]

R commands for regression

  bodymass <- c(42, 57, 70, 74, 65, 79, 82, 158, 165)
  energy <- c(40, 43, 53, 60, 72, 69, 70, 105, 168)
  plot(bodymass, energy, pch = 16)
  cor(energy, bodymass)
  fit <- lm(energy ~ bodymass)
  summary(fit)
  anova(fit)
  abline(fit)
  residuals(fit)
  plot(bodymass, residuals(fit), pch = 16)
  abline(h = 0)
  qqnorm(residuals(fit), pch = 16)
  plot(fit)

Example: tree age in the Amazon rain forest
Exercise 12.43: 20 trees, X = diameter (cm) and Y = age (yr), using carbon dating.

  Analysis of Variance Table
              Df   Sum Sq   Mean Sq   F value   Pr(>F)
  diameter     1   423561    423561    5.0824   0.03687
  Residuals   18  1500095     83339

  Coefficients:
               Estimate   Std. Error   t value   Pr(>|t|)
  (Intercept)    18.770      265.148     0.071     0.9443
  diameter        4.392        1.948     2.254     0.0369

r = 0.42
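The snowfall intercept calculation can be retraced in R. This is a minimal sketch: the x and y values are those listed in the residual table, while the variable names (`snow`, `hours`, `sxx`, `se_b0`, and so on) are chosen here for illustration and are not from the slides. It computes the residual standard deviation and both standard errors by hand from the formulas above, then compares the interval with R's built-in `confint()`.

```r
# Snowfall data from the residual table (x in inches, y in hours)
snow  <- c(3.2, 1.4, 2.6, 6.9, 3.6, 1.7, 5.0)
hours <- c(4.9, 2.4, 4.4, 9.6, 4.8, 2.1, 7.7)

fit <- lm(hours ~ snow)
n   <- length(snow)

b0 <- unname(coef(fit)[1])   # estimated intercept
b1 <- unname(coef(fit)[2])   # estimated slope

# Residual standard deviation: s_e = sqrt(SS(Res) / (n - 2))
se <- sqrt(sum(residuals(fit)^2) / (n - 2))

# Sum of squared deviations of x about its mean
sxx <- sum((snow - mean(snow))^2)

# Standard errors of the slope and the intercept
se_b1 <- se / sqrt(sxx)
se_b0 <- se * sqrt(1 / n + mean(snow)^2 / sxx)

# t test for the slope on n - 2 degrees of freedom (two-sided p-value)
t_b1 <- b1 / se_b1
p_b1 <- 2 * pt(-abs(t_b1), df = n - 2)

# 95% confidence interval for the intercept
t_crit <- qt(0.975, df = n - 2)
ci_b0  <- b0 + c(-1, 1) * t_crit * se_b0

# Cross-check against R's built-in interval
ci_builtin <- unname(confint(fit)[1, ])
```

Because the slides round intermediate values (for example 0.48 and 0.39), the interval in `ci_b0` agrees with the hand calculation only up to rounding. It still contains 0, which is the conclusion drawn above.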