Regression

Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
Statistics 371
6th December 2005

Correlation

The correlation coefficient $r$ is a measure of the strength of the linear relationship between two variables.

\[
r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)
  = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \, \sum_{i} (y_i - \bar{y})^2}}
\]

Notice that the correlation is not affected by linear transformations of the data, such as changing the scale of measurement. (Multiplying one variable by a negative constant changes the sign of $r$ but not its magnitude.)

Correlation Plots

[Figures: a series of scatterplots of y versus x, each labeled with its correlation coefficient r.]

Summary of Correlation

- The correlation coefficient $r$ measures the strength of the linear relationship between two quantitative variables, on a scale from $-1$ to $1$.
- The correlation coefficient is $-1$ or $1$ only when the data lie perfectly on a line with negative or positive slope, respectively.
- If the correlation coefficient is near one, the data are tightly clustered around a line with a positive slope.
- Correlation coefficients near 0 indicate weak linear relationships.
- However, $r$ does not measure the strength of nonlinear relationships.
- If $r = 0$, rather than $X$ and $Y$ being unrelated, it can be the case that they have a strong nonlinear relationship.
- If $r$ is close to 1, it may still be the case that a nonlinear relationship is a better description of the data than a linear relationship.

Simple Linear Regression

- Simple linear regression is the statistical procedure for describing the relationship between a quantitative explanatory variable $X$ and a quantitative response variable $Y$ with a straight line.
- In simple linear regression, the regression line is the line that minimizes the sum of the squared residuals.
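The correlation slides give two equivalent forms of $r$ and make claims about rescaling and nonlinear relationships. The following is a minimal sketch in R that checks these on small made-up data sets. The vectors `x`, `y`, `u`, `v` and all result names are hypothetical, chosen only for illustration; they are not data from the course.

```r
# Hypothetical data, made up for illustration only (not from the slides).
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1)
n <- length(x)

# Form 1: sum of products of standardized values, divided by n - 1.
r_std <- sum(((x - mean(x)) / sd(x)) * ((y - mean(y)) / sd(y))) / (n - 1)

# Form 2: cross-product sum over the root of the two squared-deviation sums.
r_ratio <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# R's built-in correlation, for comparison with both forms.
r_builtin <- cor(x, y)

# A change of units (linear transformation with positive slope) leaves r unchanged.
r_rescaled <- cor(2.54 * x + 10, y)

# Multiplying by a negative constant flips the sign of r only.
r_flipped <- cor(-x, y)

# A perfect quadratic relationship that is symmetric about 0 has r = 0.
u <- -3:3
v <- u^2
r_nonlinear <- cor(u, v)

print(c(r_std, r_ratio, r_builtin, r_rescaled, r_flipped, r_nonlinear))
```

The quadratic example shows why a correlation near 0 should be read as "no linear relationship" rather than "no relationship": `v` is completely determined by `u`, yet the linear association is zero.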
Riley

- Riley Larget is my son.
- Below is a plot of his height versus his age, from birth to 8 years.

[Figure: Riley's height (inches) versus age (months), from birth to 8 years.]

- The plot indicates that it is not reasonable to model the relationship between age and height as linear over the entire age range, but it is fairly linear from age 2 years to 8 years (24 to 96 months).

Finding a best linear fit

- Any line we can use to predict $Y$ from $X$ has the form $\hat{Y} = b_0 + b_1 X$, where $b_0$ is the intercept and $b_1$ is the slope.
- The value $\hat{y} = b_0 + b_1 x$ is the predicted value of $Y$ if the explanatory variable $X = x$.
- In simple linear regression, the predicted values form a line. In more advanced forms of regression, we can fit curves or fit functions of multiple explanatory variables.
- For each data point $(x_i, y_i)$, the residual is the difference between the observed value and the predicted value, $y_i - \hat{y}_i$.
- Graphically, each residual is the positive or negative vertical distance from the point to the line.
- Simple linear regression identifies the line that minimizes the residual sum of squares, $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.

Riley

[Figure: Riley's height (inches) versus age (months), ages 2 to 8 years.]

Least Squares Regression

- We won't derive them, but there are simple formulas for the slope and intercept of the least squares line as a function of the sample means, standard deviations, and the correlation coefficient.

\[
\hat{Y} = b_0 + b_1 X, \qquad b_1 = r \, \frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}
\]

The General Case

Let $X = \bar{x} + z s_x$, so $X$ is $z$ standard deviations above the mean.

\[
\begin{aligned}
\hat{y} &= b_0 + b_1 (\bar{x} + z s_x) \\
        &= (\bar{y} - b_1 \bar{x}) + b_1 \bar{x} + b_1 z s_x \\
        &= \bar{y} + \left( r \, \frac{s_y}{s_x} \right) z s_x \\
        &= \bar{y} + (rz) s_y
\end{aligned}
\]

- Notice that if $X$ is $z$ SDs above the mean, we predict $Y$ to be only $rz$ SDs above the mean.
- In the typical situation, $|r| < 1$, so we predict the value of $Y$ to be closer to the mean (in standard units) than $X$.
- This is called the regression effect.

A Special Case

Consider the predicted value of an observation $X = \bar{x}$.

\[
\hat{y} = b_0 + b_1 \bar{x} = (\bar{y} - b_1 \bar{x}) + b_1 \bar{x} = \bar{y}
\]

So the regression line always goes through the point $(\bar{x}, \bar{y})$.

Riley (cont.)

    # age2 (months) and height2 (inches) hold Riley's data from age 2 to 8 years
    n  <- length(age2)
    mx <- mean(age2)
    sx <- sd(age2)
    my <- mean(height2)
    sy <- sd(height2)
    r  <- cor(age2, height2)
    print(c(mx, sx, my, sy, r, n))
    # [1] 61.5625000 21.8661649 45.6718750  5.4829043  0.9990835 16.0000000

    # slope and intercept of the least squares line
    b1 <- r * sy / sx
    b0 <- my - b1 * mx
    print(c(b0, b1))
    # [1] 30.2493290  0.2505185

(Riley's predicted height in inches) = 30.25 + 0.25 × (Riley's age in months)

Riley: Interpretation

- We can interpret the slope to mean that from age 2 to 8 years, Riley grew an average of about 0.25 inches per month, or about 3 inches per year.
- The intercept is the predicted value when $X = 0$, or Riley's height (length) at birth.
- This interpretation may not be reasonable if 0 is out of the range of the data.

Riley: Plot of Data

[Figure: Riley's height (inches) versus age (months).]

Riley: A Residual Plot

[Figure: residuals(fit2) versus fitted(fit2).]

Riley: Extrapolation

[Figure: Riley's height (inches) versus age (months), with the age axis extended to 150 months.]

Predicted height at age 15 (180 months):

\[
\hat{y} = 30.25 + 0.25 \times 180 \approx 75.3 \text{ inches}
\]

Frogs

- In a study on oocytes (developing egg cells) from the frog Xenopus laevis, a biologist injects individual oocytes from the same female with radioactive leucine, to measure the amount of leucine incorporated into protein as a function of time. See Exercise 12.3 on page 536.

Residual Standard Deviation

- The residual standard deviation $s_{Y|X}$ is a measure of a typical size of a residual.
- Its formula is

\[
s_{Y|X} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}
\]

R: Entering the data

Here is some R code to create a data frame frog with the time and leucine variables.

    time <- c(0, 10, 20, 30, 40, 50, 60)
    leucine <- c(0.02, 0.25, 0.54, 0.69, 1.07, 1.5, 1.74)
    frog <- data.frame(time, leucine)
    frog
    #   time leucine
    # 1    0    0.02
    # 2   10    0.25
    # 3   20    0.54
    # 4   30    0.69
    # 5   40    1.07
    # 6   50    1.50
    # …
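The slides stop here, before any analysis of the frog data. As a hedged sketch of one natural next step, the code below applies the summary-statistic formulas for the slope and intercept ($b_1 = r s_y / s_x$, $b_0 = \bar{y} - b_1 \bar{x}$) to the frog data and compares them with R's built-in `lm()`. It also computes the residual standard deviation with the usual definition, the square root of the residual sum of squares divided by $n - 2$. Fitting a line to these data is an assumption made here for illustration, and the names `b0`, `b1`, `pred`, `res`, `s_resid`, and `pred_at_mean` are hypothetical.

```r
# Frog data as entered in the slides.
time <- c(0, 10, 20, 30, 40, 50, 60)
leucine <- c(0.02, 0.25, 0.54, 0.69, 1.07, 1.5, 1.74)
frog <- data.frame(time, leucine)
n <- nrow(frog)

# Slope and intercept from means, standard deviations, and the correlation.
r <- cor(frog$time, frog$leucine)
b1 <- r * sd(frog$leucine) / sd(frog$time)
b0 <- mean(frog$leucine) - b1 * mean(frog$time)

# The same least squares line from R's built-in linear model fit.
# (Assumption: a straight-line fit is the analysis intended for these data.)
fit <- lm(leucine ~ time, data = frog)

# Predicted values and residuals (observed minus predicted).
pred <- b0 + b1 * frog$time
res <- frog$leucine - pred

# Residual standard deviation: sqrt(residual sum of squares / (n - 2)).
s_resid <- sqrt(sum(res^2) / (n - 2))

# The fitted line passes through the point (mean of x, mean of y).
pred_at_mean <- b0 + b1 * mean(frog$time)

print(c(b0 = b0, b1 = b1, s_resid = s_resid))
print(coef(fit))
```

The hand-computed `b0` and `b1` should agree with `coef(fit)`, and `s_resid` with the value that `summary(fit)$sigma` reports.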