Simple Linear Regression

Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
January 28, 2008

Statistics 572 (Spring 2008), Simple Linear Regression, January 29, 2008

Example: Phosphorus

Researchers gathered data to evaluate the use of phosphorus (P) by nine corn plants. The data consist of x, the inorganic P in soil (ppm), and y, the plant-available P (ppm).

     x     y
     1    64
     4    71
     5    54
     9    81
    13    93
    11    76
    23    77
    23    95
    28   109

We wish to use the inorganic phosphorus level in the soil to predict the plant-available phosphorus in the corn plants. It is good practice to put the data into an R data frame. I will show two ways to accomplish this.

Creating a Data Frame in R

    > soilP <- c(1, 4, 5, 9, 13, 11, 23, 23, 28)
    > cornP <- c(64, 71, 54, 81, 93, 76, 77, 95, 109)
    > phos <- data.frame(soilP, cornP)
    > rm(soilP, cornP)
    > str(phos)
    'data.frame':   9 obs. of  2 variables:
     $ soilP: num  1 4 5 9 13 11 23 23 28
     $ cornP: num  64 71 54 81 93 76 77 95 109

Creating a Data Frame in Excel

Create a spreadsheet with a header row of variable names and one row per observation.
Save the file as a comma-separated values (CSV) file.
Read it using read.table with the sep argument.

    > phos2 <- read.table("phos.csv", sep = ",", header = TRUE)
    > str(phos2)
    'data.frame':   9 obs. of  2 variables:
     $ soilP: int  1 4 5 9 13 11 23 23 28
     $ cornP: int  64 71 54 81 93 76 77 95 109

Graphical exploration of two quantitative variables

    > library(lattice)
    > plot(xyplot(cornP ~ soilP, data = phos, pch = 16))

[Figure: scatterplot of cornP (vertical axis) against soilP (horizontal axis).]

Objectives of simple linear regression

Description: To describe the relationship between inorganic P in soil and plant-available P.
Estimation: To estimate the population mean plant-available P level at a given level of inorganic P in soil.
Prediction: To predict the plant-available P level for an individual plant at a given level of inorganic P in soil.
Testing: To test if there is a relationship between inorganic P in soil and plant-available P.

Simple Linear Regression Model

$$y_i = \beta_0 + \beta_1 x_i + e_i, \qquad e_i \sim \text{iid } N(0, \sigma^2), \qquad i = 1, \ldots, n$$

$y = \beta_0 + \beta_1 x$ is the true regression line.
$\beta_0$ is the intercept.
$\beta_1$ is the slope.
$x_i$ is the explanatory variable.
$y_i$ is the response variable.
$e_i$ is random error.
iid stands for independent and identically distributed.

Simple Linear Regression Assumptions

1. The model is correct: $E(y_i) = \beta_0 + \beta_1 x_i$.
2. Errors $e_i$ are independent.
3. Errors $e_i$ have homogeneous variance: $\mathrm{Var}(e_i) = \sigma^2$.
4. Errors $e_i$ have a normal distribution: $e_i \sim N(0, \sigma^2)$.

Estimating Model Parameters

A well-estimated line should be close to the data points. The least squares criterion says that the best line is the one that minimizes

$$\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 .$$

The solution to this problem is

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} .$$

The fitted values are $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.

The estimated variance is

$$\hat{\sigma}^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 .$$

An Alternative Viewpoint

The correlation coefficient $r$ is a number between $-1$ and $1$ that measures the strength of the linear relationship between $x$ and $y$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) .$$

The estimated $y$ for an $x$ that is $z$ standard deviations from the mean $\bar{x}$ is $rz$ standard deviations from the mean $\bar{y}$. In other words, $\hat{y} = \bar{y} + r z s_y$.

The estimated slope and intercept are

$$\hat{\beta}_1 = r \, \frac{s_y}{s_x}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} .$$

The regression line goes through the point $(\bar{x}, \bar{y})$.
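The estimation formulas can be checked by hand on the phosphorus data. The following is a minimal sketch in base R, not part of the original slides; the variable names (x, y, b0, b1, y_hat, sigma2_hat, b1_from_r, y_at_mean) are illustrative choices.

```r
# Phosphorus data from the example (x = inorganic P in soil, y = plant-available P, ppm)
x <- c(1, 4, 5, 9, 13, 11, 23, 23, 28)
y <- c(64, 71, 54, 81, 93, 76, 77, 95, 109)
n <- length(x)

# Least squares estimates from the closed-form solution
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# Fitted values and the estimated error variance (divisor n - 2)
y_hat <- b0 + b1 * x
sigma2_hat <- sum((y - y_hat)^2) / (n - 2)

# Alternative viewpoint: the slope recovered from the correlation coefficient
r <- cor(x, y)
b1_from_r <- r * sd(y) / sd(x)

# The fitted line evaluated at mean(x) should return mean(y)
y_at_mean <- b0 + b1 * mean(x)

# Summary of the hand-computed estimates
round(c(b0 = b0, b1 = b1, sigma_hat = sqrt(sigma2_hat)), 2)
```

Here mean(x) is 13 and mean(y) is 80, so the two slope formulas should agree and y_at_mean should equal 80. The rounded intercept, slope, and residual standard deviation should match the 61.58, 1.42, and 10.69 reported in the R example.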
Simple Linear Regression in R

    > library(arm)
    > fit <- lm(cornP ~ soilP, data = phos)
    > display(fit)
    lm(formula = cornP ~ soilP, data = phos)
                coef.est coef.se
    (Intercept) 61.58     6.25
    soilP        1.42     0.39
    ---
    n = 9, k = 2
    residual sd = 10.69, R-Squared = 0.65

Simple Linear Regression in R

    > plot(xyplot(cornP ~ soilP, data = phos, pch = 16, type = c("p", "r")))

The argument type = c("p", "r") tells xyplot to plot both points and a regression line.

[Figure: scatterplot of cornP (vertical axis) against soilP (horizontal axis) with the fitted regression line.]
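The quantities printed by display(fit) can also be pulled out of the fitted model with base R alone, which may be convenient when the arm package is not installed. The sketch below is an illustration rather than part of the original slides. It also contrasts the estimation and prediction objectives with predict(), using a soil P level of 13 ppm chosen here purely as an example; the names est, se, resid_sd, r_squared, new_soil, mean_ci, and indiv_pi are hypothetical.

```r
# Rebuild the example data frame so the sketch is self-contained
phos <- data.frame(
  soilP = c(1, 4, 5, 9, 13, 11, 23, 23, 28),
  cornP = c(64, 71, 54, 81, 93, 76, 77, 95, 109)
)

fit <- lm(cornP ~ soilP, data = phos)

# Base-R equivalents of the quantities shown by display(fit)
est <- coef(fit)                              # intercept and slope
se <- coef(summary(fit))[, "Std. Error"]      # coefficient standard errors
resid_sd <- summary(fit)$sigma                # residual sd (divisor n - 2)
r_squared <- summary(fit)$r.squared

# Estimation vs. prediction at an illustrative soil P level of 13 ppm
new_soil <- data.frame(soilP = 13)
mean_ci <- predict(fit, newdata = new_soil, interval = "confidence")   # population mean
indiv_pi <- predict(fit, newdata = new_soil, interval = "prediction")  # individual plant

round(est, 2)
round(se, 2)
```

Both calls to predict() return the same fitted value, but the prediction interval for an individual plant is wider than the confidence interval for the population mean, because it also accounts for the random error of a single observation.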