Multiple Linear Regression
Bret Larget
Departments of Botany and of Statistics
University of Wisconsin-Madison
February 6, 2007

Statistics 572 (Spring 2007), Multiple Linear Regression, February 8, 2007

The Big Picture

Multiple Linear Regression

- Most interesting questions in biology involve relationships between multiple variables.
- There are typically multiple explanatory variables.
- Interactions between variables can be important in understanding a process.
- We will now study statistical models for when there is a single continuous quantitative response variable and multiple explanatory variables.
- Explanatory variables may be quantitative or factors (categorical variables).

Model

We extend simple linear regression to consider models of the following form:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki} + e_i$$

where $e_i \sim \text{iid } N(0, \sigma^2)$ for $i = 1, \ldots, n$.

- $y$ is the response variable;
- $x_1, x_2, \ldots, x_k$ are the explanatory variables;
- Some people use the terms dependent and independent variables; I do not like this terminology because the $x_i$ are often not independent.
- $e_i$ are random errors;
- $\beta_0$ is an intercept and $\beta_1, \ldots, \beta_k$ are slopes.

Multiple Regression Objectives

- Inference (estimation and testing) on the model parameters;
- Estimation/prediction of $y$ at $x_1, x_2, \ldots, x_k$;
- Model selection: select which explanatory variables are best to include in a model.

Estimation of Regression Coefficients

- We extend the least squares criterion from SLR.
- Seek the parameters $b_0, \ldots, b_k$ that minimize

$$\sum_{i=1}^{n} \bigl( y_i - (b_0 + b_1 x_{1i} + b_2 x_{2i} + \cdots + b_k x_{ki}) \bigr)^2$$

- The solution is the set of estimated coefficients $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$.
- The $i$th fitted value is $\hat y_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \cdots + \hat\beta_k x_{ki}$.
- The $i$th residual is $y_i - \hat y_i$.
- The least squares criterion minimizes the sum of the squared residuals, also called the sum of squares for error (SSErr).
- The estimate of the variance $\sigma^2$ is the mean squared error (MSErr), or

$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{n - k - 1}$$

Matrix Notation for Estimates

- There are no simple expressions for the estimated coefficients.
- The matrix notation solution is concise.

$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix}
1 & x_{11} & \cdots & x_{k1} \\
1 & x_{12} & \cdots & x_{k2} \\
\vdots & \vdots & & \vdots \\
1 & x_{1n} & \cdots & x_{kn}
\end{pmatrix}, \quad
\hat\beta = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix} = (X^T X)^{-1} X^T y
$$

$$\hat y = X \hat\beta = X (X^T X)^{-1} X^T y = H y$$

- The matrix $H$ is called the hat matrix.
- The diagonal entries are the leverages.

k = 2 Case

The model is

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + e_i$$

where $e_i \sim \text{iid } N(0, \sigma^2)$ for $i = 1, \ldots, n$.

- Intercept $\beta_0$: expected $y$ when $x_1 = 0$, $x_2 = 0$.
- Slope $\beta_1$: expected change in $y$ for a 1 unit increase in $x_1$ with $x_2$ held constant.
- Slope $\beta_2$: expected change in $y$ for a 1 unit increase in $x_2$ with $x_1$ held constant.

Formula

$$
\hat\beta_1 = \frac{\sum (x_{2i} - \bar x_2)^2 \sum (x_{1i} - \bar x_1)(y_i - \bar y) - \sum (x_{1i} - \bar x_1)(x_{2i} - \bar x_2) \sum (x_{2i} - \bar x_2)(y_i - \bar y)}{\sum (x_{1i} - \bar x_1)^2 \sum (x_{2i} - \bar x_2)^2 - \bigl( \sum (x_{1i} - \bar x_1)(x_{2i} - \bar x_2) \bigr)^2}
$$

$$
\hat\beta_2 = \frac{\sum (x_{1i} - \bar x_1)^2 \sum (x_{2i} - \bar x_2)(y_i - \bar y) - \sum (x_{1i} - \bar x_1)(x_{2i} - \bar x_2) \sum (x_{1i} - \bar x_1)(y_i - \bar y)}{\sum (x_{1i} - \bar x_1)^2 \sum (x_{2i} - \bar x_2)^2 - \bigl( \sum (x_{1i} - \bar x_1)(x_{2i} - \bar x_2) \bigr)^2}
$$

$$\hat\beta_0 = \bar y - \hat\beta_1 \bar x_1 - \hat\beta_2 \bar x_2$$

Pesticide Example

A study was conducted to assess the toxic effect of a pesticide on a given species of insect. The data consist of:

- dose rate of the pesticide, $x_1$ (units unknown);
- body weight of an insect, $x_2$ (grams, maybe);
- rate of toxic action, $y$ (time to death in minutes, maybe).

    > toxic = read.table("toxic.txt", header = T)
    > str(toxic)
    'data.frame':   19 obs. of  3 variables:
     $ dose  : num  0.696 0.729 0.509 0.559 0.679 0.583 0.742 0.781 0.865 ...
     $ weight: num  0.321 0.354 0.134 0.184 0.304 0.208 0.367 0.406 0.49 ...
     $ effect: num  0.324 0.367 0.321 0.375 0.345 0.341 0.327 0.256 0.214 ...

Analysis

- Use R to show graphical analysis.
- Use R to show differences in possible models to fit.

    > attach(toxic)
    > fit0 = lm(effect ~ 1)
    > fit1 = lm(effect ~ dose)
    > fit2 = lm(effect ~ weight)
    > fit12 = lm(effect ~ dose + weight)
    > fit21 = lm(effect ~ weight + dose)
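
The estimation slides can be checked numerically. The following is a minimal sketch, not part of the original slides: it computes the least squares estimates three ways (matrix notation, the k = 2 formulas, and `lm`) and compares them, along with the leverages and MSErr. Because `toxic.txt` is not reproduced here, the data are simulated; the seed, sample size, ranges, and "true" coefficients are all assumptions chosen only for illustration.

```r
# Simulated data (an assumption for illustration; NOT the toxic.txt data).
set.seed(572)
n  <- 19
x1 <- runif(n, 0.5, 0.9)
x2 <- runif(n, 0.1, 0.5)
y  <- 0.2 + 0.5 * x1 - 0.8 * x2 + rnorm(n, sd = 0.03)

# Design matrix: a column of ones, then one column per explanatory variable.
X <- cbind(1, x1, x2)
k <- ncol(X) - 1

# Matrix notation: beta-hat solves (X^T X) b = X^T y.
beta_hat <- as.vector(solve(crossprod(X), crossprod(X, y)))

# Hat matrix H = X (X^T X)^{-1} X^T, fitted values, leverages, and MSErr.
H        <- X %*% solve(crossprod(X)) %*% t(X)
y_hat    <- as.vector(H %*% y)
leverage <- diag(H)
ms_err   <- sum((y - y_hat)^2) / (n - k - 1)

# k = 2 formulas written out with centered sums of squares and cross-products.
s11 <- sum((x1 - mean(x1))^2)
s22 <- sum((x2 - mean(x2))^2)
s12 <- sum((x1 - mean(x1)) * (x2 - mean(x2)))
s1y <- sum((x1 - mean(x1)) * (y - mean(y)))
s2y <- sum((x2 - mean(x2)) * (y - mean(y)))
den <- s11 * s22 - s12^2
b1  <- (s22 * s1y - s12 * s2y) / den
b2  <- (s11 * s2y - s12 * s1y) / den
b0  <- mean(y) - b1 * mean(x1) - b2 * mean(x2)

# Reference fit from lm(); each comparison prints TRUE if the values agree
# up to numerical tolerance.
fit <- lm(y ~ x1 + x2)
print(all.equal(beta_hat, unname(coef(fit))))
print(all.equal(c(b0, b1, b2), unname(coef(fit))))
print(all.equal(leverage, unname(hatvalues(fit))))
print(all.equal(ms_err, summary(fit)$sigma^2))
```

With the real data, the same comparisons could presumably be run by replacing the simulated vectors with `toxic$dose`, `toxic$weight`, and `toxic$effect`. Forming H explicitly is done here only to mirror the slide; for larger n, `hatvalues(fit)` avoids building the n by n matrix.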