Stat 501 Sept. 8
Analyzing Residuals (related to pages 100-108 in the book)

Recall that for an individual observation, a residual, y_i - ŷ_i, is the difference between the actual and predicted y-values.

Residual analysis is an important part of regression modeling. By examining residuals, we can learn whether there are notable outliers, whether we've used a suitable model for how the mean E(y) relates to x, and whether the variance of y is unrelated to the values of x.

One important way to analyze residuals is to plot the residuals against the predicted values (fits). In simple regression, this is equivalent to plotting residuals against values of x, because each fit is a linear function of x.

Concepts to consider when interpreting this plot:
1. The "ideal" is a horizontal, random-looking band of points. This means all is well with the regression; the residuals appear to be random.
2. A curved pattern indicates we should search for a new equation for E(y).
3. A cone-shaped pattern indicates that the variance of y is not the same for all values of x.
4. Extreme points in the vertical direction (residuals) indicate outliers.

To start. On the web go to www.stat.psu.edu/~rho/stat501fa04/. Click on the link for the labs and assignments datasets, and then click on the link for the dataset sept8lab.MTW. This will open Minitab with the data worksheet in place. The worksheet includes data for four separate situations.

Activity 1. Columns c1 and c2 give the number of letters printed in a specified time with the dominant hand (dom) and, at a different time, with the non-dominant hand (nondom). Sixty-three individuals participated. Use Stat>Regression>Fitted Line Plot. Make dom be the y-variable and nondom be the x-variable. Click on Graphs and request a graph of residuals versus fits. Then click OK in each dialog box until the output appears. ANSWER THE QUESTIONS ON THE ATTACHED SHEET.

Activity 2. Columns c4 and c5 give heights and foot lengths (both in cm) for 33 males. Use Stat>Regression>Fitted Line Plot.
Make foot be the y-variable and height be the x-variable. Request a graph of residuals versus fits. ANSWER THE QUESTIONS ON THE ATTACHED SHEET.

Activity 3. Column c7 gives the concentration of an element within a chemical solution, while c8 gives the time (hours) since the solution was created. Do a regression with y = Concen and x = Time. Request a graph of residuals versus fits. ANSWER THE QUESTIONS ON THE ATTACHED SHEET.

Activity 4. c10 = SalePrice of recently sold homes, while c11 = SqrFeet = size of the home in square feet. Do a regression with y = SalePrice and x = SqrFeet. Request a graph of residuals versus fits. ANSWER THE QUESTIONS ON THE ATTACHED SHEET.

Activity 5. Return to Activity 3 about c7 = y = chemical concentration and c8 = x = Time. In Stat>Regression>Fitted Line Plot, use the Options button and request an analysis of the log base ten of y. Again, also get a plot of residuals versus fits. ANSWER THE QUESTIONS ON THE ATTACHED SHEET.

Stat 501 Sept. 8    Name _________________________________    e-mail ID ______________

Activity 1 (dom and nondom)
a. What is the value of R² for this regression? Write a sentence that interprets this value in the context of this problem.
b. Briefly explain why the two graphs created for this activity are evidence that there seem to be no difficulties in using a simple linear regression in this situation.

Activity 2 (foot and height)
Explain what difficulty is indicated by the plot of residuals. How do you think this difficulty affects the regression results?

Activity 3 (Concen and Time)
Explain what difficulty with the regression model is indicated by the two different plots created.

Activity 4 (SalePrice and SqrFeet)
Explain why the plot of residuals is evidence that the variance of sale price is not the same for all home sizes.

Activity 5 (Log of concentration and Time)
Explain whether you think the model with the log of concentration is suitable. Base your answer on the two plots.
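
For anyone who wants to check the Minitab output by hand, the fits and residuals used throughout the activities can be computed directly. The following is a minimal sketch in Python, not part of the lab itself. The function names (fit_line, fits_and_residuals) and the small data set are made up for illustration and are NOT the sept8lab.MTW data; the data are constructed so that concentration halves every hour, which is only an assumption chosen to mimic the kind of curvature Activities 3 and 5 ask about.

```python
import math

def fit_line(x, y):
    """Return the least-squares (intercept, slope) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def fits_and_residuals(x, y):
    """Return the predicted values (fits) and the residuals y_i - fit_i."""
    b0, b1 = fit_line(x, y)
    fits = [b0 + b1 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fits)]
    return fits, residuals

# Hypothetical data (not from the lab worksheet): concentration halves each hour.
time = [0, 1, 2, 3, 4, 5]
concen = [80.0 * 0.5 ** t for t in time]

# Straight-line fit to the raw concentrations.
fits_raw, resid_raw = fits_and_residuals(time, concen)

# Straight-line fit to log base ten of concentration, as in Activity 5.
log_concen = [math.log10(c) for c in concen]
fits_log, resid_log = fits_and_residuals(time, log_concen)

print("fit      residual (raw y)")
for f, r in zip(fits_raw, resid_raw):
    print(f"{f:8.2f} {r:8.2f}")

print("fit      residual (log10 y)")
for f, r in zip(fits_log, resid_log):
    print(f"{f:8.4f} {r:8.4f}")
```

With these made-up numbers, the raw-scale residuals are positive at both ends and negative in the middle, which is the curved pattern described in item 2 of the interpretation list. On the log scale the same data lie on a straight line, so the residuals are essentially zero. Real data would instead show random scatter around zero when the model is suitable.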