ST 311 Week 13 In Class Activity: Correlation and Regression

Problem 1
The National Health and Nutrition Examination Survey (NHANES) collected data on a random sample of almost 5000 adults. One of the variables collected was the weight of the person in kilograms. Another variable collected was the height of the subject in centimeters. The resulting data were used to produce this output:

Simple linear regression results:
Dependent Variable: Weight
Independent Variable: height
Weight = -68.64 + 0.895 height
Sample size: 4978
R (correlation coefficient) = 0.4425
R-sq = 0.1958
Estimate of error standard deviation: 18.21

Parameter estimates:
Parameter   Estimate   Std. Err.   Alternative   DF     T-Stat    P-value
Intercept   -68.64     4.318       ≠ 0           4976   -15.897   <0.0001
Slope        0.895     0.0257      ≠ 0           4976    34.815   <0.0001

a. Explain to someone who knows no statistics what the slope of this regression line tells us in the context of this problem.
For each additional centimeter of height, an adult's expected (mean) weight increases by 0.895 kilograms.

b. If a person is 170 centimeters tall, what weight would you predict for them? Use correct notation and units if appropriate.
ŷ = -68.64 + 0.895(170) = 83.51 kg

c. A person was 170 centimeters tall and weighed 80 kilograms. What is the residual for this person?
Observed (height, weight) = (170, 80); predicted (height, weight) = (170, 83.51).
Residual = observed y - predicted y = 80 - 83.51 = -3.51 kg, so this person weighs 3.51 kg less than the line predicts.

Problem 2
Last semester, an instructor collected data on the weight of his students' book bags (in pounds) and the number of books in the bag. The resulting output is given below:

Simple linear regression results:
Dependent Variable: weight
Independent Variable: books
weight = 7.34 + 1.926 books
Sample size: 10
R (correlation coefficient) = 0.5361
R-sq = 0.28739375
Estimate of error standard deviation: 3.9552412

Parameter estimates:
Parameter   Estimate   Std. Err.   Alternative   DF   T-Stat      P-value
Intercept   7.34       1.7946615   ≠ 0           8    4.0889244   0.0035
Slope       1.926      1.0725154   ≠ 0           8    1.7962172   0.1102

a. Explain to someone who knows no statistics what the slope of this regression line tells us in the context of this problem.
For each additional book added to a book bag, the expected weight of the book bag increases by 1.926 pounds.

b. Explain to someone who knows no statistics what the intercept of this regression line tells us in the context of this problem.
The expected (mean) weight of an empty book bag (a bag with 0 books) is 7.34 pounds.

c. Another random sample of 10 students' bags gave a regression line with slope 2.069 and intercept 5.61. Explain what the intercept of this regression line tells us in the context of this problem.
The expected (mean) weight of an empty book bag (a bag with 0 books) is 5.61 pounds.

d. Notice that the intercepts based on the two regression lines are different. Does it mean that a regression model is inappropriate for this problem?
No. Samples vary, and so do their statistics. The slope and intercept of a regression line are statistics computed from sample data, so two different samples are not expected to produce the same regression equation, especially with only 10 data points each. This does not mean the method is inappropriate.

Problem 3
A clinical researcher (Researcher A) wants to investigate how body weight affects blood sugar in diabetic patients. She randomly samples 40 diabetic patients with body weights ranging from 155 to 198 lbs and records their blood sugar. Using software, she obtains the following regression line:

ŷ = 20.023 + 0.252x

where Y is the blood sugar in mg/dL and X is the body weight in pounds. The correlation coefficient is r = 0.89.

a. Suppose another researcher (Researcher B) takes her body weight measurements in kilograms instead of pounds on the same sample of patients. What is the correlation coefficient for the new regression line? Will this regression line be the same as before? (Note: 1 kg ≈ 2.205 lbs.)
The correlation coefficient will be 0.89, the same as Researcher A's, because the correlation coefficient is unitless and is not affected by changing units. The regression line will not be the same. If the units of either variable change, the units of the intercept (same units as Y) and/or the slope (units of Y per unit of X) also change. Here only X changes, so the intercept stays 20.023 mg/dL, while the slope becomes about 0.252 × 2.205 ≈ 0.556 mg/dL per kg.

b. Now suppose there is another researcher (Researcher C) who wants to investigate how blood sugar affects body weight in diabetic patients, using the same sample. Will her correlation coefficient and regression line be different from those obtained by Researcher A?
The correlation coefficient will be 0.89, the same as Researchers A and B, because the correlation coefficient is not affected by the variable assignment (it does not matter which variable is X and which is Y); it only measures the linear relationship between the two variables. The regression line will not be the same. When the variables are switched, the intercept takes on the units of the new Y (body weight, lbs) and its value changes. Similarly, the slope's units (units of Y per unit of X) become lbs per mg/dL, and its value changes as well. The new slope is not simply 1/0.252: the two slopes multiply to r², so Researcher C's slope is 0.89²/0.252 ≈ 3.14 lbs per mg/dL.

c. Suppose Researcher A has two diabetic patients, Bob and Fred, with weights of 180 and 122 lbs, respectively. She wants to predict their blood sugar levels. Consider whether it is appropriate for her to use her regression line to make predictions for these patients. Discuss your thoughts below.
Since the observed weights range from 155 to 198 lbs, it would not be appropriate to use this model to predict Fred's blood sugar: his weight of 122 lbs is well below that range, so the prediction would be an extrapolation. It would be acceptable to use this model for Bob's blood sugar, because 180 lbs lies within the observed range.

d. Based on the regression line and r, which of the following scatter plots likely describes the data collected by Researcher A? Select one or more.
We have r = 0.89 > 0 and slope 0.252 > 0, so we know the association is positive; we can eliminate Plot 3. The correlation is fairly strong but not perfect, so we can eliminate Plot 1. It may be Plot 2 (a distinct positive correlation, with points fairly close to the line). It may be Plot 4 (a clear, strong linear correlation, but with a few outlying points that may be bringing the correlation value down). Either Plot 2 or Plot 4 is acceptable; it is very difficult to know which is the case.
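Returning to Problems 1 and 2, the short Python sketch below recomputes several answers using only numbers printed in the regression output: the prediction and residual from Problem 1 (b) and (c), R-sq as the square of R, and each T-Stat as Estimate / Std Err. The names `predict`, `rows`, and `t_stats` are illustrative, not part of any statistics package.

```python
import math


def predict(intercept, slope, x):
    """Value on the fitted line: y-hat = intercept + slope * x."""
    return intercept + slope * x


# Problem 1 (NHANES): Weight = -68.64 + 0.895 height (kg and cm).
y_hat = predict(-68.64, 0.895, 170)  # predicted weight of a 170 cm adult
residual = 80 - y_hat                # observed y minus predicted y
print(f"y-hat = {y_hat:.2f} kg, residual = {residual:.2f} kg")

# R-sq is the square of the correlation coefficient R.
print(f"Problem 1: R^2 = {0.4425 ** 2:.4f}")
print(f"Problem 2: R^2 = {0.5361 ** 2:.4f}")

# Each T-Stat in the output is Estimate / Std Err.
rows = [
    ("P1 intercept", -68.64, 4.318),
    ("P1 slope", 0.895, 0.0257),
    ("P2 intercept", 7.34, 1.7946615),
    ("P2 slope", 1.926, 1.0725154),
]
t_stats = {name: est / se for name, est, se in rows}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.3f}")
```

The first line prints `y-hat = 83.51 kg, residual = -3.51 kg`, matching parts (b) and (c). The recomputed t-values differ slightly from the printed T-Stat column because the output displays rounded estimates while the software divides the unrounded ones.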