Section 3.3: Regression

Let's look at a scatterplot showing the number of grams of protein (x) and the number of grams of fat (y) in 28 Burger King menu items.

[Figure: Scatterplot of Fat vs Protein; horizontal axis Protein (grams), vertical axis Fat (grams)]

What type of association, if any, do we see? A positive linear association.

What does this mean? BK menu items which are above average in protein grams tend to be above average in fat grams, and those that are below average in protein grams tend to be below average in fat grams.

How strong an association is it? Try to estimate the correlation coefficient, r. Here r = 0.805.

We notice considerable variation in the number of fat grams in a BK menu item, and we think that at least a part of that variation is explained by a variation in protein grams. Why only part? Let's see.

Example: Pizza Cost

A basic pizza (cheese only) costs $10, and each additional topping costs $2. If we draw a graph of how total pizza cost varies with the number of toppings, we notice that every point lies on the same line, and that any variation in total cost is completely explained by a variation in the number of toppings.

But the points in our BK scatterplot do not fall on the same line. Based on our scatterplot, we believe that we should be able to predict, approximately, the number of grams of fat in a BK menu item by using the number of grams of protein that it has. To do this, we use a linear equation, called a linear model, to approximate the data.

But there are many different straight lines that we could draw through the cluster of data points. How do we pick a "best" line? The method we use to do this is called the least squares criterion, and it is based on an analysis of the errors that we will inevitably have in using a straight line to model the data points. Our line may not go through many (or even any) of the data points, but we want it to be close to many of them. Let's see how this might work.

Consider the following scatterplot, with our model being the line ŷ = 0.5 + 1.25x.

[Figure: scatterplot of the data points with the model line]

Note: The hat over the y always indicates a value predicted from a model, as opposed to an actual data value y.

To measure quantitatively how well the line fits the data, we consider the errors, e, made in using the line (the model) to predict the actual y values of the data points. These errors are called residuals and are calculated by

    e = actual value - predicted value = y - ŷ

If the model overestimates, then ŷ > y and e is negative. If the model underestimates, then ŷ < y and e is positive.

To assess how well our line has done, can we simply add up the values of e? No, because, as the following table shows, large positive and large negative errors can cancel, concealing by how much the line really missed the data points.

    x      1        1        2       4
    y      1        2        2       6
    ŷ      1.75     1.75     3       5.5
    e     -0.75     0.25    -1       0.50
    e²     0.5625   0.0625   1       0.25

Here the errors sum to -1.00, while the squared errors sum to 1.875.

So what can we do? Just what we did with the standard deviation: we square the differences. We want the line that has the smallest possible sum of squared errors, the least squares line.

So the question becomes: How do we find the equation of this least squares line? Answer: Minitab or our TI-83/84.

We do know that, because it is a line, it will have the form

    ŷ = a + bx

Recall that a, the constant term, represents the y-intercept, and that b, the coefficient of x, represents the slope of the line.

Example: BK menu items

a. Find the equation of the regression line (Minitab).
b. Interpret the slope and the y-intercept in the context of the problem.
c. Use the regression model to predict the number of fat grams in a BK menu item with 27 protein grams.

Answer:

a. ŷ = -2.043 + 1.100x. To make our work easier to understand, we often write what x and y represent, as well as the units for each one:

    predicted fat (grams) = -2.043 + 1.100 × protein (grams)

Now everyone will know that we are predicting fat grams from protein grams.

b. slope = 1.100. The slope tells us how much ŷ changes for each additional unit of x. The number of fat grams in a BK menu item is predicted to increase by 1.1 grams for each additional gram of protein. The units for the slope are always units of y per unit of x (here, grams of fat per gram of protein).

y-intercept = -2.043. This means that the line crosses the vertical axis at the point (0, -2.043). This would seem to say that a BK menu item with 0 grams of protein should have a negative number of fat grams, clearly an absurdity. Here there is no reasonable interpretation of the y-intercept within the context of the problem, so we can just think of it as a starting point for the regression line.

c. When x = 27 grams of protein, then ŷ = -2.043 + 1.100(27) = 27.657, or about 27.7 grams of fat.

Assessing how good our model is for making predictions

We know that there is variation in the observed values of y (the number of fat grams), and we believe that a part of that variation is explained by variation in x (the number of protein grams), that is, by the linear model. But exactly what part of the variation in the y values is explained?

If 100% of the variation in the y values is explained by the model, then all the points lie on our line and the model is perfect. (Recall the pizza toppings example.) If 0% is explained, then the model is worthless.

The number that gives the percentage of the variation in the y values explained by the model is called the coefficient of determination and is denoted by r² (or, in Minitab, R²), and yes, it is the square of the correlation coefficient r. Because r is always between -1 and 1, r² is always between 0 and 1 (0% and 100%). A value of r² near 1 means a very useful model for making predictions, and a value of r² near 0 means a nearly worthless model.

In our BK menu example, r², as reported by Minitab, was 0.648. Thus our model was reasonable, explaining 64.8% of the variation in fat grams in BK menu items as being due to a variation in protein grams in those items.

Example: Use of TI-83/84 in regression

The ages and values of 11 Orion used cars are given in the following table.

    x: Age (yrs)    5      4       6      5      5      5      6      6      2       7      7
    y: Price ($)    8500   10300   7000   8200   8900   9800   6600   9500   16900   7000   4800

We want to use age (x, in years) to predict value (y, in dollars) of used Orions. First, we need to make sure that our data has a linear association, so we want to enter it in our calculator and look at a scatterplot. To enter the data, push …
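
As a cross-check on software or calculator output, the least squares line and r² for the Orion data can also be computed directly. The following is a minimal sketch in Python, not the method these notes use (they rely on Minitab or the TI-83/84). It assumes the standard closed-form least squares formulas, b = Sxy / Sxx and a = ȳ - b·x̄; the names least_squares, ages, and prices are illustrative choices, not anything from the notes.

```python
def least_squares(xs, ys):
    """Return (a, b, r2) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx                 # slope
    a = y_bar - b * x_bar         # y-intercept
    r2 = sxy ** 2 / (sxx * syy)   # coefficient of determination
    return a, b, r2


# Orion data from the table above: age in years, price in dollars.
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [8500, 10300, 7000, 8200, 8900, 9800, 6600, 9500, 16900, 7000, 4800]

a, b, r2 = least_squares(ages, prices)
print(f"predicted price = {a:.2f} + ({b:.2f}) * age")
print(f"r^2 = {r2:.3f}")
```

The slope and intercept printed here should agree, up to rounding, with what a regression routine reports for the same data. The slope is in dollars per year of age, following the "units of y per unit of x" rule.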


APSU MATH 1530 - Regression
