UGA STAT 4210 - Chapter 12

Chapter 12 – Simple Linear Regression (Association between Quantitative Variables)

Goals:
- understand what a "regression model" is
- interpret the meaning of α and β in the regression model
- understand the difference between the regression and prediction models
- explain regression toward the mean
- recognize the relationship between r and b
- interpret the r² value
- identify the assumptions for regression inference
- test the hypothesis of statistical linear independence of X and Y
- compute and interpret a confidence interval for the mean response at a given value of the predictor
- compute and interpret a prediction interval for a single response at a given value of the predictor
- understand the role of residuals in regression analysis
- use ANOVA to determine whether the linear association is statistically significant

Waaaay back in Chapter 3, we talked about regression as a bivariate descriptive method. That is, given two measurements from a sample, we could describe a linear relationship between those variables using regression. Correlation was, as a complement, a description of the strength of that linear association.

But it was, again, all descriptive. We made no assessment or claims of significance of the relationship. We made no inference about the truth of the relationship in the population; it all pertained to the sample.

"But, Megan," you say. "Statistical inference is about drawing inference from the sample information to the population, generally. Can't we do that with regression?" Yes, we can! And using inferential methods we are already well familiar with.

Review: ANOVA to compare means:
- categorical predictors (factors)
- quantitative response

If there is a difference in mean response across different levels of the predictor, there is an association: the response (y) depends on the level of the factor (x).

That's what we are trying to determine with regression, only now our predictor isn't categorical: the levels have quantitative meaning; they are on an interval or ratio scale.
The order of the predictor means something.

§12.1 Model How Two Variables Are Related

Just as we created boxplots for our one-way ANOVA data before, when dealing with bivariate quantitative data we must also plot the data, using a scatterplot. This will give us a good idea about whether a linear model is appropriate. If a linear trend, or association, between the predictor and response variables appears to exist, we can construct a regression line.

Recall from Chapter 3:

ŷ = a + bx

This is our regression line, or our prediction equation.
- ŷ is the predicted response
- a is the intercept (the predicted response when x = 0)
- b is the slope (the change in predicted response for a 1-unit increase in x)
- x is the predictor

We plot the data first to look for anomalies:
- nonlinearity
- outliers (unusual responses for their given predictor value)
- leverage points (unusual predictor values)
- clustering
- heteroscedasticity (non-constant variance in the response along the range of the predictor)

Even if we don't have these anomalies, we won't have perfect prediction: not all observations will fall exactly on our line. That is, there is variability around the regression line.

Say we have an observation (x_i, y_i). We can predict a response for that observation using the line to get (x_i, ŷ_i).
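As a quick numerical sketch of the prediction equation (using entirely made-up data and NumPy's least squares `polyfit`, not anything from the notes):

```python
import numpy as np

# Made-up bivariate sample, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # response

# Degree-1 least squares fit; polyfit returns [slope, intercept]
b, a = np.polyfit(x, y, 1)

# Prediction equation: predicted response for each observed x
y_hat = a + b * x
```

In practice we would draw the scatterplot of x against y first (e.g., with matplotlib) to check that a line is a sensible model at all.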
It is entirely possible to have another observation with the same value of the predictor but a different response (e.g., all puppies from the same litter don't weigh the same amount, even though everything about the mom is identical).

The difference between our actual, observed response and the predicted response for a value x, based on the prediction equation, is

e_i = y_i − ŷ_i

This difference, e_i, is called a residual.

Definition: a residual is the prediction error for a given observation: it is the difference between the observed response and the predicted response for a given observation (x, y).

Some things to know about residuals:
- all observations have a residual (even if that residual is 0)
- smaller residuals → less prediction error → better prediction
- ∑e_i = 0, and therefore the mean residual ē = 0
- ∑(y_i − ŷ_i)² = ∑e_i² = the sum of squared residuals, or sum of squared error (we will see this again later)

The regression line is the best-fit line because it has the smallest ∑e_i² of all possible lines. It is, therefore, called the least squares line. The methods in this chapter and the next are called "Least Squares Regression" for this reason.

When we build a regression equation, what we are really doing is estimating the mean (or expected) response at a given value of the predictor.
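The residual facts above can be checked directly. This is a minimal sketch on made-up data (none of these numbers come from the notes): the residuals from the least squares line sum to zero, and nudging the line off the fit can only increase the sum of squared error.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up predictor values
y = np.array([2.0, 4.5, 5.0, 8.5, 9.0])   # made-up responses

b, a = np.polyfit(x, y, 1)                # least squares fit
e = y - (a + b * x)                       # residuals e_i = y_i - ŷ_i

sse = np.sum(e ** 2)                      # sum of squared error

# Any other line has a strictly larger SSE; here we nudge the
# intercept by 0.1 and compare:
sse_other = np.sum((y - ((a + 0.1) + b * x)) ** 2)
```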
That is, the response at a given value x varies, even in the real world. What we want to know is: what is the long-run average response for that value of the predictor? An expected value. A mean.

The regression line ŷ = a + bx is a sample estimate of what we think goes on in the real world; it describes the relationship between x and y based on our sample's observed data.

Our model for what we believe is true of the relationship in the population is

μ(y|x) = α + βx

or, for individual responses, y = α + βx + ε, where:
- μ(y|x) is the mean response for a given value of the predictor (a parameter)
- α is the true intercept
- β is the true slope
- ε is the error, or variation in the population around the mean response

We use our sample data and equation to estimate our population model, the same way we have all along:
- a is a point estimate for α
- b is a point estimate for β

NB: "All models are wrong. Some are useful." – George Box

When testing our regression model, we aren't testing whether our model is correct; we are really testing how useful it is, or how well it fits our data. We know it isn't an exact description of reality, but we want to know whether it's a reasonable approximation.

§12.2 Describe Strength of Association

We use the correlation coefficient, r, to describe the strength and direction of the linear association between two quantitative variables.

Recall, the correlation coefficient has some defining characteristics:
1. −1 ≤ r ≤ 1
   - r < 0 implies a negative linear relationship
   - r = 0 implies no linear relationship
   - r > 0 implies a positive linear relationship
2. The closer r is to 1 or −1, the stronger the linear relationship.
3. r_xy = r_yx
4. r has the same sign (direction) as b (the slope)
5. r is scale invariant

r = (1/(n−1)) ∑ z_x z_y = (1/(n−1)) ∑ ((x − x̄)/s_x)((y − ȳ)/s_y)

Although r only tells us about the strength of linear association, and not the nature of the relationship itself, it can help get us there:

b = r (s_y / s_x)
a = ȳ − b x̄

We can also see that, if …
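The relationships among r, b, and a can be verified numerically. A minimal sketch on made-up data, using the sample-SD convention (ddof=1) that matches the 1/(n−1) in the formula for r:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up predictor values
y = np.array([2.0, 4.5, 5.0, 8.5, 9.0])   # made-up responses
n = len(x)

sx = np.std(x, ddof=1)                    # sample standard deviations
sy = np.std(y, ddof=1)
zx = (x - x.mean()) / sx                  # z-scores
zy = (y - y.mean()) / sy

r = np.sum(zx * zy) / (n - 1)             # r = 1/(n-1) * sum(z_x * z_y)

b = r * sy / sx                           # slope from r
a = y.mean() - b * x.mean()               # intercept from the means

# Sanity check: these match NumPy's direct least squares fit
b_fit, a_fit = np.polyfit(x, y, 1)
```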

