MIT 9.07 - Correlation & Regression

Correlation & Regression, III
9.07, 4/6/2004

Review
• Linear regression refers to fitting a best-fit line y = a + bx to the bivariate data (x, y), where
  a = my – b·mx
  b = cov(x, y)/sx^2 = SSxy/SSxx
• Correlation, r, is a measure of the strength and direction (positive vs. negative) of the relationship between x and y:
  r = cov(x, y)/(sx·sy)
  (There are various other computational formulas, too.)

Outline
• Relationship between correlation and regression, along with notes on the correlation coefficient
• Effect size, and the meaning of r
• Other kinds of correlation coefficients
• Confidence intervals on the parameters of correlation and regression

Relationship between r and regression
• r = cov(x, y)/(sx·sy)
• In regression, the slope is b = cov(x, y)/sx^2
• So we can also write b = r·(sy/sx)
• This means b = r when sx = sy

Notes on the correlation coefficient, r
1. The correlation coefficient is the slope (b) of the regression line when both the x and y variables have been converted to z-scores, i.e. when sx = sy = 1, or more generally whenever sx = sy. For given sx and sy, the larger the correlation coefficient, the steeper the slope.

Invariance of r under linear transformations of x and y
• A linear change of scale in either x or y will not change r.
• E.g. converting height to meters and weight to kilograms will not change r.
• This is just the sort of nice behavior we'd like from a measure of the strength of the relationship.
  – If you can predict height in inches from weight in lbs, you can just as well predict height in meters from weight in kilograms.

Notes on the correlation coefficient, r
2. The correlation coefficient is invariant under linear transformations of x and/or y.
  (r is the average of the products zx·zy, and the z-scores zx and zy are invariant to linear transformations of x and/or y.)

How do correlation (r) and regression differ?
• In regression the emphasis is on predicting one variable from the other; in correlation the emphasis is on the degree to which a linear model describes the relationship between two variables.
• The regression equation depends on which variable we choose as the explanatory variable and which as the variable we wish to predict:
  – Regressing y on x: b = cov(x, y)/sx^2, a = my – b·mx, r = cov(x, y)/(sx·sy)
  – Regressing x on y: b = cov(x, y)/sy^2, a = mx – b·my, r = cov(x, y)/(sx·sy)
• Correlation is symmetric with respect to x and y – swap them (x↔y) and r stays the same – but regression is not.
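A short numerical sketch may make these relationships concrete. The code below is not part of the original notes; it uses NumPy with made-up height/weight-style data (all names and numbers are illustrative) to check that b = r·(sy/sx), that r is the regression slope after z-scoring, that r is unchanged by linear changes of scale, and that regression, unlike correlation, is not symmetric in x and y.

```python
import numpy as np

# Hypothetical bivariate data (heights and weights); purely illustrative.
rng = np.random.default_rng(0)
x = rng.normal(68, 3, size=200)                   # "heights" in inches
y = 4.5 * x - 150 + rng.normal(0, 15, size=200)   # "weights" in lbs, plus noise

def slope_intercept(x, y):
    """Least-squares fit of y = a + b*x: b = cov(x, y)/sx^2, a = my - b*mx."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    a = y.mean() - b * x.mean()
    return a, b

def corr(x, y):
    """Correlation coefficient r = cov(x, y)/(sx*sy)."""
    return np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

a, b = slope_intercept(x, y)
r = corr(x, y)

# 1. Slope and correlation are related by b = r*(sy/sx).
print(np.isclose(b, r * np.std(y, ddof=1) / np.std(x, ddof=1)))   # True

# 2. r is the slope of the regression line once both variables are z-scored.
zx = (x - x.mean()) / np.std(x, ddof=1)
zy = (y - y.mean()) / np.std(y, ddof=1)
print(np.isclose(slope_intercept(zx, zy)[1], r))                  # True

# 3. r is invariant to linear changes of scale (inches -> meters, lbs -> kg)...
print(np.isclose(corr(x * 0.0254, y * 0.4536), r))                # True

# ...but regression is not symmetric: regressing x on y uses cov(x, y)/sy^2,
# which is generally a different line, not simply the inverse of b.
print(slope_intercept(x, y)[1], slope_intercept(y, x)[1])
```

Each printed check should hold for any data set, not just this invented one.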
What to look out for when calculating r
• In regression, we had to watch out for outliers and extreme points, because they could have an undue influence on the results.
• In correlation, the key thing to be careful of is not to artificially limit the range of your data, as this can lead to inaccurate estimates of the strength of the relationship (as well as giving poor linear fits in regression).
  – Often a limited range gives an underestimate of r, though not always.
[Figure: scatter plots of weight (lbs) vs. height (inches); correlation over a normal range of heights, r = 0.71, vs. correlation over a narrow range of heights, r = 0.62. A simulation sketch at the end of these notes illustrates this effect.]

Correlation over a limited range
• A limited range will often (though not always) lead to an underestimate of the strength of the association between the two variables.
[Figure: scatter plot of weight (lbs) vs. height (inches) over a limited range of heights.]

The meaning of r
• We've already talked about r indicating both whether the relationship between x and y is positive or negative, and the strength of the relationship.
• The correlation coefficient, r, also has meaning as a measure of effect size.

Outline
• Relationship between correlation and regression, along with notes on the correlation coefficient
• Effect size, and the meaning of r
• Other kinds of correlation coefficients
• Confidence intervals on the parameters of correlation and regression

Effect size
• When we talked about effect size before, it was in the context of a two-sample hypothesis test for a difference in the mean.
• If there was a significant difference, we decided it was likely there was a real systematic difference between the two samples.
• Measures of effect size attempt to get at how big this systematic effect is, in an attempt to begin to answer the question "how important is it?"

Effect size & regression
• In the case of linear regression, the systematic effect refers to the linear relationship between x and y.
• A measure of effect size should get at how important (how strong) this relationship is.
  – The fact that we're talking about strength of relationship should be a hint that effect size will have something to do with r.

Predicting the value of y
• If x is correlated with y, then the situation might look like this:
[Figure: scatter plot of y vs. x]

The meaning of r and effect size
• When we talked about two-sample tests, one particularly useful measure of effect size was the proportion of the variance in y accounted for by knowing x.
• (You might want to review this, to see the similarity to the development on the following slides.)
• The reasoning went something like this, adapted here to the case of linear regression:

Predicting the value of y
• Suppose I pick a random individual from this scatter plot, don't tell you which, and ask you to estimate y for that individual. It would be hard to guess! The best you could probably hope for is to guess the mean of all the y values (at least your error would be 0 on average).

How far off would your guess be?
• The variance of the y scores about the mean y, sy^2, gives a measure of your uncertainty about the y value.

Predicting the value of y when you know x
• Now suppose that I told you the value of x, and again asked you to predict y.
• This would be somewhat easier, because you could use regression to predict a good guess for y, given x.
• Your best guess is y', the predicted value of y given x.
• (Recall that regression attempts to fit the best-fit line through the average y for each x. So the best guess is still a mean, but it's the mean y given that value of x.)

How far off would your guess be, now?
• The variance about the mean y score for that value of x gives a measure of your uncertainty.
• Under the assumption of homoscedasticity, that measure of uncertainty is sy'^2, where sy' is the rms error, sqrt(Σ(yi – yi')^2 / N).
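The argument above compares your uncertainty about y when you don't know x (sy^2) with your uncertainty when you do (sy'^2). As a sketch of where this comparison leads (not from the original notes; the data and variable names are invented), the code below checks the standard identity sy'^2 = (1 − r^2)·sy^2, which is why r^2 can be read as the proportion of the variance in y accounted for by knowing x.

```python
import numpy as np

# Invented, illustrative data.
rng = np.random.default_rng(1)
x = rng.normal(68, 3, size=500)
y = 4.5 * x - 150 + rng.normal(0, 15, size=500)

# Fit the regression line y' = a + b*x.
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
a = y.mean() - b * x.mean()
y_pred = a + b * x

# Correlation coefficient.
r = np.cov(x, y, ddof=1)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Uncertainty when you don't know x: variance of y about its mean, sy^2.
var_y = np.mean((y - y.mean()) ** 2)

# Uncertainty when you do know x: squared rms error about the line, sy'^2.
var_resid = np.mean((y - y_pred) ** 2)

# The reduction in uncertainty is exactly r^2 of the original variance:
# sy'^2 = (1 - r^2) * sy^2, so r^2 is the proportion of variance explained.
print(np.isclose(var_resid, (1 - r**2) * var_y))   # True
```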

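Finally, the restricted-range caution above can be illustrated with a small simulation (again invented data, not from the notes): correlate weight with height over the full sample, then again using only individuals within a narrow band of heights; the second r is usually noticeably smaller.

```python
import numpy as np

# Invented height/weight data; numbers are illustrative only.
rng = np.random.default_rng(2)
height = rng.normal(68, 3, size=5000)                        # inches
weight = 4.5 * height - 150 + rng.normal(0, 15, size=5000)   # lbs

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

# Correlation over the full, normal range of heights.
r_full = corr(height, weight)

# Correlation when the sample is artificially limited to a narrow range.
narrow = (height > 66) & (height < 70)
r_narrow = corr(height[narrow], weight[narrow])

print(f"full range:   r = {r_full:.2f}")    # the unrestricted estimate
print(f"narrow range: r = {r_narrow:.2f}")  # usually noticeably smaller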
