MIT 9.07 - Correlation & Regression

Correlation & Regression, I
9.07, 4/1/2004

Regression and correlation
• Involve bivariate, paired data, X & Y
  – Height & weight measured for the same individual
  – IQ & exam scores for each individual
  – Height of mother paired with height of daughter
• Sometimes more than two variables (W, X, Y, Z, …)

Regression & correlation
• Concerned with the questions:
  – Does a statistical relationship exist between X & Y, which allows some predictability of one of the variables from the other?
  – How strong is the apparent relationship, in the sense of predictive ability?
  – Can a simple linear rule be used to predict one variable from the other, and if so, how good is this rule? E.g., Y = 5X + 6

Regression vs. correlation
• Regression: predicting Y from X (or X from Y) by a linear rule
• Correlation: how good is this relationship?

First tool: scatter plot
• For each pair of points, plot one member of a pair against the corresponding other member of that pair.
• In an experimental study, the convention is to plot the independent variable on the x-axis and the dependent variable on the y-axis.
• Often we are describing the results of observational or "correlational" studies, in which case it doesn't matter which variable is on which axis.

[Figure: scatter plot of height (inches) vs. weight (lbs).]

2nd tool: find the regression line
• We attempt to predict the values of y from the values of x by fitting a straight line to the data.
• The data probably doesn't fit exactly on a straight line:
  – Scatter
  – The relationship between x & y may not be quite linear (or it could be far from linear, in which case this technique isn't appropriate)
• The regression line is like a perfect version of what the linear relationship in the data would look like.

[Figure: the same height vs. weight scatter plot, with the regression line drawn through the points.]

How do we find the regression line that best fits the data?
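As a concrete illustration of a linear prediction rule, here is a minimal sketch using the slide's example rule Y = 5X + 6. The input values are made-up illustration numbers, not course data:

```python
# A linear prediction rule, using the slides' example Y = 5X + 6.
# The x values below are arbitrary illustration values.

def predict(x, b=5, a=6):
    """Predict y from x with the linear rule y = b*x + a."""
    return b * x + a

xs = [1.0, 2.0, 3.0]
preds = [predict(x) for x in xs]
print(preds)  # each prediction is 5*x + 6
```

Regression will give us a principled way to choose the slope b and intercept a from data, rather than assuming them.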
• We don't just sketch in something that looks good.
• First, recall the equation for a line.
• Next, what do we mean by "best fit"?
• Finally, based upon that definition of "best fit," find the equation of the best-fit line.

Straight line
• The general formula for any line is y = bx + a
• b is the slope of the line
• a is the intercept (i.e., the value of y when x = 0)

Least-squares regression: What does "best fit" mean?
• If yi is the true value of y paired with xi, let yi' = our prediction of y from xi.
• We want to minimize the error in our prediction of y over the full range of x.
• We'll do this by minimizing the sum of squared errors, sse = Σ(yi – yi')²
• Express the formula as yi' = a + bxi
• We want to find the values of a and b that give us the least squared error, sse; thus this is called "least-squares" regression.

For fun, we're going to derive the equations for the best-fit a and b
• But first, some preliminary work:
  – Other forms of the variance
  – And the definition of covariance

A different form of the variance
• Recall: var(x) = E[(x – µx)²]
    = E(x² – 2xµx + µx²)
    = E(x²) – 2µx² + µx²
    = E(x²) – µx²
    = Σxi²/N – (Σxi)²/N²
    = (Σxi² – (Σxi)²/N) / N
• You may recognize this equation from the practice midterm (where it may have confused you).
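The two forms of the variance can be checked numerically. A minimal sketch (population form, dividing by N; the data values are arbitrary illustrations):

```python
# Numerical check of the variance identity var(x) = E(x^2) - mu_x^2
# (population form, dividing by N). Data values are arbitrary.

xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
N = len(xs)
mu = sum(xs) / N

# Definition: average squared deviation from the mean
var_def = sum((x - mu) ** 2 for x in xs) / N

# Alternate form: E(x^2) - mu^2 = sum(x^2)/N - (sum(x))^2 / N^2
var_alt = sum(x * x for x in xs) / N - (sum(xs) / N) ** 2

print(var_def, var_alt)  # the two forms agree
```

The same kind of algebraic rearrangement gives the alternate form of the covariance on the next slide.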
• (Divide by N – 1 instead of N for an unbiased estimate.)

The covariance
• We talked briefly about covariance a few lectures ago, when we discussed the variance of the difference of two random variables, when the random variables are not independent:
    var(m1 – m2) = σ1²/n1 + σ2²/n2 – 2 cov(m1, m2)
• The covariance is a measure of how x varies with y (co-variance = "varies with"):
    cov(x, y) = E[(x – µx)(y – µy)]
• Note that var(x) = cov(x, x).
• Using algebra like that from two slides ago, we get an alternate form:
    cov(x, y) = E[(x – µx)(y – µy)]
    = E(xy – xµy – yµx + µxµy)
    = E(xy) – µxµy – µxµy + µxµy
    = E(xy) – µxµy

OK, deriving the equations for a and b
• yi' = a + bxi
• We want the a and b that minimize sse = Σ(yi – yi')² = Σ(yi – a – bxi)²
• Recall from calculus that to minimize this equation, we need to take derivatives and set them to zero.

Derivative with respect to a
    ∂/∂a Σ(yi – a – bxi)² = Σ –2(yi – a – bxi) = 0
    ⇒ Σyi – Na – bΣxi = 0
    ⇒ a = Σyi/N – bΣxi/N
    ⇒ a = my – b·mx
• This is the equation for a; however, it's still in terms of b.

Derivative with respect to b
    ∂/∂b Σ(yi – a – bxi)² = Σ –2xi(yi – a – bxi) = 0
    ⇒ Σ(xiyi – axi – bxi²) = 0
Substituting a = my – b·mx and dividing by N:
    ⇒ (1/N)Σxiyi – mx·my + b(mx² – (1/N)Σxi²) = 0
    ⇒ (1/N)Σxiyi – mx·my = b((1/N)Σxi² – mx²)
    ⇒ b = cov(x, y)/sx²

Least-squares regression equations
• b = cov(x, y)/sx²
• a = my – b·mx
• (PowerPoint doesn't make it easy to create a bar over a letter, so we'll go back to our old notation: mx for the mean of x.)

Alternative notation: ss = "sum of squares"
• Let ssxx = Σ(xi – mx)², ssyy = Σ(yi – my)², ssxy = Σ(xi – mx)(yi – my)
• Then b = ssxy/ssxx

A typical question
• Can we predict the weight of a student if we are given their height?
• We need to create a regression equation relating the outcome variable, weight, to the explanatory variable, height.
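The least-squares equations derived above can be sketched directly in code. The toy data below are chosen so the answer is easy to check by hand (they are not course data); the assertions at the end confirm that the fitted (a, b) really minimize sse, as the calculus says they should:

```python
# Sketch of the least-squares equations derived above:
#   b = ss_xy / ss_xx,  a = m_y - b * m_x
# Toy data, not course data.

def least_squares(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    ss_xx = sum((x - mx) ** 2 for x in xs)
    ss_xy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = ss_xy / ss_xx
    return my - b * mx, b   # (a, b)

def sse(xs, ys, a, b):
    """Sum of squared errors of the predictions a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 4.0, 7.0]
a, b = least_squares(xs, ys)

# The fitted (a, b) should beat any perturbed pair, since
# least squares minimizes sse.
best = sse(xs, ys, a, b)
assert best <= sse(xs, ys, a + 0.1, b)
assert best <= sse(xs, ys, a, b + 0.1)
print(a, b)
```

For this toy data, mx = 1.5, my = 3.75, ssxx = 5, ssxy = 9.5, so b = 1.9 and a = 0.9.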
Example: predicting weight from height
• Start with the obligatory scatterplot.
• First, plot a scatter plot, and see if the relationship seems even remotely linear:

    xi (height, inches): 60  62  64  66  68  70  72  74  76
    yi (weight, lbs):    84  95 140 155 119 175 145 197 150

[Figure: scatter plot of weight (lbs) vs. height (inches). Looks OK, i.e., roughly linear.]

Steps for computing the regression equation
• Compute mx and my
• Compute (xi – mx) and (yi – my)
• Compute (xi – mx)² and (xi – mx)(yi – my)
• Compute ssxx and ssxy
• b = ssxy/ssxx
• a = my – b·mx

Example: predicting weight from height

    xi    yi   (xi–mx)  (yi–my)  (xi–mx)²  (yi–my)²  (xi–mx)(yi–my)
    60    84     –8      –56       64       3136        448
    62    95     –6      –45       36       2025        270
    64   140     –4        0       16          0          0
    66   155     –2       15        4        225        –30
    68   119      0      –21        0        441          0
    70   175      2       35        4       1225         70
    72   145      4        5       16         25         20
    74   197      6       57       36       3249        342
    76   150      8       10       64        100         80

    Sum(xi) = 612, Sum(yi) = 1260
    mx = 68, my = 140
    ssxx = 240, ssyy = 10426, ssxy = 1200

b = ssxy/ssxx = 1200/240 = 5;  a = my – b·mx = 140 – 5(68) = –200
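The hand computation in the table can be verified with a short script, using the same nine (height, weight) pairs:

```python
# Reproduce the worked example: regression of weight (lbs) on height (inches),
# using the nine data pairs from the table above.

heights = [60, 62, 64, 66, 68, 70, 72, 74, 76]
weights = [84, 95, 140, 155, 119, 175, 145, 197, 150]

n = len(heights)
mx = sum(heights) / n            # mean height, 68
my = sum(weights) / n            # mean weight, 140

ss_xx = sum((x - mx) ** 2 for x in heights)                          # 240
ss_xy = sum((x - mx) * (y - my) for x, y in zip(heights, weights))   # 1200

b = ss_xy / ss_xx   # slope: 1200/240 = 5
a = my - b * mx     # intercept: 140 - 5*68 = -200
print(f"weight = {b:.0f} * height + ({a:.0f})")
```

So the fitted rule is weight = 5·height – 200, matching the hand computation.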

