UNC-Chapel Hill STOR 155 - Lecture 9 - Cautions about Regression and Correlation, Causation

This preview shows page 1-2-3-26-27-28 out of 28 pages.


2/10/11 Lecture 9

STOR 155 Introductory Statistics
Lecture 9: Cautions about Regression and Correlation, Causation
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Review
• Least-Squares Regression Lines
• Equation and interpretation of the line
• Prediction using the line
• Correlation and Regression
• Coefficient of Determination

Regression Diagnostics
• Look at residuals (errors):
  – A residual is the difference between an observed value of the response variable and the value predicted by the regression line, i.e., $\text{residual} = y - \hat{y}$.
  – The sum of the least-squares residuals is always zero. Why?

Residual Plots
• A residual plot is a scatterplot of the regression residuals against the explanatory variable.
• Residual plots help us assess the fit of a regression line.

[Figure: Age vs Height]

Residual Plot
• If the regression line catches the overall pattern of the data, there should be no pattern in the residuals.
[Figure: residual plot labeled "totally random"]
[Figure: residual plots labeled "nonlinear" and "nonconstant variation"]

Diabetes Patient: FPG vs HbA
• FPG: fasting plasma glucose.
• HbA: percent of red blood cells that have a glucose molecule attached.
• Both measure blood glucose.
• We expect a positive association.
• 18 subjects, r = 0.4819.
• See the scatterplot on the next page.

[Figure: scatterplot, Diabetes Patient: FPG vs HbA]

Outliers and Influential Observations
• An outlier is a point that lies outside the overall pattern of the other points.
  – Outliers in the y direction have large residuals, but other outliers may not.
• An influential observation is a point whose removal would markedly change the regression line.
  – Outliers in the x direction are often influential points.
  – But not always…

[Figure: Diabetes Patient: FPG vs HbA]

Outliers & Influential Obs.
• Outliers in the y direction can be spotted from the residual plot.
• Influential points are the more serious problem. They can be identified by fitting regression lines with and without those points.
  – They cannot be identified from the residual plot.
  – The scatterplot gives us some hint.

Cautions about correlation and regression
• Linear only
• DO NOT extrapolate
• Not resistant
• Beware lurking variables
• Beware correlations based on averaged data
• The restricted-range problem

Lurking Variable
• A lurking (hidden) variable is a variable that has an important effect on the relationship among the variables in a study, but is not included among the variables being studied.
• Examples:
  – SAT scores and college grades
    • Lurking variable: IQ

Lurking variables can create nonsense correlations.
• For the world's nations, let x be the number of TVs/person and y be the average life expectancy.
• A high positive correlation
  – Nations with more TV sets have higher life expectancies.
  – Could we lengthen the lives of people in Rwanda by shipping them more TVs?
• Lurking variable: wealth of the nation
  – Rich nations: more TV sets.
  – Rich nations: longer life expectancies because of better nutrition, clean water, and better health care.
• There is no cause-and-effect tie between TV sets and length of life.
• Association vs causation.

[Figure: Misleading correlation (two clusters)]

Beware correlations based on averaged data
• A correlation based on averages over many individuals is usually higher than the correlation between the same variables based on data for individuals.
• Age vs Height
• (Basketball) score % vs practice time

The restricted-range problem
• A restricted-range problem occurs when one does not get to observe the full range of the variables.
• When data suffer from restricted range, r and $r^2$ are lower than they would be if the full range could be observed.
• SAT scores vs College GPA
  – Princeton vs Generic State College (Ex 2.26)

Causation vs Association
• Some studies aim to establish causation.
• Examples of causation:
  – Increased drinking of alcohol causes a decrease in coordination.
  – Smoking and Lung Cancer.
• Examples of association:
  – The above two examples.
  – SAT scores and Freshman year GPA.

Association does not imply causation.
• An association between two variables x and y can reflect many types of relationship among x, y, and one or more lurking variables.
• An association between a predictor x and a response y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y.

[Figure: Explaining Association]

Explaining Association: Causation
• Cause-and-effect
• Examples
  – Amount of fertilizer and yield of corn
  – Weight of a car and its MPG
  – Dosage of a drug and the survival rate of the mice

Explaining Association: Common Response
• Lurking variables
• Both x and y change in response to changes in z, the lurking variable.
• There may be no direct causal link between x and y.
• Examples:
  – SAT scores vs College GPA (IQ, Attitude)
  – Monthly flow of money into stock mutual funds vs rate of return for the stock market (Market Condition, Investor Attitude)

Explaining Association: Confounding
• Two variables are confounded when their effects on a response variable are mixed together.
• One explanatory variable may be confounded with other explanatory variables or lurking variables.
• Examples:
  – More education leads to higher income.
    • Family background…
  – Religious people live longer.
    • Life style…

Establishing causation
• The only compelling method: Designed experiment (More in Chapter 3)
• Hot disputes:
  – Does gun control reduce violent crime?
  – Does meat consumption in your diet cause heart disease?
  – Does smoking cause lung cancer?

Does smoking CAUSE lung cancer?
• causation: smoking causes lung cancer.
• common response: people who have a genetic predisposition to lung cancer also have a genetic predisposition to smoking.
• confounding: people who drink too much, don't exercise, eat unhealthy foods, etc. are more likely to get lung
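
The two diagnostics from the Regression Diagnostics and Outliers slides can be sketched in a few lines of Python. This is a minimal illustration on made-up data, not the lecture's diabetes data: the helpers `fit_line` and `residuals` and the added point (20, 10) are hypothetical. It also answers the "Why?" on the residuals slide: the least-squares condition for the intercept a, obtained by setting the derivative of Σ(y − a − bx)² with respect to a to zero, is exactly Σ(y − a − bx) = 0, so the residuals must sum to zero.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys, intercept, slope):
    """Observed y minus the y predicted by the line, for every point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data lying close to a line with slope 2.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.0]

a, b = fit_line(xs, ys)
res = residuals(xs, ys, a, b)
print("slope without the extra point:", round(b, 3))
print("sum of residuals:", sum(res))  # zero up to floating-point rounding

# Add one hypothetical outlier in the x direction and refit.
xs_out = xs + [20]
ys_out = ys + [10.0]
a_out, b_out = fit_line(xs_out, ys_out)
res_out = residuals(xs_out, ys_out, a_out, b_out)
print("slope with the extra point:", round(b_out, 3))
print("sum of residuals:", sum(res_out))  # still zero up to rounding
print("residual of the extra point:", round(res_out[-1], 3))
```

On this data the slope drops sharply once the extra point is included, yet that point's residual is not the largest in the second fit. That is consistent with the slide's remark that influential points are found by refitting with and without them, not from the residual plot.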

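
The restricted-range and averaged-data cautions can also be checked numerically. The sketch below uses a hypothetical, deterministic data set (y equals x plus a repeating scatter pattern; none of it comes from the lecture) and a hand-written `pearson_r` helper. It compares r on the full data, on a narrow slice of x, and on group averages.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


# Hypothetical individuals: x = 0..99, y = x plus scatter cycling over -20..20.
xs = list(range(100))
ys = [x + 4 * ((4 * x) % 11 - 5) for x in xs]

r_full = pearson_r(xs, ys)

# Restricted range: observe only the individuals with x >= 80.
kept = [(x, y) for x, y in zip(xs, ys) if x >= 80]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

# Averaged data: replace each run of 10 individuals by its mean x and mean y.
group_x = [sum(xs[i:i + 10]) / 10 for i in range(0, 100, 10)]
group_y = [sum(ys[i:i + 10]) / 10 for i in range(0, 100, 10)]
r_averaged = pearson_r(group_x, group_y)

print("r, full range:      ", round(r_full, 3))
print("r, restricted range:", round(r_restricted, 3))
print("r, group averages:  ", round(r_averaged, 3))
```

With this data the restricted slice gives a noticeably lower r than the full range, and the group averages give a higher one, because averaging removes most of the individual scatter while the spread in x stays the same.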
