Chapter 15
Describing Relationships: Regression, Prediction, and Causation

Thought Question 2

From past natural disasters, a strong positive correlation has been found between the amount of aid sent and the number of deaths. Would you interpret this to mean that sending more aid causes more people to die? Explain.

Thought Question 3

Studies have shown a negative correlation between the amount of food consumed that is rich in beta carotene and the incidence of lung cancer in adults. Does this correlation provide evidence that beta carotene is a contributing factor in the prevention of lung cancer? Explain.

Linear Regression

Objective: To quantify the linear relationship between an explanatory variable and a response variable.
We can then predict the average response for all subjects with a given value of the explanatory variable.

Regression equation: y = a + bx
  – x is the value of the explanatory variable
  – y is the average value of the response variable
  – note that a and b are just the intercept and slope of a straight line
  – note that r and b are not the same thing, but their signs will agree

Least Squares Regression

Used to determine the "best" line
We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict)
Least Squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line

Prediction via Regression Line

Hand et al., A Handbook of Small Data Sets, London: Chapman and Hall
The regression equation is y = 3.6 + 0.97x
  – y is the average age of all husbands who have wives of age x
For all women aged 30, we predict the average husband age to be 32.7 years:
  3.6 + (0.97)(30) = 32.7 years
Suppose we know that an individual wife's age is 30. What would we predict her husband's age to be? How old is her husband?
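The least squares idea and the husband-age prediction above can be sketched in a few lines of Python. This is a minimal illustration, not the method used to produce the slides: the function names `least_squares_line` and `predict` are hypothetical, and the four demo points are synthetic values chosen to lie exactly on y = 1 + 2x (they are not the husband and wife data, which the slides do not list). Only the coefficients 3.6 and 0.97 come from the slide.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimizes the sum of
    squared vertical distances between the points and the line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept: the line passes through the means
    return a, b


def predict(a, b, x):
    """Predicted average response at explanatory value x."""
    return a + b * x


if __name__ == "__main__":
    # Synthetic points (an assumption for illustration only).
    a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(a, b)

    # Coefficients taken from the slide: y = 3.6 + 0.97x, wife's age x = 30.
    print(round(predict(3.6, 0.97, 30), 1))
```

The second print reproduces the slide's arithmetic, 3.6 + (0.97)(30) = 32.7. That number is the predicted average age over all husbands whose wives are 30; an individual husband's actual age will generally differ from it.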
Husband and Wife: Ages

Coefficient of Determination (R^2)

Measures usefulness of regression prediction
R^2 (or r^2, the square of the correlation): measures the percentage of the variation in the values of the response variable (y) that is explained by the regression line
r = 1: R^2 = 1: regression line explains all (100%) of the variation in y
r = 0.7: R^2 = 0.49: regression line explains almost half (49%) of the variation in y

Income versus Assets

Income = a + b(Assets)
Assets vary from 3.4 billion to 49 billion
Income varies from bank to bank, even among those with similar assets
Statistical relationship

A Caution: Beware of Extrapolation

Sarah's height was plotted against her age
Can you predict her height at age 42 months?
Can you predict her height at age 30 years (360 months)?

Regression line: y = 71.95 + 0.383x
Height at age 42 months? y = 88 cm
Height at age 30 years? y = 209.8 cm
  – She is predicted to be about 6' 10.5" at age 30.
[Plot: height (cm) versus age (months)]

Correlation Does Not Imply Causation

Even very strong correlations may not correspond to a real causal relationship.

Evidence of Causation

A properly conducted experiment establishes the connection
Other considerations:
  – A reasonable explanation for a cause and effect exists
  – The connection happens in repeated trials
  – The connection happens under varying conditions
  – Potential confounding factors are ruled out
  – Alleged cause precedes the effect in time

Reasons Two Variables May Be Related (Correlated)

Explanatory variable causes change in response variable
Response variable causes change in explanatory variable
Explanatory variable may be a cause, but is not the sole cause, of changes in the response variable
Confounding variables may exist
Both variables may result from a common cause
  – such as both variables changing over time
The correlation may be merely a coincidence
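Returning briefly to the regression slides earlier in this chapter, the coefficient of determination and the extrapolation caution can both be checked with simple arithmetic. The sketch below is illustrative only. The line y = 71.95 + 0.383x and the value r = 0.7 come from the slides; the function names are hypothetical, and the observed age range passed to `within_observed_range` (36 to 60 months) is a placeholder assumption, since the slides do not state the range of Sarah's data.

```python
def r_squared(r):
    """Coefficient of determination for simple linear regression:
    the square of the correlation r."""
    return r ** 2


def predicted_height_cm(age_months):
    """Regression line from the slide: y = 71.95 + 0.383x (x in months)."""
    return 71.95 + 0.383 * age_months


def cm_to_feet_inches(cm):
    """Convert centimeters to (whole feet, remaining inches)."""
    total_inches = cm / 2.54
    feet = int(total_inches // 12)
    return feet, total_inches - 12 * feet


def within_observed_range(x, x_min, x_max):
    """True when x lies inside the range of explanatory values that were
    actually observed; predictions outside it are extrapolations."""
    return x_min <= x <= x_max


if __name__ == "__main__":
    print(round(r_squared(0.7), 2))

    # ASSUMPTION: placeholder range; the slides do not give the real one.
    observed_min, observed_max = 36, 60

    for age in (42, 360):
        height = predicted_height_cm(age)
        feet, inches = cm_to_feet_inches(height)
        label = "ok" if within_observed_range(age, observed_min, observed_max) else "extrapolation"
        print(age, round(height, 1), feet, round(inches, 1), label)
```

The same equation yields a plausible 88 cm at 42 months and an implausible 209.8 cm at 360 months. The arithmetic is identical in both cases; only the distance from the observed data differs, which is why a range check of this kind is a reasonable guard before trusting a prediction.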

Explanatory causes Response

Explanatory: pollen count from grasses
Response: percentage of people suffering from allergy symptoms

Explanatory: amount of food eaten
Response: hunger level

Response causes Explanatory

Explanatory: Hotel advertising dollars
Response: Occupancy rate
Positive correlation? Does more advertising lead to an increased occupancy rate?
Actual correlation is negative: lower occupancy leads to more advertising

Explanatory is not Sole Contributor

Explanatory: Consumption of barbecued foods
Response: Incidence of stomach cancer
Barbecued foods are known to contain carcinogens, but other lifestyle choices may also contribute

Confounding Variables

Explanatory: Meditation
Response: Aging (measurable aging factor)
General concern for one's well-being may be confounded with the decision to try meditation
Meditation vs. Aging

Common Response (both variables change due to common cause)

Explanatory: Divorce among men
Response: Percent abusing alcohol
Both may result from an unhappy marriage.

Both Variables are Changing Over Time

Both divorces and suicides have increased dramatically since 1900.
Are divorces causing suicides?
Are suicides causing divorces?
The population has increased dramatically since 1900 (causing both to increase).
Better to investigate: Has the rate of divorce or the rate of suicide changed over time?

The Relationship May Be Just a Coincidence

We will see some strong correlations (or apparent associations) just by chance, even when the variables are not related in the population

A required whooping cough