5/18/11 Lecture 7

STOR 155 Introductory Statistics
Lecture 7: Scatterplots, Correlation
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Relationships between Quantitative Variables
• Chapter 1 talks about the distribution of one variable.
• Often we are interested in the relationship between two quantitative variables.
• Examples:
  – Heights of parents and children
  – High school GPA and college GPA
  – Stock returns of two different corporations

Chapter 2: Looking at Data – Relationships
• Association (statistical dependence)
• Response and explanatory variables
• Scatterplots
• Correlation

Association between Variables
• Two variables are associated (dependent) if some values of one variable tend to occur more often along with some values of the second variable.

Positive vs Negative Association

Examples
• Weight (in kilograms) and height (in centimeters)
• An insurance company reports that heavier cars have fewer fatal accidents per 10,000 vehicles than lighter cars do.
• A medical study finds that short women are more likely to have heart attacks than women of average height, while tall women have even fewer heart attacks.
Note: no explanation yet, just findings …

Response and Explanatory Variables
• A response variable (or dependent variable) measures an outcome of a study.
• An explanatory variable (or independent variable) explains or causes changes in the response variable.
• In most cases, we set values of one variable to see how it affects another variable.
  – biological, chemical experiments
• Not always a causal relation!
  – SAT scores vs college grades

Scatterplots
• Two-dimensional plot, with one variable’s values plotted along the vertical axis and the other along the horizontal axis.
  – X-axis: explanatory
  – Y-axis: response
• Display the general relationship between two quantitative variables graphically.
• Two variables measured on the same “individuals”.

Example
• A statistician wanted to purchase a house in a neighborhood. He decided to develop a model to predict the selling price of a house.
• He took a random sample of 100 houses that recently sold and recorded the selling price, the number of bedrooms, and the size (in square feet) for each.

[Figure: Bivariate Fit of Price By H Size. Scatterplot of Price (80,000 to 240,000) against House Size (1,100 to 2,400).]

Examining a Scatterplot
• Overall Pattern
  – Direction: positive or negative association
  – Form: linear or nonlinear (e.g., curved or clustered)
  – Strength: strong if there is very little deviation from the trend
• Deviation
  – Deviation in form or direction
  – Outliers

Typical Patterns of Scatterplots
[Figure panels: No relationship; Negative nonlinear relationship; "This is a weak linear relationship. A nonlinear relationship seems to fit the data better."; Nonlinear (concave) relationship; Positive linear relationship; Negative linear relationship.]

Categorical variable in a scatterplot

[Figure: Bivariate Fit of Price By H Size. Scatterplot of Price (80,000 to 240,000) against House Size (1,100 to 2,400).]

Relationship between Categorical and Quantitative Variables
• Back-to-back stemplots: two categories
• Side-by-side boxplots: any number of categories

Review: Scatterplot
• The plot shows the relationship between two quantitative variables.
• It plots observations of different individuals in a two-dimensional graph.
• Each point in a scatterplot corresponds to the values of two variables for the same individual.
• Just graphical, not numerical.

Which one shows a stronger relationship?

Correlation
• A quantity used to measure the direction and strength of the linear relationship between two quantitative variables.
• Often written as r

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Example
• A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
• A random sample of 100 cars is selected, and the data are summarized as follows.

$$n = 100, \quad \bar{x} = 36{,}009.5, \quad \bar{y} = 5{,}411.4$$
$$s_x = 6{,}597.6, \quad s_y = 254.9, \quad \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = -1{,}356{,}256$$

• Find the correlation.
• r = -0.806

Properties of r
$$-1 \le r \le 1$$
• Positive r indicates positive linear association.
• Negative r indicates negative linear association.
• The closer r moves toward 1 or -1, the stronger the linear association is.
• r = -1 or 1 occurs only when the points in the scatterplot lie exactly along a straight line.

Properties of r
• It measures only the linear relationship between two quantitative variables.
• Invariant to the order of the variables
• Invariant to rescaling (why?)
• Unit-free
• Sensitive to outliers
• Question: True or False? r = 0 means there is no association between two variables.

Different correlations

Blunders about Correlation (why?)
• There is a high correlation between the gender of American workers and their income.
• We found a high correlation (r = 1.09) between students’ ratings of faculty teaching and ratings made by other faculty members.
• The correlation between planting rate and yield of corn was found to be r = 0.23 bushel.

Take Home Message
• Association (dependence)
• Scatterplot
• Correlation
• Properties of
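
The correlation formula from the Correlation slide can be checked numerically. The sketch below is a minimal illustration, not part of the lecture: the function names `correlation` and `correlation_from_summary` are made up for this example, and the small odometer/price sample is invented. The summary figures for the used-car example are read as s_x = 6,597.6, s_y = 254.9 and (1/(n-1)) Σ(x_i - x̄)(y_i - ȳ) = -1,356,256, where the negative sign is assumed because the slide reports r = -0.806.

```python
import math


def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of products of standardized values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


def correlation_from_summary(cov_xy, s_x, s_y):
    """Same r, written as the sample covariance divided by s_x * s_y."""
    return cov_xy / (s_x * s_y)


# Used-car example, from the summary statistics on the slide
# (sign of the covariance term assumed negative, see the note above).
r_cars = correlation_from_summary(-1_356_256, 6_597.6, 254.9)
print(round(r_cars, 3))  # -0.806

# Invented raw data (not from the lecture): odometer readings and prices.
odometer = [20_000, 30_000, 35_000, 40_000, 50_000]
price = [5_900, 5_600, 5_450, 5_300, 5_000]
r_sample = correlation(odometer, price)
print(r_sample)  # negative: higher odometer readings go with lower prices
```

The two functions compute the same quantity: dividing each deviation by its standard deviation before multiplying is algebraically the same as dividing the covariance term by s_x s_y.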
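
The properties of r listed in the lecture can also be explored with a short experiment. The sketch below uses invented height and weight data and a hypothetical helper named `correlation`; it is only meant to illustrate the bullet points. Rescaling by a positive factor (centimeters to inches, kilograms to pounds) leaves r unchanged because r is built from standardized values, which do not depend on units. The last part addresses the True or False question: a perfectly curved relationship can still have r close to 0.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation from standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Invented data, not from the lecture.
heights_cm = [150.0, 160.0, 165.0, 172.0, 180.0, 191.0]
weights_kg = [52.0, 58.0, 63.0, 70.0, 77.0, 90.0]

r = correlation(heights_cm, weights_kg)

# Invariant to the order of the variables.
r_swapped = correlation(weights_kg, heights_cm)

# Invariant to rescaling and unit-free: change cm to inches, kg to pounds.
heights_in = [h / 2.54 for h in heights_cm]
weights_lb = [w * 2.20462 for w in weights_kg]
r_rescaled = correlation(heights_in, weights_lb)

# Sensitive to outliers: one unusual individual changes r noticeably.
r_outlier = correlation(heights_cm + [155.0], weights_kg + [120.0])

# r near 0 does not mean "no association": y is fully determined by x here,
# but the relationship is curved, not linear.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]
r_curve = correlation(xs, ys)

print(r, r_swapped, r_rescaled, r_outlier, r_curve)
```

Multiplying one variable by a negative constant would flip the sign of r while keeping its magnitude, which is why the invariance holds for positive rescaling.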