UNC-Chapel Hill STOR 155 - Lecture 7 - Scatterplots, Correlation

This preview shows page 1-2-24-25 out of 25 pages.


5/18/11 Lecture 7

STOR 155 Introductory Statistics
Lecture 7: Scatterplots, Correlation
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Relationships between Quantitative Variables
• Chapter 1 covers the distribution of a single variable.
• Often we are interested in the relationship between two quantitative variables.
• Examples:
  – Heights of parents and children
  – High school GPA and college GPA
  – Stock returns of two different corporations

Chapter 2: Looking at Data – Relationships
• Association (statistical dependence)
• Response and explanatory variables
• Scatterplots
• Correlation

Association between Variables
• Two variables are associated (dependent) if some values of one variable tend to occur more often with some values of the second variable.

Positive vs Negative Association

Examples
• Weight (in kilograms) and height (in centimeters)
• An insurance company reports that heavier cars have fewer fatal accidents per 10,000 vehicles than lighter cars do.
• A medical study finds that short women are more likely to have heart attacks than women of average height, while tall women have even fewer heart attacks.
Note: no explanation yet, just findings …

Response and Explanatory Variables
• A response variable (or dependent variable) measures an outcome of a study.
• An explanatory variable (or independent variable) explains or causes changes in the response variable.
• In most cases, we set the values of one variable to see how it affects another variable.
  – Biological, chemical experiments
• Not always a causal relation!
  – SAT scores vs college grades

Scatterplots
• Two-dimensional plot, with one variable's values plotted along the vertical axis and the other's along the horizontal axis.
  – X-axis: explanatory
  – Y-axis: response
• Displays the general relationship between two quantitative variables graphically.
• Two variables measured on the same "individuals".

Example
• A statistician wanted to purchase a house in a neighborhood. He decided to develop a model to predict the selling price of a house.
• He took a random sample of 100 houses that had recently sold and recorded the selling price, the number of bedrooms, and the size (in square feet) of each.

[Figure: Bivariate Fit of Price By H Size. Scatterplot of Price (80,000 to 240,000) against House Size (1,100 to 2,400).]

Examining a Scatterplot
• Overall pattern
  – Direction: positive or negative association
  – Form: linear or nonlinear (e.g., curved or clustered)
  – Strength: strong if there is very little deviation from the trend
• Deviation
  – Deviation in form or direction
  – Outliers

Typical Patterns of Scatterplots
[Figure panels:]
• No relationship
• Negative nonlinear relationship
• This is a weak linear relationship. A nonlinear relationship seems to fit the data better.
• Nonlinear (concave) relationship
• Positive linear relationship
• Negative linear relationship

Categorical variable in a scatterplot
[Figure: Bivariate Fit of Price By H Size. Scatterplot of Price (80,000 to 240,000) against House Size (1,100 to 2,400).]

Relationship between Categorical and Quantitative Variables
• Back-to-back stemplots: two categories
• Side-by-side boxplots: any number of categories

Review: Scatterplot
• The plot shows the relationship between two quantitative variables.
• It plots observations of different individuals in a two-dimensional graph.
• Each point in a scatterplot corresponds to the values of two variables for the same individual.
• Just graphical, not numerical.

Which one shows a stronger relationship?

Correlation
• A quantity used to measure the direction and strength of the linear relationship between two quantitative variables.
• Often written as r:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Example
• A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
• A random sample of 100 cars is selected, and the data are summarized as follows:

$$n = 100, \quad \bar{x} = 36{,}009.5, \quad \bar{y} = 5{,}411.4, \quad s_x = 6{,}597.6, \quad s_y = 254.9, \quad \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = -1{,}356{,}256$$

• Find the correlation.
• r = -1,356,256 / (6,597.6 × 254.9) ≈ -0.806

Properties of r
• $-1 \le r \le 1$
• Positive r indicates positive linear association.
• Negative r indicates negative linear association.
• The closer r is to 1 or -1, the stronger the linear association.
• r = -1 or r = 1 occurs only when the points in the scatterplot lie exactly along a straight line.

Properties of r
• It measures only the linear relationship between two quantitative variables.
• Invariant to the order of the variables
• Invariant to rescaling (why?)
• Unit-free
• Sensitive to outliers
• Question: True or False? r = 0 means there is no association between two variables.

Different correlations

Blunders about Correlation (why?)
• There is a high correlation between the gender of American workers and their income.
• We found a high correlation (r = 1.09) between students' ratings of faculty teaching and ratings made by other faculty members.
• The correlation between planting rate and yield of corn was found to be r = 0.23 bushel.

Take Home Message
• Association (dependence)
• Scatterplot
• Correlation
• Properties of
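
The formula for r and several of the properties listed in the slides can be checked numerically. The following is a minimal Python sketch, not part of the original lecture: the function name `correlation` and the small odometer/price data set are made up for illustration, and only the summary statistics in the last step are taken from the used-car example (as read from the slide).

```python
import math


def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of products of standardized values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1), matching s_x and s_y in the formula.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Hypothetical data, NOT from the lecture: odometer (thousand miles) and price (dollars).
odometer = [20.0, 30.0, 35.0, 40.0, 50.0]
price = [5900.0, 5600.0, 5400.0, 5300.0, 5000.0]

r = correlation(odometer, price)

# Invariant to the order of the variables.
r_swapped = correlation(price, odometer)

# Invariant to rescaling and unit-free: miles -> kilometers, dollars -> thousands of dollars.
r_rescaled = correlation([1.609 * x for x in odometer], [p / 1000 for p in price])

# r = 0 does not mean "no association": y = x^2 is a perfect nonlinear relationship.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
r_parabola = correlation(xs, [x ** 2 for x in xs])

# Used-car example from summary statistics: r = covariance / (s_x * s_y).
r_slide = -1356256 / (6597.6 * 254.9)

print(f"r = {r:.4f}, swapped = {r_swapped:.4f}, rescaled = {r_rescaled:.4f}")
print(f"parabola r = {r_parabola:.4f}, used-car r = {r_slide:.3f}")
```

Standardizing each variable before multiplying is what makes r unit-free: changing units multiplies both the deviations and the standard deviation by the same positive factor, so the ratio is unchanged.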

