The University of North Carolina at Chapel Hill
STAT 155 Introductory Statistics
Lecture 7: Scatterplots, Correlation
9/26/06

Relationships between Quantitative Variables
- Chapter 1 talks about the distribution of one variable.
- Often we are interested in the relationships between two quantitative variables.
- Examples:
  - Heights of parents and children
  - High school GPA and college GPA
  - Stock returns of two different corporations

Chapter 2: Looking at Data: Relationships
- Association (statistical dependence)
- Response and explanatory variables
- Scatterplots
- Correlation

Association between Variables
Two variables are associated (dependent) if some values of one variable tend to occur more often along with some values of the second variable.

Positive vs. Negative Association

Examples
- Weight in kilograms and height in centimeters
- An insurance company reports that heavier cars have fewer fatal accidents per 10,000 vehicles than lighter cars do.
- A medical study finds that short women are more likely to have heart attacks than women of average height, while tall women have even fewer heart attacks.
Note: no explanation yet, just findings.

Response and Explanatory Variables
- A response variable (or dependent variable) measures an outcome of a study.
- An explanatory variable (or independent variable) explains or causes changes in the response variable.
- In most cases, we set values of one variable to see how it affects another variable (e.g., biological and chemical experiments).
- The relation is not always causal (e.g., SAT scores vs. college grades).

Scatterplots
- A two-dimensional plot with one variable's values plotted along the vertical axis and the other variable's values along the horizontal axis.
- x-axis: explanatory variable; y-axis: response variable.
- Displays the general relationship between two quantitative variables graphically.
- The two variables are measured on the same individuals.

Example
A statistician wanted to purchase a house in a
neighborhood. He decided to develop a model to predict the selling price of a house. He took a random sample of 100 houses that recently sold and recorded the selling price, the number of bedrooms, and the size (in square feet) of each.

[Figure: Bivariate Fit of Price By H Size. Scatterplot of Price (vertical axis) against House Size (horizontal axis).]

Examining a Scatterplot
- Overall pattern:
  - Direction: positive or negative association
  - Form: linear or nonlinear (e.g., curved or clustered)
  - Strength: strong if there is very little deviation from the trend
- Deviations:
  - Deviation in form or direction
  - Outliers

Typical Patterns of Scatterplots
[Figure panels: positive linear relationship; no relationship; negative nonlinear relationship; negative linear relationship; nonlinear (concave) relationship. Annotation on one panel: "This is a weak linear relationship. A nonlinear relationship seems to fit the data better."]

Categorical Variable in a Scatterplot
[Figure: Bivariate Fit of Price By H Size, same axes as the house-price scatterplot above.]

Relationship between Categorical and Quantitative Variables
- Back-to-back stemplots (two categories)
- Side-by-side boxplots (any number of categories)

Review: Scatterplot
- The plot shows the relationship between two quantitative variables.
- It plots observations of different individuals in a two-dimensional graph.
- Each point in a scatterplot corresponds to the values of two variables for the same individual.
- It is just graphical, not numerical.

Which one shows a stronger relationship?

Correlation
A quantity used to measure the direction and strength of the linear relationship between two quantitative variables, often written as r:
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
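The definition of r above translates almost line for line into code. The following is a minimal Python sketch, assuming sample standard deviations (divisor n - 1, as in the formula); the function name `correlation` and the data are hypothetical, chosen only for illustration:

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson correlation r = (1/(n-1)) * sum of z_x * z_y.

    Uses sample standard deviations (divisor n - 1), matching the formula above.
    """
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    # Standardize each value, multiply the paired z-scores, then divide the sum by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data: each y is exactly 2 * x, so the points lie on a straight line.
print(correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # approximately 1.0
```

On Python 3.10 and later, the standard library's `statistics.correlation(xs, ys)` computes the same Pearson coefficient and can serve as a cross-check.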
A car dealer wants to find the relationship between the odometer reading (x) and the selling price (y) of used cars. A random sample of 100 cars is selected, and the data are summarized as follows:
$n = 100$, $\bar{x} = 36{,}009.5$, $\bar{y} = 5{,}411.4$, $s_x = 6{,}597.6$, $s_y = 254.9$, and $\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = -1{,}356{,}256$.
Find the correlation r.
Answer: dividing by $s_x s_y$ gives $r = \frac{-1{,}356{,}256}{(6{,}597.6)(254.9)} \approx -0.806$.

Properties of r
- $-1 \le r \le 1$.
- Positive r indicates positive linear association.
- Negative r indicates negative linear association.
- The closer r moves toward $-1$ or $1$, the stronger the linear association.
- $r = -1$ or $r = 1$ occurs only when the points in the scatterplot lie exactly along a straight line.

Properties of r (continued)
- It measures only the linear relationship between two variables.
- It is invariant to the order of the variables.
- It is invariant to rescaling (why?).
- It is unit-free.
- It is sensitive to outliers.
Question: True or false? $r = 0$ means there is no association between two variables.

Different Correlations

Blunders about Correlation (why?)
- "There is a high correlation between the gender of American workers and their income."
- "We found a high correlation (r = 1.09) between students' ratings of faculty teaching and ratings made by other faculty members."
- "The correlation between planting rate and yield of corn was found to be r = 0.23 bushel."

Take-Home Message
- Association (dependence)
- Scatterplot
- Correlation
- Properties of correlation
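The used-car calculation and the properties of r listed above can be checked numerically. The following is a minimal Python sketch, assuming the same n - 1 definition of r as the lecture. Only the used-car summary values come from the slides; the helper name `correlation` and every other data set are hypothetical, chosen only to illustrate one property each:

```python
import math
from statistics import mean, stdev


def correlation(xs, ys):
    """Pearson r with sample standard deviations (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * stdev(xs) * stdev(ys))


# 1. Used-car example: r = [(1/(n-1)) * sum of cross-products] / (s_x * s_y).
r_cars = -1_356_256 / (6597.6 * 254.9)
print(round(r_cars, 3))  # -0.806

# Hypothetical data for the remaining checks (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
r = correlation(x, y)

# 2. -1 <= r <= 1, and swapping the two variables does not change r.
print(-1 <= r <= 1, math.isclose(correlation(y, x), r))  # True True

# 3. A change of units (positive multiplier plus a shift) leaves r unchanged;
#    a negative multiplier would flip the sign of r.
x_new_units = [2.54 * value + 10 for value in x]
print(math.isclose(correlation(x_new_units, y), r))  # True

# 4. r measures only linear association: here v is completely determined by u,
#    yet r is exactly 0 because the relationship is curved and symmetric.
u = [-2, -1, 0, 1, 2]
v = [t * t for t in u]
print(correlation(u, v))  # 0.0

# 5. One outlier can change r drastically.
a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 5]
print(round(correlation(a, b), 2))  # 1.0
print(round(correlation(a + [20], b + [0]), 2))  # -0.49
```

In the last check, adding the single point (20, 0) to five points on a perfect line turns r from 1 into a negative value, which is why a scatterplot should be examined alongside r.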