Correlation and Linear Regression

Relationships between variables
- Scatterplots
- Explanatory and response variables
- Interpreting scatterplots
- Outliers
- Categorical variables in scatterplots

Examining Relationships

Most statistical studies involve more than one variable.

Questions:
- What individuals do the data describe?
- What variables are present, and how are they measured?
- Are all of the variables quantitative?
- Do some of the variables explain or even cause changes in other variables?

Here we have two quantitative variables for each of 16 students:
1. How many beers they drank, and
2. Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Student:        1 2 3 6 7 9 4 5 8 11 13 10 12 14 15 16
Beers:          5 2 9 7 3 3 4 5 8 3 5 5 6 7 1 4
Blood alcohol:  0.1 0.03 0.19 0.095 0.07 0.02 0.07 0.085 0.12 0.04 0.06 0.05 0.1 0.09 0.01 0.05

Looking at relationships

- Start with a graph.
- Look for an overall pattern and deviations from the pattern.
- Use numerical descriptions of the data and overall pattern (if appropriate).

Scatterplots

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

[Figure: scatterplot of the Student / Beers / BAC data listed above.]

Explanatory and response variables

A response variable measures or records an outcome of a study. An explanatory variable explains changes in the response variable.

Typically, the explanatory (or independent) variable is plotted on the x axis, and the response (or dependent) variable is plotted on the y axis.

[Figure: Blood alcohol as a function of number of beers. x axis: number of beers (explanatory, independent variable). y axis: blood alcohol level in mg/ml (response, dependent variable).]

Some plots don't have clear explanatory and response variables. Does percent return on Treasury bills explain percent return on common stocks?

Interpreting scatterplots

After plotting two variables on a scatterplot, we describe the relationship by examining the form, direction, and strength of the association. We look for an overall pattern:
- Form: linear, curved, clusters, no pattern
- Direction: positive, negative, no direction
- Strength: how closely the points fit the form

and for deviations from that pattern:
- Outliers

Form and direction of an association

[Figure panels: Linear, No relationship, Nonlinear.]

Positive association: high values of one variable tend to occur together with high values of the other variable.

Negative association: high values of one variable tend to occur together with low values of the other variable.

No relationship: X and Y vary independently. Knowing X tells you nothing about Y.

Strength of the association

The strength of the relationship between the two variables can be seen by how much variation, or scatter, there is around the main form.

With a strong relationship, you can get a pretty good estimate of y if you know x. With a weak relationship, for any x you might get a wide range of y values.

This is a weak relationship: for a particular state median household income, you can't predict the state per capita income very well.

This is a very strong relationship: the daily amount of gas consumed can be predicted quite accurately for a given temperature value.

Association

Explanatory variable    Response variable    Association
Increases               Increases            Positive
Decreases               Decreases            Positive
Increases               Decreases            Negative
Decreases               Increases            Negative

Outliers

An outlier is a data value that has a very low probability of occurrence (i.e., it is unusual or unexpected). In a scatterplot, outliers are points that fall outside of the overall pattern of the relationship.

Categorical variables in scatterplots

Often, things are not simple and one-dimensional. We need to group the data into categories to reveal trends. What may look like a negative relationship is in fact a series of positive linear associations.

Categorical explanatory variables

When the explanatory variable is categorical, you cannot make a scatterplot, but you can compare the different categories side by side on the same graph (boxplots, or mean and standard deviation).

[Figure: comparison of income (quantitative response variable) for different education levels (five categories).]

But be careful in your interpretation: this is NOT a positive association, because education is not quantitative.

Looking at Data: Relationships. Correlation
2009 W. H. Freeman and Company

Example: Home Depot Sales and Leading Economic Indicators (LEI)

What do you observe?

[Figure: Home Depot monthly sales (billions) vs. LEI, from 1995 to 2004.]

Standardizing (review)

How many standard deviations is a value from the mean?

$z = \frac{\text{value} - \text{mean}}{\text{st. dev.}}$

This allows us to compare values with different units, so let's standardize both x and y:

$z_x = \frac{x - \bar{x}}{s_x}, \qquad z_y = \frac{y - \bar{y}}{s_y}$

Scatterplot of standardized values

What do the points in each quadrant tell you?

[Figure: scatterplot of standardized sales ($z_{sales}$) and LEI ($z_{LEI}$).]

Cross products of z-scores

[Figure: scatterplot of standardized sales and LEI, with each quadrant marked by the sign of the cross product $z_x z_y$.]

Sum of cross products

- The cross products of the z-scores showing a positive relationship will be positive, and those showing a negative relationship will be negative.
- The more positive cross products we have, the greater their sum will be.

$r = \frac{\sum z_x z_y}{n - 1}$

We call this ratio the correlation coefficient.

The correlation coefficient r

The correlation coefficient is a measure of the direction and strength of a linear relationship. It is calculated using the mean and the standard deviation of both the x and y variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Correlation properties

The correlation coefficient r:
- r does not distinguish between x and y.
- r has no units of measurement.
- r ranges from -1 to +1.
- Correlation of zero means no linear relationship.
- Correlation is not affected by changes in the center or scale of either variable.
- Correlation is sensitive to unusual observations.

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to …
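The standardizing and cross-product steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the beers and blood alcohol values are taken from the preview text and paired by position, which is an assumption because the extraction did not preserve the table rows. The function names and the unit changes used to check the properties are hypothetical.

```python
import statistics

# Values as they appear in the preview text. Pairing them by position is an
# assumption; the flattened table does not guarantee the original rows.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample st. dev. (divides by n - 1)
    return [(v - mean) / sd for v in values]


def correlation(x, y):
    """r = sum of the z-score cross products, divided by n - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


r = correlation(beers, bac)

# Property checks (hypothetical transformations, for illustration only):
r_swapped = correlation(bac, beers)              # exchange x and y
r_rescaled = correlation([12 * b for b in beers],        # change of scale
                         [1000 * v + 5 for v in bac])    # scale and center

print(f"r = {r:.3f}")
print(f"swapped = {r_swapped:.3f}, rescaled = {r_rescaled:.3f}")
```

Under the positional pairing assumed here, r comes out at roughly 0.89, a strong positive linear association. The swapped and rescaled values match r, which illustrates that r does not distinguish between x and y and is unaffected by changes in center or scale.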


UMD BMGT 230 - Correlation and Linear regression
