Chapter 7
Correlation

© 2011 Kris H. Green and W. Allen Emerson

So far in this book, we have limited ourselves to looking at only one variable at a time, trying to learn as much as possible about that single variable. However, most of our data is made up of many variables, all interacting and having effects on each other. In this chapter you will explore relationships between two variables using graphical methods (scatterplots), computational methods (correlation), and algebraic methods (equations of functions).

• As a result of this chapter, students will learn
  √ How to read and interpret a scatterplot
  √ How correlation describes the relationship between two variables
  √ The meanings of "positive" and "negative" relationships between two variables
  √ About the slope and y-intercept of straight lines and how to compute these

• As a result of this chapter, students will be able to
  √ Identify variables with a positive or negative relationship using the correlation coefficient
  √ Construct a correlation table using StatPro to determine which variable relationships are most influential
  √ Estimate the correlation coefficient of two variables based on a scatterplot
  √ Set up a scatterplot according to conventions about axes, etc.
  √ Add trendlines to a scatterplot

7.1 Picturing and Quantifying the Relationship Between Two Variables

In many of the previous examples in this book you have probably been tempted to go too far in your conclusions. For example, if you were to look at information about employees at a company and you learned that the salaries were negatively skewed and that the ages of your employees were also negatively skewed, you might be tempted to claim that one variable (for instance, age) influences the other variable (in this case, salary).

However, it would be dishonest to make such a claim with the tools we have discussed so far.
In fact, the relationship between the two variables could be exactly the opposite of what you claim: it could be that the low salaries are all earned by employees who are older and that younger employees are making more money. It is even possible that the two variables are entirely unrelated. All of our tools up to now have been tools to analyze data one variable at a time. In order to speculate about relationships between two or more variables, we need new tools that include two variables at a time. A graphical tool for this analysis is the scatterplot. This is a two-dimensional graph made up of points where each point represents a pair of observations, one for each of the two variables you are comparing. In this way, you can quickly spot connections between variables. Such connections are called correlations and can also be computed numerically with a fairly simple formula based on z-scores.

Consider the employee salary example above. One could speculate that the points representing the salary and age of each employee would show that older employees tend to have higher salaries (after all, they have been working longer, have more experience and have had more opportunities for promotion). If the graph shows this, then there might be a connection between the two variables.

We want to emphasize this point as strongly as possible: simply because the correlation between two variables is high does not mean that one variable is causing the changes in the other. Consider the following situation: You are interested in the performance of your stock brokers at a large investment firm. If you looked at the amount of money each broker earned for the firm and compared this to the number of cups of coffee that broker drinks each day at work, what would it mean if there were a strong positive correlation? Would that mean that drinking more coffee makes you a better broker? Clearly, this is absurd.
What it does mean is that brokers who make more money for the firm also tend to drink more coffee. That's all it means. Why might this be so? There are many possible reasons. It could simply be that the amount of coffee consumed is a surrogate for the number of hours the broker works. More hours worked might lead to more money for the broker, and more hours worked will probably also involve drinking more coffee.

For the remainder of this book, we will be dealing with how to represent relationships among variables. Our goal is to develop these relationships into mathematical equations called functions that we can use in our decision-making.

7.1.1 Definitions and Formulas

Scatterplot: A scatterplot is a graph that takes sets of observations of two variables and plots them as points on a graph. Each point corresponds to a single observation of both variables. The points are identified by an ordered pair, with the horizontal variable listed first. These ordered pairs are written as (x, y). After each point in the data is plotted, the scatterplot can help determine if there is a relationship between the two variables.

Axis and axes: All graphs have an axis that shows a scale and the direction in which the variable being graphed is increasing. "Axes" is the plural form of the word axis.

Quadrants: In a scatterplot, the horizontal and vertical axes cross at a point called the origin, which has coordinates (0, 0). This divides the Cartesian plane (all the possible points of the scatterplot) into four regions called quadrants. Each quadrant is numbered according to the graph in figure 7.1.

Figure 7.1: Diagram showing the labels for each of the four quadrants in an XY scatter plot. As usual, the x-axis runs left to right and the y-axis runs bottom to top.

Dependent Variable: The dependent variable is usually graphed on the vertical axis.
This is the variable that you suspect will be affected by a change in the other variable.

Independent Variable: The independent variable is usually graphed on the horizontal axis. This is the variable that you suspect determines the value of the dependent variable. It is graphed on the horizontal axis because it is easier for the eye to scan left to right to pick a value for it and then scan up the graph to find the value of the dependent variable that corresponds to the value you picked.

Direct Relationship: If the cloud of points on the scatterplot seems to move upward as the eye scans across the graph from left to right (as shown in figure 7.2), then the relationship between the two variables is said to be a direct relationship. This means that as the independent variable increases (gets larger in
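
The section above says that correlations can be computed numerically with a fairly simple formula based on z-scores, but this excerpt does not show that formula. The sketch below assumes the standard sample form: convert each variable to z-scores using the sample standard deviation, multiply the paired z-scores, add them up, and divide by n - 1. The function names and the age and salary figures are hypothetical, made up only for illustration; the book itself uses StatPro rather than Python.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert raw values to z-scores using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Correlation as the sum of paired z-score products divided by n - 1.

    Assumption: this is the standard sample formula; the excerpt only says
    the formula is "based on z-scores" and does not spell it out.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same number of observations")
    if len(xs) < 2:
        raise ValueError("at least two observations are needed")
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical employee data: one (age, salary) pair per employee.
ages = [25, 32, 41, 48, 55]
salaries = [38000, 45000, 52000, 61000, 70000]

r = correlation(ages, salaries)
# A positive r matches a cloud of points that rises from left to right.
print(f"correlation between age and salary: {r:.3f}")
```

A value near +1 or -1 indicates that the points lie close to a straight line, and a value near 0 indicates little linear relationship. As the stock broker example stresses, even a value near +1 says nothing about which variable, if either, causes the other. In Python 3.10 and later, `statistics.correlation` can be used to cross-check the result.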


SJFC MSTI 130 - Correlation
