
Stat 218 - Day 32
Correlation Coefficient

Recall: We are now studying relationships between two quantitative variables.
• A scatterplot is a graphical display of the relationship between two quantitative variables.
• We examine a scatterplot for evidence of association between the variables.
• We look for form (linear?), direction (positive or negative), and strength (weak, moderate, strong) of association.
• A numerical summary of association is the correlation coefficient, denoted by r, which measures the degree of linear association between two variables.

Example: New car data (cont.)
Reconsider the nine scatterplots on new car data for the year 1999 (cars99.mtw). The following table reports the scatterplots arranged by direction and strength of association:

Strong negative    Moderate negative    Virtually none    Moderate positive    Strong positive
D G A H C E I F B

(a) First consider scatterplot A, for which the variables were time to travel ¼ mile and weight. Use Minitab to calculate the value of the correlation coefficient between these variables (MTB> corr c11 c7). Record this value below the letter A in the table above. [Note: Also, ignore the P-value in the output for now.]
(b) Repeat (a) for the other eight scatterplots. [Hint: For each scatterplot, you will first need to see which two variables are involved, and then see which columns of the Minitab worksheet contain those variables.]
(c) Based on these correlation values, what would you guess is the largest value that a correlation coefficient can have? How about the smallest?
(d) Under what conditions would a correlation coefficient equal its largest possible value? Its smallest?
(e) How does the sign of the correlation relate to the direction of the association?
(f) How does the magnitude of the correlation relate to the strength of the association?
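Minitab's `corr` command does this computation for you. For readers who want to check a value outside Minitab, a minimal Python sketch of the same quantity, computed from the deviation formula for r that this handout states next, might look like the following. The function name `correlation` and the five paired values are hypothetical, made up purely for illustration; they are not taken from cars99.mtw.

```python
import math

def correlation(x, y):
    """Correlation coefficient r, computed from deviations about the means."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least two values")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator: sum of products of deviations, sum of (x_i - x_bar)(y_i - y_bar)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator pieces: sums of squared deviations for each variable
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        # r is undefined when either variable has no spread at all
        raise ValueError("r is undefined when a variable has zero spread")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired values, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(correlation(x, y), 4))  # 0.7746
```

The guard clause reflects that the formula divides by the spread of each variable, so r cannot be computed when, for example, every x value is the same.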
• The correlation coefficient r is calculated as:
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$,
which can also be rewritten in terms of z-scores as
$r = \frac{\sum z_x z_y}{n - 1}$,
where $z_x$ represents the z-scores of the x-variable and $z_y$ represents the z-scores of the y-variable.

(g) Consider again scatterplot D, which displays a car’s city miles per gallon rating vs. its weight:
[Scatterplot D: city mpg (vertical axis) vs. weight (horizontal axis)]
Circle the cars that would have positive z-scores for weight. Then put a box around the cars that would have positive z-scores for city mpg. Is there much overlap? Based on the expression above, explain how this reveals that the correlation coefficient turns out to be negative.
(h) Does order (which variable is x and which is y) matter when calculating a correlation coefficient? Explain.
(i) Is the correlation coefficient resistant to outliers? Explain how you can tell.

Example: Guess the correlation
(a) Open the “Guess the Correlation” applet. Click on “new sample” to create a scatterplot. Consult with your partner to make a guess for the value of the correlation coefficient in this scatterplot, type it in, and then press the applet’s “enter” button. Repeat this process until you have seen 10 scatterplots and made 10 guesses.
(b) Make a guess for the value of the correlation coefficient between your guesses and the actual values.
(c) Change the applet’s graph to “guess vs. actual.” How well did your guesses do? What is the correlation between your guesses and the actual values? Does this indicate that you and your partner were pretty good guessers? Explain.
(d) If every guess were too high by exactly .5, what would be the value of the correlation coefficient between the guesses and the actual values? Explain. What does this reveal about the limitation of correlation as a measure of guessing accuracy?
(e) Draw a rough sketch of what the scatterplot of “guess vs. actual” would look like for a person who always guesses perfectly when the correlation is really negative but a bit too low when the correlation is positive.

Example: Nenana ice break competition: evidence of global warming?
Nenana is a small, interior Alaskan town that holds a famous competition to predict the exact moment that “spring arrives” every year. The arrival of spring is defined to be the moment when the Tanana River becomes ice-free, which is measured by a tripod erected on the ice with a trigger to an official clock. The minute at which the ice breaks has been recorded in every year since 1917. For example, the dates and times for the years 2000-2004 were:

Year    Date and time of ice break
2000    May 1, 10:47am
2001    May 8, 1:00pm
2002    May 7, 9:27pm
2003    April 29, 6:22pm
2004    April 24, 2:16pm

The Minitab worksheet NenanaIceBreak.mtw contains all of the data from 1917-2004. Scientists have examined these data for evidence of global warming, which would suggest that the ice break day should tend to occur earlier as time goes on.
(a) Examine a scatterplot of the day on which the ice broke (coded in c7 with April 1 = 1) vs. year (MTB> plot c7*c1). Does it reveal any association between the two variables? In other words, is there any indication that the day on which spring begins is changing over time? Explain.
(b) Calculate the correlation coefficient between “ice break day” and year (MTB> corr c7 c1). What does it reveal? Explain.

Example: Draft lottery (cont.)
Reconsider (from day 4!) the data from the 1970 draft lottery (draft70.mtw). Examine a scatterplot of draft number vs. sequential date number, and also calculate the correlation coefficient. What does this analysis reveal about the randomness of the draft lottery? Explain.

Example: Hypothetical exam scores
(a) Suppose that every student in a class scores exactly ten points higher on the second exam than on the first exam. What would the value of the correlation coefficient between the two exam scores be? Explain.
[Hint: You might want to draw a sketch of a scatterplot, or enter some data into Minitab.]
(b) Suppose that every student in a class scores exactly five points lower on the second exam than on the first exam. What would the value of the correlation coefficient between the two exam scores be? Explain.
(c) What if every student were to score five times as many points on the second exam as on the first exam? What then would the value of the correlation coefficient between the two exam scores be? Explain.
(d) What’s wrong with the claim that “there’s no correlation” between scores on
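The z-score form of r, $r = \sum z_x z_y / (n - 1)$, can be checked numerically against the deviation form. The sketch below uses only the five Nenana ice-break dates for 2000-2004 listed in this handout, not the full 1917-2004 worksheet, so five points are far too few to say anything about a long-run trend. It assumes whole-day coding with April 1 = 1 (the clock time is ignored, which may differ from how c7 in NenanaIceBreak.mtw is coded), and it assumes z-scores use the sample standard deviation (divisor n − 1), which is what makes dividing by n − 1 agree with the deviation formula. The helper names `ice_break_day`, `z_scores`, `correlation_from_z`, and `correlation_from_deviations` are hypothetical.

```python
import math
from datetime import date

# Ice-break dates for 2000-2004 as listed in the handout (clock time ignored)
breaks = {
    2000: date(2000, 5, 1),
    2001: date(2001, 5, 8),
    2002: date(2002, 5, 7),
    2003: date(2003, 4, 29),
    2004: date(2004, 4, 24),
}

def ice_break_day(d):
    """Code a date as a whole day number with April 1 = 1 (so May 1 = 31)."""
    return (d - date(d.year, 4, 1)).days + 1

def z_scores(values):
    """Standardize values using the sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def correlation_from_z(x, y):
    """r as the sum of products of z-scores, divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)

def correlation_from_deviations(x, y):
    """r from sums of (products of) deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

years = sorted(breaks)
days = [ice_break_day(breaks[yr]) for yr in years]
print(days)  # [31, 38, 37, 29, 24]
print(round(correlation_from_z(years, days), 3))           # -0.626
print(round(correlation_from_deviations(years, days), 3))  # -0.626
```

Both functions return the same value because each z-score is a deviation divided by the sample standard deviation, and the two factors of n − 1 cancel.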



Cal Poly STAT 218 - Correlation Coefficient
