
UNC-Chapel Hill GEOG 090 - Correlation Coefficients


David Tenenbaum – GEOG 090 – UNC-CH Spring 2005

Correlation Coefficients

• Pearson’s Product Moment Correlation Coefficient can only be used with interval or ratio data:
• Its formula is based on the products of statistical distances from the mean, and those distances are only meaningful if the mean is an appropriate measure of central tendency
• We cannot use the mean with ordinal data: It requires that values have the property of ‘proportionality’ found in interval and ratio data: The value 2 is greater than 1 to the same extent that 3 is greater than 2
• This is not the case for ordinal data: While we can describe greater-than or less-than relations between values, the differences between them are not proportional

Spearman’s Rank Correlation Coefficient

• We have an alternative correlation coefficient we can use with ordinal data: Spearman’s Rank Correlation Coefficient ($r_s$)

$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n^3 - n}$$

where
$n$ = sample size
$d_i$ = the difference between the rankings of observation $i$ on the two variables

Spearman’s Rank Correlation Coefficient

• We can use the rank correlation coefficient with ordinal data (which is effectively already in ranked form), or we can take interval or ratio data and convert it to rankings by replacing the values of the X and Y variables with ranks from 1 to n for each variable
• Transforming interval or ratio data to ordinal data for use with the rank coefficient may be desirable when our interval or ratio dataset fails to meet an assumption required for the use of Pearson’s Correlation Coefficient

Pearson’s r - Assumptions

• To properly apply Pearson’s Correlation Coefficient, we first have to make sure that the following assumptions are satisfied:
1. The values need to be either interval or ratio scale data (later we will examine a different correlation method for ordinal data)
2. The (x,y) data pairs are selected randomly from a population of values of X and Y
3. The relationship between X and Y is linear (which can be qualitatively assessed by looking at the scatterplot)
4. The variables X and Y must share a joint bivariate normal distribution (which we tend to assume when sampling from a population)

Spearman’s Rank Correlation Coefficient

• Thus, we might use the rank correlation coefficient when we have an interval or ratio data set that is not normally distributed, but we still want to get a sense of the association between the two variables
• We can also use Spearman’s $r_s$ when we have a much smaller number of observations (as few as 3), although a numerical description of association becomes somewhat nonsensical when the sample size is that small
• For example, suppose we find the TVDI - soil moisture dataset from Glyndon violates the assumption of normal distribution (which it probably does, although the dataset is so small that this is difficult to assess):

Spearman’s Rank Correlation Coefficient

• We can transform the data values into rankings for use in $r_s$:

TVDI (x)           0.274  0.542  0.419  0.286  0.374  0.489  0.623  0.506  0.768  0.725
Rank (x)               1      7      4      2      3      5      8      6     10      9
Theta (y)          0.414  0.359  0.396  0.458  0.350  0.357  0.255  0.189  0.171  0.119
Rank (y)               9      7      8     10      5      6      4      3      2      1

• And we can calculate the differences in ranks, Rank (x) minus Rank (y), to use in $r_s$:

Difference (d_i)      -8      0     -4     -8     -2     -1      4      3      8      8

Spearman’s Rank Correlation Coefficient

• Note that because we square the differences in rankings, their sign does not matter
• Once we have calculated the differences in rankings, calculating the $r_s$ statistic is simply a matter of squaring the differences and summing them, multiplying the sum by six, dividing by the denominator ($n^3 - n$), and then finally subtracting from one:

$$r_s = 1 - \frac{6\left[(-8)^2 + (0)^2 + (-4)^2 + (-8)^2 + (-2)^2 + (-1)^2 + (4)^2 + (3)^2 + (8)^2 + (8)^2\right]}{(10)^3 - 10}$$
$$= 1 - \frac{6\,[64 + 0 + 16 + 64 + 4 + 1 + 16 + 9 + 64 + 64]}{990}$$
$$= 1 - \frac{6\,[302]}{990}$$
$$= 1 - 1.830$$
$$= -0.830$$

A Significance Test for rs

• As was the case for Pearson’s Correlation Coefficient, we can test the significance of an $r_s$ result using a t-test
• The test statistic and degrees of freedom are formulated a little differently for $r_s$, although many of the characteristics of the distribution of r values are present here as well
• In this case, $r_s$ values follow a t-distribution with (n - 1) degrees of freedom, and their standard error can be estimated using:

$$SE_{r_s} = \frac{1}{\sqrt{n - 1}}$$

yielding the test statistic:

$$t_{test} = \frac{r_s}{SE_{r_s}} = r_s \sqrt{n - 1}$$

A Significance Test for rs

• Again, we use this test in a 2-tailed fashion to assess whether the population correlation coefficient is equal to zero (no relationship) or not equal to zero (some relationship):

$H_0$: $\rho_s = 0$
$H_A$: $\rho_s \neq 0$

• Again, the test statistic is purely a function of the correlation coefficient ($r_s$) and sample size (n):

$$t_{test} = r_s \sqrt{n - 1}$$

• Thus, a given $r_s$ may or may not be significant depending on the size of the sample!

Hypothesis Testing - Significance of rs t-test Example

• Research question: Is there a significant relationship between TVDI and soil moisture in the Glyndon data set?
1. $H_0$: $\rho_s = 0$ (No significant relationship)
2. $H_A$: $\rho_s \neq 0$ (Some relationship)
3. Select α = 0.05, two-tailed because of how the alternate hypothesis is formulated
4. In order to compute the t-test statistic, we need to first calculate Spearman’s Rank Correlation Coefficient. We have done so earlier in this lecture, finding $r_s$ = -0.830, a very strong inverse relationship between remotely sensed TVDI and field measurements of soil moisture
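
The ranking, rank-correlation, and test-statistic steps described in this lecture could be reproduced with a short Python sketch like the one below. This is an illustration under stated assumptions, not part of the original slides: the function names `rank_values`, `spearman_rs`, and `t_statistic` are hypothetical, the ranking helper assumes there are no tied values (true for the Glyndon data), and the formulas are the ones given in the lecture, with denominator n^3 - n and test statistic r_s * sqrt(n - 1).

```python
import math

# TVDI (x) and soil moisture Theta (y) values from the Glyndon example.
tvdi = [0.274, 0.542, 0.419, 0.286, 0.374, 0.489, 0.623, 0.506, 0.768, 0.725]
theta = [0.414, 0.359, 0.396, 0.458, 0.350, 0.357, 0.255, 0.189, 0.171, 0.119]


def rank_values(values):
    """Rank values from 1 (smallest) to n (largest). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d_i^2) / (n^3 - n), where d_i = rank(x_i) - rank(y_i)."""
    n = len(x)
    d = [rx - ry for rx, ry in zip(rank_values(x), rank_values(y))]
    return 1 - 6 * sum(di ** 2 for di in d) / (n ** 3 - n)


def t_statistic(rs, n):
    """Lecture's test statistic: t = r_s / SE, with SE = 1 / sqrt(n - 1)."""
    return rs * math.sqrt(n - 1)


rs = spearman_rs(tvdi, theta)
t = t_statistic(rs, len(tvdi))
print(f"r_s = {rs:.3f}, t = {t:.3f}")
```

To finish the test, the absolute value of `t` would be compared against the two-tailed critical value of the t-distribution with n - 1 degrees of freedom at the chosen α.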
