Chapter 6 The Sum of Ranks Test

Thus far in these Course Notes we have considered CRDs with a numerical response. In Chapter 5 we learned how to perform a statistical test of hypotheses to investigate whether the Skeptic's Argument is correct. Every test of hypotheses has a test statistic; in Chapter 5 we chose the test statistic U, which has observed value $u = \bar{x} - \bar{y}$. For rather obvious reasons, this test using U is referred to as a test of means or a test of comparing means.

We learned in Chapter 1 that the mean is a popular way to summarize a list of numbers. Thus, it is not surprising to learn that comparing means by subtraction is a popular way to compare two treatments, and hence the test of Chapter 5 seems sensible. But we also learned in Chapter 1 that the median is another popular way to summarize a list of numbers. Thus, you might guess that another popular choice for a test statistic would be the one whose observed value is $v = \tilde{x} - \tilde{y}$. If you make this guess, you would be wrong, but close to the truth.

Recall from Chapter 1 that the distinction between the mean and the median can be viewed as the distinction between focusing on arithmetic versus position. The median, recall, is the number at the center position of a sorted list, for an odd sample size, or the average of the values at the two center positions, for an even sample size. Thus, the value v in the previous paragraph compares two sorted lists by comparing the numbers in their center positions. This comparison ignores a great deal of information. In those situations in which, for whatever reasons, we prefer to focus on positions rather than arithmetic, it turns out that using ranks (defined below) is superior to using medians in order to compare two sets of numbers.

In this chapter we will consider an alternative to using U; i.e., we will present a test that compares the two sets of data by comparing their ranks. When we study power in a later chapter, we will see that sometimes the test that compares ranks is better than the test that compares
means. The last section of this chapter presents an additional advantage of using a test based on ranks: it can be used when the response is ordinal, but not numerical.

6.1 Ranks

We begin by doing something that seems quite odd: we combine the data from the two treatments into one set of data and then we sort the $n = n_1 + n_2$ response values. For example, for Dawn's study of her cat, the 20 sorted response values are given in Table 6.1. You can verify these numbers from the data presented in Table 1.3 in Chapter 1, but I recommend that you just trust me on this.

Table 6.1: Dawn's 20 sorted response values, with ranks.

Position    1     2     3     4     5     6     7     8     9     10
Response    0     1     1     1     2     3     3     3     3     4
Rank        1     3     3     3     5     7.5   7.5   7.5   7.5   10.5

Position    11    12    13    14    15    16    17    18    19    20
Response    4     5     5     5     6     6     6     7     7     8
Rank        10.5  13    13    13    16    16    16    18.5  18.5  20

We note that Dawn's 20 numbers consist of nine distinct values. Her number of distinct values is smaller than 20 because several of the responses are tied; for example, four responses are tied with the value 3. Going back to Chapter 1, we talk about the 20 positions in the list in Table 6.1. As examples: position 1 has the response 0; position 20 has the response 8; and positions 6–9 all have the response 3.

If the n numbers in our list are all distinct, then the rank of each response is its position. This is referred to as the no-ties situation, and it makes all of the computations below much simpler. Sadly, in practice, data with ties are commonplace. Whenever there are ties, all tied responses receive the same rank, which is equal to the mean of their positions. Thus, for example, all four of the responses equal to 3 receive the rank of 7.5 because they occupy positions 6 through 9 and the mean of 6, 7, 8 and 9 is 7.5.

It is tedious to sum these four numbers to find their mean; here is a shortcut that always works: simply compute the mean of the smallest (first) and largest (last) positions in the list. For example, to find the mean of 6, 7, 8 and 9, simply calculate $(6 + 9)/2 = 7.5$. When we consider
ordinal data in Section 6.5, we will have occasion to find the mean of 43, 44, ..., and 75. Summing these 33 numbers is much more tedious than simply computing $(43 + 75)/2 = 59$. Finally, for any response in the list that is not tied with another response (0, 2 and 8 in Dawn's data), its rank equals its position.

The basic idea of our test based on ranks is that we analyze the ranks, not the responses. For example, I have retyped Table 6.1 in Table 6.2, dropping the two Position rows, with the added feature that the responses from treatment 1 (chicken) and their ranks are in bold-face type. For the test statistic U we performed arithmetic on the responses to obtain the means for each treatment and then we subtracted. We do the same arithmetic now, but we use the ranks instead of the responses. For example, let $R_1$ denote the sum of the ranks for treatment 1 and let $r_1$ denote its observed value. For Dawn's data we get

$r_1 = 3 + 7.5 + 10.5 + 13 + 13 + 16 + 16 + 16 + 18.5 + 20 = 133.5$.

Table 6.2: Dawn's 20 sorted responses, with ranks. The responses from treatment 1, and their ranks, are in bold-faced type.

Response    0     1     1     1     2     3     3     3     3     4
Rank        1     3     3     3     5     7.5   7.5   7.5   7.5   10.5

Response    4     5     5     5     6     6     6     7     7     8
Rank        10.5  13    13    13    16    16    16    18.5  18.5  20

Similarly, let $R_2$ denote the sum of the ranks for treatment 2 and let $r_2$ denote its observed value. For Dawn's data we get

$r_2 = 1 + 3 + 3 + 5 + 7.5 + 7.5 + 7.5 + 10.5 + 13 + 18.5 = 76.5$.

In order to compare the treatments' ranks descriptively, we calculate the mean of the ranks for each treatment:

$\bar{r}_1 = r_1/n_1 = 133.5/10 = 13.35$ and $\bar{r}_2 = r_2/n_2 = 76.5/10 = 7.65$,

which show that, based on ranks, the responses on treatment 1 are larger than the responses on treatment 2. The next obvious step is that we define

$v = \bar{r}_1 - \bar{r}_2 = r_1/n_1 - r_2/n_2$

to be the observed value of the test …
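The rank computations of Section 6.1 can be checked with a few lines of Python. What follows is only a minimal sketch, not part of the Course Notes: the function name midranks and the variable names are made up for illustration. The two response lists are not copied from Table 1.3; they are inferred from the rank lists for r1 and r2 given above, because each rank corresponds to exactly one response value in Table 6.1.

```python
def midranks(values):
    """Rank each value in the combined list; tied values share the mean of their positions."""
    # Indices of the values in sorted order (list positions are 1-based in the text).
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' over the run of responses tied with the one at 'start'.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Shortcut from the text: mean of the first and last positions of the run.
        shared_rank = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


# Assumption: responses inferred from the ranks in the text, not taken from Table 1.3.
treatment1 = [1, 3, 4, 5, 5, 6, 6, 6, 7, 8]   # chicken
treatment2 = [0, 1, 1, 2, 3, 3, 3, 4, 5, 7]

n1, n2 = len(treatment1), len(treatment2)
ranks = midranks(treatment1 + treatment2)      # ranks of the combined n = n1 + n2 values

r1 = sum(ranks[:n1])   # sum of ranks for treatment 1
r2 = sum(ranks[n1:])   # sum of ranks for treatment 2

print(r1, r2)            # 133.5 76.5
print(r1 / n1, r2 / n2)  # 13.35 7.65
```

As a quick sanity check, the two rank sums always add to n(n + 1)/2, which is 210 when n = 20, because the ranks of the combined list sum to the same total as the positions 1 through n.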