Lecture Notes CMSC 251 Finally in the recurrence T n 4T n 3 n which corresponds to Case 1 most of the work is done at the leaf level of the recursion tree This can be seen if you perform iteration on this recurrence the resulting summation is log3 n i X 4 n 3 i 0 You might try this to see if you get the same result Since 4 3 1 as we go deeper into the levels of the tree that is deeper into the summation the terms are growing successively larger The largest contribution will be from the leaf level Lecture 9 Medians and Selection Tuesday Feb 24 1998 Read Todays material is covered in Sections 10 2 and 10 3 You are not responsible for the randomized analysis of Section 10 2 Our presentation of the partitioning algorithm and analysis are somewhat different from the ones in the book Selection In the last couple of lectures we have discussed recurrences and the divide and conquer method of solving problems Today we will give a rather surprising and very tricky algorithm which shows the power of these techniques The problem that we will consider is very easy to state but surprisingly difficult to solve optimally Suppose that you are given a set of n numbers Define the rank of an element to be one plus the number of elements that are smaller than this element Since duplicate elements make our life more complex by creating multiple elements of the same rank we will make the simplifying assumption that all the elements are distinct for now It will be easy to get around this assumption later Thus the rank of an element is its final position if the set is sorted The minimum is of rank 1 and the maximum is of rank n Of particular interest in statistics is the median If n is odd then the median is defined to be the element of rank n 1 2 When n is even there are two natural choices namely the elements of ranks n 2 and n 2 1 In statistics it is common to return the average of these two elements We will define the median to be either of these elements Medians are useful as measures of the central tendency of a set especially when the distribution of values is highly skewed For example the median income in a community is likely to be more meaningful measure of the central tendency than the average is since if Bill Gates lives in your community then his gigantic income may significantly bias the average whereas it cannot have a significant influence on the median They are also useful since in divide and conquer applications it is often desirable to partition a set about its median value into two sets of roughly equal size Today we will focus on the following generalization called the selection problem Selection Given a set A of n distinct numbers and an integer k 1 k n output the element of A of rank k The selection problem can easily be solved in n log n time simply by sorting the numbers of A and then returning A k The question is whether it is possible to do better In particular is it possible to solve this problem in n time We will see that the answer is yes and the solution is far from obvious The Sieve Technique The reason for introducing this algorithm is that it illustrates a very important special case of divide and conquer which I call the sieve technique We think of divide and conquer as breaking the problem into a small number of smaller subproblems which are then solved recursively The sieve technique is a special case where the number of subproblems is just 1 31 Lecture Notes CMSC 251 The sieve technique works in phases as follows It applies to problems where we are interested in finding a single item from a larger set of n items We do not know which item is of interest however after doing some amount of analysis of the data taking say nk time for some constant k we find that we do not know what the desired item is but we can identify a large enough number of elements that cannot be the desired value and can be eliminated from further consideration In particular large enough means that the number of items is at least some fixed constant fraction of n e g n 2 n 3 0 0001n Then we solve the problem recursively on whatever items remain Each of the resulting recursive solutions then do the same thing eliminating a constant fraction of the remaining set Applying the Sieve to Selection To see more concretely how the sieve technique works let us apply it to the selection problem Recall that we are given an array A 1 n and an integer k and want to find the k th smallest element of A Since the algorithm will be applied inductively we will assume that we are given a subarray A p r as we did in MergeSort and we want to find the kth smallest item where k r p 1 The initial call will be to the entire array A 1 n There are two principal algorithms for solving the selection problem but they differ only in one step which involves judiciously choosing an item from the array called the pivot element which we will denote by x Later we will see how to choose x but for now just think of it as a random element of A We then partition A into three parts A q contains the element x subarray A p q 1 will contain all the elements that are less than x and A q 1 r will contain all the element that are greater than x Recall that we assumed that all the elements are distinct Within each subarray the items may appear in any order This is illustrated below pivot p r 5 9 2 6 4 1 3 7 Before partitioing p q r 3 1 2 4 6 9 5 7 After partitioing x A p q 1 x A q 1 r x 5 9 2 6 4 1 3 7 Initial k 6 3 1 2 4 x rnk 4 6 9 5 7 Partition pivot 4 6 9 5 7 Recurse k 6 4 2 6 5 7 x rnk 3 9 Partition pivot 7 6 5 5 6 x rnk 2 DONE Recurse k 2 Partition pivot 6 Figure 7 Selection Algorithm It is easy to see that the rank of the pivot x is q p 1 in A p r Let x rnk q p 1 If k x rnk then the pivot is the kth smallest and we may just return it If k x rnk then we know that we need to recursively search in A p q 1 and if k x rnk then we need to recursively search A q 1 r In this latter case we have eliminated q smaller elements so we want to …
View Full Document