Unformatted text preview:

Lecture Notes CMSC 251Finally, in the recurrence T (n)=4T(n/3) + n (which corresponds to Case 1), most of the work isdone at the leaf level of the recursion tree. This can be seen if you perform iteration on this recurrence,the resulting summation isnlog3nXi=043i.(You might try this to see if you get the same result.) Since 4/3 > 1, as we go deeper into the levelsof the tree, that is deeper into the summation, the terms are growing successively larger. The largestcontribution will be from the leaf level.Lecture 9: Medians and Selection(Tuesday, Feb 24, 1998)Read: Todays material is covered in Sections 10.2 and 10.3. You are not responsible for the randomizedanalysis of Section 10.2. Our presentation of the partitioning algorithm and analysis are somewhat differentfrom the ones in the book.Selection: In the last couple of lectures we have discussed recurrences and the divide-and-conquer methodof solving problems. Today we will give a rather surprising (and very tricky) algorithm which showsthe power of these techniques.The problem that we will consider is very easy to state, but surprisingly difficult to solve optimally.Suppose that you are given a set of n numbers. Define the rank of an element to be one plus thenumber of elements that are smaller than this element. Since duplicate elements make our life morecomplex (by creating multiple elements of the same rank), we will make the simplifying assumptionthat all the elements are distinct for now. It will be easy to get around this assumption later. Thus, therank of an element is its final position if the set is sorted. The minimum is of rank 1 and the maximumis of rank n.Of particular interest in statistics is the median.Ifnis odd then the median is defined to be the elementof rank (n +1)/2. When n is even there are two natural choices, namely the elements of ranks n/2and (n/2) + 1. In statistics it is common to return the average of these two elements. We will definethe median to be either of these elements.Medians are useful as measures of the central tendency of a set, especially when the distribution of val-ues is highly skewed. For example, the median income in a community is likely to be more meaningfulmeasure of the central tendency than the average is, since if Bill Gates lives in your community thenhis gigantic income may significantly bias the average, whereas it cannot have a significant influenceon the median. They are also useful, since in divide-and-conquer applications, it is often desirable topartition a set about its median value, into two sets of roughly equal size. Today we will focus on thefollowing generalization, called the selection problem.Selection: Given a set A of n distinct numbers and an integer k, 1 ≤ k ≤ n, output the element of Aof rank k.The selection problem can easily be solved in Θ(n log n) time, simply by sorting the numbers of A,and then returning A[k]. The question is whether it is possible to do better. In particular, is it possibleto solve this problem in Θ(n) time? We will see that the answer is yes, and the solution is far fromobvious.The Sieve Technique: The reason for introducing this algorithm is that it illustrates a very important specialcase of divide-and-conquer, which I call the sieve technique. We think of divide-and-conquer as break-ing the problem into a small number of smaller subproblems, which are then solved recursively. Thesieve technique is a special case, where the number of subproblems is just 1.31Lecture Notes CMSC 251The sieve technique works in phases as follows. It applies to problems where we are interested infinding a single item from a larger set of n items. We do not know which item is of interest, howeverafter doing some amount of analysis of the data, taking say Θ(nk) time, for some constant k, we findthat we do not know what the desired item is, but we can identify a large enough number of elementsthat cannot be the desired value, and can be eliminated from further consideration. In particular “largeenough” means that the number of items is at least some fixed constant fraction of n (e.g. n/2, n/3,0.0001n). Then we solve the problem recursively on whatever items remain. Each of the resultingrecursive solutions then do the same thing, eliminating a constant fraction of the remaining set.Applying the Sieve to Selection: To see more concretely how the sieve technique works, let us apply it tothe selection problem. Recall that we are given an array A[1..n] and an integer k, and want to find thek-th smallest element of A. Since the algorithm will be applied inductively, we will assume that weare given a subarray A[p..r] as we did in MergeSort, and we want to find the kth smallest item (wherek ≤ r − p +1). The initial call will be to the entire array A[1..n].There are two principal algorithms for solving the selection problem, but they differ only in one step,which involves judiciously choosing an item from the array, called the pivot element, which we willdenote by x. Later we will see how to choose x, but for now just think of it as a random element of A.We then partition A into three parts. A[q] contains the element x, subarray A[p..q − 1] will contain allthe elements that are less than x, and A[q +1..r], will contain all the element that are greater than x.(Recall that we assumed that all the elements are distinct.) Within each subarray, the items may appearin any order. This is illustrated below.Before partitioingAfter partitioing2641379pivot351946xprqprA[q+1..r] > xA[p..q−1] < x527Partition(pivot = 4)9756(k=6−4=2)Recursex_rnk=2 (DONE!)6556(pivot = 6)Partition(k=2)Recursex_rnk=3(pivot = 7)Partition(k=6)Initialx_rnk=467314629541953726957Figure 7: Selection Algorithm.It is easy to see that the rank of the pivot x is q −p+1in A[p..r]. Let xrnk = q −p +1 .Ifk=x rnk ,then the pivot is the kth smallest, and we may just return it. If k<xrnk, then we know that we needto recursively search in A[p..q − 1] and if k>xrnk then we need to recursively search A[q +1..r].In this latter case we have eliminated q smaller elements, so we want to find the element of rank k −q.Here is the complete pseudocode.SelectionSelect(array A, int p, int r, int k) { // return kth smallest of A[p..r]if (p == r) return A[p] // only 1 item left, return it32Lecture Notes CMSC 251else {x = Choose_Pivot(A, p, r) // choose the pivot elementq = Partition(A, p, r, x) // partition <A[p..q-1], x, A[q+1..r]>x_rnk = q - p + 1 // rank of the pivotif (k == x_rnk)


View Full Document
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?