Lecture 9 Linear Sorting Steven Skiena Department of Computer Science State University of New York Stony Brook NY 11794 4400 http www cs sunysb edu skiena Problem of the Day The nuts and bolts problem is defined as follows You are given a collection of n bolts of different widths and n corresponding nuts You can test whether a given nut and bolt together from which you learn whether the nut is too large too small or an exact match for the bolt The differences in size between pairs of nuts or bolts can be too small to see by eye so you cannot rely on comparing the sizes of two nuts or two bolts directly You are to match each bolt to each nut 1 Give an O n2 algorithm to solve the nuts and bolts problem 2 Suppose that instead of matching all of the nuts and bolts you wish to find the smallest bolt and its corresponding nut Show that this can be done in only 2n 2 comparisons 3 Match the nuts and bolts in expected O n log n time Solution Quicksort Pseudocode Sort A Quicksort A 1 n Quicksort A low high if low high pivot location Partition A low high Quicksort A low pivot location 1 Quicksort A pivot location 1 high Partition Implementation Partition A low high pivot A low leftwall low for i low 1 to high if A i pivot then leftwall leftwall 1 swap A i A leftwall swap A low A leftwall Quicksort Animation Q U I C K S O R T Q I C K S O R T U Q I C K O R S T U I C K O Q R S T U I C K O Q R S T U I C K O Q R S T U Best Case for Quicksort Since each element ultimately ends up in the correct position the algorithm correctly sorts But how long does it take The best case for divide and conquer algorithms comes when we split the input as evenly as possible Thus in the best case each subproblem is of size n 2 The partition step on each subproblem is linear in its size Thus the total effort in partitioning the 2k problems of size n 2k is O n Best Case Recursion Tree The total partitioning on each level is O n and it take lg n levels of perfect partitions to get to single element subproblems When we are down to single elements the problems are sorted Thus the total time in the best case is O n lg n Worst Case for Quicksort Suppose instead our pivot element splits the array as unequally as possible Thus instead of n 2 elements in the smaller half we get zero meaning that the pivot element is the biggest or smallest element in the array Now we have n 1 levels instead of lg n for a worst case time of n2 since the first n 2 levels each have n 2 elements to partition To justify its name Quicksort had better be good in the average case Showing this requires some intricate analysis The divide and conquer principle applies to real life If you break a job into pieces make the pieces of equal size Intuition The Average Case for Quicksort Suppose we pick the pivot element at random in an array of n keys 1 n 4 n 2 3n 4 n Half the time the pivot element will be from the center half of the sorted array Whenever the pivot element is from positions n 4 to 3n 4 the larger remaining subarray contains at most 3n 4 elements How Many Good Partitions If we assume that the pivot element is always in this range what is the maximum number of partitions we need to get from n elements down to 1 element 3 4 l n 1 n 4 3 l lg n l lg 4 3 Therefore l lg 4 3 lg n 2 lg n good partitions suffice How Many Bad Partitions How often when we pick an arbitrary element as pivot will it generate a decent partition Since any number ranked between n 4 and 3n 4 would make a decent pivot we get one half the time on average If we need 2 lg n levels of decent partitions to finish the job and half of random partitions are decent then on average the recursion tree to quicksort the array has 4 lg n levels Since O n work is done partitioning on each level the average time is O n lg n Average Case Analysis of Quicksort To do a precise average case analysis of quicksort we formulate a recurrence given the exact expected time T n 1 T p 1 T n p n 1 p 1 n Each possible pivot p is selected with equal probability The number of comparisons needed to do the partition is n 1 We will need one useful fact about the Harmonic numbers Hn namely n X Hn 1 i ln n T n n X i 1 It is important to understand 1 where the recurrence relation comes from and 2 how the log comes out from the summation The rest is just messy algebra n 1 X T p 1 T n p n 1 T n p 1 n n 2 X T n T p 1 n 1 p 1 n n X nT n 2 T p 1 n n 1 multiply by n p 1 n 1 T n 1 2 n 1 X p 1 T p 1 n 1 n 2 apply to n 1 nT n n 1 T n 1 2T n 1 2 n 1 rearranging the terms give us T n 1 2 n 1 T n n 1 n n n 1 substituting an A n n 1 gives n 2 i 1 2 n 1 X an an 1 n n 1 i 1 i i 1 1 2 ln n i 1 i 1 We are really interested in A n so an 2 n X A n n 1 an 2 n 1 ln n 1 38n lg n Pick a Better Pivot Having the worst case occur when they are sorted or almost sorted is very bad since that is likely to be the case in certain applications To eliminate this problem pick a better pivot 1 Use the middle element of the subarray as pivot 2 Use a random element of the array as the pivot 3 Perhaps best of all take the median of three elements first last middle as the pivot Why should we use median instead of the mean Whichever of these three rules we use the worst case remains O n2 Is Quicksort really faster than Heapsort Since Heapsort is n lg n and selection sort is n2 there is no debate about which will be better for decent sized files When Quicksort is implemented well it is typically 2 3 times faster than mergesort or heapsort The primary reason is …
View Full Document
Unlocking...