Algorithms in Java, 4th Edition·Robert Sedgewick and Kevin Wayne·Copyright © 2008·February 20, 2008 12:17:14 AMAdvanced Topics in Sorting‣selection‣duplicate keys‣system sorts‣comparators2SelectionGoal. Find the kth largest element.Ex. Min (k = 0), max (k = N-1), median (k = N/2).Applications.•Order statistics.•Find the “top k.”Use theory as a guide.•Easy O(N log N) upper bound.•Easy O(N) upper bound for k = 1, 2, 3.•Easy Ω(N) lower bound.Which is true?•Ω(N log N) lower bound?•O(N) upper bound?is selection as hard as sorting?is there a linear-time algorithm for all k?Partition array so that:•Element a[i] is in place.•No larger element to the left of i.•No smaller element to the right of i.Repeat in one subarray, depending on i; finished when i equals k.3Quick-selectpublic static Comparable select(Comparable[] a, int k){ StdRandom.shuffle(a); int lo = 0, hi = a.length - 1; while (hi > lo) { int i = partition(a, lo, hi); if (i < k) lo = i + 1; else if (i > k) hi = i - 1; else return a[k]; } return a[k];}vvlo hilo hi v vibeforeafterif a[k] is here,set hi to i-1if a[k] is here,set lo to i+14Quick-select: mathematical analysisProposition. Quick-select takes linear time on average.Pf sketch.•Intuitively, each partitioning step roughly splits array in half:N + N/2 + N/4 + … + 1 ~ 2N compares.•Formal analysis similar to quicksort analysis yields:Ex. (2 + 2 ln 2) N compares to find the median.Remark. Quick-select might use ~ N2/2 compares, but as with quicksort,the random shuffle provides a probabilistic guarantee.CN = 2 N + k ln ( N / k) + (N - k) ln (N / (N - k))5Theoretical context for selectionChallenge. Design a selection algorithm whose running time is linear in the worst-case.Theorem. [Blum, Floyd, Pratt, Rivest, Tarjan, 1973] There exists a compare-based selection algorithm that takes linear time in the worst case.Remark. Algorithm is too complicated to be useful in practice.Use theory as a guide.•Still worthwhile to seek practical linear-time (worst-case) algorithm.•Until one is discovered, use quick-select if you don’t need a full sort.6Generic methodsIn our select() implementation, client needs a cast.The compiler is also unhappy.Q. How to fix? % javac Quick.java Note: Quick.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. Double[] a = new Double[N]; for (int i = 0; i < N; i++) a[i] = StdRandom.uniform(); Double median = (Double) Quick.select(a, N/2);hazardous castrequired7Generic methodsSafe version. Compiles cleanly, no cast needed in client.Remark. Obnoxious code needed in system sort; not in this course (for brevity).public class Quick{ public static <Key extends Comparable<Key>> Key select(Key[] a, int k) { /* as before */ } public static <Key extends Comparable<Key>> void sort(Key[] a) { /* as before */ } private static <Key extends Comparable<Key>> int partition(Key[] a, int lo, int hi) { /* as before */ } private static <Key extends Comparable<Key>> boolean less(Key v, Key w) { /* as before */ } private static <Key extends Comparable<Key>> void exch(Key[] a, int i, int j) { Key swap = a[i]; a[i] = a[j]; a[j] = swap; } } generic type variable(value inferred from argument a[])return type matches array typecan declare variables of generic type‣selection‣duplicate keys‣comparators‣applications89Duplicate keysOften, purpose of sort is to bring records with duplicate keys together.•Sort population by age.•Find collinear points.•Remove duplicates from mailing list.•Sort job applicants by college attended. Typical characteristics of such applications.•Huge file.•Small number of key values.see Assignment 3Chicago 09:00:00Phoenix 09:00:03Houston 09:00:13Chicago 09:00:59Houston 09:01:10Chicago 09:03:13Seattle 09:10:11Seattle 09:10:25Phoenix 09:14:25Chicago 09:19:32Chicago 09:19:46Chicago 09:21:05Seattle 09:22:43Seattle 09:22:54Chicago 09:25:52Chicago 09:35:21Seattle 09:36:14Phoenix 09:37:44Chicago 09:00:00Chicago 09:00:59Chicago 09:03:13Chicago 09:19:32Chicago 09:19:46Chicago 09:21:05Chicago 09:25:52Chicago 09:35:21Houston 09:00:13Houston 09:01:10Phoenix 09:00:03Phoenix 09:14:25Phoenix 09:37:44Seattle 09:10:11Seattle 09:10:25Seattle 09:22:43Seattle 09:22:54Seattle 09:36:14Chicago 09:25:52Chicago 09:03:13Chicago 09:21:05Chicago 09:19:46Chicago 09:19:32Chicago 09:00:00Chicago 09:35:21Chicago 09:00:59Houston 09:01:10Houston 09:00:13Phoenix 09:37:44Phoenix 09:00:03Phoenix 09:14:25Seattle 09:10:25Seattle 09:36:14Seattle 09:22:43Seattle 09:10:11Seattle 09:22:54Stability when sorting on a second keysortedsorted by time sorted by city (unstable) sorted by city (stable)NOTsortedkey10Duplicate keysMergesort with duplicate keys. Always ~ N lg N compares.Quicksort with duplicate keys.•Algorithm goes quadratic unless partitioning stops on equal keys!•1990s C user found this defect in qsort().several textbook and system implementationsalso have this defectDuplicate keys: the problemAssume all keys are equal. Recursive code guarantees this case predominates!Mistake. Put all keys equal to the partitioning element on one side.Consequence. ~ N2 / 2 compares when all keys equal.Recommended. Stop scans on keys equal to the partitioning element.Consequence. ~ N lg N compares when all keys equal.Desirable. Put all keys equal to the partitioning element in place.11B A A B A B B B C C C A A A A A A A A A A AB A A B A B C C B C B A A A A A A A A A A AA A A B B B B B C C C A A A A A A A A A A AGoal. Partition array into 3 parts so that:•Elements between lt and gt equal to partition element v.•No larger elements to left of lt.•No smaller elements to right of gt.Dutch national flag problem. [Edsger Dijkstra]•Convention wisdom until mid 1990s: not worth doing.•New approach discovered when fixing mistake in C library qsort().•Now incorporated into qsort() and Java system sort.123-way partitioningv>v<v=vlo hilt gtlo hi3-way partitioningbeforeafter133-way partitioning: Dijkstra's solution3-way partitioning.•Let v be partitioning element a[lo].•Scan i from left to right.-a[i] less than v : exchange a[lt] with a[i] and increment both lt and i-a[i] greater than v : exchange a[gt] with a[i] and decrement gt-a[i] equal to v : increment iAll the right properties.•In-place.•Not much code.•Small overhead if no equal
View Full Document