        Heapify(A, 1, m)    // fix things up
    }
}

An example of HeapSort is shown in Figure 7.4 on page 148 of CLR. We make n - 1 calls to Heapify, each of which takes O(log n) time, so the total running time is O((n - 1) log n) = O(n log n).

Lecture 14: HeapSort Analysis and Partitioning
(Thursday, Mar 12, 1998)

Read: Chapters 7 and 8 in CLR. The algorithm we present for partitioning is different from the text's.

HeapSort Analysis: Last time we presented HeapSort. Recall that the algorithm operates by first building a heap in a bottom-up manner, and then repeatedly extracting the maximum element from the heap and moving it to the end of the array. One clever aspect of the data structure is that it resides inside the array to be sorted.

We argued that the basic heap operation Heapify runs in O(log n) time, because the heap has O(log n) levels, and the element being sifted moves down one level of the tree after a constant amount of work.

Based on this we can see (1) that it takes O(n log n) time to build a heap, because we need to apply Heapify roughly n/2 times (once to each of the internal nodes), and (2) that it takes O(n log n) time to extract the maximum elements, since we perform roughly n extractions and each extraction involves a constant amount of work plus one Heapify. Therefore the total running time of HeapSort is O(n log n).

Is this tight? That is, is the running time Θ(n log n)? The answer is yes. In fact, later we will see that it is not possible to sort faster than Ω(n log n) time, assuming that you use comparisons, which HeapSort does. However, it turns out that the first part of the analysis is not tight. In particular, the BuildHeap procedure that we presented actually runs in Θ(n) time. In the wider context of the HeapSort algorithm this is not significant, because the running time is dominated by the Θ(n log n) extraction phase.

Nonetheless, there are situations where you might not need to sort all of the elements. For example, it is common to extract some unknown number of the smallest elements until some criterion (depending on the particular application) is met. For this reason it is nice to be able to build the heap quickly, since you may not need to extract all the elements.
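To make the discussion concrete, here is a minimal C sketch of the operations described above. It assumes a max-heap stored in a 0-indexed array A[0..n-1] (the lecture's pseudocode is 1-indexed), and the names heapify, build_heap, and heap_sort are illustrative, not the lecture's exact procedures.

    #include <stddef.h>

    static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

    /* Sift A[i] down until the subtree rooted at i is a max-heap.
       Constant work per level and O(log n) levels, hence O(log n) time. */
    static void heapify(int A[], size_t n, size_t i) {
        for (;;) {
            size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < n && A[l] > A[largest]) largest = l;
            if (r < n && A[r] > A[largest]) largest = r;
            if (largest == i) break;     /* heap order restored */
            swap(&A[i], &A[largest]);
            i = largest;                 /* continue one level down */
        }
    }

    /* Bottom-up construction: apply heapify to each of the roughly n/2
       internal nodes. Charged naively this is O(n log n); the analysis
       below shows it is in fact Theta(n). */
    static void build_heap(int A[], size_t n) {
        for (size_t i = n / 2; i-- > 0; )
            heapify(A, n, i);
    }

    /* n - 1 extractions: swap the maximum to the end of the array,
       shrink the heap by one, and fix things up with one heapify. */
    void heap_sort(int A[], size_t n) {
        build_heap(A, n);
        for (size_t m = n; m > 1; m--) {
            swap(&A[0], &A[m - 1]);
            heapify(A, m - 1, 0);
        }
    }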
BuildHeap Analysis: Let us consider the running time of BuildHeap more carefully. As usual, we will make our lives simpler by making some assumptions about n. In this case the most convenient assumption is that n is of the form n = 2^{h+1} - 1, where h is the height of the tree. The reason is that a left-complete tree with this number of nodes is a complete tree, that is, its bottommost level is full. This assumption will save us from worrying about floors and ceilings.

With this assumption, level 0 of the tree has 1 node, level 1 has 2 nodes, and so on up to level h, which has 2^h nodes. All the leaves reside on level h.

Recall that when Heapify is called, the running time depends on how far an element might sift down before the process terminates. In the worst case the element might sift down all the way to the leaf level. Let us count the work done level by level.

At the bottommost level there are 2^h nodes, but we do not call Heapify on any of these, so the work is 0. At the next-to-bottommost level there are 2^{h-1} nodes, and each might sift down 1 level. At the third level from the bottom there are 2^{h-2} nodes, and each might sift down 2 levels. In general, at level j from the bottom there are 2^{h-j} nodes, and each might sift down j levels.

[Figure 13: Analysis of BuildHeap: a tree of height h = 3, with per-level sift-down work 0*8, 1*4, 2*2, and 3*1.]

So, if we count from bottom to top, level by level, we see that the total time is proportional to

$$T(n) = \sum_{j=0}^{h} j \, 2^{h-j} = \sum_{j=0}^{h} j \, \frac{2^h}{2^j}.$$

If we factor out the 2^h term, we have

$$T(n) = 2^h \sum_{j=0}^{h} \frac{j}{2^j}.$$

This is a sum that we have not seen before. We could try to approximate it by an integral, which would involve integration by parts, but it turns out that there is a very cute solution to this particular sum. We'll digress for a moment to work it out. First, write down the infinite general geometric series, for any constant x < 1:

$$\sum_{j=0}^{\infty} x^j = \frac{1}{1-x}.$$

Then take the derivative of both sides with respect to x, and multiply by x, giving

$$\sum_{j=0}^{\infty} j \, x^{j-1} = \frac{1}{(1-x)^2}, \qquad \sum_{j=0}^{\infty} j \, x^j = \frac{x}{(1-x)^2},$$

and if we plug in x = 1/2, then voila! we have the desired formula:

$$\sum_{j=0}^{\infty} \frac{j}{2^j} = \frac{1/2}{(1-(1/2))^2} = \frac{1/2}{1/4} = 2.$$

In our case we have a bounded sum, but since every term is nonnegative and the infinite series converges, the infinite sum serves as an easy upper bound. Using this we have

$$T(n) = 2^h \sum_{j=0}^{h} \frac{j}{2^j} \le 2^h \sum_{j=0}^{\infty} \frac{j}{2^j} = 2^h \cdot 2 = 2^{h+1}.$$

Now recall that n = 2^{h+1} - 1, so we have T(n) <= n + 1, which is O(n). Clearly the algorithm takes at least Ω(n) time (since it must access every element of the array at least once), so the total running time for BuildHeap is Θ(n).

It is worthwhile pausing here a moment. This is the second time we have seen a relatively complex structured algorithm, with doubly nested loops, come out with a running time of Θ(n). (The other example was the median algorithm, based on the sieve technique. Actually, if you think deeply about this, there is a sense in which a parallel version of BuildHeap can be viewed as operating like a sieve, but maybe this is getting too philosophical.) Perhaps a more intuitive way to describe what is happening here is to observe an important fact about binary trees: the vast majority of the nodes are at the lowest levels of the tree. For example, in a complete binary tree of height h there are n ≈ 2^{h+1} nodes in total, and the number of nodes in the bottom 3 levels alone is

$$2^h + 2^{h-1} + 2^{h-2} \approx \frac{n}{2} + \frac{n}{4} + \frac{n}{8} = \frac{7n}{8} = 0.875n.$$

That is, almost 90% of the nodes of a complete binary tree reside in the 3 lowest levels. Thus the lesson to be learned is that when designing algorithms that operate on trees, it is important to be most efficient on the bottommost levels of the tree (as BuildHeap is), since that is where most of the weight of the tree resides.

Partitioning: Our next sorting algorithm is QuickSort. QuickSort is interesting in a number of respects. First off, as we will present it, it is a randomized algorithm, which means that it makes use of a random number generator. We will show that in the worst case its running time is O(n^2), but its expected-case running time is O(n log n). Moreover, this expected-case running time occurs with high probability, in that the probability that the algorithm takes significantly more than O(n log n) time is a rapidly decreasing function of n. In addition, QuickSort
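The partitioning procedure itself is not included in this preview, and as noted above it differs from the text's. For orientation only, here is a sketch of one standard in-place scheme, a Lomuto-style partition with a random pivot; the name partition and its interface are assumptions, not the lecture's definitions.

    #include <stdlib.h>    /* rand */

    static void swap_int(int *x, int *y) { int t = *x; *x = *y; *y = t; }

    /* Partition A[lo..hi] (inclusive) around a randomly chosen pivot.
       Returns the pivot's final index q; afterwards A[lo..q-1] <= A[q]
       and A[q+1..hi] > A[q]. */
    size_t partition(int A[], size_t lo, size_t hi) {
        /* The random pivot choice is what makes the expected running
           time O(n log n) on every input, as claimed above. */
        size_t p = lo + (size_t)rand() % (hi - lo + 1);
        swap_int(&A[p], &A[hi]);        /* park the pivot at the end */
        int pivot = A[hi];

        size_t q = lo;                  /* boundary of the "<= pivot" region */
        for (size_t j = lo; j < hi; j++)
            if (A[j] <= pivot)
                swap_int(&A[j], &A[q++]);

        swap_int(&A[q], &A[hi]);        /* pivot into its final position */
        return q;
    }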

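Given such a partition procedure, the QuickSort driver is only a few lines. Again, this is a sketch built on the assumed partition above, not the lecture's presentation; the O(n^2) worst case mentioned above arises when the pivot repeatedly lands near an end of the subarray.

    /* Sort A[lo..hi] (inclusive) by partitioning, then recursing on the
       two sides of the pivot. Assumes partition() from the sketch above. */
    void quick_sort(int A[], size_t lo, size_t hi) {
        if (lo >= hi) return;            /* 0 or 1 element: done */
        size_t q = partition(A, lo, hi);
        if (q > lo)                      /* guard size_t underflow at q == 0 */
            quick_sort(A, lo, q - 1);
        quick_sort(A, q + 1, hi);
    }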
