Lecture Notes CMSC 251

Analysis: What remains is to analyze the running time of MergeSort. First, let us consider the running time of the procedure Merge(A, p, q, r). Let n = r − p + 1 denote the total length of both the left and right subarrays. What is the running time of Merge as a function of n? The algorithm contains four loops (none nested in the other). It is easy to see that each loop can be executed at most n times. (If you are a bit more careful, you can actually see that all the while-loops together can only be executed n times in total, because each execution copies one new element to the array B, and B only has space for n elements.) Thus the running time to Merge n items is Θ(n). Let us write this without the asymptotic notation, simply as n. (We'll see later why we do this.)

Now, how do we describe the running time of the entire MergeSort algorithm? We will do this through the use of a recurrence, that is, a function that is defined recursively in terms of itself. To avoid circularity, the recurrence for a given value of n is defined in terms of values that are strictly smaller than n. Finally, a recurrence has some basis values (e.g. for n = 1), which are defined explicitly.

Let's see how to apply this to MergeSort. Let T(n) denote the worst-case running time of MergeSort on an array of length n. For concreteness we could count whatever we like: number of lines of pseudocode, number of comparisons, number of array accesses, since these will only differ by a constant factor. Since all of the real work is done in the Merge procedure, we will count the total time spent in the Merge procedure. First observe that if we call MergeSort with a list containing a single element, then the running time is a constant. Since we are ignoring constant factors, we can just write T(n) = 1. When we call MergeSort with a list of length n > 1, e.g. MergeSort(A, p, r) where r − p + 1 = n, the algorithm first computes q = ⌊(p + r)/2⌋. The subarray A[p..q] then contains q − p + 1 elements.
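The sizes of the two halves can be checked mechanically rather than by hand. Here is a small illustrative Python check (an aside, not part of the notes) that with q = ⌊(p + r)/2⌋ the left subarray A[p..q] has ⌈n/2⌉ elements and the right subarray A[q+1..r] has ⌊n/2⌋:

```python
from math import ceil, floor

def split_sizes(p, r):
    """Sizes of A[p..q] and A[q+1..r] when q = floor((p + r) / 2)."""
    q = (p + r) // 2
    return q - p + 1, r - q

# Check, for an even and an odd length n = r - p + 1, that the halves
# have sizes ceil(n/2) and floor(n/2) respectively.
for p in range(3):
    for n in (6, 7):
        r = p + n - 1
        left, right = split_sizes(p, r)
        assert left == ceil(n / 2) and right == floor(n / 2)
```

Note that when n is even the two halves are equal, and when n is odd the left half gets the extra element.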
You can verify, by some tedious floor/ceiling arithmetic, or more simply by just trying an odd example and an even example, that it is of size ⌈n/2⌉. Thus the remaining subarray A[q+1..r] has ⌊n/2⌋ elements in it. How long does it take to sort the left subarray? We do not know this, but because ⌈n/2⌉ < n for n > 1, we can express this as T(⌈n/2⌉). Similarly, we can express the time that it takes to sort the right subarray as T(⌊n/2⌋). Finally, to merge both sorted lists takes n time, by the comments made above. In conclusion we have

    T(n) = 1                                if n = 1,
    T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + n          otherwise.

Lecture 7: Recurrences
(Tuesday, Feb 17, 1998)

Read: Chapt. 4 on recurrences. Skip Section 4.4.

Divide and Conquer and Recurrences: Last time we introduced divide-and-conquer as a basic technique for designing efficient algorithms. Recall that the basic steps in a divide-and-conquer solution are: (1) divide the problem into a small number of subproblems, (2) solve each subproblem recursively, and (3) combine the solutions to the subproblems into a global solution. We also described MergeSort, a sorting algorithm based on divide-and-conquer.

Because divide-and-conquer is an important design technique, and because it naturally gives rise to recursive algorithms, it is important to develop mathematical techniques for solving recurrences, either exactly or asymptotically. To do this, we introduced the notion of a recurrence, that is, a recursively defined function. Today we discuss a number of techniques for solving recurrences.

MergeSort Recurrence: Here is the recurrence we derived last time for MergeSort. Recall that T(n) is the time to run MergeSort on a list of size n. We argued that if the list is of length 1, then the total sorting time is a constant Θ(1). If n > 1, then we must recursively sort two sublists, one of size ⌈n/2⌉ and the other of size ⌊n/2⌋, and the nonrecursive part took Θ(n) time: constant time for splitting the list and Θ(n) time for merging the lists. Thus, the total running time for MergeSort could be described by the following recurrence:

    T(n) = 1                                if n = 1,
    T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + n          otherwise.
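For a concrete reference point, the algorithm whose running time this recurrence describes can be rendered in Python as follows. This is an illustrative sketch under the structure described above (two recursive calls on halves of sizes ⌈n/2⌉ and ⌊n/2⌋, plus a linear-time merge), not the course's own pseudocode:

```python
def merge_sort(A, p, r):
    """Sort A[p..r] in place by divide-and-conquer."""
    if p >= r:                      # base case: at most one element, constant time
        return
    q = (p + r) // 2
    merge_sort(A, p, q)             # left half: ceil(n/2) elements
    merge_sort(A, q + 1, r)         # right half: floor(n/2) elements
    # Merge the two sorted halves into an auxiliary array B, then copy back.
    # Each element is copied into B exactly once, so this step takes time
    # proportional to n = r - p + 1, matching the "+ n" term of the recurrence.
    left, right = A[p:q + 1], A[q + 1:r + 1]
    B = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            B.append(left[i]); i += 1
        else:
            B.append(right[j]); j += 1
    B.extend(left[i:])              # at most one of these two
    B.extend(right[j:])             # extends is nonempty
    A[p:r + 1] = B
```

Usage: `merge_sort(A, 0, len(A) - 1)` sorts the whole list A.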
Notice that we have dropped the Θ's, replacing Θ(1) and Θ(n) by just 1 and n, respectively. This is done to make the recurrence more concrete. If we had wanted to be more precise, we could have replaced these with more exact functions, e.g. c1 and c2·n for some constants c1 and c2. The analysis would have been a bit more complex, but we would have arrived at the same asymptotic formula.

Getting a feel: We could try to get a feeling for what this means by plugging in some values and expanding the definition:

    T(1)  = 1                                   (by the basis)
    T(2)  = T(1) + T(1) + 2   = 1 + 1 + 2     = 4
    T(3)  = T(2) + T(1) + 3   = 4 + 1 + 3     = 8
    T(4)  = T(2) + T(2) + 4   = 4 + 4 + 4     = 12
    T(5)  = T(3) + T(2) + 5   = 8 + 4 + 5     = 17
    T(8)  = T(4) + T(4) + 8   = 12 + 12 + 8   = 32
    T(16) = T(8) + T(8) + 16  = 32 + 32 + 16  = 80
    T(32) = T(16) + T(16) + 32 = 80 + 80 + 32 = 192

It's hard to see much of a pattern here, but here is a trick. Since the recurrence divides by 2 each time, let's consider powers of 2, since the function will behave most regularly for these values. If we consider the ratios T(n)/n for powers of 2, an interesting pattern emerges:

    T(1)/1 = 1        T(8)/8   = 4
    T(2)/2 = 2        T(16)/16 = 5
    T(4)/4 = 3        T(32)/32 = 6

This suggests that for powers of 2, T(n)/n = lg n + 1, or equivalently, T(n) = n lg n + n, which is Θ(n log n). This is not a proof, but at least it provides us with a starting point.

Logarithms in Θ-notation: Notice that I have broken away from my usual convention of writing lg n, and just said log n inside the Θ. The reason is that the base really does not matter when it is inside the Θ. Recall the change of base formula:

    log_b n = (log_a n) / (log_a b).

If a and b are constants, then log_a b is a constant. Consequently, log_b n and log_a n differ only by a constant factor. Thus, inside the Θ, we do not need to differentiate between them. Henceforth, I will not be fussy about the bases of logarithms if asymptotic results are sufficient.

Eliminating Floors and Ceilings: One of the nasty things about recurrences is that floors and ceilings are a pain to deal with. So, whenever it is reasonable to …
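The "getting a feel" expansion above can also be automated. Here is a short Python aside (a sanity check, not a proof and not part of the notes) that evaluates the recurrence directly and confirms both the table of ratios and the guessed closed form T(n) = n lg n + n for powers of 2:

```python
from math import ceil, log2

def T(n):
    # The MergeSort recurrence: T(1) = 1, T(n) = T(ceil(n/2)) + T(floor(n/2)) + n.
    return 1 if n == 1 else T(ceil(n / 2)) + T(n // 2) + n

# The pattern from the table: T(n)/n = lg(n) + 1 for n = 1, 2, 4, ..., 32.
for k in range(6):
    n = 2 ** k
    assert T(n) / n == k + 1

# Equivalently, T(n) = n lg n + n, which is Theta(n log n).
for k in range(12):
    n = 2 ** k
    assert T(n) == n * log2(n) + n
```

This only confirms the guess at finitely many powers of 2; a real proof (e.g. by induction) is still needed.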