Lecture Notes on Binary Search 15 122 Principles of Imperative Computation Frank Pfenning Lecture 6 January 27 2011 1 Introduction One of the fundamental and recurring problems in computer science is to find elements in collections such as elements in sets An important algorithm for this problem is binary search We use binary search for an integer in a sorted array to exemplify it We started in the last lecture by discussing linear search and giving some background on the problem We will also once again see the importance of loop invariants in writing correct code Here is a note by Jon Bentley about binary search I ve assigned binary search in courses at Bell Labs and IBM Professional programmers had a couple of hours to convert its description into a program in the language of their choice a high level pseudocode was fine At the end of the specified time almost all the programmers reported that they had correct code for the task We would then take thirty minutes to examine their code which the programmers did with test cases In several classes and with over a hundred programmers the results varied little ninety percent of the programmers found bugs in their programs and I wasn t always convinced of the correctness of the code in which no bugs were found I was amazed given ample time only about ten percent of professional programmers were able to get this small program right But they aren t the only ones to find this task difficult in the history in Section 6 2 1 of his Sorting and Searching Knuth points out that L ECTURE N OTES J ANUARY 27 2011 Binary Search L6 2 while the first binary search was published in 1946 the first published binary search without bugs did not appear until 1962 Jon Bentley Programming Pearls 1st edition pp 35 36 I contend that what these programmers are missing is the understanding of how to use loop invariants in composing their programs They help us to make assumptions explicit and clarify the reasons why a particular program is correct Part of the magic of pre and post conditions as well as loop invariants and assertions is that they localize reasoning Rather than having to look at the whole program or the whole function we can focus on individual statements tracking properties via the loop invariants and assertions 2 Binary Search Can we do better than searching through the array linearly If you don t know the answer already it might be surprising that yes we can do significantly better Perhaps almost equally surprising is that the code is almost as short Before we write the code let us describe the algorithm We start by examining the middle element of the array If it smaller than x than x must be in the upper half of the array if it is there at all if is greater than x then it must be in the lower half Now we continue by restricting our attention to either the upper or lower half again finding the middle element and proceeding as before We stop if we either find x or if the size of the subarray shrinks to zero in which case x cannot be in the array Before we write a program to implement this algorithm let us analyze the running time Assume for the moment that the size of the array is a power of 2 say 2k Each time around the loop when we examine the middle element we cut the size of the subarrays we look at in half So before the first iteration the size of the subarray of interest is 2k After the second iteration it is of size 2k 1 then 2k 2 etc After k iterations it will be 2k k 1 so we stop after the next iteration Altogether we can have at most k 1 iterations Within each iteration we perform a constant amount of work computing the midpoint and a few comparisons So overall when given a size of array n we perform c log2 n operations 1 1 In general in computer science we are mostly interested in logarithm to the base 2 so we will just write log n for log to the base 2 from now on unless we are considering a L ECTURE N OTES J ANUARY 27 2011 Binary Search L6 3 If the size n is not a power of 2 then we can round n up to the next power of 2 and the reasoning above still applies For example is n 13 we round it up to 16 24 The actual number of steps can only be smaller than this bound because some of the actual subintervals may be smaller than the bound we obtained when rounding up n The logarithm grows much slower than the linear function that we obtained when analyzing linear search As before consider that we are doubling the size of the input n0 2 n Then the number of operations will be c log 2 n c log 2 log n c 1 log n c c log n So the number of operations increases only by a constant amount c when we double the size of the input Considering that the largest representable positive number in two s complement representation is 231 1 about 2 billion binary search even for unreasonably large arrays will only traverse the loop 31 times So the maximal number of operations is effectively bounded by a constant if it is logarithmic different base L ECTURE N OTES J ANUARY 27 2011 Binary Search 3 L6 4 Implementing Binary Search The specification for binary search is the same as for linear search int binsearch int x int A int n requires 0 n n length A requires is sorted A n ensures 1 result is in x A n 0 result result n A result x We have two variables lower and upper which hold the lower and upper end of the subinterval in the array that we are considering We start with lower as 0 and upper as n so the interval includes lower and excludes upper This often turns out to be a convenient choice when computing with arrays The for loop from linear search becomes a while loop exiting when the interval has size zero that is lower upper We can easily write the first loop invariant relating lower and upper to each other and the overall bound of the array int binsearch int x int A int n requires 0 n n length A requires is sorted A n ensures 1 result is in x A n 0 result result n A result x int lower 0 int upper n while lower upper loop invariant 0 lower lower upper upper n return 1 In the body of the loop we first compute the midpoint mid We assert that the midpoint is indeed between lower and upper That assertion is easy to see because lower upper and therefore upper lower 0 Furthermore lower upper lower upper so if we divide the second summand L ECTURE N OTES J …
View Full Document
Unlocking...