CS 70: Discrete Mathematics for CS, Fall 2003 (Wagner)
Lecture 19: Two Killer Applications

In this lecture, we will see two "killer apps" of elementary probability in Computer Science.

1. Suppose a hash function distributes keys evenly over a table of size n. How many (randomly chosen) keys can we hash before the probability of a collision exceeds (say) 1/2?

2. Consider the following simple load balancing scenario. We are given m jobs and n machines; we allocate each job to a machine uniformly at random and independently of all other jobs. What is a likely value for the maximum load on any machine?

As we shall see, both of these questions can be tackled by an analysis of the balls-and-bins probability space which we have already encountered.

Application 1: Hash functions

As you may recall, a hash table is a data structure that supports the storage of sets of keys from a (large) universe U (say, the names of all 250 million people in the US). The operations supported are ADDing a key to the set, DELETEing a key from the set, and testing MEMBERship of a key in the set. The hash function h maps U to a table T of modest size. To ADD a key x to our set, we evaluate h(x) (i.e., apply the hash function to the key) and store x at the location h(x) in the table T. All keys in our set that are mapped to the same table location are stored in a simple linked list. The operations DELETE and MEMBER are implemented in similar fashion, by evaluating h(x) and searching the linked list at h(x). Note that the efficiency of a hash function depends on having only a few collisions, i.e., keys that map to the same location. This is because the search time for DELETE and MEMBER operations is proportional to the length of the corresponding linked list.
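To make the data structure concrete, here is a minimal sketch of such a chained hash table in Python. The class and method names (ChainedHashTable, add, delete, member) are illustrative choices rather than anything from the notes, and Python's built-in hash reduced modulo the table size stands in for the idealized hash function h.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot of the table holds a list
    of all keys in the set that hash to that slot."""

    def __init__(self, n):
        self.n = n                          # table size
        self.table = [[] for _ in range(n)]

    def _h(self, x):
        # Stand-in for the idealized hash function h : U -> T.
        return hash(x) % self.n

    def add(self, x):
        bucket = self.table[self._h(x)]
        if x not in bucket:                 # store each key at most once
            bucket.append(x)

    def delete(self, x):
        bucket = self.table[self._h(x)]
        if x in bucket:
            bucket.remove(x)

    def member(self, x):
        # Search time is proportional to the bucket length, which is
        # why collisions are what make these operations slow.
        return x in self.table[self._h(x)]
```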
The question we are interested in here is the following: suppose our hash table T has size n, and that our hash function h distributes U evenly over T.[1] Assume that the keys we want to store are chosen uniformly at random and independently from the universe U. What is the largest number, m, of keys we can store before the probability of a collision reaches 1/2?

[1] I.e., |U| = αn (the size of U is an integer multiple α of the size of T), and for each y ∈ T, the number of keys x ∈ U for which h(x) = y is exactly α.

Let's begin by seeing how this problem can be put into the balls and bins framework. The balls will be the m keys to be stored, and the bins will be the n locations in the hash table T. Since the keys are chosen uniformly and independently from U, and since the hash function distributes keys evenly over the table, we can see each key (ball) as choosing a hash table location (bin) uniformly and independently from T. Thus the probability space corresponding to this hashing experiment is exactly the same as the balls and bins space.

We are interested in the event A that there is no collision, or equivalently, that all m balls land in different bins. Clearly Pr[A] will decrease as m increases (with n fixed). Our goal is to find the largest value of m such that Pr[A] remains above 1/2. [Note: Really we are looking at different sample spaces here, one for each value of m. So it would be more correct to write Pr_m rather than just Pr, to make clear which sample space we are talking about. However, we will omit this detail.]

Let's fix the value of m and try to compute Pr[A]. Since our probability space is uniform (each outcome has probability 1/n^m), it's enough just to count the number of outcomes in A. In how many ways can we arrange m balls in n bins so that no bin contains more than one ball? Well, there are n places to put the first ball, then n-1 remaining places for the second ball (since it cannot go in the same bin as the first), n-2 places for the third ball, and so on. Thus the total number of such arrangements is

$$n \times (n-1) \times (n-2) \times \cdots \times (n-m+2) \times (n-m+1).$$

This formula is valid as long as m ≤ n: if m > n then clearly the answer is zero. From now on, we'll assume that m ≤ n.

Now we can calculate the probability of no collision:

$$\Pr[A] = \frac{n(n-1)(n-2)\cdots(n-m+1)}{n^m}
        = \frac{n}{n} \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-m+1}{n}
        = \Bigl(1-\frac{1}{n}\Bigr) \times \Bigl(1-\frac{2}{n}\Bigr) \times \cdots \times \Bigl(1-\frac{m-1}{n}\Bigr). \qquad (1)$$

Before going on, let's pause to observe that we could compute Pr[A] in a different way, as follows. View the probability space as a sequence of choices, one for each ball. For 1 ≤ i ≤ m, let A_i be the event that the ith ball lands in a different bin from balls 1, 2, ..., i-1. Then

$$\Pr[A] = \Pr\Bigl[\bigcap_{i=1}^{m} A_i\Bigr]
         = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \Pr[A_3 \mid A_1 \cap A_2] \times \cdots \times \Pr\Bigl[A_m \Bigm| \bigcap_{i=1}^{m-1} A_i\Bigr]
         = 1 \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-m+1}{n}
         = \Bigl(1-\frac{1}{n}\Bigr) \times \Bigl(1-\frac{2}{n}\Bigr) \times \cdots \times \Bigl(1-\frac{m-1}{n}\Bigr).$$

Fortunately, we get the same answer as before! [You should make sure you see how we obtained the conditional probabilities in the second line above. For example, Pr[A_3 | A_1 ∩ A_2] is the probability that the third ball lands in a different bin from the first two balls, given that those two balls also landed in different bins. This means that the third ball has n-2 possible bin choices out of a total of n.]

Essentially, we are now done with our problem: equation (1) gives an exact formula for the probability of no collision when m keys are hashed. All we need to do now is plug values m = 1, 2, 3, ... into (1) until we find that Pr[A] drops below 1/2. The corresponding value of m (minus 1) is what we want.

But this is not really satisfactory: it would be much more useful to have a formula that gives the "critical" value of m directly, rather than having to compute Pr[A] for m = 1, 2, 3, .... Note that we would have to do this computation separately for each different value of n we are interested in, i.e., whenever we change the size of our hash table.

So what remains is to "turn equation (1) around", so that it tells us the value of m at which Pr[A] drops below 1/2. To do this, let's take logs: this is a good thing to do because it turns the product into a sum, which is easier to handle. We get

$$\ln(\Pr[A]) = \ln\Bigl(1-\frac{1}{n}\Bigr) + \ln\Bigl(1-\frac{2}{n}\Bigr) + \cdots + \ln\Bigl(1-\frac{m-1}{n}\Bigr), \qquad (2)$$

where "ln" denotes the natural (base e) logarithm. Now we can make use of a standard approximation for logarithms: namely, if x is small then ln(1-x) ≈ -x. This comes from the Taylor series expansion

$$\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots.$$

So by replacing ln(1-x) by -x we are making an error of at most (x^2/2 + x^3/3 + ...), which is at most 2x^2 when x ≤ 1/2. In other words, we have

$$-x \;\ge\; \ln(1-x) \;\ge\; -x - 2x^2,$$

and if x is small then the error term 2x^2 is negligible compared to the main term x.
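Since this approximation does the heavy lifting in what follows, a quick numerical sanity check may be reassuring. The following throwaway snippet (not part of the notes) verifies the bounds -x ≥ ln(1-x) ≥ -x - 2x^2 at a few sample points with 0 < x ≤ 1/2.

```python
import math

# Spot-check the bounds  -x >= ln(1-x) >= -x - 2x^2  for 0 < x <= 1/2.
for x in (0.01, 0.1, 0.25, 0.5):
    value = math.log(1 - x)
    upper, lower = -x, -x - 2 * x**2
    assert lower <= value <= upper
    print(f"x={x}:  {lower:.5f} <= ln(1-x) = {value:.5f} <= {upper:.5f}")
```

Note how quickly the approximation tightens: the gap between the two bounds is 2x^2, so at x = 0.01 it is already down to 0.0002.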

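Returning to equation (1): the plug-in procedure described above (compute Pr[A] for m = 1, 2, 3, ... until it drops below 1/2) is easy to mechanize. Here is one possible sketch; the function names no_collision_prob and critical_m are my own labels, not from the notes.

```python
def no_collision_prob(m, n):
    """Exact Pr[A] from equation (1): the probability that m keys hashed
    uniformly into a table of size n all land in distinct locations."""
    p = 1.0
    for i in range(1, m):          # factors (1 - i/n) for i = 1, ..., m-1
        p *= 1.0 - i / n
    return p

def critical_m(n):
    """Largest m for which Pr[A] is still above 1/2, found by extending
    the product in equation (1) one factor at a time."""
    p, m = 1.0, 1                  # with a single key, Pr[A] = 1
    while p * (1.0 - m / n) >= 0.5:
        p *= 1.0 - m / n           # now p = Pr[A] for m+1 keys
        m += 1
    return m

print(critical_m(365))             # prints 22
```

For n = 365 this recovers the familiar birthday paradox: with 23 or more "keys" (people), a collision (shared birthday) is more likely than not. It also illustrates the drawback noted above: the loop must be rerun from scratch for every table size n, which is exactly what motivates turning equation (1) into a direct formula for the critical m.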
