CS 70: Discrete Mathematics for CS, Fall 2003 (Wagner)
Lecture 19: Two Killer Applications

In this lecture, we will see two "killer apps" of elementary probability in Computer Science.

1. Suppose a hash function distributes keys evenly over a table of size n. How many (randomly chosen) keys can we hash before the probability of a collision exceeds (say) 1/2?

2. Consider the following simple load balancing scenario. We are given m jobs and n machines; we allocate each job to a machine uniformly at random and independently of all other jobs. What is a likely value for the maximum load on any machine?

As we shall see, both of these questions can be tackled by an analysis of the balls-and-bins probability space which we have already encountered.

Application 1: Hash functions

As you may recall, a hash table is a data structure that supports the storage of sets of keys from a (large) universe U (say, the names of all 250 million people in the US). The operations supported are ADDing a key to the set, DELETEing a key from the set, and testing MEMBERship of a key in the set. The hash function h maps U to a table T of modest size. To ADD a key x to our set, we evaluate h(x) (i.e., apply the hash function to the key) and store x at the location h(x) in the table T. All keys in our set that are mapped to the same table location are stored in a simple linked list. The operations DELETE and MEMBER are implemented in similar fashion, by evaluating h(x) and searching the linked list at h(x). Note that the efficiency of a hash function depends on having only a few collisions, i.e., keys that map to the same location. This is because the search time for DELETE and MEMBER operations is proportional to the length of the corresponding linked list.
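To make the data structure concrete, here is a minimal sketch of such a chained hash table in Python. The class and method names (ChainedHashTable, add, delete, member) are illustrative choices rather than anything from the notes, and Python's built-in hash reduced modulo the table size stands in for the idealized hash function h.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot of the table holds a list
    of all keys in the set that hash to that slot."""

    def __init__(self, n):
        self.n = n                          # table size
        self.table = [[] for _ in range(n)]

    def _h(self, x):
        # Stand-in for the idealized hash function h : U -> T.
        return hash(x) % self.n

    def add(self, x):
        bucket = self.table[self._h(x)]
        if x not in bucket:                 # store each key at most once
            bucket.append(x)

    def delete(self, x):
        bucket = self.table[self._h(x)]
        if x in bucket:
            bucket.remove(x)

    def member(self, x):
        # Search time is proportional to the bucket length, which is
        # why collisions are what make these operations slow.
        return x in self.table[self._h(x)]
```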
The question we are interested in here is the following: suppose our hash table T has size n, and that our hash function h distributes U evenly over T.[1] Assume that the keys we want to store are chosen uniformly at random and independently from the universe U. What is the largest number, m, of keys we can store before the probability of a collision reaches 1/2?

[1] I.e., |U| = αn (the size of U is an integer multiple α of the size of T), and for each y ∈ T, the number of keys x ∈ U for which h(x) = y is exactly α.

Let's begin by seeing how this problem can be put into the balls and bins framework. The balls will be the m keys to be stored, and the bins will be the n locations in the hash table T. Since the keys are chosen uniformly and independently from U, and since the hash function distributes keys evenly over the table, we can see each key (ball) as choosing a hash table location (bin) uniformly and independently from T. Thus the probability space corresponding to this hashing experiment is exactly the same as the balls and bins space.

We are interested in the event A that there is no collision, or equivalently, that all m balls land in different bins. Clearly Pr[A] will decrease as m increases (with n fixed). Our goal is to find the largest value of m such that Pr[A] remains above 1/2. [Note: Really we are looking at different sample spaces here, one for each value of m. So it would be more correct to write Pr_m rather than just Pr, to make clear which sample space we are talking about. However, we will omit this detail.]

Let's fix the value of m and try to compute Pr[A]. Since our probability space is uniform (each outcome has probability 1/n^m), it's enough just to count the number of outcomes in A. In how many ways can we arrange m balls in n bins so that no bin contains more than one ball? Well, there are n places to put the first ball, then n-1 remaining places for the second ball (since it cannot go in the same bin as the first), n-2 places for the third ball, and so on. Thus the total number of such arrangements is

$$n \times (n-1) \times (n-2) \times \cdots \times (n-m+2) \times (n-m+1).$$

This formula is valid as long as m ≤ n: if m > n then clearly the answer is zero. From now on, we'll assume that m ≤ n.

Now we can calculate the probability of no collision:

$$\Pr[A] = \frac{n(n-1)(n-2)\cdots(n-m+1)}{n^m}
        = \frac{n}{n} \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-m+1}{n}
        = \Bigl(1-\frac{1}{n}\Bigr) \times \Bigl(1-\frac{2}{n}\Bigr) \times \cdots \times \Bigl(1-\frac{m-1}{n}\Bigr). \qquad (1)$$

Before going on, let's pause to observe that we could compute Pr[A] in a different way, as follows. View the probability space as a sequence of choices, one for each ball. For 1 ≤ i ≤ m, let A_i be the event that the ith ball lands in a different bin from balls 1, 2, ..., i-1. Then

$$\Pr[A] = \Pr\Bigl[\bigcap_{i=1}^{m} A_i\Bigr]
         = \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \Pr[A_3 \mid A_1 \cap A_2] \times \cdots \times \Pr\Bigl[A_m \Bigm| \bigcap_{i=1}^{m-1} A_i\Bigr]
         = 1 \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-m+1}{n}
         = \Bigl(1-\frac{1}{n}\Bigr) \times \Bigl(1-\frac{2}{n}\Bigr) \times \cdots \times \Bigl(1-\frac{m-1}{n}\Bigr).$$

Fortunately, we get the same answer as before! [You should make sure you see how we obtained the conditional probabilities in the second line above. For example, Pr[A_3 | A_1 ∩ A_2] is the probability that the third ball lands in a different bin from the first two balls, given that those two balls also landed in different bins. This means that the third ball has n-2 possible bin choices out of a total of n.]

Essentially, we are now done with our problem: equation (1) gives an exact formula for the probability of no collision when m keys are hashed. All we need to do now is plug values m = 1, 2, 3, ... into (1) until we find that Pr[A] drops below 1/2. The corresponding value of m (minus 1) is what we want.

But this is not really satisfactory: it would be much more useful to have a formula that gives the "critical" value of m directly, rather than having to compute Pr[A] for m = 1, 2, 3, .... Note that we would have to do this computation separately for each different value of n we are interested in, i.e., whenever we change the size of our hash table.

So what remains is to "turn equation (1) around", so that it tells us the value of m at which Pr[A] drops below 1/2. To do this, let's take logs: this is a good thing to do because it turns the product into a sum, which is easier to handle. We get

$$\ln(\Pr[A]) = \ln\Bigl(1-\frac{1}{n}\Bigr) + \ln\Bigl(1-\frac{2}{n}\Bigr) + \cdots + \ln\Bigl(1-\frac{m-1}{n}\Bigr), \qquad (2)$$

where "ln" denotes the natural (base e) logarithm. Now we can make use of a standard approximation for logarithms: namely, if x is small then ln(1-x) ≈ -x. This comes from the Taylor series expansion

$$\ln(1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots.$$

So by replacing ln(1-x) by -x we are making an error of at most (x^2/2 + x^3/3 + ...), which is at most 2x^2 when x ≤ 1/2. In other words, we have

$$-x \;\ge\; \ln(1-x) \;\ge\; -x - 2x^2,$$

and if x is small then the error term 2x^2 is negligible compared to the main term x.
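Since this approximation does the heavy lifting in what follows, a quick numerical sanity check may be reassuring. The following throwaway snippet (not part of the notes) verifies the bounds -x ≥ ln(1-x) ≥ -x - 2x^2 at a few sample points with 0 < x ≤ 1/2.

```python
import math

# Spot-check the bounds  -x >= ln(1-x) >= -x - 2x^2  for 0 < x <= 1/2.
for x in (0.01, 0.1, 0.25, 0.5):
    value = math.log(1 - x)
    upper, lower = -x, -x - 2 * x**2
    assert lower <= value <= upper
    print(f"x={x}:  {lower:.5f} <= ln(1-x) = {value:.5f} <= {upper:.5f}")
```

Note how quickly the approximation tightens: the gap between the two bounds is 2x^2, so at x = 0.01 it is already down to 0.0002.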

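Returning to equation (1): the plug-in procedure described above (compute Pr[A] for m = 1, 2, 3, ... until it drops below 1/2) is easy to mechanize. Here is one possible sketch; the function names no_collision_prob and critical_m are my own labels, not from the notes.

```python
def no_collision_prob(m, n):
    """Exact Pr[A] from equation (1): the probability that m keys hashed
    uniformly into a table of size n all land in distinct locations."""
    p = 1.0
    for i in range(1, m):          # factors (1 - i/n) for i = 1, ..., m-1
        p *= 1.0 - i / n
    return p

def critical_m(n):
    """Largest m for which Pr[A] is still above 1/2, found by extending
    the product in equation (1) one factor at a time."""
    p, m = 1.0, 1                  # with a single key, Pr[A] = 1
    while p * (1.0 - m / n) >= 0.5:
        p *= 1.0 - m / n           # now p = Pr[A] for m+1 keys
        m += 1
    return m

print(critical_m(365))             # prints 22
```

For n = 365 this recovers the familiar birthday paradox: with 23 or more "keys" (people), a collision (shared birthday) is more likely than not. It also illustrates the drawback noted above: the loop must be rerun from scratch for every table size n, which is exactly what motivates turning equation (1) into a direct formula for the critical m.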
