EECS 70 Discrete Mathematics and Probability Theory
Spring 2014    Anant Sahai    Note 14

A Family of Applications: Hashing and Load Balancing

In this lecture, we will see our first glimpse of a "killer app" of probability in EECS. Probability helps us understand quantitatively how much of a resource we need when demands are somewhat random. There is a wide range of applications, especially in networking, where understanding randomness is the key to the idea of "stochastic multiplexing." The basic idea behind this (which is explored a lot more in EECS 122 and EECS 126) is that even though the universe of potential demand is vast, in the real world the actual demands are random and not all potential "users" of the resource will show up at once. Consequently, we can "overbook" and share the resource while not actually provisioning for the worst-case total demand.

A naive interpretation of probability would completely neglect the random fluctuations associated with chance events and would assume perfectly regular demand. We shall see why that isn't wise by looking at two applications:

1. Suppose a hash function distributes keys evenly over a table of size n. How many (randomly chosen) keys can we hash before the probability of a collision exceeds (say) 1/2?

2. Consider the following simple load balancing scenario. We are given m jobs and n machines; we allocate each job to a machine uniformly at random and independently of all other jobs. What is a likely value for the maximum load on any machine?

As we shall see, these questions can be tackled by an analysis of the balls-and-bins probability space, which we have already encountered.

Application 1: Hash tables

As you may recall, a hash table is a data structure that supports the storage of sets of keys from a (large) universe U (say, the names of all 313 million people in the US). The operations supported are ADDing a key to the set, DELETEing a key from the set, and testing MEMBERship of a key in the set. The hash function h maps U to a table T of modest size. To ADD a key x to our set, we evaluate h(x) (i.e., apply the hash function to the key) and store x at the location h(x) in the table T. All keys in our set that are mapped to the same table location are stored in a simple linked list. The operations DELETE and MEMBER are implemented in similar fashion, by evaluating h(x) and searching the linked list at h(x). Note that the efficiency of a hash function depends on having only a few collisions, i.e., keys that map to the same location. This is because the search time for DELETE and MEMBER operations is proportional to the length of the corresponding linked list.
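The note itself contains no code, but the data structure just described is easy to sketch. Below is a minimal Python illustration under our own assumptions: the class and method names are invented for this sketch, Python lists stand in for the linked lists, and Python's built-in hash() plays the role of h.

```python
# Minimal sketch of a chained hash table (illustrative; not from the note).

class HashTable:
    def __init__(self, n):
        self.n = n                           # table size
        self.table = [[] for _ in range(n)]  # one bucket list per table location

    def _h(self, x):
        # Stand-in for the hash function h, mapping a key to {0, ..., n-1}.
        return hash(x) % self.n

    def add(self, x):
        bucket = self.table[self._h(x)]
        if x not in bucket:                  # a set stores each key only once
            bucket.append(x)

    def delete(self, x):
        bucket = self.table[self._h(x)]
        if x in bucket:
            bucket.remove(x)

    def member(self, x):
        # Search time is proportional to the length of this bucket,
        # which is exactly why collisions hurt.
        return x in self.table[self._h(x)]
```

For example, t = HashTable(100); t.add("alice"); t.member("alice") then returns True. Python lists replace the linked lists of the note, but searching a bucket still takes time proportional to its length, so the efficiency argument is unchanged.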
The question we are interested in here is the following: suppose our hash table T has size n, and that our hash function h distributes U evenly over T. Assume that the keys we want to store are chosen uniformly at random and independently from the universe U. What is the largest number, m, of keys we can store before the probability of a collision reaches 1/2?

[Footnote 1: I.e., |U| = αn (the size of U is an integer multiple α of the size of T), and for each y ∈ T, the number of keys x ∈ U for which h(x) = y is exactly α.]

Let's begin by seeing how this problem can be put into the balls and bins framework. The balls will be the m keys to be stored, and the bins will be the n locations in the hash table T. Since the keys are chosen uniformly and independently from U, and since the hash function distributes keys evenly over the table, we can see each key (ball) as choosing a hash table location (bin) uniformly and independently from T. Thus the probability space corresponding to this hashing experiment is exactly the same as the balls and bins space.

We are interested in the event A that there is no collision, or equivalently, that all m balls land in different bins. Clearly Pr[A] will decrease as m increases (with n fixed). Our goal is to find the largest value of m such that Pr[A] remains above 1/2. [Note: Really we are looking at different sample spaces here, one for each value of m. So it would be more correct to write Pr_m rather than just Pr, to make clear which sample space we are talking about. However, we will omit this detail.]

Let's fix the value of m and try to compute Pr[A]. Since our probability space is uniform (each outcome has probability 1/n^m), it's enough just to count the number of outcomes in A. In how many ways can we arrange m balls in n bins so that no bin contains more than one ball? Well, this is just the number of ways of choosing m things out of n without replacement, which as we saw in Note 10 is

\[
n \times (n-1) \times (n-2) \times \cdots \times (n-m+2) \times (n-m+1).
\]

This formula is valid as long as m ≤ n: if m > n then clearly the answer is zero. From now on, we'll assume that m ≤ n.

Now we can calculate the probability of no collision:

\[
\Pr[A] \,=\, \frac{n(n-1)(n-2)\cdots(n-m+1)}{n^m}
\,=\, \frac{n}{n} \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-m+1}{n}
\,=\, \left(1-\frac{1}{n}\right) \times \left(1-\frac{2}{n}\right) \times \cdots \times \left(1-\frac{m-1}{n}\right). \tag{1}
\]

Before going on, let's pause to observe that we could compute Pr[A] in a different way, as follows. View the probability space as a sequence of choices, one for each ball. For 1 ≤ i ≤ m, let A_i be the event that the ith ball lands in a different bin from balls 1, 2, ..., i-1. Then

\[
\Pr[A] \,=\, \Pr\!\left[\bigcap_{i=1}^{m} A_i\right]
\,=\, \Pr[A_1] \times \Pr[A_2 \mid A_1] \times \Pr[A_3 \mid A_1 \cap A_2] \times \cdots \times \Pr\!\left[A_m \,\middle|\, \bigcap_{i=1}^{m-1} A_i\right]
\]
\[
\,=\, 1 \times \frac{n-1}{n} \times \frac{n-2}{n} \times \cdots \times \frac{n-m+1}{n}
\,=\, \left(1-\frac{1}{n}\right) \times \left(1-\frac{2}{n}\right) \times \cdots \times \left(1-\frac{m-1}{n}\right).
\]

Notice that we get the same answer. [You should make sure you see how we obtained the conditional probabilities in the second line above. For example, Pr[A_3 | A_1 ∩ A_2] is the probability that the third ball lands in a different bin from the first two balls, given that those two balls landed in different bins. This means that the third ball has n-2 possible bin choices out of a total of n.]
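As a quick numerical sanity check (our own addition, not part of the note), we can simulate the balls-and-bins experiment and compare the estimated Pr[A] with the exact product from equation (1); the function names and trial count below are arbitrary choices.

```python
# Estimate Pr[A] by simulation and compare with equation (1).
import random

def no_collision_trial(m, n):
    """Throw m balls into n bins uniformly at random; return True iff no bin
    receives more than one ball."""
    seen = set()
    for _ in range(m):
        b = random.randrange(n)   # this ball's bin, uniform over {0, ..., n-1}
        if b in seen:
            return False          # collision with an earlier ball
        seen.add(b)
    return True

def exact_pr_no_collision(m, n):
    """Equation (1): Pr[A] = (1 - 1/n)(1 - 2/n)...(1 - (m-1)/n).
    Gives 0 automatically when m > n, since the factor (1 - n/n) appears."""
    p = 1.0
    for i in range(1, m):
        p *= 1 - i / n
    return p

m, n, trials = 23, 365, 100_000
estimate = sum(no_collision_trial(m, n) for _ in range(trials)) / trials
print(estimate)                     # close to the exact value below
print(exact_pr_no_collision(m, n))  # about 0.4927
```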


Essentially, we are now done with our problem: equation (1) gives an exact formula for the probability of no collision when m keys are hashed. All we need to do now is plug the values m = 1, 2, 3, ... into (1) until we find that Pr[A] drops below 1/2. The corresponding value of m (minus 1) is what we want.
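This plug-in procedure is mechanical enough to automate. Here is one hypothetical way to do it (the helper name is ours, not the note's): maintain the product in equation (1), extending it by one factor each time m grows by one.

```python
# Find the largest m with Pr[A] still at least 1/2, by extending the product
# in equation (1) one factor at a time.

def critical_m(n):
    p = 1.0   # Pr[A] for m = 1: a single key can never collide
    m = 1
    while p >= 0.5:
        p *= 1 - m / n   # multiply in the next factor; p is now Pr[A] for m + 1 keys
        m += 1
    return m - 1         # the last m for which Pr[A] was still at least 1/2

print(critical_m(365))    # 22: the familiar birthday paradox
print(critical_m(10**6))  # roughly 1177, on the order of sqrt(n)
```

For n = 365 this recovers the birthday paradox: 22 is the largest number of keys (people) for which the probability of no collision (no shared birthday) still exceeds 1/2.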

But this is not really satisfactory: it would be much more useful to have a formula that gives the "critical" value of m directly, rather than having to compute Pr[A] for m = 1, 2, 3, .... Note that we would have to do this computation separately for each different value of n we are interested in: i.e., whenever we
