MIT OpenCourseWare http://ocw.mit.edu 6.006 Introduction to Algorithms Spring 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.6.006 Recitation Build 2008.14Coming up next... • Open addressing • Karp-Rabin • coming back from the dead to hunt usOpen Addressing • Goal: use nothing but the table • Hoping for less code, better caching • Hashing ⇒ we must handle collisions • Solution: try another location))))Easy Collision handling taken taken taken taken taken here ☺ taken • h(x) = standard hash 0 function 1 • if T[h(x)] is taken 2 3 • try T[h(x)+1] h(29) ➙ 4 h(29) + 1 ➙ 5 • then T[h(x) + 2] h(29) + 2 ➙ 6 • then T[h(x) + 3] h(29) + 3 ➙ 7 8 • just like parking a car 9• h(k) grows up to H(k, i) where i is the attempt number • first try T[H(k, 0)] H(29, 0) ➙ Collision Handling: Abstracting it Up 0 taken 1 taken 2 taken 3 taken 4 taken 5 taken 6 taken 7 taken 8 taken 9 taken• h(k) grows up to H(k, i) where i is the attempt H(29, 1) ➙ number • first try T[H(k, 0)] • then T[H(k, 1)] H(29, 0) ➙ Collision Handling: Abstracting it Up 0 taken 1 taken 2 taken 3 taken 4 taken 5 taken 6 taken 7 taken 8 taken 9 taken• h(k) grows up to H(k, i) where i is the attempt H(29, 1) ➙ number • first try T[H(k, 0)] H(29, 2) ➙ • then T[H(k, 1)] • then T[H(k, 2)] H(29, 0) ➙ Collision Handling: Abstracting it Up 0 taken 1 taken 2 taken 3 taken 4 taken 5 taken 6 taken 7 taken 8 taken 9 takenCollision Handling: Abstracting it Up • h(k) grows up to H(k, i) where i is the attempt number • first try T[H(k, 0)] • then T[H(k, 1)] • then T[H(k, 2)] • stop after trying all H(29, 3) ➙ 0 taken H(29, 1) ➙ 1 taken H(29, 4) ➙ 2 taken H(29, 9) ➙ 3 taken H(29, 2) ➙ 4 taken H(29, 5) ➙ 5 taken H(29, 6) ➙ 6 taken H(29, 7) ➙ 7 taken H(29, 8) ➙ 8 taken H(29, 0) ➙ 9 takenCollision Handling: Abstracting it Up • H(k) = <H(k, 0), H(k, 1), H(k, 2) ... > • Linear probing, h(29) = 4, Hlinear(29) = ? <4, 5, 6, 7, 8, 9, 0, 1, 2, 3> • General properties? H(29, 3) ➙ 0 taken H(29, 1) ➙ 1 taken H(29, 4) ➙ 2 taken H(29, 9) ➙ 3 taken H(29, 2) ➙ 4 taken H(29, 5) ➙ 5 taken H(29, 6) ➙ 6 taken H(29, 7) ➙ 7 taken H(29, 8) ➙ 8 taken H(29, 0) ➙ 9 takenCollision Handling: Abstracting it Up • Any collision handling strategy comes to: • for key k, probe H(k,0), then H(k,1) etc. • No point in trying the same place twice • Probes should cover the whole table (otherwise we raise ‘table full’ prematurely) • Conclusion: H(k, 0), H(k, 1) ... H(k, m-1) are a permutation of {1, 2, 3 ... m}))))Linear Probing and taken taken taken taken taken here ☺ taken Permutations • h(29) = 4; H(29) = 0 1 <4, 5, 6, 7, 8, 9, 0, 1, 2, 3> 2 3 • h(k) = h0(mod m); H(k) = h(29) ➙ 4 <h0 mod m, (h0 + 1) mod h(29) + 1 ➙ 5 m, (h0 + 2) mod m, ... h(29) + 2 ➙ 6 (h0 + m - 1) mod m > h(29) + 3 ➙ 7 8 • m permutations (max m!) 9Ideal Collision Handling • Simple Hashing (collision by chaining) • Ideal hashing function: uniformly distributes keys across hash values • Open Addressing • Ideal hashing function: uniformly distributes keys across permutations • a.k.a. uniform hashingUniform Hashing: Achievable? • Simple mapping between permutations of m and numbers 1 ... m! • Convert key to big number, then use permutation number (bignum mod m!) • ... right? k mod 6 Permutation 0 <1, 2, 3> 1 <1, 3, 2> 2 <2, 1, 3> 3 <2, 3, 1> 4 <3, 1, 2> 5 <3, 2, 1>Uniform Hashing: Achievable? • Number of digits in m! • O(log(m!)) • O(m*log(m)) • Working mod m! is slow • check your Python cost model k mod 6 Permutation 0 <1, 2, 3> 1 <1, 3, 2> 2 <2, 1, 3> 3 <2, 3, 1> 4 <3, 1, 2> 5 <3, 2, 1>Working Compromise • Why does linear probing suck? • We jump in the table once, then walk • Improvement • Keep jumping after the initial jump • Jumping distance: 2nd hash function • Name: double hashingDouble Hashing: Math • h1(k) and h2(k) are hashing functions 0 taken 1 2 taken 3 4 taken 5 taken 6 taken 7 taken 8 9 taken• h1(k) and h2(k) are 0 hashing functions 1 • H(k, 0) = h1(k) 2 3 h1(29) ➙ 4 5 6 7 8 9 Double Hashing: Math taken taken taken taken taken taken taken• h1(k) and h2(k) are 0 hashing functions 1 • H(k, 0) = h1(k) 2 3 • H(k, 1) = h1(k) + h2(k) h1(29) ➙ 4 5 6 h1(29)+h2(29) ➙ 7 8 9 Double Hashing: Math taken taken taken taken taken taken taken• h1(k) and h2(k) are h1(29)+2⋅h2(29) ➙ 0 hashing functions 1 • H(k, 0) = h1(k) 2 3 • H(k, 1) = h1(k) + h2(k) h1(29) ➙ 4 5 6 h1(29)+h2(29) ➙ 7 8 9 Double Hashing: Math taken taken taken taken taken taken taken• h1(k) and h2(k) are h1(29)+2⋅h2(29) ➙ 0 hashing functions 1 • H(k, 0) = h1(k) 2 h1(29)+3⋅h2(29) ➙ 3 • H(k, 1) = h1(k) + h2(k) h1(29) ➙ 4 5 6 h1(29)+h2(29) ➙ 7 8 9 Double Hashing: Math taken taken here ☺ taken taken taken taken taken• h1(k) and h2(k) are h1(29)+2⋅h2(29) ➙ 0 hashing functions 1 • H(k, 0) = h1(k) 2 h1(29)+3⋅h2(29) ➙ 3 • H(k, 1) = h1(k) + h2(k) h1(29) ➙ 4 5 • H(k, i) = h1(k) + i⋅h2(k) 6 h1(29)+h2(29) ➙ 7 • mod m 8 • you knew that, right? 9 Double Hashing: Math taken taken here ☺ taken taken taken taken taken123456789Double Hashing Trap • gcd(h2(k), m) must be 1 • solution 1 (easy to get) • m prime, h2(k) = k mod q (where q < m) • solution 2 (faster, better) • m = 2r (table can grow) • h2(k) is odd (not even) h1(29)+2⋅h2(29) ➙ 0 taken h1(29)+3⋅h2(29) ➙ h1(29) ➙ h1(29)+h2(29) ➙ taken here ☺ taken taken taken taken takenOpen Addressing: Deleting Keys h1(29)+2⋅h2(29) ➙ 0 taken • Suppose we want to 1 delete kd stored at 7 2 taken here ☺ • Can’t simply wipe the h1(29)+3⋅h2(29) ➙ 3 entry, because key 29 h1(29) ➙ 4 taken wouldn’t be found 5 taken anymore 6 taken • rember H(29) = h1(29)+h2(29) ➙ 7 kd <4, 7, 0, 3 ...> 8 9 takenOpen Addressing: Deleting Keys h1(29)+2⋅h2(29) ➙ • Entry meaning ‘deleted’ • Handling ‘deleted’ h1(29)+3⋅h2(29) ➙ • Search: Keep looking h1(29) ➙ • Insert: Stop, replace ‘deleted’ with the new h1(29)+h2(29) ➙ key/value 0 taken 1 2 taken …
View Full Document