MIT 6.006 - Lecture 6: Hashing II: Table Doubling, Rolling Hash, Karp-Rabin



School: Massachusetts Institute of Technology

Course: 6.006 - Introduction to Algorithms


Lecture 6: Hashing II: Table Doubling, Rolling Hash, Karp-Rabin (6.006 Fall 2009)

Lecture Overview
- Table Resizing
- Amortization
- DNA Comparison
- Karp-Rabin
- Rolling Hash

Readings
CLRS Chapter 17 and 32.2

Recall: Hashing with Chaining

[Figure 1: Chaining in a Hash Table. A hash function $h$ maps the universe $U$ of all possible keys into a table with $m$ slots, indexed $0, \dots, m-1$. $K$ is the set of actual keys, not known in advance, with $|K| = n$. Items whose keys hash to the same slot are chained together; the expected chain length is $n/m$.]

Simple Uniform Hashing Assumption (silently used in all of this lecture)

Each key is equally likely to be hashed to any slot of the table, independent of where other keys are hashed.
- The average number of keys per slot is $n/m$.
- The expected time to search/insert/delete is $O(1 + n/m)$, i.e. $O(1)$ when $m = \Theta(n)$.

Caution: the above bound assumes that applying the hash function $h$ takes $O(1)$ time. Sometimes this is not the case, e.g. if the keys are strings. Then the keys need to be processed into numbers and then hashed to $\{0, 1, \dots, m-1\}$, and the above bound is scaled by the time needed for applying $h$.

Good Hash Functions

Division Method: $h(k) = k \bmod m$
Good practice: $m$ is a prime number not close to a power of 2 or 10.

Multiplication Method: $h(k) = [(a \cdot k) \bmod 2^w] \gg (w - r)$
Here $\gg$ denotes the shift-right operator, $2^r$ is the table size $m$, $w$ is the bit length of the machine words, and $a$ is chosen to be an odd integer between $2^{w-1}$ and $2^w$.
Good practice: $a$ not too close to $2^{w-1}$ or $2^w$.

[Figure 2: Multiplication Method. The $w$-bit key $k$ is multiplied by $a$; viewing the product as a sum of shifted copies of $k$ shows lots of mixing. Of the product, the upper $w$ bits are ignored, the next $r$ bits are kept, and the lowest $w - r$ bits are ignored.]

How Large Should the Table Be?
- We want $m = \Theta(n)$ at all times.
- Why? If $m$ is too small, operations are slow (recall that operations take expected time $\Theta(1 + \alpha)$, where $\alpha = n/m$). If $m$ is too big, space is wasted.

Challenge: we don't know how large $n$ will get at creation.
Idea: start small (constant) and grow (or shrink) as necessary.

Table Resizing with Rehashing

To change $m$, build a new hash table from scratch:
- Allocate a table of size $m'$.
- For each item in the old table, insert it into the new table.
This takes $\Theta(n + m)$ time, which is $\Theta(n)$ if $m = \Theta(n)$.

How fast to grow? When $n$ reaches $m$, say:
- $m \mathrel{+}= 1$? Then we rebuild at every step, so $n$ inserts cost $\Theta(1 + 2 + \dots + n) = \Theta(n^2)$.
- $m \mathrel{*}= 2$? Then $m = \Theta(n)$ still, and we rebuild at insertion $2^i$, at a cost of $\Theta(2^i)$ (see Figure 3). So $n$ inserts cost $\Theta(1 + 2 + 4 + 8 + \dots + n)$, where $n$ is really the next power of 2, which is $\Theta(n)$.
- A few inserts cost linear time, but $\Theta(1)$ "on average".

Amortized Analysis

This is a common technique in data structures. It is like paying rent: $1500/month is about $50/day.
- If a sequence of $n$ operations has total cost $\le n \cdot T(n)$, then each operation has amortized cost $T(n)$.
- "$T(n)$ amortized" roughly means $T(n)$ "on average", but averaged over all operations.
- E.g. inserting into a hash table with doubling takes $O(1)$ amortized time.

[Figure 3: Resizing by Doubling. Insert cost plotted against item count. The costly inserts are at items 1, 2, 4, 8, 16, ..., so the resizing penalty is constant on average.]

Back to Hashing

Maintain $m = \Theta(n)$, so we also support search in $O(1)$ expected time (assuming simple uniform hashing).

Delete: also $O(1)$ expected time.
- Space can get big with respect to $n$, e.g. $n$ inserts followed by $n$ deletes.
- Solution: when $n$ decreases to $m/4$, shrink the table to $m/2$. This gives $O(1)$ amortized cost for both insert and delete. The analysis is trickier (see CLRS 17.4).

Rolling Hash (Human vs. Chimp)

Given two strings $S$ and $T$, find the longest common substring of the two strings.
- Naive algorithm: $\Theta(n^4)$
- Naive + binary search: $\Theta(n^3 \log n)$
- Winner algorithm from last lecture: runs in time $\Theta(n^2 \log n)$ using hash tables.

For all possible lengths $\ell$:
Step 1. Insert all substrings $s$ of $S$ of length $\ell$ into a hash table, using some hash function $h$.
Step 2. For all substrings $t$ of $T$ of length $\ell$, check if position $h(t)$ of the dictionary is occupied. If yes, say by substring $s$ of string $S$, compare $s$ and $t$. If the strings agree, return $s$ and $t$ and exit.

Analysis

Outer loop: using binary search on the length of the longest common substring, only $O(\log n)$ iterations are needed.

For every $\ell$:

Step 1:
- There are $n - \ell + 1$ substrings $s$ of $S$ of length $\ell$.
- We need to convert each of them into an integer. How? Think of $s = S[i \,..\, i+\ell-1]$ as a multi-digit number in base $b$, where $b$ is larger than the alphabet size:
  $s \mapsto S[i] \cdot b^{\ell-1} + S[i+1] \cdot b^{\ell-2} + \dots + S[i+\ell-1]$
- Use the hash function $h(s) = s \bmod m$.
- How do we compute $h(s)$ without writing down the above expression? Mod arithmetic magic.
  Claim: for all integers $a, b$:
  $(a + b) \bmod m = ((a \bmod m) + (b \bmod m)) \bmod m$
  $(a \cdot b) \bmod m = ((a \bmod m) \cdot (b \bmod m)) \bmod m$
- Hence computing the hash takes time $O(\ell)$ for each substring $s$.
- Total time for Step 1: $O((n - \ell + 1) \cdot \ell) = O(n^2)$.

Step 2:
- There are $n - \ell + 1$ substrings $t$ of $T$ of length $\ell$.
- Every hash operation takes $O(\ell)$, plus another potential $O(\ell)$ for comparing strings if the hashes match.
- Total time for Step 2: $O((n - \ell + 1) \cdot \ell) = O(n^2)$.

OVERALL time is $O(n^2 \log n)$.

OUR GOAL: drop the time down to $O(n^2)$ initialization time + $O(n \log n)$ execution time, by making both Step 1 and Step 2 take $O(n)$.

Idea: use $h_1 = h(S[i \,..\, i+\ell-1])$ to compute $h_2 = h(S[i+1 \,..\, i+\ell])$ in $O(1)$ time.

How? Go from
  $h_1 = (S[i] \cdot b^{\ell-1} + S[i+1] \cdot b^{\ell-2} + \dots + S[i+\ell-1]) \bmod m$
to
  $h_2 = (S[i+1] \cdot b^{\ell-1} + S[i+2] \cdot b^{\ell-2} + \dots + S[i+\ell]) \bmod m$
without recomputing $h_2$ from scratch.

Magic again! This is called a Rolling Hash and was introduced by Karp and Rabin:
  $h_2 = (S[i+1] \cdot b^{\ell-1} + S[i+2] \cdot b^{\ell-2} + \dots + S[i+\ell]) \bmod m$
  $\;\;\;\; = ((S[i] \cdot b^{\ell-1} + S[i+1] \cdot b^{\ell-2} + \dots + S[i+\ell-1]) \cdot b + S[i+\ell] - S[i] \cdot b^{\ell}) \bmod m$
  $\;\;\;\; = (h_1 \cdot b + S[i+\ell] - S[i] \cdot (b^{\ell} \bmod m)) \bmod m$

Now Step 1 takes time $O(n)$ overall.

What about Step 2? Also time $O(n)$, except if there are too many spurious matches of hash values, which do not actually result in matching substrings. Each comparison takes $O(\ell)$, and if many unsuccessful comparisons are made this could increase the time for Step 2 to O …
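To make the two hash functions from "Good Hash Functions" concrete, here is a minimal Python sketch. The function names, the default word length `w=64`, and the sample values of `a`, `k`, and `r` are illustrative assumptions, not part of the lecture notes.

```python
def division_hash(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


def multiplication_hash(k, a, r, w=64):
    """Multiplication method: h(k) = ((a * k) mod 2^w) >> (w - r).

    w is the assumed machine word length in bits, 2^r is the table size,
    and a should be an odd integer between 2^(w-1) and 2^w.
    """
    assert a % 2 == 1 and (1 << (w - 1)) < a < (1 << w)
    low_w_bits = (a * k) & ((1 << w) - 1)  # (a * k) mod 2^w
    return low_w_bits >> (w - r)           # keep the top r of those w bits


# Tiny worked case with an 8-bit "word": a = 181, k = 77, table size 2^3 = 8.
# 181 * 77 = 13937, 13937 mod 256 = 113 = 0b01110001, top 3 bits = 0b011.
slot = multiplication_hash(77, a=181, r=3, w=8)
```

In the worked case the result is slot 3, and any key lands in the range 0 to 7 because only `r = 3` bits are kept.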
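The resizing rules from "Table Resizing with Rehashing" and "Back to Hashing" can be sketched as a small chained hash table. This is only an illustration under stated assumptions: the class name `DoublingHashTable`, the `rebuild_cost` counter, the initial size of 1, and the use of Python's built-in `hash` in place of a lecture hash function are all choices made for this example.

```python
class DoublingHashTable:
    """Chained hash table that doubles when n reaches m and halves at m/4."""

    def __init__(self):
        self.m = 1                      # number of slots (assumed start size)
        self.n = 0                      # number of stored items
        self.slots = [[] for _ in range(self.m)]
        self.rebuild_cost = 0           # items moved by all rebuilds so far

    def _index(self, key, m):
        return hash(key) % m            # built-in hash stands in for h

    def _resize(self, new_m):
        # Build a new table from scratch and reinsert every item.
        new_slots = [[] for _ in range(new_m)]
        for chain in self.slots:
            for key, value in chain:
                new_slots[self._index(key, new_m)].append((key, value))
                self.rebuild_cost += 1
        self.m, self.slots = new_m, new_slots

    def search(self, key):
        for k, v in self.slots[self._index(key, self.m)]:
            if k == key:
                return v
        return None

    def insert(self, key, value):
        chain = self.slots[self._index(key, self.m)]
        for i, (k, _) in enumerate(chain):
            if k == key:                # overwrite an existing key
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.n += 1
        if self.n == self.m:            # n reached m: double the table
            self._resize(2 * self.m)

    def delete(self, key):
        chain = self.slots[self._index(key, self.m)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                self.n -= 1
                if self.m > 1 and self.n <= self.m // 4:
                    self._resize(self.m // 2)   # n fell to m/4: halve
                return True
        return False
```

With this sketch, inserting 16 distinct keys triggers rebuilds at insertions 1, 2, 4, 8, and 16, moving 1 + 2 + 4 + 8 + 16 = 31 items in total, which is less than 2n and so constant per insert on average.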
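The rolling-hash update and the binary search over lengths can be put together in a short Python sketch. Several things here are assumptions for illustration: the function names, the base `b = 256` (which presumes character codes below 256), the modulus `2^61 - 1`, and the use of a Python `dict` keyed by hash value in place of the lecture's table with m slots. Candidate matches are always verified by direct comparison, so spurious hash matches cost time but cannot produce a wrong answer.

```python
def substring_hashes(S, length, b=256, m=(1 << 61) - 1):
    """Yield (i, h) for every window S[i:i+length], updating h in O(1)."""
    n = len(S)
    if length == 0 or length > n:
        return
    top = pow(b, length, m)             # b^length mod m, computed once
    h = 0
    for ch in S[:length]:               # first window: O(length)
        h = (h * b + ord(ch)) % m
    yield 0, h
    for i in range(n - length):
        # h2 = (h1 * b + S[i+length] - S[i] * (b^length mod m)) mod m
        h = (h * b + ord(S[i + length]) - ord(S[i]) * top) % m
        yield i + 1, h


def common_substring_of_length(S, T, length):
    """Return a common substring of the given length, or None."""
    if length == 0:
        return ""
    table = {}
    for i, h in substring_hashes(S, length):      # Step 1
        table.setdefault(h, []).append(i)
    for j, h in substring_hashes(T, length):      # Step 2
        for i in table.get(h, ()):
            if S[i:i + length] == T[j:j + length]:  # rule out spurious hits
                return S[i:i + length]
    return None


def longest_common_substring(S, T):
    """Binary search on the length, as in the outer loop of the notes."""
    lo, hi, best = 0, min(len(S), len(T)), ""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        found = common_substring_of_length(S, T, mid)
        if found is not None:
            best, lo = found, mid
        else:
            hi = mid - 1
    return best
```

For example, `longest_common_substring("GATTACA", "TACATAG")` returns `"TACA"`. The binary search is valid because any common substring of length ℓ contains a common substring of every shorter length.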
