6 006 Introduction to 6 006 Algorithms Lecture 7 P f Manolis Prof M li Kellis K lli CLRS 11 4 17 Unit 2 Genomes Hashing and Dictionaries 2 Unit 2 Hashing L Lastt Tues T Genomes G Dictionaries Di ti i Hashing H hi Intro basic operations collisions and chaining Simple Si l uniform if hashing h hi assumption i Last Thur Faster hashing hash functions Hash functions in practice div mult python Faster hashing Rolling Hash O n2 O nlgn Faster comparison Signatures mismatch time Today Space issues Dynamic resizing and amortized analysis Open addressing deletions and probing Advanced topics universal hashing fingerprints 3 Today Hashing I I Space issues R Rev H Hash h functs f t chaining h i i SUHA SUHA rolling lli signatures i t Dynamic dictionaries Resizing hash tables When to resize insertions deletions Resizing operations amortized analysis Open addressing Doing away w linked lists Operations insertion deletion Probing linear probing double hashing Performance analysis UHA open vs chaining Advanced topics Randomized algorithms shht U i Universal lh hashing hi perfect f th hashing hi Fingerprinting file signatures false positives Remember Hashing I and II Hashing and hash functions Humongous universe of keys itty bitty little space Hash table as dictionaryy Insert Search Delete Collisions by chaining Build a linked list in each bucket Operation time is length of list Simple Uniform Hashing Assumption Every item to uniform random bucket n items in size m table average length n m Speeding up hashing Rolling Hash fast sequence of hash s Signatures fast comparison avoid frequent mismatches Comparing C i genomes O n4 ObinsearchL n3lgn Ohash n2lgn Oroll sign nlgn Today Hashing I I Space issues R Rev H Hash h functs f t chaining h i i SUHA SUHA rolling lli signatures i t Dynamic dictionaries Resizing hash tables When to resize insertions deletions Resizing operations amortized analysis Open addressing Doing away w linked lists Operations insertion deletion Probing linear probing double hashing Performance analysis UHA open vs chaining Advanced topics Randomized algorithms shht U i Universal lh hashing hi perfect f th hashing hi Fingerprinting file signatures false positives Dynamic Dictionaries IIn substring b i application li i inserted i d all ll at once then scanned More generally arbitrary sequence of insert delete find How do we know how big the table will get What if we guess wrong too small load high operations too slow too large high initialization cost consumes space potentially more cache misses Want m n at all times Solution Resize when needed Start table at small constant size When table too full make it bigger When h table bl too empty make k it i smaller ll How Build a whole new hash table and insert items p all hashes Pick new hash seed recompute Recreate new linked lists p to rebuild Time spent new size hashes x HashTime When to resize Approach 1 whenever n m rebuild table to new size a factor of Sequence of n inserts Each E h increases i n pastt m causes rebuild b ild Total work 1 2 n n2 HashTime is suppressed here Approach 2 Whenever n 2m 2m rebuild table to new size Costly inserts insert 2i for all i These cost 1 2 4 n n All other inserts take O 1 time why Inserting n items takes O n time Keeps m a power of 2 good for mod Amortized Analysis If a sequence of n operations takes time T then each operation has amortized cost T n Like amortizingg a loan payment p y per p month Rebuilding when n 2m some ops are very slow n for insertion that causes last resize But on average g fast O 1 amortized cost per operation Often only care about total runtime So averaging is fine Insertions Deletions Rebuild b ild table bl to new size i when h n m Same as bad insert O n2 work Rebuild R b ild when h n m 2 2 Makes a sequence of deletes fast What about an arbitrary sequence of inserts deletes Suppose we have just rebuilt m n Next rebuild a grow at least m more inserts are needed before growing table Amortized cost O 2m m O 1 Cost to rebuild Paid after m insertions Next rebuild a shrink at least m 2 more deletes are needed before shrinking Amortized cost O m 2 m 2 O 1 Cost to rebuild Paid after m 2 deletions Putting the two together Algorithm Keep m a power of 2 good for mod Grow double m when n m Shrink halve m when n m 4 Analysis Just after rebuild n m 2 Next rebuild a ggrow at least m 2 more inserts Amortized cost O 2m m 2 O 1 Next rebuild a shrink at least m 4 more deletes Amortized cost O m 2 m 4 O 1 Summary Arbitrary sequence of insert delete find O 1 amortized time per operation Today Hashing I I Space issues R Rev H Hash h functs f t chaining h i i SUHA SUHA rolling lli signatures i t Dynamic dictionaries Resizing hash tables When to resize insertions deletions Resizing operations amortized analysis Open addressing Doing away w linked lists Operations insertion deletion Probing linear probing double hashing Performance analysis UHA open vs chaining Advanced topics Randomized algorithms shht U i Universal lh hashing hi perfect f th hashing hi Fingerprinting file signatures false positives The trouble with chaining h k1 my heart is sick of being in chains Tori Amos 92 k1 item1 K h k3 our people are sick of being in chains Tunisia Egypt Libya 10 10 k3 item3 h k2 h k4 k2 item2 k4 item4 Hash table just for indexing all storage in linked lists In ppractice Bad localityy of reference for table items Would like to store only table in memory with all items Open Addressing Different technique for dealing with collisions does not use linked list Instead if bucket occupied p find other bucket need m n For insert probe a sequence of buckets until find empty one h x specifies probe sequence for item x Ideally sequence visits all buckets h h U 1 m 1 1 m 1 Universe of keys Bucket Probe number Open Addressing Ope dd ess g e example p e NIL NIL 1 2 collision other item NIL NIL other item collision NIL NIL itemk free spot NIL NIL other item collision NIL NIL NIL m 1 Operations Insert Probe till find empty bucket put item there Search S h Probe till find item return with success Or find empty bucket return with failure Because if item inserted would use that empty bucket Delete Probe till find item Remove leaving empty bucket NIL Problem with Deletion Consider a sequence Insert x Insert y suppose probe sequence for y passes x bucket store y elsewhere Delete x leaving hole Search for y Probe sequence hits x bucket py Bucket now empty Conclude y not in table else y would be there Solution for deletion When
View Full Document
Unlocking...