MIT OpenCourseWare http ocw mit edu 6 006 Introduction to Algorithms Spring 2008 For information about citing these materials or our Terms of Use visit http ocw mit edu terms 6 006 Recitation Build 2008 10 Coming Up Next Hashing in theory and in Python Bad hash functions Mutable dictionary keys Hashes for basic data types in Python Why Hashing Useless from a theoretical standpoint O N op worst case not t for proofs Used everywhere dictionaries indices O 1 op is smokin hot fast Simple small constant factor Relies on black magic Hashing pwns BSTs BSTs O lg N op Hashing O 1 op avg case comparison model an order relation on keys is suf cient intimate knowledge of keys via magic inside the hash function pwns in real time rocks for most cases guaranteed upper bound worst case no guarantees for worst case O N Real Life Hashing I Application Keeping library cards 4x6 card for each book ling by the 1st letter of the book title e g Differential Equations goes to D no sorting asides from mechanism above Real Life Hashing II ling is uncool let s think of bucketing 26 buckets labeled A Z Books are bucketed by 1st letter in title Time to nd a book bucket size Real Life Hashing III What sucks in the scheme above Common pre xes The Introduction to Uneven distribution Many words start with E Few words start with X Real Life Hashing IV Solutions to issues above Ignore The Introduction e g bucket The Invisibles under I Break up E s bucket Ea Em En Ez Merge X s bucket with W Y Bucketing function gets hairy Hashing in Codeworld Memory is a block of cells Buckets are numbered 0 to N 1 Each bucket is a list of the objects in it Fancy name for the bucketing method hashing function Hash Functions Theory Maps the universe of keys to small bounded numbers Practice Black magic that allows us to beat the log N theoretical bound on a daily basis Good Hash Functions Convenient universe size 16 32 64 bit ints Uniform distribution of keys No obvious bad behavior Correct Equal keys always hash to the same value Fast Hashing Hall of Shame String hashing numeric code for rst letter sum of numeric code for all letters permutations hash to the same value polynomial value str i 256 grows without bound i Hashing Hall of Shame String hashing II str i 256 mod 2 takes N to compute str i 256 mod 2 mod 2 only takes rst 4 letters into account still sucks for table sizes powers of 2 i 32 2 i 32 32 Hashing Wisdom Good functions are hard to come up with Use built in functions whenever possible Python Hashing 101 Want hash to work for your own objects def hash self hash to a 32 bit number not 1 Want your objects as dictionary keys def eq self other return True False self equals other Application Screw Python I want lists as dictionary keys Plan 1 SuperList object encapsulating a list 2 implement hash and eq 3 prepare Turing award acceptance speech Behold it s SuperList 1 def make32 x 2 x x 2 32 3 if x 2 31 x x 2 32 4 return int x 5 class SuperList object 6 def init self list 7 self list list 8 def hash self 9 m 1000003 10 x 0x345678 11 v self list 12 for i in range len v 13 y v i hash 14 if y 1 return 1 15 x make32 x y m 16 m make32 m 82520 2 len v i 1 17 x make32 x 97531 18 if x 1 19 x 2 20 return x 21 def eq self other 22 return self list eq other list OMG I m teh one111 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 from super list import SuperList k1 SuperList 1 2 3 k2 SuperList 1 2 3 k3 SuperList 4 5 6 k1 k2 True k1 k3 False d d k1 a d k2 b d k3 c print d super list SuperList object at 0x69870 c super list SuperList object at 0x69930 b 17 18 print d k1 d k2 d k3 19 b b c Except not WTF 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 k1 list append 4 k1 k2 False k1 k3 False hash k1 89902565 hash k3 448334556 d k1 Traceback most recent call last File stdin line 1 in module KeyError super list SuperList object at 0x69930 d k2 Traceback most recent call last File stdin line 1 in module KeyError super list SuperList object at 0x698b0 d k3 c What have we learned Dictionary keys must be immutable Hashing Basic Data Examine Python s hashing functions for the built in data types Examples of reasonable hash functions avoiding common pitfalls Know your language Especially its cost model PyHash the Plan 1 def hash v 2 3 A Python implementation that is identical 4 to the underlying builtin Python function hash 5 for integers longs strings instances and tuples thereof 6 This returns 1 only when the object is unhashable 7 Floats not yet implemented 8 9 if type v type 1 return int hash v 10 if type v type 1L return long hash v 11 if type v type return string hash v 12 if type v type 1 return tuple hash v 13 x dummy 14 if type v type x return id v 15 return 1 PyHash Short Integers 1 def make32 x 2 3 Convert x into a 32 bit signed integer 4 5 x x 2 32 6 if x 2 31 7 x x 2 32 8 x int x 9 return x 10 11 def int hash v 12 if v 1 v 2 13 return v PyHash Strings 1 def string hash v 2 if v 3 return 0 4 else 5 x ord v 0 7 6 m 1000003 7 for c in v 8 x make32 x m ord c 9 x len v 10 if x 1 11 x 2 12 return x PyHash Tuples 1 def tuple hash v 2 3 The addend 82520 was selected from the range 0 1000000 for 4 generating the greatest number of prime multipliers for tuples 5 upto length eight 6 1082527 1165049 1082531 1165057 1247581 1330103 1082533 7 1330111 1412633 1165069 1247599 1495177 1577699 8 9 m 1000003 10 x 0x345678 11 for i in range len v 12 y v i hash Invoke built in python hash 13 if y 1 return 1 14 x make32 x y m 15 m make32 m 82520 2 len v i 1 16 x make32 x 97531 17 if x 1 18 x 2 19 return x PyHash Long Integers 1 def long hash v 2 sign 1 3 if v 0 4 v sign abs v 1 5 SHIFT 15 for a 32 bit machine 6 LONG BIT SHIFT 32 …
View Full Document
Unlocking...