DOC PREVIEW
MIT 6 006 - Hashing in theory

This preview shows page 1-2-3-25-26-27 out of 27 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

MIT OpenCourseWare http://ocw.mit.edu 6.006 Introduction to Algorithms Spring 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.6.006 Recitation Build 2008.10Coming Up Next... • Hashing in theory and in Python • Bad hash functions • Mutable dictionary keys • Hashes for basic data types in PythonWhy Hashing • Useless from a theoretical standpoint • O(N) / op worst-case, not fit for proofs • Used everywhere (dictionaries, indices) • O(1) / op is smokin’ hot / fast • Simple - small constant factor • Relies on black magicHashing pwns BSTs? • BSTs • O(lg(N)) / op • guaranteed upper bound (worst-case) • comparison model (an order relation on keys is sufficient) • pwns in real-time • Hashing • O(1) / op avg-case • no guarantees for worst-case -- O(N) • intimate knowledge of keys (via magic inside the hash function) • rocks for most casesReal Life Hashing I • Application: Keeping library cards • 4x6” card for each book • filing by the 1st letter of the book title • e.g.“Differential Equations” goes to D • no sorting asides from mechanism aboveReal Life Hashing II • filing is uncool, let’s think of bucketing • 26 buckets, labeled ‘A’ - ‘Z’ • Books are bucketed by 1st letter in title • Time to find a book ~ bucket sizeReal Life Hashing III • What sucks in the scheme above? • Common prefixes • “The ...”,“Introduction to...” • Uneven distribution • Many words start with E • Few words start with XReal Life Hashing IV • Solutions to issues above? • Ignore “The...”,“Introduction...” • e.g. bucket “The Invisibles” under I • Break up E’s bucket:‘Ea-Em’,‘En-Ez’ • Merge X’s bucket with W/Y • Bucketing function gets hairy :(Hashing in Codeworld • Memory is a block of cells • Buckets are numbered 0 to N-1 • Each bucket is a list of the objects in it • Fancy name for the bucketing method: hashing functionHash Functions • Theory • Maps the universe of keys to small (bounded) numbers • Practice • Black magic that allows us to beat the log(N) theoretical bound on a daily basisGood Hash Functions • Convenient universe size (16/32/64-bit ints) • Uniform distribution of keys • No obvious bad behavior • Correct • Equal keys always hash to the same value • FastHashing Hall of Shame • String hashing • numeric code for first letter • sum of numeric code for all letters • permutations hash to the same value • polynomial value: ∑str[i]⋅256i • grows without boundHashing Hall of Shame • String hashing II •(∑str[i]⋅256i) mod 232 • takes N2 to compute •(∑str[i]⋅(256i mod 232)) mod 232 • only takes first 4 letters into account • still sucks for table sizes = powers of 2Hashing Wisdom • Good functions are hard to come up with • Use built-in functions whenever possiblePython Hashing 101 • Want hash() to work for your own objects? • def __hash__(self) • hash to a 32-bit number, not -1 • Want your objects as dictionary keys? • def __eq__(self, other) • return True/False (self equals other?)Application: Screw Python • I want lists as dictionary keys! • Plan: 1. SuperList object, encapsulating a list 2. implement __hash__ and __eq__ 3. prepare Turing award acceptance speech1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Behold, it’s SuperList!!! def make32(x): x = x % (2**32) if x >= 2**31: x = x - 2**32 return int(x) class SuperList(object): def __init__(self, list): self.list = list def __hash__(self): m = 1000003 x = 0x345678 v = self.list for i in range(len(v)): y = v[i].__hash__() if y == -1: return -1 x = make32((x^y)*m) m = make32(m + 82520 + 2*((len(v)-i-1))) x = make32(x+97531) if x == -1: x = -2 return x def __eq__(self, other): return self.list.__eq__(other.list)OMG!! I’m teh one111 1 >>> from super_list import SuperList 2 >>> 3 >>> k1 = SuperList([1, 2, 3]) 4 >>> k2 = SuperList([1, 2, 3]) 5 >>> k3 = SuperList([4, 5, 6]) 6 >>> 7 >>> k1 == k2 8 True 9 >>> k1 == k3 10 False 11 >>> d = {} 12 >>> d[k1] = 'a' 13 >>> d[k2] = 'b' 14 >>> d[k3] = 'c' 15 >>> print d 16 {<super_list.SuperList object at 0x69870>: 'c', <super_list.SuperList object at 0x69930>: 'b'} 17 >>> 18 >>> print d[k1], d[k2], d[k3] 19 b b cExcept not (WTF?!) 1 >>> k1.list.append(4) 2 >>> k1 == k2 3 False 4 >>> k1 == k3 5 False 6 >>> hash(k1) 7 89902565 8 >>> hash(k3) 9 448334556 10 >>> d[k1] 11 Traceback (most recent call last): 12 File "<stdin>", line 1, in <module> 13 KeyError: <super_list.SuperList object at 0x69930> 14 >>> d[k2] 15 Traceback (most recent call last): 16 File "<stdin>", line 1, in <module> 17 KeyError: <super_list.SuperList object at 0x698b0> 18 >>> d[k3] 19 'c'What have we learned? • Dictionary keys must be immutableHashing Basic Data • Examine Python’s hashing functions for the built-in data types • Examples of reasonable hash functions, avoiding common pitfalls • Know your language • Especially its cost model1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 PyHash: the Plan def hash(v): """ A Python implementation that is identical to the underlying builtin Python function 'hash' for integers, longs, strings, instances, and tuples thereof. This returns -1 only when the object is unhashable. (Floats not yet implemented.) """ if type(v) == type(1): return int_hash(v) if type(v) == type(1L): return long_hash(v) if type(v) == type(" "): return string_hash(v) if type(v) == type((1,)): return tuple_hash(v) x = dummy if type(v) == type(x): return id(v) return -11 2 3 4 5 6 7 8 9 1011 12 13 PyHash: Short Integers def make32(x): """ Convert x into a 32-bit signed integer. """ x = x % (2**32) if x >= 2**31: x = x - 2**32 x = int(x) return x def int_hash(v): if v == -1: v = -2 return v1 2 3 4 5 6 7 8 9 10 11 12 PyHash: Strings def string_hash(v): if v == "": return 0 else: x = ord(v[0])<<7 m = 1000003 for c in v: x = make32((x*m)^ord(c))x ^= len(v) if x == -1: x = -2 return x1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 PyHash:Tuples def tuple_hash(v): """ The addend 82520, was selected from the range(0, 1000000) for generating the greatest number of prime multipliers for tuples upto length eight: 1082527, 1165049, 1082531, 1165057,


View Full Document

MIT 6 006 - Hashing in theory

Documents in this Course
Quiz 1

Quiz 1

7 pages

Quiz 2

Quiz 2

12 pages

Quiz 2

Quiz 2

9 pages

Quiz 1

Quiz 1

10 pages

Quiz 2

Quiz 2

11 pages

Quiz 1

Quiz 1

12 pages

Graphs

Graphs

27 pages

Load more
Download Hashing in theory
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing in theory and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing in theory 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?