DOC PREVIEW
Introduction to Algorithms

This preview shows page 1-2-3 out of 9 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 9 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

3/4/20081Introduction to Algorithms6.046J/18.401JLecture 8 - HashingProf. Manolis Kellis3/4/2008 L8.2© Leiserson, Indyk, KellisHashing lecture outlineIntro and definitionHashing in practiceUniversal hashingPerfect hashingOpen Addressing (optional)3/4/2008 L8.3© Leiserson, Indyk, KellisData Structures• Role of data structures:– Encapsulate data– Support certain operations (e.g., INSERT, DELETE, SEARCH)• Our focus: efficiency of the operations• Algorithms vs. data structures3/4/2008 L8.4© Leiserson, Indyk, KellisSymbol-table problemSymbol table T holding n records:key[x]key[x]recordxOther fields containing satellite dataOperations on T:• INSERT(T, x)• DELETE(T, x)• SEARCH(T, k)How should the data structure T be organized?3/4/2008 L8.5© Leiserson, Indyk, KellisDirect-access tableIDEA: Suppose that the set of keys is K ⊆ {0, 1, …, m–1}, and keys are distinct. Set up an array T[0 . . m–1]: T[k] =x if k ∈ K and key[x] = k,NIL otherwise.Then, operations take Θ(1) time.Problem: The range of keys can be large. Eg.:• 64-bit numbers (which represent 18,446,744,073,709,551,616 different keys)• Character strings (even larger!).3/4/2008 L8.6© Leiserson, Indyk, KellisAs each key is inserted, h maps it to a slot of T.Hash functionsSolution: Use a hash function h to map the universe U of all keys into{0, 1, …, m–1}:UKk1k2k3k4k50m–1h(k1)h(k4)h(k2)h(k3)When a record to be inserted maps to an already occupied slot in T, a collision occurs.T= h(k5)3/4/200823/4/2008 L8.7© Leiserson, Indyk, KellisResolving collisions by chaining• Records in the same slot are linked into a list.h(49) = h(86) = h(52) = iT494986865252i• Run time analysis requires key assumption:3/4/2008 L8.8© Leiserson, Indyk, KellisSimple uniform hashingWe make the assumption of simple uniform hashing:• Each key k ∈ K of keys is equally likely to be hashed to any slot of table T, independent of where other keys are hashed.Let n be the number of keys in the table, and let m be the number of slots.Define the load factor of T to beα = n/m= average number of keys per slot.3/4/2008 L8.9© Leiserson, Indyk, KellisSearch cost for chaining under simple uniform hashingExpected time to search for a record with a given key = Θ(1 + α).apply hash function and access slotsearch the listExpected search time = Θ(1) if α = O(1), or equivalently, if n = O(m).3/4/2008 L8.10© Leiserson, Indyk, KellisIntro and definitionHashing in practiceUniversal hashingPerfect hashingOpen Addressing (optional)3/4/2008 L8.11© Leiserson, Indyk, KellisChoosing a hash functionThe assumption of simple uniform hashing is hard to guarantee, but several common techniques tend to work well in practice as long as their deficiencies can be avoided.Desirata:• A good hash function should distribute the keys uniformly into the slots of the table.• Regularity in the key distribution should not affect this uniformity.3/4/2008 L8.12© Leiserson, Indyk, Kellish(k)1. Division methodAssume all keys are integers, and defineh(k) = k mod m.Extreme deficiency: If m = 2r, then the hash doesn’t even depend on all the bits of k:• If k = 10110001110110102and r = 6, then h(k) = 0110102 .Deficiency: Don’t pick an m that has a small divisor d. A preponderance of keys that are congruent modulo d can adversely affect uniformity.3/4/200833/4/2008 L8.13© Leiserson, Indyk, Kellis1. Division method (continued)h(k) = k mod m.Pick m to be a prime not too close to a power of 2 or 10 and not otherwise used prominently in the computing environment.Annoyance:• Sometimes, making the table size a prime is inconvenient.But, this method is popular, although the next method we’ll see is usually superior.3/4/2008 L8.14© Leiserson, Indyk, Kellis2. Multiplication methodAssume that all keys are integers, m = 2r, and our computer has w-bit words. Define h(k) = (A·k mod 2w) rsh (w – r),where rsh is the “bit-wise right-shift” operator and A is an odd integer in the range 2w–1< A < 2w.• Don’t pick A too close to 2w.• Multiplication modulo 2wis fast.• The rsh operator is fast.3/4/2008 L8.15© Leiserson, Indyk, Kellis40352617Modular wheel2. Multiplication method exampleh(k) = (A·k mod 2w) rsh (w – r)Suppose that m = 8 = 23and that our computer has w = 7-bit words:1 0 1 1 0 0 1× 1 1 0 1 0 1 11 0 0 1 0 1 0 0 1 1 0 0 1 1= A= kh(k)A..2A..3A..Don’t pick A too close to 2w(explore more of the wheel space)3/4/2008 L8.16© Leiserson, Indyk, Kellis2. Multiplication method example1101011110101111010111101011110101111010111101011001100111010011 011 001h(k)A=k=h(k) = (A·k mod 2w) rsh (w – r)wNever computed(mult. mod 2w)rDon’t pick A too close to 2w(have balance of 1s and 0s)3/4/2008 L8.17© Leiserson, Indyk, Kellis3. Dot-product methodRandomized strategy:•Let m be prime. • Decompose k into r+1digits, each in {0,1,..m-1}. • Pick random vector a, similarly decomposed (each aichosen randomly from {0,1,..m-1} ). • Calculate dot product k·a, each multiplication mod m• Excellent in practice, but expensive to compute.mkakhriiiamod)(0∑==…krkr-1k2k1k0…arar-1a2a1a0…arkr…a2k2a1k1a0k0Each value 0 ≤ ki< m, with m prime.·m·m·m·m·m3/4/2008 L8.18© Leiserson, Indyk, KellisIntro and definitionHashing in practiceUniversal hashingPerfect hashingOpen Addressing (optional)3/4/200843/4/2008 L8.19© Leiserson, Indyk, KellisA weakness of hashingProblem: For any hash function h, a set of keys exists that can cause the average access time of a hash table to skyrocket.IDEA: Choose the hash function at random, independently of the keys.• Even if an adversary can see your code, they cannot find a bad set of keys, since they don’t know exactly which hash function will be chosen.• Adversary can pick all keys st. {k∈U:h(k)=i}for some slot i (e.g. denial-of-service attacks).3/4/2008 L8.20© Leiserson, Indyk, KellisUniversal hashing: Definition• Let U be a universe of keys. • Let H be a finite collection of hash functions, each mapping U to {0, 1, …, m–1}.• H is universal if for all x, y ∈ U, where x ≠ y, we have |{h ∈H: h(x) = h(y)}| = |H|/m, i.e. only 1/m of hash functions in H result in x,y collision. • Thus, if we choose hrandomly from H, the chance of a collision between x and y is 1/mH{h : h(x) = h(y)}|H|m3/4/2008 L8.21© Leiserson, Indyk, KellisUniversality is goodTheorem. Let h be a hash function chosen (uniformly) at random from a universal set Hof hash functions. Suppose h is used to hash n


Introduction to Algorithms

Download Introduction to Algorithms
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Introduction to Algorithms and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Introduction to Algorithms 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?