
CISC320, F05, Lec8, Liao
CISC 320 Introduction to Algorithms
Fall 2005
Lecture 8: Hash Tables

Problem: construct a dynamic set that supports the dictionary operations search, insert, and delete.
Examples:
• dictionary: word (key) → definition
• compiler: symbol (key) → semantic data

• Key types: numerical, alphabetic.
• Key space: the set of all possible keys.
Recall that we can search a sorted array quickly, so the question is: can we use an array?
• Case 1: if keys are integers, index directly into the array.
• Case 2: if keys are strings of letters, convert to Case 1 by first transforming characters to integers (e.g., ASCII codes).

Direct-address table
[Figure: direct-address table T with slots 0 to 9; each actual key in K = {c, d, e, h}, a subset of the key space, indexes its own slot, which holds the key and its associated data.]
• Dictionary operations are easily supported in this direct-address model; each operation takes O(1) time.
• Problem: the key space may be far too large. E.g., for names of at most 20 letters, the key space has about $26^{20} \approx 2^{94} \approx 2 \times 10^{28}$ keys.
• In practice, although the key space is huge, only a small portion of it is actually used, say a few million names in our example.

Hashing
• Hash function $h: U \to \{0, 1, \ldots, m-1\}$, where $U$ is the key space and typically $m \ll |U|$.
• Since $m < |U|$, $h$ cannot be a one-to-one mapping.
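A minimal Python sketch of the ideas so far: a direct-address table for small integer keys (Case 1), a string-to-integer conversion (Case 2), and a reduction of the resulting integer into {0, ..., m-1}. It assumes a radix-128 encoding in which each 7-bit ASCII character is one base-128 digit, and it uses k mod m merely as a stand-in for h; the names `string_to_int`, `DirectAddressTable`, and `h` are hypothetical, not taken from the lecture.

```python
from typing import Any, List, Optional


def string_to_int(s: str) -> int:
    """Case 2: read a string as a base-128 number built from ASCII codes.

    Assumption: radix 128, i.e. every character is 7-bit ASCII.
    """
    k = 0
    for ch in s:
        code = ord(ch)
        if code >= 128:
            raise ValueError(f"non-ASCII character: {ch!r}")
        k = k * 128 + code
    return k


class DirectAddressTable:
    """Case 1: slot k stores the data for integer key k (0 <= k < size)."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Any]] = [None] * size

    def insert(self, key: int, data: Any) -> None:  # O(1)
        self.slots[key] = data

    def search(self, key: int) -> Optional[Any]:  # O(1)
        return self.slots[key]

    def delete(self, key: int) -> None:  # O(1)
        self.slots[key] = None


def h(k: int, m: int) -> int:
    """Stand-in hash function: map an integer key into {0, ..., m-1}."""
    return k % m


if __name__ == "__main__":
    t = DirectAddressTable(10)
    t.insert(7, "seven")
    print(t.search(7))        # seven

    k = string_to_int("pt")   # 112 * 128 + 116
    print(k)                  # 14452
    print(h(k, 13))           # 9
```

Even the two-letter key "pt" already becomes 14452, which shows how quickly the key space outgrows a direct-address table; reducing it with `h` fits it into m slots, at the price that distinct keys can now land in the same slot.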
• Collisions: a collision occurs between keys $k_1$ and $k_2$ if $h(k_1) = h(k_2)$.
[Figure: keys $k_1$ and $k_2$ mapped to the same slot, $h(k_1) = h(k_2)$.]

Collision resolution by chaining (closed addressing)
• Each position in the hash table is a pointer to the head of a linked list.
• To insert an element into the table, add it to the head of the list.
[Figure: keys $k_1$, $k_2$, $k_3$ with $h(k_1) = h(k_2) = h(k_3) = i$, chained in the list at slot i.]

• Chained-Hash-Insert(T, x): insert x at the head of list T[h(key[x])]. The worst-case running time is O(1).
• Chained-Hash-Search(T, k): search for an element with key k in list T[h(k)]. The worst-case running time is proportional to the length of list T[h(k)].
• Chained-Hash-Delete(T, x): delete x from list T[h(key[x])]. The worst-case running time is the time to search for x plus O(1) time to remove it from the list.

• Uniform hashing: each key is equally likely to be hashed into any integer in [0, …, m−1].
• Load factor $\alpha = n/m$, where n is the number of keys actually stored in the table. That is, α is the average length of the lists. Therefore, the average time for a search is $O(1 + \alpha)$.
• If $n = O(m)$, then $\alpha = O(1)$, so all dictionary operations can be supported in O(1) time on average.

Open-address hashing
• All elements are stored in the array of the hash table (no linked lists).
• More space efficient.
• Less flexible: the load factor α cannot be larger than 1.
• Rehashing resolves collisions. If a key K hashes to position i, which is already occupied, K is rehashed to an alternative location:
  $\text{rehash}(i) = (i + d) \bmod m$,
  where d is an increment computed from K.
• Linear probing: d = 1. In linear probing, the alternative to i is the next position, i + 1; when i + 1 = m, the index wraps around to 0 (mod m).
  Rehashing m times is therefore guaranteed to probe every slot in the table.

Example: $h(x) = 5x \bmod 8$
Keys: 1055, 1492, 1776, 1812, 1918, 1945.
h(1055) = 3, h(1492) = 4, h(1776) = 0, h(1812) = 4, h(1918) = 6, h(1945) = 5.
Slots: T[0] = 1776, T[3] = 1055, T[4] = 1492 and 1812 (collision), T[5] = 1945, T[6] = 1918; slots 1, 2, and 7 are empty.

Example: $h(x) = 5x \bmod 8$, $\text{rehash}(i) = (i + 1) \bmod 8$.
Keys: 1055, 1492, 1776, 1812, 1918, 1945.
• h(1055) = 3
• h(1492) = 4
• h(1776) = 0
• h(1812) = 4, but T[4] is occupied. rehash(4) = (4 + 1) mod 8 = 5, which is empty, so 1812 is stored in T[5].
• h(1918) = 6
• h(1945) = 5, but T[5] is occupied. rehash(5) = 6, and T[6] is also occupied, so 6 is rehashed to 7, which is empty; 1945 is stored in T[7].
Resulting table: T[0] = 1776, T[3] = 1055, T[4] = 1492, T[5] = 1812, T[6] = 1918, T[7] = 1945; slots 1 and 2 are empty.

Search(T, key)
    i = h(key)
    j = 0                      // counter of rehashes
    inc = hashInc(key)         // for a general increment scheme
    while (T[i] ≠ nil and j < m)
        if (T[i] = key)
            then return i      // successful search
        i = rehash(i, inc)     // i = i + 1 for linear probing
        j = j + 1
    return nil                 // unsuccessful search

• Theorem 11.6: Given an open-address hash table with load factor $\alpha = n/m < 1$, the expected number of probes in an unsuccessful search is at most $1/(1 - \alpha)$, assuming uniform hashing.
  e.g., in a half-full table, $1/(1 - 0.5) = 2$; in a 90%-full table, $1/(1 - 0.9) = 10$.
• Theorem 11.8: Given an open-address hash table with load factor $\alpha < 1$, the expected number of probes in a successful search is at most $\frac{1}{\alpha} \ln \frac{1}{1 - \alpha}$, assuming uniform hashing and that each key in the table is equally likely to be searched for.
  e.g., in a half-full table it is < 1.387; in a 90%-full table it is < 2.559.

Choice of Hash Functions
• Distribute keys uniformly into the integer range [0, 1, …, m − 1].
• Low collision rate.
• Hashing method I: division
  $h(k) = k \bmod m$
  Must avoid certain values of m:
  • Powers of 2. If $m = 2^p$, h(k) is the p lowest-order bits of k.
    e.g., $m = 8 = 2^3$ and $0 \le k < 128$ (7-bit keys):
    k = 107 = 1101011 (binary), h(k) = 011 = 3
    k = 43 = 0101011 (binary), h(k) = 011 = 3
    Every key of the form xxxx011 hashes to 3, so 16 keys collide at h(k) = 3.
  • Powers of 10: a similar argument applies (if $m = 10^p$, h(k) is the p lowest-order decimal digits of k).
  Good values for m are primes not too close to an exact power of 2.
• Hashing method II: multiplication
  $h(k) = \lfloor m (kA \bmod 1) \rfloor$,
  where A is a constant, 0 < A < 1, and $(kA \bmod 1)$ is the fractional part of kA, namely $kA - \lfloor kA \rfloor$.
  e.g., $A = (\sqrt{5} - 1)/2 \approx 0.6180339887\ldots$, m = 10000:
  $$
  \begin{aligned}
  h(123456) &= \lfloor 10000 \times (123456 \times 0.61803\ldots \bmod 1) \rfloor \\
            &= \lfloor 10000 \times (76300.0041151\ldots \bmod 1) \rfloor \\
            &= \lfloor 10000 \times 0.0041151\ldots \rfloor \\
            &= \lfloor 41.151\ldots \rfloor = 41
  \end{aligned}
  $$
  • The optimal choice of A depends on the characteristics of the data; Knuth suggests $A \approx (\sqrt{5} - 1)/2$, the reciprocal of the golden ratio.
  • Unlike the division method, the value of m is not critical here; choose m as a power of 2.

Summary
• Hash tables are an effective data structure for implementing dictionaries.
• Worst-case: search may take as long as Θ(n) time.
• Average-case:
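The open-addressing example, the Search(T, key) pseudocode, and the multiplication-method calculation from the slides can be replayed with a minimal Python sketch. The names `h_example`, `oa_insert`, `oa_search`, and `h_mult` are hypothetical; None stands in for nil, insertion simply follows the probing rule used in the worked example, and plain floating point is assumed to be accurate enough for the multiplication method at this key size.

```python
import math
from typing import Callable, List, Optional

Table = List[Optional[int]]


def h_example(x: int) -> int:
    """Hash function from the slide example: h(x) = 5x mod 8."""
    return (5 * x) % 8


def rehash(i: int, inc: int, m: int) -> int:
    """rehash(i) = (i + d) mod m; linear probing uses d = 1."""
    return (i + inc) % m


def oa_insert(T: Table, key: int, h: Callable[[int], int], inc: int = 1) -> int:
    """Store key in the first empty slot on its probe sequence; return that slot."""
    m = len(T)
    i = h(key)
    for _ in range(m):              # with inc = 1, m probes cover every slot
        if T[i] is None:
            T[i] = key
            return i
        i = rehash(i, inc, m)
    raise OverflowError("no empty slot found on the probe sequence")


def oa_search(T: Table, key: int, h: Callable[[int], int], inc: int = 1) -> Optional[int]:
    """Search(T, key): return the slot holding key, or None (nil)."""
    m = len(T)
    i = h(key)
    j = 0                           # counter of rehashes
    while T[i] is not None and j < m:
        if T[i] == key:
            return i                # successful search
        i = rehash(i, inc, m)
        j += 1
    return None                     # unsuccessful search


A_KNUTH = (math.sqrt(5) - 1) / 2    # about 0.6180339887


def h_mult(k: int, m: int, A: float = A_KNUTH) -> int:
    """Multiplication method: floor(m * (kA mod 1)).

    Floating point suffices for this small example; very large keys would
    need exact (e.g. fixed-point integer) arithmetic.
    """
    frac = (k * A) % 1.0            # fractional part of kA
    return math.floor(m * frac)


if __name__ == "__main__":
    T: Table = [None] * 8
    for key in [1055, 1492, 1776, 1812, 1918, 1945]:
        oa_insert(T, key, h_example)
    print(T)                             # [1776, None, None, 1055, 1492, 1812, 1918, 1945]
    print(oa_search(T, 1945, h_example))  # 7
    print(oa_search(T, 1000, h_example))  # None
    print(h_mult(123456, 10000))          # 41
```

Passing a different `inc` mimics the general increment scheme of the pseudocode; m probes then reach every slot only if the increment and m are coprime, which always holds for linear probing (d = 1).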


UD CISC 320 - Lecture 8 Hash Tables
