Hash tableA basic problemUnsorted arraySorted arrayLinked listMore approachesArray as tableSlide 8Slide 9Slide 10Hash functionHash TableHash Table with Perfect HashSlide 14Division MethodCollisionSolutions to CollisionChained Hash TableSlide 19Chained Hash tableOpen AddressingHow to compute probe sequencesOpen Addressing ExampleLinear Probing: h(k, i ) = (h(k) + i ) mod m.Choosing a Hash FunctionClusteringQuadratic Probing ExampleQuadratic Probing: h(k, i ) = (h(k) + c1i + c2i 2) mod mDouble Hashing1Hash table2A basic problemWe have to store some records and perform the following:add new recorddelete recordsearch a record by keyFind a way to do these efficiently!3Unsorted arrayUse an array to store the records, in unsorted orderadd - add the records as the last entry fast O(1)delete a target - slow at finding the target, fast at filling the hole (just take the last entry) O(n)search - sequential search slow O(n)4Sorted arrayUse an array to store the records, keeping them in sorted orderadd - insert the record in proper position. much record movement slow O(n)delete a target - how to handle the hole after deletion? Much record movement slow O(n)search - binary search fast O(log n)5Linked listStore the records in a linked list (unsorted) add - fast if one can insert node anywhere O(1)delete a target - fast at disposing the node, but slow at finding the target O(n)search - sequential search slow O(n) (if we only use linked list, we cannot use binary search even if the list is sorted.)6More approacheshave better performance but are more complexHash tableTree (BST, Heap, …)7Array as table990303098020209801010005678900123450033333tommarypeterdavidandybetty731002056.881.590studid name score9908080 bill 49......Consider this problem. We want to store 1,000 student records and search them by student id.Consider this problem. We want to store 1,000 student records and search them by student id.8Array as table:33333:123450::betty:andy::90:81.5:name score56789david 56.8:9908080:::bill:::49::9999999One ‘stupid’ way is to store the records in a huge array (index 0..9999999). The index is used as the student id, i.e. the record of the student with studid 0012345 is stored at A[12345]One ‘stupid’ way is to store the records in a huge array (index 0..9999999). The index is used as the student id, i.e. the record of the student with studid 0012345 is stored at A[12345]9Array as tableIt is also called Direct-address Hash Table.• Each slot, or position, corresponds to a key in U.• If there’s an element x with key k, then T [k] contains a pointer to x.• Otherwise, T [k] is empty, represented by NIL.10Array as tableStore the records in a huge array where the index corresponds to the keyadd - very fast O(1)delete - very fast O(1)search - very fast O(1)But it wastes a lot of memory! Not feasible.11Hash functionfunction Hash(key: KeyType): integer;Imagine that we have such a magic function Hash. It maps the key (studid) of the 1000 records into the integers 0..999, one to one. No two different keys maps to the same number.Imagine that we have such a magic function Hash. It maps the key (studid) of the 1000 records into the integers 0..999, one to one. No two different keys maps to the same number.H(‘0012345’) = 134H(‘0033333’) = 67H(‘0056789’) = 764…H(‘9908080’) = 312Hash Table:betty:bill::90:49:name scoreandy 81.5::david:::56.8::0033333:9908080:0012345::0056789:3670764999134To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).To store a record, we compute Hash(stud_id) for the record and store it at the location Hash(stud_id) of the array. To search for a student, we only need to peek at the location Hash(target stud_id).13Hash Table with Perfect HashSuch magic function is called perfect hashadd - very fast O(1)delete - very fast O(1)search - very fast O(1)But it is generally difficult to design perfect hash. (e.g. when the potential key space is large)14Hash functionA hash function maps a key to an index within in a rangeDesirable properties:simple and quick to calculateeven distribution, avoid collision as much as possiblefunction Hash(key: KeyType);15Division MethodCertain values of m may not be good:When m = 2p then h(k) is the p lower-order bits of the keyGood values for m are prime numbers which are not close to exact powers of 2. For example, if you want to store 2000 elements then m=701 (m = hash table length) yields a hash function: h(k) = k mod mh(key) = k mod 70116CollisionFor most cases, we cannot avoid collisionCollision resolution - how to handle when two different keys map to the same indexH(‘0012345’) = 134H(‘0033333’) = 67H(‘0056789’) = 764…H(‘9903030’) = 3H(‘9908080’) = 317The problem arises because we have two keys that hash in the same array entry, a collision. There are two ways to resolve collision:Hashing with Chaining: every hash table entry contains a pointer to a linked list of keys that hash in the same entryHashing with Open Addressing: every hash table entry contains only one key. If a new key hashes to a table entry which is filled, systematically examine other table entries until you find one empty entry to place the new keySolutions to Collision18Chained Hash Table24103nilnilnil5nil:HASHMAXKey: 9903030name: tomscore: 73 One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.19Chained Hash TablePut all elements that hash to the same slot into a linked list. • Slot j contains a pointer to the head of the list of all stored elements that hash to j • If there are no such elements, slot j contains NIL.20Chained Hash tableHash table, where collided records are stored in linked listgood hash function, appropriate hash size Few collisions. Add, delete, search very fast O(1)otherwise…some hash value has a long list of collided records..add - just insert at the head fast O(1)delete
View Full Document