DOC PREVIEW
MIT 6 006 - Overview of Hash Tables

This preview shows page 1 out of 4 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 4 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

6.006 Intro to Algorithms Recitation 05 February 16, 2011Overview of Hash TablesA hash table is a data structure that supports the following operations:• insert(k) - puts key k into the hash table• search(k) - searches for key k in the hash table• remove(k) - removes key k from the hash tableIn a well formed hash table, each of these operations take on average O(1) time, making hashtables a very useful data structure.You can think of a hash table as a list of m slots. Inserting a key puts it in one of the slotsin the hash table, deleting a key removes it from the slot it was inserted in, and searching a keylooks in the slot the key would have been inserted into to see if it is indeed there. Empty slots aredesignated with a NIL value. The big question is figuring out which slot should a key k be insertedinto in order to maintain the O(1) runtime of these operations.Hash FunctionsConsider a function h(k) that maps the universe U of keys (specific to the hash table, keys couldbe integers, strings, etc. depending on the hash table) to some index 0 to m. We call this functiona hash function. When inserting, searching, or deleting a key k, the hash table hashes k and looksat the h(k)th slot to add, look for, or remove the key.A good hash function• satisfies (approximately) the assumption of simple uniform hashing: each key is equallylikely to hash to any of the m slots. The hash function shouldn’t bias towards particular slots• does not hash similar keys to the same slot (e.g. compiler’s symbol table shouldn’t hashvariables i and j to the same slot since they are used in conjunction a lot)• is quick to calculate, should have O(1) runtime• is deterministic. h(k) should always return the same value for a given k6.006 Intro to Algorithms Recitation 05 February 16, 2011Example 1: Division methodThe division method is one way to create hash functions. The functions take the formh(k) = k mod m (1)Since we’re taking a value mod m, h(k) does indeed map the universe of keys to a slot in thehash table. It’s important to note that if we’re using this method to create hash functions, m shouldnot be a power of 2. If m = 2p, then the h(k) only looks at the p lower bits of k, completelyignoring the rest of the bits in k. A good choice for m with the division method is a prime number(why are composite numbers bad?).Example 2: Multiplication methodThe multiplication method is another way to create hash functions. The functions take the formh(k) = bm(kA mod 1)c (2)where 0 < A < 1 and (kA mod 1) refers to the fractional part of kA. Since 0 < (kA mod 1) <1, the range of h(k) is from 0 to m. The advantage of the multiplication method is it works equallywell with any size m. A should be chosen carefully. Rational numbers should not be chosen for A(why?). An example of a good choice for A is√5−12.CollisionsIf all keys hash to different slots, then the hash table operations are as fast as computing the hashfunction and changing or inspecting the value of an array element, which is O(1) runtime. How-ever, this is not always possible. If the number of possible keys is greater than the number ofslots in the hash table, then there must be some keys that hash into the same slot, in other words acollision. There are several ways to resolve a collision.ChainingIn the chaining method of resolution, hash table slot j contains a linked list of every key whosehash value is j. The hash table operations now look like• insert(k) - insert k into the linked list at slot h(k)• search(k) - search for k in the linked list at slot h(k) by iterating through the list• remove(k) - search for k in the linked list at slot h(k) and then remove it from the list6.006 Intro to Algorithms Recitation 05 February 16, 2011With chaining, if a key collides with another key, it gets inserted into the same linked list in theslot they hash into.In the ideal case, all keys hash to different slots and every linked list has at most 1 element,keeping the runtimes of the operations at O(1). In the worst case, all n keys inserted into the hashtable hashes to the same slot. We then get a n size linked list which takes O(n) to search through,resulting in O(n) search and remove. This is why choosing a hash function that equally distributeskeys to all slots is important.If there are n keys in a hash table with m slots, we call the load factor α for the hash table tobenm. Under the assumption of simple uniform hashing, the length of each linked list in the hashtable is α. As long as the number of keys inserted is proportional to the size of the hash table,α = O(1), thus the operations on average are O(1) as well.Open Addressing CollisionsA hash table may use open addressing, which means that each slot in the hash table contains eithera single key or NIL to indicate that no key has been hashed in that slot. Unlike chaining, we cannotfit more than one key in a single slot, so we must resolve collisions in a different way. We musthave a method to determine which slot to try next in the case of a collision. We still try to put a keyk into slot h(k) first, but if that slot is occupied, we keep trying new slots until we find an emptyone to put the key into.Linear probing resolves collisions by simply checking the next slot, i.e. if a collision occurredin slot j, the next slot to check would be slot j + 1. More formally, linear probing uses the hashfunctionh(k, i) = (h0(k) + i) mod m (3)Where h0(k) is the hash function we try first. If h(k, 0) results in a collision, we increment iuntil we find an empty slot. One drawback to linear probing is if keys hash to slots close to eachother, a cluster of adjacent slots get filled up. When trying to insert future keys into this cluster, we6.006 Intro to Algorithms Recitation 05 February 16, 2011must then traverse through the entire cluster in order to find an empty slot to insert into, which canslow down our hash table operations.Quadratic probing resolves collisions in a similar fashion:h(k, i) = (h0(k) + c1i + c2i2) mod m (4)for some constants c1, c2. Instead of linearly traversing through the hash table slots in the caseof collisions, quadratic probing introduces more spacing between the slots we try in case of acollision, which reduces the clustering effect seen in linear probing. However, a milder form ofclustering can still occur, since keys that hash to the same initial value will probe the exact samesequence of slots to find an empty slot.Double hashing resolves collisions


View Full Document

MIT 6 006 - Overview of Hash Tables

Documents in this Course
Quiz 1

Quiz 1

7 pages

Quiz 2

Quiz 2

12 pages

Quiz 2

Quiz 2

9 pages

Quiz 1

Quiz 1

10 pages

Quiz 2

Quiz 2

11 pages

Quiz 1

Quiz 1

12 pages

Graphs

Graphs

27 pages

Load more
Download Overview of Hash Tables
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Overview of Hash Tables and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Overview of Hash Tables 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?