CMSC 341Hashing8/30/2007CMSC 341 Hashing2The Basic Problem We have lots of data to store. We desire efficient – O( 1 ) – performance for insertion, deletion and searching. Too much (wasted) memory is required if we use an array indexed by the data’s key. The solution is a “hash table”.8/30/2007CMSC 341 Hashing3Hash Table Basic Idea The hash table is an array of size ‘m’ The storage index for an item determined by a hash function h(k): U → {0, 1, …, m-1} Desired Properties of h(k) easy to compute uniform distribution of keys over {0, 1, …, m-1} when h(k1) = h(k2) for k1, k2∈ U , we have a collision0 1 2 m-18/30/2007CMSC 341 Hashing4Division Method The hash function:h( k ) = k mod m where m is the table size. m must be chosen to spread keys evenly. Poor choice: m = a power of 10 Poor choice: m = 2b, b> 1 A good choice of m is a prime number. Table should be no more than 80% full. Choose m as smallest prime number greater than mmin, where mmin= (expected number of entries)/0.88/30/2007CMSC 341 Hashing5Multiplication Method The hash function:h( k ) = m( kA - kA ) where A is some real positive constant. A very good choice of A is the inverse of the “golden ratio.” Given two positive numbers x and y, the ratio x/y is the “golden ratio” if φ = x/y = (x+y)/x The golden ratio:x2- xy - y2= 0 ⇒ φ2- φ - 1 = 0φ = (1 + sqrt(5))/2 = 1.618033989…~= Fibi/Fibi-18/30/2007CMSC 341 Hashing6Multiplication Method (cont.) Because of the relationship of the golden ratio to Fibonacci numbers, this particular value of A in the multiplication method is called “Fibonacci hashing.” Some values ofh( k ) = m(k φ-1- k φ-1)= 0 for k = 0= 0.618m for k = 1 (φ-1= 1/ 1.618… = 0.618…)= 0.236m for k = 2= 0.854m for k = 3= 0.472m for k = 4= 0.090m for k = 5= 0.708m for k = 6= 0.326m for k = 7= …= 0.777m for k = 328/30/2007CMSC 341 Hashing78/30/2007CMSC 341 Hashing8Non-integer Keys In order to have a non-integer key, must first convert to a positive integer:h( k ) = g( f( k ) ) with f: U → integerg: I → {0 .. m-1} Suppose the keys are strings. How can we convert a string (or characters) into an integer value?8/30/2007CMSC 341 Hashing9Horner’s Rulestatic int hash(String key, int tableSize){int hashVal = 0;for (int i = 0; i < key.length(); i++)hashVal = 37 * hashVal + key.charAt(i);hashVal %= tableSize;if(hashVal < 0)hashVal += tableSize;return hashVal;}8/30/2007CMSC 341 Hashing10HashTable Classpublic class SeparateChainingHashTable<AnyType>{public SeparateChainingHashTable( ){/* Later */}public SeparateChainingHashTable(int size){/*Later*/}public void insert( AnyType x ){ /*Later*/ }public void remove( AnyType x ){ /*Later*/}public boolean contains( AnyType x ){/*Later */}public void makeEmpty( ){ /* Later */ }private static final int DEFAULT_TABLE_SIZE = 101;private List<AnyType> [ ] theLists; private int currentSize;private void rehash( ){ /* Later */ }private int myhash( AnyType x ){ /* Later */ }private static int nextPrime( int n ){ /* Later */ }private static boolean isPrime( int n ){ /* Later */ }}8/30/2007CMSC 341 Hashing11HashTable Ops boolean contains( AnyType x ) Returns true if x is present in the table. void insert (AnyType x) If x already in table, do nothing. Otherwise, insert it, using the appropriate hash function. void remove (AnyType x) Remove the instance of x, if x is present. Ptherwise, does nothing void makeEmpty()8/30/2007CMSC 341 Hashing12Hash Methodsprivate int myhash( AnyType x ){int hashVal = x.hashCode( );hashVal %= theLists.length;if( hashVal < 0 )hashVal += theLists.length;return hashVal;}8/30/2007CMSC 341 Hashing13Handling Collisions Collisions are inevitable. How to handle them? Separate chaining hash tables Store colliding items in a list. If m is large enough, list lengths are small. Insertion of key k hash( k ) to find the proper list. If k is in that list, do nothing, else insert k on that list. Asymptotic performance If always inserted at head of list, and no duplicates, insert = O(1): best, worst, average8/30/2007CMSC 341 Hashing14Hash Class for Separate Chaining To implement separate chaining, the private data of the hash table is a vector (array) of Lists. The hash functions are written using List functionsprivate List<AnyType> [ ] theLists;8/30/2007CMSC 341 Hashing15Performance of contains( ) contains Hash k to find the proper list. Call contains( ) on that list which returns a boolean. Performance best: worst: average8/30/2007CMSC 341 Hashing16Performance of remove( ) Remove k from table Hash k to find proper list. Remove k from list. Performance best worst average8/30/2007CMSC 341 Hashing17Handling Collisions Revisited Probing hash tables All elements stored in the table itself (so table should be large. Rule of thumb: m >= 2N) Upon collision, item is hashed to a new (open) slot. Hash functionh: U x {0,1,2,….} → {0,1,…,m-1}h( k, i ) = ( h’( k ) + f( i ) ) mod mfor some h’: U → { 0, 1,…, m-1} and some f( i ) such that f(0) = 0 Each attempt to find an open slot (i.e. calculating h( k, i )) is called a probe8/30/2007CMSC 341 Hashing18HashEntry Class for Probing Hash Tables In this case, the hash table is just an arrayprivate static class HashEntry<AnyType>{public AnyType element; // the elementpublic boolean isActive; // false if deletedpublic HashEntry( AnyType e ){ this( e, true ); }public HashEntry( AnyType e, boolean active ){ element = e; isActive = active; }}// The array of elementsprivate HashEntry<AnyType> [ ] array; // The number of occupied cellsprivate int currentSize;8/30/2007CMSC 341 Hashing19Linear Probing Use a linear function for f( i ) f( i ) = c * i Example: h’( k ) = k mod 10 in a table of size 10 , f( i ) = iSo thath( k, i ) = (k mod 10 + i ) mod 10Insert the values U={89,18,49,58,69} into the hash table8/30/2007CMSC 341 Hashing20Linear Probing (cont.) Problem: Clustering When the table starts to fill up, performance →O(N) Asymptotic Performance Insertion and unsuccessful find, average λ is the “load factor” – what fraction of the table is used Number of probes ≅ ( ½ ) ( 1+1/( 1-λ )2 ) if λ ≅ 1, the denominator goes to zero and the number of probes goes to infinity8/30/2007CMSC 341 Hashing21Linear Probing
View Full Document