DOC PREVIEW
UMBC CMSC 341 - Hashing

This preview shows page 1-2-24-25 out of 25 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMSC 341Hashing8/30/2007CMSC 341 Hashing2The Basic Problem We have lots of data to store. We desire efficient – O( 1 ) – performance for insertion, deletion and searching. Too much (wasted) memory is required if we use an array indexed by the data’s key. The solution is a “hash table”.8/30/2007CMSC 341 Hashing3Hash Table Basic Idea The hash table is an array of size ‘m’  The storage index for an item determined by a hash function h(k): U → {0, 1, …, m-1} Desired Properties of h(k) easy to compute uniform distribution of keys over {0, 1, …, m-1} when h(k1) = h(k2) for k1, k2∈ U , we have a collision0 1 2 m-18/30/2007CMSC 341 Hashing4Division Method The hash function:h( k ) = k mod m where m is the table size. m must be chosen to spread keys evenly. Poor choice: m = a power of 10 Poor choice: m = 2b, b> 1 A good choice of m is a prime number.  Table should be no more than 80% full. Choose m as smallest prime number greater than mmin, where mmin= (expected number of entries)/0.88/30/2007CMSC 341 Hashing5Multiplication Method The hash function:h( k ) =  m( kA -  kA  ) where A is some real positive constant. A very good choice of A is the inverse of the “golden ratio.” Given two positive numbers x and y, the ratio x/y is the “golden ratio” if φ = x/y = (x+y)/x  The golden ratio:x2- xy - y2= 0 ⇒ φ2- φ - 1 = 0φ = (1 + sqrt(5))/2 = 1.618033989…~= Fibi/Fibi-18/30/2007CMSC 341 Hashing6Multiplication Method (cont.) Because of the relationship of the golden ratio to Fibonacci numbers, this particular value of A in the multiplication method is called “Fibonacci hashing.” Some values ofh( k ) = m(k φ-1- k φ-1)= 0 for k = 0= 0.618m for k = 1 (φ-1= 1/ 1.618… = 0.618…)= 0.236m for k = 2= 0.854m for k = 3= 0.472m for k = 4= 0.090m for k = 5= 0.708m for k = 6= 0.326m for k = 7= …= 0.777m for k = 328/30/2007CMSC 341 Hashing78/30/2007CMSC 341 Hashing8Non-integer Keys In order to have a non-integer key, must first convert to a positive integer:h( k ) = g( f( k ) ) with f: U → integerg: I → {0 .. m-1} Suppose the keys are strings.  How can we convert a string (or characters) into an integer value?8/30/2007CMSC 341 Hashing9Horner’s Rulestatic int hash(String key, int tableSize){int hashVal = 0;for (int i = 0; i < key.length(); i++)hashVal = 37 * hashVal + key.charAt(i);hashVal %= tableSize;if(hashVal < 0)hashVal += tableSize;return hashVal;}8/30/2007CMSC 341 Hashing10HashTable Classpublic class SeparateChainingHashTable<AnyType>{public SeparateChainingHashTable( ){/* Later */}public SeparateChainingHashTable(int size){/*Later*/}public void insert( AnyType x ){ /*Later*/ }public void remove( AnyType x ){ /*Later*/}public boolean contains( AnyType x ){/*Later */}public void makeEmpty( ){ /* Later */ }private static final int DEFAULT_TABLE_SIZE = 101;private List<AnyType> [ ] theLists; private int currentSize;private void rehash( ){ /* Later */ }private int myhash( AnyType x ){ /* Later */ }private static int nextPrime( int n ){ /* Later */ }private static boolean isPrime( int n ){ /* Later */ }}8/30/2007CMSC 341 Hashing11HashTable Ops boolean contains( AnyType x ) Returns true if x is present in the table. void insert (AnyType x) If x already in table, do nothing. Otherwise, insert it, using the appropriate hash function. void remove (AnyType x) Remove the instance of x, if x is present. Ptherwise, does nothing void makeEmpty()8/30/2007CMSC 341 Hashing12Hash Methodsprivate int myhash( AnyType x ){int hashVal = x.hashCode( );hashVal %= theLists.length;if( hashVal < 0 )hashVal += theLists.length;return hashVal;}8/30/2007CMSC 341 Hashing13Handling Collisions Collisions are inevitable. How to handle them? Separate chaining hash tables Store colliding items in a list. If m is large enough, list lengths are small. Insertion of key k hash( k ) to find the proper list. If k is in that list, do nothing, else insert k on that list. Asymptotic performance If always inserted at head of list, and no duplicates, insert = O(1): best, worst, average8/30/2007CMSC 341 Hashing14Hash Class for Separate Chaining To implement separate chaining, the private data of the hash table is a vector (array) of Lists. The hash functions are written using List functionsprivate List<AnyType> [ ] theLists;8/30/2007CMSC 341 Hashing15Performance of contains( ) contains Hash k to find the proper list. Call contains( ) on that list which returns a boolean. Performance best:  worst:  average8/30/2007CMSC 341 Hashing16Performance of remove( ) Remove k from table Hash k to find proper list. Remove k from list. Performance best worst average8/30/2007CMSC 341 Hashing17Handling Collisions Revisited Probing hash tables All elements stored in the table itself (so table should be large. Rule of thumb: m >= 2N) Upon collision, item is hashed to a new (open) slot. Hash functionh: U x {0,1,2,….} → {0,1,…,m-1}h( k, i ) = ( h’( k ) + f( i ) ) mod mfor some h’: U → { 0, 1,…, m-1} and some f( i ) such that f(0) = 0 Each attempt to find an open slot (i.e. calculating h( k, i )) is called a probe8/30/2007CMSC 341 Hashing18HashEntry Class for Probing Hash Tables In this case, the hash table is just an arrayprivate static class HashEntry<AnyType>{public AnyType element; // the elementpublic boolean isActive; // false if deletedpublic HashEntry( AnyType e ){ this( e, true ); }public HashEntry( AnyType e, boolean active ){ element = e; isActive = active; }}// The array of elementsprivate HashEntry<AnyType> [ ] array; // The number of occupied cellsprivate int currentSize;8/30/2007CMSC 341 Hashing19Linear Probing Use a linear function for f( i ) f( i ) = c * i Example: h’( k ) = k mod 10 in a table of size 10 , f( i ) = iSo thath( k, i ) = (k mod 10 + i ) mod 10Insert the values U={89,18,49,58,69} into the hash table8/30/2007CMSC 341 Hashing20Linear Probing (cont.) Problem: Clustering When the table starts to fill up, performance →O(N) Asymptotic Performance Insertion and unsuccessful find, average λ is the “load factor” – what fraction of the table is used Number of probes ≅ ( ½ ) ( 1+1/( 1-λ )2 ) if λ ≅ 1, the denominator goes to zero and the number of probes goes to infinity8/30/2007CMSC 341 Hashing21Linear Probing


View Full Document

UMBC CMSC 341 - Hashing

Download Hashing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?