DOC PREVIEW
UMBC CMSC 341 - Hashing

This preview shows page 1-2-24-25 out of 25 pages.

Save
View full document
Premium Document
Do you want full access? Go Premium and unlock all 25 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMSC 341 Hashing The Basic Problem We have lots of data to store We desire efficient O 1 performance for insertion deletion and searching Too much wasted memory is required if we use an array indexed by the data s key The solution is a hash table 8 30 2007 CMSC 341 Hashing 2 Hash Table 0 2 m 1 Basic Idea 1 The hash table is an array of size m The storage index for an item determined by a hash function h k U 0 1 m 1 Desired Properties of h k easy to compute uniform distribution of keys over 0 1 m 1 8 30 2007 when h k1 h k2 for k1 k2 U we have a collision CMSC 341 Hashing 3 Division Method The hash function h k k mod m where m is the table size m must be chosen to spread keys evenly Poor choice m a power of 10 Poor choice m 2b b 1 A good choice of m is a prime number Table should be no more than 80 full Choose m as smallest prime number greater than mmin where mmin expected number of entries 0 8 8 30 2007 CMSC 341 Hashing 4 Multiplication Method The hash function h k m kA kA where A is some real positive constant A very good choice of A is the inverse of the golden ratio Given two positive numbers x and y the ratio x y is the golden ratio if x y x y x The golden ratio x2 xy y2 0 1 sqrt 5 2 Fibi Fibi 1 8 30 2007 2 1 0 1 618033989 CMSC 341 Hashing 5 Multiplication Method cont Because of the relationship of the golden ratio to Fibonacci numbers this particular value of A in the multiplication method is called Fibonacci hashing Some values of h k m k 1 k 1 0 for k 0 0 618m 0 236m 0 854m 0 472m 0 090m 0 708m 0 326m 0 777m 8 30 2007 for k 1 1 1 1 618 0 618 for k 2 for k 3 for k 4 for k 5 for k 6 for k 7 for k 32 CMSC 341 Hashing 6 8 30 2007 CMSC 341 Hashing 7 Non integer Keys In order to have a non integer key must first convert to a positive integer h k g f k with f U integer g I 0 m 1 Suppose the keys are strings How can we convert a string or characters into an integer value 8 30 2007 CMSC 341 Hashing 8 Horner s Rule static int hash String key int tableSize int hashVal 0 for int i 0 i key length i hashVal 37 hashVal key charAt i hashVal tableSize if hashVal 0 hashVal tableSize return hashVal 8 30 2007 CMSC 341 Hashing 9 HashTable Class public class SeparateChainingHashTable AnyType public SeparateChainingHashTable Later public SeparateChainingHashTable int size Later public void insert AnyType x Later public void remove AnyType x Later public boolean contains AnyType x Later public void makeEmpty Later private static final int DEFAULT TABLE SIZE 101 private List AnyType theLists private int currentSize private void rehash Later private int myhash AnyType x Later private static int nextPrime int n Later private static boolean isPrime int n Later 8 30 2007 CMSC 341 Hashing 10 HashTable Ops boolean contains AnyType x void insert AnyType x If x already in table do nothing Otherwise insert it using the appropriate hash function void remove AnyType x Returns true if x is present in the table Remove the instance of x if x is present Ptherwise does nothing void makeEmpty 8 30 2007 CMSC 341 Hashing 11 Hash Methods private int myhash AnyType x int hashVal x hashCode hashVal theLists length if hashVal 0 hashVal theLists length return hashVal 8 30 2007 CMSC 341 Hashing 12 Handling Collisions Collisions are inevitable How to handle them Separate chaining hash tables Insertion of key k Store colliding items in a list If m is large enough list lengths are small hash k to find the proper list If k is in that list do nothing else insert k on that list Asymptotic performance If always inserted at head of list and no duplicates insert O 1 best worst average 8 30 2007 CMSC 341 Hashing 13 Hash Class for Separate Chaining To implement separate chaining the private data of the hash table is a vector array of Lists The hash functions are written using List functions private List AnyType theLists 8 30 2007 CMSC 341 Hashing 14 Performance of contains contains Hash k to find the proper list Call contains on that list which returns a boolean Performance best worst average 8 30 2007 CMSC 341 Hashing 15 Performance of remove Remove k from table Hash k to find proper list Remove k from list Performance best worst average 8 30 2007 CMSC 341 Hashing 16 Handling Collisions Revisited Probing hash tables All elements stored in the table itself so table should be large Rule of thumb m 2N Upon collision item is hashed to a new open slot Hash function h U x 0 1 2 0 1 m 1 h k i h k f i mod m for some h U 0 1 m 1 and some f i such that f 0 0 Each attempt to find an open slot i e calculating h k i is called a probe 8 30 2007 CMSC 341 Hashing 17 HashEntry Class for Probing Hash In this Tables case the hash table is just an array private static class HashEntry AnyType public AnyType element the element public boolean isActive false if deleted public HashEntry AnyType e this e true public HashEntry AnyType e boolean active element e isActive active The array of elements private HashEntry AnyType array The number of occupied cells private int currentSize 8 30 2007 CMSC 341 Hashing 18 Linear Probing Use a linear function for f i f i c i Example h k k mod 10 in a table of size 10 f i i So that h k i k mod 10 i mod 10 Insert the values U 89 18 49 58 69 into the hash table 8 30 2007 CMSC 341 Hashing 19 Linear Probing cont Problem Clustering When the table starts to fill up performance O N Asymptotic Performance Insertion and unsuccessful find average 8 30 2007 is the load factor what fraction of the table is used Number of probes 1 1 1 2 if 1 the denominator goes to zero and the number of probes goes to infinity CMSC 341 Hashing 20 Linear Probing cont Remove Can t just use the hash function s to find the object and remove it because objects that were inserted after X were hashed based on X s presence Can just mark the cell as deleted so it won t be found anymore 8 30 2007 Other elements still in right cells Table can fill with lots of deleted junk CMSC 341 Hashing 21 Quadratic Probing Use a quadratic function for f i f i c2i2 c1i c0 …


View Full Document

UMBC CMSC 341 - Hashing

Download Hashing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?