CMSC 341 Hashing The Basic Problem We have lots of data to store We desire efficient O 1 performance for insertion deletion and searching Too much wasted memory is required if we use an array indexed by the data s key The solution is a hash table 8 30 2007 CMSC 341 Hashing 2 Hash Table 0 2 m 1 Basic Idea 1 The hash table is an array of size m The storage index for an item determined by a hash function h k U 0 1 m 1 Desired Properties of h k easy to compute uniform distribution of keys over 0 1 m 1 8 30 2007 when h k1 h k2 for k1 k2 U we have a collision CMSC 341 Hashing 3 Division Method The hash function h k k mod m where m is the table size m must be chosen to spread keys evenly Poor choice m a power of 10 Poor choice m 2b b 1 A good choice of m is a prime number Table should be no more than 80 full 8 30 2007 Choose m as smallest prime number greater than mmin where mmin expected number of entries 0 8 CMSC 341 Hashing 4 Multiplication Method The hash function h k m kA kA where A is some real positive constant A very good choice of A is the inverse of the golden ratio Given two positive numbers x and y the ratio x y is the golden ratio if x y x y x The golden ratio x2 xy y2 0 1 sqrt 5 2 Fibi Fibi 1 8 30 2007 2 1 0 1 618033989 CMSC 341 Hashing 5 Multiplication Method cont Because of the relationship of the golden ratio to Fibonacci numbers this particular value of A in the multiplication method is called Fibonacci hashing Some values of h k m k 1 k 1 0 0 618m 0 236m 0 854m 0 472m 0 090m 0 708m 0 326m 0 777m 8 30 2007 for k 0 for k 1 1 1 1 618 0 618 for k 2 for k 3 for k 4 for k 5 for k 6 for k 7 for k 32 CMSC 341 Hashing 6 8 30 2007 CMSC 341 Hashing 7 Non integer Keys In order to have a non integer key must first convert to a positive integer h k g f k with f U integer g I 0 m 1 Suppose the keys are strings How can we convert a string or characters into an integer value 8 30 2007 CMSC 341 Hashing 8 Horner s Rule static int hash String key int tableSize int hashVal 0 for int i 0 i key length i hashVal 37 hashVal key charAt i hashVal tableSize if hashVal 0 hashVal tableSize return hashVal 8 30 2007 CMSC 341 Hashing 9 HashTable Class public class SeparateChainingHashTable AnyType public SeparateChainingHashTable Later public SeparateChainingHashTable int size Later public void insert AnyType x Later public void remove AnyType x Later public boolean contains AnyType x Later public void makeEmpty Later private static final int DEFAULT TABLE SIZE 101 private List AnyType theLists private int currentSize private void rehash Later private int myhash AnyType x Later private static int nextPrime int n Later private static boolean isPrime int n Later 8 30 2007 CMSC 341 Hashing 10 HashTable Ops boolean contains AnyType x void insert AnyType x If x already in table do nothing Otherwise insert it using the appropriate hash function void remove AnyType x Returns true if x is present in the table Remove the instance of x if x is present Ptherwise does nothing void makeEmpty 8 30 2007 CMSC 341 Hashing 11 Hash Methods private int myhash AnyType x int hashVal x hashCode hashVal theLists length if hashVal 0 hashVal theLists length return hashVal 8 30 2007 CMSC 341 Hashing 12 Handling Collisions Collisions are inevitable How to handle them Separate chaining hash tables Insertion of key k Store colliding items in a list If m is large enough list lengths are small hash k to find the proper list If k is in that list do nothing else insert k on that list Asymptotic performance 8 30 2007 If always inserted at head of list and no duplicates insert O 1 best worst average CMSC 341 Hashing 13 Hash Class for Separate Chaining To implement separate chaining the private data of the hash table is a vector array of Lists The hash functions are written using List functions private List AnyType theLists 8 30 2007 CMSC 341 Hashing 14 Performance of contains contains Hash k to find the proper list Call contains on that list which returns a boolean Performance best worst average 8 30 2007 CMSC 341 Hashing 15 Performance of remove Remove k from table Hash k to find proper list Remove k from list Performance best worst average 8 30 2007 CMSC 341 Hashing 16 Handling Collisions Revisited Probing hash tables All elements stored in the table itself so table should be large Rule of thumb m 2N Upon collision item is hashed to a new open slot Hash function h U x 0 1 2 0 1 m 1 h k i h k f i mod m for some h U 0 1 m 1 and some f i such that f 0 0 Each attempt to find an open slot i e calculating h k i is called a probe 8 30 2007 CMSC 341 Hashing 17 HashEntry Class for Probing Hash Tables In this case the hash table is just an array private static class HashEntry AnyType public AnyType element the element public boolean isActive false if deleted public HashEntry AnyType e this e true public HashEntry AnyType e boolean active element e isActive active The array of elements private HashEntry AnyType array The number of occupied cells private int currentSize 8 30 2007 CMSC 341 Hashing 18 Linear Probing Use a linear function for f i f i c i Example h k k mod 10 in a table of size 10 f i i So that h k i k mod 10 i mod 10 Insert the values U 89 18 49 58 69 into the hash table 8 30 2007 CMSC 341 Hashing 19 Linear Probing cont Problem Clustering When the table starts to fill up performance O N Asymptotic Performance Insertion and unsuccessful find average 8 30 2007 is the load factor what fraction of the table is used Number of probes 1 1 1 2 if 1 the denominator goes to zero and the number of probes goes to infinity CMSC 341 Hashing 20 Linear Probing cont Remove Can t just use the hash function s to find the object and remove it because objects that were inserted after X were hashed based on X s presence Can just mark the cell as deleted so it won t be found anymore 8 30 2007 Other elements still in right cells Table can fill with lots of deleted junk CMSC 341 Hashing 21 Quadratic Probing Use a quadratic function for f i f i c2i2 c1i c0 …
View Full Document