CMSC 341 Hashing 01 14 19 1 Hash Table 0 1 2 m 1 Basic Idea an array in which items are stored storage index for an item determined by a hash function h k U 0 1 m 1 Desired Properties of h k easy to compute uniform distribution of keys over 0 1 m when h k1 h k2 for k1 k2 U we have a collision 01 14 19 2 Division Method The function h k k mod m where m is the table size m must be chosen to spread keys evenly Ex m a factor of 10 Ex m 2b b 1 A good choice of m is a prime number Also we want the table to be no more than 80 full Choose m as smallest prime number greater than mmin where mmin expected number of entries 0 8 01 14 19 3 Multiplication Method The function h k m kA kA where A is some real positive constant A very good choice of A is the inverse of the golden ratio Given two positive numbers x and y the ratio x y is the golden ratio if x y x y x The golden ratio x2 xy y2 0 2 1 0 1 sqrt 5 2 1 618033989 Fibi Fibi 1 01 14 19 4 Multiplication Method cont Because of the relationship of the golden ratio to Fibonacci numbers this particular value of A in the multiplication method is called Fibonacci hashing Some values of h k m k 1 k 1 0 for k 0 0 618m for k 1 1 1 1 618 0 618 0 236m for k 2 0 854m for k 3 0 472m for k 4 0 090m for k 5 0 708m for k 6 0 326m for k 7 0 777m for k 32 01 14 19 5 01 14 19 6 Non integer Keys In order to has a non integer key must first convert to a positive integer h k g f k with f U int g I 0 m 1 2 Suppose the keys are strings How can we convert a string or characters into an integer value 01 14 19 7 Horner s Rule int hash const string key int tablesize int hashval 0 f k by Horner s rule for int i 0 i key length i hashval 37 hashval key i g k by division method hashval tablesize if hashval 0 hashval tablesize return hashval 01 14 19 8 HashTable Class template class HashedObj class HashTable public explicit HashTable const HashedObj notFound size 101 HashTable const HashTable rhs ITEM NOT FOUND rhs ITEM NOT FOUND theLists rhs theLists no code const HashedObj find const HashedObj x const void makeEmpty void insert const HashedObj x void remove const HashedObj x const HashTable operator const HashTable rhs private vector List HashedObj theLists const HashedObj ITEM NOT FOUND 01 14 19 9 Hash Table Ops const HashedObj find const HashedObj x const returns the HashedObj in the table if present otherwise returns ITEM NOT FOUND void insert const HashedObj x if x already in table do nothing otherwise insert it using the appropriate hash function void remove const HashedObj x remove the instance of x if x is present otherwise does nothing void makeEmpty 01 14 19 10 Handling Collisions Collisions are inevitable How to handle them One possibility separate chaining aka open hashing store colliding items in a list if m is large enough list lengths are small Insertion of key k hash k to find bucket if k is in that list do nothing Else insert k on that list Asymptotic performance if always inserted at head of list and no duplicates insert O 1 best worst average 01 14 19 11 Find Performance Find hash k to find the bucket do a find on that list returns a listItr if itr isPastEnd return ITEM NOT FOUND otherwise return itr retrieve Performance best worst average 01 14 19 12 Remove Performance Remove k from table hash k to find bucket remove k from list Performance best worst average 01 14 19 13 Handling Collisions Revisited Open addressing aka closed hashing all elements stored in the table itself so table should be large Rule of thumb M 2N upon collision item is hashed to a new open slot Hash function h U x 0 1 2 0 1 M 1 h k I h k f I mod m for some h U 0 1 M 1 and f 0 0 Each try is called a probe 01 14 19 14 Linear Probing Function f i ci Example h k k mod 10 in a table of size 10 f i i U 89 18 49 58 69 01 14 19 15 Linear Probing cont Problem Clustering when table starts to fill up performance O N Asymptotic Performance insertion and unsuccessful find average probes 1 1 1 2 if 1 the denominator goes to zero and the number of probes goes to infinity 01 14 19 16 Linear Probing cont Remove Can t just use the hash function s to find the object and remove it because objects that were inserted after x were hashed based on x s presence Can just mark the cell as deleted so it won t be found anymore Other elements still in right cells Table can fill with lots of deleted junk 01 14 19 17 Quadratic Probing Function f i c2i2 c1i c0 Example f i i2 m 10 U 89 18 49 58 69 01 14 19 18 Quadratic Probing cont Advantage reduced clustering problem Disadvantages reduced number of sequences no guarantee that empty slot will be found if lambda 0 5 if table size is not prime 01 14 19 19 Double Hashing Use two hash functions h 1 k h 2 k h k I h 1 k ih 2 k mod M Choosing h 2 k don t allow h 2 k 0 for any k a good choice h 2 k R k mod R with R a prime smaller than M Characteristics No clustering problem Requires a second hash function 01 14 19 20
View Full Document