CMSC 341Hash TableDivision MethodMultiplication MethodMultiplication Method (cont.)PowerPoint PresentationNon-integer KeysHorner’s RuleHashTable ClassHash Table OpsHandling CollisionsFind PerformanceRemove PerformanceHandling Collisions RevisitedLinear ProbingLinear Probing (cont)Slide 17Quadratic ProbingQuadratic Probing (cont.)Double Hashing01/14/19 1CMSC 341Hashing01/14/19 2Hash Table 0 1 2 m-1Basic Idea–an array in which items are stored–storage index for an item determined by a hash functionh(k): U {0, 1, …, m-1}Desired Properties of h(k)–easy to compute–uniform distribution of keys over {0, 1, …, m}•when h(k1) = h(k2) for k1, k2 U , we have a collision01/14/19 3Division MethodThe function:h(k) = k mod mwhere m is the table size. m must be chosen to spread keys evenly.–Ex: m = a factor of 10–Ex: m = 2b, b> 1A good choice of m is a prime number. Also we want the table to be no more than 80% full.–Choose m as smallest prime number greater than mmin, where mmin = (expected number of entries)/0.801/14/19 4Multiplication MethodThe functionh(k) = m(kA - kA)where A is some real positive constant.A very good choice of A is the inverse of the “golden ratio.”Given two positive numbers x and y, the ratio x/y is the golden ratio if = x/y = (x+y)/x The golden ratio:x2 - xy - y2 = 0 2 - - 1 = 0 = (1 + sqrt(5))/2 = 1.618033989…~= Fibi/Fibi-101/14/19 5Multiplication Method (cont.)Because of the relationship of the golden ratio to Fibonacci numbers, this particular value of A in the multiplication method is called “Fibonacci hashing.”Some values of h(k) = m(k -1 - k -1 )= 0 for k = 0= 0.618m for k = 1 (-1 = 1/ 1.618… = 0.618…)= 0.236m for k = 2= 0.854m for k = 3 = 0.472m for k = 4= 0.090m for k = 5= 0.708m for k = 6= 0.326m for k = 7= …= 0.777m for k = 3201/14/19 601/14/19 7Non-integer KeysIn order to has a non-integer key, must first convert to a positive integer:h(k) = g(f(k)) with f: U intg: I {0 .. m-1}/2Suppose the keys are strings. How can we convert a string (or characters) into an integer value?01/14/19 8Horner’s Ruleint hash(const string &key, int tablesize){int hashval = 0;// f(k) by Horner’s rulefor (int i = 0; i < key.length(); i++)hashval = 37*hashval + key[i];// g(k) by division methodhashval %= tablesize;if (hashval < 0)hashval += tablesize;return hashval;}01/14/19 9HashTable Classtemplate <class HashedObj>class HashTable {public:explicit HashTable(const HashedObj ¬Found, size=101);HashTable(const HashTable &rhs) : ITEM_NOT_FOUND(rhs.ITEM_NOT_FOUND),theLists(rhs.theLists){ /* no code */ }const HashedObj &find(const HashedObj &x) const;void makeEmpty();void insert (const HashedObj &x);void remove (const HashedObj &x);const HashTable& operator=(const HashTable &rhs);private:vector<List<HashedObj> > theLists;const HashedObj ITEM_NOT_FOUND;};01/14/19 10Hash Table Opsconst HashedObj &find(const HashedObj &x) const;–returns the HashedObj in the table, if present–otherwise, returns ITEM_NOT_FOUNDvoid insert (const HashedObj &x);–if x already in table, do nothing.–otherwise insert it, using the appropriate hash functionvoid remove (const HashedObj &x);–remove the instance of x, if x is present–otherwise, does nothingvoid makeEmpty();01/14/19 11Handling CollisionsCollisions are inevitable. How to handle them?One possibility: separate chaining (aka open hashing)–store colliding items in a list–if m is large enough, list lengths are smallInsertion of key k–hash(k) to find bucket–if k is in that list, do nothing. Else, insert k on that list.Asymptotic performance–if always inserted at head of list, and no duplicates, insert = O(1): best, worst, average01/14/19 12Find PerformanceFind–hash k to find the bucket–do a find on that list, returns a listItr–if itr.isPastEnd(), return ITEM_NOT_FOUND, otherwise, return itr.retrieve()Performance–best: –worst: –average01/14/19 13Remove PerformanceRemove k from table–hash k to find bucket–remove k from listPerformance–best–worst–average01/14/19 14Handling Collisions RevisitedOpen addressing (aka closed hashing)–all elements stored in the table itself (so table should be large. Rule of thumb: M >= 2N)–upon collision, item is hashed to a new (open) slot.Hash functionh: U x {0,1,2,….} {0,1,…,M-1}h( k, I ) = (h’ ( k ) + f( I ) ) mod mfor some h’: U {0,1,…,M-1}and f(0) = 0Each try is called a probe01/14/19 15Linear ProbingFunction: f(i) = ciExample: h’(k) = k mod 10 in a table of size 10 , f(i) = iU={89,18,49,58,69}01/14/19 16Linear Probing (cont)Problem: Clustering–when table starts to fill up, performance O(N)Asymptotic Performance–insertion and unsuccessful find, average•# probes (½) (1+1/(1-)2)•if 1, the denominator goes to zero and the number of probes goes to infinity01/14/19 17Linear Probing (cont)Remove–Can’t just use the hash function(s) to find the object,and remove it, because objects that were inserted after x were hashed based on x’s presence.–Can just mark the cell as deleted so it won’t be found anymore.•Other elements still in right cells•Table can fill with lots of deleted junk01/14/19 18Quadratic ProbingFunction:f(i) = c2i2 + c1i + c0Example:f(i) = i2, m=10U={89,18,49,58,69}01/14/19 19Quadratic Probing (cont.)Advantage: –reduced clustering problemDisadvantages: –reduced number of sequences–no guarantee that empty slot will be found if lambda >= 0.5 if table size is not prime01/14/19 20Double HashingUse two hash functions: h’1(k), h’2(k)h(k,I) = (h’1(k) + ih’2(k)) mod MChoosing h’2(k)–don’t allow h’2(k) = 0 for any k.–a good choice:•h’2(k) = R - (k mod R) with R a prime smaller than MCharacteristics–No clustering problem–Requires a second hash
View Full Document