DOC PREVIEW
UMBC CMSC 341 - Hashing

This preview shows page 1-2-19-20 out of 20 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CMSC 341Hash TableDivision MethodMultiplication MethodMultiplication Method (cont.)PowerPoint PresentationNon-integer KeysHorner’s RuleHashTable ClassHash Table OpsHandling CollisionsFind PerformanceRemove PerformanceHandling Collisions RevisitedLinear ProbingLinear Probing (cont)Slide 17Quadratic ProbingQuadratic Probing (cont.)Double Hashing01/14/19 1CMSC 341Hashing01/14/19 2Hash Table 0 1 2 m-1Basic Idea–an array in which items are stored–storage index for an item determined by a hash functionh(k): U  {0, 1, …, m-1}Desired Properties of h(k)–easy to compute–uniform distribution of keys over {0, 1, …, m}•when h(k1) = h(k2) for k1, k2  U , we have a collision01/14/19 3Division MethodThe function:h(k) = k mod mwhere m is the table size. m must be chosen to spread keys evenly.–Ex: m = a factor of 10–Ex: m = 2b, b> 1A good choice of m is a prime number. Also we want the table to be no more than 80% full.–Choose m as smallest prime number greater than mmin, where mmin = (expected number of entries)/0.801/14/19 4Multiplication MethodThe functionh(k) = m(kA - kA)where A is some real positive constant.A very good choice of A is the inverse of the “golden ratio.”Given two positive numbers x and y, the ratio x/y is the golden ratio if = x/y = (x+y)/x The golden ratio:x2 - xy - y2 = 0  2 -  - 1 = 0 = (1 + sqrt(5))/2 = 1.618033989…~= Fibi/Fibi-101/14/19 5Multiplication Method (cont.)Because of the relationship of the golden ratio to Fibonacci numbers, this particular value of A in the multiplication method is called “Fibonacci hashing.”Some values of h(k) = m(k -1 - k -1 )= 0 for k = 0= 0.618m for k = 1 (-1 = 1/ 1.618… = 0.618…)= 0.236m for k = 2= 0.854m for k = 3 = 0.472m for k = 4= 0.090m for k = 5= 0.708m for k = 6= 0.326m for k = 7= …= 0.777m for k = 3201/14/19 601/14/19 7Non-integer KeysIn order to has a non-integer key, must first convert to a positive integer:h(k) = g(f(k)) with f: U  intg: I  {0 .. m-1}/2Suppose the keys are strings. How can we convert a string (or characters) into an integer value?01/14/19 8Horner’s Ruleint hash(const string &key, int tablesize){int hashval = 0;// f(k) by Horner’s rulefor (int i = 0; i < key.length(); i++)hashval = 37*hashval + key[i];// g(k) by division methodhashval %= tablesize;if (hashval < 0)hashval += tablesize;return hashval;}01/14/19 9HashTable Classtemplate <class HashedObj>class HashTable {public:explicit HashTable(const HashedObj &notFound, size=101);HashTable(const HashTable &rhs) : ITEM_NOT_FOUND(rhs.ITEM_NOT_FOUND),theLists(rhs.theLists){ /* no code */ }const HashedObj &find(const HashedObj &x) const;void makeEmpty();void insert (const HashedObj &x);void remove (const HashedObj &x);const HashTable& operator=(const HashTable &rhs);private:vector<List<HashedObj> > theLists;const HashedObj ITEM_NOT_FOUND;};01/14/19 10Hash Table Opsconst HashedObj &find(const HashedObj &x) const;–returns the HashedObj in the table, if present–otherwise, returns ITEM_NOT_FOUNDvoid insert (const HashedObj &x);–if x already in table, do nothing.–otherwise insert it, using the appropriate hash functionvoid remove (const HashedObj &x);–remove the instance of x, if x is present–otherwise, does nothingvoid makeEmpty();01/14/19 11Handling CollisionsCollisions are inevitable. How to handle them?One possibility: separate chaining (aka open hashing)–store colliding items in a list–if m is large enough, list lengths are smallInsertion of key k–hash(k) to find bucket–if k is in that list, do nothing. Else, insert k on that list.Asymptotic performance–if always inserted at head of list, and no duplicates, insert = O(1): best, worst, average01/14/19 12Find PerformanceFind–hash k to find the bucket–do a find on that list, returns a listItr–if itr.isPastEnd(), return ITEM_NOT_FOUND, otherwise, return itr.retrieve()Performance–best: –worst: –average01/14/19 13Remove PerformanceRemove k from table–hash k to find bucket–remove k from listPerformance–best–worst–average01/14/19 14Handling Collisions RevisitedOpen addressing (aka closed hashing)–all elements stored in the table itself (so table should be large. Rule of thumb: M >= 2N)–upon collision, item is hashed to a new (open) slot.Hash functionh: U x {0,1,2,….}  {0,1,…,M-1}h( k, I ) = (h’ ( k ) + f( I ) ) mod mfor some h’: U  {0,1,…,M-1}and f(0) = 0Each try is called a probe01/14/19 15Linear ProbingFunction: f(i) = ciExample: h’(k) = k mod 10 in a table of size 10 , f(i) = iU={89,18,49,58,69}01/14/19 16Linear Probing (cont)Problem: Clustering–when table starts to fill up, performance  O(N)Asymptotic Performance–insertion and unsuccessful find, average•# probes  (½) (1+1/(1-)2)•if   1, the denominator goes to zero and the number of probes goes to infinity01/14/19 17Linear Probing (cont)Remove–Can’t just use the hash function(s) to find the object,and remove it, because objects that were inserted after x were hashed based on x’s presence.–Can just mark the cell as deleted so it won’t be found anymore.•Other elements still in right cells•Table can fill with lots of deleted junk01/14/19 18Quadratic ProbingFunction:f(i) = c2i2 + c1i + c0Example:f(i) = i2, m=10U={89,18,49,58,69}01/14/19 19Quadratic Probing (cont.)Advantage: –reduced clustering problemDisadvantages: –reduced number of sequences–no guarantee that empty slot will be found if lambda >= 0.5 if table size is not prime01/14/19 20Double HashingUse two hash functions: h’1(k), h’2(k)h(k,I) = (h’1(k) + ih’2(k)) mod MChoosing h’2(k)–don’t allow h’2(k) = 0 for any k.–a good choice:•h’2(k) = R - (k mod R) with R a prime smaller than MCharacteristics–No clustering problem–Requires a second hash


View Full Document

UMBC CMSC 341 - Hashing

Download Hashing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?