New version page

MASON CS 310 - Hash Tables

Documents in this Course
Load more

This preview shows page 1-2-3-4 out of 13 pages.

View Full Document
View Full Document

End of preview. Want to read all 13 pages?

Upload your study docs or become a GradeBuddy member to access this document.

View Full Document
Unformatted text preview:

CS 310 Hash Tables, Page 1'&$%Hash Tableskey valueHashingFunctionkey valueCS 310 Hash Tables, Page 2'&$%• The hash-table data structure achieves (near) constant timesearching by “wasting” memory space.☞ the size of the memory we reserve for a hash table istypically much large than the number of data stored in it.• Each item in the table comprises a key and a value.☞ In each GMU student record, we could use GMU ID as thekey and all other data combined as the value.• We use a hashing function to determine where to store itemsin table.☞ Given an entry x, it will be stored at location h(x.key),where h is the hashing function.• The verb “to hash” means to chop something up or to make amess out of it (it/something being the key).CS 310 Hash Tables, Page 3'&$%Example• The table contains 80 entries.• The key of each entry is an integer.• We use the hash function h(key) = key % 80.☞ An entry with key 140 will be stored at location140 % 80 = 60 in the table.☞ An entry with key 425 will be stored at location425 % 80 = 25☞ An entry with key 825 will be also stored at location825 % 80 = 25 !CS 310 Hash Tables, Page 4'&$%Key-to-Index Mapping• A collision occurs when two keys are “hashed” to the samelocation in the table.• Birthday paradox: If 23 people are present in a room,chances are good (> 0.5) that two of them will have the samemonth/day birthday.☞ If we randomly maps 23 keys into a table of size 365, theprobability that no two keys map to the same location isless than 0.5.• A complete key-to-index mapping consists of1. a hashing function that minimizes collisions, and2. a technique for handling collisions.CS 310 Hash Tables, Page 5'&$%Hashing Fuction I: Division• Let CAPACITY be the number of entries in the table.• h(key) = key % CAPACITY.☞ please notice that this hash function is called “division,”although what we actually want is its remainder.• Certain CAPACITY values are better than others at avoidingcollisions.• Studies show that a good choice of CAPACITY is a primenumber of the form 4c + 3, where c is some integer number.☞ 811 is a prime number equal to (4 × 202) + 3☞ 1019 is a prime number equal to (4 × 254) + 3CS 310 Hash Tables, Page 6'&$%Hashing Function II: Mid-Square• The key is multiplied by itself.• The hashing function returns middle r digits of the result.• CAPACITY is set to 2r.CS 310 Hash Tables, Page 7'&$%Hashing Function III: Multiplication• The key is multiplied by a constant less than one.• The hash function returns the first r digits of the the results.• Again, CAPACITY is set to 2r.CS 310 Hash Tables, Page 8'&$%How to Handle Keys of Type String• Just treat a string as a long binary number.Example: Suppose the key is the string ’KEY’In ASCII, this is 1001011 1000101 1011001.Treat this as a binary number and use previous hashingfunctions.• Folding. Suppose we are given a key comprising 11 lettersa1a2a3a4a5a6a7a8a9a10a11.a1a2a3a4+ a5a6a7a8+ a9a10a11CS 310 Hash Tables, Page 9'&$%Collision Resolution Method I: ChainingFor chaining, we put entries hashed to the same index into a linkedlist.The advantages of chaining are:• It is easy to locate a key when there are collisions.• The hash table needs only space for pointers.• Deletion of an item is straight-forward.The disadvantage of chaining is that we need additional space forpointers along with the data.CS 310 Hash Tables, Page 10'&$%Example: Suppose h(key) = key % 100Keys: 4203 3606 1200 1302 8802 4106999800010203040506CS 310 Hash Tables, Page 11'&$%Collision Resolution Method II:Open Addressing• If h(key) = h0and is occupied, then we try, or “probe,”positions h1, h2, h3. . ., until we find an opening slot.• Common probing techniques: linear, quadratic, random, doublehashingCS 310 Hash Tables, Page 12'&$%Linear Probing• hi= (h0+ i) % CAPACITY• That is, we try h0+ 1, h0+ 2, h0+ 3, . . . until an empty slot isfound.• the “% CAPACITY” part ensures “wraping around” afterunsuccessfully probing the last location in the table.When the table becomes about half full, linear probing has atendency toward clustering; that is, data occupy long strings ofadjacent positions with gaps between the strings.CS 310 Hash Tables, Page 13'&$%M = 10Keys: 211, 143, 160, 821, 223, 2219876543210CS 310 Hash Tables, Page 14'&$%Quadratic Probing• hi= (h0+ i2) % CAPACITY☞ That is, we try h0+ 1, h0+ 4, h0+ 9, . . . until an empty slotis found.• Typically, quadratic probing will not probe all locations.• For prime CAPACITY values, (CAPACITY+1)/2 probes willbring you back to the starting point; when that happens, thereis no point in going further. (See pages 402 to 403 of Kruise fora proof.)CS 310 Hash Tables, Page 15'&$%ExampleCAPACITY=7, key = 702h0= 2h1= (h0+ 12) % 7 = 3h2= (h0+ 22) % 7 = 6h3= (h0+ 32) % 7 = 4h4= (h0+ 42) % 7 = 4h5= (h0+ 52) % 7 = 6h6= (h0+ 62) % 7 = 3h7= (h0+ 72) % 7 = 2We have completed a cycle, and half of the table entries are notprobed.CS 310 Hash Tables, Page 16'&$%• consider inserting 2, 72, 702, 7002, and 70002 to an empty,size-7 hash table that uses quadratic probing• for all these keys, h0= 2, h1= 3, h2= 6, and h3= 4.012 h03 h14 h356 h2• This is typically acceptable with hash tables, for “probing halfof the table” most probably indicates flaws in system design.☞ a hash table is typically large to minimize the chances ofcollisions.CS 310 Hash Tables, Page 17'&$%Double Hashing• For double hashing, we use a second hashing function h0(key).• If location h(key) = h0is occupied, we probeh0+ h0(key), h0+ 2 · h0(key), h0+ 3 · h0(key), . . .until an empty space is found.CS 310 Hash Tables, Page 18'&$%Rehashing• For rehashing, we also use two hashing functions, h and h0.• If h(key) = h0is occupied,then we try h1= h0(h0),then we try h2= h0(h1),then we try h3= h0(h2), and so forth.• A good choice of h and h0is as follows:☞ h(key) = key % CAPACITY☞ h0(x) = 1 + (x % (CAPACITY − 2))☞ Both CAPACITY and CAPACITY − 2 should be prime (forexample, 811 and 809).CS 310 Hash Tables, Page 19'&$%Implementing Open-Address Hashingwith Linear Probingtemplate <class Entry>class Table{public:static const int CAPACITY = 811;Table(); ~Table();void insert (const Entry& entry);void remove (int key);bool is_present (int key) const;void find (int key, bool& found, Entry& result) const;int size() const {return used;}CS 310 Hash Tables, Page 20'&$%private:static const int NEVER_USED = -1;static const int


View Full Document
Loading Unlocking...
Login

Join to view Hash Tables and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hash Tables and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?