DOC PREVIEW
MASON CS 310 - Hash Tables

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CS 310 Hash Tables, Page 1'&$%Hash Tableskey valueHashingFunctionkey valueCS 310 Hash Tables, Page 2'&$%• The hash-table data structure achieves (near) constant timesearching by “wasting” memory space.☞ the size of the memory we reserve for a hash table istypically much large than the number of data stored in it.• Each item in the table comprises a key and a value.☞ In each GMU student record, we could use GMU ID as thekey and all other data combined as the value.• We use a hashing function to determine where to store itemsin table.☞ Given an entry x, it will be stored at location h(x.key),where h is the hashing function.• The verb “to hash” means to chop something up or to make amess out of it (it/something being the key).CS 310 Hash Tables, Page 3'&$%Example• The table contains 80 entries.• The key of each entry is an integer.• We use the hash function h(key) = key % 80.☞ An entry with key 140 will be stored at location140 % 80 = 60 in the table.☞ An entry with key 425 will be stored at location425 % 80 = 25☞ An entry with key 825 will be also stored at location825 % 80 = 25 !CS 310 Hash Tables, Page 4'&$%Key-to-Index Mapping• A collision occurs when two keys are “hashed” to the samelocation in the table.• Birthday paradox: If 23 people are present in a room,chances are good (> 0.5) that two of them will have the samemonth/day birthday.☞ If we randomly maps 23 keys into a table of size 365, theprobability that no two keys map to the same location isless than 0.5.• A complete key-to-index mapping consists of1. a hashing function that minimizes collisions, and2. a technique for handling collisions.CS 310 Hash Tables, Page 5'&$%Hashing Fuction I: Division• Let CAPACITY be the number of entries in the table.• h(key) = key % CAPACITY.☞ please notice that this hash function is called “division,”although what we actually want is its remainder.• Certain CAPACITY values are better than others at avoidingcollisions.• Studies show that a good choice of CAPACITY is a primenumber of the form 4c + 3, where c is some integer number.☞ 811 is a prime number equal to (4 × 202) + 3☞ 1019 is a prime number equal to (4 × 254) + 3CS 310 Hash Tables, Page 6'&$%Hashing Function II: Mid-Square• The key is multiplied by itself.• The hashing function returns middle r digits of the result.• CAPACITY is set to 2r.CS 310 Hash Tables, Page 7'&$%Hashing Function III: Multiplication• The key is multiplied by a constant less than one.• The hash function returns the first r digits of the the results.• Again, CAPACITY is set to 2r.CS 310 Hash Tables, Page 8'&$%How to Handle Keys of Type String• Just treat a string as a long binary number.Example: Suppose the key is the string ’KEY’In ASCII, this is 1001011 1000101 1011001.Treat this as a binary number and use previous hashingfunctions.• Folding. Suppose we are given a key comprising 11 lettersa1a2a3a4a5a6a7a8a9a10a11.a1a2a3a4+ a5a6a7a8+ a9a10a11CS 310 Hash Tables, Page 9'&$%Collision Resolution Method I: ChainingFor chaining, we put entries hashed to the same index into a linkedlist.The advantages of chaining are:• It is easy to locate a key when there are collisions.• The hash table needs only space for pointers.• Deletion of an item is straight-forward.The disadvantage of chaining is that we need additional space forpointers along with the data.CS 310 Hash Tables, Page 10'&$%Example: Suppose h(key) = key % 100Keys: 4203 3606 1200 1302 8802 4106999800010203040506CS 310 Hash Tables, Page 11'&$%Collision Resolution Method II:Open Addressing• If h(key) = h0and is occupied, then we try, or “probe,”positions h1, h2, h3. . ., until we find an opening slot.• Common probing techniques: linear, quadratic, random, doublehashingCS 310 Hash Tables, Page 12'&$%Linear Probing• hi= (h0+ i) % CAPACITY• That is, we try h0+ 1, h0+ 2, h0+ 3, . . . until an empty slot isfound.• the “% CAPACITY” part ensures “wraping around” afterunsuccessfully probing the last location in the table.When the table becomes about half full, linear probing has atendency toward clustering; that is, data occupy long strings ofadjacent positions with gaps between the strings.CS 310 Hash Tables, Page 13'&$%M = 10Keys: 211, 143, 160, 821, 223, 2219876543210CS 310 Hash Tables, Page 14'&$%Quadratic Probing• hi= (h0+ i2) % CAPACITY☞ That is, we try h0+ 1, h0+ 4, h0+ 9, . . . until an empty slotis found.• Typically, quadratic probing will not probe all locations.• For prime CAPACITY values, (CAPACITY+1)/2 probes willbring you back to the starting point; when that happens, thereis no point in going further. (See pages 402 to 403 of Kruise fora proof.)CS 310 Hash Tables, Page 15'&$%ExampleCAPACITY=7, key = 702h0= 2h1= (h0+ 12) % 7 = 3h2= (h0+ 22) % 7 = 6h3= (h0+ 32) % 7 = 4h4= (h0+ 42) % 7 = 4h5= (h0+ 52) % 7 = 6h6= (h0+ 62) % 7 = 3h7= (h0+ 72) % 7 = 2We have completed a cycle, and half of the table entries are notprobed.CS 310 Hash Tables, Page 16'&$%• consider inserting 2, 72, 702, 7002, and 70002 to an empty,size-7 hash table that uses quadratic probing• for all these keys, h0= 2, h1= 3, h2= 6, and h3= 4.012 h03 h14 h356 h2• This is typically acceptable with hash tables, for “probing halfof the table” most probably indicates flaws in system design.☞ a hash table is typically large to minimize the chances ofcollisions.CS 310 Hash Tables, Page 17'&$%Double Hashing• For double hashing, we use a second hashing function h0(key).• If location h(key) = h0is occupied, we probeh0+ h0(key), h0+ 2 · h0(key), h0+ 3 · h0(key), . . .until an empty space is found.CS 310 Hash Tables, Page 18'&$%Rehashing• For rehashing, we also use two hashing functions, h and h0.• If h(key) = h0is occupied,then we try h1= h0(h0),then we try h2= h0(h1),then we try h3= h0(h2), and so forth.• A good choice of h and h0is as follows:☞ h(key) = key % CAPACITY☞ h0(x) = 1 + (x % (CAPACITY − 2))☞ Both CAPACITY and CAPACITY − 2 should be prime (forexample, 811 and 809).CS 310 Hash Tables, Page 19'&$%Implementing Open-Address Hashingwith Linear Probingtemplate <class Entry>class Table{public:static const int CAPACITY = 811;Table(); ~Table();void insert (const Entry& entry);void remove (int key);bool is_present (int key) const;void find (int key, bool& found, Entry& result) const;int size() const {return used;}CS 310 Hash Tables, Page 20'&$%private:static const int NEVER_USED = -1;static const int


View Full Document

MASON CS 310 - Hash Tables

Download Hash Tables
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hash Tables and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hash Tables 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?