U-M EECS 281 - Dictionary ADT and Hashing Recurrence Relations - D2628848

Home> Schools> University of Michigan> Electrical Engineering And Computer Science (EECS) > EECS 281> Dictionary ADT and Hashing Recurrence Relations

DOC PREVIEW

U-M EECS 281 - Dictionary ADT and Hashing Recurrence Relations

School name University of Michigan

Course Eecs 281- Data Struct&Algor

Pages 13

This preview shows page 1-2-3-4 out of 13 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 13 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Lecture 5: Dictionary ADT and Hashing!Recurrence Relations!How do you use a dictionary?!!Used where you need to do some sort of table lookup:!• search for a key in a table!• the key is usually associated with some data/value of interest!!Also known as associative array!!Why search for key instead of just searching for the data?!Dictionary ADT!Dictionary ADT!Key space is usually more regular/structured than value space, so easier to search!Dictionary entry is a"<const key_type, data_type> pair!• for example, <title, mp4_file>!• <Avatar, avatar.mp4>!Normally associate a given key with"only a single value or a pointer to data!Dictionary is optimized to quickly add "<key, data> pairs, retrieve data by key!Types of Dictionary!Whether items are grouped by some category such as by subject, by popularity, chronologically, etc.!• unordered!• ordered!Whether items are listed by a collating sequence of the key, e.g., numerical, alphabetical!• unsorted!• sorted!Adding entries into an ordered (sorted) list must retain the ordered (sorted) property of the list!The AppStore!What type of dictionary do we see at the AppStore?!Implementation! Search! Insert!Arrays! O(N)! O(1)!Linked Lists! O(N)! O(1)!Hashing (amortized)! O(1)! O(1)!Implementation! Search! Insert!Arrays! O(?)! O(?)!Linked Lists! O(?)! O(?)!Unsorted Dictionary Runtimes!Hashing!Access table items by their keys in relatively constant time regardless of their locations!!Main idea: use arithmetic operations (hash function) to transform keys into table locations!• the same key is always hashed to the same location!• such that insert and search are both directed to the same location in O(1) time!Hash table: an array of buckets, where each bucket contains items assigned by a hash function!Hashing Example!In a text editor, to speed up search, we build a hash table and hash each word into the table!Let hash table size (M) = 16!Let hash function (h()) = (sum all characters) mod 16!• by “sum all characters” we mean sum the ASCII (or UTF-8) representation of the character!• for example, h(“He”) = (72+101)%16 = 13!Let sample text be the following N=13 words:!“He was well educated and from his"portrait a shrewd observer might divine”!Hashing Example!(sum all characters) mod 16!He 13!was 11!well 4!educated 15!and 3!from 4!his 4!portrait 5!a 1!shrewd 13!observer 8!might 9!divine 15!N = 13, M = 16!0!1!2!3!4!5!6!7!8!9!10!11!12!13!14!15!a!and!well! from! his!portrait!observer!might!was!shrewd!He!educated! devine!Collision and Collision Resolution Collision occurs when the hash function maps two or more items—all having different search keys—into the same bucket!!What to do when there is a collision?!!Collision-resolution scheme:!• assigns distinct locations in the hash table to items involved in a collision!Separate Chaining A collision resolution scheme that lets each bucket points to a linked list of elements!• insertion:!• compute k = h(key)!• prepend to kth bucket in O(1) time"(but may need to check for duplicates)!• search:!• compute k = h(key)!• search in kth container (e.g., check every element)!Separate Chaining! a! and! his! portrait! observer! might! was! shrewd! divine! 0! 1! 2! 3! 4! 5! 6! 7! 8! 9! 10! 11! 12! 13! 14! 15!from!He!educated!well!Performance Analysis!Worst case!• all N elements in one bucket!• searching through a bucket : O(N) time!Average-case analysis:!• table size M, has N keys to store!• average bucket size is N/M!• L = N/M is called the load factor!• average runtime of search: O(h()) + O(1) + O(L)!• unsuccessful search: 1+L comparisons!• successful search: 1+L/2 comparisons on average!• for good performance, want small load factor!Why differentiate between successful and unsuccessful search?!How to Improve Runtime?!Can set M > N so that L < 1!• then average search time is O(1)!Space-time trade off:!• very large table/array!• few collisions!• for the movie title example, can have millions of entries!• small table/array!• many collisions, may need time to resolve!Table Resizing!If table size is fixed:!• search performance deteriorates with growth (when load factor becomes high)!When load factor becomes too high,"resize by doubling the size of hash table!• each entry must be re-hashed, not just moved, into the new hash table!• expensive worst-case; OK if amortized! 0! 1! 2! 3! 4! Table Resizing: Amortized Analysis!Hash table of size 2M!Assume O(1) operation to insert up to M−1 items: O(M)!For the M-th item, create a new hash table of size 4M: O(1)!Rehash all M−1 items to the new table: O(M)!Insert new item: O(1)!Total cost to insert M items: O(M + 1 + M + 1) = O(M)!So, average cost to insert M items is O(1)!󲰛 Hash table doubling cost is amortized!over individual inserts!• though the periodic high cost may not be acceptable to some applications that require smooth running time!Other Ways to Resolve Collisions!Aside from separate chaining, other methods have been proposed for collision resolution!!Two main motivations:!1. scatter table: re-use empty spaces in the hash table"to hold colliding items: coalesced chaining and open addressing!2. dynamic hashing: grow the hash table incrementally so as not to take the performance hit of rehashing everything when resizing: extendible hashing and linear hashing!• more complicated hash function to allow for addressing of incrementally grown hash table (not covered)!Coalesced Chaining!Keep the linked list used in separate chaining, but store it in the unused portions of the hash table!!Hash table can only hold as many items as table size!!If an item hashes to an already occupied bin follow the linked list and add item to the end!!If an item is deleted from the linked list, the rest of the list must be “moved up”!• but be careful that an item is not moved up past its original hash bucket!Coalesced Chaining!0!1!2!3!4!5!6!7!8!9!10!11!12!13!14!15!a!and!well!from!his!portrait!observer!might!was!shrewd!He!educated!devine!(sum all characters) mod 16!He 13!was 11!well 4!educated 15!and 3!from 4!his 4!portrait 5!a 1!shrewd 13!observer 8!might 9!divine 15!N = 13, M = 16!“portrait” must never be moved higher than 5!Item Removal!Removal on scatter tables is complicated:!• must not

View Full Document