CSE 326 Data Structures Part 5 Hashing Henry Kautz Autumn 2002 1 Midterm Monday November 4th Will cover everything through hash tables No homework due that day but a study sheet and practice problems on trees and hashing will be distributed 50 minutes in class You may bring one page of notes to refer to 2 Dictionary Search ADTs Operations create destroy insert find delete Dictionary keys insert kohlrabi upscale tuber spicy cabbage kreplach find kreplach kreplach tasty stuffed dough Stores values associated kim chi tasty stuffed dough kiwi Australian fruit with user specified keys may be any homogenous comparable type values may be any homogenous type implementation data field is a struct with two parts Search ADT keys values 3 Implementations So Far unsorted list sorted array Trees BST average AVL worst case splay amortized insert find 1 n log n find n log n log n delete find 1 n Array of size n where keys are 0 n 1 log n 4 Hash Tables Basic Idea Use a key arbitrary string or number to index directly into an array O 1 time to access records A kreplach tasty stuffed dough Need a hash function to convert the key to an integer Key Data 0 kim chi spicy cabbage 1 kreplach tasty stuffed dough 2 kiwi Australian fruit 5 Applications When log n is just too big Symbol tables in interpreters Real time databases in core or on disk air traffic control packet routing When associative memory is needed Dynamic programming cache results of previous computation f x if Find x then Find x else f x Chess endgames Many text processing applications e g Web Status LastURL visited 6 How could you use hash tables to Implement a linked list of unique elements Create an index for a book Convert a document to a Sparse Boolean Vector where each index represents a different word 7 Properties of Good Hash Functions Must return number 0 tablesize Should be efficiently computable O 1 time Should not waste space unnecessarily For every index there is at least one key that hashes to it Load factor lambda number of keys TableSize Should minimize collisions different keys hashing to same index 8 Integer Keys Hash x x TableSize Good idea to make TableSize prime Why 9 Integer Keys Hash x x TableSize Good idea to make TableSize prime Why Because keys are typically not randomly distributed but usually have some pattern mostly even mostly multiples of 10 in general mostly multiples of some k If k is a factor of TableSize then only TableSize k slots will ever be used Since the only factor of a prime number is itself this phenomena only hurts in the rare case where k TableSize 10 Strings as Keys If keys are strings can get an integer by adding up ASCII values of characters in key for i 0 i key length i hashVal key charAt i Problem 1 What if TableSize is 10 000 and all keys are 8 or less characters long Problem 2 What if keys often contain the same characters abc bca etc 11 Hashing Strings Basic idea consider string to be a integer base 128 Hash abc a 1282 b 1281 c TableSize Range of hash large anagrams get different values Problem although a char can hold 128 values 8 bits only a subset of these values are commonly used 26 letters plus some special characters So just use a smaller base Hash abc a 322 b 321 c TableSize 12 Making the String Hash Easy to Compute Horner s Rule int hash String s h 0 for i s length 1 i 0 i h s keyAt i h 5 tableSize return h What is Advantages happening here 13 How Can You Hash A set of values name birthdate An arbitrary pointer in C An arbitrary reference to an object in Java 14 How Can You Hash A set of values name birthdate Hash name Hash birthdate tablesize What s this An arbitrary pointer in C int p tablesize An arbitrary reference to an object in Java Hash obj toString or just obj hashCode tablesize 15 Optimal Hash Function The best hash function would distribute keys as evenly as possible in the hash table Simple uniform hashing Maps each key to a fixed random number Idealized gold standard Simple to analyze Can be closely approximated by best hash functions 16 Collisions and their Resolution A collision occurs when two different keys hash to the same value E g For TableSize 17 the keys 18 and 35 hash to the same value 18 mod 17 1 and 35 mod 17 1 Cannot store both data records in the same slot in array Two different methods for collision resolution Separate Chaining Use a dictionary data structure such as a linked list to store multiple items that hash to the same slot Closed Hashing or probing search for empty slots using a second function and store item in first empty slot that is found 17 A Rose by Any Other Name Separate chaining Open hashing Closed hashing Open addressing 18 Hashing with Separate Chaining Put a little dictionary at each entry choose type as appropriate common case is unordered linked list chain 0 1 performance degrades with length of chains can be greater than 1 What was a d e b 2 3 Properties h a h d h e h b 4 5 c 6 19 Load Factor with Separate Chaining Search cost unsuccessful search successful search Optimal load factor 20 Load Factor with Separate Chaining Search cost assuming simple uniform hashing unsuccessful search Whole list average length successful search Half the list average length 2 1 Optimal load factor Zero But between and 1 is fast and makes good use of memory 21 Alternative Strategy Closed Hashing Problem with separate chaining Memory consumed by pointers 32 or 64 bits per key What if we only allow one Key at each entry two objects that hash to the same spot can t both go there first one there gets the spot next one must go in another spot Properties 1 performance degrades with difficulty of finding right spot h a h d h e h b 0 1 2 3 4 5 a d e b c 6 22 Collision Resolution by Closed Hashing Given an item X try cells h0 X h1 X h2 X hi X hi X Hash X F i mod TableSize Define F 0 0 F is the collision resolution function Some possibilities Linear F i i Quadratic F i i2 Double Hashing F i i Hash2 X 23 Closed Hashing I Linear Probing Main Idea When collision occurs scan down the array one cell at a time looking for an empty cell hi X Hash X i mod TableSize i 0 1 2 Compute hash value and increment it until a free cell is found 24 Linear Probing Example insert 14 insert 8 14 7 0 0 probes 8 7 1 insert 21 insert 2 21 7 0 2 7 2 0 14 0 14 0 14 1 1 8 1 8 1 8 2 2 …
View Full Document
Unlocking...