UT Dallas CS 5343 - 23. Hashing1 - D3105680

Home> Schools> University of Texas at Dallas> Computer Science (CS) > CS 5343> 23. Hashing1

UT Dallas CS 5343 - 23. Hashing1

School name University of Texas at Dallas

Course Cs 5343- Algorithm Analysis and Data Structures

Pages 40

Download Save

Unformatted text preview:

Hashing CENG 213 Data Structures 1 Hash Tables We ll discuss the hash table ADT which supports only a subset of the operations allowed by binary search trees The implementation of hash tables is called hashing Hashing is a technique used for performing insertions deletions and finds in constant average time i e O 1 This data structure however is not efficient in operations that require any ordering information among the elements such as findMin findMax and printing the entire table in sorted order CENG 213 Data Structures 2 General Idea The ideal hash table structure is merely an array of some fixed size containing the items A stored item needs to have a data member called key that will be used in computing the index value for the item Key could be an integer a string etc e g a name or Id that is a part of a large employee structure The size of the array is TableSize The items that are stored in the hash table are indexed by values from 0 to TableSize 1 Each key is mapped into some number in the range 0 to TableSize 1 The mapping is called a hash function CENG 213 Data Structures 3 Example Hash Table 0 1 Items 2 john 25000 phil 31250 dave 27500 3 key Hash Function 4 5 6 mary 28200 john25000 25000 john phil31250 31250 phil 7 dave27500 27500 dave mary28200 28200 mary 8 key 9 CENG 213 Data Structures 4 Hash Function The hash function must be simple to compute must distribute the keys evenly among the cells If we know which keys will occur in advance we can write perfect hash functions but we don t CENG 213 Data Structures 5 Hash function Problems Keys may not be numeric Number of possible keys is much larger than the space available in table Different keys may map into same location Hash function is not one to one collision If there are too many collisions the performance of the hash table will suffer dramatically CENG 213 Data Structures 6 Hash Functions If the input keys are integers then simply Key mod TableSize is a general strategy Unless key happens to have some undesirable properties e g all keys end in 0 and we use mod 10 If the keys are strings hash function needs more care First convert it into a numeric value CENG 213 Data Structures 7 Definitions Hash function maps a key k to an index address in the table Perfect hash function function that maps each key to a unique index Collision occurs when more than one key maps to the same index CENG 213 Data Structures 8 Some methods Truncation e g 123456789 map to a table of 1000 addresses by picking 3 digits of the key Folding e g 123 456 789 add them and take mod Key mod N N is the size of the table better if it is prime Squaring Square the key and then truncate Radix conversion e g 1 2 3 4 treat it to be base 7 truncate if necessary CENG 213 Data Structures 9 folding Hashing by folding Idea divide the key into parts then combine fold the parts to create the index Shift folding parts are placed underneath one another then added Boundary folding same as shift folding except that every other part is written backwards Usually followed by modulo division CENG 213 Data Structures 10 Folding example Key is 23459087632 Divide into parts 234 590 876 32 Shift folding 234 590 876 32 1732 Boundary folding 234 095 876 23 1228 CENG 213 Data Structures 11 Computing the mid square Hashing by computing the mid square Idea square the key use the middle as the address Example 96012 9218304144 8304 is hash Advantage entire key is used to calculate the address reducing chances of collisions CENG 213 Data Structures 12 Hash Function 1 Add up the ASCII values of all characters of the key int hash const string key int tableSize int hasVal 0 for int i 0 i key length i hashVal key i return hashVal tableSize Simple to implement and fast However if the table size is large the function does not distribute the keys well e g Table size 10000 key length 8 the hash function can assume values only between 0 and 1016 CENG 213 Data Structures 13 Hash Function 2 Examine only the first 3 characters of the key int hash const string key int tableSize return key 0 27 key 1 729 key 2 tableSize In theory 26 26 26 17576 different words can be generated However English is not random only 2851 different combinations are possible Thus this function although easily computable is also not appropriate if the hash table is reasonably large CENG 213 Data Structures 14 Hash Function 3 hash key KeySize 1 i Key KeySize i 1 37 i 0 int hash const string key int tableSize int hashVal 0 for int i 0 i key length i hashVal 37 hashVal key i hashVal tableSize if hashVal 0 in case overflows occurs hashVal tableSize return hashVal CENG 213 Data Structures 15 Hash function for strings key i 116 112 110 key t 0 o m i 1 2 KeySize 3 hash tom 110 1 112 37 116 372 10 007 894 tom 0 1 2 hash function tom 894 10 006 TableSize CENG 213 Data Structures 16 Collision Resolution If when an element is inserted it hashes to the same value as an already inserted element then we have a collision and need to resolve it There are several methods for dealing with this Separate chaining Open addressing Linear Probing Quadratic Probing Double Hashing CENG 213 Data Structures 17 Separate Chaining The idea is to keep a list of all elements that hash to the same value The array elements are pointers to the first nodes of the lists A new item is inserted to the front of the list Advantages Better space utilization for large items Simple collision handling searching linked list Overflow we can store more items than the hash table size Deletion is quick and easy deletion from the linked list CENG 213 Data Structures 18 Example Keys 0 1 4 9 16 25 36 49 64 81 hash key key 10 0 0 1 81 1 4 64 4 5 25 6 36 16 49 9 2 3 7 8 9 CENG 213 Data Structures 19 Operations Initialization all entries are set to NULL Find locate the cell using hash function sequential search on the linked list in that cell Insertion Locate the cell using hash function If the item does not exist insert it as the first item in the list Deletion Locate the cell using hash function Delete the item from the linked list CENG 213 Data Structures 20 Analysis of Separate Chaining Collisions are very likely How likely and what is the average length of lists Load factor definition Ratio of number of elements N in a hash table to the hash TableSize i e N TableSize The average length of a list is also For chaining is not bound by 1 it …

View Full Document


School:
Email:
New Password:
Confirm Password:

UT Dallas CS 5343 - 23. Hashing1

Sign up for free to view:

Please select your school