Hashing & Hash TablesHashing & Hash Tables111111Cpt S 223. School of EECS, WSUOverview Hash Table Data Structure : PurposeTo support insertion, deletion and search in tttiaverage-case constant time Assumption: Order of elements irrelevant ==> data structure *not* useful for if you want to iti d ti kid f d fthmaintain and retrieve some kind of an order of the elements Hash functionHash[ “string key”] ==> integer value Hash table ADTIl tti AliAliti22222Implementations, Analysis, ApplicationsCpt S 223. School of EECS, WSUHash table: Main componentskey valueHash indexe“john”keyh(“john”)TableSizekeyHash functionHash table(implemented as a vector)How to determine … ?3Cpt S 223. School of EECS, WSUHash Table Hash table is an array of fixed size TableSizekey Element value Array elements indexed by a key, which is mapped to an array index (0…TableSize-1)Mapping (hash function) hMapping (hash function) h from key to index E.g., h(“john”) = 34Cpt S 223. School of EECS, WSUHash Table OperationsHash fti Insert T [h(“john”)] = <“john”,25000>Hash keyfunction DeleteT[h(“john”)] = NULLData recordT [h( john )] = NULL Search T [h(“john”)] returns the element hashed for “john”What happens if h(“john”)==h(“joe”)?5Cpt S 223. School of EECS, WSUWhat happens if h( john ) == h(joe) ?“collision”Factors affecting Hash Table Design Hash function Table size Usually fixed at the startyCollision handling schemeCollision handling scheme6Cpt S 223. School of EECS, WSUHash Function A hash function is one which maps an element’s key into a valid hash table index h(key) => hash table indexNote that this is (slightly) different from saying: h(string) => intBecause the key can be of any typeBecause the key can be of any type E.g., “h(int) => int” is also a hash function! But also note that any type can be converted into yypan equivalent string formCpt S 223. School of EECS, WSU 7h(key) ==> hash table indexHash Function Properties A hash function maps key to integer Constraint: Integer should be between [0, TableSize-1] A hash function can result in a many-to-one mapping (causing collision)(causing collision) Collision occurs when hash function maps two or more keys to same array indexC lli i t b id d b t it h bCollisions cannot be avoided but its chances can be reduced using a “good” hash function8Cpt S 223. School of EECS, WSUh(key) ==> hash table indexHash Function Properties A “good” hash function should have the properties:properties:1. Reduced chance of collisionDifferent keys should ideally map to differentDifferent keys should ideally map to different indicesDistribute keys uniformly over table 2. Should be fast to compute99Cpt S 223. School of EECS, WSUHash Function - Effective use of table size Simple hash function (assume integer keys) h(Key) = Key modTableSize For random keys, h() distributes keys evenly over tableover table What if TableSize = 100 and keys are ALL multiples of 10? Better if TableSize is a prime number10Cpt S 223. School of EECS, WSUDifferent Ways to Design a Hash Function for StringKeysA very simple function to map strings to integers: Add up character ASCII values (0-255) to produce integer keysinteger keys E.g., “abcd” = 97+98+99+100 = 394 ==> h(“abcd”) = 394 % TableSize Potential problems:Potential problems: Anagrams will map to the same index h(“abcd”) == h(“dbac”) Small strings may not use all of table Strlen(S) * 255 < TableSizeTime proportional to length of the stringTime proportional to length of the string11Cpt S 223. School of EECS, WSUDifferent Ways to Design a Hash Function for StringKeys Approach 2 Treat first 3 characters of string as base-27 integer (26 letters plus space)letters plus space) Key = S[0] + (27 * S[1]) + (272 * S[2]) Better than approach 1 because … ?Potential problems: Assumes first 3 characters randomly distributed Not true of EnglishAppleApplyAppointmentcollision12ppApricot12Cpt S 223. School of EECS, WSUDifferent Ways to Design a Hash Function for StringKeys Approach 3Use all N characters of string as an N-digit base-K numberg Choose K to be prime number larger than number of different digits (characters) I.e., K = 29, 31, 37 If L = length of string S, thenL1 Use Horner’s rule to compute h(S)Li it L f l t iTableSizeiLSShLiimod37]1[)(10Problems:potential overflow Limit L for long strings13larger runtimeCpt S 223. School of EECS, WSU“Collision resolution techniques”Thi tDlithqTechniques to Deal with CollisionsCollisionsChainingOpen addressingOpen addressingDouble hashingEtcEtc.Cpt S 223. School of EECS, WSU 14Resolving Collisions What happens when h(k1) = h(k2)?==> collision !> collision ! Collision resolution strategiesChainingChaining Store colliding keys in a linked list at the same hash table indexhash table indexOpen addressing Store colliding keys elsewhere in the tablegy15Cpt S 223. School of EECS, WSUCh i iChainingCollision resolution technique #116Cpt S 223. School of EECS, WSUChaining strategy: maintains a linked list atChaining strategy: maintains a linked list at every hash index for collided elementsInsertion sequence: { 0149162536496481} Hash table T is a vector of linked listsInsertion sequence: { 0 1 4 9 16 25 36 49 64 81 } Insert element at the head (as shown here) or at the tailKey k is stored in list atKey k is stored in list at T[h(k)] E.g., TableSize = 10g h(k) = k mod 10 Insert first 10 perfect squaressquares17Cpt S 223. School of EECS, WSUImplementation of Chaining Hash TableVector of linked lists(this is the main hashtable)Current #elements in the hashtableHash functions for it d tithe hashtable18integers and string keysCpt S 223. School of EECS, WSUImplementation of Chaining Hash TableThis is the hashtable’s current capacitycurrent capacity (aka. “table size”)This is the hash table index for the element x19Cpt S 223. School of EECS, WSUxDuplicate checkLater, but essentially resizes the hashtable if its getting crowded20Cpt S 223. School of EECS, WSUEach of theseEach of these operations takes time linear in the length of the list at the hashed 21index locationCpt S 223. School of EECS, WSUAll hash objects must define == and != operators.Hash function to handle Employee
View Full Document