111111Hashing22222OverviewHashingTechnique supporting insertion, deletion and search in average-case constant timeOperations requiring elements to be sorted (e.g., FindMin) are not efficiently supportedHash table ADTImplementationsAnalysisApplicationsHash TableOne approachHash table is an array of fixed size TableSizeArray elements indexed by a key, which is mapped to an array index (0…TableSize-1)Mapping (hash function) h from key to indexE.g., h(“john”) = 33Hash TableInsertT [h(“john”] = <“john”,25000>DeleteT [h(“john”)] = NULLSearchReturn T [h(“john”)]What if h(“john”) = h(“joe”) ?4Hash FunctionMapping from key to array index is called a hash functionTypically, many-to-one mappingDifferent keys map to different indicesDistributes keys evenly over tableCollision occurs when hash function maps two keys to same array index5Hash FunctionSimple hash functionh(Key) = Key mod TableSizeAssumes integer keysFor random keys, h() distributes keys evenly over tableWhat if TableSize = 100 and keys are multiples of 10?Better if TableSize is a prime numberNot too close to powers of 2 or 106Hash Function for String KeysApproach 1Add up character ASCII values (0-127) to produce integer keysSmall strings may not use all of tableStrlen(S) * 127 < TableSizeAnagrams will map to the same indexApproach 2Treat first 3 characters of string as base-27 integer (26 letters plus space)Key = S[0] + (27 * S[1]) + (272 * S[2])Assumes first 3 characters randomly distributedNot true of English7Hash Function for String KeysApproach 3Use all N characters of string as an N-digit base-K integerChoose K to be prime number larger than number of different digits (characters)I.e., K = 29, 31, 37If L = length of string S, thenUse Horner’s rule to compute h(S)Limit L for long strings8TableSizeiLSShLiimod37]1[)(10∗−−=∑−=Collision ResolutionWhat happens when h(k1) = h(k2)?Collision resolution strategiesChainingStore colliding keys in a linked listOpen addressingStore colliding keys elsewhere in the table9ChainingCollision resolution approach #1Collision Resolution by ChainingHash table T is a vector of listsOnly singly-linked lists needed if memory is tightKey k is stored in list at T[h(k)]E.g., TableSize = 10h(k) = k mod 10Insert first 10 perfect squares11Implementation of Chaining Hash Table12Generic hash functions for integers and keysImplementation of Chaining Hash Table1314Each of these operations takes time linear in the length of the list.15Later, but essentially doubles size of list and reinserts current elements.16All hash objects must define == and != operators.Hash function to handle Employee object typeCollision Resolution by Chaining: AnalysisLoad factor λ of a hash table TN = number of elements in TM = size of Tλ = N/MAverage length of a chain is λUnsuccessful search O(λ)Successful search O(λ/2)Ideally, want λ ≈ 1 (not a function of N)I.e., TableSize = number of elements you expect to store in the table17Open AddressingCollision resolution approach #2Collision Resolution byOpen AddressingWhen a collision occurs, look elsewhere in the table for an empty slotAdvantages over chainingNo need for addition list structuresNo need to allocate/deallocate memory during insertion/deletion (slow)DisadvantagesSlower insertion – May need several attempts to find an empty slotTable needs to be bigger (than chaining-based table) to achieve average-case constant-time performanceLoad factor λ ≈ 0.519Collision Resolution byOpen AddressingProbe sequenceSequence of slots in hash table to searchh0(x), h1(x), h2(x), …Needs to visit each slot exactly onceNeeds to be repeatable (so we can find/delete what we’ve inserted)Hash functionhi(x) = (h(x) + f(i)) mod TableSizef(0) = 020Linear Probingf(i) is a linear function of iE.g., f(i) = iExample: h(x) = x mod TableSizeh0(89) = (h(89)+f(0)) mod 10 = 9h0(18) = (h(18)+f(0)) mod 10 = 8h0(49) = (h(49)+f(0)) mod 10 = 9 (X)h1(49) = (h(49)+f(1)) mod 10 = 021Linear Probing Example22Linear Probing: AnalysisProbe sequences can get longPrimary clusteringKeys tend to cluster in one part of tableKeys that hash into cluster will be added to the end of the cluster (making it even bigger)23Linear Probing: AnalysisExpected number of probes for insertion or unsuccessful searchExpected number of probes for successful searchExample (λ = 0.5)Insert / unsuccessful search2.5 probesSuccessful search1.5 probesExample (λ = 0.9)Insert / unsuccessful search50.5 probesSuccessful search5.5 probes24−+2)1(1121λ−+)1(1121λRandom Probing: AnalysisRandom probing does not suffer from clusteringExpected number of probes for insertion or unsuccessful search:Exampleλ = 0.5: 1.4 probesλ = 0.9: 2.6 probes25λλ−11ln1Linear vs. Random Probing26Load factor λ# probesLinear probingRandom probingQuadratic ProbingAvoids primary clusteringf(i) is quadratic in iE.g., f(i) = i2Exampleh0(58) = (h(58)+f(0)) mod 10 = 8 (X)h1(58) = (h(58)+f(1)) mod 10 = 9 (X)h2(58) = (h(58)+f(2)) mod 10 = 227Quadratic Probing Example28Quadratic Probing: AnalysisDifficult to analyzeTheorem 5.1New element can always be inserted into a table that is at least half empty and TableSize is primeOtherwise, may never find an empty slot, even is one existsEnsure table never gets half fullIf close, then expand it29Quadratic ProbingOnly M (TableSize) different probe sequencesMay cause “secondary clustering”DeletionEmptying slots can break probe sequenceLazy deletionDifferentiate between empty and deleted slotSkip deleted slotsSlows operations (effectively increases λ)30Quadratic Probing: Implementation31Quadratic Probing: Implementation32Lazy deletionQuadratic Probing: Implementation33Ensure table size is primeQuadratic Probing: Implementation34Quadratic probe sequence (really)FindSkip DELETED;No duplicatesQuadratic Probing: Implementation35InsertRemoveNo deallocationneededNo duplicatesDouble HashingCombine two different hash functionsf(i) = i * h2(x)Good choices for h2(x) ?Should never evaluate to 0h2(x) = R – (x mod
View Full Document