HashingOverviewHash TableHash TableHash FunctionHash FunctionHash Function for String KeysHash Function for String KeysCollision ResolutionCollision Resolution by ChainingImplementation of Chaining Hash TableImplementation of Chaining Hash TableSlide Number 13Slide Number 14Slide Number 15Collision Resolution by Chaining: AnalysisCollision Resolution by Open AddressingCollision Resolution by Open AddressingLinear ProbingLinear Probing ExampleLinear Probing: AnalysisLinear Probing: AnalysisRandom Probing: AnalysisLinear vs. Random ProbingQuadratic ProbingQuadratic Probing ExampleQuadratic Probing: AnalysisQuadratic ProbingQuadratic Probing: ImplementationQuadratic Probing: ImplementationQuadratic Probing: ImplementationQuadratic Probing: ImplementationQuadratic Probing: ImplementationDouble HashingDouble Hashing ExampleDouble Hashing: AnalysisRehashingRehashing ExampleRehashing AnalysisRehashing ImplementationRehashing for ChainingRehashing for Quadratic ProbingHash Tables in C++ STLHash Set in SGI’s STLHash Map in SGI’s STLProblem with Large TablesExtendible HashingExtendible Hashing ExampleExtendible Hashing ExampleExtendible Hashing ExampleExtendible Hashing AnalysisHash Table ApplicationsSummary111111HashingCptS 223 – Advanced Data StructuresLarry HolderSchool of Electrical Engineering and Computer ScienceWashington State University22222Overview Hashing Technique supporting insertion, deletion and search in average-case constant time Operations requiring elements to be sorted (e.g., FindMin) are not efficiently supported Hash table ADT Implementations Analysis ApplicationsHash Table One approach Hash table is an array of fixed size TableSize Array elements indexed by a key, which is mapped to an array index (0…TableSize-1) Mapping (hash function) h from key to index E.g., h(“john”) = 33Hash Table Insert T [h(“john”] = <“john”,25000> Delete T [h(“john”)] = NULL Search Return T [h(“john”)] What if h(“john”) = h(“joe”) ?4Hash Function Mapping from key to array index is called a hash function Typically, many-to-one mapping Different keys map to different indices Distributes keys evenly over table Collision occurs when hash function maps two keys to same array index5Hash Function Simple hash function h(Key) = Key mod TableSize Assumes integer keys For random keys, h() distributes keys evenly over table What if TableSize = 100 and keys are multiples of 10? Better if TableSize is a prime number Not too close to powers of 2 or 106Hash Function for String Keys Approach 1 Add up character ASCII values (0-127) to produce integer keys Small strings may not use all of table Strlen(S) * 127 < TableSize Approach 2 Treat first 3 characters of string as base-27 integer (26 letters plus space) Key = S[0] + (27 * S[1]) + (272 * S[2]) Assumes first 3 characters randomly distributed Not true of English7Hash Function for String Keys Approach 3 Use all N characters of string as an N-digit base-K integer Choose K to be prime number larger than number of different digits (characters) I.e., K = 29, 31, 37 If L = length of string S, then Use Horner’s rule to compute h(S) Limit L for long strings8TableSizeiLSShLiimod37]1[)(10⎥⎦⎤⎢⎣⎡∗−−=∑−=Collision Resolution What happens when h(k1) = h(k2)? Collision resolution strategies Chaining Store colliding keys in a linked list Open addressing Store colliding keys elsewhere in the table9Collision Resolution by Chaining Hash table T is a vector of lists Only singly-linked lists needed if memory is tight Key k is stored in list at T[h(k)] E.g., TableSize = 10 h(k) = k mod 10 Insert first 10 perfect squares10Implementation of Chaining Hash Table11Generic hash functions for integers and keysImplementation of Chaining Hash Table1213Each of these operations takes time linear in the length of the list.STL algorithm: find14Later, but essentially doubles size of table and reinserts current elements.No duplicates15All hash objects must define == and != operators.Hash function to handle Employee object typeCollision Resolution by Chaining: Analysis Load factor λ of a hash table T N = number of elements in T M = size of T λ = N/M Average length of a chain is λ Unsuccessful search O(λ) Successful search O(λ/2) Ideally, want λ≈ 1 (not a function of N) I.e., TableSize = number of elements you expect to store in the table16Collision Resolution byOpen Addressing When a collision occurs, look elsewhere in the table for an empty slot Advantages over chaining No need for addition list structures No need to allocate/deallocate memory during insertion/deletion (slow) Disadvantages Slower insertion – May need several attempts to find an empty slot Table needs to be bigger (than chaining-based table) to achieve average-case constant-time performance Load factor λ≈0.517Collision Resolution byOpen Addressing Probe sequence Sequence of slots in hash table to search h0(x), h1(x), h2(x), … Needs to visit each slot exactly once Needs to be repeatable (so we can find/delete what we’ve inserted) Hash function hi(x) = (h(x) + f(i)) mod TableSize f(0) = 018Linear Probing f(i) is a linear function of i E.g., f(i) = i Example: h(x) = x mod TableSize h0(89) = (h(89)+f(0)) mod 10 = 9 h0(18) = (h(18)+f(0)) mod 10 = 8 h0(49) = (h(49)+f(0)) mod 10 = 9 (X) h1(49) = (h(49)+f(1)) mod 10 = 019Linear Probing Example20Linear Probing: Analysis Probe sequences can get long Primary clustering Keys tend to cluster in one part of table Keys that hash into cluster will be added to the end of the cluster (making it even bigger)21Linear Probing: Analysis Expected number of probes for insertion or unsuccessful search Expected number of probes for successful search Example (λ = 0.5) Insert / unsuccessful search 2.5 probes Successful search 1.5 probes Example (λ = 0.9) Insert / unsuccessful search 50.5 probes Successful search 5.5 probes22⎟⎟⎠⎞⎜⎜⎝⎛−+2)1(1121λ⎟⎟⎠⎞⎜⎜⎝⎛−+)1(1121λRandom Probing: Analysis Random probing does not suffer from clustering Expected number of probes for insertion or unsuccessful search: Example λ = 0.5: 1.4 probes λ = 0.9: 2.6
View Full Document