Hash Tables

Many of the slides are from Prof. Plaisted's resources at the University of North Carolina at Chapel Hill.

Dictionary

Dictionary:
» Dynamic-set data structure for storing items indexed using keys.
» Supports the operations Insert, Search, and Delete.
» Applications:
  • Symbol table of a compiler.
  • Memory-management tables in operating systems.
  • Large-scale distributed systems.
Hash tables:
» An effective way of implementing dictionaries.
» A generalization of ordinary arrays.

Direct-address Tables

Direct-address tables are ordinary arrays.
They facilitate direct addressing:
» The element whose key is k is obtained by indexing into the kth position of the array.
Applicable when we can afford to allocate an array with one position for every possible key:
» i.e., when the universe of keys U is small.
Dictionary operations can be implemented to take O(1) time.
» Details in Sec. 11.1.

Copyright © The McGraw-Hill Companies, Inc.
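The direct-addressing idea above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides; the class and names are made up, and the universe size is an assumption chosen small enough that one slot per possible key is affordable.

```python
# Direct-address table sketch (illustrative): one slot per possible key
# in a small universe U = {0, 1, ..., U_SIZE - 1}, so Insert, Search,
# and Delete are all plain array indexing, i.e. O(1) time.

U_SIZE = 100  # assumed small universe of keys

class DirectAddressTable:
    def __init__(self, universe_size=U_SIZE):
        # One slot for every possible key; empty slots hold None.
        self.slots = [None] * universe_size

    def insert(self, key, value):
        self.slots[key] = value      # O(1)

    def search(self, key):
        return self.slots[key]       # O(1); None if the key is absent

    def delete(self, key):
        self.slots[key] = None       # O(1)

table = DirectAddressTable()
table.insert(42, "answer")
print(table.search(42))   # -> answer
```

The cost of the O(1) guarantee is space proportional to |U|, which is exactly why the slides turn to hash tables when U is large.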
Permission required for reproduction or display.

Hash Tables

Notation:
» U – universe of all possible keys.
» K – set of keys actually stored in the dictionary.
» |K| = n.
When U is very large:
» Arrays are not practical.
» |K| << |U|.
Use a table of size proportional to |K| – a hash table.
» However, we lose the direct-addressing ability.
» Define functions that map keys to slots of the hash table.

Hashing

Hash function h: a mapping from U to the slots of a hash table T[0..m–1].
  h : U → {0, 1, …, m–1}
With arrays, key k maps to slot A[k].
With hash tables, key k maps, or "hashes", to slot T[h(k)].
h(k) is the hash value of key k.

[Figure: keys k1, …, k5 from the universe U hash into slots 0..m–1 of the table; h(k2) = h(k5) – a collision.]

Issues with Hashing

Multiple keys can hash to the same slot – collisions are possible.
» Design hash functions so that collisions are minimized.
» But avoiding collisions entirely is impossible.
  • Design collision-resolution techniques.
Search will cost Θ(n) time in the worst case.
» However, all operations can be made to have an expected complexity of Θ(1).

Methods of Resolution

Chaining:
» Store all elements that hash to the same slot in a linked list.
» Store a pointer to the head of the linked list in the hash table slot.
Open addressing:
» All elements are stored in the hash table itself.
» When collisions occur, use a systematic (consistent) procedure to store elements in free slots of the table.

Collision Resolution by Chaining

[Figure: colliding keys share a chain – h(k1) = h(k4), h(k2) = h(k5) = h(k6), h(k3) = h(k7) – so those slots hold linked lists of their colliding keys, while k8 occupies a slot by itself.]

Hashing with Chaining

Dictionary operations:
Chained-Hash-Insert(T, x)
» Insert x at the head of the list T[h(key[x])].
» Worst-case complexity – O(1).
Chained-Hash-Delete(T, x)
» Delete x from the list T[h(key[x])].
» Worst-case complexity – proportional to the length of the list with singly-linked lists;
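The chained dictionary operations can be sketched in Python. This is a toy illustration under assumptions not in the slides: Python lists stand in for the linked chains (a real linked list would make the head insertion O(1); `list.insert(0, ...)` is linear), and Python's built-in `hash` modulo m stands in for the hash function h.

```python
# Hashing-with-chaining sketch (illustrative names and hash function).
# Each of the m slots holds a chain; insert prepends to the chain,
# while delete and search scan the chain, matching the costs above.

class ChainedHashTable:
    def __init__(self, m=8):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def _h(self, key):
        return hash(key) % self.m  # assumed stand-in for h(k)

    def insert(self, key, value):
        # Insert at the head of the chain, as Chained-Hash-Insert does.
        self.slots[self._h(key)].insert(0, (key, value))

    def delete(self, key):
        # Proportional to chain length (singly-linked-list behaviour).
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

    def search(self, key):
        # Proportional to chain length.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None
```

With doubly-linked chains, delete of an already-located node x avoids the scan entirely, which is where the O(1) deletion bound comes from.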
O(1) with doubly-linked lists.
Chained-Hash-Search(T, k)
» Search for an element with key k in the list T[h(k)].
» Worst-case complexity – proportional to the length of the list.

Analysis of Chained-Hash-Search

Load factor α = n/m = average number of keys per slot.
» m – number of slots.
» n – number of elements stored in the hash table.
Worst-case complexity: Θ(n) + time to compute h(k).
The average case depends on how well h distributes the keys among the m slots.
Assume:
» Simple uniform hashing.
  • Any key is equally likely to hash into any of the m slots, independent of where any other key hashes to.
» O(1) time to compute h(k).
The time to search for an element with key k is Θ(|T[h(k)]|).
Expected length of a linked list = load factor = α = n/m.

Expected Cost of an Unsuccessful Search

Theorem: An unsuccessful search takes expected time Θ(1+α).
Proof:
Any key not already in the table is equally likely to hash to any of the m slots.
To search unsuccessfully for any key k, we need to search to the end of the list T[h(k)], whose expected length is α.
Adding the time to compute the hash function, the total time required is Θ(1+α).

Expected Cost of a Successful Search

Proof:
The probability that a list is searched is proportional to the number of elements it contains.
Assume that the element being searched for is equally likely to be any of the n elements in the table.
The number of elements examined during a successful search for an element x is 1 more than the number of elements that appear before x in x's list.
» These are the elements inserted after x was inserted.
Goal:
» Find the average, over the n elements x in the table, of how many elements were inserted into x's list after x was inserted.
Theorem: A successful search takes expected time Θ(1+α).

Expected Cost of a Successful Search

Proof (contd.):
Let x_i denote the ith element inserted into the table, and let k_i = key[x_i].
Define indicator random variables X_ij = I{h(k_i) = h(k_j)}, for all i, j.
Simple uniform hashing ⇒ Pr{h(k_i) = h(k_j)} = 1/m ⇒ E[X_ij] = 1/m.
The expected number of elements examined in a successful search (1 plus the number of elements inserted after x_i into the same slot as x_i, averaged over all i) is

  E[ (1/n) · Σ_{i=1..n} ( 1 + Σ_{j=i+1..n} X_ij ) ]
    = (1/n) · Σ_{i=1..n} ( 1 + Σ_{j=i+1..n} E[X_ij] )    (linearity of expectation)
    = (1/n) · Σ_{i=1..n} ( 1 + Σ_{j=i+1..n} 1/m )
    = 1 + (1/(nm)) · Σ_{i=1..n} (n − i)
    = 1 + (1/(nm)) · ( n² − n(n+1)/2 )
    = 1 + (n − 1)/(2m)
    = 1 + α/2 − α/(2n).

Adding the time to compute the hash function, the total expected time for a successful search is Θ(2 + α/2 − α/(2n)) = Θ(1 + α).
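The 1 + α/2 − α/(2n) figure lends itself to an empirical check. The following Python sketch (parameters and function name are assumptions, not from the slides) simulates simple uniform hashing by assigning each of n keys to one of m slots uniformly at random, inserts at the head of each chain as Chained-Hash-Insert does, and then averages, over every stored key, the number of elements examined in a successful search for it.

```python
# Empirical check of the expected successful-search cost 1 + α/2 − α/(2n)
# under (simulated) simple uniform hashing with chaining.

import random

def avg_successful_probes(n, m, seed=1):
    random.seed(seed)  # deterministic for reproducibility
    slots = [[] for _ in range(m)]
    for k in range(n):
        h = random.randrange(m)   # each key equally likely in any slot
        slots[h].insert(0, k)     # insert at head of the chain
    # Elements examined when searching for key k = its 1-based
    # position in its chain (everything inserted after k comes first).
    total = 0
    for chain in slots:
        for i, _ in enumerate(chain):
            total += i + 1
    return total / n

n, m = 10000, 5000
alpha = n / m
predicted = 1 + alpha / 2 - alpha / (2 * n)
measured = avg_successful_probes(n, m)
print(predicted, measured)  # the two values should agree closely
```

With α = 2 the prediction is just under 2 probes per successful search, and the simulation lands close to it; the agreement tightens as n grows, since the variance of the average shrinks.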