Robert Sedgewick and Kevin Wayne • Copyright © 2005 • http://www.Princeton.EDU/~cos2264.2 Hashing2Optimize JudiciouslyReference: Effective Java by Joshua Bloch."More computing sins are committed in the name of efficiency (withoutnecessarily achieving it) than for any other single reason - includingblind stupidity." - William A. Wulf"We should forget about small efficiencies, say about 97% of the time:premature optimization is the root of all evil." - Donald E. Knuth"We follow two rules in the matter of optimization: Rule 1: Don't do it. Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution." - M. A. Jackson3Hashing: Basic Plan.Save items in a key-indexed table. Index is a function of the key.Hash function. Method for computing table index from key.Collision resolution strategy. Algorithm and data structure to handletwo keys that hash to the same index.Classic space-time tradeoff.! No space limitation: trivial hash function with key as address.! No time limitation: trivial collision resolution = sequential search.! Limitations on both time and space: hashing (the real world).4Choosing a Good Hash FunctionGoal: scramble the keys.! Efficiently computable.! Each table position equally likely for each key.Ex: Social Security numbers.! Bad: first three digits.! Better: last three digits.Ex: date of birth.! Bad: birth year.! Better: birthday.Ex: phone numbers.! Bad: first three digits.! Better: last three digits.573 = California, 574 = Alaskaassigned in chronological order within agiven geographic regionthoroughly researched problem5Hash Function: String KeysJava string library hash functions.! Equivalent to h = 31L-1s0 + . . . + 312sL-3 + 31sL-2 + sL-1.! Horner's method to hash string of length L: O(L).Q. Can we reliably use (h % M) as index for table of size M?A. No. Instead, use (h & 0x7fffffff) % M.public int hashCode() { int hash = 0; for (int i = 0; i < length(); i++) hash = (31 * hash) + s[i]; return hash;}s = "call";h = s.hashCode();hash = h % M;304598271218191ith character of s6hashCodeHash code. For any references x and y:! Repeated calls to x.hashCode() must return the same valueprovided no information used in equals comparison has changed.! If x.equals(y) then x and y must have the same hash code.Default implementation: memory address of x.Customized implementations: String, URL, Integer, Date."consistent with equals"7Implementing HashCode: US Phone NumbersPhone numbers: (609) 867-5309.area code exchangeextensionpublic final class PhoneNumber { private final int area, exch, ext; public PhoneNumber(int area, int exch, int ext) { this.area = area; this.exch = exch; this.ext = ext; } public boolean equals(Object y) { // as before } public int hashCode() { return 10007 * (area + 1009 * exch) + ext; }}8CollisionsCollision = two keys hashing to same value.! Essentially unavoidable.! Birthday problem: how many people will have to enter a room untiltwo have the same birthday? 23! With M hash values, expect a collision after sqrt(! ! M) insertions.Conclusion: can't avoid collisions unlessyou have a ridiculous amount of memory.Challenge: efficiently cope with collisions.9Collision Resolution: Two Approaches.Separate chaining.! M much smaller than N.! " N / M keys per table position.! Put keys that collide in a list.! Need to search lists.Open addressing.! M much larger than N.! Plenty of empty table slots.! When a new key collides, find nextempty slot and put it there.! Complex collision patterns.jocularly seriouslylistenbrowsingst[0]st[1]st[2]st[8190]suburban untravelledst[3]consideratingnullM = 8191, N = 15000listenbrowsingst[2]st[3]st[4]st[30001]suburbanst[5]nulljocularlyseriouslyst[0]st[1]nullM = 30001, N = 1500010Separate ChainingSeparate chaining: array of M linked lists.! Hash: map key to integer i between 0 and M-1.! Insert: put at front of ith chain (if not already there).! Search: only need to search ith chain.! Running time: proportional to length of chain.jocularly seriouslylistenbrowsingst[0]st[1]st[2]st[8190]3untravelled3suburban5017ishmael0seriously.. . .34807121hashmecallkeysuburban untravelledst[3] consideratingnullM = 819111Symbol Table: Separate Chainingpublic class ListHashST<Key, Value> { private int M = 8191; private Node[] st = new Node[M]; private static class Node { Object key; Object val; Node next; Node(Object key, Object val, Node next) { this.key = key; this.val = val; this.next = next; } } private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; }between 0 and M-1hexno generic array creation in Java12Symbol Table: Separate Chaining Implementation (cont)public void put(Key key, Value val) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) { if (key.equals(x.key)) { x.val = val; return; } } st[i] = new Node(k, val, st[i]);} public Value get(Key key) { int i = hash(k); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) return (Value) x.val; return null;}check if key already presentinsert at front of chain13Separate Chaining PerformanceSeparate chaining performance.! Search cost is proportional to length of chain.! Trivial: average length = N / M.! Worst case: all keys hash to same chain.Theorem. Let # = N / M > 1 be average length of list. For any t > 1,probability that list length > t # is exponentially small in t.Parameters.! M too large $ too many empty chains.! M too small $ chains too long.! Typical choice: # = N / M " 10 $ constant-time search/insert.depends on hash map being random map14Advantages: fast insertion, fast search.Disadvantage: hash table has fixed size.Sorted arrayImplementationUnsorted listlog NSearchNNInsertNlog NSearchN / 2N / 2InsertNN / 2DeleteN / 2Worst Case Average CaseNDeleteNSeparate chaining N N 1* 1* 1*N* assumes hash function is randomSymbol Table: Implementations Cost Summaryfix: use repeated doubling, and rehash all keys15Linear ProbingLinear probing: array of size M.! Hash: map key to integer i between 0 and M-1.! Insert: put in slot i if free, if not try i+1, i+2, etc.! Search: search slot i, if occupied but no match, try i+1, i+2, etc.Cluster.! Contiguous block of items.! Search through cluster using elementary algorithm for arrays.typically twice as many slots as elementsA S E A R C H I N G X M P16Symbol Table: Linear Probing Implementationpublic class ArrayHashST<Key, Val> {
View Full Document