Princeton COS 226 - Hashing - D2216579

Home> Schools> Princeton University> Computer Science (COS) > COS 226> Hashing

DOC PREVIEW

Princeton COS 226 - Hashing

School name Princeton University

Course Cos 226- Operating Systems

Pages 7

This preview shows page 1-2 out of 7 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 7 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Robert Sedgewick and Kevin Wayne • Copyright © 2005 • http://www.Princeton.EDU/~cos2264.2 Hashing2Optimize JudiciouslyReference: Effective Java by Joshua Bloch."More computing sins are committed in the name of efficiency (withoutnecessarily achieving it) than for any other single reason - includingblind stupidity." - William A. Wulf"We should forget about small efficiencies, say about 97% of the time:premature optimization is the root of all evil." - Donald E. Knuth"We follow two rules in the matter of optimization: Rule 1: Don't do it. Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution." - M. A. Jackson3Hashing: Basic Plan.Save items in a key-indexed table. Index is a function of the key.Hash function. Method for computing table index from key.Collision resolution strategy. Algorithm and data structure to handletwo keys that hash to the same index.Classic space-time tradeoff.! No space limitation: trivial hash function with key as address.! No time limitation: trivial collision resolution = sequential search.! Limitations on both time and space: hashing (the real world).4Choosing a Good Hash FunctionGoal: scramble the keys.! Efficiently computable.! Each table position equally likely for each key.Ex: Social Security numbers.! Bad: first three digits.! Better: last three digits.Ex: date of birth.! Bad: birth year.! Better: birthday.Ex: phone numbers.! Bad: first three digits.! Better: last three digits.573 = California, 574 = Alaskaassigned in chronological order within agiven geographic regionthoroughly researched problem5Hash Function: String KeysJava string library hash functions.! Equivalent to h = 31L-1s0 + . . . + 312sL-3 + 31sL-2 + sL-1.! Horner's method to hash string of length L: O(L).Q. Can we reliably use (h % M) as index for table of size M?A. No. Instead, use (h & 0x7fffffff) % M.public int hashCode() { int hash = 0; for (int i = 0; i < length(); i++) hash = (31 * hash) + s[i]; return hash;}s = "call";h = s.hashCode();hash = h % M;304598271218191ith character of s6hashCodeHash code. For any references x and y:! Repeated calls to x.hashCode() must return the same valueprovided no information used in equals comparison has changed.! If x.equals(y) then x and y must have the same hash code.Default implementation: memory address of x.Customized implementations: String, URL, Integer, Date."consistent with equals"7Implementing HashCode: US Phone NumbersPhone numbers: (609) 867-5309.area code exchangeextensionpublic final class PhoneNumber { private final int area, exch, ext; public PhoneNumber(int area, int exch, int ext) { this.area = area; this.exch = exch; this.ext = ext; } public boolean equals(Object y) { // as before } public int hashCode() { return 10007 * (area + 1009 * exch) + ext; }}8CollisionsCollision = two keys hashing to same value.! Essentially unavoidable.! Birthday problem: how many people will have to enter a room untiltwo have the same birthday? 23! With M hash values, expect a collision after sqrt(! ! M) insertions.Conclusion: can't avoid collisions unlessyou have a ridiculous amount of memory.Challenge: efficiently cope with collisions.9Collision Resolution: Two Approaches.Separate chaining.! M much smaller than N.! " N / M keys per table position.! Put keys that collide in a list.! Need to search lists.Open addressing.! M much larger than N.! Plenty of empty table slots.! When a new key collides, find nextempty slot and put it there.! Complex collision patterns.jocularly seriouslylistenbrowsingst[0]st[1]st[2]st[8190]suburban untravelledst[3]consideratingnullM = 8191, N = 15000listenbrowsingst[2]st[3]st[4]st[30001]suburbanst[5]nulljocularlyseriouslyst[0]st[1]nullM = 30001, N = 1500010Separate ChainingSeparate chaining: array of M linked lists.! Hash: map key to integer i between 0 and M-1.! Insert: put at front of ith chain (if not already there).! Search: only need to search ith chain.! Running time: proportional to length of chain.jocularly seriouslylistenbrowsingst[0]st[1]st[2]st[8190]3untravelled3suburban5017ishmael0seriously.. . .34807121hashmecallkeysuburban untravelledst[3] consideratingnullM = 819111Symbol Table: Separate Chainingpublic class ListHashST<Key, Value> { private int M = 8191; private Node[] st = new Node[M]; private static class Node { Object key; Object val; Node next; Node(Object key, Object val, Node next) { this.key = key; this.val = val; this.next = next; } } private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; }between 0 and M-1hexno generic array creation in Java12Symbol Table: Separate Chaining Implementation (cont)public void put(Key key, Value val) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) { if (key.equals(x.key)) { x.val = val; return; } } st[i] = new Node(k, val, st[i]);} public Value get(Key key) { int i = hash(k); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) return (Value) x.val; return null;}check if key already presentinsert at front of chain13Separate Chaining PerformanceSeparate chaining performance.! Search cost is proportional to length of chain.! Trivial: average length = N / M.! Worst case: all keys hash to same chain.Theorem. Let # = N / M > 1 be average length of list. For any t > 1,probability that list length > t # is exponentially small in t.Parameters.! M too large $ too many empty chains.! M too small $ chains too long.! Typical choice: # = N / M " 10 $ constant-time search/insert.depends on hash map being random map14Advantages: fast insertion, fast search.Disadvantage: hash table has fixed size.Sorted arrayImplementationUnsorted listlog NSearchNNInsertNlog NSearchN / 2N / 2InsertNN / 2DeleteN / 2Worst Case Average CaseNDeleteNSeparate chaining N N 1* 1* 1*N* assumes hash function is randomSymbol Table: Implementations Cost Summaryfix: use repeated doubling, and rehash all keys15Linear ProbingLinear probing: array of size M.! Hash: map key to integer i between 0 and M-1.! Insert: put in slot i if free, if not try i+1, i+2, etc.! Search: search slot i, if occupied but no match, try i+1, i+2, etc.Cluster.! Contiguous block of items.! Search through cluster using elementary algorithm for arrays.typically twice as many slots as elementsA S E A R C H I N G X M P16Symbol Table: Linear Probing Implementationpublic class ArrayHashST<Key, Val> {

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 7 pages.

Princeton COS 226 - Hashing

Sign up for free to view:

Please select your school