DOC PREVIEW
Princeton COS 226 - Hashing

This preview shows page 1-2 out of 7 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Robert Sedgewick and Kevin Wayne • Copyright © 2005 • http://www.Princeton.EDU/~cos2264.2 Hashing2Optimize JudiciouslyReference: Effective Java by Joshua Bloch."More computing sins are committed in the name of efficiency (withoutnecessarily achieving it) than for any other single reason - includingblind stupidity." - William A. Wulf"We should forget about small efficiencies, say about 97% of the time:premature optimization is the root of all evil." - Donald E. Knuth"We follow two rules in the matter of optimization: Rule 1: Don't do it. Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution." - M. A. Jackson3Hashing: Basic Plan.Save items in a key-indexed table. Index is a function of the key.Hash function. Method for computing table index from key.Collision resolution strategy. Algorithm and data structure to handletwo keys that hash to the same index.Classic space-time tradeoff.! No space limitation: trivial hash function with key as address.! No time limitation: trivial collision resolution = sequential search.! Limitations on both time and space: hashing (the real world).4Choosing a Good Hash FunctionGoal: scramble the keys.! Efficiently computable.! Each table position equally likely for each key.Ex: Social Security numbers.! Bad: first three digits.! Better: last three digits.Ex: date of birth.! Bad: birth year.! Better: birthday.Ex: phone numbers.! Bad: first three digits.! Better: last three digits.573 = California, 574 = Alaskaassigned in chronological order within agiven geographic regionthoroughly researched problem5Hash Function: String KeysJava string library hash functions.! Equivalent to h = 31L-1s0 + . . . + 312sL-3 + 31sL-2 + sL-1.! Horner's method to hash string of length L: O(L).Q. Can we reliably use (h % M) as index for table of size M?A. No. Instead, use (h & 0x7fffffff) % M.public int hashCode() { int hash = 0; for (int i = 0; i < length(); i++) hash = (31 * hash) + s[i]; return hash;}s = "call";h = s.hashCode();hash = h % M;304598271218191ith character of s6hashCodeHash code. For any references x and y:! Repeated calls to x.hashCode() must return the same valueprovided no information used in equals comparison has changed.! If x.equals(y) then x and y must have the same hash code.Default implementation: memory address of x.Customized implementations: String, URL, Integer, Date."consistent with equals"7Implementing HashCode: US Phone NumbersPhone numbers: (609) 867-5309.area code exchangeextensionpublic final class PhoneNumber { private final int area, exch, ext; public PhoneNumber(int area, int exch, int ext) { this.area = area; this.exch = exch; this.ext = ext; } public boolean equals(Object y) { // as before } public int hashCode() { return 10007 * (area + 1009 * exch) + ext; }}8CollisionsCollision = two keys hashing to same value.! Essentially unavoidable.! Birthday problem: how many people will have to enter a room untiltwo have the same birthday? 23! With M hash values, expect a collision after sqrt(! ! M) insertions.Conclusion: can't avoid collisions unlessyou have a ridiculous amount of memory.Challenge: efficiently cope with collisions.9Collision Resolution: Two Approaches.Separate chaining.! M much smaller than N.! " N / M keys per table position.! Put keys that collide in a list.! Need to search lists.Open addressing.! M much larger than N.! Plenty of empty table slots.! When a new key collides, find nextempty slot and put it there.! Complex collision patterns.jocularly seriouslylistenbrowsingst[0]st[1]st[2]st[8190]suburban untravelledst[3]consideratingnullM = 8191, N = 15000listenbrowsingst[2]st[3]st[4]st[30001]suburbanst[5]nulljocularlyseriouslyst[0]st[1]nullM = 30001, N = 1500010Separate ChainingSeparate chaining: array of M linked lists.! Hash: map key to integer i between 0 and M-1.! Insert: put at front of ith chain (if not already there).! Search: only need to search ith chain.! Running time: proportional to length of chain.jocularly seriouslylistenbrowsingst[0]st[1]st[2]st[8190]3untravelled3suburban5017ishmael0seriously.. . .34807121hashmecallkeysuburban untravelledst[3] consideratingnullM = 819111Symbol Table: Separate Chainingpublic class ListHashST<Key, Value> { private int M = 8191; private Node[] st = new Node[M]; private static class Node { Object key; Object val; Node next; Node(Object key, Object val, Node next) { this.key = key; this.val = val; this.next = next; } } private int hash(Key key) { return (key.hashCode() & 0x7fffffff) % M; }between 0 and M-1hexno generic array creation in Java12Symbol Table: Separate Chaining Implementation (cont)public void put(Key key, Value val) { int i = hash(key); for (Node x = st[i]; x != null; x = x.next) { if (key.equals(x.key)) { x.val = val; return; } } st[i] = new Node(k, val, st[i]);} public Value get(Key key) { int i = hash(k); for (Node x = st[i]; x != null; x = x.next) if (key.equals(x.key)) return (Value) x.val; return null;}check if key already presentinsert at front of chain13Separate Chaining PerformanceSeparate chaining performance.! Search cost is proportional to length of chain.! Trivial: average length = N / M.! Worst case: all keys hash to same chain.Theorem. Let # = N / M > 1 be average length of list. For any t > 1,probability that list length > t # is exponentially small in t.Parameters.! M too large $ too many empty chains.! M too small $ chains too long.! Typical choice: # = N / M " 10 $ constant-time search/insert.depends on hash map being random map14Advantages: fast insertion, fast search.Disadvantage: hash table has fixed size.Sorted arrayImplementationUnsorted listlog NSearchNNInsertNlog NSearchN / 2N / 2InsertNN / 2DeleteN / 2Worst Case Average CaseNDeleteNSeparate chaining N N 1* 1* 1*N* assumes hash function is randomSymbol Table: Implementations Cost Summaryfix: use repeated doubling, and rehash all keys15Linear ProbingLinear probing: array of size M.! Hash: map key to integer i between 0 and M-1.! Insert: put in slot i if free, if not try i+1, i+2, etc.! Search: search slot i, if occupied but no match, try i+1, i+2, etc.Cluster.! Contiguous block of items.! Search through cluster using elementary algorithm for arrays.typically twice as many slots as elementsA S E A R C H I N G X M P16Symbol Table: Linear Probing Implementationpublic class ArrayHashST<Key, Val> {


View Full Document

Princeton COS 226 - Hashing

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download Hashing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?