DOC PREVIEW
Princeton COS 226 - Hash functions Separate chaining

This preview shows page 1-2 out of 7 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 7 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Princeton University • COS 226 • Algorithms and Data Structures • Spring 2004 • Kevin Wayne • http://www.Princeton.EDU/~cos226HashingReference: Chapter 14, Algorithms in Java, 3rdEdition, Robert Sedgewick.Hash functionsSeparate chainingLinear probingDouble hashing2Optimize JudiciouslyReference: Effective Javaby Joshua Bloch."More computing sins are committed in the name of efficiency (without necessarily achieving it) than for any other single reason - including blind stupidity." - William A. Wulf"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." - Donald E. Knuth"We follow two rules in the matter of optimization:Rule 1: Don't do it.Rule 2 (for experts only). Don't do it yet - that is, not until you havea perfectly clear and unoptimized solution."- M. A. Jackson3Hashing: Basic Plan.Save items in a key-indexed table. Index is a function of the key.Hash function. Method for computing table index from key.Collision resolution strategy. Algorithm and data structure to handle two keys that hash to the same index.Classic space-time tradeoff.n No space limitation: trivial hash function with key as address.n No time limitation: trivial collision resolution = sequential search.n Limitations on both time and space: hashing (the real world)4Choosing a Good Hash FunctionGoal: scramble the keys.n Each table position equally likely for each key.Ex: Social Security numbers.n Bad: first three digits.n Better: last three digits.Ex: date of birth.n Bad: first three digits of birth year.n Better: birthday.Ex: phone numbers.n Bad: first three digits.n Better: last three digits.573 = California, 574 = Alaskaassigned in chronological order within a given geographic regionthoroughly researched problem198 for all of youless collisions even with only366 possible values5Hash Function: String KeysStrings hash functions.n Java 1.1: calculation involving only 16 characters.n Java 1.2: calculation involving all characters.n Equivalent to h = 31N-1s0+ . . . + 312s2+ 31s1+ sN-1.n Can we use h % M as index for table of size M?Work to hash a string of length W.n W add, W multiply, 1 mod.n Note: reference Java implementation caches String hash codes.public int hashCode() {int hash = 0;for (int i = 0; i < length(); i++)hash = (31 * hash)+charAt(i);return hash;}String.javas = "call";h = s.hashCode();hash = h % M; 304598271218191Horner's method6CollisionsCollision = two keys hashing to same value.n Essentially unavoidable.n Birthday problem: how many people will have to enter a room until two have the same birthday? 23n With M hash values, expect a collision after sqrt(p M/2) insertions.Conclusion: can't avoid collisions unlessyou have a ridiculous amount of memory.Challenge: efficiently cope with collisions.7Collision Resolution.Two main approaches.Separate chaining.n M much smaller than N.n ~N / M keys per table position.n Put keys that collide in a list.n Need to search lists.Open addressing.n M much larger than N.n plenty of empty table slots.n When a new key collides, find nextempty slot and put it there.n Complex collision patterns.jocularly seriouslylistenbrowsingst[0]st[1]st[2]st[8190]suburban untravelledst[3]consideratingnullM = 8191, N = 15000 listenbrowsingst[2]st[3]st[4]st[8190]suburbanst[5]nulljocularlyseriouslyst[0]st[1]nullM = 30001, N = 15000 8Separate ChainingSeparate chaining: array of M linked lists.n Hash: map key to integer i between 0 and M-1.n Insert: put at front of ith chain.n Search: only need to search ithchain.jocularly seriouslylistenbrowsingst[0]st[1]st[2]st[8190]3untravelled3suburban5017ishmael0seriously.. . .34807121hashmecallkeysuburban untravelledst[3] consideratingnullM = 8191constant timeproportional to length of chain9Symbol Table: Hash Table Implementationpublic class SymbolTable {private int M = 8191;private List[] st = new List[M];private class List { AS BEFORE }public static int hash(String s, int M) {return (s.hashCode() & 0x7fffffff)% M;}void put(String k, Object val) {int i = hash(k, M);st[i]=new List(k, val, st[i]);} Object get(String k) {int i = hash(k, M);for (List x = st[i]; x != null; x = x.next)if (k.equals(x.key)) return x.value;return null;}}number of chains (8191 is prime)insert at front of ithchainexhaustively search ithchain for keybetween 0 and M-1hex10Hash Table Implementation: PerformanceAdvantages: fast insertion, fast search.Disadvantage: hash table has fixed size.Hash tables improves ALL symbol table clients.n Makes difference between practical solution and no solution.n Ex: Moby Dick now takes a few seconds instead of hours.% java DeDup < mobydick.txtmobydickhermanmelvillecallmeishmaelsomeyearsago. . .210,028 words16,834 distinctcorrected by doubling the size of the array and rehashing all of the key-value pairs11Separate Chaining PerformanceSeparate chaining performance.n Search cost is proportional to length of chain.n Trivial: average length = N / M.n Worst case: all keys hash to same chain.Theorem. Let a = N / M > 1 be average length of list. For any t > 1, probability that list length > t a is exponentially small in t.Parameters.n M too large Þ too many empty chains.n M too small Þ chains too long.n Typical choice: a = N / M ~ 10 Þ constant-time search/insert.depends on hash map being random map12Symbol Table: Implementations Cost SummarySorted arrayImplementationUnsorted listlog NSearchNNInsert1log NSearchN / 2N / 2Insert1N / 2Delete1Worst Case Average CaseNDelete1Hashing N 1 1* 1* 1*N* assumes hash function is random13Linear ProbingLinear probing: array of size M.n Hash: map key to integer i between 0 and M-1.n Insert: put in slot i if free, if not try i+1, i+2, etc.n Search: search slot i, if occupied but no match, try i+1, i+2, etc.Cluster.n Contiguous block of items.n Search through cluster using elementary algorithm for arrays.typically twice as many slots as elements14Linear Probing PerformanceLinear probing performance.n Insert and search cost depend on length of cluster.n Trivial: average length of cluster = a = N / M.n Worst case: all keys hash to same cluster.Theorem (Knuth, 1962). Let a = N / M < 1 be average length of list.Parameters.n M too large Þ too many empty array entries.n M too small Þ clusters coalesce.n Typical choice: M ~ 2N Þ constant-time search/insert.depends on hash map being random mapbut elements more likely to hash to big clusters15Double


View Full Document

Princeton COS 226 - Hash functions Separate chaining

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download Hash functions Separate chaining
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hash functions Separate chaining and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hash functions Separate chaining 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?