DOC PREVIEW
Princeton COS 226 - Hashing

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Copyright © 2007 by Robert Sedgewick and Kevin Wayne.1HashingReferences: Algorithms in Java, Chapter 14Intro to Algs and Data Structs, Section 4.2•hash functions•collision resolution•applicationsSummary of symbol-table implementationsCan we do better?2guaranteeaverage caseorderediteration?implementationsearchinsertdeletesearchinsertdeleteunordered arrayNNNN/2N/2N/2noordered arraylg NNNlg NN/2N/2yesunordered listNNNN/2NN/2noordered listNNNN/2N/2N/2yesBSTNNN1.39 lg N1.39 lg N?yesrandomized BST7 lg N7 lg N7 lg N1.39 lg N1.39 lg N1.39 lg Nyesred-black tree2 lg N2 lg N2 lg Nlg Nlg Nlg Nyes3Optimize JudiciouslyReference: Effective Java by Joshua Bloch.More computing sins are committed in the name of efficiency(without necessarily achieving it) than for any other single reason - including blind stupidity. - William A. WulfWe should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. - Donald E. KnuthWe follow two rules in the matter of optimization: Rule 1: Don't do it. Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution. - M. A. Jackson4Hashing: basic planSave items in a key-indexed table. Index is a function of the key.Hash function. Method for computing table index from key.Collision resolution strategy. Algorithm and data structure to handle two keys that hash to the same index.Equality test. Method for checking whether two keys are equal.Classic space-time tradeoff.!No space limitation: trivial hash function with key as address.!No time limitation: trivial collision resolution with sequential search.!Limitations on both time and space: hashing (the real world).5hash functionscollision resolutionapplications6Implementing a good hash functionIdealistic goal: scramble the keys uniformly.!Efficiently computable.!Each table position equally likely for each key.Practical challenge: need different approach for each type of keyEx: Social Security numbers.!Bad: first three digits.!Better: last three digits.Ex: date of birth.!Bad: birth year.!Better: birthday.Ex: phone numbers.!Bad: first three digits.!Better: last three digits.573 = California, 574 = Alaskaassigned in chronological order within a given geographic regionthoroughly researched problem,still problematic in practical applications7Hash Codes and Hash FunctionsJava convention: all classes implement hashcode()hashcode() returns a 32-bit int (between -2147483648 and 2147483647)Hash function. An int between 0 and M-1 (for use as an array index)First try:Bug. Don't use (code % M) as array index 1-in-a billion bug. Don't use (Math.abs(code) % M) as array index. OK. Safe to use ((code & 0x7fffffff) % M) as array index.String s = "call";int code = s.hashCode();int hash = code % M; 7121 8191hex literal30459828Hash Codes and Hash FunctionsJava convention: all classes implement hashcode()hashcode() returns a 32-bit int (between -2147483648 and 2147483647)Hash function. An int between 0 and M-1 (for use as an array index)First try:Bug. Don't use (code % M) as array index [could be negative].1-in-a billion bug. Don't use (Math.abs(code) % M) as array index. [code could be 2147483648]OK. Safe to use ((code & 0x7fffffff) % M) as array index.String s = "call";int code = s.hashCode();int hash = code % M; 7121 8191hex literal30459829Implementing hashCode() in JavaTheoretical advantages of hashCode() convention!Ensures hashing can be used for every type of object!Allows expert implementations suited to each typeAPI for hashCode().!Return an int.!If x.equals(y) then x and y must have the same hash code.!Repeated calls to x.hashCode() must return the same value.Practical realities of hashCode() convention!Cost is an important consideration!True randomness is hard to achieveDefault implementation. Memory address of x.Customized implementations. String, URL, Integer, Date.User-defined implementations. Tricky to get right, black art.inherited from Object10A typical typeAssumption when using hashing in Java: Key type has reasonable implementation of hashCode() and equals()Ex. Phone numbers: (609) 867-5309.Problem: Need a theorem for each type of data to ensure reliability.sufficiently random?exchangeextensionpublic final class PhoneNumber{ private final int area, exch, ext; public PhoneNumber(int area, int exch, int ext) { this.area = area; this.exch = exch; this.ext = ext; } public boolean equals(Object y) { // as before } public int hashCode() { return 10007 * (area + 1009 * exch) + ext; }}11A decent hash code designJava 1.5 string library [see also Program 14.2 in Algs in Java].!Equivalent to h = 31L-1·s0 + … + 312·sL-3 + 31·sL-2 + sL-1.!Horner's method to hash string of length L: L multiplies/addsEx. Provably random? Well, no. String s = "call";int code = s.hashCode();3045982 = 99·313 + 97·312 + 108·311 + 108·310 = 108 + 31·(108 + 31·(99 + 31·(97)))ith character of sUnicodechar… …'a' 97'b' 98'c' 99… …public int hashCode(){ int hash = 0; for (int i = 0; i < length(); i++) hash = s[i] + (31 * hash); return hash;}12A poor hash code designJava 1.1 string library.!For long strings: only examines 8-9 evenly spaced characters.!Saves time in performing arithmetic…but great potential for bad collision patterns.Basic rule: need to use the whole key.http://www.cs.princeton.edu/introcs/13loop/Hello.javahttp://www.cs.princeton.edu/introcs/13loop/Hello.classhttp://www.cs.princeton.edu/introcs/13loop/Hello.htmlhttp://www.cs.princeton.edu/introcs/13loop/index.htmlhttp://www.cs.princeton.edu/introcs/12type/index.htmlpublic int hashCode(){ int hash = 0; int skip = Math.max(1, length() / 8); for (int i = 0; i < length(); i += skip) hash = (37 * hash) + s[i]; return hash;}Digression: using a hash function for data miningUse content to characterize documents.Applications!Search documents on the web for documents similar to a given one.!Determine whether a new document belongs in one set or anotherApproach!Fix order k and dimension d!Compute hashcode() % d for allk-grams in the document!Result: d-dimensional vectorprofile of each document!To compare documents:Consider angle ! separating vectors–cos ! close to 0: not similar–cos ! close to 1: similarEffective for literature, genomes, Java code, art, music, data, video13cos ! = a ! b / ⎜a⎜⎜b⎜ab!Digression: using a hash function for data


View Full Document

Princeton COS 226 - Hashing

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download Hashing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?