Algorithms in Java, 4th Edition·Robert Sedgewick and Kevin Wayne·Copyright © 2009·January 22, 2010 10:54:35 PM3.4 Hash Tables‣hash functions‣separate chaining‣linear probing‣applications2Optimize judiciouslyReference: Effective Java by Joshua Bloch“ More computing sins are committed in the name of efficiency(without necessarily achieving it) than for any other single reason— including blind stupidity. ” — William A. Wulf“ We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. ” — Donald E. Knuth“ We follow two rules in the matter of optimization: Rule 1: Don't do it. Rule 2 (for experts only). Don't do it yet - that is, not until you have a perfectly clear and unoptimized solution. ” — M. A. JacksonST implementations: summaryQ. Can we do better?A. Yes, but with different access to the data.3implementationguaranteeguaranteeaverage caseaverage caseorderedoperationsimplementationsearchinsertdeletesearch hitinsertdeleteorderediteration?operationson keyssequential search(linked list)NNNN/2NN/2noequals()binary search(ordered array)lg NNNlg NN/2N/2yescompareTo()BSTNNN1.38 lg N1.38 lg N?yescompareTo()red-black tree2 lg N2 lg N2 lg N1.00 lg N1.00 lg N1.00 lg NyescompareTo()4Hashing: basic planSave items in a key-indexed table (index is a function of the key).Hash function. Method for computing array index from key.Issues.•Computing the hash function.•Equality test: Method for checking whether two keys are equal.hash("it") = 30123"it"455Hashing: basic planSave items in a key-indexed table (index is a function of the key).Hash function. Method for computing array index from key.Issues.•Computing the hash function.•Equality test: Method for checking whether two keys are equal.•Collision resolution: Algorithm and data structureto handle two keys that hash to the same array index.Classic space-time tradeoff.•No space limitation: trivial hash function with key as index.•No time limitation: trivial collision resolution with sequential search.•Limitations on both time and space: hashing (the real world).hash("times") = 3??0123"it"45hash("it") = 36‣hash functions‣separate chaining‣linear probing‣applications7Equality testNeeded because hash methods do not use compareTo().All Java classes inherit a method equals().Java requirements. For any references x, y and z:•Reflexive: x.equals(x) is true.•Symmetric: x.equals(y) iff y.equals(x).•Transitive: if x.equals(y) and y.equals(z), then x.equals(z).•Non-null: x.equals(null) is false.Default implementation. (x == y) Customized implementations. Integer, Double, String, File, URL, Date, …User-defined implementations. Some care needed.do x and y refer tothe same object?equivalencerelationSeems easypublic class Record{ private final String name; private final long val; ... public boolean equals(Record y) { Record that = y; return (this.val == that.val) && (this.name.equals(that.name)); }}Implementing equals for user-defined types8check that all significantfields are the sameSeems easy, but requires some care.public final class Record{ private final String name; private final long val; ... public boolean equals(Object y) { if (y == this) return true; if (y == null) return false; if (y.getClass() != this.getClass()) return false; Record that = (Record) y; return (this.val == that.val) && (this.name.equals(that.name)); }}Implementing equals for user-defined types9check for nulloptimize for true object equalityno safe way to use equals() with inheritancemust be Object.Why? Experts still debate.objects must be in the same classcheck that all significantfields are the same10Computing the hash functionIdealistic goal. Scramble the keys uniformly to produce a table index.•Efficiently computable.•Each table index equally likely for each key.Ex 1. Phone numbers.•Bad: first three digits.•Better: last three digits.Ex 2. Social Security numbers.•Bad: first three digits.•Better: last three digits.Practical challenge. Need different approach for each key type.573 = California, 574 = Alaska(assigned in chronological order within geographic region)thoroughly researched problem,still problematic in practical applicationskeytableindex11Java’s hash code conventionsAll Java classes inherit a method hashCode(), which returns a 32-bit int.Requirement. If x.equals(y), then (x.hashCode() == y.hashCode()).Highly desirable. If !x.equals(y), then (x.hashCode() != y.hashCode()).Default implementation. Memory address of x.Customized implementations. Integer, Double, String, File, URL, Date, …User-defined types. Users are on their own.x.hashCode()xy.hashCode()y12Implementing hash code: integers and doublespublic final class Integer{ private final int value; ... public int hashCode() { return value; }}convert to IEEE 64-bit representation;xor most significant 32-bitswith least significant 32-bitspublic final class Double{ private final double value; ... public int hashCode() { long bits = doubleToLongBits(value); return (int) (bits ^ (bits >>> 32)); }}•Horner's method to hash string of length L: L multiplies/adds.•Equivalent to h = 31L-1 · s0 + … + 312 · sL-3 + 311 · sL-2 + 310 · sL-1.Ex. public final class String{ private final char[] s; ... public int hashCode() { int hash = 0; for (int i = 0; i < length(); i++) hash = s[i] + (31 * hash); return hash; }} 13Implementing hash code: strings3045982 = 99·313 + 97·312 + 108·311 + 108·310 = 108 + 31· (108 + 31 · (97 + 31 · (99)))ith character of sString s = "call";int code = s.hashCode();charUnicode……'a'97'b'98'c'99…...Ex. Strings (in Java 1.1).•For long strings: only examine 8-9 evenly spaced characters.•Benefit: saves time in performing arithmetic.•Downside: great potential for bad collision patterns.14A poor hash codepublic int hashCode(){ int hash = 0; int skip = Math.max(1, length() / 8); for (int i = 0; i < length(); i += skip) hash = s[i] + (37 * hash); return
View Full Document