Algorithms, 4th Edition·Robert Sedgewick and Kevin Wayne·Copyright © 2002–2012·March 7, 2012 5:02:20 AMAlgorithmsFOUR T H EDIT IONR O B E R T S EDG EWICK K EVIN W A Y N E3.4 HASH TABLES‣hash functions‣separate chaining‣linear probingST implementations: summaryQ. Can we do better?A. Yes, but with different access to the data.2implementationworst-case cost(after N inserts) worst-case cost(after N inserts) worst-case cost(after N inserts) average-case cost(after N random inserts)average-case cost(after N random inserts)average-case cost(after N random inserts)orderediteration?keyinterfaceimplementationsearchinsertdeletesearch hitinsertdeleteiteration?interfacesequential search(unordered list)NNNN/2NN/2noequals()binary search(ordered array)lg NNNlg NN/2N/2yescompareTo()BSTNNN1.38 lg N1.38 lg N?yescompareTo()red-black BST2 lg N2 lg N2 lg N1.00 lg N1.00 lg N1.00 lg NyescompareTo()3Hashing: basic planSave items in a key-indexed table (index is a function of the key).Hash function. Method for computing array index from key.Issues.•Computing the hash function.•Equality test: Method for checking whether two keys are equal.hash("it") = 30123"it"454Hashing: basic planSave items in a key-indexed table (index is a function of the key).Hash function. Method for computing array index from key.Issues.•Computing the hash function.•Equality test: Method for checking whether two keys are equal.•Collision resolution: Algorithm and data structureto handle two keys that hash to the same array index.Classic space-time tradeoff.•No space limitation: trivial hash function with key as index.•No time limitation: trivial collision resolution with sequential search.•Space and time limitations: hashing (the real world).hash("times") = 3??0123"it"45hash("it") = 35‣hash functions‣separate chaining‣linear probing6Computing the hash functionIdealistic goal. Scramble the keys uniformly to produce a table index.•Efficiently computable.•Each table index equally likely for each key.Ex 1. Phone numbers.•Bad: first three digits.•Better: last three digits.Ex 2. Social Security numbers.•Bad: first three digits.•Better: last three digits.Practical challenge. Need different approach for each key type.573 = California, 574 = Alaska(assigned in chronological order within geographic region)thoroughly researched problem,still problematic in practical applicationskeytableindex7Java’s hash code conventionsAll Java classes inherit a method hashCode(), which returns a 32-bit int.Requirement. If x.equals(y), then (x.hashCode() == y.hashCode()).Highly desirable. If !x.equals(y), then (x.hashCode() != y.hashCode()).Default implementation. Memory address of x.Legal (but poor) implementation. Always return 17.Customized implementations. Integer, Double, String, File, URL, Date, …User-defined types. Users are on their own.x.hashCode()xy.hashCode()y8Implementing hash code: integers, booleans, and doublespublic final class Integer{ private final int value; ... public int hashCode() { return value; }}convert to IEEE 64-bit representation;xor most significant 32-bitswith least significant 32-bitspublic final class Double{ private final double value; ... public int hashCode() { long bits = doubleToLongBits(value); return (int) (bits ^ (bits >>> 32)); }}public final class Boolean{ private final boolean value; ... public int hashCode() { if (value) return 1231; else return 1237; }}Java library implementations•Horner's method to hash string of length L: L multiplies/adds.•Equivalent to h = s[0] · 31L–1 + … + s[L – 3] · 312 + s[L – 2] · 311 + s[L – 1] · 310.Ex. public final class String{ private final char[] s; ... public int hashCode() { int hash = 0; for (int i = 0; i < length(); i++) hash = s[i] + (31 * hash); return hash; }} 9Implementing hash code: strings3045982 = 99·313 + 97·312 + 108·311 + 108·310 = 108 + 31· (108 + 31 · (97 + 31 · (99)))(Horner's method)ith character of sString s = "call";int code = s.hashCode();charUnicode……'a'97'b'98'c'99…...Java library implementationPerformance optimization.•Cache the hash value in an instance variable.•Return cached value.public final class String{ private int hash = 0; private final char[] s; ... public int hashCode() { int h = hash; if (h != 0) return h; for (int i = 0; i < length(); i++) h = s[i] + (31 * hash); hash = h; return h; }} 10Implementing hash code: stringsreturn cached valuecache of hash codestore cache of hash codeString hashCode() in Java 1.1.•For long strings: only examine 8-9 evenly spaced characters.•Benefit: saves time in performing arithmetic.•Downside: great potential for bad collision patterns.11War story: String hashing in Javapublic int hashCode(){ int hash = 0; int skip = Math.max(1, length() / 8); for (int i = 0; i < length(); i += skip) hash = s[i] + (37 * hash); return hash;}http://www.cs.princeton.edu/introcs/13loop/Hello.javahttp://www.cs.princeton.edu/introcs/13loop/Hello.classhttp://www.cs.princeton.edu/introcs/13loop/Hello.htmlhttp://www.cs.princeton.edu/introcs/12type/index.html12Implementing hash code: user-defined typespublic final class Transaction implements Comparable<Transaction>{ private final String who; private final Date when; private final double amount; public Transaction(String who, Date when, double amount) { /* as before */ } ... public boolean equals(Object y) { /* as before */ } public int hashCode() { int hash = 17; hash = 31*hash + who.hashCode(); hash = 31*hash + when.hashCode(); hash = 31*hash + ((Double) amount).hashCode(); return hash; }}typically a small primenonzero constantfor primitive types, use hashCode()of wrapper typefor reference types,use hashCode()13Hash code design"Standard" recipe for user-defined types.•Combine each significant field using the 31x + y rule.•If field is a primitive type, use wrapper type hashCode().•If field is null, return 0.•If field is a reference type, use hashCode().•If field is an array, apply to each entry.In practice. Recipe works reasonably well; used in Java libraries.In theory. Keys are bitstring; "universal" hash functions exist. Basic rule. Need to use the whole key to compute hash code;consult an expert for
View Full Document