Slide 1Hash Table: Another dictionaryHash tablesHash functionsWho hashes what?More on rolesWhat to hash?Hashing integersCollision-avoidanceMore arguments for a prime table sizeWhat if we don’t have ints as keys?Java-esque String HashSpecializing hash functionsCombining hash functionsAdditional operationsHash Tables: A Different ADT?Collision resolutionSeparate ChainingSeparate ChainingSeparate ChainingSeparate ChainingSeparate ChainingSeparate ChainingThoughts on chainingTime vs. space (constant factors only here)A more rigorous chaining analysisSeparate Chaining DeletionAn Alternative to Separate Chaining: Open AddressingAn Alternative to Separate Chaining: Open AddressingAn Alternative to Separate Chaining: Open AddressingAn Alternative to Separate Chaining: Open AddressingAn Alternative to Separate Chaining: Open AddressingOpen addressing: Storing in the tableMore about Open AddressingTerminologyPrimary ClusteringAnalysis of Linear ProbingIn a chartOpen Addressing: Quadratic probingQuadratic Probing ExampleQuadratic Probing ExampleQuadratic Probing ExampleQuadratic Probing ExampleQuadratic Probing ExampleQuadratic Probing ExampleAnother Quadratic Probing ExampleAnother Quadratic Probing ExampleAnother Quadratic Probing ExampleAnother Quadratic Probing ExampleAnother Quadratic Probing ExampleAnother Quadratic Probing ExampleAnother Quadratic Probing ExampleFrom bad news to good newsClustering reconsideredOpen Addressing: Double hashingDouble-hashing analysisYet another reason to use a prime TablesizeMore double-hashing factsCharts: Double hashing (w/ uniform hashing) vs. Linear probingWe’ve explored different methods of collision detectionRehashingMore on rehashingHashing and comparingEqual objects must hash the sameJava bottom line(Incorrect) ExampleAside: Comparable/Comparator have rules tooFinal word on hashingCSE332: Data AbstractionsLecture 11: Hash TablesTyler RobisonSummer 20101Hash Table: Another dictionary2Aim for constant-time (i.e., O(1)) find, insert, and delete“On average” under some reasonable assumptionsA hash table is an array of some fixed sizeDefine a mapping from each key to a location in tableBasic idea:0…TableSize –1 hash function:index = h(key)hash tablekey space (e.g., integers, strings)Hash tables3There are m possible keys (m typically large, even infinite) but we expect our table to have only n items where n is much less than m (often written n << m)Many dictionaries have this propertyCompiler: All possible identifiers allowed by the language vs. those used in some file of one programDatabase: All possible student names vs. students enrolledAI: All possible chess-board configurations vs. those considered by the current playerHash functions4Hash function: Our key to index mappingAn ideal hash function:Is fast to compute“Rarely” hashes two “used” keys to the same indexOften impossible in theory; easy in practiceWill handle collisions a bit later0…TableSize –1 hash function:index = h(key)hash tablekey space (e.g., integers, strings)Who hashes what?5Hash tables can be genericTo store elements of type E, we just need E to be:1. Comparable: order any two E (like with all dictionaries)2. Hashable: convert any E to an intWhen hash tables are a reusable library, the division of responsibility generally breaks down into two roles:•We will learn both roles, but most programmers “in the real world” spend more time on the client side, while still having an understanding of the libraryEint table-indexcollision?collisionresolutionclienthash table libraryMore on roles6Two roles must both contribute to minimizing collisions•Client should aim for different ints for expected items–Avoid “wasting” any part of E or the 32 bits of the int•Library should aim for putting “similar” ints in different indices–conversion to index is almost always “mod table-size”–using prime numbers for table-size is commonEint table-indexcollision?collisionresolutionclienthash table librarySome ambiguity in terminology on which parts are “hashing”“hashing”?“hashing”?What to hash?7In lecture we will consider the two most common things to hash: integers and stringsIf you have objects with several fields, it is usually best to have most of the “identifying fields” contribute to the hash to avoid collisionsExample: class Person { String first; String middle; String last; int age; }Hashing integers8key space = integersUseful for examplesSimple hash function: h(key) = key % TableSizeClient: f(x) = xLibrary g(x) = x % TableSizeFairly fast and naturalExample:TableSize = 10Insert 7, 18, 41, 34, 10(As usual, ignoring data “along for the ride”)What could go wrong?Now insert 20….0123456789104134718Collision-avoidance9Collision: Two keys map to the same indexWith “x % TableSize” the number of collisions depends onthe ints insertedTableSizeLarger table-size tends to help, but not alwaysExample: Insert 12, 22, 32 with TableSize = 10 vs. TableSize = 6Technique: Pick table size to be prime. Why?Real-life data tends to have a pattern, and “multiples of 61” are probably less likely than “multiples of 60”Later we’ll see that one collision-handling strategy does provably better with prime table sizeUsually use something like 10 for examples though0 1212 3234 225More arguments for a prime table size10If TableSize is 60 and…Lots of data items are multiples of 5, wasting 80% of tableLots of data items are multiples of 10, wasting 90% of tableLots of data items are multiples of 2, wasting 50% of tableIf TableSize is 61…Collisions can still happen, but 5, 10, 15, 20, … will fill tableCollisions can still happen but 10, 20, 30, 40, … will fill tableCollisions can still happen but 2, 4, 6, 8, … will fill tableIn general, if x and y are “co-prime” (means gcd(x,y)==1), then (a * x) % y == (b * x) % y if and only if a % y == b % ySo, given table size y and keys as multiples of x, we’ll get a decent distribution if x & y are co-primeGood to have a TableSize that has not common factors with any “likely pattern” xWhat if we don’t have ints as keys?11If keys aren’t ints, the client must convert to an intTrade-off: speed and distinct keys hashing to distinct intsVery important example: StringsKey space K = s0s1s2…sm-1 Where si are chars: si
View Full Document