HashingSearchingSlide 3Example (ideal) hash functionWhy hash tables?Finding the hash functionExample imperfect hash functionCollisionsHandling collisionsInsertion, ISearching, ISearching, IIInsertion, IIInsertion, IIIClusteringEfficiencySolution #2: RehashingSolution #3: Bucket hashingThe hashCode functionWriting your own hashCode methodOther considerationsHash tables in JavaHash table operationsThe EndHashing2SearchingConsider the problem of searching an array for a given valueIf the array is not sorted, the search requires O(n) timeIf the value isn’t there, we need to search all n elementsIf the value is there, we search n/2 elements on averageIf the array is sorted, we can do a binary searchA binary search requires O(log n) timeAbout equally fast whether the element is found or notIt doesn’t seem like we could do much betterHow about an O(1), that is, constant time search?We can do it if the array is organized in a particular way3HashingSuppose we were to come up with a “magic function” that, given a value to search for, would tell us exactly where in the array to lookIf it’s in that location, it’s in the arrayIf it’s not in that location, it’s not in the arrayThis function would have no other purposeIf we look at the function’s inputs and outputs, they probably won’t “make sense”This function is called a hash function because it “makes hash” of its inputs4Example (ideal) hash functionSuppose our hash function gave us the following values: hashCode("apple") = 5hashCode("watermelon") = 3hashCode("grapes") = 8hashCode("cantaloupe") = 7hashCode("kiwi") = 0hashCode("strawberry") = 9hashCode("mango") = 6hashCode("banana") = 2kiwibananawatermelonapplemangocantaloupegrapesstrawberry01234567895Why hash tables?We don’t (usually) use hash tables just to see if something is there or not—instead, we put key/value pairs into the tableWe use a key to find a place in the tableThe value holds the information we are actually interested inrobinsparrowhawkseagullbluejayowl. . .141142143144145146147148robin infosparrow infohawk infoseagull infobluejay infoowl infokey value6Finding the hash functionHow can we come up with this magic function?In general, we cannot--there is no such magic function In a few specific cases, where all the possible values are known in advance, it has been possible to compute a perfect hash functionWhat is the next best thing?A perfect hash function would tell us exactly where to lookIn general, the best we can do is a function that tells us where to start looking!7Example imperfect hash functionSuppose our hash function gave us the following values:hash("apple") = 5hash("watermelon") = 3hash("grapes") = 8hash("cantaloupe") = 7hash("kiwi") = 0hash("strawberry") = 9hash("mango") = 6hash("banana") = 2hash("honeydew") = 6kiwibananawatermelonapplemangocantaloupegrapesstrawberry0123456789• Now what?8CollisionsWhen two values hash to the same array location, this is called a collisionCollisions are normally treated as “first come, first served”—the first value that hashes to the location gets itWe have to find something to do with the second and subsequent values that hash to this same location9Handling collisionsWhat can we do when two different values attempt to occupy the same place in an array?Solution #1: Search from there for an empty locationCan stop searching when we find the value or an empty locationSearch must be end-aroundSolution #2: Use a second hash function...and a third, and a fourth, and a fifth, ... Solution #3: Use the array location as the header of a linked list of values that hash to this locationAll these solutions work, provided:We use the same technique to add things to the array as we use to search for things in the array10Insertion, ISuppose you want to add seagull to this hash tableAlso suppose:hashCode(seagull) = 143table[143] is not emptytable[143] != seagulltable[144] is not emptytable[144] != seagulltable[145] is emptyTherefore, put seagull at location 145robinsparrowhawkbluejayowl. . .141142143144145146147148. . .seagull11Searching, ISuppose you want to look up seagull in this hash tableAlso suppose:hashCode(seagull) = 143table[143] is not emptytable[143] != seagulltable[144] is not emptytable[144] != seagulltable[145] is not emptytable[145] == seagull !We found seagull at location 145robinsparrowhawkbluejayowl. . .141142143144145146147148. . .seagull12Searching, IISuppose you want to look up cow in this hash tableAlso suppose:hashCode(cow) = 144table[144] is not emptytable[144] != cowtable[145] is not emptytable[145] != cowtable[146] is emptyIf cow were in the table, we should have found it by nowTherefore, it isn’t hererobinsparrowhawkbluejayowl. . .141142143144145146147148. . .seagull13Insertion, IISuppose you want to add hawk to this hash tableAlso supposehashCode(hawk) = 143table[143] is not emptytable[143] != hawktable[144] is not emptytable[144] == hawkhawk is already in the table, so do nothingrobinsparrowhawkseagullbluejayowl. . .141142143144145146147148. . .14Insertion, IIISuppose:You want to add cardinal to this hash tablehashCode(cardinal) = 147The last location is 148147 and 148 are occupiedSolution:Treat the table as circular; after 148 comes 0Hence, cardinal goes in location 0 (or 1, or 2, or ...)robinsparrowhawkseagullbluejayowl. . .14114214314414514614714815ClusteringOne problem with the above technique is the tendency to form “clusters”A cluster is a group of items not containing any open slotsThe bigger a cluster gets, the more likely it is that new values will hash into the cluster, and make it ever biggerClusters cause efficiency to degradeHere is a non-solution: instead of stepping one ahead, step n locations aheadThe clusters are still there, they’re just harder to seeUnless n and the table size are mutually prime, some table locations are never checked16EfficiencyHash tables are actually surprisingly efficientUntil the table is about 70% full, the number of probes (places looked at in the table) is typically only 2 or 3Sophisticated mathematical analysis is required to prove that the expected cost of inserting into a hash table, or looking something up in the hash table, is
View Full Document