DOC PREVIEW
Penn CIT 594 - Hashing Lecture

This preview shows page 1-2-19-20 out of 20 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 20 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

HashingSearchingSlide 3Example (ideal) hash functionFinding the hash functionExample imperfect hash functionCollisionsHandling collisionsSearching for a location ISearching for a location IISearching for a location IIIClusteringEfficiencySolution #2: RehashingSolution #3: Bucket hashingThe hashCode functionWriting your own hashCode methodOther considerationsHash tables in JavaThe EndHashingSearching•Consider the problem of searching an array for a given value–If the array is not sorted, the search requires O(n) time•If the value isn’t there, we need to search all n elements•If the value is there, we search n/2 elements on average–If the array is sorted, we can do a binary search•A binary search requires O(log n) time•About equally fast whether the element is found or not–It doesn’t seem like we could do much better•How about an O(1), that is, constant time search?•We can do it if the array is organized in a particular wayHashing•Suppose we were to come up with a “magic function” that, given a value to search for, would tell us exactly where in the array to look–If it’s in that location, it’s in the array–If it’s not in that location, it’s not in the array•This function would have no other purpose•If we look at the function’s inputs and outputs, they probably won’t “make sense”•This function is called a hash function because it “makes hash” of its inputsExample (ideal) hash function•Suppose our hash function gave us the following values: hashCode("apple") = 5hashCode("watermelon") = 3hashCode("grapes") = 8hashCode("cantaloupe") = 7hashCode("kiwi") = 0hashCode("strawberry") = 9hashCode("mango") = 6hashCode("banana") = 2kiwibananawatermelonapplemangocantaloupegrapesstrawberry0123456789Finding the hash function•How can we come up with this magic function?•In general, we cannot--there is no such magic function –In a few specific cases, where all the possible values are known in advance, it has been possible to compute a perfect hash function•What is the next best thing?–A perfect hash function would tell us exactly where to look–In general, the best we can do is a function that tells us where to start looking!Example imperfect hash function•Suppose our hash function gave us the following values:–hash("apple") = 5hash("watermelon") = 3hash("grapes") = 8hash("cantaloupe") = 7hash("kiwi") = 0hash("strawberry") = 9hash("mango") = 6hash("banana") = 2hash("honeydew") = 6kiwibananawatermelonapplemangocantaloupegrapesstrawberry0123456789• Now what?Collisions•When two values hash to the same array location, this is called a collision•Collisions are normally treated as “first come, first served”—the first value that hashes to the location gets it•We have to find something to do with the second and subsequent values that hash to this same locationHandling collisions•What can we do when two different values attempt to occupy the same place in an array?–Solution #1: Search from there for an empty location•Can stop searching when we find the value or an empty location•Search must be end-around–Solution #2: Use a second hash function•...and a third, and a fourth, and a fifth, ... –Solution #3: Use the array location as the header of a linked list of values that hash to this location•All these solutions work, provided:–We use the same technique to add things to the array as we use to search for things in the arraySearching for a location I•Suppose you want to add seagull to this hash table•Also suppose:–hashCode(seagull) = 143–table[143] is not empty–table[143] != seagull–table[144] is not empty–table[144] != seagull–table[145] is empty•Therefore, put seagull at location 145robinsparrowhawkbluejayowl. . .141142143144145146147148. . .seagullSearching for a location II•Suppose you want to add hawk to this hash table•Also suppose–hashCode(hawk) = 143–table[143] is not empty–table[143] != hawk–table[144] is not empty–table[144] == hawk•hawk is already in the table, so do nothing•We use the same procedure for looking things up in the table as we do for inserting themrobinsparrowhawkseagullbluejayowl. . .141142143144145146147148. . .Searching for a location III•Suppose:–You want to add cardinal to this hash table–hashCode(cardinal) = 147–The last location is 148–147 and 148 are occupied•Solution:–Treat the table as circular; after 148 comes 0–Hence, cardinal goes in location 0 (or 1, or 2, or ...)robinsparrowhawkseagullbluejayowl. . .141142143144145146147148Clustering•One problem with the above technique is the tendency to form “clusters”•A cluster is a group of items not containing any open slots•The bigger a cluster gets, the more likely it is that new values will hash into the cluster, and make it ever bigger•Clusters cause efficiency to degrade•Here is a non-solution: instead of stepping one ahead, step n locations ahead–The clusters are still there, they’re just harder to see–Unless n and the table size are mutually prime, some table locations are never checkedEfficiency•Hash tables are actually surprisingly efficient•Until the table is about 70% full, the number of probes (places looked at in the table) is typically only 2 or 3•Sophisticated mathematical analysis is required to prove that the expected cost of inserting into a hash table, or looking something up in the hash table, is O(1)•Even if the table is nearly full (leading to long searches), efficiency is usually still quite highSolution #2: Rehashing•In the event of a collision, another approach is to rehash: compute another hash function–Since we may need to rehash many times, we need an easily computable sequence of functions•Simple example: in the case of hashing Strings, we might take the previous hash code and add the length of the String to it–Probably better if the length of the string was not a component in computing the original hash function•Possibly better yet: add the length of the String plus the number of probes made so far–Problem: are we sure we will look at every location in the array?•Rehashing is a fairly uncommon approach, and we won’t pursue it any further hereSolution #3: Bucket hashing•The previous solutions used open hashing: all entries went into a “flat” (unstructured) array•Another solution is to make each array location the header of a linked list of values that hash to that


View Full Document

Penn CIT 594 - Hashing Lecture

Documents in this Course
Trees

Trees

17 pages

Searching

Searching

24 pages

Pruning

Pruning

11 pages

Arrays

Arrays

17 pages

Stacks

Stacks

30 pages

Recursion

Recursion

25 pages

Hashing

Hashing

24 pages

Recursion

Recursion

24 pages

Graphs

Graphs

25 pages

Storage

Storage

37 pages

Trees

Trees

21 pages

Arrays

Arrays

24 pages

Hashing

Hashing

24 pages

Recursion

Recursion

25 pages

Graphs

Graphs

23 pages

Graphs

Graphs

25 pages

Stacks

Stacks

25 pages

Recursion

Recursion

25 pages

Quicksort

Quicksort

21 pages

Quicksort

Quicksort

21 pages

Graphs

Graphs

25 pages

Recursion

Recursion

25 pages

Searching

Searching

24 pages

Counting

Counting

20 pages

HTML

HTML

18 pages

Recursion

Recursion

24 pages

Pruning

Pruning

11 pages

Graphs

Graphs

25 pages

Load more
Download Hashing Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?