DOC PREVIEW
CORNELL CS 211 - Hashing

This preview shows page 1 out of 3 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 3 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 3 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

1HashingHashing is a technique for maintaining a set ofelements in an array. You should also read Weiss,chapter 20, which goes into more detail (but is harderto read).A set is just a collection of distinct (different)elements on which the following operations can beperformed:• Make the set empty• Add an element to the set• Remove an element from the set• Get the size of the set (number of elements in it)• Tell whether a value is in the set• Tell whether the set is empty.Obvious first implementation: Keep the elements inan array b. The elements are in b[0..n-1], wherevariable n contains the size of the array. No duplicatesare allowed.Problems: Adding an item take time O(n) --itshouldn’t be inserted if it is already in the set, sob[0..n-1] has first to be searched for it. Removing anitem also takes time O(n) in the worst case. We wouldlike an implementation in which the expected time forthese operations is constant: O(1).Solution: Use hashing. We illustrate hashingassuming that the elements of the set are Strings.Basic idea: Rather than keep the Strings in b[0..n-1],we allow them to be anywhere in the b. We use anarray whose elements are of the following nested classtype:// An instance is an entry in array bprivate static class HashEntry {public String element; // the elementpublic boolean isInSet; // = “element is in set”// Constructor: an entry that is in the set iff bpublic HashEntry( String e, boolean b ) { element = e; isInSet= b;}}Each element of our array b is either null or the nameof a HashEntry, and that entry indicates whether it isin the set or not. So, to remove an element of the set,just set its isInSet field to false.Hashing with linear probing. Here’s the basic idea.Suppose we want to insert the String “bc” into the set.We compute an index k of the array, using what’scalled a hash function,int k= hashCode(“bc”);and try to store the element at position b[k]. If thatentry is already filled with some other element, we tryto store it in b[(k+1)%b.length] --note that we usewraparound, just as in implementing a queue in anarray. If that position is filled, we keep tryingsuccessive elements in the same way.Each test of an array element to see whether it is nullor the String is called a probe.The hash function just picks some index, dependingon its argument. We’ll show a hash function later.Checking to see whether a String “xxx” is in the set issimilar; compute k= hashCode(“xxx”) and look insuccessive elements of b[k..] until a null element isreached or until “xxx” is found. If it is found, it is inthe set iff the position in which it is found has itsisInSet field true.You might think that this is a weird way to implementthe set, that it couldn’t possibly work. But it does,provided the set doesn’t fill up too much, andprovided we later make some adjustments.Here’s a basic fact:Suppose String s is in the set and hashCode(s) = k.Let b[j] be the first null element after b[k] (weinclude wraparound here). Then s is one of theelements b[k], b[k+1], …, b[j-1] (withwraparound).Then, because of the basic fact, we can write methodadd as follows, assuming that array b is never full: b0 k b.lengthtry to insert element at b[k], b[k+1], etc...2Hashing// Add s to this setpublic void add(String s) {int k= hashCode(s);while (b[k] != null && !b[k].element.equals(s)){ k= (k+1)%b.length(); }if (b[k] != null && b.isInSet)return;// s is not in the set; store it in b[k].b[k]= new HashEntry(s, true);size= size+1;}Removing an element is just as easy. Note thatremoving a value from the set leaves it in the array.// Remove s from this set (if it is in it)public void remove(String s) {int k= hashCode(s);while (b[k] != null && !b[k].element.equals(s)){ k= (k+1)%b.length(); }if (b[k] == null || !b[k].isInSet) return;// s is in the set; remove it.b[k].isInSet= false;size= size-1;}Hashing functionsWe need a function that turns a String s into an intthat is in the range of array b. It doesn’t matter whatthis function is as long as it distributes Strings tointegers in a fairly even manner. Here is the functionthat Weiss uses, assuming that s has 4 characters. s[0]*373 + s[1]*372 + s[2]*371 + s[3]*370i.e.((s[0]*37 + s[1])*37 + s[2])*37 + s[3]The result is then reduced modulo the size of array bto produce an int in the range of b. Some of theabove calculations may overflow, but that’s okay.The overflow produces an integer in the range of intthat satisfies our needs.See Sec. 20.2 of Weiss for an example of this hashfunction as a Java method.What about the load factor?The load factor, lf, is the value oflf = (size of elements of b in use) / (size of array b)The load factor is an estimate of how full the array is.If lf is close to 0, the array is relatively empty, andhashing will be quick. If lf is close to 1, then addingand removing elements will tend to take time linear inthe size of b, which is bad. Here’s what someoneproved:Under certain independence assumptions, theaverage number of array elements examined inadding an element is 1/(1-lf).So, if the array is half full, we can expect an additionto look at 1/(1-1/2) = 2 array elements. That’s prettygood! If the set contains 1,000 elements and the arraysize is over 2,000, only 2 probes are needed!So, we will keep the array no more than half full.Whenever insertion of an element will increase thenumber of used elements to more than 1/2 the size ofthe array, we will “rehash”. A new array will becreated and the elements that are in the set will becopied over to it. Of course, this takes time, but it isworth it. Here’s the method: /** Rehash array b */ private void rehash( ) { HashEntry[] oldb= b; // copy of array b // Create a new, empty array b= new HashEntry[nextPrime(4*size())]; size= 0; // Copy active elements from oldb to b for (int i= 0; i != oldb.length; i= i+1) if (oldb[i] !== null && oldb[i].isInSet) add(oldb[i].element); }The size of the new array is the smallest primenumber that is at least 4*b.size(). The reason forchoosing a prime number is explained on the nextpage.3HashingQuadratic probing.Linear probing looks for a String in the followingentries, given that the String hashed to k (weimplicitly assume that wraparound is being used): b[k], b[k+1], b[k+2], b[k+3], …This tends to produce clustering --long sequences ofnonnull


View Full Document

CORNELL CS 211 - Hashing

Documents in this Course
B-Trees

B-Trees

10 pages

Load more
Download Hashing
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hashing and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hashing 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?