DOC PREVIEW
UCF COP 3502 - Hash Tables

This preview shows page 1-2-3-4-5 out of 16 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 16 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Hash TablesUsing an array, we can retrieve/search for values in that arrayin O(lg n) time. However, inserting an item into an arrayquickly can be difficult. Why is this?Also, if we store items in a binary tree, we can alsoretrieve/search for them in O(lg n) time as long as the tree isbalanced. Insertion also takes O(lg n). These seem to be somepretty good numbers (we'll do the analysis later when we startto cover binary trees), but actually, we can do even better.A hash table allows us to store many records and very quicklysearch for these records. The basic idea is as follows:If you know you are going to deal with a total of n records,create an array of size n. Each array slot should be able to hold1 record. Now, the crux to a hash table working is a good hashfunction. A hash function takes as an input the type of recordbeing stored, and outputs a value from 0 to n-1, an integer thatis a valid index into the array.So, for example, if you were storing Strings, the hash functionwould have to map an arbitrary String to an integer in therange of the hash table array. Here is an example of such ahash function (this is a very poor hash function):f(w) = ascii value of the first character of w.One of the first things to notice about a hash function is thattwo different values, such as "cat" and "camera", can hash tothe exact same value. (In this case, the ascii values of the firstletters of both words are identical.)For the time being, let's ignore this problem. Let's assume thatour hash function works in such a way that every element wetry to store in it miraculously hashes to a different location.Now, imagine searching for an element in the hash table. Allyou have to do is compute its hash value and then just look inthat ONE particular array slot! Thus, the running time of thisoperation is simply based on how long it takes to compute thehash function. Hopefully, we can come up with a hash functionthat works reasonably well that can be computed in O(1) time.So now we get to the problem of collisions. A collision is whentwo values you are storing in your hash table hash to the exactsame location. (In essence, we have that f(x) = f(y) when x y.)Some ideas of how to deal with these:1) Don't: Just replace the new value you are trying to insertwith the old one stored in the hash table.2) Linear Probing: If there is a collision, continue searching inthe array in sequential order until you find the next emptylocation.3) Quadratic Probing: If there is a collision, continue searchingin the array by offsets of the integers square. This means youfirst look in array index c, the original index, followed by indexc+1, then index c+4, then index c+9, etc.4) Separate Chaining Hashing: Rather than storing the hashtable as an array of elements, store it as an array of linked listsof elements. If there is a collision, just insert the new elementinto the linked list at that slot. Hopefully, there will only be afew collisions so that no one linked list will be too long.Although searching may not be exactly O(1) time, it willdefinitely be quicker than O(lg n) time.The Hash FunctionSince these are difficult to truly analyze, I won't get into muchdetail here. (The book does not either.) Ideally, a hash functionshould work well for any type of input it could receive. Withthat in mind, the ideal hash function simply maps an elementto a random integer in the given range. Thus, the probabilitythat a randomly chosen element maps to any one of thepossible array locations should be equal.Given this reasoning, why is the hash function I showed youearlier a poor choice?Mainly for two reasons:1) It's designed for an array of only size 26. (Or maybe a bitbigger if we allow non-alphabetic characters.) Usually hashtables are larger than this.2) More words start with certain letters than others. Thesehash locations would be more densely filled.Let's go over a couple ideas in the book for hash functions(when dealing with Strings):1) Each printable character has a ascii value less than 128.Thus, a string could be a representation of a number in base128. So, if we had the String "dog", we could hash it to theintegerascii('d')*1282 + ascii('o')*1281 + ascii('g')*1280 =100*1282 + 111*128 + 103 = 1652711.What are the problems with this technique?1) Relatively small Strings map to HUGE integers. It is possiblethat these integers are not valid indexes to our hash table.2) Just computing this function may cause an overflow error.Remember that the int type can only store an integer up to 231 -1. Any string of length 5 or higher would hash to a valuegreater than this.How can we deal with these problems?Simply put, the mod operator can help us deal with both ofthese issues. Change the function as follows:f(c0c1...cn) = (ascii(c0)*1280 + ascii(c1)*1281 +...+ ascii(cn)*128n) mod tablesizeFor example if tablesize = 1307, then in our previous example,“dog” would hash to index 1652711 mod 1307 = 663.Now, we get a guarantee that each String hashes to a validlocation in the table, and it's not readily apparent that thisfunction isn't fairly random. It may not be, but chances are it'sbetter than the first one I showed you. However, rigorouslyproving this is most certainly beyond the scope of this class.Now, if you do this calculation in one particular way, you havethe possibility creating overflow. (Namely if you only do themod at the end rather than at each point of the calculation...)If you apply the mod at each step, then all intermediate valuescalculated remain low and no overflow occurs. Also, using Horner's rule can aid in this computation.Applying the rule states that(ascii(c0)*1280 + ascii(c1)*1281 +...+ ascii(cn)*128n) =ascii(c0) + 128(ascii(c1) + 128(ascii(c2) + ...+(128(ascii(cn-1) + 128ascii(cn))...))In general, Horner's rule specifies how to evaluate a polynomialwithout calculating xn in that polynomial directly. Here it is:cnxn + cn-1xn-1 + ... c1x + c0 =c0 + x(c1 + x(c2 + ... + x(cn-1 + xcn)...))This might seem confusing. Here’s a segment of code thatillustrates the use of Horner’s method to avoid overflow errors:char word[10]; // word to hash.// Initialize word.int hval = 0;for (i=0; i<strlen(word); i++) hval= 128*hval + (int)(word[i]);Try this with our


View Full Document

UCF COP 3502 - Hash Tables

Download Hash Tables
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Hash Tables and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hash Tables 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?