Berkeley COMPSCI 61B - Lecture Notes - D1663054

Home> Schools> University of California, Berkeley> Computer Science (COMPSCI) > COMPSCI 61B> Lecture Notes

DOC PREVIEW

Berkeley COMPSCI 61B - Lecture Notes

School name University of California, Berkeley

Course Compsci 61b- Data Structures

Pages 2

This preview shows page 1 out of 2 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 2 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 2 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

03/12/1404:03:34 121 CS 61B: Lecture 21 Wednesday, March 12, 2014Today’s reading: Goodrich & Tamassia, Sections 9.1, 9.2, 9.5-9.5.1.DICTIONARIES============Suppose you have a set of two-letter words and their definitions. You want tobe able to look up the definition of any word, very quickly. The two-letterword is the _key_ that addresses the definition.Since there are 26 English letters, there are 26 * 26 = 676 possible two-letterwords. To implement a dictionary, we declare an array of 676 references, allinitially set to null. To insert a Definition into the dictionary, we definea function hashCode() that maps each two-letter word (key) to a unique integerbetween 0 and 675. We use this integer as an index into the array, and makethe corresponding bucket (array position) point to the Definition object.public class Word { public static final int LETTERS = 26, WORDS = LETTERS * LETTERS; public String word; public int hashCode() { // Map a two-letter Word to 0...675. return LETTERS * (word.charAt(0) - ’a’) + (word.charAt(1) - ’a’); }}public class WordDictionary { private Definition[] defTable = new Definition[Word.WORDS]; public void insert(Word w, Definition d) { defTable[w.hashCode()] = d; // Insert (w, d) into Dictionary. } Definition find(Word w) { return defTable[w.hashCode()]; // Return the Definition of w. }}What if we want to store every English word, not just the two-letter words?The table "defTable" must be long enough to accommodatepneumonoultramicroscopicsilicovolcanoconiosis, 45 letters long. Unfortunately,declaring an array of length 26^45 is out of the question. English has fewerthan one million words, so we should be able to do better.Hash Tables (the most common implementation of dictionaries)-----------Suppose n is the number of keys (words) whose definitions we want to store, andsuppose we use a table of N buckets, where N is perhaps a bit larger than n,but much smaller than the number of _possible_ keys. A hash table maps a hugeset of possible keys into N buckets by applying a _compression_function_ toeach hash code. The obvious compression function is h(hashCode) = hashCode mod N.Hash codes are often negative, so remember that mod is not the same as Java’sremainder operator "%". If you compute hashCode % N, check if the result isnegative, and add N if it is.With this compression function, no matter how long and variegated the keys are,we can map them into a table whose size is not much greater than the actualnumber of entries we want to store. However, we’ve created a new problem:several keys are hashed to the same bucket in the table if h(hashCode1) =h(hashCode2). This circumstance is called a _collision_.How do we handle collisions without losing entries? We use a simple ideacalled _chaining_. Instead of having each bucket in the table reference oneentry, we have it reference a linked list of entries, called a _chain_. Ifseveral keys are mapped to the same bucket, their definitions all reside inthat bucket’s linked list.Chaining creates a second problem: how do we know which definition correspondsto which word? The answer is that we must store each key in the table with itsdefinition. The easiest way to do this is to have each listnode store an_entry_ that has references to both a key (the word) and an associated value(its definition). --- ----------------------------------------------------------defTable |.+-->| . | . | X | . | X | . | . | ... --- ----|-------|---------------|---------------|-------|----- v v v v v --- --- --- --- --- |.+>pus |.+>evil |.+>okthxbye |.+>cool|.+>mud |.+>goo |.+>C++ |.+>creep |.+>jrs |.+>wet dirt |.| |X| |X| |.| |X| -+- --- --- -+- --- | | v v --- ^ --- |.+>sin < chains > |.+>twerk |.+>have fun |.+>Miley burping |X| |X| the wrong way --- ---Hash tables usually support at least three operations. An Entry objectreferences a key and its associated value.public Entry insert(key, value) Compute the key’s hash code and compress it to determine the entry’s bucket. Insert the entry (key and value together) into that bucket’s list.public Entry find(key) Hash the key to determine its bucket. Search the list for an entry with the given key. If found, return the entry; otherwise, return null.public Entry remove(key) Hash the key to determine its bucket. Search the list for an entry with the given key. Remove it from the list if found. Return the entry or null.What if two entries with the same key are inserted? There are two approaches.(1) Following Goodrich and Tamassia, we can insert both, and have find() or remove() arbitrarily return/remove one. Goodrich and Tamassia also propose a method findAll() that returns all the entries with a given key.(2) Replace the old value with the new one, so only one entry with a given key exists in the table.Which approach is best? It depends on the application.WARNING: When an object is stored as a key in a hash table, an applicationshould never change the object in a way that will change its hash code.If you do so, the object will thenceforth be in the wrong bucket.The _load_factor_ of a hash table is n/N, where n is the number of keys in thetable and N is the number of buckets. If the load factor stays below one (ora small constant), and the hash code and compression function are "good," andthere are no duplicate keys, then the linked lists are all short, and eachoperation takes O(1) time. However, if the load factor grows too large(n >> N), performance is dominated by linked list operations and degenerates toO(n) time (albeit with a much smaller constant factor than if you replaced thehash table with one singly-linked list). A proper analysis requires a

View Full Document


School:
Email:
New Password:
Confirm Password:

This preview shows page 1 out of 2 pages.

Berkeley COMPSCI 61B - Lecture Notes

Sign up for free to view:

Please select your school