DOC PREVIEW
Duke CPS 100E - Lecture

This preview shows page 1-2-3-4 out of 13 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 13 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

CompSci 100E22.1Searching, Maps,Tries (hashing)ÿ Searching is a fundamentally important operation We want to search quickly, very very quickly Consider searching using Google, ACES, issues? In general we want to search in a collection for a keyÿ We've searched using trees and arrays Tree implementation was quick: O(log n) worst/average? Arrays: access is O(1), search is slowerÿ If we compare keys, log n is best for searching n elements Lower bound is ΩΩΩΩ(log n), provable Hashing is O(1) on average, not a contradiction, why? Tries are O(1) worst-case!! (ignoring length of key)CompSci 100E22.2From Google to Mapsÿ If we wanted to write a search engine we’d need to access lots of pages and keep lots of data Given a word, on what pages does it appear? This is a map of words->web pagesÿ In general a map associates a key with a value Look up the key in the map, get the value Google: key is word/words, value is list of web pages Anagram: key is string, value is words that are anagramsÿ Interface issues Lookup a key, return boolean: in map or value: associated with the key (what if key not in map?) Insert a key/value pair into the mapCompSci 100E22.3Interface at work: MapDemo.javaÿ Key is a string, Value is # occurrences Interface in code below shows how Mapclass workswhile (scanner.hasNext()) {String s = (String) scanner.next();Counter c = (Counter) map.get(s);if (c != null) c.increment();else map.put(s, new Counter());}ÿ What clues are there for prototype of map.get and map.put? What if a key is not in map, what value returned? What kind of objects can be put in a map?CompSci 100E22.4Accessing all values in a map(e.g., print)ÿ Access every key in the map, then get the corresponding value Get an iterator of the set of keys: keySet().iterator() For each key returned by this iterator call map.get(key)…ÿ Get an iterator over (key,value) pairs, there's a nested class called Map.Entry that the iterator returns, accessing the key and the value separately is then possible To see all the pairs use entrySet().iterator()CompSci 100E22.5External Iterator without genericsÿ The Iterator interface accesses elements Source of iterator makes a difference: cast required?Iterator it = map.keySet().iterator();while (it.hasNext()){Object value = map.get(it.next());}Iterator it2 = map.entrySet().iterator();while (it2.hasNext()){Map.Entry me = (Map.Entry) it2.next();Object value = me.getValue();}CompSci 100E22.6External Iterator with genericsÿ Avoid Object, we know what we have a map of Is the syntax worth it?Iterator<String> it = map.keySet().iterator();while (it.hasNext()){Object value = map.get(it.next());}Iterator<Map.Entry<String,Counter>> it2 =map.entrySet().iterator();while (it2.hasNext()){Map.Entry<String,Counter> me = it2.next();Object value = me.getValue();}CompSci 100E22.7Hashing: Log (10100) is a big numberÿ Comparison based searches are too slow for lots of data How many comparisons needed for a billion elements? What if one billion web-pages indexed?ÿ Hashing is a search method: average case O(1) search Worst case is very bad, but in practice hashing is good Associate a number with every key, use the number to store the keyo Like catalog in library, given book title, find the bookÿ A hash function generates the number from the key Goal: Efficient to calculate Goal: Distributes keys evenly in hash tableCompSci 100E22.8Hashing detailsÿ There will be collisions, two keys will hash to the same value We must handle collisions, still have efficient search What about birthday “paradox”: using birthday as hash function, will there be collisions in a room of 25 people? ÿ Several ways to handle collisions, in general array/vector used Linear probing, look in next spot if not foundo Hash to index h, try h+1,h+2,…,wrap at endo Clustering problems, deletion problems, growing problems Quadratic probingo Hash to index h, try h+12,h+22,h+32,…,wrap at endo Fewer clustering problems Double hashingo Hash to index h, with another hash function to jo Try h, h+j, h+2j, …0123 n-1CompSci 100E22.9Chaining with hashingÿ With n buckets each bucket stores linked list Compute hash value h, look up key in linked list table[h] Hopefully linked lists are short, searching is fast Unsuccessful searches often faster than successfulo Empty linked lists searched more quickly than non-empty Potential problems?ÿ Hash table details Size of hash table should be a prime number Keep load factor small: number of keys/size of table On average, with reasonable load factor, search is O(1) What if load factor gets too high? Rehash or other methodCompSci 100E22.10Hashing problemsÿ Linear probing, hash(x) = x, (mod tablesize) Insert 24, 12, 45, 14, delete 24, insert 23 (where?)ÿ Same numbers, use quadratic probing (clustering better?)ÿ What about chaining, what happens?012345678910012345678910241245 1424124514CompSci 100E22.11What about hash functionsÿ Hashing often done on strings, consider two alternativespublic static int hash(String s){int k, total = 0;for(k=0; k < s.length(); k++){total += s.charAt(k);}return total;}ÿ Consider total += (k+1)*s.charAt(k), why might this be better? Other functions used, always mod result by table sizeÿ What about hashing other objects? Need conversion of key to index, not always simple Every object contains hashCode()!CompSci 100E22.12Trie: efficient search words/suffixesÿ A trie (from retrieval, but pronounced “try”) supports Insertion: put string into trie (delete and look up) These operations are O(size of string) regardless of how many strings are stored in the trie! Guaranteed!ÿ In some ways a trie is like a 128 (or 26 or alphabet-size) tree, one branch/edge for each character/letter Node stores branches to other nodes Node stores whether it ends the string from root to itÿ Extremely useful in DNA/string processing Very useful for matching suffixes: suffix treeCompSci 100E22.13Trie picture and code (see Trie.java)ÿ To add string Start at root, for each char create node as needed, go down tree, mark last nodeÿ To find string Start at root, follow linkso If null, not found Check word flag at endÿ To print all nodes Visit every node, build string as nodes traversedÿ What about union and intersection?acprnsarahaostcdIndicates word ends


View Full Document

Duke CPS 100E - Lecture

Documents in this Course
Topics

Topics

9 pages

Lecture

Lecture

3 pages

Notes

Notes

2 pages

Hashing

Hashing

19 pages

Lecture

Lecture

59 pages

Lecture

Lecture

6 pages

Lecture

Lecture

4 pages

Lecture

Lecture

20 pages

Lecture

Lecture

12 pages

Lecture

Lecture

12 pages

Lecture

Lecture

7 pages

Lecture

Lecture

8 pages

Lecture

Lecture

10 pages

Lecture

Lecture

4 pages

Notes

Notes

16 pages

Lecture

Lecture

5 pages

Lecture

Lecture

9 pages

Lecture

Lecture

4 pages

Lecture

Lecture

6 pages

Lecture

Lecture

16 pages

Lecture

Lecture

5 pages

Lecture

Lecture

5 pages

Lecture

Lecture

12 pages

Lecture

Lecture

12 pages

Lecture

Lecture

10 pages

Sets

Sets

14 pages

Lecture

Lecture

9 pages

Lecture

Lecture

4 pages

Test 1

Test 1

7 pages

Load more
Download Lecture
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?