DOC PREVIEW
Princeton COS 226 - String Sets and Symbol Tables

This preview shows page 1-2-3 out of 8 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 8 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Robert Sedgewick and Kevin Wayne • Copyright © 2005 • http://www.Princeton.EDU/~cos226String Sets and Symbol Tables2Symbol Table ReviewSymbol table.! Associate a value with a key.! Search for value given key.! Balanced trees use O(log N) key comparisons.! Hashing uses O(1) probes, but probe proportional to key length.Q. Are key comparisons necessary? No.Q. Is time proportional to key length required? No.Best possible. Examine O(log N) bits.This lecture. Specialized symbol table/set for string keys.! Faster than hashing.! More flexible than BST.4Tries. [from retrieval, but pronounced "try"]! Store characters in internal nodes, not keys.! Store records in external nodes.! Use the characters of the key to guide the search.Ex: sells sea shells by the sea shoreTriesbyseasellsshellstheshore5ApplicationsApplications.! Spell checkers.! Auto-complete.! Data compression. [stay tuned]! Computational biology.! Inverted index of Web.! Routing tables for IP addresses.! Storing and querying XML documents.! T9 predictive text input for cell phones.6String Set: OperationsOperations.! st.add(s): insert string s into the set.! st.contains(s): is string s in the set?Removes duplicates from input streamStringSET set = new StringSET();while (!StdIn.isEmpty()) { String key = StdIn.readString(); if (!set.contains(key)) { set.add(key); System.out.println(key); }}goal: implement this efficiently7KeysKey = sequence of "digits."! DNA: sequence of a,c, g, t.! IPv6 address: sequence of 128 bits.! English words: sequence of lowercase letters.! Protein: sequence of 20 amino acids A, C, ..., Y.! Credit card number: sequence of 16 decimal digits.! International words: sequence of UNICODE characters.! Library call numbers: sequence of letters, numbers, periods.This lecture: key = string over ASCII alphabet.8String Set: Implementations Cost SummaryChallenge: As fast as hashing, as flexible as BST.N = number of stringsL = size of stringC = number of characters in inputR = radix* only reads in dataActor. 82MB, 11.4M words, 900K distinct.Moby. 1.2MB, 210K words, 32K distinct.Red-blackImplementationHashingL + log NSearch hitLlog NInsertLTypical CaseCSpaceC1.40Moby0.7697.4Actors40.6DedupInput * L L L 0.26 15.19R-Way Trie: ExampleEx: sells sea shells by the sea shoreR = 2610R-Way Trie: Java ImplementationR-way existence trie: a node.Node: reference to R nodes.rootR = 8private class Node { Node[] next = new Node[R]; boolean end;}dcba hgfe11R-Way Trie: Java Implementationpublic class StringSET { private static final int R = 128; private Node root new Node(); private class Node { Node[] next = new Node[R]; boolean end; } public boolean contains(String s) { return contains(root, s, 0); } private boolean contains(Node x, String s, int i) { if (x == null) return false; if (i == s.length()) return x.end; char c = s.charAt(i); return contains(x.next[c], s, i+1); }ASCII12R-Way Trie: Java Implementation public void add(String s) { add(root, s, 0); } private Node add(Node x, String s, int i) { if (x == null) x = new Node(); if (i == s.length()) x.end = true; else { char c = s.charAt(i); x.next[c] = add(x.next[c], s, i+1); } return x;}13Existence Symbol Table: Implementations Cost SummaryR-way trie. Faster than hashing for small R,but slow and wastes memory for large R.Challenge. Use less space.R = 256Red-BlackImplementationHashingL + log NSearch hitLlog NInsertLTypical CaseCSpaceC1.40Moby0.7697.4Actors40.6R-Way Trie L L R N + C 1.12 MemoryDedupInput L L L 0.26 15.1R = 128N = number of stringsL = size of stringC = number of characters in inputR = radix* only reads in data17Ternary Search TrieTernary search trie. [Bentley-Sedgewick]! Each node has 3 children:! Left (smaller), middle (equal), right (larger).Ex: sells sea shells by the sea shoreObservation: Few wasted links!1826-Way Trie vs. TSTTST. Collapses empty links in 26-way trie.26-way trie (1035 null links, not shown)TST (155 null links)19TST ImplementationTST String set: a node.Node: four fields:! Character d.! Reference to left TST. [smaller]! Reference to middle TST. [equal]! Reference to right TST. [larger]roothiasdhadisiphipprivate class Node { char c; Node l, m, r; boolean end;}hi20TST: Java Implementationpublic boolean contains(String s) { if (s.length() == 0) return false; return contains(root, s, 0);} private boolean contains(Node x, String s, int i) { if (x == null) return false; char c = s.charAt(i); if (c < x.c) return contains(x.l, s, i); else if (c > x.c) return contains(x.r, s, i); else if (i < s.length()-1) return contains(x.m, s, i+1); else (i < s.length()-1) return x.end;}21TST: Java Implementationpublic void add(String s) { root = add(root, s, 0); } private Node add(Node x, String s, int i) { char c = s.charAt(i); if (x == null) x = new Node(c); if (c < x.c) x.l = add(x.l, s, i); else if (c > x.c) x.r = add(x.r, s, i); else if (i < s.length()-1) x.m = add(x.m, s, i+1); else (i < s.length() - 1 x.end = true; return x;}22String Set: Implementations Cost Summaryno arithmeticRed-BlackImplementationHashingL + log NSearch hitLlog NInsertLTypical CaseCSpaceC1.40Moby0.7697.4Actors40.6R-Way TrieTSTLL + log NLL + log NR N + CC1.120.72Memory38.7DedupInput L L L 0.26 15.1N = number of stringsL = size of stringC = number of characters in inputR = radix* only reads in data23TST With R2 Branching At RootHybrid of R-way and TST.! Do R-way or R2-way branching at root.! Each of R2 root nodes points to a TST.Q. What about one letter words?TSTaaTSTabTSTacTSTzzTSTzyarray of R2 roots24String Set: Implementations Cost SummaryRed-BlackImplementationHashingL + log NSearch hitLlog NInsertLTypical CaseCSpaceC1.40Moby0.7697.4Actors40.6R-Way TrieTSTLL + log NLL + log NR N + CC1.120.72Memory38.7TST with R2L + log N L + log N C 0.51 32.7DedupInput L L L 0.26 15.1N = number of stringsL = size of stringC = number of characters in inputR = radix* only reads in data25TST SummaryAdvantages.! Linear space.! Very fast search hits.! Search misses even faster.! Adapts gracefully to irregularities in keys.! Supports even more general symbol table ops.Bottom line: TST more flexible than BST; can be faster than hashing.especially if lots of search missesexamine only a few digits of the key!26Tries: Advanced OperationsInsert. Insert a key.Contains. Check if given key in the set.Delete. Delete key from the symbol table.Sort. Iterate through the


View Full Document

Princeton COS 226 - String Sets and Symbol Tables

Documents in this Course
QUICKSORT

QUICKSORT

14 pages

QUICKSORT

QUICKSORT

55 pages

STRINGS

STRINGS

69 pages

Lecture

Lecture

4 pages

STRINGS

STRINGS

18 pages

Hashing

Hashing

7 pages

MERGESORT

MERGESORT

14 pages

Quicksort

Quicksort

12 pages

Strings

Strings

10 pages

Overview

Overview

15 pages

Hashing

Hashing

13 pages

Mergesort

Mergesort

15 pages

Tries

Tries

13 pages

Final

Final

12 pages

Final

Final

12 pages

Mergesort

Mergesort

13 pages

Final

Final

10 pages

Load more
Download String Sets and Symbol Tables
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view String Sets and Symbol Tables and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view String Sets and Symbol Tables 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?