DOC PREVIEW
CMU CS 15122 - Lecture Notes on Tries

This preview shows page 1-2-3 out of 10 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 10 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

IntroductionThe Boggle Word GameMulti-Way TriesBinary TriesTernary Search TriesAsymptotic ComplexitySpecifying an InterfaceChecking InvariantsImplementing Search on TSTsImplementing InsertionLecture Notes onTries15-122: Principles of Imperative ComputationFrank PfenningLecture 18October 26, 20101 IntroductionIn the data structures implementing associative arrays so far, we have neededeither an equality operation and a hash function, or a comparison operatorwith a total order on keys. Similarly, our sorting algorithms just used a totalorder on keys and worked by comparisons of keys. We obtain a differentclass of representations and algorithms if we analyze the structure of keysand decompose them. In this lecture we explore tries, an example from thisclass of data structures. The asymptotic complexity we obtain has a differ-ent nature from data structures based on comparisons, depending on thestructure of the key rather than the number of elements stored in the datastructure.2 The Boggle Word GameThe Boggle word game is played on an n × n grid (usually 4 × 4 or 5 × 5).We have n ∗ n dice that have letters on all 6 sides and which are shaken sothat they randomly settle into the grid. At that point we have an n × n gridfilled with letters. Now the goal is to find as many words as possible in thisgrid within a specified time limit. To construct a word we can start at anarbitrary position and use any of the 8 adjacent letters as the second letter.From there we can again pick any adjacent letter as the third letter in theword, and so on. We may not reuse any particular place in the grid in thesame word, but they may be in common for different words. For example,LECTURE NOTES OCTOBER 26, 2010Tries L18.2in the gridE F R AH G D RP S N AE E B Ewe have the words SEE, SEEP, and BEARDS, but not SEES. Scoring as-signs points according to the lengths of the words found, where longerwords score higher.One simple possibility for implementing this game is to systematicallysearch for potential words and then look them up in a dictionary, perhapsstored as a sorted word list, some kind of binary search tree, or a hash table.The problem is that there are too many potential words on the grid, so wewant to consider prefixes and abort the search when a prefix does not starta word. For example, if we start in the upper right-hand corner and tryhorizontally first, then EF is a prefix for a number of words, but EFR, EFD,EFG, EFH are not and we can abandon our search quickly. A few morepossibilities reveal that no word with 3 letters or more in the above gridstarts in the upper left-hand corner.Because a dictionary is sorted alphabetically, by prefix, we may be ableto use a sorted array effectively in order for the computer to play Boggleand quickly determine all possible words on a grid. But we may still lookfor potentially more efficient data structures which take into account thatwe are searching for words that are constructed by incrementally extendingthe prefix.3 Multi-Way TriesOne possibility is to use a multi-way trie, where each node has a potentialchild for each letter in the alphabet. Consider the word SEE. We start at theroot and follow the link labeled S, which gets us to a node on the secondlevel in the tree. This tree indexes all words with first character S. Fromhere we follow the link labeled E, which gets us to a node indexing allwords that start with SE. After one more step we are at SEE. At this pointwe cannot be sure if this is a complete word or just a prefix for words storedin it. In order to record this, we can either store a boolean (true if thecurrent prefix is a complete word) or terminate the word with a specialcharacter that cannot appear in the word itself.LECTURE NOTES OCTOBER 26, 2010Tries L18.3Below is an example of a multi-way trie indexing the three words BE,BED, and BACCALAUREATE.A"B"C"D"E"…"Z"A"B"C"D"E"…"Z"A"B"C"D"E"…"Z"A"B"C"D"E"…"Z"false"false"true"false"A"B"C"D"E"…"Z"true"While the paths to finding each word are quite short, including one morenode than characters in the word, the data structure consumes a lot ofspace, because there are a lot of nearly empty arrays.An interesting property is that the lookup time for a word is O(k),where k is the number of characters in the word. This is independent ofhow many words are stored in the data structure! Contrast this with, say,balanced binary search trees where the search time is O(log(n)), where n isthe number of words stored. For the latter analysis we assumed that keycomparisons where constant time, which is not really true because the keys(which are strings) have to be compared character by character. So eachcomparison, while searching through a binary search tree, might take up toO(k) individual character comparison, which would make it O(k ∗ log(n))in the worst case. Compare that with O(k) for a trie.On the other hand, the wasted space of the multi-way trie with an arrayat each node costs time in practice. This is not only because this memorymust be allocated, but because on modern architectures the so-called mem-ory hierarchy means that accesses to memory cells close to each other will beLECTURE NOTES OCTOBER 26, 2010Tries L18.4much faster than accessing distant cells. You will learn more about this in15-213 Computer Systems.4 Binary TriesThe idea of the multi-way trie is quite robust, and there are useful specialcases. One of these if we want to represent sets of numbers. In that casewe can decompose the binary representation of numbers bit by bit in orderto index data stored in the trie. We could start with the most significant orleast significant bit, depending on the kind of numbers we expect. In thiscase every node would have at most two successors, one for 0 and one for1. This does not waste nearly as much space and can be efficient for manypurposes.5 Ternary Search TriesFor the particular application we have in mind, namely searching for wordson a grid of letters, we could either use multiway tries directly (wastingspace) or use binary tries (wasting time and space, because each characteris decomposed into individual bits).A more suitable data structure is a ternary search trie (TST) which com-bines ideas from binary search trees with tries. Roughly, at each node in atrie we store a binary search tree with characters as keys. The entries arepointers to the subtries.More precisely, at each node we store a character c and three point-ers. The left subtree stores all words starting with characters


View Full Document

CMU CS 15122 - Lecture Notes on Tries

Download Lecture Notes on Tries
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes on Tries and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes on Tries 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?