Lecture Notes on Tries

15-122: Principles of Imperative Computation
Frank Pfenning

Lecture 18
October 26, 2010

1 Introduction

In the data structures implementing associative arrays so far, we have needed either an equality operation and a hash function, or a comparison operator with a total order on keys. Similarly, our sorting algorithms just used a total order on keys and worked by comparisons of keys. We obtain a different class of representations and algorithms if we analyze the structure of keys and decompose them. In this lecture we explore tries, an example from this class of data structures. The asymptotic complexity we obtain has a different nature from data structures based on comparisons: it depends on the structure of the key rather than the number of elements stored in the data structure.

2 The Boggle Word Game

The Boggle word game is played on an n × n grid (usually 4 × 4 or 5 × 5). We have n * n dice that have letters on all 6 sides and which are shaken so that they randomly settle into the grid. At that point we have an n × n grid filled with letters. Now the goal is to find as many words as possible in this grid within a specified time limit. To construct a word we can start at an arbitrary position and use any of the up to 8 adjacent letters as the second letter. From there we can again pick any adjacent letter as the third letter in the word, and so on. We may not reuse any particular place in the grid within the same word, but different words may share places. For example, in the grid

    E F R A
    H G D R
    P S N A
    E E B E

we have the words SEE, SEEP, and BEARDS, but not SEES. Scoring assigns points according to the lengths of the words found, where longer words score higher.

One simple possibility for implementing this game is to systematically search for potential words and then look them up in a dictionary, perhaps stored as a sorted word list, some kind of binary search tree, or a hash table. The problem is that there are too many potential words on
the grid, so we want to consider prefixes and abort the search when a prefix does not start a word. For example, if we start in the upper left-hand corner and try horizontally first, then EF is a prefix for a number of words, but EFR, EFD, EFG, and EFH are not, and we can abandon our search quickly. A few more possibilities reveal that no word with 3 letters or more in the above grid starts in the upper left-hand corner.

Because a dictionary is sorted alphabetically, by prefix, we may be able to use a sorted array effectively in order for the computer to play Boggle and quickly determine all possible words on a grid. But we may still look for potentially more efficient data structures which take into account that we are searching for words that are constructed by incrementally extending the prefix.

3 Multi-Way Tries

One possibility is to use a multi-way trie, where each node has a potential child for each letter in the alphabet. Consider the word SEE. We start at the root and follow the link labeled S, which gets us to a node on the second level in the tree. This tree indexes all words with first character S. From here we follow the link labeled E, which gets us to a node indexing all words that start with SE. After one more step we are at SEE. At this point we cannot be sure if this is a complete word or just a prefix for words stored in it. In order to record this, we can either store a boolean (true if the current prefix is a complete word) or terminate the word with a special character that cannot appear in the word itself.

Below is an example of a multi-way trie indexing the three words BE, BED, and BACCALAUREATE.

[Figure: multi-way trie indexing BE, BED, and BACCALAUREATE. Each node is an array of child links labeled A, B, C, D, E, ..., Z together with a boolean flag (false or true).]

While the paths to finding each word are quite short, including one more node than characters in the word, the data structure consumes a lot of space, because there are a lot of nearly empty arrays.

An interesting property is that the lookup time for a word
is O(k), where k is the number of characters in the word. This is independent of how many words are stored in the data structure! Contrast this with, say, balanced binary search trees, where the search time is O(log n), where n is the number of words stored. For the latter analysis we assumed that key comparisons were constant time, which is not really true because the keys (which are strings) have to be compared character by character. So each comparison, while searching through a binary search tree, might take up to O(k) individual character comparisons, which would make it O(k log n) in the worst case. Compare that with O(k) for a trie.

On the other hand, the wasted space of the multi-way trie with an array at each node costs time in practice. This is not only because this memory must be allocated, but because on modern architectures the so-called memory hierarchy means that accesses to memory cells close to each other will be much faster than accessing distant cells. You will learn more about this in 15-213 Computer Systems.

4 Binary Tries

The idea of the multi-way trie is quite robust, and there are useful special cases. One of these arises if we want to represent sets of numbers. In that case we can decompose the binary representation of numbers bit by bit in order to index data stored in the trie. We could start with the most significant or least significant bit, depending on the kind of numbers we expect. In this case every node would have at most two successors, one for 0 and one for 1. This does not waste nearly as much space and can be efficient for many purposes.

5 Ternary Search Tries

For the particular application we have in mind, namely searching for words on a grid of letters, we could either use multi-way tries directly (wasting space) or use binary tries (wasting time and space, because each character is decomposed into individual bits). A more suitable data structure is a ternary search trie (TST), which combines ideas from binary search trees with tries.
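
For comparison, the multi-way trie of Section 3 might be sketched in C as follows. This is only an illustrative sketch, not the course's implementation: the names trie_node, trie_new, trie_insert, trie_is_word, trie_is_prefix, and trie_free are hypothetical, and the code assumes that keys consist solely of the upper-case letters 'A' through 'Z' in an ASCII-compatible encoding. It uses the boolean variant for marking complete words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26 /* assumption: keys use only 'A'..'Z' (ASCII) */

/* One node per prefix: a child slot for every letter, plus a flag that
   records whether the prefix leading to this node is a complete word. */
typedef struct trie_node {
    struct trie_node *child[ALPHABET];
    bool is_word;
} trie_node;

/* Allocate a node with no children that marks no word. */
trie_node *trie_new(void) {
    trie_node *t = malloc(sizeof(trie_node));
    assert(t != NULL);
    for (int i = 0; i < ALPHABET; i++) t->child[i] = NULL;
    t->is_word = false;
    return t;
}

/* Follow the path spelled by s. Returns the node reached, or NULL if the
   path leaves the trie. One step per character, so O(k) for k characters. */
static trie_node *trie_walk(trie_node *t, const char *s) {
    for (; t != NULL && *s != '\0'; s++) {
        assert('A' <= *s && *s <= 'Z');
        t = t->child[*s - 'A'];
    }
    return t;
}

/* Insert word s, creating missing nodes along its path. */
void trie_insert(trie_node *t, const char *s) {
    assert(t != NULL);
    for (; *s != '\0'; s++) {
        assert('A' <= *s && *s <= 'Z');
        int i = *s - 'A';
        if (t->child[i] == NULL) t->child[i] = trie_new();
        t = t->child[i];
    }
    t->is_word = true;
}

/* True if s was inserted as a complete word. */
bool trie_is_word(trie_node *t, const char *s) {
    trie_node *n = trie_walk(t, s);
    return n != NULL && n->is_word;
}

/* True if the path for s exists in the trie; for a non-empty s this
   means s is a prefix of (or equal to) some inserted word. */
bool trie_is_prefix(trie_node *t, const char *s) {
    return trie_walk(t, s) != NULL;
}

/* Release the whole trie. */
void trie_free(trie_node *t) {
    if (t == NULL) return;
    for (int i = 0; i < ALPHABET; i++) trie_free(t->child[i]);
    free(t);
}
```

After inserting BE, BED, and BACCALAUREATE into a fresh trie, trie_is_word holds for BE and BED but not for B, and trie_is_prefix holds for BACC but not for BEE. A Boggle search could call trie_is_prefix on each partial path and abandon the path as soon as it fails. The 26-slot array in every node is exactly the space cost that the ternary search trie is meant to avoid.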
Roughly, at each node in a trie we store a binary search tree with characters as keys. The entries are pointers to the subtries. More precisely, at each node we store a character c and three pointers. The left …

