Great Theoretical Ideas In Computer Science Steven Rudich Anupam Gupta CS 15 251 Lecture 20 Carnegie Mellon University March 25 2004 Decision Trees and Information A Question of Bits Spring 2004 Choice Tree A choice tree is a rooted directed tree with an object called a choice associated with each edge and a label on each leaf Choice Tree Representation of S We satisfy these two conditions Each leaf label is in S Each element from S on exactly one leaf Question Tree Representation of S I am thinking of an outfit Ask me questions until you know which one What color is the beanie What color is the tie When a question tree has at most 2 choices at each node we will call it a decision tree or a decision strategy Note Nodes with one choices represent stupid questions but we allow stupid questions 20 Questions S set of all English nouns Game I am thinking of an element of S You may ask up to 20 YES NO questions What is a question strategy for this game 20 Questions Suppose S a0 a1 a2 ak Binary search on S First question will be Is the word in a0 a1 a2 a k1 2 20 Questions Decision Tree Representation A decision tree with depth at most 20 which has the elements of S on the leaves Decision tree for Decision tree for a0 a1 a2 a k 1 2 a k 1 2 ak 1 ak Decision Tree Representation Theorem The binary search decision tree for S with k 1 elements a0 a1 a2 ak has depth d log k 1 e log k 1 k Another way to look at it If you are thinking of the noun am in S We are asking about each bit of index m Is the leftmost bit of m 0 Is the next bit of m 0 Theorem The binary search decision tree for S a0 a1 a2 ak has depth k log k 1 A lower bound Theorem No decision tree for S can have depth d log k 1 Proof Any depth d tree can have at most 2d leaves But d log k 1 number of leaves 2d k 1 Hence some element of S is not a leaf Tight bounds The optimal depth decision tree for any set S with k 1 elements has depth log k 1 k Prefix free Set Let T be a subset of 0 1 Definition T is prefix free if for any distinct x y 2 T if x y then x is not a prefix of y Example 000 001 1 01 is prefix free 0 01 10 11 101 is not Prefix free Code for S Let S be any set Definition A prefix free code for S is a prefix free set T and a 1 1 encoding function f S T The inverse function f 1 is called the decoding function Example S apple orange mango T 0 110 1111 f apple 0 f orange 1111 f mango 110 What is so cool about prefix free codes Sending sequences of elements of S over a communications channel Let T be prefix free and f be an encoding function Wish to send x1 x2 x3 Sender sends f x1 f x2 f x3 Receiver breaks bit stream into elements of T and decodes using f 1 Sending info on a channel Example S apple orange mango T 0 110 1111 f apple 0 f orange 1111 f mango 110 If we see 00011011111100 we know it must be 0 0 0 110 1111 110 0 and hence apple apple apple mango orange mango apple Morse Code is not Prefix free SOS encodes as A F K P U B G L Q V C H M R W D I N S X E J O T Y Z Morse Code is not Prefix free SOS encodes as Could decode as IAMIE A F K P U B G L Q V C H M R W D I N S X E J O T Y Z Unless you use pauses SOS encodes as A F K P U B G L Q V C H M R W D I N S X E J O T Y Z Prefix free codes are also called self delimiting codes Representing prefix free codes A 100 0 1 B 010 C 101 D 011 00 0 1 0 B 0 1 D 0 A 1 1 C F 11 CAF would encode as 1011001100 How do we decode 1011001100 fast F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as A F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as AB F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as ABA F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as ABAD F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as ABADC F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as ABADCA F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as ABADCAF F 0 0 1 1 0 B 0 1 D 0 A 1 1 C If you see 1000101000111011001100 can decode as ABADCAF F Prefix free codes are yet another representation of a decision tree Theorem S has a decision tree of depth d if and only if S has a prefix free code with all codewords bounded by length d 0 F B D A C 0 1 1 0 0 B 1 D 0 A 1 1 F C Theorem S has a decision tree of depth d if and only if S has a prefix free code with all codewords bounded by length d Extends to infinite sets Let S is a subset of Theorem S has a decision tree where all length n elements of S have depth D n if and only if S has a prefix free code where all length n strings in S have encodings of length D n I am thinking of some natural number k ask me YES NO questions in order to determine k Let d k be the number of questions that you ask when I am thinking of k Let D n max d j over n bit numbers j I am thinking of some natural number k ask me YES NO questions in order to determine k Lousy strategy Is it 0 1 2 3 d k k 1 D n 2n 1 since 2n 1 1 uses only n …
View Full Document
Unlocking...