
VARIABLE TO FIXED LENGTH SOURCE CODING - TUNSTALL CODES
6.441 Supplementary Notes 1, 2/10/94

So far, we have viewed source coding as a mapping from letters of a source alphabet into strings from a code alphabet. We saw that Huffman coding minimizes the expected number of code letters per source letter, subject to the requirement of unique decodability. We then saw that by accumulating L source letters at a time into a "super letter" from the alphabet of source L-tuples, the expected number of code letters per source letter, n̄L, could typically be reduced with increasing L. This reduction arises in two ways: first by taking advantage of any statistical dependencies that might exist between successive source letters, and second by reducing the effect of the integer constraint on code word length. We saw that (for stationary sources) the expected number of binary code letters per source letter, n̄L, satisfies

    H∞ ≤ n̄L ≤ HL + 1/L,   where lim L→∞ HL = H∞.

By taking L to be arbitrarily large, and viewing it as the lifetime of the source, we see that in a very real sense H∞ is the minimum expected number of binary code letters per source letter that can be achieved by any source coding technique, and that this value can be approached arbitrarily closely by Huffman coding over L-tuples of source letters for sufficiently large L.

We see from this that, from a purely theoretical standpoint, there is not much reason to explore other approaches to source coding. From a slightly more practical viewpoint, however, there are two important questions that must be asked. First, are there computationally simpler techniques to approach the limit H∞? Note that a Huffman code on L-tuples from a source alphabet of size K contains K^L code words; this is not attractive computationally for large L. The second question has to do with the assumption of known source probabilities. Usually such probabilities are not known, and thus one would like to have source coding techniques that are "universal" in the sense that they work well independent of the source probabilities, or "adaptive" in the sense that they adapt to the existing source probabilities. There is not much difference between universal and adaptive source coding, and we discuss these topics later. The important point for now, however, is that we need a richer set of tools than Huffman coding alone in order to address both the problem of computational complexity and the problem of adaptation.
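As a concrete check of the bounds above (this sketch is not part of the notes), the following Python fragment builds binary Huffman codes over L-tuples of a memoryless binary source with an illustrative letter probability p = 0.9; for a memoryless source HL = H∞ = H(p) ≈ 0.469 bits. It uses the standard identity that the average Huffman codeword length equals the sum of the probabilities formed at the merges.

```python
# Hedged sketch, not from the notes: Huffman coding over L-tuples of a
# memoryless binary source, checking H_inf <= n_L <= H_L + 1/L.
import heapq
import itertools
import math

def huffman_avg_length(pmf):
    """Average binary Huffman codeword length = sum of all merge probabilities."""
    heap = list(pmf)
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)  # two least probable
        total += merged
        heapq.heappush(heap, merged)
    return total

p = 0.9                                                  # illustrative probability
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))     # H_L = H_inf for i.i.d.
for L in (1, 2, 4, 8):
    block_pmf = [math.prod(t) for t in itertools.product((p, 1 - p), repeat=L)]
    n_L = huffman_avg_length(block_pmf) / L              # code letters per source letter
    print(f"L={L}: {H:.4f} <= n_L = {n_L:.4f} <= {H + 1/L:.4f}")
```

Running this shows n̄L falling from 1 bit per source letter at L = 1 toward H∞ ≈ 0.469 as L grows, while the 2^L-entry probability table illustrates the computational objection raised above.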
A broader viewpoint of a source coder than that taken for Huffman codes is given in Figure 1, where the encoder is broken into two parts, a parser and a coder. The parser segments the source output sequence into a concatenation of strings. As an example:

    a b a a c b a a b a c a a a b c ...  →  (ab)(aac)(b)(aab)(ac)(aaa)(b)(c)...

The string encoder then maps the set of possible strings produced by the parser into binary code words (or, more generally, D-ary code words).

[Figure 1: Model of a source encoder. Source → Parser → String Encoder → binary code; e.g., a b a a c is parsed into (ab)(aac) and then encoded.]

For example, the parser might simply parse the source output into L-tuples, as analyzed earlier. The primary concern here, however, is the case where the parser output is a set of variable length strings (as in the example above). We assume that there is a finite dictionary of M allowable strings and that the string encoder maps the set of such strings into binary code words. If these binary code words are uniquely decodable (which we assume), then a decoder can reproduce the parsed strings, from which the original source output is obtained.

We see from the above description that an essential element of a parser is its dictionary of strings. Assuming that any combination of source letters (from the given K-letter alphabet) is possible, the dictionary must have the property that every possible source sequence has a prefix that is in the dictionary (for otherwise the parser could not produce an initial string). Such a dictionary is said to be a valid dictionary. We shall also restrict our attention temporarily to dictionaries that satisfy the prefix condition, i.e., in which no dictionary entry is a prefix of any other dictionary entry.

[Figure 2: Three dictionary trees for a ternary source alphabet: a) non-valid, b) valid, prefix condition, c) valid, non-prefix.]

A dictionary can be represented as a rooted tree, as illustrated for a ternary source alphabet in Figure 2. Note that the non-valid dictionary in Figure 2a is not capable of parsing ab..., illustrating that parsers must use valid dictionaries. Note also that the non-prefix dictionary in Figure 2c allows the sequence aaab... to be parsed either as (aaa)(b) or as (aa)(ab). This is not a serious problem, except that the parser is not fully described by such a dictionary; it also needs a rule for choosing between such alternatives. Practical adaptive source coders often use such non-prefix dictionaries. The rule usually used, given the dictionary of strings y1, y2, ..., yM, is for the parser to pick the longest prefix of the source sequence u1, u2, ... that is in the dictionary (say u1, ..., uL); a sketch of this longest-match rule appears below. For a prefix condition dictionary, of course, the dictionary fully specifies the parsing process.

Note that the dictionary tree here is analogous to the code tree for a Huffman code. A valid prefix condition dictionary corresponds to a complete prefix condition code tree. It is interesting to observe that validity is necessary for the parser, whereas completeness is only desirable for efficiency in the code tree.

We now restrict our attention to valid prefix condition dictionaries. As we saw before, each intermediate node in such a dictionary tree has K immediate descendants (where K is the source alphabet size), and the total number of leaves in such a tree is of the form M = α(K−1) + 1 for some integer α, where α is the number of intermediate nodes, including the root. We also now restrict our attention to variable to fixed length codes, in which the string encoder maps each dictionary string into a fixed length binary string of length n. This requires the number of strings to satisfy M ≤ 2^n, and for efficiency we make M as large as possible subject to M = α(K−1) + 1 and M ≤ 2^n. That is, M lies in the range 2^n − (K−2) ≤ M ≤ 2^n.

It is intuitively plausible that the appropriate objective, given the constraints above, is to find that dictionary of at most 2^n strings that maximizes the expected number, E[L], of source letters per dictionary string.
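The notes are headed toward Tunstall's algorithm, the standard greedy construction that achieves this objective for a memoryless source: start with the K single-letter strings and repeatedly replace the most probable dictionary string (a leaf) with its K one-letter extensions. Each step converts one leaf into an intermediate node, so M grows by K−1 and the form M = α(K−1)+1 is preserved. The sketch below is illustrative; the probabilities and the function name are not from the notes.

```python
# Hedged sketch, not from the notes: Tunstall's greedy dictionary construction
# for a memoryless source. Assumes 2**n >= K so the initial K leaves fit.
import heapq

def tunstall(probs, n):
    """Return {string: probability} for a valid prefix dictionary with M <= 2**n."""
    K = len(probs)
    # Max-heap of leaves keyed on probability (negated for heapq's min-heap).
    leaves = [(-p, s) for s, p in probs.items()]
    heapq.heapify(leaves)
    # Extending a leaf into its K children adds K-1 leaves: M = alpha*(K-1) + 1.
    while len(leaves) + (K - 1) <= 2 ** n:
        p, s = heapq.heappop(leaves)                 # most probable leaf
        for letter, q in probs.items():
            heapq.heappush(leaves, (p * q, s + letter))
    return {s: -p for p, s in leaves}

probs = {"a": 0.7, "b": 0.2, "c": 0.1}               # illustrative probabilities
dictionary = tunstall(probs, n=3)                    # at most 2^3 = 8 strings
E_L = sum(p * len(s) for s, p in dictionary.items())
print(sorted(dictionary))        # ['aaa', 'aab', 'aac', 'ab', 'ac', 'b', 'c']
print(f"M = {len(dictionary)}, E[L] = {E_L:.2f} source letters per 3-bit code word")
```

With these illustrative probabilities the construction stops at M = 7 ≤ 2^3 strings (α = 3, K = 3) and yields a seven-string dictionary consistent with the parsing example given with Figure 1.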
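To close the loop on Figure 1, here is a further hedged sketch (again not from the notes) of the parser and string encoder: the longest-match rule parses the source sequence, each dictionary string is mapped to a fixed length n-bit code word, and the decoder recovers the source output by reading n bits at a time. With the dictionary from the Tunstall sketch above (hardcoded here), it reproduces the example parse given with Figure 1.

```python
# Hedged sketch, not from the notes: longest-match parsing plus a fixed-length
# string encoder and its decoder.

def parse(source, dictionary):
    """Greedy longest-match parse of `source` into dictionary strings."""
    strings, i = [], 0
    while i < len(source):
        # Longest dictionary string that is a prefix of the remaining sequence;
        # a valid dictionary guarantees at least one match exists.
        match = max((s for s in dictionary if source.startswith(s, i)), key=len)
        strings.append(match)
        i += len(match)
    return strings

dictionary = ["aaa", "aab", "aac", "ab", "ac", "b", "c"]   # M = 7, K = 3, alpha = 3
n = 3                                                      # M <= 2^n = 8
codebook = {s: format(i, f"0{n}b") for i, s in enumerate(dictionary)}
decodebook = {w: s for s, w in codebook.items()}

source = "abaacbaabacaaabc"                                # the notes' example sequence
strings = parse(source, dictionary)
encoded = "".join(codebook[s] for s in strings)
# Fixed-length code words make decoding trivial: read n bits at a time.
decoded = "".join(decodebook[encoded[i:i + n]] for i in range(0, len(encoded), n))
print(strings)                   # ['ab', 'aac', 'b', 'aab', 'ac', 'aaa', 'b', 'c']
print(decoded == source)         # True
```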

