DOC PREVIEW
MIT 6 454 - Grammar-Based Codes

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 46, NO. 3, MAY 2000 737Grammar-Based Codes: A New Class of UniversalLossless Source CodesJohn C. Kieffer, Fellow, IEEE, and En-hui Yang, Member, IEEEAbstract—We investigate a type of lossless source code called agrammar-based code, which, in response to any input data stringover a fixed finite alphabet, selects a context-free grammarrepresenting in the sense that is the unique string belongingto the language generated by. Lossless compression of takesplace indirectly via compression of the production rules of thegrammar. It is shown that, subject to some mild restrictions,a grammar-based code is a universal code with respect to thefamily of finite-state information sources over the finite alphabet.Redundancy bounds for grammar-based codes are established.Reduction rules for designing grammar-based codes are pre-sented.Index Terms—Chomsky hierarchy, context-free grammars, en-tropy, Kolmogorov complexity, lossless coding, redundancy, uni-versal coding.I. INTRODUCTIONGRAMMARS (especially context-free grammars) havemany applications in engineering and computer science.Some of these applications are speech recognition [8, Ch. 13],image understanding [22, p. 289], compiler design [1], andlanguage modeling [10, Theorems 4.5, 4.6]. In this paper, weshall be interested in using context-free grammars for losslessdata compression. There has been some previous work ofthis nature, including the papers [3], [2], [11], [14], [24],[18].Two approaches have been used. In one of these approaches(as illustrated in [2], [11], and [14]), one fixes a context-freegrammar, known to both encoder and decoder, such that thelanguage generated bycontains all of the data strings thatare to be compressed. To compress a particular data string, onethen compresses the derivation tree [2, p. 844] showing how thegiven string is derived from the start symbol of the grammar.In the second of the two approaches (exemplified by the papers[3], [24], and [18]), a different context-free grammarisassigned to each data string, so that the language generated byis . If the data string is to be compressed, the encodertransmits code bits to the decoder that allow reconstruction ofthe grammar, from which the decoder infers . This secondapproach is the approach that we employ in this paper. WeManuscript received December 20, 1995; revised December 16, 1999. Thiswork was supported in part by the National Science Foundation under GrantsNCR-9304984, NCR-9508282, NCR-9627965, and by the Natural Sciences andEngineering Research Council of Canada under Grant RGPIN203035-98.J. C. Kieffer is with the Department of Electrical and Computer Engi-neering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail:[email protected]).E.-h. Yang is with the Department of Electrical and Computer Engi-neering, University of Waterloo, Waterloo, Ont., Canada N2L 3G1 (e-mail:[email protected]).Communicated by M. Weinberger, Associate Editor for Source Coding.Publisher Item Identifier S 0018-9448(00)03094-7.Fig. 1. Encoder structure of grammar-based code.shall put forth a class of lossless source codes that employ thisapproach that we call grammar-based codes. Unlike previousworkers using the second approach, we place our results in aninformation-theoretic perspective, showing how to properlydesign a grammar-based code so that it will be a universal codewith respect to the family of finite-state information sources ona fixed finite alphabet.In this section, we wish to give the reader an informal notionof theidea of a grammar-based code.For this purpose, we do notneed a precise definition of the concept of context-free grammar(this will be done in Section II). All we need to know about acontext-free grammarhere is that it furnishes us with someproduction rules via which we can construct certain sequencesover a finite alphabet which form what is called the languagegenerated by, denoted by .A grammar-based code consists of encoder and decoder.• Fig. 1 depicts the encoder structure. Lettingdenote thedata string that is to be compressed, consisting of finitelymany terms chosen from some fixed finite alphabet, thegrammar transform in Fig. 1 constructs a context-freegrammarsatisfying the property that ,which tells us thatmay be inferred from becauseis the unique string belonging to the language .The grammar encoder in Fig. 1 assigns to the grammara binary codeword which is denoted .• When the decoder is presented with the codeword,the data stringis recovered by first reconstructing thegrammar, and then inferring from the productionrules of.From the preceding, the reader can see that our philosophy isnot to directly compress the data string; instead, we try to “ex-plain”by finding a grammar that is simple and generatesin the sense that . Since can be recoveredfrom, we can compress instead of . As the grammarthat we shall use to represent will be simple, we will getgood compression by compressing.The main results of this paper (Theorems 7 and 8) tell usthat, under some weak restrictions, a grammar-based code is auniversal lossless source code for any finite-state informationsource. We shall be able to obtain specific redundancy boundsforgrammar-based codes with respectto finite-stateinformation0018–9448/00$10.00 © 2000 IEEE738 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 46, NO. 3, MAY 2000sources. Also, we shall see how to design efficient grammar-based codes by means of reduction rules.As a result of this paper, the code designer is afforded withmore flexibility in universal lossless source code design. Forexample, for some data strings, a properly designed grammar-based code yields better compression performance than that af-forded by the Lempel–Ziv universaldata compression algorithm[27].Notation and Terminology: We explain the following nota-tions and terminologies used throughout the paper:•denotes the cardinality of a finite set .•denotes the length of a string of finite length.•denotes the smallest positive integer greater than orequal to the real number.•denotes the set of all strings of finite length whose en-tries come from the finite set, including the empty string.We represent each nonempty string inmultiplicativelyand uniquely as, where is the length of thestring and; we write the empty stringinas .•Ifand are elements of , we define the product tobe the element ofsuch thati) If, then ;if , then .ii) Ifand , thenwhere and are the uniquemultiplicative


View Full Document

MIT 6 454 - Grammar-Based Codes

Documents in this Course
Load more
Download Grammar-Based Codes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Grammar-Based Codes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Grammar-Based Codes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?