DOC PREVIEW
MIT 6 050J - Compression

This preview shows page 1-2-3-4-5 out of 14 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 14 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

3 Compression3.1 Variable-Length Encoding3.2 Run Length Encoding3.3 Static Dictionary3.4 Semi-adaptive Dictionary3.5 Dynamic Dictionary3.5.1 The LZW Patent3.5.2 How does LZW work?3.6 Irreversible Techniques3.7 Detail: LZW Compression3.7.1 LZW Algorithm, Example 13.7.2 LZW Algorithm, Example 23.8 Detail: 2-D Discrete Cosine Transformation3.8.1 Discrete Linear Transformations3.8.2 Discrete Cosine TransformationChapter 3CompressionIn Chapter 1 we examined the fundamental unit of information, the bit, and its various abstract repre-sentations: the mathematical bit, the circuit bit, the control bit, the classical bit, and the quantum bit. Ournever-ending quest for improvement made us want representations of single bits that are stronger, smaller,faster, and cheaper. Following this route led us to fundamental limitations imposed by quantum mechanics.In Chapter 2 we considered some of the issues surrounding the representation of complex objects byarrays of bits. The mapping between the objects to be represented (the symbols) and the array of bitsused for this purpose is known as a code. We naturally want codes that are stronger and smaller, i.e., thatlead to representations of objects that are smaller and less susceptible to errors. In this chapter we willconsider techniques of compression that can be used for generation of particularly efficient representations.In Chapter 4 we will look at techniques of avoiding errors.In Chapter 2 we looked at this general sort of a system shown in Figure 3.1, in which symbols are encodedinto bit strings, these are transported (in space and/or time) to a decoder, which then recreates the originalsymbols. Typically the same code is used for a succession of symbols, one after another.CoderChannelDecoder- - - -(Arrays of Bits) (Arrays of Bits)Input(Symbols)Output(Symbols)Figure 3.1: Generalized communication systemThe role of data compression is to convert the string of bits representing a succession of symbols intoa shorter string for more economical transmission, storage, or processing. The result is the system inFigure 3.2, with both a compressor and an expander. Ideally, the expander would exactly reverse the actionof the compressor so that the coder and decoder could be unchanged.On first thought, this approach might seem surprising. Why is there any reason to believe that the sameinformation could be contained in a smaller number of bits? We will look at two typ e s of compression, usingAuthor: Paul Penfield, Jr.This document: http://www-mtl.mit.edu/Courses/6.050/2005/notes/chapter3.pdfVersion 1.2, February 8, 2005. Copyrightc 2005 Massachusetts Institute of TechnologyStart of notes · back · next | 6.050J/2.110J home page | Site map | Search | About this document | Comments and inquiries283.1 Variable-Length Encoding 29Source EncoderCompressorChannelExpanderSource Decoder- - - - - -Input Output(Symbols) (Symbols)(Arraysof Bits)(Arraysof Bits)(Arraysof Bits)(Arraysof Bits)Figure 3.2: More elaborate communication systemdifferent approaches:• Lossless or reversible compression (which can only be done if the original code was inefficient, forexample by having unused bit patterns, or by not taking into account the fact that some symbols areused more frequently than others)• Lossy or irreversible compression, in which the original symbol, or its coded representation, cannotbe reconstructed from the smaller version exactly, but instead the expander produces an approximationthat is “good enough”The six techniques described b e low are astonishingly effective in compressing data files. The first fourare reversible, and the last one irreversible. Each technique has some cases for which it is particularly wellsuited (the best cases) and others for which it is not well suited (the worst cases).3.1 Variable-Length EncodingIn Chapter 2 Morse code was discussed as an example of a source code in which more frequently occurringletters of the English alphabet were represented by shorter codewords, and less frequently occurring lettersby longer codewords. On average, messages as sent using these codewords are shorter than they would beusing all codewords of the same length. Variable-length encoding can be done either in the source encoder orthe compressor. A general procedure for variable-length encoding will be given in Chapter 5, so a discussionof this technique is put off until that chapter.3.2 Run Length EncodingSupp ose a message consists of long sequences of a small number of symbols or characters. Then themessage could be encoded as a list of the symbol and the number of times it occurs. Thus the message “aB B B B B a a a B B a a a a” could be written as “a 1 B 5 a 3 B 2 a 4”. This technique works very well fora relatively small number of circumstances. One example is the German flag, which could be encoded as somany black pixels, so many red pixels, and so many yellow pixels, with a great saving over specifying everypixel. Another example comes from fax technology, where a document is scanned and long groups of whiteFigure 3.3: Flag of Germany (black band on top, red in middle, yellow on bottom)3.3 Static Dictionary 30(or black) pixels are transmitted as merely the number of such pixels (since there are only white and blackpixels, it is not even necessary to encode the color since it can be assumed to be the other color).Run length encoding does not work well for messages without repeated sequences of the same symbol.For example, it may work well for drawings and even black-and-white scanned images, but it does not workwell for photographs since small changes in color from one pixel to the next would require a new symbol.3.3 Static DictionaryIf a code has unused codewords, these may be assigned, as abbreviations, to frequently occurring sequencesof symbols. Then such sequences could be encoded with no more bits than would be needed for a singlesymbol. For e xample, if English text is being encoded in ASCII and the DEL character is regarded asunnecessary, then it might make sense to assign its codeword 127 to the common word “the”. Practicalcodes offer numerous examples of this technique. The list of the codewords and their meanings is called acodebook, or dictionary. The compression technique considered here uses a dictionary which is static in thesense that it does not change from one message to the next.An example will illustrate this technique. Before the the electric telegraph, there were other schemes fortransmitting messages


View Full Document

MIT 6 050J - Compression

Download Compression
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Compression and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Compression 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?