MIT 6 050J - Compression - D207377

Home> Schools> Massachusetts Institute of Technology> Electrical Engineering and Computer Science (6) > 6 050J> Compression

MIT 6 050J - Compression

School name Massachusetts Institute of Technology

Course 6 050j- Information and Entropy

Pages 15

Download Save

Unformatted text preview:

3 Compression3.1 Variable-Length Encoding3.2 Run Length Encoding3.3 Static Dictionary3.4 Semi-adaptive Dictionary3.5 Dynamic Dictionary3.5.1 The LZW Patent3.5.2 How does LZW work?3.6 Irreversible Techniques3.7 Detail: LZW Compression3.7.1 LZW Algorithm, Example 13.7.2 LZW Algorithm, Example 23.8 Detail: 2-D Discrete Cosine Transformation3.8.1 Discrete Linear Transformations3.8.2 Discrete Cosine TransformationChapter 3CompressionIn Chapter 1 we examined the fundamental unit of information, the bit, and its various abstract repre-sentations: the mathematical bit, the circuit bit, the control bit, the classical bit, and the quantum bit. Ournever-ending quest for improvement made us want representations of single bits that are stronger, smaller,faster, and cheaper.In Chapter 2 we considered some of the issues surrounding the representation of complex objects byarrays of bits. The mapping between the objects to be represented (the symbols) and the array of bitsused for this purpose is known as a code. We naturally want codes that are stronger and smaller, i.e., thatlead to representations of objects that are smaller and less susceptible to errors. In this chapter we willconsider techniques of compression that can be used for generation of particularly efficient representations.In Chapter 4 we will look at techniques of avoiding errors.In Chapter 2 we considered systems of the sort shown in Figure 3.1, in which symbols are encoded intobit strings, which are transported (in space and/or time) to a decoder, which then recreates the originalsymbols.CoderChannelDecoder- - - -(Arrays of Bits) (Arrays of Bits)Input(Symbols)Output(Symbols)Figure 3.1: Generalized communication systemTypically the same code is used for a sequence of symbols, one after another. The role of data compressionis to convert the string of bits representing a succession of symbols into a shorter string for more economicaltransmission, storage, or processing. The result is the system in Figure 3.2, with both a compressor and anexpander. Ideally, the expander would exactly reverse the action of the compressor so that the coder anddecoder could be unchanged.On first thought, this approach might seem surprising. Why is there any reason to believe that the sameAuthor: Paul Penfield, Jr.This document: http://www.mtl.mit.edu/Courses/6.050/2006/notes/chapter3.pdfVersion 1.3, February 13, 2006. Copyrightc 2006 Massachusetts Institute of TechnologyStart of notes · back · next | 6.050J/2.110J home page | Site map | Search | About this document | Comments and inquiries293.1 Variable-Length Encoding 30Source EncoderCompressorChannelExpanderSource Decoder- - - - - -Input Output(Symbols) (Symbols)(Arraysof Bits)(Arraysof Bits)(Arraysof Bits)(Arraysof Bits)Figure 3.2: More elaborate communication systeminformation could be contained in a smaller number of bits? We will look at two types of compression, usingdifferent approaches:• Lossless or reversible compression (which can only be done if the original code was inefficient, forexample by having unused bit patterns, or by not taking advantage of the fact that some symbols areused more frequently than others)• Lossy or irreversible compression, in which the original symbol, or its c oded representation, cannotbe reconstructed from the smaller version exactly, but instead the expander produces an approximationthat is “good e nough”Six techniques are described below which are astonishingly effective in compressing data files. The first fiveare reversible, and the last one is irreversible. Each technique has some cases for which it is particularly wellsuited (the best cases) and others for which it is not well suited (the worst cases).3.1 Variable-Length EncodingIn Chapter 2 Morse code was discussed as an example of a source code in which more frequently occurringletters of the English alphabet were represented by shorter codewords, and less frequently occurring lettersby longer codewords. On average, messages as sent using these codewords are shorter than they would beusing all codewords of the same length. Variable-length encoding c an be done either in the source encoder orthe compressor. A general procedure for variable-length encoding will be given in Chapter 5, so a discussionof this technique is put off until that chapter.3.2 Run Length EncodingSupp ose a me ssage consists of long sequences of a small number of symbols or characters. Then themessage could be encoded as a list of the symbol and the number of times it occurs. Thus the message “aB B B B B a a a B B a a a a” could be written as “a 1 B 5 a 3 B 2 a 4”. This technique works very well fora relatively small number of circumstances. One example is the German flag, which could be encoded as somany black pixels, so many red pixels, and so many yellow pixels, with a great saving over specifying everypixel. Another example comes from fax technology, where a document is scanned and long groups of white(or black) pixels are transmitted as merely the number of such pixels (since there are only white and blackpixels, it is not even necessary to specify the color since it can be assumed to be the other color).Run length encoding does not work well for messages without repeated sequences of the same symbol.For example, it may work well for drawings and even black-and-white scanned images, but it does not workwell for photographs since small changes in shading from one pixel to the next would require that manysymbols be defined.3.3 Static Dictionary 31Figure 3.3: Flag of Germany (black band on top, red in middle, yellow on bottom)3.3 Static DictionaryIf a code has unused codewords, these may be assigned, as abbreviations, to frequently occurring sequencesof symbols. Then such sequences could be encoded with no more bits than would b e needed for a singlesymbol. For example, if English text is being encoded in ASCII and the DEL character is regarded asunnecessary, then it might make sense to assign its codeword 127 to the common word “the”. Practicalcodes offer numerous examples of this technique. The list of the codewords and their meanings is called acodebook, or dictionary. The compression technique considered here uses a dictionary which is static in thesense that it does not change from one me ssage to the next.An example will illustrate this technique. Before the the electric telegraph, there were other schemes fortransmitting messages long distances. A mechanical telegraph described in some detail by

View Full Document


School:
Email:
New Password:
Confirm Password:

MIT 6 050J - Compression

Sign up for free to view:

Please select your school