Clemson CPSC 863 - Multimedia Systems and Applications


1  CpSc 863: Multimedia Systems and Applications
Data Compression
James Wang

2  An Overview of Compression
Compression is necessary in multimedia because multimedia data requires large amounts of storage space and bandwidth.
Types of compression:
Lossless compression: data is not altered or lost in the process.
Lossy compression: some information is lost, but the original can be reasonably reproduced from the compressed data.

3  Binary Image Compression
RLE (Run Length Encoding), also called Packed Bits encoding.
E.g. aaaaaaaaaaaaaaaaaaaa111110000000 can be coded as:
Byte1  Byte2  Byte3  Byte4  Byte5  Byte6
20     a      05     1      07     0
This is a one-dimensional scheme. Some schemes also use a flag to separate the run counts from the data bytes.
http://astronomy.swin.edu.au/~pbourke/dataformats/rle/
http://datacompression.info/RLE.shtml
http://www.data-compression.info/Algorithms/RLE/

4  Binary Image Compression
Disadvantage of the RLE scheme: when groups of adjacent pixels change rapidly, the runs become short. It can then take more bits to represent the run lengths than the uncompressed data would need, i.e. negative compression.
RLE is a generalization of zero suppression, which assumes that just one symbol appears particularly often in sequences.

5  Lossless Compression Algorithms (Entropy Encoding)
Adapted from: http://www.cs.cf.ac.uk/Dave/Multimedia/node207.html

6  Basics of Information Theory
According to Shannon, the entropy of an information source S is defined as
H(S) = sum over i of p_i * log2(1/p_i)
where p_i is the probability that symbol S_i in S will occur. The term log2(1/p_i) indicates the amount of information contained in S_i, i.e., the number of bits needed to code S_i.
For example, in an image with a uniform distribution of grey-level intensities, i.e. p_i = 1/256, the number of bits needed to code each grey level is 8, and the entropy of the image is 8.

7  The Shannon-Fano Algorithm
A simple example will be used to illustrate the algorithm:
Symbol  A   B   C   D   E
Count   15  7   6   6   5

8  The Shannon-Fano Algorithm
Encoding for the Shannon-Fano algorithm: a top-down approach.
1. Sort symbols according to their frequencies/probabilities, e.g., ABCDE.
2. Recursively divide the symbols into two parts, each with approximately the same total count.
More frequent symbols get fewer bits; less frequent symbols get more bits.

9  The Shannon-Fano Algorithm
Symbol  Count  log2(1/pi)  Code  Subtotal (# of bits = Count x Code Size)
A       15     1.38        00    30
B       7      2.48        01    14
C       6      2.70        10    12
D       6      2.70        110   18
E       5      2.96        111   15
TOTAL (# of bits): 89

10  Huffman Coding
Encoding for the Huffman algorithm: a bottom-up approach.
1. Initialization: put all nodes in an OPEN list and keep it sorted at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.
(b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it into OPEN.
(c) Assign codes 0 and 1 to the two branches of the tree, and delete the children from OPEN.

11  Huffman Coding
[Figure: Huffman tree for the example]

12  Huffman Coding
Symbol  Count  log2(1/pi)  Code  Subtotal (# of bits = Count x Code Size)
A       15     1.38        0     15
B       7      2.48        100   21
C       6      2.70        101   18
D       6      2.70        110   18
E       5      2.96        111   15
TOTAL (# of bits): 87

13  Huffman Coding
Discussions:
Decoding for the above two algorithms is trivial as long as the coding table (the statistics) is sent before the data. (There is a small overhead for sending this, negligible if the data file is big.)
Unique Prefix Property: no code is a prefix of any other code (all symbols are at the leaf nodes), so decoding is unambiguous.
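The bottom-up construction above can be checked with a short program. The following Python sketch is not part of the original slides; the function name huffman_codes and the use of the heapq module are illustrative choices. It rebuilds the codes for the A..E counts above and reproduces the 87-bit total (the exact 0/1 labels on the branches may differ from the slide, but the code lengths match).

    import heapq

    def huffman_codes(counts):
        # OPEN list as a min-heap; each entry is (weight, tie_breaker, {symbol: code_so_far}).
        heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
        heapq.heapify(heap)
        tie = len(heap)
        while len(heap) > 1:
            # (a) pick the two nodes with the lowest weights and make them children of a new parent
            w1, _, left = heapq.heappop(heap)
            w2, _, right = heapq.heappop(heap)
            # (c) assign 0 and 1 to the two branches by prefixing the children's codes
            merged = {s: "0" + c for s, c in left.items()}
            merged.update({s: "1" + c for s, c in right.items()})
            # (b) the parent's weight is the sum of the children's weights; put it back into OPEN
            heapq.heappush(heap, (w1 + w2, tie, merged))
            tie += 1
        return heap[0][2]

    counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
    codes = huffman_codes(counts)
    total_bits = sum(counts[s] * len(codes[s]) for s in counts)
    print(codes)       # one valid assignment: A gets 1 bit; B, C, D, E get 3 bits each
    print(total_bits)  # 87, i.e. 87 / 39 = 2.23 bits per symbol on average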
13  Huffman Coding (continued)
If prior statistics are available and accurate, Huffman coding is very good. In the above example:
entropy = (15 x 1.38 + 7 x 2.48 + 6 x 2.7 + 6 x 2.7 + 5 x 2.96) / 39 = 85.26 / 39 = 2.19 bits/symbol
Number of bits per symbol needed for Huffman coding: 87 / 39 = 2.23

14  Adaptive Huffman Coding
Motivations:
(a) The previous algorithms require statistical knowledge, which is often not available (e.g., live audio or video).
(b) Even when it is available, it can be a heavy overhead, especially when many tables have to be sent because a non-order-0 model is used, i.e. one that takes into account the impact of the previous symbol on the probability of the current symbol (e.g., "qu" often occur together, ...).
The solution is to use adaptive algorithms. As an example, Adaptive Huffman Coding is examined below. The idea is, however, applicable to other adaptive compression algorithms.

15  Adaptive Huffman Coding
ENCODER
Initialize_model();
while ((c = getc(input)) != eof)
{
    encode(c, output);
    update_model(c);
}

16  Adaptive Huffman Coding
DECODER
Initialize_model();
while ((c = decode(input)) != eof)
{
    putc(c, output);
    update_model(c);
}

17  Adaptive Huffman Coding
[Figure: adaptive Huffman tree, nodes arranged in order of increasing weight]

18  Adaptive Huffman Coding
The key is that both the encoder and the decoder use exactly the same initialization and update_model routines.
update_model does two things: (a) increment the count, (b) update the Huffman tree.
During the updates, the Huffman tree maintains its sibling property, i.e. the nodes (internal and leaf) are arranged in order of increasing weight (see figure).
When swapping is necessary, the farthest node with weight W is swapped with the node whose weight has just been increased to W+1. Note: if the node with weight W has a subtree beneath it, the subtree goes with it.
The Huffman tree can look very different after node swapping; e.g., in the third tree, node A is swapped again and becomes the #5 node. It is now encoded using only 2 bits.

19  Adaptive Huffman Coding
[Figure: adaptive Huffman tree after updates, nodes in order of increasing weight]

20  Adaptive Huffman Coding
Note: the code for a particular symbol changes during the adaptive coding process.
[Figure: the node whose weight has increased to W+1 is swapped with the farthest node of weight W, illustrated with nodes A and D]

21  Adaptive Huffman Coding
Note: the code for a particular symbol changes during the adaptive coding process.

22  Lempel-Ziv-Welch Algorithm
Motivation: Suppose we want to encode Webster's English dictionary, which contains about 159,000 entries. Why not just transmit each word as an 18-bit number?
Problems: (a) too many bits, (b) everyone needs the same dictionary, (c) it only works for English text.
Solution: find a way to build the dictionary adaptively.
The original methods are due to Ziv and Lempel in 1977 and 1978. Terry Welch improved the scheme in 1984 (called LZW compression). It is used in, e.g., UNIX compress, GIF, ...
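As a sketch of what "building the dictionary adaptively" means, here is a minimal LZW encoder in Python. It is not taken from the slides; the function name lzw_compress and the 256-entry single-byte starting dictionary follow the common Welch formulation. Real implementations such as UNIX compress and GIF add further details (variable code widths, dictionary resets) that are omitted here.

    def lzw_compress(data):
        # Start with a dictionary of all single-byte strings (codes 0..255).
        dictionary = {bytes([i]): i for i in range(256)}
        next_code = 256
        w = b""
        out = []
        for byte in data:
            wc = w + bytes([byte])
            if wc in dictionary:
                w = wc                       # keep extending the current match
            else:
                out.append(dictionary[w])    # emit the code for the longest known prefix
                dictionary[wc] = next_code   # add the new string to the dictionary
                next_code += 1
                w = bytes([byte])
        if w:
            out.append(dictionary[w])        # flush the final match
        return out

    print(lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT"))

The decoder can rebuild the same dictionary on the fly from the codes it receives, so no dictionary ever needs to be transmitted.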

