CSEP 590 Data Compression Autumn 2007 Arithmetic Coding Reals in Binary Any real number x in the interval 0 1 can be represented in binary as b1b2 where bi is a bit 0 0 1 0 1 binary representation x 1 CSEP 590 Lecture 3 Autumn 2007 2 First Conversion L 0 R 1 i 1 while x L if x L R 2 then bi 0 R L R 2 if x L R 2 then bi 1 L L R 2 i i 1 end while bj 0 for all j i Invariant x is always in the interval L R CSEP 590 Lecture 3 Autumn 2007 3 Conversion using Scaling Always scale the interval to unit size but x must be changed as part of the scaling 0 0 1 0 1 x 1 CSEP 590 Lecture 3 Autumn 2007 4 Binary Conversion with Scaling y x i 0 while y 0 i i 1 if y 1 2 then bi 0 y 2y if y 1 2 then bi 1 y 2y 1 end while bj 0 for all j i 1 Invariant x b1b2 bi y 2i CSEP 590 Lecture 3 Autumn 2007 5 Proof of the Invariant Initially x 0 y 20 Assume x b1b2 bi y 2i Case 1 y 1 2 bi 1 0 and y 2y b1b2 bi bi 1 y 2i 1 b1b2 bi 0 2y 2i 1 b1b2 bi y 2i x Case 2 y 1 2 bi 1 1 and y 2y 1 b1b2 bi bi 1 y 2i 1 b1b2 bi 1 2y 1 2i 1 b1b2 bi 1 2i 1 2y 2i 1 1 2i 1 b1b2 bi y 2i x CSEP 590 Lecture 3 Autumn 2007 6 Example and Exercise x 1 3 y 1 3 2 3 1 3 2 3 i 1 2 3 4 x 17 27 b 0 1 0 1 y i 17 27 1 CSEP 590 Lecture 3 Autumn 2007 b 1 7 Arithmetic Coding Basic idea in arithmetic coding represent each string x of length n by a unique interval L R in 0 1 The width R L of the interval L R represents the probability of x occurring The interval L R can itself be represented by any number called a tag within the half open interval The k significant bits of the tag t1t2t3 is the code of x That is t1t2t3 tk000 is in the interval L R It turns out that k log2 1 R L CSEP 590 Lecture 3 Autumn 2007 8 Example of Arithmetic Coding 1 1 tag must be in the half open interval 2 tag can be chosen to be L R 2 3 code is the significant bits of the tag 0 1 3 2 3 a bba b bb 1 15 27 100011100 19 27 101101000 tag 17 27 101000010 code 101 CSEP 590 Lecture 3 Autumn 2007 9 Some Tags are Better than Others 0 1 3 a ba 2 3 1 11 27 011010000 bab 15 27 100011100 b Using tag L R 2 tag 13 27 011110110 code 0111 Alternative tag 14 27 100001001 code 1 CSEP 590 Lecture 3 Autumn 2007 10 Example of Codes P a 1 3 P b 2 3 0 a aa aaa aab aba ab abb baa ba bab bba b bb 1 0 27 1 27 3 27 5 27 000000000 000010010 000111000 001011110 9 27 010101010 11 27 011010000 15 27 100011100 19 27 101101000 bbb tag L R 2 code 000001001 000100110 001001100 010000101 010111110 011110111 0 0001 001 01 01011 0111 aaa aab aba abb baa bab 101000010 101 bba 110110100 11 bbb 27 27 111111111 CSEP 590 Lecture 3 Autumn 2007 95 bits symbol 92 entropy lower bound 11 Code Generation from Tag If binary tag is t1t2t3 L R 2 in L R then we want to choose k to form the code t1t2 tk Short code choose k to be as small as possible so that L t1t2 tk000 R Guaranteed code choose k log2 1 R L 1 L t1t2 tkb1b2b3 R for any bits b1b2b3 for fixed length strings provides a good prefix code example 000000000 000010010 tag 000001001 Short code 0 Guaranteed code 000001 CSEP 590 Lecture 3 Autumn 2007 12 Guaranteed Code Example P a 1 3 P b 2 3 short tag L R 2 code 0 a aa aaa aab aba ab abb baa ba bab bba b bb 1 0 27 1 27 000001001 3 27 000100110 5 27 001001100 010000101 9 27 11 27 010111110 011110111 15 27 101000010 19 27 bbb 110110100 Prefix code 0 0001 001 01 01011 0111 0000 0001 001 0100 01011 0111 aaa aab aba abb baa bab 101 101 bba 11 11 bbb 27 27 CSEP 590 Lecture 3 Autumn 2007 13 Arithmetic Coding Algorithm P a1 P a2 P am C ai P a1 P a2 P ai 1 Encode x1x2 xn Initialize L 0 and R 1 for i 1 to n do W R L L L W C xi R L W P xi t L R 2 choose code for the tag CSEP 590 Lecture 3 Autumn 2007 14 Arithmetic Coding Example P a 1 4 P b 1 2 P c 1 4 C a 0 C b 1 4 C c 3 4 abca W R L L L W C x R L W P x symbol W a b c a 1 1 4 1 8 1 32 L 0 0 1 16 5 32 5 32 R 1 1 4 3 16 6 32 21 128 tag 5 32 21 128 2 41 256 001010010 L 001010000 R 001010100 code 00101 prefix code 00101001 CSEP 590 Lecture 3 Autumn 2007 15 Arithmetic Coding Exercise P a 1 4 P b 1 2 P c 1 4 C a 0 C b 1 4 C c 3 4 bbbb symbol W R L L L W C x R L W P x b b b b W L 0 R 1 1 tag L R code prefix code CSEP 590 Lecture 3 Autumn 2007 16 Decoding 1 Assume the length is known to be 3 0001 which converts to the tag 0001000 0 0001000 output a a b 1 CSEP 590 Lecture 3 Autumn 2007 17 Decoding 2 Assume the length is known to be 3 0001 which converts to the tag 0001000 0 aa 0001000 a output a ab b 1 CSEP 590 Lecture 3 Autumn 2007 18 Decoding 3 Assume the length is known to be 3 0001 which converts to the tag 0001000 0 aa 0001000 a aab output b ab b 1 CSEP 590 Lecture 3 Autumn 2007 19 Arithmetic Decoding Algorithm P a1 P a2 P am C ai P a1 P a2 P ai 1 Decode b1b2 bk number of symbols is n Initialize L 0 and R 1 t b1b2 bk000 for i 1 to n do W R L find j such that L W C aj t L W C aj P aj output aj L L W …
View Full Document