LECTURE 5 Last time: Stochastic processes • Markov chains • Entropy rate • Random walks on graphs • Hidden Markov models • Lecture outline Codes • Kraft inequality • optimal codes. • Reading: Scts. 5.1-5.4.Codes for random variables Notation: the concatenation of two strings x and y is denoted by xy. The set of all strings over a finite alphabet D is denoted by D∗. W.l.o.g. assume D = 0, 1, . . . , D − 1 where D = |D ∗ |. Definition: a source code for a random variable X is a map C : X �→ D∗ x C(x)→ where C(x) is the codeword associated with x l(x) is the length of C(x) The length of a code C is L(C) = EX[l(X)]Codes for random variables C is nonsingular if every element of X maps onto a different element of D∗ The extension of a code C : X �→ D∗ is the code C∗ : X∗ �→ D∗ xn C∗(xn) = C(x1)C(x2) . . . C(xn)→ A code is uniquely decodable if its extension is nonsingular A code is instantaneous (or prefix code) iff no codeword of C is a prefix of any other codeword C Visually: construct a tree whose leaves are codewordsKraft inequality Any instantaneous code C with code lengths l1, l2, . . . , lm must satisfy m� D−li ≤ 1 i=1 Conversely, given lengths l1, l2, . . . , lm that satisfy the above inequality, there exists an instantaneous code with these codeword lengths Proof: construct a D-ary tree T (code-words are leaves) Extend tree T to D-ary tree T � with depth lMAX, total number of leaves is DlMAXKraft inequality Each leaf of T � is a descendant of at most one leaf of T Leaf in T corresponding to codeword C(i) has exactly DlMAX −li descendants in T � (1 if li = lMAX) Summing over all leaves of T gives m� DlMAX −li ≤ DlMAX i=1 m� D−li ⇒ i=1 ≤ 1Kraft inequality Given lengths l1, l2, . . . , lm satisfying Kraft’s inequality, we can construct a tree by as-signing C(i) to first available node at depth C(i)Extended Kraft inequality Kraft inequality holds for all countably in-finite set of codewords Let n(y1y2 . . . yli) be the real �lji =1 yjD−j associated with the ith codeword Why are the n(y1y2 . . . yli)s for different code-words different? By the same reasoning, all intervals � 1 � n(y1y2 . . . yli), n(y1y2 . . . yli) + Dli are disjoint since these intervals are all in (0, 1), the sum of their lengths is ≤ 1 For converse, reorder indices in increasing order and assign intervals as we walk along the unit intervalOptimal codes Optimal code is defined as code with small-est possible C(L) with respect to PX Optimization: minimize �x∈X PX(x)l(x) subject to �x∈X D−l(x) ≤ 1 and l(x)s are integersOptimal codes Let us relax the second constraint and re-place the first with equality to obtain a lower bound J = �x∈X PX(x)l(x) + λ ��x∈X D−l(x) − 1� ∂J use Lagrange multipliers and set ∂l(i)= 0 PX(i) − λ log(D)D−l(i)= 0 equivalently D−l(i)= PX (i) λ log(D) using Kraft inequality (now relaxed to equal-ity) yields 1 = � D−l(x)= � PX(i) i∈X i∈X λ log(D) so λ = 1 , yielding l(i) = − logD(PX(i))log(D)Optimal codes Thus a bound on the optimal code length is PX(i) logD(PX(i)) = HD(X)− � i∈X This is lower bound, equality holds iff PX is D-adic, PX(i) = D−l(i) for integer l(i)Optimal codes The optimal codelength L∗ satisfies HD(X) ≤ L∗≤ HD(X) + 1 Upper bound: take l(i) = �logD(PX(i))� � D�− logD(PX (i))�≤ � PX(i) = 1 i∈X thus these lengths satisfy Kraft’s inequality and we can create a prefix-free code with these lengths L∗≤ � PX(i)�− logD(PX(i))� i∈X ≤ � PX(i)(− logD(PX(i)) + 1) i∈X = HD(X) + 1 We call these types of codes Shannon codesOptimal codes Is this as tight as it gets? Consider coding several symbols together C : X n �→ D∗ expected codeword length is �xn∈X n PXn(xn)l(xn) optimum satisfies HD(Xn) ≤ L∗≤ HD(Xn) + 1 per symbol codeword length is HD(Xn) L∗ HD(Xn)+ 1 n ≤ n ≤ n nMIT OpenCourseWarehttp://ocw.mit.edu 6.441 Information Theory Spring 2010 For information about citing these materials or our Terms of Use, visit:
View Full Document