6.441 Transmission of Information                                   February 23, 2006

Lecture 5

Lecturer: Madhu Sudan                                     Scribe: Hyun Sung Chang

1 Introduction

1.1 Today's Topic

• Markov chains/processes
• Entropy rate of a Markov chain

1.2 Motivating Example

Example 1: Let us start by considering the following example. [Figure of the example chain omitted in the transcription; the transition matrix of this chain is given at the end of Section 2.] What are the rates of X and Y?

2 Stochastic Process

A stochastic process can be viewed as an infinite sequence of random variables, e.g., X_{-n}, X_{-n+1}, ..., X_0, X_1, X_2, ..., X_n, ..., whose distribution may be expressed by

    Pr[X_1 = x_1, X_2 = x_2, ..., X_n = x_n] = p(x_1, ..., x_n).

There are some meaningful restricted classes of stochastic processes.

Definition 1 (Stationary Process) ⟨X_n⟩_n is a stationary process if

    Pr[X_1 = x_1, ..., X_n = x_n] = Pr[X_{1+l} = x_1, ..., X_{n+l} = x_n]    (time shift by l)

for all n, l, x_1, ..., x_n.

Definition 2 (Markov Process/Markov Chain) ⟨X_n⟩_n is a Markov chain if

    Pr[X_n = x_n | X_1 = x_1, ..., X_{n-1} = x_{n-1}] = Pr[X_n = x_n | X_{n-1} = x_{n-1}]

for all n, x_1, ..., x_n.

If X_i ∈ Ω and Ω is finite, then for each n the conditional distribution Pr[X_n = · | X_{n-1} = ·] is just a table of |Ω|^2 entries. But since these tables may differ for every n, can we describe the whole process in finite terms? In general, no. This motivates the following restriction.

Definition 3 (Time-Invariant Markov Chain) A Markov chain is time-invariant if

    Pr[X_n = a | X_{n-1} = b] = Pr[X_{n+l} = a | X_{n+l-1} = b]    for all n, l and a, b ∈ Ω.

A time-invariant Markov chain can be specified by a distribution on X_0 and a probability transition matrix P = [P_ij], where P_ij = Pr[X_2 = j | X_1 = i]. Throughout the rest of the lecture, a time-invariant Markov chain will be referred to simply as a Markov chain (MC).

Example 2: Consider the three-state MC on states {A, B, C} with

        [ 0  1  0 ]
    P = [ 0  0  1 ]
        [ 1  0  0 ].

With X_0 = A, the resulting sequence will be "ABCABCABC...". Note that this process is not stationary, because Pr[X_0 = A, X_1 = B, X_2 = C] = 1 but Pr[X_1 = A, X_2 = B, X_3 = C] = 0.
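Since a time-invariant MC is fully described by an initial state and the matrix P, it is straightforward to simulate. The following sketch (the function name and structure are my own illustration, not from the lecture) reproduces Example 2's deterministic cycle:

```python
import random

def simulate_chain(P, states, x0, n, seed=0):
    """Sample n steps of a time-invariant Markov chain.

    P[i][j] = Pr[X_{t+1} = states[j] | X_t = states[i]].
    """
    rng = random.Random(seed)
    idx = states.index(x0)
    path = [states[idx]]
    for _ in range(n):
        # Draw the next state according to row `idx` of P.
        idx = rng.choices(range(len(states)), weights=P[idx])[0]
        path.append(states[idx])
    return "".join(path)

# Example 2: the deterministic cycle A -> B -> C -> A.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]
print(simulate_chain(P, ["A", "B", "C"], "A", 8))  # ABCABCABC
```

Because every row of this P puts all its mass on one state, the sample path is deterministic regardless of the random seed.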
Instead, Pr[X_1 = B, X_2 = C, X_3 = A] = 1.

Fact 1 For every MC, there exists a stationary distribution µ on X_0 such that µ and P define a stationary process.

In Example 2, µ = [1/3  1/3  1/3]. Because

    Pr[X_1 = x_1, X_2 = x_2, ..., X_n = x_n]
      = Pr[X_1 = x_1] · Pr[X_2 = x_2 | X_1 = x_1] ··· Pr[X_n = x_n | X_{n-1} = x_{n-1}]
      = Pr[X_1 = x_1] · P_{x_1 x_2} ··· P_{x_{n-1} x_n},

the overall distribution depends only on the distribution of X_1, which implies that the distribution µ on X_0 makes the process stationary if Pr[X_1 = i] = µ_i (= Pr[X_0 = i]).

Example 3: Let us consider the following example. [Figure of the chain omitted in the transcription.] In this case, µ_A = µ_C = 0, µ_B = 1 is stationary, but µ_A = µ_B = 0, µ_C = 1 is also stationary. More than one stationary distribution can be problematic, and this situation happens because the MC is reducible.

Definition 4 (Reducibility of Markov Chain)
1. A Markov chain given by probability transition matrix P is reducible if P can be written (after relabeling the states) as

        [ P_0  P_1 ]
        [  0   P_2 ],

   where P_0, P_2 are square matrices.
2. An MC is irreducible if it is not reducible.

In terms of graph structure, the "irreducible" and "aperiodic" characteristics can be interpreted as:
• irreducible - strongly connected: there exists a path from each state i to each state j.
• aperiodic - the greatest common divisor of the cycle lengths is 1.

Theorem 2 (Perron-Frobenius Theorem) Every (aperiodic) irreducible Markov chain has a unique stationary distribution.

For a stationary distribution, the probability distribution of X_1 should be the same as µ, the probability distribution of X_0:

    Pr[X_1 = j] = Σ_{i=1}^{N} µ_i P_ij = µ_j,

where N = |Ω| and Ω = {1, 2, ..., N}. In vector-matrix notation,

    µ P = µ,                                                                  (1)

so µ corresponds to a (left) eigenvector of P with eigenvalue 1. For Example 1,

        [ 0.9  0.1   0  ]
    P = [  0   2/3  1/3 ]
        [ 2/3  1/3   0  ].

Theorem 2 implies that there exists a unique such eigenvector with all entries non-negative. We can compute µ = [µ_1  µ_2  µ_3] using (1) together with µ_1 + µ_2 + µ_3 = 1.
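Equation (1) says µ is a left eigenvector of P with eigenvalue 1. One simple way to find it numerically is power iteration, i.e., repeatedly applying µ ← µP, which converges for an (aperiodic) irreducible chain. A minimal sketch, my own illustration using the Example 1 matrix:

```python
def stationary_distribution(P, iters=2000):
    """Approximate the stationary distribution by power iteration:
    repeatedly apply mu <- mu P, starting from the uniform distribution."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]
    return mu

# Transition matrix of Example 1 from the lecture.
P = [[0.9, 0.1, 0.0],
     [0.0, 2/3, 1/3],
     [2/3, 1/3, 0.0]]

mu = stationary_distribution(P)
print(mu)  # converges to [0.625, 0.28125, 0.09375] = [20/32, 9/32, 3/32]
```

The chain is irreducible (1 → 2 → 3 → 1) and aperiodic (state 1 has a self-loop), so by Theorem 2 the iteration has a unique fixed point.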
⇒ µ = [20/32  9/32  3/32].

3 Entropy Rate of a Stochastic Process

There are two reasonable notions for measuring the uncertainty of X = ⟨X_n⟩_n.

• Entropy rate:
      H(X) = lim_{n→∞} (1/n) H(X_1, ..., X_n),   if the limit exists.
• Entropy′ rate:
      H′(X) = lim_{n→∞} H(X_n | X_1, ..., X_{n-1}),   if the limit exists.

Theorem 3 The entropy rate of a stationary stochastic process exists and equals the entropy′ rate:

    H(X) = H′(X).

Proof Idea The following inequality can be used to prove the existence of H′(X):

    H(X_n | X_1, ..., X_{n-1}) ≤ H(X_n | X_2, ..., X_{n-1}) = H(X_{n-1} | X_1, ..., X_{n-2}),

where the inequality holds because conditioning reduces entropy and the equality holds by stationarity. Thus H(X_n | X_1, ..., X_{n-1}) is non-increasing in n and non-negative, so its limit exists. For the complete proof, refer to pp. 64-65 of Cover.

Theorem 4 If an irreducible MC has probability transition matrix P and stationary distribution µ, then

    H(X) = H′(X) = − Σ_{i,j} µ_i P_ij log P_ij.                              (2)

Proof

    H′(X) = lim_{n→∞} H(X_n | X_1, ..., X_{n-1})
          = lim_{n→∞} H(X_n | X_{n-1})
          = H(X_2 | X_1)
          = Σ_i Pr[X_1 = i] · H(X_2 | X_1 = i)
          = − Σ_i µ_i Σ_j P_ij log P_ij.

Using (2), H(X) of Example 1 can be computed:

    H(X) = (5/8) H(0.9) + (3/8) H(2/3),

where H(p) denotes the binary entropy function.

AEP for Markov Chains:

    −(1/n) log p(X_1, ..., X_n) → H(X).

This doesn't follow from our law of large numbers, because the random variables may be dependent on each other.

Hidden Markov Model: Now, let us consider the rate of ⟨Y_n⟩_n in Example 1. H′(Y) = lim_{n→∞} H(Y_n | Y_1, ..., Y_{n-1}), and it is bounded by

    H(Y_n | Y_1, ..., Y_{n-1}, X_1) ≤ H′(Y) ≤ H(Y_n | Y_1, ..., Y_{n-1})    for all n.

(Try to prove the inequality on the left-hand side!) If we denote the gap between the upper and the lower bounds by ε_n,

    ε_n = H(Y_n | Y_1, ..., Y_{n-1}) − H(Y_n | Y_1, ..., Y_{n-1}, X_1) = I(X_1; Y_n | Y_1, ..., Y_{n-1}),

and, by the chain rule for mutual information,

    Σ_{n=1}^{M} ε_n = Σ_{n=1}^{M} I(X_1; Y_n | Y_1, ..., Y_{n-1}) = I(X_1; Y_1, ..., Y_M) ≤ H(X_1).

Since the ε_n are non-negative and their partial sums are bounded by the finite quantity H(X_1), we get ε_n → 0, so the upper and lower bounds converge to the same limit H′(Y).
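Formula (2) is easy to evaluate directly. A small sketch (the helper names are mine, not from the lecture) that checks the stated value (5/8)·H(0.9) + (3/8)·H(2/3) for Example 1:

```python
import math

def entropy_rate(P, mu):
    """Entropy rate of a stationary MC in bits: H = -sum_{i,j} mu_i P_ij log2 P_ij,
    with the convention 0 log 0 = 0."""
    return -sum(mu[i] * p * math.log2(p)
                for i, row in enumerate(P)
                for p in row if p > 0)

def Hb(p):
    """Binary entropy function H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Example 1: transition matrix and its stationary distribution.
P = [[0.9, 0.1, 0.0],
     [0.0, 2/3, 1/3],
     [2/3, 1/3, 0.0]]
mu = [20/32, 9/32, 3/32]

print(entropy_rate(P, mu))              # about 0.6375 bits/symbol
print(5/8 * Hb(0.9) + 3/8 * Hb(2/3))    # same value, matching the lecture
```

Rows 2 and 3 of P both have the distribution {2/3, 1/3}, which is why their contributions combine into the single term (9/32 + 3/32)·H(2/3) = (3/8)·H(2/3).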
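The AEP statement above can be checked empirically: sample a long path of the Example 1 chain started from its stationary distribution and compare −(1/n) log2 p(X_1, ..., X_n) with H(X). This Monte Carlo check is my own illustration, not part of the notes:

```python
import math
import random

# Markov chain of Example 1 and its stationary distribution.
P = [[0.9, 0.1, 0.0],
     [0.0, 2/3, 1/3],
     [2/3, 1/3, 0.0]]
mu = [20/32, 9/32, 3/32]

# Entropy rate via formula (2).
H = -sum(mu[i] * p * math.log2(p) for i in range(3) for p in P[i] if p > 0)

rng = random.Random(0)
n = 200_000
state = rng.choices(range(3), weights=mu)[0]   # start in stationarity
log_p = math.log2(mu[state])
for _ in range(n - 1):
    nxt = rng.choices(range(3), weights=P[state])[0]
    log_p += math.log2(P[state][nxt])          # accumulate log-probability of the sampled path
    state = nxt

print(-log_p / n, "vs", H)   # the two numbers should be close for large n
```

Note that the sample-path average here involves dependent terms, which is exactly why the AEP for Markov chains needs more than the i.i.d. law of large numbers (an ergodic-theorem argument is used instead).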