LECTURE 4 Last time: Types of convergence • Weak Law of Large Numbers • Strong Law of Large Numbers • Asymptotic Equipartition Property • Lecture outline Stochastic processes • Markov chains • Entropy rate • Random walks on graphs • Hidden Markov models • Reading: Chapter 4.Stochastic processes A stochastic process is an indexed sequence or r.v.s X0, X1, . . . characterized by the joint PMF PX0,X1,...,Xn (x0, x1, . . . , xn), (x0, x1, . . . , xn) ∈ X n for n = 0, 1, . . .. A stochastic process is stationary if PX0,X1,...,Xn (x0, x1, . . . , xn) = PXl,Xl+1,...,Xl+n (x0, x1, . . . , xn) for every shift l and all (x0, x1, . . . , xn) ∈ X n .Stochastic processes A discrete stochastic process is a Markov chain if PXn|X0,...,Xn−1 �xn|x0, . . . , xn−1� = PXn|Xn−1 �xn|xn−1� for n = 1, 2, . . . and all (x0, x1, . . . , xn) ∈ X n . We deal with time invariant Markov chains Xn: state after n transitions – belongs to a finite set, e.g., {1, . . . , m} – X0 is either given or random (given current state, the past does not mat-ter) pi,j = P(Xn+1 = j | Xn = i) = P(Xn+1 = j | Xn = i, Xn−1, . . . , X0) Markov chain is characterized by probability transition matrix P = [pi,j]Review of Markov chains State occupancy probabilities, given initial state i: ri,j(n) = P(Xn = j X0= i)| Key recursion: mri,j(n) = � ri,k(n − 1)pk,j k=1 With random initial state: mP(Xn = j) = � P(X0= i)ri,j(n) i=1 Does rij converge to something? Does the limit depend on initial state?Review of Markov chains Recurrent and transient states. State i is recurrent if: starting from i, and from wherever you can go, there is a way of returning to i. If not recurrent, called transient. Recurrent class collection of re-current states that “communicate” to each other and to no other state. A recurrent state is periodic if: there is an integer d > 1 such that ri,i(k) = 0 when k is not an integer multiple of d Assume a single class of recurrent states, aperiodic. Then, lim ri,j(n) = πjn→∞ where πj does not depend on the initial conditions lim P(Xn = j X0) = πjn→∞ |π1, . . . , πm can be found as the unique• solution of the balance equations πj = � πkpk,j k together with � πj = 1 jEntropy rate The entropy rate of a stochastic process is 1 lim H(Xn) nn→∞ if it exists For a stationary stochastic process, the en-tropy rate exists and is equal to lim H(XnXn−1)n→∞ |since conditioning decreases entropy and by stationarity, it holds that H(Xn+1|Xn) ≤ H(Xn+1|X2n) = H(Xn|Xn−1) so it reaches a limit (decreasing non-negative sequence) Chain rule 1 1 nnH(Xn) = n � H(Xi|Xi−1) i=1 since the elements in the sum on the RHS reach a limit, that is the limit of the LHSEntropy rate Markov chain entropy rate: lim H(XnXn−1)n→∞ |= lim H(XnXn−1)n→∞ |= H(X2X1) = − � p|i,jπi log(pi,j) i,jRandom walk on graph Consider undirected graph G = (N , E, W) where N , E, W are the nodes, edges and weights. With each edge there is an as-sociated edge weight Wi,j =Wi,j Wj,i Wi = � Wi,j j W = � Wi,j i,j:j>i 2W = � Wi iRandom walk on graph We call a random walk the Markov chain in which the states are the nodes of the graph Wi,jpi,j = Wi Wi πi = 2W Check: �i πi = 1 and � i πipi,j = � i Wi 2W Wi,j Wi = � Wi,j i 2W = Wj 2W = πjRandom walk on graph H(X2X1) = − � π|i � pi,j log(pi,j) i j = − � Wi � Wi,j log �Wi,j � i 2WjWi Wi = − � Wi,j log �Wi,j � 2W Wi i,j = − � Wi,j log �Wi,j � 2W 2W i,j + � Wi,j log � Wi � 2W 2Wi,j = Wi,j log �Wi,j � + � Wi log � Wi � − � 2W 2W 2W 2Wi,j i Entropy rate is difference of two entropies Note: time reversibility for Markov chain that can be represented as random walk on undirected weighted graphHidden Markov models Consider an ALOHA wireless model M users sharing the same radio channel to transmit packets to a base station During each time slot, a packet arrives to a user’s queue with probability p, indepen-dently of the other M − 1 users Also, at the beginning of each time slot, if a user has at least one packet in its queue, it will transmit a packet with probability q, independently of all other users If two packets collide at the receiver, they are not successfully transmitted and remain in their respective queues�Hidden Markov models Let Xi = (N[1]i, N[2]i, . . . , N[M] ) denote the random vector at time i where N[m]i is the number of packets that are in user m’s queue at time i. Xi is a Markov chain. Consider the random vector Yi = (Z[1], Z[2], . . . , Z[M]) where Z[m]i = 1 if user m transmits during time slot i and Z[i] = 0 otherwise Is Yi Markov?Hidden Markov processes Let Xi, X2, . . . be a stationary Markov chain and let Yi = φ(Xi) be a process, each term of which is a function of the corresponding state in the Markov chain Y1, Y2, . . . form a hidden Markov chain, which is not always a Markov chain, but is still stationary What is its entropy rate?� Hidden Markov processes We suspect that the effect of initial infor-mation should decay H(Yn|Yn−1)−H(Yn|Yn−1, X1) = I(X1; Yn|Yn−1) should go to 0 Indeed, nlim I (X1; Yn) = lim � I �X1; YiYi−1� n→∞ n→∞ i=1 |= ∞I �X1; Yi |Yi−1� i=1 since we have an infinite sum in which the terms are non-negative and which is upper bounded by H(X1), the terms must tend to 0MIT OpenCourseWarehttp://ocw.mit.edu 6.441 Information Theory Spring 2010 For information about citing these materials or our Terms of Use, visit:
View Full Document