# WUSTL ESE 523 - ESE523Lect72013(1) (24 pages)

Previewing pages*1, 2, 23, 24*of 24 page document

**View the full content.**## ESE523Lect72013(1)

Previewing pages
*1, 2, 23, 24*
of
actual document.

**View the full content.**View Full Document

## ESE523Lect72013(1)

0 0 95 views

- Pages:
- 24
- School:
- Washington University in St. Louis
- Course:
- Ese 523 - Information Theory

**Unformatted text preview:**

ESE 523 Information Theory Lecture 7 Joseph A O Sullivan Samuel C Sachs Professor Electrical and Systems Engineering Washington University 2120E Green Hall 211 Urbauer Hall 314 935 4173 jao wustl edu 9 25 2013 J A O Sullivan ESE 523 Lecture 7 1 Graphs and entropy Random walks on undirected graphs Random walks on directed graphs Maximum entropy distributions on graphs 3 9 25 2013 J A O Sullivan ESE 523 Lecture 7 Random Walks on Undirected Graphs Assume positive weights are assigned to each edge Assume that a random walk takes place on the graph with one step at each time Assume that the probability that an edge is used given that the walk is at a node on that edge equals the weight of that edge divided by the sum of the weights of all edges attached to that node Find the entropy rate of the resulting random process Comment this is a time invariant Markov chain Comment the Markov chain is reversible the weight matrix is symmetric so backward probabilities are the same as forward Pij Wij W ik k W W ij i j j i 2W Wik i k 4 9 25 2013 J A O Sullivan ESE 523 Lecture 7 Random Walks on Undirected Graphs i W ik k 2W Wi 2W Wij The stationary distribution is Wi Wij i P i i ij i 2W W 2W easily found i Special case all weights equal Wj i i Pij 2W 1 entropy of edge fraction minus the entropy of the node H X P log P i ij ij i j fraction Adjacency matrix 0 1 A 1 0 1 9 25 2013 1 1 0 1 0 1 1 0 1 0 1 0 1 1 0 1 0 0 1 0 J A O Sullivan ESE 523 Lecture 7 H X i j Wij H X H 2W Wij 1 Eij Wij 2W log Wij Wi Wi H 2W 5 Undirected Graphs Adjacency Matrix Maximum Entropy Walk Compute the adjacency matrix A ones if an edge exists here A is symmetric If the adjacency matrix raised to a power k is strictly positive then the resulting Markov chain is strongly connected irreducible and aperiodic Comment the adjacency matrix computes combinatorics how many sequences walks exist A Aij Aij 1 Wij 0 Otherwise Aij 0 N ij k A k ij 1 log N ij k log max k k max largest eigenvalue of A of length k starting at node i and ending at node j The maximum entropy distribution puts approximately equal probability on all possible sequences 6 9 25 2013 J A O Sullivan ESE 523 Lecture 7 YouTube Model sessions as random walks on an undirected graph For each session create an edge with weight 1 between any two video clips viewed in that session Global Computation Refinement weight those viewed consecutively higher Save data for this user Send data for global computations Refinement save all triples of video clips viewed in sessions Refinement assign higher weights for likes Weights on edges equal total weight of edges created number of sessions in which the two video clips were viewed together Compute probabilities on edges given a certain video clip weight of edge divided by total weights connected to the edge User recommendations list related video clips probabilistically report probabilities 9 25 2013 J A O Sullivan ESE 523 Lecture 7 7 YouTube Extended Ideas Track similarity of your edge graphs to other users Assign personal weights for you based on other users random walk histories Need to allow new videos to come in need for content based indexing Base solely on content not number of views Allows for other video clips to be recommended not by views but by similarity of content YouTube can report basis for recommendations as a feature 8 9 25 2013 J A O Sullivan ESE 523 Lecture 7 Random walks on directed graphs 0 1 0 Consider problem 4 16 Suppose there is at least one 0 and at most A 1 0 1 two zeros between adjacent 1 s 1 0 0 Last two symbols constitute a state N ij k A k associate a state with a node 01 10 10 00 01 00 01 state 1 2 state 2 3 1 state 3 1 Adjacency matrix is not symmetric but still computes combinatorics Maximum entropy distribution achieves entropy rate equal to the logarithm of the largest eigenvalue of the adjacency matrix ij 1 log N ij k log max k k max largest eigenvalue of A max 1 3247 log max 0 4057 9 9 25 2013 J A O Sullivan ESE 523 Lecture 7 Maximum Entropy Distributions on The approach shown here is general Graphs The maximum entropy Markov chain is determined by the adjacency matrix 0 1 0 The log of the largest eigenvalue equals the maximum entropy rate A 1 0 1 The result is a nearly uniform distribution 1 0 0 over all possible sequences max largest eigenvalue of A 1 3247 max 0 5481 0 7265 0 4140 1 0 0 1 1 diag max A diag max 0 5698 0 0 4302 Pmax max 1 0 0 H max X log max 0 4057 10 9 25 2013 J A O Sullivan ESE 523 Lecture 7 Outline Data Compression Definition and properties of source codes Kraft inequality Optimal codes Minimum expected codeword length is within one of entropy Nonsingular extension uniquely decodable prefix or instantaneous Block coding decreases one bit penalty Cost of using q x to design a code when p x is truth is D p q Huffman codes Shannon Fano Elias coding Arithmetic coding 11 9 25 2013 J A O Sullivan ESE 523 Lecture 7 Data Compression Introduction D is the set of all finite length sequences from a D ary alphabet Definition A source code C is a mapping from X to D C x is the codeword for x its length is l x Definition A code is nonsingular if xi xj implies C xi C xj Definition The extension C of a code C is the mapping from X to D defined by concatenation C x1 x2 xn C x1 C x2 C xn Definition A code is uniquely decodable if its extension is nonsingular Definition A code is a prefix code or an instantaneous code if no codeword is a prefix of any other codeword Comment Instantaneous codes are self punctuating 12 9 25 2013 J A O Sullivan ESE 523 Lecture 7 Data Compression Kraft Inequality Common good codes Are instantaneous Have shorter codewords assigned to higher probability outcomes Theorem Kraft Inequality For any instantaneous code over an alphabet of size D the codeword lengths must satisfy the inequality lengths l1 l2 lm m li D 1 i 1 Conversely given a set of codeword lengths that satisfy this inequality there exists an instantaneous code with these codeword lengths 13 9 25 2013 J A O Sullivan ESE 523 Lecture 7 Data Compression Kraft Inequality Extended Theorem Kraft Inequality For any code over an alphabet of size D instantaneous the codeword lengths must satisfy …

View Full Document