Statistical Machine learningLecture 22Yuan (Alan) QiOutline• Review of Inference on chains, factor graphs• Sum-product algorithm, loopy belief propagation • Max-sum algorithmInference on a ChainInference on a ChainTreesUndirected TreeDirected Tree PolytreeFactor GraphsFactor Graphs from Directed GraphsFactor Graphs from Undirected GraphsThe Sum-Product Algorithm (1)Objective:i. to obtain an efficient, exact inference algorithm for finding marginals;ii. in situations where several marginals are required, to allow computations to be shared efficiently.Key idea: Distributive LawThe Sum-Product Algorithm (2)The Sum-Product Algorithm (3)Why this is true?The Sum-Product Algorithm (4)The Sum-Product Algorithm (5)The Sum-Product Algorithm (6)The Sum-Product Algorithm (7)InitializationThe Sum-Product Algorithm (8)To compute local marginals:• Pick an arbitrary node as root• Compute and propagate messages from the leaf nodes to the root, storing received messages at every node.• Compute and propagate messages from the root to the leaf nodes, storing received messages at every node.• Compute the product of received messages at each node for which the marginal is required, and normalize if necessary.Sum-Product: Example (1)Sum-Product: Example (2)Sum-Product: Example (3)Sum-Product: Example (4)The Junction Tree Algorithm• Exact inference on general graphs.• Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm.• Intractable on graphs with large cliques.Loopy Belief Propagation• Sum-Product on general graphs.• Initial unit messages passed across all links, after which messages are passed around until convergence (not guaranteed!).• Approximate but tractable for large graphs.• Sometime works well, sometimes not at all.The Max-Sum Algorithm (1)Objective: an efficient algorithm for finding i. the value xmaxthat maximises p( x);ii. the value of p(xmax).In general, maximum marginals joint maximum.The Max-Sum Algorithm (2)Maximizing over a chain (max-product)The Max-Sum Algorithm (3)Generalizes to tree-structured factor graphmaximizing as close to the leaf nodes as possibleThe Max-Sum Algorithm (4)Max-Product Max-SumFor numerical reasons, useAgain, use distributive lawThe Max-Sum Algorithm (5)Initialization (leaf nodes)RecursionThe Max-Sum Algorithm (6)Termination:The Max-Sum Algorithm (7)BacktrackingThe previous updates give us the maximum of the model probability. But how about finding the global configuration that gives the maximal probability?Global consistency?BacktrackingThe previous updates give us the maximum of the model probability. But how about ?We need to store a quantity to tell us how to trace back to the value that maximize the previous sub-problem.This is know as back-tracking.Related algorithmsDynamic programmingThe Viterbi algorithm for Hidden Markov
View Full Document