Graphical models, message-passing algorithms, and variational methods: Part II

Martin Wainwright
Department of Statistics, and
Department of Electrical Engineering and Computer Science,
UC Berkeley, Berkeley, CA USA
Email: wainwrig@{stat,eecs}.berkeley.edu

For further information (tutorial slides, films of course lectures), see:
www.eecs.berkeley.edu/~wainwrig/


Introduction

• graphical models are used and studied in various applied statistical and computational fields:
  – machine learning and artificial intelligence
  – computational biology
  – statistical signal/image processing
  – communication and information theory
  – statistical physics
  – .....
• based on correspondences between graph theory and probability theory
• important but difficult problems:
  – computing likelihoods, marginal distributions, modes
  – estimating model parameters and structure from (noisy) data


Outline

1. Recap
   (a) Background on graphical models
   (b) Some applications and challenging problems
   (c) Illustrations of some message-passing algorithms
2. Exponential families and variational methods
   (a) What is a variational method (and why should I care)?
   (b) Graphical models as exponential families
   (c) Variational representations from conjugate duality
3. Exact techniques as variational methods
   (a) Gaussian inference on arbitrary graphs
   (b) Belief-propagation/sum-product on trees (e.g., Kalman filter; α-β algorithm)
   (c) Max-product on trees (e.g., Viterbi)
4. Approximate techniques as variational methods
   (a) Mean field and variants
   (b) Belief propagation and extensions on graphs with cycles
   (c) Convex methods and upper bounds
   (d) Tree-reweighted max-product and linear programs


Undirected graphical models

Based on correspondences between graphs and random variables.

• given an undirected graph G = (V, E), each node s has an associated random variable X_s
• for each subset A ⊆ V, define X_A := {X_s, s ∈ A}

[Figure: a 7-node graph with maximal cliques (123), (345), (456), (47); a second graph in which a vertex cutset S separates subsets A and B]

• a clique C ⊆ V is a subset of vertices all joined by edges
• a vertex cutset is a subset S ⊂ V whose removal breaks the graph into two or more pieces


Factorization and Markov properties

The graph G can be used to impose constraints on the random vector X = X_V (or on the distribution p) in different ways.

Markov property: X is Markov w.r.t. G if X_A and X_B are conditionally independent given X_S whenever S separates A and B.

Factorization: The distribution p factorizes according to G if it can be expressed as a product of compatibility functions over the set of cliques 𝒞:

    p(x) = (1/Z) ∏_{C ∈ 𝒞} exp{θ_C(x_C)}

Theorem (Hammersley–Clifford): For strictly positive p(·), the Markov property and the Factorization property are equivalent.


Challenging computational problems

Frequently, it is of interest to compute various quantities associated with an undirected graphical model:

(a) the log normalization constant log Z
(b) local marginal distributions or other local statistics
(c) modes or most probable configurations

Relevant dimensions often grow rapidly in graph size ⇒ major computational challenges.

Example: Consider a naive approach to computing the normalization constant for n binary random variables:

    Z = Σ_{x ∈ {0,1}^n} ∏_{C ∈ 𝒞} exp{θ_C(x_C)}

Complexity scales exponentially as 2^n. (See the sketch below.)
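As a concrete illustration of this blow-up, here is a minimal Python sketch (not from the original slides; the encoding of the model by per-node and per-edge parameters is an illustrative assumption) that computes Z by exhaustive enumeration for a pairwise binary model:

```python
import itertools
import math

def brute_force_log_Z(theta_node, theta_edge, n):
    """Sum exp{sum_s theta_s*x_s + sum_(s,t) theta_st*x_s*x_t} over all
    2^n binary configurations -- feasible only for very small n."""
    Z = 0.0
    for x in itertools.product([0, 1], repeat=n):
        score = sum(theta_node[s] * x[s] for s in range(n))
        score += sum(th * x[s] * x[t] for (s, t), th in theta_edge.items())
        Z += math.exp(score)
    return math.log(Z)

# 3-node chain with edges (0,1) and (1,2): only 2^3 = 8 terms here
print(brute_force_log_Z([0.1, -0.2, 0.3], {(0, 1): 0.5, (1, 2): -0.5}, n=3))
```

The loop visits all 2^n configurations, so even n = 50 is hopeless; the variational methods discussed below replace this exhaustive sum with an optimization problem.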
Gibbs sampling in the Ising model

• binary variables on a graph G = (V, E) with pairwise interactions:

    p(x; θ) ∝ exp{ Σ_{s ∈ V} θ_s x_s + Σ_{(s,t) ∈ E} θ_st x_s x_t }

• update x_s^(m+1) stochastically based on the values x_{N(s)}^(m) at the neighbors of s:
  1. Choose s ∈ V at random.
  2. Sample u ∼ U(0, 1) and update

    x_s^(m+1) = 1 if u ≤ {1 + exp[−(θ_s + Σ_{t ∈ N(s)} θ_st x_t^(m))]}^(−1), and 0 otherwise.

• the sequence {x^(m)} converges (in a stochastic sense) to a sample from p(x; θ)
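A minimal sketch of this sampler in Python (the containers `theta_node`, `theta_edge`, and `neighbors` are illustrative choices, not from the slides):

```python
import math
import random

def gibbs_step(x, theta_node, theta_edge, neighbors):
    """One update of the Ising Gibbs sampler: pick a node at random, then
    set it to 1 with its conditional probability given its neighbors."""
    s = random.randrange(len(x))                         # step 1: random node
    field = theta_node[s] + sum(
        theta_edge[min(s, t), max(s, t)] * x[t] for t in neighbors[s])
    p_one = 1.0 / (1.0 + math.exp(-field))               # P(x_s = 1 | x_N(s))
    x[s] = 1 if random.random() <= p_one else 0          # step 2: threshold u
    return x

# 3-node chain 0 -- 1 -- 2; after burn-in, x^(m) is roughly a draw from p(x; theta)
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x = [random.randint(0, 1) for _ in range(3)]
for m in range(10000):
    x = gibbs_step(x, [0.1, -0.2, 0.3], {(0, 1): 0.5, (1, 2): -0.5}, neighbors)
```

Long-run averages of x_s along the chain estimate the marginals P(X_s = 1), the same quantities that the deterministic methods below approximate directly.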
Mean field updates in the Ising model

• binary variables on a graph G = (V, E) with pairwise interactions:

    p(x; θ) ∝ exp{ Σ_{s ∈ V} θ_s x_s + Σ_{(s,t) ∈ E} θ_st x_s x_t }

• simple (deterministic) message-passing algorithm involving variational parameters ν_s ∈ (0, 1) at each node:

[Figure: node 1 with neighbors 2, 3, 4, 5, which pass their parameters ν_2, ν_3, ν_4, ν_5 to node 1]

  1. Choose s ∈ V at random.
  2. Update ν_s based on the neighbors {ν_t, t ∈ N(s)}:

    ν_s ←− {1 + exp[−(θ_s + Σ_{t ∈ N(s)} θ_st ν_t)]}^(−1)

Questions:
• principled derivation?
• convergence and accuracy?
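Note that the update has the same functional form as the Gibbs conditional, with the random neighbor values replaced by their variational means. A minimal sketch, reusing the hypothetical parameter containers from the Gibbs example:

```python
import math

def mean_field_sweep(nu, theta_node, theta_edge, neighbors):
    """One pass of coordinate updates on the variational parameters nu_s.
    At a fixed point, nu_s approximates the marginal P(X_s = 1)."""
    for s in range(len(nu)):
        field = theta_node[s] + sum(
            theta_edge[min(s, t), max(s, t)] * nu[t] for t in neighbors[s])
        nu[s] = 1.0 / (1.0 + math.exp(-field))           # deterministic update
    return nu

nu = [0.5, 0.5, 0.5]                                     # uniform initialization
for _ in range(100):                                     # iterate toward a fixed point
    nu = mean_field_sweep(nu, [0.1, -0.2, 0.3],
                          {(0, 1): 0.5, (1, 2): -0.5}, {0: [1], 1: [0, 2], 2: [1]})
```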
Sum and max-product algorithms: On trees

Exact for trees, but approximate for graphs with cycles.

[Figure: a tree rooted at node s; node t collects the messages M_ut, M_vt, M_wt from the subtrees T_u, T_v, T_w and passes M_ts up to s]

  M_ts ≡ message from node t to s
  Γ(t) ≡ neighbors of node t

Sum-product: for marginals (generalizes the α-β algorithm; Kalman filter)
Max-product: for MAP configurations (generalizes the Viterbi algorithm)

Update:

    M_ts(x_s) ← Σ_{x′_t ∈ X_t} { exp[θ_st(x_s, x′_t) + θ_t(x′_t)] ∏_{v ∈ Γ(t)\s} M_vt(x′_t) }

Marginals:

    p(x_s; θ) ∝ exp{θ_s(x_s)} ∏_{t ∈ Γ(s)} M_ts(x_s)

(A runnable sketch of these message updates appears after the next slide.)


Sum and max-product: On graphs with cycles

• what about applying the same updates on a graph with cycles?
• the updates need not converge (effect of cycles)
• seems naive, but remarkably successful in many applications

Questions:
• meaning of these updates for graphs with cycles?
• convergence? accuracy of the resulting “marginals”?
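As referenced above, here is a minimal sum-product sketch for a binary pairwise (Ising) model on a tree; the function name and the theta/J containers are illustrative assumptions. On a tree, the messages reach a fixed point after at most diameter-many rounds, and the resulting beliefs are the exact node marginals:

```python
import math

def sum_product_ising(nodes, edges, theta, J, n_rounds=None):
    """Sum-product on a tree for p(x) ∝ exp{Σ theta_s x_s + Σ J_st x_s x_t},
    with x_s ∈ {0, 1}.  Returns the exact node marginals."""
    nbrs = {s: [] for s in nodes}
    for (s, t) in edges:
        nbrs[s].append(t); nbrs[t].append(s)
    # one message M[(t, s)][x_s] per directed edge, initialized uniformly
    M = {(s, t): [1.0, 1.0] for (s, t) in edges}
    M.update({(t, s): [1.0, 1.0] for (s, t) in edges})
    for _ in range(n_rounds or len(nodes)):       # enough rounds for any tree
        for (t, s) in list(M):                    # update message from t to s
            key = (min(s, t), max(s, t))
            new = [sum(math.exp(J[key] * x_s * x_t + theta[t] * x_t)
                       * math.prod(M[v, t][x_t] for v in nbrs[t] if v != s)
                       for x_t in (0, 1))
                   for x_s in (0, 1)]
            tot = new[0] + new[1]
            M[t, s] = [m / tot for m in new]      # normalize for stability
    marginals = {}
    for s in nodes:
        b = [math.exp(theta[s] * x) * math.prod(M[t, s][x] for t in nbrs[s])
             for x in (0, 1)]
        marginals[s] = [b[0] / (b[0] + b[1]), b[1] / (b[0] + b[1])]
    return marginals

# 3-node chain 0 -- 1 -- 2; compare against the brute-force enumeration above
print(sum_product_ising([0, 1, 2], [(0, 1), (1, 2)],
                        theta={0: 0.1, 1: -0.2, 2: 0.3},
                        J={(0, 1): 0.5, (1, 2): -0.5}))
```

Replacing the inner sum over x_t with a max yields the max-product update for MAP configurations; running the very same loop on a graph with cycles gives loopy belief propagation, with the convergence and accuracy caveats noted above.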
Variational methods

• “variational”: umbrella term for optimization-based formulations of problems, and for methods to solve them
• historical roots in the calculus of variations
• modern variational methods encompass a wider class of methods (e.g., dynamic programming; finite-element methods)

Variational principle: representation of a quantity of interest û as the solution of an optimization problem.
  1. allows the quantity û to be studied through the lens of the optimization problem
  2. approximations to û can be obtained by approximating or relaxing the variational principle


Illustration: A simple