Graphical models, message-passing algorithms, and variational methods: Part II

Martin Wainwright
Department of Statistics, and
Department of Electrical Engineering and Computer Science,
UC Berkeley, Berkeley, CA USA
Email: wainwrig@{stat,eecs}.berkeley.edu
For further information (tutorial slides, videos of course lectures), see: www.eecs.berkeley.edu/~wainwrig/

Introduction

• graphical models are used and studied in various applied statistical and computational fields:
  – machine learning and artificial intelligence
  – computational biology
  – statistical signal/image processing
  – communication and information theory
  – statistical physics
  – ...
• based on correspondences between graph theory and probability theory
• important but difficult problems:
  – computing likelihoods, marginal distributions, modes
  – estimating model parameters and structure from (noisy) data

Outline

1. Recap
   (a) Background on graphical models
   (b) Some applications and challenging problems
   (c) Illustrations of some message-passing algorithms
2. Exponential families and variational methods
   (a) What is a variational method (and why should I care)?
   (b) Graphical models as exponential families
   (c) Variational representations from conjugate duality
3. Exact techniques as variational methods
   (a) Gaussian inference on arbitrary graphs
   (b) Belief-propagation/sum-product on trees (e.g., Kalman filter; α-β algorithm)
   (c) Max-product on trees (e.g., Viterbi)
4. Approximate techniques as variational methods
   (a) Mean field and variants
   (b) Belief propagation and extensions on graphs with cycles
   (c) Convex methods and upper bounds
   (d) Tree-reweighted max-product and linear programs

Undirected graphical models

Based on correspondences between graphs and random variables.

• given an undirected graph G = (V, E), each node s has an associated random variable X_s
• for each subset A ⊆ V, define X_A := {X_s, s ∈ A}

[Figure: example graphs. Left: a seven-node graph with maximal cliques (123), (345), (456), (47). Right: a vertex cutset S separating vertex sets A and B.]

• a clique C ⊆ V is a subset of vertices all joined by edges
• a vertex cutset is a subset S ⊂ V whose removal breaks the graph into two or more pieces

Factorization and Markov properties

The graph G can be used to impose constraints on the random vector X = X_V (or on the distribution p) in different ways.

Markov property: X is Markov with respect to G if X_A and X_B are conditionally independent given X_S whenever S separates A and B.

Factorization: The distribution p factorizes according to G if it can be expressed as a product over cliques:

  p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \exp\{\theta_C(x_C)\}

where each factor \exp\{\theta_C(x_C)\} is a compatibility function on clique C.

Theorem (Hammersley-Clifford): For strictly positive p(·), the Markov property and the factorization property are equivalent.

Challenging computational problems

Frequently, it is of interest to compute various quantities associated with an undirected graphical model:

(a) the log normalization constant log Z
(b) local marginal distributions or other local statistics
(c) modes or most probable configurations

Relevant dimensions often grow rapidly in graph size ⟹ major computational challenges.

Example: Consider a naive approach to computing the normalization constant for binary random variables:

  Z = \sum_{x \in \{0,1\}^n} \prod_{C \in \mathcal{C}} \exp\{\theta_C(x_C)\}

Complexity scales exponentially as 2^n.
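To make the 2^n scaling concrete, here is a minimal brute-force sketch for the pairwise (Ising-type) special case used in the slides that follow; the 3-node chain, parameter values, and function names are illustrative assumptions, not from the deck.

```python
import itertools
import math

def log_potential(x, theta_s, theta_st):
    # Unnormalized log-probability: sum_s theta_s[s]*x_s + sum_(s,t) theta_st*x_s*x_t.
    val = sum(th * xs for th, xs in zip(theta_s, x))
    val += sum(th * x[s] * x[t] for (s, t), th in theta_st.items())
    return val

def partition_function(theta_s, theta_st):
    # Brute force: sum exp(log potential) over all 2**n binary configurations.
    n = len(theta_s)
    return sum(math.exp(log_potential(x, theta_s, theta_st))
               for x in itertools.product((0, 1), repeat=n))

# Hypothetical example: a 3-node chain 0 - 1 - 2.
theta_s = [0.1, -0.2, 0.3]
theta_st = {(0, 1): 0.5, (1, 2): -0.5}
print(partition_function(theta_s, theta_st))  # feasible only for small n
```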
Gibbs sampling in the Ising model

• binary variables on a graph G = (V, E) with pairwise interactions:

  p(x; \theta) \propto \exp\Big\{ \sum_{s \in V} \theta_s x_s + \sum_{(s,t) \in E} \theta_{st} x_s x_t \Big\}

• update x_s^{(m+1)} stochastically based on the values x_{N(s)}^{(m)} at the neighbors of s:
  1. Choose s ∈ V at random.
  2. Sample u ∼ U(0, 1) and update

  x_s^{(m+1)} =
  \begin{cases}
    1 & \text{if } u \le \big\{1 + \exp\big[-\big(\theta_s + \sum_{t \in N(s)} \theta_{st}\, x_t^{(m)}\big)\big]\big\}^{-1} \\
    0 & \text{otherwise}
  \end{cases}

• the sequence {x^{(m)}} converges (in a stochastic sense) to a sample from p(x; θ)
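A minimal sketch of this update in Python, reusing the hypothetical chain inputs from the previous sketch (the adjacency-list encoding and names are my conventions, not the slides'):

```python
import math
import random

def gibbs_step(x, theta_s, theta_st, neighbors):
    # One single-site Gibbs update for the Ising model with x_s in {0, 1}.
    s = random.randrange(len(x))                 # 1. choose s in V at random
    field = theta_s[s] + sum(
        theta_st.get((s, t), theta_st.get((t, s), 0.0)) * x[t]
        for t in neighbors[s])
    # 2. sample u ~ U(0,1); P(x_s = 1 | neighbors) is the logistic function of the field.
    x[s] = 1 if random.random() <= 1.0 / (1.0 + math.exp(-field)) else 0
    return x

# Hypothetical usage on the 3-node chain 0 - 1 - 2:
neighbors = {0: [1], 1: [0, 2], 2: [1]}
x = [0, 1, 0]
for _ in range(10000):                           # long runs approach p(x; theta)
    x = gibbs_step(x, [0.1, -0.2, 0.3], {(0, 1): 0.5, (1, 2): -0.5}, neighbors)
print(x)
```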


Mean field updates in the Ising model

• binary variables on a graph G = (V, E) with pairwise interactions:

  p(x; \theta) \propto \exp\Big\{ \sum_{s \in V} \theta_s x_s + \sum_{(s,t) \in E} \theta_{st} x_s x_t \Big\}

• a simple (deterministic) message-passing algorithm involving variational parameters ν_s ∈ (0, 1) at each node

[Figure: node 1 with neighbors 2, ..., 5 carrying variational parameters ν_2, ..., ν_5]

  1. Choose s ∈ V at random.
  2. Update ν_s based on the neighbors {ν_t, t ∈ N(s)}:

  \nu_s \leftarrow \big\{1 + \exp\big[-\big(\theta_s + \sum_{t \in N(s)} \theta_{st}\, \nu_t\big)\big]\big\}^{-1}

Questions:
• principled derivation?
• convergence and accuracy?
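A sketch of these deterministic updates, under the same hypothetical input conventions as the Gibbs sketch. Note that the update is exactly the Gibbs conditional with the random neighbor values x_t replaced by their variational means ν_t:

```python
import math
import random

def mean_field(theta_s, theta_st, neighbors, sweeps=100):
    # Coordinate updates nu_s <- logistic(theta_s + sum_t theta_st * nu_t).
    n = len(theta_s)
    nu = [0.5] * n                               # initialize in (0, 1)
    for _ in range(sweeps):
        for s in random.sample(range(n), n):     # visit nodes in random order
            field = theta_s[s] + sum(
                theta_st.get((s, t), theta_st.get((t, s), 0.0)) * nu[t]
                for t in neighbors[s])
            nu[s] = 1.0 / (1.0 + math.exp(-field))
    return nu                                    # nu_s approximates P(X_s = 1)

print(mean_field([0.1, -0.2, 0.3], {(0, 1): 0.5, (1, 2): -0.5},
                 {0: [1], 1: [0, 2], 2: [1]}))
```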
Sum- and max-product algorithms: on trees

Exact for trees, but approximate for graphs with cycles.

[Figure: node t with neighbors u, v, w rooting subtrees T_u, T_v, T_w; messages M_ut, M_vt, M_wt feed into the message M_ts sent from t to s]

  M_ts ≡ message from node t to node s
  Γ(t) ≡ neighbors of node t

Sum-product: for marginals (generalizes the α-β algorithm; Kalman filter)
Max-product: for MAP configurations (generalizes the Viterbi algorithm)

Update:

  M_{ts}(x_s) \leftarrow \sum_{x'_t \in \mathcal{X}_t} \Big\{ \exp\big[\theta_{st}(x_s, x'_t) + \theta_t(x'_t)\big] \prod_{v \in \Gamma(t) \setminus s} M_{vt}(x'_t) \Big\}

Marginals:

  p(x_s; \theta) \propto \exp\{\theta_s(x_s)\} \prod_{t \in \Gamma(s)} M_{ts}(x_s)
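A recursive sketch of this tree update for the pairwise binary model used earlier, i.e., θ_st(x_s, x'_t) = θ_st x_s x'_t and θ_t(x'_t) = θ_t x'_t; the graph encoding and names are my assumptions:

```python
import math

# Sum-product on a tree for binary variables, following the update above:
#   M_ts(x_s) = sum_{x_t} exp[theta_st*x_s*x_t + theta_t*x_t] * prod_{v in G(t)\{s}} M_vt(x_t)

def message(t, s, theta_s, theta_st, neighbors):
    # Message from node t to its neighbor s; the recursion terminates because
    # the graph is a tree (leaves have no neighbors other than s).
    m = [0.0, 0.0]
    for x_s in (0, 1):
        for x_t in (0, 1):
            term = math.exp(theta_st[(t, s)] * x_s * x_t + theta_s[t] * x_t)
            for v in neighbors[t]:
                if v != s:
                    term *= message(v, t, theta_s, theta_st, neighbors)[x_t]
            m[x_s] += term
    return m

def marginal(s, theta_s, theta_st, neighbors):
    # p(x_s) proportional to exp(theta_s*x_s) times the incoming messages.
    p = [math.exp(theta_s[s] * x_s) for x_s in (0, 1)]
    for t in neighbors[s]:
        m = message(t, s, theta_s, theta_st, neighbors)
        p = [p[i] * m[i] for i in (0, 1)]
    z = sum(p)
    return [v / z for v in p]

# Hypothetical 3-node chain 0 - 1 - 2 (edge parameters stored symmetrically):
theta_s = [0.1, -0.2, 0.3]
theta_st = {(0, 1): 0.5, (1, 0): 0.5, (1, 2): -0.5, (2, 1): -0.5}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
print(marginal(1, theta_s, theta_st, neighbors))  # exact on trees
```

Replacing the sum over x_t with a max yields the max-product variant for MAP configurations.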
Sum- and max-product: on graphs with cycles

• what about applying the same updates on a graph with cycles?
• the updates need not converge (an effect of the cycles)
• seems naive, but remarkably successful in many applications

[Figure: the same message-passing schematic as on the previous slide]

Sum-product: for marginals
Max-product: for modes

Questions:
• meaning of these updates for graphs with cycles?
• convergence? accuracy of the resulting "marginals"?
Variational methods

• "variational": an umbrella term for optimization-based formulations of problems, and methods for their solution
• historical roots in the calculus of variations
• modern variational methods encompass a wider class of methods (e.g., dynamic programming; finite-element methods)

Variational principle: a representation of a quantity of interest û as the solution of an optimization problem.

1. allows the quantity û to be studied through the lens of the optimization problem
2. approximations to û can be obtained by approximating or relaxing the variational principle
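As one standard example of a simple variational principle (my choice of illustration, not necessarily the one the deck develops), the maximal eigenvalue of a symmetric matrix is the value of an optimization problem:

```latex
% Rayleigh-quotient characterization of the maximal eigenvalue of a
% symmetric matrix Q: a quantity of interest expressed as the solution
% of an optimization problem, i.e., a variational principle.
\lambda_{\max}(Q) \;=\; \max_{\|x\|_2 = 1} \; x^{\top} Q x,
\qquad Q = Q^{\top} \in \mathbb{R}^{n \times n}.
```

Restricting the feasible set {x : ‖x‖_2 = 1} yields lower bounds on λ_max, and relaxing it yields upper bounds, mirroring point 2 above.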
