MIT 3 11 - Semidefinite relaxations for approximate inference on graphs with cycles

Unformatted text preview:

Semidefinite relaxations for approximate inferenceon graphs with cyclesMartin J. Wainwright and Michael I. JordanReport No. UCB/CSD-3-1226January 2003Computer Science Division (EECS)University of CaliforniaBerkeley, California 94720Semidefinite relaxations for approximate inferenceon graphs with cyclesMartin J. Wainwright, Michael I. JordanElectrical Engineering & CS, UC Berkeley, CS & Statistics Depts., UC [email protected] [email protected] present a new method for calculating approximate marginals for probability distributionsdefined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefiniteouter bound on the marginal polytope. This combination leads to a log-determinant maximiza-tion problem that can be solved by efficient interior point methods [13]. As with the Betheapproximation and its generalizations [18], the optimizing arguments of this problem can betaken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches,our variational problem is strictly convex and so has a unique global optimum. An additionaldesirable feature is that the value of the optimal solution is guaranteed to provide an upperbound on the log partition function. Such upper bounds are of interest in their own right (e.g.,for parameter estimation, large deviations exponents, combinatorial enumeration). Finally, weshow that taking the zero-temperature limit of our log-determinant relaxation recovers a classof well-known semidefinite relaxations for integer programming [e.g., 6].Keywords: Graphical models; Markov random field; factor graph; approximate inference; sum-productalgorithm; semidefinite constraints; determinant maximization; variational method; marginal polytope.1 IntroductionGiven a probability distribution defined by a graphical model (e.g., Markov random field, factorgraph), one important problem is the computation of marginal distributions. Although highlyefficient algorithms exist for trees, exact solutions are prohibitively complex for more general graphsof any substantial size. This difficulty motivates the use of algorithms for computing approximationsto marginal distributions, a problem to which we refer as approximate inference. One widely-usedalgorithm is the belief propagation or sum-product algorithm [11]. As shown by Yedidia et al. [18],it can be interpreted as a method for attempting to solve a variational problem wherein the exactentropy is replaced by the Bethe approximation. Moreover, Yedidia et al. proposed extensions tothe Bethe approximation based on clustering operations; such generalizations have been furtherexplored in subsequent work by various researchers [e.g., 9, 10, 15].An unattractive feature of the Bethe approach and its extensions is that with certain excep-tions [e.g., 10, 9], the associated variational problems are typically not convex, thus leading toalgorithmic complications, and also raising the possibility of multiple local optima. Secondly, incontrast to other variational methods (e.g., mean field [7]), the optimal values of Bethe-type vari-ational problems fail to provide bounds on the log partition function. This function arises invarious contexts, including approximate parameter estimation, large deviations, and combinatorialenumeration, so that such bounds are of interest in their own right.In previous work [14], we derived a class of upper bounds on the log partition function via vari-ational problems specified by “convexified” Bethe/Kikuchi entropy approximations. This paperintroduces a new class of upper bounds based on solving a log-determinant maximization problem.Our derivation relies on a Gaussian upper bound on the discrete entropy of a suitably regularizedrandom vector, and a semidefinite outer bound on the set of valid marginal distributions. The1resulting variational problem has a unique optimum that can be found by efficient interior pointmethods [13]. As with the Bethe/Kikuchi approximations and sum-product algorithms, the opti-mizing arguments of this problem can be taken as approximations to the marginal distributions ofthe underlying graphical model. Moreover, taking the “zero-temperature” limit recovers a class ofwell-known semidefinite programming relaxations for integer programming problems [e.g., 6].2 Problem set-upWe consider an undirected graph G = (V, E) with n = |V | nodes. Associated with each vertexs ∈ V is a random variable xstaking values in the discrete space X = {0, 1, . . . , m − 1}. We letx = {xs| s ∈ V } denote a random vector taking values in the Cartesian product space Xn. Ouranalysis makes use of the following exponential representation of a graph-structured distributionp(x). For some index set I, we let φ = {φα| α ∈ I} denote a collection of potential functionsassociated with the cliques of G, and let θ = {θα| α ∈ I} be a vector of parameters associatedwith these potential functions. The exponential family determined by φ is the following collection:p(x; θ) = expnXαθαφα(x) − Φ(θ)o(1a)Φ(θ) = logXx∈XnexpnXαθαφα(x)o(1b)Here Φ(θ) is the log partition function that serves to normalize the distribution. In a minimal repre-sentation, the functions {φα} are affinely independent, and d = |I| corresponds to the dimension ofthe family. For example, one minimal representation of a binary-valued random vector on a graphwith pairwise cliques is the standard Ising model, in which φ = {xs| s ∈ V } ∪ { xsxt| (s, t) ∈ E}.Here the index set I = V ∪ E, and d = n + |E|. In order to incorporate higher order interactions,we simply add higher degree monomials (e.g., xsxtxufor a third order interaction) to the collectionof potential functions. Similar representations exist for discrete processes on alphabets with m > 2elements [e.g., 1].2.1 Duality and marginal polytopesIt is well-known that Φ is convex in terms of θ, and strictly so for a minimal representation [1].Accordingly, it is natural to consider its conjugate dual function, which is defined by the relation:Φ∗(µ) = supθ∈Rd{hµ, θi − Φ(θ)}. (2)Here the vector of dual variables µ is the same dimension as exponential parameter θ (i.e., µ ∈ Rd).It is straightforward to show that the partial derivatives of Φ with respect to θ correspond tocumulants of φ(x); in particular, the first order derivatives are marginals:∂Φ∂θα(θ) =Xx∈Xnp(x; θ)φα(x) = Eθ[φα(x)]. (3)In order to compute Φ∗(bµ) for a given bµ, we


View Full Document

MIT 3 11 - Semidefinite relaxations for approximate inference on graphs with cycles

Download Semidefinite relaxations for approximate inference on graphs with cycles
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Semidefinite relaxations for approximate inference on graphs with cycles and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Semidefinite relaxations for approximate inference on graphs with cycles 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?