Unformatted text preview:

Semidefinite relaxations for approximate inference on graphs with cycles Martin J Wainwright and Michael I Jordan Report No UCB CSD 3 1226 January 2003 Computer Science Division EECS University of California Berkeley California 94720 Semidefinite relaxations for approximate inference on graphs with cycles Martin J Wainwright Electrical Engineering CS UC Berkeley martinw eecs berkeley edu Michael I Jordan CS Statistics Depts UC Berkeley jordan cs berkeley edu Abstract We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope This combination leads to a log determinant maximization problem that can be solved by efficient interior point methods 13 As with the Bethe approximation and its generalizations 18 the optimizing arguments of this problem can be taken as approximations to the exact marginals In contrast to Bethe Kikuchi approaches our variational problem is strictly convex and so has a unique global optimum An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function Such upper bounds are of interest in their own right e g for parameter estimation large deviations exponents combinatorial enumeration Finally we show that taking the zero temperature limit of our log determinant relaxation recovers a class of well known semidefinite relaxations for integer programming e g 6 Keywords Graphical models Markov random field factor graph approximate inference sum product algorithm semidefinite constraints determinant maximization variational method marginal polytope 1 Introduction Given a probability distribution defined by a graphical model e g Markov random field factor graph one important problem is the computation of marginal distributions Although highly efficient algorithms exist for trees exact solutions are prohibitively complex for more general graphs of any substantial size This difficulty motivates the use of algorithms for computing approximations to marginal distributions a problem to which we refer as approximate inference One widely used algorithm is the belief propagation or sum product algorithm 11 As shown by Yedidia et al 18 it can be interpreted as a method for attempting to solve a variational problem wherein the exact entropy is replaced by the Bethe approximation Moreover Yedidia et al proposed extensions to the Bethe approximation based on clustering operations such generalizations have been further explored in subsequent work by various researchers e g 9 10 15 An unattractive feature of the Bethe approach and its extensions is that with certain exceptions e g 10 9 the associated variational problems are typically not convex thus leading to algorithmic complications and also raising the possibility of multiple local optima Secondly in contrast to other variational methods e g mean field 7 the optimal values of Bethe type variational problems fail to provide bounds on the log partition function This function arises in various contexts including approximate parameter estimation large deviations and combinatorial enumeration so that such bounds are of interest in their own right In previous work 14 we derived a class of upper bounds on the log partition function via variational problems specified by convexified Bethe Kikuchi entropy approximations This paper introduces a new class of upper bounds based on solving a log determinant maximization problem Our derivation relies on a Gaussian upper bound on the discrete entropy of a suitably regularized random vector and a semidefinite outer bound on the set of valid marginal distributions The 1 resulting variational problem has a unique optimum that can be found by efficient interior point methods 13 As with the Bethe Kikuchi approximations and sum product algorithms the optimizing arguments of this problem can be taken as approximations to the marginal distributions of the underlying graphical model Moreover taking the zero temperature limit recovers a class of well known semidefinite programming relaxations for integer programming problems e g 6 2 Problem set up We consider an undirected graph G V E with n V nodes Associated with each vertex s V is a random variable xs taking values in the discrete space X 0 1 m 1 We let x xs s V denote a random vector taking values in the Cartesian product space X n Our analysis makes use of the following exponential representation of a graph structured distribution p x For some index set I we let I denote a collection of potential functions associated with the cliques of G and let I be a vector of parameters associated with these potential functions The exponential family determined by is the following collection nX o p x exp x 1a X log exp x X n nX x o 1b Here is the log partition function that serves to normalize the distribution In a minimal representation the functions are affinely independent and d I corresponds to the dimension of the family For example one minimal representation of a binary valued random vector on a graph with pairwise cliques is the standard Ising model in which xs s V xs xt s t E Here the index set I V E and d n E In order to incorporate higher order interactions we simply add higher degree monomials e g xs xt xu for a third order interaction to the collection of potential functions Similar representations exist for discrete processes on alphabets with m 2 elements e g 1 2 1 Duality and marginal polytopes It is well known that is convex in terms of and strictly so for a minimal representation 1 Accordingly it is natural to consider its conjugate dual function which is defined by the relation sup h i 2 Rd Here the vector of dual variables is the same dimension as exponential parameter i e R d It is straightforward to show that the partial derivatives of with respect to correspond to cumulants of x in particular the first order derivatives are marginals X p x x E x 3 x X n In order to compute b for a given b we take the derivative with respect to of the quantity within curly braces in equation 2 Setting this derivative to zero and making use of equation 3 yields defining conditions for the vector b attaining the optimum in equation 2 b E b x 2 I 4 Since equation 4 involves taking an expectation the dual variables have a natural interpretation as mean parameters For example given the standard minimal representation


View Full Document

MIT 3 11 - Semidefinite relaxations for approximate inference on graphs with cycles

Loading Unlocking...
Login

Join to view Semidefinite relaxations for approximate inference on graphs with cycles and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Semidefinite relaxations for approximate inference on graphs with cycles and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?