11/6/20081Variational InferenceAmr AhmedNov. 6th2008Outline• Approximate Inference• Variational inference formulation– Mean Field• Examples– Structured VI• Examples11/6/20082Approximate Inference• Exact inference is exponential in clique size• Posterior is highly peaked– Can be approximated by a simpler distribution• Formulate inference as an optimization problem– Define an objective: how good is Q– Define a family of simpler distributions to search over– Find Q* that best approximate PAll DistributionsPQ*Tractable familyApproximate Inference• Exact inference is exponential in clique size• Posterior inference is highly peaked– Can be approximated by a simpler distribution• Formulate inference as an optimization problem– Define an objective: how good is Q– Define a family of simpler distributions to search over– Find Q* that best approximate P• Today we will cover variational Inference – Just a possible way of such a formulation• There are many other ways– Variants of loopy BP (later in the semester)11/6/20083What is Variational Inference?DifficultySATGradeHappyJobCoherenceLetterIntelligenceP• Posterior “hard” to compute•We know the CPTs (factors) DifficultySATGradeHappyCoherenceLetterIntelligenceQ•Tractable Posterior Family• Set CPT (factors)•Get distribution, Q()PQKLQ Min QQ*∈=•Fully instantiated distribution Q*•Enables exact Inference • Has low tree-width• Clique tree, variable elimination , etc.VI Questions• Which family to choose– Basically we want to remove some edges• Mean field: fully factorized• Structured : Keep tractable sub-graphs• How to carry the optimization• Assume P is a Markov network – Product of factors (that is all)()PQKLQ Min QQ*∈=11/6/20084Outline• Approximate Inference• Variational inference formulation– Mean Field• Examples– Structured VI• ExamplesMean-field Variational InferenceDifficultySATGradeHappyJobCoherenceLetterIntelligenceP• Posterior “hard” to compute•We know the CPTs (factors) DifficultySATGradeHappyCoherenceLetterIntelligenceQ•Tractable Posterior Family• Set CPT (factors)•Get distribution, Q()PQKLQ Min QQ*∈=•Fully instantiated distribution Q*•Enables exact Inference11/6/2008510-708 – Carlos Guestrin 2006-20089D(q||p) for mean field –KL the reverse direction: cross-entropy term• p:• q:The Energy Functional• Theorem : • Where energy functional:• Our problem now isMinimizeMaximize[]QPFFXQQixi, Max Q1)(Q*∑=∈=( )∏=iiiCZPφ1()∏=jjjxqQ≡You getLower bound on lnZ[]QQPFF∀≤,lnZMaximizing F[P,Q] tighten the boundAnd gives better prob. estimatesTractableBy construction11/6/20086• Our problem now is• Theorem: Q is a stationary point of mean field approximation iff for each j: []QPFFXQQixi, Max Q1)(Q*∑=∈=TractableBy constructionThe Energy Functional( )∏=iiiCZPφ1()∏=jjjxqQOutline• Approximate Inference• Variational inference formulation– Mean Field• Examples– Structured VI• Examples11/6/20087ExampleX1X2X3X4X5X6X7X8X9X10X11X12P: Pairwise MNQ : fully factorized MNX1X2X3X4X5X6X7X8X9X10X11X12()()∏∈∝EjijiXX,,XPφ()()∏=iiiXqXQ- Given the factors in P, we want to get the factors for Q.- Iterative procedure. Fix all q-i, compute qivia the above equation- Iterate until you reach a fixed pointExampleX1X2X3X4X5X6X7X8X9X10X11X12P: Pairwise MNQ : fully factorized MNX1X2X3X4X5X6X7X8X9X10X11X12( )( )( )[ ]∝∑∈−iscopeXiiXUQiiiiiiUxExq][:}{,lnexpφφφφφ( )( )()[]( )( )[ ]+∝−−51}{},{21}{},{1,ln,lnexp5121xxExxExqiiXXXQXXXQiφφ()()∏∈∝EjijiXX,,XPφ11/6/20088ExampleX1X2X3X4X5X6X7X8X9X10X11X12P: Pairwise MNQ : fully factorized MNX1X2X3X4X5X6X7X8X9X10X11X12( )( )( )[ ]∝∑∈−iscopeXiiXUQiiiiiiUxExq][:}{,lnexpφφφφφ()( )()[]( )()[]{}51211,ln,lnexp52xxExxExqXqXqiφφ+∝( ) ( ) ( ) ( )+∝∑∑5251552122,ln,lnexpxxxxxqxxxqφφ()()∏∈∝EjijiXX,,XPφExampleX1X2X3X4X5X6X7X8X9X10X11X12P: Pairwise MNQ : fully factorized MNX2X3X4X5X6X7X8X9X10X11X12( )( )( )[ ]∝∑∈−iscopeXiiXUQiiiiiiUxExq][:}{,lnexpφφφφφIn your homeworkX1()()∏∈∝EjijiXX,,XPφ11/6/20089ExampleX1X2X3X4X5X6X7X8X9X10X11X12P: Pairwise MNQ : fully factorized MNX2X3X4X5X6X7X8X9X10X11X12()()∏∈=EjijiXX,,XPφ( )( )( )[ ]∝∑∈−iscopeXiiXUQiiiiiiUxExq][:}{,lnexpφφφφφX1-q(X6) get to be tied with q(xi) for all xi that appear in a factor with it in P-i.e. fixed point equations are tied according to P- What q(x6) gets is expectations under these q(xi) of how the factor looks like- can be somehow interpreted as message passing as well (but we won’t cover this)IntuitivelyExampleX1X2X3X4X5X6X7X8X9X10X11X12P: Pairwise MNQ : fully factorized MNX2X5X6X7X10()()∏∈=EjijiXX,,XPφ( )()[]26,ln2xxEXqφ( )()[]610,ln10xxEXqφ( )()[]56,ln5xxEXqφ( )()[]67,ln7xxEXqφ( )( )( )[ ]∝∑∈−iscopeXiiXUQiiiiiiUxExq][:}{,lnexpφφφφφ-q(X6) get to be tied with q(xi) for all xi that appear in a factor with it in P-i.e. fixed point equations are tied according to P- What q(x6) gets is expectations under these q(xi) of how the factor looks like- can be somehow interpreted as message passing as well (but we won’t cover this)Intuitively11/6/200810Outline• Approximate Inference• Variational inference formulation– Mean Field• Examples– Structured VI• Examples• Inference in LDA (time permits)Structured Variational InferenceDifficultySATGradeHappyJobCoherenceLetterIntelligenceP• Posterior “hard” to compute•We know the CPTs (factors) DifficultySATGradeHappyCoherenceLetterIntelligenceQ•Tractable Posterior Family• Set CPT (factors)•Get distribution, Q()PQKLQ Min QQ*∈=•Fully instantiated distribution Q*•Enables exact Inference • Has low tree-width•You don’t have to remove all edges• Just keep tractable sub graphs: trees, chain11/6/200811Structured Variational InferenceX1X2X3X4X5X6X7X8X9X10X11X12P: Pairwise MNQ : tractably factorized MN()()∏∈∝EjijiXX,,XPφ()()∏∈∝qEjijijiXX,,,XQψGiven factors in P, get factors in Q that minimize energy functionalGoalStill tractableBy constructionX1X2X3X4X5X6X7X8X9X10X11X12Fixed point EquationsX1X2X3X4X5X6X7X8X9X10X11X12P: Pairwise MNQ : tractably factorized MNFactors that scope() is not independent of Cjin
View Full Document