The Elimination Algorithm
Probabilistic Graphical Models (10-708), Lecture 4, Sep 26, 2007
Eric Xing, School of Computer Science
Reading: J-Chap 3, KF-Chap. 8, 9
Probabilistic Inference
- We now have compact representations of probability distributions: graphical models.
- A GM G describes a unique probability distribution P.
- How do we answer queries about P? We use "inference" as a name for the process of computing answers to such queries.

Query 1: Likelihood
- Most of the queries one may ask involve evidence. Evidence e is an assignment of values to a set E of variables in the domain. Without loss of generality, E = {X_{k+1}, ..., X_n}.
- Simplest query: compute the probability of evidence,
    P(e) = Σ_{x_1} ... Σ_{x_k} P(x_1, ..., x_k, e).
  This is often referred to as computing the likelihood of e.

Query 2: Conditional Probability
- Often we are interested in the conditional probability distribution of a variable given the evidence:
    P(X | e) = P(X, e) / P(e) = P(X, e) / Σ_x P(X = x, e).
  This is the a posteriori belief in X, given evidence e.
- We usually query a subset Y of all domain variables X = {Y, Z} and "don't care" about the remaining Z:
    P(Y | e) = Σ_z P(Y, Z = z | e).
  The process of summing out the "don't care" variables z is called marginalization, and the resulting P(y | e) is called a marginal probability.

Applications of a posteriori Belief
- Prediction: what is the probability of an outcome given the starting condition? (The query node is a descendant of the evidence.)
- Diagnosis: what is the probability of disease/fault given symptoms? (The query node is an ancestor of the evidence.)
- Learning under partial observation: fill in the unobserved values under an "EM" setting (more later).
- The directionality of information flow between variables is not restricted by the directionality of the edges in a GM: probabilistic inference can combine evidence from all parts of the network.

Query 3: Most Probable Assignment
- In this query we want to find the most probable joint assignment (MPA) for some variables of interest.
- Such reasoning is usually performed under some given evidence e, while ignoring (the values of) other variables z:
    MPA(Y | e) = argmax_y P(y | e) = argmax_y Σ_z P(y, z | e).
  This is the maximum a posteriori configuration of y.
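The three queries above can be made concrete on a tiny joint distribution. Below is a minimal Python sketch (the joint table, its values, and the variable names are illustrative assumptions, not from the lecture): it computes a likelihood P(e) by summing out all non-evidence variables, an a posteriori belief by additionally marginalizing out a "don't care" variable, and an MPA by taking the argmax of that belief.

```python
import numpy as np

# Hypothetical toy joint over three binary variables (X1, X2, X3),
# stored as a 2x2x2 table P[x1, x2, x3]. Values are made up for illustration.
P = np.array([[[0.10, 0.05], [0.15, 0.10]],
              [[0.05, 0.20], [0.10, 0.25]]])
assert np.isclose(P.sum(), 1.0)

# Query 1 (likelihood): evidence e = {X3 = 1}.
#   P(e) = sum_{x1, x2} P(x1, x2, e)
likelihood = P[:, :, 1].sum()

# Query 2 (conditional probability): a posteriori belief in X1 given e,
# marginalizing out the "don't care" variable X2:
#   P(X1 | e) = sum_{x2} P(X1, x2, e) / P(e)
posterior = P[:, :, 1].sum(axis=1) / likelihood

# Query 3 (MPA): most probable assignment of the query variable X1 given e.
mpa = posterior.argmax()

print(likelihood)   # P(X3 = 1) = 0.60
print(posterior)    # P(X1 | X3 = 1) = [0.25, 0.75]
print(mpa)          # 1
```

Note that the MPA here is for the single query variable X1; as the cautionary note above warns, the MPA of (X1, X2) jointly need not agree with it.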
Applications of MPA
- Classification: find the most likely label, given the evidence.
- Explanation: what is the most likely scenario, given the evidence?
- Cautionary note: the MPA of a variable depends on its "context"---the set of variables being jointly queried. Example: MPA of X? MPA of (X, Y)?

Complexity of Inference
- Thm: computing P(X = x | e) in a GM is NP-hard.
- Hardness does not mean we cannot solve inference. It implies that we cannot find a general procedure that works efficiently for arbitrary GMs. For particular families of GMs, we can have provably efficient procedures.

Approaches to inference
- Exact inference algorithms: the elimination algorithm; message-passing algorithms (sum-product, belief propagation); the junction tree algorithms.
- Approximate inference techniques: stochastic simulation / sampling methods; Markov chain Monte Carlo methods; variational algorithms.

Marginalization and Elimination
- A signal transduction pathway, modeled as the chain A → B → C → D → E: what is the likelihood that protein E is active?
- Query: P(e). By chain decomposition, we get
    P(e) = Σ_d Σ_c Σ_b Σ_a P(a, b, c, d, e)
         = Σ_d Σ_c Σ_b Σ_a P(a) P(b|a) P(c|b) P(d|c) P(e|d).
- A naive summation needs to enumerate over an exponential number of terms.

Elimination on Chains
- Rearranging terms:
    P(e) = Σ_d Σ_c Σ_b P(c|b) P(d|c) P(e|d) Σ_a P(a) P(b|a).
- Now we can perform the innermost summation:
    p(b) = Σ_a P(a) P(b|a),  so  P(e) = Σ_d Σ_c Σ_b p(b) P(c|b) P(d|c) P(e|d).
  This summation "eliminates" one variable (A) from our summation argument at a "local cost".

Elimination in Chains
- Rearranging and then summing again, we get
    P(e) = Σ_d Σ_c P(d|c) P(e|d) Σ_b p(b) P(c|b) = Σ_d Σ_c p(c) P(d|c) P(e|d),
  where p(c) = Σ_b p(b) P(c|b).
- Eliminating nodes one by one all the way to the end, we get
    P(e) = Σ_d p(d) P(e|d).
- Complexity: each step costs O(|Val(X_i)| × |Val(X_{i+1})|) operations, so the whole chain costs O(nk²) for n variables with k values each. Compare to naive evaluation, which sums over the joint values of n − 1 variables: O(kⁿ).
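The chain elimination above can be sketched in a few lines: with tabular CPTs, each step p(x_{i+1}) = Σ_{x_i} p(x_i) P(x_{i+1}|x_i) is a vector-matrix product, and the O(nk²) elimination agrees with the O(kⁿ) naive enumeration. The CPT values below are randomly generated placeholders, not from the lecture.

```python
import numpy as np
from itertools import product

# Chain A -> B -> C -> D -> E, all variables binary (k = 2, n = 5).
# cpts[i] is the CPT P(X_{i+2} | X_{i+1}) as a k x k row-stochastic matrix.
rng = np.random.default_rng(0)
k, n = 2, 5
prior = np.array([0.6, 0.4])                       # P(A), made-up values
cpts = []
for _ in range(n - 1):
    M = rng.random((k, k))
    cpts.append(M / M.sum(axis=1, keepdims=True))  # rows sum to 1

# Elimination: fold in one variable at a time.
#   p(b) = sum_a P(a) P(b|a), then p(c) = sum_b p(b) P(c|b), ...
# Each step is a k x k vector-matrix product: O(n k^2) overall.
p = prior
for M in cpts:
    p = p @ M
marginal_elim = p                                  # P(E)

# Naive evaluation: sum over all k^(n-1) joint assignments of A, B, C, D.
marginal_naive = np.zeros(k)
for assign in product(range(k), repeat=n - 1):
    prob = prior[assign[0]]
    for i in range(n - 2):
        prob *= cpts[i][assign[i], assign[i + 1]]
    for e in range(k):
        marginal_naive[e] += prob * cpts[n - 2][assign[n - 2], e]

print(marginal_elim, marginal_naive)               # identical up to float error
```

The speedup comes purely from moving each sum inward past the factors that do not mention the summed variable, exactly as in the rearrangements above.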
Undirected Chains
- For the undirected chain A — B — C — D — E with pairwise potentials φ:
    P(e) = (1/Z) Σ_d Σ_c Σ_b Σ_a φ(a, b) φ(b, c) φ(c, d) φ(d, e).
- Rearranging terms:
    P(e) = (1/Z) Σ_d Σ_c Σ_b φ(b, c) φ(c, d) φ(d, e) Σ_a φ(a, b).

The Sum-Product Operation
- In general, we can view the task at hand as that of computing the value of an expression of the form
    Σ_z Π_{φ ∈ F} φ,
  where F is a set of factors. We call this task the sum-product inference task.

Outcome of elimination
- Let X be some set of variables, let F be a set of factors such that for each φ ∈ F, Scope[φ] ⊆ X, let Y ⊂ X be a set of query variables, and let Z = X − Y be the variables to be eliminated.
- The result of eliminating the variables in Z is a factor
    τ(Y) = Σ_z Π_{φ ∈ F} φ.
- This factor does not necessarily correspond to any probability or conditional probability in this network. (Example forthcoming.)

Dealing with evidence
- Conditioning as a sum-product operation. The evidence potential:
    δ(E_i, e_i) = 1 if E_i = e_i, and 0 if E_i ≠ e_i.
- Total evidence potential: δ(E, e) = Π_i δ(E_i, e_i).
- Introducing evidence --- restricted factors:
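The sum-product task and evidence potentials can be sketched with a small factor class (the class design, names, and the worked example are illustrative assumptions, not from the lecture): factors multiply by broadcasting over named variables, eliminating a variable Z sums it out of the product of the factors that mention it, and evidence enters as the indicator potential δ(E_i, e_i).

```python
import numpy as np

# Minimal factor type for the sum-product task  Σ_z Π_{φ∈F} φ.
class Factor:
    def __init__(self, variables, table):
        self.vars = list(variables)            # one table axis per variable
        self.table = np.asarray(table, dtype=float)

    def _expand(self, all_vars):
        # Reorder own axes to follow all_vars, inserting size-1 axes for
        # unmentioned variables so that multiplication broadcasts.
        order = sorted(range(len(self.vars)),
                       key=lambda i: all_vars.index(self.vars[i]))
        t = np.transpose(self.table, order)
        own = [self.vars[i] for i in order]
        shape, j = [], 0
        for v in all_vars:
            if j < len(own) and own[j] == v:
                shape.append(t.shape[j]); j += 1
            else:
                shape.append(1)
        return t.reshape(shape)

    def product(self, other):
        all_vars = self.vars + [v for v in other.vars if v not in self.vars]
        return Factor(all_vars,
                      self._expand(all_vars) * other._expand(all_vars))

    def sum_out(self, var):
        i = self.vars.index(var)
        return Factor([v for v in self.vars if v != var],
                      self.table.sum(axis=i))

def evidence_potential(var, size, value):
    # δ(E_i, e_i): 1 where E_i = e_i, 0 otherwise.
    t = np.zeros(size); t[value] = 1.0
    return Factor([var], t)

def eliminate(factors, elim_order):
    # For each z: multiply the factors mentioning z, sum z out, put the
    # resulting factor τ back in the pool; finally multiply what remains.
    factors = list(factors)
    for z in elim_order:
        involved = [f for f in factors if z in f.vars]
        factors = [f for f in factors if z not in f.vars]
        if involved:
            prod = involved[0]
            for f in involved[1:]:
                prod = prod.product(f)
            factors.append(prod.sum_out(z))
    result = factors[0]
    for f in factors[1:]:
        result = result.product(f)
    return result

# Toy chain A -> B -> C (binary) with made-up CPTs, evidence C = 1.
pA   = Factor(['A'], [0.6, 0.4])
pBgA = Factor(['A', 'B'], [[0.7, 0.3], [0.2, 0.8]])   # rows: a, cols: b
pCgB = Factor(['B', 'C'], [[0.9, 0.1], [0.4, 0.6]])   # rows: b, cols: c
tau = eliminate([pA, pBgA, pCgB, evidence_potential('C', 2, 1)], ['A', 'C'])
posterior = tau.table / tau.table.sum()               # P(B | C = 1)
```

As the "Outcome of elimination" slide notes, the intermediate factor τ(B) here is P(B, C = 1), which is not itself a (conditional) probability table; normalizing it yields the a posteriori belief P(B | C = 1).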