Graphical Models
Aarti Singh
Slides courtesy: Carlos Guestrin
Machine Learning 10-701/15-781, Nov 15, 2010

Directed – Bayesian Networks
• Compact representation for a joint probability distribution
• Bayes Net = Directed Acyclic Graph (DAG) + Conditional Probability Tables (CPTs)
• Distribution factorizes according to the graph
  ≡ distribution satisfies the local Markov independence assumptions
  ≡ each x_k is independent of its non-descendants given its parents pa_k

Directed – Bayesian Networks
• The graph encodes local independence assumptions (local Markov assumptions)
• Other independence assumptions can be read off the graph using d-separation
• Distribution factorizes according to the graph ≡ distribution satisfies all independence assumptions found by d-separation
• Does the graph capture all independencies? Yes, for almost all distributions that factorize according to the graph. More in 10-708.

D-separation
• a is d-separated from b by c ≡ a ⊥ b | c
• Three important configurations:
  – Causal direction (a → c → b): a ⊥ b | c
  – Common cause (a ← c → b): a ⊥ b | c
  – V-structure, "explaining away" (a → c ← b): a ⊥ b, but a and b are NOT independent given c

Undirected – Markov Random Fields
• Popular in statistical physics, computer vision, sensor networks, social networks, protein-protein interaction networks
• Example – image denoising: x_i is the value at pixel i, y_i the observed noisy value

Conditional Independence properties
• No directed edges
• Conditional independence ≡ graph separation
• For non-intersecting sets of nodes A, B, C: A ⊥ B | C if all paths between nodes in A and B are "blocked", i.e. every such path contains some node z in C

Factorization
• The joint distribution factorizes according to the graph, as a product over its (maximal) cliques:
  P(x) = (1/Z) ∏_C ψ_C(x_C)
  where each ψ_C is an arbitrary positive function over clique C, and the partition function Z = ∑_x ∏_C ψ_C(x_C) is typically NP-hard to compute
• Clique: a fully connected subset of nodes, e.g. x_C = {x_1, x_2}; maximal clique: one that cannot be enlarged, e.g. x_C = {x_2, x_3, x_4}

MRF Example
• Often the potentials are written as ψ_C(x_C) = exp(−E(x_C)), where E(x_C) is the energy of the clique (e.g. lower if the variables in the clique take similar values)

MRF Example
• Ising model: the cliques are edges, x_C = {x_i, x_j}, with binary variables x_i ∈ {−1, 1}
• Since x_i x_j = 1 if x_i = x_j and −1 if x_i ≠ x_j, a potential such as ψ(x_i, x_j) = exp(x_i x_j) makes the probability of an assignment higher if neighbors x_i and x_j are the same

Hammersley-Clifford Theorem
• F – the set of distributions that factorize according to the graph
• I – the set of distributions that respect the conditional independencies implied by graph separation
• I ⊆ F (for strictly positive distributions): important because, given the independencies of P, we can get an MRF structure G
• F ⊆ I: important because we can read the independencies of P from the MRF structure G

What you should know…
• Graphical models: directed Bayesian networks, undirected Markov random fields
  – A compact representation for large probability distributions
  – Not an algorithm
• Representation of a BN, MRF: variables, graph, CPTs
• Why BNs and MRFs are useful
• D-separation (conditional independence) & factorization

Topics in Graphical Models
• Representation – which joint probability distributions does a graphical model represent?
• Inference – how to answer questions about the joint probability distribution?
  – Marginal distribution of a node variable
  – Most likely assignment of node variables
• Learning – how to learn the parameters and structure of a graphical model?

Inference
Running example: Flu → Sinus ← Allergy, with Sinus → Headache and Sinus → Nose.
• Possible queries:
  1) Marginal distribution, e.g. P(S); posterior distribution, e.g. P(F|H=1)
  2) Most likely assignment of nodes: arg max_{f,a,s,n} P(F=f, A=a, S=s, N=n | H=1)
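Both queries can be answered by brute-force enumeration when the network is this small. Below is a minimal Python sketch for the Flu/Allergy/Sinus/Headache/Nose network; the slides specify only the graph structure, so every CPT number here is invented for illustration, as are the helper names (bern, joint, p_F_H1).

    import itertools

    # Hypothetical CPTs (all numbers invented; only the graph is from the slides).
    P_F1 = 0.1                                  # P(F=1)
    P_A1 = 0.2                                  # P(A=1)
    P_S1 = {(0, 0): 0.05, (0, 1): 0.6,
            (1, 0): 0.7,  (1, 1): 0.9}          # P(S=1 | F, A)
    P_H1 = {0: 0.1, 1: 0.8}                     # P(H=1 | S)
    P_N1 = {0: 0.2, 1: 0.7}                     # P(N=1 | S)

    def bern(p1, x):
        """P(X=x) for a binary variable with P(X=1) = p1."""
        return p1 if x == 1 else 1.0 - p1

    def joint(f, a, s, h, n):
        """P(F,A,S,H,N) = P(F) P(A) P(S|F,A) P(H|S) P(N|S)."""
        return (bern(P_F1, f) * bern(P_A1, a) * bern(P_S1[(f, a)], s)
                * bern(P_H1[s], h) * bern(P_N1[s], n))

    # Query 1: P(F, H=1) sums the joint over the 2^3 settings of (a, s, n);
    # the posterior P(F | H=1) then follows by normalizing over F.
    def p_F_H1(f):
        return sum(joint(f, a, s, 1, n)
                   for a, s, n in itertools.product((0, 1), repeat=3))

    print("P(F=1 | H=1) =", p_F_H1(1) / (p_F_H1(0) + p_F_H1(1)))

    # Query 2: most likely (f, a, s, n) given H=1. Conditioning only rescales
    # by 1/P(H=1), so maximizing the joint with H fixed to 1 suffices.
    best = max(itertools.product((0, 1), repeat=4),
               key=lambda v: joint(v[0], v[1], v[2], 1, v[3]))
    print("argmax P(f, a, s, n | H=1) =", best)

Enumeration touches only 2^3 = 8 terms here, but the count grows as 2^n with the number of marginalized variables, which is exactly the problem the next slides address.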
Inference
• P(F|H=1) = P(F,H=1) / P(H=1) = P(F,H=1) / ∑_f P(F=f,H=1)
• We will focus on computing the joint P(F,H=1); the posterior then follows with only a constant factor more effort

Marginalization
• Need to marginalize over the other variables:
  P(S) = ∑_{f,a,n,h} P(f,a,S,n,h)   (2^4 terms)
  P(F,H=1) = ∑_{a,s,n} P(F,a,s,n,H=1)   (2^3 terms)
• To marginalize out n binary variables, we need to sum over 2^n terms
• Inference seems exponential in the number of variables! Indeed, exact inference in graphical models is NP-hard in general

Bayesian Networks Example
• 18 binary attributes
• Inference – P(BatteryAge | Starts=f) – requires summing over 2^16 terms!
• Not impressed? The HailFinder BN has more than 3^54 = 58149737003040059690390169 terms

Fast Probabilistic Inference
P(F,H=1) = ∑_{a,s,n} P(F,a,s,n,H=1)
         = ∑_{a,s,n} P(F) P(a) P(s|F,a) P(n|s) P(H=1|s)
         = P(F) ∑_a P(a) ∑_s P(s|F,a) P(H=1|s) ∑_n P(n|s)
• Push sums in as far as possible, using the distributive property: x_1 z + x_2 z = z(x_1 + x_2), so 2 multiplies become 1 multiply

Fast Probabilistic Inference
(Potential for) exponential reduction in computation! For each value of F:
P(F,H=1) = ∑_{a,s,n} P(F) P(a) P(s|F,a) P(n|s) P(H=1|s)   [2^3 = 8 terms × 4 multiplies = 32 multiplies]
         = P(F) ∑_a P(a) ∑_s P(s|F,a) P(H=1|s) ∑_n P(n|s)   [∑_n P(n|s) = 1]
         = P(F) ∑_a P(a) ∑_s P(s|F,a) P(H=1|s)
         = P(F) ∑_a P(a) g_1(F,a)   [4 multiplies to build g_1]
         = P(F) g_2(F)   [2 multiplies to build g_2, 1 final multiply]
• 32 multiplies vs. 7 multiplies; in general 2^n vs. n·2^k, where k is the scope of the largest factor

Fast Probabilistic Inference – Variable Elimination
P(F,H=1) = ∑_{a,s,n} P(F) P(a) P(s|F,a) P(n|s) P(H=1|s)
         = P(F) ∑_a P(a) ∑_s P(s|F,a) P(H=1|s) ∑_n P(n|s)
• Each sum eliminates one variable: ∑_n P(n|s) = 1, then ∑_s P(s|F,a) P(H=1|s) = P(H=1|F,a), then ∑_a P(a) P(H=1|F,a) = P(H=1|F)

Variable Elimination – Order can make a HUGE difference
• The order above (n, then s, then a) never creates a factor over more than two variables
• Eliminating s first instead,
  P(F,H=1) = P(F) ∑_a P(a) ∑_n ∑_s P(s|F,a) P(n|s) P(H=1|s)
  creates the factor g(F,a,n), so the scope of the largest factor is now 3

Variable Elimination – Order can make a HUGE difference
• Star network: X_1, X_2, X_3, X_4, ..., X_n all connected to a single hub Y
  – Eliminating the X_i's first only creates factors g(Y) (scope of the largest factor: 1)
  – Eliminating Y first creates g(X_1, X_2, ..., X_n) (scope of the largest factor: n)

Variable Elimination Algorithm
• Given a BN – DAG and CPTs (the initial factors p(x_i | pa_i) for i = 1, ..., n)
• Given a query P(X|e) ≡ P(X,e), where X is a set of variables and e the evidence
• Instantiate the evidence e, e.g. set H=1
• Choose an ordering of the variables, e.g. X_1, ..., X_n. IMPORTANT!!! The order drives the cost, as shown above; see also the code sketch at the end of this section
• For i = 1 to n, if X_i ∉ {X, e}:
  – Collect the factors g_1, ..., g_k that include X_i
  – Generate a new factor g by eliminating (summing out) X_i from these factors
  – Variable X_i has been eliminated!
  – Remove g_1, ..., g_k from the set of factors, but add g
• Normalize P(X,e) to obtain P(X|e)

Complexity for (Poly)tree graphs
Variable elimination order:
• Consider the undirected version of the graph
• Work from the "leaves" up: find a topological order and eliminate variables in reverse order
• This never creates a factor bigger than the original CPTs
• For polytrees, inference is therefore linear in the number of variables (vs. exponential in general)!

Complexity for graphs with loops
• Loop – an undirected cycle
• Still linear in the number of variables, but exponential in the scope of the largest factor created during elimination
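To make the variable elimination algorithm above concrete, here is a minimal Python sketch on the same Flu network, using the same invented CPT numbers as the brute-force example. The factor representation (a scope tuple plus a value table) and the helper names (tabulate, eliminate) are scaffolding of my own, not from the slides.

    import itertools

    DOM = (0, 1)  # every variable is binary

    def bern(p1, x):
        """P(X=x) for a binary variable with P(X=1) = p1."""
        return p1 if x == 1 else 1.0 - p1

    def tabulate(scope, fn):
        """Build a factor (scope, table) by evaluating fn on all settings."""
        return (scope, {v: fn(*v)
                        for v in itertools.product(DOM, repeat=len(scope))})

    def eliminate(factors, var):
        """Multiply all factors whose scope mentions var, then sum var out."""
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        scope = tuple(sorted({v for s, _ in touching for v in s} - {var}))
        table = {}
        for vals in itertools.product(DOM, repeat=len(scope)):
            assign = dict(zip(scope, vals))
            total = 0.0
            for x in DOM:                        # sum out var
                assign[var] = x
                prod = 1.0
                for s, t in touching:            # multiply the touching factors
                    prod *= t[tuple(assign[v] for v in s)]
                total += prod
            table[vals] = total
        return rest + [(scope, table)]

    # Initial factors, with the evidence H=1 already instantiated:
    # P(H=1|S) becomes a factor over s alone. CPT numbers are invented.
    P_S1 = {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.9}  # P(S=1|F,A)
    factors = [
        tabulate(('f',), lambda f: bern(0.1, f)),                 # P(F)
        tabulate(('a',), lambda a: bern(0.2, a)),                 # P(A)
        tabulate(('f', 'a', 's'), lambda f, a, s: bern(P_S1[(f, a)], s)),
        tabulate(('s',), lambda s: bern({0: 0.1, 1: 0.8}[s], 1)), # P(H=1|S)
        tabulate(('s', 'n'), lambda s, n: bern({0: 0.2, 1: 0.7}[s], n)),  # P(N|S)
    ]

    # Good order (n, s, a): no intermediate factor has scope larger than 2.
    for var in ('n', 's', 'a'):
        factors = eliminate(factors, var)

    # Every surviving factor has scope ('f',); their product is P(F, H=1).
    p = {f: 1.0 for f in DOM}
    for scope, table in factors:
        for f in DOM:
            p[f] *= table[(f,)]
    print("P(F=1 | H=1) =", p[1] / (p[0] + p[1]))

Eliminating s before n would instead create the scope-3 factor g(F, a, n) from the slides. The printed posterior matches the brute-force sketch above, which is a convenient sanity check.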