Structure Learning (The Good), The Bad, The Ugly: Inference
Graphical Models – 10-708
Carlos Guestrin
Carnegie Mellon University
October 13th, 2008
Readings: K&F: 17.3, 17.4, 17.5.1, 8.1, 12.1
10-708 – Carlos Guestrin 2006-2008

Decomposable score
- Log data likelihood
- Decomposable score: decomposes over families in the BN (a node and its parents)
- Will lead to significant computational efficiency!
- Score(G : D) = ∑_i FamScore(X_i | Pa_{X_i} : D)

Structure learning for general graphs
- In a tree, a node has only one parent
- Theorem: the problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d ≥ 2
- Most structure-learning approaches use heuristics that exploit score decomposition
- (Quickly) describe two heuristics that exploit decomposition in different ways

Understanding score decomposition
(Example network: Difficulty, SAT, Grade, HappyJob, Coherence, Letter, Intelligence)

Fixed variable order 1
- Pick a variable order, e.g., X_1, …, X_n
- X_i can only pick parents in {X_1, …, X_{i-1}}
  - Any subset
  - Acyclicity guaranteed!
- Total score = sum of each node's score

Fixed variable order 2
- Fix the maximum number of parents to k
- For each i in the order, pick Pa_{X_i} ⊆ {X_1, …, X_{i-1}} by exhaustively searching through all possible subsets: Pa_{X_i} is the maximizer of FamScore(X_i | U : D) over U ⊆ {X_1, …, X_{i-1}}
- Optimal BN for each order!
- Greedy search through the space of orders:
  - E.g., try switching pairs of variables in the order
  - If neighboring variables in the order are switched, only the scores for this pair need to be recomputed
  - O(n) speedup per iteration

Learn BN structure using local search
- Start from a Chow-Liu tree
- Local search, possible moves (only if acyclic!):
  - Add edge
  - Delete edge
  - Invert edge
- Select moves using your favorite score

Exploit score decomposition in local search
- Add edge and delete edge: only rescore one family!
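As a minimal sketch of why decomposability pays off in local search, the toy Python below verifies that adding an edge changes the total score by exactly one family's delta. The variable names, the tiny data set, and the log-likelihood family score are illustrative assumptions, not from the slides:

```python
# Sketch: decomposable score and one-family rescoring (toy data, assumed
# log-likelihood FamScore -- illustrative only).
import math
from collections import Counter
from itertools import product

def fam_score(data, child, parents):
    """Log-likelihood family score: sum over rows of log P_hat(x | pa)."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in data)
    return sum(n * math.log(n / pa_counts[pa]) for (pa, x), n in joint.items())

def score(data, graph):
    """Decomposable score: Score(G : D) = sum_i FamScore(X_i | Pa_Xi : D)."""
    return sum(fam_score(data, x, pa) for x, pa in graph.items())

# Toy data: rows are dicts over binary variables; S is the OR of F and A.
data = [{"F": f, "A": a, "S": f or a} for f, a in product([0, 1], repeat=2)]
g1 = {"F": (), "A": (), "S": ("F",)}
g2 = {"F": (), "A": (), "S": ("F", "A")}   # local-search move: add edge A -> S
# Only S's family needs rescoring; F and A are untouched.
delta = fam_score(data, "S", ("F", "A")) - fam_score(data, "S", ("F",))
assert abs((score(data, g2) - score(data, g1)) - delta) < 1e-9
```

Deleting an edge works the same way (one family rescored), and reversing an edge touches exactly the two families of its endpoints.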
- Reverse edge: rescore only two families
(Example network: Difficulty, SAT, Grade, HappyJob, Coherence, Letter, Intelligence)

Some experiments
- Alarm network

Order search versus graph search
- Order search advantages:
  - For a fixed order, optimal BN: a more "global" optimization
  - The space of orders is much smaller than the space of graphs
- Graph search advantages:
  - Not restricted to k parents, especially if exploiting CPD structure such as CSI
  - Cheaper per iteration
  - Finer moves within a graph

Bayesian model averaging
- So far, we have selected a single structure
- But if you are really Bayesian, you must average over structures (similar to averaging over parameters)
- Inference for structure averaging is very hard! Clever tricks in the reading

What you need to know about learning BN structures
- Decomposable scores
  - Data likelihood
  - Information-theoretic interpretation
  - Bayesian BIC approximation
- Priors
  - Structure and parameter assumptions
  - BDe if and only if score equivalence
- Best tree (Chow-Liu)
- Best TAN
- Nearly best k-treewidth (in O(N^(k+1)))
- Search techniques
  - Search through orders
  - Search through structures
  - Bayesian model averaging

Inference in graphical models: Typical queries 1
(Example network: Flu, Allergy, Sinus, Headache, Nose)
- Conditional probabilities: distribution of some variable(s)
given evidence

Inference in graphical models: Typical queries 2 – Maximization
(Example network: Flu, Allergy, Sinus, Headache, Nose)
- Most probable explanation (MPE): most likely assignment to all hidden variables given the evidence
- Maximum a posteriori (MAP): most likely assignment to some variable(s) given the evidence

Are MPE and MAP consistent?
(Example network: Sinus → Nose)
- MPE: most likely assignment to all hidden variables given the evidence
- MAP: most likely assignment to some variable(s) given the evidence
- P(S=t) = 0.4, P(S=f) = 0.6, with a CPT for P(N|S)

C++ Library
- Now available; join http://groups.google.com/group/10708-f08-code/
- The library implements the following functionality:
  - random variables, random processes, and linear algebra
  - factorized distributions, such as Gaussians, multinomial distributions, and mixtures
  - graph structures and basic graph algorithms
  - graphical models, including Bayesian networks, Markov networks, and junction trees
  - basic static and dynamic inference algorithms
  - parameter learning for Gaussian distributions, Chow-Liu
- Fairly advanced C++ (not for everyone ☺)

Complexity of conditional probability queries 1
- How hard is it to compute P(X | E=e)?
- Reduction from 3-SAT: (X_1 ∨ X_2 ∨ X_3) ∧ (X_2 ∨ X_3 ∨ X_4) ∧ …

Complexity of conditional probability queries 2
- How hard is it to compute P(X | E=e)? At least NP-hard; in fact, even harder!

Inference is #P-complete. Hopeless?
- Exploit structure!
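The MPE-vs-MAP question raised earlier can be worked numerically. P(S) comes from the slide; the CPT for P(N|S) below is assumed filler (the slide's table did not survive extraction), chosen so that the two queries disagree:

```python
# Numeric sketch: MPE and MAP can be inconsistent on the Sinus -> Nose network.
# P(S) is from the slide; P(N|S) is an assumed CPT, not from the slides.
from itertools import product

p_s = {"t": 0.4, "f": 0.6}
p_n_given_s = {("t", "t"): 0.9, ("f", "t"): 0.1,   # P(N | S=t)  (assumed)
               ("t", "f"): 0.5, ("f", "f"): 0.5}   # P(N | S=f)  (assumed)

# Joint P(S, N) = P(S) P(N | S) over all four assignments.
joint = {(s, n): p_s[s] * p_n_given_s[(n, s)] for s, n in product("tf", repeat=2)}

mpe = max(joint, key=joint.get)           # most likely (S, N) assignment jointly
map_s = max("tf", key=lambda s: p_s[s])   # most likely S after marginalizing N
print(mpe, map_s)   # ('t', 't') f
```

Here the joint maximum is (S=t, N=t) with probability 0.36, so MPE assigns S=t, while the marginal favors S=f (0.6 vs. 0.4), so MAP over S alone answers S=f: the two queries genuinely disagree.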
- Inference is hard in general, but easy for many (real-world relevant) BN structures

Complexity for other inference questions
- Probabilistic inference
  - general graphs:
  - poly-trees and low tree-width:
- Approximate probabilistic inference
  - absolute error:
  - relative error:
- Most probable explanation (MPE)
  - general graphs:
  - poly-trees and low tree-width:
- Maximum a posteriori (MAP)
  - general graphs:
  - poly-trees and low tree-width:

Inference in BNs: hopeless?
- In general, yes! Even approximate inference is hard!
- In practice: exploit structure; many effective approximation algorithms exist (some with guarantees)
- For now, we'll talk about exact inference; approximate inference comes later this semester

General probabilistic inference
(Example network: Flu, Allergy, Sinus, Headache, Nose)
- Query:
- Using the definition of conditional probability:
- Normalization:

Marginalization
(Example: Flu → Sinus → Nose = t)

Probabilistic inference example
(Example network: Flu, Allergy, Sinus, Headache, Nose = t)
- Inference seems exponential in the number of variables!

Fast probabilistic inference
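As a concrete sketch of the "seems exponential" observation, the following brute-force computation of P(F | N=t) on the Flu/Allergy/Sinus/Headache/Nose network sums the joint over every hidden assignment and normalizes. All CPT numbers are assumed filler; the slides do not give them:

```python
# Brute-force conditional query P(F | N=t) by enumeration + normalization.
# Network: F -> S <- A, S -> H, S -> N. CPT values below are assumptions.
from itertools import product

def joint(f, a, s, h, n):
    pF = 0.1 if f else 0.9
    pA = 0.2 if a else 0.8
    pS = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.5, (0, 0): 0.05}[(f, a)]
    pS = pS if s else 1 - pS
    pH = (0.8 if h else 0.2) if s else (0.1 if h else 0.9)
    pN = (0.7 if n else 0.3) if s else (0.05 if n else 0.95)
    return pF * pA * pS * pH * pN

# Marginalize out the hidden variables A, S, H with evidence N=t, per value of F.
unnorm = {f: sum(joint(f, a, s, h, 1) for a, s, h in product([0, 1], repeat=3))
          for f in (0, 1)}
z = sum(unnorm.values())                  # normalization constant = P(N=t)
posterior = {f: p / z for f, p in unnorm.items()}
# Enumeration touches 2^4 assignments here; with n variables it is 2^(n-1)
# terms per query value -- the exponential blowup that motivates what follows.
```

Evidence of flu-like symptoms raises the flu posterior above its 0.1 prior in this toy CPT, but the real point is the cost: the sum ranges over every joint assignment of the hidden variables.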