Structure Learning: (The Good), The Bad, The Ugly; Inference
Graphical Models – 10-708
Carlos Guestrin, Carnegie Mellon University
October 13th, 2008
Readings: K&F: 17.3, 17.4, 17.5.1, 8.1, 12.1
10-708 – Carlos Guestrin 2006-2008

Contents:
- Decomposable score
- Structure learning for general graphs
- Understanding score decomposition
- Fixed variable order 1
- Fixed variable order 2
- Learn BN structure using local search
- Exploit score decomposition in local search
- Some experiments
- Order search versus graph search
- Bayesian model averaging
- What you need to know about learning BN structures
- Inference in graphical models: Typical queries 1
- Inference in graphical models: Typical queries 2 – Maximization
- Are MPE and MAP consistent?
- C++ library
- Complexity of conditional probability queries 1
- Complexity of conditional probability queries 2
- Inference is #P-complete; hopeless?
- Complexity for other inference questions
- Inference in BNs hopeless?
- General probabilistic inference
- Marginalization
- Probabilistic inference example
- Fast probabilistic inference example – Variable elimination
- Understanding variable elimination – Exploiting distributivity
- Understanding variable elimination – Order can make a HUGE difference
- Understanding variable elimination – Intermediate results
- Understanding variable elimination – Another example
- Pruning irrelevant variables

Decomposable score
- Log data likelihood
- A decomposable score decomposes over families in the BN (a node and its parents); this will lead to significant computational efficiency!
- Score(G : D) = Σ_i FamScore(X_i | Pa_{X_i} : D)

Structure learning for general graphs
- In a tree, a node has only one parent
- Theorem: the problem of learning a BN structure with at most d parents is NP-hard for any (fixed) d ≥ 2
- Most structure-learning approaches use heuristics that exploit score decomposition
- Next: (quickly) describe two heuristics that exploit decomposition in different ways
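The decomposable score, Score(G : D) = Σ_i FamScore(X_i | Pa_{X_i} : D), can be sketched in a few lines. The sketch below uses a log-likelihood family score estimated from MLE counts; the tiny dataset, variable names, and candidate graphs are made up for illustration and are not from the slides:

```python
import math
from collections import Counter

def fam_score(data, child, parents):
    """Log-likelihood family score: sum over records of log P_hat(child | parents),
    with conditional probabilities estimated by maximum likelihood from counts."""
    joint = Counter((tuple(d[p] for p in parents), d[child]) for d in data)
    parent_counts = Counter(tuple(d[p] for p in parents) for d in data)
    return sum(n * math.log(n / parent_counts[pa]) for (pa, _), n in joint.items())

def score(data, graph):
    """Decomposable score: just the sum of the per-family scores."""
    return sum(fam_score(data, x, pa) for x, pa in graph.items())

# Toy data: B copies A in 8 of 9 records, so A and B are strongly dependent.
data = [{'A': a, 'B': a} for a in (0, 1)] * 4 + [{'A': 0, 'B': 1}]

empty = {'A': [], 'B': []}          # no edges
with_edge = {'A': [], 'B': ['A']}   # A -> B

print(score(data, empty), score(data, with_edge))
```

Because the score decomposes, adding the edge A → B changes only B's family score (A's term is identical in both graphs), which is exactly the property the search heuristics in the following slides exploit.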
Understanding score decomposition
- (Figure: the example network with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, Happy)

Fixed variable order 1
- Pick a variable order, e.g., X_1, …, X_n
- X_i can only pick parents in {X_1, …, X_{i-1}} (any subset)
- Acyclicity guaranteed!
- Total score = sum of each node's score

Fixed variable order 2
- Fix the maximum number of parents to k
- For each i in order, pick Pa_{X_i} ⊆ {X_1, …, X_{i-1}} by exhaustively searching through all possible subsets:
  Pa_{X_i} = argmax_{U ⊆ {X_1, …, X_{i-1}}} FamScore(X_i | U : D)
- Optimal BN for each order!
- Greedy search through the space of orders: e.g., try switching pairs of variables in the order
- If neighboring variables in the order are switched, only the scores for that pair need to be recomputed: an O(n) speedup per iteration

Learn BN structure using local search
- Start from the Chow-Liu tree
- Local search; possible moves (only if the result is acyclic!): add edge, delete edge, invert edge
- Select moves using your favorite score

Exploit score decomposition in local search
- Add edge and delete edge: only one family must be rescored!
- Reverse edge: only two families must be rescored

Some experiments
- (Figure: results on the Alarm network)

Order search versus graph search
- Order-search advantages: for a fixed order the BN is optimal, a more "global" optimization; the space of orders is much smaller than the space of graphs
- Graph-search advantages: not restricted to k parents, especially when exploiting CPD structure such as CSI; cheaper per iteration; finer moves within a graph

Bayesian model averaging
- So far, we have selected a single structure
- But if you are really Bayesian, you must average over structures (similar to averaging over parameters)
- Inference for structure averaging is very hard! Clever tricks in the reading

What you need to know about learning BN structures
- Decomposable scores: data likelihood; information-theoretic interpretation; Bayesian score; BIC approximation
- Priors: structure and parameter assumptions; BDe if and only if score equivalence
- Best tree (Chow-Liu); best TAN; nearly best k-treewidth (in O(N^{k+1}))
- Search techniques: search through orders; search through structures
- Bayesian model averaging

Inference in graphical models: Typical queries 1
- Example network: Flu, Allergy, Sinus, Headache, Nose
- Conditional probabilities: distribution of some variable(s) given evidence

Inference in graphical models: Typical queries 2 – Maximization
- Most probable explanation (MPE): most likely assignment to all hidden variables given evidence
- Maximum a posteriori (MAP): most likely assignment to some variable(s) given evidence

Are MPE and MAP consistent?
- Two-node network: Sinus → Nose
- P(S=t) = 0.4, P(S=f) = 0.6; P(N|S) given by a CPD table

C++ library
- Now available; join: http://groups.google.com/group/10708-f08-code/
- The library implements the following functionality:
  - random variables, random processes, and linear algebra
  - factorized distributions, such as Gaussians, multinomial distributions, and mixtures
  - graph structures and basic graph algorithms
  - graphical models, including Bayesian networks, Markov networks, and junction trees
  - basic static and dynamic inference algorithms
  - parameter learning for Gaussian distributions and Chow-Liu trees
- Fairly advanced C++ (not for everyone)

Complexity of conditional probability queries 1
- How hard is it to compute P(X | E=e)?
- Reduction from 3-SAT, e.g., (X_1 ∨ X_2 ∨ X_3) ∧ (X_2 ∨ X_3 ∨ X_4) ∧ …

Complexity of conditional probability queries 2
- How hard is it to compute P(X | E=e)?
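The two-node example from the "Are MPE and MAP consistent?" slide (S → N, with P(S=t)=0.4) can be worked numerically. The slide's P(N|S) table is not reproduced here, so the conditional values below are illustrative, chosen to make the two queries disagree:

```python
from itertools import product

# Two-node network S -> N (Sinus -> Nose). P(S=t)=0.4 is from the slide;
# the P(N|S) numbers are made-up illustrative values.
p_s = {'t': 0.4, 'f': 0.6}
p_n_given_s = {('t', 't'): 1.0, ('t', 'f'): 0.0,
               ('f', 't'): 0.5, ('f', 'f'): 0.5}

# Full joint P(S, N) = P(S) P(N|S).
joint = {(s, n): p_s[s] * p_n_given_s[(s, n)]
         for s, n in product('tf', repeat=2)}

# MPE: single most likely joint assignment to all variables.
mpe = max(joint, key=joint.get)                      # ('t', 't'), prob 0.4

# MAP over S alone: marginalize N out first, then maximize.
p_s_marginal = {s: sum(joint[(s, n)] for n in 'tf') for s in 'tf'}
map_s = max(p_s_marginal, key=p_s_marginal.get)      # 'f', prob 0.6
```

The maximizers disagree: the MPE sets S='t' (the best joint assignment has probability 0.4), while the MAP query over S alone returns 'f' (marginal 0.6). Maximizing and marginalizing do not commute, which is exactly the inconsistency the slide asks about.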
- At least NP-hard; in fact, even harder!

Inference is #P-complete; hopeless?
- Exploit structure!
- Inference is hard in general, but easy for many (real-world relevant) BN structures

Complexity for other inference questions
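"Exploit structure" is what variable elimination (covered by the later slides listed in the contents) does: by distributivity, sums can be pushed inside products, so a chain is marginalized in time linear in the number of variables instead of exponential. A minimal sketch on a hypothetical three-node binary chain X1 → X2 → X3, with made-up CPD numbers:

```python
from itertools import product

# CPDs for a binary chain X1 -> X2 -> X3 (illustrative numbers, not from the slides).
p_x1 = {0: 0.6, 1: 0.4}
p_x2 = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}  # P(x2 | x1)
p_x3 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5}  # P(x3 | x2)

def brute_force(x3):
    """P(X3=x3) by summing the full joint: exponential in the number of variables."""
    return sum(p_x1[a] * p_x2[(a, b)] * p_x3[(b, x3)]
               for a, b in product((0, 1), repeat=2))

def variable_elimination(x3):
    """P(X3=x3) by pushing sums inward (distributivity): eliminate X1, then X2."""
    tau1 = {b: sum(p_x1[a] * p_x2[(a, b)] for a in (0, 1)) for b in (0, 1)}
    return sum(tau1[b] * p_x3[(b, x3)] for b in (0, 1))
```

Both routes give the same P(X3), but the elimination route only ever manipulates small intermediate factors (tau1 here) rather than the 2^n joint; as the outline's "Order can make a HUGE difference" slide suggests, the elimination order controls how large those intermediate factors get.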