Structure Learning in BNs 3: (the good, the bad,) and, finally, the ugly
Inference: we now get to use the BNs!

Graphical Models – 10708
Carlos Guestrin, Carnegie Mellon University
October 6th, 2006

Readings:
K&F: 15.1, 15.2, 15.3, 15.4, 15.5
K&F: 7 (overview of inference)
K&F: 8.1, 8.2 (variable elimination)

Outline
• Bayesian, a decomposable score
• Structure learning for general graphs
• Understanding score decomposition
• Fixed variable order
• Learning BN structure using local search
• Exploiting score decomposition in local search
• Some experiments
• Order search versus graph search
• Bayesian model averaging
• What you need to know about learning BN structures
• Inference in graphical models: typical queries (conditional probabilities; maximization: MPE and MAP)
• Are MPE and MAP consistent?
• Complexity of conditional probability queries
• Inference is #P-hard, hopeless?
• Complexity for other inference questions
• General probabilistic inference and marginalization
• Fast probabilistic inference: variable elimination (exploiting distributivity; order can make a HUGE difference; intermediate results; another example)
• Pruning irrelevant variables
• The variable elimination algorithm and operations on factors
• Complexity of variable elimination; (poly)-tree graphs
• What you need to know about inference thus far

Bayesian, a decomposable score
As in the last lecture, assume local and global parameter independence. Also assume the prior satisfies parameter modularity: if Xi has the same parents in G and G', then its parameters have the same prior. Finally, assume the structure prior P(G) satisfies structure modularity: it is a product of terms over families, e.g., P(G) ∝ c^|G|. Under these assumptions, the Bayesian score decomposes along families.

Structure learning for general graphs
In a tree, a node has only one parent. In general: Theorem: the problem of learning a BN structure with at most d parents is NP-hard for any fixed d ≥ 2. Most structure-learning approaches therefore use heuristics that exploit score decomposition. We (quickly) describe two heuristics that exploit decomposition in different ways.

Understanding score decomposition
(Figure: the running "student" network, with nodes Coherence, Difficulty, Intelligence, Grade, SAT, Letter, Job, and Happy.)

Fixed variable order 1
Pick a variable order ≺, e.g., X1, …, Xn. Xi can only pick parents in {X1, …, Xi−1} (any subset), so acyclicity is guaranteed. Total score = sum of the score of each node.

Fixed variable order 2
Fix the maximum number of parents to k. For each i in order ≺, pick Pa(Xi) ⊆ {X1, …, Xi−1} by exhaustively searching through all possible subsets: Pa(Xi) is the U ⊆ {X1, …, Xi−1} that maximizes FamScore(Xi | U : D). This yields the optimal BN for each order. Then do a greedy search through the space of orders, e.g., by trying to switch pairs of variables in the order. If neighboring variables in the order are switched, only the scores for that pair need to be recomputed: an O(n) speedup per iteration.

Learning BN structure using local search
Starting from the Chow-Liu tree, run local search with the possible moves:
• Add edge
• Delete edge
• Invert edge
each allowed only if the result is acyclic. Select moves using your favorite score.

Exploiting score decomposition in local search
Adding or deleting an edge requires rescoring only one family; reversing an edge requires rescoring only two families.
(Figure: the student network again.)

Some experiments
(Figure: results on the Alarm network.)

Order search versus graph search
Order search advantages: for a fixed order we obtain the optimal BN, a more "global" optimization; and the space of orders is much smaller than the space of graphs. Graph search advantages: it is not restricted to k parents, especially when exploiting CPD structure such as CSI; it is cheaper per iteration; and it allows finer moves within a graph.

Bayesian model averaging
So far, we have selected a single structure. But if you are really Bayesian, you must average over structures, similar to averaging over parameters. Inference for structure averaging is very hard; the reading covers some clever tricks.

What you need to know about learning BN structures
• Decomposable scores: data likelihood, information-theoretic interpretation, Bayesian score, BIC approximation.
• Priors: structure and parameter assumptions; BDe if and only if score equivalence.
• Best tree (Chow-Liu); best TAN; nearly best k-treewidth (in O(N^(k+1))).
• Search techniques: search through orders, search through structures.
• Bayesian model averaging.

Announcements
Don't forget: project proposals are due this Wednesday. Special recitation on an advanced topic: Ajit Singh on optimal structure learning, Monday Oct 9, 5:30–7:00pm, Wean Hall 4615A.

Inference in graphical models: typical queries 1
(Figure: the network Flu → Sinus ← Allergy, with Sinus → Headache and Sinus → Nose.)
Conditional probabilities: the distribution of some variable(s) given evidence.

Inference in graphical models: typical queries 2 – maximization
Most probable explanation (MPE): the most likely assignment to all hidden variables given the evidence. Maximum a posteriori (MAP): the most likely assignment to some variable(s) given the evidence.

Are MPE and MAP consistent?
Consider a two-node network Sinus → Nose with P(S=t) = 0.4, P(S=f) = 0.6, and a CPT P(N|S). (The slide works through this example to show that the MPE assignment to S need not agree with the MAP assignment to S alone.)

Complexity of conditional probability queries 1
How hard is it to compute P(X | E=e)? Reduction from 3-SAT, using a formula such as (X1 ∨ X2 ∨ X3) ∧ (X2 ∨ X3 ∨ X4) ∧ …

Complexity of conditional probability queries 2
How hard is it to compute P(X | E=e)? At least NP-hard, but in fact even harder!

Inference is #P-hard, hopeless?
Exploit structure! Inference is hard in general, but easy for many (real-world relevant) BNs.
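To make the "exploit structure" point concrete, here is a minimal variable-elimination sketch in the spirit of the K&F 8.1–8.2 reading listed above. The chain A → B → C, the factor representation, and all numbers are illustrative assumptions, not the lecture's example:

```python
from itertools import product

# A factor over binary variables: (vars, table), where
# table[assignment_tuple] = value, indexed in the order of vars.

def multiply(f, g):
    """Product of two factors over the union of their variables."""
    fv, ft = f
    gv, gt = g
    vs = list(fv) + [v for v in gv if v not in fv]
    table = {}
    for assign in product([0, 1], repeat=len(vs)):
        a = dict(zip(vs, assign))
        table[assign] = (ft[tuple(a[v] for v in fv)]
                         * gt[tuple(a[v] for v in gv)])
    return (vs, table)

def sum_out(f, var):
    """Marginalize var out of factor f."""
    fv, ft = f
    i = fv.index(var)
    vs = [v for v in fv if v != var]
    table = {}
    for assign, val in ft.items():
        key = assign[:i] + assign[i + 1:]
        table[key] = table.get(key, 0.0) + val
    return (vs, table)

# Chain A -> B -> C; CPT numbers are made up for illustration.
pA  = (['A'], {(0,): 0.6, (1,): 0.4})
pBA = (['A', 'B'], {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8})
pCB = (['B', 'C'], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.5, (1, 1): 0.5})

# P(C) = sum_B P(C|B) * [sum_A P(A) P(B|A)]: eliminate A, then B.
tauB = sum_out(multiply(pA, pBA), 'A')    # intermediate factor over B
pC   = sum_out(multiply(tauB, pCB), 'B')  # factor over C
print(pC)  # P(C=0) ≈ 0.7, P(C=1) ≈ 0.3 for these numbers
```

Each elimination step multiplies only the factors that mention the variable being summed out, which is exactly the distributivity the slides exploit: the cost is exponential only in the size of the intermediate factors, not in the total number of variables.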
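The "Are MPE and MAP consistent?" slide above uses a two-node network Sinus → Nose with P(S=t) = 0.4. Its P(N|S) table did not survive extraction, so the CPT below is assumed for illustration; with these numbers the example still makes the slide's point, that MPE and MAP can disagree:

```python
from itertools import product

# S -> N, both binary. P(S) is from the slide; P(N|S) is an
# assumed CPT (the slide's own table was in a figure).
pS = {True: 0.4, False: 0.6}
pN_given_S = {True:  {True: 0.9, False: 0.1},   # P(N | S=t)
              False: {True: 0.5, False: 0.5}}   # P(N | S=f)

joint = {(s, n): pS[s] * pN_given_S[s][n]
         for s, n in product([True, False], repeat=2)}

# MPE: most likely joint assignment to (S, N).
mpe = max(joint, key=joint.get)

# MAP for S alone: maximize the marginal P(S) = sum_N P(S, N).
marg_S = {s: joint[(s, True)] + joint[(s, False)] for s in [True, False]}
map_S = max(marg_S, key=marg_S.get)

print(mpe, map_S)  # MPE is (S=t, N=t) with prob 0.36; MAP for S is S=f with prob 0.6
```

The MPE assigns S=t (via the joint maximizer (S=t, N=t), probability 0.36), while the MAP query over S alone returns S=f (marginal probability 0.6): maximizing the joint and maximizing a marginal are genuinely different queries.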