My Plan
5-9 April: Bayesian Classifiers, Local Structure, Trip
12 April: Learning Time Series (DBNs)
14 April: Gaussian/Discrete networks
16 April: Normal-Wishart
19 April: Summary
21 + 23 April: Student presentations

Last Time
- Bayesian prediction with incomplete data
  - MAP approximation
  - MCMC
- Scoring metrics for complete data

Scoring Metrics for Complete Data
Likelihood:
$$\ell(G : D) = \log L(G : D) = M \sum_i \left[ I(X_i ; \mathrm{Pa}_i^G) - H(X_i) \right]$$
MDL:
$$MDL(G : D) = \ell(G : D) - \frac{\dim(G)}{2} \log M$$
BDe:
$$\log P(D \mid G) = \sum_{i=1}^{N} \left[ \sum_{pa_i^G} \log \frac{\Gamma(\alpha(pa_i^G))}{\Gamma(\alpha(pa_i^G) + N(pa_i^G))} + \sum_{x_i, pa_i^G} \log \frac{\Gamma(\alpha(x_i, pa_i^G) + N(x_i, pa_i^G))}{\Gamma(\alpha(x_i, pa_i^G))} \right]$$

Basic Bayes Net Model Selection
The extremely basic idea: given two nets, prefer the one with the higher score. We know how to score multinomials; how do we select the proper model?
[Figure: two candidate networks over A, B, C, D, E, compared as Score(G1) > Score(G2).]

Structure Optimization (Complete Data)
Select the graph that maximizes the score.
[Figure: candidate structures over A-E with local CPDs P(A), P(B|A), P(C), P(D|A,C), P(D|A), P(E|D), each scored against the same data table.]

Structure Optimization: Candidate Generation

Trees
Tree: at most one parent per variable.
O(N) solution.
Later: in classification, tree-augmented classifiers provide an intermediate step between naïve Bayes classifiers and general network structures.

Structure Optimization: Trees
Define $p(i)$ to be the index of the parent of $X_i$, with $p(i) = 0$ if $X_i$ has no parent.
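As a concrete illustration of score-based selection, here is a minimal sketch of the decomposable MDL score above computed from discrete data. The function names, data layout, and toy example are my own, not from the slides:

```python
import math
from collections import Counter

def family_loglik(data, i, parents):
    """Maximum log-likelihood contribution of X_i given its parents, from counts."""
    joint = Counter(tuple(row[j] for j in parents) + (row[i],) for row in data)
    marg = Counter(tuple(row[j] for j in parents) for row in data)
    # sum over observed (pa_i, x_i) configurations of N * log(N(x_i, pa_i) / N(pa_i))
    return sum(n * math.log(n / marg[key[:-1]]) for key, n in joint.items())

def mdl_score(data, structure, card):
    """MDL(G:D) = loglik - (log M / 2) * dim(G), summed family by family.

    structure: dict var -> tuple of parent vars; card: dict var -> cardinality."""
    M = len(data)
    total = 0.0
    for i, parents in structure.items():
        dim = (card[i] - 1) * math.prod(card[j] for j in parents)  # free parameters
        total += family_loglik(data, i, parents) - 0.5 * math.log(M) * dim
    return total
```

Because the score decomposes per family, comparing two structures only requires rescoring the families whose parent sets differ.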
The tree score decomposes over families:
$$Score(G : D) = \sum_i Score(X_i : \mathrm{Pa}_i)$$
$$= \sum_{i : p(i) > 0} Score(X_i : X_{p(i)}) + \sum_{i : p(i) = 0} Score(X_i)$$
$$= \sum_{i : p(i) > 0} \left[ Score(X_i : X_{p(i)}) - Score(X_i) \right] + \sum_i Score(X_i)$$

Structure Optimization: Trees
Algorithm:
- Construct a graph with vertices 1, ..., N.
- Set the weight $w(j \to i)$ to $Score(X_i \mid X_j) - Score(X_i)$.
- Find the tree (or forest) with maximal weight, using a variant of Kruskal's algorithm.
When the score is the likelihood, this is known as the Chow & Liu method.

Structure Optimization: General DAGs
Finding the DAG of maximum score is much harder.
THM: Finding the maximal-scoring DAG with at most k parents for each variable is NP-hard for k > 1.
Implication: greedy (O(n)) approaches are no longer guaranteed to work.

Structure Optimization: Search over DAGs
Define operations that transform an input DAG:
- Add an arc
- Remove an arc
- Reverse an arc
[Figure: the three arc operations illustrated on networks over A, B, C.]

Structure Optimization: Search
A difficult optimization problem: local maxima, plateaus.
Approaches:
- Greedy hill climbing
- Greedy with tabu search
- Simulated annealing

Structure Optimization: Search over I-Equivalence Classes
Idea:
- Search the space of I-equivalence classes.
- Each I-equivalence class is represented by a PDAG: graph skeleton + v-structures.
- Add arcs to 'complete' the PDAG in order to score it.
Benefits:
- Fewer local maxima and plateaus.
- Fewer PDAGs than DAGs.
[Figure: three I-equivalent DAGs over A, B, C.]

Summary: Search over Structure
Pieces of the solution:
- Structure generation: DAGs or PDAGs. The structure generation algorithm defines a search space.
- Structure scoring: easy closed-form solution (need a conjugate prior for BDe).
- Search algorithm (not discussed).
[Figure: a fragment of the search space over A, B, C, with each candidate structure annotated by its score.]

Model Averaging
Recall that Bayesian inference started with the true Bayes solution: average over all possible graphs,
$$P(x[M+1] \mid D) = \sum_G P(x[M+1] \mid G, D) \, P(G \mid D)$$
We focused on finding the best-scoring model, with the implicit assumption that the best model dominates the weighted sum.
Problem: over-commitment to a single structure?

Model Averaging
Full averaging: sum over all structures (intractable).
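The greedy hill-climbing search over the add/remove/reverse operators described above can be sketched as follows. All names are illustrative; `score` is any decomposable structure score supplied by the caller, and structures are dicts mapping each variable to its parent set:

```python
def creates_cycle(structure, child, parent):
    """True if adding the arc parent -> child would close a directed cycle."""
    stack, seen = [parent], set()
    while stack:
        v = stack.pop()
        if v == child:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(structure[v])  # climb toward the ancestors of `parent`
    return False

def neighbors(structure):
    """Yield every DAG reachable by one add / remove / reverse operation."""
    for i in structure:
        for j in structure:
            if i == j:
                continue
            if j in structure[i]:
                removed = {v: set(ps) for v, ps in structure.items()}
                removed[i].discard(j)               # remove arc j -> i
                yield removed
                reversed_ = {v: set(ps) for v, ps in structure.items()}
                reversed_[i].discard(j)
                if not creates_cycle(reversed_, j, i):
                    reversed_[j].add(i)             # reverse to i -> j
                    yield reversed_
            elif not creates_cycle(structure, i, j):
                added = {v: set(ps) for v, ps in structure.items()}
                added[i].add(j)                     # add arc j -> i
                yield added

def hill_climb(score, variables):
    """Start from the empty graph; take the best improving neighbor until stuck."""
    current = {v: set() for v in variables}
    best = score(current)
    while True:
        cand = max(neighbors(current), key=score, default=None)
        if cand is None or score(cand) <= best:
            return current, best
        current, best = cand, score(cand)
```

As the slides warn, this only reaches a local maximum; tabu lists or simulated annealing modify the acceptance rule to escape plateaus.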
Approximate averaging:
- Find the K highest-scoring structures.
- Approximate the overall prediction as a weighted average of the individual predictions.
- The relative weight of each structure is determined by the Bayes factor:
$$\frac{P(G_1 \mid D)}{P(G_2 \mid D)} = \frac{P(G_1) P(D \mid G_1) / P(D)}{P(G_2) P(D \mid G_2) / P(D)} = \frac{P(G_1) P(D \mid G_1)}{P(G_2) P(D \mid G_2)}$$
In the limit: an equiprobable distribution over the structures whose independence relations are "closest" to the underlying distribution.

Structure Optimization: Conclusion
- Multiple scores: likelihood, MDL, BDe; equivalent for large data sets.
- Select the structure that optimizes the score:
  - Trees: simple O(N) search algorithm.
  - General BNs: NP-hard. Proposal: local search based on modifying candidates; lots of plateaus/local maxima.
- Bayesian model averaging as an alternative to committing to a single structure.
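The approximate averaging step can be sketched as follows: given the log scores of the K retained structures, normalize them into weights (equivalent to using Bayes factors between each pair) and average the per-structure predictions. Function names are my own illustration:

```python
import math

def averaged_prediction(log_scores, predictions):
    """log_scores[k]: log P(D | G_k) + log P(G_k); predictions[k]: P(x | G_k, D)."""
    m = max(log_scores)
    weights = [math.exp(s - m) for s in log_scores]  # subtract max for stability
    z = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / z
```

Subtracting the maximum log score before exponentiating leaves all ratios (the Bayes factors) unchanged while avoiding overflow, since marginal likelihoods of real data sets are astronomically small.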