Contents

- HMMs (cont.)
- Announcements
- Understanding the HMM semantics
- HMM semantics: details
- Learning HMMs from fully observable data is easy
- Possible inference tasks in an HMM
- Using variable elimination to compute P(Xi | o1:n)
- What if I want to compute P(Xi | o1:n) for each i?
- Reusing computation
- The forwards-backwards algorithm
- Most likely explanation
- The Viterbi algorithm
- What you'll implement 1: multiplication
- What you'll implement 2: max & argmax
- Higher-order HMMs
- What you need to know
- Bayesian Networks – (Structure) Learning
- Review
- Learning Bayes nets
- Learning the CPTs
- Information-theoretic interpretation of maximum likelihood
- Maximum likelihood (ML) for learning BN structure
- How many trees are there?
- Information-theoretic interpretation of maximum likelihood 2
- Information-theoretic interpretation of maximum likelihood 3
- Mutual information → independence tests
- Decomposable score
- Scoring a tree 1: equivalent trees
- Scoring a tree 2: similar trees
- Chow-Liu tree learning algorithm 1
- Chow-Liu tree learning algorithm 2
- Can we extend Chow-Liu? 1
- Can we extend Chow-Liu? 2
- Scoring general graphical models – model selection problem
- Maximum likelihood overfits!
- Bayesian score avoids overfitting
- How many graphs are there?
- Structure learning for general graphs
- Learn BN structure using local search
- What you need to know about learning BNs

Slide 1: HMMs (cont.)
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University
March 29th, 2006
Classic HMM tutorial – see class website: L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257–286, 1989.

Slide 2: Announcements
- This week's recitation: goes through several BN topics (representation, inference, learning) in the context of an example → very useful for the homework.

Slide 3: Understanding the HMM semantics
[Figure: HMM with hidden states X1, ..., X5, each taking values in {a, ..., z}, and observations O1, ..., O5; the same diagram is used on the following slides.]

Slide 4: HMM semantics: details
- Just 3 distributions: the initial distribution P(X1), the transition model P(Xi | Xi-1), and the observation model P(Oi | Xi).

Slide 5: Learning HMMs from fully observable data is easy
- Learn the same 3 distributions: P(X1), P(Xi | Xi-1), P(Oi | Xi).

Slide 6: Possible inference tasks in an HMM
- Marginal probability of a hidden variable: P(Xi | o1:n).
- Viterbi decoding – most likely trajectory of the hidden variables: the assignment to x1:n maximizing P(x1:n | o1:n).

Slide 7: Using variable elimination to compute P(Xi | o1:n)
- Compute P(Xi | o1:n) for a single i.
- What variable elimination order should we use?
- Example.

Slide 8: What if I want to compute P(Xi | o1:n) for each i?
- Compute P(Xi | o1:n) for every position i.
- Run variable elimination separately for each i?
- If we do, what's the complexity?

Slide 9: Reusing computation
- Compute all the marginals P(Xi | o1:n) while reusing intermediate factors across positions.

Slide 10: The forwards-backwards algorithm
- Forwards pass – initialization, then for i = 2 to n: generate a forwards factor by eliminating Xi-1.
- Backwards pass – initialization, then for i = n-1 to 1: generate a backwards factor by eliminating Xi+1.
- For every i, the probability P(Xi | o1:n) is obtained by combining the forwards and backwards factors at position i (and normalizing).

Slide 11: Most likely explanation
- Compute the most likely assignment to the hidden variables given the observations.
- What variable elimination order should we use?
- Example.

Slide 12: The Viterbi algorithm
- Initialization, then for i = 2 to n: generate a forwards factor by eliminating Xi-1 (using max in place of sum).
- Computing the best explanation: for i = n-1 to 1, use argmax to read off the explanation.

Slide 13: What you'll implement 1: multiplication (of factors).

Slide 14: What you'll implement 2: max & argmax (over factors).
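The slides leave the factor operations and the two dynamic-programming passes as an exercise. As a reference point only (not the course's starter code), here is a minimal NumPy sketch of forwards-backwards and Viterbi for a discrete HMM; the names forward_backward, viterbi, pi, A, B and the row-per-state matrix layout are illustrative assumptions, and a production implementation would add scaling or log-space arithmetic to the forwards-backwards pass for long sequences.

```python
import numpy as np


def forward_backward(pi, A, B, obs):
    """Posterior marginals P(X_i | o_1:n) for every position i.

    pi  : (S,)   initial distribution, pi[s] = P(X_1 = s)
    A   : (S, S) transition model,     A[s, t] = P(X_i = t | X_{i-1} = s)
    B   : (S, K) observation model,    B[s, o] = P(O_i = o | X_i = s)
    obs : length-n sequence of observation indices
    """
    n, S = len(obs), len(pi)

    # Forwards pass (0-indexed): alpha[i, s] = P(o_0, ..., o_i, X_i = s).
    alpha = np.zeros((n, S))
    alpha[0] = pi * B[:, obs[0]]
    for i in range(1, n):
        alpha[i] = B[:, obs[i]] * (alpha[i - 1] @ A)

    # Backwards pass: beta[i, s] = P(o_{i+1}, ..., o_{n-1} | X_i = s).
    beta = np.ones((n, S))
    for i in range(n - 2, -1, -1):
        beta[i] = A @ (B[:, obs[i + 1]] * beta[i + 1])

    # Combine and normalize: P(X_i | o_1:n) is proportional to alpha_i * beta_i.
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)


def viterbi(pi, A, B, obs):
    """Most likely hidden trajectory: the x_1:n maximizing P(x_1:n | o_1:n)."""
    n, S = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)          # log space avoids underflow
    delta = np.zeros((n, S))                   # best log-prob of a path ending in state s at step i
    back = np.zeros((n, S), dtype=int)         # argmax bookkeeping for the traceback
    delta[0] = np.log(pi) + logB[:, obs[0]]
    for i in range(1, n):
        scores = delta[i - 1][:, None] + logA  # scores[s, t]: come from s, move to t
        back[i] = scores.argmax(axis=0)
        delta[i] = scores.max(axis=0) + logB[:, obs[i]]
    # Trace the best path backwards from the last position.
    path = [int(delta[-1].argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]


if __name__ == "__main__":
    # Tiny 2-state, 2-symbol toy HMM.
    pi = np.array([0.6, 0.4])
    A = np.array([[0.7, 0.3], [0.4, 0.6]])
    B = np.array([[0.9, 0.1], [0.2, 0.8]])
    obs = [0, 1, 1, 0]
    print(forward_backward(pi, A, B, obs))  # one row of P(X_i | o_1:4) per position
    print(viterbi(pi, A, B, obs))           # most likely state sequence
```

Note that viterbi is the forwards recursion with the sum replaced by max plus an argmax traceback, which is exactly the pair of factor operations the two "What you'll implement" slides name.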
Slide 15: Higher-order HMMs
- Add dependencies further back in time → better representation, but harder to learn.

Slide 16: What you need to know
- Hidden Markov models (HMMs): very useful, very powerful! Speech, OCR, ...
- Parameter sharing: only 3 distributions to learn.
- The reuse-of-computation trick reduces inference from O(n²) to O(n).
- An HMM is a special case of a Bayesian network.

Slide 17: Bayesian Networks – (Structure) Learning
Machine Learning – 10701/15781
Carlos Guestrin, Carnegie Mellon University
March 29th, 2006
Koller & Friedman chapters (handed out):
- Chapter 11 (short)
- Chapter 12: 12.1, 12.2, 12.3 (covered at the beginning of the semester); 12.4 (learning parameters for BNs)
- Chapter 13: 13.1, 13.3.1, 13.4.1, 13.4.3 (basic structure learning)
Learning BN tutorial (class website): ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf
TAN paper (class website): http://www.cs.huji.ac.il/~nir/Abstracts/FrGG1.html

Slide 18: Review
- Bayesian networks: compact representation for probability distributions; exponential reduction in the number of parameters.
- Fast probabilistic inference using variable elimination: compute P(X | e) in time exponential in the tree-width, not in the number of variables.
- Today: learn BN structure.
[Figure: running example BN – Flu, Allergy, Sinus, Headache, Nose; it reappears on several of the following slides.]

Slide 19: Learning Bayes nets
- Four settings: known vs. unknown structure, with fully observable vs. missing data.
- Data: x(1), ..., x(m).
- What is learned: the structure and the parameters, i.e. the CPTs P(Xi | PaXi).

Slide 20: Learning the CPTs
- Data: x(1), ..., x(M).
- For each discrete variable Xi, the maximum-likelihood CPT is given by the observed counts: P(Xi = a | PaXi = b) = Count(a, b) / Count(b).
- Why?

Slide 21: Information-theoretic interpretation of maximum likelihood
- Given the structure, write out the log likelihood of the data.

Slide 22: Maximum likelihood (ML) for learning BN structure
- Data: <x1(1), ..., xn(1)>, ..., <x1(M), ..., xn(M)>.
- For each possible structure: learn the parameters using ML, then score the structure.

Slide 23: How many trees are there?
- Nonetheless, an efficient algorithm finds the optimal tree.

Slide 24: Information-theoretic interpretation of maximum likelihood 2
- Given the structure, the log likelihood of the data (continued).

Slide 25: Information-theoretic interpretation of maximum likelihood 3
- Given the structure, the log likelihood of the data (continued): each variable's contribution involves the empirical mutual information between that variable and its parents.

Slide 26: Mutual information → independence tests
- Testing independence from data is a statistically difficult task!
- Intuitive approach: mutual information, I(Xi, Xj) = Σ over xi, xj of P(xi, xj) log [ P(xi, xj) / (P(xi) P(xj)) ], with the probabilities estimated from the data.
- Mutual information and independence: Xi and Xj are independent if and only if I(Xi, Xj) = 0.
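The structure-learning half of the lecture leans on two concrete computations: the count-based maximum-likelihood CPT estimate from slide 20 and the empirical mutual information used on slide 26 to test independence (and later to score candidate tree edges). Below is a small self-contained Python sketch of both; the function names ml_cpt and empirical_mutual_information, and the assumption that the data arrive as an M x n array of discrete values, are illustrative choices rather than part of the course materials.

```python
import numpy as np
from collections import Counter


def ml_cpt(data, child, parents):
    """Maximum-likelihood CPT P(X_child | X_parents) from fully observed data.

    data    : (M, n) array of discrete values, one row per sample
    child   : column index of the child variable
    parents : list of column indices of the parent variables
    Returns {parent assignment: {child value: probability}}.
    """
    joint, marg = Counter(), Counter()
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        marg[pa] += 1
    cpt = {}
    for (pa, val), count in joint.items():
        cpt.setdefault(pa, {})[val] = count / marg[pa]  # ML estimate = count ratio
    return cpt


def empirical_mutual_information(data, i, j):
    """Plug-in estimate of I(X_i, X_j) (in nats) from the empirical distribution."""
    M = len(data)
    p_ij = Counter((row[i], row[j]) for row in data)
    p_i = Counter(row[i] for row in data)
    p_j = Counter(row[j] for row in data)
    mi = 0.0
    for (a, b), count in p_ij.items():
        # count/M = empirical P(a, b); count*M / (count_a * count_b) = P(a, b) / (P(a) P(b))
        mi += (count / M) * np.log(count * M / (p_i[a] * p_j[b]))
    return mi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.integers(0, 2, size=1000)
    y = x ^ (rng.random(1000) < 0.1).astype(int)     # y is a noisy copy of x
    z = rng.integers(0, 2, size=1000)                # z is independent of x
    data = np.column_stack([x, y, z])
    print(empirical_mutual_information(data, 0, 1))  # clearly positive
    print(empirical_mutual_information(data, 0, 2))  # close to zero
    print(ml_cpt(data, child=1, parents=[0]))        # roughly the 0.9 / 0.1 noise model
```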