Bayesian Networks: Inference (cont.)
Machine Learning 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 26th, 2006

Required readings from Koller & Friedman:
- Representation: 2.1, 2.2
- Inference: 5.1, 6.1, 6.2, 6.7.1
- Optional: 2.3, 5.2, 5.3, 6.3, 6.7.2

Marginalization
[Figure: Flu, Allergy = t, Sinus]

Probabilistic inference example
[Figure: Bayesian network over Flu, Allergy, Sinus, Headache, Nose = t]
- Inference seems exponential in the number of variables!

Understanding variable elimination: order can make a HUGE difference
[Figure: Bayesian network over Flu, Allergy, Sinus, Headache, Nose = t]

Variable elimination algorithm
- Given a BN and a query P(X | e) ∝ P(X, e)
- Instantiate evidence e
- IMPORTANT: prune non-ancestors of {X, e}
- Choose an ordering on the variables, e.g., X1, ..., Xn
- For i = 1 to n, if Xi is not in {X, e}:
  - Collect the factors f1, ..., fk that include Xi
  - Generate a new factor by eliminating Xi from these factors
  - Variable Xi has been eliminated!
- Normalize P(X, e) to obtain P(X | e)

Complexity of variable elimination: (poly)tree graphs
- Variable elimination order: start from the leaves and work up; that is, find a topological order and eliminate variables in reverse order
- Linear in the number of variables (versus exponential)

Complexity of variable elimination: graphs with loops
- Exponential in the number of variables in the largest factor generated

Complexity of variable elimination: tree width
- Moralize the graph: connect parents into a clique and remove edge directions
- Complexity of variable elimination: only exponential in the tree width
- Tree width is maximum node cut + 1

Example: large tree width with a small number of parents
- A compact representation does not imply easy inference

Choosing an elimination order
- Choosing the best order is NP-complete (reduction from MAX-Clique)
- Many good heuristics exist, some with guarantees
- Ultimately, you can't beat the NP-hardness of inference: even the optimal order can lead to exponential variable elimination computation
- In practice:
  - Variable elimination is often very effective
  - Many (many, many) approximate inference approaches are available when variable elimination is too expensive

Most likely explanation (MLE)
[Figure: Bayesian network over Flu, Allergy, Sinus, Headache, Nose]
- Query: [equation not preserved in the extracted text]
- Using Bayes rule: [equation not preserved in the extracted text]
- Normalization is irrelevant

Max-marginalization
[Figure: Flu, Sinus, Nose = t]

Example of variable elimination for MLE: forward pass
[Figure: Bayesian network over Flu, Allergy, Sinus, Headache, Nose = t]

Example of variable elimination for MLE: backward pass
[Figure: Bayesian network over Flu, Allergy, Sinus, Headache, Nose = t]

MLE variable elimination algorithm: forward pass
- Given a BN and an MLE query max over x1, ..., xn of P(x1, ..., xn, e)
- Instantiate evidence e
- Choose an ordering on the variables, e.g., X1, ..., Xn
- For i = 1 to n, if Xi is not in {e}:
  - Collect the factors f1, ..., fk that include Xi
  - Generate a new factor by eliminating Xi from these factors (using max in place of sum)
  - Variable Xi has been eliminated!

MLE variable elimination algorithm: backward pass
- {x1*, ..., xn*} will store the maximizing assignment
- For i = n to 1, if Xi is not in {e}:
  - Take the factors f1, ..., fk used when Xi was eliminated
  - Instantiate f1, ..., fk with {xi+1*, ..., xn*}; now each fj depends only on Xi
  - Generate the maximizing assignment for Xi

What you need to know
- Bayesian networks: a useful compact representation for large probability distributions
- Inference to compute:
  - Probability of X given evidence e
  - Most likely explanation (MLE) given evidence e
  - Inference is NP-hard
- Variable elimination algorithm:
  - Efficient algorithm ("only" exponential in tree width, not in the number of variables)
  - Elimination order is important!
  - Approximate inference is necessary when the tree width is too large (not covered this semester)
  - The only difference between probabilistic inference and MLE is "sum" versus "max"

HMMs
Machine Learning 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 26th, 2005

Classic HMM tutorial (see class website):
L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

Adventures of our BN hero
- Compact representation for probability distributions
- Fast inference
- Fast learning
- But... who are the most popular kids?
  1. Naïve Bayes
  2 and 3. Hidden Markov models (HMMs) and Kalman filters

Handwriting recognition
- Character recognition, e.g., with kernel SVMs
[Figure: handwritten characters]

Example of a hidden Markov model (HMM)

Understanding the HMM semantics
[Figure, repeated on each of the HMM slides below: a chain of hidden variables X1, ..., X5, each taking values in {a, ..., z}, with one observation Oi per hidden variable, O1, ..., O5]

HMM semantics: details
- Just 3 distributions: the start distribution P(X1), the transition model P(Xi | Xi-1), and the observation model P(Oi | Xi)

HMM semantics: joint distribution

Learning HMMs from fully observable data is easy
- Learn 3 distributions

Possible inference tasks in an HMM
- Marginal probability of a hidden variable
- Viterbi decoding: most likely trajectory for the hidden variables

Using variable elimination to compute P(Xi | o1:n)
- Compute: [equation not preserved in the extracted text]
- Variable elimination order?
- Example: [not preserved in the extracted text]

What if I want to compute P(Xi | o1:n) for each i?
- Compute: [equation not preserved in the extracted text]
- Variable elimination for each i?
- Variable elimination for each i: what's the complexity?

Reusing computation
- Compute: [equation not preserved in the extracted text]

The forwards-backwards algorithm
- Forwards pass:
  - Initialization
  - For i = 2 to n: generate a forwards factor by eliminating Xi-1
- Backwards pass:
  - Initialization
  - For i = n-1 to 1: generate a backwards factor by eliminating Xi+1
- For all i, the probability is: [equation not preserved in the extracted text]

Most likely explanation
- Compute: [equation not preserved in the extracted text]
- Variable elimination order?
- Example: [not preserved in the extracted text]

The Viterbi algorithm
- Initialization
- For i = 2 to n: generate a forwards factor by eliminating Xi-1
- Computing the best explanation
- For i = n-1 to 1: use argmax to get the explanation

What you'll implement 1: multiplication

What you'll implement 2: max and argmax

Higher-order HMMs
- Add dependencies further back in time: better representation, harder to learn

What you need to know
- Hidden Markov models (HMMs)
  - Very useful, very powerful!
  - Speech, OCR, ...
  - Parameter sharing: only learn 3 distributions
  - Trick reduces inference from O(n^2) to O(n)
  - Special case of BN
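
The forwards-backwards and Viterbi slides above can be illustrated with a small sketch. It is not the course's reference implementation. The two-state HMM, its probabilities, and the names `states`, `prior`, `trans`, `emit`, `forward_backward`, and `viterbi` are all assumptions made up for this example. The aim is only to show that the two algorithms share the same forward sweep and differ in using sum versus max.

```python
# Hypothetical toy HMM: all states, symbols, and probabilities are invented.
states = ["a", "b"]
prior = {"a": 0.6, "b": 0.4}                  # P(X1)
trans = {"a": {"a": 0.7, "b": 0.3},           # P(Xi | Xi-1)
         "b": {"a": 0.4, "b": 0.6}}
emit = {"a": {"x": 0.9, "y": 0.1},            # P(Oi | Xi)
        "b": {"x": 0.2, "y": 0.8}}


def forward_backward(obs):
    """Return P(Xi | o1..on) for every i, using one forward and one backward sweep."""
    n = len(obs)
    # Forwards factors: alpha[i][x] = P(Xi = x, o1..oi), built by summing out Xi-1.
    alpha = [{x: prior[x] * emit[x][obs[0]] for x in states}]
    for i in range(1, n):
        prev = alpha[-1]
        alpha.append({x: emit[x][obs[i]] * sum(prev[y] * trans[y][x] for y in states)
                      for x in states})
    # Backwards factors: beta[i][x] = P(oi+1..on | Xi = x), built by summing out Xi+1.
    beta = [{x: 1.0 for x in states}]
    for i in range(n - 2, -1, -1):
        nxt = beta[0]
        beta.insert(0, {x: sum(trans[x][y] * emit[y][obs[i + 1]] * nxt[y] for y in states)
                        for x in states})
    # Multiply the two factors at each position and normalize.
    marginals = []
    for a, b in zip(alpha, beta):
        unnorm = {x: a[x] * b[x] for x in states}
        z = sum(unnorm.values())
        marginals.append({x: v / z for x, v in unnorm.items()})
    return marginals


def viterbi(obs):
    """Return the most likely hidden trajectory: the same forward sweep with max for sum."""
    delta = [{x: prior[x] * emit[x][obs[0]] for x in states}]
    back = []  # back[i-1][x] = best predecessor of state x at position i
    for i in range(1, len(obs)):
        prev = delta[-1]
        step, ptr = {}, {}
        for x in states:
            best = max(states, key=lambda y: prev[y] * trans[y][x])
            ptr[x] = best
            step[x] = emit[x][obs[i]] * prev[best] * trans[best][x]
        delta.append(step)
        back.append(ptr)
    # Backward pass: follow the argmax pointers from the best final state.
    path = [max(states, key=lambda x: delta[-1][x])]
    for ptr in reversed(back):
        path.insert(0, ptr[path[0]])
    return path


if __name__ == "__main__":
    observations = ["x", "y", "y"]
    print(forward_backward(observations))
    print(viterbi(observations))
```

Each sweep touches every position once, so computing all n marginals costs one forward and one backward pass rather than a separate variable elimination run per position. Plain dictionaries are used here for readability; a practical implementation would more likely use arrays and log-probabilities to avoid underflow on long sequences.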