Time series, HMMs, Kalman Filters
Machine Learning 10701/15781
Carlos Guestrin
Carnegie Mellon University
March 28th, 2005

Classic HMM tutorial (see class website): L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. of the IEEE, Vol. 77, No. 2, pp. 257-286, 1989.

Adventures of our BN hero
  Compact representation for probability distributions
  Fast inference
  Fast learning
  But... who are the most popular kids?
    1. Naïve Bayes
    2 and 3. Hidden Markov models (HMMs), Kalman Filters

Handwriting recognition
  Character recognition, e.g., kernel SVMs
  [Figure: handwritten characters]

Example of a hidden Markov model (HMM)

Understanding the HMM semantics
  X1 = {a, ..., z}, X2 = {a, ..., z}, X3 = {a, ..., z}, X4 = {a, ..., z}, X5 = {a, ..., z}
  O1, O2, O3, O4, O5

HMMs semantics: Details
  Just 3 distributions: [equations]

HMMs semantics: Joint distribution

Learning HMMs from fully observable data is easy
  Learn 3 distributions: [equations]

Possible inference tasks in an HMM
  Marginal probability of a hidden variable
  Viterbi decoding: most likely trajectory for hidden vars

Using variable elimination to compute P(Xi | o1:n)
  Variable elimination order?
  Example
  Compute: [equation]

What if I want to compute P(Xi | o1:n) for each i?
  Compute: [equation]
  Variable elimination for each i?
  Variable elimination for each i: what's the complexity?

Reusing computation
  Compute: [equation]

The forwards-backwards algorithm
  Initialization: [equation]
  For i = 2 to n
    Generate a forwards factor by eliminating Xi-1
  Initialization: [equation]
  For i = n-1 to 1
    Generate a backwards factor by eliminating Xi+1
  For all i, probability is: [equation]

Most likely explanation
  Variable elimination order?
  Example
  Compute: [equation]

The Viterbi algorithm
  Initialization: [equation]
  For i = 2 to n
    Generate a forwards factor by eliminating Xi-1
  Computing best explanation: [equation]
  For i = n-1 to 1
    Use argmax to get explanation

What about continuous variables?
  In general, very hard!
    Must represent complex distributions
  A special case is very doable
    When everything is Gaussian
    Called a Kalman filter
    One of the most used algorithms in the history of probabilities!

Time series data example: Temperatures from sensor network
  [Figure: floor plan of the sensor deployment, with numbered sensors and labeled rooms (office, conference, storage, lab, server, kitchen, and others)]

Operations in Kalman filter
  X1, ..., X5 with observations O1, ..., O5
  Compute: [equation]
  Start with: [equation]
  At each time step t:
    Condition on observation
    Roll up (marginalize previous time step)

Detour: Understanding Multivariate Gaussians
  Observe attributes
  Example: Observe X1 = 18
    P(X2 | X1 = 18)

Characterizing a multivariate Gaussian
  Mean vector
  Covariance matrix

Conditional Gaussians
  Conditional probabilities P(Y | X)

Kalman filter with Gaussians
  Equivalent to a linear system

Detour 2: Canonical form
  Standard form and canonical form are related
  Conditioning is easy in canonical form
  Marginalization is easy in standard form

Conditioning in canonical form
  First multiply
  Then, condition on value B = y

Roll up in canonical form
  First multiply
  Then, marginalize Xt

Learning a Kalman filter
  Must learn: [equations]
  Learn joint, and use division rule

Maximum likelihood learning of a multivariate Gaussian
  Data: [equation]
  Means are just empirical means
  Empirical covariances

What you need to know
  Hidden Markov models (HMMs)
    Very useful, very powerful!
    Speech, OCR, ...
    Parameter sharing, only learn 3 distributions
    Trick reduces inference from O(n^2) to O(n)
    Special case of BN
  Kalman filter
    Continuous vars version of HMMs
    Assumes Gaussian distributions
    Equivalent to linear system
    Simple matrix operations for computations
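
The forwards-backwards and Viterbi recursions described in the slides can be sketched in a few lines of plain Python. This is a minimal illustration rather than the lecture's own code: the names `prior`, `trans`, `emit`, `forward_backward`, and `viterbi` are assumptions chosen for this sketch, the three tables stand in for the "3 distributions" of the HMM, and no rescaling is done, so long sequences would underflow.

```python
def forward_backward(prior, trans, emit, obs):
    """Return P(X_i | o_1..o_n) for every i in a discrete HMM.

    Hypothetical table layout (an assumption of this sketch):
      prior[x]    = P(X_1 = x)
      trans[x][y] = P(X_{i+1} = y | X_i = x)
      emit[x][o]  = P(O_i = o | X_i = x)
    """
    n = len(obs)
    states = range(len(prior))

    # Forwards factors: alpha[i][x] = P(X_i = x, o_1..o_i),
    # built by eliminating the previous hidden variable.
    alpha = [[prior[x] * emit[x][obs[0]] for x in states]]
    for i in range(1, n):
        prev = alpha[-1]
        alpha.append([
            emit[y][obs[i]] * sum(prev[x] * trans[x][y] for x in states)
            for y in states
        ])

    # Backwards factors: beta[i][x] = P(o_{i+1}..o_n | X_i = x),
    # built by eliminating the next hidden variable.
    beta = [[1.0] * len(prior) for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = [
            sum(trans[x][y] * emit[y][obs[i + 1]] * beta[i + 1][y] for y in states)
            for x in states
        ]

    # Multiply the two factors and normalize to get each marginal.
    posteriors = []
    for a, b in zip(alpha, beta):
        unnorm = [ai * bi for ai, bi in zip(a, b)]
        z = sum(unnorm)
        posteriors.append([u / z for u in unnorm])
    return posteriors


def viterbi(prior, trans, emit, obs):
    """Return the most likely hidden trajectory as a list of state indices."""
    states = range(len(prior))

    # Forwards pass with max in place of sum; remember the best predecessor.
    delta = [prior[x] * emit[x][obs[0]] for x in states]
    back = []
    for o in obs[1:]:
        step, new_delta = [], []
        for y in states:
            best = max(states, key=lambda x: delta[x] * trans[x][y])
            step.append(best)
            new_delta.append(delta[best] * trans[best][y] * emit[y][o])
        back.append(step)
        delta = new_delta

    # Backwards pass: follow the stored argmax pointers from the last state.
    path = [max(states, key=lambda x: delta[x])]
    for step in reversed(back):
        path.append(step[path[-1]])
    return path[::-1]
```

Both functions make one pass forwards and one pass backwards over the sequence, so the cost grows linearly in n, which is the reuse of computation that the slides contrast with running variable elimination separately for each i.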
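
The two Kalman filter operations named in the slides, conditioning on the observation and rolling up the previous time step, can be illustrated in the simplest possible setting. The sketch below assumes a one-dimensional linear-Gaussian model, X_{t+1} = a X_t + noise with variance q and O_t = c X_t + noise with variance r; the function name `kalman_filter_1d` and all parameter names are hypothetical, and the multivariate case from the lecture would replace these scalars with matrices.

```python
def kalman_filter_1d(mu, var, observations, a=1.0, q=1.0, c=1.0, r=1.0):
    """Scalar Kalman filter sketch (all parameter names are assumptions).

    mu, var : mean and variance of the Gaussian prior on X_1
    a, q    : transition model X_{t+1} = a * X_t + N(0, q)
    c, r    : observation model O_t = c * X_t + N(0, r)
    Returns a list of (mean, variance) pairs for P(X_t | o_1..o_t).
    """
    filtered = []
    for o in observations:
        # Condition on observation: Gaussian posterior of X_t given o_t.
        gain = var * c / (c * c * var + r)
        mu = mu + gain * (o - c * mu)
        var = (1.0 - gain * c) * var
        filtered.append((mu, var))

        # Roll up: marginalize X_t to get the prior on X_{t+1}.
        mu = a * mu
        var = a * a * var + q
    return filtered


if __name__ == "__main__":
    # Hypothetical temperature readings, used only to exercise the sketch.
    for mean, variance in kalman_filter_1d(20.0, 4.0, [18.0, 18.5, 19.0]):
        print(round(mean, 3), round(variance, 3))
```

Every step is a handful of arithmetic operations on a mean and a variance, which is the scalar counterpart of the "simple matrix operations" mentioned in the summary.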