15-381 Artificial Intelligence
Hidden Markov Models (HMMs)

What's wrong with Bayesian networks?
- Bayesian networks are very useful for modeling joint distributions.
- But they have their limitations:
  - They cannot account for temporal (sequence) models.
  - A Bayesian network is a DAG: no self loops or any other loops are allowed, so a graph containing a loop is not a valid Bayesian network.

Hidden Markov models
- Model a set of observations with a set of hidden states:
  - Robot movement: observations are range/visual sensor readings; hidden states are locations on a map.
  - Speech processing: observations are sound signals; hidden states are parts of speech / words.
  - Biology: observations are DNA base pairs; hidden states are genes.
- Two key ideas:
  1. Hidden states generate observations.
  2. Hidden states transition to other hidden states.

Examples: speech processing

Example: biological data
ATGAAGCTACTGTCTTCTATCGAACAAGCATGCGATATTTGCCGACTTAAAAAGCTCAAG
TGCTCCAAAGAAAAACCGAAGTGCGCCAAGTGTCTGAAGAACAACTGGGAGTGTCGCTAC
TCTCCCAAAACCAAAAGGTCTCCGCTGACTAGGGCACATCTGACAGAAGTGGAATCAAGG
CTAGAAAGACTGGAACAGCTATTTCTACTGATTTTTCCTCGAGAAGACCTTGACATGATT

Example: gambling on dice outcomes
- Two dice, both skewed (the output model).
- At each roll we can either stay with the same die or switch to the other one (the transition model):
  P(A -> A) = 0.8, P(A -> B) = 0.2, P(B -> B) = 0.8, P(B -> A) = 0.2.

A Hidden Markov model
- A set of states s1 ... sn. At each time point we are in exactly one of these states, denoted by qt.
- πi: the probability that we start at state si.
- A transition probability model: P(qt = si | qt-1 = sj).
- A set of possible outputs. At time point t we emit a symbol ot.
- An emission probability model: p(ot | si).
[Diagram: two-state HMM with states A and B; self-transition probability 0.8 for each, switch probability 0.2, initial probability 0.5 for each state.]

The Markov property
- Recall the definition: a set of states s1 ... sn (at each time point we are in exactly one state, denoted qt); initial probabilities πi; a transition probability model P(qt = si | qt-1 = sj); a set of possible outputs with an emission probability model p(ot | si).
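The dice HMM above can be sketched as a small generative model. This is a minimal sketch, not code from the lecture: the 0.8/0.2 sticky transitions come from the slides, while the uniform start probabilities and the exact skewed emission table (die A favoring low rolls, die B favoring high rolls) are assumptions for illustration.

```python
import random

# Two-state dice HMM sketched from the slides: states A and B, sticky transitions.
STATES = ["A", "B"]
INIT = {"A": 0.5, "B": 0.5}                      # assumed uniform start probabilities
TRANS = {"A": {"A": 0.8, "B": 0.2},              # TRANS[prev][nxt] = P(q_t = nxt | q_{t-1} = prev)
         "B": {"A": 0.2, "B": 0.8}}
# Hypothetical emission table for the two skewed dice (exact numbers assumed):
EMIT = {"A": {1: .3, 2: .2, 3: .2, 4: .1, 5: .1, 6: .1},
        "B": {1: .1, 2: .1, 3: .1, 4: .2, 5: .2, 6: .3}}

def sample(dist, rng):
    """Draw a key from a {key: probability} dictionary."""
    r, acc = rng.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r < acc:
            return k
    return k  # guard against floating-point round-off

def generate(t, seed=0):
    """Generate t (hidden state, observed roll) pairs from the HMM."""
    rng = random.Random(seed)
    q = sample(INIT, rng)                # pick the starting die
    seq = []
    for _ in range(t):
        seq.append((q, sample(EMIT[q], rng)))  # hidden states generate observations
        q = sample(TRANS[q], rng)              # hidden states transition to hidden states
    return seq
```

Running `generate(10)` yields ten (die, roll) pairs; an inference procedure sees only the rolls, never the die labels.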
- An important aspect of this definition is the Markov property: conditioned on qt, the next state qt+1 is independent of qt-1 and of any earlier time points.
- More formally: P(qt+1 = si | qt = sj) = P(qt+1 = si | qt = sj, qt-1 = sk, ...).

What can we ask when using an HMM? A few examples:
- Which die is currently being used?
- What is the probability of a 6 in the next roll?
- What is the probability of a 6 in any of the next 3 rolls?

Inference in HMMs
- Computing P(Q) and P(qt = si): if we cannot look at the observations.
- Computing P(Q | O) and P(qt = si | O): when we have observations and care about the last state only.
- Computing argmax_Q P(Q | O): when we care about the entire path.

Which die is currently being used?
- There were t rounds so far, and we want to determine P(qt = A).
- Let's assume for now that we cannot observe any outputs (we are blindfolded). How can we compute this?

P(qt = A): the simple answer
1. Determine P(Q), where Q is any path that ends in A: Q = q1, ..., qt-1, A.
   P(Q) = P(q1, ..., qt-1, A)
        = P(A | q1, ..., qt-1) P(q1, ..., qt-1)
        = P(A | qt-1) P(q1, ..., qt-1)               (Markov property)
        = P(A | qt-1) ... P(q2 | q1) P(q1)           (P(q1) is the initial probability)
2. P(qt = A) = sum_Q P(Q), where the sum is over all sequences of t states that end in A.
- How many sequences Q are there? A lot: 2^(t-1). Not a feasible solution.

P(qt = A): the smart way
- Define pt(i) = the probability of being in state i at time t = p(qt = si).
- We can determine pt(i) by induction:
  1. p1(i) = πi
  2. pt(i) = sum_j p(qt = si | qt-1 = sj) pt-1(j)
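The induction above can be sketched in a few lines, together with the exponential path enumeration it replaces. A minimal sketch: the sticky two-die transition matrix is from the slides, while the initial probabilities here are a hypothetical example.

```python
from itertools import product

STATES = ["A", "B"]
INIT = {"A": 0.3, "B": 0.7}              # hypothetical initial probabilities pi_i
TRANS = {"A": {"A": 0.8, "B": 0.2},      # TRANS[j][i] = P(q_t = i | q_{t-1} = j)
         "B": {"A": 0.2, "B": 0.8}}

def marginal_dp(t):
    """p_t(i) = P(q_t = s_i) via the induction; O(n^2 t) dynamic programming."""
    p = dict(INIT)                                       # base case: p_1(i) = pi_i
    for _ in range(t - 1):
        # Induction step: p_t(i) = sum_j P(q_t = s_i | q_{t-1} = s_j) p_{t-1}(j)
        p = {i: sum(TRANS[j][i] * p[j] for j in STATES) for i in STATES}
    return p

def marginal_bruteforce(t):
    """Same quantity by summing over all 2^t length-t paths -- exponential."""
    p = {i: 0.0 for i in STATES}
    for path in product(STATES, repeat=t):
        prob = INIT[path[0]]
        for prev, nxt in zip(path, path[1:]):
            prob *= TRANS[prev][nxt]
        p[path[-1]] += prob                  # accumulate into the path's last state
    return p
```

For small t the two agree exactly, but the brute force touches 2^t paths while the recursion does O(n^2) work per time step.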
P(qt = A): the smart way (continued)
- This type of computation is called dynamic programming.
- Complexity: O(n^2 t), where n is the number of states in our HMM and t is the time index.
[Table: a states-by-time grid filled in column by column; e.g. at time t1, p(s1) = 0.3 and p(s2) = 0.7, then columns t2, t3, ... follow by the induction.]

Inference in HMMs
- Computing P(Q) and P(qt = si): done.
- Computing P(Q | O) and P(qt = si | O).
- Computing argmax_Q P(Q | O).

But what if we observe outputs?
- So far we assumed that we could not observe the outputs. In reality, we almost always can.
- Emission probabilities for the two dice:
    v   P(v | A)   P(v | B)
    1     0.3        0.1
    2     0.2        0.1
    3     0.2        0.1
    4     0.1        0.2
    5     0.1        0.2
    6     0.1        0.3
- Does observing the sequence 5, 6, 4, 5, 6, 6 change our belief about the current state?

P(qt = A) when outputs are observed
- We want to compute P(qt = A | O1, ..., Ot).
- For ease of writing, we use the following notation, common in the literature:
  a_{i,j} = P(qt = si | qt-1 = sj)
  b_i(ot) = P(ot | si)
- Let's start with a simpler question: given a sequence of states Q, what is P(Q | O1, ..., Ot) = P(Q | O)?
- It is pretty simple to move from P(Q) to P(qt = A), and in some cases P(Q) is the more important question (speech processing, NLP).

P(Q | O)
- We can use Bayes' rule: P(O …
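The filtering question P(qt = si | O1, ..., Ot) can be sketched as a normalized forward pass that weaves the emission terms b_i(ot) into the same induction used above. A minimal sketch, assuming the slides' transitions, a uniform start, and the emission table as reconstructed here:

```python
STATES = ["A", "B"]
INIT = {"A": 0.5, "B": 0.5}              # assumed uniform start probabilities
TRANS = {"A": {"A": 0.8, "B": 0.2},      # TRANS[j][i] encodes a_{i,j} = P(q_t = s_i | q_{t-1} = s_j)
         "B": {"A": 0.2, "B": 0.8}}
EMIT = {"A": {1: .3, 2: .2, 3: .2, 4: .1, 5: .1, 6: .1},   # EMIT[i][o] encodes b_i(o)
        "B": {1: .1, 2: .1, 3: .1, 4: .2, 5: .2, 6: .3}}

def forward_filter(obs):
    """Return P(q_t = s_i | O_1..O_t) for each t via a normalized forward pass."""
    # Base case: alpha_1(i) proportional to pi_i * b_i(o_1).
    alpha = {i: INIT[i] * EMIT[i][obs[0]] for i in STATES}
    z = sum(alpha.values())
    beliefs = [{i: alpha[i] / z for i in STATES}]
    for o in obs[1:]:
        # Induction: alpha_t(i) = b_i(o_t) * sum_j a_{i,j} * alpha_{t-1}(j),
        # then normalize so the belief sums to 1.
        alpha = {i: EMIT[i][o] * sum(TRANS[j][i] * beliefs[-1][j] for j in STATES)
                 for i in STATES}
        z = sum(alpha.values())
        beliefs.append({i: alpha[i] / z for i in STATES})
    return beliefs

belief = forward_filter([5, 6, 4, 5, 6, 6])[-1]
```

With these (assumed) numbers, a run of high rolls like 5, 6, 4, 5, 6, 6 shifts the belief firmly toward die B, which is exactly the sense in which observations change our belief about the state.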