Unformatted text preview:

Hidden Markov Model HMM Hidden Markov Models Bonnie Dorr Christof Monz CMSC 723 Introduction to Computational Linguistics Lecture 5 HMMs allow you to estimate probabilities of unobserved events Given plain text which underlying parameters generated the surface E g in speech recognition the observed data is the acoustic signal and the words are the hidden parameters October 6 2004 HMMs and their Usage HMMs are very common in Computational Linguistics Speech recognition observed acoustic signal hidden words Handwriting recognition observed image hidden words Part of speech tagging observed words hidden part of speech tags Machine translation observed foreign words hidden words in target language Noisy Channel Model In speech recognition you observe an acoustic signal A a1 an and you want to determine the most likely sequence of words W w1 wn P W A Problem A and W are too specific for reliable counts on observed data and are very unlikely to occur in unseen data 1 Noisy Channel Model Noisy Channel Model Assume that the acoustic signal A is already segmented wrt word boundaries P W A could be computed as Given a candidate sequence W we need to compute P W and combine it with P W A Applying Bayes rule P W A max P w i ai ai wi argmax P W A argmax Problem Finding the most likely word corresponding to a acoustic representation depends on the context E g pre z ns could mean presents or presence depending on the context W W P A W P W P A The denominator P A can be dropped because it is constant for all W Noisy Channel in a Picture Decoding The decoder combines evidence from The likelihood P A W This can be approximated as n P A W P ai w i i 1 The prior P W This can be approximated as P W P w1 n i 2 P w i w i 1 7 2 Search Space Given a word segmented acoustic sequence list all candidates ik spen siv excessive pre z ns presidents bald P inactive bald expensive presence bold bought expressive inactive bot boat Markov Assumption P bot bald The Markov assumption states that probability of the occurrence of word wi at time t depends only on occurrence of word wi 1 at time t 1 n P w1 w n P w i w1 w i 1 presents press i 2 Chain rule Compute the most likely path Markov assumption n P w1 w n P w i w i 1 i 2 The Trellis Parameters of an HMM States A set of states S s1 sn Transition probabilities A a1 1 a1 2 an n Each ai j represents the probability of transitioning from state si to sj Emission probabilities A set B of functions of the form bi ot which is the probability of observation ot being emitted by si Initial state distribution i is the probability that si is a start state 3 The Three Basic HMM Problems The Three Basic HMM Problems Problem 1 Evaluation Given the observation sequence O o1 oT and an HMM model A B how do we compute the probability of O given the model Problem 2 Decoding Given the observation sequence O o1 oT and an HMM model A B how do we find the state sequence that best explains the observations Problem 3 Learning How do we adjust the model parameters A B to maximize P O Problem 1 Probability of an Observation Sequence Forward Probabilities What is P O The probability of a observation sequence is the sum of the probabilities of all possible state sequences in the HMM Na ve computation is very expensive Given T observations and N states there are NT possible state sequences Even small HMMs e g T 10 and N 10 contain 10 billion different paths Solution to this and problem 2 is to use dynamic programming What is the probability that given an HMM at time t the state is i and the partial observation o1 ot has been generated t i P o1 ot qt si 4 Forward Probabilities Forward Algorithm t i P o1 ot qt si Initialization 1 i ibi o1 1 i N Induction N t j t 1 i aij b j ot 2 t T 1 j N i 1 N N t j t 1 i aij b j ot i 1 Termination P O T i i 1 Forward Algorithm Complexity Backward Probabilities In the na ve approach to solving problem 1 it takes on the order of 2T NT computations The forward algorithm takes on the order of N2T computations Analogous to the forward probability just in the other direction What is the probability that given an HMM and given the state at time t is i the partial observation ot 1 oT is generated t i P ot 1 oT qt si 5 Backward Probabilities t i P ot 1 oT qt si Backward Algorithm Initialization Induction T i 1 1 i N N t i aij b j ot 1 t 1 j t T 1 1 1 i N j 1 Termination N t i aij b j ot 1 t 1 j j 1 i 1 Problem 2 Decoding N P O i 1 i The solution to Problem 1 Evaluation gives us the sum of all paths through an HMM efficiently For Problem 2 we wan to find the path with the highest probability We want to find the state sequence Q q1 qT such that Q argmax P Q O Viterbi Algorithm Similar to computing the forward probabilities but instead of summing over transitions from incoming states compute the maximum N Forward j i a b o t i 1 t 1 ij j t Viterbi Recursion Q t j max t 1 i aij b j ot 1 i N 6 Viterbi Algorithm Initialization Induction Problem 3 Learning 1 i ib j o1 1 i N t j max t 1 i aij b j ot 1 i N t j argmax t 1 i aij 2 t T 1 j N 1 i N T i Termination p max 1 i N Read out path t qT argmax T i Up to now we ve assumed that we know the underlying model A B Often these parameters are estimated on annotated training data which has two drawbacks 1 i N Annotation is difficult and or expensive Training data is different from the current data We want to maximize the parameters with respect to the current data i e we re looking for a model such that argmax P O t 1 q t 1 q t T 1 1 Problem 3 Learning Parameter Re estimation Unfortunately there is no known way to analytically find a global maximum i e a model such that argmax P O But it is possible to find a local maximum Given an initial model we can always find a model such that P O P O Use the forward backward or BaumWelch algorithm which is a hill climbing algorithm Using an initial parameter instantiation the forward …


View Full Document

UMD CMSC 723 - Hidden Markov Models

Documents in this Course
Lecture 9

Lecture 9

12 pages

Smoothing

Smoothing

15 pages

Load more
Loading Unlocking...
Login

Join to view Hidden Markov Models and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Hidden Markov Models and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?