Hidden Markov Models
Lecture 5
Scribed by Mahathi S Mahabhashyam

In this lecture, we discuss the three main questions answered by Hidden Markov Models.

In the previous lecture, we touched on:

1. Time Warping:
It deals with the alignment and comparison of two trajectories in multi-dimensional space. Two trajectories a(t), b(t) are connected by an approximate continuous time warping (u0, v0) if:
• u0, v0 are strictly increasing functions on [0, T], and
• a(u0(t)) ≈ b(v0(t)) for 0 ≤ t ≤ T.

[Figure: the trajectories a(t) and b(t) over [0, T], aligned by the warping functions u0(t) and v0(t).]

We also discussed discretizing this problem and using dynamic programming methods to solve it.

2. Hidden Markov Models:
Definition: A hidden Markov model (HMM) can be defined in terms of
• An alphabet Σ = { b1, b2, …, bM }, the list of symbols that can be emitted by the HMM.
• A set of states Q = { 1, ..., K }.
• Transition probabilities between any two states: aij is the transition probability from state i to state j, with the condition that
  ai1 + … + aiK = 1, for all states i = 1…K.
• Start probabilities a0i, giving the probability that we are in state i at the beginning of the experiment:
  a01 + … + a0K = 1.
• Emission probabilities within each state, giving the probability with which any symbol from the alphabet Σ is emitted by a state:
  ek(b) = P( xi = b | πi = k ),
  ek(b1) + … + ek(bM) = 1, for all states k = 1…K.

[Figure: a generic HMM state diagram, with transition probabilities between states and emission probabilities within each state.]
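To fix the notation, here is a minimal Python sketch of this definition, assuming a plain dictionary representation. The class name HMM, the field names (start, trans, emit), and the toy numbers below are invented for the illustration; they are not from the lecture.

```python
# A minimal sketch (not from the lecture) of the HMM definition above,
# using plain dictionaries for the probability tables; the names HMM,
# start, trans, and emit are invented for this illustration.

class HMM:
    def __init__(self, states, alphabet, start, trans, emit):
        self.states = states      # Q = {1, ..., K}
        self.alphabet = alphabet  # Sigma = {b1, ..., bM}
        self.start = start        # start[i]    = a_0i
        self.trans = trans        # trans[i][j] = a_ij
        self.emit = emit          # emit[k][b]  = e_k(b)
        # Each probability row must sum to 1, as the definition requires.
        assert abs(sum(start.values()) - 1) < 1e-9
        for i in states:
            assert abs(sum(trans[i].values()) - 1) < 1e-9
            assert abs(sum(emit[i].values()) - 1) < 1e-9

# A toy two-state HMM over the alphabet {a, b}:
toy = HMM(states=["1", "2"],
          alphabet=["a", "b"],
          start={"1": 0.6, "2": 0.4},
          trans={"1": {"1": 0.7, "2": 0.3}, "2": {"1": 0.4, "2": 0.6}},
          emit={"1": {"a": 0.9, "b": 0.1}, "2": {"a": 0.2, "b": 0.8}})
```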
Today's Topics:
1. Introduction of the three main HMM problems
2. Decoding in HMMs
3. Evaluation in HMMs
4. Introduction to CpG islands

The three main problems that we study in HMMs are:

1. Evaluation:
Given an HMM M and a sequence x, the evaluation problem is to find the probability that x is generated by the HMM M.
That is, given M and x, find P[ x | M ].

2. Decoding:
Given an HMM M and a sequence x, the decoding problem deals with finding the sequence π of states that maximizes the probability that x is generated by that sequence of transitions in the HMM M.
That is, given M and x, find π* = argmaxπ P[ x, π | M ].

3. Learning:
Given an HMM M with unspecified transition/emission probabilities and a sequence x, the learning problem deals with finding the parameters of the HMM so as to maximize the probability of generating x from M.
That is, given x and M, find the parameters θ = (ei(.), aij) that maximize P[ x | θ ].

In today's lecture, we look at the first two problems and methods to solve them using dynamic programming concepts.

I. Decoding:

1. Problem Statement:
Given a string x = x1x2……xN over the alphabet Σ, we want to find the path π = π1, ……, πN such that P[ x, π ] is maximized, i.e., π* = argmaxπ P[ x, π ].

2. Definition:
We can use dynamic programming to develop a solution to this problem.
Let Vk(i) be the probability of the most likely sequence of states ending at state k after the first i symbols of the sequence x have been emitted. We compute this as the maximum over all sequences of states π1, π2, …, πi-1 that emit x1, x2, …, xi-1, followed by the ith state πi = k emitting xi:

Vk(i) = max{π1,…,πi-1} P[ x1…xi-1, π1, …, πi-1, xi, πi = k ]
      = probability of the most likely sequence of states ending at πi = k.

3. Derivation of the recurrence relation:
To derive a recurrence relation for dynamic programming, we proceed as follows. From the definition,
Vl(i+1) = max{π1,…,πi} P[ x1…xi, π1, …, πi, xi+1, πi+1 = l ].
From Bayes' rule, we know that P(A, B) = P(B | A) P(A). Therefore,
Vl(i+1) = max{π1,…,πi} P( xi+1, πi+1 = l | x1…xi, π1, …, πi ) P[ x1…xi, π1, …, πi ].
From the memoryless property of HMMs, the transition probability from a state A to a state B depends only on the current state and not on any earlier sequence of states that may have led into it. Hence,
Vl(i+1) = max{π1,…,πi} P( xi+1, πi+1 = l | πi ) P[ x1…xi-1, π1, …, πi-1, xi, πi ].
Since the first factor depends only on πi, we can split the maximization into a max over the values k of πi, with the second factor maximized separately for each k. That is,
Vl(i+1) = maxk { P( xi+1, πi+1 = l | πi = k ) max{π1,…,πi-1} P[ x1…xi-1, π1, …, πi-1, xi, πi = k ] }.
The probability of emitting xi+1 in state l (since πi+1 = l) is, by definition, the emission probability el(xi+1). The probability of going from state k to state l is the transition probability akl. And the final factor is, by definition, Vk(i).
Thus we can define Vl(i+1) recursively as

Vl(i+1) = el(xi+1) maxk akl Vk(i).

4. Viterbi Algorithm:
The dynamic programming algorithm built on this recurrence is known as the Viterbi algorithm. (Here Ptrj(i) records the state at step i-1 from which the best path ending in state j at step i arrived.)

Input: x = x1……xN
Initialization:
  V0(0) = 1 (0 is the imaginary initial state in which every HMM starts its processing)
  Vk(0) = 0, for all k > 0
Iteration:
  Vj(i) = ej(xi) maxk akj Vk(i-1)    (from the recursive derivation above)
  Ptrj(i) = argmaxk akj Vk(i-1)    (the traceback pointer)
Termination:
  P(x, π*) = maxk Vk(N)
Traceback:
  πN* = argmaxk Vk(N)    (the Nth state in the parse)
  πi-1* = Ptrπi*(i)

Practical Details:
Because of the large number of multiplications of probabilities lying in [0, 1], underflow is a serious problem. To get around it, we use the fact that multiplying numbers is equivalent to adding their logarithms, and run the recurrence in log space (the Vk now store log-probabilities):

Vl(i) = log el(xi) + maxk [ Vk(i-1) + log akl ].

Complexity:
For every cell (k, i), the calculation of Vk(i) requires the V-values of the entire previous column, i.e., O(K) work per cell. Since there are K·N cells, the time complexity is O(K²N).
The space required by the algorithm is that needed to store the entire K×N matrix, so the space complexity is O(KN).

[Figure: the K×N dynamic programming matrix, with states 1, 2, …, K as rows; cell (j, i) holds Vj(i).]

Example:
Following up on the dishonest casino player example from the previous class: if x is a sequence of rolls with a portion containing ~1/6 6's, followed by a portion containing ~1/2 6's,

x = 123456123456…12345 6626364656…1626364656,

then its optimal parse is: FFF…………………...F LLL………………………...L.

Recall the HMM for the dishonest casino player (from Lecture 4, Thursday April 10, 2003):

[Figure: the dishonest casino model. Two states, FAIR and LOADED; each stays in the same state with probability 0.95 and switches with probability 0.05. Emission probabilities: P(b|F) = 1/6 for b = 1…6; P(b|L) = 1/10 for b = 1…5 and P(6|L) = 1/2.]
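To make the algorithm concrete, here is a minimal Python sketch of the Viterbi algorithm as derived above, run in log space as suggested in the Practical Details, and applied to the dishonest casino model. The 0.5/0.5 start distribution and the particular roll sequence are assumptions made for the sketch, not values given in the lecture.

```python
import math

def viterbi(x, states, start, trans, emit):
    """Most likely state path pi* for the observation sequence x, with its
    log-probability. Runs in log space to avoid underflow:
    V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ]."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Initialization: fold the start probabilities a_0k into column 1
    # (this is V_0(0) = 1 followed by one iteration step).
    V = {k: log(start[k]) + log(emit[k][x[0]]) for k in states}
    ptrs = []  # ptrs[i-1][l] = Ptr_l(i) = argmax_k (V_k(i-1) + log a_kl)

    # Iteration over columns 2..N of the K x N matrix.
    for sym in x[1:]:
        col, back = {}, {}
        for l in states:
            best = max(states, key=lambda k: V[k] + log(trans[k][l]))
            back[l] = best
            col[l] = log(emit[l][sym]) + V[best] + log(trans[best][l])
        ptrs.append(back)
        V = col

    # Termination: P(x, pi*) = max_k V_k(N); then trace the pointers back.
    last = max(states, key=lambda k: V[k])
    path = [last]
    for back in reversed(ptrs):
        path.append(back[path[-1]])
    path.reverse()
    return path, V[last]

# The dishonest casino model recalled above (start distribution assumed).
states = ["F", "L"]
start = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05},
         "L": {"F": 0.05, "L": 0.95}}
emit = {"F": {r: 1 / 6 for r in "123456"},
        "L": {"1": 0.1, "2": 0.1, "3": 0.1, "4": 0.1, "5": 0.1, "6": 0.5}}

# Fair-looking rolls followed by rolls where 6 shows up about half the time.
rolls = "123456123456" + "66263646566626364656"
path, logp = viterbi(rolls, states, start, trans, emit)
print("".join(path))  # expected: a run of F's followed by a run of L's
```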