Stanford CS 262 - Lecture Notes

Hidden Markov Models
Lecture 5
Scribed by Mahathi S Mahabhashyam

In this lecture, we discuss the three main questions answered by Hidden Markov Models. In the previous lecture, we touched on:

1. Time Warping:
This deals with the alignment and comparison of two trajectories in multi-dimensional space. Two trajectories a(t), b(t) are connected by an approximate continuous time warping (u0, v0) if u0 and v0 are strictly increasing functions on [0, T], and a(u0(t)) ≈ b(v0(t)) for 0 ≤ t ≤ T.

[Figure: trajectories a(t) and b(t) on [0, T], aligned by the warping functions u0(t) and v0(t).]

We also discussed discretizing this problem and using dynamic programming methods to solve it.

2. Hidden Markov Models:
Definition: A hidden Markov model (HMM) is defined in terms of:

• An alphabet Σ = { b1, b2, …, bM }, the set of symbols that the HMM can emit.
• A set of states Q = { 1, …, K }.
• Transition probabilities between any two states: akl is the probability of a transition from state k to state l, with ak1 + … + akK = 1 for every state k = 1…K.
• Start probabilities a0k: the probability of being in state k at the beginning of the experiment, with a01 + … + a0K = 1.
• Emission probabilities within each state: the probability with which each symbol of the alphabet Σ is emitted by a state, ek(b) = P( xi = b | πi = k ), with ek(b1) + … + ek(bM) = 1 for every state k = 1…K.

Today's Topics:
1. Introduction of the three main HMM problems
2. Decoding in HMMs
3. Evaluation in HMMs
4. Introduction to CpG islands

The three main problems that we study in HMMs are:

1. Evaluation: Given an HMM M and a sequence x, the evaluation problem is to find the probability that x is generated by M. That is, given M and x, find P[ x | M ].

2. Decoding: Given an HMM M and a sequence x, the decoding problem is to find the sequence π of states that maximizes the probability that x is generated along the sequence of transitions π in M. That is, given M and x, find the path π that maximizes P[ x, π | M ].

3. Learning: Given an HMM M with unspecified transition/emission probabilities and a sequence x, the learning problem is to find the parameters of the HMM that maximize the probability of generating x from M. That is, given x and M, find parameters θ = ( ek(.), akl ) that maximize P[ x | θ ].

In today's lecture, we look at the first two problems and methods to solve them using dynamic programming.

I. Decoding:

1. Problem Statement: Given a string x = x1x2…xN over the alphabet Σ, we want to find the path π = π1, …, πN such that P[ x, π ] is maximized, i.e., π* = argmaxπ P[ x, π ]. A brute-force version of this search is sketched below.
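Before developing the dynamic-programming solution, it helps to see exactly what is being maximized. The sketch below is not from the lecture: the two-state model, its numbers, and the helper names are made up for illustration. It scores each candidate path π by the joint probability P[ x, π ] = a0π1 eπ1(x1) · ∏ aπi-1 πi eπi(xi) and returns the best path by exhaustive search over all K^N paths, which quickly becomes infeasible and motivates the recurrence developed next.

```python
from itertools import product

# Toy two-state HMM with illustrative (made-up) numbers; states are 0 and 1.
start = [0.5, 0.5]                    # a_0k: start probabilities
trans = [[0.9, 0.1],                  # a_kl: transition probabilities
         [0.2, 0.8]]
emit  = [{'a': 0.8, 'b': 0.2},        # e_k(b): emission probabilities
         {'a': 0.3, 'b': 0.7}]

def joint_probability(x, path):
    """P[x, pi] = a_{0,pi1} e_{pi1}(x1) * prod_{i>1} a_{pi(i-1),pi(i)} e_{pi(i)}(xi)."""
    p = start[path[0]] * emit[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][x[i]]
    return p

def brute_force_decode(x, K=2):
    """Return argmax_pi P[x, pi] by scoring all K**N paths (exponential; illustration only)."""
    return max(product(range(K), repeat=len(x)), key=lambda pi: joint_probability(x, pi))

print(brute_force_decode("aabba"))    # most probable state path, as a tuple of length N
```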
2. Definition:
We can use dynamic programming to develop a solution to this problem. Let Vk(i) be the probability of the most likely sequence of states ending in state πi = k after the first i symbols of x have been emitted. This is the maximum, over all sequences of states π1, …, πi-1 emitting x1, …, xi-1, of the probability that those states are followed by πi = k emitting xi:

Vk(i) = max{π1,…,πi-1} P[ x1…xi-1, π1, …, πi-1, xi, πi = k ]
      = probability of the most likely sequence of states ending at πi = k.

3. Derivation of the recurrence relation:
To derive a recurrence relation for dynamic programming, we proceed as follows. From the definition,

Vl(i+1) = max{π1,…,πi} P[ x1…xi, π1, …, πi, xi+1, πi+1 = l ]

From Bayes' rule, we know that P(A, B) = P(B | A) P(A). Therefore,

Vl(i+1) = max{π1,…,πi} P( xi+1, πi+1 = l | x1…xi, π1, …, πi ) P[ x1…xi, π1, …, πi ]

From the memoryless property of HMMs, the transition probability from a state A to a state B depends only on A, not on any earlier sequence of states that may have led into A. Hence,

Vl(i+1) = max{π1,…,πi} P( xi+1, πi+1 = l | πi ) P[ x1…xi-1, π1, …, πi-1, xi, πi ]

Since the first factor depends only on πi, we can split the maximization into an outer max over the value k of πi and an inner max over π1, …, πi-1 for each such k:

Vl(i+1) = maxk { P( xi+1, πi+1 = l | πi = k ) max{π1,…,πi-1} P[ x1…xi-1, π1, …, πi-1, xi, πi = k ] }

The probability of emitting xi+1 in state l (since πi+1 = l) is by definition the emission probability el(xi+1). The probability of going from state k to state l is the transition probability akl. The final factor is, by definition, Vk(i). Thus Vl(i+1) is defined recursively as

Vl(i+1) = el(xi+1) maxk akl Vk(i)

4. Viterbi Algorithm:
The dynamic programming algorithm built on this recurrence is known as the Viterbi algorithm. (Here Ptrj(i) records the previous state on the best path ending in state j at position i.)

Input: x = x1…xN

Initialization:
V0(0) = 1 (0 is the imaginary initial state in which every HMM starts its processing)
Vk(0) = 0, for all k > 0

Iteration:
Vj(i) = ej(xi) maxk akj Vk(i-1) (from the recursive derivation above)
Ptrj(i) = argmaxk akj Vk(i-1) (the traceback pointer)

Termination:
P( x, π* ) = maxk Vk(N)

Traceback:
πN* = argmaxk Vk(N) (the Nth state in the parse)
πi-1* = Ptrπi*(i)

Practical Details:
Because of the large number of multiplications of probabilities in [0, 1], underflow is a serious problem. To avoid it, we use the fact that multiplying numbers is equivalent to adding their logarithms, and compute in log space:

Vl(i) = log el(xi) + maxk [ Vk(i-1) + log akl ]

Complexity:
For every cell (k, i), computing Vk(i) requires the V-values of the entire previous column. Since there are K·N cells, the time complexity is O(K²N). The algorithm stores the entire K×N matrix, so the space complexity is O(KN).

[Figure: the K × N dynamic-programming matrix of values Vj(i), with one row per state 1, 2, …, K.]

Example:
Following up on the dishonest casino player example from the previous class: if x is a sequence of rolls in which roughly 1/6 of the rolls are 6's, followed by a portion in which roughly 1/2 of the rolls are 6's,

x = 123456123456…12345 6626364656…1626364656

then its optimal parse is

    FFF…………………...F LLL………………………...L

Recall the HMM for the dishonest casino player:

[Figure (from Lecture 4): the dishonest casino model. Two states, FAIR and LOADED, each with self-transition probability 0.95 and transition probability 0.05 to the other state. Emissions: P(1|F) = … = P(6|F) = 1/6 for FAIR; P(1|L) = … = P(5|L) = 1/10 and P(6|L) = 1/2 for LOADED.]
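To make the iteration, termination, traceback, and log-space trick concrete, here is a minimal Viterbi sketch for the dishonest casino model above. It is an illustration under stated assumptions, not the lecture's code: the start probabilities are assumed uniform, P(6|LOADED) is taken as 1/2 as in the example, the start probability is folded into position 1 instead of using the imaginary state 0, and the function and variable names are invented for the example.

```python
import math

# Dishonest casino model from the figure above; uniform start probabilities are an assumption.
states = ["F", "L"]
start  = {"F": 0.5, "L": 0.5}
trans  = {"F": {"F": 0.95, "L": 0.05},
          "L": {"F": 0.05, "L": 0.95}}
emit   = {"F": {r: 1 / 6 for r in "123456"},
          "L": {**{r: 1 / 10 for r in "12345"}, "6": 1 / 2}}

def viterbi(x):
    """Log-space Viterbi: V_l(i) = log e_l(x_i) + max_k [ V_k(i-1) + log a_kl ], with traceback."""
    # Initialization (start probability folded into position 1).
    V   = [{k: math.log(start[k]) + math.log(emit[k][x[0]]) for k in states}]
    ptr = [{}]
    # Iteration: fill one column of the K x N matrix per position.
    for i in range(1, len(x)):
        V.append({})
        ptr.append({})
        for l in states:
            best_k = max(states, key=lambda k: V[i - 1][k] + math.log(trans[k][l]))
            ptr[i][l] = best_k
            V[i][l] = math.log(emit[l][x[i]]) + V[i - 1][best_k] + math.log(trans[best_k][l])
    # Termination and traceback.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for i in range(len(x) - 1, 0, -1):
        path.append(ptr[i][path[-1]])
    return V[-1][last], "".join(reversed(path))

log_p, parse = viterbi("123456123456" + "6626364656")
print(parse)   # expected: mostly F's followed by mostly L's, matching the parse above
```

The O(K²N) time cost discussed under Complexity is visible directly in the two nested loops over states inside the loop over positions, and the stored V and ptr lists account for the O(KN) space.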

