CS 262 Lecture 5
Scribed by Yu Bai (SUID: 4905352), 01/2005

1. Motivation

Identifying distantly related homologs is a difficult problem, primarily because the sequence identity between them is low. Traditional pairwise alignment algorithms (e.g. BLAST, Smith-Waterman) handle relatively harmless differences by penalizing common or conservative changes less than radical ones, but Hidden Markov Models (HMMs) offer a more systematic, family-based statistical approach to describing the consensus among a set of sequences. HMMs have become one of the most important tools in genome analysis and structure prediction, and they are also widely used in speech recognition.

2. Outline

The lecture illustrated the basic ideas of Hidden Markov Models with a dishonest casino example, followed by the definition and features of HMMs. The three main questions asked when building an HMM were reviewed, and the algorithms for decoding and evaluating an HMM were discussed.

3. Basics

A Hidden Markov Model is a (customizable) statistical model that describes a series of observations as the output of a hidden stochastic process and thereby defines a probability distribution over possible sequences. The lecture introduced HMMs as follows.

3.1 Dishonest casino example

A casino offers a game in which a player wins $1 for each roll that lands on 1, 2, 3, 4, or 5, and loses $2 if the roll lands on 6. With a fair die, the probability that the casino wins (i.e., rolls a 6) is 1/6, independent of all previous rolls. However, the casino owner is a "dishonest" businessman who introduces a second, loaded die, which lands on 6 with probability 50% and on each of the other numbers with probability 10%. To avoid being caught, the dealer switches back and forth between the fair and loaded dice roughly once every 20 rolls (i.e., with a switching probability of about 5% per roll). The game can be represented by the model below, where P(i | F) and P(i | L) are the probabilities of rolling number i (i = 1, ..., 6) with the Fair and the Loaded die, respectively.

[Figure: two-state diagram with states Fair and Loaded; each state emits the numbers 1-6 with probabilities P(i | F) or P(i | L), stays in the same state with probability 0.95, and switches to the other state with probability 0.05.]

The diagram above is an example of a Hidden Markov Model. It is called "hidden" because normally we only observe the sequence of die rolls, e.g.

1, 2, 6, 4, 3, 6, 5, 2, 6, 6, 4, 1, 3, 6, 6, 6, 6, 6, 6, 6, 6, 5, 4, 6, 1, 6

We cannot tell which state each roll was generated in; for example, the subsequence 6, 6, 6, 6, 6, 6 may have been produced by the loaded die or by the fair die, although the latter is less probable. The sequence of states is hidden behind the sequence of observations: we cannot determine the states from the rolls alone. Hence the name "Hidden" Markov Model.

3.2 Definition of HMM

A Hidden Markov Model consists of:
- An alphabet Σ = { b1, b2, ..., bM }, i.e. all possible observations of the process.
- A set of states Q = { 1, ..., K }.
- Statistical parameters connecting the alphabet to the states and the states to one another:
  - Start probabilities a_0i, with Σ_{i=1..K} a_0i = 1.
  - Transition probabilities a_ij = probability of moving from state i to state j, with Σ_{j=1..K} a_ij = 1 for every state i.
  - Emission probabilities e_k(b) = P(x_i = b | π_i = k), with e_k(b1) + ... + e_k(bM) = 1 for every state k = 1, ..., K.
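To make the definition concrete, here is a minimal Python sketch encoding the casino HMM of section 3.1. The variable names (states, start_p, trans_p, emit_p) are illustrative choices, not notation from the lecture; the uniform start probabilities of 1/2 follow the value used in the parse-likelihood calculation later in the notes.

```python
# A minimal encoding of the dishonest-casino HMM from section 3.1.
# Variable names are illustrative, not from the lecture.

states = ["F", "L"]                      # Fair and Loaded dice

start_p = {"F": 0.5, "L": 0.5}           # a_0k: start with either die with probability 1/2

trans_p = {                              # a_kj: switch dice with probability 0.05 per roll
    "F": {"F": 0.95, "L": 0.05},
    "L": {"F": 0.05, "L": 0.95},
}

emit_p = {                               # e_k(b): P(roll = b | die = k)
    "F": {b: 1 / 6 for b in range(1, 7)},
    "L": {**{b: 0.1 for b in range(1, 6)}, 6: 0.5},
}

# Sanity checks: every probability distribution sums to 1.
assert abs(sum(start_p.values()) - 1) < 1e-9
for k in states:
    assert abs(sum(trans_p[k].values()) - 1) < 1e-9
    assert abs(sum(emit_p[k].values()) - 1) < 1e-9
```

Dictionaries keyed by state keep the correspondence to a_ij and e_k(b) explicit; for larger alphabets or state sets a matrix representation would be more economical.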
3.3 Features and notation

HMMs are memoryless: at each step t, the only thing that affects future states is the current state π_t. That is,

P(π_{t+1} = k | "everything that happened so far")
  = P(π_{t+1} = k | π_1, π_2, ..., π_t, x_1, x_2, ..., x_t)
  = P(π_{t+1} = k | π_t)

What is "hidden": the sequence of states is hidden from the sequence of observed outcomes.

A parse of a sequence: a sequence of states π = π_1, ..., π_N for a given observation sequence x = x_1, ..., x_N.

Likelihood of a parse: the probability that a given parse π = π_1, ..., π_N generates a given sequence x = x_1, ..., x_N:

P(x, π) = P(x_1, ..., x_N, π_1, ..., π_N)
        = P(x_N, π_N | π_{N-1}) P(x_{N-1}, π_{N-1} | π_{N-2}) ... P(x_2, π_2 | π_1) P(x_1, π_1)
        = P(x_N | π_N) P(π_N | π_{N-1}) ... P(x_2 | π_2) P(π_2 | π_1) P(x_1 | π_1) P(π_1)
        = a_{0,π_1} a_{π_1,π_2} ... a_{π_{N-1},π_N} e_{π_1}(x_1) ... e_{π_N}(x_N)

Using our favorite casino example, given the sequence of rolls x = 1215621624, the likelihood of the parse π = Fair, Fair, ..., Fair (Fair at all ten positions) is

a_0Fair × P(1 | Fair) P(Fair | Fair) P(2 | Fair) P(Fair | Fair) ... P(4 | Fair)
  = 1/2 × (1/6)^10 × (0.95)^9 ≈ 5.2 × 10^-9

Similarly, the likelihood of the parse π = Loaded, Loaded, ..., Loaded (Loaded at all ten positions) is

a_0Loaded × P(1 | Loaded) P(Loaded | Loaded) P(2 | Loaded) P(Loaded | Loaded) ... P(4 | Loaded)
  = 1/2 × (1/10)^8 × (1/2)^2 × (0.95)^9 ≈ 7.9 × 10^-10

We conclude that, for the rolls above, it is about 6.6 times more likely that the die was fair all the way than that it was loaded all the way.

3.4 Main questions about HMMs (i.e., what do we learn from an HMM?)

Evaluation: Given an HMM M and a sequence x, what is the probability that x was generated by M, i.e. P(x | M)?

Decoding: Given an HMM M and a sequence x, find a parse π of the sequence (i.e., a sequence of states) that maximizes P(x, π | M).

Learning: Given an HMM M with unspecified transition/emission probabilities and a (training) sequence x, find the parameters θ = (e_i(.), a_ij) that maximize P(x | θ).

4. Decoding and evaluation algorithms

4.1 Viterbi algorithm (decoding)

Idea of the Viterbi algorithm: given a sequence x = x_1 ... x_N and an HMM, the goal of decoding is to find a parse π = π_1 ... π_N such that P(x, π | M) = P(x, π) is maximal. Because the HMM is memoryless, the probability of the optimal parse ending in state π_{i+1} = l depends only on the probability of the optimal prefix parse ending in state π_i = k and the transition probability from state k to state l. This is a sign that dynamic programming applies, and it is the basic idea behind the Viterbi algorithm.

Define V_k(i) = probability of the most likely sequence of states generating the first i letters and ending in state π_i = k, i.e.

V_k(i) = max over (π_1, ..., π_{i-1}) of P(x_1 ... x_{i-1}, π_1 ... π_{i-1}, x_i, π_i = k)

Then

V_l(i+1) = max_k [ P(x_{i+1}, π_{i+1} = l | π_i = k) × max over (π_1, ..., π_{i-1}) of P(x_1 ... x_{i-1}, π_1 ... π_{i-1}, x_i, π_i = k) ]
         = e_l(x_{i+1}) × max_k [ a_kl V_k(i) ]

Algorithm (input x = x_1 ... x_N):

Initialization: V_0(0) = 1;  V_k(0) = 0 for all k > 0
Iteration:      V_j(i) = e_j(x_i) × max_k [ a_kj V_k(i-1) ];   Ptr_j(i) = argmax_k [ a_kj V_k(i-1) ]
Termination:    P(x, π*) = max_k V_k(N)
Traceback:      π*_N = argmax_k V_k(N);  π*_{i-1} = Ptr_{π*_i}(i)

There is a practical issue to keep in mind: because the absolute values of these probabilities are very small, underflow will cause numerical problems. In practice the recurrence is therefore computed with logarithms of the probabilities, which turns the products into sums.
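To tie the recurrence to running code, below is a minimal log-space Viterbi sketch in Python for the casino HMM. The function name viterbi is an illustrative choice, and the parameter dictionaries are re-declared so the snippet runs on its own; this is a sketch of the technique, not the lecture's own code.

```python
from math import log

# Log-space Viterbi decoding for the casino HMM (a sketch; names are illustrative).
states = ["F", "L"]
start_p = {"F": 0.5, "L": 0.5}
trans_p = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.05, "L": 0.95}}
emit_p = {"F": {b: 1 / 6 for b in range(1, 7)},
          "L": {**{b: 0.1 for b in range(1, 6)}, 6: 0.5}}

def viterbi(x):
    """Return (max log P(x, pi), most likely state path pi*) for rolls x."""
    # Initialization: V_k(1) = log a_0k + log e_k(x_1)
    V = [{k: log(start_p[k]) + log(emit_p[k][x[0]]) for k in states}]
    ptr = []
    # Iteration: V_j(i) = log e_j(x_i) + max_k [ log a_kj + V_k(i-1) ]
    for xi in x[1:]:
        Vi, ptr_i = {}, {}
        for j in states:
            best_k = max(states, key=lambda k: V[-1][k] + log(trans_p[k][j]))
            ptr_i[j] = best_k
            Vi[j] = log(emit_p[j][xi]) + V[-1][best_k] + log(trans_p[best_k][j])
        V.append(Vi)
        ptr.append(ptr_i)
    # Termination: best final state; Traceback: follow the pointers backwards.
    last = max(states, key=lambda k: V[-1][k])
    path = [last]
    for ptr_i in reversed(ptr):
        path.append(ptr_i[path[-1]])
    return V[-1][last], list(reversed(path))

rolls = [1, 2, 6, 4, 3, 6, 5, 2, 6, 6, 4, 1, 3,
         6, 6, 6, 6, 6, 6, 6, 6, 5, 4, 6, 1, 6]
logp, path = viterbi(rolls)
print(logp, "".join(path))   # the long run of 6s should be decoded as Loaded
```

Working in log space replaces the products in the recurrence with sums, which avoids the underflow issue noted above.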

