CS262: Hidden Markov Models
Lecture Notes
Scribe: Jason [email protected]
28 January 2008

Brief Comment from the Scribe

These notes are based on the CS262 lecture on Hidden Markov Models, given by Serafim Batzoglou at Stanford University on 1/28/08. Any errors/typos in this document are my own. All images are taken from Serafim's slides.

1 Introduction

Hidden Markov Models (HMMs) model sequential data. In particular, they are often used to model biological sequences such as DNA and proteins.

HMMs gained popularity in 1997, when they were used as a gene predictor that allowed accurate predictions across the whole human genome. They are now used in many different areas of bioinformatics, and in other areas of computer science such as speech recognition.

2 Recommended Textbook for HMMs

Durbin, Eddy, Krogh, Mitchison, "Biological Sequence Analysis", Chapters 3 and 4.

3 Motivating Example: The Dishonest Casino Player

Imagine that a casino player has two dice:

• A fair die with P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6.

• A loaded die with P(1) = P(2) = P(3) = P(4) = P(5) = 1/10 and P(6) = 1/2.

He switches dice randomly, without respect to the last time he switched; in other words, on every turn he has the same probability of switching. On average, he switches dice once every 20 turns.

Assume we have a sequence of rolls. There are three main questions we might want to ask:

1. Evaluation. Assuming that we have a model of what is happening, how likely is the sequence to occur?

2. Decoding. When was the player rolling the fair die, and when was he rolling the loaded die?

3. Learning. We want to learn the parameters of the model. What are the probabilities of each die landing on each number? How often does the player switch dice?

Note that we can do a lot of these things intuitively. For example, if we see a long string of 6s in a sequence, we can guess that the player was probably using the loaded die. HMMs allow us to do this sort of analysis in a rigorous, mathematical way.

4 Definition of a Hidden Markov Model

4.1 Example

We begin with a simple example of an HMM.

[Figure from the slides: an example HMM, showing each state's emission probabilities and its transition probabilities to the other states.]

As you can see, for each state there are probabilities for emitting each possible letter of the alphabet, and probabilities for moving to other states (including, possibly, the state you are currently in). Note that an HMM is memory-less: the probabilities depend only on the state you are currently in, without regard to any history. Even if you just arrived in a state, you have the same probability of leaving that state as if you had been there for 100 turns.

4.2 Formal Definition

A Hidden Markov Model (HMM) consists of the following:

• An alphabet Σ = {b_1, b_2, ..., b_M}.

• A set of states Q = {1, 2, ..., K}.

• Transition probabilities between any two states: a_{ij} is the probability of a transition from state i to state j. The transition probabilities out of a given state must sum to 1, i.e. a_{i1} + a_{i2} + ... + a_{iK} = 1 for all 1 ≤ i ≤ K.

• Start probabilities a_{0i} for all 1 ≤ i ≤ K. As before, these probabilities must sum to 1.

• Emission probabilities for each state: e_i(b) is the probability of emitting b in state i, that is, e_i(b) = P(x_t = b | π_t = i). As before, the sum of the emission probabilities in each state must be 1, i.e. e_i(b_1) + e_i(b_2) + ... + e_i(b_M) = 1 for all 1 ≤ i ≤ K.

Note that the Durbin book also has end probabilities, but these are unnecessary, as they are implied by the other parameters.
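To make the definition concrete, here is a minimal Python sketch of the dishonest casino from Section 3 written in this notation. The emission probabilities come from the casino description; the state names, the uniform 1/2 start probabilities, and the 0.05 switch probability (consistent with one switch per 20 turns on average) are illustrative assumptions rather than values fixed in the lecture.

```python
# A minimal sketch of the dishonest-casino HMM as plain Python data.
# The state names, the 1/2 start probabilities, and the 0.05 switch
# probability are illustrative assumptions; the emission probabilities
# are the ones given for the fair and loaded dice.

alphabet = [1, 2, 3, 4, 5, 6]          # Sigma: the six die faces
states = ["FAIR", "LOADED"]            # Q

start = {"FAIR": 0.5, "LOADED": 0.5}   # a_{0i} (assumed uniform)

transition = {                         # a_{ij}: each row sums to 1
    "FAIR":   {"FAIR": 0.95, "LOADED": 0.05},
    "LOADED": {"FAIR": 0.05, "LOADED": 0.95},
}

emission = {                           # e_i(b): each row sums to 1
    "FAIR":   {face: 1 / 6 for face in alphabet},
    "LOADED": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5},
}

# Check the sum-to-1 constraints from the definition.
assert abs(sum(start.values()) - 1) < 1e-9
for i in states:
    assert abs(sum(transition[i].values()) - 1) < 1e-9
    assert abs(sum(emission[i].values()) - 1) < 1e-9
```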
4.3 HMMs are Memory-Less

Because each step is affected only by the current state (and not by any previous states), we can say the following:

P(π_{t+1} = k | "whatever happened so far")
    = P(π_{t+1} = k | π_1, π_2, ..., π_t, x_1, x_2, ..., x_t)
    = P(π_{t+1} = k | π_t)

Similarly, the emission probability depends only on the current state:

P(x_t = b | "whatever happened so far")
    = P(x_t = b | π_1, π_2, ..., π_t, x_1, x_2, ..., x_{t-1})
    = P(x_t = b | π_t)

How simple!

5 Parsing a Sequence

A sequence is a series of emissions

    x = x_1 x_2 ... x_N.

A parse is a series of states

    π = π_1, π_2, ..., π_N.

A pictorial representation of a parse is as follows:

[Figure from the slides: the parse drawn as a chain, with a state π_t and an emitted letter x_t at each time step.]

Observe that for each time t from 1 to N, there is a state and an emission.

5.1 Generating a Sequence from an HMM

This is quite easy. We simply follow this procedure to generate a sequence of length n:

1. Start at state π_1 according to probability a_{0 π_1}.

2. Emit letter x_1 according to probability e_{π_1}(x_1).

3. Go to state π_2 according to probability a_{π_1 π_2}.

4. Repeat until emitting x_n.

6 Evaluation (Likelihood of a Parse)

Given a sequence x = x_1 x_2 ... x_N and a parse π = π_1, π_2, ..., π_N, we can determine the likelihood of that parse.

This is quite logical: we just multiply together all the probabilities of emissions and transitions from t = 0 onwards. You can think of this as all of the orange arrows in the previous diagram.

P(x, π) = P(x_1, ..., x_N, π_1, ..., π_N)
        = P(x_N | π_N) · P(π_N | π_{N-1}) · P(x_{N-1} | π_{N-1}) · P(π_{N-1} | π_{N-2}) ··· P(x_1 | π_1) · P(π_1)

which we get by applying the chain rule multiple times. We can rewrite this as the product of all transition probabilities and all emission probabilities:

P(x, π) = a_{0 π_1} a_{π_1 π_2} ··· a_{π_{N-1} π_N} · e_{π_1}(x_1) ··· e_{π_N}(x_N)

This has a lot of symbols, but there is a more mathematically elegant way of writing it. No matter how long the sequence is, all of these factors are drawn from a small set of parameters. For example, in the casino example there are 18 parameters: 4 transition probabilities, 12 emission probabilities, and 2 start probabilities.

So instead of that huge product, we can express the probability in terms of those 18 parameters, with exponents:

a_{0 fair} = θ_1;  a_{0 loaded} = θ_2;  ...;  e_{loaded}(6) = θ_{18}

Let F(j, x, π) (the "feature count") be the number of times parameter θ_j occurs in (x, π). Then we have:

P(x, π) = ∏_{j=1..n} θ_j^{F(j, x, π)} = exp( ∑_{j=1..n} F(j, x, π) log θ_j )

This will be especially useful in later topics in the course.

7 Decoding

Decoding is finding the most likely parse of a sequence. Intuitively, this is equivalent to finding the sequence of states π that has the highest probability given the sequence x.

7.1 Simple Example

Imagine we have a sequence of rolls x = 1, 2, 1, 5, 6, 2, 1, 5, 2, 4. We want to see whether it is more likely that it was always the fair die, or always the loaded die. (This is not strictly the problem of decoding, as we are only deciding between two possible parses π, as opposed to all of them.) The likelihood of a fair die
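As a rough sketch of the comparison this example sets up, the joint likelihood P(x, π) from Section 6 can be evaluated directly for the all-fair and all-loaded parses. The start probabilities of 1/2 and the 0.05 switch probability below are illustrative assumptions (consistent with one switch per 20 turns on average), not values taken from the lecture; the emission probabilities are from Section 3.

```python
# Sketch: P(x, pi) for two candidate parses of the roll sequence above.
# Start probabilities (1/2 each) and the 0.05 switch probability are
# illustrative assumptions; the emission probabilities are from Section 3.

start = {"FAIR": 0.5, "LOADED": 0.5}
transition = {"FAIR":   {"FAIR": 0.95, "LOADED": 0.05},
              "LOADED": {"FAIR": 0.05, "LOADED": 0.95}}
emission = {"FAIR":   {face: 1 / 6 for face in range(1, 7)},
            "LOADED": {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.5}}

def parse_likelihood(x, pi):
    """P(x, pi) = a_{0,pi_1} e_{pi_1}(x_1) * prod_{t>1} a_{pi_{t-1},pi_t} e_{pi_t}(x_t)."""
    p = start[pi[0]] * emission[pi[0]][x[0]]
    for t in range(1, len(x)):
        p *= transition[pi[t - 1]][pi[t]] * emission[pi[t]][x[t]]
    return p

rolls = [1, 2, 1, 5, 6, 2, 1, 5, 2, 4]
print(parse_likelihood(rolls, ["FAIR"] * 10))    # ~5.2e-09
print(parse_likelihood(rolls, ["LOADED"] * 10))  # ~1.6e-10
# Under these numbers the all-fair parse is more likely: a single 6 in ten
# rolls is not strong evidence for the loaded die.
```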

