Stanford CS 262 - Hidden Markov Models

Hidden Markov Models

[Trellis diagram: hidden states 1, 2, …, K at every position, emitting the observations x1, x2, x3, …]

Viterbi, Forward, Backward

VITERBI
  Initialization: V_0(0) = 1
                  V_k(0) = 0, for all k > 0
  Iteration:      V_l(i) = e_l(x_i) × max_k [ V_k(i-1) a_kl ]
  Termination:    P(x, π*) = max_k V_k(N)

FORWARD
  Initialization: f_0(0) = 1
                  f_k(0) = 0, for all k > 0
  Iteration:      f_l(i) = e_l(x_i) × Σ_k f_k(i-1) a_kl
  Termination:    P(x) = Σ_k f_k(N)

BACKWARD
  Initialization: b_k(N) = 1, for all k
  Iteration:      b_k(i) = Σ_l e_l(x_{i+1}) a_kl b_l(i+1)
  Termination:    P(x) = Σ_k a_0k e_k(x_1) b_k(1)

(Sketches of the forward and backward recurrences appear at the end of these notes.)

Variants of HMMs

Higher-order HMMs
• How do we model "memory" longer than one time point?
• First order:  P(π_{i+1} = l | π_i = k) = a_kl
• Second order: P(π_{i+1} = l | π_i = k, π_{i-1} = j) = a_jkl
• …
• A second-order HMM with K states is equivalent to a first-order HMM with K^2 states.

[Diagram: a two-state coin HMM with states H, T and history-dependent transitions a_HT(prev = H), a_HT(prev = T), a_TH(prev = H), a_TH(prev = T) is expanded into a first-order HMM with states HH, HT, TH, TT and transitions a_HHT, a_HTT, a_HTH, a_THH, a_THT, a_TTH.]

Similar Algorithms to 1st Order
• For P(π_{i+1} = l | π_i = k, π_{i-1} = j), index the Viterbi values by the last two states:
  § V_lk(i) = max_j { V_kj(i-1) + … }
  § Time? Space?

Modeling the Duration of States

[Diagram: two states X and Y; X has self-loop probability p and transition probability 1-p to Y; Y has self-loop probability q and transition probability 1-q to X.]

Length distribution of region X:
• E[l_X] = 1/(1-p)
• Geometric distribution, with mean 1/(1-p)
This geometric length distribution is a significant disadvantage of HMMs.
Several solutions exist for modeling different length distributions.

Example: exon lengths in genes

Solution 1: Chain several states
[Diagram: several copies of state X chained in series before Y; the last copy has self-loop probability p and transition probability 1-p to Y.]
Disadvantage: still very inflexible
  l_X = C + geometric with mean 1/(1-p)

Solution 2: Negative binomial distribution
[Diagram: states X(1), X(2), …, X(n), each with self-loop probability p and transition probability 1-p to the next state, the last one leading to Y.]
Duration in X: m turns, where
  § during the first m-1 turns, exactly n-1 arrows to the next state are followed
  § during the m-th turn, an arrow to the next state is followed
  P(l_X = m) = C(m-1, n-1) (1-p)^(n-1+1) p^((m-1)-(n-1)) = C(m-1, n-1) (1-p)^n p^(m-n)
  where C(m-1, n-1) is the binomial coefficient "m-1 choose n-1"

Example: genes in prokaryotes
• EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3

Solution 3: Duration modeling
Upon entering a state:
  1. Choose duration d, according to a probability distribution
  2. Generate d letters according to the emission probabilities
  3. Take a transition to the next state according to the transition probabilities
[Diagram: state F with duration d < D_F drawn from P_F, emitting x_i … x_{i+d-1}.]
Disadvantage: increase in complexity of Viterbi:
  Time: O(D)   Space: O(1)   where D = maximum duration of a state
  (Warning: Rabiner's tutorial claims O(D^2) time and O(D) space increases.)

Viterbi with duration modeling
Recall the original iteration:
  V_l(i) = max_k V_k(i-1) × a_kl × e_l(x_i)
New iteration:
  V_l(i) = max_k max_{d=1…D_l} V_k(i-d) × P_l(d) × a_kl × Π_{j=i-d+1…i} e_l(x_j)
[Diagram: states F and L with transitions between them; each emits a block x_i … x_{i+d-1} of duration d < D_F (resp. d < D_L) drawn from P_F (resp. P_L).]
Precompute cumulative values of the emission products.
(A sketch of this iteration appears at the end of these notes.)

Learning
Re-estimate the parameters of the model based on training data.

Two learning scenarios
1. Estimation when the "right answer" is known
   Examples:
   GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the CpG islands
   GIVEN: the casino player lets us observe him for one evening, as he changes dice and produces 10,000 rolls
2. Estimation when the "right answer" is unknown
   Examples:
   GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
   GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice
QUESTION: Update the parameters θ of the model to maximize P(x | θ)

1. When the states are known
Given x = x1…xN for which the true path π = π1…πN is known, define:
  A_kl   = # times the k→l transition occurs in π
  E_k(b) = # times state k in π emits b in x
We can show that the maximum likelihood parameters θ (those that maximize P(x | θ)) are:
  a_kl = A_kl / Σ_i A_ki        e_k(b) = E_k(b) / Σ_c E_k(c)
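To make the counting estimator just above concrete, here is a minimal Python sketch. It is not part of the original notes: the function name estimate_supervised and the use of plain strings for states and symbols are my own choices. It tallies A_kl and E_k(b) from a labeled sequence and normalizes them into a_kl and e_k(b), reproducing the casino example that follows.

```python
from collections import defaultdict

def estimate_supervised(x, pi):
    """Maximum-likelihood HMM parameters from a labeled sequence.

    x  : list of observed symbols x_1..x_N
    pi : list of states pi_1..pi_N (the known path), same length as x
    Returns (a, e): a[k][l] = transition probability k->l,
                    e[k][b] = probability that state k emits b.
    """
    A = defaultdict(lambda: defaultdict(float))   # A[k][l]  = # of k->l transitions in pi
    E = defaultdict(lambda: defaultdict(float))   # E[k][b]  = # of times state k emits b

    for i in range(len(x)):
        E[pi[i]][x[i]] += 1
        if i + 1 < len(x):
            A[pi[i]][pi[i + 1]] += 1

    # Normalize: a_kl = A_kl / sum_i A_ki,  e_k(b) = E_k(b) / sum_c E_k(c)
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in A[k]} for k in A}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in E[k]} for k in E}
    return a, e

# The 10 fair casino rolls from the example below
rolls  = ["2", "1", "5", "6", "1", "2", "3", "6", "2", "3"]
states = ["F"] * 10
a, e = estimate_supervised(rolls, states)
print(a)   # {'F': {'F': 1.0}}
print(e)   # e['F']['2'] == 0.3, e['F']['1'] == e['F']['3'] == e['F']['6'] == 0.2, e['F']['5'] == 0.1
```

With so little data the zero counts show up directly (e.g. e_F(4) = 0 because no 4 was rolled), which is exactly the overfitting problem the pseudocount discussion below addresses.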
1. When the states are known (continued)
Intuition: when we know the underlying states, the best estimate is the normalized frequency of transitions & emissions that occur in the training data.
Drawback: given little data, there may be overfitting:
  P(x|θ) is maximized, but θ is unreasonable
  0 probabilities – BAD
Example: given 10 casino rolls, we observe
  x = 2, 1, 5, 6, 1, 2, 3, 6, 2, 3
  π = F, F, F, F, F, F, F, F, F, F
Then:
  a_FF = 1; a_FL = 0
  e_F(1) = e_F(3) = e_F(6) = .2; e_F(2) = .3; e_F(4) = 0; e_F(5) = .1

Pseudocounts
Solution for small training sets: add pseudocounts
  A_kl   = (# times the k→l transition occurs in π) + r_kl
  E_k(b) = (# times state k in π emits b in x) + r_k(b)
r_kl, r_k(b) are pseudocounts representing our prior belief.
Larger pseudocounts ⇒ strong prior belief
Small pseudocounts (ε < 1): just to avoid 0 probabilities

Pseudocounts example: the dishonest casino
We will observe the player for one day, 600 rolls.
Reasonable pseudocounts:
  r_0F = r_0L = r_F0 = r_L0 = 1; r_FL = r_LF = r_FF = r_LL = 1
  r_F(1) = r_F(2) = … = r_F(6) = 20   (strong belief fair is fair)
  r_L(1) = r_L(2) = … = r_L(6) = 5    (wait and see for loaded)
The numbers above are arbitrary – assigning priors is an art.

2. When the states are hidden
We don't know the true A_kl, E_k(b).
Idea:
• We estimate our "best guess" of what A_kl, E_k(b) are
  § or we start with random / uniform values
• We update the parameters of the model based on our guess
• We repeat

2. When the states are hidden (continued)
Starting with our best guess of a model M with parameters θ:
Given x = x1…xN for which the true π = π1…πN is unknown,
we can get to a provably more likely parameter set θ, i.e., a θ that increases the probability P(x | θ).
Principle: EXPECTATION MAXIMIZATION
  1. Estimate A_kl, E_k(b) in the training data
  2. Update θ according to A_kl, E_k(b)
  3. Repeat 1 & 2, until convergence

Estimating new parameters
To estimate A_kl (conditioning on the current parameters θ_CURRENT is assumed in all formulas below):
At each position i of sequence x, find the probability that transition k→l is used:
  P(π_i = k, π_{i+1} = l | x) = [1/P(x)] × P(π_i = k, π_{i+1} = l, x1…xN) = Q/P(x)
where
  Q = P(x1…xi, π_i = k, π_{i+1} = l, x_{i+1}…xN)
    = P(π_{i+1} = l, x_{i+1}…xN | π_i = k) × P(x1…xi, π_i = k)
    = P(π_{i+1} = l, x_{i+1} x_{i+2}…xN | π_i = k) × f_k(i)
    = P(x_{i+2}…xN | π_{i+1} = l) × P(x_{i+1} | π_{i+1} = l) × P(π_{i+1} = l | π_i = k) × f_k(i)
    = b_l(i+1) × e_l(x_{i+1}) × a_kl × f_k(i)
Therefore P(π_i = k, π_{i+1} = l | x) = f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x).
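The quantity derived above is exactly what the Baum-Welch (EM) iteration accumulates at every position. The sketch below is illustrative rather than code from the notes: it assumes forward values f[i][k] = f_k(i), backward values b[i][k] = b_k(i) and the total probability P(x) have already been computed (see the forward/backward sketch further down), and the name expected_transition_counts and the dictionary layout are my own.

```python
def expected_transition_counts(x, states, a, e, f, b, Px):
    """Accumulate A_kl = sum_i P(pi_i = k, pi_{i+1} = l | x) using
    P(pi_i = k, pi_{i+1} = l | x) = f_k(i) * a_kl * e_l(x_{i+1}) * b_l(i+1) / P(x).

    x      : observed symbols, 0-indexed list x[0..N-1]
    states : list of state names
    a, e   : transition and emission probabilities, a[k][l], e[k][symbol]
    f, b   : forward and backward tables, f[i][k] = f_k(i), b[i][k] = b_k(i), i = 1..N
    Px     : total probability P(x), e.g. sum_k f[N][k]
    """
    N = len(x)
    A = {k: {l: 0.0 for l in states} for k in states}
    for i in range(1, N):                      # transitions from position i to i+1
        for k in states:
            for l in states:
                # x[i] (0-indexed) is x_{i+1} in the 1-based notation of the notes
                A[k][l] += f[i][k] * a[k][l] * e[l][x[i]] * b[i + 1][l] / Px
    return A
```

Normalizing these expected counts (plus any pseudocounts), exactly as in the supervised case, gives the updated θ for the next EM iteration.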

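For completeness, here is one way the forward and backward tables used above could be filled, following the recurrences given at the top of these notes. This is a sketch under my own conventions: the silent start state 0 is represented by an initial distribution a0[k] (playing the role of a_0k), and no scaling or log-space arithmetic is used, so it will underflow on long sequences.

```python
def forward(x, states, a0, a, e):
    """f[i][k] = P(x_1..x_i, pi_i = k); returns (f, P(x))."""
    N = len(x)
    f = {i: {k: 0.0 for k in states} for i in range(1, N + 1)}
    for k in states:                                   # f_k(1) = a_0k * e_k(x_1)
        f[1][k] = a0[k] * e[k][x[0]]
    for i in range(2, N + 1):                          # f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
        for l in states:
            f[i][l] = e[l][x[i - 1]] * sum(f[i - 1][k] * a[k][l] for k in states)
    Px = sum(f[N][k] for k in states)                  # termination: P(x) = sum_k f_k(N)
    return f, Px

def backward(x, states, a0, a, e):
    """b[i][k] = P(x_{i+1}..x_N | pi_i = k); returns (b, P(x))."""
    N = len(x)
    b = {i: {k: 0.0 for k in states} for i in range(1, N + 1)}
    for k in states:
        b[N][k] = 1.0                                  # initialization: b_k(N) = 1
    for i in range(N - 1, 0, -1):                      # b_k(i) = sum_l e_l(x_{i+1}) a_kl b_l(i+1)
        for k in states:
            b[i][k] = sum(e[l][x[i]] * a[k][l] * b[i + 1][l] for l in states)
    Px = sum(a0[k] * e[k][x[0]] * b[1][k] for k in states)   # P(x) = sum_k a_0k e_k(x_1) b_k(1)
    return b, Px
```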

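Finally, returning to the duration-modeling variant from earlier in the notes: the sketch below is my own illustration of the modified iteration V_l(i) = max_k max_{d=1…D_l} V_k(i-d) P_l(d) a_kl Π e_l(x_j). It keeps a running product of emissions instead of the precomputed cumulative values the notes suggest, returns only P(x, π*) without a traceback, and the parameter names P_dur, D and a0 are mine.

```python
def viterbi_duration(x, states, a0, a, e, P_dur, D):
    """Viterbi for an HMM with explicit state durations.

    P_dur[l][d] : probability that a visit to state l lasts d positions (1 <= d <= D[l])
    D[l]        : maximum duration allowed in state l
    a0[k]       : initial distribution, playing the role of a_0k in the notes
    """
    N = len(x)
    V = {i: {k: 0.0 for k in states} for i in range(0, N + 1)}
    for i in range(1, N + 1):
        for l in states:
            best = 0.0
            emit = 1.0
            for d in range(1, min(D[l], i) + 1):
                emit *= e[l][x[i - d]]                 # running product e_l(x_{i-d+1}) ... e_l(x_i)
                p_seg = P_dur[l][d] * emit
                if i - d == 0:                         # segment covers the start of the sequence
                    best = max(best, a0[l] * p_seg)
                else:
                    for k in states:
                        best = max(best, V[i - d][k] * a[k][l] * p_seg)
            V[i][l] = best
    return max(V[N][k] for k in states)                # P(x, pi*) under the duration model
```

The inner loop over d is what adds the O(D) factor to the running time discussed in the notes; keeping cumulative emission values, as the notes suggest, avoids recomputing the products when durations are long.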