Stanford CS 262 - Lecture 7


Hidden Markov Models—Variants
Conditional Random Fields
CS262 Lecture 7, Win06, Batzoglou

Outline:
• Two learning scenarios: 1. when the "true" parse is known; 2. when the "true parse" is unknown
• Variants of HMMs
• Higher-order HMMs
• Modeling the Duration of States
• Solution 1: Chain several states
• Solution 2: Negative binomial distribution
• Example: genes in prokaryotes
• Solution 3: Duration modeling
• Viterbi with duration modeling
• Conditional Random Fields
• Let's look at an HMM again
• "Features" that depend on many pos. in x
• How many parameters are there, in general?
• Conditional Training
• Conditional Random Fields—Summary

[Title-slide figure: HMM trellis with states 1..K in each column, over observations x1 x2 x3 … xN]

Two learning scenarios

1. Estimation when the "right answer" is known
Examples:
• GIVEN: a genomic region x = x1…x1,000,000 where we have good (experimental) annotations of the CpG islands
• GIVEN: the casino player allows us to observe him one evening, as he changes dice and produces 10,000 rolls

2. Estimation when the "right answer" is unknown
Examples:
• GIVEN: the porcupine genome; we don't know how frequent the CpG islands are there, nor do we know their composition
• GIVEN: 10,000 rolls of the casino player, but we don't see when he changes dice

QUESTION: Update the parameters θ of the model to maximize P(x | θ)

1. When the "true" parse is known

Given x = x1…xN for which the true path π = π1…πN is known, simply count up the number of times each transition and emission is taken.

Define:
• A_kl = # of times the k → l transition occurs in π
• E_k(b) = # of times state k in π emits b in x

We can show that the maximum likelihood parameters θ (maximizing P(x | θ)) are:

    a_kl = A_kl / Σ_i A_ki        e_k(b) = E_k(b) / Σ_c E_k(c)

2. When the "true parse" is unknown

Baum-Welch Algorithm: compute the expected number of times each transition and emission is taken.

Initialization: pick the best-guess model parameters (or arbitrary ones).

Iteration (see the sketch after this list):
1. Forward
2. Backward
3. Calculate A_kl, E_k(b), given θ_CURRENT
4. Calculate new model parameters θ_NEW: a_kl, e_k(b)
5. Calculate the new log-likelihood P(x | θ_NEW), which is guaranteed to be higher by expectation-maximization

Repeat until P(x | θ) does not change much.
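The following is a minimal NumPy sketch of this loop; it is not taken from the lecture. The uniform initial state distribution, the function names, and the absence of rescaling (acceptable only for short sequences) are simplifying assumptions.

```python
# Minimal Baum-Welch sketch (illustrative; not from the lecture).
# a[k, l]: transition probs, e[k, b]: emission probs, x: observed symbol indices.
import numpy as np

def forward(a, e, x):
    K, N = a.shape[0], len(x)
    f = np.zeros((N, K))
    f[0] = e[:, x[0]] / K                           # assume a uniform initial distribution
    for i in range(1, N):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)          # f_l(i) = e_l(x_i) * sum_k f_k(i-1) a_kl
    return f

def backward(a, e, x):
    K, N = a.shape[0], len(x)
    b = np.zeros((N, K))
    b[-1] = 1.0
    for i in range(N - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])      # b_k(i) = sum_l a_kl e_l(x_{i+1}) b_l(i+1)
    return b

def baum_welch(a, e, x, n_iter=100, tol=1e-6):
    old_ll = -np.inf
    for _ in range(n_iter):
        f, b = forward(a, e, x), backward(a, e, x)  # 1. Forward, 2. Backward
        px = f[-1].sum()                            # P(x | current parameters)
        ll = np.log(px)
        if ll - old_ll < tol:                       # EM guarantees ll >= old_ll; stop when the gain is tiny
            break
        old_ll = ll
        A, E = np.zeros_like(a), np.zeros_like(e)   # 3. expected counts A_kl, E_k(b)
        for i in range(len(x) - 1):
            A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
        for i in range(len(x)):
            E[:, x[i]] += f[i] * b[i] / px
        a = A / A.sum(axis=1, keepdims=True)        # 4. re-estimated a_kl
        e = E / E.sum(axis=1, keepdims=True)        #    and e_k(b)
    return a, e

# Tiny made-up example: 2 states, binary alphabet
a0 = np.array([[0.8, 0.2], [0.3, 0.7]])
e0 = np.array([[0.9, 0.1], [0.2, 0.8]])
a_hat, e_hat = baum_welch(a0, e0, [0, 0, 1, 1, 1, 0, 0, 1])
```

For realistic sequence lengths one would rescale f and b at each position or work in log space; the point of the slide is only that P(x | θ) never decreases from one iteration to the next.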
Variants of HMMs

Higher-order HMMs
• How do we model "memory" longer than one time point?
• P(π_{i+1} = l | π_i = k) → a_kl
• P(π_{i+1} = l | π_i = k, π_{i-1} = j) → a_jkl
• …
• A second-order HMM with K states is equivalent to a first-order HMM with K² states.
[Diagram: a second-order HMM over states H and T, with transitions a_HT(prev = H), a_HT(prev = T), a_TH(prev = H), a_TH(prev = T), redrawn as a first-order HMM over states HH, HT, TH, TT with transitions a_HHT, a_HTT, a_THH, a_THT, a_TTH, a_HTH.]

Modeling the Duration of States
[Diagram: two states X and Y, with self-transition probabilities p and q and cross transitions 1 – p and 1 – q.]
The length distribution of a stay in region X is geometric: P(l_X = r) = p^(r-1) (1 – p), with mean E[l_X] = 1/(1 – p).
• Geometric distribution, with mean 1/(1 – p)
This is a significant disadvantage of HMMs. Several solutions exist for modeling different length distributions.

Example: exon lengths in genes
[Figure: distribution of exon lengths.]

Solution 1: Chain several states
[Diagram: several copies of state X in a row, the last with self-loop probability p, followed by Y.]
Disadvantage: still very inflexible
l_X = C + geometric with mean 1/(1 – p)

Solution 2: Negative binomial distribution
[Diagram: states X(1) → X(2) → … → X(n) → Y, each X(i) with self-loop probability p and forward probability 1 – p.]
Duration in X: m turns, where
• during the first m – 1 turns, exactly n – 1 arrows to the next state are followed
• during the m-th turn, an arrow to the next state is followed

    P(l_X = m) = C(m – 1, n – 1) (1 – p)^(n – 1 + 1) p^((m – 1) – (n – 1)) = C(m – 1, n – 1) (1 – p)^n p^(m – n)

Example: genes in prokaryotes
• EasyGene: a prokaryotic gene-finder (Larsen TS, Krogh A)
• Negative binomial with n = 3

Solution 3: Duration modeling
Upon entering a state:
1. Choose a duration d, according to a probability distribution
2. Generate d letters according to the emission probabilities
3. Take a transition to the next state according to the transition probabilities
[Diagram: state F emitting a block x_i…x_{i+d-1} of duration d < D_f, with duration distribution P_f.]
Disadvantage: increase in complexity of Viterbi, a factor of O(D) in time and O(1) in space, where D = maximum duration of a state.
Warning: Rabiner's tutorial claims O(D²) and O(D) increases.

Viterbi with duration modeling
Recall the original iteration:

    V_l(i) = max_k V_k(i – 1) × a_kl × e_l(x_i)

New iteration:

    V_l(i) = max_k max_{d=1…D_l} V_k(i – d) × P_l(d) × a_kl × Π_{j=i-d+1…i} e_l(x_j)

[Diagram: two consecutive segments, one in state F emitting x_i…x_{i+d-1} with d < D_f and duration distribution P_f, then a transition to state L emitting x_j…x_{j+d-1} with d < D_l and distribution P_l.]
Precompute cumulative values so the product over emissions can be evaluated in constant time (a sketch follows).
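Here is a minimal Python sketch of the new iteration, in log space. It assumes a state-specific duration table P_l(d) for d = 1..D and a uniform initial state distribution; the function and variable names are illustrative rather than taken from the lecture. It uses precomputed cumulative emission scores, and costs a factor of O(D) more than the O(NK²) plain Viterbi, matching the slide's claim.

```python
# Minimal sketch of Viterbi with duration modeling, in log space (illustrative).
# log_a[k, l]: transition log-probs; log_e[l, b]: emission log-probs;
# log_dur[l, d-1]: log P_l(d) for d = 1..D; x: observed symbol indices.
import numpy as np

def viterbi_duration(log_a, log_e, log_dur, x):
    K, D = log_dur.shape
    N = len(x)
    # Precompute cumulative emission scores: cum[l, i] = sum of log e_l(x_j) over the first i symbols
    cum = np.zeros((K, N + 1))
    for l in range(K):
        cum[l, 1:] = np.cumsum(log_e[l, x])
    # V[i, l]: best log score of a parse of x_1..x_i whose last segment is in state l
    V = np.full((N + 1, K), -np.inf)
    for i in range(1, N + 1):
        for l in range(K):
            best = -np.inf
            for d in range(1, min(D, i) + 1):
                emit = cum[l, i] - cum[l, i - d]            # sum_{j=i-d+1..i} log e_l(x_j), in O(1)
                if i - d == 0:
                    prev = -np.log(K)                       # uniform initial state distribution (assumption)
                else:
                    prev = np.max(V[i - d] + log_a[:, l])   # max_k V_k(i-d) + log a_kl
                best = max(best, prev + log_dur[l, d - 1] + emit)
            V[i, l] = best
    return V[N].max()                                       # best log score over full parses of x

# Tiny made-up example: 2 states, binary alphabet, maximum duration D = 3
log_a = np.log([[0.1, 0.9], [0.9, 0.1]])
log_e = np.log([[0.9, 0.1], [0.1, 0.9]])
log_dur = np.log(np.full((2, 3), 1.0 / 3))
print(viterbi_duration(log_a, log_e, log_dur, np.array([0, 0, 0, 1, 1, 1])))
```

Keeping back-pointers to the best (k, d) at each (i, l) would recover the parse itself, not just its score.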
Conditional Random Fields
A brief description of a relatively new kind of graphical model.

Let's look at an HMM again
Why are HMMs convenient to use? Because we can do dynamic programming with them!
• The "best" state sequence for 1…i interacts with the "best" sequence for i+1…N through only K² arrows:

    V_l(i+1) = e_l(x_{i+1}) max_k V_k(i) a_kl = max_k [ V_k(i) + ( e(l, i+1) + a(k, l) ) ]    (where e(·,·) and a(·,·) are logs)

• The total likelihood of all state sequences for 1…i+1 can be calculated from the total likelihood for 1…i by summing over only K² arrows.
[HMM trellis diagram: states 1..K in each column, over x1 x2 x3 … xN]

Let's look at an HMM again
• Some shortcomings of HMMs:
• They can't model state duration.
  Solution: explicit duration models (semi-Markov HMMs)
• Unfortunately, state π_i cannot "look" at any letter other than x_i!
  Strong independence assumption: P(π_i | x_1…x_{i-1}, π_1…π_{i-1}) = P(π_i | π_{i-1})

Let's look at an HMM again
• Another way to put this: the features used in the objective function P(x, π) are a_kl and e_k(b), where b ∈ Σ.
  At position i, all K² a_kl features and all K e_l(x_i) features play a role.
• OK, forget the probabilistic interpretation for a moment: "Given that the previous state is k and the current state is l, how much is the current score?"
• V_l(i) = V_k(i – 1) + (a(k, l) + e(l, i)) = V_k(i – 1) + g(k, l, x_i)
• Let's generalize g:  V_k(i – 1) + g(k, l, x, i)

"Features" that depend on many pos. in x
• What do we put in g(k, l, x, i)? The "higher" g(k, l, x, i) is, the more we like going from k to l at position i.
• Richer models using this additional power. Examples:
• Casino …
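To see what generalizing g buys, here is a small Python sketch of the recursion V_l(i) = max_k V_k(i – 1) + g(k, l, x, i) with an arbitrary score function g that may look at any positions of x. Both the decode routine and the toy feature function g_example are made up for illustration; g_example is not the casino example the slides go on to discuss.

```python
# Minimal sketch of the generalized Viterbi recursion V_l(i) = max_k V_k(i-1) + g(k, l, x, i),
# where g may inspect any positions of x (the CRF-style scoring the slides build toward).
import numpy as np

def decode(x, K, g):
    """Return the best-scoring state path under an arbitrary score g(k, l, x, i).
    State -1 is used as a dummy 'previous state' at position 0."""
    N = len(x)
    V = np.full((N, K), -np.inf)
    ptr = np.zeros((N, K), dtype=int)
    for l in range(K):
        V[0, l] = g(-1, l, x, 0)
    for i in range(1, N):
        for l in range(K):
            scores = [V[i - 1, k] + g(k, l, x, i) for k in range(K)]
            ptr[i, l] = int(np.argmax(scores))
            V[i, l] = scores[ptr[i, l]]
    # Trace back the best path from the best final state
    path = [int(np.argmax(V[-1]))]
    for i in range(N - 1, 0, -1):
        path.append(ptr[i, path[-1]])
    return list(reversed(path))

# Toy score function: an HMM-like part plus a feature that looks at a window of x,
# which a standard HMM emission e_l(x_i) could not do.
def g_example(k, l, x, i):
    score = 0.0
    score += -0.1 if k == l else -1.0          # plays the role of log a_kl
    score += 1.0 if x[i] == l else -1.0        # plays the role of log e_l(x_i)
    if 0 < i < len(x) - 1 and x[i - 1] == x[i] == x[i + 1]:
        score += 0.5 if l == x[i] else 0.0     # feature depending on several positions of x
    return score

print(decode([0, 0, 1, 1, 1, 0], K=2, g=g_example))
```

In an HMM, g(k, l, x, i) is forced to decompose as a(k, l) + e(l, x_i); the generalized view keeps the same dynamic program but lets g use arbitrary features of x around position i.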

