CS 262 Lecture 20: RNA Secondary Structure Prediction
Scribed by Michael Yu
March 15, 2007

This lecture continues the discussion of using stochastic context-free grammars (SCFGs) for predicting the secondary structure (loops, bulges, etc.) of RNA.

Stochastic Context Free Grammars

An SCFG is a context-free grammar (CFG) with probabilities assigned to every production rule such that the probabilities of all productions out of a given non-terminal sum to 1. That is, given a grammar

  X_1 → s_11 | ... | s_1n
  ...
  X_m → s_m1 | ... | s_mn,

we assign probabilities to the productions such that Pr(X_i → s_i1) + ... + Pr(X_i → s_in) = 1 for every X_i.

We are interested in the decoding, evaluation, and learning problems for SCFGs, just as we were for hidden Markov models (HMMs). These problems are solved by the CYK (Cocke-Younger-Kasami) algorithm, the inside and outside algorithms, and expectation maximization (EM), respectively.

The CYK Algorithm (Decoding)

The CYK algorithm is the analogue of the Viterbi algorithm; it solves the decoding problem for SCFGs. Given a sequence and an SCFG in Chomsky Normal Form (CNF), the CYK algorithm finds the most likely parse tree of the sequence under the SCFG. The algorithm is as follows:

Initialization:
  For i = 1 to N, for any non-terminal V:
    γ(i, i, V) = log Pr(V → x_i)

Iteration:
  For i = N − 1 down to 1:
    For j = i + 1 to N:
      For any non-terminal V:
        γ(i, j, V) = max_X max_Y max_{i ≤ k < j} [ γ(i, k, X) + γ(k+1, j, Y) + log Pr(V → XY) ]

Termination:
  log Pr(x | θ, π*) = γ(1, N, S),
where π* is the optimal parse tree (recovered by tracing back through the maximizations above).

The basic idea is that γ(i, j, V) is the log-likelihood of the most probable parse of x_i ··· x_j rooted at V. In CNF, a rule rooted at V either produces a terminal x_i or two non-terminals X and Y. The first case is handled in the initialization step by setting γ(i, i, V) = log Pr(V → x_i). The second case is handled recursively: if we take a production of the form V → XY, then X is the root of a parse subtree for x_i ··· x_k and Y is the root of a parse subtree for x_{k+1} ··· x_j. The log-likelihood of the best such parse is the sum of the best log-likelihoods of the two subtrees, γ(i, k, X) and γ(k+1, j, Y), plus log Pr(V → XY). So we take the maximum of this sum over all productions of the form V → XY and all split points k ∈ [i, j).
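
To make the recursion concrete, here is a minimal sketch of CYK decoding in Python. It is not from the lecture: the grammar representation and names such as emit, binary, and cyk are illustrative. The grammar is assumed to be in CNF, with emission probabilities Pr(V → a) and binary-rule probabilities Pr(V → XY) supplied as dictionaries, and the table is filled over spans of increasing length so that every subproblem is available when needed.

  import math

  def cyk(x, nonterminals, start, emit, binary):
      """CYK decoding for an SCFG in Chomsky Normal Form.

      x            -- observed sequence, e.g. "ACGU"
      nonterminals -- iterable of non-terminal symbols
      start        -- the start symbol S
      emit         -- dict: (V, char)  -> Pr(V -> char)
      binary       -- dict: (V, X, Y)  -> Pr(V -> X Y)

      Returns (best log-probability, traceback table). Spans are
      0-based and inclusive, unlike the 1-based notes above.
      """
      N = len(x)
      NEG_INF = float("-inf")
      gamma = {}   # gamma[(i, j, V)] = best log-prob of a parse of x[i..j] rooted at V
      back = {}    # back[(i, j, V)]  = (k, X, Y) achieving that maximum

      # Initialization: single characters, gamma(i, i, V) = log Pr(V -> x_i)
      for i in range(N):
          for V in nonterminals:
              p = emit.get((V, x[i]), 0.0)
              gamma[(i, i, V)] = math.log(p) if p > 0 else NEG_INF

      # Iteration: fill longer spans from shorter ones
      for length in range(2, N + 1):
          for i in range(N - length + 1):
              j = i + length - 1
              for V in nonterminals:
                  best, arg = NEG_INF, None
                  for (W, X, Y), p in binary.items():
                      if W != V or p <= 0:
                          continue
                      for k in range(i, j):   # split point, i <= k < j
                          s = gamma[(i, k, X)] + gamma[(k + 1, j, Y)] + math.log(p)
                          if s > best:
                              best, arg = s, (k, X, Y)
                  gamma[(i, j, V)] = best
                  back[(i, j, V)] = arg

      # Termination: best log-probability of a full parse rooted at S
      return gamma[(0, N - 1, start)], back

A traceback through back, starting from (0, N − 1, S) and stopping at single-character spans, recovers the optimal parse tree π*.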
The Inside and Outside Algorithms (Evaluation)

The inside and outside algorithms are analogous to the forward and backward algorithms; they solve the evaluation problem for SCFGs. The inside algorithm calculates

  a(i, j, V) = Pr(x_i, ..., x_j is generated by non-terminal V).

It is very similar to the CYK algorithm, except that we sum instead of maximize over all productions and split points k, and we work with probabilities rather than log-probabilities. The algorithm is as follows:

Initialization:
  For i = 1 to N, for any non-terminal V:
    a(i, i, V) = Pr(V → x_i)

Iteration:
  For i = N − 1 down to 1:
    For j = i + 1 to N:
      For any non-terminal V:
        a(i, j, V) = Σ_X Σ_Y Σ_{i ≤ k < j} a(i, k, X) a(k+1, j, Y) Pr(V → XY)

Termination:
  Pr(x | θ) = a(1, N, S)

The outside algorithm calculates

  b(i, j, V) = Pr(x, excluding x_i ··· x_j, is generated by S and the excluded part is rooted at V).

The algorithm is as follows:

Initialization:
  b(1, N, S) = 1
  For any other non-terminal V: b(1, N, V) = 0

Iteration:
  For i = 1 to N:
    For j = N down to i (skipping the initialized case i = 1, j = N):
      For any non-terminal V:
        b(i, j, V) = Σ_X Σ_Y Σ_{k < i} a(k, i−1, X) b(k, j, Y) Pr(Y → XV)
                   + Σ_X Σ_Y Σ_{k > j} a(j+1, k, X) b(i, k, Y) Pr(Y → VX)

Termination:
  For any i: Pr(x | θ) = Σ_V b(i, i, V) Pr(V → x_i)

The main idea is that if V is the root of the parse subtree for x_i ··· x_j, then there must be a production of the form Y → XV or Y → VX. Consider the first case (the second follows by similar reasoning). Since V is the root of the parse of x_i ··· x_j, X must be the root of the parse of x_k ··· x_{i−1} for some k < i. We take the product of the inside probability a(k, i−1, X) that X generates x_k ··· x_{i−1}, the outside probability b(k, j, Y) of generating x excluding the pieces covered by X and V, and the probability of the production Y → XV. We sum this product over all k < i and over all such productions. The case Y → VX, where V is the left child, is handled the same way by summing over k > j.

Expectation Maximization (Learning)

Parameter training for SCFGs is done very much like for HMMs. There are two cases. In the first case we have annotated examples (a set of sequences together with their RNA structures). In the second case we have only a set of sequences and do not know the corresponding structures. EM can be used in both cases, although a closed-form solution exists for the maximum-likelihood parameters in the first case, so EM would never be used in practice there.

Let c(V) be the expected number of times V is used in the parse of x. We can compute this from one run of the inside and outside algorithms:

  c(V) = (1 / Pr(x | θ)) Σ_{i=1}^{N} Σ_{j=i}^{N} a(i, j, V) b(i, j, V)

Every time V is used in the parse of x, it roots a subtree covering x_i ··· x_j for some i and j. V can show up multiple times, but these events are disjoint, so we can sum the probabilities and normalize by Pr(x | θ) to get the expected number of times V appears in the parse of x.

We can also calculate c(V → XY), the expected number of times the production V → XY appears in the parse of x:

  c(V → XY) = (1 / Pr(x | θ)) Σ_{i=1}^{N} Σ_{j=i}^{N} Σ_{k=i}^{j−1} b(i, j, V) a(i, k, X) a(k+1, j, Y) Pr(V → XY)

Here b(i, j, V) gives the probability of generating x excluding x_i ··· x_j with the excluded subtree rooted at V; Pr(V → XY) is the probability that V branches into the two subtrees X and Y; and a(i, k, X) and a(k+1, j, Y) are the inside probabilities of those two subtrees. The product of these terms is the probability that, in the parse of x, V → XY is used with X covering x_i ··· x_k and Y covering x_{k+1} ··· x_j. Again, we sum these products to get the expected count. The emission counts c(V → a) are obtained analogously, by summing b(i, i, V) Pr(V → x_i) over the positions i with x_i = a and normalizing by Pr(x | θ).

We can now re-estimate the parameters with EM (a sketch of this count computation appears at the end of these notes):

  Pr(V → XY) = c(V → XY) / c(V)
  Pr(V → a) = c(V → a) / c(V)

Stochastic Context Free Grammars vs Hidden Markov Models

The following table compares SCFGs with HMMs:

  Goal                HMM algorithm             SCFG algorithm
  Optimal parse       Viterbi                   CYK
  Estimation          Forward                   Inside
                      Backward                  Outside
  Learning            EM: forward / backward    EM: inside / outside
  Memory complexity   O(NK)                     O(N²K)
  Time complexity     O(NK²)                    O(N³K³)

Here K is the number of states in the HMM, or the number of non-terminals in the SCFG. Notice that SCFGs do not scale as well as HMMs.

There is a recent paper, which won the best paper award at the 2006 International Conference …
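
As promised above, here is a minimal sketch of the expected-count computation used in the EM re-estimation step. It is not from the lecture: it assumes the inside table a and outside table b have already been filled in (for example by direct implementations of the recursions above) and are stored as dictionaries keyed by 0-based, inclusive spans (i, j, V), with an entry (possibly 0.0) for every span and non-terminal; the names expected_counts, binary, and emit are illustrative.

  from collections import defaultdict

  def expected_counts(x, nonterminals, start, binary, emit, a, b):
      """Expected production counts for one training sequence x,
      given inside probabilities a[(i, j, V)] and outside
      probabilities b[(i, j, V)] (0-based, inclusive spans)."""
      N = len(x)
      px = a[(0, N - 1, start)]          # Pr(x | theta)

      c_V = defaultdict(float)           # c(V)
      c_bin = defaultdict(float)         # c(V -> X Y)
      c_emit = defaultdict(float)        # c(V -> a)

      # c(V) = (1/Pr(x)) * sum_{i <= j} a(i, j, V) * b(i, j, V)
      for i in range(N):
          for j in range(i, N):
              for V in nonterminals:
                  c_V[V] += a[(i, j, V)] * b[(i, j, V)] / px

      # c(V -> X Y) = (1/Pr(x)) * sum_{i <= k < j}
      #               b(i, j, V) * a(i, k, X) * a(k+1, j, Y) * Pr(V -> X Y)
      for (V, X, Y), p in binary.items():
          for i in range(N):
              for j in range(i + 1, N):
                  for k in range(i, j):
                      c_bin[(V, X, Y)] += (b[(i, j, V)] * a[(i, k, X)]
                                           * a[(k + 1, j, Y)] * p) / px

      # c(V -> a) = (1/Pr(x)) * sum_{i : x_i = a} b(i, i, V) * Pr(V -> x_i)
      for i in range(N):
          for V in nonterminals:
              p = emit.get((V, x[i]), 0.0)
              if p > 0:
                  c_emit[(V, x[i])] += b[(i, i, V)] * p / px

      return c_V, c_bin, c_emit

In practice the counts would be accumulated over all training sequences before applying the M-step ratios Pr(V → XY) = c(V → XY) / c(V) and Pr(V → a) = c(V → a) / c(V).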

