DOC PREVIEW
Stanford CS 262 - Lecture Notes

This preview shows page 1-2-15-16-17-32-33 out of 33 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 33 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

RNA Secondary StructureSlide 2A Context Free GrammarExample: modeling a stem loopSlide 5A parse tree: alignment of CFG to sequenceAlignment scores for parsesThe Nussinov AlgorithmSlide 9Slide 10The Nussinov Algorithm and CFGsSlide 12Stochastic Context Free GrammarsComputational ProblemsNormal Forms for CFGsExample of converting a CFG to C.N.F.Another exampleDecoding: the CYK algorithmThe CYK algorithm (Cocke-Younger-Kasami)A SCFG for predicting RNA structureCYK for RNA foldingEvaluationThe Inside AlgorithmAlgorithm: InsideThe Outside AlgorithmAlgorithm: OutsideLearning for SCFGsSlide 28Summary: SCFG and HMM algorithmsThe Zuker algorithm – main ideasMethods for inferring RNA foldMultiple alignment and RNA foldingMutual informationRNA Secondary StructureaagacuucggaucuggcgacacccuacacuucggaugacaccaaagugaggucuucggcacgggcaccauucccaacuucggauuuugcuaccauaaagccuucggagcgggcguaacucHairpin LoopsStemsBulge loopInterior loopsMulti-branched loopA Context Free GrammarS  AB Nonterminals: S, A, BA  aAc | a Terminals: a, b, c, dB  bBd | bDerivation:S  AB  aAcB  …  aaaacccB  aaaacccbBd  …  aaaacccbbbbbbdddProduces all strings ai+1cibj+1dj, for i, j  0Example: modeling a stem loopS  a W1 uW1  c W2 gW2  g W3 cW3  g L cL  agucgWhat if the stem loop can have other letters in place of the ones shown?ACGGUGCCAG UCGExample: modeling a stem loopS  a W1 u | g W1 u W1  c W2 gW2  g W3 c | g W3 uW3  g L c | a L uL  agucg | agccg | cugugcMore general: Any 4-long stem, 3-5-long loop:S  aW1u | gW1u | gW1c | cW1g | uW1g | uW1aW1  aW2u | gW2u | gW2c | cW2g | uW2g | uW2aW2  aW3u | gW3u | gW3c | cW3g | uW3g | uW3aW3  aLu | gLu | gLc | cLg | uLg | uLaL  aL1 | cL1 | gL1 | uL1L1  aL2 | cL2 | gL2 | uL2L2  a | c | g | u | aa | … | uu | aaa | … | uuuACGGUGCCAG UCGGCGAUGCUAG CCGGCGAUGUUCUG UCGA parse tree: alignment of CFG to sequenceACGGUGCCAG UCGA C G G A G U G C C C G USW1W2W3L•S  a W1 u•W1  c W2 g•W2  g W3 c•W3  g L c•L  agucgAlignment scores for parsesWe can define each rule X  s, where s is a string,to have a score.Example:W  a W’ u: 3 (forms 3 hydrogen bonds)W  g W’ c: 2 (forms 2 hydrogen bonds)W  g W’ u: 1 (forms 1 hydrogen bond)W  x W’ z -1, when (x, z) is not an a/u, g/c, g/u pairQuestions:-How do we best align a CFG to a sequence? (DP)-How do we set the parameters? (Stochastic CFGs)The Nussinov AlgorithmLet’s forget CFGs for a momentProblem:Find the RNA structure with the maximum (weighted) number of nested pairingsAGACCUCUGGGCGGCAGUCUAUGCGAACGCGUCAUCAGCUGGAAGAAGGGAGAUCUUCACCAAUACUGAAUUGCACCACGCUUAAGACACCUAGCUUGUGUCCUGGAGGUCUAUAAGUCAGACCGCGAGAGGGAAGACUCGUAUAAGCGAThe Nussinov AlgorithmGiven sequence X = x1…xN,Define DP matrix:F(i, j) = maximum number of bonds if xi…xj folds optimallyTwo cases, if i < j:1. xi is paired with xjF(i, j) = s(xi, xj) + F(i+1, j-1)•xi is not paired with xjF(i, j) = max{ k: i  k < j } F(i, k) + F(k+1, j)i ji ji jkF(i, j)F(i, k)F(k+1, j)The Nussinov AlgorithmInitialization:F(i, i-1) = 0; for i = 2 to NF(i, i) = 0; for i = 1 to NIteration:For i = 2 to N:For i = 1 to N – lj = i + l – 1F(i+1, j -1) + s(xi, xj)F(i, j) = maxmax{ i  k < j } F(i, k) + F(k+1, j)Termination: Best structure is given by F(1, N)(Need to trace back; refer to the Durbin book)The Nussinov Algorithm and CFGsDefine the following grammar, with scores:S  a S u : 3 | u S a : 3 g S c : 2 | c S g : 2 g S u : 1 | u S g : 1 S S : 0 | a S : 0 | c S : 0 | g S : 0 | u S : 0 |  : 0 Note:  is the “” stringThen, the Nussinov algorithm finds the optimal parse of a string with this grammarThe Nussinov AlgorithmInitialization:F(i, i-1) = 0; for i = 2 to NF(i, i) = 0; for i = 1 to N S  a | c | g | uIteration:For i = 2 to N:For i = 1 to N – lj = i + l – 1F(i+1, j -1) + s(xi, xj) S  a S u | …F(i, j) = maxmax{ i  k < j } F(i, k) + F(k+1, j) S  S STermination: Best structure is given by F(1, N)Stochastic Context Free GrammarsIn an analogy to HMMs, we can assign probabilities to transitions:Given grammarX1  s11 | … | sin…Xm  sm1 | … | smnCan assign probability to each rule, s.t.P(Xi  si1) + … + P(Xi  sin) = 1Computational Problems•Calculate an optimal alignment of a sequence and a SCFG(DECODING)•Calculate Prob[ sequence | grammar ](EVALUATION)•Given a set of sequences, estimate parameters of a SCFG(LEARNING)Normal Forms for CFGsChomsky Normal Form:X  YZX  aAll productions are either to 2 nonterminals, or to 1 terminalTheorem (technical)Every CFG has an equivalent one in Chomsky Normal Form(That is, the grammar in normal form produces exactly the same set of strings)Example of converting a CFG to C.N.F.S  ABCA  Aa | aB  Bb | bC  CAc | cConverting:S  AS’S’  BCA  AA | aB  BB | bC  DC’ | cC’  cD  CASAB CAaaBbBbbCAcc aSAS’B CA Aa aB BB BbbbDC’CAcc aAnother exampleS  ABCA  C | aAB  bB | bC  cCd | cConverting:S  AS’S’  BCA  C’C’’ | c | A’AA’  aB  B’B | bB’  bC  C’C’’ | cC’  cC’’  CDD  dDecoding: the CYK algorithmGiven x = x1....xN, and a SCFG G,Find the most likely parse of x(the most likely alignment of G to x)Dynamic programming variable:(i, j, V): likelihood of the most likely parse of xi…xj,rooted at nonterminal VThen, (1, N, S): likelihood of the most likely parse of x by the grammarThe CYK algorithm (Cocke-Younger-Kasami)Initialization:For i = 1 to N, any nonterminal V, (i, i, V) = log P(V  xi)Iteration:For i = 1 to N-1 For j = i+1 to N For any nonterminal V,(i, j, V) = maxXmaxYmaxik<j (i,k,X) + (k+1,j,Y) + log P(V XY)Termination:log P(x | , *) = (1, N, S) Where * is the optimal parse tree (if traced back appropriately from above)A SCFG for predicting RNA structureS  a S | c S | g S | u S |   S a | S c | S g | S u  a S u | c S g | g S u | u S g | g S c | u S a  SS•Adjust the probability parameters to reflect bond strength etc•No distinction between non-paired bases, bulges, loops•Can modify to model these eventsL: loop nonterminalH: hairpin nonterminalB: bulge


View Full Document

Stanford CS 262 - Lecture Notes

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?