DOC PREVIEW
Stanford CS 262 - Predicting RNA Secondary Structure

This preview shows page 1-2 out of 6 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 6 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Predicting RNA Secondary Structure CS262: Computational Genomics Lecture Notes from May 21, 2002 Submitted by: Anton Dizhur (Part I) and Oliver Kaljuvee (Part II) RNA is a string of the { A, U, G, C } nucleic acids. A string of these nucleic acids folds upon itself to form secondary structures much like the one shown above. RNA folds because certain pairings of nucleic acids are energetically favorable (form Hydrogen bonds), so if we start with an unwound RNA sequence, the secondary structure is formed through these chemical attractions. Specifically, G*C pair forms a 3-Hydrogen bonds A*U pair forms a 2-Hydrogen bonds G*U pair forms a 1-Hydrogen bond RNA’s functionality very much depends on its secondary structure. So we want to be able to predict the secondary structure, given the RNA sequence. More specifically, we want to predict the Hydrogen bond pairings between the nucleic acids, given the RNA sequence.2 The HMM technique we used to predict the alignments will not be sufficient to predict the long-range Hydrogen-bond interactions, so we will model them with Context Free Grammars (CFG). For example, given an RNA sequence ACGAAACCGU, the following CFG expansion produces the RNA structure on the right: S ! aX1u X1 ! cX2g X2 ! gX3c X3 ! aaac So, great, we looked at the sequence and omnisciently created the CFG that produces the optimal secondary structure; how does this help us predict the structure of an arbitrary RNA sequence? - You’re right, it doesn’t. We will need to use Stochastic Context Free Grammars for that. Stochastic Grammars Introduction What is a stochastic grammar? - It’s a grammar that has a probability distribution associated with the productions. Probabilities for all productions of a given non-terminal must sum to 1. Then, given the probability distribution θ associated with the grammar, we generate a string X, with probability P (X | θ). X ! R1 | R2 | R3 …… | Rn if pi = P (X ! Ri), then [1-N] Σ pi = 1 Example: The following Stochastic grammar statistically produces DNA strings of average size L. S ! aS | gS | cS | tS | ε P (S ! aS) = P (S ! gS) = P (S ! cS) = P (S ! tS) = (1 – 1/L) / 4 P (S ! ε) = 1 / L3 Then, given any string AATGCGT… of length l we can calculate its probability as P (AATGCGT…) = [(1 – 1/L) / 4] l x 1 / L Stochastic Regular Grammars """" HMMs: Proof of equivalency by example. An HMM with alphabet { 0, 1 }: Parameters: a – transition probability e – emission probability a11, a12, a13, a21, a22, a23, a31, a32, a33 e1(0), e2(0), e3(0) A Corresponding Stochastic Regular Grammar: S ! X1 X1 ! 0X1 | 1X1 | 0X2 | 1X2 | 0X3 | 1X3 Parameters: p11 p12 p13 p14 p15 p16 probabilities: pij i ∋ [1, 3], j ∋ [1, 6] X2 ! 0X1 | 1X1 | 0X2 | 1X2 | 0X3 | 1X3 p21 p22 p23 p24 p25 p26 X3 ! 0X1 | 1X1 | 0X2 | 1X2 | 0X3 | 1X3 p31 p32 p33 p34 p35 p36 Why are these equivalent? - Because we can calculate one set of parameters given the other, as follows: p11 + p12 = a11 p13 + p14 = a12 p15 + p16 = a13 p11 / (p11 + p12) = e1(0) p13 / (p13 + p14) = e1(0) p15 / (p15 + p16) = e1(0) 6 variables, 6 unknowns – done. Repeat for the p2j, p3j.4 Stochastic CFG (Context Free Grammar) kjrXrX→→!11 11=∑=kiir Example: Productions: ε|||||||||||||||||||3333333333322222221111111uXgXcXaXXguXugXcgXauXgcXaaXXguXugXcgXauXgcXaaXXguXugXcgXauXgcXaaXS→→→→ Issues: (1) decoding (2) evaluation (3) learning Problem: You are given examples of known RNAs; do learning. Review of CNF (Chomsky Normal Form) To get CNF, we convert all productions to such productions that all non-terminals would yield either two other non-terminals or a terminal: aXYZX→→ Definitions: Let G be a SCFG and let mWWS...1= be non-terminals, then Termination: )(:),(zyvvWWWPzyt→ Emission: )(:)(bWPbevv→ Such that: ∑∑=+zybvvbezyt,1)(),(5 Evaluation: Given nXXX...1=, find ).|(θXP Inside Algorithm: There are some strong similarities between the Inside Algorithm and the Forward Algorithm for the HMMs. ] vstarting,...[),,(1jxxPvjia= )|()1,,1(θXPna= ji≤ Initialization: )(),,(ivxeviia= Induction: ∑∑∑⋅+⋅=<kvzyjizytzjkaykiavjia),(),,1(),,(),,( Termination: )|()1,,1(θXPna= Let n: |X| and M: number of non-terminals, then memory performance is O(n2M) and running time is O(n3M3). (Typical RNA sequence is of order 102.) Outside Algorithm There are some strong similarities between the Outside Algorithm and the Backward Algorithm for the HMMs. ] vstarting and derived, is ...,...[:),,(111njixxxxPvjib+− Initialization: 1n ,0),,1(1)1,,1(≠==vnbnb6 Induction: ∑∑∑⋅⋅−=kyzyvztyjkbzikavjib),(),,(),1,(),,(= ∑∑∑∑∑∑+==⋅⋅++⋅⋅−11),(),,(),,1(),(),,(),1,(jkyyzkyyzzvtykibzkjavztyjkbzika Termination: ∑⋅=vivxixeviibxP)(),,()|(θ


View Full Document

Stanford CS 262 - Predicting RNA Secondary Structure

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Predicting RNA Secondary Structure
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Predicting RNA Secondary Structure and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Predicting RNA Secondary Structure 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?