Stanford CS 262 - Predicting RNA Secondary Structure - D2923932

Home> Schools> Stanford University> Computer Science (CS) > CS 262> Predicting RNA Secondary Structure

DOC PREVIEW

Stanford CS 262 - Predicting RNA Secondary Structure

School name Stanford University

Course Cs 262- Computational Genomics

Pages 6

This preview shows page 1-2 out of 6 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 6 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Predicting RNA Secondary Structure CS262: Computational Genomics Lecture Notes from May 21, 2002 Submitted by: Anton Dizhur (Part I) and Oliver Kaljuvee (Part II) RNA is a string of the { A, U, G, C } nucleic acids. A string of these nucleic acids folds upon itself to form secondary structures much like the one shown above. RNA folds because certain pairings of nucleic acids are energetically favorable (form Hydrogen bonds), so if we start with an unwound RNA sequence, the secondary structure is formed through these chemical attractions. Specifically, G*C pair forms a 3-Hydrogen bonds A*U pair forms a 2-Hydrogen bonds G*U pair forms a 1-Hydrogen bond RNA’s functionality very much depends on its secondary structure. So we want to be able to predict the secondary structure, given the RNA sequence. More specifically, we want to predict the Hydrogen bond pairings between the nucleic acids, given the RNA sequence.2 The HMM technique we used to predict the alignments will not be sufficient to predict the long-range Hydrogen-bond interactions, so we will model them with Context Free Grammars (CFG). For example, given an RNA sequence ACGAAACCGU, the following CFG expansion produces the RNA structure on the right: S ! aX1u X1 ! cX2g X2 ! gX3c X3 ! aaac So, great, we looked at the sequence and omnisciently created the CFG that produces the optimal secondary structure; how does this help us predict the structure of an arbitrary RNA sequence? - You’re right, it doesn’t. We will need to use Stochastic Context Free Grammars for that. Stochastic Grammars Introduction What is a stochastic grammar? - It’s a grammar that has a probability distribution associated with the productions. Probabilities for all productions of a given non-terminal must sum to 1. Then, given the probability distribution θ associated with the grammar, we generate a string X, with probability P (X | θ). X ! R1 | R2 | R3 …… | Rn if pi = P (X ! Ri), then [1-N] Σ pi = 1 Example: The following Stochastic grammar statistically produces DNA strings of average size L. S ! aS | gS | cS | tS | ε P (S ! aS) = P (S ! gS) = P (S ! cS) = P (S ! tS) = (1 – 1/L) / 4 P (S ! ε) = 1 / L3 Then, given any string AATGCGT… of length l we can calculate its probability as P (AATGCGT…) = [(1 – 1/L) / 4] l x 1 / L Stochastic Regular Grammars """" HMMs: Proof of equivalency by example. An HMM with alphabet { 0, 1 }: Parameters: a – transition probability e – emission probability a11, a12, a13, a21, a22, a23, a31, a32, a33 e1(0), e2(0), e3(0) A Corresponding Stochastic Regular Grammar: S ! X1 X1 ! 0X1 | 1X1 | 0X2 | 1X2 | 0X3 | 1X3 Parameters: p11 p12 p13 p14 p15 p16 probabilities: pij i ∋ [1, 3], j ∋ [1, 6] X2 ! 0X1 | 1X1 | 0X2 | 1X2 | 0X3 | 1X3 p21 p22 p23 p24 p25 p26 X3 ! 0X1 | 1X1 | 0X2 | 1X2 | 0X3 | 1X3 p31 p32 p33 p34 p35 p36 Why are these equivalent? - Because we can calculate one set of parameters given the other, as follows: p11 + p12 = a11 p13 + p14 = a12 p15 + p16 = a13 p11 / (p11 + p12) = e1(0) p13 / (p13 + p14) = e1(0) p15 / (p15 + p16) = e1(0) 6 variables, 6 unknowns – done. Repeat for the p2j, p3j.4 Stochastic CFG (Context Free Grammar) kjrXrX→→!11 11=∑=kiir Example: Productions: ε|||||||||||||||||||3333333333322222221111111uXgXcXaXXguXugXcgXauXgcXaaXXguXugXcgXauXgcXaaXXguXugXcgXauXgcXaaXS→→→→ Issues: (1) decoding (2) evaluation (3) learning Problem: You are given examples of known RNAs; do learning. Review of CNF (Chomsky Normal Form) To get CNF, we convert all productions to such productions that all non-terminals would yield either two other non-terminals or a terminal: aXYZX→→ Definitions: Let G be a SCFG and let mWWS...1= be non-terminals, then Termination: )(:),(zyvvWWWPzyt→ Emission: )(:)(bWPbevv→ Such that: ∑∑=+zybvvbezyt,1)(),(5 Evaluation: Given nXXX...1=, find ).|(θXP Inside Algorithm: There are some strong similarities between the Inside Algorithm and the Forward Algorithm for the HMMs. ] vstarting,...[),,(1jxxPvjia= )|()1,,1(θXPna= ji≤ Initialization: )(),,(ivxeviia= Induction: ∑∑∑⋅+⋅=<kvzyjizytzjkaykiavjia),(),,1(),,(),,( Termination: )|()1,,1(θXPna= Let n: |X| and M: number of non-terminals, then memory performance is O(n2M) and running time is O(n3M3). (Typical RNA sequence is of order 102.) Outside Algorithm There are some strong similarities between the Outside Algorithm and the Backward Algorithm for the HMMs. ] vstarting and derived, is ...,...[:),,(111njixxxxPvjib+− Initialization: 1n ,0),,1(1)1,,1(≠==vnbnb6 Induction: ∑∑∑⋅⋅−=kyzyvztyjkbzikavjib),(),,(),1,(),,(= ∑∑∑∑∑∑+==⋅⋅++⋅⋅−11),(),,(),,1(),(),,(),1,(jkyyzkyyzzvtykibzkjavztyjkbzika Termination: ∑⋅=vivxixeviibxP)(),,()|(θ

View Full Document

Stanford CS 262 - Predicting RNA Secondary Structure

Sign up for free to view:

This document and 3 million+ documents and flashcards
High quality study guides, lecture notes, practice exams
Course Packets handpicked by editors offering a comprehensive review of your courses
Better Grades Guaranteed


School:
Email:
New Password:
Confirm Password:

This preview shows page 1-2 out of 6 pages.

Stanford CS 262 - Predicting RNA Secondary Structure

Sign up for free to view:

Please select your school