LEHIGH CSE 397 - RNA Secondary Structure Prediction

Unformatted text preview:

RNA Secondary Structure PredictionChemical Structure of RNAPartial Tertiary StructureYet Another Tertiary StructureOur Final Tertiary PictureA Partial RNA Secondary StructurePure Secondary StructureOur Basic ModelImplementing Model RestrictionsOur Two AlgorithmsIndependent Base Pair AlgorithmIndependent Base Pairs What Makes It “Easy”?Independent Base Pairs Basic ApproachIndependent Base Pairs NotationSlide 15Independent Base Pairs - AlgorithmIndependent Base Pairs: Question 1Independent Base Pairs: Question 2Independent Base Pairs: Question 3Independent Base Pairs: Question 4Independent Base Pairs: Question 5Independent Base Pairs: Question 6Loop Free Energy AlgorithmTypes of LoopsHairpin LoopBulge on i and jInterior loopHelical regionFree energy analysisFree Energy FunctionsLoop Energy FormulasFree Energy Calculation for interval (i,j)What is the Apparent Complexity?What is the Actual Complexity?Multiple SolutionsReferencesRNA Secondary Structure PredictionIntroduction•RNA is a single-stranded chain of the nucleotides A, C, G, and U. The string of nucleotides specifies the linear structure of the RNA strand.•When RNA folds, complementary nucleotides form base pairs (CG and AU).•The tertiary (3 dimensional) structure is too complicated for us to calculate. •We calculate only secondary structures, lists of base pairs.•Knowing the base pairs tells a lot about the 3 dimensional structure.Chemical Structure of RNA•Four base types.•Distinguishable ends.Partial Tertiary Structure•One illustrationYet Another Tertiary Structure•Found via googleOur Final Tertiary Picture•Very complexA Partial RNA Secondary StructurePure Secondary StructureOur Basic Model•RNA linear structure: R=r1 r2 . . . rn from {A,C,G,U}•RNA secondary structure: pairs (ri,rj) such that 0<i<j<n+1.•Goal: secondary structures with minimum free energy.Implementing Model Restrictions•No knots: pairs (ri,rj) and (rk,rl) such that i<k<j<l. RNA does contain knots. •Program loop structure.•No “close” base pairs: j-i>t for some t>0.•High free energy.•Complementary base pairs: A-U, C-G.•High free energy.Our Two Algorithms•Independent base pairs – quite easy, but inaccurate.•Calculate loops’ free energy – best we can do for today’s class.Independent Base Pair Algorithm•Assumption: Independent base pairs.•Advantage 1: Simpler calculations.•Advantage 2: Illustrates ideas for a much more accurate algorithm.•Disadvantage: Unrealistic answers.Independent Base PairsWhat Makes It “Easy”?•Assumption: The energy of each base pair is independent of all of the other pairs and the loop structure.•Consequence: Total free energy is the sum of all of the base pair free energies.Independent Base PairsBasic Approach•Use solutions for smaller strings to determine solutions for larger strings.•This is precisely the kind of decoupling required for dynamic programming algorithms to work.Independent Base Pairs Notation•a(ri,rj) – the free energy of a base pair joining ri and rj.•Si,j – The secondary structure of the RNA strand from base ri to base rj. Ie, the set of base pairs between ri and rj inclusive.•E(Si,j) – The free energy associated with the secondary structure Si,j.•We define a(ri,rj) large when constraints are violated.•Consider the RNA strand from position i to j.•Consider whether rj is paired •If rj is paired, E(Si,j)=E(Si,k-1)+a(k,j)+E(Sk+1,j-1) for some i-1<k<j•If rj isn’t paired, then E(Si,j)=E(Si,j-1)Independent Base Pairs:Calculating Free EnergyIndependent Base Pairs - Algorithm•We search for intervals with minimum free energy.•For each interval, the free energy is given by this formula: E(Si,j) = min( E(Si+1,j-1)+a(ri,rj), E(Si,k-1+a(ri,rk)+Sk+1,j-1), i -1<k<j+1 )•The free energy of the RNA strand is E(S1,n).Independent Base Pairs:Question 1•How does this formula deal with the case where rj isn’t paired with any base?•A special case of E(Si,k-1+a(ri,rk)+Sk+1,j-1), i -1<k<j+1•The special case with k=j.Independent Base Pairs:Question 2•What is the high level algorithm flow?1. Advance from smaller to larger intervals, calculating free energy costs.2. Trace back the path that corresponds to the maximum free energy cost.Independent Base Pairs:Question 3•In what orders can the intervals’ free energy costs be evaluated?1. Major = lower, minor = upper bound2. Major = upper, minor = lower bound3. Diagonally4. Any order (eg, random) that respects the partial order induced by inclusionIndependent Base Pairs:Question 4•What are the time and storage requirements of this algorithm? •Express your answer in terms of the number of bases in the RNA strand.•Since the number of intervals is quadratic, the storage requirements are quadratic.•Since the time requirement for each interval is linear, total time is cubic.Independent Base Pairs: Question 5•Why not simply calculate free energies as they are needed? Why store them at all?•Because the recursive calls would turn our polynomial algorithm into an exponential algorithm.Independent Base Pairs:Question 6•How does traceback work for this algorithm?1. Recalculate which subinterval yields the maximum free energy.2. Save traceback paths.Loop Free Energy Algorithm•An RNA molecule’s free energy is not independent of all other base pairs.•An RNA molecules free energy actually depends on its loop structure.•What do we mean by loops?Types of Loops•Each base pair (ri,rj) encloses a loop:1. Hairpin loop2. Bulge on i or j3. Interior loop4. Helical regionHairpin Loop•There are no base pairs (rk,rl) for i<k<l<j.Bulge on i and j•Bulge on i:•(ri,rj) and (rk,rj-1) are base pairs with k>i+1. •ri+1 is not paired.•The bulge on j is symmetric.Interior loop•(ri,rj) and (rk,rl) are base pairs with i+1<k1<k2<j-1.•ri+1 and rj-1 are not in base pairsHelical region•(ri,rj) and (ri+1,rj-1) are base pairs.Free energy analysis•E(Si,j) = E(Si+1,j) when ri isn’t paired.•E(Si,j) = E(Si,j-1) when rj isn’t paired. •E(Si,j) = min(E(Si,k)+E(Sk+1,j)) for i<k<l, k between i’s and j’s pairs when i and j are paired but not to each other•E(Si,j) = E(Li,j) where Li,j is loop energy when I and j are paired to each otherFree Energy Functions•a(ri,rj) – Free energy of base pair (ri,rj)•H(k) – Destabilizing free energy of a hairpin loop with size k.•R – Stabilizing free energy of adjacent base pairs


View Full Document

LEHIGH CSE 397 - RNA Secondary Structure Prediction

Download RNA Secondary Structure Prediction
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view RNA Secondary Structure Prediction and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view RNA Secondary Structure Prediction 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?