L2 De novo prediction of RNA structure CSE280B 06 Bafna Pevzner De novo RNA prediction a combinatorial formulation Input A string over A C G U A pairs with U C pairs with G Output A subset of possible base pairs of maximum size such that No two base pairs intersect How can we compute this set efficiently CSE280B 06 Bafna Pevzner RNA structure 1 Nussinov s algorithm 1 Score B for every base pair No penalty for loops No pesudoknots 2 Let W i j be the score of the best structure of the subsequence from i to j for i n down to 1 for j i 1 to n B ri rj W i 1 j 1 W i j 1 W i j max W i 1 j W i k W k 1 j i k j CSE280B 06 Bafna Pevzner Obtaining RNA structure for i n downto 1 for j i 1 to n B ri rj W i 1 j 1 W i j 1 W i j max W i 1 j W i k W k 1 j i if 1 else if 2 j else if 3 else CSE280B 06 Bafna Pevzner S i j S i j S i j S i j k 1 2 3 4 Obtaining RNA Structure Procedure print RNA i j if S i j print i j print RNA i 1 j 1 else if S i j print RNA i 1 j else if S i j print RNA i j 1 else k S i j print RNA i k print RNA k 1 j CSE280B 06 Bafna Pevzner i j RNA structure example j i 1 2 3 4 5 6 2 0 3 1 4 1 ACGAUU 1 2 3 4 5 6 1 1 0 5 2 2 1 1 6 3 2 1 1 CSE280B 06 0 Bafna Pevzner RNA Structure Details CSE280B 06 Bafna Pevzner Base pairing Loops Base pairs arise from complementary nucleotides Single stranded Stack is when 2 base pairs are contiguous Loops arise when there are unpaired bases They are characterized by the number of base pairs that close it Hairpin closed by 1 base pair Bulge Interior Loops 2 base pairs Multiple Internal loops k base pairs CSE280B 06 Bafna Pevzner Scoring Loops multi loops Zuker Turner Energy Rules http www bioinfo rpi edu zukerm rna energy node2 html Stacking Energies Energy for Bulges and Interior Loops Energy for Multi loops CSE280B 06 Bafna Pevzner Improving the efficiency Computing structure on a 2000nt sequence is about 20s 2GHz 2Gb Wexler et al 2006 How much time would it take to search 10000 candidate regions What would be the impact of an O n speedup on this search CSE280B 06 Bafna Pevzner A triangle inequality is satisfied B ri rj W i 1 j 1 W i j 1 W i j max W i 1 j W i k W k 1 j i k j After the computations are done W i j W i k W k 1 j i k j CSE280B 06 Bafna Pevzner Towards more efficient algorithms Consider the main loop again B ri rj W i 1 j 1 W i j 1 W i j max W i 1 j W i k W k 1 j i k j We only need to branch when i forms a base pair Therefore w l o g B ri rj W i 1 j 1 W i j 1 W i j max W i 1 j B ri rk W i 1 k 1 W k 1 j i k j CSE280B 06 Bafna Pevzner A candidate list approach B ri rj W i 1 j 1 W i j 1 W i j max W i 1 j B ri rk W i 1 k 1 W k 1 j k CandidateList i Note that if C is the maximum size of the candidate list then the running time is O Cn2 How can we reduce the size of the candidate list Consider only those k s t bk is complementary to bi We will consider cases when k is NOT optimal using the triangle inequality CSE280B 06 Bafna Pevzner Candidate list Consider i j k s t W i j B ri rk W i 1 k 1 W k 1 j 1 Claim For all j j j is not a candidate in computing W i j CSE280B 06 Bafna Pevzner Proof Suppose W i j B ri rj W i 1 j 1 W j 1 j Recall B ri rj W i 1 j 1 B ri rk W i 1 k 1 W k 1 j Therefore W i j B ri rk W i 1 k 1 W k 1 j W j 1 j Using the triangle inequality W i j B ri rk W i 1 k 1 W k 1 j Contradiction CSE280B 06 Bafna Pevzner The candidate list algorithm For all i n down to 1 CandidateList i For all j i 1 to n B ri rj W i 1 j 1 W i j 1 W i j max W i 1 j B ri rk W i 1 k 1 W k 1 j k CandidateList i if W i j B r r W i 1 j 1 i j CandidateList i CandidateList i j CSE280B 06 Bafna Pevzner How large is the candidate list Here the answer depends very much on the energy computations If we simply count the number of base pairs the candidate list should be large probably For more realistic functions it is very small experiments with n upto 1000 showed C 6 CSE280B 06 Bafna Pevzner Running time improvement 70 A Study of Accessible Motifs and RNA Folding Complexity Ydo Wexler Chaya Zilberstein and Michal Ziv Ukelson LNBI 3909 p 473 ff 1000 CSE280B 06 Bafna Pevzner A thermodynamic argument It is conjectured and empirically shown that RNA has the polymer zeta property Probability that a string of length m folds into an RNA structure is b mc for constants b c 0 Note that if c 0 the number of indices j that base pair with i is O 1 In practice c 1 5 CSE280B 06 Bafna Pevzner Incorporating pseudoknots in structure prediction Pseudoknots are only loosely defined If any interleaving is allowed then simply select a structure in which a max number of nucleotides can be paired Under some restrictive notions the structure problem becomes NPhard CSE280B 06 Bafna Pevzner Akutsu s Simple pseudoknots i0 j 0 j0 k0 A region i0 k0 forms a simple pseudoknot if there exist positions j 0 j0 s t Each i j M satisfies either i0 i j 0 j j0 or j 0 i j0 j k0 Within one of the two groups there is no interleaving i i j 0 or j 0 i i implies j j CSE280B 06 Bafna Pevzner Simple pseudoknotted structure M1 M2 M Collection of simple pseudoknots Mi and other basepairs M None of the simple pseudoknot regions …
View Full Document