CMSC 423 RNA Folding Slides By Carl Kingsford Department of Computer Science University of Maryland College Park Based on Section 6 5 of Algorithm Design by Kleinberg Tardos RNA Folding RNA ACGGGGUUUAAAUUCCCUAUAT In the cell RNA folds up G and C stick together A and U stick together RNA Fold Example G C G C A U A U U A C G G C U A A U A A G A G C G C A A U C C G U U G A C A U C C C G A A G A C C C A G G U A U G U A U C C C G U G RNA Folding Rules RNA Folding Rules 1 If two bases are closer than 4 bases apart they cannot pair 2 Each base is matched to at most one other base 3 The allowable pairs are U A and G C 4 Pairs cannot cross G C G C A U A U U A C G G C U A U G C U A A G G C C U U A G C A No Crossings This is not allowed i k j l In other words if i j and k are paired then we must have i k j Paired bases have to be nested RNA Folding Problem RNA Folding Given a string r b1 b2 b3 bn with bi A C U G Find the largest set of pairs S i j where i j 1 2 n that satisfies the RNA Folding Rules Goal match as many pairs of bases as possible Subproblems j is not paired with anything U G C U A U G G C C U U A G C 1 A j OPT 1 j 1 j is paired with some t j 4 U G C U 1 A U G G C C U U t OPT 1 t 1 A G C A j OPT t 1 j 1 Subproblems 2 We have a subproblem for every interval i j How many subproblems are there Subproblems 2 We have a subproblem for every interval i j How many subproblems are there n 2 O n2 Recurrence If j i 4 OPT i j 0 If j i 4 OPT i j 1 OPT i j max maxt 1 OPT i t 1 OPT t 1 j 1 In the 2nd case above we try all possible t to pair with j That is t runs from i to j 4 Order to Solve the Problems In what order should we solve the subproblems What problems do we need to solve OPT i j Order to Solve the Problems In what order should we solve the subproblems What problems do we need to solve OPT i j OPT i t 1 and OPT t 1 j 1 for every t between i and j In what sense are these problems smaller Order to Solve the Problems In what order should we solve the subproblems What problems do we need to solve OPT i j OPT i t 1 and OPT t 1 j 1 for every t between i and j In what sense are these problems smaller They involve smaller intervals of the string Subproblem Ordering We solve OPT i j in order of increasing value of j i Pseudocode Initialize OPT i j to 0 for 1 i j n For k 5 6 n 1 For i 1 2 n k Set j i k interval length interval start interval end find the best t best t 0 For t i j 1 If rna t is complementary to rna j best t max best t 1 OPT i t 1 OPT t 1 j 1 Either pair j with t or nothing OPT i j max best t OPT i j 1 EndFor EndFor Return OPT 1 n Running Time 1 O n2 subproblems 2 Now it takes O n time to solve each subproblem have to search over all possible choices for t 3 Total running time is O n3 Code def rnafold rna n len rna OPT make matrix n n Arrows make matrix n n for k in xrange 5 n interval length for i in xrange n k interval start j i k interval end best t OPT i j 1 arrow 1 for t in xrange i j if is complement rna t rna j val 1 OPT i t 1 if t i else 0 OPT t 1 j 1 if val best t best t arrow val t OPT i j best t Arrows i j arrow return OPT Arrows Backtrace Code def rna backtrace Arrows Pairs holds the pairs in the optimal solution Stack 0 len Arrows 1 tracks cells we have to visit while len Stack 0 i j Stack pop if j i 4 continue if cell is base case skip it Arrow 1 means we didn t match j if Arrows i j 1 Stack append i j 1 else t Arrows i j Pairs append t j save that j matched with t add the two daughter problems if t i Stack append i t 1 Stack append t 1 j 1 return Pairs Notes Recap This is essentially Nussinov s algorithm which was proposed for finding RNA structures in 1978 Same dynamic programming idea write the answer to the full problem in terms of the answer to smaller problems Still have an O n n matrix to fill in Main differences from sequence alignment 1 We fill in the matrix in a different order entries i j in order of increasing j i 2 We have to try O n possible subproblems inside the max This leads to an O n3 algorithm
View Full Document
Unlocking...