RNA FoldingCMSC 423: RNA FoldingSlides By: Carl KingsfordDepartment of Computer ScienceUniversity of Maryland, College ParkBased on Section 6.5 of Algorithm Design by Kleinberg & Tardos.RNA FoldingRNA: ACGGGGUUUAAAUUCCCUAUATIn the cell, RNA folds up:•G and C stick together•A and U stick togetherRNA Fold ExampleG CCGUAA UU AC GCGAUAUGGGUAAAA G C CGGCUUAAAGACCGGUCUUUACCCCGGAUAUGCCCCAARNA Folding RulesRNA Folding Rules:1 If two bases are closer than 4 bases apart, they cannot pair.2 Each base is matched to at most one other base.3 The allowable pairs are {U, A} and {G , C }.4 Pairs cannot “cross.”G CCGUAA UU AC GCGAUG CCG UAA UU AC G CG AUNo CrossingsThis is not allowed:i jk lIn other words, if (i, j) and (k, `) are paired, then we must havei < k < ` < j.Paired bases have to be nested.RNA Folding ProblemRNA FoldingGiven: a string r = b1, b2, b3, . . . , bn, with bi∈ {A,C,U,G}Find: the largest set of pairs S = {(i, j)}, wherei, j ∈ {1, 2, . . . , n}, that satisfies the RNA Folding Rules.Goal: match as many pairs of bases as possible.SubproblemsG CCG UUA UU AC G CG AU1 jG CCG UUA UU AC G CG AU1 jj is not paired with anythingj is paired with some t ≤ j -4tOPT(t+1, j-1)OPT(1, t-1)OPT(1, j-1)Subproblems, 2We have a subproblem for every interval (i, j).How many subproblems are there?n2= O(n2)Subproblems, 2We have a subproblem for every interval (i, j).How many subproblems are there?n2= O(n2)RecurrenceIf j − i ≤ 4: OPT (i, j) = 0If j − i > 4:OPT (i, j) = max(OPT (i, j − 1)maxt{1 + OPT (i, t − 1) + OPT (t + 1, j − 1)}In the 2nd case above, we try all possible t to pair with j.That is t runs from i to j − 4.Order to Solve the Problems•In what order should we solve the subproblems?•What problems do we need to solve OPT (i , j)?•OPT (i, t − 1) and OPT (t + 1, j − 1)for every t between i and j.•In what sense are these problems “smaller?”•They involve smaller intervals of the string.Subproblem OrderingWe solve OPT (i , j) in order of increasing value of j − i .Order to Solve the Problems•In what order should we solve the subproblems?•What problems do we need to solve OPT (i , j)?•OPT (i, t − 1) and OPT (t + 1, j − 1)for every t between i and j.•In what sense are these problems “smaller?”•They involve smaller intervals of the string.Subproblem OrderingWe solve OPT (i , j) in order of increasing value of j − i .Order to Solve the Problems•In what order should we solve the subproblems?•What problems do we need to solve OPT (i , j)?•OPT (i, t − 1) and OPT (t + 1, j − 1)for every t between i and j.•In what sense are these problems “smaller?”•They involve smaller intervals of the string.Subproblem OrderingWe solve OPT (i , j) in order of increasing value of j − i .PseudocodeInitialize OPT[i,j] to 0 for 1 ≤ i, j ≤ nFor k = 5, 6,. . ., n-1 // interval lengthFor i = 1,2,. . ., n-k // interval startSet j = i + k // interval end// find the best tbest_t = 0For t = i,. . .,j-1:If rna[t] is complementary to rna[j]:best_t = max(best_t, 1 + OPT[i,t-1]+OPT[t+1,j-1])// Either pair j with t or nothingOPT[i,j] = max(best_t, OPT[i,j-1])EndForEndForReturn OPT[1,n]Running Time1 O(n2) subproblems.2 Now, it takes O(n) time to solve each subproblem.(have to search over all possible choices for t.)3 Total running time is O(n3).Codedef rnafold(rna):n = len(rna)OPT = make_matrix(n, n)Arrows = make_matrix(n, n)for k in xrange(5, n): # interval lengthfor i in xrange(n-k): # interval startj = i + k # interval endbest_t = OPT[i][j-1]arrow = -1for t in xrange(i, j):if is_complement(rna[t], rna[j]):val = 1 + (OPT[i][t-1] if t > i else 0) + OPT[t+1][j-1]if val >= best_t: best_t, arrow = val, tOPT[i][j] = best_tArrows[i][j] = arrowreturn OPT, ArrowsBacktrace Codedef rna_backtrace(Arrows):Pairs = [] # holds the pairs in the optimal solutionStack = [(0, len(Arrows) - 1)] # tracks cells we have to visitwhile len(Stack) > 0:i, j = Stack.pop()if j - i <= 4: continue # if cell is base case, skip it# Arrow = -1 means we didn’t match jif Arrows[i][j] == -1:Stack.append((i, j - 1))else:t = Arrows[i][j]Pairs.append((t, j)) # save that j matched with t# add the two daughter problemsif t > i: Stack.append((i, t - 1))Stack.append((t + 1, j - 1))return PairsNotes & Recap•This is essentially “Nussinov’s algorithm”, which wasproposed for finding RNA structures in 1978.•Same dynamic programming idea: write the answer to the fullproblem in terms of the answer to smaller problems.•Still have an O(n × n) matrix to fill in.•Main differences from sequence alignment:1 We fill in the matrix in a different order: entries (i , j) in orderof increasing j − i.2 We have to try O(n) possible subproblems inside the max.This leads to an O(n3)
View Full Document