CMSC423 Bioinformatic Algorithms Databases and Tools RNA folding RNA folding Function of RNA molecules depends on how they fold based on nucleotide base pairing CMSC423 Fall 2009 2 CMSC423 Fall 2009 3 Types of structures Nested hairpin ACAUGGAUGU Pseudo knots UUCCG A AGGGCAACUCGA A A UGAGCU UUCCGAAGCUCAACGGGAAAUGAGCU CMSC423 Fall 2009 4 RNA folding not just for RNA RNA given the sequence figure out how it folds Parsing A BC Given a grammar A Parse a string of letters text program according to grammar S example Given a C program Find whether parentheses braces and brackets are matched up correctly if not where are the likely errors find the smallest number of corrections that would make the progam balanced Both can be solved by the same algorithm CMSC423 Fall 2009 5 From multiple alignment to structure GCCUUCGGGC GACAUCGGUC GGCUUTGGCC Find columns in the alignment where mutations are correlated Mutual information how correlated are the columns M i j f x i x j f xi x j log f xi x xi f j xj Mi j mutual information between columns i and j fxixj frequency of each of 16 pairs of nucleotides at columns i and j fxi frequency of each of 4 nucleotides at column i fxj frequency of each of 4 nucleotides at column j CMSC423 Fall 2009 6 Mutual information Ranges from 0 to 2 for a 4 letter alphabet Correlated columns mutual information high Advantages Don t need to know how RNA folds pseudo knots should pop out of the alignment Disadvantages Need many sequences in an alignment to compute frequencies The aligned sequences must be sufficiently divergent conserved columns provide no information CMSC423 Fall 2009 7 Nussinov s algorithm Assumes no pseudo knots Dynamic programming approach maximize of pairings S string of nucleotides representing the RNA molecule Sub problem F i j score of folding just S i j Initial values F i 1 i F i i F i i 1 0 CMSC423 Fall 2009 8 Nussinov s algorithm F i j is the maximum of i 1 i j I F i 1 j S i unpaired i j 1 j III F i 1 j 1 1 if S i 1 complementary to S j 1 S i paired with S j i 1 i j 1 j II F i j 1 S j unpaired CMSC423 Fall 2009 i k k 1 j IV maxk F i k F k 1 j Branch 9 Questions In what order do we fill the dynamic programming table How can we ensure that loops consist of at least k nucleotides Note related to CYK parsing algorithm for Chomsky Normal Form grammars CMSC423 Fall 2009 10 G G G A A A U C C G G G A A A U C C CMSC423 Fall 2009 F i 1 j F i j 1 F i 1 j 1 1 if paired maxkF i k F k 1 j 11 G G G G A G G A A A U C C 0 0 0 0 0 0 1 2 3 0 0 0 0 0 0 1 2 3 0 0 0 0 1 2 0 0 0 0 1 1 1 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 A A U C C CMSC423 Fall 2009 0 GGGAAAUCC 2 12 A better objective function Find the RNA fold that minimizes the Gibbs free energy Zucker s algorithm keeps track of Stacking energy f of base pairs in a stem Loop energy f length of loop Bulge energy f length of bulge Dangle energy f length of dangle Computation is done with an extension of the traditional Nussinov dynamic programming approach One extension compute sub optimal folds during backtracking try multiple paths CMSC423 Fall 2009 13 Question How do you change Nussinov s algorithm to allow the computation of the stacking energy score depends on of next to next pairings Hint think affine gap penalties CMSC423 Fall 2009 14
View Full Document
Unlocking...