Unformatted text preview:

CMSC423 Bioinformatic Algorithms Databases and Tools inexact alignment dynamic programming gapped alignment CMSC423 Fall 2009 1 Inexact matching why Redundancy in genetic code nucleotide sequence may differ but proteins the same S Y P T D TCTTATCCTACTGAT TCATACCCCACAGAC Different amino acid sequences still fold the same way function unchanged generally changing an amino acid with a similar one doesn t affect protein function Aligning ESTs RNA sequences to DNA need to account for gaps corresponding to exons Need to account for sequencing errors Read chap 6 1 define ortholog paralog homolog CMSC423 Fall 2009 2 CMSC423 Fall 2009 3 Several hemoglobins HBB HUMAN HBB HORSE HBA HUMAN HBA HORSE MYG PHYCA GLB5 PETMA LGB2 LUPLU FFESFGDLSTPDAVMGNPKVKAHGKKVL GAFSDGLAHLDNLKGTF FFDSFGDLSNPGAVMGNPKVKAHGKKVL HSFGEGVHHLDNLKGTF YFPHF DLS HGSAQVKGHGKKVA DALTNAVAHVDDMPNAL YFPHF DLS HGSAQVKAHGKKVG DALTLAVGHLDDLPGAL KFDRFKHLKTEAEMKASEDLKKHGVTVL TALGAILKKKGHHEAEL FFPKFKGLTTADQLKKSADVRWHAERII NAVNDAVASMDDTEKMS LFSFLKGTSEVP QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATL From http bioinfo cnio es docus courses SEK2003Filogenias seq analysis multiple html CMSC423 Fall 2009 4 Warm up Longest Common Subsequence Given two strings of letters identify longest string of letters that occurs in the same order in both strings AG C GTAG G C G A GTCAG A A G C G T A G G 1 1 1 T 1 C 1 A 1 1 G 1 1 1 A 1 1 Find the longest chain of 1s moving to the right and down CMSC423 Fall 2009 5 Dynamic programming Idea re use previously computed information LCS i j longest common subsequence of strings S1 1 i S2 1 j i j A G C G T A G G 1 1 1 T 1 C 1 A 1 1 G 1 1 1 A 1 1 CMSC423 Fall 2009 LCS i j is the maximum of 1 if S1 i S2 j LCS i 1 j 1 1 else LCS i 1 j 1 2 LCS i 1 j 3 LCS i j 1 Goal find LCS m n 6 Computing the LCS table Row 0 and column 0 easy to fill Fill the rest column by column Find the actual sequence trace back pointers G T C A G A A 0 0 0 1 0 1 G C G T A G 1 0 1 0 0 1 0 1 1 1 2 2 CMSC423 Fall 2009 G T C A G A A 0 0 0 1 0 1 G 1 1 1 1 2 2 C 0 1 2 2 2 2 G 1 1 2 2 3 3 T 0 2 2 2 3 3 A 0 2 2 3 3 4 7 G 1 2 2 3 4 4 Extending to sequence alignment AG C GTAG GTCAG A In LCS mis alignments were free What happens if we pay for our mistakes this also allows us to account for similar aminoacids Value A A 10 Value A G 5 Value A 2 etc The same dynamic programming algorithm works CMSC423 Fall 2009 8 The recurrences AG C GTAG GTCAG AScore i j is the maximum of 1 Score i 1 j 1 Value S1 i S2 j AG C G AG C G GTCAG GTCAT 2 Score i 1 j Value S1 i S1 i aligned to gap AG C GT GTCAG3 Score i j 1 Value S2 j S2 j aligned to gap AG C GTCA CMSC423 Fall 2009 9 The dynamic programming table Score i j is the maximum of 1 Score i 1 j 1 Value S1 i S2 j S1 i 1 S2 j 1 aligned 2 Score i 1 j Value S1 i S1 i aligned to gap 3 Score i j 1 Value S2 j S2 j aligned to gap G T C A G A 0 2 4 6 8 10 12 A 2 4 6 8 CMSC423 Fall 2009 G 4 8 6 4 C 6 6 4 16 G 8 T A G 10 12 14 Value A A 10 Value A G 5 Value A 2 Note we only look at 3 adjacent boxes 10 Intuition What is the best way to align strings S1 and S2 just look at last character for now what is it aligned to S1 n S2 m AG C GTAG GTCAG A S1 n S2 m S1 n S2 m CMSC423 Fall 2009 11 The recurrences AG C GTAG GTCAG AScore i j is the maximum of 1 Score i 1 j 1 Value S1 i S2 j AG C G AG C G GTCAG GTCAT 2 Score i 1 j Value S1 i S1 i aligned to gap AG C GT GTCAG3 Score i j 1 Value S2 j S2 j aligned to gap AG C GTCA CMSC423 Fall 2009 12 The dynamic programming table Score i j is the maximum of 1 Score i 1 j 1 Value S1 i S2 j S1 i 1 S2 j 1 aligned 2 Score i 1 j Value S1 i S1 i aligned to gap 3 Score i j 1 Value S2 j S2 j aligned to gap G T C A G A 0 2 4 6 8 10 12 A 2 4 6 8 CMSC423 Fall 2009 G 4 8 6 4 C 6 6 4 16 G 8 T A G 10 12 14 Value A A 10 Value A G 5 Value A 2 Note we only look at 3 adjacent boxes 13 How do you output the result Goal produce the nice string with gaps that is shown in the examples Idea create the string backwards starting from the right As you follow backtrack pointers if you follow diagonal pointer add characters to both output strings aligned versions of original strings if you move up add gap character to string represented on the y axis add string character to string represented on x axis if you move left gap goes in string on x axis and character in string on y axis When you reach 0 0 output the two aligned strings CMSC423 Fall 2009 14 Local vs global alignment Can we change the algorithm to allow S1 to be a substring of S2 ACAGTTGACCCGTGCAT TG CC G Key idea gaps at the end of S2 are free Simply change the first row in the DP table to 0s Answer is no longer Score n m rather the largest value in the last row CMSC423 Fall 2009 15 Sub string alignment C G T 0 2 4 6 A 0 G 0 C 0 10 8 6 G 0 8 20 18 T 0 A 0 18 30 28 G 0 26 AGCGTAG CGT CMSC423 Fall 2009 16 Local alignment What if we just want a region of similarity ACAGTTGACCCGTGCAT GTCATG CC GAGATCG First row and column set to 0s Allow alignment to start anywhere Score i j max 0 case 1 case 2 case 3 Answer is location in matrix with highest score CMSC423 Fall 2009 17 Local alignment C T C G T C 0 0 0 0 0 0 0 A 0 G 0 C 0 G 0 T 0 A 0 G 0 0 10 20 30 AGCGTAG CTCGTC CMSC423 Fall 2009 18 Various flavors of alignment Alignment problem also called edit distance how …


View Full Document

UMD CMSC 423 - Inexact alignment dynamic programming, gapped alignment

Documents in this Course
Midterm

Midterm

8 pages

Lecture 7

Lecture 7

15 pages

Load more
Loading Unlocking...
Login

Join to view Inexact alignment dynamic programming, gapped alignment and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Inexact alignment dynamic programming, gapped alignment and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?