UMD CMSC 423 - Lecture 10 Inexact alignment dynamic programming, gapped alignment

CMSC423: Bioinformatic Algorithms, Databases and Tools
CMSC423 Fall 2008
Lecture 10: inexact alignment (dynamic programming, gapped alignment)

Intuition
• What is the best way to align strings S1 and S2?
• Just look at the last character for now: what is it aligned to?
[Figure: the three possibilities for the last column of an alignment: S1[n] aligned with S2[m], S1[n] aligned with a gap, or S2[m] aligned with a gap]

The recurrences
Score[i,j] is the maximum of:
1. Score[i-1, j-1] + Value(S1[i], S2[j])   (S1[i] aligned with S2[j])
2. Score[i-1, j] + Value(S1[i], -)   (S1[i] aligned to gap)
3. Score[i, j-1] + Value(-, S2[j])   (S2[j] aligned to gap)
[Figure: example alignment fragments illustrating each of the three cases]

The dynamic programming table
• Each box Score[i,j] is filled in with the recurrence above.
• Example values: Value(A, A) = 10, Value(A, G) = -5, Value(A, -) = -2
• Note: we only look at 3 adjacent boxes, those holding Score[i-1, j-1], Score[i-1, j] and Score[i, j-1].
[Figure: partially filled DP table for the example; the border boxes hold cumulative gap penalties 0, -2, -4, ..., -14]

How do you output the result?
• Goal: produce the "nice" string with gaps that is shown in the examples
• Idea: create the string backwards, starting from the right
• As you follow backtrack pointers (each box remembers which of the 3 cases produced its maximum):
  – if you follow the diagonal pointer: add a character to both output strings (the aligned versions of the original strings)
  – if you move up: add the next character of the string on the y axis to its output, and a gap character to the output of the string on the x axis
  – if you move left: add the next character of the string on the x axis to its output, and a gap character to the output of the string on the y axis
• When you reach (0,0), output the two aligned strings

Local vs. global alignment
• Can we change the algorithm to allow S1 to be a substring of S2?
  ACAGTTGACCCGTGCAT
  -----TG-CC-G-----
• Key idea: gaps at the end of S2 are free
• Simply change the first row in the DP table to 0s
• The answer is no longer Score[n, m], but rather the largest value in the last row

Sub-string alignment
[Figure: DP table for the sub-string alignment example, with the first row set to 0s]

Local alignment
• What if we just want a region of similarity?
  ACAGTTGACCCGTGCAT
       || || |
   GTCATG-CC-GAGATCG
• First row and column set to 0s
• Allow the alignment to start anywhere: Score[i,j] = max{0, case 1, case 2, case 3}
• The answer is the location in the matrix with the highest score
[Figure: DP table for a local alignment example, with the first row and column set to 0s]

Various flavors of alignment
• The alignment problem is also called "edit distance": how many changes do you have to make to a string to convert it into another one.
• Edit distance is also called Levenshtein distance
• Local alignment: Smith-Waterman
• Global alignment: Needleman-Wunsch

Gap penalties

How much do we pay for gaps?
• In the edit-distance/alignment framework: Cost(n gaps in a row) = n * Cost(gap)
• This doesn't work for e.g. RNA-DNA alignments
  ACAGTTCGACTAGAGGACCTAGACCACTCTGT
      TTCGA----------TAGACCAC
• Affine gap penalties: Cost(n gaps in a row) = Cost(gap open) + n * Cost(gap extension)
• The gap opening penalty is high, the gap extension penalty is low (once we start a gap we might as well pile more gaps on top)

Dynamic programming solution
• The traditional 1-table approach doesn't work anymore
• Instead, use 4 tables:
  – V: value of the best alignment between S1[1..i] and S2[1..j]
  – G: best alignment between S1[1..i] and S2[1..j] such that S1[i] is aligned with S2[j]
  – E: best alignment between S1[1..i] and S2[1..j] such that the alignment ends with a gap in S1
  – F: best alignment between S1[1..i] and S2[1..j] such that the alignment ends with a gap in S2
• V[i,j] = max(E[i,j], F[i,j], G[i,j])
• As in the traditional approach, find the box in the V matrix where V[i,j] is maximal.

Affine gap recurrences
• V[i,j] = max(E[i,j], F[i,j], G[i,j])
• G[i,j] = V[i-1, j-1] + Value(S1[i], S2[j])
  – irrespective of how we got here (hence the use of V), S1[i] and S2[j] are matched
• E[i,j] = max{E[i, j-1], V[i, j-1] - GapOpen} - GapExtend
  – either we add a gap in S1 to an existing one (E - GapExtend)
  – or we add a gap in S1 when there was none (V - GapOpen - GapExtend)
• F[i,j] = max{F[i-1, j], V[i-1, j] - GapOpen} - GapExtend
  – either we add a gap in S2 to an existing one (F - GapExtend)
  – or we add a gap in S2 when there was none (V - GapOpen - GapExtend)

Running times
• All these algorithms run in O(mn), i.e. quadratic time
• Note: this is significantly worse than exact matching
• Next we'll talk about speed-up opportunities
• BTW, how much space is needed?
  – If we only need to find the best score (not the exact alignment as well): O(min(m,n))
  – If we need to find the best alignment: an elegant divide and conquer algorithm leads to linear space
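
The three-case recurrence and the backtracking rules from the "The recurrences" and "How do you output the result?" slides can be sketched in Python as follows. This is a minimal illustration, not the course's reference code: the function name `global_align` and its parameters are hypothetical, the default scores reuse the slide's example values (10, -5, -2) for every match, mismatch and gap, and the traceback recomputes which case produced each box instead of storing explicit pointers.

```python
def global_align(s1, s2, match=10, mismatch=-5, gap=-2):
    """Global alignment with a linear gap cost (hypothetical helper).

    Returns (score, aligned_s1, aligned_s2).
    """
    n, m = len(s1), len(s2)
    # score[i][j] = best score for aligning s1[:i] with s2[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # s1[:i] aligned entirely to gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # s2[:j] aligned entirely to gaps

    def value(a, b):
        return match if a == b else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + value(s1[i - 1], s2[j - 1]),  # case 1
                score[i - 1][j] + gap,                              # case 2
                score[i][j - 1] + gap,                              # case 3
            )

    # Traceback from (n, m) to (0, 0), building both strings backwards.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                score[i][j] == score[i - 1][j - 1] + value(s1[i - 1], s2[j - 1])):
            out1.append(s1[i - 1]); out2.append(s2[j - 1])   # diagonal move
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(s1[i - 1]); out2.append("-")         # s1[i] vs gap
            i -= 1
        else:
            out1.append("-"); out2.append(s2[j - 1])         # s2[j] vs gap
            j -= 1
    return score[n][m], "".join(reversed(out1)), "".join(reversed(out2))
```

For example, `global_align("AC", "AGC")` aligns `A-C` with `AGC`: two matches and one gap. When several cases tie, this sketch prefers the diagonal move, which is an arbitrary choice.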
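
The local alignment rule from the "Local alignment" slide (first row and column set to 0, a floor of 0 in the max, answer at the highest-scoring box) might look like the sketch below. The name `local_align_score` and the default scores are assumptions for illustration, and only the score and its location are returned, not the aligned region itself.

```python
def local_align_score(s1, s2, match=10, mismatch=-5, gap=-2):
    """Best local alignment score and the box where it ends (hypothetical helper).

    Returns (best_score, (i, j)), where the best local alignment ends at
    s1[i-1] and s2[j-1] (1-based box coordinates in the DP table).
    """
    n, m = len(s1), len(s2)
    # First row and first column stay at 0: an alignment may start anywhere.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            score[i][j] = max(
                0,                              # start a new alignment here
                score[i - 1][j - 1] + sub,      # case 1
                score[i - 1][j] + gap,          # case 2
                score[i][j - 1] + gap,          # case 3
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)
    return best, best_pos
```

Recovering the aligned region would follow the same backtracking as in the global case, starting from the returned box and stopping at the first box whose score is 0.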
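
The affine gap recurrences for V, G, E and F can be sketched as below for a global alignment score. Several things here are assumptions not stated on the slides: the function name `affine_global_score`, the default penalties (`gap_open=4`, `gap_extend=1`), the boundary conditions (a leading gap of length k costs gap_open + k * gap_extend, and impossible states are set to minus infinity), and reading the answer from V[n][m] rather than from the maximal box.

```python
NEG_INF = float("-inf")

def affine_global_score(s1, s2, match=10, mismatch=-5, gap_open=4, gap_extend=1):
    """Global alignment score with affine gap penalties (hypothetical helper)."""
    n, m = len(s1), len(s2)
    V = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    E = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap in s1
    F = [[NEG_INF] * (m + 1) for _ in range(n + 1)]  # ends with a gap in s2

    # Assumed boundary conditions: one gap run covering the whole prefix.
    V[0][0] = 0
    for i in range(1, n + 1):
        F[i][0] = V[i][0] = -(gap_open + i * gap_extend)
    for j in range(1, m + 1):
        E[0][j] = V[0][j] = -(gap_open + j * gap_extend)

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            # G[i,j]: s1[i] matched with s2[j], however we got to (i-1, j-1).
            g = V[i - 1][j - 1] + sub
            # Extend an existing gap in s1, or open a new one.
            E[i][j] = max(E[i][j - 1], V[i][j - 1] - gap_open) - gap_extend
            # Extend an existing gap in s2, or open a new one.
            F[i][j] = max(F[i - 1][j], V[i - 1][j] - gap_open) - gap_extend
            V[i][j] = max(E[i][j], F[i][j], g)
    return V[n][m]
```

G is kept as a scalar because only V, E and F are read by later boxes; a version that also outputs the alignment would need to keep all four tables, or pointers, for backtracking.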

