Fa08! CSE 182!Back to Smith Waterman i!i-1!j!j-1!Fa08! CSE 182!An Example • Align s=TCAT with t=TGCAA • Match Score = 1 • Mismatch score = -1, Indel Score = -1 • C(‘T’,’T’) = ? ; C(‘T’,’A’)=? ; C(‘-’,’A’) =? • Global Score A1?, Score A2? T C A T -!T G C A A!T C A T!T G C A A!A1 A2Fa08! CSE 182!T G C A A!T!C!A!T! 1! 1! -1! -2! -2! -4! 1! 2! 0! -1! -1! -3! -1! 0! 1! 0! 0! -2! -3! -2! -1! 0! 1! -1! -5! -4! -3! -2! -1! 0!Example: Global alignment € S[i, j] = maxS[i −1, j −1] + C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj) Fa08! CSE 182!Local Alignment • The original recurrence still works, except when the optimum score S[i,j] is negative • When S[i,j] <0, it means that the optimum local alignment cannot include the point (i,j). • So, we must reset the score to 0. si!tj!i!i-1!j!j-1!Fa08! CSE 182!Local Alignment Trick (Smith-Waterman algorithm) How can we compute the local alignment itself?!€ S[i, j] = max0S[i −1, j −1] + C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj) Fa08! CSE 182!T G C A A!T!C!A!T! 1! 1! 0! 0! 0! 0! 1! 2! 0! 0! 0! 0! 0! 0! 1! 0! 0! 0! 0! 0! 0! 0! 0! 0! 0! 0! 0! 0! 1! 0!Ex: Local Alignment € S[i, j] = max0S[i −1, j −1] + C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj) Fa08! CSE 182!Gaps appear together!Fa08! CSE 182!Generalizing Gap Cost • It is more likely for gaps to be contiguous • The penalty for a gap of length l should be € go + ge ∗ lFa08! CSE 182!Using affine gap penalties € S[i, j] = maxl0S[i −1, j −1] + C(si,tj)S[i − l, j] + go + ge * lS[i, j − l] + go + ge * l • What is the time taken for this? • What are the values that l can take? • Can we get rid of the extra Dimension?Fa08! CSE 182!Affine gap penalties • Define D[i,j] : Score of the best alignment, given that the final column is a ‘deletion’ (si is aligned to a gap) • Define I[i,j]: Score of the best alignment, given that the final column is an insertion (tj is aligned to a gap) € S[i, j] = max0S[i −1, j −1] + C(si,tj)D[i, j]I[i, j] s[i]!Optimum alignment of s[1..i-1], and t[1..j]! -!Optimum alignment of s[1..i], and t[1..j-1]!t[j]!Fa08! CSE 182!O(nm) solution for affine gap costs € D[i, j] = maxD[i −1, j] + geS[i −1, j] + go + ge € I[i, j] = maxI[i, j −1] + geS[i, j −1] + go + ge Fa08! CSE 182!Alignment Space? • How much space do we need? • Is the space requirement too much? € S[i, j] = maxS[i −1, j −1] + C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj) Fa07! CSE 182!Copyright ©2004 by the National Academy of Sciences Istrail, Sorin et al. (2004) Proc. Natl. Acad. Sci. USA 101, 1916-1921 Fig. 1. Dot-plot representation of sample assembly comparison results Ex: Genomic assembly comparisons (6Mb)Fa08! CSE 182!Alignment (Linear Space) • Score computation € i2= i%2; i1= (i −1)%2;S[i2, j] = maxS[i1, j −1] + C(si,tj)S[i1, j] + C(si,−)S[i2, j −1] + C(−,tj) For i = 1 to n! For j = 1 to m !€ S[i, j] = maxS[i −1, j −1] + C(si,tj)S[i −1, j] + C(si,−)S[i, j −1] + C(−,tj) Fa08! CSE 182!Linear Space Alignment • In Linear Space, we can do each row of the D.P. • We need to compute the optimum path from the origin (0,0) to (m,n)Fa07! CSE 182!Linear Space (cont’d) • At i=n/2, we know scores of all the optimal paths ending at that row. • Define F[j] = S[n/2,j] • One of these j is on the true path. Which one?Fa08! CSE 182!Backward alignment • Let Sb[i,j] be the optimal score of aligning s[i+1..n] with t[j+1..m] € Sb[i, j] = maxSb[i + 1, j + 1] + C(si,tj)Sb[i + 1, j] + C(si,−)Sb[i, j + 1] + C(−,tj) • Boundary cases? • Sb[n,j]? Sb[m,j]?Fa08! CSE 182!Backward alignment • Let Sb[i,j] be the optimal score of aligning s[i+1..n] with t[j+1..m] • Define B[j] = Sb[n/2,j] • One of these j is on the true path. Which one?Fa08! CSE 182!Forward, Backward computation • At the optimal coordinate, j – F[j]+B[j]=S[n,m] • In O(nm) time, and O(m) space, we can compute one of the coordinates on the optimum path.Fa08! CSE 182!Linear Space Alignment • Align(1..n,1..m) – For all 1<=j <= m • Compute F[j]=S(n/2,j) – For all 1<=j <= m • Compute B[j]=Sb(n/2,j) – j* = maxj {F[j]+B[j] } – X = Align(1..n/2,1..j*) – Y = Align(n/2+1..n,j*+1..m) – Return X,j*,YFa08! CSE 182!Linear Space complexity • T(nm) = c.nm + T(nm/2) = O(nm) • Space =
View Full Document