Sequence AlignmentSlide 2The Needleman-Wunsch AlgorithmThe Smith-Waterman algorithmScoring the gaps more accuratelyCompromise: affine gapsNeedleman-Wunsch with affine gapsSlide 8To generalize a little…Bounded Dynamic ProgrammingSlide 11Linear-Space AlignmentSubsequences and SubstringsHirschberg’s algortihmIntroduction: Compute optimal scoreLinear-space alignmentSlide 17Slide 18Slide 19Slide 20Slide 21Slide 22Heuristic Local AlignerersState of biological databasesSlide 25Some useful applications of alignmentsSlide 27Indexing-based local alignmentSlide 29Indexing-based local alignment—ExtensionsIndexing-based local alignment—ExtensionsSlide 32Sensitivity-Speed TradeoffSlide 34Measured improvementNon-consecutive words—PatternsAdvantage of PatternsMultiple patternsVariants of BLASTExampleSlide 41The Four-Russian Algorithm brief overview A (not so useful) speedup of Dynamic Programming [Arlazarov, Dinic, Kronrod, Faradzev 1970]Main ObservationThe Four-Russian AlgorithmSlide 45Sequence AlignmentCS262 Lecture 3, Win06, BatzoglouSequence Alignment-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGACDefinitionGiven two strings x = x1x2...xM, y = y1y2…yN,an alignment is an assignment of gaps to positions0,…, N in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gapin the other sequenceAGGCTATCACCTGACCTCCAGGCCGATGCCCTAGCTATCACGACCGCGGTCGATTTGCCCGACCS262 Lecture 3, Win06, BatzoglouThe Needleman-Wunsch Algorithm1. Initialization.F(0, 0) = F(0, j) = F(i, 0) = 02. Main Iteration. a. For each i = 1……M For each j = 1……N F(i-1,j-1) + s(xi, yj) [case 1] F(i, j) = max F(i-1, j) – d [case 2] F(i, j-1) – d [case 3]DIAG, if [case 1] Ptr(i,j) = LEFT, if [case 2]UP,if [case 3]3. Termination. F(M, N) is the optimal score, andfrom Ptr(M, N) can trace back optimal alignmentCS262 Lecture 3, Win06, BatzoglouThe Smith-Waterman algorithmIdea: Ignore badly aligning regionsModifications to Needleman-Wunsch:Initialization: F(0, j) = F(i, 0) = 00Iteration: F(i, j) = max F(i – 1, j) – dF(i, j – 1) – dF(i – 1, j – 1) + s(xi, yj)CS262 Lecture 3, Win06, BatzoglouScoring the gaps more accuratelySimple, linear gap model:Gap of length nincurs penalty ndHowever, gaps usually occur in bunchesConvex gap penalty function:(n):for all n, (n + 1) - (n) (n) - (n – 1) Algorithm: O(N3) time, O(N2) space(n)(n)CS262 Lecture 3, Win06, BatzoglouCompromise: affine gaps(n) = d + (n – 1)e | | gap gap open extendTo compute optimal alignment,At position i, j, need to “remember” best score if gap is open best score if gap is not openF(i, j): score of alignment x1…xi to y1…yjifif xi aligns to yjG(i, j): score ifif xi aligns to a gap after yjH(i, j): score ifif yj aligns to a gap after xiV(i, j) = best score of alignment x1…xi to y1…yjde(n)CS262 Lecture 3, Win06, BatzoglouNeedleman-Wunsch with affine gapsWhy do we need matrices F, G, H?•xi aligns to yjx1……xi-1 xi xi+1y1……yj-1 yj -2. xi aligns to a gapx1……xi-1 xi xi+1y1……yj …- -Add -dAdd -eG(i+1, j) = V(i, j) – d G(i+1, j) = G(i, j) – e Because, perhapsG(i, j) < V(i, j)(it is best to align xi to yj if we were aligningonly x1…xi to y1…yj and not the rest of x, y),but on the contraryG(i, j) – e > V(i, j) – d (i.e., had we “fixed” our decision that xi alignsto yj, we could regret it at the next step whenaligning x1…xi+1 to y1…yj)CS262 Lecture 3, Win06, BatzoglouNeedleman-Wunsch with affine gapsInitialization: V(i, 0) = d + (i – 1)eV(0, j) = d + (j – 1)eIteration:V(i, j) = max{ F(i, j), G(i, j), H(i, j) }F(i, j) = V(i – 1, j – 1) + s(xi, yj)V(i – 1, j) – d G(i, j) = maxG(i – 1, j) – eV(i, j – 1) – d H(i, j) = maxH(i, j – 1) – eTermination: V(i, j) has the best alignmentTime?Space?CS262 Lecture 3, Win06, BatzoglouTo generalize a little…… think of how you would compute optimal alignment with this gap function….in time O(MN)(n)CS262 Lecture 3, Win06, BatzoglouBounded Dynamic ProgrammingAssume we know that x and y are very similarAssumption: # gaps(x, y) < k(N)xi Then, | implies | i – j | < k(N) yjWe can align x and y more efficiently:Time, Space: O(N k(N)) << O(N2)CS262 Lecture 3, Win06, BatzoglouBounded Dynamic ProgrammingInitialization:F(i,0), F(0,j) undefined for i, j > kIteration:For i = 1…M For j = max(1, i – k)…min(N, i+k)F(i – 1, j – 1)+ s(xi, yj)F(i, j) = max F(i, j – 1) – d, if j > i – k(N)F(i – 1, j) – d, if j < i + k(N)Termination: sameEasy to extend to the affine gap casex1 ………………………… xMy1 ………………………… yNk(N)CS262 Lecture 3, Win06, BatzoglouLinear-Space AlignmentCS262 Lecture 3, Win06, BatzoglouSubsequences and SubstringsDefinition A string x’ is a substring of a string x,if x = ux’v for some prefix string u and suffix string v(similarly, x’ = xi…xj, for some 1 i j |x|) A string x’ is a subsequence of a string xif x’ can be obtained from x by deleting 0 or more letters(x’ = xi1…xik, for some 1 i1 … ik |x|)Note: a substring is always a subsequenceExample: x = abracadabray = cadabr; substringz = brcdbr; subseqence, not substringCS262 Lecture 3, Win06, BatzoglouHirschberg’s algortihmGiven a set of strings x, y,…, a common subsequence is a string u that is a subsequence of all strings x, y, …•Longest common subsequenceGiven strings x = x1 x2 … xM, y = y1 y2 … yN,Find longest common subsequence u = u1 … uk•Algorithm:F(i – 1, j)•F(i, j) = max F(i, j – 1)F(i – 1, j – 1) + [1, if xi = yj; 0 otherwise]•Ptr(i, j) = (same as in N-W)•Termination: trace back from Ptr(M, N), and prepend a letter to u whenever •Ptr(i, j) = DIAG and F(i – 1, j – 1) < F(i, j)•Hirschberg’s algorithm solves this in linear spaceCS262 Lecture 3, Win06, BatzoglouF(i,j)Introduction: Compute optimal scoreIt is easy to compute F(M, N) in linear spaceAllocate ( column[1] )Allocate ( column[2] )For i = 1….MIf i > 1, then:Free( column[i – 2] )Allocate( column[ i ] )For j = 1…NF(i, j) = …CS262 Lecture 3, Win06, BatzoglouLinear-space alignmentTo compute both the optimal score and the optimal alignment:Divide & Conquer approach:Notation:xr, yr: reverse of x, yE.g. x = accgg;xr = ggccaFr(i, j): optimal score of aligning xr1…xri
View Full Document