Stanford CS 262 - Sequence Alignment - D1845028

Home> Schools> Stanford University> Computer Science (CS) > CS 262> Sequence Alignment

DOC PREVIEW

Stanford CS 262 - Sequence Alignment

School name Stanford University

Course Cs 262- Computational Genomics

Pages 45

This preview shows page 1-2-3-21-22-23-43-44-45 out of 45 pages.

Save

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

View full document

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Premium Document

Do you want full access? Go Premium and unlock all 45 pages.

Access to all documents

Download any document

Ad free experience

Subscribe for instant access Get instant access

Unformatted text preview:

Sequence AlignmentSlide 2The Needleman-Wunsch AlgorithmThe Smith-Waterman algorithmScoring the gaps more accuratelyCompromise: affine gapsNeedleman-Wunsch with affine gapsSlide 8To generalize a little…Bounded Dynamic ProgrammingSlide 11Linear-Space AlignmentSubsequences and SubstringsHirschberg’s algortihmIntroduction: Compute optimal scoreLinear-space alignmentSlide 17Slide 18Slide 19Slide 20Slide 21Slide 22Heuristic Local AlignerersState of biological databasesSlide 25Some useful applications of alignmentsSlide 27Indexing-based local alignmentSlide 29Indexing-based local alignment—ExtensionsIndexing-based local alignment—ExtensionsSlide 32Sensitivity-Speed TradeoffSlide 34Measured improvementNon-consecutive words—PatternsAdvantage of PatternsMultiple patternsVariants of BLASTExampleSlide 41The Four-Russian Algorithm brief overview A (not so useful) speedup of Dynamic Programming [Arlazarov, Dinic, Kronrod, Faradzev 1970]Main ObservationThe Four-Russian AlgorithmSlide 45Sequence AlignmentCS262 Lecture 3, Win06, BatzoglouSequence Alignment-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGACDefinitionGiven two strings x = x1x2...xM, y = y1y2…yN,an alignment is an assignment of gaps to positions0,…, N in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gapin the other sequenceAGGCTATCACCTGACCTCCAGGCCGATGCCCTAGCTATCACGACCGCGGTCGATTTGCCCGACCS262 Lecture 3, Win06, BatzoglouThe Needleman-Wunsch Algorithm1. Initialization.F(0, 0) = F(0, j) = F(i, 0) = 02. Main Iteration. a. For each i = 1……M For each j = 1……N F(i-1,j-1) + s(xi, yj) [case 1] F(i, j) = max F(i-1, j) – d [case 2] F(i, j-1) – d [case 3]DIAG, if [case 1] Ptr(i,j) = LEFT, if [case 2]UP,if [case 3]3. Termination. F(M, N) is the optimal score, andfrom Ptr(M, N) can trace back optimal alignmentCS262 Lecture 3, Win06, BatzoglouThe Smith-Waterman algorithmIdea: Ignore badly aligning regionsModifications to Needleman-Wunsch:Initialization: F(0, j) = F(i, 0) = 00Iteration: F(i, j) = max F(i – 1, j) – dF(i, j – 1) – dF(i – 1, j – 1) + s(xi, yj)CS262 Lecture 3, Win06, BatzoglouScoring the gaps more accuratelySimple, linear gap model:Gap of length nincurs penalty ndHowever, gaps usually occur in bunchesConvex gap penalty function:(n):for all n, (n + 1) - (n)  (n) - (n – 1) Algorithm: O(N3) time, O(N2) space(n)(n)CS262 Lecture 3, Win06, BatzoglouCompromise: affine gaps(n) = d + (n – 1)e | | gap gap open extendTo compute optimal alignment,At position i, j, need to “remember” best score if gap is open best score if gap is not openF(i, j): score of alignment x1…xi to y1…yjifif xi aligns to yjG(i, j): score ifif xi aligns to a gap after yjH(i, j): score ifif yj aligns to a gap after xiV(i, j) = best score of alignment x1…xi to y1…yjde(n)CS262 Lecture 3, Win06, BatzoglouNeedleman-Wunsch with affine gapsWhy do we need matrices F, G, H?•xi aligns to yjx1……xi-1 xi xi+1y1……yj-1 yj -2. xi aligns to a gapx1……xi-1 xi xi+1y1……yj …- -Add -dAdd -eG(i+1, j) = V(i, j) – d G(i+1, j) = G(i, j) – e Because, perhapsG(i, j) < V(i, j)(it is best to align xi to yj if we were aligningonly x1…xi to y1…yj and not the rest of x, y),but on the contraryG(i, j) – e > V(i, j) – d (i.e., had we “fixed” our decision that xi alignsto yj, we could regret it at the next step whenaligning x1…xi+1 to y1…yj)CS262 Lecture 3, Win06, BatzoglouNeedleman-Wunsch with affine gapsInitialization: V(i, 0) = d + (i – 1)eV(0, j) = d + (j – 1)eIteration:V(i, j) = max{ F(i, j), G(i, j), H(i, j) }F(i, j) = V(i – 1, j – 1) + s(xi, yj)V(i – 1, j) – d G(i, j) = maxG(i – 1, j) – eV(i, j – 1) – d H(i, j) = maxH(i, j – 1) – eTermination: V(i, j) has the best alignmentTime?Space?CS262 Lecture 3, Win06, BatzoglouTo generalize a little…… think of how you would compute optimal alignment with this gap function….in time O(MN)(n)CS262 Lecture 3, Win06, BatzoglouBounded Dynamic ProgrammingAssume we know that x and y are very similarAssumption: # gaps(x, y) < k(N)xi Then, | implies | i – j | < k(N) yjWe can align x and y more efficiently:Time, Space: O(N  k(N)) << O(N2)CS262 Lecture 3, Win06, BatzoglouBounded Dynamic ProgrammingInitialization:F(i,0), F(0,j) undefined for i, j > kIteration:For i = 1…M For j = max(1, i – k)…min(N, i+k)F(i – 1, j – 1)+ s(xi, yj)F(i, j) = max F(i, j – 1) – d, if j > i – k(N)F(i – 1, j) – d, if j < i + k(N)Termination: sameEasy to extend to the affine gap casex1 ………………………… xMy1 ………………………… yNk(N)CS262 Lecture 3, Win06, BatzoglouLinear-Space AlignmentCS262 Lecture 3, Win06, BatzoglouSubsequences and SubstringsDefinition A string x’ is a substring of a string x,if x = ux’v for some prefix string u and suffix string v(similarly, x’ = xi…xj, for some 1  i  j  |x|) A string x’ is a subsequence of a string xif x’ can be obtained from x by deleting 0 or more letters(x’ = xi1…xik, for some 1  i1  …  ik  |x|)Note: a substring is always a subsequenceExample: x = abracadabray = cadabr; substringz = brcdbr; subseqence, not substringCS262 Lecture 3, Win06, BatzoglouHirschberg’s algortihmGiven a set of strings x, y,…, a common subsequence is a string u that is a subsequence of all strings x, y, …•Longest common subsequenceGiven strings x = x1 x2 … xM, y = y1 y2 … yN,Find longest common subsequence u = u1 … uk•Algorithm:F(i – 1, j)•F(i, j) = max F(i, j – 1)F(i – 1, j – 1) + [1, if xi = yj; 0 otherwise]•Ptr(i, j) = (same as in N-W)•Termination: trace back from Ptr(M, N), and prepend a letter to u whenever •Ptr(i, j) = DIAG and F(i – 1, j – 1) < F(i, j)•Hirschberg’s algorithm solves this in linear spaceCS262 Lecture 3, Win06, BatzoglouF(i,j)Introduction: Compute optimal scoreIt is easy to compute F(M, N) in linear spaceAllocate ( column[1] )Allocate ( column[2] )For i = 1….MIf i > 1, then:Free( column[i – 2] )Allocate( column[ i ] )For j = 1…NF(i, j) = …CS262 Lecture 3, Win06, BatzoglouLinear-space alignmentTo compute both the optimal score and the optimal alignment:Divide & Conquer approach:Notation:xr, yr: reverse of x, yE.g. x = accgg;xr = ggccaFr(i, j): optimal score of aligning xr1…xri

View Full Document