Sequence Alignment, Cont'd

Example alignment:
  -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
  TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Definition
Given two strings x = x1x2...xM and y = y1y2...yN, an alignment is an assignment of gaps to positions 0, ..., M in x and 0, ..., N in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.

The ungapped sequences above:
  AGGCTATCACCTGACCTCCAGGCCGATGCCC
  TAGCTATCACGACCGCGGTCGATTTGCCCGAC

Scoring Function

• Sequence edits:  AGGCCTC
    Mutations:     AGGACTC
    Insertions:    AGGGCCTC
    Deletions:     AGG.CTC

Scoring function:
  Match:    +m
  Mismatch: -s
  Gap:      -d
Score F = (# matches)·m - (# mismatches)·s - (# gaps)·d

The Needleman-Wunsch Algorithm

1. Initialization.
   a. F(0, 0) = 0
   b. F(0, j) = -j·d
   c. F(i, 0) = -i·d
2. Main iteration. Filling in partial alignments:
   For each i = 1…M, for each j = 1…N:
     F(i, j) = max of:
       F(i-1, j-1) + s(xi, yj)   [case 1]
       F(i-1, j) - d             [case 2]
       F(i, j-1) - d             [case 3]
     Ptr(i, j) = DIAG if case 1, LEFT if case 2, UP if case 3
3. Termination. F(M, N) is the optimal score, and from Ptr(M, N) we can trace back the optimal alignment.
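The recurrence and traceback above can be sketched in Python. The default scoring values (m = 1, s = 1, d = 1) and the `(di, dj)` pointer encoding are illustrative choices of mine, not from the slides:

```python
def needleman_wunsch(x, y, m=1, s=1, d=1):
    """Global alignment: +m per match, -s per mismatch, -d per gap symbol."""
    M, N = len(x), len(y)
    # F[i][j] = best score aligning x[:i] with y[:j]; Ptr stores the
    # predecessor offset (di, dj) chosen at each cell.
    F = [[0] * (N + 1) for _ in range(M + 1)]
    Ptr = [[None] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0], Ptr[i][0] = -i * d, (1, 0)
    for j in range(1, N + 1):
        F[0][j], Ptr[0][j] = -j * d, (0, 1)
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sub = m if x[i-1] == y[j-1] else -s
            cases = [
                (F[i-1][j-1] + sub, (1, 1)),  # case 1: x_i aligns with y_j
                (F[i-1][j] - d, (1, 0)),      # case 2: x_i aligns with a gap
                (F[i][j-1] - d, (0, 1)),      # case 3: y_j aligns with a gap
            ]
            F[i][j], Ptr[i][j] = max(cases)   # ties prefer the diagonal
    # Termination: trace back from (M, N) to recover the alignment.
    ax, ay, i, j = [], [], M, N
    while i > 0 or j > 0:
        di, dj = Ptr[i][j]
        ax.append(x[i-1] if di else "-")
        ay.append(y[j-1] if dj else "-")
        i, j = i - di, j - dj
    return F[M][N], "".join(reversed(ax)), "".join(reversed(ay))
```

For example, `needleman_wunsch("GATTACA", "GCATGCU")` with these unit scores yields an optimal score of 0.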
The Smith-Waterman Algorithm

Idea: ignore badly aligning regions.
Modifications to Needleman-Wunsch:

Initialization: F(0, j) = F(i, 0) = 0
Iteration:
  F(i, j) = max of:
    0
    F(i-1, j) - d
    F(i, j-1) - d
    F(i-1, j-1) + s(xi, yj)

Scoring the Gaps More Accurately

Simple, linear gap model: a gap of length n incurs penalty n·d. However, gaps usually occur in bunches.

Convex gap penalty function γ(n): for all n, γ(n+1) - γ(n) ≤ γ(n) - γ(n-1).
Algorithm for a general convex γ: O(N³) time, O(N²) space.

Compromise: Affine Gaps

  γ(n) = d + (n-1)·e
         |        |
     gap open  gap extend

To compute the optimal alignment, at position (i, j) we need to "remember" the best score both when a gap is open and when it is not:
  F(i, j): score of the alignment of x1…xi to y1…yj, if xi aligns to yj
  G(i, j): score, if xi, or yj, aligns to a gap

Needleman-Wunsch with Affine Gaps

Why do we need two matrices?
1. xi aligns to yj:
     x1……xi-1  xi  xi+1
     y1……yj-1  yj  -        (opening a gap next adds -d)
2. xi aligns to a gap:
     x1……xi-1  xi  xi+1
     y1……yj    -   -        (extending the gap adds -e)

Initialization:
  F(i, 0) = -d - (i-1)·e
  F(0, j) = -d - (j-1)·e
Iteration:
  F(i, j) = max of:
    F(i-1, j-1) + s(xi, yj)
    G(i-1, j-1) + s(xi, yj)
  G(i, j) = max of:
    F(i-1, j) - d
    F(i, j-1) - d
    G(i, j-1) - e
    G(i-1, j) - e
Termination: same as before.

To Generalize a Little…

…think of how you would compute the optimal alignment with an arbitrary gap function γ(n) in time O(MN).

Bounded Dynamic Programming

Assume we know that x and y are very similar.
Assumption: # gaps(x, y) < k(N)   (say N > M)
Then xi aligned with yj implies |i - j| < k(N).
We can align x and y more efficiently:
  Time, Space: O(N·k(N)) << O(N²)

Initialization: F(i, 0), F(0, j) undefined for i, j > k.
Iteration:
  For i = 1…M:
    For j = max(1, i-k)…min(N, i+k):
      F(i, j) = max of:
        F(i-1, j-1) + s(xi, yj)
        F(i, j-1) - d,  if j > i - k(N)
        F(i-1, j) - d,  if j < i + k(N)
Termination: same.
Easy to extend to the affine gap case.
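The banded iteration can be sketched as follows. This is a minimal sketch under the slides' linear gap model; the function name, the dict-based storage of in-band cells, and the use of -infinity for out-of-band predecessors are my own scaffolding:

```python
NEG_INF = float("-inf")

def banded_score(x, y, k, m=1, s=1, d=1):
    """Best global alignment score restricted to the band |i - j| <= k.

    Only the O(N * k) in-band cells are stored (here in a dict keyed by
    (i, j)); out-of-band predecessors are treated as -infinity.  If the
    optimal alignment uses fewer than k gaps, this equals the full
    Needleman-Wunsch score while inspecting far fewer cells.
    """
    M, N = len(x), len(y)
    if abs(M - N) > k:
        return NEG_INF  # the corner (M, N) itself lies outside the band
    F = {(0, 0): 0}
    for j in range(1, min(N, k) + 1):   # F(0, j) defined only for j <= k
        F[(0, j)] = -j * d
    for i in range(1, M + 1):
        if i <= k:                      # F(i, 0) defined only for i <= k
            F[(i, 0)] = -i * d
        for j in range(max(1, i - k), min(N, i + k) + 1):
            sub = m if x[i-1] == y[j-1] else -s
            best = F.get((i-1, j-1), NEG_INF) + sub
            if j > i - k:               # predecessor (i, j-1) is in the band
                best = max(best, F.get((i, j-1), NEG_INF) - d)
            if j < i + k:               # predecessor (i-1, j) is in the band
                best = max(best, F.get((i-1, j), NEG_INF) - d)
            F[(i, j)] = best
    return F[(M, N)]
```

With a band wide enough to contain the optimal path, the result matches unrestricted Needleman-Wunsch; with a narrow band it may be lower, since good paths outside the band are never considered.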
Linear-Space Alignment: Hirschberg's Algorithm

• Longest common subsequence (LCS): given sequences s = s1 s2 … sm and t = t1 t2 … tn, find the longest common subsequence u = u1 … uk.
• Recurrence:
    F(i, j) = max of:
      F(i-1, j)
      F(i, j-1)
      F(i-1, j-1) + [1 if si = tj; 0 otherwise]
• Hirschberg's algorithm solves this in linear space.

Introduction: Computing the Optimal Score

It is easy to compute F(M, N) in linear space, keeping only two columns at a time:

  Allocate column[1]
  Allocate column[2]
  For i = 1…M:
    If i > 1: Free column[i-2]; Allocate column[i]
    For j = 1…N:
      F(i, j) = …

Linear-Space Alignment

To compute both the optimal score and the optimal alignment: a divide & conquer approach.

Notation:
  xr, yr: the reverses of x and y (e.g. x = accgg, xr = ggcca)
  Fr(i, j): optimal score of aligning xr1…xri with yr1…yrj; this is the score of aligning the suffixes xM-i+1…xM and yN-j+1…yN.

Lemma: F(M, N) = max over k = 0…N of ( F(M/2, k) + Fr(M/2, N-k) )

• Using two columns of space, we can compute F(M/2, k) and Fr(M/2, N-k) for k = 0…N, plus the backpointers.
• We can then find k* maximizing F(M/2, k*) + Fr(M/2, N-k*), and trace the path exiting column M/2 at k*.
• Iterate this procedure to the left and right!

Hirschberg's linear-space algorithm:

MEMALIGN(l, l', r, r'):   (aligns xl…xl' with yr…yr')
1. Let h = (l' - l)/2
2. Find, in time O((l' - l)·(r' - r)) and space O(r' - r), the optimal path Lh entering column h-1 and exiting column h.
   Let k1 = position at column h-2 where Lh enters
       k2 = position at column h+1 where Lh exits
3. MEMALIGN(l, h-2, r, k1)
4. Output Lh
5. MEMALIGN(h+1, l', k2, r')
Top-level call: MEMALIGN(1, M, 1, N)

Time and space analysis of Hirschberg's algorithm:
To compute the optimal path at the middle column, for a box of size M×N:
  Space: 2N
  Time: cMN, for some constant c
The left and right recursive calls then cost c(M/2·k* + M/2·(N-k*)) = cMN/2, so all the recursive calls together cost:
  Total time: cMN + cMN/2 + cMN/4 + … ≤ 2cMN = O(MN)
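The divide & conquer idea can be sketched compactly, assuming match +1, mismatch -1, gap -1. Here `nw_score_last_row` plays the role of the two-column score computation, the chosen split index is the k* of the Lemma, and all function names are my own; for simplicity the recursion splits x at its midpoint and falls back to a tiny quadratic alignment for one-character base cases:

```python
def nw_score_last_row(x, y, m=1, s=1, d=1):
    """Last row of the Needleman-Wunsch matrix for x vs y, in O(len(y)) space."""
    prev = [-j * d for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [-i * d] + [0] * len(y)
        for j in range(1, len(y) + 1):
            sub = m if x[i-1] == y[j-1] else -s
            cur[j] = max(prev[j-1] + sub, prev[j] - d, cur[j-1] - d)
        prev = cur
    return prev

def _nw_align(x, y, m, s, d):
    """Plain quadratic Needleman-Wunsch; only used for tiny base cases."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0] = -i * d
    for j in range(1, N + 1):
        F[0][j] = -j * d
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sub = m if x[i-1] == y[j-1] else -s
            F[i][j] = max(F[i-1][j-1] + sub, F[i-1][j] - d, F[i][j-1] - d)
    ax, ay, i, j = [], [], M, N
    while i > 0 and j > 0:
        sub = m if x[i-1] == y[j-1] else -s
        if F[i][j] == F[i-1][j-1] + sub:
            ax.append(x[i-1]); ay.append(y[j-1]); i -= 1; j -= 1
        elif F[i][j] == F[i-1][j] - d:
            ax.append(x[i-1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j-1]); j -= 1
    while i > 0:
        ax.append(x[i-1]); ay.append("-"); i -= 1
    while j > 0:
        ax.append("-"); ay.append(y[j-1]); j -= 1
    return "".join(reversed(ax)), "".join(reversed(ay))

def hirschberg(x, y, m=1, s=1, d=1):
    """Optimal global alignment in linear space via divide & conquer."""
    if not x or not y or len(x) == 1:
        return _nw_align(x, y, m, s, d)
    h = len(x) // 2
    # Score x[:h] against every prefix of y (forward pass) and x[h:]
    # against every suffix of y (reverse pass); k* maximizes their sum.
    left = nw_score_last_row(x[:h], y, m, s, d)
    right = nw_score_last_row(x[h:][::-1], y[::-1], m, s, d)
    k = max(range(len(y) + 1), key=lambda j: left[j] + right[len(y) - j])
    ax1, ay1 = hirschberg(x[:h], y[:k], m, s, d)
    ax2, ay2 = hirschberg(x[h:], y[k:], m, s, d)
    return ax1 + ax2, ay1 + ay2
```

The score of the alignment returned by `hirschberg` equals the score computed by the linear-space forward pass alone, while never materializing the full M×N matrix.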
  Total space: O(N) for the computation, O(N+M) to store the optimal alignment.

The Four-Russians Algorithm
A useful speedup of Dynamic Programming

Main Observation
Within a rectangle of the DP matrix, the values in region D depend only on the values in the boundary regions A, B, C and on the substrings xl…xl', yr…yr'.

Definition: a t-block is a t×t square of the DP matrix.
Idea: divide the matrix into t-blocks, and precompute all possible t-blocks.
Speedup: O(t).

The Four-Russians Algorithm
Main structure of the algorithm:
• Divide the N×N DP matrix into a K×K grid of (log2N)-blocks that overlap by 1 column and 1 row (so K ≈ N/log2N)
• For i = 1…K
•   For j = 1…K
      …
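Precomputing all possible t-blocks is only feasible because DP values change slowly across the matrix: for unit-cost edit distance, horizontally or vertically adjacent cells differ by at most 1, so a t-block is determined by its boundary offset vectors plus the two substrings, and there are few enough distinct inputs to tabulate. A quick check of that offset property (my own illustration, using plain edit distance rather than the match/mismatch scoring above):

```python
def edit_distance_matrix(x, y):
    """Full unit-cost edit-distance DP matrix D, where D[i][j] is the
    distance between x[:i] and y[:j]."""
    D = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(len(x) + 1):
        D[i][0] = i
    for j in range(len(y) + 1):
        D[0][j] = j
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = 0 if x[i-1] == y[j-1] else 1
            D[i][j] = min(D[i-1][j-1] + cost, D[i-1][j] + 1, D[i][j-1] + 1)
    return D

def offsets_are_bounded(D):
    """True iff every horizontal and vertical step in D changes the value
    by at most 1, which is what lets t-blocks be encoded by small offset
    vectors instead of unbounded integers."""
    for i in range(len(D)):
        for j in range(len(D[0])):
            if i > 0 and abs(D[i][j] - D[i-1][j]) > 1:
                return False
            if j > 0 and abs(D[i][j] - D[i][j-1]) > 1:
                return False
    return True
```

Because the offsets lie in {-1, 0, +1}, a t-block's input can be encoded in O(t) symbols, which is what bounds the size of the precomputed lookup table.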