Sequence Alignment, Cont'd

Outline:
• Evolution; scoring functions
• Sequence alignment; the Needleman-Wunsch algorithm
• The Smith-Waterman algorithm
• Scoring the gaps more accurately; compromise: affine gaps
• Bounded dynamic programming
• Linear-space alignment (Hirschberg's algorithm)
• The Four-Russian algorithm: a useful speedup of dynamic programming [Arlazarov, Dinic, Kronrod, Faradzev 1970]
• Indexing-based local aligners: BLAST, WU-BLAST, BlastZ, MegaBLAST, BLAT, PatternHunter, …
• Some useful applications of alignments
• Next lecture

Evolution / Scoring Function

• Sequence edits, relative to AGGCCTC:
    Mutations:  AGGACTC
    Insertions: AGGGCCTC
    Deletions:  AGG.CTC

Scoring function:
    Match:    +m
    Mismatch: -s
    Gap:      -d

Score F = (# matches) m - (# mismatches) s - (# gaps) d

Sequence Alignment

    -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
    TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

Definition: Given two strings x = x_1 x_2 ... x_M and y = y_1 y_2 ... y_N, an alignment is an assignment of gaps to positions 0, …, M in x and 0, …, N in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.

    x: AGGCTATCACCTGACCTCCAGGCCGATGCCC
    y: TAGCTATCACGACCGCGGTCGATTTGCCCGAC

The Needleman-Wunsch Algorithm

1. Initialization.
   a. F(0, 0) = 0
   b. F(0, j) = -j d
   c. F(i, 0) = -i d
2. Main iteration: filling in partial alignments.
   For each i = 1…M, for each j = 1…N:

    F(i, j) = max { F(i-1, j-1) + s(x_i, y_j)   [case 1]
                    F(i-1, j) - d               [case 2]
                    F(i, j-1) - d               [case 3] }

    Ptr(i, j) = DIAG in case 1, LEFT in case 2, UP in case 3

3. Termination.
   F(M, N) is the optimal score, and from Ptr(M, N) we can trace back the optimal alignment.

The Smith-Waterman Algorithm

Idea: ignore badly aligning regions.
Modifications to Needleman-Wunsch:

Initialization: F(0, j) = F(i, 0) = 0

Iteration:
    F(i, j) = max { 0
                    F(i-1, j) - d
                    F(i, j-1) - d
                    F(i-1, j-1) + s(x_i, y_j) }

Scoring the Gaps More Accurately

Simple, linear gap model: a gap of length n incurs penalty n·d.
However, gaps usually occur in bunches.
Convex gap penalty function γ(n): for all n, γ(n+1) - γ(n) <= γ(n) - γ(n-1).
Algorithm for a general gap function γ(n): O(N^3) time, O(N^2) space.

Compromise: Affine Gaps

    γ(n) = d + (n-1)e
           |         |
           gap open  gap extend

To compute the optimal alignment, at position (i, j) we need to "remember" the best score both when a gap is open and when it is not:

    F(i, j): score of alignment x_1…x_i to y_1…y_j if x_i aligns to y_j
    G(i, j): score if x_i aligns to a gap after y_j
    H(i, j): score if y_j aligns to a gap after x_i
    V(i, j): best score of alignment x_1…x_i to y_1…y_j

Needleman-Wunsch with Affine Gaps

Why do we need two matrices?
1. x_i aligns to y_j:
    x_1……x_i-1  x_i  x_i+1
    y_1……y_j-1  y_j  -
2.
   x_i aligns to a gap:
    x_1……x_i-1  x_i  x_i+1
    y_1……y_j    -    -
   Opening the gap adds -d; extending it adds -e:
    G(i+1, j) = V(i, j) - d    or    G(i+1, j) = G(i, j) - e

Needleman-Wunsch with Affine Gaps

Initialization:
    V(i, 0) = -(d + (i-1)e)
    V(0, j) = -(d + (j-1)e)
Iteration:
    V(i, j) = max { F(i, j), G(i, j), H(i, j) }
    F(i, j) = V(i-1, j-1) + s(x_i, y_j)
    G(i, j) = max { V(i-1, j) - d
                    G(i-1, j) - e }
    H(i, j) = max { V(i, j-1) - d
                    H(i, j-1) - e }
Termination: similar.
Time? Space? (Still O(MN) each; we just keep three matrices instead of one.)

To generalize a little… think of how you would compute an optimal alignment with a general gap function γ(n) in time O(MN).

Bounded Dynamic Programming

Assume we know that x and y are very similar.
Assumption: # gaps(x, y) < k(N)   (say N > M)
Then x_i aligned to y_j implies |i - j| < k(N).
We can align x and y more efficiently:
    Time, space: O(N k(N)) << O(N^2)

Bounded Dynamic Programming

Initialization: F(i, 0), F(0, j) undefined for i, j > k
Iteration:
For i = 1…M
  For j = max(1, i-k)…min(N, i+k)
    F(i, j) = max { F(i-1, j-1) + s(x_i, y_j)
                    F(i, j-1) - d,  if j > i - k(N)
                    F(i-1, j) - d,  if j < i + k(N) }
Termination: same.
Easy to extend to the affine gap case.

Linear-Space Alignment: Hirschberg's Algorithm

• Longest common subsequence:
  Given sequences s = s_1 s_2 … s_m and t = t_1 t_2 … t_n, find the longest common subsequence u = u_1 … u_k.
• Algorithm:
    F(i, j) = max { F(i-1, j)
                    F(i, j-1)
                    F(i-1, j-1) + [1 if s_i = t_j; 0 otherwise] }
• Hirschberg's algorithm solves this in linear space.

Introduction: Compute the Optimal Score

It is easy to compute F(M, N) in linear space:
    Allocate( column[1] )
    Allocate( column[2] )
    For i = 1…M
      If i > 1, then:
        Free( column[i-2] )
        Allocate( column[i] )
      For j = 1…N
        F(i, j) = …

Linear-Space Alignment

To compute both the optimal score and the optimal alignment: divide & conquer approach.
Notation: x^r, y^r denote the reverses of x and y. E.g.,
   x = accgg; x^r = ggcca.
F^r(i, j): optimal score of aligning x^r_1…x^r_i with y^r_1…y^r_j -- the same as aligning x_M-i+1…x_M with y_N-j+1…y_N.

Linear-Space Alignment

Lemma: F(M, N) = max_{k=0…N} ( F(M/2, k) + F^r(M/2, N-k) )
(The optimal path must cross column M/2 at some row k*.)

• Now, using two columns of space, we can compute F(M/2, k) and F^r(M/2, N-k) for k = 0…N, plus the backpointers.
• Now we can find k* maximizing F(M/2, k) + F^r(M/2, N-k).
• Also, we can trace the path exiting column M/2 from k* (entering at k*, exiting at k*+1).
• Iterate this procedure to the left and right!

Linear-Space Alignment

Hirschberg's linear-space algorithm:
MEMALIGN(l, l', r, r'):   (aligns x_l…x_l' with y_r…y_r')
1. Let h = (l'-l)/2
2. Find (in time O((l'-l)(r'-r)), space O(r'-r)) the optimal path L_h entering column h-1 and exiting column h.
   Let k1 = position at column h-2 where L_h enters
       k2 = position at column h+1 where L_h exits
3. MEMALIGN(l, h-2, r, k1)
4. Output L_h
5. MEMALIGN(h+1, l', k2, r')
Top-level call: MEMALIGN(1, M, 1, N)

Linear-Space Alignment

Time and space analysis of Hirschberg's algorithm:
To compute the optimal path at the middle column, for a box of size M × N:
    Space: 2N
    Time: cMN, for some constant c
Then the left and right calls cost c( M/2 · k* + M/2 · (N-k*) ) = cMN/2.
All recursive calls together cost
    Total time: cMN + cMN/2 + cMN/4 + … = 2cMN = O(MN)
    Total space: O(N) for the computation, O(N+M) to store the optimal alignment.

The Four-Russian Algorithm
A useful speedup of dynamic programming [Arlazarov, Dinic, Kronrod, Faradzev 1970]
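As a concrete recap of the basic recurrences from earlier in the lecture, here is a minimal Python sketch (not from the slides themselves; the parameters m, s, d are the match, mismatch, and gap scores defined above, defaulting to 1 each):

```python
def needleman_wunsch(x, y, m=1, s=1, d=1):
    """Optimal global alignment score F(M, N), linear gap penalty d."""
    M, N = len(x), len(y)
    # F[i][j] = best score aligning x[:i] with y[:j]
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0] = -i * d            # initialization: x-prefix vs. gaps
    for j in range(1, N + 1):
        F[0][j] = -j * d            # initialization: y-prefix vs. gaps
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            match = m if x[i - 1] == y[j - 1] else -s
            F[i][j] = max(F[i - 1][j - 1] + match,  # case 1: x_i ~ y_j
                          F[i - 1][j] - d,          # case 2: x_i ~ gap
                          F[i][j - 1] - d)          # case 3: y_j ~ gap
    return F[M][N]

def smith_waterman(x, y, m=1, s=1, d=1):
    """Best local alignment score: same recurrence plus a 0 option."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]   # first row/column stay 0
    best = 0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            match = m if x[i - 1] == y[j - 1] else -s
            F[i][j] = max(0,                        # restart: ignore bad regions
                          F[i - 1][j - 1] + match,
                          F[i - 1][j] - d,
                          F[i][j - 1] - d)
            best = max(best, F[i][j])               # best over all cells
    return best
```

Note how the only changes from global to local alignment are the zeroed first row and column, the extra 0 in the max, and taking the best score anywhere in the matrix rather than at F(M, N).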
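The linear-space ideas behind Hirschberg's algorithm can likewise be sketched. This is an illustrative sketch, not the slides' own code (function names are my own; scoring is match +1, mismatch -1, gap -1): it shows the two-column score computation and a check of the lemma F(M, N) = max_k ( F(M/2, k) + F^r(M/2, N-k) ).

```python
def nw_score_row(x, y, m=1, s=1, d=1):
    """Needleman-Wunsch scores F(len(x), k) for k = 0..len(y), in O(N) space.

    Only the previous and current rows are kept, mirroring the slides'
    two-column trick (rows here play the role of the slides' columns).
    """
    prev = [-j * d for j in range(len(y) + 1)]        # row i = 0
    for i in range(1, len(x) + 1):
        cur = [-i * d] + [0] * len(y)
        for j in range(1, len(y) + 1):
            match = m if x[i - 1] == y[j - 1] else -s
            cur[j] = max(prev[j - 1] + match,         # x_i ~ y_j
                         prev[j] - d,                 # x_i ~ gap
                         cur[j - 1] - d)              # y_j ~ gap
        prev = cur
    return prev

def hirschberg_split(x, y):
    """Find k* maximizing F(M/2, k) + F^r(M/2, N-k), and that maximum."""
    h = len(x) // 2
    left = nw_score_row(x[:h], y)                     # F(M/2, k), k = 0..N
    right = nw_score_row(x[h:][::-1], y[::-1])        # reverse scores
    # right[N-k] aligns x[h:] with y[k:], so reverse to index by k.
    total = [l + r for l, r in zip(left, reversed(right))]
    k_star = total.index(max(total))
    return k_star, total[k_star]
```

By the lemma, the maximum returned by `hirschberg_split(x, y)` equals the full global score F(M, N), and k* is where the optimal path crosses the middle column; recursing on the two resulting sub-boxes yields the full alignment in O(MN) time and O(N+M) space.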