DOC PREVIEW
Stanford CS 262 - Lecture Notes

This preview shows page 1-2-3-19-20-38-39-40 out of 40 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 40 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Sequence Alignment Cont’dEvolutionScoring FunctionSequence AlignmentThe Needleman-Wunsch AlgorithmThe Smith-Waterman algorithmScoring the gaps more accuratelyCompromise: affine gapsNeedleman-Wunsch with affine gapsSlide 10To generalize a little…Bounded Dynamic ProgrammingSlide 13Linear-Space AlignmentHirschberg’s algortihmIntroduction: Compute optimal scoreLinear-space alignmentSlide 18Slide 19Slide 20Slide 21Slide 22Slide 23The Four-Russian Algorithm A useful speedup of Dynamic Programming [Arlazarov, Dinic, Kronrod, Faradzev 1970]Main ObservationThe Four-Russian AlgorithmSlide 27Slide 28Slide 29Slide 30Slide 31Slide 32Slide 33Slide 34Slide 35Slide 36Indexing-based Local Aligners BLAST, WU-BLAST, BlastZ, MegaBLAST, BLAT, PatternHunter, ……Some useful applications of alignmentsSlide 39Next lectureSequence AlignmentCont’dEvolutionScoring Function•Sequence edits:AGGCCTCMutations AGGACTCInsertionsAGGGCCTCDeletionsAGG.CTCScoring Function:Match: +mMismatch: -sGap: -dScore F = (# matches)  m - (# mismatches)  s – (#gaps)  dSequence Alignment-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGACDefinitionGiven two strings x = x1x2...xM, y = y1y2…yN,an alignment is an assignment of gaps to positions0,…, N in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gapin the other sequenceAGGCTATCACCTGACCTCCAGGCCGATGCCCTAGCTATCACGACCGCGGTCGATTTGCCCGACThe Needleman-Wunsch Algorithm1. Initialization.a. F(0, 0) = 0b. F(0, j) = - j  dc. F(i, 0) = - i  d2. Main Iteration. Filling-in partial alignmentsa. For each i = 1……M For each j = 1……N F(i-1,j-1) + s(xi, yj) [case 1]F(i, j) = max F(i-1, j) – d [case 2] F(i, j-1) – d [case 3]DIAG, if [case 1]Ptr(i,j) = LEFT, if [case 2]UP, if [case 3]3. Termination. F(M, N) is the optimal score, andfrom Ptr(M, N) can trace back optimal alignmentThe Smith-Waterman algorithmIdea: Ignore badly aligning regionsModifications to Needleman-Wunsch:Initialization: F(0, j) = F(i, 0) = 00Iteration: F(i, j) = max F(i – 1, j) – dF(i, j – 1) – dF(i – 1, j – 1) + s(xi, yj)Scoring the gaps more accuratelySimple, linear gap model:Gap of length nincurs penalty ndHowever, gaps usually occur in bunchesConvex gap penalty function:(n):for all n, (n + 1) - (n)  (n) - (n – 1) Algorithm: O(N3) time, O(N2) space(n)(n)Compromise: affine gaps(n) = d + (n – 1)e | | gap gap open extendTo compute optimal alignment,At position i,j, need to “remember” best score if gap is open best score if gap is not openF(i, j): score of alignment x1…xi to y1…yjifif xi aligns to yjG(i, j): score ifif xi aligns to a gap after yjH(i, j): score ifif yj aligns to a gap after xiV(i, j) = best score of alignment x1…xi to y1…yjde(n)Needleman-Wunsch with affine gapsWhy do we need two matrices?•xi aligns to yjx1……xi-1 xi xi+1y1……yj-1 yj -2. xi aligns to a gapx1……xi-1 xi xi+1y1……yj …- -Add -dAdd -eG(i+1, j) = V(i, j) – d G(i+1, j) = G(i, j) – eNeedleman-Wunsch with affine gapsInitialization: V(i, 0) = d + (i – 1)eV(0, j) = d + (j – 1)eIteration:V(i, j) = max{ F(i, j), G(i, j), H(i, j) }F(i, j) = V(i – 1, j – 1) + s(xi, yj)V(i – 1, j) – d G(i, j) = maxG(i – 1, j) – eV(i, j – 1) – d H(i, j) = maxH(i, j – 1) – eTermination: similarTime?Space?To generalize a little…… think of how you would compute optimal alignment with this gap function….in time O(MN)(n)Bounded Dynamic ProgrammingAssume we know that x and y are very similarAssumption: # gaps(x, y) < k(N) ( say N>M )xi Then, | implies | i – j | < k(N) yjWe can align x and y more efficiently:Time, Space: O(N  k(N)) << O(N2)Bounded Dynamic ProgrammingInitialization:F(i,0), F(0,j) undefined for i, j > kIteration:For i = 1…M For j = max(1, i – k)…min(N, i+k)F(i – 1, j – 1)+ s(xi, yj)F(i, j) = max F(i, j – 1) – d, if j > i – k(N)F(i – 1, j) – d, if j < i + k(N)Termination: sameEasy to extend to the affine gap casex1 ………………………… xMy1 ………………………… yNk(N)Linear-Space AlignmentHirschberg’s algortihm•Longest common subsequenceGiven sequences s = s1 s2 … sm, t = t1 t2 … tn,Find longest common subsequence u = u1 … uk•Algorithm:F(i-1, j)•F(i, j) = max F(i, j-1)F(i-1, j-1) + [1, if si = tj; 0 otherwise]•Hirschberg’s algorithm solves this in linear spaceF(i,j)Introduction: Compute optimal scoreIt is easy to compute F(M, N) in linear spaceAllocate ( column[1] )Allocate ( column[2] )For i = 1….MIf i > 1, then:Free( column[i – 2] )Allocate( column[ i ] )For j = 1…NF(i, j) = …Linear-space alignmentTo compute both the optimal score and the optimal alignment:Divide & Conquer approach:Notation:xr, yr: reverse of x, yE.g. x = accgg;xr = ggccaFr(i, j): optimal score of aligning xr1…xri & yr1…yrj same as aligning xM-i+1…xM & yN-j+1…yNLinear-space alignmentLemma:F(M, N) = maxk=0…N( F(M/2, k) + Fr(M/2, N-k) )xyM/2k*F(M/2, k)Fr(M/2, N-k)Linear-space alignment•Now, using 2 columns of space, we can computefor k = 1…M, F(M/2, k), Fr(M/2, N-k)PLUS the backpointersLinear-space alignment•Now, we can find k* maximizing F(M/2, k) + Fr(M/2, N-k)•Also, we can trace the path exiting column M/2 from k* k*k*+10 1 …… M/2 M/2+1 …… M M+1Linear-space alignment•Iterate this procedure to the left and right!N-k*M/2M/2k*Linear-space alignmentHirschberg’s Linear-space algorithm:MEMALIGN(l, l’, r, r’): (aligns xl…xl’ with yr…yr’)1. Let h = (l’-l)/22. Find (in Time O((l’ – l)  (r’-r)), Space O(r’-r))the optimal path, Lh, entering column h-1, exiting column hLet k1 = pos’n at column h – 2 where Lh entersk2 = pos’n at column h + 1 where Lh exits3. MEMALIGN(l, h-2, r, k1)4. Output Lh5. MEMALIGN(h+1, l’, k2, r’)Top level call: MEMALIGN(1, M, 1, N)Linear-space alignmentTime, Space analysis of Hirschberg’s algorithm: To compute optimal path at middle column,For box of size M  N,Space: 2NTime: cMN, for some constant cThen, left, right calls cost c( M/2  k* + M/2  (N-k*) ) = cMN/2All recursive calls cost Total Time: cMN + cMN/2 + cMN/4 + ….. = 2cMN = O(MN)Total Space: O(N) for computation,O(N+M) to store the optimal alignmentThe Four-Russian AlgorithmA useful speedup of Dynamic


View Full Document

Stanford CS 262 - Lecture Notes

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Lecture Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Lecture Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Lecture Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?