Sequence AlignmentComplete DNA SequencesEvolutionEvolution at the DNA levelEvolutionary RatesSequence conservation implies functionSlide 7What is a good alignment?Scoring FunctionHow do we compute the best alignment?Alignment is additiveDynamic ProgrammingDynamic Programming (cont’d)Slide 14ExampleThe Needleman-Wunsch MatrixThe Needleman-Wunsch AlgorithmPerformanceA variant of the basic algorithm:Different types of overlapsThe Overlap Detection variantThe local alignment problemWhy local alignment – examplesCross-species genome similarityThe Smith-Waterman algorithmSlide 26Scoring the gaps more accuratelyConvex gap dynamic programmingCompromise: affine gapsNeedleman-Wunsch with affine gapsSlide 31To generalize a bit…Bounded Dynamic ProgrammingSlide 34RecapSequence AlignmentCS262 Lecture 2, Win07, BatzoglouComplete DNA SequencesMore than 300 complete genomes have been sequencedCS262 Lecture 2, Win07, BatzoglouEvolutionCS262 Lecture 2, Win07, BatzoglouEvolution at the DNA level…ACGGTGCAGTTACCA……AC----CAGTCCACCA…MutationSEQUENCE EDITSREARRANGEMENTSDeletionInversionTranslocationDuplicationCS262 Lecture 2, Win07, BatzoglouEvolutionary Rates OKOKOKXXStill OK?next generationCS262 Lecture 2, Win07, BatzoglouSequence conservation implies functionAlignment is the key to• Finding important regions• Determining function• Uncovering evolutionary eventsCS262 Lecture 2, Win07, BatzoglouSequence Alignment-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGACDefinitionGiven two strings x = x1x2...xM, y = y1y2…yN,an alignment is an assignment of gaps to positions0,…, N in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gapin the other sequenceAGGCTATCACCTGACCTCCAGGCCGATGCCCTAGCTATCACGACCGCGGTCGATTTGCCCGACCS262 Lecture 2, Win07, BatzoglouWhat is a good alignment?AGGCTAGTT, AGCGAAGTTTAGGCTAGTT- 6 matches, 3 mismatches, 1 gapAGCGAAGTTTAGGCTA-GTT- 7 matches, 1 mismatch, 3 gapsAG-CGAAGTTTAGGC-TA-GTT- 7 matches, 0 mismatches, 5 gapsAG-CG-AAGTTTCS262 Lecture 2, Win07, BatzoglouScoring Function•Sequence edits:AGGCCTCMutations AGGACTCInsertions AGGGCCTCDeletions AGG . CTCScoring Function:Match: +mMismatch: -sGap: -dScore F = (# matches) m - (# mismatches) s – (#gaps) dAlternative definition:minimal edit distanceminimal edit distance“Given two strings x, y,find minimum # of edits (insertions, deletions, mutations) to transform one string to the other”CS262 Lecture 2, Win07, BatzoglouHow do we compute the best alignment?AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGAAGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTCToo many possible alignments:>> 2N(exercise)CS262 Lecture 2, Win07, BatzoglouAlignment is additiveObservation:The score of aligning x1……xMy1……yNis additiveSay that x1…xixi+1…xM aligns to y1…yjyj+1…yNThe two scores add up:F(x[1:M], y[1:N]) = F(x[1:i], y[1:j]) + F(x[i+1:M], y[j+1:N])CS262 Lecture 2, Win07, BatzoglouDynamic Programming•There are only a polynomial number of subproblemsAlign x1…xi to y1…yj•Original problem is one of the subproblemsAlign x1…xM to y1…yN•Each subproblem is easily solved from smaller subproblemsWe will show next•Then, we can apply Dynamic Programming!!!Dynamic Programming!!!Let F(i, j) = optimal score of aligningx1……xiy1……yjF is the DP “Matrix” or “Table”“Memoization”CS262 Lecture 2, Win07, BatzoglouDynamic Programming (cont’d)Notice three possible cases:1. xi aligns to yjx1……xi-1 xiy1……yj-1 yj2. xi aligns to a gapx1……xi-1 xiy1……yj -3. yj aligns to a gapx1……xi -y1……yj-1 yj m, if xi = yjF(i, j) = F(i – 1, j – 1) + -s, if not F(i, j) = F(i – 1, j) – d F(i, j) = F(i, j – 1) – dCS262 Lecture 2, Win07, BatzoglouDynamic Programming (cont’d)How do we know which case is correct?Inductive assumption:F(i, j – 1), F(i – 1, j), F(i – 1, j – 1) are optimalThen, F(i – 1, j – 1) + s(xi, yj)F(i, j) = max F(i – 1, j) – d F(i, j – 1) – dWhere s(xi, yj) = m, if xi = yj; -s, if notCS262 Lecture 2, Win07, BatzoglouG -A G T A0 -1 -2 -3 -4A -1 1 0 -1 -2T -2 0 0 1 0A -3 -1 -1 0 2F(i,j) i = 0 1 2 3 4Examplex = AGTA m = 1y = ATA s = -1d = -1j = 0123F(1, 1) = max{F(0,0) + s(A, A), F(0, 1) – d, F(1, 0) – d} =max{0 + 1, -1 – 1, -1 – 1} = 1AATTAAProcedure to output Alignment• Follow the backpointers• When diagonal,OUTPUT xi, yj• When up,OUTPUT yj• When left,OUTPUT xiCS262 Lecture 2, Win07, BatzoglouThe Needleman-Wunsch Matrixx1 ……………………………… xMy1 ……………………………… yNEvery nondecreasing path from (0,0) to (M, N) corresponds to an alignment of the two sequencesAn optimal alignment is composed of optimal subalignmentsCS262 Lecture 2, Win07, BatzoglouThe Needleman-Wunsch Algorithm1. Initialization.a. F(0, 0) = 0b. F(0, j) = - j dc. F(i, 0) = - i d2. Main Iteration. Filling-in partial alignmentsa. For each i = 1……M For each j = 1……N F(i – 1,j – 1) + s(xi, yj) [case 1]F(i, j) = max F(i – 1, j) – d [case 2] F(i, j – 1) – d [case 3]DIAG, if [case 1]Ptr(i, j) = LEFT, if [case 2]UP, if [case 3]3. Termination. F(M, N) is the optimal score, andfrom Ptr(M, N) can trace back optimal alignmentCS262 Lecture 2, Win07, BatzoglouPerformance•Time:O(NM)•Space:O(NM)•Later we will cover more efficient methodsCS262 Lecture 2, Win07, BatzoglouA variant of the basic algorithm:•Maybe it is OK to have an unlimited # of gaps in the beginning and end:----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC ||||||| |||| | || ||GCGAGTTCATCTATCAC--GACCGC--GGTCG--------------•Then, we don’t want to penalize gaps in the endsCS262 Lecture 2, Win07, BatzoglouDifferent types of overlapsExample:2 overlapping“reads” from a sequencing project – recall Lecture 1Example:Search for a mouse genewithin a human chromosomeCS262 Lecture 2, Win07, BatzoglouThe Overlap Detection variantChanges:1. InitializationFor all i, j,F(i, 0) = 0F(0, j) = 02. Termination maxi F(i, N)FOPT = max maxj F(M, j)x1 ……………………………… xMy1 ……………………………… yNCS262 Lecture 2, Win07, BatzoglouThe local alignment problemGiven two strings x = x1……xM, y = y1……yNFind substrings x’, y’ whose similarity (optimal global alignment value)is maximumx =
View Full Document