DOC PREVIEW
Stanford CS 262 - Sequence Alignment

This preview shows page 1-2-16-17-18-34-35 out of 35 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 35 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Sequence AlignmentComplete DNA SequencesEvolutionEvolution at the DNA levelEvolutionary RatesSequence conservation implies functionSlide 7What is a good alignment?Scoring FunctionHow do we compute the best alignment?Alignment is additiveDynamic ProgrammingDynamic Programming (cont’d)Slide 14ExampleThe Needleman-Wunsch MatrixThe Needleman-Wunsch AlgorithmPerformanceA variant of the basic algorithm:Different types of overlapsThe Overlap Detection variantThe local alignment problemWhy local alignment – examplesCross-species genome similarityThe Smith-Waterman algorithmSlide 26Scoring the gaps more accuratelyConvex gap dynamic programmingCompromise: affine gapsNeedleman-Wunsch with affine gapsSlide 31To generalize a bit…Bounded Dynamic ProgrammingSlide 34RecapSequence AlignmentCS262 Lecture 2, Win07, BatzoglouComplete DNA SequencesMore than 300 complete genomes have been sequencedCS262 Lecture 2, Win07, BatzoglouEvolutionCS262 Lecture 2, Win07, BatzoglouEvolution at the DNA level…ACGGTGCAGTTACCA……AC----CAGTCCACCA…MutationSEQUENCE EDITSREARRANGEMENTSDeletionInversionTranslocationDuplicationCS262 Lecture 2, Win07, BatzoglouEvolutionary Rates OKOKOKXXStill OK?next generationCS262 Lecture 2, Win07, BatzoglouSequence conservation implies functionAlignment is the key to• Finding important regions• Determining function• Uncovering evolutionary eventsCS262 Lecture 2, Win07, BatzoglouSequence Alignment-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGACDefinitionGiven two strings x = x1x2...xM, y = y1y2…yN,an alignment is an assignment of gaps to positions0,…, N in x, and 0,…, N in y, so as to line up each letter in one sequence with either a letter, or a gapin the other sequenceAGGCTATCACCTGACCTCCAGGCCGATGCCCTAGCTATCACGACCGCGGTCGATTTGCCCGACCS262 Lecture 2, Win07, BatzoglouWhat is a good alignment?AGGCTAGTT, AGCGAAGTTTAGGCTAGTT- 6 matches, 3 mismatches, 1 gapAGCGAAGTTTAGGCTA-GTT- 7 matches, 1 mismatch, 3 gapsAG-CGAAGTTTAGGC-TA-GTT- 7 matches, 0 mismatches, 5 gapsAG-CG-AAGTTTCS262 Lecture 2, Win07, BatzoglouScoring Function•Sequence edits:AGGCCTCMutations AGGACTCInsertions AGGGCCTCDeletions AGG . CTCScoring Function:Match: +mMismatch: -sGap: -dScore F = (# matches)  m - (# mismatches)  s – (#gaps)  dAlternative definition:minimal edit distanceminimal edit distance“Given two strings x, y,find minimum # of edits (insertions, deletions, mutations) to transform one string to the other”CS262 Lecture 2, Win07, BatzoglouHow do we compute the best alignment?AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGAAGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTCToo many possible alignments:>> 2N(exercise)CS262 Lecture 2, Win07, BatzoglouAlignment is additiveObservation:The score of aligning x1……xMy1……yNis additiveSay that x1…xixi+1…xM aligns to y1…yjyj+1…yNThe two scores add up:F(x[1:M], y[1:N]) = F(x[1:i], y[1:j]) + F(x[i+1:M], y[j+1:N])CS262 Lecture 2, Win07, BatzoglouDynamic Programming•There are only a polynomial number of subproblemsAlign x1…xi to y1…yj•Original problem is one of the subproblemsAlign x1…xM to y1…yN•Each subproblem is easily solved from smaller subproblemsWe will show next•Then, we can apply Dynamic Programming!!!Dynamic Programming!!!Let F(i, j) = optimal score of aligningx1……xiy1……yjF is the DP “Matrix” or “Table”“Memoization”CS262 Lecture 2, Win07, BatzoglouDynamic Programming (cont’d)Notice three possible cases:1. xi aligns to yjx1……xi-1 xiy1……yj-1 yj2. xi aligns to a gapx1……xi-1 xiy1……yj -3. yj aligns to a gapx1……xi -y1……yj-1 yj m, if xi = yjF(i, j) = F(i – 1, j – 1) + -s, if not F(i, j) = F(i – 1, j) – d F(i, j) = F(i, j – 1) – dCS262 Lecture 2, Win07, BatzoglouDynamic Programming (cont’d)How do we know which case is correct?Inductive assumption:F(i, j – 1), F(i – 1, j), F(i – 1, j – 1) are optimalThen, F(i – 1, j – 1) + s(xi, yj)F(i, j) = max F(i – 1, j) – d F(i, j – 1) – dWhere s(xi, yj) = m, if xi = yj; -s, if notCS262 Lecture 2, Win07, BatzoglouG -A G T A0 -1 -2 -3 -4A -1 1 0 -1 -2T -2 0 0 1 0A -3 -1 -1 0 2F(i,j) i = 0 1 2 3 4Examplex = AGTA m = 1y = ATA s = -1d = -1j = 0123F(1, 1) = max{F(0,0) + s(A, A), F(0, 1) – d, F(1, 0) – d} =max{0 + 1, -1 – 1, -1 – 1} = 1AATTAAProcedure to output Alignment• Follow the backpointers• When diagonal,OUTPUT xi, yj• When up,OUTPUT yj• When left,OUTPUT xiCS262 Lecture 2, Win07, BatzoglouThe Needleman-Wunsch Matrixx1 ……………………………… xMy1 ……………………………… yNEvery nondecreasing path from (0,0) to (M, N) corresponds to an alignment of the two sequencesAn optimal alignment is composed of optimal subalignmentsCS262 Lecture 2, Win07, BatzoglouThe Needleman-Wunsch Algorithm1. Initialization.a. F(0, 0) = 0b. F(0, j) = - j  dc. F(i, 0) = - i  d2. Main Iteration. Filling-in partial alignmentsa. For each i = 1……M For each j = 1……N F(i – 1,j – 1) + s(xi, yj) [case 1]F(i, j) = max F(i – 1, j) – d [case 2] F(i, j – 1) – d [case 3]DIAG, if [case 1]Ptr(i, j) = LEFT, if [case 2]UP, if [case 3]3. Termination. F(M, N) is the optimal score, andfrom Ptr(M, N) can trace back optimal alignmentCS262 Lecture 2, Win07, BatzoglouPerformance•Time:O(NM)•Space:O(NM)•Later we will cover more efficient methodsCS262 Lecture 2, Win07, BatzoglouA variant of the basic algorithm:•Maybe it is OK to have an unlimited # of gaps in the beginning and end:----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC ||||||| |||| | || ||GCGAGTTCATCTATCAC--GACCGC--GGTCG--------------•Then, we don’t want to penalize gaps in the endsCS262 Lecture 2, Win07, BatzoglouDifferent types of overlapsExample:2 overlapping“reads” from a sequencing project – recall Lecture 1Example:Search for a mouse genewithin a human chromosomeCS262 Lecture 2, Win07, BatzoglouThe Overlap Detection variantChanges:1. InitializationFor all i, j,F(i, 0) = 0F(0, j) = 02. Termination maxi F(i, N)FOPT = max maxj F(M, j)x1 ……………………………… xMy1 ……………………………… yNCS262 Lecture 2, Win07, BatzoglouThe local alignment problemGiven two strings x = x1……xM, y = y1……yNFind substrings x’, y’ whose similarity (optimal global alignment value)is maximumx =


View Full Document

Stanford CS 262 - Sequence Alignment

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Sequence Alignment
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Sequence Alignment and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Sequence Alignment 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?