DOC PREVIEW
Stanford CS 262 - Sequence Alignment

This preview shows page 1-2-17-18-19-35-36 out of 36 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 36 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Sequence AlignmentSlide 2How do we compute the best alignment?Alignment is additiveDynamic ProgrammingDynamic Programming (cont’d)Slide 7ExampleThe Needleman-Wunsch MatrixThe Needleman-Wunsch AlgorithmPerformanceA variant of the basic algorithm:Different types of overlapsThe Overlap Detection variantThe local alignment problemWhy local alignment – examplesCross-species genome similarityThe Smith-Waterman algorithmSlide 19Scoring the gaps more accuratelyConvex gap dynamic programmingCompromise: affine gapsNeedleman-Wunsch with affine gapsSlide 24To generalize a bit…Bounded Dynamic ProgrammingSlide 27Linear-Space AlignmentSubsequences and SubstringsHirschberg’s algortihmIntroduction: Compute optimal scoreLinear-space alignmentSlide 33Slide 34Slide 35Slide 36Sequence AlignmentSequence Alignment-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGACGiven two strings x = x1x2...xM, y = y1y2…yN,Find the alignment with maximum scoreF = (# matches)  m - (# mismatches)  s – (#gaps)  dAGGCTATCACCTGACCTCCAGGCCGATGCCCTAGCTATCACGACCGCGGTCGATTTGCCCGACHow do we compute the best alignment?AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGAAGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTCToo many possible alignments:>> 2N(exercise)Alignment is additiveObservation:The score of aligning x1……xMy1……yNis additiveSay that x1…xixi+1…xM aligns to y1…yjyj+1…yNThe two scores add up:F(x1…xM, y1…yN) = F(x1…xi, y1...yj) + F(xi+1…xm, yj+1…yN)Dynamic Programming•There are only a polynomial number of subproblemsAlign x1…xi to y1…yj•Original problem is one of the subproblemsAlign x1…xM to y1…yN•Each subproblem is easily solved from smaller subproblemsWe will show next•Then, we can apply Dynamic Programming!!!Dynamic Programming!!!Let F(i, j) = optimal score of aligningx1……xiy1……yjF is the DP “Matrix” or “Table”“Memoization”Dynamic Programming (cont’d)Notice three possible cases:1. xi aligns to yjx1……xi-1 xiy1……yj-1 yj2. xi aligns to a gapx1……xi-1 xiy1……yj -3. yj aligns to a gapx1……xi -y1……yj-1 yj m, if xi = yjF(i, j) = F(i – 1, j – 1) + -s, if not F(i, j) = F(i – 1, j) – d F(i, j) = F(i, j – 1) – dDynamic Programming (cont’d)How do we know which case is correct?Inductive assumption:F(i, j – 1), F(i – 1, j), F(i – 1, j – 1) are optimalThen, F(i – 1, j – 1) + s(xi, yj)F(i, j) = max F(i – 1, j) – d F(i, j – 1) – dWhere s(xi, yj) = m, if xi = yj; -s, if notG -A G T A0 -1 -2 -3 -4A -1 1 0 -1 -2T -2 0 0 1 0A -3 -1 -1 0 2F(i,j) i = 0 1 2 3 4Examplex = AGTA m = 1y = ATA s = -1d = -1j = 0123F(1, 1) = max{F(0,0) + s(A, A), F(0, 1) – d, F(1, 0) – d} =max{0 + 1, -1 – 1, -1 – 1} = 1AATTAAProcedure to output Alignment• Follow the backpointers• When diagonal,OUTPUT xi, yj• When up,OUTPUT yj• When left,OUTPUT xiThe Needleman-Wunsch Matrixx1 ……………………………… xMy1 ……………………………… yNEvery nondecreasing path from (0,0) to (M, N) corresponds to an alignment of the two sequencesAn optimal alignment is composed of optimal subalignmentsThe Needleman-Wunsch Algorithm1. Initialization.a. F(0, 0) = 0b. F(0, j) = - j  dc. F(i, 0) = - i  d2. Main Iteration. Filling-in partial alignmentsa. For each i = 1……M For each j = 1……N F(i – 1,j – 1) + s(xi, yj) [case 1]F(i, j) = max F(i – 1, j) – d [case 2] F(i, j – 1) – d [case 3]DIAG, if [case 1]Ptr(i, j) = LEFT, if [case 2]UP, if [case 3]3. Termination. F(M, N) is the optimal score, andfrom Ptr(M, N) can trace back optimal alignmentPerformance•Time:O(NM)•Space:O(NM)•Later we will cover more efficient methodsA variant of the basic algorithm:•Maybe it is OK to have an unlimited # of gaps in the beginning and end:----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC ||||||| |||| | || ||GCGAGTTCATCTATCAC--GACCGC--GGTCG--------------•Then, we don’t want to penalize gaps in the endsDifferent types of overlapsExample:2 overlapping“reads” from a sequencing project – recall Lecture 1Example:Search for a mouse genewithin a human chromosomeThe Overlap Detection variantChanges:1. InitializationFor all i, j,F(i, 0) = 0F(0, j) = 02. Termination maxi F(i, N)FOPT = max maxj F(M, j)x1 ……………………………… xMy1 ……………………………… yNThe local alignment problemGiven two strings x = x1……xM, y = y1……yNFind substrings x’, y’ whose similarity (optimal global alignment value)is maximumx = aaaacccccggggttay = ttcccgggaaccaaccWhy local alignment – examples •Genes are shuffled between genomes•Portions of proteins (domains) are often conservedCross-species genome similarity•98% of genes are conserved between any two mammals•>70% average similarity in protein sequencehum_a : GTTGACAATAGAGGGTCTGGCAGAGGCTC--------------------- @ 57331/400001mus_a : GCTGACAATAGAGGGGCTGGCAGAGGCTC--------------------- @ 78560/400001rat_a : GCTGACAATAGAGGGGCTGGCAGAGACTC--------------------- @ 112658/369938fug_a : TTTGTTGATGGGGAGCGTGCATTAATTTCAGGCTATTGTTAACAGGCTCG @ 36008/68174 hum_a : CTGGCCGCGGTGCGGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTG @ 57381/400001mus_a : CTGGCCCCGGTGCGGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTG @ 78610/400001rat_a : CTGGCCCCGGTGCGGAGCGTCTGGAGCGGAGCACGCGCTGTCAGCTGGTG @ 112708/369938fug_a : TGGGCCGAGGTGTTGGATGGCCTGAGTGAAGCACGCGCTGTCAGCTGGCG @ 36058/68174 hum_a : AGCGCACTCTCCTTTCAGGCAGCTCCCCGGGGAGCTGTGCGGCCACATTT @ 57431/400001mus_a : AGCGCACTCG-CTTTCAGGCCGCTCCCCGGGGAGCTGAGCGGCCACATTT @ 78659/400001rat_a : AGCGCACTCG-CTTTCAGGCCGCTCCCCGGGGAGCTGCGCGGCCACATTT @ 112757/369938fug_a : AGCGCTCGCG------------------------AGTCCCTGCCGTGTCC @ 36084/68174 hum_a : AACACCATCATCACCCCTCCCCGGCCTCCTCAACCTCGGCCTCCTCCTCG @ 57481/400001mus_a : AACACCGTCGTCA-CCCTCCCCGGCCTCCTCAACCTCGGCCTCCTCCTCG @ 78708/400001rat_a : AACACCGTCGTCA-CCCTCCCCGGCCTCCTCAACCTCGGCCTCCTCCTCG @ 112806/369938fug_a : CCGAGGACCCTGA------------------------------------- @ 36097/68174“atoh” enhancer in human, mouse, rat, fugu fishThe Smith-Waterman algorithmIdea: Ignore badly aligning regionsModifications to Needleman-Wunsch:Initialization: F(0, j) = F(i, 0) = 00Iteration: F(i, j) = max F(i – 1, j) – dF(i, j – 1) – dF(i – 1, j – 1) + s(xi, yj)The Smith-Waterman algorithmTermination:1. If we want the best local alignment…FOPT = maxi,j F(i, j)Find FOPT and


View Full Document

Stanford CS 262 - Sequence Alignment

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Sequence Alignment
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Sequence Alignment and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Sequence Alignment 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?