UMD CMSC 423 - Gap Penalties - D1703407

Home> Schools> University of Maryland, College Park> Computer Science (CMSC) > CMSC 423> Gap Penalties

UMD CMSC 423 - Gap Penalties

School name University of Maryland, College Park

Course Cmsc 423- Bioinformatic Algorithms, Databases, and Tools

Pages 17

Download Save

Unformatted text preview:

Gap Penalties CMSC 423 General Gap Penalties AAAGAATTCA A A A T CA vs AAAGAATTCA AAA TCA These have the same score but the second one is often more plausible A single insertion of GAAT into the first string could change it into the second Now the cost of a run of k gaps is GAP k A solution to the problem above is to support general gap penalty so that the score of a run of k gaps is gap k GAP k Then the optimization will prefer to group gaps together General Gap Penalties AAAGAATTCA A A A T CA vs AAAGAATTCA AAA TCA Previous DP no longer works with general gap penalties because the score of the last character depends on details of the previous alignment AAAGAAC AAA vs AAAGAATC AAA Instead we need to know how the previous alignment ends in order to give a score to the last subproblem Three Matrices We now keep 3 different matrices M i j score of best alignment of x 1 i and y 1 j ending with a charactercharacter match or mismatch X i j score of best alignment of x 1 i and y 1 j ending with a space in X Y i j score of best alignment of x 1 i and y 1 j ending with a space in Y M i 1 j 1 M i j match i j max X i 1 j 1 Y i 1 j 1 M i j k gap k X i j max Y i j k gap k for 1 k j for 1 k j M i k j gap k Y i j max X i k j gap k for 1 k i for 1 k i The M Matrix We now keep 3 different matrices M i j score of best alignment of x 1 i and y 1 j ending with a charactercharacter match or mismatch X i j score of best alignment of x 1 i and y 1 j ending with a space in X Y i j score of best alignment of x 1 i and y 1 j ending with a space in Y By definition alignment ends in a match M i 1 j 1 M i j match i j max X i 1 j 1 Y i 1 j 1 Any kind of alignment is allowed before the match A G The X and Y matrices k i x G ACGT G y j k j M i j k gap k X i j max Y i j k gap k i x y k G CGT G j k j k decides how long to make the gap We have to make the whole gap at once in order to know how to score it for 1 k j for 1 k j The X and Y matrices k i x G ACGT G y j k j M i j k gap k X i j max Y i j k gap k i x y k G CGT G j k j k decides how long to make the gap We have to make the whole gap at once in order to know how to score it for 1 k j for 1 k j This case is automatically handled k i x y GCGT G j k j Running Time for Gap Penalties M i 1 j 1 M i j match i j max X i 1 j 1 Y i 1 j 1 M i j k gap k X i j max Y i j k gap k for 1 k j for 1 k j M i k j gap k Y i j max X i k j gap k for 1 k i for 1 k i Final score is max M n m X n m Y n m How do you do the traceback Runtime Assume X Y n for simplicity 3n2 subproblems 2n2 subproblems take O n time to solve because we have to try all k O n3 total time Affine Gap Penalties O n3 for general gap penalties is usually too slow We can still encourage spaces to group together using a special case of general penalties called affine gap penalties gap start the cost of starting a gap gap extend the cost of extending a gap by one more space Same idea of using 3 matrices but now we don t need to search over all gap lengths we just have to know whether we are starting a new gap or not Affine Gap Penalties M i 1 j 1 If previous M i j match i j max X i 1 j 1 alignment ends in match match this is a Y i 1 j 1 between new gap x and y gap start gap extend M i j 1 X i j max gap extend X i j 1 gap in x gap start gap extend Y i j 1 gap start gap extend M i 1 j Y i j max gap start gap extend X i 1 j gap in y gap extend Y i 1 j Affine Base Cases M 0 i score of best alignment between 0 characters of x and i characters of y that ends in a match because no such alignment can exist X 0 i score of best alignment between 0 characters of x and i characters of y that ends in a gap in x gap start i gap extend because this alignment looks like yyyyyyyyy X i 0 score of best alignment between i characters of x and 0 characters of y that ends in a gap in X xxxxxxxxx not allowed M i 0 M 0 i and Y 0 i and Y i 0 are computed using the same logic as X i 0 and X 0 i Affine Gap Runtime 3n2 subproblems Each one takes constant time Total runtime O n2 back to the run time of the basic running time Why do you need 3 matrices Alternative WRONG algorithm M i j max M i 1 j 1 cost x i y i M i 1 j gap gap start if Arrow i 1 j M j i 1 gap gap start if Arrow i j 1 Intuition we only need to know whether we are starting a gap or extending a gap The arrows coming out of each subproblem tell us how the best alignment ends so we can use them to decide if we are starting a new gap The best alignment up to this cell ends in a gap The best alignment up to this cell ends in a match PROBLEM The best alignment for strings x 1 i and y 1 j doesn t have to be used in the best alignment between x 1 i 1 and y 1 j 1 Why 3 Matrices Example match 10 mismatch 2 gap 7 gap start 15 CART CA T OPT 4 3 optimal score 30 15 7 8 CARTS CA T WRONG 5 3 30 15 …

View Full Document


School:
Email:
New Password:
Confirm Password:

UMD CMSC 423 - Gap Penalties

Sign up for free to view:

Please select your school