DOC PREVIEW
CMU BSC 03510 - Computational Biology, Part 3
Pages 54

This preview shows page 1-2-3-4-25-26-27-51-52-53-54 out of 54 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 54 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Computational Biology Part 3 Sequence Comparison with Dynamic Programming Robert F Murphy Copyright 1996 1999 2007 All rights reserved Dynamic programming algorithms for sequence comparison Builds on concept of dot matrix Introduced for biological sequences by S B Needleman C D Wunsch A general method applicable to the search for similarities in the amino acid sequence of two proteins J Mol Biol 48 443 453 1970 Steps of basic dynamic programming alignment method 1 Initialize matrix to match scores 0 or 1 2 Do summation operation Finds the maximum number of matches that can be obtained starting at any position and proceeding forward 3 Traceback to find maximum match alignment Summation operation 1 Start in lower right corner 2 Move up one position and left one position 3 Find largest value in either a row segment starting one below current position and extending to the right or b column segment starting one to the right of current position and extending down Summation operation cont 4 Add this value to the value in the current cell 5 Repeat steps 3 and 4 for all cells to the left in current row and all cells above in current column 6 If we are not in the top left corner go to step 2 V HGQKV VA HGQKVA VADALTK HGQKVADALTK VADALTK HGQKVADALTK VADALTKPVNFKFA HGQKVADALTK A VADALTKPVNFKFAVAH HGQKVADALTK AVAH The sequences are entered here 1 acdcdefg 2 acdefghi Illustration of Simple Dynamic Programming Method Step 1 Initialize the matrix with 1 if match 0 if no match a c 1 0 0 0 0 0 0 0 a c d e f g h i d 0 1 0 0 0 0 0 0 c 0 0 1 0 0 0 0 0 d 0 1 0 0 0 0 0 0 e 0 0 1 0 0 0 0 0 f 0 0 0 1 0 0 0 0 g 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 Step 2 Do the summation operation a c 6 4 3 2 1 0 0 0 1 a c d e f g h i d 5 5 3 2 1 0 0 0 2 c 5 4 4 2 1 0 0 0 3 d 4 5 3 2 1 0 0 0 4 e 3 3 4 2 1 0 0 0 5 f 2 2 2 3 1 0 0 0 6 g 1 1 1 1 2 0 0 0 7 idx 0 0 0 0 0 1 0 0 8 1 2 3 4 5 6 7 8 Step 3 Trace back to find the maximum match pathway a a c Demonstration A8 d e f g h i c d c d e f g 1 2 3 4 5 6 col row next maxlow maxrt LIST 2 28 C 29 C 35 I 29 B 28 3 29 D 30 D 35 I 30 C 29 4 30 E 31 E 35 I 31 D 30 7 31 H 32 H 35 I 32 G 31 8 32 I 33 I 35 I 33 H 32 9 33 J 34 J 35 I 34 I 33 0 0 A 1 A 35 I 1 0 0 A 1 A 35 I 1 Use of dynamic programming to evaluate homology between pairs of sequences If we just want to know maximum match possible between two sequences don t need to do traceback but can just look at the highest value in the first row or column match score This represents the best possible alignment score Extensions to basic dynamic programming alignment method use similarity function in initialization step Similarity Functions Used to facilitate comparison of two sequence elements logical valued true or false 1 or 0 test whether first argument matches or could match second argument numerical valued test degree to which first argument matches second Scoring similarity matrices For each pair of characters in alphabet value is proportional to degree of similarity or other scoring criterion between them For proteins most frequently used is Mutation Data Matrix from Dayhoff 1978 MDM78 The sequences are entered here 1 acdcdefg 2 acdefghi Step 1 Initialize the matrix with similarity values a Dynamic Programming Illustration with Similarity Matrix a c d e f g h i 2 2 0 0 4 1 1 1 c d c d e f 2 0 2 0 0 12 5 12 5 5 5 4 5 4 3 5 3 5 3 4 4 6 4 6 5 3 1 3 1 0 3 1 3 1 1 2 2 2 2 2 g 4 4 6 5 9 5 2 1 1 3 1 0 5 5 2 3 this section refers to the PAM250 matrix found at the bottom of this spreadsheet Step 2 Do the summation operation a c d e f g h i a c d c d e f 36 20 18 14 1 3 0 1 1 32 34 13 9 1 1 2 2 2 34 17 22 17 1 3 2 2 3 20 34 13 9 1 1 2 2 4 18 13 22 17 1 3 2 2 5 14 9 17 18 0 1 2 2 6 1 1 1 0 14 7 5 1 7 g 1 3 1 0 5 5 2 3 8 idx 1 2 3 4 5 6 7 8 Step 3 Trace back to find the maximum match pathway a a c Demonstration A9 d e f g h i c d c d e f g 1 2 3 4 5 6 col row next maxlow maxrt LIST 2 28 C 29 C 35 I 29 B 28 3 29 D 30 D 35 I 30 C 29 4 30 E 31 E 35 I 31 D 30 7 31 H 32 H 35 I 32 G 31 8 32 I 33 I 35 I 33 H 32 9 33 J 34 J 35 I 34 I 33 0 0 A 1 A 35 I 1 0 0 A 1 A 35 I 1 Extensions to basic dynamic programming alignment method use similarity function in initialization step allow gaps use gap penalties Gap penalty alternatives constant gap penalty for gap 1 gap penalty proportional to gap size one penalty for starting a gap gap opening penalty different lower penalty for adding to a gap gap extension penalty Step 2 Do the summation operation a a c d e f g h i c d c d e f g 1 20 34 17 34 13 9 1 3 18 13 22 13 22 17 1 1 14 9 17 9 17 18 0 0 1 1 3 1 4 4 14 5 1 3 1 3 1 7 5 3 0 2 2 2 2 2 5 2 1 2 2 2 2 2 1 3 1 2 3 4 5 6 7 8 36 32 34 20 18 14 5 Appendix Demonstration A12 idx 1 2 3 4 5 6 7 8 gap penalty 0 Step 3 Trace back to find the maximum match pathway a …


View Full Document

CMU BSC 03510 - Computational Biology, Part 3

Download Computational Biology, Part 3
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Computational Biology, Part 3 and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Computational Biology, Part 3 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?