DOC PREVIEW
U of I CS 498 - Multiple Sequence Alignment (II)

This preview shows page 1-2-3-26-27-28 out of 28 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 28 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Multiple Sequence Alignment (II)Dynamic programming for multi-sequence alignment gives an exact solution, but it’s computationally expensiveWhen finding an exact solution is computationally too expensive, we explore how to find an approximate solution…Inferring Multiple Alignment from Pairwise AlignmentsCombining Optimal Pairwise Alignments into Multiple AlignmentInferring Pairwise AlignmentsMultiple Alignment: Greedy ApproachGreedy Approach: ExampleGreedy Approach: Example (cont’d)Slide 10Slide 11Slide 12Progressive AlignmentFeng-Doolittle Progressive AlignmentFeng-Doolittle: Clustering ExampleFeng-Doolittle: How to generate a multiple alignment?Generating a Multi-Sequence AlignmentProblems with Feng-DoolittleProfile AlignmentIterative RefinementClustalW: A Multiple Alignment ToolClustalW HeuristicsHeuristic 1: Sequence WeightingHeuristic 2: Sophisticated Gap WeightingGap Adjustment HeuristicsGap Adjustment Heuristics (cont.)Heuristic 3: Delayed Alignment of ‘ Divergent SequencesWhat You Should KnowMultiple Sequence Alignment (II) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Oct. 6, 2005ChengXiang ZhaiDepartment of Computer ScienceUniversity of Illinois, Urbana-ChampaignDynamic programming for multi-sequence alignment gives an exact solution, but it’s computationally expensiveHow can we help biologists do multi-sequence alignment?When finding an exact solution is computationally too expensive, we explore how to find an approximate solution… So, how do we find a good approximation of the optimal multi-sequence alignment?Inferring Multiple Alignment from Pairwise Alignments •From an optimal multiple alignment, we can infer pairwise alignments between all sequences, but they are not necessarily optimal•It is difficult to infer a ``good” multiple alignment from optimal pairwise alignments between all sequencesCombining Optimal Pairwise Alignments into Multiple AlignmentCan combine pairwise alignments into multiple alignmentCan not combine pairwise alignments into multiple alignmentInferring Pairwise Alignments3 sequences, 3 comparisons4 sequences, 6 comparisons5 sequences, 10 comparisonsMultiple Alignment: Greedy Approach•Choose most similar pair of strings and combine into a consensus, reducing alignment of k sequences to an alignment of of k-1 “sequences”. Repeat•This is a heuristic greedy methodu1= ACGTACGTACGT…u2 = TTAATTAATTAA…u3 = ACTACTACTACT……uk = CCGGCCGGCCGGu1= AC-TAC-TAC-T…u2 = TTAATTAATTAA……uk = CCGGCCGGCCGG…kk-1Greedy Approach: Example•Consider these 4 sequencess1 GATTCAs2 GTCTGAs3 GATATTs4 GTCAGCGreedy Approach: Example (cont’d)•There are = 6 possible alignments24s2 GTCTGAs4 GTCAGC (score = 2)s1 GAT-TCAs2 G-TCTGA (score = 1)s1 GAT-TCAs3 GATAT-T (score = 1)s1 GATTCA--s4 G—T-CAGC(score = 0)s2 G-TCTGAs3 GATAT-T (score = -1)s3 GAT-ATTs4 G-TCAGC (score = -1)Greedy Approach: Example (cont’d)s2 and s4 are closest; combine:s2 GTCTGAs4 GTCAGCs2,4 GTCTGA (consensus)s1 GATTCAs3 GATATTs2,4 GTCTGAnew set becomes:There are many (4) alternative choices for the consensus, let’s assume we randomly choose oneGreedy Approach: Example (cont’d)s1 GATTCAs3 GATATTs2,4 GTCTGAset is:s1 GAT-TCAs3 GATAT-T (score = 1)s1 GATTC--As2,4 G-T-CTGA (score = 0)s3 GATATT-s2,4 G-TCTGA (score=-1)scores are:Take best pair and form another consensus:s1,3 = GATATT (arbitrarily break ties)Greedy Approach: Example (cont’d)new set is:s1,3 GATATTs2,4 GTCTGAs1,3 GATATTs2,4 G-TCTGA (score=-1)Form consensus:s1,3,2,4 = GATCTG(arbitrarily break ties)scores is:Progressive Alignment•Progressive alignment is a variation of greedy algorithms with a somewhat more intelligent strategy for scheduling the merges •Progressive alignment works well for close sequences, but deteriorates for distant sequences•Gaps in consensus string are permanent•Simplified representation of the alignments•Better solution? Use a profile to represent consensusATG-CAAAT-CCA-ACG-CTGA 3 0 0 0 0 2 1T 0 2 0 0 0 1 0G 0 0 2 0 0 0 1C 0 1 0 1 3 0 0A T G C C A AHidden Markov Models (HMMs) capture such a pattern…Feng-Doolittle Progressive Alignment•Step 1: Compute all possible pairwise alignments•Step 2: Convert alignment scores to distances•Step 3: Construct a “guide tree” by clustering•Step 4: Progressive alignment based on the guide tree (bottom up)Note that variations are possible at each step!Feng-Doolittle: Clustering ExampleX1X2X3X4X1X2X3X420 15 11 3 4 21 30 5 3 122 5 25 12 113 4 12 40 94 1 11 9 30X1X2X3X4X1X2X3X4X5X5151210X54.5X5maxlog logobs randeffrandS SD SS S-=- =--Length normalizationmax maxmax( ) ( )2S x S yS+=Similarity matrix (from pairwise alignment)Guide treeConvert score to distanceFeng-Doolittle: How to generate a multiple alignment? •At each step, follow the guide tree and consider all possible pairwise alignments of sequences in the two candidate groups ( 3 cases):–Sequence vs. sequence–Sequence vs. group (the best matching sequence in the group determines the alignment)–group vs. group (the best matching pair of sequences determines the alignment)•“Once a gap, always a gap” –gap is replaced by a neutral symbol X–X can be matched with any symbol, including a gap without penaltyGenerating a Multi-Sequence Alignment•Align the two most similar sequences•Following the guide tree, add in the next sequences, aligning to the existing alignment•Insert gaps as necessarySample output:FOS_RAT PEEMSVTS-LDLTGGLPEATTPESEEAFTLPLLNDPEPK-PSLEPVKNISNMELKAEPFDFOS_MOUSE PEEMSVAS-LDLTGGLPEASTPESEEAFTLPLLNDPEPK-PSLEPVKSISNVELKAEPFDFOS_CHICK SEELAAATALDLG----APSPAAAEEAFALPLMTEAPPAVPPKEPSG--SGLELKAEPFDFOSB_MOUSE PGPGPLAEVRDLPG-----STSAKEDGFGWLLPPPPPPP-----------------LPFQFOSB_HUMAN PGPGPLAEVRDLPG-----SAPAKEDGFSWLLPPPPPPP-----------------LPFQ . . : ** . :.. *:.* * . * **:Dots and stars show how well-conserved a column is.Problems with Feng-Doolittle•All alignments are completely determined by pairwise sequence alignment (restricted search space)•No backtracking (subalignment is “frozen”)–No way to correct an early mistake–Non-optimality: Mismatches and gaps at highly conserved region should be penalized more, but we can’t tell where is a highly conserved region early in the process Profile alignment


View Full Document

U of I CS 498 - Multiple Sequence Alignment (II)

Documents in this Course
Lecture 5

Lecture 5

13 pages

LECTURE

LECTURE

39 pages

Assurance

Assurance

44 pages

LECTURE

LECTURE

36 pages

Pthreads

Pthreads

29 pages

Load more
Download Multiple Sequence Alignment (II)
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Multiple Sequence Alignment (II) and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Multiple Sequence Alignment (II) 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?