Unformatted text preview:

1Pairwise sequence alignment (global and local)Multiple sequence pqalignmentlocalglobalSubstitution matricesDatabase searchingBLASTSequence ttitiEvolutionary tree reconstructionRNA structure predictionGene FindingProtein structure predictionstatisticsComputational genomics…Since exact methods for MSA have exponential time complexity, heuristic approaches are used. Progressive alignment is the most commonly used.Basic progressive alignment strategy:• Compute D, a matrix of distances between all pairs of sequences• From D, construct a “guide tree” T• Construct MSA by pairwise alignment of partial alignments (“profiles”) guided by T• Improve alignment by postprocessing steps.2Complexity of progressive alignment•Distance matrix•Distance matrix– Each pairwise alignment O(n2)– Number of pairwise alignments O(k2)• Iterative construction of MSA– Number of merge steps O(k)– Each pairwise alignment O(k2n2)Entire method O(k2n2)Summary:Progressive alignment heuristics• Not guaranteed to give the optimal MSA• Bad choice of gaps propagates• Complexity– Progressive: O(k2n2)– versus DP: O(nk2k k2)• Typically, merge the most closely related sequences first.3Mathematical correctness is not a guarantee of biological accuracy. The performance of MSA programs is typically evaluated using benchmarks based on biological data:– Curated structural alignment–Automated structural alignmentAutomated structural alignment– Real or simulated sequenceVarious benchmarks are designed to mimic properties of different types of data sets encountered in practice, especially those that are challenging to align:- Highly divergent sequences, e.g., <50% or <30% identity- A family of related sequences plus several outliers, or “orphan” sequences- Related sequences that differ due to large N or C terminal extensions or large internal insertions or deletionsModern MSA programs employ various strategies to improve the speed and/or accuracy of one or more steps in the basic progressive alignment procedure (see following slides). In addition, accuracy may be improved by iterating • Construct distance matrix, D• Construct guide tree, Tover individual steps or the entire procedure:• Iterative construction of MSA guided by T.• Iterative refinement of MSA4Strategies for improvement:Faster calculation of distance matrixObtain preliminary distance matrix withoutObtain preliminary distance matrix without pairwise alignment– Use a compressed alphabet– Compare polarity and volume with FFT– k-mer distance–Restrict pairwise searchNote D must only be sufficiently accurate to generate an acceptable guide treeStrategies for improvement:More accurate scoring •AdditionalinformationAdditional information– Structural information– Homology• Recalculate guide tree based on preliminary MSA•Consistencybtw sequence triples:Consistency btw sequence triples:X1y1y2z1z25• Position specific gap penaltiesStrategies for improvement:More accurate scoring Penalize gaps in hydrophobic and hydrophillicregions differentlyDo and Katoh, 2008Other improvements• Sequence weightingDo and Katoh, 2008Assign weights so that these sequences do not dominate.6Which program to choose?Do and Katoh, 2008Aligner Performance*Time Note that implementation choices result in substantial differences in running time:DIALIGN 57.2 12 h, 25 min CLUSTALW 58.9 2 h, 57 min T-Coffee 63.6 144 h, 51 min MUSCLE 64.8 3 h, 11 min MAFFT 64.8 2h,36min ,ProbCons 66.9 19 h, 41 min ProbCons-ext 68.0 37 h, 46 min Do et al, Genome Research, 2005* Fraction of correctly aligned residue pairs7Pairwise sequence alignment (global and local)Multiple sequence pqalignmentlocalglobalSubstitution matricesDatabase searchingBLASTSequence ttitiEvolutionary tree reconstructionRNA structure predictionGene FindingProtein structure predictionstatisticsComputational


View Full Document

CMU BSC 03711 - Notes

Documents in this Course
lecture

lecture

8 pages

Lecture

Lecture

3 pages

Homework

Homework

10 pages

Lecture

Lecture

17 pages

Delsuc05

Delsuc05

15 pages

hmwk1

hmwk1

2 pages

lecture

lecture

6 pages

Lecture

Lecture

10 pages

barnacle4

barnacle4

15 pages

review

review

10 pages

Homework

Homework

10 pages

Midterm

Midterm

12 pages

lecture

lecture

11 pages

lecture

lecture

32 pages

Lecture

Lecture

7 pages

Lecture

Lecture

17 pages

Lecture

Lecture

12 pages

Lecture

Lecture

21 pages

Lecture

Lecture

11 pages

Lecture

Lecture

28 pages

Homework

Homework

13 pages

Logistics

Logistics

11 pages

lecture

lecture

11 pages

Lecture

Lecture

8 pages

Lecture

Lecture

9 pages

lecture

lecture

8 pages

Problem

Problem

6 pages

Homework

Homework

10 pages

Lecture

Lecture

9 pages

Problem

Problem

7 pages

hmwk4

hmwk4

7 pages

Problem

Problem

6 pages

lecture

lecture

16 pages

Problem

Problem

8 pages

Problem

Problem

6 pages

Problem

Problem

13 pages

lecture

lecture

9 pages

Problem

Problem

11 pages

Lecture

Lecture

7 pages

Lecture

Lecture

10 pages

Lecture

Lecture

9 pages

Homework

Homework

15 pages

Lecture

Lecture

16 pages

Problem

Problem

15 pages

Load more
Download Notes
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Notes and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Notes 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?