DOC PREVIEW
Stanford CS 374 - MUSCLE - a multiple sequence alignment method

This preview shows page 1-2-3-4-5-6 out of 19 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 19 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

AbstractBackgroundResultsConclusionsBackgroundCurrent methodsImplementationAlgorithm overviewStage 1: draft progressiveSimilarity measureDistance estimateTree constructionProgressive alignmentStage 2: improved progressiveSimilarity measureTree constructionTree comparisonProgressive alignmentStage 3: refinementChoice of bipartitionProfile extractionRe-alignmentAccept/rejectAlgorithm elementsObjective scoreProgressive alignmentSimilarity measuresDistance measuresTree constructionSequence weightingProfile functionsGap penaltiesTerminal gapsTree comparisonDefaults, optimizations and complexity analysisTable 1Complexity of CLUSTALWInitial distance measureClusteringDynamic programmingInner loopDiagonal findingAdditive profilesSequence weightingGap representationConstruction of the root alignmentE-stringsBrenner's methodRefinement complexityAnchor columnsSP scoreDimer approximationEvaluation of profile functionsComplexity of MUSCLEResultsAlignment accuracyExecution speedTable 2Table 3Table 4ConclusionsAvailability and requirementsReferencesBioMed CentralPage 1 of 19(page number not for citation purposes)BMC BioinformaticsOpen AccessSoftwareMUSCLE: a multiple sequence alignment method with reduced time and space complexityRobert C Edgar*Address: Department of Plant and Microbial Biology, 461 Koshland Hall, University of California, Berkeley, CA 94720-3102, USAEmail: Robert C Edgar* - [email protected]* Corresponding author AbstractBackground: In a previous paper, we introduced MUSCLE, a new program for creating multiplealignments of protein sequences, giving a brief summary of the algorithm and showing MUSCLE toachieve the highest scores reported to date on four alignment accuracy benchmarks. Here wepresent a more complete discussion of the algorithm, describing several previously unpublishedtechniques that improve biological accuracy and / or computational complexity. We introduce anew option, MUSCLE-fast, designed for high-throughput applications. We also describe a newprotocol for evaluating objective functions that align two profiles.Results: We compare the speed and accuracy of MUSCLE with CLUSTALW, Progressive POAand the MAFFT script FFTNS1, the fastest previously published program known to the author.Accuracy is measured using four benchmarks: BAliBASE, PREFAB, SABmark and SMART. We testthree variants that offer highest accuracy (MUSCLE with default settings), highest speed (MUSCLE-fast), and a carefully chosen compromise between the two (MUSCLE-prog). We find MUSCLE-fastto be the fastest algorithm on all test sets, achieving average alignment accuracy similar toCLUSTALW in times that are typically two to three orders of magnitude less. MUSCLE-fast is ableto align 1,000 sequences of average length 282 in 21 seconds on a current desktop computer.Conclusions: MUSCLE offers a range of options that provide improved speed and / or alignmentaccuracy compared with currently available programs. MUSCLE is freely available at http://www.drive5.com/muscle.BackgroundMultiple alignments of protein sequences are importantin many applications, including phylogenetic tree estima-tion, secondary structure prediction and critical residueidentification. Many multiple sequence alignment (MSA)algorithms have been proposed; for a recent review, see[1]. Two attributes of MSA programs are of primaryimportance to the user: biological accuracy and computa-tional complexity (i.e., time and memory requirements).Complexity is of increasing relevance due to the rapidgrowth of sequence databases, which now containenough representatives of larger protein families to exceedthe capacity of most current programs. Obtaining biolog-ically accurate alignments is also a challenge, as the bestmethods sometimes fail to align readily apparent con-served motifs [2]. We recently introduced MUSCLE, a newMSA program that provides significant improvements inboth accuracy and speed, giving only a summary of thealgorithm [2]. Here, we describe the MUSCLE algorithmmore fully and analyze its complexity. We introduce aPublished: 19 August 2004BMC Bioinformatics 2004, 5:113 doi:10.1186/1471-2105-5-113Received: 25 March 2004Accepted: 19 August 2004This article is available from: http://www.biomedcentral.com/1471-2105/5/113© 2004 Edgar; licensee BioMed Central Ltd. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.BMC Bioinformatics 2004, 5:113 http://www.biomedcentral.com/1471-2105/5/113Page 2 of 19(page number not for citation purposes)new option designed for high-throughput applications,MUSCLE-fast. We also describe a new method for evaluat-ing objective functions for profile-profile alignment, theiterated step in the MUSCLE algorithm.Current methodsWhile multiple alignment and phylogenetic tree recon-struction have traditionally been considered separately,the most natural formulation of the computational prob-lem is to define a model of sequence evolution thatassigns probabilities to all possible elementary sequenceedits and then to seek an optimal directed graph in whichedges represents edits and terminal nodes are theobserved sequences. This graph makes the history explicit(it can be interpreted as a phylogenetic tree) and impliesan alignment. No tractable method for finding an optimalgraph is known for biologically realistic models, and sim-plification is therefore required. A common heuristic is toseek a multiple alignment that maximizes the SP score(the summed alignment score of each sequence pair),which is NP complete [3]. It can be achieved by dynamicprogramming with time and space complexity O(LN) inthe sequence length L and number of sequences N [4],and is practical only for very small N. Stochastic methodssuch as Gibbs sampling can be used to search for a maxi-mum objective score [5], but have not been widelyadopted. A more popular strategy is the progressivemethod [6,7] (Figure 1), which first estimates a phyloge-netic tree. A profile (a multiple alignment treated as asequence by regarding each column as a symbol) is thenconstructed for each node in the binary tree. If the node isa leaf, the profile is the corresponding sequence; other-wise its profile is produced by a pair-wise alignment of theprofiles of its child nodes (Figure 2). Current progressivealgorithms are typically practical for up to


View Full Document

Stanford CS 374 - MUSCLE - a multiple sequence alignment method

Documents in this Course
Probcons

Probcons

42 pages

ProtoMap

ProtoMap

19 pages

Lecture 3

Lecture 3

16 pages

Load more
Download MUSCLE - a multiple sequence alignment method
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view MUSCLE - a multiple sequence alignment method and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view MUSCLE - a multiple sequence alignment method 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?