DOC PREVIEW
UMD CMSC 838T - Gapped BLAST and PSLBLAST

This preview shows page 1-2-3-4-5-6 out of 18 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 18 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Nucleic Acids Research. 1997. Vol. 25, No. 17 3389-3402 1. :, t 1 Gapped BLAST and PSLBLAST: a new generation of protein database search programs Stephen F. Altschul*, Thomas L. Madden, Alejandro A. Schaffer', Jinghui Zhang, Zheng Zhang2, Webb Miller2 and David J. Lipman National Center fof Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA, 'Laboratory of Genetic Disease Research, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA and 2Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA i Receivad June 20,1997; Revised and Accepted July 16,1997 A3STRACT The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs It approximately three times the speed of the original. n addition, a method is introduced for automatically mnbining statistically significant alignments pro- iuced by BLAST into a position-specific score matrix, ind searching the database using this matrix. The esulting Position-Specific Iterated BLAST (PSI- 3LAST) program runs ai approximately the same ;peed per iteration as gapped BLAST, but in many ases is much more sensitive to weak but biologically slevant sequence similarities. PSI-BLAST is used to Incover several new and Snteresting members of the 3RCT superfamily. NTRODUCTtON hiations of the BLAST algorithm (1) have been incorporated mto several popular programs for searching protein and DNA iatabahes for sequence similarities. BLAST programs have been wrinen 10 compare protein or DNA queries with protein or DNA hbases in any combination. with DNA sequences often undergoing conceptual translation before any comparison is Wormed. We will use the blarrp program, which compares pratein queries to protein databases. as a prototype for BLAST, although the ideas presented extend immediately to other Wsions that involve the translation of a DNA query or database. Some of the refinements described are applicable as well to DNA-DNA cornparison, kt have yet to be implemented. BLAST is a heuristic that attempts to optimize a specifi similarity measure. It pennits a tradeoff between speed an sensitivity, with the sening of a 'threshold' parameter, T. A highc value of T yields greater speed, but also an increased probabilil of missing weak similarities. The BLAST prow requires rim proponional to the product of the lengths of the query sequenc and the database searched. Since the rate of change in databa: sizes currently exceeds that of processor speeds, compute: running BLAST are subjected to increasing load. However, tt conjunction of several new algorithmic ideas allow a new versic of BLAST to achieve improved sensitivity at substantial: augmented speed. This paper describes three major refinemen to BLAST. (i) For increased speed, the criterion for extending word pai has been modified. The original BLAST program seeks shc word pairs whose aligned score is at least T. Each such 'hit' is thr extended, to test whether it is contained within a high-scori~ alignment. For the default Tvalue, this extension step consum. most of the processing time. The new 'two-hit' method requir. the existence of two nonsverlapping word pairs on the san diagonal, and within a distance A of one another, before : extension is invoked. To achieve comparable sensitivity, t! threshold parameter T must be lowered, yielding more hits th: previously. However, because only a small fraction of these h are extended, the average amount of computation requir decreases. (ii) The ability to generate gapped alignments has been add< The original BLAST program often finds several alignme! involving a single database sequence which. when consider together, are statistically significant. Overlooking any one these alignments can compromise the combined result. I introducing an algorithm for generating gapped alignments. becomes necessary to find only one rather than all the ungapp alignments subsumed in a significant result. This allows thc parameter to be raised, increasing the speed of the initial databr scan. The new gapped alignment algorithm uses dynm ' programming to extend a centd pair of aligned residues in ix directions. For speed. earlier heuristic methods (2,3) confined I alignments produced 10 a predefined s~p of the dynar.3390 Nucleic Acids Research, 1997. VO~. 25. NO. 17 programming path graph (4). Our approach considers only alignments that drop in score no more than Xg below the best score yet seen. The algorithm is able thereby to adapt the region of the path graph it explores to the data. (iii) BLAST searches may be iterated, with a position-specific score mamx generated from significant alignments found in round i used for round i + 1. Motif or profile search methods frequently are much more sensitive than painvise comparison methcds at detecting distant relationships. However, creating a set of motifs or a profiie that describes a protein family, and searching a database with them, typically has involved running several different programs, with substantial user intervention at various stages. ?he BLAST algorithm is easily generalized to use an arbitrary position-specific score ma& in place of a query sequence and associated substitution matrix. Accordingly, we have automated the procedure of generating such a matrix from the output produced by a BLAST search, and adapted the BLAST algorithm to take this mamx as input. The resulting Position- Specific Iterated BLAST, or PSI-BLAST, program may not be as sensitive as the best available motif search programs, but its speed and ease of operation can bring the power of these methods into more common use. After


View Full Document

UMD CMSC 838T - Gapped BLAST and PSLBLAST

Documents in this Course
Load more
Download Gapped BLAST and PSLBLAST
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Gapped BLAST and PSLBLAST and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Gapped BLAST and PSLBLAST 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?