DOC PREVIEW
CORNELL CS 726 - Gapped BLAST and PSI- BLAST

This preview shows page 1-2-3-4-5 out of 15 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 15 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Gapped BLAST and PSI-BLAST: a new generation of protein database search programsIntroduction to BLASTBasic AlgorithmScanning for hitsOther IssuesTwo-Hit MethodTwo-Hit Method AlgorithmGapped AlignmentsAdvantage of New Heuristic for Generating Gapped AlignmentsOlder Gapped AlignmentsNew Heuristic for Generating Gapped AlignmentsNew Gapped BLASTPSI-BLAST: OverviewPosition-specific score matrixPowerPoint PresentationGapped BLAST and PSI-BLAST: a new generation of protein database search programsBy Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller and David J. LipmanIntroduction to BLAST•BLAST is a heuristic approximation to dynamic programming based local alignment.•Finds locally maximal segment pairs with scores over a cutoff.•Has a formal statistical theory to assess the significance of scores.Basic Algorithm•Looks for words of length w with score greater than T.•These hits are then extended to check for segment pairs with score greater than S (>T.)•Tradeoff: Lowering T reduces probability of missing segment pairs (increases sensitivity) but increases number of hits to be extended.Scanning for hits•Two Approaches:–Positions of length w words in query with score higher than T stored in a 20w sized array and hits detected by array lookup.–A DFA for the appropriate words is generated and used to scan the sequences. A Mealy machine (acceptance on transitions) is used for efficiency.Other Issues•Hit extension is simplified by stopping when score falls below a threshold compared to the best score found for shorter extensions.•The various parameters are chosen based on experiments using random sequences.•Combinations of MSP’s can be used to get better scores for matching sequences.Two-Hit MethodTwo-Hit MethodOriginal BLAST: One-HitOriginal BLAST: One-Hit•Extend each hit to determine if it is in a high-scoring alignment•Extension consumes >90% of processing time•hit: short word pair whose aligned score ≥ T Two-Hit MethodTwo-Hit Method•Extension invoked only if there are two non-overlapping word pairs on the same diagonal•Lowering T yields more hits, but only a few are extended•3x fasterT: threshold parameter; as T ↑, speed ↑, probability of missing weak similarities ↑Two-Hit Method AlgorithmTwo-Hit Method Algorithm•Scan db for hits (word pair scoring ≥ T)•Seek pairs of non-overlapping hits found with distance A of one another on same diagonal•Invoke (ungapped) extension to determine if hits lie within a statistically significant alignment with query. Extend until alignment score has dropped ≥ X below max score yet attained.Gapped AlignmentsGapped AlignmentsOriginal BLASTOriginal BLASTImplicitly treat gapped alignments:•Locate several distinct HSPs within same db sequence•Calculate statistical significance on combined resultGapped BLASTGapped BLAST•Trigger gapped extension for any HSP exceeding moderate score Sg •Gapped extension longer to execute, few undergo this extensionHSP: high-scoring segment pair; locally optimalAdvantage of New Heuristic for Advantage of New Heuristic for Generating Gapped AlignmentsGenerating Gapped Alignments•Two or more HSPs may each have low scores independently, but can have a statistically significance together•Only one of the constituent HSPs need to be found to generate a successful combined result – can increase TOlder Gapped AlignmentsOlder Gapped Alignments•Confine the dynamic programming to a banded section of the full path graph•Optimal gapped alignment may be outside this band•As width of band ↑, speed ↓New Heuristic for Generating New Heuristic for Generating Gapped AlignmentsGapped Alignments•Starting from a seed HSP, dynamic programming proceeds both bidirectionally through the path graph•Consider only cells for which optimal local alignment score falls ≤ Xg below best score yet found•Region of path graph explored adapts to alignment being constructed•Seed: central residue pair of segment with highest alignment along HSPNew Gapped BLASTNew Gapped BLAST•Ungapped extension of second hit invoked for two non-overlapping hits of score ≥ T within distance A of one another•If HSP generated has normalized score ≥ Sg, gapped extension is triggered•Resulting gapped alignment reported if statistically significant (low enough E-value)•Runs on average 3x faster than original BLASTPSI-BLAST: Overview•Results of initial BLAST search used to construct position-specific scores.•BLAST is repeated using the new scores till no more sequences are found.•Position-specific scores improve the ability of successive BLAST iterations for detecting remote homologs.Position-specific score matrix•Dimensions: Lx20•“Multiple Alignment” created using all segments with e-value above a threshold.•Alignment based on pairwise alignments.•Columns with gaps in query ignored.•For each column C a reduced alignment MC is created.•MC includes all sequences with a residue in C and all columns which have the above sequences.•Sequence weighting method used to generate observed residue frequencies.•Score for residue i in column C given by log(Qi/ Pi) •Qi is the weighted sum of observed frequencies and a


View Full Document

CORNELL CS 726 - Gapped BLAST and PSI- BLAST

Download Gapped BLAST and PSI- BLAST
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Gapped BLAST and PSI- BLAST and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Gapped BLAST and PSI- BLAST 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?