Stanford CS 262 - Lecture 4 - Heuristic Local Aligners



Outline of slides
• Heuristic Local Aligners
• Indexing-based local alignment
• Indexing-based local alignment: Extensions
• Sensitivity-Speed Tradeoff
• Measured improvement
• Non-consecutive words: Patterns
• Advantage of Patterns
• Multiple patterns
• Variants of BLAST
• Example
• The Four-Russian Algorithm: brief overview. A (not so useful) speedup of Dynamic Programming [Arlazarov, Dinic, Kronrod, Faradzev 1970]
• Main Observation
• Main Observation: Shifts
• The Four-Russian Algorithm
• Hidden Markov Models
• Outline for our next topic
• Example: The Dishonest Casino
• Question # 1 – Evaluation
• Question # 2 – Decoding
• Question # 3 – Learning
• The dishonest casino model

CS262 Lecture 4, Win07, Batzoglou

Heuristic Local Aligners
1. The basic indexing & extension technique
2. Indexing: techniques to improve sensitivity
   Pairs of Words, Patterns
3. Systems for local alignment

Indexing-based local alignment
Dictionary:
   All words of length k (~10)
   Alignment initiated between words of alignment score ≥ T (typically T = k)
Alignment:
   Ungapped extensions until score below statistical threshold
Output:
   All local alignments with score > statistical threshold
[Figure labels: query, DB, scan]

Indexing-based local alignment: Extensions
   A C G A A G T A A G G T C C A G T
   C T G A T C C T G G A T T G C G A
Gapped extensions until threshold
• Extend with gaps until the score drops more than C below the best score so far
Output:
   GTAAGGTCCAGT
   GTTAGGTC-AGT

Sensitivity-Speed Tradeoff
[Figure: sensitivity versus speed for long words (k = 15) and short words (k = 7); Kent WJ, Genome Research 2002]

Sensitivity-Speed Tradeoff
Methods to improve sensitivity/speed
1. Using pairs of words
2. Using inexact words
3. Patterns: non-consecutive positions

   ……ATAACGGACGACTGATTACACTGATTCTTAC……
   ……GGCACGGACCAGTGACTACTCTGATTCCCAG……

   ……ATAACGGACGACTGATTACACTGATTCTTAC……
   ……GGCGCCGACGAGTGATTACACAGATTGCCAG……

   TTTGATTACACAGAT
   T G TT CAC G

Measured improvement
[Figure: measured improvement; Kent WJ, Genome Research 2002]

Non-consecutive words: Patterns
Patterns increase the likelihood of at least one match within a long conserved region
[Figure: Consecutive Positions versus Non-Consecutive Positions; labels: 3 common, 5 common, 7 common, 6 common]

On a 100-long 70% conserved region:
                            Consecutive   Non-consecutive
   Expected # hits:         1.07          0.97
   Prob[at least one hit]:  0.30          0.47

Advantage of Patterns
[Figure labels: 11 positions, 11 positions, 10 positions]

Multiple patterns
• K patterns
   Takes K times longer to scan
   Patterns can complement one another
• Computational problem:
   Given: a model (prob distribution) for homology between two regions
   Find: best set of K patterns that maximizes Prob(at least one match)

   TTTGATTACACAGAT
   T G TT CAC G
   T G T C CAG TTGATT A G

Buhler et al. RECOMB 2003
Sun & Buhler RECOMB 2004

How long does it take to search the query?

Variants of BLAST
• NCBI BLAST: search the universe
   http://www.ncbi.nlm.nih.gov/BLAST/
• MEGABLAST: http://genopole.toulouse.inra.fr/blast/megablast.html
   Optimized to align very similar sequences
   • Works best when k = 4i ≥ 16
   • Linear gap penalty
• WU-BLAST: (Wash U BLAST) http://blast.wustl.edu/
   Very good optimizations
   Good set of features & command line arguments
• BLAT http://genome.ucsc.edu/cgi-bin/hgBlat
   Faster, less sensitive than BLAST
   Good for aligning huge numbers of queries
• CHAOS http://www.cs.berkeley.edu/~brudno/chaos
   Uses inexact k-mers, sensitive
• PatternHunter http://www.bioinformaticssolutions.com/products/ph/index.php
   Uses patterns instead of k-mers
• BlastZ http://www.psc.edu/general/software/packages/blastz/
   Uses patterns, good for finding genes
• Typhon http://typhon.stanford.edu
   Uses multiple alignments to improve sensitivity/speed tradeoff

Example
Query: gattacaccccgattacaccccgattaca (29 letters) [2 mins]
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
          1,726,556 sequences; 8,074,398,388 total letters

>gi|28570323|gb|AC108906.9| Oryza sativa chromosome 3 BAC OSJNBa0087C10 genomic sequence, complete sequence
 Length = 144487

 Score = 34.2 bits (17), Expect = 4.5
 Identities = 20/21 (95%)
 Strand = Plus / Plus

Query: 4      tacaccccgattacaccccga 24
              ||||||| |||||||||||||
Sbjct: 125138 tacacccagattacaccccga 125158

 Score = 34.2 bits (17), Expect = 4.5
 Identities = 20/21 (95%)
 Strand = Plus / Plus

Query: 4      tacaccccgattacaccccga 24
              ||||||| |||||||||||||
Sbjct: 125104 tacacccagattacaccccga 125124

>gi|28173089|gb|AC104321.7| Oryza sativa chromosome 3 BAC OSJNBa0052F07 genomic sequence, complete sequence
 Length = 139823

 Score = 34.2 bits (17), Expect = 4.5
 Identities = 20/21 (95%)
 Strand = Plus / Plus

Query: 4    tacaccccgattacaccccga 24
            ||||||| |||||||||||||
Sbjct: 3891 tacacccagattacaccccga 3911

Example
Query: Human atoh enhancer, 179 letters [1.5 min]
Result: 57 blast hits
1. gi|7677270|gb|AF218259.1|AF218259 Homo sapiens ATOH1 enhanc...  355  1e-95
2. gi|22779500|gb|AC091158.11| Mus musculus Strain C57BL6/J ch...  264  4e-68
3. gi|7677269|gb|AF218258.1|AF218258 Mus musculus Atoh1 enhanc...  256  9e-66
4. gi|28875397|gb|AF467292.1| Gallus gallus CATH1 (CATH1) gene...  78  5e-12
5. gi|27550980|emb|AL807792.6| Zebrafish DNA sequence from clo...  54  7e-05
6. gi|22002129|gb|AC092389.4| Oryza sativa chromosome 10 BAC O...  44  0.068
7. gi|22094122|ref|NM_013676.1| Mus musculus suppressor of Ty ...  42  0.27
8. gi|13938031|gb|BC007132.1| Mus musculus, Similar to suppres...  42  0.27

gi|7677269|gb|AF218258.1|AF218258 Mus musculus Atoh1 enhancer sequence
 Length = 1517

 Score = 256 bits (129), Expect = 9e-66
 Identities = 167/177 (94%), Gaps = 2/177 (1%)
 Strand = Plus / Plus

Query: 3    tgacaatagagggtctggcagaggctcctggccgcggtgcggagcgtctggagcggagca 62
            ||||||||||||| ||||||||||||||||||| ||||||||||||||||||||||||||
Sbjct: 1144 tgacaatagaggggctggcagaggctcctggccccggtgcggagcgtctggagcggagca 1203

Query: 63   cgcgctgtcagctggtgagcgcactctcctttcaggcagctccccggggagctgtgcggc 122
            ||||||||||||||||||||||||||  ||||||||| |||||||||||||||| |||||
Sbjct: 1204 cgcgctgtcagctggtgagcgcactc-gctttcaggccgctccccggggagctgagcggc 1262

Query: 123
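
The dictionary lookup and ungapped extension from the "Indexing-based local alignment" slide can be sketched as follows. This is a minimal illustration, not the code of BLAST or any other tool listed above. The function names, the +1/-1 scoring, and the `xdrop` cutoff are assumptions made for the example, and it seeds only on exact k-word matches (the T = k case).

```python
from collections import defaultdict


def build_index(db, k):
    """Map every length-k word of db to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(db) - k + 1):
        index[db[i:i + k]].append(i)
    return index


def extend_ungapped(query, db, qi, di, k, match=1, mismatch=-1, xdrop=3):
    """Extend an exact k-word hit in both directions without gaps.

    Extension in one direction stops once the running score falls more
    than `xdrop` below the best score seen so far in that direction.
    Returns (score, query_start, db_start, length).
    """
    def walk(q, d, step):
        best = run = 0
        best_len = n = 0
        while 0 <= q < len(query) and 0 <= d < len(db):
            run += match if query[q] == db[d] else mismatch
            n += 1
            if run > best:
                best, best_len = run, n
            elif best - run > xdrop:
                break
            q += step
            d += step
        return best, best_len

    right_score, right_len = walk(qi + k, di + k, +1)
    left_score, left_len = walk(qi - 1, di - 1, -1)
    score = k * match + left_score + right_score
    return score, qi - left_len, di - left_len, left_len + k + right_len


def local_hits(query, db, k, threshold):
    """Scan the query against the index and keep extensions above threshold."""
    index = build_index(db, k)
    hits = set()  # seeds on the same diagonal often extend to the same hit
    for qi in range(len(query) - k + 1):
        for di in index.get(query[qi:qi + k], ()):
            hit = extend_ungapped(query, db, qi, di, k)
            if hit[0] > threshold:
                hits.add(hit)
    return sorted(hits, reverse=True)


# The 21-letter alignment from the first BLAST example, under +1/-1 scoring:
print(local_hits("tacaccccgattacaccccga", "tacacccagattacaccccga", 11, 15))
# [(19, 0, 0, 21)]
```

With k = 11 only the three words to the right of the single mismatch seed the alignment, and all three extend to the same 21-letter hit, which is why the results are collected in a set.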

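
The comparison on the "Non-consecutive words: Patterns" slide can be explored numerically. The sketch below assumes each position of the conserved region matches independently with probability p, computes the expected number of hits in closed form as (L - span + 1) * p^weight, and estimates Prob[at least one hit] by simulation. The slide does not say which pattern produced its figures, so the spaced pattern used here (the weight-11, span-18 seed published for PatternHunter) is an assumption, as are the function names. For reference, the closed form gives about 1.07 expected hits for a consecutive 11-mer at L = 64 and about 1.78 at L = 100, so the region length behind the slide's figures is worth double-checking.

```python
import random


def expected_hits(pattern, region_len, p_match):
    """Expected number of hits: each window hits with probability p^weight."""
    weight = pattern.count("1")
    windows = region_len - len(pattern) + 1
    return max(windows, 0) * p_match ** weight


def prob_at_least_one_hit(pattern, region_len, p_match, trials=5000, seed=0):
    """Monte Carlo estimate of Prob[at least one hit] in a conserved region.

    Positions match independently with probability p_match (a simplifying
    assumption). A hit is a window in which every '1' position of the
    pattern falls on a match; '0' positions are ignored.
    """
    rng = random.Random(seed)
    required = [i for i, c in enumerate(pattern) if c == "1"]
    windows = region_len - len(pattern) + 1
    found = 0
    for _ in range(trials):
        region = [rng.random() < p_match for _ in range(region_len)]
        if any(all(region[s + i] for i in required) for s in range(windows)):
            found += 1
    return found / trials


CONSECUTIVE = "1" * 11
SPACED = "111010010100110111"  # weight 11, span 18 (assumed example pattern)

if __name__ == "__main__":
    for name, pattern in [("consecutive", CONSECUTIVE), ("spaced", SPACED)]:
        print(name,
              round(expected_hits(pattern, 64, 0.7), 2),
              round(prob_at_least_one_hit(pattern, 64, 0.7), 2))
```

Both patterns have the same weight, so a single window is equally likely to hit. The spaced pattern's overlapping windows share fewer required positions, so its hits are less clumped and at least one hit occurs more often even though the expected count is slightly lower.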
