Stanford CS 262 - Lecture 4 - Sequence Alignment Cont’d


Slide titles: Sequence Alignment Cont’d; Linear-space alignment; The Four-Russian Algorithm; Heuristic Local Aligners (BLAST, WU-BLAST, BlastZ, MegaBLAST, BLAT, PatternHunter, ……); State of biological databases; Some useful applications of alignments; BLAST; BLAST: Original Version; Gapped BLAST; Variants of BLAST; Example; BLAT: Blast-Like Alignment Tool; PatternHunter; Advantage of Non-Consecutive Words; Hidden Markov Models; Outline for our next topic; Example: The Dishonest Casino; Question # 1 – Evaluation; Question # 2 – Decoding; Question # 3 – Learning; The dishonest casino model; Definition of a hidden Markov model; A HMM is memory-less; A parse of a sequence; Likelihood of a parse; Example: the dishonest casino; The three main questions on HMMs; Let’s not be confused by notation

Sequence Alignment Cont’d

Linear-space alignment
• Iterate this procedure to the left and right!
[Figure labels: k*, N-k*, M/2, M/2]

The Four-Russian Algorithm
Main structure of the algorithm:
1. Divide N × N DP matrix into K × K log_2 N-blocks that overlap by 1 column & 1 row
2. For i = 1……K
3.   For j = 1……K
4.     Compute D_{i,j} as a function of A_{i,j}, B_{i,j}, C_{i,j}, x[l_i…l'_i], y[r_j…r'_j]
Time: O(N^2 / log^2 N) times the cost of step 4

Heuristic Local Aligners
BLAST, WU-BLAST, BlastZ, MegaBLAST, BLAT, PatternHunter, ……

State of biological databases
Sequenced genomes (size in bp):
  Human            3 × 10^9
  Yeast            1.2 × 10^7
  Mouse            2.7 × 10^9
  Rat              2.6 × 10^9
  Neurospora       4 × 10^7
  Fugu fish        3.3 × 10^8
  Tetraodon        3 × 10^8
  Mosquito         2.8 × 10^8
  Drosophila       1.2 × 10^8
  Worm             1.0 × 10^8
  2 sea squirts  × 1.6 × 10^8
  Rice             1.0 × 10^9
  Arabidopsis      1.2 × 10^8
Other notes on the slide:
• 12 different strains
• 14 more fungi within next year
• ~250 bacteria/viruses
• Next year: Dog, Chimpanzee, Chicken
• Current rate of sequencing: 4 big labs × 3 × 10^9 bp/year/lab; 10s small labs

State of biological databases
• Number of genes in these genomes:
  Vertebrate: ~30,000
  Insects: ~14,000
  Worm: ~17,000
  Fungi: ~6,000-10,000
  Small organisms: 100s-1,000s
• Each known or predicted gene has an associated protein sequence
• >1,000,000 known / predicted protein sequences

Some useful applications of alignments
• Given a newly discovered gene,
  Does it occur in other species?
  How fast does it evolve?
• Assume we try Smith-Waterman:
  Our new gene: 10^4
  The entire genomic database: 10^10 - 10^11

Some useful applications of alignments
• Given a newly sequenced organism,
• Which subregions align with other organisms?
  Potential genes
  Other biological characteristics
• Assume we try Smith-Waterman:
  Our newly sequenced mammal: 3 × 10^9
  The entire genomic database: 10^10 - 10^11

BLAST (Basic Local Alignment Search Tool)
Main idea:
1. Construct a dictionary of all the words in the query
2. Initiate a local alignment for each word match between query and DB
Running Time: O(MN)
However, orders of magnitude faster than Smith-Waterman
[Figure labels: query, DB]

BLAST: Original Version
Dictionary:
  All words of length k (~11)
  Alignment initiated between words of alignment score ≥ T (typically T = k)
Alignment:
  Ungapped extensions until score below statistical threshold
Output:
  All local alignments with score > statistical threshold
[Figure labels: query, DB, query, scan]

BLAST: Original Version
  A C G A A G T A A G G T C C A G T
  C C C T T C C T G G A T T G C G A
Example:
  k = 4, T = 4
  The matching word GGTC initiates an alignment
  Extension to the left and right with no gaps until alignment falls < 50%
Output:
  GTAAGGTCC
  GTTAGGTCC

Gapped BLAST
  A C G A A G T A A G G T C C A G T
  C T G A T C C T G G A T T G C G A
Added features:
• Pairs of words can initiate alignment
• Extensions with gaps in a band around anchor
Output:
  GTAAGGTCCAGT
  GTTAGGTC-AGT

Gapped BLAST
  A C G A A G T A A G G T C C A G T
  C T G A T C C T G G A T T G C G A
Added features:
• Pairs of words can initiate alignment
• Nearby alignments are merged
• Extensions with gaps until score < T below best score so far
Output:
  GTAAGGTCCAGT
  GTTAGGTC-AGT

Variants of BLAST
• MEGABLAST:
  Optimized to align very similar sequences
  • Works best when k = 4i ≥ 16
  • Linear gap penalty
• PSI-BLAST:
  BLAST produces many hits
  Those are aligned, and a pattern is extracted
  Pattern is used for next search; above steps iterated
• WU-BLAST: (Wash U BLAST)
  Optimized, added features
• BlastZ
  Combines BLAST/PatternHunter methodology

Example
Query: gattacaccccgattacaccccgattaca (29 letters) [2 mins]
Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, GSS, or phase 0, 1 or 2 HTGS sequences)
          1,726,556 sequences; 8,074,398,388 total letters

>gi|28570323|gb|AC108906.9| Oryza sativa chromosome 3 BAC OSJNBa0087C10 genomic sequence, complete sequence
 Length = 144487

 Score = 34.2 bits (17), Expect = 4.5
 Identities = 20/21 (95%)
 Strand = Plus / Plus

 Query: 4      tacaccccgattacaccccga 24
               ||||||| |||||||||||||
 Sbjct: 125138 tacacccagattacaccccga 125158

 Score = 34.2 bits (17), Expect = 4.5
 Identities = 20/21 (95%)
 Strand = Plus / Plus

 Query: 4      tacaccccgattacaccccga 24
               ||||||| |||||||||||||
 Sbjct: 125104 tacacccagattacaccccga 125124

>gi|28173089|gb|AC104321.7| Oryza sativa chromosome 3 BAC OSJNBa0052F07 genomic sequence, complete sequence
 Length = 139823

 Score = 34.2 bits (17), Expect = 4.5
 Identities = 20/21 (95%)
 Strand = Plus / Plus

 Query: 4    tacaccccgattacaccccga 24
             ||||||| |||||||||||||
 Sbjct: 3891 tacacccagattacaccccga 3911

Example
Query: Human atoh enhancer, 179 letters [1.5 min]
Result: 57 blast hits
                                                                     Score  E-value
1. gi|7677270|gb|AF218259.1|AF218259 Homo sapiens ATOH1 enhanc...     355   1e-95
2. gi|22779500|gb|AC091158.11| Mus musculus Strain C57BL6/J ch...     264   4e-68
3. gi|7677269|gb|AF218258.1|AF218258 Mus musculus Atoh1 enhanc...     256   9e-66
4. gi|28875397|gb|AF467292.1| Gallus gallus CATH1 (CATH1) gene...      78   5e-12
5. gi|27550980|emb|AL807792.6| Zebrafish DNA sequence from clo...      54   7e-05
6. gi|22002129|gb|AC092389.4| Oryza sativa chromosome 10 BAC O...      44   0.068
7. gi|22094122|ref|NM_013676.1| Mus musculus suppressor of Ty ...      42   0.27
8. gi|13938031|gb|BC007132.1| Mus musculus, Similar to suppres...      42   0.27

gi|7677269|gb|AF218258.1|AF218258 Mus musculus Atoh1 enhancer sequence
 Length = 1517

 Score = 256 bits (129), Expect = 9e-66
 Identities = 167/177 (94%), Gaps = 2/177 (1%)
 Strand = Plus / Plus

 Query: 3 tgacaatagagggtctggcagaggctcctggccgcggtgcggagcgtctggagcggagca 62
          ||||||||||||| ||||||||||||||||||| ||||||||||||||||||||||||||
 Sbjct: 1144
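
The "BLAST: Original Version" slides describe a seed-and-extend procedure: index the k-letter words of the query, find word matches in the database, and extend each match to the left and right without gaps. The following is a minimal Python sketch of that idea, not BLAST's actual implementation. The function names (build_word_index, extend, seed_and_extend), the +1/-1 match/mismatch scoring, and the x_drop stopping rule (stop once the running score falls more than x_drop below the best score seen) are assumptions made for illustration; x_drop stands in for the slide's "until alignment falls < 50%" rule, and T = k is modeled as exact word matches. The database string is the slide's second sequence read right to left, which is the orientation in which the slide's output GTTAGGTCC occurs.

```python
from collections import defaultdict


def build_word_index(query, k):
    """Map every length-k word of the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return index


def extend(query, db, qi, di, step, x_drop):
    """Ungapped extension starting at (qi, di), moving by step (+1 or -1).

    Scores +1 per match and -1 per mismatch (an assumed scheme) and returns
    the number of columns in the best-scoring extension.
    """
    score = best = best_len = length = 0
    while 0 <= qi < len(query) and 0 <= di < len(db):
        score += 1 if query[qi] == db[di] else -1
        length += 1
        if score > best:
            best, best_len = score, length
        elif best - score > x_drop:
            break  # score has dropped too far below the best seen so far
        qi += step
        di += step
    return best_len


def seed_and_extend(query, db, k=4, x_drop=2):
    """Return the set of ungapped local alignments seeded by exact k-word matches.

    Each alignment is (query_start, db_start, query_segment, db_segment).
    """
    index = build_word_index(query, k)
    hits = set()
    for d in range(len(db) - k + 1):
        for q in index.get(db[d:d + k], ()):
            left = extend(query, db, q - 1, d - 1, -1, x_drop)
            right = extend(query, db, q + k, d + k, +1, x_drop)
            q0, d0 = q - left, d - left
            n = left + k + right
            hits.add((q0, d0, query[q0:q0 + n], db[d0:d0 + n]))
    return hits


# The slide's example (k = 4); the second sequence is reversed, see the note above.
query = "ACGAAGTAAGGTCCAGT"
db = "CCCTTCCTGGATTGCGA"[::-1]
print(seed_and_extend(query, db))
# prints {(5, 3, 'GTAAGGTCC', 'GTTAGGTCC')}
```

Three query words (AGGT, GGTC, GTCC) match on the same diagonal here, and all three extend to the same alignment, which is why the result is collected in a set. A real implementation would typically track diagonals to avoid repeating the extension.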

