Stanford CS 262 - Gene Recognition


CS262 Lecture 9, Win07, Batzoglou

Lecture outline (slide titles)
Gene Recognition
Using Comparative Information
Patterns of Conservation
Comparison-based Gene Finders
Twinscan
SLAM – Generalized Pair HMM
NSCAN – Multiple Species Gene Prediction
Performance Comparison
CONTRAST
CONTRAST - Features
CONTRAST – SVM accuracies
CONTRAST - Decoding
CONTRAST - Training
Gene Regulation and Microarrays
Overview
Cells respond to environment
Genome is fixed – Cells are dynamic
Where gene regulation takes place
Transcriptional Regulation
Transcription Factors Binding to DNA
Promoter and Enhancers
Gene Regulation with TFs
Example: A Human heat shock protein
DNA Microarrays
What is a microarray
Goal of Microarray Experiments
Clustering vs. Classification
Clustering Algorithms
Hierarchical clustering
Results of Clustering Gene Expression
K-Means Clustering Algorithm
K-Means Algorithm
Mixture of Gaussians – Probabilistic K-means
Analysis of Clustering Data
Evaluating clusters – Hypergeometric Distribution

Gene Recognition

Using Comparative Information
•Hox cluster is an example where everything is conserved

Patterns of Conservation
              Genes    Intergenic   Separation
Mutations     30%      58%          2-fold
Gaps          1.3%     14%          10-fold
Frameshifts   0.14%    10.2%        75-fold

Comparison-based Gene Finders
•Rosetta, 2000
•CEM, 2000
–First methods to apply comparative genomics (human-mouse) to improve gene prediction
•Twinscan, 2001
–First HMM for comparative gene prediction in two genomes
•SLAM, 2002
–Generalized pair-HMM for simultaneous alignment and gene prediction in two genomes
•NSCAN, 2006
–Best method to date, based on a phylo-HMM for multiple genome gene prediction

Twinscan
1. Align the two sequences (e.g. from human and mouse)
2. Mark each human base as gap ( - ), mismatch ( : ), match ( | )
   New "alphabet": 4 x 3 = 12 letters = { A-, A:, A|, C-, C:, C|, G-, G:, G|, U-, U:, U| }
3. Run Viterbi using emissions $e_k(b)$ where $b \in$ { A-, A:, A|, …, U| }
   Emission distributions $e_k(b)$ estimated from real genes from human/mouse
   $e_I(x|) < e_E(x|)$: matches favored in exons
   $e_I(x-) > e_E(x-)$: gaps (and mismatches) favored in introns
Example
Human:     ACGGCGACGUGCACGU
Mouse:     ACUGUGACGUGCACUU
Alignment: ||:|:|||||||||:|

SLAM – Generalized Pair HMM
Exon GPHMM (figure: exon state emitting lengths d and e)
1. Choose exon lengths (d, e).
2. Generate alignment of length d+e.

NSCAN – Multiple Species Gene Prediction
•GENSCAN
Target        GGTGAGGTGACCAAGAACGTGTTGACAGTA
•TWINSCAN
Target        GGTGAGGTGACCAAGAACGTGTTGACAGTA
Conservation  |||:||:||:|||||:||||||||......
sequence
•N-SCAN
Target        GGTGAGGTGACCAAGAACGTGTTGACAGTA
Informant1    GGTCAGC___CCAAGAACGTGTAG......
Informant2    GATCAGC___CCAAGAACGTGTAG......
Informant3    GGTGAGCTGACCAAGATCGTGTTGACACAA

Target sequence: $P(T_i \mid T_{i-1}, \ldots, T_{i-o})$
Informant sequences (vector): $P(\mathbf{I}_i \mid T_i, \ldots, T_{i-o}, \mathbf{I}_{i-1}, \ldots, \mathbf{I}_{i-o})$
Joint prediction (use phylo-HMM): $P(T_i, \mathbf{I}_i \mid T_{i-1}, \ldots, T_{i-o}, \mathbf{I}_{i-1}, \ldots, \mathbf{I}_{i-o})$

NSCAN – Multiple Species Gene Prediction
(figure: phylogenetic tree with leaves H, C, M, R and internal nodes X, Y, Z, shown rooted at X and re-rooted at H)
Rooted at the ancestor (the root prior is written P(A) on the slide; in the tree the root node is X):
$P(H, C, M, R, X, Y, Z) = P(A)\,P(C \mid X)\,P(Y \mid X)\,P(H \mid Y)\,P(Z \mid Y)\,P(M \mid Z)\,P(R \mid Z)$
Re-rooted at the target H:
$P(H, C, M, R, X, Y, Z) = P(H)\,P(Y \mid H)\,P(X \mid Y)\,P(Z \mid Y)\,P(C \mid X)\,P(M \mid Z)\,P(R \mid Z)$

Performance Comparison
GENSCAN:  Generalized HMM; models human sequence
TWINSCAN: Generalized HMM; models human/mouse alignments
N-SCAN:   Phylo-HMM; models multiple sequence evolution
NSCAN human/mouse > Human/multiple informants

CONTRAST
•2-level architecture
•No Phylo-HMM that models alignments
Human     tttcttagACTTTAAAGCTGTCAAGCCGTGTTCTAGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtaccta
Macaque   tttcttagACTTTAAAGCTGTCAAGCCGTGTTCTAGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtaccta
Mouse     ttgcttagACTTTAAAGTTGTCAAGCCGCGTTCTTGATAAAATAAGTATTGGACAACTTGTTAGTCTTCTTTCCAACAACCTGAACAAATTTGATGAAgtatgta-cca
Rat       ttgcttagACTTTAAAGTTGTCAAGCCGTGTTCTTGATAAAATAAGTATTGGACAACTTATTAGTCTTCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtaccca
Rabbit    t--attagACTTTAAAGCTGTCAAGCCGTGTTCTAGATAAAATAAGTATTGGGCAACTTATTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtaccta
Dog       t-cattagACTTTAAAGCTGTCAAGCCGTGTTCTGGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTCGATGAAgtatgtaccta
Cow       t-cattagACTTTGAAGCTATCAAGCCGTGTTCTGGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgta-cta
Armadillo gca--tagACCTTAAAACTGTCAAGCCGTGTTTTAGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtgccta
Elephant  gct-ttagACTTTAAAACTGTCCAGCCGTGTTCTTGATAAAATAAGTATTGGACAACTTGTCAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtatcta
Tenrec    tc-cttagACTTTAAAACTTTCGAGCCGGGTTCTAGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtatcta
Opossum   ---tttagACCTTAAAACTGTCAAGCCGTGTTCTAGATAAAATAAGCACTGGACAGCTTATCAGTCTCCTTTCCAACAATCTGAACAAGTTTGATGAAgtatgtagctg
Chicken   ----ttagACCTTAAAACTGTCAAGCAAAGTTCTAGATAAAATAAGTACTGGACAATTGGTCAGCCTTCTTTCCAACAATCTGAACAAATTCGATGAGgtatgtt--tg
(figure: SVM classifiers applied to the alignment; labels X, Y, a, b)

CONTRAST - Features
•$\log P(y \mid x) \sim w^T F(x, y)$
•$F(x, y) = \sum_i f(y_{i-1}, y_i, i, x)$
•$f(y_{i-1}, y_i, i, x)$:
   $\mathbf{1}\{y_{i-1} = \text{INTRON},\ y_i = \text{EXON\_FRAME\_1}\}$
   $\mathbf{1}\{y_{i-1} = \text{EXON\_FRAME\_1},\ x_{human,i-2}, \ldots, x_{human,i+3} = \text{ACCGGT}\}$
   $\mathbf{1}\{y_{i-1} = \text{EXON\_FRAME\_1},\ x_{human,i-1}, \ldots, x_{dog,i+1} = \text{ACC, AGC}\}$
   $(1-c)\,\mathbf{1}\{a < \text{SVM\_DONOR}(i) < b\}$
   (optional) $\mathbf{1}\{\text{EXON\_FRAME\_1},\ \text{EST\_EVIDENCE}\}$

CONTRAST – SVM accuracies
•Accuracy increases as we add informants
•Diminishing returns after ~5 informants
(plot: SN and SP)

CONTRAST - Decoding
Viterbi Decoding: maximize $P(y \mid x)$
Maximum Expected Boundary Accuracy Decoding: maximize $\sum_{i,B} \mathbf{1}\{y_{i-1}, y_i \text{ is exon boundary } B\}\ \mathrm{Accuracy}(y_{i-1}, y_i, B \mid x)$
Accuracy(y_{i-1}, y_i, B | x) = P(y_{i-1}, y_i is B | x) – (1 – P(y_{i-1}, y_i
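
The Twinscan marking step (step 2 of the Twinscan slide) can be illustrated with a minimal Python sketch. It pairs each human base with a gap, mismatch, or match mark and is run on the slide's own human/mouse example. The function name `conservation_symbols` and the choice to skip alignment columns where the target itself has a gap are assumptions made for illustration, not part of the slides.

```python
def conservation_symbols(target, informant):
    """Pair each target base with '-', ':' or '|' (gap, mismatch, match)."""
    if len(target) != len(informant):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for t, m in zip(target, informant):
        if t == "-":
            # Assumption: columns with no target base are skipped,
            # because only target bases receive a mark.
            continue
        if m == "-":
            mark = "-"   # informant has a gap under this base
        elif t == m:
            mark = "|"   # match
        else:
            mark = ":"   # mismatch
        symbols.append(t + mark)
    return symbols


# Example sequences taken from the Twinscan slide.
human = "ACGGCGACGUGCACGU"
mouse = "ACUGUGACGUGCACUU"
symbols = conservation_symbols(human, mouse)
marks = "".join(s[1] for s in symbols)
print(marks)
```

Each element of `symbols` is one letter of the 12-letter alphabet (for example `G:`), which is what the emission distributions in step 3 are defined over.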

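
Step 3 of the Twinscan slide runs Viterbi over the marked sequence. The sketch below is a deliberately reduced version with only two states, intron (I) and exon (E), and with emissions that depend only on the mark, not on the base. All probabilities in `START`, `TRANS`, and `EMIT` are made-up values chosen to respect the slide's inequalities (matches more likely in exons, gaps and mismatches more likely in introns); the real Twinscan model is a generalized HMM with many more states and trained parameters.

```python
import math

STATES = ("I", "E")  # I = intron, E = exon (toy two-state model)

# Hypothetical parameters, for illustration only.
START = {"I": 0.5, "E": 0.5}
TRANS = {"I": {"I": 0.9, "E": 0.1}, "E": {"I": 0.1, "E": 0.9}}
EMIT = {
    "I": {"|": 0.5, ":": 0.3, "-": 0.2},
    "E": {"|": 0.9, ":": 0.08, "-": 0.02},
}


def viterbi(symbols, start=START, trans=TRANS, emit=EMIT):
    """Return the most probable state path for symbols such as 'A|', 'C-'."""
    # Work in log space to avoid underflow on long sequences.
    score = {s: math.log(start[s]) + math.log(emit[s][symbols[0][1]])
             for s in STATES}
    back = []
    for sym in symbols[1:]:
        prev, score, ptr = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + math.log(trans[r][s]))
            score[s] = (prev[best] + math.log(trans[best][s])
                        + math.log(emit[s][sym[1]]))
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi(["A-"] * 6 + ["A|"] * 6))
```

With these toy numbers, a run of gaps followed by a run of matches is labeled intron and then exon, which is the qualitative behavior the emission inequalities on the slide are meant to produce.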

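
The CONTRAST feature slide scores a labeling as $w^T F(x, y)$, with $F$ a sum of local indicator features over positions. The sketch below computes that unnormalized score for a toy subset of two features modeled on the slide's first two indicators. The feature names, the weights, the input sequence, and the dictionary layout of `x` are hypothetical; CONTRAST itself uses a much larger feature set and learned weights.

```python
def features(prev_label, label, i, x):
    """Names of the indicator features that fire at position i (toy subset)."""
    fired = []
    # 1{y_{i-1} = INTRON, y_i = EXON_FRAME_1}
    if prev_label == "INTRON" and label == "EXON_FRAME_1":
        fired.append("INTRON->EXON_FRAME_1")
    # 1{y_{i-1} = EXON_FRAME_1, x_human[i-2..i+3] = ACCGGT}
    if (prev_label == "EXON_FRAME_1" and i >= 2
            and x["human"][i - 2:i + 4] == "ACCGGT"):
        fired.append("EXON_FRAME_1,human=ACCGGT")
    return fired


def score(labels, x, weights):
    """Unnormalized log-score w^T F(x, y), with F summed over positions."""
    total = 0.0
    for i in range(1, len(labels)):
        for name in features(labels[i - 1], labels[i], i, x):
            total += weights.get(name, 0.0)
    return total


# Hypothetical input and weights, for illustration only.
x = {"human": "TTACCGGTAA"}
weights = {"INTRON->EXON_FRAME_1": 1.5, "EXON_FRAME_1,human=ACCGGT": 0.5}
labels = ["INTRON"] * 3 + ["EXON_FRAME_1"] * 7
print(score(labels, x, weights))
```

Because each feature looks only at adjacent labels and the observed sequences, the same sum can be maximized over all labelings with dynamic programming, which is what the decoding slide relies on.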