Stanford CS 262 - Gene Recognition


CS262 Lecture 16, Win07, Batzoglou

Slide titles in the full deck:
Gene Recognition
Gene structure
HMM-based Gene Finders
Better way to do it: negative binomial
GENSCAN’s hidden weapon
Evaluation of Accuracy
Results of GENSCAN
Comparison-based Methods
Cross-species gene finding
Not always: HoxA human-mouse
Patterns of Conservation
Twinscan
Twinscan Algorithm
Example
HMMs for simultaneous alignment and gene finding: Generalized Pair HMMs
The SLAM hidden Markov model
Exon GPHMM
Measuring Performance
Example: HoxA2 and HoxA3
Gene Regulation and Microarrays
Overview
Cells respond to environment
Genome is fixed – Cells are dynamic
Where gene regulation takes place
Transcriptional Regulation
Transcription Factors Binding to DNA
Promoter and Enhancers
Regulation of Genes
Example: A Human heat shock protein
The Cell as a Regulatory Network
The Cell as a Regulatory Network (2)
DNA Microarrays
What is a microarray
Goal of Microarray Experiments
Clustering vs. Classification
Clustering Algorithms
Hierarchical clustering
Distance between clusters
Results of Clustering Gene Expression
K-Means Clustering Algorithm
K-Means Algorithm
Mixture of Gaussians – Probabilistic K-means
Analysis of Clustering Data
Evaluating clusters – Hypergeometric Distribution


Gene Recognition

Credits for slides:
Serafim Batzoglou
Marina Alexandersson
Lior Pachter
Serge Saxonov


Gene structure

[Figure labels: exon1, intron1, exon2, intron2, exon3; transcription, splicing, translation]

exon = protein-coding
intron = non-coding
Codon: A triplet of nucleotides that is converted to one amino acid


Hidden Markov Models for Gene Finding

GTCAGATGAGCAAAGTAGACACTCCAGTAACGCGGTGAGTACATTAA

[Figure: the sequence above annotated with the regions intergene, exon, intron, exon, intron, exon, intergene, and with the HMM states Intergene State, First Exon State, Intron State]


Duration HMM for Gene Finding

[Figure: lattice of A/C/G/T emissions over positions i, with Exon1, Exon2, Exon3 and an exon duration d; an exon spans positions j+2 to i]

Model terms shown in the figure:
P_INTRON(x_i | x_{i-1} … x_{i-w})
P_EXON_DUR(d)
P_EXON((i – j + 2) % 3)(x_i | x_{i-1} … x_{i-w})
P_5’SS(x_{i-3} … x_{i+4})
P_STOP(x_{i-4} … x_{i+3})


HMM-based Gene Finders

•GENMARK (Borodovsky & McIninch 1993)
•GENIE (Kulp 1996)
•GENSCAN (Burge 1997)
  Big jump in accuracy of de novo gene finding
  Currently, one of the best
  HMM with duration modeling for Exon states
•FGENESH (Solovyev 1997)
  Currently one of the best
•HMMgene (Krogh 1997)
•VEIL (Henderson, Salzberg, & Fasman 1997)


Better way to do it: negative binomial

•EasyGene: Prokaryotic gene-finder (Larsen TS, Krogh A)
•Negative binomial with n = 3


GENSCAN’s hidden weapon

•C+G content is correlated with:
  Gene content (+)
  Mean exon length (+)
  Mean intron length (–)
•These quantities affect parameters of model
•Solution
  Train parameters of model in four different C+G content ranges!


Evaluation of Accuracy
(Slide by NF Samatova)

•Sensitivity (Sn)
  Exon level: fraction of exons whose boundaries are predicted exactly
  Nucleotide level: fraction of coding nucleotides that are predicted as coding
•Specificity (Sp)
  Exon level: fraction of the predicted exons that are exactly correct
  Nucleotide level: fraction of the predicted coding nucleotides that are coding
•Correlation Coefficient (CC)
  Combined measure of Sensitivity & Specificity
  Range: -1 (always wrong) to +1 (always right)

                        Actual Coding   Actual No Coding
Predicted Coding             TP               FP
Predicted No Coding          FN               TN


Results of GENSCAN

•On the initial test dataset (Burset & Guigo)
  80% exact exon detection
  10% partial exons
  10% wrong exons
•In general
  HMMs have been best in de novo prediction
  In practice they overpredict human genes by ~2x


Comparison-based Methods


Cross-species gene finding

[Figure: 5’ to 3’ gene diagram with Exon1, Intron1, Exon2, Intron2, Exon3, above a human-mouse alignment]

[human] GGTTTT--ATGAGTAAAGTAGACACTCCAGTAACGCGGTGAGTAC----ATTAA
[mouse] C-TCAGGAATGAGCAAAGTCGAC---CCAGTAACGCGGTAAGTACATTAACGA-


Comparison of 1196 orthologous genes
(Makalowski et al., 1996)

•Sequence identity between genes in human/mouse
  –exons: 84.6%
  –protein: 85.4%
  –introns: 35%
  –5’ UTRs: 67%
  –3’ UTRs: 69%
•27 proteins were 100% identical


Not always: HoxA human-mouse

[Figure not included in the text preview]


Patterns of Conservation

              Mutations   Gaps     Frameshifts
Genes         30%         1.3%     0.14%
Intergenic    58%         14%      10.2%
Separation    2-fold      10-fold  75-fold


Twinscan

•Twinscan is an augmented version of the Genscan HMM.

[Figure: states E and I with transitions, duration, and emissions over the sequence ACUAUACAGACAUAUAUCAU]


Twinscan Algorithm

1. Align the two sequences (e.g. from human and mouse)
2. Mark each human base as gap ( - ), mismatch ( : ), match ( | )
   New “alphabet”: 4 x 3 = 12 letters
   Σ = { A-, A:, A|, C-, C:, C|, G-, G:, G|, U-, U:, U| }
3. Run Viterbi using emissions e_k(b) where b ∈ { A-, A:, A|, …, U| }
   Emission distributions e_k(b) estimated from real genes from human/mouse
   e_I(x|) < e_E(x|): matches favored in exons
   e_I(x-) > e_E(x-): gaps (and mismatches) favored in introns


Example

H:     ACGGCGACGUGCACGU
Mouse:     ACUGUGACGUGCACUU
Alignment: ||:|:|||||||||:|

Input to Twinscan HMM:
A| C| G: G| C: G| A| C| G| U| G| C| A| C| G: U|

Recall, e_E(A|) > e_I(A|)
        e_E(A-) < e_I(A-)
Likely exon


HMMs for simultaneous alignment and gene finding: Generalized Pair HMMs


The SLAM hidden Markov model


Exon GPHMM

[Figure: a pair of exons with lengths d and e]

1. Choose exon lengths (d,e).
2. Generate alignment of length d+e.


Approximate alignment


Measuring Performance


Example: HoxA2 and HoxA3

[Figure tracks: SLAM, SGP-2, Twinscan, Genscan, TBLASTX, SLAM CNS, VISTA, RefSeq]


Gene Regulation and Microarrays


Overview

•A. Gene Expression and Regulation
•B. Measuring Gene Expression:
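The nucleotide-level accuracy measures from the "Evaluation of Accuracy" slide can be illustrated with a minimal Python sketch. The function names, the 'C'/'N' label strings, and the toy annotation are hypothetical and chosen only for illustration. Specificity follows the slide's definition (the fraction of predicted coding nucleotides that are coding), and the CC formula used here is the standard correlation between predicted and actual labels; the slide names CC but does not spell out a formula, so that part is an assumption.

```python
import math


def confusion_counts(actual, predicted):
    """Count TP, FP, TN, FN over two equal-length label strings.

    'C' marks a coding nucleotide; any other character is treated as
    non-coding (an assumption of this sketch).
    """
    if len(actual) != len(predicted):
        raise ValueError("label strings must have equal length")
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == "C" and p == "C":
            tp += 1
        elif a != "C" and p == "C":
            fp += 1
        elif a == "C" and p != "C":
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy_measures(tp, fp, tn, fn):
    """Return (Sn, Sp, CC); assumes every denominator is non-zero."""
    sn = tp / (tp + fn)  # coding nucleotides that are predicted as coding
    sp = tp / (tp + fp)  # predicted coding nucleotides that are coding
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom
    return sn, sp, cc


# Toy annotation: the prediction is shifted one base to the left.
actual = "NNCCCCNNNN"
predicted = "NCCCCNNNNN"
counts = confusion_counts(actual, predicted)
sn, sp, cc = accuracy_measures(*counts)
print(counts, sn, sp, round(cc, 3))
```

On this toy input the counts are TP = 3, FP = 1, TN = 5, FN = 1, so Sn and Sp are both 0.75 and CC is 14/24, about 0.583. A sequence with no coding bases, or a predictor that never predicts coding, makes a denominator zero and would need separate handling.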
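Step 2 of the Twinscan algorithm, marking each human base as gap, mismatch, or match, can be sketched in a few lines of Python. The function name `conservation_symbols` is hypothetical, and skipping alignment columns where the human row itself has a gap is an assumption of this sketch (the slides only say that each human base is marked). The input is the slide's own human/mouse example.

```python
def conservation_symbols(target, informant):
    """Pair each target base with '|' (match), ':' (mismatch) or '-' (gap).

    Both arguments are rows of a pairwise alignment, so they must have
    equal length, with '-' standing for an alignment gap.
    """
    if len(target) != len(informant):
        raise ValueError("aligned rows must have equal length")
    symbols = []
    for t, q in zip(target, informant):
        if t == "-":
            # Assumption: columns without a target base emit nothing.
            continue
        if q == "-":
            mark = "-"
        elif t == q:
            mark = "|"
        else:
            mark = ":"
        symbols.append(t + mark)
    return symbols


human = "ACGGCGACGUGCACGU"
mouse = "ACUGUGACGUGCACUU"
encoded = conservation_symbols(human, mouse)
print(" ".join(encoded))
# → A| C| G: G| C: G| A| C| G| U| G| C| A| C| G: U|
```

The output reproduces the "Input to Twinscan HMM" line of the example slide. Each two-character symbol is one letter of the 12-letter alphabet, and is what the emission distributions e_k(b) would score in step 3.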

