Stanford CS 262 - Gene Recognition

Slides: Gene structure; Needles in a Haystack; Gene Finding; Signals for Gene Finding; Exon and Intron Lengths; Nucleotide Composition; Splice Sites; HMMs for Gene Recognition; Duration HMMs for Gene Recognition; Genscan; Using Comparative Information; Patterns of Conservation; Comparison-based Gene Finders; Twinscan; SLAM – Generalized Pair HMM; NSCAN – Multiple Species Gene Prediction; Performance Comparison; CONTRAST; CONTRAST – Features; CONTRAST – SVM accuracies; CONTRAST – Decoding; CONTRAST – Training; Performance Comparison

Gene Recognition
CS262 Lecture 9, Win07, Batzoglou

Gene structure
[Figure: exon1, intron1, exon2, intron2, exon3, with the steps transcription, splicing, and translation]
exon = protein-coding
intron = non-coding
Codon: a triplet of nucleotides that is converted to one amino acid

Needles in a Haystack
[Figure]

Gene Finding
• Classes of gene predictors
  Ab initio
  • Only look at the genomic DNA of the target genome
  De novo
  • Target genome + aligned informant genome(s)
  EST/cDNA-based & combined approaches
  • Use aligned ESTs or cDNAs + any other kind of evidence

[Figure: multi-species alignment around an exon (labeled EXON)]
Human     tttcttagACTTTAAAGCTGTCAAGCCGTGTTCTAGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtaccta
Macaque   tttcttagACTTTAAAGCTGTCAAGCCGTGTTCTAGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtaccta
Mouse     ttgcttagACTTTAAAGTTGTCAAGCCGCGTTCTTGATAAAATAAGTATTGGACAACTTGTTAGTCTTCTTTCCAACAACCTGAACAAATTTGATGAAgtatgta-cca
Rat       ttgcttagACTTTAAAGTTGTCAAGCCGTGTTCTTGATAAAATAAGTATTGGACAACTTATTAGTCTTCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtaccca
Rabbit    t--attagACTTTAAAGCTGTCAAGCCGTGTTCTAGATAAAATAAGTATTGGGCAACTTATTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtaccta
Dog       t-cattagACTTTAAAGCTGTCAAGCCGTGTTCTGGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTCGATGAAgtatgtaccta
Cow       t-cattagACTTTGAAGCTATCAAGCCGTGTTCTGGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgta-cta
Armadillo gca--tagACCTTAAAACTGTCAAGCCGTGTTTTAGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtgccta
Elephant  gct-ttagACTTTAAAACTGTCCAGCCGTGTTCTTGATAAAATAAGTATTGGACAACTTGTCAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtatcta
Tenrec    tc-cttagACTTTAAAACTTTCGAGCCGGGTTCTAGATAAAATAAGTATTGGACAACTTGTTAGTCTCCTTTCCAACAACCTGAACAAATTTGATGAAgtatgtatcta
Opossum   ---tttagACCTTAAAACTGTCAAGCCGTGTTCTAGATAAAATAAGCACTGGACAGCTTATCAGTCTCCTTTCCAACAATCTGAACAAGTTTGATGAAgtatgtagctg
Chicken   ----ttagACCTTAAAACTGTCAAGCAAAGTTCTAGATAAAATAAGTACTGGACAATTGGTCAGCCTTCTTTCCAACAATCTGAACAAATTCGATGAGgtatgtt--tg

Signals for Gene Finding
1. Regular gene structure
2. Exon/intron lengths
3. Codon composition
4. Motifs at the boundaries of exons, introns, etc.
   Start codon, stop codon, splice sites
5. Patterns of conservation
6. Sequenced mRNAs
7. (PCR for verification)

[Figure: Next Exon: Frame 0; Next Exon: Frame 1]

Exon and Intron Lengths
[Figure]

Nucleotide Composition
• Base composition in exons is characteristic due to the genetic code

Amino Acid      Letter  DNA Codons
Isoleucine      I       ATT, ATC, ATA
Leucine         L       CTT, CTC, CTA, CTG, TTA, TTG
Valine          V       GTT, GTC, GTA, GTG
Phenylalanine   F       TTT, TTC
Methionine      M       ATG
Cysteine        C       TGT, TGC
Alanine         A       GCT, GCC, GCA, GCG
Glycine         G       GGT, GGC, GGA, GGG
Proline         P       CCT, CCC, CCA, CCG
Threonine       T       ACT, ACC, ACA, ACG
Serine          S       TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine        Y       TAT, TAC
Tryptophan      W       TGG
Glutamine       Q       CAA, CAG
Asparagine      N       AAT, AAC
Histidine       H       CAT, CAC
Glutamic acid   E       GAA, GAG
Aspartic acid   D       GAT, GAC
Lysine          K       AAA, AAG
Arginine        R       CGT, CGC, CGA, CGG, AGA, AGG

[Figure: example boundary motifs such as atg, tga, ggtgag, cag]

Splice Sites
[Figure] (http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)

HMMs for Gene Recognition
[Figure: the sequence GTCAGATGAGCAAAGTAGACACTCCAGTAACGCGGTGAGTACATTAA labeled as intergene, exon, and intron segments, with HMM states Intergene, First Exon, and Intron]

Duration HMMs for Gene Recognition
[Figure: a gene with Exon1, Exon2, and Exon3, exon i having duration d_i]
Model terms shown in the figure:
• Exon duration: $P_{\text{EXON\_DUR}}(d)$
• Intron emission: $P_{\text{INTRON}}(x_i \mid x_{i-1} \dots x_{i-w})$
• Exon emission, indexed by frame: $P_{\text{EXON},\,(i-j+2) \bmod 3}(x_i \mid x_{i-1} \dots x_{i-w})$
• 5' splice site: $P_{5'\text{SS}}(x_{i-3} \dots x_{i+4})$
• Stop codon: $P_{\text{STOP}}(x_{i-4} \dots x_{i+3})$

Genscan
• Burge, 1997
• First competitive HMM-based gene finder; huge accuracy jump
• Only gene finder at the time to predict partial genes and genes on both strands
• Features
  – Duration HMM
  – Four different parameter sets
    • Very low, low, medium, and high GC content

Using Comparative Information
[Figure]
• Hox cluster is an example where everything is conserved

Patterns of Conservation
              Genes    Intergenic   Separation
Mutations     30%      58%          2-fold
Gaps          1.3%     14%          10-fold
Frameshifts   0.14%    10.2%        75-fold

Comparison-based Gene Finders
• Rosetta, 2000
• CEM, 2000
  – First methods to apply comparative genomics (human-mouse) to improve gene prediction
• Twinscan, 2001
  – First HMM for comparative gene prediction in two genomes
• SLAM, 2002
  – Generalized pair-HMM for simultaneous alignment and gene prediction in two genomes
• NSCAN, 2006
  – Best method to date; based on a phylo-HMM for multiple-genome gene prediction

Twinscan
1. Align the two sequences (e.g., from human and mouse)
2. Mark each human
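The gene-structure slide (exons spliced together, then read as codons) and the codon table on the Nucleotide Composition slide can be tied together in a short sketch. It assumes a toy gene with hypothetical exon coordinates. It also uses the standard stop codons TAA, TAG, and TGA, which the slide does not list. The helper names `splice`, `exon_phases`, and `translate` are illustrative and do not come from the lecture.

```python
# Sketch: splice hypothetical exon intervals out of a genomic string and
# translate the spliced coding sequence with the standard genetic code.

# Codon table from the "Nucleotide Composition" slide, plus the three
# standard stop codons (TAA, TAG, TGA), which the slide does not list.
CODONS = {
    "I": "ATT ATC ATA", "L": "CTT CTC CTA CTG TTA TTG", "V": "GTT GTC GTA GTG",
    "F": "TTT TTC", "M": "ATG", "C": "TGT TGC", "A": "GCT GCC GCA GCG",
    "G": "GGT GGC GGA GGG", "P": "CCT CCC CCA CCG", "T": "ACT ACC ACA ACG",
    "S": "TCT TCC TCA TCG AGT AGC", "Y": "TAT TAC", "W": "TGG",
    "Q": "CAA CAG", "N": "AAT AAC", "H": "CAT CAC", "E": "GAA GAG",
    "D": "GAT GAC", "K": "AAA AAG", "R": "CGT CGC CGA CGG AGA AGG",
    "*": "TAA TAG TGA",
}
GENETIC_CODE = {codon: aa for aa, codons in CODONS.items() for codon in codons.split()}


def splice(genome: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate 0-based, end-exclusive exon intervals, dropping the introns."""
    return "".join(genome[start:end] for start, end in exons)


def exon_phases(exons: list[tuple[int, int]]) -> list[int]:
    """Codon position (0, 1 or 2) at which each exon starts in the spliced sequence."""
    phases, coding_length = [], 0
    for start, end in exons:
        phases.append(coding_length % 3)
        coding_length += end - start
    return phases


def translate(cds: str) -> str:
    """Translate codon by codon in frame 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = GENETIC_CODE[cds[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


genome = "ATGAA" + "gtaagtttag" + "GTGGTAA"  # toy gene: exon, intron (lowercase), exon
exons = [(0, 5), (15, 22)]                  # hypothetical exon coordinates
print(splice(genome, exons))                # ATGAAGTGGTAA
print(translate(splice(genome, exons)))     # MKW
print(exon_phases(exons))                   # [0, 2]
```

The second exon starts at codon position 2 because the first exon contributes 5 coding bases, so the codon AAG is split across the intron. This length-mod-3 bookkeeping decides the frame in which the next exon is read.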
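Boundary signals such as the 5' splice-site term $P_{5'\text{SS}}(x_{i-3} \dots x_{i+4})$ need a model that scores a short, fixed-width window. One simple choice is a log-odds position weight matrix estimated from aligned example sites. This sketch makes several assumptions. The training windows are made up, and the background is taken to be uniform. It is also not necessarily the splice-site model Genscan uses.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """log2-odds position weight matrix against a uniform (0.25) background."""
    length = len(sites[0])
    total = len(sites) + len(BASES) * pseudocount
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        pwm.append({b: math.log2(((counts[b] + pseudocount) / total) / 0.25) for b in BASES})
    return pwm


def score(pwm, window):
    """Sum of per-position log-odds; higher means more site-like."""
    return sum(column[base] for column, base in zip(pwm, window))


def scan(pwm, seq):
    """Score every window of the PWM's width along seq as (offset, score)."""
    width = len(pwm)
    return [(i, score(pwm, seq[i:i + width])) for i in range(len(seq) - width + 1)]


# Toy, made-up donor-site windows: last 2 exon bases + first 6 intron bases.
donor_sites = ["AGGTAAGT", "AGGTGAGT", "AGGTAAGA", "TGGTAAGT", "AGGTGAGC"]
pwm = build_pwm(donor_sites)
best_pos, best_score = max(scan(pwm, "CCCCAGGTAAGTCCCC"), key=lambda hit: hit[1])
print(best_pos)  # 4
```

The pseudocount keeps unseen bases from producing log(0). The window width of 8 mirrors the $x_{i-3} \dots x_{i+4}$ span of the slide's term.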
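The HMM slides label each base of a sequence as intergene, exon, or intron. The standard way to decode such a labeling is the Viterbi algorithm, shown here as a minimal sketch on a three-state HMM. Every probability below is a hypothetical toy value. Real gene finders use many more states, for example separate exon states per frame and per strand.

```python
import math


def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq, states, start, trans, emit):
    """Most probable state path for seq under an HMM, computed in log space."""
    scores = [{s: log(start.get(s, 0.0)) + log(emit[s][seq[0]]) for s in states}]
    backptrs = []
    for base in seq[1:]:
        row, ptr = {}, {}
        for s in states:
            # best predecessor r for state s at this position
            prev = max(states, key=lambda r: scores[-1][r] + log(trans[r].get(s, 0.0)))
            row[s] = scores[-1][prev] + log(trans[prev].get(s, 0.0)) + log(emit[s][base])
            ptr[s] = prev
        scores.append(row)
        backptrs.append(ptr)
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy parameters (made up): N = intergene, E = exon, I = intron.
STATES = ("N", "E", "I")
START = {"N": 1.0}
TRANS = {
    "N": {"N": 0.9, "E": 0.1},
    "E": {"E": 0.8, "I": 0.1, "N": 0.1},
    "I": {"I": 0.9, "E": 0.1},
}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "E": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "I": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}

seq = "GTCAGATGAGCAAAGTAGACACTCCAGTAACGCGGTGAGTACATTAA"
print("".join(viterbi(seq, STATES, START, TRANS, EMIT)))
```

Working in log space avoids numeric underflow on long sequences. Zero-probability transitions (intergene directly to intron, and the reverse) become -inf, so the decoded path can never use them.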
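A duration HMM scores whole segments rather than single bases. Each segment contributes a length term such as $P_{\text{EXON\_DUR}}(d)$ plus the emissions inside it, so exon lengths need not follow the geometric distribution implied by a plain self-loop. This sketch implements a segmental (semi-Markov) Viterbi over two alternating labels. The uniform length distribution and the zero-order emissions are simplifying assumptions. The slide's emission models instead condition each base on the previous $w$ bases.

```python
import math

NEG_INF = float("-inf")


def log(p: float) -> float:
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else NEG_INF


def segment_viterbi(seq, labels, dur, emit, max_len):
    """Best split of seq into segments whose labels alternate (e.g. exon/intron).

    dur[label](d) is the probability that a segment with that label has length d;
    emit[label][base] is a zero-order emission probability.
    Returns (label, start, end) tuples with 0-based, end-exclusive coordinates.
    """
    n = len(seq)
    # best[t][lab]: best log score of seq[:t] whose last segment has label lab
    best = [dict.fromkeys(labels, NEG_INF) for _ in range(n + 1)]
    back = [dict.fromkeys(labels) for _ in range(n + 1)]
    for end in range(1, n + 1):
        for lab in labels:
            for d in range(1, min(max_len, end) + 1):
                start = end - d
                segment = log(dur[lab](d)) + sum(log(emit[lab][b]) for b in seq[start:end])
                if start == 0:
                    prev_lab, prev = None, 0.0
                else:
                    # the previous segment must carry a different label
                    prev_lab = max((other for other in labels if other != lab),
                                   key=lambda other: best[start][other])
                    prev = best[start][prev_lab]
                if prev + segment > best[end][lab]:
                    best[end][lab] = prev + segment
                    back[end][lab] = (start, prev_lab)
    lab = max(labels, key=lambda other: best[n][other])
    if best[n][lab] == NEG_INF:
        raise ValueError("no segmentation has nonzero probability")
    segments, end = [], n
    while end > 0:
        start, prev_lab = back[end][lab]
        segments.append((lab, start, end))
        end, lab = start, prev_lab
    return segments[::-1]


dur = {"E": lambda d: 0.1, "I": lambda d: 0.1}  # hypothetical: uniform over lengths 1..10
emit = {"E": {"A": 0.0, "C": 0.5, "G": 0.5, "T": 0.0},
        "I": {"A": 0.5, "C": 0.0, "G": 0.0, "T": 0.5}}
print(segment_viterbi("GGCCATAT", ("E", "I"), dur, emit, max_len=10))
# [('E', 0, 4), ('I', 4, 8)]
```

The extra loop over segment lengths d is what distinguishes this from per-base Viterbi. It costs a factor of `max_len` in running time, which is one reason such models cap or approximate long durations.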
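Genscan keeps four parameter sets (very low, low, medium, and high GC content) and picks one according to the GC content of the input. This minimal sketch shows that selection step. The cutoffs in `CUTOFFS` are placeholder values for illustration only; they are not taken from the lecture or verified against Genscan.

```python
from bisect import bisect_right

# Hypothetical GC-content cutoffs (fractions); placeholders for illustration.
CUTOFFS = (0.43, 0.51, 0.57)
PARAMETER_SETS = ("very low", "low", "medium", "high")


def gc_content(seq: str) -> float:
    """Fraction of G+C among the A/C/G/T bases of seq (other symbols ignored)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def choose_parameter_set(seq: str) -> str:
    """Pick the parameter set whose GC-content bin contains seq's GC content."""
    return PARAMETER_SETS[bisect_right(CUTOFFS, gc_content(seq))]


print(choose_parameter_set("GCAT"))   # low
print(choose_parameter_set("GGGGC"))  # high
```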

