Stanford CS 262 - Lecture 11- Static Race Detection

This preview shows page 1-2-3-26-27-28 out of 28 pages.


Outline: Next Few Topics; Gene Recognition; Reading; Gene expression; Gene structure; Finding Genes; Approaches to gene finding; 2. Recognize "coding bias"; Biology of Splicing; 3. Recognize splice sites; HMM-based Gene Finders; Better way to do it: negative binomial; GENSCAN's hidden weapon; Evaluation of Accuracy; Results of GENSCAN

Genomics 101
• DNA sequencing
• Alignment
• Gene identification
• Gene expression
• Genome evolution
• …

Next Few Topics
• Gene Recognition
  Finding genes in DNA with computational methods
• Large-scale alignment & multiple alignment
  Comparing whole genomes, or large families of genes
• Gene Expression and Regulation
  Measuring the expression of many genes at a time
  Finding elements in DNA that control the expression of genes

Gene Recognition
Credits for slides: Marina Alexandersson, Lior Pachter, Serge Saxonov

Reading
• GENSCAN
• EasyGene
• SLAM
• Twinscan
Optional: Chris Burge's Thesis

Gene expression
DNA:     CCTGAGCCAACTATTGATGAA
         (transcription)
RNA:     CCUGAGCCAACUAUUGAUGAA
         (translation)
Protein: PEPTIDE

Gene structure
[Figure: a gene with exon1, intron1, exon2, intron2, exon3, processed by transcription, splicing, and translation]
exon = protein-coding
intron = non-coding
Codon: A triplet of nucleotides that is converted to one amino acid

Where are the genes?
In humans:
~22,000 genes
~1.5% of human DNA

Finding Genes
1. Exploit the regular gene structure
   ATG—Exon1—Intron1—Exon2—…—ExonN—STOP
2. Recognize "coding bias"
   CAG-CGA-GAC-TAT-TTA-GAT-AAC-ACA-CAT-GAA-…
3. Recognize splice sites
   Intron—cAGt—Exon—gGTgag—Intron
4. Model the duration of regions
   Introns tend to be much longer than exons, in mammals
   Exons are biased to have a given minimum length
5. Use cross-species comparison
   Gene structure is conserved in mammals
   Exons are more similar (~85%) than introns

Approaches to gene finding
• Homology
  BLAST, Procrustes.
• Ab initio
  Genscan, Genie, GeneID.
• Hybrids
  GenomeScan, GenieEST, Twinscan, SGP, ROSETTA, CEM, TBLASTX, SLAM.

1. Exploit the regular gene structure
[Figure: gene layout from 5' to 3': start codon ATG, Exon 1, Intron 1, Exon 2, Intron 2, Exon 3, stop codon TAG/TGA/TAA, with splice sites marked]
[Figure labels: Next Exon: Frame 0; Next Exon: Frame 1]

2. Recognize "coding bias"
• Each exon can be in one of three frames
  ag—gattacagattacagattaca—gtaag  Frame 0
  ag—gattacagattacagattaca—gtaag  Frame 1
  ag—gattacagattacagattaca—gtaag  Frame 2
  Frame of next exon depends on how many nucleotides are left over from previous exon
• Codons "tag", "tga", and "taa" are STOP
  No STOP codon appears in-frame, until end of gene
  Absence of STOP is called open reading frame (ORF)
• Different codons appear with different frequencies: coding bias

2. Recognize "coding bias"
Amino Acid      SLC   DNA codons
Isoleucine      I     ATT, ATC, ATA
Leucine         L     CTT, CTC, CTA, CTG, TTA, TTG
Valine          V     GTT, GTC, GTA, GTG
Phenylalanine   F     TTT, TTC
Methionine      M     ATG
Cysteine        C     TGT, TGC
Alanine         A     GCT, GCC, GCA, GCG
Glycine         G     GGT, GGC, GGA, GGG
Proline         P     CCT, CCC, CCA, CCG
Threonine       T     ACT, ACC, ACA, ACG
Serine          S     TCT, TCC, TCA, TCG, AGT, AGC
Tyrosine        Y     TAT, TAC
Tryptophan      W     TGG
Glutamine       Q     CAA, CAG
Asparagine      N     AAT, AAC
Histidine       H     CAT, CAC
Glutamic acid   E     GAA, GAG
Aspartic acid   D     GAT, GAC
Lysine          K     AAA, AAG
Arginine        R     CGT, CGC, CGA, CGG, AGA, AGG
Stop codons     Stop  TAA, TAG, TGA
Can map 61 non-stop codons to frequencies & take log-odds ratios

atgtgaggtgagggtgagggtgagcaggtgcagatgcagttgcaggccggtgag

Biology of Splicing
(http://genes.mit.edu/chris/)

3. Recognize splice sites
(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)
Donor: 7.9 bits
Acceptor: 9.4 bits
(Stephens & Schneider, 1996)

Donor site (5' to 3')
Position   -8   …   -2   -1    0    1    2   …   17
A          26   …   60    9    0    1   54   …   21
C          26   …   15    5    0    1    2   …   27
G          25   …   12   78   99    0   41   …   27
T          23   …   13    8    1   98    3   …   25

3. Recognize splice sites
• WMM: weight matrix model = PSSM (Staden 1984)
• WAM: weight array model = 1st order Markov (Zhang & Marr 1993)
• MDD: maximal dependence decomposition (Burge & Karlin 1997)
  Decision-tree algorithm to take pairwise dependencies into account
  • For each position i, calculate S_i = \sum_{j \ne i} \chi^2(C_i, X_j)
  • Choose i* such that S_{i*} is maximal and partition into two subsets, until
    • No significant dependencies left, or
    • Not enough sequences in subset
  Train separate WMM models for each subset

[Figure: MDD decision tree]
All donor splice sites
  G5
    G5 G-1
      G5 G-1 A2
        G5 G-1 A2 U6
        G5 G-1 A2 not U6
      G5 G-1 not A2
    G5 not G-1
  not G5

4. Model the duration of regions

Hidden Markov Models for Gene Finding
[Figure: the sequence GTCAGATGAGCAAAGTAGACACTCCAGTAACGCGGTGAGTACATTAA annotated with intergene, exon, and intron regions, emitted by the Intergene State, First Exon State, and Intron State]

Duration HMM for Gene Finding
[Figure: a nucleotide sequence segmented into Exon1, Exon2, Exon3, with each segment's duration marked]

Duration Modeling
Introns: regular HMM states, geometric duration
Exons: special duration model
V_{E_{0,0}}(i) = \max_{d = 1 \ldots D} \{ \mathrm{Prob}[\mathrm{duration}(E_{0,0}) = d] \times a_{\mathrm{Intron}_0, E_{0,0}} \times \prod_{j = i-d+1}^{i} e_{E_{0,0}}(x_j) \}
where i is an admissible exon-ending state,
D is restricted by the longest ORF

GENSCAN: Chris Burge and Sam Karlin, 1997
Best performing de novo gene finder
HMM with duration modeling for Exon states

HMM-based Gene Finders
• GENSCAN (Burge 1997)
  Big jump in accuracy of de novo gene finding
  Currently, one of the best
  HMM with duration modeling for Exon states
• FGENESH (Solovyev 1997)
  Currently one of the best
• HMMgene (Krogh 1997)
• GENIE (Kulp 1996)
• GENMARK (Borodovsky & McIninch 1993)
• VEIL (Henderson, Salzberg, & Fasman 1997)

Better way to do it: negative binomial
• EasyGene: Prokaryotic gene-finder
  Larsen TS, Krogh A
• Negative binomial with n = 3

GENSCAN's hidden weapon
• C+G content is correlated with:
  Gene content (+)
  Mean exon length (+)
  Mean intron length (–)
• These quantities affect parameters of model
• Solution
  Train parameters of model in four different C+G content ranges!

Evaluation of Accuracy
(Slide by NF Samatova)
• Sensitivity (Sn)
  Fraction of exons (coding nucleotides) whose boundaries are predicted exactly (that are predicted as coding)
• Specificity (Sp)
  Fraction of the predicted exons (coding nucleotides) that are exactly correct (that are coding)
• Correlation Coefficient (CC)
  Combined measure of Sensitivity & Specificity
  Range: -1 (always wrong) to +1 (always right)
[Figure: actual vs. predicted coding regions, with stretches marked TP, FP, TN, FN]
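The nucleotide-level versions of the accuracy measures on the "Evaluation of Accuracy" slide can be sketched in a few lines. This is a minimal illustration, not the evaluation code used for GENSCAN: the function name `nucleotide_accuracy` and the `'C'`/`'N'` label encoding are assumptions made for the example, and the CC formula is the usual correlation over the TP/FP/TN/FN counts, which the slide names but does not spell out.

```python
import math


def nucleotide_accuracy(actual, predicted):
    """Compare two equal-length label strings ('C' = coding, 'N' = non-coding).

    The label encoding is hypothetical; any two-class annotation would do.
    """
    if len(actual) != len(predicted):
        raise ValueError("label strings must have the same length")

    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == "C" and p == "C":
            tp += 1          # coding, predicted coding
        elif a == "N" and p == "C":
            fp += 1          # non-coding, predicted coding
        elif a == "N" and p == "N":
            tn += 1          # non-coding, predicted non-coding
        elif a == "C" and p == "N":
            fn += 1          # coding, predicted non-coding
        else:
            raise ValueError("labels must be 'C' or 'N'")

    # Sensitivity: fraction of coding nucleotides predicted as coding.
    sn = tp / (tp + fn) if tp + fn else float("nan")
    # Specificity (as defined on the slide): fraction of predicted coding
    # nucleotides that really are coding.
    sp = tp / (tp + fp) if tp + fp else float("nan")
    # Correlation coefficient, assumed here to be the standard form
    # (TP*TN - FP*FN) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN)).
    denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fp * fn) / denom if denom else float("nan")

    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn, "Sn": sn, "Sp": sp, "CC": cc}


if __name__ == "__main__":
    print(nucleotide_accuracy("NNCCCCNNNN", "NCCCCNNNNN"))
```

Note that "specificity" here follows the slide's gene-finding definition, TP / (TP + FP), rather than the TN-based definition used in some other fields.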

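The open-reading-frame idea from the "coding bias" slide (no STOP codon appears in-frame until the end of the gene) can be illustrated with a scan over the three forward frames. This is a sketch under simplifying assumptions: only the forward strand is scanned, introns are ignored, and the helper name `longest_orf_per_frame` is made up for the example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_per_frame(dna):
    """Return {frame: (start, end)} for the longest stop-free codon run.

    Coordinates are 0-based and half-open: dna[start:end] contains only
    complete codons read in that frame, none of which is a STOP.
    """
    dna = dna.upper()
    best = {}
    for frame in range(3):
        best_span = (frame, frame)   # empty span to start with
        run_start = frame
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] in STOP_CODONS:
                # A STOP closes the current run; keep it if it is the longest.
                if pos - run_start > best_span[1] - best_span[0]:
                    best_span = (run_start, pos)
                run_start = pos + 3
            pos += 3
        # Close the final run at the end of the last complete codon.
        if pos - run_start > best_span[1] - best_span[0]:
            best_span = (run_start, pos)
        best[frame] = best_span
    return best


if __name__ == "__main__":
    for frame, (start, end) in longest_orf_per_frame("ATGAAATAGCCC").items():
        print(frame, start, end)
```

A real gene finder would combine such stop-free stretches with codon-frequency log-odds scores and splice-site signals instead of relying on ORF length alone.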

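The duration-modeling slides contrast the geometric length distribution of an ordinary HMM state with the negative binomial (n = 3) mentioned for EasyGene. The sketch below compares the two. It assumes the negative binomial arises as n identical self-looping states in series, which is one common construction and is not confirmed by the slides; the function names and the value of `p` are illustrative only.

```python
import math


def geometric_pmf(d, p):
    """P(duration = d), d >= 1, for one state that exits with probability p."""
    if d < 1:
        return 0.0
    return (1 - p) ** (d - 1) * p


def negative_binomial_pmf(d, n, p):
    """P(total duration = d), d >= n, for n such states visited in series."""
    if d < n:
        return 0.0
    return math.comb(d - 1, n - 1) * p ** n * (1 - p) ** (d - n)


def mode(pmf, max_d):
    """Most probable duration among 1..max_d (first one if there is a tie)."""
    return max(range(1, max_d + 1), key=pmf)


if __name__ == "__main__":
    p = 0.1  # illustrative exit probability, not a trained parameter
    print("geometric mode:", mode(lambda d: geometric_pmf(d, p), 500))
    print("negative binomial (n=3) mode:",
          mode(lambda d: negative_binomial_pmf(d, 3, p), 500))
```

The geometric distribution always puts the most mass on the shortest duration, whereas the negative binomial peaks away from 1, which is closer to the slide's remark that exons are biased to have a given minimum length.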