CMU BSC 03510 - Lecture

Slide titles:
  Computational Biology, Part 9: Genefinding
  Clues to locations of genes (Prokaryotic Signals)
  Prokaryotic vs. Eukaryotic Genefinding
  Clues to locations of genes (Eukaryotic Signals)
  Clues to locations of genes (Eukaryotic Signals)
  DNA->RNA->protein
  PowerPoint Presentation
  Ribosomal Binding Site
  Splicing Signals
  Donor and Acceptor Sites: GT and AG dinucleotides
  Donor and Acceptor Sites: Motif Logos
  Signal Sensors: Consensus Sequences
  Signal Sensors: Networks
  Content Sensors: Coding Regions
  Integrated Systems
  HMM approaches to Genefinding
  Gene model
  Basic Implementation
  GENSCAN HMM
  Slide 20
  Adding Homology
  Adding ESTs
  Drawbacks
  Assessing Performance
  Machine Learning 101
  Slide 26
  Slide 27
  Performance Measures
  Results for 7 programs
  Nucleotide accuracy
  Exon accuracy

Computational Biology, Part 9: Genefinding
  Robert F. Murphy
  Copyright © 1997, 2001, 2003-2008. All rights reserved.

Clues to locations of genes (Prokaryotic Signals)
  for Transcription
    - Promoters
    - Transcription factor binding sites
  for Translation
    - Ribosome binding sites
    - Start/stop codons
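
The translation signals listed above (start and stop codons) are the easiest ones to scan for directly. The following is a minimal sketch, not the lecture's own program: the function name `find_orfs`, the `min_codons` parameter, the restriction to the forward strand, and the choice of ATG as the only start codon are all assumptions made for illustration.

```python
# Hypothetical helper: scan the forward strand for open reading frames (ORFs).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs for ATG...stop ORFs in the three forward frames.

    start is the index of the A in ATG; end is one past the last base of the
    stop codon. min_codons counts the start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the first ATG seen since the last stop codon
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence invented for this example, not real data.
print(find_orfs("CCATGAAATTTTAGGG"))
```

A real prokaryotic genefinder would also scan the reverse strand and combine ORFs with promoter and ribosome binding site evidence; this sketch covers only the start/stop codon signal.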
Eukaryotic GenefindingSimple to create programs to look for Simple to create programs to look for “grammatical” combination of prokaryotic “grammatical” combination of prokaryotic signalssignalsMuch more complicated for eukaryotes due Much more complicated for eukaryotes due to the presence of introns and additional to the presence of introns and additional regulatory elementsregulatory elementsClues to locations of genes (Eukarytoic Signals)Clues to locations of genes (Eukarytoic Signals)for Transcriptionfor TranscriptionPromotersPromotersTranscription terminatorsTranscription terminatorsTopoisomerase II binding sitesTopoisomerase II binding sitesTopoisomerase I cleavage sitesTopoisomerase I cleavage sitesTranscription factor binding sitesTranscription factor binding sitesfor Splicingfor SplicingDonor and acceptor sitesDonor and acceptor sitesBranch pointsBranch pointsClues to locations of genes(Eukaryotic Signals)Clues to locations of genes(Eukaryotic Signals)for mRNA Processingfor mRNA ProcessingPolyadenylation sitesPolyadenylation sitesfor Translationfor TranslationRibosome binding sitesRibosome binding sitesStart/stop codonsStart/stop codonsDNA->RNA->proteinDNA->RNA->proteinRibosomal Binding SiteRibosomal Binding SiteSplicing SignalsSplicing SignalsTry to recognize location of splicing signals at exon-Try to recognize location of splicing signals at exon-intron junctionsintron junctionsThis has yielded a weakly conserved donor splice This has yielded a weakly conserved donor splice site and acceptor splice sitesite and acceptor splice siteProfiles for sites are still weak, and lends the problem Profiles for sites are still weak, and lends the problem to the Hidden Markov Model (HMM) approaches, to the Hidden Markov Model (HMM) approaches, which capture the statistical dependencies between which capture the statistical dependencies between sitessitesDonor and Acceptor Sites: GT and AG dinucleotidesDonor and Acceptor Sites: GT and AG 
dinucleotidesThe beginning and end of exons are signaled by donor and The beginning and end of exons are signaled by donor and acceptor sites that usually have GT and AC dinucleotidesacceptor sites that usually have GT and AC dinucleotidesDetecting these sites is difficult, because GT and AC appear Detecting these sites is difficult, because GT and AC appear very oftenvery oftenexon 1 exon 2GT ACAcceptorSiteDonorSite(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)Donor: 7.9 bitsAcceptor: 9.4 bits(Stephens & Schneider, 1996)Donor and Acceptor Sites: Motif LogosDonor and Acceptor Sites: Motif LogosSignal Sensors:Consensus SequencesSignal Sensors:Consensus SequencesSimpleSimpleTATATATAPROSITE expressionPROSITE expressionY-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-Y-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-[DENQ]-V-[GA]-[FYW][DENQ]-V-[GA]-[FYW](iron binding site in transferrin)(iron binding site in transferrin)Signal Sensors:NetworksSignal Sensors:NetworksProfile, PSSM (equivalent to perceptron)Profile, PSSM (equivalent to perceptron)Neural Network (multi-layer)Neural Network (multi-layer)Content Sensors:Coding RegionsContent Sensors:Coding RegionsGeneMark: 3 fifth-order Markov modelsGeneMark: 3 fifth-order Markov modelsone for each reading frameone for each reading frameGRAIL: uses neural net with inputs fromGRAIL: uses neural net with inputs fromcoding potential measurescoding potential measuresbase compositionbase compositionsignal sensor output for flanking splice sitessignal sensor output for flanking splice sitesIntegrated SystemsIntegrated SystemsUse dynamic programming to find best Use dynamic programming to find best combination of signal/content sensorscombination of signal/content sensorsApply “linguistic” rules to say what parts Apply “linguistic” rules to say what parts are required and in what orderare required and in what orderHMM approaches to GenefindingHMM approaches to GenefindingGene modelGene modelB = gene startB = gene startS = translation 
startS = translation startD = donorD = donorA = accceptorA = accceptorT = translation stopT = translation stopE = gene endE = gene endBasic ImplementationBasic ImplementationUse an HMM to model what state Q each Use an HMM to model what state Q each nucleotide from X is in (given parameters nucleotide from X is in (given parameters Train HMM with known genes to estimate Train HMM with known genes to estimate For unknown sequence, find Q to maximize For unknown sequence, find Q to maximize P(Q | X, P(Q | X, Used by GENSCAN, HMMgeneUsed by GENSCAN, HMMgeneGENSCAN HMMGENSCAN HMMHandles genes Handles genes on both on both forward and forward and reverse reverse strandsstrandsAdding HomologyAdding HomologyCan try to include information from Can try to include information from databases of known proteins to help decide databases of known proteins to help decide whether an exon is codingwhether an exon is codingFor each candidate exon, increase the score For each candidate exon, increase the score if there is
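
The decoding step described under Basic Implementation (find the state path Q that maximizes P(Q | X, parameters)) is commonly done with the Viterbi algorithm. The sketch below is a toy illustration, not GENSCAN's model: the two states (N for noncoding, C for coding), all probabilities, and the helper names `viterbi` and `logs` are assumptions invented for the example, and the parameters are fixed by hand rather than trained on known genes.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable state path for obs under a fixed HMM (log space)."""
    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = best[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in probs.items()}


# Toy parameters invented for this example: N is AT-rich, C is GC-rich.
states = ["N", "C"]
log_start = logs({"N": 0.5, "C": 0.5})
log_trans = {"N": logs({"N": 0.9, "C": 0.1}), "C": logs({"N": 0.1, "C": 0.9})}
log_emit = {
    "N": logs({"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}),
    "C": logs({"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}),
}

path = "".join(viterbi("ATATGCGCGCATAT", states, log_start, log_trans, log_emit))
print(path)
```

A gene model with the B, S, D, A, T and E signals would use many more states and trained parameters, but the decoding recurrence has the same form.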

