DOC PREVIEW
Stanford CS 262 - Gibbs Sampling

This preview shows page 1-2-3-21-22-23-43-44-45 out of 45 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 45 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Gibbs Sampling in Motif FindingGibbs SamplingSlide 3Slide 4Slide 5Slide 6Slide 7Advantages / DisadvantagesRepeats, and a Better Background ModelExample Application: Motifs in YeastProcessing of DataMotifs in Periodic ClustersMotifs in Non-periodic ClustersPhylogenetic Footprinting (Slides by Martin Tompa)Phylogenetic Footprinting (Tagle et al. 1988)Slide 16Slide 17Slide 18Slide 19An Exact Algorithm (generalizing Sankoff and Rousseau 1975)Slide 21Slide 22Slide 23Limits of Motif FindersSlide 25RNA Secondary StructureRNA and TranslationRNA and SplicingSlide 29Slide 30Slide 31Slide 32Modeling RNA Secondary Structure: Context-Free GrammarsA Context Free GrammarExample: modeling a stem loopSlide 36A parse tree: alignment of CFG to sequenceAlignment scores for parses!The Nussinov AlgorithmSlide 40Slide 41The Nussinov Algorithm and CFGsSlide 43Stochastic Context Free GrammarsExampleGibbs Sampling in Motif FindingGibbs Sampling in Motif FindingGibbs Sampling•Given: x1, …, xN, motif length K,background B,•Find:Model MLocations a1,…, aN in x1, …, xNMaximizing log-odds likelihood ratio: NiKkikaikaiixBxkM1 1)(),(logGibbs Sampling•AlignACE: first statistical motif finder•BioProspector: improved version of AlignACEAlgorithm (sketch):1. Initialization:a. Select random locations in sequences x1, …, xNb. Compute an initial model M from these locations2. Sampling Iterations:a. Remove one sequence xib. Recalculate modelc. Pick a new location of motif in xi according to probability the location is a motif occurrenceGibbs SamplingInitialization:•Select random locations a1,…, aN in x1, …, xN•For these locations, compute M:NikakjjxNMi1)(1•That is, Mkj is the number of occurrences of letter j in motif position k, over the totalGibbs SamplingPredictive Update:•Select a sequence x = xi•Remove xi, recompute model:))(()1(1,1NisskajkjjxBNMswhere j are pseudocounts to avoid 0s,and B = j j MGibbs SamplingSampling:For every K-long word xj,…,xj+k-1 in x:Qj = Prob[ word | motif ] = M(1,xj)…M(k,xj+k-1)Pi = Prob[ word | background ] B(xj)…B(xj+k-1)Let Sample a random new position ai according to the probabilities A1,…, A|x|-k+1.1||1//kxjjjjjjPQPQA0 |x|ProbGibbs SamplingRunning Gibbs Sampling:1. Initialize2. Run until convergence3. Repeat 1,2 several times, report common motifsAdvantages / Disadvantages•Very similar to EMAdvantages:•Easier to implement•Less dependent on initial parameters•More versatile, easier to enhance with heuristicsDisadvantages:•More dependent on all sequences to exhibit the motif•Less systematic search of initial parameter spaceRepeats, and a Better Background Model•Repeat DNA can be confused as motifEspecially low-complexity CACACA… AAAAA, etc.Solution: more elaborate background model0th order: B = { pA, pC, pG, pT }1st order: B = { P(A|A), P(A|C), …, P(T|T) }…Kth order: B = { P(X | b1…bK); X, bi{A,C,G,T} }Has been applied to EM and Gibbs (up to 3rd order)Example Application: Motifs in YeastGroup:Tavazoie et al. 1999, G. Church’s lab, HarvardData:•Microarrays on 6,220 mRNAs from yeast Affymetrix chips (Cho et al.)•15 time points across two cell cyclesProcessing of Data1. Selection of 3,000 genesGenes with most variable expression were selected2. Clustering according to common expression•K-means clustering•30 clusters, 50-190 genes/cluster•Clusters correlate well with known function3. AlignACE motif finding •600-long upstream regions•50 regions/trialMotifs in Periodic ClustersMotifs in Non-periodic ClustersPhylogenetic FootprintingPhylogenetic Footprinting(Slides by Martin Tompa)(Slides by Martin Tompa)Phylogenetic Footprinting(Tagle et al. 1988)•Functional sequences evolve slower than nonfunctional ones•Consider a set of orthologous sequences from different species•Identify unusually well conserved regionsSubstring Parsimony ProblemGiven:•phylogenetic tree T,•set of orthologous sequences at leaves of T,•length k of motif•threshold dProblem:•Find each set S of k-mers, one k-mer from each leaf, such that the “parsimony” score of S in T is at most d.This problem is NP-hard.Small ExampleAGTCGTACGTGAC... (Human)AGTAGACGTGCCG... (Chimp)ACGTGAGATACGT... (Rabbit)GAACGGAGTACGT... (Mouse)TCGTGACGGTGAT... (Rat)Size of motif sought: k = 4SolutionParsimony score: 1 mutationAGTCGTACGTGAC...AGTAGACGTGCCG...ACGTGAGATACGT...GAACGGAGTACGT...TCGTGACGGTGAT...ACGGACGTACGTACGTCLUSTALW multiple sequence alignment (rbcS gene)Cotton ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATTPea GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACATobacco TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACCIce-plant TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACCTurnip ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGCWheat TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAADuckweed TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAALarch TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACCCotton CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----APea C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------ATobacco AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGAIce-plant ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAATurnip CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------AWheat GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------Duckweed ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATTLarch TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCACotton ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTAPea


View Full Document

Stanford CS 262 - Gibbs Sampling

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Gibbs Sampling
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Gibbs Sampling and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Gibbs Sampling 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?