DOC PREVIEW
Stanford CS 262 - Finding Regulatory Motifs

This preview shows page 1-2-3-23-24-25-26-46-47-48 out of 48 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 48 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Finding Regulatory MotifsRegulatory Motif DiscoveryCharacteristics of Regulatory MotifsSequence LogosProblem DefinitionDiscrete Approaches to Motif FindingDiscrete FormulationsExhaustive SearchesSlide 9MULTIPROFILERSlide 11Slide 12CONSENSUSSlide 14Slide 15Expectation Maximization in Motif FindingExpectation MaximizationSlide 18Slide 19Slide 20Slide 21Expectation Maximization: E-stepExpectation Maximization: M-stepSlide 24Initial Parameters Matter!Overview of EM AlgorithmGibbs Sampling in Motif FindingGibbs SamplingSlide 29Slide 30Slide 31Slide 32Slide 33Advantages / DisadvantagesRepeats, and a Better Background ModelLimits of Motif FindersExample Application: Motifs in YeastMotifs in Periodic ClustersMotifs in Non-periodic ClustersSlide 40Comparison-based Regulatory Motif DiscoveryKnown motifs are frequently conservedFinding conserved motifs in whole genomes M. Kellis PhD Thesis on yeasts, X. Xie & M. Kellis on mammalsTest 1: Intergenic conservationTest 2: Intergenic vs. CodingTest 3: Upstream vs. DownstreamConstructing full motifsSummary for promoter motifsCS262 Lecture 18, Win06, BatzoglouFinding Regulatory MotifsCS262 Lecture 18, Win06, BatzoglouRegulatory Motif DiscoveryDNAGroup of co-regulated genesCommon subsequenceFind motifs within groups of corregulated genes slide credits: M. KellisCS262 Lecture 18, Win06, BatzoglouCharacteristics of Regulatory Motifs•Tiny•Highly Variable•~Constant SizeBecause a constant-size transcription factor binds•Often repeated•Low-complexity-ishCS262 Lecture 18, Win06, BatzoglouSequence LogosCS262 Lecture 18, Win06, BatzoglouProblem DefinitionProbabilisticMotif: Mij; 1  i  W1  j  4Mij = Prob[ letter j, pos i ]Find best M, and positions p1,…, pN in sequencesCombinatorialMotif M: m1…mWSome of the mi’s blankFind M that occurs in all si with  k differencesGiven a collection of promoter sequences s1,…, sN of genes with common expressionCS262 Lecture 18, Win06, BatzoglouDiscrete Approaches to Motif FindingCS262 Lecture 18, Win06, BatzoglouDiscrete FormulationsGiven sequences S = {x1, …, xn}•A motif W is a consensus string w1…wK•Find motif W* with “best” match to x1, …, xnDefinition of “best”:d(W, xi) = min hamming dist. between W and any word in xid(W, S) = i d(W, xi)CS262 Lecture 18, Win06, BatzoglouExhaustive Searches1. Pattern-driven algorithm:For W = AA…A to TT…T (4K possibilities)Find d( W, S )Report W* = argmin( d(W, S) )Running time: O( K N 4K )(where N = i |xi|)Advantage: Finds provably “best” motif WDisadvantage: TimeCS262 Lecture 18, Win06, BatzoglouExhaustive Searches2. Sample-driven algorithm:For W = any K-long word occurring in some xiFind d( W, S )Report W* = argmin( d( W, S ) )or, Report a local improvement of W*Running time: O( K N2 )Advantage: TimeDisadvantage: If the true motif is weak and does not occur in datathen a random motif may score better than any instance of true motifCS262 Lecture 18, Win06, BatzoglouMULTIPROFILER•Extended sample-driven approachGiven a K-long word W, define: Nα(W) = words W’ in S s.t. d(W,W’)  αIdea: Assume W is occurrence of true motif W*Will use Nα(W) to correct “errors” in WCS262 Lecture 18, Win06, BatzoglouMULTIPROFILERAssume W differs from true motif W* in at most L positionsDefine: A wordlet G of W is a L-long pattern with blanks, differing from WL is smaller than the word length KExample: K = 7; L = 3W = ACGTTGAG = --A--CGCS262 Lecture 18, Win06, BatzoglouMULTIPROFILERAlgorithm:For each W in S:For L = 1 to Lmax1. Find the α-neighbors of W in S  Nα(W)2. Find all “strong” L-long wordlets G in Na(W)3. For each wordlet G,1. Modify W by the wordlet G  W’2. Compute d(W’, S)Report W* = argmin d(W’, S)Step 2 above: Smaller motif-finding problem; Use exhaustive searchCS262 Lecture 18, Win06, BatzoglouCONSENSUSAlgorithm:Cycle 1:For each word W in S (of fixed length!)For each word W’ in SCreate alignment (gap free) of W, W’Keep the C1 best alignments, A1, …, AC1ACGGTTG , CGAACTT , GGGCTCT …ACGCCTG , AGAACTA , GGGGTGT …CS262 Lecture 18, Win06, BatzoglouCONSENSUSAlgorithm:Cycle t:For each word W in SFor each alignment Aj from cycle t-1Create alignment (gap free) of W, AjKeep the Cl best alignments A1, …, ACtACGGTTG , CGAACTT , GGGCTCT …ACGCCTG , AGAACTA , GGGGTGT …… … …ACGGCTC , AGATCTT , GGCGTCT …CS262 Lecture 18, Win06, BatzoglouCONSENSUS•C1, …, Cn are user-defined heuristic constantsN is sum of sequence lengthsn is the number of sequencesRunning time: O(N2) + O(N C1) + O(N C2) + … + O(N Cn)= O( N2 + NCtotal)Where Ctotal = i Ci, typically O(nC), where C is a big constantCS262 Lecture 18, Win06, BatzoglouExpectation Maximization in Motif FindingExpectation Maximization in Motif FindingCS262 Lecture 18, Win06, BatzoglouAll K-long wordsmotifbackgroundExpectation MaximizationAlgorithm (sketch):1. Given genomic sequences find all k-long words2. Assume each word is motif or background3. Find likeliest Motif ModelBackground Modelclassification of words into either Motif or BackgroundCS262 Lecture 18, Win06, BatzoglouExpectation MaximizationGiven sequences x1, …, xN,•Find all k-long words X1,…, Xn•Define motif model: M = (M1,…, MK)Mi = (Mi1,…, Mi4)(assume {A, C, G, T})where Mij = Prob[ letter j occurs in motif position i ]•Define background model:B = B1, …, B4Bi = Prob[ letter j in background sequence ]motifbackgroundACGTM1MKM1BCS262 Lecture 18, Win06, BatzoglouExpectation Maximization•Define Zi1 = { 1, if Xi is motif; 0, otherwise }Zi2 = { 0, if Xi is motif; 1, otherwise }•Given a word Xi = x[s]…x[s+k],P[ Xi, Zi1=1 ] =  M1x[s]…Mkx[s+k] P[ Xi, Zi2=1 ] = (1 – ) Bx[s]…Bx[s+k]Let 1 = ; 2 = (1 – )motifbackgroundACGTM1MKM1B 1 – CS262 Lecture 18, Win06, BatzoglouExpectation MaximizationDefine:Parameter space  = (M, B) 1: Motif; 2: BackgroundObjective:Maximize log likelihood of model:   2121111211log)|(log))|(log(),|,...(logj jjijnijiijnini jjijijnZZZXPXPZXXPACGTM1MKM1B 1 – CS262 Lecture 18, Win06, BatzoglouExpectation Maximization•Maximize expected likelihood, in iteration of two steps:Expectation:Find expected value of log likelihood:Maximization:Maximize expected value over , )],|,...([log1ZXXPEnCS262 Lecture 18, Win06, BatzoglouExpectation:Find expected value of log likelihood: 


View Full Document

Stanford CS 262 - Finding Regulatory Motifs

Documents in this Course
Lecture 8

Lecture 8

38 pages

Lecture 7

Lecture 7

27 pages

Lecture 4

Lecture 4

12 pages

Lecture 1

Lecture 1

11 pages

Biology

Biology

54 pages

Lecture 7

Lecture 7

45 pages

Load more
Download Finding Regulatory Motifs
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Finding Regulatory Motifs and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Finding Regulatory Motifs 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?