DOC PREVIEW
U of I CS 498 - Motif finding with Gibbs sampling

This preview shows page 1-2-3-25-26-27 out of 27 pages.

Save
View full document
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
View full document
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience
Premium Document
Do you want full access? Go Premium and unlock all 27 pages.
Access to all documents
Download any document
Ad free experience

Unformatted text preview:

Motif finding with Gibbs samplingRegulatory networksDecoding the regulatory networkTranscriptional regulationSlide 5A motif modelMotifs: Matrices and ConsensusPosition weight matricesInformation ContentSlide 10Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple AlignmentMotif Finding ProblemGenerative ModelNotationsProbability of data given modelScoring FunctionOptimization and SamplingMarkov Chain samplingGibbs sampling to maximize FAlgorithmSlide 21Slide 22Slide 23Slide 24Slide 25Slide 26Local optimaMotif finding with Gibbs samplingCS 498 SSRegulatory networks•Genes are switches, transcription factors are (one type of) input signals, proteins are outputs•Proteins (outputs) may be transcription factors and hence become signals for other genes (switches)•This may be the reason why humans have so few genes (the circuit, not the number of switches, carries the complexity)Decoding the regulatory network•Find patterns (“motifs”) in DNA sequence that occur more often than expected by chance–These are likely to be binding sites for transcription factors–Knowing these can tell us if a gene is regulated by a transcription factor (i.e., the “switch”)GENEACAGTGATRANSCRIPTIONFACTORPROTEINTranscriptional regulationGENEACAGTGATRANSCRIPTIONFACTORPROTEINTranscriptional regulationA motif model•To define a motif, lets say we know where the motif starts in the sequence•The motif start positions in their sequences can be represented as s = (s1,s2,s3,…,st)Genes regulatedby same transcription factorMotifs: Matrices and Consensus a G g t a c T t C c A t a c g tAlignment a c g t T A g t a c g t C c A t C c g t a c g G _________________ A 3 0 1 0 3 1 1 0Matrix C 2 4 0 0 1 4 0 0 G 0 1 4 0 0 0 3 1 T 0 0 0 5 1 0 1 4 _________________Consensus A C G T A C G T•Line up the patterns by their start indexes s = (s1, s2, …, st)•Construct “position weight matrix” with frequencies of each nucleotide in columns•Consensus nucleotide in each position has the highest frequency in columnPosition weight matrices•Suppose there were t sequences to begin with•Consider a column of a position weight matrix•The column may be (t, 0, 0, 0)–A perfectly conserved column •The column may be (t/4, t/4, t/4, t/4)–A completely uniform column•“Good” profile matrices should have more conserved columnsInformation Content•In a PWM, convert frequencies to probabilities•PWM W: Wk = frequency of base  at position k•q = frequency of base  by chance•Information content of W:€ WβklogWβkqββ ∈{A,C,G,T }∑k∑Information Content•If Wk is always equal to q, i.e., if W is similar to random sequence, information content of W is 0.•If W is different from q, information content is high.Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple AlignmentLawrence et al. 1993Motif Finding Problem•Given a set of sequences, find the motif shared by all or most sequences, while its starting position in each sequence is unknown•Assumption:–Each motif appears exactly once in one sequence–The motif has fixed lengthGenerative Model•Suppose the sequences are aligned, the aligned regions are generated from a motif model •Motif model is a PWM. A PWM is a position-specific multinomial distribution. –For each position i, a multinomial distribution on (A,C,G,T): qiA,qiC,qiG,qiT•The unaligned regions are generated from a background model: pA,pC,pG,pTNotations•Set of symbols: •Sequences: S = {S1, S2, …, SN}•Starting positions of motifs: A = {a1, a2, …, aN}•Motif model ( ) : qij = P(symbol at the i-th position = j)•Background model: pj = P(symbol = j)•Count of symbols in each column: cij= count of symbol, j, in the i-th column in the aligned regionΣθProbability of data given model∏ ∏=Σ==Wi jcijijqASP1||1),|( θ∏ ∏=Σ==Wi jcjijpASP1||10),|( θScoring Function•Maximize the log-odds ratio:•Is greater than zero if the data is a better match to the motif model than to the background model∏ ∏=Σ==Wi jcijijqASP1||1),|( θ∏ ∏=Σ==Wi jcjijpASP1||10),|( θ∑ ∑=Σ===Wi jjijijpqcASPASPF1||10log),|(),|(logθθOptimization and Sampling•To maximize a function, f(x):–Brute force method: try all possible x–Sample method: sample x from probability distribution: p(x) ~ f(x)–Idea: suppose xmax is argmax of f(x), then it is also argmax of p(x), thus we have a high probability of selecting xmaxMarkov Chain sampling•To sample from a probability distribution p(x), we set up a Markov chain s.t. each state represents a value of x and for any two states, x and y, the transitional probabilities satisfy:)()(1lim xpxCNN=∞→•This would then imply that if the Markov chain is “run” for “long enough”, the probability thereafter of being in state x will be p(x) € p(x)Pr(x → y) = p(y)Pr(y → x)Gibbs sampling to maximize F•Gibbs sampling is a special type of Markov chain sampling algorithm•Our goal is to find the optimal A = (a1,…aN)•The Markov chain we construct will only have transitions from A to alignments A’ that differ from A in only one of the ai •In round-robin order, pick one of the ai to replace•Consider all A’ formed by replacing ai with some other starting position ai’ in sequence Si•Move to one of these A’ probabilistically•Iterate the last three stepsAlgorithmRandomly initialize A0;Repeat:(1) randomly choose a sequence z from S;A* = At \ az; compute θt from A*;(2) sample az according to P(az = x), which is proportional to Qx/Px; update At+1 = A*  x;Select At that maximizes F;Qx: the probability of generating x according to θt; Px: the probability of generating x according to the background model€ qij=cijcikk∑AlgorithmCurrent solution AtAlgorithmChoose one az to replaceAlgorithmFor each candidate sitex in sequence z, calculate Qx and Px:Probabilities of samplingx from motif model andbackground model resp.xAlgorithmAmong all possible candidates, choose one(say x) with probabilityproportional to Qx/PxxAlgorithmSet At+1 = A*  xxAlgorithmRepeatxLocal optima•The algorithm may not find the “global” or true maximum of the scoring function•Once “At” contains many similar substrings, others matching these will be chosen with higher probability•Algorithm will “get locked” into a “local optimum” –all neighbors have poorer scores, hence low chance of moving out of this


View Full Document

U of I CS 498 - Motif finding with Gibbs sampling

Documents in this Course
Lecture 5

Lecture 5

13 pages

LECTURE

LECTURE

39 pages

Assurance

Assurance

44 pages

LECTURE

LECTURE

36 pages

Pthreads

Pthreads

29 pages

Load more
Download Motif finding with Gibbs sampling
Our administrator received your request to download this document. We will send you the file to your email shortly.
Loading Unlocking...
Login

Join to view Motif finding with Gibbs sampling and access 3M+ class-specific study document.

or
We will never post anything without your permission.
Don't have an account?
Sign Up

Join to view Motif finding with Gibbs sampling 2 2 and access 3M+ class-specific study document.

or

By creating an account you agree to our Privacy Policy and Terms Of Use

Already a member?