Stanford CS 262 - Lecture 19
Motif Finding

Expectation Maximization in Motif Finding

Expectation Maximization was first applied to motif finding in 1994, in the MEME algorithm of Bailey and Elkan. We are given a set of promoters that we believe contain some common motif. A motif is an approximate word in a probabilistic sense: at a given position, some letter or set of letters is more likely than the others. A word of length 10 over {A, C, G, T} carries 20 bits of information (2 bits per position). If we already know that a given position is an A or a T, we already have half of the information at that position. A typical motif contains approximately 10 bits of information, i.e. each position effectively has one or two likely letters.

The promoter sequences can be thought of as a mixture of motif words (words bound by a transcription factor) and non-motif, background words. To every partition of the promoters into motif words and non-motif words we assign a likelihood: how closely do the words resemble motif words or background words? We then try to find the likeliest motif and background models together with the partition into these models. Doing this exactly is intractable, so we use Expectation Maximization to find a local maximum of this likelihood function.

The motif and background models used by MEME form a PSSM (position-specific scoring matrix) model:

[Figure: a 4 x k motif matrix with columns M1, ..., Mk over the letters A, C, G, T, next to a single background distribution B; a word is generated by the motif with probability λ and by the background with probability 1 - λ.]

The background distribution is a probability distribution over A, C, G and T: the frequency of each letter within a promoter/genome in this particular species. The motif is a matrix with k columns, corresponding to the motif length k; each column is an independent probability distribution over A, C, G and T.

Notation:

Given N sequences x1, ..., xN:
- In each sequence, find all k-long words X1, ..., Xn, and put all of these words into one bag, so that we no longer know which promoter or sequence each word came from. From now on we work in word space.
- Define the motif model as M = (M1, ..., Mk), where Mi = (Mi1, ..., Mi4) (assuming the alphabet {A, C, G, T}) and Mij = Prob[letter j occurs in the motif at position i].
- Define the background model as B = (B1, ..., B4), where Bi = Prob[letter i occurs in the background sequence].
- Indicator variables: Zi1 = 1 if Xi is a motif occurrence and 0 otherwise; Zi2 = 1 if Xi is a background word and 0 otherwise (so Zi2 = 1 - Zi1).
- λ is the prior probability of a word being a motif occurrence; 1 - λ is the prior of a word being background.

Given the ith word Xi = x[s] ... x[s+k-1], the probability of this word together with the event that it is a motif occurrence is

P[Xi, Zi1 = 1] = λ · M1,x[s] · ... · Mk,x[s+k-1]

Similarly, if this word is a background word,

P[Xi, Zi2 = 1] = (1 - λ) · Bx[s] · ... · Bx[s+k-1]

From now on we write λ1 = λ and λ2 = 1 - λ.

We are looking for the best motif as well as for the words in the promoters that are instances of it. The parameter space θ for the motif and background models breaks down into θ1 for the motif and θ2 for the background. These are the parameters we would like to train: we want to maximize the joint probability of all of the words X1, ..., Xn and their assignments Z to motif or background, given the parameters θ and λ.
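As a concrete illustration of these two per-word probabilities, here is a minimal Python sketch. The function name joint_probs, the toy 3-column motif matrix, the uniform background and the value of λ are my own illustrative choices, not numbers from the lecture.

M = [{'A': 0.7, 'C': 0.1, 'G': 0.1, 'T': 0.1},   # motif matrix: M[i][c] = Prob[letter c at motif position i]
     {'A': 0.1, 'C': 0.7, 'G': 0.1, 'T': 0.1},
     {'A': 0.1, 'C': 0.1, 'G': 0.7, 'T': 0.1}]
B = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}  # background distribution over A, C, G, T
lam = 0.1                                         # prior probability that a word is a motif occurrence

def joint_probs(word, M, B, lam):
    """Return (P[X, Z1 = 1], P[X, Z2 = 1]) for one k-long word X."""
    p_motif = lam                 # builds lambda1 * M_{1,x[s]} * ... * M_{k,x[s+k-1]}
    p_background = 1.0 - lam      # builds lambda2 * B_{x[s]} * ... * B_{x[s+k-1]}
    for i, letter in enumerate(word):
        p_motif *= M[i][letter]
        p_background *= B[letter]
    return p_motif, p_background

print(joint_probs("ACG", M, B, lam))   # approx. (0.0343, 0.0141)

These two quantities are exactly what the expectation step below compares when deciding how strongly each word should be assigned to the motif.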
The log likelihood of this model is

log P(X, Z | θ, λ) = Σi Σj Zij [ log λj + log P(Xi | θj) ]
                   = Σi Σj Zij log λj + Σi Σj Zij log P(Xi | θj)

where the sum runs over all words i = 1, ..., n and, within each word, over j = 1, 2, i.e. over whether the word is a motif or a background word (the indicator variable Z). Each indicator variable is multiplied by the log of the prior of being in that state plus the log probability of the word under the corresponding model θj. The second line shows that the priors can be broken away from the log probabilities of the motif and background, so the two terms can be maximized independently once the assignments are given.

Expectation:

We now take the expected value of the log likelihood and maximize it over θ and λ. The expected value of the log likelihood is

E[ log P(X, Z | θ, λ) ] = Σi Σj E[Zij] ( log λj + log P(Xi | θj) )

that is, the expectation of each assignment multiplied by the log probability under the appropriate model. The expected value of Z, written Z*, is computed as

Z*i1 = E[Zi1] = λ1 P(Xi | θ1) / ( λ1 P(Xi | θ1) + λ2 P(Xi | θ2) )

and Z*i2 = 1 - Z*i1. In words: the probability that a word is a motif (or background) word is the prior of that state multiplied by the probability of the word under that model, divided by the sum of that quantity and the corresponding quantity for the other state.

Maximization:

The new prior of being a motif is the sum of the expected Z's divided by the number of words, i.e. the expected fraction of motif occurrences among our words:

λ1 ← (1/n) Σi Z*i1

Similarly, for the parameters θ = (M, B) we define expected letter counts:

cjk = E[ number of times letter k appears at motif position j ]
c0k = E[ number of times letter k appears in the background ]

The cjk values are calculated easily from the Z* values: each word contributes its letters to the motif counts weighted by Z*i1 and to the background counts weighted by Z*i2. The new probabilities of the kth letter at the jth motif position and in the background are then

Mjk ← cjk / Σk' cjk'        Bk ← c0k / Σk' c0k'

It is important not to have any 0's in these probabilities. If the dataset is small, add pseudocounts (a small artificial value that every count starts from) so that no probability becomes exactly 0 and no division by 0 occurs.

Setting Initial Parameters:

EM is a local optimization, not a global one: it will not necessarily find the partition that gives the globally highest likelihood. For this reason you may want to run the maximization many times from different initial values.

This is illustrated by the following example. Let the bag contain n = 2000 6-mers X1, ..., Xn:

990 words "AAAAAA"
990 words "CCCCCC"
20 words "ACACAC"

Some local maxima:

λ = 49.5%; B = 100/101 C, 1/101 A; M = 100% AAAAAA
λ = 1%; B = 50% C, 50% A; M = 100% ACACAC

Different starting values of λ and of the letter probabilities may converge to different local maxima and find vastly different motifs, so it is good to do many random restarts. This can be done by trying different values of λ between N^(-1/2) and 1/(2k) and repeating the expectation and maximization steps for each of them until convergence, then reporting the best several sets of parameters to the user. This program is still used heavily by biologists; it is quite powerful for being so simple.
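Continuing the sketch above (it reuses joint_probs and the toy M, B and lam), here is a minimal Python sketch of one expectation step and one maximization step over the bag of words. The function name em_iteration, the pseudocount of 1.0 and the toy bag of 3-mers are my own illustrative choices, not details taken from MEME itself.

def em_iteration(words, M, B, lam, pseudocount=1.0):
    """One EM pass over a bag of k-long words, using joint_probs() from above.
    E-step: Z*_i1 = expected probability that word i is a motif occurrence.
    M-step: re-estimate lambda, the motif matrix M and the background B from
            expected letter counts, with pseudocounts so no entry becomes 0."""
    k = len(M)
    alphabet = "ACGT"

    # E-step: Z*_i1 = lam*P(Xi | motif) / ( lam*P(Xi | motif) + (1 - lam)*P(Xi | background) )
    z = []
    for w in words:
        p_motif, p_background = joint_probs(w, M, B, lam)
        z.append(p_motif / (p_motif + p_background))

    # M-step: the new prior is the expected fraction of motif words
    new_lam = sum(z) / len(words)

    # Expected letter counts c_jk (motif position j) and c_0k (background),
    # each starting from a pseudocount
    c_motif = [{a: pseudocount for a in alphabet} for _ in range(k)]
    c_bg = {a: pseudocount for a in alphabet}
    for w, zi in zip(words, z):
        for j, letter in enumerate(w):
            c_motif[j][letter] += zi        # weighted by E[Zi1]
            c_bg[letter] += 1.0 - zi        # weighted by E[Zi2]

    # Normalize the expected counts into probability distributions
    new_M = [{a: col[a] / sum(col.values()) for a in alphabet} for col in c_motif]
    bg_total = sum(c_bg.values())
    new_B = {a: c_bg[a] / bg_total for a in alphabet}
    return new_M, new_B, new_lam

# One start, iterated to (approximate) convergence; in practice this is repeated
# from many initial values of lambda and M, and the best runs are reported.
words = ["AAA", "CCC", "ACA", "AAA"]   # toy bag of 3-mers, for illustration only
for _ in range(50):
    M, B, lam = em_iteration(words, M, B, lam)
print(lam, M[0])

Because every expected count starts from the pseudocount, no entry of the new M or B can become exactly 0, which is the guard against zero probabilities described above; wrapping this loop in many random restarts over λ and the starting matrix corresponds to the initialization strategy the lecture recommends.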


Gibbs Sampling in Motif Finding

Gibbs sampling is another popular method used in motif finding. AlignACE was the first statistical motif finder to use Gibbs sampling, and BioProspector (developed at Stanford by Shirley Liu) is an improved version of AlignACE. The basic algorithm works as follows: