Stanford CS 262 - Lecture 16 Gene Regulation and Microarrays


Gene Regulation and Microarrays

Finding Regulatory Motifs
- Given a collection of genes with common expression, find the TF-binding motif they have in common.

Characteristics of Regulatory Motifs
- Tiny
- Highly variable
- Constant size, because a constant-size transcription factor binds
- Often repeated
- Low-complexity-ish

Sequence Logos
- Information at position i: $H(i) = -\sum_{\text{letter } x} \mathrm{freq}(x, i) \log_2 \mathrm{freq}(x, i)$
- Height of letter x at position i: $L(x, i) = \mathrm{freq}(x, i)\,(2 - H(i))$
- Examples:
  - $\mathrm{freq}(A, i) = 1$: $H(i) = 0$, $L(A, i) = 2$
  - Only A, C, G occur, with $H(i) = 1.5$ (for instance frequencies 1/2, 1/4, 1/4); $L(A, i)$ and $L(\text{not } T, i)$ follow from the height formula.

Problem Definition
- Given a collection of promoter sequences $s_1, \ldots, s_N$ of genes with common expression.
- Probabilistic:
  - Motif: $M_{ij}$, $1 \le i \le W$, $1 \le j \le 4$, where $M_{ij} = \mathrm{Prob}[\text{letter } j \text{ at position } i]$
  - Find the best M and positions $p_1, \ldots, p_N$ in the sequences.
- Combinatorial:
  - Motif M: $m_1 \ldots m_W$, with some of the $m_i$'s blank
  - Find M that occurs in all $s_i$ with at most k differences, or find M with the smallest total Hamming distance.

Essentially a Multiple Local Alignment
- Find the "best" multiple local alignment.
- The alignment score is defined differently in the probabilistic and combinatorial cases.

Algorithms
- Combinatorial: CONSENSUS, TEIRESIAS, SP-STAR, others
- Probabilistic:
  1. Expectation Maximization: MEME
  2. Gibbs Sampling: AlignACE, BioProspector

Combinatorial Approaches to Motif Finding

Discrete Formulations
- Given sequences $S = \{x_1, \ldots, x_n\}$.
- A motif W is a consensus string $w_1 \ldots w_K$.
- Find the motif $W^*$ with the "best" match to $x_1, \ldots, x_n$.
- Definition of "best":
  - $d(W, x_i)$ = minimum Hamming distance between W and any word in $x_i$
  - $d(W, S) = \sum_i d(W, x_i)$

Approaches
- Exhaustive searches
- CONSENSUS
- MULTIPROFILER
- TEIRESIAS, SP-STAR, WINNOWER

Exhaustive Searches 1: Pattern-driven algorithm
- For W = AA…A to TT…T ($4^K$ possibilities): find $d(W, S)$.
- Report $W^* = \arg\min_W d(W, S)$.
- Running time: $O(K N 4^K)$, where $N = \sum_i |x_i|$.
- Advantage: finds the provably "best" motif W.
- Disadvantage: time.

Exhaustive Searches 2: Sample-driven algorithm
- For W = any K-long word occurring in some $x_i$: find $d(W, S)$.
- Report $W^* = \arg\min_W d(W, S)$, or report a local improvement of $W^*$.
- Running time: $O(K N^2)$.
- Advantage: time.
- Disadvantage: if the true motif is weak and does not occur in the data, then a random motif may score better than any instance of the true motif.
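The sequence-logo formulas above can be checked numerically. The following is a minimal sketch, assuming a column is given as a dictionary of letter frequencies; the function names `column_entropy` and `logo_heights` are illustrative and do not come from the lecture.

```python
import math

def column_entropy(freqs):
    """H(i) = -sum_x freq(x, i) * log2 freq(x, i).

    Letters with frequency 0 are skipped (their contribution is taken as 0).
    """
    return sum(-f * math.log2(f) for f in freqs.values() if f > 0)

def logo_heights(freqs):
    """L(x, i) = freq(x, i) * (2 - H(i)) for every letter x in the column."""
    h = column_entropy(freqs)
    return {x: f * (2 - h) for x, f in freqs.items()}

# First example from the slide: a fully conserved column.
conserved = logo_heights({"A": 1.0})

# A column with frequencies 1/2, 1/4, 1/4 (an assumed assignment that
# gives H(i) = 1.5).
mixed_column = {"A": 0.5, "C": 0.25, "G": 0.25, "T": 0.0}
mixed = logo_heights(mixed_column)
```

With a four-letter alphabet the maximum of H(i) is 2 bits, so the factor (2 - H(i)) is the information content of the column and each letter receives a share proportional to its frequency.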
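The distance definitions and the two exhaustive searches described above can be sketched in a few lines. This is an illustrative sketch, not the lecture's code: it assumes the DNA alphabet ACGT, assumes every sequence is at least K letters long, and the function names are hypothetical.

```python
from itertools import product

def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def words(x, k):
    """All k-long words (substrings) of sequence x."""
    return [x[i:i + k] for i in range(len(x) - k + 1)]

def d_seq(w, x):
    """d(W, x_i): minimum Hamming distance between W and any word in x_i."""
    return min(hamming(w, u) for u in words(x, len(w)))

def d_total(w, seqs):
    """d(W, S): sum of d(W, x_i) over all sequences."""
    return sum(d_seq(w, x) for x in seqs)

def pattern_driven(seqs, k):
    """Try all 4^k candidate words, from AA...A to TT...T."""
    candidates = ("".join(p) for p in product("ACGT", repeat=k))
    return min(candidates, key=lambda w: d_total(w, seqs))

def sample_driven(seqs, k):
    """Try only the k-long words that occur in some sequence."""
    candidates = sorted({u for x in seqs for u in words(x, k)})
    return min(candidates, key=lambda w: d_total(w, seqs))

seqs = ["TTACGTAA", "CCACGTCC", "GGACGAGG"]
best_pattern = pattern_driven(seqs, 4)
best_sample = sample_driven(seqs, 4)
```

In this toy input the word ACGT occurs exactly in two sequences and with one mismatch in the third, so both searches agree. The sample-driven search can miss the optimum when the true motif never appears exactly in the data, which is the disadvantage noted above.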
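CONSENSUS Algorithm: Cycle 1
- For each word W in S (of fixed length):
  - For each word W' in S: create a gap-free alignment of W and W'.
- Keep the $C_1$ best alignments, $A_1, \ldots, A_{C_1}$.
- Example alignments:
    ACGGTTG    CGAACTT    GGGCTCT
    ACGCCTG    AGAACTA    GGGGTGT

CONSENSUS Algorithm: Cycle t
- For each word W in S:
  - For each alignment $A_j$ from cycle t-1: create a gap-free alignment of W and $A_j$.
- Keep the $C_t$ best alignments, $A_1, \ldots, A_{C_t}$.
- Example alignments:
    ACGGTTG    CGAACTT    GGGCTCT
    ACGCCTG    AGAACTA    GGGGTGT
    ACGGCTC    AGATCTT    GGCGTCT

CONSENSUS
- $C_1, \ldots, C_n$ are user-defined heuristic constants.
- N is the sum of the sequence lengths; n is the number of sequences.
- Running time: $O(N^2) + O(N C_1) + O(N C_2) + \cdots + O(N C_n) = O(N^2 + N C_{total})$, where $C_{total} = \sum_i C_i$, typically $O(nC)$ with C a big constant.

MULTIPROFILER
- Extended sample-driven approach.
- Given a K-long word W, define $N_\alpha(W)$ = words W' in S such that $d(W, W') \le \alpha$.
- Idea: assume W is an occurrence of the true motif $W^*$, and use $N_\alpha(W)$ to correct "errors" in W.

MULTIPROFILER
- Assume W differs from the true motif $W^*$ in at most L positions.
- Define: a wordlet G of W is an L-long pattern with blanks, differing from W. L is smaller than the word length K.
- Example: K = 7, L = 3
    W = ACGTTGA
    G = --A--CG

MULTIPROFILER Algorithm
- For each W in S, for L = 1 to $L_{max}$:
  1. Find the $\alpha$-neighbors of W in S, giving $N_\alpha(W)$.
  2. Find all "strong" L-long wordlets G in $N_\alpha(W)$.
  3. For each wordlet G:
     1. Modify W by the wordlet G, giving W'.
     2. Compute $d(W', S)$.
- Report $W^* = \arg\min d(W', S)$.
- Step 1 above is a smaller motif-finding problem; use exhaustive search.

Expectation Maximization in Motif Finding

Expectation Maximization
- The MM algorithm, part of the MEME package, uses Expectation Maximization.
- Algorithm (sketch):
  1. Given genomic sequences, find all K-long words.
  2. Assume each word is either motif or background.
  3. Find the likeliest motif model, background model, and classification of words into motif or background.

Expectation Maximization
- Given sequences $x_1, \ldots, x_N$, find all K-long words $X_1, \ldots, X_n$.
- Define the motif model: $M = (M_1, \ldots, M_K)$, $M_i = (M_{i1}, \ldots, M_{i4})$ (assume the alphabet {A, C, G, T}), where $M_{ij} = \mathrm{Prob}[\text{letter } j \text{ occurs in motif position } i]$.
- Define the background model: $B = (B_1, \ldots, B_4)$, where $B_j = \mathrm{Prob}[\text{letter } j \text{ in background sequence}]$.

Expectation Maximization
- Define $Z_{i1} = 1$ if $X_i$ is motif, 0 otherwise; $Z_{i2} = 0$ if $X_i$ is motif, 1 otherwise.
- Given a word $X_i = x[1] \ldots x[K]$:
  - $P[X_i, Z_{i1} = 1] = \lambda \, M_{1x[1]} \cdots M_{Kx[K]}$
  - $P[X_i, Z_{i2} = 1] = (1 - \lambda) \, B_{x[1]} \cdots B_{x[K]}$
- Let $\lambda_1 = \lambda$ and $\lambda_2 = 1 - \lambda$.

Expectation Maximization
- Define the parameter space $\theta = (M, B)$, with $\theta_1$: motif and $\theta_2$: background.
- Objective: maximize the log likelihood of the model:

$$\log P(X_1, \ldots, X_n, Z \mid \theta, \lambda) = \sum_{i=1}^{n} \sum_{j=1}^{2} Z_{ij} \log\bigl(\lambda_j P(X_i \mid \theta_j)\bigr) = \sum_{i=1}^{n} \sum_{j=1}^{2} Z_{ij} \log P(X_i \mid \theta_j) + \sum_{i=1}^{n} \sum_{j=1}^{2} Z_{ij} \log \lambda_j$$

Expectation Maximization
- Maximize the expected likelihood by iterating two steps:
  - Expectation: find the expected value of the log likelihood, $E[\log P(X_1, \ldots, X_n, Z \mid \theta, \lambda)]$.
  - Maximization: maximize the expected value over $\theta$ and $\lambda$.

Expectation Maximization: E-step
- Expectation: find the expected value of the log likelihood:

$$E[\log P(X_1, \ldots, X_n, Z \mid \theta, \lambda)] = \sum_{i=1}^{n} \sum_{j=1}^{2} E[Z_{ij}] \log P(X_i \mid \theta_j) + \sum_{i=1}^{n} \sum_{j=1}^{2} E[Z_{ij}] \log \lambda_j$$

- The expected values of Z can be computed as follows:

$$E[Z_{ij}] = \frac{\lambda_j P(X_i \mid \theta_j)}{\sum_{k=1}^{2} \lambda_k P(X_i \mid \theta_k)}$$

Expectation Maximization: M-step
- Maximization: maximize the expected value over $\theta$ and $\lambda$ independently.
- For $\lambda$, this is easy:

$$\lambda^{NEW} = \arg\max_{\lambda} \sum_{i=1}^{n} \sum_{j=1}^{2} E[Z_{ij}] \log \lambda_j, \qquad \lambda_j^{NEW} = \frac{\sum_{i=1}^{n} E[Z_{ij}]}{n}$$

Expectation Maximization: M-step
- For $\theta = (M, B)$, define:
  - $c_{jk}$ = E[number of times letter k appears in motif position j]
  - $c_{0k}$ = E[number of times letter k appears in background]
- The $c_{jk}$ values are calculated easily from the $E[Z]$ values.
- It easily follows that:

$$M_{jk}^{NEW} = \frac{c_{jk}}{\sum_{k'=1}^{4} c_{jk'}}, \qquad B_{k}^{NEW} = \frac{c_{0k}}{\sum_{k'=1}^{4} c_{0k'}}$$

- To not allow any 0's, add pseudocounts.

Initial Parameters Matter!
- Consider the following "artificial" example. $x_1, \ldots, x_N$ contain:
  - $2^{12}$ patterns on {A, T}: A…A, A…AT, …, T…T
  - $2^{12}$ patterns on {C, G}: C…C, C…CG, …, G…G
  - $D \ll 2^{12}$ occurrences of the 12-mer ACTGACTGACTG
- Some local maxima:
  - $\lambda \approx 1/2$; B = 1/2 C, 1/2 G; $M_i$ = 1/2 A, 1/2 T, for i = 1, …, 12
  - $\lambda \approx D / 2^{K+1}$; B = 1/4 A, 1/4 C, 1/4 G, 1/4 T; $M_1$ = 100% A, $M_2$ = 100% C, $M_3$ = 100% T, etc.

Overview of EM Algorithm
1. Initialize the parameters $\theta = (M, B)$ and $\lambda$: try different values of $\lambda$ from $N^{-1/2}$ up to $1/(2K)$.
2. Repeat:
   a. Expectation
   b. Maximization
3. Until the change in $\theta = (M, B)$ and $\lambda$ falls below $\varepsilon$.
4. Report results for several "good" $\lambda$.

Overview of EM Algorithm
- One iteration running time: O(NK)
- Usually need N iterations …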
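Step 1 of the MULTIPROFILER algorithm above, collecting the neighbors of a word W in S, can be sketched as a direct scan. This is an illustrative sketch only: the function name `alpha_neighbors` and the symbol `alpha` for the distance threshold are assumptions, and the lecture suggests exhaustive search for this step without giving code.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(c1 != c2 for c1, c2 in zip(a, b))

def alpha_neighbors(w, seqs, alpha):
    """Words W' in S (all len(w)-long substrings) with d(W, W') <= alpha.

    Words are returned in order of occurrence; repeats are kept.
    """
    k = len(w)
    return [
        x[i:i + k]
        for x in seqs
        for i in range(len(x) - k + 1)
        if hamming(w, x[i:i + k]) <= alpha
    ]

near = alpha_neighbors("ACGT", ["TTACGTAA", "GGACGAGG"], alpha=1)
```

Letters that appear consistently at the same position across this neighborhood, while differing from W, are the kind of evidence a "strong" wordlet is meant to capture.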
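The M-step update for the mixing weights is stated above without derivation. A short worked version, assuming the constraint $\lambda_1 + \lambda_2 = 1$ is enforced with a Lagrange multiplier $\mu$ (a symbol introduced here for illustration), might read:

```latex
\begin{align*}
\mathcal{L}(\lambda, \mu)
  &= \sum_{i=1}^{n} \sum_{j=1}^{2} E[Z_{ij}] \log \lambda_j
     - \mu \Bigl( \sum_{j=1}^{2} \lambda_j - 1 \Bigr) \\
\frac{\partial \mathcal{L}}{\partial \lambda_j} = 0
  &\;\Longrightarrow\; \lambda_j = \frac{1}{\mu} \sum_{i=1}^{n} E[Z_{ij}] \\
\sum_{j=1}^{2} \lambda_j = 1
  &\;\Longrightarrow\; \mu = \sum_{i=1}^{n} \sum_{j=1}^{2} E[Z_{ij}] = n \\
\lambda_j^{NEW}
  &= \frac{1}{n} \sum_{i=1}^{n} E[Z_{ij}]
\end{align*}
```

The third line uses $E[Z_{i1}] + E[Z_{i2}] = 1$ for every word, which follows directly from the E-step formula.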
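The E-step and M-step above can be put together in a small two-component mixture sketch. This is not the MEME implementation: the function names, the pseudocount value of 1.0, the convergence test, and the iteration cap are all assumptions made for illustration, and overlapping words are treated as independent.

```python
ALPHABET = "ACGT"

def k_words(seqs, k):
    """All K-long words X_1..X_n taken from the input sequences."""
    return [x[i:i + k] for x in seqs for i in range(len(x) - k + 1)]

def p_motif(word, M):
    """P(X_i | motif) = M[0][x[1]] * ... * M[K-1][x[K]]."""
    p = 1.0
    for i, ch in enumerate(word):
        p *= M[i][ch]
    return p

def p_background(word, B):
    """P(X_i | background) = B[x[1]] * ... * B[x[K]]."""
    p = 1.0
    for ch in word:
        p *= B[ch]
    return p

def e_step(words, M, B, lam):
    """E[Z_i1] = lam * P(X_i | M) / (lam * P(X_i | M) + (1 - lam) * P(X_i | B))."""
    z = []
    for w in words:
        pm = lam * p_motif(w, M)
        pb = (1.0 - lam) * p_background(w, B)
        z.append(pm / (pm + pb))
    return z

def m_step(words, z, k, pseudo=1.0):
    """Re-estimate lam, M, B from expected counts (pseudocount is an assumption)."""
    lam = sum(z) / len(words)
    c = [{a: pseudo for a in ALPHABET} for _ in range(k)]   # c_jk, motif counts
    c0 = {a: pseudo for a in ALPHABET}                      # c_0k, background counts
    for w, zi in zip(words, z):
        for j, ch in enumerate(w):
            c[j][ch] += zi
            c0[ch] += 1.0 - zi
    M = [{a: col[a] / sum(col.values()) for a in ALPHABET} for col in c]
    total0 = sum(c0.values())
    B = {a: c0[a] / total0 for a in ALPHABET}
    return M, B, lam

def em(words, M, B, lam, eps=1e-6, max_iter=500):
    """Alternate E and M steps until the largest parameter change is below eps."""
    k = len(M)
    for _ in range(max_iter):
        z = e_step(words, M, B, lam)
        M_new, B_new, lam_new = m_step(words, z, k)
        change = max(
            [abs(lam_new - lam)]
            + [abs(B_new[a] - B[a]) for a in ALPHABET]
            + [abs(M_new[j][a] - M[j][a]) for j in range(k) for a in ALPHABET]
        )
        M, B, lam = M_new, B_new, lam_new
        if change < eps:
            break
    return M, B, lam
```

A call such as `em(k_words(seqs, 4), M0, B0, 0.5)` needs an initial motif model `M0` (a list of K per-position letter distributions) and background `B0`. As the "Initial Parameters Matter" example suggests, different starting points can converge to different local maxima, so several initializations are usually tried.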

