U of I CS 498 - Motif finding



Outline
• Motif finding: Lecture 1
• From DNA to Protein: In words
• Gene expression
• Transcription
• Step 1: From DNA to mRNA
• Transcriptional regulation
• The importance of gene regulation
• Binding sites and motifs
• Binding sites
• Motif
• Motif representation
• The motif finding problem
• A variant of motif finding
• Binding sites from a weight matrix motif
• Ab initio motif finding
• Ab initio motif finding - consensus string motifs
• Ab initio motif finding - PWM motifs
• Gibbs sampling: The search space
• Gibbs sampling: algorithm

Motif finding: Lecture 1
CS 498 CXZ

From DNA to Protein: In words
1. DNA = nucleotide sequence
   • Alphabet size = 4 (A, C, G, T)
2. DNA → mRNA (single stranded)
   • Alphabet size = 4 (A, C, G, U)
3. mRNA → amino acid sequence
   • Alphabet size = 20
4. Amino acid sequence “folds” into a 3-dimensional molecule called a protein

   DNA:      AATACGAAGTAA
   mRNA:     AAUACGAAGUAA
   Protein:  Asn Thr Lys Stop

Gene expression
• Process of making a protein from a gene as template
• Transcription, then translation
• Can be regulated

Transcription
• Process of making a single stranded mRNA using double stranded DNA as template
• Only genes are transcribed, not all DNA

Step 1: From DNA to mRNA
[Animation: Transcription. SOURCE: http://academy.d20.co.edu/kadets/lundberg/DNA_animations/rna.dcr]

Transcriptional regulation
[Figure labels: GENE, ACAGTGA, TRANSCRIPTION FACTOR, PROTEIN]

The importance of gene regulation
[Figure: Genetic regulatory network controlling the development of the body plan of the sea urchin embryo. Davidson et al., Science, 295(5560):1669-1678.]

• That was the “circuit” responsible for development of the sea urchin embryo
• Nodes = genes
• Switches = gene regulation
• Change the switches and the circuit changes
• Gene regulation significance:
  – Development of an organism
  – Functioning of the organism
  – Evolution of organisms

Binding sites and motifs

Binding sites
• Binding sites of transcription factor “Bicoid”, collected experimentally
[Figure: http://webdisk.berkeley.edu/~dap5/data_04/motifs/bicoid.gif]

Motif (“Consensus String”): T A A T C C C

Motif: W A A T C C N
  W = T or A
  N = A, C, G, T

Motif
• Common sequence “pattern” in the binding sites of a transcription factor
• A succinct way of capturing variability among the binding sites

Alternative way to represent motif: position weight matrix (PWM), or simply “weight matrix”

  Column  1  2  3  4  5  6  7  8
  A       1  1  9  9  0  0  0  1
  C       6  0  0  0  0  9  8  7
  G       1  0  0  0  1  0  0  1
  T       1  8  0  0  8  0  1  0

Motif representation
• Consensus string
  – May allow “degenerate” symbols in string, e.g., N = A/C/G/T; W = A/T; S = C/G; R = A/G; Y = T/C, etc.
• Position weight matrix
  – More powerful representation
  – Probabilistic treatment

The motif finding problem
• Suppose a transcription factor (TF) controls five different genes
• Each of the five genes should have binding sites for TF in its promoter region
[Figure: Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, with the binding sites for TF marked]

The motif finding problem
• Now suppose we are given the promoter regions of the five genes G1, G2, …, G5
• Can we find the binding sites of TF, without knowing about them a priori?
  – Binding sites are similar to each other, but not necessarily identical
• This is the motif finding problem
• To find a motif that represents binding sites of an unknown TF

A variant of motif finding
• Given a motif (e.g., consensus string, or weight matrix), find the binding sites in an input sequence
• For consensus string, problem is trivial
  – For each position i in the input sequence, check if the substring starting at position i matches the motif.
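The consensus-string scan just described can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the names DEGENERATE, matches, and find_sites are hypothetical, and the symbol table covers only the degenerate codes listed on the “Motif representation” slide. It scans one strand only.

```python
# Hypothetical helper names; the table holds only the degenerate symbols
# named in the slides (N, W, S, R, Y) plus the four plain bases.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "W": "AT", "S": "CG", "R": "AG", "Y": "CT",
}


def matches(motif, substring):
    """True if each base of substring is allowed by the motif symbol at that position."""
    return len(motif) == len(substring) and all(
        base in DEGENERATE[symbol] for symbol, base in zip(motif, substring)
    )


def find_sites(motif, sequence):
    """Return the 0-based start positions at which the motif matches the sequence."""
    width = len(motif)
    return [
        i
        for i in range(len(sequence) - width + 1)
        if matches(motif, sequence[i:i + width])
    ]


if __name__ == "__main__":
    # The Bicoid-style consensus from the slides, on a made-up input sequence.
    print(find_sites("WAATCCN", "GGTAATCCCAAAATCCG"))
```

The scan does one comparison per motif position at every start position, so its cost is the sequence length times the motif width, which is why the slide calls this case trivial.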
• For weight matrix, not so trivial

Binding sites from a weight matrix motif

Counts of each base in each column:

  Column  1  2  3  4  5  6  7  8
  A       1  1  9  9  0  0  0  1
  C       6  0  0  0  0  9  8  7
  G       1  0  0  0  1  0  0  1
  T       1  8  0  0  8  0  1  0

Probability of each base in each column (W):

  Column  1    2    3    4    5    6    7    8
  A       .11  .11  1    1    0    0    0    .11
  C       .67  0    0    0    0    1    .89  .78
  G       .11  0    0    0    .11  0    0    .11
  T       .11  .89  0    0    .89  0    .11  0

• $W_{b,k}$ = probability of base $b$ in column $k$
• Given a string $s$ of length $l = 8$ (one base per column of W)
• $s = s_1 s_2 \ldots s_l$
• $\Pr(s \mid W) = \prod_{k=1}^{l} W_{s_k,k}$
• Example: Pr(CTAATCCG | W) = 0.67 × 0.89 × 1 × 1 × 0.89 × 1 × 0.89 × 0.11 ≈ 0.052

Binding sites from a weight matrix motif
• Given sequence S (e.g., 1000 base-pairs long)
• For each length-l substring s of S,
  – Compute Pr(s | W)
  – If Pr(s | W) > some threshold, call that a binding site
• Look at S, as well as its “reverse complement”
  – Rev. compl. of AGTTACACCA is TGGTGTAACT
  – (That’s what is on the other strand of DNA)

Ab initio motif finding
• The original motif finding problem
• To find a motif that represents binding sites of an unknown TF

Ab initio motif finding
• Define a motif score, find the motif with maximum score over all possible motifs in search space (motif model)
• Consensus string model => exhaustive search algorithm, guarantee on finding the optimal motif
• PWM model => local search, not guaranteed to find optimal motif.

Ab initio motif finding - consensus string motifs
• A precise motif model defines the search space (i.e., a list of all candidate motifs).
• The motif model also prescribes exactly how to determine if a substring is a match to a particular motif.
• Define motif model precisely

Ab initio motif finding - consensus string motifs
• E.g., string over alphabet {A,C,G,T} of fixed length l. If l = 4, all 256 strings AAAA, AAAT, AAAC, …, TTTT are “candidate motifs”.
• E.g., string over alphabet {A,C,G,T} of fixed length l, and allowing up to d mismatches. If AAAA is a motif, and d = 1, then AAAT, AATA, etc. are also counted as matches to the motif.
• E.g., string over extended alphabet {A,C,G,T,N} of fixed length l. Here “N” stands for any character (A, C, G, or T).
  – If AANAA is the motif, then AACAA, AAGAA, AATAA, or AAAAA are all counted as matches to this motif.

Ab initio motif finding - consensus string motifs
• Define a motif score, i.e., a real number associated with each candidate motif, in relation to the input sequences.
• E.g., the count N_s of matches to a motif s in the input sequence(s).
• E.g., some function
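As an illustration of the count-based score and the exhaustive search over fixed-length candidates, here is a minimal Python sketch. It assumes the “up to d mismatches” motif model and uses the raw match count N_s as the score. The function names (hamming, count_matches, best_motif) are hypothetical, overlapping matches are all counted, and only the given strand is scanned. The slides do not specify these details, so treat them as assumptions.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def count_matches(motif, sequences, d=0):
    """N_s: number of substrings, over all sequences, within d mismatches of motif."""
    width = len(motif)
    return sum(
        1
        for seq in sequences
        for i in range(len(seq) - width + 1)
        if hamming(motif, seq[i:i + width]) <= d
    )


def best_motif(sequences, width, d=0):
    """Score all 4**width candidate strings and return (motif, count) with the highest count.

    Ties are resolved in favour of the candidate that comes first in
    A, C, G, T enumeration order (an arbitrary choice for this sketch).
    """
    candidates = ("".join(p) for p in product("ACGT", repeat=width))
    return max(
        ((m, count_matches(m, sequences, d)) for m in candidates),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Made-up promoter fragments, for illustration only.
    print(best_motif(["ACGTACGT", "TTACGTTT"], width=4))
```

Because every one of the 4^l candidates is scored, the search is guaranteed to return a maximum-count motif, but the running time grows exponentially with the motif length l.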