Regulatory Motif Finding
Wenxiu Ma
CS374 Presentation
11/03/2005

Outline
- Regulation of genes
- Regulatory Motifs
- Motif Representation
- Current Motif Discovery Methods

Regulation of Genes
- What turns genes on (producing a protein) and off?
- When is a gene turned on or off?
- Where (in which cells) is a gene turned on?
- How many copies of the gene product are produced?

Overview of Gene Control
- The mechanisms that control the expression of genes operate at many levels.
source: Molecular Biology of the Cell (4th ed.), A. Johnson, et al.

Transcriptional Regulation
- The transcription of each gene is controlled by a regulatory region of DNA relatively near the transcription start site (TSS).
- Two types of fundamental components:
  - short DNA regulatory elements
  - gene regulatory proteins that recognize and bind to them

Regulation of Genes
[Figure shown in three steps. Labels: Gene, Regulatory Element, RNA polymerase (protein), Transcription Factor (protein), DNA; the last step adds "New protein".]
source: M. Tompa, U. of Washington

What is a motif?
- A subsequence (substring) that occurs in multiple sequences and has biological importance.
- Motifs can be totally constant or have variable elements.
- Protein motifs often result from structural features.
- DNA motifs (regulatory elements):
  - Binding sites for proteins
  - Short sequences (5-25 bp)
  - Up to 1000 bp (or farther) from the gene
  - Inexactly repeating patterns

daf-19 Binding Sites in C. elegans
Gene      Binding site
che-2     GTTGTCATGGTGAC
daf-19    GTTTCCATGGAAAC
osm-1     GCTACCATGGCAAC
osm-6     GTTACCATAGTAAC
F02D8.3   GTTTCCATGGTAAC
[Position axis in the figure: -150 to -1]
source: Peter Swoboda

Motif Representation
- Consensus sequence: a single string with the most likely sequence (+/- wildcards)
- Regular expression: a string with wildcards, constrained selection
- Profile: a list of the letter frequencies at each position
- Sequence logo: graphical depiction of a profile, showing the conservation of elements in a motif

Motif Logos: an Example
(http://www-lmmb.ncifcrf.gov/~toms/sequencelogo.html)

Measure of Conservation
- Relative heights of letters reflect their abundance in the alignment.
- Total height = entropy-based measurement of conservation.
  Entropy(i) = -SUM over all bases { f(base, i) * log2[f(base, i)] }
  Conservation(i) = 2 - Entropy(i)
- Units of conservation = bits of information
- Entropy measures variability/disorder.
- Highly conserved = low entropy = tall stack
- Very variable = high entropy = low stack

Finding Regulatory Motifs
- Given a collection of genes with common expression,
- find the (TF-binding) motif in common.

Identifying Motifs: Complications
- We do not know the motif sequence.
- We do not know where it is located relative to the gene's start.
- Motifs can differ slightly from one gene to another.
- How to discern it from "random" motifs?

Current Motif Discovery Methods
GOAL: comprehensive identification of all the regulatory motifs in genomes.
- By overrepresentation
  - MEME, Gibbs sampling
- By phylogenetic footprinting
  - Footprinter
- Cross-species comparative analysis
- Combining structure information

Motif Finding: Comparative Analysis
Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Xie, X. et al., Nature (2005).
- Identify motifs based on comparative analysis of the human, mouse, rat and dog genomes
- A systematic catalogue of human gene regulatory motifs
  - Short, functional sequences (6-10 bp) used many times in a genome
- Focus regions
  - Promoters
  - 3' untranslated regions (3' UTRs)
    - microRNAs (miRNAs)
    - post-transcriptional regulation

Motif Discovery Procedure
- Alignment of promoters & 3' UTRs
- Motif conservation score (MCS)
  - Measures the extent of excess conservation
- "Highly conserved motifs"
  - MCS > 6
- Clustering

Alignment of promoters & 3' UTRs
- Construct a whole-genome alignment for the four mammalian genomes
  - Blastz and Multiz
- Extract the aligned promoter and 3' UTR portions, respectively.
  - Coordinates: the annotation of NCBI reference sequences (RefSeq)

Motif Conservation Score (MCS)
- Consensus sequence representation
  - Alphabet size: 11 (A, C, G, T, [AC], [AG], [AT], [CG], [CT], [GT], [ACGT])
- A conserved occurrence of a motif m is an instance in which an exact match to this motif is found in all four species.
- Conservation rate p = ratio of conserved occurrences to total occurrences in human
- Expected conservation rate p0 = average conservation rate of 100 random motifs of the same length and redundancy

MCS
- MCS = number of standard deviations (s.d.) by which the observed conservation rate p of a motif exceeds the expected conservation rate p0.
- p = k/n
- Binomial probability of observing k conserved occurrences out of n:
  $$P(k \text{ out of } n) = \frac{n!}{k!\,(n-k)!}\, p_0^{k}\,(1-p_0)^{n-k}$$
- Estimated by way of the normal approximation to the binomial distribution:
  $$z = \frac{k-\mu}{\sigma}, \quad \text{where } \mu = n p_0,\ \sigma = \sqrt{n p_0 (1-p_0)}$$

Conservation Properties of Regulatory Motifs
Known 8-mer
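
The MCS definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used by Xie et al.: the function names (`conserved_counts`, `mcs`, `binomial_pmf`), the toy sequences, and the example numbers are all made up, and the aligned sequences are assumed to be gap-free and of equal length so that the same index refers to the same aligned position in every species. `math.comb` needs Python 3.8 or later.

```python
import math
import re


def conserved_counts(motif_regex, aligned):
    """Count occurrences of a motif in human (n) and conserved occurrences (k).

    `aligned` maps species name -> aligned sequence (toy assumption: no gaps,
    equal lengths). The bracketed letters of the 11-letter alphabet, such as
    [CG], are already valid regular-expression character classes.
    """
    # A lookahead lets finditer report overlapping occurrences as well.
    pattern = re.compile(f"(?=({motif_regex}))")
    human_hits = {m.start() for m in pattern.finditer(aligned["human"])}
    # An occurrence is conserved if every species matches at the same position.
    k = sum(
        all(re.match(motif_regex, seq[pos:]) for seq in aligned.values())
        for pos in human_hits
    )
    return k, len(human_hits)


def binomial_pmf(k, n, p0):
    """Exact binomial probability of k conserved occurrences out of n."""
    return math.comb(n, k) * p0**k * (1.0 - p0) ** (n - k)


def mcs(k, n, p0):
    """z-score of k under the normal approximation to Binomial(n, p0)."""
    mu = n * p0
    sigma = math.sqrt(n * p0 * (1.0 - p0))
    return (k - mu) / sigma


# Hypothetical toy alignment, for illustration only.
toy_alignment = {
    "human": "ATGACTTGAG",
    "mouse": "ATGACTTGAT",
    "rat":   "ATGAGTTGAT",
    "dog":   "ATGACTTGAA",
}
k_toy, n_toy = conserved_counts("TGA[CG]", toy_alignment)

# Made-up counts: 100 of 400 human occurrences conserved, expected rate 0.1.
example_score = mcs(100, 400, 0.1)
```

In the toy alignment the motif occurs twice in human and only the first occurrence matches in all four species, so p = k/n = 1/2. In practice p0 would come from the 100 random control motifs described above rather than being chosen by hand.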
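
As a worked example for the Measure of Conservation slide earlier in the deck, the sketch below builds a profile from the five daf-19 binding sites listed there and computes Conservation(i) = 2 - Entropy(i) for each column. It assumes entropy in bits (log base 2) and applies no small-sample correction, which some logo tools do apply; the function names are illustrative only.

```python
import math
from collections import Counter

# The five daf-19 binding sites listed on the slide.
SITES = [
    "GTTGTCATGGTGAC",  # che-2
    "GTTTCCATGGAAAC",  # daf-19
    "GCTACCATGGCAAC",  # osm-1
    "GTTACCATAGTAAC",  # osm-6
    "GTTTCCATGGTAAC",  # F02D8.3
]


def column_frequencies(sites):
    """Return the profile: one {base: frequency} dict per alignment column."""
    n = len(sites)
    return [
        {base: count / n for base, count in Counter(column).items()}
        for column in zip(*sites)
    ]


def conservation_bits(freqs):
    """Conservation = 2 - entropy, with entropy measured in bits."""
    entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
    return 2.0 - entropy


profile = column_frequencies(SITES)
scores = [conservation_bits(column) for column in profile]

for position, score in enumerate(scores, start=1):
    print(f"position {position:2d}: {score:.2f} bits")
```

Columns where all five sites agree, such as the central "AT", reach the maximum of 2 bits and would be drawn as the tallest stacks in a logo. Columns with mixed bases score lower.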