Regulatory Motif Finding (II)
Balaji S. Srinivasan
CS 374, Lecture 18, 12/6/2005

Overview
- Biology of DNA binding motifs
- Why motifs?
- Overview of motif finding algorithms
- Open problems in this area

Biology of Motifs
- From last time…
- Given a transcription factor (TF) of fixed sequence, binding is affected by:
  - secondary and tertiary structure of the DNA
  - methylation state
  - DNA binding motifs

Biology of Motifs: DNA Motifs (Regulatory Elements)
- Binding sites for proteins
- Short sequences (5-25 bp)
- Up to 1000 bp (or farther) from the gene
- Inexactly repeating patterns

Biology of Motifs: Should Be on Your Radar…
- Motifs are a research frontier. Why?
  - sequence data exists
  - static, not dynamic
- Dynamic, by contrast:
  - dynamic chromosome: accessibility affects transcription…
  - dynamic epigenome (methylation state)

Biology of Motifs: Prokaryotes vs. Eukaryotes
- Prokaryotes
  - fewer TFs
  - long motifs
  - affinity depends on match
  - regulation from the immediate upstream region
- Eukaryotes (HARD)
  - more TFs per gene
  - shorter motifs
  - MUCH more noncoding sequence
  - regulatory modules
  - long-range effects and long-range regulation

Biology of Motifs: Transcription Factors
- Often dimers or tetramers, hence palindromic binding sites
- Binding
  - stochastic
  - affinity = structural/sequence match
  - high affinity not always desirable
- Combinatorial regulation (especially in
  eukaryotes)
  - order important!
  - site spacing important!

Why Motifs?
- Given: all TF/motif pairs
- Get: the global genetic regulatory network (microbial, eukaryotic)

Recap #1
- To figure out transcriptional control… find transcription factor binding sites
- Eukaryotes are hard because of:
  - much more noncoding sequence
  - shorter motifs
  - longer-range interactions

Motif Finding Overview
Methods:
- 1 genome: sequence overrepresentation (NBT shootout; not good)
- Functional genomics: predict regulons (Segal, etc.)
- N genomes: phylogenetic footprinting (Kellis, etc.)
- N genomes + functional genomics: PhyloCon (Wang & Stormo)
- New ideas…

Motif Shootout
- Nature Biotechnology, Jan. 2005
- 13-way shootout: disappointing results
- Useful in that it
  - shows the importance of using all available information
  - shows that benchmarking is clearly a trouble area
- Conceptually:
  - load a FASTA hopper of intergenic (upstream) sequence from 1 genome into a black box
  - output: motif matrices
- But… how to pick sequences? Comparison? Functional clustering? Benchmarking?
- So: not as useful as it seems…
  - huge, artificial limitations ("consider a spherical cow")
  - What if the limitations were removed?

Motifs via Functional Genomics
- Coexpression
  - most popular (e.g. Segal 2003)
  - functional clustering, then hunt upstream
- ChIP / ChIP-chip
  - key idea: assay the DNA segments where a TF binds
  - direct test of motif binding (e.g.
    Laub 2002)
- Disadvantages
  - one TF at a time
  - need an antibody!

Motifs via Functional Genomics (cont.)
- Coinheritance, etc.
  - predict regulons, then look upstream
  - heuristic network integration (will return to this point)
  - decent signal in prokaryotes (Manson McGuire 2001)

Motifs via Phylogenetic Footprinting
- Key idea: functional sequence evolves more slowly
- Conservation hierarchy (most to least conserved):
  - ultraconserved noncoding elements (Bejerano & Haussler 2004)
  - proteins, ncRNAs
  - DNA binding motifs
  - unconstrained, neutrally drifting regions (no conservation)
- Phylogenetic footprint: the "footprint" is conservation
- Simple version: multiple alignment of orthologous upstream regions
- Problem: nonfunctional sequence drifts rapidly
  - multiple alignment is difficult if only a small % of the sequence is conserved
  - protein twilight zone: 30% identity
  - nucleic acid upstream regions: often much less…
- So multiple alignment of upstreams hits the twilight zone. One solution: search for parsimonious substrings… without direct alignment (Blanchette 2003)
- Multiple genome alignment can work, but needs close enough species
  - Kellis 2003 (four yeasts, genome alignments)
  - Xie 2005 ("four" mammals, genome alignment)
  - discussed last time; key points: genome-wide search, and the Motif Conservation Score (MCS), a null-model-based test

Recap
- Many programs for motif search; most are useless!
- Lesson: must use comparative genomics (e.g. alignment)… or functional genomics (e.g.
  expression)
- What about both together??

Integrated Motif Finding
- Recall
  - comparative genomics: one upstream region in N species
  - functional genomics: N upstream regions in one species
- PhyloCon (Wang & Stormo 2003): N upstreams in N species
  - given N species, align the upstream regions
  - key idea: align the alignments
- Boosts sensitivity: LEU3 is hard to find… but align the alignments and the true motif pops out!
- Important features
  - no prior motif length required
  - profile approach matches a distribution, not a sample (robust to substitutions)
  - several alignments for each upstream are OK
  - does well vs. real data…
- ALLR (average log-likelihood ratio)
  - Q: are 2 profile columns samples from the same distribution?
  - if so, that may be a matching motif position…

Open Questions
- PhyloCon is a strong step in the right direction… align the alignments
- But how do we…
  - choose species?
  - choose upstreams?
  - validate motifs?
  - find TF/motif pairs?

Conclusion
- Motifs: important, static, tractable
  - want: genetic regulatory networks
- Motif finder selection
  - Don't: use 1 genome without comparison or functional genomics
  - Do: use alignment & functional genomics
  - PhyloCon (Wang & Stormo), MCS (Kellis): best to date because they use N genes and M
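Both the shootout's "motif matrices" and PhyloCon's "profile approach" rest on the same object: a per-position distribution over A/C/G/T. The sketch below is a minimal illustration, not code from any of the tools named in these slides. It builds a profile from equal-length aligned sites (the pseudocount of 0.5 and the uniform background are assumptions), converts it to log2-odds weights, and slides it along a sequence to find the best-scoring window. The sites are hypothetical toy strings, not real binding sites.

```python
import math

BASES = "ACGT"


def profile(sites, pseudocount=0.5):
    """Column-wise base frequencies for equal-length aligned sites.

    Assumes uppercase A/C/G/T only; the pseudocount is an illustrative choice.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned (equal length)")
    cols = []
    for j in range(length):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[j]] += 1
        total = sum(counts.values())
        cols.append({b: counts[b] / total for b in BASES})
    return cols


def log_odds(prof, background=None):
    """Convert a profile into log2-odds weights against a background model."""
    bg = background or {b: 0.25 for b in BASES}  # uniform background (assumption)
    return [{b: math.log2(col[b] / bg[b]) for b in BASES} for col in prof]


def best_hit(pwm, seq):
    """Score every window of seq with the weight matrix; return (best score, start)."""
    width = len(pwm)
    best = (-math.inf, -1)
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        best = max(best, (score, i))
    return best


if __name__ == "__main__":
    toy_sites = ["GATTACA", "GATTACA", "GATCACA", "GAATACA"]  # hypothetical sites
    weights = log_odds(profile(toy_sites))
    print(best_hit(weights, "CCCGATTACACCC"))  # best window starts at offset 3
```

A real scanner would also check the reverse complement (relevant for the palindromic sites of dimeric TFs) and use a genome-derived background rather than a uniform one.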
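The ALLR question under Integrated Motif Finding ("are 2 profile columns samples from the same distribution?") can be made concrete. Below is a minimal sketch assuming the symmetric form of ALLR as we understand it from the PhyloCon literature: each column's observed counts are scored under the other column's base frequencies, relative to a background, and the sum is averaged over all observations. The pseudocount and the uniform background are assumptions; consult the PhyloCon paper for its exact scheme.

```python
import math

BASES = "ACGT"


def freqs(counts, pseudocount=0.5):
    """Base frequencies for one profile column, smoothed with a pseudocount (assumption)."""
    total = sum(counts[b] + pseudocount for b in BASES)
    return {b: (counts[b] + pseudocount) / total for b in BASES}


def allr(col_a, col_b, background=None, pseudocount=0.5):
    """Average log-likelihood ratio between two columns given as base counts.

    ALLR = [sum_x n_b(x) ln(f_a(x)/p(x)) + sum_x n_a(x) ln(f_b(x)/p(x))]
           / sum_x (n_a(x) + n_b(x))
    Positive values suggest both columns come from a similar distribution.
    """
    bg = background or {b: 0.25 for b in BASES}  # uniform background (assumption)
    fa = freqs(col_a, pseudocount)
    fb = freqs(col_b, pseudocount)
    # Observations in column b scored under column a's model, and vice versa.
    num = sum(col_b[b] * math.log(fa[b] / bg[b]) for b in BASES)
    num += sum(col_a[b] * math.log(fb[b] / bg[b]) for b in BASES)
    den = sum(col_a[b] + col_b[b] for b in BASES)
    return num / den


if __name__ == "__main__":
    a_rich = {"A": 9, "C": 0, "G": 1, "T": 0}    # hypothetical columns
    a_rich_2 = {"A": 8, "C": 1, "G": 0, "T": 1}
    t_rich = {"A": 0, "C": 0, "G": 1, "T": 9}
    print(allr(a_rich, a_rich_2))  # positive: plausibly the same motif position
    print(allr(a_rich, t_rich))    # negative: different distributions
```

Because the score is symmetric in its two arguments, it can be used directly as a column-similarity score when aligning profiles ("aligning the alignments").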