Berkeley STATISTICS 246 - Methods for the discovery of cis-regulatory modules


Methods for the discovery of cis-regulatory modules, 3
Comparative genomics
Statistics 246, Spring 2006
Week 15, Lecture 1

Introduction

In this lecture I'll discuss three papers which seek to find TFBSs using evolutionary conservation. Looking for such sites in clusters is another way to find cis-regulatory modules.

The first paper uses an algorithm that its authors call the Bayes Block Aligner (BBA). I will not describe this algorithm in detail, for two reasons. First, it is a relatively straightforward variant on the Gibbs sampler that we discussed last week, and details can be found in Zhu et al. (1998), referred to there. Second, it is no longer a method anyone would use for identifying evolutionarily conserved genomic sequence; cf. PhastCons later in this lecture. However, this first paper is of historical interest, as it is typical of several papers from around 2000.

Basic idea: motifs are sensitive but not very specific, i.e. false positive rates are high when we try to predict binding sites from them. We attempt to increase specificity by requiring conservation across species.

A caveat: TF binding sites are now thought to evolve much faster than coding sequences, and variation in these sites is currently thought to explain most of the differences between, e.g., humans and chimpanzees, rats or mice. (Of course this is just a thought, not an established truth.)

Result of using the Bayes Block Aligner, 1

Here we show the probability of alignment for the sequence flanking the 5' end of the first exon of natriuretic propeptide (NPPA). The two-dimensional histogram output of the BBA indicates the probability that any given base j in the human sequence aligns to any given base k in the rat sequence.
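Summing the BBA's two-dimensional output over k gives, for each human base j, the probability that j aligns to some rat base. A minimal Python sketch of that marginalization, assuming the posterior is available as a plain matrix post[j][k] and that each human base aligns to at most one rat base; the function name and the toy numbers are hypothetical and are not BBA output:

```python
from typing import List


def marginal_alignment_prob(post: List[List[float]]) -> List[float]:
    """Sum a 2-D alignment posterior over rat positions k.

    post[j][k] is assumed to be Pr(human base j aligns to rat base k).
    Assuming each human base aligns to at most one rat base, the row sum
    is Pr(base j aligns to some rat base), and 1 minus it is the
    probability that base j is unaligned.
    """
    marginals = []
    for j, row in enumerate(post):
        if any(p < 0 for p in row):
            raise ValueError(f"row {j} has a negative probability")
        total = sum(row)
        if total > 1.0 + 1e-9:  # small slack for floating-point error
            raise ValueError(f"row {j} sums to {total:.3f} > 1")
        marginals.append(min(total, 1.0))
    return marginals


if __name__ == "__main__":
    # Made-up 4 (human) x 3 (rat) matrix, for illustration only.
    post = [
        [0.90, 0.05, 0.00],
        [0.10, 0.80, 0.05],
        [0.00, 0.10, 0.20],
        [0.00, 0.00, 0.00],
    ]
    for j, p in enumerate(marginal_alignment_prob(post), start=1):
        print(f"human base {j}: Pr(aligned) = {p:.2f}")
```

Plotting these row sums along the human sequence gives a one-dimensional conservation profile of the kind discussed next.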
In the next slide we sum over k.

Result of using the Bayes Block Aligner, 2

Probability that a nucleotide in the human NPPA 5' flanking sequence aligns to any nucleotide in the rat sequence.

A broader comparison, 1

The authors next compared 28 orthologous gene pairs that are specifically upregulated in skeletal muscle, and for which there was (in 2000) considerable genomic sequence available. For the gene names, see the paper. Their data contain 99 experimentally defined binding sites: 24 Sp1 sites (G/C-rich sites) and 75 sequence-specific sites, characterized as follows:
(i) myogenic factor (Myf) family sites (E-boxes);
(ii) Mef2 sites (A/T-rich sequences);
(iii) SRF sites (CArG boxes);
(iv) Tef sites (MCAT boxes); and
(v) other experimentally defined but incompletely characterized sites.

In the next 3 slides, we summarize their results with this data set.

A broader comparison, 2

Comment: data like these now exist on a genome-wide basis.

A broader comparison, 3

Conservation of genomic sequence between human and rodents, for alignments where length(rodent)/length(human) ≥ 0.5.
Fraction of human nucleotides identified as conserved in the comparison of human and rodent sequences.
74 of the 75 skeletal-muscle-specific TF binding sites are confined to the 19% of human sequence that is most conserved in mouse.

A broader comparison, 4

Finding TFBSs in unaligned regulatory sequence from coregulated genes

Last week we saw how the Gibbs sampler might help in identifying motifs from unaligned sequence from sets of co-expressed genes arising from microarray experiments. We did not show this, but it works for yeast, where the search space is typically 200-500 bp from the TSS. It does not work for multicellular organisms, where the relevant regions typically span a few kbp and can lie tens or hundreds of kbp from the TSS. Reducing the search space by a factor of about 5 (to the 19% most conserved sequence) through human-mouse phylogenetic footprinting is therefore helpful.
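The factor of 5 follows from the 19% figure: keeping a fraction f of the sequence shrinks the search space by 1/f, and 1/0.19 ≈ 5.3. The Python sketch below illustrates the filtering step under simple, hypothetical assumptions: a precomputed per-base conservation score for the human sequence, and known sites given as half-open intervals [start, end). It is a simplified illustration, not the authors' alignment-based procedure, and all names and data are made up.

```python
from typing import List, Set, Tuple


def top_fraction_positions(scores: List[float], fraction: float) -> Set[int]:
    """Indices of the `fraction` of positions with the highest scores."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_keep])


def sites_retained(sites: List[Tuple[int, int]], kept: Set[int]) -> int:
    """Count sites [start, end) whose every base lies in the kept set."""
    return sum(all(i in kept for i in range(s, e)) for s, e in sites)


def search_space_reduction(fraction: float) -> float:
    """Keeping a fraction f of the sequence shrinks the search space by 1/f."""
    return 1.0 / fraction


if __name__ == "__main__":
    # Hypothetical per-base conservation scores and known sites.
    scores = [0.2, 0.3, 0.9, 0.95, 0.9, 0.1, 0.4, 0.85, 0.8, 0.3]
    sites = [(2, 5), (7, 9), (5, 7)]
    kept = top_fraction_positions(scores, 0.5)
    print(sorted(kept))
    print(sites_retained(sites, kept), "of", len(sites), "sites retained")
    print(f"reduction factor for 19%: {search_space_reduction(0.19):.1f}")
```

On the toy data, keeping the top half of positions retains two of the three sites; the third sits in poorly conserved sequence, which is the kind of loss the 74/75 figure measures on real data.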
Greater reductions are possible using conservation across more species, but of course this comes at a price.

Illustration using the skeletal muscle genes
1. Apply the Gibbs sampler to the non-coding sequence of many kbp surrounding the human genes: it finds meaningless patterns.
2. Apply it to just the 19% of sequence found by phylogenetic footprinting: it finds many of the sites we know are there, and their motifs.

Comparative analysis of the genome sequence of four yeast spp.

[Figure: tree based on genic sequence; tree based on intergenic sequence. Branch lengths = average number of substitutions per site.]

The data: ~12,000 multiple alignments.

There are many uses of these data:
- Studying large-scale genome evolution, e.g. at telomeres
- Studying genome evolution at the nucleotide level, e.g. different rates within and between ORFs (cf. the previous page), and in different branches of the tree
- Gene identification
- Mapping gene structure (start, stop, introns, etc.)
- Identifying rapidly and slowly evolving genes
- Genome-wide identification of regulatory elements

Gal4 binding sites

Known motifs: used to calibrate a Motif Conservation Score

Method for genome-wide motif discovery
1. Find seeds ("mini-motifs") of the form UVW n(0-21) XYZ: three specified bases, a gap of 0 to 21 unspecified bases, then three more specified bases. These can be enumerated, coming to 45,760 if reverse complements are grouped together.
2. A mini-motif is called conserved if (details omitted):
   (i) the conservation score is significantly high in intergenic regions;
   (ii) the conservation score is higher in intergenic regions than in genic regions;
   (iii) the conservation rate is different upstream and downstream of genes.
3. Conserved mini-motifs are then extended to full motifs by looking for correlated bases, clustering and merging. Again, the details are omitted. Motifs are then scored in various ways, most importantly via the previously defined motif conservation score (MCS), using the previously calibrated threshold.

The paper lists 72 "discovered" motifs

As it would take up too much space, I won't list the 72 discovered motifs.
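The count of 45,760 seeds in step 1 of the motif-discovery method can be checked directly. For each of the 22 gap lengths there are 4^6 = 4,096 triplet pairs; a seed's reverse complement has the same gap length, and 64 pairs are their own reverse complement, so grouping leaves (4,096 + 64)/2 = 2,080 seeds per gap length and 22 × 2,080 = 45,760 in total. The Python sketch below enumerates them; encoding gap positions as 'N' and taking the lexicographically smaller strand as the canonical form are our own illustrative choices, not necessarily the paper's.

```python
from itertools import product
from typing import Set

# Complement table; 'N' (an unspecified gap base) maps to itself.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def enumerate_mini_motifs(max_gap: int = 21) -> Set[str]:
    """Seeds UVW n(g) XYZ for g = 0..max_gap, one per reverse-complement pair."""
    triplets = ["".join(t) for t in product("ACGT", repeat=3)]
    seeds = set()
    for gap in range(max_gap + 1):
        for left in triplets:
            for right in triplets:
                motif = left + "N" * gap + right
                # Canonical form: the smaller of the motif and its reverse
                # complement (which has the same gap length).
                seeds.add(min(motif, reverse_complement(motif)))
    return seeds


if __name__ == "__main__":
    print(len(enumerate_mini_motifs()))
```

Running it prints 45760, matching the figure quoted for the grouped seeds.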
The 72 motifs were obtained by applying a threshold to the MCS; removing those that overlap known motifs leaves 42 new motifs. Attempts were then made to assign candidate functions to these new motifs using the functional categories of the genes whose regulatory regions contain them, based on a collection of experimental data. Again, details are omitted. For us, perhaps the most interesting next step was their brief discussion of combinatorial control, and we now turn to that.

Combinatorial control

The authors searched for motifs that occur together in the same intergenic region much more frequently than would be expected by chance. This is an interesting and important statistical problem, one far from completely solved.

They state that with a single genome, few significant correlations are found. The basis for this statement isn't clear, though of course it sounds

