Stanford CS 374 - Study Notes

Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting

Mathieu Blanchette and Martin Tompa¹
Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195-2350, USA

Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of orthologous regulatory regions from multiple species. It does so by identifying the best conserved motifs in those orthologous regions. We describe a computer algorithm designed specifically for this purpose, making use of the phylogenetic relationships among the sequences under study to make more accurate predictions. The program is guaranteed to report all sets of motifs with the lowest parsimony scores, calculated with respect to the phylogenetic tree relating the input species. We report the results of this algorithm on several data sets of interest. A large number of known functional binding sites are identified by our method, but we also find several highly conserved motifs for which no function is yet known.

One of the great challenges currently facing biologists is to understand the varied and complex mechanisms that regulate gene expression. We focus on one important aspect of this challenge, the identification of binding sites for the factors involved in such regulation.

A number of computer algorithms have been proposed for the discovery of novel regulatory elements in nucleotide sequences. Most of these try to deduce the regulatory elements by considering the regulatory regions of several (putatively) coregulated genes from a single genome. Such algorithms search for overrepresented motifs in this collection of regulatory regions, these motifs being good candidates for regulatory elements. Examples of this approach include van Helden et al. (1998), Hertz and Stormo (1999), Hughes et al. (2000), Sinha and Tompa (2000), and Workman and Stormo (2000).

We adopt an orthogonal approach of deducing regulatory elements by considering orthologous regulatory regions of a single gene from several species. This approach is called “phylogenetic footprinting” (Tagle et al. 1988). The simple premise underlying phylogenetic footprinting is that selective pressure causes functional elements to evolve at a slower rate than that of nonfunctional sequences. This means that unusually well conserved sites among a set of orthologous regulatory regions are excellent candidates for functional regulatory elements. This approach has proved successful for the discovery of regulatory elements for many genes, including ε-globin (Tagle et al. 1988; Gumucio et al. 1993), γ-globin (Tagle et al. 1988), rbcL (Manen et al. 1994), cystic fibrosis transmembrane conductance regulator (Vuillaumier et al. 1997), tumor necrosis factor-α (Leung et al. 2000), and interleukin (IL)-4, IL-13, and IL-5 (Loots et al. 2000). See the review by Duret and Bucher (1997) for more details. The same idea of using comparative analysis to identify conserved elements, but among only two or three species (particularly human and mouse), has recently become popular (Hardison et al. 1997; Jareborg et al. 1999; Dubchak et al. 2000; Wasserman et al. 2000; Mouchel et al. 2001; Wu et al. 2001).

The major advantage of phylogenetic footprinting over the single-genome, multigene approach mentioned earlier is that the latter requires a reliable method for assembling the requisite collection of coregulated genes. In contrast, phylogenetic footprinting is capable of identifying regulatory elements specific even to a single gene, as long as they are sufficiently conserved across many of the species considered.
Genome projects are quickly producing sequences from a wide variety of organisms, so the data necessary for phylogenetic footprinting are becoming increasingly available.

The standard method that has been used for phylogenetic footprinting is to construct a global multiple alignment of the orthologous regulatory sequences and then to identify conserved regions in the alignment. A tool such as CLUSTALW (Thompson et al. 1994) is appropriate for this purpose, as it can take advantage of knowledge of the phylogeny relating the species. To see why this approach to phylogenetic footprinting does not always work, consider typical lengths of the sequences involved. Regulatory elements tend to be quite short (5 to 20 nucleotides long) relative to the entire regulatory region in which we search for them (a 1000-bp promoter region would be typical). Given these relative lengths, if the species are somewhat diverged, it is likely that the noise of the diverged nonfunctional background will overcome the short conserved signal. The result is that the alignment may not align the short regulatory elements together. In that case, the regulatory elements would not appear to belong to conserved regions and would go undetected. Thus, when the entire regulatory regions considered are moderately to highly diverged, global multiple alignment is likely to miss significant signals.

Cliften et al. (2001) made similar observations in conjunction with their comparative analysis of several Saccharomyces species. They discovered that if the species are too closely related, the sequence alignment is obvious but uninformative, because the functional elements are not sufficiently better conserved than the surrounding nonfunctional sequence.
On the other hand, if the species are too distantly related, it is difficult or impossible to find an accurate alignment (for discussion of these issues, see Tompa 2001).

¹Corresponding author.
E-MAIL [email protected]; FAX (206) 543-8331.
Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.6902.
Letter. Genome Research 12:739–748 ©2002 by Cold Spring Harbor Laboratory Press ISSN 1088-9051/01 $5.00; www.genome.org

Rather than relying on multiple alignment, a more successful recent approach to phylogenetic footprinting is to use one of the existing motif discovery programs, such as MEME (Bailey and Elkan 1995), Projection (Buhler and Tompa 2001), Consensus (Hertz and Stormo 1999), AlignAce (Roth et al. 1998), or ANN-Spec (Workman and Stormo 2000), or the segment-based multiple alignment program DIALIGN (Morgenstern et al. 1998; Morgenstern 1999). Cliften et al. (2001), for instance, reported some successes using AlignAce when global multiple alignment tools failed. Another example of this approach is the work of McCue et al. (2001), who used a Gibbs sampler to perform phylogenetic footprinting in bacterial […]
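The paragraph above contrasts global alignment with motif discovery, which looks for a short shared pattern without aligning the whole regions. The sketch below is a toy version of that idea. It is not the algorithm of MEME, Projection, Consensus, AlignAce, ANN-Spec or DIALIGN, none of which is described in this excerpt.

It is an exhaustive "median string" search. Every DNA k-mer is scored by the sum, over all sequences, of its smallest Hamming distance to any window in that sequence. The k-mers with the lowest total are reported.

Assumptions:

- the function names and toy sequences are hypothetical;
- there are no gaps;
- the cost grows as 4^k, so this is only practical for short motifs.

The search also ignores the phylogeny: every sequence counts equally, whatever its evolutionary relationship to the others.

```python
from itertools import product
from typing import List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_window_distance(pattern: str, seq: str) -> int:
    """Smallest Hamming distance between `pattern` and any same-length window of `seq`."""
    k = len(pattern)
    return min(hamming(pattern, seq[i:i + k]) for i in range(len(seq) - k + 1))


def median_strings(seqs: List[str], k: int) -> Tuple[int, List[str]]:
    """Exhaustive search over all 4**k DNA k-mers (hypothetical helper).

    Each sequence is scanned independently; no global alignment is built.
    Returns the lowest total distance and every k-mer that achieves it.
    """
    if not seqs or any(len(s) < k for s in seqs):
        raise ValueError("need at least one sequence, each at least k long")
    best_score: Optional[int] = None
    best: List[str] = []
    for letters in product("ACGT", repeat=k):
        candidate = "".join(letters)
        score = sum(best_window_distance(candidate, s) for s in seqs)
        if best_score is None or score < best_score:
            best_score, best = score, [candidate]
        elif score == best_score:
            best.append(candidate)
    return best_score, best


# Toy "orthologous regions": a shared element TGACTC sits at different offsets
# inside otherwise different flanking sequence, so a global alignment is not needed.
seqs = [
    "CCGTATGACTCAGGTTAC",
    "AATGACTCATTGCGCA",
    "GGCATTTGACTCAC",
]
score, best = median_strings(seqs, 6)
print(score, best)
```

Because the planted 6-mer occurs exactly in every sequence, it reaches the minimum total distance of 0. Other 6-mers may tie with it. In these toy sequences, GACTCA also occurs in all three, since the shared element is really TGACTCA.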

