Stanford CS 374 - Study Notes


Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting

ABSTRACT
INTRODUCTION
RESULTS
  Metallothionein gene family
    The metallothionein gene family is particularly well suited to exhibit the merits of our approach, as a large number of promoter sequences are available from a wide variety of species, the phylogenetic relationships among these sequences have been studied, and a large number of regulatory elements have been experimentally determined in several species. Notice that although we described phylogenetic footprinting as applied to orthologous sequences, the approach applies equally well to paralogous sequences, where two sequences diverged because of duplication rather than speciation, as long as the gene family tree is known.
  Insulin gene family
  Interleukin-3
  C-myc promoter
  C-fos promoter
  C-fos first intron
DISCUSSION
  Known vs. Predicted Binding Sites
  Phylogenetic information
  Improving accuracy
METHODS
  Other useful parameters
ACKNOWLEDGMENTS
REFERENCES

Mathieu Blanchette and Martin Tompa¹
Department of Computer Science and Engineering, Box 352350, University of Washington, Seattle, WA 98195-2350 USA

Running title: Computational Method for Phylogenetic Footprinting
Keywords: Phylogenetic footprinting, regulatory element, motif, parsimony

¹ Corresponding author. E-MAIL [email protected]; fax: 206-543-8331

ABSTRACT

Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of orthologous regulatory regions from multiple species. It does so by identifying the best conserved motifs in those orthologous regions. We describe a computer algorithm designed specifically for this purpose, making use of the phylogenetic relationships among the sequences under study to make more accurate predictions. The program is guaranteed to report all sets of motifs with the lowest parsimony scores, calculated with respect to the phylogenetic tree relating the input species. We report the results of this algorithm on several data sets of interest. A large number of known functional binding sites are identified by our method, but we also find several highly conserved motifs for which no function is yet known. The program is available at http://bio.cs.washington.edu/software.html.

INTRODUCTION

One of the great challenges currently facing biologists is to understand the varied and complex mechanisms that regulate gene expression. We focus on one important aspect of this challenge, the identification of binding sites for the factors involved in such regulation. A number of computer algorithms have been proposed for the discovery of novel regulatory elements in nucleotide sequences. Most of these try to deduce the regulatory elements by considering the regulatory regions of several (putatively) coregulated genes from a single genome. Such algorithms search for overrepresented motifs in this collection of regulatory regions, these motifs being good candidates for regulatory elements. Examples of this approach include Hertz and Stormo (1999), Hughes et al. (2000), Sinha and Tompa (2000), van Helden et al. (1998), and Workman and Stormo (2000).

We adopt an orthogonal approach of deducing regulatory elements by considering orthologous regulatory regions of a single gene from several species. This approach is called “phylogenetic footprinting” (Tagle et al. 1988). The simple premise underlying phylogenetic footprinting is that selective pressure causes functional elements to evolve at a slower rate than nonfunctional sequences. This means that unusually well conserved sites among a set of orthologous regulatory regions are excellent candidates for functional regulatory elements. This approach has proved successful for the discovery of regulatory elements for many genes, including ε-globin (Tagle et al. 1988, Gumucio et al. 1993), γ-globin (Tagle et al. 1988), rbcL (Manen et al. 1994), CFTR (Vuillaumier et al. 1997), TNF-α (Leung et al. 2000), and IL-4, IL-13, and IL-5 (Loots et al. 2000). See the review by Duret and Bucher (1997) for more details. The same idea of using comparative analysis to identify conserved elements, but among only two or three species (particularly human and mouse), has recently become popular; see, for example, (Hardison et al. 1997), (Jareborg et al. 1999), (Dubchak et al. 2000), (Wasserman et al. 2000), (Mouchel et al. 2001), and (Wu et al. 2001).

The major advantage of phylogenetic footprinting over the single genome, multigene approach mentioned earlier is that the latter requires a reliable method for assembling the requisite collection of coregulated genes. In contrast, phylogenetic footprinting is capable of identifying regulatory elements specific even to a single gene, as long as they are sufficiently conserved across many of the species considered. Genome projects are quickly producing sequences from a wide variety of organisms, so the data necessary for phylogenetic footprinting are becoming increasingly available.

The standard method that has been used for phylogenetic footprinting is to construct a global multiple alignment of the orthologous regulatory sequences and then identify conserved regions in the alignment. A tool such as CLUSTALW (Thompson et al. 1994) is appropriate for this purpose, as it can take advantage of knowledge of the phylogeny relating the species. To see why this approach to phylogenetic footprinting does not always work, consider typical lengths of the sequences involved. Regulatory elements tend to be quite short (5-20 nucleotides long) relative to the entire regulatory region in which we search for them (a 1000 bp promoter region would be typical). Given these relative lengths, if the species are somewhat diverged it is likely that the noise of the diverged nonfunctional background will overcome the short conserved signal. The result is that the alignment may well not align the short regulatory elements together. In that case, the regulatory elements would not appear to belong to conserved regions and would go undetected. Thus, when the entire regulatory regions considered are moderately to highly diverged, global multiple alignment is likely to miss significant signals.

Cliften et al. (2001) made similar observations in conjunction with their comparative analysis of several Saccharomyces species. They discovered that if the species are too closely related, the sequence
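
The abstract refers to a parsimony score calculated with respect to the phylogenetic tree relating the input species. As a rough illustration of what such a score measures, the sketch below applies Fitch's small-parsimony procedure to one candidate motif occurrence per species and counts the minimum number of substitutions the tree requires. This is not the authors' program, which also searches over all substrings of the input sequences; the tree shape, species names, and k-mers are hypothetical, and unit substitution costs on a rooted binary tree are an assumption.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: either a leaf name or a (left, right) pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's rule for one motif position.

    Returns the candidate ancestral states at the root of `tree` and the
    minimum number of substitutions needed below it.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The children can agree on a state: no extra substitution here.
        return common, left_cost + right_cost
    # The children disagree: one substitution is needed on a child edge.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, kmers: Dict[str, str]) -> int:
    """Sum the per-position Fitch costs over equal-length k-mers."""
    lengths = {len(kmer) for kmer in kmers.values()}
    if len(lengths) != 1:
        raise ValueError("all k-mers must have the same length")
    k = lengths.pop()
    total = 0
    for i in range(k):
        column = {name: kmer[i] for name, kmer in kmers.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical tree and motif occurrences, for illustration only.
example_tree: Tree = (("human", "mouse"), ("chicken", "frog"))
example_kmers = {
    "human": "ACGT",
    "mouse": "ACGT",
    "chicken": "ACGA",
    "frog": "TCGA",
}
print(parsimony_score(example_tree, example_kmers))  # prints 2
```

A lower score means the chosen occurrences can be explained by fewer substitutions along the tree, which is the sense in which a set of motifs is "well conserved" once the phylogeny is taken into account.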

