
eMOTIF Maker: Nodally Awesome
Comparing Results of eMOTIF Maker with Neighbor-Joining Trees
Biochemistry 218, Douglas L. Brutlag, Lee Kozar
6/6/02
Pei-Hsien Ren (pren@stanford.edu)

Phylogenetic trees are useful for determining the relationships among proteins and for grouping proteins into their correct families. Protein families, in turn, have helped elucidate the function and structure of new family members. In principle, distance-based tree-building programs generate pairwise alignments of each sequence against every other sequence in the set. The mutation distances between all pairs are then stored in a matrix. Two taxa are joined as neighbors if the pair has the smallest mutational distance, and the optimal tree is finally generated by minimizing mutation distances at each step [1].

eMOTIF maker, a program quite different from a tree builder, also makes use of alignments of closely related proteins. It takes these sequence alignments and returns a set of motifs with varying degrees of sensitivity and specificity, which provides a way to discover motifs that are conserved within a protein family [2]. Using this group of motifs to scan a database returns hits that should include the members used to generate the motif as well as additional members containing it. The compiled results vary with the sensitivity and specificity of the motif. Because tree building and motif building take similar approaches, albeit for different purposes, this project carries out a proof-of-concept experiment to investigate how well the results of the two programs match up.

Among the distance-based methods for building a tree, neighbor joining proves to be very efficient at generating the best tree for large data sets [3]. In addition, neighbor joining does not require the data to be ultrametric, and it produces less biased trees when given sequence data with unequal evolutionary rates [4]. These characteristics make neighbor joining a suitable tree-building method for this experiment.

To make motifs from related proteins, an ungapped alignment of their sequences must first be generated so that it can be input into eMOTIF maker. The Block Maker program is used in this experiment to produce these alignments [5]. Block Maker was chosen because it is easy to use and its formatted results are easy to manipulate. The resulting blocks of aligned sequences are then put into eMOTIF maker, which returns a graphical representation of the motif enumeration, with each motif positioned according to its specificity on the y-axis and the number of training sequences it covers on the x-axis [2]. If one seeks a motif that covers a certain number of sequences, only one motif gives the best specificity: the one that lies lowest on the y-axis. Likewise, if one seeks a motif with a certain specificity, only one motif gives the best coverage: the one that lies farthest to the right on the x-axis. These dominating motifs can then be connected by a line called the Pareto-optimal curve [2]. Motifs lying on the Pareto-optimal curve are used for the subsequent motif scans in this experiment.

Two training sets, each with 100 or more sequences, are used to compare the neighbor-joining tree with the results of a motif scan. A motif scan done with the highest-specificity motif should have the least coverage, and its hits should correspond to a small cluster under a few nodes in the tree; this small cluster would contain the sequences most closely related to each other. A motif scan done with a low-specificity motif should return a larger number of hits, corresponding to sequences under a greater number of nodes.

One of the two training sets is derived from the globin family of heme-containing orthologous proteins, all of them vertebrate proteins. The set consists mostly of alpha chains, beta chains, and their variants, with a few myoglobins (see Appendix for a complete list of the set) [6]. The large number of alpha and beta chains should return trees with nodes under which most, if not all, of the sequences cluster according to chain type.

The second training set is derived from a subfamily of the serine protease family: the trypsin family with the serine active site. The set consists of mast cell proteases, trypsins, and various forms of venom serine proteases (see Appendix for a complete list of the set) [7]. Although the proteins in this paralogous family all act to cleave proteins, they do not share functions in the same context and would probably cluster according to their functional context. Choosing members for this training set was difficult because the protein members of the family have diverged considerably. Even though their active sites are very similar, a global sequence alignment is impossible because most of the sequences are too different from each other. This training set therefore attempts to include members that are very similar in both sequence and function within a subgroup, while also covering three divergent functions of serine proteases.

RESULTS

With the orthologous set as input, the neighbor-joining method produces a tree that places most of the beta chains and their variants as the outgroup to the alpha chains and myoglobins (see Figure 1). All of the myoglobins, and only the myoglobins, fall under one main node, and the same holds for the alpha chains. The beta chains, by contrast, are dispersed: no single node contains all of them, and they have given rise to many other hemoglobin variants. Some of them have evolved from the same ancestral sequence that gave rise to the alpha chains. The pair closest to the alpha chains in distance is hbb1_torma and hbb2_torma. Although they appear quite close in distance to the alpha cluster, biochemical and structural data from the database identify both proteins as beta chains [6]. The tree may make it look as if hbb1_torma and hbb2_torma evolved from alpha chains, but it actually shows that they and the alpha chains split from the same ancestral sequence, which was derived from the other beta clusters. Looking at the tree again, one can also see certain isolated groups, such as the epsilon chains, that branch off close to the root between two subtypes. That could be an indication of recombination. Overall, the tree seems reasonable, and no particular pair appears misplaced.

The same training set is submitted to Block Maker for alignment, and two blocks are generated that cover all sequences in the training set (Table 3). eMOTIF maker uses

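As a rough illustration of the distance-based procedure described at the start of this report (pairwise distances stored in a matrix, neighbors joined one step at a time), the Python sketch below implements textbook neighbor joining on a small, hypothetical distance matrix. It includes one refinement over the simplified description. Neighbor joining proper does not join the pair with the smallest raw distance. Instead, it joins the pair that minimizes Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of taxon i's distances to all other taxa. This correction is what lets the method cope with unequal evolutionary rates. The function name `neighbor_joining`, the taxon labels, and the distances are illustrative assumptions, not the program or data used in this project.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Textbook neighbor joining; returns the tree as a Newick string.

    names  -- taxon labels (hypothetical in the example below)
    matrix -- symmetric pairwise distance matrix, same order as names
    """
    # Nested dict so that new internal nodes can be added as keys
    d = {a: {b: matrix[i][j] for j, b in enumerate(names)}
         for i, a in enumerate(names)}
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # r[a]: total distance from a to every other active node
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from the new internal node u to i and to j
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        # Distance from u to each remaining node k
        d[u] = {}
        for k in nodes:
            if k not in (i, j):
                d[u][k] = d[k][u] = 0.5 * (d[i][k] + d[j][k] - d[i][j])
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    # Two nodes remain: put the whole connecting edge on a's side
    return f"({a}:{d[a][b]:.3f},{b}:0.000);"


# Hypothetical toy distances between five taxa
taxa = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, dist))
# ((c:4.000,(a:2.000,b:3.000):3.000):2.000,(d:2.000,e:1.000):0.000);
```

Because this toy matrix is additive, the branch lengths in the output reproduce every input distance exactly. For example, a to c is 2 + 3 + 4 = 9. Real protein distances are rarely additive, which is why the joined tree is only an estimate.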

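The selection of motifs on the Pareto-optimal curve, described above for eMOTIF maker's plot, can also be sketched briefly. This sketch assumes that each motif is summarized by two numbers. The first is the count of training sequences it covers (the x-axis). The second is a y-axis score where a lower value means a more specific motif, matching the statement that the most specific motif lies lowest on the y-axis. The `Motif` type, the motif labels, and all numbers are hypothetical; eMOTIF maker computes its own values internally.

```python
from typing import List, NamedTuple


class Motif(NamedTuple):
    name: str        # hypothetical motif label
    coverage: int    # training sequences matched (x-axis)
    fp_rate: float   # assumed y-axis score: lower = more specific


def pareto_optimal(motifs: List[Motif]) -> List[Motif]:
    """Keep motifs that no other motif beats on both axes at once."""
    front = []
    for m in motifs:
        # m is dominated if another motif covers at least as many sequences
        # with a score no higher, and is strictly better on one axis
        dominated = any(
            o.coverage >= m.coverage
            and o.fp_rate <= m.fp_rate
            and (o.coverage > m.coverage or o.fp_rate < m.fp_rate)
            for o in motifs
        )
        if not dominated:
            front.append(m)
    # Order the curve from the most specific (least coverage) motif outward
    return sorted(front, key=lambda m: m.coverage)


motifs = [
    Motif("m1", 40, 1e-12),
    Motif("m2", 60, 1e-9),
    Motif("m3", 55, 1e-8),  # dominated by m2: less coverage, higher score
    Motif("m4", 95, 1e-5),
    Motif("m5", 90, 1e-4),  # dominated by m4
]
print([m.name for m in pareto_optimal(motifs)])  # ['m1', 'm2', 'm4']
```

Under these assumptions, only m1, m2, and m4 would be carried forward to motif scans. m1 is the most specific of the three, so it would be expected to hit the smallest cluster in the neighbor-joining tree.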