Stanford BIO 118 - Lecture Notes

eMOTIF Maker: Nodally Awesome: Comparing Results of eMOTIF Maker with Neighbor-Joining Trees

Biochemistry 218
Douglas L. Brutlag
Lee Kozar
6/6/02
Pei-Hsien [email protected]

Phylogenetic trees are useful in determining the relationships among proteins and in grouping proteins into their correct families. Protein families have been helpful in elucidating the function and structure of new protein members. In principle, distance-based tree-building programs generate pairwise alignments of each sequence against all other sequences in the set. The mutation distances between all pairs are then stored in a matrix. Two taxa are joined as neighbors if the pair has the least mutational distance. The optimal tree is finally generated after minimizing mutation distances at each step (1).

A quite different program from tree building also makes use of alignments of closely related proteins. eMOTIF Maker takes these sequence alignments and returns a set of motifs with varying degrees of sensitivity and specificity. This is a way to discover motifs that are conserved among a protein family (2). Using this group of motifs to scan the database returns hits that should include the members used to generate the motif as well as additional members containing it. The compiled result varies according to the sensitivity and specificity of the motif. Because tree building and motif building use a similar approach, albeit for different purposes, this project is a proof-of-concept experiment to investigate how well the results from these two programs match up.

Of the distance-based methods for building a tree, neighbor-joining proves to be very efficient in generating the best tree for large data sets (3). In addition, neighbor-joining does not require the data to be ultrametric and produces a less biased tree when given sequence data with unequal evolutionary rates (4).
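
To make the joining step concrete, here is a minimal sketch of how one neighbor-joining iteration chooses the pair to join. It assumes the standard formulation, in which each raw distance is corrected by the two taxa's total distances to all other taxa (the Q criterion) before the minimum is taken; this correction is what lets the method cope with unequal evolutionary rates. The taxon labels, the distance values, and the function names are invented for illustration and are not data or code from this study.

```python
from itertools import combinations

# Hypothetical toy distance matrix for five taxa (not data from this study).
TAXA = ["a", "b", "c", "d", "e"]
DIST = {
    frozenset(("a", "b")): 5, frozenset(("a", "c")): 9,
    frozenset(("a", "d")): 9, frozenset(("a", "e")): 8,
    frozenset(("b", "c")): 10, frozenset(("b", "d")): 10,
    frozenset(("b", "e")): 9, frozenset(("c", "d")): 8,
    frozenset(("c", "e")): 7, frozenset(("d", "e")): 3,
}


def raw_closest_pair(names, dist):
    """Pair with the smallest uncorrected distance."""
    return min(combinations(names, 2), key=lambda p: dist[frozenset(p)])


def nj_pick_pair(names, dist):
    """Pair minimizing Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k)."""
    n = len(names)
    # Total distance from each taxon to every other taxon.
    totals = {
        a: sum(dist[frozenset((a, b))] for b in names if b != a) for a in names
    }
    best_pair, best_q = None, None
    for a, b in combinations(names, 2):
        q = (n - 2) * dist[frozenset((a, b))] - totals[a] - totals[b]
        if best_q is None or q < best_q:
            best_pair, best_q = (a, b), q
    return best_pair, best_q


print(raw_closest_pair(TAXA, DIST))  # ('d', 'e')
print(nj_pick_pair(TAXA, DIST))      # (('a', 'b'), -50)
```

In this toy matrix the smallest raw distance belongs to the pair d and e, but the corrected criterion joins a and b first. A full implementation would then replace the joined pair with a new internal node, recompute the distances to it, and repeat until the tree is resolved.
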
These characteristics of neighbor-joining make it a suitable tree-building method for this experiment.

In order to make motifs out of related proteins, an ungapped alignment of their sequences must be generated so that the alignment can be input into eMOTIF Maker. The Block Maker program is used in this experiment to produce these alignments (5). Block Maker is chosen because it is easy to use and its formatted results are easy to manipulate.

The obtained blocks of sequence alignment are then put into eMOTIF Maker, and the result is a graphical representation of motif enumeration, showing each motif positioned according to its specificity on the y-axis and the number of training sequences it covers on the x-axis (2). If one seeks a motif that covers a certain number of sequences, there is only one motif that gives the best specificity, and that is the one that lies lowest on the y-axis. If one seeks a motif with a certain specificity, there is likewise only one motif that gives the best coverage, and that is the one that lies farthest to the right on the x-axis. These dominating motifs can then be connected by a line called the Pareto-optimal curve (2). Motifs lying on the Pareto-optimal curve are then used for the subsequent motif scans in this experiment.

Two training sets, each with 100 sequences or more, are used to compare the neighbor-joining tree with the results of a motif scan. A motif scan done with the motif of highest specificity should have the least coverage, and its hits should correspond to a small cluster under few nodes in the tree. This small cluster would contain the sequences that are most closely related to each other. A motif scan done with a low-specificity motif should return a high number of hits that correspond to sequences under a larger number of nodes.

One of the two training sets is derived from the globin family, heme-containing orthologous proteins, all of which are vertebrate proteins. The set is composed mostly of alpha chains, beta chains, and their variants, with a few myoglobins.
See Appendix for a complete list of the set (6). The large number of alpha and beta chains should return trees with nodes where most, if not all, of the sequences cluster according to the type of chain.

The second training set is derived from a subfamily of the serine protease family: the trypsin family with the serine active site. The set is composed of mast cell proteases, trypsins, and various forms of venom serine proteases. See Appendix for a complete list of the set (7). This paralogous family has various proteins that, although they all act to cleave proteins, do not share functions in the same context and would probably cluster according to their functions in context. There has been some difficulty in choosing members of this training set because each protein member of the family has diverged considerably. Even if the active sites are very similar, global sequence alignment is impossible because the majority of the sequences are too different from each other. This training set attempts to include members that are very similar in both sequence and function within a subgroup, but also to include three different, divergent functions of serine proteases.

RESULTS

With the orthologous set as the input, the neighbor-joining method produces a tree that places most of the beta chains and their variants as the outgroup to the alpha chains and myoglobins. See Figure 1. All and only the myoglobins fall under one main node, and so do the alpha chains. The beta chains are dispersed: there is no one node under which all the beta chains fall, and they have given rise to many other variants of hemoglobins. Some of them have evolved from the same ancestral sequence that gives rise to the alpha chains. One pair that is closest to the alpha chains in distance is hbb1_torma and hbb2_torma. Although they seem to be quite close in distance to the alpha cluster, biochemical and structural data from the database have identified the two proteins as beta chains (6).
The representation of the tree may make it seem as if hbb1_torma and hbb2_torma evolved from alpha chains, but the tree actually shows that they and the alpha chains split from the same ancestral sequence derived from the other beta clusters.

Looking at the tree again, one can also see certain isolated groups, such as the epsilon chains, that branch off close to the root between two subtypes. That could be an indication of recombination. Overall, the tree seems reasonable, and no particular pair seems misplaced.

The same training set is submitted to Block Maker for alignment, and two blocks are generated that cover all sequences in the training set (Table

