Stanford BIO 118 - Comparative Analysis of Methods for Multiple Sequence Alignment - D1139898



School name Stanford University

Course Bio 118- Genetic Analysis of Biological Processes

Pages 30



Comparative Analysis of Methods for Multiple Sequence Alignment

Yong Yang

Abstract

Five multiple sequence alignment methods were analyzed in this study. The purpose was to evaluate each method’s ability to correctly align highly conserved motifs and secondary structure elements (α helices and β strands) in the input sequences, and in particular to avoid placing gaps in those regions. Forty-six protein sequences from the thymidylate kinase family were used to test the methods. Among the methods tested, the Hidden Markov Model (HMM) method generated the best alignment.

Introduction

The simultaneous alignment of a number of nucleotide or amino acid sequences to produce an optimal global or local alignment is of great utility in molecular biology. Multiple alignments are used to find motifs or conserved domains that characterize families of sequences. They can also be used to identify new members of existing protein families and so assign function to new sequences. If the structure of some members of a family is known, multiple sequence alignments can help predict the secondary and tertiary structure of other family members. Multiple sequence alignment can also be used for phylogenetic or evolutionary analysis.

Automatic multiple sequence alignment is an important topic in computational biology, as evidenced by the large number of programs available for the task. These programs can be divided into four categories:

1) Progressive global alignment of the sequences, starting with alignment of the most similar pair of sequences and then adding more sequences by aligning them to the consensus of the alignment already generated. ClustalW (41) and Pileup (6) belong to this category.
2) Iterative alignment methods that make an initial alignment of the input sequences and then iteratively refine it to achieve better results. DIALIGN (25, 26) and SAGA (30) belong to this category.
3) Optimal local alignment of common subsequences among the input sequences. BLOCK-MAKER (9-11), MACAW (39), eMotif (13, 29), MOTIF and Profile-Maker belong to this category.
4) Statistical methods and probabilistic models of the sequences. Hidden Markov Models (14, 17, 19), MEME (2-4, 8) and the Gibbs sampler (23, 28) belong to this category.

Multiple sequence alignments are usually inferred from primary sequence alone. In a multiple sequence alignment, residues from a set of sequences are aligned together in columns. The residues in one column are supposed to be structurally or evolutionarily related, so ideally the residues in a column occupy similar three-dimensional structural positions or all diverge from a common ancestral residue. The gold standard for multiple sequence alignments is the structural alignment. One difficulty in making sequence alignments reflect structural similarity is that gaps in the alignment should be confined to regions outside the core. Alignments that reflect structure should have few if any gaps in the core regions. The core region consists of secondary structure elements, including α helices and β strands. In a good multiple alignment there should therefore be few gaps in α helices and β strands, since gaps there will usually disturb the structure. There should also be few gaps in the highly conserved motifs or domains of a protein family.

In this study, five different multiple sequence alignment methods were analyzed to evaluate their ability to keep gaps out of α helices, β strands and highly conserved motifs.

Materials and Methods

Test Data Set

Forty-six thymidylate kinases from prokaryotes and eukaryotes were chosen as the data set to test the multiple alignment methods.
The Swiss-Prot IDs of these forty-six proteins are as follows: KTHY_AQUAE, KTHY_THEMA, KTHY_BACHD, KTHY_BACSU, KTHY_LACLA, KTHY_HALN1, KTHY_ECOLI, KTHY_YERPE, KTHY_HAEIN, KTHY_PASMU, KTHY_VIBCH, KTHY_BUCAI, KTHY_CAUCR, KTHY_DEIRA, KTHY_NEIMA, KTHY_NEIMB, KTHY_PSEAE, KTHY_CHLMU, KTHY_CHLTR, KTHY_CHLPN, KTHY_MYCGE, KTHY_MYCPN, KTHY_UREPA, KTHY_RICPR, KTH1_SULSO, KTHY_ARCFU, KTHY_THEAC, KTHY_XYLFA, KTHY_AERPE, KTHY_METTH, KTHY_METJA, KTHY_HELPJ, KTHY_HELPY, KTHY_CAMJE, KTHY_SYNY3, KTHY_PYRAB, KTHY_PYRHO, KTHY_CAEEL, KTHY_SCHPO, KTHY_HUMAN, KTHY_MOUSE, KTHY_YEAST, KTHY_VACCV, KTHY_VARV, KTH2_SULSO, KTHY_ASFB7.

The Prosite motif for this family is PS01331: [LIV]-[LIVMGSTC]-[DET]-[RH]-[FYHCS]-x(2)-S-[GSTNP]-x-[AVC]-[FY]-[STANQ].

The X-ray structures of Escherichia coli and yeast thymidylate kinase have been solved. The PDB IDs for these two structures are 4TMK and 3TMK, respectively. The secondary structures of Escherichia coli thymidylate kinase (Fig. 1) and yeast thymidylate kinase (Fig. 2) are very similar to each other. The superimposition of the 3D structures of these two proteins (Fig. 3) shows that the 3D structures of the two kinases are also very similar, which suggests that thymidylate kinases in prokaryotes and eukaryotes may adopt similar structures.

Multiple alignment programs tested

Five different multiple alignment programs were tested in this study: ClustalW, PSI-BLAST, HMM, BLOCK-MAKER and Pileup. See Table 1 for details about these five programs.

Fig. 1. Secondary structure of Escherichia coli thymidylate kinase (PDB ID: 4TMK).

Fig. 2. Secondary structure of yeast thymidylate kinase (PDB ID: 3TMK).

Fig. 3. 3D superimposition of Escherichia coli thymidylate kinase and yeast thymidylate kinase.

Table 1. Multiple alignment programs used for comparative analysis in this study.

Program Name | Source | Reference
ClustalW | ClustalW 1.81 installed on the Eli Lilly and Company Bioinfo server | (41)
HMM | http://www.cse.ucsc.edu/research/compbio/HMM-apps/T99-tuneup.html | (14, 17, 19)
Pileup | Pileup in the GCG package installed on the Eli Lilly and Company Bioinfo server | Part of the GCG package of sequence analysis programs. Derived from (6)
BLOCK-MAKER | http://www.blocks.fhcrc.org/blockmkr/make_blocks.html | (9-11)
PSI-BLAST | http://www.ncbi.nlm.nih.gov/BLAST/ | (1)

Results

The five multiple alignment programs listed in Table 1 were analyzed to evaluate each program’s ability to keep gaps out of α helices, β strands and the highly conserved motifs identified by eMOTIF-SEARCH and eMATRIX-SEARCH.

The structure of Escherichia coli thymidylate kinase was used as the gold standard for evaluating the output alignment from each of the five multiple alignment programs. There are a total of eight α helices and four β strands in Escherichia coli
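
The evaluation criterion used throughout this study (few or no gaps inside the conserved motif and the secondary structure elements of a reference sequence) can be illustrated with a short Python sketch. This is not the procedure used by the author, only one plausible way to count such gaps. The function names `prosite_to_regex` and `gaps_in_regions`, the three-row toy alignment and the `hypothetical_helix` coordinates are all assumptions made for illustration; the toy sequences are not real thymidylate kinase sequences. Only the PS01331 pattern is taken from the text.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression.

    Handles residues, x, [..] sets, {..} exclusions and (n) / (n,m) repeats.
    Anchors (< and >) are not handled in this sketch.
    """
    element = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")
    parts = []
    for token in pattern.rstrip(".").split("-"):
        m = element.fullmatch(token)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {token!r}")
        core, low, high = m.groups()
        if core == "x":                      # x means any residue
            core = "."
        elif core.startswith("{"):           # {AB} means any residue except A or B
            core = "[^" + core[1:-1] + "]"
        if low is not None:                  # (n) or (n,m) repeat count
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)

def gaps_in_regions(alignment, ref_index, regions):
    """Count gap characters inside regions defined on an ungapped reference.

    regions maps a name to a (start, end) pair, 0-based and half-open, in
    coordinates of the reference sequence with gaps removed. Every alignment
    column between the first and last residue of a region is inspected in
    every row, so gaps inserted into the reference itself are counted too.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all alignment rows must have the same length")
    ref_row = alignment[ref_index]
    # Alignment column of each residue of the ungapped reference.
    col_of_residue = [col for col, ch in enumerate(ref_row) if ch != "-"]
    counts = {}
    for name, (start, end) in regions.items():
        first_col = col_of_residue[start]
        last_col = col_of_residue[end - 1]
        counts[name] = sum(row[first_col:last_col + 1].count("-") for row in alignment)
    return counts

# Pattern quoted in Materials and Methods.
PS01331 = "[LIV]-[LIVMGSTC]-[DET]-[RH]-[FYHCS]-x(2)-S-[GSTNP]-x-[AVC]-[FY]-[STANQ]"
motif_regex = prosite_to_regex(PS01331)

# Toy alignment (assumption): row 0 plays the role of the reference sequence.
toy_alignment = [
    "MK-LLDRYAASGKAYSQ",
    "MRAIVDRYAHSG-AFTQ",
    "-K-LLERFAVSSKAYNQ",
]
reference = toy_alignment[0].replace("-", "")
match = re.search(motif_regex, reference)   # locate the motif in the reference

regions = {
    "motif": match.span(),
    "hypothetical_helix": (0, 2),            # made-up coordinates, not from 4TMK
}
counts = gaps_in_regions(toy_alignment, 0, regions)
print(counts)
```

In a real comparison, the regions would come from the secondary structure annotation of the reference structure, and the same count would be computed for the output of each alignment program, with lower counts indicating better gap placement under this criterion.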
