UMD CMSC 828G - Methods for comparative metagenomics


BMC Bioinformatics
Open Access
Research

Methods for comparative metagenomics

Daniel H Huson*¹, Daniel C Richter¹, Suparna Mitra¹, Alexander F Auch¹ and Stephan C Schuster²

Address: ¹Center for Bioinformatics ZBIT, Tübingen University, Sand 14, 72076 Tübingen, Germany and ²310 Wartik Laboratories, Penn State University, Center for Comparative Genomics, Center for Infectious Disease Dynamics, University Park, PA 1803, USA

* Corresponding author

from The Seventh Asia Pacific Bioinformatics Conference (APBC 2009), Beijing, China. 13–16 January 2009

Published: 30 January 2009

BMC Bioinformatics 2009, 10(Suppl 1):S12 doi:10.1186/1471-2105-10-S1-S12

Supplement: Selected papers from the Seventh Asia-Pacific Bioinformatics Conference (APBC 2009). Editors: Michael Q Zhang, Michael S Waterman and Xuegong Zhang. Research.

This article is available from: http://www.biomedcentral.com/1471-2105/10/S1/S12

© 2009 Huson et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Background: Metagenomics is a rapidly growing field of research that aims at studying uncultured organisms to understand the true diversity of microbes, their functions, cooperation and evolution, in environments such as soil, water, ancient remains of animals, or the digestive system of animals and humans. The recent development of ultra-high throughput sequencing technologies, which do not require cloning or PCR amplification, and can produce huge numbers of DNA reads at an affordable cost, has boosted the number and scope of metagenomic sequencing projects. Increasingly, there is a need for new ways of comparing multiple metagenomics datasets, and for fast and user-friendly implementations of such approaches.

Results: This paper introduces a number of new methods for interactively exploring, analyzing and comparing multiple metagenomic datasets, which will be made freely available in a new, comparative version 2.0 of the stand-alone metagenome analysis tool MEGAN.

Conclusion: There is a great need for powerful and user-friendly tools for comparative analysis of metagenomic data and MEGAN 2.0 will help to fill this gap.

Background

Metagenomics is a rapidly growing field of research that aims at studying uncultured organisms to understand the true diversity of microbes, their functions, cooperation and evolution, in environments such as soil, water, ancient remains of animals, or the digestive system of animals and humans. Although it is clear that communities of microbes play a vital role in such systems, a more detailed understanding is only beginning to emerge. A main promise of metagenomics is that it will accelerate drug discovery and biotechnology by providing new genes with novel functions.

Currently, the key approach used in metagenomics is large-scale sequencing of environmental samples. The recent development of ultra-high throughput sequencing technologies [1,2], which do not require cloning or PCR amplification, and can produce huge numbers of DNA reads at an affordable cost, has boosted the number and scope of metagenomic sequencing projects, see [3,4]. The analysis of such datasets is aimed at determining and comparing the biological diversity and the functional activity of different microbial communities.

Computationally, species identification relies on the use of reference databases or reference phylogenies that contain sequences of known origin and gene function. The most prominently used databases are the NR and NT databases [5]. Unfortunately, substantial database biases toward model organisms present a major hurdle for metagenomic analysis, and in a typical metagenome dataset as much as 90% of the reads may exhibit no similarity to any known sequence. However, this problem is beyond the scope of this paper.

In early 2007, our group released and published the first publicly available, stand-alone analysis tool for metagenomic data, called MEGAN [6,7]. We initially developed this tool to analyze the microbial community present in a sample of mammoth bone [8]. MEGAN takes as input the result of a BLAST [9] comparison of a set of metagenomic reads against one or more reference databases and produces as output a taxonomical analysis of the sample, obtained by assigning the reads to different nodes in the NCBI taxonomy using an "LCA-algorithm".

As an exploration tool designed and optimized to run on a laptop, MEGAN complements other systems and resources for metagenome analysis, which are offered in the form of databases, web portals and web services, such as [10-14].

MEGAN now has over 400 registered users working in many different biological labs around the world. It is routinely used at the Joint Genome Institute (JGI), both in quality control and to provide initial analyses of newly sequenced datasets. Other users include researchers at the J.C. Venter Institute studying viral populations. In a recent publication [15], we demonstrate how to use the software for meta-transcriptomics as well.

Increasingly, the emphasis of metagenome analysis is shifting from species and functional identification for individual datasets toward comparative analysis. This paper addresses the latter issue and provides solutions to questions such as: Given two or more metagenome datasets, how similar or different are their taxonomical and functional profiles? Are observed differences statistically significant? Have enough reads been sequenced, i.e. what is the current "rate of discovery" as a function of the number of reads sequenced?

In the following section, we will discuss some new ideas for analyzing individual metagenome datasets. Then, we will focus on new comparative methods.
Finally, we will illustrate the application of the methods in two comparisons, one comparing the contents of a human gut [16] with the contents
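
The Background describes MEGAN as assigning each read to a node of the NCBI taxonomy using an "LCA-algorithm" applied to its BLAST hits. The following is a minimal sketch of that general idea only, not MEGAN's implementation: the toy taxonomy, the function names and the "unassigned" label are all hypothetical, and any score thresholds or filtering that the real tool may apply are not modeled here. Each read is placed on the lowest taxonomy node that is an ancestor of every taxon it matched.

```python
from typing import Dict, Iterable, List, Optional

# Toy child -> parent map standing in for the NCBI taxonomy (hypothetical data).
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "Bacteria": "root",
    "Proteobacteria": "Bacteria",
    "Escherichia coli": "Proteobacteria",
    "Salmonella enterica": "Proteobacteria",
    "Firmicutes": "Bacteria",
    "Bacillus subtilis": "Firmicutes",
}


def root_path(taxon: str) -> List[str]:
    """Return the path from the root down to the given taxon."""
    path = []
    node: Optional[str] = taxon
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def lowest_common_ancestor(taxa: Iterable[str]) -> str:
    """Return the deepest node that is an ancestor of (or equal to) every taxon."""
    taxa = list(taxa)
    if not taxa:
        raise ValueError("at least one taxon is required")
    common = root_path(taxa[0])
    for taxon in taxa[1:]:
        other = root_path(taxon)
        # Keep only the shared prefix of the two root-to-taxon paths.
        k = 0
        while k < min(len(common), len(other)) and common[k] == other[k]:
            k += 1
        common = common[:k]
    return common[-1]


def assign_reads(hits: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each read to the LCA of the taxa it hit; reads without hits stay unassigned."""
    return {
        read: lowest_common_ancestor(taxa) if taxa else "unassigned"
        for read, taxa in hits.items()
    }


if __name__ == "__main__":
    example_hits = {
        "read1": ["Escherichia coli"],
        "read2": ["Escherichia coli", "Salmonella enterica"],
        "read3": ["Escherichia coli", "Bacillus subtilis"],
        "read4": [],
    }
    for read, node in assign_reads(example_hits).items():
        print(read, "->", node)
```

In this sketch a read that matches a single species stays at that species, while a read that matches several taxa moves up the tree to the node they share, so ambiguous reads end up at higher taxonomic ranks.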

