© 2005 Nature Publishing Group

COMMUNITY GENOMICS IN MICROBIAL ECOLOGY AND EVOLUTION

Eric E. Allen* and Jillian F. Banfield*‡

*Department of Environmental Science, Policy, and Management, ‡Department of Earth and Planetary Sciences, University of California, Berkeley, Berkeley, California 94720, USA.
Correspondence to J.F.B. e-mail: [email protected]
doi:10.1038/nrmicro1157

Abstract | It is possible to reconstruct near-complete, and possibly complete, genomes of the dominant members of microbial communities from DNA that is extracted directly from the environment. Genome sequences from environmental samples capture the aggregate characteristics of the strain population from which they were derived. Comparison of the sequence data within and among natural populations can reveal the evolutionary processes that lead to genome diversification and speciation. Community genomic datasets can also enable subsequent gene expression and proteomic studies to determine how resources are invested and functions are distributed among community members. Ultimately, genomics can reveal how individual species and strains contribute to the net activity of the community.

CLONE LIBRARY
A collection of targeted DNA sequences, such as the 16S rRNA gene, most often derived from PCR amplification and subsequent cloning into a vector. Specifically, 16S rRNA gene clone libraries are often used in surveys of microbial diversity from environmental samples.

Microbial genomics has, until recently, been confined to individual, isolated microbial strains. Genome sequence information for isolates from phylogenetically diverse lineages has had a marked impact on our understanding of microbial physiology, biochemistry, genetics, ecology and evolution. However, this approach is limited because we do not know how to cultivate most microorganisms1.
Consequently, many questions about the roles of uncharacterized organisms in natural ecosystems remain.

Our ability to survey the resident microbiota in a given community has been greatly expanded by various cultivation-independent methodologies, which include 16S rRNA gene CLONE LIBRARY collections and group-specific fluorescence in situ hybridization (FISH)2. Although the description and quantitation of the phylogenetic diversity of microbial communities is an important first step, linking these organisms to their ecological roles remains a significant challenge.

In the natural environment, individual organisms do not exist in isolation. Rather, microbial communities are dynamic CONSORTIA of microbial species populations. The understanding of consortia function will benefit from genomic information from all coexisting members. This cannot be adequately addressed by focused isolation and individual genome sequencing efforts, as isolates might not be representative of the full genetic and metabolic potential of their associated natural populations. Moreover, artificial cultivation conditions often do not replicate those found in nature. Therefore, there is a compelling impetus to move beyond the culture-centric realm of microbial sequencing and to begin focusing sequencing efforts on microbial communities en masse.

The analysis of genome sequence data that has been recovered directly from the environment is motivated by many objectives, which include the establishment of gene inventories and natural product discovery3,4. This approach is often referred to as metagenomics, which is defined as the functional and sequence-based analysis of the collective microbial genomes that are contained in an environmental sample3.
Recent reviews have covered environmental and functional metagenomics3,5–8.

Here we centre our discussion on the opportunities for analysis of ecological and evolutionary processes in natural microbial consortia using environmentally derived genome sequence data. We focus on ‘community genomics’, which emphasizes the analysis of species populations and their interactions, recognizing that both species composition and interactions change over time, and in response to environmental stimuli. This requires that the system under investigation can be sampled repeatedly, and defined well enough to enable in situ ecological studies and the analysis of adaptive processes. Genomics can resolve the genetic and metabolic potential of communities and establish how functions are partitioned in and among populations, reveal how genetic diversity is created and maintained, and identify the primary drivers of genome evolution and speciation.

We draw upon experiences from our ongoing analyses of an extreme acid mine drainage (AMD) ecosystem9,10 (BOX 1). We discuss the challenges that are associated with the assembly of near-complete, and potentially complete, genomes of uncultivated organisms, the documentation of genomic heterogeneity in populations and the use of these data to enable comprehensive functional studies.

FOCUS ON METAGENOMICS
NATURE REVIEWS | MICROBIOLOGY VOLUME 3 | JUNE 2005 | 489

CONSORTIUM
Physical association between cells of two or more types of microorganism. Such an association might be advantageous to at least one of the microorganisms.

COVERAGE
The average number of times a nucleotide is represented by a high-quality base in the sequence data; full genome coverage is usually attained at 8–10X coverage.
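
As a rough illustration of the COVERAGE definition in the glossary, average coverage can be approximated as the number of reads multiplied by the mean read length and divided by the genome size. The Python sketch below is not taken from the article: the function names, the 700-bp read length, the 2-Mb genome size and the `read_fraction` parameter (the share of community reads assumed to derive from one population) are all hypothetical, quality filtering of bases is ignored, and the estimate of the fraction of the genome sampled at least once relies on the standard Poisson (Lander-Waterman) approximation for uniformly random shotgun reads.

```python
import math


def mean_coverage(num_reads, mean_read_length, genome_size, read_fraction=1.0):
    """Approximate average coverage c = N * L * f / G.

    read_fraction (f) is an illustrative assumption: the share of all
    community reads that come from the population of interest.
    """
    if genome_size <= 0 or mean_read_length <= 0:
        raise ValueError("genome_size and mean_read_length must be positive")
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must lie between 0 and 1")
    return num_reads * mean_read_length * read_fraction / genome_size


def expected_fraction_covered(coverage):
    """Poisson (Lander-Waterman) estimate of the fraction of bases
    sampled at least once at a given average coverage."""
    return 1.0 - math.exp(-coverage)


def reads_needed(target_coverage, mean_read_length, genome_size, read_fraction=1.0):
    """Total reads required so that one population reaches target_coverage."""
    if read_fraction <= 0:
        raise ValueError("read_fraction must be positive")
    return math.ceil(
        target_coverage * genome_size / (mean_read_length * read_fraction)
    )


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    genome_size = 2_000_000   # 2-Mb genome
    read_length = 700         # mean read length in bp
    for target in (8, 10):    # the 8-10X range quoted in the glossary
        n = reads_needed(target, read_length, genome_size)
        frac = expected_fraction_covered(target)
        print(f"{target}X: {n} reads, expected fraction covered {frac:.6f}")
    # A population contributing 25% of the reads needs four times as many
    # total community reads to reach the same depth.
    print(reads_needed(8, read_length, genome_size, read_fraction=0.25))
```

Under these assumptions, a population that contributes only a small share of the reads requires proportionally more total sequencing to reach the same depth, which is consistent with genome reconstruction being most tractable for the dominant members of a community.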
Approaches to community genomics

Community genomics provides a platform to assess natural microbial phenomena that include biogeochemical activities, population ecology, evolutionary processes such as lateral gene transfer (LGT) events, and microbial interactions. Only by placing these processes in their environmental context can we begin to understand complex community structure and functions, and the evolutionary constraints that define and sustain them.

Insights into the metabolic functions of uncultivated microorganisms have been facilitated by exploiting phylogenetic anchors that are contained in environmental libraries (BOX 2). For example, in large-insert environmental libraries, contiguous DNA that flanks taxonomic-specific markers such as 16S rRNA genes can provide a glimpse into the genetic potential of sampled organisms11–15. Alternatively, random clones from shotgun libraries can be sequenced. In this review, we focus primarily on the shotgun sequencing method, which represents a relatively unbiased, non-directed approach to survey the structure and metabolic capacity of a