CMU BSC 03711 - Next-generation genomics: an integrative approach

This preview shows page 1-2-3-4 out of 11 pages.



What types of genomic data sets are available?

Abstract | Integrating results from diverse experiments is an essential process in our effort to understand the logic of complex systems, such as development, homeostasis and responses to the environment. With the advent of high-throughput methods, including genome-wide association (GWA) studies, chromatin immunoprecipitation followed by sequencing (ChIP–seq) and RNA sequencing (RNA–seq), acquisition of genome-scale data has never been easier. Epigenomics, transcriptomics, proteomics and genomics each provide an insightful, and yet one-dimensional, view of genome function; integrative analysis promises a unified, global view. However, the large amount of information and diverse technology platforms pose multiple challenges for data access and processing. This Review discusses emerging issues and strategies related to data integration in the era of next-generation genomics.

Box 1 | Collaborative projects and technology development

Why perform integrative genomic analysis?

Figure 1 | Annotating the genome through detecting transcription-factor binding sites and histone-modification states. Promoters can be mapped by the localization of general transcription machinery and transcription factors (TFs), such as RNA polymerase II (RNAPII) or transcription initiation factor TFIID-associated factor 1 (TAF1), or by the localization of histone 3 lysine 4 trimethylation (H3K4me3). The bodies of transcribed genes and non-coding RNAs are marked by H3K36me3. Enhancers can be found by distal TF binding sites or by H3K4me1. This modification often coincides with H3K4me2, which has been shown to be necessary to recruit pioneering TFs to enhancer elements [121]. In addition, H3K4me1 sites overlap acetylated histone lysines, in agreement with acetylation islands outside promoters identifying functional enhancer elements [122,123]. Insulators are bound by CCCTC-binding factor (CTCF).
Nucleosomes are shown as cylinders and example histone tails are in green. Various TFs are shown as coloured shapes. TFs bound to the insulator include CTCF and subunits of cohesin.

Approaches to an integrative analysis

Figure 2 | Identification of regulatory SNPs. The sequence of a transcription factor (TF) binding site is shown with the position of an A/T polymorphism. By integrating chromatin signatures of enhancers or TF binding sites with SNP data, SNPs falling within the region would be predicted as regulatory SNPs. These could then be correlated to changes in gene expression. H3K4me1, histone 3 lysine 4 monomethylation.

Box 2 | Clustering

Figure 3 | Data visualization. The University of California-Santa Cruz (UCSC) Genome Browser is a tool for viewing genomic data sets. A vast amount of data is available for viewing through this browser. This example from the browser shows numerous data types in K562 cells from the ENCODE Consortium. A random gene, katanin p60 subunit A-like 1 (KATNAL1), was selected to show several features that can be identified by using this tool. The promoter has a typical chromatin structure (a peak of histone 3 lysine 4 trimethylation (H3K4me3) between the bimodal peaks of H3K4me1), is bound by RNA polymerase II (RNAPII) and is DNase hypersensitive. The gene is transcribed, as indicated by RNA sequencing (RNA–seq) data, as well as H3K36me3 localization. The gene lies between two CCCTC-binding factor (CTCF)-bound sites that could be tested for insulator activity. An intronic H3K4me1 peak (highlighted) predicts an enhancer element, corroborated by the DNase I hypersensitivity site peak. There is a broad repressive domain of H3K27me3 downstream, which could have an open chromatin structure in another cell type.

Using large-scale data sets for integrative analysis

Box 3 | Online tools for integrative analysis

Figure 4 | Flow chart for data analysis.
This example shows a workflow for the analysis of data from chromatin immunoprecipitation followed by sequencing (ChIP–seq). This analysis can be done by a bench scientist using current resources, and a similar strategy could be used for other types of next-generation sequencing data. Blue boxes show steps that can be performed using Galaxy. Integration or cross-sectioning of data can often be done in the University of California-Santa Cruz (UCSC) Genome Browser or by joining lists in Galaxy (purple box). Downstream steps, such as known motif analysis and Gene Ontology analysis, can be achieved with online or stand-alone tools (orange boxes). Galaxy can also be used to establish analytical pipelines for calling SNPs that could then be integrated into sequencing-based data, such as reads from ChIP–seq. CEAS, Cis-regulatory Element Annotation System; MACS, Model-based Analysis of ChIP–Seq; TSS, transcription start site.

Future perspectives

Driven by technological advances, recent years have witnessed a deluge of new methods for interrogating different properties of a cell on a genome-wide scale. Each offers a unique, although complementary, view of genome organization and cellular function. It is expected that integrating these data sets will provide more biological insights than using one data set alone. Thanks to the development of next-generation sequencing (NGS) technologies, the human genome has been mapped in many individuals; the challenge we now face is to understand this blueprint and to determine how errors lead to disease.
The traditional approach of isolating individual genes and studying them in a model system is being rapidly replaced by data sets generated by both individual laboratories and large consortia using new high-throughput technologies.

Although individual data sets, including genomic, epigenomic, transcriptomic and proteomic information, are highly informative, integrating them offers the exciting potential to answer many long-standing questions. For example, what are the functional variants of gene-distal loci identified by association studies? Where are the regulatory elements? And to what extent does the activity of regulatory elements contribute to disease phenotypes or to individual gene expression variation? Therefore, integrative analysis has become an essential part of experimental design in the era of next-generation genomics and is no longer the preserve of bioinformaticians. However, with the diversity of the high-throughput data and the seemingly endless analyses that can be performed, data integration is posing challenges for both bench
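
The regulatory-SNP prediction described in the Figure 2 caption is, computationally, an interval-overlap query: keep the SNPs whose positions fall inside regions carrying an enhancer signature such as H3K4me1. The following is a minimal Python sketch of that idea, not the authors' method. The function names, the example coordinates and SNP identifiers, and the 0-based half-open (BED-style) coordinate convention are all assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def index_regions(regions):
    """Group (chrom, start, end) regions by chromosome and merge overlaps.

    Returns {chrom: sorted list of non-overlapping (start, end) tuples}.
    Coordinates are assumed to be 0-based, half-open [start, end).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlapping or touching: extend the previous interval.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged


def regulatory_snps(snps, regions):
    """Return IDs of SNPs (snp_id, chrom, pos) that fall inside any region."""
    index = index_regions(regions)
    hits = []
    for snp_id, chrom, pos in snps:
        intervals = index.get(chrom, [])
        # Last merged interval whose start is <= pos (binary search).
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            hits.append(snp_id)
    return hits


# Hypothetical enhancer calls (e.g. H3K4me1 peaks) and SNP positions.
enhancers = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
snps = [
    ("snp_a", "chr1", 120),  # inside the merged chr1 region
    ("snp_b", "chr1", 300),  # at the excluded end coordinate
    ("snp_c", "chr2", 55),   # inside the chr2 region
    ("snp_d", "chr3", 10),   # chromosome with no regions
]
print(regulatory_snps(snps, enhancers))  # ['snp_a', 'snp_c']
```

Merging the regions first means each SNP needs only one binary search per chromosome. The SNPs this returns are candidates only; as the caption notes, they would still need to be correlated with changes in gene expression.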

