Quick intro to the MIT CompBio Group

Who we are
  Mike Lin, Ben Holmes, Soheil Feizi, Luke Ward, Bob Altshuler, Angela Yen, Mukul Bansal, Manolis Kellis, Jason Ernst, Chris Bristow, Stefan Washietl, Pouya Kheradpour, Rachel Sealfon, Irwin Jungreis, Daniel Marbach, Jessica Wu, Louisa DiStefano, Dave Hendrix, Sushmita Roy, Loyal Goff
  Stata3, Stata4

What we do: research synopsis

Why biology in a computer science group?

Fundamental biological questions
  1. Interpreting the human genome
  2. Revealing the logic of gene regulation
  3. Principles of evolutionary change

Algorithmic / machine learning methods
  Comparative genomics: evolutionary signatures
  Regulatory genomics: motifs, networks, models
  Epigenomics: chromatin states, dynamics, disease
  Phylogenomics: evolution at the genome scale

Defining characteristics of our group
  Learn genomic rules, exploit nature of problems
  Interdisciplinary collaborations, high biology impact

1. Comparative genomics: evolutionary signatures

Protein-coding signatures
  1000s of new coding exons
  Translational readthrough
  Overlapping constraints

Non-coding RNA signatures
  Novel structural families
  Targeting, editing, stability
  Structures in coding exons

microRNA signatures
  Novel/expanded miR families
  miR/miR* arm cooperation
  Sense/anti-sense switches

Regulatory motif signatures
  Systematic motif discovery
  Regulatory motif instances
  TF/miRNA target networks
  Single binding-site resolution

2. Regulatory genomics: circuits, predictive models

ENCODE / modENCODE
  4-year effort, dozens of experimental labs
  Integrative analysis
  Systematic genome annotation
  Flagship NIH project

Predictive models of gene regulation
  Infer networks
  Predict function
  Predict regulators
  Predict gene expression

Initial annotation of the non-coding genome: from 20 to 70
Systems biology for an animal genome, for the first time possible
Students and postdocs are co-first authors, leadership roles

3. Phylogenomics: Bayesian gene-tree reconstruction

New phylogenomic pipeline
  Bayesian formulation
  Generative model

Two components of gene evolution
  1. Family rate F_j (gamma): selective pressures on gene function
  2. Species-specific rates S_i (normal, with parameters indexed by species i): population dynamics of the species

Model variables
  Length l, topology T, reconciliation R, alignment data D, species-level parameters

Model terms
  Sequence likelihood: HKY model (traditional)
  Branch length prior: learned F_j, S_i distributions
  Topology prior: birth-death process

4. Vignette on epigenomics: using chromatin information to understand human diseases
  Jason Ernst, Pouya Kheradpour

Challenge of data integration in many marks/cells
  Histones and histone tails; histone tail modifications (marks)
  Construct antibodies, pull down chromatin: ChIP-seq tracks
  Dozens of chromatin tracks
  Understand their function
  Reveal their combinations
  Annotate systematically

Our approach: learn common chromatin states
  Explicitly model combinations
  Unsupervised approach, probabilistic model

Our approach: Multivariate Hidden Markov Model (HMM)
  [Figure: HMM along the DNA in 200 base pair intervals. Observed: binarized chromatin marks (H3K4me1, H3K4me3, H3K27ac, H3K36me3), called based on a Poisson distribution. Unobserved: the most likely hidden state (numbered 1 to 6) for each interval, with example annotations enhancer, transcribed region, TSS. Panel: high-probability chromatin marks in each state.]
  Emission distribution is a product of independent Bernoulli random variables
  All probabilities are learned from the data
  Binarization leads to explicit modeling of mark combinations and interpretable parameters
  (Ernst and Kellis, Nat Biotech 2010)

From chromatin marks to chromatin states
  State groups: promoter states, transcribed states, active intergenic, repressed
  Learn de novo significant combinations of chromatin marks
  Reveal functional elements, even without looking at sequence
  Use for genome annotation
  Use for studying regulation dynamics in different cell types
  (Ernst and Kellis, Nat Biotech 2010)

ENCODE: study nine marks in nine human cell lines
  9 human cell types:
    HUVEC: umbilical vein endothelial
    NHEK: keratinocytes
    GM12878: lymphoblastoid
    K562: myelogenous leukemia
    HepG2: liver carcinoma
    NHLF: normal human lung fibroblast
    HMEC: mammary epithelial cell
    HSMM: skeletal muscle myoblasts
    H1: embryonic
  9 marks: H3K4me1, H3K4me2, H3K4me3, H3K27ac, H3K9ac, H3K27me3, H4K20me1, H3K36me3, CTCF
  Also: WCE, RNA
  9 cell types x 9 marks = 81 chromatin mark tracks (2^81 combinations)
  Brad Bernstein, ENCODE Chromatin Group
  Learned jointly across cell types (virtual concatenation)
  State definitions are common
  State locations are dynamic
  (Ernst et al., Nature 2011)

Chromatin states dynamics across nine cell types
  [Figure labels: predicted linking, correlated activity]
  Single annotation track for each cell type
  Summarize cell-type activity at a glance
  Can study 9-cell activity pattern across

Multi-cell activity profiles and their correlations
  Profiles: gene expression, chromatin states (active), TF motif enrichment, TF regulator expression, dip-aligned motif biases
  [Figure: profiles across HUVEC, NHEK, GM12878, K562, HepG2, NHLF, HMEC, HSMM, H1. Legend: ON / OFF; active enhancer / repressed; motif enrichment / motif depletion; TF on / TF off; motif aligned / flat profile.]
  Chromatin state and gene expression: link enhancers and target genes
  TF motif enrichment and TF expression: reveal activators/repressors

Coordinated activity reveals activators/repressors
  Enhancer activity: activity signatures for each TF
  Ex1: Oct4, predicted activator of embryonic stem (ES) cells
  Ex2: Gfi1, repressor of K562/GM cells
  Enhancer networks: regulator, enhancer, target gene

Revisiting disease-associated variants
  Disease-associated SNPs enriched for enhancers in relevant cell types
  E.g. lupus SNP in GM enhancer disrupts Ets1 predicted activator
  Regulatory roles revealed for many studies

Title
  Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium
  Biological, clinical and population relevance of 95 loci for blood lipids
  Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci
  Genome-wide meta-analyses identify three loci associated with primary biliary cirrhosis
  Chinese Han population identifies nine new susceptibility loci for systemic lupus erythematosus
  Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans
  Genome-wide association study of hematological and biochemical traits in a Japanese population
  A genome-wide meta-analysis identifies 22 loci associated with eight hematological parameters in the HaemGen consortium
  Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer
  Genome-wide
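
The emission model named in the epigenomics vignette (a product of independent Bernoulli random variables over binarized chromatin marks) can be illustrated with a minimal Python sketch. The state names, the three marks chosen, and every probability value below are hypothetical placeholders, not parameters from the slides or from Ernst and Kellis. The sketch scores a single 200 base pair interval by its emission probability only; it ignores the HMM transition probabilities, so it is not full hidden-state decoding.

```python
# Hypothetical emission parameters: P(mark is present | state).
# State names and values are illustrative assumptions only; in the
# approach described above, these probabilities are learned from data.
EMISSION = {
    "promoter":    {"H3K4me1": 0.1, "H3K4me3": 0.9, "H3K36me3": 0.1},
    "enhancer":    {"H3K4me1": 0.8, "H3K4me3": 0.1, "H3K36me3": 0.1},
    "transcribed": {"H3K4me1": 0.1, "H3K4me3": 0.1, "H3K36me3": 0.9},
}


def emission_prob(state, observed):
    """Return P(observed | state) as a product of independent Bernoullis.

    `observed` maps each mark name to 1 (present) or 0 (absent) for one
    binarized interval.
    """
    prob = 1.0
    for mark, p_present in EMISSION[state].items():
        prob *= p_present if observed[mark] else (1.0 - p_present)
    return prob


def best_state_by_emission(observed):
    """Pick the state with the highest emission probability.

    This ignores transitions between neighboring intervals, so it is a
    simplification of decoding in a real HMM.
    """
    return max(EMISSION, key=lambda state: emission_prob(state, observed))


# One binarized interval: only H3K4me1 is called present.
interval = {"H3K4me1": 1, "H3K4me3": 0, "H3K36me3": 0}
print(best_state_by_emission(interval))
```

Because each mark contributes its own factor, every parameter reads directly as "how often this mark is present in this state", which is the interpretability that binarization is said to provide.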


MIT 6.006 - Lecture Notes
