MicroRNA Detection
Khan Shing
CS374
May 8, 2008
Source: Science 2 September 2005: Vol. 309. no. 5740, p. 1518

Outline
Biological background
• Gene regulation
• microRNAs
microRNA detection
• Random forests
• Comparative genomics
microRNA target recognition
• Site accessibility

Information Flow
Source: http://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology

Gene Regulation
• Transcriptional regulation
  ◦ Enhancers, promoters, transcription factors, epigenetic modifications
• Post-transcriptional regulation
  ◦ mRNA processing, small RNAs
• Post-translational regulation
  ◦ Protein activation, inhibition, degradation
Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.

microRNA
• RNA can fold like proteins: possess primary, secondary and tertiary structure
• Secondary hairpin structure crucial to processing of small RNAs
Source: Zamore, P.D. and Haley, B. 2005. Ribo-gnome: The big world of small RNAs. Science 309: 1519–1524.

miRNA Processing
Source: Zamore, P.D. and Haley, B. 2005. Ribo-gnome: The big world of small RNAs. Science 309: 1519–1524.

miRNAs Suppress Gene Expression

microRNA Detection
Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res.
doi:10.1101/gr.6593807.
Source: Leo Breiman, Random Forests, Machine Learning, v.45 n.1, p.5-32, October 1 2001.

microRNA Detection
• Machine learning approach
  ◦ Find characteristics that distinguish miRNAs
  ◦ Use these features to train a model
• Random forests
  ◦ Collection of many independently constructed classification trees
  ◦ Each tree “votes” and the tallied votes yield a score
Source: http://www.gmupolicy.net/its/incidentduration/image351.gif

How to Classify Objects?
[Figure: classification tree; labels: How to Classify?, Training, Node B, Node C]
Source: http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

Random Forest
N cases in training set, M input variables
• Sample N cases at random, with replacement, from the original data. This sample will be the training set for growing the tree.
• At each node, m variables (m << M) are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the forest growing.
• Each tree is grown to the largest extent possible. There is no pruning.
Source: http://www.jfsowa.com/figs/bintree.gif

Random Forest
• Trained on RFAM data set of 60 cloned miRNAs and random negative set (250 putative miRNA hairpins) with a variety of features
• Independently construct 500 trees
Source: CS262 Lecture 17, Win07, Batzoglou

Comparative Genomics
Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.

Structural Features
Compare the 60 cloned miRNAs in the RFAM database to random “miRNA like” hairpins (~760,000)
Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.

Conservation Features
Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.
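
Sketch: the random forest procedure from the slides (bootstrap N cases, try m of the M variables at each node, grow without pruning, tally the votes into a score) can be written in a few lines of plain Python. This is only a minimal illustration, not the implementation used by Stark et al. The function names, the Gini split criterion, the default m = sqrt(M), and the two toy features with their values are all assumptions made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Lowest-impurity (feature, threshold) among feature_ids, or None."""
    best, best_score = None, float("inf")
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must send cases to both children
            s = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if s < best_score:
                best, best_score = (f, t), s
    return best

def grow_tree(rows, labels, m, rng):
    """Grow one unpruned tree, trying m randomly chosen variables per node."""
    if len(set(labels)) == 1:
        return labels[0]  # pure node becomes a leaf
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), m))
    if split is None:  # the m chosen variables cannot separate this node
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    lo = [i for i, r in enumerate(rows) if r[f] <= t]
    hi = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            grow_tree([rows[i] for i in lo], [labels[i] for i in lo], m, rng),
            grow_tree([rows[i] for i in hi], [labels[i] for i in hi], m, rng))

def predict_tree(tree, row):
    """Follow the splits down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def train_forest(rows, labels, n_trees=500, m=None, seed=0):
    """Each tree is trained on N cases sampled with replacement."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    m = m or max(1, int(n_features ** 0.5))  # assumed default, held constant
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest

def score(forest, row, positive=1):
    """Fraction of trees voting for the positive class."""
    return sum(predict_tree(t, row) == positive for t in forest) / len(forest)

# Hypothetical toy data: (structural score, conservation score) per hairpin;
# label 1 = known miRNA, 0 = random hairpin. Values are made up.
rows = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.95, 0.85),
        (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.25, 0.15)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
forest = train_forest(rows, labels, n_trees=500)
print(score(forest, (0.99, 0.95)), score(forest, (0.05, 0.05)))
```

Ranking candidate hairpins by this vote fraction gives a single combined score per candidate, even when no individual feature separates the classes well on its own.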

Discovery and validation of new miRNAs
Alone, each feature does not provide enough discriminatory power, but combined in the trained model they give ~4500-fold enrichment
• Rank all 760,355 putative miRNAs according to this combined score
• Finds 41 novel miRNA candidates
• Validate by sequencing and other methods
Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.

Results
• Antisense strand miRNAs
• miRNA* sequences
Source: Stark A. et al. 2007. Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes. Genome Res. doi:10.1101/gr.6593807.

Accurate Prediction of Mature miRNAs

microRNA Target Recognition
Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).

Motivation for looking at site accessibility
• Existing methods for finding miRNA targets rely mostly on sequence specificity
• But miRNAs act as part of a protein complex, which has physical size and can be blocked by mRNA secondary structure
Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).

Proof of Principle
Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).

How to use this fact?
• Develop an energy-based score to rate miRNA-target interactions
• ∆G: free energy of molecular interactions
• ∆∆G: the difference between the free energy gained when a miRNA binds to its target and the free energy lost in unpairing the secondary structure of the mRNA target sequence
Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 39, 1278–1284 (2007).

Test how good ∆∆G is
• Correlates well with repression in luciferase assays
• Even better if flanking regions are included
Source: Kertesz, M., Iovino, N., Unnerstall, U., Gaul, U. & Segal, E. The role of site accessibility
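
Sketch: the ∆∆G score defined above is simple arithmetic once the two energy terms are known. The example below assumes a sign convention in which the duplex binding energy is negative (favorable) and the cost of unpairing the target site is positive, so a more negative ∆∆G means a more favorable interaction. The function name, the site names, and the energy values are hypothetical; in practice both terms would come from an RNA folding program.

```python
def delta_delta_g(dg_duplex, dg_open_cost):
    """Net energy (kcal/mol) of miRNA binding after paying to open the site.

    dg_duplex:    free energy of the miRNA-target duplex (assumed negative).
    dg_open_cost: free energy needed to unpair the target site (assumed >= 0).
    """
    return dg_duplex + dg_open_cost

# Hypothetical candidate sites: (duplex energy, opening cost), made-up values.
sites = {
    "site_A": (-22.0, 4.0),   # weaker duplex, but an accessible site
    "site_B": (-24.0, 15.0),  # stronger duplex, buried in secondary structure
}

# Rank sites from most to least favorable net energy.
ranked = sorted(sites, key=lambda name: delta_delta_g(*sites[name]))
for name in ranked:
    print(name, delta_delta_g(*sites[name]))
```

With these made-up numbers site_A outranks site_B even though site_B forms the more stable duplex, which is the point of accounting for site accessibility rather than scoring the duplex alone.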