Computational Molecular Biology and Genomics
Dannie Durand
Fall 2005
Lecture 2

Outline
• A whirlwind review of molecular biology
• An overview of computational molecular biology
• New problems in genomics

Genes Encode Proteins
GTGCACCTGACTCCTGAG...
V H L T P E...
• A gene is a DNA sequence
• A protein is an amino acid sequence
• A protein folds into a 3D structure

A gene is a locus on a chromosome
[Figure labels: cell, chromosome, DNA]
…aggaggcctcgcctctcccagcatgggctggggctcctgtcccccactgtgtgtgcctggggcctggccaggactcccagtga…

Genomes: The complete instruction set
[Figure: Neisseria gonorrhoeae, a prokaryotic genome; Homo sapiens, a eukaryotic genome]

Growth of sequence data during the ’90s
[Figure: Sequences in GenBank. Collins et al, Science, Oct 1998]

Whole Genome Sequences
• First whole genome sequence: H. influenzae, 1995
• 292 whole genome sequences: 36 eukarya, 23 archaea, 233 bacteria
• In progress: 740 prokaryotic genomes, 532 eukaryotic genomes
www.genomesonline.org

Whole Genome Sequencing
[Figure. Source: www.genomesonline.org]

Whole Genome Sequencing Highlights (A Eukarya-centric View)
1995 H. influenzae – 1st whole genome sequence
1997 Yeast – 1st eukaryotic sequence
1998 Caenorhabditis elegans – 1st multicellular organism
2000 Fly, Arabidopsis thaliana – 1st plant
2001 Human
2002 Mouse, Ciona intestinalis
2003 Caenorhabditis briggsae, Neurospora crassa
2004 Five more yeasts, silkworm, rat, C. merolae, Tetraodon
2005 Dictyostelium
In the pipeline: Chicken, zebrafish, fugu, rice, dog, cat, chimpanzee, more fruit flies…

Computational Molecular Biology
Gene sequences: ATGCACCTGACTCCTGAG...
Computational analysis of a few genes:
– Sequence analysis
  • Pairwise alignment, database searching
  • Multiple alignment
  • Motifs, HMMs
– Reconstruct evolutionary history
– Structure prediction and modeling

Computational Genomics
Gene sequences: GTGCACCTGACTCCTGAG...
Genomic sequences
Computational implications:
– Need algorithms that scale up
– Genomes don’t look the way we thought they did → revise models
– New biological questions → new computational problems

The Fantasy
[Figure labels: Whole genome sequence, Cell Simulator Compiler, Cell Function Simulator]
TGAAATAAACAACCAGGCAGCAGTTATTAACACGGGAACATGGCGGCCGCAGCCTGGGCTCCCGCGGCGGCGGCGG…

From Genes to Organisms
Cellular pathways
[Figures: Pheromone signaling pathway; Rube Goldberg’s picture snapping machine]

Example: Pheromone signaling pathway
• receptor senses pheromone outside cell
• transcription of mating specific genes

From Genes to Organisms
• Predict
  – all genes
  – all gene products (protein, RNA)
  – regulatory motifs
• Predict structure and function of individual components
• Reconstruct the cellular networks
  – Regulatory pathways
  – Metabolic pathways
  – Signaling pathways
  …
• Model cellular behavior

From Genes to Organisms
New computational approaches
– New, better algorithms
– Use data in new ways
  • Comparative genomics
    – Genomic sequence
    – Gene content
    – Gene order
  • Combine different types of data
New high-throughput data sets
– mRNA expression
– Splice variants
– Protein expression
– Sub-cellular localization
– Protein-protein interactions
– Protein-DNA interactions

Computational Functional Genomics
High-throughput functional assays
Computational support for
• data acquisition
• data analysis
High-throughput sequencing
Computational support for
• data acquisition
• data analysis

When are genes turned on?
[Figure labels: genes, mRNAs]
Determine the set of all genes being transcribed in a given cell type under particular conditions

Alternate splice forms
[Figure, reconstructed from the extracted labels:]
DNA:  exon1 exon2 exon3 exon4 exon5 exon6
mRNA: exon1 exon2 exon3 exon4 exon5 exon6
Alternate splice forms:
  exon1 exon2 exon3 exon5 exon6
  exon1 exon2 exon3 exon4
Determine the set of splice variants in a given cell type under particular conditions

Expressed Sequence Tags (ESTs)
– small pieces of DNA sequence (usually 200 to 500 nucleotides long)
– generated by sequencing either one or both ends of an expressed gene

Expressed Sequence Tags (ESTs)
[Figure, reconstructed from the extracted labels:]
mRNA
  CAUGACUCCUUGGCUAC...CCGAGUGCGGCAUUUUUU
reverse transcriptase
cDNA
  CAUGACUCCUUGGCUAC...CCGAGUGCGGCAUUUUUU
  GTACTGAGGAACCGATG...GGCTCACGCCGTAAAAAA
degradation of mRNA, synthesis of second DNA strand
dsDNA
  CATGACTCCTTGGCTAC...CCGAGTGCGGCATTTTTT
  GTACTGAGGAACCGATG...GGCTCACGCCGTAAAAAA
forward primer: 5’ EST
reverse primer: 3’ EST

Expressed Sequence Tags
– Single-pass sequencing of “random” transcripts
– Relatively low quality sequence
– 5’ or 3’ end
– Tissue specific
– No guarantee
  • that all genes are represented
  • that all splice forms are represented
[Figure labels: 5’ ESTs, mRNA, 3’ ESTs]

ESTs: molecular tags for genes
ESTs
– are a fast way to capture the coding portion of the genome. (In eukaryotes, most of the genome does not contain protein coding genes.)
– provide a crude measure of transcript abundance. However, rare transcripts may be missed.
– provide a crude measure of splice variants (if at the 3’ or 5’ end of the gene).

When are genes turned on?
DNA arrays detect mRNA transcripts
[Figure: microarrays]

DNA microarrays
Targets: Each well contains a cDNA oligonucleotide corresponding to a unique subsequence of a gene
[Figure label: cgtaacgctat]

DNA microarrays
[Figure panels: Down regulated in tumor; Up regulated in tumor; Unchanged]

Expression array data
[Figure panels: unsorted array data; clustered array data. O. Alter, P. O. Brown and D. Botstein, PNAS 97 (18), 2000]

Not all mRNA transcripts are translated into proteins
[Figure labels: DNA (promoter, gene); RNA polymerase; transcription; mRNA; translation; Amino acid sequence]

Protein Expression
• Isolate the set