Genomics

The Human Genome Project
  Mapping and Sequencing the Genomes of Model Organisms
  Data Collection and Distribution
  Ethical, Legal, and Social Considerations
  Research Training
  Technology Development
  Technology Transfer

A Few Genome Resources
  NCBI Genome Resources
  UCSC Human Genome Browser
  Ensembl Human Genome Server

Genome Sequencing Progress
  NCBI Genome Sequence Repository
    All organisms
    Eukaryotic genomes
    Prokaryotic genomes
    Archaea genomes
    Viruses

Genome Sequencing
  From NCBI, 5/2001

Human Genome Sequencing 2/11/2001
  From NCBI

Human Genome Progress 2/11/2001

               Total sequence (kb)   Non-redundant sequence (kb)   Percentage of genome
  Finished     1,140,365             1,040,372                     32.50%
  Unfinished   3,547,899             1,951,344                     61.00%
  Total        4,688,264             2,991,716                     93.50%

  From NCBI

Microbial Genomes
  Published complete microbial genomes
  Microbial genomes and chromosomes in progress

Genome Informatics
  Annotation and Analysis
  Data Handling
  Metabolic Reconstruction
  Comparative Genomics
  Functional Genomics

Genome Project Organization
  Cloning
  Mapping
  Sequencing
  Annotation
  Analysis

Cloning and Mapping

Cloning
  Large
    YACs: 1 Mb
    BACs: 100-200 kb
  Intermediate
    Cosmids
    Lambda clones
  Small
    Plasmids; M13

Mapping
  Establishment of guideposts
  Aids in assembly
  Error checking
  Useful in mapping of genetic disorders

Genetic Maps
  Cytogenetic markers
  Linkage maps
    Polymorphic loci screened by PCR to determine inheritance patterns
    Produce linkage map with nearby loci

Physical Maps
  Radiation Hybrid/YACs/Cosmids
  Restriction Sites
  Sequence Tagged Sites (STSs)
    100 kb resolution needed
    30,000 STSs
  Expressed Sequence Tags
  Detection
    PCR
    Hybridization
    FISH: Fluorescent in situ Hybridization

Human Genome STS Mapping Strategy
  STS Content Mapping
    Screen YACs by PCR
  Radiation Hybrid Mapping
    Screen RH cell lines by PCR
  Genetic Mapping
    PCR screening of polymorphic loci
  Combine the above to produce an integrated map

Mapping Resolution
  YAC mapping: 1 Mb
  Radiation hybrid mapping: 10 Mb
  Genetic map: 30 Mb

GeneMap'98
  Integrated Human Genetic Map
  Over 30,000 unique gene-based markers
  100 kb resolution
  http://www.ncbi.nlm.nih.gov/genemap98/

Map Integration

Human Chromosome 1 Genetic Map

Human Chromosome 1 Combination Map

Sequencing

Sequencing Methods
  Random Shotgun
  Ordered Shotgun
  Directed
    Primer walking
    Direct genomic sequencing

Random Shotgun Sequencing
  Randomly shear or cut DNA into small pieces (2-4 kb)
  Clone into M13, pUC, or some other sequencing vector
  Sequence the clones from both ends
  Rely on the computer to assemble the sequences into one (or as few as possible) contigs

Shotgun Sequencing Statistics
  Lander and Waterman equation
  Poisson distribution
  $P_0 = e^{-m}$
    $P_0$ = probability that a base is not sequenced
    $m$ = sequence coverage

H. influenzae Sequencing
  For 1X random sequence coverage = 1.8 Mb
    $P_0 = e^{-1} = 0.37$ (63% of the bases are sequenced)
  To get > 99% of the bases sequenced
    5X coverage = 8.74 Mb of sequence
    $P_0 = e^{-5} = 0.0067$
  This coverage would leave approx. 128 gaps of about 100 bp in size
  From Science 269:496-512 (1995)

Ordered Sequencing
  Generate a set of large sequence clones in lambda phage
    May be subcloned from YACs or BACs as necessary
  End sequence the lambda clones and order the clones to produce a map of the genome
  Choose a minimal tiling path of the genome from the ordered lambda clones

Ordered Sequencing (continued)
  Shear and subclone the lambda inserts that comprise the minimal tiling set into sequencing vectors
  Shotgun sequence and assemble each of these lambda inserts individually
  Assemble all sequences into one contiguous genome

Directed Sequencing
  Process used for finishing, following the shotgun sequencing phase
  Gap closure
    Use specific sequencing primers to extend appropriate clones into gap regions
    Use specific sequencing primers to sequence directly from genomic DNA

Sequence Assembly

Assembly of Shotgun Fragments
  For H. influenzae (TIGR), 1.8 Mb
    24,304 sequence fragments were generated for the random assembly phase
    11,631,485 bases
    Generated 140 contigs
    Assembled using the TIGR Assembler
    30 hours of CPU time

phred/phrap/consed
  Widely used programs for sequence:
    base calling (phred)
    assembly (phrap)
    editing (consed)
  Developed at the University of Washington
    Phil Green (phrap)
    Brent Ewing (phred)
    David Gordon (consed)

Genome Annotation and Analysis

Pattern Matching

Sequence Annotation
  ORF identification
  Frameshift resolution
  Genome map construction
  Functional assignments
  Metabolic pathway assignment
  Metabolic pathway reconstruction
  Comparative analysis

Annotation Tools
  Semi-automated
  Manual

MAGPIE
  Multipurpose Automated Genome Project Investigation Environment
  Terry Gaasterland et al.
  http://genomes.rockefeller.edu/magpie/magpie.html
  Automated
  Semi-automated analysis tool for microbial genome projects

MAGPIE Example

Non-Automated Analysis and Prediction
  The Ureaplasma urealyticum genome database
    Run analysis tool
    Parse results
    Dump results into the database
    View results
    Manually annotate

Genomic Sequence Database
  Data Storage
    Sequence
    Gene Map
    Annotation
  User Interface
    Web browser
    Customizable

The Ureaplasma urealyticum Genome Project
  Uu - 751,719 bp
  http://genome.microbio.uab.edu/uu/uugen.htm
  Web-based genome analysis tool

Annotation Problems
  Problems with existing sequence databases
    Incomplete datasets
    Skewed datasets
    Incorrectly annotated records
    Annotations based on experimental vs. predicted data
    Nomenclature differences
    Transitive errors in gene function predictions
    Functional predictions for "hypothetical" genes

Metabolic Pathway Reconstruction
  Role assignment
  Extract metabolic pathways from genomes
  Navigation and analysis
  Pathway editing

Metabolic Assignments
  Amino acid biosynthesis
  Biosynthesis of cofactors, prosthetic groups, and carriers
  Cell envelope
  Cellular processes
  Central intermediary metabolism
  Energy metabolism
  Fatty acid and phospholipid metabolism
  Purines, pyrimidines, nucleosides, and nucleotides
  Regulatory functions
  Replication
  Transcription
  Translation
  Transport and binding proteins
  Other categories, Unassigned
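
Worked example: shotgun coverage estimate

The Poisson estimate quoted under Shotgun Sequencing Statistics, P_0 = e^{-m}, can be checked numerically. The sketch below is only an illustration of that formula; the function names are hypothetical and not part of any tool named in these slides. It reproduces the 1X and 5X figures quoted for H. influenzae and also inverts the formula to ask how much coverage a target fraction would need.

```python
import math


def fraction_unsequenced(coverage: float) -> float:
    """Poisson estimate P_0 = e^{-m}: chance that a given base is in no read."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return math.exp(-coverage)


def coverage_for_fraction(target: float) -> float:
    """Invert P_0 = e^{-m}: coverage m needed so that `target` of bases are sequenced."""
    if not 0 <= target < 1:
        raise ValueError("target must be in [0, 1)")
    return -math.log(1.0 - target)


# Coverage values taken from the H. influenzae slide (1X and 5X).
for m in (1.0, 5.0):
    p0 = fraction_unsequenced(m)
    print(f"{m:.0f}X coverage: P_0 = {p0:.4f}, sequenced = {1 - p0:.2%}")

# Coverage at which the estimate first reaches 99% of bases sequenced.
print(f"coverage for 99%: {coverage_for_fraction(0.99):.2f}X")
```

Under this estimate, 99% of bases are reached a little below 5X, which is consistent with the slide's choice of 5X to get more than 99% of the bases sequenced. The estimate treats read placement as uniformly random and ignores cloning bias and repeats.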
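
Worked example: choosing a minimal tiling path

The Ordered Sequencing slides call for choosing a minimal tiling path from the ordered lambda clones. One common way to do this, once each clone has a start and end position on the map, is a greedy interval cover. The sketch below assumes that approach; the slides do not say which method was actually used. The clone names, coordinates, and the function `minimal_tiling_path` are all hypothetical, and positions are treated as half-open intervals with no minimum overlap required between neighbours.

```python
from typing import List, Tuple

Clone = Tuple[str, int, int]  # (name, start, end), half-open map coordinates


def minimal_tiling_path(clones: List[Clone], region_start: int, region_end: int) -> List[Clone]:
    """Greedy interval cover: repeatedly take the clone that starts at or
    before the current position and reaches furthest to the right."""
    ordered = sorted(clones, key=lambda c: c[1])
    path: List[Clone] = []
    pos = region_start
    i = 0
    while pos < region_end:
        best = None
        # Scan every not-yet-considered clone that starts at or before `pos`.
        while i < len(ordered) and ordered[i][1] <= pos:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= pos:
            raise ValueError(f"gap in clone coverage at position {pos}")
        path.append(best)
        pos = best[2]
    return path


# Hypothetical clone map, for illustration only.
example_clones = [
    ("A", 0, 40),
    ("B", 10, 35),
    ("C", 30, 80),
    ("D", 35, 60),
    ("E", 75, 120),
]
tiling = minimal_tiling_path(example_clones, 0, 120)
print([name for name, _, _ in tiling])
```

A real project would normally also require a minimum overlap between adjacent clones so that the individually assembled inserts can be joined; that constraint is left out here to keep the sketch short.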